darrka
1 Nickel

PowerEdge C6100 unexpected reboot / power cut off

Hello

We have three C6100 machines, each a few years old, with 4 nodes in each.

The problem is unexpected reboots of nodes in every server. The time between reboots can be anywhere from 12 hours to 45 days. Some nodes reboot more often than others, but all nodes are doing the same job: transcoding MPEG-2 to H.264.

The reboots happen regardless of the following:

The servers are in different data centers.
With or without a UPS.
With one or two PSUs enabled.
With IPMI enabled or disabled.
No matter which BIOS settings we choose (default and various custom ones).
No matter which OS is installed (Debian, CentOS).
No matter the CPU load (15% to 80%).

The server room temperature is stable at ~20 °C, and the supply voltage is stable.
The IPMI sensors show that everything is OK.

NO LOGS are left before a reboot. I mean the OS does not report ANY unusual activity. It just reboots.

The problem has existed from the beginning, when we bought the servers second hand, and it has now been 1.5 years. We have tried dozens of different ideas without any luck.

Finally, we have managed to get some logs from IPMI:

  25 | 06/09/2016 | 23:37:38 | System ACPI Power State ACPI Pwr State | Legacy OFF state | Asserted
(Power OFF)
  26 | 06/09/2016 | 23:37:48 | System ACPI Power State ACPI Pwr State | Legacy ON state | Asserted
(Power ON)
  27 | 06/09/2016 | 23:38:23 | Unknown #0x81 |  | Asserted
  28 | 06/09/2016 | 12:52:51 | System Event #0x85 | OEM System boot event | Asserted
(OS booted)
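
To spot power cycles like this one across many nodes, the event log can be scanned automatically. Below is a minimal Python sketch, not a definitive tool. It assumes the pipe-delimited layout shown above (id | date | time | sensor | event | state) and that the date is MM/DD/YYYY; both are assumptions about the log format. The function names `parse_sel` and `power_cycles` are hypothetical helpers made up for this illustration.

```python
from datetime import datetime

# Sample copied from the log above; the "(Power OFF)" style annotation
# lines contain no pipes and are skipped by the parser.
SAMPLE = """\
  25 | 06/09/2016 | 23:37:38 | System ACPI Power State ACPI Pwr State | Legacy OFF state | Asserted
(Power OFF)
  26 | 06/09/2016 | 23:37:48 | System ACPI Power State ACPI Pwr State | Legacy ON state | Asserted
(Power ON)
  27 | 06/09/2016 | 23:38:23 | Unknown #0x81 |  | Asserted
"""


def parse_sel(text):
    """Turn pipe-delimited event log lines into a list of dicts."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 6:
            continue  # not an event line (annotation, blank, header)
        try:
            # Assumption: date is MM/DD/YYYY and time is 24-hour HH:MM:SS.
            when = datetime.strptime(fields[1] + " " + fields[2],
                                     "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # unparsable timestamp, skip the line
        events.append({
            "id": fields[0],
            "time": when,
            "sensor": fields[3],
            "event": fields[4],
            "state": fields[5],
        })
    return events


def power_cycles(events):
    """Pair each 'Legacy OFF' event with the next 'Legacy ON' event.

    Returns a list of (off_time, on_time, gap_in_seconds) tuples.
    """
    cycles = []
    last_off = None
    for ev in events:
        if "Legacy OFF" in ev["event"]:
            last_off = ev
        elif "Legacy ON" in ev["event"] and last_off is not None:
            gap = (ev["time"] - last_off["time"]).total_seconds()
            cycles.append((last_off["time"], ev["time"], gap))
            last_off = None
    return cycles


if __name__ == "__main__":
    for off, on, gap in power_cycles(parse_sel(SAMPLE)):
        print(f"power off {off} -> power on {on} ({int(gap)} s)")
```

Feeding each node's saved log through `power_cycles` gives a list of off/on pairs per node, which makes it easier to compare when the power cuts happen on different nodes and chassis.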

How can the power be cut to just one of the 4 nodes, in just one of the many servers in the data center?

Additional info:

Each server node has dual Intel(R) Xeon(R) CPU X5650 @ 2.67GHz and ~16 GB of RAM.
OS: Debian GNU/Linux 8
Software: ffmpeg version 2.2 or ffmpeg version 3.0.2

Board Product         : PowerEdge
 Board Serial          : CN0D61XP7475117E0332A02
 Board Part Number     : 2817NP2203
 Product Serial        : (removed for tos)

 Board Product         : PowerEdge
 Board Serial          : CN0D61XP7475117E0097A02
 Board Part Number     : 2817NP2015
 Product Serial        : (removed for tos)

 Board Product         : PowerEdge
 Board Serial          : CN0D61XP7475106H0010A00
 Board Part Number     : VL06MP0756
 Product Serial        : (removed for tos)

7 Replies
Moderator

RE: PowerEdge C6100 unexpected reboot / power cut off

Hi,

Is it always the same nodes that are having this issue? It doesn’t seem to show any other hardware errors. Do you have debug logging enabled in the OS? 

Thanks,
Josh Craig
Dell EMC Enterprise Support Services
Get support on Twitter @DellCaresPRO
darrka
1 Nickel

RE: PowerEdge C6100 unexpected reboot / power cut off

Good question.
This is happening on ALL nodes in all servers.

Moderator

RE: PowerEdge C6100 unexpected reboot / power cut off

With it happening on multiple systems in multiple locations, and given all of the other things you have tried, it most likely is not a hardware issue. Are you able to try a different encoder?

Thanks,
Josh Craig
Dell EMC Enterprise Support Services
darrka
1 Nickel

RE: PowerEdge C6100 unexpected reboot / power cut off

I have tried many versions of ffmpeg over 3 years, but with no luck. I also tried changing the power management options in the BIOS; the results are the same.

Logging is enabled in the OS, but there is no log entry related to the random reboots. It seems that the power goes off and then back on:

  25 | 06/09/2016 | 23:37:38 | System ACPI Power State ACPI Pwr State | Legacy OFF state | Asserted 
(Power OFF) 
  26 | 06/09/2016 | 23:37:48 | System ACPI Power State ACPI Pwr State | Legacy ON state | Asserted 

Maybe this is the reason why I can't see any logs.

Moderator

RE: PowerEdge C6100 unexpected reboot / power cut off

Are you able to take any of the systems out of production to do additional testing?

Thanks,
Josh Craig
Dell EMC Enterprise Support Services
darrka
1 Nickel

RE: PowerEdge C6100 unexpected reboot / power cut off

Yes, I can and I'm ready. 

Moderator

RE: PowerEdge C6100 unexpected reboot / power cut off

I would try running with just one node and see if it still reboots. You may also want to try to boot to our live image and see if that reboots. www.dell.com/.../DriversDetails

Thanks,
Josh Craig
Dell EMC Enterprise Support Services