Opgailey

Re: Random Reboot R740

The same thing is happening to me with one of my three new R740s. All three machines are nodes of a Windows Server 2012 R2 failover cluster running about 15 VMs.

Nodes 1 and 2 have been perfectly stable for a week. Node 3 has rebooted randomly twice in the last 7 days, at different times of day (1 am and 9 pm). There is no Windows dump file and nothing in Event Viewer other than Kernel-Power event ID 41 followed by restart-related events. KB4088875 is not installed.

It's not our UPS: all three R740s run off the same UPS, and only one of them is affected.

The OpenManage Server Administrator (OMSA) hardware/ESM log just shows 'OEM software event' and 'C: boot completed'.

I don't have the iDRAC configured. I see others reporting CPU-related errors in their iDRAC logs. Before I try changing my BIOS System Profile to 'Performance' and disabling C1E/C-states, I'd like to know whether I'm getting these CPU errors as well.

Does the iDRAC log show more information than the OMSA ESM / hardware log?

UPDATE:

I enabled the iDRAC and am seeing the same CPU errors as the others.

2018-04-04 01:11:38 SYS1001 System is turning off.
2018-04-04 01:11:38 SYS1003 System CPU Resetting.
2018-04-04 01:11:21 RAC0703 Requested system hardreset.
2018-04-04 01:11:20 CPU0000 Internal error has occurred check for additional logs.
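
If you have a longer iDRAC log to go through, a small script can flag this pattern for you. Below is a minimal Python sketch, assuming the log has been copied out as plain text in the same "date time message-ID description" format shown above. The function names and the 60-second window are my own hypothetical choices, not anything from Dell; the script only reports each CPU0000 internal error that is followed by a RAC0703 hard reset within that window.

```python
import re
from datetime import datetime, timedelta

# Hypothetical parser for iDRAC log lines copied as plain text, e.g.
# "2018-04-04 01:11:20 CPU0000 Internal error has occurred ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def parse_log(text):
    """Return (timestamp, message_id, description) tuples, oldest first."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((ts, match.group(2), match.group(3)))
    # The excerpt in this thread is newest first; sort chronologically.
    # Python's sort is stable, so entries with equal timestamps keep their order.
    entries.sort(key=lambda entry: entry[0])
    return entries


def cpu_error_resets(entries, window=timedelta(seconds=60)):
    """Pair each CPU0000 error with the first RAC0703 hard reset within `window`."""
    hits = []
    for i, (ts, msg_id, _) in enumerate(entries):
        if msg_id != "CPU0000":
            continue
        for later_ts, later_id, _ in entries[i + 1:]:
            if later_ts - ts > window:
                break  # entries are sorted, so nothing later can match
            if later_id == "RAC0703":
                hits.append((ts, later_ts))
                break
    return hits


SAMPLE = """\
2018-04-04 01:11:38 SYS1001 System is turning off.
2018-04-04 01:11:38 SYS1003 System CPU Resetting.
2018-04-04 01:11:21 RAC0703 Requested system hardreset.
2018-04-04 01:11:20 CPU0000 Internal error has occurred check for additional logs.
"""

if __name__ == "__main__":
    for err_ts, reset_ts in cpu_error_resets(parse_log(SAMPLE)):
        print(f"CPU0000 at {err_ts} -> hard reset at {reset_ts}")
```

Run against the excerpt above, it prints one line pairing the 01:11:20 CPU0000 error with the 01:11:21 hard reset.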

UPDATE 2:

I changed our System Profile to 'Performance' (which disables the CPUs' C1E and C-states), as others recommended earlier in this thread.

I think this fix has done the trick: 6 days without any reboots so far. Fingers crossed it stays this way.

tabletrtd

Re: Random Reboot R740

Opgailey,
Hello, friend! We have the same problem. Could you tell me whether the node is still running without any reboots after you switched to maximum performance and disabled C1E? If so, for how many days now?

Re: Random Reboot R740

The Performance profile behaves differently on older BIOS versions, so make sure you're on the latest version.

For example:

v1.1.7 disables only the C1E state

v1.3.7 disables both C-states and C1E

We've been running stable for about 5 weeks now.

Opgailey

Re: Random Reboot R740


@tabletrtd wrote:
Opgailey,
Hello, friend! We have the same problem. Could you tell me whether the node is still running without any reboots after you switched to maximum performance and disabled C1E? If so, for how many days now?


I can confirm that since I made this change, our three R740 servers (acting as failover cluster nodes) have been stable. No more random reboots.

Stable for almost a month now. :)

Opgailey

Re: Random Reboot R740

Just another update in case anyone comes across this thread and wants to know.

I can confirm that since I made this change, our three R740 servers have remained stable. No more random reboots.

Perfectly stable for 6 months now.

Re: Random Reboot R740

I am also experiencing this with two R740s in a failover cluster running 10 VMs. Both servers are rebooting randomly, and this has been affecting the cluster. I have changed the profile to "Performance" as recommended and have my fingers crossed that it works.
kimse

Re: Random Reboot R740

Update: no random reboots since 03-23-2018 after changing to the Performance profile.