Unsolved
This post is more than 5 years old
1 Rookie
•
5 Posts
0
Random Reboot R740
Hello,
We currently have a Windows Server 2016 Datacenter failover cluster with two PowerEdge R740 nodes.
The hardware configuration of each node is as follows:
2x Intel (R) Xeon (R) Silver 4116 CPU @ 2.10GHz Model 85 Stepping 4
RAM 196608 MB
Nvidia Tesla M60 Video Card
SAS connection with a PowerVault® 3420 SAN
Video cards are used in Discrete Device Assignment by virtual machines
The nodes reboot abruptly and at random, with no error message in the logs other than Event ID 41 (Kernel-Power): "The system has rebooted without cleanly shutting down first".
EventData
BugcheckCode 0
BugcheckParameter1 0x0
BugcheckParameter2 0x0
BugcheckParameter3 0x0
BugcheckParameter4 0x0
SleepInProgress 0
PowerButtonTimestamp 0
BootAppStatus 0
Checkpoint 0
ConnectedStandbyInProgress false
SystemSleepTransitionsToOn 0
CsEntryScenarioInstanceId 0
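A minimal sketch of how the two most telling fields in this EventData can be read. It assumes the usual interpretation of Event ID 41: a non-zero BugcheckCode means a stop error was recorded, and a non-zero PowerButtonTimestamp means the power button was held. The helper names (`parse_event_data`, `classify_event41`) are hypothetical and not part of any Windows or Dell tool.

```python
# Field values copied from the Event ID 41 record quoted above.
EVENT_DATA = """\
BugcheckCode 0
BugcheckParameter1 0x0
BugcheckParameter2 0x0
BugcheckParameter3 0x0
BugcheckParameter4 0x0
SleepInProgress 0
PowerButtonTimestamp 0
BootAppStatus 0
Checkpoint 0
ConnectedStandbyInProgress false
SystemSleepTransitionsToOn 0
CsEntryScenarioInstanceId 0
"""


def parse_event_data(text):
    """Turn 'Name value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def to_int(value):
    """Accept decimal ('0') or hexadecimal ('0x0') notation."""
    return int(value, 16) if value.lower().startswith("0x") else int(value)


def classify_event41(fields):
    """Hypothetical helper: coarse reading of an Event ID 41 record."""
    bugcheck = to_int(fields.get("BugcheckCode", "0"))
    power_button = to_int(fields.get("PowerButtonTimestamp", "0"))
    if bugcheck != 0:
        return "stop error recorded (bugcheck 0x%X)" % bugcheck
    if power_button != 0:
        return "power button was held"
    return "no bugcheck and no power-button press recorded"


print(classify_event41(parse_event_data(EVENT_DATA)))
```

For the event quoted above both fields are zero, so the sketch falls through to the last case: Windows itself recorded neither a crash nor a power-button press, which is why the hardware-side logs become the next place to look.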
The nodes do not reboot at the same time, and the reboots occur at completely random moments.
We have no errors in hardware testing and no explicit events in OpenManage.
Do you have any idea what is causing this problem?
Best Regards
DELL-Daniel My
Moderator
•
6.2K Posts
0
November 7th, 2017 10:00
Hello
If there are no errors or warnings then look at what happened just before the system shut down. Check the hardware log at the time of the shutdown. The hardware log should state what initiated the shutdown. If there is nothing in the hardware log that states what initiated the shutdown then this is a hardware issue.
Thanks
Eluich
1 Rookie
•
5 Posts
0
November 7th, 2017 11:00
Hi Daniel
Thank you for your update
When you said "Check the hardware log at the time of the shutdown", how can I check the hardware log? Through OpenManage Essentials, iDRAC, ...?
Thank you in advance for your answer
Best Regards
Eluich
1 Rookie
•
5 Posts
0
November 7th, 2017 12:00
When the reboot occurs, I only have these hardware log entries:
OEM software event.
C: boot completed.
So does that mean there is a hardware problem?
Best Regards
DELL-Daniel My
Moderator
•
6.2K Posts
0
November 7th, 2017 12:00
It is under the log section of the iDRAC. It is called the System Event Log in the iDRAC. It is not the same as the operating system's System Event Log. In OpenManage it is listed as the Hardware Log.
Eluich
1 Rookie
•
5 Posts
0
November 7th, 2017 23:00
In iDRAC, in the Lifecycle Log, I have this event before the reboot
dafoxx
1 Rookie
•
48 Posts
0
November 8th, 2017 07:00
Just an idea, but set the power options in the BIOS to maximum performance.
Does it happen when the GPUs/systems are under load?
Eluich
1 Rookie
•
5 Posts
0
November 8th, 2017 07:00
Hi
Today I changed the BIOS configuration, because the servers were using the Performance Per Watt (DAPC) system profile.
Each node now uses a Custom profile set for maximum performance, with the C1E and C-states options disabled.
I hope that will solve the problem.
Best Regards
DELL-Daniel My
Moderator
•
6.2K Posts
0
November 8th, 2017 08:00
No, those are normal messages that occur during system shutdown and startup. You need to review all of the software and hardware logs and cross-reference them at the time the events occur. Until you find something in the logs or diagnostics to indicate what is happening, it is just guesswork.
Make sure you turn off automatic recovery on failure in the operating system. If the OS is faulting it automatically restarts the system by default.
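The cross-referencing suggested above can be sketched roughly as follows: gather entries from every log into one list and keep those that fall in a short window before the unexpected reboot. This is only an illustration. The function name `entries_before`, the five-minute window, and all of the sample entries are hypothetical placeholders, not data from this thread. The sketch also assumes the OS and iDRAC clocks agree, so any offset between them would need correcting first.

```python
from datetime import datetime, timedelta


def entries_before(reboot_time, entries, window=timedelta(minutes=5)):
    """Return the (source, timestamp, message) entries logged within
    `window` before `reboot_time`, oldest first."""
    start = reboot_time - window
    hits = [e for e in entries if start <= e[1] <= reboot_time]
    return sorted(hits, key=lambda e: e[1])


# Placeholder data for illustration only (not real log output).
reboot = datetime(2017, 11, 7, 9, 30, 0)
entries = [
    ("os", datetime(2017, 11, 7, 9, 28, 10), "placeholder OS event"),
    ("sel", datetime(2017, 11, 7, 9, 29, 50), "placeholder hardware log entry"),
    ("lifecycle", datetime(2017, 11, 7, 8, 0, 0), "placeholder entry, too old"),
    ("os", datetime(2017, 11, 7, 9, 31, 0), "placeholder event after the reboot"),
]

for source, ts, message in entries_before(reboot, entries):
    print(ts.isoformat(), source, message)
```

Widening the window (for example `timedelta(hours=2)`) is a cheap way to check whether anything slower-moving, such as a temperature or power warning, preceded the reboot.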
Thanks
dafoxx
1 Rookie
•
48 Posts
0
November 8th, 2017 09:00
Assuming the nodes are Windows VMs? If so, set those to high-performance power mode in both the guest OS and the host OS. Are they on the latest firmware?
Also look at the iDRAC power graph to see if the systems are using too much power.
Fabrice TATON
1 Message
0
November 20th, 2017 06:00
Same issue with a single T430 / Windows Server 2016. No hardware errors. Sometimes the server reboots two or three times within 5 minutes, sometimes it is fine for hours. So far I have only installed the OS: no users, no activity!
Dell support asked for a hardware test, which found no problems, so they had no other answer.
dafoxx
1 Rookie
•
48 Posts
0
November 20th, 2017 07:00
Assuming the firmware is on the latest version?
Do you have any add-in cards?
Try running the system with the power profile set to OS power control or maximum performance.
Install OpenManage, then go to:
System > Main System Chassis
Power Management
Management > Profile
Choose OS power control, then apply.
icee_mike
1 Message
0
January 30th, 2018 07:00
Eluich,
I am having the same problem with R740s that are randomly rebooting. These are out-of-the-box servers to which I've applied the latest drivers from Dell's website. Did changing the BIOS config to max performance resolve your issue?
Thanks,
Mike
PowerEdgeR740
8 Posts
0
March 14th, 2018 03:00
We have the same issue on a Citrix cluster of R740s (dual Xeon 6136, 128 GB).
BIOS: 1.3.7
iDRAC 3.15.17.15
The strange thing is that there is no BSOD or critical event in the event log on the host. There is also no load. We can't find a way to trigger it, since it happens randomly; even an hour of 3ds Max/V-Ray rendering won't do it.
We took them out of our production environment for now.
ThelmaCottage
2 Posts
0
March 15th, 2018 10:00
Same problem here with an R430. It only started yesterday: it reboots for no reason. No indication in the logs at all. Not sure if this is a coincidence, but it did coincide with a Windows update?
sfederowicz
1 Message
0
March 22nd, 2018 05:00
I have the same problem with an R740 that was just added to a Hyper-V failover cluster. I currently have a case open with Dell.