Same issue with a single T430 / Windows Server 2016. No hardware errors; sometimes the server reboots two or three times within 5 minutes, sometimes it is OK for hours. For the moment, I have just installed the OS: no users, no activity!
Dell support asked for a hardware test, which found no problem, so I have no other answer.
Assuming the firmware is on the latest version?
Do you have any add-in cards?
Try running the system on OS power optimised or max performance.
Install OpenManage
System > Main System Chassis
Management > Profile
OS power control, then apply.
I am having the same problem with R740s that are randomly rebooting. These are out-of-the-box servers to which I've applied the latest drivers from Dell's website. Did changing the BIOS config to max performance resolve your issue?
We have the same issues on a Citrix cluster of R740s (dual Xeon 6136/128GB).
The strange thing is that there is no BSOD or critical event in the event log on the host. There is also no load. We can't find a way to trigger it since it happens randomly; even an hour of 3dsmax/vray rendering won't do the job.
We took them out of our production environment for now.
Same problem here with an R430. Only started yesterday: reboots for no reason. No indication in the logs at all. Not sure if this is a coincidence, but it did coincide with a Windows update?
Same issue here, new R740 with Windows Server 2016 Std with the Hyper-V role, all firmware and drivers updated, and it keeps crashing... The system doesn't have the update KB4088875; I only see three updates: KB4088787, KB4049065 and KB3192137...
2018-03-22 13:27:25 SYS1003 System CPU Resetting.
2018-03-22 13:27:18 PWR2271 The Intel Management Engine has encountered a Exception Event.
2018-03-22 13:27:18 SYS1003 System CPU Resetting.
2018-03-22 13:27:18 SYS1000 System is turning on.
2018-03-22 13:27:10 SYS1001 System is turning off.
2018-03-22 13:27:10 SYS1003 System CPU Resetting.
2018-03-22 13:26:55 RAC0703 Requested system hardreset.
2018-03-22 13:26:54 CPU0000 Internal error has occurred check for additional logs.
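If anyone wants to sift a longer log export for the same pattern, here is a minimal Python sketch that lists what was logged shortly before each hard reset request. It assumes every line has the same "date time message-ID message" layout as the lines pasted above; the function names and the 60-second window are my own choices for illustration, not anything provided by iDRAC.

```python
from datetime import datetime

# Log lines copied from the post above (newest first, as pasted).
LOG = """\
2018-03-22 13:27:25 SYS1003 System CPU Resetting.
2018-03-22 13:27:18 PWR2271 The Intel Management Engine has encountered a Exception Event.
2018-03-22 13:27:18 SYS1003 System CPU Resetting.
2018-03-22 13:27:18 SYS1000 System is turning on.
2018-03-22 13:27:10 SYS1001 System is turning off.
2018-03-22 13:27:10 SYS1003 System CPU Resetting.
2018-03-22 13:26:55 RAC0703 Requested system hardreset.
2018-03-22 13:26:54 CPU0000 Internal error has occurred check for additional logs.
"""


def parse_line(line):
    """Split 'YYYY-MM-DD HH:MM:SS MSGID message' into (timestamp, msg_id, message)."""
    date, time, msg_id, message = line.split(" ", 3)
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return timestamp, msg_id, message


def events_before_reset(log_text, reset_id="RAC0703", window_seconds=60):
    """Return entries logged within window_seconds before each hard reset request."""
    # Sort oldest first, since the pasted log is newest first.
    entries = sorted(parse_line(l) for l in log_text.splitlines() if l.strip())
    resets = [ts for ts, msg_id, _ in entries if msg_id == reset_id]
    return [
        (ts, msg_id, message)
        for ts, msg_id, message in entries
        for reset_ts in resets
        if 0 < (reset_ts - ts).total_seconds() <= window_seconds
    ]


for ts, msg_id, message in events_before_reset(LOG):
    print(ts, msg_id, message)
```

On the lines above, the only entry it reports is the CPU0000 internal error logged one second before the RAC0703 hard reset request.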
Same problem here, new R640 with Windows Server 2016 Std, Hyper-V cluster node.
Everything is up to date... nothing special in the iDRAC logs, nor in the Windows Event Viewer.
After a call with Dell Support, I was told to uninstall update KB4088875, and that it could be an OS problem...
I changed the profile in the BIOS to Performance... I do not know if that really solves the problem...
Since the iDRAC log pointed to CPU0000 internal errors, we've now changed our system profile to Performance, which disables the C1E/C-states of the CPUs. Seems like the HLT instruction triggered those reboots.
The R740s have been running fine for about 2 weeks now; a bit early to call it a victory yet, but at least a start.