We have an R730 that has been power cycling on its own, with a recurring amber front LCD showing a "CPU0704 CPU1 (and CPU2) Machine check error detected. Power cycle system" message.
This first occurred after the initial Windows Server 2012 R2 load, Dell BIOS/driver/firmware updates, McAfee, and Windows updates. We initially saw this once a week, and it has occurred more and more frequently since; it now happens several times a day.
In the Windows System event log these are registering as kernel power failures, and the Lifecycle Controller logs show multiple (dozens of) instances of "An OEM diagnostic event occurred." within a couple of seconds, preceded by the same CPU1/CPU2 machine check error.
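For anyone who wants to put a number on how often each CPU is being flagged, here is a rough Python sketch that tallies machine check entries from log text. It assumes the Lifecycle log has been exported to plain text, one entry per line, with wording like the LCD message quoted above; the function name `count_machine_checks` and the sample lines are made up for illustration, so adjust the pattern to whatever your export actually looks like.

```python
import re
from collections import Counter

# Assumed wording, based on the LCD message quoted in this thread.
# The real export format may differ; adjust the pattern as needed.
MCE_PATTERN = re.compile(
    r"CPU\s*(\d+)\s+machine check error detected", re.IGNORECASE
)


def count_machine_checks(lines):
    """Return a Counter mapping CPU number to machine check entry count."""
    counts = Counter()
    for line in lines:
        match = MCE_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real log export.
    sample = [
        "CPU0704 CPU1 Machine check error detected. Power cycle system",
        "An OEM diagnostic event occurred.",
        "CPU0704 CPU2 Machine check error detected. Power cycle system",
        "CPU0704 CPU1 Machine check error detected. Power cycle system",
    ]
    for cpu, total in sorted(count_machine_checks(sample).items()):
        print(f"CPU{cpu}: {total} machine check entries")
```

If one CPU dominates the tally, that may be worth mentioning to support; if the errors are spread evenly across both sockets, a single bad processor seems less likely.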
Upon contacting Dell support we were asked several times to provide a DSET or TSR (despite a search returning that DSET has been retired and TSR has been replaced with SupportAssist). Due to our environment limitations, the only way we are able to connect externally is via proxy, and iDRAC doesn't support proxy configurations. We have tried running a Support Live Image recommended by a Dell tech, only to have our system reboot before the diagnostics could complete.
Our requests for CPU and motherboard replacements via Dispatch, which referred back to the efforts on the initial Service Request, were denied. Instead we were asked to update the BIOS and selected firmware (which we did initially) and were again asked for a DSET.
We're currently in communication with support to let them know we've already done this, and we feel like we've exhausted our options at this point. A search of the forums indicates that BIOS updates or CPU replacements have worked in most cases, but our BIOS is already current. Just wondering if anyone else has come across this issue and has any recommendations.
Would you clarify which processors you have installed? Are they the Intel E5-26xx V4 or Intel E5-26xx V3 versions? If so, would you verify in the BIOS whether C states are also enabled?
Let me know and we can go from there.
Thanks for the response. They are the Intel E5-2643 v4, and we're using the default Performance per Watt (DAPC) System Profile setting; C states are enabled.
Thank you. It sounds like there is a BIOS update in the works to resolve this, but there is a temporary workaround we can try until the update comes out.
Try the following:
1. Change the BIOS profile to Performance.
2. Select the Custom profile, then set C States to Disabled and C1E to Enabled.
Let me know if that, for the time being, resolves the issue for you.
We are going through this exact same issue on a T620 with dual Xeon E5-2630s. We are getting the CPU0704 CPU2 Machine check error. I have updated the BIOS and firmware, as well as changed the performance setting in the BIOS, with no change. It is rebooting very frequently, about every 40 to 90 minutes. Did you ever get a resolution on this? I've contacted support and sent a SupportAssist log. They suggested swapping the CPUs to see if the error follows the CPU. I have not done this yet, but may have to if no other solution is known. Any information you have would be appreciated. Thanks in advance!
KaizenPhilo or Copper,
Do you have a solution now?
At the moment we have 5 servers (PowerEdge T640) with the same problem: a "CPU machine check error" on either CPU1 or CPU2, about once a week, at random times. The server restarts without Windows logging anything. BIOS, firmware, and drivers are all up to date. It is a Windows Server 2016 machine with Hyper-V enabled. BIOS profile = Performance (C-state is disabled).
Dell support has replaced three system cards. The problem persists.
It looks like the error sometimes comes up when a virtual machine is starting up in the morning. I can't reproduce the problem manually.
Dell is searching for a solution, but it is slow. Does anyone have any ideas?
I'm having/had the same issue on an R740, Server 2016 Standard with Hyper-V role. CPU Machine check error, OEM diagnostic events, and Intel Management Engine warnings.
We're 2+ months into the service call and we've replaced the motherboard twice, the PCI card (BOSS NVMe riser) twice, the cables once... Flea drains, cold boots, AC cycles, you name it, we've tried it.
We've cut down the CPU machine check errors, but the OEM diagnostic and Intel M.E. events still come along together, and it looks like it's happening when the system hangs. I've now seen it several times where, during a restart, it hangs on "Shutting down service: Hyper-V Virtual Machine Management".
Support just informed me of a new BIOS, 1.4.5, so I'm running my restart script on it to see what happens... The results are just in: updating the BIOS re-generated the CPU machine check error (does this mean it didn't go away? Is it a throwback reference? Who knows) and then the system stopped while trying to shut down the Hyper-V VMM service.
We also updated the BIOS to version 1.4.5, five days ago now, and updated Windows Server 2016 with KB4103720. We have not seen the CPU machine check error since. But it is too early to cheer.
Are you able to reproduce the problem manually? Rebooting the machine causes no problem in our environment. The CPU machine check error only comes up randomly on a running machine.
I am familiar with the wait for the 'Hyper-V Virtual Machine Management' service to stop, but that is something I have seen on a lot of machines.
When I have news, I will post it.
All servers have been up for three weeks since the BIOS 1.4.5 update.
I am slowly starting to believe that the BIOS update solved the problem.