Unsolved

February 2nd, 2016 02:00

CPU error on startup

Hi, we upgraded an R720 server to 192 GB of RAM (12 sticks of 16 GB each) and dual E5-2680 processors.

Previously it had no errors, and now on reboot/startup it shows these errors:

CPU0704 CPU 1 machine check error detected. Power cycle system

Non-Recoverable. CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted

We updated the BIOS and all firmware on the server.

We tested the RAM and ran a stability test, and neither showed any errors. We even tried another E5-2680, and it still sometimes gives these errors.

5 Practitioner

 • 

274.2K Posts

February 2nd, 2016 08:00

Hello

CPU0704 on CPU 1 indicates a machine check error. The system event log and operating system logs may indicate that the exception is external to the processor. Attempt the following steps:

1. Check system and operating system logs for exceptions.

2. Turn system off and remove power for one minute.

3. Ensure CPU 1 is seated correctly.

4. Reapply input power, turn the system on, and watch for the error.
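For step 1, a log export can be filtered for machine check entries with a few lines of Python. This is only a minimal sketch: the function name `find_machine_check_lines` and the first and last lines of the sample log are made up for illustration, and the search pattern is based solely on the two error messages quoted in this thread, so a real log may word these events differently.

```python
import re

# Pattern built from the error text quoted in this thread (assumption:
# other logs may use different wording, so extend the pattern as needed).
MCE_PATTERN = re.compile(r"CPU0704|machine\s+ch(?:ec)?k", re.IGNORECASE)


def find_machine_check_lines(log_text):
    """Return (line_number, line) pairs that mention a machine check."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if MCE_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Sample input: lines 2 and 3 are the messages from this thread,
# lines 1 and 4 are invented filler.
sample = """\
System booted normally
CPU0704 CPU 1 machine check error detected. Power cycle system
Non-Recoverable. CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
Fan redundancy restored
"""

for number, line in find_machine_check_lines(sample):
    print(number, line)
```

The same function can be pointed at the text of an exported system event log or operating system log by reading the file into a string first.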

6 Posts

February 2nd, 2016 08:00

Hi,

As I mentioned above, we removed a processor and even tried replacing it with another one.

I reviewed the logs, and there are no warning or critical alerts.

5 Practitioner

 • 

274.2K Posts

February 2nd, 2016 15:00

A number of issues, both software and hardware, can also trigger the machine check error. I suggest that you run a DSET report and attach it to the e-mail that I sent you. The report will enable us to review all hardware-associated ESM logs.

5 Practitioner

 • 

274.2K Posts

February 3rd, 2016 15:00

The review of the DSET report does not point us to any hardware or software issues. The general system health status is good! Just to clarify a few things: did you upgrade both the memory and the processors? If you upgraded the memory only, which slots did you populate with the DIMMs? Most likely the new memory is triggering the machine check error.

6 Posts

February 3rd, 2016 23:00

We upgraded both the CPU and the RAM. The DIMMs are populated in A1-A6 and B1-B6, each one:

16.00 GB Dual Rank 1600 MHz
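As a quick sanity check, the slot list above can be turned into a total capacity and a per-bank count. This is a rough sketch that uses only the figures given in this thread (slots A1-A6 and B1-B6, 16 GB per DIMM); the variable names are illustrative.

```python
DIMM_SIZE_GB = 16  # per-DIMM size reported in this thread

# Populated slots as reported: A1-A6 and B1-B6.
populated = [f"{bank}{n}" for bank in ("A", "B") for n in range(1, 7)]

# Count DIMMs per bank to confirm both banks are populated evenly.
per_bank = {bank: sum(1 for slot in populated if slot.startswith(bank))
            for bank in ("A", "B")}

total_gb = len(populated) * DIMM_SIZE_GB

print(populated)
print(per_bank)
print(f"{total_gb} GB installed")
```

The result matches the 192 GB stated in the first post, with six DIMMs in each bank.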

5 Practitioner

 • 

274.2K Posts

February 4th, 2016 08:00

Thank you. Try re-seating the processors, all the DIMMs, and the PCI components, including the risers. If the error persists, swap the A DIMMs with the B DIMMs and ensure that they are firmly seated. If there is no change, swap the processors and see whether anything changes. This will help us isolate the issue.

Let me know the results.

6 Posts

February 11th, 2016 07:00

Hi, we rechecked the CPU, RAM, and riser board, and the result is the same. We upgraded another R720 with the same processors and the same RAM, and we get no errors on that node.

5 Practitioner

 • 

274.2K Posts

February 11th, 2016 09:00

That suggests the CPUs and RAM are good. Try using the same riser cards in the other R720.

5 Practitioner

 • 

274.2K Posts

February 11th, 2016 12:00

In addition, note that CPU 1 controls risers 2 and 3 with their associated PCI devices. In particular, based on the DSET report, you have a QLogic Fibre Channel adapter in PCI5 on riser 2.

Move these risers together with the PCI5 device, i.e. the QLogic adapter, to the other R720 as well.
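The relationship described above can be written down as a small lookup to see which PCI devices sit behind the CPU that reports the error. This is a sketch under stated assumptions: it models only what this thread mentions (CPU 1 with risers 2 and 3, and the QLogic adapter in PCI5 on riser 2), and the names `CPU_RISERS`, `PCI_DEVICES`, and `devices_behind` are hypothetical.

```python
# Topology as described in this thread only; anything not mentioned
# here (other risers, other cards) is deliberately left out.
CPU_RISERS = {"CPU 1": ("Riser 2", "Riser 3")}

# (slot, riser, device) entries; only the adapter named in the thread.
PCI_DEVICES = [("PCI5", "Riser 2", "QLogic Fibre Channel adapter")]


def devices_behind(cpu):
    """List the PCI devices attached to risers controlled by the given CPU."""
    risers = CPU_RISERS.get(cpu, ())
    return [(slot, riser, name)
            for slot, riser, name in PCI_DEVICES
            if riser in risers]


for slot, riser, name in devices_behind("CPU 1"):
    print(f"{slot} on {riser}: {name}")
```

Any device this returns for CPU 1 is a candidate to move or remove when isolating the fault.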

6 Posts

February 18th, 2016 06:00

The error still appears without the QLogic Fibre Channel adapter.

We can't switch the riser boards, as it's only a two-node cluster.
