PowerEdge: CPU Machine Check Errors

Summary: This article describes CPU machine check errors, their common causes, and how to handle them when they appear.


Symptoms

What are CPU Machine Check Errors?

A machine check is an error condition that the processor hardware detects and reports. On PowerEdge servers, and on solutions built on them that use the standard BIOS and iDRAC firmware, machine checks are recorded in the system event log (SEL).
These entries also appear in the Lifecycle Controller log (LCL) under several Enhanced Error Message Initiative (EEMI) event codes.

Event code    Event message
CPU0011       Uncorrectable machine check exception detected on CPU #
CPU0012       Correctable machine check exception detected on CPU #
CPU0704       CPU # machine check detected
UEFI0076      One or more corrected machine check errors have occurred
UEFI0078      One or more machine check errors occurred in the previous boot

Log Examples:

2022-10-22 22:12:35    506    CPU9000    An OEM diagnostic event occurred.
2022-10-22 22:12:34    505    CPU9000    An OEM diagnostic event occurred.
2022-10-22 22:12:33    504    CPU9000    An OEM diagnostic event occurred.
2022-10-22 22:12:31    503    CPU0704    CPU 2 machine check error detected.
2022-10-22 22:12:31    502    UEFI0078   One or more Machine Check errors occurred in the previous boot.

 

2025-05-21 03:42:32    320    CPU9000    An OEM diagnostic event occurred.
2025-05-21 03:42:30    319    CPU0704    CPU 1 machine check error detected.
2025-05-21 03:42:29    318    PST0090    A problem was detected related to the previous server boot.
2025-05-21 03:42:29    317    UEFI0078   One or more Machine Check errors occurred in the previous boot.

 

2021-09-02 16:02:18    712    UEFI0078   One or more Machine Check errors occurred in the previous boot.
2021-09-02 16:02:18    711    CPU0000    Internal error has occurred check for additional logs.
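
When working from an exported log rather than the iDRAC interface, the event codes in the table above can be picked out programmatically. The following is a minimal Python sketch, assuming the log has been saved as plain text in the same column layout as the examples above (date, time, sequence number, event code, message). The names `LogEntry`, `parse_line`, and `machine_check_entries` are illustrative and not part of any Dell tool.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional

# Event codes that the table above lists as machine check related.
MCE_CODES = {"CPU0011", "CPU0012", "CPU0704", "UEFI0076", "UEFI0078"}

# Assumed layout, taken from the log examples above:
# <date> <time>  <sequence number>  <event code>  <message>
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\d+)\s+([A-Z]+\d+)\s+(.*)$"
)


class LogEntry(NamedTuple):
    timestamp: datetime
    sequence: int
    code: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ts, seq, code, message = match.groups()
    return LogEntry(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), int(seq), code, message.strip()
    )


def machine_check_entries(text: str) -> List[LogEntry]:
    """Return only the entries whose event code is in MCE_CODES, in input order."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [entry for entry in parsed if entry is not None and entry.code in MCE_CODES]


# Sample copied from the second log example above.
SAMPLE = """\
2025-05-21 03:42:32    320    CPU9000    An OEM diagnostic event occurred.
2025-05-21 03:42:30    319    CPU0704    CPU 1 machine check error detected.
2025-05-21 03:42:29    318    PST0090    A problem was detected related to the previous server boot.
2025-05-21 03:42:29    317    UEFI0078   One or more Machine Check errors occurred in the previous boot.
"""

for entry in machine_check_entries(SAMPLE):
    print(entry.timestamp, entry.code, entry.message)
```

Codes outside the table, such as CPU9000 and PST0090 in this sample, are deliberately left out of the filter. They may still be relevant context when reading the log by hand.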

 


Cause

 

Understanding Causes of CPU Machine Check Errors

CPU Machine Check Errors (MCEs) have many possible causes, spanning both hardware and software, including:

  • BIOS Firmware or CPU Microcode
  • Motherboard CPLD Firmware
  • Memory Errors
  • PCIe Fatal Bus Errors
  • OS Crash or Software and Driver Faults (BSOD, PSOD, or Kernel Panics)
  • CPU Faults

The hardware logs can be used to help identify possible causes by checking if other component errors accompany the CPU Machine Check Errors.

 

Example CPU MCEs triggered from a Memory Error:
[Figure: CPU MCE error caused by DIMM error]

[Figure: CPU MCE with DIMM error on newer servers]

Example CPU MCE triggered from a Fatal Bus Error:
[Figure: CPU MCE seen with a fatal BUS error]

Example CPU MCE triggered from an OS crash:
[Figure: CPU MCE with OS crash error]

 


Resolution

 

General guidance

It is always helpful to ask these questions:

  • Have there been recent changes to the system, like updates or changes to hardware or configuration?
  • Are there other errors in the logs nearby that may be more informative than the machine check itself?
  • How frequently does the machine check happen? Was it a one-off? Can it be readily reproduced?
  • Are there environmental factors involved, such as specific workloads or power and thermal scenarios?
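
The frequency question can be answered roughly from an exported log. The following Python sketch counts CPU0704 entries per day and per CPU number. It assumes log lines in the same column layout as the log examples in the Symptoms section and a message of the form "CPU N machine check ...". The function name `mce_counts` is illustrative only.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed message form, taken from the CPU0704 lines in the log examples.
CPU_NUM_RE = re.compile(r"CPU (\d+) machine check")


def mce_counts(lines: Iterable[str]) -> Counter:
    """Count CPU0704 entries keyed by (date string, CPU number)."""
    counts: Counter = Counter()
    for line in lines:
        # Expected fields: date, time, sequence number, event code, message.
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[3] != "CPU0704":
            continue
        match = CPU_NUM_RE.search(fields[4])
        if match:
            counts[(fields[0], int(match.group(1)))] += 1
    return counts


# Lines copied from the log examples in the Symptoms section.
LOG_LINES = [
    "2022-10-22 22:12:31    503    CPU0704    CPU 2 machine check error detected.",
    "2022-10-22 22:12:31    502    UEFI0078   One or more Machine Check errors occurred in the previous boot.",
    "2025-05-21 03:42:30    319    CPU0704    CPU 1 machine check error detected.",
]

for (day, cpu), count in sorted(mce_counts(LOG_LINES).items()):
    print(day, "CPU", cpu, "count:", count)
```

A single entry on one day suggests a one-off event, while repeated entries against the same CPU number on many days point to a recurring problem worth reproducing under controlled conditions.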

 

Firmware and drivers

Outdated or incompatible firmware and drivers are among the most common causes of machine checks, because firmware and drivers work together to implement and control device behavior. Review the versions in use as part of any machine check investigation.

 

Among firmware, BIOS updates are critical:

  • Most BIOS releases incorporate updates provided by the respective processor vendor, many of which include explicit fixes for machine checks.
  • These UEFI updates for servers include microcode, reference code, and other module updates that control functionality such as the reliability, availability, and serviceability (RAS) features.
  • Do not overlook the other firmware in the system. Virtually any device may be the culprit, including, on rare occasions, the iDRAC.

 

Identifying and Resolving CPU Machine Check Errors

To identify CPU Machine Check Errors, start by checking the hardware logs, either the Lifecycle (LC) log or the System Event Log (SEL), directly from the iDRAC, or gather a TSR or SupportAssist Collection and review the logs there.

Check whether the CPU MCE errors are preceded by any other errors. If they are, focus troubleshooting on those components.
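
The check for errors that precede a machine check can be sketched in Python as follows. This is an illustration under stated assumptions, not a Dell utility: it assumes an exported log in the column layout shown in the Symptoms section, treats only the event codes from the table in that section as machine check entries, and uses an arbitrary 300-second window. The names `parse_entries` and `events_preceding_mce` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Event codes listed in the Symptoms section as machine check related.
MCE_CODES = {"CPU0011", "CPU0012", "CPU0704", "UEFI0076", "UEFI0078"}

Entry = Tuple[datetime, str, str]  # (timestamp, event code, message)


def parse_entries(lines: List[str]) -> List[Entry]:
    """Parse lines of the form: date time sequence code message."""
    entries = []
    for line in lines:
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        try:
            ts = datetime.strptime(fields[0] + " " + fields[1], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        entries.append((ts, fields[3], fields[4].strip()))
    return entries


def events_preceding_mce(lines: List[str], window_seconds: int = 300) -> List[Entry]:
    """Return non-MCE entries logged at or before an MCE entry, within the window."""
    entries = parse_entries(lines)
    window = timedelta(seconds=window_seconds)
    mce_times = [ts for ts, code, _ in entries if code in MCE_CODES]
    preceding = []
    for ts, code, message in entries:
        if code in MCE_CODES:
            continue
        # Keep the entry if some MCE entry follows it by 0..window seconds.
        if any(timedelta(0) <= mce_ts - ts <= window for mce_ts in mce_times):
            preceding.append((ts, code, message))
    return sorted(preceding)


# Lines copied from the second log example in the Symptoms section.
LOG_LINES = [
    "2025-05-21 03:42:32    320    CPU9000    An OEM diagnostic event occurred.",
    "2025-05-21 03:42:30    319    CPU0704    CPU 1 machine check error detected.",
    "2025-05-21 03:42:29    318    PST0090    A problem was detected related to the previous server boot.",
    "2025-05-21 03:42:29    317    UEFI0078   One or more Machine Check errors occurred in the previous boot.",
]

for ts, code, message in events_preceding_mce(LOG_LINES):
    print(ts, code, message)
```

The window size is a judgment call. Machine checks from a previous boot are reported at the next startup, so the triggering event may sit well before the MCE entry, and a wider window or a manual review of the earlier log may be needed.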

 

Troubleshooting Steps

  • Update all available firmware and monitor the results for any changes in error behavior.
  • If only one CPU is showing errors, swap the CPUs to determine if the error follows the CPU to the other socket.
  • If the MCE is triggered by another component's error, focus the troubleshooting on that component.
    • Check which components are controlled by the CPU with the MCE.
    • For example, for a CPU1 MCE, check all risers and PCIe slots that CPU1 controls and any devices installed in those slots. Also check the memory on the CPU1 side, that is, all A-DIMMs, for errors.
    • To verify which CPU controls each riser or slot, see the server's Installation and Service Manual under Installing and removing system components > Expansion cards and expansion card risers > Expansion card installation guidelines.
    • For more information about identifying which CPU controls the risers or slots, see: PowerEdge: Troubleshooting PCIe device detection issues
  • To rule out OS-related MCE triggers, test outside of the OS to see whether the errors still occur.

    Run Stress Tests In Support Live Image

    Duration: 00:02:38 (hh:mm:ss)
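
The CPU swap step in the troubleshooting list above comes down to comparing where the error is reported before and after the swap. The following Python sketch encodes one common way to read the result. The outcome labels and the suggested follow-ups are assumptions for illustration; the article itself only states that the swap shows whether the error follows the CPU.

```python
from typing import Optional


def interpret_cpu_swap(socket_before: int, socket_after: Optional[int]) -> str:
    """Classify the result of swapping the processors between two sockets.

    socket_before: CPU number the MCE was logged against before the swap.
    socket_after:  CPU number the MCE is logged against after the swap,
                   or None if no MCE has been logged since the swap.
    """
    if socket_after is None:
        # No recurrence yet: keep monitoring before drawing a conclusion.
        return "inconclusive"
    if socket_after != socket_before:
        # The error moved with the processor: the processor is the main suspect.
        return "follows_cpu"
    # The error stayed on the same socket: look at what that socket controls
    # (memory, risers, PCIe devices) and at the system board.
    return "stays_with_socket"


# Example: the MCE was logged on CPU 2, and after the swap it appears on CPU 1.
print(interpret_cpu_swap(2, 1))
```
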

 

Affected Products

PowerFlex rack, C Series, HS Series, Modular Infrastructure, Rack Servers, Tower Servers, XE Servers, XR Servers, OEM Server Solutions, PowerFlex appliance R650, PowerFlex appliance R6525, PowerFlex appliance R660, PowerFlex appliance R6625, PowerFlex appliance R750, PowerFlex appliance R760, PowerFlex appliance R7625, PowerFlex appliance R860, PowerFlex appliance R640, PowerFlex appliance R740XD, PowerFlex appliance R7525, PowerFlex appliance R840 ...
Article Properties
Article Number: 000349127
Article Type: Solution
Last Modified: 25 Jul 2025
Version:  4