PowerEdge: What are the error messages for the M1000e Enclosure
Summary: Partial listing of M1000e chassis error messages, severity, and potential causes
Instructions
When the PowerEdge M1000e blade enclosure encounters a problem, an error message is displayed on the LCD screen or in the Chassis Management Controller (CMC) System Event Logs.
The following tables show possible error messages and their causes so that you can fix the error and clear the message.
CMC Status Screen error messages
| Severity | Error message | Cause |
| --- | --- | --- |
| Critical | CMC <number> Battery: Battery sensor for CMC failed was asserted. | CMC CMOS battery is missing or has no voltage. |
| Critical | CMC <number> CPU Temp: Temperature sensor for CMC failure event | CMC CPU temperature has exceeded the critical threshold. |
| Critical | CMC <number> Ambient Temp: Temperature sensor for CMC failure event | CMC ambient temperature has exceeded the critical threshold. |
Enclosure/Chassis Status Screen error messages
| Severity | Error message | Cause |
| --- | --- | --- |
| Critical | Chassis Fan <number> presence: Fan sensor for Chassis Fan device removed was asserted. | The removed fan is required for proper cooling of the enclosure/chassis. |
| Critical | Power Supply Redundancy: PS Redundancy sensor for Power Supply, redundancy lost was asserted. | One or more power supply units (PSUs) have failed or been removed, so the system is no longer redundant. |
| Critical | Power Supply Redundancy: PS Redundancy sensor for Power Supply, non-redundant: insufficient resources | One or more PSUs have failed or been removed, and the system lacks enough power to maintain normal operations. Servers could power down. |
| Critical | Control Panel Temp: Temperature sensor for Control Panel, failure event | The chassis/enclosure temperature exceeded the critical threshold. |
| Critical | CMC <number> Stand-alone: Micro Controller sensor for CMC, non-redundant was asserted. | The CMC is no longer redundant. This message appears only if the standby CMC was removed or has failed. |
| Critical | Chassis Eventlog CEL: Event log sensor for Chassis Eventlog, all event logging disabled was asserted. | The CMC cannot log events when the event log sensor is disabled. The event log is disabled when it becomes full. Clearing the log re-enables event logging. |
| Critical | Chassis Eventlog CEL: Event log sensor for Chassis Eventlog, log full was asserted. | The chassis device detects that only one entry can be added to the CEL before it is full. |
| Warning | Chassis Eventlog CEL: Event log sensor for Chassis Eventlog, log almost full was asserted. | The chassis event log is 75% full. |
| Warning | Power Supply Redundancy: PS Redundancy sensor for Power Supply, redundancy degraded was asserted. | One or more PSUs have failed or been removed, and the system can no longer support full PSU redundancy. |
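
The three Chassis Eventlog (CEL) rows describe a progression as the log fills: a warning at 75% full, a critical "log full" event when only one more entry fits, and logging disabled once the log is full. A minimal Python sketch of that progression follows. The function name, the "OK" state, and the example capacity of 1024 entries are assumptions made for illustration; the actual CEL capacity is not stated in this article.

```python
from __future__ import annotations


def cel_state(entries_used: int, capacity: int) -> tuple[str, str]:
    """Map chassis event log usage to the (severity, state) pairs in the table.

    `capacity` is a hypothetical log size; this article does not state the
    real CEL capacity.
    """
    if capacity <= 0 or not 0 <= entries_used <= capacity:
        raise ValueError("entries_used must be between 0 and capacity")

    free = capacity - entries_used
    if free == 0:
        # The log is full, so event logging is disabled until it is cleared.
        return ("Critical", "all event logging disabled")
    if free == 1:
        # Only one more entry can be added before the log is full.
        return ("Critical", "log full")
    if entries_used / capacity >= 0.75:
        # Assumption: the 75% figure is treated as an inclusive threshold.
        return ("Warning", "log almost full")
    return ("OK", "logging normally")


if __name__ == "__main__":
    for used in (100, 768, 1023, 1024):
        print(used, cel_state(used, 1024))
```

Clearing the log corresponds to resetting `entries_used` to 0, which returns the sketch to the "OK" state.
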
Fan Status Screen error messages
| Severity | Error message | Cause |
| --- | --- | --- |
| Critical | Chassis Fan <number> Status: Fan sensor for Chassis Fan failure event. | The speed of the specified fan is not sufficient to provide enough cooling to the system. |
IOM Status Screen error messages
| Severity | Error message | Cause |
| --- | --- | --- |
| Critical | I/O Module <number> Status: Module sensor for I/O Module, transition to critical from less severe was asserted. | The I/O module has a fault. The same error can also occur if the I/O module is thermal-tripped. |
| Warning | I/O Module <number> Status: Module sensor for I/O Module, transition to non-critical from OK was asserted. | The I/O module has a fabric mismatch or a link tuning mismatch. |
iKVM Status Screen error messages
| Severity | Error message | Cause |
| --- | --- | --- |
| Non-Recoverable | Local KVM Health: Module sensor for Local KVM, transition to non-recoverable was asserted. | The Serial RIP or USB host chip has failed. |
| Critical | Local KVM Health: Module sensor for Local KVM, transition to critical from less severe was asserted. | USB host enumeration or OSCAR has failed. |
| Warning | Local KVM Health: Module sensor for Local KVM, transition to non-critical from OK was asserted. | There has been a minor failure, such as corrupted firmware. |
PSU Status Screen error messages
| Severity | Error message | Cause |
| --- | --- | --- |
| Critical | Power Supply PSU <number>: Power Supply sensor for Power Supply failure was asserted. | The PSU has failed. |
| Critical | Power Supply PSU <number>: Power Supply sensor for Power Supply, input lost was asserted. | The AC power cord has been unplugged or there has been a loss of AC power. |
Server Status Screen error messages for M1000e Blade servers
| Severity | Error message | Cause |
| --- | --- | --- |
| Warning | System Board Ambient Temp: Temperature sensor for System Board, warning event | The server ambient temperature crossed a warning threshold. |
| Critical | System Board Ambient Temp: Temperature sensor for System Board, failure event | The server ambient temperature crossed a failure threshold. |
| Critical | System Board CMOS Battery: Battery sensor for System Board, failed was asserted. | The CMOS battery is not present or has no voltage. |
| Warning | System Board Current Monitor: Current sensor for System Board, warning event | The current crossed a warning threshold. |
| Critical | System Board Current Monitor: Current sensor for System Board failure event | The current crossed a failure threshold. |
| Critical | <voltage sensor name>: Voltage sensor for System Board, state asserted was asserted. | The voltage is out of range. |
| Critical | CPU<number> Status: Processor sensor for CPU<number>, IERR was asserted. | The CPU has failed. |
| Critical | CPU<number> Status: Processor sensor for CPU<number>, thermal tripped was asserted. | The CPU has overheated. |
| Critical | CPU<number> Status: Processor sensor for CPU<number>, configuration error was asserted. | The processor is the incorrect type or is in the wrong location. |
| Critical | CPU<number> Status: Processor sensor for CPU<number>, presence was deasserted. | A required CPU is missing. |
| Critical | System Board Video Riser: Module sensor for System Board device removed was asserted. | The required module was removed. |
| Critical | Mezz B Status: Add-in Card sensor for Mezz B, install error was asserted. | An incorrect mezzanine card is installed for the I/O fabric. |
| Critical | Mezz C Status: Add-in Card sensor for Mezz C, install error was asserted. | An incorrect mezzanine card is installed for the I/O fabric. |
| Critical | Backplane Drive <number>: Drive Slot sensor for backplane drive removed | The storage drive was removed. |
| Critical | Backplane Drive <number>: Drive Slot sensor for backplane, drive fault was asserted. | The storage drive failed. |
| Critical | System Board Fault Fail-Safe: Voltage sensor for System Board, state asserted was asserted. | This event is generated when the system board voltages are not at normal levels. |
| Critical | System Board OS Watchdog: Watchdog sensor for System Board reboot was asserted. | The iDRAC watchdog detected that the system has crashed (the timer expired because no response was received from the host) and the action is set to reboot. |
| Critical | System Board OS Watchdog: Watchdog sensor for System Board, power off was asserted. | The iDRAC watchdog detected that the system has crashed (the timer expired because no response was received from the host) and the action is set to power off. |
| Critical | System Board OS Watchdog: Watchdog sensor for System Board power cycle was asserted. | The iDRAC watchdog detected that the system has crashed (the timer expired because no response was received from the host) and the action is set to power cycle. |
| Critical | System Board SEL: Event log sensor for System Board, log full was asserted. | The SEL device has detected that only one entry can be added to the SEL before it is full. |
| Warning | ECC Corr Err: Memory sensor, correctable ECC (<DIMM Location>) was asserted. | Correctable ECC errors have reached a critical rate. |
| Critical | ECC Uncorr Err: Memory sensor, uncorrectable ECC (<DIMM Location>) was asserted. | An uncorrectable ECC error was detected. |
| Critical | I/O Channel Chk: Critical Event sensor, I/O channel check NMI was asserted. | A critical interrupt has been generated in the I/O channel. |
| Critical | PCI Parity Err: Critical Event sensor, PCI PERR was asserted. | A parity error was detected on the PCI bus. |
| Critical | PCI System Err: Critical Event sensor, PCI SERR (<Slot number or PCI Device ID>) was asserted. | The device detected a PCI error. |
| Critical | SBE Log Disabled: Event log sensor, correctable memory error logging disabled was asserted. | Single-bit error (SBE) logging is disabled when too many SBEs are logged. |
| Critical | Logging Disabled: Event log sensor, all event logging disabled was asserted. | All error logging is disabled. |
| Non-Recoverable | CPU Protocol Err: Processor sensor, transition to non-recoverable was asserted. | The processor protocol has entered a non-recoverable state. |
| Non-Recoverable | CPU Bus PERR: Processor sensor, transition to non-recoverable was asserted. | The processor bus PERR has entered a non-recoverable state. |
| Non-Recoverable | CPU Init Err: Processor sensor, transition to non-recoverable was asserted. | The processor initialization has entered a non-recoverable state. |
| Non-Recoverable | CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted. | The processor machine check has entered a non-recoverable state. |
| Critical | Memory Spared: Memory sensor redundancy lost (<DIMM Location>) was asserted. | The spare memory is no longer redundant. |
| Critical | Memory Mirrored: Memory sensor redundancy lost (<DIMM Location>) was asserted. | The mirrored memory is no longer redundant. |
| Critical | Memory RAID: Memory sensor redundancy lost (<DIMM Location>) was asserted. | The RAID memory is no longer redundant. |
| Critical | Memory Cfg Err: Memory sensor configuration error (<DIMM Location>) was asserted. | The memory configuration is incorrect for the system. |
| Warning | Mem Redun Gain: Memory sensor redundancy degraded (<DIMM Location>) was asserted. | The memory redundancy is downgraded but not lost. |
| Critical | PCIE Fatal Err: Critical Event sensor, bus fatal error was asserted. | A fatal error is detected on the PCI bus. |
| Critical | Chipset Err: Critical Event sensor, PCI PERR was asserted. | A chip error is detected. |
| Warning | Mem ECC Warning: Memory sensor, transition to non-critical from OK (<DIMM Location>) was asserted. | Correctable ECC errors have surpassed a normal rate. |
| Critical | Mem ECC Warning: Memory sensor, transition to critical from less severe (<DIMM Location>) was asserted. | Correctable ECC errors have reached a critical rate. |
| Critical | System Board POST Err: POST sensor for System Board, POST fatal error <additional error information> was asserted. | See the Dell PowerEdge M1000e Enclosure Owner’s Manual for additional information on BIOS POST errors. |
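
Many of the messages in these tables end in a recurring event-state phrase (for example "transition to non-recoverable" or "redundancy degraded") that lines up with the listed severity. When triaging exported log text, a small lookup along those lines may help. The following is a minimal Python sketch, not a Dell tool: the `classify` function and the phrase table are assumptions built only from the rows in this article, and the DIMM location in the usage example is hypothetical. Messages that carry no listed phrase return `None` instead of a guessed severity.

```python
from __future__ import annotations

# Event-state phrases and severities copied from the tables in this article.
# Order matters: a phrase that contains another phrase must come first,
# for example "uncorrectable ECC" before "correctable ECC".
SEVERITY_BY_PHRASE = [
    ("transition to non-recoverable", "Non-Recoverable"),
    ("transition to critical from less severe", "Critical"),
    ("transition to non-critical from OK", "Warning"),
    ("redundancy degraded", "Warning"),
    ("redundancy lost", "Critical"),
    ("log almost full", "Warning"),
    ("log full", "Critical"),
    ("all event logging disabled", "Critical"),
    ("uncorrectable ECC", "Critical"),
    ("correctable ECC", "Warning"),
    ("warning event", "Warning"),
    ("failure event", "Critical"),
]


def classify(message: str) -> str | None:
    """Return the severity implied by the first matching phrase, or None."""
    lowered = message.lower()
    for phrase, severity in SEVERITY_BY_PHRASE:
        if phrase.lower() in lowered:
            return severity
    return None  # Phrase not covered by this partial listing.


if __name__ == "__main__":
    # "DIMM_A1" is a made-up location used only for illustration.
    sample = "ECC Uncorr Err: Memory sensor, uncorrectable ECC (DIMM_A1) was asserted."
    print(classify(sample))
```

Because this article is a partial listing, a `None` result means only that the phrase is not covered here, not that the event is harmless.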