Data Domain: DD9800 - voltage is faulty - alert
Summary: Data Domain DD9800 Revision 2 - Troubleshooting Alert: The storage processor (SP) has failed - Voltage is faulty.
Symptoms
Due to an overlapping bitmap entry, a DD9800 Revision 2 system may indict an SP with the cause "Voltage is faulty" when the real issue is a CPU External IERR.
When both events (CPU External IERR and "Voltage is faulty") occur simultaneously, it is safe to ignore the voltage alert and focus troubleshooting on why the CPU raised an IERR. For more details on troubleshooting CPU IERR errors on DD9800 platforms, contact your Support provider.
Affected Systems:
- DD9800 Rev 2
- DDOS 5.7.x / 6.0.x / 6.1.x earlier than 6.1.3.0
Symptoms:
Posted Alert
=========
Time: Tue Mar 27 07:19:53 2018
Alert Id: p0-57
Event Id: EVT-ENVIRONMENT-00032
Event Message: The storage processor has failed
Object: Enclosure=1
Additional Information: Cause=Voltage is faulty

"Voltage is faulty" event found in messages.engineering log
===============================================
Mar 15 05:48:27 platmon: CRITICAL: The storage processor has failed. Enclosure=1 Cause="Voltage is faulty"
Mar 15 05:48:28 platmon: INFO: Event posted: p0-275 (11000113:28521xxxx): EVT-ENVIRONMENT-00032: The storage processor has failed EVT-OBJ::Enclosure=1 EVT-INFO::Cause=Voltage is faulty
Mar 15 05:48:28 platmon: INFO: _ems_post_pubsub_event: Publishing event for alert EVT-ENVIRONMENT-00032

CPU IERR event found in bios.txt log
===============================
1 | 03/15/2018 | 05:33:39 | SMI Critical Interrupt Events Enter_SMI | SMI Critical Interrupt | Asserted | Used AUX Log (LSB 0x0) Used AUX Log (MSB 0x0)
2 | 03/15/2018 | 05:33:41 | CPU Status Events CPU2_Status | CPU IERR | Asserted | CPU External IERR
3 | 03/15/2018 | 05:33:41 | Entering IERR Interrupt Events Enter_SMI | IERR Interrupt | Asserted | Used AUX Log (LSB 0x24) Used AUX Log (MSB 0x0)
4 | 03/15/2018 | 05:33:42 | BMC Chassis Ctrl Events BMC_Chassis_Ctrl | Reset through BMC | Asserted
5 | 03/15/2018 | 05:34:04 | Power Unit DC_State | State Asserted | Deasserted
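
The triage rule from the Symptoms section (treat the voltage alert as a side effect when a CPU External IERR is also logged) could be sketched in C roughly as follows. This is only an illustration: the function names are hypothetical, the matching is a plain substring search on the strings seen in the excerpts above, and it is not part of DDOS or any Data Domain tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if a bios.txt line records a CPU External IERR. */
static bool line_mentions_cpu_ierr(const char *line)
{
    assert(line != NULL);
    return strstr(line, "CPU External IERR") != NULL;
}

/* Hypothetical helper: true if a log or alert line carries the
 * "Voltage is faulty" cause (quoted or unquoted). */
static bool line_mentions_voltage_fault(const char *line)
{
    assert(line != NULL);
    return strstr(line, "Voltage is faulty") != NULL;
}

/* Assumed rule, taken from the article text: the voltage alert can be
 * ignored only when a CPU External IERR was recorded as well. */
static bool voltage_alert_likely_misreported(bool saw_cpu_ierr,
                                             bool saw_voltage_fault)
{
    return saw_cpu_ierr && saw_voltage_fault;
}
```

A caller would scan bios.txt with the first helper and messages.engineering with the second, then pass both results to `voltage_alert_likely_misreported`. If only the voltage fault is present, the alert should still be investigated as a genuine voltage problem.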
Cause
This issue is found in DDOS versions 5.7, 6.0, and 6.1.
The root cause is that two entries in the SP fault bitmap overlap, so when an IERR event occurs, the alert incorrectly displays "Voltage is faulty". Both definitions use bit 17:
#define APL_FRU_FAULT_SP_CPUMISC (1 << 17)
#define APL_FRU_FAILEDMAP_VOLTFAULT_SP (1 << 17)
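
To make the collision concrete, here is a minimal C sketch built around the two definitions quoted above. Only the two macro names and values come from the article. The helper functions and the idea of a single 32-bit fault map are assumptions made for illustration and do not reflect the actual DDOS source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two definitions quoted in the article: both resolve to bit 17. */
#define APL_FRU_FAULT_SP_CPUMISC       (1 << 17)
#define APL_FRU_FAILEDMAP_VOLTFAULT_SP (1 << 17)

/* Compile-time proof that the two masks share a bit. */
static_assert((APL_FRU_FAULT_SP_CPUMISC & APL_FRU_FAILEDMAP_VOLTFAULT_SP) != 0,
              "CPU misc fault and SP voltage fault masks overlap");

/* Hypothetical helper: true when two masks share at least one bit. */
static bool masks_overlap(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}

/* Hypothetical decoder: reports a voltage fault whenever the voltage
 * bit is set in the fault map, regardless of which code path set it. */
static bool reports_voltage_fault(uint32_t fault_map)
{
    return (fault_map & APL_FRU_FAILEDMAP_VOLTFAULT_SP) != 0;
}
```

Under these assumptions, a fault map in which only `APL_FRU_FAULT_SP_CPUMISC` was set (for example, after a CPU IERR) still makes `reports_voltage_fault` return true, which matches the misleading "Voltage is faulty" cause described above.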