DLm: DLmV0198E Correctable Machine Check Error Troubleshooting
Summary: The DLmV0198E alert signals a machine check on a CPU sensor, pointing to a fault in the Central Processing Unit (CPU) or Advanced Programmable Interrupt Controller (APIC), and may affect memory. ...
Symptoms
DLmV0198E: Processor sensor CPUMachineCheck - Correctable machine check error (CPU 2 | APIC ID 32 ) Asserted
DLmV0198E: Processor sensor CPUMachineCheck - Correctable machine check error (CPU 1 | APIC ID 4 ) Asserted
Similar messages may appear for other CPU numbers and APIC IDs.
Cause
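When several of these alerts arrive, it can help to pull out the CPU number and APIC ID so they can be compared against the event log. The following is a minimal sketch, not a Dell tool: the pattern is inferred only from the two sample messages above, and the function name `parse_alert` is hypothetical. Other firmware versions may word the message differently.

```python
import re

# Pattern inferred from the two sample alerts in this article (an assumption);
# it is not an official message grammar.
ALERT_RE = re.compile(
    r"(?P<code>DLmV0198E):\s*Processor sensor\s+(?P<sensor>\S+)\s*-\s*"
    r"(?P<text>.*?)\s*\(CPU\s+(?P<cpu>\d+)\s*\|\s*APIC ID\s+(?P<apic>\d+)\s*\)\s*"
    r"(?P<state>\w+)\s*$"
)


def parse_alert(line):
    """Return the fields of a DLmV0198E alert line, or None if it does not match."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    return {
        "code": m.group("code"),
        "sensor": m.group("sensor"),
        "text": m.group("text"),
        "cpu": int(m.group("cpu")),
        "apic_id": int(m.group("apic")),
        "state": m.group("state"),
    }
```

Lines that do not match the assumed layout return None rather than raising, so the function can be run over a mixed alert feed.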
A hardware error is reported, and an alert is generated with error message DLmV0198E.
Resolution
Review the System event log of the affected VTE for errors. The DLmV0198E error message is generic and does not always reveal the actual failure.
You can check the error using RACADM or by connecting to the VTE.
If you can connect to the unit by command line or PuTTY, one of the following commands should reveal one or more errors.
racadm getsel
Or
ipmitool sel list
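If the check is scripted, the choice between the two commands can be automated. This is a minimal sketch under stated assumptions: that Python is available on the host where you run it, and that `racadm` or `ipmitool` is on the PATH there. Neither assumption is confirmed by this article, and the helper names `pick_sel_command` and `read_sel` are hypothetical.

```python
import shutil
import subprocess

# The two commands named in this article, in the order they are listed.
SEL_COMMANDS = (["racadm", "getsel"], ["ipmitool", "sel", "list"])


def pick_sel_command(which=shutil.which):
    """Return the first command whose executable is found on PATH, else None."""
    for cmd in SEL_COMMANDS:
        if which(cmd[0]) is not None:
            return list(cmd)
    return None


def read_sel(timeout=60):
    """Run the selected command and return its standard output as text."""
    cmd = pick_sel_command()
    if cmd is None:
        raise RuntimeError("neither racadm nor ipmitool was found on PATH")
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout
```

The `which` parameter exists only so the selection logic can be exercised without either tool installed.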
The relevant entries carry the same date as the alert, for example:
Record: 17
Date/Time: 03/08/2020 05:19:04
Source: system
Severity: Critical
Description: Correctable memory error logging disabled for a memory device at location DIMM_B1.
The above example shows a memory module (DIMM) error at location DIMM_B1.
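Matching log entries to the alert date by eye is error-prone on a long log, so the filtering can be sketched in code. The example below is illustrative only: the record layout is inferred from the single sample record in this article, the date is assumed to be month/day/year, and the names `parse_sel` and `records_on` are hypothetical. Real `racadm getsel` output may include separator lines or fields not handled here.

```python
import re
from datetime import date, datetime

# Field order taken from the sample record in this article (an assumption).
# Whitespace between fields may be spaces or newlines.
RECORD_RE = re.compile(
    r"Record:\s*(?P<record>\d+)\s+"
    r"Date/Time:\s*(?P<when>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"Source:\s*(?P<source>\S+)\s+"
    r"Severity:\s*(?P<severity>\S+)\s+"
    r"Description:\s*(?P<description>.*?)\s*(?=Record:|\Z)",
    re.DOTALL,
)


def parse_sel(text):
    """Parse event log text into a list of dicts, one per record."""
    records = []
    for m in RECORD_RE.finditer(text):
        # Drop surrounding whitespace and any dashed separator line.
        description = m.group("description").strip().strip("-").strip()
        dimm = re.search(r"DIMM_\w+", description)
        records.append({
            "record": int(m.group("record")),
            # Assumes MM/DD/YYYY; adjust if the unit reports another order.
            "when": datetime.strptime(m.group("when"), "%m/%d/%Y %H:%M:%S"),
            "source": m.group("source"),
            "severity": m.group("severity"),
            "description": description,
            "dimm": dimm.group(0) if dimm else None,
        })
    return records


def records_on(text, day):
    """Return only the records logged on the given calendar day."""
    return [r for r in parse_sel(text) if r["when"].date() == day]
```

Calling `records_on(log_text, date(2020, 3, 8))` on the sample record would keep it, and its `dimm` field would hold `DIMM_B1`.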
Customers: If you are notified of this alert message, check that a call home alert was sent to Dell through the Support site. This error generates a call home, and if a case has already been created, Dell Support staff follow up on the alert. It usually takes less than an hour to receive a call home alert of this nature.
If you have lost access to the unit, or the DLm is not mounting tapes, contact DLm Support immediately.
Support staff: Troubleshoot the alert using the information obtained from the System event log. If you are unable to connect to the command line, attempt to connect to the iDRAC interface to see if the unit can be power cycled. If the unit does not respond to iDRAC commands, send a field resource on site to replace the VTE, if needed.
If the unit is accessible, gather the necessary information to troubleshoot the alert.