ECS: xDoctor: RAP163: Critical System Memory Event
Summary: A Critical System Memory Event has occurred and is in need of review and DIMM replacement.
Symptoms
xDoctor reports a Critical System Memory Event in need of review.
------------------------------------
ERROR - Critical System Memory Event
------------------------------------
Node = Nodes
Extra = {'Nodes': {'169.254.1.1': ['Memory #0x02 - Uncorrectable ECC (UnCorrectable ECC | DIMMB1) (06/10/2023 08:45:16)', 'Memory #0x03 - Uncorrectable ECC (UnCorrectable ECC | DIMMB1) (06/10/2023 08:45:16)', 'Memory Mmry ECC Sensor - Correctable ECC (11/26/2015 12:38:51)']}}
RAP = RAP163
Solution = KB 215723
Timestamp = 2023-07-10_170539
PSNT = CKMXXXXXXXXXXX @ 4.8-92.0
Cause
NOTE: If any DIMMs are missing, or if an Uncorrectable event appears in the System Event Log (SEL), the affected DIMMs must be replaced.
- Check the SEL to confirm that there are uncorrectable errors on the node.
Command: (Remote)
# sudo ipmitool -H <iDRAC IP> -U root -P passwd -I lanplus sel elist
Command: (Local node)
# sudo ipmitool sel elist
Example:
admin@node1:~> sudo ipmitool -H 192.XXX.2XX.107 -U root -P passwd -I lanplus sel elist
1 | 12/04/2021 | 07:29:19 | Event Logging Disabled SEL | Log area reset/cleared | Asserted
2 | 12/29/2021 | 23:00:29 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
3 | 01/26/2022 | 11:44:08 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
4 | 08/03/2022 | 18:31:45 | Power Supply PS Redundancy | Redundancy Lost | Asserted
5 | 08/03/2022 | 18:31:48 | Power Supply Status | Power Supply AC lost | Asserted
6 | 08/03/2022 | 18:43:14 | Power Supply Status | Power Supply AC lost | Deasserted
7 | 08/03/2022 | 18:43:22 | Power Supply PS Redundancy | Fully Redundant | Asserted
8 | 08/03/2022 | 18:51:27 | Power Supply PS Redundancy | Redundancy Lost | Asserted
9 | 08/03/2022 | 18:51:27 | Power Supply Status | Power Supply AC lost | Asserted
a | 08/03/2022 | 19:02:03 | Power Supply Status | Power Supply AC lost | Deasserted
b | 08/03/2022 | 19:02:14 | Power Supply PS Redundancy | Fully Redundant | Asserted
c | 01/19/2023 | 05:38:27 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
d | 02/06/2023 | 02:10:25 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
e | 03/02/2023 | 17:12:15 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
f | 05/09/2023 | 15:56:41 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
10 | 05/09/2023 | 17:16:16 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
11 | 05/09/2023 | 20:57:41 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
12 | 05/09/2023 | 20:59:25 | Unknown #0x2e | | Asserted
13 | 05/09/2023 | 20:59:25 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB1) | Asserted
14 | 05/11/2023 | 05:43:34 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
15 | 06/10/2023 | 08:43:26 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
16 | 06/10/2023 | 08:45:16 | Unknown #0x2e | | Asserted
17 | 06/10/2023 | 08:45:16 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
18 | 06/10/2023 | 08:45:16 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB1) | Asserted
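On a long SEL, the uncorrectable entries can be hard to pick out among power-supply and warning events. The following is a minimal sketch, not part of the official procedure, that counts uncorrectable ECC events per DIMM slot. The function name summarize_uncorrectable is hypothetical, and the sketch assumes the slot appears as "DIMM<slot>" in the event description, as in the example above.

```shell
# Summarize uncorrectable ECC events per DIMM slot.
# Reads `ipmitool sel elist` output on stdin and prints "<slot> <count>" lines.
# Assumption: the slot is written as DIMM<letter><number> (for example DIMMA1).
summarize_uncorrectable() {
    grep -i 'uncorrectable ecc' \
        | grep -o 'DIMM[A-Z][0-9]*' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Usage on a node (local SEL):
#   sudo ipmitool sel elist | summarize_uncorrectable
```

Applied to the example log above, this would report DIMMA1 with 4 events and DIMMB1 with 2. Lines such as "Memory Mem ECC Warning" are ignored because they do not contain the text "Uncorrectable ECC".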
- Confirm whether any DIMMs are missing because of the event.
Command:
# sudo dmidecode -t memory | grep "Locator\|Size" | grep -v "Cache\|Volatile\|Logical\|Bank"
Example:
admin@node1:~> sudo dmidecode -t memory | grep "Locator\|Size" | grep -v "Cache\|Volatile\|Logical\|Bank"
Size: No Module Installed <-- DIMM is missing
Locator: A1
Size: 16384 MB
Locator: A2
Size: No Module Installed
Locator: A3
Size: No Module Installed
Locator: A4
Size: No Module Installed
Locator: A5
Size: No Module Installed
Locator: A6
Size: No Module Installed
Locator: A7
Size: No Module Installed
Locator: A8
Size: 16384 MB
Locator: B1
Size: 16384 MB
Locator: B2
Size: No Module Installed
Locator: B3
Size: No Module Installed
Locator: B4
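Reading the Size and Locator pairs by eye is error-prone on nodes with many slots. The sketch below, offered as an illustration only, lists the slots that report "No Module Installed". The function name list_empty_slots is hypothetical. It relies on dmidecode printing "Size:" before "Locator:" within each memory device record, as the example output shows. Whether an empty slot is actually a missing DIMM depends on the expected population for the node model, which this sketch does not know.

```shell
# List memory slots that report no module.
# Reads `dmidecode -t memory` output (raw or filtered as above) on stdin.
# Lines such as "Bank Locator:" or "Volatile Size:" are ignored because the
# patterns only match lines that begin with "Size:" or "Locator:".
list_empty_slots() {
    awk '
        /^[ \t]*Size:/ {
            # Remember whether the current device reports no module.
            empty = ($0 ~ /No Module Installed/)
        }
        /^[ \t]*Locator:/ {
            if (empty) {
                slot = $0
                sub(/^[ \t]*Locator:[ \t]*/, "", slot)
                print slot
            }
            empty = 0
        }
    '
}

# Usage on a node:
#   sudo dmidecode -t memory | list_empty_slots
```

Compare the resulting list against the slots that should be populated; a slot such as A1 appearing here while the SEL shows uncorrectable errors for it supports the case for replacement.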
Resolution
Collect the outputs from the above commands and open a service request referencing KB 215723 to review the server DIMM for replacement.
After the DIMM has been successfully replaced, clear the SEL on the affected node. xDoctor version 4.8-92.0 or higher continues to alert on the old log entry until the SEL is cleared.
Example - Clearing the System Event Log (SEL):
Query the iDRAC to get the system event log and confirm that the error is present in the output.
Before clearing the SEL, check for any other errors that need to be addressed, and save the log to /var/log/hardware as described in KB 49569.
In this example 192.168.219.101 corresponds to the iDRAC IP of node 1:
admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel list
1 | 01/06/2022 | 04:34:58 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
2 | 02/03/2022 | 17:15:21 | Physical Security #0x73 | General Chassis intrusion () | Asserted
3 | 02/03/2022 | 17:15:28 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
4 | 08/18/2023 | 01:44:01 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
Clear the SEL:
admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel clear
Clearing SEL. Please allow a few seconds to erase.
Validate the list was cleared:
admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel list
1 | 08/30/2023 | 12:56:55 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
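The query, save, clear, and validate steps above can be combined so that the SEL is never cleared unless a copy was written first. This is a rough sketch under stated assumptions, not the procedure from KB 49569: the function name sel_save_and_clear, the IPMITOOL and SEL_CLEAR_WAIT variables, and the five-second default wait are all hypothetical, and the caller chooses the backup file path.

```shell
# Save the SEL to a file, clear it, then print what remains.
# Arguments: <iDRAC IP> <user> <password> <backup file>
# IPMITOOL may be set to override the command that is run (assumption, for testing).
# Note: the variables below are not local to the function.
sel_save_and_clear() {
    host=$1 user=$2 pass=$3 backup=$4
    tool=${IPMITOOL:-ipmitool}

    # Refuse to clear if the log could not be saved first.
    if ! "$tool" -I lanplus -H "$host" -U "$user" -P "$pass" sel elist > "$backup"; then
        echo "SEL backup failed; not clearing" >&2
        return 1
    fi

    "$tool" -I lanplus -H "$host" -U "$user" -P "$pass" sel clear || return 1

    # ipmitool asks to allow a few seconds for the erase, so wait before validating.
    sleep "${SEL_CLEAR_WAIT:-5}"
    "$tool" -I lanplus -H "$host" -U "$user" -P "$pass" sel list
}

# Usage (values from the example above; the backup path is the caller's choice):
#   sel_save_and_clear 192.168.219.101 root passwd <backup file>
```

Review the saved file for other errors before relying on the cleared log; the sketch only guarantees that a copy exists, not that every event in it has been addressed.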
Affected Products
ECS Appliance Gen 3
Products
ECS Appliance
Article Properties
Article Number: 000215723
Article Type: Solution
Last Modified: 30 May 2024
Version: 7