ECS: xDoctor: RAP163: Critical System Memory Event
Summary: A Critical System Memory Event occurred that requires DIMM review and replacement.
Symptoms
xDoctor reports a Critical System Memory Event that requires review.
------------------------------------
ERROR - Critical System Memory Event
------------------------------------
Node = Nodes
Extra = {'Nodes': {'169.254.1.1': ['Memory #0x02 - Uncorrectable ECC (UnCorrectable ECC | DIMMB1) (06/10/2023 08:45:16)', 'Memory #0x03 - Uncorrectable ECC (UnCorrectable ECC | DIMMB1) (06/10/2023 08:45:16)', 'Memory Mmry ECC Sensor - Correctable ECC (11/26/2015 12:38:51)']}}
RAP = RAP163
Solution = KB 215723
Timestamp = 2023-07-10_170539
PSNT = CKMXXXXXXXXXXX @ 4.8-92.0
Cause
Note: If any DIMM is missing, or an Uncorrectable event appears in the System Event Log (SEL), the DIMMs must be replaced.
- Check the SEL to confirm whether there are uncorrectable errors on the node.
Command: (Remote)
# sudo ipmitool -H <iDRAC IP> -U root -P passwd -I lanplus sel elist
Command: (Local node)
# sudo ipmitool sel elist
Example:
admin@node1:~> sudo ipmitool -H 192.XXX.2XX.107 -U root -P passwd -I lanplus sel elist
1 | 12/04/2021 | 07:29:19 | Event Logging Disabled SEL | Log area reset/cleared | Asserted
2 | 12/29/2021 | 23:00:29 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
3 | 01/26/2022 | 11:44:08 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
4 | 08/03/2022 | 18:31:45 | Power Supply PS Redundancy | Redundancy Lost | Asserted
5 | 08/03/2022 | 18:31:48 | Power Supply Status | Power Supply AC lost | Asserted
6 | 08/03/2022 | 18:43:14 | Power Supply Status | Power Supply AC lost | Deasserted
7 | 08/03/2022 | 18:43:22 | Power Supply PS Redundancy | Fully Redundant | Asserted
8 | 08/03/2022 | 18:51:27 | Power Supply PS Redundancy | Redundancy Lost | Asserted
9 | 08/03/2022 | 18:51:27 | Power Supply Status | Power Supply AC lost | Asserted
a | 08/03/2022 | 19:02:03 | Power Supply Status | Power Supply AC lost | Deasserted
b | 08/03/2022 | 19:02:14 | Power Supply PS Redundancy | Fully Redundant | Asserted
c | 01/19/2023 | 05:38:27 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
d | 02/06/2023 | 02:10:25 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
e | 03/02/2023 | 17:12:15 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
f | 05/09/2023 | 15:56:41 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
10 | 05/09/2023 | 17:16:16 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
11 | 05/09/2023 | 20:57:41 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
12 | 05/09/2023 | 20:59:25 | Unknown #0x2e | | Asserted
13 | 05/09/2023 | 20:59:25 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB1) | Asserted
14 | 05/11/2023 | 05:43:34 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
15 | 06/10/2023 | 08:43:26 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
16 | 06/10/2023 | 08:45:16 | Unknown #0x2e | | Asserted
17 | 06/10/2023 | 08:45:16 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
18 | 06/10/2023 | 08:45:16 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB1) | Asserted
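On a long SEL it can be hard to see which DIMM slots are affected. The following is a minimal sketch, not part of the official procedure, that reads `ipmitool sel elist` output on standard input and counts the "Uncorrectable ECC" entries per DIMM slot. The function name `sel_uncorrectable_dimms` is hypothetical, and the slot label pattern (`DIMM` followed by a letter and a number, such as DIMMA1) is an assumption taken from the example output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `ipmitool sel elist` output on stdin and prints
# "<slot> <count>" for every DIMM slot named in an Uncorrectable ECC entry.
# Assumes slot labels look like DIMMA1 / DIMMB1, as in the example log.
sel_uncorrectable_dimms() {
    grep -i 'Uncorrectable ECC' |          # keep only uncorrectable ECC events
        grep -o 'DIMM[A-Z][0-9][0-9]*' |   # extract the slot label from each entry
        sort | uniq -c |                   # count events per slot
        awk '{ print $2, $1 }'             # print as "<slot> <count>"
}

# Usage on the local node (same ipmitool command as above):
#   sudo ipmitool sel elist | sel_uncorrectable_dimms
```

Applied to the example log above, this would report four events for DIMMA1 and two for DIMMB1. Correctable "Mem ECC Warning" entries are deliberately ignored, since only uncorrectable events are named in the note as a replacement trigger.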
- Confirm whether any DIMMs are missing because of the event.
Command:
# sudo dmidecode -t memory | grep "Locator\|Size" | grep -v "Cache\|Volatile\|Cache\|Logical\|Bank"
Example:
admin@node1:~> sudo dmidecode -t memory | grep "Locator\|Size" | grep -v "Cache\|Volatile\|Cache\|Logical\|Bank"
Size: No Module Installed <-- DIMM is missing
Locator: A1
Size: 16384 MB
Locator: A2
Size: No Module Installed
Locator: A3
Size: No Module Installed
Locator: A4
Size: No Module Installed
Locator: A5
Size: No Module Installed
Locator: A6
Size: No Module Installed
Locator: A7
Size: No Module Installed
Locator: A8
Size: 16384 MB
Locator: B1
Size: 16384 MB
Locator: B2
Size: No Module Installed
Locator: B3
Size: No Module Installed
Locator: B4
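In the output above each "Size" line belongs to the "Locator" line that follows it, which is easy to misread. As a rough sketch, the pairs can be joined into one line per slot so that the list can be compared against the expected DIMM population of the node. The function name `dimm_slot_summary` is hypothetical, and the script assumes that dmidecode prints "Size" before "Locator" within each memory device block, as in the example.

```shell
#!/bin/sh
# Hypothetical helper: reads `dmidecode -t memory` output on stdin and prints
# one "<slot>: <size>" line per memory device.
# Assumes each "Size:" line precedes the "Locator:" line of the same device.
dimm_slot_summary() {
    awk -F': *' '
        /^[ \t]*Size:/    { size = $2 }            # remember the size just seen
        /^[ \t]*Locator:/ { print $2 ": " size }   # emit it with its slot label
    '
}

# Usage on the local node:
#   sudo dmidecode -t memory | dimm_slot_summary
```

Because the patterns are anchored to the start of the field name, lines such as "Bank Locator" or "Volatile Size" are skipped without the extra `grep -v` filter. A slot that should be populated but shows "No Module Installed" is the condition the check above is looking for.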
Resolution
Collect the output of the commands above and open a service request referencing KB 215723 to have the server DIMM reviewed for replacement.
After the DIMM has been replaced successfully, xDoctor version 4.8-92.0 or later requires the SEL to be cleared on the affected node. Clearing it stops further alerts on that log entry.
Example - Clearing the System Event Log (SEL):
Query the iDRAC for the System Event Log and confirm that the error is present in the output.
Before clearing the SEL, check it for any other errors that need to be resolved. Also save the log to /var/log/hardware as described in KB 49569.
In this example, 192.168.219.101 is the iDRAC IP of node 1:
admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel list
1 | 01/06/2022 | 04:34:58 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
2 | 02/03/2022 | 17:15:21 | Physical Security #0x73 | General Chassis intrusion () | Asserted
3 | 02/03/2022 | 17:15:28 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
4 | 08/18/2023 | 01:44:01 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMA1) | Asserted
Clear the SEL:
admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel clear
Clearing SEL. Please allow a few seconds to erase.
Verify that the list has been cleared:
admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel list
1 | 08/30/2023 | 12:56:55 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
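Because `sel clear` is irreversible, it may help to keep a copy of the log before running it. The sketch below is only an illustration of that step and does not replace the procedure in KB 49569: the function name `save_sel_copy`, the `node1` label, and the file name pattern are all assumptions. It writes whatever SEL text it receives on standard input to a timestamped file and prints the path.

```shell
#!/bin/sh
# Hypothetical helper: saves SEL text from stdin to
# <dir>/sel_<node>_<timestamp>.log and prints the resulting path.
# The file name pattern is an assumption, not taken from KB 49569.
save_sel_copy() {
    dir=$1     # target directory, for example /var/log/hardware
    node=$2    # free-form node label used in the file name
    file="$dir/sel_${node}_$(date +%Y%m%d_%H%M%S).log"
    mkdir -p "$dir" && cat > "$file" && printf '%s\n' "$file"
}

# Usage before "sel clear" (same iDRAC IP and credentials as the example above;
# run as a user with write access to the target directory):
#   ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel elist |
#       save_sel_copy /var/log/hardware node1
```

Saving the `sel elist` output rather than `sel list` keeps the sensor names in the copy, which makes later review of the replaced DIMM easier.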
Affected Products
ECS Appliance Gen 3
Products
ECS Appliance
Article Properties
Article Number: 000215723
Article Type: Solution
Last Modified: 30 May 2024
Version: 7