ECS: RAP015: Temperature Failure; Symptom Code: 2010

Summary: A temperature sensor on the node indicates that a critical level has been reached.


Symptoms

A temperature sensor detected a temperature above a critical threshold.
A component may be malfunctioning, causing a temperature sensor to report that a critical level has been reached.
A temperature sensor on the node reports that a critical level has been reached.

Cause

A problem caused a temperature sensor reading to exceed a critical level.

Resolution

For Gen2, scroll to the bottom.

Gen3 hardware:

1. Check the status of the temperature sensors with cs_hal on the reported node.

Command:
# cs_hal sensors temp

Example: On Gen3, only three kinds of temperature sensor are reported (processor, inlet, and exhaust), as shown below.
 
admin@n1-mgmt:~>  cs_hal sensors temp
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      53 Degrees Celsius
Processor         Temperature         Temp              OK      54 Degrees Celsius
System Board      Temperature         Inlet Temp        CRIT    40 Degrees Celsius; above critical threshold
System Board      Temperature         Exhaust Temp      OK      50 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.
admin@n1-mgmt:~>
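
The check in step 1 can also be scripted. What follows is a minimal Python sketch, not part of the ECS tooling. It assumes the `cs_hal sensors temp` output has been captured as text and that every field starts under its column header, as in the example above. The names `Reading`, `parse_sensor_table`, and `non_ok` are hypothetical.

```python
from typing import NamedTuple

COLUMNS = ("Entity", "Type", "Label", "Status", "Info")


class Reading(NamedTuple):
    entity: str
    sensor_type: str
    label: str
    status: str
    info: str


def parse_sensor_table(text: str) -> list[Reading]:
    """Parse the fixed-width sensor rows from captured cs_hal output."""
    starts = None
    readings = []
    for line in text.splitlines():
        if line.startswith("Entity"):
            # Column offsets come from the header row (assumed layout).
            starts = [line.index(name) for name in COLUMNS]
            continue
        if starts is None or line.startswith(("-----", "NOTE:")):
            continue
        ends = starts[1:] + [None]
        reading = Reading(*(line[a:b].strip() for a, b in zip(starts, ends)))
        if reading.status:  # prompts and blank lines have no Status field
            readings.append(reading)
    return readings


def non_ok(readings: list[Reading]) -> list[Reading]:
    """Return only the rows whose Status is something other than OK."""
    return [r for r in readings if r.status != "OK"]
```

Applied to the example output above, `non_ok` would return only the `Inlet Temp` row, whose status is `CRIT`.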
2. Check all nodes in the rack to see whether other nodes report a temperature sensor status other than "OK".

Command:
viprexec -i  cs_hal sensors temp

Example: Here, several nodes in the upper half of the rack report that the inlet temperature is too high.
admin@n1-mgmt:~> viprexec -i  cs_hal sensors temp

Output from host : 192.168.219.1
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      53 Degrees Celsius
Processor         Temperature         Temp              OK      53 Degrees Celsius
System Board      Temperature         Inlet Temp        CRIT    40 Degrees Celsius; above critical threshold
System Board      Temperature         Exhaust Temp      OK      50 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.2
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      47 Degrees Celsius
Processor         Temperature         Temp              OK      49 Degrees Celsius
System Board      Temperature         Inlet Temp        CRIT    39 Degrees Celsius; above critical threshold
System Board      Temperature         Exhaust Temp      OK      50 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.3
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      46 Degrees Celsius
Processor         Temperature         Temp              OK      46 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      35 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      47 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.4
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      48 Degrees Celsius
Processor         Temperature         Temp              OK      50 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      35 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      47 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.5
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      48 Degrees Celsius
Processor         Temperature         Temp              OK      50 Degrees Celsius
System Board      Temperature         Inlet Temp        WARN    38 Degrees Celsius; above non-critical threshold
System Board      Temperature         Exhaust Temp      OK      49 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.6
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      50 Degrees Celsius
Processor         Temperature         Temp              OK      52 Degrees Celsius
System Board      Temperature         Inlet Temp        CRIT    39 Degrees Celsius; above critical threshold
System Board      Temperature         Exhaust Temp      OK      51 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.7
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      45 Degrees Celsius
Processor         Temperature         Temp              OK      48 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      36 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      47 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.8
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      51 Degrees Celsius
Processor         Temperature         Temp              OK      49 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      31 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      43 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.9
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      52 Degrees Celsius
Processor         Temperature         Temp              OK      51 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      30 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      42 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.10
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      54 Degrees Celsius
Processor         Temperature         Temp              OK      51 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      28 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      41 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.11
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      56 Degrees Celsius
Processor         Temperature         Temp              OK      55 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      27 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      40 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.12
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      59 Degrees Celsius
Processor         Temperature         Temp              OK      59 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      26 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      38 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.13
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      51 Degrees Celsius
Processor         Temperature         Temp              OK      49 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      26 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      36 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.14
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      57 Degrees Celsius
Processor         Temperature         Temp              OK      60 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      26 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      38 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.15
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      59 Degrees Celsius
Processor         Temperature         Temp              OK      59 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      26 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      39 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.16
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
Processor         Temperature         Temp              OK      56 Degrees Celsius
Processor         Temperature         Temp              OK      56 Degrees Celsius
System Board      Temperature         Inlet Temp        OK      26 Degrees Celsius
System Board      Temperature         Exhaust Temp      OK      38 Degrees Celsius

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.
admin@n1-mgmt:~>
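
With this many nodes, the non-OK rows are easy to miss by eye. The sketch below groups them by host. It is an illustration only and assumes the captured text uses the `Output from host : <ip>` separators and the column layout shown above. `alerts_by_host` is a hypothetical helper name.

```python
import re

HOST_LINE = re.compile(r"^Output from host\s*:\s*(\S+)")


def alerts_by_host(text: str) -> dict[str, list[str]]:
    """Map each host to a list of its non-OK sensor rows."""
    alerts: dict[str, list[str]] = {}
    host = None
    label_at = status_at = info_at = None
    for line in text.splitlines():
        match = HOST_LINE.match(line)
        if match:
            host = match.group(1)
            alerts.setdefault(host, [])
            continue
        if line.startswith("Entity"):
            # Read the column offsets from the header row (assumed layout).
            label_at, status_at, info_at = (
                line.index(name) for name in ("Label", "Status", "Info")
            )
            continue
        if host is None or status_at is None:
            continue
        if line.startswith(("-----", "NOTE:")):
            continue
        status = line[status_at:info_at].strip()
        if status and status != "OK":
            label = line[label_at:status_at].strip()
            info = line[info_at:].strip()
            alerts[host].append(f"{label}: {status} ({info})")
    return alerts
```

Hosts that report only OK sensors map to an empty list, so `[h for h, a in alerts_by_host(text).items() if a]` gives the affected hosts.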

3. Possible scenarios:
  1. A single node reports one or more sensors: If only one node reports a temperature status other than "OK", this most likely indicates a faulty part or poor airflow inside that node; an internal problem is more likely than a rack-level problem.
  2. Multiple nodes are affected: This points to an environmental problem within the rack itself or possibly in the data center.
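
The two scenarios reduce to a rule on how many distinct nodes report a non-OK temperature sensor. A minimal sketch of that rule follows. The function name and the returned labels are hypothetical, and the result is only a first triage hint, not a diagnosis.

```python
def classify_scope(affected_hosts: list[str]) -> str:
    """Classify a temperature alert by the number of distinct affected nodes."""
    count = len(set(affected_hosts))
    if count == 0:
        return "none"
    if count == 1:
        # Scenario 1: suspect a part or the airflow inside that node.
        return "node"
    # Scenario 2: suspect the rack or the data center environment.
    return "environment"
```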


4. Check that the fans are working properly. If they are not, a fan may need to be replaced.

Command:

# cs_hal sensors fan
Example:
admin@ecs:~>cs_hal sensors fan

Output from host : 192.168.219.1
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
System Board      Fan                 Fan1              OK      12600 RPM
System Board      Fan                 Fan2              OK      12600 RPM
System Board      Fan                 Fan3              OK      16920 RPM
System Board      Fan                 Fan4              OK      16800 RPM
System Board      Fan                 Fan5              OK      17040 RPM
System Board      Fan                 Fan6              OK      16920 RPM
System Board      Fan                 Fan Redundancy    OK      fully redundant;

NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.
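
The fan output can be screened the same way. The sketch below is an assumption-laden example: it matches only the labels seen in the sample above (`Fan1` through `Fan6` style names and `Fan Redundancy`) and treats any status other than `OK` as a problem. `fan_problems` is a hypothetical name.

```python
import re

# Matches the Label, Status, and Info fields of a fan row.
FAN_ROW = re.compile(
    r"\s{2,}(?P<label>Fan\d+|Fan Redundancy)\s+(?P<status>\S+)\s*(?P<info>.*)$"
)


def fan_problems(text: str) -> list[tuple[str, str, str]]:
    """Return (label, status, info) for every fan row that is not OK."""
    problems = []
    for line in text.splitlines():
        match = FAN_ROW.search(line)
        if match and match.group("status") != "OK":
            problems.append(
                (match.group("label"), match.group("status"), match.group("info").strip())
            )
    return problems
```

An empty result corresponds to the case where every fan reports OK.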
5. If all fans report OK, there is no problem with the fan systems. Contact the PowerEdge team to check whether a part needs to be replaced. If any fan reports a problem, follow ECS: Dial Home: Fan Failure; Symptom Code: 2008

6. Important: Use https://central.dell.com/case-lookup/ and search for the PSNT (product serial number tag) to check the history. Check how many occurrences there were in the last 3 to 6 months. If the problem was persistent and affected multiple nodes, or if an entire rack shows a higher-than-normal inlet temperature, this indicates a persistent environmental problem that must be resolved. Do not close the case as a duplicate unless there is an action plan and clear findings for resolving the temperature problem.

7. If the PE team finds no problem, or if the history contains many instances of the same alert (over 3 months or more), consult an L2 in swarm and be prepared to dispatch a CE to review the environmental conditions of the rack and the affected nodes.
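
For the history check, counting occurrences inside a 3 or 6 month window is simple date arithmetic. The sketch below assumes the alert dates have already been collected by hand from the case history; it does not query any Dell system. It approximates a month as 30 days, and `occurrences_within` is a hypothetical name.

```python
from datetime import date, timedelta


def occurrences_within(alert_dates: list[date], today: date, months: int) -> int:
    """Count alerts dated within the last `months` months (30 days each)."""
    cutoff = today - timedelta(days=30 * months)
    return sum(1 for d in alert_dates if cutoff <= d <= today)


# Usage with made-up dates for illustration only.
history = [date(2024, 4, 1), date(2024, 2, 15), date(2023, 12, 1), date(2023, 6, 1)]
last_3_months = occurrences_within(history, date(2024, 4, 30), 3)
last_6_months = occurrences_within(history, date(2024, 4, 30), 6)
```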
 
Gen2 hardware:

1. Check the status of the temperature sensors with cs_hal.
Example:
# cs_hal sensors temp
Entity            Type                Label             Status  Info
-----             -----               -----             -----   -----
System Board      Temperature         SSB Therm Trip    OK
System Board      Temperature         BB Inlet Temp     OK      32 Degrees Celsius
CPU (DCMI Compat) Temperature         HSBP Temp         OK      -222 Degrees Celsius
System Board      Temperature         SSB Temp          OK      60 Degrees Celsius
System Board      Temperature         BB BMC Temp       OK      51 Degrees Celsius
System Board      Temperature         P1 VR Temp        OK      38 Degrees Celsius
System Board      Temperature         IB Temp           OK      46 Degrees Celsius
System Board      Temperature         Exit Air Temp     OK      54 Degrees Celsius
Front Panel       Temperature         IOM Temp          OK      43 Degrees Celsius
Drive Backplane   Temperature         HSBP PSOC         OK      37 Degrees Celsius
Front Panel       Temperature         LAN NIC Temp      OK      67 Degrees Celsius
Power Supply      Temperature         PS1 Temperature   OK      34 Degrees Celsius
Power Supply      Temperature         PS2 Temperature   OK      34 Degrees Celsius
Processor         Temperature         P1 Therm Margin   OK      216 Degrees Celsius
Processor         Temperature         P2 Therm Margin   OK      206 Degrees Celsius
Processor         Temperature         P1 Therm Ctrl %   OK      0 Unspecified
Processor         Temperature         P2 Therm Ctrl %   OK      0 Unspecified
Processor         Temperature         P1 DTS Therm Mgn  OK      216 Degrees Celsius
Processor         Temperature         P2 DTS Therm Mgn  OK      206 Degrees Celsius
Processor         Temperature         P1 VRD Hot        OK
Processor         Temperature         P2 VRD Hot        OK
System Board      Temperature         DIMM Thrm Mrgn 1  OK      201 Degrees Celsius
System Board      Temperature         DIMM Thrm Mrgn 2  OK      200 Degrees Celsius
System Board      Temperature         DIMM Thrm Mrgn 3  OK      198 Degrees Celsius
System Board      Temperature         DIMM Thrm Mrgn 4  OK      197 Degrees Celsius
System Board      Temperature         Agg Thrm Mgn 1    OK      233 Degrees Celsius
2. Follow the same steps as for Gen3 (but do not engage PowerEdge). More details for Gen2 will be added in the future.

Affected Products

ECS Appliance

Article Properties
Article Number: 000046763
Article Type: Solution
Last Modified: 30 Apr 2024
Version:  6