ECS: RAP015: Temperature Failure; Symptom Code: 2010
Summary: A temperature sensor on the node is reporting that a critical level has been reached.
This article is not tied to any specific product.
Not all product versions are identified in this article.
Symptoms
A temperature sensor has registered a temperature above a critical threshold.
A component may not be working correctly, causing a temperature sensor to report that a critical level has been reached.
A temperature sensor on the node reports that a critical level has been reached.
Cause
A problem occurred that caused a temperature sensor to exceed a critical level.
Resolution
For Gen2, scroll down to the bottom.
Gen3 hardware:
1. Check the status of the temperature sensors using cs_hal on the reported node.
Command:
#cs_hal sensors temp
Example: On Gen3 there are only three kinds of temperature sensor (processor, inlet, and exhaust), as shown below.
admin@n1-mgmt:~> cs_hal sensors temp
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      53 Degrees Celsius
Processor     Temperature  Temp          OK      54 Degrees Celsius
System Board  Temperature  Inlet Temp    CRIT    40 Degrees Celsius; above critical threshold
System Board  Temperature  Exhaust Temp  OK      50 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.
admin@n1-mgmt:~>
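When the output is long, the rows that are not "OK" can be picked out with a small script. The following is a minimal sketch, not part of the ECS tooling: it assumes the output has been saved as text, that each sensor row contains one of the status tokens OK, WARN, or CRIT (the only ones that appear in this article), and that the text after the status is the Info column. The function name non_ok_sensors is hypothetical.

```python
import re

# Assumption: a sensor row is "<entity/type/label> <STATUS> <info>", where
# STATUS is one of the tokens seen in this article's examples.
ROW = re.compile(r"^(?P<left>.+?)\s+(?P<status>OK|WARN|CRIT)(?:\s+(?P<info>.*))?$")


def non_ok_sensors(output):
    """Return (sensor, status, info) for every row whose status is not OK."""
    flagged = []
    for raw in output.splitlines():
        match = ROW.match(raw.strip())
        if match and match.group("status") != "OK":
            sensor = " ".join(match.group("left").split())  # collapse column padding
            flagged.append((sensor, match.group("status"), match.group("info") or ""))
    return flagged


# Sample text copied from the Gen3 example in this article.
sample = """\
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      53 Degrees Celsius
Processor     Temperature  Temp          OK      54 Degrees Celsius
System Board  Temperature  Inlet Temp    CRIT    40 Degrees Celsius; above critical threshold
System Board  Temperature  Exhaust Temp  OK      50 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.
"""

flagged = non_ok_sensors(sample)
for sensor, status, info in flagged:
    print(f"{status}: {sensor} ({info})")
```

The header, separator, and NOTE lines contain no status token, so they are skipped without special handling.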
2. Check all nodes in the rack to see whether any other node reports a temperature sensor that is not "OK".
Command:
# viprexec -i cs_hal sensors temp
Example: In this example, several nodes in the upper half of the rack report an inlet temperature that is too high.
admin@n1-mgmt:~> viprexec -i cs_hal sensors temp

Output from host : 192.168.219.1
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      53 Degrees Celsius
Processor     Temperature  Temp          OK      53 Degrees Celsius
System Board  Temperature  Inlet Temp    CRIT    40 Degrees Celsius; above critical threshold
System Board  Temperature  Exhaust Temp  OK      50 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.2
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      47 Degrees Celsius
Processor     Temperature  Temp          OK      49 Degrees Celsius
System Board  Temperature  Inlet Temp    CRIT    39 Degrees Celsius; above critical threshold
System Board  Temperature  Exhaust Temp  OK      50 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.3
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      46 Degrees Celsius
Processor     Temperature  Temp          OK      46 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      35 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      47 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.4
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      48 Degrees Celsius
Processor     Temperature  Temp          OK      50 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      35 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      47 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.5
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      48 Degrees Celsius
Processor     Temperature  Temp          OK      50 Degrees Celsius
System Board  Temperature  Inlet Temp    WARN    38 Degrees Celsius; above non-critical threshold
System Board  Temperature  Exhaust Temp  OK      49 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.6
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      50 Degrees Celsius
Processor     Temperature  Temp          OK      52 Degrees Celsius
System Board  Temperature  Inlet Temp    CRIT    39 Degrees Celsius; above critical threshold
System Board  Temperature  Exhaust Temp  OK      51 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.7
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      45 Degrees Celsius
Processor     Temperature  Temp          OK      48 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      36 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      47 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.8
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      51 Degrees Celsius
Processor     Temperature  Temp          OK      49 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      31 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      43 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.9
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      52 Degrees Celsius
Processor     Temperature  Temp          OK      51 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      30 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      42 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.10
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      54 Degrees Celsius
Processor     Temperature  Temp          OK      51 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      28 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      41 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.11
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      56 Degrees Celsius
Processor     Temperature  Temp          OK      55 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      27 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      40 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.12
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      59 Degrees Celsius
Processor     Temperature  Temp          OK      59 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      26 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      38 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.13
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      51 Degrees Celsius
Processor     Temperature  Temp          OK      49 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      26 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      36 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.14
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      57 Degrees Celsius
Processor     Temperature  Temp          OK      60 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      26 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      38 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.15
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      59 Degrees Celsius
Processor     Temperature  Temp          OK      59 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      26 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      39 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.

Output from host : 192.168.219.16
Entity        Type         Label         Status  Info
-----         -----        -----         -----   -----
Processor     Temperature  Temp          OK      56 Degrees Celsius
Processor     Temperature  Temp          OK      56 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      26 Degrees Celsius
System Board  Temperature  Exhaust Temp  OK      38 Degrees Celsius
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.
admin@n1-mgmt:~>
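With sixteen nodes, scanning the rack-wide output by eye is error-prone. The sketch below groups the non-OK rows by host. It is an illustration only and assumes the layout seen in the example above: each host section starts with an "Output from host : <address>" line, and sensor rows carry an OK, WARN, or CRIT status token. The function name non_ok_by_host is hypothetical, and the sample is a three-host excerpt of the example output.

```python
import re

# Assumptions: section header and status tokens exactly as in the example output.
HOST = re.compile(r"^Output from host\s*:\s*(\S+)\s*$")
ROW = re.compile(r"^(.+?)\s+(OK|WARN|CRIT)(?:\s+(.*))?$")


def non_ok_by_host(output):
    """Map each host to its list of (sensor, status, info) rows that are not OK."""
    result = {}
    host = None
    for raw in output.splitlines():
        line = raw.strip()
        header = HOST.match(line)
        if header:
            host = header.group(1)
            result[host] = []
            continue
        row = ROW.match(line)
        if row and host is not None and row.group(2) != "OK":
            sensor = " ".join(row.group(1).split())  # collapse column padding
            result[host].append((sensor, row.group(2), row.group(3) or ""))
    return result


sample = """\
Output from host : 192.168.219.1
Processor     Temperature  Temp          OK      53 Degrees Celsius
System Board  Temperature  Inlet Temp    CRIT    40 Degrees Celsius; above critical threshold
System Board  Temperature  Exhaust Temp  OK      50 Degrees Celsius
Output from host : 192.168.219.3
Processor     Temperature  Temp          OK      46 Degrees Celsius
System Board  Temperature  Inlet Temp    OK      35 Degrees Celsius
Output from host : 192.168.219.5
Processor     Temperature  Temp          OK      48 Degrees Celsius
System Board  Temperature  Inlet Temp    WARN    38 Degrees Celsius; above non-critical threshold
"""

summary = non_ok_by_host(sample)
for host, rows in summary.items():
    print(host, "OK" if not rows else rows)
```

Hosts with no flagged rows are kept in the result with an empty list, so the total number of nodes checked stays visible.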
3. Possible scenarios:
- Only one node reports one or more sensors: If the problem is seen on a single node where the temperature does not report "OK", this most likely indicates a part problem, or poor airflow in the node that is more likely caused by an internal issue than by a rack issue.
- Multiple nodes are affected: This more likely indicates an environmental problem in the rack itself or possibly in the data center.
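The two scenarios above reduce to counting how many nodes report a non-OK temperature sensor. A minimal sketch of that decision follows; the function name classify_scope, the input shape (host mapped to a list of flagged rows), and the returned labels are hypothetical and chosen only for illustration.

```python
def classify_scope(non_ok_rows_by_host):
    """Classify an alert by the number of nodes with a non-OK temperature sensor.

    Returns a (label, affected_hosts) pair:
      "none"        - no node reports a problem
      "single-node" - likely a part or internal airflow problem on that node
      "multi-node"  - likely an environmental problem in the rack or data center
    """
    affected = [host for host, rows in non_ok_rows_by_host.items() if rows]
    if not affected:
        return "none", affected
    if len(affected) == 1:
        return "single-node", affected
    return "multi-node", affected


# Statuses taken from the rack-wide example: nodes .1, .2, .5 and .6 are not OK.
rack = {
    "192.168.219.1": ["Inlet Temp CRIT"],
    "192.168.219.2": ["Inlet Temp CRIT"],
    "192.168.219.3": [],
    "192.168.219.5": ["Inlet Temp WARN"],
    "192.168.219.6": ["Inlet Temp CRIT"],
}
label, hosts = classify_scope(rack)
print(label, hosts)
```

The label is only a starting point for triage; the fan check and case history in the following steps still apply.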
4. Check that the fans are running OK. If not, a fan may need to be replaced.
Command:
# cs_hal sensors fan
Example:
admin@ecs:~> cs_hal sensors fan
Output from host : 192.168.219.1
Entity        Type  Label           Status  Info
-----         ----- -----           -----   -----
System Board  Fan   Fan1            OK      12600 RPM
System Board  Fan   Fan2            OK      12600 RPM
System Board  Fan   Fan3            OK      16920 RPM
System Board  Fan   Fan4            OK      16800 RPM
System Board  Fan   Fan5            OK      17040 RPM
System Board  Fan   Fan6            OK      16920 RPM
System Board  Fan   Fan Redundancy  OK      fully redundant;
NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.
5. If all fans report OK, there is no problem with the fan systems. Contact the PowerEdge team to check whether a part needs to be replaced. If any fan reports a problem, follow ECS: Call Home: Fan Failure; Symptom Code: 2008.
6. Important: Use https://central.dell.com/case-lookup/ and look up the PSNT (Product Serial Number Tag) to check the history. Check how many occurrences there have been in the last 3-6 months. Check whether the problem was persistent and affected multiple nodes. If an entire rack is affected by a higher-than-normal inlet temperature, this indicates a persistent environmental problem that must be resolved. Do not close the case as a duplicate unless there is a clear action plan and conclusions for resolving the temperature problem.
7. If the PE team finds no problem, or if the history contains many occurrences of the same alert (over 3 months or more), consult an L2 through Swarm and prepare to schedule a CE to review the environmental conditions of the affected rack and nodes.
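For the history check, the count of occurrences inside the 3-6 month window can be worked out from the alert dates noted during the case lookup. The sketch below is an assumption-laden illustration: the dates are invented sample values, a month is approximated as 30 days, and the function name occurrences_in_window is hypothetical. It does not query the case-lookup site.

```python
from datetime import date, timedelta


def occurrences_in_window(event_dates, today, months=6):
    """Count alert dates within the last `months` months (approximated as 30 days each)."""
    cutoff = today - timedelta(days=30 * months)
    return sum(1 for d in event_dates if cutoff <= d <= today)


# Hypothetical alert dates collected by hand from the case history.
history = [
    date(2024, 4, 10),
    date(2024, 2, 1),
    date(2023, 12, 15),
    date(2023, 6, 1),
]
today = date(2024, 4, 30)

last_3_months = occurrences_in_window(history, today, months=3)
last_6_months = occurrences_in_window(history, today, months=6)
print(last_3_months, last_6_months)
```

The article does not define how many occurrences count as "many", so the sketch reports the counts and leaves that judgment to the engineer.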
Gen 2:
1. Check the status of the temperature sensors using cs_hal.
Example:
# cs_hal sensors temp
Entity             Type         Label             Status  Info
-----              -----        -----             -----   -----
System Board       Temperature  SSB Therm Trip    OK
System Board       Temperature  BB Inlet Temp     OK      32 Degrees Celsius
CPU (DCMI Compat)  Temperature  HSBP Temp         OK      -222 Degrees Celsius
System Board       Temperature  SSB Temp          OK      60 Degrees Celsius
System Board       Temperature  BB BMC Temp       OK      51 Degrees Celsius
System Board       Temperature  P1 VR Temp        OK      38 Degrees Celsius
System Board       Temperature  IB Temp           OK      46 Degrees Celsius
System Board       Temperature  Exit Air Temp     OK      54 Degrees Celsius
Front Panel        Temperature  IOM Temp          OK      43 Degrees Celsius
Drive Backplane    Temperature  HSBP PSOC         OK      37 Degrees Celsius
Front Panel        Temperature  LAN NIC Temp      OK      67 Degrees Celsius
Power Supply       Temperature  PS1 Temperature   OK      34 Degrees Celsius
Power Supply       Temperature  PS2 Temperature   OK      34 Degrees Celsius
Processor          Temperature  P1 Therm Margin   OK      216 Degrees Celsius
Processor          Temperature  P2 Therm Margin   OK      206 Degrees Celsius
Processor          Temperature  P1 Therm Ctrl %   OK      0 Unspecified
Processor          Temperature  P2 Therm Ctrl %   OK      0 Unspecified
Processor          Temperature  P1 DTS Therm Mgn  OK      216 Degrees Celsius
Processor          Temperature  P2 DTS Therm Mgn  OK      206 Degrees Celsius
Processor          Temperature  P1 VRD Hot        OK
Processor          Temperature  P2 VRD Hot        OK
System Board       Temperature  DIMM Thrm Mrgn 1  OK      201 Degrees Celsius
System Board       Temperature  DIMM Thrm Mrgn 2  OK      200 Degrees Celsius
System Board       Temperature  DIMM Thrm Mrgn 3  OK      198 Degrees Celsius
System Board       Temperature  DIMM Thrm Mrgn 4  OK      197 Degrees Celsius
System Board       Temperature  Agg Thrm Mgn 1    OK      233 Degrees Celsius
2. Follow the same steps as for Gen3 (but do not escalate to PowerEdge). More details for Gen2 will be added in the future.
Affected Products
ECS Appliance
Products
ECS Appliance
Article Properties
Article Number: 000046763
Article Type: Solution
Last Modified: 30 Apr 2024
Version: 6