ECS:系统检测到节点上有高温
Summary: 如果我收到一封电子邮件警报,通知我系统检测到节点上的高温传感器读数,我可以检查什么?
This article applies to
This article does not apply to
This article is not tied to any specific product.
Not all product versions are identified in this article.
Instructions
-
确认警报节点是什么硬件。
admin@node1:~> sudo xdoctor -x Telegraf Version: 3.8.0.2-1549.73c8abc2 Fabric Version: 3.8.0.2-4347.d30cd09 Fabric-Zookeeper Version: 3.8.0.2-120.b4a1c5c Utilities Version: 3.7.0.4-1166.b78f3fe Influxdb Version: 3.8.0.2-1549.73c8abc2 Grafana Version: 3.8.0.2-1549.73c8abc2 Syslog Version: 3.8.0.2-4347.d30cd09 Service Version: 9.0.0.0-22840.479b013c74 Os Version: 3.8.0.2-2113.3fa664c.3 Fluxd Version: 3.8.0.2-1549.73c8abc2 Throttler Version: 3.8.0.2-1549.73c8abc2 Object Image Version: 3.8.0.2-138636.7343cd5c2c3 -------------------- ECS Version: 3.8.0.2 -------------------- HW Gen : 2 HW Model: U-Series HW Code : S2600KP ------------------------- xDoctor Version: 4.8-98.0 -------------------------对于第 1/2 代节点,请回复电子邮件中需要帮助的表单。对于第 3 代节点,请按照本知识库文章的其余部分进行操作。
-
检查温度传感器的当前状态。在下面,我们在其中两个节点上看到“CRIT”,指示这两个节点上的问题。如果所有节点报告为“OK”,但最近多次收到此警报,则这可能是一个反复出现的问题。如果是这样,请回复电子邮件中需要帮助的表单,并定期发出温度警报。
admin@node1:~> viprexec -i cs_hal sensors temp Output from host : xxx.xxx.xxx.xxx Entity Type Label Status Info ----- ----- ----- ----- ----- Processor Temperature Temp OK 53 Degrees Celsius Processor Temperature Temp OK 53 Degrees Celsius System Board Temperature Inlet Temp CRIT 40 Degrees Celsius; above critical threshold System Board Temperature Exhaust Temp OK 50 Degrees Celsius NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information. Output from host : xxx.xxx.xxx.xxx Entity Type Label Status Info ----- ----- ----- ----- ----- Processor Temperature Temp OK 47 Degrees Celsius Processor Temperature Temp OK 49 Degrees Celsius System Board Temperature Inlet Temp CRIT 39 Degrees Celsius; above critical threshold System Board Temperature Exhaust Temp OK 50 Degrees Celsius NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information. Output from host : xxx.xxx.xxx.xxx Entity Type Label Status Info ----- ----- ----- ----- ----- Processor Temperature Temp OK 46 Degrees Celsius Processor Temperature Temp OK 46 Degrees Celsius System Board Temperature Inlet Temp OK 35 Degrees Celsius System Board Temperature Exhaust Temp OK 47 Degrees Celsius NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information. ... ... ...
如果多个节点报告为不正常,则可能是数据中心环境中的问题。检查 ECS 所在的区域中是否没有可能导致 ECS 温度升高的问题。
-
检查 ECS 风扇的状态。
admin@ecs:~>cs_hal sensors fan Output from host : xxx.xxx.xxx.xxx Entity Type Label Status Info ----- ----- ----- ----- ----- System Board Fan Fan1 OK 12600 RPM System Board Fan Fan2 OK 12600 RPM System Board Fan Fan3 OK 16920 RPM System Board Fan Fan4 OK 16800 RPM System Board Fan Fan5 OK 17040 RPM System Board Fan Fan6 OK 16920 RPM System Board Fan Fan Redundancy OK fully redundant; NOTE: on Axum and EX-series, use "sudo -i racadm getsensorinfo" to obtain sensor information.
-
回复电子邮件中需要帮助的表单,包括温度传感器输出和风扇输出。
Affected Products
ECSArticle Properties
Article Number: 000227188
Article Type: How To
Last Modified: 30 Jul 2024
Version: 2
Find answers to your questions from other Dell users
Support Services
Check if your device is covered by Support Services.