PowerVault ME4: Incorrect temperature warning alerts on ME4084/ME484
Summary: The alerts received refer to both the CPU and the sideplane, but the ME4 event log shows only the sideplane ambient temperature as high. The CPU high-temperature alert is a false positive. Resolve any environmental cooling issues and update to controller firmware GT280R010-01 or later. ...
Symptoms
The PowerVault ME4 sends the following event alerts:
An Enclosure Management Processor (EMP) reported an alert condition on a temperature sensor. Type: fault. (enclosure: 0, WWN: 500C0FF0F1A7C63C) temperature sensor for CPU Temperature-Ctlr A, sensor status: Over temperature failure, temperature: 37 C

An Enclosure Management Processor (EMP) reported an alert condition on a temperature sensor. Type: degraded. (enclosure: 0, WWN: 500C0FF0F1A7C63C) temperature sensor for Ambient Temp, Left Sideplane, sensor status: Over temperature warning, temperature: 34
Analysis of the support bundle (store log) by Dell Technical Support shows an issue with the sideplane only; there is no CPU temperature issue:
Sample log entries:
– GEM CLI command: envctrl_zone

Zone  Group  Card    Name     Location  Temperature  Threshold  State
0     0      Common  Ambient  Sp0:0     32.000       33         OK
1     0      Common  Ambient  Sp2:0     36.000       33         HIGH

zone 0
name : Ambient
location : Sp0:0
currentTemperature : 34.000
faultStates.generatedFault : 0
faultStates.detectedFault : 1
faultStates.generatedPredictedFail: 0
faultStates.detectedPredictedFail : 0
faultStates.elementSpecificFaults : 0x8
faultStateIgnoreMask.bitmask : 0x0
modifiedCriticalColdTemperature : 3
modifiedCriticalHotTemperature : 37
modifiedWarningColdTemperature : 7
modifiedNormalTemperature : 20
modifiedWarningHotTemperature : 33
modifiedShutdownTemperature : 8388607

zone 1
name : Ambient
location : Sp2:0
currentTemperature : 37.000
faultStates.generatedFault : 0
faultStates.detectedFault : 1
faultStates.generatedPredictedFail: 0
faultStates.detectedPredictedFail : 0
faultStates.elementSpecificFaults : 0x8
faultStateIgnoreMask.bitmask : 0x0
modifiedCriticalColdTemperature : 3
modifiedCriticalHotTemperature : 37
modifiedWarningColdTemperature : 7
modifiedNormalTemperature : 20
modifiedWarningHotTemperature : 33
modifiedShutdownTemperature : 8388607
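As a triage aid, the zone fields above can be checked mechanically. The following Python sketch is not a Dell tool: it assumes the "key : value" layout shown in the excerpt, uses the hypothetical helper names parse_zones and classify, and assumes a zone is flagged once currentTemperature reaches (>=) modifiedWarningHotTemperature or modifiedCriticalHotTemperature. The exact comparison the enclosure firmware uses is not documented in this article.

```python
import re

# Excerpt of the envctrl_zone detail output shown above; only the fields
# used below are kept.
ZONE_DUMP = """\
zone 0
name : Ambient
location : Sp0:0
currentTemperature : 34.000
modifiedCriticalHotTemperature : 37
modifiedWarningHotTemperature : 33
zone 1
name : Ambient
location : Sp2:0
currentTemperature : 37.000
modifiedCriticalHotTemperature : 37
modifiedWarningHotTemperature : 33
"""


def parse_zones(text):
    """Group 'key : value' lines into one dict per 'zone N' header (hypothetical helper)."""
    zones = []
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"zone (\d+)", line)
        if header:
            zones.append({"zone": int(header.group(1))})
        elif ":" in line and zones:
            # Split on the first colon only, so values such as "Sp0:0" stay intact.
            key, value = line.split(":", 1)
            zones[-1][key.strip()] = value.strip()
    return zones


def classify(zone):
    """Compare the current temperature with the hot thresholds (>= is an assumption)."""
    temp = float(zone["currentTemperature"])
    if temp >= float(zone["modifiedCriticalHotTemperature"]):
        return "CRITICAL"
    if temp >= float(zone["modifiedWarningHotTemperature"]):
        return "WARNING"
    return "OK"


for z in parse_zones(ZONE_DUMP):
    print(z["zone"], z["location"], z["currentTemperature"], classify(z))
```

For this excerpt, zone 0 (Sp0:0) classifies as WARNING and zone 1 (Sp2:0) as CRITICAL, because 37.000 reaches the critical hot threshold of 37.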
Event logs:
03/15 13:29:56.359985 C:0 EVENT W ENCL TEMP ALERT EncID 0 SP 2 Ambient Ovr Tmp W 33C WARNING
03/15 14:58:13.245895 C:0 EVENT R ENCL TEMP ALERT EncID 0 SP 2 Ambient OK NO ERROR
03/15 14:59:43.244220 C:0 EVENT W ENCL TEMP ALERT EncID 0 SP 2 Ambient Ovr Tmp W 33C WARNING
03/15 15:06:54.758190 C:0 EVENT R ENCL TEMP ALERT EncID 0 SP 2 Ambient OK NO ERROR
03/15 16:32:00.000242 C:0 EVENT W ENCL TEMP ALERT EncID 0 SP 2 Ambient Ovr Tmp W 34C WARNING
03/15 16:37:22.123440 C:0 EVENT W ENCL TEMP ALERT EncID 0 SP 0 Ambient Ovr Tmp W 34C WARNING
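To see how long each sideplane stayed in the warning state, the event entries can be paired. This Python sketch assumes that W after EVENT marks a raised warning and R marks its recovery (the R entries read "OK NO ERROR"). The helper name warning_intervals and the placeholder year are assumptions; the year is needed only because the log omits it.

```python
import re
from datetime import datetime

# Event entries copied from the log excerpt above, one per line.
EVENT_LOG = """\
03/15 13:29:56.359985 C:0 EVENT W ENCL TEMP ALERT EncID 0 SP 2 Ambient Ovr Tmp W 33C WARNING
03/15 14:58:13.245895 C:0 EVENT R ENCL TEMP ALERT EncID 0 SP 2 Ambient OK NO ERROR
03/15 14:59:43.244220 C:0 EVENT W ENCL TEMP ALERT EncID 0 SP 2 Ambient Ovr Tmp W 33C WARNING
03/15 15:06:54.758190 C:0 EVENT R ENCL TEMP ALERT EncID 0 SP 2 Ambient OK NO ERROR
03/15 16:32:00.000242 C:0 EVENT W ENCL TEMP ALERT EncID 0 SP 2 Ambient Ovr Tmp W 34C WARNING
03/15 16:37:22.123440 C:0 EVENT W ENCL TEMP ALERT EncID 0 SP 0 Ambient Ovr Tmp W 34C WARNING
"""

# Matches the fields used below; 'W' is assumed to mean warning raised, 'R' recovered.
LINE_RE = re.compile(
    r"(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) C:\d+ EVENT (?P<kind>[WR]) "
    r"ENCL TEMP ALERT EncID (?P<enc>\d+) SP (?P<sp>\d+) Ambient"
)

PLACEHOLDER_YEAR = "2000"  # the log has no year; a leap year keeps 02/29 parseable


def warning_intervals(log):
    """Pair each W entry with the next R entry for the same enclosure and sideplane."""
    open_since = {}  # (enclosure, sideplane) -> time the warning was raised
    closed = []      # ((enclosure, sideplane), start, end)
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{PLACEHOLDER_YEAR}/{m['ts']}", "%Y/%m/%d %H:%M:%S.%f")
        key = (int(m["enc"]), int(m["sp"]))
        if m["kind"] == "W":
            # Keep the earliest warning if several arrive before a recovery.
            open_since.setdefault(key, ts)
        elif key in open_since:
            closed.append((key, open_since.pop(key), ts))
    return closed, open_since


closed, still_open = warning_intervals(EVENT_LOG)
for (enc, sp), start, end in closed:
    print(f"enclosure {enc} sideplane {sp}: warning lasted {end - start}")
for (enc, sp), start in still_open.items():
    print(f"enclosure {enc} sideplane {sp}: warning active since {start:%m/%d %H:%M:%S}")
```

With the entries above, sideplane 2 on enclosure 0 shows two closed warning intervals, and the final warnings for sideplanes 2 and 0 have no matching recovery within this excerpt.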
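Cause
Controller firmware earlier than GT280R010-01 maps temperature alert events incorrectly (FMW-48174). As a result, an over-temperature warning on a sideplane ambient sensor can also be reported as a CPU temperature alert for a controller. The sideplane ambient temperature warnings themselves are genuine.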
Resolution
Dell has confirmed this is an issue and has fixed the incorrect sensor mapping in controller firmware GT280R010-01.
These high-temperature events are usually caused by a high ambient inlet temperature, insufficient cooling, or blocked exhaust airflow. Check the operating environment and correct these conditions. For more information about environmental operating specifications, see the Dell PowerVault ME4 Series Storage System Owner’s Manual.
Release Notes: FMW-48174 Event mapping for temperature alert events was incorrect.