Solution:
Generate a CMC Logs report. Refer to "Obtaining logs from the M1000E chassis management controller (CMC)".
The racdump log shows output similar to the following:
# racadm getfanreqinfo
[Ambient Temperature Fan Request %]
38
[Server Module Fan Request Table]
<Slot#> | <Server Name> | <Blade Type>   | <Power State> | <Presence>  | <Fan Request%>
1       | LVDEDESXIP1A  | N/A            | N/A           | Not Present | N/A
2       | LVESXVDIIP1B  | N/A            | N/A           | Not Present | N/A
3       | LVESXVDIIP1C  | N/A            | N/A           | Not Present | N/A
4       | LVESXVDIIP1D  | N/A            | N/A           | Not Present | N/A
5       | LVESXVDIIP1E  | PowerEdge M620 |               | Present     | 38
6       | LVESXVDIIP1F  | PowerEdge M620 |               | Present     | 38
7       | LVESXVDIIP1G  | PowerEdge M620 |               | Present     | 38
8       | LVESXVDIIP1H  | PowerEdge M620 |               | Present     | 38
9       | LVESXVDIIP1I  | PowerEdge M620 |               | Present     | 38
10      | LVESXVDIIP1J  | PowerEdge M620 |               | Present     | 38
11      | SLOT-11       | N/A            | N/A           | Not Present | N/A
12      | SLOT-12       | N/A            | N/A           | Not Present | N/A
13      | LVESXVDIIP1M  | PowerEdge M620 |               | Present     | 38
14      | LVESXVDIIP1N  | PowerEdge M620 |               | Present     | 38
15      | LVESXVDIIP1O  | PowerEdge M620 |               | Present     | 38
16      | LVESXVDIIP1AP | PowerEdge M620 |               | Present     | 38
[Switch Module Fan Request Table]
<IO>     | <Name>                     | <Type>           | <Presence> | <Fan Request%>
Switch-1 | MXL 10/40GbE               | 10 GbE KR        | Present    | 30
Switch-2 | MXL 10/40GbE               | 10 GbE KR        | Present    | 83
Switch-3 | MXL 10/40GbE               | 10 GbE KR        | Present    | 58
Switch-4 | MXL 10/40GbE               | 10 GbE KR        | Present    | 30
Switch-5 | Dell Ethernet Pass-Through | Gigabit Ethernet | Present    | 30
Switch-6 | Dell Ethernet Pass-Through | Gigabit Ethernet | Present    | 30
Do not replace hardware for this issue. A high fan request from an IOM (for example, Switch-2 at 83% in the output above) does not by itself indicate a fault.
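To see at a glance which modules are requesting more cooling than the ambient request, the switch table can be tabulated as in the following minimal sketch. The numbers are copied from the output above; the tuple layout and the function name `above_ambient` are illustrative assumptions and are not part of racadm. A request above the ambient value is an observation only, not a fault indicator.

```python
# Values copied from the getfanreqinfo output above.
AMBIENT_REQUEST = 38  # [Ambient Temperature Fan Request %]

# (IO, Name, Fan Request %) -- hypothetical layout chosen for this sketch.
SWITCH_REQUESTS = [
    ("Switch-1", "MXL 10/40GbE", 30),
    ("Switch-2", "MXL 10/40GbE", 83),
    ("Switch-3", "MXL 10/40GbE", 58),
    ("Switch-4", "MXL 10/40GbE", 30),
    ("Switch-5", "Dell Ethernet Pass-Through", 30),
    ("Switch-6", "Dell Ethernet Pass-Through", 30),
]


def above_ambient(requests, ambient):
    """Return (IO, request %) pairs whose fan request exceeds the ambient request."""
    return [(io, pct) for io, _name, pct in requests if pct > ambient]


if __name__ == "__main__":
    for io, pct in above_ambient(SWITCH_REQUESTS, AMBIENT_REQUEST):
        print(f"{io} requests {pct}% (ambient request is {AMBIENT_REQUEST}%)")
```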
The MXL/IOA starts requesting a higher fan speed when its temperature crosses the high threshold of approximately 76C. It keeps requesting increased fan speed until the temperature drops back below 76C, and fan speed does not start to decrease until the temperature drops to about 60C.
IOM Health 1
Temperature: <= 60C. At or below normal operating temperature.
CMC reaction: Fan speed reduced 4% every 20s.
IOM Health 2
Temperature: 61 to 75C. Normal operating temperature.
CMC reaction: No changes to fan speed.
IOM Health 3
Temperature: 76 to 83C. Elevated operating temperature, more cooling needed.
CMC reaction: Fan speed increased 5% every 5s.
IOM Health 4
Temperature: 84 to 85C. Critical temperature, max cooling needed.
CMC reaction: Fan speed increased 20% every 5s.
IOM Health 5
Temperature: >= 86C. System over temperature, thermal trip condition.
CMC reaction: Fan speed at 100% PWM, and the IOM shuts down after 5s.
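The health levels above amount to a threshold lookup. The following is a minimal sketch of that lookup, assuming whole-degree temperature readings; the names `HealthLevel` and `iom_health` are hypothetical and do not correspond to any CMC or racadm interface. It only restates the thresholds listed above and is not the CMC's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthLevel:
    level: int
    description: str
    cmc_reaction: str


# Inclusive upper temperature bound (C) for health levels 1 to 4,
# copied from the list above.
_LEVELS = [
    (60, HealthLevel(1, "At or below normal operating temperature",
                     "Fan speed reduced 4% every 20s")),
    (75, HealthLevel(2, "Normal operating temperature",
                     "No changes to fan speed")),
    (83, HealthLevel(3, "Elevated operating temperature, more cooling needed",
                     "Fan speed increased 5% every 5s")),
    (85, HealthLevel(4, "Critical temperature, max cooling needed",
                     "Fan speed increased 20% every 5s")),
]

# Anything above 85C falls into the thermal trip level.
_OVER_TEMP = HealthLevel(5, "System over temperature, thermal trip condition",
                         "Fan speed at 100% PWM, IOM shuts down after 5s")


def iom_health(temp_c: int) -> HealthLevel:
    """Map an IOM temperature in C to the health level listed above."""
    for upper_bound, health in _LEVELS:
        if temp_c <= upper_bound:
            return health
    return _OVER_TEMP


if __name__ == "__main__":
    for temp in (55, 70, 78, 84, 90):
        health = iom_health(temp)
        print(f"{temp}C -> IOM Health {health.level}: {health.cmc_reaction}")
```

Levels 1 and 3 are separated by a band (level 2) in which fan speed is left unchanged, which is why an IOM that has heated past 76C can hold an elevated fan request for some time after it cools.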
When an MXL or IOA is inserted in the chassis or reseated, or when the CMC reboots, the module normally goes through a learning process to find the fan speed that provides temperature stability for the IOM. This learning process causes intended oscillations in fan speed, and the chassis may go to 80% or even 100% PWM once or twice before stabilizing. The learning process normally takes 20 to 30 minutes to complete, but it can sometimes take up to 1 hour because of interference from server blade fan requests.
Customers are sometimes concerned that MXL/IOA modules installed in different chassis stabilize at different fan speeds. Comparing the fan speeds of different IOMs is meaningful only under a strict set of conditions.
To make such a comparison, the IOMs need to have the same:
- room temperature
- slot in which the IOM is installed
- number and type of external modules installed in the MXL/IOA
- number of active internal and external links
- number and type of fans installed
- number and type of neighboring IOMs active
- number and type of server blades active
- presence or absence of dummies in empty slots
- traffic
All of these factors affect how much heat the MXL/IOA generates and dissipates, and therefore how much cooling it needs to reach temperature stability.
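As an illustration of how strict that comparison is, the checklist above can be written down as a record and two IOMs compared field by field. This is a sketch only: the class `IomConditions`, its field names, and the sample values are hypothetical and chosen for this example; none of them come from the CMC.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class IomConditions:
    """One field per item in the checklist above (names are illustrative)."""
    room_temperature_c: float
    slot: str
    external_modules: Tuple[str, ...]
    active_links: int
    fans: Tuple[str, ...]
    neighboring_ioms: Tuple[str, ...]
    active_blades: Tuple[str, ...]
    dummies_in_empty_slots: bool
    traffic: str


def differing_conditions(a: IomConditions, b: IomConditions) -> List[str]:
    """Return the names of the conditions that differ between two IOMs.

    An empty list means the fan speeds can be compared meaningfully.
    """
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


if __name__ == "__main__":
    # Made-up sample values for two IOMs in different chassis.
    chassis_a = IomConditions(22.0, "A1", ("QSFP+",), 12, ("fan",) * 9,
                              ("MXL",), ("M620",) * 10, True, "light")
    chassis_b = IomConditions(25.0, "A2", ("QSFP+",), 12, ("fan",) * 9,
                              ("MXL",), ("M620",) * 10, True, "light")
    print(differing_conditions(chassis_a, chassis_b))
```

If any field differs, a difference in stable fan speed between the two IOMs is expected and is not, on its own, a reason for concern.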