
2 Bronze

Sudden Noisy Fans

Hello all,

I have a DELL PowerEdge 6850 Server. Suddenly this morning my fans went on full speed without any reason (No heat problems, server room is well conditioned).

The server is running normally, only the fans are at full speed and noisy. Updated my BIOS but this didn't solve the problem.

Please advise a.s.a.p.

Replies (8)
Moderator

Naeimsaad,

If there isn't an error being reported regarding a fan failure, then I would suggest you start with updating the BMC firmware that's located here. The reason is that the fans are primarily controlled by the BMC, so if its firmware falls behind, it can cause the fans to go to maximum.

Let me know if this resolves the issue.


Chris Hawk
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#Iwork4Dell

2 Bronze

Thank you for your fast response. I will try this solution and get you the feedback.

3 Zinc

I can understand a hardware fault, say a failed motherboard temperature sensor, being picked up by the BMC firmware, which then sets the fans to full blast to protect the equipment, much the same way the BMC sets the fans to full blast when a fan fails, to protect the equipment from overheating.

What I have difficulty understanding is how, in the absence of any hardware or firmware changes, the fans go full blast and the solution is to update System & BMC firmware in the hope that something magic will happen.

In my view, if there were no firmware or hardware changes and the fans simply went nuts with no apparent faults being detected, then I'd say there is some logic fault in System and/or BMC firmware that Dell should investigate and correct.

All too often the solution seems to be "update firmware", and in many instances this may resolve (or simply mask) the problem. In other instances we read that firmware updates result in bricks and subsequent hardware replacement being needed.

I just wish the change logs associated with firmware updates were much more detailed so we could see what has been corrected, changed, etc., with dependencies on other components clearly identified. That way we could clearly understand whether an update will indeed resolve the issues we are having.

7 Thorium

I agree with you in that I wish the release notes were much more detailed than they are, but the fact is that they are not nearly as detailed as they should be and many changes go undocumented, which means that updating the firmware is a logical first step - to correct any bugs detected and corrected by Dell engineering teams. 99.999% of the time or better, there is no harm in running the updates anyway - they are preventative maintenance and generally address known issues and should be installed proactively - why would you wait until you experience the issues before you installed the fix for it? Bricking hardware from failed firmware updates is EXCEEDINGLY rare, way too rare to justify any fear or paranoia over running them.

2 Bronze

Dear all, 

The log is giving an event of "Fan redundancy lost". I tried many things and found that even though the 4 fans are running at full speed, 2 of them are not being read by the Dell Administrator. It's saying that 2 are at full speed and 2 are at 0 speed.

I moved around the fans on the slots, the slots can still read the 2 fans that were running but the other 2 fans are still giving 0 on all slots.

I think that concludes that even though those 2 fans are running, they are damaged or at least have board issues and need to be replaced.
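The check described above (some fans reporting 0 RPM while the others report a normal speed) can be sketched in a few lines. This is only an illustration: the `name | reading` report format, the sample values, and the function name `find_stalled_readings` are all hypothetical and are not the actual output or interface of any Dell tool.

```python
def find_stalled_readings(report):
    """Return the names of fans whose reported speed is 0 RPM.

    `report` is text in a made-up format, one fan per line:
    "<fan name> | <rpm> RPM". This is an assumed format for illustration.
    """
    stalled = []
    for line in report.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition("|")
        rpm = int(value.split()[0])  # first token of the reading is the number
        if rpm == 0:
            stalled.append(name.strip())
    return sorted(stalled)


# Hypothetical readings resembling the situation in this thread:
# two fans report a speed, two report 0 even though they are spinning.
SAMPLE_REPORT = """
FAN 1 | 0 RPM
FAN 2 | 6000 RPM
FAN 3 | 0 RPM
FAN 4 | 6000 RPM
"""

stalled_fans = find_stalled_readings(SAMPLE_REPORT)
print(stalled_fans)
```

A fan that is visibly spinning but reads 0 RPM points at the speed reading rather than the fan motor, which is why the next step is to find out whether the zero reading belongs to the fan or to the slot.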

any thoughts??

3 Zinc

Naeimsaad,

The fan speed sensor is part of each fan. The fan speed signal is sent to the motherboard via the tacho wire (which is just one of four wires coming from the fan and plugged into the motherboard).

Now I'm not sure about the mechanics of your PE-6850, but when you swapped the fans, did the fault follow the fans that the system previously indicated as having 0RPM? If so, you may simply need to buy replacement fans, as it's either the tacho wire or the internal speed sensor within the fan that is broken. If the fault does not move when you swap the fans, there is likely something wrong with the motherboard, and in that case I'd look at the fan connector for a bent pin or pin corrosion of some kind before considering replacing the motherboard.
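The swap test can be written down as a small decision rule. The following is a minimal sketch under assumptions: the function name `diagnose_swap_test`, the fan labels, and the RPM values are hypothetical, and a reading of 0 RPM is taken to mean "faulty reading".

```python
def diagnose_swap_test(before, after):
    """Compare fan readings taken before and after swapping fans between slots.

    Each argument maps a slot number to a (fan_label, rpm) tuple.
    The labels are whatever you use to tell the physical fans apart.
    """
    bad_fans_before = {fan for fan, rpm in before.values() if rpm == 0}
    bad_fans_after = {fan for fan, rpm in after.values() if rpm == 0}
    bad_slots_before = {slot for slot, (_, rpm) in before.items() if rpm == 0}
    bad_slots_after = {slot for slot, (_, rpm) in after.items() if rpm == 0}

    if not bad_fans_before:
        return "no fault observed"
    # Same fans read 0 RPM in different slots: the fault travels with the fan.
    if bad_fans_after == bad_fans_before and bad_slots_after != bad_slots_before:
        return "fault follows the fan: suspect the tacho wire or the fan's speed sensor"
    # Same slots read 0 RPM with different fans: the fault stays with the board.
    if bad_slots_after == bad_slots_before and bad_fans_after != bad_fans_before:
        return "fault stays with the slot: inspect the connector, then the motherboard"
    return "inconclusive: repeat the swap with different positions"


# Hypothetical readings: fans A and B read 0 RPM in slots 1 and 2,
# and still read 0 RPM after being moved to slots 3 and 4.
before = {1: ("A", 0), 2: ("B", 0), 3: ("C", 6000), 4: ("D", 6000)}
after = {1: ("C", 6000), 2: ("D", 6000), 3: ("A", 0), 4: ("B", 0)}
print(diagnose_swap_test(before, after))
```

The rule only gives a clear answer when the faulty fans actually change slots, so a swap that puts a faulty fan into a slot that was already reporting 0 RPM tells you nothing.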

theflash,

I agree that in 99.99% of cases there is no harm in updating firmware, especially when a UPS is in place, but in principle I don't think a firmware update should be the first port of call when fault finding. Restarts, firmware updates and the like can clean/reset the firmware state and thus mask the problem, which could simply come back later. Better to follow appropriate fault finding methodologies to diagnose and then resolve the fault. Any fault finding methodology that relies first on firmware update is flawed.

In this case we have found out that a hardware fault was indicated, and the solution was not to update System (BIOS) and BMC firmware but to note which fans were reported as faulty and then swap the fans around, noting whether the fault followed the fans, with subsequent hardware replacement as appropriate.

Updating the System, BMC and/or other firmware to mitigate some issues should be part of any proactive periodic maintenance, just not an automatic first step in unplanned fault diagnosis.

Even if the risk of bricking during a firmware update is small, it nevertheless still exists. Unlike days gone by, when one could simply pull the BIOS chip and place it in a PROM writer before putting it back in the motherboard, these days a new motherboard is needed when a flash has gone bad.

And my firmware update paranoia is due to having been bitten by bad flashes, just not with Dell servers as yet.

2 Bronze

Dear SKYLARKING,

As I was going to update the firmware after I updated the BIOS, I came across this "Dell Admin" tool, which helped me see that there were 2 fans giving faulty results (0RPM), and yes, this fault followed the fans when I was swapping them among all slots. The other 2 fans reported normally.

So I'm about to replace the faulty fans for now and check whether the speeds drop to normal and the Dell Admin tool reports normal fans...

Thank you very much for your help

7 Thorium

"Any fault finding methodology that relies first on firmware update is flawed."

I disagree. I have worked with many servers and computers over the years, and I believe firmware to be an integral part of troubleshooting and maintaining a healthy system and network.

I wish I could tell you exactly which systems and firmware versions, but:

There is one server whose ESM/BMC update specifies a fix for a "fan reporting error". There are several servers whose ESM/BMC updates address fan speeds and cooling efficiency. Seems relevant in this case, even if not actually applicable.

There are many servers whose BIOS updates address SBE errors. You could waste a lot of time, energy, and downtime troubleshooting DIMMs and slots when it is a firmware bug.

There is an HDD firmware update that specifies a fix for a "false SMART fail", which cannot be reversed, so if the update is not installed and the drive exhibits the bug, it is needlessly ruined for lack of proper server maintenance.

There are several backplane updates that address issues with random or specific types of timeouts on disk connections and/or power management. You could spend hours and hours running disk diagnostics, swapping slots, swapping disks, reinstalling, etc. and not find a satisfactory solution, causing you to replace the disks, controller, backplane, or even the server.

Granted (and it is part of my [side] point) not all fixes are documented. That is a disservice to customers and to Dell, but given the fact that many bug fixes are documented and knowing that many are undocumented, it makes way more sense to me to start with firmware updates. PROPER server maintenance would have you installing these updates on a regular schedule anyway. Not updating system firmware is just inviting problems and bugs that need troubleshooting and puts your data and environment at risk. Once the server hardware is running as bug-free as Dell has designed, then more rigorous troubleshooting can be more reliable and fruitful.

I don't automatically advise updating the firmware, particularly when the system seems unstable, but when it seems safe to do so, I believe it to be an important FIRST step, as many troubleshooting steps would be in vain with out-of-date firmware. Only Dell engineering knows what goes into the updates. Dell Support doesn't have any more idea than we do. I understand that for them it is easier (and probably required) to make sure the firmware is updated before further troubleshooting ... a methodology I understand AND support.
