I have a PowerEdge 2970 running ESXi 4.
It has a DRAC5, a PERC6, and 2x quad-port NICs installed.
It's located in a ventilated environment where the temperature never goes above 25 degrees Celsius.
The problem is the fan noise: it's too loud now that I have attached the two NICs.
I did some testing and found this:
Only DRAC5 attached. Fans: 4350 rpm.
DRAC5 + PERC6 attached. Fans: 5700 rpm.
DRAC5 + PERC6 + 1 NIC attached. Fans: 6300 rpm.
DRAC5 + PERC6 + 2 NIC attached. Fans: 8700 rpm.
When I ran these tests, the system board probe temperature was 21 degrees Celsius.
Is there any way to turn the fans down? (I'm sure 4350 rpm will keep it cool and happy.)
Please let me know.
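For anyone who wants the numbers side by side, here is a small Python sketch that takes the four readings above and works out how much each added card raised the fan speed. The rpm values are the ones measured in this post; the function name `rpm_steps` is just something made up for illustration.

```python
# Fan readings reported in this post (rpm), in the order the cards were added.
readings = [
    ("DRAC5 only", 4350),
    ("DRAC5 + PERC6", 5700),
    ("DRAC5 + PERC6 + 1 NIC", 6300),
    ("DRAC5 + PERC6 + 2 NICs", 8700),
]


def rpm_steps(rows):
    """Return (config, rpm, increase over previous config, ratio to first config)."""
    baseline = rows[0][1]
    previous = baseline
    result = []
    for config, rpm in rows:
        result.append((config, rpm, rpm - previous, rpm / baseline))
        previous = rpm
    return result


for config, rpm, step, ratio in rpm_steps(readings):
    print(f"{config:<24} {rpm:>5} rpm  +{step:<5} {ratio:.2f}x baseline")
```

The second NIC adds 2400 rpm, four times the 600 rpm added by the first one, and the final speed is exactly double the DRAC5-only baseline, all at the same 21 degree board temperature.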
I think it's the BMC that controls this.
I just looked at mine... under 5K
We have similar boxes.. 12 cores, 24gb ram, Perc 6, only 2 dual port nics.. I think the Perc 6 is a Perc 6/i though.
As John said, it is indeed the BMC that monitors the system for airflow/pressure, temperature, and installed devices ... it is not just a function of the internal temperature. As you have seen, it is programmed to up the speed when cards in general are inserted - some are specific cards.
There is no way to "control" the fan speed, but often, updating the BIOS and BMC/ESM will add code to make sure that it is cooling as efficiently as possible - sometimes resulting in lower fan speeds.
I also have a similar issue with the BMC that results in high fan speed.
I also have this issue with my PowerEdge 2970: the fans spin very fast and the noise is very high. There is always an error shown at boot that indicates the BMC is bad. The server is deployed in an air-conditioned environment, so the temperature is always low.
My question is: can I use the server with the fan rpm this high (given that I am sure it is not due to high temperature but probably due to the malfunctioning BMC)? Or will running the server at this high fan rpm crash it or lead to other issues?
Yes, you can use it ... the server only shuts down once it reaches critical status, so as long as there is not actually a cooling issue, it should never reach critical.
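To illustrate the point about critical status, here is a minimal Python sketch of threshold-based classification. The threshold values and the names `classify` and `should_shut_down` are hypothetical; they are not Dell's actual values or firmware logic, only the general idea of a shutdown being tied to a critical threshold rather than to fan speed.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


# Hypothetical thresholds in degrees Celsius (assumed, NOT Dell's real values).
WARNING_C = 42.0
CRITICAL_C = 47.0


def classify(temp_c, warning_c=WARNING_C, critical_c=CRITICAL_C):
    """Map a temperature reading to a status level."""
    if temp_c >= critical_c:
        return Status.CRITICAL
    if temp_c >= warning_c:
        return Status.WARNING
    return Status.OK


def should_shut_down(temp_c):
    """Only a critical reading triggers a shutdown in this sketch."""
    return classify(temp_c) is Status.CRITICAL


print(classify(21.0), should_shut_down(21.0))
```

Under these assumed thresholds, a 21 degree reading like the one reported earlier in the thread stays well inside the OK band no matter how fast the fans are spinning.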
Try the following:
What expansion cards do you have installed? How many drives? Which RAID controller? How much RAM?
The fan logic embedded within the BMC likely uses a rather coarse cooling strategy. From my very limited experience with a T610, this strategy seems rather pessimistic: it can't accurately correlate intake air temperature and device heat output with the required airflow, so it chooses an overly fast and thus ridiculously noisy fan speed. That may not matter much in dedicated server rooms (though OHS may have a different view), but it is completely unacceptable for those of us who use a pedestal server located in an office.
Though the issue may have been resolved in the latest and greatest gen12 servers, Dell still needs to pull its finger out, rethink the cooling algorithm embedded within the BMC, and release a firmware fix for the older-generation servers (especially pedestal servers). Such a fix is way overdue.
At boot I usually get the message "Baseboard Management Controller Communication failure", and on checking the Event Viewer, one of the system log entries is:
"The IPMI device driver attempted to communicate with the
IPMI BMC device during device creation.
However the communication failed due to a timeout.
Either the configured timeouts are too small and the
IPMI BMC device responses take longer than the timeouts
or the BMC has a technical fault.
You can increase the timeouts associated with the
IPMI device driver or seek assistance from the BMC device manufacturer."
I believe all of these contribute to the high fan speed and to the front panel display not showing anything. Any idea how the BMC can be replaced or updated?
From my understanding, as part of the standards applied to motherboard design, the fan speed is initially ramped up at start to get the fan spinning, then quickly slowed down to the rpm required by the cooling strategy. Since the fan's current/rpm characteristics are known, there should be a fan warning/error indication if a fan is clogged with dust or fluff and is thus performing out of bounds. Logic can also be used to raise a warning or error should the temperature sensors indicate that the system is hotter than the intake air temperature and airflow dictate it should be (though this latter part is not clearly defined in the docs I have read).
The problem with lots of differing systems (not just Dell) is that some warning and error indications are a little abstract and at times less than clear as to where the problem may reside. There is always room for improvement.
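The out-of-bounds fan check described above could look roughly like this in Python. This is only a sketch under stated assumptions: the linear fan curve, the 9000 rpm maximum, the 15% tolerance, and the names `expected_rpm` and `fan_status` are all hypothetical and are not taken from any Dell documentation.

```python
def expected_rpm(duty, max_rpm=9000):
    """Hypothetical linear fan curve: duty cycle (0.0 to 1.0) -> expected rpm."""
    return duty * max_rpm


def fan_status(duty, measured_rpm, max_rpm=9000, tolerance=0.15):
    """Compare the measured rpm with what the commanded duty cycle should give.

    Within `tolerance` of the expected rpm -> "ok"
    Within twice the tolerance             -> "warning"
    Anything further out                   -> "error" (e.g. clogged or failing fan)
    """
    target = expected_rpm(duty, max_rpm)
    if target == 0:
        return "ok" if measured_rpm == 0 else "error"
    deviation = abs(measured_rpm - target) / target
    if deviation <= tolerance:
        return "ok"
    if deviation <= 2 * tolerance:
        return "warning"
    return "error"


for rpm in (4500, 3500, 2000):
    print(rpm, fan_status(0.5, rpm))
```

A fan commanded to 50% that only manages 2000 rpm would be flagged as an error here, which is the kind of clear indication I would expect from the hardware.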
petteri's complaint is that the fans are spinning too fast for the conditions (and presumably that it's annoying). There has not been any indication of h/w issues here.
tohalete has a similar fan noise complaint but he also has a BMC problem (and wants to know if he should be concerned about the server).
We know that in Dell servers, the MOBO/BMC combo controls the fan speed using logic defined and held in secret by Dell. An owner can take no actions to modify the cooling strategies via BIOS, etc. At times the fans run much faster than actually needed.
As such, petteri's problem can only be fixed by Dell exposing more of the cooling strategy via the BIOS so that owners of the equipment can adjust it appropriately for their conditions of use.
tohalete's problem may be resolved by correcting the underlying BMC errors using your (thefalsh's) suggestions, but any underlying noise concerns he may have will likely remain. Again, Dell needs to expose a little more of the cooling strategy via the BIOS and allow us to tweak it.
BUT, having had issues with Dell and HP products, and considering that resolution has been handled via component swap-out rather than true h/w root cause analysis, one can only conclude that the HW must be way overpriced to cater for this support strategy and that the designers are too shielded from reality. So to expect Dell to release a firmware fix for what is a poor cooling strategy for pedestal servers is a BIG ask (of their support structure) and one not likely to materialize.
So yes, my comments are clearly lost on this one... But they shouldn't be...