November 24th, 2010 13:00

M1000e fans and the M610x...what's happening?

Has anyone had a problem with the M610x causing all the fans in the chassis to ramp up to 14-15K RPM and stay there?

We have M1000e with:
M610x in slots (1&9) and (2&10)
M600 in slots 3, 4, 5, 6, 7, 8, 11 & 12
M610 in slots 13 & 14

Chassis power capped at 5500W. 6 PSUs give 2360W each.

Without the M610x servers on, all the fans run between 4.7K and 6.1K RPM...temps are low and nowhere near the power cap I've set.

Switch on an M610x and it takes off...all fans immediately go up over 14K and they stay there.

I've just spent 8 hours overtime updating every firmware I could find and still no joy.

And by the way, it's a show stopper. The extra power drawn by the chassis when all fans hit the roof takes us perilously close to our allowance in that rack. Are Dell aware of this? Is there a firmware fix on the way? Anything to give me hope?

I have a call raised with ProSupport (UK) 826469471 and have also found this poor soul with a similar issue: http://www.delltechcenter.com/thread/4317201/M1000e+fan+control

Is there an issue with the BIOS/CPLD of the M610x that needs looking at maybe?

Regards,

Neil

December 8th, 2010 01:00

I will open a support call today

December 8th, 2010 10:00

Dell have picked up one of our M610x servers - it's going to their facility in Montpellier (it's no use to us right now) in order to determine if it's a problem with the servers or the chassis. Evidence is beginning to mount that it's the latter, by the looks of things, though. Let's hope we get to the bottom of it all, and let's keep it alive until we do.

December 9th, 2010 01:00

Hi, KongY,
The M610x has two Xeon L5640 processors and 96GB of memory.
BIOS version is 2.1.16.

I have already informed Dell Japan customer support, and they replaced the memory unit because the CMC showed "Mem ECC Warning", but after a couple of hours the same warning appeared again and the fans are staying at maximum speed.
I do not know how to "open a support case" other than just reporting the problem to Dell support.

December 9th, 2010 07:00

Numao,

If you called Dell customer support and opened a case, they would assign you a support #. Stay tuned as PG and IPS are finalizing root-cause analysis.

KongY@Dell

December 9th, 2010 07:00

Neil,

Sorry to hear about more of your issues. Have you opened a support case for the M600s? I will ping PG and IPS again.

KongY@Dell

December 9th, 2010 07:00

The problem was reproduced when Dell plugged our blade into their chassis - so Dell claim that the server is at fault. As a contingency, they have put through a replacement order, so hopefully we will soon have two new servers which work.

More problems though - after we've upgraded all the firmwares on the chassis and our existing M600, we've hit other problems worse than the first!

Our M600s stop communicating with the chassis - and then come back with no intervention - not always at the same time. iDRACs stop responding and then come back to life with no intervention. The only symptom we see is IPMIDRV 1004 warnings in the Windows System event log. But if we're unlucky and a couple of servers in a cluster get struck at the same time, we lose Exchange or our Hyper-V cluster. Not good. Not good at all.

I've raised a separate thread here... http://www.delltechcenter.com/thread/4378686/IPMIDRV+1004+errors+on+Windows+2008+R2+-+M600+blades

December 9th, 2010 08:00

It's all coming back to me. About 6 months ago we had an amber light on one of our M600s (in a different chassis) and at the same time the fans kicked off. When we pulled out the blade, the fans went back to normal. The fault turned out to be a faulty DIMM in the blade.
However, in my current situation - fans running at full speed - there are no error lights on the blades and no indication of problems on any of them. Unfortunately they are all part of a critical system, so it is going to be a while before I am allowed to power them down.
Also, on the call I logged earlier, the advice was to update the BIOS and firmware on the chassis and blades. Following your comments, nrawlinson, I might hold off doing that!!

December 9th, 2010 08:00

Your drivers are probably already all up to date. If they aren't, make sure that you have the latest Intel "chipset" drivers. While provisioning all of our M610s, I installed the chipset drivers. During the driver install, the fans cranked up to full speed. Once the install was finished, they went back to normal operation. I admit this is a long shot, but it is definitely worth checking.

December 9th, 2010 08:00

Thanks David. Might be worth posting a link - but we've established it's not a software issue. Dell engineers have made progress and are working on a fix - some sort of problem between the processors and iDRACs.

December 9th, 2010 09:00

Yes - I certainly would hold fire on the updates for now...I've got a separate call open with Dell for this new problem - and I'll update that issue on the other forum post I made.

http://www.delltechcenter.com/thread/4378686/IPMIDRV+1004+errors+on+Windows+2008+R2+-+M600+blades

I have been impressed with the way this forum works, and the fact that Dell are so responsive.

December 10th, 2010 08:00

Latest on the M610x and the fans ramping up is this:

From looking at the server we sent off to Dell, they've discovered a problem between the iDRAC and the "low-power" Intel CPUs in there. Ours are the L5640 2.26GHz. A new iDRAC firmware is in testing; assuming all goes well, it should be available some time next week.

Does this tie in with anyone else's problem - especially Numao perhaps?

December 14th, 2010 18:00

Yes, we use the same CPU: L5640. So the problem is the chemistry between the CPU and the firmware?
The difference from nrawlinson's case is that once we remove the GPU card from the M610x, the fans calm down to normal speed.

Any good news?

December 14th, 2010 23:00

We're just awaiting the new iDRAC firmware. Hopefully not too long now!

December 17th, 2010 03:00

We've been offered a "Beta" firmware, but this is for our live production systems so I'm a little wary.

December 22nd, 2010 13:00

After some reassurances, we agreed to Dell's disclaimer...the beta firmware (3.04) is in and, at least initially, has solved the issue. Looking forward to a full public release now, sometime in 2011. Hopefully that's it for this issue and we can get on with our project finally. Thanks for watching!

Neil