Unsolved

December 9th, 2010 03:00

IPMIDRV 1004 errors on Windows 2008 R2 - M600 blades

Since updating the firmware on all our M600 blades in an attempt to resolve another issue (http://www.delltechcenter.com/thread/4353610/M1000e+fans+and+the+M610x...what%27s+happening%3F), we have started to have problems with various network connections randomly dropping out, usually for about 10 minutes, before returning with no intervention. Throughout each outage, the following error is logged in the Windows System event log:

Log Name: System
Source: IPMIDRV
Date: 09/12/2010 09:53:46
Event ID: 1004
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: hostname.domain.com
Description:
The IPMI device driver attempted to communicate with the IPMI BMC device during normal operation. However the communication failed due to a timeout. You can increase the timeouts associated with the IPMI device driver.
Event Xml:
EventID: 1004
Level: 3
Task: 0
Keywords: 0x80000000000000
EventRecordID: 3638
Channel: System
Computer: hostname.domain.com
Data: \Device\00000066
Binary: 000004000100000000000000EC030580000000000000000000000000000000000000000000000000E50000C0

Can anyone enlighten me as to what the problem might be?
We've had these errors on at least 5 of our 8 M600s, all on Windows 2008 R2.
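
A side note on the binary data in the event above: read as little-endian 32-bit values, two of its byte groups look meaningful. The sketch below is only an illustration in Python (the helper name `le_dword` is made up for this example) and assumes the blob uses the little-endian byte order that Windows uses on x86/x64. `EC030580` decodes to 0x800503EC, whose low 16 bits are 1004, matching the Event ID. `E50000C0` decodes to 0xC00000E5, which has the shape of an NTSTATUS error value. Treat that reading as a guess, not something confirmed by Dell or Microsoft.

```python
def le_dword(hex_group: str) -> int:
    """Decode 8 hex characters (4 bytes) as a little-endian unsigned 32-bit value."""
    raw = bytes.fromhex(hex_group)
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes (8 hex characters)")
    return int.from_bytes(raw, byteorder="little")


# Two byte groups copied from the Binary field of the event above.
error_code = le_dword("EC030580")
final_value = le_dword("E50000C0")

print(f"{error_code:#010x}")   # full 32-bit value
print(error_code & 0xFFFF)     # low 16 bits, to compare with the Event ID
print(f"{final_value:#010x}")  # trailing value in the blob
```

If the low 16 bits really are the event ID, the blob is just restating the 1004 timeout rather than adding a second, different error.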

January 24th, 2011 08:00

Not fixed by Intel NICs.

180 Posts

January 24th, 2011 10:00

Neil,

Are you still engaging with support and IPS on this issue? No new updates from my PG contacts.

KongY@Dell

January 24th, 2011 11:00

Had a conference call today with escalation from Dell. We've decided to swap in a new chassis and let IPS take a look at ours in the labs.

6 Posts

March 11th, 2011 07:00

Neil,

I am experiencing the same issue - Dell M1000e chassis, M600 blade w/v2.3.1 firmware, iDRAC v1.5.3. I get the IPMI errors (1004) every couple of days for about 2 minutes. Fortunately this has only happened after formal business hours, but it is very disconcerting because it is on our Exchange 2007 CAS and would be very obvious to users.

Have you, or Dell, found a resolution to the issue?

Thanks in advance,

Chris

March 12th, 2011 07:00

Not as yet - we're swapping out the blade chassis tomorrow...Dell want to get ours into the labs. Is that your only M600 in the chassis? Updated any firmware recently? What else is in the chassis? Any Hyper-V?

6 Posts

March 12th, 2011 19:00

Our chassis is about 3 years old and still has the original 8 blades that we purchased with it - all M600s. The one having the issue may be the only one at the 2.3.1 firmware (others are at the older 2.1.4, I think), but it is definitely the only one running Server 2008 R2. No Hyper-V, but 4 of the 8 blades are running VMware ESX 4 and have no issues. We only started having the issue when I repurposed the one blade, updated the firmware, and installed Server 2008 R2 - that was about 4 months ago.

Let me know if I can provide any other information.

Chris

March 13th, 2011 01:00

That is interesting...maybe that will narrow down the search. Hopefully someone from Dell might be able to take a look at that. I'll update to let you know how today's chassis swap-out goes.

March 15th, 2011 06:00

And which CPU(s) in there?

March 15th, 2011 06:00

Chassis change has not fixed it...midplane revision is 1.1 in this one, as opposed to 1.0 in our old chassis. Given this info from you, Chris, I'm going to downlevel one of our M600s that we've had to take out of production to see if I can nail the problem. No comment from Dell on your issue as yet, I see...maybe they're looking into it already. Just out of interest, could you post some further info for your R2 blade...which NICs/firmware/drivers?

6 Posts

March 15th, 2011 07:00

I'm sorry to hear that the chassis replacement did not work. My suspicion here is that it is a problem with the combination of the Server 2008 R2 IPMI device driver and the M600 blade. I went back and checked the BIOS on all of my blades and found that I misspoke in my previous e-mail - all 4 of the ESX servers are running BIOS 2.3.1, along with the R2 server that is experiencing the problem. The other 3 M600s are running the older 2.1.4 BIOS.

In response to your request, below is information on hardware and driver versions that I retrieved from Server Administrator/OpenManage:

NICS: Broadcom BCM5708S NetXtreme II GigE FRMW 5.2.7 DRVR 4.6.110.0 (there are 4 of these, Fabrics A and B)
CPUS: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz Model 23 Stepping 6 (2 of these)

Please let me know if you need additional information.

Chris

March 15th, 2011 10:00

We've found the IPMI errors are a symptom of a deeper problem. The iDRAC stops responding when it happens, and there are IPMI errors in the chassis logs too...and I've seen this happen when the server is sat in the BIOS screen, no O/S loaded. It may be your problem is different, but in our case, the IPMI errors are symptomatic of a hardware issue. Thanks for the update, though. I'll let you know if I find anything tomorrow.

6 Posts

March 15th, 2011 11:00

I have not seen any IPMI errors in the logs for the chassis. Maybe I'm not recognizing them, but when I go under the logs tab, I do not see any that appear to be related to the problem under either the Hardware Log or the CMC Log.

March 16th, 2011 01:00

If you run up an SSH command session to the CMC and run RACDUMP, you'll probably see them. Search the output for IPMI.
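
If the RACDUMP output is long, it can be easier to save it to a text file and filter it. A minimal Python sketch, assuming the output has been captured to a file (the name `racdump.txt` is hypothetical) and that matching the string "IPMI" regardless of case is enough to catch the relevant entries:

```python
from typing import Iterable, List


def ipmi_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention IPMI, ignoring case, with line endings stripped."""
    return [line.rstrip("\r\n") for line in lines if "ipmi" in line.lower()]


def ipmi_lines_from_file(path: str) -> List[str]:
    """Read a saved dump (file name chosen by the caller) and keep only IPMI lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return ipmi_lines(handle)
```

For example, `for line in ipmi_lines_from_file("racdump.txt"): print(line)` prints only the matching entries.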

March 21st, 2011 03:00

Try running C:\PROGRAM FILES (x86)\DELL\SYSMGT\RAC5\RACADM.EXE RESETRAC on the server that keeps having IPMI errors. It's only been over the weekend, but I've not had any errors since I ran this command. Throughout all this, I've been managing the iDRACs through the chassis. Maybe someone at Dell can enlighten us on whether this "soft" reset through the racadm command on the local server does something that a hard reset, such as a reseat or a virtual reseat, does not.

6 Posts

March 22nd, 2011 14:00

I tried running the RACADM command on the server as you suggested. I found the syntax to be a bit different from what you specified (RACRESET instead of RESETRAC), but unfortunately I am still receiving the IPMI errors. The weird thing is that they only occur after normal production hours, but never at the exact same time on each occurrence.

Thanks anyway for suggesting this - I hope it's still working for you.