R910 RAID array statistics not available through iDRAC. How to get them?

Question

I've got an R910, this one:

System Information:
System Model          = PowerEdge R910
System Revision       = I
System BIOS Version   = 2.2.3
Service Tag           = 7LH**** (Edited)
Express Svc Code      = 165******** (Edited)
Host Name             = NEPTUNE
OS Name               = VMware ESXi 5.0.0 build-469512
Power Status          = ON

Unfortunately for me, finance decided that since it had been running a long time and would likely keep doing so, we could save the support money this year. Great. Naturally, I now have a problem with my awesome R910, which runs much of my infrastructure and had been rock solid until this weekend.

I noticed many of my VMs experiencing storage access issues and remounting their drives read-only. Running fsck and rebooting fixes the issue, but it recurs, and my iDRAC is no help when I try to troubleshoot this. I have no visibility into the array controller or the condition of the disks. The iDRAC is having trouble serving web pages at this point, but I do have access to the command line. Are there racadm commands I can use to see if I have failed drives?

Thanks for your help.

Jack_the_coiner · Answer

Does this stand for what I think it stands for?

mass-storage-fu

racdump shows me 3 of these:

514 root            SW< [mass-storage-fu]
515 root            SW< [mass-storage-fu]
516 root            SW< [mass-storage-fu]

Jack_the_coiner · Answer

Here is some more color: see the attached picture, which shows the console of the ESXi server. It appears that one of the SCSI disks is in trouble.

I think if I pull that one out, the RAID array can run in a degraded state until I can get a replacement, but how do I tell which physical device corresponds to 6782bcb052095c001619190425be4e6? This is the challenge now.
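
One way to narrow this down is to match the ID from the console against the device list the host reports. Below is a minimal sketch, assuming the output of `esxcli storage core device list` has been saved to text and follows the usual layout of an unindented device ID followed by indented "Key: Value" lines. The sample text, its field values, and the `naa.` prefix on the ID are made up for illustration and are not taken from this server. Note that behind a RAID controller an ID like this usually identifies the logical volume the controller presents, not an individual physical drive, so a match may only point at the affected array.

```python
def parse_device_list(text):
    """Parse device-list style text into {device_id: {field: value}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new device block; the line is the ID.
            current = line.strip()
            devices[current] = {}
        elif current is not None and ":" in line:
            # Indented lines are "Key: Value"; split on the first colon only.
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def find_device(devices, fragment):
    """Return (device_id, fields) pairs whose ID contains the fragment."""
    return [(dev_id, fields) for dev_id, fields in devices.items()
            if fragment in dev_id]


# Hypothetical sample text: every field value below is invented.
SAMPLE = """\
naa.6782bcb052095c001619190425be4e6
   Display Name: Local DELL Disk (naa.6782bcb052095c001619190425be4e6)
   Size: 571776
   Vendor: DELL
   Model: EXAMPLE-RAID
mpx.vmhba32:C0:T0:L0
   Display Name: Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)
   Size: 1910
   Vendor: Example
   Model: Flash
"""

if __name__ == "__main__":
    devices = parse_device_list(SAMPLE)
    for dev_id, fields in find_device(devices, "6782bcb052095c001619190425be4e6"):
        print(dev_id, fields.get("Vendor"), fields.get("Model"))
```

If the matched entry turns out to be a controller volume, identifying the failing physical disk still requires a view into the controller itself.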

Jack_the_coiner · Answer

Is there any way to access this PERC controller and mark that disk as failed so the controller can rebuild this with the spare?

Jack_the_coiner · Answer

Nothing? I suppose because the R910 is such a "high end" box most people who have one utilize Dell on-site support every time there is a problem. This is an expensive way to go:

1-year costs $1,901.65

3-years costs $5,823.33

This covers all hardware in the box and 4hr on-site...supposedly. Honestly I had this kind of support for the first three years of this machine, and no Dell expert ever showed up for a hard drive issue. Also I can never remember Dell or HP (forget IBM) ever actually sending anyone out to do anything for me. I think the 4-hour on-site claim is suspect. Anyway I will have my own guy out there today and hopefully we can get to the bottom of this, then I will post the resolution here.

Jack_the_coiner · Answer

Wow, this has gone from the frying pan into the fire. I stopped all VMs and followed Chris Hawk's instructions, which add the Dell OpenManage Offline Bundle and VIB to the ESXi server to give visibility into the array. The final step was a reboot, and the R910 did not come back. I am now staring at 2 errors:

1. E1000 failsafe voltage error

2. E122E onboard regulator failed

This is very, very bad. Please help me if you have seen this before.
