Unsolved
2 Intern
•
131 Posts
0
14674
June 1st, 2015 15:00
R910 RAID array statistics not available through iDRAC. How do I get them?
I've got this R910:
System Information:
System Model = PowerEdge R910
System Revision = I
System BIOS Version = 2.2.3
Service Tag = 7LH**** (Edited)
Express Svc Code = 165********(Edited)
Host Name = NEPTUNE
OS Name = VMware ESXi 5.0.0 build-469512
Power Status = ON
Unfortunately for me, finance decided that since it had been running for a long time and would likely keep doing so, we should save the support money this year. Great. Naturally, I now have a problem with my awesome R910, which runs much of my infrastructure and had been rock solid until this weekend.
I noticed many of my VMs experiencing storage access issues and remounting their drives read-only. fsck and reboots fix this, but it recurs, and my iDRAC is no help when I try to troubleshoot it: I have no visibility into the array controller or the condition of the disks. The iDRAC is having trouble rendering web pages for me at this point, but I do have access to the command line. Are there racadm commands I can use to see whether I have failed drives?
Thanks for your help.


Jack_the_coiner
June 1st, 2015 15:00
Does this stand for what I think it stands for?
mass-storage-fu
racdump shows me 3 of these:
514 root SW< [mass-storage-fu]
515 root SW< [mass-storage-fu]
516 root SW< [mass-storage-fu]
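For anyone who wants to tally a longer racdump process listing instead of eyeballing it, here is a minimal Python sketch. It assumes every line has the four columns shown in the excerpt above (PID, USER, STAT, COMMAND); lines from ordinary processes that also carry a VSZ column would be misparsed. The helper names `parse_ps_line` and `count_by_command` are hypothetical, and the sketch says nothing about what the `mass-storage-fu` threads actually do.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class PsEntry(NamedTuple):
    pid: int
    user: str
    stat: str
    command: str
    kernel_thread: bool


# Assumed layout, matching the racdump excerpt: PID USER STAT COMMAND
_PS_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$")


def parse_ps_line(line: str) -> Optional[PsEntry]:
    """Parse one process line; return None for header or blank lines."""
    match = _PS_LINE.match(line)
    if match is None:
        return None
    pid, user, stat, command = match.groups()
    # ps shows processes without a command line (kernel threads) as [name].
    is_kthread = command.startswith("[") and command.endswith("]")
    name = command[1:-1] if is_kthread else command
    return PsEntry(int(pid), user, stat, name, is_kthread)


def count_by_command(lines: Iterable[str]) -> Counter:
    """Count how many processes share each command name."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_ps_line(line)
        if entry is not None:
            counts[entry.command] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "514 root SW< [mass-storage-fu]",
        "515 root SW< [mass-storage-fu]",
        "516 root SW< [mass-storage-fu]",
    ]
    print(count_by_command(sample))
```

Fed the three lines above, `count_by_command` reports three entries for `mass-storage-fu`, each flagged as a kernel thread because ps prints kernel threads with their names in brackets.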
Jack_the_coiner
June 1st, 2015 16:00
Here is some more color: see the attached picture, which shows the console of the ESXi server. It appears that one of the SCSI disks is in trouble:
I think if I pull that one out, the RAID array can run in a degraded state until I can get a replacement. The challenge now is telling which physical device corresponds to 6782bcb052095c001619190425be4e6.
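The console message names the suspect disk only by that identifier. Before anyone pulls a drive, it may help to confirm which identifier the host's logs complain about most. Below is a minimal Python sketch that counts long hex identifiers in a saved copy of a log (for example, a vmkernel log copied off the ESXi host; the file name is an assumption) and pulls out the lines for one device. It assumes identifiers appear as runs of 16 or more hex digits, and it cannot map an identifier to a physical slot; that still requires the PERC controller's own management tools. The function names are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List

# Assumption: device identifiers appear in the log as runs of 16 or more
# hex digits, like the 6782bcb0... string shown on the console.
_HEX_ID = re.compile(r"\b[0-9a-fA-F]{16,}\b")


def device_mentions(log_lines: Iterable[str]) -> Counter:
    """Count how often each long hex identifier appears across the log."""
    counts: Counter = Counter()
    for line in log_lines:
        for token in _HEX_ID.findall(line):
            counts[token.lower()] += 1
    return counts


def lines_for_device(log_lines: Iterable[str], device_id: str) -> List[str]:
    """Return the log lines that mention one identifier (case-insensitive)."""
    needle = device_id.lower()
    return [line for line in log_lines if needle in line.lower()]


if __name__ == "__main__":
    # Hypothetical usage: python scan_ids.py vmkernel-copy.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        lines = handle.readlines()
    # Print the five most frequently mentioned identifiers.
    for device, hits in device_mentions(lines).most_common(5):
        print(f"{hits:6d}  {device}")
```

If one identifier dominates the error lines, `lines_for_device` gives the full context for that disk to hand to whoever maps it to a slot.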
Jack_the_coiner
June 1st, 2015 17:00
Is there any way to access this PERC controller and mark that disk as failed so the controller can rebuild the array onto the spare?
Jack_the_coiner
June 2nd, 2015 09:00
Nothing? I suppose that because the R910 is such a "high end" box, most people who have one use Dell on-site support every time there is a problem. This is an expensive way to go:
1-year costs $1,901.65
3-years costs $5,823.33
This covers all hardware in the box and 4-hour on-site response...supposedly. Honestly, I had this kind of support for the first three years of this machine, and no Dell expert ever showed up for a hard drive issue. In fact, I can't remember Dell or HP (forget IBM) ever actually sending anyone out to do anything for me, so I think the 4-hour on-site claim is suspect. Anyway, I will have my own guy out there today, and hopefully we can get to the bottom of this; then I will post the resolution here.
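Comparing the two quotes is simple arithmetic: three back-to-back 1-year contracts would come to $5,704.95, so the 3-year quote works out to $1,941.11 per year, or $118.38 more than renewing annually. That comparison assumes the 1-year price would stay the same at each renewal, which the quote does not promise. A small Python sketch that does the math in integer cents to avoid floating-point rounding noise (the function names are illustrative):

```python
def fmt_cents(cents: int) -> str:
    """Format an integer number of cents as a dollar string, e.g. $1,901.65."""
    sign = "-" if cents < 0 else ""
    dollars, rem = divmod(abs(cents), 100)
    return f"{sign}${dollars:,}.{rem:02d}"


def compare_quotes(one_year_cents: int, three_year_cents: int) -> dict:
    """Compare a 3-year quote with three back-to-back 1-year renewals.

    Assumes the 1-year price stays the same for every renewal.
    """
    three_renewals = 3 * one_year_cents
    return {
        "three_renewals": three_renewals,
        # Rounded to the nearest cent; exact here because 582,333 / 3 = 194,111.
        "three_year_per_year": round(three_year_cents / 3),
        "three_year_premium": three_year_cents - three_renewals,
    }


if __name__ == "__main__":
    # Quotes from the post: $1,901.65 for 1 year, $5,823.33 for 3 years.
    quotes = compare_quotes(190_165, 582_333)
    for label, cents in quotes.items():
        print(f"{label}: {fmt_cents(cents)}")
```

Running the `__main__` block prints $5,704.95, $1,941.11, and $118.38 for the three comparisons. It does not put a price on the 4-hour on-site response itself.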
Jack_the_coiner
June 2nd, 2015 14:00
Wow, this has gone from bad to worse. I stopped all VMs and followed Chris Hawk's instructions to add the Dell OpenManage Offline Bundle and VIB to the ESXi server, which should give visibility into the array. Finally, a reboot. And the R910 did not come back. I am now staring at two errors:
1. E1000 failsafe voltage error
2. E122E onboard regulator failed
This is very, very bad. Please help me if you have seen this before.