Event ID | Timestamp | Message (all entries from Storage Service, newest first)
2076 | Mon Feb 24 16:08:08 2014 | Virtual disk Check Consistency failed: Virtual Disk 1 (Data) Controller 0 (PERC H700 Integrated)
2095 | Mon Feb 24 16:08:08 2014 | Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
2405 | Mon Feb 24 16:08:08 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1; Virtual disk degraded: Virtual Disk 1 (Data) Controller 0 (PERC H700 Integrated)
2405 | Mon Feb 24 16:08:06 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
2346 | Mon Feb 24 16:08:06 2014 | Error occurred: Error on PD 07(e0x20/s7) (Error f0).: Physical Disk 1:0:7 Controller 0, Connector 1
2048 | Mon Feb 24 16:08:06 2014 | Device failed: Physical Disk 1:0:7 Controller 0, Connector 1
2095 | Mon Feb 24 16:07:41 2014 | Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
2405 | Mon Feb 24 16:07:41 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
2405 | Mon Feb 24 16:07:08 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
2405 | Mon Feb 24 16:06:46 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
2095 | Mon Feb 24 16:06:46 2014 | Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
2095 | Mon Feb 24 16:06:46 2014 | Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
2058 | Mon Feb 24 15:53:17 2014 | Virtual disk Check Consistency started: Virtual Disk 1 (Data) Controller 0 (PERC H700 Integrated)
2095 | Mon Feb 24 15:44:33 2014 | Unexpected sense. SCSI sense data: Sense key: 5 Sense code: 24 Sense qualifier: 0: Enclosure 0:0 Controller 0, Connector 0
2359 | Mon Feb 24 15:42:38 2014 | A non-Dell supplied disk drive has been detected: Physical Disk 1:0:5 Controller 0, Connector 1
2359 | Mon Feb 24 15:42:38 2014 | A non-Dell supplied disk drive has been detected: Physical Disk 1:0:6 Controller 0, Connector 1
2359 | Mon Feb 24 15:42:38 2014 | A non-Dell supplied disk drive has been detected: Physical Disk 1:0:7 Controller 0, Connector 1
2387 | Mon Feb 24 15:42:37 2014 | Virtual disk bad block medium error is detected.: Virtual Disk 1 (Data) Controller 0 (PERC H700 Integrated)
2094 | Mon Feb 24 15:42:37 2014 | Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0
2359 | Mon Feb 24 15:42:37 2014 | A non-Dell supplied disk drive has been detected: Physical Disk 1:0:4 Controller 0, Connector 1
2131 | Mon Feb 24 15:42:36 2014 | The current firmware version 12.10.0-0025 is older than the required firmware version 12.10.1-0001 for a controller of model 0x1F17: Controller 0 (PERC H700 Integrated)
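One pattern worth pulling out of the log above: every Command timeout names the same slot, 1:0:7. A quick tally of an exported alert log makes that obvious. This is a minimal sketch, assuming the alerts have been saved to a text file in the one-message-per-line form shown above (alerts.txt and the sample lines here are illustrative, copied from the log):

```shell
# Build a sample export from the alert messages above (illustrative file name).
cat > alerts.txt <<'EOF'
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
EOF

# Count "Command timeout" alerts per physical disk slot.
grep 'Command timeout' alerts.txt \
  | grep -o 'Physical Disk [0-9:]*' \
  | sort | uniq -c | sort -rn
```

Five timeouts on one slot, with the neighbouring slots quiet, points at that drive, its bay, or its cable/backplane lane rather than the whole array.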
theflash1932 · 9 Legend · 16.3K Posts · February 24th, 2014 08:00
1. You need to use the OpenManage Server Administrator software to access it while "live", but I don't have any info for running it on Xen:
http://en.community.dell.com/techcenter/virtualization/w/wiki/3072.aspx
Even if you initiate the rebuild from the controller's configuration utility, you do NOT have to wait for it to complete before booting the OS ... the rebuild will continue after rebooting and will run while the server is up and running, although with slightly degraded performance.
2. Arrays are not hot-swappable - drives are. Having to boot into CTRL-R to initiate the rebuild does not make the drives non-hot-swappable (although it may seem that way when you have no way of managing them). I would suggest following the instructions in the link above for accessing the controller/drives while the system is live, and posting a specific question about getting it going on Xen if that doesn't help (or maybe one of the Dell SysMan experts will follow up here).
3. As above, you need only INITIATE the rebuild in the CTRL-R utility (in the absence of an automatic rebuild, or of OMSA to start it while the system is up and running) ... you may reboot into and run the OS after initiating the rebuild.
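For reference, once OMSA is running, the drive management in points 2 and 3 can be done from its omreport/omconfig command line without rebooting into CTRL-R. A sketch, assuming OMSA's CLI tools are installed and on the PATH; the controller and disk IDs come from the omreport output, and exact syntax can vary between OMSA releases:

```shell
# List controllers, then the virtual and physical disks on controller 0
# (the IDs used below are taken from this output).
omreport storage controller
omreport storage vdisk controller=0
omreport storage pdisk controller=0

# Start a rebuild of a replaced drive while the OS is running.
# Disk ID 1:0:7 is only an example, taken from the log in this thread.
omconfig storage pdisk action=rebuild controller=0 pdisk=1:0:7
```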
gaartsen · 3 Posts · February 26th, 2014 02:00
Hi and thank you very much for your reply. I managed to get OpenManage Server Administrator going on Xen, so now I can at least see what's happening.
I replaced one of the "failed" drives and rebuilt the RAID, but after a few hours the new drive was dropped from the array as well and marked "failed". I get the following error:
"The Virtual Disk has bad blocks. For more details, see the Virtual Disk Bad Block Management section in the Online Help."
I am wondering whether this could be due to a hardware failure on the backplane, or whether an overdue firmware upgrade of the PERC H700 Integrated from 12.10.0-0025 to 12.10.6 would solve things.
I have put this upgrade off because I have no fail-over solution and have no idea about the failure rate of BIOS and firmware upgrades.
Any further help and/or hints would be highly appreciated.
theflash1932 · 9 Legend · 16.3K Posts · February 26th, 2014 07:00
What is the make/model of the drives you are using? Are they certified (Dell) drives? Out-of-date firmware can certainly lead to faults with the disks/controllers/virtual disks. You can attempt a Consistency Check on the virtual disk, but if that doesn't resolve it, the only way to fix virtual disk corruption is to wipe it out and reinitialize it.
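If it helps, the Consistency Check can also be started from the OMSA CLI rather than the web interface. A sketch, assuming OMSA is installed; the IDs come from omreport, and syntax can differ between OMSA versions:

```shell
# Start a consistency check on Virtual Disk 1 (the "Data" VD in this thread)
# on controller 0, then watch its progress.
omconfig storage vdisk action=checkconsistency controller=0 vdisk=1
omreport storage vdisk controller=0 vdisk=1
```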
theflash1932 · 9 Legend · 16.3K Posts · February 26th, 2014 07:00
Because firmware updates are recommended on Dell servers for fixes to reliability and performance issues, great care is taken to make them safe. If you Google, yes, you will find firmware updates gone bad, but of the thousands of firmware updates I have personally performed, only once or twice have I experienced failed hardware because of them ... and in those cases, the hardware's health was questionable anyway.
gaartsen · 3 Posts · February 26th, 2014 16:00
I have a RAID 5 VD with 4 certified ST3300657SS drives, of which 1 is marked non-critical.
The messages for that drive are:
Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:0 Controller 0, Connector 0
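The sense keys in these alerts are standard SCSI: key 3 is Medium Error (bad media, which fits the virtual disk bad-block message), key 6 is Unit Attention (the drive was reset, which fits the command timeouts on 1:0:7), and key 5 is Illegal Request. A small sketch that pulls the key out of an OMSA alert line and names it; the alert text is copied from this thread, and the key-to-name mapping is from the SCSI spec:

```shell
# Extract and name the SCSI sense key from an OMSA "Unexpected sense" alert line.
line='Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:0 Controller 0, Connector 0'
key=$(printf '%s\n' "$line" | sed -n 's/.*Sense key: \([0-9]*\).*/\1/p')
case "$key" in
  3) desc='Medium Error (unreadable sectors on the platter)' ;;
  5) desc='Illegal Request (a command the device rejected)' ;;
  6) desc='Unit Attention (device was reset or changed state)' ;;
  *) desc="sense key $key" ;;
esac
echo "Sense key $key: $desc"
```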
The other problem VD is a RAID 5 with 4 non-certified ST31000528AS drives. One bay keeps marking its drive as FAILED.
The messages for that drive are the ones quoted in the alert log at the top of this page.