Access Raid Config while server is running
Hi,
We have an R510 built as a XenServer 5.6 host with two RAID 5 configurations. It is now one month out of warranty, and one of the arrays had two drives dropped from the RAID, disabling access to the VD.
In order to rebuild the RAID, I had to go into the BIOS and open the RAID configuration tool. I managed to manually re-import the drives and rebuild the array, but I will need to replace the disks that dropped out.
I have a few questions.
1. What tool do I use to access the RAID while the server is running? The rebuild of the array took about 8 hours, which is very long to be offline. The server has iDRAC 6.5, which does not offer disk access. I have installed the XenServer OpenManage package, but that only gives me a summary of the OS and no configuration options.
2. What do I need to change so that new/replacement drives are automatically imported and rebuilt? These arrays are supposed to be hot-swappable, but that is impossible if I have to boot into the RAID configuration tool.
3. If three drives are listed as Online and the fourth is listed as Rebuilding, is it safe to reboot the server into normal operation, or does the rebuild need to finish first?
Thank you in advance.
theflash1932
9 Legend • 16.3K Posts
February 24th, 2014 08:00
1. You need to use the OpenManage Server Administrator software to access it while "live", but I don't have any info on running it on Xen:
http://en.community.dell.com/techcenter/virtualization/w/wiki/3072.aspx
Even if you initiate the rebuild from the controller's configuration utility, you do NOT have to wait until it is completed to boot the OS ... the rebuild will continue after rebooting and will run while the server is up and running, although with slightly degraded performance.
2. Arrays are not hot-swappable - drives are hot-swappable, and having to boot into Ctrl+R to initiate the rebuild does not make them non-hot-swappable (although it may seem that way when you have no way of managing the drives). I would suggest taking a look at the instructions in the link above for accessing the controller/drives while the system is live, and posting a specific question about getting it going on Xen if the above doesn't help (or maybe one of the Dell SysMan experts will follow up here).
3. As above, you only need to INITIATE the rebuild in the Ctrl+R utility (in the absence of an automatic rebuild or OMSA to start it while the system is up and running) ... you may reboot into and run the OS after initiating the rebuild.
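For reference, once OMSA is installed, the live checks described above can be done from its CLI. A minimal sketch, assuming the standard omreport/omconfig tools from OpenManage are on the PATH and the controller is ID 0 (substitute your own controller and slot IDs):

```shell
# List physical disks and their states (Online, Failed, Rebuilding, ...)
omreport storage pdisk controller=0

# List virtual disks; a rebuilding VD shows its progress here
omreport storage vdisk controller=0

# Optionally assign a replacement drive as a global hot spare so that
# future rebuilds start automatically (0:0:4 is an example slot ID)
omconfig storage pdisk action=assignglobalhotspare controller=0 pdisk=0:0:4 assign=yes
```

These commands only query or configure the controller, so they can run while the server and its VMs stay online.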
gaartsen
3 Posts
February 26th, 2014 02:00
Hi, and thank you very much for your reply. I managed to get the Xen OpenManage Server Administrator going, so now I can at least see what's happening.
I replaced one of the "failed" drives and rebuilt the RAID, but after a few hours the new drive was removed from the RAID as well and marked "failed". I get the following error:
"The Virtual Disk has bad blocks. For more details, see the Virtual Disk Bad Block Management section in the Online Help."
I am wondering whether this could be due to a hardware failure on the backplane, or whether an overdue firmware upgrade of the PERC H700 Integrated from 12.10.0-0025 to 12.10.6 would solve things.
I have put this upgrade off because I have no fail-over solution and have no idea about the failure rate of BIOS and firmware upgrades.
Any further help and/or hints would be highly appreciated.
theflash1932
9 Legend • 16.3K Posts
February 26th, 2014 07:00
What is the make/model of the drives you are using? Are they certified (Dell) drives? Out-of-date firmware can certainly lead to faults with the disks/controllers/virtual disks. You can attempt a Consistency Check on the virtual disk, but if that doesn't resolve it, the only way to fix virtual disk corruption is to wipe it out and reinitialize it.
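If OMSA is working on the box, the Consistency Check mentioned above can be started from the running OS rather than from Ctrl+R. A sketch, assuming controller 0 and virtual disk 1 (confirm your actual IDs with omreport first):

```shell
# Confirm the virtual disk ID before starting anything
omreport storage vdisk controller=0

# Start a consistency check on the chosen VD
omconfig storage vdisk action=checkconsistency controller=0 vdisk=1

# Re-run this to watch progress while the server stays online
omreport storage vdisk controller=0 vdisk=1
```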
theflash1932
9 Legend • 16.3K Posts
February 26th, 2014 07:00
Because firmware updates are recommended on Dell servers for fixes to reliability and performance issues, great care has been taken to make them safe. If you Google, yes, you will find firmware updates gone bad, but of the thousands of firmware updates I have personally performed, only once or twice have I experienced failed hardware because of them ... and in those cases, the hardware's health was questionable anyway.
gaartsen
3 Posts
February 26th, 2014 16:00
I have a RAID 5 VD with four certified ST3300657SS drives, of which one is Non-Critical.
The messages for that drive are:
Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:0 Controller 0, Connector 0
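For what it's worth, sense key 3 in that log maps to the standard SCSI "Medium Error" class (and sense code 11, if read as the hex ASC 0x11, is typically an unrecovered read error), which would fit the bad-block message. A small helper to decode the key, assuming it is the standard SPC sense-key value:

```shell
# Translate a SCSI sense key (per the SPC standard) to its name
decode_sense_key() {
  case "$1" in
    0) echo "No Sense" ;;
    1) echo "Recovered Error" ;;
    2) echo "Not Ready" ;;
    3) echo "Medium Error" ;;
    4) echo "Hardware Error" ;;
    5) echo "Illegal Request" ;;
    6) echo "Unit Attention" ;;
    *) echo "Other" ;;
  esac
}

# Sense key 3 from the log above:
decode_sense_key 3   # prints "Medium Error"
```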
The other problem VD is a RAID 5 with four non-certified ST31000528AS drives. One bay keeps marking the drive as FAILED.
The messages for that drive are: