Event ID | Timestamp | Message (all entries from Storage Service, newest first)
2076 | Mon Feb 24 16:08:08 2014 | Virtual disk Check Consistency failed: Virtual Disk 1 (Data) Controller 0 (PERC H700 Integrated)
2095 | Mon Feb 24 16:08:08 2014 | Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
2405 | Mon Feb 24 16:08:08 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1; Virtual disk degraded: Virtual Disk 1 (Data) Controller 0 (PERC H700 Integrated)
2405 | Mon Feb 24 16:08:06 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
2346 | Mon Feb 24 16:08:06 2014 | Error occurred: Error on PD 07(e0x20/s7) (Error f0).: Physical Disk 1:0:7 Controller 0, Connector 1
2048 | Mon Feb 24 16:08:06 2014 | Device failed: Physical Disk 1:0:7 Controller 0, Connector 1
2095 | Mon Feb 24 16:07:41 2014 | Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
2405 | Mon Feb 24 16:07:41 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
2405 | Mon Feb 24 16:07:08 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
2405 | Mon Feb 24 16:06:46 2014 | Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
2095 | Mon Feb 24 16:06:46 2014 | Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
2095 | Mon Feb 24 16:06:46 2014 | Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
2058 | Mon Feb 24 15:53:17 2014 | Virtual disk Check Consistency started: Virtual Disk 1 (Data) Controller 0 (PERC H700 Integrated)
2095 | Mon Feb 24 15:44:33 2014 | Unexpected sense. SCSI sense data: Sense key: 5 Sense code: 24 Sense qualifier: 0: Enclosure 0:0 Controller 0, Connector 0
2359 | Mon Feb 24 15:42:38 2014 | A non-Dell supplied disk drive has been detected: Physical Disk 1:0:5 Controller 0, Connector 1
2359 | Mon Feb 24 15:42:38 2014 | A non-Dell supplied disk drive has been detected: Physical Disk 1:0:6 Controller 0, Connector 1
2359 | Mon Feb 24 15:42:38 2014 | A non-Dell supplied disk drive has been detected: Physical Disk 1:0:7 Controller 0, Connector 1
2387 | Mon Feb 24 15:42:37 2014 | Virtual disk bad block medium error is detected.: Virtual Disk 1 (Data) Controller 0 (PERC H700 Integrated)
2094 | Mon Feb 24 15:42:37 2014 | Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0
2359 | Mon Feb 24 15:42:37 2014 | A non-Dell supplied disk drive has been detected: Physical Disk 1:0:4 Controller 0, Connector 1
2131 | Mon Feb 24 15:42:36 2014 | The current firmware version 12.10.0-0025 is older than the required firmware version 12.10.1-0001 for a controller of model 0x1F17: Controller 0 (PERC H700 Integrated)
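One pattern worth pulling out of the log above: every Command timeout names the same slot, 1:0:7. A quick tally of an exported alert log makes that obvious. This is a minimal sketch, assuming the alerts have been saved to a text file in the one-message-per-line form shown above (alerts.txt and the sample lines here are illustrative, copied from the log):

```shell
# Build a sample export from the alert messages above (illustrative file name).
cat > alerts.txt <<'EOF'
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Command timeout on physical disk: Physical Disk 1:0:7 Controller 0, Connector 1
Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 1:0:7 Controller 0, Connector 1
EOF

# Count "Command timeout" alerts per physical disk slot.
grep 'Command timeout' alerts.txt \
  | grep -o 'Physical Disk [0-9:]*' \
  | sort | uniq -c | sort -rn
```

Five timeouts on one slot, with the neighbouring slots quiet, points at that drive, its bay, or its cable/backplane lane rather than the whole array.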
theflash1932 · 9 Legend · 16.3K Posts · February 24th, 2014 08:00
1. You need to use the OpenManage Server Administrator software to access it while "live", but I don't have any info for running it on Xen:
http://en.community.dell.com/techcenter/virtualization/w/wiki/3072.aspx
Even if you initiate the rebuild from the controller's configuration utility, you do NOT have to wait for it to complete before booting the OS ... the rebuild will continue after rebooting and will run while the server is up and running, although with slightly degraded performance.
2. Arrays are not hot-swappable - drives are. Having to boot into CTRL-R to initiate the rebuild does not make the drives non-hot-swappable (although it may seem that way when you have no way of managing them). I would suggest following the instructions in the link above for accessing the controller/drives while the system is live, and posting a specific question about getting it going on Xen if that doesn't help (or maybe one of the Dell SysMan experts will follow up here).
3. As above, you need only INITIATE the rebuild in the CTRL-R utility (in the absence of an automatic rebuild, or of OMSA to start it while the system is up and running) ... you may reboot into and run the OS after initiating the rebuild.
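For reference, once OMSA is running, the drive management in points 2 and 3 can be done from its omreport/omconfig command line without rebooting into CTRL-R. A sketch, assuming OMSA's CLI tools are installed and on the PATH; the controller and disk IDs come from the omreport output, and exact syntax can vary between OMSA releases:

```shell
# List controllers, then the virtual and physical disks on controller 0
# (the IDs used below are taken from this output).
omreport storage controller
omreport storage vdisk controller=0
omreport storage pdisk controller=0

# Start a rebuild of a replaced drive while the OS is running.
# Disk ID 1:0:7 is only an example, taken from the log in this thread.
omconfig storage pdisk action=rebuild controller=0 pdisk=1:0:7
```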
gaartsen · 3 Posts · February 26th, 2014 02:00
Hi and thank you very much for your reply. I managed to get OpenManage Server Administrator going on Xen, so now I can at least see what's happening.
I replaced one of the "failed" drives and rebuilt the RAID, but after a few hours the new drive was dropped from the array as well and marked "failed". I get the following error:
"The Virtual Disk has bad blocks. For more details, see the Virtual Disk Bad Block Management section in the Online Help."
I am wondering whether this could be due to a hardware failure on the backplane, or whether an overdue firmware upgrade of the PERC H700 Integrated from 12.10.0-0025 to 12.10.6 would solve things.
I have put this upgrade off because I have no fail-over solution and have no idea about the failure rate of BIOS and firmware upgrades.
Any further help and/or hints would be highly appreciated.
theflash1932 · 9 Legend · 16.3K Posts · February 26th, 2014 07:00
What is the make/model of the drives you are using? Are they certified (Dell) drives? Out-of-date firmware can certainly lead to faults with the disks/controllers/virtual disks. You can attempt a Consistency Check on the virtual disk, but if that doesn't resolve it, the only way to fix virtual disk corruption is to wipe it out and reinitialize it.
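If it helps, the Consistency Check can also be started from the OMSA CLI rather than the web interface. A sketch, assuming OMSA is installed; the IDs come from omreport, and syntax can differ between OMSA versions:

```shell
# Start a consistency check on Virtual Disk 1 (the "Data" VD in this thread)
# on controller 0, then watch its progress.
omconfig storage vdisk action=checkconsistency controller=0 vdisk=1
omreport storage vdisk controller=0 vdisk=1
```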
theflash1932 · 9 Legend · 16.3K Posts · February 26th, 2014 07:00
Because firmware updates are recommended on Dell servers for fixes to reliability and performance issues, great care is taken to make them safe. If you Google, yes, you will find firmware updates gone bad, but of the thousands of firmware updates I have personally performed, only once or twice have I experienced failed hardware because of them ... and in those cases, the hardware's health was questionable anyway.
gaartsen · 3 Posts · February 26th, 2014 16:00
I have a RAID 5 VD with 4 certified ST3300657SS drives, of which 1 is marked non-critical.
The messages for that drive are:
Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:0 Controller 0, Connector 0
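The sense keys in these alerts are standard SCSI: key 3 is Medium Error (bad media, which fits the virtual disk bad-block message), key 6 is Unit Attention (the drive was reset, which fits the command timeouts on 1:0:7), and key 5 is Illegal Request. A small sketch that pulls the key out of an OMSA alert line and names it; the alert text is copied from this thread, and the key-to-name mapping is from the SCSI spec:

```shell
# Extract and name the SCSI sense key from an OMSA "Unexpected sense" alert line.
line='Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:0 Controller 0, Connector 0'
key=$(printf '%s\n' "$line" | sed -n 's/.*Sense key: \([0-9]*\).*/\1/p')
case "$key" in
  3) desc='Medium Error (unreadable sectors on the platter)' ;;
  5) desc='Illegal Request (a command the device rejected)' ;;
  6) desc='Unit Attention (device was reset or changed state)' ;;
  *) desc="sense key $key" ;;
esac
echo "Sense key $key: $desc"
```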
The other problem VD is a RAID 5 with 4 non-certified ST31000528AS drives. One bay keeps marking its drive as FAILED.
The messages for that drive are the ones quoted in the alert log at the top of this page.