I'm trying to work out the correct procedure to replace a drive in a mirror that has had a SMART failure prediction. The drive is still running and the array is still showing as optimal but the disk is showing as "failure predicted."
The server is an R720 with embedded PERC H710P Mini. It's running Linux and we have Dell OMSA and MegaCli64 installed. I need to do this without downtime.
We have 2x 900GB disks in a RAID-1 volume (virtual disk 0) and 7x 600GB in RAID-6 (virtual disk 1). We are not using hot spares. The problem disk is the second disk in the RAID-1 array: 0:1:1 (or [32:1] according to MegaCli).
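Before touching anything, it may be worth confirming from the controller's side which slot actually carries the predicted failure. The following is a minimal sketch, not a vetted tool: `predicted_failures` is a hypothetical helper name, and it assumes the physical-drive listing uses the field labels "Enclosure Device ID", "Slot Number" and "Predictive Failure Count" (the first two appear in the -CfgDsply output further down; the third is an assumption about the -PDList output of this MegaCli version).

```shell
# Reads physical-drive listing text on stdin and prints [enclosure:slot]
# for every drive whose predictive-failure counter is greater than zero.
# Assumption: the field labels matched below exist in your MegaCli output.
predicted_failures() {
    awk '
        { sub(/[ \t\r]+$/, "") }                 # drop trailing whitespace
        /Enclosure Device ID:/      { enc = $NF }
        /Slot Number:/              { slot = $NF }
        /Predictive Failure Count:/ { if ($NF + 0 > 0) printf "[%s:%s]\n", enc, slot }
    '
}

# Usage sketch (binary name as in the question):
#   ./MegaCli64 -PDList -aALL | predicted_failures
```

If the only line printed is [32:1], the enclosure:slot pair used in the commands below matches the drive the controller is complaining about.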
I've found a walkthrough guide but I'm not sure if this is the correct procedure and I can't afford to "test" it or get it wrong:
Walkthrough: Change/replace a drive
Set the drive offline, if it is not already offline due to an error
MegaCli64 -PDOffline -PhysDrv [32:1] -a0
Mark the drive as missing
MegaCli64 -PDMarkMissing -PhysDrv [32:1] -a0
Prepare drive for removal
MegaCli64 -PDPrpRmv -PhysDrv [32:1] -a0
Change/replace the drive
Pull out the problem disk and insert replacement disk.
Re-add the new drive to your RAID virtual drive and start the rebuilding
MegaCli64 -PdReplaceMissing -PhysDrv [32:1] -Array0 -row1 -a0
MegaCli64 -PDRbld -Start -PhysDrv [32:1] -a0
Some notes/questions about this:
1) Auto-rebuild is enabled on the controller. Are any of the above steps unnecessary?
2) Any idea how long this might take? I know it's highly dependent on the rebuild rate setting and the amount of activity on the server, but I'm just after some real-world examples on similar hardware.
3) With the -PDReplaceMissing command, I've based the values of -ArrayN and -rowN on the following output (cut to show the relevant disk):
# ./MegaCli64 -CfgDsply -aALL
Product Name: PERC H710P Mini
Number of DISK GROUPS: 2
DISK GROUP: 0
Number of Spans: 1
Span Reference: 0x00
Number of PDs: 2
Number of VDs: 1
Physical Disk: 1
Enclosure Device ID: 32
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: 1
Device Id: 1
From what I've read, the -ArrayN number is the Span Reference from the above output, but examples I've seen show a '0' in the command rather than the 0x00 that I see in my output.
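On the 0x00 versus 0 point: 0x00 is just hexadecimal notation for 0, so the two spellings denote the same number. The sketch below pulls the span reference, disk group and arm for a given enclosure:slot out of the -CfgDsply text and shows the hex-to-decimal conversion. `drive_position` and `to_decimal` are hypothetical helper names, and the sketch deliberately does not decide whether -ArrayN should come from the span reference or the disk group; in the output above both are 0.

```shell
# Reads -CfgDsply text on stdin; for the drive at enclosure $1, slot $2,
# prints "<span reference> <disk group> <arm>" as found in the listing.
drive_position() {
    awk -v enc="$1" -v slot="$2" '
        { sub(/[ \t\r]+$/, "") }                 # drop trailing whitespace
        /Span Reference:/      { span = $NF }
        /Enclosure Device ID:/ { e = $NF }
        /Slot Number:/         { s = $NF }
        /position: DiskGroup:/ && e == enc && s == slot {
            dg = $0;  sub(/.*DiskGroup: */, "", dg); sub(/,.*/, "", dg)
            arm = $0; sub(/.*Arm: */, "", arm)
            print span, dg, arm
        }
    '
}

# Converts a C-style constant such as 0x00 to decimal using shell arithmetic.
to_decimal() {
    echo $(( $1 ))
}

# Usage sketch (binary name and adapter as in the question):
#   ./MegaCli64 -CfgDsply -a0 | drive_position 32 1
```

Fed the excerpt above, `drive_position 32 1` prints `0x00 0 1`, and `to_decimal 0x00` prints `0`, which is consistent with the -Array0 -row1 values used in the walkthrough.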
Thanks for your assistance,
You don’t need to do all of those steps. Offlining the drive is the only step you need to do before swapping it:
MegaCli64 -PDOffline -PhysDrv [32:1] -a0
When you put the new drive in, it should rebuild automatically.
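For what it's worth, the shortened procedure could be wrapped up roughly as follows. This is a sketch under assumptions rather than a tested script: the function names and the DRY_RUN switch are made up for illustration, the drive and adapter values are the ones from the question, and it assumes this MegaCli build accepts -AdpAutoRbld -Dsply, -PDInfo and -PDRbld -ShowProg. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
MEGACLI=${MEGACLI:-MegaCli64}   # binary name as used in the question
DRY_RUN=${DRY_RUN:-1}
DRIVE='[32:1]'                  # enclosure:slot from the question
ADAPTER='-a0'

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Before pulling the disk: show the auto-rebuild setting, then offline the drive.
pre_swap() {
    run "$MEGACLI" -AdpAutoRbld -Dsply "$ADAPTER"
    run "$MEGACLI" -PDOffline -PhysDrv "$DRIVE" "$ADAPTER"
}

# After inserting the new disk: show its state and the rebuild progress.
post_swap() {
    run "$MEGACLI" -PDInfo -PhysDrv "$DRIVE" "$ADAPTER"
    run "$MEGACLI" -PDRbld -ShowProg -PhysDrv "$DRIVE" "$ADAPTER"
}
```

Calling `pre_swap` in dry-run mode prints the two commands it would issue; if the rebuild does not start by itself after the swap, the -PdReplaceMissing and -PDRbld -Start commands from the walkthrough are the fallback.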