

November 9th, 2023 14:35

RAID1: replaced physical drive 0:1:1 comes up as "Ready" - now drive 0:1:0 has an error

I have a customer site where a RAID1 mirror has been running for a long time. Recently the second physical drive in the RAID1 pair (0:1:1) failed and was replaced with a new drive. The replacement did not auto-rebuild but was instead flagged as "Ready".

Now the first physical drive (0:1:0) has reported a single error, at the same block location, in /var/log/messages - the server is a CentOS 7 Linux box - and the iDRAC interface also shows the drive as having the error (screenshots attached below this text). The server is running happily for now, so what I propose is to set the 0:1:1 drive from "Ready" to "Global Online Spare", in the hope that the array will rebuild with at least one new healthy drive running in 0:1:1.
The customer has purchased a second drive, which I will then have them use to replace 0:1:0; hopefully the RAID1 will auto-rebuild and be back to running correctly with two healthy drives. The controller may flag this second new drive as "Ready" as it did with the first replacement, but I am assuming I can then set it as a Global Online Spare in the same way?

I need to wait until I have full rsync backups of the hosted servers (this CentOS 7 server runs VirtualBox hosting several Linux VMs and one Windows Server 2016 VM) before I risk any potential problems.

I would sincerely value any input on whether this is a safe way to go. Presently a fresh build and restore of the huge amount of 'stuff' is off the table: they don't want downtime as the site is moving premises shortly, and we don't want to power off the box before the move, relocate, and then boot into RAID or drive issues at the new location.

Screenshots attached for your perusal.

Many thanks in advance.

Moderator

November 9th, 2023 21:27

Hello,

First make sure you have a validated backup.

Rebuilding into an array that has an online predictive-failure member could copy bad blocks over to the replacement drive and puncture the array.

Setting 0:1:1 to hot spare should start the rebuild.
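If OMSA's command-line tools are installed on the server, the hot-spare assignment can also be done from a shell. A hedged sketch using `omconfig` - controller ID 0 is an assumption here, so verify your IDs with `omreport` first:

```shell
# List physical disks to confirm controller and disk IDs (controller=0 assumed)
omreport storage pdisk controller=0

# Assign the replaced drive 0:1:1 as a global hot spare so the rebuild can start
omconfig storage pdisk action=assignglobalhotspare controller=0 pdisk=0:1:1 assign=yes
```

Exact attribute names can vary between OMSA versions, so check `omconfig storage pdisk -?` on your installation before running.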

0:1:0 is showing "Predictive Fail: Yes" and needs to be replaced as well.

When a predictive-failure drive is still an online member of the array, there are two things that should be done before pulling and replacing it, using either OpenManage Server Administrator (OMSA) or the PERC BIOS:

1. Consistency check:

In OMSA: expand Storage > expand the controller. Select the virtual disk, choose 'Check Consistency' from the dropdown, and click Execute. You can view the results in the Lifecycle Controller log.
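The same check can be started from the OMSA command line. A sketch, assuming the RAID1 virtual disk is vdisk 0 on controller 0 (list first to confirm, as your IDs may differ):

```shell
# Confirm the virtual disk ID (vdisk=0, controller=0 assumed)
omreport storage vdisk controller=0

# Start the consistency check on the RAID1 virtual disk
omconfig storage vdisk action=checkconsistency controller=0 vdisk=0

# Poll progress while it runs
omreport storage vdisk controller=0 vdisk=0
```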


2. Once the consistency check completes, put the drive offline before replacing it:

In OMSA: expand Storage > the controller > the connector > the enclosure, then select Physical Disks (or Array Disks). Open the dropdown next to the drive, choose Offline, and click Execute.
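The CLI equivalent, again assuming controller 0 - double-check the pdisk ID before running, since offlining the wrong member of a degraded mirror would be destructive:

```shell
# Take the predictive-failure drive offline prior to physically replacing it
omconfig storage pdisk action=offline controller=0 pdisk=0:1:0
```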

The rebuild should start automatically once the drive is replaced. If it does not, you can use OMSA or the PERC BIOS to set the replacement drive as a hot spare, which will make the rebuild begin.

Repeat the consistency check.

Offline the drive at 0:1:0 and replace it, let it rebuild, and run the consistency check again.

You may want to review this knowledge base article:

Double Faults and Punctures in RAID Arrays

https://dell.to/3FR8pA3

November 10th, 2023 00:09

@Dell-Charles R
Thank you so much. I was going to do the whole procedure, which you have now verified, and I very much appreciate the extra guidance on handling the failing drive properly rather than just pulling it out. Very scary stuff when you don't do this every IT day :-)
