dmc1961

1 Rookie

•

3 Posts

0

429

November 9th, 2023 14:35

RAID1 physical drive 0:1:1 replaced drive comes up as Ready - now HD 0:1:0 has an error

I have a customer with a RAID 1 mirror that has been running for a long time. Recently the second physical hard drive in the RAID 1 pair (0:1:1) failed and was replaced with a new drive. The new drive did not rebuild automatically; it was flagged as "Ready" instead.

Now the first physical drive (0:1:0) has reported a single error at the same block location in /var/log/messages. The server runs CentOS 7, and the iDRAC interface also shows the drive as having the error (screenshots attached below this text). The server is running happily (for now), so I propose changing the 0:1:1 drive from "Ready" to "Global Online Spare" so that the array hopefully rebuilds with at least one new, healthy drive in 0:1:1.
The customer has purchased a second drive, which I will then have them swap in for 0:1:0. Hopefully the RAID 1 will rebuild automatically and be back to running correctly with two healthy drives. The controller may mark this second new drive "Ready" as it did with the first replacement, but I assume I can then set it as a Global Online Spare as well?

I need to wait until I have full rsync backups of the hosted servers (this CentOS 7 server runs VirtualBox, hosting Linux server VMs and one Windows 2016 server VM) before I risk any potential problems.

I would sincerely value any input on whether this is a safe way to go. A fresh build and restore of the huge amount of 'stuff' is presently off the table: they don't want downtime because the site is moving premises shortly, and we don't want to power off the box, move premises, and then boot up to RAID/HD boot issues at the new location.

Screen shots attached for your perusal.

Many thanks in advance.

Responses(4)

DELL-Charles R

Moderator

•

3.5K Posts

1

November 9th, 2023 21:27

Hello,

First make sure you have a validated backup.
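One way to validate an rsync backup is to follow the copy with a checksum-based dry run, which should list no files if source and destination match. A minimal sketch, assuming the VM images live under /vms/ and the backup target is backuphost:/backups/vms/ (both paths are hypothetical), and that the VMs are shut down or otherwise quiesced while copying:

```shell
#!/bin/sh
# Each function only prints a command so it can be reviewed and run by hand.

# Copy SRC to DEST, keeping hard links (-H), ACLs (-A) and xattrs (-X).
backup_cmd() {
    printf 'rsync -aHAX --numeric-ids %s %s\n' "$1" "$2"
}

# Re-compare by checksum (-c) without copying anything (--dry-run);
# every itemized line in the output is a difference.
verify_cmd() {
    printf 'rsync -aHAXc --numeric-ids --dry-run --itemize-changes %s %s\n' "$1" "$2"
}

# Hypothetical paths: adjust to the real VM store and backup target.
backup_cmd /vms/ backuphost:/backups/vms/
verify_cmd /vms/ backuphost:/backups/vms/
```

The checksum pass reads every file on both sides, so it can take about as long as the original copy.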

Rebuilding into an array that still has an online member in a predictive-failure state could copy bad blocks over to the replacement drive and puncture the array.

Setting 0:1:1 as a hot spare should start the rebuild.
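If OMSA is installed on the server, the hot spare can also be assigned from its command line. A sketch, assuming the PERC is controller 0 (an assumption; `omreport storage controller` lists the real IDs) and using the 0:1:1 disk ID from this thread:

```shell
#!/bin/sh
# Each function only prints an OMSA command so it can be reviewed and run by hand.

# Assign a physical disk (connector:enclosure:slot) as a global hot spare.
assign_hot_spare_cmd() {
    printf 'omconfig storage pdisk action=assignglobalhotspare controller=%s pdisk=%s assign=yes\n' "$1" "$2"
}

# Report the disk's state, to see whether it has left "Ready".
show_pdisk_cmd() {
    printf 'omreport storage pdisk controller=%s pdisk=%s\n' "$1" "$2"
}

# controller=0 is an assumption; 0:1:1 is the replacement drive from this thread.
assign_hot_spare_cmd 0 0:1:1
show_pdisk_cmd 0 0:1:1
```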

0:1:0 is showing Predictive Fail = Yes and needs to be replaced as well.

When a predictive-fail drive is still an online member of the array, two things should be done before pulling and replacing it, using either OpenManage Server Administrator (OMSA) or the PERC BIOS:

1. Consistency check:

In OMSA: Expand Storage > expand the controller. Select ‘Virtual Disk’. Choose ‘Check Consistency’ from the drop-down and click Execute. You can view the results in the Lifecycle Controller log.

2. When the consistency check has completed, put the drive offline before replacing it:

In OMSA: Expand Storage > expand the controller > expand the connector > enclosure > select Physical Disks (or Array Disks). In the drop-down next to the hard drive, choose Offline and click Execute.

The rebuild should start automatically once the drive is replaced. If it does not, you can use OMSA or the PERC BIOS to set the replacement drive as a hot spare, which will make the rebuild begin.

Repeat the consistency check.

Then offline disk 0:1:0 and replace it, let it rebuild, and run the consistency check again.
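The OMSA steps above also have command-line equivalents. A sketch, assuming controller 0 and virtual disk 0 (both assumptions; `omreport storage vdisk` lists the real IDs). The commands are only printed, because the offline step should not be run until the consistency check has finished:

```shell
#!/bin/sh
# Each function only prints an OMSA command so it can be reviewed and run by hand.

# Start a consistency check on a virtual disk.
check_consistency_cmd() {
    printf 'omconfig storage vdisk action=checkconsistency controller=%s vdisk=%s\n' "$1" "$2"
}

# Report the virtual disk's state and progress; rerun until the check is done.
show_vdisk_cmd() {
    printf 'omreport storage vdisk controller=%s vdisk=%s\n' "$1" "$2"
}

# Take a physical disk (connector:enclosure:slot) offline before pulling it.
offline_pdisk_cmd() {
    printf 'omconfig storage pdisk action=offline controller=%s pdisk=%s\n' "$1" "$2"
}

# controller=0 and vdisk=0 are assumptions; 0:1:0 is the predictive-fail drive.
check_consistency_cmd 0 0
show_vdisk_cmd 0 0
offline_pdisk_cmd 0 0:1:0
```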

You may want to review this knowledge article:

Double Faults and Punctures in RAID Arrays

https://dell.to/3FR8pA3

DELL-Young E

Moderator

•

3.9K Posts

0

June 17th, 2024 05:53

Hello,
This is the option with PERC CLI (page 26 of https://dell.to/3XthGZc):
perccli /cx[/ex]/sx set offline
However, it is much easier to just power off the server and pull the drive, because using the PERC BIOS means powering off the server anyway.
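For reference, the perccli syntax above might be filled in as follows. This is a sketch only: controller 0, enclosure ID 32 and slot 0 are hypothetical values. The enclosure ID that perccli uses is generally not the middle number of the OMSA-style 0:1:0 ID, so read the EID:Slt column from the `show` output first. The binary may also be installed under the name perccli64.

```shell
#!/bin/sh
# Each function only prints a perccli command so it can be reviewed and run by hand.
PERCCLI=${PERCCLI:-perccli}   # may be installed as perccli64

# List controller details, including the EID:Slt of every physical drive.
show_controller_cmd() {
    printf '%s /c%s show\n' "$PERCCLI" "$1"
}

# Set one drive offline: arguments are controller, enclosure ID, slot.
set_offline_cmd() {
    printf '%s /c%s/e%s/s%s set offline\n' "$PERCCLI" "$1" "$2" "$3"
}

# Show rebuild progress for a replacement drive in the same slot.
show_rebuild_cmd() {
    printf '%s /c%s/e%s/s%s show rebuild\n' "$PERCCLI" "$1" "$2" "$3"
}

# Hypothetical IDs: controller 0, enclosure 32, slot 0.
show_controller_cmd 0
set_offline_cmd 0 32 0
show_rebuild_cmd 0 32 0
```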

And here are the steps to install OMSA on CentOS/RHEL:
https://dell.to/3xbBYfg
Dell Linux Yum Repository

You had a VD running on a predictive-fail disk, so for the PERC to rebuild it, it had to puncture the RAID, that is, mark good sectors as bad on the new drive just to move forward.
If you have a VD puncture, you have to delete and recreate the VD.
https://dell.to/3VJhdRi
https://dell.to/3xaK8Vd (An array that is punctured will eventually have to be deleted)
Please read these carefully.

Respectfully,


dmc1961

1 Rookie

•

3 Posts

0

November 10th, 2023 00:09

@DELL-Charles R
Thank you so much. I was going to do the whole procedure, which you have now verified, and I very much appreciate the extra part on handling the downed drive rather than just pulling it out. Very scary stuff when you don't do this every I.T. day :-)


dmc1961

1 Rookie

•

3 Posts

0

June 16th, 2024 00:23

Apologies for the late response on this one. Lots of other tasks meant it was put off, but I am ready to tackle it now. The issue is that the iDRAC8 interface gives me no option under Physical Drives except to put one of the other non-RAID drives into a RAID; there is no way to select the current Physical Disk 0:1:0 and set it offline. I have been reading how-tos and checking online, but there is just no option to set this drive offline.
