1 Rookie

 • 

14 Posts

September 14th, 2020 07:00

Dell PowerEdge R720 HD Failure - NEED HELP

Hello everyone. First post, so I apologize in advance for any etiquette mistakes.

I'm new to this job and new to server administration. I noticed a light flashing between green and amber on one of the hard drives in my PowerEdge R720. We are under warranty, so a new drive is in the mail right now. Once I receive it, physically replacing it seems easy enough. It's what I should do before and after the replacement that I'm lost on. My predecessor, who retired after 14 years, did not keep any record of setup or maintenance for these servers. We use vSphere for VMs and, based on the iDRAC, it looks like those may be on these servers. We also have an EqualLogic device that houses all the data.

I was able to get hold of my predecessor, and he said that it may be set up as a redundant RAID and that the new drive might "rebuild itself" once installed. It seems risky to just assume he's right, but maybe it's that easy.

What info can I provide to help determine what needs to be done? I've included a couple of screenshots of the iDRAC UI but am unsure if this is the detail you need.

1.png

 

2.png

 

I will be watching this thread like a hawk so I will get you the info you need ASAP. Thanks in advance.

Moderator

 • 

4.7K Posts

September 16th, 2020 07:00

Hello jhines,

 

The Consistency Check and Offline operations can be done in the controller BIOS or, if you have OpenManage Server Administrator (OMSA) installed on the host, from OMSA.

 

To get into the controller BIOS, press Ctrl+R during POST when the controller initializes. Check the legend at the bottom for navigation keys, and press F2 on a selected item to bring up its submenu.

 

 

If you have OMSA installed, see this link:

PERC - How to perform a 'Check Consistency' using OpenManage Server Administrator : https://dell.to/3msxHda

 

Documentation: Dell EMC Server Administrator Storage Management 9.2 User’s Guide : https://dell.to/3kmB7Mp

Page 129 Performing a Check Consistency

Page 114 Setting the physical disk online or offline


Please let me know how it goes.

Moderator

 • 

4.7K Posts

September 14th, 2020 12:00

Hello jhines,

 

I'm sorry to see you have a predictive fail hard drive. Good to see you have a drive on the way.

 

Yes, the drive should rebuild fine once replaced. Of course, it is always recommended to make sure you have a backup.

 

What I'm seeing from the screenshot is that PD 0:1:0 is a predictive-failure drive that is still an online member of the array.

 

Best practice: run a consistency check on the array. After it completes, offline the 0:1:0 drive before pulling it.

You do have a RAID 1, so once you install the replacement the rebuild should start automatically.

If for some reason the rebuild does not start automatically you can assign the replacement drive as a hot spare (global hot spare is fine) and it will start rebuilding.
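
For anyone who wants the order of operations above as a checklist, here is a minimal sketch in Python. It is only an illustration of the sequence described in this post, not a Dell tool: the `ArrayState` class, its field names, and the `next_step` function are all hypothetical, and nothing here talks to the controller.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Hypothetical snapshot of how far along the replacement is."""
    consistency_check_done: bool = False
    failing_drive_offline: bool = False
    replacement_installed: bool = False
    rebuild_started: bool = False
    rebuild_complete: bool = False


def next_step(state: ArrayState) -> str:
    """Return the next action, in the order described in the post above."""
    if not state.consistency_check_done:
        return "Run a consistency check on the array"
    if not state.failing_drive_offline:
        return "Offline drive 0:1:0"
    if not state.replacement_installed:
        return "Pull the offlined drive and install the replacement"
    if not state.rebuild_started:
        # Fallback from the post: only needed if the rebuild
        # does not start automatically.
        return "Assign the replacement as a hot spare to trigger the rebuild"
    if not state.rebuild_complete:
        return "Wait for the rebuild to finish"
    return "Done"


if __name__ == "__main__":
    # Example: the consistency check has finished, nothing else yet.
    print(next_step(ArrayState(consistency_check_done=True)))
```

The point of the ordering is that the drive is only pulled after the check has completed and the drive has been offlined, and the hot spare assignment is only a fallback.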

 

 

Reference:

https://dell.to/35C4c2f

Page 44 Running a Data Consistency Check

Page 62 Physical Disk Management Menu

     Select Physical Disk Operations—Selects and execute physical disk operations such as force offline

 

Please let me know how it goes.

1 Rookie

 • 

14 Posts

September 16th, 2020 07:00

Hi Charles,

Thanks so much for the reply. Due to COVID, I am only in the office one day a week, which is today, so I am looking to get this done today.

What utility should I be using to run this consistency check, and is that the same utility I use to take the drive offline? Right now, the only way I access anything is through the iDRAC interface.

Moderator

 • 

4.7K Posts

September 16th, 2020 08:00

Hello jhines,

 

Yes, you are correct. Prepare the host for a normal shutdown (shut down the VMs). Then reboot, enter the controller BIOS during POST, and run your consistency check. After it completes, offline drive 0:1:0 for replacement. You should see the rebuild start automatically when the replacement is installed.

1 Rookie

 • 

14 Posts

September 16th, 2020 08:00

Hi Charles,

I don't know if my predecessor installed OMSA on the server, and I'm not sure how to check whether that's the case.

How would I access the BIOS?

We do have a display/keyboard hooked up to the server which I took a photo of below. Is this helpful? Apologies for my ignorance here. 

 

Screen Shot 2020-09-16 at 11.14.50 AM.png

 

 

1 Rookie

 • 

14 Posts

September 16th, 2020 08:00

Ok I see. That makes sense.

Am I right in assuming that all VMs in vSphere that are currently running on that machine will go down until the drive is replaced and the server is booted back up?

If that's the case, should I shut those down virtually first?

Moderator

 • 

4.7K Posts

September 16th, 2020 08:00

Hello jhines,

 

Thank you for the update. No apology necessary I'm here to help if I can.

 

From the image it looks like the server is still booted.

 

Reboot the server and, during POST when the H710 PERC controller initializes, press Ctrl+R to enter the controller BIOS.
Refer to the link in my first post to run the consistency check and offline the drive.

 

 

1 Rookie

 • 

14 Posts

September 16th, 2020 08:00

Perfect. Thank you! I will have to wait until the end of the day then. Then I'll reply back with the results.

I read some other forums about this being hot swappable? Is that true? 

Is what you're recommending just a better practice than yanking out the failing drive and installing the new one?

Moderator

 • 

4.7K Posts

September 16th, 2020 09:00

That is correct. After the consistency check and offlining the drive, you pull it and replace it (we are not powering down for this process = hot swap).


The steps I recommend are best practice when you have a predictive-failure drive that is still an online member of the array, as you have.

1 Rookie

 • 

14 Posts

September 16th, 2020 10:00

Thanks Charles. I will be starting in about 20 minutes and I will report back. 

Moderator

 • 

4.7K Posts

September 16th, 2020 11:00

Hello jhines,

 

I see you are in the System BIOS. You need to be in the PERC Controller BIOS

 

See image 1 on this link : https://dell.to/3mvBJkI
The last line shows the point where the controller initializes, and this is when you press Ctrl+R.

1 Rookie

 • 

14 Posts

September 16th, 2020 11:00

@Dell -Charles R I need ya pal! If you're available, please.

1 Rookie

 • 

14 Posts

September 16th, 2020 11:00

HELP!

The server was shut down and has since rebooted. I am in the BIOS screen (held CTRL+R and then hit F2 to enter System Setup), but I cannot find where to run a consistency check or where I can take a drive offline. Please see the screenshot below.

Screen Shot 2020-09-16 at 2.36.53 PM.png

What do I do? 

1 Rookie

 • 

14 Posts

September 16th, 2020 11:00

@Dell -Charles R Ok, I think I'm back on track. I had to click Finish in the bottom right of the screen above, which brought me back to a menu where I found the controller BIOS settings. That led me to the Virtual Disk Management area, where I found Check Consistency. It is running now.

I poked around a bit more and found where to Force Offline a drive. I will do that once the check is done.

My question now is, where will I see that it is rebuilding?

1 Rookie

 • 

14 Posts

September 16th, 2020 12:00

@Dell -Charles R I think I'm on the home stretch. As you can see below, the drive is rebuilding after reinstalling it. I didn't have to force it online. It just did it on its own.

Screen Shot 2020-09-16 at 3.17.37 PM.png

 Once the rebuild is complete, am I good to go to boot up my VMs and start testing functionality? Or is there something else to do?
