Roveer
1 Nickel

Proper way to replace a failed drive on a H710 array controller

I have a H710 array controller in a T320 server.

Boot is set up as Raid 1 with 2 drives mirrored.

Data is set up as Raid 10 with 6 drives.

Last week one of the Boot partition Raid 1 drives failed (drive 0).  The server continues to operate in a degraded fashion.  I have a replacement drive arriving today.

I want to know the correct method to replace this drive without screwing up the second drive.  Obviously I have to install the new drive in the first slot, but what do I need to do with the controller in order to re-establish the RAID 1 without losing the array?

Thanks.

Moderator

RE: Proper way to replace a failed drive on a H710 array controller

Hello

If the drives are in a backplane then they are hot-plug. If the drive is in an offline/failed state then you just need to pull it out, put the replacement drive in the carrier, and then insert the replacement drive. The controller should initiate a rebuild automatically. If it does not, set the replacement drive as a hot spare to initiate the rebuild. I would wait 10-15 minutes to make sure the rebuild has not started; the status may take a few minutes to update in OpenManage Server Administrator.
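
As a rough sketch of the sequence above (this is not a Dell tool; the function name and the state strings are assumptions for illustration, so check what your management interface actually reports), the decision after inserting the replacement drive looks something like this:

```python
# Hypothetical state strings for illustration only; compare them with what
# your management tool actually displays before relying on them.
ONLINE = "Online"
REBUILDING = "Rebuilding"

WAIT_MINUTES = 15  # upper end of the 10-15 minute wait suggested above


def next_step(disk_state: str, minutes_since_insert: int) -> str:
    """Suggest the next action for a freshly inserted replacement drive."""
    if disk_state == ONLINE:
        return "done"
    if disk_state == REBUILDING:
        return "wait for rebuild to finish"
    if minutes_since_insert < WAIT_MINUTES:
        # The status may take a few minutes to update, so do not act yet.
        return "wait and check again"
    return "assign replacement as hot spare"


if __name__ == "__main__":
    print(next_step("Ready", 5))
    print(next_step("Ready", 20))
```

The point of the wait branch is simply to avoid assigning a hot spare while an automatic rebuild is already starting but has not yet shown up in the status.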

You can read more about replacement procedures in the PERC manual:

www.dell.com/support/home/product-support/product/poweredge-rc-h710/manuals

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer

Get support on Twitter @DellCaresPRO

Roveer
1 Nickel

RE: Proper way to replace a failed drive on a H710 array controller

I received the new drive today.  I put it in the enclosure.  It started flashing, the Orange LCD display turned back to blue, the orange light on the back of the box turned blue.

In idrac it still does not show the drive installed on physical disks, and the logical disk still shows degraded.

What more should I be doing?

---update---

I rebooted into h710 setup and it shows the replaced drive and everything looks fine.

I then rebooted and let the OS load up and then went back to idrac.  The LD shows online (no longer degraded), but the PD still does not show the drive.  Not what I was expecting...

Moderator

RE: Proper way to replace a failed drive on a H710 array controller

"the PD still does not show the drive"

Are you saying the replacement disk does not show in the physical disk list? When you view the virtual disk details does it display the disk there?

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer


Roveer
1 Nickel

RE: Proper way to replace a failed drive on a H710 array controller

From iDRAC it does NOT show up in the PD section, yet in the array controller setup (Ctrl-R at boot) it does show up, and the LD shows as online, not degraded like it used to.

Moderator

RE: Proper way to replace a failed drive on a H710 array controller

It looks like an OMSA reporting issue. It is not properly updating the inventory. This is usually resolved by restarting the system, but that doesn't appear to be working.

I would suggest deleting the inventory. If you go to the installation location for OMSA you can delete the inventory xml file(s). By default it would be in C:\Program Files\Dell\SysMgt\oma\log\

Inside that log folder you should see a "cachecfg" file and an inventory.xml.1 file or something similar. Delete the cache file and the inventory file and restart the system. A new inventory will be created. If the issue persists I would suggest uninstalling OMSA, restarting, reinstalling OMSA, and then restarting again.
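
If you would rather script the cleanup than do it by hand, here is a minimal Python sketch. The folder is the default location mentioned above; the file-name patterns and function names are my assumptions (your files may be named slightly differently), so run it with the dry-run default first and look at what it matches before deleting anything.

```python
from pathlib import Path

# Default OMSA log folder quoted above; adjust if OMSA is installed elsewhere.
DEFAULT_LOG_DIR = Path(r"C:\Program Files\Dell\SysMgt\oma\log")

# Assumed patterns based on "cachecfg" and "inventory.xml.1 or something similar".
PATTERNS = ("cachecfg*", "inventory.xml*")


def find_inventory_files(log_dir: Path = DEFAULT_LOG_DIR) -> list[Path]:
    """Return the cache and inventory files found in log_dir, sorted by name."""
    matches = set()
    for pattern in PATTERNS:
        matches.update(p for p in log_dir.glob(pattern) if p.is_file())
    return sorted(matches)


def delete_inventory_files(log_dir: Path = DEFAULT_LOG_DIR,
                           dry_run: bool = True) -> list[Path]:
    """Return the matched files; delete them only when dry_run is False."""
    files = find_inventory_files(log_dir)
    if not dry_run:
        for path in files:
            path.unlink()
    return files


if __name__ == "__main__":
    # Dry run: only lists what would be removed.
    for path in delete_inventory_files():
        print("would delete:", path)
```

After an actual delete (dry_run=False), restart the system as described above so a new inventory is created.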

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer


Roveer
1 Nickel

RE: Proper way to replace a failed drive on a H710 array controller

Seems like I might have bigger problems.  On Tuesday I did a shutdown (power off), removed the power cord for a minute, and restarted.  Since then, when going into the storage section of iDRAC I get error RAC0501.  I also notice the most recent log entries are from 2016, which seems very strange.  This is the first time I'm seeing this.  On Monday I could see all the storage options; since the power off I can't.  Any ideas what I can do to resolve this?  I haven't had any time to research the error yet.

----update----

I looked around a bit and see others have had the same problem.  Seems like the way to fix it is to do an iDRAC reset, power off, drain flea power and restart.  I vaguely remember having to do this once before.  I'll give it a try and report back.

Roveer
1 Nickel

RE: Proper way to replace a failed drive on a H710 array controller

Here's an update to my saga.

Today I tried "resetting iDRAC" from the iDRAC main menu.  Then I powered down, removed power, held the power button to drain flea power, and rebooted.  Same result, still getting the errors.  I then decided to power off, move the H710 to another slot and restart.  Initially it didn't seem to make any difference, still getting errors.  I then rebooted, went into RAID setup (Ctrl-R), exited, rebooted and let it sit a bit.  Now I am once again seeing all the drive parameters and no longer getting any RAC errors.

I'm in the process of putting the RAID controller back in its original slot and will see what happens next.

Not sure what caused this problem or what exactly fixed it, but it seems that I have a workable setup again.  Very strange.

---update---

I moved the H710 back to slot 6, did a reboot, went into array setup, and rebooted again. No joy.

The idrac inventory shows array controller moving from slot 4 to slot 6, all the drives show up as well as the enclosure (in system inventory in idrac), but going to storage in idrac still shows RAC0501 and RAC0503.

I moved the controller back to slot 4 and rebooted, and now all the statistics show up under the storage section.

So either I have a problem with slot 6 (which has had the controller for over a year), or something is corrupted in the data held for this device.  Is there something I can clear to get it to forget anything it might have known about what was in slot 6?  That's where I'd really like the H710 controller to be, since it's away from the other controllers and does generate a fair amount of heat.

ZET_9
1 Copper

RE: Proper way to replace a failed drive on a H710 array controller

Hello!

I will describe the situation.
I am the new system administrator at the company and this is my first time working with Dell servers.
I inherited a couple of Dell PowerEdge servers from the former system administrator.
One of them is a Dell PowerEdge R620 (without LCD display) with a PERC H710.
It has a RAID 1 with 2 hard disks, and one of the disks is in the failed state (the amber LED is lit). The temperature indicator is also lit, but that is a separate question.

OMSA is not installed and iDRAC is not configured.

I need to know the part number to order a new disk, but the documents for the server are lost. How can I find the number without restarting the server?
I'm afraid to reboot because a rebuild might start on reboot and the second hard disk could fail under the load. Can this happen, or does a rebuild not start on its own until the failed disk is replaced with a new one?

According to the manual:

Physical disk failure detection
Failed physical disks are detected and rebuilds automatically start to new disks that are inserted into the same slot. Automatic rebuilds can also occur with hot spares. If you have configured hot spares, the controllers automatically try to use them to rebuild failed physical disks.

Performing A Manual Rebuild Of An Individual Physical Disk
Use the following procedures to rebuild one failed physical disk manually.
1. Press <Ctrl> <N> to access the PD Mgmt screen.
A list of physical disks is displayed. The status of each disk is displayed under the heading State.
2. Press the down-arrow key to highlight a physical disk that has a failed state.
3. Press <F2> to display a menu of available actions.
The Rebuild option is highlighted at the top of the menu.
4. Press the right-arrow key to display the rebuild options and select Start.
5. After you start the rebuild, press <Esc> to display the previous menu.
NOTE: You can also use the VD Mgmt screen to perform a manual rebuild. Use the arrow key to highlight a physical disk, and press <F2>. In the menu that is displayed, select the Rebuild option.

I want to double-check that my planned steps are correct.
1. I enter the utility with <Ctrl> <R>, then press <Ctrl> <N>.
2. If the drive is in the offline/failed state, I pull it out and then insert the replacement drive. The controller should initiate a rebuild automatically. Wait 10-15 minutes.
3. If it does not initiate a rebuild automatically, I do the rebuild manually as written in the instructions.
4. If that still does not start a rebuild, I set the replacement drive as a hot spare to initiate the rebuild.

Help me please!

PdxITGirl
1 Copper

RE: Proper way to replace a failed drive on a H710 array con


@ZET_9 wrote:

Hello!

I will describe the situation.

<cut>

Help me please!


You probably won't get a reply as you hijacked someone else's thread. Best to create a new thread with your issue, as it is unique to you and not relevant to the original poster.