November 17th, 2017 16:00

Predictive Failure on RAID drive in a PowerEdge 320

Currently one of our Dell servers (a PowerEdge 320) is showing a predictive failure, indicated by flashing green and amber LEDs on the disk and a S.M.A.R.T. alert. The server is running ESXi 5.1 and has 3 VMs running that I can't shut down. What is the correct procedure for replacing the failing drive? I was thinking of using MegaCli to do the hard drive swap. Does Dell recommend another procedure that would not disrupt server operation? I should also mention there is no iDRAC port on this server.
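Before touching anything, it may help to confirm which enclosure and slot the controller itself flags, since that pair is the [E:S] value the MegaCli commands expect. The following is a minimal sketch, assuming the text format printed by `MegaCli -PDList -aALL` uses the field labels shown; the function name `flagged_drives` and the sample input are made up for illustration and should be checked against real output from the server.

```shell
#!/bin/sh
# List enclosure:slot for drives whose predictive failure counter is non-zero.
# Reads `MegaCli -PDList -aALL` style text on stdin.
# Assumption: the field labels below match this MegaCli build.
flagged_drives() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }    # drop trailing whitespace before matching
        /^Enclosure Device ID/      { enc = $2 }
        /^Slot Number/              { slot = $2 }
        /^Predictive Failure Count/ { if ($2 + 0 > 0) print enc ":" slot }
    '
}

# Made-up sample input for illustration only (not output from this server).
flagged_drives <<'EOF'
Enclosure Device ID: 32
Slot Number: 0
Predictive Failure Count: 0
Enclosure Device ID: 32
Slot Number: 1
Predictive Failure Count: 4
EOF
```

On the host itself the same function would be fed from the tool, for example `MegaCli -PDList -aALL | flagged_drives`.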

I was going to use the following commands in MegaCli:

Set the drive offline, if it is not already offline due to an error

MegaCli -PDOffline -PhysDrv [E:S] -aN

Mark the drive as missing

MegaCli -PDMarkMissing -PhysDrv [E:S] -aN

Prepare drive for removal

MegaCli -PDPrpRmv -PhysDrv [E:S] -aN

Change/replace the drive

If you’re using hot spares then the replaced drive should become your new hot spare drive:

MegaCli -PDHSP -Set -PhysDrv [E:S] -aN

In case you’re not working with hot spares, you must re-add the new drive to your RAID virtual drive and start the rebuilding process:

MegaCli -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN
MegaCli -PDRbld -Start -PhysDrv [E:S] -aN
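To reduce the chance of a typo on a live server, the sequence above can be printed with the real values filled in and reviewed before anything is run. This is only a sketch: the function name `replace_plan` is hypothetical, it prints commands rather than executing them, and the example values (drive 32:1, adapter 0, array 0, row 1) are placeholders, not values from this server.

```shell
#!/bin/sh
# Print (not run) the MegaCli steps from the post for one drive.
# Usage: replace_plan ENCLOSURE:SLOT ADAPTER [ARRAY ROW]
# ARRAY and ROW are only needed when the new drive must rejoin the array;
# without them the new drive is set up as a hot spare instead.
replace_plan() {
    drv="[$1]"
    ad="-a$2"
    echo "MegaCli -PDOffline -PhysDrv $drv $ad"
    echo "MegaCli -PDMarkMissing -PhysDrv $drv $ad"
    echo "MegaCli -PDPrpRmv -PhysDrv $drv $ad"
    echo "# swap the physical drive here"
    if [ -n "$3" ] && [ -n "$4" ]; then
        echo "MegaCli -PdReplaceMissing -PhysDrv $drv -Array$3 -row$4 $ad"
        echo "MegaCli -PDRbld -Start -PhysDrv $drv $ad"
    else
        echo "MegaCli -PDHSP -Set -PhysDrv $drv $ad"
    fi
}

# Placeholder values for illustration only.
replace_plan 32:1 0 0 1
```

Printing first keeps the step that takes the drive offline from being issued against the wrong slot; each line can then be pasted and run one at a time.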

Do I have any other options?

 

November 20th, 2017 13:00

Thanks for the reply.

I ended up using MegaCli because I couldn't restart or reboot the server, and OpenManage Server Administrator requires a reboot of the server to take effect.

I used the following link to figure out how to replace the drive using MegaCli commands:

https://supportforums.cisco.com/t5/security-documents/megacli-common-commands-and-procedures/ta-p/3114544

Moderator

November 20th, 2017 13:00

Hello

I would recommend that you use our tools. OpenManage Server Administrator can be installed on the host and managed from a remote management station. You can use MegaCli if you like, but it is not a tool we support.

I'm assuming that you are using a RAID level that has some redundancy and that replacing the drive will not take the virtual disk offline. The recommended steps are:

  1. Get a valid, tested backup of any important data.
  2. Run a consistency check.
  3. Offline the predictive failure drive.
  4. Hot swap the predictive failure drive with the replacement if you have a backplane. If you have a cabled drive configuration that does not support hot swap, then shut down the system and swap the drives.
  5. A rebuild should start automatically. It can take several minutes for management and monitoring programs to properly display a rebuild in progress. If a rebuild does not start automatically, you can set the replacement drive as a global hot spare to initiate a rebuild.

If your controller does not support one of the features listed then skip that step. The H710 should support all of the steps outlined.
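For reference, steps 2, 3 and 5 can also be expressed with the OMSA command line (`omreport` and `omconfig`). The sketch below only prints the commands; the function name `omsa_plan` and the IDs (controller 0, virtual disk 0, physical disk 0:0:1) are placeholders. It assumes the OMSA CLI is available where the commands are run, which may not hold on an ESXi host managed from a remote station, so the syntax should be checked against the OMSA CLI guide for the installed version.

```shell
#!/bin/sh
# Print (not run) OMSA CLI equivalents of steps 2, 3 and 5.
# Usage: omsa_plan CONTROLLER VDISK PDISK
# PDISK is written connector:enclosure:slot, for example 0:0:1.
omsa_plan() {
    echo "omreport storage pdisk controller=$1"
    echo "omconfig storage vdisk action=checkconsistency controller=$1 vdisk=$2"
    echo "omconfig storage pdisk action=offline controller=$1 pdisk=$3"
    echo "# hot swap the drive, then check its state again:"
    echo "omreport storage pdisk controller=$1"
    echo "# only if no rebuild starts on its own:"
    echo "omconfig storage pdisk action=assignglobalhotspare controller=$1 pdisk=$3 assign=yes"
}

# Placeholder IDs for illustration only.
omsa_plan 0 0 0:0:1
```

The second `omreport` call is there because, as noted in step 5, the rebuild state may take a few minutes to show up.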

Thanks

9 Legend

November 20th, 2017 13:00

"OpenManage Server Administrator requires that I do a reboot of the server to take effect"

It should not, and if it does, that is not typical and would have something to do with your server's configuration. OMSA hasn't routinely needed a reboot on installation for about a decade.
