May 6th, 2014 16:00

R710 Disk Failure

Hello, I have a single disk in a RAID 1 that is blinking green, amber, off, which signifies the disk is in predicted failure. I didn't purchase or set up the machine, so I'm only going off what I can see. Those two bays have two 500GB SATA hot-swap drives according to the Dell invoice I found. The storage in VMware ESXi shows just under 500GB, so I am assuming they are in a RAID 1 configuration, although I can't confirm without rebooting the machine. The virtual machines on this array are backed up and replicated to another machine. The array houses our domain controller, so having issues with it could be a huge problem, and having to fail it over to another host could cause issues.

I've ordered a replacement from Dell, even though the machine is off warranty. Should I try pulling the drive, waiting 10 seconds and re-seeding it to see if it rebuilds? I've heard shutting the machine down could be fatal so I'm not too keen to do so. I also won't have the replacement for a few business days, so my options are limited in the meantime.

9 Legend • 16.3K Posts

May 6th, 2014 19:00

"I've heard shutting the machine down could be fatal so I'm not too keen to do so."

Why? I mean, there is no real need to shut down the machine at this point, but why would it be "fatal"?

"Should I try pulling the drive, waiting 10 seconds and re-seeding it to see if it rebuilds?"

You could ... whether or not it rebuilds depends on why it went offline in the first place.  If it was just a temporary or firmware glitch, then it would rebuild and be happy, but if the drive is bad there is no point in rebuilding it - even if it does rebuild, chances of it failing again are high.  To test the drive, you would need to reboot the system to the bootable diagnostics (or test it in another system).  If you want to try to rebuild it, just remove it (for up to 30 or even 60 seconds - to allow the controller to acknowledge it has been removed), then put it back in to see if it rebuilds.

May 6th, 2014 19:00

Do I even bother trying to rebuild the drive then, or leave it and wait for the new one to arrive? Dell was saying May 12th, however I told the representative to rush it if at all possible, money not an issue.

9 Legend • 16.3K Posts

May 6th, 2014 19:00

It depends on too many unknowns to just say "yes" or "no" ... it's up to you.  If it rebuilds, and lasts for a few days, it's that many days that your array is not susceptible to data corruption.  If it doesn't rebuild, no big deal - it would have sat in an offline state for a week anyway.

May 7th, 2014 14:00

I'm going to leave it as is and wait for the new one to arrive. I have changed our replication schedule to twice a day and kept a record of the current MAC addresses of all VMs on that array, so I can quickly recover those VMs if those disks fail (theoretically speaking).

4 Operator • 1.8K Posts

May 7th, 2014 16:00

"Should I try pulling the drive, waiting 10 seconds and re-seeding it to see if it rebuilds?" We only wish we could re-seed a drive.

If you work with RAID arrays a great deal, there are times you know you have a chance of resurrecting a drive, but most of the time when a drive fails or is logged as being in predictive failure, the drive needs replacement.

Every time you cause an array to rebuild, there is a greater than usual chance that a second disk in the array will fail, permanently killing the array. So if you want to attempt resurrecting a failed/failing disk, you're adding extra risk to the array; the larger the drives or array size, the greater the risk. I have seen a very few disks truly resurrect; most flatline again within hours or a few days. Best to have a hot spare in place, and get a new drive as soon as possible.
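As a rough illustration of why the risk grows with drive size, here is a minimal sketch that estimates the chance of hitting at least one unrecoverable read error (URE) while reading an entire surviving drive during a rebuild. The URE rate of one error per 10^14 bits is an assumption (a figure often quoted on consumer SATA datasheets), not something measured on this R710, and the function name is made up for the example. It models read errors only, not whole-drive failure.

```python
import math

def rebuild_ure_probability(drive_bytes: float, ure_rate_bits: float = 1e14) -> float:
    """Estimate the chance of at least one URE while reading a whole drive.

    ure_rate_bits is an ASSUMED datasheet-style figure (one error per
    this many bits read); check the actual drive's spec sheet.
    """
    bits_read = drive_bytes * 8
    # 1 - (1 - 1/rate)^bits, computed with log1p/expm1 for numerical stability
    return -math.expm1(bits_read * math.log1p(-1.0 / ure_rate_bits))

# Compare a few hypothetical drive sizes (decimal gigabytes)
for size_gb in (500, 2000, 4000):
    p = rebuild_ure_probability(size_gb * 1e9)
    print(f"{size_gb} GB drive: about {p:.1%} chance of a read error during rebuild")
```

Under that assumed rate, a 500GB mirror rebuild carries a small but non-zero risk, and the figure climbs steadily as the drives get bigger, which is the reason not to trigger rebuilds casually.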

As soon as possible does not mean ordering a drive 4 days after it fails. I had a client order a drive 4 days after failure (4 days after I told them to order a drive, and they insisted they wanted to place the order themselves). The drive was not shipped out fast and came in on the 10th day, one day after a second drive failed. Some people bring on their own failures.

 
