June 4th, 2009 09:00

Failed Drive and RAID Redundancy on PowerVault 745N

Hello,

We've got a PowerVault 745N with 4*250G disks running as a live system. Disk 0 failed but I was able to replace it with a spare. It automatically started rebuilding the two arrays and I left it going overnight.

There's a 10GB OS Virtual Disk and a 688GB Data one. It managed to successfully rebuild the OS Virtual Disk in about 20 minutes but seems to have failed on the Data one about 8 hours later.

The failure event in OpenManage Array Manager is simply "CERC SATA1.5/6ch Controller 0, Virtual Disk (DATA 1 1) rebuild failed." and then another to tell me that it's no longer redundant.

The DATA drive is now showing up with a status of "Failed Redundancy". I've tried asking it to "Check Consistency". It warns about making config changes at the same time, but as soon as I press Yes to continue it fails with an error: "Array Manager: Operation failed."

I've also tried asking the command line interface to do it which also fails:

C:\>amcli /c2

Dell OpenManage Array Manager Command Line Interface. Version 3.7.0
Copyright 2002 Dell Computer Corporation. All rights reserved.


Error: A consistency check cannot be performed on the virtual disk

List of virtual disks on the system

No.     Name                     RAID Level     Redundant      State

1.      OS 0                     RAID 5         Yes            READY

2.      DATA 1 1                 RAID 5         Yes            FAILED_REDUNDANCY

Again the error message doesn't tell me what's wrong or how to fix it, just that it can't do what I want it to.

 

What can I do? How can I get it to rebuild this drive?

Thanks

9.3K Posts

June 4th, 2009 11:00

You may have some corrupted RAID data on the data volume. When this happens, the rebuild fails at the point where it encounters the corruption.

 

If the original drive hadn't failed yet, a consistency check could have caught this and fixed it: the data still had redundancy, so any corrupted data could have been re-created using RAID 5's native redundancy.
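To illustrate the point about redundancy, here is a minimal Python sketch of RAID 5 style XOR parity on a single stripe. It is only a toy model: the function names `xor_blocks` and `repair_stripe` are made up for this example, an unreadable block is represented as `None`, and it says nothing about how the CERC controller actually works internally.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_stripe(stripe: Sequence[Optional[bytes]]) -> List[bytes]:
    """Return the stripe with one unavailable block (None) regenerated.

    Raises ValueError when more than one block is unavailable, which is
    the case of a degraded array hitting a bad block during a rebuild.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"cannot recover blocks {missing}: redundancy exhausted")
    repaired = list(stripe)
    if missing:
        # The missing block is the XOR of every other block in the stripe.
        repaired[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return repaired


# Hypothetical stripe: three data blocks plus their parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
parity = xor_blocks([d0, d1, d2])

# All four disks present, one unreadable block: it can be regenerated.
print(repair_stripe([d0, None, d2, parity])[1] == d1)

# One disk already gone plus one unreadable block: nothing left to rebuild from.
try:
    repair_stripe([None, None, d2, parity])
except ValueError as err:
    print("rebuild stops:", err)
```

With all disks present, a single bad block is recoverable. Once a disk is already missing, the same bad block leaves the stripe with two unknowns, and the rebuild has to stop there.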

 

Odds are that the only way to 'fix' this is to copy all data off the data volume, delete the data volume, recreate it (let it initialize), and copy your data back. Then, to prevent this from happening on the next drive failure, schedule a consistency check every month or so.

847 Posts

June 4th, 2009 16:00

You have to watch the real drive size as well. Often, older drives sent out by manufacturers have more sectors mapped out, so the usable space can be slightly less. If the replacement is even a little smaller than the other drives of the same model, it won't rebuild back into the RAID.

 

Not sure, but just an additional thought on it.

5 Posts

June 5th, 2009 08:00

It appears to be more like an application error saying that it can't even start the rebuild rather than an error saying that it started the check and the check failed. Or do you mean in the data where the RAID controller stores the RAID definitions? Though presumably it either stores that on the controller itself or on the disks outside of the arrays we create?

Also, there are no more "rebuild started" events logged to suggest that it has tried to start again.

My understanding was that with a missing disk it just wouldn't know which parts of the array were corrupt or not and should be able to recalculate the parity or missing data from what remained? Obviously if one of the remaining disks has corruption on it then the parity would be computed based on that and we'll have to restore those files from previous backups.

As the Data volume appears to be fine when accessed from Windows, it should just be able to use the data to recreate the failed disk, no?
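For what it's worth, the reasoning above can be sketched in a few lines of Python. This is a simplified model, not the controller's actual algorithm: each disk is a list of blocks, parity rotation is ignored (XOR is symmetric, so the missing block is always the XOR of the others), and `rebuild_disk` is a name made up for this example.

```python
from typing import List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rebuild_disk(surviving_disks: Sequence[Sequence[bytes]]) -> List[bytes]:
    """Regenerate the replaced disk, stripe by stripe, from the survivors."""
    stripe_count = len(surviving_disks[0])
    return [
        xor_blocks([disk[s] for disk in surviving_disks])
        for s in range(stripe_count)
    ]


# Hypothetical four-disk array with two stripes; disk3 holds the parity here.
disk0 = [b"\xaa\xbb", b"\xcc\xdd"]
disk1 = [b"\x11\x22", b"\x33\x44"]
disk2 = [b"\x05\x06", b"\x07\x08"]
disk3 = [xor_blocks([disk0[s], disk1[s], disk2[s]]) for s in range(2)]

# Disk 0 is replaced: its contents come back from the three survivors.
print(rebuild_disk([disk1, disk2, disk3]) == disk0)

# Silent corruption in stripe 1 of disk1: the rebuild completes without
# any error, but the regenerated block for that stripe is wrong.
disk1_bad = [disk1[0], b"\x00\x00"]
rebuilt = rebuild_disk([disk1_bad, disk2, disk3])
print(rebuilt[0] == disk0[0], rebuilt[1] == disk0[1])
```

In this model, silent corruption on a surviving disk does not stop the rebuild; it just produces wrong data for the affected stripe. A rebuild that aborts partway is more consistent with a block that could not be read at all.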

 

"Odds are that the only way to 'fix' this is to copy all data off the data volume, delete the data volume, recreate it (let it initialize), and copy your data back. Then, to prevent this from happening on the next drive failure, schedule a consistency check every month or so."

This sounds like it's our only option at the moment, which is disappointing as it defeats the whole point of having hot-swappable drives. I'd like to avoid doing this if possible because it would mean a number of hours of downtime.

 

JOHNADCO:

The replacement disk is a different make and model from the others, but Array Manager reports it as being slightly bigger than the others, leaving 32.14 MB unallocated as opposed to 145 KB. Does that mean it's OK in this respect, or do the values reported by Array Manager mean something else?

847 Posts

June 8th, 2009 14:00

Should be fine.

 

Definitely stinks when a RAID won't rebuild. Sounds like it's still running though? Get that data off and rebuild the affected RAID.

5 Posts

June 8th, 2009 14:00

Cheers.

I deleted the array a couple of hours ago and it's rebuilding it now. I made sure to leave some space at the end just in case. We weren't using anywhere near the whole volume so that's not a problem and if it saves having to do this again, it'll be worth it.

Looks like it's going to take another few hours though, unfortunately.

Do you know if I can start using it whilst it's still rebuilding? It says it's resynching/not redundant but it appears as an online, unallocated basic disk in disk management and it looks like it will let me create a partition on it. I can't find any information online about whether that's ok or if it's a REALLY BAD idea.

Would mean I could start the backup now though rather than waiting till 3am! :(

5 Posts

June 9th, 2009 03:00

Well I was good and waited.

It didn't pay off because it stopped before it finished the rebuild again. This time the whole of disk 0 has gone offline so we're back to stage 0 and the C: drive is also degraded now too.

5 Posts

June 11th, 2009 02:00

After swapping in a different brand new disk, it rebuilt both the arrays and is all running fine again.

847 Posts

June 11th, 2009 09:00

One bad apple spoiled the whole bunch? It does happen. I'll bet you're decently irritated with it about now? :)
