May 21st, 2014 20:00
PowerEdge w/ PERC 6/i - raid configuration question
I would like to apologize in advance if my situation/question could have easily been answered by better searching. I'm a little stressed and wasn't able to effectively search for exactly what I need to know.
I will describe the situation first and then mention specifically what I would like to do and need to know.
I have a PowerEdge tower server with a PERC 6/i controller and 8 drives of 1TB each, configured as a RAID-5. I've had drive failures in the past and promptly replaced them. I just had 2 drives fail at the same time.
- Shut off server.
- Replace bad drive 1 with a new one (disk 1).
- Restart.
- Mark replaced drive as hot spare.
- No chance of rebuilding.
- Shut off.
- Changed bad drive 1 back to defective disk.
- Restart.
- Foreign configuration problem on drive 1.
- Shut off.
- Replace bad drive 2 with a new one (same disk 1).
- Restart.
- Foreign configuration problem on both drives.
- Shut off.
- Changed bad drive 2 back to defective disk.
- Replace bad drive 1 with new one (same disk 1). The bad drive is making noise.
- Restart.
- Clear foreign configuration.
- Both previously bad drives are now "ready". (drive 1 is new, drive 2 is original)
- Can't rebuild.
I know there's no chance of recovering drive 1, but drive 2 doesn't appear to be actually damaged (or can operate long enough for me to copy the data). Is there any way to reconfigure the original drive 2 to be considered part of the array as the other drives? At no point did any drive or the array attempt to be rebuilt, so I'm hoping dearly that the data is still on that drive, but the RAID configuration was reset somehow on/for that drive. I recall reading that RAID configuration data is stored on the drive itself (possibly in place of the partition table) or maybe in the RAID controller.
I know it goes without saying, but I desperately need some help and hope to get my data back. Thank you kindly in advance to anyone that may be able to help.


theflash1932
11 Legend
•
16.3K Posts
0
May 21st, 2014 22:00
RAID 6 allows for two drive failures without losing data, but it is not bulletproof either, so a traditional backup solution should always be employed.
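To put the RAID 5 versus RAID 6 trade-off in numbers, here is a minimal sketch of the usable-capacity arithmetic for the 8 x 1TB setup described in this thread. The function name and the minimum-drive check are illustrative assumptions, not anything produced by the PERC tools.

```python
def usable_capacity_tb(drives, size_tb, parity_drives):
    """Usable capacity of a parity RAID level (hypothetical helper).

    parity_drives=1 models RAID 5, parity_drives=2 models RAID 6.
    """
    # Assumed minimums: RAID 5 needs at least 3 drives, RAID 6 at least 4.
    if drives < parity_drives + 2:
        raise ValueError("not enough drives for this RAID level")
    return (drives - parity_drives) * size_tb


raid5_tb = usable_capacity_tb(8, 1, parity_drives=1)  # survives 1 failed drive
raid6_tb = usable_capacity_tb(8, 1, parity_drives=2)  # survives 2 failed drives

print(f"RAID 5: {raid5_tb} TB usable, RAID 6: {raid6_tb} TB usable")
```

With eight 1TB drives, moving from RAID 5 to RAID 6 costs one more drive's worth of capacity in exchange for tolerating a second failure.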
Yes, you will need 8 drives. If your original drive does not work at all - as in, does not power on, etc. - then use a blank, BUT AS SOON AS you configure RAID, you MUST force offline the "blank" disk. The longer it is online with the other disks, the greater the chances of it destroying the entire array.
If a retag does not work, then you will have no option left but to use a professional data recovery service. A retag can make things harder/worse for data recovery efforts, so if you MUST get the data back, the best chance of recovering your data is calling a data recovery professional before attempting a retag.
dellsprt
3 Posts
0
May 21st, 2014 22:00
Thank you for your detailed reply and additional advice/information. I will certainly reconfigure my setup to account for dual (maybe multiple) drive failures.
The data is important to me and others. It will take some time, but I will look into creating a secondary setup to which I can image the remaining 7 working drives before performing your suggestions and probably use it as further backup afterwards. Do you think this is enough to give a more serious recovery option a better chance at working?
Last I checked, no drives showed as foreign. In order to retag, I probably need all 8 drives. Since one doesn't work, how shall I proceed? Use a blank one in its place?
Thank you once more.
theflash1932
11 Legend
•
16.3K Posts
0
May 21st, 2014 22:00
Here are a couple of places where this went wrong that should help the next time around:
- NEVER power off a server to replace a hot-swappable drive. If it is hot-swap, swap it hot.
- You should have imported the foreign configuration instead of clearing it. Clear only when the data is online and accessible; import only when the data is offline or inaccessible. Better yet, ask before doing one or the other. ("the RAID configuration was reset somehow on/for that drive" -> this is what "clearing" the config did)
- RAID 5 will allow you to lose one drive. If you have two drives fail, the array is dead and a rebuild is NOT possible. You must first bring online one of the failed drives before a rebuild of the last missing disk is possible. Obviously, under certain circumstances, a rebuild is never possible if the missing RAID disk(s) is bad or completely non-functional.
- RAID is NOT a substitute for a backup system. It should only be considered a first line of defense against downtime in the event of a drive failure.
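To illustrate why RAID 5 survives one failed drive but not two, here is a minimal sketch of single-parity XOR on one stripe. It is a toy model with a hypothetical 4-disk layout and made-up block contents, not how the PERC 6/i lays out data on disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk RAID 5: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# One disk lost: XOR of the three survivors reproduces the missing block.
lost = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
recovered = xor_blocks(survivors)
assert recovered == stripe[lost]

# Two disks lost: the survivors only yield the XOR of the two missing
# blocks combined, which is not enough to separate them again.
lost_pair = (0, 1)
survivors = [blk for i, blk in enumerate(stripe) if i not in lost_pair]
combined = xor_blocks(survivors)
assert combined == xor_blocks([stripe[0], stripe[1]])
```

This is why one of the two failed drives has to be brought back online first: only then is there a single unknown left for parity to solve.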
Your ONLY hope:
1. Power down.
2. Remove the "new" replacement drives and put them in a box until this is sorted out.
3. Put the original drives back in. Their health status doesn't matter - anything but the originals is completely useless.
4. Power up, boot to CTRL-R, PD MGMT ... do any drives show as Foreign?
If so, go to VD MGMT, highlight the controller, F2, and attempt to Import the foreign configuration.
If not and they show as Ready, you will need to perform a "retag" ... this involves deleting your RAID 5 and recreating it, using the exact same settings as the original WITHOUT initializing the array during creation.
5. See if the OS boots and your data is accessible. If so, the FIRST and ONLY concern should be backing up your important data.
If you MUST get the data back, call Data Recovery before performing a retag.
dellsprt
3 Posts
0
May 22nd, 2014 13:00
This particular system had a somewhat dual role of current/backup file server. I will set up another system to act as a true backup. I've had 3 separate drives go bad on this system in the past <5 years, at different times, without losing any data. Given so many drive failures, I don't have much confidence in them and wanted to have better fault tolerance; thus the suggestion of raid-6.
Do you know whether it will make any difference if I image all the original drives to a backup before attempting any recovery? I mean, if something goes wrong, can I image them back and try again? Or better yet, try recovering/retagging using a different batch of drives copied from the originals, so that I can save the originals for data recovery if needed.
I think this should be possible since there would be no way to ever recover from a controller failure otherwise. If I knew the specifications for the configuration stored on the drives themselves, I could try resetting the drive that went bad -> foreign -> ready to its previous state of simply being good and part of the array. This would allow me to either rebuild the array or back up the data, since I should be able to access it with only one truly bad drive. I'm hoping this configuration data isn't proprietary, but do you happen to know more about it?
theflash1932
11 Legend
•
16.3K Posts
0
May 22nd, 2014 13:00
It is unlikely that cloning the disks will capture the RAID metadata, making the copy worthless, but I'm not sure - I've never attempted such an operation.
The RAID configuration is stored in two places: on the controller, and on the drives. When the controller boots up, it loads its configuration and compares it to the configuration on each disk. If a disk's configuration doesn't match the controller's (including timestamp), the controller marks that disk as foreign, and the disk stays in a foreign state until the user tells the controller what to do with it. So, in the event of a controller failure, the controller is replaced, the system powered on, and the foreign configuration imported. In the event that a single disk is foreign (and the data is still accessible), that configuration must be cleared before the drive is "ready" for the controller to use it for any "ready" operation (assign hot-spare to rebuild missing disk, configure new VD, etc.).
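The compare-and-mark behavior described above can be sketched as a toy model. Everything here is a hypothetical illustration: the RaidConfig fields, the state names, and the classify_disks function are assumptions for the sake of the example, not the PERC firmware's actual data structures or on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidConfig:
    """Hypothetical stand-in for RAID metadata: an array ID plus a timestamp."""
    array_id: str
    timestamp: int


def classify_disks(controller_cfg, disk_cfgs):
    """Return a state per slot by comparing each disk's config to the controller's."""
    states = {}
    for slot, cfg in disk_cfgs.items():
        if cfg is None:
            # No RAID metadata on the disk (blank, or config was cleared).
            states[slot] = "ready"
        elif cfg == controller_cfg:
            states[slot] = "online"
        else:
            # Metadata exists but does not match, timestamp included.
            states[slot] = "foreign"
    return states


current = RaidConfig("raid5-vd0", timestamp=100)
stale = RaidConfig("raid5-vd0", timestamp=90)

# Slot 1 carries older metadata, slot 2 has had its configuration cleared.
states = classify_disks(current, {0: current, 1: stale, 2: None})

# A replacement controller has no stored config, so every member looks foreign.
after_swap = classify_disks(None, {0: current, 1: current})
```

In this model, importing would mean adopting the disks' configuration on the controller, while clearing would set a disk's configuration to None, which is why a cleared drive shows up as "ready" rather than as part of the array.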