Unsolved

3 Posts

December 9th, 2008 11:00

reverting to an old array of discs for raid controller.

Hello, I have a story of several things not to do, and finally a question I hope someone can answer.

I was charged with upgrading an older Dell PowerEdge machine running Linux with a RAID 5 array, from three 40 GB discs to three 500 GB discs.  The hope was to start at the end of the business day and have it up and running by the next morning.  My plan was to take an image of the machine, remove the old drives, insert the new drives, rebuild the array, and splat the image back on.  I could then resize partitions, etc., as I saw fit.

When I got on site, it turned out there are some issues with semi-modern kernels not seeing the CERC controller, so after trying all my ideas and doing a bit of reading, I realized that getting the image off of the machine was going to be quite difficult.  So I developed plan B: copy the files I needed, remove the three discs, insert new ones, install a new OS, build up the services I needed, and restore the data.  That actually went well, and I did have the machine back in place and in production the following morning, with a few Samba hiccups but no serious deal breakers.  Until I realized I had not copied all the files I needed off of the old array.

I tried using mdadm on a secondary box to see if it would see the array, but it gave me no joy.  So I took the new array out, put the old array in, and tried to boot.  This was mistake number two, I think.  When it didn't work, I put the new array back in.  Now neither of the two sets of discs boots.  Not that I care about the new array; I can build that again, and there is nothing on it that I don't have a way to fix.  The primary objective now is to get the data off of the old array, so right now I just have the old array in there.

So, when in trouble, find documentation.  Actually, lesson learned: find documentation before getting in trouble, but it's too late to do it that way now.  Now that I am more familiar (though still not an expert), I realize there are other ways to accomplish what I was trying to do.  But from what I have been able to glean so far, the CERC controller writes its configuration in two places: in NVRAM on the controller, and on the discs, so that if the discs are put on a controller that doesn't have an NVRAM configuration, the controller will know how the discs are supposed to be set up.  I have removed the array, cleared the configuration, then added the array and looked at the config.  When I view/add the config, I see that it is reporting the size that would be correct for the new array, not the old.  Yet since I cleared the config with no discs on the controller, I can only conclude that it is receiving that data about the array from the discs.  So it would appear the old discs have been written with the config for the new discs.  All three discs report as online, but most things I try, like consistency checks, result in an error about the mismatch between the config and the reality of the situation.

In checking the disks out with fdisk and parted, I see that one disc has no partition table, and two discs have 4 partitions, the first of which on both discs seems to be a small proprietary partition, presumably what the config is written on.  This would have me believe at least two of the discs still contain the data I am after.  I have no particular idea why the one disc reports no partition table, it has suffered nothing that the other two discs haven't also suffered, but even if it is wiped for some reason, having the other two should still provide enough to get the data if I can figure a way to get the controller to use a configuration that will work.
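The reasoning that two of the three discs should be enough rests on how RAID 5 parity works in general: each stripe holds data blocks plus one parity block that is the XOR of the data, so any single missing block can be rebuilt from the others. The lines below are only a minimal sketch of that principle in Python. The block contents are made up, and the sketch says nothing about the CERC's actual stripe size or parity rotation, which are not known from this thread.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# One stripe of a three-disc RAID 5: two data blocks plus their parity.
# The contents are invented purely for illustration.
data0 = b"hello wo"
data1 = b"rld!1234"
parity = xor_blocks(data0, data1)

# Pretend the disc holding data1 is unreadable; rebuild it from the other two.
rebuilt = xor_blocks(data0, parity)
print(rebuilt == data1)
```

The same XOR works whichever of the three blocks is the missing one, which is why a single wiped disc does not by itself mean the data is gone.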

I am still looking for more info, but so far I have not found a way to clear the config on those discs without wiping the data.  What I am hoping is that someone has a trick for me to use to get the controller to either reconfigure itself without wiping the data, or a way for me to manually reconfigure the controller such that it has the correct parameters for the old array instead of the new.  Or, failing that, if there is a recommended method for recovering the data using something other than the controller....

The particular controller in question is a CERC ATA/100.

Any theories/suggestions would be most welcome...

4 Operator

1.8K Posts

December 9th, 2008 16:00

First, I only use SCSI or SATA RAID adapters and have no experience with the CERC, but according to the link below, it has the same BIOS setup as other LSI-based adapters...

http://www.thegeekstuff.com/2008/07/step-by-step-guide-to-configure-hardware-raid-on-dell-servers-with-screenshots/ 

Under no circumstances do you want to answer YES to any prompt to initialize; if you do, it is all over. If you do not initialize, the data will remain on the disks.

Do NOT do any more consistency checks.

Is there any possibility someone documented the RAID configuration or saved a RAID configuration file? If you can duplicate the same settings and save the config without initializing, the RAID should resurrect. I have done this multiple times.

I hope you have kept track of the order of the disks and the slots they originally came from. It seems the CERC supports drive roaming, but I am not sure it does so until the configuration is set.  Have you maintained the disk order?

Clear the configuration with the drives in the slots they originally came from, then try to duplicate the setup from scratch: select all the disks, select RAID 5 as the RAID type, and save the configuration (again, do NOT initialize). At this point the WB or WT and read/I/O choices are not important. Remember to make sure your machine BIOS is set to boot from the RAID card, then restart the server. If this does not work, again, it will not destroy the data. On any config mismatch, choose to go with the BIOS setup, not the on-disk one. If you have not maintained the original disk order, switch the disk order and try it all over again; 3 disks do not give you that many possibilities.
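To put a number on "not that many possibilities": three discs can be arranged in 3! = 6 orders. A small Python sketch that lists them as a checklist to work through follows; the disc labels are hypothetical placeholders, not anything the controller reports.

```python
from itertools import permutations

# Hypothetical labels; in practice these would be whatever is written
# on the physical discs.
discs = ["disc A", "disc B", "disc C"]

def candidate_orders(discs):
    """Return every possible slot ordering of the given discs."""
    return list(permutations(discs))

for n, order in enumerate(candidate_orders(discs), start=1):
    # Position in the tuple corresponds to slot 0, slot 1, slot 2.
    print(f"attempt {n}: " + ", ".join(order))
```

Ticking each attempt off on paper avoids retrying an order already tested.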

Next step

http://www.runtime.org/raid.htm

Anyone else, feel free to jump in, as I have not been in this situation in many years.

 

3 Posts

December 9th, 2008 21:00

no initializing, I got that...

and I haven't done a consistency check since, thanks for that.

what I know about the raid before is that it was raid5.  I also did maintain the disc order.  I knew that was important.

I tried as you suggested, and the config now reports correctly, thank you for that.  You are also correct that it does support roaming.  When I boot, it reports that it cannot find an operating system, and changing the disc order tells me that there is a roaming firmware update.

So I booted into rescue mode from the debian disc that I know has the correct drivers for the raid controller.  fdisk in that environment does detect the raid controller as /dev/sda, but claims there are no partitions on it.  It also claims that the geometry is unusual, larger than normal.  If I do a 'more /dev/sda' though, I do get the appropriate pooploads of garbage I expect to see.  I am at least feeling a little more confident that the data is retrievable.
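One way to see what fdisk is reacting to is to look at the first sector directly. A classic MBR ends with the bytes 0x55 0xAA at offset 510 and carries four 16-byte partition entries starting at offset 446. The Python sketch below parses that layout. It assumes an MBR-style table (reasonable for a machine of this age, but still an assumption), and `read_first_sector` is a hypothetical helper that only ever opens the device for reading.

```python
import struct

SECTOR_SIZE = 512

def parse_mbr(sector: bytes):
    """Return a list of partition entries, or None if there is no MBR signature."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least one full 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        return None  # no boot signature, so no recognisable partition table
    entries = []
    for i in range(4):
        # status, CHS start, type, CHS end, LBA start, sector count
        status, _chs_first, ptype, _chs_last, lba_start, sectors = struct.unpack_from(
            "<B3sB3sII", sector, 446 + 16 * i
        )
        if ptype != 0:  # type 0 marks an unused slot
            entries.append({
                "index": i + 1,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            })
    return entries

def read_first_sector(path: str) -> bytes:
    """Read sector 0 of a device or image file, read-only."""
    with open(path, "rb") as dev:
        return dev.read(SECTOR_SIZE)
```

Run as root, `parse_mbr(read_first_sector("/dev/sda"))` returning `None` would mean the signature itself is missing, while an empty list would mean the signature is there but all four slots are blank.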

I downloaded the demo of Runtime's software.  Going by their documentation, I did not see what I needed to see after the analyzing process, so I did not proceed further.  If I do not succeed, I will try their next level of raidprobe.

I am considering that even though there is no partition reported on the /dev/sda device, the data does appear to be there.  Maybe I can dd it from the array into an existing partition...
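The dd idea amounts to a raw, read-only copy of the array. A minimal Python sketch of the same thing is below, with one deliberate change that is my assumption rather than anything from the thread: it writes to an image file on some other disc instead of into an existing partition, so that nothing on the target gets overwritten by accident. It also stops at the first read error instead of skipping past it.

```python
def copy_image(src_path: str, dst_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Copy src to dst in fixed-size chunks; return the number of bytes copied.

    The source is opened read-only, so the array itself is never written to.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of device or file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

A call such as `copy_image("/dev/sda", "/mnt/spare/old_array.img")` would do the copy; the destination path here is hypothetical and needs room for the full size of the array. Any recovery tool can then be pointed at the image instead of the live discs.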

I really appreciate your comments, thank you very much.  If I figure a trick, I will post back...

4 Operator

1.8K Posts

December 10th, 2008 07:00

Also, do not run anything equivalent to Windows chkdsk.

" I also did maintain the disc order."  One less worry... most techs in crisis do not..I didn't on my first raid 20 years ago.

Sounds like the partition info is messed up (OS level).  In the RAID BIOS setup, under boot selection, is the correct info set there? Sounds like you're OK here, but check.

I am wondering if the consistency check (CC) damaged the OS partition info, but I am guessing here.  If you only had one partition on the array, you could likely recreate it manually, but with more than one it is near impossible to get right manually.
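Recreating a partition by hand comes down to knowing which sector it started on. One common trick is to scan for a filesystem signature. The Python sketch below assumes the old filesystems were ext2 or ext3, which the thread never states, and looks for the ext superblock magic 0xEF53, which sits 56 bytes into the superblock, which itself begins 1024 bytes into the partition. It works on bytes already read into memory, for example a slice of a disc image.

```python
import struct

SECTOR_SIZE = 512
SUPERBLOCK_OFFSET = 1024  # ext2/ext3 superblock starts 1024 bytes into the partition
MAGIC_OFFSET = 56         # position of the magic field inside the superblock
EXT_MAGIC = 0xEF53

def find_ext_candidates(image: bytes):
    """Return sector numbers at which an ext2/ext3 partition might begin."""
    hits = []
    pos = SUPERBLOCK_OFFSET + MAGIC_OFFSET
    # Try every sector boundary as a possible partition start.
    for start in range(0, len(image) - pos - 1, SECTOR_SIZE):
        (magic,) = struct.unpack_from("<H", image, start + pos)
        if magic == EXT_MAGIC:
            hits.append(start // SECTOR_SIZE)
    return hits
```

Every hit is only a candidate. Backup superblock copies and chance matches will show up as well, so each one would need checking before anything is written to a partition table.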

You may need to recover the files with a RAID file recovery program. Any wild chance you printed out any system reports in the past that might contain disk info?

There are other RAID utilities out there; search for "raid recovery". As long as they work in "read only" mode, you're safe.

Let us know how you make out.

See if you can find a forum, specializing in recovery.

http://www.data-recovery-software.net/

You might call Acronis

 

3 Posts

June 12th, 2009 12:00

I ended up using a program called pyflag to do data recovery.  I didn't get all my stuff, but I got some.  I never did recover the full array, though...

4 Operator

1.8K Posts

June 12th, 2009 15:00

Glad you obtained some of the data back...

In the future, always document the RAID setup and get a configuration file backup, such as you can get with LSI Logic's RAID management software. In most of the upgrades I now do, I clone the original array to a standalone drive hanging off a standard SCSI or SATA interface, make the clone drive bootable, and start it up to make sure I have a working clone before any upgrades. I just hate nightmare situations and >24 hour days. I feel for you. My last nightmare was about 5 years ago: a drive flash went bad, taking the arrays with it, along with a few SQL databases, which resulted in a 48-hour straight "day" over a Thanksgiving weekend to have everything working by Monday... never again do I do that if I can possibly prevent it.

 
