Unsolved

3 Posts

September 21st, 2017 16:00

Replace Hard Drive on PERC H710, PowerEdge R520

I have a PowerEdge R520 with six 600 GB SAS drives in RAID 5, running as a VM host for critical SP and ERP servers. Last week the server crashed. I went to the data center and found that one of the six drives had a blinking orange light and the server was stuck in BIOS, unable to load the OS. Upon restarting, a "Foreign Configuration" message popped up. I imported the foreign configuration and was able to boot to the OS. The server stayed up in this state for a few hours before it crashed again. I ordered a new drive and took it to the data center today. The server had crashed again. I replaced the old drive with the new one and got the BIOS message about the "foreign configuration". I pressed C, and another message appeared in BIOS saying that the controller configuration would be changed and that this is an irreversible process. I stopped at that point to get some advice before moving forward. Here are my questions.

1) In RAID 5, a single disk failure should not take the server down. Why does my server keep crashing? I did not find anything related to the controller when I was able to boot to the OS earlier this week.

2) In what order should I swap the drive? Should I put the old bad drive back in the system, import the foreign configuration, boot to the OS, and then do a hot swap?

3) If I start the server with the new drive inserted, what should I do with the foreign configuration message and the message saying "you are about to change the configuration on the controller"? Is this the safest route?

The whole idea behind this long message is to avoid any DATA LOSS. I have not pressed any wrong keys so far, so all of my current data is intact, and I can boot to the OS using the old disk, although it will crash again within a few hours. I certainly don't want to lose any data or configuration at this point.

Thanks

12 Elder

6.2K Posts

September 21st, 2017 18:00

Hello

1) In RAID 5, a single disk failure should not take the server down. Why does my server keep crashing? I did not find anything related to the controller when I was able to boot to the OS earlier this week.

It is possible for a drive to experience a hardware fault that can disrupt communication on the controller and knock other drives into failed states. Your issue may be a single bad drive, but you may also have other issues.
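To see why one failed drive should leave a RAID 5 virtual disk Degraded but still readable, while a second member dropping out (for example, because a faulty drive disrupts communication on the controller, as described above) takes it down, here is a toy Python model of RAID 5 parity. It is a sketch under simplifying assumptions, not how the PERC H710 firmware works: the helper names `xor_blocks`, `make_stripe`, and `rebuild` are illustrative, and parity is kept in a fixed position instead of being rotated across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Build one toy RAID 5 stripe: the data blocks plus an XOR parity block.

    Real RAID 5 rotates parity across the member drives; this sketch always
    puts it last to keep things simple.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(members):
    """Return a complete stripe, recomputing a member that is None (missing).

    Because the XOR of all members is zero, one missing member equals the XOR
    of the survivors. With two members missing there is not enough
    information left, so the stripe cannot be recovered.
    """
    missing = [i for i, block in enumerate(members) if block is None]
    if len(missing) > 1:
        raise ValueError("RAID 5 cannot recover more than one missing member")
    rebuilt = list(members)
    if missing:
        survivors = [block for block in members if block is not None]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Six members, like the six-drive array in this thread: five data blocks + parity.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"])

one_lost = list(stripe)
one_lost[2] = None                  # a single failed drive: degraded but readable
assert rebuild(one_lost) == stripe

two_lost = list(stripe)
two_lost[2] = two_lost[4] = None    # a second drive drops out
try:
    rebuild(two_lost)
except ValueError as err:
    print("virtual disk offline:", err)
```

The same XOR step, repeated for every stripe, is roughly what a rebuild onto a replacement drive does, which is why a rebuild needs a Degraded virtual disk with all remaining members online.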

2) In what order should I swap the drive? Should I put the old bad drive back in the system, import the foreign configuration, boot to the OS, and then do a hot swap?

The virtual disk can only be rebuilt from a degraded state. If it is in a foreign state then you can try importing without the old drive installed. If the import fails then you can try again with the old drive inserted.

The whole idea behind this long message is to avoid any DATA LOSS.

You should get the data backed up if it is important. Once the virtual disk is online, the data is backed up, and you have a controller/TTY log for your records you can start troubleshooting the virtual disk issue.

I would start by running a consistency check. This will verify all data and try to correct any issues. Once the consistency check completes save the hardware logs and then clear them. Once the hardware logs are cleared run diagnostics on all of the drives.
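A consistency check compares each stripe's redundant (parity) data with the data it protects. The Python sketch below is a simplified illustration, not the H710's actual algorithm: it assumes the data blocks are correct and regenerates parity from them, and the names `xor_blocks` and `consistency_check` are hypothetical. It also shows why a check has nothing to compare against on a degraded virtual disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def consistency_check(stripes):
    """Toy RAID 5 consistency check.

    Each stripe is a list of data blocks followed by one parity block. The
    parity is recomputed from the data; where it disagrees, the stored parity
    is overwritten. Returns the indices of the stripes that were corrected.
    A stripe with a missing member (None) has no redundancy left to compare
    against, so the check refuses to run on it.
    """
    corrected = []
    for n, stripe in enumerate(stripes):
        if any(block is None for block in stripe):
            raise ValueError(f"stripe {n} is degraded; nothing to check against")
        *data, parity = stripe
        expected = xor_blocks(data)
        if parity != expected:
            stripe[-1] = expected
            corrected.append(n)
    return corrected


good = [b"\x01\x02", b"\x03\x04", b"\x02\x06"]   # parity matches the data
bad = [b"\x01\x02", b"\x03\x04", b"\xff\xff"]    # stale parity block
print(consistency_check([good, bad]))            # prints [1]
```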

The diagnostics and past drive failures should be enough information to determine if a drive needs to be replaced. If you need to replace a drive then offline the drive and hotswap it. If a rebuild does not start automatically then set it as a hot spare to initiate a rebuild. You can find a link to storage controller manuals on our PERC page.

https://www.dell.com/perc

Thanks

3 Posts

September 21st, 2017 21:00

Thanks for your valuable reply. Here is what I plan to do based on your feedback. Let me know if this is the correct path.

I will pull the 6th (bad) hard drive, leave it out, and start the server. It will error out and ask me to go into the controller settings, since the system has found a foreign configuration. With 5 drives, I will import the foreign configuration by pressing C and "hopefully" this will boot the system to the OS. I have done this twice this week with the bad drive in the system, so I am assuming the import will also work with 5 drives. Do I need to run Clear first, or is Import alone fine? Once in the OS, I will open OMSA, where I should find the VD in a Degraded state. At this point, I will insert the new hard drive into the empty bay where the bad drive used to be. Will this start rebuilding the RAID? Will the VD change from Degraded to Online automatically, or do I have to do that manually?

Do I run Windows 2008 CHKDSK for the consistency check, or should I use another utility?

Is there a backup plan if the foreign configuration import fails? Will the foreign configuration option still be available if it fails for some reason?

My guy who does all of this is out on a two-week vacation, and I am not up to date with this stuff anymore, as I have moved on to the management side of things. Rebuilding the server is not something I am looking forward to, especially because of a single wrong keypress. I really appreciate any help on this one.

Thanks.

3 Posts

September 22nd, 2017 08:00

A few more observations this morning.

1) Inserted the predictive-failure drive back into the server. Imported the foreign configuration and was able to boot to the OS. In OMSA, the VD is still in a Degraded state. Went to the bad drive in OMSA and tried to set it "Offline", but that option is not available. Blink and Unblink are the only options.

2) Tried to be a little brave and pulled the bad drive out without taking it offline. The server blue-screened, and I don't know why. One of the other drives in the array, which previously had no orange light, turned orange after the bad drive was pulled and the server blue-screened.

I'm not sure how to move forward at this stage. An urgent response would be much appreciated.

12 Elder

6.2K Posts

September 22nd, 2017 11:00

1) Inserted the predictive-failure drive back into the server. Imported the foreign configuration and was able to boot to the OS. In OMSA, the VD is still in a Degraded state. Went to the bad drive in OMSA and tried to set it "Offline", but that option is not available. Blink and Unblink are the only options.

If the drive is already in a failed state then it is offline. The only time you can offline a drive is if it is online.

2) Tried to be a little brave and pulled the bad drive out without taking it offline. The server blue-screened, and I don't know why. One of the other drives in the array, which previously had no orange light, turned orange after the bad drive was pulled and the server blue-screened.

That is odd behavior. If the virtual disk went foreign again then I would import without the faulty drive inserted.

Also, I would not bother running a consistency check if not all of the drives come online when you import. You can run a consistency check once you get the other issues resolved. If there is no redundant data to recover with, then the consistency check is not a priority. A consistency check can be started as a virtual disk task within OpenManage Server Administrator.

Regarding whether or not to use "Clear" on the foreign configuration: don't use Clear unless you know what you are doing. The Clear option deletes the array information from the drives. It doesn't delete the data; it deletes the data that describes the array (drive members, stripe size, RAID level, etc.). Once that information is deleted, the controller will think the drives are blank, and they will go into a Ready state.
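The difference between Import and Clear can be pictured with a toy Python model. This is a hypothetical sketch: `ArrayConfig`, `Drive`, `import_foreign`, and `clear_foreign` are illustrative names that are not part of any Dell tool, and the real on-disk metadata format is not modeled. Only the distinction between array metadata and user data is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ArrayConfig:
    """Hypothetical stand-in for the array metadata a controller stores on each drive."""
    raid_level: int
    stripe_kb: int
    member_slot: int


@dataclass
class Drive:
    data: bytes                     # user data; neither operation below touches it
    config: Optional[ArrayConfig]   # metadata describing the array, or None
    state: str = "Foreign"


def import_foreign(drives: List[Drive]) -> Dict[int, Drive]:
    """Reassemble the array from the metadata on each drive (toy model)."""
    if any(d.config is None for d in drives):
        raise ValueError("a drive has no array metadata; nothing to import")
    for d in drives:
        d.state = "Online"
    return {d.config.member_slot: d for d in drives}


def clear_foreign(drives: List[Drive]) -> None:
    """Delete the array metadata. The data bytes stay on the drives, but the
    controller no longer knows the members, stripe size, or RAID level."""
    for d in drives:
        d.config = None
        d.state = "Ready"


drives = [Drive(b"payload-%d" % i, ArrayConfig(5, 64, i)) for i in range(5)]
clear_foreign(drives)
print([d.state for d in drives])    # every drive now looks blank ("Ready")
```

In this model, importing after a clear fails even though every byte of user data is still present, which mirrors the warning above: Clear discards the information needed to put the array back together.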

Thanks
