Unsolved

10 Posts

April 11th, 2022 08:00

RAID 5 - not booting

PowerEdge T420 configured with a PERC H310 in RAID 5.  It originally had three disk drives; I replaced one that was in predictive failure about a year ago and also added a fourth to the array back then.

This weekend, something must have happened: the server won't boot. I get the initial 'Foreign configuration(s) found on adapter.  Press any key to continue or 'C' to load the configuration utility or 'F' to import foreign configuration(s) and continue' and then eventually 'No boot device available'.  The two middle drives are flashing amber/green, which I believe would indicate predictive failure rather than failed, but they show up as 'Missing' and 'Foreign' in the configuration utility (see attached screenshots).  The utility shows the two drives with 'Error' under S.M.A.R.T. state.

driveErrors.png

I have one spare drive on hand.  What can I try in order to recover this, if anything?  It would be a very difficult system to set up again from scratch and/or from a backup.  I'm afraid that if I replace one of the drives, it will make things worse.  I'm not sure what 'import foreign configuration' would do.  If I can get it to boot and replace one of the drives, then I can order another to replace the second one. Manually booting from the BIOS doesn't work; reseating the drives also has no effect.

Any help would be greatly appreciated; the whole lab runs on this server!

 

Moderator

 • 

8.4K Posts

April 11th, 2022 12:00

Trishia42,

 

Let me start with clarifying a Foreign Configuration and other details.

RAID configuration information is stored on the controller as well as on the drives. When the controller sees that the RAID configuration information on the drives is not the same as what it has stored itself, it flags it as a Foreign Config.

With a Foreign Configuration you have the option to Clear the Foreign or to Import the Foreign. Importing the Foreign tells the controller to disregard its own RAID configuration data and replace it with the hard drives' RAID configuration data, whereas Clearing a Foreign Configuration does the opposite: it tells the controller to clear the drives' RAID configuration data and replace it with the controller's information.

 

A good rule of thumb is that if the Virtual Disk is bootable and the data is intact, you would want to Clear the foreign, as there isn't an issue with booting or reading data. If you can't boot, on the other hand, it is a good bet the issue is with the controller configuration data, so an Import would be justified.
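
To make the Import/Clear distinction concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not how the PERC firmware actually works; the class `RaidState`, its fields, and the helper `suggested_action` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RaidState:
    """Toy model: one copy of the RAID metadata on the controller, one on the drives."""
    controller_config: str  # hypothetical stand-in for the controller's stored metadata
    drive_config: str       # hypothetical stand-in for the metadata stored on the drives

    def is_foreign(self) -> bool:
        # The controller flags a foreign configuration when the two copies disagree.
        return self.controller_config != self.drive_config

    def import_foreign(self) -> None:
        # Import: the controller discards its own copy and adopts the drives' copy.
        self.controller_config = self.drive_config

    def clear_foreign(self) -> None:
        # Clear: the drives' copy is wiped and replaced with the controller's copy.
        self.drive_config = self.controller_config


def suggested_action(boots_and_data_intact: bool) -> str:
    # Rule of thumb from the post above; a starting point, not a guarantee.
    return "clear" if boots_and_data_intact else "import"


state = RaidState(controller_config="stale", drive_config="on-disk")
if state.is_foreign():
    action = suggested_action(boots_and_data_intact=False)
    if action == "import":
        state.import_foreign()
    else:
        state.clear_foreign()
print(action, state)
```

The point of the model is that both operations end with the two copies agreeing; they differ only in which copy wins, which is why choosing the wrong one can discard the metadata you needed.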

 

With all that being said, any time you work with the RAID I would highly suggest having a complete backup prior to doing anything.

 

Let me know how it goes.



Moderator

 • 

8.4K Posts

April 11th, 2022 13:00

Trishia42,

 

That screen is normal, but so you are aware, the rebuild will complete as long as the system is running, so you don't have to remain in the controller BIOS. In regards to the amber and green LEDs, likely what has happened is that you have a punctured stripe or double fault. If you can run a TSR on the server, I can check to see what the logs show.

 

If you'd like to, the steps for running a TSR are here, and once done you can upload it here.

Afterwards, private message me the service tag used to upload and I can retrieve it.

 

 

10 Posts

April 11th, 2022 13:00

So I went ahead with the Import foreign configurations; it said the operation succeeded and now I have the following screen:

20220411_131307.jpg

Is that screen supposed to update after the rebuild is complete?  It's been three hours but I didn't want to reboot yet in case.  The two middle drives still have the flashing amber/green and it looks now like the fourth drive also has one - obviously not great but if I can just boot then I can start replacing drives.

10 Posts

April 12th, 2022 06:00

So it completed overnight, and then upon reboot I actually got to the login screen. I walked away for 10 seconds to grab a drive and when I was back, the screen was back to booting, stating a drive was missing again (the LCD on the server briefly showed a fault on the drive in bay 1).  Entering the configuration utility now shows the second drive as 'Missing'/'Foreign' still and the third drive as 'Offline'.  I could either replace drive 3 (the offline one) with a new one and see if I can boot from there, or force drive 3 from offline to online and replace the missing one (although I worry about data corruption in that case), or try importing the foreign configuration again (but I'm not sure how that will go with one of the drives listed as 'offline').  Any advice?  I'm aware it's starting to look bad.

Moderator

 • 

8.4K Posts

April 12th, 2022 07:00

Also, regarding the Debug log, would you follow these steps to extract it and then upload the results here?

 

1.  Press F2 during the pre-boot options

2.  Select "Device Settings"

3.  Select the RAID Controller

4.  Select "Controller Management"

5.  Select "Save Debug Log"   at the bottom

       *You may need to scroll down to view the option

6.  Select "Save Log" after noting the filename that will be saved, and choosing the USB directory from the "Select Directory" option

        *the default filename should be:   ttyLog.txt

7.  A message confirms the operation was successful


Afterwards private message me the tag used to upload.

 

 

 

Moderator

 • 

8.4K Posts

April 12th, 2022 07:00

You can run a TSR on the T420, as it has the iDRAC7, which the instructions here are for.

Moderator

 • 

8.4K Posts

April 12th, 2022 07:00

Trishia42,

 

With a drive being out of the Virtual Disk, if you offline or remove another one the Virtual Disk will fail.
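
The reason a second missing drive fails the Virtual Disk is that RAID 5 keeps a single XOR parity block per stripe. The sketch below illustrates that arithmetic on one toy stripe in Python; it is not the controller's implementation, and the function names `xor_blocks` and `rebuild_missing` and the sample data are assumptions for illustration only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe):
    """Rebuild a stripe in which at most one block (data or parity) is None."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # One parity block cannot recover two unknown blocks.
        raise ValueError("RAID 5 cannot rebuild a stripe with more than one missing block")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    # XOR of all surviving blocks equals the missing block.
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Four-drive toy stripe: three data blocks plus one parity block.
data = [b"lab-", b"data", b"2022"]
stripe = data + [xor_blocks(data)]

one_missing = [stripe[0], None, stripe[2], stripe[3]]
print(rebuild_missing(one_missing)[1])

two_missing = [stripe[0], None, None, stripe[3]]
try:
    rebuild_missing(two_missing)
except ValueError as err:
    print(err)
```

With one block missing, the XOR of the survivors gives it back; with two missing there is no way to separate them, which is the double-fault situation to avoid here.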

 

Would you confirm what the LEDs are showing now? 

 

I am not certain that Importing is the thing to do now. If you can, would you run the TSR, as I specified previously, and let's look at the logs. Anything we do now can cause data loss, if the data isn't already corrupted/punctured.

 

Let me know.

 

 

 

 

10 Posts

April 12th, 2022 07:00

I saw the instructions but I don't have this GUI; the instructions state that it's for version 2.3 and higher and/or 2.10 to 2.30 and the screenshot I posted previously shows the version as 1.57.57.  Even if it did, I'm not sure how I would be able to save the results without access to the file system...

Moderator

 • 

8.4K Posts

April 12th, 2022 07:00

What revisions are you on for the BIOS and RAID controller as well?

 

10 Posts

April 12th, 2022 07:00

Hi Chris,

LEDs same as yesterday; first drive green/green, second and third drives have the first LED off and the second one flashing green/amber, fourth one has first LED green, second flashing green/amber.

I don't think I can run a TSR in this version; it doesn't appear to be an option there (this is a PowerEdge T420, so an older model).  There is a 'Save debug log' option but that doesn't work since it can't access the file system (I'm not sure how the TSR would work in that case either).

idracVersion.jpg

I haven't backed up in a while, but there have been almost no changes to the server since then.  I do have an Oracle database on there, and I would prefer not to lose the data added since the last backup.

 

10 Posts

April 12th, 2022 08:00

Sent.

Moderator

 • 

8.4K Posts

April 12th, 2022 09:00

I have reviewed the log and from what I see so far we are looking at:

 

It does not appear that the VD is punctured, yet.

Out of date firmware on the H310.

Uncertified drives.

Probably some metadata corruption on the VD.

Unsure about drive firmware, as uncertified.

 

So I think at this time the best option is to try and import again, then reboot and get a backup if possible. Once the backup is complete, we will then delete the VD, update the server, recreate the VD, and then restore from backup. 

 

You may also consider some new drives too:

 

04/11/22 9:02:15: EVT#10467-04/11/22 9:02:15: 96=Predictive failure: PD 01(e0x20/s1)
04/11/22 9:02:15: EVT#10468-04/11/22 9:02:15: 96=Predictive failure: PD 02(e0x20/s2)
04/11/22 9:02:15: EVT#10469-04/11/22 9:02:15: 96=Predictive failure: PD 03(e0x20/s3)
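
If the saved debug log is long, entries like the three above can be pulled out with a short script. This is a minimal sketch in Python that matches only the line shape shown here; the regular expression, the function name `predictive_failures`, and the reading of `e0x20/s1` as enclosure and slot are assumptions, not a documented log format.

```python
import re

# Assumed shape, based only on the three lines quoted above:
#   <timestamp>: EVT#<seq>-<timestamp>: <code>=Predictive failure: PD <id>(e<enclosure>/s<slot>)
PRED_FAIL = re.compile(
    r"EVT#(?P<seq>\d+)-.*?:\s(?P<code>\d+)=Predictive failure: "
    r"PD (?P<pd>\w+)\(e(?P<enclosure>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\)"
)


def predictive_failures(log_text):
    """Return one dict per predictive-failure event found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = PRED_FAIL.search(line)
        if match:
            events.append(
                {
                    "event": int(match.group("seq")),
                    "pd": match.group("pd"),
                    "enclosure": match.group("enclosure"),  # assumed meaning of e0x20
                    "slot": int(match.group("slot")),       # assumed meaning of s1
                }
            )
    return events


sample = """\
04/11/22 9:02:15: EVT#10467-04/11/22 9:02:15: 96=Predictive failure: PD 01(e0x20/s1)
04/11/22 9:02:15: EVT#10468-04/11/22 9:02:15: 96=Predictive failure: PD 02(e0x20/s2)
04/11/22 9:02:15: EVT#10469-04/11/22 9:02:15: 96=Predictive failure: PD 03(e0x20/s3)
"""

for event in predictive_failures(sample):
    print(event)
```

Run against the full ttyLog.txt, this would list every drive that has logged a predictive failure, which helps decide how many replacement drives to order.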

 

If you cannot get a successful complete backup after importing, you may want to consider using a 3rd party data recovery service.

 

 

 

10 Posts

April 12th, 2022 10:00

Hi Chris,

Two of the drives are uncertified as I could no longer buy these from Dell (I replaced one about a year ago and also added one for a capacity increase).  I did the import again and was able to log on to Windows but shortly after, it crashed with a BSOD (0x000000F4); it doesn't look like it actually created the memory.dmp file.  I am able to boot into safe mode with networking but unfortunately, this does not allow me to run an Acronis image backup or even a built-in Windows Server backup, nor does it allow me to back up the Oracle database through SQL commands.

Is there anything I should be attempting to try to stabilize the system long enough to be able to perform backups?  Should I force one of the failed hard drives offline and replace it with the new one I have - not sure 1 or 2 (only currently have one on hand)?

Thanks again.

Moderator

 • 

8.4K Posts

April 12th, 2022 12:00

Trishia42,

 

If you are able to see the data from Safe Mode, then it may be that something with the OS got messed up, and we might try repairing it to get it to boot without Safe Mode.

As far as offlining the drive, if the drive is failed then it is already offline. You would only need to offline a drive if it is online and in predictive failure; then you would offline it prior to replacing it.

 

 

Moderator

 • 

3.4K Posts

April 14th, 2022 06:00

Hello,

Unfortunately I cannot give you any information about how long a rebuild will last. There are too many variables (VD usage, hard drive speeds, size, etc.).

I suggest, as you said, waiting and getting back to us if the rebuild hasn't finished after the weekend. I agree that for the moment you should not force a restart.

Thanks

Marco
