Attached images: message at boot up, from LifeCycle Test #1, from LifeCycle Test #2, from LifeCycle Test #3
Dell-DylanJ
4 Operator
March 30th, 2021 14:00
Hello,
I wouldn't rule it out; it's something I've seen happen, but it isn't something I would necessarily expect. All the data lost with the failed disk is supposed to be recoverable from the RAID 5 parity. When you replace the failed disk, the rebuild operates at the block level, below the file system, so it has no notion of which files are needed to boot, if that makes sense.
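To illustrate why the rebuild is block-level and file-agnostic, here is a minimal Python sketch of RAID 5 parity reconstruction for a single stripe. It assumes byte-wise XOR parity and equal-length blocks; real controllers rotate parity across the disks and use much larger stripe units, and the block contents here are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recover the single missing block of a stripe.

    Parity is the XOR of the data blocks, so XOR-ing every surviving block
    (remaining data plus parity) yields the block that was lost. Nothing here
    looks at what the bytes mean: no files, no boot sector.
    """
    return xor_blocks(surviving)


# One stripe on a hypothetical 4-disk RAID 5: three data blocks plus parity.
d0, d1, d2 = b"boot", b"sect", b"or!!"
parity = xor_blocks([d0, d1, d2])

# Simulate losing the disk that held d1, then rebuilding it from the others.
recovered = rebuild_missing([d0, d2, parity])
print(recovered)  # b'sect'
```

The rebuild just restores bytes; if the boot problem lies in the controller's configuration rather than in missing data, a rebuild alone won't touch it.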
That said, there have been a few instances where a rebuild has helped, but they were the exception rather than the rule. Do you happen to know which RAID controller you're working with? If it's a hardware PERC, it maintains a storage log that you can collect through the SupportAssist function in the iDRAC. Reviewing that log gives a much better view of your array's health.
jak320
March 31st, 2021 07:00
Thanks for the reply, Dylan! The RAID controller is a PERC H310.
I will look for SupportAssist. It sounds like you're suggesting I should look for indications that the PERC might be faulty?
I guess the underlying question is: when Dell ships servers with a RAID array, is the OS stored across the array or on a single HDD? If it's across the array, I would think replacing the faulty drive should restore the missing OS files.
Thanks!
John
DELL-Chris H
Moderator
March 31st, 2021 08:00
Jak320,
A single drive failure in a RAID 5 shouldn't prevent the Virtual Disk from booting. If only one drive in the VD failed, the VD should have kept running in a degraded state, and when the replacement was added the redundant data would have been rebuilt onto it. Since the message says the Virtual Disk configuration was not found, I would start by accessing the controller BIOS to verify whether you see the Virtual Disk details, and to check the status of the Physical Disks, for example whether any of the drives are showing as Foreign.
I ask because it sounds like the drive failure may have caused the controller to lose the Virtual Disk configuration metadata, so it no longer knows how to boot the VD.
Let me know what you see.
jak320
March 31st, 2021 10:00
Chris and Dylan,
I looked for SupportAssist as Dylan suggested but didn't find it (this server is 8 years old, so it might predate SupportAssist). I did run tests from the Lifecycle Controller, though. The following pictures show the results of those tests and will hopefully give some insight into what might be going on.
Thanks,
John
DELL-Chris H
Moderator
March 31st, 2021 12:00
I assume you aren't trying to PXE boot, so would you confirm the boot order for me? Do you see Hard drive C: listed?
Also, when you access the BIOS (F2) and select Device Settings, is the RAID controller listed there?
Lastly, if you aren't PXE booting, you can go into the BIOS and set the NICs to Enabled without PXE.
Let me know what you see.
jak320
April 1st, 2021 07:00
Chris,
1) No drive C: listed, only the optical drive.
2) The RAID controller is listed and is the first option in the list.
3) None of the NICs are PXE enabled.
As a matter of note, this system is 8 years old. On my system, F2 is System Setup and F11 is the BIOS Boot Manager (if that makes any difference).
The Boot Manager is set to BIOS (not UEFI). Every time I boot, the optical drive is selected as the boot device. I wonder if a factory reset would be in order?
John
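John's answers are consistent with a simple first-match boot sequence: if the controller presents no Virtual Disk, "Hard drive C:" never shows up as a candidate, and the optical drive is the only device left. The Python sketch below models that under stated assumptions; the device labels are illustrative and `pick_boot_device` is a hypothetical helper, not part of any Dell tool.

```python
from typing import Optional


def pick_boot_device(boot_order: list[str], present: set[str]) -> Optional[str]:
    """Return the first entry in the boot order that is currently present.

    Simplified model: a real legacy BIOS also checks each device for a
    valid boot record before handing control to it.
    """
    for device in boot_order:
        if device in present:
            return device
    return None  # nothing bootable found


# Hypothetical boot sequence; the VD only appears as "Hard drive C:" when
# the controller presents a valid Virtual Disk to the BIOS.
order = ["Hard drive C:", "Optical drive"]
print(pick_boot_device(order, {"Optical drive"}))  # Optical drive
```

In other words, changing the boot order or resetting BIOS settings can't help until the controller exposes the VD again.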
DELL-Chris H
Moderator
April 1st, 2021 09:00
Thank you for confirming that.
If you reboot the server, do you see a CTRL-R prompt to access the controller? If so, what does it show for the Virtual Disk status and the Physical Disk status?
DELL-Josh Cr
Moderator
April 1st, 2021 10:00
So, it looks like the drives are in a foreign state. You can try entering the controller utility during POST and importing the foreign configuration. There are some screenshots in this post: https://dell.to/3sH6A0s
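A simplified model of what "foreign" means may help here: the controller keeps its own copy of the VD configuration, and each member disk carries configuration metadata of its own. Disks whose metadata doesn't match the controller's copy are flagged as foreign, and importing adopts the on-disk configuration. This is a toy Python sketch of that idea only; the class, functions, config IDs, and slot labels are hypothetical, and real PERC metadata and import logic are far more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalDisk:
    slot: str                 # hypothetical "enclosure:slot" label, e.g. "01:02"
    config_id: Optional[str]  # config ID in the disk's own metadata; None if blank


def find_foreign(disks: list[PhysicalDisk], controller_config_id: Optional[str]) -> list[str]:
    """Slots whose on-disk config ID differs from the controller's stored copy."""
    return [
        d.slot
        for d in disks
        if d.config_id is not None and d.config_id != controller_config_id
    ]


def import_foreign(disks: list[PhysicalDisk], controller_config_id: Optional[str]) -> str:
    """Toy 'import': adopt the foreign configuration if all foreign disks agree."""
    foreign_slots = set(find_foreign(disks, controller_config_id))
    foreign_ids = {d.config_id for d in disks if d.slot in foreign_slots}
    if len(foreign_ids) != 1:
        raise ValueError("expected exactly one consistent foreign configuration")
    return foreign_ids.pop()


# The controller has lost its copy (None), but the disks still carry "cfg-A".
disks = [
    PhysicalDisk("01:00", "cfg-A"),
    PhysicalDisk("01:01", "cfg-A"),
    PhysicalDisk("01:03", "cfg-A"),
    PhysicalDisk("01:04", None),  # blank replacement drive
]
print(find_foreign(disks, None))    # ['01:00', '01:01', '01:03']
print(import_foreign(disks, None))  # cfg-A
```

This is why an import can succeed without touching hardware: the configuration is still on the disks even when the controller has lost track of it.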
jak320
April 1st, 2021 10:00
Hope this helps....
Won't be back in the office until Tuesday, April 6th.
jak320
April 6th, 2021 06:00
Chris,
Yes, CTRL-R got me into the controller. Attached are pictures of the VD, the HDs, and the Foreign Config.
John
Attached images: Virtual Drive, Physical Drive, Foreign Config #1, Foreign Config #2
jak320
April 6th, 2021 07:00
Chris and Josh,
It looks like I need to replace drive 01:02 with a new one and rebuild it before attempting to import the Foreign Configuration?
John
Dell-DylanJ
4 Operator
April 6th, 2021 07:00
Hi John,
The 0142 alerts in those images concern me; they can indicate drive failures. Seeing them on two different drives suggests this may be a real problem, since RAID 5 can only tolerate the loss of one drive. You can try importing the foreign config without changing any hardware; the controller will basically just scan the metadata and attempt to bring the drives online. If that fails, you can certainly try replacing a disk to see whether that changes the behavior, but I wouldn't get my expectations too high.
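Why alerts on two drives matter: RAID 5 stores one parity block per stripe, so it can reconstruct at most one missing member. A minimal Python sketch of that rule follows. The state names are simplified and the controller's own labels may differ; the 4-drive member count in the usage lines is hypothetical, since the thread doesn't state how many drives are in the array. An alert on a drive also doesn't necessarily mean that drive has failed outright.

```python
from enum import Enum


class VDState(Enum):
    OPTIMAL = "Optimal"
    DEGRADED = "Degraded"
    FAILED = "Failed"


def raid5_state(member_count: int, failed_count: int) -> VDState:
    """Classify a RAID 5 virtual disk by how many member drives have failed."""
    if member_count < 3:
        raise ValueError("RAID 5 needs at least 3 member drives")
    if not 0 <= failed_count <= member_count:
        raise ValueError("failed_count must be between 0 and member_count")
    if failed_count == 0:
        return VDState.OPTIMAL
    if failed_count == 1:
        # One missing block per stripe can be rebuilt from parity.
        return VDState.DEGRADED
    # Two or more missing blocks per stripe cannot be solved from one parity block.
    return VDState.FAILED


print(raid5_state(4, 1).value)  # Degraded
print(raid5_state(4, 2).value)  # Failed
```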
If you're interested, you can boot your server to the Support Live Image and use it to export a controller log. This controller log will be the absolute best resource for determining what's going on with your array. You can of course review that yourself, but I wanted to offer to give it a look, too. Knowing what to look for in a log is super helpful, and these logs are things I've spent a lot of time with.
jak320
April 6th, 2021 08:00
Dylan,
I will try the SLI solution you suggested. Thanks....
John
jak320
April 6th, 2021 09:00
Dylan,
I burned a disc with SLI and booted from it. Where will I find the log you're looking for, or what utility do I need to run?
I selected the optical drive in the BIOS Boot Manager and let it run until it booted to the Linux UI.
What's next?
John
Dell-DylanJ
4 Operator
April 6th, 2021 09:00
You should have a desktop environment, and OpenManage should be available to you there. Since this is a live environment, files saved to its own file system won't survive a reboot, so you'll want a USB drive to save the log file to.
https://dell.to/3ut1Hsi