July 17th, 2015 13:00

Dell Poweredge PERC S100 Degraded

I have RAID 1 set up on a 2008 server. It was working yesterday, but this morning it fails to boot into the operating system because it cannot find the hard disk. There is no obvious reason for this, so I'm assuming a random disk failure.

I have accessed the PERC S100 control panel and noted that the array's status is Degraded, with disk 0.0 Online and disk 0.1 Ready.

I have two questions:

1. Why is it not booting into the operating system? I thought that was the point of RAID: when one disk fails, the other still works. Or does this point to a second problem?

2. I understand that my best course of action is to replace disk 0.1, which has the status "Ready". Can someone confirm these steps?

a. Power down.

b. Replace disk 0.1 with the new disk.

c. Delete the virtual disk that is automatically created and initialise the disk (I assume all of this can be done from the PERC S100 management console, as I cannot boot the operating system to run the OpenManage software).

d. Rebuild the array.

e. Cross fingers and hope the server boots!

I have a backup of the data (7 days old), but rebuilding the server would take a massive amount of time if the data on these disks were lost. Any advice on best practice for this scenario would be greatly appreciated.

David

Moderator

6.2K Posts

July 17th, 2015 15:00

Hello David

1. Why is it not booting into the operating system? I thought that was the point of RAID: when one disk fails, the other still works. Or does this point to a second problem?

Yes, that is the way RAID 1 is designed to work. You are dealing with more than one issue.

2. I understand that my best course of action is to replace disk 0.1, which has the status "Ready". Can someone confirm these steps?

Those steps are correct for the S100/S300 controllers in a cabled configuration. If you have a backplane, you would hot-swap the drive rather than power down. The S100/S300 will typically put a new drive into a non-RAID virtual disk that you will need to delete, as mentioned in step c. Make sure you delete the correct virtual disk. If the new disk is in a Ready state, you do not need to delete anything. Also, initialization occurs automatically during rebuild/array creation, so that step is not necessary.

e. Cross fingers and hope server boots!

Bringing the RAID 1 to an optimal state will not correct your OS issue. You need to find out if the OS is not detecting the HDD or if the system is not finding a valid boot device to hand off to. I would suggest booting to your Windows DVD and checking to see if the hard drive is detected. You may need to refer to Microsoft KB articles about how to recover the boot record.

Whatever caused one of your drives to drop out of the array has likely also corrupted the boot record or other system files. Since this is a driver-based controller, it is also possible that a software issue disrupted communication with the controller and took the second drive offline.

I would run diagnostics on the drives. If there are no errors then I would not replace anything. I would go ahead and set the ready state drive as a hot spare and let it rebuild back into the array. You will then need to try to recover the OS or restore from backup if that is not possible.
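The advice in this reply, together with the replacement steps confirmed above, can be summarised as a small decision sketch. This is a hypothetical Python illustration, not a Dell tool or API: the function name `plan_recovery`, its parameters, and the `"Non-RAID"` state label are assumptions made for the example, and the step wording paraphrases the thread.

```python
def plan_recovery(diagnostics_passed: bool, has_backplane: bool,
                  new_disk_state: str = "Ready") -> list[str]:
    """Return ordered recovery steps for a degraded RAID 1 on a PERC S100.

    Hypothetical helper that only encodes the advice in this thread.
    new_disk_state is the state shown for a freshly inserted disk,
    assumed here to be either "Ready" or "Non-RAID".
    """
    steps: list[str] = []
    if diagnostics_passed:
        # No drive errors: keep the existing disks and reuse the Ready disk.
        steps.append("Assign the Ready disk as a hot spare so it rebuilds into the array")
    else:
        # A drive failed diagnostics: physically replace it.
        if has_backplane:
            steps.append("Hot-swap the failed disk with the new disk")
        else:
            steps.append("Power down the server")
            steps.append("Replace the failed disk with the new disk")
        if new_disk_state == "Non-RAID":
            # The S100 may wrap a new disk in a non-RAID virtual disk;
            # delete only that one, never the degraded RAID 1 volume.
            steps.append("Delete the non-RAID virtual disk on the new disk")
        # Initialization happens automatically during the rebuild.
        steps.append("Rebuild the array onto the new disk")
    # Fixing the RAID state does not fix the boot problem; handle it separately.
    steps.append("Recover the OS, or restore from backup if recovery fails")
    return steps


if __name__ == "__main__":
    # Example: cabled drives, a disk failed diagnostics, new disk shows as Non-RAID.
    for step in plan_recovery(diagnostics_passed=False, has_backplane=False,
                              new_disk_state="Non-RAID"):
        print(step)
```

The last step is unconditional on purpose: as noted above, bringing the RAID 1 back to an optimal state is a separate task from getting the OS to boot.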

Thanks

3 Posts

July 18th, 2015 01:00

Hi Dell-Daniel

Thank you for taking the time to respond; much appreciated.

I'm back onsite this evening and will action some of your ideas and let you know.

From my research, I read that when the virtual disk (created in the PERC) is degraded, the operating system will sometimes not boot from it, but you can press F11 at startup to open the boot sequence menu and choose to boot from the one good drive. Is that right? It makes sense to me. I'm hopeful, because that would mean a standard drive failure, which at this stage seems more likely than any other type of failure, given that there was no obvious cause.

I'm also thinking that best practice might be to replace disk 0.1 anyway, since it's practical to do, and if it was failing, replacing it takes it out of the equation when troubleshooting any other issues.

I'll let you know what I find.

4 Operator

1.8K Posts

July 18th, 2015 10:00

Best practice:

Once the initial issue is resolved, purchase a new Dell retail disk, swap it in for one of the RAID 1 disks, and store the removed disk away safely. The stored disk is a bootable image of the array.

3 Posts

July 18th, 2015 14:00

Hi

An update (it has a happy ending):

1. I tried booting to the C drive from the F11 boot options. It did not work, but I do have a question...

2. I initialised disk 0.1, which succeeded, but the array did not rebuild automatically.

3. I replaced the disk with a new 1 TB WD Red. It initialised but again did not rebuild. I also tried setting it as a hot spare, with no change.

4. I downloaded the S100 drivers from Dell (listed as "SAS RAID drivers") and loaded them at the Windows repair screen. Windows then repaired the operating system, and I can now boot from the one good drive in the degraded RAID 1 array (seeing the Windows logon screen is always a good feeling).

5. I've backed up all relevant files and run a disk check on the drive.

My question now is how to get the array to rebuild. I have installed OpenManage and can see that the array is degraded, but there is no option to rebuild. I have set the new disk as a hot spare (it was already initialised from the PERC menu in the BIOS).

My drives are cabled, so I have to power down to replace one (I can't hot-swap). There was an error message saying the disk was not appropriate to set as a hot spare, but none of the reasons it gave applied, and I was able to tell it to go ahead and set the disk as a hot spare anyway. Does the replacement really need to be a Dell disk?

Thanks again for your help.

Moderator

6.2K Posts

July 18th, 2015 16:00

There was an error message saying the disk was not appropriate to set as a hot spare, but none of the reasons it gave applied, and I was able to tell it to go ahead and set the disk as a hot spare anyway.

If you receive an error about the disk not being appropriate for a hot spare then the controller doesn't think the disk is the appropriate size or type to function in any of the virtual disks. It could also be having a communication issue with the drive and be unable to determine characteristics of the drive.
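The reasons given above for the hot-spare error (wrong size or type, or a drive the controller cannot query) can be modelled roughly as follows. This is a simplified, hypothetical Python sketch, not the S100's actual driver or firmware logic: the `Disk` fields, the size rule (a spare must be at least as large as the smallest member of the virtual disk it would protect), and the exact-match rule for bus and media type are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disk:
    # Hypothetical model of what a controller reads from a drive.
    size_gb: Optional[int]  # None models a drive the controller cannot query
    bus: Optional[str]      # e.g. "SATA" or "SAS"
    media: Optional[str]    # e.g. "HDD" or "SSD"


def spare_problems(candidate: Disk, vdisk_members: list[Disk]) -> list[str]:
    """Return reasons the candidate could not protect this virtual disk.

    An empty list means the candidate looks usable as a hot spare under
    the assumed rules: enough capacity and matching bus/media type.
    """
    if None in (candidate.size_gb, candidate.bus, candidate.media):
        # Communication problem: characteristics could not be determined.
        return ["cannot read drive characteristics"]
    problems = []
    smallest = min(m.size_gb for m in vdisk_members)
    if candidate.size_gb < smallest:
        problems.append(f"too small: {candidate.size_gb} GB < {smallest} GB")
    if candidate.bus != vdisk_members[0].bus:
        problems.append(f"bus mismatch: {candidate.bus} vs {vdisk_members[0].bus}")
    if candidate.media != vdisk_members[0].media:
        problems.append(f"media mismatch: {candidate.media} vs {vdisk_members[0].media}")
    return problems
```

Under these assumptions, an empty result means the disk should be accepted; if a disk that passes such checks still triggers the warning, the communication-issue explanation above becomes the more likely one. The controller's own messages in OpenManage remain authoritative.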

The S100 is a driver-based controller. Some options are only available from within the operating system, using the storage management utility in OpenManage Server Administrator (OMSA). You may need to initiate the rebuild using OMSA.

http://www.dell.com/support/home/us/en/04/product-support/product/poweredge-rc-s100/manuals

Thanks
