GeforceXP
1 Nickel

Degraded RAID 1 Array: PowerEdge 2800

Hello to everyone. I hope someone can shed some light on this fairly unusual situation.

I have a PowerEdge 2800 which is due to be replaced before May this year. It's a great server and has done a great job over the last 7 years.

OK, on to the problem: Server Administrator was predicting a failure on 0:0.

In the server we have:

0:0 Maxtor 36GB 15k
0:1 Fujitsu 73GB 15k
1:2 Something ..
1:3 Something ..

So, I popped 0:0 and the server was working fine with a degraded array, and after 15 seconds I inserted it back into the PE2800. After approximately 45 minutes the RAID 1 array was rebuilt, but it still said predicted failure. So I thought I'd best order some new drives - I received them today: 2 x 146GB 15k Seagate Cheetahs.

OK, so I popped the 0:0 drive and inserted one of the 146GB Cheetahs, and the server crashed - it would not boot at all and gave me an error saying the logical drive had failed.

So, in the PERC4i BIOS, I forced drive 0:1 online, and the server is now back up and running albeit with a degraded array.

So this is where I'm at - what choices do I have?

Insert the old drive with a predicted failure, leave it alone, or try the 146GB Cheetah again. I don't really fancy doing any of them, but I will order another 73GB Fujitsu in the meantime.

What I want to know is why the server crashed after inserting the Cheetah drive. I have swapped hot-pluggable hard drives many a time in many Dell servers and never experienced anything like this.

Currently running degraded, so any replies would be massively appreciated.

Thanks a lot.


Accepted Solutions
theflash1932
5 Iridium

RE: Degraded RAID 1 Array: PowerEdge 2800

"So, I'm now back to square one, where I have a RAID1 system with one predicted disk failure.  What would you do from here on in?"

Another thing that many people don't know is that a predicted-failure (pf) drive should be forced offline before replacing it; otherwise the pf flag can be assigned to the new disk (until an actual 'rescan' is performed), which can result in some odd behavior (the controller is more sensitive to potential failure of pf drives).

"Common sense says I cannot mix-match 10k/15k rpm drives!"

You CAN mix/match speeds and/or sizes (there is nothing wrong with it 'technically'), but in real-world scenarios, you would never want to mix speeds (depending on the configuration, it would degrade the speed of the entire array).  One thing you CAN'T do is mix U320 and U160 SCSI drives on a backplane/controller.

"What would you do from here on in?"

I guess where you go from here depends on what you feel most comfortable doing.  There is a chance that the pf drive will operate just fine until you are done with the server (or until you can get another replacement - if you feel another replacement would be better than the ones you currently have), although it could cause additional issues if its status degrades.  There is a chance that it could happen again if you try to replace it, but I believe that is unlikely.  If it were me (and it's important to remember that it's not :)), I would force disk 0 offline and replace it "hot" with the other 146GB drive.  Another thing you could do, which would probably feel less risky, is to simply insert the 146GB drive into an open slot, assign it as a hot-spare, then force disk 0 offline, which will automatically start a rebuild onto the hot spare.
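
For what it's worth, that "hot-spare first" route can also be driven from the OMSA command line instead of the GUI.  The snippet below is only a rough sketch, assuming OMSA's omconfig/omreport tools are installed on the server and are called from Python; the controller and pdisk IDs (controller 0, pf disk 0:0, spare in slot 0:4) are placeholders, so confirm the real IDs with "omreport storage pdisk controller=0" and check the exact syntax against your OMSA version before running anything.

```python
# Sketch only - not a tested procedure. Assumes OMSA's omconfig/omreport are on PATH,
# and that controller 0 / pdisk 0:0 (pf disk) / pdisk 0:4 (open slot) are the real IDs.
import subprocess

def run(cmd):
    """Run an OMSA CLI command and echo its output."""
    print("+", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    result.check_returncode()

# 1. Assign the new 146GB drive (already inserted in an open slot) as a global hot spare.
run(["omconfig", "storage", "pdisk", "action=assignglobalhotspare",
     "controller=0", "pdisk=0:4", "assign=yes"])

# 2. Force the predicted-failure member offline; the controller should then start
#    rebuilding the RAID 1 member onto the hot spare automatically.
run(["omconfig", "storage", "pdisk", "action=offline",
     "controller=0", "pdisk=0:0"])

# 3. Check rebuild progress.
run(["omreport", "storage", "pdisk", "controller=0"])
```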

 

22 Replies
theflash1932
5 Iridium

RE: Degraded RAID 1 Array: PowerEdge 2800

Important:  Did you replace disk 0 "hot", or did you power down the server to replace the drive?

GeforceXP
1 Nickel

RE: Degraded RAID 1 Array: PowerEdge 2800

Hot. As I always do.
theflash1932
5 Iridium

RE: Degraded RAID 1 Array: PowerEdge 2800

You didn't say so specifically, and although people often throw around the term "hot-swap", they often don't realize what it really means, especially in the context of how the RAID controller uses that to manage the arrays.  Don't be offended that I asked - it is a VERY common mistake, made even by experienced IT staff.  You did it right ... just making sure 🙂

1. It could have been a fluke.  Firmware, static electricity, poor Moon/Venus alignment, etc.  You could try it again to confirm the issue.

2. Could be something about the 146GB drive that it doesn't like.  Is this a Dell-certified drive?  Do you have another system you can test the drive with?  It could be bad (shorts or other power issues can result in exactly what you saw).  You could try the other 146GB drive.

3. REALLY old PERC or BIOS firmware.  Support for 146GB and 300GB drives was added early on in the 28x0 lifecycle, as were a number of fixes for the controller - recovery, performance, etc.  What are your PERC and BIOS firmware versions? (A quick way to pull these is sketched at the end of this post.)

4. Could be a slot issue.  The original drive may not be bad - it may simply have been flagged bad because of a fault with the slot.  You could put the 146GB drive in another slot, then assign it as a hot-spare to begin the rebuild.

Obviously, these are some educated guesses at possible causes - what you experienced is NOT normal, so educated guesses as to things that could cause such an issue are all we have to go on.
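
For item 3, here is a quick way to pull those versions, sketched with Python calling the OMSA CLI.  It assumes OMSA is installed and "controller=0" is the right ID on your box; running the same omreport commands straight at a prompt works just as well.

```python
# Sketch: dump PERC firmware/driver versions and the system BIOS version via OMSA.
# Assumes OMSA is installed and controller 0 is the PERC in question.
import subprocess

for cmd in (
    ["omreport", "storage", "controller", "controller=0"],  # PERC firmware / driver versions
    ["omreport", "chassis", "bios"],                         # system BIOS version
):
    print("+", " ".join(cmd))
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```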

 

GeforceXP
1 Nickel

RE: Degraded RAID 1 Array: PowerEdge 2800

0. No offence taken 🙂

1. That was my first thought.

2. Yes, I'm sure it's certified - I got it from a certified partner for out-of-warranty Dell spares, and yes, I have another 2800 I can test it on before I do anything else; that was my second thought.

3. BIOS Firmware A07, PERC 4e/Di Firmware 5B2D, Driver Version 6.46.2.32, Storport Driver Version 5.2.3790.3959, then it says Minimum required storport driver version 5.2.3790.4173?

4. Could be. This is a mail server for 100 employees - downtime is ultra critical. That's why I'm half tempted to leave it well alone until we can get a new server sorted, but realistically I don't want to leave a degraded array. A rock and a hard place comes to mind.

I totally appreciate your immediate replies by the way, thank you.

theflash1932
5 Iridium

RE: Degraded RAID 1 Array: PowerEdge 2800

"0. No offence taken 🙂

1. It could have been a fluke. Firmware, static electricity, poor Moon/Venus alignment, etc. You could try it again to confirm the issue.
- That was my first thought.

2. Yes, I'm sure it's certified - I got it from a certified partner for out-of-warranty Dell spares, and yes, I have another 2800 I can test it on before I do anything else; that was my second thought.

3. BIOS Firmware A07, PERC 4e/Di Firmware 5B2D, Driver Version 6.46.2.32, Storport Driver Version 5.2.3790.3959, then it says Minimum required storport driver version 5.2.3790.4173?

4. Could be. This is a mail server for 100 employees - downtime is ultra critical. That's why I'm half tempted to leave it well alone until we can get a new server sorted, but realistically I don't want to leave a degraded array. A rock and a hard place comes to mind.

I totally appreciate your immediate replies by the way, thank you."

3. You should look into updating the storport driver once this is resolved:
http://www.dell.com/support/drivers/us/en/04/DriverDetails/Product/poweredge-2800?driverId=GDCG3&osC...

4. Understood.  You could try plugging in the drive after hours.  If it is a power issue with the drive, it will likely do the same thing when you plug it in.  If not, it will simply sit there until you do something with it.  You may not have issues with it over the next few months, but as a mail server, it will get a lot of activity, and every second you are degraded, you run the risk of permanent data loss from read/write errors ... rock/hard place ... I get it 🙂

GeforceXP
1 Nickel

RE: Degraded RAID 1 Array: PowerEdge 2800

Not sure what happened to 1 and 2 from your reply, but thanks anyway.

Just to add some further info:

I've reinserted the "predicted failure" drive back into the PE2800 and it says it has rebuilt the array successfully... but, even though Virtual Disk 0 has a green check mark, when you click Virtual Disk 0 it does not list 0:0, only 0:1, yet the logs say that it has rebuilt successfully?

Does that now mean I have to re-scan via the OpenManage Server Administrator console?

theflash1932
5 Iridium

RE: Degraded RAID 1 Array: PowerEdge 2800

I only responded to 3 and 4 🙂

theflash1932
5 Iridium

RE: Degraded RAID 1 Array: PowerEdge 2800

It may simply need to be "refreshed".  You can try a rescan though, if refreshing doesn't work ... there is no harm in it.  What version of OMSA do you have?  Have you tried closing OMSA altogether and reopening?  You should be able to see the drive if it has rebuilt.
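
If the GUI still won't show it, the rescan and the member-disk check can also be done from the command line.  This is just a sketch assuming the OMSA CLI tools are installed and the controller/vdisk IDs are 0 (adjust to match your system):

```python
# Sketch: rescan the controller, then list the physical disks behind Virtual Disk 0.
# Assumes OMSA is installed; controller=0 and vdisk=0 are the IDs from this thread.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)

# Ask the controller to rescan (same as the rescan option in the OMSA console).
run(["omconfig", "storage", "controller", "action=rescan", "controller=0"])

# List the physical disks that are members of Virtual Disk 0 - after a good
# rebuild, both 0:0 and 0:1 should show up here.
run(["omreport", "storage", "pdisk", "controller=0", "vdisk=0"])
```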
