GeforceXP
1 Nickel

Degraded RAID 1 Array:PowerEdge 2800

Hello to everyone. I hope someone can shed some light on this fairly unusual situation.

I have a PowerEdge 2800 which is due to be replaced before May this year. It's a great server and has done a great job over the last 7 years.

OK, to the problem: Server Administrator was predicting a failure on 0:0.

In the server we have:

0:0 Maxtor 36GB 15k
0:1 Fujitsu 73GB 15k
1:2 Something ..
1:3 Something ..

So, I popped 0:0 and the server was working fine with a degraded array, and after 15 seconds I inserted it back in to the PE2800. After approximately 45 minutes the RAID 1 array was rebuilt, but it still said predicted failure. So I thought I best order some new drives - I received them today; 2 x 146GB 15k Seagate Cheetahs.
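As a side check on the drive sizes mentioned so far, here is a minimal sketch of the usual RAID 1 sizing rule: the mirror is only as large as its smallest member, and a replacement needs to be at least that large. It assumes 0:0 and 0:1 are the two members of the mirror, as the thread implies, and it works on nominal GB figures only. The function names are hypothetical, and a real controller compares actual usable block counts, not label sizes.

```python
def mirror_capacity_gb(member_sizes_gb):
    """Nominal capacity of a RAID 1 mirror: limited by its smallest member."""
    return min(member_sizes_gb)


def replacement_is_large_enough(member_sizes_gb, replacement_gb):
    """True if the replacement is at least as large as the existing mirror."""
    return replacement_gb >= mirror_capacity_gb(member_sizes_gb)


# Nominal sizes quoted in the post (assumed to be the two mirror members).
members = [36, 73]

print(mirror_capacity_gb(members))                # 36
print(replacement_is_large_enough(members, 146))  # True
```

On nominal size alone, a 146GB drive is more than large enough, so capacity by itself would not be expected to stop a rebuild.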

OK, so I popped the 0:0 drive and inserted one of the 146GB Cheetahs, and the server crashed. It would not boot at all, and gave me an error saying the logical drive had failed.

So, in the PERC4i BIOS, I forced drive 0:1 online, and the server is now back up and running albeit with a degraded array.

So this is where I'm at, what choices do I have?

As I see it: insert the old drive with a predicted failure, leave it alone, or try the 146GB Cheetah again. I don't really fancy doing any of them, but I will order another 73GB Fujitsu in the meantime.

What I want to know is why the server crashed after I inserted the Cheetah drive. I have swapped hot-pluggable hard drives many a time in the past in many Dell servers and never experienced anything like this.

Currently running degraded, so any replies would be massively appreciated.

Thanks a lot.

0 Kudos
theflash1932
5 Iridium

RE: Degraded RAID 1 Array:PowerEdge 2800

Important:  Did you replace disk 0 "hot", or did you power down the server to replace the drive?

0 Kudos
GeforceXP
1 Nickel

RE: Degraded RAID 1 Array:PowerEdge 2800

Hot. As I always do.
0 Kudos
theflash1932
5 Iridium

RE: Degraded RAID 1 Array:PowerEdge 2800

You didn't say so specifically, and although people often throw around the term "hot-swap", they often don't realize what it really means, especially in the context of how the RAID controller uses that to manage the arrays.  Don't be offended that I asked - it is a VERY common mistake, made even by experienced IT staff.  You did it right ... just making sure 🙂

1. It could have been a fluke.  Firmware, static electricity, poor Moon/Venus alignment, etc.  You could try it again to confirm the issue.

2. Could be something about the 146GB drive it doesn't like.  Is this a Dell-certified drive?  Do you have another system you can test the drive with?  It could be bad (shorts or other power issues can result in exactly what you saw).  You could try the other 146GB.

3. REALLY old PERC or BIOS firmware.  Support for 146GB and 300GB drives was added early on in the 28x0 lifecycle, as were a number of fixes for the controller - recovery, performance, etc.  What is your PERC and BIOS firmware at?

4. Could be a slot issue.  The original drive may not be bad - it may simply be assigned bad because of a fault with the slot.  You could put the 146GB drive in another slot, then assign it as a hot-spare to begin the rebuild.

Obviously, these are educated guesses at possible causes. What you experienced is NOT normal, so educated guesses about what could cause such an issue are all we have to go on.

0 Kudos
GeforceXP
1 Nickel

RE: Degraded RAID 1 Array:PowerEdge 2800

0. No offence taken 🙂

1. That was my first thought.

2. Yes I'm sure it's certified - I got it from a certified partner for out-of-warranty Dell Spares, and yes I have another 2800 I can test it on before I do anything else, that was my second thought.

3. BIOS Firmware A07, PERC 4e/Di Firmware 5B2D, Driver Version 6.46.2.32, Storport Driver Version 5.2.3790.3959, then it says Minimum required storport driver version 5.2.3790.4173?

4. Could be. This is a mail server for 100 employees, so downtime is ultra critical. That's why I'm half tempted to leave it well alone until we can get a new server sorted, but realistically I don't want to leave a degraded array running. A rock and a hard place comes to mind.

I totally appreciate your immediate replies by the way, thank you.

0 Kudos
theflash1932
5 Iridium

RE: Degraded RAID 1 Array:PowerEdge 2800

3. You should look into updating the storport driver once this is resolved:
http://www.dell.com/support/drivers/us/en/04/DriverDetails/Product/poweredge-2800?driverId=GDCG3&osC...

4. Understood.  You could try plugging in the drive after hours.  If it is a power issue with the drive, it will likely do the same thing when you plug it in.  If not, it will simply sit there until you do something with it.  You may not have issues with it over the next few months, but as a mail server, it will get a lot of activity, and every second you are degraded, you run the risk of permanent data loss from read/write errors ... rock/hard place ... I get it 🙂
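
On point 3, the gap between the installed and minimum required storport driver versions quoted earlier in the thread can be checked with a simple numeric comparison of the dotted version strings. This is only a rough sketch: the helper names are hypothetical, and it assumes every field in the version string is a plain integer.

```python
def parse_version(text):
    """Split a dotted version string such as '5.2.3790.3959' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, required):
    """True if the installed version is at least the required version."""
    # Tuples compare field by field, so 3959 < 4173 decides the result here.
    return parse_version(installed) >= parse_version(required)


# Versions as reported by Server Administrator in this thread.
installed = "5.2.3790.3959"
required = "5.2.3790.4173"

print(meets_minimum(installed, required))  # False
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "10" as lower than "9".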

0 Kudos
GeforceXP
1 Nickel

RE: Degraded RAID 1 Array:PowerEdge 2800

Not sure what happened to 1 and 2 from your reply, but thanks anyway.

Just to add some further info:

I've reinserted the "predicted failure" drive into the PE2800 and it says it has rebuilt the array successfully. But even though Virtual Disk 0 has a green "check mark", when you click Virtual Disk 0 it does not list 0:0, only 0:1, yet the logs say that it has rebuilt successfully?

Does that now mean I have to re-scan via the OpenManage Server Administrator console?

0 Kudos
theflash1932
5 Iridium

RE: Degraded RAID 1 Array:PowerEdge 2800

I only responded to 3 and 4 🙂

0 Kudos
theflash1932
5 Iridium

RE: Degraded RAID 1 Array:PowerEdge 2800

It may simply need to be "refreshed".  You can try a rescan though, if refreshing doesn't work ... there is no harm in it.  What version of OMSA do you have?  Have you tried closing OMSA altogether and reopening?  You should be able to see the drive if it has rebuilt.

0 Kudos