May 31st, 2005 14:00

Perc errors - Is this serious?

I got the following errors on one of my servers.  They were limited to just a few lines, and nothing is scrolling repeatedly that would indicate drive problems on the server.  The hardware light is still green as well.  From what I can tell, a bad block was found and it attempted to recover from it, but I am not convinced it succeeded given the final 'Read Retries Exhausted' message.
 
Can anyone give me feedback on how I should proceed with this?
Cheers,
Dale
 
5-30-2005   4:08:08 pm:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Error Event [command:0x28]
 5-30-2005   4:08:09 pm:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Medium Error [k:0x3,c:0x11,q:0x1]
 5-30-2005   4:08:09 pm:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Read Retries Exhausted
 5-30-2005   4:08:09 pm:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0) Medium Error, LBN Range 66776256:66776319
 5-30-2005   4:08:10 pm:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0) Starting BBR sequence
 5-30-2005   8:49:16 pm:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Error Event [command:0x28]
 5-30-2005   8:49:17 pm:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Medium Error [k:0x3,c:0x11,q:0x1]
 5-30-2005   8:49:17 pm:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Read Retries Exhausted
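
For readers decoding entries like the ones above: a minimal sketch, assuming `k`, `c` and `q` stand for the SCSI sense key, additional sense code (ASC) and qualifier (ASCQ), and that `command` is the SCSI opcode. Under that reading, 0x28 is READ(10), sense key 0x3 is MEDIUM ERROR, and ASC/ASCQ 0x11/0x01 is READ RETRIES EXHAUSTED, which matches the text the controller prints. The function name `decode_line`, the 512-byte block size and the inclusive reading of the LBN range are assumptions made for illustration; the lookup tables cover only the codes seen in this thread.

```python
import re

# Lookup tables limited to the codes that appear in this log (not complete).
OPCODES = {0x28: "READ(10)"}
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x01): "READ RETRIES EXHAUSTED"}

BLOCK_SIZE = 512  # assumption: bytes per logical block; the log does not say


def decode_line(line):
    """Return a readable interpretation of one log line, or None."""
    m = re.search(r"\[command:0x([0-9a-fA-F]+)\]", line)
    if m:
        op = int(m.group(1), 16)
        return "command " + OPCODES.get(op, "unknown opcode 0x%02x" % op)

    m = re.search(
        r"\[k:0x([0-9a-fA-F]+),c:0x([0-9a-fA-F]+),q:0x([0-9a-fA-F]+)\]", line
    )
    if m:
        key, asc, ascq = (int(g, 16) for g in m.groups())
        key_name = SENSE_KEYS.get(key, "sense key 0x%x" % key)
        detail = ASC_ASCQ.get((asc, ascq), "asc 0x%02x ascq 0x%02x" % (asc, ascq))
        return "%s / %s" % (key_name, detail)

    m = re.search(r"LBN Range (\d+):(\d+)", line)
    if m:
        first, last = int(m.group(1)), int(m.group(2))
        blocks = last - first + 1  # assumption: both ends are inclusive
        return "%d blocks (%d bytes)" % (blocks, blocks * BLOCK_SIZE)

    return None
```

With those assumptions, the range `66776256:66776319` covers 64 blocks, or 32768 bytes, so the affected area is small; the repeated READ failures against the same drive are the part worth watching.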

May 31st, 2005 20:00

What would you recommend for diags on this drive?  It is on a NetWare server, so I am using the fastcli tool.  I will order a replacement drive to prepare for a possible failure.

May 31st, 2005 20:00

Dale,

  ID 0,2 looks like it's failing. It's in a RAID array with redundancy, so it's not going to show up at the OS level other than as an event in the system log. Is it serious? Not as serious as a down server, but you should run diagnostics on the drive and replace it if it fails. After replacing the drive, you could stand to upgrade the firmware, driver, and management software to version 2.8...

 

Warwizard DCSP

May 31st, 2005 21:00

Use the command "Disk show Defects (0,2,0)". Any grown defects represent media errors. Repeat for the other drives in the system.
If more than one drive has errors, call Dell.
 
Warwizard

June 1st, 2005 13:00

All drives had primary defects, but none had GROWN defects. From what I have read, primary defects "originate during manufacturing" while grown defects "originate after manufacturing".  Does that mean the coast is clear, or would you recommend I replace the drive anyway?

Cheers
Dale

June 1st, 2005 14:00

Dale,

  That just means the controller has not detected a media error; you are still getting an error message from that drive. I advise going ahead and calling Dell for assistance in obtaining and running the HD diagnostics at a time when you can take the server down.

http://support.dell.com/support/edocs/storage/RAID/romb26/cli-rg/index.htm is the reference guide for the CLI (version 2.6; I know you have 2.7, but Dell does not list a guide for the 2.7 version).

You can pull the controller logs with the diagnostic show history /old command. Dell tech support may want to have the resulting text file sent to them.

 

Good luck,

Warwizard

 

June 1st, 2005 16:00

Thanks, Warwizard.  I am in a bit of a sticky situation, though, as the warranty on this server expired in the last few weeks.  I am getting it recertified but won't have tech support until that is finalized.
 
If I want to replace this drive can I simply do a
 
enclosure prepare slot 2
 
to take that slot offline (because the drive has not failed)
 
then remove the drive and insert the replacement drive, at which time I should see it automatically start a rebuild operation?  It has been a while since we have replaced a drive, so I am a bit rusty on the commands.

Cheers,
Dale

June 1st, 2005 19:00

Hello Dale - I wouldn't worry unless you see this happen a few more times in the near future. Bad blocks showing up as grown defects don't mean a drive will fail imminently. They are a sign of an aging drive. All drives experience some degradation over time, which is why the drive is designed to detect the bad block and perform a Bad Block Reallocation (BBR).

June 1st, 2005 19:00

Actually, it did happen again today.  It happened in the same circumstance: an IT staff member was running a disk scan with a program called Spacewatch just to tally space stats.  That tells me this drive has a problem of some sort.
 
6-01-2005  10:57:10 am:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Error Event [command:0x28]
 6-01-2005  10:57:10 am:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Medium Error [k:0x3,c:0x11,q:0x1]
 6-01-2005  10:57:10 am:    PERC2-2.70-0
   Severity = 0
   ID(0:02:0); Read Retries Exhausted
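
Since the advice in this thread hinges on whether the error recurs, here is a minimal sketch for tallying medium-error events per drive and per day from a saved copy of the controller log. It assumes the log text has the layout pasted in this thread (a timestamp line, then indented detail lines); the function name `medium_errors_per_day` and the regular expressions are illustrative, not part of any Dell tool.

```python
import re
from collections import Counter

# Timestamp line, e.g. " 5-30-2005   4:08:09 pm:    PERC2-2.70-0"
STAMP = re.compile(r"^\s*(\d{1,2}-\d{1,2}-\d{4})\s+\d{1,2}:\d{2}:\d{2} [ap]m:")
# Device tag, e.g. "ID(0:02:0)"
DEVICE = re.compile(r"ID\(\d+:\d+:\d+\)")


def medium_errors_per_day(log_text):
    """Count 'Medium Error [k:..,c:..,q:..]' lines per (device, date)."""
    counts = Counter()
    date = None
    for line in log_text.splitlines():
        stamp = STAMP.match(line)
        if stamp:
            date = stamp.group(1)  # remember the date for the detail lines
            continue
        dev = DEVICE.search(line)
        # Only the sense-data line is counted, so each failed read is
        # tallied once rather than once per log entry it generates.
        if dev and date is not None and "Medium Error [" in line:
            counts[(dev.group(0), date)] += 1
    return counts
```

Run over the entries posted so far, this gives two events for ID(0:02:0) on 5-30-2005 and one on 6-01-2005, all on the same drive.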

June 2nd, 2005 14:00

However this works out for you, I would strongly recommend that you get moved to the latest firmware and driver for this controller. You're right - this drive looks worrisome.

June 2nd, 2005 16:00

Concerning your expired service contract, go ahead and call in. Dell will still talk to PowerEdge server customers for free and give you all the help you need over the phone. You can even pay for a service call that qualifies as the recertification.

Warwizard
