
6 Posts


March 26th, 2012 02:00

Weird RAID / disk fault issue

Hi all,

Just a month ago, a disk in our RAID-1 setup (SAS 6/iR in an R300, 2x 250GB SATA) failed. We ordered two 500GB SATA replacement disks from Dell and hot-swapped the first one in place of the failed disk. The controller started resyncing, but the new disk failed at 61% progress. I restarted the rebuild, and it failed again at 61%. So I tried the second new disk, but it failed too (this time at 57%).

Dell diagnostics report no errors on the disks, yet they are consistently marked as failed when the resync reaches that point. Any ideas?

Thanks for the effort,
Mauro

7 Practitioner • 9.7K Posts • 48K Points

March 27th, 2012 08:00

Mauro,

Was (or is) drive 0:0:1 flashing amber and green? It may be a punctured stripe; I would need the controller log to confirm that. The sense key indicates a Medium Error (unrecovered read error). Dset failed because of CentOS. What I would like you to do is boot from this ISO, which is OMSA Live: linux.dell.com/.../omsa601usa.iso

Then run Dset from within the OMSA Live environment and email the resulting zip file to me.

Thank you.
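Chris's reading of the sense data ("Sense key: 3 Sense code: 11 Sense qualifier: 0" in Mauro's log) can be reproduced with a small lookup. In the SCSI Primary Commands standard, sense key 3h is MEDIUM ERROR and additional sense code/qualifier 11h/00h is UNRECOVERED READ ERROR. The sketch below assumes OMSA prints these fields in hexadecimal without a "0x" prefix, which is consistent with Chris's interpretation; `decode_sense` is a hypothetical helper and the tables are deliberately partial.

```python
# Standard SCSI sense keys (partial list).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}

# (ASC, ASCQ) pairs; only the pair seen in this thread is included.
ADDITIONAL_SENSE = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}


def decode_sense(key: str, code: str, qualifier: str) -> str:
    """Translate the fields of an OMSA 'Unexpected sense' event (assumed hex)."""
    k, asc, ascq = (int(v, 16) for v in (key, code, qualifier))
    key_text = SENSE_KEYS.get(k, f"sense key 0x{k:X}")
    asc_text = ADDITIONAL_SENSE.get((asc, ascq), f"ASC/ASCQ 0x{asc:02X}/0x{ascq:02X}")
    return f"{key_text}: {asc_text}"


print(decode_sense("3", "11", "0"))  # MEDIUM ERROR: UNRECOVERED READ ERROR
```

Unknown codes fall back to their raw hex values rather than raising an error, so the helper can be pointed at other log entries without extending the tables first.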

7 Practitioner • 9.7K Posts • 48K Points

March 26th, 2012 09:00

Mauro,

What is the OS on the server, and is it accessible? If so, we can run a Dset report and pull the controller log to see why the drives failed. Did the original drive show an amber light, or amber and green?

6 Posts

March 26th, 2012 10:00

Hi Chris, the OS is CentOS 5.5 Linux and it is accessible, since the RAID is degraded but fully functional.

The original disk failed electrically (I suppose), as it showed no blink activity and was not detected by the controller.

The replacement disks switch to a blinking amber light when they fail during resyncing.

7 Practitioner • 9.7K Posts • 48K Points

March 26th, 2012 10:00

Mauro,

Download the Dset tool and run the report.

Dset - www.dell.com/.../DriverDetails

Dset instructions - www.dell.com/.../DriverDetails

I am going to send you an email with my contact information. Could you reply to that email with the report so I can take a look at it?

6 Posts

March 27th, 2012 03:00

Hi Chris,

I was unable to get Dset running, but I recovered a storage log from within the OMSA software and have mailed it to you. The interesting bits are:

(0:0:0 is the working pdisk from the original RAID set)

(0:0:1 is the replacement pdisk)

Here is when the original pdisk at 0:0:0 goes missing in action.

2057 Mon Feb 20 10:14:16 2012 Storage Service Virtual disk degraded: Virtual Disk 0 (Virtual Disk 0) Controller 0 (SAS 6/iR Adapter)

2057 Mon Feb 20 10:35:41 2012 Storage Service Virtual disk degraded: Virtual Disk 0 (Virtual Disk 0) Controller 0 (SAS 6/iR Adapter)

2329 Mon Feb 20 11:39:27 2012 Storage Service SAS port report: SAS wide port 1 lost link on PHY 0.: Controller 0 (SAS 6/iR Adapter)

2330 Mon Feb 20 11:39:37 2012 Storage Service SAS port report: SAS wide port 1 restored link on PHY 0.: Controller 0 (SAS 6/iR Adapter)

2057 Mon Mar 19 09:05:28 2012 Storage Service Virtual disk degraded: Virtual Disk 0 (Virtual Disk 0) Controller 0 (SAS 6/iR Adapter)

2330 Mon Mar 19 18:58:25 2012 Storage Service SAS port report: SAS wide port 1 restored link on PHY 0.: Controller 0 (SAS 6/iR Adapter)

2199 Mon Mar 19 18:58:26 2012 Storage Service The virtual disk cache policy has changed.: Virtual Disk 0 (Virtual Disk 0) Controller 0 (SAS 6/iR Adapter)

The next month the replacement disks arrive, and I replace the seemingly failed pdisk at 0:0:0:

2052 Mon Mar 19 18:58:26 2012 Storage Service Physical disk inserted: Physical Disk 0:0:0 Controller 0, Connector 0

2065 Mon Mar 19 18:58:27 2012 Storage Service Physical disk Rebuild started: Physical Disk 0:0:0 Controller 0, Connector 0

The rebuild starts, and I start getting hundreds of log lines like:

2095 Mon Mar 19 20:02:47 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0

2095 Mon Mar 19 20:02:52 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0

2350 Mon Mar 19 20:02:56 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:1 Controller 0, Connector 0

2350 Mon Mar 19 20:03:00 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:0 Controller 0, Connector 0

2095 Mon Mar 19 20:03:01 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0

2095 Mon Mar 19 20:03:02 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0

2095 Mon Mar 19 20:03:02 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0

2350 Mon Mar 19 20:03:02 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:1 Controller 0, Connector 0

2350 Mon Mar 19 20:03:04 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:0 Controller 0, Connector 0

At 20:27, with the rebuild at 61%, something goes horribly wrong:

2306 Mon Mar 19 20:27:04 2012 Storage Service Bad block table is 80% full.: Physical Disk 0:0:0 Controller 0, Connector 0

2306 Mon Mar 19 20:27:05 2012 Storage Service Bad block table is 80% full.: Physical Disk 0:0:1 Controller 0, Connector 0

Again, hundreds of lines about media and sense errors:

2350 Mon Mar 19 20:27:05 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:1 Controller 0, Connector 0

2350 Mon Mar 19 20:27:06 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:0 Controller 0, Connector 0

2095 Mon Mar 19 20:27:06 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0

2095 Mon Mar 19 20:27:14 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0

After another minute, disaster:

2307 Mon Mar 19 20:28:16 2012 Storage Service Bad block table is full. Unable to log block 0000000011DE5A11.: Physical Disk 0:0 Controller 0, Connector 0

2307 Mon Mar 19 20:28:16 2012 Storage Service Bad block table is full. Unable to log block 0000000011DE5A11.: Physical Disk 0:0 Controller 0, Connector 0

And finally:

2048 Mon Mar 19 20:28:17 2012 Storage Service Device failed: Physical Disk 0:0:0 Controller 0, Connector 0

The same thing happened after rebooting and letting the sync start again. My conclusion would be to get a new server, but I'm open to other interpretations.
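With hundreds of near-identical entries, tallying event IDs per physical disk makes the pattern in a log like this easier to see (2095 sense errors, 2350 media errors, then the 2306/2307 bad-block-table events). The following is a minimal Python sketch: the line layout is inferred from the excerpts in this thread rather than from any OMSA documentation, and `tally`, `LINE_RE` and `DISK_RE` are hypothetical names.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts above:
# "<event id> <weekday> <month> <day> <hh:mm:ss> <year> Storage Service <message>"
LINE_RE = re.compile(
    r"^(?P<event>\d+)\s+"
    r"(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})\s+"
    r"Storage Service\s+(?P<msg>.*)$"
)
# Matches IDs such as "0:0:1" (and the truncated "0:0" that also appears in the log).
DISK_RE = re.compile(r"Physical Disk (\d+(?::\d+)+)")


def tally(lines):
    """Count (event id, physical disk) pairs; disk is None for non-disk events."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines or text that is not a log entry
        disk = DISK_RE.search(m.group("msg"))
        counts[(int(m.group("event")), disk.group(1) if disk else None)] += 1
    return counts


# Usage with a few entries copied from the excerpts above.
sample = """\
2095 Mon Mar 19 20:02:47 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:02:56 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:03:00 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:0 Controller 0, Connector 0
2095 Mon Mar 19 20:03:01 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
"""
for (event, disk), n in tally(sample.splitlines()).most_common():
    print(event, disk or "-", n)
```

Keying on the pair rather than the event ID alone keeps errors reported against the surviving disk separate from those reported against the replacement.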

6 Posts

March 27th, 2012 03:00

Sorry, I swapped the pdisk IDs:

0:0:1 is the working pdisk from the original RAID set

0:0:0 is the failed/replaced pdisk

6 Posts

May 8th, 2012 10:00

It was indeed a punctured RAID issue.
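The punctured-stripe diagnosis fits the log sequence. In a degraded RAID-1 the surviving disk (0:0:1) holds the only copy, so any block it cannot read cannot be rebuilt onto the replacement (0:0:0). Its address is recorded as bad on both disks until a bad block table overflows and the new disk is failed. This is why fresh disks keep failing at roughly the same point. The toy Python model below is a sketch under stated assumptions, not the SAS 6/iR firmware's actual algorithm: the table capacity (`table_size`), the 80% warning (mirroring the log message), and the rule that the replacement is failed once its table is full are all illustrative. `Disk` and `rebuild` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical model of one physical disk in the mirror."""
    name: str
    unreadable: set = field(default_factory=set)         # LBAs that return a medium error
    bad_block_table: list = field(default_factory=list)  # LBAs logged as bad by the controller
    failed: bool = False


def rebuild(source: Disk, target: Disk, total_blocks: int, table_size: int) -> list:
    """Toy RAID-1 rebuild copying each LBA from source to target.

    A degraded mirror has no second copy, so an unreadable source block cannot
    be reconstructed; its LBA is logged as bad on both disks (the 'puncture').
    table_size is an assumed per-disk bad block table capacity.
    """
    events = []
    for lba in range(total_blocks):
        if lba not in source.unreadable:
            continue  # block copied normally, nothing to log
        events.append(f"media error reading LBA {lba} on {source.name}")
        target_full = False
        for disk in (source, target):
            if len(disk.bad_block_table) >= table_size:
                events.append(f"{disk.name}: bad block table full, cannot log LBA {lba}")
                if disk is target:
                    target_full = True
                continue
            disk.bad_block_table.append(lba)
            if len(disk.bad_block_table) == table_size * 4 // 5:
                events.append(f"{disk.name}: bad block table 80% full")
        if target_full:
            # Assumed policy: the new disk is failed once it can no longer log bad blocks.
            target.failed = True
            events.append(f"{target.name}: device failed at LBA {lba} ({100 * lba // total_blocks}% rebuilt)")
            return events
    events.append("rebuild complete")
    return events


# Usage: 0:0:1 is the surviving original disk, 0:0:0 the replacement.
src = Disk("0:0:1", unreadable=set(range(0, 1000, 50)))
tgt = Disk("0:0:0")
for event in rebuild(src, tgt, total_blocks=1000, table_size=10):
    print(event)
```

Because the unreadable blocks live on the surviving disk, swapping in another new disk replays the same sequence. This matches the repeated failures at 61% and 57% described at the top of the thread.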
