6 Posts
0
March 26th, 2012 02:00
Weird RAID / disk fault issue
Hi all,
Just a month ago a disk in our RAID-1 setup (SAS 6/iR on an R300, 2x SATA 250GB) failed. We ordered two SATA 500GB replacement disks from Dell and hot-swapped the first one in place of the faulted one. The controller started re-syncing, but the new disk failed at 61% progress. I had it start again, and it failed again at 61%. So I tried the second newly ordered disk, but this one failed as well (this time at 57%).
Dell diagnostics report no errors on the disks, yet they are consistently marked as failed when they reach the aforementioned progress during re-syncing. Any ideas?
Thanks for the effort,
Mauro


DELL-Chris H
7 Practitioner
•
9.7K Posts
•
48K Points
1
March 27th, 2012 08:00
Mauro,
Was/Is drive 0:0:1 flashing amber and green? It may be a punctured stripe; I would need the controller log to confirm that. The sense key indicated a Medium Unrecovered Read Error. The reason Dset failed was CentOS. What I would like you to do is boot to this ISO, which is OMSA Live: linux.dell.com/.../omsa601usa.iso
Then run Dset from within the OMSA Live environment and email the zip file to me.
Thank you.
DELL-Chris H
7 Practitioner
•
9.7K Posts
•
48K Points
0
March 26th, 2012 09:00
Mauro,
What is the OS on the server, and is it accessible? If so, we can run a Dset report and pull the controller log to see why the drives failed. Did the original drives show an amber light, or amber and green?
maurotdo
6 Posts
0
March 26th, 2012 10:00
Hi Chris, the OS is Linux CentOS 5.5 and it is accessible, as the RAID is degraded but fully functional.
The original disk failed electrically (I suppose), as it showed no LED activity and was not detected by the controller.
The replacement disks switch to a blinking amber light when they fail during re-syncing.
DELL-Chris H
7 Practitioner
•
9.7K Posts
•
48K Points
0
March 26th, 2012 10:00
Mauro,
Download the Dset tool and run the report.
Dset - www.dell.com/.../DriverDetails
Dset instructions - www.dell.com/.../DriverDetails
I am going to send you an email with my contact information. Could you reply to that email with the report, so I can take a look at it for you?
maurotdo
6 Posts
0
March 27th, 2012 03:00
Hi Chris,
I was unable to get Dset running, but I recovered a storage log from within the OMSA software and have mailed it to you. The interesting bits are:
(0:0:0 is the working pdisk from the original raid set)
(0:0:1 is the replacement pdisk)
Here is when the original pdisk at 0:0:0 goes missing in action.
2057 Mon Feb 20 10:14:16 2012 Storage Service Virtual disk degraded: Virtual Disk 0 (Virtual Disk 0) Controller 0 (SAS 6/iR Adapter)
2057 Mon Feb 20 10:35:41 2012 Storage Service Virtual disk degraded: Virtual Disk 0 (Virtual Disk 0) Controller 0 (SAS 6/iR Adapter)
2329 Mon Feb 20 11:39:27 2012 Storage Service SAS port report: SAS wide port 1 lost link on PHY 0.: Controller 0 (SAS 6/iR Adapter)
2330 Mon Feb 20 11:39:37 2012 Storage Service SAS port report: SAS wide port 1 restored link on PHY 0.: Controller 0 (SAS 6/iR Adapter)
2057 Mon Mar 19 09:05:28 2012 Storage Service Virtual disk degraded: Virtual Disk 0 (Virtual Disk 0) Controller 0 (SAS 6/iR Adapter)
2330 Mon Mar 19 18:58:25 2012 Storage Service SAS port report: SAS wide port 1 restored link on PHY 0.: Controller 0 (SAS 6/iR Adapter)
2199 Mon Mar 19 18:58:26 2012 Storage Service The virtual disk cache policy has changed.: Virtual Disk 0 (Virtual Disk 0) Controller 0 (SAS 6/iR Adapter)
The next month, the replacement disks are shipped, and I replace the seemingly failed pdisk at 0:0:0
2052 Mon Mar 19 18:58:26 2012 Storage Service Physical disk inserted: Physical Disk 0:0:0 Controller 0, Connector 0
2065 Mon Mar 19 18:58:27 2012 Storage Service Physical disk Rebuild started: Physical Disk 0:0:0 Controller 0, Connector 0
The rebuild starts, and I start getting hundreds of log lines like:
2095 Mon Mar 19 20:02:47 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
2095 Mon Mar 19 20:02:52 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:02:56 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:03:00 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:0 Controller 0, Connector 0
2095 Mon Mar 19 20:03:01 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
2095 Mon Mar 19 20:03:02 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
2095 Mon Mar 19 20:03:02 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:03:02 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:03:04 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:0 Controller 0, Connector 0
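For readers unfamiliar with the numbers in the 2095 lines above, here is a minimal sketch of how the sense triplet can be decoded. It assumes OMSA prints the three values in hexadecimal (so "Sense code: 11" is ASC 0x11), and the lookup tables are a small hand-picked subset of the standard SCSI tables, not a complete list. The function name `decode_sense` is made up for this example.

```python
import re

# Assumption: the values in the OMSA alert text are hexadecimal.
# Only a few well-known entries from the SCSI sense tables are listed here.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}
ASC_ASCQ = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x0C, 0x00): "WRITE ERROR",
}

SENSE_RE = re.compile(
    r"Sense key:\s*([0-9A-Fa-f]+)\s+"
    r"Sense code:\s*([0-9A-Fa-f]+)\s+"
    r"Sense qualifier:\s*([0-9A-Fa-f]+)"
)

def decode_sense(line):
    """Return a readable description of the sense data in one log line."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    key, asc, ascq = (int(value, 16) for value in match.groups())
    key_name = SENSE_KEYS.get(key, f"unknown key 0x{key:X}")
    asc_name = ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ 0x{asc:02X}/0x{ascq:02X}")
    return f"{key_name} / {asc_name}"

line = ("2095 Mon Mar 19 20:02:47 2012 Storage Service Unexpected sense. "
        "SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: "
        "Physical Disk 0:0:1 Controller 0, Connector 0")
print(decode_sense(line))
```

Under that assumption, key 3 with code 11 and qualifier 0 reads as a medium error with an unrecovered read, i.e. the disk could not read a sector back from the platter.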
After about 20 minutes of rebuilding (progress 61%), something goes horribly wrong:
2306 Mon Mar 19 20:27:04 2012 Storage Service Bad block table is 80% full.: Physical Disk 0:0:0 Controller 0, Connector 0
2306 Mon Mar 19 20:27:05 2012 Storage Service Bad block table is 80% full.: Physical Disk 0:0:1 Controller 0, Connector 0
Again hundreds of lines about media and sense errors:
2350 Mon Mar 19 20:27:05 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:27:06 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:0 Controller 0, Connector 0
2095 Mon Mar 19 20:27:06 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
2095 Mon Mar 19 20:27:14 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
After another minute, disaster:
2307 Mon Mar 19 20:28:16 2012 Storage Service Bad block table is full. Unable to log block 0000000011DE5A11.: Physical Disk 0:0 Controller 0, Connector 0
2307 Mon Mar 19 20:28:16 2012 Storage Service Bad block table is full. Unable to log block 0000000011DE5A11.: Physical Disk 0:0 Controller 0, Connector 0
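As a rough cross-check, the block address in the 2307 lines can be turned into a position on the disk. This is only a sketch under two assumptions that the log does not confirm: 512-byte logical sectors, and a nominal capacity of 250 * 10^9 bytes for the original 250GB disk. The helper name `lba_to_fraction` is hypothetical.

```python
SECTOR_BYTES = 512         # assumption: 512-byte logical sectors
DISK_BYTES = 250 * 10**9   # assumption: nominal 250 GB (decimal) capacity

def lba_to_fraction(lba_hex, sector_bytes=SECTOR_BYTES, disk_bytes=DISK_BYTES):
    """Convert a hexadecimal block address to (byte offset, fraction of disk)."""
    offset = int(lba_hex, 16) * sector_bytes
    return offset, offset / disk_bytes

offset, fraction = lba_to_fraction("0000000011DE5A11")
print(f"byte offset: {offset} (~{offset / 10**9:.1f} GB)")
print(f"position on disk: {fraction:.1%}")
```

With those assumptions the block sits at roughly 153.5 GB, about 61% of the way through the disk, which lines up with the 61% point where the rebuild kept failing. Whether the controller computes its progress percentage the same way is a guess.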
And finally:
2048 Mon Mar 19 20:28:17 2012 Storage Service Device failed: Physical Disk 0:0:0 Controller 0, Connector 0
The same story repeats after rebooting and letting the sync start again. My conclusion would be: get a new server, but I'm open to your interpretations.
maurotdo
6 Posts
0
March 27th, 2012 03:00
Sorry, I swapped the pdisk IDs:
0:0:1 is the working pdisk from the original raid set
0:0:0 is the failed/replaced pdisk
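With the IDs corrected, it helps to see which disk each alert refers to. The sketch below tallies events per physical disk from a few lines copied from the log above; the regular expression is my own guess at the line layout, not an official OMSA format, and `tally_events` is a made-up helper name.

```python
import re
from collections import Counter

# A few lines copied verbatim from the storage log excerpt in this thread.
SAMPLE = """\
2095 Mon Mar 19 20:02:47 2012 Storage Service Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:02:56 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:1 Controller 0, Connector 0
2350 Mon Mar 19 20:03:00 2012 Storage Service There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:0:0 Controller 0, Connector 0
2048 Mon Mar 19 20:28:17 2012 Storage Service Device failed: Physical Disk 0:0:0 Controller 0, Connector 0
"""

# Assumption: each line starts with a 4-digit event ID and names one physical disk.
LINE_RE = re.compile(r"^(\d{4})\s.*Physical Disk (\d+:\d+(?::\d+)?)")

def tally_events(log_text):
    """Count (disk, event ID) pairs; lines that name no physical disk are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            event_id, disk = match.groups()
            counts[(disk, event_id)] += 1
    return counts

for (disk, event_id), count in sorted(tally_events(SAMPLE).items()):
    print(f"disk {disk}  event {event_id}  x{count}")
```

In the excerpt, every 2095 sense error names 0:0:1, the surviving original disk that the rebuild reads from, while the "Device failed" event lands on the replacement at 0:0:0.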
maurotdo
6 Posts
0
May 8th, 2012 10:00
It was indeed a punctured RAID issue.