lrogers03
1 Nickel

PowerEdge r710 rebooting - storage error

I have an R710 with a PERC 6/i controller and 6 hard drives.  Firmware, BIOS, and drivers are up to date.

I have experienced 3 reboots in the past 2 days. 

I am seeing quite a few event 2234 entries in OpenManage:

Controller event log: Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 31 27 00 00 59 00, Sense: 3/11/00: Controller 0 (PERC 6/i Integrated)

These are then followed by event IDs 2235 and/or 2236:

Controller event log: Fatal firmware error: Driver detected possible FW hang, halting FW. : Controller 0 (PERC 6/i Integrated)

Somewhere I read that PD 04 might be pointing to disk 4.   Is this possible?

Any help is much appreciated!

 

8 Krypton

Re: PowerEdge r710 rebooting - storage error

Yes, it is indicating disk 4 (that is disk NUMBER 4, NOT the 4th disk) ... they are labeled starting with 0, so the 4th disk would be disk 03.

What RAID level are you using?

lrogers03
1 Nickel

Re: PowerEdge r710 rebooting - storage error

This drive is part of a RAID 5 virtual disk.

This makes sense because this disk showed as failed and I reseated it about 30 days ago to make sure it was truly bad.  It came back online and showed no issues, but clearly it has problems now.

Can you confirm the replacement procedures?  Is it best to just shut down and replace?  Or can I just take the drive offline and then replace?

Thanks

 

8 Krypton

Re: PowerEdge r710 rebooting - storage error

Is the drive showing predicted failure in OMSA?

Never shut down to introduce a hot-swappable drive. In OMSA, force the drive offline (prepare to remove, etc.), then pull the drive, wait at least 60 seconds, then put the new drive in. It will probably rebuild automatically, but if it doesn't, you will need to assign the new disk as a hot-spare to start the rebuild.

The problem is that issues with a single disk should not cause the server to reboot, so you may have some corruption in your actual array. If you check the controller log, you may see references to a "puncture".

lrogers03
1 Nickel

Re: PowerEdge r710 rebooting - storage error

Where do I find the controller logs?

8 Krypton

Re: PowerEdge r710 rebooting - storage error

In OpenManage, go to Storage > PERC > Information/Configuration (link at top of page), then choose Export Log from the Available Tasks drop-down menu for the controller.

lrogers03
1 Nickel

Re: PowerEdge r710 rebooting - storage error

Got it.  The word "puncture" does not appear in the log.  I'm not exactly sure how to read it or which bits to show, but here is a section of the log that is repeated MANY times.

Again, thanks!

 

07/31/18  9:13:01: Raw Sense for PD 4: f0 00 03 0d 04 31 22 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 83 00 02 6d 00 00

07/31/18  9:13:01: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a054c200 retires=0

07/31/18  9:13:01: MedErr is for: cmdId=3a9, ld=1, src=1, cmd=1, lba=270c9300, cnt=80, rmwOp=0

07/31/18  9:13:01: ErrLBAOffset (0) LBA(d043122) BadLba=d043122

07/31/18  9:13:02: EVT#2736599-07/31/18  9:13:02: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 23 ce 00 00 32 00, Sense: 3/11/00

07/31/18  9:13:02: Raw Sense for PD 4: f0 00 03 0d 04 23 ce 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 81 00 00 b7 00 00

07/31/18  9:13:02: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a0525200 retires=0

07/31/18  9:13:02: MedErr is for: cmdId=30c, ld=1, src=1, cmd=1, lba=270c6980, cnt=280, rmwOp=0

07/31/18  9:13:02: ErrLBAOffset (0) LBA(d0423ce) BadLba=d0423ce

07/31/18  9:13:03: EVT#2736600-07/31/18  9:13:03: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 24 2e 00 00 52 00, Sense: 3/11/00

07/31/18  9:13:03: Raw Sense for PD 4: f0 00 03 0d 04 24 2e 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 81 00 01 17 00 00

07/31/18  9:13:03: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a08fbe00 retires=0

07/31/18  9:13:03: MedErr is for: cmdId=12b, ld=1, src=1, cmd=1, lba=270c6c00, cnt=180, rmwOp=0

07/31/18  9:13:03: ErrLBAOffset (0) LBA(d04242e) BadLba=d04242e

07/31/18  9:13:04: EVT#2736601-07/31/18  9:13:04: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 26 35 00 00 4b 00, Sense: 3/11/00

07/31/18  9:13:04: Raw Sense for PD 4: f0 00 03 0d 04 26 35 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 81 00 03 1e 00 00

07/31/18  9:13:04: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a08c7400 retires=0

07/31/18  9:13:04: MedErr is for: cmdId=26e, ld=1, src=1, cmd=1, lba=270c7180, cnt=300, rmwOp=0

07/31/18  9:13:04: ErrLBAOffset (0) LBA(d042635) BadLba=d042635

07/31/18  9:13:05: EVT#2736602-07/31/18  9:13:05: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 29 39 00 00 47 00, Sense: 3/11/00

07/31/18  9:13:05: Raw Sense for PD 4: f0 00 03 0d 04 29 3a 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 81 00 06 23 00 00

07/31/18  9:13:05: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a0556c00 retires=0

07/31/18  9:13:05: MedErr is for: cmdId=1ff, ld=1, src=1, cmd=1, lba=270c7a80, cnt=200, rmwOp=0

07/31/18  9:13:05: ErrLBAOffset (1) LBA(d042939) BadLba=d04293a

07/31/18  9:13:06: EVT#2736603-07/31/18  9:13:06: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 2a 38 00 00 48 00, Sense: 3/11/00

07/31/18  9:13:06: Raw Sense for PD 4: f0 00 03 0d 04 2a 38 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 82 00 01 52 00 00

07/31/18  9:13:06: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a0526800 retires=0

07/31/18  9:13:06: MedErr is for: cmdId=2b3, ld=1, src=1, cmd=1, lba=270c7d80, cnt=280, rmwOp=0

07/31/18  9:13:06: ErrLBAOffset (0) LBA(d042a38) BadLba=d042a38

07/31/18  9:13:07: EVT#2736604-07/31/18  9:13:07: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 31 23 00 00 5d 00, Sense: 3/11/00

07/31/18  9:13:07: Raw Sense for PD 4: f0 00 03 0d 04 31 23 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 83 00 02 6e 00 00

07/31/18  9:13:07: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a054c200 retires=0

07/31/18  9:13:07: MedErr is for: cmdId=3a9, ld=1, src=1, cmd=1, lba=270c9300, cnt=80, rmwOp=0

07/31/18  9:13:07: ErrLBAOffset (0) LBA(d043123) BadLba=d043123

07/31/18  9:13:08: EVT#2736605-07/31/18  9:13:08: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 23 cf 00 00 31 00, Sense: 3/11/00

07/31/18  9:13:08: Raw Sense for PD 4: f0 00 03 0d 04 23 d0 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 81 00 00 b9 00 00

07/31/18  9:13:08: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a0525200 retires=0

07/31/18  9:13:08: MedErr is for: cmdId=30c, ld=1, src=1, cmd=1, lba=270c6980, cnt=280, rmwOp=0

07/31/18  9:13:08: ErrLBAOffset (1) LBA(d0423cf) BadLba=d0423d0

07/31/18  9:13:09: EVT#2736606-07/31/18  9:13:09: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 24 2f 00 00 51 00, Sense: 3/11/00

07/31/18  9:13:09: Raw Sense for PD 4: f0 00 03 0d 04 24 2f 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 81 00 01 18 00 00

07/31/18  9:13:09: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a08fbe00 retires=0

07/31/18  9:13:09: MedErr is for: cmdId=12b, ld=1, src=1, cmd=1, lba=270c6c00, cnt=180, rmwOp=0

07/31/18  9:13:09: ErrLBAOffset (0) LBA(d04242f) BadLba=d04242f

07/31/18  9:13:09: EVT#2736607-07/31/18  9:13:09: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 26 36 00 00 4a 00, Sense: 3/11/00

07/31/18  9:13:10: Raw Sense for PD 4: f0 00 03 0d 04 26 36 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 81 00 03 1f 00 00

07/31/18  9:13:10: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a08c7400 retires=0

07/31/18  9:13:10: MedErr is for: cmdId=26e, ld=1, src=1, cmd=1, lba=270c7180, cnt=300, rmwOp=0

07/31/18  9:13:10: ErrLBAOffset (0) LBA(d042636) BadLba=d042636

07/31/18  9:13:11: EVT#2736608-07/31/18  9:13:11: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 29 3b 00 00 45 00, Sense: 3/11/00

07/31/18  9:13:11: Raw Sense for PD 4: f0 00 03 0d 04 29 3c 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 81 00 06 25 00 00

07/31/18  9:13:11: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a0556c00 retires=0

07/31/18  9:13:11: MedErr is for: cmdId=1ff, ld=1, src=1, cmd=1, lba=270c7a80, cnt=200, rmwOp=0

07/31/18  9:13:11: ErrLBAOffset (1) LBA(d04293b) BadLba=d04293c

07/31/18  9:13:12: EVT#2736609-07/31/18  9:13:12: 113=Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, CDB: 28 00 0d 04 2a 39 00 00 47 00, Sense: 3/11/00

07/31/18  9:13:12: Raw Sense for PD 4: f0 00 03 0d 04 2a 3a 18 00 00 00 00 11 00 00 80 00 4d 00 00 f7 2d 00 00 00 41 82 00 01 54 00 00
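For anyone working through a similar export, a short script can make the repetition easier to read. The following is a minimal Python sketch, assuming the export contains `DEV_REC:Medium Error DevId[...]` and `BadLba=...` lines shaped like the ones above; `summarize` is a hypothetical helper name, not part of OpenManage, and the device ID is kept as text because the log does not say whether it is printed in decimal or hex.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread (an assumption
# about the format; other firmware versions may print them differently).
MEDERR_RE = re.compile(r"DEV_REC:Medium Error DevId\[([0-9a-fA-F]+)\]")
BADLBA_RE = re.compile(r"BadLba=([0-9a-fA-F]+)")


def summarize(log_text):
    """Count medium errors per device ID and collect the distinct bad LBAs."""
    errors = Counter()   # device ID (as printed) -> number of medium errors
    bad_lbas = set()     # distinct BadLba values, parsed as hex
    for line in log_text.splitlines():
        m = MEDERR_RE.search(line)
        if m:
            errors[m.group(1)] += 1
            continue
        m = BADLBA_RE.search(line)
        if m:
            bad_lbas.add(int(m.group(1), 16))
    return errors, sorted(bad_lbas)


if __name__ == "__main__":
    sample = (
        "07/31/18  9:13:01: DEV_REC:Medium Error DevId[4] Tgt 4 RDM=a054c200 retires=0\n"
        "07/31/18  9:13:01: ErrLBAOffset (0) LBA(d043122) BadLba=d043122\n"
    )
    errors, lbas = summarize(sample)
    for dev, count in errors.items():
        print(f"DevId {dev}: {count} medium error(s)")
    print("bad LBAs:", [hex(lba) for lba in lbas])
```

In the excerpt above, every BadLba value falls between d0423ce and d043123, so the errors appear to cluster in one small region of the same device rather than being spread across the array.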



8 Krypton

Re: PowerEdge r710 rebooting - storage error

Yeah, that is where your 3/11/00 sense code is coming from. You may want to run a Consistency Check while waiting for the replacement drive.
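To make the sense data less opaque: the three numbers are the SCSI sense key, additional sense code (ASC), and qualifier (ASCQ), and 3/11/00 reads as a medium error with an unrecovered read error. The CDB in the same event starts with 28, which is a READ(10) command. Below is a minimal Python sketch, not a Dell tool, that pulls these fields out of one event line. `decode_event` is a hypothetical helper name, the lookup tables are deliberately tiny, and treating the PD number as hex and the slot as decimal are assumptions.

```python
import re

# Tiny lookup tables covering the values in this thread plus a few neighbours.
SENSE_KEYS = {0x1: "RECOVERED ERROR", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Matches the "Unexpected sense" event text quoted in this thread.
EVENT_RE = re.compile(
    r"PD (?P<pd>[0-9a-fA-F]+)\(e(?P<encl>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\).*?"
    r"CDB: (?P<cdb>(?:[0-9a-fA-F]{2} ?)+), "
    r"Sense: (?P<key>[0-9a-fA-F]+)/(?P<asc>[0-9a-fA-F]+)/(?P<ascq>[0-9a-fA-F]+)"
)


def decode_event(line):
    """Decode one 'Unexpected sense' event line into its main fields."""
    m = EVENT_RE.search(line)
    if m is None:
        raise ValueError("not an 'Unexpected sense' event line")
    cdb = bytes.fromhex(m["cdb"])
    key, asc, ascq = (int(m[name], 16) for name in ("key", "asc", "ascq"))
    sense_key = SENSE_KEYS.get(key, hex(key))
    detail = ASC_ASCQ.get((asc, ascq), "unknown")
    return {
        "pd": int(m["pd"], 16),            # assumption: PD number printed in hex
        "enclosure": int(m["encl"], 16),
        "slot": int(m["slot"]),            # assumption: slot printed in decimal
        "opcode": OPCODES.get(cdb[0], hex(cdb[0])),
        # The offsets below follow the 10-byte CDB layout only:
        # bytes 2-5 are the starting LBA, bytes 7-8 the transfer length.
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
        "sense": f"{sense_key} / {detail}",
    }


if __name__ == "__main__":
    line = ("Unexpected sense: PD 04(e0x20/s4) Path 5000cca00fa12901, "
            "CDB: 28 00 0d 04 31 27 00 00 59 00, Sense: 3/11/00")
    for field, value in decode_event(line).items():
        print(f"{field}: {value}")
```

The slot number in the parentheses (s4) matches the PD number here, which is consistent with the earlier answer that the event points at disk 4.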

lrogers03
1 Nickel

Re: PowerEdge r710 rebooting - storage error

OK. So, I've never run the consistency check.  I see where I can go into the specific VD and select Check Consistency.  I also googled it and see that it will cause a bit of a performance hit, but any idea how long it might take to run for 4x600GB drives (1.5TB array)?

I'm also assuming there is little chance that something will go wrong when doing this.  I do have backups, but I'd hate to fall back on them.

 

8 Krypton

Re: PowerEdge r710 rebooting - storage error

Yes, there will be a performance hit, so run it after hours if it is heavily used ... should finish overnight. I've heard of one or two people saying a drive went offline during the CC, but I've never seen that myself. Without it, there is a chance the corruption will not be reconciled before the drive is replaced, after which it can no longer be fixed, and that can cause the rebuild onto the new drive to fail.
