Calc8

18 Posts

1462

March 17th, 2021 08:00

Dell Poweredge T620 - Data Drives not mounting

We have a T620, running SBS2011.

It's due to be replaced literally within the next couple of weeks, but following a reboot, the data drives are not accessible.

Log messages include:

Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 280: Controller 0 (PERC H710 Adapter)

Controller event log: Unexpected sense: PD 06(e0x20/s6) Path 5000cca041254641, CDB: 28 00 00 00 02 80 00 00 08 00, Sense: 3/11/14: Controller 0 (PERC H710 Adapter)

Checking OMSA we have 2 disks in a Raid 1 - all good. 4 disks in a Raid 5 - degraded.

Looking at the physical disks, one of them is showing 'foreign'. Power is spun up. Status is non-critical.

Is this likely to be what's causing the data drives to be inaccessible?

Can we do something to get the data back up?

Can we survive two or maybe 3 weeks like this, or do we need to replace the drive?

We back up to USB every night.

Anything else I need to know / do?!

Many thanks.

Responses(23)

Calc8

18 Posts

0

March 17th, 2021 11:00

Log extract below, if that helps?

I'm wondering if there is also an error on the No. 6 HDD - could that be causing the problem if the No. 5 HDD has just failed?

Would replacing the failed 'foreign' drive help?

Does a replacement need to be identical to the one it replaces - or will any 600GB SAS 15,000 drive be OK?

Thanks.

PERC H710 Adapter 0:
4: Raw Sense for PD 6: f0 00 03 00 00 02 80 18 00 00 00 00 11 14 00 00 00 00 00 00 f7 ca 00 00 00 00 01 00 02 80 00 00
03/17/21 16:20:14: DEV_REC:Medium Error DevId[6] devHandle b RDM=8047d200 retires=0
03/17/21 16:20:14: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2400, ld=1, src=4, cmd=1, lba=5b9f20, cnt=20, rmwOp=0
03/17/21 16:20:14: -> recoveryChild: ld=1 orgLi=0 recPhysArm=0 badPhysArm=0 doneFun=0 sRef=0 eRef=0 recFlags=0
03/17/21 16:20:14: -> RecParent: cmdId=f3, src=1, cmd=1, lba=880, cnt=8, rmwOp=0, refs=0/7
03/17/21 16:20:14: ErrLBAOffset (0) LBA(280) BadLba=280
03/17/21 16:20:14: EVT#303869-03/17/21 16:20:14: 111=Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 280
03/17/21 16:20:14: Invalidating Line due to failed XOR!!! cmd=843efb20
03/17/21 16:20:14: EVT#303870-03/17/21 16:20:14: 113=Unexpected sense: PD 06(e0x20/s6) Path 5000cca041254641, CDB: 28 00 00 00 02 88 00 00 08 00, Sense: 3/11/14
03/17/21 16:20:14: Raw Sense for PD 6: f0 00 03 00 00 02 88 18 00 00 00 00 11 14 00 00 00 00 00 00 f7 ca 00 00 00 00 01 00 02 88 00 00
03/17/21 16:20:14: DEV_REC:Medium Error DevId[6] devHandle b RDM=8047b600 retires=0
03/17/21 16:20:14: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2400, ld=1, src=4, cmd=1, lba=5b9f20, cnt=20, rmwOp=0
03/17/21 16:20:14: -> recoveryChild: ld=1 orgLi=0 recPhysArm=0 badPhysArm=0 doneFun=0 sRef=0 eRef=0 recFlags=0
03/17/21 16:20:14: -> RecParent: cmdId=f3, src=1, cmd=1, lba=888, cnt=8, rmwOp=0, refs=8/f
03/17/21 16:20:14: ErrLBAOffset (0) LBA(288) BadLba=288
03/17/21 16:20:14: EVT#303871-03/17/21 16:20:14: 111=Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 288
03/17/21 16:20:14: Invalidating Line due to failed XOR!!! cmd=843efb20
03/17/21 16:20:14: EVT#303872-03/17/21 16:20:14: 113=Unexpected sense: PD 06(e0x20/s6) Path 5000cca041254641, CDB: 28 00 00 00 02 80 00 00 08 00, Sense: 3/11/14
03/17/21 16:20:14: Raw Sense for PD 6: f0 00 03 00 00 02 80 18 00 00 00 00 11 14 00 00 00 00 00 00 f7 ca 00 00 00 00 01 00 02 80 00 00
03/17/21 16:20:14: DEV_REC:Medium Error DevId[6] devHandle b RDM=80469000 retires=0
03/17/21 16:20:14: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2400, ld=1, src=4, cmd=1, lba=5b9f20, cnt=20, rmwOp=0
03/17/21 16:20:14: -> recoveryChild: ld=1 orgLi=0 recPhysArm=0 badPhysArm=0 doneFun=0 sRef=0 eRef=0 recFlags=0
03/17/21 16:20:14: -> RecParent: cmdId=c4, src=1, cmd=1, lba=880, cnt=8, rmwOp=0, refs=0/7
03/17/21 16:20:14: ErrLBAOffset (0) LBA(280) BadLba=280
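
The extract above repeats the same few events many times. A minimal Python sketch for collapsing such an extract into the distinct bad LBAs per physical disk follows. The regular expression is an assumption fitted to the event 111 lines quoted here, not a documented PERC log grammar, and `summarize_medium_errors` is a hypothetical helper name. The LBAs are parsed as hexadecimal, since the CDB bytes `00 00 02 80` line up with "at 280".

```python
import re
from collections import defaultdict

# Pattern fitted to the "111=" lines in the extract above (an assumption,
# not a documented log format). The trailing LBA is treated as hexadecimal.
EVENT_111 = re.compile(
    r"111=Unrecoverable medium error during recovery on "
    r"PD (\w+)\((\S+?)\) at ([0-9a-fA-F]+)"
)

def summarize_medium_errors(log_text):
    """Return {physical disk label: sorted list of distinct bad LBAs}."""
    bad = defaultdict(set)
    for match in EVENT_111.finditer(log_text):
        pd_id, location, lba_hex = match.groups()
        bad[f"PD {pd_id} ({location})"].add(int(lba_hex, 16))
    return {disk: sorted(lbas) for disk, lbas in bad.items()}

# Two of the event lines from the extract, copied verbatim.
sample = """\
03/17/21 16:20:14: EVT#303869-03/17/21 16:20:14: 111=Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 280
03/17/21 16:20:14: EVT#303871-03/17/21 16:20:14: 111=Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 288
"""

summary = summarize_medium_errors(sample)
for disk, lbas in summary.items():
    print(disk, [hex(lba) for lba in lbas])
```

Run against a full log export, this would show whether the errors are confined to a couple of LBAs on one disk or spread more widely.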

DELL-Chris H

Moderator

•

8.8K Posts

0

March 17th, 2021 13:00

Calc8,

Would you access the controller and then under Physical Disks let me know the status of all the drives? Also would you confirm the status of the Virtual Disk as well? I ask as losing just a single drive from a 4 drive raid 5 shouldn't cause you to not be able to access the data.

Let me know what you see.

Calc8

18 Posts

0

March 18th, 2021 02:00

Hi Chris

Thanks for your response.

2x OS discs and 3x data discs are all:

State: Online
Power Status: Spun Up

The 1x data disc we know about is:

State: Foreign
Power Status: Spun Up

Virtual Disks 0: Ready

Virtual Disk 1: Degraded.

If I expand Disk 1, I can see the 3x good data disks online.

I have a replacement HDD due here today.

Calc8

18 Posts

0

March 18th, 2021 02:00

In Windows Logs/System I'm getting some errors:

There was an unrecoverable disk media error during the rebuild or recovery operation: Physical Disk 0:1:6 Controller 0, Connector 0
Unexpected sense. SCSI sense data: Sense key: 3 Sense code: 11 Sense qualifier: 14: Physical Disk 0:1:6 Controller 0, Connector 0

Disk 0:1:6 is one of the 'good' data disks...
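
The sense key/code/qualifier in these Windows messages is the same 3/11/14 triple that appears in the controller log earlier in the thread, alongside the CDB and raw sense bytes. A minimal sketch of decoding them, assuming the CDB is a standard READ(10) and the raw bytes are fixed-format sense data (both look consistent with the bytes quoted). The helper names are hypothetical. Sense key 3 is MEDIUM ERROR and ASC 0x11 is the unrecovered read error family; to my recollection the T10 list gives 11h/14h as "READ ERROR - LBA MARKED BAD BY APPLICATION CLIENT", but that wording should be checked against the current T10 table rather than taken from here.

```python
# Subset of SCSI sense keys; only the values needed for this illustration.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}

def parse_hex_bytes(text):
    """Turn a string like '28 00 02' into bytes."""
    return bytes(int(token, 16) for token in text.split())

def decode_read10(cdb):
    """Decode a 10-byte READ(10) CDB into (lba, block_count)."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")    # bytes 7-8: transfer length
    return lba, count

def decode_fixed_sense(sense):
    """Decode fixed-format sense data (response code 0x70 or 0x71)."""
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "key": sense[2] & 0x0F,                           # low nibble of byte 2
        "information": int.from_bytes(sense[3:7], "big"), # bytes 3-6
        "asc": sense[12],
        "ascq": sense[13],
    }

# Byte strings copied from the controller log quoted in this thread.
cdb = parse_hex_bytes("28 00 00 00 02 80 00 00 08 00")
raw = parse_hex_bytes("f0 00 03 00 00 02 80 18 00 00 00 00 11 14 00 00 "
                      "00 00 00 00 f7 ca 00 00 00 00 01 00 02 80 00 00")

lba, count = decode_read10(cdb)
sense = decode_fixed_sense(raw)
print(f"READ(10): lba=0x{lba:x}, blocks={count}")
print(f"sense key {sense['key']:x} ({SENSE_KEYS.get(sense['key'], 'other')}), "
      f"asc=0x{sense['asc']:02x}, ascq=0x{sense['ascq']:02x}, "
      f"information=0x{sense['information']:x}")
```

The information field decodes to the same LBA as the CDB, which matches the "at 280" in the event text.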

DiegoLopez

4 Operator

•

2.7K Posts

0

March 18th, 2021 08:00

Ok, so let's answer some of your questions first:

Is this likely to be what's causing the data drives to be inaccessible? -> Exactly. Probably several disks were damaged, either at the same time or one after another, until the PERC started to report the errors. Maybe one of the disks was in predictive failure some time ago and you didn't notice. It's hard to tell without having complete PERC logs.

Can we do something to get the data back up? -> Yes, you can try to import the foreign configuration.

Can we survive two or maybe 3 weeks like this, or do we need to replace the drive? -> Eventually, you will need to replace the drives that had media errors. You can try not to, but that will be a risk. So it will depend on how important the service running on this server is to you.

So, first of all, make sure to have a backup as soon as possible. The data might be already damaged or could potentially be completely inaccessible during these operations.

Does a replacement need to be identical to the one it replaces - or will any 600GB SAS 15,000 drive be OK? -> It should be a replacement disk that is as similar as possible to the original. We could let you know the Dell part numbers of drives similar to the one you have if you send us a photo of the label of the disk.

Check this article about importing a Foreign Config:
How to import a “Foreign Configuration” in the RAID Controller using the System Setup Menu https://dell.to/2QjVd0j.
Also in video: https://dell.to/3ty4mk3
More updated video: https://dell.to/2QavAyM

If you are able to import the drive's foreign configuration, then you should check the PERC and hard drive firmware versions and let us know what they are. They might be outdated.

Hope this helps.
Regards.

Calc8

18 Posts

0

March 18th, 2021 09:00

Hi Diego

Thanks for this. I've now gone through the steps and the drive which was 'foreign' is now 'rebuilding' (50% so far).

We still don't have access to the data drive - is that because it needs to complete the rebuild first?

PERC firmware is 21.2.0-0007

Driver 6.801.05.00

Storport Driver 6.1.7601.18386

I think the HDD driver is:

6.1.7601.18231

Do we need to do anything to upgrade these?

Do I need to do anything to get access again if it doesn't automatically happen once the rebuilding is complete?

I have a replacement drive. Should I install this asap after taking a back-up?

Do I need to do anything other than hot-swap it?

How do I know which physical drive to swap? They are numbered 0:1:4 / 0:1:5 / 0:1:6 / 0:1:7 and in a bank together with the 2 OS disks above. Logic says 05 would be the second in, but there is seemingly nothing else to confirm this.

I have to go out now, so it will be a while before I can respond to anything else you say.

Many thanks for your help so far.

Calc

Calc8

18 Posts

0

March 18th, 2021 10:00

I've also just noticed a message:

The Virtual Disk has bad blocks. For more details, see the Virtual Disk Bad Block Management section in the Online Help

Any advice?

Dell-DylanJ

4 Operator

•

2.9K Posts

0

March 18th, 2021 10:00

Hi Calc,

RAID 5 implements a distributed parity system in order to achieve its 1-drive fault tolerance. With the first drive failure, the maximum fault tolerance was reached. However, the virtual disk bad block message would indicate that there are further errors in the array. Likely, what you would see are bad blocks on other drives that corrupt a data stripe.

If this data is critical, I would highly recommend contacting a data recovery service at this point in time. If you'd like to send me a PM, I'd be happy to work out a way to review a controller log (if you're interested).

I'd also give this article a look. The double-fault is what we'd be interested in for this case.

https://dell.to/30Wn7la
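
To illustrate why a second error matters, here is a minimal sketch of RAID 5 style XOR parity on a single stripe. It is a simplification under stated assumptions, not how the PERC firmware works internally: real RAID 5 rotates parity across the drives and uses much larger blocks, and `xor_blocks` and `rebuild_missing` are hypothetical names. It only shows that one unreadable block per stripe can be rebuilt from the rest, while a second unreadable block in the same stripe cannot.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(stripe, missing):
    """Rebuild the block at index `missing` from the other blocks in the stripe.

    `stripe` holds one block per drive (data blocks plus one parity block).
    A value of None marks a block that cannot be read. Raises ValueError if
    any surviving block is also unreadable, i.e. a double fault.
    """
    survivors = [block for i, block in enumerate(stripe) if i != missing]
    if any(block is None for block in survivors):
        raise ValueError("double fault: a surviving block in this stripe is unreadable")
    return xor_blocks(survivors)

# Four drives: three data blocks plus one parity block (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]

# Single fault: drive 1 is gone, the other three rebuild its block.
recovered = rebuild_missing(stripe, missing=1)
print("single fault rebuilt:", recovered)

# Double fault: drive 1 is gone AND drive 2 has a bad block in this stripe.
damaged = list(stripe)
damaged[2] = None
try:
    rebuild_missing(damaged, missing=1)
    double_fault_detected = False
except ValueError as err:
    double_fault_detected = True
    print("rebuild failed:", err)
```

In this picture, a medium error on one of the remaining disks during a rebuild leaves the affected stripe unrecoverable, which may be what the "failed XOR" lines and the bad block message in this thread reflect.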

Dell-DylanJ

4 Operator

•

2.9K Posts

0

March 18th, 2021 15:00

I'd expect that the array has a double fault and would recommend recreating it, since you've got the backup available. Before you do that though, I'm guessing that backup is file level? If it is, you can proceed without any additional concerns. If it's a block level backup, then there is a possibility that the backup would have backed up bad data.

Without more information, I have a hard time saying it couldn't be anything else, but this would be my first suspect. Nothing else really comes to mind as a suspicion, though. With this message, I've seen some folks clear the bad blocks and drive on without issue, other times, it's a much larger problem. The log I mentioned earlier is what would really reveal the extent of the issue.

Calc8

18 Posts

0

March 18th, 2021 15:00

Thanks. All 4 physical data disks are now showing spun up and online. But they still don't seem to have become available at all.

The virtual disk is showing the bad blocks message.

We take daily full backups to an external HDD. The last was about 4 days ago and so, because of the problems we've had for the last couple of days, will actually be pretty much 100% up to date.

Is there something else which could be preventing the disks from being available at all?

Thanks

sajithk

1 Rookie

•

6 Posts

0

March 18th, 2021 21:00

A RAID 5 degraded by one disk should not become inaccessible; you should still be able to access the data on it.

Also, one disk going foreign could be related to the restart you mentioned.

Is that foreign disk an original HDD from the server, or a disk you put in to replace a failed one? If it's a replacement, you need to clear the foreign configuration on it; it will then start rebuilding into your RAID 5.

Calc8

18 Posts

0

March 19th, 2021 09:00

Hi Dylan

Thanks for your suggestions and looking at the log for me - although unfortunately I understand that was completely flooded out with a repeating failed command error.

Can you please advise what you'd suggest we try next?

Many thanks

Calc8

18 Posts

0

March 19th, 2021 09:00

Hi sajithk - thanks for your comments.

Calc8

18 Posts

0

March 19th, 2021 10:00

Do you have a link to a guide for how to recreate it, as suggested above?

Many thanks

DELL-Chris H

Moderator

•

8.8K Posts

0

March 19th, 2021 11:00

The link here can be used to recreate the virtual disk.

Let us know how it goes.
