Unsolved
18 Posts
Dell Poweredge T620 - Data Drives not mounting
We have a T620, running SBS2011.
It's due to be replaced literally within the next couple of weeks, but following a reboot, the data drives are not accessible.
Log messages include:
Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 280: Controller 0 (PERC H710 Adapter)
Controller event log: Unexpected sense: PD 06(e0x20/s6) Path 5000cca041254641, CDB: 28 00 00 00 02 80 00 00 08 00, Sense: 3/11/14: Controller 0 (PERC H710 Adapter)
Checking OMSA, we have 2 disks in a RAID 1 (all good) and 4 disks in a RAID 5 (degraded).
Looking at the physical disks, one of them is showing 'foreign'. Power is spun up. Status is non-critical.
Is this likely to be what's causing the data drives to be inaccessible?
Can we do something to get the data back up?
Can we survive two or maybe 3 weeks like this, or do we need to replace the drive?
We back up to USB every night.
Anything else I need to know / do?!
Many thanks.
Calc8
18 Posts
March 17th, 2021 11:00
Log extract below, if that helps?
I'm wondering if there is also an error on the No. 6 HDD - could that be causing the problem if the No. 5 HDD has just failed?
Would replacing the failed 'foreign' drive help?
Does a replacement need to be identical to the one it replaces - or will any 600GB SAS 15,000 drive be OK?
Thanks.
PERC H710 Adapter 0:
4: Raw Sense for PD 6: f0 00 03 00 00 02 80 18 00 00 00 00 11 14 00 00 00 00 00 00 f7 ca 00 00 00 00 01 00 02 80 00 00
03/17/21 16:20:14: DEV_REC:Medium Error DevId[6] devHandle b RDM=8047d200 retires=0
03/17/21 16:20:14: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2400, ld=1, src=4, cmd=1, lba=5b9f20, cnt=20, rmwOp=0
03/17/21 16:20:14: -> recoveryChild: ld=1 orgLi=0 recPhysArm=0 badPhysArm=0 doneFun=0 sRef=0 eRef=0 recFlags=0
03/17/21 16:20:14: -> RecParent: cmdId=f3, src=1, cmd=1, lba=880, cnt=8, rmwOp=0, refs=0/7
03/17/21 16:20:14: ErrLBAOffset (0) LBA(280) BadLba=280
03/17/21 16:20:14: EVT#303869-03/17/21 16:20:14: 111=Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 280
03/17/21 16:20:14: Invalidating Line due to failed XOR!!! cmd=843efb20
03/17/21 16:20:14: EVT#303870-03/17/21 16:20:14: 113=Unexpected sense: PD 06(e0x20/s6) Path 5000cca041254641, CDB: 28 00 00 00 02 88 00 00 08 00, Sense: 3/11/14
03/17/21 16:20:14: Raw Sense for PD 6: f0 00 03 00 00 02 88 18 00 00 00 00 11 14 00 00 00 00 00 00 f7 ca 00 00 00 00 01 00 02 88 00 00
03/17/21 16:20:14: DEV_REC:Medium Error DevId[6] devHandle b RDM=8047b600 retires=0
03/17/21 16:20:14: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2400, ld=1, src=4, cmd=1, lba=5b9f20, cnt=20, rmwOp=0
03/17/21 16:20:14: -> recoveryChild: ld=1 orgLi=0 recPhysArm=0 badPhysArm=0 doneFun=0 sRef=0 eRef=0 recFlags=0
03/17/21 16:20:14: -> RecParent: cmdId=f3, src=1, cmd=1, lba=888, cnt=8, rmwOp=0, refs=8/f
03/17/21 16:20:14: ErrLBAOffset (0) LBA(288) BadLba=288
03/17/21 16:20:14: EVT#303871-03/17/21 16:20:14: 111=Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 288
03/17/21 16:20:14: Invalidating Line due to failed XOR!!! cmd=843efb20
03/17/21 16:20:14: EVT#303872-03/17/21 16:20:14: 113=Unexpected sense: PD 06(e0x20/s6) Path 5000cca041254641, CDB: 28 00 00 00 02 80 00 00 08 00, Sense: 3/11/14
03/17/21 16:20:14: Raw Sense for PD 6: f0 00 03 00 00 02 80 18 00 00 00 00 11 14 00 00 00 00 00 00 f7 ca 00 00 00 00 01 00 02 80 00 00
03/17/21 16:20:14: DEV_REC:Medium Error DevId[6] devHandle b RDM=80469000 retires=0
03/17/21 16:20:14: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2400, ld=1, src=4, cmd=1, lba=5b9f20, cnt=20, rmwOp=0
03/17/21 16:20:14: -> recoveryChild: ld=1 orgLi=0 recPhysArm=0 badPhysArm=0 doneFun=0 sRef=0 eRef=0 recFlags=0
03/17/21 16:20:14: -> RecParent: cmdId=c4, src=1, cmd=1, lba=880, cnt=8, rmwOp=0, refs=0/7
03/17/21 16:20:14: ErrLBAOffset (0) LBA(280) BadLba=280
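For readers following the extract above, the CDB and raw sense bytes can be decoded by hand. The following is a minimal Python sketch, assuming the standard SCSI READ(10) and fixed-format sense layouts; the helper names (`parse_hex`, `decode_read10`, `decode_fixed_sense`) are made up for illustration and are not part of any Dell tool. The sample bytes are copied from the log lines above.

```python
# Decode the READ(10) CDB and the fixed-format sense data quoted in the log.
# Assumption: standard SCSI layouts (opcode 0x28 = READ(10), response code
# 0x70/0x71 = fixed-format sense). Helper names are illustrative only.

def parse_hex(text):
    """Turn a string like '28 00 00' into a bytes object."""
    return bytes.fromhex(text)

def decode_read10(cdb):
    """Return (lba, block_count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, count

def decode_fixed_sense(sense):
    """Return the main fields of a fixed-format sense buffer."""
    if sense[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "valid": bool(sense[0] & 0x80),                    # information field is valid
        "sense_key": sense[2] & 0x0F,                      # 0x3 = MEDIUM ERROR
        "information": int.from_bytes(sense[3:7], "big"),  # failing LBA
        "asc": sense[12],
        "ascq": sense[13],
    }

cdb = parse_hex("28 00 00 00 02 80 00 00 08 00")
sense = parse_hex(
    "f0 00 03 00 00 02 80 18 00 00 00 00 11 14 00 00 "
    "00 00 00 00 f7 ca 00 00 00 00 01 00 02 80 00 00"
)

lba, count = decode_read10(cdb)
fields = decode_fixed_sense(sense)
print(f"READ(10) lba=0x{lba:x} blocks={count}")
print(f"sense key={fields['sense_key']:x} asc/ascq={fields['asc']:02x}/{fields['ascq']:02x} "
      f"failing lba=0x{fields['information']:x}")
```

Read this way, the command is an 8-block read starting at LBA 0x280, and the sense triple 3/11/14 is a medium error whose information field points at the same LBA. In the T10 code table, 11h/14h appears to correspond to "read error, LBA marked bad by application client", which would be consistent with blocks that had already been flagged as bad rather than a fresh read failure, but that interpretation should be checked against the full controller log.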
DELL-Chris H
Moderator
8.8K Posts
March 17th, 2021 13:00
Calc8,
Would you access the controller and then, under Physical Disks, let me know the status of all the drives? Would you also confirm the status of the Virtual Disk? I ask because losing just a single drive from a 4-drive RAID 5 shouldn't leave you unable to access the data.
Let me know what you see.
Calc8
18 Posts
March 18th, 2021 02:00
Hi Chris
Thanks for your response.
2x OS discs and 3x data discs are all:
The 1x data disc we know about is:
Virtual Disks 0: Ready
Virtual Disk 1: Degraded.
If I expand Disk 1, I can see the 3x good data disks online.
I have a replacement HDD due here today.
Calc8
18 Posts
March 18th, 2021 02:00
In Windows Logs/System I'm getting some errors:
Disk 0:1:6 is one of the 'good' data disks...
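One way to check which slot the controller's medium errors point at is to tally the event lines from the log extract by slot and compare that with the OMSA disk ID. The sketch below is illustrative only: it assumes the OMSA ID has the form connector:enclosure:slot, that the `s6` in `PD 06(e0x20/s6)` is the slot number in decimal, and that the LBA after `at` is hexadecimal (it matches the CDB bytes). The function names are made up for this example.

```python
import re
from collections import Counter

# Matches the controller's notation, e.g. "PD 06(e0x20/s6) at 280".
EVENT_RE = re.compile(
    r"Unrecoverable medium error during recovery on "
    r"PD (?P<pd>[0-9a-fA-F]+)\(e0x(?P<encl>[0-9a-fA-F]+)/s(?P<slot>\d+)\) "
    r"at (?P<lba>[0-9a-fA-F]+)"
)

def medium_errors_by_slot(lines):
    """Count medium-error events per slot and collect the distinct bad LBAs."""
    counts = Counter()
    lbas = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a medium-error event line
        slot = int(match.group("slot"))
        counts[slot] += 1
        lbas.setdefault(slot, set()).add(int(match.group("lba"), 16))
    return counts, lbas

def slot_from_omsa_id(disk_id):
    """Assumed OMSA format 'connector:enclosure:slot', e.g. '0:1:6' -> 6."""
    return int(disk_id.split(":")[-1])

# Two event lines copied from the log extract posted earlier in the thread.
sample = [
    "03/17/21 16:20:14: EVT#303869-03/17/21 16:20:14: 111=Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 280",
    "03/17/21 16:20:14: EVT#303871-03/17/21 16:20:14: 111=Unrecoverable medium error during recovery on PD 06(e0x20/s6) at 288",
]

counts, lbas = medium_errors_by_slot(sample)
for slot, n in counts.items():
    print(f"slot {slot}: {n} medium errors at LBAs {sorted(hex(x) for x in lbas[slot])}")
print("OMSA 0:1:6 maps to slot", slot_from_omsa_id("0:1:6"))
```

Under those assumptions, the medium errors in the extract are reported against slot 6, the same slot as Disk 0:1:6, which is a different disk from the one that went foreign.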
DiegoLopez
4 Operator
2.7K Posts
March 18th, 2021 08:00
Ok, so let's answer some of your questions first:
Is this likely to be what's causing the data drives to be inaccessible? -> Yes. Probably several disks were damaged, either at the same time or one after another, until the PERC started reporting errors. Maybe one of the disks was in predictive failure some time ago and you didn't notice. It's hard to tell without complete PERC logs.
Can we do something to get the data back up? -> Yes, you can try to import the foreign configuration.
Can we survive two or maybe 3 weeks like this, or do we need to replace the drive? -> Eventually, you will need to replace the drives that had media errors. You can try not to, but that is a risk, so it depends on how important the service running on this server is to you.
So, first of all, make sure to have a backup as soon as possible. The data might be already damaged or could potentially be completely inaccessible during these operations.
Does a replacement need to be identical to the one it replaces - or will any 600GB SAS 15,000 drive be OK? -> It should be a proper replacement disk, as similar as possible to the original. We could give you the Dell part numbers of drives similar to yours if you send us a photo of the disk's label.
Check this article about importing a Foreign Config:
How to import a “Foreign Configuration” in the RAID Controller using the System Setup Menu https://dell.to/2QjVd0j.
Also in video: https://dell.to/3ty4mk3
More updated video: https://dell.to/2QavAyM
If you are able to import the drive's foreign configuration, then check the PERC and hard drive firmware versions and let us know what they are. They might be outdated.
Hope this helps.
Regards.
Calc8
18 Posts
March 18th, 2021 09:00
Hi Diego
Thanks for this. I've now gone through the steps and the drive which was 'foreign' is now 'rebuilding' (50% so far).
We still don't have access to the data drive - is that because it needs to complete the rebuild first?
PERC firmware is 21.2.0-0007
Driver 6.801.05.00
Storport Driver 6.1.7601.18386
I think the HDD driver is:
6.1.7601.18231
Do we need to do anything to upgrade these?
Do I need to do anything to get access again if it doesn't automatically happen once the rebuilding is complete?
I have a replacement drive. Should I install this asap after taking a back-up?
Do I need to do anything other than hot-swap it?
How do I know which physical drive to swap? They are numbered 0:1:4 / 0:1:5 / 0:1:6 / 0:1:7 and in a bank together with the 2 OS disks above. Logic says 05 would be the second in, but there is seemingly nothing else to confirm this.
I have to go out now, so it will be a while before I can respond to anything else you say.
Many thanks for your help so far.
Calc
Calc8
18 Posts
March 18th, 2021 10:00
I've also just noticed a message:
The Virtual Disk has bad blocks. For more details, see the Virtual Disk Bad Block Management section in the Online Help
Any advice?
Dell-DylanJ
4 Operator
2.9K Posts
March 18th, 2021 10:00
Hi Calc,
RAID 5 implements a distributed parity system in order to achieve its 1-drive fault tolerance. With the first drive failure, the maximum fault tolerance was reached. However, the virtual disk bad block message would indicate that there are further errors in the array. Likely, what you would see are bad blocks on other drives that corrupt a data stripe.
If this data is critical, I would highly recommend contacting a data recovery service at this point in time. If you'd like to send me a PM, I'd be happy to work out a way to review a controller log (if you're interested).
I'd also give this article a look. The double-fault is what we'd be interested in for this case.
https://dell.to/30Wn7la
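The double-fault idea can be illustrated with a toy model of a single RAID 5 stripe. This is a simplified sketch, not the PERC's actual implementation: it assumes three data strips plus one XOR parity strip on a 4-disk array, and the names `xor_blocks` and `rebuild` are invented for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a 4-disk RAID 5: three data strips plus one parity strip.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, missing):
    """Rebuild the strip at index `missing` from the surviving strips.

    An unreadable strip (for example, one containing a bad block) is
    represented as None. If any survivor is unreadable, the stripe has
    two unknowns and XOR parity cannot recover either of them.
    """
    survivors = [s for i, s in enumerate(stripe) if i != missing]
    if any(s is None for s in survivors):
        raise ValueError("double fault: stripe cannot be reconstructed")
    return xor_blocks(survivors)

# Single fault: the disk holding strip 1 is gone, the others are readable.
print(rebuild(stripe, 1))  # reconstructs the lost strip from the other three

# Double fault: strip 1 is gone AND strip 2 has a bad block in this stripe.
damaged = list(stripe)
damaged[2] = None
try:
    rebuild(damaged, 1)
except ValueError as err:
    print(err)
```

With one strip missing, the XOR of the other three returns it exactly. Once a second strip in the same stripe is unreadable, there is nothing left to solve for, which is why a rebuild that hits a medium error on a surviving disk ends up marking blocks on the virtual disk as bad.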
Dell-DylanJ
4 Operator
2.9K Posts
March 18th, 2021 15:00
I'd expect that the array has a double fault and would recommend recreating it, since you've got the backup available. Before you do that though, I'm guessing that backup is file level? If it is, you can proceed without any additional concerns. If it's a block level backup, then there is a possibility that the backup would have backed up bad data.
Without more information, I have a hard time saying it couldn't be anything else, but this would be my first suspect. Nothing else really comes to mind as a suspicion, though. With this message, I've seen some folks clear the bad blocks and drive on without issue, other times, it's a much larger problem. The log I mentioned earlier is what would really reveal the extent of the issue.
Calc8
18 Posts
March 18th, 2021 15:00
Thanks. All 4 physical data disks are now showing as spun up and online. But the data drives still haven't become available at all.
The virtual disk is showing the bad blocks message.
We take daily full back ups to an external HDD. The last was about 4 days ago and so, because of the problems we've had for the last couple of days, will actually be pretty much 100% up to date.
Is there something else which could be preventing the disks from being available at all?
Thanks
sajithk
1 Rookie
6 Posts
March 18th, 2021 21:00
A RAID 5 that is degraded by one disk should not become inaccessible; you should still be able to access the data on it.
Also, one disk going foreign could be related to the restart you mentioned.
Is the foreign disk one of the server's original HDDs, or is it a disk you fitted to replace a failed one? If it is a replacement, you need to clear its foreign configuration, and then it will start rebuilding into your RAID 5.
Calc8
18 Posts
March 19th, 2021 09:00
Hi Dylan
Thanks for your suggestions and looking at the log for me - although unfortunately I understand that was completely flooded out with a repeating failed command error.
Can you please advise what you'd suggest we try next?
Many thanks
Calc8
18 Posts
March 19th, 2021 09:00
Hi sajithk - thanks for your comments.
Calc8
18 Posts
March 19th, 2021 10:00
Do you have a link to a guide for how to recreate it, as suggested above?
Many thanks
DELL-Chris H
Moderator
8.8K Posts
March 19th, 2021 11:00
The link here can be used to recreate the virtual disk.
Let us know how it goes.