Hot spare does not take over when a disk fails; virtual disk goes to failed state

Question

Hi everyone,

I need some help with the issue below.

We have a PowerVault MD3200 with an MD1200 as an expansion enclosure. The disks are Seagate 600GB SAS.

Earlier, there was an unreadable sector error on the system. The log showed that physical disk 1,3 was the culprit. I marked this disk as failed and used the Replace option in MDSM to replace it with our unassigned disk at 1,11. Now Recovery Guru does not report any unreadable sectors.

However, the virtual disk still showed as degraded even though reconstruction had finished after a little over an hour. Then we hit another issue: physical disk 1,3 failed and our virtual disk went to the failed state. The problem is that when the disk failed, our hot spare did not take over. I tried manually failing the disk after reviving it, but the spare still did nothing.

Basically, we hit two problems, and I would like to know what action I can take to resolve them:

1. The virtual disk stayed in a degraded state after reconstruction finished, before the second disk failure happened. How can I get the virtual disk back to the optimal state?

2. A disk failed and the hot spare did not take over, so my virtual disk went to the failed state.

Thank you

DELL-Josh Cr · Answer

Hi,

What RAID level are you using? Are there any errors in the logs? It could be that multiple drives have bad blocks, which are then copied to the replacement drive when it tries to rebuild. Did you replace the drive in 1:3?

DELL-Josh Cr · Answer

You can try to run a consistency check and see if it can fix the errors.

stress_043 · Answer

Hi Josh,

Sorry, I think I need to correct my earlier statement.

Physical disk 1:2 was the culprit according to the unreadable sector error log, so I used the Replace function to replace it with physical disk 1:9. I am using RAID 10. The unreadable sector log did not show any errors after that, but the virtual disk was still degraded. After some time, physical disk 1:3 went to failed and the same virtual disk went to failed status.

Because it is RAID 10, physical disk 1:9 is mirrored with 1:3. Since 1:9 contained bad-sector data copied from 1:2, and 1:3 then failed, the mirror could not work. That is my best guess.
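The reasoning above can be illustrated with a minimal sketch. It is only a toy model of a two-disk mirror, not how the MD3200 firmware works. The pairing of 1:9 with 1:3 is taken from the guess above, and the sector number 42 and the function name `lost_sectors` are made up for illustration.

```python
def lost_sectors(pair, failed, unreadable, sectors):
    """Return the sectors of a mirror pair that no surviving member can read.

    pair       -- the two disk IDs that mirror each other
    failed     -- set of disk IDs that are offline
    unreadable -- dict mapping disk ID -> set of unreadable sector numbers
    sectors    -- iterable of all sector numbers to check
    """
    lost = set()
    for s in sectors:
        # A sector survives if at least one online member can still read it.
        readable = any(
            d not in failed and s not in unreadable.get(d, set())
            for d in pair
        )
        if not readable:
            lost.add(s)
    return lost


# Assumed mirror pair (from the guess above) and a hypothetical bad sector
# that was carried over to 1:9 during the copy from 1:2.
pair = ("1:9", "1:3")
unreadable = {"1:9": {42}}

# Both disks online: 1:3 still holds a good copy of sector 42.
print(lost_sectors(pair, set(), unreadable, range(100)))    # set()

# 1:3 fails: the only remaining copy of sector 42 is the unreadable one.
print(lost_sectors(pair, {"1:3"}, unreadable, range(100)))  # {42}
```

In this model a single bad sector on one member is harmless as long as its partner is healthy, but it becomes real data loss as soon as the partner fails, which matches the sequence of events described here.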

I did not replace 1:3 because I am afraid of data loss. As a workaround, I revived disk 1:3 so that the system could go live again, but it could go down at any time.

I had another system with exactly the same issue, but there the reconstruction ran and completed once we assigned two hot spares. The current system has one spare.

stress_043 · Answer

Dear Josh,

Sorry for the late response.

After some time, I decided to propose destroying and recreating the VD to the user, since there was not much else I could do.

The user turned off the VM, then we power-cycled the system to ensure there was no more I/O. After that we destroyed the VD, replaced the physical disk in slot 3, and then recreated the VD.

The system then went back to optimal without any sign of unreadable sector errors.

Thank you.

