Re: What might cause multiple failures of drives in the same slot?

Thanks Glen

I would be keen to restore the ATS functionality (which I think is HardwareAcceleratedLocking - see ETA 207784: VNX: Storage Processors may restart if VMware vStorage APIs for Array Integration (VAAI)...). I think our decision to disable it was based on that discussion, or on an earlier version of the ETA from before a hotfix was available for our OE version. I'll try to get hold of that hotfix and install it so we can turn on ATS again!

I have replaced the disk, but as I can't reopen the "CallHome" SR (it keeps getting closed automatically because I've been sent a replacement disk!), I've now opened another SR (76519062) to chase the underlying problem. I think this saga might be a case study worthy of review by someone who has oversight of the EMC support process...


John, keep in mind that the hotfix install is not that different from a full OE upgrade... SP reboots included.

Out of interest, what drive type is failing? We had particular issues with 600GB 15k drives not long ago. There's an ETA for that too (blogged about here: EMC VNX/VNX2/VMAX: 600GB 15k Drive Increased failure rates – with fix. – Pragmatic IO).

The resolution involved drive firmware upgrades and a "double binding" procedure.

You'd expect replacement drives to be shipped with newer firmware, but it is something to be aware of.

From the latest spcollects, the drive in 0-2-5 is a 3TB Seagate NL-SAS. The fact that multiple drives have failed in this one location does seem to point to that particular slot, but it could also be one of the LCCs (there are two in the DAE).

Re-seating or replacing the LCC cables (the external cables between DAEs) may have an effect, but those cables carry the data between DAEs. The LCC card might be a better place to look, as it is the component that connects internally to each disk slot (it's like a switch). Re-seating the LCCs might be a better choice before replacing the whole DAE.

Whenever you re-seat a cable or the LCC, you will break the connection on that bus side for any DAE after Bus 0 Enclosure 2 (I see that there is not an enclosure 3, so that should not be an issue). So if you re-seat the cable on the A-Side, you lose connection to the drives from the SPA side, but the SPB side is still active (the LUNs would trespass to the SPB side in this case). You should treat the re-seat like an NDU - perform it during a slow period.
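
To make the impact of a re-seat easier to picture, here is a rough Python sketch of the daisy-chain reasoning above. It is only an illustration, not an EMC tool: the function name and the list of enclosure numbers are made up for the example, and it assumes the simplest model, where pulling the LCC in one enclosure drops that side's path for that enclosure and every enclosure chained after it on the same bus.

```python
def enclosures_losing_path(enclosures, reseat_at):
    """Return the enclosures on one bus side that lose their path while
    the LCC at `reseat_at` is out (hypothetical helper, simplified model).

    Assumption: enclosures are daisy-chained in numeric order, so the
    re-seated enclosure and everything after it are cut off on that side.
    """
    ordered = sorted(enclosures)
    if reseat_at not in ordered:
        raise ValueError(f"enclosure {reseat_at} is not on this bus")
    return [e for e in ordered if e >= reseat_at]


# Bus 0 with enclosures 0, 1 and 2 (there is no enclosure 3 in this array).
bus0 = [0, 1, 2]
affected = enclosures_losing_path(bus0, 2)
print(affected)  # only enclosure 2 itself loses the A-side path
```

With enclosure 2 last in the chain, only its own drives lose the SPA-side path, and their LUNs would trespass to SPB as described above. If there were an enclosure 3 behind it, that one would drop off the same side too.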