
What might cause multiple failures of drives in the same slot?

I wonder if anyone has seen anything similar before. I have a VNX5500 to which I added an extra enclosure and drives back in June. On 29th August, one of the drives in the new enclosure failed (before this, numerous Soft SCSI Bus Errors had been logged for the drive concerned):

Brief Description: CLARiiON call-home event number 0x712789a0
Host: SPA
Storage Array: CKMxxxxxx
SP: N/A
SoftwareRev: 7.33.2 (0.51)
BaseRev:
Description: Drive(Bus 0 Encl 2 Slot 5) taken offline. SN:Z1xxxxxx . TLA:005050144PWR. Reason:Drive Handler(0x00c3)

A replacement drive was shipped, but when it was inserted on 2nd September it wasn't recognised (it appeared to fail before the replacement wizard completed). A second replacement was shipped and failed about 6 hours after installation. A third replacement was shipped and lasted around 9 days, until 18th September, when it failed too. On that occasion we noticed that several LUNs became intermittently unavailable for a couple of minutes when the drive "failed", causing some user disruption.

Eventually, as part of some information gathering, a colleague inadvertently pulled out and reseated the "failed" drive, at which point (the 29th October) it came back online. It then worked until a couple of days ago, when it went offline again with similar LUN issues, though these were less disruptive as they happened in the middle of the night. I think another drive is on its way to us as I write!

Interestingly, when it failed, one of the SPs logged a set of Unit Shutdown errors for the *adjacent* drive... whether those are the cause of the LUN disruption, I don't know.

I'm assuming this is not a common situation, but I've made little headway in getting it investigated, so I wonder if the wider community has any ideas?


