Not applicable

Multiple disk failures since deployment in 02/2017

I am curious to know whether any other VxRail environments out there have had multiple disk failures since they were deployed. We deployed our 4-node VxRail S470s back in February of this year. We have had disk failures three times so far: in April, in May, and now earlier this week. We also had a disk fail the Component metadata health check in June. The failed disks were replaced in April and May, and I assume that once I hear back from Dell/EMC/VMware the same will happen this time. To correct the Component metadata health check failure, the disk was removed from the Disk Group and added back.

Has anyone else out there experienced this many disk issues in their VxRail environment?

Thanks

4 Replies
4 Beryllium

Re: Multiple disk failures since deployment in 02/2017

The frequency of disk failure you observed seems to be normal for me.

Of course, it depends on the total number of disks and other environment-specific factors.

The component metadata issue you experienced is very common in vSAN environments.

See EMC KB#502281.

This is not an actual disk issue, so removing the disk from the disk group and adding it back is the right way to fix it.

Not applicable

Re: Multiple disk failures since deployment in 02/2017

Where are you seeing the disk failures? We get disk error alerts in VxRail Manager multiple times a week that are false positives. This is a known issue with the current iDRAC version on the Dell/EMC nodes that will apparently be resolved in the next iDRAC version.

Cheers,

Brad

4 Beryllium

Re: Multiple disk failures since deployment in 02/2017

Do you mean this KB?

VxRail: MYSTIC020002 Codes Get Generated Often When no Issue is Present.

Article Number 000503744

Not applicable

Re: Multiple disk failures since deployment in 02/2017

This cracks me up: "The frequency of disk failure you observed seems to be normal for me." We have had SANs in place that are not Dell/EMC, and they go for years without a disk failing. This environment has been in production for only a little more than 6 months.

We worked with VxRail Dell/EMC and VMware all weekend to recover the environment, but all efforts failed. Long story short, we are in recovery mode right now. We have so many disks showing as failed (including a Flash/Cache disk) and so many disk groups unmounted that we are working through recovery using a separate environment.

My advice is that if you have had disk failures more than once, or hosts becoming unresponsive, you should reach out to VxRail Dell/EMC. I would hate for what has happened to us to happen to others. Good luck!
