
VxRail: How to Determine if a Storage Drive Requires Replacing in a VxRail vSAN Cluster and How to Prepare for Replacement

Summary: This article explains, for general awareness, how VxRail Remote Support determines when a storage drive should be replaced. It covers verification steps, log collection, and special circumstances.

This article is not tied to any specific product. Not all product versions are identified in this article.

Instructions

About VxRail Remote Support's failed drive troubleshooting and hardware replacement process:

Timely resolution is a main focus of VxRail Support's approach when drives may require replacement. In general, sending replacement drives requires only identifying the faulted disk and ensuring that the correct parts are matched to the work order. Usually this can be done quickly, and arrangements to ship any needed replacements are then made promptly.

Before sending parts, ensure that drive replacement is the appropriate solution and that the correct hardware information has been identified, to avoid further issues. Sometimes a drive appears to have failed only because vCenter or the host client shows a disk or disk group in an unexpected state. Even a drive that shows Permanent Device Loss (PDL) may not be faulted, and replacing it may not resolve the issue.

The steps below are intended to be quick while maximizing the chance of identifying the root cause, so that the appropriate resolution is provided and any additional impacts or problems that should be addressed are found. Barring complications, all verification steps are expected to be completed, and a decision made on sending a replacement disk, in less than 30 minutes, unless the checks indicate an issue that requires a different approach.


Steps to determine if hardware replacement should resolve the issue:

  1. Check iDRAC (Dell), the BMC (Phoenix or Quanta), or their hardware logs for drives marked as failed. If a drive used for vSAN storage is marked as failed here (SATADOM, M.2, and other devices unrelated to vSAN storage are excluded), the hardware is generally faulted and must be replaced. 
Note: Occasionally, a drive that shows failure in the hardware sensors and at the vSAN level continues to show the same failure after it is replaced. In these cases, the next point of troubleshooting is the host's backplane.
 
  2. Check vCenter and the host to determine whether a vSAN drive is marked PDL. On a host with a suspected drive failure, run the following commands:
vdq -qH                        (check for an 'IsPDL?' value of '1', which indicates Permanent Device Loss)
esxcli vsan storage list       (if 'In CMMDS' is 'false', communication with that disk is lost)
  3. Determine whether the disk reported as 'failed' is actually the faulted drive: 
  • If a drive's Name is missing (it should start with naa) and only its UUID appears, the cause may be a bad disk partition and/or disk group metadata rather than a hardware fault. See Dell VxRail: VSAN drives 'Not mounted on this host' and 'Ineligible for use by VSAN' cannot be added to VSAN disk groups (Customer Correctable) for more about these "ghost devices" or "phantom drives" and how to address them. If the steps in that article do not resolve the situation, a factory reset of the node may be the best way forward.
  • For clusters with Deduplication and Compression enabled, a failure may be reported on a cache drive when the fault is actually on a capacity drive in the same disk group.
  • When disk groups are formatted for deduplication ('Deduplication and Compression' enabled in the cluster), a failed capacity drive may be removed from vSAN, but the remaining drives in its former disk group cannot function in vSAN without it. You may see failures or alerts about the other drives' vSAN operational state; this does not mean those drives have hardware faults. With deduplication, the entire disk group must be re-created. See the process in Guide to manually replace disks using vCenter on Dell VxRail clusters.
Screenshot of absent vSAN drives
In this image, the bottom drive is a "ghost device" where a failed capacity drive used to be. The top drive shows failures in the 'Operational…' columns but does not have a hardware fault. Removing the disk group from vSAN should make the physically healthy disks appear again. You can re-create the disk group later, after replacing any failed drives.
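When output has been saved from several hosts, the step 2 checks can be scripted. The following Python sketch is not a Dell or VMware tool. It assumes the text output of `vdq -qH` and `esxcli vsan storage list` has already been captured, that each disk record begins with a 'Name:' line (vdq) or a 'Device:' line (esxcli), and that the fields are named 'IsPDL?', 'In CMMDS', 'VSAN Disk Group UUID', and 'Deduplication'. All function names and the sample output are hypothetical and abbreviated. A flagged disk is only a candidate for replacement: the hardware log check in step 1 and the ghost-device and deduplication caveats in step 3 still apply.

```python
from typing import Dict, List

# Hypothetical, abbreviated sample output (real output has more fields).
SAMPLE_VDQ = """DiskResults:
   DiskResult[0]:
      Name: naa.5000c500a1b2c3d4
      State: In-use for VSAN
      IsSSD?: 0
      IsPDL?: 1
   DiskResult[1]:
      Name: naa.5000c500a1b2c3d5
      State: In-use for VSAN
      IsSSD?: 0
      IsPDL?: 0
"""

SAMPLE_ESXCLI = """naa.5000c500a1b2c3d4:
   Device: naa.5000c500a1b2c3d4
   Is SSD: false
   VSAN Disk Group UUID: 52aa
   In CMMDS: false
   Deduplication: true
naa.5000c500a1b2c3d5:
   Device: naa.5000c500a1b2c3d5
   Is SSD: false
   VSAN Disk Group UUID: 52aa
   In CMMDS: true
   Deduplication: true
"""


def parse_records(text: str, record_key: str) -> List[Dict[str, str]]:
    """Split 'Key: value' output into one dict per disk.

    A new record starts each time `record_key` appears (assumed to be the
    first field of every disk entry). Header lines such as 'DiskResult[0]:'
    or 'naa.xxx:' contain no 'Key: value' pair and are skipped.
    """
    records: List[Dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # header or blank line
        if key == record_key:
            records.append({})
        if records:
            records[-1][key] = value.strip()
    return records


def pdl_disks(vdq_output: str) -> List[str]:
    """Names of disks whose 'IsPDL?' field is '1' in `vdq -qH` output."""
    return [d.get("Name", "?") for d in parse_records(vdq_output, "Name")
            if d.get("IsPDL?") == "1"]


def disks_not_in_cmmds(esxcli_output: str) -> List[Dict[str, object]]:
    """Disks whose 'In CMMDS' field is false in `esxcli vsan storage list`."""
    flagged: List[Dict[str, object]] = []
    for d in parse_records(esxcli_output, "Device"):
        if d.get("In CMMDS", "").lower() == "false":
            flagged.append({
                "device": d.get("Device", "?"),
                "disk_group": d.get("VSAN Disk Group UUID", "?"),
                # With deduplication, the whole disk group is affected.
                "dedup": d.get("Deduplication", "false").lower() == "true",
            })
    return flagged


if __name__ == "__main__":
    for name in pdl_disks(SAMPLE_VDQ):
        print(f"PDL reported: {name}")
    for d in disks_not_in_cmmds(SAMPLE_ESXCLI):
        note = " (dedup enabled: whole disk group affected)" if d["dedup"] else ""
        print(f"Not in CMMDS: {d['device']} in group {d['disk_group']}{note}")
```

Keying each record on its first field, rather than on the header line, lets the same parser handle both commands. If the field names differ on a given ESXi release, adjust the keys accordingly.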


Steps AFTER determining that drive replacement should resolve the issue:  

  1. The VxRail Support Engineer (TSE) checks for and notes any additional issues that might impact drive replacement or that must be addressed afterwards.
  2. Collect and upload logs to the Service Request (SR) if this has not been done already. The expected logs are TSR logs (for Dell hardware environments), the VxRail Manager log bundle, and vCenter and host logs.
  3. The TSE verifies that the user's contact information and shipping details are correct.
  4. The user chooses whether to replace the drives themselves or to have a field resource sent on site to do it.
  5. The TSE updates the case notes with details about the situation, the upcoming disk replacement, and any special considerations. 
  6. The TSE provides the node replacement documentation (users can also download it from SolVe). If 'Deduplication and Compression' is enabled on the cluster, the TSE should attach the article Guide to manually replace disks using vCenter on Dell VxRail to the SR, ensure that it is provided to the user, and include it in the 'CE Instructions' if a field resource is sent.
  7. The VxRail Support Engineer creates a work order for parts and labor (if needed). The SR ownership moves from VxRail 'Remote Support' to 'Field Support'. If the original SR was not for a failed drive, the TSE should create a new SR for the work order and retain ownership of the original one.
  8. Dell's scheduling team reaches out to the user contact to finalize shipping and work order details.
  9. VxRail Remote Support may be re-engaged as needed by the user and/or partner, or by on-site field resources, through the original SR or a new one.

Affected Products

VxRail 460 and 470 Nodes, VxRail Appliance Family, VxRail Appliance Series, VxRail D560F, VxRail E Series Nodes, VxRail E460

Products

VxRail Appliance Family, VxRail G Series Nodes, VxRail D560, VxRail E460, VxRail E560, VxRail E560F, VxRail E560N, VxRail E660, VxRail E660F, VxRail E660N, VxRail E665, VxRail E665F, VxRail E665N, VxRail G560, VxRail G560F, VxRail P Series Nodes, VxRail P470, VxRail P570, VxRail P570F, VxRail P580N, VxRail P670F, VxRail P670N, VxRail P675F, VxRail P675N, VxRail S Series Nodes, VxRail S470, VxRail S570, VxRail S670, VxRail Software, VxRail V Series Nodes, VxRail V470, VxRail V570, VxRail V570F, VxRail V670F, VxRail VD-4510C, VxRail VD-4520C, VxRail VE-660, VxRail VE-6615, VxRail VP-760, VxRail VP-7625, VxRail VS-760 ...
Article Properties
Article Number: 000158405
Article Type: How To
Last Modified: 16 Apr 2025
Version:  10