Dell EMC VxRail: Virtual Machine (VM) is inaccessible during vSAN cluster node expansion test

Summary: During a vSAN cluster node expansion task, a production VM became inaccessible for several hours, resulting in data unavailability (DU).


Symptoms


During the node expansion test, three nodes were taken out of maintenance mode after the VxRail node expansion task, and nonproduction VMs were migrated to these nodes.
At the same time, DRS (set to Fully Automated) began to move workloads.
The migrated VMs also had a network configuration issue.
To resolve it, the nonproduction VMs were migrated back to the existing nodes.
During that time, the customer observed VMs becoming inaccessible, which caused data unavailability (DU).

2021-08-23T16:46:03.444+08:00 INFO vsan-mgmt[08861] [VsanHealthSummaryLogUtil::PrintHealthResult opID=noOpId] Cluster VxRail-Virtual-SAN-Cluster  Overall Health : red
   Group data health : red
      Test objecthealth health : red
         Overview: Health/Objects  ObjectCount
                   (Healthy, 413), (Datamove, 13), (Reduced-Availability-With-No-Rebuild-Delay-Timer, 84),

2021-08-23T16:50:23.911+08:00 INFO vsan-mgmt[08861] [VsanHealthSummaryLogUtil::PrintHealthResult opID=noOpId] Cluster VxRail-Virtual-SAN-Cluster  Overall Health : red
   Group data health : red
      Test objecthealth health : red
         Overview: Health/Objects  ObjectCount
                   (Healthy, 364), (Datamove, 4), (Reduced-Availability-With-Active-Rebuild, 1), (Reduced-Availability-With-No-Rebuild-Delay-Timer, 131), (Inaccessible, 11),

2021-08-23T16:53:41.081+08:00 INFO vsan-mgmt[08861] [VsanHealthSummaryLogUtil::PrintHealthResult opID=noOpId] Cluster VxRail-Virtual-SAN-Cluster  Overall Health : red
   Group data health : red
      Test objecthealth health : red
         Overview: Health/Objects  ObjectCount
                   (Healthy, 318), (Datamove, 2), (Reduced-Availability-With-Active-Rebuild, 3), (Reduced-Availability-With-No-Rebuild-Delay-Timer, 158), (Inaccessible, 29),
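To follow how object health changed between the three summaries above, the ObjectCount lines can be tallied. The Python sketch below assumes only the "(State, Count)" format visible in these excerpts; it is not a Dell or VMware tool, and PAIR_RE, parse_object_counts, and summarize are hypothetical names.

```python
import re

# One "(State, Count)" pair from an ObjectCount line, e.g. "(Inaccessible, 11)".
PAIR_RE = re.compile(r"\(([A-Za-z-]+),\s*(\d+)\)")


def parse_object_counts(line: str) -> dict:
    """Return {health_state: object_count} for one ObjectCount line."""
    return {state: int(count) for state, count in PAIR_RE.findall(line)}


def summarize(line: str) -> str:
    """Report how many of the listed objects are in the Inaccessible state."""
    counts = parse_object_counts(line)
    total = sum(counts.values())
    return f"{counts.get('Inaccessible', 0)} of {total} objects inaccessible"


# ObjectCount lines copied from the 16:46, 16:50, and 16:53 health summaries.
samples = [
    "(Healthy, 413), (Datamove, 13), (Reduced-Availability-With-No-Rebuild-Delay-Timer, 84),",
    "(Healthy, 364), (Datamove, 4), (Reduced-Availability-With-Active-Rebuild, 1), "
    "(Reduced-Availability-With-No-Rebuild-Delay-Timer, 131), (Inaccessible, 11),",
    "(Healthy, 318), (Datamove, 2), (Reduced-Availability-With-Active-Rebuild, 3), "
    "(Reduced-Availability-With-No-Rebuild-Delay-Timer, 158), (Inaccessible, 29),",
]

for sample in samples:
    print(summarize(sample))
```

For these three samples the sketch prints "0 of 510 objects inaccessible", "11 of 511 objects inaccessible", and "29 of 510 objects inaccessible", showing the inaccessible count rising within about seven minutes.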


The test nodes were placed into maintenance mode with the "No data migration" option:
2021-08-23T07:14:28.848Z: [UserLevelCorrelator] 12121590652us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.
2021-08-23T08:52:28.717Z: [UserLevelCorrelator] 18001459300us: [esx.audit.maintenancemode.entering] The host has begun entering maintenance mode.
2021-08-23T08:52:30.346Z: [UserLevelCorrelator] 18003088181us: [esx.audit.maintenancemode.entered] The host has entered maintenance mode.
2021-08-23T11:48:19Z bootstop: Host is rebooting
2021-08-23T11:52:22.478Z: [UserLevelCorrelator] 28795220410us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.
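To measure how long a host stayed in maintenance mode, the esx.audit.maintenancemode events above can be paired up. This is a minimal Python sketch, assuming the line format shown in this excerpt; EVENT_RE and maintenance_windows are hypothetical names, and lines that are not maintenance-mode audit events (such as the bootstop line) are ignored.

```python
import re
from datetime import datetime

# Timestamp and event suffix of an esx.audit.maintenancemode.* line, e.g.
# "2021-08-23T08:52:30.346Z: [...] [esx.audit.maintenancemode.entered] ...".
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z: .*"
    r"\[esx\.audit\.maintenancemode\.(\w+)\]"
)


def maintenance_windows(lines):
    """Pair each 'entered' event with the next 'exited' event.

    Returns a list of (entered_at, exited_at, duration) tuples (UTC, naive).
    """
    windows = []
    entered_at = None
    for line in lines:
        match = EVENT_RE.match(line)
        if match is None:
            continue  # not a maintenance-mode audit event (e.g. bootstop)
        # The trailing "Z" marks UTC; %f accepts the 3-digit milliseconds.
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        event = match.group(2)
        if event == "entered":
            entered_at = ts
        elif event == "exited" and entered_at is not None:
            windows.append((entered_at, ts, ts - entered_at))
            entered_at = None
        # "entering" events and an "exited" with no prior "entered" are skipped.
    return windows


events = [
    "2021-08-23T07:14:28.848Z: [UserLevelCorrelator] 12121590652us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.",
    "2021-08-23T08:52:28.717Z: [UserLevelCorrelator] 18001459300us: [esx.audit.maintenancemode.entering] The host has begun entering maintenance mode.",
    "2021-08-23T08:52:30.346Z: [UserLevelCorrelator] 18003088181us: [esx.audit.maintenancemode.entered] The host has entered maintenance mode.",
    "2021-08-23T11:48:19Z bootstop: Host is rebooting",
    "2021-08-23T11:52:22.478Z: [UserLevelCorrelator] 28795220410us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.",
]

for start, end, duration in maintenance_windows(events):
    print(f"{start:%H:%M:%S}Z -> {end:%H:%M:%S}Z ({duration})")
```

For this host the sketch reports one window of 2:59:52.132000. These events are in UTC, while the vSAN health summaries are logged at +08:00, so 08:52 UTC corresponds to 16:52 local time.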
The corresponding clomd entry reports decom mode 0:
2021-08-23T08:52:18.681Z info clomd[2167677] [Originator@6876] CLOMWhatIfEntityDecom: Starting decom on entity 611fefdd-3160-e572-eb47-78ac444cf5b0, mode 0, ensureDurability 0 wipeDisk 0, entity type is CdbObjectNode, use static dedupRatio 1.000000, what-if reason 0, dedupScope 0, encryption 0
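When the same check has to be repeated on each test node, the numeric fields of the CLOMWhatIfEntityDecom entry can be pulled out programmatically. The Python sketch below only extracts values; it does not decode what each mode number means, the field names are taken from this single log line, and parse_decom is a hypothetical helper.

```python
import re

DECOM_LINE = (
    "2021-08-23T08:52:18.681Z info clomd[2167677] [Originator@6876] "
    "CLOMWhatIfEntityDecom: Starting decom on entity 611fefdd-3160-e572-eb47-78ac444cf5b0, "
    "mode 0, ensureDurability 0 wipeDisk 0, entity type is CdbObjectNode, "
    "use static dedupRatio 1.000000, what-if reason 0, dedupScope 0, encryption 0"
)


def parse_decom(line):
    """Return the entity UUID plus the integer mode/ensureDurability/wipeDisk fields."""
    entity = re.search(r"decom on entity ([0-9a-f-]+)", line)
    fields = {
        key: int(value)
        for key, value in re.findall(r"\b(mode|ensureDurability|wipeDisk) (\d+)", line)
    }
    return {"entity": entity.group(1) if entity else None, **fields}


print(parse_decom(DECOM_LINE))
```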



The inaccessible production VM does not come back until all of the newly added nodes are taken out of maintenance mode.
2021-08-23T19:52:45.831+08:00 INFO vsan-mgmt[08861] [VsanHealthSummaryLogUtil::PrintHealthResult opID=noOpId] Cluster VxRail-Virtual-SAN-Cluster  Overall Health : red
   Group data health : red
      Test objecthealth health : red
         Overview: Health/Objects  ObjectCount
                   (Healthy, 474), (Datamove, 10), (Reduced-Availability-With-Active-Rebuild, 14), (Reduced-Availability-With-No-Rebuild, 3), (Inaccessible, 11),

Cause

This occurred because the nodes were placed into maintenance mode with the "No data migration" option.

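During the test, the production VMs were migrated to the new nodes under a RAID 5 storage policy. When those nodes were placed into maintenance mode with "No data migration" to adjust the network configuration of the VMs, the components they held were not moved elsewhere, so the existing cluster nodes lost access to the production VM data. A vSAN RAID 5 (erasure coding) object is stored as three data components and one parity component on four different hosts and tolerates the loss of only one component; any object with two or more components on the nodes in maintenance mode therefore became inaccessible.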
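The failure condition can be illustrated with a simplified placement model. This Python sketch is an assumption-laden illustration, not vSAN's actual placement or quorum logic: it ignores component striping, rebuilds, and vote details, and treats an object as inaccessible once more than one of its four RAID 5 components sits on a host in maintenance mode with "No data migration". The host and object names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Simplified model: a vSAN RAID 5 object is 3 data + 1 parity components on
# 4 different hosts and stays readable with at most one component missing.
RAID5_TOLERATED_MISSING = 1


@dataclass(frozen=True)
class Raid5Object:
    name: str
    hosts: Tuple[str, str, str, str]  # host holding each of the 4 components


def inaccessible_objects(objects: Iterable[Raid5Object],
                         no_migration_hosts: Iterable[str]) -> List[str]:
    """Names of objects missing more components than RAID 5 tolerates."""
    down = set(no_migration_hosts)
    result = []
    for obj in objects:
        missing = sum(1 for host in obj.hosts if host in down)
        if missing > RAID5_TOLERATED_MISSING:
            result.append(obj.name)
    return result


# Hypothetical placement: new-01..new-03 are added nodes, old-01..old-03 existing.
objects = [
    Raid5Object("prod-vm-disk", ("new-01", "new-02", "old-01", "old-02")),
    Raid5Object("prod-vm-home", ("new-03", "old-01", "old-02", "old-03")),
]
print(inaccessible_objects(objects, ["new-01", "new-02", "new-03"]))
```

With all three new nodes in maintenance mode the sketch prints ['prod-vm-disk']: that object has two components on those nodes, while prod-vm-home loses only one and remains accessible with reduced availability.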

Resolution

When using RAID 5 protection, select "Ensure accessibility" when placing a node into maintenance mode.
Before doing so, confirm that no vSAN resynchronization activity is in progress and that DRS is not running.
If using vSphere 7.x, check the Data and vSAN Object Health results in Skyline Health. Do not proceed with any activity if errors are reported.
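The checklist above can be captured as a simple go/no-go gate. This Python sketch is hypothetical: PreMaintenanceState, maintenance_precheck, and RECOMMENDED_VSAN_OPTION are illustrative names, the inputs are assumed to be collected manually from the vSphere Client and vSAN health views, and it does not call any VMware API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Option this article recommends for RAID 5 objects (vSphere Client wording).
RECOMMENDED_VSAN_OPTION = "Ensure accessibility"


@dataclass
class PreMaintenanceState:
    # Hypothetical inputs, gathered by hand from the vSphere Client / vSAN health.
    resync_objects_remaining: int = 0
    drs_migrations_in_progress: int = 0
    health_errors: List[str] = field(default_factory=list)


def maintenance_precheck(state: PreMaintenanceState) -> Tuple[bool, List[str]]:
    """Return (ok_to_proceed, blocking_reasons) for the checklist above."""
    reasons = []
    if state.resync_objects_remaining > 0:
        reasons.append(f"{state.resync_objects_remaining} objects still resynchronizing")
    if state.drs_migrations_in_progress > 0:
        reasons.append("DRS migrations in progress")
    reasons.extend(f"health error: {err}" for err in state.health_errors)
    return (not reasons, reasons)


ok, reasons = maintenance_precheck(PreMaintenanceState(resync_objects_remaining=5))
print(ok, reasons)
if ok:
    print(f'Enter maintenance mode with "{RECOMMENDED_VSAN_OPTION}"')
```

Any single blocking reason stops the operation, which mirrors the instruction not to proceed while resynchronization, DRS activity, or health errors are present.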

Affected Products

VMware vSAN
Article Properties
Article Number: 000191096
Article Type: Solution
Last Modified: 16 Feb 2026
Version: 3