After checking the disks with the command "vdq -qH" or "vdq -Hi", you find one or more drives that show "Reason: Not mounted on this host." These drives, and some others, show "State: Ineligible for use by VSAN." Among those that are ineligible, there may be BOSS cards, SATADOMs, RecoverPoint disks, and other device types that are expected to show that state. Within vCenter, you may see more drives than expected, though the extra ones are not available to add to disk groups.
New drives may not be visible after being added, or after being replaced following a drive failure. The "Name:" field of a drive should normally read "naa.<numbers&letters>"; drives that have nothing after "Name:" are commonly called "ghost disks" or "phantom drives." They occupy a place where a now-lost device used to be. Ghost disks can cause various issues, such as long boot times, failing validations, inability to "ensure accessibility," and, in some situations, host crashes.
Ghost drives can often be removed through vCenter or from the command line. Sometimes the removal fails, however, and in those cases you usually have to fix the "Not mounted…" drive first.
If vSAN detects a failure to write to a drive, it may remove that drive from the vSAN even if the hardware sensors have not yet seen a condition that would cause them to mark the disk as faulted. If that disk is a cache drive, or if deduplication and compression are in use, vSAN has to take the entire disk group offline. While this can lead to the conditions described above, it is not the underlying cause. The cause is corrupted metadata, or disks that still carry partitions from their former configuration, so they are not recovered and ready to be added back to the vSAN. This can also occur when something is inadvertently written over the disk metadata. The data is intact but no longer accessible, and vSAN has to restore storage policy compliance with a resync.
A drive with this type of partition may believe it is part of a disk group and show a cache drive where there should not be one. This cache drive lacks normal information such as the capacity or name (the naa identifier is missing). You cannot remove it, however, because the host thinks there is a drive there that is not mounted. Nor can you correct this by rescanning the storage controllers (this can cause a host crash) or by rebooting the host.
Fix: Any "Not mounted…" drives must have their partitions removed or hidden, and any ghost disks must be removed from the environment. Masking the partitions should still allow the drives to show up as "Eligible for use by VSAN" again, and adding them to a disk group should wipe anything that was on them during the process. After fixing the partitions and removing any ghost disks, you may need to reboot the host. Once everything is showing up properly on the host, you can create a disk group as normal in vCenter's Cluster > Configure > Disk Management area.
Steps:
Place the host into Maintenance Mode (Ensure Accessibility). This protects data on the host from any mistakes or unexpected issues. Ensure that the rest of the vSAN is healthy as well. If a vSAN resync is in progress, it must complete before any disks or disk groups with data on them can be removed from the vSAN.
Broadcom has introduced a simple feature, "Erase ESXi Storage Devices," in versions 7.x and above.
Erase ESXi Storage Devices (vmware.com)
If the above does not work, use the manual steps below:
Run the command below on the host (for example, in PuTTY) and copy the output to a document. PuTTY is not required, but being able to copy and paste is helpful.
vdq -qH
Identify drives that are "Ineligible for use by VSAN" AND either show "Reason: Not mounted on this host" or have nothing in the Name field (no naa).
Correct drives showing "Not mounted…" first:
partedUtil mklabel /dev/disks/<naa.#'s> gpt
vdq -qH
*If the drive still does not show "Eligible for use by VSAN," reboot the host and then repeat the previous step. Attempt to remove ghost disks before rebooting, to avoid a long boot process while the host initializes disks and vSAN services attempt to start.
Remove ghost disks. You can usually do this in the same Disk Management area in vCenter. If not, use the command line on the host.
esxcli vsan storage remove -u <UUID>
Use the UUID of each disk without an naa name, taken from the "vdq -qH" output you saved earlier.
Check that everything looks as it should. Refresh vCenter and check Disk Management again, and run "vdq -qH" on the host to confirm that all expected drives appear and now show "Eligible for use by VSAN." If not, reboot the host, as some drives may not have been initialized yet, and check again.
Create the disk group or add disks to existing disk groups as normal (if using deduplication and/or compression, full disk group recreation is needed).
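
If there are many disks to sort through, the identification rule from the manual steps can be sketched as a small Python script run on a workstation against the saved "vdq -qH" output. This is only an illustration, not a supported tool. The sample text, device names, and UUIDs below are invented, and the layout of the output (a "DiskResult[n]:" header followed by "Key: value" lines, including a "VSANUUID" field) is an assumption that should be checked against your own output. The script only prints the commands from the steps above for review; it does not run anything on the host.

```python
import re

# Invented sample; the real "vdq -qH" layout and field names may differ.
SAMPLE = """\
DiskResults:
   DiskResult[0]:
      Name: naa.5000000000000001
      VSANUUID: 52000000-0000-0000-0000-000000000001
      State: Eligible for use by VSAN
      Reason: None
   DiskResult[1]:
      Name: naa.5000000000000002
      VSANUUID: 52000000-0000-0000-0000-000000000002
      State: Ineligible for use by VSAN
      Reason: Not mounted on this host
   DiskResult[2]:
      Name:
      VSANUUID: 52000000-0000-0000-0000-000000000003
      State: Ineligible for use by VSAN
      Reason: Not mounted on this host
   DiskResult[3]:
      Name: t10.EXAMPLE_BOOT_DEVICE
      VSANUUID:
      State: Ineligible for use by VSAN
      Reason: Has partitions
"""


def parse_vdq(text):
    """Split the saved output into one dict of fields per disk."""
    disks = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if re.match(r"DiskResult\[\d+\]:$", stripped):
            current = {}
            disks.append(current)
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return disks


def classify(disks):
    """Return (not_mounted, ghosts) following the rule in the steps.

    Ghost: ineligible and nothing in the Name field.
    Not mounted: ineligible, has a name, Reason starts with "Not mounted".
    Other ineligible devices (boot devices and so on) are left alone.
    """
    not_mounted, ghosts = [], []
    for disk in disks:
        if not disk.get("State", "").startswith("Ineligible"):
            continue
        if not disk.get("Name"):
            ghosts.append(disk)
        elif disk.get("Reason", "").startswith("Not mounted"):
            not_mounted.append(disk)
    return not_mounted, ghosts


def suggested_commands(not_mounted, ghosts):
    """Build the commands from the steps as text, for manual review only."""
    cmds = ["partedUtil mklabel /dev/disks/{} gpt".format(d["Name"])
            for d in not_mounted]
    cmds += ["esxcli vsan storage remove -u {}".format(d["VSANUUID"])
             for d in ghosts if d.get("VSANUUID")]
    return cmds


if __name__ == "__main__":
    not_mounted, ghosts = classify(parse_vdq(SAMPLE))
    for cmd in suggested_commands(not_mounted, ghosts):
        print(cmd)
```

To use it on real data, replace SAMPLE with the text saved from the host and review each printed command before running it. The "Not mounted…" drives are listed first, matching the order of the steps.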