VxRail: Troubleshooting Guide for VMware ESXi hosts in a "Not Responding" state
Summary: This article helps resolve issues with virtual machines (VMs) residing on unresponsive ESXi hosts and return those ESXi hosts to a stable state.
Symptoms
- The ESXi host is reported as Not Responding in the vSphere Web Client.
- The ESXi host cannot be managed from the vSphere Web Client.
- The Not Responding ESXi host and the VMs on it still respond to ping.
- The VMs running on the ESXi host can still be accessed through SSH or Remote Desktop Protocol (RDP).
Cause
One of the management services on the host may have failed or stopped responding.
Root cause analysis is needed to determine why the management services failed or stopped responding. Before bringing the cluster and the ESXi host back to a stable state, collect the vCenter Server and ESXi host logs as described in the VMware article Collecting diagnostic information for VMware ESXi.
Sometimes log collection is impossible because the node does not respond to the log collection commands. In that case, review the information in the Resolution section of this article. If necessary, Dell Support can analyze the logs to determine the root cause and to check whether the known issues listed below are related.
Resolution
Restarting hostd or vpxa on the host may restore manageability of the host in the vSphere Client. This is done from an SSH session to the ESXi host.
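As a rough sketch of this step, the commands below restart the two agents from an SSH session on the affected host. The init script paths are the ones documented in VMware article 1003490. The restart_agent wrapper and the DRY_RUN variable are hypothetical conveniences added here so that each command can be previewed before it is run; they are not part of ESXi.

```shell
#!/bin/sh
# Sketch only: restart a single ESXi management agent by name.
# restart_agent and DRY_RUN are illustrative names, not ESXi features.
# DRY_RUN defaults to 1 (print the command only); set DRY_RUN=0 to run it.

restart_agent() {
    # Accept only the two agents discussed in this article.
    case "$1" in
        hostd|vpxa) ;;
        *) echo "unsupported agent: $1" >&2; return 1 ;;
    esac
    cmd="/etc/init.d/$1 restart"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"      # preview only
    else
        $cmd             # run the init script on the host
    fi
}

restart_agent hostd
restart_agent vpxa
```

With the default DRY_RUN=1, the script only prints the two restart commands. Setting DRY_RUN=0 on the host runs them in order.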
Another response to a Not Responding ESXi host is to restart all of the management agents on the host with the services.sh restart command. Restarting the host's management agents may affect running tasks, including the guest VMs on the host.
The management agents are restarted directly from a CLI or SSH session to the ESXi host (if SSH was enabled before the issue occurred).
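The full restart of the management agents can be sketched as follows. Because the restart may affect running tasks and guest VMs, this example only runs services.sh restart when an explicit confirmation is given. The restart_all_agents function and the CONFIRM variable are hypothetical names introduced for this illustration.

```shell
#!/bin/sh
# Sketch only: restart every management agent with services.sh.
# restart_all_agents and CONFIRM are illustrative names, not ESXi features.

restart_all_agents() {
    # Without CONFIRM=yes, only report what would be run.
    if [ "${CONFIRM:-no}" != "yes" ]; then
        echo "Would run: services.sh restart (set CONFIRM=yes to proceed)"
        return 0
    fi
    services.sh restart
}

restart_all_agents
```

Run as shown, the script prints the reminder and changes nothing. Setting CONFIRM=yes before calling restart_all_agents on the host performs the restart.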
However, if SSH is not enabled, connecting through the BMC-iDRAC port gives access to the ESXi DCUI screen, where the management services can be restarted. See VMware article 1003490, Restarting the Management agents in ESXi.
The DCUI may also become unresponsive. In that case, manually shutting down the VMs using SSH or RDP is the only other option for bringing the environment back to a stable state. Once the VMs are shut down, use the BMC-iDRAC Power Control to power cycle (reboot) the ESXi host and return it to a stable state.
To reduce VM downtime, register the VMs on stable hosts immediately after shutting them down, before rebooting the Not Responding ESXi host. The affected ESXi host can then be rebooted. Follow the VMware article How to register or add a Virtual Machine (VM) to the vSphere Inventory in vCenter Server.
If you are unable to power off the VMs using SSH or RDP, terminate the VM through the ESXi host SSH session. See the VMware article Unable to Power off a Virtual Machine in an ESXi host.
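As a minimal sketch of terminating a VM from the host SSH session, the helper below looks up a VM's World ID in the output of esxcli vm process list so that it can be passed to esxcli vm process kill, the approach described in VMware article 1014165. The world_id_for function is a hypothetical helper, and it assumes that each VM entry in the listing has a "World ID:" line followed later by a "Display Name:" line. Verify that layout on the host before relying on it.

```shell
#!/bin/sh
# Sketch only: find the World ID of a VM by its display name.
# world_id_for is an illustrative helper, not an ESXi command.
# It reads `esxcli vm process list` output on stdin and assumes each
# entry lists "World ID:" before "Display Name:".

world_id_for() {
    awk -v name="$1" '
        /^ *World ID:/     { wid = $3 }
        /^ *Display Name:/ {
            sub(/^ *Display Name: /, "")
            if ($0 == name) print wid
        }
    '
}

# Intended use on the ESXi host (commented out so nothing is killed here;
# "my-vm" is a placeholder display name):
#   wid=$(esxcli vm process list | world_id_for "my-vm")
#   esxcli vm process kill --type=soft --world-id="$wid"
```

The kill command accepts the types soft, hard, and force. Starting with soft and escalating only if the VM process remains is the gentlest order.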
Virtual machine issues:
- Unable to power off the VMs in an ESXi host - 1014165
- Powering off an unresponsive VM on an ESXi host - 1004340
- VMs appear as invalid or orphaned in vCenter Server - 1003742
- VMs appear to be running or registered on multiple ESX/ESXi Servers - 319918
- Powering on a VM from the command line when the host cannot be managed using vSphere Client - 1038043
- Troubleshooting a VM that has stopped responding - 1007819