VxRail: Redirecting system logs to a vSAN object causes an ESXi host lock up
Summary: The host goes into a non-responsive state and cannot run any CLI commands.
Symptoms
When system logs or the ESXi host scratch partition are stored on the vSAN datastore, the ESXi host can become unresponsive to both vCenter Server and the host CLI.
SSH to the affected node and run ls -ltrh to list the file partition details.
To verify whether the host is configured with the default scratch partition, run the following command and check the Local Log Output field:
localcli system syslog config get
Example:
SyslogConfiguration:
Default Network Retry Timeout: 180
Dropped Log File Rotation Size: 100
Dropped Log File Rotations: 10
Enforce SSL Certificates: false
Local Log Output: /scratch/log
Local Log Output Is Configured: false
Local Log Output Is Persistent: true
Local Logging Default Rotation Size: 1024
Local Logging Default Rotations: 8
Log To Unique Subdirectory: false
Message Queue Drop Mark: 90
Remote Host: udp://192.168.10.XXX:514
Cause
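The Local Log Output check from the Symptoms section can be scripted. The following is a minimal sketch, not a supported tool: the function names get_logdir and is_vsan_path are hypothetical, and the test assumes that vSAN datastores appear under /vmfs/volumes/vsan:<uuid> and that readlink -f is available in the ESXi shell to resolve the /scratch symlink.

```shell
#!/bin/sh
# Read `localcli system syslog config get` output on stdin and print
# only the value of the "Local Log Output" field.
get_logdir() {
    sed -n 's/^[[:space:]]*Local Log Output:[[:space:]]*//p'
}

# Return 0 when the given path looks like it lives on a vSAN datastore.
# Assumption: vSAN datastores are mounted as /vmfs/volumes/vsan:<uuid>.
is_vsan_path() {
    case "$1" in
        /vmfs/volumes/vsan:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the check only where localcli exists (that is, on an ESXi host).
if command -v localcli >/dev/null 2>&1; then
    logdir=$(localcli system syslog config get | get_logdir)
    # /scratch is normally a symlink, so resolve it before testing.
    resolved=$(readlink -f "$logdir" 2>/dev/null || echo "$logdir")
    if is_vsan_path "$resolved"; then
        echo "WARNING: log directory $resolved is on a vSAN datastore"
    else
        echo "OK: log directory $resolved is not on a vSAN datastore"
    fi
fi
```

A WARNING line suggests the host is exposed to the lock-up described in this article. An OK line only means the resolved path did not match the assumed vSAN prefix.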
Resolution
If this issue occurs, reboot the ESXi host to recover manageability of the host.
After the reboot, change the log location back to the default, /scratch/log, by running the following command:
esxcli system syslog config set --logdir=/scratch/log
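The recovery step above can be wrapped in a small script that also confirms the change. This is a sketch under assumptions: the helper names run and restore_default_logdir are hypothetical, and the esxcli system syslog reload step is an addition that the original procedure does not mention. It is included so that the syslog service picks up the new directory. Setting DRY_RUN=1 prints the commands without executing them.

```shell
#!/bin/sh
# Print the command, then execute it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Point syslog back at the default directory, reload the service,
# and print the resulting configuration for review.
restore_default_logdir() {
    run esxcli system syslog config set --logdir=/scratch/log &&
    run esxcli system syslog reload &&
    run esxcli system syslog config get
}

# Usage on the rebooted host:
#   restore_default_logdir
# Preview only:
#   DRY_RUN=1; restore_default_logdir
```

After it runs, the Local Log Output field in the printed configuration should read /scratch/log.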
To prevent this issue from occurring, do not store system logs or scratch locations on the vSAN datastore.
For more information about configuring system scratch partitions and the syslog configuration, see:
Additional Information
See VMware KB: Redirecting system logs to a vSAN object causes an ESXi host lock up