VxRail: ESXi SSH is left enabled on primary VxRail node after nightly advisory report on 7.0.45x
Summary: ESXi SSH is left enabled on the primary VxRail node after nightly advisory report on 7.0.45x.
Symptoms
This article applies to customers who are concerned about the security implications of leaving the SSH service enabled on ESXi. The nightly VxRail advisory report check may leave SSH enabled on the primary VxRail node.
This occurs when health check submodules temporarily enable TSM-SSH on the ESXi nodes at the same time.
Scenario 1:
The health check subcomponent "Radar", which is part of the VxRail health checks, pre-checks, and ADC, runs several modules as part of its automation.
Two of these modules, "vsan_disk_utilization_check" and "VxVerify", conflict with each other.
Example log snippet on enabling SSH for module "vsan_disk_utilization_check":
Radar.log from VxRail Manager:
2023-06-22 00:20:52,403.403Z INFO [vsan_disk_utilization_check] Running cmd on: hst011.cwf.fr > esxcli vsan health cluster get --test=diskspace|grep % [remote_utils.py:114]
shell.log from the primary VxRail node:
2023-06-21T00:20:53.214Z SSH[6061650]: SSH login enabled
Example log snippet showing VxVerify recording the initial SSH status of each node, so that it can restore that status when it finishes:
vxv.log (VxVerify) from VxRail Manager:
2023-06-22 00:20:59-INFO [host_ssh_check] SSH status for node01.domain.tld is initially enabled
Example log snippet showing SSH being disabled for module "vsan_disk_utilization_check":
Radar.log from VxRail Manager:
2023-06-22 00:21:35,582.582Z INFO [vsan_disk_utilization_check] Host hst011.cwf.fr ran cmd. Response: b'59.53% (145196.93GB of 243916.11GB) green 170741.28 219524.50 \n',b'Could not create directory \'/home/tcserver/.ssh\'.\r\nload pubkey "/home/tcserver/.ssh/id_
shell.log from the primary VxRail node:
2023-06-21T00:21:36.870Z SSH[6061853]: SSH login disabled
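To see which state the SSH service was last toggled to on a node, you can filter shell.log for the two messages shown above. The following is a minimal sketch, assuming the message text matches these snippets exactly; the function name `last_ssh_event` is hypothetical. On ESXi, shell.log is normally found at /var/log/shell.log.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent SSH toggle recorded in an
# ESXi shell.log ("enabled", "disabled", or "unknown" if none is found).
# Assumes the "SSH login enabled" / "SSH login disabled" message text.
last_ssh_event() {
    log_file=$1
    # Keep only the SSH toggle lines, then take the last one.
    last=$(grep -E 'SSH login (enabled|disabled)' "$log_file" | tail -n 1)
    case $last in
        *"SSH login enabled"*)  echo "enabled" ;;
        *"SSH login disabled"*) echo "disabled" ;;
        *)                      echo "unknown" ;;
    esac
}
```

For example, `last_ssh_event /var/log/shell.log` prints one of the three words. The log only records toggle events, so also confirm the live TSM-SSH service state in the vSphere Client.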
Scenario 2:
Running the Quick Boot script fails, with log entries similar to the following:
2025-01-29T10:02:32.563Z <586bc3767ee3ca82a80f6e3ecac4b0b6> lcm [INFO] <189> NodeSSHManagementService.java enableSSHServiceWithOriginalStatusReturn() (152): SSH state: true
2025-01-29T10:02:32.563Z <586bc3767ee3ca82a80f6e3ecac4b0b6> lcm [ERROR] <189> QuickBootCompatibilityService.java runQuickbootScript() (70): get script Error
Cause
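In Scenario 1, "vsan_disk_utilization_check" enables SSH on the primary VxRail node, runs its command, and then disables SSH again. VxVerify runs at the same time and records the SSH status of every node when it starts. If VxVerify takes that reading after "vsan_disk_utilization_check" has enabled SSH but before it has disabled it, VxVerify records SSH on the primary VxRail node as initially enabled.
When VxVerify finishes, it restores the SSH service on each node to the state it recorded at startup. For the primary VxRail node that recorded state is enabled, so SSH is left enabled.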
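The interleaving can be reproduced with a small, purely hypothetical simulation in which the SSH service is a shell variable and each module step is a function. This is not VxRail code; the function names only mirror the steps described above, and the call order follows the timestamps in the Scenario 1 log snippets.

```shell
#!/bin/sh
# Hypothetical simulation of the race; nothing here touches a real host.
# The "SSH service" is just a shell variable.
ssh_state=disabled

# vsan_disk_utilization_check, step 1: enable SSH before running its command.
disk_check_enable() { ssh_state=enabled; }
# vsan_disk_utilization_check, step 2: disable SSH after the command returns.
disk_check_disable() { ssh_state=disabled; }
# VxVerify at startup: remember the SSH state it finds.
vxverify_start() { vxverify_saved=$ssh_state; }
# VxVerify at finish: restore the state it remembered.
vxverify_finish() { ssh_state=$vxverify_saved; }

# Order seen in the logs: enable (00:20:53), VxVerify records
# "initially enabled" (00:20:59), disable (00:21:36), VxVerify finishes.
disk_check_enable
vxverify_start
disk_check_disable
vxverify_finish

echo "final SSH state: $ssh_state"
```

Running it prints `final SSH state: enabled`. If `vxverify_start` runs before `disk_check_enable` instead, the final state is `disabled`, which is why the problem only appears when the two modules overlap.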
Resolution
Scenario 1:
There are two ways to resolve this issue: disable the "vsan_disk_utilization_check" module in the Radar configuration, or upgrade the ADC to a version that contains the fix.
Resolution 1:
Use the following commands on VxRail Manager to disable "vsan_disk_utilization_check" and confirm that it is disabled:
Switch to the root user with su.
vxrm0:/mystic/radar # cd /mystic/radar
vxrm0:/mystic/radar # grep -i vsan_disk_utilization_check /mystic/radar/conf/profile/advisory-report.yml
- vsan_disk_utilization_check
vxrm0:/mystic/radar # cp /mystic/radar/conf/profile/advisory-report.yml /home/mystic/advisory-report.bckup
vxrm0:/mystic/radar # sed -i 's/^\( *\)- vsan_disk_utilization_check/# - vsan_disk_utilization_check/' /mystic/radar/conf/profile/advisory-report.yml
vxrm0:/mystic/radar # grep -i vsan_disk_utilization_check /mystic/radar/conf/profile/advisory-report.yml
# - vsan_disk_utilization_check
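The steps above can be wrapped in one function that backs up the profile, comments out the entry, and verifies the result. If the entry is already commented out, it does nothing, so a second run cannot overwrite the original backup. This is a sketch, not a Dell-supplied script: the function name `disable_vsan_disk_check` is hypothetical, it assumes GNU `sed -i` as used above, and unlike the original `sed` command it keeps the line's indentation by reusing the captured spaces (`\1`).

```shell
#!/bin/sh
# Hypothetical helper: comment out vsan_disk_utilization_check in the Radar
# advisory report profile. Run as root on VxRail Manager.
# Default paths are the ones used in the commands above.
disable_vsan_disk_check() {
    profile=${1:-/mystic/radar/conf/profile/advisory-report.yml}
    backup=${2:-/home/mystic/advisory-report.bckup}

    # Already commented out (or absent): leave the file and backup untouched.
    if ! grep -q '^ *- vsan_disk_utilization_check' "$profile"; then
        echo "vsan_disk_utilization_check is not active in $profile"
        return 0
    fi

    # Keep a copy of the original profile before editing it.
    cp "$profile" "$backup" || return 1

    # Same substitution as above, but keep the original indentation (\1).
    sed -i 's/^\( *\)- vsan_disk_utilization_check/\1# - vsan_disk_utilization_check/' "$profile"

    # Verify: no uncommented entry should remain.
    if grep -q '^ *- vsan_disk_utilization_check' "$profile"; then
        echo "failed to disable vsan_disk_utilization_check" >&2
        return 1
    fi
    echo "vsan_disk_utilization_check disabled; backup saved to $backup"
}
```

Run `disable_vsan_disk_check` as root with no arguments to use the paths above; the optional first and second arguments override the profile and backup paths.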
Disable TSM-SSH on all ESXi nodes and confirm that it is disabled.
Wait for the next overnight VxRail "Advisory Report" run and confirm that SSH remains disabled.
You may need to reapply this workaround, because /mystic/radar/conf/profile/advisory-report.yml is reverted during an ADC update.
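Because an ADC update can revert the profile, a read-only status check after each ADC update shows whether the workaround must be reapplied. This is a minimal sketch; `vsan_disk_check_status` is a hypothetical helper, and the default path is the one used above.

```shell
#!/bin/sh
# Hypothetical helper: report whether vsan_disk_utilization_check is active
# in the Radar advisory report profile. Only reads the file.
vsan_disk_check_status() {
    profile=${1:-/mystic/radar/conf/profile/advisory-report.yml}
    if grep -q '^ *- vsan_disk_utilization_check' "$profile"; then
        echo "active"     # reverted (for example by an ADC update): reapply
    elif grep -q '^ *# *- vsan_disk_utilization_check' "$profile"; then
        echo "disabled"   # workaround still in place
    else
        echo "absent"     # entry not found in the profile
    fi
}
```

An `active` result after an ADC update means the entry has been restored and the commands above need to be run again.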
Resolution 2:
See KB https://www.dell.com/support/kbdoc/000019890 to upgrade the ADC to the latest version, which fixes this issue.
Scenario 2:
Upgrade VxRail to version 8.0.330 or later.
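To check whether a given VxRail version already meets this requirement, a version comparison with GNU `sort -V` is enough. This sketch assumes a plain dotted version string such as `8.0.330` (build suffixes are not handled) and leaves retrieving the installed version out of scope; `has_quickboot_fix` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) if the version in $1 is 8.0.330
# or later. Relies on GNU sort -V for version ordering.
has_quickboot_fix() {
    current=$1
    required=8.0.330
    # If the required version sorts first (or both are equal),
    # the current version is new enough.
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}
```

For example, `has_quickboot_fix 8.0.320` returns a non-zero status, while `has_quickboot_fix 8.0.330` returns zero.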