PowerFlex ESXi hosts become unresponsive during NFC backup or replication operations

Summary: This article describes one cause of the hostd service on ESXi becoming non-responsive, which in turn makes the server unresponsive to management operations from vCenter. This happens when NFC operations such as backup or replication jobs are performed on disks with an IO filter attached. ...


Symptoms

ESXi hosts become unresponsive, with hostd failing or no longer responding, when NFC operations such as backup or replication jobs are performed on disks with an IO filter attached. Messages in /var/run/log/hostd.log show IoTracker hostd threads taking increasingly long to complete:

2022-06-23T09:59:46.880Z warning hostd[3044912] [Originator@6876 sub=IoTracker] In thread 3049831, preadv("/vmfs/devices/vdfm/1322229d-vdfm") took over 19214 sec.
2022-06-23T09:59:56.883Z warning hostd[3045356] [Originator@6876 sub=IoTracker] In thread 3049832, preadv("/vmfs/devices/vdfm/1322229d-vdfm") took over 19224 sec.
2022-06-23T09:59:56.883Z warning hostd[3045356] [Originator@6876 sub=IoTracker] In thread 3049833, preadv("/vmfs/devices/vdfm/1322229d-vdfm") took over 19224 sec.
2022-06-23T09:59:56.883Z warning hostd[3045356] [Originator@6876 sub=IoTracker] In thread 3049831, preadv("/vmfs/devices/vdfm/1322229d-vdfm") took over 19224 sec.

The host can be returned to a responding state by rebooting it, after any active VMs on it have been shut down or moved to another host.
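To check a host for this symptom, the IoTracker warnings can be filtered out of the hostd log and reduced to the longest stall seen per device. The following is a minimal sketch, assuming grep, sed, and awk are available in the shell being used and that the log lines have the same layout as the samples above; the function name iotracker_stalls is illustrative only.

```shell
#!/bin/sh
# Sketch: report the longest IoTracker preadv stall (in seconds) per device.
# Assumes log lines look like the samples in this article.
# Usage: iotracker_stalls /var/run/log/hostd.log
iotracker_stalls() {
    log_file="$1"
    # Keep only IoTracker messages, then extract "<device path> <seconds>".
    grep 'sub=IoTracker' "$log_file" |
    sed -n 's/.*preadv("\([^"]*\)") took over \([0-9][0-9]*\) sec.*/\1 \2/p' |
    # Track the largest value seen for each device path.
    awk '{ if ($2 + 0 > max[$1] + 0) max[$1] = $2 }
         END { for (d in max) print d, max[d] }'
}
```

Run against the four sample lines above, this prints /vmfs/devices/vdfm/1322229d-vdfm followed by 19224. A value that keeps growing between runs suggests the threads are still blocked.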

Cause

This issue occurs when the hostd worker thread limit is exhausted during specific NFC operations.

Resolution

Workaround:

The VMware article provides a workaround. After applying it, monitor backup and replication operations for any impact on their normal behavior.

Users can work around this issue by limiting the ability of NFC to overwhelm hostd: set the maximum number of asynchronous NFC threads to 2 in the hostd configuration.

1. Export the hostd configuration settings from ConfigStore to a JSON file using the following command:

configstorecli config current get -c esx -g services -k hostd -outfile tmp.json

2. Edit the tmp.json file:  

vi tmp.json

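3. Add the "max_async_threads": 2 line to the "nfcsvc" section as shown below and save the file (the other options may differ in your environment). If the preceding line does not already end with a comma, add one so that the file remains valid JSON.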

"nfcsvc": {
         "log_level": "INFO",
         "max_memory": 100663296,
         "max_stream_memory": 35651584,
         "max_async_threads": 2

4. Run the following command to apply the file to the ConfigStore database:  

configstorecli config current set -c esx -g services -k hostd -infile tmp.json

5. Run the following command to restart hostd service:  

/etc/init.d/hostd restart

This workaround is based on VMware KB article 89650: https://kb.vmware.com/s/article/89650
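Steps 2 and 3 above can also be done without an interactive editor. The following is a minimal sketch, not part of the VMware procedure: the function names are hypothetical, and it assumes the exported file is pretty-printed with "nfcsvc": { alone on its line and that the section already contains at least one other option, as in the example above. Review the resulting file before applying it in step 4.

```shell
#!/bin/sh
# Sketch: non-interactive alternative to editing tmp.json in vi.
# Function names are illustrative; they are not part of any VMware tooling.

# Print the max_async_threads value found in an exported hostd config file
# (prints nothing if the key is absent).
nfc_thread_limit() {
    sed -n 's/.*"max_async_threads"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}

# Insert "max_async_threads": <limit> as the first key of the "nfcsvc" object.
# Assumption: "nfcsvc": { sits alone on its line and other keys follow it,
# so the trailing comma on the inserted line keeps the JSON valid.
add_nfc_thread_limit() {
    json_file="$1"
    limit="${2:-2}"

    if [ -n "$(nfc_thread_limit "$json_file")" ]; then
        echo "max_async_threads is already set in $json_file" >&2
        return 1
    fi

    awk -v limit="$limit" '
        { print }
        /"nfcsvc"[ \t]*:[ \t]*[{][ \t]*$/ {
            printf "         \"max_async_threads\": %s,\n", limit
        }
    ' "$json_file" > "$json_file.new" || return 1

    # Refuse to replace the file if no "nfcsvc" line was matched.
    if [ -z "$(nfc_thread_limit "$json_file.new")" ]; then
        echo "no \"nfcsvc\" section found in $json_file" >&2
        rm -f "$json_file.new"
        return 1
    fi
    mv "$json_file.new" "$json_file"
}
```

After step 1, calling add_nfc_thread_limit tmp.json 2 takes the place of steps 2 and 3. Once hostd has been restarted, the configuration can be exported again to a second file with the command from step 1, and nfc_thread_limit run on that file should print 2 if the setting was stored.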

Affected Products

VxBlock and Vblock Systems, PowerFlex rack

Products

VxFlex Ready Nodes, PowerFlex custom node, PowerFlex appliance R650, PowerFlex appliance R6525, PowerFlex appliance R660, PowerFlex appliance R6625, PowerFlex appliance R750, PowerFlex appliance R760, PowerFlex appliance R7625, PowerFlex custom node R650, PowerFlex custom node R6525, PowerFlex custom node R660, PowerFlex custom node R6625, PowerFlex custom node R750, PowerFlex custom node R760, PowerFlex custom node R7625, PowerFlex rack HW, PowerFlex Software, VxFlex Ready Node, VxFlex Ready Node R640, VxFlex Ready Node R740xd, VMware ESXi 6.5.X, VMware ESXi 6.7.X, VMware ESXi 6.x, VMware ESXi 7.x, VMware ESXi 8.x, PowerFlex appliance R640, PowerFlex appliance R740XD, PowerFlex appliance R7525, PowerFlex appliance R840, VxFlex Ready Node R840 ...
Article Properties
Article Number: 000219067
Article Type: Solution
Last Modified: 12 Nov 2024
Version:  1