ESXi hosts using the iSCSI protocol: access to volumes pauses for 35 seconds on failover
Summary: System administrators of ESXi hosts connected to an iSCSI Storage Area Network (SAN) may notice an apparent 35-second pause in host I/O when a controller module fails over on the SAN. This is due to the default iSCSI keepalive (NOP-In, NOP-Out) timeout settings on the ESXi host. ...
Instructions
Broadcom vSphere administrators can refer to Listing and Setting iSCSI Parameters with ESXCLI.
The values concerned are:
NoopOutInterval: Time interval, in seconds, between NOP-Out requests sent from your iSCSI initiator to an iSCSI target. The NOP-Out requests serve as the ping mechanism to verify that a connection between the iSCSI initiator and the iSCSI target is active. Supported only at the initiator level.
NoopOutTimeout: Amount of time, in seconds, that can lapse before your host receives a NOP-In message. The message is sent by the iSCSI target in response to the NOP-Out request. When the NoopOutTimeout limit is exceeded, the initiator terminates the current session and starts a new one. Supported only at the initiator level.
RecoveryTimeout: Amount of time, in seconds, that can lapse while a session recovery is performed. If the recovery exceeds this limit, the iSCSI initiator terminates the session.
These values can be listed with the command esxcli iscsi adapter param get -A vmhbaXY (where vmhbaXY is the name of the iSCSI HBA).
Here are the default values. Note that they add up to 35 seconds, which is the pause the user observes.
NoopOutInterval - 15 seconds
NoopOutTimeout - 10 seconds
RecoveryTimeout - 10 seconds
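The arithmetic behind the 35-second figure can be sketched in a few lines of shell. This is only an illustration: the function name estimate_pause is hypothetical, and it assumes, as this article describes, that the observed pause is the simple sum of the three timers.

```shell
#!/bin/sh
# Hypothetical helper: add the three iSCSI timers (in seconds) to estimate
# the pause seen on failover. Assumption: the pause is the plain sum of
# the three values, as described in this article.
estimate_pause() {
    # $1 = NoopOutInterval, $2 = NoopOutTimeout, $3 = RecoveryTimeout
    echo $(( $1 + $2 + $3 ))
}

# ESXi defaults listed above: 15 + 10 + 10
echo "Default pause: $(estimate_pause 15 10 10) seconds"
```

The same function can be called with any candidate values to see what total they would give before changing anything on the host.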
Here are the commands to change those values.
For example:
esxcli iscsi adapter param set -A vmhbaXY -k NoopOutInterval -v 1
esxcli iscsi adapter param set -A vmhbaXY -k NoopOutTimeout -v 10
esxcli iscsi adapter param set -A vmhbaXY -k RecoveryTimeout -v 1
An ESXi host restart is required for the settings to take effect.
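For hosts with several adapters, the three set commands can be wrapped in a small helper. The following is a minimal sketch, not a supported tool: the function name set_iscsi_timers and the DRY_RUN variable are hypothetical, and it uses only the esxcli syntax shown above. With DRY_RUN=1 (the default in this sketch) it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: apply the three iSCSI timer values to one adapter.
# DRY_RUN=1 (the default in this sketch) prints the esxcli commands
# instead of running them; set DRY_RUN=0 on the ESXi host to apply them.
DRY_RUN="${DRY_RUN:-1}"

set_iscsi_timers() {
    # $1 = adapter name, $2 = NoopOutInterval,
    # $3 = NoopOutTimeout, $4 = RecoveryTimeout (all in seconds)
    adapter="$1"
    for pair in "NoopOutInterval=$2" "NoopOutTimeout=$3" "RecoveryTimeout=$4"; do
        key="${pair%%=*}"     # text before the first '='
        value="${pair#*=}"    # text after the first '='
        if [ "$DRY_RUN" = "1" ]; then
            echo "esxcli iscsi adapter param set -A $adapter -k $key -v $value"
        else
            esxcli iscsi adapter param set -A "$adapter" -k "$key" -v "$value" || return 1
        fi
    done
}

# Example values from this article; vmhbaXY is a placeholder adapter name.
set_iscsi_timers vmhbaXY 1 10 1
```

The values passed in are the example values from this article, not a recommendation. The host restart noted above is still required after the commands are run.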
Each customer environment is different, and administrators must test and tune the values accordingly. The ESXi default values are chosen to allow time for recovery and convergence on the Ethernet switches that carry iSCSI traffic.
When using Ethernet switches to carry iSCSI traffic, consider the following:
- If the iSCSI initiators are connected to ME5 Series storage systems through the network switches, ensure that your switches support IEEE 802.3x flow control. Also, ensure that the flow control is enabled for both sending and receiving on all switch ports and server NIC ports.
- If you do not enable the flow control, your iSCSI storage may experience degradation of the I/O performance.
- In addition to enabling the Ethernet IEEE 802.3x flow control, Dell Technologies recommends that you disable unicast broadcast storm control on the switch ports that are connected to the iSCSI initiators and target storage systems. Dell also recommends turning on the "PortFast" mode of the spanning tree protocol (STP) on the switch ports that are connected to the iSCSI initiators and target system.
- Turning on the PortFast mode is different from turning off the whole operation of STP on the switch. With PortFast on, STP is still enabled on the switch ports. Turning off STP may affect the entire network and can leave the network vulnerable to physical topology loops.
Some switch configurations are shown for reference in Switch configuration guides for SC Series or PS Series SANs. The same principles apply to other switch models; see your switch vendor's documentation for specific commands.
Host configuration best practices for Dell storage models may be found at Storage | Dell Technologies Info Hub