Hi, I have gone through some of the articles about best-practice performance configuration for iSCSI on VNX. All of the articles suggest that we should disable TCP delayed ACK on the connecting host, but I just don't understand why. How does TCP delayed ACK impact iSCSI performance?
According to those articles, the problem is the 200 ms delay: the host waits for a second packet to arrive, since that second packet would trigger an immediate ACK. Why do those KB articles assume there would be no second packet coming in, so that the full 200 ms delay is unavoidable?
Could anyone shed some light on this?
It seems the issue you mention can also happen on ESX hosts; see VMware KB: ESX/ESXi hosts might experience read or write performance issues with certain storage arr....
The affected iSCSI arrays in question take a slightly different approach to handling congestion. Instead of implementing either the slow start algorithm or congestion avoidance algorithm, or both, these arrays take the very conservative approach of retransmitting only one lost data segment at a time and waiting for the host's ACK before retransmitting the next one. This process continues until all lost data segments have been recovered.
Coupled with the delayed ACK implemented on the ESXi/ESX host, this approach slows read performance to a halt in a congested network. Consequently, frequent timeouts are reported in the kernel log on hosts that use this type of array. Most notably, the VMFS heartbeat experiences a large volume of timeouts because VMFS uses a short timeout value. This configuration also experiences excessively large maximum read-response times (on the order of several tens of seconds) reported by the guest. This problem is exacerbated when reading data in large block sizes. In this case, the higher bandwidth contributes to network congestion, and each I/O is comprised of many more data segments, requiring longer recovery times.
Until the iSCSI array vendors address this issue on their end, consider designing your IP storage network with enough capacity to account for the peak usage and lower the risk of congestion. If you experience the problem described in this article and are unable to alter your network configuration or find ways to guarantee a congestion-free environment, you can experiment with the following workaround.
This workaround involves disabling delayed ACK on your ESX/ESXi host through a configuration option. By setting this option, you completely disable delayed ACK regardless of whether TCP is attempting congestion recovery and operating normally. This option affects the entire TCP/IP stack and, therefore, all TCP/IP applications are impacted if you implement this change. As a consequence, setting this option might degrade the performance of iSCSI and other applications—for example, NFS and vMotion—as well as contribute to the congestion. Nonetheless, the option allows a speedy recovery from congestion and a reasonable performance during other transactions with the affected iSCSI arrays.
Note: Please contact your storage vendor to confirm if disabling Delayed ACK is a workaround that they recommend for their storage array.
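The interaction described in the KB can be made concrete with a toy back-of-the-envelope calculation. This is a sketch, not a model of any real TCP stack: the 200 ms delayed-ACK timer and the 1460-byte segment size are assumed typical values, and round-trip time is ignored to isolate the delayed-ACK cost.

```python
# Toy estimate of recovery time when an array retransmits one lost
# segment at a time and waits for the host's ACK before sending the
# next one. Numbers are illustrative, not measurements.

DELAYED_ACK_TIMER = 0.2  # seconds a host may hold the ACK for a lone segment
SEGMENT_SIZE = 1460      # bytes per TCP segment (typical MSS)

def recovery_time(lost_bytes, delayed_ack):
    """Time to recover lost_bytes, one retransmitted segment per exchange.

    With delayed ACK enabled, each lone retransmitted segment sits
    unacknowledged until the host's timer fires; with it disabled, the
    ACK returns immediately and only the (here ignored) RTT remains.
    """
    segments = -(-lost_bytes // SEGMENT_SIZE)  # ceiling division
    per_segment = DELAYED_ACK_TIMER if delayed_ack else 0.0
    return segments * per_segment

# A burst that drops the segments of a single 1 MiB read:
lost = 1024 * 1024
print(recovery_time(lost, delayed_ack=True))   # ~144 s with delayed ACK
print(recovery_time(lost, delayed_ack=False))  # RTT-bound instead
```

Even under these generous assumptions, a single large lost read takes minutes to recover with delayed ACK in the path, which is consistent with the "several tens of seconds" read-response times and VMFS heartbeat timeouts the KB describes.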
We have had delayed ACK turned off on the Windows SQL boxes for quite some time; however, I never got around to turning it off on the ESX hosts.
Looking at M&R and/or vSphere charts, we would see very spiky latency at relatively low IOPS (100-200 IOPS). After turning it off on 6 ESX hosts I saw an immediate, huge difference in how the graphs looked in M&R and vSphere: much lower and much more consistent latency. I did not take measurements from the actual VMs, but since the VMHost showed lower latency on those datastores I think it filtered down to the VMs as well.
One note though: every time I ran a high-IOPS workload I got pretty good performance and IOPS regardless of delayed ACK. It seems to be really helpful for the lower-IOPS LUNs, or maybe for environments that have some Ethernet packet loss.
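That observation lines up with how delayed ACK behaves: a receiver sends an ACK immediately once a second segment arrives, and only falls back to the delay timer when a segment arrives alone. The sketch below illustrates that asymmetry; the 200 ms value is a common implementation choice (RFC 1122 only caps the delay at 500 ms), not something specific to any vendor's stack.

```python
# Sketch of delayed-ACK behavior per RFC 1122: an ACK goes out at once
# when a second segment arrives, otherwise only after the delay timer
# fires. Lone segments -- light workloads, or one-at-a-time
# retransmissions during loss recovery -- always eat the full timer.

ACK_TIMER = 0.2  # seconds; common implementation value (spec caps it at 500 ms)

def time_until_ack(segments_in_flight):
    """Delay before the receiver ACKs, for back-to-back segments."""
    if segments_in_flight >= 2:
        return 0.0       # second segment triggers an immediate ACK
    return ACK_TIMER     # lone segment waits out the timer

print(time_until_ack(1))  # 0.2 -> the 200 ms delay the question asks about
print(time_until_ack(8))  # 0.0 -> busy streams rarely see the timer
```

A high-IOPS workload keeps enough segments in flight that the immediate-ACK path dominates, while a quiet LUN or a loss-recovery exchange keeps hitting the lone-segment case, which is why disabling delayed ACK helps most there.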
Thanks for the feedback
Always good to hear from actual customer experience
Also good to hear once in while what works well – on a forum like that you are bound to get much more people posting what doesn’t work ☺
I've attached a M&R report that I sent out to the support team after making the changes. The LUN in question does about 200 IOPS, you can clearly see the change in behavior.