Troubleshoot TCP zero window update packets sent by a PowerScale node

Summary: This article discusses the reason for zero window counters from the netstat command. It also discusses possible points of interest while researching and investigating why these values are increasing. ...


Instructions

The output of the 'netstat -anT -p tcp' command on a node shows a count of TCP zero window packets in the 0-win column. The value is the number of times the node (local address) has sent a TCP zero window update packet to the remote device (foreign address) on that connection. This occurs when the node's TCP receive window has been reduced to zero, or to a size too small to fit a full-sized data segment.
Example:
Cluster-1# netstat -anT -p tcp
Active Internet connections (including servers)
Proto Rexmit OOORcv 0-win  maxswnd maxseg     srtt   srtvar    rexmt  sndwnd sncwnd rcvwnd   delack SR SS ND AS Local Address          Foreign Address
tcp4       0      0   1001 2097920   1460     47ms     23ms    342ms 2097664 190488 131400     99ms  X  X  X  X 100.89.53.100.445       100.90.164.11.52765
...
As a result, the remote device cannot transmit data until the node sends a TCP window update indicating how much data it can now receive. The resulting delays appear as elevated (write) latency.
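To pick out the affected connections from a long listing, the 0-win column can be filtered with awk. The following is a minimal sketch, not a supported tool: the function name zero_win_conns is hypothetical, and it assumes the column layout shown in the example above (0-win in the fourth column, local and foreign address in the last two) with every column populated on each row.

```shell
# Print "local-address foreign-address 0-win" for every TCP connection
# whose 0-win counter is non-zero.
# Assumption: input has the 'netstat -anT -p tcp' layout shown above,
# with 0-win in column 4 and the two addresses in the last two columns.
zero_win_conns() {
  awk '$1 ~ /^tcp/ && $4 + 0 > 0 { print $(NF-1), $NF, $4 }'
}

# Usage on a node:
#   netstat -anT -p tcp | zero_win_conns
```

For the sample row above, this would print the two addresses followed by 1001.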
In most cases, TCP zero window update packets sent by the node indicate that the receiving application (process) on the node (NFS, SMB, and so on) is slow to pull data out of the receive buffer. This shows up as a consistently non-zero value in the Recv-Q column for the connection in the output of the 'netstat -an tcp' command. For example, run the following command several times to see whether Recv-Q is consistently full.
Example:
Cluster-1# netstat -an tcp
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4  131400      0 100.89.53.100.445       100.90.164.11.52765    ESTABLISHED
...
Recv-Q is a real-time value, so the command must be run while the node is sending TCP zero window update packets for the connection. The following sample script retrieves live statistics for:
  • Recv-Q and Send-Q
  • Out of Orders (OOO)
  • Zero Windows (0-win)
  • Retransmits
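# Create a dated output directory and write the results there.
dir=/ifs/data/Isilon_Support/$(date +%d-%m-%Y)
mkdir -p "$dir"
# Every 10 seconds, append the rows that have non-zero queues or counters.
while sleep 10; do
  echo "######### Live Send Rec Queue Q: #########"
  date
  # Rows with a non-zero Recv-Q ($2) or Send-Q ($3)
  netstat -an4x -p tcp | awk '$2 != 0 || $3 != 0'
  echo
  sleep 1
  echo "######### Live OoO / 0-win / Retrans: #########"
  date
  # Rows with a non-zero Rexmit ($2), OOORcv ($3), or 0-win ($4)
  netstat -an4T -p tcp | awk '$2 != 0 || $3 != 0 || $4 != 0'
done >> "$dir/$(hostname).TCP_specs.out"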
A constantly elevated Recv-Q means that data has been placed in the receive buffer, but the application has not called recv() to copy it from the receive buffer to the application buffer. This indicates that the application is overloaded or otherwise unable to process incoming data in a timely manner. Data should be processed as soon as it arrives in the receive queue. If the application is not doing that, it is being asked to do more work than it can handle.
In summary, if the Recv-Q value remains elevated for the connection while TCP zero window update packets are being sent for the connection, an investigation into bottlenecks should be performed at the receiving application, CPU, disks, etc.
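The repeated Recv-Q check can be scripted for a single connection. The sketch below is illustrative only: the function names recvq_for and summarise_recvq are hypothetical, and it assumes the 'netstat -an' column layout shown earlier (Recv-Q in column 2, local address in column 4, foreign address in column 5).

```shell
# Print the Recv-Q value for the connection that matches the given
# local and foreign addresses. Reads 'netstat -an' style output on stdin.
# Assumption: Recv-Q is column 2, local address column 4, foreign column 5.
recvq_for() {
  awk -v l="$1" -v f="$2" '$1 ~ /^tcp/ && $4 == l && $5 == f { print $2 }'
}

# Summarise a series of Recv-Q samples (one value per line on stdin).
summarise_recvq() {
  awk '{ n++; if ($1 + 0 > 0) nz++ }
       END { printf "%d of %d samples non-zero\n", nz, n }'
}

# Usage on a node (addresses taken from the example output):
#   for i in 1 2 3 4 5; do
#     netstat -an -p tcp | recvq_for 100.89.53.100.445 100.90.164.11.52765
#     sleep 1
#   done | summarise_recvq
```

If most or all samples are non-zero while the 0-win counter for the same connection is increasing, that points toward the receiving application or the resources it depends on.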
If the Recv-Q value remains at zero for the connection, the TCP zero window update packets may instead indicate that the TCP receive window on the node side of the connection is too small for the bandwidth-delay product (BDP) of the path between the node and the remote destination. In that case, node TCP tuning may need to be considered.
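As a rough way to judge whether a receive window is small relative to the BDP, the two figures can be compared directly. The following is a sketch under stated assumptions: the function names bdp_bytes and max_throughput_mbit are hypothetical, the 10 Gbit/s link speed is an assumed value that does not come from the example, and the srtt column (47ms) is used as an approximation of the round-trip time.

```shell
# bdp_bytes <bandwidth_mbit_per_s> <rtt_ms>
# Bandwidth-delay product in bytes: (bits per second / 8) * RTT in seconds.
bdp_bytes() {
  awk -v bw="$1" -v rtt="$2" \
    'BEGIN { printf "%d\n", bw * 1000000 / 8 * rtt / 1000 }'
}

# max_throughput_mbit <rcvwnd_bytes> <rtt_ms>
# Upper bound, in Mbit/s, on the throughput of a single connection that is
# limited only by its receive window: window / RTT.
max_throughput_mbit() {
  awk -v w="$1" -v rtt="$2" \
    'BEGIN { printf "%.1f\n", w * 8 / (rtt / 1000) / 1000000 }'
}

# Example with rcvwnd=131400 and srtt=47ms from the sample output,
# and an ASSUMED 10 Gbit/s (10000 Mbit/s) path:
#   bdp_bytes 10000 47              # prints 58750000
#   max_throughput_mbit 131400 47   # prints 22.4
```

Under these assumptions, a 131400-byte window is far below the roughly 58.75 MB BDP, so a single connection could not exceed about 22.4 Mbit/s regardless of how quickly the application drains the buffer.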

Additional Information

See the "Latency, Bandwidth and Throughput" section of the following guide for further information:

https://www.delltechnologies.com/asset/en-us/products/storage/industry-market/h16463-isilon-advanced-networking-fundamentals.pdf

Affected Products

PowerScale OneFS
Article Properties
Article Number: 000221738
Article Type: How To
Last Modified: 19 Apr 2024
Version:  2