Troubleshoot TCP zero window update packets sent by a PowerScale node
Summary: This article discusses the reason for zero window counters from the netstat command. It also discusses possible points of interest while researching and investigating why these values are increasing. ...
Instructions
The output of the 'netstat -anT -p tcp' command from a node shows a count of TCP zero window packets (0-win column). Each value in the 0-win column is the number of times that the node (local address) has sent a TCP zero window update packet to the remote device (foreign address) on that connection. This occurs when the node's TCP receive window has been reduced to zero, or to a size too small to fit a full-sized data segment.
Example:
Cluster-1# netstat -anT -p tcp
Active Internet connections (including servers)
Proto Rexmit OOORcv 0-win maxswnd maxseg srtt srtvar rexmt sndwnd  sncwnd rcvwnd delack SR SS ND AS Local Address     Foreign Address
tcp4  0      0      1001  2097920 1460   47ms 23ms   342ms 2097664 190488 131400 99ms   X  X  X  X  100.89.53.100.445 100.90.164.11.52765
...
The net result of this is that the remote device will be unable to transmit data, introducing delays that result in elevated (write) latency, until the node sends a TCP window update indicating how much data it can now receive.
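Because the 0-win value is a running count for each connection, what matters during troubleshooting is whether it is still increasing. The following is a minimal sketch, not part of the original procedure, that compares two saved snapshots of the 'netstat -anT -p tcp' output and prints the connections whose 0-win count grew. The function name zero_win_delta and the snapshot file names are hypothetical, and the sketch assumes the column layout shown in the example above (0-win in the fourth column, local and foreign addresses in the last two).

```shell
# zero_win_delta BEFORE AFTER
# Hypothetical helper: BEFORE and AFTER are files holding saved output of
# 'netstat -anT -p tcp'. Assumes 0-win is field 4 and that the local and
# foreign addresses are the last two fields, as in the example above.
zero_win_delta() {
  awk '
    # First file: remember the 0-win count of every TCP connection.
    NR == FNR { if ($1 ~ /^tcp/) prev[$(NF-1) " " $NF] = $4; next }
    # Second file: print connections whose count has grown.
    $1 ~ /^tcp/ {
      key = $(NF-1) " " $NF
      delta = $4 - prev[key]   # a connection not seen before counts from 0
      if (delta > 0) print key, "+" delta
    }
  ' "$1" "$2"
}

# Example use (file names are placeholders):
#   netstat -anT -p tcp > /tmp/snap1; sleep 60; netstat -anT -p tcp > /tmp/snap2
#   zero_win_delta /tmp/snap1 /tmp/snap2
```

Each output line gives the local address, the foreign address, and the increase in the 0-win count between the two snapshots. The sketch expects the first snapshot file to be non-empty.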
In most cases, TCP zero window update packets sent by the node indicate that the receiving application (process) on the node (NFS, SMB, etc.) is slow to pull data out of the receive buffer. This can be indicated by a consistently non-zero value in the Recv-Q column for the connection in the output of the 'netstat -an tcp' command. For example, run the following command several times to see whether the Recv-Q is consistently full.
Example:
Cluster-1# netstat -an tcp
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 131400 0 100.89.53.100.445 100.90.164.11.52765 ESTABLISHED
...
Recv-Q is a real-time value, so this command must be run while TCP zero window update packets are being sent by the node for the connection. The following is a sample script to retrieve live statistics for:
- Recv-Q
- Send-Q
- Out of Orders (OOO)
- Zero Windows (0-win)
- Retransmits
# mkdir /ifs/data/Isilon_Support/$(date +%d-%m-%Y)/
echo; while sleep 10; do
  echo "######### Live Send Rec Queue Q: #########"; date
  netstat -an4x -p tcp | awk '{ if (( $2 != 0 ) || ($3 != 0)) print $0 }'
  echo; sleep 1
  echo "######### Live OoO / 0-win / Retrans: #########"; date
  netstat -an4T -p tcp | awk '{ if (( $2 != 0 ) || ($3 != 0) || ($4 != 0)) print $0 }'
done >> `hostname`.TCP_specs.out
The loop appends its output to a file named <hostname>.TCP_specs.out in the current working directory and runs until it is interrupted.
A constantly elevated Recv-Q means that data has been placed in the receive buffer, but the application has not called recv() to copy it from the receive buffer to the application buffer. This indicates that the application is overloaded or otherwise unable to process incoming data in a timely manner. Data should be processed as soon as it arrives in the receive queue; if the application is not doing that, it is being asked to do more work than it can handle.
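To judge whether Recv-Q is constantly elevated rather than briefly non-zero, several samples can be tallied per connection. The sketch below is an illustration only and is not part of the sample script: the function name recvq_tally is hypothetical, and it assumes the plain netstat column layout shown earlier (Recv-Q in the second column, local and foreign addresses in the fourth and fifth, state in the sixth).

```shell
# recvq_tally: read the concatenated output of several netstat runs on stdin
# and report, per ESTABLISHED connection, in how many samples Recv-Q was
# non-zero. Hypothetical helper; assumes the column layout
# "Proto Recv-Q Send-Q Local Foreign (state)".
recvq_tally() {
  awk '
    $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
      key = $4 " " $5
      seen[key]++                  # samples in which the connection appeared
      if ($2 > 0) busy[key]++      # samples with data waiting in Recv-Q
    }
    END { for (key in seen) printf "%s %d/%d\n", key, busy[key], seen[key] }
  ' | sort
}

# Example use: five samples, two seconds apart.
#   for i in 1 2 3 4 5; do netstat -an -p tcp; sleep 2; done | recvq_tally
```

A connection reported as, for example, 5/5 had data waiting in every sample, which matches the constantly elevated case described above; a connection reported as 1/5 was only briefly backed up.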
In summary, if the Recv-Q value remains elevated for the connection while TCP zero window update packets are being sent for the connection, an investigation into bottlenecks should be performed at the receiving application, CPU, disks, etc.
If the Recv-Q value remains at zero for the connection, then TCP zero window update packets sent by the node may instead indicate that the TCP receive window on the node side of the connection is too small for the bandwidth-delay product (BDP) of the path between the node and the remote destination, and some node TCP tuning may need consideration.
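As a rough illustration of the BDP comparison, the sketch below computes the bandwidth-delay product for a path and the throughput ceiling that a given receive window imposes at a given round-trip time. The function names are hypothetical, and the 1000 Mbit/s link speed in the usage comment is an assumed figure, since the link speed is not stated in this article. The 47 ms and 131400-byte inputs are the srtt and rcvwnd values from the sample netstat output above. The calculation is the usual approximation (one window of data per round trip) and ignores protocol overhead and any window auto-tuning.

```shell
# bdp_bytes MBIT_PER_SEC RTT_MS
# Bandwidth-delay product: bytes that must be in flight to fill the path.
bdp_bytes() {
  awk -v mbps="$1" -v rtt_ms="$2" \
    'BEGIN { printf "%d\n", mbps * 1000000 / 8 * rtt_ms / 1000 }'
}

# window_limit_mbps WINDOW_BYTES RTT_MS
# Approximate throughput ceiling (Mbit/s) when at most one window of data
# can be sent per round trip.
window_limit_mbps() {
  awk -v win="$1" -v rtt_ms="$2" \
    'BEGIN { printf "%.1f\n", win * 8 * 1000 / rtt_ms / 1000000 }'
}

# Example use:
#   bdp_bytes 1000 47            # assumed 1000 Mbit/s path, 47 ms RTT
#   window_limit_mbps 131400 47  # rcvwnd and srtt from the sample output
```

With these inputs the BDP is 5875000 bytes, while a 131400-byte window allows only about 22.4 Mbit/s at 47 ms. Under the assumed link speed, a gap of that size would suggest the receive window, rather than the application, is the limiting factor.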
Additional Information
See the "Latency, Bandwidth and Throughput" section of the following guide for further information:
https://www.delltechnologies.com/asset/en-us/products/storage/industry-market/h16463-isilon-advanced-networking-fundamentals.pdf
Affected Products
PowerScale OneFS
Article Properties
Article Number: 000221738
Article Type: How To
Last Modified: 19 Apr 2024
Version: 2