Troubleshoot TCP zero window update packets sent by a PowerScale node

Summary: This article explains the zero window counters reported by the netstat command. It also covers points of interest to investigate when these values are increasing. ...


Instructions

The output of the 'netstat -anT -p tcp' command from a node shows a count of TCP zero window packets (0-win column). The value in the 0-win column is the number of times that the node (local address) has sent a TCP zero window update packet to the remote device (foreign address) on that connection. This occurs when the node's TCP receive window has been reduced to zero, or to a size too small to fit a full-sized data segment.
Example:
Cluster-1# netstat -anT -p tcp
Active Internet connections (including servers)
Proto Rexmit OOORcv 0-win  maxswnd maxseg     srtt   srtvar    rexmt  sndwnd sncwnd rcvwnd   delack SR SS ND AS Local Address          Foreign Address
tcp4       0      0   1001 2097920   1460     47ms     23ms    342ms 2097664 190488 131400     99ms  X  X  X  X 100.89.53.100.445       100.90.164.11.52765
...
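To narrow a long connection list down to the connections of interest, the 0-win column can be filtered with awk. The following is a minimal sketch and not part of the original procedure: the function name zero_win_conns is hypothetical, and it assumes the column layout shown in the example above, where 0-win is the fourth field and the local and foreign addresses are the last two fields.

```shell
# Hypothetical helper: list TCP connections whose 0-win counter is non-zero.
# Assumes the 'netstat -anT -p tcp' layout shown above:
# field 4 = 0-win, last two fields = local and foreign address.
zero_win_conns() {
    awk '$1 ~ /^tcp/ && $4 ~ /^[0-9]+$/ && $4 > 0 { print $4, $(NF-1), $NF }'
}

# Usage on a node (the helper reads netstat output from standard input):
# netstat -anT -p tcp | zero_win_conns
```

Each output line contains the 0-win count followed by the local and foreign address, so repeated runs can be compared to see which connections are still incrementing.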
As a result, the remote device cannot transmit data until the node sends a TCP window update indicating how much data it can now receive. The resulting delays appear as elevated (write) latency.
In most cases, TCP zero window update packets sent by the node indicate that the receiving application (process) on the node (NFS, SMB, and so on) is slow to pull data out of the receive buffer. A consistently non-zero value in the Recv-Q column for the connection, in the output of the 'netstat -an tcp' command, points to this condition. For example, run the following command several times to see whether Recv-Q is consistently full.
Example:
Cluster-1# netstat -an tcp
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4  131400      0 100.89.53.100.445       100.90.164.11.52765    ESTABLISHED
...
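The repeated check described above can be scripted. The following is a minimal sketch under stated assumptions: the function name recvq_backlog is hypothetical, the sample count and interval in the usage line are arbitrary, and the field positions follow the layout shown in the example above (field 2 is Recv-Q, field 4 the local address, field 5 the foreign address).

```shell
# Hypothetical helper: print connections with data waiting in the receive queue.
# Assumes the layout shown above: field 2 = Recv-Q, field 4 = local address,
# field 5 = foreign address.
recvq_backlog() {
    awk '$1 ~ /^tcp/ && $2 ~ /^[0-9]+$/ && $2 > 0 { print $2, $4, $5 }'
}

# Usage on a node: take five samples two seconds apart (both values are
# arbitrary) and check whether the same connection appears every time.
# for i in 1 2 3 4 5; do date; netstat -an tcp | recvq_backlog; sleep 2; done
```

A connection that appears in every sample with a large Recv-Q value is a candidate for the application-side investigation described in this article.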
This is a real-time counter, so the command must be run while the node is sending TCP zero window update packets for the connection. The following is a sample script to retrieve live statistics for:
  • Recv-Q
  • Send-Q
  • Out of Orders (OOO)
  • Zero Windows (0-win)
  • Retransmits
# mkdir /ifs/data/Isilon_Support/$(date +%d-%m-%Y)/
echo; while sleep 10; do
  echo "######### Live Send Rec Queue Q: #########"; date
  netstat -an4x -p tcp | awk '{ if (( $2 != 0 ) || ($3 != 0)) print $0 }'
  echo; sleep 1
  echo "######### Live OoO / 0-win / Retrans: #########"; date
  netstat -an4T -p tcp | awk '{ if (( $2 != 0 ) || ($3 != 0) || ($4 != 0)) print $0 }'
done >> `hostname`.TCP_specs.out
A constantly elevated Recv-Q means that data has been placed in the receive buffer, but the application has not called recv() to copy it from the receive buffer to the application buffer. This indicates that the application is overloaded or otherwise unable to process incoming data in a timely manner. Data should be processed as soon as it arrives in the receive queue. If the application is not doing that, it is being asked to do more work than it can handle.
In summary, if the Recv-Q value remains elevated for the connection while TCP zero window update packets are being sent for that connection, investigate bottlenecks in the receiving application, CPU, disks, and so on.
If the Recv-Q value remains at zero for the connection, the TCP zero window update packets sent by the node may instead indicate that the TCP receive window on the node side of the connection is too small for the bandwidth-delay product (BDP) of the path between the node and the remote destination. In that case, node TCP tuning may need to be considered.
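As a rough worked example of this comparison, the bandwidth-delay product is the path bandwidth multiplied by the round-trip time, and a receive window caps a single connection at about one window per round trip. The sketch below uses the srtt (47ms) and rcvwnd (131400) values from the first example together with an assumed path bandwidth of 1 Gbit/s. The bandwidth figure, the function names, and the treatment of rcvwnd as a byte count are assumptions made for illustration only.

```shell
# Hypothetical helpers, integer arithmetic only.

# bdp_bytes BANDWIDTH_BITS_PER_SEC RTT_MS
# Bytes that must be in flight to keep the path full.
bdp_bytes() {
    echo $(( $1 * $2 / 8 / 1000 ))
}

# window_limit_bytes_per_sec WINDOW_BYTES RTT_MS
# Throughput ceiling imposed by the receive window (one window per round trip).
window_limit_bytes_per_sec() {
    echo $(( $1 * 1000 / $2 ))
}

bdp_bytes 1000000000 47                 # assumed 1 Gbit/s path, 47 ms srtt
window_limit_bytes_per_sec 131400 47    # rcvwnd from the example, 47 ms srtt
```

Under these assumptions the path needs roughly 5.9 MB in flight to stay full, while a 131400-byte window limits the connection to about 2.8 MB/s. A window far below the BDP can therefore restrict throughput even when the application drains the receive buffer promptly.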

Additional Information

See the "Latency, Bandwidth and Throughput" section of the following guide for further information:

https://www.delltechnologies.com/asset/en-us/products/storage/industry-market/h16463-isilon-advanced-networking-fundamentals.pdf

Affected Products

PowerScale OneFS
Article Properties
Article Number: 000221738
Article Type: How To
Last Modified: 19 Apr 2024
Version: 2