September 21st, 2015 13:00
Caution conditions: TCP retransmissions
Receiving numerous notifications about TCP retransmissions, with a TCP retransmit percentage of 1.9%.
How can I fix these error messages?



Joe S586
September 23rd, 2015 10:00
Take a look at the following first and see if this helps you narrow down the retransmits further.
Title
How to troubleshoot retransmits on EqualLogic or SAN HQ
Description
Retransmit errors in SAN HQ
Network issue
On many networks, there can be an imbalance in traffic between the devices that send it and the devices that receive it. This is often the case in SAN configurations, where many servers (initiators) communicate with storage devices (targets). If several senders transmit at the same time, they may exceed the throughput capacity of the receiver. When this occurs, the receiver may drop packets, forcing the senders to retransmit the data after a delay. Although no data is lost, the retransmissions increase latency and degrade I/O performance.
Flow Control addresses this problem. When enabled, Flow Control allows a receiver that senses it is being overwhelmed by packets to instruct the sender to “throttle back.” The receiver does this by sending “pause frames” to the sender, which cause the sender to pause transmission for a short period of time. This lets the receiver work through its backlog before it resumes accepting input. The delay introduced by a pause is dramatically less than the overhead caused by TCP/IP packet retransmission.
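To put “a short period of time” into numbers: in IEEE 802.3x flow control, a pause frame carries a 16-bit pause_time counted in quanta of 512 bit times, so the real pause length depends on link speed. The Python sketch below (the function name is illustrative, not part of any Dell, VMware, or switch tool) converts quanta to microseconds. Even the largest possible pause works out to about 34 ms on 1 GbE and about 3.4 ms on 10 GbE.

```python
# IEEE 802.3x pause frames carry a 16-bit pause_time measured in "quanta";
# one quantum is the time it takes to transmit 512 bits at the link speed.
QUANTUM_BITS = 512
MAX_QUANTA = 0xFFFF


def pause_duration_us(quanta: int, link_gbps: float) -> float:
    """Return how long a sender is asked to pause, in microseconds."""
    if not 0 <= quanta <= MAX_QUANTA:
        raise ValueError("pause_time must fit in 16 bits (0..65535)")
    ns_per_bit = 1.0 / link_gbps  # at N Gbit/s one bit lasts 1/N ns
    return quanta * QUANTUM_BITS * ns_per_bit / 1000.0


for gbps in (1, 10):
    one = pause_duration_us(1, gbps)
    longest_ms = pause_duration_us(MAX_QUANTA, gbps) / 1000.0
    print(f"{gbps:>2} GbE: 1 quantum = {one:.4f} us, maximum pause = {longest_ms:.2f} ms")
```

Because the unit is bit times rather than a fixed clock, the same pause_time value means a ten times shorter pause on a 10 GbE link than on a 1 GbE link.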
Solution
To build a list of the IPs that are requesting retransmitted packets, log in to the eqLoris online tool and upload the diagnostic files. This produces a report named retrans.txt. Look for the non-EqualLogic IPs and identify those reporting a retransmit rate of 0.10% or higher. On eqltoolset.us.dell, run diag-unpack to create an analysis folder containing retrans.txt. If you only have access to the raw diag files, search Report.txt for the word “retransmitted” (to find the percentage) and for “retransmit timeouts”.
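The filtering step above can be sketched in Python. The layout of retrans.txt is not documented in this post, so the sketch assumes a hypothetical format of one “IP PERCENT%” pair per line; adapt parse_rows() to the real report. The list of array IPs to exclude is an input you supply. grep_report() covers the fallback search of Report.txt for the two phrases named above.

```python
import ipaddress

# Threshold from the guidance above: flag hosts at or above 0.10 %.
THRESHOLD_PCT = 0.10


def parse_rows(text):
    """Parse lines of the hypothetical form 'IP PERCENT%'; skip anything else."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].endswith("%"):
            continue
        try:
            ip = str(ipaddress.ip_address(parts[0]))
            pct = float(parts[1].rstrip("%"))
        except ValueError:
            continue  # not an IP/percentage pair
        rows.append((ip, pct))
    return rows


def flag_initiators(rows, array_ips, threshold=THRESHOLD_PCT):
    """Return non-array IPs whose retransmit % is >= threshold, worst first."""
    arrays = {str(ipaddress.ip_address(a)) for a in array_ips}
    flagged = [(ip, pct) for ip, pct in rows if ip not in arrays and pct >= threshold]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


def grep_report(text, keywords=("retransmitted", "retransmit timeouts")):
    """Fallback for raw diags: Report.txt lines containing either phrase."""
    return [line for line in text.splitlines()
            if any(k in line.lower() for k in keywords)]


# Hypothetical sample input; 10.10.5.10 stands in for an array member IP.
sample = "10.10.5.21 1.90%\n10.10.5.22 0.05%\n10.10.5.10 0.40%\n"
print(flag_initiators(parse_rows(sample), array_ips=["10.10.5.10"]))
```

Sorting worst first matches the advice later in this post to start troubleshooting at the top of the list.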
The first and most effective step is to enable flow control on the iSCSI NICs and the iSCSI switch ports. This allows pause frames to be sent to the array during network congestion. Note: EqualLogic arrays honor pause frames but do not send them.
If this does not resolve the problem and retransmits are reported on most of the initiators, refer to the Storage Array Network Performance Guidelines technical report from Equallogic.com for general guidelines on configuring network switches, and open a service request with the switch vendor to check for firmware updates for your switch model.
If the problem appears on specific host IPs or on only a few hosts, troubleshoot the NICs and iSCSI initiators, starting at the top of the list.
For Microsoft servers:
· Start by updating the NIC drivers and firmware to the latest versions from the server OEM's site (for example Support.Dell.com, or the Cisco, IBM, or HP website). Do not use the Broadcom drivers from the Broadcom site unless a Senior Engineer specifies it.
· Enable flow control for TX and RX on the iSCSI NICs.
· If the host uses more than one iSCSI NIC, confirm that the latest HIT Kit is installed and is supported on the operating system version running on the server or VM.
· If using more than one iSCSI NIC, confirm that the multipathing policy in the HIT Kit is set to Round Robin or Least Queue Depth.
· On Windows Server 2003, confirm that the Microsoft iSCSI Initiator is version 2.08 or later. To check the initiator version, run C:\>iscsicli versioninfo and confirm that the server reports MS 2.08 11/13/2008 5.2.3790.3825 BUILD 3825 (or a later build).
· As a test, disable TOE (TCP Offload Engine) and LSO (Large Send Offload) on Broadcom NICs to see whether this reduces retransmits.
· Confirm that Interrupt Moderation is enabled. TCP retransmits may occur more frequently when Interrupt Moderation is DISABLED, especially on backup servers and high-demand server applications such as SQL.
· Check the Event Viewer System logs for any iSCSI or MPIO errors. Search Salesforce for “ms patches” to find the list of Microsoft hotfixes.
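Comparing initiator builds by eye is easy to get wrong, because dotted versions do not compare correctly as strings ("5.2.3790.999" sorts after "5.2.3790.3825"). A minimal Python sketch, assuming you pass in the version line printed by iscsicli versioninfo: it extracts the first four-part dotted number and compares it numerically against 5.2.3790.3825, the build listed above for initiator 2.08. The regex is an assumption about the output layout; if the text you pass also contains an IP address, pass only the version line.

```python
import re

# Build listed above for Microsoft iSCSI Initiator 2.08 on Windows Server 2003.
MIN_VERSION = (5, 2, 3790, 3825)
DOTTED_VERSION = re.compile(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b")


def initiator_version(text):
    """Return the first four-part dotted number in the text as a tuple, or None."""
    match = DOTTED_VERSION.search(text)
    return tuple(int(g) for g in match.groups()) if match else None


def meets_minimum(text, minimum=MIN_VERSION):
    """True if the reported version is at least `minimum`.

    Tuples compare element by element as integers, so 5.2.3790.3825 is
    correctly treated as newer than 5.2.3790.999 (a string compare would not).
    """
    version = initiator_version(text)
    return version is not None and version >= minimum


print(meets_minimum("MS 2.08 11/13/2008 5.2.3790.3825 BUILD 3825"))
```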
On VMware ESX hosts:
· Confirm that flow control is enabled on the NICs used for iSCSI: kb.vmware.com/.../search.do;cmd=displayKC&externalId=1013413
· Confirm that the Broadcom and Intel drivers are the latest available for your ESX version. Intel drivers are usually native and are updated through the ESX patch build. Broadcom drivers are updated frequently and are distributed on the Broadcom NetXtreme driver CD on the VMware website: kb.vmware.com/.../search.do;externalId=1027206
· In the VMware vm-support bundle, look at /tmp/esxcli-swiscsi-nic-list to find the driver versions of the iSCSI NICs in ESX 4.x. In ESX 5.0 and 5.1, look in /commands/localcli_iscsi-networkportal-list.txt.
· Confirm that ESX is at the latest patch build: www.vmware.com/.../download.portal
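The version-specific file locations above can be captured in a small lookup. This Python sketch assumes bundle_root is the directory where you extracted the vm-support bundle (a hypothetical parameter name), and it only knows the ESX 4.x, 5.0, and 5.1 paths given in this post; any other version raises ValueError instead of guessing a location.

```python
from pathlib import Path

ESX4_NIC_LIST = "tmp/esxcli-swiscsi-nic-list"
ESX5_NIC_LIST = "commands/localcli_iscsi-networkportal-list.txt"


def nic_list_path(bundle_root, esx_version):
    """Map an ESX version such as '4.1' or '5.1.0' to the file to inspect."""
    parts = esx_version.split(".")
    if parts[0] == "4":
        relative = ESX4_NIC_LIST
    elif parts[:2] in (["5", "0"], ["5", "1"]):
        relative = ESX5_NIC_LIST
    else:
        # Only the versions covered in this post are mapped; do not guess others.
        raise ValueError(f"no known NIC list location for ESX {esx_version}")
    return Path(bundle_root) / relative


def read_nic_list(bundle_root, esx_version):
    """Return the file's text, or None if the bundle does not contain it."""
    path = nic_list_path(bundle_root, esx_version)
    return path.read_text(errors="replace") if path.is_file() else None
```

Returning None for a missing file, rather than raising, makes it easy to loop over several bundles and report which ones lack the NIC list.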
For Linux hosts:
· Enable pause frames with ethtool -A ethX rx on tx on
· Install the Linux HIT Kit if more than one iSCSI NIC is installed on the host.
· Check the kernel changelog for network fixes or NIC driver updates.
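Two host-side checks for Linux, sketched in Python under stated assumptions. pause_enabled() parses the text printed by ethtool -a ethX (the query counterpart of ethtool -A) and reports whether both RX and TX pause are on. tcp_retransmit_pct() reads the two Tcp: lines of /proc/net/snmp and divides RetransSegs by OutSegs. These counters are cumulative since boot and their exact semantics have varied between kernel versions, so treat the result as a rough host-side indicator, not the figure SAN HQ reports.

```python
def pause_settings(ethtool_a_output):
    """Parse `ethtool -a ethX` text into e.g. {'Autonegotiate': 'off', 'RX': 'on', 'TX': 'on'}."""
    settings = {}
    for line in ethtool_a_output.splitlines():
        key, sep, value = line.partition(":")
        # Keep only 'Name: on/off' lines; the 'Pause parameters for ...' header is skipped.
        if sep and value.strip() in ("on", "off"):
            settings[key.strip()] = value.strip()
    return settings


def pause_enabled(ethtool_a_output):
    """True only if both RX and TX pause are reported as 'on'."""
    settings = pause_settings(ethtool_a_output)
    return settings.get("RX") == "on" and settings.get("TX") == "on"


def tcp_retransmit_pct(snmp_text):
    """RetransSegs as a percentage of OutSegs, from /proc/net/snmp text.

    The file has two 'Tcp:' lines: field names first, then the values.
    """
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    names, values = tcp_lines[0], tcp_lines[1]
    stats = dict(zip(names, (int(v) for v in values)))
    if stats["OutSegs"] == 0:
        return 0.0
    return 100.0 * stats["RetransSegs"] / stats["OutSegs"]
```

For example, pass subprocess.run(["ethtool", "-a", "eth1"], capture_output=True, text=True).stdout to pause_enabled(), and the contents of /proc/net/snmp to tcp_retransmit_pct().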
-Joe