Avamar: NDMP NetApp backup times out after setting the TCP/IP parameter
Summary: NDMP NetApp backup times out after setting the TCP/IP parameter
Symptoms
The Network Data Management Protocol (NDMP) accelerator is slow to request more data after it sets the TCP window size to 1 (win=1). As a result, the NetApp network-attached storage (NAS) system experiences a connection timeout and closes the connection.
NetApp NDMP Backup Issues
NetApp NAS backups may complete with exceptions, and the NetApp TCP/IP connection may time out and close.
The following messages may appear in the avndmp logs:
2019-02-13 10:51:11 avndmp Info <0000>: [snapup-/X/Y] NDMP: DUMP: Wed Feb 13 10:51:11 2019 : We have written 4340067215 KB.
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: NDMP: DUMP: Message from Write Dirnet: Interrupted system call
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: DUMP: DUMP IS ABORTED
2019-02-13 10:56:37 avndmp Info <0000>: [snapup-/X/Y] NDMP: DUMP: Deleting "/X/Y/../snapshot_for_backup.53960" snapshot.
2019-02-13 10:56:39 avtar Info <7061>: Canceled by '7003-Netapp Filer' - exiting...
2019-02-13 10:56:39 avtar Info <9772>: Starting graceful (staged) termination, cancel request (wrap-up stage)
2019-02-13 10:56:39 avtar Info <19165>: Staging can run is false, possibly due to cancel, inform ddboost
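The gap between the last progress message and the first error in the excerpt above shows how long the transfer was stalled before the dump was aborted. The following is a minimal sketch of that arithmetic, assuming every log line starts with a timestamp in the format shown above. The helper names log_timestamp and stall_seconds are illustrative and are not part of Avamar.

```python
from datetime import datetime


def log_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def stall_seconds(progress_line, error_line):
    """Return the number of seconds between two log lines."""
    delta = log_timestamp(error_line) - log_timestamp(progress_line)
    return delta.total_seconds()


# Lines copied from the log excerpt in this article.
progress = ("2019-02-13 10:51:11 avndmp Info <0000>: [snapup-/X/Y] NDMP: DUMP: "
            "Wed Feb 13 10:51:11 2019 : We have written 4340067215 KB.")
error = ("2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: NDMP: DUMP: "
         "Message from Write Dirnet: Interrupted system call")

print(stall_seconds(progress, error))  # 321.0 seconds, about 5 minutes 21 seconds
```

In this excerpt the stall lasted a little over five minutes. That figure is only what this particular log shows and should not be read as a documented timeout value.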
Cause
Root Cause
The issue appears to occur when the accelerator node becomes busy. When this happens, it sets the TCP window size to 1 (win=1), indicating that it cannot receive any more data.
The NetApp side keeps trying to send data, but the accelerator node does not accept it. Eventually, the NetApp TCP/IP connection times out and is closed.
No option for TCP retransmissions could be set on the NetApp side, which contributed to the issue.
The avndmp logs show the following error messages:
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: NDMP: DUMP: Message from Write Dirnet: Interrupted system call
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: DUMP: DUMP IS ABORTED
Resolution
NDMP Backup Resolution
To resolve the issue of NetApp timing out and closing the connection during NDMP backup, follow these steps:
On the Avamar side, add the following option to force avndmp to continue reading data even if the application is busy:
--[avndmp]backup-stream-buffering-period=1
The option can be added in the dataset, under Advanced Options, or on the accelerator node, in the avndmp.cmd file under the client directory (/usr/local/avamar/var/CLIENT/avndmp.cmd).
For example, the avndmp.cmd file should contain the following line:
--backup-stream-buffering-period=1
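When the option is added to avndmp.cmd on the accelerator node, it helps to avoid adding the same line twice. The following is a minimal sketch of an idempotent edit, assuming the file holds one option per line. The helper name ensure_option is illustrative and is not an Avamar tool, and the path in the usage comment keeps the CLIENT placeholder from this article.

```python
from pathlib import Path

OPTION = "--backup-stream-buffering-period=1"


def ensure_option(cmd_file, option=OPTION):
    """Append option to cmd_file unless an identical line already exists.

    Returns True if the file was changed, False if the option was already there.
    """
    path = Path(cmd_file)
    lines = path.read_text().splitlines() if path.exists() else []
    if option in (line.strip() for line in lines):
        return False
    lines.append(option)
    path.write_text("\n".join(lines) + "\n")
    return True


# Example call (assumption: replace CLIENT with the actual client directory):
# ensure_option("/usr/local/avamar/var/CLIENT/avndmp.cmd")
```

Back up the existing avndmp.cmd file before changing it, so that the previous settings can be restored if needed.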
After adding this option, verify that the issue has been resolved by checking the avndmp logs for any error messages related to the connection being closed.
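The log check can be sketched as a small filter that looks for the two error signatures quoted in this article. This is only an illustration: the function name find_abort_lines is hypothetical, and the log file location in the usage comment is an assumption that depends on the environment.

```python
# Error signatures taken from the avndmp log excerpt in this article.
ABORT_MARKERS = ("Interrupted system call", "DUMP IS ABORTED")


def find_abort_lines(log_lines):
    """Return avndmp error lines that contain one of the abort signatures."""
    hits = []
    for line in log_lines:
        if " avndmp Error " in line and any(m in line for m in ABORT_MARKERS):
            hits.append(line.rstrip("\n"))
    return hits


# Example usage (assumption: avndmp.log is a copy of the log for the backup run):
# with open("avndmp.log") as handle:
#     for hit in find_abort_lines(handle):
#         print(hit)
```

An empty result for a backup run after the change suggests that the connection was no longer closed during the dump. Also confirm that the backup itself completed without exceptions.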