Avamar: NDMP NetApp backup times out after setting the TCP/IP parameter
Summary: NetApp NDMP backups time out when the accelerator node sets the TCP window size to 1 (win=1) and the NetApp side closes the connection.
Symptoms
The Network Data Management Protocol (NDMP) accelerator is slow to request more data after it sets the Transmission Control Protocol/Internet Protocol (TCP/IP) window size to 1 (win=1). As a result, the NetApp network-attached storage (NAS) system experiences a connection timeout and closes the connection.
NetApp NDMP Backup Issues:
NetApp NAS backups may complete with exceptions, and the NetApp TCP/IP connection may time out and close.
The following messages may appear in the avndmp logs:
2019-02-13 10:51:11 avndmp Info <0000>: [snapup-/X/Y] NDMP: DUMP: Wed Feb 13 10:51:11 2019 : We have written 4340067215 KB.
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: NDMP: DUMP: Message from Write Dirnet: Interrupted system call
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: DUMP: DUMP IS ABORTED
2019-02-13 10:56:37 avndmp Info <0000>: [snapup-/X/Y] NDMP: DUMP: Deleting "/X/Y/../snapshot_for_backup.53960" snapshot.
Also, the following messages may appear on the avtar side:
2019-02-13 10:56:39 avtar Info <7061>: Canceled by '7003-Netapp Filer' - exiting...
2019-02-13 10:56:39 avtar Info <9772>: Starting graceful (staged) termination, cancel request (wrap-up stage)
2019-02-13 10:56:39 avtar Info <19165>: Staging can run is false, possibly due to cancel, inform ddboost
The issue may occur for random backups and on random volumes.
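When triaging a failed backup, it can help to measure how long the dump was stalled before it aborted. The following is a minimal sketch, assuming the avndmp log uses the timestamp and message layout shown in the excerpts above; the function name `stall_before_abort` and the variable `SAMPLE_LOG` are illustrative and not part of Avamar.

```python
import re
from datetime import datetime

# Layout assumed from the avndmp excerpts above:
# "<date> <time> avndmp <Level> <code>: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) avndmp (\w+) <\d+>: (.*)$"
)

def stall_before_abort(log_text):
    """Return the seconds between the last progress line and the abort, or None."""
    last_progress = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(3)
        if "We have written" in message:
            last_progress = stamp  # remember the most recent progress report
        elif "DUMP IS ABORTED" in message and last_progress is not None:
            return (stamp - last_progress).total_seconds()
    return None

# Lines copied from the log excerpt above.
SAMPLE_LOG = """\
2019-02-13 10:51:11 avndmp Info <0000>: [snapup-/X/Y] NDMP: DUMP: Wed Feb 13 10:51:11 2019 : We have written 4340067215 KB.
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: NDMP: DUMP: Message from Write Dirnet: Interrupted system call
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: DUMP: DUMP IS ABORTED
"""

print(stall_before_abort(SAMPLE_LOG))  # 321.0
```

For the sample lines, the interval between the last progress message and the abort is 321 seconds.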
Cause
The issue appears to be caused by the accelerator node becoming busy. When this happens, it sets the TCP window size to 1 (win=1), indicating that it cannot receive any more data.
The NetApp side continues trying to send data, which the accelerator node rejects. Eventually, the NetApp TCP/IP connection times out and is closed.
The option for TCP retransmissions could not be set on the NetApp side, which contributed to the issue.
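If a packet capture of the NDMP data connection is available, the small window advertisements described above can be located with a short script. This is a sketch only, assuming the capture was saved as text in tcpdump's default one-line format, where the advertised window appears as `win N`. The sample lines, addresses, and ports are made up for illustration, and `small_window_lines` is a hypothetical helper name.

```python
import re

# tcpdump prints the advertised TCP window as "win <number>".
WIN_RE = re.compile(r"\bwin (\d+)\b")

def small_window_lines(capture_text, threshold=1):
    """Return capture lines whose advertised window is at or below threshold."""
    hits = []
    for line in capture_text.splitlines():
        match = WIN_RE.search(line)
        if match and int(match.group(1)) <= threshold:
            hits.append(line)
    return hits

# Made-up capture lines (documentation addresses, hypothetical ports).
SAMPLE_CAPTURE = """\
10:51:12.000001 IP 192.0.2.10.10000 > 192.0.2.20.45000: Flags [.], ack 1000, win 1, length 0
10:51:12.000002 IP 192.0.2.20.45000 > 192.0.2.10.10000: Flags [P.], seq 1000:1001, ack 1, win 512, length 1
"""

for hit in small_window_lines(SAMPLE_CAPTURE):
    print(hit)
```

Only the first sample line is reported, because it is the only one that advertises a window of 1.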
Error messages in the avndmp logs show:
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: NDMP: DUMP: Message from Write Dirnet: Interrupted system call
2019-02-13 10:56:32 avndmp Error <0000>: [snapup-/X/Y] NDMP: DUMP: DUMP IS ABORTED
Resolution
NDMP Backup Resolution
To resolve the issue of NetApp timing out and closing the connection during NDMP backup, follow these steps:
On the Avamar side, add the following option to force avndmp to continue reading data even if the application is busy:
backup-stream-buffering-period=1
This can be put in the /usr/local/avamar/var/CLIENT_NAME/avndmp.cmd file on the accelerator node.
The avndmp.cmd file would contain the following line:
--backup-stream-buffering-period=1
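Where the flag has to be added on several accelerator nodes, the edit to avndmp.cmd can be scripted. The following is a minimal sketch, assuming the file holds one option per line as shown above; `ensure_flag` is a hypothetical helper, not an Avamar tool, and CLIENT_NAME stays a placeholder that must be replaced with the real client directory.

```python
from pathlib import Path

FLAG = "--backup-stream-buffering-period=1"

def ensure_flag(cmd_file, flag=FLAG):
    """Append flag to cmd_file unless an identical line already exists.

    Returns True if the file was changed, False if the flag was already present.
    """
    path = Path(cmd_file)
    lines = path.read_text().splitlines() if path.exists() else []
    if any(line.strip() == flag for line in lines):
        return False  # already configured, leave the file untouched
    lines.append(flag)
    path.write_text("\n".join(lines) + "\n")
    return True

# Example call (replace CLIENT_NAME with the actual client directory):
# ensure_flag("/usr/local/avamar/var/CLIENT_NAME/avndmp.cmd")
```

The check for an existing line makes the helper safe to run more than once, so the option is never duplicated in the file.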
Alternatively, add the option to the dataset under Advanced Options:
parameter: [avndmp]backup-stream-buffering-period value: 1
After adding this option, verify that the issue has been resolved by checking the avndmp logs for any error messages related to the connection being closed.
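The log check can be sketched as a small script that looks for the two error signatures quoted in this article. It assumes the avndmp log has been read into a string; `find_timeout_errors` and `ERROR_SIGNATURES` are illustrative names, and the list of signatures covers only the messages shown above.

```python
# Error texts taken from the avndmp excerpts in this article.
ERROR_SIGNATURES = (
    "Message from Write Dirnet: Interrupted system call",
    "DUMP IS ABORTED",
)

def find_timeout_errors(log_text):
    """Return avndmp Error lines that contain one of the known signatures."""
    return [
        line
        for line in log_text.splitlines()
        if " avndmp Error " in line
        and any(signature in line for signature in ERROR_SIGNATURES)
    ]

# Example: read a log file and report whether the signatures are still present.
# with open("path/to/avndmp.log") as handle:   # path is a placeholder
#     remaining = find_timeout_errors(handle.read())
#     print("clean" if not remaining else "\n".join(remaining))
```

An empty result for backups run after the change suggests the connection is no longer being closed, although other errors in the log still need to be reviewed separately.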