NetWorker: NDMP restore fails with "Timeout reading"
Summary: A NetWorker Network Data Management Protocol (NDMP) Direct Access Recovery (DAR) fails with the error "Timeout reading".
Symptoms
A large NDMP DAR recovery fails with the following errors:
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42589:nsrndmp_recover:ssid'582506817': reply message for sequence 8 is not received.
42590:nsrndmp_recover:ssid'582506817': Timeout to receive any message from server.
42596:nsrndmp_recover:ssid'582506817': data start recover: communication failure.
42849:nsrndmp_recover:ssid'582506817': Unable to start the NDMP restore process.
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42589:nsrndmp_recover:ssid'582506817': reply message for sequence 9 is not received.
42590:nsrndmp_recover:ssid'582506817': Timeout to receive any message from server.
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42572:nsrndmp_recover:ssid'582506817': Timeout reading
42589:nsrndmp_recover:ssid'582506817': reply message for sequence 10 is not received.
42590:nsrndmp_recover:ssid'582506817': Timeout to receive any message from server.
42596:nsrndmp_recover:ssid'582506817': data abort: communication failure.
42871:nsrndmp_recover:ssid'582506817': Error during File NDMP Extraction.
86724:nsrdsa_recover: DSA listening at: host '<backup server>', IP address '<ip address>', port '<port>'.
nsrdsa_recover : Aborted
42840:nsrndmp_recover:ssid'582506817': NDMP recover failed.
42880:nsrndmp_recover:ssid'582506817': Error during NDMP recover
16279:recover: NDMP retrieval: child failed with status of 1
Cause
The root cause is unknown; however, it is possibly a NetApp limitation in reading a large nlist (the list of files to recover that is sent in the NDMP recover request).
For best performance, it is always recommended to limit a DAR recovery to a small amount of data and a small number of files.
In this case, the attempted recovery was nearly 500 GB and over 192,000 files, so the timeout is most likely caused by the NetApp taking too long to process such a large nlist.
Resolution
As a workaround, limit the size of the nlist from the NetWorker side by setting the NSR_NDMP_RESTORE_LIMIT environment variable. The variable is documented in the Dell NetWorker for Network Data Management Protocol User Guide. NetWorker version-specific documentation is available through: Support for NetWorker | Manuals & Documents
Example that limits each nlist to 10,000 files: NSR_NDMP_RESTORE_LIMIT=10000
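The effect of the limit can be pictured with a short sketch. This is only an illustration, assuming the variable simply caps the number of entries in each nlist sent to the NDMP server; the actual batching is internal to NetWorker and may differ. The function name split_nlist and the file paths are hypothetical, and the file count is rounded to the 192,000 reported above.

```python
def split_nlist(paths, limit):
    """Yield consecutive batches of at most `limit` paths (illustrative only)."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    for start in range(0, len(paths), limit):
        yield paths[start:start + limit]


# Hypothetical file list sized like the failing recovery (about 192,000 entries).
paths = [f"/vol/vol1/file_{i}" for i in range(192_000)]

# With a limit of 10,000 the list becomes 20 batches; the last one holds 2,000 paths.
batches = list(split_nlist(paths, 10_000))
print(len(batches), len(batches[0]), len(batches[-1]))  # prints: 20 10000 2000
```

Under this assumption, the NetApp receives at most 10,000 entries per request instead of one list of more than 192,000.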
Note: Setting the variable in the nsrrc file did not take effect. Set it as a system-wide environment variable instead:
Linux: From an elevated prompt, edit /etc/environment. Add NSR_NDMP_RESTORE_LIMIT=10000.
Windows: From an Administrator command/PowerShell prompt: setx /m NSR_NDMP_RESTORE_LIMIT 10000
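After the change, it can help to confirm that the variable is visible and holds a sensible value. The following is a minimal sketch, not a NetWorker tool: the helper name read_restore_limit is hypothetical, and it only inspects the environment of the process that runs it. Both /etc/environment and setx /m affect new sessions and processes only, so run the check from a fresh login or a newly opened prompt.

```python
import os


def read_restore_limit(env=os.environ):
    """Return NSR_NDMP_RESTORE_LIMIT as a positive int, or None if it is unset."""
    raw = env.get("NSR_NDMP_RESTORE_LIMIT")
    if raw is None:
        return None
    value = int(raw.strip())  # raises ValueError if the value is not an integer
    if value <= 0:
        raise ValueError("NSR_NDMP_RESTORE_LIMIT must be a positive integer")
    return value


if __name__ == "__main__":
    limit = read_restore_limit()
    if limit is None:
        print("NSR_NDMP_RESTORE_LIMIT is not set in this session")
    else:
        print(f"NSR_NDMP_RESTORE_LIMIT={limit}")
```

A result of "not set" in a new session suggests that the edit to /etc/environment or the setx command did not apply as intended.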