NetWorker provides 'Inactivity Timeout' attributes in several locations:
- NetWorker server job inactivity timeout:
NOTE: The Job Inactivity Timeout sets how long NetWorker waits for a job response. If no response arrives in time, the server ends the job. The job inactivity timeout applies to all actions defined in all workflows in a policy. The default setting is zero (no timeout).
- Action inactivity timeout:
NOTE: This specifies the maximum number of minutes that a job run by an action can take to respond to the server. If a job does not respond in time, the server marks it as failed, and NetWorker retries the job immediately so that no time is lost due to the failure. Inactivity might occur for backups of large save sets, backups of save sets with large sparse files, and incremental backups of many small static files. The Inactivity Timeout option applies to probe actions, and the backup actions for the Traditional and Snapshot action types. If you specify a value for this option in other actions, NetWorker ignores the value.
The NetWorker Administration Guide provides additional information regarding "Inactivity Timeout" settings: NetWorker Documentation
In general, an inactivity timeout can have many causes.
Inactivity timeout errors often occur when the client scans a large file system. If few files have changed, the client may spend a long time scanning before it sends any data, so the next save stream may be delayed significantly. This condition can be resolved by increasing the value of the 'Inactivity Timeout' attribute for the NetWorker backup action that is reporting the error. To find a suitable value, temporarily set the attribute to 0 to disable the timeout, observe how long the save set takes, and then set the inactivity timeout to at least that number of minutes.
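The sizing step above is simple arithmetic, and a minimal sketch of it follows. The helper name suggest_timeout_minutes and the optional safety margin are assumptions made for illustration; they are not part of NetWorker. The duration is whatever you observed for the save set, expressed in seconds.

```shell
#!/bin/sh
# Hypothetical helper (not a NetWorker command): convert an observed save set
# duration in seconds into whole minutes, rounded up, plus an optional
# safety margin in percent. The margin is an assumption, not a NetWorker rule.
suggest_timeout_minutes() {
    duration_seconds=$1
    margin_percent=${2:-0}
    # Round the observed duration up to whole minutes.
    minutes=$(( (duration_seconds + 59) / 60 ))
    # Apply the margin, rounding up again.
    echo $(( (minutes * (100 + margin_percent) + 99) / 100 ))
}

# Example: a save set observed to run for 1 h 30 min, with a 20% margin.
suggest_timeout_minutes 5400 20   # prints 108
```

The resulting number is the value to enter in the 'Inactivity Timeout' attribute of the backup action, replacing the temporary 0.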
If the timeout is not due to the reason listed above, then confirm the following items:
- The NetWorker client has not been turned off and its network cable is attached.
- Retries of the backup always fail.
- All name resolutions between the NetWorker client and backup server are successful: NetWorker: Name Resolution Troubleshooting Best Practices
- All known aliases for the NetWorker client are entered in the alias attribute of the NetWorker client resource.
- All network cards and switches have the same Duplex, Speed, MTU, and other settings.
- Parallelism settings are properly defined. See the NetWorker Performance Optimization and Planning Guide: NetWorker Documentation
- The adjustable TCP/IP parameters are set correctly. See the NetWorker Performance Optimization and Planning Guide: NetWorker Documentation
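For the name resolution item in the list above, a quick forward and reverse lookup can be scripted. This is a minimal sketch that assumes a Linux host where getent is available; check_resolution is a hypothetical helper and not a NetWorker tool. It only tests what the local resolver returns, so run it on both the NetWorker server and the client.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetWorker): forward lookup of a name,
# then reverse lookup of the address that came back.
check_resolution() {
    host=$1
    # Forward lookup: first address the resolver returns for the name.
    addr=$(getent hosts "$host" | awk 'NR == 1 { print $1 }')
    if [ -z "$addr" ]; then
        echo "forward lookup failed: $host"
        return 1
    fi
    # Reverse lookup: first name the resolver returns for that address.
    name=$(getent hosts "$addr" | awk 'NR == 1 { print $2 }')
    if [ -z "$name" ]; then
        echo "reverse lookup failed: $host -> $addr"
        return 1
    fi
    echo "$host -> $addr -> $name"
}
```

For example, check_resolution client.example.com (a placeholder hostname) prints the name, the address, and the name the address maps back to. A reverse name that differs from the one you started with is a candidate for the alias attribute of the client resource.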
If all the above statements are true, check the following known contributing factors that can also cause this problem.
Firewalls:
If a firewall separates the NetWorker server and client, open enough ports within the NetWorker connection range to allow proper communication. The NetWorker server may reach the client, but the client might fail to open new ports to respond.
See: NetWorker Processes and Ports
NOTE: Consult with your network or security team to monitor firewall traffic during the action that is failing with timeout errors, and monitor for any communication issues.
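Before involving the network team, a basic TCP reachability test can show whether a port is blocked in one direction. The sketch below assumes bash and the coreutils timeout command are installed; port_open and check_ports are hypothetical helpers. The ports to test depend on your configured NetWorker port ranges; 7937 and 7938 are used in the usage note only as commonly cited defaults and should be confirmed for your environment.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds. Relies on bash's /dev/tcp redirection.
port_open() {
    host=$1
    port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Hypothetical helper: report the state of each listed port on one host.
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
}
```

For example, check_ports client.example.com 7937 7938 (placeholder hostname) run on the NetWorker server tests the server-to-client direction. Run the same check on the client against the server, because the failure described above is often in the client-to-server direction.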
NetWorker Configuration:
An improperly configured 'Execution Path' attribute in the NetWorker client resource has been seen to cause these Inactivity Timeout errors. Configure the 'Execution Path' attribute correctly or leave it blank.
Corruption:
- Corruption of files on the disk has been seen to cause this issue. Verify the checksum or date of the files to ensure their integrity. Remove, move, or fix the corrupted file, or use a NetWorker directive to skip it.
- Corruption of the NetWorker resource file, especially the backup or clone action resource, occasionally causes inactivity timeout errors. Try deleting the failing backup or clone action's resource and re-creating it. You might also try recovering the NetWorker resource file from a time before the errors started. This typically requires NetWorker support engagement to perform NetWorker Server Disaster Recovery (NSRDR).
- Corruption of NetWorker Client File Indexes or the Media database is known to result in inactivity timeout errors. Run nsrck -L6 to attempt to repair indexes or run nsrck -L7 to recover indexes. For the media database, corruption might be fixed by recovering an older media database. This typically requires NetWorker support engagement to perform NetWorker Server Disaster Recovery (NSRDR).
When all of the above recommendations have been exhausted, and before moving on to other troubleshooting, it is highly recommended to:
- Restart NetWorker daemons to clear caches that may be out of date.
Linux:
nsr_shutdown
systemctl start networker
OR
/etc/init.d/networker start
Windows:
net stop nsrexecd /y
For NetWorker servers:
net start nsrd
For NetWorker Management Console Servers:
net start nsrexecd
net start gstd
For NetWorker client systems:
net start nsrexecd
- Reboot the server and client, which clears stale data from memory and reinitializes all applications and the OS.
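After a restart or reboot on Linux, it can help to confirm that the daemons came back before re-running the failing action. The following is a minimal sketch that assumes pgrep is available; daemon_status is a hypothetical helper, and the daemon names passed to it are the ones named in the restart steps above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a process with exactly the given
# name is running, for each name passed in.
daemon_status() {
    for daemon in "$@"; do
        if pgrep -x "$daemon" >/dev/null 2>&1; then
            echo "$daemon running"
        else
            echo "$daemon not running"
        fi
    done
}
```

For example, daemon_status nsrexecd nsrd on a NetWorker server, or daemon_status nsrexecd on a client. A daemon reported as not running shortly after the restart is worth investigating before looking further at the timeout itself.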