NetWorker: RPS-enabled cloning fails after upgrade to 19.11 if the Server has reverse DNS state set to banned
Summary: This article describes a defect under investigation by NetWorker engineering.
Symptoms
After upgrading to NetWorker 19.11, clone jobs appear unresponsive and log the following messages in a loop:
01/13/25 16:51:19.000291 nsrclone-D5 find_clone_backend_job(): ENTER
01/13/25 16:51:19.000323 nsrclone-D5 extend_mmd_reservation_all_clone_backend_jobs: ENTER
01/13/25 16:51:19.000335 nsrclone-D5 extend_mmd_reservation_all_clone_backend_jobs: EXIT
01/13/25 16:51:20.001007 nsrclone-D5 extend_mmd_reservation_all_clone_backend_jobs: ENTER
01/13/25 16:51:20.001070 nsrclone-D5 extend_mmd_reservation_all_clone_backend_jobs: EXIT
01/13/25 16:51:21.000097 nsrrecopy-D3 main 0x342e850 wait timed out (locked)
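As a rough aid for spotting this pattern, the sketch below counts the repeated reservation-extension entries and checks for the nsrrecopy lock timeout in a saved debug log. The function names are hypothetical, the log file location is an assumption (pass whichever file holds the clone job's debug output), and the search strings are taken verbatim from the messages above.

```shell
# Count how many times nsrclone entered the reservation-extension routine.
# $1 is the path to a saved debug log (location is an assumption; adjust as needed).
count_reservation_loops() {
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c 'extend_mmd_reservation_all_clone_backend_jobs: ENTER' "$1"
}

# Succeed (exit status 0) if nsrrecopy logged the lock timeout shown above.
has_recopy_timeout() {
    grep -q 'nsrrecopy.*wait timed out (locked)' "$1"
}
```

For example, `count_reservation_loops /path/to/clone_debug.log` (hypothetical path) run twice a few seconds apart should show a growing count while the job makes no progress, which is consistent with the symptom but not proof of this defect on its own.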
The problem appears when:
- Server has reverse DNS state: banned set in the Local Agent database (NSRLADB)
- Clone job is configured to use a remote Storage Node instead of the server as the Source (Read) node
- Clone job requires RPS - either set globally in the NSR (Server) resource (Disable RPS Clone: No) or automatically invoked due to save set type (vProxy/OAPP)
The job does not complete; it either fails or must be aborted.
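To check whether the first condition applies, the current reverse DNS state value can be read from the NSRLA resource on the server. The sketch below is a Linux-syntax example, not a vendor-supplied procedure: the function names are hypothetical, it reuses the `nsradmin -p nsrexec -i -` invocation from this article with the standard `show` and `print` commands, and the parser assumes nsradmin prints attributes as `name: value;` lines.

```shell
# Print the NSRLA resource, restricted to the reverse DNS state attribute.
# Run on the NetWorker server from an elevated shell.
show_reverse_dns_state() {
    printf '. type: nsrla\nshow reverse DNS state\nprint\n' | nsradmin -p nsrexec -i -
}

# Extract the attribute value from nsradmin output read on stdin.
# Assumes a line of the form "reverse DNS state: <value>;" (format is an assumption).
parse_reverse_dns_state() {
    sed -n 's/^[[:space:]]*reverse DNS state:[[:space:]]*\([^;]*\);.*$/\1/p'
}
```

Running `show_reverse_dns_state | parse_reverse_dns_state` would then print the configured value, for example banned, if the output matches the assumed format.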
Cause
The cause appears to be related to communications changes in NetWorker 19.11. The new reverse DNS state value lets Administrators remove reverse lookup-matching requirements, which have been part of NetWorker since its initial releases.
However, this major change appears to have introduced issues which are under investigation. Although reverse DNS state is not "banned" by default, Administrators using it on the server face issues with RPS cloning when a separate Storage Node is used.
Resolution
The issue is being investigated under defect NETWORKER-111382. The fix will appear in NetWorker 19.11.0.6 and NetWorker 19.12.0.2.
In the short-term, there are three potential workarounds for the issue:
- Use reverse DNS state: cached or uncached instead of banned on the server. If you currently rely on the banned setting for backups of clients that are not reverse-resolvable, you must ensure that reverse DNS lookup zone entries exist for those clients' IP addresses and are queryable by the NetWorker server and nodes, so that those backups continue to work. To change this setting on the server, at an elevated command prompt, run:
Windows:
(echo . type: nsrla & echo upd reverse DNS state: cached) | nsradmin -p nsrexec -i -
Linux:
printf ". type: nsrla\nupd reverse DNS state: cached\n" | nsradmin -p nsrexec -i -
Then restart services following the change:
Linux:
nsr_shutdown
systemctl start networker
Windows:
net stop nsrexecd /y
net start nsrd
net start gstd
*Starting gstd is only required if the NMC server is installed on the same host as the NetWorker server.
- Change the Source and Destination nodes in the clone Action to use the server (nsrserverhost) instead of a storage node, if possible. For Data Domain clone jobs, the storage node is largely irrelevant since the Data Domains themselves handle the data traffic; the job is contingent only upon the server's access to each Data Domain.
- Disable RPS globally.
To do this, on the server, at an elevated command prompt, run:
Windows:
(echo . type: nsr & echo upd Disable RPS Clone: Yes) | nsradmin -i -
Linux:
printf ". type: nsr\nupd Disable RPS Clone: Yes\n" | nsradmin -i -
A service restart is not required - the next clone job should start with RPS disabled.
Additional Information
For similar issues with NetWorker 19.11 dealing with backup failures related to the new reverse DNS state settings, see: NetWorker: After upgrading to 19.11, backup fails reporting "Hostname resolution failed"