NetWorker: Name Resolution Troubleshooting Best Practices
Summary: Troubleshooting guide for Domain Name System (DNS) related issues in NetWorker.
Instructions
NetWorker depends on name resolution. If name resolution is not correct and entirely consistent, problems may arise in many of NetWorker's operations. Since NetWorker manages potentially sensitive data, it must verify, by various means, the identities of the hosts it interacts with.
Any number of symptoms in NetWorker can be the result of name resolution problems:
- Error messages indicating forward or reverse name lookup problems.
- Inability to probe clients during backup
- Inability of clients to manually save to the server or recover.
- Problems cloning or accessing Storage Node devices
- Browsing or media database record issues.
- Server or Storage Node stops responding at startup or during regular operation.
- Misnamed or nested index directories
- Misconfigured client errors
Name Resolution Workflow
A hostname used by a command or in internal configuration must be resolved to an IP address before it can be used. The following resources are checked, in the following order, to see if the name-to-IP mapping is already available, halting at the first match.
- NetWorker name cache: Most major NetWorker daemons; configurable lifetime in nsrla database
- Local host resolver cache: Varies by Operating System and defers load from hosts or DNS lookups
- Local hosts file entries: Fast local lookup, but manually maintained; useful to override DNS resolution
- DNS server lookups: Industry preferred due to centralized administration, but slower
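The lookup order above can be pictured as a chain of layers that stops at the first answer. The following is a minimal sketch, not NetWorker code: the dictionaries are hypothetical in-memory stand-ins for the four layers, and the names and addresses are made up for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each layer is a (label, lookup) pair; lookup returns an IP string or None.
Layer = Tuple[str, Callable[[str], Optional[str]]]


def resolve_in_order(name: str, layers: Sequence[Layer]) -> Optional[Tuple[str, str]]:
    """Return (layer_label, ip) from the first layer that knows the name."""
    for label, lookup in layers:
        ip = lookup(name)
        if ip is not None:
            return label, ip  # halt here; later layers are never asked
    return None


# Hypothetical stand-ins for the four layers (illustrative data only).
networker_cache = {}
resolver_cache = {}
hosts_file = {"client1": "10.0.0.5"}
dns = {"client1": "10.0.0.99", "server1": "10.0.0.1"}

layers = [
    ("networker cache", networker_cache.get),
    ("resolver cache", resolver_cache.get),
    ("hosts file", hosts_file.get),
    ("dns", dns.get),
]

print(resolve_in_order("client1", layers))  # the hosts file answers before DNS
print(resolve_in_order("server1", layers))  # only DNS knows this name
```

In this sketch the hosts file entry for client1 hides the different DNS record, which is the override behavior described in the list above.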
1. NetWorker Caching:
NetWorker daemons maintain internal name caches. Clients cache resolved names in nsrexecd, while core daemons like nsrd and nsmmdbd keep their own caches. This is the first IP table checked, and the fastest. The internal cache lifetime can be set in each NetWorker host's nsrla database using nsradmin:
Linux/UNIX
printf ". type: nsrla\nshow positive DNS cache TTL; negative DNS cache TTL\nprint\n" | nsradmin -p nsrexec -s remote_host
Windows
(echo . type: nsrla & echo show positive DNS cache TTL; negative DNS cache TTL & echo print) | nsradmin -p nsrexec -s remote_host
By default, this returns 30 minutes (1800 seconds):
positive DNS cache TTL: 1800; negative DNS cache TTL: 1800;
This value controls how long NetWorker keeps entries in the process cache before deliberately purging them in favor of updated information from the next layers, queried in sequence. Raising it is therefore appropriate for environments where DNS lookup is slow but DNS addressing is relatively static. Conversely, lower values may be desirable for environments with frequently changing addresses.
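To illustrate the trade-off between the two TTL values, here is a toy cache with separate lifetimes for successful and failed lookups. It is a sketch under stated assumptions, not NetWorker's implementation: the class name, the per-entry expiry, and the injectable clock are all hypothetical choices made so the behavior can be demonstrated without waiting 30 minutes.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlNameCache:
    """Toy name cache with separate lifetimes for hits and misses."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[str]],
        positive_ttl: float = 1800,
        negative_ttl: float = 1800,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup          # next layer (hosts file, DNS, ...)
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        # name -> (ip or None, expiry timestamp)
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def resolve(self, name: str) -> Optional[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]           # still fresh: next layer is not consulted
        ip = self._lookup(name)        # expired or unknown: ask the next layer
        ttl = self._positive_ttl if ip is not None else self._negative_ttl
        self._entries[name] = (ip, now + ttl)
        return ip

    def flush(self) -> None:
        """Drop every entry so the next resolve() goes to the next layer."""
        self._entries.clear()
```

With a long positive TTL, a changed address stays invisible until the entry expires or the cache is flushed; with a long negative TTL, a name that failed once keeps failing for the same period even after the record is fixed.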
If a required name is present in NetWorker's internal cache, it is used, and further query stops. For troubleshooting, if cached name-to-IP mappings seem wrong, use commands to log the current cache and then flush or re-resolve entries:
- dbgcommand -n nsrd PrintDnsCache=1 (Dump to daemon.raw)
- dbgcommand -n nsrd FlushDnsCache=1 (Flush), or
- dbgcommand -n nsrd FlushDnsCache=9 (Flush and immediately re-resolve/rebuild cache)
Either "-n process name" or "-p PID" can be used. To use the Process ID (PID), you must run other commands first to get the PID; for example:
- Linux/UNIX: ps -ef | grep nsr
- Windows: tasklist | findstr nsr
2. Resolver Cache:
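The operating system maintains its own resolver cache. Some operating systems provide a way to view it (for example, ipconfig /displaydns on Windows), and all provide a way to flush it:
- Flushing the resolver cache varies depending on OS/distribution; see vendor documentation.
- Windows: ipconfig /flushdns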
3. Hosts files:
- UNIX/Linux: /etc/hosts
- Windows: %systemroot%\System32\drivers\etc\hosts
4. Forward Resolution:
On Windows, use ipconfig /all to view the configured DNS servers; on Linux/UNIX, check /etc/resolv.conf for DNS order. nslookup is the most common tool for querying DNS and exists on all platforms, but is frequently misused; to query the forward zone:
- Run nslookup with no arguments to enter the interactive prompt.
- Enter the name iteration (variant) to look up and press Enter to retrieve the forward record from the DNS server you have connected to.
- Enter the same name twice more to see if the name record is round-robining silently between different hosts, or returns the same data.
- Repeat the same process for every name by which other hosts may call the host, or by which the host knows itself, for the same IP address.
- Repeat the same process for any other DNS server that the host is configured to potentially use by entering server next_dns_server.
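The "enter the same name twice more" step can also be scripted. The sketch below resolves a name several times and reports whether the answers change between attempts. It makes two assumptions that differ from the nslookup procedure: it uses Python's socket.gethostbyname_ex, which goes through the operating system resolver (hosts file and resolver cache included) rather than querying one DNS server directly, and the function names are hypothetical. A resolver cache may therefore mask rotation that nslookup would show.

```python
import socket
from typing import Callable, Dict, List


def forward_lookup(name: str) -> List[str]:
    """Ask the OS resolver for the IPv4 addresses of a name, in the order returned."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses


def rotation_report(
    name: str,
    attempts: int = 3,
    lookup: Callable[[str], List[str]] = forward_lookup,
) -> Dict[str, object]:
    """Resolve the name several times and summarize whether answers differ."""
    answers = [tuple(lookup(name)) for _ in range(attempts)]
    # The first address of each answer is the one most callers end up using.
    first_addresses = sorted({answer[0] for answer in answers if answer})
    return {
        "answers": answers,
        "rotating": len(set(answers)) > 1,
        "first_addresses": first_addresses,
    }
```

Calling rotation_report("client1.domain.prod") on a NetWorker host (the name here is only an example) returns the raw answers plus a "rotating" flag; more than one entry in "first_addresses" suggests the name is being handed out round-robin.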
5. Reverse Resolution:
Running nslookup IP_Address, or even entering the IP address at the nslookup prompt, does not query the Reverse Lookup Zone. To query it:
- Run nslookup with no arguments to enter the interactive prompt.
- Enter set q=ptr to change the query type to the Reverse Zone.
- Enter the IP address to reverse resolve, and press Enter.
- Ensure that the name that is returned in the reverse record matches the forward record name/IP.
[root@linux_a~]# nslookup linux_a
Server: 1.2.3.4
Address: 1.2.3.4#53
Name: linux_a.domain.com
Address: 5.6.7.8
[root@linux_a~]# nslookup 5.6.7.8
Server: 1.2.3.4
Address: 1.2.3.4#53
Name: linux_a.domain.com
Address: 5.6.7.8
[root@linux_a~]# nslookup
> set q=ptr
> 5.6.7.8
Server: 1.2.3.4
Address: 1.2.3.4#53
Non-authoritative answer:
8.7.6.5.in-addr.arpa name = linux_a.domain.com.
nslookup non-interactively never queries the reverse lookup zone.
NOTE: NetWorker relies on consistent forward and reverse DNS for authorization. This design helps prevent IP spoofing and protects backup data from unauthorized access.
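A forward/reverse consistency check along these lines can be sketched as follows. This is an illustration, not how NetWorker performs its own verification: it assumes the operating system resolver (via socket.gethostbyname_ex and socket.gethostbyaddr) as the source of answers, and the function names are hypothetical. A name is treated as consistent when every address it resolves to reverses back to either the queried name or the canonical name from the forward lookup.

```python
import socket
from typing import Callable, List, Tuple


def os_forward(name: str) -> Tuple[str, List[str]]:
    """Forward lookup: return (canonical_name, ipv4_addresses)."""
    canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return canonical, addresses


def os_reverse(ip: str) -> List[str]:
    """Reverse lookup: return every name the resolver reports for the IP."""
    primary, aliases, _addresses = socket.gethostbyaddr(ip)
    return [primary] + list(aliases)


def _normalize(name: str) -> str:
    return name.rstrip(".").lower()


def check_consistency(
    name: str,
    forward: Callable[[str], Tuple[str, List[str]]] = os_forward,
    reverse: Callable[[str], List[str]] = os_reverse,
) -> List[str]:
    """Return a list of problems; an empty list means forward and reverse agree."""
    try:
        canonical, addresses = forward(name)
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        return [f"forward lookup of {name} failed: {exc}"]

    acceptable = {_normalize(name), _normalize(canonical)}
    problems = []
    for ip in addresses:
        try:
            names = {_normalize(n) for n in reverse(ip)}
        except OSError as exc:
            problems.append(f"reverse lookup of {ip} failed: {exc}")
            continue
        if not acceptable & names:
            problems.append(f"{ip} reverses to {sorted(names)}, not {sorted(acceptable)}")
    return problems
```

Running check_consistency for each name a host is known by, from each host that talks to it, gives a quick first pass before working through nslookup by hand.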
Testing name resolution
All NetWorker hosts must have consistent forward and reverse name resolution for any host they communicate with, based on their Data Zone role. It is critical for NetWorker administrators to ensure that any host resolution problems are addressed immediately and completely.
When troubleshooting name resolution problems, or to rule them out in your NetWorker Data Zone:
1. Find all hosts involved in the failing operation: Server, Clients, and possibly Storage Nodes, and so forth.
2. For each host, determine the IP addresses configured locally and all expected resolvable names for those IPs.
3. Configure all hosts to use the hosts file before DNS for host resolution.
4. At the beginning of one host's hosts file, configure a single entry for each IP, with every name corresponding to it on the same line.
5. Copy those lines exactly from the first host to the hosts files of the other involved hosts.
6. Edit the NetWorker client objects to have Aliases correctly corresponding to the wanted IPs.
7. Shut down NetWorker on all involved hosts.
8. Clear the resolver cache on each host using the appropriate operating system mechanism.
9. Restart NetWorker and attempt the problematic operation again.
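Steps 4 and 5 depend on each IP appearing on exactly one line and on the lines being identical across hosts. The sketch below checks both properties on hosts file text. The function names and the sample entries are hypothetical; the parsing assumes the usual hosts file layout of an address followed by whitespace-separated names, with # starting a comment.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def parse_hosts(text: str) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (line_number, ip, names) for each entry, skipping comments and blanks."""
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2:
            yield number, fields[0], fields[1:]


def duplicate_ips(text: str) -> Dict[str, List[int]]:
    """Map each IP that appears on more than one line to those line numbers."""
    seen = defaultdict(list)
    for number, ip, _names in parse_hosts(text):
        seen[ip].append(number)
    return {ip: lines for ip, lines in seen.items() if len(lines) > 1}


def same_entries(text_a: str, text_b: str) -> bool:
    """True when both files hold the same (ip, names) entries in the same order."""
    strip = lambda text: [(ip, names) for _n, ip, names in parse_hosts(text)]
    return strip(text_a) == strip(text_b)


# Illustrative data only: 10.0.0.5 is split across two lines, violating step 4.
sample = """\
# backup zone
10.0.0.1  server1.domain.prod server1
10.0.0.5  client1.domain.prod client1
10.0.0.5  client1.domain.bkup
"""
print(duplicate_ips(sample))
```

An empty result from duplicate_ips, and True from same_entries when comparing the first host's file against each of the others, indicates the files match the layout the steps describe.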
To prove the name is resolved by a given host, use this test:
1. From the first NetWorker host (for example, the Client), connect to the second (for example, the Server) using nsradmin -s remote_host -p nsrexec - leave the session open.
2. On the same host, determine the process ID for nsradmin (for example, on Windows: tasklist | findstr nsradmin).
3. Run netstat to show the socket associated with that process (for example, on Windows: netstat -ao | findstr process_id).
4. Determine the connecting socket from that host (the leftmost IP:port pairing in the output)
5. On the remote host - run netstat -a and findstr/grep for :calling_port_from_first_host.
6. The hostname before the colon is how the second host resolves the first host when accepting the inbound connection.
7. Run again with the -n switch added to the netstat command to verify the IP of the same socket, to check if the IP/route is expected.
8. Reverse the same test to ensure that the second host is resolving the first host within expected parameters.
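Steps 3 and 4 amount to picking the local and foreign endpoints out of the netstat output for one process. The following sketch does that for captured output text. It assumes the five-column TCP layout typically printed by Windows netstat -ao (protocol, local address, foreign address, state, PID); other platforms and options print different columns, so treat the parsing as an assumption to adapt. The function name and the sample line are hypothetical.

```python
from typing import List, Tuple

Endpoint = Tuple[str, str, str, str]  # local host, local port, foreign host, foreign port


def endpoints_for_pid(netstat_output: str, pid: int) -> List[Endpoint]:
    """Return the TCP endpoints owned by the given PID."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: TCP  local:port  foreign:port  STATE  PID
        if len(fields) != 5 or fields[0].upper() != "TCP" or fields[4] != str(pid):
            continue
        # rsplit keeps bracketed IPv6 hosts such as [::1] intact.
        local_host, local_port = fields[1].rsplit(":", 1)
        foreign_host, foreign_port = fields[2].rsplit(":", 1)
        found.append((local_host, local_port, foreign_host, foreign_port))
    return found


# Made-up sample output; host names, ports, and PIDs are illustrative only.
sample = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    client1:52344          server1:7937           ESTABLISHED     4321
  TCP    client1:52345          other:443              ESTABLISHED     999
"""
for local_host, local_port, foreign_host, foreign_port in endpoints_for_pid(sample, 4321):
    print(f"search the remote netstat output for :{local_port} (local name {local_host})")
```

The local port found this way is the value to search for on the remote host in step 5.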
About NetWorker Client Aliases
NetWorker also has a configurable field called 'Aliases', which is global for all Client instances and should reflect all names resolvable for that client. This lets NetWorker link multiple resolved names to one Client instance. For example, client1.domain.prod may also appear as client1.domain.bkup or client1, depending on the IP used.
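Once the resolvable names for a client have been collected, comparing them with the Aliases field is a simple set difference. The helper below is a hypothetical sketch: it takes both lists as plain strings (it does not read them from NetWorker) and compares them case-insensitively, ignoring a trailing dot.

```python
from typing import Iterable, List


def missing_aliases(resolvable_names: Iterable[str], configured_aliases: Iterable[str]) -> List[str]:
    """Return resolvable names that are absent from the configured aliases."""
    normalize = lambda name: name.rstrip(".").lower()
    configured = {normalize(alias) for alias in configured_aliases}
    return sorted({normalize(name) for name in resolvable_names} - configured)


# Names taken from the example above; the configured list is illustrative.
resolved = ["client1.domain.prod", "client1.domain.bkup", "client1"]
aliases = ["client1", "client1.domain.prod"]
print(missing_aliases(resolved, aliases))  # ['client1.domain.bkup']
```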
Additional Information
NetWorker operations like savegroup use multiple TCP sockets: one each for control, data, and index updates. If any socket uses an inconsistent (but valid) name, the operation may fail.
- Round-robining is sometimes deliberately used and configured, but usually it is unexpected and should be avoided.
- netstat -a reveals open/active TCP sockets, which reveal the OS-resolved name of the foreign host; this can be used to identify problems.
- Static routing may sometimes be necessary when network traffic uses an unexpected/unwanted adapter, which may later lead to name resolution issues.
See Also: NetWorker Processes and Ports