NetWorker: RPC Errors on NetWorker DDBoost Backup Devices

Summary: Communication issues on a Data Domain system with active sessions can cause the Data Domain devices to enter a stale state. Even when nothing is being written, the device holds the non-working session and does not release the session information from the media management database. All devices associated with the same pool and storage node are then unable to accept new sessions; the impacted devices accept no backup or clone jobs, which results in RPC errors. ...


Symptoms

  • All backup and clone operations on the NetWorker server remain in a hung or queued state because the DDBoost devices are unmounted, and the following error appears in the action logs:
Failed to get the username and password for device <DDBOOST Device Name>; RPC send operation failed; errno = Broken pipe
 
  • Backups fail with the message "Backup of save set failed due to unrecoverable errors".
  • Data Domain devices are unmounted during the backup window even though there are no connectivity issues between NetWorker and the Data Domain system.
  • The NetWorker jobs database is corrupted, which leads to inconsistency between the NetWorker backup application and the jobs database.
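
As a rough aid for spotting the first symptom, the sketch below counts lines that contain the RPC error text in a log file. It assumes the log is plain text and that you supply its path; the function name count_rpc_errors is hypothetical and is not a NetWorker command.

```shell
#!/bin/sh
# Count lines containing the RPC error text from the symptom above.
# Assumption: $1 is a plain-text log file chosen by the operator.
count_rpc_errors() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the function usable
    # in scripts that run with "set -e".
    grep -c 'RPC send operation failed' "$1" || true
}

# Usage (the path is a placeholder, not a documented location):
#   count_rpc_errors /path/to/action.log
```

A non-zero count only indicates that the message was logged; it does not by itself confirm that the devices are stale.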

Cause

RPC errors on the Data Domain DDBoost backup devices must be monitored periodically and corrected when they occur. Otherwise, backup and clone jobs remain in a waiting state: even when nothing is being written, the device holds the non-working session and does not release the session information from the media management database.

Resolution

Restarting the NetWorker services can be tried first, but the issue recurs because the underlying RPC errors on the DDBoost devices persist.

Timeout values on the NetWorker storage nodes must be tuned according to the NetWorker backup environment considerations and the NetWorker device optimization guide.

On a Linux NetWorker storage node, set the TCP keepalive timeout values as follows:
1. Switch to root: sudo su -
2. Run the following commands to modify the tcp_keepalive settings:
# echo 700 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 10 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
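
To see what the three Linux values mean together, the sketch below computes roughly how long an idle connection to an unresponsive peer lasts before the kernel drops it: the idle time plus one probe interval per unanswered probe. The function name keepalive_detect_seconds is hypothetical, and the result is an approximation rather than a documented NetWorker figure.

```shell
#!/bin/sh
# Approximate seconds before an idle connection to a dead peer is dropped:
#   tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes
keepalive_detect_seconds() {
    # $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
    echo $(( $1 + $2 * $3 ))
}

# With the values set above: 700 + 10 * 20
keepalive_detect_seconds 700 10 20

# Show the values currently active on this host (Linux only).
for name in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
    file="/proc/sys/net/ipv4/$name"
    if [ -r "$file" ]; then
        printf '%s = %s\n' "$name" "$(cat "$file")"
    fi
done
```

With the values from step 2 the function prints 900, that is, about 15 minutes.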
On a Windows NetWorker storage node, set the TCP keepalive timeout value as follows:
1. Back up the Windows registry before making changes.
2. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.
3. Create a new REG_DWORD entry named KeepAliveTime.
4. Set its value to decimal 900000 (milliseconds, which is 15 minutes).
5. Reboot to make the new value active.

NOTE: Additional TCP tuning is detailed in the NetWorker Performance Optimization Planning Guide, available through https://www.dell.com/support/home/product-support/product/networker/docs.

The NetWorker services on the storage node must be restarted after the above parameters are set. Ensure that these values persist across NetWorker storage node restarts.
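
On Linux, values written under /proc/sys do not survive an operating system reboot. One common way to keep them, sketched below, is a sysctl configuration fragment. The function name keepalive_sysctl_fragment and the file name 99-networker-keepalive.conf are assumptions for illustration and do not come from NetWorker documentation.

```shell
#!/bin/sh
# Print a sysctl fragment holding the keepalive values from this article.
keepalive_sysctl_fragment() {
    cat <<'EOF'
net.ipv4.tcp_keepalive_time = 700
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 20
EOF
}

# Usage, as root (the file name is hypothetical):
#   keepalive_sysctl_fragment > /etc/sysctl.d/99-networker-keepalive.conf
#   sysctl --system    # reload all sysctl configuration files
```

After a reboot, reading the three files under /proc/sys/net/ipv4 again is a quick way to confirm that the values were applied.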

Any inconsistency in the NetWorker backup application related to the nsrmmd process on the NetWorker storage node must also be eliminated by performing the following steps.

1. Stop the NetWorker services on the Backup server.

Linux: nsr_shutdown
Windows: net stop nsrexecd /y

2. Rename the /nsr/res/jobsdb and /nsr/tmp directories and the /nsr/logs/daemon.raw file on the NetWorker server.
3. Start the NetWorker services on the backup server. This reinitializes the NetWorker save operations, and RPC connection resets and inactivity timeouts should no longer be observed.

Linux: systemctl start networker
Windows: net start nsrd

Windows, if NMC is installed on the NetWorker server: net start gstd
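
The rename in step 2 can be sketched for a Linux NetWorker server as follows. The helper name rename_with_suffix and the timestamp suffix are assumptions for illustration; the article does not prescribe a naming scheme. Run it only while the NetWorker services are stopped.

```shell
#!/bin/sh
# Rename each existing path by appending ".<suffix>"; missing paths are skipped.
rename_with_suffix() {
    suffix="$1"
    shift
    for path in "$@"; do
        if [ -e "$path" ]; then
            mv -- "$path" "$path.$suffix"
        fi
    done
}

# Usage on the NetWorker server, as root, with services stopped:
#   rename_with_suffix "$(date +%Y%m%d%H%M%S)" \
#       /nsr/res/jobsdb /nsr/logs/daemon.raw /nsr/tmp
```

Renaming rather than deleting keeps the old jobs database and log available if they are needed later for analysis.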

Article Properties
Article Number: 000217738
Article Type: Solution
Last Modified: 15 Nov 2023
Version:  2