Unsolved

November 12th, 2012 09:00

RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.

Hello,

We have recently started getting RPC errors in our backups that we did not get before.

Environment:

Server/Storage Node: NetWorker 7.4.5.10.Build.810 Enterprise Edition on Sun Solaris 9

Tape Library: Sun StorageTek SL500 - 5xLTO3 tape drives

Clients: 7.6.4.3.Build.1070, 7.4.5.Build.758, 7.4.3.Build.569 on Windows 2003 and Windows 2008 servers.

We also have Linux and Solaris clients but this problem does not occur on those clients.

The error messages for the savesets that fail are the following:

39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.

39078:save: RPC error: RPC send operation failed.  A network connection could not be established with the host.

74209:save: Quit signal received.

Here are some examples from the NetWorker savegroup completion logs:

--- Unsuccessful Save Sets ---

* win2008-yx01:VSS SYSTEM FILESET:\ 1 retry attempted

* win2008-yx01:VSS SYSTEM FILESET:\ 39078:save: RPC error: RPC send operation failed.  A network connection could not be established with the host.

* win2008-yx01:VSS SYSTEM FILESET:\

* win2008-yx01:VSS SYSTEM FILESET:\ System Writer - ERROR: Failed to save FileGroup files, writer = System Writer

* win2008-yx01:VSS SYSTEM FILESET:\ System Writer - Error saving writer System Writer

* win2008-yx01:VSS SYSTEM FILESET:\ System Writer - ERROR: Aborting backup of saveset VSS SYSTEM FILESET: because of the error with writer System Writer.

* win2008-yx01:VSS SYSTEM FILESET:\ System Writer - Error saving

* win2003-db02:D:\ORACLE 1 retry attempted

* win2003-db02:D:\ORACLE 39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.

* win2003-db02:D:\ORACLE

* win2003-db02:D:\ORACLE

* win2003-db02:D:\ORACLE 74209:save: Quit signal received.

     * :  Failed with error(s)

* win2008-db05:D:\ 1 retry attempted

* win2008-db05:D:\ 39078:save: RPC error: RPC send operation failed.  A network connection could not be established with the host.

* win2008-db05:D:\

     * :  error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy8\Inetpub\catalog.wci\0001000E.ci

* win2008-db05:C:\ 1 retry attempted

* win2008-db05:C:\ 39078:save: RPC error: RPC send operation failed.  A network connection could not be established with the host.

* win2008-db05:C:\

* win2008-db05:C:\ 5195:save: save failed on \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy7\WINSRV\$NtServicePackUninstall$\reg00254

     * :  error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy7\WINSRV\$NtServicePackUninstall$\reg00254

* win2003-db21:G:\ 1 retry attempted

* win2003-db21:G:\ 39078:save: RPC error: RPC send operation failed.  A network connection could not be established with the host.

* win2003-db21:G:\

     * :  error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy633\Backup\WPMSharepoint.bak

* win2003-db21:C:\ 1 retry attempted

* win2003-db21:C:\ 39078:save: RPC error: RPC send operation failed.  A network connection could not be established with the host.

* win2003-db21:C:\

     * :  error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy634\hp\hpdiags\tcstorage.dll

* win2003-file01:D:\ 1 retry attempted

* win2003-file01:D:\ 39078:save: RPC error: RPC send operation failed.  A network connection could not be established with the host.

* win2003-file01:D:\

     * :  error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy330\APPS\Ecl\2004a_1\frontsim\2002a_1\old\Eclipse\2002a_1\flogrid\tutorials\RESCUE\cloudspin\cloudspin.bin

* win2003-lotus01:C:\ 1 retry attempted

* win2003-lotus01:C:\ 39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.

* win2003-lotus01:C:\

* win2003-lotus01:C:\

* win2003-lotus01:C:\ 74209:save: Quit signal received.
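To triage a long completion report like the one above, it can help to count the NetWorker message numbers (39078, 74209, ...) per client. This is a minimal Python sketch, assuming each relevant line has the form `* client:saveset NNNNN:save: message` as in the excerpt; the regular expression and the helper name `summarize_failures` are illustrative and not part of NetWorker.

```python
import re
from collections import defaultdict

# Matches completion-report lines such as:
#   * win2003-db02:D:\ORACLE 39078:save: RPC error: RPC send operation failed; ...
# group 1 = client name (text before the first colon),
# group 2 = NetWorker message number, group 3 = message text after "save:".
LINE_RE = re.compile(r"^\s*\*\s+([^:\s]+):.*?\b(\d+):save:\s*(.*)$")


def summarize_failures(report: str) -> dict:
    """Count 'NNNNN:save:' messages per client in a savegroup completion report."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match:
            client, code, _text = match.groups()
            counts[client][code] += 1
    # Convert the nested defaultdicts to plain dicts for printing and comparison.
    return {client: dict(codes) for client, codes in counts.items()}


if __name__ == "__main__":
    sample = r"""
* win2003-db02:D:\ORACLE 39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.
* win2003-db02:D:\ORACLE 74209:save: Quit signal received.
* win2008-db05:C:\ 39078:save: RPC error: RPC send operation failed.  A network connection could not be established with the host.
* win2008-db05:C:\ 1 retry attempted
     * :  Failed with error(s)
"""
    for client, codes in summarize_failures(sample).items():
        print(client, codes)
```

Lines without a message number (retry counts, writer messages, "error while saving" lines) are skipped, so the result shows at a glance which clients hit 39078 and which also received 74209.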

Thank you very much for your help/contribution in the resolution of this problem.

November 12th, 2012 13:00

Hi Celio,

Additionally, I recommend deleting the NSR peer information and clearing the tmp directory on the client. Follow the steps below to delete the NSR peer information on both the NetWorker server and the client.

On the NetWorker server:

  1. At the command line, go to /nsr/res.
  2. Type the following commands (answer y to confirm the deletion):
    nsradmin -p nsrexec
    print type:nsr peer information; name:client_name
    delete
    y

   (replace client_name with the name of the client)

On the client:

  1. At the command line, go to /nsr/res.
  2. Type the following commands:
    nsradmin -p nsrexec
    print type:nsr peer information
    delete
    y

On the client, stop the NetWorker services, rename the tmp directory under \nsr to tmp.old, and then start the services again.
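The rename step can be scripted so it refuses to clobber an earlier tmp.old. This is a minimal Python sketch; `rotate_tmp_dir` is a hypothetical helper, the nsr directory you pass in is an assumption about your install path, and stopping and starting the NetWorker services is left to the usual service tools.

```python
from pathlib import Path


def rotate_tmp_dir(nsr_dir: str) -> Path:
    """Rename <nsr_dir>/tmp to <nsr_dir>/tmp.old and return the new path.

    Assumes the NetWorker client services are already stopped; this function
    does not stop or start any service.
    """
    nsr = Path(nsr_dir)
    tmp = nsr / "tmp"
    old = nsr / "tmp.old"
    if not tmp.is_dir():
        raise FileNotFoundError(f"{tmp} does not exist or is not a directory")
    if old.exists():
        # Do not overwrite a previous copy; remove or rename it by hand first.
        raise FileExistsError(f"{old} already exists")
    tmp.rename(old)
    return old
```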

Hope this helps as well.

Ahmed Bahaa

November 12th, 2012 13:00

Hi Celio,

Please check the VSS writers on those machines by running the following command:

vssadmin list writers

Ensure that all writers are in the Stable state with no errors.
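On a client with many writers, this check can be automated by parsing the command's output and flagging any writer that is not stable or reports an error. This is a minimal Python sketch, assuming the English-language layout of `vssadmin list writers` (`Writer name:`, `State:` and `Last error:` lines per writer); `unhealthy_writers` is a hypothetical helper name.

```python
import re
import subprocess
import sys


def unhealthy_writers(vssadmin_output: str) -> list:
    """Return (writer name, state, last error) for every writer whose state is
    not '[1] Stable' or whose last error is not 'No error'."""
    results = []
    name = state = None
    for raw in vssadmin_output.splitlines():
        line = raw.strip()
        if m := re.match(r"Writer name: '(.*)'", line):
            name, state = m.group(1), None
        elif m := re.match(r"State: (.*)", line):
            state = m.group(1)
        elif m := re.match(r"Last error: (.*)", line):
            last_error = m.group(1)
            # "Last error" is the last field printed for each writer.
            if name is not None and (state != "[1] Stable" or last_error != "No error"):
                results.append((name, state or "", last_error))
    return results


if __name__ == "__main__" and sys.platform == "win32":
    # vssadmin normally has to be run from an elevated (administrator) prompt.
    out = subprocess.run(["vssadmin", "list", "writers"],
                         capture_output=True, text=True).stdout
    for writer in unhealthy_writers(out):
        print(*writer, sep=" | ")
```

An empty result means every writer reported Stable with no error; anything listed (for example the System Writer seen in the completion report) is worth investigating before re-running the backup.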

Second, run some communication checks in both directions using nslookup and rpcinfo. From the backup server to the NetWorker client (using both the short name and the FQDN):

  • nslookup
  • rpcinfo -p

From the client to the backup server (again using both the short name and the FQDN):

  • check that the backup server is listed in the servers file under the nsr directory
  • nslookup
  • rpcinfo -p
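The forward and reverse lookups can also be scripted from either side. This is a minimal Python sketch using the standard `socket` module; it only checks that a name resolves to an address and that the address resolves back to the same host, so it does not replace `rpcinfo -p`, which checks the RPC services themselves. `check_name` and any host names you pass to it are hypothetical.

```python
import socket
import sys
from typing import Callable


def check_name(
    name: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> tuple:
    """Resolve name -> IPv4 address -> name.

    Returns (ip, reverse_name, consistent), where 'consistent' is True when the
    first label of the reverse name matches the first label of the queried name
    (case-insensitive), so a short name and its FQDN count as the same host.
    """
    ip = forward(name)
    reverse_name = reverse(ip)[0]
    consistent = name.split(".")[0].lower() == reverse_name.split(".")[0].lower()
    return ip, reverse_name, consistent


if __name__ == "__main__":
    # Example: python check_names.py backupsrv backupsrv.example.com
    for host in sys.argv[1:]:
        try:
            print(host, *check_name(host))
        except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
            print(host, "lookup failed:", exc)
```

Run it with both the short name and the FQDN, from the server and from the client; a `False` in the last column, or a failed lookup, points at the same name-resolution problem nslookup would show.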

If all of the above checks pass and name resolution is correct, add VSS:*=off to the save operations field for the Windows 2003 clients. In NMC, navigate to Configuration, then Clients, select each Windows 2003 client that fails with these errors, and enter VSS:*=off in the save operations field on the Apps & Modules tab.

For Windows 2008 clients, set the save command field to:

save -a '"ignore-all-missing-system-files=yes"'

Then re-run the backups. Looking forward to your update.

Ahmed Bahaa

November 14th, 2012 02:00

Hello Ahmed,

Thank you very much for your help.

I've tried all your suggested steps:

  • name resolution verification - OK
  • rpcinfo -p - OK
  • "VSS:*=off" for Windows 2003 clients - DONE
  • save -a '"ignore-all-missing-system-files=yes"' for Windows 2008 clients - DONE
  • deleting the NSR peer information and renaming the client's nsr/tmp folder - DONE

But unfortunately the problem persists.

39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.


74209:save: Quit signal received.

November 14th, 2012 02:00

Hi,

Yes, I forgot to mention that I had disabled the antivirus (Symantec Endpoint Protection) before testing again.

No, there is no firewall between the NetWorker server and the client; otherwise rpcinfo would not work.

I will try the TCP/IP registry keys and get back to you.

Thanks again!

November 14th, 2012 02:00

Do I have to reboot for the registry parameters to take effect?

November 14th, 2012 02:00

Hi,

1- For testing purposes, disable the antivirus.

2- If you still get the same error after that, is there a firewall in between? Either way, set the following registry values:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
KeepAliveTime=3420000
TcpWindowSize=256000
GlobalMaxTcpWindowSize=16777216
KeepAliveInterval=1000

(These are REG_DWORD values, shown in decimal. KeepAliveTime and KeepAliveInterval are in milliseconds; the window sizes are in bytes.)
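To apply the same values consistently on several Windows clients, they can be prepared once as a `.reg` file and reviewed before importing. This is a minimal Python sketch that only renders text and does not touch the registry; the helper name `to_reg_file` is hypothetical. In `.reg` files, DWORD values are written as eight hexadecimal digits, so 3420000 ms (57 minutes) becomes `00342f60`.

```python
# Values from the post above (decimal). Times are in milliseconds, sizes in bytes.
TCPIP_PARAMS = {
    "KeepAliveTime": 3_420_000,        # 3,420,000 ms = 57 minutes
    "TcpWindowSize": 256_000,
    "GlobalMaxTcpWindowSize": 16_777_216,
    "KeepAliveInterval": 1_000,        # 1,000 ms = 1 second
}

KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def to_reg_file(key: str, values: dict) -> str:
    """Render REG_DWORD values as the text of a .reg file (dword data in hex)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}={value} does not fit in a REG_DWORD")
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(to_reg_file(KEY, TCPIP_PARAMS), end="")
```

Save the output, check it against the values above, and only then import it; as noted further down the thread, the change needs a reboot to take effect.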

Please confirm.

Regards,

Mustafa

November 14th, 2012 03:00

Yes, a reboot is required.

Set the above values on both the backup server and the client.

Mustafa

November 14th, 2012 03:00

Unfortunately, I cannot reboot the client, as it is a production server.

The backup server runs Sun Solaris.

Thanks.

November 14th, 2012 07:00

I suspect the problem is related to open files. What do you think? Is there a way to back up open files, or simply skip them?

Thanks again.

April 15th, 2013 10:00


We have NetWorker running on Windows 2008 servers. We are seeing the same RPC errors on 3 different backup servers, affecting multiple servers at random. In many instances, restarting the backup completes successfully. An additional problem is that tapes are being marked full prematurely when this occurs, and in other instances we are getting RPC/CRC errors involving the tape drives. It started in the last couple of months but has gotten much worse recently. I am going to see whether the server team can make the registry edits. Will update shortly.

April 18th, 2013 00:00

Besides firewalls, are the client and backup server in different network zones?

April 18th, 2013 03:00

On the NetWorker client:

  - verify that the server name is listed in the /nsr/res/servers file

  - restart the client daemon (nsrexecd)

April 19th, 2013 05:00

Yes, they are on different subnets if that is what you are asking.

Harry Ankerbrand

LSUHSC Enterprise Network Operations Help Desk

April 19th, 2013 06:00

In my environment, this issue has so far been tied to moments when the backup server is too busy. Reducing the jobsdb retention did the trick, since the hourly cleanup had less data to process, and I have not seen the error since (7.6.x code).

April 21st, 2013 21:00

OK, we too are facing similar issues on and off. The interzone firewall has an allow-all rule configured, so that is as good as no firewall. However, we fixed the issue by setting the keepalive time interval to 10,000.
