Unsolved
RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.
Hello,
We have recently started getting RPC errors during our backups; these did not occur before.
Environment:
Server/Storage Node: NetWorker 7.4.5.10.Build.810 Enterprise Edition on Sun Solaris 9
Tape Library: Sun StorageTek SL500 - 5xLTO3 tape drives
Clients: 7.6.4.3.Build.1070, 7.4.5.Build.758, 7.4.3.Build.569 on Windows 2003 and Windows 2008 servers.
We also have Linux and Solaris clients but this problem does not occur on those clients.
The error messages for the savesets that fail are the following:
39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.
39078:save: RPC error: RPC send operation failed. A network connection could not be established with the host.
74209:save: Quit signal received.
Here are some examples from the NetWorker savegroup completion logs:
--- Unsuccessful Save Sets ---
* win2008-yx01:VSS SYSTEM FILESET:\ 1 retry attempted
* win2008-yx01:VSS SYSTEM FILESET:\ 39078:save: RPC error: RPC send operation failed. A network connection could not be established with the host.
* win2008-yx01:VSS SYSTEM FILESET:\
* win2008-yx01:VSS SYSTEM FILESET:\ System Writer - ERROR: Failed to save FileGroup files, writer = System Writer
* win2008-yx01:VSS SYSTEM FILESET:\ System Writer - Error saving writer System Writer
* win2008-yx01:VSS SYSTEM FILESET:\ System Writer - ERROR: Aborting backup of saveset VSS SYSTEM FILESET: because of the error with writer System Writer.
* win2008-yx01:VSS SYSTEM FILESET:\ System Writer - Error saving
* win2003-db02:D:\ORACLE 1 retry attempted
* win2003-db02:D:\ORACLE 39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.
* win2003-db02:D:\ORACLE
* win2003-db02:D:\ORACLE
* win2003-db02:D:\ORACLE 74209:save: Quit signal received.
* : Failed with error(s)
* win2008-db05:D:\ 1 retry attempted
* win2008-db05:D:\ 39078:save: RPC error: RPC send operation failed. A network connection could not be established with the host.
* win2008-db05:D:\
* : error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy8\Inetpub\catalog.wci\0001000E.ci
* win2008-db05:C:\ 1 retry attempted
* win2008-db05:C:\ 39078:save: RPC error: RPC send operation failed. A network connection could not be established with the host.
* win2008-db05:C:\
* win2008-db05:C:\ 5195:save: save failed on \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy7\WINSRV\$NtServicePackUninstall$\reg00254
* : error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy7\WINSRV\$NtServicePackUninstall$\reg00254
* win2003-db21:G:\ 1 retry attempted
* win2003-db21:G:\ 39078:save: RPC error: RPC send operation failed. A network connection could not be established with the host.
* win2003-db21:G:\
* : error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy633\Backup\WPMSharepoint.bak
* win2003-db21:C:\ 1 retry attempted
* win2003-db21:C:\ 39078:save: RPC error: RPC send operation failed. A network connection could not be established with the host.
* win2003-db21:C:\
* : error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy634\hp\hpdiags\tcstorage.dll
* win2003-file01:D:\ 1 retry attempted
* win2003-file01:D:\ 39078:save: RPC error: RPC send operation failed. A network connection could not be established with the host.
* win2003-file01:D:\
* : error while saving \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy330\APPS\Ecl\2004a_1\frontsim\2002a_1\old\Eclipse\2002a_1\flogrid\tutorials\RESCUE\cloudspin\cloudspin.bin
* win2003-lotus01:C:\ 1 retry attempted
* win2003-lotus01:C:\ 39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.
* win2003-lotus01:C:\
* win2003-lotus01:C:\
* win2003-lotus01:C:\ 74209:save: Quit signal received.
Thank you very much for any help in resolving this problem.
Bebo2k
544 Posts
0
November 12th, 2012 13:00
Hi Celio,
Additionally, I recommend deleting the NSR peer information and clearing the tmp directory on the client. Follow the steps below to delete the NSR peer information on the NetWorker server and on the client.
nsradmin -p nsrexec
print type:nsr peer information; name:client_name
delete
y
(Specify the name of the client in place of client_name.)
nsradmin -p nsrexec
print type:nsr peer information
delete
y
On the client, stop the NetWorker services, rename the tmp directory under \nsr to tmp.old, and then start the services again.
Hope this helps as well.
Ahmed Bahaa
Bebo2k
544 Posts
0
November 12th, 2012 13:00
Hi Celio,
Would you please check the VSS writers on those machines by running the following command:
vssadmin list writers
Ensure that all writers are in the Stable state with no errors.
Secondly, run some communication checks in both directions using nslookup and rpcinfo: from the backup server to the NetWorker client, and from the client to the backup server. In each direction, use both the short name and the FQDN.
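As a rough illustration of the name-resolution half of these checks (it does not cover rpcinfo), the Python sketch below resolves a short name and an FQDN and reports whether both map to the same addresses. It is only a stand-in for running nslookup by hand, and the function names are made up for this example.

```python
import socket


def check_name_resolution(names):
    """Resolve each name; map it to its sorted IPv4 addresses, or None on failure."""
    results = {}
    for name in names:
        try:
            # gethostbyname_ex returns (canonical_name, aliases, ip_addresses)
            _, _, addresses = socket.gethostbyname_ex(name)
            results[name] = sorted(addresses)
        except OSError:
            # socket.gaierror and socket.herror are both subclasses of OSError
            results[name] = None
    return results


def names_agree(results):
    """True only if at least one name was checked, every name resolved,
    and all names resolved to the same set of addresses."""
    resolved = list(results.values())
    if not resolved or any(r is None for r in resolved):
        return False
    return all(r == resolved[0] for r in resolved)
```

For instance, `names_agree(check_name_resolution(["win2003-db02", "win2003-db02.example.com"]))` could be run on the backup server, and the same call with the server's names on the client. The `example.com` domain here is a placeholder, not a name from this environment.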
If all of the above checks pass and name resolution is correct, add VSS:*=off to the save operations field for the Windows 2003 clients. In NMC, navigate to Configuration, then Clients, select the Windows 2003 machines that are failing with these errors, and add VSS:*=off to the save operations field under the Apps and Modules tab.
For Windows 2008 clients, modify the save command field with:
save -a '"ignore-all-missing-system-files=yes"'
Then re-run the backups. I look forward to your update.
Ahmed Bahaa
Celio1
16 Posts
0
November 14th, 2012 02:00
Hello Ahmed,
Thank you very much for your help.
I've tried all your suggested steps (see the attached screenshots), but unfortunately the problem persists:
39078:save: RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host.
74209:save: Quit signal received.
2 Attachments
14-11-2012 11-13-36.png
14-11-2012 11-10-13.png
Celio1
16 Posts
0
November 14th, 2012 02:00
Hi,
Yes, I forgot to mention that I disabled the AntiVirus (Symantec Endpoint Protection) before testing again.
No, there is no firewall between the NetWorker server and the client; otherwise rpcinfo wouldn't work.
I will try the registry keys for TCP/IP and I will get back to you.
Thanks again!
Celio1
16 Posts
0
November 14th, 2012 02:00
Do you have to reboot in order for the registry parameters to be effective?
NWadmin1
89 Posts
1
November 14th, 2012 02:00
Hi,
1- For test purposes, disable the antivirus.
2- If you still get the same error after that, do you have a firewall? Either way, set the following registry keys:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
KeepAliveTime=3420000
TcpWindowSize=256000
GlobalMaxTcpWindowSize=16777216
KeepAliveInterval=1000
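For anyone who prefers to import these settings rather than type them, here is a minimal Python sketch that turns the values above into .reg file text. It assumes the posted numbers are decimal and that all four are REG_DWORD values; `build_reg_file` is a helper name invented for this example. Review the output before importing it on any server.

```python
# Values as posted above; assumed to be decimal REG_DWORD values.
TCPIP_PARAMETERS = {
    "KeepAliveTime": 3420000,            # milliseconds (57 minutes)
    "TcpWindowSize": 256000,             # bytes
    "GlobalMaxTcpWindowSize": 16777216,  # bytes (16 MiB)
    "KeepAliveInterval": 1000,           # milliseconds (1 second)
}

KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_reg_file(key_path, values):
    """Return .reg file text that sets each name to a DWORD under key_path."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, value in values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}={value} does not fit in a 32-bit DWORD")
        # .reg files write DWORDs as eight hexadecimal digits
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"
```

Calling `print(build_reg_file(KEY_PATH, TCPIP_PARAMETERS))` shows the text that would be saved to a .reg file. The conversion to hexadecimal is the only transformation applied to the posted values.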
Please confirm.
Regards,
Mustafa
NWadmin1
89 Posts
0
November 14th, 2012 03:00
Yes, a reboot is required.
Set the above on both the backup server and the client.
Mustafa
Celio1
16 Posts
0
November 14th, 2012 03:00
Unfortunately, I cannot reboot the client as it is a production server.
The Backup Server is on Sun Solaris.
Thanks.
Celio1
16 Posts
0
November 14th, 2012 07:00
I suspect that the problem is related to open files. What do you think? Is there a way to back up open files or simply skip them?
Thanks again.
Hanker2
2 Posts
0
April 15th, 2013 10:00
We have NetWorker running on Windows 2008 servers. We are seeing the same RPC errors across 3 different backup servers, affecting multiple clients at random. In many instances, a restart of the backup completes successfully. An additional problem is that tapes are being marked full prematurely when this occurs, and in other instances we are getting RPC/CRC errors involving the tape drives. It started in the last couple of months but has gotten much worse recently. I am going to see if the registry edits can be done by the server team. Will update shortly.
fx8282
38 Posts
0
April 18th, 2013 00:00
Besides firewalls, are the client and backup server in different network zones?
bingo.1
2.4K Posts
0
April 18th, 2013 03:00
On the NW client,
- verify that the server name is listed in the /nsr/res/servers file
- restart the client daemon
Hanker2
2 Posts
0
April 19th, 2013 05:00
Yes, they are on different subnets if that is what you are asking.
Harry Ankerbrand
LSUHSC Enterprise Network Operations Help Desk
504-568-4357
800-303-3290
ble1
14.3K Posts
0
April 19th, 2013 06:00
In my environment, this issue has so far been tied to moments when the backup server is too busy. Reducing the retention on jobdb did the trick, since the hourly cleanup then had less data to process, and I haven't seen the error since (7.6.x code).
fx8282
38 Posts
0
April 21st, 2013 21:00
OK, we too are facing similar issues on and off. The interzone firewall has an allow-all rule configured, so that's as good as no firewall. However, we fixed the issue by setting the keepalive time interval to 10,000.