6 Posts
1
6105
January 11th, 2008 12:00
Lost connection to server, exiting
Running into a very strange issue, wanted to see if anyone else has experienced something like this before.
Backups are failing with "Lost connection to server, exiting". As near as I can tell, the connection is simply dropped. Looking at the network, we are not seeing errors anywhere along the chain (no significant packet loss, overflows, or port resets), everything is hard-coded to 100/full end to end, speeds are fantastic while the backup is running, and there is no firewall in place. Nothing indicates the port is being torn down, yet that appears to be what is happening.
Shorter/smaller backups appear to run fine, though there is no real rhyme or reason to when the larger ones fail. A daily backup that gets roughly 20-30 GB a day always runs fine. Generally, a backup needs to push 50+ GB of data, and more usually 80-100 GB, before it dies. A file-by-file batch job from the client ran without a hitch (each file being a filesystem backup), so I don't think we have a bad file.
Server is running Solaris 8, Networker 7.3.2.
Client was upgraded to Solaris 10, Networker 7.1.3, then upgraded to Networker 7.3.3.
Before this problem cropped up, the client was running Solaris 8 with Networker 7.1.3 without incident. Nothing else has been changed except the client hardware/OS upgrade (OK, not a minor change).
We currently have a ticket open with Legato. They have been going over the logs, including a -D9 log, and are still scratching their heads. I hope to get a product specialist on the case soon, but figured I'd post here too in case someone else has fought strangeness like this before.
Dave


ble1
6 Operator
•
14.4K Posts
•
56.2K Points
0
January 27th, 2008 12:00
shareef2
21 Posts
0
April 4th, 2010 05:00
Hi,
It is quite funny to reply 2 years later, but I just wanted to know if the root cause has been discovered.
I also have a question about your platform: is your client SPARC-based or x86?
Thanks
Shareef
nandfred1
10 Posts
0
September 20th, 2011 03:00
Another year and a half has gone by.
Well, I haven't found the root cause, but we are seeing the same thing.
Windows 2008R2 Server.
Linux - Windows - AIX storage nodes.
The lost connections seem to occur only towards AIX and Linux clients, and it appears to be the control connection to the index server that is lost.
But we can't for the life of us find out where. Network says it's the client that resets its control connections.
We tried without firewalls, same result. Going directly to the backup server, i.e. with no storage node involved, made no difference.
Right now we are testing a Linux backup server to see if that helps, and a Windows backup server at the same site. Maybe that helps, if only to narrow down where to look.
nandfred1
10 Posts
0
September 26th, 2011 05:00
Eureka
(or however the saying goes), it was there all the time: page 80 in the 7.6 SP2 release notes.
Disable TCP chimney by running the following command:
netsh int tcp set global chimney=disabled
That seemed to have worked in our setup...
ra298
1 Rookie
•
18 Posts
0
November 27th, 2011 03:00
That's a fix for Windows. I am now facing the same problem with Linux.
Daniea3
123 Posts
0
January 12th, 2012 00:00
Hi rha,
On Linux, is it the mount point that fails with the above error?
Regards,
Arun
vermad1
36 Posts
0
January 12th, 2012 02:00
The error itself says that there is a communication problem. Please check the TCP/IP settings and add TCP keepalive to the client. Also check whether the network has any issues.
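For anyone unsure what "TCP keepalive" means at the socket level, here is a minimal Python sketch. It is not NetWorker code: the function name `enable_keepalive` and the timing values are illustrative assumptions, and for an existing product you would presumably tune the operating system's keepalive settings rather than write a script. It only shows the idea: an otherwise idle connection sends periodic probes so that devices in the path do not treat it as dead.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, count=5):
    """Turn on TCP keepalive for one socket (timing values are illustrative)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timing options are platform specific,
    # so only set the ones this platform actually exposes.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is switched on for this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

The idle time should be shorter than any idle timeout enforced between client and server, otherwise the probes start too late to help.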
mridul_singh
57 Posts
0
January 16th, 2012 03:00
TCP chimney applies to Linux as well. You can use the ethtool command to check the settings and see if TCP chimney (offload) is enabled. If so, disable it and that should do the trick.
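A rough sketch of that check, with some assumptions stated up front: "TCP chimney" is the Windows name, and on Linux the closest equivalent is the set of NIC offload features reported by `ethtool -k <interface>`. The helper names, the sample output, and the choice of which features to flag are mine, not from any release notes. The parser works on the text that `ethtool -k` prints, so it can be tried without touching a real interface.

```python
import subprocess

# Illustrative sample of `ethtool -k eth0` output (real output has many more lines).
SAMPLE = """\
Features for eth0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
"""

# Offload features worth reviewing first (an assumption, adjust as needed).
OFFLOADS = (
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "generic-receive-offload",
    "large-receive-offload",
)


def parse_features(text):
    """Map each feature name in `ethtool -k` output to True (on) or False (off)."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip the "Features for eth0:" header and blank lines
        features[name] = value.split()[0] == "on"
    return features


def enabled_offloads(features, names=OFFLOADS):
    """Return the offload features from `names` that are currently on."""
    return [name for name in names if features.get(name)]


def read_features(interface):
    """Run `ethtool -k` for a real interface (requires ethtool to be installed)."""
    result = subprocess.run(
        ["ethtool", "-k", interface], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(enabled_offloads(parse_features(SAMPLE)))
    # ['tcp-segmentation-offload', 'generic-segmentation-offload', 'generic-receive-offload']
```

Anything listed can then be switched off for a test with `ethtool -K`, for example `ethtool -K eth0 tso off gso off gro off` (as root; `eth0` is a placeholder interface name). The change does not survive a reboot unless it is also added to the network configuration.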