1 Rookie
•
11 Posts
0
3684
June 12th, 2012 13:00
Networker 7.5.4 on Win 2k8 R2 file server - one volume fails consistently.
Greetings all,
A coworker and I are stumped about this one, so we thought to ask the community for guidance. We have a Windows 2008 R2 file server that will no longer back up one filesystem. The other 14 run fine and succeed for both incremental and full backups, but not this one. The volume in question is our S: drive. It's 680 GB with about 3 million files, and as of two weeks ago it backed up fine. Then we had a network outage in that datacenter, and ever since then it will not back up the S: drive. We're told the network is fine, and we tend to believe that, since the rest of the server backs up fine.
Last Friday I rebooted the server after hours, but nothing changed: the S: drive will not run to completion. Networker starts to back up the drive, as it does the entire server, but while the other volumes run fine, the S: drive slows and stops after it has backed up 46 GB, every time. We've seen it do this nearly six times now. It doesn't matter whether we back up the whole server or just that volume. It starts, runs great, then slows down and dies at 46 GB backed up out of 680 GB.
Windows is not reporting any NTFS issues with the volume; we just get an appcrash on save.exe every time. I've attached both our job reports and the appcrash. Any suggestions?
Thank you!
Phil
nmc2
268 Posts
0
June 14th, 2012 20:00
It looks like a corrupt file or directory might be causing the problem.
Try running the backup from the client using the command below.
Syntax:
save -vvv -D3 -s -b
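This command sets both debug and verbose mode to level 3. The -s option takes the Networker server name and -b takes the pool name; follow them with the path to back up. Check the command output to identify where (on which file) it gets stuck or fails.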
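If you want a second opinion independent of Networker, a rough sketch like the one below walks a folder tree and tries to read every file, collecting whatever the operating system refuses to list or open. This is not a Networker tool, just plain Python; the function name `find_unreadable` is made up for illustration, and reading a 680 GB volume this way will take a long time, so point it at a suspect subfolder first.

```python
import os

def find_unreadable(root, chunk_size=1024 * 1024):
    """Walk root and return (path, error) pairs for entries that cannot be listed or read."""
    problems = []

    def on_walk_error(err):
        # os.walk calls this when a directory cannot be listed
        problems.append((err.filename, str(err)))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    # Read the file to the end in chunks so read errors surface
                    while fh.read(chunk_size):
                        pass
            except OSError as err:
                problems.append((path, str(err)))
    return problems
```

Call it with the folder you suspect and print the returned pairs. An empty list only means every file could be opened and read; it does not prove the backup will succeed.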
Regards,
Prajith
nmc2
268 Posts
1
June 12th, 2012 22:00
Hello Phil,
I think this might be a connection timeout issue: the connection may be timing out during the backup.
Try setting Keep Alive Time and NSR keep alive wait on the client.
Keep Alive Time:
On the client machine,
Start > Run > Regedit > go to:
HKLM\System\CurrentControlSet\Services\TCPIP\Parameters\KeepAliveTime
And set the value to 3,360 000
NSR keep alive:
To set environment variables in Microsoft Windows:
View the system variables.
a. From the Start menu, select Settings>Control Panel>System.
b. Click the Advanced tab.
c. Click Environment Variables.
d. In the Environment Variables dialog box, below System Variables, click New.
In the Name field, enter NSR_KEEPALIVE_WAIT. In the Value field, enter 10. This sets keepalive_wait to 10 seconds. Click OK.
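To confirm the variable is actually visible after the steps above, a small check like this can help. It is only a sketch: the helper name `keepalive_wait_seconds` is hypothetical, and it reads the environment of the Python process itself. Keep in mind that a new system variable is only picked up by processes started after the change, so the Networker services need a restart before they see it.

```python
import os

def keepalive_wait_seconds(environ=os.environ):
    """Return NSR_KEEPALIVE_WAIT as an int, or None if unset or not a number."""
    raw = environ.get("NSR_KEEPALIVE_WAIT")
    if raw is None:
        return None
    try:
        return int(raw.strip())
    except ValueError:
        return None

# Run from a freshly opened command prompt so the new value is inherited
print("NSR_KEEPALIVE_WAIT =", keepalive_wait_seconds())
```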
Refer to this link for more details:
http://solutions.emc.com/EMCSolutionView.asp?id=esg91994&usertype=C
Let us know if it helps.
Regards,
Prajith
fx8282
38 Posts
1
June 13th, 2012 02:00
Change the keepalive timeout to 10,000 in your Networker startup script, on the backup server only, then restart the Networker service.
In case this doesn't work, please share the daemon.raw output from the backup server and client.
reissp
1 Rookie
•
11 Posts
0
June 14th, 2012 13:00
We've tried both the Networker server and client timeouts, with no success. There was one breakthrough today, though. The root folder contains a series of subfolders. Our storage admin split the job up so that there are multiple saves running. Most are running fine, but one reached 46 GB of data and then slowed and stopped while the others continued. It seems there is something in that folder of 3000 subfolders that is causing a conflict. We're still working on it.
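For anyone wanting to try the same split, here is a minimal sketch of how the per-subfolder list could be generated. It is not what our storage admin actually ran; the names `list_subfolders` and `split_into_groups` are invented for the example, and it only produces lists of paths that you would then feed to separate saves.

```python
import os

def list_subfolders(root):
    """Return the sorted full paths of the immediate subdirectories of root."""
    with os.scandir(root) as entries:
        return sorted(e.path for e in entries if e.is_dir(follow_symlinks=False))

def split_into_groups(paths, n):
    """Deal paths round-robin into n groups, one group per parallel save."""
    return [paths[i::n] for i in range(n)]
```

With each group backed up separately, the group that stalls narrows the search to its own subfolders, and you can split that group again until a single folder is left.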
reissp
1 Rookie
•
11 Posts
0
June 18th, 2012 08:00
Greetings all. Prajith, my co-worker ran the Networker job in verbose mode and we found a folder, right at 46 GB, that stopped the backup. He cautioned that the command you offered generated large log files, but a similar method worked great. I created an nsr.dir to skip that folder and the job is running! The folder resists all attempts to remove it, but at least with a directive in place backups run. Thank you all.
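In case it helps someone else, the directive looks roughly like the fragment below. This is a sketch from memory of the Networker directive syntax rather than a copy of our file, and ProblemFolder is a placeholder, not the real folder name. The nsr.dir file goes in the parent directory of the folder to be skipped; check the directive documentation for your Networker version before relying on it.

```text
skip: ProblemFolder
```

As I understand it, a directive without a leading + applies only to the directory that holds the nsr.dir file, which is what you want when skipping a single named subfolder.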