September 15th, 2010 09:00

DMZ Client Scheduled Backup Fails but savegrp Command Succeeds

I have a DMZ client for which the port range 7937-9936 has been opened, and the client and backup server can communicate without any issue. The problem is that the backup fails randomly on certain volumes with a "1 retry attempt" error. When I run the backup with the savegrp -v -G < > command, it completes successfully. Any ideas?

38 Posts

September 15th, 2010 10:00

Please include more of the error message, if any. Since it is a DMZ client, is there a time limit on connections? Firewalls can be configured to limit connection time.

Attached is the firewall doc. Please check the O/S logs for clues.

1 Attachment

50 Posts

September 15th, 2010 18:00

Thanks a lot. I don't know much about firewalls. I will check the logs, if any, and provide an update.

Today's scheduled backup completed, most probably because the amount of data to be backed up (incremental backup) for each volume was less than 2 GB. When the amount is 5 GB or more, some of the savesets start failing with 1 retry attempt. So I think that when this DMZ client runs a full backup, some of the savesets (a few are larger than 100 GB) will fail. I read an article about connection and timeout issues in firewall environments, and I think we may need to configure the TCP keepalive setting, but I don't know how to do it.
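For readers unfamiliar with what a TCP keepalive does, here is a minimal Python sketch of the mechanism at the level of a single socket. It is only an illustration, not how NetWorker is configured (the thread below deals with OS-wide settings), and the function name and the timing values are hypothetical.

```python
import socket

def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive probes for one socket (illustrative values)."""
    # Ask the OS to send keepalive probes on this connection when it is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timing options are platform-specific, so set only the ones
    # that this Python build exposes.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # interval between retry probes
        ("TCP_KEEPCNT", probes),        # maximum number of retry probes
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The idea is that probes sent during long idle periods keep the firewall from treating the connection as dead.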

50 Posts

September 15th, 2010 22:00

Do I create this environment variable on the client side? What does 20 mean? Do I need to restart the server?

I looked at the firewall doc sent by MIkeToo, and it looks like I need to edit the registry on the backup server and the client with something like the below. Any ideas?

Windows

Change the registry key:

HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\ :DWORD=

KeepAlive parameters by operating system

Operating System | Parameter: wait time before probing the connection | Parameter: interval between retry probes | Parameter: maximum retry probes | Unit of measure
AIX | tcp_keepidle | tcp_keepintvl | n/a | half-seconds
HP-UX 11i | tcp_time_wait_interval | tcp_keepalive_interval | tcp_keepalives_kill (1) | milliseconds
Linux | tcp_keepalive_time | tcp_keepalive_intvl | tcp_keepalive_probes | seconds
Solaris | tcp_time_wait_interval | tcp_keepalive_interval | n/a | milliseconds
Windows | KeepAliveTime | KeepAliveInterval | TcpMaxDataRetransmission | milliseconds
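Because each operating system in the table above uses a different unit, it is easy to enter a value that is off by a factor of 1000. The following Python sketch converts a wait time in seconds into the unit each OS expects. The function name is hypothetical and the units are taken only from the table.

```python
# Units per second for each operating system, as listed in the table.
UNITS_PER_SECOND = {
    "AIX": 2,            # half-seconds
    "HP-UX 11i": 1000,   # milliseconds
    "Linux": 1,          # seconds
    "Solaris": 1000,     # milliseconds
    "Windows": 1000,     # milliseconds
}

def keepalive_value(os_name: str, seconds: int) -> int:
    """Convert a wait time in seconds to the unit the OS parameter expects."""
    return seconds * UNITS_PER_SECOND[os_name]

if __name__ == "__main__":
    # 1200 seconds is only an example wait time.
    for name in UNITS_PER_SECOND:
        print(name, keepalive_value(name, 1200))
```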

55 Posts

September 15th, 2010 22:00

Hi,

Apologies, I misspelt the env variable. Just go through the KB article - https://solutions.emc.com/emcsolutionview.asp?id=esg91994. You will get a clear idea of what has to be done.

HTH,

Rovin D'Souza.

55 Posts

September 15th, 2010 22:00

Hi,

Create an environment variable “TCP_KEEPALIVE_WAIT” and assign it a value of “20”. In the group properties, change the timeout value to “0”.

HTH,

Rovin D'Souza.

50 Posts

September 15th, 2010 22:00

Thanks, Rovin. I will do that. I assume the servers need to be rebooted after applying this.

55 Posts

September 15th, 2010 22:00

If the machine in question is running Windows, then a change to an environment variable will need a reboot to be applied.

38 Posts

September 17th, 2010 18:00

Hi, has the issue been solved?

50 Posts

September 18th, 2010 19:00

Not yet. The network team told us that they set the TCP idle timeout to 1800 secs. Therefore, I edited the registry on the backup server and storage node to this value and rebooted the servers. I reran the backup but still got the same issue: the savesets failed with 1 retry attempt. The daemon.log file didn't tell me anything. However, when I use the command line, savegrp -v -G   to run the backup, the backup completes successfully. This tells me that the scheduled backup is not working but the command line is. Very strange.

7341 9/18/2010 7:36:44 PM  2 0 0 4444 2108 0 ausbupc180 savegrp ausagileprds07:C:\ unexpectedly exited.
7339 9/18/2010 7:36:44 PM  2 0 0 4444 2108 0 ausbupc180 savegrp ausagileprds07:C:\ will retry 1 more time(s)
38714 9/18/2010 7:36:45 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:D:\ done saving to pool 'filesystem' (AUC892L2) 3014 MB
7336 9/18/2010 7:36:45 PM  2 0 0 4444 2108 0 ausbupc180 savegrp Log file D:\apps\networker\nsr\tmp\sg\FS_DMZ_0100\sso.000016 is empty.
Unable to render the following message: savegrp:FS_DMZ_0100 * ausagileprds07:D:\ Cannot determine status of backup process.  Use mminfo to determine job status.

Unable to render the following message: .

7341 9/18/2010 7:36:45 PM  2 0 0 4444 2108 0 ausbupc180 savegrp ausagileprds07:D:\ unexpectedly exited.
7339 9/18/2010 7:36:45 PM  2 0 0 4444 2108 0 ausbupc180 savegrp ausagileprds07:D:\ will retry 1 more time(s)
42504 9/18/2010 7:36:48 PM  2 0 0 3424 3420 0 ausbupc180 nsrd media event cleared: Waiting for 1 writable volumes to backup pool 'Server' tape(s) on ausbupc180.aus.amer.dell.com
38718 9/18/2010 7:36:48 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausbupc180.aus.amer.dell.com:index:ausagileprds01 saving to pool 'Server' (AUC746L2) 
12361 9/18/2010 7:36:52 PM  2 0 0 3424 3420 0 ausbupc180 nsrd [Jukebox `AUSVTLPC1A03', operation # 123]. Finished with status: succeeded
0 9/18/2010 7:36:54 PM  2 0 0 3808 3804 0 ausbupc180 nsrmmdbd              pools supported: Server;
38714 9/18/2010 7:37:02 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausbupc180.aus.amer.dell.com:index:ausagileprds01 done saving to pool 'Server' (AUC746L2) 41 MB
38714 9/18/2010 7:37:08 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:VSS SYSTEM FILESET:\ done saving to pool 'filesystem' (AUD153L2) 2780 MB
Unable to render the following message: savegrp:FS_DMZ_0100 * ausagileprds07:VSS SYSTEM FILESET:\  See the file D:\apps\networker\nsr\tmp\sg\FS_DMZ_0100\sso.000019 for output of the job.

7341 9/18/2010 7:37:08 PM  2 0 0 4444 2108 0 ausbupc180 savegrp ausagileprds07:VSS SYSTEM FILESET:\ failed.
38718 9/18/2010 7:37:09 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:H:\ saving to pool 'filesystem' (AUC892L2) 
38718 9/18/2010 7:37:14 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:F:\ saving to pool 'filesystem' (AUC892L2) 
0 9/18/2010 7:37:19 PM  2 0 0 3808 3804 0 ausbupc180 nsrmmdbd              pools supported: filesystem, Server;
38718 9/18/2010 7:37:36 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:E:\ saving to pool 'filesystem' (AUC892L2) 
42506 9/18/2010 7:37:38 PM  2 0 0 3424 3420 0 ausbupc180 nsrd media info: WORM capable for device
\\.\Tape1 has been set
42506 9/18/2010 7:37:38 PM  2 0 0 3424 3420 0 ausbupc180 nsrd media info: verification of volume "AUC746L2", volid 1919721379 succeeded.
42506 9/18/2010 7:37:38 PM  2 0 0 3424 3420 0 ausbupc180 nsrd write completion notice: Writing to volume AUC746L2 complete
42506 9/18/2010 7:37:41 PM  2 0 0 3424 3420 0 ausbupc180 nsrd media info: WORM capable for device
\\.\Tape2 has been set
42506 9/18/2010 7:37:51 PM  2 0 0 3424 3420 0 ausbupc180 nsrd media info: WORM capable for device
\\.\Tape2 has been set
42506 9/18/2010 7:37:51 PM  2 0 0 3424 3420 0 ausbupc180 nsrd media info: verification of volume "AUD153L2", volid 127161781 succeeded.
42506 9/18/2010 7:37:51 PM  2 0 0 3424 3420 0 ausbupc180 nsrd write completion notice: Writing to volume AUD153L2 complete
0 9/18/2010 7:48:37 PM  2 0 0 3424 3420 0 ausbupc180 nsrd Operation 124 started : Unload jukebox device `\\.\Tape1'..
0 9/18/2010 7:48:37 PM  2 0 0 3424 3420 0 ausbupc180 nsrd Operation 125 started : Unload jukebox device `\\.\Tape2'..
38752 9/18/2010 7:48:38 PM  2 0 0 3424 3420 0 ausbupc180 nsrd
\\.\Tape1 Eject operation in progress
38752 9/18/2010 7:48:38 PM  2 0 0 3424 3420 0 ausbupc180 nsrd
\\.\Tape2 Eject operation in progress
67986 9/18/2010 7:49:02 PM  2 0 0 4480 4476 0 ausbupc180 nsrmmgd Unloading volume `AUD153L2' from device `\\.\Tape2' to slot 241.
67986 9/18/2010 7:49:08 PM  2 0 0 4480 4476 0 ausbupc180 nsrmmgd Unloading volume `AUC746L2' from device `\\.\Tape1' to slot 35.
12361 9/18/2010 7:49:20 PM  2 0 0 3424 3420 0 ausbupc180 nsrd [Jukebox `AUSVTLPC1A03', operation # 125]. Finished with status: succeeded
12361 9/18/2010 7:49:28 PM  2 0 0 3424 3420 0 ausbupc180 nsrd [Jukebox `AUSVTLPC1A03', operation # 124]. Finished with status: succeeded
38714 9/18/2010 7:50:45 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:G:\ done saving to pool 'filesystem' (AUC892L2) 2851 MB
Unable to render the following message: savegrp:FS_DMZ_0100 * ausagileprds07:G:\  See the file D:\apps\networker\nsr\tmp\sg\FS_DMZ_0100\sso.00001b for output of the job.

7341 9/18/2010 7:50:45 PM  2 0 0 4444 2108 0 ausbupc180 savegrp ausagileprds07:G:\ failed.
38718 9/18/2010 7:51:09 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:C:\ saving to pool 'filesystem' (AUC892L2) 
38714 9/18/2010 7:53:35 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:F:\ done saving to pool 'filesystem' (AUC892L2) 2884 MB
Unable to render the following message: savegrp:FS_DMZ_0100 * ausagileprds07:F:\  See the file D:\apps\networker\nsr\tmp\sg\FS_DMZ_0100\sso.00001d for output of the job.

7341 9/18/2010 7:53:35 PM  2 0 0 4444 2108 0 ausbupc180 savegrp ausagileprds07:F:\ failed.
38714 9/18/2010 7:53:36 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:H:\ done saving to pool 'filesystem' (AUC892L2) 2968 MB
Unable to render the following message: savegrp:FS_DMZ_0100 * ausagileprds07:H:\  See the file D:\apps\networker\nsr\tmp\sg\FS_DMZ_0100\sso.00001c for output of the job.

7341 9/18/2010 7:53:36 PM  2 0 0 4444 2108 0 ausbupc180 savegrp ausagileprds07:H:\ failed.
38718 9/18/2010 7:53:59 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:D:\ saving to pool 'filesystem' (AUC892L2) 
38714 9/18/2010 7:54:00 PM  0 0 2 3424 3420 0 ausbupc180 nsrd ausagileprds07:E:\ done saving to pool 'filesystem' (AUC892L2) 2847 MB
Unable to render the following message: savegrp:FS_DMZ_0100 * ausagileprds07:E:\  See the file D:\apps\networker\nsr\tmp\sg\FS_DMZ_0100\sso.00001e for output of the job.

50 Posts

October 2nd, 2010 09:00

Hi all,

The DMZ backup issue was fixed after we updated the KeepAliveTime value on the backup server, the storage nodes, and also the client itself. Note that you need to know the TCP idle timeout value on the firewall side and adjust the value accordingly. Thanks, all.
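As a rough illustration of "adjust the value accordingly": keepalive probing is generally set to start before the firewall's idle timeout expires, so that the firewall sees traffic on the connection in time. The Python sketch below derives a Windows KeepAliveTime in milliseconds from a firewall idle timeout in seconds. The function name and the 50% margin are assumptions made for this example, not values given in this thread.

```python
def keepalive_ms_for_firewall(idle_timeout_s: int, margin: float = 0.5) -> int:
    """Return a keepalive wait time in ms that is a fraction of the firewall timeout.

    margin is the fraction of the firewall idle timeout to wait before the
    first probe (hypothetical default of 0.5, an assumption for this sketch).
    """
    if idle_timeout_s <= 0:
        raise ValueError("idle_timeout_s must be positive")
    if not 0 < margin <= 1:
        raise ValueError("margin must be in the range (0, 1]")
    return int(idle_timeout_s * margin * 1000)

if __name__ == "__main__":
    # 1800 seconds is the firewall TCP idle timeout reported in this thread.
    print(keepalive_ms_for_firewall(1800))
```

The same value would have to be applied on every host involved in the connection, which in this thread meant the backup server, the storage nodes, and the client.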
