
This post is more than 5 years old

September 5th, 2017 07:00

Clone Jobs failing - NW 9.1.1.3

Hello,

OS - Windows Server 2008 R2

2 - NW Servers - 9.1.1.3

2 - Backup Repository Data Domains DD670 OS: 5.6.2.0-543944

We clone backups to the Data Domain at each office over an MPLS network (50MB).

Cloning was working fine before we upgraded to 9.1.1.3 from version 8.2. The failures seem random at this point: some days the jobs clone and other days they do not. The error message I am receiving is below.

2 Intern

14.3K Posts

September 5th, 2017 10:00

You can try disabling the RPS cloning feature (in the server properties) to see if that makes any difference. According to the logs, nsrrecopy times out while connecting to your storage node; it could be that RPS is too aggressive (if you use it).
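To check that theory against your own job logs, here is a minimal Python sketch that pulls out the nsrrecopy entries ending in "Timed out." and measures the gap between them. It assumes the clone job log has been saved as plain text with one "M/D/YYYY H:MM:SS AM|PM message" entry per line, as in the log posted in this thread; `storage_node_timeouts` is a hypothetical helper, not a NetWorker tool.

```python
import re
from datetime import datetime

# Assumed layout: one clone-log entry per line, "M/D/YYYY H:MM:SS AM|PM <message>".
ENTRY = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (.*)$")
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def storage_node_timeouts(log_text):
    """Return the timestamps of nsrrecopy entries that report a time-out."""
    hits = []
    for raw in log_text.splitlines():
        match = ENTRY.match(raw.strip())
        if not match:
            continue  # skip lines that do not start with a timestamp
        stamp, message = match.groups()
        # Only nsrrecopy failures that end in "Timed out." are collected.
        if "nsrrecopy" in message and "Timed out" in message:
            hits.append(datetime.strptime(stamp, STAMP_FORMAT))
    return hits


# Three entries copied from the clone log in this thread.
EXCERPT = """\
9/4/2017 10:00:39 PM Send EOF to backend : job Id(98772).
9/4/2017 10:30:37 PM 98519:nsrrecopy: Unable to setup direct save with server stlpputl03: Failed to connect to storage node on stlpputl03.ccaglobal.local: nsrrecopy failed to authenticate with nsrmmd on host stlpputl03.ccaglobal.local: Timed out.
9/4/2017 11:03:30 PM 98519:nsrrecopy: Unable to setup direct save with server stlpputl03: Failed to connect to storage node on stlpputl03.ccaglobal.local: nsrrecopy failed to authenticate with nsrmmd on host stlpputl03.ccaglobal.local: Timed out.
"""

if __name__ == "__main__":
    timeouts = storage_node_timeouts(EXCERPT)
    # Print the spacing between consecutive time-outs.
    for earlier, later in zip(timeouts, timeouts[1:]):
        gap = (later - earlier).total_seconds()
        print(f"{later:%I:%M:%S %p} came {gap:.0f} s after {earlier:%I:%M:%S %p}")
```

On this excerpt it finds two time-outs, the second 1973 s (about 33 minutes) after the first.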

4 Posts

September 5th, 2017 07:00

9/4/2017 10:00:07 PM Action clone 'clone' has initialized as 'clone job' with job id 98770
9/4/2017 10:00:23 PM extend reservation timeout for reservation id(803).
9/4/2017 10:00:23 PM 09/04/17 22:00:23.636728 launching backend job on host stlpputl03.ccaglobal.local
9/4/2017 10:00:23 PM Launch backend job: job Id (98772)
9/4/2017 10:00:29 PM Backend started: job Id(98772).
9/4/2017 10:00:29 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:00:39 PM Send EOF to backend : job Id(98772).
9/4/2017 10:05:30 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:10:31 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:15:32 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:20:33 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:25:34 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:30:35 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:30:37 PM 98519:nsrrecopy: Unable to setup direct save with server stlpputl03: Failed to connect to storage node on stlpputl03.ccaglobal.local: nsrrecopy failed to authenticate with nsrmmd on host stlpputl03.ccaglobal.local: Timed out.
9/4/2017 10:33:00 PM 98519:nsrrecopy: Unable to setup direct save with server stlpputl03: retry needed.
9/4/2017 10:35:36 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:40:37 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:45:38 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:50:39 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 10:55:40 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 11:00:41 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 11:03:30 PM 98519:nsrrecopy: Unable to setup direct save with server stlpputl03: Failed to connect to storage node on stlpputl03.ccaglobal.local: nsrrecopy failed to authenticate with nsrmmd on host stlpputl03.ccaglobal.local: Timed out.
9/4/2017 11:05:42 PM extend reservation timeout for backend: job Id(98772).
9/4/2017 11:06:13 PM 98519:nsrrecopy: Unable to setup direct save with server stlpputl03: retry needed.
9/4/2017 11:06:43 PM 108635:nsrrecopy: Unable to initialize save session with stlpputl03 for STLSQLDW02:: retry needed
9/4/2017 11:06:43 PM retry needed 5792:nsrrecopy: Failed : Destination volume not ready for cloning
9/4/2017 11:06:43 PM 108643:nsrrecopy: Cannot start save session
9/4/2017 11:06:56 PM Backend exited: job Id(98772).
9/4/2017 11:06:56 PM [ORIGINAL REQUESTED SAVESETS]
9/4/2017 11:06:56 PM 2964197342/1504579551, 2980974558/1504579551,
9/4/2017 11:06:56 PM 2947420126/1504579551, 3014528989/1504579550,
9/4/2017 11:06:56 PM 3031306205/1504579550, 3081637853/1504579547,
9/4/2017 11:06:56 PM 2930642916/1504579556, 2880311274/1504579562,
9/4/2017 11:06:56 PM 2997751774/1504579551, 2813202550/1504579703,
9/4/2017 11:06:56 PM 2829979761/1504579698, 3098415069/1504579547,
9/4/2017 11:06:56 PM 2863534058/1504579563, 2796425584/1504579952,
9/4/2017 11:06:56 PM 2846756842/1504579563, 2779648374/1504579958,
9/4/2017 11:06:56 PM 2913865700/1504579556, 2762871247/1504580048,
9/4/2017 11:06:56 PM 2897088484/1504579557, 2746094037/1504580054,
9/4/2017 11:06:56 PM 3048083421/1504579547, 3064860637/1504579547,
9/4/2017 11:06:56 PM 2712539943/1504580391, 2729317153/1504580385;
9/4/2017 11:06:56 PM [CLONE FAILED SAVESETS]
9/4/2017 11:06:56 PM 2964197342/1504579551, 2980974558/1504579551,
9/4/2017 11:06:56 PM 2947420126/1504579551, 3014528989/1504579550,
9/4/2017 11:06:56 PM 3031306205/1504579550, 3081637853/1504579547,
9/4/2017 11:06:56 PM 2930642916/1504579556, 2880311274/1504579562,
9/4/2017 11:06:56 PM 2997751774/1504579551, 2813202550/1504579703,
9/4/2017 11:06:56 PM 2829979761/1504579698, 3098415069/1504579547,
9/4/2017 11:06:56 PM 2863534058/1504579563, 2796425584/1504579952,
9/4/2017 11:06:56 PM 2846756842/1504579563, 2779648374/1504579958,
9/4/2017 11:06:56 PM 2913865700/1504579556, 2762871247/1504580048,
9/4/2017 11:06:56 PM 2897088484/1504579557, 2746094037/1504580054,
9/4/2017 11:06:56 PM 3048083421/1504579547, 3064860637/1504579547,
9/4/2017 11:06:56 PM 2712539943/1504580391, 2729317153/1504580385;
9/4/2017 11:06:57 PM Action clone 'clone' with job id 98770 is exiting with status 'failed', exit code 1
9/4/2017 11:06:57 PM NSRCLONE failed for one or more savesets.
9/4/2017 11:06:57 PM Clone operation failed for one or more save sets.
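To see which save sets actually made it, a small sketch like the following can split the two lists at the end of the log and compare them. It assumes the same one-entry-per-line layout, treats each "[SECTION NAME]" header as the start of a list of ssid/cloneid pairs terminated by ";", and uses a hypothetical helper name (`savesets_by_section`); it is not an official NetWorker parser.

```python
import re

# Each log entry starts with an "M/D/YYYY H:MM:SS AM|PM" stamp. It is removed
# first so the date itself is not mistaken for an ssid/cloneid pair.
TIMESTAMP = re.compile(r"^\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M\s*")
HEADER = re.compile(r"^\[([A-Z ]+)\]")  # e.g. "[CLONE FAILED SAVESETS]"
PAIR = re.compile(r"(\d+)/(\d+)")       # "ssid/cloneid"


def savesets_by_section(log_text):
    """Map each bracketed section name to its (ssid, cloneid) pairs."""
    sections = {}
    current = None
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            sections[current] = []
            continue
        if current is None:
            continue  # entries outside a saveset section are ignored
        sections[current].extend((int(s), int(c)) for s, c in PAIR.findall(line))
        if line.endswith(";"):  # ";" terminates the list for this section
            current = None
    return sections


# Abbreviated excerpt of the log above (first and last row of each list).
SAMPLE = """\
9/4/2017 11:06:56 PM [ORIGINAL REQUESTED SAVESETS]
9/4/2017 11:06:56 PM 2964197342/1504579551, 2980974558/1504579551,
9/4/2017 11:06:56 PM 2712539943/1504580391, 2729317153/1504580385;
9/4/2017 11:06:56 PM [CLONE FAILED SAVESETS]
9/4/2017 11:06:56 PM 2964197342/1504579551, 2980974558/1504579551,
9/4/2017 11:06:56 PM 2712539943/1504580391, 2729317153/1504580385;
9/4/2017 11:06:57 PM Clone operation failed for one or more save sets.
"""

if __name__ == "__main__":
    sections = savesets_by_section(SAMPLE)
    requested = set(sections["ORIGINAL REQUESTED SAVESETS"])
    failed = set(sections["CLONE FAILED SAVESETS"])
    print(f"requested={len(requested)} failed={len(failed)} cloned={sorted(requested - failed)}")
```

On this excerpt every requested save set also appears in the failed list, so `cloned` comes out empty, which matches the job's 'failed' exit status.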

4 Posts

September 5th, 2017 10:00

We disabled RPS in the past because of an issue in a different NetWorker version we were on. Disabling this feature slowed things down. We were on NetWorker 9.1.0.2, and it was recommended that we go to 9.1.1.3 so we could use this feature.

What about increasing the “Save Mount Timeout”? Would this help?

2 Intern

14.3K Posts

September 7th, 2017 11:00

Save mount timeout is something that was used in the past with tapes; I don't think it is related to your case. Disabling RPS will give you normal (default) cloning speeds. In this case, just try disabling it to see if that makes any difference. If it does, you know you have isolated the issue to a certain code range where the problem is coming from, and this can speed up work with engineering when requesting a fix.

1 Rookie

22 Posts

September 13th, 2017 14:00

We are seeing some of the same messages on 9.1.1.3, related to enabling RPS, but we're backing up to tape.

We also tried enabling RPS in another 9.1.0.5 environment and were unable to do a DD-to-DD clone; disabling it again allowed the clone to work.

Still trying to come to a better understanding of how RPS plays a role in all of this.

I think EMC shipped this feature way too prematurely.

6 Posts

September 13th, 2017 14:00

Hi there!

Same problem here, with a 9.1.1.3 server on RHEL Linux.

Clones are hanging, and the server's logs show "failed to authenticate with storage node" messages and time-outs with the server.

RPS cloning is already disabled.

Last time, I had to restart the NW service on the storage node (and kill a process because it wouldn't stop), and then the clones finished successfully.

You can try that...

But I don't understand why the server cannot authenticate with the SN. The SN's logs show no authentication failures or anything like that.

4 Posts

September 14th, 2017 06:00

I disabled RPS and rebooted both NW servers. Now everything is cloning fine (for now), and it has been cloning successfully for a week. I still have other small issues about once a week, unrelated to cloning. I will be glad when we have money in the budget to move to Veeam. This new NetWorker software seems to have a lot more bugs than I would normally like to see. It has taken me over a month to get backups running with minimal issues. Backups should not be this hard!

Thanks to all of those that replied.
