Clone fails immediately

Question

Hi community,

I have configured a lot of clone jobs in NetWorker.

Most of the clone jobs work fine, but some of them fail after a couple of seconds.

The strange thing: NetWorker backed up a group called "group01" yesterday. When I try to clone this group "group01", it fails, although I set the option "Include save set from the previous 7 days".

In the clone resource I selected the NetWorker group and the pool where the clone should reside. Other clone jobs with the same configuration run successfully.

The NetWorker log contains no failure message, only the message that the clone job starts.

Can anyone help me out?

Cheers

Jan

CarlosRojas · Answer

Hi Jan,

Have you tried to run the nsrclone command manually from the command line with debug options?

Have you tried to clone some save sets from NMC?

Is there any output in nsrtask.raw, or anything in daemon.raw?

Thank you.

Carlos.

JayJay9882 · Answer

Hi Carlos,

I started the clone from the command line. Starting it from NMC gives the same result: the clone job fails.

I didn't know there is an nsrtask.raw.

The output for the failed clone job is:

12/07/12 09:10:43 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Starting 'nsrtask', process 15851 on nwserver

12/07/12 09:10:44 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Launching: nsrclone -S -D 0 -s nwserver -b DATACLONE -d storagenode.ez.edeka.net -w "Sun Aug 12 09:10:43 2012" -y "Sun Aug 12 09:10:43 2012" -C 1 -l full -g Oracle_xyz -t "Thu Jul 5 09:10:43 2012" -e "Thu Jul 12 09:10:43 2012" on nwserver

12/07/12 09:10:44 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Monitoring job: 214575

12/07/12 09:10:49 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Indication received from job: no complete save sets to clone

12/07/12 09:10:49 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Job completion details: completion severity = 50

12/07/12 09:10:49 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Job completion details: completion status = failed

12/07/12 09:10:49 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] 39077 1342077049 0 0 0 2619994672 15854 0 nwserver nsrclone 11 error, %s%s 2 49 39 5813 31 no complete save sets to clone 0 0

12/07/12 09:10:50 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Completion status of job 214575 not available. Trying again in 2 seconds. 7 attempt(s) left

12/07/12 09:10:52 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Completion status of job 214575 not available. Trying again in 4 seconds. 6 attempt(s) left

12/07/12 09:10:57 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Completion status of job 214575 not available. Trying again in 8 seconds. 5 attempt(s) left

12/07/12 09:11:05 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Completion status of job 214575 not available. Trying again in 16 seconds. 4 attempt(s) left

12/07/12 09:11:21 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Completion status of job 214575 not available. Trying again in 32 seconds. 3 attempt(s) left

12/07/12 09:11:53 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Completion status of job 214575 not available. Trying again in 64 seconds. 2 attempt(s) left

12/07/12 09:12:58 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Completion status of job 214575 not available. Trying again in 128 seconds. 1 attempt(s) left

12/07/12 09:15:06 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Task result: Failed

12/07/12 09:15:06 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Updating task resource

12/07/12 09:15:06 2 0 0 3968234032 15851 0 nwserver nsrtask [clone.Oracle_xyz] Exiting 'nsrtask', process 15851 on nwserver
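To pull just the outcome lines for one clone task out of a busy nsrtask.raw, a grep filter like the following may help. It is only a sketch: it assumes every line of a task carries a tag of the form [clone.<name>], as in the excerpt above, and the function name is made up for illustration.

```shell
#!/bin/sh
# Print only the outcome lines of one clone task from an nsrtask.raw file.
# Usage: clone_task_summary <clone resource name> <path to nsrtask.raw>
# Hypothetical helper; the log path depends on your server's installation.
clone_task_summary() {
    task="$1"
    log="$2"
    # -F matches the bracketed tag literally, e.g. "[clone.Oracle_xyz]"
    grep -F "[clone.${task}]" "$log" |
        grep -E 'Launching:|Indication received|completion status|Task result'
}
```

For example, clone_task_summary Oracle_xyz nsrtask.raw shows the launched nsrclone command line, the "no complete save sets to clone" indication and the final task result, without the retry noise.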

I don't understand the message "no complete save sets to clone". The database I'm trying to clone is backed up every day, and there are definitely complete save sets.

~jan

CarlosRojas · Answer

Hi Jan,

Have you tried to clone only one of the save sets within the date range specified in the nsrclone command?

If so, what is the output?

There is a chance that the save sets are marked as suspect and are therefore not cloned. However, I think the problem is the command that was issued: some of the flags are not correct, or the combination of flags matches no complete save sets.

As per documentation, that output means:

no complete save sets to clone

Only complete save sets can be copied, and no complete save sets were found matching the requested command line parameters.

So I would revise the command line and use either fewer values or more precise ones.

If you pick one SSID and pass only that SSID and the destination pool on the command line, instead of the completion times, dates and so on, it should work fine.
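A sketch of the single-SSID approach as a small shell function. The server, pool and storage node names are the ones from the logged nsrclone command line (-s, -b, -d); the SSID is a placeholder, and the function only prints the command unless DRY_RUN=0 is set, so it is safe to try first.

```shell
#!/bin/sh
# Clone one save set by SSID. Prints the command by default (dry run);
# set DRY_RUN=0 to actually execute it.
# Server, pool and storage node are taken from the logged command; adjust them.
clone_one_ssid() {
    ssid="$1"
    # Build the argument list without string re-splitting
    set -- nsrclone -s nwserver -b DATACLONE -d storagenode.ez.edeka.net -S "$ssid"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Usage: clone_one_ssid 4123456789 prints the command; DRY_RUN=0 clone_one_ssid 4123456789 runs it. The SSID here is made up; take a real one from mminfo.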

Thank you.

Carlos.

JayJay9882 · Answer

Before I do any deep troubleshooting: I start the clone with this command: nsrtask clone.Oracle_xyz (where Oracle_xyz is the name of the clone resource). Is this correct?

JayJay9882 · Answer

Ok Carlos,

I have tried to clone just one save set and it worked.

I believe the clone goes into the failed state when there is no data to clone. For example, one clone job cloned a couple of save sets yesterday and ended in the succeeded state. The same clone ran again today and ended in the failed state. The database I wanted to clone in this example had a scheduled backup yesterday, but not today. Do you know what I mean?

I also tried setting the flag "continue on save set error", but it made no difference.

When I run

nsrtask clone.Oracle_xyz

the CLI shows me this command:

/usr/sbin/nsrclone -S -D 0 -s -b LOGSCLONE -d -w Mon Aug 13 12:35:00 2012 -y Mon Aug 13 12:35:00 2012 -F -C 1 -l full -g -t Fri Jul 6 12:35:00 2012 -e Fri Jul 13 12:35:00 2012

My goal is the following:

I have approximately 25 important databases which NetWorker backs up every night. After the backup jobs, at around six in the morning, I want to start the clone jobs. So I have one clone job for every database backup job.

ble1 · Answer

Just write a script, or do it manually until you find what is wrong... I assume each database is in a separate group, so simply run:

mminfo -t 'last 12 hours' -q group=<group name here> -r ssid,cloneid | awk '{print $1"/"$2}'

to get the input list, and run that list through the nsrclone command.
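ble1's one-liner could be wrapped roughly as follows. This is a sketch under assumptions: the mminfo options are the ones from the command above, the awk filter keeps only rows whose first two fields are numeric (so a column header is dropped), xargs -r (GNU) skips nsrclone when nothing matched, and the function names and pool argument are hypothetical.

```shell
#!/bin/sh
# Turn "mminfo -r ssid,cloneid" output into ssid/cloneid pairs, one per line.
# Non-numeric rows such as the column header are skipped.
to_ssid_cloneid() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $1 "/" $2 }'
}

# Hypothetical wrapper: clone whatever one group saved in the last 12 hours.
# Usage: clone_recent_group <group> <destination pool>
clone_recent_group() {
    group="$1"
    pool="$2"
    mminfo -t 'last 12 hours' -q "group=${group}" -r ssid,cloneid |
        to_ssid_cloneid |
        xargs -r nsrclone -b "$pool" -S
}
```

Splitting the parsing into its own function makes it easy to check the SSID list by hand before anything is cloned.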

masonb · Answer

JayJay,

I would have to research whether a clone job failing when there are no save sets to clone is correct behaviour. However, I don't see why you need 25 separate clone jobs to clone the important databases you want. Although the backups are started from different groups, I would assume you don't have 25 different pools but use one. If this is the case, setting the scheduled clone to pick the relevant save sets from the pool would be better, and it would also avoid a failure when a specific database backup failed or was suspended for some business reason, which I assume is why no save sets were available?

Hope this helps

Regards,

Bill Mason

JayJay9882 · Answer

Hi there,

Thanks for your responses so far.

Let me explain what I want to achieve:

Within NetWorker I have 50 database backup groups: 25 groups for the daily database backups and 25 groups for the archive log backups, which run every 10 minutes. All jobs are scheduled by a job scheduling tool (UC4). The data is stored in two pools, one for the database backups and one for the archive logs.

So I created two clone jobs for every database (database clone and archive log clone), 50 in total. All clone jobs are configured identically; only the storage node and pool differ.

These clone jobs are started by the job scheduler (by executing nsrtask clone.oracle_xyz) every morning at 6 o'clock, and every morning I find a different situation. Today, for example, 19 clone jobs went into the failed state right after they were started. Sometimes just a few fail, on other days more than 25 fail, and I can't see any pattern.

masonb · Answer

JayJay9882,

OK, I think I now understand. Each scheduled clone job queries the media database to see which save sets it has to clone, given the parameters you set when configuring the scheduled job.

In your case it identified several save sets totalling 350 GB, and this completed successfully.

You then re-ran the job and it failed because there were no save sets to clone: you had just cloned them all with the previous job, and I am assuming no further backups had been taken.

A more graceful way to end would arguably be to report "no save sets to clone" rather than fail. On the other hand, one could argue that we expect something to be there, so a failure highlights the fact that there is no data to clone.

It is a failure of sorts, but as there is nothing to clone you can ignore it.

Regards,

Bill Mason

masonb · Answer

JayJay9882,

OK, so your archive log pool volume(s) should always contain backups that need cloning, and so should the database pool volumes, as it is unlikely that all database backups either did not run or failed.

Depending on which storage nodes do the backups and which one is used for cloning, there may be a volume access issue that prevents the cloning from proceeding.

Archive log backups are usually very small, so filling a physical tape can take several days. As they run every 10 minutes, the volume is most likely in use for a backup and may therefore not be available to be read for the clones.

Do the database backups complete before 06:00 each day? If not, I would suggest delaying the cloning, since with the same pool we could still be writing to those volumes.

I would also stop the archive log backups in your scheduler for, say, 1 or 2 hours and run the cloning in that window, when the volumes are needed only for cloning. NetWorker has built-in logic whereby backups take precedence over cloning: if you start a clone and the volume it needs is also required for a backup, the clone is suspended once the save sets in progress at that time have finished, and the backup goes ahead.

It is probably a scheduling conflict that is giving you the unpredictable results, compounded by the number of jobs you have and the "limited volumes" used for the backups.

Consider having just two clone jobs, one for the archive log pool and one for the database pool, as previously suggested; you still end up with the copies you need.

Hope this helps,

Regards,

Bill Mason

JayJay9882 · Answer

Hi Bill!

Sorry, I wasn't clear on that.

I do the database and archive log backups to disk (DataDomain). Then I have four Quantum i500 libraries which are used only for the clone jobs.

So the production data always goes to the DataDomain.

The archive logs are cloned once a day, not every 10 minutes. I figured that six in the morning is a good time, since all the important backups are done by then. The clone jobs can run the whole day; they have no impact on production. The data flows from the DataDomain boxes to the Quantum robots at different locations.

I ran a test over the last few hours:

There is a clone job called Oracle_xyz, which should clone only the group named Oracle_xyz. I started the clone and NetWorker cloned a lot of data to tape (350 GB). Then I started the clone again and it failed immediately. Output in nsrtask.raw: "no complete save sets to clone"! What? Why? NetWorker just cloned 340 GB and should recognize that. Here is the NSR clone resource configuration:

type: NSR clone;
name: Oracle_xyz;
comment: "";
groups: Oracle_xyz;
clients: ;
save sets: ;
levels: full;
source pools: ;
start time: 7 day ago;
end time: ;
ssids: ;
force: Yes;
maximum instances: 1;
browse policy: Month;
retention policy: Month;
destination pool: DATACLONE;
destination storage node: storagenode01;
all: ;
failed: ;
good: ;
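To see how "start time: 7 day ago" becomes the -t/-e pair in the logged command (-t "Thu Jul 5 09:10:43 2012" -e "Thu Jul 12 09:10:43 2012"), here is a sketch of the same arithmetic. It assumes GNU date and works in UTC with the C locale to stay deterministic; NetWorker presumably uses the server's local time, so a real window could shift by an hour across a DST change. The function name is hypothetical.

```shell
#!/bin/sh
# Rebuild the -t/-e window of a scheduled clone with "start time: 7 day ago".
# Requires GNU date (-d, "@epoch" input and the %-d format flag).
CLONE_FMT='+%a %b %-d %H:%M:%S %Y'

clone_window() {
    launch="$1"          # launch time, e.g. "2012-07-12 09:10:43"
    days="${2:-7}"       # look-back in days (7 in this resource)
    end=$(LC_ALL=C TZ=UTC date -d "$launch" +%s)
    start=$((end - days * 86400))
    # First line corresponds to -t, second line to -e
    LC_ALL=C TZ=UTC date -d "@$start" "$CLONE_FMT"
    LC_ALL=C TZ=UTC date -d "@$end" "$CLONE_FMT"
}
```

The point it illustrates: the window is always the last seven days relative to launch, so a second run minutes after the first looks at the same save sets as the first run did.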

JayJay9882 · Answer

Bill,

I could do that, but then when something really fails, I wouldn't notice.

The clone job has a flag called "Continue on save set error". I thought this would prevent a clone job from going into the failed state.
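One way to keep noticing real failures while ignoring empty runs is to classify each clone task's last run from nsrtask.raw. A sketch, assuming the message texts match the nsrtask.raw excerpt earlier in this thread and that each run begins with a "Starting 'nsrtask'" line; the function name and output labels are made up.

```shell
#!/bin/sh
# Classify the most recent run of one clone task recorded in nsrtask.raw.
# Prints: nothing-to-clone, failed, unknown, or the "Task result:" text as-is.
clone_task_outcome() {
    task="$1"
    log="$2"
    # Keep only this task's lines, then only those from its last run
    last_run=$(grep -F "[clone.${task}]" "$log" |
        awk '/Starting .nsrtask./ { buf = "" } { buf = buf $0 "\n" } END { printf "%s", buf }')
    result=$(printf '%s\n' "$last_run" | grep 'Task result:' | tail -n 1)
    case "$result" in
        *"Task result: Failed"*)
            # An empty run is logged with this indication before the failure
            if printf '%s\n' "$last_run" | grep -q 'no complete save sets to clone'; then
                echo nothing-to-clone
            else
                echo failed
            fi
            ;;
        *"Task result:"*) echo "${result##*"Task result: "}" ;;
        *) echo unknown ;;
    esac
}
```

A scheduler wrapper could then alert only on "failed" and treat "nothing-to-clone" as informational.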
