June 28th, 2011 08:00

duplicate saveset

Hi all,

I'm having some problems with my Legato NetWorker 7.6.1.Build.397 running on Windows Server 2008.

I noticed duplicated save sets when opening "Volume Savesets" in the media pool selected for that job.

This happens for every group created, but only ever with the same clients.

What could it be?

Regards,

Sarah Milani

June 28th, 2011 09:00

Hello Sarah,

I have a couple of questions. When you say "duplicated save sets":

- Do you mean that the save set ID is the same?

- What kind of save sets are we talking about, are they file system backups or is there any module involved?

- Could you please provide us with an example?

Thank you.

Carlos.

June 28th, 2011 09:00

Can you confirm that these are not fragments of the same save sets that may be spanning tapes (check the flags)? You may want to use mminfo and post the results here if the results look suspect:

mminfo -v -q "ssid=ssid"

You want to look at the flag which indicates if the saveset is (c)omplete, (h)ead, (m)iddle or (t)ail.
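
For example, a minimal query for a single save set (the SSID below is just a placeholder; substitute your own):

mminfo -q "ssid=1234567890" -r "volume,client,ssid,sumflags,totalsize,name"

A save set contained on a single volume shows c in the flags column; one spanning tapes shows h on the first volume, m on any middle volume and t on the last.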

June 28th, 2011 11:00

This is happening every 2 hours, so have you set your group run interval to 2 hours instead of the default of 24 hours?
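
A quick way to verify, for example with nsradmin (groupname is a placeholder for your group):

C:>  nsradmin

nsradmin> . type: NSR group; name: groupname

nsradmin> show interval

nsradmin> print

nsradmin> quit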

June 28th, 2011 11:00

Hi Carlos,

I tried to post screenshots here, but it didn't work =/

The backup jobs are file system backups with different SSIDs, but the same job (executed once) creates 2 or more save sets on the same media, sometimes with different sizes and sometimes not.

See mminfo output below:

volume        type   client             date           time       size      ssid             fl  lvl   name

000053         LTO Ultrium-4 pavo     26/04/2011 08:44:34  173 GB 3702963931 cb full /u01/app/backup
000053         LTO Ultrium-4 pavo     26/04/2011 10:44:35  182 GB 3686193809 cb full /u01/app/backup
000053         LTO Ultrium-4 pavo     26/04/2011 12:44:36  182 GB 3669423795 cb full /u01/app/backup

This backup job was performed only once, but at exactly two-hour intervals (as you can see in the time field) it created the 3 save sets with different SSIDs shown in the output above.

June 28th, 2011 12:00

It would be useful to see the log output for this job. I am wondering if the file system backup completes but somehow the group decides it hasn't and restarts; I've seen something similar in the past, though I cannot remember the details at present!

When you say the event only occurs with some of the clients in the group, I am assuming that it is the same clients each time and that they consistently show this behaviour. Is the 2-hour separation we see in the example consistent too, or is that just coincidence (and does it correspond to the inactivity timeout on the group)?
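
To check that timeout, it is an attribute of the group resource; for example (groupname again being a placeholder):

nsradmin> . type: NSR group; name: groupname

nsradmin> show inactivity timeout

nsradmin> print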

June 28th, 2011 12:00

David,

The scheduled interval follows the default configuration (24:00).

Another important detail: this event occurs only with some clients of the group, not with the whole group.

June 28th, 2011 13:00

So, I've seen this event occur with the same set of clients. However, sometimes only one of them creates this duplication; in other cases two or more clients of this set (the problematic clients) create duplicated save sets on the same media.

In the case previously reported the interval is two hours, but in other identified cases it can vary; see the examples below:

000069         LTO Ultrium-4 birro    28/06/2011 02:48:52  100 GB 3305732595 cb full /u01/app/backup

000028         LTO Ultrium-4 birro    27/06/2011 11:37:57   38 GB 3607667725 cb full /u01/app/backup

(same client, different media, same job - run once)

000053         LTO Ultrium-4 anhuma   25/04/2011 16:30:19   88 GB 4155890051 cb full C:\Backup

000053         LTO Ultrium-4 anhuma   25/04/2011 20:28:53   88 GB 3853914476 cb full C:\Backup
000053         LTO Ultrium-4 anhuma   26/04/2011 00:28:57   22 GB 3753265623 tb full C:\Backup

(same client, same media, same job - run once - interval of 4 hours)

June 29th, 2011 01:00

Hello Sarah,

I agree with David that it would be interesting to check the output of this job.

As far as I know we have clarified that the interval set for the group is 24:00, correct?

Is this happening with only some clients within the same group or separate groups?

What is the OS of the clients having this "issue"?

Could you please run the group for the affected clients and provide us with the output? (Please compress the /nsr/tmp/sg/group_name_folder directory and attach it to your next reply.)

Thank you.

Carlos.

June 29th, 2011 07:00

Hi Rfreund

I have ruled out the possibility that the backup job was executed/started by any user through the software interface or even the command line.

I really believe that this is a critical problem, since the duplicate save set process consumes my network bandwidth, storage media and time.

=/

June 29th, 2011 07:00

@sarah

In general, there is nothing wrong with having multiple save sets on the same media - they have different ssids. It is only an issue when you do not expect this to happen.

Now, when NW does not initiate the backup, somebody else could be doing it. Do not forget that somebody at the client could create manual backups, even with levels (save -l level ...). And if he schedules this via Windows, then you will not see these jobs in the savegroup report, although the backups look alike. Also look at the scheduler on the server - there might be a forgotten task there as well. The commands below show one way to hunt for such tasks.
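
For example, on a Windows box you can list all scheduled jobs with:

schtasks /query /fo LIST /v

and on the UNIX clients check the cron tables (crontab -l for the relevant users, plus the /etc/cron* files). Any entry that invokes save or savegrp outside NetWorker's own schedule would explain the extra save sets.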

June 29th, 2011 10:00

Well, as this seems to be a repetitive process, let me suggest you use the Task Manager next time and look at the processes and the user who started them. You should see a savegrp process on the server that someone must have started. Then proceed from there.

BTW - a savegrp process will have index backups at the end - if they are missing, the backup must have been started from the client side (running save, not savegrp).
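
One way to check this with mminfo: index backups appear as save sets named index:clientname, so a query like the following (the client name is just an example) should list them:

mminfo -avot -q "name=index:birro" -r "client,name,group,savetime(20),sumflags"

If the suspicious save sets have no index backup around the same time, they were most likely started with save from the client.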

June 30th, 2011 02:00

Hi Sarah,

Having a log from the server would really help here. For example, focus on client birro (run twice). To check whether these were started from a group, do:

mminfo -avot -q client=birro -r "name,ssflags,sumflags,copies,volume,ssid,cloneid,group,sscreate(20),sscomp(20)"

Find the ones you are after and see if they have a group entry (I'm quite sure they do, as the level is shown as full, but this is to make sure a different group is not doing the same job).

One thing I noticed in your later example is the following:

000069         LTO Ultrium-4 birro    28/06/2011 02:48:52  100 GB 3305732595 cb full /u01/app/backup

000028         LTO Ultrium-4 birro    27/06/2011 11:37:57   38 GB 3607667725 cb full /u01/app/backup

000053         LTO Ultrium-4 anhuma   25/04/2011 16:30:19   88 GB 4155890051 cb full C:\Backup

000053         LTO Ultrium-4 anhuma   25/04/2011 20:28:53   88 GB 3853914476 cb full C:\Backup
000053         LTO Ultrium-4 anhuma   26/04/2011 00:28:57   22 GB 3753265623 tb full C:\Backup

The first two entries refer to client birro and they ran on different days. Both are full, but without knowing the configuration this may or may not be right.

The last 3 are from the client called anhuma and we see a sort of 4-hour cycle. What I find a bit strange here is that these instances are fulls of a certain mount point or folder, but despite that, and despite running close to each other, they vary in size. Further, the last 3 entries are on the same tape and apparently a complete save set (88GB), but the last one is marked as tb (tail), indicating there was already a session for a 3rd incarnation of the session run.

So to get a feeling for what is going on, one needs to take a look at the daemon log (which might be out of reach for anhuma, given that it was back in April), check the client configuration (multiple client instances in different groups with directives, for example) and check the mdb entries to get an idea of the whole workflow. If you could upload those here, or simply send them to your support, I'm quite sure some progress would be made.

At the moment there might be a few different things going on, but it would be guesswork. My favourite candidates are that either you get a keepalive timeout for the control session and the group, having retries, starts the save set again, or there are multiple instances doing the same thing. Of course, I might be 100% wrong too - without further data it is guesswork.
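
To check for multiple client instances with the same name, nsradmin can print every resource matching that name; if more than one prints, compare their groups and save sets (the attribute list here is just an example):

nsradmin> . type: NSR client; name: birro

nsradmin> show name; group; save set

nsradmin> print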

June 30th, 2011 06:00

Sarah

It may be useful for us to see the configuration of the group and client resources; you can output these with:

C:>  nsradmin

nsradmin>. type:NSR group;name:groupname;

nsradmin>print

nsradmin>. type:NSR client;name:birro;

nsradmin>print

nsradmin>exit

C:>

If you can paste the output here, we can see whether anything in the configuration may be causing the problem.

June 30th, 2011 07:00

First, thanks for ALL the help.
I'm posting here the output of the command "mminfo -avot -q client=birro -r "name,ssflags,sumflags,copies,volume,ssid,cloneid,group,sscreate(20),sscomp(20)"" for March and April:


name                           ssflags fl copies volume       ssid          clone id group            ss created         ss completed
/u01/app/backup         vF      cb      1 000043        4069241688  1301001049 GABD        24/03/2011  17:10:48 24/03/2011 17:19:53
/u01/app/backup         vF     cb      1  000045        3750792291  1301318766 GABD        28/03/2011 09:25:55  28/03/2011 09:49:04
/u01/app/backup         vF     cb      1  000050        1754682584  1301697756 GABD        01/04/2011 18:42:32  01/04/2011 19:03:10
/u01/app/backup         vF     cb      1  000051        3299102279  1302613588 GABD        12/04/2011 09:06:15  12/04/2011 09:59:30
/u01/app/backup         vF     cb      1  000051        2879760235  1302701947 GABD        13/04/2011 09:38:51  13/04/2011 10:31:05
/u01/app/backup         vF     cb      1  000030        3551309996  1303163063 GABD        18/04/2011 17:44:12  18/04/2011 18:10:58
/u01/app/backup         vF     cb      1  000030        3131920249  1303203718 GABD        19/04/2011 05:01:45  19/04/2011 06:13:59
/u01/app/backup         vF     cb      1  000048        2679019924  1303288227 GABD        20/04/2011 04:30:12  20/04/2011 05:19:41
/u01/app/backup         vF     cb      1  000046        4038453075  1303766867 GABD        25/04/2011 17:27:47  25/04/2011 17:45:00
/u01/app/backup         vF     cb      1  000041        3384354188  1303979411 GABD        28/04/2011 04:30:04  28/04/2011 05:51:50
/u01/app/backup         vF     cb      1  000041        3199812013  1303986611 GABD        28/04/2011 06:30:05  28/04/2011 07:09:16
/u01/app/backup         vF     cb      1  000027        2948231172  1304064013 GABD        29/04/2011 04:00:04  29/04/2011 05:25:56
/u01/app/backup         vF     cb      1  000056        4274284384  1304717163 GABD        06/05/2011 17:25:52  06/05/2011 18:04:05

The first two entries posted for birro show:

000069         LTO Ultrium-4 birro    28/06/2011 02:48:52  100 GB 3305732595 cb full /u01/app/backup

000028         LTO Ultrium-4 birro    27/06/2011 11:37:57   38 GB 3607667725 cb full /u01/app/backup

This backup job started on 27/06 and created the first save set at 11:37; however, the job was still running and at 02:48 (28/06) it created another entry, with a different size (100 GB).

name                           ssflags fl copies volume        ssid          clone id group            ss created         ss completed

/u01/app/backup         vF      cb      1 000028        3607667725  1309189151 NOVO        27/06/2011  11:38:53 27/06/2011 13:10:28
/u01/app/backup         vF     cb      1  000069        3305732595  1309243892 NOVO        28/06/2011 02:51:31  28/06/2011 04:16:03

These problematic clients are configured in only one group. I have created new groups and put one of these clients in each, but the result is the same. When I back up another folder on the same clients, the job completes successfully, without duplication.

The output below shows the daemon log for the last group created, named "NOVO", for this period (27/06 -> 28/06):

42506 27/06/2011 11:37:33  2 0 0 3820 3248 0 pinguim nsrd savegroup info: starting NOVO (with 11 client(s))

Unable to render the following message: 38758  1309189115 2 0 0 3820 3248 0 pinguim nsrd 9 %s %s: %s 3 0 5 media 0 7  warning 0 48 \\.\Tape2 reading: No more data is on the tape.

Unable to render the following message: 38758 1309189115 2 0 0  3820 3248 0 pinguim nsrd 9 %s %s: %s 3 0 5 media 0 7 warning 0 48  \\.\Tape2 reading: No more data is on the tape.

Unable to render the following message: 38758 1309189227 2 0 0  3820 3248 0 pinguim nsrd 9 %s %s: %s 3 0 5 media 0 7 warning 0 48  \\.\Tape2 reading: No more data is on the tape.

Unable to render the following message: 38758 1309189227 2 0 0  3820 3248 0 pinguim nsrd 9 %s %s: %s 3 0 5 media 0 7 warning 0 48  \\.\Tape2 reading: No more data is on the tape.

77568 28/06/2011 02:48:50  2 0 0 4032 2756 0 pinguim savegrp NOVO * vivi:/u01/app/backup Cannot determine status of backup process.  Use mminfo to determine job status.
77568 28/06/2011 02:48:50  2 0 0 4032 2756 0 pinguim savegrp NOVO * birro:/u01/app/backup Cannot determine status of backup process.  Use mminfo to determine job status.
77568 28/06/2011 02:48:51  2 0 0 4032 2756 0 pinguim savegrp NOVO * pavo:/u01/app/backup Cannot determine status of backup process.  Use mminfo to determine job status.
77568 28/06/2011 02:48:58  2 0 0 4032 2756 0 pinguim savegrp NOVO * anhuma:C:\Backup Cannot determine status of backup process.  Use mminfo to determine job status.
77567 28/06/2011 17:59:47  2 0 0 4032 2756 0 pinguim savegrp NOVO * vivi:/u01/app/backup See the file D:\Program Files\Legato\nsr\tmp\sg\NOVO\sso.000021 for output of save command.
77567 28/06/2011 17:59:47  2 0 0 4032 2756 0 pinguim savegrp NOVO * birro:/u01/app/backup See the file D:\Program Files\Legato\nsr\tmp\sg\NOVO\sso.000022 for output of save command.
77567 28/06/2011 17:59:48  2 0 0 4032 2756 0 pinguim savegrp NOVO * pavo:/u01/app/backup  See the file D:\Program Files\Legato\nsr\tmp\sg\NOVO\sso.000023 for output of save command.
77567 28/06/2011 17:59:56  2 0 0 4032 2756 0 pinguim savegrp NOVO * anhuma:C:\Backup See the file D:\Program Files\Legato\nsr\tmp\sg\NOVO\sso.000024 for output of save command.
42506 28/06/2011 18:01:17  2 0 0 3820 3248 0 pinguim nsrd savegroup  info: Added 'pinguim' to the group 'NOVO' for bootstrap backup
42506 29/06/2011 03:33:00  2 0 0 3820 3248 0 pinguim nsrd savegroup info: starting NOVO (with 11 client(s))

Legend:

Pinguim = backup server

Problematic Clients (in this case) = vivi, birro, pavo and anhuma

* At 03:33 (29/06/2011) the backup job started automatically (as scheduled).

* I could not find the April logs for the client "anhuma" case previously posted =/

June 30th, 2011 11:00

Sorry Sarah,

What I read does not really make sense to me. Let me explain:

-----------------------------------

The first two entries posted for birro show:

000069         LTO Ultrium-4 birro    28/06/2011 02:48:52  100 GB 3305732595 cb full /u01/app/backup

000028         LTO Ultrium-4 birro    27/06/2011 11:37:57   38 GB 3607667725 cb full /u01/app/backup

This backup job started on 27/06 and created the first save set at 11:37; however, the job was still running and at 02:48 (28/06) it created another entry, with a different size (100 GB).

name                            ssflags fl copies volume        ssid          clone id group             ss created         ss completed

/u01/app/backup          vF      cb      1 000028        3607667725  1309189151 NOVO         27/06/2011  11:38:53 27/06/2011 13:10:28
/u01/app/backup          vF     cb      1  000069        3305732595  1309243892 NOVO         28/06/2011 02:51:31  28/06/2011 04:16:03

-----------------------------------

According to the mminfo output, the backup of ssid 3607667725 took roughly 1.5 hrs - the duration for ssid 3305732595 is about the same.

However, just before that you also state that 'the backup of the first save set did not end after 14 hrs'. This just does not make sense, especially not for 100 GB on an LTO4. And all the other backups from March to May only needed a fraction of 14 hrs.

I just wonder whether this is due to cloning rather than to backup? However, on the other hand, there is only 1 copy, which means that there is only one instance.

In my eyes there is only one explanation - you might clone 'indirectly'. Do you perhaps use Save Set Consolidation? In that case, increasing fulls would also make sense, although the change is quite dramatic. Just look at the client's/group's schedule. And to verify the level, simply add 'level' to mminfo's report section, as in the example below.

Be aware that consolidated save sets will appear with the level full - but they need a level 1 backup for the consolidation to take place.
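
For example, taking the earlier query and adding level (adjust the client name as needed):

mminfo -avot -q "client=birro" -r "name,level,sumflags,ssid,group,sscreate(20),sscomp(20)"

If one of the suspicious save sets reports a level other than full, consolidation or a manual level backup is the likely source.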
