duplicate saveset
Hi all,
I'm having some problems with my Legato NetWorker 7.6.1 (Build 397) running on Windows Server 2008.
I noticed duplicated save sets when opening "Volume Save Sets" for the media pool selected for that job.
This happens for every group I have created, but always with the same clients.
What could it be?
Regards,
Sarah Milani
36115_carloscor
217 Posts
June 28th, 2011 09:00
Hello Sarah,
I have a couple of questions. When you say "Duplicated save sets"
- Do you mean that the save set ID is the same?
- What kind of save sets are we talking about, are they file system backups or is there any module involved?
- Could you please provide us with an example?
Thank you.
Carlos.
DavidHampson-rY
294 Posts
June 28th, 2011 09:00
Can you confirm that these are not fragments of the same save sets that may be spanning tapes? You may want to use mminfo and post the results here if they look suspect:
mminfo -v -q "ssid=ssid"
Look at the flag which indicates whether the save set is (c)omplete, (h)ead, (m)iddle or (t)ail.
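As a quick illustration of reading that flag column, here is a minimal sketch; the mminfo rows below are invented sample data (in practice you would pipe in real output from a query like the one above), and only the filtering step is the point:

```shell
# Sketch: classify mminfo entries by the first letter of the flags column.
# Sample rows are invented for illustration; columns here are
# volume, client, ssid, flags, level, name.
cat > /tmp/mminfo_sample.txt <<'EOF'
000053 pavo 3702963931 cb full /u01/app/backup
000053 pavo 3669423795 hb full /u01/app/backup
000054 pavo 3669423795 tb full /u01/app/backup
EOF

# c = complete, h = head, m = middle, t = tail.
awk '{
    f = substr($4, 1, 1)
    tag = (f == "c") ? "complete" : "fragment(" f ")"
    print $3, tag
}' /tmp/mminfo_sample.txt
```

A fragment pair (head on one tape, tail on another) would be normal spanning, not duplication.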
DavidHampson-rY
294 Posts
June 28th, 2011 11:00
This is happening every 2 hours, so have you set your group run interval to 2 hours instead of the default of 24 hours?
SarahMilani
21 Posts
June 28th, 2011 11:00
Hi Carlos,
I tried to post the screenshots here, but it didn't work =/
The backup jobs are file system backups with different SSIDs, but the same job (executed once) creates 2 or more save sets on the same media, sometimes with different sizes.
See mminfo output below:
volume type client date time size ssid fl lvl name
000053 LTO Ultrium-4 pavo 26/04/2011 08:44:34 173 GB 3702963931 cb full /u01/app/backup
000053 LTO Ultrium-4 pavo 26/04/2011 10:44:35 182 GB 3686193809 cb full /u01/app/backup
000053 LTO Ultrium-4 pavo 26/04/2011 12:44:36 182 GB 3669423795 cb full /u01/app/backup
This backup job was performed only once, but at exact two-hour intervals (as you can see in the time field) it created the 3 save sets with different SSIDs shown above.
DavidHampson-rY
294 Posts
June 28th, 2011 12:00
It would be useful to see the log output for this job. I am wondering if the file system backup completes but somehow the group decides it hasn't and restarts. I've seen something similar in the past, though I cannot remember the details at present!
When you say the event only occurs with some of the clients in the group I am making the assumption that it is the same clients each time and they consistently show this behaviour. Is the 2 hour separation we see in the example consistent too or is that just coincidence (and does this correspond to the inactivity timeout on the group)?
SarahMilani
21 Posts
June 28th, 2011 12:00
David,
The scheduled interval follows the default configuration (24:00).
Another important detail: this event occurs only with some clients of the group, not with the whole group.
SarahMilani
21 Posts
June 28th, 2011 13:00
So, I've seen this event occur with the same set of clients. Sometimes only one of them produces the duplication; in other cases two or more clients of this problematic set create duplicated save sets on the same media.
In the case reported previously the interval was two hours, but in other cases it can vary; see the examples below:
000069 LTO Ultrium-4 birro 28/06/2011 02:48:52 100 GB 3305732595 cb full /u01/app/backup
000028 LTO Ultrium-4 birro 27/06/2011 11:37:57 38 GB 3607667725 cb full /u01/app/backup
(same client, different media, same job, run once)
000053 LTO Ultrium-4 anhuma 25/04/2011 16:30:19 88 GB 4155890051 cb full C:\Backup
000053 LTO Ultrium-4 anhuma 25/04/2011 20:28:53 88 GB 3853914476 cb full C:\Backup
000053 LTO Ultrium-4 anhuma 26/04/2011 00:28:57 22 GB 3753265623 tb full C:\Backup
(same client, same media, same job, run once, four-hour interval)
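These duplicates can also be spotted programmatically. A rough sketch, using the rows above as sample input (the field positions assume exactly this volume/type/client/date/time/size/ssid layout):

```shell
# Sketch: count save-set instances per client + save-set name from
# mminfo-style output. Sample rows copied from the examples above.
cat > /tmp/savesets.txt <<'EOF'
000053 LTO Ultrium-4 anhuma 25/04/2011 16:30:19 88 GB 4155890051 cb full C:\Backup
000053 LTO Ultrium-4 anhuma 25/04/2011 20:28:53 88 GB 3853914476 cb full C:\Backup
000053 LTO Ultrium-4 anhuma 26/04/2011 00:28:57 22 GB 3753265623 tb full C:\Backup
000069 LTO Ultrium-4 birro 28/06/2011 02:48:52 100 GB 3305732595 cb full /u01/app/backup
000028 LTO Ultrium-4 birro 27/06/2011 11:37:57 38 GB 3607667725 cb full /u01/app/backup
EOF

# $4 is the client, $12 the save-set name; more than one row per pair
# from a single scheduled run suggests a duplicate, not a re-run.
awk '{count[$4 " " $12]++}
     END {for (k in count) if (count[k] > 1) print k, "->", count[k], "instances"}' \
    /tmp/savesets.txt
```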
36115_carloscor
217 Posts
June 29th, 2011 01:00
Hello Sarah,
I agree with David that it would be interesting to check the output of this job.
As far as I know we have clarified that the interval set for the group is 24:00, correct?
Is this happening with only some clients within the same group or separate groups?
What is the OS of the clients having this "issue"?
Could you please run the group for those affected clients in one group, and provide us with the output? (Please compress and attach in your next reply /nsr/tmp/sg/group_name_folder)
Thank you.
Carlos.
SarahMilani
21 Posts
June 29th, 2011 07:00
Hi RFreund,
I have ruled out the possibility that the backup job was executed/started by a user through the software interface or even the command line.
I really believe this is a critical problem, since the duplicate save set process consumes my network bandwidth, storage media and time.
=/
RFreund1
47 Posts
June 29th, 2011 07:00
@sarah
In general, there is nothing wrong with having multiple save sets on the same media, since they have different ssids. It is only an issue when you do not expect it to happen.
Now, when NW does not initiate the backup, somebody else could. Do not forget that somebody on the client could create manual backups, even with levels (save -l level ...). And if he schedules this via Windows, then you will not see these jobs in the savegroup report, although the backups look alike. Also look at the scheduler on the server - there might be a forgotten task there as well.
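Checking for such a forgotten task on a Windows machine could look like the sketch below. On the box itself you would run "schtasks /query /fo LIST /v"; the sample output (task names, paths) is invented here purely to illustrate the filtering step:

```shell
# Sketch: look for scheduled tasks that invoke NetWorker's save command.
# The schtasks-style output below is invented sample data.
cat > /tmp/schtasks_sample.txt <<'EOF'
TaskName: \NightlyCleanup
Task To Run: C:\Windows\system32\cleanmgr.exe
TaskName: \LegacyBackup
Task To Run: "C:\Program Files\Legato\nsr\bin\save.exe" -l full C:\Backup
EOF

# Any hit here is a backup running outside the NetWorker savegroup.
grep -i -B1 'save.exe' /tmp/schtasks_sample.txt
```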
bingo.1
2.4K Posts
June 29th, 2011 10:00
Well, as this seems to be a repetitive process, let me suggest you use the Task Manager next time and look at processes and the user who started them. You should see a savegrp process on the server that someone must have started. Then proceed from here.
BTW - a savegrp process will have index backups at the end. If they are missing, the backup must have been started from the client side (running save, not savegrp).
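That heuristic can be checked from mminfo output. A minimal sketch, assuming index save sets appear with names of the form "index:<client>" (the sample names below are invented; in practice you would feed in something like mminfo -q "group=NOVO" -r name):

```shell
# Sketch: a savegrp run ends with client-index save sets; a manual
# "save" from the client does not produce them. Sample names invented.
cat > /tmp/names.txt <<'EOF'
/u01/app/backup
C:\Backup
index:birro
bootstrap
EOF

if grep -q '^index:' /tmp/names.txt; then
    echo "index save sets present: likely started by savegrp"
else
    echo "no index save sets: possibly started manually on the client"
fi
```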
ble1
2 Intern
14.3K Posts
June 30th, 2011 02:00
Hi Sarah,
Having a log from the server would really help here. For example, focus on client birro (run twice). To check whether this was started by a group, do:
mminfo -avot -q client=birro -r "name,ssflags,sumflags,copies,volume,ssid,cloneid,group,sscreate(20),sscomp(20)"
Find the entries you are after and see if they have a group entry (I'm quite sure they do, as the level is shown as full, but this is to make sure a different group is not doing the same job).
One thing I noticed in your later example is following:
000069 LTO Ultrium-4 birro 28/06/2011 02:48:52 100 GB 3305732595 cb full /u01/app/backup
000028 LTO Ultrium-4 birro 27/06/2011 11:37:57 38 GB 3607667725 cb full /u01/app/backup
000053 LTO Ultrium-4 anhuma 25/04/2011 16:30:19 88 GB 4155890051 cb full C:\Backup
000053 LTO Ultrium-4 anhuma 25/04/2011 20:28:53 88 GB 3853914476 cb full C:\Backup
000053 LTO Ultrium-4 anhuma 26/04/2011 00:28:57 22 GB 3753265623 tb full C:\Backup
The first two entries refer to client birro and they ran on different days. Both are full; without knowing the configuration this may or may not be right. The last 3 are from a client called anhuma, and we see a sort of four-hour cycle.
What I find a bit strange here is that these instances are fulls of a certain mount point or folder, yet despite running close to each other they vary in size. Further, the last 3 entries are on the same tape and apparently form a complete save set (88 GB), but the last one is marked as tb (tail), indicating there was already a session for a third incarnation of the run.
So to get a feeling for what is going on, one needs to look at the daemon log (which might be out of reach for anhuma, given that it was back in April), check the client configuration (multiple client instances in different groups with directives, for example) and check the mdb entries to get an idea of the whole workflow. If you could upload those here, or simply send them to your support, I'm quite sure some progress would be made.
At the moment there might be a few different things going on, but it would be guesswork. My favourite candidates are that either you get a keepalive timeout for the control session and the group has retries, so it restarts, or there are multiple instances doing the same thing. Of course, I might be 100% wrong too - without further data it is guesswork.
DavidHampson-rY
294 Posts
June 30th, 2011 06:00
Sarah
It may be useful for us to see the configuration of the group and client resources; you can output these with:
C:> nsradmin
nsradmin>. type:NSR group;name:groupname;
nsradmin>print
nsradmin>. type:NSR client;name:birro;
nsradmin>print
nsradmin>exit
C:>
If you can paste the output here, we can see whether anything about the configuration may be causing the problem.
SarahMilani
21 Posts
June 30th, 2011 07:00
First, thanks for ALL the help.
I'm posting here the output of the command "mminfo -avot -q client=birro -r "name,ssflags,sumflags,copies,volume,ssid,cloneid,group,sscreate(20),sscomp(20)" for March and April:
name ssflags fl copies volume ssid clone id group ss created ss completed
/u01/app/backup vF cb 1 000043 4069241688 1301001049 GABD 24/03/2011 17:10:48 24/03/2011 17:19:53
/u01/app/backup vF cb 1 000045 3750792291 1301318766 GABD 28/03/2011 09:25:55 28/03/2011 09:49:04
/u01/app/backup vF cb 1 000050 1754682584 1301697756 GABD 01/04/2011 18:42:32 01/04/2011 19:03:10
/u01/app/backup vF cb 1 000051 3299102279 1302613588 GABD 12/04/2011 09:06:15 12/04/2011 09:59:30
/u01/app/backup vF cb 1 000051 2879760235 1302701947 GABD 13/04/2011 09:38:51 13/04/2011 10:31:05
/u01/app/backup vF cb 1 000030 3551309996 1303163063 GABD 18/04/2011 17:44:12 18/04/2011 18:10:58
/u01/app/backup vF cb 1 000030 3131920249 1303203718 GABD 19/04/2011 05:01:45 19/04/2011 06:13:59
/u01/app/backup vF cb 1 000048 2679019924 1303288227 GABD 20/04/2011 04:30:12 20/04/2011 05:19:41
/u01/app/backup vF cb 1 000046 4038453075 1303766867 GABD 25/04/2011 17:27:47 25/04/2011 17:45:00
/u01/app/backup vF cb 1 000041 3384354188 1303979411 GABD 28/04/2011 04:30:04 28/04/2011 05:51:50
/u01/app/backup vF cb 1 000041 3199812013 1303986611 GABD 28/04/2011 06:30:05 28/04/2011 07:09:16
/u01/app/backup vF cb 1 000027 2948231172 1304064013 GABD 29/04/2011 04:00:04 29/04/2011 05:25:56
/u01/app/backup vF cb 1 000056 4274284384 1304717163 GABD 06/05/2011 17:25:52 06/05/2011 18:04:05
The first two entries of birro posted earlier show:
000069 LTO Ultrium-4 birro 28/06/2011 02:48:52 100 GB 3305732595 cb full /u01/app/backup
000028 LTO Ultrium-4 birro 27/06/2011 11:37:57 38 GB 3607667725 cb full /u01/app/backup
This backup job started on 27/06 and created the first save set at 11:37; however, the job was still running, and at 02:48 (28/06) it created another entry with a different size (100 GB).
name ssflags fl copies volume ssid clone id group ss created ss completed
/u01/app/backup vF cb 1 000028 3607667725 1309189151 NOVO 27/06/2011 11:38:53 27/06/2011 13:10:28
/u01/app/backup vF cb 1 000069 3305732595 1309243892 NOVO 28/06/2011 02:51:31 28/06/2011 04:16:03
These problematic clients are configured in only one group. I have created new groups and put one of these clients in each, but the result is the same. When I back up another folder on the same clients, the job completes successfully, without duplication.
The output below shows the daemon log for the last group created, named "NOVO", for this period (27/06 -> 28/06):
42506 27/06/2011 11:37:33 2 0 0 3820 3248 0 pinguim nsrd savegroup info: starting NOVO (with 11 client(s))
Unable to render the following message: 38758 1309189115 2 0 0 3820 3248 0 pinguim nsrd 9 %s %s: %s 3 0 5 media 0 7 warning 0 48 \\.\Tape2 reading: No more data is on the tape.
Unable to render the following message: 38758 1309189115 2 0 0 3820 3248 0 pinguim nsrd 9 %s %s: %s 3 0 5 media 0 7 warning 0 48 \\.\Tape2 reading: No more data is on the tape.
Unable to render the following message: 38758 1309189227 2 0 0 3820 3248 0 pinguim nsrd 9 %s %s: %s 3 0 5 media 0 7 warning 0 48 \\.\Tape2 reading: No more data is on the tape.
Unable to render the following message: 38758 1309189227 2 0 0 3820 3248 0 pinguim nsrd 9 %s %s: %s 3 0 5 media 0 7 warning 0 48 \\.\Tape2 reading: No more data is on the tape.
77568 28/06/2011 02:48:50 2 0 0 4032 2756 0 pinguim savegrp NOVO * vivi:/u01/app/backup Cannot determine status of backup process. Use mminfo to determine job status.
77568 28/06/2011 02:48:50 2 0 0 4032 2756 0 pinguim savegrp NOVO * birro:/u01/app/backup Cannot determine status of backup process. Use mminfo to determine job status.
77568 28/06/2011 02:48:51 2 0 0 4032 2756 0 pinguim savegrp NOVO * pavo:/u01/app/backup Cannot determine status of backup process. Use mminfo to determine job status.
77568 28/06/2011 02:48:58 2 0 0 4032 2756 0 pinguim savegrp NOVO * anhuma:C:\Backup Cannot determine status of backup process. Use mminfo to determine job status.
77567 28/06/2011 17:59:47 2 0 0 4032 2756 0 pinguim savegrp NOVO * vivi:/u01/app/backup See the file D:\Program Files\Legato\nsr\tmp\sg\NOVO\sso.000021 for output of save command.
77567 28/06/2011 17:59:47 2 0 0 4032 2756 0 pinguim savegrp NOVO * birro:/u01/app/backup See the file D:\Program Files\Legato\nsr\tmp\sg\NOVO\sso.000022 for output of save command.
77567 28/06/2011 17:59:48 2 0 0 4032 2756 0 pinguim savegrp NOVO * pavo:/u01/app/backup See the file D:\Program Files\Legato\nsr\tmp\sg\NOVO\sso.000023 for output of save command.
77567 28/06/2011 17:59:56 2 0 0 4032 2756 0 pinguim savegrp NOVO * anhuma:C:\Backup See the file D:\Program Files\Legato\nsr\tmp\sg\NOVO\sso.000024 for output of save command.
42506 28/06/2011 18:01:17 2 0 0 3820 3248 0 pinguim nsrd savegroup info: Added 'pinguim' to the group 'NOVO' for bootstrap backup
42506 29/06/2011 03:33:00 2 0 0 3820 3248 0 pinguim nsrd savegroup info: starting NOVO (with 11 client(s))
Legend:
Pinguim = backup server
Problematic clients (in this case) = vivi, birro, pavo and anhuma
* At 03:33 (29/06/2011) the backup job started automatically (as scheduled).
* I could not find the logs from April for the case of client "anhuma" posted previously =/
bingo.1
2.4K Posts
June 30th, 2011 11:00
Sorry Sarah,
what I read does not really make sense to me. Let me explain:
-----------------------------------
First two entries of birro posted show:
000069 LTO Ultrium-4 birro 28/06/2011 02:48:52 100 GB 3305732595 cb full /u01/app/backup
000028 LTO Ultrium-4 birro 27/06/2011 11:37:57 38 GB 3607667725 cb full /u01/app/backup
This backup job started on 27/06 and created the first save set at 11:37; however, the job was still running, and at 02:48 (28/06) it created another entry with a different size (100 GB).
name ssflags fl copies volume ssid clone id group ss created ss completed
/u01/app/backup vF cb 1 000028 3607667725 1309189151 NOVO 27/06/2011 11:38:53 27/06/2011 13:10:28
/u01/app/backup vF cb 1 000069 3305732595 1309243892 NOVO 28/06/2011 02:51:31 28/06/2011 04:16:03
-----------------------------------
According to the mminfo output, the backup of ssid 3607667725 took roughly 1.5 hrs - the duration for ssid 3305732595 is about the same.
However, just before that you also stated that the backup job was still running roughly 14 hours after the first save set completed. This just does not make sense, especially not for 100 GB on LTO-4. And all the other backups from March to May needed only a fraction of 14 hours.
I wonder whether this is due to cloning rather than to backup? On the other hand, there is only 1 copy, which means there is only one instance.
In my eyes there is only one explanation - you might be cloning 'indirectly'. Do you by any chance use Save Set Consolidation? In that case, growing fulls would also make sense, although the change is quite dramatic. Just look at the client's/group's schedule. And to verify the level, simply add 'level' to mminfo's report section.
Be aware that consolidated save sets will appear with level full - but they need a level 1 backup for the consolidation to take place.
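Following that suggestion, checking for a level-1 companion could look like the sketch below. The query follows the thread's earlier mminfo examples; the ssids and levels in the sample output are invented for illustration:

```shell
# Sketch: add 'level' to the mminfo report and look for level-1 backups
# next to the fulls, e.g.:
#   mminfo -avot -q client=birro -r "name,level,ssid,sscreate(20)"
# Sample output below is invented.
cat > /tmp/levels.txt <<'EOF'
/u01/app/backup full 4000000001 27/06/2011 11:38:53
/u01/app/backup 1 4000000002 28/06/2011 02:51:31
EOF

# A level-1 backup right beside a full of the same save set is a hint
# that save set consolidation is in play.
awk '$2 == "1" {print "level-1 companion found for", $1, "(ssid " $3 ")"}' /tmp/levels.txt
```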