brastedd

336 Posts

3880

January 26th, 2016 07:00

Simultaneous backups on the same client?

I thought Avamar only allowed a single job per client to run, so if you run more than one or a scheduled job starts when another is already running, the new jobs just queue up and wait. That's been my normal experience.

However last night one of our SQL backups appears to have run at the same time as the filesystem backup. Both are scheduled for 8pm and normally the filesystem backup completes and the SQL job runs immediately afterwards.

Today I looked at the logs and they show the filesystem backup starting at 8:00 and the SQL backup starting at 8:01, with all but one database completing the backup and the last one having a VDI timeout error.

Has anyone seen this behavior before? Is that normal/expected? If not, any idea why it occurred?

Responses(15)

pchassang

33 Posts

0

January 26th, 2016 07:00

To help, we need the log output from both jobs.

regards.

Pascal

brastedd

336 Posts

0

January 26th, 2016 08:00

Here are some snippets:

Filesystem backup log:

Log #1: avtar log 2016-01-25 20:00:01 Central Standard Time [7.0.102-43 Windows Server 2003 Standard Server Edition Service Pack 2-x86]

2016-01-25 20:00:01 avtar Info <5008>: Logging to C:\Program Files\avs\var\clientlogs\8pm_-_6am-Windows_2003_Filesystem-1453773600357-3001-Windows.log

2016-01-25 20:00:01 avtar Info <5551>: Command Line: avtar --sysdir="C:\Program Files\avs\etc" --bindir="C:\Program Files\avs\bin" --vardir="C:\Program Files\avs\var" --ctlcallport=1026 --ctlinterface="3001-8pm - 6am-Windows 2003 Filesystem-1453773600357" --logfile="C:\Program Files\avs\var\clientlogs\8pm_-_6am-Windows_2003_Filesystem-1453773600357-3001-Windows.log"

2016-01-25 20:00:01 avtar Info <7977>: Starting at 2016-01-25 20:00:01 Central Standard Time [avtar May 24 2014 00:00:10 7.0.102-43 Windows Server 2003 Standard Server Edition Service Pack 2-x86]

2016-01-25 20:00:01 avtar Info <10684>: Setting ctl message version to 3 (from 1)

2016-01-25 20:00:01 avtar Info <16136>: Setting ctl max message size to 268435456

2016-01-25 20:00:01 avtar Info <6648>: Successfully connected to 127.0.0.1:1026 with proprietary encryption

SQL backup log:

Log #1: avtar log 2016-01-25 20:01:28 Central Standard Time [7.0.102-43 Windows Server 2003 Standard Server Edition Service Pack 2-x86]

2016-01-25 20:01:28 avtar Info <5008>: Logging to C:\Program Files\avs\var\Every_2_hours_10am-8pm-SQL_Full_-_Every_2_Hours-1453773600957#0-3006-SQL.avtar.log

2016-01-25 20:01:28 avtar Info <5551>: Command Line: avtar --case_sensitive=false --max-streams=1 --backup-type=level0_full --ctlcallport=2136 --ctlinterface="3006-Every 2 hours 10am-8pm-SQL Full - Every 2 Hours-1453773600957" --check-stdin-path=false --logfile="C:\Program Files\avs\var\Every_2_hours_10am-8pm-SQL_Full_-_Every_2_Hours-1453773600957#0-3006-SQL.avtar.log" --vardir="C:\Program Files\avs\var" --bindir="C:\Program Files\avs\bin" --sysdir="C:\Program Files\avs\etc" --account=/SQL/arkanis.uhd.campus --id=backuponly --password=**************** --server=rocky-bu.bk.uhd.campus

2016-01-25 20:01:28 avtar Info <7977>: Starting at 2016-01-25 20:01:28 Central Standard Time [avtar May 24 2014 00:00:10 7.0.102-43 Windows Server 2003 Standard Server Edition Service Pack 2-x86]

2016-01-25 20:01:28 avtar Info <10684>: Setting ctl message version to 3 (from 1)

2016-01-25 20:01:28 avtar Info <16136>: Setting ctl max message size to 268435456

2016-01-25 20:01:28 avtar Info <6648>: Successfully connected to 127.0.0.1:2136 with proprietary encryption

pchassang

33 Posts

0

January 27th, 2016 06:00

Thanks brastedd,

But there is no warning or error message in your post.

When you open the activity log for those two jobs, are there any red lines near the beginning?

If so, follow the hyperlink and post ten or more lines before and after it, or attach the log files.

best regards.

Pascal

brastedd

336 Posts

0

January 27th, 2016 06:00

Thanks, but I'm not interested in the error. I'm interested in how and why the SQL job was able to start when the filesystem job was already running.

Has anyone ever seen backup jobs from different plugins (e.g. Windows and SQL) run simultaneously on the same client?

ionthegeek

2K Posts

0

January 27th, 2016 07:00

One of my colleagues was able to track down the escalation. This is bug number 206145 which is fixed in 7.2.0. As a workaround, you can separate the datasets so that the SQL and filesystem backups run in two different groups. This should force one of the backups to queue.

brastedd

336 Posts

0

January 27th, 2016 07:00

Thanks Ian. I think I found the documentation on the bug (https://emc--c.na5.visual.force.com/apex/KB_BreakFix_1?id=kA1700000001DjK) but that's not the same behavior as what I'm seeing. Neither of my backups failed. I did have a VDI timeout on one of the SQL databases, but I don't think that's related as the other databases backed up fine.

My backups are already running in different groups, one for the SQL backup (run every two hours) and the other for the filesystem backup (start at 8pm).

Do you know how the issue occurs? Every time I've ever run two backup jobs at the same time, or have tried to start a job when another is already running, it always queues. In this case the filesystem job had been under way for nearly 90 seconds before MCS attempted (and succeeded) to run the 8pm SQL job. Plus this doesn't occur every day, only occasionally.

ionthegeek

2K Posts

0

January 27th, 2016 07:00

I've seen this before for plug-ins that use multiple avtars to implement multi-streaming. I believe it depended on the order in which the jobs started (if the filesystem job started first, the SQL job would queue but not vice-versa). Let me see if I can dig up the details.

brastedd

336 Posts

0

January 27th, 2016 07:00

Yes, the filesystem job took about 25 minutes. The SQL job took a little under 2 minutes.

Can someone from EMC shed any light on this perplexing issue?
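
For anyone who wants to check the overlap from the logs, here is a minimal sketch in Python. It uses the two start times from the avtar logs posted above and the approximate durations mentioned in this post; the end times are therefore estimates, and the `overlaps` helper is just an illustrative name, not part of any Avamar tooling.

```python
from datetime import datetime, timedelta

def overlaps(start_a, end_a, start_b, end_b):
    """Return True when the intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a

FMT = "%Y-%m-%d %H:%M:%S"

# Start times copied from the "Starting at" lines of the two avtar logs.
fs_start = datetime.strptime("2016-01-25 20:00:01", FMT)
sql_start = datetime.strptime("2016-01-25 20:01:28", FMT)

# Durations are approximations taken from the thread, not from the logs.
fs_end = fs_start + timedelta(minutes=25)
sql_end = sql_start + timedelta(minutes=2)

# Seconds the filesystem job had been running when the SQL job started.
gap_seconds = (sql_start - fs_start).total_seconds()

print(overlaps(fs_start, fs_end, sql_start, sql_end))
print(gap_seconds)
```

With these figures the SQL job starts 87 seconds into the filesystem job and finishes well before it, so the two runs overlap entirely rather than running back to back.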

pchassang

33 Posts

0

January 27th, 2016 07:00

Perhaps it is because the filesystem job uses the file cache and the SQL job uses the hash cache.

I need to test this in my lab, but that is not possible at the moment.

regards.

Pascal

pchassang

33 Posts

0

January 27th, 2016 07:00

OK, sorry for the confusion.

There is more than one minute between the two log files.

Could you confirm whether the filesystem job ended before the SQL job began?

If so, I cannot help you, because that is built-in behavior.

brastedd

336 Posts

0

January 29th, 2016 11:00

I found this error again today. Another of our servers had simultaneous backups (SQL and FS) and the SQL job failed while the FS job completed without error. On previous days the SQL job waited until the FS job was finished before starting, and thus completed successfully.

Any ideas why this is, or should I open a support ticket?

ionthegeek

2K Posts

0

January 29th, 2016 12:00

This is the same bug. The fix disallows clients from picking up multiple workorders.

If I recall correctly, the issue could happen any time a backup that uses multiple avtars starts before a filesystem or VSS backup because the client software no longer prevents multiple avtars from starting up.

Did you try separating the SQL and filesystem datasets? If you schedule the SQL backups after any filesystem and VSS backups, you should stop seeing this problem.

brastedd

336 Posts

0

January 29th, 2016 13:00

Thanks Ian for the clarification. I've moved the filesystem backup to start later, so we shouldn't see this anymore.

brastedd

336 Posts

0

January 29th, 2016 13:00

The SQL and filesystem datasets are already separate, though they are scheduled to start at the same time. The issue only appears to affect our Windows Server 2003 boxes, which have a single job for the filesystem backup (as opposed to separate VSS and filesystem backups). In my case the filesystem job starts just before the SQL job does, and the filesystem backup isn't the one that fails, so that is different to what is reported in the bug.

When you say the fix "disallows clients from picking up multiple workorders" do you mean that nothing will queue anymore?

The issue doesn't occur all the time. Most of the time the jobs queue up as they should and complete one after the other.

ionthegeek

2K Posts

0

January 29th, 2016 13:00

Re-reading the bug notes, it looks like I had it backwards -- if the SQL backup starts first, things should run fine. If you move the filesystem backups to start later, that should prevent the issue from happening. Since this is caused by a race condition, I wouldn't be surprised if the symptoms vary.

Sorry, I should've been more clear about the fix. The fix disallows clients from picking up multiple workorders concurrently. The workorders will be queued as normal but the Administrator Server will never offer a client more than one workorder at a time so only one backup will run at a time.
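
To illustrate the behavior described above, here is a rough sketch in Python of a dispatcher that queues workorders per client but only ever offers one at a time. This is not Avamar's implementation; the `Dispatcher` class and its method names are hypothetical and exist only to show the idea.

```python
from collections import defaultdict, deque

class Dispatcher:
    """Hypothetical model: queue workorders per client, offer one at a time."""

    def __init__(self):
        self.pending = defaultdict(deque)  # client -> queued workorders (FIFO)
        self.active = {}                   # client -> workorder currently running

    def submit(self, client, workorder):
        # New workorders are always queued, never started directly.
        self.pending[client].append(workorder)

    def offer(self, client):
        # Offer nothing while the client is busy or has nothing queued.
        if client in self.active or not self.pending[client]:
            return None
        workorder = self.pending[client].popleft()
        self.active[client] = workorder
        return workorder

    def complete(self, client):
        # Mark the running workorder as finished so the next one can be offered.
        self.active.pop(client, None)

dispatcher = Dispatcher()
dispatcher.submit("client1", "filesystem")
dispatcher.submit("client1", "sql")

first = dispatcher.offer("client1")    # filesystem job is offered
blocked = dispatcher.offer("client1")  # nothing offered while it runs
dispatcher.complete("client1")
second = dispatcher.offer("client1")   # sql job is offered after completion
print(first, blocked, second)
```

In this model the second workorder still waits in the queue; it simply is not handed to the client until the first one completes.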
