October 26th, 2014 04:00

Design: How to spread Avamar snapshots

Hi

As our VMware environment grows, we're running into more and more issues with Avamar. OK, not really Avamar issues, but more of a design problem. We have one vCenter managing about 50 ESXi hosts and over 700 VMs. Our backup window is between 11pm and 6am. We're running about 5 Avamar proxies.

Over the last few months we have noticed more and more "consolidation failed" errors in the morning after the backup has finished. Sometimes these are easy to clean up, sometimes a Storage vMotion is required, and sometimes a VM needs to be shut down first. Our SLA window for solving these issues keeps shrinking because customers demand more uptime in general, so shutting down a VM to clean up a failed consolidation is painful and gets us more complaints from customers.

VMware Support told us that Avamar is often too fast, or demands too much from the host, when it comes to creating and removing snapshots. The issue is not the concurrent backups themselves but snapshots being created and/or removed at exactly the same time. On some hosts I see Avamar kicking in and creating snapshots on 7 VMs within one minute.
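To put a number on "7 VMs within one minute", the snapshot-creation events could be checked for bursts per host. The following is a minimal sketch, assuming the events have already been collected as (host, timestamp) pairs, for example from vCenter task history. The function name, the threshold, and the sample data are hypothetical; no vCenter or Avamar API is called.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def snapshot_bursts(events, window=timedelta(minutes=1), threshold=5):
    """Return {host: peak} for hosts whose peak number of snapshot
    creations inside any sliding `window` reaches `threshold`."""
    by_host = defaultdict(list)
    for host, ts in events:
        by_host[host].append(ts)

    bursts = {}
    for host, stamps in by_host.items():
        stamps.sort()
        start = 0
        peak = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans at most `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            peak = max(peak, end - start + 1)
        if peak >= threshold:
            bursts[host] = peak
    return bursts

# Hypothetical sample: 7 snapshots on one host within 48 seconds,
# 2 snapshots on another host ten minutes apart.
t0 = datetime(2014, 10, 26, 23, 0, 0)
sample_events = [("esx01", t0 + timedelta(seconds=8 * i)) for i in range(7)]
sample_events += [("esx02", t0), ("esx02", t0 + timedelta(minutes=10))]

print(snapshot_bursts(sample_events))
```

Hosts that show up in this report on the same nights as the failed consolidations would be the first candidates for spreading their VMs across different backup groups.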

A few questions:

- What are your experiences with "failed consolidations"? How often do you see them in your environment?

- How do you design for this number of VMs to be backed up? Do you try to put as much load on the proxies as possible, or do you deploy a proxy for each cluster? Is there a maximum number of total VMs (not concurrent) or ESXi hosts that should be covered by a proxy?

- Are there ways / settings to force a proxy to create only one snapshot at a time?

Any input is appreciated.

Gabrie

20.4K Posts

October 27th, 2014 04:00

I have 3 vCenter environments being backed up to one Avamar grid (Data Domain as the target). I have deployed 5 proxy servers per vCenter; since each proxy has 8 "streams", that gives me 40 possible concurrent sessions per cluster. We are backing up around 900 VMs daily and used to get at least 2-3 VMs that could not be backed up, either due to error 10052 (could not snapshot VM) or error 10056 (too many existing snapshots). Error 10056 is the one where we constantly have to consolidate snapshots; we had to create a script that would do it every day before we start our backups. Last week we upgraded to 7.0.2 (from 7.0.1) and I have not seen a single occurrence of this issue. The jury is still out, but I have never gone more than 2-3 days without at least one occurrence of 10056 that required consolidation.
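The stream arithmetic above can be sketched as a quick sizing check. This is only an illustration: the function names are hypothetical, the 8 streams per proxy figure comes from this post rather than from any Avamar API, and the "waves" estimate assumes each stream handles one VM at a time.

```python
import math

def concurrent_sessions(proxies, streams_per_proxy=8):
    """Upper bound on simultaneous backup sessions for a set of proxies."""
    return proxies * streams_per_proxy

def waves_needed(vm_count, proxies, streams_per_proxy=8):
    """Number of back-to-back rounds needed if every stream
    backs up exactly one VM per round (a simplifying assumption)."""
    return math.ceil(vm_count / concurrent_sessions(proxies, streams_per_proxy))

# 5 proxies with 8 streams each, as described in the post.
print(concurrent_sessions(5))

# 900 VMs spread over 3 x 5 = 15 proxies.
print(waves_needed(900, 15))
```

The same upper bound is also the worst-case number of snapshots that could be requested at nearly the same moment, which is why the stream count matters for the consolidation problem and not only for throughput.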

October 27th, 2014 07:00

Thanks for your input. We're on 7.0.1 and preparing to go to 7.1 in the next three weeks. I hope we'll see better results by then.

October 27th, 2014 11:00

Can the proxy be 7.0.2 when the grid is on 7.0.1?

45 Posts

October 27th, 2014 11:00

What is the vCenter version? Is it 5.0 or a 5.1 update?

498 Posts

October 27th, 2014 12:00

I am currently on 7.0.2 (and no, I don't think the proxy can be on a higher version than the grid).

When I have seen this happen on previous versions (and rarely now), the fix has been to reboot the proxy.

If you look at the failures, do you see a common proxy that is having the issue?

In the Activity window, find your failures and then look at the column that shows which proxy handled them.

Every time I have had this error, it's always been failing on one proxy.

Reboot that proxy and it goes away, until the next time they do some change control on the vCenter or VMs and don't tell me.

It seems that when they do that, they 'confuse' a proxy and it needs to be rebooted.

Now this is just me, an end user, relaying what I have experienced...
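The check described above, looking for one proxy that accounts for most of the failures, can be sketched offline. This assumes the rows from the Activity window have been exported into a list of records; the field names `proxy` and `status`, the status value `failed`, and the sample data are hypothetical and do not reflect an actual Avamar export format.

```python
from collections import Counter

def failures_by_proxy(activities):
    """Count failed activities per proxy, most frequent first."""
    counts = Counter(
        row["proxy"] for row in activities if row["status"] == "failed"
    )
    return counts.most_common()

# Hypothetical export of last night's activities.
sample_activities = [
    {"vm": "vm-101", "proxy": "proxy-01", "status": "completed"},
    {"vm": "vm-102", "proxy": "proxy-03", "status": "failed"},
    {"vm": "vm-103", "proxy": "proxy-03", "status": "failed"},
    {"vm": "vm-104", "proxy": "proxy-02", "status": "failed"},
    {"vm": "vm-105", "proxy": "proxy-03", "status": "failed"},
    {"vm": "vm-106", "proxy": "proxy-02", "status": "completed"},
]

for proxy, count in failures_by_proxy(sample_activities):
    print(proxy, count)
```

If one proxy clearly dominates the list, that is the one to reboot first.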

45 Posts

October 27th, 2014 12:00

I have had this issue before; your probable root cause is on the vCenter. The RPC connection between your backup tool (in this case Avamar) and the vCenter drops, so the backup jobs may fail and snapshots are not removed from the virtual machines. The best way to verify this is to open a case with VMware and ask them to check the VPXD log on the vCenter and confirm whether connections are being dropped. Our problem lay in the vCenter version; once it was upgraded to the suggested version, I haven't had any timeouts or orphaned "hidden snapshots".

Take a look at the VMware KB article:

VMware VirtualCenter Server service fails randomly when using 3rd party backup solution (2045561)

October 27th, 2014 12:00

Thank you, I will try this.

498 Posts

October 27th, 2014 12:00

On a side note...

When I first started having this issue, support had me break up the start of my VMs.

They said not to start more than 100 at a time.

So I created more group policies, and I try to keep them at about 100 VMs each. (Of course, retiring servers can make them shrink, so I check every now and then to see which is the smallest and add new ones to it.)

I have about 600 VMs from one vCenter, but going to 2 grids.

I start the groups about 1 hour apart. (That was my choice, not their suggestion. I think I could have started them closer together, but this works and they all finish well before my window closes.)
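The grouping described above (roughly 100 VMs per group, groups started an hour apart) can be sketched as follows. This only computes a schedule; it does not create Avamar groups or policies. The function name, the VM names, and the 23:00 first start time are assumptions for illustration.

```python
from datetime import datetime, timedelta

def stagger_groups(vm_names, first_start, group_size=100,
                   gap=timedelta(hours=1)):
    """Split VMs into groups of at most `group_size` and assign
    each group a start time `gap` after the previous one."""
    schedule = []
    for offset in range(0, len(vm_names), group_size):
        group_index = offset // group_size
        start = first_start + gap * group_index
        schedule.append((start, vm_names[offset:offset + group_size]))
    return schedule

# Hypothetical inventory of 600 VMs, first group starting at 23:00.
vms = [f"vm-{n:03d}" for n in range(600)]
plan = stagger_groups(vms, first_start=datetime(2014, 10, 27, 23, 0))

for start, group in plan:
    print(start.strftime("%H:%M"), len(group), "VMs")
```

With a one-hour gap, six groups do not start their last batch until 04:00, so the gap has to be weighed against the length of the backup window.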

October 27th, 2014 12:00

vCenter Server 5.1 Build 1123961 (Update 1a)

45 Posts

October 29th, 2014 07:00

Hello GabesVirtualWorld,

Were you able to identify the root cause of your problems with orphaned snapshots?

November 2nd, 2014 22:00

Hi

We're still working on identifying the root cause. We're looking at the number of snapshots being made at the same time, but there is no clear common behaviour. My backup admin plans to move to 7.1 as soon as possible, but that will still not be within the next 2-3 months.

Gabrie

17 Posts

December 8th, 2016 06:00

A failed consolidation simply means that orphaned snapshots exist. Check whether orphaned snapshots are still residing on the VM.

Try cleaning up the orphaned snapshots from the Avamar command line, or have the vCenter team clean them up, before initiating any backups.

Here are answers to your questions.

- What are your experiences with "failed consolidations"? How often do you see them in your environment?

We did face a few, but I performed the consolidation from the Avamar command line and that helped. We saw this only rarely, and none so far. We are on Avamar 7.2 and vCenter 6.0.

- How do you design for this number of VMs to be backed up? Do you try to put as much load on the proxies as possible, or do you deploy a proxy for each cluster? Is there a maximum number of total VMs (not concurrent) or ESXi hosts that should be covered by a proxy?

Best practice is to deploy at least 2 proxies per cluster. There isn't any limit on the maximum number of VMs per proxy. As long as the backup window is available, a proxy can back up as many VMs as it can in that window with its 8 streams.

- Are there ways / settings to force a proxy to create only one snapshot at a time?

The only way might be to switch a proxy from 8 streams to 1 stream.
