What I mean by this is strategies for proxy configuration, and how you determine the "real" ESXi resources required to support these backups without impacting your VMs. Since deploying, there are times while backups are running when the VMs lose pings, and applications residing on the VMs become slow or unresponsive until the backup completes. The proxy deployment recommendation is roughly 1 proxy per 20 VMs. We initially set it up with a single policy and let Avamar do the load balancing; backups completed quickly, well within the window, but unfortunately "the lights dimmed". We then changed this so that we are doing backups within a couple of different policies, limiting the number of proxies active on an ESXi host, basically getting a little more granular. That seems to help, but there are still times when we are resource constrained. I understand the underlying technology and the fact that these backups do place significant load on the ESX host, especially in terms of storage, CPU, and memory; Ethernet at this point doesn't appear to be the issue. We have a mix of production and development VMs, the datastores are Symmetrix (DMX-4) thin pools, and the topology is FC-SW. Up until this point we've been using the VMware Fixed pathing policy; we are just beginning to introduce PowerPath/VE into the environment, which should help with the storage side of things. BTW, I am a storage & BRS guy, not a VMware guy. Running Avamar 5.0.3-29.
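For planning purposes, the 1-proxy-per-20-VMs rule of thumb mentioned above works out to a quick calculation. A minimal sketch (the function name and the ratio default are just for illustration; the ratio is the rule of thumb we were given, not a documented Avamar limit):

```python
import math

def estimate_proxy_count(vm_count, vms_per_proxy=20):
    """Estimate Avamar proxy VM count from the 1-proxy-per-20-VMs rule of thumb."""
    return math.ceil(vm_count / vms_per_proxy)

# Example: a 170-VM environment under the 1:20 rule of thumb
print(estimate_proxy_count(170))  # -> 9 proxies
```

Of course this says nothing about backup-window length or per-host load, which is exactly the sizing guidance that seems to be missing.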
An update to my original post:
“The image backup process always attempts to use the highest-performing available connection between the storage virtual disk vmfs and the Avamar source deduplication engine that runs on the Avamar proxy virtual machine appliance.
For example, assume a virtual disk (type SCSI) hosted by a SAN LUN connected to a Fibre Channel and an ESX host running the Avamar proxy virtual machine appliance connected by Fibre Channel to the same SAN LUN. In this case, the high-speed Fibre Channel is used. If instead no Avamar proxy
virtual machine appliance has connectivity to high-performance SAN, the proxy uses regular network connectivity which can lead to bandwidth problems and slow or even failed backups. When SAN connectivity is not available, guest backup is likely preferred to image backup.”
Ref: “Avamar 6.0 for VMware Guide”
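The fallback behavior described in the quoted passage boils down to a simple decision: use the SAN path when the proxy can reach the datastore LUN over FC, otherwise fall back to the network. A rough sketch of that logic (the function and flag are illustrative, not Avamar's actual implementation):

```python
def pick_transport(proxy_has_san_access_to_lun):
    """Mirror the guide's logic: prefer the high-speed Fibre Channel path
    when the proxy VM's host can see the datastore LUN over the SAN;
    otherwise fall back to regular network connectivity, where guest-level
    backup is likely preferred over image backup."""
    if proxy_has_san_access_to_lun:
        return "SAN (Fibre Channel)"
    return "network (consider guest backup instead of image backup)"

print(pick_transport(True))
print(pick_transport(False))
```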
BTW, the same verbiage is in the Avamar 5.0 for VMware Guide. In our environment all of the traffic will be over FC, so PowerPath will improve things with respect to picking the best path and multipathing. On the ESXi hosts without PowerPath that are still using VMware Fixed, storage throughput is a major issue; it's essentially a single pipe. Just to summarize what we know (or think we do), the following strategies make a lot of sense:
- have the proxy that is backing up a VM live on a different ESXi host than the one the VM runs on.
- get PowerPath installed on all ESXi hosts.
- our datastores are on a DMX-4; if they were on a VMAX, NS, or VNX, we could have Avamar take the snapshots on the array (in a tiered pool, even with SSD), which would take a huge load off of the ESXi hosts.
- if we could structure the VM deployment such that one or more ESXi hosts contained only non-production VMs (template VMs, test & development VMs, etc.), we could have all or most of the Avamar proxies live on those ESXi hosts. The backups could beat the snot out of those hosts all night long and no one would complain (and if you did get any complaints, you should promote those developers...).
- snapshots are created on the same datastore as the VM being backed up.
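The first bullet above (keep the proxy off the VM's own host) can be sketched as a simple selection rule. The host and proxy names here are made up, and the fallback behavior is my assumption, not how Avamar's load balancer actually works:

```python
def pick_proxy(vm_host, proxies):
    """Prefer a proxy running on a different ESXi host than the VM,
    keeping snapshot/backup I/O off the host serving the VM."""
    off_host = [p for p in proxies if p["host"] != vm_host]
    # Fall back to any proxy if nothing off-host is available (assumption)
    return (off_host or proxies)[0]["name"]

# Hypothetical inventory: one proxy pinned to a dev/test host
proxies = [
    {"name": "proxy-a", "host": "esxi-prod-01"},
    {"name": "proxy-b", "host": "esxi-dev-01"},
]
print(pick_proxy("esxi-prod-01", proxies))  # -> proxy-b
```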
There is no best-practices documentation on how to configure and size your environment to support this, and there needs to be. I see posts on the forum where sites are backing up 170 VMs with 29 or 30 proxies configured; perhaps that works if you are a 9-to-5 environment, but we are not. The rule of thumb that we received was 1 proxy for every 20 VMs. So, we are still looking for an Avamar resource to take a look at our environment and come up with recommendations.
October 25, 2011
To put this one to bed: our biggest issue turned out to be storage latency. We moved all of our datastores to an array that supports VAAI, specifically an NS960 (so a CX4-960 for the block storage). It is also running FAST and FAST Cache (1 TB), with a single pool containing a mix of SSD, FC, and SATA II. On Avamar 5.x, we are no longer having issues with snapshots.
Message was edited by: warchol