We're working on identifying the root cause. We are looking at the number of snapshots being taken at the same time, but so far there is no clear common pattern. My backup admin plans to move to 7.1 as soon as possible, but that is still at least 2-3 months away.
A failed consolidation is nothing but the existence of orphaned snapshots. Check whether orphaned snapshots are still residing on the VM.
Try cleaning up the orphaned snapshots from the Avamar command line, or ask your vCenter team to do it, before initiating any backups.
Here are answers to your questions.
- What are your experiences with "failed consolidations"? How often do you see them in your environment?
We did face a few, but I performed the consolidation from the Avamar command line and that helped. They were very rare, and so far we have not seen any more. We are on Avamar 7.2 and vCenter 6.0.
- How do you design for this number of VMs to be backed up? Do you try to put as much load on the proxies as possible, or do you put in a proxy for each cluster? Is there a max number of total VMs (not concurrent) or ESXi hosts that should be covered by a proxy?
Best practice is to deploy at least 2 proxies per cluster. There isn't any limit on the max number of VMs per proxy. As long as the backup window allows, a proxy can back up as many VMs as it can fit in that window with its 8 streams.
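To put a rough number on "as many VMs as fit in the window", here is a back-of-the-envelope sketch in Python. It assumes each of the 8 streams backs up one VM at a time and that you know an average backup duration per VM; the function name and the sample figures are hypothetical, not Avamar settings.

```python
import math

def vms_per_window(window_hours: float, avg_minutes_per_vm: float, streams: int = 8) -> int:
    """Rough upper bound on VMs one proxy can back up in a backup window.

    Assumption: each stream handles one VM at a time, and every VM takes
    about avg_minutes_per_vm (a hypothetical average you measure yourself).
    """
    if window_hours <= 0 or avg_minutes_per_vm <= 0 or streams <= 0:
        raise ValueError("all inputs must be positive")
    # How many VMs a single stream can finish back to back in the window.
    per_stream = math.floor(window_hours * 60 / avg_minutes_per_vm)
    return per_stream * streams

# Example: 8-hour window, about 20 minutes per VM, default 8 streams.
print(vms_per_window(8, 20))
```

Real numbers will be lower, since VM sizes and change rates vary and the estimate ignores snapshot creation and removal time.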
- Are there ways / settings to force a proxy to take only 1 snapshot at a time?
The only way would be to switch the proxy from 8 streams to 1 stream.
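Dropping to 1 stream has a cost in backup window, which this small Python sketch tries to illustrate. It assumes one snapshot per active stream and an even average backup time per VM; the function names and figures are made up for illustration.

```python
import math

def max_concurrent_snapshots(proxies: int, streams_per_proxy: int) -> int:
    """Upper bound on simultaneous snapshots, assuming one per active stream."""
    return proxies * streams_per_proxy

def window_hours_needed(vm_count: int, avg_minutes_per_vm: float,
                        proxies: int, streams_per_proxy: int) -> float:
    """Rough backup window needed when all streams stay busy.

    Assumption: every VM takes about avg_minutes_per_vm (hypothetical average).
    """
    slots = max_concurrent_snapshots(proxies, streams_per_proxy)
    # Number of back-to-back "rounds" needed to get through all VMs.
    rounds = math.ceil(vm_count / slots)
    return rounds * avg_minutes_per_vm / 60

# Example: 160 VMs, about 30 minutes each, 2 proxies.
for streams in (8, 1):
    print(streams,
          max_concurrent_snapshots(2, streams),
          window_hours_needed(160, 30, 2, streams))
```

With these sample figures, going from 8 streams to 1 cuts the simultaneous snapshots from 16 to 2 but stretches the estimated window from 5 to 40 hours, so more proxies would be needed to compensate.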