I am currently in the deployment phase of Avamar and Data Domain on our production system, and we are facing poor backup performance on one specific configuration.
We have installed the Avamar Windows client on 2 Windows 2008 R2 file servers, which currently host around 50 million files (about 200,000 new files per day).
These file servers consist of 2 Windows nodes hosted on a VMware platform (5.1) and are configured with RDM volumes to avoid vMotion "by pair" of VMs. All nodes have 4 GB of RAM. The file volumes have been formatted with the smallest block size because the hosted files are very small (1 KB).
To deploy the Avamar cluster client configuration, we added a dedicated 10 GB volume to the Windows cluster for Avamar's /var directory.
The symptoms are:
performance: 48 h for 120 GB and 35 million files
/var disk usage: 100%
memory usage: 100%, with around 90% used by the avtar process
Do you have any idea on how to solve this situation?
We are replacing our current backup solution (CommVault) with Avamar, but performance is very bad (>48 h for the initial backup) and there are lots of errors in the subsequent one.
This is not acceptable when our current solution does the job in less than 6 hours on a legacy platform over a slow network link.
I am currently wondering whether to refuse delivery of the Avamar solution because of this performance.
Many thanks for your help.
I replied to your other thread before seeing this one.
Initial backups are always going to take a long time since every single file will need to be fully processed by avtar.
Daily backups should take much less time, however here you have 50 million files which need to be scanned and 200,000 files which need to be fully processed to identify changed data.
200,000 files changing out of 50 million is a normal proportion (0.4%). The more pressing concern is the large number of files in the dataset as this makes the scope of the backup much more challenging.
Typical backup performance is 1 million files per hour, but I have seen some well-specified clients, where the data is hosted on very fast storage, achieve 2-3 million files per hour.
From the numbers you mentioned above I guesstimate it's achieving around 1.5 million files/hr.
You can check by searching for the following two lines in the log of a completed backup:
2015-02-04 08:04:28 avtar Info <5156>: Backup #405 timestamp 2015-02-04 08:04:28, 7,214 files, 784 folders, 5.336 GB (250 files, 6.309 MB, 0.12% new)
2015-02-04 08:04:28 avtar Info <6083>: Backed-up 5.336 GB in 3.51 minutes: 91 GB/hour (123,435 files/hour)
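If you want to pull these figures out of several logs at once, a small script can do it. The following is only a sketch: the regular expression is written against the <6083> line exactly as it appears in the excerpt above, and the function name parse_summary is my own, not part of any Avamar tooling. Other avtar versions may format the line differently.

```python
import re

# Matches the <6083> summary line as shown in the excerpt above.
# Assumption: the layout is stable; adjust the pattern if your logs differ.
SUMMARY_RE = re.compile(
    r"avtar Info <6083>: Backed-up (?P<size>[\d.,]+) (?P<unit>[KMGT]?B) "
    r"in (?P<minutes>[\d.,]+) minutes: .*?\((?P<rate>[\d,]+) files/hour\)"
)

def parse_summary(line):
    """Return (size, unit, minutes, files_per_hour), or None if no match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return (
        float(m.group("size").replace(",", "")),
        m.group("unit"),
        float(m.group("minutes").replace(",", "")),
        int(m.group("rate").replace(",", "")),
    )

line = ("2015-02-04 08:04:28 avtar Info <6083>: Backed-up 5.336 GB in "
        "3.51 minutes: 91 GB/hour (123,435 files/hour)")
print(parse_summary(line))
```

The last field of the returned tuple is the files/hour figure to compare against the rates discussed here.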
The client will probably need the file scan rate to be in excess of 3-4 million files/hour in order to back up the dataset within a daily backup window.
Given that this is a brand new, very large client I personally think the most appropriate course of action would be to engage the local EMC pre-sales team (rather than support) to discuss it in more depth.
Thanks for your input. We are currently in the process of moving from CommVault to Avamar and Data Domain, so I have CommVault's performance as a baseline, and we are very disappointed by the performance of this combination.
Here are the details from the log of the "primary" backup:
2015-02-05 15:25:04 avtar Info <5156>: Backup #1 timestamp 2015-02-05 15:25:05, 42,456,134 files, 130,518 folders, 81.07 GB (42,456,134 files, 20.08 GB, 24.77% new)
2015-02-05 15:25:04 avtar Info <6083>: Backed-up 81.07 GB in 1227.57 minutes: 4.0 GB/hour (2,075,138 files/hour)
As you can see, performance is very poor in terms of transfer rate, but the files/hour rate is decent.
This is due to the fact that the average file size is very small.
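A quick check of the figures in the log lines above supports this. The sketch below assumes that avtar reports GB as binary gigabytes (1024^3 bytes); that is my assumption, not something stated in the log. Under it, the average comes out at roughly 2 KB per file, and the recomputed rates agree closely with what avtar printed.

```python
# Figures copied from the <5156> and <6083> lines of backup #1.
files = 42_456_134
size_gb = 81.07
minutes = 1227.57

hours = minutes / 60
# Assumption: 1 GB = 1024**3 bytes in avtar's report.
avg_bytes = size_gb * 1024**3 / files
files_per_hour = files / hours
gb_per_hour = size_gb / hours

print(f"average file size: {avg_bytes:.0f} bytes")
print(f"files/hour: {files_per_hour:,.0f}")
print(f"GB/hour: {gb_per_hour:.1f}")
```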
The avtar file scan performance is indeed relatively high.
The challenge is the number of files which need scanning within the daily backup window. A lengthy 20-hour backup window at 2 million files/hr would allow you to complete a low-change dataset of around 40 million files, but there aren't enough hours in the day for it to scan 50 million.
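The arithmetic behind those figures is simple enough to script if you want to try other window lengths or scan rates. This is only a back-of-the-envelope sketch: the function names are mine, and it ignores the extra time needed to process changed files.

```python
def scan_hours(total_files, files_per_hour):
    """Hours needed just to scan the dataset at a given rate."""
    return total_files / files_per_hour

def max_files(window_hours, files_per_hour):
    """Largest dataset that can be scanned within the window."""
    return window_hours * files_per_hour

print(scan_hours(50_000_000, 2_000_000))  # 25.0 hours, longer than a day
print(max_files(20, 2_000_000))           # 40000000 files in a 20-hour window
print(scan_hours(50_000_000, 4_000_000))  # 12.5 hours at 4 million files/hr
```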
I'm not familiar with CommVault, but if the performance is significantly different, it is likely approaching the same task in a very different way from Avamar.
From an Avamar configuration perspective, you could reduce the work done on a single day by splitting the dataset into two or more parts and backing up each part on alternate days.
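One way to plan such a split is to balance the parts by file count rather than by size, since the scan time depends on the number of files. The sketch below is only a planning aid, not Avamar configuration: the folder names and file counts are hypothetical, and it uses a simple greedy heuristic (largest folder goes to the currently lightest part).

```python
import heapq

def split_folders(file_counts, parts=2):
    """Greedy split: give the largest remaining folder to the lightest part."""
    heap = [(0, i) for i in range(parts)]  # (total files so far, part index)
    groups = [[] for _ in range(parts)]
    for folder, count in sorted(file_counts.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        groups[idx].append(folder)
        heapq.heappush(heap, (total + count, idx))
    totals = [0] * parts
    for total, idx in heap:
        totals[idx] = total
    return groups, totals

# Hypothetical top-level folders and file counts, for illustration only.
counts = {
    "share_a": 20_000_000,
    "share_b": 15_000_000,
    "share_c": 10_000_000,
    "share_d": 5_000_000,
}
groups, totals = split_folders(counts, parts=2)
print(groups)
print(totals)
```

With these made-up numbers each part ends up with 25 million files, which at 2 million files/hr would be about 12.5 hours of scanning per day.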
Where files in a dataset are distributed amongst multiple storage devices it could be an option to run multiple avtars in parallel, each operating on the data hosted by a different physical storage device (set of spindles). This config requires an RPQ which needs to be submitted by the EMC sales rep.
I'd suggest discussing this client with EMC pre-sales as they are likely to encounter this type of scenario on a regular basis and can advise how they helped other customers solve it.