Avamar 6.1 Slow Scanning of Files

May 11th, 2014 23:00

We've been trying to get our entire DFS data set backed up by Avamar 6.1 and are very close, but we are still fighting some very poor performance trying to fit it all within a 24-hour window. The results from a recently completed backup job are below:

2014-05-07 06:59:22 avtar Info <5156>: Backup #70 timestamp 2014-05-07 06:59:22, 8,509,264 files, 838,732 folders, 3,767 GB (566,573 files, 5.636 GB, 0.15% new)

2014-05-07 06:59:22 avtar Info <7539>: Label "12pm-12PM_File_Server", scheduled to expire after 07/05/14 (2014-07-05 02:00:00 UTC), daily backup

2014-05-07 06:59:22 avtar Info <6083>: Backed-up 3,767 GB in 1139.38 minutes: 198 GB/hour (448,101 files/hour)

2014-05-07 06:59:25 avtar Info <5587>: Updating cache files in C:\Program Files\avs\var

2014-05-07 06:59:25 avtar Info <5069>: - Writing cache file "C:\Program Files\avs\var\f_cache.dat"

2014-05-07 06:59:27 avtar Info <5546>: Cache update complete C:\Program Files\avs\var\f_cache.dat (352.0MB of 768MB max)

It is taking 19 hours for the Avamar client to complete the full backup when there is only 5.636GB of new data to actually be backed up. I suspect it is having to scan far too many files: the log shows 566,573 changed files, but I don't think that is actually the case.

We have added resources to this Hyper-V VM, so it is now using 4 CPUs and 6-8 GB of dynamic RAM. It is a Windows Server 2012 Standard server.

Currently this backup is failing: I added some additional data and it now only gets roughly 30% of the way through, but I think there are underlying issues causing the backup to be quite slow in the first place.

Any ideas?

May 13th, 2014 03:00

Besides Trend, mentioned earlier, the use of Sophos on a client is known to cause file metadata to be changed, resulting in a large proportion of files being re-chunked by avtar.

You can read more about it here:

185839 - Poor backup performance on Avamar client running Sophos anti-virus

May 12th, 2014 06:00

It is taking 19 hours for the Avamar client to complete the full backup when there is only 5.636GB of new data to actually be backed up.

This is a misleading way to look at Avamar performance. The 5.636GB of data is only the amount of new data sent to the server -- Avamar still has to process the other ~4TB of data to make sure it hasn't changed.

The GB/hour rating on the <6083> log line is the most relevant number -- this system is backing up 198GB/hour, which is more than double the typical performance for Windows filesystem backups.
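As a sanity check, that 198GB/hour figure follows directly from the totals on the <5156> and <6083> lines (a quick back-of-envelope in Python):

    total_gb = 3767                    # total data scanned, from the <5156> line
    minutes = 1139.38                  # elapsed backup time, from the <6083> line
    print(total_gb / (minutes / 60))   # ~198.4 GB/hour, matching the <6083> line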

When troubleshooting filesystem backup performance, the most important first step is to ensure that the file cache is sized correctly. The Avamar client generates a hash of the metadata for each file and compares this with the file cache to determine if the file has changed. If there is a file cache miss, the file will be chunked, meaning the client has to read the entire file from disk. If the cache is undersized, the client will needlessly re-chunk the files whose cache entries were discarded. Based on the log snippet above, the cache appears to be adequately sized (only 352MB of the 768MB maximum is in use).

If the cache is adequately sized, there must be some other reason the client is finding >500,000 changed files.

  • Has this client completed an initial backup of all of these files? Any new files must be backed up at least once before their entries in the file cache will be used.
  • Have prior backups wrapped up cleanly? A backup record (complete or partial) is created whenever a backup job completes or times out, and that record is used to validate the cache information during subsequent backup jobs. If a backup fails outright, no partial is created, so any new information added to the caches during that backup cannot be used.
  • Is some application changing the modification dates or other metadata for these files without changing the content? A change to the modification date, ACLs, etc. will cause a file cache miss and the file will have to be re-chunked.

Regarding adding resources to the system, the bottleneck for the vast majority of clients is the client's I/O performance. Adding more CPU power or memory is unlikely to improve performance.

May 12th, 2014 12:00

To add to Ian's comment, I would also suggest looking at any antivirus software running on the client. Trend (and others) have settings that can trigger a virus scan on file access rather than file modification.

May 12th, 2014 17:00

null0,

We have tried this previously as well, to no great effect. It can't hurt to test it again, seeing as a fair amount has changed in the meantime.

Thanks!

May 12th, 2014 17:00

Ian,

My main concern is that the 5.63GB shows not much has changed, so the majority of the work being performed by the Avamar client should primarily be a comparison of filename, location, and modified date against what is in the cache file. Out of curiosity, is the 5.63GB the amount of block-level data transferred, or the total size of the files modified since the last backup?

I know the maths works out to 198GB of backup data processed per hour, but to me this seems quite slow considering that only 0.15% of that data actually needs to be backed up rather than simply parsed. Maybe there's some complexity to the parsing process that I'm not aware of which means it is actually performing to expectation, but as I said above, it seems slow for what should be a non-disk-intensive process.

  • There were 5 previous sequential backups which completed without error right before the backup I extracted that log section from. The log I used was also from a successful backup without errors. I had added some data into the backup set during those 6 successful backups, and that added substantial time to each run, which I assume is due to Avamar hashing the new files.
  • As above
  • I can't think of any particular reason that so many files would be modified in a single day. I have reviewed the completion statistics of a few of the successful backups and they all show a similar number of files in the brackets (modified files?). In every run it is at least 550,000 files - sometimes a few tens of thousands more, likely due to me expanding the backup set. This is for a company of about 300 users, which equates to about 1,800 files per user per day, so I doubt it is users changing them. DFS shouldn't be modifying these files unless a user has, and neither should the virus scanner. The fact that the number is consistent leads me to think that there is a specific set or type of files that is not making it into the cache for whatever reason.

The memory was increased initially because it was previously only 4GB, which was restricting the cache size. It is currently assigned dynamic memory, but I believe it is always initialising with a max cache of 700+ MB. The CPU was on advice from the BaaS vendor and was probably more of a guess from them than a real idea. I have previously monitored the disk queue length to ensure there is no bottleneck there and it seemed fine then; I might give it another run to see if anything has changed in the meantime.

Thanks for your ideas, Ian; it always helps to hear others' ideas - hopefully something will twig in my head soon!

May 12th, 2014 19:00

My main concern is that the 5.63GB shows not much has changed, so the majority of the work being performed by the Avamar client should primarily be a comparison of filename, location, and modified date against what is in the cache file.

It is vanishingly unlikely that the client is spending any significant amount of time running cache lookups. The file metadata lookups ("stat" on Linux; I don't recall the API call for Windows offhand) and file cache lookups are both incredibly fast. On Windows systems, the typical scan rate is about a million files per hour.


For each and every file cache miss, Avamar has to completely re-read and re-chunk the file. That is almost certainly what is taking the time here.

Out of curiosity, is the 5.63GB the amount of block-level data transferred, or the total size of the files modified since the last backup?

The 5.63GB is the total size of the chunks that were transferred to the Avamar server, not the size of the modified files. The total size of the modified files is not reported in the "Status" log lines.

0.15% of that data actually needs to be backed up rather than simply parsed

We are still re-chunking more than 500,000 files a day and that's going to take time. If these files are not new, some process is touching the files in a way that is making Avamar think they have been changed. The 0.15% change rate of the file content tells me that the content of the files is probably not changing.

Maybe there's some complexity to the parsing process that I'm not aware of which means it is actually performing to expectation, but as I said above, it seems slow for what should be a non-disk-intensive process.

The backup procedure for an individual file is as follows:

  1. "stat" the file and hash the result (we don't hash everything -- for example, the client ignores access time changes by default)
  2. Look up the file metadata hash in the file cache. If there is a cache hit, mark the file added and move on.
  3. If there is a file cache miss, read the file and chunk it up into variable sized chunks. Compress each chunk and hash it. For each chunk:
    1. Look up the chunk in the hash cache. If there is a cache hit, mark the chunk added and move on
    2. If there is a hash cache miss, send the hash to the server and ask if it already has the associated chunk. If the server has the chunk, mark the chunk added and move on
    3. If the server does not have the chunk, send the chunk and its hash over the network, mark the chunk added and move on

I'm leaving "mark added and move on" deliberately vague because the full explanation is a bit much for a forums post. Please trust me that this process is also very fast.
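If it helps to see it laid out, here is a rough sketch of that per-file flow in Python. The names (file_cache, hash_cache, server) and the fixed-size chunking are simplifications of mine, not the real avtar internals:

    import hashlib
    import os
    import zlib

    CHUNK_SIZE = 64 * 1024  # fixed-size stand-in; the real client uses variable-sized chunks

    def backup_file(path, file_cache, hash_cache, server):
        # 1. "stat" the file and hash the metadata (atime changes ignored by default)
        st = os.stat(path)
        meta = f"{path}|{st.st_size}|{st.st_mtime}|{st.st_ctime}".encode()
        meta_hash = hashlib.sha1(meta).hexdigest()

        # 2. File cache hit: nothing to read, mark the file added and move on
        if meta_hash in file_cache:
            return

        # 3. File cache miss: read the ENTIRE file back off disk and chunk it
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                chunk_hash = hashlib.sha1(chunk).hexdigest()
                if chunk_hash in hash_cache:        # 3.1: hash cache hit
                    continue
                if server.has_chunk(chunk_hash):    # 3.2: server already has this chunk
                    hash_cache.add(chunk_hash)
                    continue
                # 3.3: server does not have the chunk; compress and send it
                server.send(chunk_hash, zlib.compress(chunk))
                hash_cache.add(chunk_hash)

        # Record the file so the next run can skip it if the metadata is unchanged
        file_cache.add(meta_hash)

The expensive path is step 3: a cache miss forces a full read of the file regardless of whether any chunks end up being sent to the server.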

More than 90% of the performance gain in Avamar comes from the file cache. If we're getting file cache misses for 500,000 files, that's going to cause a big performance hit. Processing the 500,000 changed files is a disk-intensive process. Those files will be read in their entirety and re-chunked by the client. Whether or not the data needs to be sent to the server at that point is largely irrelevant -- we have already spent the time reading them off the disk and re-chunking them.

The typical performance rates in an Avamar environment are:

  • 30GB/hr per node server ingest rate (75GB/hr per node for 7.8TB nodes)
  • 700GB/hr logical network throughput for typical Gigabit LAN links (350GB/hr raw with roughly 2:1 compression)
  • 1 million files per hour filesystem stat performance
  • 100GB/hr Windows filesystem read performance
  • 300GB/hr Linux filesystem read performance

For Avamar servers with 4 storage nodes or more, the bottleneck for any given backup is almost certainly the client I/O performance.
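To put those rates against this client: even before any re-chunking, just walking an 8.5 million file tree has a fixed cost (rough figures only):

    files_total = 8_509_264          # from the <5156> line above
    stat_rate = 1_000_000            # files/hour, typical Windows stat rate
    print(files_total / stat_rate)   # ~8.5 hours just to stat every file

On that rough basis, a large share of the remaining 19-hour window is plausibly going to re-reading and re-chunking the 500,000+ files that miss the file cache.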

The fact that the number is consistent leads me to think that there is a specific set or type of files that is not making it into the cache for whatever reason.

That's possible but I can practically guarantee the issue is environmental in some way. The Avamar client's behaviour is extremely predictable. When the caches are correctly sized, I have never seen the client fail to cache a file that should have been cached.


I would recommend enabling --statistics in the dataset. It adds about 20 lines of information to the log showing the breakdown of chunk sizes, etc. Another option, with caveats: enabling --verbose=2 in the dataset will provide information about each file that causes a file cache miss, but please be aware that the resulting log will be extremely large. Shockingly large.


If you are not able to determine why the client is seeing so many files as changed, I would recommend opening a service request and working with support.

May 12th, 2014 20:00

Ian,

Thanks for the very technical and detailed information, it helps to have a better understanding of what happens within the client rather than thinking of it as more of a black box system. I’ll add the --statistics option and report what information I receive back in 2 days (today’s backup has already begun). Is this added to the “avagent.cmd” file?

Currently the C: drive has 80GB of free space, would enabling verbose logging use anywhere near that much?

Thanks,

Shayne

May 12th, 2014 21:00

You can add the statistics flag to avtar.cmd or the client's dataset. I believe there is an Advanced option in the dataset to control statistics, actually.

The verbose logs won't be 80GB, but you should be careful with this flag either way. Besides the large log size, plug-in backups don't like the verbose flag, so that one definitely has to go in the dataset, not into avtar.cmd.
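If I remember correctly, avtar.cmd is just a plain text file in the client's var directory (C:\Program Files\avs\var in your log above) with one flag per line, so adding statistics would look something like this:

    --statistics

Double-check the file location on your system before relying on that, though.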

May 13th, 2014 04:00

Sophos is installed on this server. Do you think that Sophos is causing Avamar to consider a large number of files as having been changed in some way?

May 14th, 2014 02:00

I think it is quite likely...

The KB I mentioned explains further.  Are you able to view it using the link? 

If not - see below for the current customer accessible copy. 

Poor backup performance on Avamar client running Sophos anti-virus (Article Number: 000185839)

An Avamar client where Sophos anti-virus is installed may occasionally experience poor backup performance when a Sophos on-demand scan is run on the system. This behaviour can cause long-running Avamar backups and may prevent backups from finishing within the available backup window.

On days where Sophos anti-virus runs an on-demand scan, the percentage of changed files reported by Avamar may be extremely high.

avtar Info <5156>: Backup #82 timestamp 2014-03-21 16:24:53, 80,687 files, 11,757 folders, 46.06 GB (80,188 files, 17.03 GB, 36.98% new)

avtar Info <6083>: Backed-up 46.06 GB in 73.38 minutes: 38 GB/hour (65,976 files/hour)

Here, avtar is reporting that the changed data (the values in brackets) is very high relative to the dataset being backed up. In this case, over 99% of the files are detected to have changed (80,188 files out of 80,687).

On other days, the proportion of changed files is more typical:

avtar Info <5156>: Backup #91 timestamp 2014-04-03 10:11:54, 139,164 files, 13,805 folders, 60.73 GB (4 files, 280.1 KB, 0.00% new)

avtar Info <6083>: Backed-up 60.73 GB in 7.03 minutes: 519 GB/hour (1,188,522 files/hour)

As discussed in the attached Sophos documentation:

7.4 Archiver backs up all files that have been scanned on demand

Symptom
Your archiver always backs up all the files that Sophos Anti-Virus has scanned on demand.

Cause
This is because of changes that Sophos Anti-Virus makes in the “status-changed” time of files. By default, Sophos Anti-Virus tries to reset the access time (atime) of files to the time shown before scanning.
However, this has the effect of changing the inode status-changed time (ctime). If your archiver uses the ctime to decide whether a file has changed, it backs up all files scanned by Sophos Anti-Virus.

Resolve the problem
Run sweep with the option --no-reset-atime.


The Avamar client process (avtar) relies on the ctime of a file to determine whether that file has changed. If third-party software alters the ctime of a file in the dataset, avtar will detect that file as changed and will process (chunk, compress, hash) the entire file.

This can take considerable time if many files have had their ctime modified.

Make the modification suggested by the Sophos documentation and configure Sophos to run with the option "--no-reset-atime".

Link to the Sophos PDF on the web:
http://tw.sophos.com/sophos/docs/eng/manuals/savul_4_umeng.pdf
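
As a quick aside, the atime/ctime interaction described above is easy to demonstrate on a Linux system with a few lines of Python (my own sketch, not part of the KB; Windows reports different semantics for st_ctime):

    import os
    import time

    path = "demo.txt"
    with open(path, "w") as f:
        f.write("x")

    before = os.stat(path)
    time.sleep(1)

    # Put atime/mtime back to their old values, as Sophos does after a scan...
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

    after = os.stat(path)
    print(after.st_atime_ns == before.st_atime_ns)  # True: atime was "reset"
    print(after.st_ctime > before.st_ctime)         # True: ...but ctime changed anyway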

May 14th, 2014 04:00

Thanks NR, apologies for not seeing that link previously. I was reading it in email and must have skimmed that line. It certainly does look like a promising lead, but unfortunately I'm not sure exactly where to set that. All of my searches about this switch lead to pages regarding Sophos Linux clients.

I'll keep plugging away in the Enterprise Console and on the client end. Thanks again for the tip and link!

May 15th, 2014 05:00

You're welcome! Once you fix the cause of whatever is changing the metadata for all those files, if you still see relatively slow performance, the following article discusses how to approach identifying backup performance bottlenecks.

https://support.emc.com/kb/163552

Best of luck,

Nick

May 18th, 2014 18:00

It looks like Sophos was the culprit. After excluding avagent.exe from the Sophos scans, the backups are taking about 8 hours less to complete.

The number of files "changed" is still relatively high (250,000+ changed between Saturday and Sunday) but the speed has improved dramatically.

Thanks for the help guys, much appreciated!
