Article Number: 000019552

Avamar: Backup performance behavior and theory

Summary: This article discusses behavior during an Avamar backup and helps explain Avamar client backup performance.


Article Content

Instructions

The purpose of this article is to describe what happens during an Avamar backup with a focus on helping the reader understand backup performance behavior.

This article is a companion to the following articles:

What happens during an Avamar backup?

The avtar backup process:

1) Loads the file and hash cache files into memory

2017-06-09 23:00:25 avtar Info <5586>: Loading cache files from C:\Program Files\avs\var
2017-06-09 23:00:25 avtar Info <8650>: Opening filename cache file 'C:\Program Files\avs\var\f_cache2.dat'
2017-06-09 23:00:25 avtar Info <5573>: - Loaded filename cache file (6,532,792 bytes)
2017-06-09 23:00:26 avtar Info <8650>: Opening hash cache file 'C:\Program Files\avs\var\p_cache.dat'
2017-06-09 23:00:28 avtar Info <5573>: - Loaded hash cache file (402,653,728 bytes)
2017-06-09 23:01:01 avtar Info <6426>: Done loading cache files

2) Creates VSS snapshots (on Windows):

2017-06-09 23:04:32 avtar Info <19008>: Obtaining available VSS providers
2017-06-09 23:04:32 avtar Info <8776>: Freezing volumes now...
2017-06-09 23:04:32 avtar Info <8780>: Creating the shadow copy set (DoSnapshotSet) ... 
2017-06-09 23:14:33 avtar Info <8781>: Shadow copy set successfully created.
2017-06-09 23:14:34 avtar Info <6074>: VSS snapshot set creation successful

3) Walks all files defined by the dataset
For all files within the source dataset, avtar takes the full path and combines it with the stat-like metadata to calculate a hash to uniquely identify the file.

For more detail, see Avamar: What happens when avtar reads a file during the file scan phase.
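The file-identity step above can be sketched as follows. This is a minimal illustration, not avtar's actual implementation: the choice of SHA-1 (picked only because it yields a 20-byte digest), the metadata fields, and the function names `key_from_metadata` and `file_cache_key` are all assumptions.

```python
import hashlib
import os


def key_from_metadata(path: str, size: int, mtime_ns: int, mode: int) -> str:
    """Combine the full path with stat-like metadata into one hash.

    Assumption: the fields used here (size, mtime, mode) are illustrative;
    the field list that avtar really uses may differ.
    """
    blob = "\x00".join(str(f) for f in (path, size, mtime_ns, mode))
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()


def file_cache_key(path: str) -> str:
    """Stat the file (no content is read) and derive its identity hash."""
    st = os.stat(path)
    return key_from_metadata(os.path.abspath(path), st.st_size, st.st_mtime_ns, st.st_mode)
```

Because the key depends only on the path and metadata, it can be computed without reading file contents, which is why the scan phase is dominated by metadata lookups rather than data reads.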

4) Compares calculated hashes with those in the local client caches

Avtar looks up the file's hash in the file cache to determine whether the file is new or has been modified since the previous backup.

If the file cache lookup succeeds, the file exists and is unchanged.

If the lookup fails, the file is new or has changed, and it must be read and processed.

For more detail, see Avamar client - What has to change before avtar considers a file to have been modified?
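The lookup logic described above amounts to a set-membership test. The sketch below assumes the file cache can be modelled as an in-memory set of identity hashes; the function name `classify_files` and the data shapes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def classify_files(file_keys: Dict[str, str], file_cache: Set[str]) -> Tuple[List[str], List[str]]:
    """Split files into (unchanged, to_process) by file cache lookup.

    file_keys maps each path to its identity hash (path plus metadata).
    A cache hit means the file is unchanged; a miss means new or modified.
    """
    unchanged: List[str] = []
    to_process: List[str] = []
    for path, key in file_keys.items():
        if key in file_cache:
            unchanged.append(path)   # lookup succeeded: skip reading the file
        else:
            to_process.append(path)  # lookup failed: file must be read
    return unchanged, to_process
```

For example, with a cache containing only the hash `"h1"`, a scan yielding `{"a.txt": "h1", "b.txt": "h9"}` leaves `a.txt` untouched and queues `b.txt` for processing.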

5) Processes new and modified files

For any new or modified file, avtar must:

Read the entire file
Break it into variable-sized chunks
Compress each chunk
Calculate a hash for each chunk
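The four steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not avtar's algorithm: the shift-and-add rolling value, the `MIN_CHUNK` and `MASK` constants, the use of zlib and SHA-1, and hashing the uncompressed chunk are all hypothetical choices. Only the 64 KB upper bound and the 20-byte hash size come from this article.

```python
import hashlib
import zlib
from typing import List, Tuple

MIN_CHUNK = 1024        # hypothetical lower bound on chunk size
MAX_CHUNK = 64 * 1024   # chunks are up to 64 KB in size
MASK = (1 << 12) - 1    # hypothetical: controls the average chunk size


def split_chunks(data: bytes) -> List[bytes]:
    """Content-defined chunking: cut where a rolling value hits a pattern."""
    chunks: List[bytes] = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the 32-bit value, so only recent content matters.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (rolling & MASK) == 0
        if at_boundary or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def process_data(data: bytes) -> List[Tuple[bytes, bytes]]:
    """Return (hash, compressed chunk) pairs for the whole input."""
    results = []
    for chunk in split_chunks(data):
        compressed = zlib.compress(chunk)
        digest = hashlib.sha1(chunk).digest()  # assumption: hash of uncompressed data
        results.append((digest, compressed))
    return results
```

Because boundaries depend on content rather than fixed offsets, appending data to a file leaves the earlier chunks, and therefore their hashes, unchanged.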

6) Checks whether missing hashes are present on the Avamar server

For any hashes not found in the local client caches, avtar sends a request over the network to the Avamar server to check whether they already exist there. These are known as 'ispresent' requests.
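The two-level check described above (local cache first, then the server) can be sketched as below. The callable `server_has` is a hypothetical stand-in for the network 'ispresent' request; the real protocol and any batching are not described in this article.

```python
from typing import Callable, Iterable, List, Set, Tuple


def find_new_hashes(
    chunk_hashes: Iterable[bytes],
    hash_cache: Set[bytes],
    server_has: Callable[[bytes], bool],
) -> Tuple[List[bytes], List[bytes]]:
    """Return (queried, new) hash lists.

    queried: hashes missing from the local cache, each costing one
             'ispresent' round trip to the server.
    new:     the subset the server does not hold, whose data must be sent.
    """
    queried = [h for h in chunk_hashes if h not in hash_cache]
    new = [h for h in queried if not server_has(h)]
    return queried, new
```

A local cache hit avoids the network round trip entirely, which is why well-populated client caches reduce the number of 'ispresent' requests.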

7) Writes data to the Avamar server (and, if appropriate, Data Domain).

For more detailed workflow, see the attached Avtarprocess.pdf.

Overview of an Avamar backup from a performance perspective:

The stages above can be grouped into 'phases' which have the greatest impact on backup performance:

Phase 0. Create VSS snapshots.

The Volume Shadow Copy Service (VSS) creates snapshots of the volumes specified within the source dataset. Applications can continue to write to the volume while the backup runs.
Avamar backs up the read-only 'frozen' snapshot of the volume rather than the writable volume. This ensures it has a consistent set of data to back up.

VSS snapshots normally take seconds to complete. If a client is experiencing VSS issues, this delays or prevents the backup from proceeding.

Phase 1. File scan phase. The avtar process stats all files in the target dataset

For clients with millions of files, this phase may be the most time consuming.
Database data typically consists of a small number of large files, so the file scan phase takes little time. Database clients typically spend most of their time in phase #2.

For a client with rotational disks in a RAID 5 configuration, a file scan rate of ~1 million files per hour is typical. This varies from 300,000 to 3 million files per hour, depending on the client environment and the characteristics of the data being backed up.

From v7.3, Linux clients backing up to Data Domain can take advantage of Linux Fast Incremental (LFI) functionality. This avoids scanning the entire dataset each time the backup runs.

Critical resources: random-seek performance of the disk where the backup data is stored.

Phase 2. Avtar reads changed files and then chunks, compresses, and hashes the data.

A lot of computation occurs during this phase. For each modified or new file, avtar breaks it into small chunks. It compresses each chunk and calculates a hash as a 'fingerprint' to identify the chunk.

Files in database backups are often large and tend to change daily, so for these clients avtar spends most of its time in this phase. It is best to use official Avamar database plugins to ensure that the database is handled efficiently, leveraging incremental backup functionality, transaction logs and so forth.

Typical file processing performance is around 100 GB per hour but can vary up to 300 GB per hour. This is environment-dependent.

Critical resources: Client disk and CPU
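The typical rates quoted for phases #1 and #2 can be turned into a rough duration estimate. The sketch below uses the figures from this article as defaults (~1 million files per hour scanned, ~100 GB per hour processed); the function name and the assumption that the two phases simply add up are illustrative only, and real results are environment-dependent.

```python
from typing import Tuple


def estimate_backup_hours(
    total_files: int,
    changed_gb: float,
    scan_files_per_hour: float = 1_000_000,  # typical; 300,000 to 3 million
    process_gb_per_hour: float = 100.0,      # typical; up to about 300
) -> Tuple[float, float]:
    """Return (scan_hours, process_hours) for phases #1 and #2."""
    scan_hours = total_files / scan_files_per_hour
    process_hours = changed_gb / process_gb_per_hour
    return scan_hours, process_hours


# A file server with 5 million files and 50 GB of changed data:
scan, process = estimate_backup_hours(5_000_000, 50)
print(f"scan: {scan:.1f} h, process: {process:.1f} h")
```

In this example the scan phase dominates. A database client with a handful of files but several hundred gigabytes of changed data would show the opposite pattern.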

For LAN backups where there are no bottlenecks in sending data to the Avamar server, phases #1 and #2 take the most time.

In the following chart, the area of each bar corresponds to how long the backup takes. Changed files can drastically increase the amount of time required, especially if those files are large.

For file system datasets, expect ~0-3% of files to change on a daily basis.

Avtar must 'stat()' each file by performing two I/O operations: one to check the file attributes and another for the security attributes.

To achieve the benchmark scan rate of ~1 million files per hour for file system backups, avtar requires approximately two million seek operations per hour, or close to 600 seek operations per second (2,000,000 / 3,600 is about 556).

For example: If a backup has a 3% change rate, 97 out of 100 files require only the two disk seek operations needed to identify whether they changed. The remaining three, which did change, must also be read, chunked, compressed, and hashed.

This considers only the file scan phase and does not take into account I/O resources required for processing any files which were modified.
The more data within the modified files, the more work is needed to complete the backup.
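The seek arithmetic above can be checked with a few lines of code. The constant of two seeks per file comes from this article; the function names are illustrative, and the model covers only the scan phase.

```python
from typing import Tuple

SEEKS_PER_FILE = 2  # one for file attributes, one for security attributes


def seeks_per_second(files_per_hour: float) -> float:
    """Disk seek rate needed to sustain a given file scan rate."""
    return files_per_hour * SEEKS_PER_FILE / 3600


def split_by_change_rate(total_files: int, change_rate: float) -> Tuple[int, int]:
    """Return (unchanged, changed) file counts for a given change rate."""
    changed = round(total_files * change_rate)
    return total_files - changed, changed


print(round(seeks_per_second(1_000_000)))  # about 556 seeks per second
print(split_by_change_rate(100, 0.03))     # (97, 3)
```

The result of roughly 556 seeks per second is consistent with the rounded figure of about 600 quoted above, and it shows why disk random-seek performance, rather than throughput, limits the scan phase.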

Phase 3. Checking the existence of hashes on the Avamar server

Phases #1 and #2 produce hashes which point to elements of the backup. These elements could be unique file chunks, file systems or entire backups.

The hashes are written to the client cache files and compared with the hashes present on the Avamar server to check if any new data must be added. This is true whether an Avamar server or Data Domain is the target storage.

Hash comparisons between the Avamar client and server are typically fast. They should not bottleneck the backup if the Avamar server is:

Healthy
Under regular load levels
Located on same LAN segment as the client

Since the hashes are only 20 bytes in size, this phase is influenced more by network latency than by network bandwidth. When the hash arrives at the Avamar server, the general load and the random seek performance of the data nodes' disk subsystem determine how quickly the hash is retrieved and compared with the one sent by the client.

Critical resources: Network response time and Avamar data node random seek performance.

The random seek performance of a physical Avamar server scales with the number and size of its data nodes. AVE systems perform less well, comparable to a single-node system.
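The point that hash checks are latency-bound rather than bandwidth-bound can be illustrated with a simple model. The model is an assumption made for illustration: each hash check costs one network round trip, with a hypothetical number of requests kept in flight at once. How avtar actually batches or pipelines these requests is not covered in this article.

```python
HASH_SIZE_BYTES = 20  # hash size stated in this article


def hash_payload_bytes(num_hashes: int) -> int:
    """Total bytes of hash data sent: tiny compared with the backup itself."""
    return num_hashes * HASH_SIZE_BYTES


def hash_check_seconds(num_hashes: int, rtt_ms: float, in_flight: int = 1) -> float:
    """Time spent waiting on round trips (hypothetical model)."""
    return num_hashes * (rtt_ms / 1000.0) / in_flight


# 100,000 hash checks amount to only 2 MB of payload...
print(hash_payload_bytes(100_000))
# ...but the wait grows in proportion to round-trip time.
print(hash_check_seconds(100_000, rtt_ms=1))   # LAN-like latency
print(hash_check_seconds(100_000, rtt_ms=50))  # WAN-like latency
```

Under this model, a fifty-fold increase in round-trip time produces a fifty-fold increase in wait time, while the payload stays at 2 MB regardless of the link.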

Phase 4. Sending the new chunk over the network to the Avamar server or Data Domain

When a client sends a new, unique chunk (up to 64 KB in size) to the server, performance depends primarily on network bandwidth. This mainly affects WAN-based clients which generate a large amount of changed data each day. It can also affect clients operating over congested network links.

The following schematics show the data flow where a client sends data to an Avamar system and to an Avamar - Data Domain integrated system.

[Figure: data flow where a client sends data to an Avamar system]

[Figure: data flow where a client sends data to an Avamar/Data Domain integrated system]

Critical resources: Network bandwidth between client and server
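A rough transfer-time estimate shows why bandwidth matters for clients with a lot of new data. The sketch assumes decimal units (1 GB = 10^9 bytes), and the `efficiency` factor for protocol overhead and link contention is a hypothetical parameter, not a measured Avamar value.

```python
def transfer_hours(new_data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to send new (post-deduplication, compressed) data."""
    bits_to_send = new_data_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_send / usable_bits_per_second / 3600


# 45 GB of new data over a 100 Mbit/s WAN link at full efficiency:
print(f"{transfer_hours(45, 100, efficiency=1.0):.2f} h")
# The same data over a 1 Gbit/s LAN link:
print(f"{transfer_hours(45, 1000, efficiency=1.0):.2f} h")
```

Only chunks that are new to the server count toward `new_data_gb`, so a client with a low change rate may be unaffected even on a slow link.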

Phase 5. Data written to Avamar server or Data Domain

Backup data must be written to the Avamar server or the Data Domain system.

Critical resources: Avamar server disk write performance and general loading.
