Avamar: A replicating pair shows different levels of capacity usage. How to investigate the causes.

Summary: Where an Avamar replicating pair shows differing levels of capacity consumption, this article gives a list of possible causes and how to investigate.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

This article discusses the scenario where two Avamar systems (a source and a target) are configured as a replicating pair. The capacity usage is notably higher on one grid than on the other, even though both Avamar grids should be storing the same backups.

Before continuing, understand that:
 
1. An Avamar source replicates selected data asynchronously to the target system daily. 
 
Even if replication completes successfully each day, the data stored on the target system remains up to a day 'behind' the data that is stored on the source system. 


2. Daily data change can mean a difference of several percent in capacity values between source and target. There is no cause for alarm if this difference is below 5%. Consider this when managing high capacity on replication pairs.
 

3. Replication is additive. It does not perform any kind of synchronization between systems. It is not intended that both source and target store the same information. They are fully independent systems.

Cause

Possible causes of differences between the 'server utilization' values:
 
Logical or Physical differences between the grids:
  • A different number of data nodes on the source and target grids.
  • The data nodes of each grid have different disk configurations.
  • An unevenly balanced distribution of stripes across the data nodes within a system (stripe counts should be balanced to within 2%).
  • Storage and parity requirements differ between Avamar versions. A difference in the usage may be observed if source and target software versions differ.
  • The Avamar server Disk read-only level might differ between the two grids.   
  • One grid may be configured for RAIN parity, while the other may not.

Replication configuration:
  • Backups replicated to the target system may have a different retention policy compared with the source; check whether the --expiredelta flag is in use. Alternatively, the replicated backups may only cover a particular timespan, for example the last 4 weeks of backups from the source. (An example of comparing backup expirations between the grids follows this list.)
  • Replication may be configured to replicate only a subset of clients from the source system to the target system. Check if "include" or "exclude" settings are used.
  • Clients and their associated backups may have been deleted from the source system. The deletion of a client or of backups on the source does not remove the same backups from the target system. The backups remain on the target until they expire according to their retention settings.
  • Retention policies can be changed for backups or clients on the source system. The change in retention policies affects new backups only. New backups are replicated to the target and adhere to the updated retention policy. Backups already existing on the target retain the retention policy which was applied to them when they were replicated.
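As an illustration of the retention differences described in this list, the backups of a sample client can be listed on each grid and their retention information compared. This is a sketch only: the client path /clients/client01 is a placeholder, the /REPLICATE path on the target depends on the source grid's name, and the exact output columns vary by Avamar version.
On the source:
admin@utility:~/>: mccli backup show --name=/clients/client01
On the target:
admin@utility:~/>: mccli backup show --name=/REPLICATE/source-grid.example.com/clients/client01
If the retention or expiration reported for the same backup labels differs, the two systems are applying different retention to the same data.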

Previous capacity management activity:
  • It is not unusual for customers to notice that one of the Avamar replication pair systems is approaching capacity and then act to reduce the capacity. Remember - an Avamar replication pair consists of two independently managed systems. If actions are carried out on one system, they must also be carried out on the other. 
  • If backups are deleted or retentions reduced on the source system, identical changes must be made on the target. The recommended way to manage capacity in this manner is with the modify-snapups script, which can be run on both Avamar servers with the same backup modification or deletion options.

Differing stripe structure (for example, more parity stripes on one system):
  • Being independent, two Avamar systems may end up with different stripe structures. Multinode systems can exhibit differences due to their use of parity stripes to protect data. Depending on their capacity history, two multinode systems contain the same backups but one could have a higher number of parity stripes than the other.
  • Like regular stripes, once created, a parity stripe always remains on the system. Unlike regular stripes, it always consumes a fixed amount of space within the Avamar server, even if the safe stripes in its parity group contain no data. Garbage collection has no effect on this behavior.
  • A replication target system is indirectly protected from major capacity problems on a replication source. However, the situation could occur on either machine if one of them is poorly managed from a capacity perspective.
  • Related article: Avamar shows up to ~30% usage even after all backups have been deleted and garbage collected

Backups still in MC_DELETED:
  • One rare scenario to be aware of is where a client is deleted on the source but its backups are retained. This could result in the source having a higher utilization than the target, where the backups would be expected to expire naturally. Using the dump_root_hashes.rb script with the backupcompare option helps check for this scenario.

Data on target system from non-replicated backups:
  • If the system replicates in *one direction only*, check on the target that no clients exist outside of /REPLICATE and MC_SYSTEM. If such data exists, a difference in capacity usage is to be expected.
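A sketch of this check using the standard client listing command (review the domain information in the output for anything other than /REPLICATE or MC_SYSTEM; exact output columns vary by version):
admin@utility:~/>: mccli client show --recursive=true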

 

Other behaviors:
  • Replication jobs may not complete reliably. Data sent to the target may 'lag behind' the source by multiple days.
  • Both systems contain the same amount of deduplicated data, but the amount of parity overhead on each of them is different. This occurs in the following scenario: 
    • An Avamar source system is almost full. 
    • Many backups are deleted from the source system to lower its capacity level. 
    • Replication of the deduplicated data then occurs from source to target. 
    • As a result, the amount of deduplicated data is the same on both systems, but the source system initially stores more parity overhead than the target.
  • Replication does not copy physical stripes from the source to the target grid. Instead, the target grid determines for itself where stripes and chunks of data are stored.
  • Sometimes, target Avamar systems can store data more efficiently than a source grid where the data is originally backed up.

Resolution

This section describes which information to gather and how to interpret it to determine why there is a capacity difference. 
 
Understand the replication environment:
  • Note the full hostname of the source Avamar grid.
  • Examine the replication configuration of the affected systems to understand which systems replicate what data and to where. 
    • It may help to draw a schematic of the environment if it is anything more complicated than replication from one Avamar server to another.
  • If the source integrates with Data Domain (DD), learn whether the customer's concern relates to backups replicated between DD devices.
  • Make a note of the full hostname of the target Avamar grid and any associated DD devices which receive replicated backups.

Check the overall health and situation of the grid:
  • Run the proactive check script on both grids, obtain a copy of hc_results.txt, and review it to understand the overall situation of each system. 
See the "Health check script" section in the restricted notes for information about downloading and running the script.

If there are more serious issues present than a capacity differential, those must be addressed first.

How severe is the capacity differential?
  • Ask the customer to provide a screenshot of the view that leads them to believe there is a capacity consumption differential between source and target.
  • We would not consider there to be cause for alarm if the capacity difference is less than 5%.
  • Check the Avamar Administrator UI to understand the levels of Avamar server capacity and (if Data Domain is integrated) metadata capacity.
  • Be aware of how the UI capacity display works (discussed in Avamar UI dashboard in v7.2 and later shows Metadata Utilization instead of Avamar Utilization).
  • Run the following command on both systems. The Server utilization figure gives an overall view of Avamar server (but not Data Domain) capacity levels:
admin@utility:~/>: mccli server show-prop | grep "utilization"
Server utilization               3.7%
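To quantify the differential, the same value can be collected from each grid and compared. The following is a minimal sketch, assuming SSH access from a host that can reach both utility nodes; the hostnames are placeholders:
SRC=$(ssh admin@avamar-source 'mccli server show-prop' | grep "utilization" | awk '{print $NF}' | tr -d '%')
TGT=$(ssh admin@avamar-target 'mccli server show-prop' | grep "utilization" | awk '{print $NF}' | tr -d '%')
echo "Source ${SRC}% / Target ${TGT}% / Difference $(echo "$SRC - $TGT" | bc)%"
A difference below 5 percentage points would normally be considered within the expected daily variation.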

Check that the hardware is the same on both grids:
  • It only makes sense to compare capacity differences for "like" systems. 
  • Using the proactive check output, note the type of nodes present in the system.
  • The following command shows the number of physical nodes, their total capacity, and the space consumed:
admin@utility:~/>: mccli server show-prop | grep "Capacity\|capacity\|nodes"
Total capacity                   23.3 TB
Capacity used                    858.5 GB
Number of nodes                  3
  • This output makes it easy to determine the number and size of the nodes in the system: here, three nodes of approximately 7.8 TB each (23.3 TB / 3 = ~7.8 TB). 
  • The number and size of the hard drive partitions on each node must corroborate this.
For example:
 
admin@utility:~/>: mapall 'df -h' | grep data
(0.0) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.2 'df -h'
/dev/sda3       1.8T   55G  1.8T   4% /data01
/dev/sdb1       1.9T   54G  1.8T   3% /data02
/dev/sdc1       1.9T   53G  1.8T   3% /data03
/dev/sdd1       1.9T   53G  1.8T   3% /data04
/dev/sde1       1.9T   52G  1.8T   3% /data05
/dev/sdf1       1.9T   52G  1.8T   3% /data06
(0.1) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.3 'df -h'
/dev/sda3       1.8T   56G  1.8T   4% /data01
/dev/sdb1       1.9T   53G  1.8T   3% /data02
/dev/sdc1       1.9T   52G  1.8T   3% /data03
/dev/sdd1       1.9T   52G  1.8T   3% /data04
/dev/sde1       1.9T   53G  1.8T   3% /data05
/dev/sdf1       1.9T   53G  1.8T   3% /data06
(0.2) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.4 'df -h'
/dev/sda3       1.8T   55G  1.8T   4% /data01
/dev/sdb1       1.9T   53G  1.8T   3% /data02
/dev/sdc1       1.9T   53G  1.8T   3% /data03
/dev/sdd1       1.9T   52G  1.8T   3% /data04
/dev/sde1       1.9T   53G  1.8T   3% /data05
/dev/sdf1       1.9T   52G  1.8T   3% /data06
  • With this information, check the following:
a. Do both systems contain the same number of nodes?
b. Does each node contain the same number of data partitions?
c. Are all the data partitions the same size?
 
The output above shows that the system has three nodes, each node has six data partitions, and each partition is slightly under 2 TB in size.    
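To repeat this check quickly on larger grids, the data partitions per node can be counted with the same commands used above (a sketch; it simply counts the /dataNN mounts reported by df on every node):
admin@utility:~/>: mapall 'df -h | grep -c /data'
Run this on both grids; the per-node counts should match if the hardware configurations are genuinely alike.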


Check the software version and configuration:
  • Using the output of the status.dpn command, compare the version of Avamar running on each system.
  • For multinode systems, confirm whether both are configured with RAIN parity, per Avamar - How to determine if a server is RAIN or Non-RAIN.
  • Check and compare the two systems' capacity-related Avamar server configuration parameters. For example:
admin@utility:~/>: avmaint config --ava | grep -i "capacity\|disk"
  disknocreate="90"
  disknocp="96"
  disknogc="85"
  disknoflush="94"
  diskwarning="50"
  diskreadonly="65"
  disknormaldelta="2"
  freespaceunbalancedisk0="20"
  diskfull="30"
  diskfulldelta="5"
  balancelocaldisks="false"
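A straightforward way to compare these settings across the pair is to save the filtered output from each grid to a file and diff the two (a sketch; the file names and the copy step between grids are illustrative):
admin@utility:~/>: avmaint config --ava | grep -i "capacity\|disk" > /tmp/ava_capacity_params_$(hostname).txt
After copying the file from one utility node to the other:
admin@utility:~/>: diff /tmp/ava_capacity_params_source.txt /tmp/ava_capacity_params_target.txt
Any line reported by diff (for example, a differing diskreadonly value) identifies a parameter that is not the same on both systems.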
 
Check stripe balancing:
  • Check the status.dpn output and note the total number of stripes on each data node. The number of stripes is identified in brackets (for example onl:xxx). 
  • There should be less than 2% difference between the total number of stripes on each data node.  
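A minimal way to extract just the per-node counts, assuming they appear in the onl:xxx form described above:
admin@utility:~/>: status.dpn | grep -o "onl:[0-9]*"
Compare the resulting values; the largest and smallest on a grid should be within roughly 2% of each other, and the totals can then be compared between source and target.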

Check that garbage collection has been running properly on both systems:
  • If garbage collection is not running consistently and effectively, it does not remove expired data, and the system reports higher capacity usage than expected. 
    • See the GC Resolution Path article in the restricted notes for information.  
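One way to get a quick command-line view of the most recent garbage collection run is with avmaint (a sketch; it assumes the gcstatus subcommand is available on the installed version):
admin@utility:~/>: avmaint gcstatus --ava
Review the reported result and the amount of space recovered on both grids, and confirm that the operation completes during each maintenance window.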

Confirm that replication is completing successfully:
  • Ensure that all replication tasks from source to target complete successfully. If they have not been completing, there may still be data waiting to be replicated from source to target.  
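Recent replication activity can be reviewed from the command line as well as in the UI (a sketch; mccli activity show lists recent activities, and the grep simply narrows the output to replication entries):
admin@utility:~/>: mccli activity show | grep -i "replicat"
Look for failed, cancelled, or missing daily replication runs on the source system.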

Check the replication configuration:
  • Check the replication configuration (in the UI, CLI or logs) for any of the following flags:  
--before
--after
--include
--exclude
The presence of these flags indicates that the intention is that not all backups on the source are sent to the target.
 
--expiredelta
The presence of this flag indicates that the backups are sent to the target with a different expiration, so capacity cannot be expected to be the same on source and target.  
 
--retention-types
If any of the retention types are missing, certain backups may be prevented from being replicated. Ensure that ALL retention types are specified if the intent is to replicate everything, for example:
--retention-types=none,daily,weekly,monthly,yearly
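Where these flags are set depends on the replication method in use. For policy-based replication, review the replication group's options in the UI. For traditional cron-based replication, they are typically found in the repl_cron configuration file; the path below is the usual default and may differ in a given environment:
admin@utility:~/>: grep -E "before|after|include|exclude|expiredelta|retention-types" /usr/local/avamar/etc/repl_cron.cfg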
 
Check the ingestion and data removal rates of both systems:
  • Run proactive_check.pl --capacity on both systems and compare the ingestion rates of both source and target systems.
  • If the target is purely a target system and receives ALL backups from the source, its ingestion rate should closely follow the ingestion rate of the source.
  • The Avamar NEW or DDR NEW columns show how much new data is being added to those systems.
  • Also pay close attention to the columns "removed", "mins" and "pass" to understand garbage collection behavior on both systems.
  • This information gives a clear view of what is happening on both systems.
  • For more information about interpreting the output, see Avamar: How to manage capacity with the capacity.sh script  

Dump a list of backups existing on each system:
  • The dump_root_hashes.rb script is a utility which helps compare the backups stored on an Avamar source and a target system, even if the backups are hosted on Data Domain storage.
  • See Avamar: How to Use the dump_root_hashes.rb Script to Generate a List of Clients and Backups for information about downloading the utility and use cases, including comparing the content of two Avamar systems.
    • Run the tool and check for inconsistencies in the backup counts across all the clients, paying attention to differences of more than +/-2.
  • As discussed in the Causes section, asymmetric capacity management results in differences between the two systems. Review the output to determine if this could be the scenario.
  • Also:
    • Check the target system for data from non-replicated backups (clients outside of /REPLICATE and MC_SYSTEM).
    • Check the source for data which was not replicated to the target.  
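Once the script's output has been collected from each grid, a simple diff of the sorted lists highlights clients or backups present on one system but not the other (a sketch; the file names are illustrative):
admin@utility:~/>: diff <(sort /tmp/root_hashes_source.txt) <(sort /tmp/root_hashes_target.txt) | less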

Check for differing stripe structures (for example, more parity stripes on one system):
  • Being independent, the two Avamar systems may have different stripe structures. Multinode systems can exhibit differences due to their use of parity stripes to protect data. Depending on their capacity history, two multinode systems contain the same backups but one could have a higher number of parity stripes than the other.  
  • Like regular stripes, once created, a parity stripe remains on the system. Unlike regular stripes, it always consumes a fixed amount of space within Avamar, even if the safe stripes in its parity group contain no data. Garbage collection has no effect on this behavior.
  • A replication target system is indirectly protected from major capacity problems on a replication source. However, the situation could occur on either machine if one of them is poorly managed from a capacity perspective.
  • Related article:  Avamar shows up to ~30% usage even after all backups have been deleted and garbage collected  

Backups still in MC_DELETED:
  • One rare scenario to be aware of is where a client was deleted on the source but its backups were retained. This results in the source having a higher utilization than the target, where the backups would be expected to expire naturally. Using the dump_root_hashes.rb script with the backupcompare option should help check for this scenario.

Additional Information

Cross replication:
  • This article has been written specifically for one-way replication where an Avamar source sends backups to an Avamar target.
  • It is not uncommon for Avamar systems to act as both source and target, sending and receiving data within the pair. This is known as "cross replication". 
  • Investigating capacity differences on a cross-replicating environment is only a valid exercise if both systems are configured to replicate ALL their backups to their partner. 
    • When running commands to gather information about such a replication pair, all commands must be run on both systems. 
  • Also understand that matching capacities on two identically sized grids in a replicating pair does not mean that both grids store exactly the same backups.
  • The source Avamar may be the target for replication data from another Avamar. Or, the target grid may be the target of more than one Avamar source.

Affected Products

Avamar

Products

Avamar
Article Properties
Article Number: 000031740
Article Type: Solution
Last Modified: 07 Jun 2024
Version:  12