Avamar GSAN or User Capacity Resolution Path

Summary: This Resolution Path article covers handling and troubleshooting GSAN Capacity (also known as User Capacity) issues on Avamar.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

For initial concepts and an understanding of Avamar capacity, including GSAN and Operating System (OS) Capacity, see Avamar: Capacity General Training - Resolution Path.

It is often easiest to consider GSAN Capacity as the space and utilization for client backups.

As summarized from this training article, a reasonable understanding of the following topics is required to continue through the rest of this article:
  • Basic understanding of deduplication
  • Basic understanding of checkpoint, checkpoint validation (hfscheck), and Garbage Collection, and the importance of each.
  • The difference between GSAN (or User) Capacity and OS Capacity
  • Change Rate
  • Steady State
 
Impacts of high GSAN Capacity can include:
  • Backup or Replication failure when the grid access state has changed to "admin mode"
    • A client backup job could fail with a message similar to:  "avtar Info <5314>: Command failed (1 error, exit code 10028: Operation failed because target server is full)"
  • The automatic disabling of the backup scheduler (until manually acknowledged and cleared)
 
When the following thresholds are crossed, an event warning or error is generated in the Avamar Administration UI:
  • 80% - Capacity Warning
  • 95% - Health Check Limit is reached (this can sometimes disable the backup scheduler, at least until manually acknowledged)
  • 100% - Server Read-Only Limit is reached (The grid goes into admin mode)

Cause

In a quick summary: the Avamar server (GSAN) "deduplicates" backup data, meaning that when certain chunks of data are identical, that chunk only has to be stored once. Any data can be "deduplicated" against any other data from the same or different clients backed up on the Avamar grid. Because these chunks of data are small, many duplicates can be found, saving a great deal of capacity by not backing up the same data repeatedly.
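As a rough illustration of the idea (a generic shell sketch using a hypothetical sample file; this is not an Avamar command, and Avamar's actual chunking and hashing are internal to the GSAN), a file can be cut into fixed-size chunks and the unique chunks counted:

# Hypothetical sample file; not an Avamar command.
FILE=/tmp/sample.dat
split -b 4096 -d "$FILE" /tmp/chunk_                         # cut the file into 4 KiB chunks
ls /tmp/chunk_* | wc -l                                      # total chunks without deduplication
sha1sum /tmp/chunk_* | awk '{print $1}' | sort -u | wc -l    # unique chunks actually needed
rm -f /tmp/chunk_*

The larger the gap between the two counts, the more a deduplicating store saves by keeping each unique chunk only once.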
The Capacity life cycle of client backup data:
  1. Because of deduplication, Avamar only needs to save and store the minor changes and differences between each client backup job. As new backups (or incoming replication) run, new data is added, increasing the Avamar capacity or utilization value.
  2. After a certain amount of time, backups expire based on their configured retention and expiration and are no longer available on the Avamar grid to restore.
  3. When the Avamar maintenance job called Garbage Collection (GC) runs, it finds all the unique portions or chunks of data that are no longer needed because of these expired backups. GC verifies that no other current backups share that same data (because of deduplication) and then removes or frees up those chunks of data that are no longer required, reducing the Avamar server capacity or utilization.
 

When the amount of daily incoming data added is about the same as the amount of daily data being cleaned, this is called "Steady State". This is the goal of every Avamar grid installed.

Before a new Avamar grid is set up and configured, general preinstallation sizing calculations are made to determine the capacity required to store the backup data. These calculations are based on the customer's retention requirements and how much data is to be backed up. They also estimate how well that data is expected to deduplicate on average, and so forth.

However, sometimes the capacity does not reach a steady state. This can be caused by:
  1. Garbage Collection not consistently running
  2. Garbage Collection performance is slow or not running long enough
  3. Deduplication estimates prior to Avamar grid installation were not accurate enough
  4. Data other than what was calculated prior to Avamar grid installation is being backed up to this Avamar server.
  5. Other reasons

Resolution

Work through each troubleshooting step below and validate that it is true for the environment:

Note: The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution.
Do not skip any steps.
 
 

Step 1. Data Collection:

Ensure that there are no other non-capacity issues with the Avamar grid. If there are, they could require attention PRIOR to troubleshooting capacity.

This includes hardware errors, data integrity issues (including offline nodes), offline stripes, checkpoint validation failures or failing maintenance jobs. If any of these are an issue, capacity troubleshooting must be stopped and other issues addressed. Once other issues have been resolved, the capacity can be revisited.

A health check should be run (see Avamar: How to run the proactive_check.pl health check script on an Avamar Server), but at minimum the status.dpn command can give a quick overview and verification of most of those same issues. See Avamar: How to understand the output generated by the status.dpn command.

See the following article for additional information: Avamar: How to Apply the "Avamar troubleshooting hierarchy" Approach Correctly.

If assistance is required to address any non-capacity issues, create a Service Request with the Dell Technologies Avamar Support team.

 

Step 2. Capacity Information Collection: 

Refer to the following article for all the information required to troubleshoot Avamar Capacity issues: Avamar: How to gather the information to troubleshoot capacity issues

At the very least, the status.dpn command or the values within the Avamar Administration UI show the GSAN capacity.

Note: The capacity shown by the status.dpn command and the UI differ; this is by design.
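
As a quick command-line check (a minimal sketch; the exact labels in the output can vary between Avamar versions), the utilization lines can be pulled from the status.dpn output:

status.dpn | grep -i util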
 
 

Step 3. Check if the OS Capacity is full:

The following command helps show the current value of the OS Capacity for each disk partition. If any of the values has reached or exceeded 85%, as in the second sample output, it is considered high OS Capacity:

avmaint nodelist | egrep 'nodetag|fs-percent-full'
 

Sample outputs:

nodetag="0.2"
        fs-percent-full="56.6"
        fs-percent-full="54.7"
        fs-percent-full="54.4"
        fs-percent-full="54.6"
        fs-percent-full="54.7"
        fs-percent-full="54.7"
      nodetag="0.1"
        fs-percent-full="56.2"
        fs-percent-full="54.6"
        fs-percent-full="54.6"
        fs-percent-full="54.8"
        fs-percent-full="54.8"
        fs-percent-full="54.5"
      nodetag="0.0"
        fs-percent-full="56.2"
        fs-percent-full="54.7"
        fs-percent-full="54.8"
        fs-percent-full="54.7"
        fs-percent-full="54.6"
        fs-percent-full="54.6"
 
nodetag="0.2"
        fs-percent-full="94.5"
        fs-percent-full="94.4"
        fs-percent-full="94.2"
        fs-percent-full="94.1"
        fs-percent-full="94.0"
        fs-percent-full="94.0"
      nodetag="0.1"
        fs-percent-full="94.5"
        fs-percent-full="94.3"
        fs-percent-full="94.1"
        fs-percent-full="93.6"
        fs-percent-full="94.0"
        fs-percent-full="93.9"
      nodetag="0.0"
        fs-percent-full="94.4"
        fs-percent-full="94.4"
        fs-percent-full="94.0"
        fs-percent-full="94.1"
        fs-percent-full="92.7"
        fs-percent-full="92.5"
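
To quickly flag any partition at or above 85% in output like the above, the same command can be filtered with awk (a minimal sketch that assumes the attribute format shown in these samples):

avmaint nodelist | egrep 'nodetag|fs-percent-full' | awk -F'"' '/nodetag/ {node=$2} /fs-percent-full/ && $2+0 >= 85 {print "node " node ": " $2 "% full"}'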
 
Caution: While high OS capacity may not appear to be the biggest concern, it prevents troubleshooting GSAN Capacity because Garbage Collection cannot run if the OS capacity exceeds 89%. This is discussed in more detail, with troubleshooting steps, in: Avamar: Operating System (OS) Capacity (Resolution Path)
 

Only if the OS capacity is below 85% should the GSAN Capacity troubleshooting continue. 

 

Step 4. Non-Capacity issues that can sometimes be misunderstood as Capacity:

It is possible that client backups fail not for "Capacity" reasons but for "Quota" reasons. These can sometimes be mistaken for Capacity issues.

This situation can be confirmed when the status.dpn command, or some of the other collected output, shows a lower capacity.

It is also possible that client backups failed or did not run due to other non-GSAN Capacity reasons. The collected information should confirm this, or it can also be seen in the Avamar Administration UI.

If the GSAN capacity is not high, refer to the following articles:
 

If the GSAN Capacity is high, and these other capacities are also high, troubleshooting can be performed in any order (except for OS Capacity, which must always be addressed first).

Note: It is possible that the GSAN Capacity, the Metadata Capacity, and the DD Capacity are all high at the same time; they too can be addressed in any order.
 
 
 

Step 5. Stripe Balance and OS Disk Balance:

"Stripes" on Avamar are the container files that backup data is stored within on the data nodes (except for a single-node Avamar grid).

The expectation is that stripes are "balanced" or evenly distributed across the different disks and nodes within the grid; however, they can sometimes become unbalanced.

By design on Avamar, the largest node or disk partition is the limiting factor when it comes to Avamar Capacity.

This is intentional so that no disk or node creates more stripes than it can handle (or is allowed to); therefore, having balanced stripes is important to Capacity.

For example, when adding additional data nodes for Avamar grid expansion, balancing must be run to evenly distribute stripes to the new nodes to decrease the overall Avamar Capacity percentage.

Note: While a perfect stripe balance is desired and often seen, issues can arise where the balance is close but not exact. The Avamar Engineering team has confirmed that a difference of up to 4% between stripe balances is within expected limits.
 
 

Another type of balance to understand is OS Disk balance. This applies only to the data partitions on the same node, not to partitions across multiple nodes.

If one partition on a data node is much more or less full than another partition on the SAME node, a limit called "freespaceunbalance" can be exceeded. While this is generally an OS Capacity and not a GSAN Capacity matter, it can be reported as a GSAN Capacity issue.

 

Step 6. Check if the Garbage Collection is completing: 

Run the following command to get information about garbage collection:

dumpmaintlogs --types=gc --days=30 | grep "4201\|2402"
 

Ideally, the output will show that GC has completed for the last 30 days:

2025/10/07-12:00:35.24911 {0.1} <4201> completed garbage collection
2025/10/08-12:00:34.61185 {0.1} <4201> completed garbage collection
2025/10/09-12:00:35.14874 {0.1} <4201> completed garbage collection
2025/10/10-12:00:34.67986 {0.1} <4201> completed garbage collection
2025/10/11-12:00:34.73284 {0.1} <4201> completed garbage collection
2025/10/12-12:00:33.23205 {0.1} <4201> completed garbage collection
2025/10/13-12:00:33.41448 {0.1} <4201> completed garbage collection
2025/10/14-12:00:35.70726 {0.1} <4201> completed garbage collection
2025/10/15-12:00:35.08316 {0.1} <4201> completed garbage collection
2025/10/16-12:00:34.82681 {0.1} <4201> completed garbage collection
2025/10/17-12:00:35.29262 {0.1} <4201> completed garbage collection
2025/10/18-12:00:35.24618 {0.1} <4201> completed garbage collection
2025/10/19-12:00:34.56531 {0.1} <4201> completed garbage collection
2025/10/20-19:06:45.15574 {0.1} <4201> completed garbage collection
2025/10/21-12:00:34.21062 {0.1} <4201> completed garbage collection
2025/10/22-12:00:35.29770 {0.1} <4201> completed garbage collection
2025/10/23-12:00:36.13041 {0.1} <4201> completed garbage collection
2025/10/24-12:00:35.52502 {0.1} <4201> completed garbage collection
2025/10/25-12:00:35.93730 {0.1} <4201> completed garbage collection
2025/10/26-12:00:35.55037 {0.1} <4201> completed garbage collection
2025/10/27-12:00:36.12049 {0.1} <4201> completed garbage collection
2025/10/28-12:00:35.75633 {0.1} <4201> completed garbage collection
2025/10/29-12:00:34.85499 {0.1} <4201> completed garbage collection
2025/10/30-12:00:34.96325 {0.2} <4201> completed garbage collection
2025/10/31-12:00:35.39840 {0.0} <4201> completed garbage collection
2025/11/01-12:00:35.11248 {0.0} <4201> completed garbage collection
2025/11/02-13:00:34.39202 {0.0} <4201> completed garbage collection
2025/11/03-13:00:34.70587 {0.0} <4201> completed garbage collection
2025/11/04-13:00:34.18799 {0.0} <4201> completed garbage collection
2025/11/05-13:00:34.44950 {0.0} <4201> completed garbage collection
 

GC failure messages can include, but are not limited to, the following:

2025/11/04-13:00:01.62234 {0.1} <4202> failed garbage collection with error MSG_ERR_DDR_ERROR
2025/11/01-12:35:06.62868 {0.2} <4202> failed garbage collection with error MSG_ERR_BACKUPSINPROGRESS
2025/10/13-12:20:07.35498 {0.7} <4202> failed garbage collection with error MSG_ERR_TRYAGAINLATER
2025/10/27-12:07:44.35485 {0.0} <4202> failed garbage collection with error MSG_ERR_DISKFULL
2025/11/02-13:16:39.72027 {0.1} <4202> failed garbage collection with error MSG_ERR_MISC
2025/11/02-13:16:39.72027 {0.1} <4202> failed garbage collection with error MSG_ERR_TIMEOUT
2025/11/02-13:16:39.72027 {0.1} <4202> failed garbage collection with error MSG_ERR_GARBAGECOLLECT
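
To quickly tally how many runs completed versus failed over the same 30-day window, the same log dump can be counted (a minimal sketch based on the "completed" and "failed" messages shown above):

dumpmaintlogs --types=gc --days=30 | grep -c "completed garbage collection"
dumpmaintlogs --types=gc --days=30 | grep -c "failed garbage collection"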
 

If GC has been failing, address this first using the following article as a reference: Avamar: Troubleshooting Garbage Collection (GC) Failures (Resolution Path)
(If any issues have already been resolved, go to the next step.)

 

Step 7. Is GC running long enough?

Warning: Do not confuse this with the "MSG_ERR_TIMEOUT" error from the GC results. That error is something else entirely and is addressed in the GC Error Resolution Path article. Here, "timing out" means that GC reached its maximum runtime but finished quietly and cleanly without any error. The information in this step helps confirm whether this is occurring.
 
 

a. Run the following command to check the maximum time allowed for GC:

dumpmaintlogs --types=gc --days=30 | grep gcflags 
 

Sample output:

2025/10/07-12:00:20.05509 {0.1} <gcflags gccount="0" gcmincount="0" kill="0" limitadjust="5" maxpass="0" maxtime="14400" refcheck="true" throttlelevel="0" usehistory="false" orphansfirst="false"/>
2025/10/08-12:00:20.09141 {0.1} <gcflags gccount="0" gcmincount="0" kill="0" limitadjust="5" maxpass="0" maxtime="14400" refcheck="true" throttlelevel="0" usehistory="false" orphansfirst="false"/>
2025/10/09-12:00:20.42307 {0.1} <gcflags gccount="0" gcmincount="0" kill="0" limitadjust="5" maxpass="0" maxtime="14400" refcheck="true" throttlelevel="0" usehistory="false" orphansfirst="false"/>
2025/10/10-12:00:20.47775 {0.1} <gcflags gccount="0" gcmincount="0" kill="0" limitadjust="5" maxpass="0" maxtime="14400" refcheck="true" throttlelevel="0" usehistory="false" orphansfirst="false"/>
...
2025/11/02-13:00:19.76100 {0.0} <gcflags gccount="0" gcmincount="0" kill="0" limitadjust="5" maxpass="0" maxtime="14400" refcheck="true" throttlelevel="0" usehistory="false" orphansfirst="false"/>
2025/11/03-13:00:19.92093 {0.0} <gcflags gccount="0" gcmincount="0" kill="0" limitadjust="5" maxpass="0" maxtime="14400" refcheck="true" throttlelevel="0" usehistory="false" orphansfirst="false"/>
2025/11/04-13:00:19.42781 {0.0} <gcflags gccount="0" gcmincount="0" kill="0" limitadjust="5" maxpass="0" maxtime="14400" refcheck="true" throttlelevel="0" usehistory="false" orphansfirst="false"/>
2025/11/05-13:00:19.74984 {0.0} <gcflags gccount="0" gcmincount="0" kill="0" limitadjust="5" maxpass="0" maxtime="14400" refcheck="true" throttlelevel="0" usehistory="false" orphansfirst="false"/>
 

Take note of the maxtime value, which in this example is 14400 (seconds).
(A value of 0 means unlimited)
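
To pull just the maxtime values from that output (a minimal filter on the gcflags lines shown above):

dumpmaintlogs --types=gc --days=30 | grep -o 'maxtime="[0-9]*"' | sort | uniq -c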

b. Run the following command to determine how long GC runs and how many "passes" complete:
(Passes relate to the layers of the stored backup data. Think of the GSAN Capacity like the layers of an onion: the outer layers must be peeled back or removed before the inner layers can be seen. Each pass is a layer of the GSAN stored data.)

dumpmaintlogs --types=gc --days=30 | grep passes | cut -d ' ' -f1,14-20
 

Sample output:

2025/10/07-12:00:35.24463 passes="24" start-time="1758283220" elapsed-time="250" end-time="1758283235"/>
2025/10/08-12:00:34.60779 passes="3" start-time="1758369620" elapsed-time="70" end-time="1758369627"/>
2025/10/09-12:00:35.14232 megabytes-recovered="1" passes="4" start-time="1758456020" elapsed-time="85" end-time="1758456028"/>
2025/10/10-12:00:34.67590 passes="3" start-time="1758542420" elapsed-time="72" end-time="1758542427"/>
...
2025/11/02-13:00:34.38348 megabytes-recovered="2" passes="18" start-time="1762088419" elapsed-time="89" end-time="1762088427"/>
2025/11/03-13:00:34.69743 passes="18" start-time="1762174819" elapsed-time="9" end-time="1762174828"/>
2025/11/04-13:00:34.17943 megabytes-recovered="8" passes="22" start-time="1762261219" elapsed-time="134" end-time="1762261228"/>
2025/11/05-13:00:34.44187 megabytes-recovered="2" passes="16" start-time="1762347619" elapsed-time="119" end-time="1762347628"/>

 

Take note of the number of passes and the elapsed-time (seconds).
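
The start-time and end-time values are UNIX epoch seconds. If a human-readable timestamp helps, GNU date on the utility node can convert them (using the first start-time from the sample output above as an example):

date -d @1758283220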

 

c. Assuming that the maxtime is nonzero, calculate 2/3 of maxtime and compare it to the elapsed time.
(In the example above, 2/3 of 14400 is 9600, and all elapsed-time values are well below this figure; a quick command-line calculation is shown after the list below.)

  • If the elapsed-time is less than 2/3 of maxtime, it is likely that GC finished early because there was nothing left to collect and is caught up.

  • If the number of passes is high (14 or more), it is likely that GC is removing sufficient amounts of data.
    Note: If no data expired and there is nothing to clean, a low number of passes is expected, so it is best to understand the entire situation and environment as well. Do not assume that a low number of passes means there is a problem.
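
If a quick arithmetic check is preferred, the comparison can be done with awk (the maxtime and elapsed-time values here are taken from the sample outputs above; substitute the values from the environment):

awk -v max=14400 -v elapsed=250 'BEGIN { t = max * 2 / 3; printf "2/3 of maxtime = %d; elapsed-time = %d; finished early = %s\n", t, elapsed, (elapsed < t ? "yes" : "no") }'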
 

Various issues can cause GC to run slowly or not scan everything. These include GC not having had enough time to run on previous days, incorrect configuration, errors, and more.

If there are concerns about the maxtime, or number of passes, create a Service Request with the Dell Technologies Avamar Support Team to investigate further.

 

Step 8. If it is suspected that GC did not remove enough data, or did not remove the expected data:

If it is confirmed that GC is running long enough, it is possible that data is not being collected for reasons outside of Garbage Collection's control. The following documented reasons should generally be checked:

a. Verify that backups are configured to expire eventually or regularly. If backups are not expiring frequently, GC does not have much work to do.

b. Use this article to find the "Top Change Rate" Clients: Avamar: How to manage capacity with the capacity.sh script. (Review both the "% OF TOTAL" and "CHGRATE".)

c. Check for skipped hashes per Avamar: Avamar Garbage Collection reports "skipped-hashes" that cannot be cleaned up. If these occur but are rare, this is normal and this item can be skipped.

d. There is a flag or option which forces the Avamar server to keep the last and most recent backup from every client. This is used for safety purposes so that a client does not have every backup accidentally expired. However, this can cause other issues when it comes to data cleanup and Garbage Collection. The Dell Technologies Avamar Support team can confirm if this is enabled.

e. If backups were recently switched from GSAN to DD backend or there was an accidental GSAN backup, but the GSAN Capacity does not decrease, create a Service Request with the Dell Technologies Avamar Support Team to investigate further.

 

Step 9. The Avamar grid is undersized for the amount of current or expected data to be added:

Once all other solutions and possible causes of high capacity have been reviewed, and this is not a configuration issue or an issue with accidental data, the grid is likely undersized.

This means data may require deletion, or options may need to be explored such as migrating certain clients to other Avamar grids, adding data nodes, and so forth.

 

Step 10. Acknowledge any capacity events and resume the backup scheduler if required:

a. Once capacity issues are addressed, acknowledge all capacity-related events in the Avamar Admin UI.

b. Resume the backup scheduler:

dpnctl start sched
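
To confirm the scheduler has resumed, the overall service status can be checked (the exact output wording can vary by Avamar version):

dpnctl status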
 

For any other Avamar Capacity questions, training, troubleshooting, and more, see: Avamar: Capacity Troubleshooting, Issues, and Questions - All Capacity (Resolution Path)

Additional Information

Manual or "Aggressive" Garbage Collection is not recommended.
(This is a reference to running GC out of the scheduled automatic times.)
  • This action can "mask" and hide the real issues, only for them to reappear a few days or weeks later, making the manual job wasted time.
  • Additionally, a manual GC might not run as efficiently because it runs outside of the schedule.
 
The resolution steps above do not mention or recommend changing the maximum disk and capacity settings specific to the GSAN Capacity at all.
  • This change or action is generally not performed and should not be considered by default. An Avamar L2 engineer or Subject Matter Expert (SME) must approve this change.
  • Unfortunately, such actions can often cause permanent damage to an Avamar grid in various ways that can only be resolved by adding additional storage nodes or redeployment.
 

Understand that neither of the actions listed above is performed, because the support team wants to resolve the Capacity issues in the most beneficial way.

Affected Products

Avamar

Products

Avamar, Avamar Server
Article Properties
Article Number: 000164132
Article Type: Solution
Last Modified: 07 Nov 2025
Version:  10