Avamar: How to Set the Avamar Server into a Known Controlled State

Summary: This article explains how to set an Avamar server into a Known Controlled State.


Symptoms

An Avamar server runs many automated tasks that can interfere with troubleshooting and resolution efforts.

This procedure documents how to put Avamar into what is called a "Known Controlled State" to prevent unexpected or undesirable activity from occurring.

The checklist below should be followed when performing manual and advanced maintenance tasks.

For example:
  • Rebuilding stripes
  • Rebuilding nodes
  • Restarting offline nodes
 

This is a checklist and is not a guide on how to perform the operations or interpret the results.

This is NOT a health check solution. It assumes that the Avamar server is unhealthy.

Cause

Manual or advanced maintenance tasks must be run on an Avamar server.

Resolution

Caution: If used incorrectly, some of the commands referenced below can cause data integrity issues or data loss. If a command or the consequences of running it are not understood, seek assistance from Dell Support or your local Dell Partner representative.    
 

Prerequisites:

 
  • Some commands may not return any output or take effect immediately. It should not be necessary to run a command more than once.
 
  • For any "avmaint config --ava" commands, always confirm that the change has taken effect by running the command:
avmaint config --ava | grep <setting>
 
  • Manual maintenance should be avoided. Allow the maintenance scheduler to manage the maintenance tasks whenever possible.
 
  • BEFORE working on any grid, use the following commands to answer these questions and understand the general status of the Avamar server:

a. How old is the last validated checkpoint?

cplist
 

b. Are all the stripes online? Are all the nodes online?

status.dpn
 

c. Are all services up?

dpnctl status
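To put a number on question a, the age of a checkpoint can be computed from its name. The following is a minimal sketch, not a Dell-provided tool: the function name cp_age_hours is hypothetical, and it ASSUMES (this article does not state it) that the checkpoint names shown by cplist take the form cp.YYYYMMDDhhmmss in the server's local time. It needs GNU date, and it does not decide which checkpoint is validated; pass it the name of the last validated checkpoint identified in the cplist output.

```shell
#!/bin/sh
# cp_age_hours (hypothetical helper): print the age, in whole hours, of a
# checkpoint. ASSUMPTION: names look like cp.YYYYMMDDhhmmss (local time).
# $1 = checkpoint name, $2 = optional "now" in epoch seconds (for testing).
# Requires GNU date for "date -d".
cp_age_hours() {
  cp_tag=${1#cp.}
  cp_now=${2:-$(date +%s)}
  # Reject anything that is not exactly 14 digits after the cp. prefix.
  if ! printf '%s\n' "$cp_tag" | grep -Eq '^[0-9]{14}$'; then
    echo "unexpected checkpoint name: $1" >&2
    return 1
  fi
  # Turn YYYYMMDDhhmmss into "YYYY-MM-DD hh:mm:ss" for date -d.
  cp_stamp=$(printf '%s\n' "$cp_tag" |
    sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/')
  cp_epoch=$(date -d "$cp_stamp" +%s) || return 1
  echo $(( (cp_now - cp_epoch) / 3600 ))
}
```

For example, cp_age_hours cp.20250806070006 prints the number of whole hours since 07:00:06 on 6 Aug 2025 (the checkpoint name is illustrative only).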
 

Procedure:

1. Suspend the checkpoint and checkpoint validation (HFScheck) maintenance activities so that they do not start again if they have to be stopped in the next step:

avmaint sched suspend cp --ava
avmaint sched suspend hfscheck --ava
Note: Do not suspend garbage collection (GC) as it may result in an unnecessary rollback.
 
 

2. Run the "status.dpn" command and check for running maintenance activities. If no maintenance activities are running, continue with step 3.

  • If a checkpoint is running, let it complete.
 
  • If HFScheck is running, it can be stopped if the server is not in admin mode (the waitcgsan phase), or the HFScheck has almost completed:
avmaint hfscheckstop --ava
 
  • If a GC is running, it can be killed:
avmaint gckill --ava

The GC may take some time to stop because the current pass must finish first.
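The decisions in step 2 can be written down as a small lookup, which may help when working through several grids. This is a hypothetical sketch (next_maint_action is not an Avamar command): it only prints the command or action the checklist calls for and never runs anything. The running activity still has to be read from the status.dpn output by hand, and the helper encodes only the admin-mode (waitcgsan) condition for HFScheck; whether an HFScheck has almost completed must still be judged manually.

```shell
#!/bin/sh
# next_maint_action (hypothetical helper): print what step 2 calls for.
# $1 = running activity read from status.dpn: cp, hfscheck, gc or none
# $2 = optional HFScheck phase, for example waitcgsan
# Prints only; it never runs avmaint.
next_maint_action() {
  case $1 in
    cp)
      echo "wait: let the checkpoint complete" ;;
    hfscheck)
      if [ "${2:-}" = waitcgsan ]; then
        echo "wait: server is in admin mode (waitcgsan phase)"
      else
        echo "avmaint hfscheckstop --ava"
      fi ;;
    gc)
      echo "avmaint gckill --ava" ;;
    none)
      echo "continue with step 3" ;;
    *)
      echo "unknown activity: $1" >&2
      return 1 ;;
  esac
}
```

For example, next_maint_action gc prints the gckill command from step 2 for the operator to review and run.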

3. When no maintenance activities are running, stop the maintenance scheduler:

dpnctl stop maint
 

4. Stop the backup scheduler, stop running backups or restores, and suspend new connections:

a. Prevent the Management Console Server (MCS) from starting any new backups:

dpnctl stop sched
 

b. List all the running sessions (backup and replication):

avmaint sessions --ava | grep sessionid
  • If a restore other than replication is running, consult with the customer to determine if the restore should be allowed to complete.
  • If backups or replication are running, consult with the customer to verify that they can be cancelled.
 

c. Once approval to kill the backups has been received, cancel them in the UI (partial backups are created) or terminate them uncleanly with the kill command:

avmaint kill --waittime=0 <sessionid>
 

d. List any additional backups not listed by the previous command:

avmaint sessions --full
 

e. Depending on the operation to be performed, it may not be acceptable for any backup-type operations to run (that is, avmaint getrefby). In that case, suspend the dispatchers so that no manual backups can start:

avmaint suspend
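For sub-steps b and c, the session IDs can be pulled out of the session listing and turned into kill commands for review. This is a hypothetical sketch: session_kill_preview is not an Avamar command, and it ASSUMES that each ID appears in the avmaint sessions output as sessionid="<id>", which this article does not show. It prints the commands instead of running them, so nothing is killed before the customer has approved it.

```shell
#!/bin/sh
# session_kill_preview (hypothetical helper): read the output of
#   avmaint sessions --ava | grep sessionid
# on stdin and PRINT (not run) one kill command per session ID.
# ASSUMPTION: IDs appear as sessionid="<id>", at most one per line.
session_kill_preview() {
  sed -n 's/.*sessionid="\([^"]*\)".*/\1/p' |
    while IFS= read -r sid; do
      printf 'avmaint kill --waittime=0 %s\n' "$sid"
    done
}
```

Usage: avmaint sessions --ava | grep sessionid | session_kill_preview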
 

5. Stop replication (replication source or target).

  • For replication source (restore), use the UI or kill the process to stop replication.
  • For replication target (backup), verify if replication is running: 
avmaint sessions | grep path
 

Look for the /REPLICATE domain in the path. Stop replication on the source grid using the methods in step 4a.
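On the replication target, the check above amounts to filtering the session paths for the /REPLICATE domain. A minimal sketch with the hypothetical name replication_check; it only reads text, and it assumes nothing about the avmaint sessions output beyond a replication path containing /REPLICATE.

```shell
#!/bin/sh
# replication_check (hypothetical helper): read the output of
#   avmaint sessions | grep path
# on stdin. Prints any lines whose path contains the /REPLICATE domain and
# returns 0 if at least one was found, 1 otherwise.
replication_check() {
  rep_lines=$(grep -F '/REPLICATE' || true)
  if [ -n "$rep_lines" ]; then
    printf '%s\n' "$rep_lines"
    echo "replication is running: stop it on the source grid"
    return 0
  fi
  echo "no /REPLICATE sessions found"
  return 1
}
```

Usage: avmaint sessions | grep path | replication_check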

6. Disable crunching with the command:

avmaint config --ava asynccrunching=false

Crunching may take up to 15 minutes to stop. Check the GSAN logs on the data nodes for any "crunch" messages.
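The crunching change can be checked in two ways: the setting should read back as false (the prerequisite check for any avmaint config --ava change), and crunch messages should stop appearing in the logs. The sketch below is hypothetical: config_value and count_crunch_lines are not Avamar commands, and config_value ASSUMES that avmaint config --ava prints each setting as name="value"; adjust the pattern if the output differs.

```shell
#!/bin/sh
# config_value (hypothetical helper): read "avmaint config --ava" output on
# stdin and print the value of setting $1.
# ASSUMPTION: settings appear as name="value".
config_value() {
  # The leading space lets the name match at the start of a line, while the
  # [^[:alnum:]_] guard stops "asynccrunching" matching "xasynccrunching".
  sed 's/^/ /' |
    sed -n "s/.*[^[:alnum:]_]$1=\"\([^\"]*\)\".*/\1/p" |
    head -n 1
}

# count_crunch_lines (hypothetical helper): read GSAN log text on stdin and
# print how many lines mention "crunch", ignoring case.
count_crunch_lines() {
  grep -ci 'crunch' || true
}
```

Usage (the 200-line window is arbitrary):
avmaint config --ava | config_value asynccrunching
mapall --noerror 'tail -n 200 /data01/cur/gsan.log' | count_crunch_lines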

7. Disable balancing if it is enabled:

a. Check whether balancing is enabled:

avmaint config --ava | grep balancemin

Balancing is not normally enabled (enabled means any value other than 0), but it is still important to check.

b. If the value is anything other than 0, disable balancing:

avmaint config --ava balancemin=0

Balancing may take up to 15 minutes to stop. To confirm that balancing has finished, run status.dpn and check that no stripes are migrating.
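Sub-steps a and b reduce to one question: is balancemin anything other than 0? The sketch below answers it from the avmaint config --ava output. As with the other examples, balancing_enabled is a hypothetical name, and the name="value" output format is an ASSUMPTION, not something this article shows. The helper only reports; the change itself is still made with the command in sub-step b.

```shell
#!/bin/sh
# balancing_enabled (hypothetical helper): read "avmaint config --ava" output
# on stdin. Returns 0 if balancemin is anything other than 0 (balancing
# enabled), 1 if it is 0, and 2 if the setting was not found.
# ASSUMPTION: the setting appears as balancemin="<value>".
balancing_enabled() {
  bal=$(sed 's/^/ /' |
    sed -n 's/.*[^[:alnum:]_]balancemin="\([^"]*\)".*/\1/p' |
    head -n 1)
  if [ -z "$bal" ]; then
    echo "balancemin not found" >&2
    return 2
  fi
  echo "balancemin=$bal"
  [ "$bal" != 0 ]
}
```

Usage: avmaint config --ava | balancing_enabled && echo "balancing is enabled: see sub-step b"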

8. Check the capacity:

a. Check the OS capacity: 

avmaint nodelist --ava | grep fs-perc
 

b. Check the size of the checkpoints:

mapall --noerror './cps'
 

If "cps" does not exist on the data nodes, run the following on the utility node with keys loaded:   

cp /usr/local/avamar/bin/cps .
mapall copy ~/cps
mapall --noerror './cps'
 

c. Check the checkpoint listing to determine which checkpoints MUST be kept (the last validated checkpoint on all nodes):

cplist
 

Based on these results, determine how many checkpoints can be kept, which ones are critical, and how many new checkpoints can be created.

For information about capacity, see the article Avamar: Capacity Management Concepts and Training.
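For sub-step a, the fs-perc lines can be screened against a limit you choose. This is a hypothetical sketch: fs_over_threshold is not an Avamar command, it ASSUMES that the first number after "fs-perc" on each line is the percentage, and because this article gives no threshold, none is suggested here; take the limit from the capacity guidance referenced above.

```shell
#!/bin/sh
# fs_over_threshold (hypothetical helper): read the output of
#   avmaint nodelist --ava | grep fs-perc
# on stdin and print each line whose value exceeds the limit $1.
# ASSUMPTION: the first number after "fs-perc" on a line is the percentage.
# Returns 0 if any line exceeded the limit, 1 otherwise (like grep).
fs_over_threshold() {
  awk -v limit="$1" '
    {
      pos = index($0, "fs-perc")
      if (pos == 0) next
      rest = substr($0, pos)
      if (match(rest, /[0-9]+(\.[0-9]+)?/)) {
        value = substr(rest, RSTART, RLENGTH) + 0
        if (value > limit + 0) { print; over = 1 }
      }
    }
    END { exit (over ? 0 : 1) }'
}
```

Usage: avmaint nodelist --ava | grep fs-perc | fs_over_threshold <your-limit>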

9. Note the order of the nodes and the differences between logical node numbers and physical node numbers:

nodenumbers
status.dpn
mapall --noerror 'tail -2 /data01/cur/gsan.log'
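When many nodes are involved, it can help to list the logical node labels that mapall prints next to the address each one maps to, and then compare that list with the nodenumbers output. A minimal sketch, assuming every per-node header looks like the ones in the uptime example later in this article, for instance (0.0) ssh  -x  admin@10.xx.xx.xxx 'uptime'; mapall_nodes is a hypothetical name.

```shell
#!/bin/sh
# mapall_nodes (hypothetical helper): read mapall output on stdin and print
# "<logical node> <user@address>" for each per-node header line.
# ASSUMPTION: headers look like:  (0.0) ssh  -x  admin@10.xx.xx.xxx 'uptime'
mapall_nodes() {
  awk '/^\([0-9]+\.[0-9]+\) ssh / {
    node = $1
    gsub(/[()]/, "", node)          # "(0.0)" -> "0.0"
    for (i = 2; i <= NF; i++)
      if ($i ~ /@/) { print node, $i; break }
  }'
}
```

Usage: mapall --noerror 'uptime' | mapall_nodes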
 

10. Review the hardware.

 

All automated tasks should now be stopped. 

The server should have little activity, and it should be safe to proceed with any manual tasks or commands. 

Confirm this by checking the GSAN logs across the data nodes and verifying that little or no activity is being logged.

The uptime command can also be run with mapall to check that the load average across the data nodes is low (between 0.01 and 0.05).

For example:

mapall --noerror 'uptime' 
(0.0) ssh  -x  admin@10.xx.xx.xxx 'uptime'  
  16:39:29 up 100 days,  6:39,  0 users,  load average: 0.01, 0.02, 0.01 
(0.1) ssh  -x  admin@10.xx.xx.xxx 'uptime'  
  16:39:29 up 100 days,  6:39,  0 users,  load average: 0.02, 0.01, 0.01 
(0.2) ssh  -x  admin@10.xx.xx.xxx 'uptime'  
  16:39:29 up 100 days,  6:39,  0 users,  load average: 0.02, 0.01, 0.01
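Reading the load averages by eye works for a few nodes; on larger grids the same check can be scripted. In this hypothetical sketch, high_load_nodes parses mapall output shaped like the example above, prints each node whose 1-minute load average is above a limit you pass in (this article describes 0.01 to 0.05 as low), and, like grep, returns 0 only if it found such a node.

```shell
#!/bin/sh
# high_load_nodes (hypothetical helper): read "mapall --noerror 'uptime'"
# output on stdin and print "<node label> <load averages>" for every node
# whose 1-minute load average is above the limit $1.
# Returns 0 if any node was above the limit, 1 if none were (like grep).
high_load_nodes() {
  awk -v limit="$1" '
    /^\([0-9]+\.[0-9]+\) ssh / { node = $1; next }   # per-node header line
    /load average:/ {
      split($0, parts, "load average:")
      split(parts[2], loads, ",")                     # loads[1] = 1-minute
      if (loads[1] + 0 > limit + 0) { print node, parts[2]; high = 1 }
    }
    END { exit (high ? 0 : 1) }'
}
```

Usage: mapall --noerror 'uptime' | high_load_nodes 0.05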
 
 
Note: The steps above set the server to a nonproduction state. Always revert the changes once all the manual tasks have been completed.
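Reverting is easier if the original values of the settings changed in steps 6 and 7 are recorded before they are changed. A hypothetical sketch: save_settings is not an Avamar command, it ASSUMES the name="value" format for avmaint config --ava output, and the file name in the usage line is only an example.

```shell
#!/bin/sh
# save_settings (hypothetical helper): read "avmaint config --ava" output on
# stdin and print name=value for each setting named in the arguments, so the
# original values can be kept for the revert.
# ASSUMPTION: settings appear as name="value".
save_settings() {
  cfg=$(sed 's/^/ /')                # leading space: match at line start too
  for name in "$@"; do
    val=$(printf '%s\n' "$cfg" |
      sed -n "s/.*[^[:alnum:]_]$name=\"\([^\"]*\)\".*/\1/p" | head -n 1)
    printf '%s=%s\n' "$name" "${val:-<not found>}"
  done
}
```

Usage, before step 6: avmaint config --ava | save_settings asynccrunching balancemin > ~/settings_before.txt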

Affected Products

Avamar, Avamar Server
Article Properties
Article Number: 000170876
Article Type: Solution
Last Modified: 06 Aug 2025
Version:  15