4 Posts
1
February 1st, 2011 13:00
Aggressive Garbage Collection
Hello guys, one question: how can I perform an aggressive garbage collection? This is for testing purposes only. I have a single-node Gen 2 system in demo mode, and the node reports "<4202> failed garbage collection with error MSG_ERR_KILLED". I know that if the operating system capacity on any data partition of any node exceeds the configured threshold, a garbage collection operation does not run and logs "<4202> failed garbage collection with error MSG_ERR_KILLED". Any idea how to resolve this error?
best regards.


rpervan
266 Posts
0
February 3rd, 2011 01:00
Hello Ferom,
> Do you have any idea for this issue? Thanks in advance.
Since data01 is at 93% usage ("93% /data01"), you will need to temporarily tune your Avamar so that GC can run.
1.) To see the current settings:
# avmaint config --ava | grep disk
2.) TEMPORARILY change "disknogc" to the value 93:
# avmaint config disknogc=93 --ava
Do not forget to set it back to the default value of 85 after GC has finished!
3.) Start GC from the MCS and monitor it with "status.dpn".
4.) You may also need to rebalance your storage nodes, so please paste the output of "status.dpn" here and I will check the node balancing status.
If you are not familiar with this, you could also open an SR with EMC technical support.
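The tuning steps above depend on knowing which data partition is over the disknogc threshold. As a rough illustration only (this is a hypothetical helper, not an Avamar tool, and whether GC is blocked at or strictly above the threshold is an assumption), the `df -h` output can be checked against the disknogc value like this:

```python
# Hypothetical helper (not an Avamar tool): parse `df -h` output and flag
# /dataNN partitions whose usage exceeds the disknogc threshold, i.e. the
# partitions assumed to block garbage collection.

def partitions_blocking_gc(df_output, disknogc=85):
    """Return a list of (mount, use_pct) for data partitions over disknogc."""
    blocking = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        use_pct, mount = int(fields[4].rstrip("%")), fields[5]
        if mount.startswith("/data") and use_pct > disknogc:
            blocking.append((mount, use_pct))
    return blocking

DF = """\
Filesystem Size Used Avail Use% Mounted on
/dev/sda9 321G 296G 26G 93% /data01
/dev/sdb1 344G 286G 58G 84% /data02
"""

print(partitions_blocking_gc(DF))               # [('/data01', 93)]
print(partitions_blocking_gc(DF, disknogc=93))  # [] after the temporary tune
```

This mirrors the fix above: at the default of 85, /data01 at 93% blocks GC; raised temporarily to 93, nothing blocks.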
regards,
rpervan
266 Posts
0
February 1st, 2011 14:00
Hello,
Just search Powerlink for esg98090.
Rej
Avamar Exorcist
462 Posts
1
February 2nd, 2011 01:00
As for why garbage collection failed, it's best to take a look at the logs directly (see esg108593).
The message typically seen when GC fails because the physical partitions are more than 85% full is "MSG_ERR_DISKFULL", so it seems you may have a different issue in this case.
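To tell the two failure modes apart quickly, the <4202> lines can be pulled out of the log. A minimal sketch, assuming the log line format quoted in this thread:

```python
import re

# Hypothetical log-triage helper: pull the error name out of <4202>
# "failed garbage collection" lines so MSG_ERR_KILLED and MSG_ERR_DISKFULL
# cases can be told apart at a glance. The log format is assumed from the
# lines quoted in this thread.

GC_FAIL = re.compile(r"<4202> failed garbage collection with error (\w+)")

def gc_failures(log_text):
    return GC_FAIL.findall(log_text)

LOG = """\
2010/12/14-23:04:21.71097 {0.0} <4202> failed garbage collection with error MSG_ERR_KILLED
2011/02/01-19:17:26.17649 {0.0} <4200> starting garbage collection
2011/02/01-19:17:52.63220 {0.0} <4202> failed garbage collection with error MSG_ERR_DISKFULL
"""

print(gc_failures(LOG))  # ['MSG_ERR_KILLED', 'MSG_ERR_DISKFULL']
```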
Avamar Exorcist
462 Posts
0
February 2nd, 2011 01:00
esg98090 is deleted.
I understand this question is for a test environment but, for the record, if you have an Avamar system which is completely full, it can be corrected as follows:
Avamar version 4:
Contact the Support team to have them take care of this for you.
Avamar version 5.x onwards
Increase the blackout window to be as large as possible. No 'rocket science' is required; no other parameter settings need to be changed.
In both cases, to understand how best to approach Avamar capacity management, and to avoid any need for ad-hoc procedures to bring server utilisation down from a too-high level, make sure to read the soon-to-be-published KB article esg118578.
Ferrom1
4 Posts
0
February 2nd, 2011 08:00
Hi Nicholas, you are right; the details are listed below:
2010/12/15-04:31:12.73633 {0.0} <4201> completed garbage collection
2010/12/15-04:32:09.96915 {0.0} <4200> starting garbage collection
2010/12/15-04:32:10.05426 {0.0}
2010/12/15-04:32:10.05427 {0.0}
2010/12/14-23:04:21.71097 {0.0} <4202> failed garbage collection with error MSG_ERR_KILLED
2011/02/01-19:17:26.17649 {0.0} <4200> starting garbage collection
2011/02/01-19:17:26.17743 {0.0}
2011/02/01-19:17:26.17744 {0.0}
2011/02/01-19:17:52.63220 {0.0} <4202> failed garbage collection with error MSG_ERR_DISKFULL
(0.0) ssh -x admin@10.64.4.78 'df -h'
Filesystem Size Used Avail Use% Mounted on
/dev/sda5 7.9G 3.4G 4.2G 45% /
/dev/sda1 122M 14M 103M 12% /boot
/dev/sda9 321G 296G 26G 93% /data01
/dev/sdb1 344G 286G 58G 84% /data02
/dev/sdc1 344G 286G 58G 84% /data03
/dev/sdd1 344G 286G 59G 84% /data04
none 2.0G 0 2.0G 0% /dev/shm
/dev/sda8 1.5G 106M 1.3G 8% /var
admin@avamar01:/usr/local/avamar/var/cron/#: avmaint nodelist | grep percent-full
percent-full="57.5"
fs-percent-full="92.2"
percent-full="53.7"
fs-percent-full="83.1"
percent-full="53.7"
fs-percent-full="83.2"
percent-full="53.7"
fs-percent-full="83.1"
Do you have any idea for this issue? Thanks in advance.
Avamar Exorcist
462 Posts
2
February 4th, 2011 00:00
If GC fails with MSG_ERR_DISKFULL, the first thing to check is the output of cplist, to confirm that checkpoints are validating and to see whether too many checkpoints are being stored.
If all is well with the checkpoints, then the system is running too full, the data change rate on the system is too high, or both.
The checkpoint overhead situation will need to be fixed first so that the data partitions become low enough to allow GC to run (without any tuning).
If you are seeing a problem with checkpoint validation open a support ticket immediately.
Here's an extract from esg118578.
Actions to investigate and help alleviate high OS capacity
1. Find out when the last successful HFScheck completed.
To do this, use either Avamar Administrator or the command line on the Avamar utility node.
In Avamar Administrator, go to the Server > Checkpoint Management tab.
Check the most recent date and time listed in the "Checkpoint Validation" column. This should have occurred within the last 24 hours.
Or, using the Avamar utility node command line:
Run the command 'cplist'
Below is an example of the output. The most recent validated checkpoint listed here is the one dated January 14th 11:14. We can identify it by the flag directly after the 'valid' marker. Depending on the types of HFSchecks set on the system, the flag could be 'rol' or 'hfs'. Here we have 'rol' (rolling HFScheck):
admin@avamar:~/>: cplist
cp.20110114111419 Fri Jan 14 11:14:19 2011 valid rol --- nodes 3/3 stripes 1131
cp.20110114194457 Fri Jan 14 19:44:57 2011 valid --- --- nodes 3/3 stripes 1131
If the results show that the latest validated checkpoint is older than 24 hours we need to find out why.
This could either be because the HFScheck did not run or because it failed.
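As an illustration of the check above, here is a hypothetical cplist parser (the 'rol'/'hfs' flag convention and the cp.YYYYMMDDHHMMSS timestamp format are assumed from the example output): it finds the newest validated checkpoint and flags it when it is older than 24 hours.

```python
from datetime import datetime, timedelta

# Hypothetical cplist parser (illustration only). A checkpoint counts as
# validated when the flag after 'valid' is 'rol' or 'hfs'; its timestamp is
# recovered from the cp.YYYYMMDDHHMMSS tag at the start of the line.

def latest_validated(cplist_output):
    newest = None
    for line in cplist_output.strip().splitlines():
        fields = line.split()
        if "valid" in fields:
            flag = fields[fields.index("valid") + 1]
            if flag in ("rol", "hfs"):
                ts = datetime.strptime(fields[0][3:], "%Y%m%d%H%M%S")
                if newest is None or ts > newest:
                    newest = ts
    return newest

CPLIST = """\
cp.20110114111419 Fri Jan 14 11:14:19 2011 valid rol --- nodes 3/3 stripes 1131
cp.20110114194457 Fri Jan 14 19:44:57 2011 valid --- --- nodes 3/3 stripes 1131
"""

ts = latest_validated(CPLIST)
print(ts)  # 2011-01-14 11:14:19 (the second checkpoint has no rol/hfs flag)
stale = datetime.now() - ts > timedelta(hours=24)
print("investigate" if stale else "ok")
```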
2. Confirm if HFScheck ran or if it failed
On the Avamar utility node, run 'status.dpn' and find the line which contains "Last hfscheck"
Example:-
Last hfscheck: finished Sat Jan 15 11:07:17 2011 after 06m 41s >> checked 528 of 528 stripes (OK)
Make a note of when it finished and what the status was (in the line above, the status is shown as 'OK').
Note: the 'sched.sh' script can also be used to identify when a HFScheck last ran and whether it was successful
If HFScheck jobs have been failing this should be investigated immediately.
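A small sketch of reading that status line programmatically (the exact line layout is assumed from the example above; this is not an official tool):

```python
import re

# Hypothetical parser for the "Last hfscheck" line of status.dpn output,
# returning the finish timestamp and the status shown in parentheses.
# The line layout is assumed from the example quoted above.

LINE_RE = re.compile(
    r"Last hfscheck:\s+finished\s+(.+?)\s+after\s+.*\((\w+)\)$"
)

def parse_last_hfscheck(line):
    m = LINE_RE.search(line)
    return (m.group(1), m.group(2)) if m else None

line = ("Last hfscheck: finished Sat Jan 15 11:07:17 2011 "
        "after 06m 41s >> checked 528 of 528 stripes (OK)")
print(parse_last_hfscheck(line))
# ('Sat Jan 15 11:07:17 2011', 'OK')
```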
If HFScheck has not run lately, confirm whether maintenance tasks are enabled. Via the command line interface on the Avamar utility node, enter 'dpnctl status':
admin@avamar:~/>: dpnctl status
Identity added: /home/admin/.ssh/dpnid (/home/admin/.ssh/dpnid)
dpnctl: INFO: gsan status: ready
dpnctl: INFO: MCS status: up.
dpnctl: INFO: EMS status: up.
dpnctl: INFO: Backup scheduler status: up.
dpnctl: INFO: dtlt status: up.
dpnctl: INFO: Maintenance windows scheduler status: enabled.
dpnctl: INFO: Maintenance cron jobs status: enabled.
dpnctl: INFO: Unattended startup status: disabled.
If the maintenance windows scheduler is 'disabled', it can be enabled with the command:
dpnctl start maint
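As a sketch, the 'dpnctl status' output can be scanned for maintenance-related schedulers that are disabled (output format assumed from the example above; this is a hypothetical helper, not part of dpnctl):

```python
# Hypothetical check: scan `dpnctl status` output and report which
# maintenance-related schedulers are disabled. The "dpnctl: INFO: ...
# status: ..." line format is assumed from the example above.

def disabled_maintenance(dpnctl_output):
    disabled = []
    for line in dpnctl_output.splitlines():
        if "Maintenance" in line and "disabled" in line:
            name = line.split("dpnctl: INFO:")[-1].split("status:")[0].strip()
            disabled.append(name)
    return disabled

STATUS = """\
dpnctl: INFO: gsan status: ready
dpnctl: INFO: Maintenance windows scheduler status: disabled.
dpnctl: INFO: Maintenance cron jobs status: enabled.
"""

print(disabled_maintenance(STATUS))  # ['Maintenance windows scheduler']
```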
Once HFScheck completes successfully and the oldest checkpoint is 'rolled off' the system, OS capacity should reduce considerably.
If OS capacity is still too high and garbage collection continues to fail with the "MSG_ERR_DISKFULL" message, then EMC Support's assistance may be required.
Otherwise, if OS capacity is low enough to allow garbage collection to run, proceed to work on lowering "User Capacity" and so bring the "server utilization" figure down.
ionthegeek
2 Intern
•
2K Posts
1
January 2nd, 2014 05:00
The diskreadonly value may not be changed. Changing diskreadonly will put you out of compliance with your license and endanger the health of your system. Under very specific circumstances, support may increase this value temporarily but this is never something that should be done by customers or partners.
Misho4
4 Posts
0
January 2nd, 2014 05:00
Hi,
What will be the effect of increasing the parameter diskreadonly=65?
Is it safe to change it to 70?
Running Avamar 7
Thanks
pawankumawat
355 Posts
0
January 2nd, 2014 05:00
Hello,
diskreadonly=PERCENT
Sets the percentage (PERCENT) of full server storage capacity that triggers conversion of the server from a fully writable condition to read-only. The default setting is 65%.
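As an illustration only (whether the trigger fires at exactly the threshold or just above it is an assumption here), the read-only behaviour the parameter controls can be sketched as:

```python
# Illustrative sketch of the diskreadonly threshold: once server storage
# utilization reaches the configured percentage, the server is assumed to
# flip from read-write to read-only. The >= trigger point is an assumption.

def server_mode(utilization_pct, diskreadonly=65):
    return "read-only" if utilization_pct >= diskreadonly else "read-write"

print(server_mode(60))                    # read-write
print(server_mode(65))                    # read-only at the default of 65
print(server_mode(66, diskreadonly=70))   # read-write if raised to 70
```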
Why do you want to change it to 70% ?
Regards,
Pawan
Misho4
4 Posts
0
January 2nd, 2014 08:00
We are running out of space (utilization 85%). I've read somewhere that increasing the diskreadonly value will let you use more of your space, but I wouldn't change it if that would endanger the health of the system.
Misho4
4 Posts
0
January 2nd, 2014 22:00
Guys,
What is the meaning of the number under the #BU column when executing capacity.sh? I've got the number 84 most of the time for the last month.
pawankumawat
355 Posts
1
January 3rd, 2014 01:00
Hello,
Please go through the following document, which describes each column of the capacity.sh script output:
How to use capacity.sh script to analyze daily data changes on the Avamar system
Regards,
Pawan
Misho4
4 Posts
0
January 3rd, 2014 11:00
Thanks a lot for the info!
BU=backup
so logical