
4 Posts


February 1st, 2011 13:00

Aggressive Garbage Collection

Hello guys, one question: how can I perform an aggressive garbage collection? This is for testing purposes only. I have a single-node Gen2 in demo mode, and this node reports <4202> failed garbage collection with error MSG_ERR_KILLED. I know that if the operating system capacity on any data partition of any node exceeds the configured threshold, the garbage collect operation does not run and reports <4202> failed garbage collection with error MSG_ERR_KILLED. Any idea how to resolve this error?

best regards.

266 Posts

February 3rd, 2011 01:00

Hello Ferom,

> Do you have any idea for this issue? Thanks in advance.
Since /data01 is at 93% usage ("93% /data01"), you need to tune your Avamar temporarily to get GC into an active state.

1.)   To check the current settings:
# avmaint config --ava | grep disk

2.) TEMPORARILY change "disknogc" to 93:
  # avmaint config disknogc=93  --ava

So do not forget to set it back to the default value of 85 after GC has finished!

3.) Start GC from the MCS and monitor it with "status.dpn".

4.) You may also need to rebalance your storage node (SN), so please paste the output of "status.dpn" and I will check the SN balancing status.
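The threshold check behind "disknogc" can be approximated as follows. This is only a sketch: the `check_disknogc` function name is my own, and the sample `df` lines are the ones posted later in this thread; on a live node you would pipe `df -h` in directly.

```shell
# List any /dataNN partition whose filesystem usage exceeds the given
# disknogc-style threshold (default on Avamar is 85%). GC will not run
# while any data partition is above that threshold.
check_disknogc() {
  threshold="$1"
  awk -v t="$threshold" '$6 ~ /^\/data/ { gsub("%","",$5); if ($5+0 > t) print $6, $5"%" }'
}

# Sample df output taken from this thread:
df_sample='/dev/sda9             321G  296G   26G  93% /data01
/dev/sdb1             344G  286G   58G  84% /data02
/dev/sdc1             344G  286G   58G  84% /data03
/dev/sdd1             344G  286G   59G  84% /data04'

echo "$df_sample" | check_disknogc 85   # prints: /data01 93%
```

With the threshold raised to 93, nothing is printed and GC would be allowed to run, which is exactly what the temporary tuning above achieves.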

If you are not familiar with this, you could open an SR with EMC technical support.
regards,

266 Posts

February 1st, 2011 14:00

Hello,

Just search Powerlink for esg98090.

Rej

February 2nd, 2011 01:00

As for why garbage collection failed, it's best to take a look at the logs directly (see esg108593).

The message typically seen when GC fails because the physical partitions are more than 85% full is "MSG_ERR_DISKFULL", so it seems you may have a different issue in this case.

February 2nd, 2011 01:00

esg98090 has been deleted.

I understand this question is for a test environment but, for the record, if you have an Avamar system which is completely full it can be corrected as follows:

Avamar version 4:

Contact the Support team to have them take care of this for you.

Avamar version 5.x onwards

Increase the blackout window to be as large as possible.  No 'rocket science' is required, no other parameter settings need to be changed.

In both cases, to understand how best to approach Avamar capacity management, and to avoid needing any kind of ad-hoc procedure to bring server utilisation down from a too-high level, make sure to read the (not quite yet but soon-to-be-published) KB article esg118578.

4 Posts

February 2nd, 2011 08:00

Hi Nicholas, you are right; the details are listed below.

2010/12/15-04:31:12.73633 {0.0} <4201> completed garbage collection

2010/12/15-04:32:09.96915 {0.0} <4200> starting garbage collection

2010/12/15-04:32:10.05426 {0.0}

2010/12/15-04:32:10.05427 {0.0}

2010/12/14-23:04:21.71097 {0.0} <4202> failed garbage collection with error MSG_ERR_KILLED

2011/02/01-19:17:26.17649 {0.0} <4200> starting garbage collection

2011/02/01-19:17:26.17743 {0.0}

2011/02/01-19:17:26.17744 {0.0}

2011/02/01-19:17:52.63220 {0.0} <4202> failed garbage collection with error MSG_ERR_DISKFULL

(0.0) ssh  -x  admin@10.64.4.78 'df -h'

Filesystem            Size  Used Avail Use% Mounted on

/dev/sda5             7.9G  3.4G  4.2G  45% /

/dev/sda1             122M   14M  103M  12% /boot

/dev/sda9             321G  296G   26G  93% /data01

/dev/sdb1             344G  286G   58G  84% /data02

/dev/sdc1             344G  286G   58G  84% /data03

/dev/sdd1             344G  286G   59G  84% /data04

none                  2.0G     0  2.0G   0% /dev/shm

/dev/sda8             1.5G  106M  1.3G   8% /var

admin@avamar01:/usr/local/avamar/var/cron/#: avmaint nodelist | grep percent-full

        percent-full="57.5"

        fs-percent-full="92.2"

        percent-full="53.7"

        fs-percent-full="83.1"

        percent-full="53.7"

        fs-percent-full="83.2"

        percent-full="53.7"

        fs-percent-full="83.1"

Do you have any idea for this issue? Thanks in advance.

February 4th, 2011 00:00

If GC fails with the MSG_ERR_DISKFULL the first thing to check is the output of cplist to confirm that checkpoints are validating and to see whether too many checkpoints are being stored.

If all is well with the checkpoints then either the system is running too full and / or the data change rate on the system is too high.

The checkpoint overhead situation will need to be fixed first so that data partition usage drops low enough to allow GC to run (without any tuning).

If you are seeing a problem with checkpoint validation open a support ticket immediately.

Here's an extract from esg118578.

Actions to investigate and help alleviate high OS capacity

1. Find out when the last successful HFScheck completed.

To do this, use either the Avamar Administrator or the command line on the Avamar utility node.

In the Avamar Administrator go to Server > Checkpoint Management tab

Check the most recent date and time listed in the "Checkpoint Validation" column. This should have occurred within the last 24 hours.

Or, using the Avamar utility node command line:

Run the command 'cplist'

Below is an example of the output. The most recent validated checkpoint listed here is the one dated January 14th 11:14. We can identify it by the flag directly after the 'valid' marker. Depending on the types of HFSchecks set on the system, the flag could be 'rol' or 'hfs'. Here we have 'rol' (rolling HFScheck).

admin@avamar:~/>: cplist

cp.20110114111419 Fri Jan 14 11:14:19 2011   valid rol ---  nodes   3/3 stripes   1131

cp.20110114194457 Fri Jan 14 19:44:57 2011   valid --- ---  nodes   3/3 stripes   1131

If the results show that the latest validated checkpoint is older than 24 hours we need to find out why.

This could either be because the HFScheck did not run or because it failed.
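The check above can be scripted. This is a sketch only: `latest_validated_cp` is a hypothetical helper name, the sample lines are the cplist example above, and on a live utility node you would pipe `cplist` in directly.

```shell
# Print the checkpoint name of the last cplist line carrying a 'rol' or
# 'hfs' validation flag (field 8, directly after the 'valid' marker).
latest_validated_cp() {
  awk '$8 == "rol" || $8 == "hfs" { cp = $1 } END { print cp }'
}

# Sample cplist output taken from the example above:
cplist_sample='cp.20110114111419 Fri Jan 14 11:14:19 2011   valid rol ---  nodes   3/3 stripes   1131
cp.20110114194457 Fri Jan 14 19:44:57 2011   valid --- ---  nodes   3/3 stripes   1131'

echo "$cplist_sample" | latest_validated_cp   # prints: cp.20110114111419
```

The checkpoint name encodes its timestamp (cp.YYYYMMDDhhmmss), so comparing it against the current time tells you whether the last validation is within 24 hours.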

2. Confirm if HFScheck ran or if it failed

On the Avamar utility node, run 'status.dpn' and find the line which contains "Last hfscheck"

Example:-

Last hfscheck: finished Sat Jan 15 11:07:17 2011 after 06m 41s >> checked 528 of 528 stripes (OK)

Make a note of when it finished and what the status was (in the line above the status is shown as 'OK')

Note: the 'sched.sh' script can also be used to identify when a HFScheck last ran and whether it was successful.

If HFScheck jobs have been failing this should be investigated immediately.
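The status at the end of the "Last hfscheck" line can be extracted mechanically. A minimal sketch, assuming the line format shown above ('hfscheck_status' is my own helper name, not an Avamar command):

```shell
# Pull the parenthesised status from the end of a status.dpn
# "Last hfscheck" line, e.g. (OK).
hfscheck_status() {
  sed -n 's/.*(\(.*\))$/\1/p'
}

# Sample line taken from the example above:
line='Last hfscheck: finished Sat Jan 15 11:07:17 2011 after 06m 41s >> checked 528 of 528 stripes (OK)'

echo "$line" | hfscheck_status   # prints: OK
```

Anything other than OK here is grounds for opening a support ticket, as noted above.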

If HFScheck has not run lately, confirm whether maintenance tasks are enabled. Via the command line interface on the Avamar utility node, enter 'dpnctl status':

admin@avamar:~/>: dpnctl status

Identity added: /home/admin/.ssh/dpnid (/home/admin/.ssh/dpnid)

dpnctl: INFO: gsan status: ready

dpnctl: INFO: MCS status: up.

dpnctl: INFO: EMS status: up.

dpnctl: INFO: Backup scheduler status: up.

dpnctl: INFO: dtlt status: up.

dpnctl: INFO: Maintenance windows scheduler status: enabled.

dpnctl: INFO: Maintenance cron jobs status: enabled.

dpnctl: INFO: Unattended startup status: disabled.

If the maintenance windows scheduler is 'disabled' it can be enabled with the command

dpnctl start maint
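Checking for disabled maintenance schedulers can also be done with a one-line filter. A sketch only ('maint_disabled_count' is my own name; the sample lines are from the dpnctl status output above):

```shell
# Count maintenance-related lines reported as disabled in
# dpnctl-status-style output; a non-zero count means the maintenance
# scheduler needs to be re-enabled (e.g. with 'dpnctl start maint').
maint_disabled_count() {
  awk '/Maintenance.*status: disabled/ { n++ } END { print n+0 }'
}

# Sample lines taken from the output above:
dpnctl_sample='dpnctl: INFO: Maintenance windows scheduler status: enabled.
dpnctl: INFO: Maintenance cron jobs status: enabled.
dpnctl: INFO: Unattended startup status: disabled.'

echo "$dpnctl_sample" | maint_disabled_count   # prints: 0
```

Note that the "Unattended startup" line does not match: it is not a maintenance scheduler, and it is normal for it to be disabled.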

Once HFScheck completes successfully and the oldest checkpoint is 'rolled off' the system, OS capacity should reduce considerably.

If OS capacity is still too high and garbage collection continues to fail with the "MSG_ERR_DISKFULL" message, then EMC Support's assistance may be required.

Otherwise, if OS capacity is low enough to allow garbage collection to run, proceed to work on lowering "User Capacity" and so bring the "server utilization" figure down.


2K Posts

January 2nd, 2014 05:00

The diskreadonly value may not be changed. Changing diskreadonly will put you out of compliance with your license and endanger the health of your system. Under very specific circumstances, support may increase this value temporarily but this is never something that should be done by customers or partners.

4 Posts

January 2nd, 2014 05:00

Hi,

What would be the effect of increasing the parameter diskreadonly=65?

Is it safe to change it to 70?

Running Avamar 7

Thanks

355 Posts

January 2nd, 2014 05:00

Hello,

diskreadonly=PERCENT

Sets the percentage (PERCENT) of full server storage capacity that triggers conversion of the server from a fully writable condition to read-only. The default setting is 65%.
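For illustration only (the function and its name are my own; the 65% default is from the description above), the threshold semantics can be sketched like this:

```shell
# Report whether a server at the given percent-full figure (e.g. from
# 'avmaint nodelist') would have crossed the diskreadonly threshold and
# been converted to read-only. Second argument defaults to 65 (the
# documented default for diskreadonly).
readonly_state() {
  used="$1"; threshold="${2:-65}"
  if awk -v u="$used" -v t="$threshold" 'BEGIN { exit !(u+0 > t+0) }'; then
    echo read-only
  else
    echo read-write
  fi
}

readonly_state 57.5     # prints: read-write (below the 65% default)
readonly_state 70 65    # prints: read-only
```

As the replies below stress, this is only to show what the parameter controls; the actual diskreadonly value should not be changed by customers or partners.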

Why do you want to change it to 70% ?

Regards,

Pawan

4 Posts

January 2nd, 2014 08:00

We are running out of space (utilization 85%). I've read somewhere that increasing the diskreadonly value will let you use more of your space, but I wouldn't change it if that would endanger the health of the system.

4 Posts

January 2nd, 2014 22:00

Guys,

What is the meaning of the number under the #BU column when executing capacity.sh? I've got the number 84 most of the time for the last month.

355 Posts

January 3rd, 2014 01:00

Hello,

Please go through the following document, which describes each column of the capacity.sh script output.

How to use capacity.sh script to analyze daily data changes on the Avamar system

Regards,

Pawan

4 Posts

January 3rd, 2014 11:00

Thanks a lot for the info!

BU = backup, so that's logical.
