4 Posts
1
February 1st, 2011 13:00
Aggressive Garbage Collection
Hello guys, one question: how can I perform an aggressive garbage collection? This is for testing purposes only. I have a single-node Gen 2 system in demo mode, and the node reports "<4202> failed garbage collection with error MSG_ERR_KILLED". I know that if the operating system capacity on any data partition of any node exceeds the configured threshold, a garbage collection operation does not run and logs "<4202> failed garbage collection with error MSG_ERR_KILLED". Any idea how to resolve this error?
best regards.


rpervan
266 Posts
0
February 3rd, 2011 01:00
Hello Ferom,
> Do you have any idea for this issue? Thanks in advance.
Since data01 is at 93% usage ("93% /data01"), you will need to temporarily tune your Avamar so that GC can run.
1.) To see the current settings:
# avmaint config --ava | grep disk
2.) TEMPORARILY change "disknogc" to the value 93:
# avmaint config disknogc=93 --ava
Do not forget to set it back to the default value of 85 after GC has finished!
3.) Start GC from the MCS and monitor it with "status.dpn".
4.) You may also need to rebalance your storage nodes, so please paste the output of "status.dpn" here and I will check the node balancing status.
If you are not familiar with this, you could also open an SR with EMC technical support.
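The tuning steps above depend on knowing which data partition is over the disknogc threshold. As a rough illustration only (this is a hypothetical helper, not an Avamar tool, and whether GC is blocked at or strictly above the threshold is an assumption), the `df -h` output can be checked against the disknogc value like this:

```python
# Hypothetical helper (not an Avamar tool): parse `df -h` output and flag
# /dataNN partitions whose usage exceeds the disknogc threshold, i.e. the
# partitions assumed to block garbage collection.

def partitions_blocking_gc(df_output, disknogc=85):
    """Return a list of (mount, use_pct) for data partitions over disknogc."""
    blocking = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        use_pct, mount = int(fields[4].rstrip("%")), fields[5]
        if mount.startswith("/data") and use_pct > disknogc:
            blocking.append((mount, use_pct))
    return blocking

DF = """\
Filesystem Size Used Avail Use% Mounted on
/dev/sda9 321G 296G 26G 93% /data01
/dev/sdb1 344G 286G 58G 84% /data02
"""

print(partitions_blocking_gc(DF))               # [('/data01', 93)]
print(partitions_blocking_gc(DF, disknogc=93))  # [] after the temporary tune
```

This mirrors the fix above: at the default of 85, /data01 at 93% blocks GC; raised temporarily to 93, nothing blocks.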
regards,
rpervan
266 Posts
0
February 1st, 2011 14:00
Hello,
Just search Powerlink for esg98090.
Rej
Avamar Exorcist
462 Posts
1
February 2nd, 2011 01:00
As for why garbage collection failed, it's best to take a look at the logs directly (see esg108593).
The message typically seen when GC fails because the physical partitions are more than 85% full is "MSG_ERR_DISKFULL", so it seems you may have a different issue in this case.
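To tell the two failure modes apart quickly, the <4202> lines can be pulled out of the log. A minimal sketch, assuming the log line format quoted in this thread:

```python
import re

# Hypothetical log-triage helper: pull the error name out of <4202>
# "failed garbage collection" lines so MSG_ERR_KILLED and MSG_ERR_DISKFULL
# cases can be told apart at a glance. The log format is assumed from the
# lines quoted in this thread.

GC_FAIL = re.compile(r"<4202> failed garbage collection with error (\w+)")

def gc_failures(log_text):
    return GC_FAIL.findall(log_text)

LOG = """\
2010/12/14-23:04:21.71097 {0.0} <4202> failed garbage collection with error MSG_ERR_KILLED
2011/02/01-19:17:26.17649 {0.0} <4200> starting garbage collection
2011/02/01-19:17:52.63220 {0.0} <4202> failed garbage collection with error MSG_ERR_DISKFULL
"""

print(gc_failures(LOG))  # ['MSG_ERR_KILLED', 'MSG_ERR_DISKFULL']
```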
Avamar Exorcist
462 Posts
0
February 2nd, 2011 01:00
esg98090 is deleted.
I understand this question is for a test environment but, for the record, if you have an Avamar system which is completely full, it can be corrected as follows:
Avamar version 4:
Contact the Support team to have them take care of this for you.
Avamar version 5.x onwards
Increase the blackout window to be as large as possible. No 'rocket science' is required; no other parameter settings need to be changed.
In both cases, to understand how best to approach Avamar capacity management, and to avoid any need for ad-hoc procedures to bring server utilisation down from a too-high level, make sure to read the soon-to-be-published KB article esg118578.
Ferrom1
4 Posts
0
February 2nd, 2011 08:00
Hi Nicholas, you are right; the details are listed below:
2010/12/15-04:31:12.73633 {0.0} <4201> completed garbage collection
2010/12/15-04:32:09.96915 {0.0} <4200> starting garbage collection
2010/12/15-04:32:10.05426 {0.0}
2010/12/15-04:32:10.05427 {0.0}
2010/12/14-23:04:21.71097 {0.0} <4202> failed garbage collection with error MSG_ERR_KILLED
2011/02/01-19:17:26.17649 {0.0} <4200> starting garbage collection
2011/02/01-19:17:26.17743 {0.0}
2011/02/01-19:17:26.17744 {0.0}
2011/02/01-19:17:52.63220 {0.0} <4202> failed garbage collection with error MSG_ERR_DISKFULL
(0.0) ssh -x admin@10.64.4.78 'df -h'
Filesystem Size Used Avail Use% Mounted on
/dev/sda5 7.9G 3.4G 4.2G 45% /
/dev/sda1 122M 14M 103M 12% /boot
/dev/sda9 321G 296G 26G 93% /data01
/dev/sdb1 344G 286G 58G 84% /data02
/dev/sdc1 344G 286G 58G 84% /data03
/dev/sdd1 344G 286G 59G 84% /data04
none 2.0G 0 2.0G 0% /dev/shm
/dev/sda8 1.5G 106M 1.3G 8% /var
admin@avamar01:/usr/local/avamar/var/cron/#: avmaint nodelist | grep percent-full
percent-full="57.5"
fs-percent-full="92.2"
percent-full="53.7"
fs-percent-full="83.1"
percent-full="53.7"
fs-percent-full="83.2"
percent-full="53.7"
fs-percent-full="83.1"
Do you have any idea for this issue? Thanks in advance.
Avamar Exorcist
462 Posts
2
February 4th, 2011 00:00
If GC fails with MSG_ERR_DISKFULL, the first thing to check is the output of cplist, to confirm that checkpoints are validating and to see whether too many checkpoints are being stored.
If all is well with the checkpoints, then the system is running too full, the data change rate on the system is too high, or both.
The checkpoint overhead situation will need to be fixed first so that the data partitions become low enough to allow GC to run (without any tuning).
If you are seeing a problem with checkpoint validation open a support ticket immediately.
Here's an extract from esg118578.
Actions to investigate and help alleviate high OS capacity
1. Find out when the last successful HFScheck completed.
To do this, use either Avamar Administrator or the command line on the Avamar utility node.
In Avamar Administrator, go to the Server > Checkpoint Management tab.
Check the most recent date and time listed in the "Checkpoint Validation" column. This should have occurred within the last 24 hours.
Or, using the Avamar utility node command line:
Run the command 'cplist'
Below is an example of the output. The most recent validated checkpoint listed here is the one dated January 14th 11:14. We can identify it by the flag directly after the 'valid' marker. Depending on the types of HFSchecks set on the system, the flag could be 'rol' or 'hfs'. Here we have 'rol' (rolling HFScheck):
admin@avamar:~/>: cplist
cp.20110114111419 Fri Jan 14 11:14:19 2011 valid rol --- nodes 3/3 stripes 1131
cp.20110114194457 Fri Jan 14 19:44:57 2011 valid --- --- nodes 3/3 stripes 1131
If the results show that the latest validated checkpoint is older than 24 hours we need to find out why.
This could either be because the HFScheck did not run or because it failed.
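As an illustration of the check above, here is a hypothetical cplist parser (the 'rol'/'hfs' flag convention and the cp.YYYYMMDDHHMMSS timestamp format are assumed from the example output): it finds the newest validated checkpoint and flags it when it is older than 24 hours.

```python
from datetime import datetime, timedelta

# Hypothetical cplist parser (illustration only). A checkpoint counts as
# validated when the flag after 'valid' is 'rol' or 'hfs'; its timestamp is
# recovered from the cp.YYYYMMDDHHMMSS tag at the start of the line.

def latest_validated(cplist_output):
    newest = None
    for line in cplist_output.strip().splitlines():
        fields = line.split()
        if "valid" in fields:
            flag = fields[fields.index("valid") + 1]
            if flag in ("rol", "hfs"):
                ts = datetime.strptime(fields[0][3:], "%Y%m%d%H%M%S")
                if newest is None or ts > newest:
                    newest = ts
    return newest

CPLIST = """\
cp.20110114111419 Fri Jan 14 11:14:19 2011 valid rol --- nodes 3/3 stripes 1131
cp.20110114194457 Fri Jan 14 19:44:57 2011 valid --- --- nodes 3/3 stripes 1131
"""

ts = latest_validated(CPLIST)
print(ts)  # 2011-01-14 11:14:19 (the second checkpoint has no rol/hfs flag)
stale = datetime.now() - ts > timedelta(hours=24)
print("investigate" if stale else "ok")
```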
2. Confirm if HFScheck ran or if it failed
On the Avamar utility node, run 'status.dpn' and find the line which contains "Last hfscheck"
Example:-
Last hfscheck: finished Sat Jan 15 11:07:17 2011 after 06m 41s >> checked 528 of 528 stripes (OK)
Make a note of when it finished and what the status was (in the line above, the status is shown as 'OK').
Note: the 'sched.sh' script can also be used to identify when a HFScheck last ran and whether it was successful
If HFScheck jobs have been failing this should be investigated immediately.
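A small sketch of reading that status line programmatically (the exact line layout is assumed from the example above; this is not an official tool):

```python
import re

# Hypothetical parser for the "Last hfscheck" line of status.dpn output,
# returning the finish timestamp and the status shown in parentheses.
# The line layout is assumed from the example quoted above.

LINE_RE = re.compile(
    r"Last hfscheck:\s+finished\s+(.+?)\s+after\s+.*\((\w+)\)$"
)

def parse_last_hfscheck(line):
    m = LINE_RE.search(line)
    return (m.group(1), m.group(2)) if m else None

line = ("Last hfscheck: finished Sat Jan 15 11:07:17 2011 "
        "after 06m 41s >> checked 528 of 528 stripes (OK)")
print(parse_last_hfscheck(line))
# ('Sat Jan 15 11:07:17 2011', 'OK')
```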
If HFScheck has not run lately, confirm whether maintenance tasks are enabled. Via the command line interface on the Avamar utility node, enter 'dpnctl status':
admin@avamar:~/>: dpnctl status
Identity added: /home/admin/.ssh/dpnid (/home/admin/.ssh/dpnid)
dpnctl: INFO: gsan status: ready
dpnctl: INFO: MCS status: up.
dpnctl: INFO: EMS status: up.
dpnctl: INFO: Backup scheduler status: up.
dpnctl: INFO: dtlt status: up.
dpnctl: INFO: Maintenance windows scheduler status: enabled.
dpnctl: INFO: Maintenance cron jobs status: enabled.
dpnctl: INFO: Unattended startup status: disabled.
If the maintenance windows scheduler is 'disabled', it can be enabled with the command:
dpnctl start maint
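As a sketch, the 'dpnctl status' output can be scanned for maintenance-related schedulers that are disabled (output format assumed from the example above; this is a hypothetical helper, not part of dpnctl):

```python
# Hypothetical check: scan `dpnctl status` output and report which
# maintenance-related schedulers are disabled. The "dpnctl: INFO: ...
# status: ..." line format is assumed from the example above.

def disabled_maintenance(dpnctl_output):
    disabled = []
    for line in dpnctl_output.splitlines():
        if "Maintenance" in line and "disabled" in line:
            name = line.split("dpnctl: INFO:")[-1].split("status:")[0].strip()
            disabled.append(name)
    return disabled

STATUS = """\
dpnctl: INFO: gsan status: ready
dpnctl: INFO: Maintenance windows scheduler status: disabled.
dpnctl: INFO: Maintenance cron jobs status: enabled.
"""

print(disabled_maintenance(STATUS))  # ['Maintenance windows scheduler']
```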
Once HFScheck completes successfully and the oldest checkpoint is 'rolled off' the system, OS capacity should reduce considerably.
If OS capacity is still too high and garbage collection continues to fail with the "MSG_ERR_DISKFULL" message, then EMC Support's assistance may be required.
Otherwise, if OS capacity is low enough to allow garbage collection to run, proceed to work on lowering "User Capacity" and so bring the "server utilization" figure down.
ionthegeek
2 Intern
•
2K Posts
1
January 2nd, 2014 05:00
The diskreadonly value may not be changed. Changing diskreadonly will put you out of compliance with your license and endanger the health of your system. Under very specific circumstances, support may increase this value temporarily but this is never something that should be done by customers or partners.
Misho4
4 Posts
0
January 2nd, 2014 05:00
Hi,
What will be the effect of increasing the parameter diskreadonly=65?
Is it safe to change it to 70?
Running Avamar 7
Thanks
pawankumawat
355 Posts
0
January 2nd, 2014 05:00
Hello,
diskreadonly=PERCENT
Sets the percentage (PERCENT) of full server storage capacity that triggers conversion of the server from a fully writable condition to read-only. The default setting is 65%.
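As an illustration only (whether the trigger fires at exactly the threshold or just above it is an assumption here), the read-only behaviour the parameter controls can be sketched as:

```python
# Illustrative sketch of the diskreadonly threshold: once server storage
# utilization reaches the configured percentage, the server is assumed to
# flip from read-write to read-only. The >= trigger point is an assumption.

def server_mode(utilization_pct, diskreadonly=65):
    return "read-only" if utilization_pct >= diskreadonly else "read-write"

print(server_mode(60))                    # read-write
print(server_mode(65))                    # read-only at the default of 65
print(server_mode(66, diskreadonly=70))   # read-write if raised to 70
```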
Why do you want to change it to 70% ?
Regards,
Pawan
Misho4
4 Posts
0
January 2nd, 2014 08:00
We are running out of space (utilization 85%). I've read somewhere that increasing the diskreadonly value will let you use more of your space, but I wouldn't change it if that would endanger the health of the system.
Misho4
4 Posts
0
January 2nd, 2014 22:00
Guys,
What is the meaning of the number under the #BU column when executing capacity.sh? I've got the number 84 most of the time for the last month.
pawankumawat
355 Posts
1
January 3rd, 2014 01:00
Hello,
Please go through the following document, which describes each column of the capacity.sh script output:
How to use capacity.sh script to analyze daily data changes on the Avamar system
Regards,
Pawan
Misho4
4 Posts
0
January 3rd, 2014 11:00
Thanks a lot for the info!
BU=backup
so logical