
115 Posts

5361

July 27th, 2012 06:00

Effect on older data when changing retention policy

Hi,

I am just getting involved with backups, so the following question might sound very silly, but I would really appreciate it if someone could explain it or point me to any documentation that explains it.

Our Avamar system is at 90% capacity and we want to get rid of some old data to reduce consumption (buying additional storage nodes is out of the question at this point). We have a "never expire" retention policy (data from 2009 is still on the grid) on most of the servers that are backed up.

My understanding was that if you change the retention policy to, let's say, "retain for 1 year", all the older data will be deleted in the next garbage collection cycle, but the backup admin says old data will remain on the grid and the new retention policy will only apply to future backups. Is this true? If yes, is there any other way to reclaim some space?

Thanks,

Sri

2 Intern


2K Posts

July 27th, 2012 07:00

Changes to the retention policies only apply to backups created after the change.

If you have a small number of backups to modify or delete you can use the Avamar Administrator GUI to make the changes.

If you need to perform a bulk operation on many backups, you can use scripts called "expire-snapups" and "delete-snapups" to change the expiration time of backups or delete backups respectively. The scripts are invoked to generate a deletion / expiration script which can then be run to actually perform the requested operation. If you need assistance running either of these scripts, I would recommend opening an SR with support.

115 Posts

July 27th, 2012 07:00

Just a small followup question.

Let's say I go in and delete all the backups older than 1 year for a specific client. Does that mean it will also delete the first full backup?

115 Posts

July 27th, 2012 07:00

Thank you very much Ianderson, just what I was looking for

2 Intern


2K Posts

July 27th, 2012 08:00

Don’t delete too many backups at the same time. That will wreak havoc on your Avamar. Avamar runs a special file system, and deleting too much data causes it to stop working.

Avamar is sensitive to the change rate on the system (addition of new data or expiry of old data) since we use checkpoints to maintain a known good point in time copy of the data that can be used for a rollback in case something goes wrong. Checkpoints are essentially copy on first write snapshots so the more changes that are made on the system, the larger the checkpoint overhead. If the overhead gets too high, the system starts taking actions to protect itself from running out of disk space.
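To make the "copy on first write" point concrete, here is a toy Python sketch. It is only an illustration of the general technique, not Avamar's actual on-disk format; the class and method names (CheckpointedStore, take_checkpoint, overhead) are made up for this example. The idea is that a block's original contents are saved only the first time the block changes after a checkpoint, so the overhead grows with the number of distinct blocks touched.

```python
# Toy model of a copy-on-first-write checkpoint.
# Assumption: names and structure are hypothetical, not Avamar internals.
_MISSING = object()  # marks a block that did not exist at checkpoint time


class CheckpointedStore:
    def __init__(self, blocks):
        self.live = dict(blocks)   # block id -> current contents
        self.saved = None          # originals preserved since the checkpoint

    def take_checkpoint(self):
        # A new checkpoint costs nothing until blocks start changing.
        self.saved = {}

    def _preserve(self, block_id):
        # Save the original only on the FIRST change after the checkpoint.
        if self.saved is not None and block_id not in self.saved:
            self.saved[block_id] = self.live.get(block_id, _MISSING)

    def write(self, block_id, data):
        self._preserve(block_id)
        self.live[block_id] = data

    def delete(self, block_id):
        self._preserve(block_id)
        self.live.pop(block_id, None)

    def overhead(self):
        # Number of original blocks the checkpoint is holding on to.
        return len(self.saved) if self.saved is not None else 0

    def rollback(self):
        if self.saved is None:
            raise RuntimeError("no checkpoint to roll back to")
        for block_id, original in self.saved.items():
            if original is _MISSING:
                self.live.pop(block_id, None)
            else:
                self.live[block_id] = original
        self.saved = {}


store = CheckpointedStore({i: f"data-{i}" for i in range(100)})
store.take_checkpoint()
store.write(1, "a")
store.write(1, "b")               # second write to the same block adds nothing
small = store.overhead()
for i in range(50):               # a bulk delete touches many distinct blocks
    store.delete(i)
large = store.overhead()
store.rollback()
restored = store.live[1]
print(small, large, restored)
```

In this model, rewriting the same block repeatedly is cheap, while a bulk delete that touches many blocks is what drives the overhead up.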

If you're going to be deleting a lot of backups, turning off "asynchronous crunching" can keep the checkpoint overhead down at the cost of reduced backup performance.

While logged into the utility node as the admin user, you can run the following command to disable crunching:

avmaint config --ava asynccrunching=false

Once the system capacity stabilizes, you'll want to turn crunching back on:

avmaint config --ava asynccrunching=true

49 Posts

July 27th, 2012 08:00

Don’t delete too many backups at the same time. That will wreak havoc on your Avamar. Avamar runs a special file system, and deleting too much data causes it to stop working. Don’t ask me why, but the Avamar patent filings have highly technical details on how this works.

Avamar’s disk-based backups are not suitable for long-term retention. In fact, most deduplication backup solutions are not meant for long-term retention. They rely on the fact that you will have a low change rate and you will expire as much data as you add on a regular basis.

2 Intern


2K Posts

July 27th, 2012 08:00

Avamar backups have what I like to call "full / incremental duality" (like light has "particle / wave duality"). Because of the way the server de-duplicates the data, any incremental backup is effectively also a full backup so deletion of any backup will not affect backups that were created after that point. It's not like tape where we care about fulls vs. incrementals.

One thing I should point out, however, is that de-duplication is a double-edged sword. If the total backup size is 500GB but the daily change rate is only 2MB, you will only get back 2MB per backup deleted unless you delete the entire client. Some customers are surprised when they remove hundreds of backups and recover relatively little capacity.
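A toy Python model may help show both points: why any backup behaves like a full, and why deleting one frees so little. This is a sketch under simple assumptions, not how Avamar stores data internally. Each backup is modeled as a set of chunk identifiers (the chunk names and the delete_backup helper are hypothetical), and a chunk can only be freed when no remaining backup references it.

```python
from collections import Counter


def delete_backup(backups, name):
    """Remove one backup and return the chunks that become reclaimable.

    A chunk is reclaimable only if the deleted backup was its sole reference.
    """
    refs = Counter(chunk for chunks in backups.values() for chunk in chunks)
    freed = {chunk for chunk in backups[name] if refs[chunk] == 1}
    del backups[name]
    return freed


# Hypothetical client: 5 chunks never change, 1 chunk changes every day.
shared = {f"shared-{i}" for i in range(5)}
backups = {
    "day1": shared | {"changed-day1"},
    "day2": shared | {"changed-day2"},
    "day3": shared | {"changed-day3"},
}

freed_day1 = delete_backup(backups, "day1")
# day2 still references every chunk it needs, so it restores like a full.
day2_intact = backups["day2"] == shared | {"changed-day2"}

freed_day2 = delete_backup(backups, "day2")
# Only when the last backup goes do the shared chunks become reclaimable.
freed_last = delete_backup(backups, "day3")

print(len(freed_day1), day2_intact, len(freed_day2), len(freed_last))
```

Deleting day1 frees only the one chunk unique to it, while the shared chunks stay on the system until the last backup that references them is gone, which matches the "unless you delete the entire client" caveat.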

I talked more in-depth about capacity management in an "Ask the Expert" session a few months ago which you may find helpful:

https://community.emc.com/thread/133484

115 Posts

July 27th, 2012 08:00

I really don't know the rationale behind the "never expire" retention policy that was set up (it was done well before I joined here). No one was able to give me a valid reason, but I will definitely keep it in mind.

115 Posts

August 1st, 2012 08:00

Hi Ianderson,

I was a bit curious how much data was being cleaned by the garbage collection process on the Avamar nodes that are getting full, so I ran the script "capacity.sh" with the flag --days=1200 (I chose 1200 just to cover all days since this node was installed) and got the output below (only showing part of it):

Date          New Data #BU       Removed #GC    Net Change
----------  ---------- -----  ---------- -----  ----------
2010-02-10    18510 mb 3            0 mb          18510 mb
2010-02-12    29524 mb 4            0 mb          29524 mb
2010-02-18   128139 mb 6            0 mb         128139 mb
2010-02-19     5749 mb 7            0 mb           5749 mb
2010-02-20      659 mb 7            0 mb            659 mb
2010-02-21       93 mb 7            0 mb             93 mb
2010-02-22     1028 mb 8            0 mb           1028 mb
2010-02-23    24815 mb 8            0 mb          24815 mb
2010-02-24    15179 mb 9            0 mb          15179 mb
2010-02-25    20704 mb 8            0 mb          20704 mb
2010-02-26    19770 mb 9            0 mb          19770 mb
2010-02-27    19705 mb 9            0 mb          19705 mb
2010-02-28    19811 mb 8            0 mb          19811 mb
2010-03-01    22142 mb 8            0 mb          22142 mb
2010-03-02    21057 mb 8            0 mb          21057 mb
2010-03-03    19782 mb 8            0 mb          19782 mb
2010-03-04    20663 mb 8            0 mb          20663 mb
2010-03-05    19632 mb 8            0 mb          19632 mb
2010-03-06    19568 mb 9            0 mb          19568 mb
2010-03-07    18458 mb 8            0 mb          18458 mb
2010-03-08    18784 mb 8            0 mb          18784 mb

Under the "Removed #GC" column it shows 0 mb for every day up to the present date.

Does this mean garbage collection is not cleaning any data? Or is garbage collection showing 0 mb because of the "never expire" retention policy?
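For a report covering 1200 days, totalling the columns is easier than scanning them by eye. The Python sketch below is only an illustration: it assumes the row layout shown above (a date followed by three "N mb" values for New Data, Removed and Net Change), and the function name capacity_totals is made up. The real capacity.sh output may differ between versions.

```python
import re


def capacity_totals(report):
    """Sum the New Data, Removed and Net Change columns (in mb).

    Assumes each data row starts with a YYYY-MM-DD date and contains
    exactly three "<number> mb" values, as in the output pasted above.
    """
    new = removed = net = 0
    for line in report.splitlines():
        line = line.strip()
        if not re.match(r"\d{4}-\d{2}-\d{2}\s", line):
            continue  # skip headers, separators and blank lines
        sizes = [int(value) for value in re.findall(r"(-?\d+) mb", line)]
        if len(sizes) != 3:
            continue  # row does not match the assumed layout
        new += sizes[0]
        removed += sizes[1]
        net += sizes[2]
    return new, removed, net


# First three rows of the output above, used as sample input.
sample = """\
Date          New Data #BU       Removed #GC    Net Change
----------  ---------- -----  ---------- -----  ----------
2010-02-10    18510 mb 3            0 mb          18510 mb
2010-02-12    29524 mb 4            0 mb          29524 mb
2010-02-18   128139 mb 6            0 mb         128139 mb
"""

new_mb, removed_mb, net_mb = capacity_totals(sample)
print(new_mb, removed_mb, net_mb)
```

If the Removed total is still 0 across the whole report while the maintenance log shows garbage collection completing, that points at the report rather than at garbage collection itself.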

I also ran the following command

dumpmaintlogs --days=30 --types=gc > testlog.txt

and it shows the following

2010/02/10-13:05:14.49367 {0.0} <4200> starting scheduled garbage collection

2010/02/10-13:05:14.49456 {0.0}

2010/02/10-13:05:14.49456 {0.0}

2010/02/10-13:05:15.59613 {0.0}

2010/02/10-13:05:15.59614 {0.0}

2010/02/10-13:05:15.59669 {0.0} <4201> completed garbage collection

2010/02/11-13:05:15.30044 {0.0} <4200> starting scheduled garbage collection

2010/02/11-13:05:15.31859 {0.0}

2010/02/11-13:05:15.31859 {0.0}

2010/02/11-13:05:51.05098 {0.0}

2010/02/11-13:05:51.05099 {0.0}

2010/02/11-13:05:51.07469 {0.0} <4201> completed garbage collection

2010/02/12-13:05:23.11289 {0.1} <4200> starting scheduled garbage collection

2010/02/12-13:05:23.11420 {0.1}

2010/02/12-13:05:23.11420 {0.1}

2010/02/12-13:05:59.74454 {0.1}

2010/02/12-13:05:59.74455 {0.1}

2010/02/12-13:05:59.74494 {0.1} <4201> completed garbage collection

2010/02/13-13:05:25.28200 {0.1} <4200> starting scheduled garbage collection

2010/02/13-13:05:25.28332 {0.1}

2010/02/13-13:05:25.28333 {0.1}

2010/02/13-13:07:04.59568 {0.1}

2010/02/13-13:07:04.59569 {0.1}

2010/02/13-13:07:04.59605 {0.1} <4201> completed garbage collection

2010/02/14-13:05:27.29986 {0.1} <4200> starting scheduled garbage collection

2010/02/14-13:05:27.30092 {0.1}

2010/02/14-13:05:27.30093 {0.1}

2010/02/14-13:05:36.96937 {0.1}

2010/02/14-13:05:36.96938 {0.1}

2010/02/14-13:05:36.96972 {0.1} <4201> completed garbage collection

Looking at the above output, it seems that garbage collection is in fact recovering some data.

So, what am I doing wrong? Or is it supposed to show the output that way?

Thanks,

Sri

2 Intern


2K Posts

August 1st, 2012 08:00

Some versions of Avamar shipped with an older version of the capacity.sh script since it's a support tool and not part of the core distribution. You probably just have to update capacity.sh to the most recent version.

If you're comfortable at the command prompt, it's a pretty easy update. Log into the utility node as the admin user and run the following commands:

cd /usr/local/avamar/bin

mv capacity.sh capacity_sh_`date -I`.old

curl -O ftp://avamar_ftp:anonymous@ftp.avamar.com/software/scripts/capacity.sh

chmod +x capacity.sh

Let me know how it goes!

Edit: Fix link
