Avamar capacity utilization 95%

Question

Hello, we have Avamar Virtual Edition with 0.5 TB capacity, and the storage status is 95% full. I want to delete some old backups. I ran GC and it reduced utilization from 98% to 95%. I also receive this message: The server has reached the capacity health check limit. Backup scheduling and dispatching have been suspended. You must acknowledge this event for backup scheduling and dispatching to resume. Please advise. Regards, Ahmed Salah

ionthegeek · Answer

I would recommend reviewing the Ask the Expert session on Avamar Server Management Best Practices. I talked quite a bit about capacity management in the session and provided a list of tools that you may find useful for determining how your capacity is being used.

If you still need assistance after reviewing the thread, I would recommend opening a service request with support.

When the system reaches the health check limit, it suspends the backup scheduler. You must acknowledge this event in the Administrator Server GUI before you will be able to resume the backup scheduler.

aj2546 · Answer

Don’t delete too many backups at once. Your system will become unusable. Best thing to do is to open a ticket with EMC support.

SnapBigData · Answer

I would agree with our expert dude on reviewing his link. You may need to lower the retention policy and see which client(s) are causing the 45-degree growth on your box. From there, have a talk with the PO of these servers to understand the type of data your box is backing up. I would also review the backup logs for hints, or simply set up an exclusion list for non-essential volumes. The bottom line is that you need to get your box into steady state.

On Jalal's comments, I really don't think deleting many or all backup sets will cause the grid to panic. So if push comes to shove, I would start deleting old backup sets and increase the GC window from 3 to 5. In the meantime, I would plan ahead for another box that is sized right for your business needs. From there you can run an R2R migration or simply migrate your clients via EM to the new box.

Alright Saleh, keep us in the loop

ahmed.aboutabl · Answer

Hello All,

Thanks for the replies. I just want to know how to delete the old backups. I changed the retention for the existing policies, but the old backups are still in the grid storage and I want to remove them. We will manage to expand our storage soon, but until then we have to work with the existing one.

Please advise: is there a command or a way to list the saved backups and to delete them?

Thanks.

ionthegeek · Answer

It is important not to throw too much change at an Avamar at once (e.g. by deleting terabytes of backups at once) since it can cause a spike in checkpoint overhead, but the worst that will generally happen is that you end up on the phone with support to run a manual hfscheck and get checkpoint overhead back under control.

Changes to retention policy only apply to backups created after the change. If you want to manually change the retention of older backups or delete older backups you can do so through the GUI or by using the expire-snapups script.

The general syntax is:

expire-snapups --before='2012-08-01' --days=15 --domain=/ > do-expire.sh

This will generate a script called 'do-expire.sh' that will set the expiration for every backup created before August 1 to 15 days.

There are some additional options you can specify on the command line to control which clients and backups are selected. Details are in the help output (expire-snapups --help).

Once the do-expire script has been generated, review it to make sure:

It is operating on the correct clients
It is selecting the correct backups
It is setting the correct expiration date
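A rough sketch of how that review could be started from the shell. The helper names are made up for illustration, and the sketch assumes that every non-blank, non-comment line of the generated script is one command, which you should confirm by reading the file itself:

```shell
#!/bin/sh
# Hypothetical review helpers (not part of Avamar).
# Assumption: each non-blank, non-comment line of the script is one command.

# count_commands SCRIPT: number of lines that are neither blank nor comments
count_commands() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d ' '
}

# count_matching SCRIPT TEXT: number of lines containing TEXT as a literal string
count_matching() {
    grep -c -F -e "$2" "$1"
}
```

For example, count_commands do-expire.sh gives the number of changes the script would make, and count_matching do-expire.sh followed by a client name shows how many of them touch that client. This is only a sanity check on the counts; it does not replace reading the script line by line.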

This step is really important since an expired backup can't be "unexpired". Once you've confirmed the script is correct, you can mark it executable and run it:

chmod +x do-expire.sh

./do-expire.sh > do-expire.log 2>&1 &

You can tail the log to monitor progress with tail -f do-expire.log.

A few caveats to be aware of:

As I mentioned above, expiration is permanent. Be careful.
The system must be writeable for the expiration changes to succeed. That means the system can't be running garbage collection or checkpoints at the time you run the script. If you're operating on a lot of backups, the script may run into the blackout window in which case you will have to interrupt the script, remove the commands that completed successfully and run the updated script later.
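If editing the script by hand after an interruption sounds error-prone, one possible approach is to run it one line at a time and record what has completed. This is only a sketch: run_resumable and the done-file are hypothetical names, and it assumes each line of do-expire.sh is an independent, single-line command that is safe to skip once it has succeeded.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Avamar): run a script line by line and
# remember which lines succeeded, so a later run picks up where this one stopped.
# Assumption: every non-blank, non-comment line is a complete, independent command.

# run_resumable SCRIPT DONE_FILE
run_resumable() {
    script=$1
    done_file=$2
    touch "$done_file"
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;              # skip blank lines and comments
        esac
        # Skip any line already recorded as completed in an earlier run.
        if grep -qxF -e "$line" "$done_file"; then
            continue
        fi
        # Run the command; stdin is redirected so it cannot eat the script.
        if sh -c "$line" </dev/null; then
            printf '%s\n' "$line" >> "$done_file"
        else
            echo "stopped at: $line" >&2
            return 1
        fi
    done < "$script"
}
```

Something like run_resumable do-expire.sh do-expire.done > do-expire.log 2>&1 would then stop at the first failing command (for example when the system goes read-only) and could be rerun later with the same arguments. Identical lines are treated as already done, so this is only suitable when repeating a command would have no additional effect.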

If you have any doubts about using expire-snapups, I would urge you to work with support. There is the potential for catastrophic data loss if you make an error.

SnapBigData · Answer

This is the message that you would get if you expired a huge chunk of pointers, which causes overhead on the CP.

Last GC: finished Sat Jul 7 10:47:31 2011 after 04h 32m >> recovered 600 GB (MSG_ERR_DISKFULL)

I have gone through this drill before, and you can run another CP from either the GUI or the CLI to clear the CP overhead. If you own the data and you are not worried about RTO, then I would gradually expire/delete old data sets from backup management or as advised by our almighty Anderson. Review the Admin guide for more help.

Again, you need to run an RCA on the spike on your AVE box and zero in on the biggest talker among your nodes.

I will echo the same statement of my buddy Anderson.

If you have any doubts, I would urge you to work with support. There is the potential for catastrophic data loss if you make an error.

Good luck Saleh

J_H_ · Answer

You also have to consider that this is de-duplicated data, so expiring backups might not get you much space unless you expire all backups for one server. The backup that ran on Monday may be little changed from Friday's, and when you expire Monday's backup the data they share remains on disk because Friday's backup is still pointing to it.
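A toy illustration of that point, not Avamar's actual storage format: the chunk IDs and the unique_chunks helper below are invented. Each backup is modeled as a list of chunk references, and only chunks that no remaining backup references can be reclaimed.

```shell
#!/bin/sh
# Toy model of de-duplicated backups (hypothetical data, not Avamar internals).

# unique_chunks EXPIRED KEPT: print chunk IDs in EXPIRED that KEPT does not reference
unique_chunks() {
    for chunk in $1; do
        case " $2 " in
            *" $chunk "*) ;;                   # still referenced, stays on disk
            *) printf '%s\n' "$chunk" ;;       # unreferenced, reclaimable by GC
        esac
    done
}

friday="c1 c2 c3 c4"
monday="c1 c2 c3 c5"

# Expiring Monday while Friday is kept frees only the chunk Monday does not share.
unique_chunks "$monday" "$friday"              # prints c5
```

In this model, expiring Monday's backup frees one chunk out of four, which is why a single expired backup may barely move the utilization figure.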
