February 4th, 2010 01:00

Advice and guidance needed - various operational issues with Avamar servers

We have 2x 9+1 node Avamar servers, each with a spare node. Both are at high utilisation: the primary grid is 90% used and the secondary is 95% used.

* primary and secondary grids are showing almost a 5% difference in utilisation - the secondary grid is simply a replication target and therefore should be using roughly the same amount of space. Whenever we have manually deleted snapups and clients, we have deleted them from both grids.

* GC does not complete all stripes; it currently gets through around 2000 of 4000 stripes in its 4-hour window. Does this mean we are leaving more and more redundant data on the system each day? If so, what can we do to reverse this trend? (See the status-check sketch after this list.)

* the second checkpoint fails on some days on the primary grid because hfscheck runs into the backup window - this is also causing scheduling issues with large backups. The first daily checkpoint does complete successfully and hfscheck does complete, but hfscheck takes much longer on the primary grid: 11am to 2am the next morning, versus 11am to 9pm on the secondary grid.

* the secondary Avamar grid is very close to 95% utilisation, so we will at some point need to run an emergency garbage collection. Because GC does not complete all stripes on a daily basis, will this mean we have a large amount of redundant data, or will we need to look at deleting some snapups before we run the emergency GC?
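
To see where each of these maintenance jobs actually stands, the utility node can be queried directly. A minimal sketch, assuming a 4.x/5.x-era grid with the standard admin tools on the PATH; flag spellings vary between Avamar versions, so check each command's built-in help:

    # overall node and maintenance status for the grid
    status.dpn

    # result of the last garbage-collection pass: stripes processed, bytes recovered
    avmaint gcstatus --ava

    # list checkpoints and whether each has been validated by hfscheck
    cplist

    # progress/result of the most recent hfscheck
    avmaint hfscheckstatus --ava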

Long term we will be adding nodes to these grids, but we need a short-term solution. I believe 4 hours is the maximum for the daily GC window - what are our options for rectifying the scheduling issues we have? We are currently adding approximately 8 GB a day to the grids after GC: average new data is 106,503 MB and average GC removal is 98,242 MB.
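
Those averages also give a rough feel for how long the remaining headroom will last. A back-of-envelope sketch - the free-space figure below is a hypothetical placeholder, not a measurement from either grid:

    # net daily growth from the averages quoted above
    NEW_MB=106503                  # average new data per day
    GC_MB=98242                    # average data removed by GC per day
    NET_MB=$(( NEW_MB - GC_MB ))   # ~8261 MB/day, i.e. roughly 8 GB/day

    # days until the grid fills, for a hypothetical remaining capacity;
    # substitute the free space actually reported for the grid
    FREE_MB=500000                 # hypothetical: ~488 GB of capacity left
    echo "net growth: ${NET_MB} MB/day; ~$(( FREE_MB / NET_MB )) days to full"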

Any help would be appreciated. Thanks

131 Posts

February 4th, 2010 04:00

Your Avamar is operating beyond capacity. You have too much data for this Avamar configuration.

Regards

Hormigo

Can you still run backups? At that level of utilisation the grid should already be in read-only ...

An environment of this size needs a large maintenance window, and I can see that is not happening.

I recommend increasing the maintenance window: set the blackout window to 6 hours and the maintenance window to 10 hours.

These routines must complete. Without them your environment is at risk!

Manually run GC, a checkpoint and a full hfscheck. Keep the backup window closed until these routines have finished.
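
As a sketch of that sequence, run from the utility node - the exact avmaint flags differ between Avamar versions, so treat the spellings below (especially --full) as assumptions and confirm against avmaint's own help first:

    # 1. run a garbage-collection pass
    avmaint garbagecollect --ava

    # 2. take a checkpoint once GC has finished
    avmaint checkpoint --ava

    # 3. run a full (not rolling) hfscheck against that checkpoint;
    #    the --full flag spelling is an assumption - verify for your version
    avmaint hfscheck --full --ava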

Then set the windows back to their normal values.

Hope that helps.

36 Posts

February 4th, 2010 05:00

The GC, checkpoint and hfscheck do complete on a daily basis - they just run into the backup window, and therefore the checkpoint which should take place after hfscheck sometimes fails.

When you mention too much data, do you mean too much raw data or too much changed data? Are there any guidelines on how much changed data an Avamar system can be expected to handle?

Edit: the servers are not read-only yet - the secondary grid is at 94.5% utilisation.

131 Posts

February 4th, 2010 10:00

After a certain period of use, an Avamar grid should reach steady state. Steady state is when the amount of new data added each day is less than the amount of expired or orphaned data removed.

What prevents reaching steady state: a high rate of new data being backed up into the environment, and very long retention times.
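
In other words, a grid only stops growing once daily removals catch up with daily additions. Plugging in the averages from the first post, purely as an illustration:

    # steady state: data removed per day >= new data added per day
    NEW_MB=106503        # average new data per day (from the original post)
    REMOVED_MB=98242     # average expired/orphaned data GC removes per day
    if [ "$REMOVED_MB" -ge "$NEW_MB" ]; then
        echo "at steady state - the grid is not growing"
    else
        echo "not at steady state - growing ~$(( NEW_MB - REMOVED_MB )) MB/day"
    fi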

36 Posts

February 5th, 2010 01:00

These grids have not yet reached their longest retention period, which is 6 months; that will happen in about 1 month. Retention on the servers is 6 monthly backups and 30 daily backups for all servers.

February 8th, 2010 06:00

Run a gc_report to see how effective the garbage collection is. It sounds like you will need a new node, though!
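
For reference, gc_report is run from a shell on the utility node and summarises recent GC passes; how it is invoked and what it prints vary by Avamar version, so treat this as a sketch (the hostname is hypothetical):

    # log in to the utility node (hypothetical hostname) and run the report
    ssh admin@utility-node
    gc_report    # summarises recent GC passes: stripes processed, bytes recovered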

36 Posts

February 8th, 2010 07:00

I've now logged an SR with EMC. The short-term fix is to remove a high-change-rate client - we will push this back to our legacy backup system - and long term we will add more nodes in order to back up the other systems.