I was recently asked to modify several of our snapshot schedules to have infinite retention, i.e. never expire. My initial concern was how many snapshots the cluster can handle, and it looks like it's 1,000 per directory and 20,000 total at my code level 220.127.116.11.
From what little I could find, the job that deletes expired snapshots can be affected as the number of snapshots grows. So my concern is: what number is safe for the cluster? 1,000 per directory just seems a bit ambiguous to me. Also, are there any other concerns aside from the delete job and the GUI running slowly? That is about as detailed an explanation as I have been able to find.
With infinite retention, the snaps delete job will never run. Problem solved...
Of course 1,000 is somewhat ambiguous, and ambitious at the same time.
It's safe to say that with 1,001 snaps you are not falling off a cliff,
and that the one-thousand range is enough to cover most real-world demands.
In practice, one often keeps, say, weekly snapshots for a longer time than daily or hourly snapshots. This retains all files that have existed for at least 7 days, while short-lived files get wiped when the daily snapshots are deleted after a few weeks or so.
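To make the tiered idea above concrete, here is a small sketch of such a policy in Python. The cutoffs (dailies kept ~3 weeks, one snapshot per week promoted and kept for a year) and the choice of Sunday as the "weekly" are my own illustrative assumptions, not OneFS behavior:

```python
from datetime import date, timedelta

def keep_snapshot(taken, today, daily_keep_days=21, weekly_keep_days=365):
    """Decide whether a snapshot survives under a hypothetical tiered policy:
    every daily is kept for a few weeks, while the Sunday snapshot of each
    week is treated as a 'weekly' and kept much longer."""
    age = (today - taken).days
    if age <= daily_keep_days:
        return True                      # recent daily: always kept
    if taken.weekday() == 6:             # Sunday snapshots act as weeklies
        return age <= weekly_keep_days
    return False                         # older daily: expired

# With 60 daily snapshots, only the recent ones plus the older Sundays remain.
today = date(2016, 6, 1)
snaps = [today - timedelta(days=d) for d in range(60)]
kept = [s for s in snaps if keep_snapshot(s, today)]
```

The point is that files alive for at least a week always land in some weekly snapshot, while short-lived files age out with the dailies.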
Is this something you could "sell" to your users/bosses/customers?
Note that technically this breaks the "rule" of deleting snapshots back to front in chronological order, which yields the best performance for the snapshot delete job. But it's usually acceptable with modern OneFS, where out-of-order deletions have been improved. Even more so if you have SSD acceleration for the metadata.
Depending on how long you have to keep the snapshots and how many there are, it may make more sense to keep a physical copy.
Also, if they want to keep the snapshots "forever", you should argue for backups to get the data off the cluster. That's just not what snapshots are designed for.
With an increasing quantity of snapshots you get a more complex snapshot relationship: the more snapshots, the more complex the process of freeing unreferenced blocks.
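A toy model of why this gets harder (this is an illustration, not OneFS internals): a block can only be freed when no remaining snapshot still references it, so deleting one snapshot means checking its blocks against every other snapshot still present, and that work grows with the snapshot count.

```python
def freeable_blocks(snapshots, victim):
    """Return the block IDs referenced only by `victim`: those are the
    only blocks that deleting it can actually free."""
    others = set().union(*(blocks for name, blocks in snapshots.items()
                           if name != victim))
    return snapshots[victim] - others

# Hypothetical snapshots, each referencing a set of block IDs.
snaps = {
    "daily_1": {1, 2, 3},
    "daily_2": {2, 3, 4},   # shares blocks 2 and 3 with daily_1
    "daily_3": {4, 5},
}
print(freeable_blocks(snaps, "daily_1"))  # only block 1 is unreferenced elsewhere
```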
Also keep in mind the snapshots you may get from SyncIQ, if it's in use.
I wanted to modify my SnapshotDelete job to run off-hours, but EMC had some serious concerns about this: we would have around 150 snapshots queued up by the time SnapshotDelete starts, which would raise the CPU resources needed compared to a regularly running SnapshotDelete.
Thanks for the replies fellas.
I understand the head scratching on this one, I did the same thing at first.
The reasoning is that we have some customers who deleted things they shouldn't have, and we do not do any long-term backups of the Isilon. The goal is to implement infinite snapshots until we upgrade to 8 and then apply Enterprise SmartLock to these shares.
Fortunately it is only 4 shares and only for around 6-8 months. We only kick off a snapshot once a day, so this will bump the count up by about 700-1000 on top of our current 600 total.
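For what it's worth, the 700-1000 estimate checks out as simple arithmetic (using ~30 days per month as a rough figure):

```python
# Rough sanity check: 4 shares, one snapshot per day per share,
# accumulating for 6-8 months under infinite retention.
shares = 4
for months in (6, 8):
    extra = shares * months * 30        # ~30 days per month
    print(months, "months ->", extra, "extra snapshots")
# 6 months -> 720, 8 months -> 960, i.e. the "700-1000 on top of 600" range.
```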
Right now, based on what I understand, the only issue I'm likely to run into is long SnapshotDelete jobs after we put these back to 30 days. What do you guys think?
I have a 13-node X400 cluster (no SSDs). Snapshots are taken once a day with a 35-day expiration, so my total snapshot count today is around 2k. I don't see any issues with the SnapshotDelete job; it completes within 2 hours (daily).
Thanks Dynamox. Though I'm guessing that once I expire these and set the schedule back to 30 days, I'll be deleting quite a few more than you usually do, around 700 snapshots! I really don't think that will be a problem, because it's a finite number of snapshots to delete, and even if the job runs long it will eventually catch up.
Well, I feel better now; I think reading about the old limit had me overly concerned. I'm still not sure this is the right way to solve the problem, but I digress. At least it doesn't sound like the cluster will keel over.