Our security admins are requesting that we exclude certain file extensions (i.e. malware extensions) from all backups. Is it possible to have global exclusions that affect every dataset, without having to add the same exclusions to each dataset?
Secondly, how much do file extension exclusions affect backup performance? If it's significant, what's the best way to minimize the impact?
In order to exclude certain file extensions, you will have to tell avtar which extensions to exclude. Unfortunately, I believe there are only two ways to do that:
1. Add them to every file system dataset
2. Add them to avtar.cmd on every file system client
In most cases, it is easier to add them to the datasets than to avtar.cmd on every file system client.
Regarding the second question, excluding a file will mean that the file isn't backed up and therefore isn't processed by avtar. Effectively this will result in a positive change to performance compared with a backup where the file is not excluded.
Moving beyond malware concerns, if you exclude extensions that cover many large, frequently changing files, this can result in significantly quicker backups.
The following article discusses this in more detail
nr is largely correct but I thought I should clarify that nothing is free.
If you are no longer backing up the excluded files, there will likely be a performance gain from that (potentially a very large one, especially if the files have high change rates). However, there is still an overhead cost associated with checking whether each file matches any of the defined exclude patterns, because each file must be checked against the items in the exclude list one by one. This cost is extremely low on a file-by-file basis (say, a few microseconds per file), but it can add up if you have a very large number of excludes (thousands) and a dataset with a very large number of files (millions).
To minimize this overhead, keep the number of excludes defined in the dataset to a reasonable number.
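To put rough numbers on that, here is a minimal Python sketch of the one-by-one check described above. It is not how avtar is implemented. The example patterns, the `is_excluded` and `worst_case_overhead_seconds` helpers, and the cost of 1 microsecond per pattern check are all assumptions made purely for illustration.

```python
from fnmatch import fnmatch


def is_excluded(filename, patterns):
    """Check a file name against each exclude pattern in turn.

    Returns (excluded, checks), where checks is how many patterns
    were tested before a match was found or the list ran out.
    """
    checks = 0
    for pattern in patterns:
        checks += 1
        if fnmatch(filename, pattern):
            return True, checks
    return False, checks


def worst_case_overhead_seconds(num_files, num_patterns, microseconds_per_check):
    """Upper bound on matching overhead: every file is tested against every pattern.

    microseconds_per_check is an assumed figure, not a measured avtar value.
    """
    return num_files * num_patterns * microseconds_per_check / 1_000_000


# Hypothetical exclude list, for illustration only.
patterns = ["*.tmp", "*.bak", "*.lock"]

print(is_excluded("report.docx", patterns))  # (False, 3): no match, all patterns tested
print(is_excluded("old.bak", patterns))      # (True, 2): matched on the second pattern

# One million files, assuming 1 microsecond per pattern check:
print(worst_case_overhead_seconds(1_000_000, 5, 1))      # 5.0 seconds
print(worst_case_overhead_seconds(1_000_000, 1_000, 1))  # 1000.0 seconds
```

Under these assumed figures, a handful of excludes costs seconds even across millions of files, while thousands of excludes push the overhead into many minutes. Real costs depend on the client hardware and on how the matching is actually performed.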
Thanks for the replies.
Ian, that overhead is what I'm concerned about. Some of our datasets have upwards of 8 million files. I have been given five exclusions so far, but I'm likely to be given more in the future. Would you expect much impact to my backups?
Thanks Ian. I assume the previous posters were correct that the only ways to exclude are per client or per dataset (i.e. there are no true global exclusions).
Michael, I've been asked to exclude extensions relevant to the recent WannaCry malware attack:
Thanks brastedd! I was thinking that might be it, but always on the lookout for additional information and techniques/practices. Much appreciated!