Unsolved
2 Posts
0
5732
Need to back up a large number of files
Hi, we need to back up a Windows server with millions of small files. A regular backup using Avamar 6.0.1.66 takes too long, and the gc process eventually cancels the backup. What's the best way to take a backup in this case?
Lucky85
173 Posts
1
August 9th, 2012 23:00
I'm afraid there is no easy way. If the filesystem structure allows, you can cut the backup into several pieces, but then you will not be able to back up all of the data every day. With Avamar, I think the only option is to stretch the backup window as much as possible.
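If you do go down the splitting route, here is a rough sketch of how the pieces could be chosen so each dataset ends up about the same size. The root path and number of pieces are only examples, not anything specific to the original poster's server:

```python
import os

# Hypothetical example: group the top-level directories of a large file
# share into a few roughly equal "pieces", each of which could become
# its own backup dataset scheduled on a different day.
ROOT = r"D:\data"        # example path; adjust to your share
PIECES = 3               # example number of datasets

def count_files(path):
    """Count files under a directory (slow on the first run, but it
    tells you how big each piece really is)."""
    total = 0
    for _, _, files in os.walk(path):
        total += len(files)
    return total

# Size up each top-level directory, then assign the largest ones first
# to whichever piece currently has the fewest files (greedy balancing).
dirs = [(count_files(os.path.join(ROOT, d)), d)
        for d in os.listdir(ROOT)
        if os.path.isdir(os.path.join(ROOT, d))]

groups = [[] for _ in range(PIECES)]
totals = [0] * PIECES
for size, name in sorted(dirs, reverse=True):
    idx = totals.index(min(totals))
    groups[idx].append(name)
    totals[idx] += size

for i, (members, total) in enumerate(zip(groups, totals), 1):
    print(f"Dataset {i}: ~{total} files -> include {members}")
```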
Generally, a Windows server with millions of small files is nearly impossible to back up this way. The only other way I know to back it up is NetWorker with the SnapImage module. I would suggest the customer redesign their environment: move the data to a NAS device and use the NDMP protocol…
Lukas
Avamar Exorcist
462 Posts
1
August 10th, 2012 03:00
Another option you have is to try to identify the main bottlenecks associated with this particular backup. You may be able to work around certain types of bottlenecks.
The Powerlink KB article esg120355 should be able to help you with this. It also explains the expected performance of an Avamar backup so you can determine whether the number of files in the dataset can realistically be backed up within the available timeframe.
The KB has just been updated and may take a day or two to become visible in Powerlink.
February 2016: the KB article ID and URL have changed.
ionthegeek
2K Posts
1
August 10th, 2012 06:00
Keep in mind that we can typically "stat" (query) roughly 1 million files per hour. We have to stat every file to check whether or not it has changed.
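As a back-of-the-envelope check, that stat rate alone tells you whether the scan phase can even finish inside the window. The file count, rate, and window below are example numbers only:

```python
# Rough estimate of how long the file-scan ("stat") phase alone takes,
# using the ~1 million files per hour figure mentioned above.
# All inputs are example values; substitute your own.
total_files = 11_000_000       # example: a server with 11 million files
stat_rate_per_hour = 1_000_000
backup_window_hours = 8

scan_hours = total_files / stat_rate_per_hour
print(f"Scan phase alone: ~{scan_hours:.1f} hours")

if scan_hours > backup_window_hours:
    print("The scan will not finish inside the window; "
          "split the dataset or change the approach (e.g. NDMP).")
else:
    print("The scan fits; the remaining window is for reading changed data.")
```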
Further to Nicholas Ricioppo's reply above, if you calculate the bottleneck and determine that an incremental backup of the client would fit in the backup window, you could take an "incremental" approach to the backup. Start with a small amount of data in the dataset, then add more to it each day.
This would let you back up the client in pieces while overcoming the 7-day retention limit for partial backups.
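As a simple illustration of that "grow the dataset" schedule, the plan below starts with a small include list and adds one more piece each night until the whole server is covered. The paths are hypothetical examples:

```python
# Illustrative "grow the dataset" schedule: begin with a small include
# list and add one more directory (or LUN) each night until everything
# is covered. The paths are hypothetical examples.
pieces = [r"D:\data\docs", r"D:\data\images", r"D:\data\archive", r"E:\\"]

included = []
for night, piece in enumerate(pieces, start=1):
    included.append(piece)
    print(f"Night {night}: dataset includes {included}")
```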
Lucky85 also had a good suggestion -- NDMP would be a better solution in this case. With NDMP backups, we don't have to "stat" the files (instead, we trust the NAS device to tell us which files have changed since the last successful backup).
sheilaa2
39 Posts
1
August 14th, 2012 06:00
I'm backing up a large file server with millions of files without issue, but it did take some time to get it configured. The server itself is about 4 TB with several SAN LUNs attached. When the system was added into Avamar, I started out by doing a system backup during business hours, as this does not impact performance too much. Once that was done, I modified the dataset to include one of the LUNs for that night. It did take some time. Each night I added more LUNs to the dataset until I had everything covered. Like I said, it took a while, but now it backs up nightly in just under 3 hours! I think that's great performance...
Hope this helps.
aj2546
49 Posts
0
August 14th, 2012 09:00
Can you restore these files if the server crashes? Restoring millions of files with any backup application is a challenge.
sheilaa2
39 Posts
0
August 15th, 2012 11:00
I agree that it will be a challenge, but the plan is to do this LUN by LUN. It will take some time.
BSeizer
12 Posts
0
February 23rd, 2016 11:00
I have a similar problem.
The problem is the enormous number of files that have to be checked.
Even if there were no changes to the files since the last backup, the system still has to create a "to-do" list of every file that needs to be checked. 11 million files makes it a big list.
De-dupe is not going to help you in this situation.
My SQL server is a VM guest server.
One drive letter, I: drive has over 11 million files on it.
The SQL backup of the server is not the problem.
The file system backup of the server is the problem.
This is what I did.
Created several datasets that are intended for this particular server.
1st dataset backs up all drive letters and excludes drive I:. This runs every day.
2nd dataset backs up the I:\DOC directory and its subdirectories. This runs M-W-F and runs for 7 hours.
3rd dataset backs up the I:\IMAGES directory and its subdirectories. This runs T-Th-Sat and runs for 6 hours.
There is then a separate policy group that points to each respective dataset.
The I: drive backups are allowed overtime every day, so they don't get cut off for maintenance.
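For what it's worth, that alternating schedule is simple enough to express as a sketch. The dataset labels below mirror the scheme described above but are otherwise just illustrative names, not real Avamar object names:

```python
import datetime

# Illustrative sketch of the day-of-week rotation described above:
# which of the two I: drive datasets should run tonight.
ROTATION = {
    0: "I_DOC",     # Monday
    1: "I_IMAGES",  # Tuesday
    2: "I_DOC",     # Wednesday
    3: "I_IMAGES",  # Thursday
    4: "I_DOC",     # Friday
    5: "I_IMAGES",  # Saturday
    # Sunday (6): no I: drive backup in this scheme
}

today = datetime.date.today().weekday()
dataset = ROTATION.get(today)
if dataset:
    print(f"Tonight's I: drive dataset: {dataset}")
else:
    print("No I: drive backup scheduled tonight.")
```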
I also have a separate backup that creates an image-level backup (ILB) through VMware and takes a complete snapshot of the server, including the non-SQL files.
This is great for backing up the I: drive files quickly, but the ILB does not retain the ACLs associated with the folder structure. The SQL admins want the ACLs retained.
I could always do a granular restore and just inherit permissions from the folder above, but there is no guarantee that that is how it was originally set.
So, in all, I run 4 backups on this server each night.
1. SQL backup
2. File system backup, excluding the I: drive
3. I: drive of either DOC or IMAGES depending on the day of the week.
4. VMWare ILB backup of the server.
That's about the best that I could come up with.
I hope that is helpful.
Dani_
15 Posts
0
February 24th, 2016 02:00
Hi UB,
I think the first thing you need to do is check whether the local file/hash cache files are large enough for the job. Do you know if there are a lot of file changes between backups? Normally a file server is fairly static.
Are the files spread over different disks? If so, you could try using a separate file cache file per disk, because by default only 1/8 of memory is reserved for the file cache. You can also increase the maximum size of the file/hash caches, but be careful: best practice is not to use more than 25% of memory for backups.
Then set the backup to run until it's done. The next day the caches will be filled and optimized, and the backup should run better.
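To get a feel for whether the file cache can even hold an entry for every file, here is a rough sizing sketch. The bytes-per-entry figure is an assumption for illustration only, not an official Avamar number; check the capacity planning documentation for your release:

```python
# Rough check of whether the client's file cache can hold an entry for
# every file on the server. BYTES_PER_ENTRY is an assumed ballpark
# figure, not an official Avamar value.
total_files = 11_000_000
BYTES_PER_ENTRY = 44                      # assumption for illustration
client_ram_bytes = 16 * 1024**3           # example: 16 GiB of RAM
file_cache_fraction = 1 / 8               # default mentioned above

cache_needed = total_files * BYTES_PER_ENTRY
cache_available = client_ram_bytes * file_cache_fraction

print(f"File cache needed:    ~{cache_needed / 1024**2:.0f} MiB")
print(f"Default cache budget: ~{cache_available / 1024**2:.0f} MiB")
if cache_needed > cache_available:
    print("Default cache is too small; every uncached file forces extra work, "
          "so consider raising the cache limits (staying within ~25% of RAM).")
```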
Regards,
Daniel