I am new to EMC, new to Celerra, and new to NDMP, so I'm trying to keep up.
I am in charge of implementing a backup strategy for roughly 15 TB of files/folders on our Celerra NS960. Currently we have two virtual data movers: server_2 and server_3. Using NetBackup 6.5.4 I am seeing these backup speeds over fiber SAN direct to tape (shared tape drives):
Server_2 102 MB/Sec (362 GB/Hour) with 4 NDMP backup threads (PAX??)
Server_3 96 MB/Sec (338 GB/Hour) with 3 NDMP backup threads
At this combined speed of 700 GB/hour, we are at 22 hours to do a full backup. All this seems slow to me, and I was hoping to get a good read on roughly how fast I should be seeing, plus any tips/advice for speeding things up. I've seen a lot of configurable parameters for NDMP backups on the Celerra: bufsz, threads, etc. I wonder which I should really look into?
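For what it's worth, here is the back-of-the-envelope arithmetic behind that 22-hour estimate (a minimal shell sketch; the 15 TB and 700 GB/hour figures are the ones quoted above):

```shell
# Backup window estimate: 15 TB of data at a combined 700 GB/hour.
total_gb=$((15 * 1024))   # 15 TB expressed in GB
rate=700                  # GB/hour, combined across both data movers
hours=$(awk -v t="$total_gb" -v r="$rate" 'BEGIN { printf "%.1f", t / r }')
echo "Full backup window: roughly ${hours} hours"
```

That comes out just under 22 hours, which matches the window I'm seeing.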
Greetings! A good place to start is to take a look at the document, "Configuring NDMP Backups on Celerra". In the middle of the document, there's a section on Tuning Backup Parameters. There's a nice flowchart you can use to try different backup parameters.
You can also consider using VBB instead of PAX (default) for your backups. We experimented with VBB on our NS80, and saw only a slight improvement on our backups (mostly small files on 1TB to 6TB filesystems). Also, we have requirements for our restores, so read the notes about VBB carefully. You should try a backup and see if VBB performance is worth the change.
Please let us know what you find.
Most of the 15TB I have to backup is small files as well, unfortunately.
What kind of performance do you see? Faster or slower than 700 GB/hour?
The huge problem for me is that most backup parameters seem to require a reboot of the virtual data mover to test the change. This isn't possible for me, as it will take three weeks of paperwork to get approval for a reboot on a maintenance night. So I need to gather as much information as I can on real-world configurations to make the best move I can.
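In the meantime, just *viewing* the current values shouldn't need a reboot. A hedged sketch of what I believe works from the Control Station (the facility and parameter names below are from memory and vary by DART release, so please verify them against the "Configuring NDMP Backups on Celerra" document before relying on this):

```shell
# Read-only inspection of backup-related parameters on the Control Station.
# Listing/inspecting does not restart the Data Mover; only -modify + reboot
# changes behavior for most of these.
server_param server_2 -facility NDMP -list     # e.g. bufsz, concurrentDataStreams
server_param server_2 -facility PAX -list      # PAX thread/buffer tuning knobs
server_param server_2 -facility NDMP -info bufsz
```

That would at least let me document current settings while waiting on the change-control paperwork.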
as usual with backups, your performance depends on the file sizes and on whether there is enough disk performance behind them
you might want to see how your disks are doing - reshuffling filesystems so that not too many backups hit the same disks at the same time can help if there is a bottleneck there
for lots of small files it's worth taking a look at VBB
also - since you have a NS-960 you could upgrade to DART 6.0 - that will give you more usable data mover memory and allow you to increase the number of concurrent NDMP jobs to up to 8
VBB will definitely make a difference for lots of small files
just make sure you understand the single-file restore limitation if you also use deduplication on that filesystem
I'm using bufsz 256, LTO-4 drives, NBU 6.5.3, and PAX, and I average 280 to 400 GB/hr depending on the filesystem. I get better performance on filesystems with large CD and DVD ISOs (no, not my movie collection! ).
Yes, unfortunately a lot of the tuning process will involve reboots. We're an educational institution, so I scheduled a day during Christmas break for testing. If people want better backup performance badly enough, they'll negotiate for it. Otherwise, they didn't care that much about it and they'll let it go...
I'm trying to reply to everyone at once, sorry for confusion:
We are at DART 5.6.44-4 for what that's worth. Our amazing rate of adoption will keep DART 6.0 in the distant future.
Here's a specific question that maybe one of you will be able to answer.
I'm using NetBackup connecting via NDMP to write data directly to tape from the Celerra. I have a volume on the Celerra for which the only path I can specify is /root_vdm_1/data4/tc_data/
PAX reports only 1 stream running with that specific policy, from which I get a dismal ~20 MB/sec. (When all of my policies are running, hitting different areas and volumes, I get a combined performance of 700 GB/hour.) Is that the way NDMP is supposed to work here?
My first idea to fix this problem was to break up my backup selection using regular expressions, ..\[A-L] and ..\[M-Z], but alas, NDMP doesn't support that.
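One workaround I'm considering, since NDMP starts one stream per backup-selection entry: list the top-level directories under that path individually in the policy's backup selections instead of the single parent path. A hedged sketch of what the selections list might look like (the directory names below are hypothetical placeholders, not my real layout):

```
/root_vdm_1/data4/tc_data/dirA
/root_vdm_1/data4/tc_data/dirB
/root_vdm_1/data4/tc_data/dirC
```

The catch is that new top-level directories would have to be added to the policy by hand, or they'd silently miss backup.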
Karl, you're making me feel less alone in the world haha, similar configuration: NBU 6.5.4, PAX, LTO-4 drives (16 available, but as I understand it I can only use 4 of them with the Celerra?)
Please understand this is my first dive into the world of NDMP and NAS backups. My expertise is elsewhere. So, feel free to correct me when I use bad terminology.
I'd like to show you what I'm seeing from the server_pax -stats command. Maybe someone can see where I am going horribly wrong:
[admin@emcnas ~$] server_pax server_2 -stats -verbose
SUMMARY PAX STATS ****************
NASS STATS -
nass thid 0 **
Total file processed: 21
throughput: 0 files/sec
Total nass wait nasa count: 0
Total nass wait nasa time: 0 msec
Total time since last reset: 101 sec
fts_build time: 100 sec
getstatpool: 508 buffers putstatpool: 3 buffers
nass01 is not doing backup
nass02 is not doing backup
nass03 is not doing backup
NASA STATS -
nasa thid 0 is running backup with tar format **
Backup root directory: /root_vdm_1/data4/tc_data
Total bytes processed: 1857482007
Total file processed: 21
throughput: 17 MB/sec
average file size: 86378KB
Total nasa wait nass count: 12
Total nasa wait nass time: 36466 msec
Total time since last reset: 101 sec
Tape device name: c16t5l1
dir or 0 size file processed: 10
1 -- 8KB size file processed: 1
8KB+1 -- 16KB size file processed: 1
16KB+1 -- 32KB size file processed: 3
32KB+1 -- 64KB size file processed: 0
64KB+1 -- 1MB size file processed: 4
1MB+1 -- 32MB size file processed: 0
32MB+1 -- 1GB size file processed: 1
1G more size file processed: 1
nasa01 is not doing backup/restore
nasa02 is not doing backup/restore
nasa03 is not doing backup/restore
NASW STATS -
nasw00 BACKUP (in progress)
Nasw Total Time: 00:01:41 (h:min:sec)
Nasw Idle Time: 00:01:23 (h:min:sec)
KB Transferred: 1813896 Block Size: 64512 (63 KB)
Average Transfer Rate: 17 MB/Sec (61 GB/Hour)
__Point-in-Time__ (over the last 10 seconds)
Rate: 16 MB/Sec (57 GB/Hour) Idle: 850 msec/sec
Get Pool: 0 buffers Put Pool: 255 buffers
nasw01 BACKUP (terminated)
nasw02 BACKUP (terminated)
nasw03 BACKUP (terminated)
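As a sanity check on my own reading of that output, the NASW summary numbers are at least internally consistent; a quick sketch of the arithmetic (figures copied straight from the stats above):

```shell
# NASW reported 1813896 KB transferred over 101 seconds elapsed.
kb=1813896
secs=101
mb_per_sec=$(awk -v kb="$kb" -v s="$secs" 'BEGIN { printf "%d", kb / 1024 / s }')
echo "Average transfer rate: ~${mb_per_sec} MB/sec"
```

That reproduces the reported 17 MB/sec, so the bottleneck is real, not a reporting artifact. The 850 msec/sec idle figure suggests the writer thread is mostly waiting on data rather than on tape.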
reading pax stats isn't that easy, and tuning usually means changing parameters and trying different values
I think there was a TechNote on Powerlink about PAX stats