September 14th, 2010 09:00

Celerra NS960 NDMP Backup Questions

Hello,

I am new to EMC, new to Celerra, and new to NDMP... trying to keep up.

I am in charge of implementing a backup strategy for roughly 15 TB of files/folders on our Celerra NS960. Currently we have two data movers: server_2 and server_3. Using NetBackup 6.5.4, I am seeing these backup speeds over Fibre Channel SAN direct to tape (shared tape drives):

Server_2 102 MB/Sec (362 GB/Hour) with 4 NDMP backup threads (PAX??)

Server_3 96 MB/Sec (338 GB/Hour) with 3 NDMP backup threads

At this combined speed of 700 GB/Hour, we are looking at 22 hours for a full backup. All this seems slow to me, and I was hoping to get a good read on how fast I should be seeing, plus any tips/advice for speeding things up. I've seen a lot of configurable parameters for NDMP backups on the Celerra (bufsz, threads, etc.) and wonder which I should really look into.
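A quick sanity check of that math (a sketch only; assuming binary TB, with the per-server rates taken from the numbers above):

```python
# Backup-window arithmetic from the figures quoted above
server_2_gb_per_hour = 362   # ~102 MB/Sec
server_3_gb_per_hour = 338   # ~96 MB/Sec
total_tb = 15

combined = server_2_gb_per_hour + server_3_gb_per_hour   # 700 GB/Hour
hours = total_tb * 1024 / combined                       # full-backup window
print(combined, round(hours, 1))                         # 700, ~21.9 hours
```

So the quoted 22-hour window is consistent with the per-server rates.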

Thanks!

8.6K Posts

September 14th, 2010 13:00

Hi,

As usual with backups, your performance depends on the file sizes and on whether there is enough disk performance.

You might want to see how your disks are doing - reshuffling so that not too many backups hit the same disks at the same time might help if there is a bottleneck there.

For lots of small files it's worth taking a look at VBB.

Also - since you have an NS-960 you could upgrade to DART 6.0 - that will give you more usable data mover memory and allow you to increase the number of concurrent NDMP jobs to up to 8.

Rainer

September 14th, 2010 13:00

Greetings!  A good place to start is to take a look at the document, "Configuring NDMP Backups on Celerra".  In the middle of the document, there's a section on Tuning Backup Parameters.  There's a nice flowchart you can use to try different backup parameters.

You can also consider using VBB instead of PAX (default) for your backups.  We experimented with VBB on our NS80, and saw only a slight improvement on our backups (mostly small files on 1TB to 6TB filesystems).  Also, we have requirements for our restores, so read the notes about VBB carefully.  You should try a backup and see if VBB performance is worth the change.

Please let us know what you find.

Thanks!

Karl

8.6K Posts

September 14th, 2010 13:00

VBB will definitely make a difference for lots of small files.

Just make sure you understand the single-file restore limitation if you also use deduplication on that filesystem.

Rainer

9 Posts

September 14th, 2010 13:00

I'd like to show you what I'm seeing from the server_pax -stats command. Maybe someone can see where I am going horribly wrong.

[admin@emcnas ~]$ server_pax server_2 -stats -verbose

server_2 :

**************** SUMMARY PAX STATS ****************


  NASS STATS -


** nass thid 0 **


Total    file    processed: 21

throughput: 0 files/sec

Total nass wait nasa count: 0

Total nass wait nasa time: 0 msec

Total time since last reset: 101 sec

fts_build time: 100 sec

getstatpool: 508 buffers        putstatpool: 3 buffers

nass01 is not doing backup

nass02 is not doing backup

nass03 is not doing backup

-


  NASA STATS -


** nasa thid 0 is running backup with tar format **


Backup root directory: /root_vdm_1/data4/tc_data

Total bytes processed: 1857482007

Total    file    processed: 21

throughput: 17 MB/sec

average file size: 86378KB

Total nasa wait nass count: 12

Total nasa wait nass time: 36466 msec

Total time since last reset: 101 sec

Tape device name: c16t5l1

dir    or  0  size file processed: 10

    1 -- 8KB  size file processed: 1

8KB+1 -- 16KB size file processed: 1

16KB+1 -- 32KB size file processed: 3

32KB+1 -- 64KB size file processed: 0

64KB+1 -- 1MB  size file processed: 4

1MB+1 -- 32MB size file processed: 0

32MB+1 -- 1GB  size file processed: 1

    1G more  size file processed: 1

nasa01 is not doing backup/restore

nasa02 is not doing backup/restore

nasa03 is not doing backup/restore

-


  NASW STATS -


nasw00 BACKUP  (in progress)

Nasw Total Time: 00:01:41 (h:min:sec)

Nasw Idle  Time: 00:01:23 (h:min:sec)

KB Transferred: 1813896  Block Size: 64512 (63 KB)

Average Transfer Rate: 17 MB/Sec (61 GB/Hour)

__Point-in-Time__ (over the last 10 seconds)

Rate: 16 MB/Sec (57 GB/Hour)  Idle: 850 msec/sec

Get Pool: 0 buffers  Put Pool: 255 buffers

nasw01 BACKUP  (terminated)

nasw02 BACKUP  (terminated)

nasw03 BACKUP  (terminated)
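One way to read the nasw section (a rough interpretation, times taken from the output above): the tape writer thread is idle most of the time, which would suggest the tape side is waiting on the file scan rather than the other way around.

```python
# nasw00 totals from the stats above (h:min:sec converted to seconds)
total_sec = 1 * 60 + 41   # Nasw Total Time 00:01:41
idle_sec = 1 * 60 + 23    # Nasw Idle  Time 00:01:23

idle_pct = idle_sec / total_sec * 100
print(round(idle_pct))    # ~82% idle, in line with the "850 msec/sec" point-in-time figure
```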

September 14th, 2010 13:00

I'm using bufsz 256, LTO-4 drives, NBU 6.5.3, and PAX, and I average 280 to 400 GB/hr depending on the filesystem.  I get better performance on filesystems with large CD and DVD ISOs (no, not my movie collection!).

Yes, unfortunately a lot of the tuning process will involve reboots.  We're an educational institution, so I scheduled a day during Christmas break for testing.  If people want better backup performance badly enough, they'll negotiate for it.  Otherwise, they don't care that much about it and they'll let it go...

Thanks!

Karl

20.4K Posts

September 14th, 2010 13:00

I have IBM 3592 tape drives (CDL emulation) presented, so I set bufsz to 384 and get about 420 GB/hour for 4 concurrent streams.

9 Posts

September 14th, 2010 13:00

Most of the 15 TB I have to back up is small files as well, unfortunately.

What kind of performance do you see? Faster or slower than 700 GB/hour?

The huge problem for me is that most backup parameters seem to require a reboot of the data mover to test the change. That isn't possible for me, as it will take 3 weeks of paperwork to get approval to reboot on a maintenance night. So I need to gather as much information as I can on real-world configurations to make the best move I can.

9 Posts

September 14th, 2010 13:00

I'm trying to reply to everyone at once, sorry for the confusion:

We are at DART 5.6.44-4, for what that's worth. Our amazing rate of adoption will keep DART 6.0 in the distant future.

Here's a specific question that maybe one of you will be able to answer.

I'm using NetBackup connecting via NDMP to write data directly to tape from the Celerra. I have a volume on the Celerra for which I can only specify the path /root_vdm_1/data4/tc_data/.

PAX reports only 1 stream running with that specific policy, and I get a dismal ~20 MB/Sec from it. (When all of my policies are running, hitting different areas and volumes, I get a combined performance of 700 GB/hour.) Is that the way NDMP is supposed to work here?

My first idea to fix this was to break up my backup selection using regular expressions like ..\[A-L] and ..\[M-Z], but alas, NDMP doesn't support that.

Karl, you're making me feel less alone in the world, haha - similar configuration: NBU 6.5.4, PAX, LTO-4 drives (16 available, but as I understand it I can only use 4 of them with the Celerra?).

Please understand this is my first dive into the world of NDMP and NAS backups. My expertise is elsewhere, so feel free to correct me when I use bad terminology.

8.6K Posts

September 14th, 2010 13:00

Reading PAX stats isn't that easy, and tuning usually requires making changes and trying different params.

I think there was a TechNote on Powerlink about PAX stats.

8.6K Posts

September 14th, 2010 14:00

In case your tc_data contains mostly files of just a few blocks, your numbers aren't that bad.

On any system, for small files the work to find the file and its metadata is larger than reading the data itself.

Then it's actually more a matter of files/sec than MB/sec.

Even for a single NDMP job the data mover internally uses multiple threads to help performance - but there is only so much you can do with small files.

You might consider using more checkpoints for restore and doing a full backup less often.

Or consider volume-based backup (VBB).

9 Posts

September 14th, 2010 14:00

Like this? How would you classify this - mostly small files, right?

1283326876: NDMP: 4: Thread nasa01 write count 6281360260775

1283326876: NDMP: 4: Thread nasa01 time used 169597 sec

1283326876: NDMP: 4: Thread nasa01 write rate 36168 KB/Sec

1283326876: NDMP: 4: Thread nasa01 read rate 0 KB/Sec

1283326876: NDMP: 4: Thread nasa01 average file size: 1136KB

1283326876: NDMP: 4: Thread nasa01 dir or 0 size file processed: 704135

1283326876: NDMP: 4: Thread nasa01 1B -- 8KB size file processed: 863560

1283326876: NDMP: 4: Thread nasa01 8KB -- 16KB size file processed: 218955

1283326876: NDMP: 4: Thread nasa01 16KB -- 32KB size file processed: 473924

1283326876: NDMP: 4: Thread nasa01 32KB -- 64KB size file processed: 569903

1283326876: NDMP: 4: Thread nasa01 64KB -- 1MB size file processed: 1876819

1283326876: NDMP: 4: Thread nasa01 1MB -- 32MB size file processed: 670717

1283326876: NDMP: 4: Thread nasa01 32MB -- 1GB size file processed: 20890

1283326876: NDMP: 4: Thread nasa01 1G more size file processed: 229

1283326876: NDMP: 6: Thread nasa01 server_archive: emctar vol 1, 5399132 files, 0 bytes read, 6281360260775 bytes written
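The histogram counts above can be summed to see the mix (counts and totals copied from the log lines; the size buckets are labeled as in the log):

```python
# File-size histogram from the server_log output above
counts = {
    "dir_or_0": 704135,
    "1B-8KB": 863560,
    "8KB-16KB": 218955,
    "16KB-32KB": 473924,
    "32KB-64KB": 569903,
    "64KB-1MB": 1876819,
    "1MB-32MB": 670717,
    "32MB-1GB": 20890,
    ">1GB": 229,
}
total_files = sum(counts.values())          # 5399132, matching the emctar summary line
total_bytes = 6281360260775                 # "bytes written" from the log
avg_kb = total_bytes / total_files / 1024   # ~1136 KB, matching the reported average

small_buckets = ("dir_or_0", "1B-8KB", "8KB-16KB", "16KB-32KB", "32KB-64KB")
small = sum(counts[k] for k in small_buckets)
print(total_files, round(avg_kb), round(small / total_files * 100))
```

So about half the entries are 64 KB or smaller, but the byte count is dominated by the larger files, which pulls the average file size up over 1 MB.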

In my search of the server_log I find an alarming number of these:

1283326992: FCP: 3: c64t0l0: abort called

|_SCSI ERR ON #6400 camstat 0x04 bstat DeviceError

1283327025: CAM: 3: 1: The SCSI HBA 64 is operating normally.

1283327038: CAM: 3: check_timeout (64) : c64t0l0 req 0x5f5f5098 timeout (10 sec 57605296257 cyc)

func 0x01 flags 0xc1 stat 0x00 cdb 00000000000000000000

Any idea what those are all about?

8.6K Posts

September 14th, 2010 14:00

If you are unsure about your file sizes, a good first step is to look at your server_log after the backup has completed.

PAX puts a file-size histogram there.

Rainer

8.6K Posts

September 14th, 2010 14:00

Actually, not that small.

I don't know about the errors - I suggest opening a service request if you care about them.

9 Posts

September 14th, 2010 15:00

Anything look out of place here?

nasadmin@emcnas slot_2$ server_param server_2 -facility PAX -list

server_2 :

param_name facility default current configured

checkUtf8Filenames PAX 1 1

dump PAX 0 0

nPrefetch PAX 8 8

nThread PAX 64 128 128

writeToArch PAX 1 1

paxReadBuff PAX 64 64

writeToTape PAX 1 1

filter.numDirFilter PAX 5 5

paxWriteBuff PAX 64 256 256

filter.numFileFilter PAX 5 5

filter.dialect PAX '' ''

nFTSThreads PAX 8 16 16

paxStatBuff PAX 128 512 512

readWriteBlockSizeInKB PAX 64 64

nRestore PAX 8 8

filter.caseSensitive PAX 1 1

scanOnRestore PAX 1 1

noFileStreams PAX 0 0

allowVLCRestoreToUFS PAX 0 0

global_params:

param_name facility configured_value

nFTSThreads PAX 16

nThread PAX 128

paxWriteBuff PAX 256

paxStatBuff PAX 512

8.6K Posts

September 14th, 2010 16:00

I don't know, sorry - troubleshooting backup performance takes more than a forum post and a couple of minutes.

I don't work for support or professional services - maybe others want to chime in

You can also inquire about a backup assessment through your sales channels

Rainer
