DanJost
January 18th, 2011 12:00
Have you upgraded or enabled dedupe? After I upgraded to DART 6 and the super-de-dupe ran, my NDMP backup times increased by several hours - but the data on the tape is a lot smaller... when I say several hours, it actually went from 40 to 52, and 90% of it is FC.
Dan
uninitializing
January 19th, 2011 00:00
Thank you Dan,
I did check deduplication - we tested it some time ago, but for various reasons it was never enabled. I also deleted checkpoints just in case.
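In case it helps anyone else, this is roughly how I checked - fs_dedupe is the Celerra dedupe CLI as far as I know, and the exact option names may differ by DART version:

# List the dedupe state of all filesystems, then the one in question:
fs_dedupe -list
fs_dedupe -info fs01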
We do have auditing enabled on the filesystem, but for the time being I have not disabled it.
EMC support agreed that the network has nothing to do with the problem. They ran a performance collection on the CLARiiON backend, and I can see heavy load on the Celerra RAID groups.
What I don't understand is why other filesystems residing on the same RAID group are not affected. Some LUNs used by fs01 are heavily used, but those LUNs sit in the same RAID group as LUNs allocated to other filesystems. I would expect to see some correlated degradation on the other filesystems, since the platters are shared across the entire group - so it seems my assumptions are wrong.
The next thing was to identify what slows down the filer. From the Celerra logs I could see constant CIFS ops/sec, above 2000, day and night - this did not look normal to me. The throughput graph did not really correlate with ops/sec, but the IOPS on the CLARiiON array did.
The first thing I did was run server_tcpdump on the filer to see which IP generates the most packets, as this would most probably correlate with the Celerra CIFS ops/sec. It turns out the Backup Exec server doing the NDMP backup was generating quite a lot of packets. This was a surprise to me, as I would expect NDMP Ethernet traffic to be very light - but again, it seems my assumptions were wrong.
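For reference, this is roughly the procedure I used - server_2, the cge0 interface and the paths are just examples, and the server_tcpdump syntax may differ between DART versions:

# Start a capture on the Data Mover; -w must point to a filesystem
# mounted on the Data Mover. Stop it after the busy period.
server_tcpdump server_2 -start cge0 -w /fs01/capture.cap
server_tcpdump server_2 -stop cge0

# Copy the capture off, then count packets per source IP with plain tcpdump:
tcpdump -nn -r capture.cap | awk '{print $3}' | cut -d. -f1-4 | sort | uniq -c | sort -rn | head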
Next I will try to identify whether the bottleneck is IOPS, throughput, or both. The criterion will be the transfer time for a big file to and from the array - it should be roughly the same on all filesystems, and reasonably short.
Transfer times definitely improve when the backup server is off and CIFS ops/sec drops to a low value. I still need to see how they change when throughput is high. I have been looking for a tool that would do file transfers on a scheduled basis and graph them, but could not find anything that suits me. Most benchmarking tools produce a lot of data without a simple graph output. What I need is a graph showing transfer times of a file of size X done every Y minutes.
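In the meantime, a simple shell loop run from a client does roughly what I want - the paths, file size and interval below are placeholders, and the resulting CSV can be graphed in Excel or gnuplot:

#!/bin/sh
# Copy a fixed-size test file from the filer and log the elapsed time.
SRC=/mnt/fs01/testfile_1g      # test file of size X on the mounted share (placeholder)
DST=/tmp/testfile_copy
LOG=/var/tmp/transfer_times.csv

while true; do
    start=$(date +%s)
    cp "$SRC" "$DST"
    end=$(date +%s)
    echo "$(date '+%Y-%m-%d %H:%M:%S'),$((end - start))" >> "$LOG"
    sleep 600                  # Y = 10 minutes between runs (placeholder)
done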
DanJost
January 19th, 2011 04:00
I didn't think NDMP could generate CIFS traffic - are you sure you are using NDMP and the backups aren't running over CIFS? CIFS-based backups are pretty slow. You can do over-the-wire NDMP backups if you aren't set up for this - there are both EMC and Symantec documents that cover this setup.
Dan
uninitializing
January 19th, 2011 05:00
The Celerra is backed up using NDMP only. I can see the data stream going to the tape during the backup job, and CIFS would not be able to sustain such high speeds - CIFS jobs go through a separate backup server and they are a few times slower. At the same time I can see a lot of traffic on the Ethernet side. The only way I can explain it is that Backup Exec traverses the filesystem to build its catalogue.
However, I would need to consult the documentation about it.
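One thing I can check while the job runs is the NDMP/PAX backup statistics on the Data Mover - if I read the docs correctly, something like this should confirm the data stream really is NDMP (server_2 is an example mover name, and flags may vary by DART version):

# Show NDMP/PAX backup statistics while the job is running:
server_pax server_2 -stats -verbose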
Rainer_EMC
January 19th, 2011 18:00
NDMP will never use CIFS.
It shouldn't be difficult to find out which client is causing the CIFS usage - try server_stats or server_cifs -o audit.
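For example - stat group names and flags differ between DART 5.6 and 6.0, so check the man pages first:

# Sample CIFS statistics every 10 seconds, 30 samples:
server_stats server_2 -monitor cifs-std -interval 10 -count 30

# Show CIFS client/session details seen by the Data Mover:
server_cifs server_2 -o audit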