Unsolved


December 15th, 2014 10:00

Data Domain Initial Directory Replication Speed?

Relatively new to Data Domain administration, so apologies if this lacks information.

To decommission our DD660, I need to move an 8TB (compressed) CIFS directory to a DD670 that is on the same network. Referring to the Data Domain Administration Guide, I prepared a CIFS directory replication session between two DDs on the same local network. I have also redirected the replication traffic over a private network of three aggregated 1Gb links.

Source: DD660 5.2.2.4-370754

Source directory is off of /backup/

Source data is 8 TB compressed and, I believe, around 25 TB uncompressed.

Destination: DD670 5.4.1.1-411752

Despite this high-bandwidth private network connection, the data transfer rate for the initial data copy appears to be significantly slower than what I would have expected. I have been using the 'replication watch' command and the Stats graph view in Enterprise Manager to monitor progress and activity.

The Replication Summary view shows all time-to-completion references as unknown. About 16 hours into the replication, the 'replication watch' command indicates only 10% of the files have been copied. At this sustained rate, the initial copy will take nearly seven days to complete.
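A quick back-of-envelope check of that projection (assuming the copy rate stays constant, which an initial dedupe replication may not):

```python
# Project total copy time from observed progress.
# Assumption: the rate seen in the first 16 hours holds for the whole copy.
hours_elapsed = 16
fraction_done = 0.10

total_hours = hours_elapsed / fraction_done   # 160 hours
days = total_hours / 24
print(f"Projected total: {total_hours:.0f} h (~{days:.1f} days)")
```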

Per the Status / Stats graphs on the source DD:

  • Disk read averages between 12 and 28 MiB/s

  • Private network ports averaging between 1 and 5 MiB/s per port

  • Replication peaks and valleys between 2 and 12 MiB/s

  • The CPU activity for both DDs is relatively light
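For reference, those figures can be compared against the raw capacity of a single 1 Gb/s link (ignoring TCP and protocol overhead, and assuming a single replication stream rides one link of the aggregate):

```python
# Compare the observed replication peak against raw link capacity.
# Assumption: 1 Gb/s per aggregated port; real TCP throughput is lower.
link_bits_per_s = 1_000_000_000
link_mib_per_s = link_bits_per_s / 8 / 2**20   # ~119 MiB/s per link

observed_peak = 12                             # MiB/s, from the replication graph
utilization = observed_peak / link_mib_per_s
print(f"One 1 Gb/s link ~= {link_mib_per_s:.0f} MiB/s")
print(f"Observed peak uses ~{utilization:.0%} of a single link")
```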

From the DD Admin Guide and internet searches, I was not able to get a clear understanding of how long an initial transfer should take for most replication types between two DDs on the same network. However, a few articles indicate transfer rates much faster than what I am seeing.

Some online resources have referred to ensuring the compression type is the same on both source and destination. I have not been able to find this information on the DDs. Other sources discuss increasing streams, as well as converting the directory to an MTree and then performing an MTree replication. I have not had time to look into these further.

I have reviewed the recent article Replication Sizing Guide (https://community.emc.com/docs/DOC-40889). I have no concerns with the discussed items, but this does support my belief that I should be seeing better performance on this replication.

I would appreciate some feedback on what kind of replication performance I should actually be seeing.

If possible, how might I improve the performance.

Is there a more effective way to copy the CIFS directory data to the destination DD?

Thank you very much,

Bryan

December 15th, 2014 10:00

Thank you, rprnairj.

Both DDs appear to be set to LZ compression.

1 Rookie • 45 Posts

December 15th, 2014 10:00

Just wanted to share,

"Some online resources have referred to ensuring the compression type is the same on both source and destination. I have not been able to find this information on the DDs."


This information can be seen in Enterprise Manager, under Data Management > File System > Configuration.

1 Rookie • 45 Posts

December 15th, 2014 11:00

Also, since you mentioned that you have redirected the replication traffic: do you see traffic moving on that specific interface?

1 Rookie • 45 Posts

December 15th, 2014 11:00

What is your stream utilization on both the source and destination Data Domain?

You can check this using # sys sh perf

1 Rookie • 45 Posts

December 15th, 2014 12:00

It would be under the rd/wr/r+/w+ columns.

December 15th, 2014 12:00

Yes, traffic is passing through each of the three aggregated ports at both ends.

However, a coworker suggested switching to LACP from round robin.

I will give that a shot shortly.

December 15th, 2014 12:00

Sorry, I am not sure which columns of information you are looking for.

I have collected the output of the last few hours from each DD to a log.

Trying to post it.

1 Attachment

December 15th, 2014 13:00

No difference after changing the aggregation protocol.

However, I just learned there may be a problem with the switch that I am using.

I will let you know.

December 15th, 2014 13:00

For the last entries of each:

Source shows 0/ 1/ 0/ 0

Destination shows 0/ 0/ 0/ 0

A sys perf output file is posted. You may want to look at that.

9 Legend • 20.4K Posts

December 16th, 2014 11:00

I am not sure what kind of data you replicate, but when I replicate Oracle RMAN backups, I get around 150 MB/s from a DD880 to a DD890 (10G interfaces) on a layer 2 network.

December 16th, 2014 11:00

The network switch issue has been addressed. In fact, we just bypassed it and ran the cables directly between the network ports of the two DDs (auto-sensing ports).

While there has been a slight increase in replication performance, the average peak is only 25 MiB/s. This is still not what I was expecting to see.

The Estimated Completion Time under the Replication Summary in System Manager now shows about 14 days.
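As a sanity check (assuming the full 8 TiB of post-compression data has to cross the wire, which dedupe on the destination may reduce):

```python
# How long would 8 TiB take at a sustained 25 MiB/s?
dataset_mib = 8 * 2**20          # 8 TiB expressed in MiB
rate_mib_per_s = 25              # observed peak, MiB/s
seconds = dataset_mib / rate_mib_per_s
days = seconds / 86400
print(f"~{days:.1f} days at a sustained 25 MiB/s")
# A 14-day estimate implies the sustained average is well below the peak.
```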

Before attempting to troubleshoot, I would like to get answers to my original questions:

  • What kind of replication rate should I expect to see under an ideal configuration?
  • Is this the best way to transfer this much CIFS directory data between two Data Domains?

Thank you.

December 16th, 2014 13:00

Thank you dynamox. That's the kind of throughput that I would expect to see.

The data being replicated is the backup files of our Exchange servers, which are backed up via EMC SourceOne. Many of the files are email attachments, while the others are container files of the email text. The files range from a few KB to about a GB.

I can understand how different data types can be processed at different speeds, but I would be surprised if this data type were the reason for such slow performance. The CPU does not appear to be working very hard, averaging between 4 and 8%. In fact, none of the monitored resources on either the source or destination DD appear to be very busy.

Do you have any suggestions on what else to look for, such as a replication or network configuration mistake, or better ways to move this data?

9 Legend • 20.4K Posts

December 16th, 2014 14:00

Bryan,

as a data point, could you create a directory, drop a couple of 2 GB ISO files into it (different flavors of Windows or Linux), and set up directory replication of just that folder? Curious to see what kind of throughput you would get.

Are you sure the replication traffic is going through the interface you think it is?

December 16th, 2014 14:00

Yep, I am sure the replication traffic is going over the private network. That one I have checked a few times.

I will set up some other miscellaneous types of test data and run them through tonight. I'll let you know tomorrow.

208 Posts

December 17th, 2014 01:00

Hi,
If your target DD is new and has no other data on it, then a migration or collection replication will be faster: less chatter, it just gets sent.

system show stats view net interval 5

That will give you specific interface throughput figures.

Going direct connect with two cables will probably make no difference; I suspect only one port will still be used.

If you are or were using LACP aggregation, check the above for throughput. If the two cables go between two different switches, you may find only one path is used unless you use vLAG or vPC between the switches.
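The single-path behaviour follows from how LACP distributes traffic: each flow is hashed to exactly one member port, so a single replication TCP session can never use more than one link. A toy sketch of the idea (the hash inputs and the port number are illustrative, not the actual switch algorithm):

```python
# Toy illustration of LACP-style flow hashing: every packet of a given
# flow maps to the same member port, so one TCP session is capped at a
# single link's bandwidth. Real switches hash L2/L3/L4 header fields.
def pick_port(src_ip, dst_ip, src_port, dst_port, n_links=3):
    return hash((src_ip, dst_ip, src_port, dst_port)) % n_links

# One replication session (2051 is commonly cited as the DD replication
# listen port; treat it as illustrative here).
flow = ("10.0.0.1", "10.0.0.2", 40000, 2051)
links_used = {pick_port(*flow) for _ in range(1000)}
print(f"This flow always lands on link(s): {links_used}")  # one link only
```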

Your initial configuration was "failover", right? In that case only one interface is ever "active", so your changes probably had no effect.

If your deduplication ratio is low, it will slow the speed down because everything has to come from disk, especially if the target has never seen this data or type of data before. This may start to pick up as the target seeds more and more data.

This would make the actual "NIC" speed a bit of a red herring.

There is not really a default answer to how fast it can be or should be - sadly "it depends".

You may find all traffic was, or still is, going over the "management" interface, and maybe it is still going via your switch, so your veth0 interface is not being used at all.

Put in a route to force it, but you'll need to go back to using LACP on your switch.

I've made masses of assumptions here, but hopefully you can pick some stuff out of this to help you.

Regards,

Jonathan
