December 15th, 2014 10:00

Data Domain Initial Directory Replication Speed?

Relatively new to Data Domain administration, so apologies if this lacks information.

To decommission our DD660, I need to move an 8TB (compressed) CIFS directory to a DD670 that is on the same network. Referring to the Data Domain Administration Guide, I prepared a CIFS directory replication session between two DDs on the same local network. I have also redirected the replication traffic over a private network of three aggregated 1Gb links.

Source: DD660 5.2.2.4-370754

Source directory is off of /backup/

Source data is 8 TB compressed, and I believe around 25 TB uncompressed.

Destination: DD670 5.4.1.1-411752

Despite this high-bandwidth private network connection, the data transfer rate for the initial data copy appears to be significantly slower than I would have expected. I have been using the 'replication watch' command and the Stats graph view in Enterprise Manager to monitor progress and activity.

The Replication Summary view shows all references for time completion as unknown. About 16 hours into the replication, the 'replication watch' command indicates only 10% of the files have been copied. Assuming this rate holds, the initial copy will take nearly seven days to complete.
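The seven-day figure follows from a simple linear extrapolation. The sketch below reproduces that arithmetic; the function name is illustrative only, and it assumes the percentage reported by 'replication watch' advances at a constant rate, which may not hold for an initial seed.

```python
def estimate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: total time if progress continues at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures from the post: 16 hours elapsed, 10% of files copied.
total_hours = estimate_total_hours(16, 0.10)
remaining_hours = total_hours - 16

print(f"Estimated total: {total_hours:.0f} h ({total_hours / 24:.1f} days)")
print(f"Estimated remaining: {remaining_hours:.0f} h")
```

With these inputs the estimate works out to roughly 160 hours in total, a little under seven days.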

Per the Status / Stats graphs on the source DD:

  • Disk read averages between 12 and 28 MiB/s

  • Private network ports averaging between 1 and 5 MiB/s per port

  • Replication peaks and valleys between 2 and 12 MiB/s

  • The CPU activity for both DDs is relatively light
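To put the port figures in context, the sketch below compares them with the raw line rate of the links. It assumes nominal 1 Gb/s per link and ignores protocol overhead, so real usable capacity would be somewhat lower; the function names are illustrative only.

```python
MIB = 1024 ** 2  # bytes per MiB


def link_capacity_mib_s(links: int, gbit_per_link: float) -> float:
    """Raw line rate in MiB/s for a number of aggregated links (no overhead)."""
    bits_per_second = links * gbit_per_link * 1e9
    return bits_per_second / 8 / MIB


def utilisation(observed_mib_s: float, capacity_mib_s: float) -> float:
    """Fraction of the raw capacity actually in use."""
    return observed_mib_s / capacity_mib_s


per_port = link_capacity_mib_s(1, 1.0)   # one 1 Gb link
aggregate = link_capacity_mib_s(3, 1.0)  # three aggregated 1 Gb links

print(f"Per-port capacity:  {per_port:.1f} MiB/s")
print(f"Aggregate capacity: {aggregate:.1f} MiB/s")
# Observed per-port traffic from the graphs: 1 to 5 MiB/s.
for observed in (1, 5):
    print(f"{observed} MiB/s is {utilisation(observed, per_port):.1%} of one port")
```

Under these assumptions each port is carrying only a few percent of its line rate, which is consistent with the bottleneck being somewhere other than the network.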

From the DD Admin Guide and internet searches, I was not able to get a clear understanding of how long an initial transfer should take for most replication types between two DDs on the same network. However, a few articles indicate transfer rates much faster than what I am seeing.

Some online resources have referred to ensuring the compression type is the same on both source and destination. I have not been able to find this information on the DDs. Other sources discuss increasing streams as well as converting the directory to an Mtree, then performing an Mtree replication. I have not had time to look into these further.

I have reviewed the recent article Replication Sizing Guide (https://community.emc.com/docs/DOC-40889). I have no concerns with the discussed items, but this does support my belief that I should be seeing better performance on this replication.

I would appreciate some feedback on what kind of replication performance I should actually be seeing.

If possible, how might I improve the performance?

Is there a more effective way to copy the CIFS directory data to the destination DD?

Thank you very much,

Bryan

December 17th, 2014 08:00

Hi Jonathan. Thanks for your input on this.

The destination DD is already hosting data from other production environments, so I’m not able to do the Collection Replication.

I had previously verified a few times that the replication traffic was going over the private network. But that command gave an interesting view of each port's utilization.

The private network was set up as a round-robin aggregation of three 1Gb links. Per the various port monitoring options, all ports were evenly supporting the total traffic passed to the veth port. Other network configurations were tried, to the point of the current configuration of a single direct link between the two DDs with no network switch. The 1Gb Ethernet ports are auto-sensing, and don’t require a cross-over cable. I probably should have tried this sooner. In any case, there has been no difference in the replication rate or the amount of traffic being passed. I think I have ruled out any network connectivity and configuration issues at this point.

You may be on to something with the deduplication suggestion though. I am doing some replication performance testing with various data types, and nothing that has been on either DD before. I’ll let you know what I see.

Thanks.

December 18th, 2014 08:00

Hi rjgreg. Thanks for the suggestion.

I had considered the OS version differences, but per the Data Domain Administrators Guide, replication should work fine provided the two DDs are within two revision codes of each other, which I believe these are. A review of these DDOS versions found no known issues with the replication feature. If I had more time to work on this, I suppose I would do an upgrade anyway, but with other tasks to work on, I am not likely to get to that any time soon.

Thanks anyway.

December 18th, 2014 08:00

I did some CIFS replication performance testing with a variety of data types, including typical miscellaneous text-based document files, large videos, virtual machine images, ISO images, pre-compressed files, etc.

The majority of these file types replicated relatively quickly. So I believe the network and DD Replication configurations are fine. I suspect that the slow performance may just be with this particular data.

I recently learned that Data Domains are not able to process encrypted and/or pre-compressed data as efficiently as other data. The data I am replicating is the backup files from our SourceOne server that backs up our MS Exchange mail servers. So I suspect this may be the case, but I am awaiting confirmation from our system admin who manages those environments.

In any case, I don’t believe there is anything more that I can do but to just let it ride as-is and wait it out.

Thank you all for your feedback and suggestions. They definitely helped.

Bryan

December 24th, 2014 06:00

This sounds very similar to what I am seeing.  I have been seeding data from one DD990 to another DD990 with a 10Gbit NIC on both sides for the replication traffic over CIFS.  I have 3 different replication contexts going.  One is Oracle datapump/export type traffic.  Another is RMAN based backups (archive log and full backups), and the 3rd is SQL Server backups (tlogs and full backups).  The SQL Server backups are loaded on a CIFS share.  Typically I am seeing about 15MB/s running on the SQL replication context where I see 90-120MB/s on the RMAN replication.  I really do think it's an artifact of the type of data it is.

On a side note, my replication performance increased when I upgraded to 5.4.4.2.  I was on 5.4.0.8, so it's not a surprise, but it did help by upgrading both units.

December 29th, 2014 07:00

Hi Blake. I think you're right about the OS version. I probably should have had the DD660 upgraded before getting too deep into this replication.

Fortunately, as the replication progressed, the displayed estimated completion time started to fall. It actually completed in about eight days as opposed to the two weeks it originally estimated.
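As a rough cross-check, the average rate implied by the eight-day completion can be worked out as below. This is only a sketch: it assumes the full 8 TB compressed figure from the original post is what crossed the wire, that TB means decimal terabytes, and that the copy took exactly eight days. None of these is confirmed in the thread.

```python
TB = 10 ** 12    # assumption: decimal terabytes
MIB = 1024 ** 2  # bytes per MiB
SECONDS_PER_DAY = 86_400


def average_rate(total_bytes: float, days: float) -> float:
    """Mean transfer rate in bytes per second over the whole copy."""
    return total_bytes / (days * SECONDS_PER_DAY)


rate = average_rate(8 * TB, 8)
print(f"Average rate: {rate / 1e6:.1f} MB/s ({rate / MIB:.1f} MiB/s)")
```

Under those assumptions the average comes out at around 11 MiB/s, near the top of the 2 to 12 MiB/s replication range seen on the Stats graphs early on.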

With the DD660 now offline, I will be getting the OS of my remaining DDs upgraded and in sync.

Thanks for the feedback.
