James_Ford
30 Posts
1
January 13th, 2017 03:00
Hi Paul,
So from the screenshot above you are using pool-based replication (which is mtree replication under the covers). Mtree replication works as follows (this is a very basic overview):
- Takes mtree snapshots on source and destination
- Compares snapshots on source and destination to work out the differences (i.e. which files are different on the source and therefore need to be replicated to the destination)
- Replicates the differences from source -> destination such that the snapshot on the source can be built on the destination
- Once replication is complete and the mtree snapshot from the source has been built on the destination, the destination 'exposes' the new snapshot (i.e. its contents are used to replace the contents of the mtree on the destination)
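The comparison in the steps above boils down to a snapshot diff. As an illustrative sketch only (the catalog format, paths, and hash values here are hypothetical, not anything DDOS actually exposes):

```python
# Hypothetical snapshot catalogs: file path -> content hash.
# A file needs replicating if it is new or its hash differs on the destination.
src_snapshot = {"/vtl/tape001": "h1", "/vtl/tape002": "h2", "/vtl/tape003": "h5"}
dst_snapshot = {"/vtl/tape001": "h1", "/vtl/tape002": "h4"}

changed = [path for path, h in src_snapshot.items()
           if dst_snapshot.get(path) != h]
print(sorted(changed))  # ['/vtl/tape002', '/vtl/tape003']
```

Only the files in `changed` go through the per-file exchange described next; identical files are skipped entirely.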
As mentioned above, snapshot comparison is used to work out which files need to be replicated. For each file the source will:
- Read the file to work out what data it contains
- Ask the destination if it already has a copy of this data
- Physically replicate any data which the source doesn't already have
As a result mtree replication is 'de-dupe aware' and will only send data that the destination doesn't already have, rather than performing 'full' replication of any new files.
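The per-file exchange above can be sketched roughly like this (a simplified, hypothetical model: real DDOS uses variable-length segmentation and its own fingerprinting, not fixed 4KiB blocks, and all names here are made up for illustration):

```python
import hashlib

SEGMENT_SIZE = 4096  # simplified fixed-size segments for the sketch

def segments(data):
    """Split a file's data into segments."""
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]

def fingerprint(seg):
    """Identify a segment by a hash of its contents."""
    return hashlib.sha256(seg).hexdigest()

def replicate_file(data, destination_store):
    """Send only the segments the destination doesn't already have.
    Returns the number of bytes physically sent."""
    sent = 0
    for seg in segments(data):
        fp = fingerprint(seg)
        if fp not in destination_store:   # "do you already have this data?"
            destination_store[fp] = seg   # physically replicate the segment
            sent += len(seg)
    return sent

# A second file sharing most of its data replicates almost nothing.
store = {}
first = b"A" * 8192 + b"B" * 4096
second = b"A" * 8192 + b"C" * 4096   # shares the first 8KiB with `first`
bytes_first = replicate_file(first, store)
bytes_second = replicate_file(second, store)
print(bytes_first, bytes_second)  # 8192 4096
```

Note that even the first file sends less than its logical size (8192 of 12288 bytes), because its two identical "A" segments de-dupe against each other; the second file only sends its one new segment.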
Note that:
- Replication contexts don't report how much unique/de-duplicated data they have left to send - instead they report how much pre-compressed (i.e. logical/unhydrated) data they have left to send. In the screenshot above the first context shows ~2981GB of pre-compressed data remaining - as replication is de-dupe aware it will not need to send 2981GB of physical data over the wire
- To further illustrate this, the replication context reports that it is achieving a compression ratio of ~19x (meaning that it is only sending ~1KB of physical data for every ~19KB of logical data)
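Putting those two figures together gives a rough idea of what actually crosses the wire (a back-of-the-envelope calculation, not a number the DDR reports directly):

```python
logical_remaining_gb = 2981   # pre-compressed data left to send, as reported
compression_ratio = 19        # reported replication compression ratio

physical_gb = logical_remaining_gb / compression_ratio
print(round(physical_gb, 1))  # ~156.9 GB would actually cross the wire
```

So of the ~2981GB logical backlog, only on the order of 150-160GB of physical data needs to be transmitted, assuming the ratio holds for the remaining data.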
If replication is slow/lagging then the main reasons for this are generally:
- A lack of network bandwidth between the DDRs (or bandwidth throttling being imposed). I notice that you have low bandwidth optimisation enabled, so I imagine that you do not have good bandwidth between systems, as this option is only designed for connections of < 6Mbps
- Poor read performance on the source DDR - as the source DDR needs to read each file to work out what data it references, poor read performance (which can be caused by a number of factors) can cause poor replication performance
If you would like this looked at more closely/explained in more detail I would recommend opening a support case.
Thanks, James
jmbaxt1
4 Posts
0
February 17th, 2017 07:00
We have a very similar scenario: AS/400 using VTL and replication between 2 sites. Did you end up discovering something that helped improve replication?
We have a 50MB VPN tunnel but it's taken 3 days to replicate the 65GB of post-comp data. Also, have you taken a look at your iostat and system performance while replication is running? For some reason we're only pushing out 900kb/s.
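As a rough cross-check on those numbers (back-of-the-envelope, assuming the 65GB post-comp figure is what actually crossed the wire and using decimal GB), the observed transfer rate is far below the tunnel's rated bandwidth:

```python
post_comp_gb = 65      # post-comp data replicated
elapsed_days = 3
link_mbps = 50         # rated VPN tunnel bandwidth

elapsed_s = elapsed_days * 24 * 3600               # 259200 seconds
effective_mbps = post_comp_gb * 8 * 1000 / elapsed_s
print(round(effective_mbps, 1))                    # ~2.0 Mbps achieved
print(round(effective_mbps / link_mbps * 100, 1))  # ~4.0% of the rated link
```

A gap that large suggests the bottleneck is a replication throttle, the network path itself, or source read performance rather than the tunnel's nominal capacity.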
Ryan_Johnson
73 Posts
0
February 17th, 2017 13:00
You could confirm there aren't any Data Domain replication throttles set. If one is set it will limit the replication bandwidth used.
Also, you could run iperf between the two Data Domains to confirm there is the expected bandwidth. Sometimes, although there is a VPN tunnel with a certain bandwidth, it is not always dedicated to Data Domain replication. iperf is built into the Data Domain with the "net iperf client" and "net iperf server" commands. If iperf doesn't go as fast as you expect, it's probably a network issue and not the Data Domain.