umichklewis
3 Apprentice
•
1.2K Posts
0
October 31st, 2012 10:00
Replication performance is largely bound by network speed, but in some cases it can be bound by disk latency. In the cases I've observed, an idle VNX will replicate a single filesystem over a single GbE link at the speed you've described (you didn't specify whether you have 10GbE interfaces in your 5700s, only that you have 10GbE networking between them). However, I can keep adding replication sessions (other filesystems) and continue to scale until I saturate the network pipe (you'll see transfer rates fall off across all replications when you do).
Retransmission timeouts (RTOs) can sometimes be a problem on fast networks. If your 10GbE switches have QoS or other bandwidth limits enabled, that could contribute as well.
Have you checked TCP stats to see your retransmit rates? Are they below 0.01%?
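A quick way to sanity-check that 0.01% guideline from a host's TCP counters (a generic sketch with hypothetical sample values; on a Linux box the counts come from `netstat -s` or `/proc/net/snmp`, and the Data Mover's own stats output differs):

```python
# Rough retransmit-rate check from TCP counters. The sample numbers below
# are hypothetical; substitute the "segments sent out" and "segments
# retransmitted" counters from your own host.

def retransmit_rate(segments_out: int, segments_retransmitted: int) -> float:
    """Return the retransmit rate as a percentage of segments sent."""
    if segments_out == 0:
        return 0.0
    return 100.0 * segments_retransmitted / segments_out

# Example: 12 retransmits out of 250,000 segments sent.
rate = retransmit_rate(250_000, 12)
print(f"{rate:.4f}%")                       # 0.0048%
print("OK" if rate < 0.01 else "investigate")
```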
dynamox
9 Legend
•
20.4K Posts
0
October 31st, 2012 13:00
I'll try this tonight. I have 2 x 5700s connected to a Nexus 5k, layer 2 between the two.
mjzraz
1 Rookie
•
93 Posts
0
October 31st, 2012 13:00
I haven't checked RTOs, but I will.
The thing is, even from the beginning, when there were barely any users on the VNX, the speed was always stuck at 80000 KB/sec, as if we somehow had a 1Gb bottleneck somewhere. To answer your question: we do have 10GbE interfaces in both VNX5700s, configured together in an LACP group. There is no QoS.
With one replication running, it would transfer at 80000 KB/sec.
With 2, each was at 40000; with 3, each at 25000.
The disks are 2 TB SATA at the source and 3 TB SATA at the destination.
There is no way we can saturate our network pipe.
I have run six 256-thread emcopy jobs while Ethernet backups and replication were also running, and never see more than 20% of the bandwidth in use on the network.
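Those per-session numbers are the classic signature of a fixed aggregate ceiling: the total stays at roughly 75000-80000 KB/s no matter how many sessions share it, which works out to about 600-640 Mbit/s, below even a single GbE link. A quick sketch of the arithmetic, using the rates reported above:

```python
# Per-session transfer rates reported in the thread, keyed by session count.
# The aggregate stays roughly constant, pointing at a shared ceiling rather
# than a per-session limit.

sessions = {1: [80000], 2: [40000, 40000], 3: [25000, 25000, 25000]}

for n, rates in sessions.items():
    total_kb = sum(rates)
    mbit = total_kb * 8 / 1000          # KB/s -> Mbit/s (decimal units)
    print(f"{n} session(s): {total_kb} KB/s total = {mbit:.0f} Mbit/s")
```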
dynamox
9 Legend
•
20.4K Posts
0
November 1st, 2012 08:00
This is sad!!! 500 GB filesystem, 10G pipe, layer 2, absolutely no users on the NAS side on the source or target VNX.
Max Out of Sync Time (minutes) = 10
Current Transfer Size (KB) = 524288000
Current Transfer Remain (KB) = 520208000
Estimated Completion Time = Thu Nov 01 14:22:12 EDT 2012
Current Transfer is Full Copy = Yes
Current Transfer Rate (KB/s) = 49546
Current Read Rate (KB/s) = 3307
Current Write Rate (KB/s) = 2945
Previous Transfer Rate (KB/s) = 0
Previous Read Rate (KB/s) = 0
Previous Write Rate (KB/s) = 0
Average Transfer Rate (KB/s) = 0
Average Read Rate (KB/s) = 0
Average Write Rate (KB/s) = 0
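Plugging the numbers from that stats output into a back-of-envelope calculation shows how bad this is: at roughly 49546 KB/s, the remaining ~520 GB of this full copy needs close to three more hours.

```python
# Back-of-envelope ETA from the replication stats above.

remain_kb = 520_208_000     # Current Transfer Remain (KB)
rate_kb_s = 49_546          # Current Transfer Rate (KB/s)

seconds = remain_kb / rate_kb_s
hours = seconds / 3600
print(f"~{seconds:,.0f} s remaining (~{hours:.1f} h) at the current rate")
```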
umichklewis
3 Apprentice
•
1.2K Posts
0
November 1st, 2012 08:00
Ouch... Take a look at the performance threads about SnapSure snapshots - I think they cover some of what's happening here. This is the end result: it's the copy-on-write (COW) behavior that bogs down filesystem performance horrendously. Creating replication sessions with the SavVol in the same pool could be the culprit. Any chance you have a separate pool on the destination side you can drop the SavVol into with the -pool flag during setup? Of course, your source SavVol will be on the source filesystem - no easy way to change that....
dynamox
9 Legend
•
20.4K Posts
0
November 1st, 2012 09:00
OK, I'll buy that for continuous replication, but this is the first full copy - there should be no blocks to protect on the DR side (maybe a bunch of 0's). Dig deeper, man; 48 MB/s is laughable. I just tried to robocopy something to the same filesystem on the source and got 200 MB/s, so we know the plumbing is there, but so far Replicator is sucking wind.
umichklewis
3 Apprentice
•
1.2K Posts
0
November 1st, 2012 09:00
I was under the impression that all the COW activity is on the DR SavVol, because of the internal checkpoints. Replicator creates two checkpoints on the source and two on the DR side. The block changes between the filesystem state and the internal source checkpoint (the delta) get transferred to the DR side and refresh the destination internal checkpoint. Unless I'm mistaken, the changes are all written to the DR checkpoint, then pushed into the filesystem. If for some reason we rebooted the DR Data Mover, we could resume the replication without starting from scratch - we can pick up from the last consistent state of the checkpoint. Again, I could be off my rocker on this one....
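The checkpoint-delta flow described above - transfer only the blocks that changed since the last internal checkpoint, then apply them on the DR side - can be sketched in miniature. This is an illustrative toy model, not the actual Replicator code path:

```python
# Toy model of checkpoint-delta replication: only blocks that changed since
# the last internal checkpoint cross the wire, and the DR copy is refreshed
# from the received delta. Illustrative only.

def compute_delta(prev_ckpt: dict, current: dict) -> dict:
    """Blocks whose contents differ from the last checkpoint."""
    return {blk: data for blk, data in current.items()
            if prev_ckpt.get(blk) != data}

def apply_delta(dr_fs: dict, delta: dict) -> dict:
    """Merge the received delta into the DR copy."""
    updated = dict(dr_fs)
    updated.update(delta)
    return updated

source = {0: b"aaa", 1: b"bbb", 2: b"ccc"}
ckpt = dict(source)                 # internal checkpoint after last sync
dr = dict(source)                   # DR copy is in sync

source[1] = b"BBB"                  # a write lands on the source
delta = compute_delta(ckpt, source)
dr = apply_delta(dr, delta)
print(delta)                        # only block 1 is transferred
print(dr == source)                 # True: DR matches source after apply
```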
dynamox
9 Legend
•
20.4K Posts
0
November 1st, 2012 09:00
If this is the first full copy, there should be no COW activity, right? Unfortunately I only have one pool on the target VNX, so I can't test that theory. The target VNX is very lightly utilized by block activity; even if there were COW, I can't imagine it would be slowed down by disk. Any other ideas/tweaks?
mjzraz
1 Rookie
•
93 Posts
0
November 1st, 2012 21:00
We also got similar speeds with emcopy: 175 MB/s. What kind of drives do you have on the source and destination?
dynamox
9 Legend
•
20.4K Posts
0
November 2nd, 2012 03:00
source - dedicated RAID groups, NL-SAS R6
target - pool with EFD, SAS, and NL-SAS
jmhere
1 Rookie
•
16 Posts
0
June 6th, 2017 04:00
Seen exactly the same problem. Two VNX 7600 arrays connected directly via 4x 10G links in LACP via a single Nexus 7k, and dont get any better than ~40MB/s speeds even when there is no other IO to the underlying pool and DMs doing nothing else. As laughable it is, that looks like a limit in how the VNX OE is built.