umichklewis
3 Apprentice
•
1.2K Posts
0
October 31st, 2012 10:00
Replication performance is largely bound by network speed, but in some cases it can be bound by disk latency. In the cases I've observed, an idle VNX will replicate a single filesystem over a single GbE link at the speed you've described (you didn't specify whether you have 10GbE interfaces in your 5700s, only that you have 10GbE networking between them). However, I can keep adding replication sessions (other filesystems) and continue to scale until I saturate the network pipe (you'll see transfer rates fall off across all replications when you do).
Retransmission timeouts (RTOs) can sometimes be a problem on fast networks. If your 10GbE switches have QoS or other bandwidth limits enabled, that could contribute as well.
Have you checked TCP stats to see your retransmit rates? Are they below 0.01%?
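A quick way to sanity-check that 0.01% guideline from a host's TCP counters (a generic sketch with hypothetical sample values; on a Linux box the counts come from `netstat -s` or `/proc/net/snmp`, and the Data Mover's own stats output differs):

```python
# Rough retransmit-rate check from TCP counters. The sample numbers below
# are hypothetical; substitute the "segments sent out" and "segments
# retransmitted" counters from your own host.

def retransmit_rate(segments_out: int, segments_retransmitted: int) -> float:
    """Return the retransmit rate as a percentage of segments sent."""
    if segments_out == 0:
        return 0.0
    return 100.0 * segments_retransmitted / segments_out

# Example: 12 retransmits out of 250,000 segments sent.
rate = retransmit_rate(250_000, 12)
print(f"{rate:.4f}%")                       # 0.0048%
print("OK" if rate < 0.01 else "investigate")
```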
dynamox
9 Legend
•
20.4K Posts
0
October 31st, 2012 13:00
I'll try this tonight. I have 2 x 5700s connected to a Nexus 5k, layer 2 between the two.
mjzraz
1 Rookie
•
93 Posts
0
October 31st, 2012 13:00
I haven't checked RTOs, but I will.
The thing is, even from the beginning, when there were barely any users on the VNX, the speed was always stuck at 80000 KB/sec, as if we somehow had a 1Gb bottleneck somewhere. To answer your question: we do have 10GbE interfaces in both VNX5700s, configured together in an LACP group. There is no QoS.
With one replication running, it would transfer at 80000 KB/sec.
With 2, each was at 40000; with 3, each at 25000.
The disks are 2 TB SATA at the source and 3 TB SATA at the destination.
There is no way we can saturate our network pipe.
I have run six 256-thread emcopy jobs while Ethernet backups and replication were also running, and never see more than 20% of the bandwidth in use on the network.
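Those per-session numbers are the classic signature of a fixed aggregate ceiling: the total stays at roughly 75000-80000 KB/s no matter how many sessions share it, which works out to about 600-640 Mbit/s, below even a single GbE link. A quick sketch of the arithmetic, using the rates reported above:

```python
# Per-session transfer rates reported in the thread, keyed by session count.
# The aggregate stays roughly constant, pointing at a shared ceiling rather
# than a per-session limit.

sessions = {1: [80000], 2: [40000, 40000], 3: [25000, 25000, 25000]}

for n, rates in sessions.items():
    total_kb = sum(rates)
    mbit = total_kb * 8 / 1000          # KB/s -> Mbit/s (decimal units)
    print(f"{n} session(s): {total_kb} KB/s total = {mbit:.0f} Mbit/s")
```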
dynamox
9 Legend
•
20.4K Posts
0
November 1st, 2012 08:00
This is sad!!! 500 GB filesystem, 10G pipe, layer 2, absolutely no users on the NAS side on the source or target VNX.
Max Out of Sync Time (minutes) = 10
Current Transfer Size (KB) = 524288000
Current Transfer Remain (KB) = 520208000
Estimated Completion Time = Thu Nov 01 14:22:12 EDT 2012
Current Transfer is Full Copy = Yes
Current Transfer Rate (KB/s) = 49546
Current Read Rate (KB/s) = 3307
Current Write Rate (KB/s) = 2945
Previous Transfer Rate (KB/s) = 0
Previous Read Rate (KB/s) = 0
Previous Write Rate (KB/s) = 0
Average Transfer Rate (KB/s) = 0
Average Read Rate (KB/s) = 0
Average Write Rate (KB/s) = 0
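Plugging the numbers from that stats output into a back-of-envelope calculation shows how bad this is: at roughly 49546 KB/s, the remaining ~520 GB of this full copy needs close to three more hours.

```python
# Back-of-envelope ETA from the replication stats above.

remain_kb = 520_208_000     # Current Transfer Remain (KB)
rate_kb_s = 49_546          # Current Transfer Rate (KB/s)

seconds = remain_kb / rate_kb_s
hours = seconds / 3600
print(f"~{seconds:,.0f} s remaining (~{hours:.1f} h) at the current rate")
```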
umichklewis
3 Apprentice
•
1.2K Posts
0
November 1st, 2012 08:00
Ouch... Take a look at the performance threads about SnapSure snapshots - I think they cover some of what's happening here. This is the end result: it's the copy-on-write (COW) behavior that bogs down filesystem performance horrendously. Creating replication sessions with the SavVol in the same pool could be the culprit. Any chance you have a separate pool on the destination side you can drop the SavVol into with the -pool flag during setup? Of course, your source SavVol will be on the source filesystem - no easy way to change that....
dynamox
9 Legend
•
20.4K Posts
0
November 1st, 2012 09:00
OK, I'll buy that for continuous replication, but this is the first full copy - there should be no blocks to protect on the DR side (maybe a bunch of 0's). Dig deeper, man; 48 MB/s is laughable. I just tried to robocopy something to the same filesystem on the source and got 200 MB/s, so we know the plumbing is there, but so far Replicator is sucking wind.
umichklewis
3 Apprentice
•
1.2K Posts
0
November 1st, 2012 09:00
I was under the impression that all the COW activity is on the DR SavVol, because of the internal checkpoints. Replicator creates two checkpoints on the source and two on the DR side. The block changes between the filesystem state and the internal source checkpoint (the delta) get transferred to the DR side and refresh the destination internal checkpoint. Unless I'm mistaken, the changes are all written to the DR checkpoint, then pushed into the filesystem. If for some reason we rebooted the DR Data Mover, we could resume the replication without starting from scratch - we can pick up from the last consistent state of the checkpoint. Again, I could be off my rocker on this one....
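The checkpoint-delta flow described above - transfer only the blocks that changed since the last internal checkpoint, then apply them on the DR side - can be sketched in miniature. This is an illustrative toy model, not the actual Replicator code path:

```python
# Toy model of checkpoint-delta replication: only blocks that changed since
# the last internal checkpoint cross the wire, and the DR copy is refreshed
# from the received delta. Illustrative only.

def compute_delta(prev_ckpt: dict, current: dict) -> dict:
    """Blocks whose contents differ from the last checkpoint."""
    return {blk: data for blk, data in current.items()
            if prev_ckpt.get(blk) != data}

def apply_delta(dr_fs: dict, delta: dict) -> dict:
    """Merge the received delta into the DR copy."""
    updated = dict(dr_fs)
    updated.update(delta)
    return updated

source = {0: b"aaa", 1: b"bbb", 2: b"ccc"}
ckpt = dict(source)                 # internal checkpoint after last sync
dr = dict(source)                   # DR copy is in sync

source[1] = b"BBB"                  # a write lands on the source
delta = compute_delta(ckpt, source)
dr = apply_delta(dr, delta)
print(delta)                        # only block 1 is transferred
print(dr == source)                 # True: DR matches source after apply
```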
dynamox
9 Legend
•
20.4K Posts
0
November 1st, 2012 09:00
If this is the first full copy, there should be no COW activity, right? Unfortunately I only have one pool on the target VNX, so I can't test that theory. The target VNX is very lightly utilized by block activity; even if there were COW, I can't imagine it would be slowed down by disk. Any other ideas/tweaks?
mjzraz
1 Rookie
•
93 Posts
0
November 1st, 2012 21:00
We also got similar speeds with emcopy: 175 MB/s. What kind of drives do you have on the source and destination?
dynamox
9 Legend
•
20.4K Posts
0
November 2nd, 2012 03:00
source - dedicated RAID groups, NL-SAS R6
target - pool with EFD, SAS, and NL-SAS
jmhere
1 Rookie
•
16 Posts
0
June 6th, 2017 04:00
Seen exactly the same problem. Two VNX 7600 arrays connected directly via 4x 10G links in LACP via a single Nexus 7k, and dont get any better than ~40MB/s speeds even when there is no other IO to the underlying pool and DMs doing nothing else. As laughable it is, that looks like a limit in how the VNX OE is built.