Replication Problems with VNX 5300

Question

Hello Group,

We have two VNX 5300s, one at the main site and one at the target site.

We have a dedicated 10MB replication circuit between the two sites. The circuit has a service level agreement for performance, and all indications are that the circuit is working as advertised.

We have a 1TB high priority datastore (NFS partition) that we are trying to get synced. So far no luck. Two weeks ago we worked with EMC Support and deleted the replication link and recreated it. The replication process started and the transfer rate was 850KBps for the link.

First off, 850KBps seems low to me and I wonder why the VNX isn't grabbing more bandwidth as it is configured to do. EMC support tells me 850KBps is a fine transfer rate for a 10MB circuit? uh???
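For what it's worth, here is a quick sanity check of that number. It is only a sketch and rests on two assumptions that are not confirmed anywhere in this thread: that the "10MB" circuit is really 10 megabits per second, and that "850KBps" means 850 kilobytes per second in decimal units. The function names are made up for illustration.

```python
# Assumption: the "10MB circuit" is 10 megabits per second (Mbit/s).
# Assumption: "850KBps" is 850 kilobytes per second, with 1 KB = 1000 bytes.

def kbytes_per_s_to_mbits_per_s(kbytes_per_s: float) -> float:
    """Convert kilobytes/second to megabits/second using decimal units."""
    return kbytes_per_s * 1000 * 8 / 1_000_000


def utilization(observed_kbytes_per_s: float, link_mbits_per_s: float) -> float:
    """Fraction of the nominal link rate used by the observed transfer."""
    return kbytes_per_s_to_mbits_per_s(observed_kbytes_per_s) / link_mbits_per_s


observed_mbits = kbytes_per_s_to_mbits_per_s(850)
print(f"Observed rate: {observed_mbits:.1f} Mbit/s")
print(f"Share of a 10 Mbit/s link: {utilization(850, 10):.0%}")
```

Under those assumptions the transfer is using roughly two thirds of the nominal line rate before any protocol overhead is counted, which may be what EMC support had in mind. If the circuit is actually 10 megabytes per second, the picture is very different and 850KBps would indeed be low.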

There are no throttle controls configured on the link.

The replication has gotten down to 566GB remaining and has been stuck there for two days. The VMs on this partition are small servers: no heavy traffic, no heavy I/O.
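To put the sizes in perspective, here is a rough estimate of how long the initial copy would take at the observed rate. This is a back-of-the-envelope sketch only: it assumes decimal units (1 GB = 10^9 bytes, 1 KB = 1000 bytes), a constant transfer rate, and no new changes landing on the datastore while the copy runs. The helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_to_transfer(gigabytes: float, kbytes_per_s: float) -> float:
    """Days needed to move `gigabytes` at a steady `kbytes_per_s`.

    Assumes decimal units and ignores data that changes during the copy.
    """
    total_bytes = gigabytes * 1e9
    bytes_per_s = kbytes_per_s * 1e3
    return total_bytes / bytes_per_s / SECONDS_PER_DAY


for label, size_gb in [("full 1TB datastore", 1000), ("566GB remaining", 566)]:
    days = days_to_transfer(size_gb, 850)
    print(f"{label}: about {days:.1f} days at 850 KB/s")
```

Even with nothing wrong, the full 1TB works out to roughly two weeks at 850 KB/s, and the remaining 566GB to a bit over a week. That does not explain a transfer that has not moved at all for two days, but it is a useful baseline for judging progress.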

We have a few other replication links. One is a CIFS share (500GB, just data files), which stays in sync with no trouble. We also have a reverse replication link from the target site to the main site; this link replicates a small NFS partition that holds three VMs. That link stays in sync with no problem.

So the problem is only with this one particular high priority partition.

I have reviewed numerous circuit reports, and they show no errors and no CRC errors. All ping tests pass with no packet loss and average round-trip times of 6ms.

Nothing is making sense. Has anyone else experienced such replication issues with the VNX 5300?

EMC Support took some trace files and mentioned that the high priority link had some retry transmissions, while the other two replication links had NO retry transmissions. Again, very odd that only one link would have retry transmissions. Of course, EMC wants to point the problem to our side and our network, but there is NO evidence that we are having any network issues. And like I mentioned, we have a dedicated 10MB replication link.

Can anyone think of anything else I might check? There is so little to configure on the EMC side for replication. Is it just create the link and hope that it works?

I have also installed VM monitoring tools from WhatsUpGold, and I can verify that my VMs are not even breathing hard.

One other point. This configuration was installed by a professional EMC consulting company, and it seems odd to me that we are trying to replicate an entire 1TB partition. If we can't get the 1TB partition synced, we have NO DISASTER RECOVERY. Why not create multiple partitions and put fewer eggs in the replication basket? What are your thoughts on this logic? We have 10 VMs. Maybe 3 VMs per partition and 3 different replication links would have been a better design? It just seems odd to put all the eggs in one replication basket.

Any input would be greatly appreciated.

Thanks!

flya750 · Answer

I think I may have an idea of what is causing my replication problem: NOT enough disk space on the partition for the checkpoint, causing the replication sync to fail.

dynamox · Answer

Checkpoints are stored in the SavVol, which is a hidden file system. Do you have enough space in the pool on the VNX (the file pool, that is)?