I have one filesystem called vm1-sas that is not replicating to our destination. It says the last full replication was Feb 27th. On the destination I do have an alert, created Feb 28, that says:

lot2:1393595868: /nas/sbin/rootnas_fs -x root_rep_ckpt_88_154419_1 QOSsize=20000M Error 3024: There are not enough free disks available to satisfy the request.
Is this something I can resolve myself?
server_mount server_2 | grep ckpt
root_rep_ckpt_90_32645_1 on /root_rep_ckpt_90_32645_1 ckpt,perm,ro,accesspolicy=NATIVE,nolock
root_rep_ckpt_90_32645_2 on /root_rep_ckpt_90_32645_2 ckpt,perm,ro,accesspolicy=NATIVE,nolock
root_rep_ckpt_51_37708_1 on /root_rep_ckpt_51_37708_1 ckpt,perm,ro
root_rep_ckpt_51_37708_2 on /root_rep_ckpt_51_37708_2 ckpt,perm,ro
root_rep_ckpt_79_81203_1 on /root_rep_ckpt_79_81203_1 ckpt,perm,ro
root_rep_ckpt_79_81203_2 on /root_rep_ckpt_79_81203_2 ckpt,perm,ro
root_rep_ckpt_88_154419_1 on /root_rep_ckpt_88_154419_1 ckpt,perm,ro
root_rep_ckpt_88_154419_2 on /root_rep_ckpt_88_154419_2 ckpt,perm,ro
root_rep_ckpt_92_160596_1 on /root_rep_ckpt_92_160596_1 ckpt,perm,ro
root_rep_ckpt_92_160596_2 on /root_rep_ckpt_92_160596_2 ckpt,perm,ro
The failing command wanted to extend the checkpoint SavVol by 20 GB, but there was not enough free space in the storage pool to satisfy the request.
As these messages are more than three weeks old, the replication will be unable to restart incrementally.
The fastest way back to a synchronized copy is to delete the replication and the destination filesystem and restart the replication from scratch.
Ok, well, I was able to make 500 GB available in this storage pool, so how can I kick off this replication again? I don't want to start over; it will take a long time to replicate 1.2 TB at 30 Mbps during business hours and 40 Mbps after hours, plus the daily change rate.
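To put a rough number on "a long time," here is a back-of-the-envelope estimate of the pure wire time for a full copy. It assumes the link runs flat out at 30 Mbit/s and uses decimal units; real replication traffic has protocol overhead and competing load, so the true time is longer, and the daily change rate comes on top of it:

```python
data_bytes = 1.2e12   # 1.2 TB filesystem to replicate (decimal units)
rate_bps = 30e6       # 30 Mbit/s available during business hours

seconds = data_bytes * 8 / rate_bps
days = seconds / 86_400
print(f"{days:.1f} days")   # ≈ 3.7 days of uninterrupted wire time at 30 Mbps
```

Even under these generous assumptions, a restart from scratch costs days of transfer, which is why avoiding it is attractive if an incremental restart is possible.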
Unisphere doesn't really do much; I click Stop and then Refresh, but it's as if the stop command never arrives. In fact I can't do much of anything in Unisphere. To unshare, unmount, and delete an old filesystem, I had to do everything from the command line, because Unisphere just blinks and then sits there staring at you. Should I be doing this from the destination side or the source side?
Also, I get all these warnings in Unisphere, despite both sides being pingable, having the same password, and pointing at the same NTP server for time.
> Query file systems All:All. HTTP communication not permitted: Time skew between local and remote system dcusan1 may exceed 10 minutes OR nas_cel passphrase mismatch exists. details...
Check the Control Station time on both systems; it seems there is a difference of more than 10 minutes.
I can only repeat my advice: deleting the replication and the destination filesystem and restarting the replication from scratch will save your working time and will be faster than attempting an incremental restart, running out of space again, and then restarting from scratch anyway.
Even if you are able to restart the replication incrementally, I doubt your destination space will be enough, because 27 days * 2 % change rate = 54 %, and most of it will need to be copied to the checkpoint SavVol.
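The arithmetic behind that warning, worked out against the 1.2 TB figure from earlier in the thread (the 2 % daily change rate is the assumption stated above, not a measured value):

```python
days_stale = 27        # time since the last successful sync
change_rate = 0.02     # assumed 2 % of the filesystem changes per day
fs_size_gb = 1200      # ~1.2 TB filesystem

changed_fraction = days_stale * change_rate          # 0.54, i.e. 54 %
savvol_need_gb = changed_fraction * fs_size_gb       # ~648 GB
print(f"{changed_fraction:.0%} changed, ~{savvol_need_gb:.0f} GB of SavVol needed")
```

Under those assumptions the SavVol would need roughly 648 GB, noticeably more than the 500 GB just freed, which is what makes the incremental restart risky.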
Thank you. The time was off by about 15 minutes. I corrected it with the date command after su'ing to root. It seems we have to do this every year despite the NTP servers being configured.
Anyway, support got the replication going again, but it says the current transfer's estimated end time is Friday, May 2. That's over a month away. It was originally April 30th, but overnight it slipped even further out.
They didn't delete anything... they just looked around and restarted it via command line.
Max Out of Sync Time (minutes) = 10
Current Transfer Size (KB) = 64521864
Current Transfer Remain (KB) = 38943384
Estimated Completion Time = Fri May 02 03:03:44 EDT 2014
Current Transfer is Full Copy = No
Current Transfer Rate (KB/s) = 12
Current Read Rate (KB/s) = 0
Current Write Rate (KB/s) = 1005
Previous Transfer Rate (KB/s) = 947
Previous Read Rate (KB/s) = 795
Previous Write Rate (KB/s) = 368
Average Transfer Rate (KB/s) = 1007
Average Read Rate (KB/s) = 1308
Average Write Rate (KB/s) = 409
The problem is that the current write rate is only 1 MB/s. It will take an eternity to transfer that much data at that speed.
Do you have any WAN optimization appliances (ie. Riverbed) in the path?
No, I do not have any of those devices. Do you have any experience with them? Would a WAN optimizer like a Riverbed be beneficial here? I'm not sure if there's any type of data that is more favorable.
Do you put one at each site, or just one at the HQ? Does it sit inline between the router and the WAN, or between the core switch and the router, or is the device the router itself?
If it's worth it, I'll surely look into it.
Short answer: if you're using Replicator, I'd (personally) recommend AVOIDING WAN accelerators. Why?
VNX Replicator does its own work to limit the amount of data sent on the wire, and it computes CRCs for the blocks it sends. WAN accelerators take the data in flight and apply their own compression techniques to it. This often invalidates the CRC or the transmitted data and causes the VNX to re-send blocks, which are then modified again, making the situation worse.
Sure, there are WAN accelerators on the EMC Support Matrix for VNX, but in my experience they've caused more problems than they've solved. I'm sure they work great for other protocols, but all three of the environments I've worked with (NS700G, NS40, and VNX5400) had issues that were resolved by turning off WAN acceleration on the Replicator sessions/ports. As with all things, your mileage may vary....