kjstech

Replication stalled on one filesystem and seeing Error 3024 on destination.

I have one file system, called vm1-sas, that is not replicating to our destination. It says the last full replication was on Feb 27th. On the destination I have an alert, created Feb 28, that says: slot2:1393595868: /nas/sbin/rootnas_fs -x root_rep_ckpt_88_154419_1 QOSsize=20000M Error 3024: There are not enough free disks available to satisfy the request.

Is this something I can resolve myself?

server_mount server_2 | grep ckpt

root_rep_ckpt_90_32645_1 on /root_rep_ckpt_90_32645_1 ckpt,perm,ro,accesspolicy=NATIVE,nolock
root_rep_ckpt_90_32645_2 on /root_rep_ckpt_90_32645_2 ckpt,perm,ro,accesspolicy=NATIVE,nolock
root_rep_ckpt_51_37708_1 on /root_rep_ckpt_51_37708_1 ckpt,perm,ro
root_rep_ckpt_51_37708_2 on /root_rep_ckpt_51_37708_2 ckpt,perm,ro
root_rep_ckpt_79_81203_1 on /root_rep_ckpt_79_81203_1 ckpt,perm,ro
root_rep_ckpt_79_81203_2 on /root_rep_ckpt_79_81203_2 ckpt,perm,ro
root_rep_ckpt_88_154419_1 on /root_rep_ckpt_88_154419_1 ckpt,perm,ro
root_rep_ckpt_88_154419_2 on /root_rep_ckpt_88_154419_2 ckpt,perm,ro
root_rep_ckpt_92_160596_1 on /root_rep_ckpt_92_160596_1 ckpt,perm,ro
root_rep_ckpt_92_160596_2 on /root_rep_ckpt_92_160596_2 ckpt,perm,ro


Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

Hi,

Please refer to EMC KB article emc112050, which may help.

Thanks

Jyothi

Peter_EMC

Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

The failing command tried to extend the checkpoint SavVol by 20 GB (QOSsize=20000M), but there was not enough free disk space in the storage pool to satisfy the request.

As these messages are more than 3 weeks old, the replication will be unable to restart incrementally.

The fastest way back to a synchronized copy will be to delete the replication and the destination file system, then restart the replication from scratch.


Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

Hi,

You can also find more information in EMC KB article emc183815.

Thanks

Jyothi

kjstech

Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

OK, I was able to make 500 GB available in this storage pool, so how can I kick off this replication again? I don't want to start over: it would take a long time to replicate 1.2 TB at 30 Mbps during business hours and 40 Mbps after hours, plus the daily change rate.
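For a sense of scale, here is a rough back-of-the-envelope sketch of what a full resync of 1.2 TB would take at the two link speeds mentioned. It assumes decimal units, a link fully dedicated to Replicator, no protocol overhead, and ignores the daily change rate, so real times would be longer; `transfer_days` is a hypothetical helper, not an EMC tool.

```python
# Back-of-the-envelope full-resync time.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bit/s),
# the whole link is available to Replicator, no protocol overhead,
# and the daily change rate is ignored.

SECONDS_PER_DAY = 86_400

def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to move size_tb terabytes over a link_mbps link."""
    bits = size_tb * 10**12 * 8
    return bits / (link_mbps * 10**6) / SECONDS_PER_DAY

for rate in (30, 40):  # business-hours and after-hours speeds from the post
    print(f"1.2 TB at {rate} Mbps: {transfer_days(1.2, rate):.1f} days")
```

Under those assumptions this works out to roughly 3.7 days at 30 Mbps and 2.8 days at 40 Mbps of continuous, uninterrupted transfer.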

Unisphere doesn't really do much. I click Stop and then Refresh, but it's as if the stop command never gets through. In fact, I can't do much in Unisphere at all. To unshare, unmount and delete an old file system, I had to do everything from the command line, because Unisphere just blinks and then sits there staring at you. Should I be doing this from the destination side or the source side?

Also, I get all of these warnings in Unisphere, even though both sides are pingable, have the same password, and point to the same NTP server for time.

Warning  Query Data Mover Interconnects NONE:All. HTTP communication not permitted: Time skew between local and remote system dcusan1 may exceed 10 minutes OR nas_cel passphrase mismatch exists.
Error    Data Mover interconnect name information does not exist.
Info     You cannot modify the replication session from the destination side.
Error    Query file systems All:All. Cannot access any Data Mover on the remote system, dcusan1
Warning  Query file systems All:All. HTTP communication not permitted: Time skew between local and remote system dcusan1 may exceed 10 minutes OR nas_cel passphrase mismatch exists.
Error    Unable to find the name for file system ID 88.
Peter_EMC

Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

> Query file systems All:All. HTTP communication not permitted: Time skew between local and remote system dcusan1 may exceed 10 minutes OR nas_cel passphrase mismatch exists.

Check the Control Station time on both systems; it seems there is a difference of more than 10 minutes.
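A minimal sketch of the check the warning describes, assuming you read the clock on each Control Station (for example with `date`) and compare the two readings. The 10-minute limit comes from the Unisphere message; the helper name and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta

# Limit quoted in the Unisphere warning above.
MAX_SKEW = timedelta(minutes=10)

def skew_exceeds_limit(local: datetime, remote: datetime,
                       limit: timedelta = MAX_SKEW) -> bool:
    """True if the two clock readings differ by more than the limit, in either direction."""
    return abs(local - remote) > limit

# Hypothetical readings taken from each Control Station's `date` output.
local = datetime(2014, 3, 25, 9, 0, 0)
remote = datetime(2014, 3, 25, 9, 15, 0)  # ~15 minutes apart, like the skew reported later in the thread

print(skew_exceeds_limit(local, remote))  # True
```

Taking the absolute difference matters because the warning fires whichever side is ahead.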

I can only repeat my advice: deleting the replication and the destination file system and restarting the replication from scratch will save you working time, and will be faster than attempting an incremental restart, running out of space again, and then restarting from scratch anyway.

Even if you are able to restart the replication incrementally, I doubt your destination space will be enough: 27 days × 2% daily change rate = 54%, and most of that will need to be copied to the checkpoint SavVol.
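The 54% figure is a straight multiplication, so it is an upper bound: it assumes each day's 2% lands on blocks that were not already changed. The sketch below compares it with a second, assumed model in which each day touches a random 2% of blocks independently, so repeated changes to the same block are counted only once. Neither model is an EMC formula; together they just bracket the likely size.

```python
# Two rough models for the fraction of a file system changed after N days
# at a constant daily change rate. The 2%/day figure comes from the post above;
# the random-blocks model is an extra assumption, not an EMC formula.

def linear_upper_bound(days: int, daily_rate: float) -> float:
    # Every day's changes assumed to hit different blocks: simple multiplication, capped at 100%
    return min(1.0, days * daily_rate)

def unique_blocks(days: int, daily_rate: float) -> float:
    # Fraction of blocks touched at least once if each day picks blocks at random
    return 1.0 - (1.0 - daily_rate) ** days

FS_GB = 1200  # 1.2 TB file system from the thread, in decimal GB

for name, frac in (("linear", linear_upper_bound(27, 0.02)),
                   ("random blocks", unique_blocks(27, 0.02))):
    print(f"{name}: {frac:.0%} of the file system, ~{frac * FS_GB:.0f} GB")
```

For a 1.2 TB file system this gives roughly 54% (about 650 GB) under the linear model and roughly 42% (about 500 GB) under the random-blocks model, so even the optimistic estimate is in the same range as the 500 GB freed earlier in the thread.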

kjstech

Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

Thank you. The time was off by about 15 minutes. I corrected it with the date command, su'd to root. It seems we have to do this every year, despite the NTP servers being configured.

Anyway, support got the replication going again, but the estimated end time for the current transfer is Friday, May 2. That's over a month away. It was originally April 30th, but overnight it moved out further.

They didn't delete anything; they just looked around and restarted it from the command line.

Max Out of Sync Time (minutes) = 10
Current Transfer Size (KB)     = 64521864
Current Transfer Remain (KB)   = 38943384
Estimated Completion Time      = Fri May 02 03:03:44 EDT 2014
Current Transfer is Full Copy  = No
Current Transfer Rate (KB/s)   = 12
Current Read Rate (KB/s)       = 0
Current Write Rate (KB/s)      = 1005
Previous Transfer Rate (KB/s)  = 947
Previous Read Rate (KB/s)      = 795
Previous Write Rate (KB/s)     = 368
Average Transfer Rate (KB/s)   = 1007
Average Read Rate (KB/s)       = 1308
Average Write Rate (KB/s)      = 409

UGH....
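The completion estimate can be checked against the status figures. A sketch, assuming the KB and KB/s values are in consistent units and that an estimate is simply the remaining data divided by a constant rate; `eta_days` is a hypothetical helper.

```python
# Figures copied from the replication status output above (KB and KB/s).
remain_kb = 38_943_384   # "Current Transfer Remain (KB)"
current_rate = 12        # "Current Transfer Rate (KB/s)"
average_rate = 1_007     # "Average Transfer Rate (KB/s)"

def eta_days(remaining_kb: int, rate_kb_s: float) -> float:
    """Days left if the transfer continued at a constant rate."""
    return remaining_kb / rate_kb_s / 86_400

print(f"at current rate: {eta_days(remain_kb, current_rate):.1f} days")
print(f"at average rate: {eta_days(remain_kb, average_rate) * 24:.1f} hours")
```

At the instantaneous 12 KB/s the remaining data would take about 37.6 days, which puts completion more than a month out, in line with the reported May 2 date; at the 1007 KB/s average it would be closer to 11 hours. That suggests (the output does not confirm it) that the estimate is extrapolated from the momentary transfer rate, so it can swing widely as that rate changes.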

umichklewis

Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

The problem is that the current write rate is only about 1 MB/s. It will take an eternity to transfer that much data at that speed.
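To put that rate in link terms, a small conversion sketch; it assumes the status output's KB means 1024 bytes and that the 30 Mbps link figure uses decimal megabits. `kbps_to_mbit` is a hypothetical helper.

```python
# Convert Replicator rates (KB/s) to link units (Mbit/s).
# Assumptions: 1 KB = 1024 bytes in the status output; 1 Mbit = 10**6 bits on the link.

def kbps_to_mbit(kb_per_s: float) -> float:
    return kb_per_s * 1024 * 8 / 10**6

LINK_MBPS = 30  # business-hours link speed quoted earlier in the thread

for label, rate_kb in (("current write", 1005), ("average transfer", 1007)):
    mbit = kbps_to_mbit(rate_kb)
    print(f"{label}: {mbit:.1f} Mbit/s, {mbit / LINK_MBPS:.0%} of a {LINK_MBPS} Mbps link")
```

Both rates come out at roughly 8.2 Mbit/s, about 27% of the daytime 30 Mbps link, which would suggest the replication is not saturating the link at these rates.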

Do you have any WAN optimization appliances (e.g. Riverbed) in the path?

kjstech

Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

No, I do not have any of those devices. Do you have any experience with them? Would a WAN optimizer like a Riverbed be beneficial here? I'm not sure whether any type of data benefits more than others.

Do you put one at each site, or just one at HQ? Does it sit inline between the router and the WAN, or between the core switch and the router, or is the device the router itself?

If it's worth it, I'll certainly look into it.

umichklewis

Re: Replication stalled on one filesystem and seeing Error 3024 on destination.

Short answer: if you're using Replicator, I'd (personally) recommend AVOIDING WAN accelerators. Why?

VNX Replicator does its own work to limit the amount of data sent on the wire, and computes CRCs for the blocks it sends. WAN accelerators take the data being sent and apply their own compression techniques to it in flight. This often invalidates the CRC or the transmitted data, which causes the VNX to resend data; the resent data is modified again, making the situation worse.
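The failure mode described here can be illustrated with a generic checksum-and-resend sketch. This is NOT VNX Replicator's actual wire protocol: it uses Python's `zlib.crc32` as a stand-in checksum, and `path` is a hypothetical function modelling what happens to the bytes in transit.

```python
import zlib
from typing import Callable, Optional, Tuple

# Illustration only: a generic "send block + CRC32, receiver verifies,
# resend on mismatch" loop. This is NOT VNX Replicator's real protocol.

def make_frame(block: bytes) -> Tuple[bytes, int]:
    """Sender side: pair the block with a CRC32 of the original bytes."""
    return block, zlib.crc32(block)

def receiver_accepts(payload: bytes, crc: int) -> bool:
    """Receiver side: recompute the CRC over what actually arrived."""
    return zlib.crc32(payload) == crc

def send_with_retries(block: bytes, path: Callable[[bytes], bytes],
                      max_attempts: int = 3) -> Optional[int]:
    """Return the attempt number that succeeded, or None if every attempt failed."""
    payload, crc = make_frame(block)
    for attempt in range(1, max_attempts + 1):
        delivered = path(payload)  # whatever the network hands to the receiver
        if receiver_accepts(delivered, crc):
            return attempt
    return None

block = b"replicated data block " * 100

# Path that delivers bytes unchanged: accepted on the first try.
print(send_with_retries(block, lambda p: p))  # 1

# Hypothetical middlebox that compresses the payload and never restores it:
# the CRC never matches, so every retry is wasted.
print(send_with_retries(block, zlib.compress))  # None
```

A clean path is accepted on the first attempt; a path that keeps rewriting the payload fails every retry, mirroring the resend loop described above.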

Sure, there are WAN accelerators on the EMC Support Matrix for VNX, but in my experience they've caused more problems than they've solved. I'm sure they work great for other protocols, but all three environments I worked with (NS700G, NS40 and VNX5400) had issues that were resolved by turning off WAN acceleration on the Replicator sessions/ports. As with all things, your mileage may vary...