kjstech

1 Rookie

•

358 Posts

0

2978

March 21st, 2014 12:00

Replication stalled on one filesystem and seeing Error 3024 on destination.

I have one filesystem called vm1-sas that is no longer replicating to our destination. It says the last full replication was Feb 27th. On the destination I have an alert created Feb 28 that says :Slot2:1393595868: /nas/sbin/rootnas_fs -x root_rep_ckpt_88_154419_1 QOSsize=20000M Error 3024: There are not enough free disks available to satisfy the request.

Is this something I can resolve myself?

server_mount server_2 | grep ckpt

root_rep_ckpt_90_32645_1 on /root_rep_ckpt_90_32645_1 ckpt,perm,ro,accesspolicy=NATIVE,nolock

root_rep_ckpt_90_32645_2 on /root_rep_ckpt_90_32645_2 ckpt,perm,ro,accesspolicy=NATIVE,nolock

root_rep_ckpt_51_37708_1 on /root_rep_ckpt_51_37708_1 ckpt,perm,ro

root_rep_ckpt_51_37708_2 on /root_rep_ckpt_51_37708_2 ckpt,perm,ro

root_rep_ckpt_79_81203_1 on /root_rep_ckpt_79_81203_1 ckpt,perm,ro

root_rep_ckpt_79_81203_2 on /root_rep_ckpt_79_81203_2 ckpt,perm,ro

root_rep_ckpt_88_154419_1 on /root_rep_ckpt_88_154419_1 ckpt,perm,ro

root_rep_ckpt_88_154419_2 on /root_rep_ckpt_88_154419_2 ckpt,perm,ro

root_rep_ckpt_92_160596_1 on /root_rep_ckpt_92_160596_1 ckpt,perm,ro

root_rep_ckpt_92_160596_2 on /root_rep_ckpt_92_160596_2 ckpt,perm,ro

Responses(10)

Jyothi_P_Bharat

317 Posts

0

March 24th, 2014 03:00

Hi,

Please refer to the EMC KB article emc112050 which may help you.

Thanks

Jyothi

Peter_EMC

674 Posts

0

March 24th, 2014 06:00

The failing command tried to extend the checkpoint SavVol by 20 GB, but there was not enough free disk space in the storage pool to satisfy the request.

As these messages are more than 3 weeks old, the replication will not be able to restart incrementally.

The fastest way back to a synchronized copy will be to delete the replication and the destination fs, then restart the replication from scratch.

kjstech

1 Rookie

•

358 Posts

0

March 24th, 2014 06:00

OK, I was able to make 500 GB available in this storage pool, so how can I kick off this replication again? I don't want to start over; it will take a long time to replicate 1.2 TB at 30 Mbps during business hours and 40 Mbps after hours, plus the daily change rate.

Unisphere doesn't really do much. I click Stop and then refresh, but it's like the stop command never gets through. In fact, I can't do much in Unisphere at all. To unshare, unmount and delete an old filesystem, I had to do everything from the command line because Unisphere just blinks and then sits there staring at you. Should I be doing this from the destination or the source side?

Also I get all these warnings in Unisphere, despite both sides being pingable, having the same password and looking at the same NTP server for time.
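To put a number on "a long time", here is a rough back-of-the-envelope sketch. It assumes the 1.2 TB figure is in decimal terabytes and that the whole link is available to the copy; it ignores protocol overhead and the daily change rate, so real times would be longer. The function name is made up for illustration.

```python
# Rough estimate only: ignores protocol overhead, the daily change rate,
# and any other traffic sharing the link.
FS_SIZE_BYTES = 1.2e12  # 1.2 TB from the post, taken as decimal terabytes (assumption)


def full_copy_days(size_bytes: float, link_mbps: float) -> float:
    """Days needed to push size_bytes over a link running at link_mbps."""
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000)
    return seconds / 86_400


for label, mbps in (("business hours", 30), ("after hours", 40)):
    days = full_copy_days(FS_SIZE_BYTES, mbps)
    print(f"{label}: {days:.1f} days at {mbps} Mbps")
```

Under those assumptions a full copy would take somewhere between roughly 2.8 and 3.7 days of continuous transfer, depending on how much of it runs at the after-hours rate.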

Jyothi_P_Bharat

317 Posts

0

March 24th, 2014 06:00

Hi,

You can also find more information in EMC KB article emc183815

Thanks

Jyothi

Peter_EMC

674 Posts

0

March 24th, 2014 23:00

> Query file systems All:All. HTTP communication not permitted: Time skew between local and remote system dcusan1 may exceed 10 minutes OR nas_cel passphrase mismatch exists. details...

Check the Control Station time on both systems; it seems there is a difference of more than 10 minutes.
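The check itself is simple: take the clock reading from each Control Station and compare the gap against the 10-minute limit quoted in the warning. A minimal sketch, where the two timestamps are hypothetical readings typed in by hand and the function name is made up for illustration:

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(minutes=10)  # limit quoted in the Unisphere warning


def skew_exceeds_limit(local: datetime, remote: datetime,
                       limit: timedelta = MAX_SKEW) -> bool:
    """True when the two clocks differ by more than the allowed skew."""
    return abs(local - remote) > limit


# Hypothetical readings from the two Control Stations (not taken from the thread).
local_cs = datetime(2014, 3, 25, 9, 0, 0)
remote_cs = datetime(2014, 3, 25, 9, 15, 0)

print(skew_exceeds_limit(local_cs, remote_cs))
```

Both readings need to be in the same time zone (or both in UTC) for the comparison to mean anything.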

I can only repeat my advice: deleting the replication and the destination fs and restarting the replication from scratch will save you working time, and will be faster than attempting an incremental restart, running out of space again and then restarting from scratch anyway.

Even if you are able to restart the replication incrementally, I doubt your destination space will be enough, because 27 days * 2 % change rate = 54 %, and most of that will need to be copied to the checkpoint SavVol.
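The arithmetic above can be turned into an approximate size. This sketch assumes the 2 % daily change rate used in the estimate and the 1.2 TB filesystem size mentioned earlier in the thread (taken as 1200 GB). It is an upper bound, since blocks that change more than once are counted each time; the function name is made up for illustration.

```python
def cumulative_change(days: int, daily_rate: float) -> float:
    """Upper bound on the fraction of the filesystem changed, capped at 100 %."""
    return min(days * daily_rate, 1.0)


FS_SIZE_GB = 1200  # 1.2 TB from earlier in the thread, decimal units (assumption)
DAYS_STALLED = 27
DAILY_CHANGE = 0.02  # the 2 % figure is an assumed change rate, not a measurement

fraction = cumulative_change(DAYS_STALLED, DAILY_CHANGE)
print(f"{fraction:.0%} changed, up to {fraction * FS_SIZE_GB:.0f} GB")
```

That works out to as much as about 650 GB of changed data, which is more than the 500 GB that was freed in the pool.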

kjstech

1 Rookie

•

358 Posts

0

March 25th, 2014 10:00

Thank you. The time was off by about 15 minutes. I corrected it with the date command after su'ing to root. It seems we have to do this every year despite the NTP servers being configured.

Anyway, support got the replication going again, but it says the current transfer's estimated end time is Friday, May 2. That's over a month away. It was originally April 30th, but overnight it slipped out further.

They didn't delete anything... they just looked around and restarted it via command line.

Max Out of Sync Time (minutes) = 10

Current Transfer Size (KB) = 64521864

Current Transfer Remain (KB) = 38943384

Estimated Completion Time = Fri May 02 03:03:44 EDT 2014

Current Transfer is Full Copy = No

Current Transfer Rate (KB/s) = 12

Current Read Rate (KB/s) = 0

Current Write Rate (KB/s) = 1005

Previous Transfer Rate (KB/s) = 947

Previous Read Rate (KB/s) = 795

Previous Write Rate (KB/s) = 368

Average Transfer Rate (KB/s) = 1007

Average Read Rate (KB/s) = 1308

Average Write Rate (KB/s) = 409

UGH....
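For what it's worth, the May 2 date looks consistent with the remaining data being divided by the current transfer rate (12 KB/s) rather than the average rate. That is only an inference from the numbers posted above, not documented Replicator behavior. A quick sketch of the arithmetic, with a made-up function name:

```python
REMAIN_KB = 38_943_384  # "Current Transfer Remain (KB)" from the output above

# Rates copied from the same output, in KB/s.
RATES_KB_S = {"current transfer": 12, "average transfer": 1007}


def eta_days(remaining_kb: int, rate_kb_s: float) -> float:
    """Days left if the remaining data keeps moving at rate_kb_s."""
    return remaining_kb / rate_kb_s / 86_400


for name, rate in RATES_KB_S.items():
    print(f"{name} rate ({rate} KB/s): {eta_days(REMAIN_KB, rate):.2f} days")
```

At 12 KB/s the remainder needs about 37.6 days, which lands in early May when counted from late March. At the average rate of 1007 KB/s the same remainder would take under half a day, so the estimate may improve a lot if the momentary rate recovers.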

umichklewis

1.2K Posts

0

March 31st, 2014 07:00

The problem is that the current write rate is only about 1 MB/s. It will take an eternity to transfer that much data at that speed.

Do you have any WAN optimization appliances (e.g., Riverbed) in the path?

kjstech

1 Rookie

•

358 Posts

0

March 31st, 2014 08:00

No, I do not have any of those devices. Do you have any experience with them? Would a WAN optimizer like a Riverbed be beneficial here? I'm not sure whether some types of data benefit more than others.

Do you put one at each site, or just one at the HQ? Does it sit inline between the router and the WAN or the core switch and the router, or is the device the router itself?

If it's worth it, I'll surely look into it.

umichklewis

1.2K Posts

1

March 31st, 2014 08:00

Short answer, if you're using Replicator, I'd (personally) recommend AVOIDING WAN accelerators. Why?

VNX Replicator does its own work to limit the amount of data sent on the wire, and computes CRCs for the blocks it sends. WAN accelerators take the data being sent and apply their own compression techniques to the data in flight. This often invalidates the CRC or the transmitted data and causes the VNX to resend it; the resent data is modified again, which makes the situation worse.

Sure, there are WAN accelerators on the EMC Support Matrix for VNX, but in my experience they have caused more problems than they have solved. I'm sure they work great for other protocols, but all three of the environments I worked with (NS700G, NS40 and VNX5400) had issues that were resolved by turning off WAN acceleration on the Replicator sessions/ports. As with all things, your mileage will vary....
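To illustrate the failure mode in general terms: if a sender attaches a checksum to each block and something in the path alters the payload, the receiver's check fails and the block has to be sent again. The sketch below uses Python's zlib.crc32 purely as a stand-in; Replicator's actual checksum and wire format are not described in this thread, and the function names are made up.

```python
import zlib


def send_block(payload: bytes) -> tuple[bytes, int]:
    """Sender side: return the payload together with its CRC32."""
    return payload, zlib.crc32(payload)


def receive_block(payload: bytes, crc: int) -> bool:
    """Receiver side: True only if the payload still matches the sender's CRC."""
    return zlib.crc32(payload) == crc


block = b"replicated filesystem block" * 4
payload, crc = send_block(block)

# Untouched payload passes the check.
assert receive_block(payload, crc)

# Simulate a device in the path altering the last byte of the payload.
tampered = payload[:-1] + bytes([payload[-1] ^ 0xFF])
needs_resend = not receive_block(tampered, crc)
print(needs_resend)
```

If every retransmission is altered the same way, the check never passes, which matches the "makes the situation worse" behavior described above.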

kjstech

1 Rookie

•

358 Posts

1

March 31st, 2014 08:00

Thank you. That saves me a few grand I would have invested in WAN acceleration.

