tolinrome1

August 8th, 2012 11:00

RecoverPoint - Replicating large amounts of data

We use a VNX 5300 in our production environment as well as in our DR environment, with RecoverPoint at both sites and a 20 Mb WAN link connecting them.

In RecoverPoint we have 3 consistency groups. Although all 3 groups occasionally reach high load, there is one that is frequently at high load. What I'm trying to determine is why such a large amount of data is being synchronized across the WAN in such short time periods, seconds apart. If you look at the journal and sample images, within literally seconds there are different images of MBs of data being replicated, even large amounts: 10 MB, 19, 38, even 453 and 893 MB! See screenshot. Where does the sample image gather its data from, especially the size in MB? I need to know what data in this group is actually being transferred and why so much. Thanks.

Responses (6)

dschul0323

August 14th, 2012 08:00

Hi tolinrome,

A complete answer would require a little more information, like RP version, number of RPAs, bandwidth reduction settings in RP, latency, etc., but we can start a conversation :-)

The bulk of your answer can probably be arrived at by looking at the source LUNs in the consistency group, the journal volumes on the target side, and your WAN. RP splits the writes to a replicated LUN and sends them across your WAN to the remote site. If the WAN fills up, or the target journal volume is too slow to receive the data, then data queues up on the source side.
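To make the queueing behaviour concrete, here is a minimal fluid-backlog sketch in Python. It assumes a constant link capacity in MB/s and treats writes as arriving in one-second steps; it ignores compression, latency, journal speed and RecoverPoint's actual transfer scheduling, so read it as an illustration of the idea rather than a model of RP internals. The function and parameter names are hypothetical.

```python
def simulate_backlog(writes_mb_per_s, link_mb_per_s):
    """Return the MB still queued on the source side at the end of each second.

    writes_mb_per_s: MB written to the replicated LUNs in each one-second step.
    link_mb_per_s: MB the WAN can drain per second (assumed constant).
    """
    backlog = 0.0
    history = []
    for written in writes_mb_per_s:
        backlog += written                       # new writes join the queue
        backlog -= min(backlog, link_mb_per_s)   # the link drains what it can
        history.append(backlog)
    return history


# Hypothetical workload: a 10-second burst at 10 MB/s, then 40 quiet seconds,
# over a 2.5 MB/s link.
burst = [10.0] * 10 + [0.0] * 40
queue = simulate_backlog(burst, 2.5)
print(queue[9])  # 75.0 MB queued when the burst ends
print(next(i for i, q in enumerate(queue) if q == 0.0))  # 39: first second with an empty queue
```

With 10 MB/s arriving and 2.5 MB/s draining, the queue grows by 7.5 MB every second, and the 10-second burst needs another 30 seconds to clear after the writes stop.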

In my limited experience, and looking at your data, the issue is usually that more data is being written to the source than you can pump through the WAN, since all RecoverPoint is doing is replicating the writes to your source LUNs in all consistency groups. To keep the discussion simple, a 20 Mb/s link is equal to 2.5 MB/s of data (20 megabits divided by 8 bits per byte); this of course excludes many factors like latency and retransmissions.
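The same back-of-the-envelope conversion can be applied to the image sizes from the original post. This Python sketch treats MB and Mb as decimal units, as the simplified math above does, and gives only a best-case, uncompressed transfer time with no latency or protocol overhead; the helper names are made up for illustration.

```python
def mbit_to_mbyte_per_s(link_mbit_per_s):
    """Convert a link rate in megabits/s to megabytes/s (8 bits per byte)."""
    return link_mbit_per_s / 8


def seconds_to_send(size_mb, link_mbit_per_s):
    """Best-case seconds to push size_mb over the link with no compression,
    ignoring latency, protocol overhead and retransmissions."""
    return size_mb / mbit_to_mbyte_per_s(link_mbit_per_s)


print(mbit_to_mbyte_per_s(20))  # 2.5
for size in (10, 19, 38, 453, 893):  # image sizes quoted in the original post, in MB
    print(f"{size} MB -> about {seconds_to_send(size, 20):.0f} s uncompressed")
```

If the journal shows images of that size a few seconds apart, the data cannot be crossing a raw 20 Mb/s link uncompressed in that time, which is what points toward compression or a queued backlog.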

Just glancing at your numbers, it appears that you are running some level of compression in RecoverPoint, or perhaps WAN optimization, as 445 MB in 50 seconds on your 20 Mb/s pipe is 3x+ compression if my math is correct (uncompressed, it takes at least 178 seconds to transmit that amount of data on a 20 Mb/s line). If that is true, then anything more than 7.5 to 10 MB/s (2.5 MB/s times a 3x to 4x reduction) being written to the source is going to fill the pipe and start queueing up on the source side. When this condition occurs, you will see larger chunks of data being sent on a less regular basis, if you accept that RecoverPoint sends whatever is queued up when the last transmission completes.
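The compression estimate and the resulting ceiling on source writes can be checked with a few lines of Python. This is a sketch under the same simplifications as the post: a raw 2.5 MB/s link and no overhead, and the "reduction ratio" lumps together RP compression, deduplication and any WAN optimization. The function names are hypothetical.

```python
LINK_MB_PER_S = 20 / 8  # 20 Mb/s WAN expressed in MB/s (ignores overhead)


def implied_reduction_ratio(mb_sent, seconds, link_mb_per_s=LINK_MB_PER_S):
    """How much more data was replicated than the raw link could carry.

    A ratio above 1 suggests compression/dedup in RecoverPoint or WAN
    optimization is shrinking the data on the wire.
    """
    return mb_sent / (seconds * link_mb_per_s)


def max_sustainable_write_rate(ratio, link_mb_per_s=LINK_MB_PER_S):
    """Source write rate (MB/s) the link can keep up with at a given reduction ratio."""
    return ratio * link_mb_per_s


ratio = implied_reduction_ratio(445, 50)
print(round(ratio, 2))  # 3.56
for r in (3, 4):
    print(r, max_sustainable_write_rate(r))  # 7.5, then 10.0 MB/s
```

Any sustained source write rate above the second number means the backlog can only grow until the writes slow down.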

For example, at 11:54:36 RP finishes replicating a 132 MB chunk. Over the next 35 seconds 235 MB of data is sent, but during that same 35 seconds 445 MB of additional data is written to the source LUN and queued for replication. That 445 MB is replicated once the 235 MB chunk completes replication at 11:55:11, and it finishes at 11:56:01.
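The timeline in that example can be turned into rates with the standard `datetime` module. The timestamps and chunk sizes come from the example; treating each chunk's send time as the whole window and using a raw 2.5 MB/s link are simplifying assumptions, and the helper name is hypothetical.

```python
from datetime import datetime


def window(start, end):
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# (label, start, end, MB replicated) taken from the example timeline
chunks = [
    ("235 MB chunk", "11:54:36", "11:55:11", 235),
    ("445 MB chunk", "11:55:11", "11:56:01", 445),
]

link_mb_per_s = 20 / 8  # raw 20 Mb/s link, no overhead
for label, start, end, mb in chunks:
    secs = window(start, end)
    rate = mb / secs
    print(f"{label}: {secs:.0f} s, {rate:.1f} MB/s, about {rate / link_mb_per_s:.1f}x the raw link")

# 445 MB was written to the source during the first 35-second window,
# so the source was writing faster than the link was draining.
print(f"source write rate in first window: {445 / window('11:54:36', '11:55:11'):.1f} MB/s")
```

Roughly 12.7 MB/s of new writes against 6.7 to 8.9 MB/s of effective transfer is enough to explain why each chunk is bigger than the one before it.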

How much compression are you running on the consistency group (Bandwidth Reduction under the Policy tab)? If you are running at least version 3.4 SP3 P2, do you have deduplication enabled? If you are not at that level or beyond, don't check that box ;-)

My examples are overly simplistic, since I am ignoring the fact that you have other CGs replicating at the same time, but I work best with simple conversation.

These are just my observations and some guesswork is involved, but I hope it helps.

Cheers,

Doug

tolinrome1

August 14th, 2012 14:00

Hi Doug,

Thanks for the informative reply. I understand your logic, and I have to look into a couple of the things you mentioned. The big question I have is: what is causing all these high writes? I understand that increasing the WAN link will help a lot, but I can't think of what it is in my environment that would cause these huge amounts of writes over the WAN.

From what I understand about the VNX and RPAs etc. (which is little so far), any write (block level, or is it file level?) is replicated over the WAN. In one of the CGs I have 3 servers: Exchange, SQL and a 2008 DC. I cannot even think what data could possibly require 235 MB, 445 MB, 560 MB etc. every second, even within the same second. Once the CGs catch up and are synchronized, it starts all over again.

Is it that every single time an email is opened and someone replies, that is a write to a DB, which makes a change, and so on? I need to determine somehow what is actually being written to the journal from the source CGs. I'm still looking into this, thanks.

dschul0323

August 14th, 2012 15:00

I would use the VNX tools on the source side to look at write activity to the LUNs you are replicating. That should lead you to the busy application. RP is simply replicating the application writes.

With the applications you mention, I would bet the traffic is SQL, especially if Exchange is newer than 2003, but look at the VNX performance collection and I bet it will be pretty clear :-)
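Once the performance data has been exported to a CSV file, ranking LUNs by average write throughput is a quick way to spot the busy application. The Python sketch below is only an illustration: the column names `LUN` and `Write MB/s` are placeholders, not the real export format, so rename them to whatever your export actually contains, and the sample rows are made up purely to show the input shape.

```python
import csv
import io
from collections import defaultdict


def top_writers(csv_text, lun_col="LUN", write_col="Write MB/s", top=5):
    """Average write throughput per LUN, highest first.

    lun_col / write_col are placeholder column names (assumptions); adjust
    them to match the headers in your own performance export.
    """
    totals = defaultdict(float)
    samples = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        lun = row[lun_col]
        totals[lun] += float(row[write_col])
        samples[lun] += 1
    averages = {lun: totals[lun] / samples[lun] for lun in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Made-up rows purely to show the input shape; not real measurements.
sample = """LUN,Write MB/s
LUN_A,1.0
LUN_B,9.0
LUN_A,3.0
LUN_B,11.0
"""
print(top_writers(sample))  # [('LUN_B', 10.0), ('LUN_A', 2.0)]
```

Comparing the top LUN's average against the 2.5 MB/s raw link rate shows quickly whether that LUN alone can saturate the WAN.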

Best wishes!

etaljic81

August 14th, 2012 17:00

What exactly are you replicating in that CG? Is it just database and log LUNs? Is it possible that you are replicating the page file? Page file replication could explain these numbers.

StephN_AU

August 20th, 2012 19:00

If you are replicating SQL, is the tempdb drive also being replicated? I don't have much experience with RecoverPoint, but we have several SQL apps on our array that have insanely busy tempdbs. Check with your DBAs, but I would assume these don't need to be replicated, a bit like a page file: on a system reboot SQL tempdbs are recreated, so none of the data in them is persistent.

HTH

Steph

tolinrome1

August 21st, 2012 11:00

No, it's not the tempdbs; we have those on another SAN that's not replicating, and I'll have to check into the page file. But for now I'm going to try to figure out how to monitor LUN writes on the SAN, and maybe that will help pinpoint this problem. Thanks.
