tolinrome1

November 12th, 2012 11:00

Data writes and RPA

Hello, I'm trying to better understand how data replication occurs and what exactly is being replicated. I read that RecoverPoint (RP) is block-based rather than file-based, so any change at the block level will be replicated to the remote journal. My consistency groups are often in high load, and I know for sure it cannot be the amount of data we are writing, so I can't figure out what is causing these writes and the high load. So my questions are:

1. Besides the actual writing of data, what constitutes a write to RecoverPoint? I'm thinking maybe it could be backups? If a database or file server is being backed up, does this cause disk writes? Could it be anything else causing these writes, possibly the page file or something similar?

2. If I look at the diagram of the consistency group in the RP Management Application (see attached file), I see that replication is going from the production site to the remote journal. Why is the local journal greyed out?

Attachment: rpa.PNG

forshr

November 12th, 2012 12:00

Hi there,

The comments below might be helpful.

1. The splitter captures writes only and sends these to the RPA. It's perfectly feasible that backups could be causing writes if the target of the backup is a source LUN in a CG. However, backups can cause highloads in other ways, e.g. if they are used in conjunction with image access and the distribution of the writes to the journal cannot be completed fully because the image is in a prolonged state of access to facilitate a backup.

2. This is normal and depicts the data writes to the remote/target journal as part of the distribution phase. Writes are not written to the local journal except in the event of failover.
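To make point 1 concrete, here is a toy Python model of a write splitter. It is not RecoverPoint code; the class names, LUN names and I/O sources are hypothetical. It illustrates that any write to a protected LUN reaches the RPA whatever issued it (database, page file, swap), while reads and writes to LUNs outside a CG never do.

```python
# Toy model only, not RecoverPoint code; all names below are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IO:
    source: str   # hypothetical label for whatever issued the I/O
    op: str       # "read" or "write"
    lun: str      # LUN the I/O targets
    length: int   # bytes transferred


@dataclass
class ToySplitter:
    """Toy write splitter: only writes to protected LUNs are sent on to the RPA."""
    protected_luns: Set[str]
    forwarded: List[IO] = field(default_factory=list)

    def intercept(self, io: IO) -> None:
        # Reads are never split, and LUNs outside a consistency group are ignored.
        if io.op == "write" and io.lun in self.protected_luns:
            self.forwarded.append(io)

    def bytes_by_source(self) -> Dict[str, int]:
        # Sum forwarded bytes per source to see who generates replication traffic.
        totals: Dict[str, int] = {}
        for io in self.forwarded:
            totals[io.source] = totals.get(io.source, 0) + io.length
        return totals


splitter = ToySplitter(protected_luns={"LUN_A"})
for io in [
    IO("database", "write", "LUN_A", 8_192),
    IO("backup agent", "read", "LUN_A", 1_048_576),   # backup reading a source LUN
    IO("backup agent", "write", "LUN_B", 1_048_576),  # backup target is not replicated
    IO("page file", "write", "LUN_A", 65_536),
]:
    splitter.intercept(io)

print(splitter.bytes_by_source())  # {'database': 8192, 'page file': 65536}
```

In this model a backup that only reads a source LUN and writes to an unreplicated target adds nothing, which matches the setup described later in the thread; page file or swap writes on a protected LUN do count.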

I would recommend running the CLI 'detect_bottlenecks' command using option 3 for highload detection to try to determine what is causing the highloads. Highloads during normal operation (without initialization) occur for three main reasons: excessive throughput at the RPAs, a slow WAN, or slow write distribution at the remote journal/replica.
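The three causes above can be pictured as a pipeline where highload appears as soon as any one stage is slower than the incoming write rate. The sketch below is a hypothetical back-of-the-envelope model, not what 'detect_bottlenecks' computes; every capacity figure and the `wan_reduction` ratio are assumed inputs.

```python
from typing import List


def saturated_stages(incoming_mbps: float,
                     rpa_mbps: float,
                     wan_mbps: float,
                     distribution_mbps: float,
                     wan_reduction: float = 1.0) -> List[str]:
    """Toy model (not detect_bottlenecks): list the stages that cannot keep up.

    All capacities are hypothetical inputs in MB/s. wan_reduction is an assumed
    ratio by which compression/deduplication shrinks the data sent over the WAN.
    """
    stages = {
        # Every split write must pass through the RPAs.
        "excessive throughput at the RPAs": (incoming_mbps, rpa_mbps),
        # Only the reduced (compressed/deduplicated) stream crosses the link.
        "slow WAN": (incoming_mbps / wan_reduction, wan_mbps),
        # All writes must eventually be distributed to the remote journal/replica.
        "slow distribution at the remote journal/replica": (incoming_mbps, distribution_mbps),
    }
    return [name for name, (load, capacity) in stages.items() if load > capacity]


print(saturated_stages(120, rpa_mbps=200, wan_mbps=40,
                       distribution_mbps=150, wan_reduction=2.0))
# ['slow WAN']
```

Compression and deduplication presumably trade RPA CPU for WAN bandwidth, so in this model they raise `wan_reduction` but could lower the effective `rpa_mbps`.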

Regards,

Richard Forshaw

Principal Corporate Systems Engineer - RecoverPoint (EMEA)

Data Mobility Business Unit - Enterprise Storage Division

EMC2 Computer Systems (UK) Ltd

tolinrome1

November 12th, 2012 12:00

Hi Richard,

The backups target either tape media or a LUN that is not replicated and is not seen by the RPAs, so I suppose that cannot be it. I do not understand what you mean by the following; could you please explain it in other terms?

"backups can cause highloads in other ways, e.g. if they are used in conjunction with image access and the distribution of the writes to the journal cannot be completed fully because the image is in a prolonged state of access to facilitate a backup."

I did previously run the CLI 'detect_bottlenecks' command and sent the report to support, and they didn't see anything out of the ordinary. I know that increasing bandwidth would help, but it would probably just mask the constant writes that would still be happening. I'm looking for a way to see disk writes in the VMware environment.

forshr

November 12th, 2012 14:00

We recommend using RecoverPoint's VSS utility (kvss), available from v3.5, to facilitate database-consistent images in conjunction with the bookmarking process for Exchange or SQL databases. I'm not aware of any instances of highloads being associated with this type of activity.

You have mentioned a VMware environment. Do you perform any vMotion or Storage vMotion (svMotion) activities?

Regarding my previous comment about highloads due to prolonged image access: this happens when an image is placed into logged image access mode and the RPA is unable to complete the distribution of new writes. This is because image access prevents the completion of phases 2-5 of their distribution to the journal and replica. Only phase 1 is completed, which is the write to the head of the 'do' queue in the journal. You can see this as journal lag in the Journal tab of the CG copy folder.

Eventually the 'do' queue will fill up, and writes are then also written to the 'undo' queue, which acts as a back-up buffer. This causes a state of permanent highload and eventually pauses replication. At this point the local RPAs will send metadata of new writes to the local journal, and no replication will take place until image access is disabled.
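The sequence described above (phase 1 only, the 'do' queue filling, spill into the 'undo' queue, then a pause) can be sketched as a simple simulation. This is a hypothetical model, not RecoverPoint's actual journal logic: the steady write rate, the queue capacities and the rule that replication pauses once both buffers are full are all assumptions.

```python
from typing import Optional, Tuple


def simulate_image_access(write_mb_per_min: float, minutes: int,
                          do_capacity_mb: float, undo_capacity_mb: float
                          ) -> Tuple[Optional[int], Optional[int]]:
    """Toy model of journal filling during logged image access.

    Returns (minute the 'do' queue overflows into 'undo', minute replication
    pauses), with None for events that never happen. Capacities, the steady write
    rate and the "pause when both buffers are full" rule are assumptions.
    """
    do_used = undo_used = 0.0
    highload_at: Optional[int] = None
    paused_at: Optional[int] = None
    for minute in range(1, minutes + 1):
        # Phase 1 only: writes land at the head of the 'do' queue. Image access
        # blocks phases 2-5, so nothing is ever drained from it here.
        to_do = min(write_mb_per_min, do_capacity_mb - do_used)
        do_used += to_do
        spill = write_mb_per_min - to_do
        if spill > 0:
            # 'do' queue is full: the 'undo' queue is used as a back-up buffer.
            if highload_at is None:
                highload_at = minute
            to_undo = min(spill, undo_capacity_mb - undo_used)
            undo_used += to_undo
            spill -= to_undo
        if spill > 0:
            # Nowhere left to buffer: replication pauses until image access ends.
            paused_at = minute
            break
    return highload_at, paused_at


print(simulate_image_access(100, 60, do_capacity_mb=1000, undo_capacity_mb=500))
# (11, 16)
```

With these made-up figures, highload starts once the 'do' queue is full (minute 11) and replication pauses five minutes later; a larger journal only delays the same outcome while image access stays enabled.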

Can you confirm your version, and whether you are using compression with deduplication?

tolinrome1

November 12th, 2012 14:00

I'm wondering whether Microsoft VSS snapshots on one of the replicated servers could be causing this, and if so, how?

DonRush

November 12th, 2012 14:00

To add to this: since this is a VMware environment, are you replicating the VMswap? If the ESX server has to swap to disk, that could cause the highloads. In a RecoverPoint setup you do not need to replicate the VMswap.

tolinrome1

November 12th, 2012 14:00

I have version 3.4 SP2 and am using medium compression and deduplication. Thanks.

forshr

November 12th, 2012 15:00

The old vmswap replication debate: should I or shouldn't I? What is the best practice? Well, there's no right or wrong answer. Replicating swap will incur a high overhead for both the RPAs and the link. I guess as long as it doesn't cause any performance degradation to the environment as a whole, or any detriment to other CGs, then it can be replicated.
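To put "a high overhead for both the RPAs and the link" into rough numbers, here is a back-of-the-envelope sketch: each MB/s of sustained writes on a protected LUN becomes about 84 GB/day of replication traffic before any reduction. The write rate and reduction ratio below are hypothetical, not measurements from this environment.

```python
def daily_wan_gb(write_mb_per_s: float, reduction_ratio: float = 1.0) -> float:
    """Rough GB/day sent over the WAN for a sustained write rate.

    reduction_ratio is an assumed compression/dedup factor (1.0 = none).
    """
    seconds_per_day = 24 * 60 * 60
    return write_mb_per_s * seconds_per_day / reduction_ratio / 1024


# Hypothetical figures: 2 MB/s of sustained swap writes, 2:1 reduction.
print(daily_wan_gb(2.0, reduction_ratio=2.0))  # 84.375
```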

Certainly worth raising as a potential issue - thanks Don.

My recommendations for immediate action would be to run the CLI 'balance_load' command, upgrade to 3.4 SP4 and turn off dedup for a period of time.

tolinrome1

November 12th, 2012 16:00

Yes, we are replicating the vmswap. I'd have to research what repercussions, if any, might follow if we decide not to replicate it.

Thanks for the recommendations to upgrade to 3.4 SP4, run the 'balance_load' command and turn off dedup. Of course, before doing any of that I'd have to explain why those steps may improve performance, which would take research on my part. Thanks.
