Unsolved

March 1st, 2016 10:00

XtremIO + RecoverPoint Limitations

I've requested a code update to the most recent version, but to my knowledge there are still what I consider pretty severe limitations when replicating XtremIO volumes with RecoverPoint.

First, there is a limit of 128 Consistency Groups per RP cluster, if I'm not mistaken. If I create a consistency group per copy, then I'm limited to replicating 128 volumes within a single RP cluster.

So my thought was that instead of one Consistency Group per production volume/copy, I'd create one Consistency Group per VMware cluster (since my XtremIO array holds VMware data only). The problem with that approach is journal loss: adding a new production volume to a Consistency Group wipes out the journal, as does removing a volume to resize it and then adding it back in, which is exactly what happens whenever I have to add capacity to the VMware cluster.
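
To put rough numbers on the trade-off, here's a quick back-of-the-envelope sketch (the volumes-per-cluster figure is just an assumption on my part, not a real number from our environment):

```python
# Rough CG budget math for the two grouping strategies (illustrative numbers only).
RP_CG_LIMIT = 128  # Consistency Groups per RecoverPoint cluster

def cgs_needed(total_volumes: int, volumes_per_cg: int) -> int:
    """CGs consumed when volumes are grouped volumes_per_cg at a time."""
    return -(-total_volumes // volumes_per_cg)  # ceiling division

# One CG per volume: the CG limit becomes a hard volume limit.
print(cgs_needed(128, 1))   # 128 -> already at the ceiling with 128 volumes

# One CG per VMware cluster, assuming ~10 volumes per cluster (made-up figure):
print(cgs_needed(130, 10))  # 13 -> lots of headroom, but every membership change costs the journal
```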

Am I correct that these two limitations still exist in the current version of RecoverPoint?

If so, how are the rest of you users of XtremIO and RecoverPoint working around these limitations?

18 Posts

March 1st, 2016 11:00

Hi jgebhart!

You are correct about the 128 CG limit with XtremIO / RecoverPoint. However, that does NOT mean you can only replicate 128 volumes, since you can have multiple volumes per CG.

You are also correct in your thinking about modifying a CG's replication sets; that has always been the case, to my knowledge. Lately I have been advising customers to look at RecoverPoint for VMs, especially for these concerns and for cross-consistent bookmarking where VMware is used.

One way around this, if you are creating a single CG per volume, is to create fewer, larger datastores to stay under the 128 CG limit. I'm curious to hear what others can add to this!
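
As a rough sizing sketch (the capacity figure below is invented, purely to show the math):

```python
# If you keep one CG per datastore, the 128-CG ceiling dictates a minimum datastore size.
RP_CG_LIMIT = 128

def min_datastore_tb(protected_capacity_tb: float) -> float:
    """Smallest datastore size that fits the protected capacity within the CG budget."""
    return protected_capacity_tb / RP_CG_LIMIT

print(round(min_datastore_tb(400), 2))  # ~3.13 TB datastores for 400 TB of replicated data
```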

14 Posts

March 1st, 2016 11:00

I chose not to go with RecoverPoint for VMs because I wanted a single replication method for the array, and it's conceivable that we could end up with volumes on the XtremIO array that don't belong to a VM.

And the problem I see with creating larger datastores is simply that they become more cumbersome to replicate. I use 2TB datastores because 2TB will replicate in a reasonable amount of time over the WAN connection we have.

I'm really starting to miss SnapMirror... so simple and effective, comparatively. Resizing volumes and LUNs is no problem, and only deltas are sent as long as you have a common snapshot.

14 Posts

March 1st, 2016 12:00

Avi, how do you typically see datastores and datastore clusters allocated? I've chosen (and it's not too late to change it) to create datastores based on the data protection requirements of the data. For example, one with no data protection at all (guest swap disks, tempdb disks, etc.), one with only local snapshotting performed on the XtremIO (where operational recovery is convenient for the admin, but not required by SLA), and a third with remote replication to our DR site (where bound by SLA with our customers).

Basically, I will create enough 2TB volumes/datastores of each type to satisfy its capacity requirement and group them into a datastore cluster per type. So there's one "remote replication" datastore cluster per VMware cluster, and I've put all of those volumes in a single CG.
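
In case it helps to see the scheme laid out as data, here's a sketch (tier names and the capacity figure are placeholders I made up for illustration):

```python
# Sketch of the three protection tiers described above; all names are illustrative.
DATASTORE_TB = 2  # self-imposed datastore size cap

protection_tiers = {
    "no-protection": "guest swap disks, tempdb disks",
    "local-snaps":   "XtremIO snapshots only, operational recovery",
    "replicated":    "RecoverPoint replication to the DR site, SLA-bound",
}

def datastores_needed(tier_capacity_tb: int) -> int:
    """2TB datastores required to cover a tier's capacity requirement."""
    return -(-tier_capacity_tb // DATASTORE_TB)  # ceiling division

print(datastores_needed(11))  # 6 datastores for an assumed 11 TB tier
```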

727 Posts

March 1st, 2016 12:00

Typically, customers create consistency groups that provide some logical level of separation. Creating a CG per volume and creating a CG for the entire virtual environment are the two extremes, and neither is typically done in the field. For example, if you want to maintain consistency at the application level, you might create a consistency group for the set of VMs involved in that application.

522 Posts

March 1st, 2016 17:00

I'm not sure how you would create a CG for a group of VMs unless those VMs were 1:1 with the datastores their VMDKs live on, and that would likely go against any LUN consolidation efforts on the XtremIO. Otherwise, since you are replicating at the LUN level, grouping by VMs at the CG level could be confusing, because in most scenarios many VMs/VMDKs sit on a single datastore.

If you are tiering your datastores/datastore clusters as you describe above, then for the remote replication datastores: are you placing all of your VMs that require replication on those datastores, or are you breaking them up into multiple LUNs where each datastore holds VMs related to each other? Either way, I wouldn't worry too much about the 2TB limit with RP/XtremIO replication, in my experience. Is there something specific you are worried about?

I think the tiering you have set up is OK, and it really comes down to how you want to fail over: at a VM level or at a LUN level. If you want the former, something like SRM could be ideal for you. If you have to group CGs together, you could also use group sets, though parallel bookmarking is not supported.

14 Posts

March 2nd, 2016 07:00

We're a little weird in the way we do things. We have right around 300 VMs, and there's an extremely small number of those (3-5% is probably a high estimate) which we would ever desire to recover in full at our DR site. Instead, our SOP is to create unique VMs at the DR site which basically sit and wait for us to attach data volumes to them from production either in a real DR situation, or more often, in a DR Test situation and then just start the application. For the most part, all we care about replicating is the data volumes, not the actual VMs or their OS disks or backup disks.

SRM won't work for us due to limitations in the way we've architected several parts of our infrastructure and DR plan.

The 2TB limit I'm speaking of is a self-imposed limit. It has nothing to do with my confidence in the capabilities of XtremIO or RecoverPoint. It has more to do with our available WAN bandwidth: how long it would take to "re-replicate" a 4TB+ volume versus a 2TB volume if we resized one, and how likely we'd be to violate SLAs and put our RPO and RTO at risk by losing restore points when modifying a CG wipes out the journal. I suppose we could partially get around that by using a snapshot schedule on the destination side and recovering from snapshots outside of RecoverPoint if we had to.
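
For anyone curious, the WAN math behind the 2TB cap looks roughly like this (the link speed and efficiency factor below are assumptions, not our actual circuit):

```python
# How long a full "re-replicate" sweep takes over the WAN -- the reason for the 2TB cap.
def full_sweep_hours(volume_tb: float, wan_gbps: float = 1.0, efficiency: float = 0.7) -> float:
    """Hours to push a whole volume across a WAN link at a given usable fraction."""
    bits = volume_tb * 1e12 * 8               # decimal TB -> bits
    usable_bps = wan_gbps * 1e9 * efficiency  # protocol/dedup overhead rolled into one factor
    return bits / usable_bps / 3600

print(round(full_sweep_hours(2), 1))  # ~6.3 h for a 2TB volume
print(round(full_sweep_hours(4), 1))  # ~12.7 h for 4TB -- twice the exposure window after a resize
```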

Also, we're only going to be replicating about 40 volumes to start with, so we've got some room to grow before we hit the limit of 128 CGs per RP cluster. I'd just like to architect the solution with some forethought so we don't have to redo things later, since we are seeing about 50% YOY growth in our VMware environment.
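
A quick projection of that growth (assuming roughly one CG per volume, which is the worst case for us):

```python
# Years until 50% YoY growth exhausts the 128-CG ceiling, starting from ~40 volumes.
import math

START_VOLUMES, CG_LIMIT, GROWTH = 40, 128, 1.5

years = math.log(CG_LIMIT / START_VOLUMES) / math.log(GROWTH)
print(round(years, 1))  # ~2.9 years of headroom at worst case

for year in range(5):
    print(year, int(START_VOLUMES * GROWTH ** year))  # 40, 60, 90, 135, 202
```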

28 Posts

March 4th, 2016 13:00

Avi,

Imagine an all-database environment hosted on XtremIO together with SRM and RecoverPoint. If the requirement is to fail over at the database-server level, the only option is to dedicate a datastore per database server and create a CG for that server. We could only protect 128 such servers, since 128 is the CG limit in RecoverPoint.

143 Posts

March 17th, 2016 12:00

I'll just throw in one additional tip for all the VMware folks out there -- VMware vSphere Replication, included in your existing vSphere licensing, allows for "per VM" replication (and individual VM failover/recovery). It's super simple to set up and works great. Version 6 and later support compression across the WAN. It's not as robust as RecoverPoint (only 24 recovery points are supported, there is no concept of "consistency groups", and the smallest RPO is 15 minutes), but it's a great solution and you already own it, so it might be worth exploring, at least for the "bottom tier" of VMs. Then maybe use RecoverPoint for the "top tier." The best thing I like about vSphere Replication is that you can fail over individual VMs, rather than entire LUNs. That makes DR testing for certain apps really easy -- just power up the replica VM at the remote site with the NIC disconnected.

274.2K Posts

March 17th, 2016 13:00

You really should be comparing it to RP4VMs, which does have individual VM replication and failover, with the added benefits of Consistency Groups, RPOs from zero seconds up to minutes, RDM replication, advanced WAN acceleration, VMDK exclusion, and others.

Mark Collins

(720) 841-1935

727 Posts

March 18th, 2016 14:00

And there is a promotion around RP4VM and XtremIO coming soon. Feel free to reach out to your account team and have them contact us if they are not already aware of it.
