3 Posts
March 4th, 2010 11:00
RecoverPoint and SRM for DR
We've had RecoverPoint running for DR purposes for about a year now. We just installed and configured SRM for VMware. We only use RecoverPoint for DR, so we don't really care how many snapshots we have at the DR location, just that we have the latest possible snapshot. We need to test our DR scenario from time to time, but need to do it in a test network "bubble" with no connection to our normal network at all. SRM provides for that. However, we are finding that when SRM brings up a snapshot volume (with virtual access, I think), the servers we're testing run out of space within 10-15 minutes and blue screen.
We are assuming that's because the changes being made in production are being queued in the Remote Journal LUN as well as any changes we're making to the snapshot image we're testing with. I was wondering if the Source and Remote Journal LUNs have to be the same size? We have SAN space at the DR site to increase the Journal sizes, but not on the production side. If they can be bigger at the target site, is there a way to keep the snapshots from filling up the LUN so we can have as much free space for testing as possible? Thanks.
Doug


RickBrunner1
117 Posts
March 4th, 2010 17:00
Doug,
Niv's response is spot on if you are running RP 3.3, which, since it has not GA'd yet, I assume you are not. If you upgrade to RP 3.3 when it GAs in a couple of weeks, then Niv's suggestions are one option for resolution.
In your current version, you are correct: SRM in test mode accesses the volumes through virtual access. In RP 3.3 we use logged access.
In virtual access we limit the amount of change to 40GB per RPA. This space is used to track the changes the DR host makes to the replica LUN; it is not used to track the data being replicated, since the journal itself is used for that. When the virtual access area is depleted, a failed I/O is returned to the DR host. If the affected volume happens to be a VM system drive, a blue screen results.
I would guess that you are using the default location for VM swap files in your environment, since you mention how quickly the space fills up. This matters because when VMware powers up a VM, it zeroes the swap file, which means the DR test generates roughly (number of VMs) * (swap file size per VM) of change at boot. Assuming 10 VMs, each with 4GB of memory assigned and a default swap file, this equals 40GB of change at boot, which could fill up the virtual access area. Even with 2GB VMs you would still consume 50% of the space, leaving only 20GB for any additional changes the applications need to make during your test.
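To make that arithmetic concrete, here is a rough Python sketch. It assumes each VM's swap file equals its assigned memory (no memory reservation) and that all the test VMs land on a single RPA; the function names are made up for illustration, and the 40GB figure is simply the per-RPA limit quoted above.

```python
# Rough estimate of how much of the virtual access area is consumed
# just by zeroing .vswp files when the test VMs power on.
# Assumptions (not from RecoverPoint docs): swap file size == assigned
# memory, and every VM in the test is handled by the same RPA.

VIRTUAL_ACCESS_LIMIT_GB = 40  # per-RPA limit quoted in this thread


def swap_change_gb(vm_count, memory_gb_per_vm):
    """Change written at boot if each VM zeroes a swap file the size of its memory."""
    return vm_count * memory_gb_per_vm


def remaining_gb(vm_count, memory_gb_per_vm, limit_gb=VIRTUAL_ACCESS_LIMIT_GB):
    """Space left for application changes after the swap files are zeroed."""
    return max(limit_gb - swap_change_gb(vm_count, memory_gb_per_vm), 0)


if __name__ == "__main__":
    for mem_gb in (4, 2):
        used = swap_change_gb(10, mem_gb)
        left = remaining_gb(10, mem_gb)
        print(f"10 VMs x {mem_gb}GB: {used}GB written at boot, {left}GB left")
```

With ten 4GB VMs nothing is left for the test itself; with ten 2GB VMs half the area is gone before any application writes a byte.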
Since by default .vswp files are created on the same device where the .vmx files are stored, the .vswp files add to the overall usage of the replicated infrastructure (SAN fabric, RecoverPoint, and bandwidth) even though they are of no use in disaster recovery scenarios. As a best practice recommended by VMware (see the SRM release notes) and EMC, it is advisable to change the VM swap file locations to point to a non-replicated VMFS SAN LUN at both the source and target sites. (In fact, you should look at this as a good time to review all application/OS swap file and scratch space usage and determine whether there is a good way to isolate these on non-replicated LUNs.) This does require a reboot of the VM for the change to take effect. I would typically do this by going into the settings of each ESX server and assigning it a unique VMFS LUN for exclusive swap file use, which also balances the load. The location of the .vswp file for a given VM will then be determined by which ESX server it is running on. This is a recommendation regardless of the version of RecoverPoint in use. In fact, it is a recommendation when replicating VMware with any replication tool, period. It makes both technical and business sense to adhere to this best practice.
This should help you in your current version, but ultimately the correct long-term solution is to use RP 3.3, since it changes the access method to logged access and allows the change tracking area to be made larger by changing the policy and/or increasing the journal size.
Hope this helps
-rick
dareames
3 Posts
March 8th, 2010 13:00
Thank you both for your answers! I didn't even know about the ability to increase the portion of a journal for TSP.
Rick, are you saying that each RPA has 40GB set aside for tracking changes to the replica LUNs? If so, where is this located, local to each RPA or on the SAN somewhere? Can it be increased? The data being replicated to the DR site during a test is stacking up in the journal on the DR side or the Source side? Basically what you are saying is, there is no way to do a big functionality test in DR with version 3.2. If we upgrade to 3.3 when it is released, it will use logged access mode. Where will the changes to the replica LUNs be placed? How can we control the size of it?
You are correct. We use the VMware defaults for .vswp files. They are stored with the VMs. We are currently looking into following your suggestion; configuring each host to have a dedicated LUN for .vswp files. It would be pretty easy to do this for the .vswp files, but not so easy for the Windows swap files within each VM.
By the way, how will I know when the RP 3.3 GA's? Thanks!
Doug
RickBrunner1
117 Posts
April 7th, 2010 05:00
Doug,
Yes. In virtual access mode each RPA is limited to 40GB of change tracking, local to that RPA if I recall, but I need to validate that.
No, it cannot be increased.
Data being replicated is always recorded in the journal itself at the DR site. One of the unique features of RP is that you can do a DR test but still protect your production site with replication, i.e., you do not need to pause replication during a test.
In logged access mode, changes are tracked in the journal's Target Side Process area (the default is 20% of the overall journal size, but this can be increased via a GUI/CLI policy change on a per copy/journal basis).
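As a quick way to size this for a test, the sketch below works out how much room the Target Side Process area gives you for a given journal size and percentage. The function name and the example journal sizes are hypothetical, and the 0-100 range check is only a sanity guard, not a documented RecoverPoint limit.

```python
# Back-of-the-envelope sizing for logged access: the space available for
# test writes is the Target Side Process share of the DR-side journal.
# The 20% default comes from this thread; everything else is illustrative.

DEFAULT_TSP_PERCENT = 20


def tsp_area_gb(journal_gb, tsp_percent=DEFAULT_TSP_PERCENT):
    """Journal space (GB) available for tracking changes made during a test."""
    if not 0 < tsp_percent < 100:
        raise ValueError("tsp_percent must be between 0 and 100")
    return journal_gb * tsp_percent / 100


if __name__ == "__main__":
    # Hypothetical journal sizes, to compare growing the journal
    # against raising the percentage in the policy.
    for journal_gb, percent in ((100, 20), (300, 20), (100, 50)):
        area = tsp_area_gb(journal_gb, percent)
        print(f"{journal_gb}GB journal at {percent}%: {area:.0f}GB for test changes")
```

Either lever works: a larger DR-side journal at the default percentage, or a higher percentage on the existing journal at the cost of snapshot history.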
Sorry for the delay in answering. I've been too busy of late and must have missed the alert on the questions.
In any case, RP 3.3 went GA on March 17th, I believe, and RPA 3.3 SP1 will GA in June.
-rick