Victor2100

December 6th, 2009 04:00

Celerra Replicator V2: what are the differences between "switchover" and "reverse"?

I just had my Celerra upgraded to 5.6 with Replicator V2. While reading the technical notes and experimenting with replication control, I found that what was called "administrative failover" at the 5.5 code level can now be done either with "switchover" followed by "restart" or "reverse", or simply with a single "reverse".

Unless the goal is to pause replication for a while after enabling NAS services on the destination side, "reverse" seems like a better option than "switchover". Does "switchover" offer more value for simulating a "failover" during a DR test?

Also, source and destination Celerras are normally not described as Active/Active, which is always true for any single replication session. But there seems to be nothing wrong with this arrangement: Celerra 1 is the source of VDM1 and the destination of VDM2, while Celerra 2 is the destination of VDM1 and the source of VDM2.
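To make the per-session point concrete, here is a minimal Python sketch. `Session`, `roles`, and the system names are hypothetical illustrations, not a Celerra API: each session still has exactly one source and one destination, while each Celerra can play a different role for each VDM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical model of one replication session (not a Celerra API)."""
    vdm: str
    source: str
    destination: str


def roles(sessions, system):
    """Return {vdm: role} for every session that involves `system`."""
    result = {}
    for s in sessions:
        if s.source == system:
            result[s.vdm] = "source"
        elif s.destination == system:
            result[s.vdm] = "destination"
    return result


# Each session runs in one direction only; the two sessions run in opposite directions.
sessions = [
    Session("VDM1", source="Celerra1", destination="Celerra2"),
    Session("VDM2", source="Celerra2", destination="Celerra1"),
]

print(roles(sessions, "Celerra1"))  # {'VDM1': 'source', 'VDM2': 'destination'}
print(roles(sessions, "Celerra2"))  # {'VDM1': 'destination', 'VDM2': 'source'}
```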

Can someone shed more light on these questions?

Thanks,

Victor

Dan_McNair

December 7th, 2009 11:00

Hi, Victor.

Bill explained them pretty well, so I won't try to do that again. I will offer an example, though, that may help clarify switchover and reverse. It helps to think about these two commands in the context of a migration. Both are very useful for migrating file systems between Celerras because neither results in data loss. So let's use an example where you order a new Celerra and want to migrate your data to it...

In example 1, you want to repurpose the older Celerra, and keeping a copy of the migrated data on it is not required. In this case you would use the Switchover command after the initial sync finishes, because it literally "switches over" to the destination. The older system (the original source) would no longer receive updates from writes to the file system.

In example 2, you want to keep the older Celerra as a DR system. You need to maintain a copy of the migrated data on the older system with replication from the new system to the old system. In this case you would use the Reverse command after the initial sync is finished because it would do everything that Switchover does plus ensure that replication is started from the new system to the old system to keep the two sides in sync.

Understanding the value of the two commands for a migration will help you see how each could be used to test DR; it just depends on what you are trying to accomplish.

Note that Switchover and Reverse share the same initial steps (1-3) to make the destination read-write:

1. Set the source read-only (the destination is already read-only).

2. Copy the last remaining delta set (at this point neither the source nor the destination can be written).

3. Make the destination read-write/production (you'll likely need to redirect hosts/apps to the destination).

4. (Reverse only) Start replication again in the reverse direction.
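The shared steps 1-3 and the extra step 4 can be modelled as a toy state machine. This is a hypothetical sketch for illustration only: `Site`, `Session`, and the method names are not a Celerra API, and `unsent` is assumed to stand for the writes not yet shipped in a delta set.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    writable: bool
    data: list = field(default_factory=list)


@dataclass
class Session:
    """Toy model of one replication session (hypothetical, not a Celerra API)."""
    source: Site
    destination: Site
    unsent: list = field(default_factory=list)  # writes not yet shipped in a delta set
    active: bool = True                          # is replication running?

    def write(self, item):
        if not self.source.writable:
            raise PermissionError(f"{self.source.name} is read-only")
        self.source.data.append(item)
        self.unsent.append(item)

    def switchover(self):
        self.source.writable = False               # 1. source goes read-only
        self.destination.data.extend(self.unsent)  # 2. copy the last delta set
        self.unsent.clear()
        self.destination.writable = True           # 3. destination becomes production
        self.active = False                        # old source gets no further updates

    def reverse(self):
        self.switchover()                          # steps 1-3 are shared
        self.source, self.destination = self.destination, self.source
        self.active = True                         # 4. replicate new -> old


old, new = Site("old", writable=True), Site("new", writable=False)
s = Session(source=old, destination=new)
s.write("a")
s.write("b")
s.reverse()
s.write("c")  # lands on the new system, queued for the old one
print(s.source.name, s.destination.name, s.active)  # new old True
print(new.data, s.unsent)                           # ['a', 'b', 'c'] ['c']
```

After `reverse()`, new writes land on the new system and queue up for the old one; calling `switchover()` alone would instead leave `active` as `False`, which matches example 1.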

If you want to minimize the time the final copy (step 2) takes, look at the RPO (max time out of sync) setting for the replication session and consider lowering it shortly before the Switchover/Reverse command so that less data needs to be copied in step 2. How low you go really depends on the change rate of your data set. If you have a 24-hour RPO, you may want to drop it to 30 or 15 minutes until the switchover/reverse is finished, then change it back to 24 hours. This isn't required, but it can help minimize your outage window during a Switchover/Reverse by letting less data accumulate in the delta set.
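The RPO advice comes down to simple arithmetic. A rough sketch, assuming a constant change rate, a final delta set no larger than one RPO window's worth of changes, and a link that delivers its nominal bandwidth; the 2 GB/hour and 100 Mbit/s figures are made-up inputs, not measurements.

```python
def worst_case_final_copy(change_rate_gb_per_hour, rpo_minutes, link_mbit_per_s):
    """Rough upper bound on the final delta set (GB) and its copy time (minutes).

    Assumes a constant change rate and that at most one RPO window of changes
    is waiting; ignores compression and protocol overhead.
    """
    delta_gb = change_rate_gb_per_hour * rpo_minutes / 60
    seconds = delta_gb * 8 * 1000 / link_mbit_per_s  # 1 GB = 8000 Mbit (decimal units)
    return delta_gb, seconds / 60


# Compare a 24-hour RPO with the temporarily lowered values suggested above.
for rpo in (24 * 60, 30, 15):
    gb, minutes = worst_case_final_copy(2.0, rpo, 100)
    print(f"RPO {rpo:>4} min: ~{gb:.1f} GB, ~{minutes:.1f} min to copy")
```

With these inputs the 24-hour RPO allows up to roughly 48 GB and about an hour of copying in step 2, versus about 0.5 GB and under a minute at 15 minutes.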

Cheers.

-Dan

BillStein-Dell

Moderator

December 6th, 2009 20:00

Hi Victor.

The "administrative failover" option that was added is called "switchover." Switchover will ensure that you do not lose any data in the transition, so it makes perfect sense to use it for failover testing. During a switchover operation, the following things take place:

1. Writes to the source filesystem are suspended.
2. A deltaset is sent to the destination side.
3. Once the transfer is confirmed, the source side is set to R/O and the destination side is set to R/W.

This is different from the "failover" option, in which no deltaset is generated: the source is immediately set to R/O (if it can be reached) and the destination is immediately set to R/W. During a failover, you will lose any data written since the last deltaset was transferred.
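A toy comparison of what reaches the destination under each option, assuming writes are shipped in order and `shipped` counts the writes already covered by completed deltasets. The function and its parameters are hypothetical, not part of any Celerra tool.

```python
def surviving_writes(writes, shipped, mode):
    """Writes present on the destination after the cutover (toy model).

    writes:  ordered list of writes made on the source.
    shipped: number of writes already covered by completed deltasets.
    mode:    "switchover" ships the remaining deltaset first; "failover" does not.
    """
    if mode == "switchover":
        return list(writes)            # final deltaset copied, nothing lost
    if mode == "failover":
        return list(writes[:shipped])  # everything after the last deltaset is lost
    raise ValueError(f"unknown mode: {mode!r}")


writes = ["w1", "w2", "w3", "w4", "w5"]
shipped = 3  # w1..w3 were in the last completed deltaset
for mode in ("switchover", "failover"):
    kept = surviving_writes(writes, shipped, mode)
    lost = writes[len(kept):]  # kept is always a prefix of writes
    print(mode, "kept", kept, "lost", lost)
```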

Note that once you have switched over or failed over, replication is still configured to run from the original source to the original target, although no updates will be made. This is where the "reverse" option comes in. Reverse changes the direction of replication, so the original source becomes the destination and the original destination becomes the source. If you intend to run on your destination side for any period of time, you will have changes that need to be replicated back to your source side, so you would reverse replication to have the destination replicate back to the source.

Whether you reverse depends on how you want to run your failover test. You may choose to discard the changes made on the destination side (for example, if you simply want to confirm that you can write to the destination correctly and the data is test data only), in which case you would not reverse replication. When you do your next switchover (to return to normal operations) and your source becomes R/W again, you would overwrite your destination and discard any changes made there.

Or, you can switch over and reverse replication so that you keep all of the changes made at the destination. This way, you could run on your destination for an extended period and still switch back very quickly, because there wouldn't be many changes to send across: just one deltaset at the beginning of the switchover that returns you to normal operations.

These options give you far more flexibility when using replication for DR testing. You can switch over, test replication, switch back, and simply discard the changes made at the destination, or you can switch over and reverse replication so that you keep all of those changes. Either way, you can test replication failover without risking data loss.
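The two DR-test strategies differ only in what production runs on after switching back. A toy sketch, assuming `at_switchover` is what the original source held when it was switched over and `after_test` is what the destination holds after the test; the helper name is hypothetical.

```python
def data_after_switch_back(at_switchover, after_test, reversed_replication):
    """Data set production runs on after returning to the original source (toy model)."""
    if reversed_replication:
        # Test changes were replicated back to the original source; the switch-back
        # only ships the final deltaset, so nothing made at the destination is lost.
        return list(after_test)
    # No reverse: the original source resumes with its own copy, and its next
    # update overwrites the destination, discarding the test changes.
    return list(at_switchover)


before = ["a", "b"]
after = before + ["test1", "test2"]
print(data_after_switch_back(before, after, reversed_replication=False))  # ['a', 'b']
print(data_after_switch_back(before, after, reversed_replication=True))   # ['a', 'b', 'test1', 'test2']
```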

Victor2100

December 7th, 2009 11:00

Hi Dan,

Thanks for the examples, which confirmed what I thought.

Any comments on the following?

There is a CIFS replication doc for 5.5 which highlights that if a failover happens while the source side is unavailable, special action is required to ensure that the original source Data Movers (DMs) are not reconnected to the network until their CIFS servers are unloaded and their file systems are set to R/O, to avoid conflicts.

I don't see the same information in the 5.6 doc set.

Thanks,

Victor

Victor2100

December 7th, 2009 11:00

Hi Bill,

Thanks for the explanation.

I understand the difference between failover and switchover.

Between switchover and reverse, may I conclude from your explanation that if the updates made on the original destination side during a DR test can be dropped afterwards, switchover is the only option to use, since it stops replication until further instructions?

If the preference is to keep all updates from both sides and have them replicated back to the original source, either a simple reverse or a switchover followed by a restart in the reverse direction meets the requirement. In my testing, the reverse option changes the current replication direction regardless of whether a failover or switchover happened before. I assume reverse ensures the data is synced, with the same protection as switchover, before changing direction.

For reverse after a failover under 5.6, I'm not sure whether something is missing. There is a CIFS replication doc for 5.5 which highlights that if a failover happens while the source side is unavailable, special action is required to ensure that the original source Data Movers (DMs) are not reconnected to the network until their CIFS servers are unloaded and their file systems are set to R/O, to avoid conflicts.

I don't see the same information in the 5.6 doc set.

Could you advise further?

Thanks,

Victor

Dan_McNair

December 7th, 2009 12:00

Hi, Victor.

I suspect that warning is there to make you consider potential IP address, WINS/NetBIOS, and DNS conflicts. You don't want two systems or CIFS servers trying to use the same IP address, WINS/NetBIOS name, or DNS name at the same time. If you use completely different IP addresses, WINS/NetBIOS names, and DNS names at the destination, I don't think it should be a problem. Otherwise, you need a plan for dealing with this. Perhaps Bill Stein or someone more technical than I am can offer additional details.

Cheers.

-Dan

BillStein-Dell

Moderator

December 7th, 2009 12:00

Victor wrote: "There is a CIFS replication doc for 5.5, which highlighted that if a failover happened while the source side is unavailable, special action is required to ensure the original source DMs won't be connected to the network until its CIFS unloaded, filesystem changed to R/O -- to avoid confliction."

Dan is correct that you want to avoid "split-brain" syndrome or duplicate CIFS servers on your network. If your source was offline or otherwise unavailable during the failover, it would not have been set R/O, and VDMs would not have been unloaded, so the CIFS servers would still be online and the filesystems would still be accessible if it came back up. You would have to ensure that the destination was offline or the VDMs unloaded before you brought the source back online.

I think the lack of documentation in the 5.6 manual is simply an oversight.
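The conflict described in this thread can be expressed as a simple pre-check before the original source is reconnected. This is a hypothetical sketch, not a Celerra tool: the dictionary keys are made up, and the addresses are RFC 5737 documentation addresses. Only two CIFS servers that are both online (VDMs loaded) can clash.

```python
def identity_conflicts(site_a, site_b):
    """Return which identities two CIFS servers would advertise at the same time.

    Toy check with hypothetical dict keys: "online", "ip", "netbios", "dns".
    A server that is offline (or whose VDM is unloaded) cannot conflict.
    NetBIOS and DNS names are compared case-insensitively.
    """
    if not (site_a["online"] and site_b["online"]):
        return []
    found = []
    for key in ("ip", "netbios", "dns"):
        a, b = site_a[key], site_b[key]
        if key != "ip":
            a, b = a.lower(), b.lower()
        if a == b:
            found.append(key)
    return found


source = {"online": True, "ip": "192.0.2.20", "netbios": "FILESRV01",
          "dns": "filesrv01.example.com"}
dr = {"online": True, "ip": "198.51.100.20", "netbios": "filesrv01",
      "dns": "FILESRV01.example.com"}

print(identity_conflicts(source, dr))                       # ['netbios', 'dns']
print(identity_conflicts({**source, "online": False}, dr))  # []
```

A non-empty result means one side has to stay offline, or have its VDM unloaded, before the other is brought back.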

Victor2100

December 7th, 2009 13:00

Thanks for the help,

Victor

dynamox

April 4th, 2012 08:00

Please start a new thread.
