SyncIQ - clean up after failback
Hello guys/gals,
Below is a copy/paste from the OneFS WebAdmin guide. I have gone through multiple failovers/failbacks on my virtual cluster without issue; the steps are very simple. One thing that I don't understand is why, after we complete the failback process, we don't get rid of the _mirror policy. It did its job, it pushed new/changed data back to the primary cluster, so why do we need it? Is it an oversight in the documentation, or is there a good reason to keep this policy around?
Thank you
**********************************************
Procedure
- On the primary cluster, click Data Protection > SyncIQ > Policies .
- In the SyncIQ Policies table, in the row for a replication policy, from the Actions column, select Resync-prep. SyncIQ creates a mirror policy for each replication policy on the secondary cluster. SyncIQ names mirror policies according to the following pattern: <policy name>_mirror
- On the secondary cluster, replicate data to the primary cluster by using the mirror policies. You can replicate data either by manually starting the mirror policies or by modifying the mirror policies and specifying a schedule.
- Prevent clients from accessing the secondary cluster and then run each mirror policy again. To minimize impact to clients, it is recommended that you wait until client access is low before preventing client access to the cluster.
- On the primary cluster, click Data Protection > SyncIQ > Local Targets .
- In the SyncIQ Local Targets table, from the Actions column, select Allow Writes for each mirror policy.
- On the secondary cluster, click Data Protection > SyncIQ > Policies .
- In the SyncIQ Policies table, from the Actions column, select Resync-prep for each mirror policy.
After you finish
Redirect clients to begin accessing the primary cluster.
**********************************************
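For readers skimming the quoted steps, the failback flow can be restated as a toy Python sketch. This is illustrative only - the function names are made up, not OneFS APIs - but the mirror-policy naming follows the pattern quoted above, and "upgrade_ir" is the policy name used later in this thread.

```python
# Toy sketch of the documented failback flow. Not OneFS code; the
# function and variable names here are made up for illustration.

def resync_prep(policy):
    """Steps 1-2: Resync-prep on the primary creates a mirror policy
    on the secondary, named <policy name>_mirror."""
    return policy + "_mirror"

def failback(policy):
    mirror = resync_prep(policy)
    actions = [
        f"secondary: run {mirror} to push changes back",   # step 3
        "stop client access to the secondary cluster",     # step 4
        f"secondary: run {mirror} again",                  # step 4
        f"primary: Allow Writes for {mirror}",             # steps 5-6
        f"secondary: Resync-prep {mirror}",                # steps 7-8
        "redirect clients back to the primary cluster",    # after you finish
    ]
    return mirror, actions

mirror, actions = failback("upgrade_ir")
print(mirror)  # upgrade_ir_mirror
```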
bhalilov1
August 10th, 2015 13:00
When the *_mirror policy is created on the DR cluster, a protection domain is created on your PROD cluster. You can see it if you run:
isi_classic domain list
When you're finished and have failed back to PROD, the domain is in the Writable state. If you delete the *_mirror policy, the domain also gets deleted. The next time you fail over and run "isi sync recovery resync-prep" to fail back, you will have to wait for that domain mark job to run again - it may take a long time.
On the other side of the coin, if you decide to keep the mirror policy and the protection domain on your PROD cluster, you will not be able to mv files into the directory that's under the domain, even if it's in the "Writable" state.
bhalilov1
August 10th, 2015 13:00
Yes,
Step 6 changes your protection domain on the PROD cluster from "Write Disabled" to "Writable".
If you run the forward policy at this point, it will fail, since your DR side is still Writable as well.
In step 8, by running the resync-prep from PROD, you change the state of the domain on DR to "Write Disabled".
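The domain-state transitions just described can be captured in a minimal model. Again, this is an illustrative sketch, not OneFS code:

```python
# Protection-domain write states during failback, per the explanation
# above. Illustrative model only.
domains = {"PROD": "Write Disabled", "DR": "Writable"}  # state while failed over

# Step 6: Allow Writes on the mirror's local target makes PROD writable.
domains["PROD"] = "Writable"

# Running the forward (primary -> DR) policy now would fail, because the
# DR side is still writable too.
forward_policy_would_fail = domains["DR"] == "Writable"

# Step 8: Resync-prep on the mirror policy write-disables the DR domain,
# making DR a valid replication target again.
domains["DR"] = "Write Disabled"

print(domains)                    # {'PROD': 'Writable', 'DR': 'Write Disabled'}
print(forward_policy_would_fail)  # True
```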
dynamox
August 10th, 2015 13:00
Hi Burhan,
I guess I don't understand what happens in step 8. Why do we need to run resync-prep? My data has been copied from DR back to the primary cluster via the mirror policy, and now I am ready to restart my primary --> DR replication. Does step 8 do something on the DR cluster to prepare it to become a replication target again (make it read-only)?
Thank you
sluetze
August 11th, 2015 00:00
the mirror policy also leaves behind a snapshot (at least under 7.0.2.x) that grows like hell (the forum censors even harmless words like that) - in our case to several TB in a few weeks - and it is not needed, since it is the snapshot that gets created on the source of a replication. It is deleted automatically after one year. We started deleting it right after failing back, so as not to waste cluster space.
sluetze
August 11th, 2015 02:00
"SIQ-%ID-latest"
The %ID is the ID of the policy.
I had an SR with EMC and they said these snapshots are useless. Currently we are trying either to use our DR process tool to delete these snaps automatically, or we will file a feature request to have OneFS delete them automatically.
Regards
--sluetze
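As a side note, the SIQ snapshot names quoted throughout this thread follow a few recognizable shapes. Here is a rough classifier; the patterns are inferred from the examples in this thread, not from official documentation, so treat them as assumptions:

```python
import re

# Rough classifier for the SyncIQ snapshot names quoted in this thread.
# Patterns inferred from the examples here, NOT from official docs.
PATTERNS = [
    (re.compile(r"^SIQ-[0-9a-f]{32}-latest$"),
     "policy-ID snap (source of replication)"),
    (re.compile(r"^SIQ-[0-9a-f]{32}-restore-latest$"),
     "restore snap (failover/failback)"),
    (re.compile(r"^SIQ-Failover-.+-\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}$"),
     "failover snap on target"),
    (re.compile(r"^SIQ-.+-latest$"),
     "cluster/policy 'latest' snap"),
    (re.compile(r"^SIQ-.+-\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}$"),
     "cluster/policy timestamped snap"),
]

def classify(name):
    # First matching pattern wins; order goes from most to least specific.
    for pattern, label in PATTERNS:
        if pattern.match(name):
            return label
    return "unknown"

print(classify("SIQ-3d510f0edc31e18bdb2100dc1441306e-latest"))
print(classify("SIQ-w2isilonpoc-upgrade_ir_mirror-latest"))
```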
dynamox
August 11th, 2015 02:00
Burhan, this makes sense, but is there any reason to keep the mirror policy around after completing step 8?
dynamox
August 11th, 2015 02:00
Do you remember the name of the policy? I want to check on my 7.1.x cluster.
johnsonka
August 11th, 2015 06:00
Hello Dynamox,
I conferred with the subject matter expert for SyncIQ in Support, and he stated that the reason we do not recommend removing the mirror policy is that doing so destroys the LIN map it uses. So, if you have to redo the failover process, SyncIQ has to recreate the LIN map for the mirror policy from scratch. This can be a HUGE time penalty when your failover window is tight and there are millions of files to scan to build the new LIN map used by the new mirror policy once failover is complete.
I hope this answers some of your question, but please let me know if there is any other information that I can get for you! I am more than happy to help.
dynamox
August 11th, 2015 17:00
Katie,
I am still not following you; maybe you can work through the steps below with me and explain what's happening with these SIQ snapshots, and how the mirror policy plays its role.
here is a test scenario:
policy name = upgrade_ir
1) I am replicating from the primary cluster to the secondary cluster. When I check on the primary cluster I see snapshot SIQ-3d510f0edc31e18bdb2100dc1441306e-latest, and on the secondary cluster I see snapshot SIQ-Failover-upgrade_ir-2015-08-11_19-18-35.
This is what i consider my normal state.
************
2) I am failing over from the primary cluster to the secondary cluster. Upon completing that step, on the primary cluster I still see the same snapshot, but on the secondary cluster I now see two snapshots:
SIQ-3d510f0edc31e18bdb2100dc1441306e-restore-latest
SIQ-Failover-upgrade_ir-2015-08-11_19-33-34
***********
3) I am failing back from secondary to primary. I go to the primary cluster, select my policy, and hit "Resync Prep". At this point I look on the primary cluster and see these snapshots:
SIQ-3d510f0edc31e18bdb2100dc1441306e-latest
SIQ-w2isilonpoc-upgrade_ir_mirror-2015-08-11_19-37-17
SIQ-w2isilonpoc-upgrade_ir_mirror-latest
The secondary cluster has only this snapshot:
SIQ-0050568f48352c87ca553e2004ae341d-latest
Now I go to the secondary cluster and run the upgrade_ir_mirror policy. When I look on the primary cluster I see these snapshots:
SIQ-3d510f0edc31e18bdb2100dc1441306e-latest
SIQ-Failover-upgrade_ir_mirror-2015-08-11_19-41-39
and on secondary
SIQ-0050568f48352c87ca553e2004ae341d-latest.
Next, on the primary cluster I go to Local Targets and select Allow Writes for the upgrade_ir_mirror policy. At this point on the primary I see these snapshots:
SIQ-3d510f0edc31e18bdb2100dc1441306e-latest
SIQ-0050568f48352c87ca553e2004ae341d-restore-latest
SIQ-Failover-upgrade_ir_mirror-2015-08-11_19-48-25
and on secondary
SIQ-0050568f48352c87ca553e2004ae341d-latest
Finally, I go back to the secondary cluster and select Resync-prep for the upgrade_ir_mirror policy. The primary cluster has this snapshot:
SIQ-3d510f0edc31e18bdb2100dc1441306e-latest
and secondary
SIQ-0050568f48352c87ca553e2004ae341d-latest
SIQ-n2isilonpoc-upgrade_ir-2015-08-11_19-51-57
SIQ-n2isilonpoc-upgrade_ir-latest
So the question is: why does the mirror policy need to stay, and why do I end up with 3 SIQ snapshots on the secondary cluster compared to my "normal" state in 1)?
Thank you very much for your time
johnsonka
August 13th, 2015 06:00
Hello dynamox!
In reading through your question, I have some information that I hope helps you understand the mirror policy and the snapshots. The mirror policy is left in place to preserve the LIN map for the policy, so that any future failover/failback will be faster and more efficient. This reduces the time to recovery in the event of an unforeseen catastrophic failure or a planned migration between the clusters.
As for the snapshots, this is what I received from the SyncIQ SME here in Support in regards to your question:
All of the SIQ- snaps interact with yet another database used by the active cluster (whether that's the primary or the secondary) to sync up the LIN map on the other side. That is why you need the SIQ- snaps at all times. If you delete the snaps that a mirror policy references, you have destroyed that mirror policy.
With regard to the snaps pasted at the end of the question [based on naming convention]:
SIQ-0050568f48352c87ca553e2004ae341d-latest --- explained as part of the SIQ- explanation above.
SIQ-n2isilonpoc-upgrade_ir-2015-08-11_19-51-57 -- This one comes from a manually enabled option configured on the primary's policy. It has nothing to do with the mirror. That option can be disabled to prevent creating this snap.
SIQ-n2isilonpoc-upgrade_ir-latest -- Same reasoning as above.
If you have any more questions or there is anything I can clarify, please do let me know.
dynamox
August 13th, 2015 08:00
Katie,
thank you for trying, but your SME's replies are still very ambiguous.
I don't understand why the mirror policy needs to be there, especially for failover. If you look at my "normal" state, I have a snapshot on the primary cluster and an SIQ-Failover snapshot on the secondary. I can fail over to the secondary within seconds, so why the need to keep this mirror policy in place for "future" failovers?
I realize we need snapshots for failover/failback operations, but you have yet to explain why, at the end of failover/failback, I am not back to my original state of one snapshot on the source and one snapshot on the target. In the reply above, why is the SIQ-n2isilonpoc-upgrade_ir-2015-08-11_19-51-57 snapshot not deleted? Why is it left behind and not cleaned up by the platform? Sloppy implementation, or is there a good reason for it?
Thank you
sluetze
August 13th, 2015 09:00
Katie,
also some additional questions:
EMC told me in one of my SRs (I can give you the number if you want) that the SIQ-PolicyID snaps are safe to delete, and I deleted them on ALL of my clusters. Now you write that doing so destroys the mirror policy. Also, the SIQ-PolicyID snaps have an expiration date of one year... so do I have to fail over / fail back every year to preserve this snap?
The snap GROWS depending on the file modifications on the cluster. It grows to several TB (and I do not have a fast-changing environment!).
When I fail over / switch over to the secondary site, the SIQ snap is deleted and a new one is created (as far as I could see). The snap is only valid and useful on the SOURCE of a replication.
Regards
Steffen
johnsonka
August 13th, 2015 12:00
Hello dynamox and Steffen,
Thanks for replying here, we have been looking at this question today and my SME has discovered a couple more things:
In this case, dynamox, you are correct. Engineering added 2 extra redundant snaps for protection in case something unexpected happens during a failback.
As for the mirror policy, it is not required to stay. One keeps the mirror to achieve faster failovers after the first one.
If you choose to keep the mirror policy, then the SIQ snaps should remain, because the mirror policy uses them once it becomes active. Without them the mirror policy will break. If you choose to delete the mirror policy, then the snap below can be deleted as well.
SIQ-0050568f48352c87ca553e2004ae341d-latest -- This one cannot be deleted on its own, since deleting it will destroy the mirror relationship. You can delete it, if you want to, by deleting the mirror policy.
As for the following snaps, they can be removed safely; they are not related to the mirror and are not needed here. As to why they are not removed automatically - that appears to be a design decision.
SIQ-n2isilonpoc-upgrade_ir-2015-08-11_19-51-57 -- This one is fine to delete since it's a manual snap. It's not really tied to the internals of SyncIQ.
SIQ-n2isilonpoc-upgrade_ir-latest -- This one is also fine to delete, for the same reason.
sluetze,
I would like to see the SR number where you were told the SyncIQ snapshots would be OK to remove; I'd like to double-check the information you were given. You can leave it here, or private message me the information if you'd prefer.
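The deletability rules from the reply above can be condensed into a small helper. This is purely illustrative: the policy-ID pattern is inferred from the snapshot names in this thread, and you should verify against your OneFS version before deleting anything.

```python
import re

# Rule of thumb from the discussion above: a SIQ-<policy-id>-latest (or
# -restore-latest) snap backs the mirror policy and must stay while that
# policy exists; the other SIQ- snaps were described as safe to delete.
# Illustrative only - the 32-hex-char pattern is inferred from this thread.
def safe_to_delete(snap_name, keeping_mirror_policy=True):
    if re.match(r"^SIQ-[0-9a-f]{32}-(restore-)?latest$", snap_name):
        # Deleting this snap destroys the mirror relationship, so it is
        # only safe once you have decided to drop the mirror policy too.
        return not keeping_mirror_policy
    # Per the SME, other SIQ- snaps are not tied to the mirror policy.
    return True

print(safe_to_delete("SIQ-0050568f48352c87ca553e2004ae341d-latest"))  # False
print(safe_to_delete("SIQ-n2isilonpoc-upgrade_ir-latest"))            # True
```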
dynamox
August 13th, 2015 12:00
Katie,
none of the snaps I listed were created manually by me; they were all created during failover/failback. So when you say "manual" snap and "not tied to the internals of SyncIQ", I am even more perplexed and confused. If SyncIQ created these snapshots, how can they not be tied to SyncIQ?
I am not doing anything exotic here; feel free to recreate my steps in your environment and see what happens.
sluetze
August 14th, 2015 01:00
Hi Katie,
SR#69012210
I still want to know why the snapshots have an expiration date set when they are so important.
Appreciate your Feedback.
Regards
Steffen