sluetze — 2 Intern, 300 Posts — July 17th, 2015 05:00
hi dynamox,
3a) after doing the failover (allow-write), you have to do a Prepare Resync to set up the mirror policy. You may want to configure a schedule (AFAIK it is set to manual after creation).
3b) it is an incremental copy. If all of the data has changed, this won't help you. I did some testing recently, and as long as you don't break the relation between the normal policy and the mirror policy, you won't have to perform a full sync.
Something more to keep in mind:
The application will not be able to change the 40 TB instantly; that would also take some time. So if the upgrade fails, it may not be necessary to do a complete restore of all the files, only of parts of the data.
Regards
--sluetze
Peter_Sero — 4 Operator, 1.2K Posts — July 17th, 2015 06:00
Check out "overlayfs" for Linux. It's simple: you can always revert to the underlying read-only data by just unmounting and wiping the scratch overlay, which contains only the diffs written since it was mounted. No per-file recoveries, which makes reverting fast. As you just estimated about 2 TB worth of diffs in your case, you would need at least that amount of local scratch.
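A minimal sketch of that mount-and-revert flow, assuming a local scratch disk at /scratch and the read-only data at /data-ro (all paths are hypothetical; needs a kernel with overlayfs, 3.18+, and root):

```shell
# Prepare the overlay directories on the local scratch disk
mkdir -p /scratch/upper /scratch/work /mnt/app

# Mount the overlay: reads fall through to /data-ro,
# all writes land in /scratch/upper
mount -t overlay overlay \
  -o lowerdir=/data-ro,upperdir=/scratch/upper,workdir=/scratch/work \
  /mnt/app

# ... point the application at /mnt/app and run the upgrade ...

# Revert: unmount and wipe the diffs; /data-ro is untouched
umount /mnt/app
rm -rf /scratch/upper /scratch/work
```

The revert is just a directory wipe, which is why it is fast regardless of how many files were changed.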
Stdekart — 104 Posts — July 17th, 2015 06:00
dynamox,
I just want to make sure the procedure for the fail over/fail back is documented in this thread for future reference.
I agree with sluetze that using SyncIQ fail over/fail back would be the best viable option/fastest recovery time.
If there is a failure, simply make the target (pre-upgrade data) read/write and point the app at the DR cluster's IPs.
7.1.0.x (Page 220-221)
https://support.emc.com/docu50220_OneFS-7.1.0-Web-Administration-Guide.pdf?language=en_US
7.1.1.x (Page 238-240)
https://support.emc.com/docu54201_OneFS-7.1.1-Web-Administration-Guide.pdf?language=en_US
7.2.0.x (Page 258-260)
https://support.emc.com/docu56049_OneFS-7.2-Web-Administration-Guide.pdf?language=en_US
Peter_Sero — 4 Operator, 1.2K Posts — July 17th, 2015 06:00
A bit out of left field: why not do a test run of the new app version first,
keeping all changes local to the app server?
WHAT?
Here is what I have in mind:
- take a snapshot of App3/
- mount the snapshot read-only on a test server
- overlay that read-only NFS mount with a local, writable scratch filesystem
- launch the new app version against the overlay
- if the overlay (the diffs to the r/o mount) exceeds the local scratch size... you have to cancel the test...
- otherwise, test the new app to satisfaction
- if the test ends fine, launch the new app in production
- (you will still want to have some "backup" to revert to,
but as an app failure has become less likely,
your regular backup/restore SLA might suffice.
Plus, from the overlay test you might have learned something
about how the new app acts on the data,
and might be able to refine the specific backup strategy on the Isilon.
E.g. imagine the app just rebuilding some index files but
leaving 99.99% of stuff untouched.)
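The steps above can be sketched as shell, under assumed names: the snapshot exported at nfs-server:/ifs/.snapshot/pre-upgrade/App3 and local scratch at /scratch (both hypothetical; the overlay's upper layer must be a local filesystem, only the lower layer is NFS):

```shell
# Mount the Isilon snapshot read-only over NFS
mkdir -p /mnt/app3-ro /scratch/upper /scratch/work /mnt/app3-test
mount -t nfs -o ro nfs-server:/ifs/.snapshot/pre-upgrade/App3 /mnt/app3-ro

# Overlay it with local writable scratch; the app sees a writable tree
mount -t overlay overlay \
  -o lowerdir=/mnt/app3-ro,upperdir=/scratch/upper,workdir=/scratch/work \
  /mnt/app3-test

# Run the new app version against /mnt/app3-test, and periodically
# check how close the diffs are to filling the local scratch:
du -sh /scratch/upper
```

Inspecting /scratch/upper afterwards also shows exactly which files the upgrade touched, which is the information that could refine the backup strategy.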
Just fwiw. Curious to see how the project will evolve, good luck!
-- Peter
dynamox — 9 Legend, 20.4K Posts — July 17th, 2015 06:00
Thanks sluetze. Yes, I don't think they will change 40 TB worth of data; let's say they only change 2 TB. Option 2 would be great if it only reverted the changed files instead of restoring the entire 40 TB.
So for 3a, I will need to complete these 3 steps (part of the fail back procedure) to get my replication going from secondary to primary:
On the primary cluster, click Data Protection > SyncIQ > Policies.
In the SyncIQ Policies table, in the row for a replication policy, from the Actions column, select Resync-prep. SyncIQ creates a mirror policy for each replication policy on the secondary cluster. SyncIQ names mirror policies according to the following pattern: _mirror
On the secondary cluster, replicate data to the primary cluster by using the mirror policies. You can replicate data either by manually starting the mirror policies or by modifying the mirror policies and specifying a schedule.
dynamox — 9 Legend, 20.4K Posts — July 17th, 2015 06:00
Hi Peter,
I am not sure what you mean by "overlay". Neither I nor the app owner knows whether the upgrade process changes only metadata or actual data files, so I have to be able to recover everything; there is no going in and restoring individual files.
dynamox — 9 Legend, 20.4K Posts — July 17th, 2015 08:00
Shane, can you please confirm that fail back is incremental?
Stdekart — 104 Posts — July 17th, 2015 09:00
dynamox,
I can confirm that fail back is incremental, after speaking to a few technical support engineers on the backup team (they cover SyncIQ).
They also made the recommendation to open an SR after the initial sync from source to target, to validate everything is good to go. They can also be available to help with the fail back if needed.
dynamox — 9 Legend, 20.4K Posts — July 17th, 2015 11:00
Thank you Shane. Any particular reason they are recommending to validate the initial sync? We set up SyncIQ policies all the time, and the assumption is that if a policy completes without errors, then all data is on the secondary cluster.
Stdekart — 104 Posts — July 17th, 2015 11:00
dynamox,
Confusion over the steps when setting up SyncIQ for fail over/fail back. Cases in the past have come down to the procedure not being followed correctly, which results in a full sync. It's a better-safe-than-sorry sanity check.
Stdekart — 104 Posts — July 17th, 2015 12:00
dynamox,
Correct, every step is outlined in the links I mentioned earlier.
It's issues with those steps not being followed correctly that result in a full resync.
dynamox — 9 Legend, 20.4K Posts — July 17th, 2015 12:00
This is how the fail back procedure is documented in the online help. I think it's pretty straightforward; is anything missing?
Fail back data to a primary cluster
After you fail over to a secondary cluster, you can fail back to the primary cluster.
Before you begin
Fail over a replication policy.
Procedure
On the primary cluster, click Data Protection > SyncIQ > Policies.
In the SyncIQ Policies table, in the row for a replication policy, from the Actions column, select Resync-prep. SyncIQ creates a mirror policy for each replication policy on the secondary cluster. SyncIQ names mirror policies according to the following pattern: _mirror
On the secondary cluster, replicate data to the primary cluster by using the mirror policies. You can replicate data either by manually starting the mirror policies or by modifying the mirror policies and specifying a schedule.
Prevent clients from accessing the secondary cluster and then run each mirror policy again. To minimize impact to clients, it is recommended that you wait until client access is low before preventing client access to the cluster.
On the primary cluster, click Data Protection > SyncIQ > Local Targets.
In the SyncIQ Local Targets table, from the Actions column, select Allow Writes for each mirror policy.
On the secondary cluster, click Data Protection > SyncIQ > Policies.
In the SyncIQ Policies table, from the Actions column, select Resync-prep for each mirror policy.
After you finish
Redirect clients to begin accessing the primary cluster.
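For reference, the same failback flow can also be driven from the CLI. This is a sketch only: the policy name "mypolicy" is a placeholder, the mirror-policy name assumes the _mirror suffix mentioned above, and the exact command names should be verified against the CLI guide for your OneFS version:

```shell
# On the primary (original source) cluster: prepare resync,
# which creates the mirror policy on the secondary
isi sync recovery resync-prep mypolicy

# On the secondary cluster: push changes back via the mirror policy
isi sync jobs start mypolicy_mirror

# Quiesce clients, run the mirror policy once more, then
# on the primary cluster re-enable writes for the mirror target:
isi sync recovery allow-write mypolicy_mirror

# Finally, on the secondary cluster, turn replication back around:
isi sync recovery resync-prep mypolicy_mirror
```

Then redirect clients back to the primary cluster, exactly as in the GUI procedure.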