Start a Conversation


March 10th, 2015 13:00

Isilon SyncIQ data replication

We are looking for a way to sync the data in 4 parts by creating separate paths on the source and destination Isilon clusters. The problem we are facing is that we have some 300 files in one directory which need to be synced. Is there any way to sync the data along 4 separate paths from source to destination? One challenge here is that I can't create 4 directories on the source to distribute these 300 files evenly, so is there any solution where I can split these files and sync the data to the destination paths?
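If the split really has to happen without reorganizing the source directory, one option is to compute the grouping outside the cluster. Here is a minimal Python sketch (the file names and sizes are invented) that partitions a flat directory's files into 4 roughly equal groups by total size using a greedy largest-first pass; each group's name list could then drive a separate replication job or per-policy file-name filter:

```python
# Hypothetical sketch: split one flat directory of ~300 files into 4
# roughly equal groups by total byte count, without moving any files.
# File names and sizes below are made up for illustration.

def split_by_size(files, n_groups=4):
    """files: list of (name, size_bytes). Greedy bin-packing:
    place each file (largest first) into the currently lightest group."""
    groups = [{"names": [], "total": 0} for _ in range(n_groups)]
    for name, size in sorted(files, key=lambda f: -f[1]):
        target = min(groups, key=lambda g: g["total"])
        target["names"].append(name)
        target["total"] += size
    return groups

# 300 fake files with sizes between 100 MB and 700 MB
files = [(f"file{i:03d}.dat", (i % 7 + 1) * 100_000_000) for i in range(300)]
groups = split_by_size(files)
for i, g in enumerate(groups):
    print(f"group {i}: {len(g['names'])} files, {g['total'] / 1e9:.1f} GB")
```

Largest-first greedy placement keeps the group totals within one file's size of each other, which is usually good enough for balancing transfer streams.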


March 10th, 2015 14:00

Hi, my company develops an automation platform for Isilon. We would be interested to see if we can help. The product is available through EMC; you can read about it here: Superna - Eyeglass Isilon Edition

Can you confirm whether the 300 files are always the same, or whether new files need to be distributed across the 4 source paths so that SyncIQ can replicate them?

Regards

Andrew


March 10th, 2015 14:00

SyncIQ has flexible policies that allow you to replicate files and folders that meet specific criteria.

You can select from the following options:

- File name

- File type

- Create / access / modify time

- Even regular expressions are supported

I think you can split these 300 files based on one of these criteria, which can then be translated into a policy, as in the screenshot below.

[Screenshot: SyncIQ policy file-matching criteria]
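As a rough illustration of how such criteria partition a file set, here is a small Python sketch; the sample records are invented, and this only mimics the AND-of-predicates selection behavior, not the actual SyncIQ implementation:

```python
# Illustrative only: mimics how file-matching criteria (name pattern,
# type, modified time) select files for a policy. Sample data is made up.
import re
from datetime import datetime

files = [
    {"name": "report_2015.pdf", "type": "file", "mtime": datetime(2015, 3, 1)},
    {"name": "archive.tar.gz",  "type": "file", "mtime": datetime(2014, 1, 5)},
    {"name": "scratch",         "type": "dir",  "mtime": datetime(2015, 2, 1)},
]

def matches(f, name_regex=None, ftype=None, modified_after=None):
    """True if the record satisfies every configured criterion,
    i.e. the AND of the given predicates."""
    if name_regex and not re.search(name_regex, f["name"]):
        return False
    if ftype and f["type"] != ftype:
        return False
    if modified_after and f["mtime"] <= modified_after:
        return False
    return True

selected = [f["name"] for f in files
            if matches(f, name_regex=r"\.pdf$", ftype="file")]
print(selected)  # ['report_2015.pdf']
```

Splitting one directory into several policies then amounts to choosing predicates whose selections are disjoint and together cover all 300 files.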


March 10th, 2015 17:00

Thank you, Gemmy. I started the sync with the size option, but can I create 4 different policies on the source that sync to only one destination path? I haven't been able to, since a previous policy is already syncing to the same target path.


March 10th, 2015 19:00

Are you trying to create policies similar to

/ifs/source1  --> /ifs/target

/ifs/source2  --> /ifs/target

/ifs/source3  --> /ifs/target

/ifs/source4  --> /ifs/target

If so, once the first policy is created, trying to start any of the other jobs will produce the following error:

SyncIQ policy target path overlaps target path of another policy.


March 11th, 2015 05:00

You can replicate to different destinations, not to the same destination.

Remember that SyncIQ is 1-to-1 replication. This is especially helpful when you fail over to the target site.

With a consolidated target, when you fail over you would replicate all data from all sites/folders on DR back to a single production site.

You would end up with data scattered everywhere, which is not logically sound.

March 12th, 2015 07:00

If you have one directory with a lot of files you want to sync, you only need 1 policy.  SyncIQ will split the job up into multiple workers based on the number of nodes in your destination cluster and will synchronize those files in parallel.

I typically sync multiple terabytes and possibly millions of files overnight using a single policy.

Note that the more recent your OneFS release, the better SyncIQ is - there have been significant and noticeable improvements in SyncIQ over the last couple of years.


March 12th, 2015 11:00

We are using OneFS 7.1.1.2 on both the source and target clusters, with 5 nodes in each. I have created only one policy to sync the data from source to target. I first created the policy with the condition <= 1000 KB, which completed in 10 minutes, syncing 140 files totalling 63 MB. Next I edited the same policy with the condition > 1000 KB and <= 1000 MB, which completed in 2 hours, syncing 340 files totalling 40 GB. Then I edited the policy again with the condition > 1000 MB and <= 100 GB; that job has been running for 1.66 days and has completed 831 GB out of 1.7 TB (79 of 118 files). I still have some 30 files between 100 GB and 800 GB in size to sync.

The total size of the folder is 4.8 TB, with 609 files to be synced, and at this pace it will take some 5 to 6 days to complete. Can you suggest any solution that would let me complete it sooner, ideally within 2 days? Could the delay be due to network bandwidth, or something else?
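A quick back-of-the-envelope check on the figures in this post (Python, using only the numbers quoted above):

```python
# Effective throughput of the third job, from the numbers in the post.
gb_done = 831            # GB copied so far
days_elapsed = 1.66
seconds = days_elapsed * 86_400
throughput_mb_s = gb_done * 1024 / seconds     # ~5.9 MB/s effective

# Projecting the whole 4.8 TB folder at that same rate:
total_gb = 4.8 * 1024
gb_per_day = throughput_mb_s * 86_400 / 1024
eta_days = total_gb / gb_per_day
print(f"effective throughput: {throughput_mb_s:.1f} MB/s")
print(f"whole 4.8 TB at this rate: {eta_days:.1f} days")
```

An effective rate of roughly 6 MB/s is far below what even a modest link should sustain, which points at worker count or bandwidth limits rather than the data itself.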

March 12th, 2015 12:00

There's a large fallacy in thinking that running SyncIQ more often is always good; in fact, it can be very bad for you.

Let's say you want your replication cluster to provide longer retention for your backups. If you run your backups/SyncIQ once per day, you only transfer the data that has changed during the day. If you run it every hour, you transfer the data that has changed every hour. Makes sense, right? But what if you have a lot of data that sits around for a few hours and is then deleted? With the daily sync, you might never transfer that data. With the hourly sync, you will, unless you have a very disciplined user community that writes transitory data to a location outside your SyncIQ policies.

More frequent syncs can result in a lot more data being transferred and result in your not even meeting the same RPO as a daily sync schedule.

It's important to think about the end goal and what tools you have to accomplish that goal.


March 12th, 2015 12:00

Manual tuning of the policy like this isn't necessary. Please keep in mind that the first full copy is just that: a full copy. After that, it's block-level incrementals forever. People often get tripped up by another fallacy of logic around job frequency. Here is the gist of it. If a SyncIQ incremental job is set to run every hour, but the incremental jobs are taking 50 minutes, is it possible to make the jobs run more often, and thus lower the RPO? At first glance, most people would say no. But if we make one assumption, namely that the rate of change over a relatively large dataset is fairly constant, then we can say this:

If you were to run the same SyncIQ job every 30 minutes, because it has only half as much changed data to copy as the hourly job, it should complete (excluding overhead from starting and stopping) in about 25 minutes. Let's do it again: if we changed the schedule to every 15 minutes, the job would only have to copy a quarter of the data of the hourly job, and should therefore complete in about 12.5 minutes. So a good rule of thumb for SyncIQ jobs over a WAN: once the full copy has completed, start with a very liberal schedule frequency and tune it down until you meet your desired goal. As others have mentioned, SyncIQ performance in the latest releases can be significantly better than on older releases, especially with large files, because we introduced file splitting: rather than a single worker copying, say, a 5 GB file, it can be broken into smaller chunks, spread across lots of workers, and reassembled on the other side.
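The interval argument can be sketched as a toy model: constant change rate, fixed copy rate, fixed per-job overhead. All numbers below are illustrative, not measured:

```python
# Toy model of the schedule-frequency argument above: data accrues at a
# constant rate between jobs, is copied at a fixed rate, and each job
# pays a fixed start/stop overhead. All parameters are illustrative.
def job_minutes(interval_min, change_mb_per_min=100.0,
                copy_rate_mb_per_min=125.0, overhead_min=1.0):
    """Duration of one incremental job for a given schedule interval."""
    data_mb = change_mb_per_min * interval_min
    return overhead_min + data_mb / copy_rate_mb_per_min

for interval in (60, 30, 15):
    print(f"every {interval} min -> job takes {job_minutes(interval):.1f} min")
```

With these parameters the hourly job takes 49 minutes, the half-hourly job 25, and the quarter-hourly job 13, mirroring the 50/25/12.5 progression described above; each job still finishes inside its own interval.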

~Chris Klosterman

chris.klosterman@emc.com

twitter: @croaking

Senior Solution Architect

EMC Isilon Offer & Enablement Team

March 12th, 2015 12:00

What is your network bandwidth between the clusters?

Let's assume your network is 1 Gbps. Allowing for protocol overhead, you can realistically sustain perhaps 70 MB/sec. That's 252,000 MB per hour, or about 250 GB. You have 4,800 GB, so at that rate the transfer will take about 19 hours. And that assumes you get 100% of that 1 Gbps connection and keep it fully utilized.
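The arithmetic above, spelled out (the 70 MB/sec figure is the assumption; 1 GB = 1,000 MB here to match the estimate):

```python
# The bandwidth estimate from the paragraph above, made explicit.
link_mb_s = 70.0                  # assumed usable throughput on a 1 Gbps link
mb_per_hour = link_mb_s * 3600    # 252,000 MB per hour, i.e. ~250 GB/h
dataset_gb = 4800
hours = dataset_gb * 1000 / mb_per_hour   # using 1 GB = 1000 MB
print(f"~{mb_per_hour / 1000:.0f} GB per hour, full copy in ~{hours:.0f} hours")
```

Comparing this ~19-hour best case against an observed multi-day run is a quick way to see how far below the link's capacity the transfer is actually running.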

The way you're using your policies is not optimal because of the way SyncIQ chunks up files between its workers. You'll be more efficient throwing the entire directory at it, so that some of the larger files can start early and chug away while other streams process lots of the smaller files. You really want as many workers as your load will accept.

On a 5-node cluster, with a default of 3 workers per node, you'll transfer at most 15 files at one time.  You probably want to exceed this default.  With only 15 total workers, you're very unlikely to saturate your network.  It also depends on what kind of nodes you have - NL nodes do not have as much horsepower as S nodes.

Please open a support ticket and have the techs look at your cluster and help you tune the number of workers.  It can make a world of difference.  You'll also need to tell them what network interfaces you have on all of your nodes and how much is used for customer-facing traffic so they can estimate how much bandwidth you have left. 


March 12th, 2015 19:00

Ed and Chris

shortening the interval works well and makes sense as you explain it. And in the end, isn't this exactly what SyncIQ "Continuous" mode was made for? It kind of auto-adapts the sync interval to the shortest possible time span.

-- Peter

March 13th, 2015 06:00

Yes and no - continuous replication works well if you have a slowly growing directory.

If you have a busy source cluster, and little or no growth in the directory you're using as the source for your continuous replication, you can run into problems with snapshot deletes.  You can create a lot of snapshots very quickly.

Note also that continuous replication can require more bandwidth than scheduled replication for the reasons I outlined earlier.

Continuous replication doesn't really "auto-adapt".  It simply creates a snapshot, replicates the data, and starts over again right away.  It can take only a few seconds if there's no new data.


March 13th, 2015 07:00

I have tried it once (in a demo) and, if I remember correctly, it doesn't fire up if there is no new data.

With constant write activity, a new sync starts a few seconds after the previous one has finished; creating many snapshots in this scenario is by design, I would assume. Nevertheless, thanks for the warning!

-- Peter


April 8th, 2015 11:00

Hello,

While creating a policy, you can create filters for folders (include or exclude) or by file name extension. This is the only possible way.

Thanks
