We are looking for an option to sync the data in 4 parts by creating separate paths on the source and destination Isilon clusters. The problem we are facing is that we have some 300 files in one directory that need to be synced. Is there any way to sync the data across 4 separate paths from source to destination? One challenge is that I can't create 4 directories on the source to distribute these 300 files evenly, so is there any solution where I can split these files and sync the data to the destination paths?
Hi, my company develops an automation platform for Isilon. We would be interested to see if we can help. The product is available through EMC and you can read about it here: Superna - Eyeglass Isilon Edition
Can you confirm whether the 300 files are always the same, or whether new files need to be distributed across 4 source paths so that SyncIQ can replicate them.
SyncIQ has flexible policies that allow you to replicate files and folders that meet specific criteria.
You can select from the below options:
- File name
- File type
- Create / Access / Modify time
- Even regular expressions are supported
I think you can split these 300 files based on criteria that can be translated into a policy, as per the below screenshot.
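As an illustration of the idea (this is not OneFS code, and the size thresholds here are made-up examples you would adjust to your own data), a short script can show how size-based criteria partition one directory of files into four policy groups:

```python
# Hypothetical size buckets (bytes); each maps to one SyncIQ policy's criteria.
BUCKETS = [
    ("small",  0,               1_000_000),        # up to ~1 MB
    ("medium", 1_000_000,       1_000_000_000),    # ~1 MB .. ~1 GB
    ("large",  1_000_000_000,   100_000_000_000),  # ~1 GB .. ~100 GB
    ("huge",   100_000_000_000, float("inf")),     # everything bigger
]

def bucket_for(size):
    """Return the bucket name whose [low, high) range contains size."""
    for name, low, high in BUCKETS:
        if low <= size < high:
            return name
    return "huge"

def partition(sizes):
    """Group a {filename: size_in_bytes} mapping into the four buckets."""
    groups = {name: [] for name, _, _ in BUCKETS}
    for fname, size in sizes.items():
        groups[bucket_for(size)].append(fname)
    return groups
```

Each resulting group corresponds to one set of policy criteria, so every file lands in exactly one group and nothing is copied twice.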
Thank you, Gemmy. I started the sync with the size option, but can I create 4 different policies on the source that sync to only one destination path? I am not able to do it, since the previous policy is already syncing to the same target path.
Are you trying to create policies similar to
/ifs/source1 --> /ifs/target
/ifs/source2 --> /ifs/target
/ifs/source3 --> /ifs/target
/ifs/source4 --> /ifs/target
If so, when you try to start any of the jobs after the first policy is created, you will get the following error:
SyncIQ policy target path overlaps target path of another policy.
You can replicate to different destinations, but not to the same destination.
Remember that SyncIQ is 1-to-1 replication; this is a good thing, especially when you fail over to the target site.
With a consolidated target, when you fail over you would replicate back from DR all data from all sites/folders to a single production site.
You would end up with data scattered everywhere, which is not logically right.
If you have one directory with a lot of files you want to sync, you only need 1 policy. SyncIQ will split the job up into multiple workers based on the number of nodes in your destination cluster and will synchronize those files in parallel.
I typically sync multiple terabytes and possibly millions of files overnight using a single policy.
Note that the more recent your OneFS release, the better SyncIQ is - there have been significant and noticeable improvements in SyncIQ over the last couple of years.
We are using OneFS version 126.96.36.199 on both the source and target clusters, with 5 nodes in each cluster. I have created only one policy and am syncing the data from the source to the target cluster. I first created the policy with the condition <=1000KB, which completed in 10 minutes, syncing 140 files totaling 63 MB. Next I edited the same policy with the condition >1000KB and <=1000MB, which completed in 2 hours, syncing 340 files totaling 40 GB. I then edited the policy with the condition >1000MB and <=100GB, which has been in progress for 1.66 days, with 831 GB of 1.7 TB and 79 of 118 files completed. I still have some 30 files between 100 GB and 800 GB in size to sync.
The total size of the folder is 4.8 TB, with 609 files to be synced, which is taking some 5 to 6 days to complete. Please help me with any solution to complete it sooner, within 2 days at most. Could the delay be due to network bandwidth, or something else?
Manual tuning of the policy like this isn't necessary. Please keep in mind that the first full copy is just that: a full copy. After that, it's block-level incrementals forever. People often get tripped up by another logical fallacy around job frequency. Here is the gist of it: if a SyncIQ incremental job is set to run every hour, but the incremental jobs are taking 50 minutes, is it possible to make the jobs run more often, and thus lower the RPO? At first glance, most people would say no. But if we make one assumption, which is that the rate of change over a relatively large dataset is fairly constant, then we can say this:
If you were to run the same SyncIQ job every 30 minutes, because it has only half as many changed bytes to copy as the hourly job, it should complete (excluding overhead from starting and stopping) in about 25 minutes. Let's do it again: if we changed the schedule to every 15 minutes, the job would only have to copy a quarter of the data of the hourly job, and should therefore complete in about 12.5 minutes. So a good rule of thumb when setting up SyncIQ jobs over a WAN is this: once you get the full sync completed, start with a very liberal schedule frequency and tune it down until you meet your desired goal. As others have mentioned, performance of SyncIQ in the latest releases, especially with large files, can be significantly better than on older releases, because we introduced file splitting: rather than a single worker copying, say, a 5 GB file, it can be broken up into smaller chunks, spread across lots of workers, and then reassembled on the other side.
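The scaling argument above can be sketched numerically. Under the stated assumption of a constant change rate, incremental job duration is roughly proportional to the schedule interval (the 60-minute interval / 50-minute job baseline comes from the example above):

```python
def job_minutes(interval_min, baseline_interval=60, baseline_job=50):
    """Estimate incremental job duration for a given schedule interval,
    assuming the amount of changed data (and thus copy time) scales
    linearly with the interval. Baseline: an hourly job taking 50 min."""
    return baseline_job * interval_min / baseline_interval

for interval in (60, 30, 15):
    print(f"every {interval} min -> job takes ~{job_minutes(interval):.1f} min")
```

This is why tightening the schedule can actually work: each job has proportionally less to copy, leaving headroom to tighten it further until start/stop overhead dominates.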
Senior Solution Architect
EMC Isilon Offer & Enablement Team
What is your network bandwidth between the clusters?
Let's assume your network is 1 Gbps. That means that at most you can transfer about 0.7 Gbps, or roughly 70 MB/sec. That's 252,000 MB per hour, or about 250 GB. You have 4,800 GB, so at that rate this will take about 19 hours. That assumes you have 100% of that 1 Gbps connection and have maximized the network utilization.
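The back-of-the-envelope math above can be written down as a small calculator, using the same rule of thumb (a 1 Gbps link yields about 70 MB/s of usable throughput):

```python
def transfer_hours(total_gb, link_gbps=1.0, mb_per_sec_per_gbps=70):
    """Rough full-sync time estimate: dataset size divided by usable
    throughput, where 1 Gbps is taken as ~70 MB/s after overhead."""
    mb_per_hour = link_gbps * mb_per_sec_per_gbps * 3600  # 252,000 MB/h at 1 Gbps
    return total_gb * 1000 / mb_per_hour

print(f"{transfer_hours(4800):.1f} hours")  # the 4.8 TB dataset from above
```

Plugging in your own link speed shows immediately whether a 2-day target is even physically possible before any SyncIQ tuning.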
The way you're using your policies is not optimal because of the way SyncIQ chunks up files between its workers. It is more efficient to throw the entire directory at it, so that some of the larger files can start early and chug away while other streams process a lot of the smaller files. You really want to have as many workers as your load will accept.
On a 5-node cluster, with a default of 3 workers per node, you'll transfer at most 15 files at one time. You probably want to exceed this default. With only 15 total workers, you're very unlikely to saturate your network. It also depends on what kind of nodes you have - NL nodes do not have as much horsepower as S nodes.
Please open a support ticket and have the techs look at your cluster and help you tune the number of workers. It can make a world of difference. You'll also need to tell them what network interfaces you have on all of your nodes and how much is used for customer-facing traffic so they can estimate how much bandwidth you have left.