I've been working with ecs-sync on a single server, testing various migration scenarios. These work well, but I'm unclear on how to create a sync job that would use multiple sync servers to migrate a large namespace/bucket. Some documentation I've seen refers to running multiple sync servers in parallel, but there doesn't seem to be a specific process for doing this. Has anyone attempted this, or is there a document I'm missing somewhere?
There are a couple of different ways to approach the question of scaling.
Whichever way you go, consider the network bandwidth consumed by the aggregate ecs-sync traffic as well as the resources available to each VM.
Currently, there is no orchestration component built into ecs-sync, so managing a cluster is done manually. This is accomplished via list files (clip lists and object lists). Splitting a bucket across multiple jobs requires generating a complete list of keys from the bucket and then splitting that list, just as Chris mentions above.
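As a rough illustration of the splitting step, here is a minimal Python sketch. It assumes you already have a complete key listing in a plain text file with one key per line; the function name `split_key_list` and the `keys-NNN.lst` output names are made up for this example and are not part of ecs-sync.

```python
from pathlib import Path


def split_key_list(list_path, num_shards, out_dir):
    """Split a one-key-per-line list file into num_shards files, round-robin.

    Hypothetical helper for illustration only; not part of ecs-sync.
    Returns the list of shard file paths that were written.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One output file per sync server / job.
    shard_paths = [out_dir / f"keys-{i:03d}.lst" for i in range(num_shards)]
    shards = [p.open("w", encoding="utf-8", newline="\n") for p in shard_paths]
    try:
        with open(list_path, encoding="utf-8") as src:
            index = 0
            for line in src:
                key = line.rstrip("\n")
                if not key:
                    continue  # skip blank lines in the listing
                # Round-robin keeps the shards close to equal in key count.
                shards[index % num_shards].write(key + "\n")
                index += 1
    finally:
        for fh in shards:
            fh.close()
    return shard_paths
```

Each resulting file would then be given to a separate job on a different sync server as its source list. Round-robin balances the number of keys per shard, not the number of bytes, so if object sizes vary a lot you may want to split on size instead.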
However, S3 scales a bit differently than other storage plugins and you can use a much higher thread count. I recommend trying that before adding more VMs.
Note that the typical problem with large S3 buckets is that listing the bucket can be quite slow. If ecs-sync performs the list operation, it will only go as fast as results are returned. Please make sure your ECS is on the latest software revision, as this may come with a performance boost. 3.1 was just released, so speak with your account rep to schedule an upgrade.
Also note that a file system (NFS) bucket will list slower than a standard bucket, since there is a hierarchy that must be traversed and aggregated.