2 Intern

205 Posts

December 8th, 2016 19:00

Look into GNU parallel + rsync. Here's what I typically do:

cd /source/cluster/path

ls | parallel -q rsync -vaWh --inplace {} /target/cluster/path/

If you're only using one client, you won't oversaturate a node, so I wouldn't bother with the IP addresses. If you're using multiple clients, sure.

December 9th, 2016 15:00

This is what SyncIQ is for - it will parallelize the I/O across source and destination nodes.  We use it a LOT and it works well.

4 Operator

1.2K Posts

December 9th, 2016 18:00

Not quite -- if one reads "None Isilon" as "non-Isilon"...

scnr

-- Peter

December 13th, 2016 11:00

That makes a difference.  I read it as "One", Peter read it as "none".

Yes, parallel rsync is your friend from a generic Linux host to Isilon.  The more hosts and threads you can throw at it, the better.

Ideally, you would have a good look at the data and decide which directories to parallelize at which level.  Always start the largest directories first, sorting down to the smallest.  As the smaller ones finish, other even smaller ones will start up.  If you don't do this, Murphy's Law says you'll do all of the smaller directories first, and when you're down to your last thread, the really large one will start up.
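
For what it's worth, here is a rough sketch of the largest-first ordering, assuming one rsync job per top-level directory and a POSIX shell with du, sort, cut and sed on hand. The function name largest_first is made up for this example, and the paths in the usage comment are the placeholders from earlier in the thread. It looks at size only, not file count.

```shell
#!/bin/sh
# largest_first (hypothetical helper): print the immediate subdirectories
# of a source path, one per line, ordered from largest to smallest.
# Sizes come from `du -sk` (kilobytes), so each tree is walked once.
# Assumption: directory names contain no newlines.
largest_first() {
    src=$1
    (
        cd "$src" || exit 1
        # `*/` matches directories only; du prints "<kb><TAB><name>/".
        du -sk -- */ 2>/dev/null |
            sort -rn |          # biggest size first
            cut -f2- |          # drop the size column
            sed 's:/$::'        # strip the trailing slash
    )
}

# Possible usage with the command from earlier in the thread
# (placeholder paths, not real ones):
#   cd /source/cluster/path
#   largest_first . | parallel -q rsync -vaWh --inplace {} /target/cluster/path/
```

GNU parallel starts jobs in the order it reads them, so feeding it this list should get the big directories going first while the small ones fill in behind them.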

I have not found an easy way to parallelize the transfers without researching the source data first. You need to factor in both the file count and file sizes to balance the threads properly.
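
A minimal sketch of that research step, assuming you only need a per-directory file count and size to eyeball the balance. The name dir_stats is hypothetical, and it relies on nothing beyond find, du, wc, cut and basename.

```shell
#!/bin/sh
# dir_stats (hypothetical helper): for each immediate subdirectory of a
# source path, print "<file_count> <size_kb> <name>" on one line.
# Assumption: file names contain no newlines (wc -l counts lines).
dir_stats() {
    src=$1
    for d in "$src"/*/; do
        [ -d "$d" ] || continue                       # skip if glob matched nothing
        count=$(find "$d" -type f | wc -l | tr -d ' ') # regular files only
        size=$(du -sk -- "$d" | cut -f1)               # kilobytes on disk
        printf '%s %s %s\n' "$count" "$size" "$(basename "$d")"
    done
}

# Possible usage (placeholder path):
#   dir_stats /source/cluster/path | sort -k2,2 -rn
```

A directory with a huge file count but a small total size will behave very differently from one with a few large files, so it is worth looking at both columns before deciding where to split.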

Watch the encryption you're doing as well.  Don't use the most secure protocols if you don't need them.  Drop down to blowfish if you must tunnel over ssh but it's within your localized secure environment.  You can make this change in ~/.ssh/config if that's easier for you.
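
As an illustration of the ~/.ssh/config route, here is a small sketch that prints a per-host stanza with the Ciphers keyword. The helper name ssh_cipher_stanza and the host alias target-host are made up for this example. blowfish-cbc is the OpenSSH name for the cipher mentioned above; newer OpenSSH releases no longer include it, so check what your client supports with `ssh -Q cipher` before picking one.

```shell
#!/bin/sh
# ssh_cipher_stanza (hypothetical helper): print an ssh_config stanza that
# pins the cipher list for one host. It only writes to stdout; nothing is
# changed until you redirect the output yourself.
#   $1 = host alias or pattern, $2 = cipher name(s), comma-separated
ssh_cipher_stanza() {
    printf 'Host %s\n    Ciphers %s\n' "$1" "$2"
}

# Possible usage (target-host is a placeholder):
#   ssh -Q cipher                                   # list ciphers the client supports
#   ssh_cipher_stanza target-host blowfish-cbc >> ~/.ssh/config
```

Keeping the setting under a Host block limits the weaker cipher to that one transfer target rather than every ssh connection you make.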

It should go without saying that you should use a current version of rsync.

2 Intern

356 Posts

December 13th, 2016 11:00

Oops... thanks, I've edited that typo.
