
April 5th, 2018 12:00

File clone question

Can anyone help me find more documentation about file cloning? I understand some of the concepts and have tested cloning files. I will need to write a script to clone all the files in a directory structure.

We have a scenario where we would like to clone close to 10 TB of data consisting of more than 100k directories and 16 million files. We will be using the cloned files to present the data to a test environment. Are there any negative impacts of using file clones on such a huge number of files?


254 Posts

April 6th, 2018 08:00

There will be challenges here.  The first is to manually create the directory structure with proper permissions.  This is a little easier if it's UNIX only, but can get messier if Windows ACLs are involved.  File cloning is exactly that, a clone of a file, not a directory.

Then you can run cp -c on each file. It clones one file per call, so as you can imagine, at your scale this could take a good while to complete. At 16M files it could be more than a day, depending on how parallel you can be when doing this. It also has to be run on the cluster, as the -c flag doesn't work over the protocols.

It's doable, just not pretty.
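A minimal sketch of what that script might look like, assuming GNU find/xargs are available on the cluster. The paths, function name, and parallelism level are placeholders, and plain cp can stand in for cp -c when rehearsing off-cluster:

```shell
#!/bin/sh
# Sketch only: mirror a directory tree, then clone each file.
# src, dst, and the clone command are assumptions -- adjust for your paths.
# On OneFS, run this on a cluster node and leave the default "cp -c".

clone_tree() {
    src=$1; dst=$2; clone=${3:-"cp -c"}

    # 1. Recreate the directory skeleton under $dst.
    #    (Windows ACLs would need separate handling, as noted above.)
    ( cd "$src" && find . -type d ) | while IFS= read -r d; do
        mkdir -p "$dst/$d"
    done

    # 2. Clone the files, one call per file, 8 calls in parallel.
    ( cd "$src" && find . -type f -print0 ) |
        xargs -0 -P 8 -I{} $clone "$src/{}" "$dst/{}"
}

# Example (plain cp as a stand-in for cp -c when testing off-cluster):
# clone_tree /ifs/prod/data /ifs/test/data cp
```

This doesn't preserve ownership or ACLs on the directories it creates; you would still need a separate pass for permissions.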

450 Posts

April 6th, 2018 09:00

Often if the dataset isn't huge, and you have the capacity to spare on the cluster, you're better off using SyncIQ from the production dataset, let's say /ifs/prod/applicationA/, back to the same cluster at /ifs/testcopy/applicationA. Then pause the SyncIQ session, Allow Writes, and do your testing that way. It's much easier to rinse and repeat. Then discard the changes, disallow writes, and resume SyncIQ.

Just a different, but probably easier approach.

~Chris

1.2K Posts

April 6th, 2018 09:00

In addition to Adam's and Chris's excellent tips, let me ask how much write activity is expected from the "test" environment?

If none, you can of course establish a read-only mount at the protocol level, or leverage OneFS snapshots to present the test data as read-only.

If minor writes (<< 16 million) will take place, and there is only a single client machine in the test environment, consider setting up an "overlay" file system on the client side, so that the writes stay local to that client while the Isilon data is treated as read-only.
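On a Linux client, that overlay idea can be sketched with overlayfs: the Isilon export is the read-only lower layer and writes land in a local upper directory. This is a configuration sketch under assumptions of mine (the mount points are examples, the export is already NFS-mounted, and the mount needs root):

```shell
# Sketch, assuming a Linux client with overlayfs and the Isilon export
# already NFS-mounted (read-only) at /mnt/isilon. All paths are examples.
# upperdir and workdir must live on the same local filesystem.
mkdir -p /local/upper /local/work /merged

# The test apps use /merged; their writes stay in /local/upper,
# and /mnt/isilon is never modified.
mount -t overlay overlay \
    -o lowerdir=/mnt/isilon,upperdir=/local/upper,workdir=/local/work \
    /merged
```

With only minor writes expected, the local upper directory stays small while the bulk of the data remains on the cluster.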

-- Peter

13 Posts

April 6th, 2018 11:00

Adam,

Thank you very much for your response. I understand that it's going to take a long time to run the clone on all the data. At this point time is not a constraint for us, as the test systems are not in use over the weekend and we can script the clone process to run then.

I'm trying to understand the internal process when the cloning actually takes place. The documentation says that when a clone is created, all data contained in the cloned file is transferred to a shadow store. What exactly happens during this process? Is there any limit on the number of clones or the amount of data that can be cloned? I just don't want to script the cloning without understanding the repercussions and break something. There is very little documentation available about the process to answer these questions.

13 Posts

April 6th, 2018 11:00

Peter,

Thank you very much for your response.

Snapshots are not going to work for us, as there is going to be write activity on the test system. I would estimate around 10 to 20 percent change from the original data. For this environment we have four machines that we share the export to.

13 Posts

April 6th, 2018 11:00

Chris,

We are now using SyncIQ to copy the data to the test shares while we figure out whether cloning can be used. We don't have a Dedupe license, so we will see an increase in space utilization from creating multiple copies.

13 Posts

April 6th, 2018 11:00

Thank you very much for your response.

I'm looking for more information on the file clone feature on Isilon.

254 Posts

April 9th, 2018 08:00

Shadow stores are really just filesystem constructs that allow blocks to be shared between multiple files. They're the basis for SnapshotIQ, deduplication, file cloning, and Small File Efficiency. You can think of a shadow store as a container that you can't see, which allows the block sharing to occur.

When you clone a file, this construct is used to make the block pointers of the clone point to the physical blocks of the original file. You can clone a file up to 32,766 times, so hopefully that's not a constraint. The blocks are not physically moved, but they become part of a shadow store in the cloning process.
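As an analogy only (my comparison, not a description of OneFS internals): GNU cp's --reflink option on a Linux filesystem such as btrfs or XFS gives the same user-visible behavior, shared blocks up front, with copy-on-write when one side is modified:

```shell
# Analogy sketch: block sharing with copy-on-write, via GNU cp --reflink.
# --reflink=auto shares blocks where the filesystem supports it (btrfs,
# XFS) and silently falls back to a normal copy elsewhere.
echo "original data" > original.txt
cp --reflink=auto original.txt clone.txt

# Modifying the clone triggers copy-on-write; the original is untouched.
echo "changed" > clone.txt
cat original.txt    # -> original data
```

The OneFS twist is that the shared blocks live in the hidden shadow store rather than being owned by the original file, but the net effect for the user is the same.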

File cloning is really a relic from the days before Isilon was acquired, when they were trying to host virtual disks. Now that Isilon is part of Dell EMC, this practice is discouraged, as Dell EMC has products in its portfolio that do this MUCH better than Isilon. But when you're a startup, you do startup things. I'm not saying it's going away, but that probably explains why it's not heavily documented.

13 Posts

May 4th, 2018 10:00

Thanks for the explanation, Adam.
