Isilon

Last reply: 08-19-2021 · Solved

What is the best way to find all symlinks?

We are in a situation where we need to find all of the symlinks on the filesystem. On a Linux machine, I would run this command:

      find /ifs -xtype l -ls > myfile.txt

 

Is there a better/faster way to do this?  I thought about using isi_for_array to run this, but I wasn't sure if that would help or hurt.

 

Thanks

Solution (1)

Accepted Solutions

isi_for_array runs the identical command on all nodes, so by itself it would not help: every node would traverse the same tree.

You will need to run multiple finds in parallel, on different parts of the file system. I'm not aware of an 'auto-parallel' version of find; it appears intrinsically tricky to balance loads when the actual structure of the file system tree is not known in advance...

Use OneFS quotas to first explore file counts on a reasonable set of possible 'top-level' directories (like /ifs/department*/project*/ or /ifs/users/user*/), to get a few dozen to hundreds of starting points for traversal.
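If quotas aren't already configured on those directories, a crude stand-in for that survey (a sketch with hypothetical paths; directory quotas give real per-directory file counts without any walk) is to count immediate entries per candidate root and sort, just to check you have a manageable set of comparably sized starting points:

```shell
# Crude survey: immediate-entry count per candidate root, largest first.
# The /ifs/users/user*/ glob is a hypothetical example; adjust to your layout.
for d in /ifs/users/user*/; do
  printf '%8d %s\n' "$(find "$d" -maxdepth 1 -mindepth 1 | wc -l)" "$d"
done | sort -rn
```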

Then use 'xargs -P N' or GNU parallel to run a moderate number of finds concurrently, ideally from a Linux client that mounts different directories from different nodes (via OneFS SmartConnect, mounted with readdirplus).
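As a sketch of that step, assuming GNU xargs and a hypothetical /ifs layout mounted on the client (substitute the starting points you identified above for the glob), each starting point gets its own find, with up to 8 running at once:

```shell
# Run one find per starting point, at most 8 in parallel.
# The glob is a hypothetical example; use your actual traversal roots.
printf '%s\0' /ifs/department*/project*/ |
  xargs -0 -P 8 -I {} find {} -type l -ls >> symlinks.txt
```

Note that concurrent appends can interleave the ordering of output lines; if that matters, write one output file per starting point and concatenate afterwards.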

Hammering the cluster from inside by running multiple finds locally would need to be done with great caution; if the cluster is in full production I would not recommend it.

Btw, 'find -type l' would make more sense here: '-xtype' follows the symlink and tests the type of the target, so '-xtype l' matches only broken (dangling) symlinks, not all of them.
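The difference is easy to demonstrate (assuming GNU find, since '-xtype' is a GNU extension):

```shell
# -type l matches every symlink; -xtype l matches only broken ones,
# because -xtype follows the link and tests the target's type.
tmp=$(mktemp -d)
touch "$tmp/target"
ln -s target "$tmp/good"      # symlink to an existing file
ln -s missing "$tmp/broken"   # dangling symlink
find "$tmp" -type l           # lists both symlinks
find "$tmp" -xtype l          # lists only the broken one
rm -rf "$tmp"
```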

hth

-- Peter


Replies (2)


I was hoping for some unknown-to-me way to parallelize it. Thanks for the quick and thorough response.
