We are in a situation where we need to find all of the symlinks on the filesystem. On a linux machine, I would run this command:
find /ifs -xtype l -ls > myfile.txt
Is there a better/faster way to do this? I thought about using isi_for_array to run this, but I wasn't sure if that would help or hurt.
Thanks
isi_for_array will run the identical command on all nodes, so this would not help as such.
You will need to run multiple finds in parallel, on different parts of the file system. I'm not aware of an 'auto-parallel' version of find; it appears intrinsically tricky to balance loads when the actual structure of the file system tree is not known in advance...
Use OneFS quotas to first explore file counts on a reasonable set of possible 'top-level' directories (like /ifs/department*/project*/ or /ifs/users/user*/), to get a few dozen to a few hundred starting points for traversal.
Then use 'xargs -P #' or GNU parallel to run a moderate number of finds concurrently. Ideally, run them from a Linux client that mounts the various directories from different nodes (via OneFS SmartConnect), with readdirplus enabled on the mounts.
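A rough sketch of what that could look like from the client (directory names and the parallelism level are just placeholders to adapt to your tree and to what the cluster can tolerate):

# Build a list of starting directories -- these globs are purely
# illustrative; pick the real set based on the quota file counts.
printf '%s\n' /ifs/users/user*/ /ifs/department*/project*/ > startdirs.txt

# Run up to 8 finds at a time, one per starting directory.
xargs -P 8 -I{} find {} -type l -ls < startdirs.txt > symlinks.txt

# Or with GNU parallel, which also keeps each find's output grouped
# instead of possibly interleaving lines from concurrent finds:
parallel -j 8 find {} -type l -ls :::: startdirs.txt > symlinks.txt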
Hammering the cluster from inside by running multiple finds locally would need to be done with great caution; if the cluster is in full production I would not recommend it.
Btw, 'find -type l' would make more sense here: '-xtype' follows the symlink and tests the type of the object it points to, whereas '-type l' matches the symlinks themselves.
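So for the original single-find version, that would be:

find /ifs -type l -ls > myfile.txt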
hth
-- Peter
I was hoping there was some way to parallelize it that I didn't know about. Thanks for the quick and thorough response.