254 Posts

March 9th, 2017 10:00

Hi Brian:

I just happened to write that script in Perl, mostly because I'm old and I know it well.  As no one under 40 codes in Perl anymore, I've added Python to my repertoire as well and am reasonably competent at it.

When it comes to these kinds of things, there are some basic things to consider:

1. Where does the script need to run?  Can it run off cluster (preferred but not required)?

2. Can it use the API?  The API needs credentials so that can be an issue in some environments.  I try to work around that with very restricted RBAC accounts (e.g. the account can only read NFS and quota stuff) so that even if someone got the password, there's very little they could actually do.  If not, does setting up a trusted ssh key and running commands remotely make sense?  There's no one correct answer here.

3. What does the output have to look like?  I tend to start with the output and work my way back to it.

This is basic scoping stuff.  I understand that a lot of shops don't have in-house scripting skills, which is always preferred.  Sometimes I can whip something up, but the big caveat there is that the script is not supported by Dell/EMC and even support from me has no warranty or SLA.  It's literally stuff I do when I have some free time.  And my stuff is reasonably tested but it's far from comprehensive, so it's caveat emptor on that.  It's really up to you to test it and make sure you feel it's safe for your environment, and you're ultimately responsible for what happens.

If you want to take this offline, feel free to reach out via email.  As with everyone else here I'm first_name.last_name@dell.com so in my case it's adam.fox@dell.com.

254 Posts

December 15th, 2016 13:00

As of today, CloudPools does not have much in the way of reporting.  There is work being done in this area.  To find out details, you'll need to contact your Isilon SE who can discuss this under NDA.

As far as telling if a file has been stubbed, you can run the following on a per file basis:

# isi get -DD foo | grep -i stub

*  Stubbed:            False

In this case, the file wasn't stubbed, but you'd see True if it were.  The goal is to hide the fact that it's a stub from the user so checking properties won't help there.  The metadata of the file resides locally on the cluster so we can report that back to the user if needed.

Another clue that a file was stubbed is if the du size for the file is small (say 8-10KB) and the file metadata reports a much larger size.  But isi get is the authoritative answer.
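For anyone who wants to automate that per-file check, here is a minimal Python sketch. It assumes the `Stubbed:` line looks like the sample output above and that Python 3.7+ is available on the node; the function names `parse_stubbed` and `is_stubbed` are made up for illustration and are not part of any Isilon tooling.

```python
import re
import subprocess

# Matches a line such as "*  Stubbed:            False".
# The format is an assumption based on the sample output quoted in this thread.
STUB_LINE = re.compile(r"^\W*Stubbed:\s*(True|False)\s*$",
                       re.IGNORECASE | re.MULTILINE)


def parse_stubbed(isi_get_output):
    """Return True/False from `isi get -DD` text, or None if no Stubbed line is found."""
    match = STUB_LINE.search(isi_get_output)
    if match is None:
        return None
    return match.group(1).lower() == "true"


def is_stubbed(path):
    """Run `isi get -DD <path>` on a cluster node and parse the result."""
    result = subprocess.run(["isi", "get", "-DD", path],
                            capture_output=True, text=True, check=True)
    return parse_stubbed(result.stdout)
```

Keeping the parsing separate from the command call makes it possible to test the parser against saved output before running anything on a production cluster.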

3 Posts

December 15th, 2016 19:00

The isi get command is recommended.

isi get -DD foo | grep -i stub


The du command is also useful, but it may sometimes have a latency problem.


1 Rookie

 • 

107 Posts

December 16th, 2016 04:00

Thanks all and appreciate the input.  I run ISI GET all day long but it's manual.  What I need is a script that will pull the "isi get" command against every file (like a treewalk) and put it into an excel document.  Then again, if I could do that, then Isilon would be able to put that option into IIQ.

I don't have the time to sit there and go through 2.7TB of data to make sure every file is stubbed....

I wonder if I can script this since it's pretty close to what I want to do.


isi_test_cpool_cbm -S *

isi_test_cpool_cbm: no path specified

        [-s | --stub]       stub file (default operation)

        [-n | --sync]      sync stubbed file to cloud

        [-r | --read]       read file to stdout

        [-w | --write]      write to file from stdin

        [-t size | --truncate=size

                            truncate file to specifed size

        [--clear-cache]     invalidate the local data cache for file

        [-R | --recall]     recall stubbed file

        [--readahead]      perform full readahead on file

        [-A | --attributes] show mtime and file size extattr

        [-S | --is-stub]    show whether file is stubbed

        [-M | --show-map]   show map for stubbed file

        [--show-object-paths]

                            display paths to CMO and CDOs

                            may be specified with -M or standalone

        [-C | --show-CMO]   print the cloud CMO of a stubbed file

        [--dump-cache]      print cacheinfo status

        [-c option | --cache-option=option]

                            specify cache read option

                            one of CACHE, NOCACHE, POLICY

                            default: POLICY

        [-o offset | --offset=offset]

                            offset (for read/write operation)

        [-l length | --length=length]

                            length (for read/write operation)

        [-p policy | --policy=policy]

                            policy for stubbing [required for stubbing]

        [-f script | --failpoint=script]

                            shell script to set process failpoints

                            arg 1 to script is PID

        [-L | --get-lin]    return the lin::snapid for specified path

        [--list-version]    return the list of object versions

        [--list-cdo=version]

                            return CDOs under the version

        [--io-file]         specify input file for write operations

        [--set-posix-info=x,x,x]

                      use the wcc supported interface to set

                      attributes, if x is set to Y attribute is

                      changed.  The tupple corresponds to

                      setting size,mtime,mode.  The value of

                      size can be specified using the --length

                      option while mtime is set to current time

                      and mode has a default value 444

        [-v | --verbose]    display lin:snap info

        [--get-times]       return mtime/ctime in seconds,nanoseconds

        [--dump-sparse-info]

                            display the sparse info for a file

        [--dump-sparse-map] display the sparse maps from CDOs for file

        [--compare-sparse-info FILE1 FILE2]

                            compare sparse info for FILE1 and FILE2

                            if match is found exit status is 0.  With

                            --verbose, all differences are display,

                            otherwise only last first diff is shown

        [--use-ilog]        enable logging

4 Operator

 • 

1.2K Posts

December 16th, 2016 07:00

I'd say the scriptability is as good or bad as with isi get, as both do not (yet?) appear in the platform API,

so one needs to script around these CLI tools as they are...

1 Rookie

 • 

107 Posts

December 16th, 2016 09:00

I understand and can appreciate that, but we would like the option to allow our customers to pull that information from IIQ, since they do not have CLI access and we're already too buried in work to sit there and script reports all day.

254 Posts

December 16th, 2016 10:00

If it helps, here's a quick script I put together that runs on a node.  It outputs a csv which can be loaded into Excel.  It could place it somewhere in /ifs where folks could get it.  I understand if you can't use it, but I'm putting it here in case someone can.

https://my.syncplicity.com/share/mynxxl05yul0skz/cp_check

Syntax is simple:

cp_check.pl [-n] [-s] dir [dir] [dir] ... [dir]

-n : Only report files that have not been stubbed

-s : Only report files that have been stubbed

(default is to do all files)

If you are running this in the /ifs filesystem, you will have to call perl explicitly.  Feel free to hack it to bits.  The script is provided as-is and isn't supported by Dell EMC.  I will try to take requests but it will be done in my spare time.
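For readers who would rather start from Python, here is a rough sketch of the same general idea (walk one or more directories, check each file, write a CSV). This is not the cp_check script and makes no claim about how it works internally. The `check` argument is a hypothetical callable you would supply yourself, for example something that wraps `isi get -DD`; the `only` argument loosely mirrors the -n / -s options described above.

```python
import csv
import os


def walk_files(roots):
    """Yield the path of every non-directory entry under the given directories."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                yield os.path.join(dirpath, name)


def write_report(roots, check, out, only=None):
    """Write path,stubbed rows as CSV to the open file object `out`.

    check: hypothetical callable taking a path and returning True or False.
    only:  None reports all files, True only stubbed, False only non-stubbed.
    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(["path", "stubbed"])
    count = 0
    for path in walk_files(roots):
        stubbed = check(path)
        if only is not None and stubbed != only:
            continue
        writer.writerow([path, stubbed])
        count += 1
    return count
```

Usage would be along the lines of opening a CSV file under /ifs and calling `write_report(["/ifs/data"], my_check, fh)`, where `my_check` and the directory are placeholders. Running one external command per file will be slow on large trees, so treat this as a starting point only.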

1 Rookie

 • 

107 Posts

December 16th, 2016 11:00

Thanks Adam!  Scripting is not my forte, so I really appreciate it.  This appears to be perl so I'll fiddle around and see if we can get it working.

Cheers!

1 Rookie

 • 

107 Posts

March 9th, 2017 05:00

Adam, if you see this, I need to talk to you about Perl scripting. I need to be able to create a script that outputs a text document with a verbose listing of NFS/SMB shares and quotas and then emails it to an internal customer.

I checked out your script, but it's way above my knowledge.

1 Rookie

 • 

107 Posts

March 9th, 2017 11:00

Thanks! And PS: I am QUITE QUITE QUITE aware that EMC and custom scripting is like, getting blood from an Onion. They hate it and I get it, but it's a necessary function we need as a customer and I get really angry that people we spend a LOT of money on look at me like I am speaking a foreign language.

1 Rookie

 • 

107 Posts

March 9th, 2017 11:00

Inbound email coming!

7 Posts

May 1st, 2018 19:00

I am having the same issue that the OP had.  We are getting ready to implement Cloud Pools on our production Isilon cluster and need something very similar to what is being discussed in this thread.  The short version is that we use SmartQuotas for internal chargeback to customers.  Since SmartQuotas isn't Cloud Pools aware, we are running into an issue with how to calculate the size of files that have been archived off to ECS.

So far I've come up with a script that iterates through each directory quota:

1. Run recursive file listing (ls -lo)

2. Grab only Smartlink files (grep sstubbed)

3. Grab just the size (in bytes) of each stubbed file (using awk)

4. Add all file sizes together, convert to GB

5. Output path for SmartQuota and size of stubbed files (step 4)

6. Repeat
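Steps 2 through 4 can also be done in a single pass over the listing. The sketch below is only an illustration in Python: it assumes `ls -lo` lines laid out as mode, links, owner, group, flags, size, date, name, with no spaces in owner or group names. The flag name is a parameter because it may differ between systems (this thread mentions both `sstubbed` and `ssmartlinked`). The function names are hypothetical.

```python
def stubbed_bytes(ls_lines, marker="sstubbed"):
    """Sum the size column of regular files whose flags column contains `marker`.

    ls_lines: iterable of lines as printed by `ls -lo` (assumed layout:
              mode, links, owner, group, flags, size, date..., name).
    """
    total = 0
    for line in ls_lines:
        fields = line.split(None, 6)
        # Skip "total N" lines, blank lines, directories and other non-files.
        if len(fields) < 7 or not fields[0].startswith("-"):
            continue
        flags, size = fields[4], fields[5]
        if marker in flags.split(",") and size.isdigit():
            total += int(size)
    return total


def to_gib(num_bytes):
    """Convert bytes to binary gigabytes (GiB)."""
    return num_bytes / float(1024 ** 3)
```

Feeding this from a single recursive listing per quota path avoids starting separate grep and awk processes, although the tree walk itself will most likely still dominate the runtime.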

This seems to work, but since there are ~200 million files on this cluster it is taking really long to run (about a week).  I have tried speeding things up by piping data into xargs to use multiple processors, but it is still taking forever.

I would love to hear from anyone who might have come up with a better solution to this issue.

4 Operator

 • 

1.2K Posts

May 2nd, 2018 13:00

Using a multithreaded 'find'-like tool, I have recently seen dramatic speedups for file system traversals,

in particular over NFS (various server brands).

I'd recommend taking a peek at 'fd' (google for 'sharkdp/fd' to find the github page; external links usually get stuck in the forum moderation)

What really helps is that the 'fd' tool does the actual traversal(!) in a multithreaded fashion. Try to boost it with the -j NTHREADS option, but keep an eye on the Isilon performance (CPU load, disk latencies) at the same time.

Note that 'fd' has a totally different call syntax than 'find', it's not a drop-in replacement. But the multithreaded tree-walk is just awesome.
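To illustrate the general idea of a multithreaded tree walk (this is not 'fd' and says nothing about how 'fd' is implemented), here is a small Python sketch that lists the directories of each tree level concurrently. The function names and the default thread count are arbitrary choices for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _scan(directory):
    """List one directory and return (subdirectories, other entries)."""
    subdirs, files = [], []
    try:
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                else:
                    files.append(entry.path)
    except OSError:
        pass  # unreadable or vanished directory: skip it
    return subdirs, files


def parallel_walk(root, threads=8):
    """Return all non-directory paths under root, scanning directories concurrently."""
    found = []
    pending = [root]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while pending:
            next_level = []
            # All directories of the current level are listed in parallel.
            for subdirs, files in pool.map(_scan, pending):
                next_level.extend(subdirs)
                found.extend(files)
            pending = next_level
    return found
```

Threads mostly pay off when each directory listing is latency-bound, as it tends to be over NFS. As with -j above, raise the thread count gradually and watch the load on the cluster.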

hth

-- Peter

7 Posts

May 18th, 2018 08:00

'fd' does look interesting, but unfortunately I need to run the find directly on the cluster at this time.  This is because I use the attributes returned by the follow-up 'ls -lo' on each file to identify if they are archived (it lists the "sstubbed" attribute).  I have not found a way to identify stubbed/archived files when using another host to scan the cluster.

254 Posts

May 18th, 2018 12:00

It's possible through the API, but you'd still end up walking the tree and making an API call each time.  Not the most elegant answer, but it's possible.

Example on the cluster:

partly-1# ls -lso ooc.pdf

1 -rw-r--r--    1 root  wheel  inherit,writecache,wcinherit,ssmartlinked 1253657 May 18 13:57 ooc.pdf

Example off-cluster:

[foxa3@AFOX-51-c66-vm ~]$ curl -k -u root:a https://10.111.158.120:8080/namespace/ifs/archive/ooc.pdf\?metadata


{
"attrs" :
[
  { "name" : "is_hidden", "value" : false },
  { "name" : "size", "value" : 1253657 },
  { "name" : "block_size", "value" : 8192 },
  { "name" : "blocks", "value" : 1 },
  { "name" : "last_modified", "value" : "Fri, 18 May 2018 17:57:15 GMT" },
  { "name" : "change_time", "value" : "Fri, 18 May 2018 17:57:15 GMT" },
  { "name" : "access_time", "value" : "Fri, 18 May 2018 17:57:15 GMT" },
  { "name" : "create_time", "value" : "Fri, 18 May 2018 17:57:15 GMT" },
  { "name" : "mtime_val", "value" : 1526666235 },
  { "name" : "ctime_val", "value" : 1526666235 },
  { "name" : "atime_val", "value" : 1526666235 },
  { "name" : "btime_val", "value" : 1526666235 },
  { "name" : "owner", "value" : "root" },
  { "name" : "group", "value" : "wheel" },
  { "name" : "uid", "value" : 0 },
  { "name" : "gid", "value" : 0 },
  { "name" : "id", "value" : 4296751653 },
  { "name" : "nlink", "value" : 1 },
  { "name" : "type", "value" : "object" },
  { "name" : "stub", "value" : true },
  { "name" : "mode", "value" : "0644" }
]
}
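If you go the API route, the response is easy to reduce to the two values that matter for chargeback. Below is a minimal Python sketch, assuming the response has the `attrs` name/value layout shown above. The function names are made up for the example, and `fetch_metadata` simply mirrors the curl call (port 8080, `?metadata`, certificate checking disabled like `-k`), so treat it as lab-only.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def attrs_to_dict(metadata_json):
    """Flatten the "attrs" list of name/value pairs into a plain dict."""
    doc = json.loads(metadata_json)
    return {attr["name"]: attr["value"] for attr in doc.get("attrs", [])}


def stub_and_size(metadata_json):
    """Return (is_stub, size_in_bytes) from a ?metadata response body."""
    attrs = attrs_to_dict(metadata_json)
    return bool(attrs.get("stub", False)), attrs.get("size")


def fetch_metadata(host, path, user, password, port=8080):
    """Fetch the metadata for `path` (e.g. /ifs/archive/ooc.pdf) with basic auth.

    Certificate verification is turned off to match `curl -k`; do not do this
    outside a test environment.
    """
    url = "https://%s:%d/namespace%s?metadata" % (host, port, urllib.parse.quote(path))
    request = urllib.request.Request(url)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")
```

A restricted read-only account is a better fit for this than root, and one HTTP call per file will still be slow on very large trees.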
