1 Rookie
107 Posts
October 11th, 2016 10:00
Isilon script: Which pool contains my data?
Does anyone have a script (Python, maybe) that will scan the OneFS file system and export the results to an Excel spreadsheet? All I want to know is what data is where and when it was last modified. We have a cluster with three different pools; if I can find out where ALL the data is located on each pool, I can figure out from there how to customize my SmartPools job to put the data where I want it to be AND balance the data across all three pools.
I've also asked EMC for an Isilon F/E that will stop SmartPools when one pool is at 90% to prevent any single pool from having performance issues. Hopefully EMC will get back to me on that F/E, as it's needed to keep us from having to manually change SmartPools every time a pool starts to hit the cap (and yes, we've told the customer to delete data, but that doesn't always drain the right pool).



Peter_Sero
4 Operator
1.2K Posts
October 11th, 2016 11:00
A couple of thoughts/questions:
Scanning the file system and querying the pool for each file (via isi get)
would be practical only if there are not too many files (say, up to a few million,
but not tens of millions).
Then, would you like to see each file in the spreadsheet?
Or aggregated by directories at a certain depth of the tree?
Either way, you would still need to re-create the cluster's actual file pool
rules in order to evaluate the data and make a prediction.
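For illustration only, a rough sketch of what such a scan could look like in Python. Two loud assumptions: the pool lookup shells out to `isi get -D` (so it only works on a cluster node), and the output parsing is a placeholder, since the exact format of that command's output depends on the OneFS version. The CSV it writes opens directly in Excel.

```python
import csv
import os
import subprocess
import time

def pool_for(path):
    """Best-effort pool lookup for one file.

    Assumption: on an Isilon node, `isi get -D <path>` reports which
    disk pool holds the file's blocks. The exact output format varies
    by OneFS version, so this line-matching is a placeholder to adapt.
    """
    try:
        out = subprocess.run(["isi", "get", "-D", path],
                             capture_output=True, text=True,
                             check=True).stdout
        for line in out.splitlines():
            if "pool" in line.lower():
                return line.strip()
    except (OSError, subprocess.CalledProcessError):
        pass
    return "unknown"

def scan_to_csv(root, csv_path, pool_lookup=pool_for):
    """Walk `root` and write one CSV row per file: path, last-modified
    timestamp, and the pool reported by `pool_lookup`."""
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "last_modified", "pool"])
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                mtime = time.strftime(
                    "%Y-%m-%d %H:%M:%S",
                    time.localtime(os.path.getmtime(full)))
                writer.writerow([full, mtime, pool_lookup(full)])
```

Keeping the pool lookup as a pluggable function also lets you dry-run the walk off-cluster with a stub before pointing it at millions of files.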
Each successful SmartPools job gives you a report
with an exact file count per file pool policy. I'd first
look at these numbers -- they come for free -- to see
whether some useful trending over time can be derived.
The more often SmartPools jobs are run, the better, of course.
Unfortunately, the file capacities are not reported.
If one can "afford" to create more policies,
one can consider adding file-size ranges to the rules.
That would make the "trending" somewhat more quantitative.
Just FWIW -- does that make sense?
-- Peter
Brian_Coulombe_
1 Rookie
107 Posts
October 11th, 2016 12:00
I don't disagree, but let's drop the file-level reporting question. Can we do it per directory, so I could at least see every directory/folder/subfolder and which pool it is located in? From that I can run IIQ reports on each directory for capacity, and from there I could set up SmartPools policies to put the data where I want it to reside. That's the end goal: to put the data where it needs to be, because SmartPools doesn't provide the full functionality I need.
As for SmartPools itself, it needs to be smarter than "I've hit 97% capacity, so I'm going to spill over to other pools." Instead it should say, "OK, let's stop writing to this pool since it's at X capacity, and handle the data between the other two pools until pool 3 is back below X capacity." The result of letting SmartPools railroad a specific pool, as you know, is a severe performance impact that is not tolerable in production. There's too much hand-holding with SmartPools as it is; I need it to actually be "smart".
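A directory-level rollup is straightforward once per-file (path, pool) records exist from a scan. A minimal sketch, assuming those records as input (the tuple format is hypothetical, not anything SmartPools or IIQ produces):

```python
from collections import Counter, defaultdict

def summarize_by_directory(records, depth=2):
    """Roll per-file (path, pool) records up to directories.

    `depth` is how many leading path components to keep, so depth=3
    groups /ifs/data/proj1/x/y.txt under /ifs/data/proj1.
    Returns {directory: Counter({pool: file_count})}.
    The (path, pool) record format is an assumption -- e.g. the
    output of a prior file-system scan.
    """
    summary = defaultdict(Counter)
    for path, pool in records:
        parts = path.strip("/").split("/")
        key = "/" + "/".join(parts[:depth])
        summary[key][pool] += 1
    return dict(summary)
```

A directory that shows up with counts in more than one pool is exactly the mixed-pool case Peter warns about below.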
trimbn
3 Posts
October 12th, 2016 13:00
Pretty sure we have a product requirement for this, due in the next couple of releases.
Scott G – care to add any color or context to this?
Peter_Sero
4 Operator
1.2K Posts
October 12th, 2016 21:00
> Can we do it based on "directory" so I could at least see every directory/folder/subfolder and which pool it is located?
Keep in mind that with timestamp-based policies, each subfolder can contain
files in different pools. That makes it somewhat complex, and I assume that rather
than having a generic script, one will end up with a custom tool that
specifically reflects the rationale of the actual file pool policies in a given environment.
My suggestion to look at the job reports first comes from the fact
that those reports are already there, with no additional effort or overhead...
also @trimbn and @gentrs1:
"Smart" would mean not just migrating files until some threshold is reached,
but migrating files in order of (whichever chosen) timestamp, from oldest
to newest, or even "coldest" to "hottest", up to a threshold or balance criterion.
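That "coldest first, up to a threshold" idea is easy to sketch. Assuming a list of (path, timestamp, size) tuples gathered beforehand (a hypothetical input format, e.g. from a file-system scan), a greedy selection could look like:

```python
def pick_files_to_migrate(files, bytes_to_free):
    """Greedy "coldest first" selection.

    `files` is an iterable of (path, timestamp, size_bytes) tuples
    (hypothetical input, e.g. gathered by a prior scan). Files are
    taken in ascending timestamp order until the cumulative size
    reaches the target, so the oldest data moves first.
    Returns (selected_paths, bytes_selected).
    """
    selected, total = [], 0
    for path, _ts, size in sorted(files, key=lambda f: f[1]):
        if total >= bytes_to_free:
            break
        selected.append(path)
        total += size
    return selected, total
```

The same loop works for any "temperature" ordering: swap the sort key from the modification timestamp to a last-accessed time or any other coldness metric.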
definitely staying tuned...
-- Peter
Brian_Coulombe_
1 Rookie
107 Posts
October 14th, 2016 04:00
Thanks, gents. So here are a few important things I would hope EMC would integrate into Isilon/SmartPools/file pool policies:
1. Set a "capacity limit" between tiers on Isilon and make OneFS smart enough to know when a pool is reaching the 90% capacity mark, then either stop SmartPools (with a BIG RED FLASHING WARNING) or pause it and send a note: "Hey bud, delete some data, because I can't do my job while your pool is at X capacity." If there are more than two pools, let SmartPools work on only the pools that have not reached the set capacity limit.
2. Give SmartPools a way to say "this directory/folder/path needs to stay on Tier 1/2/3". For example, I might have an "ingest" folder for high I/O on S-node pools: I want all new data to hit that path and stay there for X days, then move to an archive folder. I could do that based on the "last modified" date. Later, with ECS and CloudPools, I could use "last accessed" to decide whether or not to send that data to the cloud.
3. I really, REALLY want IIQ to tell me "last accessed". I saw how we could get that info from Isilon, but it puts so much overhead on the cluster that it's not worth risking an impact to our customers. How hard would it be to get "last accessed" integrated into IIQ?
There are about 100 different things I can think of that could be fixed/enhanced/updated/corrected in OneFS. You can always reach out to me if you wish.