1 Rookie
107 Posts
October 11th, 2016 10:00
Isilon script: Which pool contains my data?
Does anyone have a script (Python, maybe) that will scan the OneFS file system and export the results to an Excel spreadsheet? All I want to know is what data is where and when it was last modified. We have a cluster with three different pools; if I can find out where ALL the data is located on each pool, I can figure out from there how to customize my SmartPools job to put the data where I want it to be AND balance the data across all three pools.
I've also asked EMC for an Isilon F/E that will stop SmartPools when one pool is at 90% to prevent any single pool from having performance issues. Hopefully EMC will get back to me on that F/E, as it's needed to keep us from having to manually change SmartPools every time a pool starts to hit the cap (and yes, we've told the customer to delete data, but that doesn't always drain the right pool).



Peter_Sero
4 Operator
1.2K Posts
October 11th, 2016 11:00
A couple of thoughts/questions:
Scanning the file system and querying the pool for each file (via isi get)
would be practical only if there are not too many files (say, up to a few million,
but not tens of millions).
Then, would you like to see each file in the spreadsheet?
Or aggregated by directories at a certain depth of the tree?
Either way, you would still need to re-create the cluster's actual file pool
rules in order to evaluate the data and make a prediction.
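For illustration only, a rough sketch of what such a scan could look like in Python. Two loud assumptions: the pool lookup shells out to `isi get -D` (so it only works on a cluster node), and the output parsing is a placeholder, since the exact format of that command's output depends on the OneFS version. The CSV it writes opens directly in Excel.

```python
import csv
import os
import subprocess
import time

def pool_for(path):
    """Best-effort pool lookup for one file.

    Assumption: on an Isilon node, `isi get -D <path>` reports which
    disk pool holds the file's blocks. The exact output format varies
    by OneFS version, so this line-matching is a placeholder to adapt.
    """
    try:
        out = subprocess.run(["isi", "get", "-D", path],
                             capture_output=True, text=True,
                             check=True).stdout
        for line in out.splitlines():
            if "pool" in line.lower():
                return line.strip()
    except (OSError, subprocess.CalledProcessError):
        pass
    return "unknown"

def scan_to_csv(root, csv_path, pool_lookup=pool_for):
    """Walk `root` and write one CSV row per file: path, last-modified
    timestamp, and the pool reported by `pool_lookup`."""
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "last_modified", "pool"])
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                mtime = time.strftime(
                    "%Y-%m-%d %H:%M:%S",
                    time.localtime(os.path.getmtime(full)))
                writer.writerow([full, mtime, pool_lookup(full)])
```

Keeping the pool lookup as a pluggable function also lets you dry-run the walk off-cluster with a stub before pointing it at millions of files.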
Each successful SmartPools job gives you a report
with an exact file count per file pool policy. I'd first
look at these numbers -- they come for free -- to see
whether some useful trending over time can be derived.
The more often SmartPools jobs are run, the better, of course.
Unfortunately, the file capacities are not reported.
If one can "afford" to create more policies,
one can consider adding file-size ranges to the rules.
That would make the "trending" somewhat more quantitative.
Just FWIW -- does that make sense?
-- Peter
Brian_Coulombe_
1 Rookie
107 Posts
October 11th, 2016 12:00
I don't disagree, but let's drop the file-level reporting question. Can we do it per directory, so I could at least see every directory/folder/subfolder and which pool it is located in? From that I can run IIQ reports on each directory for capacity, and from there I could set up SmartPools policies to put the data where I want it to reside. That's the end goal: to put the data where it needs to be, because SmartPools doesn't provide the full functionality I need.
As for SmartPools itself, it needs to be smarter than "I've hit 97% capacity, so I'm going to spill over to other pools." Instead it should say, "OK, let's stop writing to this pool since it's at X capacity, and handle the data between the other two pools until pool 3 is back below X capacity." The result of letting SmartPools railroad a specific pool, as you know, is a severe performance impact that is not tolerable in production. There's too much hand-holding with SmartPools as it is; I need it to actually be "smart".
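A directory-level rollup is straightforward once per-file (path, pool) records exist from a scan. A minimal sketch, assuming those records as input (the tuple format is hypothetical, not anything SmartPools or IIQ produces):

```python
from collections import Counter, defaultdict

def summarize_by_directory(records, depth=2):
    """Roll per-file (path, pool) records up to directories.

    `depth` is how many leading path components to keep, so depth=3
    groups /ifs/data/proj1/x/y.txt under /ifs/data/proj1.
    Returns {directory: Counter({pool: file_count})}.
    The (path, pool) record format is an assumption -- e.g. the
    output of a prior file-system scan.
    """
    summary = defaultdict(Counter)
    for path, pool in records:
        parts = path.strip("/").split("/")
        key = "/" + "/".join(parts[:depth])
        summary[key][pool] += 1
    return dict(summary)
```

A directory that shows up with counts in more than one pool is exactly the mixed-pool case Peter warns about below.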
trimbn
3 Posts
October 12th, 2016 13:00
Pretty sure we have a product requirement for this, due in the next couple of releases.
Scott G – care to add any color or context to this?
Peter_Sero
4 Operator
1.2K Posts
October 12th, 2016 21:00
> Can we do it based on "directory" so I could at least see every directory/folder/subfolder and which pool it is located?
Keep in mind that with timestamp-based policies, each subfolder can contain
files in different pools. That makes it somewhat complex, and I assume that rather
than having a generic script, one will end up with a custom tool that
specifically reflects the rationale of the actual file pool policies in a given environment.
My suggestion to look at the job reports first comes from the fact
that those reports are already there, with no additional effort or overhead...
also @trimbn and @gentrs1:
"Smart" would mean not just migrating files until some threshold is reached,
but migrating files in order of (whichever chosen) timestamp, from oldest
to newest, or even "coldest" to "hottest", up to a threshold or balance criterion.
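That "coldest first, up to a threshold" idea is easy to sketch. Assuming a list of (path, timestamp, size) tuples gathered beforehand (a hypothetical input format, e.g. from a file-system scan), a greedy selection could look like:

```python
def pick_files_to_migrate(files, bytes_to_free):
    """Greedy "coldest first" selection.

    `files` is an iterable of (path, timestamp, size_bytes) tuples
    (hypothetical input, e.g. gathered by a prior scan). Files are
    taken in ascending timestamp order until the cumulative size
    reaches the target, so the oldest data moves first.
    Returns (selected_paths, bytes_selected).
    """
    selected, total = [], 0
    for path, _ts, size in sorted(files, key=lambda f: f[1]):
        if total >= bytes_to_free:
            break
        selected.append(path)
        total += size
    return selected, total
```

The same loop works for any "temperature" ordering: swap the sort key from the modification timestamp to a last-accessed time or any other coldness metric.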
definitely staying tuned...
-- Peter
Brian_Coulombe_
1 Rookie
107 Posts
October 14th, 2016 04:00
Thanks, gents. So here are a few important things I would hope EMC would integrate into Isilon/SmartPools/file pool policies:
1. Set a "capacity limit" between tiers on Isilon and make OneFS smart enough to know when a pool is reaching the 90% capacity mark, then either stop SmartPools (with a BIG RED FLASHING WARNING) or pause it and send a note: "Hey bud, delete some data, because I can't do my job while your pool is at X capacity." If there are more than two pools, let SmartPools work on only the pools that have not reached the set capacity limit.
2. Give SmartPools a way to say "this directory/folder/path needs to stay on Tier 1/2/3". For example, I might have an "ingest" folder for high I/O on S-node pools: I want all new data to hit that path and stay there for X days, then move to an archive folder. I could do that based on the "last modified" date. Later, with ECS and CloudPools, I could use "last accessed" to decide whether or not to send that data to the cloud.
3. I really, REALLY want IIQ to tell me "last accessed". I saw how we could get that info from Isilon, but it puts so much overhead on the cluster that it's not worth risking an impact to our customers. How hard would it be to get "last accessed" integrated into IIQ?
There are about 100 different things I can think of that could be fixed/enhanced/updated/corrected in OneFS. You can always reach out to me if you wish.