Could it be that, once SmartPools walks a particular part of the file system tree and records which files need to be moved, then goes to another part of the tree to take further inventory, any new files written get skipped until the job runs again?

A recent discussion suggests that using a folder designation in the filter, rather than "*", may work better. Would it?

For that matter, what is the best practice for directing any newly written, or "in flight," data to the new NL400 nodes only?
jbauman
3 Posts
0
January 6th, 2015 10:00
> Could it be that, once SmartPools walks a particular part of the file system tree and records which files need to be moved, any new files written get skipped until the job runs again?

No, it doesn't work that way.

> A recent discussion suggests that using a folder designation in the filter, rather than "*", may work better. Would it?

No, this should work just as well.

> What is the best practice for directing any newly written, or "in flight," data to the new NL400 nodes only?

What you did sounds appropriate to me.

I'm not sure exactly what you mean by "still collecting in the old node pool". Are you noticing specific files, or merely seeing nonzero utilization? Once the file pool policy was changed to send everything to the NL400 pool and the SmartPools job finished, all file data should be on the NL400 pool. There are a few potential exceptions: files or directories that are manually managed, a SmartPools job that has not yet completed, or system data such as the LIN tree being stored on the old nodes. If you know of a specific file that should be on the NL400 pool according to the file pool policy but isn't, you can examine its attributes to help determine what's going on.

Do you know whether any files or directories on the cluster are (intentionally) manually managed? If not, the easiest solution is probably to tell SmartPools to override that and move manually managed files as well.

Using SmartPools for migration will make the eventual removal of the 108NLs faster, but once they're down to very low utilization you can SmartFail them; they don't have to be at absolute zero first.
AndrewChung
132 Posts
1
January 6th, 2015 19:00
Your rule will move the files; however, if you do not change your default policy to point all new writes to the NL400 nodes, you will likely find that your old nodes continue to be targeted for writes. Your default policy is probably set to write to the ANY node pool; change it to write to the NL400 node pool and that should solve your problem.

Your file name match filter will only take effect when SmartPools itself runs. If you instead targeted a directory path to live on the NL400 pool, then that would take effect on all new writes as well.
Peter_Sero
4 Operator
•
1.2K Posts
0
January 7th, 2015 03:00
> If you instead targeted a directory path to live on the NL400 pool, then that would take effect on all new writes as well.
Yes, BUT only AFTER the immediate parent directory has been updated by SmartPools or "isi smartpools apply"...

The per-directory "new file attributes" that Jon mentioned earlier act like a cache for policies, but they add one layer of indirection to take care of -- including an unbounded amount of delay (if policies are changed but SmartPools never runs).
Take care
-- Peter
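Peter's cache analogy can be made concrete with a toy model (plain Python, not OneFS code; the class and method names are invented for illustration): a directory remembers the target that was current the last time SmartPools touched it, and new writes follow that cached value until it is refreshed.

```python
# Toy model of per-directory "new file attributes" acting as a policy cache.
# Illustrative only -- not OneFS code. A directory remembers the pool target
# that was current when SmartPools (or "isi smartpools apply") last touched
# it; new files inherit that cached target, however stale, until refreshed.

class Directory:
    def __init__(self, cached_target):
        self.cached_target = cached_target  # set at the last SmartPools pass

    def write_new_file(self):
        # New writes land on the cached target, not the current policy.
        return self.cached_target

    def smartpools_apply(self, current_policy_target):
        # A SmartPools job (or manual apply) refreshes the cache.
        self.cached_target = current_policy_target

d = Directory(cached_target="iq_108NL")  # cached before the policy change
policy_target = "NL400"                  # the policy now points at NL400

print(d.write_new_file())        # still "iq_108NL": cache not yet refreshed
d.smartpools_apply(policy_target)
print(d.write_new_file())        # "NL400" only after the refresh
```

The unbounded delay Peter mentions is just the gap between the policy change and the next `smartpools_apply` on that directory.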
jbauman
3 Posts
0
January 7th, 2015 09:00
Just to be clear, the default file pool policy isn't immediately applied. It requires a SmartPools (or SetProtectPlus) job to run just like user-defined file pool policies. If a rule matching "*" is first in the list, it will apply to all files and the default file pool policy shouldn't matter (except potentially for attributes not specified by the "*" file pool policy).
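A rough sketch of that ordering logic (illustrative Python only -- not OneFS code, and it ignores the per-attribute fallback to the default policy noted above): rules are checked in list order, and a leading "*" rule matches everything before the default is ever consulted.

```python
# Toy illustration (not OneFS code) of why a leading "*" rule makes the
# default policy moot: file pool rules are evaluated in order, and the
# first matching rule decides the storage target.
import fnmatch

rules = [
    ("*", "NL400"),     # user-defined rule, first in the list
]
default_target = "ANY"  # the default file pool policy's target

def target_for(filename):
    for pattern, target in rules:
        if fnmatch.fnmatch(filename, pattern):
            return target
    return default_target  # reached only if no rule matches

print(target_for("report.docx"))  # NL400 -- the "*" rule wins every time
```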
piched2
11 Posts
0
January 8th, 2015 13:00
Hi,
Thanks for all the helpful insight so far. Some additional information has come to light as we try to determine why approximately 2 TB of "data" is reported as still residing in the old pool. It does not look like this data was "in flight" after all. The SmartPools page shows that 1.98 TB, or 0.34%, is stubbornly staying behind in the pool 'iq_108NL'. File System Explorer does not seem to indicate that this data is being manually managed. Finally, a log of the last SmartPools job to run is appended. Could this 1.98 TB be metadata? Is this normal? Some questions:

1) FS Explorer doesn't list hidden folders. How do we find out about them?

2) If there *were* any manually managed data, how would we find out what the files were, and their attributes, beyond drilling down, and down (and down!) using FS Explorer? Is there some kind of 'show all manually managed data' functionality?

3) If files (data) were open by an application writing to them, might those files refuse to be moved by a SmartPools job as well?

Screenshots are appended. We can of course open a case with tech support, but wonder if there isn't some logical explanation we can come to without having to do that.
Thanks,
Don
here is the log from the last SmartPools run:
>
> primary-10# isi job history --job 20350 --verbose
> Job events:
> Time Job Event
> --------------- -------------------------- ------------------------------
> 01/05 16:04:06 SmartPools[20350] Waiting
> 01/05 16:05:51 SmartPools[20350] Running (LOW)
> 01/05 16:34:20 SmartPools[20350] Waiting
> 01/05 16:34:21 SmartPools[20350] Running (LOW)
> 01/05 18:05:29 SmartPools[20350] Waiting
> 01/05 18:05:32 SmartPools[20350] Running (LOW)
> 01/05 19:50:11 SmartPools[20350] Phase 1: end lin policy update
> FILEPOLICY JOB REPORT
> Elapsed time: 31062 seconds
> {'default' :
> {'Policy Number': -2,
> 'Files matched': {'head':166426910, 'snapshot': 54269},
> 'Directories matched': {'head':4800309, 'snapshot': 7502},
> 'ADS containers matched': {'head':54205, 'snapshot': 1},
> 'ADS streams matched': {'head':54344, 'snapshot': 2},
> 'Access changes skipped': 0,
> 'Protection changes skipped': 0,
> 'File creation templates matched': 4800309,
> },
> 'system':
> {'Policy Number': -1,
> 'Files matched': {'head':83008, 'snapshot': 0},
> 'Directories matched': {'head':42361, 'snapshot': 0},
> 'ADS containers matched': {'head':0, 'snapshot': 0},
> 'ADS streams matched': {'head':0, 'snapshot': 0},
> 'Access changes skipped': 3,
> 'Protection changes skipped': 0,
> 'File creation templates matched': 42361,
> },
> 'New Writes':
> {'Policy Number': 0,
> 'Files matched': {'head':166426910, 'snapshot': 54269},
> 'Directories matched': {'head':4800309, 'snapshot': 7502},
> 'ADS containers matched': {'head':54205, 'snapshot': 1},
> 'ADS streams matched': {'head':54344, 'snapshot': 2},
> 'Access changes skipped': 0,
> 'Protection changes skipped': 0,
> 'File creation templates matched': 4800309,
> },
> }
> LIN scan
> Elapsed time: 31062 seconds
> LINs traversed: 171554626
> Files seen: 166647589
> Directories seen: 4906990
> Errors: 0
> Total blocks: 957154742334 (478577371167 KB)
> 01/05 19:50:11 SmartPools[20350] Phase 2: begin sin policy update
> 01/05 19:50:13 SmartPools[20350] Phase 2: end sin policy update
> FILEPOLICY JOB REPORT
> Elapsed time: 1 second
> {'default' :
> {'Policy Number': -2,
> 'Files matched': {'head':0, 'snapshot': 0},
> 'Directories matched': {'head':0, 'snapshot': 0},
> 'ADS containers matched': {'head':0, 'snapshot': 0},
> 'ADS streams matched': {'head':0, 'snapshot': 0},
> 'Access changes skipped': 0,
> 'Protection changes skipped': 0,
> 'File creation templates matched': 0,
> },
> 'system':
> {'Policy Number': -1,
> 'Files matched': {'head':0, 'snapshot': 0},
> 'Directories matched': {'head':0, 'snapshot': 0},
> 'ADS containers matched': {'head':0, 'snapshot': 0},
> 'ADS streams matched': {'head':0, 'snapshot': 0},
> 'Access changes skipped': 0,
> 'Protection changes skipped': 0,
> 'File creation templates matched': 0,
> },
> 'New Writes':
> {'Policy Number': 0,
> 'Files matched': {'head':0, 'snapshot': 0},
> 'Directories matched': {'head':0, 'snapshot': 0},
> 'ADS containers matched': {'head':0, 'snapshot': 0},
> 'ADS streams matched': {'head':0, 'snapshot': 0},
> 'Access changes skipped': 0,
> 'Protection changes skipped': 0,
> 'File creation templates matched': 0,
> },
> }
> LIN scan
> Elapsed time: 1 second
> LINs traversed: 0
> Files seen: 0
> Directories seen: 0
> Errors: 0
> Total blocks: 0 (0 KB)
> 01/05 19:50:13 SmartPools[20350] Succeeded (LOW)
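As an aside, the FILEPOLICY JOB REPORT bodies in the log are valid Python dict literals, so the counts can be sanity-checked mechanically. A quick sketch, with the report abridged to the 'default' entry from phase 1:

```python
import ast

# The report body is a Python dict literal and parses directly.
# Abridged here to the 'default' policy from the phase-1 report above.
report_text = """
{'default' :
  {'Policy Number': -2,
   'Files matched': {'head': 166426910, 'snapshot': 54269},
   'Directories matched': {'head': 4800309, 'snapshot': 7502},
  },
}
"""

report = ast.literal_eval(report_text)
files = report['default']['Files matched']
total_files = files['head'] + files['snapshot']
print(total_files)  # 166481179 -- head plus snapshot files matched
```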
jbauman
3 Posts
0
January 8th, 2015 14:00
> Could the 1.98 TB be metadata? Is this normal?

1.98 TB is less than 0.5% of the total data. That is quite likely metadata.

> 1) FS Explorer doesn't list hidden folders. How do we find out about them?

isi get should work.

> 2) If there *were* any manually managed data, how would we find out what the files were, and their attributes? Is there some kind of 'show all manually managed data' functionality?

No, and this is why manually managed data is such a pain and should be avoided: it undermines the simplicity of management.

> 3) If files (data) were open by an application writing to them, might those files refuse to be moved by a SmartPools job as well?

The job would wait, I believe.

Another possibility is that some of that data is leaked blocks. It's such a small percentage that it doesn't seem disconcerting to me. Is there any reason not to just go ahead with the SmartFail of the 108NLs?
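For what it's worth, the "less than 0.5%" estimate checks out against the LIN scan total in the job log (a back-of-envelope check in Python, taking 1 TB as 10^9 KB; the SmartPools page's 0.34% figure is presumably computed against a different base, such as raw capacity):

```python
# Back-of-envelope check of "less than 0.5% of the total data".
total_kb = 478_577_371_167   # "Total blocks" (KB) from the SmartPools log
leftover_kb = 1.98 * 10**9   # ~1.98 TB reported still on iq_108NL

pct = 100 * leftover_kb / total_kb
print(round(pct, 2))  # 0.41 -- comfortably under 0.5%
```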