4 Operator

1.2K Posts

January 6th, 2016 22:00

It seems the overall capacity information is not recorded simultaneously by path/LIN and by disk_pool.

What comes closest:

list_top_n_files_by_log_size

list_top_n_files_by_phys_size

but obviously only for a small set, the largest files.

You can get pool statistics per single subtree with

isi filepool apply -r -s DIR

which runs quickest shortly after a SmartPools job.

(Actually, it gives you "file pool" statistics, which would need to

be mapped onto "disk pools" in a post-processing step.)

I am currently experimenting with statistics by progressive

random sampling of /ifs (pick random LINs, run

as long as needed, about 1000 LIN/min in a single thread). Interested?

Cheers

-- Peter

January 6th, 2016 23:00

It feels like "isi filepool apply .." is also doing a disk walk.

My problem is that one directory under /ifs/data owns about 162,000,000 files.

With that many files, even 1000 LIN/min is just too slow.

4 Operator

1.2K Posts

January 7th, 2016 04:00

The idea is to sample only a tiny fraction of the files, say 10 or 100 thousand, and extrapolate...
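
A minimal sketch of that extrapolation step, assuming the sampled LINs have already been resolved to per-file sizes and that the total file count is known from elsewhere. The function name `estimate_total` and the synthetic sizes are made up for illustration; nothing here calls a OneFS API. At the roughly 1000 LIN/min mentioned earlier, 10,000 samples would take about 10 minutes and 100,000 about 100 minutes.

```python
import math
import random


def estimate_total(sample_sizes, population_count):
    """Extrapolate a total size from a simple random sample of file sizes.

    Returns (estimated_total, standard_error_of_total).
    """
    n = len(sample_sizes)
    mean = sum(sample_sizes) / n
    # Sample variance with the usual n - 1 denominator.
    var = sum((s - mean) ** 2 for s in sample_sizes) / (n - 1)
    total = mean * population_count
    # Finite-population correction: the sample is drawn without replacement.
    fpc = 1 - n / population_count
    stderr = population_count * math.sqrt(var / n * fpc)
    return total, stderr


# Synthetic stand-in for a directory tree (hypothetical sizes in bytes).
rng = random.Random(42)
population = [rng.randint(1_000, 50_000) for _ in range(200_000)]
sample = rng.sample(population, 10_000)

total, stderr = estimate_total(sample, len(population))
print(f"estimated total: {total:.0f} bytes, standard error: {stderr:.0f}")
print(f"true total:      {sum(population)} bytes")
```

The standard error is only meaningful when the size distribution is reasonably well behaved; a handful of very large files can sit entirely outside the sample and make the estimate look far more precise than it is.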

January 10th, 2016 16:00

If I were a statistician, or otherwise not an engineer, then maybe I'd go with the random sample approach.

But as an engineer I can't take the risk of missing outliers that might mean missing a 100GB file among a collection of files that are mostly 20-30GB in size.
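
To put a number on that risk, here is a small sketch, assuming a uniform random sample drawn without replacement. The helper name `chance_sample_contains` is hypothetical. With the 162,000,000 files mentioned above and a 100,000-file sample, the chance of picking up one specific outlier file is 100,000 / 162,000,000, or about 0.06%.

```python
def chance_sample_contains(file_count, sample_size, outliers=1):
    """Probability that a uniform sample (without replacement) of
    `sample_size` files includes at least one of `outliers` specific files."""
    if sample_size > file_count or outliers > file_count:
        raise ValueError("sample_size and outliers must not exceed file_count")
    miss = 1.0
    for i in range(outliers):
        # Chance that outlier i is also outside the sample, given that
        # the previous i outliers were all outside it.
        miss *= (file_count - sample_size - i) / (file_count - i)
    return 1.0 - miss


# One specific outlier among 162,000,000 files, sampling 100,000 of them.
p = chance_sample_contains(162_000_000, 100_000)
print(f"chance of sampling the outlier: {p:.4%}")
```

Raising `outliers` shows how the odds improve when there are many large files rather than one, which is the case where sampling starts to capture the tail.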

It's just a shame that without the LIN key there is no way to join any of the stats_* tables in results.db with disk_usage.

Another use for the data in disk_usage would be to create a graph or presentation similar to what "Sequoia View" does for Windows filesystems.

4 Operator

1.2K Posts

January 12th, 2016 07:00

Correct, and it gets even worse when you have a "few" GB-sized files among millions of KB-sized files...

Random sampling is most useful for file counts -- it can give a reasonable picture of where those (100s of) millions of files are located within an hour or even minutes.
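
A rough sketch of that file-count use case, assuming the sampled LINs have already been resolved to paths and that the sample is uniform over files. The function name `count_shares` and the example paths are hypothetical. It groups sampled paths by their leading directory components and reports each subtree's estimated share of all files with a normal-approximation margin of error.

```python
import math
from collections import Counter


def count_shares(sampled_paths, depth=3, z=1.96):
    """Estimate each subtree's share of the file count from sampled paths.

    Paths are grouped by their first `depth` components, so depth=3 maps
    /ifs/data/projA/x/file to /ifs/data/projA. Returns
    {subtree: (share, margin)}, where margin is the z-scaled
    normal-approximation error of the proportion.
    """
    n = len(sampled_paths)
    # The leading "" from the absolute path counts as one split element.
    hits = Counter("/".join(p.split("/")[: depth + 1]) for p in sampled_paths)
    shares = {}
    for subtree, count in hits.items():
        share = count / n
        margin = z * math.sqrt(share * (1 - share) / n)
        shares[subtree] = (share, margin)
    return shares


# Hypothetical sample of four paths.
paths = ["/ifs/data/projA/x/f1", "/ifs/data/projA/y/f2",
         "/ifs/data/projA/f3", "/ifs/data/projB/f4"]
for subtree, (share, margin) in sorted(count_shares(paths).items()):
    print(f"{subtree}: {share:.0%} +/- {margin:.0%}")
```

Multiplying each share by the known total file count gives an estimated count per subtree. Unlike total size, these proportions are not thrown off by a few huge files, which is why counts are the safer quantity to estimate this way.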

Cheers

-- Peter

January 12th, 2016 22:00

Is there a way to file an RFE to get the "stats_*" tables updated in results.db to include the LIN for each file?
