October 28th, 2015 08:00
Enabling L3 Cache and populating it
Hi Guys,
I'm planning to enable L3 cache on several upgraded clusters and have some questions regarding that process.
Current situation:
- Freshly upgraded clusters on OneFS 7.1.1.x
- MultiScan jobs etc. are finished
- The clusters are in production for our customer
- We use our SSDs for metadata read acceleration
- The current workload is mostly random access, and files <128K are also quite often in use --> we should profit from L3 cache
- The SmartPools license is NOT in use
- Only one node pool is used on each cluster
- The SSDs have free space left
Planned process (see the sketch below):
- isi storagepool nodepool modify -l3 true
- Wait for the jobs to complete.
- Profit.
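A rough sketch of what I intend to run, just from memory; the subcommand and flag spelling would be verified against the on-cluster CLI help for 7.1.1 first:

    # Look up the node pool name first
    isi storagepool nodepools list

    # Enable L3 cache (flag spelling as planned above; pool name and
    # exact syntax to be checked against the 7.1.1 CLI help)
    isi storagepool nodepool modify -l3 true

    # Watch the SetProtectPlus/FlexProtect work that evacuates the metadata
    # mirrors off the SSDs before they are reformatted as L3 cache
    isi job status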
Since the L3 activation process requires the metadata copies to be evacuated from the SSDs by SetProtectPlus and FlexProtect before the drives are reformatted as L3 cache, we will profit from neither metadata read acceleration nor L3 cache until the reformatting has finished. I want to minimize the performance impact for our customer and therefore have the following questions.
L3 cache is populated by blocks evicted from L2 cache. My plan is to have the L3 cache already populated with metadata by the time the customer accesses the data (after the weekend), so that they see no performance impact from metadata read acceleration no longer being enabled.
Can I force the L3 cache to be populated with metadata simply by running treewalks (via SMB/NFS)?
In my mind, after a complete treewalk (and nothing else!) has finished, the metadata will reside either in L1/L2 or in L3 cache. As soon as data is queried, metadata and data will be evicted from L1/L2 and populated into L3; if it grows old, it will eventually be evicted from L3 as well. Since metadata operations are (in my environment) more frequent than data reads, I hope to end up with a lot of frequently used metadata, plus some data, in the L3 cache.
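To picture what I mean by treewalk: my naive warm-up from a client would be something like this (just a sketch; the NFS mount point is a placeholder, and find only stats each entry, so it should pull metadata rather than file data through the caches):

    # hypothetical NFS mount of the cluster
    MNT=/mnt/isilon

    # stat every directory and file once; -ls forces an attribute read
    # per entry without reading any file data
    find "$MNT" -ls > /dev/null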
Do FSAnalyze or SnapshotDelete populate metadata into the caches?
If I delete a directory tree with files, do I populate this data into the caches, or are the caches purged of the deleted files? (I guess the former...)
Is a time calculation based on experience from another (comparable) cluster valid?
E.g.:
Cluster 1 has 3 nodes and 1 million LINs; it needs 15 minutes to evacuate the metadata and reformat the SSDs.
Cluster 2 has 3 nodes and 10 million LINs (based on the last FSAnalyze). I calculate 2:30 hours for metadata evacuation and reformatting of the SSDs.
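The estimate is simply linear scaling by LIN count, assuming equal node count and SSD configuration:

    # 15 minutes per 1 million LINs, scaled to 10 million LINs
    echo "15 * 10000000 / 1000000" | bc    # = 150 minutes = 2:30 hours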
Thanks and Regards
--sluetze



Peter_Sero
October 29th, 2015 02:00
When warming up the L3 cache with treewalks, record the isi_cache_stats outputs.
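Something along these lines is enough for the recording; just a sketch, assuming a shell on one of the nodes (interval and log path are placeholders):

    # append a timestamped isi_cache_stats sample every 5 minutes
    while true; do
        date
        isi_cache_stats
        sleep 300
    done >> /ifs/data/cache_stats.log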
With some delay (minutes... hours), start additional instances of the same treewalks and watch how the L3 cache hit rates rise...
If the picture turns out inconsistent or unexpected things happen, let us know. It's the kind of thing one cannot test with virtual nodes.
Likewise, I'd say your time calculation is a best effort based on limited information; just go for it and keep your eyes open.
-- Peter
carlilek
October 29th, 2015 05:00
Ah, big maintenance windows must be nice...
sluetze
October 30th, 2015 01:00
Hi Peter,
I tried this on my test system. It's a physical cluster which is only in use by ~10 people.
I ran a treewalk and waited a few hours. Then I started a second treewalk (two in parallel). A day later I started a third set of treewalks... the results are quite... shocking...
BUT: I have a prefetch miss ratio of 0.0% in L1 and 0.6% in L2. Are the treewalks maybe sequential, so that L1/L2 prefetch data from L3 and these prefetches do not count towards the L3 stats?
@Carlilek:
I don't have such a big maintenance window... but 95% of the users are gone over the weekend, which leaves the clusters pretty idle for some "extra work".
Regards
--sluetze
carlilek
October 30th, 2015 05:00
My L3 experiments (on my primary prod cluster....) were a total failure when it came to metadata. My cache emptied so fast that the metadata just didn't hang around. YMMV and certainly I had/have other issues going on, even after reverting back to metadata r/w.
Peter_Sero
October 30th, 2015 06:00
These are cumulative stats, beginning from reboot or from execution of "isi_cache_stats -z" (= zero). Better zero the stats to start a new recording interval, or use plain "isi_cache_stats" for real-time stats.
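For example (just a sketch, run on a node around the warm-up treewalk):

    isi_cache_stats -z      # zero the counters
    # ... run the warm-up treewalk from the client ...
    isi_cache_stats         # counters accumulated since the reset above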
-- Peter
sluetze
November 1st, 2015 23:00
Hi Peter,
I zeroed the stats after enabling L3 cache.
carlilek: quite interesting. If I understood the technology correctly, data/metadata will fill up the SSDs until they are full; then the oldest blocks will be discarded.
Well, let me put it this way:
On this test cluster I have more SSD space than data and metadata combined. So at some point every file smaller than 127K should be in L3 cache...
-- sluetze
Peter_Sero
November 2nd, 2015 01:00
OK, so what exactly is "shocking" here?
You have exceptionally high hit rates on L2 level for both data and metadata. The L2 misses are below 1% -- and notice how the L2 "miss" block rates show up nearly identically as the L3 read starts, for data and metadata respectively. This is exactly how a multilevel/tiered cache system should work.
Now on L3 level there are no hits at all, which is kind of sad and unexpected. Is this what you called "shocking"? Keep in mind that the absolute rate of attempted L3 reads (in blocks/s or MB/s) is still low, almost "noise", compared to the actual system throughput as seen on L1 and L2 levels.
I'd say that for the moment you are still "lucky" with your L2 cache, and these tiny amounts of L2 misses are stuff that has had no chance to get cached on any level so far.
Once your L2 hit rates diminish and you see substantial cache misses there, that's when it really gets interesting for L3 data and L3 metadata...
Cheers
-- Peter
sluetze
November 3rd, 2015 23:00
Hi Peter,
The shocking part for me was the low percentages in the L3 cache. The cause of this was that there is simply too little traffic and data on the box; thus all the data/metadata resides in L1 and L2.
Over the last few days the L3 metadata hit rate rose to 35.6% - but looking at the numbers I see 93,384 misses and 51,633 hits. So I have 62,790 new read starts, of which 51,633 were hits. That makes a very good 82% for these new hits.
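For reference, the arithmetic behind those percentages (the 62,790 is the read starts added since the previous sample):

    # cumulative hit rate: hits / (hits + misses)
    echo "scale=3; 51633 / (51633 + 93384)" | bc    # ~ .356 -> 35.6%
    # hit rate for the new read starts only
    echo "scale=3; 51633 / 62790" | bc              # ~ .822 -> 82%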
Thanks for your support and your feedback.
Regards
--sluetze
carlilek
November 4th, 2015 05:00
How is the performance compared to your old configuration?
sluetze
November 5th, 2015 00:00
Hi carlilek,
Until now I have only implemented L3 on our test cluster. I'll provide feedback after implementing it on one of our production clusters.
Regards
-- sluetze
sluetze
December 15th, 2015 05:00
Hi Ken,
As promised, here is some feedback:
My calculation based on the experience from my test environment was wrong. I calculated a time of 19h (max) and 8h on average, but it took ~60h on my first production cluster (not the smallest, but not the biggest).
Performance:
Improved. It is hard to put into numbers, but regarding read requests and data, we reduced the number of read requests that are not answered by prefetch or cache by ~60%.
Hit rates for data in L3 are ~60% and still increasing.
Hit rates for metadata in L3 (not warmed up!) are ~40% (increasing).
We also have a setup where we measure a file being written and then read back afterwards, which likewise showed improvements.
Regards
-- sluetze
carlilek
December 15th, 2015 05:00
Interesting. How metadata heavy is your environment? Our L3 (metadata only) has been catastrophic.
sluetze
December 15th, 2015 07:00
We do not have really defined workflows, since we use the Isilon (nearly) exclusively as a NAS for home and group drives rather than for applications. So most of the "metadata" actions are users traversing the folders.
If I look at the L1 cache, I see 43 metadata blocks for every data block.
Also, for metadata operations the bottleneck in our environment would be latency, since most access comes from offsite.
carlilek
December 15th, 2015 08:00
Ah, completely different environment, then. Our ops are ~50-70% metadata, virtually all onsite, with low latency switching and 10GbE. Interactive stuff is horrific when it's a first access for metadata on the L3-metadata pool.
Peter_Sero
January 6th, 2016 20:00
Hello sluetze
How has it been going in the meantime?
There is a semi-obvious issue with L3 cache warming that one should think through... (it came to my mind during the holidays.)
The L3 cache is *local* to each node, as we know. This has consequences:
With metadata being mirrored on at least three nodes, any naive attempt to warm up the metadata cache with a single treewalk will only cache one mirror for each affected LIN.
Which means that later on the odds of a cache MISS are still 2/3, or 2 to 1.

Unless, of course, there is one "preferred" copy of a LIN's metadata that is always used. Having that one in the cache would be sufficient for all subsequent accesses from other nodes.
Seems like nitpicking, but obviously the exact choice of which metadata mirror to *read* will have a tremendous impact on L2/L3 cache efficiency -- further thoughts and facts, anyone?
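As a back-of-the-envelope check, assuming reads pick one of the k metadata mirrors uniformly at random and a single-node treewalk has cached exactly one of them:

    # P(miss) = (k - 1) / k for k mirrors with one mirror cached
    echo "scale=3; (3 - 1) / 3" | bc    # = .666 -> roughly the 2/3 above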
Cheers
-- Peter