Unsolved

1 Rookie

62 Posts

November 17th, 2021 05:00

Tiering in "In-Line dedupe" enabled mixed cluster - what am I missing?

The first paragraph on page 14 of Dell EMC White Paper "Next-Generation Storage Efficiency With Dell EMC PowerScale In-Line Data Reduction", release October 2021, states the following:

"In general, OneFS does not allow deduplication across different disk pool policies. For in-line dedupe, a deduplicated file that is moved to another tier will retain the shadow references to the shadow store on the original disk pool. While this behavior violates the rule for deduping across different disk pool policies, it is preferred to do this than rehydrate files that are moved. Further deduplication activity on that file will no longer be allowed to reference any blocks in the original shadow store. The file will need to be deduped against other files in its new node pool."

What does this mean in practice? The file (or rather, the file system object representing it) gets tiered by file pool policy to a tier/pool without in-line dedupe, but the actual data the file consists of keeps using space on the original, in-line dedupe enabled tier/pool, because the file retains its references to the shadow stores there?

If so, what tasks or situations would release the occupied space on the in-line dedupe enabled tier/pool? Only a SmartDedupe job run on the file in question that makes it point to shadow stores in the new tier/pool?

If this specific file was one of many in the in-line dedupe enabled tier/pool that references the shadow stores in question, that would not be a problem as the shadow stores would have to keep existing when this specific file gets moved to a different pool (as other files would still reference the shadow stores in question).

But what if this specific file was the last one in the in-line dedupe enabled tier/pool with blocks referencing the shadow stores in question? Would the shadow stores be released/deleted and free up space in this tier/pool? The way I understand it, not until the file gets deduped (via SmartDedupe job?) against the shadow stores on the new tier/pool...
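To make the scenario concrete, here is a toy reference-counting sketch of my reading of the quoted paragraph. It is not OneFS code: the class and function names, the pool names and the block count are all hypothetical, and it assumes that a shadow store's blocks are freed only once no file references it. Whether a SmartDedupe run (or anything else) actually drops the old references is exactly the open question, so `drop_reference` is only a placeholder for that event.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowStore:
    """Hypothetical stand-in for a shadow store that lives on one pool."""
    pool: str
    blocks: int
    referrers: set = field(default_factory=set)


@dataclass
class File:
    name: str
    pool: str
    store: ShadowStore

    def __post_init__(self):
        # A deduped file registers itself as a referrer of its shadow store.
        self.store.referrers.add(self.name)


def tier(f: File, new_pool: str) -> None:
    """Move only the file object; shadow references are retained,
    as the quoted white paper paragraph describes."""
    f.pool = new_pool


def drop_reference(f: File) -> None:
    """ASSUMPTION: placeholder for whatever event makes the file stop
    referencing the store on the original pool."""
    f.store.referrers.discard(f.name)


def blocks_held(store: ShadowStore) -> int:
    """ASSUMPTION: a store's blocks are freed only when nothing references it."""
    return store.blocks if store.referrers else 0


# The last file referencing a shadow store on tier 1 gets tiered down.
store = ShadowStore(pool="tier1", blocks=100)
last_file = File(name="last_file", pool="tier1", store=store)

tier(last_file, "tier2")
held_after_tiering = blocks_held(store)   # file is on tier2, store still on tier1

drop_reference(last_file)
held_after_release = blocks_held(store)   # nothing references the store any more

print(held_after_tiering, held_after_release)
```

In this model the file reports tier 2 as its pool, while the 100 blocks stay allocated on tier 1 until the reference is dropped. That is the behavior I am worried about.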

What am I missing? In a classic price/performance tiering layout in a mixed cluster (10-30% tier 1/SSD, 70-90% tier 2/HDD), this would have a terrible effect when using F-type nodes with active in-line dedupe as tier 1 and H/A-type nodes without in-line dedupe as tier 2. Why? New data gets ingested with in-line dedupe into tier 1 and is moved to tier 2 over time by policy, but because of this behavior the in-line deduped data would not actually release space on tier 1. Tier 1 space is usually scarce due to $/TB, and one wants to keep tier 1 as small as possible, holding as little data as necessary, simply to improve the price/performance ratio of the whole cluster, right?

Again, what am I missing?

Moderator

7.7K Posts

November 17th, 2021 15:00

Hello CendresMetaux,

Here is a link to a KB article that may be of assistance: https://dell.to/30vJ3q8

1 Rookie

62 Posts

November 18th, 2021 01:00

Hi Sam

Thanks for jumping in, but your article (Balancing/Autobalancing) addresses a different matter (balancing usage/performance inside one tier consisting of different node types) and does not help with my question.

Anyone?
