This post is more than 5 years old
Dedupe Stats
I'm confused about what is reported by the dedupe stats:
Deduplicated Data: 8.65 TB from 23.7 TB configured for deduplication
Savings from Deduplication: 15.1 TB
This corresponds (mostly) to:
8.65TB = 23.7TB - 15.1TB
Deduplicated Data = (cluster.dedupe.estimated.deduplicated.bytes - cluster.dedupe.estimated.saved.bytes)
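A minimal Python sketch of the subtraction above, to show how close "mostly" is. The values are copied from the report in TB. Mapping 23.7 TB to cluster.dedupe.estimated.deduplicated.bytes and 15.1 TB to cluster.dedupe.estimated.saved.bytes follows the formula above. The variable and function names are made up for illustration.

```python
# Values copied from the dedupe report, in TB.
# Assumption: 23.7 TB is cluster.dedupe.estimated.deduplicated.bytes and
# 15.1 TB is cluster.dedupe.estimated.saved.bytes, per the formula above.
configured_tb = 23.7   # "configured for deduplication"
saved_tb = 15.1        # "Savings from Deduplication"
reported_tb = 8.65     # "Deduplicated Data" as shown in the report


def deduplicated_data(configured, saved):
    """Deduplicated data = data configured for dedupe minus the savings."""
    return configured - saved


derived_tb = deduplicated_data(configured_tb, saved_tb)
gap_tb = reported_tb - derived_tb

print(f"derived:  {derived_tb:.2f} TB")
print(f"reported: {reported_tb:.2f} TB")
print(f"gap:      {gap_tb:.2f} TB")
```

The small gap is consistent with the report rounding its figures, or with it counting blocks rather than bytes. That is a guess, not something the stats confirm.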
I know the stats probably look at blocks, but the byte stats are close enough for my reporting and were the first thing I tried.
Where does the 23.7 TB number come from when I actually have close to 90 TB of data being scanned for deduplication? I realize not all of that data will be deduped, but when I calculate a percentage from these numbers I get a 36% deduplication rate, which doesn't seem right (based on my 90 TB, it should be closer to 10%).
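A rough Python sketch of the percentage calculation, assuming the 36% figure comes from dividing the 8.65 TB of deduplicated data by the 23.7 TB configured for deduplication, and the 10% figure from dividing it by the roughly 90 TB in the directory. That reading is inferred from the numbers. The savings ratios are shown alongside for comparison. All names are illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Figures from the dedupe report and the directory size, in TB.
deduplicated_tb = 8.65   # "Deduplicated Data"
saved_tb = 15.1          # "Savings from Deduplication"
configured_tb = 23.7     # "configured for deduplication"
scanned_tb = 90.0        # approximate size of the scanned directory

# Deduplicated data relative to each candidate denominator.
rate_vs_configured = percent(deduplicated_tb, configured_tb)
rate_vs_scanned = percent(deduplicated_tb, scanned_tb)

# Savings relative to each candidate denominator.
savings_vs_configured = percent(saved_tb, configured_tb)
savings_vs_scanned = percent(saved_tb, scanned_tb)

print(f"deduplicated / configured: {rate_vs_configured:.1f}%")
print(f"deduplicated / scanned:    {rate_vs_scanned:.1f}%")
print(f"saved / configured:        {savings_vs_configured:.1f}%")
print(f"saved / scanned:           {savings_vs_scanned:.1f}%")
```

The result swings widely depending on the denominator, so the reported rate is only meaningful once it is clear what "configured for deduplication" actually counts.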
Am I using the wrong stats to calculate the dedupe rate and total savings? If I'm pointing to a directory with 90 TB of data, shouldn't "configured for deduplication" be 90 TB, even if not all of it dedupes?