DD Reporting

mikuszed · 90 Posts · January 5th, 2014 06:00
In regard to the report below:
From: 2013-12-27 06:00 To: 2014-01-03 06:00
                 Pre-Comp   Post-Comp   Global-Comp   Local-Comp     Total-Comp
                    (GiB)       (GiB)        Factor       Factor         Factor
                                                                  (Reduction %)
---------------  --------  ----------  ------------  -----------  -------------
Currently Used:   10294.1      1813.5             -            -    5.7x (82.4)
Written:*
  Last 7 days      2168.3       308.9          3.1x         2.3x    7.0x (85.8)
  Last 24 hrs         0.0         0.0          1.6x         4.4x    6.8x (85.4)
---------------  --------  ----------  ------------  -----------  -------------
 * Does not include the effects of pre-comp file deletes/truncates
If no data went to the DD in the last 24 hours, why is it reporting a reduction percentage of 85.4%? My theory: even with no new data arriving, Global-Comp still reflects pure deduplication against data already resident on the DD system, and whatever unique data remains after dedupe is then compressed (LZ by default), producing the factors shown above. Thoughts?
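For anyone checking the numbers: the Reduction % column looks like it is just the Total-Comp factor expressed another way, i.e. reduction = (1 - 1/factor) * 100. Here is my own back-of-the-envelope check in Python against the figures in the report above (my reading of the columns, not anything confirmed in the DD docs):

# How I understand the report's columns to relate:
#   factor = pre_comp / post_comp
#   reduction % = (1 - 1/factor) * 100
# Figures below are copied from the report above.

def reduction_pct(factor):
    """Convert a compression factor (e.g. 7.0x) to a reduction percentage."""
    return (1.0 - 1.0 / factor) * 100.0

total_7d = 2168.3 / 308.9               # ~7.02 -> reported as "7.0x"
print(reduction_pct(total_7d))          # ~85.8 -> matches "(85.8)"

print(reduction_pct(10294.1 / 1813.5))  # ~82.4 -> matches "5.7x (82.4)"

# Total-Comp should be roughly Global-Comp * Local-Comp:
print(3.1 * 2.3)                        # ~7.1, close to the reported 7.0x

So the 85.4% for the last 24 hours is just the ~6.8x factor restated as a percentage; the real question is where a 6.8x factor comes from with 0.0 GiB written.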
Thanks!


PatrickBetts · 1 Rookie · 116 Posts · January 6th, 2014 15:00
Mikuszed,
I reached out to our Filesystem team here at Data Domain, and their synopsis (without any further data) is that this is most likely metadata processing: file deletes, flushing B-trees to disk, and so on. They ran a simulation on a lab machine and got the same results.
dynamox · 11 Legend · 20.4K Posts · 87.4K Points · January 5th, 2014 08:00
But the DD does dedupe/compression inline, so if nothing is coming in, how can it report that it did something? It's not like a post-process that runs after data has been backed up. I am curious what others think.
ble1 · 6 Operator · 14.4K Posts · 56.2K Points · January 5th, 2014 09:00
First I thought of replication, but I would guess that is covered by the pre/post-comp figures too. My second thought is data that is pending removal (e.g., when the backup application marks data as removed, you can see a corresponding set of data marked eligible for the next scheduled removal); I'm not sure whether the calculation takes that into account (this value shows up as the Cleanable GiB column in df output).
I don't have an idle DD system to check this myself, but I suspect you ran filesys show compression. Does filesys show compression daily-detailed give any additional info?
dynamox · 11 Legend · 20.4K Posts · 87.4K Points · January 6th, 2014 19:00
So the question that comes to mind is: how much do these activities influence the statistics when I do have real backups going on? How much of what I see in "filesys show compression" is background metadata processing that I don't really care about?
mikuszed · 90 Posts · January 7th, 2014 05:00
Good point, Dynamox. Based on some of the information here: https://community.emc.com/docs/DOC-31725, it looks as though metadata processing should be factored in when incrementals are taking place, which accounts for a subset of the backup scenarios out there. The differences described under section two of the article are what stood out, given the caveats that exist with "filesys show compression". There are many other factors, including metadata, to account for.
-Ed
dynamox · 11 Legend · 20.4K Posts · 87.4K Points · January 7th, 2014 06:00
Ed,
That's a good document. So, to that point: how much do these metadata-processing values change the cumulative ratio for the entire box? Let's say in the last 24 hours I backed up a brand-new set of data and only got 2x dedupe, but in those same 24 hours I was doing some metadata processing and it resulted in 8x values. If the 8x value has the same weight as the 2x value, then my cumulative ratio is skewed greatly and I am looking at somewhat bogus values, thinking "oh wow, relatively good rates."