The question is whether you have one or more mtrees on the source/target systems.
Dedupe is global within a DDR, so the mtree might consume more space on another system. Check the size with physical capacity measurement (I'm also interested in the results).
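A minimal sketch of why global dedupe makes the same mtree cost different amounts of disk on two systems. It assumes a simplified model where each segment is identified by a fingerprint and has a fixed post-compression size; the function name, fingerprints and sizes are hypothetical, not Data Domain internals. Only segments that are not already stored somewhere else on the box add physical space.

```python
# Hypothetical model: fingerprint -> post-compression size in bytes.
# This is an illustration, not Data Domain's actual data structures.

def incremental_cost(mtree_segments, existing_segments):
    """Physical bytes needed to add an mtree to a system that already stores
    existing_segments. With global dedupe only new fingerprints cost space."""
    return sum(size for fp, size in mtree_segments.items()
               if fp not in existing_segments)


mtree = {"a": 32_768, "b": 32_768, "c": 16_384}
source_other = {"a": 32_768, "b": 32_768}  # other data on the source shares a, b
target_other = {}                          # the target has nothing in common

print(incremental_cost(mtree, source_other))  # 16384
print(incremental_cost(mtree, target_other))  # 81920
```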
There are 2 Mtrees per DD. One Mtree holds NetBackup backups and is not replicated, i.e. the NetBackup Mtree on the source DD holds different data than the NetBackup Mtree on the destination DD. The second Mtree is an Oracle Database RMAN DDBoost Mtree that is replicated from the source to the destination.
Here are the physical capacity measurements for the NetBackup Mtree on the source DD
Here are the physical capacity measurements for the NetBackup Mtree on the destination DD
Here are the physical capacity measurements for the DDBoost Mtree on the source DD
Here are the physical capacity measurements for the DDBoost (replica of the source) Mtree on the destination DD
Why is one mtree bigger on the target than on the source? This goes very deep into metadata reporting and "original block" storage reporting. When you run your first backup on the Data Domain with a single mtree, that is probably the only time you will get a 99.9% accurate usage report. Once you add more mtrees and backups and store a second copy of a duplicate block, its metadata points to the block that was charged to the original mtree's usage. What happens next is very important.
The first block is stored with a 50% reduction (compression) and, let's say, uses 32K of disk.
The second, duplicate block is not stored again: it points to the original block and reports 0K of disk usage.
Then 14 days go by, the original block (let's say, the first backup) expires, and cleaning runs. The block is not erased because the second backup still points to it, so it is still using 32K.
The first mtree's reported usage drops by the original block's 32K, but the second mtree's usage does not change: it still reports the 0K recorded when the copy was created. This is why arrays with multiple mtrees and many deleted blocks all show different storage numbers.
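The sequence above can be replayed with a toy model. This is a sketch under stated assumptions, not how DD OS actually does its accounting: the class, its methods and the "first writer is charged" rule are hypothetical, chosen only to mirror the description. The 32K block is charged to mtree1, the duplicate in mtree2 is charged 0K, and after mtree1's copy expires and cleaning runs, the block still occupies 32K while neither mtree reports it.

```python
from collections import defaultdict


class ToyDedupStore:
    """Hypothetical model of the 'original block' accounting described above.
    A block's post-comp size is charged only to the mtree that wrote it first;
    later duplicates are charged 0 bytes. This is not actual DD code."""

    def __init__(self):
        self.refs = defaultdict(set)      # fingerprint -> {(mtree, file), ...}
        self.size = {}                    # fingerprint -> post-comp bytes on disk
        self.charged_to = {}              # fingerprint -> mtree charged at write time
        self.reported = defaultdict(int)  # mtree -> reported post-comp usage

    def write(self, mtree, file, fp, post_comp_bytes):
        if fp not in self.size:
            # First copy: stored on disk and charged to the writing mtree.
            self.size[fp] = post_comp_bytes
            self.charged_to[fp] = mtree
            self.reported[mtree] += post_comp_bytes
        # A duplicate only adds a reference; its mtree is charged 0 bytes.
        self.refs[fp].add((mtree, file))

    def delete(self, mtree, file):
        for fp, holders in self.refs.items():
            holders.discard((mtree, file))
            still_used_here = any(m == mtree for m, _ in holders)
            if self.charged_to.get(fp) == mtree and not still_used_here:
                # The charge is released but NOT moved to the mtree that
                # still references the block: the drift described above.
                self.reported[mtree] -= self.size[fp]
                self.charged_to[fp] = None

    def clean(self):
        # Cleaning frees only blocks that nothing references any more.
        for fp in [fp for fp, holders in self.refs.items() if not holders]:
            del self.refs[fp], self.size[fp], self.charged_to[fp]

    def physical_bytes(self):
        return sum(self.size.values())


store = ToyDedupStore()
store.write("mtree1", "backup_day1", "blk", 32 * 1024)  # 64K compressed 50% -> 32K
store.write("mtree2", "backup_copy", "blk", 32 * 1024)  # duplicate -> charged 0K
store.delete("mtree1", "backup_day1")                    # expires after 14 days
store.clean()
print(store.reported["mtree1"], store.reported["mtree2"])  # 0 0
print(store.physical_bytes())                               # 32768
```

The sum of reported mtree usage (0) no longer matches what is physically on disk (32768), which is the mismatch the reply describes.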
This has been an accepted problem for years, because the overhead of recalculating file ownership size is not trivial, and frankly it is not worth it. The only way to get an actual block usage report from the Data Domain is to have your SE perform a file scan. The file scan takes a long time and a lot of CPU, but it does give you the actual blocks used per mtree, i.e. what you would need if you were to move or replicate it (and even that is a worst case, since the target may find duplicate blocks again).
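A file-scan style recount can be sketched as charging every mtree for every unique block it references, regardless of which mtree wrote the block first. This is a hypothetical illustration of the idea, not the SE tool's actual algorithm; the function name, block names and sizes are made up. Because shared blocks are counted in each mtree, the per-mtree totals can add up to more than what is physically stored, which matches the "worst case" caveat.

```python
def scan_usage(mtree_refs, block_size):
    """Hypothetical 'file scan' recount: every mtree is charged for every
    unique block it references, no matter which mtree wrote it first."""
    return {mtree: sum(block_size[fp] for fp in set(fps))
            for mtree, fps in mtree_refs.items()}


block_size = {"blk": 32_768, "x": 8_192}                 # post-comp bytes per block
mtree_refs = {"mtree1": ["x"], "mtree2": ["blk", "blk", "x"]}

usage = scan_usage(mtree_refs, block_size)
print(usage)                                             # {'mtree1': 8192, 'mtree2': 40960}
print(sum(usage.values()), sum(block_size.values()))     # 49152 40960
```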
Physical capacity measurement should address exactly the same problem: it should show the capacity needed to store a specific mtree (or the files below a specific path) on a different DD without counting on global deduplication. So if the pre-comp size is the same on both DDs for the replicated mtrees, but the post-comp value differs (and we are using the same compression algorithm on both file systems), it means that we shouldn't rely on the physical capacity measurement results...
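The reasoning above can be written as a small sanity check. It is a sketch that assumes both file systems use the same compression algorithm, as stated; the function name, the 1% tolerance (an allowance for replication lag) and the TiB figures are hypothetical examples, not real measurement data.

```python
import math

TiB = 1024 ** 4


def pcm_results_consistent(src, dst, rel_tol=0.01):
    """src, dst: (pre_comp_bytes, post_comp_bytes) measured for the same
    replicated mtree on the source and destination DD. Assumes both file
    systems use the same compression algorithm; rel_tol is a hypothetical
    allowance for in-flight replication lag."""
    src_pre, src_post = src
    dst_pre, dst_post = dst
    if not math.isclose(src_pre, dst_pre, rel_tol=rel_tol):
        raise ValueError("pre-comp sizes differ; the replica is not in sync")
    # Per the reasoning above: same logical data + same compression should
    # give the same standalone post-comp size on both systems.
    return math.isclose(src_post, dst_post, rel_tol=rel_tol)


print(pcm_results_consistent((10 * TiB, 2 * TiB), (10 * TiB, 2 * TiB)))  # True
print(pcm_results_consistent((10 * TiB, 2 * TiB), (10 * TiB, 3 * TiB)))  # False
```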
rugby01 <email@example.com> (date: Mon, 11 June 2018, 18:26)
Why is the replicated mtree much bigger than its source?
reply from rugby01 <https://community.emc.com/people/rugby01?et=watches.email.thread> in *Data Domain*