jbagdoo
1 Copper

Why is the replicated mtree much bigger than its source?

A DDBoost mtree of 30 TB (compressed) is replicated to a remote Data Domain, and the resulting remote mtree has a size of 39 TB.

What explains the extra 9 TB?

Both DDs are identical:

DD860

version 5.7.2.0.532316

7 Replies
hclim1
1 Copper

Re: Why is the replicated mtree much bigger than its source?

Do the source and remote Data Domains have only one mtree each?

jbagdoo
1 Copper

Re: Why is the replicated mtree much bigger than its source?

Yes, only one mtree is replicated.

oldhercules
2 Iron

Re: Why is the replicated mtree much bigger than its source?

The question is whether you have one or more mtrees on the source/target systems.

Dedupe is global within a DDR, so the mtree might consume more space on another system. Check the size with physical capacity measurement (I'm also interested in the results).
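
To illustrate why global deduplication makes an mtree's footprint depend on the system it lives on, here is a toy model. It is only a sketch under stated assumptions: segments are plain string labels, every segment is assumed to have the same size (`SEGMENT_KIB` is an arbitrary value), and both function names are made up for this example. It does not describe how DD OS actually tracks segments.

```python
SEGMENT_KIB = 8  # assumed fixed segment size, for illustration only


def standalone_footprint(mtree_segments):
    """Space needed to store the mtree alone on an otherwise empty system."""
    return len(set(mtree_segments)) * SEGMENT_KIB


def incremental_footprint(mtree_segments, other_segments):
    """Extra space the mtree needs on a system that already holds other_segments."""
    return len(set(mtree_segments) - set(other_segments)) * SEGMENT_KIB


replicated = ["a", "b", "c", "d", "a", "b"]  # repeats dedupe within the mtree itself
source_other = ["a", "b", "x"]               # other mtree on the source shares a and b
dest_other = ["y", "z"]                      # other mtree on the destination shares nothing

print(standalone_footprint(replicated))                 # 32
print(incremental_footprint(replicated, source_other))  # 16
print(incremental_footprint(replicated, dest_other))    # 32
```

In this toy setup the same mtree costs 16 KiB of additional space on the source but 32 KiB on the destination, purely because of what else each system already stores.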

jbagdoo
1 Copper

Re: Why is the replicated mtree much bigger than its source?

There are 2 Mtrees per DD. One Mtree holds NetBackup backups and is not replicated, i.e. the NetBackup Mtree at the source DD holds different data than the NetBackup Mtree at the destination DD. The second Mtree is an Oracle Database RMAN DDBoost Mtree that is replicated from the source to the destination.

Here are the physical capacity measurements for the NetBackup Mtree at the source DD:

Used (Post-Comp): 21.2 TiB
Compression: 4.9x

Here are the physical capacity measurements for the NetBackup Mtree at the destination DD:

Used (Post-Comp): 10.4 TiB
Compression: 2.1x

Here are the physical capacity measurements for the DDBoost Mtree at the source DD:

Used (Post-Comp): 25.9 TiB
Compression: 1.9x

Here are the physical capacity measurements for the DDBoost Mtree (the replica of the source) at the destination DD:

Used (Post-Comp): 31.6 TiB
Compression: 1.9x

rugby01
2 Bronze

Re: Why is the replicated mtree much bigger than its source?

Why is one mtree bigger on the target than on the source? This goes very deep into metadata reporting and "original block" storage reporting. The first backup you run on a Data Domain with a single mtree is probably the only time you get a 99.9% correct usage report. Once you add more mtrees and more backups and store a second copy of a duplicate block, the metadata points to the block already counted in the original mtree's usage. What happens next is very important.

Say the first block was stored with a 50% reduction (compression) and uses 32K of disk.

The second, duplicate block points to the original block and reports 0K of disk usage.

So 14 days go by, the original block (let's say the first backup) gets removed, and cleaning runs. The block doesn't get erased because the second backup is still pointing to it, so it is still using 32K.

The first mtree reduces its usage by the original block's 32K, but the second mtree's usage doesn't change: it still reports 0K from the original creation point. This causes everyone to get different storage numbers between arrays with multiple mtrees and lots of deleted blocks.

This has been an acceptable problem for years, because the overhead of recalculating file ownership size is not small and, frankly, not worth it. The only way to get an actual block usage report from the Data Domain is to perform a file scan with your SE. The file scan takes a long time and a lot of CPU, but you do get the actual blocks used per mtree, as if you were to move or replicate it (and even that is a worst case, since it may find deduplicated blocks again).
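
The accounting drift described in this post can be sketched with a toy model. This is an illustration of the explanation above, not of DD OS internals: the `ToySystem` class, its fields, and the "first writer is charged" rule are assumptions made for the example.

```python
class ToySystem:
    """Toy model: each stored block is charged only to the mtree that wrote it first."""

    def __init__(self):
        self.blocks = {}    # block id -> {"size": KiB, "owner": mtree, "refs": set of mtrees}
        self.reported = {}  # mtree -> reported usage in KiB

    def write(self, mtree, block_id, size_kib):
        self.reported.setdefault(mtree, 0)
        block = self.blocks.get(block_id)
        if block is None:
            # First copy: stored on disk and charged to the writing mtree.
            self.blocks[block_id] = {"size": size_kib, "owner": mtree, "refs": {mtree}}
            self.reported[mtree] += size_kib
        else:
            # Duplicate: only a reference is added, reported usage stays at 0.
            block["refs"].add(mtree)

    def delete(self, mtree, block_id):
        block = self.blocks[block_id]
        block["refs"].discard(mtree)
        if block["owner"] == mtree:
            # The original owner's report drops by the block size.
            self.reported[mtree] -= block["size"]
            block["owner"] = None
        if not block["refs"]:
            # Cleaning can only reclaim a block nobody references.
            del self.blocks[block_id]

    def physical_kib(self):
        return sum(b["size"] for b in self.blocks.values())


dd = ToySystem()
dd.write("mtree1", "blk", 32)   # original block, charged 32K to mtree1
dd.write("mtree2", "blk", 32)   # duplicate, charged 0K to mtree2
dd.delete("mtree1", "blk")      # original backup expires

print(dd.reported)        # {'mtree1': 0, 'mtree2': 0}
print(dd.physical_kib())  # 32
```

After the delete, both mtrees report 0K while 32K is still physically in use, which is the mismatch between per-mtree reports and actual disk usage that the post describes.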

oldhercules
2 Iron

Re: Why is the replicated mtree much bigger than its source?

Physical capacity measurement should address exactly this problem: it should show the capacity needed to store a specific mtree (or the files below a specific path) on a different DD, without counting on global deduplication. So if the pre-comp size is the same on both DDs for the replicated mtrees but the post-comp value differs (and we are using the same compression algorithm for both file systems), it means that we shouldn't rely on physical capacity measurement results...
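
As a rough cross-check, the pre-comp size implied by the figures posted earlier in the thread can be estimated. The sketch below assumes that the reported compression factor equals pre-comp divided by post-comp and that it is displayed rounded to one decimal place; both are assumptions, and the helper name `precomp_range_tib` is made up for the example.

```python
def precomp_range_tib(post_comp_tib, factor, rounding=0.05):
    """Range of pre-comp sizes consistent with a factor rounded to one decimal.

    Assumes factor = pre-comp / post-comp (an assumption, not confirmed in the thread).
    """
    low = post_comp_tib * (factor - rounding)
    high = post_comp_tib * (factor + rounding)
    return low, high


# DDBoost Mtree figures from the measurements posted above
source = precomp_range_tib(25.9, 1.9)
dest = precomp_range_tib(31.6, 1.9)

print([round(x, 1) for x in source])
print([round(x, 1) for x in dest])
print(source[1] < dest[0])  # True: the two ranges do not overlap
```

If these assumptions hold, the identical 1.9x factor on 25.9 TiB and 31.6 TiB would imply different pre-comp sizes on the two systems, so comparing the pre-comp values directly is the first thing to check.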

(In reply to rugby01's message of Mon, Jun 11, 2018, 18:26: <https://community.emc.com/message/1000612?et=watches.email.thread#1000612>)

oldhercules
2 Iron

Re: Why is the replicated mtree much bigger than its source?

Is the pre-comp size the same?
