Data Domain: Compression Frequently Asked Questions
Summary: This article answers the most frequent questions regarding compression. Data Domain systems are independent of data type. Data Domain uses deduplication and compression algorithms that store only unique data: duplicated patterns across multiple backups are stored only once. ...
This article is not tied to any specific product.
Not all product versions are identified in this article.
Instructions
Table of Contents
- Do incremental and full backups use the same disk space?
- Why do 'filesys show space' and 'filesys show compression' show different numbers?
- Why does 'filesys show compression last 24 hours' not match expectations for VTL?
- How is the cumulative compression ratio calculated?
- How does Data Domain compression work?
- Does Data Domain support multiplexing?
- With 1-to-1 directory replication, why does the replica show better global compression?
- What is the change in compression when using lz, gzfast, and gz local compression settings?
Typical compression rates are about 20:1 over many weeks of daily full and incremental backups. Data type affects the compression ratio: compressed image files, databases, and compressed archives (such as .zip files) do not compress well.
Do incremental and full backups use the same disk space?
Ideally, this would be true: with deduplication, a full backup of unchanged data would consume no more space than an incremental. In practice, the full backup uses a little more space than the incremental for the following reasons. These reasons also explain why a full backup taken after no changes in the data still consumes a positive amount of space.
- The metadata takes about 0.5% of the logical size of the backup. Suppose that:
- The logical size of the full is 100 GB
- The logical size of the incremental is 2 GB
- The incremental compresses to 1 GB
- ...then the full takes at least 1.5 GB
- The DD compression engine rewrites some duplicate data segments for performance. The poorer the data locality of the changes, the more duplicates are written. The duplicates are later reclaimed by file system garbage collection (GC). In some cases, about 2% of the logical size is rewritten as duplicates. Assuming this level of duplicates, the full might take 1 GB (compressed) + 0.5 GB (metadata) + 2 GB (duplicates) = 3.5 GB; a rough calculation is sketched after this list. The amount of duplicates written can be controlled through a system parameter, but this parameter is generally not tuned in the field.
- The data segmentation may vary a little from backup to backup depending on the order in which the NFS client sends the data. This order is not deterministic. In general, the segmentation algorithm tolerates shifts and reordering. However, it also creates some "forced" segments, which are prone to shifts and reordering. Typically about 0.2% of the segments are forced, so that much more space usage can be expected.
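As a rough illustration, the Python sketch below combines the example figures above (0.5% metadata, ~2% rewritten duplicates, ~0.2% forced segments) to estimate the space a full backup might consume. These percentages are the illustrative values from this article, not fixed system constants.

```python
# Illustrative estimate only: the percentages below are the example values
# used in this article, not guaranteed system behaviour.

def estimate_full_backup_space_gb(
    full_logical_gb: float,            # logical size of the full backup
    changed_data_physical_gb: float,   # compressed size of the changed data
    metadata_pct: float = 0.005,       # ~0.5% of logical size for metadata
    duplicate_pct: float = 0.02,       # ~2% of logical size rewritten as duplicates
    forced_segment_pct: float = 0.002, # ~0.2% extra from "forced" segments
) -> float:
    """Rough physical space a full backup may consume on a Data Domain."""
    metadata = full_logical_gb * metadata_pct
    duplicates = full_logical_gb * duplicate_pct
    forced = full_logical_gb * forced_segment_pct
    return changed_data_physical_gb + metadata + duplicates + forced

# Example from this article: 100 GB full, changed data compresses to 1 GB.
# Prints ~3.7 GB (the 3.5 GB example plus the 0.2% forced-segment overhead).
print(round(estimate_full_backup_space_gb(100, 1), 1))
```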
Why do 'filesys show space' and 'filesys show compression' show different numbers?
- 'filesys show space' provides the compression ratio based on the logical size of the data stored and the disk space used at the time that the command is run.
- 'filesys show compression' provides the compression ratio based on how each file was compressed at the time that it was created.
- 'filesys show compression' is used mostly for support and debugging. In the presence of file deletes, 'filesys show compression' overestimates the compression ratio.
For example, assume that:
- The first full backup gets 2x compression
- A subsequent full backup without any data changes gets 200x compression
- The first full backup is deleted
The output of 'filesys show space' would show a compression ratio of 2x, while 'filesys show compression' would show a compression ratio of 200x, because the only file that exists now got a compression ratio of 200x when it was created.
In the example above, after the second backup (and before the delete), 'filesys show space' would show a cumulative ratio of about 4x. The cumulative ratio would improve asymptotically towards 200x if more backups were taken without any deletion.
There are some other minor differences. The 'filesys show compression' command:
- Does not account for container-level wastage, thus overestimating the compression ratio further
- Does not account for duplicate elimination by global compression, thus underestimating the compression ratio
- Can provide per-file or per-directory information, while 'filesys show space' is limited to the entire system
- Provides the breakdown between global and local compression, while 'filesys show space' does not
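The following Python sketch is a toy model of the delete example above. It ignores details such as container-level wastage and cleaning timing; it is only meant to show why the two commands can report roughly 2x and 200x for the same remaining data.

```python
# Toy model of the delete example above (illustrative only; real Data Domain
# accounting is more involved).

# Each file: (logical_gb, compression_ratio_at_creation)
first_full  = (100.0, 2.0)    # first full backup: 2x compression
second_full = (100.0, 200.0)  # identical second full backup: 200x compression

# Physical space written by each file at creation time.
physical = {name: logical / ratio
            for name, (logical, ratio) in
            {"first": first_full, "second": second_full}.items()}

# After deleting the first full, its unique segments are still referenced by
# the second full, so the physical space used stays roughly the same.
physical_used = physical["first"] + physical["second"]   # ~50.5 GB
logical_remaining = second_full[0]                       # 100 GB

# 'filesys show space' style ratio: current logical data vs current disk usage.
show_space_ratio = logical_remaining / physical_used     # ~2x

# 'filesys show compression' style ratio: per-file ratio of files that still exist.
show_compression_ratio = second_full[1]                  # 200x

print(round(show_space_ratio, 1), show_compression_ratio)  # 2.0 200.0
```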
Why does 'filesys show compression last 24 hours' not match expectations for VTL?
For VTL, the output of commands such as 'filesys show compression last 24 hours' often does not match expectations based on other sources such as 'system show performance'.
The discrepancy is due to a peculiarity of 'filesys show compression'. In general, it shows cumulative stats for the selected files. The qualifier "last 24 hours" selects files that were updated in the last 24 hours, but the stats are still cumulative since the file was created or last truncated to zero size. Thus, if a file was appended to in the last 24 hours, 'filesys show compression last 24 hours' also includes its cumulative stats from before the last 24 hours.
Backup files in non-VTL environments are written only once, so there is little discrepancy between files updated and files created. With VTL, backups may be appended to existing tape files. For example, consider a 100 GB tape that is filled up to 50 GB. If 10 GB of data were appended to this tape in the last 24 hours, 'filesys show compression last 24 hours' would show the file's "Original bytes" as 60 GB rather than 10 GB.
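The arithmetic for the tape example can be illustrated with a short Python sketch; the 50 GB and 10 GB figures are the hypothetical values from the example above, not actual command output.

```python
# Illustrative arithmetic for the VTL example above (not actual DD output).

tape_size_before_gb = 50   # data already on the tape file
appended_last_24h_gb = 10  # data appended in the last 24 hours

# 'filesys show compression last 24 hours' selects the tape file because it
# was updated recently, but reports cumulative stats since the file was
# created or last truncated, not just the newly appended data.
reported_original_bytes_gb = tape_size_before_gb + appended_last_24h_gb

print(reported_original_bytes_gb)  # 60, even though only 10 GB was written in the last 24 hours
```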
How is the cumulative compression ratio calculated?
Individual compression ratios do not add up linearly.
Suppose that the compression on the first full backup is 2x and that on the second full backup is 20x. The cumulative compression is not (2 + 20) / 2 = 11x, but 2 / (1/2 + 1/20) = 3.64x.
In general, lower compression ratios have more impact than higher ones on the cumulative compression ratio.
Suppose that the i-th backup has logical size s_i and compression ratio c_i. Then, the cumulative compression ratio C for k backups can be computed as follows:
C = (total logical size) / (total space used)
total logical size = s_1 + s_2 + ... + s_k
total space used = s_1/c_1 + s_2/c_2 + ... + s_k/c_k
Often, the logical sizes are roughly the same. In that case, the above calculation simplifies to the following:
C = k / (1/c_1 + 1/c_2 + ... + 1/c_k)
For example, if:
- The first full backup gets 3x compression
- Each subsequent full gets 30x compression
- The retention period is 30 days
the user sees a cumulative compression of 30 / (1/3 + 29/30), or 23x.
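The formulas above translate directly into a few lines of Python. The sketch below is illustrative only and reproduces the 3.64x and ~23x results from the examples in this section.

```python
# Cumulative compression ratio as defined above:
#   C = (total logical size) / (total space used)
# where each backup contributes s_i logical bytes and s_i / c_i physical bytes.

def cumulative_compression(backups):
    """backups: iterable of (logical_size, compression_ratio) pairs."""
    total_logical = sum(s for s, _ in backups)
    total_physical = sum(s / c for s, c in backups)
    return total_logical / total_physical

# Two equally sized fulls at 2x and 20x -> ~3.64x, not (2 + 20) / 2 = 11x.
print(round(cumulative_compression([(100, 2), (100, 20)]), 2))   # 3.64

# 30-day retention: first full at 3x, 29 subsequent fulls at 30x -> ~23x.
month = [(100, 3)] + [(100, 30)] * 29
print(round(cumulative_compression(month), 1))                   # 23.1
```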
How does Data Domain compression work?
This question is answered in detail in a separate article: Understanding Data Domain Compression
Does Data Domain support multiplexing?
Multiplexing is not recommended: multiplexed data from the backup application results in very poor global deduplication. For more information, see this article: Data Domain: Multiplexing in Backup Software
With 1-to-1 directory replication, why does the replica show better global compression?
This is usually because of variations in the level of duplicate segments written on the system:
- The data stored at the source has been deduplicated once - against the previous data stored at the source.
- The data sent over the wire has been deduplicated once - against the data stored at the replica.
- The data stored at the replica has been deduplicated twice, once when the data was sent over the wire, and again when the received data is written on the replica.
Since the deduplication process leaves some duplicates behind, data that has been deduplicated more times contains fewer duplicates. The data stored at the source and the data sent over the wire have each been deduplicated once, so their sizes are roughly the same (assuming the data stored at the source and at the replica are similar). The data stored on the replica has been deduplicated twice, so it is better compressed.
File system cleaning removes most of the duplicates. Therefore, after cleaning has been run on the source and the replica, the amount of data stored there should be about the same.
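As a toy illustration of why the replica can show better global compression before cleaning, the Python sketch below assumes that each deduplication pass misses a small fixed fraction of duplicate segments. The 2% leak fraction and the data sizes are hypothetical values chosen only to make the effect visible; they are not Data Domain parameters.

```python
# Toy model only: assumes each deduplication pass misses a small fixed
# fraction of duplicate segments. The 2% figure is purely illustrative.

LEAK_FRACTION = 0.02  # hypothetical share of duplicates each pass fails to remove

def stored_size(unique_gb: float, duplicate_gb: float, dedup_passes: int) -> float:
    """Physical data remaining after the duplicates were deduplicated N times."""
    remaining_duplicates = duplicate_gb * (LEAK_FRACTION ** dedup_passes)
    return unique_gb + remaining_duplicates

unique, dupes = 50.0, 50.0  # 100 GB logical, half of it duplicate
source  = stored_size(unique, dupes, dedup_passes=1)  # deduped once at the source
replica = stored_size(unique, dupes, dedup_passes=2)  # once over the wire + once on write

print(round(source, 2), round(replica, 2))  # 51.0 50.02 -> replica shows better compression
```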
What is the change in compression when using lz, gzfast, and gz local compression settings?
Use the following command to change the local compression algorithm used in a Data Domain:
filesys option set compression {none | lz | gzfast | gz}
Note: The file system must be shut down prior to changing the local compression type. It can then be restarted immediately after the compression option has been set.
In general, the order of compression is as follows:
lz < gzfast < gz
| Type | Expected local compression | Relative CPU load |
|---|---|---|
| none | 1x | 0x |
| lz | 2x | 1x |
| gzfast | 2.5x | 2x |
| gz | 3x | 5x |
The rough differences are:
- lz to gzfast gives ~15% better compression and consumes 2x the CPU
- lz to gz gives ~30% better compression and consumes 5x the CPU
- gzfast to gz gives ~10-15% better compression
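As a rough what-if, the Python sketch below uses the expected compression figures from the table above to estimate the on-disk footprint of already-deduplicated data under each local compression setting. The figures are ballpark expectations from this article, not guaranteed ratios.

```python
# Rough what-if based on the table above (expected local compression per type).
# These are ballpark figures from this article, not guaranteed ratios.

EXPECTED_LOCAL_COMPRESSION = {"none": 1.0, "lz": 2.0, "gzfast": 2.5, "gz": 3.0}

def estimated_physical_gb(post_dedup_gb: float, comp_type: str) -> float:
    """Approximate on-disk size of post-deduplication data for a given setting."""
    return post_dedup_gb / EXPECTED_LOCAL_COMPRESSION[comp_type]

post_dedup = 1000.0  # GB of data left after global compression (deduplication)
for comp_type in ("lz", "gzfast", "gz"):
    print(comp_type, round(estimated_physical_gb(post_dedup, comp_type), 1))
# lz 500.0, gzfast 400.0, gz 333.3 -> gz stores the data in about a third of the space of 'none'
```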
Note that changing the local compression initially affects only new data written to the Data Domain after the change was made. Old data retains its previous compression format until the next cleaning cycle, which copies forward all of the old data into the new compression format. This causes that cleaning run to take much longer and consume more CPU.
If the system is already low on CPU, particularly if backups and replication are running simultaneously, the conversion can slow down the backups and replication. The customer may want to explicitly schedule a time to do this conversion.
Affected Products
Data Domain
Article Properties
Article Number: 000022100
Article Type: How To
Last Modified: 24 Apr 2026
Version: 12