Data Domain System and Cleaning Performance Impact of Converting to GZ Compression
Summary: This article provides information about different compression algorithms supported on the DDR along with the impact of converting the system to use the GZ algorithm.
Instructions
SOLUTION
DDOS currently supports four local compression types, which differ in expected compression and in CPU load. The following table summarizes the characteristics of each option:
type          expected-comp   CPU-load
-----------   -------------   --------
none          1.0x            0x
lz(default)   2.0x            1x
gzfast        2.5x            2x
gz            3.0x            5x
For example, the gz compression algorithm gives about 3x local compression but uses about 5x the CPU of lz to run the compression part of the code. The expected compression figures can vary greatly based on the data type. For some data types, gz may be only 10% better than lz, while for other data types it is 2x better than lz or more.
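To make the trade-off in the table concrete, the following sketch applies the nominal compression factors to a hypothetical 10 TiB of post-deduplication data. The 10 TiB figure and the function names are illustrative assumptions; the factors are the nominal ones from the table, and real results vary with data type.

```python
# Nominal figures from the table above: (expected local compression, relative CPU load).
COMPRESSION_TYPES = {
    "none": (1.0, 0),
    "lz": (2.0, 1),
    "gzfast": (2.5, 2),
    "gz": (3.0, 5),
}


def physical_size_tib(post_dedup_tib, comp_type):
    """Estimate on-disk size after local compression (nominal factors only)."""
    factor, _cpu = COMPRESSION_TYPES[comp_type]
    return post_dedup_tib / factor


def summarize(post_dedup_tib):
    """Return rows of (type, estimated TiB on disk, TiB saved vs lz, CPU load vs lz)."""
    baseline = physical_size_tib(post_dedup_tib, "lz")
    rows = []
    for name, (_factor, cpu) in COMPRESSION_TYPES.items():
        size = physical_size_tib(post_dedup_tib, name)
        rows.append((name, round(size, 2), round(baseline - size, 2), cpu))
    return rows


if __name__ == "__main__":
    for row in summarize(10.0):  # 10 TiB is a hypothetical example value
        print(row)
```

Under these nominal factors, gz stores the hypothetical 10 TiB in about 3.33 TiB instead of the 5 TiB needed with lz, a saving of about 1.67 TiB at roughly five times the compression CPU load.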
In general, files with many similar strings of data tend to compress better with gz than with lz.
Examples of such datasets include:
- Database files.
- Log files.
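A quick way to see why repetitive data such as logs compresses well is to run a synthetic log through Python's standard zlib module at a low and a high effort level and compare it with random data. This is only an illustration: zlib levels 1 and 9 are stand-ins chosen for this sketch and are not the DDOS lz, gzfast, or gz implementations, and the sample log lines are made up.

```python
import os
import zlib


def compression_ratio(data, level):
    """Return original size divided by compressed size at the given zlib level."""
    return len(data) / len(zlib.compress(data, level))


def make_log_sample(lines=5000):
    """Build a synthetic log with many similar strings (hypothetical content)."""
    rows = [
        f"2024-01-01 00:00:{i % 60:02d} INFO backup job {i} completed status=OK\n"
        for i in range(lines)
    ]
    return "".join(rows).encode("ascii")


if __name__ == "__main__":
    log_data = make_log_sample()
    random_data = os.urandom(len(log_data))  # incompressible baseline
    for name, data in (("log", log_data), ("random", random_data)):
        fast = compression_ratio(data, 1)
        strong = compression_ratio(data, 9)
        print(f"{name}: level 1 = {fast:.2f}x, level 9 = {strong:.2f}x")
```

Running the same comparison on a sample of the actual backup data gives a rough indication of how compressible that data is, although it cannot predict the ratios DDOS will achieve.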
Consequences of using GZ Compression:
Since the stronger compression algorithms use more CPU, they can have significant performance consequences:
- Backups with low deduplication run slower since more new data must be compressed and written to disk. In particular, the first full backup will likely achieve 50% of the rated peak throughput.
- Since cleaning uncompresses and recompresses the data while it is running, cleaning may take longer to run and may slow other activity on the system such as backups and replication.
- The source DDR in a directory replication pair compresses the data using the compression algorithm used by the destination before sending the data. Therefore, if the destination uses the gz compression algorithm, replication may run slower and may cause other activity on the system such as backups and cleaning to run slower.
Therefore, the decision to convert to GZ compression should be based on the workload the system is expected to handle. Otherwise, a capacity problem is effectively converted into a performance problem.
The following section describes the characteristics of workloads where GZ is helpful.
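The effect of deduplication on compression cost can be approximated with simple arithmetic: only new (non-deduplicated) data is compressed, so the compression CPU work of a backup scales with the fraction of new data multiplied by the CPU-load factor of the algorithm. The sketch below uses that simplified model. The model itself, the function name, and the example new-data fractions are assumptions for illustration, not DDOS measurements.

```python
# Relative CPU load of the compression code from the table above (lz = 1).
CPU_LOAD = {"none": 0, "lz": 1, "gzfast": 2, "gz": 5}


def relative_compression_work(new_data_fraction, comp_type):
    """Compression CPU work per unit of ingested data, relative to lz on all-new data.

    new_data_fraction is the share of the backup that is not deduplicated
    (1.0 for a first full backup, small for a high-deduplication workload).
    """
    if not 0.0 <= new_data_fraction <= 1.0:
        raise ValueError("new_data_fraction must be between 0 and 1")
    return new_data_fraction * CPU_LOAD[comp_type]


if __name__ == "__main__":
    # Hypothetical workloads: a first full backup and a backup with 5% new data.
    for label, fraction in (("first full", 1.0), ("high dedup", 0.05)):
        for comp in ("lz", "gz"):
            work = relative_compression_work(fraction, comp)
            print(f"{label:>10} {comp:>2}: {work:.2f}")
```

In this model, gz on a backup with 5% new data costs 0.25 units, less than lz on a first full backup (1.0 unit), while gz on a first full backup costs 5 units. The model ignores cleaning and replication, which add their own compression work.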
Who should use GZ compression?
Applications with high deduplication, low churn, and low backup performance requirements are ideal candidates to use gz. A good example is nearline applications. Most DDRs in the field used for nearline applications already use gz.
How do I change the compression type?
Use the following commands to change the compression type:
# filesys disable
# filesys option set local-compression-type {none | lz | gzfast | gz}
# filesys enable
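For scripted change procedures, it can help to validate the requested type before any command is issued. The following minimal sketch only builds the command strings shown above. The helper name is hypothetical, and how the commands are delivered to the system is outside the scope of the sketch.

```python
# Compression types accepted by "filesys option set local-compression-type".
VALID_TYPES = ("none", "lz", "gzfast", "gz")


def build_change_commands(comp_type):
    """Return the DDOS CLI commands, in order, for switching the compression type."""
    if comp_type not in VALID_TYPES:
        raise ValueError(f"unsupported compression type: {comp_type!r}")
    return [
        "filesys disable",
        f"filesys option set local-compression-type {comp_type}",
        "filesys enable",
    ]


if __name__ == "__main__":
    for command in build_change_commands("gz"):
        print(command)
```

Rejecting an unknown type up front avoids disabling the file system for a change that cannot be applied.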
Once the compression type is changed, all new writes use the new compression type and any data already written will be lazily converted to the new compression type during cleaning. The lazy conversion means that not all containers will be recompressed during the first round of cleaning. It takes several rounds of cleaning to fully recompress all the data existing on the DDRs prior to the change of compression policy.
The cleaning policy determines which containers are selected in a particular round of cleaning, and only those containers are recompressed. The policy is based on the amount of garbage data a given container holds. Garbage data is deleted data that is no longer referenced by the namespace. The more garbage a container holds, the more likely it is to be selected for cleaning.
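The following toy simulation illustrates why lazy conversion takes several cleaning rounds. It is a simplified sketch, not the DDOS cleaning algorithm: the fixed garbage threshold, the per-round churn, and all names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Container:
    garbage_fraction: float  # share of the container no longer referenced
    comp_type: str           # compression type the container was written with


def clean_round(containers, new_type, garbage_threshold=0.5):
    """Run one simplified cleaning round and return how many containers were rewritten.

    Only containers at or above the (hypothetical) garbage threshold are selected;
    selected containers drop their garbage and are rewritten with new_type.
    """
    rewritten = 0
    for c in containers:
        if c.garbage_fraction >= garbage_threshold:
            c.garbage_fraction = 0.0
            c.comp_type = new_type
            rewritten += 1
    return rewritten


def converted_fraction(containers, new_type):
    """Share of containers already stored with the new compression type."""
    return sum(c.comp_type == new_type for c in containers) / len(containers)


if __name__ == "__main__":
    # Hypothetical system: four lz containers with different amounts of garbage.
    containers = [Container(g, "lz") for g in (0.75, 0.5, 0.25, 0.0)]
    for round_no in range(1, 4):
        clean_round(containers, "gz")
        print(round_no, converted_fraction(containers, "gz"))
        for c in containers:  # assumed churn: 25% more garbage between rounds
            c.garbage_fraction = min(1.0, c.garbage_fraction + 0.25)
```

With these made-up numbers, half of the containers are converted in the first round, and the rest follow only as they accumulate enough garbage to be selected in later rounds.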
If the customer prefers to take a one-time performance hit rather than spread the conversion over several rounds of cleaning, use the following procedure:
- Disable DDFS by using the command:
  filesys disable
- Use the following command to disable the lazy conversion (requires SE Mode):
  reg set system.GC_APPLY_LAZY_CONVERSION=false
- Enable DDFS by using the command:
  filesys enable
As a result, the first cleaning run after changing the compression type and disabling the lazy conversion may take longer to run. Whenever you change the compression type, monitor the system carefully for a week or two to make sure it is behaving well.