EMC Data Domain deduplication storage systems continue to revolutionize disk backup, archiving, and disaster recovery with high-speed, inline deduplication. By consolidating backup and archive data on a Data Domain system, you can reduce storage requirements by 10-30x, making disk cost-effective for onsite retention and highly efficient for network-based replication to disaster recovery sites.
Compression is a data reduction technology that aims to store a data set using less physical space. The Data Domain Operating System (DDOS) applies both global compression and local compression to user data.
Today, we will introduce global compression and local compression on Data Domain.
EMC Data Domain Stream-Informed Segment Layout (SISL) Scaling Architecture Overview
Before we introduce global compression and local compression, we need to understand the Data Domain Stream-Informed Segment Layout (SISL) scaling architecture.
The SISL data flow, step by step, is as follows:
1. Segment: SISL slices the incoming data into segments, 4 to 12 KB in size.
2. Fingerprint: SISL then creates a fingerprint for each segment.
3. Filter: The summary vector and segment locality techniques identify 99% of the duplicate segments in RAM, inline, before storing to disk. If a segment is a duplicate, it is referenced and discarded. If a segment is new, the data moves on to step 4.
4. Compress: New segments are grouped and compressed using common algorithms: lz, gz, gzfast (lz by default).
5. Write: Writes data (segments, fingerprints, metadata and logs) to containers, and containers are written to disk.
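The five steps above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the DDOS implementation: it cuts fixed 8 KB segments (SISL uses variable 4 to 12 KB segments), uses SHA-1 as a stand-in fingerprint (the hash is not named here), replaces the summary vector and segment locality lookups with a plain in-memory dict, and uses zlib as a stand-in for the lz, gz, and gzfast compressors. The names ingest, restore, index, and container are hypothetical.

```python
import hashlib
import zlib

# Assumption: fixed-size 8 KB segments; real SISL segments vary from 4 to 12 KB.
SEGMENT_SIZE = 8 * 1024


def ingest(data, index, container):
    """Run one backup stream through the five SISL-style steps.

    index:     dict mapping fingerprint -> position in container
               (stand-in for the summary vector and segment locality lookups).
    container: list of compressed unique segments (stand-in for on-disk containers).
    Returns the ordered list of fingerprints needed to rebuild the stream.
    """
    recipe = []
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]   # 1. Segment
        fp = hashlib.sha1(segment).digest()            # 2. Fingerprint
        if fp not in index:                            # 3. Filter: only new segments continue
            compressed = zlib.compress(segment)        # 4. Compress (local compression)
            index[fp] = len(container)
            container.append(compressed)               # 5. Write
        recipe.append(fp)                              # duplicates are only referenced
    return recipe


def restore(recipe, index, container):
    """Rebuild the original stream from its fingerprint recipe."""
    return b"".join(zlib.decompress(container[index[fp]]) for fp in recipe)


index, container = {}, []
backup = bytes(range(256)) * 64           # 16 KB made of two identical 8 KB segments
recipe = ingest(backup, index, container)
print(len(recipe), len(container))        # 2 segments referenced, 1 segment stored
```

Ingesting the same stream a second time adds nothing to the container, because every fingerprint is already present in the index; that is the effect the filter step is meant to achieve.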
Understanding global compression and local compression
Now that we understand the SISL data flow, let us look at what global compression and local compression are.
Global compression equals deduplication. Global compression is used to identify redundant data segments and store only unique data segments.
Global compression corresponds to the 1st, 2nd, and 3rd steps of the SISL data flow. Global compression cannot be turned off.
Local compression further compresses the unique data segments with certain compression algorithms (for example, lz, gz, and gzfast).
Local compression corresponds to the 4th and 5th steps of the SISL data flow. Local compression can be turned off.
The overall user data compression is the combined effect of global compression and local compression: the total compression ratio is approximately the product of the global and local ratios. DDOS uses "compression ratio" to measure the effectiveness of its data compression.
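How the two stages combine can be shown with a small calculation. The sketch below assumes each ratio is defined as the size going into a stage divided by the size coming out of it; the function name compression_ratios and the intermediate size after_global are hypothetical names used only for illustration.

```python
def compression_ratios(pre_comp, after_global, post_comp):
    """Return (global, local, total) compression ratios.

    pre_comp:     size of the original backup data
    after_global: size of the unique segments left after deduplication (assumed name)
    post_comp:    size actually written to disk after local compression
    All three sizes must use the same unit, for example GiB.
    """
    global_comp = pre_comp / after_global   # effect of deduplication
    local_comp = after_global / post_comp   # effect of lz/gz/gzfast on unique segments
    total_comp = pre_comp / post_comp       # overall effect
    return global_comp, local_comp, total_comp


# Made-up sizes: 1000 GiB of backups, 250 GiB unique, 100 GiB on disk.
g, l, t = compression_ratios(1000.0, 250.0, 100.0)
print(g, l, t)  # 4.0 2.5 10.0
```

Because the intermediate size cancels out, the total ratio equals the global ratio multiplied by the local ratio.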
An example of daily monitoring on Data Domain
We can use the CLI command filesys show compression to view the values of global compression and local compression. The output reports the following:
For the last 7 days, Pre-Comp (the original backup data) is 34260.2 GiB, the Global-Comp (global compression) ratio is 2.2, the Local-Comp (local compression) ratio is 3.2, and Post-Comp (the data written to disk) is 4899.2 GiB. Total-Comp (the overall user data compression) is 7.0 (2.2 * 3.2 ≈ 7.0).
For the last 24 hours, Pre-Comp (the original backup data) is 4845.0 GiB, the Global-Comp (global compression) ratio is 2.1, the Local-Comp (local compression) ratio is 3.0, and Post-Comp (the data written to disk) is 784.3 GiB. Total-Comp (the overall user data compression) is 6.2 (2.1 * 3.0 = 6.3; the small gap from 6.2 is consistent with the displayed ratios being rounded to one decimal place).
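The figures quoted above can be cross-checked with a short script. It recomputes Total-Comp in two ways: from the Pre-Comp and Post-Comp sizes, and from the product of the rounded Global-Comp and Local-Comp ratios. The dictionary layout and the function name check are hypothetical; the numbers are the ones reported above.

```python
# Values taken from the two reporting periods described above.
reports = {
    "Last 7 days": {"pre": 34260.2, "post": 4899.2, "global": 2.2, "local": 3.2, "total": 7.0},
    "Last 24 hrs": {"pre": 4845.0, "post": 784.3, "global": 2.1, "local": 3.0, "total": 6.2},
}


def check(report):
    """Return Total-Comp computed from sizes and from the two displayed ratios."""
    from_sizes = report["pre"] / report["post"]
    from_factors = report["global"] * report["local"]
    return from_sizes, from_factors


for label, report in reports.items():
    from_sizes, from_factors = check(report)
    print(f"{label}: pre/post = {from_sizes:.2f}, global*local = {from_factors:.2f}, "
          f"reported = {report['total']}")
# Last 7 days: pre/post = 6.99, global*local = 7.04, reported = 7.0
# Last 24 hrs: pre/post = 6.18, global*local = 6.30, reported = 6.2
```

In both periods the ratio computed from the sizes rounds to the reported Total-Comp, while the product of the displayed factors is slightly off because those factors are themselves rounded.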
Understanding how deduplication and compression work on DDOS helps you monitor space usage effectively, and it can also inform adjustments to your backup strategy.