Running clean on a Data Domain Restorer (DDR) does not reclaim the amount of physical space indicated by 'Cleanable Gb'
Summary: Running clean on a Data Domain Restorer (DDR) does not reclaim the amount of physical space indicated by 'Cleanable Gb'
This article is not tied to any specific product.
Not all product versions are identified in this article.
Symptoms
Data Domain Restorers (DDRs) allow users to display system utilization via the 'filesys show space' or 'df' commands. For example:
# filesys show space
Active Tier:
Resource         Size GiB  Used GiB Avail GiB Use% Cleanable GiB*
---------------- -------- --------- --------- ---- --------------
/data: pre-comp         - 1970382.1         -    -              -
/data: post-comp 150830.3  111365.1   39465.2  74%         8252.4
/ddvar              308.1      95.6     196.9  33%              -
---------------- -------- --------- --------- ---- --------------
* Estimated based on last cleaning of 2016/06/24 09:45:56.
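If the post-comp figures are needed in a script (for example, for capacity reporting), the row can be parsed from the command output. The following is a minimal Python sketch that assumes the whitespace-separated layout shown above; `parse_post_comp` is a hypothetical helper, not a Data Domain tool, and the output layout may differ between releases. The last line shows the utilization that would result only if cleaning freed exactly the Cleanable estimate, which, as the Cause section explains, is not guaranteed.

```python
SAMPLE = """\
Active Tier:
Resource         Size GiB  Used GiB Avail GiB Use% Cleanable GiB*
---------------- -------- --------- --------- ---- --------------
/data: pre-comp         - 1970382.1         -    -              -
/data: post-comp 150830.3  111365.1   39465.2  74%         8252.4
/ddvar              308.1      95.6     196.9  33%              -
"""


def parse_post_comp(output: str) -> dict:
    """Return the '/data: post-comp' row of 'filesys show space' as numbers (GiB, %)."""
    for line in output.splitlines():
        if line.strip().startswith("/data: post-comp"):
            # Everything after the resource label is whitespace-separated columns.
            size, used, avail, use_pct, cleanable = line.split("post-comp", 1)[1].split()
            return {
                "size_gib": float(size),
                "used_gib": float(used),
                "avail_gib": float(avail),
                "use_pct": int(use_pct.rstrip("%")),
                # Guard against '-', which the output prints where no estimate applies.
                "cleanable_gib": None if cleanable == "-" else float(cleanable),
            }
    raise ValueError("no '/data: post-comp' row found")


row = parse_post_comp(SAMPLE)
print(round(100 * row["used_gib"] / row["size_gib"]))  # 74, matching Use%
# Best case only: assumes cleaning frees exactly the Cleanable estimate.
print(round(100 * (row["used_gib"] - row["cleanable_gib"]) / row["size_gib"]))  # 68
```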
One use of this command is to determine the physical (post-comp) utilization of the system. When run against the active tier, the output also includes a column called 'Cleanable GiB' (referred to in this article as 'Cleanable Gb'). To explain this figure further:
- When a file is removed from the Data Domain File System (DDFS), it is no longer visible on the DDR (it is removed from the DDFS namespace); however, any data referenced by that file is not immediately removed from disk
- As a result, the amount of free space on the system does not immediately increase
- To reclaim this space, a process called garbage collection (GC), or cleaning, must be run
- The purpose of cleaning is to find data on disk that is superfluous (i.e. referenced only by deleted objects such as files, snapshots, or mtrees) and physically remove it, freeing space for new ingest
- The 'Cleanable Gb' figure is intended to indicate how much space is likely to be freed if cleaning were started at any given point in time
Running clean may therefore reclaim more or less physical space than 'Cleanable Gb' indicates. This behavior is expected and is due to the design of DDFS.
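The delete-then-clean life cycle described in the list above can be illustrated with a toy mark-and-sweep model: deleting a file only removes its namespace entry, and space is freed later, when a clean pass removes segments that no remaining file references. This is a minimal Python sketch under simplifying assumptions, not the DDFS implementation; `ToyFileSystem` and its methods are hypothetical, and it does not model containers, snapshots, or mtrees.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFileSystem:
    """Toy model: a namespace of files pointing at segments stored on 'disk'."""
    disk: dict[str, int] = field(default_factory=dict)            # segment id -> size
    namespace: dict[str, set[str]] = field(default_factory=dict)  # file -> segment ids

    def write(self, name: str, segments: dict[str, int]) -> None:
        self.namespace[name] = set(segments)
        for seg, size in segments.items():
            self.disk.setdefault(seg, size)  # a duplicate segment is stored only once

    def delete(self, name: str) -> None:
        # Only the namespace entry goes away; the segments stay on disk.
        del self.namespace[name]

    def used(self) -> int:
        return sum(self.disk.values())

    def clean(self) -> int:
        """Mark segments referenced by live files, sweep the rest; return bytes freed."""
        live = set().union(*self.namespace.values())
        dead = [seg for seg in self.disk if seg not in live]
        return sum(self.disk.pop(seg) for seg in dead)


fs = ToyFileSystem()
fs.write("a", {"s1": 100, "s2": 100})
fs.write("b", {"s2": 100, "s3": 100})
fs.delete("a")
print(fs.used())   # 300: deleting "a" has not freed anything yet
print(fs.clean())  # 100: only s1 is unreferenced; s2 is still used by "b"
print(fs.used())   # 200
```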
Cause
To understand how 'Cleanable Gb' works, it is first necessary to understand what happens when a file is written to a DDR:
- The file is sent to the DDR from the backup client; the DDR records the original (i.e. logical) size of the file (original bytes)
- The file is anchored and segmented (split into 4-12 KB chunks called segments), and a unique fingerprint is generated for each segment
- The fingerprint of each segment is checked against indices on the DDR; if the fingerprint already exists in the indices, the corresponding segment already exists on disk (i.e. it is a duplicate)
- Duplicate segments do not need to be written to disk, so they are effectively replaced by a pointer to the segment already on disk
- Segments whose fingerprints do not exist in the indices are unique (new), so they must be written to disk
- Once all duplicate segments have been replaced (i.e. the file has been de-duplicated), the DDR records the size of the unique segments (globally compressed bytes)
- The unique segments are compressed (by default with lz) before being placed in 4.5 MB containers and written to disk
- The DDR records the size of the unique segments after compression, i.e. the physical disk space consumed by the file's unique data (locally compressed bytes)
- In addition, a map of the segments making up the file (the segment tree) is generated and written to disk so that the file can be read back (reconstructed) in the future
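The ingest steps above, and the three per-file statistics they record, can be sketched in Python. This is a toy model under stated assumptions, not the DDFS implementation: it uses fixed 8 KiB segments instead of anchoring, SHA-1 as the fingerprint, `zlib` as a stand-in for lz compression, and an in-memory set and dict for the index and disk, and it does not model 4.5 MB containers. `ingest` is a hypothetical helper.

```python
import hashlib
import os
import zlib

SEGMENT_SIZE = 8 * 1024  # assumption: fixed-size segments instead of DDFS anchoring


def ingest(data: bytes, index: set[bytes], disk: dict[bytes, bytes]) -> dict:
    """Toy ingest: segment, fingerprint, de-duplicate, compress; return per-file stats."""
    stats = {
        "original_bytes": len(data),
        "globally_compressed_bytes": 0,  # unique segments before compression
        "locally_compressed_bytes": 0,   # unique segments after compression
    }
    segment_tree = []  # ordered fingerprints needed to rebuild the file
    for off in range(0, len(data), SEGMENT_SIZE):
        segment = data[off:off + SEGMENT_SIZE]
        fp = hashlib.sha1(segment).digest()  # assumption: SHA-1 as the fingerprint
        segment_tree.append(fp)
        if fp in index:
            continue  # duplicate: only the reference in the segment tree is kept
        index.add(fp)
        stored = zlib.compress(segment)  # stand-in for the DDR's lz compression
        disk[fp] = stored
        stats["globally_compressed_bytes"] += len(segment)
        stats["locally_compressed_bytes"] += len(stored)
    stats["segment_tree"] = segment_tree
    return stats


index, disk = set(), {}
payload = os.urandom(1024 * 1024)       # 1 MiB of random data
first = ingest(payload, index, disk)
second = ingest(payload, index, disk)   # an identical file
print(first["globally_compressed_bytes"])   # 1048576
print(second["globally_compressed_bytes"])  # 0
```

Writing the same payload a second time records 0 globally and locally compressed bytes because every fingerprint is already in the index; its segment tree alone is enough to rebuild the file from the segments stored by the first write. Random data does not compress, so `locally_compressed_bytes` for the first write will typically be slightly larger than `globally_compressed_bytes`.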
Over time, as the DDR is used, files are created and deleted and cleaning is run. This can change the way an existing file de-duplicates. For example, assume that a file containing 1 MB of unique data is written to the DDR: at the point of ingest, this 1 MB of data is referenced only by this file. Over time, however, a further 9 files are written to the DDR, all of which contain the same 1 MB of data. This 1 MB of data is now referenced by 10 files in total (i.e. the way in which it is referenced has changed).
Note, however, that despite this change in how the file's data is referenced, the per-file compression statistics for each of the files written are not updated (i.e. the statistics for the first file, which wrote the 1 MB of unique data to disk, still appear to show that this file 'owns' the data even though it is now referenced by 10 files).
As a result, these per-file compression statistics effectively go stale over time.
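The staleness described above comes from recording per-file statistics once, at ingest, while the number of files referencing the data keeps changing. The following is a minimal Python sketch under the simplifying assumption that each file consists of a single shared 1 MB segment; `write`, `refcount`, and `per_file_stats` are hypothetical names used only for illustration.

```python
from collections import Counter

stored = set()        # segments physically on disk
refcount = Counter()  # segment -> number of files referencing it
per_file_stats = {}   # file -> 'locally compressed bytes' recorded at ingest


def write(name: str, segment: str, size: int) -> None:
    """Hypothetical ingest of a file made of one segment of the given size."""
    is_new = segment not in stored
    stored.add(segment)
    refcount[segment] += 1
    # Captured once, at ingest time, and never updated afterwards.
    per_file_stats[name] = size if is_new else 0


# One file with 1 MB of unique data, then nine more files holding the same 1 MB.
for i in range(10):
    write(f"file{i}", "shared-1MB", 1_000_000)

print(refcount["shared-1MB"])   # 10
print(per_file_stats["file0"])  # 1000000: file0 still appears to 'own' the data
print(per_file_stats["file9"])  # 0
```

At this point, deleting file0 would add 1,000,000 bytes to any counter built from these statistics, yet a clean would free nothing, because nine other files still reference the segment.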
With this background, it is possible to explain how the 'Cleanable Gb' value is generated:
- Over time, files are deleted from the DDR
- Every time a file is deleted, the file's 'locally compressed bytes' (i.e. the physical amount of space the file took on disk when it was ingested) is added to the current value of 'Cleanable Gb'
Finally, an example of this in practice:
- A 1 TB file (file1) of completely random data (which does not de-duplicate or compress) is written to a DDR. Per-file compression statistics for file1 will show 'locally compressed bytes' of ~1 TB
- A 1 TB file (file2) is written to the DDR. This file shares 500 GB of data with file1 and contains 500 GB of unique random data. Per-file compression statistics for file2 will show 'locally compressed bytes' of 500 GB
- A 1 TB file (file3) is written to the DDR. This file is identical to file2. Per-file compression statistics for file3 will show 'locally compressed bytes' of 0 bytes (as the entire file de-duplicates against file2)
- Initially, 'Cleanable Gb' is 0
- file1 is deleted; as file1's 'locally compressed bytes' is 1 TB, 'Cleanable Gb' is incremented by 1 TB
- Clean is run but only reclaims 500 GB of space. This is expected, as the other 500 GB of data written by file1 must stay on disk because it is referenced by file2 and file3
- file2 is deleted; as file2's 'locally compressed bytes' is 500 GB, 'Cleanable Gb' is incremented by 500 GB
- Clean is run but no space is reclaimed. This is expected, as all data referenced by file2 is also referenced by file3 (so it must stay on disk)
- file3 is deleted; as file3's 'locally compressed bytes' is 0, 'Cleanable Gb' is incremented by 0
- Clean is run and 1 TB of physical space is reclaimed, even though deleting file3 added nothing to 'Cleanable Gb'
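The worked example above can be replayed in a few lines of Python. This sketch applies the accounting rule described above (each delete adds the file's ingest-time 'locally compressed bytes' to the estimate) together with a clean that frees only blocks no remaining file references. The 500 GB blocks named A, B, and C are hypothetical, sizes are in GB, and the sketch does not model how the DDR refreshes the estimate after each clean.

```python
BLOCK_GB = 500  # each letter below stands for a hypothetical 500 GB block of random data

disk: set[str] = set()            # blocks physically on disk
files: dict[str, set[str]] = {}   # namespace: file -> blocks it references
local_gb: dict[str, int] = {}     # 'locally compressed bytes' recorded at ingest (GB)
cleanable = 0                     # running 'Cleanable' estimate (GB)


def write(name: str, blocks: set[str]) -> None:
    new = blocks - disk           # only blocks not already on disk are stored
    disk.update(new)
    files[name] = blocks
    local_gb[name] = len(new) * BLOCK_GB


def delete(name: str) -> int:
    """Remove the file from the namespace and grow the estimate; return the increment."""
    global cleanable
    del files[name]
    cleanable += local_gb[name]
    return local_gb[name]


def clean() -> int:
    """Reclaim blocks that no live file references; return GB actually freed."""
    live = set().union(*files.values())
    dead = disk - live
    disk.difference_update(dead)
    return len(dead) * BLOCK_GB


write("file1", {"A", "B"})  # 1 TB, all new
write("file2", {"B", "C"})  # shares B with file1; C is new
write("file3", {"B", "C"})  # identical to file2

for name in ("file1", "file2", "file3"):
    added = delete(name)
    print(f"delete {name}: Cleanable +{added} GB, clean frees {clean()} GB")
# delete file1: Cleanable +1000 GB, clean frees 500 GB
# delete file2: Cleanable +500 GB, clean frees 0 GB
# delete file3: Cleanable +0 GB, clean frees 1000 GB
```

Over the whole sequence the estimate (1,500 GB) and the space reclaimed (1,500 GB) agree, but no individual clean frees the amount the estimate grew by at the preceding delete.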
Resolution
This behavior is expected given the design of DDFS. There is no way to work around it or to increase the accuracy of the 'Cleanable Gb' figure.
Affected Products
Data Domain
Products
Data Domain
Article Properties
Article Number: 000018664
Article Type: Solution
Last Modified: 02 Jun 2021
Version: 4