Data Domain: Cleanable Size is an Estimate

Summary: There is often confusion about the "Cleanable GiB" value presented on a Data Domain system and improper expectations about the amount of space that will be recovered upon running cleaning.

This article is not tied to any specific product. Not all product versions are identified in this article.

Instructions

There is often confusion about the "Cleanable GiB" value presented on a Data Domain system and improper expectations about the amount of space that will be recovered upon running cleaning.

The "Cleanable GiB" number given is purely an estimate, and it is not possible to get an accurate value for how much space will be recovered by running cleaning, due to the technological choices made when developing the Data Domain Filesystem.


The following is a succinct explanation of why estimates of cleanable space can vary substantially from the actual space recovered. There are also other factors, not accounted for here, which may cause the estimate and the amount of disk space actually freed by running clean to differ substantially.
 

When data is ingested by the Data Domain system, the post-compression value is calculated and stored as static data for every file. The "Cleanable" value is simply the sum of the post-compression value for all deleted files since the last time DD clean was run to completion.
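
As an illustrative sketch only (this is not actual DDFS code, and the function names and per-file values are hypothetical), the bookkeeping behind the counter can be pictured like this in Python:

# Sketch of how the "Cleanable GiB" estimate is maintained: the
# post-compression size recorded for each file at ingest is summed
# for every file deleted since the last completed clean.
post_comp_size = {}   # file name -> post-comp GiB recorded at ingest
cleanable_gib = 0     # the "Cleanable GiB" counter

def ingest(name, gib):
    post_comp_size[name] = gib

def delete(name):
    global cleanable_gib
    cleanable_gib += post_comp_size.pop(name)   # an estimate only

def clean_completed():
    global cleanable_gib
    cleanable_gib = 0   # the counter resets after a completed clean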
 

The Cleanable value becomes inaccurate if the segments of deleted files have also been used to deduplicate data in other files that have not been deleted. As long as a single file still refers to a unique segment, the DD clean process will not consider that segment for reclamation. So even though a file's post-comp size was added to the "Cleanable GiB" counter as if all of its unique segments were about to be disposed of, some (or many) of them may not be, because they are re-used by other files.
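
The effect can be shown with a small model (illustrative only; the file and segment names are hypothetical). The estimate counts all of a deleted file's segments, while cleaning can reclaim only the segments that no surviving file references:

# The estimate counts all of a deleted file's segments, but GC can
# only reclaim segments that no live file still references.
live = {"file_b": {"s2", "s3"}}            # surviving files -> segments
deleted = {"file_a": {"s1", "s2", "s3"}}   # deleted files -> segments

estimate = sum(len(segs) for segs in deleted.values())                   # 3
still_referenced = set().union(*live.values())                           # {s2, s3}
actual = sum(len(segs - still_referenced) for segs in deleted.values())  # 1

print(estimate, actual)   # 3 1 -- only s1 is actually reclaimable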
 

A more detailed example showing this effect follows:

Assume that you have 5 files of 100 GB each, added one by one to a Data Domain system with no other data previously on it.

Since the first 100 GB file contained all unique data, its compression ratio is 1x (assuming the file had no redundancy within itself). The 2nd through 5th files were able to dedupe against the 1st file's data, and against each earlier file as they were added, each gaining increasing deduplication due to the growing amount of data available to deduplicate against.

File 1: precomp: 100 GB postcomp: 100 GB compression ratio: 1x
File 2: precomp: 100 GB postcomp:  50 GB compression ratio: 2x
File 3: precomp: 100 GB postcomp:  25 GB compression ratio: 4x
File 4: precomp: 100 GB postcomp:  25 GB compression ratio: 4x
File 5: precomp: 100 GB postcomp:   1 GB compression ratio: 100x

Resource            Size GiB    Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   ---------   ---------   ---------   ----   --------------
/backup: pre-comp          -         500           -      -                -
/backup: post-comp      1000         201         799    20%                0
----------------   ---------   ---------   ---------   ----   --------------
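
Consistent with the table, the post-comp "Used GiB" value of 201 is simply the sum of the per-file post-comp sizes, which a quick Python check confirms:

# Post-comp used (GiB) = sum of the per-file post-comp sizes above.
post_comp = {"file1": 100, "file2": 50, "file3": 25, "file4": 25, "file5": 1}
print(sum(post_comp.values()))   # 201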


Example 1: Status after deleting the first 3 files from /backup:
 

Resource            Size GiB    Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   ---------   ---------   ---------   ----   --------------
/backup: pre-comp          -         200           -      -                -
/backup: post-comp      1000         201         799    20%              175
----------------   ---------   ---------   ---------   ----   --------------

 

If you run cleaning after this, you may be able to reclaim 125 GB instead of the full 175 GB shown as cleanable. This is because the last 2 files share segments with files 1-3: cleaning will not recover the other 50 GB of space because those segments are still in use by files 4 and 5.
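
In code form, using the post-comp figures from this example (the 50 GB of still-referenced data is taken from the text above; the system does not publish the per-file overlap):

# Example 1 arithmetic: the estimate versus the actual reclaim.
deleted_post_comp = {"file1": 100, "file2": 50, "file3": 25}   # GB
shared_with_live_files = 50    # GB still referenced by files 4 and 5

estimate = sum(deleted_post_comp.values())    # 175 GB ("Cleanable GiB")
actual = estimate - shared_with_live_files    # 125 GB actually freed
print(estimate, actual)                       # 175 125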
 

Example 2: Using the same starting point as Example 1, assume file 1 was deleted, then a fastcopy was performed on the entire /backup folder (that is, the 4 remaining files), then files 2-4 were deleted.

Resource            Size GiB    Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   ---------   ---------   ---------   ----   --------------
/backup: pre-comp          -         800           -      -                -
/backup: post-comp      1000         201         799    20%              200
----------------   ---------   ---------   ---------   ----   --------------

 

The "Size GiB" figure for pre-comp comes from (500-100)=400*2=800, giving 500 for the 5 original files, subtracting 100 for deleting file 1 gives 400 GiB.  Next, 400 GiB multiplied by 2 due to the fastcopy on all 4 remaining files.

Note that the post-comp space used is still the same, because a fastcopy adds only a tiny amount of space, consisting of the metadata pointers to the original data. The space usage has not changed despite the deletion of file 1 because a "filesys clean start" has not been run (to initiate cleaning).
 

After the clean completes, we will see:
 

Resource            Size GiB    Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   ---------   ---------   ---------   ----   --------------
/backup: pre-comp          -         800           -      -                -
/backup: post-comp      1000         176         824    18%                0
----------------   ---------   ---------   ---------   ----   --------------

 

Note that even though 200 GB was shown as cleanable, only 25 GB was actually cleaned. "Cleanable GiB" showed 200 because the post-comp sizes of files 1 through 4 added up to 200 GB. Of those files, only file 1's data was actually removable, and of its 100 GB, 75 GB was still in use by the remaining files (due to deduplication), leaving only 25 GB to reclaim.

That may seem odd, since files 2 through 4 had also been deleted, but remember that although the system shows files 2 through 4 as removed, the actual data segments for those files could not be removed because those files had been fastcopied to another folder. Only after all fastcopy versions have also been removed can the space be fully recovered by cleaning.
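
The fastcopy behavior can be modeled in the same way (file and segment names are illustrative): a fastcopy duplicates only a file's pointers to its segments, so deleting the original does not make those segments reclaimable while a copy remains.

# A fastcopy duplicates only segment references (metadata), so the
# segments stay pinned until every copy of the file is also deleted.
files = {"file2": {"s1", "s2"}}           # file -> referenced segments

def fastcopy(src, dst):
    files[dst] = set(files[src])          # metadata-only copy of pointers

def reclaimable(removed_segments):
    live = set().union(*files.values()) if files else set()
    return removed_segments - live

fastcopy("file2", "copy_of_file2")
removed = files.pop("file2")              # delete the original file
print(reclaimable(removed))               # set() -- nothing reclaimable yet

files.pop("copy_of_file2")                # remove the fastcopy as well
print(reclaimable(removed))               # both segments now cleanable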

 

Since Cleanable GiB is only an estimate and may not be accurate, it can sometimes report a value close to, or even equal to, the physical capacity of the Data Domain system.

This can cause confusion about whether to wait for the scheduled DDFS cleaning or to start it manually when DDFS space usage is nearing 100% and the Cleanable GiB value is close to, or the same as, the "/data: post-comp" value.

For a better and more reliable estimate of the amount of disk space a clean would reclaim, starting with DDOS 7.7.x it is possible to determine from the CLI the actual 'Total Cleanable-Space' that the next GC run on the Active tier will be able to reclaim. The following is a summary of the CLI:
 

# filesys cleanable-space calculate
Cleanable space calculation started. Use 'filesys cleanable-space watch' to monitor progress.


The process does the same work as a regular GC, going through phases 1 to 4 but skipping phase 5 (copy), which is the phase that actually copies forward containers to reclaim the dead disk space. As such, it takes as long as a regular GC takes to complete clean phases 1 through 4 before it returns a value, so it is not something to run regularly to keep an updated estimate, but only when needed. In other words, "filesys cleanable-space calculate" runs GC on the Active tier, skipping only the part in which space is actually reclaimed.

The process can be monitored as follows:
 

# filesys cleanable-space watch
Beginning 'filesys cleanable-space calculation' monitoring.  Use Control-C to stop monitoring.

Cleaning: phase 1 of 4 (pre-merge)
  100.0% complete, 96233 GiB free; time: phase  0:02:07, total  0:02:07

Cleaning: phase 2 of 4 (pre-analysis)
  100.0% complete, 96233 GiB free; time: phase  0:06:51, total  0:08:59

Cleaning: phase 3 of 4 (pre-enumeration)
  100.0% complete, 96233 GiB free; time: phase  0:00:20, total  0:09:20

Cleaning: phase 4 of 4 (pre-select)
  100.0% complete, 96233 GiB free; time: phase  0:00:25, total  0:09:46

 

Once the calculation completes, the measured cleanable value can be retrieved:

# filesys cleanable-space status

Cleanable space on active tier is 94649698202 bytes. Last calculated on 2023/08/25 03:29:51
Cleanable space calculation finished at 2023/08/25 03:29:51.

 

So in the above example, if DD GC were run now, it would free 94649698202 bytes. That is 88.1 GiB, whereas at the time of the calculation the estimate reported by "df" on the lab system used was 41.9 GiB. Of course, as changes are made to the filesystem (new backups, more deletions, snapshots being created and expired, and so on), the calculated value will drift.
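
As a quick check of the unit conversion:

# 1 GiB = 2**30 bytes, so the reported byte count converts as follows.
cleanable_bytes = 94_649_698_202
print(f"{cleanable_bytes / 2**30:.1f} GiB")   # 88.1 GiB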

If required, the process can be stopped with the following command:

# filesys cleanable-space stop

The 'filesys cleanable-space stop' command stops calculating cleanable space in the system.
Are you sure? (yes|no) [no]: yes

ok, proceeding.

# filesys cleanable-space status
Cleanable space on active tier is 2607064 bytes. Last calculated on 2021/06/27 23:23:05
Cleanable space calculation started at 2021/06/27 23:27:58 and was aborted at 2021/06/27 23:28:19.
Cleaning was aborted by user.

 

Note that this CLI applies to the DD Active tier only. There is no equivalent process to calculate the cleanable space for a DD cloud unit, which has its own estimate, subject to the same uncertainties described above.

 

Affected Products

Data Domain

Article Properties
Article Number: 000005806
Article Type: How To
Last Modified: 22 Oct 2025
Version:  6