Data Domain Software : Discrepancy between usage reported by "filesys show space (df)" and sfs_dump totals
Summary: Some customers add up the "post-lc-size" (post-compression size) of all files reported by "sfs_dump" and expect the total to match the space utilization reported by "filesys show space" (df). Because the two values do not match, they may interpret the difference as "missing or unaccounted space" ...
Instructions
The sum of the post-compression sizes of all files reported by "sfs_dump" matches the total space utilization only if no deletes have ever been performed on the filesystem. This is never the case on a production DDR, so using "sfs_dump" to account for current space utilization is a fallacy. The three-step example under ILLUSTRATION shows why.
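For reference, the summation that customers typically perform can be sketched as follows. This is a minimal sketch, not a supported tool: the helper name sum_post_lc_size is hypothetical, and the regular expression assumes the "post_lc_size: <bytes>" field format seen in the sfs_dump output quoted in this article, which may differ on other releases.

```python
import re

# Assumption: the field appears as "post_lc_size: <bytes>", as in the
# sfs_dump output quoted in this article.
POST_LC_RE = re.compile(r"\bpost_lc_size:\s*(\d+)")


def sum_post_lc_size(dump_text: str) -> int:
    """Add up every post_lc_size value found in captured sfs_dump output."""
    return sum(int(value) for value in POST_LC_RE.findall(dump_text))


# The two per-file lines from step II of the illustration.
sample = (
    "/data/col1/test/file1: mtime: 1503023531690895000 fileid: 18 "
    "size: 10736790004 type: 9 seg_bytes: 10772785048 seg_count: 1282587 "
    "redun_seg_count: 0 (0%) pre_lc_size: 10772785048 "
    "post_lc_size: 10942680452 (102%) mode: 02000100644\n"
    "/data/col1/test/file2: mtime: 1503023531690895000 fileid: 19 "
    "size: 10736790004 type: 9 seg_bytes: 10772785048 seg_count: 1282587 "
    "redun_seg_count: 1282587 (100%) pre_lc_size: 0 "
    "post_lc_size: 0 (0%) mode: 02000100644\n"
)

if __name__ == "__main__":
    print(sum_post_lc_size(sample))  # 10942680452
```

The total only covers files that still exist in the namespace, which is why it stops tracking real usage once deletes occur.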
ILLUSTRATION:
This can be easily seen with a simple three-step example:
I. First, I back up a file (file1) that is 10G. Assume this is a brand-new box with nothing to de-duplicate against. The post-compression size is therefore also around 10G.
# sfs_dump /data/col1/test/
/data/col1/test/file1: mtime: 1503023531690895000 fileid: 18 size: 10736790004 type: 9 seg_bytes: 10772785048 seg_count: 1282587 redun_seg_count: 0 (0%) pre_lc_size: 10772785048 post_lc_size: 10942680452 (102%) mode: 02000100644
As seen above, the post-compression size (post_lc_size) is about 10G.
II. Next, I back up another 10G file (file2). Its content is identical or nearly identical to file1.
# sfs_dump /data/col1/test
/data/col1/test/file1: mtime: 1503023531690895000 fileid: 18 size: 10736790004 type: 9 seg_bytes: 10772785048 seg_count: 1282587 redun_seg_count: 0 (0%) pre_lc_size: 10772785048 post_lc_size: 10942680452 (102%) mode: 02000100644
/data/col1/test/file2: mtime: 1503023531690895000 fileid: 19 size: 10736790004 type: 9 seg_bytes: 10772785048 seg_count: 1282587 redun_seg_count: 1282587 (100%) pre_lc_size: 0 post_lc_size: 0 (0%) mode: 02000100644
As seen above, file2 took no additional space as it de-duplicated completely with file1.
III. Now, I delete file1 and run "sfs_dump" on the mtree again.
# sfs_dump /data/col1/test
/data/col1/test/file2: mtime: 1503023531690895000 fileid: 19 size: 10736790004 type: 9 seg_bytes: 10772785048 seg_count: 1282587 redun_seg_count: 1282587 (100%) pre_lc_size: 0 post_lc_size: 0 (0%) mode: 02000100644
Based on the above, the customer may expect that no space is consumed on the filesystem: the only file left in the mtree takes up 0 bytes post-compression, so the sum over all files reported by "sfs_dump" is 0 bytes. But this is a fallacy. The space utilization on the system is actually about 10G, and it is accurately reflected by "df".
This is because file2 references all of the segments originally written by file1, so deleting file1 freed nothing. "sfs_dump" no longer shows file1 because it has been deleted, and the post_lc_size of file2 still reflects only what file2 added when it was written, which was nothing.
# df
Active Tier:
Resource           Size GiB   Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   --------   --------   ---------   ----   --------------
/data: pre-comp           -       10.0           -      -                -
/data: post-comp   129479.5       10.7    129468.8     0%             10.2
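The three steps above can be replayed with a toy model. This is only an illustrative sketch, assuming a simplified store in which each unique segment is kept once and files merely reference segments. The class ToyDedupStore and its methods are hypothetical names and do not describe how DD OS is implemented internally.

```python
from collections import Counter


class ToyDedupStore:
    """Toy model: a segment is stored once; files only reference segments."""

    def __init__(self):
        self.files = {}        # name -> (segment ids, bytes newly written at ingest)
        self.refs = Counter()  # segment id -> number of live files referencing it
        self.seg_size = {}     # segment id -> stored size

    def write(self, name, segments):
        """segments: list of (segment_id, size). Only unseen segments cost space."""
        new_bytes = 0
        ids = []
        for seg_id, size in segments:
            if seg_id not in self.seg_size:
                self.seg_size[seg_id] = size
                new_bytes += size
            self.refs[seg_id] += 1
            ids.append(seg_id)
        self.files[name] = (ids, new_bytes)

    def delete(self, name):
        """Remove the file and drop its references; shared segments stay in use."""
        ids, _ = self.files.pop(name)
        for seg_id in ids:
            self.refs[seg_id] -= 1

    def sum_post_lc(self):
        """What adding up the per-file post-compression sizes would give."""
        return sum(new_bytes for _, new_bytes in self.files.values())

    def used(self):
        """Space held by segments still referenced by at least one live file."""
        return sum(s for seg, s in self.seg_size.items() if self.refs[seg] > 0)


if __name__ == "__main__":
    store = ToyDedupStore()
    segments = [(i, 1) for i in range(10)]   # ten segments of one unit each
    store.write("file1", segments)           # step I: everything is new
    store.write("file2", segments)           # step II: fully de-duplicated
    store.delete("file1")                    # step III
    print(store.sum_post_lc(), store.used())  # 0 10
```

In the model, the per-file sum drops to 0 after the delete while the space in use stays at 10 units, which is the same gap seen between the "sfs_dump" total and "df" in the example.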