July 25th, 2012 15:00
VNX Dedupe Frequency and Behavior
Hello,
We have a VNX7500 and are using the virtual gateway VGx series devices as Data Movers for NAS CIFS services. Our environment is quite large, with over 60 file systems defined and several file systems in the 14-16 TB range.
We are encountering an issue where not all of the file systems show a completed file system scan, and as a result no data deduplication has occurred on them. The Data Mover logs show that some file systems are scanning correctly, but the scans take a very long time. As an example, one scan of 33 million files completed in 30 hours:
Files scanned: 33,635,579
Files deduped: 5,048,854 (15.0% of total files)
Original data size: 13,208,856 MB (82% of current file system capacity)
Space saved: 2,014,386 MB (15% of original data size)
The question I have for the experts is: what would prevent file systems with larger FSIDs from even starting a scan? My uninformed guess is that with the Minimum Scan Interval in the Deduplication Settings set to 7 days, the dedupe process runs as a batch: it identifies the 66 file systems requiring dedupe, begins scanning them sequentially, potentially runs out of time, and then starts over after 7 days, thus never completely deduping all 66 file systems.
Many of the file systems that are not being scanned are small (1 TB) and have lots of free space. None of our file systems are more than 80% full.


SAMEERK1
July 27th, 2012 04:00
Hi,
Try initiating a scan manually with this command:
/nas/bin/fs_dedupe -modify <file system name> -state on
After issuing the above command, the new files will be scanned. To confirm that additional files have been scanned, run /nas/bin/fs_dedupe -info <file system name> before and after the scan.
Let me know if this helps.
Sameer Kulkarni
John12341
July 27th, 2012 14:00
Sameer,
Thanks for the suggestion. That helps for the file systems that haven't been scanned at all yet. However, I am still curious what the behavior is in steady-state, set-it-and-forget-it operation when this many file systems require scanning.
In a situation with over 60 file systems to scan and the scan frequency set to 7 days, if all of the file systems don't complete a full scan within the 7-day period, will a new scan start back at the beginning and prevent the remaining file system scans from running?
Based on the logs, the dedupe scan appears to be a serial process starting with the lower FSIDs, so newer FSIDs have to wait their turn. If a new scan starts after 7 days, will it start back at the lower FSIDs?
Peter_EMC
July 29th, 2012 23:00
"File systems on each Data Mover are scanned one at a time by a single threaded process to
select files for deduplication processing. Different Data Movers can be scanned
simultaneously." p. 22 (Deduplication manual)
There is special resource management (throttling) for the deduplication scan process to avoid impacting client activity.
The dedupe file system scans are queued (one after the other); the 7 days is the default minimum scan interval.
If the queue is not empty when a file system's minimum scan interval is reached, that file system is put at the end of the scan queue.
A scan cannot run out of time, because no time limit is defined for it.
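To make the queuing behavior concrete, here is a toy Python sketch (not EMC code; the per-scan durations are assumed values) of a single-threaded scanner working through a FIFO queue of file systems. It illustrates the point above: even if one pass through 66 file systems takes far longer than the 7-day minimum interval, no scan is aborted or restarted, so every file system eventually completes.

```python
from collections import deque

def scan_cycle(scan_hours):
    """Serially scan every queued file system once.

    scan_hours: dict mapping fsid -> assumed hours one full scan takes.
    Returns a dict mapping fsid -> wall-clock hour at which its scan finished.
    """
    queue = deque(sorted(scan_hours))   # lower FSIDs are queued first
    clock = 0.0
    finish_times = {}
    while queue:
        fsid = queue.popleft()
        clock += scan_hours[fsid]       # serial: the next FS waits its turn
        finish_times[fsid] = clock      # no time limit, so every scan finishes
    return finish_times

# 66 file systems at 30 hours each: one cycle takes 1,980 hours, well past
# the 7-day (168-hour) minimum interval, yet all 66 file systems complete.
# The interval only delays when each file system becomes eligible again.
times = scan_cycle({fsid: 30 for fsid in range(1, 67)})
assert len(times) == 66 and max(times.values()) == 1980
```

A file system whose interval elapses while the queue is still busy simply goes to the back of the queue, rather than preempting the scans ahead of it.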