Data Domain: Scheduling Cleaning on a DDR
Summary: This article provides an overview of the file system cleaning operation. This operation reclaims physical storage occupied by deleted objects in the Data Domain file system.
Instructions
Scheduling Cleaning on a Data Domain System
Purpose:
The filesys clean operation reclaims physical storage occupied by deleted objects in the Data Domain file system.
When application software expires backup data or archive images, they are no longer accessible or available for recovery, but they still occupy physical storage.
Only a filesys clean operation reclaims the physical storage used by files that are deleted and that are not present in a snapshot. The file system may never report 100% cleaned. The total space cleaned may always be a few percentage points less than 100.
Applies to:
- All Data Domain Systems
- All Software Releases
- Cleaning
Solution:
Data Domain recommends running a clean operation after the first full backup to a Data Domain System. The initial local compression on a full backup is generally a factor of 1.5 to 2.5. An immediate clean operation gives additional compression by another factor of 1.15 to 1.2 and reclaims a corresponding amount of disk space.
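As a rough worked example of the figures above, the following Python sketch combines the two ranges. It assumes the two factors compound multiplicatively and that the space reclaimed by the clean is 1 - 1/factor of the space used before it; both are illustrative simplifications, not documented formulas, and the function names are hypothetical.

```python
def combined_compression(initial, after_clean):
    """Combine two compression factors, assuming they multiply (an assumption)."""
    return initial * after_clean


def reclaimed_fraction(after_clean):
    """Fraction of pre-clean space freed if the clean adds this extra factor."""
    return 1.0 - 1.0 / after_clean


# Ranges quoted in this article: 1.5 to 2.5 initially, 1.15 to 1.2 from the clean.
low = combined_compression(1.5, 1.15)
high = combined_compression(2.5, 1.2)
print(f"Combined local compression: about {low:.1f}x to {high:.1f}x")
print(f"Space reclaimed by the clean: about "
      f"{reclaimed_fraction(1.15):.0%} to {reclaimed_fraction(1.2):.0%}")
```

Under these assumptions, the first clean brings local compression to roughly 1.7x to 3.0x overall and frees roughly 13% to 17% of the space the first full backup occupied.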
A default schedule runs the clean operation every Tuesday at 6 a.m. (tue 0600) with 50% throttle.
To increase file system availability, and if the Data Domain System is not short on disk space, consider changing the schedule to clean less often.
- If the system is filling up, do not compensate by changing the default values to more frequent or more aggressive cleaning cycles. Running cleaning every day fragments the data, which can severely impair read speed. The global compression algorithm depends on good locality during writes, so overly frequent clean cycles also bring de-duplication numbers down.
- Cleaning is a file system operation that impacts overall file system performance while it is running. Raising the cleaning throttle above the default of 50 increases the performance impact during the active cleaning cycle, because the cleaning process consumes more resources.
- Changing the local compression algorithm causes the following cleaning cycle to run longer, as all existing data must be read, uncompressed, and compressed again.
- Any operation that shuts down the Data Domain System file system or powers down the device (a system power-off, reboot, or file system disable command) stops the clean operation. The clean does not automatically continue when the system and file system start again.
- Replication between Data Domains can affect filesys clean operations. If a source Data Domain receives large amounts of new or changed data while replication is disabled or disconnected, resuming replication may significantly slow down filesys clean operations.
- If directory replication is running behind, for example due to insufficient network bandwidth between the replication pair (resulting in replication lag), cleaning may not be able to run fully. This condition requires either a replication break (and a resync once cleaning has run) or the replication lag to catch up (for example, by increasing network link capacity or writing less new data to the source directory).
A Data Domain that is full may need multiple clean operations to clean 100% of the file system, especially if more than one external shelf is attached.
Depending on the type of data stored, such as when using markers for specific backup software (filesys option set marker-type ...), the file system may never report 100% cleaned.
The total space cleaned may always be a few percentage points less than 100.
With collection replication, the clean operation does not run on the destination.
With directory replication, the clean operation must be run on both the source and destination Data Domain.
To display the currently scheduled day and time for the clean operation:
filesys clean show schedule
Filesystem cleaning is scheduled to run "Tue" at "0600".
To display the throttle setting for cleaning operations:
filesys clean show throttle
50 Percent Throttle
To change the throttle setting:
filesys clean set throttle <value>
Where the value is 0 (slowest) to 100 (fastest)
(Changes to the throttle setting take effect without restarting cleaning)
Example:
filesys clean set throttle 75
The command produces no output, so run the "show throttle" command again to confirm the change:
filesys clean show throttle
75 Percent Throttle
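When throttle changes are scripted, the 0 to 100 range can be checked before the command is sent. The following Python sketch only builds the command string shown above; the function name is hypothetical, and restricting the value to whole numbers is an assumption, since this article does not say whether fractional values are accepted.

```python
def throttle_command(value):
    """Return the CLI string for a throttle setting after a range check."""
    # Assumption: only whole numbers are accepted (bool is excluded explicitly).
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("throttle must be an integer")
    if not 0 <= value <= 100:
        raise ValueError("throttle must be between 0 (slowest) and 100 (fastest)")
    return f"filesys clean set throttle {value}"


print(throttle_command(75))
```

The sketch does not contact a Data Domain system; how the resulting string is delivered (for example, over an administrative session) is left to the caller.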
To change the cleaning schedule:
filesys clean set schedule <schedule-to-start-cleaning>
<schedule-to-start-cleaning> is:
- never - Turns off the clean process, and does not take a qualifier.
- daily <time> - Runs the operation every day at the given time (not recommended).
- <day or days> <time> - Runs on one or more given days at the given time. A day-name is three letters (such as mon for Monday). Use a dash (-) between days for a range of days. For example: tue-fri
- biweekly <day> <time> - Starts on a given day or days every second week at the given time.
- monthly <day or days> <time> - Starts on a given day or days (from 1 to 31) at the given time.
For every option that takes a <time>:
- Time is 24-hour military time. 2400 is not a valid time.
- mon 0000 is midnight between Sunday night and Monday morning.
Examples:
To run cleaning every Tuesday at 4 p.m.:
filesys clean set schedule tue 1600
Filesystem cleaning is scheduled to run "Tue" at "1600".
To run the operation on the 1st and 15th of the month at 3 p.m.:
filesys clean set schedule monthly 1,15 1500
Filesystem cleaning is scheduled to run "1, 15" at "1500".
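The schedule rules above (four-digit 24-hour times, 2400 not valid, three-letter day names, dash ranges, month days from 1 to 31) can be checked before a command is issued. The following Python sketch is one way to do that. The function names are hypothetical, the day abbreviations other than mon, tue, and fri are assumed rather than taken from this article, and only the weekly and monthly forms shown in the examples are covered.

```python
import re

# Assumption: standard three-letter abbreviations; the article shows mon, tue, fri.
DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


def valid_time(hhmm):
    """True for 24-hour times 0000 through 2359; 2400 is rejected."""
    if not re.fullmatch(r"[0-9]{4}", hhmm):
        return False
    return int(hhmm[:2]) <= 23 and int(hhmm[2:]) <= 59


def weekly_command(days, hhmm):
    """Build the command for a single day ("tue") or a dash range ("tue-fri")."""
    parts = days.split("-")
    if len(parts) not in (1, 2) or any(p not in DAYS for p in parts):
        raise ValueError(f"unrecognized day specification: {days!r}")
    if not valid_time(hhmm):
        raise ValueError(f"invalid time: {hhmm!r}")
    return f"filesys clean set schedule {days} {hhmm}"


def monthly_command(day_numbers, hhmm):
    """Build the monthly form from a list of day-of-month numbers (1 to 31)."""
    if not day_numbers or any(not 1 <= d <= 31 for d in day_numbers):
        raise ValueError("month days must be between 1 and 31")
    if not valid_time(hhmm):
        raise ValueError(f"invalid time: {hhmm!r}")
    joined = ",".join(str(d) for d in day_numbers)
    return f"filesys clean set schedule monthly {joined} {hhmm}"


print(weekly_command("tue", "1600"))
print(monthly_command([1, 15], "1500"))
```

The two calls reproduce the example commands above. The comma-separated monthly form follows the "1,15" example; the sketch does not attempt to list several non-adjacent weekdays, because this article does not show that syntax.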
To set the clean schedule to the default of Tuesday at 6 a.m. (tue 0600), and the default throttle of 50%, use the reset command:
filesys clean reset all
The command produces no output, so run the "show throttle" and "show schedule" commands again to confirm the change:
filesys clean show throttle
50 Percent Throttle
filesys clean show schedule
Filesystem cleaning is scheduled to run "Tue" at "0600".
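Because the set and reset commands produce no output, a script that changes these settings has to read them back from the show commands. The following Python sketch parses the two output lines exactly as they appear in this article. The function names are hypothetical, and the patterns are an assumption based only on these samples; other releases may word the output differently, and the output for a schedule of never is not shown here, so unmatched text returns None.

```python
import re

# Patterns derived from the sample output lines in this article (an assumption).
SCHEDULE_RE = re.compile(
    r'Filesystem cleaning is scheduled to run "([^"]+)" at "([0-9]{4})"\.'
)
THROTTLE_RE = re.compile(r"([0-9]+) Percent Throttle")


def parse_schedule(text):
    """Return (days, time) from 'filesys clean show schedule' output, or None."""
    match = SCHEDULE_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def parse_throttle(text):
    """Return the throttle percentage as an int, or None if not recognized."""
    match = THROTTLE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1))


print(parse_schedule('Filesystem cleaning is scheduled to run "Tue" at "0600".'))
print(parse_throttle("50 Percent Throttle"))
```

After a reset, a script could compare the parsed values against the defaults stated above (Tue, 0600, and 50) to confirm that the reset took effect.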