PowerProtect DP Series: Protection Storage: Data Domain: Automatic Cleaning

Summary: Prediction Enabled Automatic Cleaning runs cleaning only when its prediction engine determines that used capacity will exceed a configured percentage within a configured number of days.

Instructions

What is Automatic Cleaning?
Prediction Enabled Automatic Cleaning complements the existing cleaning engine by predicting system capacity usage. It allows cleaning to start automatically when the system predicts that capacity usage will reach a configured level within a configured period, instead of relying entirely on time-based cleaning schedules that run regardless of capacity usage or system activity.

In which DD OS release was Automatic Cleaning introduced?
Prediction Enabled Automatic Cleaning was introduced in DD OS 7.6.x, which is included in Integrated Data Protection Appliance 2.7.x.

Automatic Cleaning is available only for Active Tier.

NOTE: This feature is disabled by default. It can be configured as required.

What are the Challenges with the Traditional or Regular Cleaning Process?
  • DD Cleaning, or Garbage Collection (GC), is a long-running process that also delays mutually exclusive processes such as Cloud Cleaning.
  • It is resource-intensive and can affect ingest or replication (REPL) performance.
  • GC leads to data fragmentation, which degrades data locality and read performance over time and therefore reduces restore performance.
  • GC runs on schedule even when it is not needed, for example when the system is not close to full, backups have long retention, or few backups have expired within the past week.
  • GC can be I/O-intensive and can compete with ingest.
  • Repeated I/O can shorten disk life.

What are the Benefits of Automatic Cleaning?
  • With Automatic Cleaning, GC runs only when it is required, which saves system resources.
  • Fewer cleaning cycles mean less data fragmentation, which improves read and restore performance.
  • If the prediction indicates that system used capacity will not exceed x percent within the next n days, the scheduled Active Tier cleaning is skipped. The skipped run is internally marked as successful, so a scheduled Cloud Cleaning can still start.

What is the Concept behind Automatic Cleaning?
  • Automatic Cleaning uses a Prediction Engine.
  • The Prediction Engine is a thread inside the Data Domain File System that runs every hour.
  • It collects physical-bytes-written values and stores them as capacity records.
  • A capacity prediction can be made once 10 capacity records have been collected.
  • Capacity usage history records are kept in a circular buffer.
  • By default, the buffer holds 756 records (roughly one month of hourly capacity usage).
  • The Prediction Engine uses a linear regression model:
Future Capacity = Current Capacity + (Ingest Rate * Time)
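The prediction loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the DD OS implementation: the class and method names are hypothetical, one record is assumed per hour, the ingest rate is assumed to be the least-squares slope of the stored records, and the newest record is taken as the current capacity. The 10-record minimum and the 756-record default come from the list above.

```python
from collections import deque


class CapacityPredictor:
    """Hypothetical sketch of an hourly capacity predictor (not DD OS code)."""

    MIN_RECORDS = 10   # per the article: prediction needs 10 records
    MAX_RECORDS = 756  # per the article: default history size

    def __init__(self, max_records=MAX_RECORDS):
        # A deque with maxlen acts as a circular buffer: appending to a
        # full deque silently drops the oldest record.
        self.records = deque(maxlen=max_records)

    def add_record(self, used_capacity):
        """Store one hourly used-capacity sample."""
        self.records.append(used_capacity)

    def ingest_rate_per_hour(self):
        """Assumption: ingest rate = least-squares slope over the hour index."""
        n = len(self.records)
        mean_x = (n - 1) / 2
        mean_y = sum(self.records) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(self.records))
        den = sum((x - mean_x) ** 2 for x in range(n))
        return num / den

    def predict(self, hours_ahead):
        """Future Capacity = Current Capacity + (Ingest Rate * Time)."""
        if len(self.records) < self.MIN_RECORDS:
            return None  # not enough history to predict yet
        current = self.records[-1]
        return current + self.ingest_rate_per_hour() * hours_ahead


if __name__ == "__main__":
    predictor = CapacityPredictor()
    for hour in range(24):
        predictor.add_record(50_000 + 100 * hour)  # hypothetical GiB values
    print(predictor.predict(24 * 10))  # forecast 10 days ahead; prints 76300.0
```

Because `deque(maxlen=...)` discards the oldest entry on every append once full, the history never grows past the configured size, which mirrors the circular-buffer behavior described above.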

What are the different Types of Automatic Cleaning?
  • Scheduled Automatic Cleaning OR Skip Schedule
  • Fully Automatic Cleaning OR Auto Schedule
NOTE: Only one type of Automatic Cleaning can be set at a time, either Skip Schedule or Auto Schedule.
 

What are the differences between the two types of Automatic Cleaning?

Scheduled Automatic Cleaning (Skip Schedule):
  • Supported on systems with Cloud Tier.
  • A regular (traditional) cleaning schedule must be present.
  • A regularly scheduled cleaning is skipped if system used capacity is not expected to exceed the configured percentage within the configured number of days.
  • When Skip Schedule is disabled or reset, the regular cleaning schedule remains as is.

Fully Automatic Cleaning (Auto Schedule):
  • Not supported on systems with Cloud Tier.
  • The regular cleaning schedule is automatically disabled once Auto Schedule is set.
  • Cleaning runs only if system used capacity is expected to exceed the configured percentage within the configured number of days.
  • When Auto Schedule is disabled or reset, the regular cleaning schedule must be set manually.

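The configuration rules listed above can be modeled as a small Python class to show how each mode interacts with the regular cleaning schedule. This is a hypothetical sketch that does not call DD OS: the class, its attributes, and the schedule string "daily 0600" are illustrative only, and rejecting an attempt to set the second type with an error is an assumption (the article only states that one type can be set at a time).

```python
class CleaningConfig:
    """Hypothetical model of the Skip/Auto Schedule rules (not DD OS code)."""

    def __init__(self, regular_schedule="never", has_cloud_tier=False):
        self.regular_schedule = regular_schedule  # "never" or a schedule spec
        self.has_cloud_tier = has_cloud_tier
        self.mode = None  # None, "skip", or "auto"

    def set_skip_schedule(self):
        # Assumption: setting a second type is rejected rather than overwritten.
        if self.mode == "auto":
            raise ValueError("only one Automatic Cleaning type can be set at a time")
        if self.regular_schedule == "never":
            raise ValueError("Skip Schedule requires a regular cleaning schedule")
        self.mode = "skip"

    def set_auto_schedule(self):
        if self.mode == "skip":
            raise ValueError("only one Automatic Cleaning type can be set at a time")
        if self.has_cloud_tier:
            raise ValueError("Auto Schedule is not supported with Cloud Tier")
        self.mode = "auto"
        self.regular_schedule = "never"  # regular schedule disabled automatically

    def reset(self):
        # Resetting leaves regular_schedule untouched: after Skip Schedule it is
        # still the original schedule; after Auto Schedule it stays "never"
        # until an administrator sets it again.
        self.mode = None


if __name__ == "__main__":
    config = CleaningConfig(regular_schedule="daily 0600")  # hypothetical value
    config.set_auto_schedule()
    config.reset()
    print(config.regular_schedule)  # prints never
```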

What are the commands used for setting up Automatic Cleaning?

  • Scheduled Automatic Cleaning or Skip Schedule
Configuration
Syntax: 
filesys clean skip schedule { [days <day(s)> estimate-percent-used <percent>] | show | reset }

Example:
At the time of the regular cleaning schedule, if the prediction indicates that system used capacity will not grow beyond 90% in the next 10 days, the cleaning is skipped. Do the following to configure this:
  1. Verify that a regular cleaning schedule exists and is not set to "never."
# filesys clean show schedule


If the regular cleaning schedule does not exist or is set to "never," set it with the following CLI syntax or through the DD UI:

CLI Syntax: 

filesys clean set schedule { daily <time> | <day(s)> <time> | biweekly <day> <time> | monthly <day(s)> <time> }


DD UI path:
Log in to DD UI > Data Management > Filesystem > Click the Gear Icon at the right for Settings > Go to Cleaning tab > Select Frequency, Time & day.

  2. Configure the Skip Schedule as follows:
# filesys clean skip schedule days 10 estimate-percent-used 90


Command to display the Skip Schedule:

# filesys clean skip schedule show


Command to disable the Skip Schedule:

# filesys clean skip schedule reset
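The Skip Schedule rule from the example above (skip when used capacity is not predicted to grow beyond 90% in the next 10 days) can be expressed as a small Python check. This is a hedged sketch, not DD OS logic: the function and parameter names are hypothetical, the prediction reuses the linear model from the concept section with a daily ingest rate in percentage points, and treating a prediction exactly equal to the threshold as "not beyond" is an assumption.

```python
def predicted_used_pct(current_used_pct, ingest_pct_per_day, days):
    """Linear model: Future Capacity = Current Capacity + (Ingest Rate * Time)."""
    return current_used_pct + ingest_pct_per_day * days


def skip_scheduled_clean(current_used_pct, ingest_pct_per_day,
                         days, estimate_percent_used):
    """Hypothetical Skip Schedule check run at the regular cleaning time.

    Returns True when the scheduled Active Tier clean would be skipped,
    i.e. predicted usage does not grow beyond the threshold within `days`.
    Per the article, a skipped run is recorded as successful so that a
    scheduled Cloud Cleaning can still start.
    """
    predicted = predicted_used_pct(current_used_pct, ingest_pct_per_day, days)
    return predicted <= estimate_percent_used  # assumption: equal counts as "not beyond"


if __name__ == "__main__":
    # Mirrors "filesys clean skip schedule days 10 estimate-percent-used 90"
    print(skip_scheduled_clean(70.0, 1.0, days=10, estimate_percent_used=90))  # True
    print(skip_scheduled_clean(85.0, 1.0, days=10, estimate_percent_used=90))  # False
```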

 

  • Fully Automatic Cleaning or Auto Schedule
Configuration
Syntax:
filesys clean auto schedule { [days <day(s)> estimate-percent-used <percent>] | [interval-days <days>] | show | reset }


Example:
If cleaning should run when system used capacity is expected to grow beyond 85% in the next 10 days, use the following command:

# filesys clean auto schedule days 10 estimate-percent-used 85

Result: Automatic cleaning runs if system used space is estimated to grow beyond 85% in the next 10 days. The minimum interval between automatically scheduled cleanings is seven days.

NOTE: By default, the minimum interval between two consecutive automatic cleanings is seven days.

To change this interval, use the "interval-days" option:
# filesys clean auto schedule days 10 estimate-percent-used 85 interval-days 5

Result: Automatic cleaning runs if system used space is estimated to grow beyond 85% in the next 10 days. The minimum interval between automatically scheduled cleanings is five days.
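The Auto Schedule behavior described above combines two conditions: enough days since the last automatic cleaning, and a predicted capacity above the threshold. The Python sketch below is hypothetical: the function name and parameters are illustrative, the ingest rate is assumed to be in percentage points per day, and measuring the interval in whole calendar days (with exactly `interval_days` counting as long enough) is an assumption.

```python
from datetime import date


def start_auto_clean(today, last_auto_clean, current_used_pct,
                     ingest_pct_per_day, days, estimate_percent_used,
                     interval_days=7):
    """Hypothetical Auto Schedule check (not DD OS code).

    Cleaning starts only when both conditions hold:
      1. at least `interval_days` have passed since the last automatic
         cleaning (default 7, as described above), and
      2. used capacity is predicted to exceed `estimate_percent_used`
         within `days` days.
    """
    if last_auto_clean is not None and (today - last_auto_clean).days < interval_days:
        return False  # too soon after the previous automatic cleaning
    # Linear model: Future Capacity = Current Capacity + (Ingest Rate * Time)
    predicted = current_used_pct + ingest_pct_per_day * days
    return predicted > estimate_percent_used


if __name__ == "__main__":
    # Mirrors "... days 10 estimate-percent-used 85 interval-days 5"
    today = date(2024, 3, 20)
    # 3 days since last clean: too soon
    print(start_auto_clean(today, date(2024, 3, 17), 80.0, 1.0, 10, 85, interval_days=5))  # False
    # 7 days since last clean and 90% > 85%: start
    print(start_auto_clean(today, date(2024, 3, 13), 80.0, 1.0, 10, 85, interval_days=5))  # True
```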

NOTE: Once Auto Schedule is set, the regular cleaning schedule is automatically disabled, as shown below:
# filesys clean show schedule
Filesystem cleaning is scheduled to run "never".

Command to display the current configuration of Auto Schedule Cleaning:
# filesys clean auto schedule show
 
Command to disable Auto Schedule:
# filesys clean auto schedule reset
 
NOTE: After disabling Auto Schedule Cleaning, set the regular cleaning schedule manually with the following CLI syntax or through the DD UI:


CLI Syntax: 

filesys clean set schedule { daily <time> | <day(s)> <time> | biweekly <day> <time> | monthly <day(s)> <time> }

 

DD UI path:
Log in to DD UI > Data Management > Filesystem > Click the Gear Icon at the right for Settings > Go to Cleaning tab > Select Frequency, Time & day.


For more details, see the DD OS Administration Guide for the relevant DD OS release on SolVe Online.

Article Properties


Affected Product

Data Domain, Integrated Data Protection Appliance Family

Product

PowerProtect Data Protection Appliance

Last Published Date

21 Jun 2023

Version

8

Article Type

How To