PowerProtect DP Series: Protection Storage: Data Domain: Automatic Cleaning
Summary: Prediction-enabled Automatic Cleaning runs only when its prediction engine estimates that used capacity will exceed a configured percentage within a configured number of days.
Instructions
What is Automatic Cleaning?
Prediction-enabled Automatic Cleaning complements the existing cleaning engine by predicting system capacity usage. Cleaning starts automatically when the system predicts that used capacity will reach a configured level within a configured period, instead of relying entirely on time-based cleaning schedules that run regardless of capacity usage or system activity.
In which operating system release has Automatic Cleaning been introduced?
Prediction-enabled Automatic Cleaning is introduced in DD OS 7.6.x, which is incorporated in Integrated Data Protection Appliance 2.7.x.
Automatic Cleaning is available only for Active Tier.
What are the Challenges with Traditional or Regular Cleaning Process?
- DD Cleaning or Garbage Collection (GC) is a long-running process, which also delays mutually exclusive processes such as Cloud Cleaning.
- It is resource-intensive. Ingest or REPL performance can be affected.
- GC leads to data fragmentation, which degrades data locality and read performance over time and therefore impacts restore performance.
- GC runs per schedule even when there is no need (for example, when the system is not close to full, when backups have longer retention, or when few backups have expired within the week).
- GC can be I/O intensive and can compete with ingest.
- Repeated I/O can affect Disk Life.
What are the Benefits of Automatic Cleaning?
- With Automatic Cleaning, GC runs only when it is required, so it is resource efficient.
- Reducing the number of cleaning cycles in turn reduces data fragmentation and improves read or restore performance.
- If the prediction indicates that system used-capacity will not exceed x percent in the next n days, the scheduled Active Tier Cleaning is skipped. Internally, the skipped cleaning is marked as successful so that Cloud Cleaning, if scheduled, can still run.
What is the Concept behind Automatic Cleaning?
- Automatic Cleaning uses a Prediction Engine.
- The Prediction Engine is a thread inside the Data Domain file system and runs every hour.
- It collects physical-bytes-written and stores the values as capacity records.
- A capacity prediction can be made after 10 such capacity records have been collected.
- It keeps the capacity usage history records in a circular buffer.
- By default, it keeps 756 records (one month's worth of hourly capacity usage).
- The Prediction Engine uses a linear regression model.
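The steps above can be sketched as follows. This is a minimal illustration, not DD OS code: the class name `CapacityPredictor`, its methods, and the `total_bytes` parameter are hypothetical. The hourly records, the 10-record minimum, the 756-record circular buffer, and the use of linear regression come from the description above; fitting an ordinary least-squares line against the hour index is an assumption about how such a model could be applied.

```python
from collections import deque

MAX_RECORDS = 756   # default history size stated above
MIN_RECORDS = 10    # records needed before a prediction is made


class CapacityPredictor:
    """Illustrative stand-in for the Prediction Engine (hypothetical, not DD OS code)."""

    def __init__(self, total_bytes, max_records=MAX_RECORDS):
        self.total_bytes = total_bytes
        # A deque with maxlen behaves as a circular buffer:
        # appending to a full deque drops the oldest record.
        self.records = deque(maxlen=max_records)

    def add_record(self, used_bytes):
        """Store one hourly capacity record."""
        self.records.append(used_bytes)

    def predict_percent_used(self, days_ahead):
        """Return the predicted used percentage, or None if history is too short."""
        n = len(self.records)
        if n < MIN_RECORDS:
            return None
        xs = range(n)  # hour index of each record
        mean_x = sum(xs) / n
        mean_y = sum(self.records) / n
        # Ordinary least-squares slope and intercept.
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.records))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # Extrapolate from the newest record to days_ahead * 24 hours later.
        future_hour = (n - 1) + days_ahead * 24
        predicted_bytes = intercept + slope * future_hour
        return 100.0 * predicted_bytes / self.total_bytes
```

For example, a system of 1000 capacity units that grows by one unit per hour from 500 would, after 24 hourly records, be extrapolated to roughly 76.3% used ten days later. With fewer than 10 records the sketch returns `None`, mirroring the point that no prediction can be made yet.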
What are the different Types of Automatic Cleaning?
- Scheduled Automatic Cleaning or Skip Schedule
- Fully Automatic Cleaning or Auto Schedule
What are the differences between both the types of Automatic Cleaning?
| Scheduled Automatic Cleaning or Skip Schedule | Fully Automatic Cleaning or Auto Schedule |
|---|---|
| Supported on systems with Cloud Tier. | Not supported on systems with Cloud Tier. |
| A regular or traditional cleaning schedule must be present. | The regular cleaning schedule is automatically disabled once Auto Schedule is set. |
| Regular scheduled cleaning is skipped if system used-capacity is not expected to grow beyond the configured percentage within the configured days. | Cleaning runs only if system used-capacity is expected to exceed the configured percentage threshold within the configured days. |
| When Skip Schedule is disabled or reset, the normal cleaning schedule remains as is. | When Auto Schedule is disabled or reset, the normal cleaning schedule must be set manually. |
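The difference between the two types can be sketched as a small decision function. This is an illustrative sketch only: the names `Mode` and `cleaning_action` and the returned strings are hypothetical, and treating "beyond" as a strict greater-than comparison is an assumption.

```python
from enum import Enum


class Mode(Enum):
    SKIP_SCHEDULE = "skip"   # Scheduled Automatic Cleaning
    AUTO_SCHEDULE = "auto"   # Fully Automatic Cleaning


def cleaning_action(mode, predicted_pct, threshold_pct, at_scheduled_time):
    """Return what happens for a given prediction (hypothetical helper).

    predicted_pct     -- predicted used-capacity percentage within the configured days
    threshold_pct     -- configured estimate-percent-used value
    at_scheduled_time -- True when a regular cleaning schedule is firing now
    """
    exceeds = predicted_pct > threshold_pct  # assumption: strict comparison
    if mode is Mode.SKIP_SCHEDULE:
        # Skip Schedule only acts when the regular schedule fires.
        if not at_scheduled_time:
            return "idle"
        return "run" if exceeds else "skip (marked successful)"
    # Auto Schedule: the regular schedule is disabled, so only the
    # prediction decides whether cleaning runs.
    return "run" if exceeds else "idle"
```

With a 90% threshold, a prediction of 80% at a scheduled cleaning time yields a skip under Skip Schedule, while under Auto Schedule the same prediction simply means no cleaning is started.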
What are the commands used for setting up Automatic Cleaning?
- Scheduled Automatic Cleaning or Skip Schedule
Syntax:
filesys clean skip schedule { [days <day(s)> estimate-percent-used <percent>] | show | reset }
Example:
At the time of the regular cleaning schedule, if the prediction indicates that system used-capacity will not grow beyond 90% in the next 10 days, the cleaning is skipped. To configure this:
- Verify that a regular cleaning schedule exists and that it is not set to "never."
# filesys clean show schedule
If the regular cleaning schedule does not exist or is set to "never," set it using the below CLI syntax or the DD UI path:
CLI Syntax:
filesys clean set schedule { daily <time> | <day(s)> <time> | biweekly <day> <time> | monthly <day(s)> <time> }
DD UI PATH
Log in to DD UI > Data Management > Filesystem > Click the Gear Icon at the right for Settings > Go to Cleaning tab > Select Frequency, Time & day.
- Set up Skip Schedule Configuration as below:
# filesys clean skip schedule days 10 estimate-percent-used 90
Command to display the Skip Schedule:
# filesys clean skip schedule show
Command to disable the Skip Schedule:
# filesys clean skip schedule reset
- Fully Automatic Cleaning or Auto Schedule
Syntax:
filesys clean auto schedule {[days <day(s)> estimate-percent-used <percent>]|[interval-days <days>]|show | reset }
Example:
If the requirement is to have cleaning run when system used-capacity is expected to grow beyond 85% in the next 10 days, use the below to set it:
# filesys clean auto schedule days 10 estimate-percent-used 85
Result: Automatically scheduled cleaning runs if system used-space is estimated to grow beyond 85% in the next 10 days. The minimum interval between automatically scheduled cleanings is set to seven days.
This interval can also be changed using the "interval-days" option as below:
# filesys clean auto schedule days 10 estimate-percent-used 85 interval-days 5
Result: Automatically scheduled cleaning runs if system used-space is estimated to grow beyond 85% in the next 10 days. The minimum interval between automatically scheduled cleanings is set to five days.
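The effect of the minimum interval can be illustrated with a short sketch. The function `auto_cleaning_due` and its parameters are hypothetical names used only for illustration, and the exact way DD OS measures the interval (here, elapsed time since the previous cleaning) is an assumption.

```python
from datetime import datetime, timedelta

DEFAULT_INTERVAL_DAYS = 7  # minimum gap stated above when interval-days is not given


def auto_cleaning_due(now, last_cleaning, predicted_pct, threshold_pct,
                      interval_days=DEFAULT_INTERVAL_DAYS):
    """Return True if an automatically scheduled cleaning may start (hypothetical)."""
    # No cleaning unless the prediction exceeds the configured percentage.
    if predicted_pct <= threshold_pct:
        return False
    # Assumption: with no previous cleaning on record, nothing blocks a run.
    if last_cleaning is None:
        return True
    # Enforce the minimum number of days between cleanings.
    return now - last_cleaning >= timedelta(days=interval_days)
```

Under these assumptions, with a threshold of 85% and a prediction of 88%, a cleaning that last ran five days ago would not be repeated with the seven-day interval, but would be allowed with `interval_days=5`.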
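Once Auto Schedule is set, the regular cleaning schedule is automatically disabled, which can be verified as below:
# filesys clean show schedule
Filesystem cleaning is scheduled to run "never".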
Command to display the current configuration of Auto Schedule Cleaning:
# filesys clean auto schedule show
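Command to disable the Auto Schedule:
# filesys clean auto schedule reset
After Auto Schedule is disabled or reset, the regular cleaning schedule must be set manually, using the below CLI syntax or the DD UI path: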
CLI Syntax:
filesys clean set schedule { daily <time> | <day(s)> <time> | biweekly <day> <time> | monthly <day(s)> <time> }
DD UI path:
Log in to DD UI > Data Management > Filesystem > Click on the Gear Icon at the right for Settings > Go to Cleaning tab > Select Frequency, Time & day.
For more details, see the DD OS Administration Guide for respective operating systems on SolVe Online.