Data Domain Operating System does not support proactive rebalancing of data across storage after expanding capacity of the Data Domain File System
Summary: This article explains that the Data Domain Operating System (DDOS) has no built-in support for rebalancing data across storage after the Data Domain File System (DDFS) is expanded on a Data Domain Restorer (DDR) ...
Instructions
As with many storage arrays, the capacity of most Data Domain Restorer (DDR) models can be increased by adding external enclosure shelves (ES30, DS60) to the system and then expanding the Data Domain File System (DDFS) onto them. When this is performed:
- New enclosure shelves are physically attached (cabled and powered on)
- The Data Domain Operating System (DDOS) rescans storage to detect the new enclosure shelves
- The new enclosure shelves are added to a tier of storage within the DDR (the active tier or a specific archive unit)
- The tier is then expanded online, with no DDFS outage required
- Any new data written to that tier is written across both existing and new shelves
- Data already on existing shelves, however, is not rebalanced onto the new enclosure shelves
Existing data is not rebalanced because of how DDFS places data on disk:
- Within DDOS, the unit of data storage is a 4.5 MB 'container'
- As they are created, 4.5 MB containers are written across all enclosure shelves in the corresponding tier or archive unit in a round-robin manner
- When additional enclosure shelves are added to a tier or archive unit, DDFS starts writing new 4.5 MB containers to them in addition to the existing enclosures (the new enclosures join the round-robin rotation for container writes)
- DDOS, however, makes no specific attempt (and offers no specific functionality) to migrate existing containers in the tier from existing shelves to new ones
As a simple example:
- A DDR initially has a single enclosure in its active tier, which is 90% full
- An additional enclosure is added to the active tier and DDFS is expanded onto it
- Newly created 4.5 MB containers are now written round robin across the existing and new enclosures
- The existing enclosure remains short of free space, whereas the newly added enclosure is almost empty
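The effect of round-robin placement after an expansion can be sketched in a few lines of Python. This is an illustrative model only, not DDOS code: the function name `write_round_robin`, the per-shelf capacity of 10,000 containers, and the count of 1,000 new containers are all assumptions chosen to match the 90% figure in the example above.

```python
from itertools import cycle

def write_round_robin(shelf_counts, new_containers):
    """Distribute new containers across shelves in round-robin order."""
    counts = list(shelf_counts)
    shelves = cycle(range(len(counts)))
    for _ in range(new_containers):
        counts[next(shelves)] += 1
    return counts

# Hypothetical capacity: each shelf holds 10,000 containers.
CAPACITY = 10_000
before = [9_000, 0]  # existing shelf 90% full, newly added shelf empty
after = write_round_robin(before, 1_000)
utilization = [count / CAPACITY for count in after]
print(after)        # [9500, 500]
print(utilization)  # [0.95, 0.05]
```

New writes are shared evenly, so the existing shelf keeps filling while the new shelf stays almost empty. Nothing in this write path moves the 9,000 containers that were already in place.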
Rebalancing of data is performed by two operations:
- Garbage collection cleaning
- Locality repair
Garbage collection cleaning
Garbage collection cleaning (GC) is a scheduled activity that runs regularly on a DDR (by default once a week against the active tier and, assuming space reclamation is enabled, when required against archive units). When it runs it:
- Identifies which physical data within the tier or archive unit is 'live' (referenced by one or more files in the file system or by objects such as snapshots) and which is 'dead' (unreferenced by any object and therefore superfluous)
- Determines which 4.5 MB containers hold the majority of the 'dead' data within the tier or archive unit
- Reads these containers and extracts any 'live' data they contain; this data is 'copied forward' to newly created 4.5 MB containers, which are written across all shelves in the tier or archive unit
- Deletes the old 4.5 MB containers, removing the dead data they contain and freeing the underlying disk space for reuse
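The four steps above can be sketched as a single simplified pass. This is a minimal model under stated assumptions, not the actual DDFS algorithm: the function `clean`, the representation of a container as a `(live_kb, dead_kb)` pair, and the 50% `dead_threshold` used to select containers are hypothetical, since the article does not describe how DDFS chooses which containers to clean.

```python
import math

CONTAINER_KB = 4608  # 4.5 MB expressed in KB

def clean(containers, dead_threshold=0.5):
    """One simplified GC pass over (live_kb, dead_kb) containers.

    Containers whose dead fraction exceeds the (assumed) threshold are
    deleted and their live data is copied forward into new containers.
    Returns the untouched containers and the number of new containers.
    """
    kept, live_to_copy = [], 0
    for live_kb, dead_kb in containers:
        if dead_kb / (live_kb + dead_kb) > dead_threshold:
            live_to_copy += live_kb          # copy live data forward
        else:
            kept.append((live_kb, dead_kb))  # not worth cleaning yet
    new_containers = math.ceil(live_to_copy / CONTAINER_KB)
    return kept, new_containers

containers = [(4608, 0), (1152, 3456), (1152, 3456),
              (2304, 2304), (1152, 3456), (1152, 3456)]
kept, new_containers = clean(containers)
print(len(kept), new_containers)  # 2 1
```

In this sketch six containers shrink to three: two are left alone, and the live data from the four mostly dead containers fits into one new container, which DDFS would write round robin across all shelves in the tier.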
As a simple example:
- The active tier of a DDR contains two shelves: the first holds 10,000 4.5 MB containers and the second holds 100 (for every container on the second shelf there are 100 on the first)
- GC runs and copies forward data from 5,000 containers on the first shelf
- The live data within these 5,000 containers fills 1,000 new 4.5 MB containers
- These 1,000 new containers are written across both shelves (500 to each)
- Once GC completes, the first shelf therefore holds 5,500 containers and the second holds 600 (for every container on the second shelf there are approximately nine on the first)
- In a single run of GC, the imbalance between the first and second shelves has been reduced by roughly a factor of 10. Subsequent GC runs are expected to reduce it further, so data is rebalanced across shelves naturally over time
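The arithmetic in this example can be checked with a short Python sketch. The helper `gc_copy_forward` is a hypothetical name introduced for illustration; it assumes that all cleaned containers come from one shelf and that new containers are split evenly across shelves.

```python
def gc_copy_forward(shelves, source_shelf, cleaned, new_containers):
    """Apply one simplified GC pass and return the new per-shelf counts.

    `cleaned` containers are deleted from `source_shelf`; the
    `new_containers` holding their live data are spread evenly
    across all shelves (round robin).
    """
    counts = list(shelves)
    counts[source_shelf] -= cleaned
    share, extra = divmod(new_containers, len(counts))
    for i in range(len(counts)):
        counts[i] += share + (1 if i < extra else 0)
    return counts

before = [10_000, 100]
after = gc_copy_forward(before, source_shelf=0, cleaned=5_000,
                        new_containers=1_000)
print(after)                          # [5500, 600]
print(before[0] / before[1])          # 100.0
print(round(after[0] / after[1], 1))  # 9.2
```

The ratio between the shelves falls from 100:1 to about 9:1 after one pass. Calling the function again on its own output models how later GC runs narrow the gap further.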
Locality repair
When a file is written to a DDR, the following high-level operations take place:
- The file is split into logical chunks (called segments) of 4-12 KB in size
- Each segment is checked to see whether it already exists on disk within the tier the file is being written to
- If the segment already exists, it is duplicate data, and the segment in the newly written file is replaced with a pointer to the existing data on disk
- If the segment does not exist, it is unique data and is therefore packed into a new 4.5 MB container and written to disk
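The write path above can be sketched as follows. This is a simplified illustration, not the DDFS implementation: it assumes fixed 8 KB segments (real segments vary between 4 and 12 KB), uses SHA-256 as a stand-in fingerprint, and the names `write_file`, `index`, and `containers` are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 8 * 1024                  # assumed fixed size; real segments vary (4-12 KB)
CONTAINER_SIZE = int(4.5 * 1024 * 1024)  # 4.5 MB container

def write_file(data, index, containers):
    """Deduplicate `data` against `index`; return the file's segment pointers.

    index:      dict mapping fingerprint -> container number
    containers: list of lists of segments (bytes), appended to in place
    """
    pointers = []
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in index:
            # Unique data: pack into the open container, or start a new one.
            if not containers or sum(map(len, containers[-1])) + len(segment) > CONTAINER_SIZE:
                containers.append([])
            containers[-1].append(segment)
            index[fingerprint] = len(containers) - 1
        # Duplicate or unique, the file itself only records a pointer.
        pointers.append(fingerprint)
    return pointers

index, containers = {}, []
backup1 = b"A" * SEGMENT_SIZE + b"B" * SEGMENT_SIZE
backup2 = b"A" * SEGMENT_SIZE + b"C" * SEGMENT_SIZE
write_file(backup1, index, containers)
write_file(backup2, index, containers)
print(len(index))  # 3
```

Four segments are written but only three are stored, because the second backup reuses the existing "A" segment through a pointer. That pointer may refer to a container written long ago, which is how a new file's data can end up scattered on disk.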
Good read performance on a DDR requires that a file has good 'locality' (its data is relatively sequential on disk) so that DDFS read-ahead algorithms can function optimally. DDFS also assumes that the file most likely to be read (for restore or replication) is the latest copy of a given backup. As a result, for certain types of data (such as virtual synthetics), a process called 'locality repair' is performed to optimize the locality of a newly written file's data. When run, locality repair:
- Examines the data referenced by the file, looking for sections that are not sequential on disk (that is, sections that display poor locality)
- Reads this nonsequential data from disk and writes it again sequentially (as duplicate data) to newly created 4.5 MB containers
This contributes to rebalancing as follows:
- On systems where there is a data imbalance, most old nonsequential data is expected to reside on the older, more fully populated enclosure shelves
- When this data is rewritten sequentially as duplicate data, it is placed in new 4.5 MB containers, which are written round robin across all enclosures in the corresponding tier
- As a result, the majority of the 'dead' data (old duplicates) created by locality repair resides on the older, more fully populated shelves
- When GC runs, it finds the majority of 'dead' data on these older shelves and removes it (freeing space on them) as described above
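The combined effect of locality repair followed by GC can be modeled with a rough container count. This sketch rests on simplifying assumptions: every rewritten container originates on the old shelf, and each original becomes entirely dead once rewritten. The function name and the starting figures are hypothetical.

```python
def locality_repair_then_gc(shelves, old_shelf, rewritten):
    """Model locality repair followed by GC, counting whole containers.

    `rewritten` containers of nonsequential data on `old_shelf` are
    rewritten sequentially across all shelves; the originals become
    dead and are then reclaimed by GC.
    """
    counts = list(shelves)
    # Locality repair: new copies land round robin on every shelf.
    for i in range(rewritten):
        counts[i % len(counts)] += 1
    after_repair = list(counts)
    # GC: the now-dead originals are removed from the old shelf.
    counts[old_shelf] -= rewritten
    return after_repair, counts

after_repair, after_gc = locality_repair_then_gc(
    [9_000, 500], old_shelf=0, rewritten=2_000)
print(after_repair)  # [10000, 1500]
print(after_gc)      # [8000, 1500]
```

Space use on the old shelf rises temporarily until GC runs, then drops below its starting point, while the new shelf keeps its share of the rewritten data.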
As a result, through normal use of locality repair and cleaning (GC), a DDR transparently rebalances data across shelves over time. This happens with no additional input from administrators and means that there is no need for the dedicated data rebalancing functionality sometimes seen on other storage arrays. To increase the speed at which rebalancing takes place, it is therefore necessary to either:
- Increase the rate at which data 'churns' on the DDR
- Increase the amount of data that undergoes locality repair on the DDR
Affected Products
Data Domain
Article Properties
Article Number: 000019150
Article Type: How To
Last Modified: 29 Jul 2025
Version: 4