Data Domain: How to solve high space consumption or low available capacity on Data Domain Restorers (DDRs)
Summary: This article describes how to investigate and resolve issues relating to high space usage or a lack of available capacity on Data Domain Restorers (DDRs)
This article is not tied to any specific product.
Not all product versions are identified in this article.
Symptoms
All Data Domain Restorers (DDRs) contain a pool/area of storage known as the 'active tier':
- This is the area of disk where newly ingested files/data reside and, on most DDRs, files remain here until expired/deleted by a client backup application
- On DDRs configured with Extended Retention (ER) or Long Term Retention (LTR) the data movement process may periodically run to migrate old files from the active tier to archive or cloud tiers
- The only way in which to reclaim space in the active tier which was used by deleted/migrated files is by running the garbage collection/clean process (GC)
Current utilization of the active tier can be displayed using the 'filesys show space' or 'df' commands:
# df
Active Tier:
Resource Size GiB Used GiB Avail GiB Use% Cleanable GiB*
---------------- -------- -------- --------- ---- --------------
/data: pre-comp - 33098.9 - - -
/data: post-comp 65460.3 518.7 64941.6 1% 0.0
/ddvar 29.5 19.7 8.3 70% -
/ddvar/core 31.5 0.2 29.7 1% -
---------------- -------- -------- --------- ---- --------------
If configured, details of archive/cloud tiers will be shown below the active tier.
Utilization of the active tier must be carefully managed otherwise the following may occur:
- The active tier may start to run out of available space causing alerts/messages such as the following to be displayed:
EVT-SPACE-00004: Space usage in Data Collection has exceeded 95% threshold.
- If the active tier becomes 100% full no new data can be written to the DDR which may cause backups/replication to fail - in this scenario alerts/messages such as the following may be displayed:
CRITICAL: MSG-CM-00002: /../vpart:/vol1/col1/cp1/cset: Container set [container set ID] out of space
- In some circumstances, the active tier becoming full may cause the Data Domain File System (DDFS) to become read-only at which point existing files cannot be deleted
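Current alerts, including the space alerts shown above, can be listed from the DDR command line - for example (output varies by system):
# alerts show current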
This knowledge base article attempts to:
- Explain why the active tier may become full
- Describe a simple set of checks which can be performed to determine the cause of high utilization of the active tier and corresponding remedial steps
Note that:
- This article is not exhaustive (there may be some situations where the active tier of a DDR becomes highly utilized or full for a reason not discussed in this document).
- This article does not cover high utilization of archive or cloud tiers
Cause
- Backup files/save sets are not being correctly expired/deleted by client backup applications due to incorrect retention policy or backup application configuration
- Replication lag causing a large amount of old data to be kept on the active tier pending replication to replicas
- Data being written to the active tier has a lower than expected overall compression ratio
- The system has not been sized correctly, that is, it is simply too small for the amount of data being stored on it
- Backups consist of many small files - these files consume more space than expected when initially written, however this space should be reclaimed during clean/garbage collection
- Data movement is not being run regularly on systems configured with ER/LTR causing old files which should be migrated to archive/cloud tiers to remain on the active tier
- Cleaning/garbage collection is not being run regularly
- Excessive or old mtree snapshots existing on the DDR preventing clean from reclaiming space from deleted files/data
Resolution
Step 1 - Determine whether an active tier clean must be run.
The Data Domain Operating System (DDOS) attempts to maintain a counter called 'Cleanable GiB' for the active tier. This is an estimation of how much physical (post-comp) space could potentially be reclaimed in the active tier by running clean/garbage collection. This counter is shown by the 'filesys show space'/'df' commands:
Active Tier:
Resource           Size GiB   Used GiB    Avail GiB   Use%   Cleanable GiB*
----------------   --------   ---------   ---------   ----   --------------
/data: pre-comp           -   7259347.5           -      -                -
/data: post-comp   304690.8    251252.4     53438.5    82%          51616.1   <=== NOTE
/ddvar                 29.5        12.5        15.6    44%                -
----------------   --------   ---------   ---------   ----   --------------
If either:
- The value for 'Cleanable GiB' is large
- DDFS has become 100% full (and is therefore read-only)
then an active tier clean should be started manually:
# filesys clean start
Cleaning started.  Use 'filesys clean watch' to monitor progress.
To confirm that clean has started as expected the 'filesys status' command can be used, that is:
# filesys status
The filesystem is enabled and running.
Cleaning started at 2017/05/19 18:05:58: phase 1 of 12 (pre-merge)
  50.6% complete, 64942 GiB free; time: phase 0:01:05, total 0:01:05
Note that:
- If clean is not able to start please contact your contracted support provider for further assistance - this may indicate that the system has encountered a 'missing segment error' causing clean to be disabled
- If clean is already running the following message will be displayed when it is attempted to be started:
**** Cleaning already in progress. Use 'filesys clean watch' to monitor progress.
- No space in the active tier will be freed/reclaimed until clean reaches its copy phase (by default phase 9 in DDOS 5.4.x and earlier, phase 11 in DDOS 5.5.x and later). For further information about the phases used by clean see: https://support.emc.com/kb/446734
- Clean may not reclaim the amount of space indicated by 'Cleanable GiB' as this value is essentially an estimation. For further information about this see: https://support.emc.com/kb/485637
- Clean may not reclaim all potential space in a single run - this is because on DDRs containing large data sets clean will work against the portion of the file system containing the most superfluous data (that is to give the best return in free space for time taken for clean to run). In some scenarios clean may need to be run multiple times before all potential space is reclaimed
- If the value for 'Cleanable GiB' is large, this indicates that clean has not been running at regular intervals. Check that a clean schedule has been set:
# filesys clean show schedule
If necessary set an active tier clean schedule - for example to run every Tuesday at 6am:
# filesys clean set schedule Tue 0600
Filesystem cleaning is scheduled to run "Tue" at "0600".
On systems configured with Extended Retention (ER) clean may be configured to run after data movement completes and may not have its own separate schedule. This scenario is covered later in this document.
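While clean is running, its progress can be followed interactively or checked periodically - for example:
# filesys clean watch
# filesys clean status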
Once clean has completed use the 'filesys show space'/'df' commands to determine whether utilization issues have been resolved. If usage is still high proceed to run through the remaining steps in this article.
Step 2 - Check for large amounts of replication lag against source replication contexts
Native Data Domain replication is designed around the concept of 'replication contexts'. For example when data needs to be replicated between systems:
- Replication contexts are created on source and destination DDRs
- The contexts are initialised
- Once initialisation is complete replication will periodically send updates/deltas from source to destination to keep data on the systems synchronised
- Directory replication contexts (used when replicating a single directory tree under /data/col1/backup between systems):
Directory replication uses a replication log on the source DDR to track outstanding files which have not yet been replicated to the destination
If a directory replication context is lagging then the replication log on the source DDR will track many files which are pending replication.
Even if these files are deleted, whilst they continue to be referenced by the replication log, clean cannot reclaim space on the disk used by these files.
- Mtree replication contexts (used when replicating any mtree other than /data/col1/backup between systems):
Mtree replication uses snapshots created on source and destination systems to determine differences between systems and therefore which files must be sent from source to destination.
If an mtree replication context is lagging, then the corresponding mtree may have old snapshots created against it on source and destination systems.
Even if files are deleted from the replicated mtree on the source system, if those files existed when the mtree replication snapshots were created on the system, clean cannot reclaim space on disk used by these files.
- Collection replication contexts (used when replicating the entire contents of one DDR to another system):
Collection replication performs 'block based' replication of all data on a source system to a destination system.
If a collection replication is lagging, then clean on the source system cannot operate optimally - in this scenario an alert will be generated on the source indicating that a partial clean is being performed to avoid losing synchronization with the destination system.
Clean will therefore be unable to reclaim as much space as expected on the source DDR.
To determine if replication contexts are lagging the following steps should be performed:
- Determine the hostname of the current system:
sysadmin@dd4200# hostname
The Hostname is: dd4200.ddsupport.emea
- Determine the date/time on the current system:
sysadmin@dd4200# date
Fri May 19 19:04:06 IST 2017
- List replication contexts configured on the system along with their 'synced as of time'. Contexts of interest are those where the 'destination' does NOT contain the hostname of the current system (which indicates that the current system is the source) and the 'synced as of time' is old:
sysadmin@dd4200# replication status
CTX   Destination                                          Enabled   Connection     Sync'ed-as-of-time   Tenant-Unit
---   ---------------------------------------------------  -------   ------------   ------------------   -----------
3     mtree://dd4200.ddsupport.emea/data/col1/DFC          no        idle           Thu Jan  8 08:58     -             <=== NOT INTERESTING - CURRENT SYSTEM IS THE DESTINATION
9     mtree://BenDDVE.ddsupport.emea/data/col1/BenMtree    no        idle           Mon Jan 25 14:48     -             <=== INTERESTING - LAGGING AND CURRENT SYSTEM IS THE SOURCE
13    dir://DD2500-1.ddsupport.emea/backup/dstfolder       no        disconnected   Thu Mar 30 17:55     -             <=== INTERESTING - LAGGING AND CURRENT SYSTEM IS THE SOURCE
17    mtree://DD2500-1.ddsupport.emea/data/col1/oleary     yes       idle           Fri May 19 18:57     -             <=== NOT INTERESTING - CONTEXT IS UP TO DATE
18    mtree://dd4200.ddsupport.emea/data/col1/testfast     yes       idle           Fri May 19 19:18     -             <=== NOT INTERESTING - CONTEXT IS UP TO DATE
---   ---------------------------------------------------  -------   ------------   ------------------   -----------
Contexts for which the current system is the source and which are showing significant lag or contexts which are no longer required should be broken. This can be performed by running the following command on the source and destination system:
# replication break [destination]
For example, to break the 'interesting' contexts shown above, the following commands would be run on source and destination:
(dd4200.ddsupport.emea):   # replication break mtree://BenDDVE.ddsupport.emea/data/col1/BenMtree
(BenDDVE.ddsupport.emea):  # replication break mtree://BenDDVE.ddsupport.emea/data/col1/BenMtree
(dd4200.ddsupport.emea):   # replication break dir://DD2500-1.ddsupport.emea/backup/dstfolder
(DD2500-1.ddsupport.emea): # replication break dir://DD2500-1.ddsupport.emea/backup/dstfolder
Note that:
- Once contexts are broken active tier clean will need to be performed to reclaim potential space in the active tier
- If using mtree replication, once contexts are broken mtree replication snapshots may remain on disk. Make sure that step 4 is followed to expire any superfluous snapshots prior to running clean
- If the source/destination mtree is configured to migrate data to archive or cloud tiers, care should be taken when breaking the corresponding mtree replication contexts as they may not be able to be recreated/initialised again in the future. The reason for this is that when an mtree replication context is initialised, an mtree snapshot is created on the source system containing details of all files in the mtree (regardless of tier). This snapshot is then replicated in full to the active tier of the destination. As a result, if the active tier of the destination does not have sufficient free space to ingest all of the mtree's data from the source, the initialise will not be able to complete. For further information on this issue please contact your contracted support provider
- If a collection replication context is broken the context will not be able to be recreated/initialised without first destroying the instance of DDFS on the destination DDR (and losing all data on this system). As a result a subsequent initialise can take considerable time/network bandwidth as all data from the source must be physically replicated to the destination again
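If it is unclear how each context is configured (type, source, and destination) before deciding which to break, the full replication configuration can also be reviewed - for example:
# replication show config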
Step 3 - Check for superfluous/unused mtrees
The contents of DDFS are logically divided into mtrees. It is common for individual backup applications/clients to write to individual mtrees. If a backup application is decommissioned it will no longer be able to write data to/delete data from the DDR, which may leave old/superfluous mtrees on the system. Data in these mtrees will continue to exist indefinitely, using space on disk on the DDR. As a result any such superfluous mtrees should be deleted. For example:
- Obtain a list of mtrees on the system:
# mtree list
Name                               Pre-Comp (GiB)   Status
--------------------------------   --------------   -------
/data/col1/Budu_test                        147.0   RW
/data/col1/Default                         8649.8   RW
/data/col1/File_DayForward_Noida             42.0   RW/RLCE
/data/col1/labtest                         1462.7   RW
/data/col1/oscar_data                         0.2   RW
/data/col1/test_oscar_2                     494.0   RO/RD
--------------------------------   --------------   -------
- Any mtrees which are no longer required should be deleted with the 'mtree delete' command, i.e.:
# mtree delete [mtree name]
For example:
# mtree delete /data/col1/Budu_test
...
MTree "/data/col1/Budu_test" deleted successfully.
- Space consumed on disk by the deleted mtree will be reclaimed the next time active tier clean/garbage collection is run.
- Mtrees which are destinations for mtree replication (i.e. have a status of RO/RD in the output of mtree list) should have their corresponding replication context broken before the mtree is deleted
- Mtrees which are used as DDBoost logical storage units (LSUs) or as virtual tape library (VTL) pools may not be able to be deleted via the 'mtree delete' command - refer to the Data Domain Administration Guide for further details on deleting such mtrees
- Mtrees which are configured for retention lock (i.e. have a status of RLCE or RLGE) cannot be deleted - instead individual files within the mtree must have any retention lock reverted and be deleted individually - refer to the Data Domain Administration Guide for further details
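If an mtree is deleted in error it can normally be recovered, provided clean/garbage collection has not yet run - for example (mtree name taken from the listing above; availability may vary by DDOS release):
# mtree undelete /data/col1/Budu_test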
Step 4 - Check for old or unneeded mtree snapshots
A Data Domain snapshot represents a point in time copy of the corresponding mtree. As a result:
- Any files which exist within the mtree when the snapshot is created will be referenced by the snapshot
- Whilst the snapshot continues to exist, even if these files are removed/deleted, clean will not be able to reclaim any physical space they use on disk - this is because the data must stay on the system in case the copy of the file in the snapshot is later accessed
- Obtain a list of mtrees on the system using the 'mtree list' command as shown in step 3
- List snapshots which exist for each mtree using the 'snapshot list' command:
# snapshot list mtree [mtree name]
When run against an mtree with no snapshots the following will be displayed:
# snapshot list mtree /data/col1/Default
Snapshot Information for MTree: /data/col1/Default
----------------------------------------------
No snapshots found.
When run against an mtree with snapshots the following will be displayed:
# snapshot list mtree /data/col1/labtest
Snapshot Information for MTree: /data/col1/labtest
----------------------------------------------
Name                                  Pre-Comp (GiB)   Create Date         Retain Until        Status
------------------------------------  --------------   -----------------   -----------------   -------
testsnap-2016-03-31-12-00                     1274.5   Mar 31 2016 12:00   Mar 26 2017 12:00   expired
testsnap-2016-05-31-12-00                     1198.8   May 31 2016 12:00   May 26 2017 12:00
testsnap-2016-07-31-12-00                     1301.3   Jul 31 2016 12:00   Jul 26 2017 12:00
testsnap-2016-08-31-12-00                     1327.5   Aug 31 2016 12:00   Aug 26 2017 12:00
testsnap-2016-10-31-12-00                     1424.9   Oct 31 2016 12:00   Oct 26 2017 13:00
testsnap-2016-12-31-12-00                     1403.1   Dec 31 2016 12:00   Dec 26 2017 12:00
testsnap-2017-01-31-12-00                     1421.0   Jan 31 2017 12:00   Jan 26 2018 12:00
testsnap-2017-03-31-12-00                     1468.7   Mar 31 2017 12:00   Mar 26 2018 12:00
REPL-MTREE-AUTO-2017-05-11-15-18-32           1502.2   May 11 2017 15:18   May 11 2018 15:18
-----------------------------------   --------------   -----------------   -----------------   -------
- Where snapshots exist use the output from 'snapshot list mtree [mtree name]' to determine snapshots which:
Are not 'expired' (see status column)
Were created a significant time in the past (for example snapshots created in 2016 from the above list)
These snapshots should be expired such that they can be removed when clean runs and space they are holding on disk freed:
# snapshot expire [snapshot name] mtree [mtree name]
For example:
# snapshot expire testsnap-2016-05-31-12-00 mtree /data/col1/labtest
Snapshot "testsnap-2016-05-31-12-00" for mtree "/data/col1/labtest" will be retained until May 19 2017 19:31.
Snapshot "testsnap-2016-05-31-12-00" for mtree "/data/col1/labtest" will be retained until May 19 2017 19:31.
- If the snapshot list command is run again these snapshots will now be listed as expired:
# snapshot list mtree /data/col1/labtest
Snapshot Information for MTree: /data/col1/labtest
----------------------------------------------
Name Pre-Comp (GiB) Create Date Retain Until Status
------------------------------------ -------------- ----------------- ----------------- -------
testsnap-2016-03-31-12-00 1274.5 Mar 31 2016 12:00 Mar 26 2017 12:00 expired
testsnap-2016-05-31-12-00 1198.8 May 31 2016 12:00 May 26 2017 12:00 expired
testsnap-2016-07-31-12-00 1301.3 Jul 31 2016 12:00 Jul 26 2017 12:00
testsnap-2016-08-31-12-00 1327.5 Aug 31 2016 12:00 Aug 26 2017 12:00
testsnap-2016-10-31-12-00 1424.9 Oct 31 2016 12:00 Oct 26 2017 13:00
testsnap-2016-12-31-12-00 1403.1 Dec 31 2016 12:00 Dec 26 2017 12:00
testsnap-2017-01-31-12-00 1421.0 Jan 31 2017 12:00 Jan 26 2018 12:00
testsnap-2017-03-31-12-00 1468.7 Mar 31 2017 12:00 Mar 26 2018 12:00
REPL-MTREE-AUTO-2017-05-11-15-18-32 1502.2 May 11 2017 15:18 May 11 2018 15:18
----------------------------------- -------------- ----------------- ----------------- -------
Note that:
- It is not possible to determine how much physical data an individual snapshot or set of snapshots holds on disk - the only value for 'space' associated with a snapshot is an indication of the pre-compressed (logical) size of the mtree when the snapshot was created (as shown in the above output)
- Snapshots which are named 'REPL-MTREE-AUTO-YYYY-MM-DD-HH-MM-SS' are managed by mtree replication and in normal circumstances should not need to be manually expired (replication will automatically expire these snapshots when they are no longer required). If such snapshots are extremely old then it indicates that the corresponding replication context is likely showing significant lag (as described in step 2)
- Snapshots which are named 'REPL-MTREE-RESYNC-RESERVE-YYYY-MM-DD-HH-MM-SS' are created by mtree replication when an mtree replication context is broken. Their intent is that they can be used to avoid a full resynchronization of replication data if the broken context is later recreated (for example if the context was broken in error). If replication will not be re-established these contexts can be manually expired as described above
- Expired snapshots will continue to exist on the system until the next time clean/garbage collection is run - at this point they will be physically deleted and will be removed from the output of 'snapshot list mtree [mtree name]' - clean can then reclaim any space these snapshots were using on disk
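If unwanted snapshots keep reappearing, a snapshot schedule may be creating them automatically. Configured schedules can be reviewed - for example (syntax may vary by DDOS release):
# snapshot schedule show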
Step 5 - Check for old files which have not been expired/deleted by the backup application
Autosupports from the DDR contain histograms showing a breakdown of files on the DDR by age - for example:
File Distribution
-----------------
448,672 files in 5,276 directories
Count Space
----------------------------- --------------------------
Age Files % cumul% GiB % cumul%
--------- ----------- ----- ------- -------- ----- -------
1 day 7,244 1.6 1.6 4537.9 0.1 0.1
1 week 40,388 9.0 10.6 63538.2 0.8 0.8
2 weeks 47,850 10.7 21.3 84409.1 1.0 1.9
1 month 125,800 28.0 49.3 404807.0 5.0 6.9
2 months 132,802 29.6 78.9 437558.8 5.4 12.3
3 months 8,084 1.8 80.7 633906.4 7.8 20.1
6 months 5,441 1.2 81.9 1244863.9 15.3 35.4
1 year 21,439 4.8 86.7 3973612.3 49.0 84.4
> 1 year 59,624 13.3 100.0 1265083.9 15.6 100.0
--------- ----------- ----- ------- -------- ----- -------
This can be useful to determine if there are files on the system which have not been expired/deleted as expected by the client backup application. For example, if the above system were written to by a backup application where the maximum retention period for any one file was 6 months, it is immediately obvious that the backup application is not expiring/deleting files as expected, as there are approximately 80,000 files older than 6 months on the DDR.
Note that:
- It is the responsibility of the backup application to perform all file expiration/deletion
- A DDR will never expire/delete files automatically - unless instructed by the backup application to explicitly delete a file the file will continue to exist on the DDR using space indefinitely
If required Data Domain support can provide additional reports to:
- Give the name/modification time of all files on a DDR ordered by age (so the name/location of any old data can be determined)
- Split out histograms of file age into separate reports for the active/archive/cloud tier (where the ER/LTR features are enabled)
To request these reports:
- Collect evidence as described in the 'Collecting sfs_dump' paragraph of the notes section of this document
- Open a service request with your contracted support provider
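The same file distribution histogram can also be generated on demand, rather than waiting for the next scheduled autosupport, by producing an autosupport report from the CLI and locating its 'File Distribution' section - for example (output is lengthy):
# autosupport show report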
Step 6 - Check for backups which include a large number of small files
Due to the design of DDFS, small files (essentially any file smaller than approximately 10 MB in size) can consume excessive space when initially written to the DDR. This is due to the 'SISL' (Stream Informed Segment Layout) architecture causing small files to consume multiple individual 4.5 MB blocks of space on disk. For example a 4 KB file may actually consume up to 9 MB of physical disk space when initially written.
This excessive space is subsequently reclaimed when clean/garbage collection is run (as data from small files is then aggregated into a smaller number of 4.5 MB blocks) but can cause smaller DDR models to show excessive utilization and fill when such backups are run.
Autosupports contain histograms of files broken down by size, for example:
Count Space
----------------------------- --------------------------
Size Files % cumul% GiB % cumul%
--------- ----------- ----- ------- -------- ----- -------
1 KiB 2,957 35.8 35.8 0.0 0.0 0.0
10 KiB 1,114 13.5 49.3 0.0 0.0 0.0
100 KiB 249 3.0 52.4 0.1 0.0 0.0
500 KiB 1,069 13.0 65.3 0.3 0.0 0.0
1 MiB 113 1.4 66.7 0.1 0.0 0.0
5 MiB 446 5.4 72.1 1.3 0.0 0.0
10 MiB 220 2.7 74.8 1.9 0.0 0.0
50 MiB 1,326 16.1 90.8 33.6 0.2 0.2
100 MiB 12 0.1 91.0 0.9 0.0 0.2
500 MiB 490 5.9 96.9 162.9 0.8 1.0
1 GiB 58 0.7 97.6 15.6 0.1 1.1
5 GiB 29 0.4 98.0 87.0 0.5 1.6
10 GiB 17 0.2 98.2 322.9 1.7 3.3
50 GiB 21 0.3 98.4 1352.7 7.0 10.3
100 GiB 72 0.9 99.3 6743.0 35.1 45.5
500 GiB 58 0.7 100.0 10465.9 54.5 100.0
> 500 GiB 0 0.0 100.0 0.0 0.0 100.0
--------- ----------- ----- ------- -------- ----- -------
If there is evidence of backups writing very large numbers of small files, the system may be affected by significant temporary increases in utilization between each invocation of clean/garbage collection. In this scenario it is preferable to change the backup methodology to combine all small files into a single larger archive (such as a tar file) before writing it to the DDR. Note that any such archive should not be compressed or encrypted (as this will damage the compression/de-duplication ratio of that data).
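For illustration only (paths are hypothetical), on a backup client with the DDR mounted over NFS at /mnt/ddr, the small files could be combined into a single uncompressed, unencrypted archive before being written:
client$ tar -cf /mnt/ddr/backup/smallfiles.tar /path/to/small/files
Note the deliberate absence of compression options (such as -z), which would reduce the de-duplication achieved on the DDR.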
Step 7 - Check for lower than expected de-duplication ratio
The main purpose of a DDR is to de-duplicate and compress data ingested by the device. The de-duplication/compression ratio is very much dependent on the use case of the system and the type of data which it holds, however in many cases there will be an 'expected' overall compression ratio based on results obtained through proof of concept testing or similar. To determine the current overall compression ratio of the system (and therefore whether it is meeting expectations) the 'filesys show compression' command can be used. For example:
# filesys show compression
From: 2017-05-03 13:00 To: 2017-05-10 13:00
Active Tier:
Pre-Comp Post-Comp Global-Comp Local-Comp Total-Comp
(GiB) (GiB) Factor Factor Factor
(Reduction %)
---------------- -------- --------- ----------- ---------- -------------
Currently Used:* 20581.1 315.4 - - 65.3x (98.5)
Written:
Last 7 days 744.0 5.1 80.5x 1.8x 145.6x (99.3)
Last 24 hrs
---------------- -------- --------- ----------- ---------- -------------
* Does not include the effects of pre-comp file deletes/truncates
In the above example the system is achieving an overall compression ratio of 65.3x for the active tier (which is extremely good). If, however, this value shows that the overall compression ratio is not meeting expectations then further investigation is likely to be required. Note that investigating a lower than expected compression ratio is a complex subject which can have many root causes. For further information see the following article: https://support.emc.com/kb/487055
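As a starting point for such an investigation, compression statistics can also be viewed per mtree to identify which data sets are compressing poorly - for example (mtree name taken from the earlier listing; syntax may vary by DDOS release):
# mtree show compression /data/col1/labtest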
Step 8 - Check whether the system is a source for collection replication
When using collection replication, if the source system is physically larger than the destination, the size of the source system will be artificially limited to match that of the destination (i.e. there will be an area of disk on the source which is marked as unusable). The reason for this is that when using collection replication the destination is required to be a block level copy of the source; if the source is physically larger than the destination there is a chance that more data may be written to the source than can be replicated to the destination (as it is already full). This scenario is avoided by limiting the size of the source to match the destination.
- Using the commands from step 2 check whether the system is a source for collection replication. To do this run 'replication status' and determine if there are any replication contexts starting 'col://' (indicating collection replication) which do NOT contain the hostname of the local system in the destination (indicating that this system must be a source for the replication context)
- If the system is a source for collection replication check the size of each system's active tier by logging into both and running the 'filesys show space' command - compare the active tier's 'post-comp' size on each
- If the source is significantly larger than the destination then its active tier size will be artificially limited
- To allow all space on the source to be usable for data, one of the following should be performed:
Add additional storage to the destination active tier such that its size is >= the size of the source active tier
Break the collection replication context (using commands from step 2) - note that this will obviously prevent data being replicated from source -> destination DDR
As soon as either of these has been performed, additional space will be made available immediately in the active tier of the source system (i.e. there is no need to run active tier clean/garbage collection before using this space)
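For reference, the size comparison described above amounts to running the same command on each system and comparing the active tier 'post-comp' sizes (hostnames illustrative, following the convention used in step 2):
(source-ddr):      # filesys show space
(destination-ddr): # filesys show space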
Step 9 - Check whether data movement is being regularly run
If the DDR is configured with either Extended Retention (ER) or Long Term Retention (LTR) it will have a second tier of storage attached (archive tier for ER or cloud tier for LTR). In this scenario data movement policies are likely configured against mtrees to migrate older/unmodified data requiring long term retention from the active tier out to the alternate tier of storage such that space used by these files in the active tier can be physically reclaimed by clean/garbage collection. If data movement policies are incorrectly configured or if the data movement process is not regularly run then old data will remain in the active tier longer than expected and will continue to use physical space on disk.
- Initially confirm whether the system is configured for ER or LTR by running 'filesys show space' and checking for the existence of an archive or cloud tier - note that to be usable these alternative tiers of storage must have a post-comp size of > 0 GiB:
# filesys show space
...
Archive Tier:
Resource Size GiB Used GiB Avail GiB Use% Cleanable GiB
---------------- -------- -------- --------- ---- -------------
/data: pre-comp - 4163.8 - - -
/data: post-comp 31938.2 1411.9 30526.3 4% -
---------------- -------- -------- --------- ---- -------------
# filesys show space
...
Cloud Tier
Resource Size GiB Used GiB Avail GiB Use% Cleanable GiB
---------------- -------- -------- --------- ---- -------------
/data: pre-comp - 0.0 - - -
/data: post-comp 338905.8 0.0 338905.8 0% 0.0
---------------- -------- -------- --------- ---- -------------
Note that ER and LTR are mutually exclusive so a system will either contain only an active tier (no ER/LTR configured) or an active and archive tier (ER configured) or an active and cloud tier (LTR configured)
- If the system is configured with ER/LTR check data movement policies against mtrees to ensure that these are as expected and set such that old data will be pushed out to the alternate tier of storage:
ER: # archive data-movement policy show
LTR: # data-movement policy show
If data movement policies are incorrect/missing these should be corrected - refer to the Data Domain Administrators Guide for assistance in performing this
- If the system is configured with ER/LTR check that data movement is scheduled to run at regular intervals to physically migrate files/data from the active tier to alternate storage:
ER: # archive data-movement schedule show
LTR: # data-movement schedule show
Note that Data Domain generally recommends running data movement via an automated schedule however some customers choose to run this process in an ad-hoc manner (i.e. when required). In this scenario data movement should be started regularly by running:
ER: # archive data-movement start
LTR: # data-movement start
For more information on modifying the data movement schedule refer to the Data Domain Administrators Guide
- If the system is configured for ER/LTR check the last time data movement was run:
ER: # archive data-movement status
LTR: # data-movement status
If data movement has not been run for some time attempt to manually start the process then monitor as follows:
ER: # archive data-movement watch
LTR: # data-movement watch
If data movement fails to start for any reason please contact your contracted support provider for further assistance.
- Once data movement is complete active tier clean should be run (note that it may be configured to start automatically on completion of data movement) to ensure that space used by migrated files in the active tier is physically freed:
# filesys clean start
On ER systems it is common to schedule data movement to run regularly (for example once a week) then configure active tier clean to run on completion of data movement. In this scenario active tier clean does not have its own independent schedule. To configure this, first remove the current active tier clean schedule:
# filesys clean set schedule never
Configure data movement to run periodically followed by automatic active tier clean - for example to run data movement every Tuesday at 6am followed by active tier clean:
# archive data-movement schedule set days Tue time 0600
The Archive data movement schedule has been set.
Archive data movement is scheduled to run on day(s) "tue" at "06:00" hrs
It can be confirmed that active tier clean is configured to run after completion of data movement as follows:
# archive show config
Enabled Yes
Data movement Schedule Run on day(s) "tue" at "06:00" hrs <=== SCHEDULE
Data movement throttle 100 percent
Default age threshold data movement policy 14 days
Run filesys clean after archive data movement Yes <=== RUN CLEAN ON COMPLETION
Archive Tier local compression gz
Packing data during archive data movement enabled
Space Reclamation disabled
Space Reclamation Schedule No schedule
On LTR systems active tier clean should still be configured with its own schedule
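For example, to give active tier clean its own weekly schedule on an LTR system (day/time illustrative):
# filesys clean set schedule Tue 0600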
Step 10 - Add additional storage to the active tier
If all previous steps have been performed and active tier clean has run to completion, but there is still insufficient space available on the active tier, it is likely that the system has not been correctly sized for the workload it is receiving. In this case one of the following should be performed:
- Reduce workload hitting the system - for example:
Redirect a subset of backups to alternate storage
Reduce retention period of backups such that they are expired/deleted more quickly
Reduce the number/expiration period of scheduled snapshots against mtrees on the system
Break superfluous replication contexts for which the local system is a destination then delete corresponding mtrees
- Add additional storage to the active tier of the system and expand its size:
# storage add [tier active] enclosure [enclosure number] | disk [device number]
# filesys expand
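Once storage has been added and the file system expanded, the additional capacity can be confirmed - for example:
# storage show all
# filesys show space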
To discuss addition of storage please contact your sales account team.
Affected Products: Data Domain
Article Properties
Article Number: 000054303
Article Type: Solution
Last Modified: 21 Jul 2025
Version: 6