Data Domain: Space Usage in Data Collection Has Exceeded Threshold
Summary: Data Domain (DD) raises alerts when space usage is nearly full (90-100%). This article helps determine the actual cause: either files are not being deleted as expected, or the DD has genuinely reached its capacity.
Symptoms
ddboost@prdctdd50# alerts show current
Id      Post Time                  Severity   Class        Object          Message
-----   ------------------------   --------   ----------   -------------   ----------------------------------------------------------------------------
p0-69   Fri Jun  2 20:36:00 2017   CRITICAL   Filesystem   FilesysType=2   EVT-SPACE-00004: Space usage in Data Collection has exceeded 100% threshold.
-----   ------------------------   --------   ----------   -------------   ----------------------------------------------------------------------------
There is 1 active alert.

ddboost@prdctdd50# df -kh
Active Tier:
Resource           Size GiB    Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   --------   ---------   ---------   ----   --------------
/data: pre-comp           -   1678028.9           -      -                -
/data: post-comp   165663.3    165663.3         0.0   100%          28605.6
/ddvar                 47.2        10.5        34.3    23%                -
/ddvar/core           984.3       368.6       565.7    39%                -
----------------   --------   ---------   ---------   ----   --------------
Cause
- Backups are not getting expired, and the DD contains a lot of old files.
- DD contains a huge number of small files.
- DD is a source of Collection replication.
- Mtree Snapshots are holding data.
- Replication lag is causing the data to not be removed during cleaning.
- Data has been ingested into the DD up to its maximum capacity.
Resolution
Follow the steps below to troubleshoot the issue.
- Check whether a large amount of space is cleanable (the Cleanable GiB column of the df output). If so, start a manual cleaning to reclaim the space.
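As a rough way to judge whether the cleanable amount is large, the ratio of Cleanable GiB to Used GiB can be computed from the df output. The following is a minimal sketch, assuming the column layout shown in the Symptoms output; the function name cleanable_fraction is hypothetical and not part of any DD tooling.

```python
def cleanable_fraction(df_output: str) -> float:
    """Return Cleanable GiB / Used GiB for the '/data: post-comp' row.

    Assumes the column order shown in the Symptoms section:
    Resource, Size GiB, Used GiB, Avail GiB, Use%, Cleanable GiB.
    """
    for line in df_output.splitlines():
        if line.strip().startswith("/data: post-comp"):
            # '/data:' and 'post-comp' occupy the first two fields.
            fields = line.split()
            used_gib = float(fields[3])
            cleanable_gib = float(fields[6])
            return cleanable_gib / used_gib
    raise ValueError("no '/data: post-comp' row found")


# Row copied from the sample df output in the Symptoms section.
sample = "/data: post-comp   165663.3   165663.3   0.0   100%   28605.6"
print(f"Cleanable share of used space: {cleanable_fraction(sample):.1%}")
```

In the sample, roughly a sixth of the used space is reported as cleanable, so a manual cleaning is worth running before investigating the other causes.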
- Backups are not getting expired. The DD contains a lot of old files.
- Check the current retention policy set in the backup software.
- Check the File Distribution section in the latest autosupport logs.
File Distribution
-----------------
59,343 files in 243 directories

           Count                        Space
           ---------------------------  ------------------------
Age              Files      %   cumul%       GiB      %   cumul%
---------  -----------  -----  -------  --------  -----  -------
1 day            1,486    2.5      2.5    6447.1    0.9      0.9
1 week           7,477   12.6     15.1   74390.0   10.1     11.0
2 weeks         12,183   20.5     35.6   94039.1   12.8     23.7
1 month         13,050   22.0     57.6  134241.0   18.2     41.9
2 months         3,432    5.8     63.4  111922.9   15.2     57.1
3 months         2,417    4.1     67.5   51673.8    7.0     64.1
6 months         2,562    4.3     71.8  154479.7   21.0     85.1
1 year           2,806    4.7     76.5   35099.5    4.8     89.8
> 1 year        13,930   23.5    100.0   74979.7   10.2    100.0
---------  -----------  -----  -------  --------  -----  -------
If the DD contains files older than the retention period set in the current policy, the files are not expiring as the policy requires. Investigate this in the backup software.
- If the policy was changed recently, files ingested under the old retention policy may still be present. Expire the old backups manually from the backup software.
- If the retention policy and the file distribution are as expected, proceed to the next step.
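To estimate how much data lies beyond the retention period, the Space column of the age histogram can be summed for the buckets older than the policy allows. The following is a minimal sketch using the values from the sample File Distribution above; the 3-month retention period and the function name space_older_than are assumptions chosen only for illustration.

```python
# (age bucket, GiB) pairs copied from the sample File Distribution output.
AGE_BUCKETS = [
    ("1 day", 6447.1),
    ("1 week", 74390.0),
    ("2 weeks", 94039.1),
    ("1 month", 134241.0),
    ("2 months", 111922.9),
    ("3 months", 51673.8),
    ("6 months", 154479.7),
    ("1 year", 35099.5),
    ("> 1 year", 74979.7),
]


def space_older_than(buckets, last_allowed_bucket):
    """Sum the GiB of every bucket listed after last_allowed_bucket."""
    labels = [label for label, _ in buckets]
    cut = labels.index(last_allowed_bucket) + 1
    return sum(gib for _, gib in buckets[cut:])


total_gib = sum(gib for _, gib in AGE_BUCKETS)
# Hypothetical policy: backups are retained for 3 months.
stale_gib = space_older_than(AGE_BUCKETS, "3 months")
print(f"{stale_gib:.1f} GiB ({stale_gib / total_gib:.1%}) is older than the policy allows")
```

The result agrees with the cumul% column: 64.1% of the space is in files up to 3 months old, so the remainder is older. If the backup software is meant to keep only 3 months of data, that remainder is a candidate for manual expiry.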
- DD contains a huge number of small files.
The Data Domain File System is designed to split ingested data into 4 KB to 12 KB segments and deduplicate those segments. Files under 12 KB in size can take up more space on the DD than they would on normal storage. A large number of files that are 10 KB or smaller can therefore increase storage usage.
Check this in the DD autosupport logs. The file size histogram is under File Distribution:
           Count                        Space
           ---------------------------  ------------------------
Size             Files      %   cumul%       GiB      %   cumul%
---------  -----------  -----  -------  --------  -----  -------
1 KiB                8    0.0      0.0       0.0    0.0      0.0
10 KiB          32,792   25.1     25.1       0.2    0.0      0.0
100 KiB         32,774   25.1     50.2       2.0    0.0      0.0
500 KiB         10,607    8.1     58.3       2.3    0.0      0.0
1 MiB            1,653    1.3     59.6       1.2    0.0      0.0
5 MiB            8,036    6.2     65.7      17.5    0.0      0.0
10 MiB           2,377    1.8     67.6      15.6    0.0      0.0
50 MiB           6,680    5.1     72.7     152.1    0.0      0.0
100 MiB          1,153    0.9     73.5      72.1    0.0      0.0
500 MiB            517    0.4     73.9     120.5    0.0      0.0
1 GiB              322    0.2     74.2     233.4    0.0      0.0
5 GiB            1,432    1.1     75.3    3767.4    0.2      0.2
10 GiB             581    0.4     75.7    4416.6    0.2      0.4
50 GiB          23,715   18.2     93.9  606656.4   28.3     28.7
100 GiB          3,999    3.1     96.9  272684.4   12.7     41.4
500 GiB          3,456    2.6     99.6  982062.1   45.8     87.2
> 500 GiB          536    0.4    100.0  268000.0   12.5     99.7
---------  -----------  -----  -------  --------  -----  -------
If this is the cause, Dell Technologies recommends configuring the backup procedure so that small files are rarely ingested individually. Change the backup methodology to combine all small files into a single larger archive (such as a .tar or .gz file) before writing them to the DD.
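The archiving step could look like the following on the backup client, before the data is written to the DD. This is a minimal sketch using Python's standard tarfile module; the function name bundle_small_files, the file names, and the use of 12 KiB as the cut-off (taken from the segment size mentioned above) are assumptions for illustration, and real backup software may provide its own way to do this.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_small_files(source_dir, archive_path, max_bytes=12 * 1024):
    """Pack every regular file smaller than max_bytes under source_dir into one .tar."""
    source = Path(source_dir)
    archive_file = Path(archive_path).resolve()
    added = []
    with tarfile.open(archive_path, "w") as archive:
        for path in sorted(source.rglob("*")):
            if path.resolve() == archive_file:
                continue  # never add the archive to itself
            if path.is_file() and path.stat().st_size < max_bytes:
                # Store paths relative to source_dir inside the archive.
                archive.add(path, arcname=str(path.relative_to(source)))
                added.append(path)
    return added


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "a.log").write_bytes(b"x" * 100)
    (src / "b.log").write_bytes(b"y" * 2048)
    (src / "big.bin").write_bytes(b"z" * (64 * 1024))
    archive = Path(tmp) / "small_files.tar"
    bundled = bundle_small_files(src, archive)
    with tarfile.open(archive) as tar:
        members = sorted(tar.getnames())

print(members)
```

Only the two small files end up in the archive; the larger file is left to be backed up on its own.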
- Mtree Snapshots are holding data.
- Check if there are any old unexpired or expired snapshots holding data. For all the Mtrees, run the command:
# snapshot list mtree /data/col1/<Mtree_name>
- If any old snapshots are still present and holding data, check whether they can be expired. If so, expire the snapshots.
- If any REPL-MTREE-RESYNC-RESERVE* snapshots exist, a replication break and resync has been performed.
- Those snapshots do not expire with the normal replication process.
- They must be expired manually when no longer needed; otherwise they are retained for one year.
# snapshot expire <snapshot_name> mtree /data/col1/<Mtree_name>
- Start and stop cleaning to remove these expired snapshots:
# filesys clean start
# filesys clean stop
- Verify if the snapshots are removed for the Mtrees for which the snapshots were expired:
# snapshot list mtree /data/col1/<Mtree_name>
- If the snapshot is still there, follow Data Domain: Cannot Delete Snapshot Softlocked by Replication Context
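When many Mtrees are involved, it can help to prepare the expire commands for review before running them. The following is a minimal sketch that only builds command strings in the form shown above; the snapshot names, the Mtree name backup01, and the function name expire_commands are hypothetical, and the list of snapshot names is assumed to have been collected by hand from the snapshot list output.

```python
from fnmatch import fnmatchcase


def expire_commands(snapshot_names, mtree_name,
                    pattern="REPL-MTREE-RESYNC-RESERVE*"):
    """Build one 'snapshot expire' command per snapshot name matching pattern."""
    return [
        f"snapshot expire {name} mtree /data/col1/{mtree_name}"
        for name in snapshot_names
        if fnmatchcase(name, pattern)
    ]


# Hypothetical snapshot names for a hypothetical Mtree called backup01.
names = ["REPL-MTREE-RESYNC-RESERVE-2017-06-01", "daily-2017-06-02"]
commands = expire_commands(names, "backup01")
for command in commands:
    print(command)
```

The sketch does not run anything on the DD. Review each generated command and confirm that the snapshot is no longer needed before expiring it.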
- A large replication lag is causing the data to not be removed during the cleaning.
- Check if there is any replication configured in the DD by running the command:
# replication show config
- Check if there is an alert for replication lag or sync-as-of time in DD by running the command:
# alerts show current
- Check if there is any cleanable size in the output of the command:
# df
- If there is a large replication lag, break the replication pair and start cleaning to reclaim the space in the DD. When cleaning is complete, resync the replication.
- See Data Domain - Break and Resync Directory Replication to break and resync Directory replication.
- Data has been ingested into the DD at its maximum level.
If none of the above causes apply, the DD has reached its maximum capacity and the user must add more storage to the DD.
Additional Information
For more information, reference this video:
How to resolve space usage in Data Collection on Dell Data Domain.
Duration: 00:06:35 (hh:mm:ss)