Data Domain: Historical database pruning failed after upgrading to DDOS 6.1 or 6.2

Summary: How to address the historical database failing to prune after certain DDOS upgrades, which can eventually fill the DDOS /ddr partition and cause downtime.


Symptoms



Data Domain Restorers (DDRs) use a historical database to record events (such as various performance metrics) over time.
This database is a SQLite database file held under /ddr/hd, for example:
 
/ddr/hd:
-rw-r--r--  1 root root 1490051072 Jun  8 07:23 dd_hd.db

Note that the /ddr/hd directory sits on the /ddr file system:
 
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/dd_dg0_0p14       5160576   3691320   1207112  76% /ddr

Once a day, cron starts a 'prune' job that removes old or unnecessary data from the historical database to prevent it from growing too large. The crontab entry is:
 
config.crontab.hd_prune = 12 12 * * * root /ddr/bin/dd_hd_rdb_tool -p
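Because cron lists the minute before the hour, the schedule "12 12 * * *" means 12:12 every day. The short Python sketch below splits the entry into its fields to make that explicit. It assumes the standard five cron schedule fields followed by a user and a command (system crontab style); the CronEntry type and parse_cron_entry function are hypothetical helpers for illustration, not Data Domain tools.

```python
# Illustrative sketch (not a Data Domain tool): split the crontab entry quoted
# above into its fields, assuming the standard five cron schedule fields
# followed by a user and a command, as in a system crontab.
from typing import NamedTuple


class CronEntry(NamedTuple):
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    user: str
    command: str


def parse_cron_entry(line: str) -> CronEntry:
    """Parse '[key =] minute hour dom month dow user command'."""
    if "=" in line:
        # Drop the 'config.crontab.hd_prune =' key prefix.
        line = line.split("=", 1)[1]
    # maxsplit=6 keeps the command (and its arguments) as one field.
    fields = line.split(maxsplit=6)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return CronEntry(*fields)


entry = parse_cron_entry(
    "config.crontab.hd_prune = 12 12 * * * root /ddr/bin/dd_hd_rdb_tool -p"
)
# '*' in day-of-month, month and day-of-week means "every day".
daily = entry.day_of_month == entry.month == entry.day_of_week == "*"
print(f"daily={daily} at {int(entry.hour):02d}:{int(entry.minute):02d}, "
      f"user={entry.user}, command={entry.command}")
# prints: daily=True at 12:12, user=root, command=/ddr/bin/dd_hd_rdb_tool -p
```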

Under certain circumstances, however, this job can fail (particularly after some upgrades to DDOS 6.2.x). When this happens, messages similar to the following appear in messages.engineering:
 
Feb  4 12:10:00 dd2500-01 hdc: INFO: SQLITE: rc=1, no such table: hd_perf_clgrp
Feb  4 12:10:00 dd2500-01 hdc: INFO: Error: <5146: Internal error accessing historical database.  Diagnostics: Operation(prepare stmt), Reported by(hdal), Location(hdal_db_sqlite3.c, 237, _get_object_meta_info), Database error(sqlite3, 1, no such table: hd_perf_clgrp), ddr/lib/dd_hd_rdb_sqlite.c, dd_hd_rdb_report_error_sqlite, 3825>

In addition, an alert will be raised indicating that pruning has failed:
 
Feb  4 12:12:02 dd2500-01 dd_hd_rdb_tool: INFO: Event posted: p0-19 (11000013:285212691): EVT-HD-RDB-0004: Historical database pruning failed.       

If pruning fails repeatedly, old or unnecessary data cannot be removed from the historical database, so the database slowly grows in size.

Eventually this can cause the /ddr file system to become 100% full (either transiently during an attempted prune, or permanently if the historical database grows large enough). A full /ddr file system causes further issues, such as the system being unable to update registry files, and can ultimately lead to system instability or unexpected DDFS restarts.

Cause

Pruning the historical database generally requires free space in the /ddr file system equal to the current size of the historical database. If /ddr has become nearly full, the pruning job may fail, which only adds to space consumption on /ddr over time.
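A minimal arithmetic sketch of this rule of thumb, using the sample figures from the Symptoms section: the dd_hd.db size from ls -l (in bytes) and the /ddr "Available" value from df (in 1K-blocks of 1024 bytes). Assuming both figures came from the same system, the database is 1,455,128 KiB while only 1,207,112 KiB is free, so a prune could run short of space even though df reports /ddr as only 76% used. The function name prune_has_headroom is hypothetical, and the check is only as precise as the "generally requires" guideline above.

```python
# Illustrative sketch: compare free space on /ddr with the historical database
# size. The "free space >= database size" guideline comes from this article;
# prune_has_headroom is a hypothetical helper, not a Data Domain tool.

def prune_has_headroom(db_size_bytes: int, available_kib: int) -> bool:
    """Return True if /ddr free space is at least the size of the database."""
    db_size_kib = db_size_bytes / 1024  # ls -l reports bytes; df reports 1K-blocks
    return available_kib >= db_size_kib


db_size_bytes = 1490051072  # dd_hd.db size from the ls -l listing
available_kib = 1207112     # "Available" column for /ddr from df

print(f"database:  {db_size_bytes // 1024} KiB")
print(f"available: {available_kib} KiB")
print("enough headroom" if prune_has_headroom(db_size_bytes, available_kib)
      else "insufficient headroom")
# prints:
# database:  1455128 KiB
# available: 1207112 KiB
# insufficient headroom
```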

This article, however, focuses on pruning failures caused by something other than a pre-existing lack of space. The sequence of DDOS upgrades, and/or problems encountered during those upgrades, may leave a table missing from the historical database. A missing table is not detected by the SQLite validation check. The missing table is typically "hd_perf_clgrp" or "hd_space_fmig_runs_detailed", as seen below:
 
Feb 4 12:10:00 dd2500-01 hdc: INFO: Error: <5146: Internal error accessing historical database. Diagnostics: Operation(prepare stmt), Reported by(hdal), Location(hdal_db_sqlite3.c, 237, _get_object_meta_info), Database error(sqlite3, 1, no such table: hd_perf_clgrp), ddr/lib/dd_hd_rdb_sqlite.c, dd_hd_rdb_report_error_sqlite, 3825>
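To illustrate why a structural validation check can pass while a table is missing, the Python sketch below builds a small in-memory SQLite database that has only one of the two tables. It assumes the validation check behaves like SQLite's PRAGMA integrity_check, which verifies file consistency but not whether the schema contains every table the application expects; the EXPECTED_TABLES tuple holds only the two names from this article and is not the full historical database schema, and the placeholder column is invented for the demo. Repairing the historical database is a task for Data Domain Support; at most, a check like this would be run against a copy of dd_hd.db, never the live file.

```python
# Illustrative sketch: a structurally valid SQLite database can still be
# missing tables an application needs. EXPECTED_TABLES contains only the two
# names mentioned in this article (not the full historical database schema).
import sqlite3
from typing import List, Sequence

EXPECTED_TABLES = ("hd_perf_clgrp", "hd_space_fmig_runs_detailed")


def missing_tables(conn: sqlite3.Connection,
                   expected: Sequence[str] = EXPECTED_TABLES) -> List[str]:
    """Return the expected table names that do not exist in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    present = {name for (name,) in rows}
    return [t for t in expected if t not in present]


def integrity_ok(conn: sqlite3.Connection) -> bool:
    """PRAGMA integrity_check returns a single 'ok' row for a consistent file."""
    return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"


# Demonstration: in-memory database with one of the two tables missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hd_perf_clgrp (ts INTEGER)")  # placeholder column
print(integrity_ok(conn))    # True
print(missing_tables(conn))  # ['hd_space_fmig_runs_detailed']
```

For offline analysis of a copied file, Python's sqlite3 module can open it read-only with a URI such as sqlite3.connect("file:dd_hd_copy.db?mode=ro", uri=True), where dd_hd_copy.db is a hypothetical filename.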
 

If this error, or a similar error for another missing table, keeps appearing in messages.engineering, or if alerts about the historical database failing its daily prune appear, contact your contracted support provider as early as possible to have the underlying issue resolved.

Resolution

Data Domain Support will most likely need a remote session to troubleshoot the root cause of the pruning failure. If the issue is not resolved, space usage on the small /ddr partition will keep increasing until the partition fills up, potentially making the DD unavailable and causing downtime.

Unless the pruning problem is the result of /ddr already being nearly full, the resolution involves fixing the historical database structure and, in some cases, running the prune outside the small /ddr partition so that the process does not run out of space. Neither action requires downtime or affects backups or other DD activities. The only downside may be that some historical and performance entries are dropped while the database is made consistent or pruned on a separate, larger partition.

Note that the code issue that prevents the historical database upgrade (and hence causes the daily pruning failures) is fixed in the following releases:

  • DDOS 6.1.2.40 and later
  • DDOS 6.2.0.20 and later
Customers planning to upgrade to DDOS 6.1 or DDOS 6.2 for the first time are therefore strongly advised to upgrade directly to one of the fixed releases listed above.
Note: A DDOS upgrade will not resolve the issue if the Data Domain is already experiencing historical database pruning errors.
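The release rule above can be expressed as a small version comparison. This Python sketch assumes version strings are dotted integers, optionally followed by a "-" build suffix that is ignored; FIXED_RELEASES and has_prune_fix are hypothetical names. Trains other than 6.1 and 6.2 return None because this article does not state a fixed release for them, and, as the note says, a True result does not mean an already-affected database has been repaired.

```python
# Illustrative sketch: is a DDOS version at or above the fixed release for its
# 6.1 or 6.2 train? Other trains return None rather than guessing.
# A fixed release does NOT repair a database that is already failing to prune.
from typing import Optional, Tuple

FIXED_RELEASES = {
    (6, 1): (6, 1, 2, 40),
    (6, 2): (6, 2, 0, 20),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """'6.2.0.20-12345' -> (6, 2, 0, 20); an assumed '-build' suffix is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def has_prune_fix(version: str) -> Optional[bool]:
    v = parse_version(version)
    fixed = FIXED_RELEASES.get(v[:2])
    if fixed is None:
        return None
    # Tuples compare element by element, so (6, 2, 1, 0) > (6, 2, 0, 20).
    return v >= fixed


for ver in ("6.1.2.30", "6.1.2.40", "6.2.0.10", "6.2.1.0", "7.0.0.0"):
    print(ver, has_prune_fix(ver))
# prints:
# 6.1.2.30 False
# 6.1.2.40 True
# 6.2.0.10 False
# 6.2.1.0 True
# 7.0.0.0 None
```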

Additional Information

This content is translated into other languages:
https://downloads.dell.com/TranslatedPDF/PT-BR_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/ZH-CN_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/ES_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/DE_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/FR_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/IT_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/JA_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/NL_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/KO_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/RU_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/PT_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/SV_KB531069.pdf


Errors from pruning the historical database are logged in messages.engineering and can be searched using "log view debug/messages.engineering". Look around 12:12 PM local DD time, when the pruning job is scheduled to start each day. Providing DD Support with any matching errors, or a full SUB, up front will help the analysis.
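When a messages.engineering excerpt has been saved locally (for example from the log view output or a SUB), a short script can summarize the relevant lines before they are sent to DD Support. This Python sketch is an assumption-laden illustration: the regular expressions match only the message formats quoted in this article, and the scan function is a hypothetical helper.

```python
# Illustrative sketch: summarize historical-database errors in a saved
# messages.engineering excerpt. The patterns match only the message formats
# quoted in this article; other formats are not recognized.
import re
from typing import Iterable, Set, Tuple

MISSING_TABLE = re.compile(r"no such table: (\w+)")
PRUNE_FAILED = re.compile(r"EVT-HD-RDB-0004")


def scan(lines: Iterable[str]) -> Tuple[Set[str], int]:
    """Return (missing table names, number of prune-failure alerts)."""
    missing: Set[str] = set()
    failures = 0
    for line in lines:
        missing.update(MISSING_TABLE.findall(line))
        if PRUNE_FAILED.search(line):
            failures += 1
    return missing, failures


sample = [
    "Feb  4 12:10:00 dd2500-01 hdc: INFO: SQLITE: rc=1, no such table: hd_perf_clgrp",
    "Feb  4 12:12:02 dd2500-01 dd_hd_rdb_tool: INFO: Event posted: p0-19 "
    "(11000013:285212691): EVT-HD-RDB-0004: Historical database pruning failed.",
]
print(scan(sample))  # ({'hd_perf_clgrp'}, 1)
```

To scan a saved file instead, pass the open file object, e.g. scan(open("messages.engineering.txt")), where the filename is hypothetical.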

Affected Products

Data Domain

Article Properties
Article Number: 000055469
Article Type: Solution
Last Modified: 09 Oct 2024
Version: 4