Avamar: AlwaysOn SQL Incremental Backups Randomly Fail Due to "log gap" Errors
Summary: During the Log Sequence Number (LSN) comparison for an incremental backup, the SQL server returns the latest LSN for the database, but the Avamar metadata in the sqlmeta.xml file holds an older LSN value. This causes SQL incremental backups to fail with the message "A log gap was identified or a full backup was not found." For cluster backups, a generic delay of 1 second was observed in updating the table sys.database_recovery_status with the latest LSN value. As a result, the Avamar SQL plugin query returns an outdated LSN. ...
Symptoms
2021/03/22-04:30:53.62299 [avsql_assist] Before alignment - Str1: '240000000392000001', Str2: '241000000328000001'
2021/03/22-04:30:53.62299 [avsql_assist] After alignment - Str1: '240000000392000001', Str2: '241000000328000001'
2021/03/22-04:30:53.62299 [avsql_assist] <=== avsql_assist::align_numeric_ustrings
2021-03-22 00:30:53 avsql Info <15765>: A log gap was identified or a full backup was not found.
2021/03/22-04:30:53.62299 [avsql_assist] ===> sqlconnect::~sqlconnect
2021/03/22-04:30:53.62299 [avsql_assist] <=== sqlconnect::~sqlconnect
2021/03/22-04:30:53.62299 [avsql_assist] <=== avsql_assist::snapup_check_timestamps
2021-03-22 00:30:53 avsql Error <40418>: Skipping database 'oalistener07\OA05_AG/_Sync' due to the following reason: A log gap was identified or a full backup was not found.
The LSN information in sqlmeta.xml is outdated and out of sync with the SQL server view of the same Database.
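When several databases are backed up in one job, it can help to list which ones were skipped for this reason. The following is a minimal sketch, not an Avamar tool: the function name find_log_gap_databases is hypothetical, and the pattern is modeled only on the Error <40418> line shown in the log excerpt above, so other plug-in versions may word the message differently.

```python
import re

# Pattern modeled on the "Error <40418>" line from the symptom log.
# Assumption: the message wording is the same in the log being scanned.
SKIP_RE = re.compile(
    r"avsql Error <40418>: Skipping database '(?P<db>[^']+)' "
    r"due to the following reason: "
    r"A log gap was identified or a full backup was not found\."
)


def find_log_gap_databases(log_text):
    """Return the databases skipped with the log gap error, in first-seen order."""
    seen = []
    for match in SKIP_RE.finditer(log_text):
        db = match.group("db")
        if db not in seen:  # report each database once
            seen.append(db)
    return seen
```

The function takes the log contents as a string, so it can be fed the text of an avsql log file read by any means.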
Cause
For SQL incremental backups, the process checks for a log gap by comparing LSN values. The current LSN is retrieved from sys.database_recovery_status with the following query:
SELECT last_log_backup_lsn FROM sys.database_recovery_status WHERE database_id = DB_ID(N'db2-mi')
The log would look like:
2022/08/24-03:28:01.12199 [avsql_assist] retrieving last backup lsn for 'db2-mi' db from sys.database_recovery_status
2022/08/24-03:28:01.12199 [avsql_assist] ===> sqlconnectimpl_smo::InitDll
2022/08/24-03:28:01.12199 [avsql_assist] SMO dll already loaded.
2022/08/24-03:28:01.12299 [avsql_assist] <=== sqlconnectimpl_smo::InitDll
2022/08/24-03:28:01.12400 [avsql_assist] ==> SMOWrap::SMO_GetLastBackupLSN
2022/08/24-03:28:01.28200 [avsql_assist] database 'db2-mi', last backup lsn = '315000000022400001'
2022/08/24-03:28:01.28200 [avsql_assist] <=== sqlconnectimpl_smo::get_last_backup_lsn
2022/08/24-03:28:01.28200 [avsql_assist] ===> avsql_metadata::get
2022/08/24-03:28:01.28200 [avsql_assist] ===> avsql_metadata::get
2022/08/24-03:28:01.28299 [avsql_assist] <=== avsql_metadata::get
2022/08/24-03:28:01.28299 [avsql_assist] <=== avsql_metadata::get
2022/08/24-03:28:01.28299 [avsql_assist] Last backup LSN: '315000000022400001' (Get from sqlmeta.xml), Current LSN: '315000000022400001'
During this incremental backup, the sqlmeta.xml file is updated using the following query:
SELECT last_lsn, type, user_name FROM msdb..backupset WHERE database_name=N'db2-mi' AND type LIKE 'L' ORDER BY last_lsn DESC
The results of the two queries should report the same LSN number for the Database.
If the results do not match, there is a confirmed break in the "log chain" sequence, and a log gap error is expected during incremental backups. In an ideal scenario during cluster backups, Microsoft SQL Server updates the table sys.database_recovery_status with the latest LSN value after the backup completes. The Avamar SQL plugin queries this table for the LSN and stores the value in the SQL metadata. During the next incremental backup, the Avamar SQL plugin queries the table again and gets the latest LSN value. This LSN is then compared with the LSN stored in the SQL metadata, and the backup is performed.
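The comparison described above can be sketched as follows. This is an illustration only: the plug-in's real logic is not shown in this article. The helper names are hypothetical, the zero-padding is a guess at what the "alignment" step in the log (align_numeric_ustrings) does, and the rule "any mismatch is a gap" is an assumption drawn from the two log excerpts.

```python
def align_numeric_strings(a, b):
    """Left-pad the shorter digit string with zeros so both have equal length.

    Assumption: this approximates the alignment step seen in the avsql log.
    """
    width = max(len(a), len(b))
    return a.zfill(width), b.zfill(width)


def has_log_gap(stored_lsn, current_lsn):
    """Return True when the LSN from sqlmeta.xml differs from the server's LSN.

    Assumption: any mismatch between the two values is treated as a log gap.
    """
    stored, current = align_numeric_strings(stored_lsn.strip(), current_lsn.strip())
    return stored != current
```

With the values from the excerpts, has_log_gap('315000000022400001', '315000000022400001') is False (the matching case), while has_log_gap('240000000392000001', '241000000328000001') is True (the failing case).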
In busy AlwaysOn cluster environments, the Avamar SQL plugin can query the table before Microsoft SQL Server has updated it, so an older LSN value is obtained and stored in the SQL metadata. Microsoft SQL Server updates the table some time later. During the next incremental backup, when the SQL plugin queries the table again, it gets the latest LSN value. When this value is compared with the stale LSN stored in the sqlmeta.xml file, a log gap is found and the backup is promoted to full.
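The timing problem can be illustrated with a toy simulation. Everything here is hypothetical: the class RecoveryStatusView merely stands in for sys.database_recovery_status, the 1-second default delay comes from the observation in the Summary, and the LSN values are borrowed from the symptom log purely as sample data.

```python
class RecoveryStatusView:
    """Toy stand-in for sys.database_recovery_status with a delayed update."""

    def __init__(self, lsn):
        self._visible = lsn
        self._pending = None  # (time at which the new LSN becomes visible, new LSN)

    def log_backup_finished(self, now, new_lsn, delay):
        # The new LSN only becomes visible `delay` seconds after the backup ends.
        self._pending = (now + delay, new_lsn)

    def read(self, now):
        if self._pending is not None and now >= self._pending[0]:
            self._visible = self._pending[1]
            self._pending = None
        return self._visible


def simulate(query_offset, delay=1.0):
    """Return (LSN stored in metadata, LSN seen by the next incremental, gap?)."""
    view = RecoveryStatusView("240000000392000001")
    view.log_backup_finished(now=0.0, new_lsn="241000000328000001", delay=delay)
    stored = view.read(now=query_offset)  # value the plug-in would save to sqlmeta.xml
    current = view.read(now=60.0)         # next incremental, well after the update
    return stored, current, stored != current
```

Calling simulate(0.2) models a query issued before the table is updated and reports a gap, whereas simulate(1.5) models a query issued after the update and reports none.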
Resolution
Temporary workaround:
- Force a full backup of this database to resolve this failure.
- This resyncs the LSN number sequence for the specific Database in sqlmeta.xml and SQL server.
- All subsequent incremental backups of this Database should now complete successfully.
Permanent Fix:
- Add the flag "--latest-lsn-from-msdb=true" to the avsql.cmd file and apply the hotfix (HF) that matches the client plug-in version:
  - v19.10-100-135 => No HF available, upgrade to build 166 and apply the corresponding HF
  - v19.10-100-166 (SP1) => HF 338887
  - v19.12-100-186 => HF 338888
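For administrators who script client configuration, adding the flag could look like the sketch below. It is not a Dell-provided procedure: the function ensure_flag is hypothetical, the location of avsql.cmd must be supplied by the caller because it is not stated here, and the code assumes the file holds one flag per line.

```python
from pathlib import Path

FLAG = "--latest-lsn-from-msdb=true"  # flag named in the permanent fix


def ensure_flag(cmd_file, flag=FLAG):
    """Append `flag` on its own line unless it is already present.

    Assumption: avsql.cmd is a plain-text file with one flag per line.
    Returns True if the file was modified, False if the flag was already there.
    """
    path = Path(cmd_file)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if flag in (line.strip() for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the existing last flag on its own line
    path.write_text(text + flag + "\n", encoding="utf-8")
    return True
```

The flag does not replace the hotfix: both parts of the permanent fix are needed.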
To download the hotfix from the Dell Support site, see the steps described in "Avamar: How to find and download a product hotfix, patch, installation, or upgrade package from the Dell Support website".
To apply the hotfix, follow the instructions in the README file provided by Dell Support, or see the Dell Knowledge Base article for the specific hotfix.
Caution: If the issue persists after applying the hotfix, contact Dell Support for further assistance.