Avamar: A checkpoint validation (hfscheck) is overdue due to too many messages in err.log
Summary: Avamar reports "Symptom Code 22409 - A checkpoint validation (hfscheck) is overdue" if too many err.log files are created in a short period of time.
Symptoms
This event may be generated even when checkpoint validations (hfscheck) have been completing successfully every day.
Error messages received from High Priority Events, Dial Home, or Events Management in the Management Console UI:
22409 ERROR A checkpoint validation (hfscheck) of Avamar Server checkpoint data is overdue.
summary="A checkpoint validation (hfscheck) of Avamar Server checkpoint data is overdue."
type="ERROR"
code="22409"
Symptom: 22409, Desc: A checkpoint validation (hfscheck) of server checkpoint data is overdue
Cause
If too many err.log files (/data01/cur/err.log*) are created in a short period of time, the Management Console Server (MCS) cannot process them quickly enough.
As a result, hfscheck completion notifications may not be processed in a timely manner, which causes the error to appear.
Resolution
1. Log in to the Avamar Utility Node as admin and load the admin keys per Avamar: How to Log in to an Avamar Server and Load Various Keys.
2. Verify the number of err.log files on each data node:
mapall --noerror 'ls -l /data01/cur/err.log* | wc -l'
a. If the value is very high, review some of the err.log files for messages that may be repeating in the logs.
Take any corrective action required (not covered in this article) to stop these messages from occurring.
b. If the messages cannot be prevented, a workaround is available that makes the MCS parse only the latest err.log files.
Warning: Other messages (such as hardware events) could be missed if the MCS skips to the latest log.
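To help with step 2a, the following is a minimal Bash sketch for spotting messages that repeat across err.log files. summarize_errlogs is a hypothetical helper, not an Avamar tool. It assumes the err.log files sit under /data01/cur as described above and makes no assumption about the line format beyond replacing runs of digits with N, so that lines differing only in timestamps, offsets, or IDs are counted together.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Avamar): count err.log files in a
# directory and list the most frequent message patterns across them.
# Assumption: err.log files live under /data01/cur, as in this article.
summarize_errlogs() {
  local dir="${1:-/data01/cur}"   # directory holding err.log*
  local top="${2:-10}"            # how many patterns to show
  local files=("$dir"/err.log*)

  # Without nullglob, an unmatched glob stays literal, so test the first entry.
  if [ ! -e "${files[0]}" ]; then
    echo "err.log files: 0"
    return 0
  fi
  echo "err.log files: ${#files[@]}"

  # Replace every digit run with N so lines that differ only in numbers
  # collapse into one pattern, then rank patterns by how often they occur.
  cat -- "${files[@]}" \
    | sed -E 's/[0-9]+/N/g' \
    | sort | uniq -c | sort -rn | head -n "$top"
}
```

For example, after pasting the function into a shell on a data node, summarize_errlogs /data01/cur 20 prints the file count followed by the 20 most frequent patterns. On very large logs, it may be quicker to point it at a copy of only the newest few files.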
3. Compare the errloglen value for each node with the current start offset (error_idx) stored in the MCS Database (MCDB):
avmaint nodelist | grep errloglen
psql -d mcdb -p 5555 -c 'select error_key, error_idx from sv_elog_cache;'
Sample outputs from a single-node grid:
avmaint nodelist | grep errloglen
errloglen="413993"
psql -d mcdb -p 5555 -c 'select error_key, error_idx from sv_elog_cache;'
error_key | error_idx
-----------+-----------
0.0 | 413993
(1 row)
Sample output from a multinode grid:
avmaint nodelist | grep errloglen
errloglen="15909111"
errloglen="15481012"
errloglen="11157648"
psql -d mcdb -p 5555 -c 'select error_key, error_idx from sv_elog_cache;'
error_key | error_idx
-----------+-----------
0.s | 5964206
0.2 | 15909111
0.0 | 11157648
0.1 | 15481012
(4 rows)
If the values differ, continue with step 4 to set the move_to_last_error value.
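The comparison in step 3 can also be scripted. The sketch below is built on stated assumptions: compare_offsets is a hypothetical helper, and because the grep output of avmaint nodelist shown above carries no node IDs, it matches by value, checking that every errloglen value appears as some error_idx value. It relies on the standard psql options -t (tuples only) and -A (unaligned) so that the query prints bare numbers, one per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Avamar): check that every errloglen
# value reported by avmaint nodelist appears among the MCDB error_idx values.
# Assumption: matching by value is acceptable because the nodelist lines
# used here do not identify the node.
compare_offsets() {
  local lens_text="$1"   # output of: avmaint nodelist (or its errloglen lines)
  local idx_text="$2"    # error_idx values, one per line
  local lens len status=0

  # Pull the numeric part out of each errloglen="NNN" attribute.
  lens=$(printf '%s\n' "$lens_text" | grep -oE 'errloglen="[0-9]+"' | grep -oE '[0-9]+')
  if [ -z "$lens" ]; then
    echo "no errloglen values found" >&2
    return 2
  fi

  # A here-string keeps the loop in the current shell, so status survives.
  while read -r len; do
    if printf '%s\n' "$idx_text" | grep -qxF -- "$len"; then
      echo "errloglen=$len: matched"
    else
      echo "errloglen=$len: no matching error_idx"
      status=1
    fi
  done <<< "$lens"
  return "$status"
}
```

Run on the utility node with the admin keys loaded, for example: compare_offsets "$(avmaint nodelist)" "$(psql -d mcdb -p 5555 -t -A -c 'select error_idx from sv_elog_cache;')". A non-zero exit status means at least one node's errloglen has no matching error_idx. Extra error_idx rows, such as 0.s in the multinode sample, are ignored.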
4. Set the move_to_last_error value to true:
a. Make a backup copy of mcserver.xml:
cp -p /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml /usr/local/avamar/var/mc/server_data/prefs/x-mcserver.xml.`date +%y%m%d%H%M`
b. Edit mcserver.xml:
vi /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml
c. Change the move_to_last_error entry to true:
<entry key="move_to_last_error" value="true" />
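As a non-interactive alternative to editing mcserver.xml in vi, the following sketch takes the same timestamped backup and rewrites the entry with GNU sed. set_move_to_last_error is a hypothetical helper; it assumes the move_to_last_error entry already exists in the file in the form shown above and that GNU sed (for in-place editing with -i) is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Avamar): back up mcserver.xml and set
# the move_to_last_error entry to true or false.
# Assumptions: the entry already exists; GNU sed provides -i.
set_move_to_last_error() {
  local file="$1" value="$2"

  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 2 ;;
  esac

  if ! grep -q 'key="move_to_last_error"' "$file"; then
    echo "move_to_last_error entry not found in $file" >&2
    return 1
  fi

  # Same backup naming as the manual step: x-mcserver.xml.YYMMDDHHMM
  cp -p "$file" "$(dirname "$file")/x-$(basename "$file").$(date +%y%m%d%H%M)"

  # Replace only the value attribute of the move_to_last_error entry.
  sed -i -E "s|(<entry key=\"move_to_last_error\" value=\")[^\"]*(\")|\1${value}\2|" "$file"

  # Show the resulting line so the change can be checked by eye.
  grep 'key="move_to_last_error"' "$file"
}
```

For step 4 this would be run as set_move_to_last_error /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml true, and for step 7 with false. Because the backup name only has minute resolution, a second run within the same minute overwrites the earlier backup.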
5. Stop and start MCS per Avamar: How to restart the management console server.
6. Validate that error_idx and errloglen now match for each node, using the commands from step 3.
7. Revert the move_to_last_error value to false:
a. Make another backup copy of mcserver.xml:
cp -p /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml /usr/local/avamar/var/mc/server_data/prefs/x-mcserver.xml.`date +%y%m%d%H%M`
b. Edit mcserver.xml:
vi /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml
c. Change the move_to_last_error entry to false:
<entry key="move_to_last_error" value="false" />
8. Stop and start MCS again per Avamar: How to restart the management console server.
9. Review the err.log files to ensure that no other issues are present on the Avamar server.
If Symptom Code 22409 is still occurring, return to Avamar: Symptom Code 22409 - A checkpoint validation (hfscheck) of server checkpoint data is overdue (Resolution Path) for further assistance.