Avamar: HFSCHECK Fails with MSG_ERR_ROLLING_CHECK due to an Avamar Server time shift
Summary: An incorrect time source causes the Avamar checkpoint validation (hfscheck) to fail.
Symptoms
Primary Symptoms:
Maintenance activities (checkpoint and hfscheck) do not complete successfully.
Checkpoint validation (hfscheck) fails with MSG_ERR_ROLLING_CHECK.
Because the checkpoint that appears to be the most recent is already validated, it cannot be validated again.
The cplist command shows a checkpoint with a future date and time:
cplist
cp.20201103213238 Wed Nov 3 16:32:38 2020 valid --- del nodes 1/1 stripes 4779
cp.20201104153925 Thu Nov 4 10:39:25 2020 valid --- --- nodes 1/1 stripes 4796
cp.20320218180432 Wed Feb 18 12:04:32 2032 valid rol --- nodes 1/1 stripes 4521
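A future-dated checkpoint can be spotted by comparing the timestamp embedded in each checkpoint tag with the current time. The following is a minimal sketch, not a Dell-provided tool. The function name future_checkpoints is hypothetical, and the sketch assumes the tag has the form cp.YYYYMMDDHHMMSS in UTC, which is consistent with the sample output above but is not confirmed by this article.

```shell
#!/bin/sh
# Hypothetical helper: print checkpoint tags dated later than a reference time.
# Input:    cplist-style lines on stdin (first field is the checkpoint tag).
# Argument: reference time as YYYYMMDDHHMMSS; defaults to the current UTC time.
# ASSUMPTION: the tag format is cp.YYYYMMDDHHMMSS and the timestamp is in UTC.
future_checkpoints() {
    ref="${1:-$(date -u +%Y%m%d%H%M%S)}"
    awk -v ref="$ref" '
        $1 ~ /^cp\.[0-9]+$/ {
            stamp = substr($1, 4)    # drop the "cp." prefix
            # Compare numerically; 14-digit values are exact in awk doubles.
            if (length(stamp) == 14 && (stamp + 0) > (ref + 0)) print $1
        }'
}
```

For example, `cplist | future_checkpoints` lists any tag dated after the current time. With the sample output above and a reference time of 20201105000000, only cp.20320218180432 is printed.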
Attempting to manually validate the checkpoint fails:
avmaint hfscheck --checkpoint=cp.20201104153925 --rolling
ERROR: avmaint: hfscheck: server_exception(MSG_ERR_ROLLING_CHECK)
Secondary Symptoms:
Aside from the maintenance errors, backups may be listed with dates in the future. As a result:
- Forecasting and reports are incorrect.
- The Management Console Server (MCS) UI information may be incorrect.
- Certain reports (for example, 'Completed Activities - Failed' and 'Activity Report') are incorrect.
Sample output for a backup that ran in 2019 but is reported with a 2032 date:
mccli backup show --name=/clients/client_name
0,23000,CLI command completed successfully.
Created LabelNum Size Retention
----------------------------- -------- ----------- ---------
2032-02-17 19:01:41 GMT-06:00 15 23875106816 DMY
-- Or an MCS flush: --
avtar --backups --id=MCUser --ap=<password> --path=/MC_BACKUPS --hfsaddr=<server> --count=26
Date Time Seq Label Size Plugin Working directory Targets
---------- -------- ------ ------- ------- -------- -------------------- -------------------------
2032-02-15 09:01:05 8373 116436K Linux /usr/local/avamar var/mc/server_data/
Cause
A time server issue caused the clock on the Avamar grid to reset to 2032.
A checkpoint validation ran while the clock was wrong, so the grid now holds a validated checkpoint dated 2032.
The engineering team confirmed that this is the expected behavior when an incorrect time source is used.
Resolution
Caveats:
If discrepancies are found in backup times (per the Secondary Symptoms above), these must be corrected first. In that case, do not perform the steps in this article; instead, contact Dell Technologies Technical Support for further assistance.
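One way to check for such discrepancies is to scan the backup listings shown under Symptoms for creation dates later than today. The sketch below is an illustration only. The function name future_dated_lines is hypothetical, and it assumes each data row starts with a YYYY-MM-DD date, as in the mccli and avtar samples in this article.

```shell
#!/bin/sh
# Hypothetical helper: print listing rows whose leading date is in the future.
# Input:    mccli/avtar-style backup listing on stdin.
# Argument: reference date as YYYY-MM-DD; defaults to today's local date.
# ASSUMPTION: data rows begin with a YYYY-MM-DD date, as in the samples above.
future_dated_lines() {
    ref="${1:-$(date +%Y-%m-%d)}"
    awk -v ref="$ref" '
        # ISO dates sort correctly as strings; appending "" forces a string compare.
        $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ && ($1 "") > (ref "") { print }'
}
```

For example, `mccli backup show --name=/clients/client_name | future_dated_lines` prints only the rows dated after today. Any output means that backup times are affected and the caveat above applies.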
To resolve the hfscheck issue:
1. Reset the scheduler:
avmaint sched reset --ava
2. Stop and start the scheduler:
avmaint sched stop --ava
avmaint sched start --ava
3. Take a checkpoint.
4. Perform hfscheck.
5. Monitor the grid through the next scheduled maintenance window to verify that the issue is resolved.
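The sequence above can be sketched as a small script. Steps 1 and 2 use the commands given in this article. The commands shown for steps 3 and 4 (`avmaint checkpoint --ava` and `avmaint hfscheck --full --ava`) are assumptions that this article does not state, so confirm them for your Avamar version before use. The helper names run and resolve_hfscheck are hypothetical. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the resolution sequence; prints commands unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"          # dry run: show the command without executing it
    else
        "$@"
    fi
}

resolve_hfscheck() {
    # Step 1: reset the scheduler (command from this article).
    run avmaint sched reset --ava || return 1
    # Step 2: stop and start the scheduler (commands from this article).
    run avmaint sched stop --ava || return 1
    run avmaint sched start --ava || return 1
    # Step 3: take a checkpoint (ASSUMED command, not stated in this article).
    run avmaint checkpoint --ava || return 1
    # Step 4: perform hfscheck (ASSUMED command, not stated in this article).
    run avmaint hfscheck --full --ava || return 1
}
```

Calling `resolve_hfscheck` prints the five commands in order. Running the steps one at a time, and confirming with cplist that the new checkpoint exists before starting hfscheck, is the more cautious approach.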