Avamar: Scheduled backups for random clients fail with "Fatal Server Error occurred"
Summary: Scheduled backups for random clients fail with ERROR "avtar FATAL <5704>: Fatal Server Error occurred (MSG_ERR_AUTH_FAIL)"
Symptoms
Scheduled backups for random clients fail with ERROR "avtar FATAL <5704>: Fatal Server Error occurred (MSG_ERR_AUTH_FAIL)." The same backups run fine when kicked off as manual backups.
Backup logs (which can be obtained from the Avamar Administrator UI or from the client under C:\Program Files\avs\var\) show the following error messages:
2019-01-07 13:02:36 avtar Info <9772>: Starting graceful (staged) termination, Fatal server error (wrap-up stage)
2019-01-07 13:02:36 avtar FATAL <5704>: Fatal Server Error occurred (MSG_ERR_AUTH_FAIL), aborting execution (HASHLIST_IS_PRESENT=406 serial=2 seq=0 flags=R:N:0 kind=0 rsp=MSG_ERR_AUTH_FAIL)
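When many client logs need checking, the search for this error can be scripted. The following is a minimal sketch, not an Avamar tool: the function name `find_auth_failures` and the sample data are hypothetical, and the pattern assumes the log line format shown in the excerpt above.

```python
import re

# Matches the avtar FATAL <5704> line that reports MSG_ERR_AUTH_FAIL.
# The pattern is based only on the log excerpt in this article.
AUTH_FAIL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) avtar FATAL <5704>: "
    r"Fatal Server Error occurred \(MSG_ERR_AUTH_FAIL\)"
)


def find_auth_failures(lines):
    """Return the timestamp of every line that reports the auth failure."""
    hits = []
    for line in lines:
        match = AUTH_FAIL_RE.match(line.strip())
        if match:
            hits.append(match.group("ts"))
    return hits


# Sample input copied from the log excerpt above.
sample = [
    "2019-01-07 13:02:36 avtar Info <9772>: Starting graceful (staged) "
    "termination, Fatal server error (wrap-up stage)",
    "2019-01-07 13:02:36 avtar FATAL <5704>: Fatal Server Error occurred "
    "(MSG_ERR_AUTH_FAIL), aborting execution (HASHLIST_IS_PRESENT=406 "
    "serial=2 seq=0 flags=R:N:0 kind=0 rsp=MSG_ERR_AUTH_FAIL)",
]
failures = find_auth_failures(sample)
print(failures)
```

Comparing the returned timestamps across clients helps confirm whether the failures cluster around one scheduled start time.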
Following are the symptoms seen:
1. The failure occurs only for scheduled backups starting at a specific time (such as the beginning of the backup window)
2. On-demand backups for the same clients whose scheduled backups failed complete without any issues.
3. The affected clients vary from run to run; the failure is not tied to one specific client.
If any of the above symptoms is not present, stop following this KB and engage Dell Avamar Support to investigate the issue further.
Cause
Due to a software limitation, the Avamar Server software (gsan) takes longer than 30 seconds to respond to login requests from clients when many clients (more than 40 in this scenario) begin backing up simultaneously.
Resolution
Stagger the backups so that the clients do not flood gsan with many session ticket login requests simultaneously. The session ticket timeout value is set to 60 seconds by default.
To prevent this issue, stagger the backup start times at least 2 minutes apart.
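The staggering arithmetic can be sketched as below. This is only a planning aid under stated assumptions: it does not call any Avamar API, the function name `stagger_start_times` and the client names are hypothetical, and the batch size of 20 is an illustrative choice (the article states only that more than 40 simultaneous clients triggered the issue in this scenario). The resulting start times would still need to be applied through the usual Avamar schedule configuration.

```python
from datetime import datetime, timedelta


def stagger_start_times(clients, window_start, batch_size=20, gap_minutes=2):
    """Assign each client a start time so that batches begin gap_minutes apart.

    batch_size is an assumed value, not a documented Avamar limit.
    gap_minutes defaults to the 2-minute minimum given in this article.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    schedule = {}
    for index, client in enumerate(clients):
        batch = index // batch_size  # 0 for the first batch, 1 for the next, ...
        schedule[client] = window_start + timedelta(minutes=batch * gap_minutes)
    return schedule


# Hypothetical example: 45 clients and a backup window opening at 20:00.
clients = [f"client-{i:02d}" for i in range(1, 46)]
schedule = stagger_start_times(clients, datetime(2019, 1, 7, 20, 0))
for name in ("client-01", "client-21", "client-45"):
    print(name, schedule[name].strftime("%H:%M"))
```

With these inputs the 45 clients fall into three batches starting at 20:00, 20:02, and 20:04, so no more than 20 clients request a login at the same moment.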