Avamar: SQL client backups fail randomly with sslcontext::loadCert Error with session security enabled
Summary: In Avamar versions prior to 19.8 SQL clients deployed in Microsoft Azure using Data Domain Virtual Edition (DDVE), backups may fail randomly with session security enabled. The failures occur due to SSL certificate and private key validation errors caused by incorrect IP address selection when the client resolves to the IPv6 loopback address instead of IPv4. ...
Symptoms
When session security is enabled, SQL client backups fail intermittently with SSL-related errors. Within the same backup job, some threads complete successfully while others fail.
- The
avsqllogs show the following errors:
2023-01-11 12:02:10.99399 [avsql] uflags::parsefile Printing flags from C:\Program Files\avs\var\avsql.cmd: .cmd flag [1]: --debug=true .cmd flag [2]: --verbose=1 .cmd flag [3]: --x14=32768 2023-01-11 12:02:10.99399 [avsql] sock::libinit(enc="tls", encrypt_strength="high", verify=0) socktype="sock_ssl" 2023-01-11 12:02:10.99399 [avsql] sockimpl::libinit(usessl=1, verify=0, encrypt_strength="high", pemdir="C:\Program Files\avs\etc") 2023-01-11 12:02:10.99399 [avsql] sslcontext::libinit 2023-01-11 12:02:10.99399 [avsql] sslcontext::libinit verifypeer=0, encrypt_strength="high", global_sslctx=0000000001950D90, global_sslctx->ctx=000000000119B630 2023-01-11 12:02:10.99399 [avsql] sslcontext::libinit load cert: C:\Program Files\avs\etc\cert.pem 2023-01-11 12:02:10.99399 [avsql] sslcontext::libinit load key: C:\Program Files\avs\etc\key.pem 2023-01-11 12:02:10.99399 [avsql] ERROR: <0001> sslcontext::loadCert certificate/key not found or invalid cert=C:\Program Files\avs\etc\cert.pem key=C:\Program Files\avs\etc\key.pem 2023-01-11 12:02:10 avsql Error <5664>: SSL certificate/key not found or invalid. 2023-01-11 12:02:10 avsql Error <16154>: Unable to initialize socket library
- In the same work order, some
avtarthreads complete successfully, while others fail with certificate errors.
Successful avtar thread example:
2023-01-11 12:02:11 avtar Info <5008>: Logging to C:\Program Files\avs\var\SCH-60-G-TLOG-SQL-60-1673449200555#2-3006-SQL.avtar.log 2023-01-11 12:02:11 avtar Info <5174>: - Reading C:\Program Files\avs\var\avtar.cmd 2023-01-11 12:02:11 avtar Info <5551>: Command Line: avtar --ddr-auth-enabled=false --ddr-encrypt-strength=none --ddr-auth-mode=3 --case_sensitive=false --max-streams=1 --backup-type=incremental --cacheprefix=avsql_t1 --ctlcallport=64502 --ctlinterface=3006-SCH-60-G-TLOG-SQL-60-1673449200555#2 --check-stdin-path=false --logfile="C:\Program Files\avs\var\SCH-60-G-TLOG-SQL-60-1673449200555#2-3006-SQL.avtar.log" --vardir="C:\Program Files\avs\var" --bindir="C:\Program Files\avs\bin" --sysdir="C:\Program Files\avs\etc" --account=/Azure/AZ-Prod/cldsrv1088.arcorgroup.com --id=backuponly --password=**************** --server=cldlnx0063.arcorgroup.com --ctlusessl=true 2023-01-11 12:02:11 avtar Info <7977>: Starting at 2023-01-11 12:02:11 Argentina Standard Time [avtar Oct 16 2020 15:51:11 19.4.100-116 Windows Server 2019 Datacenter Server Edition (No Service Pack) 64-bit-AMD64] 2023/01/11-15:02:11.03500 [avtar] <1291> FIPS mode enabled 2023-01-11 12:02:11 avtar Info <10684>: Setting ctl message version to 3 (from 1) 2023-01-11 12:02:11 avtar Info <16136>: Setting ctl max message size to 268435456 2023-01-11 12:02:11 avtar Info <6767>: Successfully connected to 127.0.0.1:64502
avtar thread example:
2023-01-11 12:02:30 avtar Info <5008>: Logging to C:\Program Files\avs\var\SCH-60-G-TLOG-SQL-60-1673449200555#0-3006-SQL.avtar.log 2023-01-11 12:02:30 avtar Info <5174>: - Reading C:\Program Files\avs\var\avtar.cmd 2023-01-11 12:02:30 avtar Info <5551>: Command Line: avtar --ddr-auth-enabled=false --ddr-encrypt-strength=none --ddr-auth-mode=3 --case_sensitive=false --max-streams=1 --backup-type=incremental_full --ctlcallport=64502 --ctlinterface=3006-SCH-60-G-TLOG-SQL-60-1673449200555 --check-stdin-path=false --logfile="C:\Program Files\avs\var\SCH-60-G-TLOG-SQL-60-1673449200555#0-3006-SQL.avtar.log" --vardir="C:\Program Files\avs\var" --bindir="C:\Program Files\avs\bin" --sysdir="C:\Program Files\avs\etc" --account=/Azure/AZ-Prod/cldsrv1088.arcorgroup.com --id=backuponly --password=**************** --server=cldlnx0063.arcorgroup.com --ctlusessl=true 2023-01-11 12:02:30 avtar Info <7977>: Starting at 2023-01-11 12:02:30 Argentina Standard Time [avtar Oct 16 2020 15:51:11 19.4.100-116 Windows Server 2019 Datacenter Server Edition (No Service Pack) 64-bit-AMD64] 2023/01/11-15:02:30.86500 [avtar] <1291> FIPS mode enabled 2023/01/11-15:02:30.86500 [avtar] ERROR: <0001> sslcontext::loadCert certificate/key not found or invalid cert=C:\Program Files\avs\etc\cert.pem key=C:\Program Files\avs\etc\key.pem 2023-01-11 12:02:30 avtar Error <5664>: SSL certificate/key not found or invalid. 2023-01-11 12:02:30 avtar FATAL <19035>: Unable to initialize socket library for CTL.
The behavior is inconsistent, with failures occurring randomly across backup threads.
Cause
During failure scenarios, the private key received from the Avamar server is invalid. As a result, the OpenSSL function SSL_CTX_use_PrivateKey_file() fails.
The Avamar client agent selects the first reachable host address, which may be the IPv6 loopback address (::1) because it is included in the hostname list. Although the client sends an IPv6 address to the Management Console (MC), the MC stores only the IPv4 address. This address mismatch causes SSL certificate validation failures.
Resolution
Add the following flags in avagent.cmd on the impacted client and restart avagent to circumvent this issue:
--ipupdateinterval=43200 (Number of minutes between IP address update) --addr-family=4 (Force to use IPv4) --netbind=10.x.x.x (Force association with this network address rather than OS autoselection)
Parameter descriptions:
--ipupdateintervalspecifies the interval (in minutes) for updating the client IP address.--addr-family=4forces the client to use IPv4.--netbindbinds the agent to a specific IPv4 address instead of automatic OS selection.
After applying the changes, restart the Avamar agent and rerun the backup.
Permanent Fix
This issue has been addressed in V19.8. The code changes have been made to ignore the localhost paging address.