Avamar: Checkpoint fails with result "MSG_ERR_BADTIMESYNC"
Summary: Checkpoints fail with the result "MSG_ERR_BADTIMESYNC".
Symptoms
The checkpoint fails with result "MSG_ERR_BADTIMESYNC".
The "avmaint cpstatus" command shows the following error:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cpstatus
generation-time="1663935384"
tag="cp.20220923121551"
status="error"
stripes-completed="0"
stripes-total="0"
start-time="1663935351"
end-time="1663935351"
result="MSG_ERR_BADTIMESYNC"
refcount="1"/>
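When the status has to be checked repeatedly, or from a script, the XML can be parsed rather than read by eye. The following is a minimal sketch, not an Avamar tool: the function name `summarize_cpstatus` is hypothetical, and treating `start-time` as Unix epoch seconds is an assumption (it is consistent with the tag `cp.20220923121551` in the sample).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Sample copied from the cpstatus output shown above.
CPSTATUS_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cpstatus
  generation-time="1663935384"
  tag="cp.20220923121551"
  status="error"
  stripes-completed="0"
  stripes-total="0"
  start-time="1663935351"
  end-time="1663935351"
  result="MSG_ERR_BADTIMESYNC"
  refcount="1"/>"""


def summarize_cpstatus(xml_text):
    """Return the fields of interest from a cpstatus XML document."""
    # Parse from bytes so the encoding declaration is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Assumption: start-time is expressed in Unix epoch seconds.
    start = datetime.fromtimestamp(int(root.get("start-time")), tz=timezone.utc)
    return {
        "tag": root.get("tag"),
        "status": root.get("status"),
        "result": root.get("result"),
        "started_utc": start.strftime("%Y-%m-%d %H:%M:%S"),
        "time_sync_failure": root.get("result") == "MSG_ERR_BADTIMESYNC",
    }


summary = summarize_cpstatus(CPSTATUS_XML)
print(summary)
```

In practice the XML would come from capturing the output of `avmaint cpstatus` instead of the embedded sample.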
The "mapall --parallel 'date'" command shows that one node is out of sync:
(To run the mapall command, the keys must be loaded. See Avamar: How to Log in to an Avamar Server and Load Various Keys.)
admin@utility:~/>: mapall --parallel 'date'
Using /usr/local/avamar/var/probe.xml
(0.0) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.2 'date'
(0.1) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.3 'date'
(0.2) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.4 'date'
(0.3) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.5 'date'
(0.4) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.6 'date'
(0.7) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.9 'date'
(0.6) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.8 'date'
(0.5) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.7 'date'
Fri Sep 23 13:05:21 UTC 2022
Fri Sep 23 13:05:21 UTC 2022
Fri Sep 23 13:07:17 UTC 2022 <---- out of sync node
Fri Sep 23 13:05:20 UTC 2022
Fri Sep 23 13:05:22 UTC 2022
Fri Sep 23 13:05:20 UTC 2022
Fri Sep 23 13:05:22 UTC 2022
Fri Sep 23 13:05:21 UTC 2022
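On a grid with many nodes, the outlier is easier to find by comparing each timestamp with the median. The sketch below does that for the eight timestamps above. The 10-second threshold and the function names are hypothetical and are not Avamar settings. The parser assumes `date` output in the default C locale and in UTC, as in the sample. With `--parallel`, the order of the output lines may not match the node order, so the index identifies an output line, not necessarily a node.

```python
from datetime import datetime, timezone
from statistics import median

# Timestamps copied from the mapall output above, in the order printed.
DATE_LINES = [
    "Fri Sep 23 13:05:21 UTC 2022",
    "Fri Sep 23 13:05:21 UTC 2022",
    "Fri Sep 23 13:07:17 UTC 2022",
    "Fri Sep 23 13:05:20 UTC 2022",
    "Fri Sep 23 13:05:22 UTC 2022",
    "Fri Sep 23 13:05:20 UTC 2022",
    "Fri Sep 23 13:05:22 UTC 2022",
    "Fri Sep 23 13:05:21 UTC 2022",
]

MAX_SKEW_SECONDS = 10  # hypothetical threshold, not an Avamar setting


def parse_date(line):
    """Convert one line of `date` output (C locale, UTC) to epoch seconds."""
    parsed = datetime.strptime(line, "%a %b %d %H:%M:%S UTC %Y")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def find_skewed(lines, max_skew=MAX_SKEW_SECONDS):
    """Return (line_index, offset_seconds) for lines far from the median."""
    stamps = [parse_date(line) for line in lines]
    mid = median(stamps)
    return [(i, s - mid) for i, s in enumerate(stamps) if abs(s - mid) > max_skew]


skewed = find_skewed(DATE_LINES)
print(skewed)
```

The median is used instead of the mean so that a single badly skewed clock does not shift the reference point.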
Checking Network Time Protocol (NTP) with "ntpq -pn" shows the message "Connection refused" on the suspect node (output edited to show only the affected node):
admin@utility:~/>: mapall --noerror '/usr/sbin/ntpq -pn'
Using /usr/local/avamar/var/probe.xml
...
(0.3) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.5 '/usr/sbin/ntpq -pn'
/usr/sbin/ntpq: read: Connection refused
...
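The two outcomes of `ntpq -pn` that matter here can be told apart mechanically. The sketch below classifies a captured output as daemon down, synchronized, or not synchronized. The category names and the function name are hypothetical. It relies on ntpq marking the peer currently selected for synchronization with a leading `*`. The healthy sample is the one shown in the Resolution section of this article.

```python
def classify_ntpq(output):
    """Classify `ntpq -pn` output as 'daemon-down', 'synced', or 'unsynced'."""
    if "Connection refused" in output:
        # ntpq could not reach the local ntpd at all.
        return "daemon-down"
    for line in output.splitlines():
        # A leading '*' marks the peer ntpd has selected as its time source.
        if line.startswith("*"):
            return "synced"
    return "unsynced"


REFUSED = "/usr/sbin/ntpq: read: Connection refused"

HEALTHY = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.241.216.209  10.233.131.242   2 u  966 1024  377    0.558    1.559   0.600
+192.168.255.21  10.241.216.209   3 u  401 1024  377    0.152    0.521   0.420
"""

print(classify_ntpq(REFUSED))
print(classify_ntpq(HEALTHY))
```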
When the status is checked directly as root on the affected node, the Network Time Protocol Daemon (NTPD) shows "Active: activating (auto-restart) (Result: resources)":
root@node03:~/>: systemctl status ntpd
● ntpd.service - NTP Server Daemon
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
Drop-In: /run/systemd/generator/ntpd.service.d
└─50-insserv.conf-$time.conf
Active: activating (auto-restart) (Result: resources) since Fri 2022-09-23 13:22:35 UTC; 1min 58s ago
The status should be "Active: active (running)":
● ntpd.service - NTP Server Daemon
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
Drop-In: /run/systemd/generator/ntpd.service.d
└─50-insserv.conf-$time.conf
Active: active (running) since Fri 2022-09-23 14:04:37 UTC; 26s ago
An attempt to start NTPD fails:
root@node03:~/#: systemctl start ntpd.service
Job for ntpd.service failed because a configured resource limit was exceeded. See "systemctl status ntpd.service" and "journalctl -xe" for details.
The output of "journalctl -xe" reports "No space left on device" messages.
The "df" command shows that /var is 100% full:
df -kh
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 8.0K 16G 1% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 50M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda5 9.8G 2.4G 7.0G 26% /
/dev/sdg1 183G 8.3G 165G 5% /ssd01
/dev/sda1 979M 50M 878M 6% /boot
/dev/sdd1 1.9T 236G 1.6T 13% /data04
/dev/sdc1 1.9T 240G 1.6T 13% /data03
/dev/sde1 1.9T 236G 1.6T 13% /data05
/dev/sdf1 1.9T 238G 1.6T 13% /data06
/dev/sdb1 1.9T 238G 1.6T 13% /data02
/dev/sda7 2.0G 2.0G 0 100% /var <------- 100% Use
/dev/sda3 1.8T 267G 1.6T 15% /data01
Cause
NTPD relies on /var/lib/ntp/drift/ntp.drift, which holds the most recent estimate of the clock frequency error.
If /var is 100% full, NTPD cannot create or update ntp.drift, and NTP does not work correctly.
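Because the failure starts with a full file system, the condition can be caught from `df` output before NTPD stops. The sketch below scans `df` output for mount points at or above a usage threshold. The 95% threshold and the function name are hypothetical. The sample is a subset of the `df -kh` output in the Symptoms section.

```python
# A subset of the df output shown in the Symptoms section.
DF_OUTPUT = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       9.8G  2.4G  7.0G  26% /
/dev/sda1       979M   50M  878M   6% /boot
/dev/sda7       2.0G  2.0G     0 100% /var
/dev/sda3       1.8T  267G  1.6T  15% /data01
"""


def full_mounts(df_output, threshold=95):
    """Return (mount_point, use_percent) pairs at or above `threshold`."""
    hits = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete df row
        percent = fields[4].rstrip("%")
        if not percent.isdigit():
            continue  # some pseudo file systems report '-' here
        if int(percent) >= threshold:
            hits.append((fields[5], int(percent)))
    return hits


print(full_mounts(DF_OUTPUT))
```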
Resolution
1. On the affected node, investigate and resolve the 100% usage of /var.
2. Once this is corrected:
a. Restart NTPD:
root@node03:~/#: systemctl restart ntpd
b. Check the status of ntpd:
root@node03:~/#: systemctl status ntpd
Output similar to the following should be displayed:
● ntpd.service - NTP Server Daemon
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-09-27 21:21:42 UTC; 37s ago
Docs: man:ntpd(1)
Process: 29442 ExecStart=/usr/sbin/start-ntpd start (code=exited, status=0/SUCCESS)
Main PID: 29463 (ntpd)
Tasks: 2
CGroup: /system.slice/ntpd.service
├─29463 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
└─29464 ntpd: asynchronous dns resolver
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 3 bond0 10.n.n.52:123
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 4 bond1 192.168.255.22:123
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 5 lo [::1]:123
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 6 bond0 [fe80::260:16ff:feaa:2a10%11]:123
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 7 bond1 [fe80::260:16ff:fea9:b182%12]:123
Sep 27 21:21:42 node03 ntpd[29463]: Listening on routing socket on fd #24 for interface updates
Sep 27 21:21:42 node03 start-ntpd[29442]: Starting network time protocol daemon (NTPD)
Sep 27 21:21:42 node03 systemd[1]: Started NTP Server Daemon.
c. Verify NTP with ntpq:
root@node03:~/#: /usr/sbin/ntpq -pn
Output similar to the following should be displayed:
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.241.216.209 10.233.131.242 2 u 966 1024 377 0.558 1.559 0.600
+192.168.255.21 10.241.216.209 3 u 401 1024 377 0.152 0.521 0.420
3. Confirm the resolution by running a manual checkpoint from the Avamar Utility Node.
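Step 1 does not say how to find what is filling /var. One possible starting point is to rank the directories directly under /var by the total size of the files they contain. The sketch below does this with the Python standard library. The function name `largest_dirs` is hypothetical, and the approach is a simplification: it sums apparent file sizes, so sparse files and files that were deleted but are still held open by a process are not accounted for.

```python
import os


def largest_dirs(root, top=5):
    """Return up to `top` immediate subdirectories of `root`, largest first."""
    totals = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        size = 0
        # os.walk silently skips directories it cannot read.
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    size += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        totals.append((size, entry.path))
    totals.sort(reverse=True)
    return [(path, size) for size, path in totals[:top]]


if __name__ == "__main__":
    # Run as root on the affected node to include directories owned by other users.
    for path, size in largest_dirs("/var"):
        print(f"{size / 1024**2:10.1f} MiB  {path}")
```

Which files are safe to remove depends on the system and is not determined by this listing.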