Avamar: Checkpoint failed with result "MSG_ERR_BADTIMESYNC."
Resumen: Checkpoint failed with result "MSG_ERR_BADTIMESYNC."
Este artículo se aplica a
Este artículo no se aplica a
Este artículo no está vinculado a ningún producto específico.
No se identifican todas las versiones del producto en este artículo.
Síntomas
Checkpoint failed with result "MSG_ERR_BADTIMESYNC."
avmaint cpstatus shows the following error:
Every 2.0s: avmaint cpstatus 20:16:23 2022 <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <cpstatus generation-time="1663935384" tag="cp.20220923121551" status="error" stripes-completed="0" stripes-total="0" start-time="1663935351" end-time="1663935351" result="MSG_ERR_BADTIMESYNC" refcount="1"/>
mapall --parallel 'date' shows a node out of sync:
admin@utility:~/>: mapall --parallel 'date' Using /usr/local/avamar/var/probe.xml (0.0) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.2 'date' (0.1) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.3 'date' (0.2) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.4 'date' (0.3) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.5 'date' (0.4) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.6 'date' (0.7) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.9 'date' (0.6) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.8 'date' (0.5) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.7 'date' Fri Sep 23 13:05:21 UTC 2022 Fri Sep 23 13:05:21 UTC 2022 Fri Sep 23 13:07:17 UTC 2022 <---- out of sync node Fri Sep 23 13:05:20 UTC 2022 Fri Sep 23 13:05:22 UTC 2022 Fri Sep 23 13:05:20 UTC 2022 Fri Sep 23 13:05:22 UTC 2022 Fri Sep 23 13:05:21 UTC 2022
Verifying ntp with ntpq -pn shows connection refused on the suspect node.
Output edited to show only the affected node:
admin@utility:~/>: mapall --noerror '/usr/sbin/ntpq -pn' Using /usr/local/avamar/var/probe.xml (0.3) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.5 '/usr/sbin/ntpq -pn' /usr/sbin/ntpq: read: Connection refused
When verifying the status directly on the affected node we see that ntpd is Active: activating (auto-restart) (Result: resources):
root@node03:~/>: systemctl status ntpd ● ntpd.service - NTP Server Daemon Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled) Drop-In: /run/systemd/generator/ntpd.service.d └─50-insserv.conf-$time.conf Active: activating (auto-restart) (Result: resources) since Fri 2022-09-23 13:22:35 UTC; 1min 58s ago
However the status should reflect Active: active (running):
root@node03:~/#: systemctl status ntpd ● ntpd.service - NTP Server Daemon Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled) Drop-In: /run/systemd/generator/ntpd.service.d └─50-insserv.conf-$time.conf Active: active (running) since Fri 2022-09-23 14:04:37 UTC; 26s ago
Attempting to start ntpd fails:
root@node03:~/#: systemctl start ntpd.service Job for ntpd.service failed because a configured resource limit was exceeded. See "systemctl status ntpd.service" and "journalctl -xe" for details.
During the review of journalctl -xe, 'No space left on the device' messages are seen.
df shows that /var is at 100%:
admin@node03:~/#: df -kh Filesystem Size Used Avail Use% Mounted on devtmpfs 16G 8.0K 16G 1% /dev tmpfs 16G 0 16G 0% /dev/shm tmpfs 16G 50M 16G 1% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/sda5 9.8G 2.4G 7.0G 26% / /dev/sdg1 183G 8.3G 165G 5% /ssd01 /dev/sda1 979M 50M 878M 6% /boot /dev/sdd1 1.9T 236G 1.6T 13% /data04 /dev/sdc1 1.9T 240G 1.6T 13% /data03 /dev/sde1 1.9T 236G 1.6T 13% /data05 /dev/sdf1 1.9T 238G 1.6T 13% /data06 /dev/sdb1 1.9T 238G 1.6T 13% /data02 /dev/sda7 2.0G 2.0G 0 100% /var <------- /dev/sda3 1.8T 267G 1.6T 15% /data01
Causa
ntpd reliies on /var/lib/ntp/drift/ntp.drift which contains the latest estimate of clock frequency error. If /var is at 100% then ntpd cannot update or create the ntp.drift file and ntp will not function correctly.
Resolución
On the affected node, investigate and resolve 100% usage on /var. Once corrected restart ntpd:
Check the status of ntpd:
Results similar to the following should be observed:
Verify ntp with ntpq:
Results similar to the following should be observed:
Test the resolution by running a manual checkpoint from the utility node.
root@node03:~/#: systemctl restart ntpd
NOTE: A successful restart will not generate any output.
Check the status of ntpd:
root@node03:~/#: systemctl status ntpd
Results similar to the following should be observed:
root@node03:~/#: systemctl status ntpd ● ntpd.service - NTP Server Daemon Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled) Active: active (running) since Tue 2022-09-27 21:21:42 UTC; 37s ago Docs: man:ntpd(1) Process: 29442 ExecStart=/usr/sbin/start-ntpd start (code=exited, status=0/SUCCESS) Main PID: 29463 (ntpd) Tasks: 2 CGroup: /system.slice/ntpd.service ├─29463 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf └─29464 ntpd: asynchronous dns resolver Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 3 bond0 10.241.169.52:123 Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 4 bond1 192.168.255.22:123 Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 5 lo [::1]:123 Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 6 bond0 [fe80::260:16ff:feaa:2a10%11]:123 Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 7 bond1 [fe80::260:16ff:fea9:b182%12]:123 Sep 27 21:21:42 node03 ntpd[29463]: Listening on routing socket on fd #24 for interface updates Sep 27 21:21:42 node03 start-ntpd[29442]: Starting network time protocol daemon (NTPD) Sep 27 21:21:42 node03 systemd[1]: Started NTP Server Daemon.
Verify ntp with ntpq:
root@node03:~/#: /usr/sbin/ntpq -pn
Results similar to the following should be observed:
root@node03:~/#: /usr/sbin/ntpq -pn remote refid st t when poll reach delay offset jitter ============================================================================== *10.241.216.209 10.233.131.242 2 u 966 1024 377 0.558 1.559 0.600 +192.168.255.21 10.241.216.209 3 u 401 1024 377 0.152 0.521 0.420
Test the resolution by running a manual checkpoint from the utility node.
Productos afectados
Avamar, Avamar Data Store, Avamar Data Store Gen3, Avamar Data Store Gen4, Avamar Data Store Gen4S, Avamar Data Store Gen4T, Avamar Data Store Gen5A, Avamar Server, Avamar Virtual EditionPropiedades del artículo
Número del artículo: 000203791
Tipo de artículo: Solution
Última modificación: 18 jul 2023
Versión: 3
Encuentre respuestas a sus preguntas de otros usuarios de Dell
Servicios de soporte
Compruebe si el dispositivo está cubierto por los servicios de soporte.