Avamar: Checkpoint failed with result "MSG_ERR_BADTIMESYNC."

摘要: Checkpoint failed with result "MSG_ERR_BADTIMESYNC."

本文适用于 本文不适用于 本文并非针对某种特定的产品。 本文并非包含所有产品版本。

症状

Checkpoint failed with result "MSG_ERR_BADTIMESYNC."



avmaint cpstatus shows the following error:
 
Every 2.0s: avmaint cpstatus  20:16:23 2022

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cpstatus
  generation-time="1663935384"
  tag="cp.20220923121551"
  status="error"
  stripes-completed="0"
  stripes-total="0"
  start-time="1663935351"
  end-time="1663935351"
  result="MSG_ERR_BADTIMESYNC"
  refcount="1"/>

mapall --parallel 'date' shows a node out of sync:
 
admin@utility:~/>: mapall --parallel 'date'
Using /usr/local/avamar/var/probe.xml
(0.0) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.2 'date'
(0.1) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.3 'date'
(0.2) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.4 'date'
(0.3) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.5 'date'
(0.4) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.6 'date'
(0.7) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.9 'date'
(0.6) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.8 'date'
(0.5) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.7 'date'
Fri Sep 23 13:05:21 UTC 2022
Fri Sep 23 13:05:21 UTC 2022
Fri Sep 23 13:07:17 UTC 2022 <---- out of sync node
Fri Sep 23 13:05:20 UTC 2022
Fri Sep 23 13:05:22 UTC 2022
Fri Sep 23 13:05:20 UTC 2022
Fri Sep 23 13:05:22 UTC 2022
Fri Sep 23 13:05:21 UTC 2022

Verifying ntp with ntpq -pn shows connection refused on the suspect node.
Output edited to show only the affected node:
 
admin@utility:~/>: mapall --noerror '/usr/sbin/ntpq -pn'
Using /usr/local/avamar/var/probe.xml

(0.3) ssh -q  -x  -o GSSAPIAuthentication=no admin@192.168.255.5 '/usr/sbin/ntpq -pn'
/usr/sbin/ntpq: read: Connection refused

When verifying the status directly on the affected node we see that ntpd is Active: activating (auto-restart) (Result: resources):
 
root@node03:~/>: systemctl status ntpd
● ntpd.service - NTP Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/ntpd.service.d
           └─50-insserv.conf-$time.conf
   Active: activating (auto-restart) (Result: resources) since Fri 2022-09-23 13:22:35 UTC; 1min 58s ago

However the status should reflect Active: active (running):
 
root@node03:~/#: systemctl status ntpd
● ntpd.service - NTP Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/ntpd.service.d
           └─50-insserv.conf-$time.conf
   Active: active (running) since Fri 2022-09-23 14:04:37 UTC; 26s ago

Attempting to start ntpd fails:
 
root@node03:~/#: systemctl start ntpd.service
Job for ntpd.service failed because a configured resource limit was exceeded. See "systemctl status ntpd.service" and "journalctl -xe" for details.

During the review of journalctl -xe, 'No space left on the device' messages are seen.

df shows that /var is at 100%:
 
admin@node03:~/#: df -kh
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         16G  8.0K   16G   1% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G   50M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/sda5       9.8G  2.4G  7.0G  26% /
/dev/sdg1       183G  8.3G  165G   5% /ssd01
/dev/sda1       979M   50M  878M   6% /boot
/dev/sdd1       1.9T  236G  1.6T  13% /data04
/dev/sdc1       1.9T  240G  1.6T  13% /data03
/dev/sde1       1.9T  236G  1.6T  13% /data05
/dev/sdf1       1.9T  238G  1.6T  13% /data06
/dev/sdb1       1.9T  238G  1.6T  13% /data02
/dev/sda7       2.0G  2.0G     0 100% /var <-------
/dev/sda3       1.8T  267G  1.6T  15% /data01


原因

ntpd reliies on /var/lib/ntp/drift/ntp.drift which contains the latest estimate of clock frequency error. If /var is at 100% then ntpd cannot update or create the ntp.drift file and ntp will not function correctly.

解决方案

On the affected node, investigate and resolve 100% usage on /var. Once corrected restart ntpd:
 
root@node03:~/#: systemctl restart ntpd
 
NOTE: A successful restart will not generate any output.

Check the status of ntpd:
 
root@node03:~/#: systemctl status ntpd

Results similar to the following should be observed:
 
root@node03:~/#: systemctl status ntpd
● ntpd.service - NTP Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2022-09-27 21:21:42 UTC; 37s ago
     Docs: man:ntpd(1)
  Process: 29442 ExecStart=/usr/sbin/start-ntpd start (code=exited, status=0/SUCCESS)
 Main PID: 29463 (ntpd)
    Tasks: 2
   CGroup: /system.slice/ntpd.service
           ├─29463 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
           └─29464 ntpd: asynchronous dns resolver

Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 3 bond0 10.241.169.52:123
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 4 bond1 192.168.255.22:123
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 5 lo [::1]:123
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 6 bond0 [fe80::260:16ff:feaa:2a10%11]:123
Sep 27 21:21:42 node03 ntpd[29463]: Listen normally on 7 bond1 [fe80::260:16ff:fea9:b182%12]:123
Sep 27 21:21:42 node03 ntpd[29463]: Listening on routing socket on fd #24 for interface updates
Sep 27 21:21:42 node03 start-ntpd[29442]: Starting network time protocol daemon (NTPD)
Sep 27 21:21:42 node03 systemd[1]: Started NTP Server Daemon.

Verify ntp with ntpq:
 
root@node03:~/#: /usr/sbin/ntpq -pn

Results similar to the following should be observed:
 
root@node03:~/#: /usr/sbin/ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.241.216.209  10.233.131.242   2 u  966 1024  377    0.558    1.559   0.600
+192.168.255.21  10.241.216.209   3 u  401 1024  377    0.152    0.521   0.420

Test the resolution by running a manual checkpoint from the utility node.

受影响的产品

Avamar, Avamar Data Store, Avamar Data Store Gen3, Avamar Data Store Gen4, Avamar Data Store Gen4S, Avamar Data Store Gen4T, Avamar Data Store Gen5A, Avamar Server, Avamar Virtual Edition
文章属性
文章编号: 000203791
文章类型: Solution
上次修改时间: 18 7月 2023
版本:  3
从其他戴尔用户那里查找问题的答案
支持服务
检查您的设备是否在支持服务涵盖的范围内。