Avamar: How to determine if a grid is experiencing time synchronization (NTP) issues
Summary: How to determine whether an Avamar grid is experiencing a network time synchronization (NTP) issue.
Instructions
Time synchronization between all nodes is essential for the healthy operation of an Avamar grid.
When node clocks are not synchronized, symptoms such as the following may be seen:
- The Avamar server is unable to start
- Nodes go offline
- Checkpoint validation (HFScheck) fails with MSG_ERR_CGSAN_FAILED
- HFScheck fails with MSG_ERR_HFSCHECKERRORS
- Checkpoints fail
- Garbage Collection (GC) fails
- Data consistency issues (if the time changes during garbage collection)
Error messages such as the following may be reported:
- samconn::checkallsucceed request failed DPNTIMECHECK=230
- FATAL ERROR: <0001> dpn time mismatch: synchronize clocks and retry
- ERROR: <0001> dpncheckmanager::verifyStartup cgsan died unexpectedly. terminating
- not enough valid responses received in time
Possible causes include:
- Problems with the time synchronization (ntpd) server
- Problems with the time synchronization client
- Network problems
This article helps the reader determine if the Avamar grid is experiencing a time synchronization issue.
Resolving the issue is beyond the scope of this article.
To proceed:
1. Log in to the Avamar Utility Node as admin, following the steps in Avamar: How to Log in to an Avamar Server and Load Various Keys.
2. To determine if Avamar nodes are time synchronized, check the current time and date of each node on the Avamar grid:
mapall --all --parallel 'date'
The utility node (0.s) is set to the local time zone, 'BST' in this example, whereas the data nodes are set to the 'UTC' time zone. This is expected behavior.
The '--parallel' flag runs the command on each node simultaneously.
On a grid where time is synchronized, output similar to the following is seen:
Using /usr/local/avamar/var/probe.xml
(0.s) ssh -x admin@xx.xx.xx.xxx 'date'
(0.0) ssh -x admin@xx.xx.xx.xxx 'date'
(0.1) ssh -x admin@xx.xx.xx.xxx 'date'
(0.2) ssh -x admin@xx.xx.xx.xxx 'date'
Mon Jun 20 12:01:12 BST 2021
Mon Jun 20 11:01:12 UTC 2021
Mon Jun 20 11:01:12 UTC 2021
Mon Jun 20 11:01:12 UTC 2021
When all nodes report the same date and time (after allowing for the time zone difference), the time is synchronized between all the nodes on this grid.
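The comparison in step 2 can also be scripted. The sketch below is an illustration under stated assumptions, not an Avamar tool: it assumes each node is asked for `date -u +%s` (seconds since the epoch in UTC, which sidesteps the time zone difference) instead of plain `date`, and `clock_spread` is a hypothetical helper name. It skips the `Using ...` and `(0.x) ssh ...` lines that mapall prints and reports the difference between the largest and smallest timestamps.

```shell
#!/bin/sh
# clock_spread (hypothetical helper): read mapall output on stdin, keep only
# the lines that are bare epoch timestamps, and print max - min in seconds.
clock_spread() {
    awk '
        /^[0-9]+$/ {
            t = $0 + 0
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            n++
        }
        END {
            if (n == 0) { print "no timestamps found"; exit 1 }
            print max - min
        }
    '
}

# Assumed usage on the utility node:
#   mapall --all --parallel 'date -u +%s' | clock_spread
```

Because the commands do not run at exactly the same instant, a spread of a second or so is not meaningful on its own. Larger values suggest that the node clocks should be examined more closely.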
3. To keep time synchronized on the nodes, Avamar uses Network Time Protocol (NTP). The Linux command "ntpq -pn" returns the state of time synchronization.
mapall --all --noerror '/usr/sbin/ntpq -pn'
Example of ntpq output from an Avamar with one utility node and three data nodes:
Using /usr/local/avamar/var/probe.xml
(0.s) ssh -q -x -o GSSAPIAuthentication=no admin@10.x.x.1 '/usr/sbin/ntpq -pn'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u   12   64    0    0.499  36000.1   3.601
*127.127.1.0     .LOCL.          10 l   58   64  377    0.000    0.000   0.000
(0.0) ssh -q -x -o GSSAPIAuthentication=no admin@10.x.x.2 '/usr/sbin/ntpq -pn'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u 1081 1024    0    0.547  35982.5 104.192
*10.x.x.1        LOCAL(0)        11 u  534 1024  377    0.159   -0.006   0.012
(0.1) ssh -q -x -o GSSAPIAuthentication=no admin@10.x.x.3 '/usr/sbin/ntpq -pn'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u  268 1024    0    0.623  36000.2 102.947
*10.x.x.1        LOCAL(0)        11 u  308 1024  377    0.135    0.000   0.022
+10.x.x.2        .STEP.          10 u  512  258  377    0.090    0.073   0.012
(0.2) ssh -q -x -o GSSAPIAuthentication=no admin@10.x.x.4 '/usr/sbin/ntpq -pn'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u  608 1024    0    0.597  35992.8 102.098
*10.x.x.1        LOCAL(0)        11 u  466 1024  377    0.118    0.002   0.018
+10.x.x.2        .STEP.          10 u  488  251  377    0.090    0.071   0.012
4. Interpreting the example output above:
- All nodes are set to prefer 128.x.x.254 as the primary time source.
- The secondary time source for all nodes is the local BIOS clock on the Avamar Utility Node (node 0.s or 10.x.x.1).
- The tertiary time source is set to the first storage node (node 0.0 or 10.x.x.2), which is itself referencing the Avamar Utility Node.
- All nodes are synchronizing with the Avamar Utility Node. The time server marked with an asterisk (*) is the source that the node is currently synchronizing with.
- In this scenario, 128.x.x.254 is located remotely. As it has a 'reach' value of 0 (currently unreachable), it is useless as a time source.
- 0.s and 0.0 both have a reachability register of octal 377. This is the highest figure attainable. The 'reach' value is essentially a report on the status of the previous eight transactions between the NTP client and NTP server, so a value of 377 means that the last eight transactions were all successful. Therefore, the nodes are all synchronizing with the secondary source.
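The 'reach' column is an 8-bit shift register printed in octal, so the number of successful transactions among the last eight can be counted from its bits. A minimal sketch, with `reach_successes` as a hypothetical helper name (it is not part of ntpq or Avamar):

```shell
#!/bin/sh
# reach_successes (hypothetical helper): given a reach value as printed by
# ntpq (octal digits only, for example 377), print how many of the last
# eight polls received a valid response.
reach_successes() {
    value=$((0$1))          # the leading 0 makes the shell read the digits as octal
    count=0
    while [ "$value" -gt 0 ]; do
        count=$((count + (value & 1)))   # add the lowest bit
        value=$((value >> 1))            # move to the next older poll
    done
    echo "$count"
}

reach_successes 377   # prints 8: all of the last eight polls succeeded
reach_successes 0     # prints 0: the source is currently unreachable
```

A value between the two extremes, such as 17, would indicate that only some of the recent polls succeeded (four in that case).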
5. Reviewing the ntpq output for node 0.2:
(0.2) ssh -x admin@10.x.x.164 '/usr/sbin/ntpq -p'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u  608 1024  377    0.597  35992.8 102.098
*10.x.x.1        LOCAL(0)        11 u  466 1024  377    0.118    0.002   0.018
+10.x.x.2        .STEP.          10 u  488  251  377    0.090    0.071   0.012
- Node 0.2 is polling the Utility Node (10.x.x.1) every 1024 seconds
- Node 0.2 is synchronizing with the Utility node
- The Utility Node is at stratum 11
- The reachability register for the Utility Node is octal 377.
- The clock on the Utility Node has a 0.002 milliseconds (or 2 microseconds) difference with the clock on Node 0.2.
- The roundtrip delay to the Utility Node is 0.118 milliseconds (or 118 microseconds).
- The measurement of the variance in latency on the network (jitter) between node 0.2 and the Utility Node is 0.018 milliseconds (or 18 microseconds)
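The fields discussed in the list above can be pulled out of ntpq output with awk. The sketch below assumes the ten-column layout shown earlier (remote, refid, st, t, when, poll, reach, delay, offset, jitter); `selected_peer` is a hypothetical helper name. It prints the source marked with an asterisk together with its delay, offset, and jitter, all of which ntpq reports in milliseconds.

```shell
#!/bin/sh
# selected_peer (hypothetical helper): read ntpq peer output on stdin and
# print the peer marked with '*', that is, the current synchronization source.
selected_peer() {
    awk '/^\*/ {
        printf "%s stratum=%s reach=%s delay_ms=%s offset_ms=%s jitter_ms=%s\n",
               substr($1, 2), $3, $7, $8, $9, $10
    }'
}

# Assumed usage on a single node:
#   /usr/sbin/ntpq -pn | selected_peer
```

If the function prints nothing, no source carries the asterisk, which would mean the node has not selected any time source.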
NTP configuration (/etc/ntp.conf):
- The /etc/ntp.conf file on node 0.2 should correspond to the ntpq output above:
#Customer premises / external time servers.
#
server 128.x.x.254
# - - - - -
# DPN time servers here and in the other module(s).
#
server 10.x.x.1
server 10.x.x.2
Primary time source: An external server located remote to the Avamar grid
Secondary time source: The utility node
Tertiary time source: Node 0.0
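To compare the configured sources against the 'remote' column of ntpq, the server entries can be listed from the configuration file. A small sketch, assuming plain `server <address>` lines like those shown above; `conf_servers` is a hypothetical helper name:

```shell
#!/bin/sh
# conf_servers (hypothetical helper): print the address of every 'server'
# line in an ntp.conf-style file, skipping comments and other directives.
# The file defaults to /etc/ntp.conf when no argument is given.
conf_servers() {
    awk '$1 == "server" { print $2 }' "${1:-/etc/ntp.conf}"
}

# Assumed usage:
#   conf_servers /etc/ntp.conf
```

The addresses are printed in file order, which in the example configuration corresponds to the primary, secondary, and tertiary sources.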
- NTP logging is directed to the /var/log/messages file.
- To view NTP-related logging, grep the contents of /var/log/messages* for NTP.
- If an Avamar grid experiences time synchronization issues, the problem must be fixed. Resolving time synchronization issues is beyond the scope of this article.
- If an external time server is unreliable, as in the example given above, it is acceptable to use an internal time server. The internal time may drift slowly from UTC, but the most important consideration is that data nodes are time synchronized with one another.
- The Avamar asktime utility can be used to select new, preferred time sources for NTP. See Avamar: How to configure NTP on an Avamar Server using asktime.
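The log search mentioned above can be run as a single grep. A minimal sketch, assuming the log files are uncompressed text; `ntp_log_lines` is a hypothetical helper name, and the `-i` and `-h` options (case-insensitive match, no file name prefix) are choices made here rather than anything this article prescribes:

```shell
#!/bin/sh
# ntp_log_lines (hypothetical helper): print NTP-related lines from the given
# log files, or from /var/log/messages* when no files are named.
ntp_log_lines() {
    if [ "$#" -eq 0 ]; then
        set -- /var/log/messages*
    fi
    grep -ih 'ntp' "$@"
}

# Assumed usage:
#   ntp_log_lines                      # search /var/log/messages*
#   ntp_log_lines /var/log/messages    # search the current log only
```

Reading /var/log/messages may require elevated privileges on some systems.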