Avamar: How to determine if a grid is experiencing time synchronization (NTP) issues

Summary: This article explains how to determine whether an Avamar grid is experiencing a Network Time Protocol (NTP) time synchronization issue.


Instructions

Time synchronization between all nodes is essential for the healthy operation of an Avamar grid.

If nodes within an Avamar grid are not time synchronized, the following types of behavior can be expected:
  • The Avamar server is unable to start
  • Nodes go offline
  • Checkpoint validation (HFScheck) fails with MSG_ERR_CGSAN_FAILED
  • HFScheck fails with MSG_ERR_HFSCHECKERRORS
  • Checkpoints fail
  • Garbage Collection (GC) fails
  • Data consistency issues (if the time changes during garbage collection)
 
Examples of error messages commonly reported as a result of loss of time synchronization:
  • samconn::checkallsucceed request failed DPNTIMECHECK=230 
  • FATAL ERROR: <0001> dpn time mismatch: synchronize clocks and retry
  • ERROR: <0001> dpncheckmanager::verifyStartup cgsan died unexpectedly. terminating  
  • not enough valid responses received in time
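As a rough illustration, a saved log can be searched for the messages listed above. The sketch below is a hypothetical shell helper, not an Avamar tool. It assumes the log text is supplied on standard input, and the path in the usage comment is a placeholder because this article does not state which log file contains these messages. A match is only a hint, since some of these messages can have other causes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Avamar): print the lines from standard
# input that contain one of the error strings listed in this article.
find_time_sync_errors() {
    # -F treats each pattern as a fixed string rather than a regular expression.
    grep -F \
        -e 'DPNTIMECHECK' \
        -e 'dpn time mismatch' \
        -e 'cgsan died unexpectedly' \
        -e 'not enough valid responses received in time'
}

# Assumed usage (placeholder path; substitute the log being reviewed):
#   find_time_sync_errors < /path/to/logfile
```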
 
An Avamar grid can experience NTP time synchronization problems for various reasons, for example:
  • Problems with the time synchronization (ntpd) server
  • Problems with the time synchronization client
  • Network problems
 

This article helps the reader determine if the Avamar grid is experiencing a time synchronization issue.

Resolving the issue is beyond the scope of this article.

Note: Many external websites cover NTP troubleshooting, and the reader is encouraged to consult them.
 
 

To proceed:

1. Log in to the Avamar Utility Node as admin. For the procedure, see the article "Avamar: How to Log in to an Avamar Server and Load Various Keys".

2. To determine if Avamar nodes are time synchronized, check the current time and date of each node on the Avamar grid:

mapall --all --parallel 'date'
 
Notes:
The utility node (0.s) is set to the local time zone ('BST' in this example), whereas the data nodes are set to the 'UTC' time zone. This is expected behavior.
The '--parallel' flag runs the command on each node simultaneously.

On a grid where time is synchronized, output similar to the following is seen:
 
Using /usr/local/avamar/var/probe.xml
(0.s) ssh  -x  admin@xx.xx.xx.xxx 'date'
(0.0) ssh  -x  admin@xx.xx.xx.xxx 'date'
(0.1) ssh  -x  admin@xx.xx.xx.xxx 'date'
(0.2) ssh  -x  admin@xx.xx.xx.xxx 'date'
Mon Jun 20 12:01:12 BST 2021
Mon Jun 20 11:01:12 UTC 2021
Mon Jun 20 11:01:12 UTC 2021
Mon Jun 20 11:01:12 UTC 2021
 

When all nodes report the same date and time (after allowing for the time zone difference between the utility node and the data nodes), time is synchronized between all the nodes on this grid.
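Comparing wall-clock strings across time zones by eye is error-prone, so one option is to compare epoch seconds, which do not depend on the time zone. The sketch below is a hypothetical helper, not an Avamar tool. It assumes that 'date +%s' (GNU date) is run on every node through mapall and that each node returns one line containing only digits; all other lines are ignored. Because the command does not start at exactly the same instant on every node, a spread of about one second is not necessarily a problem.

```shell
#!/bin/sh
# Hypothetical helper: read text on standard input, keep only the lines that
# consist of a bare epoch timestamp, and print the difference in seconds
# between the largest and the smallest value.
clock_spread() {
    awk '/^[0-9]+$/ {
             if (n == 0 || $1 < min) min = $1
             if (n == 0 || $1 > max) max = $1
             n++
         }
         END {
             if (n == 0) { print "no timestamps found"; exit 1 }
             print max - min
         }'
}

# Assumed usage on the utility node:
#   mapall --all --parallel 'date +%s' 2>&1 | clock_spread
```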

3. To keep time synchronized on the nodes, Avamar uses Network Time Protocol (NTP). The Linux command "ntpq -pn" returns the state of time synchronization.

mapall --all --noerror '/usr/sbin/ntpq -pn'
 

Example of ntpq output from an Avamar grid with one utility node and three data nodes:

Using /usr/local/avamar/var/probe.xml
(0.s) ssh -q  -x  -o GSSAPIAuthentication=no admin@10.x.x.1 '/usr/sbin/ntpq -pn'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u   12   64  0      0.499  36000.1   3.601
*127.127.1.0     .LOCL.          10 l   58   64  377    0.000    0.000   0.000
(0.0) ssh -q  -x  -o GSSAPIAuthentication=no admin@10.x.x.2 '/usr/sbin/ntpq -pn'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u 1081 1024  0      0.547  35982.5 104.192
*10.x.x.1        LOCAL(0)        11 u  534 1024  377    0.159   -0.006   0.012
(0.1) ssh -q  -x  -o GSSAPIAuthentication=no admin@10.x.x.3 '/usr/sbin/ntpq -pn'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u  268 1024  0      0.623  36000.2 102.947
*10.x.x.1        LOCAL(0)        11 u  308 1024  377    0.135    0.000   0.022
+10.x.x.2        .STEP.          10 u  512  258  377    0.090    0.073   0.012
(0.2) ssh -q  -x  -o GSSAPIAuthentication=no admin@10.x.x.4 '/usr/sbin/ntpq -pn'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u  608 1024  0      0.597  35992.8 102.098
*10.x.x.1        LOCAL(0)        11 u  466 1024  377    0.118    0.002   0.018
+10.x.x.2        .STEP.          10 u  488  251  377    0.090    0.071   0.012
 
Note: Removing the 'n' flag from the command above (ntpq -p) enables name resolution, so hostnames are shown instead of IP addresses. This can affect the readability of the output.
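On a grid with many nodes, the ntpq output can be condensed to one line per time source. The sketch below is a hypothetical helper, not an Avamar tool. It assumes the input has the same layout as the example output above: a node label line such as "(0.s) ssh ...", followed by the ntpq header and peer lines produced with the 'n' flag.

```shell
#!/bin/sh
# Hypothetical helper: summarize "mapall ... ntpq -pn" output read from
# standard input. For each peer line it prints the node label, the peer
# address, the reach value, and a state: "selected" (marked with *),
# "unreachable" (reach is 0), or "other".
summarize_ntpq() {
    awk '
        /^\([0-9]+\.[0-9a-z]+\)/ { node = $1; next }     # node label line
        /^ *remote / || /^=+$/ || /^Using / { next }      # header lines
        NF >= 10 {
            tally = substr($0, 1, 1)                      # first column of the line
            remote = $1
            if (tally != " ") remote = substr($1, 2)      # strip the tally character
            state = "other"
            if (tally == "*") state = "selected"
            else if ($7 == "0") state = "unreachable"
            print node, remote, "reach=" $7, state
        }'
}

# Assumed usage on the utility node:
#   mapall --all --noerror '/usr/sbin/ntpq -pn' 2>&1 | summarize_ntpq
```

For the utility node in the example above, this would report 128.x.x.254 as unreachable and 127.127.1.0 as selected.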
 
 
4. General Avamar Grid Observations:
  • All nodes are configured to prefer 128.x.x.254 as the primary time source.
  • The secondary time source for all nodes is the local BIOS clock on the Avamar Utility Node (node 0.s or 10.x.x.1).
  • The tertiary time source is the first storage node (node 0.0 or 10.x.x.2), which itself references the Avamar Utility Node.
  • All nodes are synchronizing with the Avamar Utility Node.
    • The time source marked with an asterisk (*) is the one that the reporting node is currently synchronizing with.
  • In this scenario, 128.x.x.254 is located remotely.
    • Because it has a 'reach' value of 0 (currently unreachable), it cannot be used as a time source.
  • 0.s and 0.0 both have a reachability register of octal 377.
    • This is the highest value attainable. Therefore, the nodes are all synchronizing with the secondary source.
Note: A full discussion of the 'reach' field is beyond the scope of this article.
However, the 'reach' value is essentially a report on the status of the previous eight transactions between the NTP client and the NTP server, displayed in octal. A value of 377 means that the last eight transactions were all successful.
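To make the octal 'reach' value easier to read, it can be expanded into its eight bits, where each 1 represents a successful transaction and the rightmost bit is the most recent one. The sketch below is a hypothetical helper written for illustration; the function name is an assumption and it is not part of ntpq or Avamar.

```shell
#!/bin/sh
# Hypothetical helper: interpret an ntpq "reach" value (octal) as an 8-bit
# register and print the bits followed by the number of successful polls.
decode_reach() {
    value=$((0$1))     # a leading 0 makes shell arithmetic read the digits as octal
    bits=""
    count=0
    i=7
    while [ "$i" -ge 0 ]; do
        if [ $(( ($value >> $i) & 1 )) -eq 1 ]; then
            bits="${bits}1"
            count=$((count + 1))
        else
            bits="${bits}0"
        fi
        i=$((i - 1))
    done
    echo "$bits $count/8"
}

decode_reach 377   # prints: 11111111 8/8
decode_reach 0     # prints: 00000000 0/8
decode_reach 376   # prints: 11111110 7/8 (only the most recent poll failed)
```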
 
 

5. Reviewing the ntpq output for node 0.2:

(0.2) ssh  -x  admin@10.x.x.4 '/usr/sbin/ntpq -pn'

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 128.x.x.254     .LOCL.           1 u  608 1024  0      0.597  35992.8 102.098
*10.x.x.1        LOCAL(0)        11 u  466 1024  377    0.118    0.002   0.018
+10.x.x.2        .STEP.          10 u  488  251  377    0.090    0.071   0.012
 
From the above, it can be determined that:
  • Node 0.2 is polling the Utility Node (10.x.x.1) every 1024 seconds
  • Node 0.2 is synchronizing with the Utility node
  • The Utility Node is at stratum 11
  • The reachability register for the Utility Node is octal 377.
  • The clock on the Utility Node has a 0.002 milliseconds (or 2 microseconds) difference with the clock on Node 0.2.
  • The round-trip delay to the Utility Node is 0.118 milliseconds (or 118 microseconds).
  • The measurement of the variance in latency on the network (jitter) between node 0.2 and the Utility Node is 0.018 milliseconds (or 18 microseconds)
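The readings above can be pulled from the ntpq output automatically. The sketch below is a hypothetical helper, not an Avamar tool. It relies on ntpq reporting 'poll' in seconds and 'delay', 'offset', and 'jitter' in milliseconds, and converts the latter three to microseconds for the peer marked with an asterisk.

```shell
#!/bin/sh
# Hypothetical helper: read ntpq -pn output on standard input and describe the
# peer that is marked with an asterisk (the currently selected time source).
describe_selected_peer() {
    awk '/^\*/ {
        # Fields: remote refid st t when poll reach delay offset jitter
        printf "peer=%s stratum=%s poll=%ss reach=%s\n", substr($1, 2), $3, $6, $7
        # delay, offset, and jitter are in milliseconds; convert to microseconds
        printf "delay=%.0fus offset=%.0fus jitter=%.0fus\n", $8 * 1000, $9 * 1000, $10 * 1000
    }'
}

# Assumed usage on a node:
#   /usr/sbin/ntpq -pn | describe_selected_peer
```

For the node 0.2 output above, this would print "peer=10.x.x.1 stratum=11 poll=1024s reach=377" followed by "delay=118us offset=2us jitter=18us".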
 

NTP configuration (/etc/ntp.conf):

  • The /etc/ntp.conf file on node 0.2 should correspond to the ntpq output above:
#Customer premises / external time servers.
#
server 128.x.x.254     
# - - - - -
# DPN time servers here and in the other module(s).
#
server 10.x.x.1
server 10.x.x.2

Primary time source: An external server located remote to the Avamar grid
Secondary time source: The utility node
Tertiary time source: Node 0.0
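To compare the configured time sources with the ntpq output, the 'server' lines can be extracted from the configuration file. The sketch below is a hypothetical helper, not an Avamar tool. It numbers the entries in file order, which is how this article labels them as primary, secondary, and tertiary; it does not model how ntpd itself selects a source.

```shell
#!/bin/sh
# Hypothetical helper: read an ntp.conf file on standard input and print each
# "server" entry with its position in the file. Comment lines are skipped
# because their first field is not the word "server".
list_time_sources() {
    awk '$1 == "server" { n++; print n, $2 }'
}

# Assumed usage on a node:
#   list_time_sources < /etc/ntp.conf
```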

 
Logging:
  • NTP logging is directed to the /var/log/messages file.
  • To view NTP-related logging, search the contents of /var/log/messages* for "ntp" with a case-insensitive grep.
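A minimal sketch of that search is shown below. The helper name is hypothetical, and the usage comment assumes the log files are uncompressed plain text; rotated logs that are compressed would need a different reader.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines from standard input that mention
# "ntp" in any letter case (for example, lines written by the ntpd daemon).
ntp_log_lines() {
    grep -i 'ntp'
}

# Assumed usage on a node (uncompressed logs only):
#   cat /var/log/messages* | ntp_log_lines | tail -n 20
```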
 
Resolving Time Synchronization Issues: 
  • If an Avamar grid experiences time synchronization issues, the problem must be fixed. Resolving time synchronization issues is beyond the scope of this article.
  • If an external time server is unreliable, as in the example given above, it is acceptable to use an internal time server. The internal time may drift slowly from UTC, but the most important consideration is that data nodes are time synchronized with one another.
  • The Avamar asktime utility can be used to select new preferred time sources for NTP. See the article "Avamar: How to configure NTP on an Avamar Server using asktime".
 
 
Additional Information:
Windows domain controllers should not be relied on as accurate time sources: http://support.microsoft.com/kb/939322 (External Link)

Affected Products

Avamar, Avamar Server
Article Properties
Article Number: 000162236
Article Type: How To
Last Modified: 14 Aug 2025
Version: 13