Avamar: Troubleshooting Avamar "asktime" and related NTP Issues

Summary: This solution is intended to provide additional information about the use and configuration of the Network Time Protocol (NTP) used by Avamar servers.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

This solution is intended to provide additional information about the use and configuration of the Network Time Protocol (NTP) used by Avamar grids.

It also provides further details and troubleshooting steps for problems arising during NTP configuration using the asktime utility.

It covers the steps required to configure NTP for node additions or extra servers (such as Network Data Management Protocol (NDMP) accelerators).

Cause

Avamar uses NTP to maintain time synchronization against an external time source and across all Avamar Data Store (ADS) nodes.

The ADS software bundle contains a utility, asktime, which is used to configure NTP. It can be run as part of the setup process, or manually if required.

The purpose of this article is to provide additional information and tips for troubleshooting asktime and NTP-related issues.

Resolution

Notes: 
  • All commands listed in this article should be run from the Avamar Utility Node, as admin, with ssh keys loaded, unless stated otherwise. For more information about keys, see Avamar: How to Log in to an Avamar Server and Load Various Keys.
  • Some output in this article is deliberately trimmed for brevity, in particular the output of repetitive mapall commands.
  • Any names and IP addresses referenced are examples only. These should be replaced with customer-specific details.
 

Section #1 - Basic NTP functionality:

1. The basic schematic of NTP functionality within Avamar is as follows:
  • All nodes (utility and storage) should poll one or more user or public NTP servers.
  • All storage (data) nodes should poll the same user servers. In addition, the storage nodes poll the utility node (0.s) and the first storage node (0.0).
  • The utility node runs in local time. All storage nodes run in UTC.

2. The intention here is that all nodes can maintain time independently. (If an external NTP server is unavailable, the nodes continue to maintain synchronization by using 0.s and 0.0.)

 

Section #2 - Additional UNIX utilities for checking NTP functionality:

1. Additional UNIX utilities can be used to verify the status of the Network Time Protocol Daemon (NTPD) which runs on nodes to maintain the system time.
  • service ntpd status/stop/start - verify that the Network Time Protocol daemon (NTPD) is running, and stop or start it as needed.
  • date - show current system date and time.
  • ntpdate - used to poll a remote NTP server and if required set the local system clock.
  • ntpq - used to view current NTP connectivity.  
 
 

Section #3 - Additional Avamar utilities for checking NTP functionality:

Use the "check.dpn --preinstall --checktime" script to automate NTPD checks across all or specified nodes, verifying that each node is running NTPD properly and has a time server selected.
This functionality is also used by both the installer and the GSAN at startup and as such is a key indicator that NTPD is working as required.

These commands, especially when used with mapall, should be sufficient to debug most NTP-related errors.

 

Section #4 - Troubleshooting NTP issues:

1. NTP issues typically occur immediately when performing new installs, as a result of a bad timeserver address provided by the customer, or due to firewall or routing issues.  

2. NTP issues also arise when (after working properly for some time) changes are made to the customer network, an NTP server is removed and so on. Over time, this begins to affect the Avamar grid.  

3. To minimize the risk of issues during the install, use the ntpdate program with the debug option (-d) to verify one or more assigned time servers are available and servicing requests.

See APPENDIX A for complete sample output.

In the example below, the handshake between the Avamar node and the time server can be seen. In this instance, the time server reports a small offset of 0.000006 sec:

ntpdate -d 168.xxx.xx.x
offset 0.000006
29 Dec 15:47:15 ntpdate[10500]: adjust time server 168.xxx.xx.x offset 0.000006 sec 
 

Compare this with a timeserver which is unavailable.

ntpdate -d 168.xxx.xx.x
offset 0.000000
29 Dec 15:49:13 ntpdate[10699]: no server suitable for synchronization found 
 

In this example, it is clear that the time server is unavailable. If asktime were run against this timeserver, the node would never synchronize with it.

In this case, work with the customer to verify that the assigned addresses are correct and that the NTP port (UDP 123) is not blocked by a firewall.
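
The two outcomes above can be told apart mechanically. The following is a minimal sketch only: the function name ntpdate_verdict and the address 192.0.2.10 are illustrative assumptions, and the patterns are based on the message text shown in the samples in this article (plus the "step time server" form that ntpdate prints for larger offsets).

```shell
# Classify "ntpdate -d" output read from stdin.
# Prints "ok <offset>" when a clock adjustment line is present,
# "unreachable" when no server answered, and "unknown" otherwise.
ntpdate_verdict() {
    awk '
        /no server suitable for synchronization found/ { verdict = "unreachable" }
        /(adjust|step) time server/ {
            # The offset value is the field that follows the word "offset".
            for (i = 1; i < NF; i++)
                if ($i == "offset") verdict = "ok " $(i + 1)
        }
        END {
            if (verdict == "") verdict = "unknown"
            print verdict
        }
    '
}

# Example (hypothetical address):
#   ntpdate -d 192.0.2.10 2>&1 | ntpdate_verdict
```

A result of "unreachable" points to the address, firewall, or routing checks described above rather than to a problem with asktime itself.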

4. Do not rely on a simple ping test to verify the timeserver. Ping could be blocked by a firewall while NTP is allowed, or vice versa. When working with NTP, ntpdate is effectively the replacement for ping as a connectivity check.

5. All nodes must be able to communicate with the external time servers, and this can be verified using the Avamar mapall command:
(Assuming that Avamar is installed and that the probe.xml file is properly configured.)

mapall --all --user=root ntpdate -d 168.xxx.xx.x
 

6. Review the output and verify that all nodes can communicate with the time server per the examples above.

If all nodes are communicating as necessary, then run ntpdate without the "-d" flag to actually update the system time (assuming NTPD is not already running):

mapall --all --user=root ntpdate 168.xxx.xx.x
Using /usr/local/avamar/var/probe.xml
(0.s) ssh  -x  root@10.x.xxx.xxx 'ntpdate 168.xxx.xx.x'
29 Dec 17:40:41 ntpdate[23552]: adjust time server 168.xxx.xx.x offset 0.014792 sec
(0.0) ssh  -x  root@10.x.xxx.xxx 'ntpdate 168.xxx.xx.x'
30 Dec 01:40:42 ntpdate[18131]: adjust time server 168.xxx.xx.x offset 0.029407 sec
(0.1) ssh  -x  root@10.x.xxx.xxx 'ntpdate 168.xxx.xx.x'
30 Dec 01:40:43 ntpdate[16250]: adjust time server 168.xxx.xx.x offset 0.000689 sec
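
On larger grids the mapall output becomes long, so it can help to reduce it to one line per node. The following is a sketch under the assumption that mapall output keeps the layout shown above (a "(node) ssh ..." header line followed by that node's ntpdate result); the function name mapall_offsets and the address 192.0.2.10 are illustrative only.

```shell
# Summarize "mapall ... ntpdate" output read from stdin as "<node> <offset>"
# pairs, or "<node> FAILED" when the node found no usable server.
mapall_offsets() {
    awk '
        # Header line such as: (0.s) ssh  -x  root@... : remember the node id.
        /^\([0-9]+\.[0-9a-z]+\) ssh/ { node = $1; gsub(/[()]/, "", node); next }
        / offset / && node != "" {
            for (i = 1; i < NF; i++)
                if ($i == "offset") print node, $(i + 1)
            node = ""
        }
        /no server suitable/ && node != "" { print node, "FAILED"; node = "" }
    '
}

# Example (hypothetical address):
#   mapall --all --user=root ntpdate 192.0.2.10 2>&1 | mapall_offsets
```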
 

Note: Restarting the NTPD service (service ntpd restart) calls ntpdate to perform similar steps. However, when the service is restarted from within asktime, there is no immediate indicator of whether the ntpdate command succeeded or failed. Ideally, ntpdate should be run separately to verify that the connectivity is good.

 
 

Section #5 - NTP server selection:

The NTP services on a customer site are likely managed by the network team, whereas the backups (Avamar) are more likely managed by a server team. 

The server team may not know about the NTP servers, any firewall changes required to allow connectivity, change request requirements, and so on.

For this reason, plan ahead of the installation and ensure that the customer knows what is required in advance.

If there are no obvious NTP servers available:
  • Many routers are configured to run NTP, so try, for instance, the default gateway IP. If it responds, then ensure that it does have the correct time!
  • Windows Active Directory (AD) servers run NTP by default. However, these can be running on Virtual Machines, which are often unreliable time sources. If ntpdate reports a good connection but NTPD is unable to establish a good synchronization with the target, work with the customer to find a good local time source.
  • If a time source can be located, always confirm with the customer that it is appropriate before configuring Avamar to use it.
  • Where possible, try to locate NTP servers within the local campus to minimize issues caused by slow or high-latency connections.
  • Where possible, try to use multiple NTP servers. The more servers that are available, the better NTP can compare them and establish the most accurate time.

Note: Avamar can work with no time servers. The data nodes synchronize with the Avamar Utility Node; however, the entire grid will eventually drift over time.

Note: check.dpn (and hence the install and GSAN startup) warns if there are fewer than three time servers selected. This is only a warning rather than an error, but do attempt to configure multiple servers where possible.

 
 

Section #6 - Further troubleshooting:

Occasionally, despite ntpdate initially showing a good timeserver handshake, NTPD consistently fails to use the timeserver as an authoritative time source. This can be verified from the output of ntpq (details below).

Frequently the output of ntpdate can be used to demonstrate this to the customer, for instance:
  • Rerunning "ntpdate -d <timeserver>" shows a fluctuating offset value indicating that the time offered is inconsistent (see the previous comment about NTP running on Virtual Machines (VMs)) or that a high network latency is causing issues.
  • Running ntpdate against a couple of different NTP servers consistently shows that they are reporting different times; this may only be a matter of seconds but can be enough for NTPD to reject both servers as valid candidates.
  • Identify good local timeservers. If this occurs during a build and is impacting the schedule, then consider building without any time servers (the system reverts to the utility node as a time source) and reconfigure the time servers later on using asktime once they can be correctly validated.     
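
To put a number on a fluctuating offset, repeated ntpdate -d samples can be reduced to a minimum, maximum, and spread. This is a sketch only: the function name offset_spread, the sample count, the sleep interval, and the address 192.0.2.10 are assumptions, and no particular spread is defined here as acceptable.

```shell
# Read one offset (in seconds) per line on stdin and print "min max spread".
offset_spread() {
    awk '
        {
            v = $1 + 0                      # force numeric comparison
            if (NR == 1 || v < min) min = v
            if (NR == 1 || v > max) max = v
        }
        END { if (NR > 0) printf "%.6f %.6f %.6f\n", min, max, max - min }
    '
}

# Example: sample the same (hypothetical) server five times, ten seconds apart.
#   for i in 1 2 3 4 5; do
#       ntpdate -d 192.0.2.10 2>/dev/null | awk '$1 == "offset" { print $2 }'
#       sleep 10
#   done | offset_spread
```

Running the same loop against two different servers and comparing the results gives the customer a concrete illustration of servers that disagree with each other.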
 

Section #7 - Using ntpq to inspect current NTPD status:

Note: A full explanation of the ntpq output data is beyond the scope of this article; however, it can be referenced here: http://doc.ntp.org (external link)

The ntpq utility can be used to view current NTPD configuration and clock selection once asktime has been run, either manually or as a part of the build process.  

A typical output is as follows:  

ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+d-host.company.com 168.xxx.xx.xx    3 u   59   64  377   78.917   -6.205   8.690
+e-host.company.com 168.xxx.xx.xx    3 u   54   64  377   77.521   -4.340   8.744
+f-host.company.com 168.xxx.xx.xx    3 u   58   64  377   78.063   -1.381  10.317
+g-host.company.com 168.xxx.xx.xx    3 u   49   64  377   77.723   -6.972   8.570
*h-host.company.com 128.xx.xx.xx     2 u   49   64  377   77.003   -7.736   8.511
+i-host.company.com 130.xxx.xxx.xxx  2 u   42   64  377   78.341   -1.701   9.984
 j-host.company.com .INIT.          16 u    -  256    0    0.000    0.000 4000.00
 LOCAL(0)        LOCAL(0)         8 l   51   64  377    0.000    0.000   0.001

If ntpq is initially slow to respond, it may be trying to resolve time server names against a badly configured Domain Name System (DNS).

If so, run ntpq with the -n flag to skip name lookups. However, also try to establish why DNS is not resolving names and fix it accordingly:

ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+128.xxx.xx.xx   168.xxx.xx.xx    3 u   63   64  377   78.917   -6.205   8.690
+128.xxx.xx.xx   168.xxx.xx.xx    3 u   58   64  377   77.521   -4.340   8.744
+10.xxx.xxx.xx   168.xxx.xx.x     3 u   62   64  377   78.063   -1.381  10.317
+10.xxx.xxx.xx   168.xxx.xx.xx    3 u   53   64  377   77.723   -6.972   8.570
*168.xxx.xx.x    128.xx.xx.xx     2 u   53   64  377   77.003   -7.736   8.511
+168.xxx.xx.xx   130.xxx.xxx.xxx  2 u   46   64  377   78.341   -1.701   9.984
 168.xxx.xx.x    .INIT.          16 u    -  256    0    0.000    0.000 4000.00
 127.xxx.x.x     LOCAL(0)         8 l   55   64  377    0.000    0.000   0.001
 

A full discussion of the ntpq output is beyond the scope of this document; however, ntpq and the other NTP programs are well documented at http://doc.ntp.org (external link).

The key to Avamar working is the selection of a good time server, and this is indicated by the asterisk in the leftmost column. This is h-host.company.com or 168.xxx.xx.x in the examples above.
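
When checking many nodes, the asterisk row can be extracted rather than read by eye. A minimal sketch, assuming the standard ntpq peer listing shown above; the function name selected_peer is illustrative.

```shell
# Print the peer that ntpd has selected as its sync source (the row whose
# first character is "*") from "ntpq -p" or "ntpq -pn" output on stdin.
# Returns non-zero when no peer is selected.
selected_peer() {
    awk '
        /^\*/ { print substr($1, 2); found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
#   ntpq -pn | selected_peer || echo "no time source selected"
```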

The previous example is for a utility node. The following example shows a storage node which is also attempting to use the utility node (0.s) and first storage node (0.0) as time sources:  

ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+d-host.company.com 168.xxx.xx.xx    3 u   36  128  377   78.403   15.627  26.941
+e-host.company.com 168.xxx.xx.xx    3 u   82  128  377   77.740   10.448  23.707
+f-host.company.com 168.xxx.xx.x     3 u   89  128  377   77.982  -16.786  18.895
+g-host.company.com 168.xxx.xx.xx    3 u   40  128  377   78.565    3.230  16.925
+h-host.company.com 10.xxx.x.xx      2 u   96  128  377   78.082    0.369  17.982
*i-host.company.com 128.xx.xx.xx     2 u   35  128  377   77.954   16.410  26.429
 j-host.company.com .INIT.          16 u    -  256    0    0.000    0.000 4000.00
+utility.company.com 168.xxx.xx.x    3 u   34  128  377    0.226   -1.589  15.290
+sn1.company.com  168.xxx.xx.xx    3 u   97  128  377    0.214   -6.072  31.263
 
From this output:  
      • There is no way to predetermine in this configuration which time server is selected as the authoritative time source (marked with an asterisk in the left column of the ntpq -p output). The main objective is to ensure that all defined time servers are available and correctly selected as candidates. The ntpd service is responsible for deciding which time servers are to be used.
      • j-host.company.com is not servicing time requests. This is evident from there being no character in the left column, and from the other fields still being at their startup defaults; a connection has never been made in order to start the time adjustment process. This is the kind of issue that testing with ntpdate helps to determine beforehand.
      • The utility node (utility.company.com) and storage node 0.0 (sn1.company.com) are also present in the list of valid servers, as described earlier.
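
Servers in the same state as j-host.company.com can be listed by looking for a reach value of 0, which means none of the most recent polls was answered. A minimal sketch, assuming the standard ten-column ntpq peer listing; the function name unreachable_peers is illustrative.

```shell
# List peers from "ntpq -p" or "ntpq -pn" output (stdin) whose reach
# register is 0, that is, peers that have not answered recent polls.
unreachable_peers() {
    awk '
        $1 == "remote" || /^=+$/ { next }   # skip the two header lines
        NF >= 10 && $7 == "0" {
            name = $1
            # Drop the tally character when the row does not start with a space.
            if (substr($0, 1, 1) != " ") name = substr($1, 2)
            print name
        }
    '
}

# Example:
#   ntpq -pn | unreachable_peers
```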

Note:  The Utility (0.s) and first storage (0.0) nodes will not offer time requests until they are fully stabilized, so these servers may be marked as INIT for some time after the initial configuration as they will not respond to requests until they themselves are correctly synchronized.

 

Section #8 - Configuring time on additional nodes:

Node Adds: 

Configuring time services on additional storage nodes to be added to a grid is well documented in the Procedure Generator "Capacity Upgrade Procedures" section. 

Other nodes (accelerator and so forth):

It is imperative that the time is correctly configured on additional nodes such as accelerators in order that backup timestamps are correctly recorded. This is a manual process:

a. As root, copy /etc/ntp.conf from an existing storage node (other than 0.0) to the accelerator. 

This gives the node details of the external timeservers and the key grid timeservers (0.0 and 0.s).

b. As root, edit /etc/ntp.conf on both the utility node (0.s) and the first storage node (0.0).
(The ntp.conf file defines runtime parameters for NTP.)  

The new node IP must be added into the access control list on nodes 0.s and 0.0 to allow them to respond to requests from the accelerator:   

# - - - - -
# Individual DPN node restrictions - they can listen, but they can't
# change us, except as above.
#
restrict 10.x.xxx.xxx    nomodify
restrict 10.x.xxx.xxx    nomodify
restrict 10.x.xxx.xxx    nomodify
restrict <Accelerator server IP>    nomodify 
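
If the edit is scripted rather than made by hand, it should be safe to repeat. The following is a sketch only: the function name add_restrict_entry is hypothetical, it assumes that appending the entry at the end of the file is acceptable, and it keeps a backup copy before changing anything. Run it as root on both 0.s and 0.0.

```shell
# Append "restrict <ip> nomodify" to an ntp.conf-style file unless a
# restrict entry for that address already exists.
# Usage: add_restrict_entry /etc/ntp.conf <accelerator IP>
add_restrict_entry() {
    ntp_conf=$1
    node_ip=$2
    if awk -v ip="$node_ip" \
        '$1 == "restrict" && $2 == ip { found = 1 } END { exit !found }' \
        "$ntp_conf"; then
        echo "restrict entry for $node_ip already present in $ntp_conf"
        return 0
    fi
    cp -p "$ntp_conf" "$ntp_conf.bak" || return 1   # keep a backup first
    printf 'restrict %s    nomodify\n' "$node_ip" >> "$ntp_conf"
}
```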
 

c. Once the accelerator node is added, restart the ntpd service on both the utility node (0.s) and node 0.0 in order to re-read the configuration file:  

mapall --nodes=0.0,0.s --user=root service ntpd restart 
 

d. On the new node, as root, change the service configuration to automatically run ntpd at boot-up:  

chkconfig --level 35 ntpd on 
 

e. Start ntpd on the new node:

service ntpd start
ntpd: Synchronizing with time server:           [  OK  ]
Starting ntpd:                                  [  OK  ] 
 

f. The accelerator node reports in local time. The time zone is controlled by the file /etc/localtime, which is what asktime modifies when setting the time zone. The simplest way to set it is to copy it straight from the utility node:

scp /etc/localtime root@accelerator:/etc/localtime
 

g. Use the date command on the new accelerator node to verify that the correct time and time zone are being reported.

 

Section #9 - Timesync issues during normal operations:

NTP is very reliable unless there is a change in network configuration, time server IPs are modified, and so on. 

As Avamar syncs its nodes to the utility and first storage nodes (0.s and 0.0), changes can be made and not actually become a problem for some time.

If nodes are misconfigured, they may not be able to synchronize with the utility node or first storage node and will eventually fall out of sync with the other data nodes.

The grid (GSAN) checks that time synchronization is appropriate at the start of each maintenance activity. If the time differs by more than two seconds between any nodes, the activity fails. A message similar to the following can be seen in the err.log file on one or more nodes:

2010/12/30-02:23:10.57646 {0.3} [cpman:3411]  WARN: <0980> samconn::dpntimecheck retrying dpn time check mytime=1293675790
2010/12/30-02:23:10.57712 {0.3} [cpman:3411]  WARN: <0980> samconn::dpntimecheck retrying dpn time check mytime=1293675790
2010/12/30-02:23:10.57782 {0.3} [cpman:3411]  ERROR: <0001> samconn::dpntimecheck time mismatch: synchronize clocks and retry
 

Or this from status.dpn:

Checkpoint failed with result MSG_ERR_BADTIMESYNC : cp.20101229150030 started Wed Dec 29 07:00:30
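
The mytime value in the err.log lines is a UNIX epoch timestamp, so it can be converted and compared with the log timestamp or with the time on other nodes. A minimal sketch, assuming GNU date (for the "-d @epoch" form); the function name mytime_to_utc is illustrative.

```shell
# Extract mytime=<epoch> values from log lines on stdin and print each
# distinct value alongside its UTC timestamp.
mytime_to_utc() {
    sed -n 's/.*mytime=\([0-9][0-9]*\).*/\1/p' | sort -u |
    while read -r epoch; do
        printf '%s %s\n' "$epoch" "$(date -u -d "@$epoch" '+%Y/%m/%d-%H:%M:%S')"
    done
}

# Example:
#   grep dpntimecheck err.log | mytime_to_utc
```

For the sample err.log lines above, this prints "1293675790 2010/12/30-02:23:10", which matches the timestamp at the start of those log lines.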
 

To resolve this issue, review the ntpq output as described above to determine which nodes are not able to synchronize and why. Work with the customer to see if there has been a recent network change which has caused this.
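
Reviewing ntpq output node by node can be shortened by reducing it to one line per node. This is a sketch only: it assumes that mapall prefixes each node's output with a "(node) ssh ..." header line, as in the mapall samples earlier in this article, and the function name sync_source_per_node is illustrative.

```shell
# From "mapall --all ntpq -pn" output on stdin, print each node followed by
# its selected peer, or NONE when no row for that node is marked with "*".
sync_source_per_node() {
    awk '
        function report() {
            if (node != "") print node, (peer == "" ? "NONE" : peer)
        }
        /^\([0-9]+\.[0-9a-z]+\) ssh/ {
            report()
            node = $1; gsub(/[()]/, "", node); peer = ""
            next
        }
        /^\*/ { peer = substr($1, 2) }
        END { report() }
    '
}

# Example:
#   mapall --all ntpq -pn 2>&1 | sync_source_per_node
```

Nodes reported as NONE are the ones to investigate first with ntpq and ntpdate as described above.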

One common cause of this is asktime having been run incorrectly, in that only new nodes were selected for modification. asktime correctly configures those nodes but does not update the ntp.conf access control lists on 0.s and 0.0 to add the IPs of the new nodes. In addition, it does not restart NTPD on 0.s and 0.0, which is required for them to re-read the ntp.conf file. As a result, the new nodes will never synchronize time with the grid.

If an external time server is specified, then they should sync time with that server (as should the grid) so both the grid and the new nodes will appear to have an authoritative time server. However, the additional nodes cannot sync to 0.0 and 0.s and so if the external time server becomes unavailable then they will eventually fall out of sync and fail.

Additional Information

Additional Notes: 
 
 

APPENDIX A: 

Sample output for ntpdate command showing a good response:

ntpdate -d 168.xxx.xx.x
29 Dec 15:47:14 ntpdate[10500]: ntpdate 4.2.0a@1.1190-r Thu Oct  5 04:11:33 EDT 2006 (1)
Looking for host 168.xxx.xx.x and service ntp
host found : h-host.company.com
transmit(168.xxx.xx.x)
receive(168.xxx.xx.x)
transmit(168.xxx.xx.x)
receive(168.xxx.xx.x)
transmit(168.xxx.xx.x)
receive(168.xxx.xx.x)
transmit(168.xxx.xx.x)
receive(168.xxx.xx.x)
transmit(168.xxx.xx.x)
server 168.xxx.xx.x, port 123
stratum 2, precision -17, leap 00, trust 000
refid [168.xxx.xx.x], delay 0.10268, dispersion 0.00024
transmitted 4, in filter 4
reference time:    d0c646c4.f9ac7000  Wed, Dec 29 2010 15:46:12.975
originate timestamp: d0c64702.fb7f0000  Wed, Dec 29 2010 15:47:14.982
transmit timestamp:  d0c64702.f192b7fe  Wed, Dec 29 2010 15:47:14.943
filter delay:  0.10661  0.10268  0.10274  0.10298
         0.00000  0.00000  0.00000  0.00000
filter offset: -0.00184 0.000006 0.000053 -0.00000
         0.000000 0.000000 0.000000 0.000000
delay 0.10268, dispersion 0.00024
offset 0.000006
29 Dec 15:47:15 ntpdate[10500]: adjust time server 168.xxx.xx.x offset 0.000006 sec 
 

 

Sample output for ntpdate command showing a poor response:

ntpdate -d 168.xxx.xx.x
29 Dec 15:49:09 ntpdate[10699]: ntpdate 4.2.0a@1.1190-r Thu Oct  5 04:11:33 EDT 2006 (1)
Looking for host 168.xxx.xx.x and service ntp
host found : j-host.company.com
transmit(168.xxx.xx.x)
transmit(168.xxx.xx.x)
transmit(168.xxx.xx.x)
transmit(168.xxx.xx.x)
transmit(168.xxx.xx.x)
168.xxx.xx.x: Server dropped: no data
server 168.xxx.xx.x, port 123
stratum 0, precision 0, leap 00, trust 000
refid [168.xxx.xx.x], delay 0.00000, dispersion 64.00000
transmitted 4, in filter 4
reference time:    00000000.00000000  Wed, Feb  6 2036 22:28:16.000
originate timestamp: 00000000.00000000  Wed, Feb  6 2036 22:28:16.000
transmit timestamp:  d0c64778.b4da6e75  Wed, Dec 29 2010 15:49:12.706
filter delay:  0.00000  0.00000  0.00000  0.00000
         0.00000  0.00000  0.00000  0.00000
filter offset: 0.000000 0.000000 0.000000 0.000000
         0.000000 0.000000 0.000000 0.000000
delay 0.00000, dispersion 64.00000
offset 0.000000
29 Dec 15:49:13 ntpdate[10699]: no server suitable for synchronization found

Affected Products

Avamar, Avamar Server
Article Properties
Article Number: 000163671
Article Type: Solution
Last Modified: 24 Sep 2025
Version:  10