ECS: xDoctor: RAP081: Symptom Code: 2048: All NTP servers are NOT suitable for synchronization

Summary: xDoctor detected a Network Time Protocol (NTP) daemon issue.


Symptoms

All nodes in an ECS rack should have the NTP daemon running, and the configured NTP servers should be capable of synchronizing time. If not, this may lead to problems with frontend data ingestion.

Symptom: NTP_NOT_SUITABLE_ERROR

Message = All NTP servers are NOT suitable for synchronization.
Extra = [List of nodes]

Cause

The symptom is reported as a WARNING for the first 24 hours.
If it persists beyond 24 hours, the severity is raised to ERROR and a RAP081 is reported.

Resolution

This means that each node listed in the 'Extra' field cannot synchronize with the NTP server.

Verification:
1. Get the list of NTP Servers on each of the listed nodes:

Command:

# getrackinfo -r | grep NTP

Example:

admin@node1:~> getrackinfo -r | grep NTP
        NTPServer =  xxx.xxx.xxx.xxx
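
The server list from step 1 can also be extracted in a form that is easier to reuse in later steps. The following is a minimal sketch, assuming the output keeps the "NTPServer = <value>" layout shown above and that multiple servers, if present, are separated by commas or spaces. The function name list_ntp_servers is hypothetical.

```shell
# Hypothetical helper: print one NTP server per line from
# "getrackinfo -r" style output supplied on stdin.
# Assumes lines shaped like "NTPServer = <value>[, <value>...]".
list_ntp_servers() {
    awk -F'=' '/NTPServer/ {
        # Split the value on commas and whitespace; skip empty pieces.
        n = split($2, parts, /[ \t,]+/)
        for (i = 1; i <= n; i++) if (parts[i] != "") print parts[i]
    }'
}
```

Usage on a node: getrackinfo -r | list_ntp_servers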

2. For each NTP Server listed in step 1, test if it is capable of synchronizing time.

Command:

# sudo ntpdate -p 2 -d <NTP IP Address / NTP FQDN>

Or

# sudo ntpdate -p 2 -d `getrackinfo -r | grep NTP |grep -oP "(?:[0-9]{1,3}\.){3}[0-9]{1,3}"`

Example (capable of synchronizing time):

admin@node1:~> sudo ntpdate -p 2 -d xxx.xxx.xxx.xxx
22 Feb 13:47:48 ntpdate[110901]: ntpdate 4.2.8p11@1.3728-o Thu Jun 14 09:26:52 UTC 2018 (1)
Looking for host <NTP IP Address> and service ntp
<NTP IP Address> reversed to <NTP hostname>
host found : <NTP hostname>
transmit(<NTP IP Address>)
receive(<NTP IP Address>)
transmit(<NTP IP Address>)
receive(<NTP IP Address>)
server <NTP IP Address>, port 123
stratum 2, precision -24, leap 00, trust 000
refid [<NTP IP Address>], delay 0.02615, dispersion 0.00003
transmitted 2, in filter 2
reference time:    e01a7b0d.af9e6616  Fri, Feb 22 2019 13:43:41.686
originate timestamp: e01a7c06.748e0c65  Fri, Feb 22 2019 13:47:50.455
transmit timestamp:  e01a7c06.7478b000  Fri, Feb 22 2019 13:47:50.454
filter delay:  0.02635  0.02615  0.00000  0.00000
         0.00000  0.00000  0.00000  0.00000
filter offset: 0.000043 -0.00002 0.000000 0.000000
         0.000000 0.000000 0.000000 0.000000
delay 0.02615, dispersion 0.00003
offset -0.000022

22 Feb 13:47:50 ntpdate[110901]: adjust time server <NTP IP address> offset -0.000022 sec

Example (not capable of synchronizing time):

admin@node1:~> sudo ntpdate -p 2 -d xxx.xxx.xxx.xxx
22 Feb 13:47:48 ntpdate[110901]: ntpdate 4.2.8p11@1.3728-o Thu Jun 14 09:26:52 UTC 2018 (1)
Looking for host <NTP IP Address> and service ntp
<NTP IP Address> reversed to <NTP hostname>
host found : <NTP hostname>
transmit(<NTP IP Address>)
transmit(<NTP IP Address>)
transmit(<NTP IP Address>)

server <NTP IP Address>, port 123
stratum 2, precision -24, leap 00, trust 000
refid [<NTP IP Address>], delay 0.02615, dispersion 0.00003
transmitted 2, in filter 2
reference time:    e01a7b0d.af9e6616  Fri, Feb 22 2019 13:43:41.686
originate timestamp: e01a7c06.748e0c65  Fri, Feb 22 2019 13:47:50.455
transmit timestamp:  e01a7c06.7478b000  Fri, Feb 22 2019 13:47:50.454
filter delay:  0.02635  0.02615  0.00000  0.00000
         0.00000  0.00000  0.00000  0.00000
filter offset: 0.000043 -0.00002 0.000000 0.000000
         0.000000 0.000000 0.000000 0.000000
delay 0.02615, dispersion 0.00003
offset -0.000022

22 Feb 13:47:50 ntpdate[112232]: no server suitable for synchronization found
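
The two examples above differ in two ways: the working server answers each transmit with a receive line, and the failing run ends with "no server suitable for synchronization found". As a rough sketch, that check could be automated as follows. The function name ntpdate_verdict is hypothetical, and the logic relies only on the output format shown in the examples above.

```shell
# Hypothetical helper: read "ntpdate -d" output on stdin and print
# "suitable" or "not suitable". Exit status is 0 only for "suitable".
ntpdate_verdict() {
    awk '
        /^receive\(/                                   { replies++ }
        /no server suitable for synchronization found/ { failed = 1 }
        END {
            # No replies at all, or the explicit failure message, means failure.
            if (failed || replies == 0) { print "not suitable"; exit 1 }
            print "suitable"
        }'
}
```

Usage on a node: sudo ntpdate -p 2 -d <NTP IP Address> 2>&1 | ntpdate_verdict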

3. Add the FQDN to the NTP section in the getrackinfo -r result.

Command:

# sudo setrackinfo -a NTPServer <NTP FQDN>

4. Check for network separation and static routes, because NTP traffic sent from the management interface over Policy-Based Routing (PBR) could cause the problem.

Command:

# getrackinfo -n;getrackinfo -t

Example:

admin@node1:~> getrackinfo -n;getrackinfo -t
Named networks
==============
Node ID       Network          Ip Address        Netmask            Gateway            VLAN               Interface
Static route list
=================
Node ID      Network            Netmask           Gateway           Interface

5. Confirm that the NTP servers are listening in the customer's environment. A firewall blocking the NTP port (UDP 123) is a common cause.

Command:

# sudo ntpq -c as

Example (below, one NTP server is not reachable and the other is blocked, likely due to an ACL):

admin@node1:~> sudo ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 56633  8011   yes    no  none    reject    mobilize  1
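
On a rack with several configured servers, the association list can be filtered down to the entries that are not reachable. This is a minimal sketch, assuming the column order shown above (ind, assid, status, conf, reach, auth, condition, last_event, cnt). The function name unreachable_associations is hypothetical.

```shell
# Hypothetical helper: read "ntpq -c as" output on stdin and print
# every association whose "reach" column is not "yes".
unreachable_associations() {
    awk '$1 ~ /^[0-9]+$/ && $5 != "yes" {
        # $2 = assid, $5 = reach, $7 = condition
        print "assid " $2 ": reach=" $5 ", condition=" $7
    }'
}
```

Usage on a node: sudo ntpq -c as | unreachable_associations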

6. Check for time drift between the nodes by comparing their epoch timestamps.

Command:

# viprexec "date +%s" 2>&1 | grep -E "^[0-9]{10}"

Example:

admin@node1:~> viprexec "date +%s" 2>&1 | grep -E "^[0-9]{10}"
1554470147
1554470111
1554470096
1554470142
1554470144
1554470109
1554470124
1554470140
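
To put a number on the drift, the spread between the lowest and highest timestamp can be computed. The sketch below assumes one epoch value per line, as in the example above. The function name epoch_spread is hypothetical. Part of the spread may reflect the time viprexec takes to reach each node rather than real clock drift, so treat the result as an upper bound.

```shell
# Hypothetical helper: read epoch timestamps (one per line) on stdin
# and print the difference in seconds between the largest and smallest.
epoch_spread() {
    awk 'NF == 1 && $1 ~ /^[0-9]+$/ {
        v = $1 + 0                      # force numeric comparison
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        n++
    }
    END { if (n > 0) print max - min }'
}
```

For the eight values in the example above, the spread is 1554470147 - 1554470096 = 51 seconds.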

7. Check for the ntpd service status and then restart the service. (Even if the status is up and running, proceed with the restart.) 
Note: The ntpd.service is a non-impact service.

Command:

# viprexec systemctl status ntpd.service | grep Active:

Example:

admin@node1:~> viprexec systemctl status ntpd.service | grep Active:
   Active: active (running) since Tue 2019-08-06 02:49:06 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Wed 2019-08-07 20:13:27 UTC; 58min ago
   Active: active (running) since Tue 2019-08-06 02:49:06 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
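
On larger racks, it can be easier to count the nodes whose service is not running than to read every line. This is a minimal sketch, assuming the "Active: <state> (<substate>)" layout printed by systemctl status as shown above. The function name count_inactive is hypothetical.

```shell
# Hypothetical helper: read the "Active:" lines on stdin and report
# how many are not in the "active (running)" state.
count_inactive() {
    awk '/Active:/ {
        total++
        if ($2 != "active" || $3 != "(running)") bad++
    }
    END { printf "%d of %d nodes not running\n", bad, total }'
}
```

Usage on a node: viprexec systemctl status ntpd.service | grep Active: | count_inactive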

Command: 

# viprexec systemctl restart ntpd.service

Example:

admin@node1:~> viprexec systemctl restart ntpd.service
Output from host : 192.168.219.8
Output from host : 192.168.219.7
Output from host : 192.168.219.6
Output from host : 192.168.219.4
Output from host : 192.168.219.3
Output from host : 192.168.219.2
Output from host : 192.168.219.5
Output from host : 192.168.219.1

8. Verify that the md5sum of the /etc/ntp.conf file is the same on all the nodes.

Command:

# viprexec "sudo md5sum /etc/ntp.conf"

Example:

admin@node1:~> viprexec "sudo md5sum /etc/ntp.conf"

Output from host : 192.168.219.2
741f0abb12ac82a21f150004bd407334  /etc/ntp.conf

Output from host : 192.168.219.5
741f0abb12ac82a21f150004bd407334  /etc/ntp.conf

Output from host : 192.168.219.4
741f0abb12ac82a21f150004bd407334  /etc/ntp.conf

Output from host : 192.168.219.1
7da6eb8009abc18ed1875f1f15ade72a  /etc/ntp.conf

Output from host : 192.168.219.3
741f0abb12ac82a21f150004bd407334  /etc/ntp.conf

Output from host : 192.168.219.8
741f0abb12ac82a21f150004bd407334  /etc/ntp.conf

Output from host : 192.168.219.6
741f0abb12ac82a21f150004bd407334  /etc/ntp.conf

Output from host : 192.168.219.7
741f0abb12ac82a21f150004bd407334  /etc/ntp.conf

Note: This may be due to having both public and management interfaces, with all nodes configured to go out of the public interface per the last configuration provided. On older versions of ECS, PBR can become stuck so that one node is valid and the rest of the nodes appear to be behind a firewall.
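
In the md5sum example above, node 192.168.219.1 reports a different checksum from the other seven nodes. The following sketch picks out such nodes automatically, assuming the "Output from host : <IP>" header followed by an md5sum line, as shown in the example. The function name md5_outliers is hypothetical. If two checksums are equally common, the one treated as the reference is arbitrary.

```shell
# Hypothetical helper: read viprexec md5sum output on stdin and print
# each host whose checksum differs from the most common one.
md5_outliers() {
    awk '
        /^Output from host/ { host = $NF; next }
        NF == 2 && length($1) == 32 {
            sum[host] = $1          # checksum reported by this host
            count[$1]++             # how many hosts share this checksum
        }
        END {
            for (s in count) if (count[s] > best) { best = count[s]; common = s }
            for (h in sum) if (sum[h] != common) print h, sum[h]
        }'
}
```

Usage on a node: viprexec "sudo md5sum /etc/ntp.conf" | md5_outliers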

9. Add port 123 to ns_mgmt in the getrackinfo -r result, and then check whether NTP has started transmitting and receiving.

Command:

# sudo setrackinfo -a ns_mgmt 123

Example:

admin@node1:~> sudo setrackinfo -a ns_mgmt 123

Should the error persist, move port 123 back to the public interface and check the synchronization again.

Command:

# sudo setrackinfo -d ns_mgmt 123

Example:

admin@node1:~> sudo setrackinfo -d ns_mgmt 123

Check the status of the NTP synchronization after performing each of the above steps.

Resolution:
If synchronization still fails after the above steps, the server as configured is either not an NTP server or is not functioning as expected. The customer's network team must be engaged to resolve the NTP issue.

Additional Information

If the above resolution does not work, the customer's network team must be engaged to resolve the NTP issue.

For symptom 'NTP daemon not running' (NTPD_NOT_RUNNING), see knowledge article:
ECS: xDoctor: RAP081: Symptom Code: 2048: NTP daemon not running

For symptom 'All NTP servers adjust an offset higher than the error threshold' (NTP_ERROR_OFFSET_ERROR), see knowledge article:
ECS: xDoctor: RAP081: Symptom Code: 2048: All NTP servers adjust an offset higher than the error threshold

For symptom 'System time difference above ERROR Threshold', see knowledge article:
ECS: xDoctor: RAP081: Symptom Code: 2048: System time difference above ERROR threshold

Affected Products

ECS

Products

ECS Appliance, ECS Appliance Gen 1, ECS Appliance Gen 2, ECS Appliance Gen 3, ECS Software

Article Properties

Article Number: 000230633
Article Type: Solution
Last Modified: 03 Oct 2024
Version: 2