ECS: xDoctor: RAP081: Symptom Code: 2048: NTP daemon not running
Summary: xDoctor detected a Network Time Protocol (NTP) daemon issue.
Symptoms
All nodes in an ECS rack should have the NTP daemon running, and the configured NTP servers should be able to synchronize time. Otherwise, frontend data ingestion may be affected. NTP best practice is to configure at least four and no more than seven NTP servers.
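A minimal Bash sketch for checking the server count on one node. It assumes the servers are listed as `server` lines in /etc/ntp.conf (the configuration file passed to ntpd with `-c` in the command lines shown later in this article). The function names are hypothetical, and `pool` entries are not counted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count "server" lines in an ntpd config file and
# compare the count with the recommended range of 4 to 7 servers.

count_ntp_servers() {
  local conf="${1:-/etc/ntp.conf}"
  # Count lines whose first field is exactly "server"; commented-out
  # lines such as "# server ..." start with "#" and are not counted.
  awk '$1 == "server" { n++ } END { print n + 0 }' "$conf"
}

check_ntp_server_count() {
  local conf="${1:-/etc/ntp.conf}"
  local n
  n=$(count_ntp_servers "$conf") || return 2   # file missing or unreadable
  if [ "$n" -ge 4 ] && [ "$n" -le 7 ]; then
    echo "OK: $n NTP servers configured"
  else
    echo "WARN: $n NTP servers configured (recommended: 4-7)"
    return 1
  fi
}
```

For example, `check_ntp_server_count /etc/ntp.conf` prints a one-line verdict and returns non-zero when the count is outside the recommended range.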
| Symptom | Message |
|---|---|
| NTPD_NOT_RUNNING | NTP daemon not running |
This symptom means that ntpd is not running on the nodes listed in the 'Extra' field.
Cause
This symptom is reported as a WARNING for the first 24 hours and remains a WARNING if the issue does not recur within that window. If the issue persists beyond 24 hours, the severity is raised to ERROR and RAP081 is reported.
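The escalation rule amounts to a time-window comparison. The sketch below is a hypothetical illustration, not xDoctor's implementation: it only models persistence (it does not track recurrence), and it treats exactly 24 hours as still a WARNING, which is an assumption about the boundary.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the WARNING -> ERROR escalation window.
# Inputs are Unix epoch seconds; "now" defaults to the current time.
rap081_severity() {
  local first_seen="$1"
  local now="${2:-$(date +%s)}"
  local window=$((24 * 60 * 60))   # 24 hours = 86400 seconds
  if [ $((now - first_seen)) -gt "$window" ]; then
    echo "ERROR"      # persisted beyond 24 hours: RAP081 is reported
  else
    echo "WARNING"    # still inside the 24-hour window
  fi
}
```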
Resolution
To set up the automation repository for xDoctor 4-8.104.0 and later, follow the knowledge article ECS: ObjectScale: How to run KB Automation Scripts (Auto Pilot).
1. Run the automation command from a rack control node running xDoctor 4-8.104.0 or later.
Only --target-rack is supported for this action. This script was introduced in the xDoctor 4-8.104.0 release, which includes version 3.0.0 of the Ansible automation scripts.
# sudo xdoctor autopilot --kb 64221 --target-rack red
Example:
admin@ecs-n1:~> sudo xdoctor autopilot --kb 64221 --target-rack red
Checking for existing screen sessions...
Starting screen session 'autopilot_kb_64221_20250630_162310'...
Screen session 'autopilot_kb_64221_20250630_162310' started successfully.
Attaching to screen session 'autopilot_kb_64221_20250630_162310'...
2. The automation may take several minutes to complete. The rolling output may show transient 'failures', which can be safely ignored.
3. The final summary shows relevant status information and any restarts or changes made by the automation. It also includes recommendations for known conditions.
Clean Example:
TASK [Summary Dump] *****************************************************************************************************************************************************************************************************************
ok: [169.254.1.1] => {
"formatted_summary": [
"| ========================== NTP SUMMARY ============================= |",
"| |",
"| NTP addresses: 10.174.xxx.52 10.18.yyy.52 10.104.zz.52 10.34.ww.52 |",
"| Management Network Separation: False |",
"| NTP checks results: |",
"| - No issue with 10.174.xxx.52 |",
"| - No issue with 10.18.yyy.52 |",
"| - No issue with 10.104.zz.52 |",
"| - No issue with 10.34.ww.52 |",
"| General system time: 1751300610 (epoch): Mon Jun 30 16:23:30 UTC 2025 |",
"| |",
"| Current Times: |",
"| 169.254.1.1--> date: 06/30/25 16:23:34 hwClock: 06/30/25 16:23:34 |",
"| 169.254.1.2--> date: 06/30/25 16:23:34 hwClock: 06/30/25 16:23:34 |",
"| 169.254.1.3--> date: 06/30/25 16:23:34 hwClock: 06/30/25 16:23:34 |",
"| 169.254.1.4--> date: 06/30/25 16:23:34 hwClock: 06/30/25 16:23:34 |",
"| 169.254.1.5--> date: 06/30/25 16:23:34 hwClock: 06/30/25 16:23:34 |"
]
}
PLAY RECAP **************************************************************************************************************************************************************************************************************************
169.254.1.1 : ok=66 changed=12 unreachable=0 failed=0 skipped=64 rescued=1 ignored=1
169.254.1.2 : ok=34 changed=8 unreachable=0 failed=0 skipped=35 rescued=1 ignored=0
169.254.1.3 : ok=34 changed=8 unreachable=0 failed=0 skipped=35 rescued=1 ignored=0
169.254.1.4 : ok=34 changed=8 unreachable=0 failed=0 skipped=35 rescued=1 ignored=0
169.254.1.5 : ok=34 changed=8 unreachable=0 failed=0 skipped=35 rescued=1 ignored=0
=====================================================================================================================================================================================================================================
Status: PASS
Time Elapsed: 0h 0m 24s
Debug log: /tmp/autopilot/log/autopilot_64221_20250630_162310.log
Message: SysTime Collected: 1751300610 ; OS date (epoch): 1751300610 ; hwclock (epoch): 1751300610
=====================================================================================================================================================================================================================================
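When reviewing a long autopilot run, the PLAY RECAP can be scanned for hosts with failed or unreachable tasks. A minimal sketch, assuming the recap lines keep the `host : key=value ...` layout shown above; the function name is hypothetical. In the clean example every host reports `unreachable=0 failed=0`, so the function prints nothing for it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Ansible PLAY RECAP lines on stdin and print
# every host that reports at least one unreachable or failed task.
recap_problem_hosts() {
  awk '
    / : ok=[0-9]+/ {
      host = $1; bad = 0
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")      # e.g. "failed=0" -> kv[1]="failed", kv[2]="0"
        if ((kv[1] == "unreachable" || kv[1] == "failed") && kv[2] + 0 > 0) bad = 1
      }
      if (bad) print host
    }'
}
```

Rescued and ignored tasks are deliberately not flagged, since the clean example reports `rescued=1` on every host and still ends with `Status: PASS`.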
Restart Example:
TASK [Summary Dump] *********************************************************************************************************************************************************************
ok: [169.254.1.7] => {
"formatted_summary": [
"| ========================== NTP SUMMARY ============================= |",
"| |",
"| NTP addresses: 10.xxx.yyy.52 10.xx.yy.52 |",
"| Management Network Separation: False |",
"| General system time: 1731540353 (epoch) |",
"| ntpd was restarted on 169.254.1.1 |",
"| Node: 169.254.1.1 | AssID: 11417 | NTP Addr: 10.xx.yy.52 | Status Code: 9014 | |",
"| ntpd was restarted on 169.254.1.2 |",
"| Node: 169.254.1.2 | AssID: 35745 | NTP Addr: 10.xx.yy.52 | Status Code: 9014 | |",
"| ntpd was restarted on 169.254.1.4 |",
"| Node: 169.254.1.4 | AssID: 19898 | NTP Addr: 10.xx.yy.52 | Status Code: 9014 | |",
"| |",
"| == RECOMMENDATIONS KEY == |",
"| Network Issue: Have customer check routes (ipv4+ipv6) to the NTP through all fw, switches, and VLANs |",
"| Auth Issue: Authentication should not be required. Customer will need to resolve |",
"| Config Issue: Consult with SWARM / CE |",
"| Mgmt Route Missing: Reset mgmt separation from setrackinfo OR consult SWARM |",
"| Port 123 on ns_mgmt: Confirm with customer that port config is intended |",
"| SUGGESTION: If any NTP is unsuitable and can be safely removed/replcaced, follow kb 19614 |"
]
}
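To pull the nodes where the automation restarted ntpd out of a saved copy of the summary, a one-line filter is enough. This sketch assumes the exact wording `ntpd was restarted on <address>` shown in the Restart Example; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read summary lines on stdin and print each node
# address that follows "ntpd was restarted on", once, in sorted order.
restarted_nodes() {
  sed -n 's/.*ntpd was restarted on \([0-9.]*\).*/\1/p' | sort -u
}
```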
After an ntpd restart, one or more NTP servers may remain in a 'rejected' state until the NTP peer associations settle.
If the summary reports any errors or conditions, or if any symptoms need further explanation, engage ECS and OBS support to investigate.
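The `Status Code` values in the Restart Example (for example `9014`) appear to be ntpd peer status words in hexadecimal, as printed by `ntpq -c associations`; treat that reading as an assumption. In that encoding, bits 8-10 of the word hold the peer selection code, where 0 means the peer is currently rejected. A minimal sketch with a hypothetical function name that decodes only that field:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the selection field of a 4-digit hex NTP
# peer status word (for example 9014 -> reject).
ntp_select_state() {
  local code="$1"
  case "$code" in
    [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) ;;
    *) echo "invalid status word: $code" >&2; return 2 ;;
  esac
  # Selection codes 0-7, in order.
  local names=(reject falsetick excess outlier candidate backup sys.peer pps.peer)
  # Bits 8-10 of the status word carry the selection code.
  echo "${names[$(( (16#$code >> 8) & 7 ))]}"
}
```

A peer that decodes to `reject` shortly after a restart is consistent with the settling behavior described above; a peer that stays rejected is worth checking against the RECOMMENDATIONS KEY.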
Manual steps to identify and restart ntpd on nodes where it is not running:
Verification:
- Check whether the NTP service is running:
Command:
# sudo service ntpd status
Example:
admin@ecsnode1:~> sudo service ntpd status
* ntpd.service - NTP Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/ntpd.service.d
           `-50-insserv.conf-$time.conf
   Active: inactive (dead) since Wed 2019-08-07 20:00:00 UTC; 3s ago
     Docs: man:ntpd(1)
 Main PID: 63810 (code=exited, status=0/SUCCESS)
Aug 07 19:25:49 ecsnode1.gslabs.lab.emc.com sntp[63803]: 2019-08-07 19:25:49.504908 (+0000) -0.00017 +/- 0.051426 10.73.242.40 s2 no-leap
Aug 07 19:25:49 ecsnode1.gslabs.lab.emc.com start-ntpd[63780]: Time synchronized with 10.73.242.40
Aug 07 19:25:50 ecsnode1.gslabs.lab.emc.com ntpd[63809]: ntpd 4.2.8p12@1.3728-o Wed Oct 17 16:05:35 UTC 2018 (1): Starting
Aug 07 19:25:50 ecsnode1.gslabs.lab.emc.com ntpd[63809]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -x -g -u ntp:ntp -c /etc/ntp.conf
Aug 07 19:25:50 ecsnode1.gslabs.lab.emc.com ntpd[63810]: proto: precision = 0.089 usec (-23)
Aug 07 19:25:50 ecsnode1.gslabs.lab.emc.com ntpd[63810]: switching logging to file /var/log/ntp
Aug 07 19:25:50 ecsnode1.gslabs.lab.emc.com start-ntpd[63780]: Starting network time protocol daemon (NTPD)
Aug 07 19:25:50 ecsnode1.gslabs.lab.emc.com systemd[1]: Started NTP Server Daemon.
Aug 07 20:00:00 ecsnode1.gslabs.lab.emc.com systemd[1]: Stopping NTP Server Daemon...
Aug 07 20:00:00 ecsnode1.gslabs.lab.emc.com systemd[1]: Stopped NTP Server Daemon
- Check whether an ntpd Process Identifier (PID) is present:
Command:
# sudo service ntpd status;ps ax | grep ntpd | grep -v grep
Example (PID is missing):
admin@node1:~> ps ax | grep ntpd | grep -v grep
admin@node1:~>
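The two checks above can also be scripted. This is a minimal Bash sketch, not part of xDoctor: the function names `ntpd_active_state` and `ntpd_pids` are hypothetical, and both simply parse the text output shown in the examples. On systemd hosts, `systemctl is-active ntpd` (which prints `active` or `inactive`) and `pgrep -x ntpd` are more direct alternatives.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that parse the output of the verification commands.

# Reads "service ntpd status" output on stdin and prints the state word that
# follows "Active:", for example "active" or "inactive".
ntpd_active_state() {
  awk '$1 == "Active:" { print $2; exit }'
}

# Reads "ps ax" output on stdin (columns: PID TTY STAT TIME COMMAND) and prints
# the PID of each ntpd process. Matching on the command column, rather than on
# the whole line, means a "grep ntpd" process is never reported.
ntpd_pids() {
  awk '$5 == "ntpd" || $5 == "ntpd:" || $5 ~ /\/ntpd$/ { print $1 }'
}
```

For example, `sudo service ntpd status | ntpd_active_state` and `ps ax | ntpd_pids`; empty output from the second command corresponds to the "PID is missing" example.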
Resolution:
- If ntpd is not running, restart the service:
Command:
# sudo service ntpd restart
Example:
admin@node1:~> sudo service ntpd restart
admin@node1:~>
- Confirm that the service is running and that the ntpd PID is present on the ECS node:
Command:
# sudo service ntpd status;ps ax | grep ntpd | grep -v grep
Example:
admin@node1:~> sudo service ntpd status;ps ax | grep ntpd | grep -v grep
* ntpd.service - NTP Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/ntpd.service.d
           `-50-insserv.conf-$time.conf
   Active: active (running) since Wed 2019-08-07 20:13:27 UTC; 3min 25s ago
     Docs: man:ntpd(1)
  Process: 913 ExecStart=/usr/sbin/start-ntpd start (code=exited, status=0/SUCCESS)
 Main PID: 944 (ntpd)
    Tasks: 2 (limit: 512)
   Memory: 820.0K
      CPU: 588ms
   CGroup: /system.slice/ntpd.service
           |-944 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -x -g -u ntp:ntp -c /etc/ntp.conf
           `-945 ntpd: asynchronous dns resolver
Aug 07 20:13:26 ecsnode1.gslabs.lab.emc.com systemd[1]: Starting NTP Server Daemon...
Aug 07 20:13:26 ecsnode1.gslabs.lab.emc.com sntp[937]: sntp 4.2.8p12@1.3728-o Wed Oct 17 16:05:30 UTC 2018 (1)
Aug 07 20:13:26 ecsnode1.gslabs.lab.emc.com sntp[937]: 2019-08-07 20:13:26.567273 (+0000) +0.00003 +/- 0.048796 10.73.242.40 s2 no-leap
Aug 07 20:13:26 ecsnode1.gslabs.lab.emc.com start-ntpd[913]: Time synchronized with 10.73.242.40
Aug 07 20:13:27 ecsnode1.gslabs.lab.emc.com ntpd[943]: ntpd 4.2.8p12@1.3728-o Wed Oct 17 16:05:35 UTC 2018 (1): Starting
Aug 07 20:13:27 ecsnode1.gslabs.lab.emc.com ntpd[943]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -x -g -u ntp:ntp -c /etc/ntp.conf
Aug 07 20:13:27 ecsnode1.gslabs.lab.emc.com ntpd[944]: proto: precision = 0.074 usec (-24)
Aug 07 20:13:27 ecsnode1.gslabs.lab.emc.com ntpd[944]: switching logging to file /var/log/ntp
Aug 07 20:13:27 ecsnode1.gslabs.lab.emc.com start-ntpd[913]: Starting network time protocol daemon (NTPD)
Aug 07 20:13:27 ecsnode1.gslabs.lab.emc.com systemd[1]: Started NTP Server Daemon.
  944 ?        Ss     0:00 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -x -g -u ntp:ntp -c /etc/ntp.conf
  945 ?        S      0:00 ntpd: asynchronous dns resolver
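When several nodes are listed in the 'Extra' field, the restart and confirmation steps can be repeated per node. This is a minimal Bash sketch with hypothetical function names, not an ECS or xDoctor tool; reaching the nodes over plain `ssh` is an assumption, so adapt it to however you normally run commands on each node. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the resolution steps; not part of xDoctor.

# Restart ntpd on each node given as an argument. Plain ssh is an assumption.
# With DRY_RUN=1 the commands are printed instead of executed.
restart_ntpd_on_nodes() {
  local node rc=0
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $node sudo service ntpd restart"
    else
      ssh "$node" sudo service ntpd restart || { echo "restart failed on $node" >&2; rc=1; }
    fi
  done
  return "$rc"
}

# Reads "service ntpd status" output on stdin and prints the Main PID value.
ntpd_main_pid() {
  awk '$1 == "Main" && $2 == "PID:" { print $3; exit }'
}

# Succeeds if a process with the given PID currently exists (Linux /proc).
pid_alive() {
  [ -n "$1" ] && [ -d "/proc/$1" ]
}
```

For example, `pid=$(sudo service ntpd status | ntpd_main_pid); pid_alive "$pid" && echo running`. The inactive example earlier in this article still prints a Main PID (63810) although that process has exited, which is why the sketch checks /proc instead of trusting the status text alone.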
Additional Information
If the steps above do not resolve the issue, engage the customer's network team to resolve the NTP issue.
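Before engaging the network team, it can help to record which configured servers ntpd is not reaching. A minimal sketch with a hypothetical function name that parses `ntpq -pn` output on stdin. It assumes the standard column layout, where the first character of each peer line is the tally code (`*` marks the selected system peer) and the seventh column is the octal reach register (0 means no reply in the last eight polls). Lines too short to parse, such as wrapped long addresses, are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize "ntpq -pn" output read on stdin.
# Prints "UNREACHABLE <addr>" for each peer with reach 0, then either
# "SYSPEER <addr>" or "NO_SYSPEER".
ntp_peer_report() {
  awk '
    NR <= 2 { next }            # skip the column header and the ==== rule
    NF < 10 { next }            # skip wrapped or otherwise short lines
    {
      tally = substr($0, 1, 1)  # first character is the tally code
      remote = (tally == " ") ? $1 : substr($1, 2)
      if ($7 == "0") print "UNREACHABLE " remote
      if (tally == "*") syspeer = remote
    }
    END { print (syspeer != "" ? "SYSPEER " syspeer : "NO_SYSPEER") }
  '
}
```

For example, `ntpq -pn | ntp_peer_report` on an affected node gives the network team a short list of server addresses to trace.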
For symptom 'All NTP servers are NOT suitable for synchronization' (NTP_NOT_SUITABLE_ERROR), see knowledge article ECS: xDoctor: RAP081: Symptom Code: 2048: All NTP servers are NOT suitable for synchronization.
For symptom 'All NTP servers adjust an offset higher than the error threshold' (NTP_ERROR_OFFSET_ERROR), see knowledge article ECS: xDoctor: RAP081: Symptom Code: 2048: All NTP servers adjust an offset higher than the error threshold.
For symptom 'System time difference above ERROR Threshold', see knowledge article ECS: xDoctor: RAP081: Symptom Code: 2048: System time difference above ERROR threshold.
ECS: ObjectScale: How to run KB Automation Scripts (Auto Pilot)