OpenShift Event Code: 1038NODE0009
Summary: Clock not synchronising.
Symptoms
Cause
The NodeClockNotSynchronising alert triggers when a node is affected by issues with the NTP server for that node. For example, this alert might trigger when certificates are rotated for the API Server on a node, and the certificates fail validation because of an invalid time.
Resolution
Diagnosis
To diagnose the underlying issue, start a debug pod on the affected node and check the chronyd service:
oc -n default debug node/<affected_node_name>
systemctl status chronyd
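The same check can also be run non-interactively. The following is a minimal sketch, not a definitive procedure. It assumes the debug pod exposes the node's root filesystem at /host (the usual layout for oc debug node/...), so chroot /host is used before calling systemctl. The function name chronyd_status is hypothetical and is not part of the oc CLI.

```shell
# Hypothetical helper: show the chronyd service status on one node.
# Assumption: the debug pod mounts the node's root filesystem at /host,
# so "chroot /host" is needed for systemctl to reach the host's systemd.
chronyd_status() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: chronyd_status <affected_node_name>" >&2
    return 2
  fi
  # Everything after "--" is executed inside the debug pod.
  oc -n default debug "node/${node}" -- chroot /host systemctl status chronyd --no-pager
}
```

Calling chronyd_status <affected_node_name> then prints the service status without opening an interactive shell on the node.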
Mitigation
- If the chronyd service is failing or stopped, start it:
  systemctl start chronyd
  If the chronyd service is already running, restart it:
  systemctl restart chronyd
  If chronyd starts or restarts successfully, the service adjusts the clock and displays something similar to the following example output:
  Oct 18 19:39:36 ip-100-67-47-86 chronyd[2055318]: System clock wrong by 16422.107473 seconds, adjustment started
  Oct 19 00:13:18 ip-100-67-47-86 chronyd[2055318]: System clock was stepped by 16422.107473 seconds
- Verify that the chronyd service is running:
  systemctl status chronyd
- Verify using PromQL:
  min_over_time(node_timex_sync_status[5m])
  node_timex_maxerror_seconds
  node_timex_sync_status returns 1 if NTP is working properly, or 0 if NTP is not working properly. node_timex_maxerror_seconds indicates how many seconds NTP is falling behind.
  The alert triggers when the value for min_over_time(node_timex_sync_status[5m]) equals 0 and the value for node_timex_maxerror_seconds is greater than or equal to 16.
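To make the alert condition concrete, the sketch below evaluates it for a pair of values read from the two PromQL queries. It is an illustration only: the function name clock_alert_fires is hypothetical, and the values are assumed to have been copied by hand from the query results rather than fetched from Prometheus.

```shell
# Hypothetical helper: succeed (exit status 0) when the alert condition holds.
#   $1 = value of min_over_time(node_timex_sync_status[5m])  (0 or 1)
#   $2 = value of node_timex_maxerror_seconds                (may be fractional)
clock_alert_fires() {
  awk -v sync="$1" -v maxerr="$2" 'BEGIN {
    # Exit 0 when sync status is 0 AND max error is at least 16 seconds.
    exit !((sync + 0 == 0) && (maxerr + 0 >= 16))
  }'
}

# Example values (illustrative, not taken from a real cluster):
clock_alert_fires 0 16.5 && echo "alert would fire"
clock_alert_fires 1 16.5 || echo "clock in sync: no alert"
```

awk is used for the comparison because node_timex_maxerror_seconds can be fractional, which the shell's built-in integer tests do not handle.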
Support
If the steps above do not resolve the issue, contact Dell EMC technical support for further investigation.