NTP sync issue on the 2nd node

Question

Out of my 3-node cluster, the 2nd node at both the production site and the DR site would not stay in sync with the time server. To fix this, I ran the following command to re-sync the time with the time server:

isi_for_array -s 'killall ntpd && ntpdate xxx.xxx.xxx.xxx'

Later I noticed that this worked only for a while, and then the second node would drift out of sync again (maybe after at least 4 hours). Fortunately, Isilon OneFS is Unix-based, so I wrote a cron job that re-syncs the time with the NTP server every 15 minutes:

5,20,35,50 * * * * isi_for_array -s 'killall ntpd && ntpdate xxx.xxx.xxx.xxx'

With this in place, the time is now up to date with the time server, and all 3 nodes show accurate time at all times. Any comments or thoughts, gentlemen?

johnsonka · Answer

Hello tazatemc,

Thank you for your question! When it comes to NTP on the cluster, there are a few things we can check in order to determine whether the issue may be environmental or a cluster malfunction.

First, are you using an authoritative time source such as time.gov, a local NTP server, or your Windows Domain Controllers? If you are using your DCs, are they configured as NTP servers, or are you relying on smbtime? If you are relying on smbtime, it would be prudent to configure your DCs as NTP servers, as the time sync will be far more reliable.

Additionally, we should address potential NTP drift by resetting the NTP drift file on the cluster. I would recommend:

# isi_for_array "killall ntpd && cat /dev/null > /var/crash/ntp.drift && ntpdate -ub xxx.xxx.xxx.xxx"

Once this is complete, wait at least 5 minutes and then run:

# isi_for_array "ntpdate -ub xxx.xxx.xxx.xxx"
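Some background on why the drift file matters (this is general ntpd behavior, not anything Isilon-specific): ntpd records its estimate of the local clock's frequency error, in parts per million (PPM), in the drift file and reuses that value when it starts. Any frequency error that is not being corrected accumulates linearly with time, so a node that looks fine right after an ntpdate can be visibly off a few hours later. The 100 PPM figure below is purely illustrative, not a value measured on this cluster:

```latex
% Offset accumulated by a clock with a constant, uncorrected frequency error \varepsilon
\Delta t = \varepsilon \cdot t
% Illustrative only: \varepsilon = 100\,\text{ppm} = 100 \times 10^{-6}, \quad t = 4\,\text{h} = 14\,400\,\text{s}
\Delta t = 100 \times 10^{-6} \times 14\,400\,\text{s} = 1.44\,\text{s}
```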

After updating the time the second time, please check the status of the NTP connections between the nodes and their time sources. You can do this by running:

# isi_for_array -s "ntpq -np"

The output will look something like this (this example checks only specific nodes; on your 3-node cluster, please check all of them, as the command above does):

[Image: sample ntpq -np output]

You can interpret this table by using this graphic:

[Image: legend explaining the ntpq -np output]

When reviewing this output, look for any node that has not elected a peer other than itself, or any node that has marked another node or your time source as a "false ticker." If you run into a false-ticker issue, or your nodes are not electing a peer, I would recommend opening a service request with Support.

To create a service request, you have a couple of options:

1. Log in to your online account on support.emc.com and go to this page: https://support.emc.com/servicecenter/createSR

2. Call EMC Isilon Support at 1-800-782-4362 (for a complete list of local country dial-in numbers, please see this document: http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf)

If none of your nodes are electing a peer, or all of them are marking your time source as a false ticker, I would recommend pointing your cluster at an authoritative time source rather than one configured in your environment (this may be a good first troubleshooting step as well). This moves troubleshooting beyond your environment and network: if your nodes can stay in sync and peer with the authoritative time source, the issue does not lie in the cluster or the NTP daemon, but in the time source you had been using.
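To check these conditions on all three nodes at once, you could save each node's "ntpq -np" output and scan it with a small script. The following is a minimal Python sketch, not an Isilon or EMC tool. It assumes the standard ntpq tally codes in the first column ("*" system peer, "o" PPS system peer, "x" false ticker) and treats 127.127.1.x as the local clock pseudo-address. The node names and sample output below are hypothetical.

```python
"""Minimal sketch: flag the NTP problems described above in `ntpq -np` output."""

# First-column tally codes printed by ntpq (standard ntpd behavior).
SYSTEM_PEER_CODES = {"*", "o"}  # '*' = system peer, 'o' = PPS system peer
FALSE_TICKER_CODE = "x"
# 127.127.1.x is the local clock driver, i.e. the node "peering with itself".
LOCAL_CLOCK_PREFIX = "127.127.1."


def parse_ntpq(text):
    """Return (tally_code, remote_address) pairs from one node's `ntpq -np` output."""
    peers = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header, and the ===== separator.
        if not stripped or stripped.startswith("remote") or stripped.startswith("="):
            continue
        fields = line[1:].split()
        if fields:
            peers.append((line[0], fields[0]))
    return peers


def diagnose(text):
    """List the problems discussed in this thread for a single node's output."""
    peers = parse_ntpq(text)
    problems = []
    system_peers = [remote for code, remote in peers if code in SYSTEM_PEER_CODES]
    # Flag nodes with no system peer at all, or whose only system peer is the local clock.
    if all(remote.startswith(LOCAL_CLOCK_PREFIX) for remote in system_peers):
        problems.append("no system peer other than the local clock")
    for code, remote in peers:
        if code == FALSE_TICKER_CODE:
            problems.append("false ticker: " + remote)
    return problems


if __name__ == "__main__":
    # Hypothetical outputs for illustration only; paste real `ntpq -np` text here.
    header = (
        "     remote           refid      st t when poll reach   delay   offset  jitter\n"
        "==============================================================================\n"
    )
    outputs = {
        "node-1": header
        + "*10.0.0.1        .GPS.            1 u   33   64  377    0.512    0.123   0.045\n",
        "node-2": header
        + "x10.0.0.1        .GPS.            1 u   33   64  377    0.512  900.123   0.045\n"
        "*127.127.1.0     .LOCL.          10 l   50   64  377    0.000    0.000   0.000\n",
    }
    for node, text in outputs.items():
        print(node, diagnose(text) or "OK")
```

A node that reports "OK" has elected a real peer and sees no false tickers. If every node reports the same false ticker, that points at the time source rather than the cluster, as described above.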

Please let us know if you have any additional questions or concerns. We would be more than happy to help!

virtualphoton · Answer

I appreciate your explanation, Katie.

Today I upgraded my cluster from OneFS 7.2.0.2 to 7.2.0.3, and I noticed that the time is now syncing fine with our external NTP server. Things look extremely smooth with this release.

Thanks again!

Taz~
