ntpdate works, but ntpd can't synchronize
Solution 1
For those of you looking for a grand solution, I apologize. This is going to be cheesy.
Yes, the time server is unreachable, for a reason I couldn't ever determine. The good news is that one of the external DNS servers to which I have access turns out to be serving NTP packets itself, and it is connecting to that external time server for its ticks. It's a workaround, not a fix. But, I'll take what I can get.
So, in the end, I only lose one stratum of service.
As a side note, I did register with the NTP bug database so I could write enhancement bug 2297, asking for formal documentation for the peer refids .INIT., .LOCL., and LOCAL(0).
Solution 2
The 377 in the reach column means that connectivity is OK; telnet won't connect because NTP runs over UDP, and telnet only speaks TCP.
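The reach value is printed in octal and acts as an eight-bit record of the most recent polls, so 377 means all eight were answered. As a minimal sketch (the function name reach_successes is made up for illustration, not part of any NTP tool), the count of answered polls can be computed like this:

```shell
# reach_successes OCTAL
# Count how many of the last eight polls were answered, given the octal
# "reach" value shown by ntpq. Hypothetical helper, for illustration only.
reach_successes() {
    # A leading 0 makes printf read the argument as octal (377 -> 255).
    value=$(printf '%d' "0$1")
    count=0
    while [ "$value" -gt 0 ]; do
        count=$((count + (value & 1)))   # add the lowest bit
        value=$((value >> 1))            # move on to the next poll
    done
    echo "$count"
}

reach_successes 377   # every recent poll was answered
reach_successes 0     # no replies at all
```

A value that stays at 0 points at a connectivity or restriction problem; a value that climbs toward 377 means replies are arriving.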
Try removing the server 127.127.1.0 line from your config. The * by *LOCAL(0) tells us that the local clock with stratum 5 is being used for sync, preferred over the remote server with stratum 1; the delay and offset both being 0.000 likely has a lot to do with that.
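To watch which peer is selected without reading the table by eye, the tally character in the first column can be pulled out with awk. This is a rough sketch: show_tally is a hypothetical helper, it only labels the two tally codes discussed in this thread (* for sys.peer, blank for reject), and the sample input is the peer table from the question.

```shell
# show_tally: read "ntpq -p"-style output on stdin and print each peer
# with a label for its tally code. Hypothetical helper, illustration only.
# Assumes every peer line starts with a one-character tally column.
show_tally() {
    awk '
        $1 == "remote" || /^=/ || NF == 0 { next }   # skip header, rule, blanks
        {
            code = substr($0, 1, 1)          # tally character in column one
            split(substr($0, 2), f, " ")     # remaining whitespace-separated fields
            if (code == "*")      status = "sys.peer"
            else if (code == " ") status = "reject"
            else                  status = "other (" code ")"
            print f[1], status
        }
    '
}

# Sample input copied from the question; on a live host, pipe in "ntpq -pn".
show_tally <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.           5 l   36   64  377    0.000    0.000   0.001
 timeserver.doma .LOCL.           1 u   39  128  377    0.489   51.261  58.975
EOF
```

If the remote server never moves from reject to sys.peer after the local clock line is removed, the problem is not just peer selection.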
Solution 3
If you are going to include the local clock, fudge its stratum up a fair bit. It looks like you have it set to 5. I generally set it to at least 8 (fudge 127.127.1.0 stratum 8). If you don't fudge it, you can look like an atomic clock to other hosts on your network. On one network I scanned, I found a lot of low-stratum servers announcing times which were usually incorrect by hours or days.
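A quick way to check a config for this is sketched below. The helper name check_local_fudge and its messages are invented for illustration, and the default threshold of 8 simply follows the suggestion above; it only looks at uncommented server and fudge lines for 127.127.1.0.

```shell
# check_local_fudge FILE [MIN_STRATUM]
# Returns 0 if FILE has no local clock, or has one fudged to at least
# MIN_STRATUM (default 8); returns 1 otherwise. Hypothetical helper.
check_local_fudge() {
    awk -v min="${2:-8}" '
        /^[[:space:]]*#/ { next }                          # ignore comment lines
        $1 == "server" && $2 == "127.127.1.0" { has_local = 1 }
        $1 == "fudge"  && $2 == "127.127.1.0" {
            for (i = 3; i < NF; i++)                       # find "stratum N"
                if ($i == "stratum") stratum = $(i + 1)
        }
        END {
            if (!has_local) { print "no local clock configured"; exit 0 }
            if (stratum == "") { print "local clock has no fudge stratum"; exit 1 }
            if (stratum + 0 < min + 0) {
                print "local clock stratum " stratum " is below " min
                exit 1
            }
            print "local clock fudged to stratum " stratum
        }
    ' "$1"
}

# Usage on a real host (path as given in the question):
#   check_local_fudge /etc/ntp.conf
```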
Shane is correct about the reach value, which indicates you have access to the server. The high offset and jitter values for your time server indicate it may not be very reliable. They may also be high because your server is still synchronizing. The fact that the poll interval has increased to 128 indicates that your server is getting consistent results. It should gradually increase to 1024 seconds.
Try running a loop like:
while sleep 60; do
    ntpq -n -c peers
done
This will give you an idea how well ntp is working. You should see it stabilize over time.
There are a number of restrictions which can be set on ntpd to limit how much information about the server can be accessed remotely. It is possible you are restricted to only using the upstream server as a time source.
Firewall rules that require port 123 as both the source and destination port are possible. This provides a working ntp setup but limits access by other tools. Some tools allow you to use port 123 as the source port if it is available. I am partial to using ntpdate in debug mode.
If you are correct about the refid of the upstream server being your IP address, it appears to be using your server as its preferred time source. Try adding restrict noquery to your configuration. It may be that your upstream server is poorly configured. Try adding your router and/or nameservers as sources; I find they can be better sources than the official corporate server.
Solution 4
I had the same problem:
ntpq -p was showing reach = 0
Yet:
1- ntpd was running
2- ntp.conf had servers listed
3- ntpdate worked using those servers
4- ntpdate -u worked using those servers
5- nc showed TCP port 123 was open on those servers
6- nc showed UDP port 123 was open on those servers
So basically ntpdate worked and there were no firewall issues, and yet ntpq -p showed reach = 0 for each server listed.
Turned out to be the restrict lines in ntp.conf. I just removed all the restrict lines from ntp.conf and restarted ntpd and everything worked from there.
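Before deleting every restrict line, it can help to see which ones are active and to test with a copy that has them commented out. A small sketch follows; both function names are hypothetical, and sed -E is assumed to be available (it is in GNU and BSD sed).

```shell
# active_restricts FILE
# List uncommented restrict lines with their line numbers.
active_restricts() {
    grep -nE '^[[:space:]]*restrict([[:space:]]|$)' "$1"
}

# comment_out_restricts FILE
# Print FILE to stdout with every active restrict line commented out.
# The original file is left untouched.
comment_out_restricts() {
    sed -E 's/^([[:space:]]*restrict([[:space:]]|$))/#\1/' "$1"
}

# Usage on a real host (path is an assumption, adjust as needed):
#   active_restricts /etc/ntp.conf
#   comment_out_restricts /etc/ntp.conf > /tmp/ntp.conf.norestrict
```

Re-enabling the lines one at a time and restarting ntpd after each change shows which restriction was blocking synchronization.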
dafydd
Updated on September 18, 2022
Comments
-
dafydd over 1 year
This is in RHEL 5.5.
First, ntpdate to the remote host works:
$ ntpdate XXX.YYY.4.21
24 Oct 16:01:17 ntpdate[5276]: adjust time server XXX.YYY.4.21 offset 0.027291 sec
Second, here are the server lines in my /etc/ntp.conf. All restrict lines have been commented out for troubleshooting.
server 127.127.1.0
server XXX.YYY.4.21
I execute service ntpd start and check with ntpq:
$ ntpq
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.           5 l   36   64  377    0.000    0.000   0.001
 timeserver.doma .LOCL.           1 u   39  128  377    0.489   51.261  58.975
ntpq> opeer
     remote           local      st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        127.0.0.1        5 l   40   64  377    0.000    0.000   0.001
 timeserver.doma XXX.YYY.22.169   1 u   43  128  377    0.489   51.261  58.975
XXX.YYY.22.169 is the address of the host I'm working on. A reverse lookup on the IP address in my ntp.conf file validates that the ntpq output is correctly naming the remote server. However, as you can see, it appears to just roll over to my .LOCL. time server. Also, ntptrace just returns the local time server, and ntptrace XXX.YYY.4.21 times out.
$ ntptrace
localhost.localdomain: stratum 6, offset 0.000000, synch distance 0.948181
$ ntptrace XXX.YYY.4.21
XXX.YYY.4.21: timed out, nothing received
***Request timed out
This looks like my ntp daemon is just querying itself.
I am thinking about the possibility that the router-I-don't-control between my test network timeserver and the corporate network timeserver is blocking on source port. (I think ntpdate sends on port 123, which gets it around that filter and is why I can't use it while ntpd is running.) I have email in to the network folks to check that.
Finally, telnet XXX.YYY.4.21 123 never times out or completes a connection.
The questions:
What am I missing, here?
What else can I check to try to figure out where this connection is failing?
Would strace ntptrace XXX.YYY.4.21 show me the source port ntptrace is sending from? I can deconstruct most strace calls, but I can't figure out the location of that datum.
If I can't directly examine the gateway router between my test network and the timeserver, how might I build evidence that it's responsible for these disconnections? Alternately, how might I rule it out?
-
David Schwartz over 11 years
I think everything's fine. You just need to wait a while for the server to synch. Because your clock isn't stable yet, other people's clocks don't seem stable to you, because you're measuring them against yours. It takes a while (hours at least) for this to settle down.
-
dafydd over 11 years
Hi, David. I haven't seen a change in 24 hours. I'll try the proposed answers tomorrow and report back.
-
Martin Schröder over 11 years
Make that loop watch -d ntpq -n -c peers
-
dafydd over 11 years
I would believe the link was okay except for ntptrace XXX.YYY.2.21 failing. I think I should be seeing something there. I'll try removing the local server tomorrow and report on what I get. Thanks!
-
dafydd over 11 years
When I have the 127.127.1.0 server available, the syslog will show me synchronized to LOCAL(0), stratum 5. After removing that line and waiting 20 minutes, I still don't get any synchronization to the remote server.
-
ravi yarlagadda over 11 years
@dafydd Did you restart the ntpd service after changing the config? What's the output from ntpq -pn now?
-
dafydd over 11 years
@BillThor If I drop the server 127.127.1.0 line, ntpq shows the correct remote, a continuing refid of .LOCL., and no synchronized message in syslog. If I also enable restrict noquery, I still get no synchronized line in syslog, and ntpq -c peers times out without returning anything. I think I'm seeing .LOCL. because the remote is unreachable.
-
dafydd over 11 years
@BillThor After restoring the server 127.127.1.0 and restrict lines, and adding fudge 127.127.1.0 stratum 10, I now get a synchronized to LOCAL(0) line in syslog and timeouts in ntpq for both localhost.localdomain and the remote server. Trying again with stratum 8 repeats the timeout. Commenting out the fudge line repeats the timeout. Commenting out the restrict lines restores the original behavior.
-
dafydd over 11 years
Yes, every time. And I watch the syslog for a synchronized message before I start querying ntpq.
-
ravi yarlagadda over 11 years
Right, but what is it showing when queried when the sync message hasn't occurred?
-
dafydd over 11 years
I'll get responses for both servers, but the REFID is still .LOCL. for both hosts, and both hosts have spaces for their ntpq tally codes. Usually, the LOCAL(0) host will tally as sys.peer, while the remote host will always tally as reject. Also, just to make sure, have you seen my comments back to BillThor?
-
ravi yarlagadda over 11 years
Both servers? If you only have the one remote server in your config, what other server is the query showing? And that refid is indeed interesting - what's that remote server syncing off of?
-
dafydd over 11 years
Sorry, the LOCAL(0) server shows a refid of .LOCL., which makes sense. The remote server still shows the same refid of .LOCL. and never synchronizes. Either my ntpd isn't hitting the remote server at all, which I suspect, or it's in a sync loop. The latter is possible, given the restrict behavior I've seen below.
-
ravi yarlagadda over 11 years
@dafydd Right - what's the remote server set to sync to? The reach incrementing indicates connectivity. A sync loop appears to be your problem.
-
dafydd over 11 years
Unfortunately, I have no way to correct that. I'll ask the administrators of that time server what their settings are, and see if they can restrict from my subnet, at least.
-
ravi yarlagadda over 11 years
@dafydd I'm not sure what you mean - what would they need to restrict? They're either syncing from you or they aren't - as Bill mentioned, you can block them from syncing from your end with a restrict noquery. Why not test syncing to a real time server out on the internet?
-
dafydd over 11 years
I'll reset restrict noquery with no other options, and let you know how it goes.
-
ravi yarlagadda over 11 years
@dafydd You should already have a restrict line for that system, if it's your server. Just add noquery to it. And keep in mind, you're cutting off their time sync to you if you do this - I'd make sure they have a working secondary time source.
-
BillThor over 11 years
@dafydd It really looks like you are syncing with yourself. Try adding server lines for your nameservers and gateway router.
-
dafydd over 11 years
@BillThor Thanks. That was the suggestion that got me to finding another way to that timeserver.