ntpdate works, but ntpd can't synchronize

rhel5 ntp ntpd ntpdate

Solution 1

For those of you looking for a grand solution, I apologize. This is going to be cheesy.

Yes, the time server is unreachable, for reasons I was never able to determine. The good news is that one of the external DNS servers I have access to turns out to be an NTP server itself, and it gets its time from that same time server. It's a workaround, not a fix, but I'll take what I can get.

So, in the end, I only lose one stratum of service.

As a side note, I registered with the NTP bug database so I could file enhancement bug 2297, asking for formal documentation of the peer refids .INIT., .LOCL., and LOCAL(0).

Solution 2

The 377 in the reach column means that connectivity is OK (the last eight polls all got replies); telnet won't connect because NTP runs over UDP, not TCP.
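To read other reach values, such as the 0 reported in Solution 4 or the small numbers seen right after ntpd starts, it helps to know that reach is an 8-bit shift register shown in octal: each poll shifts in a 1 if the server answered and a 0 if it did not, with the newest poll in the rightmost bit. The sketch below decodes it in POSIX sh; `reach_bits` and `reach_count` are hypothetical helper names, not part of the ntp tools.

```shell
#!/bin/sh
# reach_bits: hypothetical helper, not part of the ntp distribution.
# Converts the octal "reach" value shown by ntpq into its 8 poll bits
# (rightmost bit = most recent poll, 1 = reply received).
reach_bits() {
    n=$(printf '%d' "0$1")   # leading 0 makes printf read the value as octal
    bits=""
    i=7
    while [ "$i" -ge 0 ]; do
        bits="$bits$(( (n >> i) & 1 ))"
        i=$((i - 1))
    done
    printf '%s\n' "$bits"
}

# reach_count: hypothetical helper; how many of the last 8 polls got a reply.
reach_count() {
    ones=$(reach_bits "$1" | tr -d 0)
    printf '%d\n' "${#ones}"
}

reach_bits 377    # 11111111: all of the last 8 polls were answered
reach_bits 17     # 00001111: only the 4 most recent polls were answered
reach_count 0     # 0: no replies at all
```

After a restart with a reachable server, reach typically climbs 1, 3, 7, 17, 37, 77, 177, 377 as the register fills with ones.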

Try removing the server 127.127.1.0 line from your config. The * in front of LOCAL(0) tells us that the local clock, at stratum 5, is being used for sync in preference to the remote server at stratum 1; its delay and offset both being 0.000 likely has a lot to do with that.
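The first character of each line in the ntpq peers listing is the tally code, and `*` marks the system peer, i.e. the source ntpd has actually selected. A small sketch for pulling that line out of the listing, fed here with the output from the question; `sys_peer` is a hypothetical helper name.

```shell
#!/bin/sh
# sys_peer: hypothetical helper. Reads ntpq peers output on stdin and
# prints the "remote" and "st" (stratum) columns of the line whose tally
# code is '*', i.e. the system peer ntpd is synchronizing to.
sys_peer() {
    awk 'substr($0, 1, 1) == "*" { print substr($1, 2), $3 }'
}

# Sample copied from the ntpq output in the question.
sample='     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.           5 l   36   64  377    0.000    0.000   0.001
 timeserver.doma .LOCL.           1 u   39  128  377    0.489   51.261  58.975'

printf '%s\n' "$sample" | sys_peer   # prints: LOCAL(0) 5
# On a live system: ntpq -pn | sys_peer
```

If the remote server never shows up here, ntpd is still preferring (or falling back to) something else, which is exactly the LOCAL(0) situation described above.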

Solution 3

If you are going to include the local clock, fudge its stratum up a fair bit. It looks like you have it set to 5; I generally set it to at least 8 (fudge 127.127.1.0 stratum 8). If you don't fudge it, your host can appear to other hosts on your network to be an atomic clock. On one network I scanned, I found a lot of low-stratum servers announcing times that were usually wrong by hours or days.
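As a sketch, the relevant part of /etc/ntp.conf with the local clock kept only as a high-stratum fallback might look like this; it reuses the server lines from the question (address elided as in the original) and the stratum 8 suggested above.

```
# Local clock as a last-resort fallback, fudged to a high stratum
# so it never looks better than a real upstream source.
server 127.127.1.0
fudge  127.127.1.0 stratum 8

# Real upstream time source (address as elided in the question)
server XXX.YYY.4.21
```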

Shane is correct about the reach value: it shows that you can reach the server. The high offset and jitter values for your time server indicate it may not be very reliable, though they may also be high simply because your server is still synchronizing. The fact that the poll interval has increased to 128 seconds indicates that your server is getting consistent results; it should gradually increase to 1024 seconds.

Try running a loop like:

```
while sleep 60; do
    ntpq -n -c peers
done
```

This will give you an idea of how well ntp is working. You should see it stabilize over time.

ntpd supports a number of restrictions that limit how much information about a server can be queried remotely. It is possible you are restricted to using the upstream server only as a time source, in which case queries such as ntptrace would go unanswered.

It is also possible that firewall rules only allow traffic with port 123 as both the source and the destination port. That gives a working ntp setup but blocks other tools, which normally send from an unprivileged port. Some tools let you use port 123 as the source port if it is available. I am partial to using ntpdate in debug mode (ntpdate -d).

If you are correct that the refid of the upstream server is your IP address, it appears to be using your server as its preferred time source. Try adding restrict noquery to your configuration. It may be that your upstream server is poorly configured. Try adding your router and/or nameservers as sources; I find they can be better sources than the official corporate server.
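A sketch of those ntp.conf changes. The router and nameserver addresses below are hypothetical placeholders from the 192.0.2.0/24 documentation range; substitute your own. If a restrict line for the upstream server already exists, add noquery to it rather than adding a second line.

```
# Suggested above: noquery on the upstream server's restrict entry
# (merge into an existing restrict line for this address if you have one)
restrict XXX.YYY.4.21 noquery
server   XXX.YYY.4.21

# Extra sources: router and nameservers
# (hypothetical placeholder addresses)
server 192.0.2.1
server 192.0.2.53
```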

Solution 4

I had the same problem:

ntpq -p was showing reach = 0

Yet:

1. ntpd was running
2. ntp.conf had servers listed
3. ntpdate worked using those servers
4. ntpdate -u worked using those servers
5. nc showed TCP port 123 was open on those servers
6. nc showed UDP port 123 was open on those servers

So basically ntpdate worked and there were no firewall issues, and yet ntpq -p showed reach = 0 for each server listed.

Turned out to be the restrict lines in ntp.conf. I just removed all the restrict lines from ntp.conf and restarted ntpd and everything worked from there.
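The answer doesn't say which restrict lines were at fault, so the following is an assumption rather than a diagnosis of this case. One well-known way to get reach = 0 while ntpdate works is a default restrict that includes ignore: it drops every packet, including replies from the configured servers, unless each server has its own less restrictive entry. A sketch of a narrower fix than deleting all restrict lines (the server address is a hypothetical placeholder):

```
# Deny everything by default; "ignore" also drops replies from servers
restrict default ignore
# Allow full access from the local host
restrict 127.0.0.1

# Each upstream server needs its own entry so its replies are accepted
# (192.0.2.10 is a hypothetical placeholder address)
restrict 192.0.2.10 nomodify notrap noquery
server   192.0.2.10
```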

dafydd

Updated on September 18, 2022

Comments

dafydd over 1 year
This is in RHEL 5.5.

First, ntpdate to the remote host works:
```
$ ntpdate XXX.YYY.4.21
24 Oct 16:01:17 ntpdate[5276]: adjust time server XXX.YYY.4.21 offset 0.027291 sec
```
Second, here are the server lines in my /etc/ntp.conf. All restrict lines have been commented out for troubleshooting.
```
server 127.127.1.0
server XXX.YYY.4.21
```
I execute service ntpd start and check with ntpq:
```
$ ntpq
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.           5 l   36   64  377    0.000    0.000   0.001
 timeserver.doma .LOCL.           1 u   39  128  377    0.489   51.261  58.975

ntpq> opeer
 remote           local          st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        127.0.0.1        5 l   40   64  377    0.000    0.000   0.001
 timeserver.doma XXX.YYY.22.169   1 u   43  128  377    0.489   51.261  58.975
```
XXX.YYY.22.169 is the address of the host I'm working on. A reverse lookup on the IP address in my ntp.conf file validates that the ntpq output is correctly naming the remote server. However, as you can see, it appears to just roll over to my .LOCL. time server. Also, ntptrace just returns the local time server, and ntptrace XXX.YYY.4.21 times out.
```
$ ntptrace
localhost.localdomain: stratum 6, offset 0.000000, synch distance 0.948181

$ ntptrace XXX.YYY.4.21
XXX.YYY.4.21: timed out, nothing received
***Request timed out
```
This looks like my ntp daemon is just querying itself.

I am thinking about the possibility that the router-I-don't-control between my test network timeserver and the corporate network timeserver is blocking on source port. (I think ntpdate sends on port 123, which gets it around that filter and is why I can't use it while ntpd is running.) I have email in to the network folks to check that.

Finally, telnet XXX.YYY.4.21 123 never times out or completes a connection.

The questions:

What am I missing, here?

What else can I check to try to figure out where this connection is failing?

Would strace ntptrace XXX.YYY.4.21 show me the source port ntptrace is sending from? I can deconstruct most strace calls, but I can't figure out the location of that datum.

If I can't directly examine the gateway router between my test network and the timeserver, how might I build evidence that it's responsible for these disconnections? Alternately, how might I rule it out?

David Schwartz over 11 years

I think everything's fine. You just need to wait a while for the server to sync. Because your clock isn't stable yet, other people's clocks don't seem stable to you, because you're measuring them against yours. It takes a while (hours at least) for this to settle down.
dafydd over 11 years

Hi, David. I haven't seen a change in 24 hours. I'll try the proposed answers tomorrow and report back.
Martin Schröder over 11 years

Make that loop watch -d ntpq -n -c peers
dafydd over 11 years

I would believe the link was okay except for ntptrace XXX.YYY.4.21 failing. I think I should be seeing something there. I'll try removing the local server tomorrow and report on what I get. Thanks!
dafydd over 11 years

When I have the 127.127.1.0 server available, the syslog will show me synchronized to LOCAL(0), stratum 5. After removing that line and waiting 20 minutes, I still don't get any synchronization to the remote server.
ravi yarlagadda over 11 years

@dafydd Did you restart the ntpd service after changing the config? What's the output from ntpq -pn now?
dafydd over 11 years

@BillThor If I drop the server 127.127.1.0 line, ntpq shows the correct remote, a continuing refid of .LOCL. and no synchronized message in syslog. If I also enable restrict noquery, I still get no synchronized line in syslog and ntpq -c peers times out without returning anything. I think I'm seeing .LOCL. because the remote is unreachable.
dafydd over 11 years

@BillThor After restoring the server 127.127.1.0 and restrict lines, and adding fudge 127.127.1.0 stratum 10, I now get a synchronized to LOCAL(0) line in syslog and timeouts in ntpq for both localhost.localdomain and the remote server. Trying again with stratum 8 repeats the timeout. Commenting the fudge repeats the timeout. Commenting the restrict lines restores original behavior.
dafydd over 11 years

Yes, every time. And, I watch the syslog for a synchronized message before I start querying ntpq.
ravi yarlagadda over 11 years

Right, but what is it showing when queried when the sync message hasn't occurred?
dafydd over 11 years

I'll get responses for both servers, but the REFID is still .LOCL. for both hosts, and both hosts have spaces for their ntpq tally codes. Usually, the LOCAL(0) host will tally as sys.peer, while the remote host will always tally as reject. Also, just to make sure, have you seen my comments back to BillThor?
ravi yarlagadda over 11 years

Both servers? If you only have the one remote server in your config, what other server is the query showing? And that refid is indeed interesting - what's that remote server syncing off of?
dafydd over 11 years

Sorry, the LOCAL(0) server shows a refid of .LOCL., which makes sense. The remote server still shows the same refid of .LOCL. and never synchronizes. Either my ntpd isn't hitting the remote server at all, which I suspect, or it's in a sync loop. The latter is possible, given the restrict behavior I've seen below.
ravi yarlagadda over 11 years

@dafydd Right - what's the remote server set to sync to? The reach incrementing indicates connectivity. A sync loop appears to be your problem.
dafydd over 11 years

Unfortunately, I have no way to correct that. I'll ask the administrators of that time server what their settings are, and see if they can restrict from my subnet, at least.
ravi yarlagadda over 11 years

@dafydd I'm not sure what you mean - what would they need to restrict? They're either syncing from you or they aren't - as Bill mentioned, you can block them from syncing from your end with a restrict noquery. Why not test syncing to a real time server out on the internet?
dafydd over 11 years

I'll reset restrict noquery with no other options, and let you know how it goes.
ravi yarlagadda over 11 years

@dafydd You should already have a restrict line for that system, if it's your server. Just add noquery to it. And keep in mind, you're cutting off their time sync to you if you do this; I'd make sure they have a working secondary time source.
BillThor over 11 years

@dafydd It really looks like you are syncing with yourself. Try adding server lines for your nameservers and gateway router.
dafydd over 11 years

@BillThor Thanks. That was the suggestion that got me to finding another way to that timeserver.