What is NTP dispersion and how do I control it?

Solution 1

I see some confusion going on in the answers here. For starters, ntpclient, at least in -s mode, isn't acting as a full NTP client; it only sends and receives a single packet, so there is no "last 8 packets received". It isn't estimating its own dispersion at all.

Instead, the value it's printing is the value called "root dispersion" (rootdisp) in the packet returned by the server, which is an estimate of the total amount of error/variance between that server and the correct time. The way this is calculated is pretty simple: every NTP server either gets its time from an external clock (for example a radio or GPS receiver), or from another NTP server. If a server gets its time from an external clock, its root dispersion is the estimated maximum error of that clock. If it gets its time from another NTP server, its root dispersion is that server's root dispersion plus the dispersion added by the network link between them.
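To make the accumulation concrete, here is a minimal Python sketch. It is a simplification of the description above, not what ntpd actually computes (real implementations fold in further terms such as jitter and clock drift since the last update). The function name and all the numbers are made up for illustration.

```python
def root_dispersion(reference_clock_error, link_dispersions):
    """Accumulate root dispersion down a chain of NTP servers.

    reference_clock_error: estimated maximum error (seconds) of the
        external clock feeding the first server in the chain.
    link_dispersions: dispersion (seconds) added by each server-to-server
        link, ordered from the top of the chain downwards.
    Returns the root dispersion advertised by each server in the chain.
    """
    advertised = [reference_clock_error]
    for hop in link_dispersions:
        # Each server advertises its upstream's root dispersion plus
        # whatever its own link to that upstream adds.
        advertised.append(advertised[-1] + hop)
    return advertised


# Hypothetical chain: a GPS-backed server good to 1 ms, then three hops.
chain = root_dispersion(0.001, [0.002, 0.010, 0.005])
for level, disp in enumerate(chain, start=1):
    print(f"server {level}: root dispersion {disp * 1000:.1f} ms")
```

The point is that the value can only grow as you move away from the reference clock, so a bad link or a bad server high up in the chain taints everything below it.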

One point of confusion here is that while ntpq and chrony display dispersion and root dispersion in seconds, which is what people are used to looking at, ntpclient displays them in microseconds. Regardless, a value of 1217163 is still quite high. A good NTP server knows the time within a few milliseconds; a bad one within a few tens or hundreds of milliseconds. Yours is telling you that its time can only be trusted to within +/- 1.2 seconds.
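For reference, the root dispersion field travels in the NTP packet as a 16.16 fixed-point number of seconds (the "short format"). A rough Python sketch of the unit conversion follows. The helper names are mine, and the raw value 79768 is an assumption, picked because it scales to the 1217163.1 quoted in the question; I am also assuming ntpclient simply prints the packet field scaled to microseconds.

```python
def short_to_seconds(raw):
    """Decode an NTP short-format value (16.16 fixed point) to seconds."""
    return raw / 65536.0


def short_to_microseconds(raw):
    """The same value expressed in microseconds."""
    return short_to_seconds(raw) * 1_000_000


raw_rootdisp = 79768  # hypothetical raw field value, see note above

print(f"{short_to_microseconds(raw_rootdisp):.1f} us")  # 1217163.1 us
print(f"{short_to_seconds(raw_rootdisp):.3f} s")        # 1.217 s
```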

You can actually get ntpclient to synchronize to this server anyway by passing the -x 0 or -t option (depending on version of ntpclient), which disables NTP sanity checks. If you only need roughly accurate time (to within a few seconds), that may be good enough. However, ntpclient is being pretty reasonable in refusing to synchronize to such a bad server. Your ntpq output on the Ubuntu machine is showing a jitter of hundreds of milliseconds for all of its servers, even though they have low delay, which indicates either a very unreliable network, a conspiracy of all of the servers to provide erratic time, or a basic timekeeping problem on the server itself.

It also concerns me that the server 10.31.10.22 is advertising a refid of LOCL (undisciplined local clock) but has a stratum of 1. Usually the local clock is fudged to a stratum of 10 so that it's only used as a last-resort synchronization source to keep a herd from drifting apart. Either 10.31.10.22 is misconfigured and providing bad time to the rest of the network, or it's being disciplined to good time by some program outside of NTP's control, in which case the misconfiguration is simply that it's advertising the LOCL refid; it should be overridden to e.g. GPS or whatever is providing its time.

Solution 2

Just a partial answer for "What is dispersion?":

A typical NTP round trip:

client |        | server
    t1 |------->| t2
    t3 |<-------| t4

This yields two values, the offset (the time difference between client and server) and the delay (essentially the network travel time), with the following formulas:

offset = ((t2 - t1) + (t4 - t3)) / 2
delay  = (t3 - t1) - (t4 - t2)
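
Here is the same calculation as a small Python sketch, using descriptive names instead of t1..t4 so the role of each timestamp is unambiguous. In the diagram above, t1 is client_send, t2 is server_recv, t4 is server_send and t3 is client_recv. The timestamps in the example are invented.

```python
def offset_and_delay(client_send, server_recv, server_send, client_recv):
    """Return (offset, delay) in seconds for one NTP exchange.

    offset: how far the server clock is ahead of the client clock.
    delay:  round-trip network time, excluding server processing time.
    """
    offset = ((server_recv - client_send) + (server_send - client_recv)) / 2
    delay = (client_recv - client_send) - (server_send - server_recv)
    return offset, delay


# Hypothetical exchange: the server clock runs 250 ms ahead, each
# network leg takes 10 ms and the server needs 1 ms to reply.
offset, delay = offset_and_delay(
    client_send=100.000,
    server_recv=100.260,
    server_send=100.261,
    client_recv=100.021,
)
print(f"offset = {offset:.3f} s, delay = {delay:.3f} s")
```

Note that the offset only comes out exact when the two network legs take the same time; any asymmetry shows up as an error of up to half the delay.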

The client selects the current offset from the last 8 packets received, choosing the one with the smallest delay.

The same 8 packets are used to calculate the dispersion: a weighted average of the differences between each of the 8 offsets and the one selected in the previous step, where the delay is used as the weighting factor, giving greater weight to smaller delays. It is a measure of the "spread" of the values and is used to judge the quality of a time server, especially if you have several to choose from.
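
A rough Python sketch of that selection and weighting step. The exact weighting is an assumption on my part: samples are ranked by delay and each rank gets half the weight of the previous one, loosely modelled on the classic NTP clock filter. Real implementations differ in the details, so treat this as an illustration of the idea only. The function name and sample values are invented.

```python
def select_and_disperse(samples, weight=0.5):
    """Pick the best offset and estimate dispersion.

    samples: list of (offset, delay) tuples, in seconds, from the most
        recent packets (up to 8 in the description above).
    Returns (best_offset, dispersion).
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Rank samples so the one with the smallest delay comes first.
    ranked = sorted(samples, key=lambda s: s[1])
    best_offset = ranked[0][0]
    dispersion = 0.0
    for rank, (offset, _delay) in enumerate(ranked):
        # Low-delay samples (small rank) get the largest weights.
        dispersion += abs(offset - best_offset) * weight ** (rank + 1)
    return best_offset, dispersion


# Hypothetical (offset, delay) samples in seconds.
samples = [(0.012, 0.030), (0.010, 0.020), (0.015, 0.045), (0.009, 0.025)]
best, disp = select_and_disperse(samples)
print(f"selected offset {best * 1000:.1f} ms, dispersion {disp * 1000:.4f} ms")
```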

Solution 3

Your dispersion and skew are enormous; there is a very large offset from the local clock to that peer. You should compare the offsets with the local date and set the clock manually.

Get ntpd running and show ntpq -p from a host using all of the peers. It will select the better ones.

Solution 4

According to this Cisco documentation, "dispersion, reported in seconds, is the maximum clock time difference that was ever observed between the local clock and server clock". With NTP servers that are not totally broken, a high dispersion should never occur. The only feasible scenario is when your client initializes NTP and so far has only its local clock available. And even then, a dispersion as high as you report corresponds to clocks that are off by more than two weeks.

It should be sufficient to ensure that the local clock is not too far off in the beginning (even a couple of hours would still be acceptable), either by adjusting the clock (and date even!) in the BIOS or by issuing ntpdate once before starting ntpd on the client.

Author by

Jeff

Updated on September 18, 2022

Comments

  • Jeff
    Jeff over 1 year

    We roll out Ubuntu 14.04 servers on isolated networks, running ntpd 4.2.6p5, configured to use multiple NTP servers as provided by customers (no access to pool.ntp.org). Our dumb terminal client devices run an old version of BusyBox (1.00-rc2) and ntpclient 2010 from Larry Doolittle.

    This setup has worked great for years, but recently we've hit a roadblock with a new customer. They provided us with 5 in-house NTP server addresses which seem to work great on their own, as far as ntpdate-debian is concerned on the Linux server. On the BusyBox side however, ntpclient complains with "Dispersion too high". From the debug output, ntpclient gets "1217163.1" from the NTP server but the max value it supports is absolute(65536).

    $ /usr/sbin/ntpclient -s -i 15 -h 10.17.162.250 -d
    Configuration:
      -c probe_count 1
      -d (debug)     1
      -g goodness    0
      -h hostname    10.17.162.250
      -i interval    15
      -l live        0
      -p local_port  0
      -q min_delay   800.000000
      -s set_clock   1
      -x cross_check 1
    Listening...
    Sending ...
    recvfrom
    packet of length 48 received
    Source: INET Port 123 host 10.17.162.250
    LI=0  VN=3  Mode=4  Stratum=4  Poll=4  Precision=-20
    Delay=60745.2  Dispersion=1346801.8  Refid=10.31.10.21
    Reference 3668859928.942079
    (sent)    3668859928.708371
    Originate 3668859928.708371
    Receive   3668859928.963271
    Transmit  3668859928.963369
    Our recv  3668859928.708371
    Total elapsed:      0.00
    Server stall:      93.09
    Slop:             -93.09
    Skew:          255443.94
    Frequency:             0
     day   second     elapsed    stall     skew  dispersion  freq
    42463 56728.708  rejected packet: abs(DISP)>65536
    

    These are all devices on the same LAN so frankly I am flabbergasted. Aghast even.

    Here's the ntpq -pn output from the Ubuntu 14.04 server:

    user@host:~$ ntpq -pn
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     127.127.1.0     .LOCL.          10 l 1025   64    0    0.000    0.000   0.000
     10.17.162.249   10.17.6.10       5 u   23 1024   37    0.865  1381.07 697.260
     10.31.10.22     .LOCL.           1 u 1044 1024   17   29.586  -838.06 397.342
     10.17.6.10      10.31.10.21      4 u 1065 1024   17    0.366  105.245 402.999
    *10.31.10.21     132.246.11.238   3 u    5 1024   37   29.418  794.292 616.796
     10.17.6.11      10.31.10.21      4 u 1038 1024   17    0.408  120.030 381.058
    

    My questions are:

    1. What is dispersion and what can alter its value?
    2. What commands could I run to get more details from the NTP servers?
    3. Could the fault lie on the Ubuntu server side, with an improper ntp.conf? There is nothing special there really.
    4. Would switching to chrony change anything in this case?
    • Criggie
      Criggie about 8 years
      Just assuming - are the clocks of the five provided NTP servers any good? Can you drop the worst ones out of your configs?
    • Martin Schröder
      Martin Schröder about 8 years
      Your offsets and jitters are way too high. Get at least one proper source.
  • Hagen von Eitzen
    Hagen von Eitzen about 8 years
    Sure about the formulas? After all, only t4-t2 and t3-t1 are knowable to the parties involved
  • Jeff
    Jeff about 8 years
    Added ntpq -pn output to my question. Thank you for looking into this.
  • Thomas
    Thomas about 8 years
    @HagenvonEitzen The time can be included in the packet
  • John Mahowald
    John Mahowald about 8 years
    Offset and jitter in the hundreds? That's not very good. You mentioned no access to Internet sources like pool.ntp.org but those perform much better. Consider adding a reference clock like GPS, a radio source, a PPS input or similar. Or pick a host with a local clock that isn't all over the place.
  • answer42
    answer42 about 8 years
    ntpclient is reporting values in microseconds, so the listed dispersion is actually ~1.2 seconds, not weeks :) Also, the interpretation in that Cisco doc doesn't apply to this value.
  • Jeff
    Jeff about 8 years
    Fantastic answer. I will try -x 0 or -t and report back. Regarding 10.31.10.22, I might take it out of the servers list. Great catch. I don't really have any information regarding these servers, are there any other debug commands to get details from an NTP server or is it pretty much ntpq -p?
  • Jeff
    Jeff about 8 years
    As you said, the -t switch trusts the in-house NTP server despite high dispersion. We still can't explain why it randomly peaks like that, but that's for another post maybe. Thank you.
  • answer42
    answer42 about 8 years
    @Jeff glad to help :)
  • Ian Riley
    Ian Riley over 7 years
      @Sven I also believe there is an issue with the formulas; see page 28 here and also this White Paper, both by Mills. By the way you have your t's laid out, it should be offset = 1/2 * [(T2-T1) + (T4-T3)] and delay = (T3-T1) - (T4-T2)
  • Admin
    Admin over 6 years
    Sven, do you have t3/t4 in the right place in your typical round trip? The traffic flow and delay calculation seem to indicate they should be the other way around: t4 - t1 should be the total RTT, t3 - t2 should be the time spent inside the server.