Chrony 3.1 refuses to sync with ntp server

Solution 1

There is a similar report in the Red Hat Bugzilla that was closed as NOTABUG. The issue is a combination of an inaccurate time server and a change in defaults: newer chrony versions refuse to use such servers.

https://bugzilla.redhat.com/show_bug.cgi?id=1525833

"The server is ignored for synchronization of the clock because it's too inaccurate. In the "chronyc sources" output there is "+/- 4695ms", which is larger than the default maxdistance of 3 seconds. The maxdistance option was added in chrony-2.2, so that's why it worked with chrony-2.1. Older versions only have a hardcoded limit for the root dispersion to be smaller than 16 seconds.

The tcpdump output shows that the NTP server has a root dispersion of about 3.6 seconds. Is it a Windows NTP server? You can also check the root dispersion with "chronyc ntpdata".

A larger maxdistance needs to be set in chrony.conf to allow chronyd to use the server for synchronization."
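
To make the quoted reasoning concrete, here is a minimal sketch of the check, assuming the standard NTP approximation of root distance (half the root delay plus the root dispersion). chronyd's internal calculation includes further terms, so treat this as an estimate only. The function names are hypothetical, the 3-second default is the one cited in the quote, and the sample values are the root delay and root dispersion from the asker's "chronyc ntpdata" output.

```python
# Rough estimate only: chronyd's own distance calculation has extra terms.
DEFAULT_MAXDISTANCE = 3.0  # seconds, default cited in the Bugzilla comment


def approx_root_distance(root_delay, root_dispersion):
    """Approximate NTP root distance: half the root delay plus root dispersion."""
    return root_delay / 2.0 + root_dispersion


def exceeds_maxdistance(root_delay, root_dispersion,
                        maxdistance=DEFAULT_MAXDISTANCE):
    """Return True if the estimated distance is above the given limit."""
    return approx_root_distance(root_delay, root_dispersion) > maxdistance


if __name__ == "__main__":
    # Root delay and root dispersion as reported by "chronyc ntpdata"
    # in the question (0.031219 s and 8.063156 s).
    distance = approx_root_distance(0.031219, 8.063156)
    print(f"estimated root distance: {distance:.3f} s")
    print("above default limit:", exceeds_maxdistance(0.031219, 8.063156))
    print("above limit of 16 s:", exceeds_maxdistance(0.031219, 8.063156, 16.0))
```

With those numbers the estimate is roughly 8 seconds, well above the 3-second default but below a raised limit such as 16 seconds.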

Solution 2

For chrony 3.1.

We pieced together a solution from the following thread, but for a concise, easy-to-check answer, try this. Check the status of the time sync you receive with the following command (-v explains the columns):

chronyc sources -v

The far-right column (e.g. +/- 10.5s) shows the estimated error of the time received from the server in question.

Our issue was that the time received from our Windows NTP server exceeded the maximum estimated error threshold of 3 seconds (it was around +/- 10 seconds), so chrony was not updating the system time. Switching our servers to the UK NTP pool servers corrected the issue (+/- 50 ms).
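
If you want to script that check, the following is a small sketch that pulls the "+/-" value out of one line of "chronyc sources" output and converts it to seconds so it can be compared against the 3-second threshold. The function name and the regular expression are my own and assume the column looks like the examples on this page (a number followed by ns, us, ms or s); it is not part of chrony.

```python
import re

# Unit suffixes assumed for the "+/-" column, mapped to seconds.
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_MARGIN = re.compile(r"\+/-\s*([0-9]+(?:\.[0-9]+)?)(ns|us|ms|s)\b")


def error_margin_seconds(line):
    """Return the '+/-' margin of a 'chronyc sources' line in seconds.

    Returns None if the line has no recognisable margin column.
    """
    match = _MARGIN.search(line)
    if match is None:
        return None
    return float(match.group(1)) * _UNITS[match.group(2)]


if __name__ == "__main__":
    # Source line taken from the question on this page.
    line = ("^? 172.17.172.220                4   7   377   644"
            "   -11.6s[ -11.6s] +/- 8147ms")
    margin = error_margin_seconds(line)
    print(f"margin: {margin:.3f} s, above 3 s threshold: {margin > 3.0}")
```

You could feed it each line of "chronyc sources" output and flag any source whose margin is above the maxdistance you have configured.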

Solution 3

If you have a Windows-based NTP server, this may fix it for you (it worked for me with a similar problem):

https://chrony.tuxfamily.org/faq.html

3.4. Using a Windows NTP server?

A common issue with Windows NTP servers is that they report a very large root dispersion (e.g. three seconds or more), which causes chronyd to ignore the server for being too inaccurate. The sources command might show a valid measurement, but the server is not selected for synchronisation. You can check the root dispersion of the server with the chronyc's ntpdata command.

The maxdistance value needs to be increased in chrony.conf to enable synchronisation to such a server. For example:

maxdistance 16.0

Author by

Jawad Al Shaikh

Updated on September 18, 2022

Comments

  • Jawad Al Shaikh
    Jawad Al Shaikh almost 2 years

    I have 70 machines running CentOS 7.2 with chrony version 2.1.1, syncing perfectly with my NTP server (protocol v3).

    Recently I added 30 machines running CentOS 7.4 with chrony version 3.1, but these 30 machines refuse to sync. I followed all the troubleshooting procedures and am completely stuck on how to fix it. Command output:

    chronyc tracking
    Reference ID    : 00000000 ()
    Stratum         : 0
    Ref time (UTC)  : Thu Jan 01 00:00:00 1970
    System time     : 0.000000013 seconds fast of NTP time
    Last offset     : +0.000000000 seconds
    RMS offset      : 0.000000000 seconds
    Frequency       : 11.390 ppm fast
    Residual freq   : +0.000 ppm
    Skew            : 0.000 ppm
    Root delay      : 1.000000000 seconds
    Root dispersion : 1.000000000 seconds
    Update interval : 0.0 seconds
    Leap status     : Not synchronised
    
    
    chronyc sources
    210 Number of sources = 1
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    ^? 172.17.172.220                4   7   377   644   -11.6s[ -11.6s] +/- 8147ms
    
    
    
    tcpdump -n -i lo port 323 [Note: I ran "chronyc sources" in another terminal but nothing was captured; on the working machines it captures some packets!]
    
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
    ^C
    0 packets captured
    0 packets received by filter
    0 packets dropped by kernel
    
    
     tcpdump -n -i eno2  port 123
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eno2, link-type EN10MB (Ethernet), capture size 262144 bytes
    15:03:09.870958 IP 192.168.0.100.44841 > 172.17.172.220.ntp: NTPv4, Client, length 48
    15:03:10.112707 IP 172.17.172.220.ntp > 192.168.0.100.44841: NTPv3, Server, length 48
    15:11:45.678320 IP 192.168.0.100.46832 > 172.17.172.220.ntp: NTPv4, Client, length 48
    15:11:45.892482 IP 172.17.172.220.ntp > 192.168.0.100.46832: NTPv3, Server, length 48
    15:20:22.634981 IP 192.168.0.100.41310 > 172.17.172.220.ntp: NTPv4, Client, length 48
    15:20:22.871226 IP 172.17.172.220.ntp > 192.168.0.100.41310: NTPv3, Server, length 48
    15:28:55.820943 IP 192.168.0.100.39143 > 172.17.172.220.ntp: NTPv4, Client, length 48
    15:28:55.873988 IP 172.17.172.220.ntp > 192.168.0.100.39143: NTPv3, Server, length 48
    15:37:35.840998 IP 192.168.0.100.57333 > 172.17.172.220.ntp: NTPv4, Client, length 48
    15:37:35.913139 IP 172.17.172.220.ntp > 192.168.0.100.57333: NTPv3, Server, length 48
    15:46:15.814980 IP 192.168.0.100.56932 > 172.17.172.220.ntp: NTPv4, Client, length 48
    15:46:15.882518 IP 172.17.172.220.ntp > 192.168.0.100.56932: NTPv3, Server, length 48
    15:54:48.587705 IP 192.168.0.100.33711 > 172.17.172.220.ntp: NTPv4, Client, length 48
    15:54:48.632963 IP 172.17.172.220.ntp > 192.168.0.100.33711: NTPv3, Server, length 48
    ^C
    14 packets captured
    14 packets received by filter
    0 packets dropped by kernel
    
    chronyc activity
    200 OK
    1 sources online
    0 sources offline
    0 sources doing burst (return to online)
    0 sources doing burst (return to offline)
    0 sources with unknown address
    
    
    chronyc ntpdata  172.17.172.220
    Remote address  : 172.17.172.220 (AC11ACDC)
    Remote port     : 123
    Local address   : 192.168.0.100 (C0A80064)
    Leap status     : Normal
    Version         : 3
    Mode            : Server
    Stratum         : 4
    Poll interval   : 8 (256 seconds)
    Precision       : -6 (0.015625000 seconds)
    Root delay      : 0.031219 seconds
    Root dispersion : 8.063156 seconds
    Reference ID    : AC11AC88 ()
    Reference time  : Sun Nov 12 09:21:36 2017
    Offset          : +11.719727516 seconds
    Peer delay      : 0.215471357 seconds
    Peer dispersion : 0.015626255 seconds
    Response time   : 0.000000000 seconds
    Jitter asymmetry: -0.47
    NTP tests       : 111 111 1101
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Kernel
    RX timestamping : Kernel
    Total TX        : 35
    Total RX        : 35
    Total valid RX  : 35
    
    
    chronyc serverstats
    NTP packets received       : 0
    NTP packets dropped        : 0
    Command packets received   : 6
    Command packets dropped    : 0
    Client log records dropped : 0
    

    What should I do to fix:

    Reference ID : 00000000 ()
    Stratum : 0
    NTP packets received : 0

    I already rebooted the whole OS and tried all the chronyc commands such as makestep and waitsync, but nothing worked. I also searched for reported bugs but couldn't find anything related.

    Note that firewalld is disabled, and /etc/chrony.conf is an exact copy of the one on the 70 working machines.

    Update:
    With tcpdump's verbose mode enabled, the chrony 3.1 timestamps appear to be corrupted. Even after trying chronyc makestep 1 -1 it didn't sync. I also ran chronyd in debug mode (see below):

    tcpdump -n -i eno2  port 123 -vvvvv
    tcpdump: listening on eno2, link-type EN10MB (Ethernet), capture size 262144 bytes
    20:25:15.708374 IP (tos 0x0, ttl 64, id 399, offset 0, flags [DF], proto UDP (17), length 76)
        192.168.0.100.49105 > 172.17.172.220.ntp: [bad udp cksum 0x1a45 -> 0xf15f!] NTPv4, length 48
            Client, Leap indicator:  (0), Stratum 0 (unspecified), poll 6 (64s), precision 32
            Root Delay: 0.000000, Root dispersion: 0.000000, Reference-ID: (unspec)
              Reference Timestamp:  0.000000000
              Originator Timestamp: 3719492661.028820399 (2017/11/12 20:24:21)
              Receive Timestamp:    1089474065.361510029 (2070/08/17 02:09:21)
              Transmit Timestamp:   2540453432.493019109 (1980/07/03 13:30:32)
                Originator - Receive Timestamp:  +1664948700.332689629
                Originator - Transmit Timestamp: -1179039228.535801290
    20:25:15.964038 IP (tos 0x0, ttl 122, id 18400, offset 0, flags [none], proto UDP (17), length 76)
        172.17.172.220.ntp > 192.168.0.100.49105: [udp sum ok] NTPv3, length 48
            Server, Leap indicator:  (0), Stratum 4 (secondary reference), poll 6 (64s), precision -6
            Root Delay: 0.031219, Root dispersion: 8.154785, Reference-ID: 172.17.172.136
              Reference Timestamp:  3719467375.940868199 (2017/11/12 13:22:55)
              Originator Timestamp: 2540453432.493019109 (1980/07/03 13:30:32)
              Receive Timestamp:    3719492726.471868199 (2017/11/12 20:25:26)
              Transmit Timestamp:   3719492726.471868199 (2017/11/12 20:25:26)
                Originator - Receive Timestamp:  +1179039293.978849090
                Originator - Transmit Timestamp: +1179039293.978849090
    

    Debug mode output:

    /usr/sbin/chronyd -d -d
    2017-11-12T17:32:37Z main.c:473:(main) chronyd version 3.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYNCDNS +IPV6 +DEBUG)
    2017-11-12T17:32:37Z conf.c:406:(CNF_ReadFile) Reading /etc/chrony.conf
    2017-11-12T17:32:37Z conf.c:572:(CNF_ParseLine) commandkey directive is no longer supported
    2017-11-12T17:32:37Z conf.c:572:(CNF_ParseLine) generatecommandkey directive is no longer supported
    2017-11-12T17:32:37Z local.c:149:(calculate_sys_precision) Clock precision 0.000000016 (-26)
    2017-11-12T17:32:37Z sys_linux.c:317:(get_version_specific_details) Linux kernel major=3 minor=10 patch=0
    2017-11-12T17:32:37Z sys_linux.c:338:(get_version_specific_details) hz=100 nominal_tick=10000 max_tick_bias=1000
    2017-11-12T17:32:37Z local.c:663:(lcl_RegisterSystemDrivers) Local freq=11.390ppm
    2017-11-12T17:32:37Z util.c:1172:(UTI_DropRoot) Dropped root privileges: UID 998 GID 996
    2017-11-12T17:32:37Z reference.c:209:(REF_Initialise) Frequency 11.390 +/- 0.031 ppm read from /var/lib/chrony/drift
    2017-11-12T17:32:37Z sys_generic.c:251:(update_slew) slew offset=0.000000e+00 corr_rate=0.000000e+00 base_freq=11.389873 total_freq=11.389862 slew_freq=-1.093958e-11 duration=10000.000000 slew_error=1.203354e-13
    2017-11-12T17:32:37Z ntp_core.c:1089:(transmit_timeout) Transmit timeout for [172.17.172.220:123]
    2017-11-12T17:32:37Z ntp_io.c:831:(NIO_SendPacket) Sent 48 bytes to 172.17.172.220:123 from [UNSPEC] fd 8
    2017-11-12T17:32:37Z ntp_io_linux.c:652:(NIO_Linux_ProcessMessage) Received 90 (48) bytes from error queue for 172.17.172.220:123 fd=8 if=3 tss=1
    2017-11-12T17:32:37Z ntp_core.c:1994:(update_tx_timestamp) Updated TX timestamp delay=0.000010086
    2017-11-12T17:32:38Z ntp_io.c:669:(process_message) Received 48 bytes from 172.17.172.220:123 to 192.168.0.100 fd=8 if=3 tss=1 delay=0.000014398
    2017-11-12T17:32:38Z ntp_core.c:1563:(receive_packet) NTP packet lvm=34 stratum=4 poll=6 prec=-6 root_delay=0.031219 root_disp=8.201569 refid=ac11ac88 []
    2017-11-12T17:32:38Z ntp_core.c:1568:(receive_packet) reference=1510478575.936134800 origin=3724568162.405584875 receive=1510507968.499134800 transmit=1510507968.499134800
    2017-11-12T17:32:38Z ntp_core.c:1570:(receive_packet) offset=10.547374307 delay=0.099570973 dispersion=0.015824 root_delay=0.130790 root_dispersion=8.217393
    2017-11-12T17:32:38Z ntp_core.c:1573:(receive_packet) remote_interval=0.000000000 local_interval=0.099570973 server_interval=0.000000000 txs=K rxs=K
    2017-11-12T17:32:38Z ntp_core.c:1577:(receive_packet) test123=111 test567=111 testABCD=1111 kod_rate=0 interleaved=0 presend=0 valid=1 good=1 updated=1
    2017-11-12T17:32:38Z sources.c:353:(SRC_AccumulateSample) ip=[172.17.172.220] t=1510507957.951760493 ofs=-10.547374 del=0.130790 disp=8.217393 str=4
    2017-11-12T17:32:38Z sourcestats.c:658:(SST_GetSelectionData) n=1 off=-10.547374 dist=8.282888 sd=4.000000 first_ago=0.049800 last_ago=0.049800 selok=0
    2017-11-12T17:32:38Z sources.c:770:(SRC_SelectSource) badstat=1 sel=0 badstat_reach=1 sel_reach=0 max_reach_ago=0.000000
    

    Confirming that the issue is specific to version 3.1:

    After removing 3.1 (yum remove chrony) and reverting to chronyd version 2.1.1 (yum localinstall /home/chrony-2.1.1-1.el7.centos.x86_64.rpm), sync worked perfectly!

  • jewbix.cube
    jewbix.cube almost 4 years
    I had servers that were almost a full day offset and wouldn't sync. Updating maxdistance solved this for me, thank you.