How does one ensure, or even guarantee, that server time is synchronized correctly across dozens of servers in multiple data centers in different locations?


Solution 1

"Some dude from the data center accidentally modified the date/time on one of the web servers."

This is your first problem. It is most likely caused by a combination of:

  • 'dude[s] from [the] data center' with insufficient training and
  • Overly high privileges

Changing system time requires administrative privileges. Changing the time manually on a system that not only has the correct time but whose time is managed by NTP is a sign of insufficient training. Solve this problem first: until you do, the clock is probably just the most visible of your problems. What else are they doing on this system, and why?
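
If you also want to notice when someone changes a clock by hand, one option is a small watchdog that compares the wall clock with a monotonic clock. This is a minimal Python sketch, not a standard tool: the function names, the 10 s polling interval, and the 0.5 s tolerance are illustrative assumptions. `time.monotonic()` is not affected when the system clock is set, so a gap between the two elapsed times indicates that the wall clock was stepped.

```python
import time


def clock_step(prev_wall: float, prev_mono: float,
               now_wall: float, now_mono: float) -> float:
    """Seconds by which the wall clock moved beyond real elapsed time.

    time.monotonic() is not affected when the system clock is set, so the
    difference between the two elapsed intervals reveals a step of the
    wall clock, whether manual or not.
    """
    return (now_wall - prev_wall) - (now_mono - prev_mono)


def watch(interval: float = 10.0, tolerance: float = 0.5) -> None:
    """Hypothetical watchdog loop: poll both clocks and report steps forever."""
    prev_wall, prev_mono = time.time(), time.monotonic()
    while True:
        time.sleep(interval)
        now_wall, now_mono = time.time(), time.monotonic()
        step = clock_step(prev_wall, prev_mono, now_wall, now_mono)
        # Small differences (e.g. from NTP slewing) stay below the tolerance.
        if abs(step) > tolerance:
            print(f"wall clock stepped by {step:+.3f} s")
        prev_wall, prev_mono = now_wall, now_mono
```

In practice you would send the alert to your monitoring system instead of printing it. The watchdog does not prevent the change; it only makes it visible quickly.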

"My managers ... said we shouldn't use timestamps to check expiry in the first place."

If someone has proposed a viable alternative, I'd at least consider it. Somehow I suspect that isn't the case.
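
If the timestamp check stays, it can be made tolerant of modest clock error instead of being abandoned. This Python sketch shows one common pattern, not the questioner's actual design: it assumes Server A and Server B share a secret key, and the key, the message layout, and the 300 s / 30 s windows are all hypothetical. The sender signs `timestamp|nonce|body` with HMAC-SHA256; the receiver rejects bad signatures, messages that are too old or too far in the future, and any nonce it has already accepted within the validity window.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-real-shared-key"  # hypothetical key shared by A and B
MAX_AGE = 300   # seconds a message stays valid (assumed value)
MAX_SKEW = 30   # seconds a sender's clock may run ahead (assumed value)

_seen_nonces: dict[str, int] = {}  # nonce -> message timestamp


def _mac(ts: int, nonce: str, body: bytes) -> str:
    msg = f"{ts}|{nonce}|".encode() + body
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign(body: bytes, nonce: str, ts: Optional[int] = None) -> tuple[int, str]:
    """Sender side (Server A): return (timestamp, signature) to send with body."""
    ts = int(time.time()) if ts is None else ts
    return ts, _mac(ts, nonce, body)


def verify(body: bytes, nonce: str, ts: int, signature: str,
           now: Optional[float] = None) -> bool:
    """Receiver side (Server B): accept each signed message at most once."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(_mac(ts, nonce, body), signature):
        return False  # forged or tampered
    age = now - ts
    if age > MAX_AGE or age < -MAX_SKEW:
        return False  # expired, or further in the future than skew explains
    # Forget nonces whose messages would now fail the age check anyway.
    for old, old_ts in list(_seen_nonces.items()):
        if now - old_ts > MAX_AGE:
            del _seen_nonces[old]
    if nonce in _seen_nonces:
        return False  # replay
    _seen_nonces[nonce] = ts
    return True
```

The nonce cache is what actually stops replays inside the validity window; the timestamp only bounds how long the cache must remember each nonce. With NTP working properly, clocks agree to within milliseconds, so a 30 s skew allowance is generous, while a server that is an hour off still fails loudly. With several receiving servers, the cache would have to be shared (for example in a database), which this sketch does not cover.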

"Network Time Protocol is implemented; because our data centers are spread across different continents, we have one NTP server in each data center."

I'd recommend two in each data center, each referencing a different set of external NTP servers as well as each other. This results in more stable time and makes you much more robust to single failures. I'm also paranoid and over-engineer things, so there's that. Still, NTP servers need almost no resources, so run them wherever you like.
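
A sketch of that layout, written as a small Python generator for the `ntp.conf` files involved. The `ntp1`/`ntp2.dc1.example.com` host names are hypothetical placeholders, the `N.pool.ntp.org` names are the public NTP pool, and `driftfile`, `server ... iburst`, and `peer` are standard reference-ntpd directives; check the details against the ntpd version you actually run.

```python
def server_conf(upstreams: list[str], sibling: str,
                driftfile: str = "/var/lib/ntp/ntp.drift") -> str:
    """ntp.conf for one of the two NTP servers in a data center."""
    lines = [f"driftfile {driftfile}"]  # path varies by distribution
    # Each local NTP server gets its own set of external references...
    lines += [f"server {host} iburst" for host in upstreams]
    # ...and also peers with the other NTP server in the same data center.
    lines.append(f"peer {sibling}")
    return "\n".join(lines) + "\n"


def client_conf(local_servers: list[str]) -> str:
    """ntp.conf for an ordinary server: reference both local NTP servers."""
    return "".join(f"server {host} iburst\n" for host in local_servers)


# Hypothetical host names for one data center ("dc1").
NTP1, NTP2 = "ntp1.dc1.example.com", "ntp2.dc1.example.com"
conf_ntp1 = server_conf(["0.pool.ntp.org", "1.pool.ntp.org"], NTP2)
conf_ntp2 = server_conf(["2.pool.ntp.org", "3.pool.ntp.org"], NTP1)
conf_web = client_conf([NTP1, NTP2])
print(conf_ntp1)
```

The web servers then run an NTP daemon with something like `conf_web`, so every machine is disciplined continuously by both local servers. Peering between data centers, as suggested in the comments, would just add more `peer` lines.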

"The servers within each data center have cron jobs that check their time against the NTP server in the same data center. If the time is out of sync, the job automatically updates the server's date/time."

This is a bad plan. Cron has no place changing the time in an NTP system. The servers should run real NTP clients. These clients should each reference the (two) local NTP servers.

If you want to use cron, use it on each server to verify that the server is successfully synchronized with both local NTP servers. You can do this by examining the output of the ntpq command. Learn to read that output; ntpq is your friend.
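
A sketch of that cron-side check in Python. It assumes the reference ntpd's `ntpq -pn` peer listing (a tally character, then remote, refid, st, t, when, poll, reach, delay, offset, jitter, with offsets in milliseconds); the 10.0.1.x addresses, the sample output, and the 100 ms limit are hypothetical. It reports a problem if either local NTP server is missing, unreachable, or not marked as system peer (`*`) or candidate (`+`), or if there is no system peer at all.

```python
import subprocess

# Made-up sample of `ntpq -pn` output for a client of two local NTP servers.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.0.1.1        .GPS.            1 u   33   64  377    0.512   -0.103   0.040
+10.0.1.2        10.0.1.1         2 u   12   64  377    0.498    0.087   0.051
"""


def parse_ntpq(output: str) -> list[dict]:
    """Parse `ntpq -pn` peer lines into dicts (wrapped long lines are skipped)."""
    peers = []
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("remote", "=")):
            continue  # blank line, header, or separator
        tally, fields = line[0], line[1:].split()
        if len(fields) < 10:
            continue
        peers.append({
            "tally": tally,                # '*' = system peer, '+' = candidate
            "remote": fields[0],
            "stratum": int(fields[2]),
            "reach": int(fields[6], 8),    # reach register is printed in octal
            "offset_ms": float(fields[8]),
        })
    return peers


def check(peers: list[dict], expected: set[str],
          max_offset_ms: float = 100.0) -> list[str]:
    """Return a list of problems; an empty list means healthy."""
    problems = []
    by_name = {p["remote"]: p for p in peers}
    for host in sorted(expected):
        p = by_name.get(host)
        if p is None:
            problems.append(f"{host}: not configured")
        elif p["reach"] == 0:
            problems.append(f"{host}: unreachable")
        elif p["tally"] not in "*+":
            problems.append(f"{host}: not selected (tally {p['tally']!r})")
    system_peer = next((p for p in peers if p["tally"] == "*"), None)
    if system_peer is None:
        problems.append("no system peer: not synchronized")
    elif abs(system_peer["offset_ms"]) > max_offset_ms:
        problems.append(f"offset {system_peer['offset_ms']} ms exceeds {max_offset_ms} ms")
    return problems


def run_check(expected: set[str]) -> int:
    """Run ntpq, print any problems, and return a cron-friendly exit status."""
    out = subprocess.run(["ntpq", "-pn"], capture_output=True,
                         text=True, check=True).stdout
    problems = check(parse_ntpq(out), expected)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A cron-invoked script could end with `raise SystemExit(run_check({"10.0.1.1", "10.0.1.2"}))`, so a non-zero exit status and the printed messages reach whoever reads cron mail or your monitoring. This only checks; the NTP daemon remains the thing that sets the clock.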

To address the questions you report as having been raised:

"But our managers are still not happy with it and think it could easily cause the same problem, e.g. what if someone accidentally modifies the NTP date/time? What if all the NTP servers are out of sync with each other? Which NTP servers can we really trust? And so on."

The first question isn't insane. A bit paranoid if taken to the extreme, but fine. Answers are:

  • Use more than one independent reference clock (a single bad source will be ignored in favor of stable time from the others).
  • Use a trustworthy reference clock, e.g. GPS. (If your ops guys can accidentally modify the time on a GPS satellite, you have more serious problems than web server clocks.)
  • Use cryptographic keys to ensure that the reference clock you are communicating with is the one you trust.

The second is addressed by configuring the NTP servers to reference each other, since, all other things being equal, they will tend to pull together, and also by using independent, trustworthy reference clocks.

  • If one of three lower stratum reference clocks goes out of sync, it will be ignored.
  • If two go wildly out of sync, they will be ignored.
  • If all three clocks go wildly out of sync, NTP will ignore all three of them and do the best it can (still pretty good, especially if there is an equal-stratum clock it can reference).
  • You pretty much only have to worry about a malicious attack here.
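
The "ignore the odd one out" behaviour comes from NTP's selection step, which is based on Marzullo's interval-intersection algorithm. This is a simplified Python sketch of just that intersection step; real ntpd adds clustering and combining stages on top, so it is an illustration rather than ntpd's code, and the source names and offsets are made up. Each source reports an offset with an error bound, the algorithm finds the interval that the largest number of sources agree on, and sources whose intervals do not contain it are treated as falsetickers.

```python
def marzullo(sources: dict[str, tuple[float, float]]) -> tuple[int, float, float, list[str]]:
    """Find the interval agreed on by the most sources.

    sources maps name -> (offset, error bound); each source claims the true
    offset lies in [offset - error, offset + error]. Returns
    (number_agreeing, low, high, falsetickers).
    """
    edges = []
    for offset, err in sources.values():
        edges.append((offset - err, -1))  # -1 marks an interval start
        edges.append((offset + err, +1))  # +1 marks an interval end
    # Starts sort before ends at the same point, so touching intervals overlap.
    edges.sort()
    best = count = 0
    low = high = 0
    for i, (point, kind) in enumerate(edges):
        count -= kind  # a start adds one overlapping source, an end removes one
        if count > best:
            best, low, high = count, point, edges[i + 1][0]
    falsetickers = [name for name, (offset, err) in sources.items()
                    if not (offset - err <= low and high <= offset + err)]
    return best, low, high, falsetickers


# Made-up offsets in milliseconds: two servers agree, one is an hour off.
print(marzullo({"a": (10, 5), "b": (12, 4), "c": (3_600_000, 5)}))  # (2, 8, 15, ['c'])
```

With three sources, one wildly wrong clock is outvoted by the two that agree. This is also why the comments recommend at least three (better four) NTP servers: with only two that disagree, there is no majority to say which one is wrong.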

These cases can get complex to describe, but NTP favors stability first, and accuracy when it has an accurate source.

As for trust, most people who run a public NTP server have no reason to interfere with your time, and many of them have a reason to provide accurate time. Ranked by how much interest they have in providing accurate time, I'd suggest:

  • GPS satellites.
  • NIST NTP servers.
  • Any well known stratum 1 provider.
  • Any well known stratum 2 provider.
  • Your data center (assuming you purchase hosting) should probably have an NTP server or three of its own, if only for its own use.

Also, and this is important: the NTP protocol is designed to synchronize time to within milliseconds, not seconds. If you use cron + ntpdate, your time can be off by multiple seconds (thank you, variable latency!). An NTP daemon will keep your clocks much more stable and accurate under the same circumstances.
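
For reference, this is how NTP measures time across variable latency (how it "accounts for network latency", as a comment below puts it): each exchange records four timestamps, and the standard NTP calculation derives the clock offset and the round-trip delay from them. A Python sketch with made-up numbers:

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Standard NTP on-wire calculation from one request/response exchange.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # how far the server is ahead of the client
    delay = (t4 - t1) - (t3 - t2)         # round trip minus server processing time
    return offset, delay


# Made-up exchange in milliseconds: the client is really 500 ms behind, the
# request takes 40 ms, the server spends 10 ms, and the reply takes 60 ms.
print(ntp_offset_delay(1000, 1540, 1550, 1110))  # (490.0, 100)
```

The estimate assumes the outbound and return trips take equally long; here the 20 ms asymmetry shows up as a 10 ms error, and the error can never exceed half the round-trip delay. The NTP daemon repeats such exchanges continuously, prefers low-delay samples, and adjusts the clock rate gradually, which is what keeps the result stable.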

Solution 2

Properly configured NTP and GMT for all the servers is the best practice. If this is a huge deal, you have the money, and you can justify one for each data center, there are GPS master clock servers you can buy. This seems like an operations problem: they should monitor the times on the servers and alert if they get significantly out of whack.
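
On the GMT/UTC point (the comments settle on UTC): on Unix-like systems the time zone setting does not change a server's underlying clock, only how times are displayed and parsed, so the practical rule for applications is to exchange timestamps as epoch seconds or as timezone-aware UTC values. A Python sketch of the latter, assuming timestamps travel as ISO 8601 strings; the function names are hypothetical.

```python
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Current time as an ISO 8601 string with an explicit +00:00 offset."""
    return datetime.now(timezone.utc).isoformat()


def parse_utc(text: str) -> datetime:
    """Parse an ISO 8601 timestamp, insisting on an explicit UTC offset."""
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # A naive timestamp means "local time of whichever server wrote it".
        raise ValueError(f"timestamp has no UTC offset: {text!r}")
    return dt.astimezone(timezone.utc)


# The same instant written by servers in different time zones compares equal.
print(parse_utc("2012-01-01T20:00:00+08:00") == parse_utc("2012-01-01T12:00:00+00:00"))  # True
```

Rejecting naive timestamps outright is a deliberate choice: it turns a silent hours-long error into an immediate, visible failure.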


Author: forestclown

Updated on September 18, 2022

Comments

  • forestclown
    forestclown over 1 year

    Currently our web applications contain logic that checks whether data sent to the web server has expired by comparing the data's timestamp with the server's date/time.

    Everything went well until some dude from the data center accidentally modified the date/time on one of the web servers, causing some disruption to our web services. My managers are of course not happy with this, and said we shouldn't use timestamps to check expiry in the first place... anyway...

    Network Time Protocol is implemented; because our data centers are spread across different continents, we have one NTP server in each data center. The servers within each data center have cron jobs that check their time against the NTP server in the same data center. If the time is out of sync, the job automatically updates the server's date/time.

    But our managers are still not happy with it and think it could easily cause the same problem, e.g. what if someone accidentally modifies the NTP date/time? What if all the NTP servers are out of sync with each other? Which NTP servers can we really trust? And so on.

    So my questions are:

    1. What is the current practice for syncing date/time between servers across multiple data centers or locations?
    2. How does one manage timestamps between web apps? E.g. Server A sends data (containing Server A's timestamp) to Server B, which compares its own date/time with the timestamp in the data to see whether the data has expired. This is to prevent HTTP replay attacks.
    3. Should we really not use a timestamp check?

    Thanks & Best Regards

    • Admin
      Admin over 12 years
      What if someone accidentally modifies the NTP date/time? What if NATO declares war on your data center? What if some alien wants to screw up your server? Nothing can fix stupidity.
    • Admin
      Admin over 12 years
      BTW, don't use a cron job to sync. Run the NTP daemon: it adjusts for drift (making your clock run slightly faster or slower) so the clock won't "jump".
    • Admin
      Admin over 12 years
      Why not sync all of the servers with the same source clock, either one of your or an external source?
    • Admin
      Admin over 12 years
      Thanks J-16 SDiZ, I totally agree with your first comment; that was how I felt... and good note on the second comment! My bad. We were hoping to sync with one source clock initially but were afraid of network latency, because we have servers spread across the globe.
    • Admin
      Admin over 12 years
      @forestclown NTP accounts for network latency. Have an NTP server at each location, synchronizing with nearby public NTP servers and peering with each other. But nothing here will prevent someone from modifying the clock directly (nor would a clock syncing to GPS transmissions); it'll simply fix the time quickly when someone does break it.
    • Admin
      Admin over 12 years
      @Shane Madden, NTP won't fix the time quickly (or usually at all) if someone breaks it. The most likely case (IMHO) is someone 'correcting' the time because they misunderstood the time zone (it takes a very "helpful" person to go out of their way to correct the seconds portion of a computer clock). NTP won't fix time that is off by an hour. It will ignore the external clocks that are now 'very' wrong. Even if the change is small (minutes?), it will take some time for the clock to be corrected.
    • Admin
      Admin over 12 years
      @Slartibartfast Depends what OS and what NTP client is in use, of course. I'm curious which client you've seen that behavior with?
    • Admin
      Admin over 12 years
      I'm not sure which behaviour you're referring to. The refusal to 'correct' a wildly divergent clock is described here (see the -g option): eecis.udel.edu/~mills/ntp/html/ntpd.html. The long time to correct time differences was a reference to slewing the clock as opposed to stepping it. Reading further at the link above, it looks like that is unlikely unless you use the -x option.
    • Admin
      Admin over 12 years
      Who gives sumdood access to change the time on production servers in the first place?
  • ravi yarlagadda
    ravi yarlagadda over 12 years
    @gview Why GMT, might I ask?
  • TonyB
    TonyB over 12 years
    For more accurate NTP you want at least 3 NTP servers all talking to one another. With only two, there is no way to tell which server is "wrong" if they disagree. The NTP servers don't have to be dedicated machines, so there is virtually no cost to running several per data center and peering the data centers with one another.
  • Slartibartfast
    Slartibartfast over 12 years
    I certainly won't recommend against 3 NTP servers, but I don't think the concern is as serious as you seem to think. How would the one NTP server become 'wrong' gradually enough that the clients it shares with the other NTP server still consider it stable? If the server's clock steps or its stratum changes, the clients will re-evaluate the server's trustworthiness.
  • al.
    al. over 12 years
    +1 for pointing out that the cron/ntpdate approach is flawed.
  • gview
    gview over 12 years
    The only way to have all the times agree is to use the same timezone for the servers.
  • TonyB
    TonyB over 12 years
    Here is more info on why the best practice is to have 4+ NTP servers: support.ntp.org/bin/view/Support/StartingNTP4#Section_7.1.4.3.1.
  • gview
    gview over 12 years
    In the implementations of date that I've seen, GMT = UTC, but I probably should have advised using UTC.
  • Slartibartfast
    Slartibartfast over 12 years
    That's an interesting article. To best follow that advice, you'd need to have at least 16 independent external NTP servers to sync your 4 internal NTP servers from (4x4), AND you'd need to edit the minclock and minsane settings appropriately. Further, the benefit would be primarily to help ensure stable time on startup / initialization. I'll have to ponder more to decide how important I think that is for my own use.
  • Steve-o
    Steve-o over 12 years
    Your nearest router should have a good clock, saves on additional hopping.
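
To make the exchange in the comments about -g and -x concrete: the reference ntpd documentation linked there describes a step threshold (128 ms by default, raised to 600 s by -x), a panic threshold (1000 s by default, which -g lets ntpd ignore once at startup), and a maximum slew rate of 500 ppm. This is a rough Python model of those rules. The defaults are assumptions to verify against your ntpd version, and other clients such as chrony behave differently.

```python
STEP_THRESHOLD = 0.128     # seconds; larger offsets are stepped, not slewed
STEP_THRESHOLD_X = 600.0   # step threshold when ntpd runs with -x
PANIC_THRESHOLD = 1000.0   # seconds; beyond this ntpd exits instead of correcting
MAX_SLEW_RATE = 500e-6     # 500 ppm: at most 0.5 ms of correction per second


def ntpd_action(offset: float, x_option: bool = False,
                g_at_startup: bool = False) -> str:
    """Rough model of what reference ntpd does with a measured offset (seconds)."""
    magnitude = abs(offset)
    if magnitude > PANIC_THRESHOLD and not g_at_startup:
        return "exit"   # log a message and quit; an admin must intervene
    step_threshold = STEP_THRESHOLD_X if x_option else STEP_THRESHOLD
    if magnitude > step_threshold:
        return "step"   # jump the clock (after the offset persists for a while)
    return "slew"       # run the clock slightly fast or slow until corrected


def slew_duration(offset: float) -> float:
    """Minimum seconds needed to slew away an offset at the maximum rate."""
    return abs(offset) / MAX_SLEW_RATE


for off in (0.05, 2.0, 3600.0):
    print(off, ntpd_action(off), ntpd_action(off, x_option=True))
```

For example, a clock that someone set an hour wrong exceeds the panic threshold, so ntpd gives up rather than fixing it, which matches the case described in the comments. A 2 s error under -x is slewed instead of stepped and takes at least 4000 s, a little over an hour, to remove.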