Timekeeping For Distributed Computers

Ask any programmer who has ever had to deal with timekeeping on a computer, and they’re likely to go on at length about how it can be a surprisingly difficult thing to keep track of. Time zones, leap years, leap seconds, various timekeeping standards, clock drift, and even relativity are all problems that can creep into projects. These issues are exacerbated in distributed systems, which add another layer of complexity when we need to reliably determine, with high precision, the order in which a series of actions occurred across a number of different computers. One solution to this problem is the implementation of a vector clock.

When using other systems such as synchronized physical clocks to attempt to keep track of the order of events on different computers, a problem that may arise is that these systems don’t always track these changes with perfect reliability due to many issues such as varying temperature, race conditions, or clock skew. The vector clock, which is a kind of logical clock, instead tracks causal relationships between events. Each separate process maintains its own vector clock, represented by a list of integers with one counter per process. When one of these processes performs an event, it increments its own counter, and it attaches a copy of the whole list to every message it sends. A process that receives a message merges the incoming list into its own by keeping the larger value of each counter. By comparing these clocks as they are updated by the various processes across the network, the distributed system can be much more confident about the order in which events took place.
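The usual vector clock rules can be sketched in a few lines of Python. This is only a minimal illustration, not a production implementation: the names `VectorClock`, `tick`, `send`, `receive`, `happened_before`, and `concurrent` are made up for this example, and it assumes a fixed number of processes that is known in advance. Each process bumps its own counter on every event, a sent message carries a copy of the list, and a receiver keeps the element-wise maximum before counting the receive as an event of its own.

```python
from typing import List


class VectorClock:
    """Vector clock for one process in a fixed group of n processes (hypothetical helper)."""

    def __init__(self, n_processes: int, pid: int) -> None:
        self.pid = pid                    # index of the owning process
        self.clock = [0] * n_processes    # one counter per process

    def tick(self) -> List[int]:
        """Local event: advance only this process's own counter."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def send(self) -> List[int]:
        """Sending is an event; the returned copy travels with the message."""
        return self.tick()

    def receive(self, incoming: List[int]) -> List[int]:
        """Merge with the element-wise maximum, then count the receive as an event."""
        self.clock = [max(a, b) for a, b in zip(self.clock, incoming)]
        return self.tick()


def happened_before(a: List[int], b: List[int]) -> bool:
    """True if timestamp a causally precedes timestamp b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: List[int], b: List[int]) -> bool:
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Three processes: p0 sends to p1, while p2 does something unrelated.
p0, p1, p2 = (VectorClock(3, pid) for pid in range(3))
sent = p0.send()               # [1, 0, 0]
received = p1.receive(sent)    # [1, 1, 0]
unrelated = p2.tick()          # [0, 0, 1]

print(happened_before(sent, received))   # True: the send caused the receive
print(concurrent(received, unrelated))   # True: no causal link either way
```

Note that two timestamps can be concurrent: a vector clock gives a partial order, so it can say that neither event caused the other, which a single wall-clock timestamp cannot express.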

Of course, there are always downsides with elegant solutions like this. In the case of vector clocks the downside is largely the increased overhead: every process has to store a list with one integer per process, and every message has to carry a copy of it. But in systems where the ordering of events is of the utmost importance, this is worth the trade-off to ensure reliability. And unless we hook all of our computers up to atomic clocks, like they do for some computers at CERN, we will have to take the increased overhead instead.
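To get a rough feel for that overhead, here is a back-of-the-envelope sketch. It assumes each counter is stored as a fixed-width 8-byte integer, which is an assumption for illustration rather than anything a particular system requires, and `timestamp_bytes` is a made-up helper name.

```python
def timestamp_bytes(n_processes: int, bytes_per_counter: int = 8) -> int:
    """Size of one vector timestamp: one fixed-width counter per process."""
    return n_processes * bytes_per_counter


# The timestamp attached to every message grows linearly with the process count.
for n in (3, 100, 10_000):
    print(n, "processes ->", timestamp_bytes(n), "bytes per message")
```

Under that assumption, three processes cost 24 bytes per message, while ten thousand processes cost 80,000 bytes per message, before any actual payload is sent.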

14 thoughts on “Timekeeping For Distributed Computers”

    1. I was going to say, just use NTP with UTC on all computers. But I think the point of this article is more this sentence:

      “systems don’t always track these changes with perfect reliability due to many issues such as varying temperature, race conditions, or clock skew.”

      Still, you would think UTC with NTP updates would serve as a good correction.

          1. With NTP, definitely not: the asymmetric route problem already causes you problems.

            But overall, it just depends on what you’re doing. If you’re processing packets so fast that nanoseconds matter, you’re just not going to be able to control the time on two systems well enough to guarantee it, which is what Drone’s explaining below. At some point you have to do time transfer a different way.

            Basically, the problem is that it’s much, much harder to improve the synchronization between two systems than it is to increase the bandwidth. And in fact at some point you start to just hit physics, and it becomes impossible.

            But guaranteeing ordering with synchronization means your bandwidth can’t scale without improving the synchronization methods. Vector clocks break that dependency by guaranteeing ordering within the system itself.

        1. @Pat said: “NTP’s accuracy isn’t anywhere near good enough at high enough performance levels.”

          The problem is not just accuracy, the problem is things like NTP are entirely the wrong tools to begin with.

          These are distributed systems.[1] This is not just timekeeping, it’s also a question of causality and determinism. Imagine a distributed computerized financial system. It’s not just the value of a currency or asset at a given time, it’s when a transaction was made and who made it relative to other traders. In bidding, who bid first, second, et cetera, and when? Most importantly, does everyone trust the arbiter of the transactions, and how is the arbiter itself proven trustworthy?

          Take a look at Matrix Clocks. Matrix clocks mitigate some, but not all, issues with timekeeping, causality, and determinism in distributed systems.[2][3] The subject is deep; you can spend an entire life struggling in this field.

          * References:

          1. Distributed Computing

          https://en.wikipedia.org/wiki/Distributed_computing

          2. Matrix Clock (no, not a clock with a matrix display)

          A matrix clock is a mechanism for capturing chronological and causal relationships in a distributed system. Matrix clocks are a generalization of the notion of vector clocks.

          https://en.wikipedia.org/wiki/Matrix_clock

          3. On reducing the complexity of matrix clocks

          https://www.sciencedirect.com/science/article/abs/pii/S0167819103000668?via%3Dihub

          1. Yeah, that’s why I said the accuracy wasn’t *anywhere* near enough. If you could somehow get accuracy + precision enough that it literally wasn’t possible to have ordering issues, maybe.

            But from a realistic perspective that’s just not possible. You can drift away enough in a single second to have questions regarding ordering.

    2. Or a radio receiver for a time station.
      They used to exist, connected to the COM port of a DOS or Windows 3.1 PC.

      From that point on, the network software could distribute the time information to other PCs in a local network, if needed.

  1. In the 90s we were doing distributed computing with DCE, and using DTS. NTP is not that accurate, but it works in many use cases. We currently use PTP for greater precision in the use cases where NTP is not accurate enough.
