Timekeeping For Distributed Computers

Ask any programmer who has ever had to deal with timekeeping on a computer, and they’re likely to go on at length about how it can be a surprisingly difficult thing to keep track of. Time zones, leap years, leap seconds, various timekeeping standards, clock drift, and even relativity are all problems that can creep in to projects. Issues with timekeeping are exacerbated in distributed systems as well, adding another layer of complexity when we need to reliably determine the order that a series of actions occurred across a number of different computers with a high precision. One solution to this problem is the implementation of a vector clock.

When using other systems such as logical clocks to attempt to keep track of the order of events on different computers, a problem that may arise is that these systems don’t always track these changes with perfect reliability due to many issues such as varying temperature, race conditions, or clock skew. The vector clock instead tracks causal relationships between events. Each separate process maintains its own vector clock, represented by a list of integers. When one of these processes performs an event, it increments its own clock and sends it out to the rest of the system. By keeping track of this clock as it is updated by various processes across the computer the distributed system can be much more confident about the order in which events took place.

Of course, there are always downsides with elegant solutions like this. In the case of vector clocks the downside is largely increased overhead for keeping track of all of the sets of integers. But in systems where the ordering of processes is of the upmost importance, this is worth the trade-off to ensure reliability. And unless we hook all of our computers up to atomic clocks like they do for some computers at CERN we will have to take the increased overhead instead.