Title: Logical Clocks
1. Logical Clocks
2. Time: A major issue in distributed systems
- We tend to casually use temporal concepts
- Example: p suspects that q has failed
- Implies a notion of time: first q was believed correct, later q is suspected faulty
- Challenge: relating the local notion of time in a single process to a global notion of time
- Discuss this issue before developing practical tools for dealing with other aspects, such as system state
3. Time in Distributed Systems
- Three notions of time:
- Time seen by an external observer: a global clock of perfect accuracy
- Time seen on clocks of individual processes: each has its own clock, and clocks may drift out of sync
- Logical notion of time: event a occurs before event b, and this is detectable because information about a may have reached b
4. External Time
- The gold standard against which many protocols are defined
- Not implementable: no system can avoid uncertain details that limit temporal precision!
- Use of external time is also risky: many protocols that seek to provide properties defined by external observers are extremely costly and, sometimes, are unable to cope with failures
5. Time seen on internal clocks
- Most workstations have reasonable clocks
- Clock synchronization is the big problem (we will visit this topic later in the course): clocks can drift apart, and resynchronization in software is inaccurate
- Unpredictable speeds are a feature of all computing systems, hence we can't predict how long events will take (e.g., how long it will take to send a message and be sure it was delivered to the destination)
6. Logical notion of time
- Has no clock in the sense of real time
- Focus is on the definition of the "happens before" relationship: a happens before b if
- both occur at the same place and a finished before b started, or
- a is the send of message m and b is the delivery of m, or
- a and b are linked by a chain of such events
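The three rules can be turned into a small program that computes the happens-before relation for a finite run. This is a minimal sketch under an assumed, hypothetical event model: each event is a (process, position) pair, and each message is a (send_event, delivery_event) pair. The third rule ("linked by a chain") becomes a transitive closure.

```python
# Hypothetical event model: an event is (process, position on that process);
# messages lists (send_event, delivery_event) pairs.
def happens_before(events, messages):
    """Return the set of pairs (a, b) such that a -> b."""
    hb = set()
    # Rule 1: same process, and a comes first
    for a in events:
        for b in events:
            if a[0] == b[0] and a[1] < b[1]:
                hb.add((a, b))
    # Rule 2: the send of m happens before the delivery of m
    hb.update(messages)
    # Rule 3: chains of such events (Warshall-style transitive closure)
    for k in events:
        for i in events:
            for j in events:
                if (i, k) in hb and (k, j) in hb:
                    hb.add((i, j))
    return hb


# Events loosely modeled on the next slide's picture (positions are invented).
a, b = ("p0", 1), ("p2", 1)
r1, c, s1 = ("p1", 1), ("p1", 2), ("p1", 3)  # p1: deliver a's msg, deliver b's msg (c), send
d = ("p3", 1)
events = [a, b, r1, c, s1, d]
messages = [(a, r1), (b, c), (s1, d)]
hb = happens_before(events, messages)


def concurrent(x, y):
    # Neither event can have influenced the other
    return (x, y) not in hb and (y, x) not in hb
```

Here a and b are concurrent, c happens after both, and d happens after a, b, and c, even though no single message links a to d.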
7. Logical time as a time-space picture
[Figure: time-space diagram of processes p0, p1, p2, p3 with events a, b, c, d. Annotations: a and b are concurrent; c happens after a and b; d happens after a, b, and c.]
8. Notation
- Use an arrow (→) to represent the happens-before relation
- For the previous slide:
- a → c, b → c, c → d
- hence, a → d, b → d
- a and b are concurrent
- Also called the potential causality relation
9. Logical clocks
- Proposed by Lamport to represent causal order
- Write LT(e) to denote the logical timestamp of an event e, LT(m) for a timestamp on a message, and LT(p) for the timestamp associated with process p
- The algorithm ensures that if a → b, then LT(a) < LT(b)
10. Algorithm
- Each process maintains a counter, LT(p)
- For each event other than message delivery, set LT(p) = LT(p) + 1
- When sending message m, set LT(m) = LT(p)
- When delivering message m to process q, set LT(q) = max(LT(m), LT(q)) + 1
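A minimal Python sketch of these rules for a single process. The class and method names are hypothetical. One interpretive assumption: a send counts as an "event other than message delivery", so the process ticks its counter first and then stamps the message with the new value.

```python
class LamportProcess:
    """One process running the logical clock rules on this slide."""

    def __init__(self, name):
        self.name = name
        self.lt = 0  # LT(p)

    def local_event(self):
        # Any event other than a delivery: LT(p) = LT(p) + 1
        self.lt += 1
        return self.lt

    def send(self):
        # A send is an event too, so tick first, then stamp: LT(m) = LT(p)
        self.local_event()
        return self.lt

    def deliver(self, lt_m):
        # LT(q) = max(LT(m), LT(q)) + 1
        self.lt = max(lt_m, self.lt) + 1
        return self.lt


p, q = LamportProcess("p"), LamportProcess("q")
p.local_event()            # LT(p) = 1
lt_m = p.send()            # LT(p) = 2, LT(m) = 2
q.local_event()            # LT(q) = 1
lt_recv = q.deliver(lt_m)  # LT(q) = max(2, 1) + 1 = 3
```

The send (timestamp 2) happens before the delivery (timestamp 3), and the timestamps respect that order, as the guarantee on the previous slide requires.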
11. Illustration of logical timestamps
[Figure: time-space diagram of processes p0, p1, p2, p3, with each event labeled by its Lamport timestamp (values from 0 to 7).]
12. Concurrent events
- If a and b are concurrent, LT(a) and LT(b) may have arbitrary values!
- Thus, logical time lets us determine that a potentially happened before b, but not that a definitely did so!
- Example: processes p and q never communicate. Both will have events 1, 2, ..., but even if LT(e) < LT(e'), e may not have happened before e'
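The example can be made concrete with a short sketch. Two hypothetical processes p and q each run three local events and never exchange messages, so their counters evolve identically. The function name is illustrative only.

```python
def isolated_run(n_events):
    """Lamport timestamps of n local events at a process that never communicates."""
    lt, stamps = 0, []
    for _ in range(n_events):
        lt += 1  # LT(p) = LT(p) + 1
        stamps.append(lt)
    return stamps


p_stamps = isolated_run(3)  # [1, 2, 3]
q_stamps = isolated_run(3)  # [1, 2, 3]

e = ("p", p_stamps[0])        # first event at p, LT(e) = 1
e_prime = ("q", q_stamps[1])  # second event at q, LT(e') = 2

lt_smaller = e[1] < e_prime[1]
# With no messages, the only happens-before pairs are earlier/later events
# on the same process, so e -> e' would require e and e' to share a process.
e_happened_before = e[0] == e_prime[0] and e[1] < e_prime[1]
```

LT(e) < LT(e') holds, yet e did not happen before e'. Comparing Lamport timestamps alone cannot tell these two situations apart.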
13. Vector timestamps
- Extend logical timestamps into a list of counters, one per process in the system
- Again, each process keeps its own copy
- Event e occurs at process p: p increments VT(p)[p] (the pth entry in its own vector clock)
- q receives a message from p: q sets VT(q) = max(VT(q), VT(p)) (element by element), using the copy of VT(p) carried on the message
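A minimal Python sketch of a vector clock for a fixed group of processes numbered 0..n-1. The names are hypothetical. It assumes that a delivery is itself an "event at q", so after taking the element-by-element maximum the receiver also increments its own entry.

```python
class VectorProcess:
    """Vector clock for process `pid` in a fixed group of n processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n  # VT(p), one counter per process

    def event(self):
        # Event at p: increment VT(p)[p]
        self.vt[self.pid] += 1

    def send(self):
        self.event()
        return list(self.vt)  # copy of VT(p) carried on the message

    def deliver(self, vt_m):
        # Element-by-element max with the message's timestamp...
        self.vt = [max(mine, theirs) for mine, theirs in zip(self.vt, vt_m)]
        # ...and the delivery itself counts as an event at this process
        self.event()


p0, p1, p2 = VectorProcess(0, 3), VectorProcess(1, 3), VectorProcess(2, 3)
m1 = p0.send()  # p0: [1, 0, 0]
p1.deliver(m1)  # p1: [1, 1, 0]
m2 = p1.send()  # p1: [1, 2, 0]
p2.deliver(m2)  # p2: [1, 2, 1]
```

p2 never heard from p0 directly, but its final vector still records p0's first event, because that knowledge was relayed through p1.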
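14. Illustration of vector timestamps
[Figure: time-space diagram of processes p0, p1, p2, p3 labeled with vector timestamps. p0: [1,0,0,0], [2,0,0,0]; p1: [2,1,1,0], [2,2,1,0]; p2: [0,0,1,0]; p3: [0,0,0,1].]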
15. Vector timestamps accurately represent the happens-before relation
- Define VT(e) < VT(e') if
- for all i, VT(e)[i] ≤ VT(e')[i], and
- for some j, VT(e)[j] < VT(e')[j]
- Example: if VT(e) = [2,1,1,0] and VT(e') = [2,3,1,0], then VT(e) < VT(e')
- Notice that not all VTs are comparable under this rule: consider [4,0,0,0] and [0,0,0,4]
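The comparison rule translates directly into code. A minimal sketch with hypothetical function names, checked against the two examples on this slide:

```python
def vt_less(u, v):
    """VT(e) < VT(e'): no entry larger, and at least one entry strictly smaller."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def vt_concurrent(u, v):
    """Neither timestamp is less than the other."""
    return not vt_less(u, v) and not vt_less(v, u)


print(vt_less([2, 1, 1, 0], [2, 3, 1, 0]))        # True
print(vt_concurrent([4, 0, 0, 0], [0, 0, 0, 4]))  # True
```

Two identical vectors would be reported as concurrent by this sketch. Under the update rules on the previous slides, two distinct events never share a vector timestamp, so the case does not arise in practice.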
16. Vector timestamps accurately represent the happens-before relation
- Now we can show that VT(e) < VT(e') if and only if e → e'
- If e → e', then there exists a chain e0 → e1 → ... → en on which vector timestamps increase hop by hop
- If VT(e) < VT(e'), it suffices to look at VT(e')[proc(e)], where proc(e) is the place where e occurred. By definition, we know that VT(e')[proc(e)] is at least as large as VT(e)[proc(e)], and by construction, this implies a chain of events from e to e'
17. Examples of VTs and happens-before
- Example: suppose that VT(e) = [2,1,0,1] and VT(e') = [2,3,0,1], so VT(e) < VT(e')
- How did e' learn about the 3 and the 1?
- Either these events occurred at the same place as e', or
- Some chain of send/receive events carried the values!
- If VTs are not comparable, the corresponding events are concurrent
18. Notice that vector timestamps require a static notion of system membership
- For the vector to make sense, all processes must agree on the number of entries
- Later we will see that vector timestamps are useful within groups of processes
- We will also find ways to compress them and to deal with dynamic group membership changes
19. What about real-time clocks?
- Accuracy of clock synchronization is ultimately limited by uncertainty in communication latencies
- These latencies are large compared with the speed of modern processors (typical latency may be 35 µs to 500 µs, time for thousands of instructions)
- This limits the use of real-time clocks to coarse-grained applications
20. Interpretations of temporal terms
- We now understand that "a happens before b" means that information can flow from a to b
- We understand that "a is concurrent with b" means that there is no information flow between a and b
- What about the notion of an "instant in time" over a set of processes?
21. Neither clock is appropriate
- Problem: with both clocks, there can be many events that are concurrent with a given event
- Leads to a philosophical question:
- Event e has happened at process p
- Which events are "really" simultaneous with e?
22. Perspectives on logical time
- One view is based on intuition from physics
- Imagine a time-space diagram
- Cones of causality define past and future
- "Now" is any consistent cut across the system: one that respects the cones of causality, so no event on the "past" side of the cut depends on an event on the "future" side
- Next Tuesday we will see algorithms based on this
23. Causal notions of past, future
[Figure: time-space diagram of processes p0, p1, p2, p3 with events a, b, c, d, e, f, g.]
24. Causal notions of past, future
[Figure: the same diagram of processes p0–p3 and events a–g, with regions labeled FUTURE and PAST.]
25. Issues raised by time
- Time is a tool
- Typical uses of time?
- To put events into some sort of order
- Example: the order of updates on a replicated data item
- With one item, logical time may make sense
- With multiple items, consider a VT with one element per item
26. Ways to extend time to a total order
- Often extend a logical timestamp or vector timestamp with the actual clock time when the event occurred and the process id where it occurred
- The combination breaks any possible ties
- Or can use event names
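A minimal sketch of the extended timestamp as a Python sort key. The record type and the sample values are hypothetical. The key compares the logical timestamp first and uses the clock reading and then the process id only to break ties. Because each process id is unique, no two events from different processes can tie completely.

```python
from typing import NamedTuple


class Stamped(NamedTuple):
    lt: int       # logical timestamp
    clock: float  # local real-time clock reading when the event occurred
    pid: int      # id of the process where it occurred
    name: str


def total_order_key(e):
    # Compare logical time first; clock time and process id only break ties
    return (e.lt, e.clock, e.pid)


events = [
    Stamped(3, 10.5, 2, "x"),
    Stamped(3, 10.5, 1, "y"),
    Stamped(2, 11.0, 0, "z"),
    Stamped(3, 9.0, 3, "w"),
]
order = [e.name for e in sorted(events, key=total_order_key)]
print(order)  # ['z', 'w', 'y', 'x']
```

Event z comes first even though its clock reading is the latest, because its logical timestamp is smallest. The real-time clock only decides among events with equal logical time.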
27. An example
- Suppose we are broadcasting messages
- Atomic broadcast is:
- Fault-tolerant: unless every process with a copy fails, the message is delivered everywhere (often expressed as all-or-nothing delivery)
- Ordered: if p and q both receive m and n, either both receive m before n, or both receive n before m
- How should we implement this policy?
28. Easy case
- In many systems there is really just one source of broadcasts
- Typically we see this pattern when there is really one reference copy of a replicated object and the replicas are viewed as cached copies
- Accordingly, we can use a FIFO-ordered broadcast and reduce the problem to fault tolerance
- FIFO ordering simply requires a counter from the sender
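A minimal sketch of the receiver side of FIFO ordering. It assumes each sender numbers its broadcasts 0, 1, 2, ... with its counter. The receiver holds back any message that arrives ahead of its predecessors. The class name is hypothetical, and fault tolerance is not addressed here.

```python
class FifoReceiver:
    """Delivers each sender's messages in the order of the sender's counter."""

    def __init__(self):
        self.next_seq = {}  # sender -> next sequence number expected
        self.held = {}      # sender -> {seq: message} arrived out of order
        self.delivered = []

    def receive(self, sender, seq, msg):
        pending = self.held.setdefault(sender, {})
        pending[seq] = msg
        expected = self.next_seq.get(sender, 0)
        # Deliver as long as the next expected message has arrived
        while expected in pending:
            self.delivered.append(pending.pop(expected))
            expected += 1
        self.next_seq[sender] = expected


r = FifoReceiver()
r.receive("s", 1, "b")  # held back: seq 0 has not arrived
r.receive("s", 0, "a")  # delivers a, then b
r.receive("s", 2, "c")  # delivers c
print(r.delivered)      # ['a', 'b', 'c']
```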
29. A more complex example
- Sender-ordered multicast
- Sender places a timestamp in the broadcast
- Receiver waits until it has the full set of messages
- Orders them by logical timestamp, breaking ties with the sender id
- Then delivers in this order
- How can it tell when it has the full set?
30. A more complex example
[Figure: messages m and n are broadcast concurrently. Deliver m,n or n,m?]
31. A more complex example
- The solution implicitly depends upon membership
- In fact, most distributed systems depend upon membership
- For this reason, membership is the most fundamental idea in many systems
- The receiver can simply wait until all members have sent one message
- The system ends up running in rounds, where each member contributes zero or one messages per round
- Use a null message if you have nothing to send
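One round of this scheme can be sketched as follows. It assumes a fixed, agreed membership list and uses a hypothetical NULL marker for a member with nothing to send. The receiver delivers nothing until every member has been heard from. It then orders the round by (logical timestamp, sender id) and skips the null messages.

```python
NULL = None  # hypothetical marker for "nothing to send this round"


def deliver_round(members, round_msgs):
    """round_msgs maps sender -> (logical timestamp, payload or NULL).

    Returns the round's delivery order, or None if some member is still missing.
    """
    if set(round_msgs) != set(members):
        return None  # wait: we do not yet have the full set
    # Order by logical timestamp, breaking ties with the sender id
    batch = sorted((lt, sender, payload) for sender, (lt, payload) in round_msgs.items())
    # Null messages only mark a member's turn; they are never delivered
    return [payload for _, _, payload in batch if payload is not NULL]


members = ["p", "q", "r"]
print(deliver_round(members, {"p": (5, "m")}))                                 # None
print(deliver_round(members, {"p": (5, "m"), "q": (5, "n"), "r": (4, NULL)}))  # ['m', 'n']
```

Every receiver that runs this on the same complete round computes the same order. That determinism is what makes the broadcast ordered, and it is also why the scheme relies on agreement about who the members are.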
32. A more complex example
[Figure: messages m and n.]
33. Optimizations
- We could agree in advance on permission to send
- Now, perhaps only p and q have permission
- We treat their messages in rounds, but others must get permission before sending
- Avoids all the null messages and ensures fairness if p and q send at the same rate
- Dolev explored extensions for varied rates; these get quite elaborate
34. Optimizations
- In the limit, we end up with a token scheme
- While holding the token, p has permission to send
- If q requests the token, p must release it (perhaps after a small delay)
- The token carries the sequence number to use
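A minimal sketch of the token scheme. The Token and Member classes are hypothetical, and the token is simply handed over in memory with no request or delay logic. Only the token holder may send, and it stamps each message with the token's sequence number. Receivers can then deliver in sequence-number order, exactly as in the single-sender FIFO case.

```python
from dataclasses import dataclass


@dataclass
class Token:
    next_seq: int = 1  # sequence number the holder will use next


class Member:
    def __init__(self, name):
        self.name = name
        self.token = None

    def send(self, payload):
        # Only the token holder may send; it stamps the message from the token
        if self.token is None:
            raise RuntimeError(f"{self.name} does not hold the token")
        seq = self.token.next_seq
        self.token.next_seq += 1
        return (seq, self.name, payload)

    def release_to(self, other):
        # Hand the token (and with it the next sequence number) to another member
        other.token, self.token = self.token, None


p, q = Member("p"), Member("q")
p.token = Token()
m1 = p.send("m")  # (1, 'p', 'm')
p.release_to(q)   # q requested the token
n2 = q.send("n")  # (2, 'q', 'n')
```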
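35. A more complex example
[Figure: token-based ordering, showing message m1.]
36. A more complex example
[Figure: token-based ordering, showing message m1.]
37. A more complex example
[Figure: token-based ordering, showing messages m1 and n2.]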
38. An example
- Such solutions are expressed in many ways
- With a ring (Chang and Maxemchuk): messages are like a train, with new messages tacked onto the end and old ones delivered from the front
- Direct all-to-all broadcast
- Like a token moving around the ring, but it carries the messages with it (inspired by FDDI)
- Tree-structured in various ways
39. More examples
- The old Isis system uses logical clocks
- Sender says "here is a message"
- Receivers maintain logical clocks. Each proposes a delivery time
- Sender gathers votes, picks the maximum, and says "commit delivery at time t"
- Receivers deliver committed messages in timestamp order from the front of a queue
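A minimal sketch of the receiver side of this propose/commit scheme, with the sender's "pick the maximum" step done inline. It is not the Isis implementation. The names are hypothetical. It assumes each proposal is a (time, receiver id) pair, so that the process id breaks ties between equal times. The arrival orders are chosen to mirror the votes in the slides that follow (m1,p n2,p; n1,q m2,q; m1,r n2,r).

```python
class Receiver:
    """Receiver side of the propose/commit scheme on this slide (a sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.queue = {}  # msg -> (timestamp, committed?)

    def propose(self, msg):
        # Propose the next logical time; (time, pid) keeps proposals distinct
        self.clock += 1
        stamp = (self.clock, self.pid)
        self.queue[msg] = (stamp, False)
        return stamp

    def commit(self, msg, final):
        self.clock = max(self.clock, final[0])
        self.queue[msg] = (final, True)

    def deliver_ready(self):
        # Deliver from the front of the queue while the front message is committed
        out = []
        while self.queue:
            front = min(self.queue, key=lambda m: self.queue[m][0])
            if not self.queue[front][1]:
                break  # an uncommitted message might still end up earlier
            out.append(front)
            del self.queue[front]
        return out


p, q, r = Receiver("p"), Receiver("q"), Receiver("r")
pm, pn = p.propose("m"), p.propose("n")  # (1, 'p'), (2, 'p')
qn, qm = q.propose("n"), q.propose("m")  # (1, 'q'), (2, 'q')
rm, rn = r.propose("m"), r.propose("n")  # (1, 'r'), (2, 'r')

# Each sender picks the maximum proposal for its message
final = {"m": max(pm, qm, rm), "n": max(pn, qn, rn)}  # (2, 'q') and (2, 'r')

delivered = {x.pid: [] for x in (p, q, r)}
for msg in ("m", "n"):
    for x in (p, q, r):
        x.commit(msg, final[msg])
        delivered[x.pid] += x.deliver_ready()
```

After m commits, p and q still hold it back because the uncommitted n sits at the front of their queues. Once n commits, every receiver delivers m before n.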
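40. More examples
[Figure: receivers p, q, r each propose a delivery time for messages m and n. p proposes m:1 and n:2; q proposes n:1 and m:2; r proposes m:1 and n:2.]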
41. More examples
[Figure: the senders pick the maximum proposal for each message: m commits at (2, q) and n commits at (2, r).]
42. More examples
[Figure: each receiver marks m and n as committed (m!, n!) and delivers them in timestamp order.]
43. More examples
- Later versions of Isis used vector times
- Membership is handled separately
- Each message is assigned a vector time
- Messages are delivered in vector time order, with ties broken using the process id of the sender
44. Totem and Transis
- These systems represent time using partial order information
- Message m arrives and includes ordering fields
- "Deliver m after n and o"
- By transitivity, if n is after p, then m is after p
- Break ties using process id number
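One way to turn such "deliver after" fields into a single delivery order is a topological sort that breaks ties by process id. This is a minimal sketch of that idea, not the actual Totem or Transis algorithm. It assumes every referenced message has already arrived, and the sender ids are hypothetical.

```python
import heapq


def total_order(after, pid):
    """after: msg -> set of msgs it must follow; pid: msg -> sender id.

    Kahn's topological sort; among messages whose predecessors are all
    delivered, the one from the lowest process id goes first.
    """
    waiting = {m: len(deps) for m, deps in after.items()}
    followers = {m: [] for m in after}
    for m, deps in after.items():
        for d in deps:
            followers[d].append(m)
    ready = [(pid[m], m) for m, k in waiting.items() if k == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, m = heapq.heappop(ready)
        order.append(m)
        for f in followers[m]:
            waiting[f] -= 1
            if waiting[f] == 0:
                heapq.heappush(ready, (pid[f], f))
    return order


# "Deliver m after n and o", and n is after p (the slide's example)
after = {"m": {"n", "o"}, "n": {"p"}, "o": set(), "p": set()}
pid = {"m": 1, "n": 2, "o": 0, "p": 3}  # hypothetical sender ids
print(total_order(after, pid))  # ['o', 'p', 'n', 'm']
```

Transitivity comes for free: m cannot be delivered before n, and n cannot be delivered before p. The process ids only matter when several messages become deliverable at the same time, as o and p do here.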
45. Totem and Transis
[Figure: ordering graph for messages m, n, o, p: m follows n and o, and n follows p.]
46. Things to notice
- Time is just a programming tool
- But membership and message atomicity are very fundamental
- Waiting for m won't work if m never arrives
- And VT is only meaningful if we can agree on the meaning of the indices
- With failures, these algorithms get surprisingly complicated: suppose p fails while sending m?
47. Major uses of time
- To order updates on replicated data
- To define versions of objects
- To deal with processes that come and go in dynamic networked applications
- Processes that joined earlier often have more complete knowledge of system state
- A process that leaves and rejoins often needs some form of incrementing incarnation number
- To prove correctness of complex protocols