Title: Chap 5 Distributed Coordination
Chap 5 Distributed Coordination
- Physical clock synchronization
- Causality relation
- Snapshot taking
- Lamport's logical clock
- Vector logical clock
- Multicast
- Event notification
- Leader election
- Distributed mutual exclusion
5.1 Time and clock
- Two roles of time
- - Defines temporal order among events
- - Duration (measured by timer)
- UTC (Coordinated Universal Time) is based on cesium-133 atom oscillation, with clocks located at over 200 labs in the world.
- Leap seconds take care of Earth's slowing rotation.
- With satellites, 0.5 ms accuracy is possible (at 100 MIPS, 0.5 ms corresponds to 50,000 instructions).
Clock skew
- Skew: Clock reading (from a single clock) is location-dependent, e.g., distance from satellite or clock source on a circuit board.
- Drift: Multiple clocks tick at slightly different rates.
- $t$: the real time
- $C_p(t)$: the reading of a clock $p$ at time $t$ ($C_p(t) = t$ for ideal clock)
- $dC_p(t)/dt$: ticking rate ($dC_p(t)/dt = 1$ for ideal clock)
- $r$: max drift rate
- - $1 - r \le dC_p(t)/dt \le 1 + r$
- In $t$ (sec), two clocks with drift rate $r$ may drift apart by $2rt$. To keep $2rt < d$, resynch every $d/(2r)$ sec.
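The drift bound above can be turned into a small calculation. The sketch below is only illustrative: the function names and the sample values for the drift rate `r` and the tolerated gap `d` are assumptions, not figures from the slides.

```python
def max_divergence(r: float, t: float) -> float:
    """Worst-case gap after t seconds between two clocks with max drift rate r.

    One clock may run fast by r and the other slow by r, hence the factor 2.
    """
    return 2 * r * t


def resync_interval(r: float, d: float) -> float:
    """Longest time between resynchronizations that keeps the gap within d."""
    return d / (2 * r)


# Assumed sample values: drift rate 1e-5, tolerated gap of 1 ms.
r = 1e-5
d = 1e-3
interval = resync_interval(r, d)
print(f"resynchronize about every {interval:.1f} s")
print(f"worst-case gap at that point: {max_divergence(r, interval):.6f} s")
```

With these assumed numbers the clocks would need to resynchronize roughly every 50 seconds.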
Clock synchronization methods
- Perfect synchronization physically impossible.
- Time server can receive UTC signals.
- Each client has a reasonably accurate quartz clock, and periodically gets time from the server.
[Figure: a time server receives UTC and serves time to clients.]
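Cristian's method
[Figure: timing diagram between client and server. The client's request leaves at T0 and the reply arrives at T1; each direction takes Tp, the server spends I on processing, and the reply carries the server's time t.]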
- Estimate $T_p$ (propagation delay) from $T_1 - T_0 = 2 T_p + I$, where $I$ = processing time.
- Current time = $t$ (server's time in message) + $T_p$.
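As a minimal sketch of the estimate above, the function below solves the round-trip equation for the propagation delay and adds it to the server's timestamp. The function name, parameter names, and the sample timestamps are illustrative assumptions; the sketch also assumes the delay is the same in both directions, as the equation does.

```python
def cristian_estimate(t0: float, t1: float, server_time: float,
                      processing: float = 0.0) -> tuple[float, float]:
    """Return (estimated current time, estimated one-way delay Tp).

    t0, t1      : client clock readings when the request was sent / reply received
    server_time : the time t carried in the server's reply
    processing  : the server's processing time I (0 if unknown)
    """
    tp = (t1 - t0 - processing) / 2   # from T1 - T0 = 2*Tp + I
    return server_time + tp, tp


# Assumed sample values, in seconds.
now, tp = cristian_estimate(t0=10.000, t1=10.030,
                            server_time=12.500, processing=0.010)
print(f"Tp = {tp:.3f} s, current time = {now:.3f} s")
```

The client would then set (or gradually adjust) its clock toward the returned estimate.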
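OSF DCE
- Time is an interval $[t - e, t + e]$.
[Figure: time intervals reported by time servers TS1 to TS4; one interval is marked "Reject", and the result is labeled "New time interval".]
Two intervals overlap ⇒ cannot say which time is earlier (in case of overlap, Unix make should recompile).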
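The interval comparison can be sketched as follows. This is not the DCE API: the `TimeInterval` class, `compare`, and `needs_rebuild` are hypothetical names used only to illustrate why overlapping intervals force the conservative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    t: float  # reported time
    e: float  # error bound, so the true time lies in [t - e, t + e]

    @property
    def low(self) -> float:
        return self.t - self.e

    @property
    def high(self) -> float:
        return self.t + self.e


def compare(a: TimeInterval, b: TimeInterval) -> str:
    """Return 'before', 'after', or 'unknown' when the intervals overlap."""
    if a.high < b.low:
        return "before"
    if b.high < a.low:
        return "after"
    return "unknown"


def needs_rebuild(source: TimeInterval, target: TimeInterval) -> bool:
    """Rebuild unless the source is definitely older than the target."""
    return compare(source, target) != "before"


print(compare(TimeInterval(10, 1), TimeInterval(13, 1)))        # before
print(compare(TimeInterval(10, 2), TimeInterval(13, 2)))        # unknown
print(needs_rebuild(TimeInterval(10, 2), TimeInterval(13, 2)))  # True
```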
- Process 1
- - e1: Get up
- - e2: Start breakfast
- - e3: End breakfast
- - e4: Leave for work
- - e5: Make flight reservation
- - e6: ...
- Process 2
- - e1: Get up
- - e2: Start breakfast
- - e3: End breakfast
- - e4: Leave for work
- - e5: Make flight reservation
- - e6: ...
Observe:
1. All events in a process are naturally ordered.
2. Some events may causally affect an event in another process.
5.2 Causality relation
[Figure: space-time diagram of processes P (events p1 to p4), Q (q1 to q5), and R (r1 to r4) along a time axis. Marked relations: p1 → q2; p1 → p2; by transitivity, p1 → r3.]
- Event changes state of process.
- State remains same till next event occurs.
Formal definition
- a → b is defined by:
- - If a occurs earlier than b in a process, then a → b.
- - If a is the sending event and b is the receiving event of the same message, then a → b.
- - If a → b and b → c, then a → c (transitive).
- If a → b, then a causally precedes (or happened before) b; a and b are causally related.
- a and b are concurrent if neither a → b nor b → a.
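The three rules of the definition can be sketched as a transitive closure over a small event set. The processes, events, and messages below are hypothetical and do not reproduce the slide's diagram; `happened_before` and `concurrent` are illustrative names.

```python
from itertools import product


def happened_before(processes, messages):
    """Build the happened-before relation as a set of (a, b) pairs.

    processes: dict mapping process name -> list of its events in local order
    messages : list of (sending event, receiving event) pairs
    """
    events = [e for evs in processes.values() for e in evs]
    hb = set()
    # Rule 1: order within a process.
    for evs in processes.values():
        hb.update(zip(evs, evs[1:]))
    # Rule 2: sending precedes receiving of the same message.
    hb.update(messages)
    # Rule 3: transitivity (Warshall's algorithm, k is the outermost loop).
    for k, i, j in product(events, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb


def concurrent(a, b, hb):
    """Two distinct events are concurrent if neither precedes the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb


# Hypothetical example: p1 sends to q2, and q2 sends to r1.
processes = {"P": ["p1", "p2"], "Q": ["q1", "q2"], "R": ["r1"]}
messages = [("p1", "q2"), ("q2", "r1")]
hb = happened_before(processes, messages)

print(("p1", "r1") in hb)          # True, by transitivity through q2
print(concurrent("p2", "q1", hb))  # True, no causal path either way
```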
Message-related events
- Sending event
- Receiving event
- Message arrival (at kernel) and delivery (to user process): Kernel can control timing of delivery after arrival.
- Previous diagram shows only delivery times.
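Snapshots (taken at 2:00pm by local clocks)
[Figure: three snapshots (a), (b), (c) of processes A and B, where an amount of 100 is held by A, held by B, or in the channel. Local clocks disagree (labels 1:59pm and 2:01), so the recorded totals differ: (a) sum 100, (b) sum 0, (c) sum 200.]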
Census taking in ancient kingdom
[Figure: four villages connected by highways.]
- Want to take census counting all people, some of
whom may be traveling on highways.
Census taking algorithm
- Close all gates into/out of each village (process) and count people (record process state) in the village; these actions need not be synched with other villages.
- Open each outgoing gate and send an official with a red cap (special marker message).
- Open each incoming gate and count all travelers (record channel state = messages sent but not received yet) who arrive ahead of the official.
- Tally the counts from all villages.
Algorithm SNAPSHOT
- All processes are initially white. Messages sent by white (red) processes are also white (red).
- MSend: Marker sending rule for process P
- - Suspend all other activities until done
- - Record P's state
- - Turn red
- - Send one marker over each output channel of P.
- MReceive: Marker receiving rule for P
- - On receiving marker over channel C,
- - if P is white: Record state of channel C as empty, then invoke MSend
- - else record the state of C as the sequence of white messages received since P turned red.
- Stop when marker is received on each incoming channel.
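The marker rules can be tried out in a small single-threaded simulation. This is a sketch under stated assumptions, not the course's reference implementation: the `Process` and `Network` classes, the account-balance state, and the explicit `deliver` calls are all hypothetical, and channels are assumed to be FIFO.

```python
from collections import deque

MARKER = "MARKER"  # special marker message (the official with a red cap)


class Process:
    def __init__(self, balance):
        self.balance = balance    # live process state
        self.red = False          # white until its state has been recorded
        self.recorded = None      # recorded process state
        self.channel_state = {}   # incoming channel -> recorded white messages
        self.pending = set()      # incoming channels still awaiting a marker


class Network:
    def __init__(self, balances):
        self.procs = {name: Process(b) for name, b in balances.items()}
        # One FIFO channel per ordered pair of distinct processes.
        self.chan = {(s, d): deque()
                     for s in balances for d in balances if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def msend(self, name):
        """MSend: record state, turn red, send a marker on every output channel."""
        p = self.procs[name]
        p.recorded = p.balance
        p.red = True
        p.pending = {c for c in self.chan if c[1] == name}
        p.channel_state = {c: [] for c in p.pending}
        for c in self.chan:
            if c[0] == name:
                self.chan[c].append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel (src, dst), applying MReceive."""
        c = (src, dst)
        msg = self.chan[c].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if not p.red:
                self.msend(dst)   # channel c stays recorded as empty
            p.pending.discard(c)  # recording of channel c is finished
        else:
            p.balance += msg
            if c in p.pending:    # white message arriving after P turned red
                p.channel_state[c].append(msg)

    def snapshot_total(self):
        states = sum(p.recorded for p in self.procs.values())
        in_channels = sum(sum(msgs) for p in self.procs.values()
                          for msgs in p.channel_state.values())
        return states + in_channels


net = Network({"A": 100, "B": 0})
net.send("A", "B", 100)   # 100 is now in the channel
net.msend("B")            # B initiates the snapshot and records 0
net.deliver("A", "B")     # the 100 arrives ahead of A's marker
net.deliver("B", "A")     # A gets the marker, records 0, sends its own marker
net.deliver("A", "B")     # B gets A's marker and stops recording
print(net.snapshot_total())  # 100
```

In this run the 100 is counted exactly once, as the state of the channel from A to B, so the recorded total matches the true total.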
Property
- If the network is strongly connected and at least one process initiates MSend, then SNAPSHOT will take a consistent global snapshot (collection of process states and channel states).
Snapshots taken by SNAPSHOT algorithm
[Figure: snapshots (a) and (b) of A and B taken by SNAPSHOT. Values shown: 100, "100 in channel", 0, 0, 0. Both snapshots give sum 100 and are marked OK.]
Msgs arriving before marker constitute channel state.
Need not use time.
Snapshots taken by SNAPSHOT
[Figure: snapshots (c) and (d) of A and B with markers in the channels. (c) sum 200: "Cannot happen". (d) sum 100: "Will be like this".]
Cuts corresponding to snapshots
Note that they intersect.
[Figure: the cuts corresponding to snapshots (a) and (b) of A and B; both have sum 100.]
Cut is where past events end.
Cuts
- Cut C divides all events into P_C (those which happened in the past relative to C) and F_C (future events).
- Cut C is consistent if there is no message whose sending event is in F_C and whose receiving event is in P_C.
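The consistency condition can be checked mechanically. In the sketch below, a cut is represented as the number of events per process that lie in the past; this representation, the function names, and the sample message are assumptions made for illustration.

```python
def in_past(event, cut):
    """An event (process, index) is in the past if its index is within the cut."""
    proc, idx = event
    return idx <= cut[proc]


def is_consistent(cut, messages):
    """A cut is consistent if no message is received in the past but sent in the future."""
    return not any(in_past(recv, cut) and not in_past(send, cut)
                   for send, recv in messages)


# Hypothetical message: sent as the 2nd event of P, received as the 3rd event of Q.
messages = [(("P", 2), ("Q", 3))]

print(is_consistent({"P": 1, "Q": 3}, messages))  # False: received but not yet sent
print(is_consistent({"P": 2, "Q": 2}, messages))  # True: message is still in the channel
```

Note that a message sent in the past and received in the future is fine: it simply belongs to the channel state.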
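Progress shown by cuts
[Figure: space-time diagram of processes P (events p1 to p4) and Q (events q1 to q3) along a time axis, with numeric labels 1 to 8.]
There are $5 \times 4 = 20$ possible cuts (5 possible cut positions on P times 4 on Q).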
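Inconsistent cuts
[Figure: the same space-time diagram of P (events p1 to p4) and Q (events q1 to q3), highlighting inconsistent cuts.]
$2 \times 3 + 1 \times 3 = 9$ are inconsistent, and 11 are consistent.
Inconsistent cut cannot actually happen ⇒ States in an inconsistent cut could not have coexisted.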
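The counts above can be reproduced by brute-force enumeration. The slide's actual message arrows are not recoverable from the text, so the two messages below are a hypothetical pattern chosen only because it yields the same totals (20 cuts, 9 inconsistent, 11 consistent); the cut representation is also an assumption.

```python
from itertools import product


def is_consistent(cut, messages):
    """No message may be received in the past of the cut but sent in its future."""
    def in_past(event):
        proc, idx = event
        return idx <= cut[proc]
    return not any(in_past(recv) and not in_past(send)
                   for send, recv in messages)


# P has 4 events and Q has 3, so a cut keeps 0..4 events of P and 0..3 of Q.
event_counts = {"P": 4, "Q": 3}

# Hypothetical messages: p1 -> q1 and q3 -> p3.
messages = [(("P", 1), ("Q", 1)), (("Q", 3), ("P", 3))]

cuts = [dict(zip(event_counts, kept))
        for kept in product(*(range(n + 1) for n in event_counts.values()))]
consistent = [c for c in cuts if is_consistent(c, messages)]

print(len(cuts), len(consistent), len(cuts) - len(consistent))  # 20 11 9
```

With this pattern, the message q3 → p3 rules out 2 × 3 cuts and the message p1 → q1 rules out 1 × 3 more.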
More examples
[Figure: space-time diagram of processes P (events p1 to p4), Q (q1 to q5), and R (r1 to r4) with a message M, showing one inconsistent cut and one consistent cut.]
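State recorded by SNAPSHOT ⇒ consistent cut
[Figure: processes A and B with message M from A to B. Recorded: state of A before M was sent (100) and state of B after M was received (100); sum 200.]
Msg M goes from future to past ⇒ SNAPSHOT never generates such a cut (see snapshot (c), marked "Cannot happen", under "Snapshots taken by SNAPSHOT").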
More consistent cuts
[Figure: space-time diagram of processes P (events p1 to p4), Q (q1 to q5), and R (r1 to r4) showing further consistent cuts.]
Checkpointing
- Cut C is consistent ⇒ C doesn't contradict the sequence of events experienced by any site ⇒ can assume it did exist at the same time.
- Can use snapshot as checkpoint, from which activity in the distributed system can be resumed after a crash.