Title: Synchronization
1Synchronization
2Synchronization
- Multiple processes sometimes need to agree on the order of a sequence of events.
- This requires some synchronization, which is more elaborate in distributed systems.
- Synchronization may be based on time (absolute or relative) or on leader election.
- The aim is to make it global.
3Clock Synchronization
Time
- Execution of the Make utility in a distributed system: the source file is edited after the object file was created, yet it receives an earlier time stamp because the local clocks of the two machines disagree, so Make does not rebuild it.
- When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
4Physical Clocks (1)
- Computation of the mean solar day.
- The period of the Earth's rotation is not constant.
- Starting in 1958, International Atomic Time (TAI) was accepted. It counts the transitions of Cesium 133 in an average solar second (9,192,631,770 transitions = 1 second). One solar second is 1/86400 of a solar day, which is the interval between two consecutive sun peak times in the sky. TAI is averaged over 50 labs.
- The length of the solar day changes because of atmospheric drag and tidal friction.
5Physical Clocks (2)
- TAI seconds are of constant length, unlike solar seconds. A day of 86,400 TAI seconds is now about 3 msec shorter than a mean solar day, so a leap second is introduced whenever the discrepancy between TAI and solar time grows to 800 msec, to keep in phase with the sun. So far, since 1958, 30 leap seconds have been introduced.
- The resulting time scale is known as Universal Coordinated Time (UTC).
6Clock Synchronization Algorithms
- The relation between clock time and UTC when clocks tick at different rates.
- In a perfect world, C(t) = t on all machines, where t is UTC and C(t) is the value of the local clock. With modern timer chips, the relative error is about 10^-5.
- Two clocks need to be synchronized according to the maximum drift rate of each clock.
- If the difference between two clocks is to be limited to δ, then a resynchronization is required every δ/(2ρ) seconds, where ρ is the maximum drift rate. The factor 2ρ applies when the clocks drift in opposite directions.
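The resynchronization bound above can be checked with a few lines of Python. This is a minimal sketch: the function name and the sample values (1 msec tolerance, drift rate 10^-5) are illustrative assumptions, not taken from the slides.

```python
def resync_interval(delta: float, rho: float) -> float:
    """Longest allowed time between resynchronizations, in seconds.

    delta: largest tolerated difference between two clocks (seconds)
    rho:   maximum drift rate of each clock (seconds of error per second)
    """
    if delta <= 0 or rho <= 0:
        raise ValueError("delta and rho must be positive")
    # Two clocks drifting in opposite directions move apart at a rate of 2 * rho.
    return delta / (2 * rho)


# Keeping two clocks within 1 msec of each other when each may drift by 1e-5:
print(resync_interval(delta=1e-3, rho=1e-5))  # roughly 50 seconds
```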
7Cristian's Algorithm
- Getting the current time from a time server.
- The time should never be set to a smaller value, as this causes consistency problems. So a large discrepancy should be consumed slowly, by adjusting the number of msec added per clock interrupt.
- (T1-T0-I)/2 is the one-way propagation time, accounting for the server's request (interrupt) handling time I. Cristian suggests taking the average of the delays measured in the system.
- Note that the time server is passive.
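A small sketch of the two ideas on this slide: estimating the current time from one request/reply exchange, and consuming a discrepancy gradually instead of setting the clock back. The function names and the default values (10 msec per interrupt, 1 msec of adjustment) are assumptions chosen for illustration.

```python
def cristian_estimate(t0: float, t1: float, server_time: float,
                      handling_time: float = 0.0) -> float:
    """Estimate the current time at the moment the reply arrives.

    t0, t1:        local clock readings when the request left and the reply arrived
    server_time:   the time value carried in the server's reply
    handling_time: I, the time the server spent handling the request
    """
    one_way = (t1 - t0 - handling_time) / 2  # (T1 - T0 - I) / 2
    return server_time + one_way


def tick_increment(error_msec: float, normal_msec: float = 10.0,
                   slew_msec: float = 1.0) -> float:
    """Msec to add at the next clock interrupt.

    error_msec is (local time - reference time). The clock is never set
    backward: a fast clock simply advances more slowly for a while.
    """
    if error_msec > 0:
        return normal_msec - slew_msec
    if error_msec < 0:
        return normal_msec + slew_msec
    return normal_msec
```

With a 10 msec round trip and 2 msec of handling time, the reply is assumed to be 4 msec old when it arrives, so the estimate is the server's value plus 4 msec.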
8The Berkeley Algorithm: the time server is active and polls the clients.
- The time daemon sends its time and asks all the other machines for their clock discrepancy values.
- The answers from the machines are received and an average time discrepancy is computed, giving an adjustment for each computer.
- Then the time daemon tells everyone else how to adjust their clock.
- The daemon's time needs to be set periodically by the operator or by radio time servers.
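The averaging step can be sketched as follows. This is only an illustration under stated assumptions: the function name, the reserved key "daemon", and the sample clock values (in minutes) are hypothetical, and message delays are ignored.

```python
def berkeley_adjustments(daemon_time: float,
                         client_times: dict[str, float]) -> dict[str, float]:
    """Return the correction each machine, daemon included, should apply."""
    # Discrepancy of every clock relative to the daemon; the daemon's own is 0.
    offsets = {"daemon": 0.0}
    offsets.update({name: t - daemon_time for name, t in client_times.items()})
    average = sum(offsets.values()) / len(offsets)
    # Every machine moves to the average: correction = average - own offset.
    return {name: average - off for name, off in offsets.items()}


# Daemon at 3:00, one client at 3:25, another at 2:50 (minutes since midnight).
print(berkeley_adjustments(180.0, {"a": 205.0, "b": 170.0}))
```

In this example the daemon advances by 5 minutes, client "a" falls back by 20 (in practice by slowing down), and client "b" advances by 15, so all three meet at 3:05.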
9Distributed Clock synchronization
- Cristian's and Berkeley's algorithms are centralized.
- In decentralized distributed algorithms, every machine periodically broadcasts its time and collects the time from its peers.
- Every peer reaches a conclusion about the average time by running the same algorithm in a distributed fashion, taking the communication latencies into account.
- In the Internet, the so-called Network Time Protocol (NTP) is used, which is assumed to achieve 1-50 msec accuracy.
10Network Time Protocol-NTP
- RFC 1305 defines NTP.
- Recent implementations provide accuracy of up to 1 microsecond.
- It is designed to execute on top of IP and UDP.
- NTP is organized into multiple tree structures, with primary servers at the root and secondary servers at the internal nodes.
- NTP design goals: accurate UTC synchronization, survival despite losses of connectivity, frequent resynchronization, protection against malicious interference.
- NTP communicates clock offset (difference between two clocks), round-trip delay, and dispersion (maximum error).
- A statistical technique is used, based on multiple comparisons of the timing information exchanged.
- It may operate in three modes: multicast, client/server, symmetric.
- SNTP (Simple NTP) is also defined, in RFC 1769, with no fault tolerance.
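The clock offset and round-trip delay mentioned above are commonly derived from four time stamps of one request/reply exchange. The sketch below uses the standard formulas; the function name and the sample values are assumptions for illustration, and the offset is exact only when the two one-way delays are equal.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Offset of the server clock relative to the client, and round-trip delay.

    t1: client clock when the request is sent
    t2: server clock when the request is received
    t3: server clock when the reply is sent
    t4: client clock when the reply is received
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)  # time on the wire, server processing excluded
    return offset, delay


# Client 5 s behind the server, 0.5 s each way, 0.2 s of server processing.
print(ntp_offset_and_delay(10.0, 15.5, 15.7, 11.2))
```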
11Use of Synchronized clocks
- Used in the implementation of at-most-once message delivery.
- Every message is sent with a connection number and a time stamp.
- For each connection, the most recent time stamp is recorded.
- If the time stamp of a message on a connection is lower than the one recorded for that connection, the message is discarded.
- To remove old messages:
  - The server removes all the messages with time stamps older than G = CurrentTime - MaxLifeTime - MaxClockSkew.
  - MaxLifeTime is the maximum time a message can live in the system.
  - MaxClockSkew is the maximum distance of a clock from UTC.
- To recover from a crash, G needs to be written to the hard disk every ΔT, to be processed later, during the recovery phase.
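A minimal sketch of the server-side check described above. The class and method names are hypothetical, crash recovery is left out, and one detail is an assumption: a message whose time stamp equals the recorded one is also rejected, since it would be a duplicate.

```python
class AtMostOnceServer:
    """Rejects duplicate and stale messages using time stamps."""

    def __init__(self, max_lifetime: float, max_clock_skew: float):
        self.max_lifetime = max_lifetime
        self.max_clock_skew = max_clock_skew
        self.latest = {}  # connection number -> most recent time stamp seen

    def cutoff(self, now: float) -> float:
        # G = CurrentTime - MaxLifeTime - MaxClockSkew
        return now - self.max_lifetime - self.max_clock_skew

    def accept(self, conn: int, ts: float, now: float) -> bool:
        g = self.cutoff(now)
        # Forget records older than G; no live message can be that old.
        self.latest = {c: t for c, t in self.latest.items() if t >= g}
        if ts < g:
            return False  # too old to be told apart from a duplicate
        if ts <= self.latest.get(conn, float("-inf")):
            return False  # not newer than what was already delivered
        self.latest[conn] = ts
        return True
```

The table of recorded time stamps stays small because anything older than G can be dropped without risking a second delivery.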
12Coordinator or Leader Election Algorithms
- Bully Algorithm
  - A process holds an election for the coordinator if it thinks the coordinator has failed.
  - It sends an election message to all the processes with higher id numbers.
  - If no one responds, the process declares itself coordinator.
  - If one of the higher-ups answers, it withdraws from the contest.
- Ring Algorithm
  - The processes are logically or physically ordered.
  - The process detecting the missing coordinator sends an election message down the ring. The message collects the numbers of the processes it passes; when it comes back to the sender, the highest-numbered process is announced as the coordinator.
13The Bully Algorithm (1)
- The bully election algorithm
- Process 4 holds an election
- Processes 5 and 6 respond, telling 4 to stop
- Now 5 and 6 each hold an election
14The Bully Algorithm (2)
- Process 6 tells 5 to stop
- Process 6 wins and tells everyone
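The election walked through on these two slides can be simulated in a few lines. This is a sketch under simplifying assumptions: the function name is hypothetical, message passing is replaced by recursion, and every live process is assumed to answer in time.

```python
def bully_election(initiator: int, alive: set[int]) -> int:
    """Return the id of the coordinator chosen when `initiator` holds an election."""
    if initiator not in alive:
        raise ValueError("a crashed process cannot hold an election")
    # ELECTION goes to every process with a higher id; only live ones answer.
    higher = sorted(p for p in alive if p > initiator)
    if not higher:
        return initiator  # nobody answered: the initiator wins
    # Each responder tells the initiator to stop and holds its own election.
    return max(bully_election(p, alive) for p in higher)


# Processes 0..7, the old coordinator 7 has crashed, process 4 notices.
print(bully_election(4, {0, 1, 2, 3, 4, 5, 6}))
```

Whoever starts the election, the live process with the highest id ends up as coordinator, which is where the algorithm gets its name.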
15A Ring Algorithm
- Election algorithm using a ring. Both 5 and 2 detect the failure of the coordinator at about the same time. Both messages make a full trip around the ring.
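A sketch of one election round on a ring, assuming the variant in which the election message collects the ids of the live processes it passes and the highest id wins. The function name and the ring layout are illustrative assumptions.

```python
def ring_election(ring: list[int], alive: set[int], initiator: int) -> int:
    """Circulate one ELECTION message around the ring and return the winner."""
    if initiator not in alive:
        raise ValueError("the initiator must be alive")
    n = len(ring)
    collected = [initiator]
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:  # crashed members are skipped
            collected.append(ring[i])
        i = (i + 1) % n
    # The message is back at the sender: the highest id becomes coordinator.
    return max(collected)


ring = [0, 1, 2, 3, 4, 5, 6, 7]
alive = {0, 1, 2, 3, 4, 5, 6}  # coordinator 7 has crashed
print(ring_election(ring, alive, 5), ring_election(ring, alive, 2))
```

Two simultaneous elections, as started by 5 and 2 on the slide, cost extra messages but reach the same result.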
16Mutual Exclusion
- Mutual exclusion means that critical sections are executed one at a time.
- In centralized systems this is achieved using semaphores, monitors, and similar constructs.
- How can mutual exclusion be established in distributed systems?
  - Centralized approach
  - Distributed approach
17Mutual Exclusion: A Centralized Algorithm
- Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
- Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
- When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
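The coordinator's behavior in these three steps can be sketched as a small class. The names are hypothetical, and "no reply" is modeled by returning False and queueing the request.

```python
from collections import deque


class Coordinator:
    """Grants a single critical region to one process at a time."""

    def __init__(self):
        self.holder = None      # process currently inside the critical region
        self.waiting = deque()  # queued requests, oldest first

    def request(self, pid: int) -> bool:
        """Return True if permission is granted now, False if no reply is sent yet."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid: int):
        """Called on exit; returns the process that now gets the reply, or None."""
        if pid != self.holder:
            raise ValueError("only the current holder can release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


c = Coordinator()
print(c.request(1), c.request(2), c.release(1))  # True False 2
```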
18MX: A Distributed Algorithm
- Two processes want to enter the same critical region (CR) at the same moment. Processes 0 and 2 contend for the CR, so each sends a time-stamped request for access to the resource to everyone else.
- Process 0 has the lowest timestamp, so it wins.
- When process 0 is done, it sends an OK as well, so 2 can now enter the critical region.
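The rule each process applies to an incoming request can be written as one small function. This is a sketch: the state names, the (timestamp, process id) tuples, and the sample timestamps 8 and 12 are assumptions for illustration; ties on the timestamp are broken by process id.

```python
def on_request(my_state: str, my_request, incoming) -> str:
    """Decide how a process reacts to an incoming request for the CR.

    my_state:   "RELEASED" (not interested), "WANTED" or "HELD"
    my_request: (timestamp, process_id) of this process's own pending request
    incoming:   (timestamp, process_id) of the request just received
    Returns "OK" to reply at once, or "QUEUE" to defer the reply.
    """
    if my_state == "HELD":
        return "QUEUE"
    if my_state == "WANTED" and my_request < incoming:
        return "QUEUE"  # own request is older, so it has priority
    return "OK"


# Process 0 requested at time 8, process 2 at time 12; process 1 is idle.
print(on_request("WANTED", (8, 0), (12, 2)))    # process 0 defers its reply to 2
print(on_request("WANTED", (12, 2), (8, 0)))    # process 2 lets 0 go first
print(on_request("RELEASED", None, (8, 0)))     # process 1 just says OK
```

A process enters the CR once every other process has answered OK, and on leaving it sends OK to every request it queued.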
19MX: A Token Ring Algorithm
- An unordered group of processes on a network, logically numbered.
- A logical ring is constructed in software, where a token is released by one of the nodes, initially node 0.
- Token loss must be handled properly, with a token generation algorithm.
- Node failure must be handled too.
20Comparison: number of messages per process to enter/exit a critical region
- A comparison of three mutual exclusion algorithms for n nodes, regarding complexity and failure or loss situations.
21The Transaction Model
- The transaction model is an all-or-nothing model.
- An analogy can be made with the discussions held for a project before a contract is signed. Until the contract is signed, any party can withdraw with no harm.
- Programming with txs requires special primitives supplied by the OS, the language, or a middleware. The exact list of primitives may differ between applications or system environments.
22The Transaction Model (1)
- Updating a daily master inventory tape is fault tolerant. If something goes wrong, everything is redone from the beginning, i.e. the tapes are rewound and the process is restarted: all or nothing.
23The Transaction Model (2)
- Typical examples of primitives for transactions. Either everything or nothing between the begin and the end is executed.
24The Transaction Model (3): reserving flight seats from NY to Malindi in Kenya (capital city Nairobi)
- The transaction to reserve three flights commits, as three different operations.
- The transaction aborts when the third flight is unavailable during the same booking, as if nothing had happened.
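The all-or-nothing behavior of the booking can be sketched as follows. The function name and the leg names are hypothetical; tentative updates are made on a private copy so that an abort leaves the real seat counts untouched.

```python
def book_trip(seats: dict[str, int], legs: list[str]) -> bool:
    """Reserve one seat on every leg, or on none of them.

    seats maps a flight leg to its number of free seats.
    Returns True on commit, False on abort.
    """
    workspace = dict(seats)      # tentative updates go to a private copy
    for leg in legs:
        if workspace.get(leg, 0) == 0:
            return False         # abort: `seats` was never touched
        workspace[leg] -= 1
    seats.update(workspace)      # commit: publish all changes together
    return True


seats = {"leg1": 1, "leg2": 1, "leg3": 0}
print(book_trip(seats, ["leg1", "leg2", "leg3"]), seats)
```

Because the third leg is full, the call returns False and the seats on the first two legs are still free afterwards.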
25The Transaction Model (4): transaction properties
- Atomicity: indivisibility of the tx
- Consistency: no violation of the invariants
- Isolation: no interference between concurrent txs
- Durability: changes are made permanent once committed
- Together these are the ACID properties of txs
26Classification of Txs
- Flat Txs: the txs with ACID properties discussed so far; not practical for most distributed tx applications.
- Nested Txs: a number of logically related, complementary sub-transactions form one nested tx. One problem is the level at which ACID applies: if the top-level parent aborts, every child that has already finished must be undone. When a child commits, its universe becomes the universe of the parent.
- Distributed Txs: a flat, indivisible tx that operates on data distributed across multiple computers.
27Nested and Distributed Transactions
- A nested transaction
- A distributed transaction
28Implementation
- How to implement the all-or-nothing principle in the case of distributed txs?
- Private workspace: implemented so that individual updates can be undone without affecting the original data, depending on commit/abort.
- Writeahead log: a log of changes is created throughout execution, so that commit/abort can be taken care of.
29Private Workspace
- The file index and disk blocks for a three-block file
- The situation after a transaction has modified block 0 and appended block 3
- After committing
30Writeahead Log
- a) An example transaction that changes x and y
- b) - d) The log before each statement is executed. The first value is the one before the change, the second value is the one after the change.
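A minimal sketch of a writeahead log that records the old and new value of every change before applying it, so that an abort can roll back in reverse order. The class name and the sample transaction (x and y starting at 0) are assumptions chosen for illustration.

```python
class WriteAheadLog:
    """Updates data in place, logging (name, old, new) before each change."""

    def __init__(self, data: dict):
        self.data = data
        self.log = []

    def write(self, name: str, new) -> None:
        self.log.append((name, self.data[name], new))  # log first
        self.data[name] = new                           # then change the data

    def commit(self) -> None:
        self.log.clear()  # the changes are already in place

    def abort(self) -> None:
        while self.log:   # undo the newest change first
            name, old, _ = self.log.pop()
            self.data[name] = old


data = {"x": 0, "y": 0}
tx = WriteAheadLog(data)
tx.write("x", data["x"] + 1)
tx.write("y", data["y"] + 2)
tx.write("x", data["y"] * data["y"])
print(tx.log)
tx.abort()
print(data)
```

After the three writes the log holds one entry per statement, and aborting restores both variables to their starting values.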
31Concurrency Control (1)
- General organization of managers for handling transactions. The top level ensures atomicity, the middle level ensures consistency, and the bottom level carries out the execution.
32Concurrency Control (2)
- General organization of managers for handling
distributed transactions.
33Serializability
- The final result of concurrent tx execution should be the same for different runs, as if the txs were executed sequentially.
- Concurrency control algorithms should synchronize tx executions.
- a) - c) Three transactions T1, T2, and T3
- d) Possible schedules
34Concurrency Control Methods
- Two-phase locking
- Pessimistic time-stamp ordering
- Optimistic time-stamp ordering
35Two-phase locking-2PL-1
- Acquire all the locks during the growing phase, release them during the shrinking phase.
- On a conflict, the operation is delayed.
- A lock is never released before the operation on the data for which the lock is set is complete.
- Once a lock is released on behalf of a transaction, no other lock can be granted to the same transaction.
- In strict 2PL, all the acquired resources are released at the same time. This avoids cascaded aborts.
- 2PL can easily cause deadlocks to happen.
- Centralized and distributed versions of 2PL are possible.
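The locking rules above can be sketched with a shared lock table. The class and method names are hypothetical, only exclusive locks are modeled, and a delayed operation is represented by a False return value rather than by blocking.

```python
class TwoPhaseTx:
    """One transaction obeying two-phase locking on a shared lock table."""

    def __init__(self, name: str, lock_table: dict):
        self.name = name
        self.table = lock_table  # item -> name of the transaction holding it
        self.held = set()
        self.shrinking = False

    def lock(self, item: str) -> bool:
        """Growing phase: True if granted, False if the operation must wait."""
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        owner = self.table.get(item)
        if owner is not None and owner != self.name:
            return False  # conflict: the operation is delayed
        self.table[item] = self.name
        self.held.add(item)
        return True

    def unlock(self, item: str) -> None:
        """The shrinking phase starts with the first release."""
        self.shrinking = True
        self.held.discard(item)
        if self.table.get(item) == self.name:
            del self.table[item]

    def release_all(self) -> None:
        """Strict 2PL: give up every lock at once, at commit or abort."""
        for item in list(self.held):
            self.unlock(item)
```

A transaction that calls only `lock` and then `release_all` follows strict 2PL; one that mixes `unlock` and later `lock` calls is stopped by the RuntimeError.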
36Two-Phase Locking (2)
37Two-Phase Locking (3)
- Strict two-phase locking.
38Pessimistic time-stamp ordering-1
- Every operation of a Tx is time-stamped with ts by an appropriate algorithm (Lamport's algorithm).
- Every data item in the system is time-stamped for the last read (tsR) and last write (tsW) transaction operations.
- If two operations on a data item x conflict, the data manager grants the operation to the Tx with the earlier ts.
39Pessimistic time-stamp ordering-2
- Read operation of a Tx with time stamp ts
  - If ts < tsW, abort the Tx
  - If ts > tsW, allow execution and set tsR to max(ts, tsR)
- Write operation of a Tx with time stamp ts
  - If ts < tsR, abort the Tx
  - If ts > tsR, allow execution and set tsW to max(ts, tsW)
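The read and write rules above translate almost directly into code. This is a sketch: the class and function names are hypothetical, and the case of equal time stamps, which the rules leave open, is treated here as allowed.

```python
class TxAborted(Exception):
    """Raised when an operation arrives too late and its Tx must abort."""


class DataItem:
    def __init__(self, value=None):
        self.value = value
        self.ts_r = 0  # tsR: time stamp of the last read
        self.ts_w = 0  # tsW: time stamp of the last write


def read(item: DataItem, ts: int):
    if ts < item.ts_w:
        raise TxAborted("a younger Tx has already written this item")
    item.ts_r = max(ts, item.ts_r)
    return item.value


def write(item: DataItem, ts: int, value) -> None:
    if ts < item.ts_r:
        raise TxAborted("a younger Tx has already read this item")
    item.ts_w = max(ts, item.ts_w)
    item.value = value


x = DataItem(0)
write(x, ts=5, value=1)
print(read(x, ts=7), x.ts_r, x.ts_w)  # 1 7 5
```

A later `write(x, ts=6, value=2)` would raise TxAborted, because the item has already been read by a Tx with time stamp 7.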
40Pessimistic Timestamp Ordering-3
- Concurrency control using timestamps.
41Optimistic time-stamp ordering
- Go ahead and do whatever you want; if there is a conflict at commit time, handle it then. If conflicts are rare, most commits take place without any problem.
- This requires recording all read and write ts on the data items, to check at commit time whether any of the items have been changed in the meantime.
- Abort if a change is detected, commit otherwise.
- This scheme has not been researched much for distributed systems.
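The commit-time check can be sketched as below. Note the swapped-in technique: per-item version counters stand in for the read and write time stamps named on the slide. All class and method names are hypothetical.

```python
class Store:
    def __init__(self, data: dict):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # bumped on every committed write


class OptimisticTx:
    """Works on private copies and validates only when it tries to commit."""

    def __init__(self, store: Store):
        self.store = store
        self.seen = {}    # item -> version observed when first touched
        self.writes = {}  # tentative new values

    def read(self, key):
        self.seen.setdefault(key, self.store.version[key])
        return self.writes.get(key, self.store.data[key])

    def write(self, key, value) -> None:
        self.seen.setdefault(key, self.store.version[key])
        self.writes[key] = value

    def commit(self) -> bool:
        # Validation: abort if any touched item changed since it was first seen.
        if any(self.store.version[k] != v for k, v in self.seen.items()):
            return False
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] += 1
        return True


store = Store({"x": 10})
t1, t2 = OptimisticTx(store), OptimisticTx(store)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 5)
print(t1.commit(), t2.commit(), store.data)  # True False {'x': 11}
```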
42Snapshot Protocols
- Snapshot Protocol 2
  - Process p0 sends "take snapshot at ω" to all processes and then sets its clock to ω.
  - When its logical clock (LC) reaches ω, pi
    - records its local state σi and immediately
    - sends an empty message along each outgoing channel;
    - starts recording messages received over each of its incoming channels.
  - pi stops recording messages from pj the first time a message with TS > ω is received from pj; pi declares the messages recorded from pj as χji.
- Instead of using a "take snapshot at ω" message, a process can record its state the first time it receives a special empty message serving as a tag message. This is Protocol 3.
43Supplementary for Mullender's book
- Snapshot Protocol 2 (already covered)
45Properties of Snapshots
- Any state constructed by the distributed snapshot algorithm is guaranteed to be consistent. However, the actual run may not pass through the constructed state; the relation stated for the constructed state nevertheless holds in general.
- The order of two events in a run can be swapped to put pre-recording events before post-recording events.
46Properties of Global Predicates
- Stability criterion for a predicate: once the predicate becomes true, it remains true (figure 4.16).