Distributed Systems Principles and Paradigms - PowerPoint PPT Presentation

1
Distributed Systems Principles and Paradigms
Chapter 05: Synchronization
2
Communication and Synchronization
  • Why do processes communicate in DS?
  • To exchange messages
  • To synchronize processes
  • Why do processes synchronize in DS?
  • To coordinate access of shared resources
  • To order events

3
Time, Clocks and Clock Synchronization
  • Time
  • Why is time important in DS?
  • E.g. UNIX make utility (see Fig. 5-1)
  • Clocks (Timer)
  • Physical clocks
  • Logical clocks (introduced by Leslie Lamport)
  • Vector clocks (introduced by Colin Fidge)
  • Clock Synchronization
  • How do we synchronize clocks with real-world
    time?
  • How do we synchronize clocks with each other?

4
Physical Clocks (1/3)
  • Problem: clock skew. Clocks gradually get out of sync and give different values.
  • Solution: Universal Coordinated Time (UTC)
  • Formerly called GMT (Greenwich Mean Time)
  • Based on the number of transitions per second of the cesium 133 atom (very accurate).
  • At present, the real time is taken as the average of some 50 cesium clocks around the world: International Atomic Time (TAI).
  • A leap second is introduced from time to time to compensate for the fact that days are getting longer.
  • UTC is broadcast through shortwave radio (with an accuracy of ±1 msec) and satellite (Geostationary Environment Operational Satellite, GEOS, with an accuracy of ±0.5 msec).
  • Question: does this solve all our problems? Don't we now have some global timing mechanism?

5
Physical Clocks (2/3)
  • Problem: suppose we have a distributed system with a UTC receiver somewhere in it; we still have to distribute its time to each machine.
  • Basic principle:
  • Every machine has a timer that generates an interrupt H (typically 60) times per second.
  • There is a clock in machine p that ticks on each timer interrupt. Denote the value of that clock by Cp(t), where t is UTC time.
  • Ideally, we have that for each machine p, Cp(t) = t; in other words, dC/dt = 1.
  • Theoretically, a timer with H = 60 should generate 216,000 ticks per hour.
  • In practice, the relative error of modern timer chips is about 10^-5 (i.e., between 215,998 and 216,002 ticks per hour).

6
Physical Clocks (3/3)
A timer with maximum drift rate r satisfies 1 - r <= dC/dt <= 1 + r, so two clocks can drift apart at a rate of at most 2r.
Goal: never let two clocks in any system differ by more than d time units => synchronize at least every d/2r seconds.
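
For example, taking the timer-chip drift rate quoted above, r = 10^-5, and a required maximum skew of d = 1 second (the value of d here is only an illustration), machines must resynchronize at least every d/2r = 1/(2 × 10^-5) = 50,000 seconds, i.e., roughly every 14 hours.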
7
Clock Synchronization Principles
  • Principle I: every machine asks a time server for the accurate time at least once every d/2r seconds (see Fig. 5-5).
  • But you need an accurate measure of the round-trip delay, including interrupt handling and processing of incoming messages; a sketch follows below.
  • Principle II: let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time.
  • OK, you'll probably get every machine in sync. Note that you don't even need to propagate UTC time (why not?)
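
A minimal Python sketch of Principle I, in the style of Cristian's algorithm. The callable request_server_time is a hypothetical stand-in for the actual network request to the time server:

    import time

    def estimate_server_time(request_server_time):
        t0 = time.monotonic()                # local time: request sent
        server_time = request_server_time()  # server's (UTC) clock value
        t1 = time.monotonic()                # local time: reply received
        round_trip = t1 - t0
        # Assume the reply spent half the round trip in transit.
        return server_time + round_trip / 2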

8
Clock Synchronization Algorithms
  • The Berkeley Algorithm
  • The time server periodically polls every machine for its time
  • The received times are averaged, and each machine is notified of the amount by which it should adjust its time
  • Centralized algorithm; see Figure 5-6 and the sketch below
  • Decentralized Algorithm
  • Every machine broadcasts its time periodically, at the start of a fixed-length resynchronization interval
  • Each machine averages the values from all other machines (or averages after discarding the highest and lowest values)
  • Network Time Protocol (NTP)
  • the most popular one, used by machines on the Internet
  • uses an algorithm that combines centralized and distributed techniques
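
A sketch of the Berkeley averaging step, assuming the server has already gathered (round-trip-compensated) clock values from the machines it polled; the data shapes are assumptions for illustration:

    def berkeley_adjustments(server_time, reported_times):
        # reported_times: {machine_id: clock value it reported}
        values = [server_time] + list(reported_times.values())
        average = sum(values) / len(values)
        # Each machine receives an adjustment, not an absolute time.
        return {m: average - t for m, t in reported_times.items()}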

9
Network Time Protocol (NTP)
  • a protocol for synchronizing the clocks of computers over packet-switched, variable-latency data networks (i.e., the Internet)
  • NTP uses UDP port 123 as its transport. It is designed particularly to resist the effects of variable latency (see the sketch below)
  • NTPv4 can usually maintain time to within 10 milliseconds (1/100 s) over the public Internet, and can achieve accuracies of 200 microseconds (1/5000 s) or better in local area networks under ideal conditions
  • visit the following URL to understand NTP in more detail:
  • http://en.wikipedia.org/wiki/Network_Time_Protocol
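
The standard four-timestamp calculation that lets NTP cancel out symmetric network latency (a general sketch, not code from these slides):

    def ntp_offset_delay(t1, t2, t3, t4):
        # t1: client send, t2: server receive,
        # t3: server send,  t4: client receive (all in seconds)
        offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated clock offset
        delay = (t4 - t1) - (t3 - t2)          # round-trip network delay
        return offset, delay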

10
The Happened-Before Relationship
  • Problem: we first need to introduce a notion of ordering before we can order anything.
  • The happened-before relation on the set of events in a distributed system is the smallest relation satisfying:
  • If a and b are two events in the same process, and a comes before b, then a → b (a happened before b).
  • If a is the sending of a message, and b is the receipt of that message, then a → b.
  • If a → b and b → c, then a → c (transitive relation).
  • Note: if two events, x and y, happen in different processes that do not exchange messages, then they are said to be concurrent.
  • Note: this introduces a partial ordering of events in a system with concurrently operating processes.

11
Logical Clocks (1/2)
Problem: how do we maintain a global view of the system's behavior that is consistent with the happened-before relation?
Solution: attach a timestamp C(e) to each event e, satisfying the following properties:
P1: if a and b are two events in the same process, and a → b, then we demand that C(a) < C(b).
P2: if a corresponds to sending a message m, and b to the receipt of that message, then also C(a) < C(b).
Problem: how do we attach a timestamp to an event when there's no global clock? => maintain a consistent set of logical clocks, one per process.
12
Logical Clocks (2/2)
Each process Pi maintains a local counter Ci and adjusts this counter according to the following rules:
(1) For any two successive events that take place within Pi, Ci is incremented by 1.
(2) Each time a message m is sent by process Pi, the message receives a timestamp Tm = Ci.
(3) Whenever a message m is received by a process Pj, Pj adjusts its local counter Cj to max(Cj, Tm) + 1.
Property P1 is satisfied by (1); property P2 by (2) and (3).
This is called Lamport's Algorithm; a sketch follows below.
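
A minimal Python sketch of the three rules (one clock object per process; message plumbing omitted):

    class LamportClock:
        def __init__(self):
            self.c = 0

        def tick(self):                  # rule (1): any local event
            self.c += 1
            return self.c

        def send(self):                  # rule (2): timestamp Tm = Ci
            return self.tick()

        def receive(self, tm):           # rule (3): Cj = max(Cj, Tm) + 1
            self.c = max(self.c, tm) + 1
            return self.c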
13
Logical Clocks Example
Fig 5-7. (a) Three processes, each with its own clock. The clocks run at different rates. (b) Lamport's algorithm corrects the clocks.
14
  • Assign Lamport's logical clock values for all the events in the above timing diagram. Assume that each process's local clock is set to 0 initially.

15
  • From the above timing diagram, what can you say about the following events?
  • between a and b: a → b
  • between b and f: b → f
  • between e and k: concurrent
  • between c and h: concurrent
  • between k and h: k → h

16
Total Ordering with Logical Clocks
Problem: it can still occur that two events happen at the same time. Avoid this by attaching a process number to an event: Pi timestamps event e with Ci(e).i.
Then Ci(a).i happened before Cj(b).j if and only if:
(1) Ci(a) < Cj(b); or
(2) Ci(a) = Cj(b) and i < j.
A comparison sketch follows below.
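
A one-line sketch of this tie-breaking comparison in Python:

    def totally_ordered_before(c_a, i, c_b, j):
        # Event a at process i precedes event b at process j iff
        # (1) Ci(a) < Cj(b), or (2) Ci(a) == Cj(b) and i < j.
        return c_a < c_b or (c_a == c_b and i < j)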
17
Example Totally-Ordered Multicast (1/2)
  • Problem: we sometimes need to guarantee that concurrent updates on a replicated database are seen in the same order everywhere:
  • Process P1 adds $100 to an account (initial value: $1000)
  • Process P2 increments the account by 1%
  • There are two replicas

Outcome: in the absence of proper synchronization, replica #1 will end up with $1111, while replica #2 ends up with $1110.
18
Example Totally-Ordered Multicast (2/2)
  • Process Pi sends timestamped message msg_i to all others. The message itself is put in a local queue queue_i.
  • Any incoming message at Pj is queued in queue_j, according to its timestamp.
  • Pj passes a message msg_i to its application only if:
  • (1) msg_i is at the head of queue_j, and
  • (2) for each process Pk, there is a message msg_k in queue_j with a larger timestamp.
  • Note: we are assuming that communication is reliable and FIFO-ordered. A delivery-test sketch follows below.
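
A sketch of Pj's delivery test under these assumptions; the data shapes (a queue of (timestamp, pid, msg) tuples kept sorted by (timestamp, pid), plus the set of all process ids) are illustrative, not from the slides:

    def try_deliver(queue, all_pids):
        if not queue:
            return None
        ts, pid, msg = queue[0]
        for p in all_pids:
            if p == pid:
                continue
            # condition (2): some queued message from p with a
            # larger timestamp (so, with reliable FIFO channels,
            # nothing earlier can still arrive from p)
            if not any(t > ts for t, q, _ in queue if q == p):
                return None
        return queue.pop(0)        # condition (1): head of the queue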

19
  • Fidge's Logical Clocks
  • with Lamport's clocks, one cannot directly compare the timestamps of two events to determine their precedence relationship
  • - if a → b, then C(a) < C(b)
  • - if C(a) < C(b), it could be that a → b or that a and b are concurrent
  • - e.g., events e and b in the previous example figure:
  • C(e) = 1 and C(b) = 2
  • thus C(e) < C(b), but e and b are concurrent
  • the main problem is that a simple integer clock cannot order both events within a process and events in different processes
  • Colin Fidge developed an algorithm that overcomes this problem
  • Fidge's clock is represented as a vector (c1, c2, ..., cn) with an integer clock value for each process (ci contains the clock value of process i)

20
  • Fidge's Algorithm
  • Fidge's logical clock is maintained as follows:
  • (1) Initially, all clock values are set to the smallest value (e.g., 0).
  • (2) The local clock value is incremented at least once before each primitive event in a process.
  • (3) The current value of the entire logical clock vector is delivered to the receiver with every outgoing message.
  • (4) Values in the timestamp vectors are never decremented.
  • (5) Upon receiving a message, the receiver sets the value of each entry in its local timestamp vector to the maximum of the two corresponding values in the local vector and in the remote vector received.
  • The element corresponding to the sender is a special case: it is set to one greater than the value received, but only if the local value is not greater than that received.

21
  • Upon receiving a message carrying r_vector from process q:

    if l_vector[q] <= r_vector[q] then
        l_vector[q] := r_vector[q] + 1
    for i := 1 to n do
        l_vector[i] := max(l_vector[i], r_vector[i])
  • Timestamps attached to events are compared as follows:
  • e_p → f_q iff T(e_p)[p] < T(f_q)[p]
  • (where e_p represents an event e occurring in process p, T(e_p) represents the timestamp vector of the event e_p, and the ith element of T(e_p) is denoted T(e_p)[i].)
  • This means event e_p happened before event f_q if and only if process q received a direct or indirect message from p and that message was sent after e_p had occurred. If e_p and f_q are in the same process (i.e., p = q), the local elements of their timestamps represent their order of occurrence within the process. A Python sketch follows below.
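
A Python sketch of Fidge's vector clock and the comparison rule above (0-indexed processes for convenience; the slides index from 1):

    class FidgeClock:
        def __init__(self, pid, n):
            self.pid = pid
            self.v = [0] * n                 # rule (1): smallest values

        def tick(self):                      # rule (2): before each event
            self.v[self.pid] += 1

        def send(self):                      # rule (3): the vector travels
            self.tick()
            return list(self.v)              # with the outgoing message

        def receive(self, r_vector, sender):
            # special case: sender's entry becomes one greater than
            # received, but only if the local value is not greater
            if self.v[sender] <= r_vector[sender]:
                self.v[sender] = r_vector[sender] + 1
            # rule (5): component-wise maximum
            self.v = [max(a, b) for a, b in zip(self.v, r_vector)]

    def happened_before(t_e, p, t_f):
        # e (at process p) -> f  iff  T(e)[p] < T(f)[p]
        return t_e[p] < t_f[p]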

22
  • Assign Lamport's and Fidge's logical clock values for all the events in the above timing diagram. Assume that each process's logical clock is set to 0 initially.

23
[Timing diagram with three processes P1, P2, P3; events not reproduced]
24
  • The above diagram shows both Lamport timestamps (an integer value) and Fidge timestamps (a vector of integer values) for each event.
  • Lamport clocks:
  • 2 < 5 since b → h,
  • 3 < 4, but c and g are concurrent.
  • Fidge clocks:
  • f → h since 2 < 4 is true,
  • b → h since 2 < 3 is true,
  • h did not happen before a, since 4 < 0 is false,
  • c and h are concurrent, since (3 < 3) is false and (4 < 0) is false.

25
[Timing diagram with four processes P1-P4 and events a through o; not reproduced]
  • Assign Lamport's and Fidge's logical clock values for all the events in the above timing diagram. Assume that each process's logical clock is set to 0 initially.

26
  • From the above timing diagram, what can you say
    about the following events?
  • between b and n
  • between b and o
  • between m and g
  • between c and h
  • between c and l
  • between j and g
  • between k and i
  • between j and h

27
  • READING Reference:
  • Colin Fidge, "Logical Time in Distributed Computing Systems," IEEE Computer, Vol. 24, No. 8, pp. 28-33, August 1991.

28
Global State (1/3)
Basic idea: sometimes you want to collect the current state of a distributed computation, called a distributed snapshot. It consists of all local states and messages in transit.
Important: a distributed snapshot should reflect a consistent state.
29
Global State (2/3)
  • Note: any process P can initiate taking a distributed snapshot
  • P starts by recording its own local state
  • P subsequently sends a marker along each of its outgoing channels
  • When Q receives a marker through channel C, its action depends on whether it has already recorded its local state:
  • Not yet recorded: it records its local state, and sends the marker along each of its outgoing channels
  • Already recorded: the marker on C indicates that the channel's state should be recorded: all messages received on C between the time Q recorded its own state and the arrival of this marker
  • Q is finished when it has received a marker along each of its incoming channels; a sketch follows below
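
A compact Python sketch of these marker rules (the Chandy-Lamport snapshot algorithm); MARKER and the send_on primitive are assumed plumbing, not part of the slides:

    MARKER = object()                        # assumed marker message

    class SnapshotProcess:
        def __init__(self, incoming, outgoing, send_on):
            self.incoming = incoming         # incoming channel names
            self.outgoing = outgoing         # outgoing channel names
            self.send_on = send_on           # assumed send primitive
            self.local_snapshot = None
            self.channel_msgs = {}           # channel -> in-transit msgs
            self.open_channels = set()

        def start_snapshot(self, local_state):
            self.local_snapshot = local_state        # record own state
            self.open_channels = set(self.incoming)
            self.channel_msgs = {c: [] for c in self.incoming}
            for c in self.outgoing:                  # marker down every
                self.send_on(c, MARKER)              # outgoing channel

        def on_message(self, channel, msg, local_state):
            if msg is MARKER:
                if self.local_snapshot is None:      # first marker seen
                    self.start_snapshot(local_state)
                self.open_channels.discard(channel)
                return not self.open_channels        # True when finished
            if self.local_snapshot is not None and channel in self.open_channels:
                self.channel_msgs[channel].append(msg)   # record in transit
            return False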

30
Global State (3/3)
(a) Organization of a process and channels for a distributed snapshot. (b) Process Q receives a marker for the first time and records its local state. (c) Q records all incoming messages. (d) Q receives a marker on its incoming channel and finishes recording the state of the incoming channel.
31
Election Algorithms
Principle: many distributed algorithms require that some process act as a coordinator. The question is how to select this special process dynamically.
Note: in many systems, the coordinator is chosen by hand (e.g., file servers, DNS servers). This leads to centralized solutions => single point of failure.
Question: if a coordinator is chosen dynamically, to what extent can we speak of a centralized or a distributed solution?
Question: is a fully distributed solution, i.e., one without a coordinator, always more robust than any centralized/coordinated solution?
32
Election by Bullying (1/2)
  • Principle: each process has an associated priority (weight). The process with the highest priority should always be elected as the coordinator.
  • Issue: how do we find the heaviest process?
  • Any process can just start an election by sending an ELECTION message to all other processes (assuming you don't know the weights of the others).
  • If a process P_heavy receives an ELECTION message from a lighter process P_light, it sends a TAKE-OVER message to P_light. P_light is out of the race.
  • If a process doesn't get a TAKE-OVER message back, it wins, and sends a VICTORY message to all other processes. (A sketch follows below.)
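
A sketch of one process's view of the bully election; send and wait_for_takeover are assumed messaging helpers, and the timeout value is an arbitrary illustration:

    def bully_election(me, all_ids, send, wait_for_takeover):
        for p in all_ids:
            if p != me:
                send(p, ("ELECTION", me))
        # Any heavier process answers TAKE-OVER and runs its own election.
        if wait_for_takeover(timeout=1.0):
            return None                      # a heavier process took over
        for p in all_ids:
            if p != me:
                send(p, ("VICTORY", me))     # nobody objected: we win
        return me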

33
Election by Bullying (2/2)
Question: we're assuming something very important here. What?
Assumption: each process knows the process numbers of the other processes.
34
Election in a Ring
  • Principle: process priority is obtained by organizing processes into a (logical) ring. The process with the highest priority should be elected as coordinator.
  • Any process can start an election by sending an election message to its successor. If the successor is down, the message is passed on to the next successor.
  • Whenever the message is passed on, the sender adds itself to the list. When it gets back to the initiator, everyone has had a chance to make its presence known.
  • The initiator sends a coordinator message around the ring containing a list of all living processes. The one with the highest priority is elected as coordinator. See Figure 5-12 and the sketch below.
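
A sketch of one step of the ring election; send_successor is an assumed helper that skips crashed successors, and priority is taken to be the process id:

    def on_election(my_id, initiator, live_list, send_successor):
        if my_id == initiator:
            # The list came full circle: highest priority wins.
            coordinator = max(live_list)
            send_successor(("COORDINATOR", coordinator, live_list))
        else:
            # Register ourselves and pass the message along the ring.
            send_successor(("ELECTION", initiator, live_list + [my_id]))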

Question: does it matter if two processes initiate an election?
Question: what happens if a process crashes during the election?
35
Mutual Exclusion
  • Problem: a number of processes in a distributed system want exclusive access to some resource.
  • Basic solutions:
  • Via a centralized server.
  • Completely distributed, with no topology imposed.
  • Completely distributed, making use of a (logical) ring.
  • Centralized: really simple.

36
Mutual Exclusion: Ricart & Agrawala
  • Principle: the same as Lamport's, except that acknowledgments aren't sent. Instead, replies (i.e., grants) are sent only when:
  • The receiving process has no interest in the shared resource, or
  • The receiving process is waiting for the resource but has lower priority (known through comparison of timestamps).
  • In all other cases, the reply is deferred (see the algorithm on pg. 267 and the sketch below).
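
A sketch of the reply decision a process makes when a request arrives; the state names and the (timestamp, pid) request pairs are assumptions for illustration:

    def on_request(my_state, my_req, their_req):
        # my_req / their_req: (lamport_timestamp, pid) pairs;
        # my_state: 'RELEASED', 'WANTED', or 'HELD'
        if my_state == 'RELEASED':
            return 'REPLY'                   # no interest: grant at once
        if my_state == 'WANTED' and their_req < my_req:
            return 'REPLY'                   # requester has priority
        return 'DEFER'                       # answer after we release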

37
Mutual Exclusion Token Ring Algorithm
Essence: organize the processes in a logical ring, and let a token be passed between them. The one that holds the token is allowed to enter the critical region (if it wants to).
38
Distributed Transactions
  • The transaction model
  • Classification of transactions
  • Concurrency control

39
The Transaction Model (1)
  • Updating a master tape is fault tolerant.

Question: what happens if this computer operation fails?
  • Both tapes are rewound and the job is restarted from the beginning without any harm being done.

40
The Transaction Model (2)
Primitive: Description
BEGIN_TRANSACTION: Mark the start of a transaction
END_TRANSACTION: Terminate the transaction and try to commit
ABORT_TRANSACTION: Kill the transaction and restore the old values
READ: Read data from a file, a table, or otherwise
WRITE: Write data to a file, a table, or otherwise
  • Figure 5-18: Example primitives for transactions.

41
The Transaction Model (3)
BEGIN_TRANSACTION
    reserve BOS -> JFK;
    reserve JFK -> ICN;
    reserve SEL -> KPO;
END_TRANSACTION
(a)

BEGIN_TRANSACTION
    reserve BOS -> JFK;
    reserve JFK -> ICN;
    reserve SEL -> KPO;  full => ABORT_TRANSACTION
(b)

  1. Transaction to reserve three flights commits.
  2. Transaction aborts when the third flight is unavailable.

42
ACID Properties of Transactions
  • Atomic
  • To the outside world, the transaction happens
    indivisibly
  • Consistent
  • The transaction does not violate system
    invariants
  • Isolated
  • Concurrent transactions do not interfere with
    each other
  • Durable
  • Once a transaction commits, the changes are
    permanent

43
Nested Transactions
  • Constructed from a number of subtransactions
  • The top-level transaction may create children
    that run in parallel with one another to gain
    performance or simplify programming
  • Each of these children is called a
    subtransaction and it may also have one or more
    subtransactions
  • When any transaction or subtransaction starts,
    it is conceptually given a private copy of all
    data in the entire system for it to manipulate as
    it wishes
  • If it aborts, its private space is destroyed
  • If it commits, its private space replaces the parent's space
  • If the top-level transaction aborts, all the
    changes made in the subtransactions must be wiped
    out

44
Distributed Transactions
  • Transactions involving subtransactions that operate on data distributed across multiple machines
  • Separate distributed algorithms are needed to handle the locking of data and the committing of the entire transaction

45
Implementing Transactions
  • Private Workspace
  • Gives a private workspace (i.e., a copy of all the data it has access to) to a process when it begins a transaction
  • Write-ahead Log
  • Files are actually modified in place, but before any block is changed, a record is written to a log telling:
  • which transaction is making the change
  • which file and block is being changed
  • what the old and new values are
  • Only after the log has been written successfully is the change made to the file
  • Question: why is a log needed?
  • => for rollback if necessary (a sketch follows below)
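
A sketch of write-ahead logging and rollback over a simple key-value store; the log is assumed to live on stable storage, with appends assumed durable:

    def logged_write(log, txn, name, new_value, store):
        # The record reaches the log BEFORE the in-place change.
        log.append((txn, name, store[name], new_value))  # old/new values
        store[name] = new_value

    def rollback(log, txn, store):
        # Undo by replaying the transaction's records backwards.
        for t, name, old, new in reversed(log):
            if t == txn:
                store[name] = old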

46
Private Workspace
  1. The file index and disk blocks for a three-block
    file
  2. The situation after a transaction has modified
    block 0 and appended block 3
  3. After committing

47
Writeahead Log
x = 0;
y = 0;
BEGIN_TRANSACTION
    x = x + 1;
    y = y + 2;
    x = y * y;
END_TRANSACTION
(a)

Log: [x = 0/1]                                (b)
Log: [x = 0/1] [y = 0/2]                      (c)
Log: [x = 0/1] [y = 0/2] [x = 1/4]            (d)

  • (a) A transaction. (b)-(d) The log before each statement is executed.

48
Concurrency Control (1)
  • The goal of concurrency control is to allow multiple transactions to be executed simultaneously
  • The final result should be the same as if all transactions had run sequentially
  • Fig. 5-23: General organization of managers for handling transactions

49
Concurrency Control (2)
  • General organization of managers for handling
    distributed transactions.

50
Serializability (1)
(a) BEGIN_TRANSACTION x = 0; x = x + 1; END_TRANSACTION
(b) BEGIN_TRANSACTION x = 0; x = x + 2; END_TRANSACTION
(c) BEGIN_TRANSACTION x = 0; x = x + 3; END_TRANSACTION

(a)-(c) Three transactions T1, T2, and T3.

Schedule 1: x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 2: x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 3: x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal
(d)

  • (d) Possible schedules.
  • Question: why is Schedule 3 illegal?

51
Serializability (2)
  • Two operations conflict if they operate on the same data item and at least one of them is a write operation:
  • read-write conflict: exactly one of the operations is a write
  • write-write conflict: more than one of the operations is a write
  • Concurrency control algorithms can generally be classified by looking at the way read and write operations are synchronized:
  • Using locking
  • Explicitly ordering operations using timestamps

52
Two-Phase Locking (1)
  • In two-phase locking (2PL), the scheduler first acquires all the locks it needs during the growing (1st) phase, and then releases them during the shrinking (2nd) phase; a sketch follows below.
  • See the rules on pg. 284.
  • Fig. 5-26: Two-phase locking.
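
A sketch of the two phases; lock_mgr is an assumed lock-manager interface, and holding every lock until the end makes this the strict variant described on the next slide:

    def run_two_phase(items_needed, do_work, lock_mgr):
        held = []
        try:
            for item in items_needed:        # growing phase: lock all
                lock_mgr.acquire(item)
                held.append(item)
            do_work()                        # reads/writes, then commit
        finally:
            for item in reversed(held):      # shrinking phase: release
                lock_mgr.release(item)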

53
Two-Phase Locking (2)
  • In strict two-phase locking, the shrinking phase does not take place until the transaction has finished running and has either committed or aborted.
  • Fig. 5-27: Strict two-phase locking.

54
  • READING
  • Read Chapter 5