Synchronization in Distributed Systems

1
Synchronization in Distributed Systems
  • EECS 750
  • Spring 1999
  • Course Notes Set 3
  • Chapter 3
  • Distributed Operating Systems
  • Andrew Tanenbaum

2
Synchronization in Distributed Systems
  • A computation must be composed of components
    separated logically, if not physically, or it
    cannot be considered to be distributed
  • Just as this implies that communication is a necessary component, so too is some form of synchronization
  • Distributed components of the computation
    cooperate and exchange information
  • This implies implicit and explicit constraints on
    how the components execute relative to one
    another
  • Such constraints are ensured and enforced by
    various forms of synchronization

3
Synchronization
  • Interacting components whose execution is
    constrained are obviously communicating
  • Communication support is a necessary but not
    sufficient set of support for distributed systems
  • Some forms of communication are synchronous
    implying both properties
  • As with communication
  • Different situations require different semantics
  • Weakest adequate semantics are usually the best
    choice
  • As with communication
  • Synchronization in distributed systems is like
    that in uni-processor systems, only more so

4
Synchronization
  • Uni-processor systems present all of the basic
    synchronization scenarios and problems
  • Critical Sections
  • Mutual Exclusion
  • Counting Semaphores - resource allocation
  • Atomic Transactions
  • BUT a uni-processor is pseudo-parallel
  • Canonical problems are often simplified or are
    special cases of the general problem
  • Multi-processors, multi-computers, and networks
    of workstations all have different implications

5
Synchronization
  • Implicit assumptions change in moving from one
    architecture to another
  • Uni-processor → Multi-processor
  • True parallelism changes the probability of
    various scenarios by removing pseudo-parallel
    constraints
  • Changes the methods by which critical sections
    must be or are best protected
  • Still preserves the most basic assumption: atomic operations on shared memory
  • Multiple caches and NUMA hierarchy can make this
    complicated

6
Synchronization
  • Single box → Multiple distributed boxes
  • Violates the assumption that all components can
    have atomic access to shared memory
  • Requires new methods of supporting
    synchronization
  • All synchronization methods ultimately decide
  • Which set of computation operations must be
    controlled and which need not
  • In what order to execute computation operations
  • Sets of events that can be done at the same time
    or that can be done in any order are concurrent
  • Sets of events that must be done one at a time
    are sequential

7
Synchronization
  • Many synchronization methods in distributed
    systems thus depend on
  • How the system can tell the time at which events
    occur
  • How the system can tell the order in which events
    occur
  • Under the principle of weakening semantics for
    better performance, there are many forms of
    event ordering
  • We will consider
  • Mutual Exclusion
  • Election
  • Atomic Transactions
  • Deadlock

8
Clock Synchronization
  • Coordination in DS often requires a knowledge of
    when things happen which implies a clock of some
    kind
  • We will see that not all situations require clocks with semantics of the same strength
  • Distributed systems are often more complicated
    than non-distributed equivalents because they
    require distributed rather than centralized
    algorithms
  • The properties of distributed algorithms, as
    always, determine the set of system services
    required
  • Distribution is often more complex and difficult
    than one first expects
  • Centralized architectures are not necessarily bad

9
Clock Synchronization Distributed Algorithm Properties
  • Distributed algorithms have properties with
    important implications
  • Information scattered among many components
  • Computation components (processes) make decisions
    based on locally available information
  • Single points of failure should be avoided
  • No common clock or other precise global time
    source
  • Yet, some form of distributed sense of time is
    required
  • How precise depends on what has to be
    synchronized
  • Coarse grain is easy
  • Most DS require fine enough grain to be hard

10
Clock Synchronization Distributed Algorithm
Properties
  • First three properties argue against
    centralization for resource allocation and other
    types of management
  • Limits scalability
  • Single point of failure
  • Requires a new approach to algorithm design
  • Fourth property points out that time is different
    in centralized and distributed systems
  • Temporal values have meaning in a DS only within
    a given granularity determined by clock
    synchronization
  • Unattended clocks can drift by hours or days
  • ITTC uses GPS and Network Time Protocol (NTP) to
    synchronize within fractions of a second

11
Clock Synchronization Distributed Algorithm
Properties
  • Consider the problem of a distributed file system
    and compilation environment with files and
    compilers on multiple distributed machines
  • Make tracks the relations among source and output files to determine what needs to be recompiled at any given time
  • Make uses the creation time stamp of the files to
    determine if a source file is younger than an
    output file
  • Does not depend on the validity of the times of
    each file
  • Does depend on the times imposing a correct
    order on the set of creation events for each file
  • A single incorrect clock on a uni-processor works just fine
  • Multiple clocks must be synchronized closely
    enough

12
Logical Clocks
  • No computer has an absolute clock and no computer
    keeps absolute time
  • Computers keep logical time for a number of
    reasons and in a number of different senses
  • Logical time is a representation of absolute time
    in the computer subject to a number of different
    constraints
  • How does a computer obtain a sense of time?
  • Often a periodic interrupt updating software
    clock
  • Imposes constraints on temporal resolution and
    overhead
  • Raising interrupt frequency raises the resolution
    but also the overhead

13
Logical Clocks
  • Where does the periodic interrupt come from?
  • Timer hardware with an oscillating crystal
  • Interrupt programmed for every N crystal
    oscillations
  • Crystals differ from one another quite a bit
  • Clock drift is the difference in rate between two
    clocks
  • Clock drift over time results in a difference in
    value between clocks called skew
  • Lamport observed
  • Clock synchronization within reasonable limits is
    possible
  • Useable synchronization need not be absolute

14
Logical Clocks
  • Degree of synchronization required depends on the
    time scale of the operations being synchronized
    and the semantics of the synchronization
  • Lamport based his approach on several
    observations
  • Components which do not interact place no
    constraint on the synchronization of their clocks
  • Interacting components often care only about the
    order in which events occur, not their times
  • Even when a global time is required, it can often
    be a logical time differing arbitrarily from
    real-time
  • When real-time does matter, the system must be
    designed to tolerate the real clock
    synchronization tolerance

15
Logical Clocks
  • Algorithms which depend on temporal ordering but
    which do not depend on absolute time use logical
    clocks
  • Absolute time is given by physical clocks
  • Lamport's algorithm synchronizes logical clocks

16
Lamport's Logical Clock Synchronization
  • Lamport's approach to logical clocks is used in
    many situations in distributed systems where
    ordering is important but global time is not
    required
  • Example of weakening semantics to simplify and/or
    increase efficiency
  • Begin with an important relation: happens-before
  • A happens-before B (A → B) when all the processes involved in a distributed decision agree that event A occurred first, and that then B occurred
  • Note that this does not mean that A actually
    happened before B according to a hypothetical
    absolute global clock

17
Lamport's Logical Clock Synchronization
  • A system can know that happens-before applies when
  • 1) Events A and B are observed by the same process, or by different processes with the same global clock, and A happens before B, then A → B
  • 2) Event A denotes sending a message, and event B denotes receiving the same message, then A → B, since a message cannot be received before it is sent
  • 3) Happens-before is transitive, so A → B and B → C implies A → C
  • If two events X and Y do not interact through messages then they are concurrent, since neither X → Y nor Y → X can be determined, nor does it matter

18
Lamport's Logical Clock Synchronization
  • We have, thus, distinguished between concurrent
    events whose global ordering can be ignored, and
    events to which a global logical time must be
    assigned
  • This global logical time is denoted as the
    logical clock value of an event
  • Consider the previous two situations
  • Events on the same system
  • send and receive events on different systems

19
Lamport's Logical Clock Synchronization
  • On the same system, if A happens before B then C(A) < C(B) trivially, since the two events on the same system can easily use the same clock
  • Note that the temporal granularity of the system clock being used must be sufficient to distinguish A and B
  • Otherwise C(A) = C(B)
  • When the events occur on different systems, we
    must assign C(A) and C(B) in such a way that the
    necessary relation holds without ever decreasing
    a time value
  • Thus logical clock values of an event may be
    changed but always by moving them forward
  • Logical clocks in a distributed system always run
    as fast or faster than the physical clocks with
    which they interact

20
Lamport's Logical Clock Synchronization
  • Consider how logical times are assigned in a
    specific scenario
  • Figure 3-2 page 123 of Tanenbaum
  • In Part A of the figure the three machines have
    clocks running at different speeds, and the event
    times are not consistent with the happens-before
    relation
  • Note that message C arrives at local time 56 even
    though it was sent at local time 60
  • This is a contradiction because a message cannot be received before it is sent
  • Clearly, a message must be received after it is sent, so the receiver corrects the problem by setting its clock to 61
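The clock rules behind this correction can be summarized in a small sketch. This is a minimal illustration in Python, not taken from the course notes; the class and method names are assumptions.

    # Minimal sketch of Lamport logical clock rules (illustrative only).
    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):
            # Advance the clock for each local event.
            self.time += 1
            return self.time

        def send(self):
            # A send is a local event; the message carries the new timestamp.
            return self.tick()

        def receive(self, msg_time):
            # Never let a receive appear to happen before the send:
            # move the clock forward past the message timestamp.
            self.time = max(self.time, msg_time) + 1
            return self.time

    # Example: a receiver at local time 56 gets a message stamped 60
    # and corrects its clock to 61, as in Figure 3-2(b).
    clock = LamportClock()
    clock.time = 56
    print(clock.receive(60))  # 61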

21
Lamport's Logical Clock Synchronization
  • Adjusting the receiving clock to 61 or greater
    ensures that happens-before applies and events
    can be assigned a rational logical order
  • Figure 3-2(b) shows this adjustment to the clock
    at the receiver of C
  • Every message transfer takes at least 1 time tick
  • Any clock, logical or physical, has finite
    resolution
  • Two events occurring close enough together happen
    at the same time
  • All clock values are thus limited to creating
    partial rather than total orders on a set
  • Some distributed algorithms require a total order

22
Lamport's Logical Clock Synchronization
  • Additional refinement: Tie Breaker
  • If a total order is required and C(A) = C(B) for two events A and B
  • then we use some unique property of the processes
    associated with the events to choose the winner
  • Process ID (PID) is often used for this purpose
  • Establishes a total order on a set of events
  • Recall that ties can happen only between events
    happening on the same system since we already
    asserted that every message transfer takes at
    least one tick of the logical clock

23
Lamport's Logical Clock Synchronization
  • Following these rules means that the logical
    clock at each node in a distributed system is now
    sufficient to reason about synchronization
    problems
  • Logical clock provides a way for each system to
    decide about the order in which events occur from
    each system's point of view
  • Consider the connection to in-order message
    delivery in ensuring logically consistent
    decision making among distributed components of a
    computation
  • HOWEVER the logical clock values at each
    distributed component may have little or no
    relation to real time or to each other

24
Physical Clocks
  • All clocks are logical clocks in the sense that
    each
  • Has finite resolution
  • Approximates real time
  • Two important questions must always be considered
    for a particular system
  • How do we synchronize the computer's logical clock with real time?
  • How do we synchronize computer clocks with one another?
  • Computers and distributed software running on
    them may have several clocks, but one is the
    local notion of real time

25
Physical Clocks Real Time
  • Sun Time
  • Humans, including astronomers, want to have time
    keeping stay synchronized with the sun
  • Harder than it seems
  • Consider transition to Gregorian calendar
  • Significantly shorter year was decreed to adjust
    drift in previous scheme
  • Is the year 2000 a leap year? (Hint: 4, -100, +400)
  • Atomic Time
  • 50 Cesium 133 clocks around the world
  • Average number of ticks since 1/1/58

26
Physical Clocks Real Time
  • Atomic time is the official universal time
  • Requires leap seconds every few years to stay
    synchronized with earth's rotation
  • Astronomers care because it makes a difference
    where they point their instruments
  • They work at a much finer time scale than you
    might think
  • So do computers and distributed computations
  • GPS (Global Positioning System) satellites now
    make this easy and cheap to get
  • You can also call NIST on the telephone in Ft.
    Collins

27
Physical Clocks Clock Synchronization
  • Consider the difference between accuracy and
    synchronization of two clocks
  • Their accuracy is how closely they agree with
    real time
  • Their synchronization is how closely they agree
    with each other
  • Synchronization of clocks in a network supporting
    distributed decision making is often more
    important than their accuracy
  • Synchronization of clocks affects how easily
    distributed components can decide on an ordering
    of events
  • Synchronization of clocks within a few
    milliseconds of each other is desirable, but
    seconds or minutes of drift from real time could
    be OK

28
Physical Clocks Clock Synchronization
  • Degree of agreement among interacting machines is
    thus a crucial factor
  • Consider make horror scenarios to see this point
  • Same principle applies to banks
  • Network performance measurement experiments often
    depend on time stamps taken on different
    machines
  • Experiments are often restructured to minimize or
    avoid this
  • Physical clocks run at different speeds
  • Manufacturers specify maximum drift rate (rho, ρ)
  • Manufacturers lie (sorry - provide factually
    unreliable information in a completely sincere
    manner)

29
Physical Clocks Clock Synchronization
  • Maximum resolution desired for global time keeping determines the maximum difference which can be tolerated between synchronized clocks
  • The time keeping of a clock, its tick rate dC/dt, should satisfy 1 - ρ ≤ dC/dt ≤ 1 + ρ
  • The worst possible divergence d between two clocks after an interval Δt is thus d = 2ρΔt
  • So the maximum time Δt between clock synchronization operations that can ensure a divergence of at most d is Δt ≤ d / (2ρ)
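As a concrete check of this bound, here is a small worked example with assumed (not course-supplied) numbers:

    # Worked example of the resynchronization bound (illustrative numbers).
    rho = 1e-5     # maximum drift rate claimed by the manufacturer
    d   = 0.010    # maximum tolerated divergence between two clocks, seconds

    # Two clocks drifting in opposite directions diverge at rate 2*rho,
    # so to stay within d they must resynchronize every d / (2*rho).
    max_sync_interval = d / (2 * rho)
    print(max_sync_interval)  # 500.0 seconds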

30
Physical Clocks Clock Synchronization
  • Cristian's Algorithm
  • Periodically poll the machine with access to the
    reference time source
  • Estimate round-trip delay with a time stamp
  • Estimate interrupt processing time
  • figure 3-6, page 129 Tanenbaum
  • Take a series of measurements to estimate the
    time it takes for a timestamp to make it from the
    reference machine to the synchronization target
  • This allows the synchronization to converge
    within d with a certain degree of confidence
  • Probabilistic algorithm and guarantee
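A minimal sketch of the round-trip estimate described above. The helper ask_time_server() is a hypothetical stand-in for the real request to the reference machine, and halving the round trip approximates the one-way delay.

    import time

    def ask_time_server():
        # Placeholder for a real request to the machine holding the
        # reference time source; here we just fake a server reading.
        return time.time() + 0.123   # pretend the server is 123 ms ahead

    def cristian_offset(samples=5):
        """Estimate the offset between the local clock and the server clock."""
        best = None
        for _ in range(samples):
            t0 = time.time()
            server_time = ask_time_server()
            t1 = time.time()
            rtt = t1 - t0
            # Assume the reply took about half the round trip to arrive.
            estimate = server_time + rtt / 2.0
            offset = estimate - t1
            # Keep the sample with the smallest round trip: least uncertainty.
            if best is None or rtt < best[0]:
                best = (rtt, offset)
        return best[1]

    print("estimated offset:", cristian_offset())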

31
Physical Clocks Clock Synchronization
  • Wide availability of hardware and software to
    keep clocks synchronized within a few
    milliseconds across the Internet is a recent
    development
  • Network Time Protocol (NTP) discussed in papers by David Mills
  • GPS receiver in the local network synchronizes
    other machines
  • What if all machines have GPS receivers?
  • Increasing deployment of distributed system
    algorithms depending on synchronized clocks
  • Supply and demand constantly in flux

32
Physical Clocks At-Most-Once Semantics
  • Traditional approach
  • Each message has unique message ID
  • Server maintains list of IDs
  • Can lose message numbers on server crash
  • How long does server keep IDs?
  • With globally synchronized clocks
  • Sender assigns a timestamp to message
  • Server keeps most recent timestamp for each
    connection
  • Reject any message with a lower timestamp (it is a duplicate)
  • Removing old timestamps:
  • G = CurrentTime - MaxLifeTime - MaxClockSkew
  • Timestamps older than G are removed
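A sketch of this timestamp-based duplicate filter. The connection table, the constants, and the function name are assumptions for illustration, not part of the notes.

    import time

    MAX_LIFETIME   = 60.0   # assumed maximum message lifetime, seconds
    MAX_CLOCK_SKEW = 1.0    # assumed bound on clock skew, seconds

    last_seen = {}          # connection id -> most recent accepted timestamp

    def accept(conn_id, msg_timestamp):
        """Return True if the message should be processed, False if rejected."""
        G = time.time() - MAX_LIFETIME - MAX_CLOCK_SKEW
        # Forget timestamps that are too old to matter.
        for c in list(last_seen):
            if last_seen[c] < G:
                del last_seen[c]
        # Anything older than G, or not newer than the last accepted message
        # on this connection, is treated as a duplicate.
        if msg_timestamp < G or msg_timestamp <= last_seen.get(conn_id, G):
            return False
        last_seen[conn_id] = msg_timestamp
        return True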

33
Physical Clocks At-Most-Once Semantics
  • After a server crash
  • CurrentTime is recomputed
  • using global synchronization of time
  • All messages older than G are rejected
  • All messages before crash are rejected as
    duplicate
  • some new messages may be wrongfully rejected
  • but at-most-once semantics is guaranteed

34
Physical Clocks Cache Coherence
  • File caching in a distributed file system
  • Many readers, single writer
  • Writer must ask readers to invalidate their
    copies
  • TS on the readers' copies helps by making the copies expire
  • Readers lease their copies of a file block
  • Constrains the period during which a
    non-responding reader may delay a potential
    writer
  • Does "NFS server not responding" sound familiar?
  • Note tradeoff of overhead and latency
  • Lower lease time increases message load and
    decreases delay of ignoring a non-responding
    reader

35
Mutual Exclusion
  • Distributed components still need to coordinate
    their actions, including but not limited to
    access to shared data
  • Mutual exclusion to some limited set of
    operations and data is thus required
  • Consider several approaches and compare and
    contrast their advantages and disadvantages
  • Centralized Algorithm
  • The single central process is essentially a
    monitor
  • Central server becomes a semaphore server
  • Three messages per use: request, grant, release
  • Centralized performance constraint and point of
    failure

36
Mutual Exclusion Distributed Algorithm Factors
  • Functional Requirements
  • 1) Freedom from deadlock
  • 2) Freedom from starvation
  • 3) Fairness
  • 4) Fault tolerance
  • Performance Evaluation
  • Number of messages
  • Latency
  • Semaphore system Throughput
  • Synchronization is always overhead and must be
    accounted for as a cost

37
Mutual Exclusion Distributed Algorithm Factors
  • Performance should be evaluated under a variety
    of loads
  • Cover a reasonable range of operating conditions
  • We care about several types of performance
  • Best case
  • Worst case
  • Average case
  • Different aspects of performance are important for different reasons and in different contexts

38
Mutual Exclusion Lamport's Algorithm
  • Every site keeps a request queue sorted by
    logical time stamp
  • Uses Lamport's logical clocks to impose a total
    global order on events associated with
    synchronization
  • Algorithm assumes ordered message delivery
    between every pair of communicating sites
  • Messages sent from site Si in a particular order arrive at Sj in the same order
  • Note: Since messages arriving at a given site
    come from many sources the delivery order of all
    messages can easily differ from site to site

39
Lamport's Algorithm Request Resource r
  • Thus, each site has a request queue containing
    resource use requests and replies
  • Note that the requests and replies for any given
    pair of sites must be in the same order in queues
    at both sites
  • Because of the ordered message delivery assumption

40
Lamport's Algorithm Entering CS for Resource r
  • Site Si enters the CS protecting the resource when
  • L1: Si has received a message with a timestamp larger than its own request's timestamp from every other site
  • This ensures that no message from any site with a smaller timestamp could ever arrive
  • L2: Si's own request is at the head of its request_queue
  • This ensures that no other site will enter the CS
  • Recall that requests to all potential users of the resource and replies from them go into the request queues of all processes, including the sender of the message

41
Lamport's Algorithm Releasing the CS
  • The site holding the resource is releasing it; call that site Si
  • Note that the request for resource r had to be at
    the head of the request_queue at the site holding
    the resource or it would never have entered the
    CS
  • Note that the request may or may not have been at
    the head of the request_queue at the receiving
    site
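The request, enter, and release rules from the last three slides fit together in one sketch. This is a simplified illustration of one site's bookkeeping only, with invented names; message transport is abstracted as a send callback, and reliable ordered delivery is assumed as the algorithm requires.

    # Sketch of one site in Lamport's mutual exclusion algorithm.
    import heapq

    class LamportMutexSite:
        def __init__(self, site_id, peers, send):
            self.id = site_id
            self.peers = peers                      # ids of all other sites
            self.send = send                        # send(dest_id, message_dict)
            self.clock = 0
            self.queue = []                         # heap of (timestamp, site_id)
            self.last_seen = {p: 0 for p in peers}  # highest ts seen from each peer
            self.my_request = None

        def _stamp(self, received=0):
            self.clock = max(self.clock, received) + 1
            return self.clock

        def request_cs(self):
            ts = self._stamp()
            self.my_request = (ts, self.id)
            heapq.heappush(self.queue, self.my_request)
            for p in self.peers:
                self.send(p, {"type": "REQUEST", "ts": ts, "from": self.id})

        def on_message(self, msg):
            ts = self._stamp(msg["ts"])
            self.last_seen[msg["from"]] = msg["ts"]
            if msg["type"] == "REQUEST":
                heapq.heappush(self.queue, (msg["ts"], msg["from"]))
                self.send(msg["from"], {"type": "REPLY", "ts": ts, "from": self.id})
            elif msg["type"] == "RELEASE":
                self.queue = [r for r in self.queue if r[1] != msg["from"]]
                heapq.heapify(self.queue)

        def can_enter_cs(self):
            # L1: a message with a larger timestamp has arrived from every peer.
            # L2: our own request is at the head of the request queue.
            return (self.my_request is not None
                    and all(t > self.my_request[0] for t in self.last_seen.values())
                    and self.queue and self.queue[0] == self.my_request)

        def release_cs(self):
            self.queue = [r for r in self.queue if r != self.my_request]
            heapq.heapify(self.queue)
            self.my_request = None
            for p in self.peers:
                self.send(p, {"type": "RELEASE", "ts": self._stamp(), "from": self.id})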

42
Lamport ME Example
(Timeline figure: Pi requests the CS with timestamp i5 and Pj with j10; both requests are queued at both sites as queue(j10, i5), replies carrying timestamp 12 are exchanged, Pi enters the critical section first, and after Pi's release(i5) at time 15 Pj enters the critical section.)
43
Lamport's Algorithm Correctness
  • We show that Lamport's algorithm ensures mutual exclusion through a proof by contradiction
  • Assume two sites Si and Sj are executing in the critical section concurrently
  • For this to happen, L1 and L2 must hold at both sites concurrently, which implies that at some time t both sites Si and Sj had their own requests at the top of their respective request_queues
  • Without loss of generality (WLOG) assume that Si's request has the smaller timestamp
  • Due to L1 and the FIFO property of communication, it is clear that at time t Sj must have had the request from Si in its request_queue

44
Lamport's Algorithm Correctness
  • This implies that at site Sj the local request is at the head of the local request_queue even though the request from Si had a lower timestamp
  • This is a contradiction
  • Lamport's algorithm thus ensures mutual exclusion since assuming otherwise produces a contradiction
  • Key idea: L1 ensures that Sj must place the request from Si ahead of its own, because it has definitely arrived and has a lower logical timestamp

45
Lamport's Algorithm Comments
  • Performance: 3(N-1) messages per CS invocation since each requires (N-1) REQUEST, REPLY, and RELEASE messages
  • Observation: some REPLY messages are not required
  • If Sj sends a request to Si and then receives a REQUEST from Si with a timestamp smaller than its own REQUEST, Sj need not send a reply to Si because Si already has enough information to make a decision
  • This reduces the messages to between 2(N-1) and 3(N-1)
  • As a distributed algorithm there is no single
    point of failure but there is increased overhead

46
Ricart and Agrawala
  • Refine Lamport's mutual exclusion by merging the REPLY and RELEASE messages
  • Assumption: total ordering of all events in the system, implying the use of Lamport's logical clocks with tie breaking
  • Request CS (P) operation
  • 1) Site requesting the CS creates a timestamped REQUEST message and sends it to all processes using the CS, including itself
  • Messages are assumed to be reliably delivered in
    order
  • Group communication support can play an obvious
    role

47
Ricart and Agrawala Receive a CS Request
  • If the receiver is not currently in the CS and
    does not have pending request for it in its
    request_queue
  • Send REPLY
  • If the receiver is already in the CS
  • Queue the request, sending no reply
  • If the receiver desires the CS but has not
    entered
  • Compare the TS of its request to that just
    received
  • REPLY if the received request has the smaller (earlier) timestamp
  • Queue the received request if its own pending request has the smaller timestamp

48
Ricart and Agrawala
  • Enter a CS
  • A process enters the CS when it receives a REPLY
    from every member of the group that can use the
    CS
  • Leave a CS
  • When the process leaves the CS it sends a REPLY
    to the senders of all pending messages on its
    queue
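The decision rule for an incoming REQUEST can be written compactly. A sketch, assuming the site keeps its state in a few fields (state, my_ts, id, deferred, send); the names are ours and transport is abstracted away.

    # Sketch of the Ricart-Agrawala rule for handling an incoming REQUEST.
    # state: "RELEASED" (not interested), "WANTED" (waiting), "HELD" (in CS)

    def on_request(site, req_ts, req_id):
        """Decide whether to REPLY now or defer the reply until we leave the CS."""
        defer = False
        if site.state == "HELD":
            defer = True
        elif site.state == "WANTED":
            # Lower (timestamp, id) pair wins; defer if our own request is earlier.
            defer = (site.my_ts, site.id) < (req_ts, req_id)
        if defer:
            site.deferred.append(req_id)          # answered when we exit the CS
        else:
            site.send(req_id, {"type": "REPLY", "from": site.id})

    def on_exit_cs(site):
        site.state = "RELEASED"
        for p in site.deferred:                   # release all deferred replies
            site.send(p, {"type": "REPLY", "from": site.id})
        site.deferred.clear()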

49
Ricart and Agrawala Example 1
(Timeline figure: I requests the CS with timestamp i8 and K with k12; J sends OK to both, K sends OK(k) to I since I's request is earlier, I enters the CS first, and after I's deferred OK(i) is delivered K enters the CS.)
50
Ricart and Agrawala Example 2
(Timeline figure: I, J, and K request the CS concurrently with timestamps i7, j8, and k9; requests with later timestamps are queued, I enters the CS first, its deferred OKs let J enter next, and finally K enters.)
51
Ricart and Agrawala Proof by Contradiction
  • Assume sites Si and Sj are executing in the CS concurrently
  • Assume that Si's request has the smaller timestamp
  • Site Si clearly received the request from Sj after making its own request
  • Otherwise Si's timestamp would have been the larger one
  • However, Sj can be executing concurrently with Si only if Si returns a REPLY message in response to the request from Sj before Si exits its CS
  • This is impossible because Sj's request has the larger timestamp, so Si defers its reply until it exits the CS
  • The assumption leads to a contradiction and thus the R-A algorithm ensures mutual exclusion
  • Performance: 2(N-1) messages, (N-1) REQUEST and (N-1) REPLY

52
Ricart and Agrawala Observations
  • The algorithm works because the global logical
    clock ensures a global total ordering on events
  • This ensures, in turn, that the decision about
    who enters the CS is unambiguous
  • Single point of failure is now N points of
    failure
  • A crashed group member cannot be distinguished
    from a busy CS
  • Distributed and optimized version is N times
    more vulnerable than the centralized version!
  • Explicit message denying entry helps reliability
    and converts this into busy wait

53
Ricart and Agrawala Observations
  • Either group communication support is used, or
    each user of the CS must keep track of all other
    potential users correctly
  • Powerful motivation for standard group
    communication primitives
  • Argument against a centralized server said that a
    single process involved in each CS decision was
    bad
  • Now we have N processes involved in each decision
  • Improvements: get a majority - Maekawa's algorithm
  • Bottom line: a distributed algorithm is possible
  • Shows theoretical and practical challenges of
    designing distributed algorithms that are useful

54
Token Passing Mutex
  • General structure
  • One token per CS → token denotes permission to enter
  • Only process with token allowed in CS
  • Token passed from process to process → logical ring
  • Mutex
  • Pass token to process (i + 1) mod N
  • Received token gives permission to enter CS
  • hold token while in CS
  • Must pass token after exiting CS
  • Fairness ensured: each process waits at most N-1 entries to get the CS

55
Token Passing Mutex
  • Correctness is obvious
  • No starvation since passing is in strict order
  • Difficulties with token passing mutex
  • Idle case of no process entering CS pays overhead
    of constantly passing the token
  • Lost tokens: diagnosis and creating a new token
  • Duplicate tokens: ensure generation of only one token
  • Crashes require a receipt to detect dead
    destinations
  • Receipts double the message overhead
  • Design challenge: holding time for an unneeded token
  • Too short → high overhead; too long → high CS latency
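The per-process loop is short enough to sketch directly. This assumes hypothetical blocking helpers for receiving the token from the predecessor and passing it to the successor; it is an illustration, not the course's code.

    # Sketch of one process in a token-ring mutual exclusion scheme.
    def token_ring_process(i, n, recv_token, send_token, wants_cs, critical_section):
        while True:
            recv_token()              # wait for the token from process (i - 1) mod n
            if wants_cs():
                critical_section()    # hold the token only while inside the CS
            send_token((i + 1) % n)   # pass the token on even if we did not use it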

56
Mutex Comparison
  • Centralized
  • Simplest and most efficient
  • Centralized coordinator crashes create the need
    to detect crash and choose a new coordinator
  • M/use: 3; Entry latency: 2
  • Distributed
  • 3(N-1) messages per CS use (Lamport)
  • 2(N-1) messages per CS use (Ricart Agrawala)
  • If any process crashes with a non-empty queue, the algorithm won't work
  • M/use: 2(N-1); Entry latency: 2(N-1)

57
Mutex Comparison
  • Token Ring
  • Ensures fairness
  • Overhead is subtle → no longer linked to CS use
  • M/use: 1 to ∞; Entry latency: 0 to N-1
  • This algorithm pays overhead when idle
  • Need methods for re-generating a lost token
  • Design principle: building fault handling into algorithms for distributed systems is hard
  • Crash recovery is subtle and introduces overhead
    in normal operation
  • Performance metrics: M/use and entry latency

58
Election Algorithms
  • Centralized approaches often necessary
  • Best choice in mutex, for example
  • Need method of electing a new coordinator when it
    fails
  • General assumptions
  • Give processes unique system/global numbers (e.g.
    PID)
  • Elect process using a total ordering on the set
  • All processes know process number of members
  • All processes agree on new coordinator
  • Processes do not know which peers are up or down → the election algorithm is responsible for determining this
  • Design challenge: network delay vs. a crashed peer

59
Bully Algorithm
  • Suppose the coordinator doesn't respond to P1's request
  • P1 holds an election by sending an election
    message to all processes with higher numbers
  • If P1 receives no responses, P1 is the new
    coordinator
  • If any higher numbered process responds, P1 ends
    its election
  • When a process receives an election request
  • It replies to the sender, telling it that it has lost the election
  • It then holds an election of its own
  • Eventually all but highest surviving process give
    up
  • Process recovering from a crash takes over if
    highest
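The election step started by a process that notices the coordinator is down can be sketched as follows; send_election and broadcast_victory are assumed helpers, and a crashed process is modeled as a timed-out (False) reply.

    # Sketch of the bully election from the point of view of process `me`.
    def hold_election(me, all_ids, send_election, broadcast_victory):
        """send_election(pid) returns True if pid answered before a timeout."""
        higher = [p for p in all_ids if p > me]
        answered = [p for p in higher if send_election(p)]
        if not answered:
            # Nobody bigger is alive: we win and tell everyone.
            broadcast_victory(me)
            return me
        # Some higher-numbered process answered; it runs its own election,
        # and we simply wait to hear who the new coordinator is.
        return None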

60
Bully Algorithm
  • Example: processes 0-7; 4 detects that 7 has crashed
  • 4 holds election and loses
  • 5 holds election and loses
  • 6 holds election and wins
  • Message overhead is variable
  • Who starts an election matters
  • Solid lines say "Am I leader?"
  • Dotted lines say "you lose"
  • Hollow lines say "I won"
  • 6 becomes the coordinator
  • When 7 recovers it is a bully and sends "I win" to all

61
Ring Algorithm
  • Processes have a total order known by all
  • Each process knows its successor → forming a ring
  • Ring is mod N
  • So the successor of Pi is P((i+1) mod N)
  • No token involved
  • Any process Pi noticing that the coordinator is
    not responding
  • Sends an election message to its successor P((i+1) mod N)
  • If the successor is down, send to the next member → timeout
  • Receiving process adds its number to the message
    and passes it along

62
Ring Algorithm
  • When election message gets back to election
    initiator
  • Change message to coordinator
  • Circulate to all members
  • Coordinator is highest process in the total order
  • All processes know the order and thus all will
    agree no matter how the election started
  • Strength
  • Only one coordinator chosen
  • Weakness
  • Scalability: latency increases with N because the algorithm is sequential
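The two message-handling cases fit in a short sketch; forward is an assumed helper that passes a message to the next live successor, and the member list rides inside the message.

    # Sketch of the ring election message handlers for process `me`.
    def start_election(me, forward):
        forward({"type": "ELECTION", "members": [me], "initiator": me})

    def on_message(me, msg, forward, set_coordinator):
        if msg["type"] == "ELECTION":
            if msg["initiator"] == me:
                # The message came all the way around: pick the winner, announce it.
                winner = max(msg["members"])
                forward({"type": "COORDINATOR", "winner": winner, "initiator": me})
            else:
                msg["members"].append(me)   # add ourselves and pass it along
                forward(msg)
        elif msg["type"] == "COORDINATOR":
            set_coordinator(msg["winner"])
            if msg["initiator"] != me:
                forward(msg)                # circulate until it returns to the initiator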

63
Ring Algorithm
  • What if more than one process detects a crashed
    coordinator?
  • More than one election will be produced → message storm
  • All messages will contain the same information: member process numbers and order of members
  • Same coordinator is chosen (highest number)
  • Refinement might include filtering duplicate
    messages
  • Some duplicates will happen
  • Consider two elections chasing each other
  • Eliminate the one initiated by the lower-numbered process
  • Duplicates persist until the lower-numbered election reaches the source of the higher one

64
Atomic Transactions
  • All synchronization methods so far have been low
    level
  • Essentially equivalent to semaphores
  • Good for building more powerful higher level
    tools
  • Assume stable storage
  • Contents survive all non-physical disasters
  • Specifically used by system to store data across
    crashes
  • Transaction
  • Performs a single logical function
  • All-or-none computation: either all operations are executed or none
  • Must do so in the face of system failures → stable storage

65
Atomic Transactions
  • Transaction Model
  • Start transaction
  • Series of read and write operations
  • Either a commit or abort operation
  • Commit: all transaction operations executed successfully
  • Roll back: restore system to the original state before the transaction started; no transaction operations are allowed to hold
  • Transaction is in limbo before a commit
  • Has neither occurred nor not occurred
  • Depends on who is asking

66
Transaction Properties ACID
  • Atomic
  • Actions occur indivisibly: completely or not at all
  • Appear to happen instantly, from the POV of any
    interacting process because they are all blocked
  • No intermediate states are visible
  • Consistent
  • System invariants hold, but are specific to
    application
  • Conservation of money semantics in banking
    applications
  • Inside the transaction the invariant may be violated, but from outside the transaction is indivisible and invariants are, well, invariant

67
Transaction Properties ACID
  • Isolated
  • Concurrent transactions do not interfere with
    each other
  • Serializable: the results from every set of transactions look as if they were done in some sequential transaction execution
  • Transaction system must ensure that only legal or
    semantically consistent interleavings of
    transaction components occur
  • Durable
  • Once a transaction commits, results are permanent
  • Relevant to ask: permanent with respect to what?
  • Generally data structures or stable storage
    contents

68
Transaction Primitives
  • Begin-transaction
  • End-transaction
  • Abort-transaction
  • Returns to state before the begin-transaction
  • Often referred to as roll-back
  • Commit-transaction
  • Changes made in transaction become visible to the
    outside world
  • Transaction operations
  • Read (receive)
  • Write (send)

69
Transaction Example
  • Suppose we have three transactions T1, T2, and T3
  • Two data elements, A and B
  • Scheduled by a round-robin scheduler; artificial but instructive for this example
  • One operation per time slice
  • Consider what interleavings of component
    operations are consistent with a serial execution
    order of transaction set
  • Obvious choice is to not interleave components of different transactions → constrains concurrency

70
Transaction Example
  • T1 → T2 → T3
  • But T1 reads A after T3 writes
  • This implies that T3 → T1, creating a contradiction
  • Atomicity is violated
  • Abort T1

71
Transaction Example
  • T2 → T3 → T1
  • T2 writes A after T3's write
  • Requiring T3 → T2
  • Abort T2
  • Note since we interleaved operations all members
    of the set must be ready to commit before any can
    commit

72
Transaction Example
  • T3 → T1 → T2
  • This works because each reaches the commit stage
    without encountering a contradiction

(Schedule table: transactions T3, T1, and T2 with timestamps 20, 21, and 22 and their read/write operations on A and B spread across seven event slots.)
73
Nested Transactions
  • Transaction divided into sub-transactions
  • Structured as a hierarchy
  • Internal nodes are masters for their children
  • Advantages
  • Better performance: aborted sub-transactions do not abort their masters
  • Increased concurrency: only need to lock sub-transactions

(Figure: a transaction hierarchy with nodes A through J; internal nodes are masters for their children.)
74
Nested Transactions
  • Suppose a parent transaction starts several child
    transactions
  • One or more children transactions commit
  • Only after committing are the child's results visible to the parent
  • Atomicity is preserved at child level
  • But the results are horrible so the parent aborts
  • But child already committed
  • Parent abort must roll back all child
    transactions
  • Even if they have committed
  • Commit of subordinate transactions thus not
    final, and thus not real with respect to the
    containing system

75
Implementing Transactions
  • Conceptually, a transaction is given a private
    workspace
  • Containing all resources it is allowed to access
  • Before commit all operations done to private
    workspace
  • Commit: changes in the private workspace are reflected into the actual workspace (file system, etc.)
  • If the shadowed workspaces of more than one transaction intersect → contain common member data items
  • And one of them has a write operation on a common
    member
  • Then there is a conflict
  • And one of the transactions must be aborted

76
Implementing Transactions
  • First-level optimization: copy on write
  • Private workspace points to the common workspace
  • Copy items into the private space only when
    written
  • Virtual memory systems do this when processes
    fork
  • Copied items are shadowed
  • Commit copies shadowed items into global
    workspace
  • Second-level optimization: shadow blocks
  • Make units of shadowing as small as possible
  • Disk blocks within a file that are written
    instead of the whole file
  • Specific variables or groups of variables in a
    data space

77
Implementing Transactions
  • Private workspaces are a form of caching
  • Design issues
  • Size of shadowed objects
  • Probability of an intersection of private
    workspaces
  • Constraint on concurrency of transactions
  • Overhead of managing information and detecting
    intersections
  • Analogy to data cache line size and snooping
    cache consistency problems

78
Implementing Transactions Writeahead Log
  • Global copies are changed in the course of a
    transaction
  • Log of changes maintained in stable storage
  • Log entries consist of write operation records
  • Transaction name
  • Data item name
  • Old value
  • New value
  • Save log entry before performing write operations
  • Transaction Ti is represented by a series of write operation records terminated by the commit (or abort) record

79
Implementing Transactions Writeahead Log
  • Transaction log consists of
  • <Ti start>
  • series of write records (Ti, x, old value, new
    value)
  • <Ti commit> or <Ti abort>
  • Recovery procedures
  • undo(Ti) restores all values written by Ti to their old values
  • redo(Ti) sets all values written by Ti to their new values
  • If Ti aborts
  • Execute undo(Ti)
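A sketch of the two recovery procedures over an in-memory log; the record layout follows the slide, but the data store is just a dictionary and the marker records are an assumption made for illustration.

    # Sketch of undo/redo over a write-ahead log.
    # Each write record is a tuple: (transaction, item, old_value, new_value).
    log = [
        ("T1", "start", None, None),
        ("T1", "x", 1, 2),
        ("T1", "y", 5, 7),
        ("T1", "commit", None, None),
    ]

    def undo(t, log, store):
        # Walk backwards, restoring old values written by transaction t.
        for name, item, old, new in reversed(log):
            if name == t and item not in ("start", "commit", "abort"):
                store[item] = old

    def redo(t, log, store):
        # Walk forwards, re-applying new values written by transaction t.
        for name, item, old, new in log:
            if name == t and item not in ("start", "commit", "abort"):
                store[item] = new

    store = {"x": 1, "y": 5}
    redo("T1", log, store)   # after a crash, make sure committed updates are in place
    print(store)             # {'x': 2, 'y': 7}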

80
Implementing Transactions Writeahead Log
  • If there is a system failure the system can use
    redo(Ti) to make sure all updates are in place
  • Compare writeahead log values to actual value
  • Also use the log to proceed with the transaction
  • If an abort is necessary, use undo(Ti)
  • Note that the commit operation must be done
    atomically
  • Difficult when different machines and processes
    are involved
  • Multiple logs are still a problem to consider

81
Implementing Transactions Two-Phase Commit
  • The commit to the transaction must be atomic
  • Specific roles permit this
  • Figure 3-20, page 153 Tanenbaum
  • Coordinator is selected (transaction initiator)
  • Phase 1
  • Coordinator writes prepare in log
  • Sends prepare message to all processes involved
    in the commit (subordinates)
  • Subordinates write ready (or abort) into log
  • Subordinates reply to coordinator
  • Coordinator collects replies from all
    subordinates

82
Implementing Transactions Two-phase Commit
  • If any subordinate aborts or does not respond → abort
  • If all respond, commit message will make
    transaction results permanent in all subordinates
  • Stable storage is the key to the very end
  • Crashes can be handled by tracing the log to
    recover
  • Phase 2
  • Coordinator logs commit and sends commit message
  • Subordinates write commit into their log
  • Subordinates execute the commit
  • Subordinates send finished message to coordinator
  • System can remove all transaction log entries, if
    desired
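The coordinator's side of the two phases can be sketched directly; write_log, send, and collect_replies are assumed helpers, and a missing vote counts as an abort.

    # Sketch of the coordinator in two-phase commit.
    def two_phase_commit(subordinates, write_log, send, collect_replies):
        # Phase 1: ask everyone to prepare.
        write_log("prepare")
        for s in subordinates:
            send(s, "PREPARE")
        votes = collect_replies(subordinates)   # e.g. {"s1": "ready", "s2": "abort"}

        # Any abort vote, or any missing reply, aborts the whole transaction.
        if any(votes.get(s) != "ready" for s in subordinates):
            write_log("abort")
            decision = "ABORT"
        else:
            # Phase 2: the decision is logged before it is announced.
            write_log("commit")
            decision = "COMMIT"
        for s in subordinates:
            send(s, decision)
        return decision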

83
Concurrency Control
  • Transactions need to run simultaneously
  • All modern database systems need to serve concurrent users, especially in a parallelized distributed system
  • Transactions can conflict
  • One may write to items others want to read or
    write
  • Most transactions do not conflict
  • Maximizing performance requires us to constrain
    only conflicting transactions
  • Concurrency control methods
  • Locking
  • Optimistic concurrency control
  • Timestamps

84
Locking
  • Locks
  • Semaphore of sorts creating mutual exclusion
    regions within the total data of a DB
  • Simplistic scheme is too restrictive
  • Distinguish read and write locks
  • Many readers, single writer is the canonical problem
  • Read locks
  • Allow N read locks on a resource
  • Write locks
  • No other lock is permitted

85
Locking
  • Locking granularity
  • File level is too coarse
  • Finer granularity → less concurrency constraint
  • Finer granularity → greater overhead managing locks and increased probability of deadlock
  • Two-Phase locking
  • Fine-grained locking can lead to inconsistency
    and deadlock
  • Dividing lock requests into two phases helps
    simplify
  • If a transaction avoids updating until all locks are acquired, this simplifies failure handling
  • Release all locks and try again

86
Locking
  • Growing phase
  • Transaction obtains locks, may not release any
  • Shrinking phase
  • Once a lock is released, no locks can be obtained for the rest of the transaction
  • Disadvantage of two-phase locking
  • Concurrency is reduced
  • Resource ordering (prevention) or detection and
    resolution are necessary to handle deadlocks
  • Strict TPL releases no locks until abort/commit
  • Increases the concurrency constraint but avoids cascading aborts

87
Two-Phase Locking
  • Scenario 1
  • Also safe from deadlock
    P1              P2
    lock R1         lock R1
    ...             lock R2
    lock R2         ...
    ...             unlock R2
    unlock R2       unlock R1
    unlock R1

88
Two-Phase Locking
  • Scenario 2
  • Susceptible to deadlock
    P1              P2
    lock R1         lock R2
    ...             lock R1
    lock R2         ...
    ...             unlock R1
    unlock R1       unlock R2
    unlock R2

89
Optimistic Concurrency Control
  • Based on the observation that transactions rarely
    conflict
  • Expected value argument
  • Cumulative overhead of avoiding conflicts is more
    expensive than detecting and resolving conflicts
  • Let a transaction make all changes
  • Without checking for conflicts
  • Deadlock free
  • At commit time
  • Check for conflicts with files that have changed
    since the transaction began
  • If found → abort all but one conflicting transaction and redo

90
Optimistic Concurrency Control
  • Optimistic: changes made to private workspace
  • Distributed transactions need some form of global
    clock
  • Basis for comparing time for file changes
  • The make example is the canonical problem
  • Parallelism is maximized
  • No waiting on locks
  • Inefficient when an abort is needed
  • Not a good strategy in systems with many potential conflicts → bets on a low conflict probability
  • ↑ Load → ↑ Conflicts → ↑ Failures → ↑ Load
  • Positive feedback scenario

91
Timestamp Ordering
  • Each transaction Ti assigned a unique timestamp
    TS(Ti)
  • If Ti enters system before Tj,
  • TS(Ti) < TS(Tj)
  • Imposes a total ordering on transactions
  • Each data item, Q, gets two timestamps
  • W-timestamp(Q): largest write timestamp
  • R-timestamp(Q): largest read timestamp
  • General concept
  • Process transactions in a serial order
  • Can use the same file, but must do it in order
  • Therefore atomicity is preserved

92
Timestamp Ordering
  • For a read
  • if (TS(Ti) < W-timestamp(Q))
  • reject the read
  • roll back and re-start Ti
  • else /* TS(Ti) >= W-timestamp(Q) */
  • execute the read
  • R-timestamp(Q) = max(R-timestamp(Q), TS(Ti))
  • Timestamp ordering is deadlock-free
  • Total ordering of file accesses → no cycles can result
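Both the read rule above and the symmetric write rule fit in a short sketch; the item record and the exception class are assumptions for illustration.

    # Sketch of the timestamp-ordering rules for one data item Q.
    class Conflict(Exception):
        """Raised when the transaction must be rolled back and restarted."""

    class Item:
        def __init__(self, value):
            self.value = value
            self.r_ts = 0    # largest timestamp of any read of this item
            self.w_ts = 0    # largest timestamp of any write of this item

    def read(ti_ts, q):
        if ti_ts < q.w_ts:             # a younger transaction already wrote Q
            raise Conflict("reject read, roll back and restart Ti")
        q.r_ts = max(q.r_ts, ti_ts)
        return q.value

    def write(ti_ts, q, value):
        if ti_ts < q.r_ts or ti_ts < q.w_ts:   # a younger transaction already used Q
            raise Conflict("reject write, roll back and restart Ti")
        q.w_ts = ti_ts
        q.value = value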

93
Timestamp Ordering Example
  • Three transactions T1, T2, and T3
  • two data elements, A and B
  • scheduled by a round-robin scheduler
  • one operation per time slice
  • use read and write timestamps

94
Timestamp Ordering Example
  • Three transactions T1, T2, and T3

95
Deadlocks
  • Definition: each process in a set is waiting for a resource to be released by another process in the set
  • The set is some subset of all processes
  • Deadlock only involves the processes in the set
  • Remember the necessary conditions for DL
  • Remember that methods for handling DL are based
    on preventing or detecting and fixing one or more
    necessary conditions

96
Deadlocks Necessary Conditions
  • Mutual exclusion
  • Process has exclusive use of resource allocated
    to it
  • Hold and Wait
  • Process can hold one resource while waiting for
    another
  • No Preemption
  • Resources are released only by explicit action by
    controlling process
  • Requests cannot be withdrawn (i.e. request
    results in eventual allocation or deadlock)
  • Circular Wait
  • Every process in the DL set is waiting for
    another process in the set, forming a cycle in
    the SR graph

97
Deadlock Handling Strategies
  • No strategy
  • Prevention
  • Make it structurally impossible to have a
    deadlock
  • Avoidance
  • Allocate resources so deadlock can't occur
  • Detection
  • Let deadlock occur, detect it, recover from it

98
No Strategy The Ostrich Algorithm
  • Assumes deadlock rarely occurs
  • Becomes more probable with more processes
  • Catastrophic consequences when it does occur
  • May need to re-boot all or some machines in
    system
  • Fairly common and works well when
  • DL is rare and
  • Other sources of instability are more common
  • How many reboots of Windows or MacOS are prompted by DL?

99
Deadlock Prevention
  • Ordered resource allocation is the most common example
  • Consider the link with two-phase locking's grow and shrink phases
  • Works but requires global view of all resources
  • A total order on resources must exist for the
    system
  • Process code must allocate resources in order
  • Under-utilizes resources when the period of use of a resource conflicts with the total resource order
  • Consider processes Pi and Pk using resources R1 and R2
  • Pi uses R1 90% of its execution time and R2 10%
  • Pk uses R2 90% of its execution time and R1 10%
  • One holds one resource far too long

100
Deadlock Avoidance
  • General method Refuse allocations that may lead
    to deadlock
  • Method for keeping track of states
  • Need to know resources required by a process
  • Banker's algorithm
  • Must know the maximum number of resources that may be allocated to Pi
  • Keep track of resources available
  • For each request, make sure maximum need will not
    exceed total available
  • Under utilizes resources
  • Never used
  • Advance knowledge not available and CPU-intensive

101
Deadlock Detection and Resolution
  • Attractive for two main reasons
  • Prevention and avoidance are hard, have
    significant overhead, and require information
    difficult or impossible to obtain
  • Deadlock is comparatively rare in most systems, so a form of the argument for optimistic concurrency control applies: detect and fix comparatively rare situations
  • Availability of transactions helps
  • DL resolution requires us to kill some
    participant(s)
  • Transactions are designed to be rolled back and
    restarted

102
Centralized Deadlock Detection
  • General method Construct a resource graph and
    analyze it
  • Analyze through resource reductions
  • If cycle exists after analysis, deadlock has
    occurred
  • Processes in cycle are deadlocked
  • Local graphs on each machine
  • Pi requests R1
  • R1's machine places the request in its local graph
  • If cycle exists in local graph, perform
    reductions to detect deadlock
  • Need to calculate union of all local graphs
  • Deadlock cycle may transcend machine boundaries

103
Graph Reduction
  • Cycles don't always mean deadlock!

(Figure: two resource graphs over processes P1, P2, and P3; after reduction one still contains a cycle and is deadlocked, while the other reduces completely, so there is no deadlock.)
104
Waits-For Graphs (WFGs)
  • Based on Resource Allocation Graph (SR)
  • An edge from Pi to Pj
  • means Pi is waiting for Pj to release a resource
  • Replaces two edges in SR graph
  • Pi → R
  • R → Pj
  • Deadlocked when a cycle is found
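Deadlock detection on a waits-for graph reduces to cycle detection. A sketch using depth-first search over an adjacency map (our own representation, not from the notes):

    # Sketch: detect a cycle in a waits-for graph (process -> processes it waits for).
    def has_deadlock(wfg):
        visiting, done = set(), set()

        def dfs(p):
            if p in visiting:          # back edge: a cycle of waiting processes
                return True
            if p in done:
                return False
            visiting.add(p)
            for q in wfg.get(p, ()):
                if dfs(q):
                    return True
            visiting.remove(p)
            done.add(p)
            return False

        return any(dfs(p) for p in wfg)

    # Example: P1 waits for P2, P2 waits for P3, P3 waits for P1 -> deadlock.
    print(has_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))  # True
    print(has_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": []}))      # False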

105
Centralized Deadlock Detection
  • All hosts communicate resource state to
    coordinator
  • Construct global resource graph on coordinator
  • Coordinator must be reliable and fast
  • When to construct the graph is an important
    choice
  • Report every