Distributed Transaction Management

Slides: 69
Provided by: CIT788
1
Distributed Transaction Management
  • Transactions and Data Synchronization
  • Flat and Nested Distributed Transactions
  • Atomic Commit Protocol
  • Concurrency Control for Distributed Transactions
  • Distributed Deadlock Avoidance and Detection

2
Process/Data Synchronization
  • Consider a client/server database system
  • To improve system performance, the server may
    execute multiple processes (threads) concurrently
    on behalf of different clients
  • Each process may access a set of data objects
    maintained in a database
  • Some of the data objects are accessed (shared) by
    more than one process concurrently (i.e., a
    second process accesses the object before the
    first one has finished)
  • A process may update the value of a shared data
    object and so affect the execution result of
    another process that is accessing the same data
    object
  • The synchronization of accesses to shared data
    objects from different processes is called data
    synchronization
  • The purposes of data synchronization are to
    ensure
  • The correctness of data objects, which are
    persistent objects
  • What are persistent objects? Data items stored in
    a database, e.g., your name or your address
  • The correctness of the results returned from
    process execution

3
Process/Data Synchronization
4
Process/Data Synchronization
  • How to achieve the objectives?
  • By prevention or avoidance (controlling the
    interleaving of process execution)
  • What is the difference between prevention and
    avoidance?
  • Two processes cannot access the same data
    object at the same time
  • When one process invokes a method (operation) to
    access a data object, the data object is
    locked (mutual exclusion)
  • We cannot look up an account and at the same time
    withdraw from the same account
  • What are the methods/approaches for achieving
    mutual exclusion?
  • Operations that are free from interference from
    concurrent operations (processes) and have to be
    done in a single step are called atomic
    operations (they cannot be stopped in the middle)
  • Cost: the concurrency of process execution is
    lowered
  • Poorer performance (longer response time)
  • Longer waiting time for accessing a shared object

5
What is a transaction?
  • From a process to a transaction
  • A process is normally created for a specific
    purpose, and each process may access multiple
    data items
  • Specific purpose: e.g., buying a ticket or having
    a dinner. It has a corresponding event in the
    real world and is not just a simulated event in
    the computer
  • Definition of a transaction from the user's
    viewpoint
  • The execution of a program to perform a function
    (or functions) by accessing shared data items
    (a database), usually on behalf of a user
    (application)
  • A transaction
  • Is a process (or a set of concurrent processes)
  • Each process consists of multiple atomic steps
  • Each step is called an operation
  • Two types of operations
  • Database operations: a collection of operations,
    usually read and write, on the database, together
    with some computation
  • Transaction operations: a begin operation and an
    end operation
  • ACID requirements (atomicity, consistency,
    isolation and durability)

6
Distributed Transactions
  • A transaction becomes distributed if it invokes
    operations at different servers
  • Requirements in processing a distributed
    transaction
  • Maintain data consistency (ensure the correctness
    of results and of the values of data items in the
    database)
  • Use a distributed concurrency control protocol
  • Maintain atomicity of distributed transactions
  • The different processes of the same transaction
    must end up with the same termination decision
    (commit or abort); this is much more difficult
    than in a centralized system. Why?
  • Use an atomic commit protocol (all commit or all
    abort)
  • An atomic commit protocol must have a failure
    model to ensure transaction atomicity and
    durability even when different types of failures
    (process and network failures) occur
  • Consider using a transaction model other than the
    flat model to minimize the cost of transaction
    abort
  • What is a transaction model? What is a flat
    transaction model?
  • Nested transaction model

7
Transaction structure and database consistency

[Figure: Begin Transaction, Execution of Transaction, End Transaction. The database is in a consistent state before Begin Transaction and after End Transaction; it may be temporarily in an inconsistent state during execution.]
8
Operations in Coordinator interface
openTransaction() -> trans
  Starts a new transaction and delivers a unique TID
  trans. This identifier will be used in the other
  operations in the transaction.
closeTransaction(trans) -> (commit, abort)
  Ends a transaction: a commit return value indicates
  that the transaction has committed; an abort return
  value indicates that it has aborted.
abortTransaction(trans)
  Aborts the transaction.
The abort cost could be heavy if the transaction
is long
9
Transaction life histories
Successful          Aborted by client    Aborted by server
openTransaction     openTransaction      openTransaction
operation           operation            operation
operation           operation            operation
...                 ...                  server aborts transaction
operation           operation            operation ERROR reported to client
closeTransaction    abortTransaction
10
Distributed Transactions
  • Distributed transactions
  • Multiple processes need to be created for a
    transaction at different servers to access the
    distributed data objects maintained by the
    servers
  • The multiple processes of a transaction are
    coordinated by a coordinator (distributed
    transaction model)
  • A client starts a transaction by sending an
    openTransaction request to a server that manages
    the required data objects
  • The coordinator returns the TID to the client and
    is responsible for committing or aborting the
    transaction at its end
  • The process responsible for accessing the
    required data objects is called a participant. It
    is responsible for keeping track of all the
    recoverable objects (objects that can be restored
    to their original values) for a transaction
  • Problems in the execution of distributed
    transactions
  • Long execution time
  • Highly affected by the performance of the network
    (retransmission)
  • The impact of message loss and process failure
    could be very serious

11
Distributed Transaction Model
[Figure: example with a master process and Cohort 1 at Site 1, Cohort 2 at Site 2, Cohort 3 at Site 3]
12
Distributed Transactions
  • Flat vs. nested distributed transactions
  • Flat distributed transaction
  • No sub-transactions -> a single thread of control
  • One coordinator
  • Multiple participants
  • Sequential execution of participants
  • Nested distributed transactions
  • Sub-transactions and nested sub-transactions
  • Each sub-transaction starts after its parent and
    finishes before it
  • One sub-transaction -> one coordinator for
    coordinating its participants
  • A participant can be a coordinator if it has
    sub-transactions
  • Different coordinators could have different
    commit/abort decisions
  • Parallel execution of sub-transactions is
    possible
  • Atomicity may only be applied at sub-transaction
    level

13
Distributed Transactions
(a) Flat transaction
(b) Nested transactions
X, Y, Z are servers connected by a network
14
Nested Banking Transaction (Skip)
A client transfers 10 from account A to C, and
then transfers 20 from B to D. A and B are at
separate servers X and Y, and C and D are at
server Z
15
A Distributed Banking Transaction
Client transaction T transfers 4 from account A
to account C, and then transfers 3 from account
B to D
16
Two-phase Commit Protocol
  • Ensure atomicity and durability properties of
    distributed transactions
  • All the processes of a transaction/sub-transaction
    have to reach the same decision (commit/abort)
  • A process cannot reverse its decision after it
    has reached one
  • The (global) commit decision can only be reached
    if all processes voted yes
  • A Yes vote means that the participant is willing
    to commit
  • Once a transaction has committed, all its effects
    become permanent even if failures occur
  • The commit/abort decision is made by the
    coordinator of a transaction (sub-transaction)
  • The coordinator collects votes from its
    participants through message exchanges

17
Operations for Two-Phase Commit
canCommit?(trans) -> Yes / No
  Call from coordinator to participant to ask whether
  it can commit a transaction. Participant replies
  with its vote.
doCommit(trans)
  Call from coordinator to participant to tell
  participant to commit its part of a transaction.
doAbort(trans)
  Call from coordinator to participant to tell
  participant to abort its part of a transaction.
haveCommitted(trans, participant)
  Call from participant to coordinator to confirm
  that it has committed the transaction.
getDecision(trans) -> Yes / No
  Call from participant to coordinator to ask for the
  decision on a transaction after it has voted Yes
  but has still had no reply after some delay. Used
  to recover from server crash or delayed messages.
18
2PC Steps
  • Phase 1 (voting phase): collect the decision
    (commit or abort) of each individual process
    (participant) of a distributed transaction
  • Phase 2 (decision phase): make the final global
    commit or abort decision and ensure that
    everybody writes the results into the database
  • Global Commit Rule
  • Aborts a transaction if and only if at least one
    participant votes to abort
  • Commits a transaction if and only if all of the
    participants vote to commit

19
The Two-phase Commit Protocol
Phase 1 (voting phase)
1. The coordinator sends a canCommit? request to each
   of the participants in the transaction.
2. When a participant receives a canCommit? request,
   it replies with its vote (Yes or No) to the
   coordinator. Before voting Yes, it prepares to
   commit by saving objects in permanent storage. If
   the vote is No, the participant aborts
   immediately.
Phase 2 (completion according to outcome of vote)
3. The coordinator collects the votes (including its
   own).
   (a) If there are no failures and all the votes are
       Yes, the coordinator decides to commit the
       transaction and sends a doCommit request to
       each of the participants.
   (b) Otherwise, the coordinator decides to abort
       the transaction and sends doAbort requests to
       all participants that voted Yes.
4. Participants that voted Yes are waiting for a
   doCommit or doAbort request from the coordinator.
   When a participant receives one of these
   messages, it acts accordingly and, in the case of
   commit, makes a haveCommitted call as confirmation
   to the coordinator.
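The four steps above can be sketched in a few lines of Java. This is only an illustrative, single-process sketch under simplifying assumptions: there are no failures, no logging and no network, the coordinator's own vote is taken to be Yes, and the class name TwoPhaseCommitSketch, the Participant class and its state strings are hypothetical names chosen for this example (only canCommit, doCommit and doAbort come from the slides).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative in-memory sketch of two-phase commit (no failures, no logging). */
class TwoPhaseCommitSketch {

    /** Hypothetical participant: votes, then obeys the coordinator's decision. */
    static class Participant {
        final boolean willVoteYes;
        String state = "ACTIVE";

        Participant(boolean willVoteYes) { this.willVoteYes = willVoteYes; }

        boolean canCommit() {
            // A real participant saves its objects to permanent storage before
            // voting Yes; a No vote means it aborts immediately.
            state = willVoteYes ? "PREPARED" : "ABORTED";
            return willVoteYes;
        }
        void doCommit() { state = "COMMITTED"; }
        void doAbort()  { state = "ABORTED"; }
    }

    /** Runs both phases and returns true if the transaction commits. */
    static boolean run(List<Participant> participants) {
        // Phase 1: ask every participant for its vote.
        List<Participant> votedYes = new ArrayList<>();
        for (Participant p : participants) {
            if (p.canCommit()) votedYes.add(p);
        }
        // Phase 2: commit only if all votes are Yes; otherwise tell the
        // participants that voted Yes to abort.
        boolean commit = votedYes.size() == participants.size();
        for (Participant p : votedYes) {
            if (commit) p.doCommit(); else p.doAbort();
        }
        return commit;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(new Participant(true), new Participant(true))));  // true
        System.out.println(run(Arrays.asList(new Participant(true), new Participant(false)))); // false
    }
}
```

A fuller version would add the log writes, the haveCommitted acknowledgement and the time-out handling described on the failure-handling slides.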
20
Communication in Two-phase Commit
21
Uncertainty Period in 2PC
  • A participant is in its uncertainty period after
    it sends a Yes vote to the coordinator
  • It has to wait for the final decision from the
    coordinator and cannot decide to abort on its own
  • The period ends when it receives a commit or
    abort message
  • What should a participant do if, after waiting
    for a long period of time, it still has not
    received any decision from its coordinator?
    Time out and then decide to abort?
  • The coordinator has no uncertainty period since
    it decides as soon as it votes

22
Failure Handling in 2PC
  • Two Phase Commit is resilient to all types of
    failures in which no log information is lost
  • It deals with the problem of failures by writing
    log information into stable storage before any
    decision is made
  • begin_commit, commit, end_of_tran (coordinator)
  • ready log (abort log), commit log (participants)
  • 2PC uses time-out to resolve the failures (making
    an assumption on the longest time to receive a
    reply)
  • A time-out occurs when it cannot get an expected
    message from another process within the expected
    time period
  • Failure types
  • site failures - participant fails or coordinator
    fails
  • lost messages, e.g.,
  • an answer from a participant is lost
  • a vote request (canCommit?) is lost
  • the final decision is lost

23
Failure Handling in 2PC
  • Participant fails before writing the vote into
    the log
  • The coordinator's time-out expires and it decides
    to abort
  • Participant fails after having written the vote
    into the log
  • When the participant recovers, it aborts if its
    vote is No
  • If its vote is Yes, it tries to find out the
    final decision from the coordinator or other
    participants (the log must be written before
    sending out the vote)
  • Coordinator fails after having written the vote
    request but before having written the final
    decision
  • All participants that have already answered Yes
    must wait for the recovery of the coordinator for
    the final decision
  • When the coordinator recovers, it sends vote
    requests again

24
Failure Handling in 2PC
  • Coordinator fails after having written the final
    decision but before having written the complete
    record
  • When the coordinator recovers, it sends the
    decision to all the participants
  • The answer of a participant is lost
  • The coordinator times out and decides to abort
  • The vote request message is lost
  • The coordinator times out and decides to abort
  • The final decision is lost
  • The participants that have voted Yes have to
    wait. After the time-out interval, they must ask
    the coordinator for the final decision

25
Performance of 2PC
  • Measures: time duration and number of
    communication messages
  • Performance of 2PC depends very much on the
    system architecture
  • Centralized
  • Hierarchical
  • Linear
  • Centralized (for N participants)
  • Total number of messages: 4(N-1)
  • Time delay: 4T
  • Hierarchical
  • Depends on configuration
  • The number of messages may be smaller and the
    time may be longer
  • Linear
  • The number of messages will be smallest but the
    time required is the longest

26
Centralized 2PC

27
Linear 2PC

28
Distributed 2PC

29
2PC for Nested Transactions (Skip)
  • When a sub-transaction completes, it makes an
    independent decision either to commit
    provisionally or abort. If a parent aborts, all
    its sub-transactions are forced to abort
  • When a sub-transaction provisionally commits, it
    reports its status and the status of its
    descendants to its parent (coordinator)
  • If a nested sub-transaction aborts, it just
    reports abort to its parent without giving any
    information about its descendants
  • The top level transaction has a list of all the
    sub-transactions in the tree together with their
    commit status
  • Descendants of aborted sub-transactions are
    omitted from the list

30
2PC for Nested Transactions (Skip)
  • The parent may commit even if some of its
    children have decided to abort
  • After all its sub-transactions have completed,
    the provisionally committed sub-transactions
    participate in a 2PC
  • Note: a provisional commit is not backed up in
    stable storage. In case of a crash, it cannot be
    recovered. A provisional commit only indicates
    that a sub-transaction has completed successfully
    and will probably agree to commit when it is
    subsequently asked to
  • Normally, the only reason for a participant
    sub-transaction being unable to commit is that it
    has crashed since it completed its provisional
    commit

31
2PC for Nested Transactions (Skip)
  • How to identify the set of sub-transactions for
    commit in the hierarchy?
  • a hierarchy approach
  • a linear approach
  • A sub-transaction is an orphan if one of its
    ancestors aborts. It will not take part in the
    commit decision and will eventually be aborted
  • A client starts a set of nested transactions by
    opening a top-level transaction with an
    openTransaction operation, and a TID is assigned
  • A client starts a sub-transaction by invoking the
    openSubTransaction operation whose argument
    specifies its parent transaction
  • The new sub-transaction joins the parent
    transaction and a TID for it is returned

32
Operations in Coordinator for Nested
Transactions (Skip)
openSubTransaction(trans) -> subTrans
  Opens a new subtransaction whose parent is trans
  and returns a unique subtransaction identifier.
getStatus(trans) -> committed, aborted, provisional
  Asks the coordinator to report on the status of the
  transaction trans. Returns values representing one
  of the following: committed, aborted, provisional.
33
Transaction T decides whether to commit (Skip)
34
Information held by coordinators of nested
transactions (Skip)
Coordinator of   Child          Participant           Provisional    Abort list
transaction      transactions                         commit list
T                T1, T2         yes                   T1, T12        T11, T2
T1               T11, T12       yes                   T1, T12        T11
T2               T21, T22       no (aborted)                         T2
T11                             no (aborted)                         T11
T12, T21                        T12 but not T21       T21, T12
T22                             no (parent aborted)   T22
35
Hierarchical 2PC for Nested Transactions (Skip)
  • The coordinator of the top-level transaction
    sends canCommit? to the coordinators of the
    sub-transactions for which it is the immediate
    parent, level by level
  • canCommit?(trans, subTrans) -> Yes / No
  • Call from a coordinator to the coordinator of a
    child subtransaction to ask whether it can commit
    the subtransaction subTrans. The first argument,
    trans, is the transaction identifier of the
    top-level transaction. Participant replies with
    its vote Yes / No
  • Each participant collects the replies from its
    descendants (sub-transactions) before replying to
    its parent (coordinator)

36
Flat 2PC for Nested Transactions (Skip)
  • The coordinator of the top-level transaction
    sends canCommit? to the coordinators of all the
    sub-transactions at all levels
  • The list of aborted sub-transactions is included
    in the message to eliminate them from the commit
    procedure
  • canCommit?(trans, abortList) -> Yes / No
  • Call from coordinator to participant to ask
    whether it can commit a transaction. Participant
    replies with its vote Yes / No
  • When a participant (coordinator) receives a
    canCommit? request
  • If the participant has any provisionally
    committed transactions that are descendants of
    the top-level transactions, trans
  • Check that they do not have aborted ancestors in
    the abortList. Then prepare to commit
  • Those with aborted ancestors are aborted
  • Otherwise, send a Yes reply to the coordinator

37
Concurrency Control using Locking
  • Simple locks (atomic operations) cannot resolve
    the data synchronization problem for transactions
    (the schedule could be non-serializable)
  • Strict execution: delay the reading and updating
    of a data object until the previous transaction
    that updated the same data object has
    committed or aborted
  • An execution is recoverable if all the effects
    of an aborted transaction can be removed
    (all-or-none property)
  • To ensure recoverability
  • If a transaction has read uncommitted data, do
    not allow it to commit before the transaction
    that wrote the data has committed

38
Recoverability Example
An unrecoverable schedule due to a dirty read

Transaction T: BankDeposit(A, 3)      Transaction U: BankDeposit(A, 5)
balance = A.Read()          100
A.Write(balance + 3)        103
                                      balance = A.Read()          103
                                      A.Write(balance + 5)        108
                                      Commit transaction
Abort transaction
39
Concurrency Control using Locking
  • Methods have to be designed to work with locking
  • E.g., two-phase locking
  • Lock a data object before accessing
    (reading/writing) it (growing phase)
  • Once a transaction releases a lock, it cannot
    submit any further lock request (shrinking phase)
  • E.g., locks are released just before the commit
    of a transaction
  • No sharing of uncommitted data objects in
    conflicting modes among concurrently executing
    transactions
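The two-phase rule (no lock request after the first release) can be checked mechanically. The following is a small sketch, not part of the original slides: the class name TwoPhaseLockingCheck and the string encoding of actions ("lock A", "unlock A") are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

/** Checks the two-phase rule on one transaction's sequence of lock actions. */
class TwoPhaseLockingCheck {

    /**
     * Each action is a string such as "lock A" or "unlock A" (a hypothetical
     * encoding chosen for this sketch). Returns true if every lock request
     * comes before the first unlock, i.e. a growing phase followed by a
     * shrinking phase.
     */
    static boolean isTwoPhase(List<String> actions) {
        boolean shrinking = false;
        for (String action : actions) {
            if (action.startsWith("unlock ")) {
                shrinking = true;             // the first release ends the growing phase
            } else if (action.startsWith("lock ") && shrinking) {
                return false;                 // lock requested after a release
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isTwoPhase(Arrays.asList("lock A", "lock B", "unlock A", "unlock B"))); // true
        System.out.println(isTwoPhase(Arrays.asList("lock A", "unlock A", "lock B", "unlock B"))); // false
    }
}
```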

40
Transactions T and U with Exclusive Locks

Transaction T:                      Transaction U:
  balance = b.getBalance()            balance = b.getBalance()
  b.setBalance(bal*1.1)               b.setBalance(bal*1.1)
  a.withdraw(bal/10)                  c.withdraw(bal/10)

Transaction T                       Transaction U
Operations              Locks       Operations              Locks
openTransaction
bal = b.getBalance()    lock B
b.setBalance(bal*1.1)               openTransaction
a.withdraw(bal/10)      lock A      bal = b.getBalance()    waits for T's lock on B
closeTransaction        unlock A, B
                                                            lock B
                                    b.setBalance(bal*1.1)
                                    c.withdraw(bal/10)      lock C
                                    closeTransaction        unlock B, C
41
Lock Compatibility
For one object                  Lock requested
                                read      write
Lock already set     none       OK        OK
                     read       OK        wait
                     write      wait      wait
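The compatibility table can be written as a single predicate. This is a minimal sketch; the class name LockCompatibility and the enum Mode are hypothetical names introduced for this example.

```java
/** Lock compatibility for a single object, following the read/write table. */
class LockCompatibility {

    enum Mode { NONE, READ, WRITE }

    /**
     * Returns true (OK) if the requested lock can be granted given the lock
     * already set by another transaction, false if the requester must wait.
     */
    static boolean compatible(Mode alreadySet, Mode requested) {
        if (alreadySet == Mode.NONE) return true;                   // nothing held: always OK
        return alreadySet == Mode.READ && requested == Mode.READ;   // only read/read can share
    }

    public static void main(String[] args) {
        for (Mode held : Mode.values()) {
            for (Mode req : new Mode[] { Mode.READ, Mode.WRITE }) {
                System.out.println(held + " held, " + req + " requested: "
                        + (compatible(held, req) ? "OK" : "wait"));
            }
        }
    }
}
```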
42
Use of locks in Strict Two-phase Locking
1. When an operation accesses an object within a
   transaction:
   (a) If the object is not already locked, it is
       locked and the operation proceeds.
   (b) If the object has a conflicting lock set by
       another transaction, the transaction must wait
       until it is unlocked.
   (c) If the object has a non-conflicting lock set
       by another transaction, the lock is shared and
       the operation proceeds.
   (d) If the object has already been locked in the
       same transaction, the lock will be promoted if
       necessary and the operation proceeds. (Where
       promotion is prevented by a conflicting lock,
       rule (b) is used.)
2. When a transaction is committed or aborted, the
   server unlocks all objects it locked for the
   transaction.
43
Lock Class
import java.util.Vector;

public class Lock {
    private Object object;       // the object being protected by the lock
    private Vector holders;      // the TIDs of current holders
    private LockType lockType;   // the current type

    public synchronized void acquire(TransID trans, LockType aLockType) {
        while (/* another transaction holds the lock in conflicting mode */) {
            try {
                wait();
            } catch (InterruptedException e) { /* ... */ }
        }
        if (holders.isEmpty()) {   // no TIDs hold lock
            holders.addElement(trans);
            lockType = aLockType;
        } else if (/* another transaction holds the lock, share it */) {
            if (/* this transaction not a holder */)
                holders.addElement(trans);
        } else if (/* this transaction is a holder but needs a more exclusive lock */) {
            lockType.promote();
        }
    }
44
continued
    public synchronized void release(TransID trans) {
        holders.removeElement(trans);   // remove this holder
        // set locktype to none
        notifyAll();
    }
}
45
LockManager Class
import java.util.Enumeration;
import java.util.Hashtable;

public class LockManager {
    private Hashtable theLocks;

    public void setLock(Object object, TransID trans, LockType lockType) {
        Lock foundLock;
        synchronized (this) {
            // find the lock associated with object
            // if there isn't one, create it and add to the hashtable
        }
        foundLock.acquire(trans, lockType);
    }

    // synchronize this one because we want to remove all entries
    public synchronized void unLock(TransID trans) {
        Enumeration e = theLocks.elements();
        while (e.hasMoreElements()) {
            Lock aLock = (Lock) (e.nextElement());
            if (/* trans is a holder of this lock */)
                aLock.release(trans);
        }
    }
}
46
Locking Rules for Nested Transactions (Skip)
  • Two rules
  • Each set of nested transactions is a single
    entity that must be prevented from observing the
    partial effects of any other set of nested
    transactions
  • Each transaction within a set of nested
    transactions must be prevented from observing the
    partial effects of the other transactions in the
    set
  • Every lock that is acquired by a successful
    sub-transaction is inherited by its parent when
    it completes
  • Parent transactions are not allowed to run
    concurrently with their child transactions
  • Sub-transactions at the same level are allowed to
    run concurrently. When they access the same
    objects, locks serialize their accesses
  • For a sub-transaction to acquire a read/write
    lock, no other transaction can have a write lock
    on it except the parent transaction
  • When a sub-transaction commits, its locks are
    inherited by its parent
  • When a sub-transaction aborts, its locks are
    discarded

47
Nested Transactions Example (Skip)
Suppose T1, T2 and T11 all access a common object.
Suppose that T1 accesses the object first and
successfully acquires a lock, which it passes on to
T11 for the duration of its execution, getting it
back when T11 completes. When T1 completes, T
inherits the lock and passes it to T2
48
Concurrency Control for Distributed Transactions
  • Distributed Locking Vs centralized Locking
  • Centralized locking
  • A central server maintains a lock table
  • The central server is responsible for the locking
    of all the data objects in the system
  • All data access requests will be forwarded to the
    central server for locking first
  • Distributed locking
  • Each server maintains a lock table for the
    locking of the data objects managed by it
  • A data access request will be forwarded to the
    server responsible for locking that data object

49
Distributed Deadlocks
  • Deadlock involving processes located at more than
    one server is called distributed deadlock
  • Using time-outs to resolve deadlock is clumsy and
    may result in unpredictable performance (repeated
    restarts of a large number of transactions). Why?
  • Deadlock avoidance vs. deadlock detection
  • Deadlock avoidance
  • To prevent the formation of a deadlock cycle
  • I.e., add rules for serving lock requests such
    that a deadlock cycle is impossible to form
    (i.e., a change in the locking procedure)
  • Deadlock detection
  • Follow the original rules for granting a lock
  • Periodically (or conditionally) search the
    wait-for graph (WFG) for a deadlock cycle
  • In a distributed deadlock, the (global) WFG is
    partitioned across multiple servers
  • How to search the global WFG without incurring
    heavy workload (on the network and servers)?

50
Deadlock Avoidance using TS
  • Deadlock avoidance: prevent a potential deadlock
    from becoming a deadlock
  • Each transaction is assigned a unique time-stamp,
    e.g., its creation time (in a distributed
    database: creation time + site ID)
  • Wait-die Rule (non-preemptive)
  • If Ti requests a lock that is already held by
    Tj, Ti is permitted to wait if and only if Ti is
    older than Tj (Ti's time-stamp is smaller than
    that of Tj)
  • If Ti is younger than Tj, Ti is restarted with
    the same time-stamp
  • When Ti requests the same lock for the second
    time, Tj may already have finished its execution
  • Wound-Wait Rule (preemptive)
  • If Ti requests a lock that is already locked by
    Tj, Ti is permitted to wait if and only if Ti is
    younger than Tj
  • Otherwise, Tj is restarted (with the same
    time-stamp) and the lock is granted to Ti

51
Deadlock Avoidance using TS
  • If TS(Ti) < TS(Tj), Ti waits, else Ti dies
    (Wait-die)
  • If TS(Ti) < TS(Tj), Ti wounds Tj, else Ti waits
    (Wound-wait)
  • Note: a smaller TS means the transaction is older
  • Note: both methods restart the younger transaction
  • Both methods prevent cyclic wait
  • Consider the deadlock cycle
    T1->T2->T3->...->Tn->T1
  • It is impossible: if T1 waits (through
    T2, ..., Tn-1) for Tn, then Tn is not allowed to
    wait for T1
  • Wait-die: the older transaction is allowed to wait
  • Wound-wait: the older transaction is allowed to
    get the lock
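Both rules reduce to one timestamp comparison made when a requester finds the lock held by another transaction. The sketch below is illustrative only; the class name TimestampRules and the Outcome enum are hypothetical, and timestamps are assumed to be unique, as stated in the slides.

```java
/** Timestamp-based deadlock avoidance; a smaller timestamp means an older transaction. */
class TimestampRules {

    enum Outcome { REQUESTER_WAITS, REQUESTER_RESTARTS, HOLDER_RESTARTS }

    /** Wait-die: an older requester waits; a younger requester dies (restarts). */
    static Outcome waitDie(long tsRequester, long tsHolder) {
        return tsRequester < tsHolder ? Outcome.REQUESTER_WAITS : Outcome.REQUESTER_RESTARTS;
    }

    /** Wound-wait: an older requester wounds (restarts) the holder; a younger requester waits. */
    static Outcome woundWait(long tsRequester, long tsHolder) {
        return tsRequester < tsHolder ? Outcome.HOLDER_RESTARTS : Outcome.REQUESTER_WAITS;
    }

    public static void main(String[] args) {
        long tsU = 1, tsT = 2;   // U is older than T
        System.out.println(waitDie(tsT, tsU));    // T requests a lock held by U: REQUESTER_RESTARTS
        System.out.println(woundWait(tsU, tsT));  // U requests a lock held by T: HOLDER_RESTARTS
    }
}
```

In both cases the transaction that restarts is the younger one, which is why neither rule can produce a cycle of waiting transactions.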

52
Deadlock Example
Transaction U (TS of U < TS of T)    Transaction T
Read (A)
Write (B)
                                     Read (C)
                                     Write (A) (blocked)
Write (C) (blocked)
Deadlock formed

53
Deadlock Example (wait-die)
Transaction U (TS of U < TS of T)    Transaction T
Read (A)
Write (B)
                                     Read (C)
                                     Write (A) (restarts)
                                     T is restarted since it is younger than U;
                                     T releases its read lock on C before restart
Write (C)

54
Deadlock Example (wound-wait)
Transaction U (TS of U < TS of T)    Transaction T
Read (A)
Write (B)
                                     Read (C)
                                     Write (A) (blocked, since T is younger than U)
Write (C)
T is restarted by U since T is younger than U.
The write lock on C is granted to U after T has
released its read lock on C

55
Deadlock Resolution by time-out
  • A simple method to break a deadlock cycle is the
    time-out method
  • Once a deadlock is formed, it will exist forever
    until it is resolved
  • In the time-out method, two parameters are
    defined: a time-out period (TP) and a time-out
    checking period (TCP). Normally, TP >> TCP
  • The time-out checking period defines how often
    the blocked transactions (at the lock table) are
    checked for deadlock
  • If a transaction has been blocked for longer than
    the time-out period, it is restarted, as it is
    assumed to be involved in a deadlock
  • So, no deadlock cycle exists in the system for
    longer than TP + TCP

56
Deadlock Resolution by time-out
  • The problems in using the time-out method
  • How to define the time-out period (and TCP)
  • If it is large, a deadlock cycle will exist in
    the system for a long period of time
  • If it is small, many transactions will be
    restarted even though they are not involved in
    any deadlocks (false deadlock)
  • The advantages
  • Simple to implement; the overhead of the
    time-out method is low and depends on the values
    of TCP and TP
  • No undetected deadlock (can resolve all deadlocks)
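The periodic check can be sketched as follows. This is a hedged illustration, not the slides' implementation: the class name TimeoutDeadlockResolver, its methods and the use of millisecond timestamps passed in by the caller are all assumptions. The caller is expected to invoke check once per checking period (TCP).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of deadlock resolution by time-out over the blocked transactions. */
class TimeoutDeadlockResolver {

    // Transaction id -> time (ms) at which it became blocked.
    private final Map<String, Long> blockedSince = new LinkedHashMap<>();
    private final long timeoutPeriodMs;   // TP

    TimeoutDeadlockResolver(long timeoutPeriodMs) {
        this.timeoutPeriodMs = timeoutPeriodMs;
    }

    void blocked(String tid, long nowMs) { blockedSince.put(tid, nowMs); }
    void unblocked(String tid)           { blockedSince.remove(tid); }

    /** Called once per checking period (TCP); returns the transactions to restart. */
    List<String> check(long nowMs) {
        List<String> victims = new ArrayList<>();
        for (Map.Entry<String, Long> e : blockedSince.entrySet()) {
            // Blocked for longer than TP: assumed to be involved in a deadlock.
            if (nowMs - e.getValue() > timeoutPeriodMs) victims.add(e.getKey());
        }
        blockedSince.keySet().removeAll(victims);   // restarted transactions are no longer blocked
        return victims;
    }

    public static void main(String[] args) {
        TimeoutDeadlockResolver r = new TimeoutDeadlockResolver(100);
        r.blocked("T", 0);
        r.blocked("U", 80);
        System.out.println(r.check(50));    // []
        System.out.println(r.check(120));   // [T]
    }
}
```

Note that the check restarts any transaction blocked longer than TP, whether or not it is really in a cycle, which is the false-deadlock problem listed above.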

57
Interleaving of Transactions U, V and W
U                           V                           W
d.deposit(10)   lock D
                            b.deposit(10)   lock B at Y
a.deposit(20)   lock A at X
                                                        c.deposit(30)   lock C at Z
b.withdraw(30)  wait at Y
                            c.withdraw(20)  wait at Z
                                                        a.withdraw(20)  wait at X
58
Distributed Deadlock
(a)
(b)
59
Local and Global wait-for Graphs
60
Distributed Deadlock Detection
  • Merging the local WFGs at different servers to
    build a global wait-for graph
  • How and when to submit the local WFG at a server
    to other servers?
  • Too frequent: heavy overhead
  • Too infrequent:
  • phantom deadlock (a deadlock is detected but it
    is not a real one)
  • Blocking time is long
  • Edge chasing
  • To reduce the detection overhead:
  • Send the blocking relationship when a transaction
    is blocked
  • Do not send the local WFG to all servers, only to
    those where there is a potential deadlock
  • Do not send the whole WFG, only the nodes that
    are sufficient for deadlock detection

61
Edge Chasing
  • The servers attempt to find deadlock cycles by
    forwarding probes, which follow the edges of the
    graph throughout the distributed system
  • A probe consists of transaction wait-for
    relationships representing a path in the global
    wait-for graph (<T->U> indicates that T is
    blocked by U)
  • When a probe returns to the server that generated
    it, a distributed deadlock is detected
  • Initiation step
  • When a server notes that a transaction T starts
    waiting for another blocked transaction U (lock
    reject), it initiates a detection by sending a
    probe containing the edge <T->U> to the server
    of the object at which U is blocked. If U is
    sharing a lock, probes are sent to all the
    holders of the lock

62
Edge Chasing
  • Detection step
  • When a server receives a probe <T->U>, it checks
    whether U is still waiting. If yes (e.g., waiting
    for V), it adds the edge to the probe, giving
    <T->U->V>. If V is blocked, it forwards the
    updated probe to the server of the object for
    which V is waiting
  • Resolution: when a deadlock cycle is found, a
    transaction in the cycle is selected to roll back
    (release all its locks)
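The initiation and detection steps can be simulated in one process. The sketch below makes several simplifying assumptions that are not in the slides: the whole wait-for relation is held in one map instead of being partitioned across servers, each transaction waits for at most one other, and a deadlock is reported as soon as the probe reaches a transaction it already contains. The class name EdgeChasingSketch and the method probe are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Single-process sketch of edge chasing over a wait-for relation. */
class EdgeChasingSketch {

    /**
     * waitsFor maps each blocked transaction to the one it waits for. The
     * probe starts with the edge start -> waitsFor(start) and is extended one
     * edge at a time, as the servers would do by forwarding it. Returns the
     * cycle if the probe reaches a transaction it already contains, otherwise
     * an empty list.
     */
    static List<String> probe(Map<String, String> waitsFor, String start) {
        List<String> path = new ArrayList<>();
        path.add(start);
        String next = waitsFor.get(start);
        while (next != null) {
            int seenAt = path.indexOf(next);
            path.add(next);
            if (seenAt >= 0) {
                return new ArrayList<>(path.subList(seenAt, path.size()));  // deadlock cycle
            }
            next = waitsFor.get(next);   // forward only if this transaction is itself blocked
        }
        return new ArrayList<>();        // probe died out: no deadlock on this path
    }

    public static void main(String[] args) {
        Map<String, String> waitsFor = new HashMap<>();
        waitsFor.put("W", "U");
        waitsFor.put("U", "V");
        waitsFor.put("V", "W");
        System.out.println(probe(waitsFor, "W"));   // [W, U, V, W]
    }
}
```

Once a cycle is returned, one transaction in it would be chosen for rollback, as in the resolution step.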

63
Probes transmitted to detect deadlock
64
Edge Chasing with Priorities
  • U->W and V->T; at about the same time, T requests
    an object locked by U and W is blocked by V
  • Two probes are triggered and the deadlock cycle
    is detected twice
  • Transactions are prioritized to reduce the number
    of probes (from higher priority to lower
    priority: T > U > V > W)
  • Abort the lowest-priority transaction in the
    deadlock cycle
  • Reducing the number of probes
  • T->U: T has higher priority than U, so a probe is
    initiated
  • W->V: W has lower priority than V, so the probe
    is not sent
  • What if W blocked by V is the last edge formed in
    the cycle?

65
Two probes Initiated
(a) initial situation
(b) detection initiated at object requested by T
(c) detection initiated at object requested by W
66
Probe Travel Downhill
  • When a transaction starts waiting for an object,
    it forwards the probes in its queue to the server
    of the object, which propagates the probes on
    downhill routes
  • When U starts waiting for V, the coordinator of V
    will save the probe <U->V>
  • When V starts waiting for W, the coordinator of W
    will store <V->W> and V will forward its probe
    queue <U->V> to W
  • When W starts waiting for A, it will forward its
    probe queue <U->V->W> to the server of A, which
    also notes the dependency W->U and combines it
    with the information in the received probe to
    give U->V->W->U

67
Probes Travel Downhill
(a) V stores probe when U starts waiting
(b) Probe is forwarded when V starts waiting
68
References
  • Dollimore 13.1 to 13.41, 14.41, 14.5
  • Tanenbaum 5.6 and 7.5.1