DISTRIBUTED TRANSACTION

Provided by: bettypur

1
DISTRIBUTED TRANSACTION
  • FASILKOM
  • UNIVERSITAS INDONESIA

2
What is a Transaction?
  • An atomic unit of database access, which is
    either completely executed or not executed at
    all.
  • It consists of an application-specified sequence
    of operations, beginning with a begin_transaction
    primitive and ending with either commit or abort.

3
E.g.
  • Transfer 200 from account A in London to account
    B in Depok:
  • begin_transaction
  • amntA = lookup amount in account A
  • amntB = lookup amount in account B
  • if (amntA < 200) abort
  • set account A = amntA - 200
  • set account B = amntB + 200
  • commit
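
The transfer can be sketched in Python. This is a minimal single-process illustration, assuming an in-memory dict stands in for the database. The names `Abort`, `transfer`, and `accounts`, and the starting balances, are hypothetical and not from the slides. Updates go to a private copy that is installed only on commit, so an abort leaves the accounts untouched.

```python
class Abort(Exception):
    """Raised to abort the transaction; no changes are installed."""


def transfer(accounts, src, dst, amount):
    """Move `amount` from `src` to `dst`, all or nothing."""
    working = dict(accounts)              # begin_transaction: private copy
    amnt_src = working[src]               # lookup amount in source account
    amnt_dst = working[dst]               # lookup amount in destination account
    if amnt_src < amount:
        raise Abort("insufficient funds")  # abort: `accounts` is unchanged
    working[src] = amnt_src - amount
    working[dst] = amnt_dst + amount
    accounts.update(working)              # commit: install both updates together


accounts = {"A": 500, "B": 100}           # assumed starting balances
transfer(accounts, "A", "B", 200)
print(accounts)                           # {'A': 300, 'B': 300}

try:
    transfer(accounts, "A", "B", 1000)
except Abort:
    print("aborted, balances unchanged:", accounts)
```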

4
Transaction Properties
  • Four main properties, the ACID properties:
  • Atomicity: A transaction must be all or nothing.
  • Consistency: A transaction takes the system from
    one consistent state to another consistent state.
  • Isolation: The results of an incomplete
    transaction are not allowed to be revealed to
    other transactions.
  • Durability: The results of a committed
    transaction will never be lost, independent of
    subsequent failures.
  • Atomicity + durability -> failure tolerance

5
Failure Tolerance
  • Atomicity + durability -> failure tolerance
  • Types of failures:
  • Transaction-local failures detected by the
    application (e.g. insufficient funds)
  • Transaction-local failures not detected by the
    application (e.g. divide by zero)
  • System failures affecting volatile storage (e.g.
    CPU failure)
  • Media failures (e.g. HD crash)
  • What is volatile storage?
  • What is stable storage?

6
Recovery
  • Based on redundancy.
  • For example
  • 1.Periodically archive database
  • 2.Every time a change is made, record old and new
    values to a log.
  • 3.If a failure occurs
  • If there is no damage to the physical database, undo
    all unreliable changes.
  • If the database is physically damaged, restore from
    the archive and redo changes.

7
Logging (1)
  • Database vs transaction log.
  • For each change (and for each begin transaction,
    commit, and abort), write a log record with:
  • Transaction ID (TID)
  • Record ID
  • Type of action
  • Old value of record
  • New value of record
  • Other info, e.g. pointer to previous log record
    of this transaction.

8
Logging (2)
  • After a failure we need to undo or redo changes.
  • Undo and redo must be idempotent as there may be
    a failure whilst they are executing.
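
A minimal sketch of why logging old and new values makes undo and redo idempotent. The `LogRecord` fields mirror the log record contents listed on the Logging (1) slide; the class and function names and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    tid: str          # transaction ID
    record_id: str    # which database record was changed
    old_value: int    # value before the change (used by undo)
    new_value: int    # value after the change (used by redo)


def undo(db, rec):
    """Restore the old value. Writing a fixed value is idempotent."""
    db[rec.record_id] = rec.old_value


def redo(db, rec):
    """Reapply the new value. Also idempotent."""
    db[rec.record_id] = rec.new_value


db = {"A": 300}
rec = LogRecord(tid="T1", record_id="A", old_value=500, new_value=300)

undo(db, rec)
undo(db, rec)          # repeating undo after a crash mid-recovery changes nothing
print(db["A"])         # 500

redo(db, rec)
redo(db, rec)
print(db["A"])         # 300
```

By contrast, a log that recorded only a delta such as "subtract 200" would not be idempotent: applying it twice would subtract 400.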

9
Log Write-ahead Protocol (1)
  • Before performing any update, at least the undo
    portion of the log record must be written to
    stable storage.
  • Before committing a transaction, all log records
    must have been fully recorded on stable storage.
    The commit record is written after these.

10
Log Write-ahead Protocol (2)
  • Reason for the first rule:
  • If we change the log before the database:
  • log -> change -> crash: OK
  • log -> crash: OK
  • If we change the log after the database:
  • change -> log -> crash: OK
  • change -> crash: can't undo

11
Checkpointing (1)
  • How does the recovery manager know which
    transactions to undo and which to redo after a
    failure?
  • Naive approach:
  • Examine the entire log from the start. Look for begin
    transaction records:
  • if a corresponding commit record exists, redo;
  • if there's an abort, do nothing; and
  • if neither, undo.

12
Checkpointing (2)
  • Alternative
  • Every so often
  • 1) Force all log buffers to disk.
  • 2) Write a checkpoint record to disk containing
  • a) A list of all active transactions
  • b) The most recent log records for each
    transaction in a)
  • 3) Force all database buffers to disk - disk is
    now totally up-to-date.
  • 4) Write address of checkpoint record to fixed
    restart location (had better be atomic).

13
Checkpointing (3)
  • There are 5 categories of transaction

14
Recovery (1)
  • Look for the most recent checkpoint record.
  • For all transactions active at the checkpoint or
    started after it, we must:
  • undo all those still active at the failure
  • redo all others

15
Recovery (2)
  • Have 2 lists: undo and redo.
  • Initially, undo contains all TIDs in the checkpoint
    record; redo is empty.
  • 3 passes through the log:
  • Forwards from checkpoint to end:
  • If we find begin_transaction, add to undo list.
  • If we find commit, transfer from undo to redo
    list.
  • If we find abort, remove from undo list.
  • Backwards from end to checkpoint: undo.
  • Forwards from checkpoint to end: redo.
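
The first (forward) pass can be sketched as follows. This is an illustration under assumptions: the log is modeled as a list of (action, tid) pairs, and the function name and the sample log are hypothetical.

```python
def build_recovery_lists(checkpoint_active, log_after_checkpoint):
    """Forward pass from the checkpoint to the end of the log.

    `log_after_checkpoint` is a list of (action, tid) pairs in log order.
    Returns the (undo, redo) sets of transaction IDs.
    """
    undo = set(checkpoint_active)   # initially: all TIDs in the checkpoint record
    redo = set()                    # initially empty
    for action, tid in log_after_checkpoint:
        if action == "begin_transaction":
            undo.add(tid)
        elif action == "commit":
            undo.discard(tid)       # transfer from undo to redo
            redo.add(tid)
        elif action == "abort":
            undo.discard(tid)
    return undo, redo


# Hypothetical log: T1 and T2 were active at the checkpoint.
log = [
    ("begin_transaction", "T3"),
    ("commit", "T1"),
    ("begin_transaction", "T4"),
    ("abort", "T3"),
    ("commit", "T4"),
]
undo_list, redo_list = build_recovery_lists({"T1", "T2"}, log)
print(sorted(undo_list))   # ['T2']
print(sorted(redo_list))   # ['T1', 'T4']
```

The backward undo pass and the forward redo pass would then apply the logged old and new values for the transactions in each list.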

16
Commit Protocols
  • Commit protocols.
  • Assume a set of cooperating managers which deal
    with parts of a transaction.
  • For atomicity we must ensure that
  • At each site, either all actions or none are
    performed.
  • All sites take the same decision on whether to
    commit or abort

17
Two Phase Commit (2PC) Protocol - 1
  • One node, the coordinator, has a special role,
    the others are participants.
  • The coordinator initiates the 2PC protocol.
  • If any participant cannot commit, then all sites
    must abort.

18
2PC - 2
  • Phase I
  • reach a common decision on whether to abort or
    commit
  • Phase II
  • Implement the decision at all sites

19
2PC - 3
20
2PC Phase 1
  • Coordinator
  • Write prepare record to log
  • Multicast prepare message and set timeout
  • Participant
  • Wait for prepare message
  • If we are willing to commit then
  • force log records to stable storage
  • write ready record in log
  • send ready message to coordinator
  • else
  • write ABORT in log
  • send abort answer message to coordinator

21
2PC Phase 2 (1)
  • Coordinator
  • wait for reply messages (ready or abort) or
    timeout
  • If timeout expires or any message is abort
  • write global abort record in the log
  • send abort command message to all participants
  • else
  • if all answers were ready
  • write global commit record to log
  • send commit command message to all participants

22
2PC Phase 2 (2)
  • Participants
  • Wait for command message (abort or commit)
  • write abort or commit in the log
  • send ack message to coordinator
  • execute command (may be null)
  • Coordinator
  • wait for ack messages from all participants
  • write complete in the log
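
The two phases can be sketched as a single-process simulation. This is a simplification under assumptions, not a networked implementation: messages are plain method calls, a timeout is modeled by a participant returning `None`, and the names `Participant` and `run_2pc` are hypothetical.

```python
class Participant:
    def __init__(self, name, willing=True, reachable=True):
        self.name = name
        self.willing = willing        # willing to commit?
        self.reachable = reachable    # False models a lost reply / timeout
        self.log = []

    def on_prepare(self):
        """Phase 1: answer the prepare message."""
        if not self.reachable:
            return None               # the coordinator will time out
        if self.willing:
            self.log.append("ready")  # ready record is logged before answering
            return "ready"
        self.log.append("abort")
        return "abort"

    def on_command(self, command):
        """Phase 2: log the command (once) and acknowledge it."""
        if self.log[-1:] != [command]:
            self.log.append(command)
        return "ack"


def run_2pc(participants):
    coord_log = ["prepare"]
    answers = [p.on_prepare() for p in participants]
    if all(a == "ready" for a in answers):
        decision = "commit"
    else:
        decision = "abort"            # timeout or any abort answer
    coord_log.append("global_" + decision)
    acks = [p.on_command(decision) for p in participants if p.reachable]
    if len(acks) == len(participants):
        coord_log.append("complete")  # written only after all acks arrive
    return decision, coord_log


print(run_2pc([Participant("London"), Participant("Depok")]))
# ('commit', ['prepare', 'global_commit', 'complete'])
print(run_2pc([Participant("London"), Participant("Depok", willing=False)]))
# ('abort', ['prepare', 'global_abort', 'complete'])
```

With an unreachable participant the decision is abort and no complete record is written, which corresponds to the coordinator still waiting for acks.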

23
2PC Site Failures
  • Resilient to all failures in which no log
    information is lost.
  • Site failures:
  • A participant fails before having written ready to
    the log:
  • timeout expires -> ABORT
  • A participant fails after having written ready to
    the log:
  • Msg sent: others take the decision. This node gets
    the outcome from the coordinator or other
    participants after restart.
  • Msg unsent: timeout expires -> ABORT

24
2PC Coordinator Failures
  • Coordinator fails after writing prepare but
    before global commit/global abort (global X).
  • All participants must wait for recovery of the
    coordinator -> BLOCKING
  • Recovery of coordinator involves restarting
    protocol from identities in prepare log record
  • Participants must identify duplicate prepare
    messages
  • Coordinator fails after having written global X
    but before writing complete.
  • On restart, coordinator must resend decision, to
    ensure blocked processes get it. Others must
    discard duplicate.
  • Coordinator fails after having written complete.
  • No action needed

25
2PC Lost Messages
  • A reply message (ready or abort) from a
    participant is lost.
  • Timeout expires -- coordinator ABORTs
  • A prepare message is lost.
  • Timeout expires -- coordinator ABORTs
  • A commit/abort command message is lost.
  • Timeout in participant -- request repetition of
    command from the coordinator.
  • An ack message is lost
  • Timeout in coordinator -- coordinator resends
    command

26
2PC - Partitions
  • Everything aborts, as the coordinator can't contact
    all participants. Participants in the
    partition without the coordinator may remain blocked;
    their resources are retained until the
    blocked participants are unblocked.

27
2PC - Comments
  • Blocking is a problem if the coordinator or
    network fails, which reduces availability -> use
    3PC.
  • Unilateral abort:
  • Any node can abort until it sends ready (site
    autonomy before the ready state).
  • Efficiency can be increased:
  • Elimination of prepare messages. The
    participants that can commit will automatically
    send the ready message (RM).
  • Presumed commit/abort, if there's no information
    found in the log. See CER84 13.5.1, 13.5.2, 13.5.3.

28
Impossible Termination in 2PC
  • No operational participant has received the
    command. The operational participants are in the
    R (ready) state, but they haven't received the ACM
    (abort command message) or CCM (commit command
    message), AND
  • At least one participant failed. Unfortunately
    the failed participant acted as the coordinator.

29
Impossible Termination in 2PC
  • The failed participant might have already
    carried out the decision (commit or abort),
    i.e. be in the C or A state.
  • The operational participants can't know what the
    failed participant had done, and can't take an
    independent decision.
  • The problem is solved by the 3PC.

30
3PC (1)
[Figure: 3PC diagram, with labels "Restart 1" and "Restart 2"]

31
3PC (2)
  • Case study
  • See slide no 3.
  • London: Coordinator and Participant1
  • Depok: Participant2

32
3PC (3)
  • 3PC avoids problems with 2PC
  • If any operational participant has received an
    abort, then all can abort. The failed participant
    will abort at restart if it hasn't already (as in
    2PC). E.g. Depok fails, London is operational and
    has received an ACM.
  • If any participant has received the PCM, then
    all can commit. The failed participant
    cannot have aborted unilaterally, because it
    had answered READY (RM). The failed participant
    will commit at restart (see restart 1). E.g.
    London fails, Depok is operational and has
    received the PCM.

33
3PC (4)
  • If none of the operational participants has
    received the PCM, i.e. all of the
    operational participants are in the R state, then
    2PC would block. With 3PC we can abort safely
    since the failed participant cannot have
    committed. At most it has received the PCM -> it
    can abort at restart (see restart 2). E.g.
    London fails, Depok is operational and has NOT
    received the PCM (in the R state).

34
3PC (5)
  • 3PC guarantees that no failure during the 2nd
    phase can cause a blocking condition.
  • Failures during the 3rd phase -> blocking?
  • If coordinator fails in 3rd phase, then elect
    another and continue the commit process (since
    all must be in the PC state).

35
Consistency + Isolation
  • Consistency + isolation -> concurrency control.
  • The Lost Update Problem

36
The Uncommitted Dependency (Temporary Update) Problem

37
The Inconsistent Analysis Problem
  Transaction 1              Transaction 2
  sum = 0
  Read A; sum = sum + A                          (before the update by transaction 2)
                             Read A
                             Read B
                             Update A
                             Update B
                             COMMIT
  Read B; sum = sum + B                          (after the update by transaction 2)

38
Concurrent Transactions
  • If we have concurrent transactions, we must
    prevent interference.
  • c.f. the lost update problem:
  • Prevent T2's read (because T1 has seen it and may
    update it): locking
  • Prevent T1's update (because T2 has seen it):
    locking
  • Prevent T2's update (because T1 has already
    updated it and so this is based on obsolete
    values): timestamping
  • Have them work independently and resolve
    difficulties on commit: optimistic concurrency
    control

39
Serializability
  • What we need is some notion of correctness.
  • Serializability is usually used with respect to
    transactions.

40
Serial Transactions
  • Two transactions execute serially if all
    operations of one precede all operations of the
    other, e.g.
  • S1: Ri(x) Wi(x) Ri(y) Rj(x) Wj(y) Rk(y)
    Wk(x), or
  • S1 = Ti Tj Tk, S2 = Tk Tj Ti, ...
  • S1 = Schedule 1, S2 = Schedule 2
  • All serial schedules are correct, but restrictive
    of concurrency.

41
Transaction Conflict
  • Two operations are in conflict if
  • At least one is a write
  • They both act on the same data
  • They are issued by different transactions
  • Which of the following are in conflict?
  • Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)

42
Computationally Equivalent
  • Two schedules (S1 and S2) are computationally
    equivalent if:
  • The same operations are involved (possibly
    reordered)
  • For every pair of operations in conflict (Oi,
    Oj) such that Oi precedes Oj in S1, Oi also
    precedes Oj in S2.

43
Serializable Schedule
  • A schedule is serializable if it is
    computationally equivalent to a serial schedule.
    E.g. Ri(x) Rj(x) Wj(y) Wi(x) (which is not a
    serial schedule) is computationally equivalent
    to Rj(x) Wj(y) Ri(x) Wi(x)
  • (which is the serial schedule Tj Ti).
  • The following is NOT a serial schedule. But is it
    serialisable? Ri(x) Rj(x) Wi(y) Rk(y)
    Wj(x). Yes: the above schedule is computationally
    equivalent to the serial schedules Ti Tj Tk and Ti Tk Tj.
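
Both examples can be verified by brute force: try every serial order of the transactions and keep those that preserve the order of every conflicting pair. This is a sketch for small schedules only. The helper names are hypothetical, and `parse` assumes single-letter transaction and item names.

```python
from itertools import combinations, permutations


def parse(text):
    """Turn 'Ri(x) Wj(y)' into [('R', 'i', 'x'), ('W', 'j', 'y')]."""
    return [(op[0], op[1], op[3]) for op in text.split()]


def conflict_orders(schedule):
    """Pairs (Ta, Tb): an operation of Ta precedes a conflicting one of Tb."""
    return {
        (a[1], b[1])
        for a, b in combinations(schedule, 2)
        if "W" in (a[0], b[0]) and a[2] == b[2] and a[1] != b[1]
    }


def equivalent_serial_orders(schedule):
    """All serial orders that keep every conflicting pair in the same order."""
    txns = sorted({op[1] for op in schedule})
    orders = conflict_orders(schedule)
    return [
        "".join("T" + t for t in perm)
        for perm in permutations(txns)
        if all(perm.index(a) < perm.index(b) for a, b in orders)
    ]


first = equivalent_serial_orders(parse("Ri(x) Rj(x) Wj(y) Wi(x)"))
second = equivalent_serial_orders(parse("Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)"))
print(first)    # ['TjTi']
print(second)   # ['TiTjTk', 'TiTkTj']
```

An empty result would mean the schedule is not serializable in this sense.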

44
Serializability in Distributed Systems (1)
  • A local concurrency control mechanism isn't
    sufficient, e.g.
  • Site 1: Ri(x) Wi(x) Rj(y) Wj(x), i.e. Ti < Tj
  • Site 2: Rj(y) Wj(y) Ri(y) Wi(y), i.e. Tj < Ti

45
Serializability in Distributed Systems (2)
  • Let T1...Tn be a set of transactions and E be an
    execution of these, modeled by schedules S1...Sm on
    machines 1...m.
  • Each local schedule (S1...Sm) is serialisable.
  • Then E is serialisable (in distributed systems)
    if, for all i and j, all conflicting operations
    from Ti and Tj in each of the schedules have the
    same order, i.e. there is a global total ordering
    for all sites.
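
One way to test for a global total ordering is to merge the conflict orders of all sites into a single precedence graph and look for a cycle. The sketch below uses the two-site example from the previous slide and Python's standard `graphlib` module. The function names are hypothetical and operations are again (kind, transaction, item) tuples.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations


def precedence_edges(schedule):
    """(Ta, Tb) whenever an operation of Ta precedes a conflicting one of Tb."""
    return {
        (a[1], b[1])
        for a, b in combinations(schedule, 2)
        if "W" in (a[0], b[0]) and a[2] == b[2] and a[1] != b[1]
    }


def global_order(local_schedules):
    """Return one global serial order, or None if the sites disagree."""
    graph = {}                                # node -> set of predecessors
    for schedule in local_schedules:
        for op in schedule:
            graph.setdefault(op[1], set())
        for before, after in precedence_edges(schedule):
            graph[after].add(before)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


site1 = [("R", "i", "x"), ("W", "i", "x"), ("R", "j", "y"), ("W", "j", "x")]
site2 = [("R", "j", "y"), ("W", "j", "y"), ("R", "i", "y"), ("W", "i", "y")]

print(global_order([site1]))          # ['i', 'j']
print(global_order([site2]))          # ['j', 'i']
print(global_order([site1, site2]))   # None
```

Each site is serializable on its own, but the two local orders contradict each other, so no global order exists.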

46
Locking (1)
  • How to implement serializability? Use locking.
  • Shared/eXclusive (Read/Write) locks:
  • A transaction T must have SLock(x) or XLock(x) before
    any Read x.
  • A transaction T must have XLock(x) before any Write
    x.
  • A transaction T must issue unLock(x) after Read x
    or Write x is completed.

47
Locking (2)
  • A transaction T can upgrade the lock, i.e.
    issue an XLock(x) after having SLock(x), as long as
    T is the only transaction having SLock(x).
    Otherwise T must wait.
  • A transaction T can downgrade the lock, i.e.
    issue an SLock(x) after having XLock(x).

48
Locking (3)
  • E.g. T1: X = X + Y; T2: Y = X + Y
  • If initially X = 20, Y = 30 then either
  • S1: T1 < T2, giving X = 50, Y = 80
  • S2: T2 < T1, giving X = 70, Y = 50
  • Both are serial schedules, thus both are correct.

49
Locking (4)
  • However using Shared/eXclusive (Read/Write) locks
    does NOT guarantee serializability.
  • If any transaction releases a lock and then
    acquires another, it may produce incorrect
    results.

50
Locking (5)
51
Locking (6)
  • What is the problem?
  • Y was unlocked too early in T1, and X was unlocked
    too early in T2. See the italicised unLock Y and unLock X.
  • What is the solution?
  • 2 Phase Locking (2PL).

52
2PL - 1
  • Two phase locking (2PL)
  • Before operating on any object the transaction
    must obtain a lock for it.
  • After releasing a lock the transaction never
    acquires more locks
  • 2 phases
  • Expanding (growing) phase: acquiring new locks,
    but NEVER releasing any locks.
  • Shrinking phase: releasing existing locks, but
    NEVER acquiring new locks.
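
The two rules can be turned into a small checker for one transaction's sequence of operations. This is a simplified sketch: it does not distinguish shared from exclusive locks, and the function name and the (action, item) encoding are assumptions.

```python
def obeys_2pl(ops):
    """Check one transaction's operation sequence against the 2PL rules.

    `ops` is a list of (action, item) pairs, where action is one of
    "lock", "unlock", "read", "write".
    """
    held = set()
    shrinking = False                 # becomes True after the first unlock
    for action, item in ops:
        if action == "lock":
            if shrinking:
                return False          # acquiring after releasing: not two-phase
            held.add(item)
        elif action == "unlock":
            shrinking = True
            held.discard(item)
        elif item not in held:
            return False              # read/write without holding a lock
    return True


early_unlock = [
    ("lock", "Y"), ("read", "Y"), ("unlock", "Y"),   # Y released too early
    ("lock", "X"), ("write", "X"), ("unlock", "X"),
]
two_phase = [
    ("lock", "Y"), ("read", "Y"), ("lock", "X"),     # growing phase
    ("unlock", "Y"), ("write", "X"), ("unlock", "X"),  # shrinking phase
]
print(obeys_2pl(early_unlock))   # False
print(obeys_2pl(two_phase))      # True
```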

53
2PL - 2
  • Exercise: modify the schedule on slide 50 by
    following 2PL.
  • 2PL may cause deadlocks. See ELM00.
  • If a schedule obeys 2PL it is serializable.
  • What about the converse? Do all serializable
    schedules follow 2PL?

54
2PL - 3
55
Optimistic Concurrency Control
  • Locking is pessimistic. Assume instead that
    contention is rare:
  • All updates are made to a private copy.
  • On commit, see if there are conflicts with other
    transactions started afterwards.
  • If not, install changes atomically,
  • else ABORT.
  • Deadlock-free, maximum parallelism, but may get
    livelock.
  • What is livelock?
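
A minimal single-threaded sketch of the private-copy-then-validate idea. The validation rule used here (abort if any item that was read has since been overwritten by a committed transaction, detected through per-item version counters) is one possible choice and an assumption, not necessarily the rule the slides intend. `Store` and `Transaction` are hypothetical names.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {k: 0 for k in data}   # bumped on every committed write


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}   # item -> version seen at first read
        self.writes = {}          # private copy of updates

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        self.writes[key] = value  # not visible to others until commit

    def commit(self):
        """Validate, then install. Returns False (abort) on conflict."""
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False      # a newer value was committed meanwhile
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


store = Store({"A": 100})
t1, t2 = Transaction(store), Transaction(store)
t1.write("A", t1.read("A") + 10)
t2.write("A", t2.read("A") + 20)
print(t1.commit())       # True
print(t2.commit())       # False
print(store.data["A"])   # 110
```

Here the second transaction would have caused a lost update, so it is aborted and would need to be retried. Repeated retries under heavy contention are where livelock can arise.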

56
Timestamping (1)
  • Again, no deadlock
  • Rules
  • Each transaction receives a globally unique
    timestamp, TSi, when started.
  • Updates are not physically installed until
    commit.
  • Every object in the database carries the
    timestamp of the last transaction to read it
    (RTM(x)) and the last to write it (WTM(x)).

57
Timestamping (2)
  • If a transaction Ti requests an operation that
    conflicts with a younger transaction Tj, then Ti
    is restarted with a new timestamp.
  • An operation from Ti is in conflict with an
    operation from Tj if:
  • - It is a read and the object has already been
    updated by Tj, i.e. TSi < WTM(x). The read
    is rejected and Ti is restarted with a new timestamp.
    If the read is OK, set RTM(x) = max(TSi, RTM(x)).
  • - It is an update and the object has already been
    read or updated by Tj, i.e. TSi < RTM(x) or
    TSi < WTM(x). The update is rejected and
    Ti is restarted with a new timestamp. If the
    update is OK, set WTM(x) = TSi.
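
The two rules translate almost directly into code. This is a sketch under assumptions: timestamps are plain integers, the names `Item`, `Rejected`, `read`, and `write` are hypothetical, and for brevity the write is installed immediately rather than deferred until commit.

```python
class Rejected(Exception):
    """Conflict with a younger transaction; restart with a new timestamp."""


class Item:
    def __init__(self, value):
        self.value = value
        self.rtm = 0    # RTM(x): largest timestamp that has read x
        self.wtm = 0    # WTM(x): timestamp of the last writer of x


def read(item, ts):
    if ts < item.wtm:                      # already updated by a younger Tj
        raise Rejected("read")
    item.rtm = max(ts, item.rtm)
    return item.value


def write(item, ts, value):
    if ts < item.rtm or ts < item.wtm:     # already read or updated by a younger Tj
        raise Rejected("write")
    item.wtm = ts
    item.value = value                     # simplification: installed at once


x = Item(20)
print(read(x, ts=2))          # 20, and RTM(x) becomes 2
try:
    write(x, ts=1, value=5)   # TS 1 < RTM(x) = 2
except Rejected as e:
    print("rejected:", e)     # rejected: write
write(x, ts=3, value=7)       # OK, WTM(x) becomes 3
```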

58
References
  • CER84: Ceri, S., G. Pelagatti. Distributed
    Databases: Principles and Systems. New York:
    McGraw-Hill, 1984.
  • ELM00: Elmasri, R., S.B. Navathe. Fundamentals of
    Database Systems, 3rd ed. Reading: Addison-Wesley,
    2000.