Title: DISTRIBUTED TRANSACTION
1. DISTRIBUTED TRANSACTION
- FASILKOM
- UNIVERSITAS INDONESIA
2. What is a Transaction?
- An atomic unit of database access, which is either completely executed or not executed at all.
- It consists of an application-specified sequence of operations, beginning with a begin_transaction primitive and ending with either commit or abort.
3. E.g.
- Transfer 200 from account A in London to account B in Depok
  - begin_transaction
  - amntA = lookup amount in account A
  - amntB = lookup amount in account B
  - if (amntA < 200)
    - abort
  - set account A = amntA - 200
  - set account B = amntB + 200
  - commit
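The transfer can be sketched in Python as follows. This is only an illustration: the dict standing in for the database, the `transfer` function, and the `Abort` exception are assumptions of the sketch, and abort is modelled simply by leaving the accounts untouched.

```python
class Abort(Exception):
    """Raised when the transaction decides to abort."""


def transfer(accounts, src, dst, amount):
    """Move `amount` from `src` to `dst`, all or nothing.

    `accounts` is a plain dict standing in for the database (an assumption).
    """
    # begin_transaction: work on a private copy so that an abort leaves
    # the original accounts untouched.
    working = dict(accounts)
    amnt_a = working[src]
    amnt_b = working[dst]
    if amnt_a < amount:
        raise Abort("insufficient funds")
    working[src] = amnt_a - amount
    working[dst] = amnt_b + amount
    # commit: install both updates together.
    accounts.update(working)


accounts = {"A": 500, "B": 100}
transfer(accounts, "A", "B", 200)
print(accounts)  # {'A': 300, 'B': 300}
```

A second call such as `transfer(accounts, "A", "B", 1000)` raises `Abort` and leaves both balances unchanged.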
4. Transaction Properties
- Four main properties, the ACID properties
  - Atomicity: A transaction must be all or nothing.
  - Consistency: A transaction takes the system from one consistent state to another consistent state.
  - Isolation: The results of an incomplete transaction are not allowed to be revealed to other transactions.
  - Durability: The results of a committed transaction will never be lost, independent of subsequent failures.
- Atomicity + durability -> failure tolerance
5. Failure Tolerance
- Atomicity + durability -> failure tolerance
- Types of failures
  - Transaction-local failures detected by the application (e.g. insufficient funds)
  - Transaction-local failures not detected by the application (e.g. divide by zero)
  - System failures affecting volatile storage (e.g. CPU failure)
  - Media failures (e.g. HD crash)
- What is volatile storage?
- What is stable storage?
6. Recovery
- Based on redundancy.
- For example
  - 1. Periodically archive the database.
  - 2. Every time a change is made, record old and new values to a log.
  - 3. If a failure occurs
    - If there is no damage to the physical database, undo all unreliable changes.
    - If the database is physically damaged, restore from archive and redo changes.
7. Logging (1)
- Database vs transaction log.
- For each change (including begin transaction, commit, and abort), write a log record with
  - Transaction ID (TID)
  - Record ID
  - Type of action
  - Old value of record
  - New value of record
  - Other info, e.g. pointer to previous log record of this transaction.
8. Logging (2)
- After a failure we need to undo or redo changes.
- Undo and redo must be idempotent as there may be
a failure whilst they are executing.
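A minimal Python sketch of why storing old and new values makes undo and redo idempotent. The `LogRecord` fields follow the log record contents listed on the logging slides; the dict used as the database and the function names are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    tid: int          # transaction ID
    record_id: str    # which database record changed
    old_value: int    # value before the change (used by undo)
    new_value: int    # value after the change (used by redo)


def undo(db, rec):
    # Restores the old value; repeating it gives the same result.
    db[rec.record_id] = rec.old_value


def redo(db, rec):
    # Re-installs the new value; repeating it gives the same result.
    db[rec.record_id] = rec.new_value


db = {"A": 300}
rec = LogRecord(tid=1, record_id="A", old_value=500, new_value=300)

undo(db, rec)
undo(db, rec)      # a second undo (e.g. after a crash during recovery) is harmless
print(db["A"])     # 500
```

Had the log stored a delta such as "subtract 200" instead, applying it twice would give a different result, so the operation would not be idempotent.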
9. Log Write-ahead Protocol (1)
- Before performing any update, at least the undo portion of the log record must be written to stable storage.
- Before committing a transaction, all log records must have been fully recorded on stable storage. The commit record is written after these.
10. Log Write-ahead Protocol (2)
- Reason for first rule
  - If we change log before database
    - log -- change -- crash ?
    - log -- crash ?
  - If we change log after database
    - change -- log -- crash ?
    - change -- crash: can't undo
11. Checkpointing (1)
- How does the recovery manager know which transactions to undo and which to redo after a failure?
- Naive approach
  - Examine entire log from the start. Look for begin transaction records
    - if a corresponding commit record exists, redo
    - if there's an abort, do nothing and
    - if neither, undo.
12. Checkpointing (2)
- Alternative
- Every so often
  - 1) Force all log buffers to disk.
  - 2) Write a checkpoint record to disk containing
    - a) A list of all active transactions
    - b) The most recent log records for each transaction in a)
  - 3) Force all database buffers to disk - disk is now totally up-to-date.
  - 4) Write address of checkpoint record to fixed restart location (had better be atomic).
13. Checkpointing (3)
- There are 5 categories of transaction
14. Recovery (1)
- Look for most recent checkpoint record.
- For all transactions active at the checkpoint, we must
  - undo all those still active at failure
  - redo all others
15. Recovery (2)
- Have 2 lists: undo and redo
- Initially, undo contains all TIDs in the checkpoint record; redo is empty
- 3 passes through log
  - Forwards from checkpoint to end
    - If we find begin_transaction, add to undo list.
    - If we find commit, transfer from undo to redo list.
    - If we find abort, remove from undo list.
  - Backwards from end to checkpoint: undo.
  - Forwards from checkpoint to end: redo.
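The first forward pass, which builds the undo and redo lists, can be sketched in Python. The function name `classify` and the shape of the log, a list of `(tid, action)` pairs, are assumptions made for this illustration.

```python
def classify(checkpoint_active, log_after_checkpoint):
    """Return (undo, redo) TID sets from the forward pass.

    `checkpoint_active` is the list of TIDs in the checkpoint record;
    `log_after_checkpoint` is a list of (tid, action) pairs. Both shapes
    are assumptions made for this sketch.
    """
    undo = set(checkpoint_active)
    redo = set()
    for tid, action in log_after_checkpoint:
        if action == "begin_transaction":
            undo.add(tid)
        elif action == "commit":
            undo.discard(tid)
            redo.add(tid)
        elif action == "abort":
            undo.discard(tid)
    return undo, redo


log = [
    (3, "begin_transaction"),
    (1, "commit"),
    (4, "begin_transaction"),
    (3, "abort"),
    (4, "commit"),
]
undo, redo = classify(checkpoint_active=[1, 2], log_after_checkpoint=log)
print(sorted(undo), sorted(redo))  # [2] [1, 4]
```

Transaction 2 was active at the checkpoint and never finished, so it is undone in the backward pass; transactions 1 and 4 committed, so they are redone in the final forward pass.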
16. Commit Protocols
- Commit protocols.
- Assume a set of cooperating managers which deal with parts of a transaction.
- For atomicity we must ensure that
  - At each site, either all actions or none are performed.
  - All sites take the same decision on whether to commit or abort.
17. Two Phase Commit (2PC) Protocol - 1
- One node, the coordinator, has a special role; the others are participants.
- The coordinator initiates the 2PC protocol.
- If any participant cannot commit, then all sites must abort.
18. 2PC - 2
- Phase I
  - Reach a common decision on whether to abort or commit
- Phase II
  - Implement the decision at all sites
19. 2PC - 3
20. 2PC Phase 1
- Coordinator
  - Write prepare record to log
  - Multicast prepare message and set timeout
- Participant
  - Wait for prepare message
  - If we are willing to commit then
    - force log records to stable storage
    - write ready record in log
    - send ready message to coordinator
  - else
    - write ABORT in log
    - send abort answer message to coordinator
21. 2PC Phase 2 (1)
- Coordinator
  - wait for reply messages (ready or abort) or timeout
  - If timeout expires or any message is abort
    - write global abort record in the log
    - send abort command message to all participants
  - else
    - if all answers were ready
      - write global commit record to log
      - send commit command message to all participants
22. 2PC Phase 2 (2)
- Participants
  - Wait for command message (abort or commit)
  - write abort or commit in the log
  - send ack message to coordinator
  - execute command (may be null)
- Coordinator
  - wait for ack messages from all participants
  - write complete in the log
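The coordinator's side of the two phases can be sketched in Python as a single-process simulation. There is no real messaging or stable storage here; the function names, the `votes` dictionary, and the use of `None` to stand for a timeout are all assumptions of the sketch.

```python
def coordinator_decide(replies, participants):
    """Phase 2 decision rule.

    `replies` maps participant name -> "ready" or "abort"; a participant
    missing from `replies` is treated as a timeout. These shapes are
    assumptions of this sketch.
    """
    for p in participants:
        if replies.get(p) != "ready":
            return "global_abort"
    return "global_commit"


def run_2pc(votes, participants):
    """Return (decision, coordinator_log) for one protocol run.

    `votes` maps participant -> True (willing to commit), False (abort),
    or None (no reply before the timeout).
    """
    log = ["prepare"]                      # phase 1: written before multicast
    replies = {}
    for p in participants:
        vote = votes.get(p)
        if vote is True:
            replies[p] = "ready"
        elif vote is False:
            replies[p] = "abort"
        # vote None: message lost or site down, so no reply is recorded
    decision = coordinator_decide(replies, participants)
    log.append(decision)                   # phase 2: decision logged first
    # ... command sent to all participants, acks collected ...
    log.append("complete")
    return decision, log


sites = ["London", "Depok"]
print(run_2pc({"London": True, "Depok": True}, sites)[0])   # global_commit
print(run_2pc({"London": True, "Depok": None}, sites)[0])   # global_abort
```

The order of entries in the coordinator log (prepare, then the global decision, then complete) is what the failure cases rely on when deciding what to do at restart.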
23. 2PC Site Failures
- Resilient to all failures in which no log information is lost.
- Site failures
  - A participant fails before having written ready to log
    - timeout expires -> ABORT
  - A participant fails after having written ready to log
    - Msg sent -- others take decision. This node gets outcome from the coordinator or other participants after restart
    - Msg unsent -- timeout expires -> ABORT
24. 2PC Coordinator Failures
- Coordinator fails after writing prepare but before global commit/global abort (global X).
  - All participants must wait for recovery of coordinator -> BLOCKING
  - Recovery of coordinator involves restarting protocol from identities in prepare log record
  - Participants must identify duplicate prepare messages
- Coordinator fails after having written global X but before writing complete.
  - On restart, coordinator must resend decision, to ensure blocked processes get it. Others must discard duplicates.
- Coordinator fails after having written complete.
  - No action needed
25. 2PC Lost Messages
- A reply message (ready or abort) from a participant is lost.
  - Timeout expires -- coordinator ABORTs
- A prepare message is lost.
  - Timeout expires -- coordinator ABORTs
- A commit/abort command message is lost.
  - Timeout in participant -- request repetition of command from the coordinator.
- An ack message is lost
  - Timeout in coordinator -- coordinator resends command
26. 2PC - Partitions
- Everything aborts as the coordinator can't contact all participants. Participants in a partition without the coordinator may remain blocked; their resources are retained until the blocked participants are unblocked.
27. 2PC - Comments
- Blocking is a problem if the coordinator or network fails, which reduces availability -> use 3PC.
- Unilateral abort.
  - Any node can abort until it sends ready (site autonomy before the ready state).
- Efficiency can be increased
  - Elimination of prepare messages. The participants that can commit will automatically send the ready message (RM).
  - Presumed commit/abort, if there's no information found in the log. See CER84 13.5.1,2,3.
28. Impossible Termination in 2PC
- No operational participant has received the command. The operational participants are in the R state, but they haven't received the ACM or CCM (abort or commit command message), AND
- At least one participant failed. Unfortunately the failed participant acted as the coordinator.
29. Impossible Termination in 2PC
- The failed participant might have already performed an irreversible action (commit or abort), i.e. it is in the C or A state.
- The operational participants can't know what the failed participant had done, and can't take an independent decision.
- The problem is solved by 3PC.
30. 3PC (1)
[Figure: 3PC diagram with labels "Restart 1" and "Restart 2"]
31. 3PC (2)
- Case study
  - See slide no 3.
  - London: Coordinator and Participant1
  - Depok: Participant2
32. 3PC (3)
- 3PC avoids problems with 2PC
  - If any operational participant has received an abort then all can abort. The failed participant will abort at restart if it hasn't already, as in 2PC. E.g. Depok fails, London is operational and has received an ACM.
  - If any participant has received the PCM, then all can commit. The failed participant cannot have aborted unilaterally, because it had answered READY (RM). The failed participant will commit at restart (see restart 1). E.g. London fails, Depok is operational and has received the PCM.
33. 3PC (4)
- If none of the operational participants has received the PCM, i.e. all of the operational participants are in the R state, then 2PC would block. With 3PC we can abort safely since the failed participant cannot have committed. At most it has received the PCM -> it can abort at restart (see restart 2). E.g. London fails, Depok is operational and has NOT received the PCM (in the R state).
34. 3PC (5)
- 3PC guarantees that no failure during the 2nd phase can cause a blocking condition.
- Failures during the 3rd phase -> blocking???
  - If the coordinator fails in the 3rd phase, then elect another and continue the commit process (since all must be in the PC state).
35. Consistency and Isolation
- Consistency + isolation -> concurrency control.
- The Lost Update Problem
36. The Uncommitted Dependency (Temporary Update) Problem
37. The Inconsistent Analysis Problem
Transaction 1                  Transaction 2
sum = 0
Read A; sum = sum + A
  (before the update by transaction 2)
                               Read A
                               Read B
                               Update A
                               Update B
                               COMMIT
Read B; sum = sum + B
  (after the update by transaction 2)
38. Concurrent Transactions
- If we have concurrent transactions, we must prevent interference.
- c.f. lost update problem
  - Prevent T2's read (because T1 has seen it and may update it): Locking
  - Prevent T1's update (because T2 has seen it): Locking
  - Prevent T2's update (because T1 has already updated it and so this is based on obsolete values): Timestamping
  - Have them work independently and resolve difficulties on commit: Optimistic concurrency control
39. Serializability
- What we need is some notion of correctness.
- Serializability is usually used with respect to transactions.
40. Serial Transactions
- Two transactions execute serially if all operations of one precede all operations of the other, e.g.
  - S1: Ri(x) Wi(x) Ri(y) Rj(x) Wj(y) Rk(y) Wk(x), or
  - S1: Ti Tj Tk, S2: Tk Tj Ti, ...
  - S1 = Schedule 1, S2 = Schedule 2
- All serial schedules are correct, but restrictive of concurrency.
41. Transaction Conflict
- Two operations are in conflict if
  - At least one is a write
  - They both act on the same data
  - They are issued by different transactions
- Which of the following are in conflict?
  - Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)
42. Computationally Equivalent
- Two schedules (S1, S2) are computationally equivalent if
  - The same operations are involved (possibly reordered)
  - For every pair of operations in conflict (Oi, Oj) such that Oi precedes Oj in S1, Oi also precedes Oj in S2.
43. Serializable Schedule
- A schedule is serializable if it is computationally equivalent to a serial schedule, e.g. Ri(x) Rj(x) Wj(y) Wi(x) (which is not a serial schedule) is computationally equivalent to Rj(x) Wj(y) Ri(x) Wi(x) (which is a serial schedule Tj Ti).
- The following is NOT a serial schedule. But is it serializable? Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)
- The above schedule is computationally equivalent to serial schedules Ti Tj Tk and Ti Tk Tj.
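The definitions of conflict and computational equivalence can be checked mechanically. The Python sketch below encodes each operation as a `(kind, transaction, item)` tuple and tries every serial order by brute force; the encoding and the function names are assumptions made for illustration, and a real scheduler would test a precedence graph for cycles instead of enumerating permutations.

```python
from itertools import permutations


def conflicts(op1, op2):
    """Ops are (kind, txn, item) tuples, e.g. ("R", "i", "x")."""
    k1, t1, x1 = op1
    k2, t2, x2 = op2
    return "W" in (k1, k2) and x1 == x2 and t1 != t2


def precedence(schedule):
    """Set of (Ta, Tb) pairs: some op of Ta conflicts with a later op of Tb."""
    edges = set()
    for a in range(len(schedule)):
        for b in range(a + 1, len(schedule)):
            if conflicts(schedule[a], schedule[b]):
                edges.add((schedule[a][1], schedule[b][1]))
    return edges


def equivalent_serial_orders(schedule):
    """All serial orders that keep every conflicting pair in the same order."""
    txns = sorted({t for _, t, _ in schedule})
    edges = precedence(schedule)
    result = []
    for order in permutations(txns):
        pos = {t: n for n, t in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in edges):
            result.append("".join("T" + t for t in order))
    return result


s = [("R", "i", "x"), ("R", "j", "x"), ("W", "i", "y"),
     ("R", "k", "y"), ("W", "j", "x")]
print(equivalent_serial_orders(s))  # ['TiTjTk', 'TiTkTj']
```

An empty result means the schedule is not serializable under this definition.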
44. Serializability in Distributed Systems (1)
- A local concurrency control mechanism isn't sufficient, e.g.
  - Site 1: Ri(x) Wi(x) Rj(y) Wj(x), i.e. Ti < Tj
  - Site 2: Rj(y) Wj(y) Ri(y) Wi(y), i.e. Tj < Ti
45. Serializability in Distributed Systems (2)
- Let T1...Tn be a set of transactions and E be an execution of these modeled by schedules S1...Sm on machines 1...m.
- Each local schedule (S1...Sm) is serializable.
- Then E is serializable (in distributed systems) if, for all i and j, all conflicting operations from Ti and Tj in each of the schedules have the same order, i.e. there is a global total ordering for all sites.
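The pairwise condition can be sketched in Python using the two-site example from the previous slide. The tuple encoding of operations and the function names are assumptions of this sketch. It only checks that no pair of transactions is ordered both ways across sites; a full test for a global total ordering would also have to rule out longer cycles among three or more transactions.

```python
def precedence(schedule):
    """Set of (Ta, Tb): an op of Ta conflicts with a later op of Tb."""
    edges = set()
    for a in range(len(schedule)):
        for b in range(a + 1, len(schedule)):
            k1, t1, x1 = schedule[a]
            k2, t2, x2 = schedule[b]
            if "W" in (k1, k2) and x1 == x2 and t1 != t2:
                edges.add((t1, t2))
    return edges


def same_order_everywhere(local_schedules):
    """True if no pair of transactions is ordered both ways across sites."""
    edges = set()
    for schedule in local_schedules:
        edges |= precedence(schedule)
    return not any((b, a) in edges for a, b in edges)


site1 = [("R", "i", "x"), ("W", "i", "x"), ("R", "j", "y"), ("W", "j", "x")]
site2 = [("R", "j", "y"), ("W", "j", "y"), ("R", "i", "y"), ("W", "i", "y")]
print(same_order_everywhere([site1]), same_order_everywhere([site1, site2]))
# True False
```

Each site on its own is serializable, but site 1 orders Ti before Tj while site 2 orders Tj before Ti, so the combined execution is rejected.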
46. Locking (1)
- How to implement serializability? Use locking
- Shared/eXclusive (Read/Write) locks
  - A transaction T must have SLock x or XLock x before any Read x.
  - A transaction T must have XLock x before any Write x.
  - A transaction T must issue unLock x after Read x or Write x is completed.
47. Locking (2)
- A transaction T can upgrade the lock, i.e. issue an XLock x after having SLock x, as long as T is the only transaction holding SLock x. Otherwise T must wait.
- A transaction T can downgrade the lock, i.e. issue an SLock x after having XLock x.
48. Locking (3)
- E.g. T1: X = X + Y; T2: Y = X + Y
- If initially X = 20, Y = 30 then either
  - S1: T1 < T2: X = 50, Y = 80
  - S2: T2 < T1: X = 70, Y = 50
- Both are serial schedules, thus both are correct.
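The two serial outcomes can be reproduced with a short Python sketch. It assumes T1 computes X = X + Y and T2 computes Y = X + Y, a reading that matches the values given on the slide; the function names are illustrative only.

```python
def t1(state):
    # T1: X = X + Y
    state["X"] = state["X"] + state["Y"]


def t2(state):
    # T2: Y = X + Y
    state["Y"] = state["X"] + state["Y"]


def run_serial(order):
    """Run the transactions one after another from the initial state."""
    state = {"X": 20, "Y": 30}
    for txn in order:
        txn(state)
    return state


print(run_serial([t1, t2]))  # {'X': 50, 'Y': 80}
print(run_serial([t2, t1]))  # {'X': 70, 'Y': 50}
```

The two results differ, yet both count as correct because each comes from a serial schedule.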
49. Locking (4)
- However, using Shared/eXclusive (Read/Write) locks does NOT guarantee serializability.
- If any transaction releases a lock and then acquires another, it may produce incorrect results.
50. Locking (5)
51. Locking (6)
- What is the problem?
  - Y in T1 and X in T2 were unlocked too early. See the italics unLock Y and unLock X.
- What is the solution?
  - 2 Phase Locking (2PL).
52. 2PL - 1
- Two phase locking (2PL)
  - Before operating on any object the transaction must obtain a lock for it.
  - After releasing a lock the transaction never acquires more locks.
- 2 phases
  - Expanding (growing) phase: acquiring new locks, but NEVER releasing any locks.
  - Shrinking phase: releasing existing locks, but NEVER acquiring new locks.
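A small Python sketch that checks whether one transaction's sequence of actions follows the two rules. The `(op, item)` encoding and the name `obeys_2pl` are assumptions, and the sketch does not distinguish shared from exclusive locks.

```python
def obeys_2pl(actions):
    """Check one transaction's action list against the two 2PL rules.

    `actions` is a list of (op, item) pairs with op in
    {"lock", "unlock", "read", "write"}; this encoding is an assumption.
    """
    held = set()
    shrinking = False
    for op, item in actions:
        if op == "lock":
            if shrinking:          # acquiring after a release: not 2PL
                return False
            held.add(item)
        elif op == "unlock":
            shrinking = True       # the shrinking phase has begun
            held.discard(item)
        elif item not in held:     # read/write without holding a lock
            return False
    return True


early_unlock = [("lock", "Y"), ("read", "Y"), ("unlock", "Y"),
                ("lock", "X"), ("write", "X"), ("unlock", "X")]
two_phase = [("lock", "Y"), ("read", "Y"), ("lock", "X"),
             ("write", "X"), ("unlock", "Y"), ("unlock", "X")]
print(obeys_2pl(early_unlock), obeys_2pl(two_phase))  # False True
```

The first sequence releases Y before locking X, which is the "unlocked too early" pattern; the second acquires every lock before releasing any.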
53. 2PL - 2
- Exercise: modify the schedule on slide 50 by following 2PL.
- 2PL may cause deadlocks. See ELM00.
- If a schedule obeys 2PL it is serializable.
- What about the converse? Do all serializable schedules follow 2PL?
54. 2PL - 3
55. Optimistic Concurrency Control
- Locking is pessimistic. Assume instead that contention is rare
  - All updates made to a private copy
  - On commit see if there are conflicts with other transactions started afterwards.
    - If not, install changes atomically
    - else ABORT
- Deadlock free, maximum parallelism, but may get livelock.
- What is livelock?
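One common way to realise the commit-time check is to keep a version number per item and validate the versions a transaction has read. The Python sketch below takes that approach; the version-number scheme, the `Store` and `Txn` classes, and the single-threaded setting are all assumptions of the sketch rather than something the slides prescribe.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {k: 0 for k in data}   # bumped on every installed write


class Txn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}   # item -> version seen at first read
        self.writes = {}          # private copy of updates

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate: abort if anything we read has since been changed.
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False              # ABORT, private copy is dropped
        # No conflict: install the private updates.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] += 1
        return True


store = Store({"A": 100})
t1, t2 = Txn(store), Txn(store)
t1.write("A", t1.read("A") + 10)
t2.write("A", t2.read("A") + 20)
print(t1.commit(), t2.commit(), store.data["A"])  # True False 110
```

The second transaction would have to be restarted. If restarts keep colliding with new commits, a transaction can be aborted repeatedly without ever finishing.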
56. Timestamping (1)
- Again, no deadlock
- Rules
  - Each transaction receives a globally unique timestamp, TSi, when started.
  - Updates are not physically installed until commit.
  - Every object in the database carries the timestamp of the last transaction to read it (RTM(x)) and the last to write it (WTM(x))
57. Timestamping (2)
- If a transaction, Ti, requests an operation that conflicts with a younger transaction Tj, then Ti is restarted with a new timestamp.
- An operation from Ti is in conflict with an operation from Tj if
  - It is a read and the object has already been updated by Tj, i.e. TSi < WTM(x): the read operation is rejected and Ti is restarted with a new timestamp. If the read is OK, set RTM(x) = max(TSi, RTM(x)).
  - It is an update and the object has already been read or updated by Tj, i.e. TSi < RTM(x) or TSi < WTM(x): the update operation is rejected and Ti is restarted with a new timestamp. If the update is OK, set WTM(x) = TSi.
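The read and update rules can be sketched in Python. The `Item` class, the `Restart` exception, and the initial timestamps of 0 are assumptions of the sketch, and it ignores the rule that updates are not physically installed until commit.

```python
class Restart(Exception):
    """The requesting transaction must restart with a new timestamp."""


class Item:
    def __init__(self):
        self.rtm = 0   # RTM(x): largest timestamp among readers so far
        self.wtm = 0   # WTM(x): timestamp of the last writer


def read(item, ts):
    if ts < item.wtm:                    # already updated by a younger txn
        raise Restart
    item.rtm = max(ts, item.rtm)


def write(item, ts):
    if ts < item.rtm or ts < item.wtm:   # already read or updated by a younger txn
        raise Restart
    item.wtm = ts


x = Item()
read(x, ts=5)       # RTM(x) becomes 5
write(x, ts=7)      # allowed: 7 >= RTM(x) and 7 >= WTM(x)
try:
    write(x, ts=6)  # rejected: 6 < WTM(x) = 7
    outcome = "ok"
except Restart:
    outcome = "restart"
print(outcome, x.rtm, x.wtm)  # restart 5 7
```

No transaction ever waits for another in this scheme, which is why deadlock cannot occur; the cost is that older transactions may be restarted.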
58. References
- CER84: Ceri, S., G. Pelagatti. Distributed Databases: Principles and Systems. New York: McGraw-Hill, 1984.
- ELM00: Elmasri, R., S.B. Navathe. Fundamentals of Database Systems, 3rd ed. Reading: Addison-Wesley, 2000.