Title: Distributed Systems:
1Distributed Systems Shared Data
2Overview of chapters
- Introduction
- Co-ordination models and languages
- General services
- Distributed algorithms
- Shared data
- Ch 13 Transactions and concurrency control, 13.1-13.4
- Ch 14 Distributed transactions
- Ch 15 Replication
3Overview
- Transactions and locks
- Distributed transactions
- Replication
4Overview
- Transactions
- Nested transactions
- Locks
- Distributed transactions
- Replication
5Transactions Introduction
- Environment
- data partitioned over different servers on different systems
- sequence of operations as individual unit
- long-lived data at servers (cf. databases)
- transactions: approach to achieve consistency of data in a distributed environment
6Transactions Introduction
[Figure: accounts A, B and C]
Person 1: Withdraw(A, 100); Deposit(B, 100)
Person 2: Withdraw(C, 200); Deposit(B, 200)
7Transactions Introduction
- Critical section
- group of instructions → indivisible block w.r.t. other critical sections
- short duration
- atomic operation (within a server)
- operation is free of interference from operations being performed on behalf of other (concurrent) clients
- concurrency in server → multiple threads
- atomic operation ≠ critical section
- transaction
8Transactions Introduction
- Critical section
- atomic operation
- transaction
- group of different operations + properties
- single transaction may contain operations on different servers
- possibly long duration
- ACID properties
9Transactions ACID
- Properties concerning the sequence of operations that read or modify shared data
- Atomicity
- Consistency
- Isolation
- Durability
10Transactions ACID
- Atomicity or the all-or-nothing property
- a transaction either
- commits: completes successfully, or
- aborts: has no effect at all
- the effect of a committed transaction
- is guaranteed to persist
- can be made visible to other transactions
- transaction aborts can be initiated by
- the system (e.g. when a node fails) or
- a user issuing an abort command
11Transactions ACID
- Consistency
- a transaction moves data from one consistent state to another
- Isolation
- no interference from other transactions
- intermediate effects invisible to other transactions
- The isolation property has 2 parts
- serializability: running concurrent transactions has the same effect as some serial ordering of the transactions
- failure isolation: a transaction cannot see the uncommitted effects of another transaction
12Transactions ACID
- Durability
- once a transaction commits, the effects of the
transaction are preserved despite subsequent
failures
13Transactions Life histories
- Transactional service operations
- OpenTransaction() → Trans
- starts new transaction
- returns unique identifier for transaction
- CloseTransaction(Trans) → (Commit, Abort)
- ends transaction
- returns Commit if transaction committed, else Abort
- AbortTransaction(Trans)
- aborts transaction
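The three service operations can be sketched as a toy in-memory service. This is only an illustration under simplifying assumptions: the class name `TransactionService`, the `read`/`write` helpers and the tentative-write dictionary are hypothetical, the service is single-threaded, and a commit never fails.

```python
import itertools


class TransactionService:
    """Toy in-memory transactional service (hypothetical, single-threaded)."""

    def __init__(self, data):
        self.data = dict(data)      # committed values
        self.pending = {}           # transaction id -> tentative writes
        self._ids = itertools.count(1)

    def open_transaction(self):
        tid = next(self._ids)       # unique identifier for the transaction
        self.pending[tid] = {}
        return tid

    def read(self, tid, name):
        # A transaction sees its own tentative writes first.
        return self.pending[tid].get(name, self.data[name])

    def write(self, tid, name, value):
        self.pending[tid][name] = value

    def close_transaction(self, tid):
        # In this sketch a commit always succeeds; a real server may abort.
        self.data.update(self.pending.pop(tid))
        return "Commit"

    def abort_transaction(self, tid):
        self.pending.pop(tid)       # discard tentative writes


service = TransactionService({"A": 100, "B": 200})
t = service.open_transaction()
service.write(t, "A", service.read(t, "A") - 4)   # Withdraw(A, 4)
service.write(t, "B", service.read(t, "B") + 4)   # Deposit(B, 4)
outcome = service.close_transaction(t)
```

Keeping tentative writes apart from committed data is what makes `abort_transaction` trivial here: aborting simply drops the tentative values.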
14Transactions Life histories
T = OpenTransaction(); operation; operation; ...; operation; CloseTransaction(T)
Operations have read or write semantics
15Transactions Life histories
- History 2: abort by client
T = OpenTransaction(); operation; operation; ...; operation; AbortTransaction(T)
16Transactions Life histories
- History 3: abort by server
T = OpenTransaction(); operation; operation; ...; operation
Server aborts! Error reported.
17Transactions Concurrency
- Illustration of well known problems
- the lost update problem
- inconsistent retrievals
- operations used + implementations
- Withdraw(A, n): b = A.read(); A.write(b - n)
- Deposit(A, n): b = A.read(); A.write(b + n)
18Transactions Concurrency
Transaction T: Withdraw(A,4); Deposit(B,4)
Transaction U: Withdraw(C,3); Deposit(B,3)
Interleaved execution of operations on B → ?
23Transactions Concurrency
Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read(); A.write(bt - 4)
U: bu = C.read(); C.write(bu - 3)
T: bt = B.read() (bt = 200)
U: bu = B.read(); B.write(bu + 3)
T: B.write(bt + 4)
B ends at 204: U's update is lost. Correct: B = 207!!
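The lost update can be replayed deterministically. The sketch below is illustrative only: the `run` helper and the tuple format are hypothetical, and the starting balances of A and C are assumed (only B = 200 comes from the slide).

```python
def run(schedule, accounts):
    """Execute (transaction, action, account, delta) steps in the given order."""
    accounts = dict(accounts)
    local = {}  # per-transaction local variable (bt, bu)
    for txn, action, account, delta in schedule:
        if action == "read":
            local[txn] = accounts[account]
        else:  # "write": store the locally held value plus delta
            accounts[account] = local[txn] + delta
    return accounts


start = {"A": 100, "B": 200, "C": 300}  # A and C are assumed values
interleaved = [
    ("T", "read", "A", 0), ("T", "write", "A", -4),
    ("U", "read", "C", 0), ("U", "write", "C", -3),
    ("T", "read", "B", 0),                       # bt = 200
    ("U", "read", "B", 0), ("U", "write", "B", +3),
    ("T", "write", "B", +4),                     # overwrites U's update
]
# Serial execution: all of T, then all of U.
serial = [s for s in interleaved if s[0] == "T"] + \
         [s for s in interleaved if s[0] == "U"]

lost = run(interleaved, start)
good = run(serial, start)
```

With these inputs `lost["B"]` is 204 while `good["B"]` is 207, which is the discrepancy the slide points at.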
24Transactions Concurrency
- The inconsistent retrieval problem
Transaction T: Withdraw(A,50); Deposit(B,50)
Transaction U: BranchTotal()
27Transactions Concurrency
- The inconsistent retrieval problem
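Transaction T: A → B: 50
Transaction U: BranchTotal
T: bt = A.read(); A.write(bt - 50)
U: bu = A.read(); bu = bu + B.read(); bu = bu + C.read()
T: bt = B.read(); B.write(bt + 50)
U's total is 50 too low: it sees A after the withdrawal but B before the deposit.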
28Transactions Concurrency
- Illustration of well known problems
- the lost update problem
- inconsistent retrievals
- elements of solution
- execute all transactions serially?
- No concurrency → unacceptable
- execute transactions in such a way that the overall execution is equivalent to some serial execution
- sufficient? Yes
- how? Concurrency control
33Transactions Concurrency
- The lost update problem: serially equivalent interleaving
Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read(); A.write(bt - 4)
U: bu = C.read(); C.write(bu - 3)
T: bt = B.read(); B.write(bt + 4)
U: bu = B.read(); B.write(bu + 3)
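Whether an interleaving of two transactions is serially equivalent can be checked by looking at the order of their conflicting operations. The following is a minimal sketch for exactly two transactions; the function names and the `(transaction, action, item)` tuple format are hypothetical. With more than two transactions a cycle check over all pairwise orders would be needed instead.

```python
from itertools import combinations


def conflicts(op1, op2):
    """Two operations conflict if they touch the same item and one is a write."""
    return op1[2] == op2[2] and "write" in (op1[1], op2[1])


def serially_equivalent(schedule):
    """schedule: list of (transaction, 'read'|'write', item) in execution order."""
    order = set()  # (first, second) for every conflicting pair
    for a, b in combinations(schedule, 2):  # a is executed before b
        if a[0] != b[0] and conflicts(a, b):
            order.add((a[0], b[0]))
    # Equivalent to a serial order iff all conflicts point the same way.
    return not any((second, first) in order for first, second in order)


lost_update = [("T", "read", "B"), ("U", "read", "B"),
               ("U", "write", "B"), ("T", "write", "B")]
equivalent = [("T", "read", "B"), ("T", "write", "B"),
              ("U", "read", "B"), ("U", "write", "B")]
```

Only the operations on B are listed because A and C are each touched by a single transaction and therefore cannot conflict.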
34Transactions Recovery
- Illustration of well known problems
- a dirty read
- premature write
- operations used + implementations
- Withdraw(A, n): b = A.read(); A.write(b - n)
- Deposit(A, n): b = A.read(); A.write(b + n)
35Transactions Recovery
Transaction T: Deposit(A,4)
Transaction U: Deposit(A,3)
Interleaved execution and abort → ?
38Transactions Recovery
Transaction T: 4 → A
Transaction U: 3 → A
T: bt = A.read(); A.write(bt + 4)
U: bu = A.read(); A.write(bu + 3)
U: Commit
T: Abort
39Transactions Recovery
- Premature write or over-writing uncommitted values
Transaction T: Deposit(A,4)
Transaction U: Deposit(A,3)
Interleaved execution and abort → ?
42Transactions Recovery
- Over-writing uncommitted values
Transaction T: 4 → A
Transaction U: 3 → A
T: bt = A.read(); A.write(bt + 4)
U: bu = A.read(); A.write(bu + 3)
Abort
43Transactions Recovery
- Illustration of well known problems
- a dirty read
- premature write
- elements of solution
- Cascading aborts: a transaction reading uncommitted data must be aborted if the transaction that modified the data aborts
- to avoid cascading aborts, transactions can only read data written by committed transactions
- undo of write operations must be possible
44Transactions Recovery
- how to preserve data despite subsequent failures?
- usually by using stable storage
- two copies of data stored
- in separate parts of disks
- not decay related (probability of both parts
corrupted is small)
45Nested Transactions
- Transactions composed of several sub-transactions
- Why nesting?
- Modular approach to structuring transactions in applications
- means of controlling concurrency within a transaction
- concurrent sub-transactions accessing shared data are serialized
- a finer grained recovery from failures
- sub-transactions fail independently
46Nested Transactions
T: Transfer
T1: Deposit
T2: Withdraw
- Sub-transactions commit or abort independently
- without effect on outcome of other sub-transactions or enclosing transactions
- effect of sub-transaction becomes durable only when top-level transaction commits
47Concurrency control locking
- Environment
- shared data in a single server (this section)
- many competing clients
- problem
- realize transactions
- maximize concurrency
- solution serial equivalence
- difference with mutual exclusion?
48Concurrency control locking
- Protocols
- Locks
- Optimistic Concurrency Control
- Timestamp Ordering
49Concurrency control locking
- Example
- access to shared data within a transaction → lock (data reserved for ...)
- exclusive locks
- exclude access by other transactions
50Concurrency control locking
- Same example (lost update) with locking
Transaction T: Withdraw(A,4); Deposit(B,4)
Transaction U: Withdraw(C,3); Deposit(B,3)
Colour of data shows owner of lock
61Concurrency control locking
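Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read()
T: A.write(bt - 4)
U: bu = C.read()
U: C.write(bu - 3)
T: bt = B.read()
U: bu = B.read() (waits for T's lock on B; proceeds after CloseTransaction(T))
T: B.write(bt + 4)
T: CloseTransaction(T)
U: B.write(bu + 3)
U: CloseTransaction(U)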
62Concurrency control locking
- Basic elements of protocol
- serial equivalence
- requirements
- all of a transaction's accesses to a particular data item should be serialized with respect to accesses by other transactions
- all pairs of conflicting operations of 2 transactions should be executed in the same order
- how?
- A transaction is not allowed any new locks after it has released a lock
- Two-phase locking
63Concurrency control locking
- Two-phase locking
- Growing Phase
- new locks can be acquired
- Shrinking Phase
- no new locks
- locks are released
64Concurrency control locking
- Basic elements of protocol
- serial equivalence → two-phase locking
- hide intermediate results
- conflict between
- release of lock → access by other transactions possible
- access should be delayed till commit/abort of transaction
- how?
- New mechanism?
- (better) release of locks only at commit/abort
- strict two-phase locking
- locks held till end of transaction
65Concurrency control locking
- How to increase concurrency and preserve serial equivalence?
- Granularity of locks
- Appropriate locking rules
66Concurrency control locking
- Granularity of locks
- observations
- large number of data items on server
- typical transaction needs only a few items
- conflicts unlikely
- large granularity
- limits concurrent access
- example: all accounts in a branch of a bank are locked together
- small granularity
- overhead
67Concurrency control locking
- Appropriate locking rules
- when conflicts?
- Read/Write locks
68Concurrency control locking
For one data item
69Concurrency control locking
- Strict two-phase locking
- locking
- done by server (containing data item)
- unlocking
- done by commit/abort of the transactional service
70Concurrency control locking
- Use of locks in strict two-phase locking
- when an operation accesses a data item
- not locked yet
- lock set; operation proceeds
- conflicting lock set by another transaction
- transaction must wait till ...
- non-conflicting lock set by another transaction
- lock shared; operation proceeds
- locked by same transaction
- lock promoted if necessary; operation proceeds
71Concurrency control locking
- Use of locks on strict two-phase locking
- when an operation accesses a data item
- when a transaction is committed/aborted
- server unlocks all data items locked for the
transaction
72Concurrency control locking
- Lock implementation
- lock manager
- managing table of locks
- transaction identifiers
- identifier of (locked) data item
- lock type
- condition variable
- for waiting transactions
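A lock manager along these lines can be sketched with a lock table and one condition variable. This is an assumption-laden illustration, not the course's implementation: the class and method names are hypothetical, a single `threading.Condition` serves all items, and the `timeout` parameter is added only so that a waiting request can give up.

```python
import threading


class LockManager:
    """Sketch of a lock table for strict two-phase locking (read/write locks)."""

    def __init__(self):
        self._cond = threading.Condition()   # waiting transactions block here
        self._locks = {}  # item -> {"mode": "read"|"write", "holders": set}

    def _compatible(self, item, tid, mode):
        entry = self._locks.get(item)
        if entry is None:
            return True                      # not locked yet
        others = entry["holders"] - {tid}
        if not others:
            return True                      # locked by same transaction only
        # Another transaction holds a lock: only read locks can be shared.
        return mode == "read" and entry["mode"] == "read"

    def acquire(self, item, tid, mode, timeout=None):
        with self._cond:
            granted = self._cond.wait_for(
                lambda: self._compatible(item, tid, mode), timeout)
            if not granted:
                return False                 # gave up after the timeout
            entry = self._locks.setdefault(
                item, {"mode": mode, "holders": set()})
            entry["holders"].add(tid)
            if mode == "write":
                entry["mode"] = "write"      # promotion from read to write
            return True

    def release_all(self, tid):
        """Called at commit/abort: strict 2PL releases everything at the end."""
        with self._cond:
            for item in list(self._locks):
                holders = self._locks[item]["holders"]
                holders.discard(tid)
                if not holders:
                    del self._locks[item]
            self._cond.notify_all()          # wake up waiting transactions


manager = LockManager()
first = manager.acquire("B", "T", "read")
second = manager.acquire("B", "U", "read")                  # shared read lock
blocked = manager.acquire("B", "T", "write", timeout=0.01)  # U still reads B
manager.release_all("U")
promoted = manager.acquire("B", "T", "write", timeout=0.01)
```

Without the timeout, the third call would wait until U commits or aborts, which is exactly the situation in which two promoting readers can deadlock.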
73Concurrency control locking
- Deadlocks
- a state in which each member of a group of transactions is waiting for some other member to release a lock
- no progress possible!
- Example with read/write locks
74Concurrency control locking
- Same example (lost update) with locking
Transaction T: Withdraw(A,4); Deposit(B,4)
Transaction U: Withdraw(C,3); Deposit(B,3)
Colour of data shows owner of lock
81Concurrency control locking
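Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read()
T: A.write(bt - 4)
U: bu = C.read()
U: C.write(bu - 3)
T: bt = B.read()
U: bu = B.read()
T: B.write(bt + 4) (waits for U's read lock on B)
U: B.write(bu + 3) (waits for T's read lock on B)
Deadlock!!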
82Concurrency control locking
- Solutions to the Deadlock problem
- Prevention
- by locking all data items used by a transaction when it starts
- by requesting locks on data items in a predefined order
- Evaluation
- impossible for interactive transactions
- reduction of concurrency
83Concurrency control locking
- Solutions to the Deadlock problem
- Detection
- the server keeps track of a wait-for graph
- lock: edge is added
- unlock: edge is removed
- the presence of cycles may be checked
- when an edge is added
- periodically
- example
86Concurrency control locking
Transaction T: A → B: 4
Transaction U: C → B: 3
T: bt = A.read()
T: A.write(bt - 4)
U: bu = C.read()
U: C.write(bu - 3)
T: bt = B.read()
U: bu = B.read()
T: B.write(bt + 4)
U: B.write(bu + 3)
87Concurrency control locking
[Figure: lock ownership graph. A is held by T, C is held by U, B is held by both T and U; T and U each wait for B]
89Concurrency control locking
[Figure: wait-for graph. T and U wait for each other through B]
Cycle → deadlock
90Concurrency control locking
- Solutions to the Deadlock problem
- Detection
- the server keeps track of a wait-for graph
- the presence of cycles must be checked
- once a deadlock is detected, the server must select a transaction and abort it (to break the cycle)
- choice of transaction? Important factors
- age of transaction
- number of cycles the transaction is involved in
- Solutions to the Deadlock problem
- Timeouts
- locks granted for a limited period of time
- within period: lock invulnerable
- after period: lock vulnerable (it may be broken when another transaction is waiting for it)
92Overview
- Transactions
- Distributed transactions
- Flat and nested distributed transactions
- Atomic commit protocols
- Concurrency in distributed transactions
- Distributed deadlocks
- Transaction recovery
- Replication
93Distributed transactions
- Definition
- Any transaction whose activities involve multiple servers
- Examples
- simple: client accesses several servers
- nested: server accesses several other servers
94Distributed transactions
- Serial execution of requests on different servers
95Distributed transactions
- Serial or parallel execution of requests on
different servers
96Distributed transactions
97Distributed transactions
- Commit agreement between all servers involved
- to commit
- to abort
- take one server as coordinator
- simple (?) protocol
- single point of failure?
- tasks of the coordinator
- keep track of other servers, called workers
- responsible for final decision
98Distributed transactions
- New service operations
- AddServer(TransID, CoordinatorID)
- called by clients
- first operation on server that has not joined the transaction yet
- NewServer(TransID, WorkerID)
- called by new server on the coordinator
- coordinator records ServerID of the worker in its workers list
99Distributed transactions
Client: T = OpenTransaction(); X.Withdraw(A,4); Z.Deposit(C,4); Y.Withdraw(B,3); Z.Deposit(D,3); CloseTransaction(T)
[Figure: server X (coordinator) holds account A; server Y (worker) holds account B; server Z (worker) holds accounts C and D]
1. T = X.OpenTransaction()
2. X.Withdraw(A,4)
3. Z.AddServer(T, X)
4. X.NewServer(T, Z)
5. Z.Deposit(C,4)
6. Y.AddServer(T, X)
7. X.NewServer(T, Y)
8. Y.Withdraw(B,3)
9. Z.Deposit(D,3)
10. X.CloseTransaction(T)
105Overview
- Transactions
- Distributed transactions
- Flat and nested distributed transactions
- Atomic commit protocols
- Concurrency in distributed transactions
- Distributed deadlocks
- Transaction recovery
- Replication
106Atomic Commit protocol
- Elements of the protocol
- each server is allowed to abort its part of a transaction
- if a server votes to commit it must ensure that it will eventually be able to carry out this commitment
- the transaction must be in the prepared state
- all altered data items must be on permanent storage
- if any server votes to abort, then the decision must be to abort the transaction
107Atomic Commit protocol
- Elements of the protocol (cont.)
- the protocol must work correctly, even when
- some servers fail
- messages are lost
- servers are temporarily unable to communicate
108Atomic Commit protocol
- Protocol
- Phase 1 voting phase
- Phase 2 completion according to outcome of vote
109Atomic Commit protocol
Step 1 (coordinator): status prepared to commit
Step 2 (worker): status prepared to commit
Step 3 (coordinator): (counting votes) status committed
Step 4 (worker): status committed
Coordinator: status done
110Atomic Commit protocol
- Protocol Phase 1: voting phase
- Coordinator: for operation CloseTransaction
- sends CanCommit to each worker
- behaves as worker in phase 1
- waits for replies from workers
- Worker: when receiving CanCommit
- if the worker can commit the transaction
- saves data items
- sends Yes to coordinator
- if the worker cannot commit the transaction
- sends No to coordinator
- clears data structures, removes locks
111Atomic Commit protocol
- Protocol Phase 2
- Coordinator: collecting votes
- all votes Yes
- commit transaction: send DoCommit to workers
- one vote No
- abort transaction
- Worker: voted Yes, waits for decision of coordinator
- receives DoCommit
- makes committed data available; removes locks
- receives AbortTransaction
- clears data structures; removes locks
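The two phases can be sketched with in-process objects standing in for servers. This is a simplified illustration that ignores failures, lost messages and timeouts; the `Worker` class, its `state` field and the function `close_transaction` are hypothetical names, and the message names are mapped onto method calls.

```python
class Worker:
    def __init__(self, name, can_commit=True):
        self.name, self.vote, self.state = name, can_commit, "active"

    def can_commit(self, tid):
        # Vote Yes only after saving data items (omitted); No aborts locally.
        self.state = "prepared" if self.vote else "aborted"
        return self.vote

    def do_commit(self, tid):
        self.state = "committed"    # make data available, remove locks

    def abort_transaction(self, tid):
        self.state = "aborted"      # clear data structures, remove locks


def close_transaction(tid, workers):
    """Coordinator side: phase 1 collects votes, phase 2 sends the decision."""
    votes = [w.can_commit(tid) for w in workers]     # phase 1: voting
    if all(votes):
        for w in workers:                            # phase 2: commit
            w.do_commit(tid)
        return "Commit"
    for w, vote in zip(workers, votes):              # phase 2: abort
        if vote:                                     # No-voters already aborted
            w.abort_transaction(tid)
    return "Abort"


ok_workers = [Worker("Y"), Worker("Z")]
ok_outcome = close_transaction("T", ok_workers)
bad_workers = [Worker("Y"), Worker("Z", can_commit=False)]
bad_outcome = close_transaction("T", bad_workers)
```

A single No vote is enough to abort everywhere, and workers that voted Yes stay in the prepared state until the coordinator's decision arrives.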
112Atomic Commit protocol
- Timeouts
- worker did all/some operations and waits for CanCommit
- unilateral abort possible
- coordinator waits for votes of workers
- unilateral abort possible
- worker voted Yes and waits for final decision of coordinator
- wait unavoidable
- extensive delay possible
- additional operation GetDecision can be used to get decision from coordinator or other workers
113Atomic Commit protocol
- Performance
- C → W: CanCommit, N-1 messages
- W → C: Yes/No, N-1 messages
- C → W: DoCommit, N-1 messages
- W → C: HaveCommitted, N-1 messages
- (unavoidable) delays possible
114Atomic Commit protocol
- Nested Transactions
- top-level transaction + subtransactions
- transaction tree
115Atomic Commit protocol
[Figure: transaction tree. T has children T1 and T2; T1 has children T11 and T12; T2 has children T21 and T22]
116Atomic Commit protocol
- Nested Transactions
- top-level transaction + subtransactions
- transaction tree
- coordinator: top-level transaction
- subtransaction identifiers
- globally unique
- allow derivation of ancestor transactions (why necessary?)
117Atomic Commit protocol
- Nested Transactions Transaction IDs
118Atomic Commit protocol
- Upon completion of a subtransaction
- independent decision to commit or abort
- commit of subtransaction
- only provisionally
- status (including status of descendants) reported to parent
- final outcome dependent on its ancestors
- abort of subtransaction
- implies abort of all its descendants
- abort reported to its parent (always possible?)
119Atomic Commit protocol
- Data structures
- commit list: list of all committed (sub)transactions
- aborts list: list of all aborted (sub)transactions
- example
146Atomic Commit protocol
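[Figure: transaction tree with (provisional) outcomes. T is the top-level transaction; T1 (commit) has children T11 (abort) and T12 (commit); T2 (abort) has children T21 (commit) and T22 (commit). Server labels Z and N appear in the figure.]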
147Atomic Commit protocol
148Atomic Commit protocol
149Atomic Commit protocol
- Data structures final data
150Atomic Commit protocol
- Algorithm of coordinator (flat protocol)
- Phase 1
- send CanCommit to each worker in commit list, with arguments
- TransactionId T
- abort list
- coordinator behaves as worker
- Phase 2 (as for non-nested transactions)
- all votes Yes
- commit transaction send DoCommit to workers
- one vote No
- abort transaction
151Atomic Commit protocol
- Algorithm of worker (flat protocol)
- Phase 1 (after receipt of CanCommit)
- at least one (provisionally) committed descendant of top-level transaction
- transactions with ancestors in abort list are aborted
- prepare for commit of other transactions
- send Yes to coordinator
- no (provisionally) committed descendant
- send No to coordinator
- Phase 2 (as for non-nested transactions)
152Atomic Commit protocol
- Algorithm of worker (flat protocol)
- Phase 1 (after receipt of CanCommit)
- Phase 2: voted Yes, waits for decision of coordinator
- receives DoCommit
- makes committed data available; removes locks
- receives AbortTransaction
- clears data structures; removes locks
153Atomic Commit protocol
- Timeouts
- same 3 as above
- worker did all/some operations and waits for CanCommit
- coordinator waits for votes of workers
- worker voted Yes and waits for final decision of coordinator
- provisionally committed child with an aborted ancestor
- does not participate in algorithm
- has to make an enquiry itself
- when?
156Overview
- Transactions
- Distributed transactions
- Flat and nested distributed transactions
- Atomic commit protocols
- Concurrency in distributed transactions
- Distributed deadlocks
- Transaction recovery
- Replication
157Distributed transactions Locking
- Locks are maintained locally (at each server)
- it decides whether
- to grant a lock
- to make the requesting transaction wait
- it cannot release the lock until it knows whether the transaction has been
- committed
- aborted
- at all servers
- deadlocks can occur
158Distributed transactions Locking
- Locking rules for nested transactions
- child transaction inherits locks from parents
- when a nested transaction commits, its locks are inherited by its parent
- when a nested transaction aborts, its locks are removed
- a nested transaction can get a read lock when all the holders of write locks (on that data item) are ancestors
- a nested transaction can get a write lock when all the holders of read and write locks (on that data item) are ancestors
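The last two rules reduce to a subset test on the current lock holders. A minimal sketch, assuming transactions are identified by strings, that the caller supplies the requester's set of ancestors, and that the requester itself is not listed among the holders (all hypothetical representation choices):

```python
def can_acquire(mode, ancestors, read_holders, write_holders):
    """Nested-transaction locking rule for one data item.

    mode: "read" or "write"
    ancestors: set of the requesting transaction's ancestors
    read_holders / write_holders: transactions currently holding each lock
    """
    if mode == "read":
        blocking = set(write_holders)                       # only writers block
    else:
        blocking = set(read_holders) | set(write_holders)   # everyone blocks
    return blocking <= set(ancestors)                       # all must be ancestors


# T11 (ancestors T1 and T) asks for locks on a data item.
read_ok = can_acquire("read", {"T", "T1"}, set(), {"T1"})
write_blocked = can_acquire("write", {"T", "T1"}, {"T2"}, set())
```

Here `read_ok` is true because the only write lock is held by the parent T1, while `write_blocked` is false because T2 is not an ancestor of T11.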
159Distributed transactions Locking
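[Figure: transaction tree T with subtransactions T1 (T11, T12) and T2 (T21, T22), and data item A. Server labels Z and N appear in the figure.]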
160Overview
- Transactions
- Distributed transactions
- Flat and nested distributed transactions
- Atomic commit protocols
- Concurrency in distributed transactions
- Distributed deadlocks
- Transaction recovery
- Replication
161Distributed deadlocks
- Single server approaches
- prevention: difficult to apply
- timeouts: which value, given variable delays?
- Detection
- global wait-for graph can be constructed from local ones
- cycle in global graph possible without cycle in local graph
162Distributed transactions Deadlocks
[Figure: distributed wait-for graph over transactions U, V and W]
163Distributed transactions Deadlocks
- Algorithms
- centralised deadlock detection: not a good idea
- depends on a single server
- cost of transmission of local wait-for graphs
- distributed algorithm
- complex
- phantom deadlocks
- edge chasing approach
164Distributed transactions Deadlocks
- Phantom deadlocks
- deadlock detected that is not really a deadlock
- during deadlock detection
- while constructing global wait-for graph
- waiting transaction is aborted
165Distributed transactions Deadlocks
- Edge Chasing
- distributed approach to deadlock detection
- no global wait-for graph is constructed
- servers attempt to find cycles
- by forwarding probes (= messages) that follow edges of the wait-for graph throughout the distributed system
166Distributed transactions Deadlocks
- Edge Chasing
- three steps
- initiation: transaction starts waiting
- new probe constructed
- detection: probe received
- extend probe
- check for loop
- forward new probe
- resolution
167Distributed transactions Deadlocks
- Edge Chasing initiation
- send out probe
- when transaction T starts waiting for U (and U is already waiting for ...)
- in case of lock sharing, different probes are forwarded
Probe: T → U
168Distributed transactions Deadlocks
[Figure: initiation of a probe; labels W, C, Z, U and V appear in the figure]
169Distributed transactions Deadlocks
- Edge Chasing detection
- when receiving probe T → U
- check if U is waiting
- if U is waiting for V (and V is waiting), add V to probe: T → U → V
- check for loop in probe?
- yes → deadlock
- no → forward new probe
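Probe forwarding can be imitated in a single process. In the sketch below one dictionary stands in for the wait-for edges that are really spread over several servers, each transaction waits for at most one other (no lock sharing), and the function name `chase` is hypothetical.

```python
def chase(waits_for, start):
    """Follow wait-for edges from `start`, extending the probe at each step.

    waits_for: dict transaction -> the transaction it is waiting for.
    Returns the probe once it contains a loop, or None if the chain ends.
    """
    probe = [start]
    current = start
    while current in waits_for:
        nxt = waits_for[current]
        probe.append(nxt)              # extend probe
        if nxt in probe[:-1]:
            return probe               # loop in probe: deadlock
        current = nxt                  # forward new probe
    return None


# W waits for U, U waits for V, V waits for W.
deadlock_probe = chase({"W": "U", "U": "V", "V": "W"}, "W")
no_probe = chase({"T": "U"}, "T")
```

In a real system each loop iteration corresponds to a message sent to the server where the next transaction is blocked, so no server ever needs the complete graph.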
171Distributed transactions Deadlocks
- Edge Chasing resolution
- abort one transaction
- problem?
- Every waiting transaction can initiate deadlock detection
- detection may happen at different servers
- several transactions may be aborted
- solution: transaction priorities
172Distributed transactions Deadlocks
- Edge Chasing transaction priorities
- assign priority to each transaction, e.g. using timestamps
- solution of problem above
- abort transaction with lowest priority
- if different servers detect same cycle, the same transaction will be aborted
173Distributed transactions Deadlocks
- Edge Chasing transaction priorities
- other improvements
- number of initiated probe messages ↓
- detection only initiated when higher priority transaction waits for a lower priority one
- number of forwarded probe messages ↓
- probes travel downhill, from transactions with high priority to transactions with lower priorities
- probe queues required: more complex algorithm
174Overview
- Transactions
- Distributed transactions
- Flat and nested distributed transactions
- Atomic commit protocols
- Concurrency in distributed transactions
- Distributed deadlocks
- Transaction recovery
- Replication
175Transactions and failures
- Introduction
- Approaches to fault-tolerant systems
- replication
- instantaneous recovery from a single fault
- expensive in computing resources
- restart and restore consistent state
- less expensive
- requires stable storage
- slow(er) recovery process
176Transactions and failures
- Overview
- Stable storage
- Transaction recovery
- Recovery of the two-phase commit protocol
177Transactions and failures Stable storage
- Ensures that any essential permanent data will be recoverable after any single system failure
- allow system failures
- during a disk write
- damage to any single disk block
- hardware solution → RAID technology
- software solution
- based on pairs of blocks for same data item
- checksum to determine whether block is good or
bad
178Transactions and failures Stable storage
- Based on the following invariant
- not more than one block of any pair is bad
- if both are good
- same data
- except during execution of write operation
- write operation
- maintains invariant
- writes on both blocks are done strictly sequentially
- restart of stable storage server after crash
- recovery procedure to restore invariant
179Transactions and failures Stable storage
- Recovery for a pair
- both good and the same
- ok
- one good, one bad
- copy good block to bad block
- both good and different
- copy one block to the other
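The pair-of-blocks scheme can be sketched with a checksum per block. This is an illustration under stated assumptions: blocks are dictionaries held in a two-element list, `zlib.crc32` plays the role of the checksum, and when both blocks are good but different the first block is copied over the second, on the assumption that the first block is always written first and is therefore the newer one.

```python
import zlib


def make_block(data: bytes):
    return {"data": data, "crc": zlib.crc32(data)}


def is_good(block):
    return block is not None and zlib.crc32(block["data"]) == block["crc"]


def stable_write(pair, data: bytes):
    """Write both copies strictly in sequence: block 0 first, then block 1."""
    pair[0] = make_block(data)
    pair[1] = make_block(data)


def recover(pair):
    """Restore the invariant for one pair after a crash."""
    good0, good1 = is_good(pair[0]), is_good(pair[1])
    if good0 and good1:
        if pair[0]["data"] != pair[1]["data"]:
            pair[1] = dict(pair[0])    # crash between the two writes
    elif good0:
        pair[1] = dict(pair[0])        # copy good block to bad block
    elif good1:
        pair[0] = dict(pair[1])
    else:
        raise RuntimeError("both blocks bad: invariant already violated")


pair = [None, None]
stable_write(pair, b"balance=100")
pair[0] = make_block(b"balance=96")    # simulated crash after the first write
recover(pair)
```

After `recover`, both blocks hold `b"balance=96"`; had the first block been corrupted instead, the old value in the second block would have been restored.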
180Transactions and failures
- Overview
- Stable storage
- Transaction recovery
- Recovery of the two-phase commit protocol
181Transactions and failures Transaction recovery
- atomic property of transaction implies
- durability
- data items stored in permanent storage
- data will remain available indefinitely
- failure atomicity
- effects of transactions are atomic even when servers fail
- recovery should ensure durability and failure atomicity
182Transactions and failures Transaction recovery
- Assumptions about servers
- servers keep data in volatile storage
- committed data recorded in a recovery file
- single mechanism: recovery manager
- save data items in permanent storage for committed transactions
- restore the server's data items after a crash
- reorganize the recovery file to improve performance of recovery
- reclaim storage space in the recovery file
183Transactions and failures Transaction recovery
- Elements of algorithm
- each server maintains an intention list for all of its active transactions: pairs of
- name
- new value
- decision of server: prepared to commit a transaction
- intention list saved in the recovery file (stable storage)
- server receives DoCommit
- commit recorded in recovery file
- after a crash: based on recovery file
- effects of committed transactions restored (in correct order)
- effects of other transactions ignored
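The elements above can be sketched as follows, assuming an append-only Python list stands in for the recovery file on stable storage. The class `RecoveryManager`, the function `restore`, and the record layout are hypothetical names chosen for this illustration, not the format used in the book.

```python
class RecoveryManager:
    """Keeps intention lists and appends records to a recovery file."""

    def __init__(self):
        self.recovery_file = []  # stands in for a file on stable storage
        self.intentions = {}     # transaction id -> list of (name, new value)

    def tentative_write(self, tid, name, value):
        # Record the intended update; the committed data is not touched yet.
        self.intentions.setdefault(tid, []).append((name, value))

    def prepare(self, tid):
        # Prepared to commit: save the intention list in the recovery file.
        pairs = list(self.intentions.get(tid, []))
        self.recovery_file.append(("prepared", tid, pairs))

    def commit(self, tid):
        # On DoCommit: record the commit in the recovery file.
        self.recovery_file.append(("committed", tid))

    def abort(self, tid):
        self.recovery_file.append(("aborted", tid))
        self.intentions.pop(tid, None)


def restore(recovery_file):
    """Rebuild the data items after a crash from the recovery file alone."""
    prepared = {}
    data = {}
    for record in recovery_file:
        kind, tid = record[0], record[1]
        if kind == "prepared":
            prepared[tid] = record[2]
        elif kind == "committed":
            # Applied in commit order, so later commits overwrite earlier ones.
            for name, value in prepared.get(tid, []):
                data[name] = value
    return data
```

A transaction that was prepared but never committed leaves only a `prepared` record behind, so `restore` skips its intention list.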
184Transactions and failures Transaction recovery
- Alternative implementations for recovery file
- logging technique
- shadow versions
- (see book for details)
185Transactions and failures
- Overview
- Stable storage
- Transaction recovery
- Recovery of the two-phase commit protocol
186Transactions and failures two-phase commit protocol
- Server can fail during commit protocol
- each server keeps its own recovery file
- 2 new status values
- done
- uncertain
187Transactions and failures two-phase commit protocol
- meaning of status values
- committed
- coordinator: outcome of votes is yes
- worker: protocol is complete
- done
- coordinator: protocol is complete
- uncertain
- worker: voted yes, outcome unknown
188Transactions and failures two-phase commit protocol
- Recovery actions (status_at_) in recovery file
- prepared_at_coordinator
- no decision before failure of server
- send AbortTransaction to all workers
- aborted_at_coordinator
- send AbortTransaction to all workers
- committed_at_coordinator
- decision to commit taken before crash
- send DoCommit to all workers
- resume protocol
189Transactions and failures two-phase commit protocol
- Recovery actions (status_at_) in recovery file
- committed_at_worker
- send HaveCommitted to coordinator
- uncertain_at_worker
- send GetDecision to coordinator to get status
- prepared_at_worker
- not yet voted yes
- unilateral abort possible
- done_at_coordinator
- no action required
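The recovery actions for both roles can be summarized as a lookup on the last status found in the recovery file. This is only a sketch: the table `RECOVERY_ACTIONS` and the function `recovery_action` are hypothetical names, the actions are returned as descriptive strings instead of being sent as messages, and for `prepared` at a worker the sketch picks the unilateral abort that the slides describe as possible.

```python
# (role, last status in recovery file) -> action taken on restart
RECOVERY_ACTIONS = {
    ("coordinator", "prepared"): "send AbortTransaction to all workers",
    ("coordinator", "aborted"): "send AbortTransaction to all workers",
    ("coordinator", "committed"): "send DoCommit to all workers",
    ("coordinator", "done"): "no action required",
    ("worker", "committed"): "send HaveCommitted to coordinator",
    ("worker", "uncertain"): "send GetDecision to coordinator",
    # Assumption: the worker uses its option to abort unilaterally.
    ("worker", "prepared"): "abort unilaterally",
}


def recovery_action(role, status_records):
    """Return the action for the most recent status logged for a transaction.

    status_records is the list of status values, oldest first, that the
    recovery file holds for one transaction at this server.
    """
    last_status = status_records[-1]
    return RECOVERY_ACTIONS[(role, last_status)]
```

For example, a worker whose log ends in `uncertain` has voted yes and must ask the coordinator for the outcome, while a coordinator whose log ends in `prepared` took no decision before failing and can safely abort.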
190Overview
- Transactions
- Distributed transactions
- Replication
- System model and group communication
- Fault-tolerant services
- Highly available services
- Transactions with replicated data
191Replication
- A technique for enhancing services
- Performance enhancement
- Increased availability
- Fault tolerance
- Requirements
- Replication transparency
- Consistency
192Overview
- Transactions
- Distributed transactions
- Replication
- System model and group communication
- Fault-tolerant services
- Highly available services
- Transactions with replicated data
193System model and group communication
194System model and group communication
- 5 phases in the execution of a request
- FE (front end) issues requests to one or more RMs (replica managers)
- Coordination needed to execute requests consistently
- FIFO
- Causal
- Total
- Execution by all managers, perhaps tentatively
- Agreement
- Response
195System model and group communication
- Need for dynamic groups!
- Role of group membership service
- Interface for group membership changes: create/destroy groups, add process
- Implementing a failure detector: monitor group members
- Notifying members of group membership changes
- Performing group address expansion
- Handling network partitions: group is
- Reduced: primary-partition
- Split: partitionable
196System model and group communication
197System model and group communication
- View delivery
- To all members when a change in membership occurs
- not the same as receiving a view (deliver ≠ receive)
- Event occurring in a view v(g) at process p
- Basic requirements for view delivery
- Order: if process p delivers v(g) and then v'(g), then no other process delivers v'(g) before v(g)
- Integrity: if p delivers v(g) then p ∈ v(g)
- Non-triviality: if q joins the group and remains reachable, then eventually q ∈ v(g) at p
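The order and integrity requirements for view delivery can be checked over a recorded trace, as in the sketch below. It assumes a trace is a dictionary from process name to the list of views that process delivered, with each view given as an (identifier, member set) pair; the function name `check_view_delivery` and this trace format are hypothetical. Non-triviality is a liveness property and cannot be checked on a finite trace, so it is left out.

```python
def check_view_delivery(delivered):
    """Check integrity and order on a trace of delivered views.

    delivered maps each process to the list of views it delivered, in
    delivery order. Each view is a (view_id, frozenset_of_members) pair.
    """
    # Integrity: a process only delivers views of which it is a member.
    for process, views in delivered.items():
        for _, members in views:
            if process not in members:
                return False

    # Order: collect every "delivered x before y" pair over all processes,
    # then reject the trace if some pair also occurs in the opposite order.
    before = set()
    for views in delivered.values():
        ids = [view_id for view_id, _ in views]
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                before.add((ids[i], ids[j]))
    return not any((b, a) in before for (a, b) in before)
```

A trace in which p delivers v1 then v2 while q delivers v2 then v1 is rejected by the order check.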
198System model and group communication
- View-synchronous group communication
- Reliable multicast that handles changing group views
- Guarantees
- Agreement: correct processes deliver the same set of messages in any given view
- Integrity: if a process delivers m, it will not deliver it again
- Validity: if the system fails to deliver m to q, then other processes will deliver v'(g) (= v(g) - {q}) before delivering m
199System model and group communication
200Overview
- Transactions
- Distributed transactions
- Replication
- System model and group communication
- Fault-tolerant services
- Highly available services
- Transactions with replicated data
201Fault-tolerant services
- Goal: provide a service that is correct despite up to f process failures
- Assumpti