Title: Transactions and Concurrency
1 Transactions and Concurrency
2Objectives
- Function and importance of transactions.
- Properties of transactions.
- Concurrency Control
- Meaning of serializability.
- How locking can ensure serializability.
- Deadlock and how it can be resolved.
- How timestamping can ensure serializability.
3Transaction Support
- Transaction
- Action, or series of actions, carried out by
user or application, which accesses or changes
contents of database. - Logical unit of work on the database.
- Transforms database from one consistent state to
another, although consistency may be violated
during transaction.
4Example Transaction
Transfer 50 from account A to account
B read(A) AA-50 write(A) read(B) BB50 write(B)
Note, for the transaction to be consistant, AB
have to be the same when it's done. This IS true
at the end, but it's NOT true during the
transaction.
5Transaction Support
- Can have one of two outcomes
- Success - transaction commits and database
reaches a new consistent state. - Failure - transaction aborts, i.e. cancelled, and
database must be restored to consistent state
before it started. - Such a transaction is rolled back or undone.
- Committed transaction cannot be aborted.
- Aborted transaction that is rolled back can be
restarted later.
6Concurrency Control
- Process of managing simultaneous operations on
the database without having them interfere with
one another. - Prevents interference when two or more users are
accessing database simultaneously and at least
one is updating data. - Although two transactions may be correct in
themselves, interleaving of operations may
produce an incorrect result.
7Need for Concurrency Control
- Three examples of potential problems caused by
concurrency - Lost update problem.
- Uncommitted dependency problem.
- Inconsistent analysis problem.
8Lost Update Problem
- Successfully completed update is overridden by
another user. - Uh-oh! Even though transaction 2 added 100
it doesn't show up - How do we avoid this problem?
9Uncommitted Dependency Problem
- One transaction sees the uncommitted intermediate
result of another. - T2 starts adding 100, but rolls back for some
reason. Unfortunately, T1 got the new value
before roll back. - How do we solve this problem?
10Inconsistent Analysis Problem
- Occurs when transaction reads several values but
second transaction updates some of them during
execution of first. - Sometimes referred to as dirty read or
unrepeatable read.
11Inconsistent Analysis Problem
- How do we solve this problem?
12Serializability
- Objective of a concurrency control protocol is to
schedule transactions in such a way as to avoid
any interference. - Could run transactions serially, but this limits
degree of concurrency or parallelism in system. - Serializability identifies those executions of
transactions guaranteed to ensure consistency.
13Jargon
- Schedule
- Sequence of reads/writes by set of concurrent
transactions. - Serial Schedule
- Schedule where operations of each transaction
are executed consecutively without any
interleaved operations from other transactions.
14A Serial Schedule Ends in a consistant state
Transaction 1 Transfer 50 from A to
B Transaction 2 Transfer 10 of A to
B Time T1 T2 A B 1 read(A) 100
100 2 AA-50 100 100 3 write(A)
50 100 4 read(B) 50 100 5 BB50
50 100 6 write(B) 50 150 7 r
ead(A) 50 150 8 temp A 0
.1 50 150 9 AA-temp 50 150 10 wr
ite(A) 45 150 11 read(B) 45 150 12
BBtemp 45 150 13 write(B) 45 1
55
15More Jargon Nonserial Schedule Serializable
- A Nonserial schedule is a schedule where
operations from set of concurrent transactions
are interleaved. - Objective of serializability is to find nonserial
schedules that allow transactions to execute
concurrently without interfering with one
another. - In other words, want to find nonserial schedules
that are equivalent to some serial schedule. Such
a schedule is called serializable.
16A Serializable Schedule Ends in a consistant
state
Time Tran1 Tran2 A B 1 read(A)
100 100 2 AA-50 100 100 3 write(A)
50 100 4 read(A) 50 100 5
temp A 0.1 50 100 6 A A
temp 50 100 7 write(A) 45 100 8 re
ad(B) 45 100 9 BB50 45 100 1
0 write(B) 45 150 11 read(B)
45 150 12 B B temp 45 155 13
write(B) 45 155
17A Nonserializable Schedule whence the extra 10?
Time Tran1 Tran2 A B 1 read(A)
100 100 2 AA-50 100 100 3 read(A
) 100 100 4 temp A
0.1 100 100 5 A A temp 100 100 6
write(A) 90 100 7 read(B) 90 10
0 8 write(A) 50 100 9 read(B)
50 100 10 BB50 50 100 11 write(B)
50 150 12 B B
temp 50 150 13 write(B) 50 160
18Is this schedule serializable?
- Want to see if the schedule is equivalent to a
serial schedule. - There are ways to compare two schedules to see if
they are equivalent. - Conflict serializable
- View serializable
19Conflict Serializability
- In determining serializability, only the ordering
of read and writes is important - (a) If two transactions only read a data item,
they do not conflict and order is not important. - (b) If two transactions either read or write
completely separate data items, they do not
conflict and order is not important. - (c) If one transaction writes a data item and
another reads or writes same data item, order of
execution is important.
20Example of Conflict
Time Tran1 Tran2 1 read(A) 2 write(A) 3
read(A) 4 write(A) 5 read(B)
6 write(B) 7 read(B) 8 write(B
) Time 4 and time 5 don't conflict they're
different items, so you can switch the
order Time 2 and 3 conflict you can't just
swap the order
21That last schedule IS conflict serializable
It's conflict serializable because you can turn
it into a serialized schedule by doing the
following steps 1. swap 4 and 5 2. swap 3
and 4 3. swap 5 and 6 4. swap 4 and
5 Time Tran1 Tran2 1 read(A) 2 write(
A) 3 read(B) 4 write(B) 5 r
ead(A) 6 write(A) 7 read(B) 8
write(B)
22Visual way Precedence Graph
- Create
- node for each transaction
- a directed edge Ti ? Tj, if Tj reads the value of
an item written by Ti - a directed edge Ti ? Tj, if Tj writes a value
into an item after it has been read or written by
Ti. - If precedence graph contains cycle schedule is
not conflict serializable.
23Example - Non-conflict serializable schedule
24View Serializability
- Offers less stringent definition of schedule
equivalence than conflict serializability. - Two schedules S1 and S2 are view equivalent if
- For each data item x, if Ti reads initial value
of x in S1, Ti must also read initial value of x
in S2. - For each read on x by Ti in S1, if value read by
x is written by Tj, Ti must also read value of x
produced by Tj in S2. - For each data item x, if last write on x
performed by Ti in S1, same transaction must
perform final write on x in S2.
25Example - View Serializable schedule
26How Long These Take To Compute
- To determine if something is conflict
serializable, construct a precidence diagram and
look for cycles. Finding cycles in a graph takes
O(N2) - To determine if something is view serializable is
much more computationally difficult in fact it
turns out to be NP complete it'll grow
exponentially with the number of transactions
27Concurrency Control Techniques
- Two basic concurrency control techniques
- Locking,
- Timestamping.
- Both are conservative approaches delay
transactions in case they conflict with other
transactions. - Optimistic methods assume conflict is rare and
only check for conflicts at commit.
28Locking
- Transaction uses locks to deny access to other
transactions and so prevent incorrect updates. - Most widely used approach to ensure
serializability. - Generally, a transaction must claim a shared
(read) or exclusive (write) lock on a data item
before read or write. - Lock prevents another transaction from modifying
item or even reading it, in the case of a write
lock.
29Locking - Basic Rules
- If transaction has shared lock on item, can read
but not update item. - If transaction has exclusive lock on item, can
both read and update item. - Reads cannot conflict, so more than one
transaction can hold shared locks simultaneously
on same item. - Exclusive lock gives transaction exclusive access
to that item.
30When to release a lock
You don't always want to release a lock as soon
as possible T1 transfer 50 from B to A, T2
display the sum of accounts A and
B write_lock(B) read(B) BB-50 write(B) unlock(B)
read_lock(A) read(A) unlock(A)
read_lock(B) read(B) unlock(B) di
splay(AB) WRONG ANSWER! ( write_lock(A) read(A
) AA50 write(A) unock(A)
31Two-Phase Locking (2PL)
- Transaction follows 2PL protocol if all locking
operations precede first unlock operation in the
transaction. - Two phases for transaction
- Growing phase - acquires all locks but cannot
release any locks. - Shrinking phase - releases locks but cannot
acquire any new locks.
32Preventing the lost update problem with 2PL
Transaction 1 debit account by 10 Transaction
2 credit account by 100 Time
T1 T2 1 begin 2 begin writ
e_lock(A) 3 write_lock(A) read(A) 4 WAIT
AA100 5 WAIT write(A) 6 WAIT
commit/unlock(A) 7 read(A) 8 AA-10 9 wri
te(A) 10 commit/unlock(A)
33Problems with 2PL
- If every transaction in a schedule follows 2PL,
schedule is conflict serializable. - However, problems can occur with interpretation
of when locks can be released. - Cascading rollback
- Deadlock
34Cascading Rollback
35Cascading Rollback
- Transactions conform to 2PL.
- T14 aborts.
- Since T15 is dependent on T14, T15 must also be
rolled back. Since T16 is dependent on T15, it
too must be rolled back. - This is called cascading rollback.
- To prevent this with 2PL, leave release of all
locks until end of transaction.
36Another Problem with 2PL Deadlock
Deadlock when two or more transactions are each
waiting for locks to be released that are held by
the other. Time T1 T2 1 begin 2 write
_lock(A) begin 3 read(A) write_lock(B) 4
AA-10 read(B) 5 write(A) BB100 6 writ
e_lock(B) write(B) 7 WAIT write_lock(A) 8
WAIT WAIT 9 WAIT WAIT 10 ( (
( ( ( ( 11 ( ( ( ( ( (
37Deadlock
- Only one way to break deadlock cancel one or
more of the transactions. - Deadlock should be transparent to user, so DBMS
should restart transaction(s). - Three general techniques for handling deadlock
- Timeouts.
- Deadlock prevention.
- Deadlock detection and recovery.
38Timeouts
- Transaction that requests lock will only wait for
a system-defined period of time. - If lock has not been granted within this period,
lock request times out. - In this case, DBMS assumes transaction may be
deadlocked, even though it may not be, and it
aborts and automatically restarts the
transaction.
39Deadlock Prevention
- DBMS looks ahead to see if transaction would
cause deadlock and never allows deadlock to
occur. - Let's say we have three transactions 1,2 and 3,
which are started in that order (1 first, then 2,
then 3). - Wait-Die If 1 wants access to something 2 has
locked, 1 can wait. If 3 wants access to
something 2 has locked, it gets killed (i.e.
Rolled back) - Wound-Wait If 1 wants access to something 2 has
a lock on, then 1 wins and 2 gets rolled back.
If 3 wants access to something 2 has a lock on,
the 3 will wait.
40Deadlock Detection and Recovery
- DBMS allows deadlock to occur but recognizes it
and breaks it. - Usually handled by construction of wait-for graph
(WFG) showing transaction dependencies - Create a node for each transaction.
- Create edge Ti -gt Tj, if Ti waiting to lock item
locked by Tj. - Deadlock exists if and only if WFG contains
cycle. - WFG is created at regular intervals.
41Timestamping
- Transactions ordered globally so that older
transactions, transactions with smaller
timestamps, get priority in the event of
conflict. - Conflict is resolved by rolling back and
restarting transaction. - No locks so no deadlock.
42Timestamping
- Each data item gets two time stamps
- read_timestamp the timestamp of the last
transaction to successfully read the item - write_timestamp the timestamp of the last
transaction to successfully write the item
43Timestamp Ordering Protocol
The timestamping protocol ensures that any
conflicting read and write operations are
executed in the order of the timestamps of their
transactions
Suppose T1 issues read(A) if TS(T1) lt
write_timestamp(A) this means the item to be read
has already been written by another transaction
so, reject the read and roll T1 back otherwise,
the last time this item was written was before T1
started, so everything is ok, read the value and
set the read_timestamp of (A) to T1's timestamp.
44Timestamp Ordering Protocol
Suppose T1 issues write(A) If TS(T1) lt
read_timestamp(A), later concurrent transaction
has already read the value you're about to write
over, so don't do it roll T1 back and restart
it. If TS(T1) lt write_timestamp(A), some later
transaction has written a new value into A and
you're about to write over that. Don't do that!
Roll T1 back and restart it. Otherwise,
everything's ok, go ahead and write A and set the
write_timestamp of (A) to T1's timestamp.
45Example Basic Timestamp Ordering
46Multiversion Read Consistency Protocol Oracle
and Postgres
Each transaction gets a snapshot of the data
which is independent of the other
transactions. The transactions can be isolated
from each other in a few ways read
committed serializable read only (only oracle
has this) You set the isolation level like
this SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE
47Read Committed the default
Each statement in a transaction sees only data
committed before the STATEMENT starts. This
means that one transaction might see two
different values of an item if another
transaction commits a change to that value. If
the transaction tries to update a value which has
already been updated, but not committed, by
another transaction, it will block. If the other
transaction then commits, this transaction then
has to redo itself with the new value. Big
downside of Read Committed is the non-repeatable
read problem. If this is a problem, use
serializable.
48Serializable Isolation Level
Each statement in a transaction only sees data
committed before the TRANSACTION starts, and
changes made by that transaction In this case,
if the transaction tries to write a value which
has already been written but not committed by
another transaction, this transaction throws an
error and needs to be restarted by your
application. Only really necessary for
transactions that need to look at the same data
multiple times and need to make sure the data
hasn't changed.
49Read only An Oracle special
Read only Each statement in the transaction
only sees data committed before the transaction
started. Statements in the transaction can't see
changes made by other statements in the same
transaction.
50Recovery
51Types of errors
- Different errors affect different types of memory
- Primary memory problems in RAM
- System crashes can cause processes to die
- Application errors can cause memory loss
- Secondary memory problems on the hard drive
- Hard drive crashes
- Accidental rm -r
- Kicking the machine too hard
52How Data Flows
When you do a select, information is brought off
the hard drive and loaded into the database
buffer. When the database does a manipulation on
that data, it moves it from the buffer into a
variable. Changes to the variable are then
written back to the buffer when the change is
made. Eventually the buffer gets written back to
the hard drive.
53Committing and the Hard Drive
When you execute the commit statement the data
don't necessarily get written to the hard drive!
( This is for efficiency reasons sometimes a
later transaction will modify the same data, so
it would be wasteful to write the value
twice. Also.... sometimes data which are NOT
committed DO get written. This will happen to
free up space in the buffers.
54A Scenario
- Transfer 50 from A to B
- Step 1 read A, subtract 50, write A
- Step 2 read B, add 50, write B
- If the transaction committed, but failed before
being written to the hard drive REDO (aka
ROLLFORWARD) the transaction by writing all the
writes in the buffer to disk - If the transaction was not committed, you can't
redo the transaction because... let's say the
error happened between steps 1 and 2. - The write to A will be in the buffer, but not the
write to B - If you REDO you'll only write the change to A to
disk. - In this case you have to UNDO (aka ROLLBACK) the
transaction first and then re-execute the
transaction.
55Transaction Logs
- Transaction logs keep track of updates to the
database. Each line of the log has - Transaction identifier.
- Type of log record, (transaction start, insert,
update, delete, abort, commit). - Identifier of data item affected by database
action (insert, delete, and update operations). - Before-image of data item.
- After-image of data item.
- Log management information.
56Sample Log File
57Recovery Techniques
- If database has been damaged
- Need to restore last backup copy of database and
reapply updates of committed transactions using
log file. - If database is only inconsistent
- Need to undo changes that caused inconsistency.
May also need to redo some transactions to ensure
updates reach secondary storage. - Do not need backup, but can restore database
using before- and after-images in the log file.
58Main Recovery Techniques
- Three main recovery techniques
- Deferred Update
- Immediate Update
- Shadow Paging
59Deferred Update
- Updates are not written to the database until
after a transaction has reached its commit point.
- If transaction fails before commit, it will not
have modified database and so no undoing of
changes required. - May be necessary to redo updates of committed
transactions as their effect may not have reached
database.
60Immediate Update
- Updates are applied to database as they occur.
- Need to redo updates of committed transactions
following a failure. - May need to undo effects of transactions that had
not committed at time of failure. - Essential that log records are written before
write to database. Write-ahead log protocol.
61Immediate Update
- If no transaction commit record in log, then
that transaction was active at failure and must
be undone. - Undo operations are performed in reverse order in
which they were written to log.
62Checkpointing
- Create a checkpoint so you don't have to look
through the whole log file to see what needs
undoing and redoing. - Making a checkpoint does this
- Writes all logs in memory to disk
- Writes all modified buffers to disk
- Write CHECKPOINT in the log
- When failure occurs, redo all transactions that
committed since the checkpoint and undo all
transactions active at time of crash.
63Shadow Paging
- Maintain two page tables during life of a
transaction current page and shadow page table. - When transaction starts, two pages are the same.
- Shadow page table is never changed thereafter and
is used to restore database in event of failure. - During transaction, current page table records
all updates to database. - When transaction completes, current page table
becomes shadow page table.