Title: Transactions and Concurrency
1Transactions and Concurrency
- Susan B. Davidson
- University of Pennsylvania
- CSE330 Database Management Systems
- November 25, 2008
Some slide content derived from Ramakrishnan
Gehrke
2From Queries to Updates
- Weve spent a lot of time talking about querying
data
- Yet updates are a really major part of many DBMS
applications
- Particularly important ensuring ACID
properties
- Atomicity each operation looks atomic to the
user
- Consistency each operation in isolation keeps
the database in a consistent state (this is the
responsibility of the user)
- Isolation should be able to understand whats
going on by considering each separate transaction
independently
- Durability updates stay in the DBMS!!!
3What is a Transaction?
- A transaction is a sequence of read and write
operations on data items that logically functions
as one unit of work
- should either be done entirely or not at all
- if it succeeds, the effects of write operations
persist (commit) if it fails, no effects of
write operations persist (abort)
- these guarantees are made despite concurrent
activity in the system, and despite failures that
may occur
4How Things Can Go Wrong
- Suppose we have a table of bank accounts which
contains the balance of the account
- An ATM deposit of 50 to account 1234 would be
written as
- This reads and writes the accounts balance
- What if two accountholders make deposits
simultaneously from two ATMs?
update Accounts set balance balance 50 wher
e account 1234
5Concurrent Deposits
- This SQL update code is represented as a
sequence of read and write operations on data
items (which for now should be thought of as
individual accounts) - where X is the data item representing the
account with account 1234.
Deposit 1 Deposit
2 read(X.bal) read(X.bal)
X.bal X.bal 50 X.bal X.bal
10 write(X.bal) write(X.ba
l)
6A Bad Concurrent Execution
- Only one action (e.g. a read or a write) can
actually happen at a time, and we can interleave
deposit operations in many ways
Deposit 1 Deposit
2 read(X.bal)
read(X.bal)
X.bal X.bal 50
X.bal
X.bal 10 write(X.bal)
write(X.bal)
7A Good Execution
- Previous execution would have been fine if the
accounts were different (i.e. one were X and one
were Y), i.e., transactions were independent
- The following execution is a serial execution,
and executes one transaction after the other
Deposit 1 Deposit
2 read(X.bal) X.bal X
.bal 50 write(X.bal)
read(X.bal)
X.bal
X.bal 10
write(X.bal)
8Good Executions
- An execution is good if it is serial
(transactions are executed atomically and
consecutively) or serializable (i.e. equivalent
to some serial execution) - Equivalent to executing Deposit 1 then 3, or vice
versa
- Why is this good?
Deposit 1 Deposit
3 read(X.bal)
read(Y.bal)
X.bal X.bal 50
Y.bal
Y.bal 10 write(X.bal)
write(Y.bal)
9Atomicity
- Problems can also occur if a crash occurs in the
middle of executing a transaction
- Need to guarantee that the write to X does not
persist (ABORT)
- Default assumption if a transaction doesnt commit
10Transactions in SQL
- A transaction begins when any SQL statement that
queries the db begins.
- To end a transaction, the user issues a COMMIT or
ROLLBACK statement.
Transfer UPDATE Accounts SET balanc
e balance - 100 WHERE account 1234 UPDA
TE Accounts SET balance balance 100 WHERE
account 5678 COMMIT
11Read-Only Transactions
- When a transaction only reads information, we
have more freedom to let the transaction execute
in parallel with other transactions.
- We signal this to the system by stating
SET TRANSACTION READ ONLY SELECT FROM Account
s
WHERE account1234 ...
12Read-Write Transactions
- If we state read-only, then the transaction
cannot perform any updates.
- Instead, we must specify that the transaction may
update (the default)
SET TRANSACTION READ ONLY UPDATE Accounts SET
balance balance - 100 WHERE account 1234
...
ILLEGAL!
SET TRANSACTION READ WRITE update Accounts set
balance balance - 100 where account 1234
...
13Dirty Reads
- Dirty data is data written by an uncommitted
transaction a dirty read is a read of dirty data
- Sometimes we can tolerate dirty reads other
times we cannot
-
Deposit 1 Deposit
3 read(X.bal) X.bal X
.bal 50 write(X.bal) read(
Y.bal) Y
.bal Y.bal 10 ABORT
write(Y.bal) COMMIT
14Bad Dirty Read
EXEC SQL select balance into bal
from Accounts
where account1234
if (bal 100) EXEC SQL update Accounts
set balance balance - 100
where account 1234
EXEC SQL update Accounts set b
alance balance 100 where a
ccount 5678 EXEC SQL COMMIT
If the initial read (italics) were dirty, the
balance
could become negative!
15Acceptable Dirty Read
- If we are just checking availability of an
airline seat, a dirty read might be fine! (Why is
that?)
- Reservation transaction
EXEC SQL select occupied into occ
from Flights
where Num 123 and
date11-03-99 and sea
t23f if (!occ) EXEC SQL upd
ate Flights set occupiedtrue
where Num 123 and date11-03-99
and seat23f
else notify user that seat is unavailable
16Other Undesirable Phenomena
- Unrepeatable read a transaction reads the same
data item twice and gets different values
- Phantom problem a transaction retrieves a
collection of tuples twice and sees different
results
17Phantom Problem Example
- T1 find the students with best grades who Take
either cis550-f03 or cis570-f02
- T2 insert new entries for student 1234 in the
Takes relation, with grade A for cis570-f02 and
cis550-f03
- Suppose that T1 consults all students in the
Takes relation and finds the best grades for
cis550-f03
- Then T2 executes, inserting the new student at
the end of the relation, perhaps on a page not
seen by T1
- T1 then completes, finding the students with best
grades for cis570-f02 and now seeing student 1234
18Isolation
- The problems weve seen are all related to
isolation
- General rules of thumb w.r.t. isolation
- Fully serializable isolation is more expensive
than no isolation
- We cant do as many things concurrently (or we
have to undo them frequently)
- For performance, we generally want to specify the
most relaxed isolation level thats acceptable
- Note that were slightly violating a
correctness constraint to get performance!
19Specifying Acceptable Isolation Levels
- The default isolation level is SERIALIZABLE (as
for the transfer example).
- To signal to the system that a dirty read is
acceptable,
- In addition, there are
SET TRANSACTION READ WRITE ISOLATION LEVEL READ U
NCOMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET TRANSACTION
ISOLATION LEVEL REPEATABLE READ
20READ COMMITTED
- Forbids the reading of dirty (uncommitted) data,
but allows a transaction T to issue the same
query several times and get different answers
- No value written by T can be modified until T
completes
- For example, the Reservation example could also
be READ COMMITTED the transaction could
repeatably poll to see if the seat was available,
hoping for a cancellation
21REPEATABLE READ
- What it is NOT a guarantee that the same query
will get the same answer!
- However, if a tuple is retrieved once it will be
retrieved again if the query is repeated
- For example, suppose Reservation were modified to
retrieve all available seats
- If a tuple were retrieved once, it would be
retrieved again (but additional seats may also
become available)
22Summary of Isolation Levels
Level Dirty Read
Unrepeatable Read Phantoms
READ UN- Maybe Maybe Maybe COMMITTED REA
D No Maybe Maybe COMMITTED REPEATABLE N
o No Maybe READ SERIALIZABLE No N
o No
23Implementing Isolation Levels
- One approach use locking at some level (tuple,
page, table, etc.)
- each data item is either locked (in some mode,
e.g. shared or exclusive) or is available (no
lock)
- an action on a data item can be executed if the
transaction holds an appropriate lock
- consider granularity of locks how big of an
item to lock
- Larger granularity fewer locking operations but
more contention!
- Appropriate locks
- Before a read, a shared lock must be acquired
- Before a write, an exclusive lock must be acquired
24Lock Compatibility Matrix
- Locks on a data item are granted based on a lock
compatibility matrix
- When a transaction requests a lock, it must wait
(block) until the lock is granted
25Locks Prevent Bad Execution
- If the system used locking, the first bad
execution could have been avoided
Deposit 1 Deposit
2 xlock(X) read(X.bal)
xlock(X) is not granted X.bal X.bal 50
write(X.bal) release(X)
xlock(X)
read(X.bal)
X.bal X.bal 10
write(X.bal)
release(X)
26Locks are not enough
Deposit 1 Deposit
2 slock(X) read(X.bal)
slock(X) read(X.bal) release(X) rele
ase(X) X
.bal X.bal 10 X.bal X.bal 50 xlock(
X) write(X.bal) release(X) xl
ock(X)
write(X.bal)
release(X)
27Lock Types and Read/Write Modes
- When we specify read-only, the system only uses
shared-mode locks
- Any transaction that attempts to update will be
illegal
- When we specify read-write, the system may also
acquire locks in exclusive mode
- Obviously, we can still query in this mode
28Isolation Levels and Locking
- READ UNCOMMITTED allows queries in the
transaction to read data without acquiring any
lock
- For updates, exclusive locks must be obtained
and held to end of transaction
- READ COMMITTED requires a read-lock to be
obtained for all tuples touched by queries, but
it releases the locks immediately after the read
- Exclusive locks must be obtained for updates and
held to end of transaction
29Isolation levels and locking, cont.
- REPEATABLE READ places shared locks on tuples
retrieved by queries, holds them until the end of
the transaction
- Exclusive locks must be obtained for updates and
held to end of transaction
- SERIALIZABLE places shared locks on tuples
retrieved by queries as well as the index, holds
them until the end of the transaction
- Exclusive locks must be obtained for updates and
held to end of transaction
- Holding locks to the end of a transaction is
called strict locking
30Theory of Serializability
- A schedule of a set of transactions is a linear
ordering of their actions
- e.g. for the simultaneous deposits example
- R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
- A serial schedule is one in which all the steps
of each transaction occur consecutively
- A serializable schedule is one which is
equivalent to some serial schedule (i.e. given
any initial state, the final state is the same as
one produced by some serial schedule) - The example above is neither serial nor
serializable
31Questions to Address
- Given a schedule S, is it serializable?
- How can we "restrict" transactions in progress to
guarantee that only serializable schedules are
produced?
32When Actions Conflict
- Consider a schedule S in which there are two
consecutive actions Ii and Ij of transactions Ti
and Tj respectively
- If Ii and Ij refer to different data items, then
swapping (i.e. reordering) Ii and Ij does not
matter
- If Ii and Ij refer to the same data item Q, then
swapping Ii and Ij matters if and only if one of
the actions is a write
- Ri(Q) Wj(Q) produces a different final value for
Q than Wj(Q) Ri(Q)
33Testing for Serializability
- Given a schedule S, we can construct a directed
graph G(V,E) called a precedence graph
- V all transactions in S
- E Ti ? Tj whenever an action of Ti precedes
and conflicts with an action of Tj in S
- Theorem A schedule S is conflict serializable
if and only if its precedence graph contains no
cycles
- Note that testing for a cycle in a digraph can be
done in time O(V2)
34An Example
35Locking and Serializability
- We said that a transaction must hold all locks
until it terminates (a condition called strict
locking)
- It turns out that this is crucial to guarantee
serializability
- Note that the first (bad) example could have been
produced if transactions acquired and immediately
released locks.
36Well-Formed, Two-Phased Transactions
- A transaction is well-formed if it acquires at
least a shared lock on Q before reading Q or an
exclusive lock on Q before writing Q and doesnt
release the lock until the action is performed - Locks are also released by the end of the
transaction
- A transaction is two-phased if it never acquires
a lock after unlocking one
- i.e., there are two phases a growing phase in
which the transaction acquires locks, and a
shrinking phase in which locks are released
37Two-Phased Locking Theorem
- If all transactions are well-formed and
two-phase, then any schedule in which conflicting
locks are never granted ensures serializability
- i.e., there is a very simple scheduler!
- However, if some transaction is not well-formed
or two-phase, then there is some schedule in
which conflicting locks are never granted but
which fails to be serializable - i.e., one bad apple spoils the bunch
38Summary
- Transactions are all-or-nothing units of work
guaranteed despite concurrency or failures in the
system.
- Theoretically, the correct execution of
transactions is serializable (i.e. equivalent to
some serial execution).
- Practically, this may adversely affect throughput
? isolation levels.
- With isolation levels, users can specify the
level of incorrectness they are willing to
tolerate.