Title: Transactions and Wrap-Up
1Transactions and Wrap-Up
- Zachary G. Ives
- University of Pennsylvania
- CIS 550 Database Information Systems
- December 9, 2004
Some slide content derived from Ramakrishnan
Gehrke
2Reminders
- Please be sure youre signed up for a project
demo - Due at that time 8-15 page report describing
- What your project goals were
- What you implemented
- Basic architecture and design
- Division of labor
- And the code!
- Also please email me an assessment of how well
your group worked group members contributions
your contributions - Final examination Dec. 17th, Meyerson Hall B3,
830AM
3Recall Good Executions
- An execution is good if it is serial
(transactions are executed atomically and
consecutively) or serializable (i.e. equivalent
to some serial execution) - Equivalent to executing Deposit 1 then 3, or vice
versa
Deposit 1 Deposit
3 read(X.bal)
read(Y.bal) X.bal
X.bal 50
Y.bal Y.bal 10 write(X.bal)
write(Y.bal)
4Atomicity
- Problems can also occur if a crash occurs in the
middle of executing a transaction - Need to guarantee that the write to X does not
persist (ABORT) - Default assumption if a transaction doesnt commit
5Transactions in SQL
- A transaction begins when any SQL statement that
queries the db begins. - To end a transaction, the user issues a COMMIT or
ROLLBACK statement.
Transfer UPDATE Accounts SET balance
balance - 100 WHERE account 1234 UPDATE
Accounts SET balance balance 100 WHERE
account 5678 COMMIT
6Read-Only vs. Read-Write Transactions
- We can tell the DBMS that we wont be performing
any updates (What does this allow the DBMS to
do?) - If we are going to modify the DBMS, we need
SET TRANSACTION READ ONLY SELECT FROM
Accounts WHERE account1234
SET TRANSACTION READ WRITE UPDATE Accounts SET
balance balance - 100 WHERE account 1234
...
7Dirty Reads
- Dirty data is data written by an uncommitted
transaction a dirty read is a read of dirty data
(WR conflict) - Sometimes we can tolerate dirty reads other
times we cannot - e.g., if we wished to ensure balances never went
negative in the transfer example, we should test
that there is enough money first!
8Bad Dirty Read
EXEC SQL select balance into bal
from Accounts where
account1234 if (bal gt 100) EXEC SQL
update Accounts set balance
balance - 100 where account
1234 EXEC SQL update Accounts
set balance balance 100
where account 5678 EXEC SQL COMMIT
If the initial read (italics) were dirty, the
balance could become negative!
9Acceptable Dirty Read
- If we are just checking availability of an
airline seat, a dirty read might be fine! (Why is
that?) - Reservation transaction
EXEC SQL select occupied into occ
from Flights where Num
123 and date11-03-99
and seat23f if (!occ) EXEC SQL
update Flights set
occupiedtrue where Num 123 and
date11-03-99 and
seat23f else notify user that seat is
unavailable
10Other Undesirable Phenomena
- Unrepeatable read a transaction reads the same
data item twice and gets different values (RW
conflict) - Phantom problem a transaction retrieves a
collection of tuples twice and sees different
results
11Phantom Problem Example
- T1 find the students with best grades who Take
either cis550-f03 or cis570-f04 - T2 insert new entries for student 1234 in the
Takes relation, with grade A for cis570-f04 and
cis550-f03 - Suppose that T1 consults all students in the
Takes relation and finds the best grades for
cis550-f03 - Then T2 executes, inserting the new student at
the end of the relation, perhaps on a page not
seen by T1 - T1 then completes, finding the students with best
grades for cis570-f04 and now seeing student 1234
12Isolation
- The problems weve seen are all related to
isolation - General rules of thumb w.r.t. isolation
- Fully serializable isolation is more expensive
than no isolation - We cant do as many things concurrently (or we
have to undo them frequently) - For performance, we generally want to specify the
most relaxed isolation level thats acceptable - Note that were slightly violating a
correctness constraint to get performance!
13Specifying Acceptable Isolation Levels
- The default isolation level is SERIALIZABLE (as
for the transfer example) - To signal to the system that a dirty read is
acceptable,
SET TRANSACTION READ WRITE ISOLATION LEVEL READ
UNCOMMITTED
14READ COMMITTED
- Forbids the reading of dirty (uncommitted) data,
but allows a transaction T to issue the same
query several times and get different answers - No value written by T can be modified until T
completes - For example, the Reservation example could also
be READ COMMITTED the transaction could
repeatably poll to see if the seat was available,
hoping for a cancellation
15REPEATABLE READ
- What it is NOT a guarantee that the same query
will get the same answer! - However, if a tuple is retrieved once it will be
retrieved again if the query is repeated - For example, suppose Reservation were modified to
retrieve all available seats - If a tuple were retrieved once, it would be
retrieved again (but additional seats may also
become available)
16Summary of Isolation Levels
Level Dirty Read
Unrepeatable Read Phantoms READ
UN- Maybe Maybe Maybe COMMITTED READ
No Maybe Maybe COMMITTED REPEATABLE No
No Maybe READ SERIALIZABLE No No No
17Implementing Isolation Levels
- One approach use locking at some level
- each data item is either locked (in some mode,
e.g. shared or exclusive) or is available (no
lock) - an action on a data item can be executed if the
transaction holds an appropriate lock - consider granularity of locks how big of an
item to lock - Larger granularity fewer locking operations but
more contention! - tuple, page, table, etc.
- Appropriate locks
- Before a read, a shared lock must be acquired
- Before a write, an exclusive lock must be acquired
18Lock Compatibility Matrix
- Locks on a data item are granted based on a lock
compatibility matrix - When a transaction requests a lock, it must wait
(block) until the lock is granted
19Locks Prevent Bad Execution
- If the system used locking, the first bad
execution could have been avoided
Deposit 1 Deposit
2 xlock(X) read(X.bal)
xlock(X)
is not granted X.bal X.bal 50
write(X.bal) release(X)
xlock(X)
read(X.bal)
X.bal X.bal 10
write(X.bal)
release(X)
20Lock Types and Read/Write Modes
- When we specify read-only, the system only uses
shared-mode locks - Any transaction that attempts to update will be
illegal - When we specify read-write, the system may also
acquire locks in exclusive mode - Obviously, we can still query in this mode
21Isolation Levels and Locking
- Always update with exclusive lock held to end of
transaction - READ UNCOMMITTED
- read data without acquiring any lock
- READ COMMITTED
- read data lock all tuples read, immediately
release locks - REPEATABLE READ
- read data grab shared lock on all tuples read,
hold to end of transaction - SERIALIZABLE
- read data grab shared lock on all tuples read
and the index, hold to end of transaction - Holding locks to the end of a transaction is
called strict locking
22Theory of Serializability
- A schedule of a set of transactions is a linear
ordering of their actions - e.g. for the simultaneous deposits example
- R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
- A serial schedule is one in which all the steps
of each transaction occur consecutively - A serializable schedule is one which is
equivalent to some serial schedule (i.e. given
any initial state, the final state is the same as
one produced by some serial schedule) - The example above is neither serial nor
serializable
23Questions to Address
- Given a schedule S, is it serializable?
- How can we "restrict" transactions in progress to
guarantee that only serializable schedules are
produced?
24When Actions Conflict
- Consider a schedule S in which there are two
consecutive actions Ii and Ij of transactions Ti
and Tj respectively - If Ii and Ij refer to different data items, then
swapping Ii and Ij does not matter - If Ii and Ij refer to the same data item Q, then
swapping Ii and Ij matters if and only if one of
the actions is a write - Ri(Q) Wj(Q) produces a different final value for
Q than Wj(Q) Ri(Q)
25Testing for Serializability
- Given a schedule S, we can construct a di-graph
G(V,E) called a precedence graph - V all transactions in S
- E Ti ? Tj whenever an action of Ti precedes
and conflicts with an action of Tj in S - Theorem A schedule S is conflict serializable
if and only if its precedence graph contains no
cycles - Note that testing for a cycle in a digraph can be
done in time O(V2)
26An Example
27Locking and Serializability
- We said that a transaction must hold all locks
until it terminates (a condition called strict
locking) - It turns out that this is crucial to guarantee
serializability - Note that the first (bad) example could have been
produced if transactions acquired and immediately
released locks.
28Well-Formed, Two-Phased Transactions
- A transaction is well-formed if it acquires at
least a shared lock on Q before reading Q or an
exclusive lock on Q before writing Q and doesnt
release the lock until the action is performed - Locks are also released by the end of the
transaction - A transaction is two-phased if it never acquires
a lock after unlocking one - i.e., there are two phases a growing phase in
which the transaction acquires locks, and a
shrinking phase in which locks are released
29Two-Phased Locking Theorem
- If all transactions are well-formed and
two-phase, then any schedule in which conflicting
locks are never granted ensures serializability - i.e., there is a very simple scheduler!
- However, if some transaction is not well-formed
or two-phase, then there is some schedule in
which conflicting locks are never granted but
which fails to be serializable - i.e., one bad apple spoils the bunch
30Summary
- Transactions are all-or-nothing units of work
guaranteed despite concurrency or failures in the
system - Theoretically, the correct execution of
transactions is serializable (i.e. equivalent to
some serial execution) - Practically, this may adversely affect throughput
? isolation levels - With isolation levels, users can specify the
level of incorrectness they are willing to
tolerate
31What to Look for Down the Road
- well, no one really knows the answer to this
- But here are some hints, ideas, and hot
directions - Sensors and streaming data
- Peer-to-peer meets databases and data integration
- The Semantic Web
32Sensors and Streaming Data
- No databases at all
- Instead we have networks of simple sensors
- queries are in SQL
- data is live and streaming
- we compute aggregates over windows
33Whats Interesting Here
- Were not talking about data on disk were
talking about queries over current readings - Sensors are generally stupid and may be
battery-operated - A lot of challenges are networking-related how
to aggregate data before it gets sent, etc. - Future challenges what happens when we have
lots of different kinds of sensors
34Peer-to-Peer Computing
- Fundamentally, our model of DBMSs tends to be
centralized - Even for data integration theres a single
mediator - What can be gained from borrowing a page from
peer-to-peer systems like Napster, Kazaa, etc.? - A better architecture?
- Solutions to many problems unsolved by
distributed DBMSs? - Replication, object location, distributed
optimization, resiliency to failure, - New types of applications, e.g., in integration?
- Can we exchange data between databases in some
controlled way? - e.g., share parts of your Palm contact list with
someone elses cell phone contact list
35The Semantic Web
- In some ways, a very pie-in-the-sky vision
- A World Wide Web thats machine-understandable
- The ultimate automated librarian
- But some real and concrete problems might be
partly solvable - Goal is really very similar to data integration,
where somehow we have mappings between the
schemas - Need
- Languages for not only describing relationships,
but transformations between formats (e.g., XML
schemas) - Automatic or partly automated ways of discovering
mappings and correspondences - These are all database problems, and the solution
likely must come from the DB community
36My Take on the Future of Data Management
- Weve evolved from a world where data management
is about controlling the data - Instead, data management is about translating and
transforming data using declarative languages - It should ultimately become much like TCP or SOAP
a set of standard services for getting stuff
from one point to another, or from one form to
another - Its the plumbing that connects different
applications using different formats
37A Plug for Next Semester
- CIS 650 focus is on techniques for constructing
data management systems - Databases distributed databases P2P databases
data integration middleware etc. - Well read many of the definitive papers in the
DB field - Meanwhile Best of luck on your projects and
exams and have a wonderful break - I hope you learned a lot in this course and that
it at least for stretches was enjoyable!