Title: Midterm 3 Revision
1Midterm 3 Revision
CS157B Lecture 21
- Professor Sin-Min Lee
- Department of Computer Science
2Query Optimization
- The execution of an SQL query
- Parse and verify the query, create an equivalent
query tree where each node is a relational
algebra operation and each leaf is a table from
the query - Apply heuristics to alter the query tree to find
equivalent queries based on algebraic
equivalences - Generate alternate execution plans by assigning
implementations to operations and orders to joins - Estimate the cost of each operation based on
statistics and available access paths - Choose the lowest cost execution plan
3Query Optimization
- SELECT DISTINCT
- R.A, S.C, S.D, T.D
- FROM R, S, T
- WHERE R.AS.A and S.BT.B AND R.C lt 100 AND S.A
gt 20
Project distinct R.A, S.C, S.D, T.D
Select R.Clt100 and S.Agt20
Join on S.BT.B
Join on R.AS.A
T
R
S
4Algebraic Equivalences
- Selections can be pushed through joins, Cartesian
products - Selections can be joined with Cartesian products
for a join condition - Projections can be pushed through joins,
Cartesian products to reduce the size of the
output
5Query Optimization
- Push selections on each relation down, as close
to the relation as possible.
Project distinct R.A, S.C, S.D, T.D
Join on S.BT.B
Join on R.AS.A
T
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
6Query Optimization
- Add projections whenever appropriate
Project distinct R.A, S.C, S.D, T.D
Join on S.BT.B
Join on R.AS.A
Project distinct T.B,T.D
Project distinct R.A
Project distinct S.A, S.B, S.C, S.D
T
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
7Pushing selections down
- SELECT C (R join S) (SELECT C ( R ) ) join S if
C only involves attributes in R - SELECT C1 AND C2 ( R ) SELECT C1 (SELECT C2 ( R
) ) SELECT C2 (SELECT C1 ( R ) ) - Selections can be pushed down the joins often to
produce the size of the joined relation - However this may not always result in a reduction
in the overall cost. - The selection condition may not be very
selective. - The selection may remove an access condition,
sorted order that is particularly useful for the
next step.
8Implementation plans
- Assign implementations to logical operators given
memory limitations - Join mapped to block-sort join, merge-sort join,
etc. - Selection mapped to table scan or index scan,
etc. - Assign join ordering to joins
- R join S join T (R join S) join T R join (S
join T) - (R join T) join S
- For each join, inner/outer relations can be
changed - Estimate the size of each relation and cost of
each operation
9Implementation plans
- Blocking operators require the whole relation to
be present before any output can be computed - For example grouping, sorting, project distinct
- A non-blocking operator can be pipelined
- As soon as a tuple is found to be in the output
of an operator, it can be pipelined to the next
operator - Hence, the output buffer for an operator serves
as the input buffer of the next operator
10Query Optimization
- Scan table R, fill tuples that pass the selection
condition into allocated buffer pages. - When the buffer for R is full, stop scanning, and
join them with S. - When the join is complete,
- continue scan and fill the buffer
- for the next join step.
Sort and project
Project distinct R.A, S.C, S.D
Partially sort and write to disk
Block nested loop join
Join on R.AS.A
pipeline
pipeline
Table scan
Table scan
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
11Join ordering
- Join ordering depends on the size of the output
and the access paths available for each relation
Table CARD VALUES (attr A)
R 1million 10,000
S 100,000 100,000
T 200,000 5,000
(R JOIN S ON R.AS.A) JOIN T ON S.AT.A Size of
R JOIN S 1,000,000 100,000 1/100,000
1,000,000 Size of (R JOIN S) JOIN T
1,000,000 200,000 1/10,000
20,000,000 R JOIN
ON R.AS.A (S JOIN T ON S.AT.A) Size of S
JOIN T 100,000 200,000 1/100,000 200,000
Size of R JOIN (S JOIN T) 1,000,000 200,000
1/10,000
20,000,000
12Choosing join ordering
- The set of possible join orders is extremely
large. Instead concentrate on left deep join
orders - Left join orders make it possible to pipeline the
output of one join as input to the other join
- To find all possible left-deep join orders
- First find all possible two way joins over the
given relations, estimate the cost of the best
implementation plan - Then, find the next relation to join with the
result, estimate the cost - Remove any joins that are too costly compared to
the others - Keep enumerating all joins!
JOIN
JOIN
V
JOIN
T
R
S
13Relational Calculus
- Based on the predicate calculus of formal logic
- (sound complete).
- Higher level of abstraction for users
- Logically equivalent to more procedural
relational algebra constructs - Declarative form of relational calculus used in
many commercial products
14Relational Calculus (II)
- Of the form
- p.PATIENTS_NAME
- p IN PATIENTS and p.DOB gt 4/1/70
- Target (attributes) and Qualifying Statement
- Generalization of algebraic operations obvious
except join(existential quantifier) and divide
(universal quantifier) - Existential Quantifier ? there exists
- (at least one row)
- p.PATIENTS_NAME p in PATIENTS and exists l in
LABS - (p.pat_num l.pat_num and l.lab_name T4)
- would involve a join in the relational algebra
15Relational Calculus (III)
- Universal Quantifier ? (applies for all)
- p.NAME p in patients and for every d in DOCTORS
- Exists c in CLINIC_VISIT
- (c.PROVIDER_ID d.PROVIDER_ID and
- p.PATIENT_ID c.PATIENT_ID)
- What is returned?
- All patients who have seen every doctor
- Cognitive Studies relational algebra easier to
comprehend than relational calculus
16(No Transcript)
17Transactional Integrity
- A procedure or set of procedures which is
guaranteed to preserve database integrity is a
transaction - Database is consistent before and after a
transaction atomicity (no intermediate state)
18Means to Concurrency Control
- Locking table, row, attribute
- (e.g. select for update)
19Problems with Locking
- In order to bill for a procedure, need to write
to the NOTES table and to the PROCEDURE table. - Transaction 1 reads and locks NOTES for pt 1
- Transaction 2 goes first for PROCEDURE table
- Result deadlock.
- Solutions to deadline ordering, deadlock
detection - Two-phase locking
- All locking (read and write) operations before
first unlock - NOTES Read and PROCEDURE Write locks completed
before NOTES unlock
20Transaction Processing
- Transactions that execute in a database system
must satisfy the following properties - Atomicity each transaction either executes
fully, or does not have any effect - Consistency each transaction is a logical
combination of actions that change the database
from one consistent state to another - Isolation transactions are written and executed
as if they are executing one at a time - Durability the results of successful
transactions are never lost even in the presence
of unforeseen failures
21(No Transcript)
22(No Transcript)
23Serializability
- A transaction is usually written as a sequence of
read, write operations (x,y,z are some data
items, typically tuples) - Transaction 1 r1(x), r1(y), w1(y), commit
- Multiple transaction execute concurrently, their
read, write operations are mixed together in a
schedule - r1(x) r2(z) w2(z) r1(y) w1(y)
- Do the two transactions change each others data?
- SERIAL ORDER 1 r1(x) r1(y) w1(y) r2(z) w2(z)
- SERIAL ORDER 2 r2(z) w2(z) r1(x) r1(y) w1(y)
- The state of the database is going to be the same!
24Serializability
- If a schedule produces the same results as some
serial ordering of transactions, then it is said
the serializable - There are schedules for which no equivalent
serial order exist. - All schedules executing in a database should be
serializable - As long as the schedule does not destroy the
logical sequence of events in the transaction,
then the results of the transaction are
consistent.
25(No Transcript)
26(No Transcript)
27(No Transcript)
28(No Transcript)
29(No Transcript)
30Schedule anomalies
- Dirty read a transaction reads a value that is
not finalized - w1(x) r2(x) abort1 the value read by T2 is wrong
and will be erased from the database - Nonrepeatable read a transaction reads the same
item at two different times, but finds different
values - r1(x) w2(x) commit2 r1(x) the second read of x
will produce a different value - Lost update the value written by a transaction
is overwritten by another - r1(x) r2(x) w2(x) commit2 w1(x) commit1 the value
of x written by T1 Is based on its old value, it
is as if T2 has never executed
31Serializability
- A serial schedule is equivalent to another
schedule if - The values returned by the read operations are
guaranteed to be the same - Updates to each item occur in the same order
- Is r1(x) r2(x) r2(y) w2(y) w1(x) equivalent to
r1(x) w1(x) r2(x) r2(y) w2(y) ? No! T2 is reading
a value written by T1 in the second schedule. - Is r1(x) r2(x) r2(y) w2(y) w1(x) equivalent to
r2(x) r2(y) w2(y) r1(x) w1(x)? Yes! - Is r1(x) r2(x) r2(y) w2(y) w2(x) w1(x)
equivalent to r2(x) r2(y) w2(y) w2(x) r1(x)
w1(x)?, No!, the value read by T1 is not the same - Is r1(x) r2(x) r2(y) w2(y) w2(x) w1(x)
equivalent to r1(x) w1(x) r2(x) r2(y) w2(y)
w2(x)?, No!, the values by T2 is not the same
32Two phase locking
- To guarantee serializable schedules, the database
maintains locks on items (for example tuples) - Transactions are required to obtain a read lock
to read an item and a write lock to change the
value of an item - Read lock ( R ) if an item is locked with a read
lock, then other transactions may obtain read
locks on it - Write lock ( W ) if an item is locked with a
write lock, no other transaction may obtain any
other lock on it. - If a transaction requests a write (exclusive)
lock, it is granted the lock if there is no other
lock on the item or the same transaction holds a
read lock on it. - If a transaction may not obtain the lock it needs
to continue the operation, then it goes into a
wait mode until the necessary lock is released.
33Two phase locking
- Two phase locking involves
- Growing phase, during which a transaction may
obtain new locks, but may not release any locks
it is holding - Shrinking phase, during which a transaction may
release locks but may not obtain any new locks - Strick two phase locking requires that a
transaction hold all its locks until it
completes. At commit time, the transaction
releases all its locks.
34Two phase locking
- Strict two phase locking eliminates the
possibility of schedule anomalies,
non-serializable schedules - Dirty read w1(x) r2(x) abort1
- Not possible since T2 could not have obtained the
read lock on x before T1 completes! - Non-repeatable read r1(x) w2(x) commit2 r1(x)
- Not possible since T2 could not have obtained the
write lock on x before T1 commits since it is
holding the read lock on x - Lost update r1(x) r2(x) w2(x) commit2 w1(x)
commit1 - Not possible since T2 could not have obtained the
write lock on x before T1 commits
35Deadlocks
- Even though strict two phase locking prevents
non-serializable schedules, it is possible that
two transactions enter a cyclic wait state, a
deadlock. - w1(x) w2(y) Request_w1(y) Request_w2(x)
- T1 is waiting T2 to release the lock on X
- T2 is waiting T1 to release the lock on Y
- No transaction will be able to complete
- Detect deadlocks by checking wait states of
transactions, abort transactions in the reverse
order of time spent.
36Isolation levels
- A transaction may define how the tables and
tuples accessed by that transaction should be
locked. - READ UNCOMMITTED. Dirty reads are possible, I.e.
read tuples that are being modified by other
transactions before these transaction commit.
Read values without a lock, writes are not
possible. - READ COMMITTED. Dirty reads are not permitted.
Lock an item with a read lock shortly while
reading -during which time no other transaction
could be writing the item. However, read locks
are not held for the duration of a transaction,
hence the same value read twice may have
different values -no repeatable reads.
37Isolation levels
- Cont.
- REPEATABLE READS. Obtain and hold read locks
until commit time. All reads are repeatable but
phantom updates are possible. - Example UPDATE employee
- SET salary salary 1.01
- WHERE dept Toy
- All tuples with deptToy are selected and
locked for write. However, it is possible for
another transaction to add a new employee in the
Toy department while this transaction is still
executing, but his salary will not be changed! - SERIALIZABLE. Obtain and hold locks on a
predicate to ensure phantom updates are not
possible (such as deptToy above).
38Lock granularity
- Locking at table or table level reduces
concurrency greatly, but locking at tuple level
has too much overhead - Use multi-level intension locks at higher levels
- IS - intension shared, means the transaction
intends a shared lock on a tuple - IX - intension exclusive, means the transaction
intends to obtain an exclusive (W) lock on a
tuple - SIX - means that transaction will update some
tuples, but will read all of them
39Lock granularity
Granted Mode Granted Mode Granted Mode Granted Mode Granted Mode
Requested Mode IS IX SIX S X
IS X
IX X X X
SIX X X X X
S X X X
X X X X X X
40Lock granularity
- To ensure serializability of a transaction T1
that accesses the relation through a table scan - T1 locks the relation with IS, IX, or SIX
depending on the transaction - No other transaction will be able to obtain an IX
lock on the table to insert a new tuple into the
table, no phantoms! - To ensure serializability of a transaction T1
that accesses the relation through an index scan - T1 locks the index for all nodes accessed for the
scan for some condition C - Any new tuple that will make C true will need to
be inserted into these nodes, but this is not
possible with the locks
41Atomicity and durability
- Transaction failures
- A logical error or a transaction may cause a
transaction to fail, all changes made by this
transaction must be erased from the system - An UNDO of a transaction requires undoing all
updates by that transaction in the reverse order - Disk pages read and changed in memory are not
written immediately to disk since other
transactions may be using them (NO FORCE) - These pages are lost during a power failure
42Atomicity and durability
- Disk pages changed by a non-committed transaction
may be written to disk for buffer management
purposes (STEAL) - In case of a power failure, transactions that are
not yet completed must be rolledback, and
transactions that have committed should have
their results restored. - NO STEAL requires that pages in memory are pinned
(not written to disk) until a transaction
completes, reduces concurrency - NO FORCE requires that when a transaction T
commits pages in memory modified by T are all
written to disk, performs unnecessary disk writes