Midterm 3 Revision

About This Presentation

Title:

Midterm 3 Revision

Description:

Use multi-level intension locks at higher levels ... IX - intension exclusive, means the transaction intends to obtain an exclusive ... – PowerPoint PPT presentation

Number of Views:18

Avg rating:3.0/5.0

Slides: 43

Provided by: isaacs6

Category:

more less

Transcript and Presenter's Notes

Title: Midterm 3 Revision

1
Midterm 3 Revision
CS157B Lecture 21

Professor Sin-Min Lee
Department of Computer Science

2
Query Optimization

The execution of an SQL query
Parse and verify the query, create an equivalent
query tree where each node is a relational
algebra operation and each leaf is a table from
the query
Apply heuristics to alter the query tree to find
equivalent queries based on algebraic
equivalences
Generate alternate execution plans by assigning
implementations to operations and orders to joins
Estimate the cost of each operation based on
statistics and available access paths
Choose the lowest cost execution plan

3
Query Optimization

SELECT DISTINCT
R.A, S.C, S.D, T.D
FROM R, S, T
WHERE R.AS.A and S.BT.B AND R.C lt 100 AND S.A
gt 20

Project distinct R.A, S.C, S.D, T.D
Select R.Clt100 and S.Agt20
Join on S.BT.B
Join on R.AS.A
T
R
S
4
Algebraic Equivalences

Selections can be pushed through joins, Cartesian
products
Selections can be joined with Cartesian products
for a join condition
Projections can be pushed through joins,
Cartesian products to reduce the size of the
output

5
Query Optimization

Push selections on each relation down, as close
to the relation as possible.

Project distinct R.A, S.C, S.D, T.D
Join on S.BT.B
Join on R.AS.A
T
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
6
Query Optimization

Add projections whenever appropriate

Project distinct R.A, S.C, S.D, T.D
Join on S.BT.B
Join on R.AS.A
Project distinct T.B,T.D
Project distinct R.A
Project distinct S.A, S.B, S.C, S.D
T
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
7
Pushing selections down

SELECT C (R join S) (SELECT C ( R ) ) join S if
C only involves attributes in R
SELECT C1 AND C2 ( R ) SELECT C1 (SELECT C2 ( R
) ) SELECT C2 (SELECT C1 ( R ) )
Selections can be pushed down the joins often to
produce the size of the joined relation
However this may not always result in a reduction
in the overall cost.
The selection condition may not be very
selective.
The selection may remove an access condition,
sorted order that is particularly useful for the
next step.

8
Implementation plans

Assign implementations to logical operators given
memory limitations
Join mapped to block-sort join, merge-sort join,
etc.
Selection mapped to table scan or index scan,
etc.
Assign join ordering to joins
R join S join T (R join S) join T R join (S
join T)
(R join T) join S
For each join, inner/outer relations can be
changed
Estimate the size of each relation and cost of
each operation

9
Implementation plans

Blocking operators require the whole relation to
be present before any output can be computed
For example grouping, sorting, project distinct
A non-blocking operator can be pipelined
As soon as a tuple is found to be in the output
of an operator, it can be pipelined to the next
operator
Hence, the output buffer for an operator serves
as the input buffer of the next operator

10
Query Optimization

Scan table R, fill tuples that pass the selection
condition into allocated buffer pages.
When the buffer for R is full, stop scanning, and
join them with S.
When the join is complete,
continue scan and fill the buffer
for the next join step.

Sort and project
Project distinct R.A, S.C, S.D
Partially sort and write to disk
Block nested loop join
Join on R.AS.A
pipeline
pipeline
Table scan
Table scan
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
11
Join ordering

Join ordering depends on the size of the output
and the access paths available for each relation

Table CARD VALUES (attr A)
R 1million 10,000
S 100,000 100,000
T 200,000 5,000
(R JOIN S ON R.AS.A) JOIN T ON S.AT.A Size of
R JOIN S 1,000,000 100,000 1/100,000
1,000,000 Size of (R JOIN S) JOIN T
1,000,000 200,000 1/10,000
20,000,000 R JOIN
ON R.AS.A (S JOIN T ON S.AT.A) Size of S
JOIN T 100,000 200,000 1/100,000 200,000
Size of R JOIN (S JOIN T) 1,000,000 200,000
1/10,000
20,000,000
12
Choosing join ordering

The set of possible join orders is extremely
large. Instead concentrate on left deep join
orders
Left join orders make it possible to pipeline the
output of one join as input to the other join

To find all possible left-deep join orders
First find all possible two way joins over the
given relations, estimate the cost of the best
implementation plan
Then, find the next relation to join with the
result, estimate the cost
Remove any joins that are too costly compared to
the others
Keep enumerating all joins!

JOIN
JOIN
V
JOIN
T
R
S
13
Relational Calculus

Based on the predicate calculus of formal logic
(sound complete).
Higher level of abstraction for users
Logically equivalent to more procedural
relational algebra constructs
Declarative form of relational calculus used in
many commercial products

14
Relational Calculus (II)

Of the form
p.PATIENTS_NAME
p IN PATIENTS and p.DOB gt 4/1/70
Target (attributes) and Qualifying Statement
Generalization of algebraic operations obvious
except join(existential quantifier) and divide
(universal quantifier)
Existential Quantifier ? there exists
(at least one row)
p.PATIENTS_NAME p in PATIENTS and exists l in
LABS
(p.pat_num l.pat_num and l.lab_name T4)
would involve a join in the relational algebra

15
Relational Calculus (III)

Universal Quantifier ? (applies for all)
p.NAME p in patients and for every d in DOCTORS
Exists c in CLINIC_VISIT
(c.PROVIDER_ID d.PROVIDER_ID and
p.PATIENT_ID c.PATIENT_ID)
What is returned?
All patients who have seen every doctor
Cognitive Studies relational algebra easier to
comprehend than relational calculus

16
(No Transcript)
17
Transactional Integrity

A procedure or set of procedures which is
guaranteed to preserve database integrity is a
transaction
Database is consistent before and after a
transaction atomicity (no intermediate state)

18
Means to Concurrency Control

Locking table, row, attribute
(e.g. select for update)

19
Problems with Locking

In order to bill for a procedure, need to write
to the NOTES table and to the PROCEDURE table.
Transaction 1 reads and locks NOTES for pt 1
Transaction 2 goes first for PROCEDURE table
Result deadlock.
Solutions to deadline ordering, deadlock
detection
Two-phase locking
All locking (read and write) operations before
first unlock
NOTES Read and PROCEDURE Write locks completed
before NOTES unlock

20
Transaction Processing

Transactions that execute in a database system
must satisfy the following properties
Atomicity each transaction either executes
fully, or does not have any effect
Consistency each transaction is a logical
combination of actions that change the database
from one consistent state to another
Isolation transactions are written and executed
as if they are executing one at a time
Durability the results of successful
transactions are never lost even in the presence
of unforeseen failures

21
(No Transcript)
22
(No Transcript)
23
Serializability

A transaction is usually written as a sequence of
read, write operations (x,y,z are some data
items, typically tuples)
Transaction 1 r1(x), r1(y), w1(y), commit
Multiple transaction execute concurrently, their
read, write operations are mixed together in a
schedule
r1(x) r2(z) w2(z) r1(y) w1(y)
Do the two transactions change each others data?
SERIAL ORDER 1 r1(x) r1(y) w1(y) r2(z) w2(z)
SERIAL ORDER 2 r2(z) w2(z) r1(x) r1(y) w1(y)
The state of the database is going to be the same!

24
Serializability

If a schedule produces the same results as some
serial ordering of transactions, then it is said
the serializable
There are schedules for which no equivalent
serial order exist.
All schedules executing in a database should be
serializable
As long as the schedule does not destroy the
logical sequence of events in the transaction,
then the results of the transaction are
consistent.

25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
Schedule anomalies

Dirty read a transaction reads a value that is
not finalized
w1(x) r2(x) abort1 the value read by T2 is wrong
and will be erased from the database
Nonrepeatable read a transaction reads the same
item at two different times, but finds different
values
r1(x) w2(x) commit2 r1(x) the second read of x
will produce a different value
Lost update the value written by a transaction
is overwritten by another
r1(x) r2(x) w2(x) commit2 w1(x) commit1 the value
of x written by T1 Is based on its old value, it
is as if T2 has never executed

31
Serializability

A serial schedule is equivalent to another
schedule if
The values returned by the read operations are
guaranteed to be the same
Updates to each item occur in the same order
Is r1(x) r2(x) r2(y) w2(y) w1(x) equivalent to
r1(x) w1(x) r2(x) r2(y) w2(y) ? No! T2 is reading
a value written by T1 in the second schedule.
Is r1(x) r2(x) r2(y) w2(y) w1(x) equivalent to
r2(x) r2(y) w2(y) r1(x) w1(x)? Yes!
Is r1(x) r2(x) r2(y) w2(y) w2(x) w1(x)
equivalent to r2(x) r2(y) w2(y) w2(x) r1(x)
w1(x)?, No!, the value read by T1 is not the same
Is r1(x) r2(x) r2(y) w2(y) w2(x) w1(x)
equivalent to r1(x) w1(x) r2(x) r2(y) w2(y)
w2(x)?, No!, the values by T2 is not the same

32
Two phase locking

To guarantee serializable schedules, the database
maintains locks on items (for example tuples)
Transactions are required to obtain a read lock
to read an item and a write lock to change the
value of an item
Read lock ( R ) if an item is locked with a read
lock, then other transactions may obtain read
locks on it
Write lock ( W ) if an item is locked with a
write lock, no other transaction may obtain any
other lock on it.
If a transaction requests a write (exclusive)
lock, it is granted the lock if there is no other
lock on the item or the same transaction holds a
read lock on it.
If a transaction may not obtain the lock it needs
to continue the operation, then it goes into a
wait mode until the necessary lock is released.

33
Two phase locking

Two phase locking involves
Growing phase, during which a transaction may
obtain new locks, but may not release any locks
it is holding
Shrinking phase, during which a transaction may
release locks but may not obtain any new locks
Strick two phase locking requires that a
transaction hold all its locks until it
completes. At commit time, the transaction
releases all its locks.

34
Two phase locking

Strict two phase locking eliminates the
possibility of schedule anomalies,
non-serializable schedules
Dirty read w1(x) r2(x) abort1
Not possible since T2 could not have obtained the
read lock on x before T1 completes!
Non-repeatable read r1(x) w2(x) commit2 r1(x)
Not possible since T2 could not have obtained the
write lock on x before T1 commits since it is
holding the read lock on x
Lost update r1(x) r2(x) w2(x) commit2 w1(x)
commit1
Not possible since T2 could not have obtained the
write lock on x before T1 commits

35
Deadlocks

Even though strict two phase locking prevents
non-serializable schedules, it is possible that
two transactions enter a cyclic wait state, a
deadlock.
w1(x) w2(y) Request_w1(y) Request_w2(x)
T1 is waiting T2 to release the lock on X
T2 is waiting T1 to release the lock on Y
No transaction will be able to complete
Detect deadlocks by checking wait states of
transactions, abort transactions in the reverse
order of time spent.

36
Isolation levels

A transaction may define how the tables and
tuples accessed by that transaction should be
locked.
READ UNCOMMITTED. Dirty reads are possible, I.e.
read tuples that are being modified by other
transactions before these transaction commit.
Read values without a lock, writes are not
possible.
READ COMMITTED. Dirty reads are not permitted.
Lock an item with a read lock shortly while
reading -during which time no other transaction
could be writing the item. However, read locks
are not held for the duration of a transaction,
hence the same value read twice may have
different values -no repeatable reads.

37
Isolation levels

Cont.
REPEATABLE READS. Obtain and hold read locks
until commit time. All reads are repeatable but
phantom updates are possible.
Example UPDATE employee
SET salary salary 1.01
WHERE dept Toy
All tuples with deptToy are selected and
locked for write. However, it is possible for
another transaction to add a new employee in the
Toy department while this transaction is still
executing, but his salary will not be changed!
SERIALIZABLE. Obtain and hold locks on a
predicate to ensure phantom updates are not
possible (such as deptToy above).

38
Lock granularity

Locking at table or table level reduces
concurrency greatly, but locking at tuple level
has too much overhead
Use multi-level intension locks at higher levels
IS - intension shared, means the transaction
intends a shared lock on a tuple
IX - intension exclusive, means the transaction
intends to obtain an exclusive (W) lock on a
tuple
SIX - means that transaction will update some
tuples, but will read all of them

39
Lock granularity
Granted Mode Granted Mode Granted Mode Granted Mode Granted Mode
Requested Mode IS IX SIX S X
IS X
IX X X X
SIX X X X X
S X X X
X X X X X X
40
Lock granularity

To ensure serializability of a transaction T1
that accesses the relation through a table scan
T1 locks the relation with IS, IX, or SIX
depending on the transaction
No other transaction will be able to obtain an IX
lock on the table to insert a new tuple into the
table, no phantoms!
To ensure serializability of a transaction T1
that accesses the relation through an index scan
T1 locks the index for all nodes accessed for the
scan for some condition C
Any new tuple that will make C true will need to
be inserted into these nodes, but this is not
possible with the locks

41
Atomicity and durability

Transaction failures
A logical error or a transaction may cause a
transaction to fail, all changes made by this
transaction must be erased from the system
An UNDO of a transaction requires undoing all
updates by that transaction in the reverse order
Disk pages read and changed in memory are not
written immediately to disk since other
transactions may be using them (NO FORCE)
These pages are lost during a power failure

42
Atomicity and durability

Disk pages changed by a non-committed transaction
may be written to disk for buffer management
purposes (STEAL)
In case of a power failure, transactions that are
not yet completed must be rolledback, and
transactions that have committed should have
their results restored.
NO STEAL requires that pages in memory are pinned
(not written to disk) until a transaction
completes, reduces concurrency
NO FORCE requires that when a transaction T
commits pages in memory modified by T are all
written to disk, performs unnecessary disk writes