Midterm 3 Revision - PowerPoint PPT Presentation

1 / 42
About This Presentation
Title:

Midterm 3 Revision

Description:

Use multi-level intension locks at higher levels ... IX - intension exclusive, means the transaction intends to obtain an exclusive ... – PowerPoint PPT presentation

Number of Views:18
Avg rating:3.0/5.0
Slides: 43
Provided by: isaacs6
Category:

less

Transcript and Presenter's Notes

Title: Midterm 3 Revision


1
Midterm 3 Revision
CS157B Lecture 21
  • Professor Sin-Min Lee
  • Department of Computer Science

2
Query Optimization
  • The execution of an SQL query
  • Parse and verify the query, create an equivalent
    query tree where each node is a relational
    algebra operation and each leaf is a table from
    the query
  • Apply heuristics to alter the query tree to find
    equivalent queries based on algebraic
    equivalences
  • Generate alternate execution plans by assigning
    implementations to operations and orders to joins
  • Estimate the cost of each operation based on
    statistics and available access paths
  • Choose the lowest cost execution plan

3
Query Optimization
  • SELECT DISTINCT
  • R.A, S.C, S.D, T.D
  • FROM R, S, T
  • WHERE R.AS.A and S.BT.B AND R.C lt 100 AND S.A
    gt 20

Project distinct R.A, S.C, S.D, T.D
Select R.Clt100 and S.Agt20
Join on S.BT.B
Join on R.AS.A
T
R
S
4
Algebraic Equivalences
  • Selections can be pushed through joins, Cartesian
    products
  • Selections can be joined with Cartesian products
    for a join condition
  • Projections can be pushed through joins,
    Cartesian products to reduce the size of the
    output

5
Query Optimization
  • Push selections on each relation down, as close
    to the relation as possible.

Project distinct R.A, S.C, S.D, T.D
Join on S.BT.B
Join on R.AS.A
T
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
6
Query Optimization
  • Add projections whenever appropriate

Project distinct R.A, S.C, S.D, T.D
Join on S.BT.B
Join on R.AS.A
Project distinct T.B,T.D
Project distinct R.A
Project distinct S.A, S.B, S.C, S.D
T
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
7
Pushing selections down
  • SELECT C (R join S) (SELECT C ( R ) ) join S if
    C only involves attributes in R
  • SELECT C1 AND C2 ( R ) SELECT C1 (SELECT C2 ( R
    ) ) SELECT C2 (SELECT C1 ( R ) )
  • Selections can be pushed down the joins often to
    produce the size of the joined relation
  • However this may not always result in a reduction
    in the overall cost.
  • The selection condition may not be very
    selective.
  • The selection may remove an access condition,
    sorted order that is particularly useful for the
    next step.

8
Implementation plans
  • Assign implementations to logical operators given
    memory limitations
  • Join mapped to block-sort join, merge-sort join,
    etc.
  • Selection mapped to table scan or index scan,
    etc.
  • Assign join ordering to joins
  • R join S join T (R join S) join T R join (S
    join T)
  • (R join T) join S
  • For each join, inner/outer relations can be
    changed
  • Estimate the size of each relation and cost of
    each operation

9
Implementation plans
  • Blocking operators require the whole relation to
    be present before any output can be computed
  • For example grouping, sorting, project distinct
  • A non-blocking operator can be pipelined
  • As soon as a tuple is found to be in the output
    of an operator, it can be pipelined to the next
    operator
  • Hence, the output buffer for an operator serves
    as the input buffer of the next operator

10
Query Optimization
  • Scan table R, fill tuples that pass the selection
    condition into allocated buffer pages.
  • When the buffer for R is full, stop scanning, and
    join them with S.
  • When the join is complete,
  • continue scan and fill the buffer
  • for the next join step.

Sort and project
Project distinct R.A, S.C, S.D
Partially sort and write to disk
Block nested loop join
Join on R.AS.A
pipeline
pipeline
Table scan
Table scan
Select S.Agt20
Select R.Clt100 and R.Agt20
S
R
11
Join ordering
  • Join ordering depends on the size of the output
    and the access paths available for each relation

Table CARD VALUES (attr A)
R 1million 10,000
S 100,000 100,000
T 200,000 5,000
(R JOIN S ON R.AS.A) JOIN T ON S.AT.A Size of
R JOIN S 1,000,000 100,000 1/100,000
1,000,000 Size of (R JOIN S) JOIN T
1,000,000 200,000 1/10,000
20,000,000 R JOIN
ON R.AS.A (S JOIN T ON S.AT.A) Size of S
JOIN T 100,000 200,000 1/100,000 200,000
Size of R JOIN (S JOIN T) 1,000,000 200,000
1/10,000
20,000,000
12
Choosing join ordering
  • The set of possible join orders is extremely
    large. Instead concentrate on left deep join
    orders
  • Left join orders make it possible to pipeline the
    output of one join as input to the other join
  • To find all possible left-deep join orders
  • First find all possible two way joins over the
    given relations, estimate the cost of the best
    implementation plan
  • Then, find the next relation to join with the
    result, estimate the cost
  • Remove any joins that are too costly compared to
    the others
  • Keep enumerating all joins!

JOIN
JOIN
V
JOIN
T
R
S
13
Relational Calculus
  • Based on the predicate calculus of formal logic
  • (sound complete).
  • Higher level of abstraction for users
  • Logically equivalent to more procedural
    relational algebra constructs
  • Declarative form of relational calculus used in
    many commercial products

14
Relational Calculus (II)
  • Of the form
  • p.PATIENTS_NAME
  • p IN PATIENTS and p.DOB gt 4/1/70
  • Target (attributes) and Qualifying Statement
  • Generalization of algebraic operations obvious
    except join(existential quantifier) and divide
    (universal quantifier)
  • Existential Quantifier ? there exists
  • (at least one row)
  • p.PATIENTS_NAME p in PATIENTS and exists l in
    LABS
  • (p.pat_num l.pat_num and l.lab_name T4)
  • would involve a join in the relational algebra

15
Relational Calculus (III)
  • Universal Quantifier ? (applies for all)
  • p.NAME p in patients and for every d in DOCTORS
  • Exists c in CLINIC_VISIT
  • (c.PROVIDER_ID d.PROVIDER_ID and
  • p.PATIENT_ID c.PATIENT_ID)
  • What is returned?
  • All patients who have seen every doctor
  • Cognitive Studies relational algebra easier to
    comprehend than relational calculus

16
(No Transcript)
17
Transactional Integrity
  • A procedure or set of procedures which is
    guaranteed to preserve database integrity is a
    transaction
  • Database is consistent before and after a
    transaction atomicity (no intermediate state)

18
Means to Concurrency Control
  • Locking table, row, attribute
  • (e.g. select for update)

19
Problems with Locking
  • In order to bill for a procedure, need to write
    to the NOTES table and to the PROCEDURE table.
  • Transaction 1 reads and locks NOTES for pt 1
  • Transaction 2 goes first for PROCEDURE table
  • Result deadlock.
  • Solutions to deadline ordering, deadlock
    detection
  • Two-phase locking
  • All locking (read and write) operations before
    first unlock
  • NOTES Read and PROCEDURE Write locks completed
    before NOTES unlock

20
Transaction Processing
  • Transactions that execute in a database system
    must satisfy the following properties
  • Atomicity each transaction either executes
    fully, or does not have any effect
  • Consistency each transaction is a logical
    combination of actions that change the database
    from one consistent state to another
  • Isolation transactions are written and executed
    as if they are executing one at a time
  • Durability the results of successful
    transactions are never lost even in the presence
    of unforeseen failures

21
(No Transcript)
22
(No Transcript)
23
Serializability
  • A transaction is usually written as a sequence of
    read, write operations (x,y,z are some data
    items, typically tuples)
  • Transaction 1 r1(x), r1(y), w1(y), commit
  • Multiple transaction execute concurrently, their
    read, write operations are mixed together in a
    schedule
  • r1(x) r2(z) w2(z) r1(y) w1(y)
  • Do the two transactions change each others data?
  • SERIAL ORDER 1 r1(x) r1(y) w1(y) r2(z) w2(z)
  • SERIAL ORDER 2 r2(z) w2(z) r1(x) r1(y) w1(y)
  • The state of the database is going to be the same!

24
Serializability
  • If a schedule produces the same results as some
    serial ordering of transactions, then it is said
    the serializable
  • There are schedules for which no equivalent
    serial order exist.
  • All schedules executing in a database should be
    serializable
  • As long as the schedule does not destroy the
    logical sequence of events in the transaction,
    then the results of the transaction are
    consistent.

25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
Schedule anomalies
  • Dirty read a transaction reads a value that is
    not finalized
  • w1(x) r2(x) abort1 the value read by T2 is wrong
    and will be erased from the database
  • Nonrepeatable read a transaction reads the same
    item at two different times, but finds different
    values
  • r1(x) w2(x) commit2 r1(x) the second read of x
    will produce a different value
  • Lost update the value written by a transaction
    is overwritten by another
  • r1(x) r2(x) w2(x) commit2 w1(x) commit1 the value
    of x written by T1 Is based on its old value, it
    is as if T2 has never executed

31
Serializability
  • A serial schedule is equivalent to another
    schedule if
  • The values returned by the read operations are
    guaranteed to be the same
  • Updates to each item occur in the same order
  • Is r1(x) r2(x) r2(y) w2(y) w1(x) equivalent to
    r1(x) w1(x) r2(x) r2(y) w2(y) ? No! T2 is reading
    a value written by T1 in the second schedule.
  • Is r1(x) r2(x) r2(y) w2(y) w1(x) equivalent to
    r2(x) r2(y) w2(y) r1(x) w1(x)? Yes!
  • Is r1(x) r2(x) r2(y) w2(y) w2(x) w1(x)
    equivalent to r2(x) r2(y) w2(y) w2(x) r1(x)
    w1(x)?, No!, the value read by T1 is not the same
  • Is r1(x) r2(x) r2(y) w2(y) w2(x) w1(x)
    equivalent to r1(x) w1(x) r2(x) r2(y) w2(y)
    w2(x)?, No!, the values by T2 is not the same

32
Two phase locking
  • To guarantee serializable schedules, the database
    maintains locks on items (for example tuples)
  • Transactions are required to obtain a read lock
    to read an item and a write lock to change the
    value of an item
  • Read lock ( R ) if an item is locked with a read
    lock, then other transactions may obtain read
    locks on it
  • Write lock ( W ) if an item is locked with a
    write lock, no other transaction may obtain any
    other lock on it.
  • If a transaction requests a write (exclusive)
    lock, it is granted the lock if there is no other
    lock on the item or the same transaction holds a
    read lock on it.
  • If a transaction may not obtain the lock it needs
    to continue the operation, then it goes into a
    wait mode until the necessary lock is released.

33
Two phase locking
  • Two phase locking involves
  • Growing phase, during which a transaction may
    obtain new locks, but may not release any locks
    it is holding
  • Shrinking phase, during which a transaction may
    release locks but may not obtain any new locks
  • Strick two phase locking requires that a
    transaction hold all its locks until it
    completes. At commit time, the transaction
    releases all its locks.

34
Two phase locking
  • Strict two phase locking eliminates the
    possibility of schedule anomalies,
    non-serializable schedules
  • Dirty read w1(x) r2(x) abort1
  • Not possible since T2 could not have obtained the
    read lock on x before T1 completes!
  • Non-repeatable read r1(x) w2(x) commit2 r1(x)
  • Not possible since T2 could not have obtained the
    write lock on x before T1 commits since it is
    holding the read lock on x
  • Lost update r1(x) r2(x) w2(x) commit2 w1(x)
    commit1
  • Not possible since T2 could not have obtained the
    write lock on x before T1 commits

35
Deadlocks
  • Even though strict two phase locking prevents
    non-serializable schedules, it is possible that
    two transactions enter a cyclic wait state, a
    deadlock.
  • w1(x) w2(y) Request_w1(y) Request_w2(x)
  • T1 is waiting T2 to release the lock on X
  • T2 is waiting T1 to release the lock on Y
  • No transaction will be able to complete
  • Detect deadlocks by checking wait states of
    transactions, abort transactions in the reverse
    order of time spent.

36
Isolation levels
  • A transaction may define how the tables and
    tuples accessed by that transaction should be
    locked.
  • READ UNCOMMITTED. Dirty reads are possible, I.e.
    read tuples that are being modified by other
    transactions before these transaction commit.
    Read values without a lock, writes are not
    possible.
  • READ COMMITTED. Dirty reads are not permitted.
    Lock an item with a read lock shortly while
    reading -during which time no other transaction
    could be writing the item. However, read locks
    are not held for the duration of a transaction,
    hence the same value read twice may have
    different values -no repeatable reads.

37
Isolation levels
  • Cont.
  • REPEATABLE READS. Obtain and hold read locks
    until commit time. All reads are repeatable but
    phantom updates are possible.
  • Example UPDATE employee
  • SET salary salary 1.01
  • WHERE dept Toy
  • All tuples with deptToy are selected and
    locked for write. However, it is possible for
    another transaction to add a new employee in the
    Toy department while this transaction is still
    executing, but his salary will not be changed!
  • SERIALIZABLE. Obtain and hold locks on a
    predicate to ensure phantom updates are not
    possible (such as deptToy above).

38
Lock granularity
  • Locking at table or table level reduces
    concurrency greatly, but locking at tuple level
    has too much overhead
  • Use multi-level intension locks at higher levels
  • IS - intension shared, means the transaction
    intends a shared lock on a tuple
  • IX - intension exclusive, means the transaction
    intends to obtain an exclusive (W) lock on a
    tuple
  • SIX - means that transaction will update some
    tuples, but will read all of them

39
Lock granularity
Granted Mode Granted Mode Granted Mode Granted Mode Granted Mode
Requested Mode IS IX SIX S X
IS X
IX X X X
SIX X X X X
S X X X
X X X X X X
40
Lock granularity
  • To ensure serializability of a transaction T1
    that accesses the relation through a table scan
  • T1 locks the relation with IS, IX, or SIX
    depending on the transaction
  • No other transaction will be able to obtain an IX
    lock on the table to insert a new tuple into the
    table, no phantoms!
  • To ensure serializability of a transaction T1
    that accesses the relation through an index scan
  • T1 locks the index for all nodes accessed for the
    scan for some condition C
  • Any new tuple that will make C true will need to
    be inserted into these nodes, but this is not
    possible with the locks

41
Atomicity and durability
  • Transaction failures
  • A logical error or a transaction may cause a
    transaction to fail, all changes made by this
    transaction must be erased from the system
  • An UNDO of a transaction requires undoing all
    updates by that transaction in the reverse order
  • Disk pages read and changed in memory are not
    written immediately to disk since other
    transactions may be using them (NO FORCE)
  • These pages are lost during a power failure

42
Atomicity and durability
  • Disk pages changed by a non-committed transaction
    may be written to disk for buffer management
    purposes (STEAL)
  • In case of a power failure, transactions that are
    not yet completed must be rolledback, and
    transactions that have committed should have
    their results restored.
  • NO STEAL requires that pages in memory are pinned
    (not written to disk) until a transaction
    completes, reduces concurrency
  • NO FORCE requires that when a transaction T
    commits pages in memory modified by T are all
    written to disk, performs unnecessary disk writes
Write a Comment
User Comments (0)
About PowerShow.com