Query Optimization, Concluded and Concurrency Control - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

Query Optimization, Concluded and Concurrency Control

Description:

Read and summarize the ARIES paper, Sections 1.1, 3-6. Talk by Morgan Stanley: 'Information Security Challenges in Financial Services IT' ... – PowerPoint PPT presentation

Number of Views:30
Avg rating:3.0/5.0
Slides: 35
Provided by: zack4
Category:

less

Transcript and Presenter's Notes

Title: Query Optimization, Concluded and Concurrency Control


1
Query Optimization, Concludedand Concurrency
Control
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 650 Implementing Data Management Systems
  • February 2, 2005

2
Administrivia
  • The next assignment
  • Read and summarize the ARIES paper, Sections 1.1,
    3-6
  • Talk by Morgan Stanley Information Security
    Challenges in Financial Services IT
  • Thursday (tomorrow), 630PM, L101
  • Pizza, plus USB flash drives for 1st 50 students
  • (Mostly interested in recruiting)

3
Todays Trivia Question
4
Optimizer Generators
  • Idea behind EXODUS and StarburstBuild a
    programming language for writing custom
    optimizers!
  • Rule-based or transformational language
  • Describes, using declarative template and
    conditions, an equivalent expression
  • EXODUS compile an optimizer based on a set of
    rules
  • Starburst run a set of rules in the
    interpreter a client could customize the rules

5
Starburst Pros and Cons
  • Pro
  • Stratified search generally works well in
    practice DB2 UDB has perhaps the best query
    optimizer out there
  • Interesting model of separating calculus-level
    and algebra-level optimizations
  • Generally provides fast performance
  • Con
  • Interpreted rules were too slow and no database
    user ever customized the engine!
  • Difficult to assign priorities to transformations
  • Some QGM transformations that were tried were
    difficult to assess without running many
    cost-based optimizations
  • Rules got out of control

6
The EXODUS and Volcano Optimizer Generators
  • Part of a database toolkit approach to building
    systems
  • A set of libraries and languages for building
    databases with custom data models and
    functionalities

(rules in E)
(EXODUS)
(gcc)
(MyDB plan)
(MyQL)
7
EXODUS/Volcano Model
  • Try to unify the notion of logical-logical
    transformations and logical-physical
    transformations
  • No stratification as in Starburst everything is
    transformations
  • Challenge efficient search need a lot of
    pruning
  • EXODUS used many heuristics, something called a
    MESH
  • Volcano branch-and-bound pruning, recursion
    memoization

8
Example Rules
  • Physical operators
  • operator 2 join
  • method 2 hash_join loops_join cartesian_product
  • Logical-logical transformations
  • join (1,2) -gt join(2,1)
  • Logical-physical transformations
  • join (1,2) by hash_join (1,2)
  • Can get quite hairy
  • join 7 (join 8 (1,2), 3) lt-gt join 8(1, join 7
    (2,3))ifdef FORWARDif (NOT cover_predicate
    (OPERATOR_7 oper_argument, INPUT_2 oper_property,
    INPUT_3 oper_property)) REJECT

9
So How Does the Optimizer Work?(EXODUS version)
  • Needs to enumerate all possible transformations
    without repeating
  • Every expression is stored in a MESH
  • Basically, an associative lookup for each
    expression, which can link to other entries in
    the same MESH

10
Search in EXODUS
  • Apply a transformation, see if it produces a new
    node
  • If so
  • Find cheapest implementation rule
  • Also apply all relevant transformation rules, add
    results to OPEN set
  • Propagate revised cost to parents (reanalyze)
  • Check parents for new transformation
    possibilities (rematch)
  • Heuristics to guide the search in the OPEN set
  • Promise an expected cost factor for each
    transformation rule, based on analysis of
    averages of the optimizers cost model results
  • Favor items with high expected payoff over the
    current cost
  • Problem often need to apply 2 rules to get a
    benefit use heuristics
  • Once a full plan is found, optimizer does hill
    climbing, only applying a limited set of rules

11
Pros and Cons of EXODUS
  • Pros
  • Unified model of optimization is powerful,
    elegant
  • Very extensible architecture
  • Cons
  • Combined logical and physical expressions in the
    same MESH
  • equivalent logical plans with different physical
    operators (e.g., merge vs. hash joins) were kept
    twice
  • Physical properties werent handled well
  • sort enforcers were seldom applied since they
    didnt pay off immediately had to hack them
    into sort-merge join
  • Hard-coded transformation, then algorithm
    selection, cost analysis
  • always applied even if not part of the most
    promising expression
  • applied based on perceived benefit biased
    towards larger expressions, which meant repeated
    re-optimization
  • Cost wasnt very generic a concept

12
Volcano, Successor to EXODUS(Shouldnt it be
LEVITICUS?)
  • Re-architected into a top-down, memoized engine
  • Depth-first search allows branch-and-bound
    pruning
  • FindBestPlan takes logical expression, physical
    properties, cost bound
  • If already computed, return
  • Else compute set of possible moves
  • Logical-logical rule
  • Compliant logical-physical rule
  • Enforcer
  • Insert logical expression into lookup table
  • Insert physical op, plan into separate lookup
    table
  • Return best plan and cost
  • More generic notions of properties and enforcers
    (e.g., location, SHIP), cost (an ADT)

13
EXODUS, Revision 3 Cascades
  • Basically, a cleaner, more object-oriented
    version of the Volcano engine
  • Rumor has it that MS SQL Server is currently
    based on a (simplified and streamlined) version
    of the Volcano/Cascades optimizer generator

14
Optimization Evaluation
  • So, which is best?
  • Heuristics plus join-based enumeration (System-R)
  • Stratified, calculus-then-algebraic (Starburst)
  • Con QGM transformations are almost always
    heuristics-based
  • Pro very succinct transformations at QGM level
  • Unified algebraic (Volcano/Cascades)
  • Con many more rules need to be applied to get
    effect of QGM rewrites
  • Pro unified, fully cost-based model

15
Switching Gears
  • You now know how to build a query-only DBMS, from
    physical page layout through query optimization
  • But what if we want to build an updatable DBMS?
  • The major issues concurrency (today) and
    recovery (next time)

16
Recall the Fundamental Concepts of Updatable DBMSs
  • ACID properties
  • atomicity, consistency, isolation, durability
  • Transactions as atomic units of operation
  • always commit or abort
  • can be terminated and restarted by the DBMS an
    essential property
  • typically logged, as well discuss later
  • Serializability of schedules

17
Serializability and Concurrent Deposits
Deposit 1 Deposit
2 read(X.bal)
read(X.bal) X.bal X.bal 50 X.bal
X.bal 10 write(X.bal)
write(X.bal)
18
Violations of Serializability
  • Dirty data data written by an uncommitted
    transaction a dirty read is a read of dirty data
    (WR conflict)
  • Unrepeatable read a transaction reads the same
    data item twice and gets different values (RW
    conflict)
  • Phantom problem a transaction retrieves a
    collection of tuples twice and sees different
    results

19
Two Approaches to Serializability (or other
Consistency Models)
  • Locking a pessimistic strategy
  • First paper (Gray et al.) hierarchical locking,
    plus ways of compromising serializability for
    performance
  • Optimistic concurrency control
  • Second paper (Kung Robinson) allow writes by
    each session in parallel, then try to substitute
    them in (or reapply to merge)

20
Locking
  • A lock manager that grants and releases locks
    on objects
  • Two basic types of locks
  • Shared locks (read locks) allow other shared
    locks to be granted on the same item
  • Exclusive locks (write locks) do not coexist
    with any other locks
  • Generally granted in two phase locking model
  • Growing phase locks are granted
  • Shrinking phase locks are released (no new
    locks granted)
  • Well-formed, two-phase locking guarantees
    serializability
  • (Note that deadlocks are possible in this model!)

21
Gray et al. Granularity of Locks
  • For performance and concurrency, want different
    levels of lock granularity, i.e., a hierarchy of
    locks
  • database
  • extent
  • table
  • page
  • row
  • attribute
  • But a problem arises
  • What if T1 S-locks a row and T2 wants to X-lock a
    table?
  • How do we easily check whether we should give a
    lock to T2?

22
Intention Locks
  • Two basic types
  • Intention to Share (IS) a descendant item will
    be locked with a share lock
  • Intention Exclusive lock a descendant item will
    be locked with an exclusive lock
  • Locks are granted top-down, released bottom-up
  • T1 grabs IS lock on table, page S lock on row
  • T2 cant get X-lock on table until T1 is done
  • But T3 can get an IS or S lock on the table

23
Lock Compatibility Matrix
IS IX S SIX X
IS Y Y Y Y N
IX Y Y N N N
S Y N Y N N
SIX Y N N N N
X N N N N N
24
Lock Implementation
  • Maintain as a hash table based on items to lock
  • Lock/unlock are atomic operations in critical
    sections
  • First-come, first-served queue for each locked
    object
  • All adjacent, compatible items are a compatible
    group
  • The groups mode is the most restrictive of its
    members
  • What if a transaction wants to convert (upgrade)
    its lock? Should we send it to the back of the
    queue?
  • No will almost assuredly deadlock!
  • Handle conversions immediately after the current
    group

25
Degrees of Consistency
  • Full locking, guaranteeing serializability, is
    generally very expensive
  • So they propose several degrees of consistency as
    a compromise (these are roughly the SQL isolation
    levels)
  • Degree 0 T doesnt overwrite dirty data of
    other transactions
  • Degree 1 above, plus T does not commit writes
    before EOT
  • Degree 2 above, plus T doesnt read dirty data
  • Degree 3 above, plus other transactions dont
    dirty any data T read

26
Degrees and Locking
  • Degree 0 short write locks on updated items
  • Degree 1 long write locks on updated items
  • Degree 2 long write locks on updated items,
    short read locks on read items
  • Degree 3 long write and read locks
  • Does Degree 3 prevent phantoms? If not, how do
    we fix this?

27
What If We Dont Want to Lock?
  • Conflicts may be very uncommon so why incur the
    overhead of locking?
  • Typically hundreds of instructions for every
    lock/unlock
  • Examples read-mostly DBs large DB with few
    collisions append-mostly hierarchical data
  • Proposition break lock into three phases
  • Read and write to private copy of each page
    (i.e., copy-on-write)
  • Validation make sure no conflicts between
    transactions
  • Write swap the private copies in for the public
    ones

28
Validation
  • Goal guarantee that only serializable schedules
    result in merging Ti and Tj writes
  • Approach find an equivalent serializable
    schedule
  • Assign each transaction a number
  • Ensure equivalent serializable schedule as
    follows
  • If TN(Ti) lt TN(Tj) then we must satisfy one of
  • Ti finishes writing before Tj starts reading
    (serial)
  • WS(Ti) disjoint from RS(Tj) and Ti finishes
    writing before Tj writes
  • WS(Ti) disjoint from RS(Tj) and WS(Ti) disjoint
    from WS(Tj), and Ti finishes read phase before Tj
    completes its read phase

29
Why Does This Work?
  • Condition 1 obvious since its serial
  • Condition 2
  • No W-R conflicts since disjoint
  • In all R-W conflicts, Ti precedes Tj since Ti
    reads before it writes (and thats before Tj)
  • In all W-W conflicts, Ti precedes Tj
  • Condition 3
  • No W-R conflicts since disjoint
  • No W-W conflicts since disjoint
  • In all R-W conflicts, Ti precedes Tj since Ti
    reads before it writes (and thats before Tj)

30
The Achilles Heel
  • How do we assign TNs?
  • Not optimistically they get assigned at the end
    of read phase
  • Note that we need to maintain all of the read and
    write sets for transactions that are going on
    concurrently long-lived read phases cause
    difficulty here
  • Solution bound buffer, abort and restart
    transactions when out of space
  • Drawback starvation need to solve by locking
    the whole DB!

31
Serial Validation
  • Simple writes wont be interleaved, so test
  • Ti finishes writing before Tj starts reading
    (serial)
  • WS(Ti) disjoint from RS(Tj) and Ti finishes
    writing before Tj writes
  • Put in critical section
  • Get TN
  • Test 1 and 2 for everyone up to TN
  • Write
  • Long critical section limits parallelism of
    validation, so can optimize
  • Outside critical section, get a TN and validate
    up to there
  • Before write, in critical section, get new TN,
    validate up to that, write
  • Reads no need for TN just validate up to
    highest TN at end of read phase (no critical
    section)

32
Parallel Validation
  • For allowing interleaved writes
  • Save active transactions (finished reading, not
    writing)
  • Abort if intersect current read/write set
  • Validate
  • CRIT Get TN copy active set add self to
    active set
  • Check (1), (2) against everything from start to
    finish
  • Check (3) against all active set
  • If OK, write
  • CRIT Increment TN counter, remove self from
    active
  • Drawback might conflict in condition (3) with
    someone who gets aborted

33
Whos the Top Dog?Optimistic vs. Non-Optimistic
  • Drawbacks of the optimistic approach
  • Generally requires some sort of global state,
    e.g., TN counter
  • If theres a conflict, requires abort and full
    restart
  • Study by Agrawal et al. comparing optimistic vs.
    locking
  • Need load control with low resources
  • Locking is better with moderate resources
  • Optimistic is better with infinite or high
    resources

34
Reminder
  • The next assignment
  • Read and summarize the ARIES paper, Sections 1.1,
    3-6
Write a Comment
User Comments (0)
About PowerShow.com