Title: Query Optimization, Concluded, and Concurrency Control
1. Query Optimization, Concluded, and Concurrency Control
- Zachary G. Ives
- University of Pennsylvania
- CIS 650: Implementing Data Management Systems
- February 2, 2005
2. Administrivia
- The next assignment
  - Read and summarize the ARIES paper, Sections 1.1, 3-6
- Talk by Morgan Stanley: Information Security Challenges in Financial Services IT
  - Thursday (tomorrow), 6:30 PM, L101
  - Pizza, plus USB flash drives for the 1st 50 students
  - (Mostly interested in recruiting)
3. Today's Trivia Question
4. Optimizer Generators
- Idea behind EXODUS and Starburst: build a programming language for writing custom optimizers!
- Rule-based or transformational language
  - Describes, using a declarative template and conditions, an equivalent expression
- EXODUS: compiles an optimizer based on a set of rules
- Starburst: runs a set of rules in an interpreter, so a client could customize the rules
5. Starburst Pros and Cons
- Pro:
  - Stratified search generally works well in practice; DB2 UDB has perhaps the best query optimizer out there
  - Interesting model of separating calculus-level and algebra-level optimizations
  - Generally provides fast performance
- Con:
  - Interpreted rules were too slow, and no database user ever customized the engine!
  - Difficult to assign priorities to transformations
  - Some QGM transformations that were tried were difficult to assess without running many cost-based optimizations
  - Rules got out of control
6. The EXODUS and Volcano Optimizer Generators
- Part of a database toolkit approach to building systems
  - A set of libraries and languages for building databases with custom data models and functionality
- [Figure: rules written in the E language are fed to EXODUS, which (analogous to gcc) generates a custom optimizer; a MyQL query goes in, a MyDB plan comes out]
7. EXODUS/Volcano Model
- Try to unify the notion of logical-logical transformations and logical-physical transformations
- No stratification as in Starburst: everything is transformations
- Challenge: efficient search, which needs a lot of pruning
  - EXODUS: used many heuristics, plus a structure called a MESH
  - Volcano: branch-and-bound pruning, recursion with memoization
8. Example Rules
- Physical operators:
  - operator 2 join                  (join is a 2-ary logical operator)
  - method 2 hash_join loops_join cartesian_product
- Logical-logical transformations:
  - join(1,2) -> join(2,1)
- Logical-physical transformations:
  - join(1,2) by hash_join(1,2)
- Can get quite hairy:
  - join 7 (join 8 (1,2), 3) <-> join 8 (1, join 7 (2,3))
    ifdef FORWARD
    if (NOT cover_predicate(OPERATOR_7 oper_argument,
                            INPUT_2 oper_property,
                            INPUT_3 oper_property)) REJECT
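The two logical-logical rules above (join commutativity and associativity) can be sketched as tree rewrites; this is a toy illustration, not the actual EXODUS/Volcano rule language, and the tuple encoding and function names are invented.

```python
# A logical plan is a nested tuple: ("join", left, right) or a base table name.

def commute(expr):
    """join(1,2) -> join(2,1): swap the inputs of a join node."""
    op, left, right = expr
    assert op == "join"
    return ("join", right, left)

def associate(expr):
    """join(join(1,2),3) -> join(1, join(2,3)): rotate a left-deep pair."""
    op, inner, right = expr
    assert op == "join" and inner[0] == "join"
    _, a, b = inner
    return ("join", a, ("join", b, right))

plan = ("join", ("join", "R1", "R2"), "R3")
print(associate(plan))   # ('join', 'R1', ('join', 'R2', 'R3'))
```

A real rule engine would also check conditions (like the cover_predicate test above) before accepting a rewrite.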
9. So How Does the Optimizer Work? (EXODUS version)
- Needs to enumerate all possible transformations without repeating
- Every expression is stored in a MESH
  - Basically, an associative lookup for each expression, which can link to other entries in the same MESH
10. Search in EXODUS
- Apply a transformation; see if it produces a new node
- If so:
  - Find the cheapest implementation rule
  - Also apply all relevant transformation rules; add the results to the OPEN set
  - Propagate the revised cost to parents (reanalyze)
  - Check parents for new transformation possibilities (rematch)
- Heuristics to guide the search in the OPEN set:
  - Promise: an expected cost factor for each transformation rule, based on analysis of averages of the optimizer's cost-model results
  - Favor items with high expected payoff over the current cost
  - Problem: often need to apply 2 rules to get a benefit; use heuristics
- Once a full plan is found, the optimizer does hill climbing, applying only a limited set of rules
11. Pros and Cons of EXODUS
- Pros:
  - Unified model of optimization is powerful, elegant
  - Very extensible architecture
- Cons:
  - Combined logical and physical expressions in the same MESH
    - Equivalent logical plans with different physical operators (e.g., merge vs. hash joins) were kept twice
  - Physical properties weren't handled well
    - Sort enforcers were seldom applied, since they didn't pay off immediately; had to hack them into sort-merge join
  - Hard-coded order: transformation, then algorithm selection, then cost analysis
    - Always applied, even if not part of the most promising expression
    - Applied based on perceived benefit, which is biased towards larger expressions, which meant repeated re-optimization
  - Cost wasn't a very generic concept
12. Volcano, Successor to EXODUS (Shouldn't It Be LEVITICUS?)
- Re-architected into a top-down, memoized engine
  - Depth-first search allows branch-and-bound pruning
- FindBestPlan takes a logical expression, physical properties, and a cost bound
  - If already computed, return it
  - Else compute the set of possible moves:
    - Logical-logical rule
    - Compliant logical-physical rule
    - Enforcer
  - Insert the logical expression into a lookup table
  - Insert the physical op and plan into a separate lookup table
  - Return the best plan and cost
- More generic notions of properties and enforcers (e.g., location, SHIP) and of cost (an ADT)
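The FindBestPlan recursion can be sketched as a memoized, branch-and-bound search. This is a toy: the operator set, the costs, and the table layout are invented, and physical properties and enforcers are omitted.

```python
# Logical-physical moves: logical op -> [(physical operator, operator cost)].
# These costs are made up for illustration.
IMPLEMENTATIONS = {
    "join": [("hash_join", 3), ("loops_join", 10)],
    "scan": [("seq_scan", 1)],
}

memo = {}   # logical expression -> (best physical plan, cost)

def find_best_plan(expr, bound=float("inf")):
    """expr is ('join', left, right) or ('scan', name).
    Returns (plan, cost), or None if no plan fits within the cost bound."""
    if expr in memo:                       # already computed: return it
        plan, cost = memo[expr]
        return (plan, cost) if cost <= bound else None
    op = expr[0]
    children = expr[1:] if op == "join" else ()
    best = None
    for phys, cost in IMPLEMENTATIONS[op]:
        inputs = []
        for child in children:
            # Branch-and-bound: the child's budget shrinks as cost accrues.
            sub = find_best_plan(child, bound - cost)
            if sub is None:
                break                      # this move cannot beat the bound
            inputs.append(sub[0])
            cost += sub[1]
        else:
            if best is None or cost < best[1]:
                best = ((phys,) + tuple(inputs), cost)
    if best is not None and best[1] <= bound:
        memo[expr] = best                  # memoize for reuse in the search
        return best
    return None

plan, cost = find_best_plan(("join", ("scan", "R"), ("scan", "S")))
print(plan, cost)   # ('hash_join', ('seq_scan',), ('seq_scan',)) 5
```

A real Volcano-generated optimizer would also enumerate logical-logical rewrites and insert enforcers to satisfy required physical properties; here only the logical-physical moves are shown.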
13. EXODUS, Revision 3: Cascades
- Basically, a cleaner, more object-oriented version of the Volcano engine
- Rumor has it that MS SQL Server is currently based on a (simplified and streamlined) version of the Volcano/Cascades optimizer generator
14. Optimization Evaluation
- So, which is best?
- Heuristics plus join-based enumeration (System-R)
- Stratified, calculus-then-algebraic (Starburst)
  - Con: QGM transformations are almost always heuristics-based
  - Pro: very succinct transformations at the QGM level
- Unified algebraic (Volcano/Cascades)
  - Con: many more rules need to be applied to get the effect of QGM rewrites
  - Pro: unified, fully cost-based model
15. Switching Gears
- You now know how to build a query-only DBMS, from physical page layout through query optimization
- But what if we want to build an updatable DBMS?
- The major issues: concurrency (today) and recovery (next time)
16. Recall the Fundamental Concepts of Updatable DBMSs
- ACID properties
  - Atomicity, consistency, isolation, durability
- Transactions as atomic units of operation
  - Always commit or abort
  - Can be terminated and restarted by the DBMS: an essential property
  - Typically logged, as we'll discuss later
- Serializability of schedules
17. Serializability and Concurrent Deposits

  Deposit 1              Deposit 2
  read(X.bal)
                         read(X.bal)
  X.bal := X.bal + 50
                         X.bal := X.bal + 10
  write(X.bal)
                         write(X.bal)
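The interleaving above loses Deposit 1's update: both transactions read the same balance, so the second write overwrites the first. A minimal simulation of that schedule (the starting balance of 100 is invented):

```python
balance = {"X": 100}

# Both transactions read before either writes -- the bad interleaving.
d1_local = balance["X"]         # Deposit 1: read(X.bal)  -> 100
d2_local = balance["X"]         # Deposit 2: read(X.bal)  -> 100
balance["X"] = d1_local + 50    # Deposit 1: write(X.bal) -> 150
balance["X"] = d2_local + 10    # Deposit 2: write(X.bal) -> 110

print(balance["X"])   # 110: Deposit 1 is lost; any serial order gives 160
```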
18. Violations of Serializability
- Dirty data: data written by an uncommitted transaction; a dirty read is a read of dirty data (WR conflict)
- Unrepeatable read: a transaction reads the same data item twice and gets different values (RW conflict)
- Phantom problem: a transaction retrieves a collection of tuples twice and sees different results
19. Two Approaches to Serializability (or Other Consistency Models)
- Locking: a pessimistic strategy
  - First paper (Gray et al.): hierarchical locking, plus ways of compromising serializability for performance
- Optimistic concurrency control
  - Second paper (Kung & Robinson): allow writes by each session in parallel, then try to substitute them in (or reapply to merge)
20. Locking
- A lock manager grants and releases locks on objects
- Two basic types of locks:
  - Shared locks (read locks) allow other shared locks to be granted on the same item
  - Exclusive locks (write locks) do not coexist with any other locks
- Generally granted under the two-phase locking model:
  - Growing phase: locks are granted
  - Shrinking phase: locks are released (no new locks granted)
- Well-formed, two-phase locking guarantees serializability
- (Note that deadlocks are possible in this model!)
21. Gray et al.: Granularity of Locks
- For performance and concurrency, we want different levels of lock granularity, i.e., a hierarchy of locks:
  - database
  - extent
  - table
  - page
  - row
  - attribute
- But a problem arises:
  - What if T1 S-locks a row and T2 wants to X-lock a table?
  - How do we easily check whether we should give a lock to T2?
22. Intention Locks
- Two basic types:
  - Intention to Share (IS): a descendant item will be locked with a shared lock
  - Intention Exclusive (IX): a descendant item will be locked with an exclusive lock
- Locks are granted top-down, released bottom-up
- T1 grabs IS locks on the table and page, then an S lock on the row
- T2 can't get an X lock on the table until T1 is done
- But T3 can get an IS or S lock on the table
23Lock Compatibility Matrix
IS IX S SIX X
IS Y Y Y Y N
IX Y Y N N N
S Y N Y N N
SIX Y N N N N
X N N N N N
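The matrix above can be encoded directly as a lookup table. This is a minimal sketch of the compatibility check only, not a lock manager:

```python
# held mode -> set of request modes compatible with it (the matrix above).
COMPAT = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

def compatible(held, requested):
    """Can `requested` be granted while another txn holds `held`?"""
    return requested in COMPAT[held]

print(compatible("IS", "X"))   # False: T2's table X lock waits for T1's IS
print(compatible("IS", "S"))   # True: T3 can still S-lock the table
```

Since the matrix is symmetric, a one-directional lookup suffices.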
24. Lock Implementation
- Maintain a hash table keyed on the items to lock
- Lock/unlock are atomic operations in critical sections
- A first-come, first-served queue for each locked object
  - All adjacent, compatible items form a compatible group
  - The group's mode is the most restrictive of its members
- What if a transaction wants to convert (upgrade) its lock? Should we send it to the back of the queue?
  - No: that will almost assuredly deadlock!
  - Handle conversions immediately after the current group
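The compatible-group idea can be sketched as a scan over the front of the FCFS queue: grant adjacent, mutually compatible requests and stop at the first incompatible one. For brevity this sketch uses only S and X modes, and all names are invented.

```python
# Only shared locks coexist in this simplified two-mode world.
COMPAT = {("S", "S")}

def grant_group(queue):
    """queue: FCFS list of (txn, mode) requests. Returns the granted prefix."""
    granted = []
    for txn, mode in queue:
        if all((held, mode) in COMPAT or (mode, held) in COMPAT
               for _, held in granted):
            granted.append((txn, mode))
        else:
            break   # FCFS: later compatible requests must still wait
    return granted

q = [("T1", "S"), ("T2", "S"), ("T3", "X"), ("T4", "S")]
print(grant_group(q))   # [('T1', 'S'), ('T2', 'S')]: T3's X lock blocks T4
```

Note that T4's S request is compatible with the running readers, but granting it past T3 would starve the writer, which is why the scan stops at the first incompatible request.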
25. Degrees of Consistency
- Full locking, guaranteeing serializability, is generally very expensive
- So they propose several degrees of consistency as a compromise (these are roughly the SQL isolation levels):
  - Degree 0: T doesn't overwrite dirty data of other transactions
  - Degree 1: above, plus T does not commit writes before EOT
  - Degree 2: above, plus T doesn't read dirty data
  - Degree 3: above, plus other transactions don't dirty any data T read
26. Degrees and Locking
- Degree 0: short write locks on updated items
- Degree 1: long write locks on updated items
- Degree 2: long write locks on updated items, short read locks on read items
- Degree 3: long write and read locks
- Does Degree 3 prevent phantoms? If not, how do we fix this?
27. What If We Don't Want to Lock?
- Conflicts may be very uncommon, so why incur the overhead of locking?
  - Typically hundreds of instructions for every lock/unlock
  - Examples: read-mostly DBs; large DBs with few collisions; append-mostly hierarchical data
- Proposition: break the transaction into three phases
  - Read: read and write a private copy of each page (i.e., copy-on-write)
  - Validation: make sure there are no conflicts between transactions
  - Write: swap the private copies in for the public ones
28. Validation
- Goal: guarantee that only serializable schedules result when Ti's and Tj's writes are merged
- Approach: find an equivalent serializable schedule
  - Assign each transaction a number (TN)
  - Ensure an equivalent serializable schedule as follows
- If TN(Ti) < TN(Tj), then we must satisfy one of:
  1. Ti finishes writing before Tj starts reading (serial)
  2. WS(Ti) is disjoint from RS(Tj), and Ti finishes writing before Tj writes
  3. WS(Ti) is disjoint from RS(Tj), WS(Ti) is disjoint from WS(Tj), and Ti finishes its read phase before Tj completes its read phase
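The three conditions can be written directly as set and timestamp tests. The dictionary field names (read_end, write_start, and so on) are invented for this sketch; in Kung & Robinson's scheme, RS and WS are the read and write sets and the timestamps mark phase boundaries.

```python
def valid(ti, tj):
    """May Ti (the transaction with the smaller TN) precede Tj?"""
    # 1. Serial: Ti finishes its write phase before Tj starts reading.
    if ti["write_end"] < tj["read_start"]:
        return True
    # 2. Ti's writes don't touch Tj's reads, and Ti finishes writing first.
    if not (ti["WS"] & tj["RS"]) and ti["write_end"] < tj["write_start"]:
        return True
    # 3. Write sets are also disjoint; Ti ends its read phase before Tj does.
    if (not (ti["WS"] & tj["RS"]) and not (ti["WS"] & tj["WS"])
            and ti["read_end"] < tj["read_end"]):
        return True
    return False

ti = {"RS": {"a"}, "WS": {"b"},
      "read_start": 0, "read_end": 2, "write_start": 2, "write_end": 3}
tj = {"RS": {"a"}, "WS": {"a"},
      "read_start": 1, "read_end": 4, "write_start": 4, "write_end": 5}
print(valid(ti, tj))   # True: condition 2 applies (WS(Ti) misses RS(Tj))
```

If none of the conditions holds, Tj fails validation and must abort and restart.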
29. Why Does This Work?
- Condition 1: obvious, since it's serial
- Condition 2:
  - No W-R conflicts, since the sets are disjoint
  - In all R-W conflicts, Ti precedes Tj, since Ti reads before it writes (and that's before Tj writes)
  - In all W-W conflicts, Ti precedes Tj
- Condition 3:
  - No W-R conflicts, since the sets are disjoint
  - No W-W conflicts, since the sets are disjoint
  - In all R-W conflicts, Ti precedes Tj, since Ti reads before it writes (and that's before Tj writes)
30. The Achilles' Heel
- How do we assign TNs?
  - Not optimistically: they get assigned at the end of the read phase
- Note that we need to maintain all of the read and write sets for transactions that are running concurrently; long-lived read phases cause difficulty here
  - Solution: bound the buffer; abort and restart transactions when out of space
  - Drawback: starvation; need to solve it by locking the whole DB!
31. Serial Validation
- Simple: writes won't be interleaved, so test:
  1. Ti finishes writing before Tj starts reading (serial)
  2. WS(Ti) is disjoint from RS(Tj), and Ti finishes writing before Tj writes
- Put in a critical section:
  - Get TN
  - Test conditions 1 and 2 against everyone up to TN
  - Write
- A long critical section limits the parallelism of validation, so we can optimize:
  - Outside the critical section, get a TN and validate up to there
  - Before the write, in the critical section, get a new TN, validate up to that, then write
- Reads: no need for a TN; just validate up to the highest TN at the end of the read phase (no critical section)
32. Parallel Validation
- For allowing interleaved writes
- Save the active transactions (finished reading, not yet done writing)
- Abort if we intersect the current read/write set
- Validate:
  - CRIT: get TN; copy the active set; add self to the active set
  - Check conditions (1), (2) against everything from start to finish
  - Check condition (3) against the whole active set
  - If OK, write
  - CRIT: increment the TN counter; remove self from the active set
- Drawback: might conflict in condition (3) with someone who gets aborted
33. Who's the Top Dog? Optimistic vs. Non-Optimistic
- Drawbacks of the optimistic approach:
  - Generally requires some sort of global state, e.g., the TN counter
  - If there's a conflict, requires abort and full restart
- Study by Agrawal et al. comparing optimistic vs. locking:
  - Need load control with low resources
  - Locking is better with moderate resources
  - Optimistic is better with infinite or high resources
34. Reminder
- The next assignment
  - Read and summarize the ARIES paper, Sections 1.1, 3-6