Title: Query%20Compiler
1Query Compiler
2The Query Compiler
- Parses SQL query into parse tree
- Transforms parse tree into expression tree
(logical query plan) - Transforms logical query plan into physical query
plan
3(No Transcript)
4Grammar for simple SQL
- ltQuerygt ltSFWgt
- ltQuerygt (ltQuerygt)
- ltSFWgt SELECT ltSelListgt FROM ltFromListgt WHERE
ltCondgt - ltSelListgt ltAttrgt,ltSelListgt
- ltSelListgt ltAttrgt
- ltFromListgt ltRelationgt, ltFromListgt
- ltFromListgt ltRelationgt
- ltCondgt ltCondgt AND ltCondgt
- ltCondgt ltTuplegt IN ltQuerygt
- ltCondgt ltAttrgt ltAttrgt
- ltCondgt ltAttrgt LIKE ltPatterngt
- ltTuplegt ltAttrgt
- Atoms(constants), ltsyntactic categoriesgt(variable)
,
5Query
- StarsIn(title,year,starName)
- MovieStar(name,address,gender,birthdate)
- Query
- Give titles of movies that have at least one
star born in 1960 - SELECT title
- FROM StarsIn, MovieStar
- WHERE starName name AND birthdate LIKE '1960'
6Parse Tree
ltQuerygt
ltSFWgt
SELECT ltSelListgt FROM ltFromListgt WHERE
ltConditiongt
ltAttributegt ltRelNamegt , ltFromListgt
AND
title StarsIn ltRelNamegt
MovieStar
ltConditiongt ltConditiongt
ltAttributegt ltAttributegt
ltAttributegt LIKE ltPatterngt
starName name
birthdate 1960
7Another query equivalent
- SELECT title
- FROM StarsIn
- WHERE starName IN (
- SELECT name
- FROM MovieStar
- WHERE birthdate LIKE '1960'
- )
8The Preprocessor (expand query semantic
checking)
- Checks against schema definition
- Relation uses
- Attribute uses, resolve names ( A to R.A)
- Use of types (strings, integers, dates, etc)
- and operators arguments type/arity
- These preprocessing functions are called
- semantic checking
- If all tests are passed, then the parse tree is
said to be valid
9Algebraic laws for transforming logical query
plans
- Commutative and associative laws
Above laws are applicable for both sets and bags
10Theta-join
- Commutative
- Not always associative
- On schema R(a,b), S(b,c), T(c,d) the first query
can not be transformed into the second (Why?)
Because, we cant join S and T using the
condition altd since a is an attribute of neither
S nor T.
11Laws Involving Selection (?)
Splitting laws
Only if R is a set. The union is set union
Order is flexible
12Laws Involving Selection (?)
What about intersection?
13Algebraic Laws involving selection
For the binary operators, we push the selection
only if all attributes in the condition C are in
R.
14Example
- Consider relation schemas R(A,B) and S(B,C) and
the expression below - ?(A1 OR A3) AND BltC(R ?? S)
- Splitting AND ?A1 OR A3 (?B lt C(R ?? S))
- Push ? to S ?A1 OR A3 (R ?? ?B lt C(S))
- Push ? to R ?A1 OR A3 (R) ?? ?B lt C(S)
15Pushing selections
- Usually selections are pushed down the
expression tree. - The following example shows that it is sometimes
useful to pull selection up in the tree. - StarsIn(title,year,starName)
- Movie(title,year,length,studioName)
- CREATE VIEW MoviesOf1996 AS
- SELECT FROM MOVIE WHERE year1996
- Query Which stars worked for which studios in
1996? - SELECT starName,studioName
- FROM MoviesOf1996 NATURAL JOIN StarsIN
16pull selection up then push down
17Laws for (bag) Projection
- A simple law Project out attributes that are not
needed later. - i.e. keep only the input attr. and any join
attribute.
18Examples for pushing projection
- Schema R(a,b,c), S(c,d,e)
19Example Pushing Projection
- Schema StarsIn(title,year,starName)
- Query SELECT starName FROM StarsIn
- WHERE year 1996
Should we transform to ?
? Depends! Is StarsIn stored or computed?
20Reasons for not pushing the projection
- If StarsIn is stored, then for the projection we
have to scan the relation. - If the relation is pipelined from some previous
computation, then yes, we better do the
projection (on the fly). - Also, if for example there is an index on year
for StarsIn, such index is useless in the
projected relation ?starName,year(StarsIn) - While such an index is very useful for the
selection on year1996
21Laws for duplicate elimination and grouping
- Try to move ? in a position where it can be
eliminated altogether - E.g. when ? is applied on
- A stored relation with a declared primary key
- A relation that is the result of a ? operation,
since grouping creates a relation with no
duplicates.
? absorbs ?
22Improving logical query plans
- Push ? as far down as possible (sometimes pull
them up first). - Do splitting of complex conditions in ? in order
to push ? even further. - Push ? as far down as possible, introduce new
early ? (but take care for exceptions) - Combine ? with ? to produce ?-joins or equi-joins
- Choose an order for joins
23Example of improvement
SELECT title FROM StarsIn, MovieStar WHERE
starName name AND birthdate LIKE 1960
24And a better plan introducing a projection to
filter out useless attributes