Title: QUERY PROCESSING
1QUERY PROCESSING
- Topic Number 16
- AANAL PATEL
2Objectives
- Needs of Query Processing.
- Overview of Query Processing and Optimization.
- Dynamic versus Static Optimization.
- Phases of Query Decomposition.
- Methods of Query Optimization.
- How to create a relational algebra tree to
represent a query. - Questions.
- Quiz.
3Introduction
- Database grows larger and larger as time passes,
one of the most important characteristics of a
database is its ability to maintain a consistent
and acceptable level of performance. - The principal mechanism to obtain optimal level
of performance of database is database query
processing. - In network and hierarchical DBMSs, low level
procedural language is generally embedded in
high-level programming language. - In languages like SQL, user only specifies the
data to be retrieved but not how to retrieve
that. - Gives DBMS more control over system performance.
4Overview of Query Processing
- Definition
- The activities involved in parsing,
validating, optimizing, and executing a query. -
- Aim
- Transform query written in high-level
language (e.g. SQL), into correct and efficient
execution strategy expressed in low-level
language (implementing Relational Algebra).
High level user query
Query Processor
Low level data manipulation commands
5Query Optimization
- Defination
- The activity of choosing an efficient execution
strategy for processing a query. - Aim
- To choose a transformation that minimizes
resource usage. - Reduce total execution time of query.
- May also reduce response time of query.
- Problem computationally intractable with large
number of relations, so strategy adopted is
reduced to finding near optimum solution.
6Example Comparison of different processing
strategies
- Find all Managers who work at a London branch.
- Query in SQL
- SELECT FROM Staff s, Branch b
- WHERE s.branchNo b.branchNo AND
- (s.position Manager AND b.city London)
- Assumptions
- 1000 tuples in Staff
- 50 tuples in Branch
- 50 Managers
- 5 London branches
- no indexes or sort keys
- results of any intermediate operations stored on
disk - cost of the final write is ignored
- tuples are accessed one at a time.
7- Three equivalent relational algebra queries are
- (1) ?(position'Manager') ? (city'London') ?
(Staff.branchNoBranch.branchNo) (Staff
X Branch) - (2) ?(position'Manager') ? (city'London')(Staff
X Staff.branchNoBranch.branchNo Branch) - (3) (?position'Manager'(Staff)) X
Staff.branchNoBranch.branchNo (?city'London'(Bra
nch)) - Cost (in disk accesses) are
- C1((1000 50) 2(1000 50)) 101 050
- C2(21000 (1000 50)) 3 050
- C3 (1000 250 5 (50 5)) 1 160
- If we compare all three above costs C1, C2, C3,
than best case is C3 with ration of 871
8Dynamic versus Static Optimization
- Dynamic Optimization Static
Optimization
- Information required to select an optimum
strategy is up to date. - Performance of query is affected.
- Time may limit finding optimum strategy.
- The runtime overhead is removed, so gets more
time to find optimum strategy. - Chosen optimum strategy may no longer be optimal
when query is run. - A hybrid approach could be used to overcome this.
9Phases of Query Processing
- Query Processing can be divided into four main
phases - Query Decomposition
- Query Optimization
- Code Generation
- Runtime Query Execution
10Query Decomposition
- Aims are to transform high-level query into
Relational Algebra query and check that query is
syntactically and semantically correct. - Typical stages are
- analysis
- normalization
- semantic analysis
- simplification
- query restructuring
11Analysis
- Analyze query lexically and syntactically using
compiler techniques. - Verify relations and attributes exist.
- Verify operations are appropriate for object
type. - Example
- SELECT studentNo
- FROM Student
- WHERE grade gt 80
- In analysis stage query would be rejected on two
points - 1. In the select list, the attribute studentNo is
not defined for the Student relation. It may be
defined as studentId instead of studentNo. - 2. In the WHERE clause, the comparison is
incompatible with the data type grade, which is a
variable character string like A,B,C,D,Fa
il.
12Query Tree
- A Query tree is a tree structure that corresponds
to a relational algebra expression such that - - A Leaf node created for each base relation.
- - A Non-leaf node created for each
intermediate relation produced ? by RA
operation. - - The Root of tree represents query
result. - - The Sequence is directed from leaves
to root.
13Normalization
- Converts query into a normalized form for easier
manipulation. - Predicate can be converted into one of two forms
- Conjunctive normal form
- (position 'Manager' ? salary gt 20000) ?
(branchNo 'B003') - Disjunctive normal form
- (position 'Manager' ? branchNo 'B003' ) ?
- (salary gt 20000 ? branchNo 'B003')
14Semantic Analysis
- Objective of semantic analysis is to rejects
normalized queries that are incorrectly
formulated or contradictory. - Query is incorrectly formulated if components do
not contribute to generation of result. - Query is contradictory if its predicated can not
be satisfied by any tuple. - Algorithms to determine correctness exits only
for queries that do not contain disjunction and
negation. For these queries we could construct - 1. A relation connection graph.
- 2. Normalized attribute connection graph.
- A Relation Connection Graph
- Create node for each relation and node for
result. - Create edges between two nodes that represent a
join, and edges between nodes that represent
projection. - If not connected, query is incorrectly
formulated.
15- Example Checking Semantic Correctness.
-
- SELECT p.propertyNo, p.street
- FROM Client c, Viewing v, PropertyForRent p
- WHERE c.clientNo v.clientNo AND c.maxRent gt
500 - AND c.prefType Flat AND
p.ownerNo CO93
16- Normalized Attribute Connection Graph
- Create node for each reference to an attribute,
or constant 0. - Create directed edges between nodes that
represent a join, and directed edge between
attribute node and 0 nodes that represents
selection. - Weight edges a ? b with value c, if it represents
inequality condition (a ? b c) weight edges 0
? a with c, if it represents inequality
condition (a ? c). - If graph has cycle for which valuation sum is
negative , query is contradictory. - Example Checking Semantic Correctness.
- SELECT p.propertyNo, p.street
- FROM Client c, Viewing v, PropertyForRent p
- WHERE c.maxRent gt 500 AND c.clientNo
v.clientNo AND v.propertyNo p.propertyNo
AND - c.prefType Flat AND c.maxRent lt 200
17(No Transcript)
18Simplification
- The objectives of the simplification stage are to
deduct redundant qualifications, eliminate common
subexperssions, and transform the query into more
easily and efficiently computed form. - Typically, access restrictions, view definitions,
and integrity constraints are considered. - Assuming user has appropriate access privileges,
first apply well-known idempotency rules of
Boolean algebra. - Query restructuring
- The query is restructured to provide a more
efficient implementation of query processing.
19Methods of Query Optimization
- 1. Heuristic based query optimization.
- Perform Selection operations as early as
possible. - Combine Cartesian product with subsequent
selection whose predicate represents join
condition into a Join operation. - Use associatively of binary operations to
rearrange leaf nodes so leaf nodes with most
restrictive Selection operations executed first. - Perform Projections operations as early as
possible. - Eliminate duplicate computations.
20- 2. Cost based query optimization.
- Objective of Cost-based query optimization is
estimate the cost of different equivalent query
expressions and chose the execution plan with the
lowest cost. - Cost of query is concerned on disk access, which
are slow compared to memory access. - Many of cost estimations are based on the
cardinality of the relation. - DBMS holds the statistical information in its
system catalog. - DBMS updates the statistical information daily
or periodically as per organization requirements.
21Transformation rules for the RA operations
- Conjunctive Selection operations can cascade into
individual Selection operations (and vice
versa). - ?p?q?r(R) ?p(?q(?r(R)))
- Sometimes referred to as cascade of Selection.
- Example ?branchNo'B003' ? salarygt15000(Staff)
?branchNo'B003'(?salarygt150
00(Staff)) - Commutativity of Selection.
- ?p(?q(R)) ?q(?p(R))
- Example ?branchNo'B003'(?salarygt15000(Staff))
?salarygt15000(?branchNo'B003'(Staff))
22- In a sequence of Projection operations, only the
last in the sequence is required. - ?L?M ?N(R) ?L (R)
-
- Example ?lName?branchNo, lName(Staff) ?lName
(Staff) - Commutativity of Selection and Projection.
- If predicate p involves only attributes in
projection list, Selection and Projection
operations commute - ?Ai, , Am(?p(R)) ?p(?Ai, , Am(R)) ,where p?
A1, A2, , Am - Example ?fName, lName(?lNameSmith'(Staff))
?lNameSmith'(?fName,lName(Staff)) - Commutativity of Theta Join
-
- R Xp S S Xp R
- R X S S X R
23- Rule also applies to Equijoin and Natural join.
- Example Staff X staff.branchNobranch.branchNo
Branch Branch X staff.branchNobranch.branchN
o Staff - Commutativity of Selection and Theta join (or
Cartesian product). - If selection predicate involves only attributes
of one of join
relations, Selection
and Join (or Cartesian product) operations
commute as - ?p(R Xr S) (?p(R)) Xr S
- ?p(R X S) (?p(R)) X S where p? A1,
A2, , An
24- Commutativity of Projection and Theta join (or
Cartesian product). - If projection list is of form L L1 ? L2, where
L1 only has attributes of R, and L2 only has
attributes of S, provided join condition only
contains attributes of L, Projection and Theta
join commute - ?L1?L2(R Xr S) (?L1(R)) Xr (?L2(S))
- If join condition contains additional attributes
not in L (M M1 ? M2 where M1 only has
attributes of R, and M2 only has attributes of
S), a final projection operation is require
?L1?L2(R Xr S) ?L1?L2( (?L1?M1(R)) Xr
(?L2?M2(S)))
Example?position,city,branchNo(StaffXStaff.bran
chNoBranch.branchNo Branch) (?position,
branchNo(Staff)) XStaff.branchNo
Branch.branchNo (?city, branchNo (Branch))
25Questions
26- 1. According to presentation how we can define
Query Processing? -
- a) the Parsing, Planning, Optimizing, and
Translating of a query - b) the Parsing, Analyzing, Customizing, and
Execution of a query - c) the Parsing, Validating, Optimizing, and
Execution of a query - d) the Formulating, Entering, Optimizing, and
Translating of a query - e) none of above
- 2. In which Phase of Query Processing are
Database Statistics are used? -
- a) Runtime Query Execution
- b) Query Optimization
- c) Query Decomposition
- d) Code Generation
- e) both b) and c)
- 3. Objective of Semantic Analysis is to reject
normalized queries that are
27- 1. According to presentation how we can define
Query Processing? -
- a) the Parsing, Planning, Optimizing, and
Translating of a query - b) the Parsing, Analyzing, Customizing, and
Execution of a query - c) the Parsing, Validating, Optimizing, and
Execution of a query - d) the Formulating, Entering, Optimizing, and
Translating of a query - e) none of above
- 2. In which Phase of Query Processing are
Database Statistics are used? -
- a) Runtime Query Execution
- b) Query Optimization
- c) Query Decomposition
- d) Code Generation
- e) both b) and c)
- 3. Objective of Semantic Analysis is to reject
normalized queries that are