L05: Distributed Query Processing - PowerPoint PPT Presentation

1 / 37

About This Presentation

Title:

L05: Distributed Query Processing

Description:

Any high-level query (SQL) on a database must be processed, ... p (Ai)(R S) ( p (Ai)(R)) S. p (Ai)(R S) ( p (Ai)(R)) S. p (Ai)(R S) p (Ai)(R) p (Ai)(S) ... – PowerPoint PPT presentation

Number of Views:158

Avg rating:3.0/5.0

Slides: 38

Provided by: Lu47

Category:

Tags: ai | distributed | l05 | processing | query

more less

Transcript and Presenter's Notes

Title: L05: Distributed Query Processing

1
L05 Distributed Query Processing Optimization

Query Processing
Query Decomposition
Data Localization
Query Optimization

2
Query Processing

Any high-level query (SQL) on a database must be
processed, optimized and executed by the DBMS
The high-level query is scanned, and parsed to
check for syntactic correctness
An internal representation of a query is created,
which is either a query tree or a query graph
The DBMS then devises an execution strategy for
retrieving the result of the query. (An execution
strategy is a plan for executing the query by
accessing the data, and storing the intermediate
results)
The process of choosing one out of the many
execution strategies is known as query
optimization

3
Query Processor

A query processor is a module in the DBMS that
performs the tasks to process, to optimize, and
to generate execution strategy for a high-level
query
For a DDBMS, the QP also does data localization
for the query based on the fragmentation scheme
and generates the execution strategy that
incorporates the communication operations
involved in processing the query

4
Query Optimizer

Queries expressed in SQL can have multiple
equivalent relational algebra query expressions
The distributed query optimizer must select the
ordering of relational algebra operations, sites
to process data, and possibly the way data should
be transferred. This makes distributed query
processing significantly more difficult

5
Complexity of Relational Algebra Operations

The relational algebra is used to express the
output of the query. The complexity of relational
algebra operations play a role in defining some
of the principles of query optimization. All
complexity measures are based on the cardinality
of the relation
Operations Complexity
Select, Project (w/o duplicate elimination) O(n)
Project (with duplicate elimination), Group O(n
logn)
Join, Semi-join, Division, Set Operators O(n
logn)
Cartesian Product O(n2 )
This was given in the book (p194). It is over
simplified.

6
Characteristics of Query Processors

Languages
Input language can be relational algebra or
calculus output language is relational algebra
(annotated with communication primitives). The
query processor must efficiently map input
language to output language
Types of Optimization
The output language specification represents the
execution strategy. There can be many such
strategies, the best one can be selected through
exhaustive search, or by applying heuristic
(minimize size of intermediate relations). For
distributed databases semijoins can be applied
to reduce data transfer.

7
When to Optimize

Static done before executing the query (at
compilation time), cost of optimization amortized
over multiple executions, mostly based on
exhaustive search. Since sizes of intermediate
relations need to be estimated, it can result in
sub-optimal strategies.
Dynamic done at run time every time the query
is executed, can make use of exact sizes of
intermediate relations, expensive, based on
heuristics
Hybrid mixes static and dynamic approaches the
approach is mainly static, but dynamic query
optimization may take place when high difference
between predicted and actual sizes are detected

8
Characteristics of Query Processors

Statistics
fragment cardinality and size
size and number of distinct values for each
attribute. detailed histograms of attribute
values for better selectivity estimation.
Decision Sites
one site or several sites participate in
selection of strategy
Exploitation of network topology
wide area network communication cost
local area network parallel execution

9
Characteristics of Query Processors

Exploitation of replicated fragments
larger number of possible strategies
Use of Semijoins
reduce size of data transfer
increase of messages and local processing
good for fast or slow networks?

10
Layers of Query Processing
Calculus Query on Distributed Relations
QUERY DECOMPOSITION
CONTROL SITE
Algebra Query on Distributed Relations
DATA LOCALIZATION
Fragment Query
GLOBAL OPTIMIZATION
Optimized Fragment Query With Communication
Operations
LOCAL SITE
LOCAL OPTIMIZATION
Optimized Local Queries
11
L05 Distributed Query Processing Optimization

Query Processing
Query Decomposition
Data Localization
Query Optimization

12
Query Decomposition

Normalization
The calculus query is written in a normalized
form (CNF or DNF) for subsequent manipulation
Analysis
The query is analyzed for semantic correctness
Simplification
Redundant predicates are eliminated to obtain
simplified queries
Restructuring
The calculus query is translated to optimal
algebraic query representation

13
Query Decomposition Normalization

Lexical and syntactic analysis
check validity
check for attributes and relations
type checking on the qualification
There are two possible forms of representing the
predicates in query qualification Conjunctive
Normal Form (CNF) or Disjunctive Normal Form
(DNF)
CNF (p11 ? p12 ?... ? p1n) ? ... ? (pm1 ? pm2
?... ? pmn)
DNF (p11 ? p12 ?... ? p1n) ? ... ? (pm1 ? pm2
?... ? pmn)
OR's mapped into union
AND's mapped into join or selection

14
Query Decomposition Analysis

Queries are rejected because
the attributes or relations are not defined in
the global schema or
operations used in qualifiers are semantically
incorrect
For only those queries that do not use
disjunction or negation semantic correctness can
be determined by using query graph
One node of the query graph represents result
sites, others operand relations, edge between
nodes operand nodes represent joins, and edge
between operand node and result node represents
project

15
Query Graph and Join Graph
SELECT Ename, Resp FROM E, G, J WHERE E. ENo
G. ENO AND G.JNO J.JNO AND JNAME CAD''
AND DUR gt 36 AND Title Prog''
E. ENo G. ENO
G.JNO J.JNO
Ename
Resp
JNAME CAD''
Title Prog''
E. ENo G. ENO
G.JNO J.JNO
DUR gt 36
16
Disconnected Query Graph

Semantically incorrect conjunctive multivariable
query without negation have query graphs which
are not connected

SELECT Ename, Resp FROM E, G, J WHERE E. ENo
G. ENO AND JNAME CAD'' AND DUR gt 36 AND
Title Prog''
Ename
Resp
Title Prog''
E. ENo G. ENO
DUR gt 36
JNAME CAD''
17
Simplification Eliminating Redundancy

Elimination of redundant predicates using well
known idempotency rules
p ? p p p ? p p p ? true true
p ? false p p ? true p p ? false
false
p1 ? (p1 ? p 2 ) p1
p1 ? (p1 ? p 2 ) p1
Such redundant predicates arise when user query
is enriched with several predicates to
incorporate view relation correspondence, and
ensure semantic integrity and security

18
Eliminating Redundancy-- An Example
SELECT TITLE FROM E WHERE (NOT (TITLE
Programmer'') AND (TITLE Programmer'' OR
TITLE Elec.Engr'') AND NOT (TITLE
Elec.Engr'')) OR ENAME J.Doe''
SELECT TITLE FROM E WHERE ENAME J.Doe''
19
Eliminating Redundancy-- An Example
p1 ltTITLE Programmer''gt p2 ltTITLE
Elec. Engr''gt p3 ltENAME J.Doe''gt
Let the query qualification is ( p1 ? (p1 ?
p2) ? p2) ? p3
The disjunctive normal form of the query is (
p1 ? p1 ? p2) ? ( p1 ? p2 ? p2) ? p3
(false ? p2) ? ( p1 ? false) Ú p3 false ?
false ? p3 p3
20
Query Decomposition Rewriting

Rewriting calculus query in relational algebra
straightforward transformation from relational
calculus to relational algebra, and
restructuring relational algebra expression to
improve performance

21
Rewriting -- Transformation Rules (I)

Commutativity of binary operations
R ? S ? S ? R
R ? S ? S ? R
Associativity of binary operations
(R ? S) ? T ? R ? ( S ? T )
Idempotence of unary operations grouping of
projections and selections
?A ( ? A (R )) ? ? A (R ) for A?A? A
?p1(A1) ( ? p2(A2) (R )) ? ? p1(A1) ?p2(A2) (R )

22
Rewriting -- Transformation Rules (II)

Commuting selection with projection
?A1, , An ( ? p (Ap) (R )) ? ?A1, , An ( ? p
(Ap) ( ?A1, , An, Ap(R )))
Commuting selection with binary operations
? p (Ai)(R ? S) ? (? p (Ai)(R)) ? S
? p (Ai)(R S) ? (? p (Ai)(R)) S
? p (Ai)(R ? S) ? ? p (Ai)(R) ? ? p (Ai)(S)
Commuting projection with binary operations
?C(R ? S) ? ?A(R) ? ?B (S) C A ? B
?C(R S) ? ?C(R) ?C (S)
?C (R ? S) ? ?C (R) ? ?C (S)

23
An SQL Query and Its Query Tree
?ENAME
?(ENAMEltgtJ.DOE )?(JNAMECAD/CAM )? (Dur12 ?
Dur24)
SELECT Ename FROM J, G, E WHERE G.EnoE.ENo
AND G.JNo J.JNo AND ENAME ltgt J.Doe'
AND JName CAD' AND (Dur12 or
Dur24)
PROJ
ASG
EMP
24
Query Decomposition Rewriting
?ENAME
? JNO, ENAME
? JNO
?JNAMECAD/CAM
? JNO, ENO
? ENO, ENAME
?Dur12 ? Dur24
?ENAMEltgtJ.DOE
PROJ
ASG
EMP
25
L05 Distributed Query Processing Optimization

Query Processing
Query Decomposition
Data Localization
Query Optimization

26
Data Localization

Localization program
Given an algebraic query on global schema
Determine which fragments are involved
A naïve way to localize a distributed query
Substitute each global relation with its
localization program ? generic query
Use reduction techniques for efficiency

27
Reduction for HF

Remove empty relations generated by contradicting
selection on horizontal fragments
Remove useless relations generated by projections
on vertical fragments
Distribute joins over unions in order to isolate
and remove useless joins

28
Data Localization-- An Example
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
?ENAME
?Dur12 ? Dur24
?ENAMEltgtJ.DOE
ASG is fragmented into ASG1 ?ENO? E3
(ASG) ASG2 ?ENO gtE3 (ASG)
?JNAMECAD/CAM
PROJ
?
?
ASG1
ASG1
ASG2
EMP1
EMP1
EMP1
29
Reduction with Selection
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
SELECT FROM EMP WHERE ENOE5
Given Relation R, FRR1, R2, , Rn where Rj
?pj(R) ?pj(Rj) ? if ?x ? R ?(pi(x)?pj(x))
30
Reduction with join
SELECT FROM EMP, ASG WHERE EMP.ENOASG.ENO
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
ASG is fragmented into ASG1 ?ENO? E3
(ASG) ASG2 ?ENO gtE3 (ASG)
31
Reduction with Join (I)
(R1 ? R2) S ? (R1 S) ? (R2
S)
32
Reduction with Join (II)
?
Given Ri ?pi(R) and Rj ?pj(R) Ri Rj ? if
?x ? Ri , ?y? Rj ?(pi(x)?pj(y))
Reduction with join 1. Distribute join over
union 2. Eliminate unnecessary work
33
Reduction for VF

Find useless intermediate relations
Relation R defined over attributes A A1, A2,
, An vertically fragmented as Ri ?A (R) where
A? A
?K,D (Ri) is useless if the set of projection
attributes D is not in A

EMP1 ?ENO,ENAME (EMP) EMP2 ?ENO,TITLE (EMP)
?ENAME
SELECT ENAME FROM EMP
EMP1
34
Reduction for DHF
Distribute joins over union Apply the join
reduction for horizontal fragmentation
EMP1 ?TITLEProgrammer (EMP) EMP2
?TITLE?Programmer (EMP) ASG1 ASG ENO
EMP1 ASG2 ASG ENO EMP2
SELECT FROM EMP, ASG WHERE ASG.ENO
EMP.ENO AND EMP.TITLE Mech. Eng.
35
Reduction for DHF (II)
Joins over union
?
36
Reduction for HF --An Example
EMP1 ?ENO?E4 (?ENO,ENAME (EMP)) EMP2
?ENOgtE4 (?ENO,ENAME (EMP)) EMP3 ?ENO,TITLE
(EMP) QUERY SELECT ENAME FROM EMP WHERE ENO
E5
37
Summary