L05: Distributed Query Processing - PowerPoint PPT Presentation

1 / 37
About This Presentation
Title:

L05: Distributed Query Processing

Description:

Any high-level query (SQL) on a database must be processed, ... p (Ai)(R S) ( p (Ai)(R)) S. p (Ai)(R S) ( p (Ai)(R)) S. p (Ai)(R S) p (Ai)(R) p (Ai)(S) ... – PowerPoint PPT presentation

Number of Views:158
Avg rating:3.0/5.0
Slides: 38
Provided by: Lu47
Category:

less

Transcript and Presenter's Notes

Title: L05: Distributed Query Processing


1
L05 Distributed Query Processing Optimization
  • Query Processing
  • Query Decomposition
  • Data Localization
  • Query Optimization

2
Query Processing
  • Any high-level query (SQL) on a database must be
    processed, optimized and executed by the DBMS
  • The high-level query is scanned, and parsed to
    check for syntactic correctness
  • An internal representation of a query is created,
    which is either a query tree or a query graph
  • The DBMS then devises an execution strategy for
    retrieving the result of the query. (An execution
    strategy is a plan for executing the query by
    accessing the data, and storing the intermediate
    results)
  • The process of choosing one out of the many
    execution strategies is known as query
    optimization

3
Query Processor
  • A query processor is a module in the DBMS that
    performs the tasks to process, to optimize, and
    to generate execution strategy for a high-level
    query
  • For a DDBMS, the QP also does data localization
    for the query based on the fragmentation scheme
    and generates the execution strategy that
    incorporates the communication operations
    involved in processing the query

4
Query Optimizer
  • Queries expressed in SQL can have multiple
    equivalent relational algebra query expressions
  • The distributed query optimizer must select the
    ordering of relational algebra operations, sites
    to process data, and possibly the way data should
    be transferred. This makes distributed query
    processing significantly more difficult

5
Complexity of Relational Algebra Operations
  • The relational algebra is used to express the
    output of the query. The complexity of relational
    algebra operations play a role in defining some
    of the principles of query optimization. All
    complexity measures are based on the cardinality
    of the relation
  • Operations Complexity
  • Select, Project (w/o duplicate elimination) O(n)
  • Project (with duplicate elimination), Group O(n
    logn)
  • Join, Semi-join, Division, Set Operators O(n
    logn)
  • Cartesian Product O(n2 )
  • This was given in the book (p194). It is over
    simplified.

6
Characteristics of Query Processors
  • Languages
  • Input language can be relational algebra or
    calculus output language is relational algebra
    (annotated with communication primitives). The
    query processor must efficiently map input
    language to output language
  • Types of Optimization
  • The output language specification represents the
    execution strategy. There can be many such
    strategies, the best one can be selected through
    exhaustive search, or by applying heuristic
    (minimize size of intermediate relations). For
    distributed databases semijoins can be applied
    to reduce data transfer.

7
When to Optimize
  • Static done before executing the query (at
    compilation time), cost of optimization amortized
    over multiple executions, mostly based on
    exhaustive search. Since sizes of intermediate
    relations need to be estimated, it can result in
    sub-optimal strategies.
  • Dynamic done at run time every time the query
    is executed, can make use of exact sizes of
    intermediate relations, expensive, based on
    heuristics
  • Hybrid mixes static and dynamic approaches the
    approach is mainly static, but dynamic query
    optimization may take place when high difference
    between predicted and actual sizes are detected

8
Characteristics of Query Processors
  • Statistics
  • fragment cardinality and size
  • size and number of distinct values for each
    attribute. detailed histograms of attribute
    values for better selectivity estimation.
  • Decision Sites
  • one site or several sites participate in
    selection of strategy
  • Exploitation of network topology
  • wide area network communication cost
  • local area network parallel execution

9
Characteristics of Query Processors
  • Exploitation of replicated fragments
  • larger number of possible strategies
  • Use of Semijoins
  • reduce size of data transfer
  • increase of messages and local processing
  • good for fast or slow networks?

10
Layers of Query Processing
Calculus Query on Distributed Relations
QUERY DECOMPOSITION
CONTROL SITE
Algebra Query on Distributed Relations
DATA LOCALIZATION
Fragment Query
GLOBAL OPTIMIZATION
Optimized Fragment Query With Communication
Operations
LOCAL SITE
LOCAL OPTIMIZATION
Optimized Local Queries
11
L05 Distributed Query Processing Optimization
  • Query Processing
  • Query Decomposition
  • Data Localization
  • Query Optimization

12
Query Decomposition
  • Normalization
  • The calculus query is written in a normalized
    form (CNF or DNF) for subsequent manipulation
  • Analysis
  • The query is analyzed for semantic correctness
  • Simplification
  • Redundant predicates are eliminated to obtain
    simplified queries
  • Restructuring
  • The calculus query is translated to optimal
    algebraic query representation

13
Query Decomposition Normalization
  • Lexical and syntactic analysis
  • check validity
  • check for attributes and relations
  • type checking on the qualification
  • There are two possible forms of representing the
    predicates in query qualification Conjunctive
    Normal Form (CNF) or Disjunctive Normal Form
    (DNF)
  • CNF (p11 ? p12 ?... ? p1n) ? ... ? (pm1 ? pm2
    ?... ? pmn)
  • DNF (p11 ? p12 ?... ? p1n) ? ... ? (pm1 ? pm2
    ?... ? pmn)
  • OR's mapped into union
  • AND's mapped into join or selection

14
Query Decomposition Analysis
  • Queries are rejected because
  • the attributes or relations are not defined in
    the global schema or
  • operations used in qualifiers are semantically
    incorrect
  • For only those queries that do not use
    disjunction or negation semantic correctness can
    be determined by using query graph
  • One node of the query graph represents result
    sites, others operand relations, edge between
    nodes operand nodes represent joins, and edge
    between operand node and result node represents
    project

15
Query Graph and Join Graph
SELECT Ename, Resp FROM E, G, J WHERE E. ENo
G. ENO AND G.JNO J.JNO AND JNAME CAD''
AND DUR gt 36 AND Title Prog''
E. ENo G. ENO
G.JNO J.JNO
Ename
Resp
JNAME CAD''
Title Prog''
E. ENo G. ENO
G.JNO J.JNO
DUR gt 36
16
Disconnected Query Graph
  • Semantically incorrect conjunctive multivariable
    query without negation have query graphs which
    are not connected

SELECT Ename, Resp FROM E, G, J WHERE E. ENo
G. ENO AND JNAME CAD'' AND DUR gt 36 AND
Title Prog''
Ename
Resp
Title Prog''
E. ENo G. ENO
DUR gt 36
JNAME CAD''
17
Simplification Eliminating Redundancy
  • Elimination of redundant predicates using well
    known idempotency rules
  • p ? p p p ? p p p ? true true
  • p ? false p p ? true p p ? false
    false
  • p1 ? (p1 ? p 2 ) p1
  • p1 ? (p1 ? p 2 ) p1
  • Such redundant predicates arise when user query
    is enriched with several predicates to
    incorporate view relation correspondence, and
    ensure semantic integrity and security

18
Eliminating Redundancy-- An Example
SELECT TITLE FROM E WHERE (NOT (TITLE
Programmer'') AND (TITLE Programmer'' OR
TITLE Elec.Engr'') AND NOT (TITLE
Elec.Engr'')) OR ENAME J.Doe''
SELECT TITLE FROM E WHERE ENAME J.Doe''
19
Eliminating Redundancy-- An Example
p1 ltTITLE Programmer''gt p2 ltTITLE
Elec. Engr''gt p3 ltENAME J.Doe''gt
Let the query qualification is ( p1 ? (p1 ?
p2) ? p2) ? p3
The disjunctive normal form of the query is (
p1 ? p1 ? p2) ? ( p1 ? p2 ? p2) ? p3
(false ? p2) ? ( p1 ? false) Ú p3 false ?
false ? p3 p3
20
Query Decomposition Rewriting
  • Rewriting calculus query in relational algebra
  • straightforward transformation from relational
    calculus to relational algebra, and
  • restructuring relational algebra expression to
    improve performance

21
Rewriting -- Transformation Rules (I)
  • Commutativity of binary operations
  • R ? S ? S ? R
  • R ? S ? S ? R
  • Associativity of binary operations
  • (R ? S) ? T ? R ? ( S ? T )
  • Idempotence of unary operations grouping of
    projections and selections
  • ?A ( ? A (R )) ? ? A (R ) for A?A? A
  • ?p1(A1) ( ? p2(A2) (R )) ? ? p1(A1) ?p2(A2) (R )

22
Rewriting -- Transformation Rules (II)
  • Commuting selection with projection
  • ?A1, , An ( ? p (Ap) (R )) ? ?A1, , An ( ? p
    (Ap) ( ?A1, , An, Ap(R )))
  • Commuting selection with binary operations
  • ? p (Ai)(R ? S) ? (? p (Ai)(R)) ? S
  • ? p (Ai)(R S) ? (? p (Ai)(R)) S
  • ? p (Ai)(R ? S) ? ? p (Ai)(R) ? ? p (Ai)(S)
  • Commuting projection with binary operations
  • ?C(R ? S) ? ?A(R) ? ?B (S) C A ? B
  • ?C(R S) ? ?C(R) ?C (S)
  • ?C (R ? S) ? ?C (R) ? ?C (S)

23
An SQL Query and Its Query Tree
?ENAME
?(ENAMEltgtJ.DOE )?(JNAMECAD/CAM )? (Dur12 ?
Dur24)
SELECT Ename FROM J, G, E WHERE G.EnoE.ENo
AND G.JNo J.JNo AND ENAME ltgt J.Doe'
AND JName CAD' AND (Dur12 or
Dur24)
PROJ
ASG
EMP
24
Query Decomposition Rewriting
?ENAME
? JNO, ENAME
? JNO
?JNAMECAD/CAM
? JNO, ENO
? ENO, ENAME
?Dur12 ? Dur24
?ENAMEltgtJ.DOE
PROJ
ASG
EMP
25
L05 Distributed Query Processing Optimization
  • Query Processing
  • Query Decomposition
  • Data Localization
  • Query Optimization

26
Data Localization
  • Localization program
  • Given an algebraic query on global schema
  • Determine which fragments are involved
  • A naïve way to localize a distributed query
  • Substitute each global relation with its
    localization program ? generic query
  • Use reduction techniques for efficiency

27
Reduction for HF
  • Remove empty relations generated by contradicting
    selection on horizontal fragments
  • Remove useless relations generated by projections
    on vertical fragments
  • Distribute joins over unions in order to isolate
    and remove useless joins

28
Data Localization-- An Example
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
?ENAME
?Dur12 ? Dur24
?ENAMEltgtJ.DOE
ASG is fragmented into ASG1 ?ENO? E3
(ASG) ASG2 ?ENO gtE3 (ASG)
?JNAMECAD/CAM
PROJ
?
?
ASG1
ASG1
ASG2
EMP1
EMP1
EMP1
29
Reduction with Selection
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
SELECT FROM EMP WHERE ENOE5
Given Relation R, FRR1, R2, , Rn where Rj
?pj(R) ?pj(Rj) ? if ?x ? R ?(pi(x)?pj(x))
30
Reduction with join
SELECT FROM EMP, ASG WHERE EMP.ENOASG.ENO
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
ASG is fragmented into ASG1 ?ENO? E3
(ASG) ASG2 ?ENO gtE3 (ASG)
31
Reduction with Join (I)
(R1 ? R2) S ? (R1 S) ? (R2
S)
32
Reduction with Join (II)
?
Given Ri ?pi(R) and Rj ?pj(R) Ri Rj ? if
?x ? Ri , ?y? Rj ?(pi(x)?pj(y))
Reduction with join 1. Distribute join over
union 2. Eliminate unnecessary work
33
Reduction for VF
  • Find useless intermediate relations
  • Relation R defined over attributes A A1, A2,
    , An vertically fragmented as Ri ?A (R) where
    A? A
  • ?K,D (Ri) is useless if the set of projection
    attributes D is not in A

EMP1 ?ENO,ENAME (EMP) EMP2 ?ENO,TITLE (EMP)
?ENAME
SELECT ENAME FROM EMP
EMP1
34
Reduction for DHF
Distribute joins over union Apply the join
reduction for horizontal fragmentation
EMP1 ?TITLEProgrammer (EMP) EMP2
?TITLE?Programmer (EMP) ASG1 ASG ENO
EMP1 ASG2 ASG ENO EMP2
SELECT FROM EMP, ASG WHERE ASG.ENO
EMP.ENO AND EMP.TITLE Mech. Eng.
35
Reduction for DHF (II)
Joins over union
?
36
Reduction for HF --An Example
EMP1 ?ENO?E4 (?ENO,ENAME (EMP)) EMP2
?ENOgtE4 (?ENO,ENAME (EMP)) EMP3 ?ENO,TITLE
(EMP) QUERY SELECT ENAME FROM EMP WHERE ENO
E5
37
Summary
  • Query decomposition is done as in a centralized
    DBMS
  • A lot of logic rules about transformations
  • Data localization happens only in a distributed
    DBMS
  • Use fragmentation characteristics to simplify the
    queries on fragments
Write a Comment
User Comments (0)
About PowerShow.com