Title: Optimization of Distributed Queries
1Optimization of Distributed Queries
- Univ.-Prof. Dr. Peter Brezany
- Institut für Scientific Computing
- Universität Wien
- Tel. 4277 39425
- Sprechstunde Di, 13.00-14.00
- LV-Portal www.par.univie.ac.at/brezany/teach/gck
fk/300658.html
2Layers of Query Processing
3Query Optimization Process
4Search Space
- For a given query, the search space can be
defined as the set of equivalent operator trees,
that can be produced using transformation rules. - It is useful to concentrate on join trees,
operator trees whose operators are join or
Cartesian product. - This is because permutations of the join order
have the most important effect on performance of
relational queries. - Next example illustrates 3 equivalent join trees,
which are obtained by exploiting the
associativity of binary operators. Join tree (c)
which starts with a Cartesian product may have a
much higher cost than other join trees.
5Search Space - Example
Example SELECT ENAME, RESP FROM EMP,
ASG, PROJ WHERE EMP.ENOASG.ENO AND ASG.PNOP
ROJ.PNO
6Search Space Shape of the Join Tree
A linear tree at least one operand of each
operand node is a base relation. A bushy tree is
more general and may have operators whose both
operands are intermediate operators. In a
distributed environment, bushy trees are useful
in exhibiting parallelism.
7Distributed Cost Model
- An optimizers cost model includes
- Cost functions to predict the cost of operators
- Statistics and base data and formulas to evaluate
the sizes of intermediate results. - Cost Functions can be expressed with respect to
either the total time or the response time.The
total time is the sum of all time (cost)
components, the response time is the elapsed time
from the initiation to the completion of the
query. - Total_time TCPU insts TI/O I/Os TMSG
msgs TTR bytes - TCPU the time of a CPU instruction
- TI/O - the time of a disk I/O
- TMSG - the fixed time of initiating and receiving
a message - TTR - the time it takes to transmit a data unit
from one site to another - Costs are generally expressed in terms of time
units, which in turn, can be translated into
other units (e.g., dollars).
8Cost Function (cont.)
- When the response time of the query is the
objective function of the optimizer, parallel
local processing and parallel communications must
also be considered. - Response_time TCPU seq_insts TI/O
seq_I/Os TMSG seq_msgs TTR seq_bytes - Example
Site 1
x units
Site 3
Site 2
y units
Most early distributed DBMSs designed for wide
area networks have ignored the local processing
cost and concentrate on minimizing the
communication cost. Total_time 2 TMSG TTR
(x y) Respone_time max TMSG TTR x,
TMSG TTR y since the transfers can be done
in parallel.
9Cost Function (cont.)
- Minimizing response time is achieved by
increasing the degree of parallel execution. - This does not imply that the total time is also
minimized. - On contrary, it can increase the total time, for
example by having more parallel local processing
(often includes synchronization overhead) and
transmissions. - Minimizing the total time implies that the
utilization of the resources improves, thus
increasing the system throughput. - In practice, a compromise between the total and
response times is desired.
10Database Statistics
- The main factor affecting the performance is the
size of the intermediate relations that are
produced during the execution. - When a subsequent operation is located at a
different site, the intermediate relation must be
transmitted over the network. ? It is of prime
interest to estimate the size of the intermediate
results in order to minimize the size of data
transfers. - The estimation is based on statistical
information about the base relations and formulas
to predict the cardinalities of the results of
the relational operations. ? the more precise
statistics being the more costly. - For a relation R defined over the attributes A
A1, A2, ..., An and fragmented as R1, R2, ...,
Rr, the statistical data are the following
11Database Statistics (cont.)
12Database Statistics (cont.)
13Database Statistics (cont.)
14Database Statistics (cont.)
15Centralized Query Optimization- INGRES Algorithm
- Why reviewing centralized optimizations?
- A distributed query is is translated into local
queries, each of which is processed in a
centralized way. - Distributed techniques are extensions of
centralized ones - Centralized optimization is a simpler problem
the minimization of communication costs makes
distributed query optimization more complex. - INGRES is a popular relational DB system and it
has a distributed version whose optimization
algorithms are extensions of the centralized
version.
16INGRES Algorithm
- It uses a dynamic query optimization algorithm
that recursively breaks-up a calculus query into
smaller pieces. - It combines calculus-algebra decomposition and
optimization. - A query is first decomposed into a subsequence of
queries having a unique relation in common. Then
each monorelation query is processed by a
one-variable query processor (OVQP). - The OVQP optimizes the access to a single
relation by selecting the best access method to
that relation (e.g., index, sequential scan).
17INGRES Algorithm (cont.)
- By q qi-1 ? qi we denote a query q decomposed
into 2 subqueries qi-1 and qi, where qi-1 is
executed first and its result is consumed by qi. - Given an n-relation query q, the INGRES query
processor decomposes q into n subqueries q1 ? q2
? ... ? qi. This decomposition uses two basic
techniques detachment and substitution. - Detachment is used first it breaks q into q
? q, based on a common relation that is the
result of q. - A more detailed explanation and examples
follow.
18INGRES Algorithm - Detachment
If the query q is expressed in SQL is of the form
19Database Example
20INGRES Algorithm Detachment - Example
Running Example for INGRES To illustrate the
detachment technique, we apply it to the
following query Names of employees working
on the CAD/CAM project This query can be
expressed by the following query q1 on our
example engineering DB.
21INGRES Algorithm - Substitution
22INGRES Algorithm Substitution - Example
OVQP one-variable query processor
23INGRES Algorithm (formalized)
24JOIN ORDERING IN FRAGMENT QUERIES
- Ordering joins is an important aspect of
centralized query optimization. - Join ordering in a distributed context is even
more important since joins between fragments may
increase the communication time. - 2 basic approaches exist to order joins in
fragment queries - direct optimization of the ordering of joins
(e.g. in the Distributed INGRES algorithm). - replacement of joins by combination of semijoins
in order to minimize communication costs.
25Join Ordering
- Note To simplify notation, we use the term
relation to designate a fragment stored at a
particular site. - Lets the query is R ? S, where R and S are
relations stored at different sites and ? denotes
the join operator. The obvious choice is to send
the smaller relation to the site of the larger
one.
if size(R) lt size(S)
R
S
if size(R) gt size(S)
More interesting is the case where there are more
than 2 relations to join. The objective of the
join ordering algorithm is to transmit smaller
operands. The difficulty the join operations may
reduce or increase the size of intermediate
results ? estimating the size of joint results
is mandatory, but difficult.
26Join Ordering - Example
Example Consider the following query expressed
in relat. alg. PROJ ?PNO
EMP ?ENO ASG whose join graph is below
Site 2
ASG
ENO
PNO
EMP
PROJ
Site 3
Site 1
This query can be executed in at least 5
different ways. We describe them by the programs
introduced in the next slide.
27Join Ordering Example (cont.)
28Semijoin Based Algorithms
- The join of 2 relations R and S over attribute A,
stored at sites 1 and 2 respectively, can be
computed by replacing one or both operand
relations by a semijoin with the other relation,
using the following rules
Strategy 1
Strategy 1
29Horizontal fragmentation
30Vertical fragmentation
31Hybrid (Mixed) Fragmentation
In most cases a simple horizontal or vertical
fragmentation will not be sufficient to satisfy
the requirements of user applications. In this
case a vertical fragmentation may be followed by
a horizontal one, or vice versa, producing a
tree-structured partitioning.
Hybrid fragmentation
Reconstruction of hybrid fragmentation
32Hybrid Fragmentation (cont.)
Relation A
Fragment H1
Fragment H2V2
Fragment H2V1H1
Fragment H2V1H2
Fragment H2V1H3
33Distributed INGRES Algorithm
- Only horizontal fragmentation is handled.
- General and broadcast (the same data unit can be
transmitted from one site to all the other sites
in a simple transfer) network topology
considered. - E.g., broadcasting is used to replicate fragments
and then to maximize the degree of parallelism. - The input is a query expressed in tuple
relational calculus and schema information the
network type the location and size of each
fragment. - The algorithm is executed by the site, called the
master site, where the query is initiated. - The algorithm called D-INGRES-QOA is given in
Algorithm 9.3. in the next slide.
34Distributed INGRES Algorithm
MRQ_list
35Distributed INGRES Algorithm (cont.)
- All monorelation queries (e.g. selection and
projection) that can be detached are first
processed locally Step (1) - Reduction algorithm is applied to the original
query Step (2). (Reduction is a technique that
isolates all irreducible subqueries and
monorelation subqueries by detachment.)
Monorelation queries are ignored because they
have already been processed in step (1). Thus the
REDUCE procedure produces a sequence of
irreducible subqueries q1 ? q2 ? ... ? qn, with
at most one relation in common between two
consecutive subqueries. Our Running Example in
the previous slides, which illustrated the
detachment technique, also illustrates what the
REDUCE procedure would produce. - Based on the list of irreducible queries isolated
in step (2) and the size of each fragment, the
next subquery MRQ, which has at least 2
variables, is chosen at Step (3.1) and Steps
(3.2), (3.3), and 3.4 are applied to it. - Step 3.2 selects the best strategy to process the
query MRQ. This strategy is described by a list
of pairs (F, S), in which F is a fragment to
transfer to the processing site S. - Step 3.3 transfers all the fragments to their
processing sites. - Step 3.4 executes MRQ.
- If there are remaining subqueries, the algorithm
goes back to step (3) and performs the next
iteration. Otherwise, the algorithm terminates.
36Distributed INGRES Algorithm (cont.)
- Optimization occurs in steps (3.1) and (3.2). The
algorithm has produced subqueries with several
components and their dependency order (similar to
one given by a relational algebra tree). - At step (3.1) a simple choice for the next
subquery is to take the next one having no
predecessor and involving the smaller fragments.
This minimize the size of the intermediate
results. - E.g., if a query q has the subqueries q1, q2, and
q3, with dependencies q1 ? q3, q2 ? q3, and if
the fragments referred to by q1 are smaller than
those referred to by q2, then q1 is selected. - The subquery selected must then be executed.
Since the relation involved in a subquery may be
stored at different sites and even fragmented,
the subquery may nevertheless be further
subdivided.
37Distributed INGRES Algorithm (cont.) - Example
Assume that relations EMP, ASG, and PROJ of the
query of our Running Example are stored as
follows, where relation EMP is fragmented.
38Distributed INGRES Algorithm (cont.)
- At step (3.2), the next optimization problem is
to determine how to execute the subquery by
selecting the fragments that will be moved and
the sites where the processing will take place. - For an n-relation subquery, fragments from n-1
relations must be moved to the site(s) of
fragments of the remaining relation, say Rp, and
then replicated there. Also, the remaining
relation may be further partitioned into k
equalized fragments in order to increase
parallelism. This method is called
fragment-and-replicate and performs a
substitution of fragments rather than of tuples
as in centralized INGRES. - The selection of the remaining relation and of
the number of processing sites k on which it
should be partitioned is based on the objective
function and the topology of the network.
(Replication is cheaper in broadcast networks
than in point-to-point networks).
39Distributed INGRES Algorithm (cont.) - Example
40Architecture of Distributed DBMS Revisited
Detailed Model of the Distributed
Execution Monitor
Components of a Distributed DBMS
41Architecture Revisited (cont.)
- The Transaction Manager (TM) is responsible for
coordinating the execution of the DB operations
on behalf of an application. - The Scheduler (SC) is responsible for the
implementation of a specific concurrency control
algorithm for synchronizing access to the
database. - Each transaction originates at one site its
originating site. The execution of the database
operations of a transaction is coordinated by the
TM at that transactions originating site. - The TMs implement an interface for the
applications programs which consists of 5
commands - Begin_transaction. This is an indicator to the TM
that a new transaction is starting. The TM does
some bookkeeping, such as recording the
transactions name, the originating application,
etc. - Read. If the data item x is stored locally, ist
value is read and returned to the transaction.
Otherwise, the TM selects one copy of x and
requests ist copy to be returned. - Write. The TM coordinates the updating ofn xs
value at each site where it resides. - Commit. The TM coordinates the physical updating
of all databases that contain copies of each data
item for which a previous write was issued. - Abort. The TM makes sure that no effects of the
transaction are reflected in the DB. - In providing these services, a TM can
communicate with SCs and data processors at the
same or at different sites.