Title: Designing Distributed Requests
1Chapter 4 Designing Distributed Requests
2Designing Distributed Requests
- Global Request Optimization
- Local Query Optimization
- Summary
3Global Request Optimization
4Global Request Optimization Choices
- Not done Inefficient request execution
- Programmer optimization High personnel costs
- Automatic optimization Additional computer time
5Objectives
- Maximize throughput
- Minimize response time
- Minimize query response time
- Maximize parallelism
- Minimize transmission costs
- Minimize number of data transmissions
- Minimize amount of data transmitted
6Three Query Optimization Tasks
- Choose query tree
- Choose sites containing data to be accessed
- Assign query tree operations to sites
- Variations
- Order of performing the three tasks
- When the tasks are performed (compile time vs.
runtime) - Algorithms and heuristics used to perform each
task
7Query Optimization Illustration
- Basically
- Join Customer, Product, and Order tables
- Select rows with "widget" and "June 2004"
- Project CustomerName
8Notation for Query Trees
select from Productwhere ProductName
"widget" Select CustomerNamefrom
Customer select from Customer.Orderwhere
Customer.CustomerNumberOrder.CustomerNumber Scien
tist union Engineer
Productselect (ProductName "widget") Customer
project (CustomerName) customer Orderjoin
(CustomerNumberCustomerNumber) Scientist Enginee
r Union
9Query Tree Alternatives
10Order of Performing Tasks
- Approach A Choose query tree first
- Choose query tree
- Choose sites containing data to be accessed
- Assign query tree operations to sites
- Approach B Choose sites first
- Choose sites containing data to be accessed
- Choose query tree
- Assign query tree operations to sites
- Approach C Choose sites and query trees at the
same time - Choose sites containing data and choose query
tree - Assign query tree operations to sites
- Other approaches are possible
11When to Perform Optimization Tasks
- All three tasks at compile time
- All three tasks at runtime
- Some tasks at compile time and some at runtime
12Algorithms for Choosing Query Trees
- Generate all possible query trees and choose the
best one. - Generate one query tree and repeatedly transform
it into a better one.
13Push Select and Project Toward Leaves
More
14Push Select and Project Toward Leaves, cont.
Customer Product Order join
(ProductNumberProductNumber) join
(CustomerNumberCustomerNumber) select
(ProductName "widget " and Date "June
2004") Push select toward
leaves Customer Product Order select
(ProductName "widget") select(Date "June
2004") join (ProductNumberProductNumber) join
(CustomerNumberCustomerNumber)
15Incorporate Fragments into Query
- Relational tables
- SQL
- Table reconstruction from fragments
More
16Incorporate Fragments Into Query, cont.
17Merge Common Subexpressions
- Relational tables
- SQL request
More
18Merge Common Subexpressions, cont.
19Trim Branches
- Relational table
- SQL request
- Fragments
- Table reconstruction from fragments
- Query tree
Department Select (DeptId1)
Incorporate Fragments
More
20Trim Branches, cont.
21Distribute Aggregate Functions
- Relational table
- SQL request
- Table reconstruction from fragments
- Query tree
More
22Distribute AggregateFunctions, cont.
23Other Aggregate Functions
- How can each of the following aggregate functions
be distributed? - Min
- Max
- Count
- Sum
24Exercises on Algebraic Transformations
- Using the Book and Employee tables on the next
page, do the following - Write the SQL request
- Draw the equivalent query tree
- Incorporate fragments
- Optimize the resulting operation tree
25Employee Table Definition
Employee1 Select from Employee where Location
ltgt LA Employee2 Select from Employee where
Location LA Reconstruction Employee
Employee1 union Employee2
Book1 Select LCNumber, Type, Title from
Book Book2 Select LCNumber,
Cost Reconstruction Book Book1 join
(LCNumberLCNumber) Book 2
26Decrease the Number of Table Accesses
- Combine select and project operations on the same
table into a single access - Combine select and project with a join involving
the same table
27Combining select and project
28Combinbing select and join
Order Product Join (ProductIdProductId) sele
ct (Cost14.57) Order Product Select-join
(ProductIdProductId and Cost14.57)
29Using Semijoin
- Joining two tables at different sites requires
that one table be transmitted to the other site - Expensive
- Time consuming
- Make the transmitted table smaller by
transmitting only the rows that will be joined - To select the rows to be transmitted, get the
values used in the join from the other table
30Semijoin Definition
DepartmentDepartment Name Department
BudgetCar 450000000
- A semijoin is used to reduce a relation in the
same way that select and project are used to
reduce a relation.
TempEmployee semijoin (departmentNameDepartment
Name) Department
31Using Semijoin to Performa Join Across Sites
- To perform select
- from A, B where A.xB.x where A and B are
different sites - Temp1 project (x) on Table B at site of B
- Send Temp1 to the site where A is located
- Temp2 A semijoin (xx) Temp1 at site of A
- Send Temp2 to the site where B is located
- Compute Answer Temp2 join (xx) at site of B
32Optimizing Joins Across Sites
DepartmentDepartment Name Department
BudgetCar 450000000
- Which is best?
- (Employee semijoin Department) join Department
- (Department semijoin Employee) join Employee
- Employee join Department at the site of Employee
- Employee join Department at the site of Department
33Using the Semijoin Approach
- How can the semijoin concept be applied to each
of the following operations?
- Table whose rows are in both Tables A and B (A
intersect B). - Table whose rows are in Table A but not in Table
B (A minus B). - Table whose rows are in Table A or Table B but
not in both Tables A and B. - Table whose rows are in Table A or Table B or in
both Tables A and B (A union B).
34Additional Complexities Introduced by the
Heterogeneous Environment
- Cost of performing a local query may differ
greatly from site to site. - The ability to receive and reference moved data
may vary from site to site many DBMSs do not
support creation and loading of temporary
databases.
- Local query capabilities may be limited.
- Local DBMSs vary in their abilities to optimize.
35Global RequestOptimization Summary
- In distributed DBMSs, optimization often produces
savings. - Programmers and users may optimize requests.
- Automatic optimizers hide fragments and their
locations from programmers and users.
36Designing Distributed Requests
- Global Request Optimization
- Local Query Optimization
- Summary
37Local QueryOptimization
38Introduction to Local Query Optimization
- Definition Local query optimization is the
selection of an access procedure for obtaining
data from tables stored at a single site
- If a local optimizer is available, then a local
optimizer (usually the local DBMS) selects the
local access procedures - Global optimizer is relatively simple
- If a local optimizer is not available, then a
global optimizer selects the local access
procedures - Global optimizer is relatively complex
39Local Optimization of a Simple select Command
40Cases
- Linear search
- Records not ordered by DepartmentName
- Binary search
- Records ordered by DepartmentName
- Direct index
- DepartmentName is an index on Employee table.
- Indirect index
- DepartmentName is an index to the list of
pointers to EmployeeName.
41Local Optimization of select With and Condition
Select from Employee where DepartmentName"Car"
and Salary4500
42Cases
- Linear search
- Neither attribute is an index.
- Use single index and check
- If DepartmentName is an index, use it to locate
all Car records check each record to determine
if Salary4500. - Use two indexes and intersect
- Use DepartmentName index to locate addresses of
Car records - Use Salary index to locate addresses of 4500
records - Intersect the two sets of addresses
- Retrieve records
43Local Optimization of select With or Condition
44Cases
- Linear search
- If none or one of the attributes is an index.
- Use two indexes and union
- Use DepartmentName index to locate addresses of
Car records - Use Salary index to locate addresses of 4500
records - Union the two sets of addresses
- Retrieve records
45Local Optimization of Binary Operations Example
Join
Select from Department, Employee where
Department.DepartmentName Employee.DepartmentNam
e
46Cases
- Nested (inner-outer) loop
- For each record in Employee, retrieve every
record in Department and test whether they match. - Use access structure to retrieve matching records
- For each record in Employee, use the index to
DepartmentName to retrieve matching records in
Department.
- Sort-merge join
- If both Employee and Department are physically
sorted by DepartmentName, then scan both files at
the same time for matching records
More
47Cases, cont.
- Hash join
- Hash both Employee and Department to the same
hash file using the same hashing function applied
to DepartmentName - Examine buckets for Employee and Department
records that match
- Chase links
- For each Department record, follow a circularly
linked list of Employee records
48Designing Distributed Requests
- Global Request Optimization
- Local Query Optimization
- Summary
49Summary
- Many options are usually possible for each query.
- Take advantage of local access structures
(indexes, links, etc.) whenever possible. - Complex algorithms and heuristics have been
implemented in most centralized DBMSs use them
whenever possible.