Designing Distributed Requests - PowerPoint PPT Presentation

1 / 49
About This Presentation
Title:

Designing Distributed Requests

Description:

... the rows to be transmitted, get the values used in the join from the other table ... Use DepartmentName index to locate addresses of Car records ... – PowerPoint PPT presentation

Number of Views:74
Avg rating:3.0/5.0
Slides: 50
Provided by: Mik7253
Category:

less

Transcript and Presenter's Notes

Title: Designing Distributed Requests


1
Chapter 4 Designing Distributed Requests
2
Designing Distributed Requests
  • Global Request Optimization
  • Local Query Optimization
  • Summary

3
Global Request Optimization
4
Global Request Optimization Choices
  • Not done Inefficient request execution
  • Programmer optimization High personnel costs
  • Automatic optimization Additional computer time

5
Objectives
  • Maximize throughput
  • Minimize response time
  • Minimize query response time
  • Maximize parallelism
  • Minimize transmission costs
  • Minimize number of data transmissions
  • Minimize amount of data transmitted

6
Three Query Optimization Tasks
  • Choose query tree
  • Choose sites containing data to be accessed
  • Assign query tree operations to sites
  • Variations
  • Order of performing the three tasks
  • When the tasks are performed (compile time vs.
    runtime)
  • Algorithms and heuristics used to perform each
    task

7
Query Optimization Illustration
  • Basically
  • Join Customer, Product, and Order tables
  • Select rows with "widget" and "June 2004"
  • Project CustomerName

8
Notation for Query Trees
select from Productwhere ProductName
"widget" Select CustomerNamefrom
Customer select from Customer.Orderwhere
Customer.CustomerNumberOrder.CustomerNumber Scien
tist union Engineer
Productselect (ProductName "widget") Customer
project (CustomerName) customer Orderjoin
(CustomerNumberCustomerNumber) Scientist Enginee
r Union
9
Query Tree Alternatives
10
Order of Performing Tasks
  • Approach A Choose query tree first
  • Choose query tree
  • Choose sites containing data to be accessed
  • Assign query tree operations to sites
  • Approach B Choose sites first
  • Choose sites containing data to be accessed
  • Choose query tree
  • Assign query tree operations to sites
  • Approach C Choose sites and query trees at the
    same time
  • Choose sites containing data and choose query
    tree
  • Assign query tree operations to sites
  • Other approaches are possible

11
When to Perform Optimization Tasks
  • All three tasks at compile time
  • All three tasks at runtime
  • Some tasks at compile time and some at runtime

12
Algorithms for Choosing Query Trees
  • Generate all possible query trees and choose the
    best one.
  • Generate one query tree and repeatedly transform
    it into a better one.

13
Push Select and Project Toward Leaves
  • Relational Tables
  • SQL

More
14
Push Select and Project Toward Leaves, cont.
  • Query Tree

Customer Product Order join
(ProductNumberProductNumber) join
(CustomerNumberCustomerNumber) select
(ProductName "widget " and Date "June
2004") Push select toward
leaves Customer Product Order select
(ProductName "widget") select(Date "June
2004") join (ProductNumberProductNumber) join
(CustomerNumberCustomerNumber)
15
Incorporate Fragments into Query
  • Relational tables
  • SQL
  • Table reconstruction from fragments

More
16
Incorporate Fragments Into Query, cont.
17
Merge Common Subexpressions
  • Relational tables
  • SQL request

More
18
Merge Common Subexpressions, cont.
19
Trim Branches
  • Relational table
  • SQL request
  • Fragments
  • Table reconstruction from fragments
  • Query tree

Department Select (DeptId1)
Incorporate Fragments
More
20
Trim Branches, cont.
21
Distribute Aggregate Functions
  • Relational table
  • SQL request
  • Table reconstruction from fragments
  • Query tree

More
22
Distribute AggregateFunctions, cont.
23
Other Aggregate Functions
  • How can each of the following aggregate functions
    be distributed?
  • Min
  • Max
  • Count
  • Sum

24
Exercises on Algebraic Transformations
  • Using the Book and Employee tables on the next
    page, do the following
  • Write the SQL request
  • Draw the equivalent query tree
  • Incorporate fragments
  • Optimize the resulting operation tree

25
Employee Table Definition
Employee1 Select from Employee where Location
ltgt LA Employee2 Select from Employee where
Location LA Reconstruction Employee
Employee1 union Employee2
Book1 Select LCNumber, Type, Title from
Book Book2 Select LCNumber,
Cost Reconstruction Book Book1 join
(LCNumberLCNumber) Book 2
26
Decrease the Number of Table Accesses
  • Combine select and project operations on the same
    table into a single access
  • Combine select and project with a join involving
    the same table

27
Combining select and project
28
Combinbing select and join
Order Product Join (ProductIdProductId) sele
ct (Cost14.57) Order Product Select-join
(ProductIdProductId and Cost14.57)
29
Using Semijoin
  • Joining two tables at different sites requires
    that one table be transmitted to the other site
  • Expensive
  • Time consuming
  • Make the transmitted table smaller by
    transmitting only the rows that will be joined
  • To select the rows to be transmitted, get the
    values used in the join from the other table

30
Semijoin Definition
DepartmentDepartment Name Department
BudgetCar 450000000
  • A semijoin is used to reduce a relation in the
    same way that select and project are used to
    reduce a relation.

TempEmployee semijoin (departmentNameDepartment
Name) Department
31
Using Semijoin to Performa Join Across Sites
  • To perform select
  • from A, B where A.xB.x where A and B are
    different sites
  • Temp1 project (x) on Table B at site of B
  • Send Temp1 to the site where A is located
  • Temp2 A semijoin (xx) Temp1 at site of A
  • Send Temp2 to the site where B is located
  • Compute Answer Temp2 join (xx) at site of B

32
Optimizing Joins Across Sites
DepartmentDepartment Name Department
BudgetCar 450000000
  • Which is best?
  • (Employee semijoin Department) join Department
  • (Department semijoin Employee) join Employee
  • Employee join Department at the site of Employee
  • Employee join Department at the site of Department

33
Using the Semijoin Approach
  • How can the semijoin concept be applied to each
    of the following operations?
  • Table whose rows are in both Tables A and B (A
    intersect B).
  • Table whose rows are in Table A but not in Table
    B (A minus B).
  • Table whose rows are in Table A or Table B but
    not in both Tables A and B.
  • Table whose rows are in Table A or Table B or in
    both Tables A and B (A union B).

34
Additional Complexities Introduced by the
Heterogeneous Environment
  • Cost of performing a local query may differ
    greatly from site to site.
  • The ability to receive and reference moved data
    may vary from site to site many DBMSs do not
    support creation and loading of temporary
    databases.
  • Local query capabilities may be limited.
  • Local DBMSs vary in their abilities to optimize.

35
Global RequestOptimization Summary
  • In distributed DBMSs, optimization often produces
    savings.
  • Programmers and users may optimize requests.
  • Automatic optimizers hide fragments and their
    locations from programmers and users.

36
Designing Distributed Requests
  • Global Request Optimization
  • Local Query Optimization
  • Summary

37
Local QueryOptimization
38
Introduction to Local Query Optimization
  • Definition Local query optimization is the
    selection of an access procedure for obtaining
    data from tables stored at a single site
  • If a local optimizer is available, then a local
    optimizer (usually the local DBMS) selects the
    local access procedures
  • Global optimizer is relatively simple
  • If a local optimizer is not available, then a
    global optimizer selects the local access
    procedures
  • Global optimizer is relatively complex

39
Local Optimization of a Simple select Command
40
Cases
  • Linear search
  • Records not ordered by DepartmentName
  • Binary search
  • Records ordered by DepartmentName
  • Direct index
  • DepartmentName is an index on Employee table.
  • Indirect index
  • DepartmentName is an index to the list of
    pointers to EmployeeName.

41
Local Optimization of select With and Condition
Select from Employee where DepartmentName"Car"
and Salary4500
42
Cases
  • Linear search
  • Neither attribute is an index.
  • Use single index and check
  • If DepartmentName is an index, use it to locate
    all Car records check each record to determine
    if Salary4500.
  • Use two indexes and intersect
  • Use DepartmentName index to locate addresses of
    Car records
  • Use Salary index to locate addresses of 4500
    records
  • Intersect the two sets of addresses
  • Retrieve records

43
Local Optimization of select With or Condition
44
Cases
  • Linear search
  • If none or one of the attributes is an index.
  • Use two indexes and union
  • Use DepartmentName index to locate addresses of
    Car records
  • Use Salary index to locate addresses of 4500
    records
  • Union the two sets of addresses
  • Retrieve records

45
Local Optimization of Binary Operations Example
Join
Select from Department, Employee where
Department.DepartmentName Employee.DepartmentNam
e
46
Cases
  • Nested (inner-outer) loop
  • For each record in Employee, retrieve every
    record in Department and test whether they match.
  • Use access structure to retrieve matching records
  • For each record in Employee, use the index to
    DepartmentName to retrieve matching records in
    Department.
  • Sort-merge join
  • If both Employee and Department are physically
    sorted by DepartmentName, then scan both files at
    the same time for matching records

More
47
Cases, cont.
  • Hash join
  • Hash both Employee and Department to the same
    hash file using the same hashing function applied
    to DepartmentName
  • Examine buckets for Employee and Department
    records that match
  • Chase links
  • For each Department record, follow a circularly
    linked list of Employee records

48
Designing Distributed Requests
  • Global Request Optimization
  • Local Query Optimization
  • Summary

49
Summary
  • Many options are usually possible for each query.
  • Take advantage of local access structures
    (indexes, links, etc.) whenever possible.
  • Complex algorithms and heuristics have been
    implemented in most centralized DBMSs use them
    whenever possible.
Write a Comment
User Comments (0)
About PowerShow.com