Cost-based Query Scrambling for Initial Delays

Slides: 20
Provided by: dimitri66

1
Cost-based Query Scrambling for Initial Delays
  • Tolga Urhan
  • Michael J. Franklin
  • Laurent Amsaleg
  • Advanced DB
  • AUEB MScIS

Pres. Giatrakos Nikos M3060007
2
Introduction
  • Problem: response time unpredictability in
    wide-area distributed information systems
  • Large number of remote data sources
  • Intermediate sites
  • Communication links

All of these are vulnerable to congestion and failures,
which cause random delays.
The static, a priori approach of traditional
execution plans breaks down.
3
Query Scrambling Solution
  • Key idea: hide unexpected delays by rescheduling
    the operations of a query on the fly, so as to
    perform other useful work
  • Focus on initial delays:
  • - delays in receiving the first tuple from a
    particular remote source
  • Decision-making approaches:
  • - reduce total work
  • - reduce response time

4
Query Scrambling
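[Figure: query plan producing the Query Result at Site1, with Select and Join operators over relations A, B, C, D, E; remote sites Site2, Site3, Site4 are reached over communication links]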
  • Rescheduling: the execution plan of a query is
    dynamically rescheduled when a delay is detected.
  • Operator synthesis: new operators can be created
    when there are no other operators that can
    execute.

5
Query Scrambling - Scenario
  • Query stalls while retrieving tuples of A
  • Rescheduling phase:
  • - Retrieve tuples of B
  • - Check A: still not available
  • - Then run D ⋈ E and
  •   C ⋈ (D ⋈ E)
  • - Check A: still unavailable
  • Operator synthesis phase:
  • - C ⋈ (D ⋈ E) ⋈ B

[Figure: the same query plan during scrambling, with a "waiting" marker; labels: Query Result, Site1 to Site4, relations A, B, C, D, E]
  • Remark: should a delay occur during a scrambled
    operation, scrambling is invoked again.

6
Cost-based Rescheduling
  • Identify runnable subtrees: subtrees made up
    entirely of non-blocked operators.
  • Select which runnable subtree to execute
  • Traditional way: choose the maximal one.
  • MR: the cost of reading the materialized
    temporary result
  • MW: the cost of writing the materialized
    temporary result
  • P: the cost of executing the subtree
  • Choose the one with maximal efficiency:
    (P - MR) / (P + MW)

Numerator (P - MR): how much work will be saved in the future by
scheduling that tree
Denominator (P + MW): the duration of the scrambled operation
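
A minimal sketch of this selection rule, reading the efficiency as (P - MR) / (P + MW), that is, work saved later divided by the duration of the scrambled operation. The `Subtree` class, its field names, and the cost numbers are hypothetical and only illustrate the rule; they are not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    name: str
    p: float    # P: cost of executing the subtree
    mr: float   # MR: cost of reading the materialized temporary result
    mw: float   # MW: cost of writing the materialized temporary result
    blocked: bool = False  # True if any operator in the subtree is blocked


def efficiency(s: Subtree) -> float:
    # Work saved in the future divided by the duration of the scrambled operation.
    return (s.p - s.mr) / (s.p + s.mw)


def pick_subtree(candidates):
    # Only subtrees made up entirely of non-blocked operators are runnable.
    runnable = [s for s in candidates if not s.blocked]
    if not runnable:
        return None
    return max(runnable, key=efficiency)


# Hypothetical cost estimates.
a = Subtree("D join E", p=100, mr=10, mw=20)        # efficiency 90/120
b = Subtree("scan B", p=50, mr=40, mw=40)           # efficiency 10/90
c = Subtree("scan A", p=1000, mr=1, mw=1, blocked=True)
chosen = pick_subtree([a, b, c])
```

Here the blocked subtree is never considered, however cheap its materialization, and the choice among the rest depends on the ratio rather than on subtree size.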
7
Cost-based Operator Synthesis
  • Second phase starts when no more progress can be
    made in phase 1.
  • Three optimization strategies:
  • - Pair
  • - Include Delayed (IN)
  • - Estimated Delay (ED)

8
Cost-based Operator Synthesis - Pair
  • Construct a query plan containing only a single
    join using two unblocked relations.
  • Analyzes each pair of unblocked relations sharing
    a join predicate.
  • Chooses the join with the least total cost to
    execute.
  • Materialize the results of the join to disk.
  • Avoids Cartesian products and joins whose
    results would take longer to read from disk than to
    compute from scratch.
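
The Pair selection step can be sketched roughly as follows. This is an illustration under assumptions, not the authors' optimizer: the function name, the `joins` dictionary layout, and the cost numbers are hypothetical.

```python
from itertools import combinations


def choose_pair_join(unblocked, joins):
    """Return ((a, b), compute_cost) for the cheapest qualifying join, or None.

    unblocked: names of relations whose data is available.
    joins: maps frozenset({a, b}) -> (compute_cost, read_cost) for each pair
           that shares a join predicate (hypothetical cost estimates).
    """
    best = None
    for a, b in combinations(sorted(unblocked), 2):
        costs = joins.get(frozenset((a, b)))
        if costs is None:
            continue  # no join predicate: this would be a Cartesian product
        compute_cost, read_cost = costs
        if read_cost > compute_cost:
            continue  # result is slower to read back than to recompute
        if best is None or compute_cost < best[1]:
            best = ((a, b), compute_cost)
    return best


# Hypothetical estimates: (cost to compute, cost to read the result from disk).
joins = {
    frozenset(("D", "E")): (100, 10),
    frozenset(("C", "D")): (80, 90),   # rejected: reading costs more than computing
    frozenset(("B", "C")): (120, 30),
}
choice = choose_pair_join({"B", "C", "D", "E"}, joins)
```

With these numbers the cheapest join overall, C with D, is rejected by the read-cost filter, so the join of D and E is chosen.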

9
Cost-based Operator Synthesis - Pair
  • At the end of each join, checks for the arrival
    of the delayed data. If it has not arrived, runs
    another iteration.
  • If no qualifying joins exist, waits for the delayed
    data to arrive.
  • Reconstruction phase:
  • - When all blocked relations become available, a
    single query tree must be constructed.
  • - This is necessary because the Pair policy works
    only on pairs of relations and does not maintain a
    complete query plan.

10
Cost-based Operator Synthesis - IN
  • Each iteration generates a complete alternative
    plan
  • Chooses a very long delay duration (relative to
    response time) to postpone any access to the
    delayed data.
  • Chooses a plan with the greatest benefit
    (potential improvement in response time) whose
    risk (duration of the optimization step) can be
    overlapped with the expected delay duration.

11
Cost-based Operator Synthesis - IN
  • Uses a risk/benefit knob (Rbknob) to prevent the
    optimizer from choosing high-risk plans for
    relatively small potential gains over low-risk
    plans.
  • Rbknob: the ratio of the amount of benefit the
    optimizer is willing to give up for a given
    savings in risk.
  • Increasing Rbknob yields more conservative plans.
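
One possible reading of the knob, sketched as a pairwise plan comparison. The comparison rule below is an assumption derived only from the wording on this slide (benefit given up per unit of risk saved); the `Plan` type, `prefer` function, and the numbers are hypothetical and may not match how the actual optimizer applies Rbknob.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    name: str
    benefit: float  # potential improvement in response time
    risk: float     # per the slides, the duration of the optimization step


def prefer(a: Plan, b: Plan, rb_knob: float) -> Plan:
    # Assumed rule: take the lower-risk plan when the benefit given up
    # is at most rb_knob times the risk saved.
    low, high = sorted((a, b), key=lambda p: p.risk)
    benefit_given_up = high.benefit - low.benefit
    risk_saved = high.risk - low.risk
    if benefit_given_up <= rb_knob * risk_saved:
        return low
    return high


safe = Plan("low-risk", benefit=80, risk=10)
bold = Plan("high-risk", benefit=100, risk=30)
aggressive_choice = prefer(safe, bold, rb_knob=0.5)   # gives up 20 > 0.5 * 20
conservative_choice = prefer(safe, bold, rb_knob=2.0)  # gives up 20 <= 2.0 * 20
```

Under this reading, a larger knob value makes the lower-risk plan win more often, which is consistent with the statement that increasing Rbknob yields more conservative plans.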

12
Cost-based Operator Synthesis - ED
  • Delay estimates are increased successively when
    necessary to make more progress
  • Motivation: use low-risk plans when delays are
    short, and high-risk/high-payoff plans for
    longer delays.
  • Execution steps:
  • Start by picking an estimated delay value equal
    to 25% of the original query response time
  • Repeat iterations until progress is too small
  • Increase the delay value to 50% of response time
  • Increase to 100% of response time if progress is
    still too small.
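
The escalation schedule can be sketched as a small loop. The callbacks `run_iteration` and `data_arrived`, the `min_progress` threshold, and the notion of a numeric "progress" value are hypothetical stand-ins; the slides do not say how progress is measured.

```python
def ed_policy(response_time, run_iteration, data_arrived, min_progress):
    """Return the delay estimate used in each scrambling iteration.

    run_iteration(estimate) -> progress made with that delay estimate.
    data_arrived() -> True once the delayed data is available.
    Both callbacks are hypothetical placeholders for the optimizer and runtime.
    """
    used = []
    for fraction in (0.25, 0.50, 1.00):
        estimate = fraction * response_time
        while True:
            if data_arrived():
                return used  # delay is over: stop scrambling
            used.append(estimate)
            if run_iteration(estimate) < min_progress:
                break  # progress too small: escalate the estimate
    return used


# Simulated run: progress values returned by successive iterations.
progress = iter([10, 1, 2, 8, 0])
estimates = ed_policy(
    response_time=200,
    run_iteration=lambda est: next(progress),
    data_arrived=lambda: False,
    min_progress=5,
)
```

In this simulated run the policy iterates twice at 25%, once at 50%, and twice at 100% of the 200-unit response time before giving up.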

13
Experimental Setup
  • Two-phase randomized query optimizer
  • Workload based on queries from the TPC-D benchmark
  • Single query site, six remote data source sites.
  • Experimental methodology: plot the duration of the
    initial delay of a remote source against the response
    time achieved using scrambling

14
Experiment 1
[Plots: memory > 1000; memory = 300]
15
Experiment 2
[Plots: memory > 1000; memory = 300; memory = 10000; memory = 1000]
16
Experiment 3
[Plot: memory = 10000]
17
Conclusions
  • With sufficient memory, all cost-based approaches
    can effectively hide initial delays
  • Cost-based scrambling involves a trade-off between
    conservative approaches and aggressive ones
  • As the memory available for scrambling is reduced,
    scrambling plans become more expensive
  • The aggressiveness of the IN and ED policies can be
    adjusted using Rbknob
  • Pair (a total-work-based optimizer) may perform
    unnecessary work; hence, a response-time-based
    optimizer should be preferred

19
Thank you