Cost-based Query Scrambling for Initial Delays

1
Cost-based Query Scrambling for Initial Delays

Tolga Urhan
Michael J. Franklin
Laurent Amsaleg
Advanced DB
AUEB MScIS

Pres. Giatrakos Nikos M3060007
2
Introduction

Problem: response time unpredictability in wide-area distributed information systems

Large number of remote data sources
Intermediate sites
Communication links

Vulnerable to congestion and failures, which cause random delays
Static, a priori approaches of traditional execution plans break down
3
Query Scrambling Solution

Key idea: hide unexpected delays by rescheduling the operations of a query on the fly, so as to perform other useful work
Focus on initial delays
- delays in receiving the first tuple from a particular remote source
Decision-making approaches
- reduce total work
- reduce response time

4
Query Scrambling
[Figure: query plan producing the Query Result with Select and Join operators over relations A, B, C, D, E, with Site1 to Site4 connected by communication links]

Rescheduling: the execution plan of a query is dynamically rescheduled when a delay is detected.
Operator Synthesis: new operators can be created when there are no other operators that can execute.

5
Query Scrambling - Scenario

Query stalls while retrieving tuples of A
Rescheduling Phase
- Retrieve tuples of B
- Check A: still not available
- Then run D ⋈ E and C ⋈ (D ⋈ E)
- Check A: still unavailable
Operator Synthesis Phase
- C ⋈ (D ⋈ E) ⋈ B

[Figure: query plan over relations A, B, C, D, E and Site1 to Site4, annotated "waiting"]

Remark: should a delay occur during a scrambled operation, scrambling is invoked again.

6
Cost-based Rescheduling

Identify runnable subtrees: subtrees made up entirely of non-blocked operators.
Selection of runnable subtrees to execute
Traditional way: choose the maximal one.
MR: the cost of reading the materialized temporary result
MW: the cost of writing the materialized temporary result
P: the cost of executing the subtree
Choose the one with maximal efficiency = (P - MR) / (P + MW)

Numerator (P - MR): how much work will be saved in the future by scheduling that subtree
Denominator (P + MW): the duration of the scrambled operation
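
The selection rule on this slide can be illustrated with a minimal Python sketch, taking efficiency to be (P - MR) / (P + MW). The names `Subtree` and `pick_subtree`, the cost numbers, and the rule of skipping subtrees with non-positive efficiency are assumptions made for illustration, not details taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    name: str
    p: float   # P: cost of executing the subtree
    mr: float  # MR: cost of reading the materialized temporary result
    mw: float  # MW: cost of writing the materialized temporary result


def efficiency(s: Subtree) -> float:
    # Work saved later (P - MR) per unit of time spent now (P + MW).
    return (s.p - s.mr) / (s.p + s.mw)


def pick_subtree(runnable):
    # Assumption: a subtree whose result is no cheaper to read back than
    # to recompute (efficiency <= 0) is not worth materializing.
    candidates = [s for s in runnable if efficiency(s) > 0]
    return max(candidates, key=efficiency, default=None)


# Hypothetical costs, in arbitrary units.
runnable = [
    Subtree("D join E", p=100.0, mr=10.0, mw=20.0),
    Subtree("scan B", p=40.0, mr=30.0, mw=35.0),
]
best = pick_subtree(runnable)
```

With these made-up numbers the join scores 90 / 120 and the scan scores 10 / 75, so the join is scheduled first even though both are runnable.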
7
Cost-based Operator Synthesis

Second phase starts when no more progress can be
made in phase 1.
Three optimization strategies:
- Pair
- Include Delayed (IN)
- Estimated Delay (ED)

8
Cost-based Operator Synthesis - Pair

Constructs a query plan containing only a single join using two unblocked relations.
Analyzes each pair of unblocked relations sharing a join predicate.
Chooses the join with the least total cost to execute.
Materializes the result of the join to disk.
Avoids Cartesian products and joins whose results would take longer to read from disk than to compute from scratch.
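
One iteration of the Pair policy, as described on this slide, might look like the following sketch. The function name `choose_pair_join`, the dictionary-based cost tables, and all numbers are hypothetical; a real optimizer would derive the costs from its own cost model.

```python
from itertools import combinations


def choose_pair_join(unblocked, predicates, join_cost, read_cost):
    """Return ((a, b), cost) for the cheapest qualifying join, or None."""
    best = None
    for a, b in combinations(sorted(unblocked), 2):
        if frozenset((a, b)) not in predicates:
            continue  # no shared join predicate: would be a Cartesian product
        cost = join_cost[(a, b)]
        if read_cost[(a, b)] > cost:
            continue  # reading the result back costs more than recomputing it
        if best is None or cost < best[1]:
            best = ((a, b), cost)
    return best


# Hypothetical state: A is delayed, the other relations are unblocked.
unblocked = {"B", "C", "D", "E"}
predicates = {frozenset(("C", "D")), frozenset(("D", "E"))}
join_cost = {("C", "D"): 80.0, ("D", "E"): 50.0}
read_cost = {("C", "D"): 10.0, ("D", "E"): 5.0}

choice = choose_pair_join(unblocked, predicates, join_cost, read_cost)
```

When the function returns `None`, no qualified join exists and the policy can only wait for the delayed data.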

9
Cost-based Operator Synthesis - Pair

At the end of each join, checks for the arrival of delayed data. If it has not arrived, performs another iteration.
If no qualified joins exist, waits for delayed data to arrive.
Reconstruction phase:
- when all blocked relations become available, a single query tree must be constructed
- necessary because the Pair policy works only on pairs of relations and does not maintain a complete query plan

10
Cost-based Operator Synthesis - IN

Each iteration generates a complete alternative plan.
Chooses a very long delay duration (relative to the response time) to postpone any access to the delayed data.
Chooses the plan with the greatest benefit (potential improvement in response time) whose risk (duration of the optimization step) can be overlapped with the expected delay duration.

11
Cost-based Operator Synthesis - IN

Uses a risk/benefit knob (Rbknob) to prevent the optimizer from choosing high-risk plans for relatively small potential gains over low-risk plans.
Rbknob: the ratio of the amount of benefit the optimizer is willing to give up for a given savings in risk.
Increasing Rbknob yields more conservative plans.
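
One possible reading of the knob is a linear trade-off: the optimizer gives up some benefit whenever the loss is at most Rbknob times the risk saved, which is the same as maximizing benefit minus Rbknob times risk. The sketch below uses that reading. The scoring formula, the function name `choose_plan`, and the plan numbers are assumptions for illustration; the slides do not give the exact formula.

```python
def choose_plan(plans, rbknob):
    """plans: list of (name, benefit, risk) tuples in the same cost units."""
    # Assumed scoring: each unit of risk is charged rbknob units of benefit.
    return max(plans, key=lambda plan: plan[1] - rbknob * plan[2])


# Hypothetical candidate plans.
plans = [
    ("low-risk", 50.0, 5.0),
    ("high-risk", 60.0, 20.0),
]

aggressive = choose_plan(plans, rbknob=0.5)
conservative = choose_plan(plans, rbknob=1.0)
```

Here the high-risk plan offers 10 more units of benefit for 15 more units of risk, so it wins at a knob of 0.5 and loses at 1.0, matching the statement that a larger Rbknob gives more conservative plans.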

12
Cost-based Operator Synthesis - ED

Delay estimates are successively increased when necessary to make more progress.
Motivation: use low-risk plans when delays are short, and high-risk/high-payoff plans for longer delays.
Execution steps:
- Start by picking an estimated delay value equal to 25% of the original query response time
- Repeat iterations until progress is too small
- Increase the delay value to 50% of the response time
- Increase it to 100% of the response time if progress is still too small
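
The escalation of the delay estimate can be sketched as a small control loop. The callbacks `iterate` and `data_arrived`, the `min_progress` threshold, and the scripted progress values are hypothetical stand-ins for the optimizer and the runtime; only the 25%, 50%, 100% schedule comes from the slide.

```python
def run_estimated_delay(response_time, iterate, data_arrived, min_progress):
    """Return the list of delay estimates that were tried, in order."""
    tried = []
    for fraction in (0.25, 0.50, 1.00):
        if data_arrived():
            break  # delayed data is here: stop scrambling
        delay = fraction * response_time
        tried.append(delay)
        while not data_arrived():
            # iterate(delay) runs one scrambling iteration planned for this
            # delay estimate and reports how much progress it made.
            if iterate(delay) < min_progress:
                break  # progress too small: move to the next estimate
    return tried


# Scripted progress values so the example is deterministic.
progress_script = iter([5.0, 0.5, 4.0, 0.2, 3.0, 0.1])
tried = run_estimated_delay(
    response_time=200.0,
    iterate=lambda delay: next(progress_script),
    data_arrived=lambda: False,
    min_progress=1.0,
)
```

In this run the data never arrives, so all three estimates are tried in turn before the loop gives up.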

13
Experimental Setup

Two-phase randomized query optimizer
Workload based on queries from the TPC-D benchmark
Single query site, six remote data source sites
Experimental methodology: plot the duration of the initial delay of a remote source vs. the response time achieved using scrambling

14
Experiment1
[Plot panels: memory > 1000, memory 300]
15
Experiment2
[Plot panels: memory > 1000, memory 300, memory 10000, memory 1000]
16
Experiment3
[Plot panel: memory 10000]
17
Conclusions

With sufficient memory, all cost-based approaches can effectively hide initial delays.
Cost-based scrambling involves a trade-off between conservative and aggressive approaches.
As the memory available for scrambling is reduced, scrambling plans become more expensive.
The aggressiveness of the IN and ED policies can be adjusted using Rbknob.
Pair (a total-work-based optimizer) may perform unnecessary work. Hence, a response-time-based optimizer should be preferred.