Title: Network-Aware Query Processing for Stream-based Applications
1Network-Aware Query Processing for Stream-based Applications
- Yanif Ahmad, Ugur Cetintemel
- Brown University
- VLDB 2004
2One-line Comments
- This paper addresses the operator placement problem in distributed query processing by using network latency information
3Contents
- Motivation
- Problem
- Solution Approach
- Central Version of Algorithm
- Edge
- Edge+
- In-Network
- Latency-Constrained
- Distributed Version of Algorithm
- Experiment
- Critique
4Motivation
- Small-scale query processing systems are not scalable
- Large numbers of data stream query requests
- Need for widely distributed query processing
5Problem
- Operator placement problem
- Operators in query processing trees should be
dispersed into the network
[Figure: a processing tree (query plan) of operators O00 to O26 mapped onto nodes of an IP network; legend: operator, node, application node]
6Problem formalized version
- Operator placement problem
- For efficient operator placement
- Cost: bandwidth
- Notation: O = operators, A = their connected inputs and outputs, V = nodes, E = their links, c() = link cost (bandwidth)
[Figure: labels m, a, c(a), n]
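As a minimal sketch of the formalization above, the following Python computes the cost of one operator-to-node mapping. The cost model (the cost c(a) of each arc multiplied by the latency between the nodes hosting its two endpoints) and all names (`mapping_cost`, `arc_cost`, `latency`, the example nodes) are assumptions for illustration, not taken from the slides.

```python
def mapping_cost(arcs, arc_cost, mapping, latency):
    """Sum, over every arc a = (u, v), of c(a) times the latency between
    the nodes hosting u and v (assumed cost model)."""
    total = 0.0
    for (u, v) in arcs:
        m, n = mapping[u], mapping[v]
        # Endpoints on the same node exchange data without using the network.
        hop = 0.0 if m == n else latency[frozenset((m, n))]
        total += arc_cost[(u, v)] * hop
    return total


# Hypothetical example: two sources feed one operator, which feeds the proxy.
arcs = [("src0", "op"), ("src1", "op"), ("op", "proxy")]
arc_cost = {("src0", "op"): 10.0, ("src1", "op"): 4.0, ("op", "proxy"): 1.0}
latency = {
    frozenset(("S0", "S1")): 20.0,
    frozenset(("S0", "R")): 30.0,
    frozenset(("S1", "R")): 50.0,
}
at_proxy = {"src0": "S0", "src1": "S1", "op": "R", "proxy": "R"}
at_source = {"src0": "S0", "src1": "S1", "op": "S0", "proxy": "R"}
cost_at_proxy = mapping_cost(arcs, arc_cost, at_proxy, latency)
cost_at_source = mapping_cost(arcs, arc_cost, at_source, latency)
```

In this made-up example, moving the operator from the proxy to the high-rate source lowers the cost, which is the kind of trade-off the placement problem is about.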
7Solution Approach
- Network-aware operator placement algorithms
- Edge
- Consider only sources and the proxy location
- Edge+
- Edge with pair-wise server communication latencies
- In-Network
- Sources, proxy, a subset of all locations
- Latency-bound algorithm
8Contents
- Motivation
- Problem
- Solution Approach
- Central Version of Algorithm
- Distributed Version of Algorithm
- Experiment
- Critique
9Algorithm Design Principle
- Naïve algorithm for operator placement
- Evaluate all combinations of possible mappings
- -> Too complex
- Greedy algorithm
- Evaluate only the locations that are likely to be good candidates
- Place operators in post-order
- When an operator is placed at a location, its children can be moved as well
[Figure: processing tree of operators O00 to O26 with sources S0 and S1, shown next to the IP network; legend: operator, node, application node]
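The post-order rule above means every operator is visited only after all of its children. A minimal sketch of that visiting order follows; the tree encoding (a dict from operator to its list of children) and the function name `post_order` are hypothetical.

```python
def post_order(tree, root):
    """Return the operators so that every child appears before its parent."""
    order = []
    stack = [(root, False)]
    while stack:
        op, expanded = stack.pop()
        if expanded:
            order.append(op)
            continue
        # Revisit the operator after all of its children have been emitted.
        stack.append((op, True))
        # Push children in reverse so they are visited left to right.
        for child in reversed(tree.get(op, [])):
            stack.append((child, False))
    return order


# Hypothetical processing tree: operator -> list of children.
tree = {"O00": ["O10", "O11"], "O10": ["O20", "O21"], "O11": ["O22"]}
order = post_order(tree, "O00")
```

A greedy placement would walk `order` once and choose a location per operator, whereas the naive approach would have to consider every assignment of locations to all operators.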
10Mapping Function
11Edge
- Location candidates: sources, proxy
- Candidates that are likely to be good
- (1) One of the children's locations
- (2) A common location
- (3) Proxy's location
- Link cost
12Edge (1) One of the children's locations
- The child location that minimizes the total tree cost between the operator and all of its children
[Figure: processing tree rooted at O00 with children O10, O11, O12 and leaves O20 to O29, each annotated with a location among S0, S1, S2]
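Option (1) can be sketched as a small search over the children's current locations. The cost used here (each child's output rate times its distance to the candidate location) is an assumed model, and `best_child_location`, the rates, and the distances are all hypothetical.

```python
def best_child_location(child_loc, child_rate, dist):
    """Among the children's locations, return the one with the lowest total
    cost to all children (assumed cost: rate x distance per child)."""
    def cost_at(loc):
        return sum(child_rate[c] * dist(loc, child_loc[c]) for c in child_loc)

    # Sorting makes tie-breaking deterministic.
    candidates = sorted(set(child_loc.values()))
    return min(candidates, key=cost_at)


# Hypothetical pairwise distances between three source locations.
table = {
    frozenset(("S0", "S1")): 20.0,
    frozenset(("S0", "S2")): 40.0,
    frozenset(("S1", "S2")): 30.0,
}


def dist(a, b):
    return 0.0 if a == b else table[frozenset((a, b))]


child_loc = {"O20": "S0", "O21": "S1", "O22": "S2"}
child_rate = {"O20": 10.0, "O21": 4.0, "O22": 3.0}
best = best_child_location(child_loc, child_rate, dist)
```

With these numbers the operator ends up next to its highest-rate child, because that avoids shipping the largest stream across the network.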
13Edge (2) A common location
- Idea
- Placing an operator and its children at a common location
- -> zero overlay cost between the operator and its children
- Common location (cl)
- Good place for all its children
- -> an intersection of each child's dl (the set of descendant leaf locations)
[Figure: processing tree rooted at O00 (children O10, O11, O12; leaves O20 to O29 located at S0, S1, or S2), annotated with dl(O11) = {S0, S1, S2} and cl(O00) = {S0, S1}]
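The dl and cl sets defined above reduce to a set union and a set intersection. The sketch below uses a hypothetical tree and leaf assignment, chosen only so that the results agree with the two values given on the slide; the function names are likewise assumptions.

```python
def descendant_leaf_locations(tree, leaf_loc, op):
    """dl(op): the set of locations of all leaves below op."""
    if op in leaf_loc:  # op is itself a leaf
        return {leaf_loc[op]}
    locations = set()
    for child in tree[op]:
        locations |= descendant_leaf_locations(tree, leaf_loc, child)
    return locations


def common_locations(tree, leaf_loc, op):
    """cl(op): intersection of dl(child) over all children of op.
    Assumes op has at least one child."""
    child_sets = [descendant_leaf_locations(tree, leaf_loc, c) for c in tree[op]]
    return set.intersection(*child_sets)


# Hypothetical tree and leaf locations (not read off the original figure).
tree = {
    "O00": ["O10", "O11", "O12"],
    "O10": ["O20", "O21", "O22"],
    "O11": ["O23", "O24", "O25", "O26"],
    "O12": ["O27", "O28", "O29"],
}
leaf_loc = {
    "O20": "S0", "O21": "S0", "O22": "S1",
    "O23": "S0", "O24": "S1", "O25": "S1", "O26": "S2",
    "O27": "S0", "O28": "S1", "O29": "S1",
}
dl_o11 = descendant_leaf_locations(tree, leaf_loc, "O11")
cl_o00 = common_locations(tree, leaf_loc, "O00")
```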
14Edge (3) Proxy's location
- Idea
- If tree costs are higher near the root
- -> place the operator at the proxy location, r
[Figure: processing tree rooted at O00 with children O10, O11, O12 and leaves O20 to O29, each annotated with a location among S0, S1, S2]
15Edge Summary
16Edge+
- Location candidates: sources, proxy
- Edge with network latency (d) between two locations
- Link cost
17In-Network Placement
- Location candidates: arbitrary locations (including sources and proxy)
- Overlay cost and mapping function are the same as in Edge
- Problem: reducing the candidate location set
18In-Network Placement
- Approach
- Prune a candidate location unless its distance to all current child placements is less than all pairwise distances between child placements
[Figure: pruning example with nodes N2, N4, N7, N8, operators O00, O10, O11, O12, and link latencies 20, 30, 40, 50, 60]
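The pruning rule above can be sketched as a filter over candidate locations. This follows the rule literally as worded on the slide; the function name, the behaviour with fewer than two children, and the example distances are assumptions.

```python
from itertools import combinations


def prune_candidates(candidates, child_placements, dist):
    """Keep a candidate only if its distance to every child placement is
    below every pairwise distance between child placements."""
    pairwise = [dist(a, b) for a, b in combinations(child_placements, 2)]
    if not pairwise:
        # Fewer than two children: there is nothing to compare against.
        return list(candidates)
    threshold = min(pairwise)
    return [
        loc for loc in candidates
        if all(dist(loc, child) < threshold for child in child_placements)
    ]


# Hypothetical distances: children sit at A and B, candidates are X and Y.
table = {
    frozenset(("A", "B")): 60.0,
    frozenset(("X", "A")): 30.0,
    frozenset(("X", "B")): 30.0,
    frozenset(("Y", "A")): 50.0,
    frozenset(("Y", "B")): 70.0,
}


def dist(a, b):
    return 0.0 if a == b else table[frozenset((a, b))]


kept = prune_candidates(["X", "Y"], ["A", "B"], dist)
```

Here X lies between the two children and is kept, while Y is farther from B than A is and is pruned.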
19Latency-Constrained Placement
- Find the configuration satisfying the latency constraint
- Latency constraint
- P: a set of leaf-to-root paths
[Figure: latency-constrained placement example; labels include nodes N4, N5, N7, candidates ci, operators O20 to O22, sources S0 to S2, link latencies 20, 30, 50, and "If l75"]
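A latency check over the path set P can be sketched as follows. The exact form of the constraint is not recoverable from the slide, so this assumes that the summed latency along every leaf-to-root path must not exceed a bound; the names and numbers are hypothetical.

```python
def path_latency(path, placement, dist):
    """Total latency along one leaf-to-root path under a placement."""
    return sum(dist(placement[a], placement[b]) for a, b in zip(path, path[1:]))


def satisfies_bound(paths, placement, dist, bound):
    """True if every path in P stays within the bound (assumed constraint)."""
    return all(path_latency(p, placement, dist) <= bound for p in paths)


# Hypothetical distances between two sources, one inner node, and the proxy.
table = {
    frozenset(("S0", "N")): 20.0,
    frozenset(("S1", "N")): 30.0,
    frozenset(("N", "R")): 40.0,
}


def dist(a, b):
    return 0.0 if a == b else table[frozenset((a, b))]


paths = [["src0", "op", "root"], ["src1", "op", "root"]]
placement = {"src0": "S0", "src1": "S1", "op": "N", "root": "R"}
ok_loose = satisfies_bound(paths, placement, dist, 75.0)
ok_tight = satisfies_bound(paths, placement, dist, 65.0)
```

A placement algorithm could use such a check to reject candidate locations that would push any path over the bound.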
20Contents
- Motivation
- Problem
- Solution Approach
- Central Version of Algorithm
- Distributed Version of Algorithm
- Experiment
- Critique
21Distributed Query Placement
- Reason
- Centralized approach is not scalable
- Requires substantial network state
- High algorithm complexity
22Distributed Query Placement
- Application proxy
- Partition a processing tree into subtrees (zones)
- Assign each zone to a coordinator node
[Figure: processing tree with operators O1 to O4]
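The slides do not say how the tree is cut into zones. Purely as an illustration, the sketch below cuts at a fixed depth: operators above the cut form the top zone, and each subtree rooted at the cut depth becomes its own zone. The strategy, the function name, and the example tree are all assumptions.

```python
def partition_into_zones(tree, root, cut_depth):
    """Split a processing tree into zones keyed by their zone root
    (assumed strategy: cut at a fixed depth)."""
    zones = {root: []}
    stack = [(root, 0, root)]  # (operator, depth, zone root)
    while stack:
        op, depth, zone = stack.pop()
        if depth == cut_depth and op != root:
            # This operator starts a new zone that covers its whole subtree.
            zone = op
            zones[zone] = []
        zones[zone].append(op)
        for child in tree.get(op, []):
            stack.append((child, depth + 1, zone))
    return zones


# Hypothetical processing tree: operator -> list of children.
tree = {"O1": ["O2", "O3"], "O2": ["O4", "O5"], "O3": ["O6"]}
zones = partition_into_zones(tree, "O1", cut_depth=1)
sorted_zones = {zone: sorted(ops) for zone, ops in zones.items()}
```

Each key of `zones` could then be handed to a separate coordinator node, which would place only the operators in its own zone.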
23Distributed Query Placement
Tree Overlay
24Experiment
- Experimental Setup
- Processing Tree
- Binary tree
- Depth 3 5
- Network Topology
- Max pair-wise path delay: 500 ms
- Server and proxy location
- Uniform: APD = ASD
- Star: APD = 0.5 ASD
- Cluster: APD = 2 ASD
- APD: Average Proxy Distance; ASD: Average Server Distance
[Figure: network topology with servers and the proxy position in the Uniform, Cluster, and Star configurations]
25Experiment
- Latency constraints
- 120ms (0.9nd, tight delay) vs. 300ms (2.2nd, loose delay)
- Direct comparison
- Baseline case: all operators are located at the proxy
- Result
- Bandwidth consumption
- Latency stretch
26Critique
- Pros
- Addresses the operator placement problem
- Focuses on network-related cost (BW, latency), not processing cost
- Cons
- High-complexity algorithm: is it practical to apply?
- Heavy processing
- Too much time taken to complete the placement
- Latency information for many locations is needed
- Sequential convergence in a bottom-up manner
- -> impossible to use with a complex query plan topology
- -> a simpler algorithm would be more appropriate
- Dynamic?
- Not resilient to dynamic topology changes
- For example, a node leaving or a latency change