Network-Aware Query Processing for Stream-based Application

1
Network-Aware Query Processing for Stream-based
Application
  • Yanif Ahmad, Ugur Cetintemel
  • Brown University
  • VLDB 2004

2
One-line Comments
  • This paper is addressing the operator placement
    problem in distributed query processing by using
    network latency information

3
Contents
  • Motivation
  • Problem
  • Solution Approach
  • Central Version of Algorithm
  • Edge
  • Edge+
  • In-Network
  • Latency-Constrained
  • Distributed Version of Algorithm
  • Experiment
  • Critique

4
Motivation
  • Small-scale query processing systems -> not scalable
  • Many data stream query requests
  • Widely distributed query processing

5
Problem
  • Operator placement problem
  • Operators in query processing trees should be
    dispersed into the network

[Figure: a processing tree (query plan) of operators O00 to O26 mapped onto nodes of an IP network. Legend: operator, node, application node.]

6
Problem: formalized version
  • Operator placement problem
  • For efficient operator placement
  • Cost: bandwidth

O: operators; A: their connected inputs and outputs; V: nodes; E: their links; c(): link cost (bandwidth)

[Figure: an arc a with cost c(a) between m and n]

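The formalization above can be made concrete with a small sketch. It assumes, as one reading of the slide, that an arc contributes its bandwidth c(a) to the total cost only when its two endpoints are mapped to different nodes. The names `Arc` and `placement_cost` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: str          # producing operator
    dst: str          # consuming operator
    bandwidth: float  # c(a): data rate carried by the arc


def placement_cost(arcs, mapping):
    """Sum c(a) over arcs whose endpoints sit on different nodes (assumed cost model)."""
    return sum(a.bandwidth for a in arcs if mapping[a.src] != mapping[a.dst])


arcs = [Arc("O10", "O00", 5.0), Arc("O11", "O00", 2.0)]
mapping = {"O00": "proxy", "O10": "proxy", "O11": "S1"}
print(placement_cost(arcs, mapping))  # only the O11 -> O00 arc crosses the network
```

Under this model, co-locating two connected operators removes their arc from the total, which is what the placement heuristics on the later slides try to exploit.
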
7
Solution Approach
  • Network-aware operator placement algorithms
  • Edge: consider only the sources and the proxy
    location
  • Edge+: Edge with pairwise server communication
    latencies
  • In-Network: sources, proxy, and a subset of all
    locations
  • Latency-bound algorithm

8
Contents
  • Motivation
  • Problem
  • Solution Approach
  • Central Version of Algorithm
  • Distributed Version of Algorithm
  • Experiment
  • Critique

9
Algorithm Design Principle
  • Naïve algorithm for operator placement
  • Evaluate every combination of possible mappings
  • -> Too complex
  • Greedy algorithm
  • Evaluate only the locations that are likely to be
    good
  • Place operators in post-order
  • When an operator is placed at a location, its
    children can be moved as well

[Figure: a processing tree of operators O00 to O26 with sources S0 and S1, mapped onto an IP network. Legend: operator, node, application node.]

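To see why exhaustive search is too complex and what post-order placement looks like, here is a minimal sketch. The tree encoding (a dict from operator to its children) and both function names are assumptions made for illustration only.

```python
def post_order(tree, root):
    """Yield operators children-first; `tree` maps an operator to its child list."""
    for child in tree.get(root, []):
        yield from post_order(tree, child)
    yield root


def naive_mapping_count(num_operators, num_locations):
    """Number of operator-to-location mappings an exhaustive search must evaluate."""
    return num_locations ** num_operators


tree = {"O00": ["O10", "O11"], "O10": ["O20", "O21"], "O11": ["O22", "O23"]}
order = list(post_order(tree, "O00"))
print(order)                       # children always appear before their parent
print(naive_mapping_count(7, 10))  # 7 operators over 10 locations
```

A greedy algorithm visits each operator once in this order and evaluates only a few candidate locations per operator, instead of the exponential number of full mappings.
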
10
Mapping Function
11
Edge
  • Location candidates: sources, proxy
  • Candidates with high likelihood
  • (1) One of the children's locations
  • (2) A common location
  • (3) The proxy's location
  • Link cost

12
Edge: (1) One of the children's locations
  • A location that maximizes the total tree cost
    between the operator and all of its children

[Figure: processing tree with operators O00, O10 to O12, and O20 to O29, annotated with locations S0, S1, and S2]

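One possible reading of candidate (1) is sketched below: among the children's current locations, pick the one that carries the largest total arc cost to the operator, since placing the operator there turns those arcs into local ones. This interpretation, the input format, and the name `best_child_location` are assumptions, not the paper's definition.

```python
from collections import defaultdict


def best_child_location(children):
    """children: list of (location, bandwidth) pairs, one per child arc.

    Returns the child location whose arcs to the operator carry the most
    bandwidth in total (assumed reading of the slide's criterion).
    """
    total = defaultdict(float)
    for location, bandwidth in children:
        total[location] += bandwidth
    return max(total, key=total.get)


children = [("S0", 3.0), ("S1", 4.0), ("S0", 2.0)]
print(best_child_location(children))  # S0 carries 5.0 in total, S1 only 4.0
```
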
13
Edge: (2) A common location
  • Idea
  • Placing an operator and its children at a common
    location
  • -> zero overlay cost between the operator and its
    children
  • Common location (cl)
  • A good place for all of its children
  • -> the intersection of each child's dl (the set of
    descendant leaf locations)

[Figure: processing tree with operators O00, O10 to O12, and O20 to O29, annotated with locations S0, S1, and S2. Example: dl(O11) = {S0, S1, S2}, cl(O00) = {S0, S1}.]

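The dl and cl sets described on this slide can be computed recursively. The sketch below uses a smaller, hypothetical tree than the one in the figure, chosen so that it reproduces the slide's example values; the function signatures are assumptions.

```python
def dl(tree, leaf_location, op):
    """Set of locations of the leaves below `op` (descendant leaf locations)."""
    children = tree.get(op, [])
    if not children:
        return {leaf_location[op]}
    return set().union(*(dl(tree, leaf_location, c) for c in children))


def cl(tree, leaf_location, op):
    """Common locations of `op`: the intersection of its children's dl sets."""
    child_sets = [dl(tree, leaf_location, c) for c in tree.get(op, [])]
    return set.intersection(*child_sets) if child_sets else set()


# Hypothetical tree: O10 has leaves at S0 and S1; O11 has leaves at S0, S1, S2.
tree = {"O00": ["O10", "O11"], "O10": ["O20", "O21"], "O11": ["O22", "O23", "O24"]}
leaf_location = {"O20": "S0", "O21": "S1", "O22": "S0", "O23": "S1", "O24": "S2"}

print(sorted(dl(tree, leaf_location, "O11")))  # ['S0', 'S1', 'S2']
print(sorted(cl(tree, leaf_location, "O00")))  # ['S0', 'S1']
```

An empty cl means no single source location serves every child, so the common-location candidate is simply unavailable for that operator.
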
14
Edge: (3) The proxy's location
  • Idea
  • If tree costs are higher near the root
  • -> place at the proxy location, r

[Figure: processing tree with operators O00, O10 to O12, and O20 to O29, annotated with locations S0, S1, and S2]

15
Edge Summary
  • Summary

16
Edge+
  • Location candidates: sources, proxy
  • Edge with network latency (d) between two
    locations
  • Link cost
  • Mapping function
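
The link-cost formula for this slide did not survive in the transcript. As a sketch only, the code below assumes that an arc's cost is its bandwidth weighted by the latency d between the two locations it connects, which is one plausible way to bring latency into the cost. The function name and the latency table are hypothetical.

```python
def latency_weighted_cost(arcs, mapping, latency):
    """Total cost with each remote arc weighted by latency (assumed cost model).

    arcs: iterable of (src_operator, dst_operator, bandwidth)
    latency: dict keyed by frozenset({location_a, location_b})
    """
    total = 0.0
    for src, dst, bandwidth in arcs:
        a, b = mapping[src], mapping[dst]
        if a != b:  # co-located operators cost nothing
            total += bandwidth * latency[frozenset((a, b))]
    return total


arcs = [("O10", "O00", 5.0), ("O11", "O00", 2.0)]
mapping = {"O00": "proxy", "O10": "S0", "O11": "S1"}
latency = {frozenset(("S0", "proxy")): 20.0, frozenset(("S1", "proxy")): 50.0}
print(latency_weighted_cost(arcs, mapping, latency))  # 5*20 + 2*50
```
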

17
In-Network Placement
  • Location candidates: arbitrary locations
    (including sources and proxy)
  • Overlay cost and mapping function are the same as
    for Edge+
  • Problem: reducing the candidate location set

18
In-Network Placement
  • Approach
  • Remove the location unless its distance to all
    current child placements is less than all
    pairwise distances between child placements

[Figure: operators O00, O10, O11, and O12 placed among network nodes N2, N4, N7, and N8, with link distances 20, 30, 30, 40, 50, and 60]

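The pruning rule can be sketched as a filter over candidate locations. The code follows a literal reading of the slide: a location survives only if its distance to every current child placement is smaller than every pairwise distance between child placements. The function name, the one-dimensional distance model, and the node names are all hypothetical.

```python
from itertools import combinations


def prune_candidates(candidates, child_placements, d):
    """Keep locations closer to every child than any two children are to each other."""
    pairwise = [d(a, b) for a, b in combinations(child_placements, 2)]
    if not pairwise:  # zero or one child: nothing to compare against
        return list(candidates)
    bound = min(pairwise)
    return [v for v in candidates if all(d(v, p) < bound for p in child_placements)]


# Hypothetical nodes placed on a line; distance is the absolute difference.
position = {"A": 0, "B": 60, "M": 30, "F": 100}


def d(x, y):
    return abs(position[x] - position[y])


print(prune_candidates(["M", "F"], ["A", "B"], d))  # M lies between the children
```

Here the children sit at A and B, 60 apart. M is 30 from each and is kept, while F is 100 from A and is pruned.
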
19
Latency-Constrained Placement
  • Find a configuration that satisfies the latency
    constraint
  • Latency constraint

P: the set of leaf-to-root paths
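[Figure: latency-constrained placement example with operators O20, O21, and O22, sources S0 to S2, candidate nodes N4, N5, and N7, and link latencies 20, 30, 30, and 50; one case is labeled "l 75", presumably a latency bound of 75]
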
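A latency constraint over the set P of leaf-to-root paths can be checked as sketched below. It assumes that a path's latency is the sum of the latencies between consecutive operator placements along it and that every path must stay within the bound. The helper names and the latency values are hypothetical and are not taken from the figure.

```python
def leaf_to_root_paths(tree, root):
    """Enumerate P: every path from a leaf operator up to `root`."""
    children = tree.get(root, [])
    if not children:
        return [[root]]
    return [path + [root] for c in children for path in leaf_to_root_paths(tree, c)]


def path_latency(path, mapping, d):
    """Sum of latencies between consecutive placements along one path."""
    return sum(d(mapping[a], mapping[b]) for a, b in zip(path, path[1:]))


def satisfies_bound(tree, root, mapping, d, bound):
    return all(path_latency(p, mapping, d) <= bound
               for p in leaf_to_root_paths(tree, root))


LATENCY = {frozenset(("S0", "N7")): 50.0, frozenset(("S1", "N7")): 80.0}


def d(a, b):
    return 0.0 if a == b else LATENCY[frozenset((a, b))]


tree = {"O10": ["O20", "O21"]}
mapping = {"O20": "S0", "O21": "S1", "O10": "N7"}
print(satisfies_bound(tree, "O10", mapping, d, 75.0))   # the S1 path takes 80
print(satisfies_bound(tree, "O10", mapping, d, 100.0))
```
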
20
Contents
  • Motivation
  • Problem
  • Solution Approach
  • Central Version of Algorithm
  • Distributed Version of Algorithm
  • Experiment
  • Critique

21
Distributed Query Placement
  • Reason
  • Centralized approach not scalable
  • Substantial network state
  • Algorithm complexity

22
Distributed Query Placement
  • Application proxy
  • Partition a processing tree into subtrees (zones)
  • Assign each zone to a coordinator node

[Figure: processing tree with operators O1 to O4]

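The slide does not say how the tree is cut into zones. As a sketch under that caveat, the code below assumes the zone roots are given and assigns each operator to the nearest zone root at or above it, so every zone is a connected subtree. The function name, the chosen cut points, and the coordinator names are hypothetical.

```python
def partition_into_zones(tree, root, zone_roots):
    """Map each operator to the zone root it falls under.

    tree: dict from operator to its child list
    zone_roots: operators where a new zone starts (the tree root always starts one)
    """
    zone_of = {}

    def visit(op, current):
        if op in zone_roots:
            current = op
        zone_of[op] = current
        for child in tree.get(op, []):
            visit(child, current)

    visit(root, root)
    return zone_of


tree = {"O1": ["O2", "O3"], "O3": ["O4"]}
zones = partition_into_zones(tree, "O1", {"O1", "O3"})
coordinator = {"O1": "N1", "O3": "N2"}  # hypothetical zone-to-coordinator assignment
print({op: coordinator[z] for op, z in zones.items()})
```

Each coordinator would then place only the operators in its own zone, which limits both the network state it needs and the size of the problem it solves.
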
23
Distributed Query Placement
Tree Overlay
24
Experiment
  • Experimental Setup
  • Processing Tree
  • Binary tree
  • Depth: 3 to 5
  • Network Topology
  • Max pairwise path delay: 500 ms
  • Server and proxy location
  • Uniform: APD = ASD
  • Star: APD = 0.5 ASD
  • Cluster: APD = 2 ASD

APD: Average Proxy Distance; ASD: Average Server Distance

[Figure legend: Server; Proxy, Uniform; Proxy, Cluster; Proxy, Star]

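The three proxy configurations are defined by the ratio of APD to ASD. The sketch below assumes that ASD is the mean pairwise distance between servers and that APD is the mean distance from the proxy to the servers; the slide gives only the expansions of the acronyms, so both definitions are assumptions, as are the positions used.

```python
from itertools import combinations
from statistics import mean


def asd(servers, d):
    """Average Server Distance: mean over all server pairs (assumed definition)."""
    return mean(d(a, b) for a, b in combinations(servers, 2))


def apd(proxy, servers, d):
    """Average Proxy Distance: mean proxy-to-server distance (assumed definition)."""
    return mean(d(proxy, s) for s in servers)


# Hypothetical layout on a line, distances in ms.
position = {"s1": 0, "s2": 100, "s3": 200, "proxy": 100}


def d(x, y):
    return abs(position[x] - position[y])


servers = ["s1", "s2", "s3"]
ratio = apd("proxy", servers, d) / asd(servers, d)
print(round(ratio, 2))  # a proxy in the middle of the servers, as in the Star case
```
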
25
Experiment
  • Latency constraints
  • 120 ms (0.9nd, tight delay) vs. 300 ms (2.2nd,
    loose delay)
  • Direct comparison
  • Baseline case: all operators are located at the
    proxy
  • Result

[Charts: bandwidth consumption; latency stretch]

26
Critique
  • Pros
  • Addresses the operator placement problem
  • Focuses on network-related cost (BW, latency)
    rather than processing cost
  • Cons
  • High-complexity algorithm: is it practical to
    apply?
  • Heavy processing
  • Too much time taken to complete the placement
  • Latency information for many locations is needed
  • Sequential convergence in a bottom-up manner
  • -> impossible to use with a complex query plan
    topology
  • -> a simpler algorithm would be more appropriate
  • Dynamic?
  • Not resilient to dynamic topology changes
  • E.g., node departures, latency changes