On Efficient Content Matching in Distributed Pub/Sub Systems

1
On Efficient Content Matching in Distributed Pub/Sub Systems

Weixiong Rao (The Chinese University of HK)
Lei Chen (Hong Kong University of Sci. and Tech.)
Ada Wai-Chee Fu (The Chinese University of HK)

2
Outline

Motivation
Background
  Interval tree: a geometric data structure
  Mercury: a structured P2P network supporting range queries
Cobas framework
  CobasTree
  3 techniques
    Selective multicast
    Interval division
    Merging CobasTrees
Performance study
Conclusion and future work

3
Overview of Content-Based Pub/Sub Systems in a Distributed Environment
Data schema: attribute a in [0,1), attribute b in [1,10]
Content: <a=0.7>, <b=1>
[Figure: publishers publish content to a network of brokers, which disseminate it to subscribers; subscribers register filters (e.g. F1: a > 0.6) with the brokers. Node labels: A, B, C, D, E.]
Unique properties: unlike a traditional network, there is no
specific destination address. Content reaches its destinations
(the subscribers) based on (1) the data content itself and
(2) the subscribers' filtering conditions.

Two key Metrics
Low Communication Cost
Timely Forwarding

4
Our Observations on Content-Based Pub/Sub

Data schema
The dimensionality can be high: more than 10 attributes
Publication content
A set of <attribute, value> pairs over ALL attributes
Subscription filters
Predicates over SEVERAL attributes, NOT necessarily ALL attributes
Dimensionality mismatch
The number of attributes in a filter is NOT necessarily equal
to the number of attributes in the content
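
The dimensionality mismatch above can be illustrated with a minimal Python sketch. It is not taken from the slides: the dictionary representation and the helper name matches are assumptions made only for illustration. A publication carries a value for every schema attribute, while a filter carries predicates for only some of them.

```python
# Hypothetical representation (not from the slides): content is a dict over
# ALL schema attributes; a filter is a dict of predicates over SOME of them.

def matches(content, filter_):
    """Return True if every predicate in the filter holds for the content."""
    return all(
        attr in content and predicate(content[attr])
        for attr, predicate in filter_.items()
    )

# Content from the slides: <a=0.7>, <b=1>
content = {"a": 0.7, "b": 1}

# F1 constrains only attribute a, although the content has two attributes.
f1 = {"a": lambda v: v > 0.6}

# A second, made-up filter that constrains both attributes.
f2 = {"a": lambda v: v < 0.5, "b": lambda v: v >= 1}

print(matches(content, f1))
print(matches(content, f2))
```

F1 matches even though it says nothing about attribute b, which is exactly why the filter dimensionality and the content dimensionality can differ.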

5
Cobas: a pub/sub framework for structured content

Motivation
The indexing structure is important for
structured content-based pub/sub.
Contents and filters follow the predefined data
schema.
Existing approaches
A single multi-dimensional index:
suffers from the curse of dimensionality when the
dimensionality is high → high traffic and latency.
Multiple one-dimensional indexes:
one content copy for each one-dimensional index → high
traffic and latency.

6
Cobas: basic idea
Data schema: attribute a in [0,1), attribute b in [1,10]

Predefined data schema
Publication content
Each content value is treated as a data point.
Subscription filter
Each predicate in a subscription filter is treated as an
interval.
All filters (intervals) are organized in a
geometric data structure (interval tree or
segment tree → CobasTree).
Matching
Matching contents against filters can be treated
as stabbing queries over the geometric data
structure.
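
A minimal Python sketch of the idea above, assuming a brute-force scan rather than the CobasTree itself: each predicate becomes an interval, each content value becomes a point, and matching is a stabbing query. The Interval class and the stab function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """One filter predicate on one attribute, e.g. a > 0.6 over [0, 1)."""
    lo: float
    hi: float
    lo_closed: bool = False
    hi_closed: bool = False
    label: str = ""

    def contains(self, x):
        above = x > self.lo or (self.lo_closed and x == self.lo)
        below = x < self.hi or (self.hi_closed and x == self.hi)
        return above and below

def stab(intervals, x):
    """Stabbing query by linear scan: labels of all intervals containing x."""
    return [iv.label for iv in intervals if iv.contains(x)]

# F1: a > 0.6 over the domain [0, 1) becomes the open interval (0.6, 1).
f1 = Interval(0.6, 1.0, label="F1")
# A made-up second filter, a < 0.5, becomes [0, 0.5).
f0 = Interval(0.0, 0.5, lo_closed=True, label="F0")

print(stab([f1, f0], 0.7))  # content <a=0.7> as a stabbing point
```

The linear scan only shows what a stabbing query returns; a tree structure such as the interval tree or segment tree answers the same query without visiting every interval.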

[Figure: stabbing query. Content <a=0.7> is drawn as the point (0.7, 0.7); filter F1: a > 0.6 is drawn as the interval (0.6, 1). Labels: "2 Intervals", "Stabbing Query".]
7
Cobas: overview

In a P2P-like distributed environment:
A new matching tree structure that borrows the idea
of the interval tree/segment tree
Bottom-up operations → no overloading
3 techniques
Selective multicast → fast matching
Interval division → lower message cost
Merging → lower message cost

9
Background
[Figure labels: U(w), L(w)]
Primary structure: a balanced binary search tree

A segment is decomposed into at most 2 log n node intervals.
The union of all node intervals in each level is
identical.

No redundancy:
An interval [l, u] is registered at the highest
node(s) it fully covers.
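
The registration rule above can be sketched in Python as a textbook segment tree. This is an illustration only and not the CobasTree: the class name, the heap-style node numbering and the half-open slabs [p_i, p_{i+1}) are assumptions. An interval is stored at the highest nodes whose range it fully covers, and a stabbing query collects the labels along one root-to-leaf path.

```python
import bisect

class SegmentTree:
    """Textbook segment tree over the slabs between sorted endpoints."""

    def __init__(self, endpoints):
        self.pts = sorted(set(endpoints))
        self.n = len(self.pts) - 1      # number of elementary slabs
        self.store = {}                 # node id -> labels registered there

    def insert(self, lo, hi, label):
        """Register [lo, hi); lo and hi must be known endpoints with lo < hi."""
        self._insert(1, 0, self.n, self.pts.index(lo), self.pts.index(hi), label)

    def _insert(self, node, a, b, i, j, label):
        # node covers slabs [a, b); the interval covers slabs [i, j)
        if i <= a and b <= j:           # fully covered: register here and stop
            self.store.setdefault(node, []).append(label)
            return
        mid = (a + b) // 2
        if i < mid:
            self._insert(2 * node, a, mid, i, j, label)
        if j > mid:
            self._insert(2 * node + 1, mid, b, i, j, label)

    def stab(self, x):
        """Labels of all registered intervals that contain the point x."""
        k = bisect.bisect_right(self.pts, x) - 1
        if not 0 <= k < self.n:
            return []
        node, a, b = 1, 0, self.n
        found = list(self.store.get(node, []))
        while b - a > 1:
            mid = (a + b) // 2
            if k < mid:
                node, b = 2 * node, mid
            else:
                node, a = 2 * node + 1, mid
            found.extend(self.store.get(node, []))
        return found

tree = SegmentTree([0, 2, 4, 6, 8])
tree.insert(2, 8, "F1")
tree.insert(0, 4, "F2")
print(sorted(tree.stab(3)))
```

In this example F1 = [2, 8) ends up registered at two nodes and F2 = [0, 4) at one, which is consistent with the 2 log n bound stated on the slide.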

10
Background: P2P network and Mercury

P2P network
Supports both exact queries and range queries
Semantic maintenance and load balancing
Mercury

Creates a routing hub for each attribute
O(log² N / k) hops per hub

[Figure labels: Rx, Ry. Figure copied from the Mercury SIGCOMM '04 slides.]
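
A rough Python sketch of a per-attribute routing hub, assuming each node in a hub owns one contiguous range of attribute values (which is what makes range queries possible). The Hub class, its method names and the boundary values are made up for illustration; overlay routing and hop counts are not modelled.

```python
import bisect

class Hub:
    """One routing hub for one attribute (illustrative only).

    Node i is assumed to own the value range [boundaries[i], boundaries[i+1]).
    """

    def __init__(self, attribute, boundaries):
        self.attribute = attribute
        self.boundaries = sorted(boundaries)

    def owner(self, value):
        """Index of the node responsible for an exact value."""
        i = bisect.bisect_right(self.boundaries, value) - 1
        if not 0 <= i < len(self.boundaries) - 1:
            raise ValueError("value outside the attribute domain")
        return i

    def owners_for_range(self, lo, hi):
        """Indexes of all nodes whose ranges overlap the closed range [lo, hi]."""
        return list(range(self.owner(lo), self.owner(hi) + 1))

# Attribute a with domain [0, 1), split evenly over four hypothetical nodes.
hub_a = Hub("a", [0.0, 0.25, 0.5, 0.75, 1.0])
print(hub_a.owner(0.7))
print(hub_a.owners_for_range(0.3, 0.8))
```

An exact query touches a single node, while a range query touches a run of neighbouring nodes in the same hub.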
12
Cobas System overview
13
Cobas: basic operations
[Figure: tree traversal examples. Labels: "overloading", "b 0".]
No overloading, because each leaf node can be the
starting point.
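
A minimal Python sketch of a bottom-up walk, assuming a complete binary tree in heap layout (root is node 1, children of node v are 2v and 2v+1, and n is a power of two). This is a generic iterative segment tree, not the authors' protocol, and the function names are hypothetical. Matching starts at the leaf that owns the content value and moves towards the root, so no query has to enter through the root.

```python
def insert(store, n, i, j, label):
    """Register label on the highest nodes covering leaf slots [i, j)."""
    i += n
    j += n
    while i < j:
        if i & 1:                       # i is a right child: take it, move right
            store.setdefault(i, []).append(label)
            i += 1
        if j & 1:                       # j is a right child: take its left sibling
            j -= 1
            store.setdefault(j, []).append(label)
        i //= 2
        j //= 2

def match_bottom_up(store, n, k):
    """Collect labels on the path from leaf slot k up to the root."""
    node = n + k
    found = []
    while node >= 1:
        found.extend(store.get(node, []))
        node //= 2
    return found

store = {}
insert(store, 4, 1, 4, "F1")   # slots 1..3
insert(store, 4, 0, 2, "F2")   # slots 0..1
print(match_bottom_up(store, 4, 1))
```

Different contents start at different leaves, so the starting load is spread over the leaves instead of being concentrated at one node.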
14
Cobas: selective multicast
[Figure: two forwarding examples (label "b 0"), annotated "2 copies, 2 units of latency" and "2 copies, 1 unit of latency".]
15
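Cobas: interval division
[Figure, before division: F2 is registered as [2,4) and [0,2); label "b 0"; 1 copy, 1 unit of latency.]
[Figure, after division: F2 is registered as [2,4), [0,1) and [1,2); label "b 0"; local matching in node 0 with no copies.]
Trade-off: network traffic vs. maintenance cost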
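
The trade-off can be sketched in Python under the same heap-layout assumption (children of node v are 2v and 2v+1). The function divide is a hypothetical helper, not the authors' algorithm: it pushes the filters registered at a node down to its two children, which brings them closer to the leaves where matching starts but increases the number of registrations to maintain.

```python
def divide(store, node):
    """Move all labels registered at `node` down to its two children.

    Returns the number of extra registrations created (maintenance cost).
    """
    labels = store.pop(node, [])
    for child in (2 * node, 2 * node + 1):
        store.setdefault(child, []).extend(labels)
    return len(labels)

# Four leaf slots: node 2 covers [0,2), node 3 covers [2,4),
# node 4 covers [0,1) and node 5 covers [1,2).
store = {2: ["F2"], 3: ["F2"]}      # F2 registered as [0,2) and [2,4)
extra = divide(store, 2)            # [0,2) becomes [0,1) and [1,2)
print(store, extra)
```

After the division a content that starts at the leaf for [0,1) finds F2 locally, at the price of one more registered interval for F2.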
16
Cobas: merging
Data schema: attribute I in [0,1), attribute J in
[1,100], attribute K in [0,50]

Before merging, there are d one-dimensional
CobasTrees.
A content is forwarded to these one-dimensional
CobasTrees with d copies.
To reduce the number of copies, we reinsert the filters of some
CobasTrees into other CobasTrees.

[Figure: CobasTrees for attributes I, J and K.
Filters having attributes I and J, e.g. Fij: I > 0.1, J < 5.
Filters having attributes K and J, e.g. Fjk: 30 > K > 20, J > 10.
Filters having only attribute J, e.g. Fj: J = 10, also shown as Fj: 1 > I > 0, J = 10, where 1 > I > 0 is the domain range of attribute I.]
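
A minimal Python sketch of reinsertion, assuming that a filter without a predicate on the target attribute is extended with that attribute's full domain range, as the Fj example on this slide suggests. The DOMAIN table, the (low, high) tuple encoding and the function reinsert are assumptions made for illustration, not the authors' implementation.

```python
# Domain ranges from the slide's data schema, encoded as (low, high) pairs.
DOMAIN = {"I": (0.0, 1.0), "J": (1.0, 100.0), "K": (0.0, 50.0)}

def reinsert(filter_, target):
    """Return a copy of the filter that can live in the tree of `target`.

    If the filter has no predicate on `target`, the attribute's whole
    domain range is used, so the filter matches any value of `target`.
    """
    merged = dict(filter_)
    merged.setdefault(target, DOMAIN[target])
    return merged

fj = {"J": (10.0, 10.0)}             # Fj: J = 10, a filter on J only
fij = {"I": (0.1, 1.0), "J": (1.0, 5.0)}   # Fij: I > 0.1, J < 5

print(reinsert(fj, "I"))
print(reinsert(fij, "I"))
```

Once every filter mentioning J also has an interval on I, a content no longer needs a separate copy for the J tree, which is the saving that merging aims at.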
18
Simulation Results
[Figure: workload settings for the publisher (content) side and the subscriber (filter) side. Labels: "high popular", "high popular dense", "high dense", "co-occurrence", "no co-occurrence".]
19
Cobas Experiment

Cobas without merging: too many content copies
(D).
Cobas without division: too many long intervals →
high load at the root → many nodes
responsible for such intervals.
DRTree: suffers from the curse of high
dimensionality.

20
Simulation Results

Count: uses D P2P instances, so caching in Count
may achieve the largest lookup reduction, but it still
has the highest traffic.
RTree: uses only 1 P2P instance and has the fewest
lookup messages.
Cobas with caching: in between RTree and Count.

Caching is useful for reducing the lookup hops in
the Mercury P2P network.
21
Simulation Results

The distribution of storage load is relatively
even, all inside the range [lower, upper].
The matching load of a very few pnodes is high;
most are inside the balancing range [lower, upper].
Without caching, the load is unbalanced.

Maintenance cost is higher when
more nodes fail
more filters are inserted or deleted

22
PlanetLab Results

The latency of Cobas may decrease over time.
The latency of RTree and Count is relatively high.

Content forwarding cost, Cost(C), may decrease due
to merging CobasTrees and interval division.
Filter maintenance cost, Cost(F), may increase due
to dividing more intervals.

23
Conclusion and Future Work