Title: On Efficient Content Matching in Distributed Pub/Sub Systems
1. On Efficient Content Matching in Distributed Pub/Sub Systems
- Weixiong Rao (The Chinese University of HK)
- Lei Chen (Hong Kong University of Sci. and Tech.)
- Ada Wai-Chee Fu (The Chinese University of HK)
2. Outline
- Motivation
- Background
  - Interval tree: a geometric data structure
  - Mercury: a structured P2P network supporting range queries
- Cobas framework
  - CobasTree
  - 3 techniques
    - Selective multicast
    - Interval division
    - Merging CobasTrees
- Performance study
- Conclusion and future work
3. Overview of Content-Based Pub/Sub Systems in a Distributed Environment
Data schema: attribute a in [0, 1), attribute b in [1, 10]
Content: <a = 0.7>, <b = 1>
Filter F1: a > 0.6
[Figure: publishers publish content into a network of brokers, subscribers register filters such as F1 with the brokers, and the brokers disseminate content to the subscribers. Nodes are labeled A to E.]
Unique properties: unlike in a traditional network, content carries no specific destination address. It reaches its destinations (the subscribers) based on (1) its data content and (2) the subscribers' filtering conditions.
- Two key metrics
  - Low communication cost
  - Timely forwarding
4. Our Observations on Content-Based Pub/Sub
- Data schema
  - The dimensionality can be high: more than 10 attributes
- Publication content
  - A set of <attribute, value> pairs over ALL attributes
- Subscription filters
  - Predicates over SEVERAL attributes, NOT necessarily ALL attributes
- Dimensionality mismatch
  - The number of attributes in a filter is NOT necessarily equal to the number in a content
5. Cobas: A Pub/Sub Framework for Structured Contents
- Motivation
  - The indexing structure is important for structured content-based pub/sub.
  - Contents and filters follow a predefined data schema.
- Existing approaches
  - A multi-dimensional index
    - Suffers from the curse of dimensionality when the dimensionality is high → high traffic and latency
  - Multiple one-dimensional indexes
    - One copy for each one-dimensional index → high traffic and latency
6. Cobas: Basic Idea
Data schema: attribute a in [0, 1), attribute b in [1, 10]
- Predefined data schema
- Publication content
  - Each content value is treated as a data point.
- Subscription filter
  - Each predicate in a subscription filter is treated as an interval.
  - All filters (intervals) are organized in a geometric data structure (interval tree or segment tree → CobasTree).
- Matching
  - Matching contents against filters can be treated as stabbing queries over the geometric data structure.
Example:
- Content <a = 0.7> → point (0.7, 0.7)
- Filter F1: a > 0.6 → interval (0.6, 1)
[Figure: a stabbing query with the point against 2 intervals]
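The basic idea above can be sketched in a few lines. This is a minimal illustration and not the CobasTree itself: `Interval`, `stab`, and `match` are hypothetical names, the intervals of each attribute are kept in a plain list instead of a tree, and each filter is assumed to have at most one predicate per attribute. A filter matches when the content point stabs every one of its intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One predicate of a filter, seen as an interval (hypothetical type)."""
    filter_id: str
    low: float
    high: float
    low_open: bool = False
    high_open: bool = False

    def contains(self, x: float) -> bool:
        above = x > self.low if self.low_open else x >= self.low
        below = x < self.high if self.high_open else x <= self.high
        return above and below


def stab(intervals, x):
    """Stabbing query by linear scan: ids of all intervals that contain x."""
    return {iv.filter_id for iv in intervals if iv.contains(x)}


def match(index, predicate_count, content):
    """Return the filters whose predicates are all satisfied by the content."""
    hits = {}
    for attr, value in content.items():
        for fid in stab(index.get(attr, []), value):
            hits[fid] = hits.get(fid, 0) + 1
    return {fid for fid, n in hits.items() if n == predicate_count[fid]}


# F1: a > 0.6 on attribute a in [0, 1), i.e. the open interval (0.6, 1).
index = {"a": [Interval("F1", 0.6, 1.0, low_open=True, high_open=True)]}
predicate_count = {"F1": 1}

matched = match(index, predicate_count, {"a": 0.7, "b": 1})
unmatched = match(index, predicate_count, {"a": 0.5, "b": 1})
```

Here the content has values for all attributes while F1 constrains only attribute a, which is the dimensionality mismatch noted earlier: attributes without a predicate simply contribute no hits.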
7. Cobas: Overview
- In a P2P-like distributed environment
  - A new matching tree structure that borrows ideas from the interval tree and the segment tree
  - Bottom-up operations → no overloading
- 3 techniques
  - Selective multicast → fast matching
  - Interval division → lower message cost
  - Merging → lower message cost
9. Background
[Figure: tree whose nodes w are annotated with U(w) and L(w)]
Primary structure: a balanced binary search tree
- A segment is decomposed into at most 2 log n node intervals
- The union of all node intervals at each level is identical
- No redundancy
- An interval [l, u] is registered at the highest node it covers
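The properties listed above can be illustrated with a small segment tree. This is a sketch under simplifying assumptions, not the structure from the slides: the class name `SegmentTree` and its methods are hypothetical, the domain is the integer range [0, n) with n a power of two, intervals are half-open, and the tree is stored in heap layout (root at index 1, leaves at indices n to 2n-1).

```python
import math


class SegmentTree:
    """Segment tree over the integer domain [0, n), n a power of two."""

    def __init__(self, n):
        assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
        self.n = n
        # Heap layout: node 1 is the root, leaves are nodes n .. 2n-1.
        self.registered = [set() for _ in range(2 * n)]

    def insert(self, fid, low, high):
        """Register [low, high) at the highest nodes whose ranges it covers."""
        lo, hi = low + self.n, high + self.n
        while lo < hi:
            if lo & 1:                 # lo is a right child: take it, move on
                self.registered[lo].add(fid)
                lo += 1
            if hi & 1:                 # hi - 1 is a left child: take it
                hi -= 1
                self.registered[hi].add(fid)
            lo >>= 1
            hi >>= 1

    def stab(self, x):
        """Stabbing query: walk bottom-up from the leaf of x to the root."""
        node, found = x + self.n, set()
        while node >= 1:
            found |= self.registered[node]
            node >>= 1
        return found

    def copies(self, fid):
        """Number of nodes at which the filter is registered."""
        return sum(fid in s for s in self.registered)


tree = SegmentTree(8)
tree.insert("F1", 1, 7)   # decomposed into [1,2), [2,4), [4,6), [6,7)
tree.insert("F2", 0, 8)   # covers the whole domain: registered at the root

bound = 2 * math.log2(tree.n)
```

With n = 8, F1 is registered at 4 nodes, within the 2 log n bound of 6, and F2 at a single node (the root). A query for x = 3 visits only the leaf of 3 and its ancestors.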
10. Background: P2P Network and Mercury
- P2P network
  - Supports both exact queries and range queries
  - Semantic maintenance and load balancing
- Mercury
  - Creates a routing hub for each attribute
  - O((log^2 N) / k) hops per hub
[Figure with labels Rx and Ry, copied from the Mercury SIGCOMM '04 slides]
12. Cobas: System Overview
13. Cobas: Basic Operations
[Figure: matching content b = 0 in the tree; one case is labeled "overloading"]
No overloading occurs because each leaf node can be the starting point.
14. Cobas: Selective Multicast
[Figure: two ways of forwarding content b = 0, compared by message copies and latency]
- 2 copies, 2 units of latency
- 2 copies, 1 unit of latency
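The two cost figures above can be reproduced with a small model. It is an interpretation, not the algorithm from the slides: it assumes that without selective multicast a content climbs hop by hop from its leaf, and that with selective multicast the leaf sends the content in parallel, directly to the ancestors that hold registered filters. All function names are hypothetical, the tree is in heap layout (root at index 1, leaves at n to 2n-1), and each message is assumed to cost one copy and one unit of latency.

```python
def ancestors(leaf, n):
    """Heap indices of the ancestors of a leaf, nearest first (root is 1)."""
    node, path = (leaf + n) >> 1, []
    while node >= 1:
        path.append(node)
        node >>= 1
    return path


def hop_by_hop(leaf, n, registered):
    """Forward leaf -> parent -> ... up to the highest node holding filters."""
    path = ancestors(leaf, n)
    holders = [i for i, node in enumerate(path) if registered.get(node)]
    hops = holders[-1] + 1 if holders else 0
    return hops, hops          # (copies, latency): sequential hops


def selective_multicast(leaf, n, registered):
    """Send directly, in parallel, to the ancestors that hold filters."""
    targets = [node for node in ancestors(leaf, n) if registered.get(node)]
    return len(targets), (1 if targets else 0)   # (copies, latency)


# Domain with 4 leaves; content b = 0 arrives at leaf 0.
# Assumed registrations: F1 at node 2 (the parent), F2 at node 1 (the root).
registered = {2: {"F1"}, 1: {"F2"}}

sequential = hop_by_hop(0, 4, registered)
parallel = selective_multicast(0, 4, registered)
```

Under these assumptions both variants send 2 copies, but the sequential climb takes 2 units of latency and the parallel send takes 1. In this model the parallel variant also skips ancestors with no registered filters.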
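15. Cobas: Interval Division
[Figure: content b = 0 matched against filter F2 before and after interval division]
- Before division: F2 is registered as [2,4) and [0,2) → 1 copy, 1 unit of latency
- After division: F2 is registered as [2,4), [0,1), and [1,2) → local matching in node 0 with no copies
- Trade-off: network traffic vs. maintenance cost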
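The trade-off can be made concrete with a short sketch. The helper names `divide` and `forwarding_copies` are hypothetical, and the model rests on assumptions: the tree is in heap layout over the integer domain [0, 8) (root at index 1, leaves at indices 8 to 15), a content needs one copy for each ancestor of its leaf that holds filters, and every registered piece counts as one unit of maintenance.

```python
def divide(registered, node):
    """Push every filter registered at `node` down to its two children."""
    moved = registered.pop(node, set())
    for child in (2 * node, 2 * node + 1):
        registered.setdefault(child, set()).update(moved)


def forwarding_copies(registered, leaf_node):
    """Copies needed for a content at a leaf: one per non-empty ancestor."""
    copies, node = 0, leaf_node >> 1
    while node >= 1:
        copies += 1 if registered.get(node) else 0
        node >>= 1
    return copies


def stored_pieces(registered):
    """Total number of registered interval pieces (maintenance proxy)."""
    return sum(len(s) for s in registered.values())


# F2 is registered as [0,2) at node 4 and as [2,4) at node 5.
registered = {4: {"F2"}, 5: {"F2"}}
leaf_of_b0 = 8                      # content b = 0 arrives at leaf [0,1)

copies_before = forwarding_copies(registered, leaf_of_b0)
pieces_before = stored_pieces(registered)

divide(registered, 4)               # [0,2) becomes [0,1) and [1,2)

copies_after = forwarding_copies(registered, leaf_of_b0)
pieces_after = stored_pieces(registered)
```

After the division, the piece [0,1) sits on the leaf where b = 0 arrives, so matching is local and no copy is sent. The price is one more registered piece to maintain.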
16. Cobas: Merging
Data schema: attribute I in [0, 1), attribute J in [1, 100], attribute K in [0, 50]
- Before merging, there are d one-dimensional CobasTrees.
- A content is forwarded to these one-dimensional CobasTrees with d copies.
- Thus we need to reinsert the filters of some CobasTrees into other CobasTrees.
[Figure: CobasTrees for attributes I, J, and K]
Example filters:
- Fij: I > 0.1, J < 5 (filters having attributes I and J)
- Fjk: 30 > K > 20, J > 10 (filters having attributes K and J)
- Fj: J = 10 (filters having only attribute J)
- Fj: 1 > I > 0, J = 10 (1 > I > 0 is the domain range of attribute I)
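The Fj example suggests how a filter without a predicate on attribute I can still be placed in the CobasTree of I: it is given a predicate that spans the whole domain range of I. A minimal sketch of that padding step follows. The function name `interval_in_tree` and the dictionary layout are hypothetical, intervals are plain (low, high) pairs, and endpoint openness is ignored for brevity.

```python
def interval_in_tree(predicates, target_attr, schema):
    """Interval under which a filter is registered in target_attr's tree.

    If the filter has no predicate on target_attr, it is padded with the
    full domain range of that attribute (assumption based on the Fj example).
    """
    return predicates.get(target_attr, schema[target_attr])


# Domain ranges from the data schema on this slide.
schema = {"I": (0.0, 1.0), "J": (1.0, 100.0), "K": (0.0, 50.0)}

# Example filters from this slide, one interval per constrained attribute.
filters = {
    "Fij": {"I": (0.1, 1.0), "J": (1.0, 5.0)},
    "Fjk": {"K": (20.0, 30.0), "J": (10.0, 100.0)},
    "Fj": {"J": (10.0, 10.0)},
}

# Reinsert every filter into the tree of attribute I.
in_tree_I = {
    fid: interval_in_tree(preds, "I", schema) for fid, preds in filters.items()
}
```

Fij keeps its own interval on I, while Fj and Fjk are registered over the whole domain of I. Long padded intervals like these tend to be registered near the root of the tree.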
18. Simulation Results
[Figure: workload settings on the publisher (content) side and the subscriber (filter) side; labels: high popular, high popular dense, high dense, co-occurrence, no co-occurrence]
19. Cobas Experiment
- Cobas without merging: too many content copies (D)
- Cobas without division: too many long intervals → high load at the root → many nodes responsible for such intervals
- DRTree: suffers from the curse of high dimensionality
20. Simulation Results
- Count: uses D P2P instances, so caching in Count may achieve the largest lookup reduction, but Count still has the highest traffic
- RTree: uses only 1 P2P instance and has the fewest lookup messages
- Cobas with caching: in between RTree and Count
Caching is useful for reducing the lookup hops in the Mercury P2P network.
21. Simulation Results
- The distribution of storage load is relatively even, all within the upper and lower bounds
- The matching load of a very few pnodes is high; most are within the balancing range between the upper and lower bounds
- Without caching, the load is unbalanced.
- More maintenance cost when
  - more nodes fail
  - more filters are inserted or deleted
22. PlanetLab Results
- The latency of Cobas may decrease over time
- The latency of RTree and Count is relatively high.
- Content forwarding cost, Cost(C), may decrease due to merging CobasTrees and interval division
- Filter maintenance cost, Cost(F), may increase due to dividing more intervals
23. Conclusion and Future Work
- Cobas framework
  - A new data structure: CobasTree
  - 3 techniques
    - Selective multicast
    - Interval division
    - CobasTree merging
- Future work
  - Selectivity filtering → stateful filtering
  - A single publication source → multiple sources