Title: Sven Bittner, 28 November 2006
1 Talk at the 3rd International Middleware Doctoral Symposium (MDS 2006)
Supporting Arbitrary Boolean Subscriptions in Distributed Publish/Subscribe Systems
- Sven Bittner, 28 November 2006
- Department of Computer Science
- The University of Waikato, New Zealand
This research is partially funded by the NZ
Government under the New Zealand International
Doctoral Research Scholarships (NZIDRS) programme.
2 Structure of Talk
- Motivation: Publish/Subscribe
- Problem Description
- Filtering in Central Components
- Routing in the Distributed System
- Summary and Outlook
Sven Bittner Supporting Arbitrary Boolean
Subscriptions in Distributed Pub/Sub Systems
3 Publish/Subscribe Systems
[Figure: pub/sub system in which a network of brokers (B1 to B9) connects publishers and subscribers]
Motivation Problem Definition Central
Filtering Routing Optimizations Summary
4 Messages and Subscriptions
- Event messages
- Describe a state change/real-world event
- Attribute-value pairs
- Subscriptions
- Describe interests
- Arbitrary Boolean combination of predicates
[Figure: example subscription for an online auction with predicates on title (Harry Potter), endingWithin (6 hours), condition (new), and price (15.00)]
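The message and subscription model above can be illustrated with a minimal sketch. The tuple encoding of subscription trees, the `matches` helper, and the comparison operators chosen for the example are assumptions made for illustration; they are not taken from the talk.

```python
import operator

# Hypothetical encoding: inner nodes are ("and" | "or" | "not", child, ...),
# leaves are predicates ("pred", attribute, op, constant).
OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}

def matches(node, event):
    """Return True if the event (attribute-value pairs) fulfils the subscription tree."""
    kind = node[0]
    if kind == "pred":
        _, attribute, op, constant = node
        # A predicate on an attribute that the event lacks is not fulfilled.
        return attribute in event and OPS[op](event[attribute], constant)
    if kind == "and":
        return all(matches(child, event) for child in node[1:])
    if kind == "or":
        return any(matches(child, event) for child in node[1:])
    if kind == "not":
        return not matches(node[1], event)
    raise ValueError(f"unknown node type: {kind}")

# Illustrative subscription; the operators and the tree shape are assumed.
subscription = (
    "and",
    ("pred", "title", "=", "Harry Potter"),
    ("or",
     ("pred", "endingWithin", "<", 6),            # hours
     ("and",
      ("pred", "condition", "=", "new"),
      ("pred", "price", "<", 15.00))),
)

# An event message is a set of attribute-value pairs.
event = {"title": "Harry Potter", "condition": "new",
         "price": 12.50, "endingWithin": 30}

print(matches(subscription, event))
```

The subscription is kept as a tree rather than flattened, so the disjunction is evaluated directly.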
5 Context: Filtering
- Filtering algorithm
- Determination of all subscriptions matching an incoming event message (messages not stored)
- Indexing of subscriptions and predicates
- Support of required subscription language (Boolean)
6 Context: Routing
- Routing algorithm
- Determination of all brokers with matching subscriptions
- Distribution of subscriptions to build event routing tables
- Subscriptions as routing entries (where to route messages)
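As a rough sketch of subscriptions acting as routing entries, the following assumes a routing table that maps each neighbor broker to the subscriptions registered through it. The table layout, the `next_hops` helper, and the example predicates are hypothetical.

```python
def next_hops(routing_table, event):
    """Return the neighbor brokers that hold at least one matching routing entry."""
    return sorted(
        neighbor
        for neighbor, entries in routing_table.items()
        if any(entry(event) for entry in entries)
    )

# Routing entries are modelled as plain predicates over an event (an assumption);
# each one stands for a subscription that was distributed through the network.
routing_table = {
    "B2": [lambda e: e.get("title") == "Harry Potter"],
    "B3": [lambda e: e.get("condition") == "new"
           and e.get("price", float("inf")) < 15.00],
}

event = {"title": "Harry Potter", "condition": "used", "price": 9.00}
print(next_hops(routing_table, event))  # ['B2']
```

The event is forwarded only toward B2, because no entry registered for B3 matches it.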
7 Context: Routing Optimization
- Optimization goal
- Improvement of routing process, e.g.,
- Higher throughput
- Less memory for routing tables
- Manipulation of routing entries
[Figure: routing table with brokers B2 and B3 and subscriptions S1 and S9]
8 Structure of Talk
- Motivation: Publish/Subscribe
- Problem Description
- Filtering in Central Components
- Routing in the Distributed System
- Summary and Outlook
9 Current Approaches (1)
- Observations
- Current systems only support conjunctive subscriptions
- Restrictions exploited in
- Filtering algorithms
- Routing optimizations
- → No consideration of other operators in subscriptions
10 Current Approaches (2)
- Motivation
- Arbitrary Boolean subscriptions can be converted to DNF (exponential in size)
- Every conjunction is handled as a separate subscription
- → Approach as in database management systems (conversion of query restrictions)
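The exponential growth under DNF conversion can be made concrete with a small sketch. It assumes negation-free and/or trees encoded as tuples; `to_dnf` and `and_of_ors` are hypothetical helper names, and the input is a deliberately bad case (a conjunction of n two-way disjunctions).

```python
from itertools import product

def to_dnf(node):
    """Convert an and/or tree (no negation) into a list of conjunctions.

    Each conjunction is a frozenset of predicate names.
    """
    kind = node[0]
    if kind == "pred":
        return [frozenset([node[1]])]
    child_dnfs = [to_dnf(child) for child in node[1:]]
    if kind == "or":
        # A disjunction keeps the conjunctions of all children side by side.
        return [conj for dnf in child_dnfs for conj in dnf]
    if kind == "and":
        # A conjunction needs the cross product: one conjunction per child, merged.
        return [frozenset().union(*combo) for combo in product(*child_dnfs)]
    raise ValueError(f"unknown node type: {kind}")

def and_of_ors(n):
    """Build (p0 or q0) and (p1 or q1) and ... with n disjunctions."""
    return ("and", *[("or", ("pred", f"p{i}"), ("pred", f"q{i}")) for i in range(n)])

for n in (1, 2, 4, 8):
    print(n, "disjunctions ->", len(to_dnf(and_of_ors(n))), "conjunctions")
```

In this construction a tree with 2n predicates turns into 2^n conjunctions, each of which would be registered as a separate subscription.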
11 Publish/Subscribe vs. DBMS (1)
- Important differences
- Number of simultaneous data requests
- DBMSs: relatively small number of queries
- Pub/sub systems: large number of subscriptions
- → Even higher load after conversion
- Query processing
- DBMSs: query optimization on canonical form based on known data (access plans, cost estimation, etc.)
- Pub/sub systems: events are unknown, no optimization applied
12 Publish/Subscribe vs. DBMS (2)
- Considering data storage
- Subscriptions ≠ queries
- Data (base) ↔ subscription (base)
- Queries ↔ event messages
- Messages are in canonical form (attribute-value pairs)
- So, why convert subscriptions as well?
- → Questionable whether to take the conversion approach in pub/sub (problem size explosion)
13 Hypothesis
- The internal support of arbitrary Boolean subscriptions reduces the memory requirements compared to current conjunctive solutions without degrading the system efficiency.
14 Steps to Take
- Development and analysis of
- Filtering algorithm (central broker components)
- Routing optimization (distributed system)
15 Structure of Talk
- Motivation: Publish/Subscribe
- Problem Description
- Filtering in Central Components
- Routing in the Distributed System
- Summary and Outlook
16 Steps Undertaken (1)
- 1. Application scenario analysis [BH06b]: online auctions
- Analysis of distributions on eBay
- Identification of typical subscription classes
- Semi-realistic data set
- Used in later analysis
17 Steps Undertaken (2)
- 2. Filtering algorithm for arbitrary Boolean subscriptions [BH05a]
- Generic solution
- Extends general-purpose conjunctive counting algorithm [YGM94, AJL02]
- Filters on conjunctions the same way as the counting approach
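For orientation, the conjunctive counting idea that the Boolean algorithm builds on can be sketched as follows. This is only the conjunctive baseline, restricted to equality predicates; the class and method names are hypothetical, and the sketch reproduces neither the cited algorithms in detail nor the Boolean extension from [BH05a].

```python
from collections import defaultdict

class CountingFilter:
    """Counting-style matcher for conjunctions of equality predicates (simplified)."""

    def __init__(self):
        self.pred_to_subs = defaultdict(list)  # (attribute, value) -> subscription ids
        self.required = {}                     # subscription id -> number of predicates

    def subscribe(self, sub_id, conjunction):
        """Register a conjunction given as an iterable of (attribute, value) pairs."""
        predicates = set(conjunction)
        self.required[sub_id] = len(predicates)
        for pred in predicates:
            self.pred_to_subs[pred].append(sub_id)

    def match(self, event):
        """Return ids of all subscriptions whose predicates are all fulfilled."""
        hits = defaultdict(int)
        # Step 1: each attribute-value pair of the event is a candidate predicate.
        for pred in event.items():
            # Step 2: count one hit per subscription that contains this predicate.
            for sub_id in self.pred_to_subs.get(pred, ()):
                hits[sub_id] += 1
        # Step 3: a conjunction matches when hit count equals predicate count.
        return sorted(s for s, n in hits.items() if n == self.required[s])

f = CountingFilter()
f.subscribe("S1", [("title", "Harry Potter"), ("condition", "new")])
f.subscribe("S2", [("title", "Harry Potter")])
f.subscribe("S3", [("condition", "used")])
print(f.match({"title": "Harry Potter", "condition": "new"}))  # ['S1', 'S2']
```

Because matching reduces to comparing counters, a subscription containing a disjunction has to be split into several conjunctions before it can be registered here.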
18 Steps Undertaken (3)
- 3. Characterization scheme and memory analysis [BH05b]
- Description of subscription patterns
- Analysis of counting, cluster [HCKW90, FJL01], and Boolean approach
- Determination of the point where the Boolean approach requires less memory
- Even a single disjunction might favor the Boolean approach
19 Steps Undertaken (4)
- 4. Practical verification/efficiency analysis
- Confirmation of theoretical results
- Efficiency is similar to counting approach
- Summary: Boolean solution
- More space-efficient filtering
- Similar time efficiency properties
- Scheme helps with the decision between Boolean and conjunctive algorithm
20 Structure of Talk
- Motivation: Publish/Subscribe
- Problem Description
- Filtering in Central Components
- Routing in the Distributed System
- Summary and Outlook
21 Subscription Pruning (1)
- Remove parts of subscription trees
- → Creates more general subscription
22 Subscription Pruning (2)
- Less complex (time and space) subscriptions (+)
- More events forwarded (-)
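The trade-off of pruning can be illustrated with a minimal sketch that removes one child of a conjunction. The tuple encoding and the helper names `matches` and `prune_and_child` are assumptions; the sketch covers only this one kind of pruning step and no selection heuristic.

```python
def matches(node, event):
    """Evaluate an and/or tree of equality predicates against an event."""
    kind = node[0]
    if kind == "pred":
        _, attribute, value = node
        return event.get(attribute) == value
    if kind == "and":
        return all(matches(child, event) for child in node[1:])
    if kind == "or":
        return any(matches(child, event) for child in node[1:])
    raise ValueError(f"unknown node type: {kind}")

def prune_and_child(node, index):
    """Drop one child of a conjunction; the result accepts a superset of events."""
    if node[0] != "and":
        raise ValueError("this sketch only prunes children of 'and' nodes")
    children = node[1:]
    remaining = children[:index] + children[index + 1:]
    # A conjunction with a single remaining child collapses to that child.
    return remaining[0] if len(remaining) == 1 else ("and", *remaining)

original = ("and",
            ("pred", "title", "Harry Potter"),
            ("or", ("pred", "condition", "new"), ("pred", "format", "hardcover")))
pruned = prune_and_child(original, 1)   # now only the title predicate remains

wanted = {"title": "Harry Potter", "condition": "new"}
unwanted = {"title": "Harry Potter", "condition": "used"}
print(matches(original, wanted), matches(pruned, wanted))      # True True
print(matches(original, unwanted), matches(pruned, unwanted))  # False True
```

The pruned entry is smaller and cheaper to evaluate, but it now also forwards the second event, which the original subscription rejects.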
23 Application of Pruning (1)
- Pruning of routing entries
24 Application of Pruning (2)
- Pruning of routing entries
- No pruning in local broker → ensures correct filtering
25 Application of Pruning (3)
- Pruning of routing entries
[Figure: broker network with the subscriber]
- Less complex subscriptions
- More time- and space-efficient routing
26 Application of Pruning (4)
- Pruning of routing entries
[Figure: broker network with the subscriber]
- But more general subscriptions
- More forwarded event messages (false positives)
- More event messages to route/process
27 Practical Pruning
- Question
- Which subscription, and which part of its subscription tree, should be pruned first?
- Answer
- Four heuristics [BH06c] based on influence on
- Memory usage
- Filter efficiency
- Network load (selectivity)
- Network load (selectivity and popularity)
28 Experiments (In Progress)
- Evaluation in online auction setting
- Different distributions in subscriptions
- Applicability to conjunctive subscriptions
- Comparison to conjunctive routing optimization
29 Structure of Talk
- Motivation: Publish/Subscribe
- Problem Description
- Filtering in Central Components
- Routing in the Distributed System
- Summary and Outlook
30 Summary
- Motivation
- Publish/subscribe systems
- Online auction scenario
- Need for arbitrary Boolean subscriptions
- Problem definition
- Only conjunctions supported
- Conversion adopted from DBMSs
- Does conversion make sense?
- Hypothesis: No, if disjunctions occur!
31 Summary
- Central broker components
- Boolean filtering algorithm
- Characterization scheme
- Analysis, comparison, and verification
- → Boolean approach is favorable
- Distributed system
- Novel optimization: subscription pruning
- First experiments: valuable optimization
- Memory requirements ↓, throughput ↑
32 Future Work
- Future work
- Writing up
- Finish experiments
- Heuristic based on multicriteria optimization
- Future work (not within PhD)
- Analyze other applications
- Optimize solutions
- Open source prototype
33 Thank you for your attention!
Selected further reading
- [BH05a] S. Bittner and A. Hinze. On the Benefits of Non-Canonical Filtering in Publish/Subscribe Systems. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW '05), Columbus, USA, June 2005.
- [BH05b] S. Bittner and A. Hinze. A Detailed Investigation of Memory Requirements for Pub/Sub Filtering Algorithms. In Proceedings of the 13th International Conference on Cooperative Information Systems (CoopIS 2005), Agia Napa, Cyprus, 31 October-4 November 2005.
- [BH06a] S. Bittner and A. Hinze. Pruning Subscriptions in Distributed Pub/Sub Systems. In Proceedings of the 29th Australasian Computer Science Conference (ACSC 2006), Hobart, Australia, 16-19 January 2006.
- [BH06b] S. Bittner and A. Hinze. Event Distributions in Online Book Auctions. Technical Report 03/2006, Computer Science Department, Waikato University, New Zealand, February 2006.
- [BH06c] S. Bittner and A. Hinze. Dimension-Based Subscription Pruning for Publish/Subscribe Systems. In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW '06), Lisbon, Portugal, July 2006.
- [BH06d] S. Bittner and A. Hinze. Optimizing Pub/Sub Systems by Advertisement Pruning. In Proceedings of the 8th International Symposium on Distributed Objects and Applications (DOA 2006), Montpellier, France, 30 October-1 November 2006.
Sven Bittner, s.bittner_at_cs.waikato.ac.nz
Talk: Supporting Arbitrary Boolean Subscriptions in Distributed Publish/Subscribe Systems
34 Selected Other References
- [AJL02] G. Ashayer, H.-A. Jacobsen, and H. Leung. Predicate Matching and Subscription Matching in Publish/Subscribe Systems. In Proceedings of the 22nd IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW '02), Vienna, Austria, July 2-5, 2002.
- [CRW01] A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. Design and Evaluation of a Wide-Area Event Notification Service. ACM Transactions on Computer Systems (TOCS), 19(3):332-383, 2001.
- [FJL01] F. Fabret, A. Jacobsen, F. Llirbat, J. Pereira, K. Ross, and D. Shasha. Filtering Algorithms and Implementation for Very Fast Publish/Subscribe Systems. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), USA, May 2001.
- [HCKW90] E. N. Hanson, M. Chaabouni, C.-H. Kim, and Y.-W. Wang. A Predicate Matching Algorithm for Database Rule Systems. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD 1990), Atlantic City, USA, May 23-25, 1990.
- [LHJ05] G. Li, S. Hou, and H.-A. Jacobsen. A Unified Approach to Routing, Covering and Merging in Publish/Subscribe Systems based on Modified Binary Decision Diagrams. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS '05), USA, June 2005.
- [MF01] G. Muehl and L. Fiege. Supporting Covering and Merging in Content-Based Publish/Subscribe Systems: Beyond Name/Value Pairs. IEEE Distributed Systems Online (DSOnline), 2(7), 2001.
- [TE04] P. Triantafillou and A. Economides. Subscription Summarization: A New Paradigm for Efficient Publish/Subscribe Systems. In Proceedings of the 24th IEEE International Conference on Distributed Computing Systems (ICDCS '04), Tokyo, Japan, March 2004.
- [WQV04] Y.-M. Wang, L. Qiu, C. Verbowski, D. Achlioptas, G. Das, and P. Larson. Summary-based Routing for Content-based Event Distribution Networks. ACM SIGCOMM Computer Communication Review, 34(5):59-74, 2004.
- [YGM94] T. W. Yan and H. Garcia-Molina. Index Structures for Selective Dissemination of Information Under the Boolean Model. ACM Transactions on Database Systems (TODS), 19(2):332-364, 1994.