Title: Mercury: Building Distributed Applications with Publish-Subscribe
1. Mercury: Building Distributed Applications with Publish-Subscribe
- Ashwin Bharambe
- Carnegie Mellon University
- Monday Seminar Talk
2. Quick Terminology Recap
- Basics
- Publishers inject data/events/publications
- Subscribers register interests/subscriptions
- Brokers match subscriptions with publications and deliver to subscribers
- Mercury: a distributed publish-subscribe system
- Performs matching and content routing in a distributed fashion
- Data model (see the sketch below)
- Publication: Name = ashwin, Age = 23, X = 192.3, Y = 223.4
- Subscription: Age > 35, X > 100, X < 180
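- Below is a minimal sketch of this data model, assuming each subscription is a conjunction of per-attribute constraints; the class names and operator encoding are illustrative, not Mercury's actual API.

```python
from dataclasses import dataclass

# Publications are flat attribute-value records; subscriptions are
# conjunctions of per-attribute constraints (illustrative types).
Publication = dict  # e.g. {"Name": "ashwin", "Age": 23, "X": 192.3, "Y": 223.4}

@dataclass
class Constraint:
    attr: str
    op: str        # one of "=", ">", "<"
    value: object

@dataclass
class Subscription:
    constraints: list

    def matches(self, pub: Publication) -> bool:
        """A publication matches if every constraint is satisfied."""
        for c in self.constraints:
            if c.attr not in pub:
                return False
            v = pub[c.attr]
            if c.op == "=" and v != c.value:
                return False
            if c.op == ">" and not v > c.value:
                return False
            if c.op == "<" and not v < c.value:
                return False
        return True

pub = {"Name": "ashwin", "Age": 23, "X": 192.3, "Y": 223.4}
sub = Subscription([Constraint("Age", ">", 35),
                    Constraint("X", ">", 100),
                    Constraint("X", "<", 180)])
print(sub.matches(pub))  # False: Age is 23, which does not satisfy Age > 35
```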
3. Virtual reality example
[Figure: a virtual world arena with a user; the user's published events and registered interests are regions of the arena given by coordinates such as (50,250), (100,200), and (150,150)]
4. Mercury goals
- Implement distributed publish-subscribe
- Support range queries
- Avoid hot-spots in the system
- Flooding anything is bad
- Avoid publication flooding completely
- Avoid subscription flooding as much as possible
- Consider queries like SELECT * FROM RECORDS
- Peer-to-peer scenario
- No dedicated brokers
- Highly dynamic network
5. Talk Contents
- Mercury Architecture
- Overlay construction
- Routing guarantees
- Overlay properties
- How randomness is useful
- Load balancing and histogram maintenance
- Application Design
6. Attribute Hubs
- Each attribute range is divided into bins
- Each node is responsible for a range of attribute values (see the sketch below)
- Ranges are assigned when a node joins and can change dynamically
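- A minimal sketch of how an attribute hub might map a value to the node owning the containing bin, assuming contiguous ranges sorted by their lower bound; the names and layout are illustrative, not Mercury's actual data structures.

```python
import bisect

# An attribute hub: the attribute's value range is split into contiguous bins,
# each owned by one node (illustrative only).
class AttributeHub:
    def __init__(self, attr, boundaries, nodes):
        # boundaries[i] is the lower end of nodes[i]'s range, e.g.
        # boundaries=[0, 100, 200], nodes=["A", "B", "C"] means
        # A owns [0, 100), B owns [100, 200), C owns [200, max).
        self.attr = attr
        self.boundaries = boundaries
        self.nodes = nodes

    def responsible_node(self, value):
        """Return the node whose range contains this attribute value."""
        i = bisect.bisect_right(self.boundaries, value) - 1
        return self.nodes[max(i, 0)]

hub_x = AttributeHub("X", [0, 100, 200], ["node-A", "node-B", "node-C"])
print(hub_x.responsible_node(192.3))  # node-B, which owns [100, 200)
```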
7. Routing
[Figure: attribute hubs for x, y, name, and age; a subscription with X > 100, X < 180 is routed from its generating point to the x hub]
- Send a subscription to one hub
- Which one? An interesting question in itself!
- Determine query selectivity; send to the most selective hub (see the sketch below)
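- One way to make that choice is sketched below: estimate, from an approximate per-attribute histogram (how such histograms might be built comes up later in the talk), what fraction of publications would satisfy each range constraint, and route the subscription to the hub with the smallest fraction. The function names, histogram format, and selection rule are assumptions for illustration.

```python
def estimated_fraction(histogram, lo, hi):
    """histogram: list of (bin_lo, bin_hi, count).
    Estimate the fraction of publications whose value falls in [lo, hi)."""
    total = sum(count for _, _, count in histogram) or 1
    inside = 0.0
    for b_lo, b_hi, count in histogram:
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        if overlap > 0:
            inside += count * overlap / (b_hi - b_lo)  # assume uniform within a bin
    return inside / total

def choose_hub(sub_ranges, histograms):
    """sub_ranges: {attr: (lo, hi)}; histograms: {attr: histogram}.
    Pick the attribute hub whose range constraint matches the fewest publications."""
    return min(sub_ranges,
               key=lambda attr: estimated_fraction(histograms[attr], *sub_ranges[attr]))

histograms = {"X": [(0, 100, 500), (100, 200, 50)],
              "Age": [(0, 50, 300), (50, 100, 250)]}
sub_ranges = {"X": (100, 180), "Age": (35, 100)}
print(choose_hub(sub_ranges, histograms))  # "X": only ~7% of publications fall in [100, 180)
```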
8. Routing (contd.)
[Figure: the publication (Name = ashwin, Age = 23) is routed from its generating point to a node in every attribute hub (x, y, name, age)]
- We must send publications to all hubs (see the sketch below)
- This ensures matching
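- A small sketch of this step, with the per-hub range lookup stubbed out and all names purely illustrative: a publication is handed to the responsible node in every attribute hub, so any subscription, whichever single hub it was installed in, will encounter it.

```python
# Each attribute hub gets a copy of the publication (illustrative only).
def responsible_node(attr, value):
    return f"{attr}-hub-owner-of-{value}"  # placeholder for the range lookup

def route_publication(pub, hub_attrs):
    """Forward the publication to the responsible node in every hub it has a value for."""
    return [responsible_node(attr, pub[attr]) for attr in hub_attrs if attr in pub]

pub = {"Name": "ashwin", "Age": 23}
print(route_publication(pub, ["Name", "Age", "X", "Y"]))
# ['Name-hub-owner-of-ashwin', 'Age-hub-owner-of-23']
```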
9. Routing illustrated
10. Hub structure and routing (Symphony)
- Naïve routing along the circle scales linearly
- Utilize the small-world phenomenon [Kleinberg 2000]
- Know thy neighbors and one random person, and you can contact anybody quickly
- Routing policy: choose the link which gets you closest to the destination (see the sketch below)
- Performance
- Average hop length is O(log^2(n)/k) with k random links
- Need to be careful when node ranges are not uniform
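- The sketch below illustrates this style of routing on a unit ring, in the spirit of Symphony rather than its exact protocol: each node links to its ring neighbors plus k long links drawn from a harmonic distribution, and a message is greedily forwarded on whichever link lands closest to the destination. Node placement, parameters, and helper names are assumptions.

```python
import math
import random

def ring_distance(a, b):
    d = abs(a - b)
    return min(d, 1.0 - d)

def harmonic_offset(n):
    # Symphony-style long links: distance drawn from a pdf ~ 1/(d ln n) on [1/n, 1].
    return math.exp(random.uniform(math.log(1.0 / n), 0.0))

def build_links(nodes, k):
    n = len(nodes)
    links = {}
    for i, x in enumerate(nodes):
        neigh = {nodes[(i - 1) % n], nodes[(i + 1) % n]}  # ring neighbors
        for _ in range(k):
            target = (x + harmonic_offset(n)) % 1.0
            neigh.add(min(nodes, key=lambda y: ring_distance(y, target)))
        neigh.discard(x)
        links[x] = neigh
    return links

def route(links, src, dst):
    hops, cur = 0, src
    while cur != dst:
        # greedy: pick the neighbor closest to the destination on the ring
        cur = min(links[cur], key=lambda y: ring_distance(y, dst))
        hops += 1
    return hops

random.seed(1)
nodes = sorted(random.random() for _ in range(200))
links = build_links(nodes, k=4)
print(route(links, nodes[0], nodes[100]))  # typically O(log^2(n)/k) hops
```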
11. Caching
- O(log^2(n)) hops is good, but each hop is still an application-level hop
- Latency can be quite large if the overlay is not optimized
- For distributed applications like games, this is way off from optimal
- Exploit locality in the access patterns of an application
- In addition to the k random links, keep cached links (see the sketch below)
- Store nodes which were the rendezvous points for recent publications
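- A hedged sketch of the cache-link bookkeeping, assuming a small LRU cache keyed by the published value; the class and method names are invented for illustration.

```python
from collections import OrderedDict

class CacheLinks:
    """LRU cache of nodes that served as rendezvous points for recent publications."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()  # published value -> rendezvous node

    def record_rendezvous(self, value, node):
        """Remember the node that was the rendezvous point for this value."""
        self.entries[value] = node
        self.entries.move_to_end(value)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def candidates(self):
        """Cached nodes to consider as next hops, in addition to the long links."""
        return list(self.entries.values())

cache = CacheLinks(capacity=2)
cache.record_rendezvous(150.0, "node-B")
cache.record_rendezvous(42.0, "node-A")
cache.record_rendezvous(190.0, "node-C")  # evicts the entry for 150.0
print(cache.candidates())                 # ['node-A', 'node-C']
```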
12. Performance (Uniform workload)
[Graph: routing performance with long links = 6 and cache links = log(n)]
- Publications were generated from a uniform distribution
13. Performance (Skewed workload)
[Graph: routing performance with long links = 6 and cache links = log(n)]
- Publications were generated from a highly skewed Zipf distribution
14. Performance (Memory reference trace)
[Graph: routing performance with long links = 6 and cache links = log(n)]
- Publications were generated from memory references of a SPEC2000 benchmark
15. Two Problems
- 1. Load Balancing
- A concern because publication values need not follow a uniform, or a priori known, distribution
- Node ranges are assigned when the nodes join
16. Problems (contd.)
- 2. Hub Selectivity
- Recall: a subscription is sent to one randomly chosen hub!
- Ideally, it should be sent to the most selective hub
- Need to estimate the selectivity of a subscription
17. Hail randomness
- Randomized construction of the network gives additional benefits!
- It turns out this network is an expander with high probability
- Random walks mix rapidly, i.e., they approach the stationary distribution rapidly
- Uniform sampling is non-trivial
- Node ranges are not uniform across nodes
- Random walks are an efficient way of sampling (see the sketch below)
- No explicit hierarchy is required (as in RanSub, USITS '03)
- In general, several statistics about a very dynamic network can be maintained efficiently
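- A toy sketch of random-walk sampling on such an overlay: walk for roughly log^2(n) steps and use the end point as a sample. On an expander the walk mixes quickly; note that a plain walk converges to the degree-weighted stationary distribution, so a real implementation would also correct for node degree (e.g. with a Metropolis-Hastings rule). The graph and parameters below are illustrative.

```python
import random

def random_walk(neighbors, start, steps):
    """Take a fixed-length random walk and return the node it ends on."""
    cur = start
    for _ in range(steps):
        cur = random.choice(neighbors[cur])
    return cur

random.seed(0)
# toy 8-node overlay: ring links plus one extra chord per node
neighbors = {i: [(i - 1) % 8, (i + 1) % 8, (i + 2) % 8] for i in range(8)}
samples = [random_walk(neighbors, start=0, steps=9) for _ in range(1000)]
print({i: samples.count(i) for i in range(8)})  # counts come out roughly equal
```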
18. Hub Selectivity (ideas)
- Use sampling to build approximate histograms
- Approach 1 (Push)
- Each rendezvous point selects publications with a certain probability and sends them off with a specific TTL
- A random walk of length log^2(n) ensures good mixing
- Traffic overhead is proportional to the number of publications
- Approach 2 (Pull)
- Perform uniform random sampling periodically
- Each sample is the histogram of the sampled node
- Question: how to combine histograms? (see the sketch below)
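- One plausible, deliberately simple answer is sketched below, assuming the sampled nodes are drawn roughly uniformly: sum the reported per-bucket counts and normalize. Whether each report should additionally be weighted, e.g. by the size of the range its node owns, is exactly the open question on the slide; the names and histogram format are illustrative.

```python
from collections import Counter

def combine_histograms(reports):
    """reports: list of {bucket: count} dicts, one per uniformly sampled node.
    Returns an approximate fraction of publications per bucket."""
    merged = Counter()
    for hist in reports:
        merged.update(hist)              # sum counts bucket by bucket
    total = sum(merged.values()) or 1
    return {bucket: count / total for bucket, count in merged.items()}

reports = [{"0-100": 40, "100-200": 10},
           {"100-200": 30},
           {"200-300": 5}]
print(combine_histograms(reports))  # e.g. {'0-100': 0.47, '100-200': 0.47, '200-300': 0.06}
```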
19. Load balancing (ideas)
- Sample the average load in the system
- Utilize the histograms to quickly find high-load and low-load areas
- Strategy 1
- A lightly loaded node gracefully leaves the overlay
- It re-inserts itself into a high-load area
- Strategy 2
- Use load diffusion: heavy nodes shed load to neighbors (see the sketch below)
- Only if the neighbor is lightly loaded
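- A tiny sketch of the diffusion step in Strategy 2, assuming load is a single scalar and using an invented threshold and a "move half the excess" rule purely for illustration.

```python
def diffuse(my_load, neighbor_load, threshold=2.0):
    """Return the amount of load to shed to the neighbor (0 if the neighbor is not light)."""
    if my_load <= threshold * neighbor_load:
        return 0.0                        # neighbor is not light enough relative to us
    return (my_load - neighbor_load) / 2  # hand over half of the excess

print(diffuse(my_load=100.0, neighbor_load=20.0))  # 40.0 units shed to the neighbor
print(diffuse(my_load=100.0, neighbor_load=80.0))  # 0.0: neighbor is already loaded
```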
20. Distributed Game Design
- Current implementation: a distributed version of the Asteroids game!
- Questions
- How is state distributed across the system?
- How is consistency handled in the system?
- Cheating???
21. Conclusion
- A distributed publish-subscribe system supporting
- Range queries
- Scalable routing and matching
- Randomized network construction
- Provides routing guarantees
- Also yields an elegant way of sampling in a distributed system
- Exports an API for applications
- Implemented and deployed on Emulab
- Distributed game using Mercury
- Almost done
- To be deployed on PlanetLab soon