Title: Publish and Subscribe


1
Publish and Subscribe
  • The Information Bus: An Architecture for Extensible
    Distributed Systems. Oki, Pfluegl, Siegel, Skeen. 1993.
  • Matching Events in a Content-based Subscription
    System. Aguilera, Strom, Sturman, Astley, Chandra. 1999.

Dan Sandler, COMP 520, October 7, 2004
2
Distributed Systems in the Real World
  • So far: tools for building distributed systems
  • Focused on certain problems
  • Redundancy
  • Distribution
  • Marshalling and communication
  • Less attention paid to others
  • Discoverable systems
  • Maintainable, upgradeable systems

3
Generative Programming in Linda
  • Review: Linda
  • Typed data organized into tuples
  • Stored indefinitely in global tuple space
  • Tuples requested by partial specification
  • Anonymous communication

[Figure: tuple space]
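A minimal single-process sketch of the Linda model reviewed above. It assumes `None` as the wildcard in a partial specification and names the read operation `in_` because `in` is a Python keyword; both are illustrative choices, not Linda syntax. Real Linda templates can also match on field type, and `in()` blocks until a match appears, whereas this sketch only matches values and returns `None` immediately.

```python
from typing import Any, Optional


class TupleSpace:
    """Toy tuple space: tuples persist until some reader removes them."""

    def __init__(self) -> None:
        self._tuples: list[tuple[Any, ...]] = []

    def out(self, *fields: Any) -> None:
        # Anonymous: the writer never names a reader.
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(template: tuple[Any, ...], tup: tuple[Any, ...]) -> bool:
        # None in a template is a wildcard (a "partial specification").
        return len(template) == len(tup) and all(
            want is None or want == have for want, have in zip(template, tup)
        )

    def in_(self, *template: Any) -> Optional[tuple[Any, ...]]:
        # Remove and return the first match. Linda's in() would block
        # until a match exists; this sketch returns None instead.
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out("temperature", "room1", 21.5)
print(space.in_("temperature", None, None))  # ('temperature', 'room1', 21.5)
print(space.in_("temperature", None, None))  # None: the tuple was consumed
```

Note that nothing in the tuple says who wrote it or who should read it; the content alone connects the two sides.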
4
Problems in Tuple Space
  • Open Issues
  • Unbounded storage requirements of tuple space
  • Tuple contents are weak on flexibility, metadata,
    and discoverability
  • General tuple searching can be complex and slow

[Figure: tuple clutter]
5
Take-aways from Linda
  • The content itself connects senders to receivers
  • Participants have no other formal relationship
  • Let's explore this model further

6
Publish and Subscribe
  • Recall Linda's simple in/out operators
  • If an in() is pending when a matching out() is
    invoked, the scenario resembles what we now call
    publish and subscribe
  • The Information Bus is such a system

[Figure: Producer calls out(<>); Consumer calls in(<>)]
7
The Information Bus
  • Goal: develop real-time, 24/7 systems
  • Circuit fabrication
  • Securities trading systems
  • Specific requirements derived from these
    situations
  • Continuous operation
  • Legacy systems integration
  • Dynamic system evolution

8
Evolution is hard
  • Capacity for change must be planned from the
    beginning
  • Systems may need to evolve in many ways
  • New kinds of data
  • New applications (services, clients)
  • Fault recovery and scalability can be considered
    evolution
  • Remember: evolution must occur without
    interruption of service

9
Architecture of the Information Bus
  • Clients may publish data objects under a specific
    subject
  • Clients may subscribe to one or more subjects to
    receive data
  • Note: the bus broadcasts all published data to
    all participating hosts
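The architecture above can be sketched in a single process. The names `InformationBus`, `Host`, `publish`, and `subscribe` are hypothetical, not the paper's API. The sketch mirrors the note about broadcasting: every publication reaches every host, and each host discards subjects it has not subscribed to.

```python
from typing import Any, Callable

Handler = Callable[[str, Any], None]


class Host:
    """A participant: it sees every message but acts only on its subjects."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen = 0  # how many broadcasts reached this host
        self.handlers: dict[str, list[Handler]] = {}

    def subscribe(self, subject: str, handler: Handler) -> None:
        self.handlers.setdefault(subject, []).append(handler)

    def deliver(self, subject: str, data: Any) -> None:
        self.seen += 1
        # Filtering happens at the receiving host, not inside the bus.
        for handler in self.handlers.get(subject, []):
            handler(subject, data)


class InformationBus:
    def __init__(self) -> None:
        self.hosts: list[Host] = []

    def attach(self, host: Host) -> None:
        self.hosts.append(host)

    def publish(self, subject: str, data: Any) -> None:
        # Broadcast: every attached host receives every publication.
        for host in self.hosts:
            host.deliver(subject, data)


bus = InformationBus()
subscriber, uninterested = Host("subscriber"), Host("uninterested")
bus.attach(subscriber)
bus.attach(uninterested)

received: list[Any] = []
subscriber.subscribe("apples.little.green", lambda subject, data: received.append(data))
bus.publish("apples.little.green", "Granny Smith")
bus.publish("bananas.big.yellow", "Cavendish")
print(received)  # ['Granny Smith']
```

The uninterested host still receives both messages. That cost of broadcast is what later motivates the matching problem.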

10
A snapshot of the Information Bus

[Figure: a PUBLISHER, a SUBSCRIBER, and an UNINTERESTED host attached to THE INFORMATION BUS, which carries messages of the form Subject, Data <object>]
11
Properties of the Information Bus
  • P1. Minimal core semantics
  • Recall the end-to-end argument: complexity at
    a low level is usually either insufficient or
    overkill
  • Two styles of communication
  • Remote method invocation
  • Publish/subscribe
  • Two kinds of objects
  • Data (things sent on the bus)
  • Services and Clients (things that use the bus)

12
Properties of the Information Bus (cont.)
  • P2. Self-describing objects
  • We might call this introspection today
  • Given an object, we can ask at run-time for
  • object type,
  • property types and values,
  • method signatures, etc.
  • All participants and data play by these rules
  • Effect: loose coupling and run-time discovery
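As a modern analogy (not the Information Bus's own mechanism), Python's standard `inspect` and `dataclasses` modules let a program ask an object at run time for its type, property types and values, and method signatures. `Fruit` and `describe` are hypothetical names used only for illustration.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Fruit:
    kind: str
    size: str
    color: str

    def is_color(self, color: str) -> bool:
        return self.color == color


def describe(obj: Any) -> dict[str, Any]:
    """Ask an object at run time for its type, properties, and methods."""
    cls = type(obj)
    return {
        "type": cls.__name__,
        # dataclasses.fields() lists declared properties with their types.
        "properties": {f.name: (f.type, getattr(obj, f.name)) for f in fields(obj)},
        # inspect recovers public method signatures from the class.
        "methods": {
            name: str(inspect.signature(member))
            for name, member in inspect.getmembers(cls, inspect.isfunction)
            if not name.startswith("_")
        },
    }


print(describe(Fruit("apple", "little", "green")))
```

A receiver that has never seen `Fruit` before can still discover what it holds, which is the loose coupling the slide describes.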

13
Properties of the Information Bus (cont.)
  • P3. Dynamic classing
  • A fancy way of expressing the ability of the
    system implementation to be changed at run-time
  • Without interruption of the system
  • New classes can be defined
  • New code can be introduced
  • This is clearly necessary for evolvability
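In a dynamic language, defining a class while the program keeps running is a one-liner with the three-argument form of Python's built-in `type()`. This is only an analogy for dynamic classing; `define_class` and `Pear` are hypothetical, and the sketch does not show how the Information Bus itself loads new classes or code.

```python
from typing import Any


def define_class(name: str, defaults: dict[str, Any]) -> type:
    """Build a brand-new class while the program keeps running."""

    def init(self: Any, **values: Any) -> None:
        # Each declared property takes the supplied value or its default.
        for prop, default in defaults.items():
            setattr(self, prop, values.get(prop, default))

    return type(name, (object,), {"__init__": init, "properties": tuple(defaults)})


# A class that did not exist when the program started.
Pear = define_class("Pear", {"variety": "Bosc", "ripe": False})
pear = Pear(ripe=True)
print(type(pear).__name__, pear.variety, pear.ripe)  # Pear Bosc True
```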

14
Properties of the Information Bus (cont.)
  • P4. Anonymous communication
  • The hallmark of publish-and-subscribe
  • Data objects are sent and received based on
    content alone
  • Details of the participants are irrelevant
  • In this system, the content that controls
    subscription is a 'subject' string
  • No other part of the data is involved in
    delivering the object to subscribers
  • Subjects are typically organized hierarchically
    (cf. Usenet groups: rice.owlnews.comp520)

15
Other features of the Information Bus
  • What else is going on in the bus?
  • Object discovery
  • Point-to-point remote method invocation
  • Legacy data conversion

16
Discovery protocol
  • Discovering participants in a given subject
  • A, B, D all subscribed to Little Green Apples

[Figure: hosts A, B, C, D on THE INFORMATION BUS]
Query: Subject: apples.little.green, Data: "Who's there?"
Reply: Subject: apples.little.green, Data: "I'm here, my name is B"
Reply: Subject: apples.little.green, Data: "I'm here, my name is D"
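The discovery exchange can be sketched as ordinary publish/subscribe traffic: the query and the replies both travel on the subject itself. This is a single-process sketch under stated assumptions: `Bus`, `join`, and `discover` are hypothetical names, A is assumed to be the host asking, replies arrive synchronously inside `publish` rather than asynchronously over a network, and the reply collector is never unsubscribed.

```python
from collections import defaultdict
from typing import Callable

QUERY = "Who's there?"


class Bus:
    """Broadcast bus: every handler on a subject sees each message on it."""

    def __init__(self) -> None:
        self.subs: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Callable[[str], None]) -> None:
        self.subs[subject].append(handler)

    def publish(self, subject: str, data: str) -> None:
        # Iterate over a copy: handlers may publish replies while we loop.
        for handler in list(self.subs[subject]):
            handler(data)


def join(bus: Bus, name: str, subject: str) -> None:
    """Subscribe `name` to `subject` and answer discovery queries on it."""

    def on_message(data: str) -> None:
        if data == QUERY:
            bus.publish(subject, f"I'm here, my name is {name}")

    bus.subscribe(subject, on_message)


def discover(bus: Bus, subject: str) -> list[str]:
    """Publish a query on `subject` and collect the replies it provokes."""
    replies: list[str] = []

    def collect(data: str) -> None:
        if data.startswith("I'm here"):
            replies.append(data)

    bus.subscribe(subject, collect)
    bus.publish(subject, QUERY)
    return replies


bus = Bus()
join(bus, "B", "apples.little.green")
join(bus, "C", "bananas.big.yellow")
join(bus, "D", "apples.little.green")
found = discover(bus, "apples.little.green")  # A asks who is there
print(found)  # ["I'm here, my name is B", "I'm here, my name is D"]
```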
17
RMI brokering
  • Finding a participant on which to invoke methods
  • Like the discovery protocol

[Figure: hosts A, B, C, D at addresses 1, 2, 3, 4 on THE INFORMATION BUS]
Request: Subject: apples.little.green, Data: "I want to make a method call."
Reply: Subject: apples.little.green, Data: "Sure, my address is 2"
Reply: Subject: apples.little.green, Data: "Sure, my address is 4"
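Brokering can be sketched as a broadcast request followed by a direct call. Here both the bus and the network are simulated in-process: the `network` dict stands in for point-to-point addressing, and `Participant`, `broker`, and `whoami` are hypothetical names rather than the Information Bus API.

```python
from typing import Optional

REQUEST = "I want to make a method call."


class Participant:
    """A host subscribed to a subject that can also accept direct calls."""

    def __init__(self, name: str, address: int) -> None:
        self.name = name
        self.address = address

    def on_subject_message(self, data: str) -> Optional[int]:
        # Answer a brokering request by offering a point-to-point address.
        if data == REQUEST:
            return self.address
        return None

    def whoami(self) -> str:
        # Stand-in for a method invoked directly once the address is known.
        return f"{self.name} at address {self.address}"


def broker(subscribers: list[Participant], request: str) -> list[int]:
    """Broadcast `request` on the subject; collect the addresses offered."""
    offers = (p.on_subject_message(request) for p in subscribers)
    return [addr for addr in offers if addr is not None]


# B (address 2) and D (address 4) subscribe to apples.little.green.
network = {2: Participant("B", 2), 4: Participant("D", 4)}
addresses = broker(list(network.values()), REQUEST)
print(addresses)                       # [2, 4]
print(network[addresses[0]].whoami())  # B at address 2
```

Only the discovery step uses the bus; the method call itself goes point-to-point, so the bus is not burdened with RMI traffic.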
18
Adapters

Adapters convert data from legacy systems to
pub/sub messages
[Figure: an adapter publishes Subject, Data <object> messages from a legacy system onto THE INFORMATION BUS]
Other clients don't know that there's a legacy
system involved
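A sketch of an adapter, assuming a made-up fixed-width legacy record format (the column layout below is invented for illustration). The adapter turns each record into a subject string plus an ordinary data object and hands it to a `publish` callback, so subscribers never see the legacy format. `Shipment`, `legacy_to_message`, and `run_adapter` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Shipment:
    """An ordinary bus data object; subscribers never see the legacy format."""
    fruit: str
    size: str
    color: str
    crates: int


def legacy_to_message(record: str) -> tuple[str, Shipment]:
    """Convert one hypothetical fixed-width legacy record to (subject, object).

    Assumed layout: fruit in columns 0-9, size in 10-19, color in 20-29,
    crate count from column 30 onward.
    """
    fruit, size, color = (record[i:i + 10].strip() for i in (0, 10, 20))
    crates = int(record[30:])
    return f"{fruit}.{size}.{color}", Shipment(fruit, size, color, crates)


def run_adapter(records: list[str], publish: Callable[[str, Shipment], None]) -> None:
    # The adapter is just another publisher on the bus.
    for record in records:
        subject, obj = legacy_to_message(record)
        publish(subject, obj)


published: list[tuple[str, Shipment]] = []
legacy_record = f"{'apples':<10}{'little':<10}{'green':<10}12"
run_adapter([legacy_record], lambda subject, obj: published.append((subject, obj)))
print(published[0][0])  # apples.little.green
```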
19
Dynamic System Evolution
  • New clients can be brought on-line at any time
  • Subscribe to current subjects
  • Publish objects of conventional type
  • Publish objects of novel type and implementation
  • Create new subjects for subscription
  • Existing subscriptions unaffected

20
Problems solved by the Information Bus
  • System is available, evolvable
  • Maintenance may be performed on-line
  • New services and clients can be rolled out
    incrementally, without downtime
  • Is subject-based subscription a limitation?
  • A simple subject is easier to test than arbitrary
    tuple signatures
  • Let's look more closely at this matching problem

21
Matching Events in a Content-based Subscription
System
  • Scenario: the content-based pub/sub system
  • As in the Information Bus, subscriptions are based
    on content rather than on a membership list
  • A participant has (potentially) many
    subscriptions
  • A participant receives (potentially) many
    publications

22
The Matching Problem
  • Each participant must test each event to see
    which subscriptions it matches
  • Attribute-based subscription model
  • Each event may have multiple attributes, some or
    all of which may be tested
  • Example subscriptions
  • Fruit = apple, Size = little, Color = green
  • Fruit = apple, Size = *, Color = red
  • Fruit = *, Size = little, Color = *
  • * = don't care (match anything)
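The example subscriptions can be written as attribute maps with `'*'` for don't care. This minimal sketch tests one subscription against one event, assuming string-valued attributes and exact matching (the special case the following slides focus on); `matches` is a hypothetical helper.

```python
from collections.abc import Mapping

DONT_CARE = "*"


def matches(subscription: Mapping[str, str], event: Mapping[str, str]) -> bool:
    """Exact-match every tested attribute; '*' accepts any value."""
    return all(
        wanted == DONT_CARE or event.get(attr) == wanted
        for attr, wanted in subscription.items()
    )


subscriptions = {
    "S1": {"Fruit": "apple", "Size": "little", "Color": "green"},
    "S2": {"Fruit": "apple", "Size": "*", "Color": "red"},
    "S3": {"Fruit": "*", "Size": "little", "Color": "*"},
}
event = {"Fruit": "apple", "Size": "little", "Color": "red"}
matched = [name for name, sub in subscriptions.items() if matches(sub, event)]
print(matched)  # ['S2', 'S3']
```

Scanning every subscription like this is exactly the linear cost discussed next.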

23
The Matching Problem
  • Trivially, this problem is linear in the number
    of subscriptions
  • By adding multiple attributes, it's now linear in
    the number of attributes too
  • Can we do better than the naïve matching
    implementation?

24
The Exact Attribute Problem
  • Consider a special case of this problem
  • Each attribute is to be matched exactly
  • (Alternatives: substring match, lexicographic
    comparison, etc.)

25
General algorithm
  • Pre-process all subscriptions into a matching
    tree
  • Like a decision tree of attribute tests
  • Goal: if multiple subscriptions have the same
    attribute requirements, test that attribute only
    once for all of them
  • Similar problem: matching multiple strings in
    text
  • Consider each character of each string an attribute

26
Naïve Matching
  • Subscriptions
  • SUB1 apples.little.green
  • SUB2 apples.*.yellow
  • SUB3 bananas.little.green
  • Algorithm
  • Search each subscription separately
  • For each event,
  • For each subscription,
  • For each attribute,
  • Test against event

[Figure: Naïve algorithm. One chain of tests per subscription, at levels 1, 2, 3: SUB1 tests apples?, little?, green?; SUB2 tests apples?, (no test), yellow?; SUB3 tests bananas?, little?, green?]
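The naïve algorithm, sketched with the three subscriptions split into tuples of attribute values. The test counter exists only to make the cost visible, and stopping at a subscription's first failed test is an assumption of this sketch; `naive_match` is a hypothetical name.

```python
DONT_CARE = "*"
SUBS = {
    "SUB1": ("apples", "little", "green"),
    "SUB2": ("apples", "*", "yellow"),
    "SUB3": ("bananas", "little", "green"),
}


def naive_match(subscriptions: dict[str, tuple[str, ...]],
                event: tuple[str, ...]) -> tuple[list[str], int]:
    """Check each subscription separately; also count attribute tests made."""
    matched: list[str] = []
    tests = 0
    for name, pattern in subscriptions.items():      # for each subscription
        for wanted, actual in zip(pattern, event):    # for each attribute
            if wanted == DONT_CARE:
                continue                              # '*' needs no test
            tests += 1                                # test against event
            if wanted != actual:
                break                                 # give up on this one
        else:                                         # no test failed
            matched.append(name)
    return matched, tests


print(naive_match(SUBS, ("apples", "little", "green")))  # (['SUB1'], 6)
```

The test `apples?` is made twice, once for SUB1 and once for SUB2, even though both ask the same question.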
27
Matching Tree
  • Subscriptions
  • SUB1 apples.little.green
  • SUB2 apples.*.yellow
  • SUB3 bananas.little.green
  • Algorithm
  • Search all subscriptions together
  • For each event,
  • Recursive tree search
  • For each attribute (node)
  • Test against event
  • Follow all matching edges
  • Leaf nodes: matches

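[Figure: Matching tree algorithm. Level 1 tests apples? and bananas?; below apples, level 2 has a little? edge and a * (don't care) edge, and below bananas a little? edge; level 3 tests green? (leading to SUB1), yellow? (SUB2), and green? (SUB3)]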
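A sketch of the matching tree for the exact-attribute case. Subscriptions that share a prefix of attribute tests share nodes, and at each level the search follows only the edge labeled with the event's value and the don't-care edge. `Node`, `build_tree`, and `tree_match` are hypothetical names; the paper's actual data structures may differ.

```python
from dataclasses import dataclass, field

DONT_CARE = "*"


@dataclass
class Node:
    children: dict[str, "Node"] = field(default_factory=dict)
    subscriptions: list[str] = field(default_factory=list)  # filled at leaves


def build_tree(subscriptions: dict[str, tuple[str, ...]]) -> Node:
    """Pre-process subscriptions: one tree level per attribute."""
    root = Node()
    for name, pattern in subscriptions.items():
        node = root
        for value in pattern:
            # Subscriptions that agree on a prefix share the same nodes,
            # so each shared test is made only once per event.
            node = node.children.setdefault(value, Node())
        node.subscriptions.append(name)
    return root


def tree_match(node: Node, event: tuple[str, ...], depth: int = 0) -> list[str]:
    """Recursive search following every edge that matches the event."""
    if depth == len(event):
        return list(node.subscriptions)
    matched: list[str] = []
    # Exact matching: only the edge labeled with the event's value and the
    # don't-care edge can match (dict.fromkeys drops a duplicate label).
    for label in dict.fromkeys((event[depth], DONT_CARE)):
        child = node.children.get(label)
        if child is not None:
            matched += tree_match(child, event, depth + 1)
    return matched


tree = build_tree({
    "SUB1": ("apples", "little", "green"),
    "SUB2": ("apples", "*", "yellow"),
    "SUB3": ("bananas", "little", "green"),
})
print(tree_match(tree, ("apples", "little", "yellow")))  # ['SUB2']
```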
28
Complexity of the matching tree
  • Why is this better?
  • By inspection, the matching tree tends to have
    fewer tests than the trivial implementation
  • Fewer nodes, that is, assuming there's some
    overlap in attribute values among your
    subscriptions
  • Still linear in the number of subscriptions, however

29
Complexity of the matching tree (cont.)
  • Deeper insight
  • For the exact-matching problem, the number of
    branches you can follow at each node is at most 2
  • i.e., if some event has attr_i = X, you can only
    follow the X edge and the * edge
  • It gets better, however
  • If there are no * subscriptions for attr_i, you
    will follow 0 or 1 branches
  • Intuition: more like a traditional search tree

30
Complexity of the matching tree (cont.)
  • Time complexity shown to be $O(N^{1-\lambda})$ for
    N subscriptions
  • (The expected complexity for random events)
  • $\lambda$ is related to the number of non-* edges in
    the matched path; it can be as high as ½
  • Intuition: the more exact tests there are, the
    fewer branches you will follow
  • Other complexity characteristics
  • Space complexity: linear
  • Pre-computation: linear
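As a worked reading of the time bound (plain arithmetic, not a figure from the paper's simulation), write the exponent's parameter as $\lambda$, with N the number of subscriptions, and take its best case of ½:

$$
\lambda = \tfrac{1}{2} \;\Rightarrow\; O\!\left(N^{1-\lambda}\right) = O\!\left(\sqrt{N}\right),
\qquad N = 10^{4} \;\Rightarrow\; N^{1/2} = 10^{2}
$$

Ignoring constant factors, 10,000 subscriptions then cost on the order of 100 steps per event rather than the 10,000 of a linear scan.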

31
Complexity of the matching tree (cont.)
  • Simulation with random data

[Plot: complexity vs. number of subscriptions]
32
Optimizations
  • Collapse multiple don't-care edges into a
    single edge
  • Rationale: many subscriptions don't care about
    most attributes of the data (60% speedup in
    simulation)
  • Pre-compute successor nodes
  • Short-circuit parts of the matching tree in
    special situations

33
Successor Node Optimization
  • Subscriptions
  • SUB1 *.little.green
  • SUB2 *.*.yellow
  • SUB3 bananas.little.green
  • What's going on?
  • Annotate nodes with links to other nodes you know
    will also match at that point
  • Example: if we match bananas.little, we know
    *.little and *.* will also match for sure

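[Figure: Matching tree algorithm. Level 1 has a * edge and a bananas? edge; below *, level 2 has a little? edge and a * edge, and below bananas a little? edge; level 3 tests green? (leading to SUB1), yellow? (SUB2), and green? (SUB3)]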
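A sketch of the pre-computation behind this optimization. It assumes tree nodes are identified by their path of edge labels, a representation chosen for clarity rather than taken from the paper. For every node it lists the other nodes at the same depth that are guaranteed to match whenever that node does. How the links are then used to short-circuit the search is not shown; `tree_nodes`, `generalizes`, and `successor_links` are hypothetical names.

```python
DONT_CARE = "*"
Path = tuple[str, ...]


def tree_nodes(subscriptions: dict[str, Path]) -> set[Path]:
    """Every node of the matching tree, named by its path of edge labels."""
    return {pattern[:depth]
            for pattern in subscriptions.values()
            for depth in range(1, len(pattern) + 1)}


def generalizes(general: Path, specific: Path) -> bool:
    """True if each label in `general` is '*' or equals the one in `specific`."""
    return len(general) == len(specific) and all(
        g == DONT_CARE or g == s for g, s in zip(general, specific))


def successor_links(subscriptions: dict[str, Path]) -> dict[Path, list[Path]]:
    """For each node, the other nodes guaranteed to match whenever it matches."""
    nodes = tree_nodes(subscriptions)
    return {node: sorted(other for other in nodes
                         if other != node and generalizes(other, node))
            for node in nodes}


SUBS = {
    "SUB1": ("*", "little", "green"),
    "SUB2": ("*", "*", "yellow"),
    "SUB3": ("bananas", "little", "green"),
}
links = successor_links(SUBS)
print(links[("bananas", "little")])  # [('*', '*'), ('*', 'little')]
```

The output reproduces the slide's example: once bananas.little has matched, *.little and *.* are known to match without further tests.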
34
Summary and Discussion
  • Publish/subscribe participants connected only by
    exchanged data
  • Flexible, loose connections: an evolvable system
  • No Linda-like storage
  • (but you could implement a storage service in a
    pub/sub system)
  • So what about the matching problem?
  • It only exists in broadcast pub/sub
  • Each participant sees each event
  • Question: is this realistic?
  • Trend: multicast instead of broadcast
  • Subscription lists: more administration, but
    potentially better publication performance
  • P2P?