Title: Republishers in a Publish/Subscribe Architecture for Data Streams
1Republishers in a Publish/Subscribe Architecture
for Data Streams
- Alasdair J G Gray and Werner Nutt
- School of Mathematical and Computer Sciences,
- Heriot-Watt University, Edinburgh
- 6th July 2005
2Overview
- Motivation
- Publish/Subscribe Architecture
- Query planning
3Motivation
- Scenario
- Streams generated by distributed sensors
- Users are also distributed
- Use data integration to match users to streams
- For example,
- Grid monitoring for logging and bookkeeping
- Sensor networks
Figure labels: Grid; Monitoring data; Bookkeeping; Job progress
4Data Streams as Relations
- Sensor readings can be viewed as
- tuples
- conforming to a relational schema
- Example: Network ThroughPut
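As a minimal sketch of a sensor reading viewed as a tuple over a relational schema: the attribute names `from`, `tool` and `psize` are taken from the conditions used later in the slides, while their types, and the idea that these three attributes make up the whole schema, are assumptions made only for illustration.

```python
# Hypothetical schema: attribute name -> expected Python type.
# The attribute names appear in the slides; the types are assumed.
SCHEMA = {"from": str, "tool": str, "psize": int}


def conforms(reading: dict) -> bool:
    """Return True if the reading has exactly the schema's attributes and types."""
    return reading.keys() == SCHEMA.keys() and all(
        isinstance(reading[attr], typ) for attr, typ in SCHEMA.items()
    )


# One sensor reading represented as a tuple (here: a dict keyed by attribute).
reading = {"from": "hw", "tool": "ping", "psize": 1024}
```

A stream is then simply an unbounded sequence of such readings, each of which can be checked with `conforms` before it is published.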
5Publish/Subscribe Architecture
- Local as View Approach
- Consumers pose a query over the schema to request streams
- Producers describe their stream using a view on the schema
- Queries and views are selections over a single relation
Registry
Data Streams
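The local-as-view idea can be sketched as follows, assuming that a selection is a conjunction of equality conditions stored as a dictionary (an empty dictionary standing for TRUE). The producer names and views are those shown on the later slides; the helper names `satisfies` and `matching_views` are hypothetical.

```python
# Producer views, each a selection over the single relation.
# Encoding assumption: {attribute: value} means "attribute = value" for every entry.
VIEWS = {
    "S1": {"from": "hw", "tool": "udp"},
    "S2": {"from": "hw", "tool": "ping"},
    "S3": {"from": "ral", "tool": "ping"},
    "S4": {"from": "ral", "tool": "udp"},
}


def satisfies(reading: dict, condition: dict) -> bool:
    """True if the reading fulfils every equality in the condition ({} is TRUE)."""
    return all(reading.get(attr) == value for attr, value in condition.items())


def matching_views(reading: dict) -> list:
    """Names of the producer views whose selection admits this reading."""
    return sorted(name for name, view in VIEWS.items() if satisfies(reading, view))
```

For example, a reading with `from` equal to `'hw'` and `tool` equal to `'ping'` belongs to the stream described by S2 only.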
6Query Planning
Consumer Query
C from 'hw' ∧ psize 1024
Problem: Approach does not scale to hundreds of producers and consumers.
S1 from 'hw' ∧ tool 'udp'
S2 from 'hw' ∧ tool 'ping'
S3 from 'ral' ∧ tool 'ping'
S4 from 'ral' ∧ tool 'udp'
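One way to picture planning without republishers is a relevance test between the consumer query and each producer view. The sketch below assumes equality-only conditions and treats a producer as relevant when its view does not contradict the query. The `psize` condition of C is left out because its comparison operator is not legible in the slide; the function names are hypothetical.

```python
PRODUCERS = {
    "S1": {"from": "hw", "tool": "udp"},
    "S2": {"from": "hw", "tool": "ping"},
    "S3": {"from": "ral", "tool": "ping"},
    "S4": {"from": "ral", "tool": "udp"},
}

# Consumer query C, restricted to its equality part (psize condition omitted).
QUERY_C = {"from": "hw"}


def relevant(view: dict, query: dict) -> bool:
    """A view is relevant if no attribute is fixed to different values in view and query."""
    return all(query[attr] == value for attr, value in view.items() if attr in query)


def plan(query: dict, producers: dict) -> list:
    """Connect the consumer directly to every relevant producer."""
    return sorted(name for name, view in producers.items() if relevant(view, query))
```

With these definitions `plan(QUERY_C, PRODUCERS)` yields S1 and S2. Every consumer holds one connection per relevant producer, which illustrates why the direct approach becomes costly with hundreds of participants.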
7Republishers Provide Scalability
C from 'hw' ∧ psize 1024
R3 TRUE
R1 from 'hw'
R2 from 'ral'
S1 from 'hw' ∧ tool 'udp'
S2 from 'hw' ∧ tool 'ping'
S3 from 'ral' ∧ tool 'ping'
S4 from 'ral' ∧ tool 'udp'
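A rough sketch of how a republisher can stand in for several producers: if a republisher's view covers the query, the consumer needs only that one connection. This is an illustration under simplifying assumptions (equality-only conditions, no combination of several republishers), not the planning algorithm from the technical report; `covers`, `relevant` and `plan` are hypothetical names.

```python
PRODUCERS = {
    "S1": {"from": "hw", "tool": "udp"},
    "S2": {"from": "hw", "tool": "ping"},
    "S3": {"from": "ral", "tool": "ping"},
    "S4": {"from": "ral", "tool": "udp"},
}
# {} encodes the condition TRUE, i.e. R3 republishes everything.
REPUBLISHERS = {"R1": {"from": "hw"}, "R2": {"from": "ral"}, "R3": {}}


def covers(view: dict, query: dict) -> bool:
    """True if every tuple selected by the query is also selected by the view."""
    return all(attr in query and query[attr] == value for attr, value in view.items())


def relevant(view: dict, query: dict) -> bool:
    """True if view and query do not fix any attribute to different values."""
    return all(query[attr] == value for attr, value in view.items() if attr in query)


def plan(query: dict, republishers: dict, producers: dict) -> list:
    """Candidate republishers that cover the query; otherwise the relevant producers."""
    covering = sorted(n for n, v in republishers.items() if covers(v, query))
    if covering:
        return covering
    return sorted(n for n, v in producers.items() if relevant(v, query))
```

For the equality part of query C (`from` equal to `'hw'`), both R1 and R3 qualify as single sources, so the consumer no longer has to contact S1 and S2 individually.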
8Plans Need to be Maintained
- Queries are long lived
- Set of publishers can change
- Query plans should reflect changes
- What happens when we
- add a republisher?
- remove a republisher?
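To make the maintenance problem concrete, here is a deliberately naive sketch of a registry that recomputes every stored plan whenever the set of publishers changes. The class and method names are hypothetical, conditions are assumed to be equality-only, and replanning everything on each change is a simplification chosen for brevity rather than the approach described in the talk.

```python
def relevant(view: dict, query: dict) -> bool:
    """True if view and query do not fix any attribute to different values."""
    return all(query[attr] == value for attr, value in view.items() if attr in query)


class Registry:
    """Keeps publisher views and long-lived queries, and one plan per query."""

    def __init__(self):
        self.views = {}    # publisher name -> view condition
        self.queries = {}  # consumer name -> query condition
        self.plans = {}    # consumer name -> list of publisher names

    def _replan(self):
        # Naive: recompute every plan from scratch after any change.
        for name, query in self.queries.items():
            self.plans[name] = sorted(
                p for p, view in self.views.items() if relevant(view, query)
            )

    def add_publisher(self, name, view):
        self.views[name] = view
        self._replan()

    def remove_publisher(self, name):
        self.views.pop(name, None)
        self._replan()

    def add_query(self, name, query):
        self.queries[name] = query
        self._replan()
```

Because queries are long lived, the plan stored for a consumer is only valid until the next call to `add_publisher` or `remove_publisher`; the sketch keeps it valid by brute force.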
9Adding a Republisher: 1st Attempt
R3 TRUE
Relevant publishers
Maximal relevant
Replan other queries
Adding a new publisher
Replan R3
R1 from 'hw'
R2 from 'ral'
R4 tool 'ping'
S1 from 'hw' ∧ tool 'udp'
S2 from 'hw' ∧ tool 'ping'
S3 from 'ral' ∧ tool 'ping'
S4 from 'ral' ∧ tool 'udp'
10Desirable Properties for a Hierarchy
- Correctness: streams answer queries
- Cycle freeness: loops can lead to duplicates
- Uniqueness: hierarchy defined for a set of publishers
- Local planning: Publishers and Consumers only need to communicate with the Registry
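The cycle-freeness property can be checked mechanically. The sketch below assumes the hierarchy is stored as a mapping from each republisher to the publishers it reads from, and uses a standard depth-first search; the example hierarchies are illustrative guesses at the slide diagrams, and `has_cycle` is a hypothetical helper.

```python
def has_cycle(feeds: dict) -> bool:
    """Detect a loop in a 'reads from' graph given as {node: [sources]}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node) -> bool:
        colour[node] = GREY
        for src in feeds.get(node, ()):
            state = colour.get(src, WHITE)
            if state == GREY:  # reached a node already on the current path
                return True
            if state == WHITE and visit(src):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in feeds)


# Assumed layered hierarchy: R3 reads from R1 and R2, which read from producers.
layered = {"R3": ["R1", "R2"], "R1": ["S1", "S2"], "R2": ["S3", "S4"]}
# Two republishers feeding each other: tuples could circulate and be duplicated.
looped = {"R3": ["R4"], "R4": ["R3"]}
```

Running the check after every change to the hierarchy would flag a configuration such as `looped` before any duplicate tuples are delivered.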
11Adding a Republisher: 2nd Attempt
C from 'hw' ∧ psize 1024
R3 TRUE
Relevant publishers
R1 from 'hw'
R2 from 'ral'
R4 tool 'ping'
S1 from 'hw' ∧ tool 'udp'
S2 from 'hw' ∧ tool 'ping'
S3 from 'ral' ∧ tool 'ping'
S4 from 'ral' ∧ tool 'udp'
12Removing a Republisher
C from 'hw' ∧ psize 1024
R3 TRUE
R1 from 'hw'
R2 from 'ral'
R4 tool 'ping'
S1 from 'hw' ∧ tool 'udp'
S2 from 'hw' ∧ tool 'ping'
S3 from 'ral' ∧ tool 'ping'
S4 from 'ral' ∧ tool 'udp'
13Conclusions
- Republishers
- Allow system to scale
- Complicate query answering problem
- Republishers require special planning
- We have developed algorithms that allow the system to adapt to changes in the set of publishers
- Full details available in HW Technical Report www.macs.hw.ac.uk:8080/techreps/view_record.jsp?id=0031
14Integrating Data Streams
- Local as View Approach
- Consumers pose a query over the schema to request streams
- Producers describe their stream using a view on the schema
- Queries and views are selections over a single relation
15Example
C from 'hw' ∧ psize 1024
R3 TRUE
R1 from 'hw'
R2 from 'ral'
R4 tool 'ping'
S1 from 'hw' ∧ tool 'udp'
S2 from 'hw' ∧ tool 'ping'
S3 from 'ral' ∧ tool 'ping'
S4 from 'ral' ∧ tool 'udp'