Title: Continuously Adaptive Continuous Queries over Streams
1. Continuously Adaptive Continuous Queries over Streams
SIGMOD 2002
(some slides were taken from Madden's SIGMOD presentation)
- Samuel Madden
- Mehul Shah
- Joseph M. Hellerstein
- Vijayshankar Raman
Presented by Ippokratis Pandis
15-823 Hot Topics in DB Systems
2. Introduction: CQ (1)
- Description
  - Streams of data (sensors / web pages / stock analysis / telephony / ...)
  - Users register logical specifications of interest
  - Engine filters and combines the data and returns results
- Some characteristics
  - Proposed systems are based on static plans
  - But CQs are long-running
  - Initially valid assumptions become less so over time
  - Static optimizers at their worst!
CQ systems should be adaptive
3. Introduction: CQ (2)
- Long-running, standing queries, similar to trigger systems
- Exclusively read-only operations
- Once installed, they continuously produce results until removed
- Lots of queries over the same data sources
  - Global query optimization problem: hard!
  - Idea: adaptive heuristics are not quite as hard?
  - Bad decisions are not final
  - Opportunities for work sharing
4. Introduction - Eddies
- No need to re-present them
- Properties
  - Data-flow-oriented components
  - No static ordering of operators
  - Adapt quickly to a fluctuating environment
  - Policy dynamically orders operators on a per-tuple basis
  - done and ready bits encode where a tuple has been and where it can go
  - Routing policies use back-pressure and lottery scheduling to favor fast, highly selective operators
5. Idea
- CQ
- Eddies
- SteMs / Grouped Filters
- CACQ
6. CACQ Implementation
- Monotonic queries, from the point when the query is registered
- Streaming answers
- Non-blocking operators
- Windowed symmetric joins (windows in tuples or time)
7. Single Query, Single Source
- Use ready bits to track what to do next
  - All 1s in the single-source case
- Use done bits to track what has been done
  - A tuple can be output when all its done bits are set
- Routing policy dynamically orders tuples (see the sketch below)
(Diagram: tuples R1 and R2 are routed through the eddy for SELECT * FROM R WHERE R.a > 10 AND R.b < 15; each tuple's ready/done bit vector is updated as operators are applied.)
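To make the bit bookkeeping concrete, here is a minimal sketch of a single-query eddy loop. It is not the paper's code; the tuple representation, operator list, and the random routing choice are assumptions for illustration.

```python
# Minimal sketch (not the paper's implementation): an eddy routes each tuple of R
# through the filters of "SELECT * FROM R WHERE R.a > 10 AND R.b < 15",
# tracking per-tuple done bits. In the single-source case the ready bits are all 1s,
# so any not-yet-applied operator may be chosen next.
import random

OPERATORS = [
    ("R.a > 10", lambda t: t["a"] > 10),
    ("R.b < 15", lambda t: t["b"] < 15),
]

def eddy_process(tup):
    done = [False] * len(OPERATORS)        # one done bit per operator
    while not all(done):
        # routing policy: here, pick any not-yet-applied operator at random
        i = random.choice([j for j, d in enumerate(done) if not d])
        name, predicate = OPERATORS[i]
        if not predicate(tup):
            return None                    # tuple fails a filter: drop it
        done[i] = True                     # mark the operator as applied
    return tup                             # all done bits set: output the tuple

for t in [{"a": 12, "b": 9}, {"a": 5, "b": 20}]:
    print(t, "->", eddy_process(t))
```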
8. Multiple Queries (1)
(Diagram: a tuple R1 is routed through grouped filters covering the predicates R.a > 10, R.a > 20, R.a = 0, R.b < 15, R.b = 25, and R.b <> 50 from several queries; its done bits fill in as the filters are applied.)
9. Multiple Queries (2)
(Diagram: a second tuple, R2, is routed through the same grouped filters but in a different order; the eddy can reorder operators on a per-tuple basis.)
- Reorder operators!
10. Outputting Tuples
- Store a completionMask bitmap for each query
  - One bit per operator
  - Set if the operator is in the query
- To determine if a tuple t can be output to query q
  - Eddy ANDs q's completionMask with t's done bits
  - Output only if q's bit is not set in t's queriesCompleted bits
  - Applied every time a tuple returns from an operator (see the sketch below)
(Example completionMasks: Q1 = 1100, Q2 = 0111; a tuple with done bits 1100 can be output to Q1, and a tuple with done bits 0111 can be output to Q2.)
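A small sketch of this check, assuming the two completionMasks from the example above and a four-operator bit layout; this is illustrative code, not the paper's implementation.

```python
# Sketch of the per-query output check (illustrative bit layout, not the paper's code).
# completionMask[q] has one bit per operator, set if that operator belongs to query q.
COMPLETION_MASKS = {"Q1": 0b1100, "Q2": 0b0111}

def try_output(done_bits, queries_completed):
    """Called every time a tuple returns from an operator."""
    outputs = []
    for q, mask in COMPLETION_MASKS.items():
        already_done = q in queries_completed
        # the tuple can be output to q once it has passed all of q's operators
        if not already_done and (done_bits & mask) == mask:
            queries_completed.add(q)
            outputs.append(q)
    return outputs

print(try_output(0b1100, set()))   # -> ['Q1']
print(try_output(0b0111, set()))   # -> ['Q2']
```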
11. Grouped Filters (1)
- Use binary trees to efficiently index range predicates
  - Two trees (LT and GT) per attribute
  - Insert the predicate constants
- When a tuple arrives
  - Scan everything to the right (for GT) or to the left (for LT) of the tuple's attribute value in the tree
  - Those are the queries that the tuple does not pass
- Hash tables index equality and inequality predicates
12. Grouped Filters (2)
(Diagram: a greater-than tree over S.a indexing the predicates S.a > 1, S.a > 7, and S.a > 11; see the sketch below.)
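A sketch of a grouped greater-than filter, using a sorted list in place of the binary tree (bisect stands in for the tree scan). The constants mirror the slide's example; the query IDs are made up, and this is not the paper's code.

```python
# Sketch of a grouped greater-than filter over S.a. The paper indexes the
# predicate constants in a binary tree; a sorted list with bisect plays the
# same role here. Query IDs are illustrative.
import bisect

class GreaterThanFilter:
    def __init__(self):
        self.constants = []   # sorted predicate constants
        self.queries = []     # query id for each constant, kept in the same order

    def insert(self, constant, query_id):
        pos = bisect.bisect_left(self.constants, constant)
        self.constants.insert(pos, constant)
        self.queries.insert(pos, query_id)

    def failing_queries(self, value):
        # a predicate "S.a > c" fails when c >= value, i.e. everything at or to
        # the right of value in the index
        pos = bisect.bisect_left(self.constants, value)
        return self.queries[pos:]

f = GreaterThanFilter()
for c, q in [(1, "Q1"), (7, "Q2"), (11, "Q3")]:
    f.insert(c, q)
print(f.failing_queries(9))   # -> ['Q3']  (only S.a > 11 rejects a tuple with S.a = 9)
```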
13. Work Sharing through Lineage
Q1: SELECT * FROM S WHERE A, B, C    Q2: SELECT * FROM S WHERE A, B, D
(Diagram: conventional static plans for Q1 and Q2 over data stream S can share work only by applying the common predicates A and B first; encoding lineage (queries-completed bits) in each tuple enables any operator ordering, at the cost that tuples in the intersection of C and D may go through A and B an extra time.)
14. Overhead vs. Work Sharing
- Overhead is in the additional bits per tuple
  - Experiments studying performance and size are in the paper
  - One bit per query per tuple is the most significant cost
- Trading accounting overhead for work sharing
  - 100 bits / tuple allows a tuple to be processed once, not 100 times
- Reduce overhead by not keeping state about operators the tuple will never pass through
15. Joins
- Use symmetric hash join to avoid blocking
- Use State Modules (SteMs) to share storage between joins with a common base relation
16. Joins via SteMs
- Idea: share join indices over base relations
- State Modules (SteMs) are
  - Unary indexes (e.g. hash tables, trees)
  - Built on the fly (as data arrives)
  - Scheduled by CACQ as first-class operators
- Based on symmetric hash join (sketch below)
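A sketch of the symmetric hash join at the heart of a SteM pairing: each arriving tuple is inserted into its own side's index and probed against the other side's, so join results stream out without blocking. The relation names and join key are assumptions, and windowing is omitted.

```python
# Sketch of a (windowless) symmetric hash join; relation names and the join
# key "id" are illustrative, not from the paper.
from collections import defaultdict

class SymmetricHashJoin:
    def __init__(self, key):
        self.key = key
        self.index = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, side, tup):
        """Insert a tuple from side 'R' or 'S' and return the join results it produces."""
        k = tup[self.key]
        self.index[side][k].append(tup)       # build: add to this side's index (SteM)
        other = "S" if side == "R" else "R"
        # probe: match against everything already stored on the other side
        return [{**tup, **match} for match in self.index[other][k]]

join = SymmetricHashJoin(key="id")
print(join.insert("R", {"id": 1, "r": "x"}))  # -> [] (nothing on S yet)
print(join.insert("S", {"id": 1, "s": "y"}))  # -> one joined tuple
```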
17. Routing Policies (1)
- Basic lottery ticket policy
  - Give operators tickets for consuming tuples, take away tickets for producing them
  - To choose the next operator to route to, run a lottery (sketch below)
  - More selective operators get scheduled earlier
- Modification for CACQ
  - Give more tickets to operators shared by multiple queries (e.g. grouped filters)
  - When a shared operator outputs a tuple, charge it multiple tickets
  - Intuition: cardinality-reducing shared operators reduce global work more than unshared operators
  - Not optimizing for the throughput of a single query!
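A sketch of the lottery bookkeeping; the operator names, initial ticket counts, and the shared-operator weighting are illustrative assumptions, not the paper's code.

```python
# Sketch of lottery-based routing: operators earn tickets when they consume
# tuples and are charged when they produce them, so selective operators
# accumulate tickets and win more lotteries.
import random

tickets = {"R.a > 10": 1, "R.b < 15": 1}   # start every operator with one ticket

def choose_operator(eligible):
    """Run a lottery among the operators the tuple has not visited yet."""
    names = list(eligible)
    weights = [tickets[n] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

def account(op, produced, shared_by=1):
    # consuming earns tickets; a shared operator could earn 'shared_by' tickets,
    # and be charged the same amount when it produces a tuple
    tickets[op] += shared_by
    if produced:
        tickets[op] = max(1, tickets[op] - shared_by)

print(choose_operator(["R.a > 10", "R.b < 15"]))
```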
18. Routing Policies (2)
(Experiment: an example query whose attributes are all uniformly distributed over [0, 100]; results figure omitted.)
19. CACQ vs. NiagaraCQ
- NiagaraCQ is another CQ proposal, which uses static plans
- CACQ does better since
  - It is adaptive
  - It can exploit more work-sharing opportunities
20. Summary
- Efficient mechanism for processing multiple simultaneous monitoring queries over streaming data sources
- Share work by processing all queries within a single eddy
- Continuous adaptivity via the eddy's routing policy
  - Queries come and go, but performance adapts without costly multi-query reoptimization
- Maximize work sharing by explicitly encoding lineage
  - Share selections via grouped filters
  - Share join state via SteMs
- Experimental results show good performance compared with other proposed CQ systems
21. Discussion
- What was the actual intellectual contribution of this paper?
- Performance on real data?
- What is the overhead of the routing? Other routing policies?
- How often do we collect data? Can we do some processing of the data in the nodes that acquire it, in order to reduce bandwidth?
- Hardware support for these service-oriented components?
22. Thank you!!
ipandis_at_cs.cmu.edu