Title: Continuously Adaptive Continuous Queries (CACQ) over Streams
1. Continuously Adaptive Continuous Queries (CACQ) over Streams
Samuel Madden, Mehul Shah, Joseph Hellerstein,
and Vijayshankar Raman
Presented by Bhuvan Urgaonkar
2. CACQ Introduction
- Proposed continuous query (CQ) systems are based on static plans
  - But CQs are long running
  - Assumptions that were valid initially become less so over time
  - Static optimizers at their worst!
- CACQ insight: apply the continuous adaptivity of eddies to continuous queries
  - Dynamic operator ordering avoids the static-optimizer danger
  - Process multiple queries simultaneously
  - Interestingly, enables sharing of work and storage
3. Outline
- Background
  - Motivation
  - Continuous Queries
  - Eddies
- CACQ
  - Contributions
  - Example-driven explanation
  - Results & Experiments
4. Outline
- Background
  - Motivation
  - Continuous Queries
  - Eddies
- CACQ
  - Contributions
  - Example-driven explanation
  - Results & Experiments
5. Motivating Applications
- Monitoring queries look for recent events in data streams
  - Sensor data processing
  - Stock analysis
  - Router, web, or phone events
- In CACQ, we confine our view to queries over recent history
  - Only tuples currently entering the system
  - Stored in in-memory data tables for time-windowed joins between streams
6. Continuous Queries
- Long-running, standing queries, similar to trigger systems
  - Once installed, they continuously produce results until removed
- Lots of queries over the same data sources
  - Opportunity for work sharing!
- Idea: adaptive heuristics
7. Eddies: Adaptivity
- Eddies (Avnur & Hellerstein, SIGMOD 2000): continuous adaptivity
  - No static ordering of operators
  - Policy dynamically orders operators on a per-tuple basis
  - done and ready bits encode where a tuple has been and where it can go
8. Outline
- Background
  - Motivation
  - Continuous Queries
  - Eddies
- CACQ
  - Contributions
  - Example-driven explanation
  - Results & Experiments
9. CACQ Contributions
- Adaptivity
  - Policies for continuous queries
  - Single eddy for multiple queries
- Tuple lineage
  - In addition to ready and done bits, encode a tuple's output history in its queriesCompleted bits
  - Enables flexible sharing of operators between queries
- Grouped filter
  - Efficiently compute selections over multiple queries
- Join sharing through State Modules (SteMs)
10. Explication by Example
- First, an example with just one query and only selections
- Then, add multiple queries
- Then, (briefly) discuss joins
11. Eddies & CACQ: Single Query, Single Source

SELECT * FROM R WHERE R.a > 10 AND R.b < 15

- Use ready bits to track what to do next
  - All 1s for a single source
- Use done bits to track what has been done
  - Tuple can be output when all bits are set
- Routing policy dynamically orders tuples (a rough code sketch follows the figure below)
[Figure: the eddy routes example tuples R1 (a=5, b=25) and R2 (a=15, b=0) through the two selections; each tuple's ready and done bits are updated as it visits an operator, until the tuple is either dropped or output with all done bits set.]
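As a rough sketch of this mechanism (not the actual Telegraph code), the eddy can be modeled as a loop that routes a tuple to an operator it has not yet visited, marks the corresponding done bit when the tuple returns, and outputs the tuple once all bits are set. The class and method names below (StreamTuple, SelectionOp, SingleQueryEddy) are hypothetical.

```java
import java.util.BitSet;
import java.util.List;

// Illustrative sketch of eddy-style routing for a single query with two
// selections (R.a > 10 AND R.b < 15); names are hypothetical, not Telegraph's.
public class SingleQueryEddy {

    // A tuple tagged with ready/done bits, one bit per operator.
    static class StreamTuple {
        final int a, b;
        final BitSet ready = new BitSet();  // operators the tuple may visit next
        final BitSet done = new BitSet();   // operators the tuple has already passed
        StreamTuple(int a, int b) { this.a = a; this.b = b; }
    }

    interface SelectionOp {
        boolean apply(StreamTuple t);       // true if the tuple passes the predicate
    }

    private final List<SelectionOp> ops;

    SingleQueryEddy(List<SelectionOp> ops) { this.ops = ops; }

    // Route a newly arrived tuple until it is output or dropped.
    void process(StreamTuple t) {
        t.ready.set(0, ops.size());         // single source: all ready bits start at 1
        while (t.done.cardinality() < ops.size()) {
            int next = pickNext(t);         // routing policy: here, lowest-numbered ready op
            if (!ops.get(next).apply(t)) {
                return;                     // failed a predicate: drop the tuple
            }
            t.done.set(next);               // record where the tuple has been
            t.ready.clear(next);
        }
        output(t);                          // all done bits set: tuple satisfies the query
    }

    private int pickNext(StreamTuple t) { return t.ready.nextSetBit(0); }

    private void output(StreamTuple t) {
        System.out.println("output: a=" + t.a + ", b=" + t.b);
    }

    public static void main(String[] args) {
        List<SelectionOp> ops = List.of(
                t -> t.a > 10,              // R.a > 10
                t -> t.b < 15);             // R.b < 15
        SingleQueryEddy eddy = new SingleQueryEddy(ops);
        eddy.process(new StreamTuple(15, 0));   // passes both predicates, is output
        eddy.process(new StreamTuple(5, 25));   // fails R.a > 10, dropped
    }
}
```

Here the routing choice is trivially "lowest-numbered ready operator"; the routing-policy slide later in the deck replaces this with a ticket-based lottery.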
12. Multiple Queries

[Figure: grouped filters over R.a (e.g. R.a > 10, R.a > 20) and over R.b (e.g. R.b < 15, R.b <> 50) process tuple R1 (a=5, b=25); the tuple's bits are filled in as it visits each grouped filter.]
13. Multiple Queries (continued)

[Figure: the same grouped filters process tuple R2 (a=15, b=0), but the eddy reorders the operators; routing is chosen per tuple, so the operator order can differ from tuple to tuple.]
14Outputting Tuples
completionMasks completionMasks completionMasks completionMasks completionMasks
? a b c d
Q1 1 1 0 0
Q2 0 1 1 1
- Store a completionMask bitmap for each query
- One bit per operator
- Set if the operator in the query
- To determine if a tuple t can be output to query
q - Eddy ANDs qs completionMask with ts done bits
- Output only if qs bit not set in ts
queriesCompleted bits - Every time a tuple returns from an operator
completionMasks
Done 1100
QueriesCompleted0 0
Q1 1100
Q2 0111
Done 0111
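A minimal sketch of this output check, assuming a BitSet-based representation of the masks and bits; the names below (OutputCheck, maybeOutput, the completionMasks map) are illustrative rather than the Telegraph implementation.

```java
import java.util.BitSet;
import java.util.Map;

// Illustrative sketch of the per-query output check using completionMasks,
// done bits, and queriesCompleted bits; names are hypothetical.
public class OutputCheck {

    // Build a completionMask: one bit per operator, set if the operator
    // participates in the query.
    static BitSet mask(int... opIndexes) {
        BitSet m = new BitSet();
        for (int i : opIndexes) m.set(i);
        return m;
    }

    // Called every time a tuple returns from an operator.
    static void maybeOutput(BitSet done, BitSet queriesCompleted,
                            Map<Integer, BitSet> completionMasks) {
        for (Map.Entry<Integer, BitSet> e : completionMasks.entrySet()) {
            int queryId = e.getKey();
            if (queriesCompleted.get(queryId)) continue;   // already handled for this query

            BitSet needed = (BitSet) e.getValue().clone();
            needed.andNot(done);                           // operators still missing for this query
            if (needed.isEmpty()) {                        // done bits cover the query's mask
                System.out.println("output tuple to query " + queryId);
                queriesCompleted.set(queryId);             // never output to this query again
            }
        }
    }

    public static void main(String[] args) {
        // Operators indexed a=0, b=1, c=2, d=3, as on the slide.
        Map<Integer, BitSet> masks = Map.of(
                1, mask(0, 1),        // Q1 uses operators a, b   (mask 1100)
                2, mask(1, 2, 3));    // Q2 uses operators b, c, d (mask 0111)

        BitSet done = mask(0, 1);     // this tuple has passed a and b
        BitSet queriesCompleted = new BitSet();
        maybeOutput(done, queriesCompleted, masks);   // outputs to Q1 only
    }
}
```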
15. Grouped Filter
- Use binary trees to efficiently index range predicates (a code sketch follows the figure)
  - Two trees (LT and GT) per attribute
  - Insert the predicate's constant
- When a tuple arrives
  - Scan everything to the right (for GT) or left (for LT) of the tuple's attribute value in the tree
  - Those are the queries that the tuple does not pass
- Hash tables index equality and inequality predicates

[Figure: greater-than tree over S.a containing the predicates S.a > 1, S.a > 7, and S.a > 11.]
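A rough sketch of a grouped filter for greater-than predicates, using a sorted map over the predicate constants as a stand-in for the binary tree described above; all names are illustrative.

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch of a grouped filter for ">" predicates over one attribute.
// A sorted map over predicate constants stands in for the slide's binary tree.
public class GreaterThanGroupedFilter {

    // constant -> ids of queries whose predicate is "attr > constant"
    private final NavigableMap<Double, BitSet> tree = new TreeMap<>();

    // Install a predicate "attr > constant" for the given query.
    void addPredicate(double constant, int queryId) {
        tree.computeIfAbsent(constant, c -> new BitSet()).set(queryId);
    }

    // Return the ids of queries whose predicate the attribute value passes:
    // every query whose constant is strictly less than the value.
    BitSet matchingQueries(double value) {
        BitSet passing = new BitSet();
        for (BitSet qs : tree.headMap(value, false).values()) {
            passing.or(qs);
        }
        // Equivalently, the entries to the right (constants >= value) are the
        // queries the tuple does NOT pass, as described on the slide.
        return passing;
    }

    public static void main(String[] args) {
        GreaterThanGroupedFilter f = new GreaterThanGroupedFilter();
        f.addPredicate(1, 0);   // query 0: S.a > 1
        f.addPredicate(7, 1);   // query 1: S.a > 7
        f.addPredicate(11, 2);  // query 2: S.a > 11
        System.out.println(f.matchingQueries(9));  // {0, 1}: passes S.a > 1 and S.a > 7
    }
}
```

A less-than tree is symmetric (scan with tailMap instead of headMap), and equality and inequality predicates would be indexed with hash tables, as the slide notes.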
16. Work Sharing via Tuple Lineage

Q1: SELECT * FROM S WHERE A, B, C
Q2: SELECT * FROM S WHERE A, B, D

[Figure: over data stream S, conventional static plans must apply the shared filters A and B first in order to share them, and without sharing, tuples in the intersection of C and D pass through A and B an extra time; lineage (the queriesCompleted bits) enables any operator ordering, e.g. C, D, B, A, while A and B are still applied only once per tuple.]
17. Tradeoff: Overhead vs. Shared Work
- Overhead is in the additional bits per tuple
  - Experiments studying performance and size are in the paper
  - The bit / query / tuple cost is the most significant
- Trading accounting overhead for work sharing
  - 100 bits / tuple allows a tuple to be processed once, not 100 times
- Reduce overhead by not keeping state about operators a tuple will never pass through
18. Joins in CACQ
- Use symmetric hash join to avoid blocking (sketched below)
- Use State Modules (SteMs) to share storage between joins with a common base relation
- Details about the effect on implementation and the benefit are in the paper
- See Raman, UC Berkeley Ph.D. Thesis, 2002
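A minimal sketch of the symmetric (pipelined) hash join idea, with each side's hash table playing the role of a SteM that several joins over the same base relation could in principle share; the classes below (SymmetricHashJoin, SteM) are illustrative, not the Telegraph code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a symmetric hash join: each arriving tuple is first
// built into its own side's hash table (the SteM), then probes the other side,
// so the join never blocks waiting for either input to finish.
public class SymmetricHashJoin<T> {

    // A SteM: a hash table over one stream, keyed on the join attribute.
    static class SteM<T> {
        private final Map<Object, List<T>> table = new HashMap<>();
        void build(Object key, T tuple) {
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        }
        List<T> probe(Object key) {
            return table.getOrDefault(key, List.of());
        }
    }

    private final SteM<T> left = new SteM<>();
    private final SteM<T> right = new SteM<>();

    // Called when a tuple arrives on the left stream; symmetric for the right.
    List<T> onLeftTuple(Object joinKey, T tuple) {
        left.build(joinKey, tuple);          // remember it for future right tuples
        return right.probe(joinKey);         // join against right tuples seen so far
    }

    List<T> onRightTuple(Object joinKey, T tuple) {
        right.build(joinKey, tuple);
        return left.probe(joinKey);
    }

    public static void main(String[] args) {
        SymmetricHashJoin<String> join = new SymmetricHashJoin<>();
        System.out.println(join.onLeftTuple("IBM", "stocks: IBM @ 120"));   // [] (no articles yet)
        System.out.println(join.onRightTuple("IBM", "article about IBM"));  // [stocks: IBM @ 120]
    }
}
```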
19. Routing Policies
- The system described so far provides correctness; the routing policy is responsible for performance
- The policy is consulted to determine where to route every tuple that
  - Enters the system
  - Returns from an operator
- Basic ticket policy (sketched in code below)
  - Give operators tickets for consuming tuples; take away tickets for producing them
  - To choose the next operator to route to, run a lottery
  - More selective operators are scheduled earlier
- Modification for CACQ
  - Give more tickets to operators shared by multiple queries (e.g. grouped filters)
  - When a shared operator outputs a tuple, charge it multiple tickets
  - Intuition: cardinality-reducing shared operators reduce global work more than unshared operators
  - Not optimizing for the throughput of a single query!
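A rough sketch of lottery-based ticket routing under these rules; the bookkeeping below (OperatorStats, onConsume/onProduce, the sharedQueries weighting) is illustrative, with the CACQ-specific tweaks noted in comments.

```java
import java.util.List;
import java.util.Random;

// Illustrative sketch of the ticket-based lottery routing policy: operators
// earn tickets for consuming tuples and lose them for producing tuples, so
// more selective operators accumulate tickets and get scheduled earlier.
public class TicketRoutingPolicy {

    static class OperatorStats {
        final String name;
        long tickets = 1;                 // start with one ticket so every operator can win
        OperatorStats(String name) { this.name = name; }
    }

    private final List<OperatorStats> ops;
    private final Random rng = new Random();

    TicketRoutingPolicy(List<OperatorStats> ops) { this.ops = ops; }

    void onConsume(OperatorStats op, int sharedQueries) {
        // CACQ tweak: grant shared operators (e.g. grouped filters) more tickets.
        op.tickets += sharedQueries;
    }

    void onProduce(OperatorStats op, int sharedQueries) {
        // CACQ tweak: charge shared operators multiple tickets per output tuple.
        op.tickets = Math.max(1, op.tickets - sharedQueries);
    }

    // Run a lottery over the eligible operators, weighted by their tickets.
    OperatorStats pickNext(List<OperatorStats> eligible) {
        long total = eligible.stream().mapToLong(o -> o.tickets).sum();
        long draw = (long) (rng.nextDouble() * total);
        for (OperatorStats op : eligible) {
            draw -= op.tickets;
            if (draw < 0) return op;
        }
        return eligible.get(eligible.size() - 1);   // fallback for rounding
    }

    public static void main(String[] args) {
        OperatorStats selective = new OperatorStats("R.a > 90");
        OperatorStats permissive = new OperatorStats("R.e > 10");
        TicketRoutingPolicy policy = new TicketRoutingPolicy(List.of(selective, permissive));
        // The selective filter consumes tuples but rarely produces them, so it
        // accumulates tickets and tends to win the lottery.
        for (int i = 0; i < 100; i++) policy.onConsume(selective, 1);
        for (int i = 0; i < 100; i++) { policy.onConsume(permissive, 1); policy.onProduce(permissive, 1); }
        System.out.println(policy.pickNext(List.of(selective, permissive)).name);
    }
}
```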
20. Outline
- Background
  - Motivation
  - Continuous Queries
  - Eddies
- CACQ
  - Contributions
  - Example-driven explanation
  - Results & Experiments
21. Evaluation
- Real Java implementation on top of the Telegraph QP
  - 4,000 new lines of code in a 75,000-line codebase
- Server platform
  - Linux 2.4.10
  - Pentium III 733 MHz, 756 MB RAM
- Queries posed from a separate workstation
  - Output suppressed
- Lots of experiments in the paper; just a few here
22. Results: Routing Policy

All attributes uniformly distributed over [0, 100]

    Query 1: From S select index where a > 90
    Query 2: From S select index where a > 90 and b > 70
    Query 3: From S select index where a > 90 and b > 70 and c > 50
    Query 4: From S select index where a > 90 and b > 70 and c > 50 and d > 30
    Query 5: From S select index where a > 90 and b > 70 and c > 50 and d > 30 and e > 10
23. CACQ vs. NiagaraCQ
- Performance is competitive on the workload from the NiagaraCQ paper
- A different workload on which CACQ outperforms NiagaraCQ:

    SELECT stocks.sym, articles.text
    FROM stocks, articles
    WHERE stocks.sym = articles.sym AND UDF(stocks)

  (the UDF is expensive)

- See Chen et al., SIGMOD 2000, ICDE 2002
24CACQ vs. NiagaraCQ 2
SA
SA
SA
Lineage Allows Join To Be Applied Just Once
S
A
No shared subexpressions, so no shared work!
25. CACQ vs. NiagaraCQ: Graph
26. Conclusion
- CACQ: sharing and adaptivity for high-performance monitoring queries over data streams
- Features
  - Adaptivity: adapt to a changing query workload without costly multi-query reoptimization
  - Work sharing via tuple lineage, without constraining the available plans
  - Computation sharing via grouped filters
  - Storage sharing via SteMs
- Future work
  - More sophisticated routing policies
  - Batching / query grouping
  - Better integration with historical results (Chandrasekaran, VLDB 2002)