Continuous%20Query%20Languages%20for%20DSMS - PowerPoint PPT Presentation

About This Presentation

Title:

Continuous%20Query%20Languages%20for%20DSMS

Description:

In PODS, Santa Barbara, CA, May 2001. [25]P. Seshadri. ... Tribeca: A stream database manager for network traffic analysis. In VLDB, 1996. ... – PowerPoint PPT presentation

Number of Views:53

Avg rating:3.0/5.0

Slides: 32

Provided by: mir135

Learn more at: http://web.cs.ucla.edu

Category:

more less

Transcript and Presenter's Notes

Title: Continuous%20Query%20Languages%20for%20DSMS

1
Continuous Query Languages for DSMS

CS240B Notes
by
Carlo Zaniolo

2
DSMS Languages for Continuous Queries

Relational Algebra Operators
SQL and User-Defined aggregates
The Blocking problem
The expressive Power problem
XML streams and their query languages.

3
CQLs for DSMS

Most of DSMS projects use SQL for continuous
queriesfor good reasons, since
Many applications span data streams and DB tables
A CQL based on SQL will be easier to learn use
Moreover the fewer the differences the better!
But DSMS were designed for persistent data and
transient queries---not for persistent queries on
transient data
Adaptation of SQL and its enabling technology
presents many research challenges
Lack of expressive powereven worse now since
only nonblocking operators are allowed.

4
Continuous Query Graph many componentsarbitrary
DAGs
?
Source1
U
Sink
Source2
s
5
Relational Algebra Operators

Stored data
Selection, Projection
Union
Join (including X) on tables
Set Difference
Aggregates
Traditional Blocking aggregates
OLAP functions on windows or unlimited preceding

Data Streams
... same
Union by Sort-Merging on timestamps
Join of Stream with table
Window joins on streams (timestamps merged into 1
column)
No stream difference (blockingdiff of stream
with table OK).
Aggregates
No blocking aggregate
OLAP functions on windows or unlimited preceding
Slides, and tumbles.

6
Bolts and Nuts

create stream bids(bid, item, offer, Time)
create stream mybids as (select bid, offer, Time
from bids
where itembolt
union
select bid, offer,
Time
from bids
where itemnut)
Result same as select bid, offer, Time where
item bolt or itemnut

7
Joins

We could create a stream called interesting bids
by say joining bids with the interesting_items
table.
We next find the bolt bids for which there was a
nut bid offered in the last 5 minutes for the
same price.
create stream selfjoinbids as
(select S1.bid, S1.offer, S2.bid,
S2.Time
from bids as S1, bids as S2 window of
5 minutes
where S1.itembolt and S2.itemnut and
S1.offerS2.offer)
The window condition implies that S1.Time gt
S2.Time and S2.Time gt S1.Time-5 minutes.
Windows on both streams are used very often.

\
8
Processing Unions

Union When tuples are present at all inputs,
select one with minimal timestamp and
Production add this tuple to the output, and
Consumption remove it from the input.

9
Window Joins
A
B

Window Join of Stream A and Stream BWhen tuples
are present at both inputs, and the timestamp of
A is less or equal than that of B, then perform
the following operations (symmetric operations
are performed if timestamp of B is less or equal
than that of A)
Production compute the join of the tuple in A
with the tuples in W(B) and add the resulting
tuples to output buffer (these tuple have the
same timestamp a the tuple in A)
Consumption the current tuple in A is removed
from the input and added to the window buffer
W(A) (from which the expired tuples are also
removed)

10
Relational Algebra Operators

Stored data
Selection, Projection
Union
Join (including X) on tables
Set Difference
Aggregates
Traditional Blocking aggregates
OLAP functions on windows or unlimited preceding

Data Streams
... same
Union by Sort-Merging on timestamps
Join of Stream with table
Window joins on streams (timestamps merged into 1
column)
No stream difference (blockingdiff of stream
with table OK).
Aggregates
No blocking aggregate
OLAP functions on windows or unlimited preceding
Slides, and tumbles.
Including UDAs

11
User-Defined AggregatesMax Power via Min SQL
Extensions

Windows (logical, physical, slides, tumbles,)
flexible synopses that solve the blocking problem
for aggregates
DSMS only support these constructs on built-in
aggregates
ESL is the first to support the complete
integration of these two
User Defined Aggregates (UDAs) the key to power
and extensibility, and
And thus can support data mining,
XML,
sequences not supported by other DSMS
One framework for aggregates and windows, whether
they are built-ins or user-defined, and
independent on the language used to define
them.

12
Defining Traditional Aggregates

Specification consists of 3 blocks of code---
Written in an external PL (as DBMS and other DSMS
do), or
In SQL itself (SQL becomesTuring Complete!)
INITIALIZE
Executed upon the arrival of the first tuple
ITERATE
Executed upon the arrival of each subsequent
tuples (an incremental computation suitable for
streams)
TERMINATE
Executed after the end of the relation/stream has
been reached
Invocation SELECT myavg(start_price) FROM
OpenAuction

13
The UDA AVG in SQL

AGGREGATE avg(Next Int) Real
TABLE state(tsum Int, cnt Int)
INITIALIZE
INSERT INTO state VALUES (Next, 1)
ITERATE
UPDATE state
SET tsumtsumNext, cntcnt1
TERMINATE
INSERT INTO RETURN
SELECT tsum/cnt FROM state

INSERT INTO RETURN in TERMINATE ? a blocking
UDA

14
NonBlocking UDA AVG of last 200 Values

AGGREGATE myavg(Next Int) Real
TABLE state(tsum Int, cnt Int)
INITIALIZE
INSERT INTO state VALUES (Next, 1)
ITERATE
UPDATE state SET tsumtsumNext, cntcnt1
INSERT INTO RETURN
SELECT tsum/cnt FROM state
WHERE cnt 200 0
UPDATE state SET tsumNext, cnt1
WHERE cnt 200 1
TERMINATE

Empty TERMINATE Denotes a non-blocking UDA

15
UDAs in ESL

In ESL user-defined Aggregates (UDAs) can be
defined directly in SQL, rather than in a PL
Native extensibility in SQL via UDAs (which can
also be defined in a PL for better performance)
No impedance mismatch
Access to DB tables from UDAs
Data Independence and optimization
Good ease of use and performance
Turing completeness nb-completeness.

16
Data Intensive Applications UDAs

Complex Applications can expressed concisely,
with good performance
ATLAS a single-user DBMS developed at UCLA.
Support for SQL with UDAs
On top of Berkeley-DB record manager.
Data Mining Algorithms in ATLAS
Decision Tree Classifiers 18 lines of codes
APriori 40 lines of codes
Modest overhead lt50 w.r.t procedural UDA
Data Stream Applications in ESL
Data Stream Mining, approximate aggregates,
sketches, histograms,

17
SQL2003 OLAP FunctionsAggregates on Windows
CREATE STREAM ClosedAuction (/auction closings
/ itemID, /id of
the item in this auction./ buyerID
/buyer of this item./) Final price real
/final price of the item /,
Current_time) order by source
Auctions

For each seller, show the average selling price
over the last 10 items sold (physical window)

CREATE STREAM LastTenAvg SELECT
sellerID, AVG(price) OVER(PARTITION BY sellerID
ROWS 9 PRECEDING), Current_time FROM
ClosedPrice
18
Optimizing Window AVG in ESL

For each expired tuple decrease the count by one
and the sum by the expired valueworks for
logical physical windows

WINDOW AGGREGATE avg(Next Real) Real TABLE
state(tsum Int, cnt Real) TABLE
inwindow(wnext Real)
INITIALIZE INSERT INTO state
VALUES (Next, 1)
ITERATE UPDATE state SET tsumtsumNext,
cntcnt1 INSERT INTO RETURN SELECT
tsum/cnt FROM state
EXPIRE /if there are expired tuples, take the
oldest / UPDATE state SET cnt
cnt-1, tsum tsum (select wnext
FROM inwindow
WHERE oldest(inwindow))

19
MAX

System maintains inwindow
Remove dominated (less older) values
The oldest is always the max.

WINDOW AGGREGATE max (Next Real) Real TABLE
inwindow(wnext real) INITIALIZE etc.
/system adds new tuples to inwindow/
ITERATE DELETE FROM inwindow WHERE
wnext ltNext INSERT INTO RETURN
SELECT wnext FROM inwindow
WHERE oldest(inwindow)
EXPIRE /expired tuples removed
automatically/
20
For Each Aggregate two versions

The traditional Base aggregate with terminate
The Window aggregate with inwindow and expire.
These definitions will take care of both logical
and physical windows.
But there are more complications slides and
tumbles.

21
Slides and Tumbles

Every two minutes, show the average selling
price over the last 10 minutes (logical window)

CREATE STREAM LastTenAvg SELECT sellerID,
max(price) OVER(RANGE 10 MINUTE PRECEDING
SLIDE 2
MINUTE), Current_time FROM ClosedPrice
Here the window is W10 and the slide is S2.
Tumble When S W
22
SLIDEs

The slide constructs divides a window into panes,
results only returned at the end of each pane
Slide is conducive to optimization.
Combine summaries into the desired aggregation
E.g. MAX(1, 2, 3, 4) MAX(MAX(1,2), MAX(3,4))
4
I.e., for MAX, we can perform MAX on subsets of
numbers as local summaries, then combine them
together to get the true MAX
Proposed before but what constructs should be
used to integrate these concepts into the
language?

23
Slides Tumbles--Examples

Tumble where the SLIDE size is equal or larger
than the window size
E.g. Once every 50 tuples, compute and return
average over the last 10 tuples
Easy to optimize
Skip the first 40 tuples of every 50 tuples, and
compute the blocking base version of the
aggregate on the last 10
Slide where slide size is smaller than the
window size
E.g. Once every 10 tuples, compute and return
average over the last 50 tuples
Naïve implementation--not optimized
Perform incremental maintenance on every incoming
tuple
Ignore RETURN statements for most incoming tuples
Only invoke RETURN once every 10 tuples

24
Pane-based SLIDE Optimization

Two-level cascading aggregates using two existing
aggregates
Perform sub-aggregation inside each pane using
the base aggregate No need for incremental
maintenance here
Computed with a blocking aggregate once for each
pane
Combine the summary tuples using the window
aggregate that returns on every incoming tuple
(non-blocking)
With incremental maintenance here
At any time, only the last un-finished pane needs
to store data tuples
all finished panes are reduced to one reusable
summary tuple

25
Pane-based SLIDE optimization

Example SUM with window size 50 tuples, and
slide size 10 tuples
First create a stream of summary tuples using
base aggregate
CREATE STREAM temp AS (
SELECT itemID, base_max(sale_price) as s
OVER(PARTITION BY itemID ROWS 9 PRECEDING
SLIDE 10)
FROM Auction)
Then apply the window version of the aggregate
SELECT itemID, window_max(s)
OVER(PARTITION BY itemID ROWS 4 PRECEDING)
FROM temp

This simple approach can be used to implement
very complex aggregations (e.g. ensemble
classifiers)
Applies uniformly to logical/physical windows
defined in SQL or in an external language

26
Summary

Logical, Physical x tumble, slide,
unlimited_preceding
Six different types of calls, supported by two
definitions
Both SQL or procedural languages can be used in
the definition.

27
Window UDAs vs. Base UDAs

Base UDAs
called as traditional SQL-2 aggregates, with
optional GROUP BY
Window UDAs
called with SQL2003 OVER clause
logical or physical windows
optional PARTITION BY and SLIDE clauses in ESL
Clear semantics and optimization rules unify
UDAsSQL or PL-defined, algebraic or not
window (logical physical), slice, tumbles,
etc.
System and user roles in optimization.

28
Window UDAs Physical Optimization

The Stream Mill System provides efficient support
for
Management of new expiring tuples in buffer
Main memory intelligent paging into disk
Events caused by tuple expiration
Users can access the buffer as the table called
inwindow

29
Conclusion

Language Technology
ESL a very powerful language for data stream and
DB applications
Simple semantics and unified syntax conforming
to SQL2003 standards
Strong case for the DB-oriented approach to data
streams
System Technology
Some performance-oriented techniques
well-developede.g., buffer management for
windows
For others work is still in progressstay tuned
for latest news
Stream Mill is up and running http//wis.cs.ucla.
edu/stream-mill

The End
THANK YOU !

31
References

1ATLaS user manual. http//wis.cs.ucla.edu/atlas
.
2SQL/LPP A Time Series Extension of SQL Based
on Limited Patience Patterns, volume 1677 of
Lecture Notes in Computer Science. Springer,
1999.
4A. Arasu, S. Babu, and J. Widom. An abstract
semantics and concrete language for continuous
queries over streams and relations. Technical
report, Stanford University, 2002.
5B. Babcock, S. Babu, M. Datar, R. Motwani, and
J. Widom. Models and issues in data stream
systems. In PODS, 2002.
9D. Carney, U. Cetintemel, M. Cherniack, C.
Convey, S. Lee, G. Seidman, M. Stonebraker, N.
Tatbul, and S. Zdonik. Monitoring streams - a new
class of data management applications. In VLDB,
Hong Kong, China, 2002.
10J. Celko. SQL for Smarties, chapter Advanced
SQL Programming. Morgan Kaufmann, 1995.
11S. Chandrasekaran and M. Franklin. Streaming
queries over streaming data. In VLDB, 2002.
12J. Chen, D. J. DeWitt, F. Tian, and Y. Wang.
NiagaraCQ A scalable continuous query system for
internet databases. In SIGMOD, pages 379-390, May
2000.
13C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk,
and O. Spatscheck. Gigascope A stream database
for network applications. In SIGMOD Conference,
pages 647-651. ACM Press, 2003.
14Lukasz Golab and M. Tamer Özsu. Issues in
data stream management. ACM SIGMOD Record,
32(2)5-14, 2003.
15J. M. Hellerstein, P. J. Haas, and H. J.
Wang. Online aggregation. In SIGMOD, 1997.
16 Yijian Bai, Hetal Thakkar, Chang Luo, Haixun
Wang, Carlo Zaniolo A Data Stream Language and
System Designed for Power and Extensibility.
Proc. of the ACM 15th Conference on Information
and Knowledge Management (CIKM'06), 2006
17 Yijian Bai, Hetal Thakkar, Haixun Wang and
Carlo Zaniolo Optimizing Timestamp Management in
Data Stream Management Systems. ICDE 2007.

32
References (Cont.)

18 Yan-Nei Law, Haixun Wang, Carlo Zaniolo
Query Languages and Data Models for Database
Sequences and Data Streams. VLDB 2004 492-503
19 Sam Madden, Mehul A. Shah, Joseph M.
Hellerstein, and Vijayshankar Raman. Continuously
adaptive continuous queries over streams. In
SIGMOD, pages 49-61, 2002.
20R. Motwani, J. Widom, A. Arasu, B. Babcock,
M. Datar S. Babu, G. Manku, C. Olston, J.
Rosenstein, and R. Varma. Query processing,
approximation, and resource management in a data
stream management system. In First CIDR 2003
Conference, Asilomar, CA, 2003.
21R. Ramakrishnan, D. Donjerkovic, A.
Ranganathan, K. Beyer, and M. Krishnaprasad.
SRQL Sorted relational query language, 1998.
23Reza Sadri, Carlo Zaniolo, and Amir M.
Zarkesh andJafar Adibi. A sequential pattern
query language for supporting instant data
minining for e-services. In VLDB, pages 653-656,
2001.
24Reza Sadri, Carlo Zaniolo, Amir Zarkesh, and
Jafar Adibi. Optimization of sequence queries in
database systems. In PODS, Santa Barbara, CA, May
2001.
25P. Seshadri. Predator A resource for
database research. SIGMOD Record, 27(1)16-20,
1998.
26P. Seshadri, M. Livny, and R. Ramakrishnan.
SEQ A model for sequence databases. In ICDE,
pages 232-239, Taipei, Taiwan, March 1995.
27Praveen Seshadri, Miron Livny, and Raghu
Ramakrishnan. Sequence query processing. In ACM
SIGMOD 1994, pages 430-441. ACM Press, 1994.
28M. Sullivan. Tribeca A stream database
manager for network traffic analysis. In VLDB,
1996.
29D. Terry, D. Goldberg, D. Nichols, and B.
Oki. Continuous queries over append-only
databases. In SIGMOD, pages 321-330, 6 1992.
30Peter A. Tucker, David Maier, Tim Sheard, and
Leonidas Fegaras. Exploiting punctuation
semantics in continuous data streams. IEEE Trans.
Knowl. Data Eng, 15(3)555-568, 2003.
31Haixun Wang and Carlo Zaniolo. ATLaS a
native extension of SQL for data minining. In
Proceedings of Third SIAM Int. Conference on Data
MIning, pages 130-141, 2003.