STREAM: The Stanford Data Stream Management System

Slides: 21
Provided by: shen162
Learn more at: http://web.cs.wpi.edu

1
STREAM: The Stanford Data Stream Management System
  • Proponent Team
  • Andy Mason
  • Sheng Zhong
  • CS525s - Fall 2006

2
Introduction
  • Motivation
  • Overview
  • Features
  • User Interfaces
  • Future work

3
Motivation
  • No intelligent way for applications with
    continuous data streams to query data using
    regular databases
  • Network monitoring
  • Telecommunications data management
  • Sensor networks
  • Reinvestigate data management and query
    processing
  • Multiple, continuous, rapid, time-varying data
    streams
  • Goal is to address the following
  • Basic theory results
  • Algorithms
  • Implementing a comprehensive prototype data
    stream management system

4
Overview of STREAM
  • STanford stREam datA Manager (STREAM)
  • Developed at Stanford from 2002/2003 to 2005
    (project ended January 2006)
  • Uses traditional relational operators in a
    streaming context
  • Basic query
  • Concept: SQL <tuple, timestamp>
  • Relational database
  • relation in, relation out (i.e.
    relation-to-relation)
  • STREAM
  • stream-to-relation
  • relation-to-stream
  • relation-to-relation
  • Developed new query language
  • Continuous Query Language (CQL)

5
Overview of STREAM (2)
  • Data streams and stored relations
  • Declarative language for registering continuous
    queries
  • Flexible query plans and execution strategies
  • Textual, graphical, and application interfaces
  • Relational, centralized (for now)

Ref: Widom, slide 8
6
The Big Picture
  • [Figure labels: DSMS; Scratch Store; Stored
    Relations]

Ref: Widom, slide 6
7
Network Monitoring
  • [Figure labels: Register Monitoring Queries;
    Network measurements, Packet traces; DSMS;
    Scratch Store; Lookup Tables; Intrusion Warnings;
    Online Performance Metrics]

Ref: Widom, slide 7
8
CQL: Continuous Query Language
  • Essentially a minor extension to SQL
  • Stream-to-relation
  • Sliding window over a stream
  • Relation-to-stream
  • 3 operators: Istream, Dstream, Rstream
  • Relation-to-relation
  • Basically uses standard SQL
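
The three relation-to-stream operators can be illustrated with a small sketch. It assumes the usual reading of the names: Istream emits tuples inserted into the relation since the previous instant, Dstream emits tuples deleted since then, and Rstream emits the whole relation at each instant. The function names and the list-of-tuples representation are illustrative assumptions, not STREAM's implementation.

```python
from collections import Counter

def istream(prev, curr):
    """Tuples present now that were absent at the previous instant (bag semantics)."""
    return list((Counter(curr) - Counter(prev)).elements())

def dstream(prev, curr):
    """Tuples present at the previous instant that are gone now."""
    return list((Counter(prev) - Counter(curr)).elements())

def rstream(prev, curr):
    """The entire current relation, emitted at every instant."""
    return list(curr)

# Hypothetical snapshots of one relation at two consecutive instants.
r_at_1 = [("a",), ("b",)]
r_at_2 = [("b",), ("c",)]

print(istream(r_at_1, r_at_2))  # [('c',)]
print(dstream(r_at_1, r_at_2))  # [('a',)]
print(rstream(r_at_1, r_at_2))  # [('b',), ('c',)]
```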

9
CQL Examples
  • Windowed join of two streams S1 and S2
      Select * From S1 [Rows 1000], S2 [Range 2 Minutes]
      Where S1.A = S2.A And S1.A > 10
  • Probe stored table R based on each tuple in
    stream S and stream the result
      Select Rstream(S.A, R.B) From S [Now], R
      Where S.A = R.A
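
To make the windowed join on this slide concrete, here is a minimal Python sketch of a row-based window, a time-based window, and the join over their current contents. The class names, the dictionary tuple layout, and the choice to keep elements whose timestamp is at least `now - seconds` are assumptions made for illustration. A real engine would maintain the join incrementally instead of recomputing it.

```python
from collections import deque

class RowsWindow:
    """Keeps the last n elements of a stream (a [Rows n] style window)."""
    def __init__(self, n):
        self.buf = deque(maxlen=n)
    def insert(self, ts, tup):
        self.buf.append((ts, tup))
    def contents(self, now):
        return [t for _, t in self.buf]

class RangeWindow:
    """Keeps elements from the last `seconds` of time (a [Range ...] style window)."""
    def __init__(self, seconds):
        self.seconds = seconds
        self.buf = deque()
    def insert(self, ts, tup):
        self.buf.append((ts, tup))
    def contents(self, now):
        # Assumption: an element expires once its timestamp is older than now - seconds.
        while self.buf and self.buf[0][0] < now - self.seconds:
            self.buf.popleft()
        return [t for _, t in self.buf]

def windowed_join(w1, w2, now):
    """Recompute the join S1.A = S2.A And S1.A > 10 over both windows at time `now`."""
    return [(a, b) for a in w1.contents(now) for b in w2.contents(now)
            if a["A"] == b["A"] and a["A"] > 10]

s1 = RowsWindow(2)       # stand-in for S1 [Rows 1000], shrunk for the demo
s2 = RangeWindow(120)    # S2 [Range 2 Minutes], timestamps in seconds
for ts, a in [(0, 5), (10, 20), (20, 30)]:
    s1.insert(ts, {"A": a})
for ts, a in [(0, 20), (100, 30)]:
    s2.insert(ts, {"A": a})

result = windowed_join(s1, s2, now=130)
print(result)  # [({'A': 30}, {'A': 30})]
```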

10
Query Plans
  • Each registered CQL query is compiled into a
    STREAM query plan
  • Query plan components
  • Operators perform the processing
  • Queues buffer tuples
  • Synopses store operator state
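
The three plan components can be sketched as follows. This is a toy model under stated assumptions: queues are plain `deque` objects, the window operator's synopsis holds its current window contents, and the operator reports window changes as `+`/`-` elements. The class names and the `run(timeslice)` signature are hypothetical and only mirror the roles named on the slide.

```python
from collections import deque

class SelectOp:
    """Stateless operator: filters tuples from its input queue to its output queue."""
    def __init__(self, predicate, inq, outq):
        self.predicate, self.inq, self.outq = predicate, inq, outq
    def run(self, timeslice):
        done = 0
        while self.inq and done < timeslice:
            tup = self.inq.popleft()
            if self.predicate(tup):
                self.outq.append(tup)
            done += 1
        return done

class RowsWindowOp:
    """Stateful operator: its synopsis stores the last n tuples seen."""
    def __init__(self, n, inq, outq):
        self.n, self.inq, self.outq = n, inq, outq
        self.synopsis = deque()   # operator state
    def run(self, timeslice):
        done = 0
        while self.inq and done < timeslice:
            tup = self.inq.popleft()
            self.synopsis.append(tup)
            self.outq.append(("+", tup))                       # tuple enters the window
            if len(self.synopsis) > self.n:
                self.outq.append(("-", self.synopsis.popleft()))  # oldest tuple leaves
            done += 1
        return done

# Plan: source queue -> select (A > 10) -> rows window of size 2 -> output queue
q_in, q_mid, q_out = deque([5, 12, 20, 30]), deque(), deque()
plan = [SelectOp(lambda a: a > 10, q_in, q_mid), RowsWindowOp(2, q_mid, q_out)]
for op in plan:
    op.run(timeslice=10)

print(list(q_out))  # [('+', 12), ('+', 20), ('+', 30), ('-', 12)]
```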

11
Simple Query Plan
  • Select * From S1 [Rows 1000], S2 [Range 2 Minutes]
    Where S1.A = S2.A And S1.A > 10

Ref: STREAM, Section 3.4
12
Synopsis Sharing
  • Multiple synopses within a single query plan
    may materialize nearly identical relations
  • Replace the two synopses with lightweight stubs
  • Use a single store to hold the actual tuples
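
One way to picture the stub-and-store arrangement is the sketch below. It assumes each stub only needs a suffix of a shared, append-only tuple sequence, so a stub reduces to a start position and the store can discard tuples that no stub still covers. The names `SharedStore`, `Stub`, `expire`, and `compact` are hypothetical; the source does not describe STREAM's actual data structures at this level.

```python
class Stub:
    """Lightweight view: remembers only where its window starts in the shared store."""
    def __init__(self, store):
        self.store = store
        self.start = store.base + len(store.tuples)  # sees tuples appended from now on
    def expire(self, n=1):
        """Drop the n oldest tuples from this stub's view only."""
        self.start += n
    def contents(self):
        return self.store.tuples[self.start - self.store.base:]

class SharedStore:
    """Single physical copy of the tuples, shared by all stubs."""
    def __init__(self):
        self.tuples = []
        self.base = 0      # sequence number of tuples[0]
        self.stubs = []
    def stub(self):
        s = Stub(self)
        self.stubs.append(s)
        return s
    def append(self, tup):
        self.tuples.append(tup)
    def compact(self):
        """Free tuples that no stub can see any more."""
        if not self.stubs:
            return
        low = min(s.start for s in self.stubs)
        del self.tuples[:low - self.base]
        self.base = low

store = SharedStore()
a, b = store.stub(), store.stub()
for t in (1, 2, 3):
    store.append(t)
a.expire(2)          # a's window is now just [3]
b.expire(1)          # b's window is now [2, 3]
store.compact()      # tuple 1 is visible to no stub and is freed

print(a.contents(), b.contents(), store.tuples)  # [3] [2, 3] [2, 3]
```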

Ref: STREAM, Section 4
13
Operator Scheduling
  • Global scheduler invokes run method of query plan
    operators with timeslice parameter
  • Many possible scheduling objectives: minimize
    latency, memory use, computation, inaccuracy,
    starvation, ...
  • Round-robin
  • Minimize queue sizes
  • Minimize combination of queue sizes and latency
  • Parallel versions of above
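
The simplest policy in this list, round-robin, can be sketched as a loop that calls each operator's run method with a fixed timeslice. The `PassThroughOp` class and the convention that the timeslice counts tuples are assumptions for illustration; the source does not say what unit the timeslice is measured in.

```python
from collections import deque

class PassThroughOp:
    """Hypothetical operator that moves up to `timeslice` tuples per invocation."""
    def __init__(self, inq, outq):
        self.inq, self.outq = inq, outq
    def run(self, timeslice):
        moved = 0
        while self.inq and moved < timeslice:
            self.outq.append(self.inq.popleft())
            moved += 1
        return moved

def round_robin(operators, timeslice):
    """Visit operators in a fixed cycle until a full pass does no work.

    Returns the number of passes that processed at least one tuple.
    """
    passes = 0
    while True:
        work = sum(op.run(timeslice) for op in operators)
        if work == 0:
            return passes
        passes += 1

q0, q1, q2 = deque(range(5)), deque(), deque()
ops = [PassThroughOp(q0, q1), PassThroughOp(q1, q2)]
passes = round_robin(ops, timeslice=2)

print(passes, list(q2))  # 3 [0, 1, 2, 3, 4]
```

A policy that minimizes queue sizes would instead choose the next operator from the current queue lengths, which is why the scheduler is kept separate from the operators here.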

Ref: Widom, slide 39
14
Handling Stream Overloads
  • Load shedding: discarding tuples
  • Goal: deliver the best possible approximate answer
    while not falling behind
  • What is the definition of "best"?
  • Maximum subset
  • Maximum random sample
  • We have techniques with provable guarantees for
    specific query types
  • Extremely hard problem for general plans
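
As a rough illustration of the random-sample notion of "best", the sketch below drops each tuple independently with a fixed probability and scales a SUM aggregate by the inverse of the keep probability. This is a generic uniform-sampling estimator chosen for the example, not necessarily the technique STREAM uses; the function names and the fixed `keep_prob` are assumptions.

```python
import random

def shed(stream, keep_prob, rng):
    """Yield each tuple with probability keep_prob, discarding the rest."""
    for tup in stream:
        if rng.random() < keep_prob:
            yield tup

def approx_sum(sampled_values, keep_prob):
    """Estimate the sum over the full stream by scaling the sampled sum by 1/keep_prob."""
    return sum(sampled_values) / keep_prob

rng = random.Random(0)            # seeded so the demo is repeatable
values = [1] * 10_000             # hypothetical stream; true sum is 10000
kept = list(shed(values, keep_prob=0.5, rng=rng))
estimate = approx_sum(kept, keep_prob=0.5)

print(len(kept), estimate)        # roughly half the tuples, estimate near 10000
```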

Ref: Widom, slide 40
15
Notes on Time in STREAM
  • All stream elements have timestamps
  • Necessary for time-based windows
  • Necessary for consistent well-defined semantics
    over multiple streams and updatable relations
  • Basic correctness requirement: the query processor
    must see stream elements in timestamp order
  • Easy when time is a centralized system clock
  • Stream elements timestamped on entry to system

Ref: Widom, slide 43
16
Application-Defined Time
  • Streams may contain application timestamps
  • Sensor readings, financial transactions, etc.
  • Elements may arrive out of order at DSMS
  • Distributed streams with time skew among them
  • Latency reaching DSMS
  • Reordering on transmission channel
  • Our solution: heartbeats
  • Provided by application or deduced from measured
    parameters (skew, latency, etc.)
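
A heartbeat-driven reordering buffer might look like the sketch below. It assumes a heartbeat with timestamp tau is a promise that no element with timestamp at or below tau will arrive later, so everything buffered up to tau can be released in timestamp order. The class and method names are hypothetical, and how heartbeats are generated (by the application or from skew and latency estimates) is outside the sketch.

```python
import heapq

class HeartbeatReorderer:
    """Buffers out-of-order elements and releases them in timestamp order."""
    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker so payloads are never compared directly

    def on_element(self, ts, payload):
        heapq.heappush(self._heap, (ts, self._seq, payload))
        self._seq += 1

    def on_heartbeat(self, tau):
        """Release every buffered element with timestamp <= tau, oldest first."""
        released = []
        while self._heap and self._heap[0][0] <= tau:
            ts, _, payload = heapq.heappop(self._heap)
            released.append((ts, payload))
        return released

buf = HeartbeatReorderer()
for ts, payload in [(5, "a"), (3, "b"), (7, "c")]:   # arrives out of order
    buf.on_element(ts, payload)

first = buf.on_heartbeat(5)
buf.on_element(6, "d")
second = buf.on_heartbeat(10)

print(first)   # [(3, 'b'), (5, 'a')]
print(second)  # [(6, 'd'), (7, 'c')]
```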

Ref: Widom, slide 44
17
STREAM User Interfaces
  • Main interface (initial)
  • Web
  • Register queries
  • STREAM visualizer
  • Windows-based application
  • View query plans
  • View detailed system information (memory, cpu)
  • Dynamically adjust properties
  • View monitoring graphs
  • Platform independent usage (planned)
  • SOAP (HTTP/XML) interface
  • Register queries
  • Stream results over network

18
Future Work
  • Distributed Stream Processing
  • Crash Recovery
  • Improved Approximation
  • Relationship to Publish-Subscribe Systems

19
Summary
  • Motivation behind STREAM
  • STREAM: Stanford's solution
  • Core features
  • Future Work

20
References
  • Widom, Jennifer. The Stanford Data Stream
    Management System
  • http://www-db.stanford.edu/widom/stream-talk.ppt
  • STREAM: The Stanford Data Stream Management
    System
  • http://web.cs.wpi.edu/cs525/f06s-EAR/cs525-homepage_files/LITERATURE/STREAM-overview-book-chapter-2004-20.pdf