Chapter 10: Stream-based Data Management - PowerPoint PPT Presentation

Slides: 18
Provided by: spatia

Transcript and Presenter's Notes



1
Chapter 10: Stream-based Data Management
  • Title: Retrospective on Aurora
  • Authors: Hari Balakrishnan et al.

2
Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core
  • Problem
    • Problem Statement
    • Why is this problem important?
    • Why is this problem hard?
  • Approaches
    • Approach description, key concepts
  • Contributions (novelty, improved)
  • Assumptions

3
Problem Statement
  • Given
    • Stream data
    • Experience from developing five stream-based applications with the Aurora stream processing engine
  • Find
    • Key requirements of streaming applications
  • Objectives
    • Reflect on the design of Aurora based on this experience
    • Eliminate its limitations and address new challenges in a follow-on project, Borealis
  • Constraints
    • Data streams arrive in no particular order.
    • Data streams arrive without any temporal regularity.

4
Why is this problem important?
  • Stream-processing applications
    • Financial services: stock tickers
    • Transportation: congestion pricing, dynamic tolls
    • Sensor networks: environment monitoring
    • Defense: battalion monitoring

5
Why is this problem hard?
  • High update rate
  • Time series
    • Streaming applications entail time series.
    • Time-series operations are not well supported by current DBMSs.
  • Real-time constraints
    • Outbound processing, where data are stored before being processed, cannot deliver real-time latency.
    • Stream processing engines (SPEs) must adopt inbound processing, where query processing is performed directly on incoming messages.
  • Spikes in message load
    • Incoming traffic is bursty.
  • Quality of Service (QoS) requirements
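The contrast between outbound and inbound processing can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names (`outbound`, `inbound`, and `endless_ticks` are not Aurora APIs): the outbound version stores every message before querying, so it cannot answer until the input ends, while the inbound version evaluates each message as it arrives and can emit its first match even from an unbounded feed.

```python
from typing import Callable, Iterable, Iterator, List

Message = dict
Predicate = Callable[[Message], bool]


def outbound(messages: Iterable[Message], predicate: Predicate) -> List[Message]:
    # Store first: every message is written to a table before any query runs,
    # so no result is available until the (finite) input has been stored.
    table = list(messages)
    return [m for m in table if predicate(m)]


def inbound(messages: Iterable[Message], predicate: Predicate) -> Iterator[Message]:
    # Process on arrival: each message is evaluated as soon as it arrives and
    # matches are emitted immediately, with no round trip through storage.
    for m in messages:
        if predicate(m):
            yield m


def endless_ticks() -> Iterator[Message]:
    # Hypothetical unbounded stock feed whose price rises by 1.0 per tick.
    price = 0.0
    while True:
        yield {"sym": "X", "price": price}
        price += 1.0


if __name__ == "__main__":
    above_100 = lambda m: m["price"] > 100
    # The inbound query answers even though the feed never ends;
    # outbound(endless_ticks(), above_100) would never return.
    print(next(inbound(endless_ticks(), above_100)))
```

On a finite input both functions return the same matches; only the inbound one can keep up with a stream that never ends.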

6
Novel Contributions
  • Comparison with SQL-centric related work
  • Data Flow Network (DFN) centric
    • Developer: composes the DFN using a graphical user interface
    • Optimizer: rearranges the DFN, e.g., by swapping boxes
    • Compiler: translates the DFN to an intermediate representation
    • Run-time: schedules tasks based on QoS requirements
  • Other contributions: lessons learnt
    • Identify characteristics of streaming applications from five case studies
    • Identify core performance-tuning ideas
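The compose and optimize steps of a DFN can be illustrated with a toy network of boxes. This is a hypothetical sketch, not Aurora's optimizer or API: `Box`, `run`, and `move_filters_earlier` are invented names, and the single rewrite rule shown (moving a filter ahead of a map it does not depend on) is just one example of the kind of box swapping the slide mentions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Box:
    """One DFN element: a 'filter' (keep/drop) or a 'map' (transform)."""
    name: str
    kind: str                        # "filter" or "map"
    fn: Callable[[dict], object]     # predicate for filters, transform for maps
    reads: frozenset                 # attributes the box reads
    writes: frozenset = frozenset()  # attributes a map adds or overwrites


def run(network: List[Box], tuples: Iterable[dict]) -> List[dict]:
    # Push each tuple along the arrows, box by box, in network order.
    out = []
    for t in tuples:
        cur: Optional[dict] = t
        for box in network:
            if box.kind == "filter":
                if not box.fn(cur):
                    cur = None
                    break
            else:
                cur = box.fn(cur)
        if cur is not None:
            out.append(cur)
    return out


def move_filters_earlier(network: List[Box]) -> List[Box]:
    # Swap a filter ahead of the map before it whenever the filter reads no
    # attribute the map writes. Assumes maps are side-effect free and keep all
    # other attributes unchanged, so the swap does not change the output.
    boxes = list(network)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(boxes)):
            prev, box = boxes[i - 1], boxes[i]
            if box.kind == "filter" and prev.kind == "map" and not (box.reads & prev.writes):
                boxes[i - 1], boxes[i] = box, prev
                changed = True
    return boxes
```

For example, with `to_usd = Box("to_usd", "map", lambda t: {**t, "usd": t["price"] * 1.1}, frozenset({"price"}), frozenset({"usd"}))` and `is_ibm = Box("is_ibm", "filter", lambda t: t["sym"] == "IBM", frozenset({"sym"}))`, `move_filters_earlier([to_usd, is_ibm])` returns `[is_ibm, to_usd]`, so non-IBM tuples are dropped before the conversion runs. A filter on `usd` would stay behind `to_usd`, because it reads what the map writes.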

7
Aurora Architecture
  • Aurora is based on a dataflow-style "boxes and arrows" paradigm, unlike other systems that use an SQL-style query interface (passing queries back and forth adds system overhead and latency).
  • An Aurora network can be spread across any number of machines for scalability and availability.

[Figures: Aurora boxes-and-arrows network (Input, Operator, Output); Aurora operators; Aurora GUI]
8
Aurora Case Study 1: Financial Services
  • An application detects feed problems and triggers a switch between feeds in real time.
  • Hierarchical alarms
    • A low alarm is triggered when an update is delayed beyond a threshold (e.g., 5 s).
    • A high alarm is triggered when low alarms accumulate beyond a threshold (e.g., 100 low alarms).
  • The boxes circled in red (in the original figure) separate the alarms from both Reuters and Comstock into alarms from NYSE and alarms from NASDAQ.
  • Filter-merging techniques
  • This case study illustrates the ability to detect stream imperfections and to extend functionality with user-defined Map functions.
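The two-level alarm can be sketched over one feed's update timestamps. This is a hypothetical illustration, not the application's actual network: it assumes one low alarm per late gap, raised at the moment the delay crosses the threshold, and a high alarm each time the count of low alarms since the previous high alarm reaches the escalation threshold.

```python
from typing import List, Sequence, Tuple


def hierarchical_alarms(
    update_times: Sequence[float],
    low_threshold: float = 5.0,   # seconds an update may be late (slide: e.g., 5 s)
    high_threshold: int = 100,    # low alarms that escalate (slide: e.g., 100)
) -> Tuple[List[float], List[float]]:
    """Return (low_alarm_times, high_alarm_times) for one feed's update timestamps."""
    low: List[float] = []
    high: List[float] = []
    pending = 0  # low alarms since the last high alarm
    for prev, cur in zip(update_times, update_times[1:]):
        if cur - prev > low_threshold:
            # The low alarm fires at the moment the delay crosses the threshold.
            alarm_at = prev + low_threshold
            low.append(alarm_at)
            pending += 1
            if pending >= high_threshold:
                # Enough low alarms have accumulated: escalate and start counting again.
                high.append(alarm_at)
                pending = 0
    return low, high
```

For instance, `hierarchical_alarms([0, 1, 7, 8, 20], low_threshold=5.0, high_threshold=2)` produces low alarms at 6.0 and 13.0 and escalates to a high alarm at 13.0.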

9
Aurora Case Study 2: Linear Road Benchmark
  • Linear Road is a benchmark for stream-processing engines.
  • It simulates an urban highway system that uses variable tolling (i.e., congestion-based pricing).
  • Linear Road must support
    • Two continuous queries
      • Calculate a segment toll every time a vehicle enters the segment.
      • Detect and report accidents, and adjust tolls accordingly.
    • Three historical queries
      • Request an account balance.
      • Report a day's total expenditure for a given vehicle.
      • Predict travel time between two segments using historical data.
  • Each of these queries must be answered with a specified accuracy and within a specified response time.
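The "toll on segment entry" continuous query can be sketched as a stateful scan over position reports. The report format and `toll_for` are hypothetical placeholders: this slide gives neither the benchmark's report schema nor its congestion-based toll formula, so the caller supplies the toll function.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# A position report: (time, vehicle_id, segment). Hypothetical simplified schema.
Report = Tuple[int, int, int]
# A toll notification: (time, vehicle_id, segment, toll).
Notification = Tuple[int, int, int, float]


def toll_notifications(
    reports: Iterable[Report], toll_for: Callable[[int], float]
) -> Iterator[Notification]:
    # Continuous query: remember each vehicle's last segment and emit a toll
    # only when a report shows the vehicle in a different segment.
    last_segment: Dict[int, int] = {}
    for time, vehicle, segment in reports:
        if last_segment.get(vehicle) != segment:
            yield (time, vehicle, segment, toll_for(segment))
        last_segment[vehicle] = segment
```

Repeated reports from the same segment produce no output, so a vehicle is charged once per segment it enters.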

10
Aurora Case Study 3: Battalion Monitoring
  • Aircraft gather data and send them to monitoring stations.
  • An attack is signaled when enemy units cross a given line.
  • The limited resource is the bandwidth between aircraft and ground. When an attack is initiated, data may be dropped selectively so that the most important classes of data are still served.
  • This application let the authors test their load-shedding techniques:
    • Insert random drop boxes that discard a fraction of their input tuples.
    • Insert semantic, predicate-based drop filters.
  • Observations
    • The semantic load-shedding techniques achieve the lowest loss of value utility.
    • As load increases, the two techniques show similar performance.
    • At high loads, all algorithms converge to the same loss levels.
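The two kinds of drop boxes can be contrasted in a few lines. This is a hypothetical sketch, not the authors' load shedder: the function names, the `priority` field used below, and the per-tuple coin flip are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def random_drop(tuples: Sequence[T], drop_fraction: float, rng: random.Random) -> List[T]:
    # Random drop box: discard each tuple independently with probability
    # drop_fraction, regardless of its content.
    return [t for t in tuples if rng.random() >= drop_fraction]


def semantic_drop(tuples: Sequence[T], keep: Callable[[T], bool]) -> List[T]:
    # Semantic drop filter: discard the tuples a predicate marks as low value,
    # so the bandwidth that remains goes to the important classes.
    return [t for t in tuples if keep(t)]
```

A semantic filter such as `semantic_drop(reports, lambda r: r["priority"] == "high")` always keeps the high-priority reports, whereas a random drop box loses important and unimportant tuples at the same rate.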

11
Aurora Case Study 4: Environmental Monitoring
  • Monitoring toxins in water
    • Stream data are fish behavior (e.g., breathing rate) and water quality (e.g., temperature).
    • When the fish behave abnormally, an alarm is sounded.
    • Water-quality data are processed over 1-, 2-, and 4-hour sliding windows.
  • Ease of developing stream applications
    • Aurora proved very convenient for sliding-window calculations.
    • Aurora's GUI proved invaluable.
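A time-based sliding window like the 1-, 2-, and 4-hour windows above can be sketched with a deque. This is an illustrative Python sketch, not Aurora's windowed aggregate operator; it assumes timestamps in hours that arrive in order, and the mean is only an example aggregate.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_average(
    samples: Iterable[Tuple[float, float]], window: float
) -> Iterator[Tuple[float, float]]:
    """For each (timestamp, value) sample, yield (timestamp, mean over the last `window` time units).

    Assumes samples arrive in timestamp order and window > 0.
    """
    buf: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in samples:
        buf.append((ts, value))
        total += value
        # Evict samples that have slid out of the window (ts - window, ts].
        while buf[0][0] <= ts - window:
            _, old = buf.popleft()
            total -= old
        yield ts, total / len(buf)
```

Running the same readings through `sliding_average(readings, w)` for `w` in `(1, 2, 4)` gives the three window sizes; each sample is added and evicted once, so the cost per sample stays constant on average.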

12
Aurora Case Study 5: Medusa
  • Medusa is a distributed stream-processing system built on Aurora.
  • It takes Aurora queries and distributes them across multiple nodes.
  • It offers several benefits:
    • Incremental scalability over multiple nodes.
    • High availability through mutual monitoring between nodes.
    • Composition of stream feeds from different participants.
    • Handling of load spikes through a federated system.
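Mutual monitoring between nodes is commonly built on heartbeats. The sketch below is a generic, hypothetical illustration (the `PeerMonitor` class and its timeout rule are assumptions; the slide does not describe Medusa's actual protocol): each node records when it last heard from each peer and suspects any peer that has been silent longer than a timeout.

```python
from typing import Dict, Iterable, List


class PeerMonitor:
    """Tracks heartbeats from peer nodes and reports peers that have gone silent."""

    def __init__(self, peers: Iterable[str], timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        # Every peer starts out as "heard from" at the monitor's start time.
        self.last_seen: Dict[str, float] = {p: now for p in peers}

    def heartbeat(self, peer: str, now: float) -> None:
        # Record that `peer` proved it is alive at time `now`.
        self.last_seen[peer] = now

    def suspected(self, now: float) -> List[str]:
        # Peers silent for longer than the timeout are candidates for failover.
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)
```

Passing time in explicitly, rather than reading a clock, keeps the sketch deterministic and easy to test.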

13
Lessons Learnt: Application Characteristics
  • Common queries
    • Historical data using an open window
      • E.g., the last 10 weeks' worth of toll data for each driver
    • Aggregates
      • E.g., how much has a driver spent on tolls over the past 10 weeks?
  • Tables of historical data with arbitrary update patterns
  • Synchronization
    • Stream applications rely on shared data and computation.
    • WaitFor(P: predicate, T: timeout)
  • Unpredictable stream behavior
    • The financial services application monitors the arrival rate of a stream.
    • The military application adjusts resources during times of stress.
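The `WaitFor(P, T)` primitive can be approximated in Python with `threading.Condition.wait_for(predicate, timeout)`, which blocks until a predicate holds or the timeout expires. This is an analogy, not Aurora's implementation; `SharedTable` and its methods are hypothetical.

```python
import threading
from typing import Any, Callable, Dict


class SharedTable:
    """Hypothetical table shared by several stream computations."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.rows: Dict[str, Any] = {}

    def update(self, key: str, value: Any) -> None:
        with self._cond:
            self.rows[key] = value
            # Wake every waiter so each can re-check its own predicate.
            self._cond.notify_all()

    def wait_for(self, predicate: Callable[[Dict[str, Any]], bool], timeout: float) -> bool:
        # WaitFor(P, T): block until P holds on the shared rows or T seconds pass.
        # Returns True if P became true, False on timeout.
        with self._cond:
            return bool(self._cond.wait_for(lambda: predicate(self.rows), timeout))
```

For example, one computation could call `table.wait_for(lambda rows: "seg-42" in rows, timeout=1.0)` before reading statistics that another part of the network maintains, and fall back to a default when the call returns `False`.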

14
Application Characteristics
  • XML and other feed formats
    • E.g., stock quote data in XML format
    • In this case, protocol stability and portability are more important than a minor performance loss.
  • Programmatic interfaces, globally accessible catalogs
    • Scripting Aurora networks
    • Metadata
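Consuming an XML feed means turning each document into a tuple before it enters the network. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `<quote>` element layout and the `Quote` fields are hypothetical, since the slide does not give the feed's schema.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Quote(NamedTuple):
    symbol: str
    price: float
    time: str


def parse_quote(xml_text: str) -> Quote:
    # Assumed layout: <quote><symbol>..</symbol><price>..</price><time>..</time></quote>
    root = ET.fromstring(xml_text)

    def field(tag: str) -> str:
        text = root.findtext(tag)
        if text is None:
            raise ValueError(f"missing <{tag}> in quote")
        return text.strip()

    return Quote(symbol=field("symbol"), price=float(field("price")), time=field("time"))
```

Parsing costs some CPU per message, which is the kind of minor performance loss the slide accepts in exchange for protocol stability and portability.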

15
Lessons Learnt: Performance Tuning
  • Requirements
    • Main-memory implementation
    • Data movement across DFN elements
    • Scheduling of DFN elements
  • Performance decisions
    • Memory copying: memcpy() implementations
    • Scheduler: reduce scheduler overhead through aggressive profiling
    • Tight loops: keep unnecessary housekeeping out of tight loops
    • Data structures: optimize the data structures used to implement DFN elements

16
Future Plans: Borealis
  • Dynamic revision of query results
    • Intelligently corrects query results that have already been emitted, using corrected data that arrive later.
  • Dynamic query modification
    • E.g., traders wish to be alerted of interesting events, where the definition of "interesting" varies.
  • Distributed optimization
    • Server-heavy and sensor-heavy optimization problems are emerging.
    • More flexible optimization is needed to handle a very large number of devices.
  • Implementation plans
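Dynamic revision of query results can be illustrated with a tumbling-window sum that issues corrections. This is a hypothetical sketch, not Borealis's revision mechanism: it assumes a window's result is emitted once a tuple from a later window arrives, and that any tuple arriving late for an earlier window triggers a revised result for that window.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Iterator, Optional, Tuple

Output = Tuple[str, int, float]  # ("emit" | "revise", window_start, sum)


def sums_with_revisions(arrivals: Iterable[Tuple[int, float]], width: int) -> Iterator[Output]:
    sums: DefaultDict[int, float] = defaultdict(float)
    current: Optional[int] = None  # start of the window still being filled
    for ts, value in arrivals:
        w = ts - ts % width  # start of the tumbling window containing ts
        sums[w] += value
        if current is None:
            current = w
        elif w > current:
            # The stream has moved on: emit the result of the open window.
            yield ("emit", current, sums[current])
            current = w
        elif w < current:
            # Late tuple for an earlier window: revise the answer already
            # emitted (or, if that window had no earlier tuples, report it now).
            yield ("revise", w, sums[w])
    if current is not None:
        # Flush the last open window at end of input.
        yield ("emit", current, sums[current])
```

With arrivals `[(1, 5.0), (3, 2.0), (11, 1.0), (4, 10.0), (12, 3.0)]` and `width=10`, window 0 is first emitted with sum 7.0 and then revised to 17.0 when the late tuple at time 4 arrives. Downstream consumers must therefore be prepared to replace results they have already seen.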

17
Summary
  • Paper's focus
    • Identify the requirements of stream applications from experience with the design and implementation of the Aurora stream-processing engine
  • Ideas
    • Describe five applications and their implementation in detail.
    • Reflect on the design of Aurora based on this experience.
    • Discuss future ideas for a follow-on project.
  • Contributions
    • Identify key requirements of streaming applications
  • Analytical validation
  • Case study

18
Assumptions, Rewrite today
  • Assumptions
    • Archiving is not necessary!
    • Performance is more important than a declarative query language.
  • Rewrite today
    • Compare performance with the competition, e.g., STREAM.
    • Allow archiving along with stream processing.
    • Consider other applications, e.g., RFID and cell phone applications.
    • Include the current status of the Borealis implementation.