Chapter 10: Stream-based Data Management

1
Chapter 10: Stream-based Data Management
  • Title: Design, Implementation, and Evaluation of
    the Linear Road Benchmark on the Stream
    Processing Core
  • Authors: Navendu Jain, Lisa Amini, et al.

2
Design, Implementation, and Evaluation of the
Linear Road Benchmark on the Stream Processing
Core
  • Problem
      • Problem Statement
      • Why is this problem important?
      • Why is this problem hard?
  • Approaches
      • Approach description, key concepts
  • Contributions (novelty, improved)
  • Assumptions

3
Problem Statement
  • Given
      • Stream data and continuous queries in large-scale
        distributed environments
      • A streaming data application (Linear Road)
      • Stream processing middleware (Stream Processing
        Core, SPC)
  • Find
      • Performance bottlenecks of streaming data
        applications
  • Objectives
      • Understand the performance characteristics of the
        stream data application
  • Constraints
      • SPC is constantly overloaded with respect to the
        available resources.
      • Processing elements are a mix of I/O-bound and
        CPU-bound.
      • It is unrealistic for applications to store the
        full history of a stream in memory, so they are
        also memory-bound.
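
The memory constraint above is commonly handled by keeping only a bounded window of recent items instead of the full stream history. A minimal sketch in Python, assuming a fixed-size, count-based window; the class name `SlidingAverage` and the window size are illustrative and are not taken from SPC:

```python
from collections import deque


class SlidingAverage:
    """Average over the most recent `size` values of a stream.

    Hypothetical helper for illustration; not part of SPC.
    """

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self._window = deque(maxlen=size)  # oldest value is evicted automatically
        self._total = 0.0

    def add(self, value):
        # Subtract the value about to be evicted so the running total stays exact.
        if len(self._window) == self._window.maxlen:
            self._total -= self._window[0]
        self._window.append(value)
        self._total += value
        return self._total / len(self._window)


if __name__ == "__main__":
    avg = SlidingAverage(size=3)
    for speed in [60, 50, 40, 30]:
        print(avg.add(speed))
```

Memory use stays proportional to the window size no matter how long the stream runs, at the cost of forgetting everything older than the window.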

4
Why is this problem important?
  • High-volume, continuous data are ubiquitous.
      • Text and transactional data
      • Digital audio, video, and images
      • Instant messages, network packet traces
      • Sensor data
  • Stream processing applications have become
    important in the networking and database
    communities.

5
Why is this problem hard?
  • Stream data are
      • Large in volume
      • Produced at high data rates
      • Generated by multiple distributed data sources
      • Rapidly updated
  • Processing stream data requires
      • Filtering
      • Aggregation
      • Correlation
  • A system supporting stream data processing
    applications should consider
      • Scalability
      • Latency
      • Resource utilization

6
Novelty of Contribution
  • Related Work
      • DataCutter, StreaMIT: connections between
        applications are statically determined.
      • TelegraphCQ, Aurora, Borealis, STREAM: support
        stream data manipulation from a database-centric
        perspective, but process streams of tuples
        individually (i.e., small-scale).
      • Benchmarks: previous work on Linear Road did not
        report any performance numbers.
  • Contributions
      • SPC supports dynamic application composition.
      • Evaluate SPC using the Linear Road application
        in multiple distributed configurations.
        → A highly scalable implementation of the Linear
        Road application
      • Study the behavior of the streaming
        infrastructure support for large-scale continuous
        and historical queries.
        → Identifying performance bottlenecks and tuning
        them

7
SPC Architecture
  • Publish-subscribe model
      • Each processing element (PE) specifies the
        characteristics of the streams it consumes and
        produces.
      • SPC dynamically determines the stream connections
        by matching stream descriptors as new
        applications and new data sources join and leave
        the system.
  • Reusing streams
      • Results in significant resource savings.
      • Helps discover useful information over an
        ever-changing set of data sources.
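
The descriptor matching described above can be sketched as a toy registry. This is an illustration only, not SPC's API: the `Registry` class, its method names, and the choice to model descriptors as plain attribute dictionaries are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry that connects PEs to streams by matching descriptors.

    Descriptors are modeled as plain attribute dicts (an assumption).
    """
    streams: dict = field(default_factory=dict)        # stream name -> descriptor
    subscriptions: dict = field(default_factory=dict)  # PE name -> required attributes

    @staticmethod
    def _matches(descriptor, required):
        # A stream matches when it carries every attribute the consumer asks for.
        return all(descriptor.get(k) == v for k, v in required.items())

    def publish(self, stream, descriptor):
        self.streams[stream] = descriptor

    def withdraw(self, stream):
        self.streams.pop(stream, None)

    def subscribe(self, pe, required):
        self.subscriptions[pe] = required

    def connections(self):
        # Recomputed on demand, so sources joining or leaving change the result.
        return {
            pe: sorted(s for s, d in self.streams.items() if self._matches(d, required))
            for pe, required in self.subscriptions.items()
        }


if __name__ == "__main__":
    reg = Registry()
    reg.subscribe("toll_pe", {"type": "position_report"})
    reg.publish("xway0", {"type": "position_report", "xway": 0})
    reg.publish("xway1", {"type": "position_report", "xway": 1})
    print(reg.connections())
    reg.withdraw("xway0")
    print(reg.connections())
```

Because the consumer names attributes rather than specific producers, a second PE subscribing with the same attributes would attach to the same streams, which is the reuse the slide refers to.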

8
Performance Challenges and Optimizations in SPC
  • Challenges
      • PEs perform either
          • a small amount of processing on large volumes
            of data, or
          • a large amount of processing on lower volumes
            of data.
      • Thus, PEs are a mix of I/O-bound and CPU-bound.
      • It is impossible to store the stream history in
        memory → memory-bound
  • Optimizations
      • SDO filtering: SPC can filter out unwanted
        stream data objects (SDOs) → saving resources.
      • Events: PEs can subscribe to system events and
        adapt their algorithms in response.
      • Dynamic copies of PEs
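
The filtering optimization above amounts to dropping unwanted objects before a PE spends any work on them. A minimal sketch, assuming objects are plain dictionaries and the filter is a caller-supplied predicate; the function name `filter_sdos` and the `type` field are hypothetical, not SPC's interface:

```python
def filter_sdos(sdos, predicate):
    """Lazily yield only the objects for which `predicate` returns True."""
    for sdo in sdos:
        if predicate(sdo):
            yield sdo


if __name__ == "__main__":
    incoming = [
        {"type": "position_report", "speed": 55},
        {"type": "heartbeat"},
        {"type": "position_report", "speed": 30},
    ]
    # Only position reports reach the downstream consumer.
    for sdo in filter_sdos(incoming, lambda s: s.get("type") == "position_report"):
        print(sdo)
```

Using a generator keeps the filter streaming: each object is examined once and never buffered.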

9
Linear Road Benchmark
  • Simulates the traffic characteristics of a simple
    urban expressway system.
  • Input to the Linear Road benchmark is in stream
    data format.
  • Requires a stream-based data management system
    (SDMS) to process a set of continuous and
    historical queries.
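
The difference between the two query types can be illustrated with a heavily simplified sketch. It does not implement the benchmark's actual queries or input schema: the class `ToyLinearRoad`, the report fields (`vehicle`, `segment`, `speed`), and both queries are assumptions chosen for illustration.

```python
from collections import defaultdict


class ToyLinearRoad:
    """Simplified stand-in for a Linear Road style workload (illustrative only)."""

    def __init__(self):
        self._speed_sum = defaultdict(float)  # segment -> sum of reported speeds
        self._reports = defaultdict(int)      # segment -> number of reports
        self._trips = defaultdict(list)       # vehicle -> segments entered, in order

    def on_report(self, vehicle, segment, speed):
        """Continuous query: runs on every report, returns the segment's average speed."""
        self._speed_sum[segment] += speed
        self._reports[segment] += 1
        trip = self._trips[vehicle]
        if not trip or trip[-1] != segment:
            trip.append(segment)  # record only segment changes, not every report
        return self._speed_sum[segment] / self._reports[segment]

    def segments_entered(self, vehicle):
        """Historical query: answered on demand from the stored summary."""
        return list(self._trips.get(vehicle, []))


if __name__ == "__main__":
    lr = ToyLinearRoad()
    print(lr.on_report("car1", 7, 60))
    print(lr.on_report("car2", 7, 40))
    print(lr.on_report("car1", 8, 30))
    print(lr.segments_entered("car1"))
```

The continuous query is driven by arriving data and keeps only running aggregates, while the historical query is driven by a request and reads a compact summary rather than the raw stream.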

10
Prototype Implementation
  • Design principles
      • Modularity
      • Data Aggregation
      • Network and Data Locality
      • Flexible Programming Environment
  • Linear Road in SPC
      • The figure shows the query network
        infrastructure, comprising 15 PEs.

11
Experiments
  • Input data volume increases over time as a
    stress test
  • Scalability

12
Experiments
  • Analyzing Bottleneck PEs
  • PE Placement Policy

13
Summary
  • Paper's focus
      • Understanding the performance characteristics of
        stream processing applications in a distributed
        setup
  • Ideas
      • Design and implementation of the Linear Road
        benchmark on the SPC middleware
      • Identifying the main performance bottlenecks in
        order to achieve scalability and low query
        response latency
  • Contributions
      • Demonstrate a scalable distributed implementation
        of Linear Road
      • Highlight the importance of addressing
        performance bottlenecks
  • Analytical Validation
  • Experiments
  • Prototyping

14
Assumptions, Rewrite today
  • Assumptions
      • The evaluation is restricted to SPC support for
        the Linear Road application, assuming that the
        design decisions and performance results are
        applicable to other streaming applications.
      • The system is constantly overloaded with respect
        to the available resources.
      • PEs are I/O-, CPU-, and memory-bound.
  • Rewrite today
      • Apply the ideas to other types of streaming
        applications.
      • Run more extensive experiments on performance
        tuning.