A Grid-Based Middleware for Processing Distributed Data Streams - PowerPoint PPT Presentation

1 / 72
About This Presentation
Title:

A Grid-Based Middleware for Processing Distributed Data Streams

Description:

LSS stores unprocessed itemsets Dynamic Migration Memory usage of LSS Dynamic Migration Memory usage of LSS Dynamic Migration LSS migration does not impact ... – PowerPoint PPT presentation

Number of Views:137
Avg rating:3.0/5.0
Slides: 73
Provided by: weil69
Category:

less

Transcript and Presenter's Notes

Title: A Grid-Based Middleware for Processing Distributed Data Streams


1
A Grid-Based Middleware for Processing
Distributed Data Streams
  • Liang Chen
  • Advisor Gagan Agrawal
  • Computer Science Engineering

2
Roadmap
  • Introduction
  • Motivation
  • Our approach and challenges
  • System Overview and Initial Evaluation
  • Introduce system architecture and design
  • Discuss the self-adaptation function
  • Self-Adaptation Algorithm
  • Explain the algorithm
  • Evaluate the system by using two data mining
    applications
  • Resource Allocation Schemes
  • Dynamic Migration
  • Motivation
  • Light-weight summary structure (LSS)
  • How applications utilize the dynamic migration
  • Evaluation
  • Adaptive Volume Rendering
  • Related work
  • Conclusion and Future work

3
Introduction-Motivation
  • What is data steam
  • Data stream data arrive continuously
  • Enormous volume and must be processed online
  • Need to be processed in real-time
  • Data sources could be distributed
  • Data Stream Applications
  • Online network intrusion detection
  • Sensor networks
  • Network Fault Management system for
    telecommunication network elements

4
Introduction-Motivation
Network Fault Management System (NFM) analyzing
distributed alarm streams

Switch Network
NFM (Network Fault Management) System
5
Introduction-Motivation
  • Challenges
  • Data and/or computation intensive
  • System can be easily overloaded

6
Introduction-Motivation
  • Possible solutions
  • Grid computing technologies
  • Automatically adjust processing rate

7
Introduction-Motivation
  • The needs for processing distributed data streams
  • A middleware running in Grid
  • Allocate Grid resources
  • Provide self-adaptation function

8
Introduction-Our Approach
  • We implemented a middleware to meet the needs
  • Five contributions of our work
  • 1. Utilizing existing grid standards
  • Liang Chen, K. Reddy and G. Agrawal
    GATES A Grid-Based Middleware for Processing
    Distributed Data Streams.HPDC, 2004.
  • 2. Providing self-Adaptation functionality
  • Liang Chen and G. Agrawal Supporting
    Self-Adaptation in Streaming Data Mining
    Applications. IPDPS, 2006.
  • 3. Supporting automatic resource allocation
  • Liang Chen and G. Agrawal A Static
    Resource Allocation Framework for Grid-Based
    Streaming Applications. Concurrency Computation
    Practice and Experience Journal, Volume 18, Issue
    6 , Pages 653 - 666.
  • 4. Supporting efficient dynamic migration
  • Liang Chen, Q. Zhu and G. Agrawal A
    Supporting Dynamic Migration in Tightly Coupled
    Grid Applications. SC 2006.
  • 5. Studying adaptive rendering application

9
Roadmap
  • Introduction
  • Motivation
  • Our approach and challenges
  • System Overview and Initial Evaluation
  • Introduce system architecture and design
  • Discuss the self-adaptation algorithms
  • Self-Adaptation Algorithm
  • Introduce the algorithm
  • Evaluate the system by using two data mining
    applications
  • Resource Allocation Schemes
  • Dynamic Migration
  • Motivation
  • Light-weight summary structure (LSS)
  • How applications utilize the dynamic migration
  • Evaluation
  • Adaptive Volume Rendering
  • Related work
  • Conclusion and Future work

10
System Architecture and Design(Architecture)
  • Use Globus Toolkit 3.0, built on OGSA
  • Allows users to specify their algorithms
    implemented in Java
  • Take care of plugging user-defined algorithms
    into the system and running them in Grid.
  • Applications need be broken down into a number of
    pipelined stages

11
System Architecture and Design(Architecture)
Application
Stage A
Stage B
Stage C
12
System Architecture and Design(GATES API
Functions)
  • Public class Second-Stage implements
    StreamProcessing
  • void work(buffer in, buffer out)
  • while(true)
  • DATA GATES.getFromInputBuffer(in)
  • Inter-Results Processing(Data)
  • GATES.putToOutputBuffer (out,
    Inter-Results)

13
Adaptation Parameter
  • Definition
  • A parameter in an application
  • Changing the parameters value can change
    processing rate of the application, also impact
    accuracy of the processing
  • Two kinds of adaptation parameters
  • Performance parameter
  • Accuracy parameter
  • Example
  • Sampling rate is an accuracy parameter

14
Pseudo Codes Again with Self-adaptation API
Functions
Public class Second-Stage implements
StreamProcessing //Initialize
sampling-rate Sampling-rate (Max Min)/2
void work(buffer in, buffer out)
GATES.specifyAccuracyPara(Sampling-rate, Max,
Min) while(true) DATA
GATES.getFromInputBuffer(in)
Inter-Results Processing(Data,
Sampling-rate) GATES.putToOutputBuffer
(out, Inter-Results) Sampling-rate
GATES.getSuggestedValue()
15
Roadmap
  • Introduction
  • Motivation
  • Our approach and challenges
  • System Overview and Initial Evaluation
  • Introduce system architecture and design
  • Discuss the self-adaptation function
  • Self-Adaptation Algorithm
  • Explain the algorithm
  • Evaluate the system by using two data mining
    applications
  • Resource Allocation Schemes
  • Dynamic Migration
  • Motivation
  • Light-weight summary structure (LSS)
  • How applications utilize the dynamic migration
  • Evaluation
  • Adaptive Volume Rendering
  • Related work
  • Conclusion and Future work

16
Self-Adaptation Algorithm
  • View the system as a pipeline
  • To ensure real-time processing, a balanced
    pipeline is needed
  • When average queue length is too small or too
    large, queue is under or over loaded. Pipeline is
    not balanced.
  • Measure the average lengths of the queues in the
    pipeline
  • When GATES.getSuggestedValue() is invoked, use
    the heuristic way to determine a new value for
    the adaptation parameter according to the
    measured lengths

17
Self-adaptation Algorithm
  • The way we measure average queue length
  • the heuristic way to adjust an adaptation
    parameter
  • Should the adaptation parameter be modified, and
    if so, in which direction?
  • How to find a new value (update the value) of the
    adaptation parameter

18
Self-adaptation Algorithm
  • Should the adaptation parameter be modified, and
    if so, in which direction?
  • The answer is related to the pipelines load
    state.

19
Self-adaptation Algorithm
Convergent States
Non-Convergent States
20
Self-adaptation Algorithm
Summary of Load States
21
Self-adaptation Algorithm
  • How to determine a new value for the adaptation
    parameter
  • Linear update increase or decrease by a
    fixed value
  • Hard to find a proper fixed value
  • Binary search

22
Self-adaptation Algorithm
Left Border
Current Value
Right Border
New Value
23
(No Transcript)
24
Self-adaptation Algorithm
  • Two Data mining applications
  • Clustream Clustering data-points in streams

25
Data Mining Applications System Evaluation
  • Dist-Freq-Counting finding frequent itemsets
    from distributed streams

26
Data Mining Applications System Evaluation
27
Data Mining Applications System Evaluation
28
Data Mining Applications System Evaluation
29
Data Mining Applications System Evaluation
30
Data Mining Applications System Evaluation
31
Data Mining Applications System Evaluation
Data Mining Applications System Evaluation
32
Data Mining Applications System Evaluation
Data Mining Applications System Evaluation
33
Data Mining Applications System Evaluation
Data Mining Applications System Evaluation
34
Roadmap
  • Introduction
  • Motivation
  • Our approach and challenges
  • System Overview and Initial Evaluation
  • Introduce system architecture and design
  • Discuss the self-adaptation algorithms
  • Self-Adaptation Algorithm
  • Explain the algorithm
  • Evaluate the system by using two data mining
    applications
  • Resource Allocation Schemes
  • Dynamic Migration
  • Motivation
  • Light-weight summary structure (LSS)
  • How applications utilize the dynamic migration
  • Evaluation
  • Adaptive Volume Rendering
  • Related work
  • Conclusion and Future work

35
Resource Allocation Schemes
  • Problem Definition
  • Grid resource allocation for pipelined
    applications that process distributed streaming
    data in real-time is challenging
  • The scheme consists of two parts
  • Static Part allocate resources before an
    application runs
  • Dynamic Part re-allocate resources in run-time
  • A framework to monitor resources and support
    dynamic resource allocation

36
Static Allocation Scheme
  • Static allocation problem determining a
    deployment configuration
  • Objective Automatically generate a deployment
    configuration according to the information of
    available resources
  • The number of data sources and their location
  • The destination
  • The number of stages consisting of a pipeline
  • The number of instances of each stage
  • How the instances connect to each other
  • The node where each instance is placed

37
Roadmap
  • Introduction
  • Motivation
  • Our approach and challenges
  • System Overview and Initial Evaluation
  • Introduce system architecture and design
  • Discuss the self-adaptation algorithms
  • Improved Self-Adaptation
  • self-adaptation algorithm
  • Evaluate the system by using two data mining
    applications
  • Resource Allocation Schemes
  • Dynamic Migration
  • Motivation
  • Light-weight summary structure (LSS)
  • How applications utilize the dynamic migration
  • Evaluation
  • Adaptive Volume Rendering
  • Related work
  • Conclusion and Future work

38
Dynamic Migration-Motivation
  • Grid resources vary frequently
  • Dynamically allocating new resources and
    migrating applications to the new resources
    improve performance
  • Checkpointing is a classic method to support
    dynamic migration
  • A snapshot of systems running state
  • Transmit to a remote site
  • Restore execution context and restart processes
  • Disadvantages of checkpointing
  • Platform dependent
  • Inefficient
  • Involve lots of implementation efforts
  • Our approach is base on Light-weight Summary
    Structure (LSS)

39
Dynamic Migration-LSS
  • Processing Structure
  • ...
  • while(true)
  • read_data_from_streams()
  • process_data()
  • accumulate_intermediate_results()
  • reset_auxiliary_structures()
  • ...
  • Data structure storing summary information is
    Light-weight summary structure
  • Others are Auxiliary structures

40
Dynamic Migration-LSS
  • Two observations with respect to LSS
  • The size of LSS is much smaller than that of the
    total memory
  • Auxiliary structures are usually reset at the end
    of each loop. Unnecessary to migrate auxiliary
    structures when migration occurs at the end of a
    loop
  • LSS can be used to support dynamic migration
  • GAETS provides an API function to allocate a
    block of memory to be LSS
  • An application stores summary information to LSS
  • transmit only LSS at the end of the loop to a new
    node and restore the LSS at the new node

41
Dynamic Migrationsupported by GATES
42
Dynamic Migration
  • Advantages of using LSS
  • Efficient, only LSS is migrated
  • Not impact the accuracy of processing
  • Support migration across heterogeneous platforms
  • Reduce application developers efforts on making
    application capable of migration

43
Dynamic Migration
44
Dynamic Migration
  • Evaluation
  • Three applications
  • Counting sample
  • LSS stores intermediate top M frequently
    occurring numbers
  • Clustream, clustering data points in streams
  • LSS stores micro-clusters computed at the second
    stage
  • Dist-Freq-Counting, finding frequent itemsets in
    distributed streams.
  • LSS stores unprocessed itemsets

45
Dynamic Migration
  • Memory usage of LSS

46
Dynamic Migration
  • Migration using LSS is efficient

47
Dynamic Migration
  • Migration using LSS is efficient

48
Dynamic Migration
  • Benefits of migration in a dyamic environment

49
Dynamic Migration
  • Memory usage of LSS

50
Dynamic Migration
  • Migration using LSS is efficient

51
Dynamic Migration
  • Migration using LSS is efficient

52
Dynamic Migration
  • Benefits of migration in a dynamic environment

53
Dynamic Migration
  • LSS migration does not impact processing accuracy
  • The counting sample application was used
  • Compared the average accuracy of the processing
    results from the non-migration and the migration
    versions, they are 97.28 and 97.51 accurate

54
Roadmap
  • Introduction
  • Motivation
  • Our approach and challenges
  • System Overview and Initial Evaluation
  • Introduce system architecture and design
  • Discuss the self-adaptation algorithms
  • Self-Adaptation Algorithm
  • Explain the algorithm
  • Evaluate the system by using two data mining
    applications
  • Resource Allocation Schemes
  • Dynamic Migration
  • Motivation
  • Light-weight summary structure (LSS)
  • How applications utilize the dynamic migration
  • Evaluation
  • Adaptive Volume Rendering
  • Related work
  • Conclusion and Future work

55
Adaptive Volume Rendering
  • Motivation Grid computing is needed
  • Visualization involves large volumes of dataset
  • We focus on streaming volume data
  • Interactively visualizing volume data in
    real-time is needed
  • Computationally intensive
  • Resources consumed
  • Real-time processing can not be guaranteed
  • The places where data are generated are
    distributed
  • Typical client-server architecture is not
    scalable
  • Network bandwidths of wide-area networks are low
  • Computing capability of normal desktop is not
    enough
  • Grid techniques would be a good solution
  • Divide the procedure into stages organized in a
    pipeline
  • Allocate nodes close to data source to
    pre-process volume data
  • The size of intermediate results is much smaller

56
Adaptive Volume Rendering
  • Motivation GATES is desirable
  • Automatic adaptation is desirable
  • Volume rendering algorithms running on a grid
    need to be highly adaptive
  • Adaptation usually achieved by manually adjusting
    adaptation parameters
  • Such manual parameter adaptation is very
    challenging in a grid environment
  • Automatic resource allocation is desirable
  • Grid environment is highly changeable
  • The GATES middleware could fulfill the needs
  • Grid-based
  • Provide the self-adaptation function to
    applications
  • Automatically allocate Grid resources

57
Adaptive Volume Rendering
  • Overall design
  • Two pipelined steps the first step
  • Build octrees from volume data
  • Octree is a tree data structure, in which each
    internal node has up to 8 children
  • Here, we use an octree to represent
    multiresolution information for a volume
  • Procedure to build an octree for a volume is as
    follows
  • Divide volume space into 8 subvolumes and create
    8 children nodes
  • For each subvolume, calculate standard deviation
    of all voxels in the subvolume, and store the
    deviation to the corresponding child node
  • If the deviation is larger than a pre-defined
    value, divide the subvolume, repeat the above
    procedure. Otherwise, stop

58
Adaptive Volume Rendering
  • Overall design
  • Two pipelined steps the second step
  • Use an octree and its corresponding volume to
    render images
  • Provided an error tolerance (or user-defined
    resolution), use DFS to traverse the octree and
    stop at the nodes where the deviation is less
    than the resolution or error tolerance.
  • Project the corresponding 3D-subvolumes to an
    image

59
Adaptive Volume Rendering
60
Adaptive Volume Rendering
  • Make the rendering self-adaptive
  • Two adaptation parameters used in the third stage
  • Error Tolerance performance parameter
  • Image Size accuracy parameter
  • Only one adaptation parameter can be adjusted by
    GATES. So we fix one and adjust the other

61
(No Transcript)
62
Adaptive Volume Rendering
  • Experiment 1

63
Adaptive Volume Rendering
  • 100kbps

150kbps
200kbps
250kbps
64
Adaptive Volume Rendering
  • Experiment 2

65
Adaptive Volume Rendering
  • Experiment 3 compare the performance of two
    implementations
  • Java-imple
  • C-imple

66
Adaptive Volume Rendering
  • Experiment 3 compare the performance of two
    implementations

67
Related Work
  • Middleware for data stream processing
  • Data cutter, Stampede
  • Differences in a cluster, no self-adaptation, no
    specifically for real-time processing
  • Continuous query systems
  • STREAM, dQUOB, TelegraphCQ, NiagraCQ
  • Differences centralized, no adaptation supports
  • Distributed continuous query systems
  • Aurora, Medusa, Borealis
  • Differences continuous queries, not in Grid
    environment
  • In-Network aggregation in sensor network
  • Stream-based overlay networks

68
Related work
  • Grid Resource Allocation
  • Condor, Realtor, ACDS
  • Main Differences our work focus on Grid resource
    allocation for workflow applications
  • Adaptation Through a Middleware
  • Cheng et al.s adaptation framework, SWiFT,
    Conductor, DART, ROAM
  • Main Differences our work focus on general
    supports for adaptation in run-time
  • Dynamic Migration in Grid Environment
  • Condor, XCATS, Charm
  • Main Differences our work use LSS

69
Conclusion
  • Grid computing could be an effective solution for
    distributed data stream processing
  • GATES
  • Distributed processing
  • Exploit grid web services
  • Self-adaptation to meet the real-time constraints
  • Grid resource allocation schemes and dynamic
    migration

70
Future Work
  • CPU cycles and Network bandwidths
  • Currently, only network bandwidth is considered a
    constraint when scheduling Grid resources
  • Few related work proposes a metric to integrate
    both for pipelined appliations
  • Port GATES from GT3 to GT4
  • Support fault-tolerance and high availability
  • Further relieve programming burdens from
    application develops
  • Specify meta-data
  • Support distributed continuous queries
  • Specify a set of query operators

71
Acknowledgements
  • My advisor, Prof. Agrawal, proposed the idea of
    implementing the middleware, and gave lots
    advices for the directions of my research
  • Prof. Shen gave lots of helps on implementing the
    render application, and provided lots of write-up
    for the chapter 7

72
Questions?
  • No more questions? Thanks!
Write a Comment
User Comments (0)
About PowerShow.com