... Streaming Through Time: A Vision for ... Stock ticke - PowerPoint PPT Presentation

Slides: 15
Provided by: jonathang7
1
Consistent Streaming Through Time: A Vision for
Event Stream Processing
  • by Jonathan Goldstein (speaker), Roger Barga,
    Mohamed Ali, and Mingsheng Hong
  • Microsoft Research

2
Are StreamSQL semantics ok?
  • Suppose we want to monitor the bandwidth of a
    device
  • We create an input stream which has one field:
    bytes sent
  • We create an output stream which computes a
    windowed sum
  • What are the StreamSQL semantics when the system
    gets overloaded (strange question to ask)?
  • Either events must be dropped, or they must be
    queued at the receiver or sender for later
    processing
  • Since window semantics are based on system time
    (StreamSQL server time), if the device has
    constant bandwidth, apparent bandwidth will
    decrease!
  • In StreamSQL, the user has no reasonable way of
    knowing!
  • Conclusion: Something is deeply wrong with the
    use of time in StreamSQL query semantics!
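
The effect described above can be reproduced in a few lines. This is a minimal sketch, not StreamSQL itself: the device, its send rate, the overload model, and the field names `sent_at` and `processed_at` are all assumptions made for illustration. It computes the same tumbling-window sum twice, once over application time and once over server time.

```python
from collections import defaultdict

def windowed_sum(events, time_field, window=10):
    """Sum `bytes` per tumbling window, bucketing by the chosen timestamp field."""
    sums = defaultdict(int)
    for e in events:
        sums[e[time_field] // window] += e["bytes"]
    return dict(sums)

# Hypothetical device: 100 bytes every second for 40 seconds (constant bandwidth).
events = [{"sent_at": t, "bytes": 100} for t in range(40)]

# Hypothetical overload: from t=20 on, the server queue grows by one second
# per event, so each event is processed later and later.
for e in events:
    delay = max(0, e["sent_at"] - 19)
    e["processed_at"] = e["sent_at"] + delay

by_app_time = windowed_sum(events, "sent_at")          # what the device really did
by_system_time = windowed_sum(events, "processed_at")  # what a server-time window sees

print(by_app_time)     # every window sums to 1000
print(by_system_time)  # windows after the overload starts sum to 500
```

With server-time windows the device appears to halve its bandwidth once the queue starts growing, although it never changed its send rate.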

3
What's in the paper?
  • Laundry list of CEDR features either unsupported
    or poorly supported in existing streaming systems
    (Read the paper)
  • Some of these features come from event processing
  • Some come from specific scenarios which we
    believe to be important
  • These features are described formally through a
    query language description

4
What's in the talk (and the paper)?
  • Formal definitions of CEDR streams and operator
    semantics
  • Provides a clear and intuitive framework for
    discussing subtle semantic issues
  • Formalization of materialized view update
    semantics in standing queries, and a discussion
    of why they are inadequate in isolation
  • Definition of a non-view update compliant
    operator which can express a very wide range of
    seemingly disparate streaming features
  • A myriad of window types, the separation of
    inserts and deletes, etc.
  • We discuss theoretically both the expression and
    correct handling of both data delivered out of
    order and data retraction
  • Different formal notions of correctness lead to
    different consistency levels and associated
    performance tradeoffs

5
What is a stream and a standing query?
  • A stream is a (possibly infinite) collection of
    events, where each event contains
  • A payload (P)
  • A key which uniquely identifies the event (K)
  • An interval of (application) time for which the
    payload is valid [Vs, Ve)
  • A time at which it arrives at a listener (C for
    CEDR time)
  • A standing query is an operator graph, where each
    operator takes 0 or more input streams and
    produces 0 or more output streams
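
The event structure listed above could be represented as follows. This is only a sketch: the class name, field names, and the `valid_at` helper are assumptions for illustration, and the half-open reading of the valid interval follows the `[Vs, Ve)` notation on the slide.

```python
from dataclasses import dataclass
from typing import Any
import math

@dataclass(frozen=True)
class Event:
    key: Any       # K: uniquely identifies the event
    payload: Any   # P: the event's data
    vs: float      # Vs: valid start time (application time, inclusive)
    ve: float      # Ve: valid end time (application time, exclusive)
    c: float       # C: CEDR time, when the event arrives at a listener

    def valid_at(self, t: float) -> bool:
        """True if application time t falls inside [vs, ve)."""
        return self.vs <= t < self.ve

# An event that becomes valid at time 1, stays valid indefinitely,
# and reaches the listener at CEDR time 3.
e = Event(key="K1", payload={"bytes": 100}, vs=1, ve=math.inf, c=3)
print(e.valid_at(2), e.valid_at(0))
```

Keeping valid time and CEDR time in separate fields is what lets query semantics refer to the former while ignoring the latter.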

Acknowledgement: This is inspired by and built on
Rick Snodgrass's temporal work
6
What properties do operators have?
  • All operators should be well behaved
  • Definition 6: A CEDR operator O is well behaved
    iff for all (combinations of) inputs to O which
    are logically equivalent to infinity, O's outputs
    are also logically equivalent to infinity
  • Any well behaved operator, when given 2 identical
    sets of input streams, except for CEDR time,
    should produce identical sets of output streams,
    except for CEDR time
  • Query semantics are independent of CEDR time

7
What properties do operators have?
  • Some operators are also view update compliant
  • Definition 11: A unary CEDR operator O is view
    update compliant iff for all R, S s.t. (R) and
    (S) are identical, (O(R)) and (O(S)) are also
    identical
  • If we interpret the stream as describing a
    changing relation where each row's lifetime is
    specified by valid time, then
  • A view update compliant operator produces
    snapshot identical output for snapshot identical
    input
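
The snapshot reading of view update compliance can be checked mechanically on small examples. The sketch below assumes events are plain `(vs, ve, payload)` tuples with half-open valid intervals; `snapshot`, `snapshot_identical`, and `select` are illustrative names, not CEDR APIs. It shows a relational selection producing snapshot identical output for two snapshot identical inputs.

```python
def snapshot(stream, t):
    """Sorted payloads of all events whose valid interval [vs, ve) contains t."""
    return sorted(p for (vs, ve, p) in stream if vs <= t < ve)

def snapshot_identical(r, s):
    # Snapshots only change at interval endpoints, so comparing one snapshot
    # at every finite endpoint of either stream covers all of time.
    points = {x for (vs, ve, _) in r + s for x in (vs, ve) if x != float("inf")}
    return all(snapshot(r, t) == snapshot(s, t) for t in points)

def select(stream, predicate):
    """Relational selection: keep events whose payload satisfies the predicate."""
    return [(vs, ve, p) for (vs, ve, p) in stream if predicate(p)]

# r and s describe the same changing relation; s splits "a" into two pieces.
r = [(1, 5, "a"), (2, 6, "b")]
s = [(1, 3, "a"), (3, 5, "a"), (2, 6, "b")]
u = [(1, 4, "a"), (2, 6, "b")]   # "a" ends earlier: a different relation

is_a = lambda p: p == "a"
same_input = snapshot_identical(r, s)
same_output = snapshot_identical(select(r, is_a), select(s, is_a))
different = snapshot_identical(r, u)
print(same_input, same_output, different)
```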

8
What are our operators?
  • We may now happily use all our favorite
    relational operators
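  • Definition 9: Join $\bowtie_{\theta(P_1,P_2)}(S_1, S_2)$
  • $\bowtie_{\theta(P_1,P_2)}(S_1, S_2) = \{\,(V_s, V_e, (e_1.\mathrm{Payload} \text{ concatenated with } e_2.\mathrm{Payload})) \mid e_1 \in E(S_1),\ e_2 \in E(S_2),\ V_s = \max\{e_1.V_s, e_2.V_s\},\ V_e = \min\{e_1.V_e, e_2.V_e\},\ \text{where } V_s < V_e, \text{ and } \theta(e_1.\mathrm{Payload}, e_2.\mathrm{Payload})\,\}$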
  • These operators' output streams describe the
    changing contents of a materialized view computed
    over the changing input relation(s) described by
    the input streams
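
The join definition above translates almost directly into code. The following is a naive nested-loop sketch, assuming events are `(vs, ve, payload)` tuples whose payloads are Python tuples; the function name and the sample data are made up for illustration.

```python
def temporal_join(s1, s2, theta):
    """Join two streams: output lifetime is the overlap of the input lifetimes."""
    out = []
    for (vs1, ve1, p1) in s1:
        for (vs2, ve2, p2) in s2:
            vs, ve = max(vs1, vs2), min(ve1, ve2)
            # Keep the pair only if the lifetimes overlap and theta holds.
            if vs < ve and theta(p1, p2):
                out.append((vs, ve, p1 + p2))   # concatenated payloads
    return out

# Hypothetical data: two successive quotes and one holding for the same symbol.
quotes = [(0, 10, ("MSFT", 30.0)), (10, 20, ("MSFT", 31.0))]
holdings = [(5, 15, ("MSFT", 100))]

joined = temporal_join(quotes, holdings, lambda a, b: a[0] == b[0])
for row in joined:
    print(row)
```

Each output row is valid exactly while both contributing input rows are valid, which is what a materialized view of the join would contain at that time.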

9
Non-view update compliant operators
  • Moving window: all output valid end times are
    set to their valid start times plus the window
    size
  • Insert separation (CQL): all output valid end
    times are set to infinity
  • The semantics of these operations plus many more
    can be easily captured using AlterLifetime
  • Definition 12: AlterLifetime $\Pi_{f_{V_s}, f_{\Delta}}(S)$
  • $\Pi_{f_{V_s}, f_{\Delta}}(S) = \{\,(f_{V_s}(e),\ f_{V_s}(e) + f_{\Delta}(e),\ e.\mathrm{Payload}) \mid e \in E(S)\,\}$
  • Allows the lifetime of input events to be
    recomputed
  • It is not view update compliant, but it is well
    behaved
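
A short sketch of AlterLifetime, and of the two special cases named above, may help. It assumes events are `(vs, ve, payload)` tuples; the helper names `moving_window` and `insert_separation` are illustrative. The last lines show why the operator is not view update compliant: two inputs describing the same changing relation yield outputs that differ at time 4.

```python
import math

def alter_lifetime(stream, f_vs, f_delta):
    """Recompute each lifetime as [f_vs(e), f_vs(e) + f_delta(e))."""
    return [(f_vs(e), f_vs(e) + f_delta(e), e[2]) for e in stream]

def moving_window(stream, size):
    # Valid end time becomes valid start time plus the window size.
    return alter_lifetime(stream, lambda e: e[0], lambda e: size)

def insert_separation(stream):
    # Valid end time becomes infinity: only the insertions remain visible.
    return alter_lifetime(stream, lambda e: e[0], lambda e: math.inf)

# r and s describe the same relation; s splits the lifetime into two pieces.
r = [(1, 5, "a")]
s = [(1, 3, "a"), (3, 5, "a")]

win_r = moving_window(r, 2)
win_s = moving_window(s, 2)

def alive_at_4(stream):
    return [p for (vs, ve, p) in stream if vs <= 4 < ve]

print(win_r, alive_at_4(win_r))
print(win_s, alive_at_4(win_s))
print(insert_separation(r))
```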

10
But is this implementable?
[Figure: Input]
  • Avg(P): The usual average operator in
    materialized view update compliant form
  • But how could CEDR know it needed to wait for K2
    (to produce output) when it saw K1?
  • It couldn't have without waiting indefinitely or
    without some external guarantee

[Figure: Correct Output]
11
But is this implementable?
  • We need the ability to retract previously output
    results in the stream

is logically equivalent to
12
But is this implementable?
  • Our real definition of well-behavedness:
  • Any well behaved operator, when given
    logically equivalent sets of input streams,
    produces logically equivalent sets of output
    streams
  • Avg may now fully retract incorrect previous
    output and issue new correct output for the
    appropriate time period
  • We can denote operator semantics in a very clean
    manner even in a system with arbitrarily out of
    order data
  • The use of retractions to handle out of order
    data induces a spectrum of formally defined
    consistency levels for operators
  • These levels expose interesting tradeoffs between
    various aspects of performance and correctness
    (much more in the paper)
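
To make the retraction idea concrete, here is a small sketch of an optimistic Avg. It is not the CEDR implementation: the message format `("insert" | "retract", row)`, the class name, and the sample events are assumptions, and the operator simply remembers every input, which is the high-memory end of the spectrum. The point is that two arrival orders emit different message sequences whose net effect is the same table.

```python
def avg_segments(inputs):
    """Ideal Avg output: one (vs, ve, avg) row per interval between endpoints."""
    points = sorted({t for (vs, ve, _) in inputs for t in (vs, ve)})
    rows = []
    for lo, hi in zip(points, points[1:]):
        live = [v for (vs, ve, v) in inputs if vs <= lo and hi <= ve]
        if live:
            rows.append((lo, hi, sum(live) / len(live)))
    return rows

class OptimisticAvg:
    def __init__(self):
        self.inputs = []    # every input seen so far (unbounded memory)
        self.emitted = []   # rows currently asserted downstream

    def on_event(self, vs, ve, value):
        """Emit output immediately; retract earlier rows the new event invalidates."""
        self.inputs.append((vs, ve, value))
        target = avg_segments(self.inputs)
        msgs = [("retract", r) for r in self.emitted if r not in target]
        msgs += [("insert", r) for r in target if r not in self.emitted]
        self.emitted = target
        return msgs

def net_result(messages):
    """Apply inserts and retractions to obtain the table they describe."""
    table = []
    for kind, row in messages:
        if kind == "insert":
            table.append(row)
        else:
            table.remove(row)
    return sorted(table)

k1, k2 = (0, 10, 4.0), (5, 15, 8.0)   # hypothetical events K1 and K2

def run(order):
    op, msgs = OptimisticAvg(), []
    for ev in order:
        msgs += op.on_event(*ev)
    return msgs

in_order = run([k1, k2])
out_of_order = run([k2, k1])
print(in_order)
print(out_of_order)
print(net_result(in_order) == net_result(out_of_order))
```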

13
Imperfections in Event Streaming
  • How do current systems cope?
  • Wait until we're sure we have all data that
    affects our results up to a point in time (High
    consistency)
  • High latency
  • Requires application and network guarantee
  • Requires high memory
  • Absolutely correct answers
  • Useful for standing queries that result in some
    expensive form of corrective or examination
    action
  • A human must examine something because some
    aggregation (avg) or negation based alert tripped
  • Provide an answer quickly as of the current time,
    but ignore late arriving data (Low Consistency)
  • Low latency
  • No application or network guarantee required
  • Low memory
  • Sacrifices answer correctness
  • Useful in applications which are unable to
    provide guarantees about data arrival timeliness
    and where exact answers aren't required
  • E.g. Aggregations in internet scale monitoring

14
Imperfections in Event Streaming
  • With retractions
  • Compute our output early in an optimistic fashion
    and retract later if necessary (Middle
    Consistency)
  • Low latency
  • Doesn't require application and network
    guarantees
  • High memory requirements equal to the high
    consistency case if we have guarantees
  • May produce more output
  • Useful in situations where we don't want to
    block, but where we want eventual correctness
  • Stock ticker data example. We want to compute
    real time info about stock data, but compensate
    when a correction is issued.
  • Shared expressions between two queries, one
    running at the high level of consistency and one
    at the low

15
Infinite Spectrum of Consistency Levels
  • B: How long (at most) does the query block?
  • M: How long (at most) is the query required to
    remember data?
[Figure: consistency levels plotted against a Blocking (B) axis and a Memory (M) axis. Labels: strong consistency (slow, cautious); middle consistency (quick, optimistic); weak consistency. Memory axis annotations: small, less correct; big, more correct.]
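
One way to picture the two knobs is a toy tumbling-window count parameterized by B and M. This is a sketch under stated assumptions, not the paper's formal model: events are `(app_time, arrival_time)` pairs, a window's result is held back until B time units after the window closes, late events are folded in (with a retraction) until M time units after the window closes, and anything later is dropped. It assumes M >= B, and the function name and sample data are made up.

```python
from collections import defaultdict

def run_windowed_count(events, width, B, M):
    """Return (messages, dropped) for a tumbling-window count with knobs B and M."""
    counts = defaultdict(int)   # window index -> count so far
    emitted = {}                # window index -> count last sent downstream
    msgs, dropped = [], 0

    def emit_due(now):
        # Send every unsent window whose blocking deadline has passed.
        for w in sorted(counts):
            if w not in emitted and now >= (w + 1) * width + B:
                emitted[w] = counts[w]
                msgs.append(("insert", w, counts[w]))

    for app, arr in sorted(events, key=lambda e: e[1]):   # arrival order
        emit_due(arr)
        w = app // width
        if arr >= (w + 1) * width + M:
            dropped += 1        # past the memory bound: treated as forgotten
            continue
        counts[w] += 1
        if w in emitted:        # late but still remembered: correct the output
            msgs.append(("retract", w, emitted[w]))
            emitted[w] = counts[w]
            msgs.append(("insert", w, counts[w]))
    emit_due(float("inf"))      # end of input: flush whatever is still blocked
    return msgs, dropped

# Hypothetical input: two events for window 0 arrive late (at times 25 and 35).
events = [(1, 1), (2, 2), (3, 25), (12, 13), (4, 35)]

strong = run_windowed_count(events, width=10, B=30, M=30)  # block, then be right
middle = run_windowed_count(events, width=10, B=0, M=30)   # answer now, fix later
weak = run_windowed_count(events, width=10, B=0, M=0)      # answer now, never fix

for name, (msgs, dropped) in [("strong", strong), ("middle", middle), ("weak", weak)]:
    print(name, msgs, "dropped:", dropped)
```

In this toy run the strong and middle settings end with the same counts, but the middle setting gets there through extra retraction messages, while the weak setting emits less and silently loses the late events.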