Evaluating Window Joins over Punctuated Streams - PowerPoint PPT Presentation

Slides: 30
Provided by: lisad189
Learn more at: http://web.cs.wpi.edu

1
Evaluating Window Joins over Punctuated Streams
  • Many slides taken from talk by
  • Luping Ding and Elke A. Rundensteiner, CIKM04
  • Database Systems Research Group
  • Worcester Polytechnic Institute

2
Stream Data Processing
  • Online Transaction Management
  • Sensor Network Monitoring
  • Network Usage Analysis
  • Online Auction

[Figure: streaming data enters a stream query engine on which continuous queries are registered; the engine emits a streaming result]
3
New Challenges in Stream Context
  • Potentially infinite data streams vs. stateful
    operators, e.g., join, distinct
  • Problem: potentially unbounded state
  • Reason: no hint on which data is no longer useful

4
Example: Symmetric Hash Join [WA93]
  • Memory overflow resolution: state relocation
    • Examples: XJoin [UF00], Hash-Merge Join [MLA04]
  • Problems:
    • Join state still grows without bound
    • Delivery of some join results may be highly
      deferred

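[Figure: memory overflow in a symmetric hash join; each tuple arriving on stream A or B probes the other stream's state (SB or SA) and is inserted into its own state]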
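A minimal Python sketch of the symmetric hash join idea, assuming an equality join on a single attribute and dict-based in-memory states (the class and method names are hypothetical, not taken from [WA93]). It makes the problem on this slide concrete: every tuple is probed and then inserted, but nothing is ever removed, so the state keeps growing.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Symmetric hash join on one join attribute (illustrative sketch).

    Each arriving tuple probes the other stream's hash table and is then
    inserted into its own stream's table. Nothing is ever deleted.
    """

    def __init__(self):
        # One hash table per input stream: join value -> list of tuples.
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}

    def process(self, stream, key, tup):
        other = "B" if stream == "A" else "A"
        # Probe: pair the new tuple with every stored tuple of the other stream.
        results = [(tup, match) if stream == "A" else (match, tup)
                   for match in self.state[other][key]]
        # Insert: keep the tuple for future arrivals on the other stream.
        self.state[stream][key].append(tup)
        return results

    def state_size(self):
        return sum(len(v) for table in self.state.values() for v in table.values())


join = SymmetricHashJoin()
join.process("A", 175, ("a1", 175))
print(join.process("B", 175, ("b1", 175)))  # [(('a1', 175), ('b1', 175))]
print(join.state_size())                    # 2
```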
5
Avoiding Unbounded State
  • Solution: exploit constraints to detect
    no-longer-useful data
    • Sliding window [MWA03]: identifies a bounded set
      of input data based on time
    • K-constraint [BW03]: models clustered or ordered
      data arrival patterns
    • Punctuation [TMSF03]: dynamically announces the
      termination of certain values

6
Sliding Window [KNV03]
[Figure: windows Wa and Wb over the timelines of Stream A and Stream B]
7
Punctuation
  • Meta-knowledge embedded inside data streams
  • An ordered set of patterns corresponding to
    attributes of tuples
  • Patterns: wildcard (*), constant (9), list ({1,2,3}),
    range ([1,20]), empty (∅)
  • Semantics: tuples arriving after a punctuation p will
    NOT match p


Bid stream:
Item_id   Bidder_id   Bid_price   Timestamp
180       Marlie      820.00      Nov-13-03 11:02:00
182       Ultrasale   1000.00     Nov-13-03 11:05:00
180       Jocelyn     850.00      Nov-13-03 11:14:00
<180, *, *, *>   (punctuation: no more tuples will contain Item_id 180)
181       pcfan       50.00       Nov-13-03 11:36:00
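To make the matching semantics concrete, here is a small sketch that tests whether a tuple matches a punctuation. The tagged-tuple encoding of the five pattern kinds, the inclusive range bounds, and the function names are assumptions for illustration only, not the notation of [TMSF03].

```python
# Hypothetical tagged-tuple encoding of punctuation patterns:
#   WILDCARD            matches any value
#   ("const", v)        matches exactly v
#   ("list", {v1, ...}) matches any listed value
#   ("range", lo, hi)   matches lo <= x <= hi (bounds assumed inclusive)
#   EMPTY               matches nothing
WILDCARD = ("*",)
EMPTY = ("empty",)


def match_pattern(value, pattern):
    kind = pattern[0]
    if kind == "*":
        return True
    if kind == "const":
        return value == pattern[1]
    if kind == "list":
        return value in pattern[1]
    if kind == "range":
        return pattern[1] <= value <= pattern[2]
    if kind == "empty":
        return False
    raise ValueError(f"unknown pattern kind: {kind!r}")


def match(tup, punctuation):
    """A tuple matches a punctuation if every attribute matches its pattern."""
    return len(tup) == len(punctuation) and all(
        match_pattern(v, p) for v, p in zip(tup, punctuation))


# Punctuation on the Bid stream: no more tuples with Item_id 180.
punct_180 = (("const", 180), WILDCARD, WILDCARD, WILDCARD)
print(match((180, "Jocelyn", 850.00, "Nov-13-03 11:14:00"), punct_180))  # True
print(match((181, "pcfan", 50.00, "Nov-13-03 11:36:00"), punct_180))     # False
```

Using the bid tuples from this slide, the punctuation on Item_id 180 matches the earlier 180 bids but not the later bid on item 181, which is exactly what the semantics require.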

8
Punctuation-Aware Join [DMR04]
[Figure: punctuation-aware join on item_id between Stream A and Stream B with join states SA and SB; a punctuation on join value 175 announces "No more tuples will have A = 175."]
9
Features of Punctuation
  • Purge rule. For any tuple ta from stream A, if a
    punctuation Pb has already been received from
    stream B such that match(ta, Pb) holds on the join
    attribute, ta will not join with any future tuples
    arriving from stream B. Therefore ta does not need
    to be kept in the A state after being processed.
  • Propagation rule. The join operator can also
    propagate punctuations to the output stream in
    order to help downstream operators.
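The purge rule can be sketched as follows, for the A side only. The sketch simplifies a punctuation to a single constant on the join attribute ("no more B tuples with this value") and omits the probe step; the class and field names are hypothetical.

```python
from collections import defaultdict


class PurgeRuleState:
    """A-side join state that applies the purge rule (illustrative sketch).

    Punctuations are simplified to 'no more B tuples with this join value'.
    """

    def __init__(self):
        self.a_state = defaultdict(list)  # join value -> stored A tuples
        self.b_punctuated = set()         # join values punctuated by stream B

    def on_a_tuple(self, value, tup):
        # (Probing the B state is omitted.) Keep ta only if stream B has not
        # yet punctuated its join value; otherwise it can never join again.
        if value not in self.b_punctuated:
            self.a_state[value].append(tup)

    def on_b_punctuation(self, value):
        self.b_punctuated.add(value)
        # Stored A tuples with this join value can no longer join: purge them.
        self.a_state.pop(value, None)


state = PurgeRuleState()
state.on_a_tuple(175, "a1")
state.on_a_tuple(180, "a2")
state.on_b_punctuation(175)
print(dict(state.a_state))  # {180: ['a2']}
```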

10
  • Based on punctuation semantics, we derive the
    following theorem as the foundation of our
    punctuation propagation algorithm.
  • Theorem 3.1. Let pa and pb be punctuations
    retrieved from streams A and B at times TSa and
    TSb, respectively, specifying the same punctuated
    value val of join attribute att. Then no output
    tuples with val as the value of attribute att
    will be generated after time max(TSa, TSb).
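A sketch of how a join operator could apply Theorem 3.1: it records the arrival time of each punctuated join value per input and emits one output punctuation once both inputs have punctuated the same value. Punctuations are again simplified to a single constant on att, and the class and method names are hypothetical.

```python
class PunctuationPropagator:
    """Emit an output punctuation once both inputs punctuated the same join
    value (Theorem 3.1). Illustrative sketch with single-value punctuations."""

    def __init__(self):
        self.arrival = {"A": {}, "B": {}}  # stream -> {join value: punctuation timestamp}
        self.emitted = set()

    def on_punctuation(self, stream, val, ts):
        self.arrival[stream].setdefault(val, ts)
        other = "B" if stream == "A" else "A"
        if val in self.arrival[other] and val not in self.emitted:
            self.emitted.add(val)
            # No output tuple with this value is generated after max(TSa, TSb).
            return (val, max(self.arrival["A"][val], self.arrival["B"][val]))
        return None


prop = PunctuationPropagator()
print(prop.on_punctuation("A", 175, 10))  # None: only A has punctuated 175
print(prop.on_punctuation("B", 175, 12))  # (175, 12)
```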

11
Sliding Window Join
  • Suppose Ta and Tb are the time windows for streams
    A and B, respectively. We define the following rule
    for invalidating tuples in the join state based on
    the sliding window.
  • Let ta be the latest tuple, with timestamp TSa,
    from stream A that has been processed. A tuple in
    the B state with timestamp TSb such that
    TSb + Tb < TSa is called a time-expired tuple and
    can be invalidated. The same invalidation rule
    applies to tuples in the A state.
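The invalidation rule can be sketched with a time-ordered deque, assuming tuples arrive in non-decreasing timestamp order so that expired tuples always sit at the front (the function name is hypothetical).

```python
from collections import deque


def invalidate_expired(b_state, ts_a, t_b):
    """Remove time-expired tuples (TSb + Tb < TSa) from the B state.

    b_state is a deque of (timestamp, tuple) pairs in arrival order; with
    non-decreasing timestamps, the expired tuples are always at the front.
    """
    expired = []
    while b_state and b_state[0][0] + t_b < ts_a:
        expired.append(b_state.popleft())
    return expired


b_state = deque([(1, "b1"), (5, "b2"), (9, "b3")])
print(invalidate_expired(b_state, ts_a=10, t_b=4))  # [(1, 'b1'), (5, 'b2')]
print(list(b_state))                                # [(9, 'b3')]
```

Because the scan stops at the first unexpired tuple, the cost is proportional to the number of tuples actually removed rather than to the size of the state.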

12
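Basic Window Join
[Figure: timelines of Stream A and Stream B with windows Ta and Tb; a tuple with timestamp TSa joins B tuples in [TSa-Tb, TSa], and a tuple with timestamp TSb joins A tuples in [TSb-Ta, TSb]]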
13
Optimization Opportunities
  • Maintain a smaller state than either a pure window
    join or a pure punctuation-exploiting join
    • Bid tuples that have been joined don't need to be
      kept in the state (punctuation)
  • Drop tuples without affecting the precision of the
    result
    • Bid tuples outside the 24-hour window of the
      corresponding Auction tuple don't need to be
      processed
  • Aggregate results for some Auction tuples can be
    produced in less than 24 hours

14
Features of PWJoin algorithm
  • Punctuation-exploiting Window Join (PWJoin) is
    composed of three operations:
    • Probing the state to find matching tuples for
      producing join results
    • Purging no-longer-joining tuples by punctuations
    • Invalidating expired tuples by windows

15
Window and Punctuation Occur Simultaneously
SELECT A.item_id, Count(*)
FROM Auction [Range 24 Hours] A, Bid B
WHERE A.item_id = B.item_id
GROUP BY A.item_id
[Figure: query plan in which the Auction Stream (a 24-hour window is applied) and the Bid Stream (contains punctuations on item_id) feed Join(item_id); its output Out1 (item_id) feeds Group-by item_id (count(*)), which produces Out2 (item_id, count)]
16
PWJoin Basics and Issue
  • On receiving a new tuple ta from stream A: invalidate
    tuples from the B state, probe the B state, then
    insert ta into the A state
  • On receiving a new punctuation pa from stream A:
    purge tuples from the B state, then insert pa into
    the A state
  • Issue: how should the PWJoin state be designed to
    facilitate all search-based operations?
    • Invalidate conducts a time-based search
    • Probe and Purge need a value-based search

17
PWJoin State with Two-dimensional Index
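[Figure: one stream's PWJoin state. Tuples are stored in T-Nodes linked from Window Begin to Window End by NextTimeListTNode pointers (the time list) and, per join value, by NextValueListTNode pointers (the value list). The I-Node index (a hash table) maps each join value (Key) to an I-Node holding Head and Tail pointers into its value list and a PunctFlag (none or punctuated). A separate punctuation time list stores each punctuation with its timestamp (p1 T1, p2 T2, ...).]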
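A minimal Python sketch of this two-dimensional index for one stream, assuming non-decreasing timestamps. An OrderedDict stands in for the linked time list (its first entry is the oldest T-Node), and a per-value deque of T-Node ids stands in for the Head/Tail value list; the punctuation time list is omitted, and all class and method names are hypothetical. It shows how one structure serves both kinds of search: invalidation walks the time list from its head, while probe and purge go through the hash index.

```python
from collections import OrderedDict, deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class INode:
    """Entry of the I-Node index: a join value, its value list, and its PunctFlag."""
    key: object
    value_list: deque = field(default_factory=deque)  # T-Node ids, oldest first
    punct_flag: str = "none"                          # "none" or "punctuated"


class PWJoinState:
    """One stream's state, reachable by time (time list) and by value (I-Node index)."""

    def __init__(self):
        self.time_list = OrderedDict()  # T-Node id -> (timestamp, key, tuple), oldest first
        self.index = {}                 # join value -> INode
        self._next_id = count()

    def insert(self, ts, key, tup):
        tid = next(self._next_id)
        self.time_list[tid] = (ts, key, tup)
        self.index.setdefault(key, INode(key)).value_list.append(tid)

    def probe(self, key):
        """Value-based search: all stored tuples with this join value."""
        inode = self.index.get(key)
        if inode is None:
            return []
        return [self.time_list[tid][2] for tid in inode.value_list]

    def invalidate(self, ts, window):
        """Time-based search: drop T-Nodes with timestamp + window < ts from the head."""
        while self.time_list:
            oldest_ts, key, _ = next(iter(self.time_list.values()))
            if oldest_ts + window >= ts:
                break
            self.time_list.popitem(last=False)
            inode = self.index[key]
            inode.value_list.popleft()  # the oldest T-Node of a value list is its head
            if not inode.value_list and inode.punct_flag == "none":
                del self.index[key]

    def purge(self, key):
        """Value-based search: delete a whole value list together with its I-Node."""
        inode = self.index.pop(key, None)
        if inode is not None:
            for tid in inode.value_list:
                del self.time_list[tid]
        return inode


state = PWJoinState()
state.insert(1, 175, "x1")
state.insert(2, 180, "y1")
state.insert(3, 175, "x2")
print(state.probe(175))           # ['x1', 'x2']
state.invalidate(ts=7, window=5)  # x1 expires: 1 + 5 < 7
print(state.probe(175))           # ['x2']
state.purge(175)
print(len(state.time_list))       # 1
```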
18
PWJoin Algorithm
  • Invalidate: once a new tuple t is retrieved from
    stream A, its timestamp is used to invalidate
    expired tuples from the head of the time list of
    stream B.
  • Probe: probe the I-Node index and join with the
    tuples in the value list of the matching I-Node.
  • After invalidation is done, the join value of t
    is used to probe the I-Node index of the B state.
    If a matching I-Node iNode is found, the
    corresponding value list is located by following
    the Head pointer of iNode. Tuple t then joins
    with all tuples in this value list by following
    the NextValueListTNode pointer of each T-Node.
  • Finally, the PunctFlag of iNode is checked. If it
    is "punctuated", t is discarded. If it is "none"
    (or no matching I-Node exists), t is inserted into
    the A state.

19
PWJoin Algorithm
  • Purge: probe the I-Node index and delete the tuples
    in the value list of the matching I-Node.
  • When a new punctuation p is retrieved from stream
    A, p is used to probe the I-Node index of the B
    state. If a matching I-Node iNode is found, all
    tuples in the corresponding value list are
    deleted, and iNode is removed from the I-Node index
    as well. If the PunctFlag of iNode is
    "punctuated", p is discarded. If iNode is not
    found or iNode's PunctFlag is "none", p is used
    to probe the I-Node index of the A state, and the
    PunctFlag of the matching I-Node iNodea is set to
    "punctuated".
  • If iNodea does not exist, a new I-Node is created
    with its PunctFlag set to "punctuated" and inserted
    into the I-Node index of the A state.
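Putting the steps of the two PWJoin Algorithm slides together, a hypothetical sketch of the per-arrival logic for an equality join follows. It assumes non-decreasing timestamps and punctuations that are a single constant on the join attribute, and it uses an OrderedDict and per-value deques in place of the linked T-Node lists; it illustrates the described steps and is not the authors' implementation.

```python
from collections import OrderedDict, deque
from itertools import count


class Side:
    """Compact state of one input: time list plus I-Node index."""

    def __init__(self):
        self.time_list = OrderedDict()  # T-Node id -> (timestamp, key, tuple)
        self.index = {}                 # key -> {"ids": deque of T-Node ids, "flag": str}


class PWJoin:
    """Illustrative sketch of PWJoin's tuple and punctuation handling."""

    def __init__(self, window_a, window_b):
        self.window = {"A": window_a, "B": window_b}
        self.sides = {"A": Side(), "B": Side()}
        self._ids = count()

    @staticmethod
    def _other(stream):
        return "B" if stream == "A" else "A"

    def _invalidate(self, name, ts):
        """Drop tuples of side `name` with timestamp + window < ts from the time-list head."""
        side, window = self.sides[name], self.window[name]
        while side.time_list:
            t_ts, key, _ = next(iter(side.time_list.values()))
            if t_ts + window >= ts:
                break
            side.time_list.popitem(last=False)
            inode = side.index[key]
            inode["ids"].popleft()  # oldest T-Node of this value list
            if not inode["ids"] and inode["flag"] == "none":
                del side.index[key]

    def on_tuple(self, stream, ts, key, tup):
        other = self._other(stream)
        self._invalidate(other, ts)                   # Invalidate
        inode = self.sides[other].index.get(key)      # Probe
        matches = [] if inode is None else [
            self.sides[other].time_list[i][2] for i in inode["ids"]]
        results = [(tup, m) if stream == "A" else (m, tup) for m in matches]
        if inode is None or inode["flag"] == "none":  # check PunctFlag
            own = self.sides[stream]
            tid = next(self._ids)
            own.time_list[tid] = (ts, key, tup)
            own.index.setdefault(key, {"ids": deque(), "flag": "none"})["ids"].append(tid)
        return results

    def on_punctuation(self, stream, key):
        other = self._other(stream)
        inode = self.sides[other].index.pop(key, None)  # Purge
        if inode is not None:
            for tid in inode["ids"]:
                del self.sides[other].time_list[tid]
            if inode["flag"] == "punctuated":
                return  # both inputs have punctuated key: discard p
        own = self.sides[stream].index.setdefault(key, {"ids": deque(), "flag": "none"})
        own["flag"] = "punctuated"


join = PWJoin(window_a=10, window_b=10)
print(join.on_tuple("A", 1, 175, "a1"))  # []
print(join.on_tuple("B", 2, 175, "b1"))  # [('a1', 'b1')]
join.on_punctuation("B", 175)            # purges a1 from the A state
print(join.on_tuple("A", 3, 175, "a2"))  # [('a2', 'b1')]; a2 is not stored
```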

20
Punctuation Propagation [CIKM04]
  • An operator may propagate punctuations to benefit
    downstream operators

[Figure: the Auction Stream and the Bid Stream (Item_id, Bidder_id, Bid_price) feed Join(item_id), which propagates punctuations on item_id (e.g., 180); the downstream Group-by item_id (count(*)) can be unblocked by punctuations propagated by the join operator]
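Why propagation helps: a blocking group-by such as count(*) per item_id cannot emit a group's result on an infinite stream until it knows the group is complete. Below is a hypothetical sketch of a group-by that finalizes a group as soon as it receives a punctuation on that item_id; the class and method names are illustrative only.

```python
from collections import Counter


class PunctuatedGroupByCount:
    """COUNT(*) GROUP BY item_id that finalizes a group when a punctuation
    on its item_id arrives (illustrative sketch)."""

    def __init__(self):
        self.counts = Counter()

    def on_tuple(self, item_id):
        self.counts[item_id] += 1

    def on_punctuation(self, item_id):
        # No more tuples with this item_id will arrive, so its count is final.
        if item_id not in self.counts:
            return None
        return (item_id, self.counts.pop(item_id))


group_by = PunctuatedGroupByCount()
for item_id in (180, 181, 180):
    group_by.on_tuple(item_id)
print(group_by.on_punctuation(180))  # (180, 2)
print(dict(group_by.counts))         # {181: 1}
```

Emitting the final group also frees its counter, so punctuations bound the group-by's state the same way they bound the join's state.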


21
Optimizations Enabled by Combined Constraints
[Figure: two panels, Early Punctuation Propagation and Tuple Dropping, each showing tuples a1 to a10 arriving on Stream S1 and Stream S2, with propagation point 1 and propagation point 2 marked]
22
Achieving Optimizations by Combined Constraints
  • Early propagation
    • Invalidate punctuations in the punctuation time
      list while invalidating tuples
    • Expired punctuations can be propagated
  • Tuple dropping
    • When early propagation happens, set the PunctFlag
      of the matching I-Node to "propagated"
    • Drop new tuples that match an I-Node whose
      PunctFlag is "propagated"
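A hypothetical sketch of early propagation and tuple dropping for punctuations from stream A, assuming a time window Ta on stream A, non-decreasing timestamps, and single-value punctuations. The reasoning it encodes is one reading of the slide: every A tuple with a punctuated value arrived before the punctuation, so once the punctuation itself has expired from A's window, all those tuples have expired too. No future B tuple can then produce a result for that value, so the punctuation can be propagated and such B tuples can be dropped without probing.

```python
from collections import deque


class EarlyPropagation:
    """Early propagation and tuple dropping for punctuations from stream A
    (illustrative sketch; window_a is the time window Ta on stream A)."""

    def __init__(self, window_a):
        self.window_a = window_a
        self.punct_time_list = deque()  # (key, timestamp) of A punctuations, arrival order
        self.flag = {}                  # key -> PunctFlag in the A state
        self.propagated = []            # punctuations emitted downstream, in order

    def on_a_punctuation(self, key, ts):
        self.flag[key] = "punctuated"
        self.punct_time_list.append((key, ts))

    def on_b_tuple(self, key, ts):
        """Return True if the B tuple must still be probed, False if it can be dropped."""
        # Invalidate punctuations along with tuples: once an A punctuation has
        # expired, every A tuple with that key (all older) has expired too.
        while self.punct_time_list and self.punct_time_list[0][1] + self.window_a < ts:
            expired_key, _ = self.punct_time_list.popleft()
            self.flag[expired_key] = "propagated"
            self.propagated.append(expired_key)  # early propagation
        # Tuple dropping: no A tuple can ever match this key again.
        return self.flag.get(key) != "propagated"


ep = EarlyPropagation(window_a=5)
ep.on_a_punctuation(175, ts=2)
print(ep.on_b_tuple(175, ts=6))  # True: 2 + 5 < 6 is false, punctuation still live
print(ep.on_b_tuple(175, ts=8))  # False: punctuation expired and was propagated
print(ep.propagated)             # [175]
```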

23
Memory Cost Analysis
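  • Memory cost of the B state at time T:
$$S_b^T = S_{b,\text{insert}}^T - S_{b,\text{purge}}^T = S_{b,\text{arrive}}^T - S_{b,\text{purge}}^T = \underbrace{\lambda_b T_b}_{\text{Window Join}} - \underbrace{\lambda_b T_b \cdot \frac{\lambda_{pa} T}{N_{K_b,T}}}_{\text{Saving by Punctuation}}$$
  • $\lambda_b$: tuple input rate of stream B
  • $\lambda_{pa}$: punctuation input rate of stream A
  • $N_{K_b,T}$: number of distinct join values that
    have occurred in stream B up to the $T$th time unit
  • $T_b$: time window on stream B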
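Plugging illustrative numbers into the cost model gives a feel for the saving. The function below simply evaluates the two terms of the formula; the parameter values in the example are made up, reading λ_pa·T / N_{K_b,T} as the fraction of B's join values already punctuated by A is an interpretation, and clamping that fraction at 1 is an added assumption, not part of the slide.

```python
def estimated_b_state_size(rate_b, window_b, punct_rate_a, elapsed, distinct_keys_b):
    """Evaluate lambda_b*T_b - lambda_b*T_b*(lambda_pa*T / N_{Kb,T}).

    rate_b = lambda_b, window_b = T_b, punct_rate_a = lambda_pa,
    elapsed = T, distinct_keys_b = N_{Kb,T}.
    """
    window_join_state = rate_b * window_b
    # Fraction of B's join values already punctuated by A (clamped to 1: an
    # added assumption, not part of the slide's formula).
    punctuated_fraction = min(1.0, punct_rate_a * elapsed / distinct_keys_b)
    saving_by_punctuation = window_join_state * punctuated_fraction
    return window_join_state - saving_by_punctuation


# Made-up parameters: 100 tuples/s, 10 s window, 2 punctuations/s, T = 50 s, 400 keys.
print(estimated_b_state_size(100, 10, 2, 50, 400))  # 750.0
```

With these numbers the pure window join would hold 1000 B tuples, and punctuations remove a quarter of them.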
24
PWJoin vs. WJoin Memory and Tuple Output Rate
Streams A and B: punct-asc-100-40
25
PWJoin vs. PJoin Punctuation Output Rate
Stream A: punct-asc-100-40; Stream B:
punct-random-30-40; window: 1 second
26
Conclusion
  • PWJoin algorithm
  • Designed storage structure for PWJoin state
  • Memory cost analysis of PWJoin

27
Thanks
  • WPI Database Research Group

Many slides are from davis.wpi.edu/dsrg/CAPE/slides
28
References
  • [CIKM04] L. Ding and E. A. Rundensteiner.
    Evaluating Window Joins over Punctuated Streams.
    CIKM04.
  • [KNV03] J. Kang, J. F. Naughton and S. D. Viglas.
    Evaluating Window Joins over Unbounded Streams.
    ICDE03.
  • [UF00] T. Urhan and M. Franklin. XJoin: A
    Reactively Scheduled Pipelined Join Operator.
    IEEE Data Engineering Bulletin, 23(2), 2000.
  • [HH99] P. Haas and J. Hellerstein. Ripple Joins
    for Online Aggregation. SIGMOD99.
  • [GO03] L. Golab and M. T. Ozsu. Processing
    Sliding Window Multi-Joins in Continuous Queries
    over Data Streams. VLDB03.
  • [GGO04] L. Golab, S. Garg and M. T. Ozsu. On
    Indexing Sliding Windows over On-line Data
    Streams. EDBT04.
  • [RDS04] E. A. Rundensteiner, L. Ding, T.
    Sutherland, Y. Zhu, B. Pielech and N. Mehta.
    CAPE: Continuous Query Engine with
    Heterogeneous-Grained Adaptivity. VLDB Demo,
    2004.
  • [BW04] S. Babu and J. Widom. Exploiting
    k-Constraints to Reduce Memory Overhead in
    Continuous Queries over Data Streams.
  • [TMSF03] P. A. Tucker, D. Maier, T. Sheard and L.
    Fegaras. Exploiting Punctuation Semantics in
    Continuous Data Streams. TKDE, 15(3), 2003.
  • [DMR04] L. Ding, N. Mehta, E. A. Rundensteiner
    and G. T. Heineman. Joining Punctuated Streams.
    EDBT04.
  • [MWA03] R. Motwani, J. Widom, A. Arasu et al.
    Query Processing, Resource Management, and
    Approximation in a Data Stream Management System.
    CIDR03.

29
Thanks!