XJoin: A Reactively-Scheduled Pipelined Join Operator

1
XJoin A Reactively-Scheduled Pipelined Join
Operator
  • IEEE Data Engineering Bulletin, 2000
  • by Tolga Urhan and Michael J. Franklin

Based on a talk prepared by Asima Silva and Leena
Razzaq
2
Goal of XJoin
  • Efficiently evaluate equi-join in online query
    processing over distributed data sources
  • Optimization objectives:
  • Small memory footprint
  • Fast initial result delivery
  • Hiding intermittent delays in data arrival

3
Outline
  • Hash Join History
  • Motivation of XJoin
  • Challenges in Developing XJoin
  • Three Stages of XJoin
  • Preventing Duplicates
  • Experimental Results
  • Conclusion

4
Classic Hash Join
  • Two phases: build and probe
  • Only one table is hashed in memory

1. Build: hash one table into an in-memory hash table
2. Probe: scan the other table, probing the hash table for matches
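The two phases can be sketched in Python (function name, row format, and inputs are illustrative, not from the slides):

```python
from collections import defaultdict

def classic_hash_join(build_rows, probe_rows, key):
    """Classic two-phase hash join (sketch). Assumes the build
    input fits entirely in memory; `key` extracts the equi-join
    attribute from a row."""
    # 1. Build: hash every row of the (smaller) build input.
    table = defaultdict(list)
    for r in build_rows:
        table[key(r)].append(r)
    # 2. Probe: scan the other input and emit matching pairs.
    for s in probe_rows:
        for r in table.get(key(s), []):
            yield (r, s)

# Example: join two relations on their first field.
R = [(1, "a"), (2, "b")]
S = [(2, "x"), (3, "y"), (2, "z")]
print(list(classic_hash_join(R, S, key=lambda t: t[0])))
# -> [((2, 'b'), (2, 'x')), ((2, 'b'), (2, 'z'))]
```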
5
Hybrid Hash Join
  • One table is hashed both to disk and memory
    (partitions)
  • G. Graefe. Query Evaluation Techniques for Large
    Databases. ACM Computing Surveys, 1993.

6
Symmetric Hash Join (Pipelined)
  • Both tables are hashed (both kept in main memory
    only)
  • A. Wilschut, P. M.G. Apers, Dataflow Query
    Execution in a Parallel Main-Memory Environment,
    DPD 1991.

(Figure: tuples from Source R and Source S are inserted into two
in-memory hash tables and probed against each other, producing
output incrementally)
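The symmetric (pipelined) variant can be sketched as follows; the `(source, tuple)` arrival encoding is an illustrative assumption, not from the paper:

```python
from collections import defaultdict

def symmetric_hash_join(arrivals, key):
    """Pipelined symmetric hash join sketch: both inputs are kept
    in in-memory hash tables. Every arriving tuple is inserted
    into its own table and immediately probes the other table, so
    matches are emitted as soon as both partners have arrived.

    `arrivals` is a sequence of (source, tuple) pairs in arrival
    order, with source "A" or "B"."""
    tables = {"A": defaultdict(list), "B": defaultdict(list)}
    other = {"A": "B", "B": "A"}
    for src, t in arrivals:
        k = key(t)
        tables[src][k].append(t)          # insert into own table
        for m in tables[other[src]][k]:   # probe the other table
            yield (t, m) if src == "A" else (m, t)

# Tuples from the two sources arrive interleaved:
arrivals = [("A", (1, "a")), ("B", (1, "x")),
            ("B", (2, "y")), ("A", (2, "b"))]
print(list(symmetric_hash_join(arrivals, key=lambda t: t[0])))
# -> [((1, 'a'), (1, 'x')), ((2, 'b'), (2, 'y'))]
```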
7
Problem of SHJ
  • Memory intensive
  • Won't work for large input streams
  • Won't allow many joins to be processed in a
    pipeline (or even in parallel)

8
New Problem in Online Query Processing over
Distributed Data Sources
  • Unpredictable data access due to link congestion,
    load imbalances, etc.
  • Three classes of delays:
  • Initial delay: the first tuple arrives from the
    remote source more slowly than usual
  • Slow delivery: data arrives at a constant but
    slower-than-expected rate
  • Bursty arrival: data arrives in a fluctuating
    manner

9
Question
  • Why are delays undesirable?
  • They prolong the time to first output
  • They slow processing if we wait for data to
    arrive before acting
  • If data arrives too fast, we want to avoid losing
    any of it
  • Time is wasted sitting idle while no data is
    coming
  • Delays are unpredictable, so one single strategy
    won't work

10
Motivation of XJoin
  • Produce results incrementally as data becomes
    available
  • Tuples are returned as soon as they are produced
  • Allow progress to be made when one or more
    sources experience delays:
  • Background processing is performed on previously
    received tuples, so results are produced even
    when both inputs are stalled

11
XJoin Design
  • Tuples are stored in partitions (Hash Join)
  • A memory-resident (m-r) portion
  • A disk-resident (d-r) portion

12
(Diagram: XJoin partitions, each with a memory-resident and a
disk-resident portion)
13
Challenges in Developing XJoin
  • Manage flow of tuples between memory and
    secondary storage (when and how to do it)
  • Control background processing when inputs are
    delayed (reactive scheduling idea)
  • Ensure the full answer is produced
  • Ensure duplicate tuples are not produced
  • Provide both quick initial result as well as good
    overall throughput

14
XJoin Stages
  • XJoin proceeds in 3 stages (separate threads)

  • Stage 1: memory-to-memory (M-M)
  • Stage 2: memory-to-disk (M-D)
  • Stage 3: disk-to-disk (D-D)
15

1st Stage Memory-to-Memory Join
(Figure: tuples arriving from SOURCE-A and SOURCE-B are hashed
into memory-resident partitions and probed against each other)
16
1st Stage Memory-to-Memory Join
  • Join processing continues as long as
  • Memory permits, and
  • One of the inputs is producing tuples
  • If memory is full, one partition is picked to be
    flushed to disk and appended to the end of its
    disk-resident portion
  • If no new input arrives, stage 1 is blocked and
    stage 2 starts
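A minimal sketch of this stage-1 step for one arriving tuple. The class and function names, the tuple-count memory budget, and the largest-partition victim choice are illustrative assumptions, not the paper's exact policy:

```python
class Partition:
    """One hash partition: a memory-resident (m-r) portion and a
    disk-resident (d-r) portion (modelled here as a plain list)."""
    def __init__(self):
        self.memory = []
        self.disk = []

def stage1_insert(tup, key, parts_own, parts_other, max_mem):
    """Hash the arriving tuple into its partition, probe the
    matching m-r partition of the *other* source, and, if the
    memory budget (max_mem tuples overall) is exceeded, flush the
    largest m-r portion to the end of its d-r portion."""
    i = hash(key(tup)) % len(parts_own)
    parts_own[i].memory.append(tup)
    matches = [(tup, m) for m in parts_other[i].memory
               if key(m) == key(tup)]
    if sum(len(p.memory) for p in parts_own + parts_other) > max_mem:
        victim = max(parts_own + parts_other,
                     key=lambda p: len(p.memory))
        victim.disk.extend(victim.memory)  # append to d-r portion
        victim.memory.clear()
    return matches

key = lambda t: t[0]
A = [Partition(), Partition()]
B = [Partition(), Partition()]
print(stage1_insert((1, "a"), key, A, B, max_mem=3))  # -> []
print(stage1_insert((1, "x"), key, B, A, max_mem=3))
# -> [((1, 'x'), (1, 'a'))]
```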

17
Why Stage 1?
  • In-memory operations are much faster and cheaper
    than on-disk operations, so results are produced
    as soon as possible.

18
Question
  • What does the 2nd stage do?
  • When does the 2nd stage start?
  • Hint:
  • What happens when the data input (tuples) is too
    large for memory?
  • Answer:
  • The 2nd stage joins memory-to-disk
  • It starts when both inputs are blocked

19
Stage 2
20
2nd Stage Memory-to-Disk Join
  • Activated when 1st Stage is blocked
  • Performs 3 steps
  • Choose a partition from one source according to
    its throughput and size
  • Use tuples from its d-r portion to probe the m-r
    portion of the other source, outputting matches,
    until the d-r portion is completely processed
  • Check whether either input has resumed producing
    tuples. If yes, resume the 1st stage; if no,
    choose another d-r portion and continue the 2nd
    stage.

21
Controlling 2nd Stage
  • The cost of the 2nd stage is hidden when both
    inputs experience delays
  • Tradeoff?
  • What are the benefits of using the second stage?
  • Produces results when input sources are stalled
  • Tolerates varying input rates
  • What is the disadvantage?
  • The second stage must finish a d-r portion
    before checking for new input (overhead)
  • To address the tradeoff, use an activation
    threshold
  • Pick a partition likely to produce many tuples
    right now

22
3rd Stage Disk-to-Disk Join
  • Clean-up stage
  • Assumes all data for both inputs has arrived
  • Assumes the 1st and 2nd stages have completed
  • Why is this stage necessary?
  • Completeness of the answer: make sure that all
    result tuples are produced.
  • Reason: some tuples in the disk-resident portions
    may not have had a chance to join with each other.

23
Preventing Duplicates
  • When could duplicates be produced?
  • In both the 2nd and 3rd stages, which may perform
    overlapping work.
  • How is this addressed?
  • XJoin prevents duplicates with timestamps.
  • When is this addressed?
  • During processing, when trying to join two tuples.

24
Time Stamping Part 1
  • 2 fields are added to each tuple
  • Arrival TimeStamp (ATS): the time when the tuple
    first arrived in memory
  • Departure TimeStamp (DTS): the time when the
    tuple was flushed to disk
  • [ATS, DTS] indicates when the tuple was in memory
  • When did two tuples get joined in the 1st stage?
  • If Tuple A's DTS is within Tuple B's [ATS, DTS]
    window (or vice versa)
  • Tuples that meet this overlap condition are not
    considered for joining in the 2nd or 3rd stage
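The overlap condition reduces to a one-line predicate; the field names and the example timestamps below are illustrative:

```python
from collections import namedtuple

# [ATS, DTS] is a tuple's memory-residence window.
TS = namedtuple("TS", ["ats", "dts"])

def joined_in_stage1(a, b):
    """Two tuples were joined by the 1st stage iff they were in
    memory at the same time, i.e. one tuple's DTS falls inside
    the other's [ATS, DTS] window."""
    return b.ats <= a.dts <= b.dts or a.ats <= b.dts <= a.dts

a  = TS(ats=10, dts=50)
b1 = TS(ats=30, dts=80)   # arrived before A was flushed -> joined
b2 = TS(ats=60, dts=90)   # arrived after A was flushed  -> not joined
print(joined_in_stage1(a, b1), joined_in_stage1(a, b2))
# -> True False
```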

25
Detecting Tuples Joined in 1st Stage
  • Tuples joined in the first stage:
  • B1 arrived after A and before A was flushed to
    disk
  • Tuples not joined in the first stage:
  • B2 arrived after A was flushed to disk

26
Time Stamping Part 2
  • For each partition, keep track of:
  • ProbeTS: the time when a 2nd-stage probe was done
  • DTSlast: the DTS of the last tuple of the
    disk-resident portion
  • Several such probes may occur
  • Keep an ordered history of such probe descriptors
  • Usage:
  • All d-r tuples with DTS up to and including
    DTSlast were joined in stage 2 with all tuples in
    main memory at time ProbeTS
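The probe-descriptor check can be sketched as below, assuming (as an illustrative simplification) that a tuple's memory residence is represented by its own [ATS, DTS] window; the example numbers follow the next slide's figure:

```python
from collections import namedtuple

# One probe descriptor per 2nd-stage probe of a partition.
ProbeDesc = namedtuple("ProbeDesc", ["dts_last", "probe_ts"])
Window = namedtuple("Window", ["ats", "dts"])

def joined_in_stage2(dr_win, mr_win, history):
    """A d-r tuple and an m-r tuple were already joined by some
    2nd-stage probe iff the d-r tuple was on disk by that probe's
    DTSlast and the m-r tuple was in memory at its ProbeTS."""
    return any(dr_win.dts <= p.dts_last and
               mr_win.ats <= p.probe_ts <= mr_win.dts
               for p in history)

history = [ProbeDesc(dts_last=350, probe_ts=550)]
a = Window(ats=100, dts=200)   # on disk by 350 -> covered by probe
b = Window(ats=500, dts=600)   # in memory at ProbeTS = 550
print(joined_in_stage2(a, b, history))  # -> True
```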

27
Detecting Tuples Joined in 2nd Stage
(Figure: the history list of Partition 2 records a probe
descriptor with DTSlast = 350 and ProbeTS = 550. Tuple A, with
[ATS, DTS] = [100, 200], was on disk by DTSlast = 350; Tuple B,
with [ATS, DTS] = [500, 600], was in memory at ProbeTS = 550, so
their windows overlap the probe. All A tuples in Partition 2 up
to DTSlast = 350 were joined with m-r tuples that arrived before
Partition 2's ProbeTS.)
28
Experiments
  • HHJ (Hybrid Hash Join)
  • XJoin (with 2nd stage and with caching)
  • XJoin (without 2nd stage)
  • XJoin (with aggressive usage of 2nd stage)

29
Case 1: Slow Network, Both Sources Are Slow
30
Case 1: Slow Network, Both Sources Are Slow (Bursty)
  • XJoin improves the delivery time of initial
    answers -> interactive performance
  • Reactive background processing is an effective
    way to exploit intermittent delays and keep up
    continued output rates
  • Shows that the 2nd stage is very useful if there
    is time for it

31
Case 2: Fast Network, Both Sources Are Fast
32
Case 2: Fast Network, Both Sources Are Fast
  • All XJoin variants deliver initial results
    earlier.
  • XJoin delivers the overall result in about the
    same time as HHJ.
  • HHJ delivers the 2nd half of the result faster
    than XJoin.
  • The 2nd stage cannot be used too aggressively if
    new data keeps arriving.

33
Conclusion
  • Can be conservative on space (small footprint)
  • Can produce initial result as early as possible
  • Can hide intermittent data delays
  • Can be used in conjunction with online query
    processing to manage data streams (limited)

34
How to Further Optimize XJoin?
  • Resuming Stage 1 as soon as data arrives
  • Removing no-longer-joining tuples in a timely
    manner
  • More

35
References
  • Urhan, Tolga and Franklin, Michael J. XJoin:
    Getting Fast Answers from Slow and Bursty
    Networks.
  • Urhan, Tolga and Franklin, Michael J. XJoin: A
    Reactively-Scheduled Pipelined Join Operator.
  • Hellerstein, Franklin, Chandrasekaran, Deshpande,
    Hildrum, Madden, Raman, and Shah. Adaptive Query
    Processing: Technology in Evolution. IEEE Data
    Engineering Bulletin, 2000.
  • Avnur, Ron and Hellerstein, Joseph. Eddies:
    Continuously Adaptive Query Processing.
  • Babu, Shivnath and Widom, Jennifer. Continuous
    Queries over Data Streams.

36
Stream New Query Context
  • Challenges faced by XJoin:
  • Potentially unbounded growth of join state
  • Indefinite delay of some join results
  • Solutions:
  • Exploit semantic constraints to remove
    no-longer-joining data in a timely manner
  • Constraints: sliding windows, punctuations

37
Punctuation
  • A punctuation is a predicate on stream elements
    that evaluates to false for every element
    following the punctuation.

ID       Name    Age
9961234  Edward  17
9961235  Justin  19
9961238  Janet   18
(0, 18]               <- punctuation: no more tuples for students
                         whose age is less than or equal to 18!
9961256  Anna    20
...
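In code, the punctuation is just a predicate; the function name and the tuple layout `(id, name, age)` are illustrative assumptions:

```python
def punct_age_le_18(tup):
    """The punctuation "(0, 18]" as a predicate on (id, name, age)
    tuples: it matches students aged 18 or less, and promises that
    every element arriving after it makes the predicate false."""
    _id, _name, age = tup
    return 0 < age <= 18

# Elements arriving after the punctuation must all fail it:
after = [(9961256, "Anna", 20)]
print(all(not punct_age_le_18(t) for t in after))  # -> True
```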
38
An Example

Open Stream:
item_id  seller_id  open_price  timestamp
1080     jsmith     130.00      Nov-10-03 9:03:00
<1080, *, *, *>
1082     melissa    20.00       Nov-10-03 9:10:00
<1082, *, *, *>

Bid Stream:
item_id  bidder_id  bid_price  timestamp
1080     pclover    175.00     Nov-14-03 8:27:00
1082     smartguy   30.00      Nov-14-03 8:30:00
1080     richman    177.00     Nov-14-03 8:52:00
<1080, *, *, *>       <- No more bids for item 1080!

Query: for each item that has at least one bid, return its
bid-increase value.

Select   O.item_id, Sum(B.bid_price - O.open_price)
From     Open O, Bid B
Where    O.item_id = B.item_id
Group by O.item_id

Query plan: the Open and Bid streams feed a Join on item_id,
followed by a Group-by on item_id (sum); outputs are
Out1 (item_id) and Out2 (item_id, sum).
39
PJoin Execution Logic
(Figure: PJoin's join state has a memory-resident portion and a
disk-resident portion. For each stream, the state (Sa for stream
A, Sb for stream B) contains a hash table and a purge-candidate
pool, plus a punctuation set (PSa, PSb). An arriving tuple ta is
hashed into its stream's memory-resident hash table, probes the
opposite stream's state, and hash buckets are flushed to the
disk-resident hash tables when memory fills.)
40
PJoin Execution Logic
(Figure: when a punctuation pa arrives on stream A, it is hashed
and added to the punctuation set PSa; it identifies
no-longer-joining tuples in the opposite stream's state, which
are moved to the purge-candidate pool for removal.)
41
PJoin vs. XJoin Memory Overhead
Tuple inter-arrival: 2 milliseconds; punctuation
inter-arrival: 40 tuples per punctuation
42
PJoin vs. XJoin Tuple Output Rate
Tuple inter-arrival: 2 milliseconds; punctuation
inter-arrival: 30 tuples per punctuation
43
Conclusion
  • The memory requirement for PJoin's state is
    almost insignificant compared to XJoin's.
  • XJoin's growing join state leads to increasing
    probe cost, which reduces the tuple output rate.
  • Eager purge is the best strategy for minimizing
    join state.
  • Lazy purge with an appropriate purge threshold
    provides a significant advantage in tuple output
    rate.