Title: XJoin: A Reactively-Scheduled Pipelined Join Operator
1. XJoin: A Reactively-Scheduled Pipelined Join Operator
- IEEE Data Engineering Bulletin, 2000
- by Tolga Urhan and Michael J. Franklin
- Based on a talk prepared by Asima Silva and Leena Razzaq
2. Goal of XJoin
- Efficiently evaluate equi-joins in online query processing over distributed data sources
- Optimization objectives:
  - Small memory footprint
  - Fast initial result delivery
  - Hiding intermittent delays in data arrival
3. Outline
- Hash Join History
- Motivation of XJoin
- Challenges in Developing XJoin
- Three Stages of XJoin
- Preventing Duplicates
- Experimental Results
- Conclusion
4. Classic Hash Join
- Two phases: build, then probe
- Only one table is hashed in memory
[Figure: 1. Build, 2. Probe]
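The two-phase build/probe scheme can be sketched in a few lines of Python (a simplified illustration; function and key names are hypothetical):

```python
def classic_hash_join(build_rows, probe_rows, build_key, probe_key):
    """Two-phase hash join: build an in-memory hash table on one input,
    then probe it with each tuple of the other input."""
    # Phase 1: build -- hash every tuple of the build input on its key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Phase 2: probe -- look up each probe tuple; emit matching pairs.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            out.append((match, row))
    return out
```

Note that no output can be produced until the build phase has consumed its entire input, which is exactly what the pipelined variants below avoid.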
5. Hybrid Hash Join
- One table is hashed both to memory and to disk (in partitions)
- G. Graefe, Query Evaluation Techniques for Large Databases. ACM Computing Surveys, 1993.
6. Symmetric Hash Join (Pipelined)
- Both tables are hashed (both kept in main memory only)
- A. Wilschut, P. M. G. Apers, Dataflow Query Execution in a Parallel Main-Memory Environment, DPD 1991.
[Figure: sources R and S each feed an in-memory hash table; each arriving tuple probes the other table and matches go to OUTPUT]
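The pipelined behavior can be sketched as follows (a minimal illustration, not the paper's implementation; the interleaving of the two inputs simulates tuples arriving from either source):

```python
from itertools import zip_longest

def symmetric_hash_join(stream_r, stream_s, key):
    """Pipelined symmetric hash join sketch: both inputs are hashed in
    memory; each arriving tuple first probes the other side's table,
    then is inserted into its own, so matches stream out immediately."""
    tables = {"R": {}, "S": {}}
    other = {"R": "S", "S": "R"}
    results = []
    # Interleave the two inputs to mimic tuples arriving from either side.
    arrivals = []
    for r, s in zip_longest(stream_r, stream_s):
        if r is not None:
            arrivals.append(("R", r))
        if s is not None:
            arrivals.append(("S", s))
    for side, row in arrivals:
        k = row[key]
        for match in tables[other[side]].get(k, []):
            # Emit pairs ordered as (R-tuple, S-tuple).
            results.append((row, match) if side == "R" else (match, row))
        tables[side].setdefault(k, []).append(row)
    return results
```

Because every tuple of both inputs is retained in memory, state grows without bound, which is the problem the next slide raises.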
7. Problems of SHJ
- Memory intensive:
  - Won't work for large input streams
  - Won't allow many joins to be processed in a pipeline (or even in parallel)
8. New Problem in Online Query Processing over Distributed Data Sources
- Unpredictable data access due to link congestion, load imbalance, etc.
- Three classes of delays:
  - Initial delay: the first tuple arrives from a remote source more slowly than usual
  - Slow delivery: data arrives at a constant, but slower than expected, rate
  - Bursty arrival: data arrives in a fluctuating manner
9. Question
- Why are delays undesirable?
  - They prolong the time to first output
  - They slow processing if we wait for data to arrive before acting
  - If data arrives too fast, we want to avoid losing any of it
  - We waste time sitting idle while no data is coming
  - Delays are unpredictable, so one single strategy won't work
10. Motivation of XJoin
- Produce results incrementally when available
  - Tuples are returned as soon as they are produced
- Allow progress to be made when one or more sources experience delays
  - Background processing is performed on previously received tuples, so results are produced even when both inputs are stalled
11. XJoin Design
- Tuples are stored in partitions (as in hash join), each with:
  - A memory-resident (m-r) portion
  - A disk-resident (d-r) portion
13. Challenges in Developing XJoin
- Manage the flow of tuples between memory and secondary storage (when and how to do it)
- Control background processing when inputs are delayed (the reactive scheduling idea)
- Ensure the full answer is produced
- Ensure duplicate tuples are not produced
- Provide both a quick initial result and good overall throughput
14. XJoin Stages
- XJoin proceeds in 3 stages (separate threads):
  - Stage 1: memory-to-memory (M-M)
  - Stage 2: memory-to-disk (M-D)
  - Stage 3: disk-to-disk (D-D)
15. 1st Stage: Memory-to-Memory Join
[Figure: tuples arriving from SOURCE-A and SOURCE-B are joined through their memory-resident partitions]
16. 1st Stage: Memory-to-Memory Join
- Join processing continues as long as:
  - Memory permits, and
  - One of the inputs is producing tuples
- If memory is full, one partition is picked to be flushed to disk and appended to the end of its disk-resident portion
- If there is no new input, stage 1 blocks and stage 2 starts
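The per-tuple work of the first stage can be sketched as below. This is a simplified model, not the paper's code: the partition count, the tuple-count memory budget, and the "flush the fullest partition" victim choice are all stand-in assumptions.

```python
NUM_PARTITIONS = 4  # hypothetical; the real operator tunes this

class Partition:
    """One XJoin partition: a memory-resident and a disk-resident portion."""
    def __init__(self):
        self.memory = []  # m-r portion
        self.disk = []    # d-r portion (disk simulated as a list)

def stage1_insert(tup, key, own, other, results, budget):
    """Stage-1 step for one arriving tuple: probe the other source's
    m-r partition, insert into our own, and flush the fullest partition
    to 'disk' when the memory budget is exceeded."""
    p = hash(tup[key]) % NUM_PARTITIONS
    for m in other[p].memory:
        if m[key] == tup[key]:
            results.append((tup, m))
    own[p].memory.append(tup)
    if sum(len(q.memory) for q in own + other) > budget:
        victim = max(own + other, key=lambda q: len(q.memory))
        victim.disk.extend(victim.memory)  # append to end of d-r portion
        victim.memory.clear()
```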
17. Why Stage 1?
- In-memory operations are much faster and cheaper than on-disk operations, guaranteeing that results are produced as soon as possible.
18. Question
- What does the 2nd stage do? When does it start?
- Hint:
  - What happens when the data input (tuples) is too large for memory?
- Answer:
  - The 2nd stage joins memory to disk
  - It starts when both inputs are blocked
19. Stage 2
20. 2nd Stage: Memory-to-Disk Join
- Activated when the 1st stage is blocked
- Performs 3 steps:
  1. Choose a partition from one source, according to its throughput and size
  2. Use tuples from its d-r portion to probe the m-r portion of the other source, outputting matches, until the d-r portion is completely processed
  3. Check whether either input has resumed producing tuples. If yes, resume the 1st stage; if no, choose another d-r portion and continue the 2nd stage.
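Steps 1 and 2 can be sketched as follows. This is a sketch under simplifying assumptions: d-r tuple count stands in for the paper's throughput/size criterion, and the timestamp-based duplicate checks are omitted.

```python
def choose_partition(partitions):
    """Pick the d-r portion expected to be most productive right now;
    tuple count stands in for XJoin's throughput/size criterion."""
    return max(partitions, key=lambda p: len(p["disk"]))

def stage2_probe(partition, other_memory, key):
    """Probe the other source's m-r portion with every tuple of the
    chosen d-r portion, emitting matches (duplicate checks omitted)."""
    return [(d, m) for d in partition["disk"] for m in other_memory
            if d[key] == m[key]]
```

Only after `stage2_probe` has exhausted the chosen d-r portion does the operator check whether either input has resumed, which is the overhead discussed on the next slide.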
21. Controlling the 2nd Stage
- The cost of the 2nd stage is hidden when both inputs experience delays
- Tradeoff?
  - Benefits of using the second stage:
    - Produces results while the input sources are stalled
    - Tolerates varying input rates
  - Disadvantage:
    - The second stage must complete a d-r portion before checking for new input (overhead)
- To address the tradeoff, use an activation threshold:
  - Pick a partition likely to produce many tuples right now
22. 3rd Stage: Disk-to-Disk Join
- The clean-up stage
- Assumes that all data for both inputs has arrived
- Assumes that the 1st and 2nd stages have completed
- Why is this stage necessary?
  - Completeness of the answer: make sure all result tuples are produced
  - Reason: some tuples in the disk-resident portions may never have had a chance to join each other
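The clean-up pass amounts to joining the two full disk-resident portions while skipping pairs the earlier stages already emitted. In this sketch a set of tuple-id pairs stands in for XJoin's timestamp-based duplicate test (described on the following slides); the `id` field is a hypothetical tuple identifier.

```python
def stage3_cleanup(disk_a, disk_b, key, already_joined):
    """Clean-up join over both full disk-resident portions, skipping
    pairs the earlier stages produced; a set of (id, id) pairs stands in
    for XJoin's timestamp-based duplicate test."""
    return [(a, b) for a in disk_a for b in disk_b
            if a[key] == b[key] and (a["id"], b["id"]) not in already_joined]
```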
23. Preventing Duplicates
- When could duplicates be produced?
  - In both the 2nd and 3rd stages, which may perform overlapping work
- How does XJoin address this?
  - It prevents duplicates with timestamps
- When is this checked?
  - During processing, when trying to join two tuples
24. Time Stamping, Part 1
- 2 fields are added to each tuple:
  - Arrival TimeStamp (ATS): the time when the tuple first arrived in memory
  - Departure TimeStamp (DTS): the time when the tuple was flushed to disk
  - [ATS, DTS] indicates when the tuple was in memory
- When were two tuples joined in the 1st stage?
  - If Tuple A's DTS is within Tuple B's [ATS, DTS]
  - Tuples that meet this overlap condition are not considered for joining in the 2nd or 3rd stage
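The overlap condition translates directly into code (tuples modeled as bare (ATS, DTS) pairs for illustration):

```python
def joined_in_stage1(a, b):
    """XJoin's overlap test: two tuples met in memory during stage 1
    iff one tuple's DTS falls inside the other's [ATS, DTS] window.
    Each tuple is represented here as an (ATS, DTS) pair."""
    (a_ats, a_dts), (b_ats, b_dts) = a, b
    return b_ats <= a_dts <= b_dts or a_ats <= b_dts <= a_dts
```

For example, a tuple A resident in memory during [100, 200] did meet a tuple B that arrived at 150 and was flushed at 300, but not one that only arrived at 250.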
25. Detecting Tuples Joined in the 1st Stage
- Tuples joined in the first stage:
  - B1 arrived after A and before A was flushed to disk
- Tuples not joined in the first stage:
  - B2 arrived after A and after A was flushed to disk
26. Time Stamping, Part 2
- For each partition, keep track of:
  - ProbeTS: the time when a 2nd-stage probe was done
  - DTSlast: the DTS of the last tuple of the disk-resident portion at that probe
- Several such probes may occur
  - Keep an ordered history of such probe descriptors
- Usage:
  - All tuples flushed at or before DTSlast were joined in stage 2 with all tuples in main memory at time ProbeTS
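The usage rule can be sketched as a membership test over the probe history (a simplified model; inclusive bounds are an assumption here):

```python
def joined_in_stage2(a_dts, b_ats, b_dts, history):
    """Stage-2 duplicate test: `history` is the ordered list of
    (DTSlast, ProbeTS) probe descriptors for the d-r partition holding
    tuple A.  A already met B in some stage-2 probe iff A was on disk
    by that probe's DTSlast and B was memory-resident at its ProbeTS."""
    return any(a_dts <= dts_last and b_ats <= probe_ts <= b_dts
               for dts_last, probe_ts in history)
```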
27. Detecting Tuples Joined in the 2nd Stage
[Figure: Tuple A (ATS 100, DTS 200) belongs to Partition 2, whose history list holds (DTSlast, ProbeTS) probe descriptors (20, 340), (350, 550), (700, 900); Tuple B (ATS 500, DTS 600) overlaps the probe with ProbeTS 550.]
- History list for the corresponding partition: all A tuples in Partition 2 up to DTSlast = 350 were joined with the m-r tuples that arrived before Partition 2's ProbeTS.
28. Experiments
- Compared strategies:
  - HHJ (Hybrid Hash Join)
  - XJoin (with 2nd stage and with caching)
  - XJoin (without 2nd stage)
  - XJoin (with aggressive use of the 2nd stage)
29. Case 1: Slow Network, Both Sources Are Slow
30. Case 1: Slow Network, Both Sources Are Slow (Bursty)
- XJoin improves delivery time of initial answers → interactive performance
- Reactive background processing is an effective way to exploit intermittent delays and keep up the output rate
- Shows that the 2nd stage is very useful if there is time for it
31. Case 2: Fast Network, Both Sources Are Fast
32. Case 2: Fast Network, Both Sources Are Fast
- All XJoin variants deliver initial results earlier
- XJoin can also deliver the overall result in time equal to HHJ
- HHJ delivers the 2nd half of the result faster than XJoin
- The 2nd stage cannot be used too aggressively if new data is arriving continuously
33. Conclusion
- Conservative on space (small memory footprint)
- Produces initial results as early as possible
- Hides intermittent data delays
- Can be used in conjunction with online query processing to manage data streams (to a limited extent)
34. How to Further Optimize XJoin?
- Resume stage 1 as soon as data arrives
- Remove no-longer-joining tuples in a timely manner
- More...
35. References
- Urhan, Tolga and Franklin, Michael J. XJoin: Getting Fast Answers from Slow and Bursty Networks.
- Urhan, Tolga and Franklin, Michael J. XJoin: A Reactively-Scheduled Pipelined Join Operator.
- Hellerstein, Franklin, Chandrasekaran, Deshpande, Hildrum, Madden, Raman, and Shah. Adaptive Query Processing: Technology in Evolution. IEEE Data Engineering Bulletin, 2000.
- Avnur, Ron and Hellerstein, Joseph M. Eddies: Continuously Adaptive Query Processing.
- Babu, Shivnath and Widom, Jennifer. Continuous Queries over Data Streams.
36. Streams: A New Query Context
- Challenges faced by XJoin:
  - Potentially unbounded, growing join state
  - Indefinite delay of some join results
- Solutions:
  - Exploit semantic constraints to remove no-longer-joining data in a timely manner
  - Constraints: sliding windows, punctuations
37. Punctuation
- A punctuation is a predicate on stream elements that evaluates to false for every element following the punctuation.
- Example: the punctuation (0, 18] announces "no more tuples for students whose age is less than or equal to 18!"

  ID       Name    Age
  9961234  Edward  17
  9961235  Justin  19
  9961238  Janet   18
  9961256  Anna    20
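A punctuation can be modeled as a predicate, and its main use is purging join state. This is a simplified sketch of the students example; the assumption that every tuple covered by the punctuation can be purged holds here because the punctuation is on the same attribute the state is kept for.

```python
# "No more tuples for students whose age is <= 18": the stream promises
# this predicate is false for every element arriving after it.
punct = lambda student: student["age"] <= 18

def purge(state, punct):
    """Drop state tuples covered by the punctuation: in this simplified
    sketch they can never contribute to another result, so keeping them
    only grows the join state."""
    return [t for t in state if not punct(t)]
```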
38. An Example
Open stream (item_id, seller_id, open_price, timestamp):
  1080  jsmith   130.00  Nov-10-03 9:03:00   followed by punctuation <1080, , , >
  1082  melissa   20.00  Nov-10-03 9:10:00   followed by punctuation <1082, , , >

Bid stream (item_id, bidder_id, bid_price, timestamp):
  1080  pclover   175.00  Nov-14-03 8:27:00
  1082  smartguy   30.00  Nov-14-03 8:30:00
  1080  richman   177.00  Nov-14-03 8:52:00   followed by punctuation <1080, , , >

Query: For each item that has at least one bid, return its bid-increase value.

  Select   O.item_id, Sum(B.bid_price - O.open_price)
  From     Open O, Bid B
  Where    O.item_id = B.item_id
  Group by O.item_id

Plan: the Open and Bid streams feed Join on item_id (Out1: item_id; Out2: item_id, sum), followed by Group-by item_id (sum()). The Bid-stream punctuation <1080, , , > announces: no more bids for item 1080!
39. PJoin Execution Logic
[Figure: an arriving tuple ta from Stream A is hashed (Hash(ta) = 1) into the memory-resident join state, which keeps for each stream its state (Sa, Sb) with a hash table, a purge candidate pool, and a punctuation set (PSa, PSb); a disk-resident portion holds flushed hash-table partitions.]
40. PJoin Execution Logic
[Figure: an arriving punctuation pa from Stream A is hashed (Hash(pa) = 1), added to punctuation set PSa, and used to identify purge candidates in the join state.]
41. PJoin vs. XJoin: Memory Overhead
- Tuple inter-arrival: 2 milliseconds; punctuation inter-arrival: 40 tuples/punctuation
42. PJoin vs. XJoin: Tuple Output Rate
- Tuple inter-arrival: 2 milliseconds; punctuation inter-arrival: 30 tuples/punctuation
43. Conclusion
- The memory requirement for PJoin's state is almost insignificant compared to XJoin's.
- The growing join state of XJoin leads to increasing probe cost, which hurts the tuple output rate.
- Eager purge is the best strategy for minimizing join state.
- Lazy purge with an appropriate purge threshold provides a significant advantage in increasing tuple output rate.