Title: On Random Sampling over Joins
1 On Random Sampling over Joins
- Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
- Microsoft Research, Stanford University, Microsoft Research
- Compiled by Arjun Dasgupta
2 CONTENTS
- The difficulty of join sampling
- Semantics and algorithms of sampling
- Two previous sampling strategies
- New strategies for join sampling
- Experimental results
3
- SAMPLE(R1 ⋈ R2, f) ≠ SAMPLE(R1, f) ⋈ SAMPLE(R2, f) in general
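A small worked example may help show why joining two samples is not the same as sampling the join. The relations below are hypothetical (they are not the ones from the original slides), and the "sample" is simplified to one tuple drawn from each relation, so this is only a sketch of the effect.

```python
from itertools import product

def join(r1, r2):
    """Equi-join on the first field of each tuple (the join attribute)."""
    return [(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2]

# Hypothetical skewed relations: each join value is rare on one side
# and frequent on the other.
R1 = [("a1", "b0")] + [("a2", f"b{i}") for i in range(1, 5)]
R2 = [("a2", "c0")] + [("a1", f"c{i}") for i in range(1, 5)]

J = join(R1, R2)  # the full join

# Enumerate every way of drawing one tuple from R1 and one from R2,
# and count how often the two drawn tuples fail to join.
pairs = list(product(R1, R2))
empty = sum(1 for t1, t2 in pairs if not join([t1], [t2]))

print(len(J), len(pairs), empty)  # 8 25 17
```

Here 17 of the 25 equally likely draws produce an empty result, while a one-tuple sample taken directly from the join always returns a join tuple.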
4 STRATEGY USED
- Obtain SAMPLE(R1 ⋈ R2, f) from non-uniform samples of R1 and R2
5 The Difficulty of Join Sampling - Example
- Suppose that we have the relations
6 TECHNIQUES FOR SAMPLING
- Black-Box U1 (unweighted)
- Black-Box U2 (unweighted)
- Black-Box WR1 (weighted)
- Black-Box WR2 (weighted)
7 Black-Box U2: Given relation R with n tuples, generate an unweighted WR (with-replacement) sample of size r.
- 1. N ← 0.
- 2. Initialize reservoir array A[1..r] with r dummy values.
- 3. While tuples are streaming by do begin
(a) get next tuple t
(b) N ← N + 1
(c) for j = 1 to r, set A[j] to t with probability 1/N
end
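The reservoir steps above can be sketched in Python as follows. The function name `black_box_u2` and the `rng` parameter are illustrative choices, not names from the original slides.

```python
import random

def black_box_u2(stream, r, rng=random):
    """Unweighted with-replacement sample of size r from a stream of
    unknown length, following the reservoir steps of Black-Box U2."""
    n_seen = 0                  # N in the pseudocode
    reservoir = [None] * r      # r dummy values
    for t in stream:
        n_seen += 1
        for j in range(r):
            # Each slot is overwritten independently with probability 1/N.
            # For the first tuple this probability is 1, so every slot
            # is filled as soon as the stream is non-empty.
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = t
    return reservoir
```

Because every slot is updated independently, the same tuple can occupy several slots, which is what makes the result a with-replacement sample.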
8 Black-Box WR2: Given relation R with n tuples, generate a weighted WR (with-replacement) sample of size r.
- 1. W ← 0.
- 2. Initialize reservoir array A[1..r] with r dummy values.
- 3. While tuples are streaming by do begin
(a) get next tuple t with weight w(t)
(b) W ← W + w(t)
(c) for j = 1 to r, set A[j] to t with probability w(t)/W
end
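A minimal Python sketch of the weighted reservoir above follows. The name `black_box_wr2` and the `weight` callback are assumptions made for illustration. Skipping zero-weight tuples is also an added assumption, made so that the running total W is never zero when it is used as a divisor.

```python
import random

def black_box_wr2(stream, weight, r, rng=random):
    """Weighted with-replacement sample of size r from a stream.
    A tuple t ends up in a given slot with probability proportional
    to weight(t)."""
    total = 0.0                 # W in the pseudocode
    reservoir = [None] * r      # r dummy values
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue            # assumption: zero-weight tuples are never chosen
        total += w
        for j in range(r):
            # Overwrite slot j with probability w(t)/W.
            if rng.random() < w / total:
                reservoir[j] = t
    return reservoir
```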
9 The Classification of the Problem
- Case A: No information is available for either R1 or R2.
- Case B: No information is available for R1, but indexes and/or statistics are available for R2.
- Case C: Indexes/statistics are available for both R1 and R2.
10 Previous Sampling Strategies
- Strategy Naive-Sample
- 1. Compute the join J = R1 ⋈ R2.
- 2. As the tuples of J stream by, use Black-Box U1 or U2 to produce a WR sample of J of size r.
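As a rough illustration of Naive-Sample, the sketch below streams a nested-loop join and feeds each join tuple to a U2-style reservoir. The nested-loop join, the function name `naive_sample`, and the convention that the join attribute is the first field of each tuple are all assumptions made for brevity.

```python
import random

def naive_sample(r1, r2, r, rng=random):
    """Compute the full join and reservoir-sample r join tuples from it."""
    n_seen = 0
    reservoir = [None] * r
    for t1 in r1:                      # step 1: stream the join J
        for t2 in r2:
            if t1[0] != t2[0]:
                continue
            n_seen += 1                # step 2: Black-Box U2 on join tuples
            for j in range(r):
                if rng.random() < 1.0 / n_seen:
                    reservoir[j] = (t1, t2)
    return reservoir                   # stays all None if the join is empty
```

The sketch makes the cost visible: every join tuple is produced even though only r of them are kept.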
11 Previous Sampling Strategies
- Strategy Olken-Sample
- 1. Let M be an upper bound on m2(v) for all values v of the join attribute A, where m2(v) is the number of tuples of R2 with A = v.
- 2. Repeat
- (a) Sample a tuple t1 from R1 uniformly at random.
- (b) Sample a random tuple t2 from among all tuples t in R2 that have t.A = t1.A.
- (c) Output t1 ⋈ t2 with probability m2(t2.A)/M, and with the remaining probability reject the sample.
- Until r tuples have been produced.
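The accept/reject loop above can be sketched as follows. An in-memory dictionary stands in for the index on R2, the join attribute is assumed to be the first field of each tuple, and the early return for an empty join is an added safeguard. None of these details come from the original slides.

```python
import random
from collections import defaultdict

def olken_sample(r1, r2, r, rng=random):
    """Rejection sampling over the join: r1 is a list, r2 is indexed."""
    index = defaultdict(list)              # stand-in for an index on R2
    for t in r2:
        index[t[0]].append(t)
    if not any(t[0] in index for t in r1):
        return []                          # empty join: the loop would never end
    m_bound = max(len(g) for g in index.values())   # M
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)                # (a) uniform tuple from R1
        matches = index.get(t1[0])
        if not matches:
            continue                       # no partner in R2: reject
        t2 = rng.choice(matches)           # (b) uniform matching tuple
        if rng.random() < len(matches) / m_bound:   # (c) accept w.p. m2/M
            out.append((t1, t2))
    return out
```

When the frequencies in R2 are skewed, most iterations are rejected, which is the inefficiency the newer strategies aim to avoid.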
12 New Strategies for Join Sampling
- Strategy Stream-Sample
- 1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A), the number of tuples of R2 that join with t.
- 2. While tuples of S1 are streaming by do begin
- (a) get next tuple t1 and let v = t1.A
- (b) sample a random tuple t2 from among all tuples t in R2 that have t.A = v
- (c) output t1 ⋈ t2
end.
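A possible Python rendering of Stream-Sample is shown below. It uses a WR2-style weighted reservoir for step 1 and an in-memory dictionary as a stand-in for the index on R2. The function name and the assumption that the join attribute is the first field of each tuple are illustrative only.

```python
import random
from collections import defaultdict

def stream_sample(r1_stream, r2, r, rng=random):
    """Weighted sample of R1 followed by one random partner from R2."""
    index = defaultdict(list)              # stand-in for an index on R2
    for t in r2:
        index[t[0]].append(t)

    # Step 1: WR2-style reservoir over R1, with weight equal to the
    # number of R2 tuples that join with t.
    total = 0
    s1 = [None] * r
    for t in r1_stream:
        w = len(index.get(t[0], ()))
        if w == 0:
            continue                       # t joins with nothing
        total += w
        for j in range(r):
            if rng.random() < w / total:
                s1[j] = t
    if total == 0:
        return []                          # empty join

    # Step 2: for each sampled t1, pick one matching R2 tuple uniformly.
    return [(t1, rng.choice(index[t1[0]])) for t1 in s1]
```

R1 is read once as a stream and no candidate is ever rejected, in contrast to the repeat loop of Olken-Sample.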
13 New Strategies for Join Sampling
- Strategy Stream-Sample is more efficient than Olken-Sample:
1. No information is required for R1 (Case B).
2. No tuple is rejected after the join is computed.
3. Only one iteration is needed for each output tuple.
14 New Strategies for Join Sampling
- Strategy Group-Sample
- 1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A), the number of tuples of R2 that join with t.
- 2. Let S1 consist of the tuples s1, ..., sr. Produce S2 = S1 ⋈ R2, whose tuples are grouped by the tuples of S1 that generated them.
- 3. Use r invocations of Black-Box U1 or U2 to sample r tuples, one from each group.
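The three steps above might be sketched as follows. Here `m2` is a hypothetical frequency table for R2 (for example a `collections.Counter` over the join attribute), R2 is scanned once without an index, and each sampled R1 tuple keeps a size-1 reservoir over its group. The names and the tuple layout (join attribute first) are assumptions.

```python
import random
from collections import Counter

def group_sample(r1_stream, r2_stream, m2, r, rng=random):
    """Weighted sample of R1, then one uniform partner per sampled tuple."""
    # Step 1: WR2-style reservoir over R1 with weights taken from m2.
    total = 0
    s1 = [None] * r
    for t in r1_stream:
        w = m2.get(t[0], 0)
        if w == 0:
            continue
        total += w
        for j in range(r):
            if rng.random() < w / total:
                s1[j] = t
    if total == 0:
        return []                          # empty join

    # Steps 2 and 3: scan R2 once. Slot i sees its own group of matching
    # tuples and keeps one of them uniformly (U2 with reservoir size 1).
    seen = [0] * r
    picked = [None] * r
    for t2 in r2_stream:
        for i, t1 in enumerate(s1):
            if t1[0] == t2[0]:
                seen[i] += 1
                if rng.random() < 1.0 / seen[i]:
                    picked[i] = t2
    return list(zip(s1, picked))
```

A typical call would build the statistics first, for example `m2 = Counter(t[0] for t in R2)`, and then pass both relations as streams.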
15 New Strategy for Join Sampling
- Strategy Frequency-Partition-Sample
16 Experimental Results
19 Summary
- The difficulty of join sampling: example.
- The classification of the problem: 3 cases.
- Previous strategies: Naive-Sample, Olken-Sample.
- New strategies: Stream-Sample, Group-Sample, Frequency-Partition-Sample.
- Conclusion: The new strategies are better than the earlier techniques.