Title: On Random Sampling over Joins
1 On Random Sampling over Joins
- Surajit Chaudhuri (Microsoft Research), Rajeev Motwani (Stanford University), Vivek Narasayya (Microsoft Research)
2 Subtitles
- The difficulty of join sampling - example
- Semantics and algorithms of sampling
- Two previous sampling strategies
- New strategies for join sampling
- Experimental results
3 The Difficulty of Join Sampling - Example
- Suppose that we have the relations
4 Black-Box U2: Given relation R with n tuples, generate an unweighted with-replacement (WR) sample of size r.
- 1. N ← 0.
- 2. Initialize reservoir array A[1..r] with r dummy values.
- 3. While tuples are streaming by do begin
    (a) get next tuple t;
    (b) N ← N + 1;
    (c) for j = 1 to r, set A[j] to t with probability 1/N
  end.
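The reservoir steps of Black-Box U2 can be sketched in Python as follows. This is a minimal illustration, not the presenters' code: the function name `black_box_u2` and the `rng` parameter are assumptions introduced here, and the counter `n_seen` plays the role of N.

```python
import random

def black_box_u2(stream, r, rng=random):
    """Unweighted with-replacement sample of size r from a stream
    whose length is not known in advance (hypothetical helper)."""
    n_seen = 0                    # N: number of tuples seen so far
    reservoir = [None] * r        # A[1..r], filled with dummy values
    for t in stream:
        n_seen += 1
        for j in range(r):
            # Each slot independently adopts t with probability 1/N.
            # For the first tuple this probability is 1, so every
            # dummy value is overwritten.
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = t
    return reservoir
```

Because the r slots are updated independently, the same tuple may occupy several slots, which is what makes the result a with-replacement sample.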
5 Black-Box WR2: Given relation R with n tuples, generate a weighted with-replacement (WR) sample of size r.
- 1. W ← 0.
- 2. Initialize reservoir array A[1..r] with r dummy values.
- 3. While tuples are streaming by do begin
    (a) get next tuple t with weight w(t);
    (b) W ← W + w(t);
    (c) for j = 1 to r, set A[j] to t with probability w(t)/W
  end.
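The weighted variant differs from the unweighted reservoir only in that a running weight total W replaces the tuple count. A minimal Python sketch, assuming the stream yields `(tuple, weight)` pairs; the name `black_box_wr2` and the choice to skip tuples of non-positive weight are assumptions made for this illustration.

```python
import random

def black_box_wr2(weighted_stream, r, rng=random):
    """Weighted with-replacement sample of size r from a stream of
    (tuple, weight) pairs (hypothetical helper)."""
    total = 0.0                   # W: sum of the weights seen so far
    reservoir = [None] * r        # A[1..r], filled with dummy values
    for t, w in weighted_stream:
        if w <= 0:
            continue              # assumption: such tuples are never sampled
        total += w
        for j in range(r):
            # Each slot independently adopts t with probability w(t)/W.
            if rng.random() < w / total:
                reservoir[j] = t
    return reservoir
```

With all weights equal to 1 this reduces to the unweighted reservoir, since w(t)/W is then 1/N.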
6 The Classification of the Problem
- Case A: No information is available for either R1 or R2.
- Case B: No information is available for R1, but indexes and/or statistics are available for R2.
- Case C: Indexes/statistics are available for both R1 and R2.
7 Previous Sampling Strategies
- Strategy Naive-Sample
- 1. Compute the join J = R1 ⋈ R2.
- 2. As the tuples of J stream by, use Black-Box U1 or U2 to produce a WR sample of size r from J.
8 Previous Sampling Strategies
- Strategy Olken-Sample
- 1. Let M be an upper bound on m2(v) for all v, where m2(v) is the number of tuples of R2 whose join attribute A has value v.
- 2. repeat
    (a) Sample a tuple t1 from R1 uniformly at random.
    (b) Sample a random tuple t2 from among all tuples t in R2 that have t.A = t1.A.
    (c) Output t1 ⋈ t2 with probability m2(t1.A)/M, and with the remaining probability reject the sample.
  until r tuples have been produced.
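The accept/reject loop of Olken-Sample can be sketched in Python as below. This is an illustration under stated assumptions rather than the original implementation: both relations are in-memory lists, an in-memory dictionary stands in for the index on the second relation, the largest bucket size is used as the bound M, and the names `olken_sample`, `key1` and `key2` are hypothetical.

```python
import random
from collections import defaultdict

def olken_sample(r1, r2, key1, key2, r, rng=random):
    """Rejection-based WR sample of size r from the join of r1 and r2."""
    index = defaultdict(list)          # stand-in for an index on r2's join attribute
    for t2 in r2:
        index[key2(t2)].append(t2)
    if not any(key1(t1) in index for t1 in r1):
        raise ValueError("the join is empty, so no sample can be produced")
    m_max = max(len(bucket) for bucket in index.values())   # M
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)                    # (a) uniform tuple of r1
        matches = index.get(key1(t1), [])
        if not matches:
            continue                           # no partner in r2: reject
        t2 = rng.choice(matches)               # (b) uniform matching tuple of r2
        if rng.random() < len(matches) / m_max:
            out.append((t1, t2))               # (c) accept with probability m2/M
    return out
```

The rejection step is what corrects for skew: a tuple of the first relation with few partners is accepted less often, so every join tuple ends up equally likely. The cost is that many iterations may be wasted when M is much larger than the typical frequency.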
9 New Strategies for Join Sampling
- Strategy Stream-Sample is more efficient than Olken-Sample:
    1. No information is required for R1 (Case B).
    2. No tuple is rejected once its join has been computed.
    3. Only one iteration is needed for each output tuple.
10 New Strategies for Join Sampling
- Strategy Stream-Sample
- 1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A), the number of tuples of R2 that match t on the join attribute A.
- 2. While tuples of S1 are streaming by do begin
    (a) get next tuple t1 and let v = t1.A;
    (b) sample a random tuple t2 from among all tuples t in R2 that have t.A = v;
    (c) output t1 ⋈ t2
  end.
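The two phases of Stream-Sample can be sketched in Python as follows. The sketch assumes that the second relation fits in memory and that a dictionary built from it can stand in for both the index and the frequency statistics; the names `stream_sample`, `key1` and `key2` are hypothetical.

```python
import random
from collections import defaultdict

def stream_sample(r1_stream, r2, key1, key2, r, rng=random):
    """WR sample of size r from the join, streaming over r1 once."""
    index = defaultdict(list)          # stand-in for index + statistics on r2
    for t2 in r2:
        index[key2(t2)].append(t2)

    # Step 1: weighted WR reservoir over r1; the weight of a tuple is
    # the number of r2 tuples that share its join value.
    total = 0
    s1 = [None] * r
    for t1 in r1_stream:
        w = len(index.get(key1(t1), ()))
        if w == 0:
            continue                   # tuple joins with nothing
        total += w
        for j in range(r):
            if rng.random() < w / total:
                s1[j] = t1
    if total == 0:
        return []                      # empty join

    # Step 2: pair each sampled r1 tuple with one uniformly chosen match.
    return [(t1, rng.choice(index[key1(t1)])) for t1 in s1]
```

Unlike the rejection-based approach, every tuple kept by the weighted reservoir yields exactly one output tuple, and the first relation is only read as a stream.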
11 New Strategies for Join Sampling
- Strategy Group-Sample
- 1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A), the number of tuples of R2 that match t on the join attribute A.
- 2. Let S1 consist of the tuples s1, ..., sr. Produce S2 = S1 ⋈ R2, whose tuples are grouped by the tuples of S1 that generated them.
- 3. Use r invocations of Black-Box U1 or U2 to sample r tuples, one from each group.
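Group-Sample can be sketched in Python as below. The sketch assumes that frequency statistics for the second relation are supplied as a dictionary `m2` from join value to count, and that both relations are read as streams; the grouping and the r size-1 reservoirs are fused into a single pass over the second relation. The names `group_sample`, `key1`, `key2` and `m2` are hypothetical.

```python
import random
from collections import defaultdict

def group_sample(r1_stream, r2_stream, key1, key2, m2, r, rng=random):
    """WR sample of size r from the join, using only statistics on r2."""
    # Step 1: weighted WR reservoir over r1 with weight m2[join value].
    total = 0
    s1 = [None] * r
    for t1 in r1_stream:
        w = m2.get(key1(t1), 0)
        if w == 0:
            continue
        total += w
        for j in range(r):
            if rng.random() < w / total:
                s1[j] = t1
    if total == 0:
        return []                      # empty join

    # Step 2: group the join of s1 with r2 by the s1 slot that generated
    # each tuple. Slots sharing a join value are looked up together.
    slots = defaultdict(list)
    for i, t1 in enumerate(s1):
        slots[key1(t1)].append(i)

    # Step 3: one unweighted reservoir of size 1 per group.
    seen = [0] * r
    picked = [None] * r
    for t2 in r2_stream:
        for i in slots.get(key2(t2), ()):
            seen[i] += 1
            if rng.random() < 1.0 / seen[i]:
                picked[i] = t2
    return list(zip(s1, picked))
```

The design point is that no index on the second relation is needed: one scan suffices, at the price of reading the whole relation even when r is small.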
12 New Strategy for Join Sampling
- Strategy Frequency-Partition-Sample
13-15 Experimental Results
16 Summary
- The difficulty of join sampling - example.
- The classification of the problem - 3 cases.
- Previous strategies: Naive-Sample, Olken-Sample.
- New strategies: Stream-Sample, Group-Sample, Frequency-Partition-Sample.
- Conclusion: The new strategies are better than the earlier techniques.