Full Disjunctions: Polynomial-Delay Iterators in Action

University of Toronto, Canada. Benny Kimelfeld, Hebrew University, Israel. Yehoshua Sagiv ...

1
Full Disjunctions: Polynomial-Delay Iterators in Action
VLDB 2006, Seoul, Korea
2
Computing Full Disjunctions
  • The full disjunction is a relational operator
    that maximally combines data from several
    relations
  • It extends the natural join by allowing
    incompleteness
  • It extends the binary outerjoin to many relations
  • This paper presents algorithms and optimizations
    for computing full disjunctions
  • Theoretically, full disjunctions are more
    tractable than previously known
  • Practically, a significant improvement over the
    state of the art: an iterator-like evaluation

3
Contents
  • Full Disjunctions
  • Complexity
  • Contributions
  • Algorithms
  • Algorithm NLOJ for Tree-Structured Schemes
  • Algorithm PDelayFD for General Schemes
  • Algorithm BiComNLOJ - Main Algorithm
  • Experimental Results
  • Conclusion

5
The Natural Join Operator
[Tables: Climates, Accommodations, Sites]
[Result table: Climates ⋈ Accommodations ⋈ Sites]
6
The Natural Join Misses Information
[Tables: Climates, Accommodations, Sites]
Bahamas is not in Sites, so the natural join
misses it
[Result table: Climates ⋈ Accommodations ⋈ Sites]
7
The Natural Join Misses Information
[Tables: Climates, Accommodations]
Bahamas is not in Sites, so the natural join
misses it
Mount Logan is not in a city, hence it is missed
[Result table: Climates ⋈ Accommodations ⋈ Sites]
8
The Natural Join Misses Information
A looser notion of join is needed: one that
enables joining tuples from some of the tables
[Tables: Climates, Accommodations]
Bahamas is not in Sites, so the natural join
misses it
Mount Logan is not in a city, hence it is missed
[Result table: Climates ⋈ Accommodations ⋈ Sites]
9
The Natural Join Operator
A tuple of the join corresponds to a set of
tuples from the source relations. This set is
  • Join consistent
  • Connected: no Cartesian product
  • Complete: one tuple from each relation
[Tables: Climates, Accommodations, Sites]
[Result table: Climates ⋈ Accommodations ⋈ Sites]
10
Join-Consistent Sets of Tuples
A set T of tuples is join-consistent if every two
tuples of T are join-consistent
Two tuples t1 and t2 are join-consistent if for
every common attribute A
1. t1[A] and t2[A] are non-null
2. t1[A] = t2[A]
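The definition above can be sketched in Python. This is only an illustration: tuples are modelled as dicts from attribute names to values, None stands for null, and the sample tuples and attribute names are hypothetical, loosely modelled on the talk's travel example.

```python
from itertools import combinations

def join_consistent(t1, t2):
    """Two tuples agree, with non-null values, on every common attribute."""
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t2[a] is not None and t1[a] == t2[a]
               for a in common)

def is_join_consistent_set(tuples):
    """A set of tuples is join-consistent if every two of its tuples are."""
    return all(join_consistent(a, b) for a, b in combinations(tuples, 2))

# Hypothetical sample tuples (not the tables from the slides).
climate = {"Country": "Canada", "Climate": "diverse"}
hotel = {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza"}
site = {"Country": "Canada", "City": None, "Site": "Mount Logan"}

print(join_consistent(climate, hotel))                 # True
print(join_consistent(hotel, site))                    # False: City is null in site
print(is_join_consistent_set([climate, hotel, site]))  # False
```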
11
Connected Sets of Tuples
A set of tuples is connected if its join graph is
connected
The join graph of a set T of tuples
  • The nodes are the tuples of T
  • An edge between every two tuples with a common
    attribute
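A minimal sketch of the connectivity test, assuming tuples are modelled as Python dicts from attribute names to values. The function names are illustrative and do not come from the talk.

```python
def shares_attribute(t1, t2):
    """Edge of the join graph: the two tuples have a common attribute."""
    return bool(t1.keys() & t2.keys())

def is_connected(tuples):
    """True if the join graph of the given list of tuples is connected."""
    if not tuples:
        return True
    seen, stack = {0}, [0]          # depth-first search over tuple positions
    while stack:
        i = stack.pop()
        for j in range(len(tuples)):
            if j not in seen and shares_attribute(tuples[i], tuples[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)

a = {"A": 1, "B": 2}
b = {"B": 2, "C": 3}
c = {"D": 4}
print(is_connected([a, b]))     # True: a and b share B
print(is_connected([a, b, c]))  # False: c shares no attribute with a or b
```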

12
Natural Join (w/o Cartesian Product)
Each tuple of the result corresponds to a set T
of tuples from the source relations
13
Full Disjunction (Galindo-Legaria 1994)
Each tuple of the result corresponds to a set T
of tuples from the source relations
1. T is join consistent
14
An Example of a Full Disjunction
[Figure: the relations Climates, Accommodations, and Sites form the input R; the result is FD(R)]
20
Padding Joined Tuple Sets with Nulls
21
The Outerjoin Operator
The outerjoin of two relations R1 and R2
22
Example of an Outerjoin
[Tables: Climates, Accommodations]
23
Combining Relations using Outerjoins
  • The outerjoin operator is not associative
  • For more than two relations, the result depends on
    the order in which the outerjoin is applied
  • In general, outerjoins cannot maximally combine
    relations (no matter what order is used)
Outerjoin is not suitable for combining more than
two relations!
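The order dependence can be seen on a tiny example. The sketch below assumes a natural full outerjoin in which nulls never match. Relations are lists of dicts, and the three one-tuple relations are hypothetical, not taken from the slides.

```python
def outerjoin(r1, r2):
    """Natural full outerjoin of two non-empty relations (lists of dicts)."""
    s1, s2 = set(r1[0]), set(r2[0])
    common, schema = s1 & s2, s1 | s2

    def matches(t1, t2):
        # Nulls never match: both values must be non-null and equal.
        return all(t1[a] is not None and t1[a] == t2[a] for a in common)

    def pad(t):
        # Missing attributes become None (null).
        return {a: t.get(a) for a in schema}

    out, used = [], set()
    for t1 in r1:
        hit = False
        for j, t2 in enumerate(r2):
            if matches(t1, t2):
                out.append(pad({**t1, **t2}))
                used.add(j)
                hit = True
        if not hit:
            out.append(pad(t1))          # dangling tuple of r1
    out.extend(pad(t2) for j, t2 in enumerate(r2) if j not in used)
    return out

def as_set(rel):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(t.items())) for t in rel}

R1 = [{"A": 1, "B": 2}]
R2 = [{"B": 2, "C": 3}]
R3 = [{"A": 1, "C": 4}]
left_first = outerjoin(outerjoin(R1, R2), R3)    # (R1 ⟗ R2) ⟗ R3
right_first = outerjoin(R1, outerjoin(R2, R3))   # R1 ⟗ (R2 ⟗ R3)
print(as_set(left_first) == as_set(right_first))  # False
```

Joining R1 with R2 first produces the combined tuple (1, 2, 3). Joining R2 with R3 first pads both tuples with nulls, and R1 can then no longer join either of them.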
25
Efficiency of Evaluation
The full-disjunction operator (as well as other
operators like the Cartesian product or the
natural join) can generate an exponential (in
the input size) number of tuples
Polynomial running time in the input size alone is
therefore not a suitable yardstick
The usual notion: polynomial time in the
combined size of the input and the output
26
History of Algorithms for Full Disjunctions
This paper: linear dependence on F
Notation: n = number of relations, N = number of
tuples in the DB, F = number of tuples in the FD
F is typically very large; it can be exponential in
the size of the database
27
Polynomial Delay
One way to obtain an evaluation with a running
time linear in the output is to devise an
algorithm that acts as an iterator with an
efficient next() operator, that is, an
enumeration algorithm that runs with
polynomial delay
An enumeration algorithm runs with polynomial
delay if the time between every two successive
answers is polynomial in the size of the input
28
Other Benefits of Polynomial Delay
  • Incremental evaluation
  • First tuples are generated quickly
  • Full disjunctions are large, yet the user need
    not wait for the whole result to be generated
  • Suitable for Web applications, where users expect
    to get the first few pages quickly
  • In addition, the user can decide anytime that
    enough information has been shown
  • Enable parallel query processing
  • While one processor generates the FD tuples,
    other processors apply further processing

30
Main Contributions
Substantial improvement over the state of the art,
proved theoretically and experimentally
1. First algorithm for computing full
   disjunctions with polynomial delay
2. First algorithm for computing full
   disjunctions in time linear in the output
3. A general optimization technique for computing
   full disjunctions: division into biconnected
   components
32
Our Algorithms
  • Algorithm NLOJ: tree schemes
  • Algorithm PDelayFD: general schemes
  • Division into biconnected components: optimization
  • Algorithm BiComNLOJ: main algorithm, general schemes
34
Tree Schemes
Schemes whose scheme graph has no cycles
In the scheme graph, the relation schemes are the
nodes and there is an edge between every two
schemes with one or more common attributes
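A small sketch of building the scheme graph and testing for a tree scheme. It assumes that a tree scheme must also be connected, which the slide does not state. The relation and attribute names are hypothetical.

```python
from itertools import combinations

def scheme_graph(schemes):
    """schemes: dict mapping a relation name to its set of attributes."""
    graph = {name: set() for name in schemes}
    for r, s in combinations(schemes, 2):
        if schemes[r] & schemes[s]:      # one or more common attributes
            graph[r].add(s)
            graph[s].add(r)
    return graph

def is_tree_scheme(schemes):
    """True if the scheme graph is connected and has no cycles."""
    graph = scheme_graph(schemes)
    if not graph:
        return True
    edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for v in graph[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    # A connected graph is acyclic exactly when it has |V| - 1 edges.
    return len(seen) == len(graph) and edges == len(graph) - 1

chain = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"C", "D"}}
triangle = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"A", "C"}}
print(is_tree_scheme(chain))     # True
print(is_tree_scheme(triangle))  # False: R1, R2, R3 form a cycle
```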
35
Left-Deep Sequence of Outerjoins
R: a set of relations with a tree scheme
R1, ..., Rn: a connected-prefix order of R
Proposition
FD(R) = (((R1 ⟗ R2) ⟗ R3) ...) ⟗ Rn
1. Compute a connected-prefix order of R
2. Apply outerjoins in a left-deep order
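Step 1 can be sketched as a graph traversal. The sketch assumes, as the name suggests, that in a connected-prefix order every relation after the first shares an attribute with some earlier relation, so that every prefix is connected in the scheme graph. The graph below is hypothetical.

```python
def connected_prefix_order(graph, start):
    """Order the relations of a connected scheme graph by depth-first search.

    graph: dict mapping a relation name to the set of its neighbours.
    """
    order, seen, stack = [], {start}, [start]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in sorted(graph[u]):
            if v not in seen:
                seen.add(v)       # v is adjacent to u, which is already output
                stack.append(v)
    return order

def has_connected_prefixes(graph, order):
    """Check that each relation is adjacent to one that precedes it."""
    return all(any(prev in graph[order[i]] for prev in order[:i])
               for i in range(1, len(order)))

graph = {"R1": {"R2", "R3"}, "R2": {"R1", "R4"}, "R3": {"R1"}, "R4": {"R2"}}
order = connected_prefix_order(graph, "R1")
print(order)                                 # ['R1', 'R3', 'R2', 'R4']
print(has_connected_prefixes(graph, order))  # True
```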
36
Connected-Prefix Order of Relations
[Figure: a scheme graph over the relations R1, ..., R7 and a connected-prefix order of these relations]
37
Achieving Polynomial Delay
1. Compute a connected-prefix order of R
2. Apply outerjoins in a left-deep order
Problem: exponential delay
Solution: use iterators
38
Iterators
To obtain polynomial delay, we use iterators
  • Operate on top of an enumeration algorithm
  • Implement next() by controlling the execution

[Figure: an iterator placed on top of the algorithm, exposing next()]
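In Python, a generator gives this behaviour almost for free: the enumeration algorithm is suspended after each answer and resumed by the next call. The sketch below is a generic illustration, not the talk's implementation. The names enumerate_pairs and AnswerIterator are hypothetical.

```python
def enumerate_pairs(left, right, trace):
    """A toy enumeration algorithm; `trace` records the work done so far."""
    for x in left:
        for y in right:
            trace.append((x, y))   # this answer has now been computed
            yield (x, y)           # execution pauses here until resumed

class AnswerIterator:
    """Iterator on top of an enumeration algorithm written as a generator."""

    def __init__(self, algorithm, *args):
        self._run = algorithm(*args)   # nothing is executed yet

    def next(self):
        """Resume the algorithm until its next answer; None when exhausted."""
        return next(self._run, None)

trace = []
it = AnswerIterator(enumerate_pairs, [1, 2], ["a", "b"], trace)
print(it.next())    # (1, 'a')
print(len(trace))   # 1: only the first answer has been computed
```

The caller can stop after any answer, and the remaining answers are never computed.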
39
Using Iterators for Outerjoins

40
Outerjoins are not Always Applicable
It is not always possible to formulate a full
disjunction as a left-deep sequence of outerjoins

Rajaraman and Ullman (PODS 96): some full
disjunctions cannot be formulated as expressions
of outerjoins (i.e., with arbitrary placement of
parentheses)
42
About the Algorithm
  • Unlike NLOJ, the next algorithm, PDelayFD, is
    applicable to all schemes (and not just trees)
  • Algorithm PDelayFD has a polynomial delay, but
    the delay is larger than that of NLOJ
  • Nevertheless, PDelayFD by itself is a significant
    improvement over the state of the art

43
Shifting a Maximal JCC Tuple Set T
t-shifting T (JCC: join consistent and connected)
1. Add t to T
2. Extract the max. JCC subset containing t
3. Extend it to a maximal JCC set: the t-shift of T
44
Algorithm PDelayFD
PDelayFD(R) computes FD(R) with polynomial delay
1. Generate a max. JCC set T0
2. Insert T0 into Q
Repeat until Q is empty
1. Move some T from Q to C
2. Print the join of T, padded with nulls (output)
3. Insert into Q a t-shift of T for all
   tuples t in the database
   (validate that the t-shift is not already in Q or C)
46
NLOJ vs. PDelayFD
NLOJ, compared with PDelayFD
  • Shorter delays
  • Less space
  • Simpler to implement

Our approach: divide and conquer
47
Biconnected Components
[Figure: a scheme graph over the relations R1, ..., R8]
Biconnected component
A maximal subset B of relations, s.t. the scheme
graph has two (or more) disjoint paths between
every two relations of B
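Biconnected components of a scheme graph can be found with the standard depth-first-search algorithm that keeps a stack of edges. The sketch below uses the usual graph-theoretic convention, under which a single edge also forms a component. That is slightly looser than the wording above, and the example graph is hypothetical.

```python
def biconnected_components(graph):
    """graph: dict mapping a node to the set of its neighbours (undirected).

    Returns a list of node sets, one per biconnected component.
    """
    index, low = {}, {}
    edge_stack, components = [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for v in graph[u]:
            if v not in index:                 # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:         # u separates v's subtree
                    comp = set()
                    while True:
                        edge = edge_stack.pop()
                        comp.update(edge)
                        if edge == (u, v):
                            break
                    components.append(comp)
            elif v != parent and index[v] < index[u]:   # back edge
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for u in graph:
        if u not in index:
            dfs(u, None)
    return components

# Hypothetical scheme graph: a triangle R1-R2-R3 with R4 attached to R3.
graph = {"R1": {"R2", "R3"}, "R2": {"R1", "R3"},
         "R3": {"R1", "R2", "R4"}, "R4": {"R3"}}
print(biconnected_components(graph))
```

For this graph the components are {R1, R2, R3} and {R3, R4}. The order in which they are reported may vary. The recursion is fine for small scheme graphs, but a very large graph would need an iterative version.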
48
Left-Deep Sequence of Outerjoins
R: a set of relations
Theorem
There exists an (efficiently computable) order
B1, ..., Bk of the biconnected components of R,
s.t. FD(R) = ((FD(B1) ⟗ FD(B2)) ⟗ ...) ⟗ FD(Bk)
Optimized Algorithm
1. Compute the biconnected components of R
2. Compute the full disjunction of each component
3. Apply outerjoins in a suitable order
49
BiComNLOJ: a Naïve Attempt
1. Divide R into biconnected components
   B1, ..., Bk in a suitable order
2. Compute FD(B1), ..., FD(Bk) using PDelayFD
3. Using NLOJ, compute
   ((FD(B1) ⟗ FD(B2)) ⟗ ...) ⟗ FD(Bk)
Problem: each FD(Bi) can be exponential in the input,
hence non-polynomial delay!
Solution: see the following slides on retaining polynomial delay
50
Retaining Polynomial Delay: 1st Problem
For simplification, assume only two components
[Figure: the relations R1, ..., R8 divided into two components, B1 and B2]
  • After generating a tuple t of FD(B1), we need to
    generate all tuples of FD(B2) that can join t
  • Non-polynomial delay if all of FD(B2) is computed
    for finding these tuples!
  • Solution: PDelayFD can be modified so that it
    generates only those tuples of FD(B2) that can join t

Details in the proceedings
51
Retaining Polynomial Delay: 2nd Problem
For simplification, assume only two components
[Figure: the relations R1, ..., R8 divided into two components, B1 and B2]
  • The last step is to generate all tuples of FD(B2)
    that cannot be joined with tuples of FD(B1)
  • However, this task is by itself NP-hard!
  • Solution: when generating all tuples of FD(B2)
    that can be joined with some tuple of FD(B1), we
    collect enough information for generating the
    remaining tuples of FD(B2)

Details in the proceedings
53
Experimental Setting
Algorithms: PDelayFD, BiComNLOJ (main),
IncrementalFD (CS05, state of the art)
PostgreSQL (open source)
HW: Pentium 4, 1.6 GHz, 512 MB RAM
  • Synthetic data (randomly generated)
  • Fixed schemes

54
State-of-Art vs. Main Algorithm
[Chart: average delay (msec) vs. number of tuples in each relation, for IncrementalFD (state of the art, CS05) and BiComNLOJ (our main algorithm)]
BiComNLOJ is a substantial improvement over the
state of the art
55
Division into Biconnected Components
[Chart: average delay (msec) vs. number of tuples in each relation, for PDelayFD (no division into biconnected components) and BiComNLOJ (our main algorithm)]
Division reduces delays (the amount depends on the
scheme)
56
Behavior of Delay
Measure the delay before each generated tuple
[Chart: delay (msec) vs. tuple number, for IncrementalFD (state of the art, CS05) and BiComNLOJ (our main algorithm)]
While IncrementalFD slows down, the delay of
BiComNLOJ remains almost constant
58
Summary
Full Disjunction: an associative extension of
the outerjoin operator to an arbitrary number of
relations
3 algorithms for computing FD
  • NLOJ (Nested-Loop Outerjoin): tree-structured schemes
  • PDelayFD (Polynomial-Delay Full Disjunction): general
    schemes
  • BiComNLOJ: combines the first two and deploys division
    into biconnected components; general schemes
59
Contributions
  • Substantial improvement of evaluation time over
    the state of the art
  • Proved theoretically and experimentally
  • Full disjunctions can be computed with polynomial
    delay and in time linear in the output size
  • Optimization techniques for computing FDs
  • Implementation within PostgreSQL (ongoing)
  • Incorporating our algorithms into an SQL
    optimizer
  • E.g., some operators can be pushed through the FD
  • Not discussed here, appears in the proceedings

60
Thank you.
Questions?