Title: Computing Full Disjunctions


1
Computing Full Disjunctions
  • Yaron Kanza
  • Yehoshua Sagiv
  • The Selim and Rachel Benin School of Engineering
    and Computer Science
  • The Hebrew University of Jerusalem

2
Overview of the Talk
  • OR-semantics and weak semantics for querying
    incomplete data
  • Complexity of query evaluation
  • Full disjunctions as a special case of weak
    semantics
  • Generalizing full disjunctions: the join
    constraints are not restricted to equality
    constraints
  • Lower bounds for some related problems

3
Querying Incomplete Data Requires a Special
Semantics
  • Usually, answers to a query are complete
    assignments of database objects (or values) to
    the query variables
  • Consequently, partial information is lost
  • For example, dangling tuples are lost when
    joining several relations
  • The purpose of outerjoins and full disjunctions
    is to solve this problem, i.e., answers could be
    partial assignments (to some of the variables)

4
Querying Incomplete Semistructured Data
  • In semistructured data, incompleteness of data is
    prevalent
  • OR-semantics and weak semantics were introduced
    so that queries over semistructured data would
    return maximal answers rather than complete
    answers (Kanza, Nutt and Sagiv, 1999)

5
In the Semistructured Data Model
  • Both data and queries are labeled rooted directed
    graphs
  • Query nodes are variables
  • Database nodes are objects
  • Matchings are assignments of database objects to
    query variables, such that
  • The database root is assigned to the query root,
    and
  • Labels are preserved

6
[Figure: A Semistructured Database About Movies, with root object 1 and edges labeled movie, actor, title (Zelig, Antz), year (1983, 1998), language (English), name (Woody Allen), date of birth (1/12/1935), acted in and director]
7
A Query
[Figure: a query graph with variables v1, v2, v3, w1, w2, w3 and w4, and edges labeled actor, movie, name, title, date of birth, language, director and acted in]
Under complete semantics, the query returns
actor-movie pairs, such that the actor played in
the movie and was also the director of the movie
8
[Figure: the movie database and the query]
A complete matching of the query variables to
database objects
9
Constraints on Complete Matchings
  • The root constraint is satisfied if the query
    root is mapped to the database root
  • A query edge is an edge constraint
  • A query edge with a label l is satisfied if it is
    mapped to a database edge with the same label l

10
[Figure: the movie database without node 6]
Suppose that Node 6 is missing
11
[Figure: the database without node 6, and the query]
An incomplete matching
This matching is maximal
12
The Reachability Constraint on Partial Matchings
  • A query node v that is mapped to a database
    object o satisfies the reachability constraint if
    there is a path from the query root to v, such
    that all edge constraints along this path are
    satisfied

13
Weak Satisfaction of Edge Constraints
  • An edge constraint is weakly satisfied if it is
    either
  • Satisfied (as defined earlier), or
  • One (or more) of its nodes is mapped to a null
    value

14
Weak Matchings
  • A partial matching is a weak matching if
  • The root constraint is satisfied
  • The reachability constraint is satisfied by every
    query node that is mapped to a database node
  • Every edge constraint is weakly satisfied
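The three conditions of a weak matching can be sketched the same way. In this hypothetical toy instance the database has an acted in edge but no director edge, None stands for a null value, and reachability is computed as a fixpoint over satisfied edge constraints starting at the query root. The encoding and the data are assumptions made for illustration.

```python
# Hypothetical toy instance: the actor acted in the movie, but there is no director edge.
DB_ROOT = 1
DB_EDGES = {(1, "movie", 2), (1, "actor", 4), (4, "acted in", 2)}

QUERY_ROOT = "r"
QUERY_EDGES = [
    ("r", "movie", "m"),
    ("r", "actor", "a"),
    ("a", "acted in", "m"),
    ("a", "director", "m"),
]


def satisfied(matching, edge):
    """Both endpoints are non-null and the image is a database edge with the same label."""
    u, label, v = edge
    return (matching[u], label, matching[v]) in DB_EDGES  # DB_EDGES holds no None values


def weakly_satisfied(matching, edge):
    """Satisfied, or at least one endpoint is mapped to null (None)."""
    u, _, v = edge
    return matching[u] is None or matching[v] is None or satisfied(matching, edge)


def reachable(matching):
    """Variables reachable from the query root along satisfied edge constraints."""
    reach, changed = {QUERY_ROOT}, True
    while changed:
        changed = False
        for edge in QUERY_EDGES:
            u, _, v = edge
            if u in reach and v not in reach and satisfied(matching, edge):
                reach.add(v)
                changed = True
    return reach


def is_weak_matching(matching):
    if matching[QUERY_ROOT] != DB_ROOT:  # root constraint
        return False
    reach = reachable(matching)
    if any(obj is not None and var not in reach for var, obj in matching.items()):
        return False  # reachability constraint violated
    return all(weakly_satisfied(matching, e) for e in QUERY_EDGES)


print(is_weak_matching({"r": 1, "m": 2, "a": 4}))     # False: director edge violated
print(is_weak_matching({"r": 1, "m": 2, "a": None}))  # True
print(is_weak_matching({"r": 1, "m": None, "a": 4}))  # True
```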

15
[Figure: the database without node 6, and the query]
A weak matching
16
[Figure: the movie database without node 6 and without the director edge]
A Movie Database
Consider the case where the director edge is
missing
17
[Figure: the database without the director edge, and the query]
An incomplete matching that is not a weak
matching
18
OR Matchings
  • A partial matching is an OR matching if
  • The root constraint is satisfied
  • The reachability constraint is satisfied by every
    query node that is mapped to a database node

Unlike a weak matching, an OR matching does not
require every edge constraint to be weakly
satisfied
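Dropping the weak-satisfaction test from the weak-matching conditions gives an OR-matching check. This sketch assumes a hypothetical toy database with an acted in edge but no director edge, and encodes graphs as sets of (source, label, target) triples with None for null. The complete assignment below is an OR matching even though the director constraint is violated, because every mapped variable is still reachable from the root.

```python
# Hypothetical toy instance: no director edge in the database.
DB_ROOT = 1
DB_EDGES = {(1, "movie", 2), (1, "actor", 4), (4, "acted in", 2)}

QUERY_ROOT = "r"
QUERY_EDGES = [
    ("r", "movie", "m"),
    ("r", "actor", "a"),
    ("a", "acted in", "m"),
    ("a", "director", "m"),
]


def reachable(matching):
    """Variables reachable from the query root along satisfied edge constraints."""
    reach, changed = {QUERY_ROOT}, True
    while changed:
        changed = False
        for u, label, v in QUERY_EDGES:
            # DB_EDGES contains no None, so an edge with a null endpoint never qualifies.
            if u in reach and v not in reach and (matching[u], label, matching[v]) in DB_EDGES:
                reach.add(v)
                changed = True
    return reach


def is_or_matching(matching):
    """Root constraint plus reachability; edge constraints need not be weakly satisfied."""
    if matching[QUERY_ROOT] != DB_ROOT:
        return False
    reach = reachable(matching)
    return all(obj is None or var in reach for var, obj in matching.items())


print(is_or_matching({"r": 1, "m": 2, "a": 4}))  # True, although 4 did not direct 2
print(is_or_matching({"r": 1, "m": 2, "a": 3}))  # False: object 3 is not reachable
```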
19
Maximal Matchings
  • Matchings can be represented as tuples (where
    numbers are object ids)
  • A matching t1 subsumes a matching t2 if t1 can be
    obtained from t2 by replacing some nulls in t2
    with non-null values
  • A matching is maximal if no other matching
    subsumes it
  • A query result consists only of maximal matchings

t1 = (1, 5, 2, null)
t2 = (1, null, 2, null)
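Subsumption and maximality translate directly into code. This sketch assumes matchings are Python tuples of object ids with None standing for null, and uses the two tuples t1 and t2 of the slide.

```python
def subsumes(t1, t2):
    """True if t1 is obtained from t2 by replacing at least one null (None) of t2
    with a non-null value, keeping every non-null value of t2 unchanged."""
    return t1 != t2 and all(b is None or a == b for a, b in zip(t1, t2))


def maximal(matchings):
    """Keep only the matchings that no other matching subsumes."""
    return [t for t in matchings if not any(subsumes(s, t) for s in matchings)]


t1 = (1, 5, 2, None)
t2 = (1, None, 2, None)
print(subsumes(t1, t2))   # True
print(subsumes(t2, t1))   # False
print(maximal([t1, t2]))  # [(1, 5, 2, None)]
```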
20
More Examples
21
[Figure: The Movie Database Before the Removals]
22
In the result, the actor must be both an actor
in the movie and the director of the movie
[Figure: the movie database and the query, with a matching]
A complete matching
It is also a maximal weak matching
It is also a maximal OR-matching
23
In the result, if the actor and the movie are
assigned non-null values, then the actor must be
both an actor in the movie and the director of
the movie
[Figure: the movie database and the query, with a matching]
A second maximal weak matching
24
In the result, the actor either played in the
movie, directed the movie, or is not related at
all to the movie
[Figure: the movie database and the query, with a matching]
A maximal OR-matching
25
Complexity of Evaluating Maximal Weak
Matchings and Maximal OR Matchings
26
Data Complexity
  • Under data complexity, the query is considered
    fixed, and the time complexity is a function of
  • the size of the database

27
Two Alternatives for Query Evaluation
  • A naïve algorithm computes all matchings and then
    removes subsumed matchings
  • A better algorithm avoids computing all matchings;
    ideally, it only computes maximal matchings
  • Under data complexity, both algorithms are
    polynomial time
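The naïve algorithm can be sketched as a brute-force search: enumerate every partial assignment, keep the weak matchings, and then discard subsumed ones. This is an illustration under assumptions (a hypothetical toy database without a director edge, None for null, graphs as sets of labeled triples), not the better algorithm of the talk. It tries (|objects| + 1)^|variables| assignments, which is polynomial when the query is fixed but exponential in the size of the query.

```python
from itertools import product

# Hypothetical toy instance: the database has no director edge.
DB_ROOT = 1
DB_OBJECTS = [1, 2, 4]
DB_EDGES = {(1, "movie", 2), (1, "actor", 4), (4, "acted in", 2)}
QUERY_ROOT = "r"
VARIABLES = ["r", "m", "a"]
QUERY_EDGES = [("r", "movie", "m"), ("r", "actor", "a"),
               ("a", "acted in", "m"), ("a", "director", "m")]


def is_weak_matching(m):
    """Root constraint, reachability constraint, and weak satisfaction of every edge."""
    if m[QUERY_ROOT] != DB_ROOT:
        return False
    reach, changed = {QUERY_ROOT}, True
    while changed:
        changed = False
        for u, label, v in QUERY_EDGES:
            if u in reach and v not in reach and (m[u], label, m[v]) in DB_EDGES:
                reach.add(v)
                changed = True
    if any(m[x] is not None and x not in reach for x in VARIABLES):
        return False
    return all(m[u] is None or m[v] is None or (m[u], label, m[v]) in DB_EDGES
               for u, label, v in QUERY_EDGES)


def subsumes(s, t):
    """s is obtained from t by replacing at least one null with a non-null value."""
    return s != t and all(b is None or a == b for a, b in zip(s, t))


def naive_maximal_weak_matchings():
    # Step 1: test all (len(DB_OBJECTS) + 1) ** len(VARIABLES) partial assignments.
    candidates = [
        values for values in product([None] + DB_OBJECTS, repeat=len(VARIABLES))
        if is_weak_matching(dict(zip(VARIABLES, values)))
    ]
    # Step 2: remove every candidate that another candidate subsumes.
    return [t for t in candidates if not any(subsumes(s, t) for s in candidates)]


print(naive_maximal_weak_matchings())  # [(1, None, 4), (1, 2, None)]
```

Each result is a tuple over (r, m, a). Because the director edge is missing, the movie and the actor cannot both be non-null in the same weak matching, so two maximal weak matchings remain.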

28
Input-Output Complexity
  • Under input-output complexity, the time
    complexity is a function of
  • the size of the query,
  • the size of the database, and
  • the size of the result

29
A Naïve Algorithm vs. A Better Algorithm
  • Under I-O complexity, a naïve algorithm is
    exponential
  • Is there a better algorithm with a polynomial
    time I-O complexity?
  • The answer is positive for DAG queries (Kanza,
    Nutt and Sagiv, 1999)

30
Cyclic Queries
  • Theorem: For a query Q and a database D, the set
    of all maximal weak matchings can be computed in
    $O(q^3 d m^2)$ time, where q is the size of the
    query, d is the size of the database and m is the
    size of the result
  • Computing all maximal OR matchings has the same
    complexity

31
Full Disjunctions
What is the full disjunction of a set of
relations?
How are full disjunctions related to queries with
incomplete answers?
32
[Tables: the relations Movies, Actors, Acted-in and Actors-that-Directed]
The Full Disjunction of the Given Relations
33
[Tables: the given relations and their full disjunction]
The Full Disjunction of the Given Relations
The full disjunction does not include subsumed
tuples
34
[Tables: the relations Movies, Actors, Acted-in and Actors-that-Directed, and their full disjunction]
The Full Disjunction of the Given Relations
The full disjunction does not include tuples that
are based on Cartesian Product rather than join
35
In the Full Disjunction of a Given Set of
Relations
  • Every tuple of the input is a part of at least
    one tuple of the output
  • Tuples are joined as in a natural join, padded
    with null values
  • The result includes only maximal connected
    portions
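To make these properties concrete, here is a brute-force sketch of the definition. It assumes relations are given as (schema, rows) pairs without nulls, and the data is hypothetical (the title "Untitled" is invented to produce a dangling tuple). It tries every choice of at most one tuple per relation, keeps the choices that are connected (their schemas share attributes) and pairwise join consistent, pads them with nulls, and removes subsumed tuples. It is exponential and only shows what the result should be; it is not the polynomial-time algorithm discussed in this talk.

```python
from itertools import product

# Hypothetical relations: name -> (schema, rows); the rows contain no nulls.
RELATIONS = {
    "Movies": (("m_id", "title"), [(1, "Zelig"), (2, "Antz"), (3, "Untitled")]),
    "Acted_in": (("a_id", "m_id"), [(10, 1), (10, 2)]),
    "Actors": (("a_id", "name"), [(10, "Woody Allen")]),
}


def connected(picked):
    """Tuples are linked when their schemas share an attribute; is the set connected?"""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, (schema_j, _) in enumerate(picked):
            if j not in seen and set(picked[i][0]) & set(schema_j):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(picked)


def join_consistent(picked):
    """Every pair of tuples agrees on the attributes their schemas share."""
    for i, (s1, r1) in enumerate(picked):
        for s2, r2 in picked[i + 1:]:
            v1, v2 = dict(zip(s1, r1)), dict(zip(s2, r2))
            if any(v1[a] != v2[a] for a in set(s1) & set(s2)):
                return False
    return True


def subsumes(s, t):
    return s != t and all(b is None or a == b for a, b in zip(s, t))


def full_disjunction(relations):
    attrs = []
    for schema, _ in relations.values():
        attrs += [a for a in schema if a not in attrs]
    rels = list(relations.values())
    candidates = set()
    # Choose at most one tuple from every relation (None means "relation not used").
    for choice in product(*[[None] + rows for _, rows in rels]):
        picked = [(schema, row) for (schema, _), row in zip(rels, choice) if row is not None]
        if picked and connected(picked) and join_consistent(picked):
            merged = {}
            for schema, row in picked:
                merged.update(zip(schema, row))
            candidates.add(tuple(merged.get(a) for a in attrs))  # pad with nulls
    # Keep only the tuples that no other candidate subsumes.
    return attrs, {t for t in candidates if not any(subsumes(s, t) for s in candidates)}


attrs, fd = full_disjunction(RELATIONS)
print(attrs)  # ['m_id', 'title', 'a_id', 'name']
for t in sorted(fd):
    print(t)
# (1, 'Zelig', 10, 'Woody Allen')
# (2, 'Antz', 10, 'Woody Allen')
# (3, 'Untitled', None, None)
```

Movie 3 has no actor, so it appears padded with nulls. It is never combined with the Actors tuple, because Movies and Actors share no attribute, which is the Cartesian-product case that the full disjunction excludes.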
36
Motivation for Full Disjunctions
  • Full disjunctions were proposed by
    Galindo-Legaria as an alternative to outerjoins
    (SIGMOD 1994)
  • Rajaraman and Ullman suggested using full
    disjunctions for information integration (PODS 1996)

37
Computing Full Disjunctions for γ-acyclic
Relation Schemas
  • Rajaraman and Ullman have shown how to evaluate
    the full disjunction by a sequence of natural
    outerjoins when the relation schemas are
    γ-acyclic
  • Hence, the full disjunction can be computed in
    polynomial time, under input-output complexity,
    when the relation schemas are γ-acyclic

38
Weak Semantics GeneralizesFull Disjunctions
  • Relations can be converted into a semistructured
    database
  • The full disjunction can be expressed as the
    union of several queries that are evaluated under
    weak semantics

39
Creating The Database (Example)
[Figure: the relations Movies, Actors and Acted-in, and the database created from them]
  • A node is created for each tuple
  • Edges are added between connected tuples, in both
    directions
  • A root is added, and edges are added from the
    root to every node
  • We use colors instead of labels
40
Creating The Queries (Example)
[Figure: the queries created for Movies, Actors and Acted-in; r marks the query root]
  • A node is created for each relation schema
  • Edges are added between connected schemas, in
    both directions
  • The number of queries is equal to the number of
    schemas
  • In each query, the root is connected to a
    different schema
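The construction of the database and of the queries can be sketched as follows. Several details are assumptions rather than the talk's exact definitions: two tuples are treated as connected when their schemas share an attribute and they agree on it, each edge is labeled with the name of the relation of its target (standing in for the colors), and the relations and rows are hypothetical.

```python
# Hypothetical relations: name -> (schema, rows).
RELATIONS = {
    "Movies": (("m_id", "title"), [(1, "Zelig"), (2, "Antz")]),
    "Acted_in": (("a_id", "m_id"), [(10, 1), (10, 2)]),
    "Actors": (("a_id", "name"), [(10, "Woody Allen")]),
}


def shared(name1, name2):
    """Attributes shared by the schemas of two relations."""
    return set(RELATIONS[name1][0]) & set(RELATIONS[name2][0])


def build_database():
    """A node per tuple, a root with an edge to every node, and edges between
    connected tuples; each edge is labeled with the relation of its target."""
    nodes = [(name, row) for name, (_, rows) in RELATIONS.items() for row in rows]
    edges = {("root", name, (name, row)) for name, row in nodes}
    for n1, r1 in nodes:
        for n2, r2 in nodes:
            common = shared(n1, n2) if n1 != n2 else set()
            v1 = dict(zip(RELATIONS[n1][0], r1))
            v2 = dict(zip(RELATIONS[n2][0], r2))
            if common and all(v1[a] == v2[a] for a in common):
                edges.add(((n1, r1), n2, (n2, r2)))  # the reverse edge comes from (n2, n1)
    return edges


def build_queries():
    """One query per schema: schema nodes linked in both directions when they share
    an attribute, plus a root edge to a different schema in each query."""
    schema_edges = [(s1, s2, s2) for s1 in RELATIONS for s2 in RELATIONS
                    if s1 != s2 and shared(s1, s2)]
    return {start: [("root", start, start)] + schema_edges for start in RELATIONS}


db_edges = build_database()
queries = build_queries()
print(len(db_edges))  # 13: 5 root edges + 8 edges between joining tuples
print(len(queries))   # 3
```

Because labels name the target relation, a weak matching can only map the query node for a relation to a tuple of that relation.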
41
Queries Are Evaluated under Weak Semantics (Example)
[Figure: the queries with root r over Movies, Actors and Acted-in, evaluated over the database]
47
The Algorithm Computes Full Disjunctions in
Polynomial Time under Input-Output Complexity
Theorem: The full disjunction of relations
$r_1, \ldots, r_n$ can be computed in $O(n^5 s^2 f^2)$
time, where n is the number of relations, s is the
total size of all the relations and f is the size of
the result
48
Generalizing Full Disjunctions
  • In a full disjunction, tuples are joined
    according to equality constraints as in a natural
    join (or equi-join)
  • We can generalize full disjunctions to support
    constraints that are not merely equality among
    attributes

49
Example
Movies (m-id, title, year, language, location)
Actors (a-id, name, date-of-birth)
Acted-in (a-id, m-id, role)
Actors-that-Directed (a-id, m-id)
Historical-Events (name, date, description)
Historical-Sites (Country, State, City, Site)
50
The General Idea
  • A set of constraints specifies how tuples should
    be joined
  • The queries and the database are constructed
    according to the given constraints
  • A pair of nodes is connected by an edge when it
    satisfies the corresponding constraint
  • Queries are evaluated w.r.t. the database under
    weak semantics

51
Another Way of Generalizing Full Disjunctions
Use OR-Semantics
  • Generate the queries and the database as before,
    but the queries are evaluated under OR-semantics
    (rather than weak semantics)
  • This relaxes the requirement that every pair of
    tuples should be join consistent
  • Instead, a tuple of the full disjunction is only
    required to be generated by database tuples that
    form a connected subgraph, but need not be
    pairwise join consistent
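Both generalizations can be sketched by brute force on the Employees, Departments and Located-in schemas used in this talk. Assumptions: each pair of relations carries an arbitrary Python predicate (equality on the shared attribute here, but any test could be plugged in), a result is a tuple holding one row or None per relation, and the single employee, department and location are invented. Under weak semantics every constraint between two chosen tuples must hold; under OR-semantics only connectivity through satisfied constraints is required.

```python
from itertools import product

# Hypothetical rows for the Employees / Departments / Located-in example.
EMPLOYEES = [("e1", "Ann", "Haifa", "d1")]      # (e-id, ename, city, dept-no)
DEPARTMENTS = [("d1", "Sales", "b1")]           # (dept-no, dname, building)
LOCATED_IN = [("b1", "Jerusalem", "Main St")]   # (building, city, street)
RELATIONS = [EMPLOYEES, DEPARTMENTS, LOCATED_IN]

# One constraint per pair of relations (by index); any predicate could be used here.
CONSTRAINTS = {
    (0, 1): lambda e, d: e[3] == d[0],  # same dept-no
    (1, 2): lambda d, l: d[2] == l[0],  # same building
    (0, 2): lambda e, l: e[2] == l[1],  # same city
}


def subsumes(s, t):
    return s != t and all(b is None or a == b for a, b in zip(s, t))


def generalized_fd(or_semantics=False):
    results = set()
    for choice in product(*[[None] + rows for rows in RELATIONS]):
        used = [i for i, row in enumerate(choice) if row is not None]
        if not used:
            continue
        both = [(i, j) for (i, j) in CONSTRAINTS if i in used and j in used]
        holds = [(i, j) for (i, j) in both if CONSTRAINTS[(i, j)](choice[i], choice[j])]
        # Weak semantics: every constraint between two chosen tuples must be satisfied.
        if not or_semantics and len(holds) != len(both):
            continue
        # Both semantics: chosen tuples must be connected through satisfied constraints.
        seen, stack = {used[0]}, [used[0]]
        while stack:
            k = stack.pop()
            for i, j in holds:
                for x, y in ((i, j), (j, i)):
                    if x == k and y not in seen:
                        seen.add(y)
                        stack.append(y)
        if seen == set(used):
            results.add(choice)
    # Keep only maximal results.
    return {t for t in results if not any(subsumes(s, t) for s in results)}


print(len(generalized_fd()))                   # 2
print(len(generalized_fd(or_semantics=True)))  # 1
```

Ann's city (Haifa) differs from the city of her department's building, so under weak semantics the three tuples cannot be combined, while under OR-semantics they form a connected chain through dept-no and building.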

52
Example
Employees (e-id, ename, city, dept-no)
Departments (dept-no, dname, building)
Located-in (building, city, street)
The Full Disjunction
53
Example
Employees (e-id, ename, city, dept-no)
Departments (dept-no, dname, building)
Located-in (building, city, street)
The Full Disjunction under OR-Semantics
54
Two Related Problems
  • The Projection Problem: computing the projection
    of the full disjunction on a given set of
    attributes
  • The Restriction Problem: computing only those
    tuples of the full disjunction that are non-null
    on a given set of attributes
The projection problem and the restriction
problem cannot be solved in polynomial time
(under input-output complexity) unless P = NP
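Both problems are easy to state once the full disjunction is available. The difficulty is that the full disjunction can be much larger than the projected or restricted answer, so computing it first does not in general give a bound polynomial in the size of that smaller answer. The sketch below only illustrates the two definitions on a hypothetical, already-computed full disjunction. It treats projection as plain duplicate-eliminating set projection, which is an assumption of this sketch.

```python
# Hypothetical full disjunction over (m_id, title, a_id, name); None stands for null.
ATTRS = ("m_id", "title", "a_id", "name")
FD = {(1, "Zelig", 10, "Woody Allen"), (3, "Untitled", None, None)}


def restriction(fd, required):
    """Tuples of the full disjunction that are non-null on every required attribute."""
    idx = [ATTRS.index(a) for a in required]
    return {t for t in fd if all(t[i] is not None for i in idx)}


def projection(fd, kept):
    """Plain set projection on the kept attributes (an assumption of this sketch)."""
    idx = [ATTRS.index(a) for a in kept]
    return {tuple(t[i] for i in idx) for t in fd}


print(restriction(FD, ["name"]))  # {(1, 'Zelig', 10, 'Woody Allen')}
print(projection(FD, ["title"]))  # two tuples: ('Zelig',) and ('Untitled',)
```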
55
Conclusion
  • Cyclic queries can be evaluated in polynomial time
    (in the size of the query, the database and the
    result) under either OR-semantics or weak
    semantics
  • A reduction of full-disjunction evaluation to
    query evaluation under weak semantics is
    described
  • Using the reduction, full disjunctions can be
    computed in polynomial time (in the size of the
    relation schemas, the relations and the result)

56
Conclusion (continued)
  • Full disjunctions can be generalized in two ways
  • By using OR-semantics instead of weak semantics
  • By joining tuples according to general
    constraints
  • Generalized full disjunctions can be useful in
    the context of data integration from
    heterogeneous sources
  • The projection problem and the restriction
    problem have polynomial-time algorithms (under
    input-output complexity) when the relations have
    γ-acyclic schemas, but not in the general case
    (unless P = NP)

57
Thank You
Questions?