CPSC 504: Data Management Discussion on Chandra

Slides dated 9/28/09.

Slides: 21
Provided by: kaa8

1
CPSC 504: Data Management
Discussion on Chandra & Merlin 1977
  • Laks V.S. Lakshmanan
  • Dept. of CS
  • UBC

2
Problems Studied
  • Efficient computation of relational queries.
  • One of the earliest and pioneering works.
  • Focus on logical optimization.
  • Complexity issues investigated.

3
Key contributions
  • How can we (logically) optimize conjunctive
    queries?
  • Optimize = throw away redundant parts.
  • How can we detect/prove that some parts are
    redundant?
  • How hard is it to do so, in general, for
    relational (aka TRC/DRC/RA) queries?
  • How hard is it for CQ?
  • Is the minimized version of a (C)Q unique?
  • An elegant tool for reasoning about CQ
    minimization.

4
What are Conjunctive Queries?
  • Simply put, RA queries involving select, project,
    join/product.
  • In terms of logic, queries that involve
    conjunctions among database predicates and
    existential quantifiers.
  • In terms of Datalog, queries (without recursion
    or negation) of the form
  • p(X1, ..., Xn) ← q1(Y1, ..., Yk), ..., qm(Z1, ..., Zp),
    where the Xs must be among the Ys and Zs.
  • Additional (built-in) predicates may be allowed.
  • Constant arguments, and comparisons with constants
    or between variables.

5
Conjunctive Queries
  • CM77 uses (much) older notation using additional
    relational operators incl. generalized
    projection.
  • We will explain their results using the much
    simpler Datalog notation.
  • So, what is query minimization?
  • Let Q(D) denote the output of query Q on input
    database D.
  • Q1 ⊆ Q2 ⇔ Q1(D) ⊆ Q2(D) on all input DBs D.

6
What is query minimization?
  • Q1 ≡ Q2 ⇔ Q1(D) = Q2(D) on all input DBs D.
  • Note:
  • Set semantics works when we don't care about
    multiplicity of occurrences.
  • Doesn't work when we care about aggregates ⇒
    need bag semantics.
  • Goal: Given Q, find a (logically) simpler
    expression Q' s.t. Q' ≡ Q.
  • Usually achieved by looking for redundant parts
    of Q which can be tossed.

7
Motivation for CQ Containment
  • select S.starName, M.title
  • from Movie M, Movie N, starsIn S
  • where M.title = S.title and N.year = M.year

Can be shown to be equivalent to
  • select S.starName, M.title
  • from Movie M, Movie N, starsIn S
  • where M.title = S.title

Intuition? How does this help?
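
The equivalence claimed above can be tried out on a small database. The sketch below uses Python's built-in `sqlite3` module; the sample rows are made up for illustration, and `DISTINCT` is added to both queries because the equivalence is meant under set semantics. Without `DISTINCT`, the two queries return the same distinct rows but with different multiplicities, which is the bag-semantics caveat noted earlier.

```python
import sqlite3

# Hypothetical sample data, not from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie(title TEXT, year INTEGER);
    CREATE TABLE starsIn(title TEXT, starName TEXT);
    INSERT INTO Movie VALUES ('A', 1990), ('B', 1990), ('C', 2001);
    INSERT INTO starsIn VALUES ('A', 'Ann'), ('C', 'Bob');
""")

WITH_YEAR_JOIN = """
    SELECT DISTINCT S.starName, M.title
    FROM Movie M, Movie N, starsIn S
    WHERE M.title = S.title AND N.year = M.year
"""
WITHOUT_YEAR_JOIN = """
    SELECT DISTINCT S.starName, M.title
    FROM Movie M, Movie N, starsIn S
    WHERE M.title = S.title
"""

# Set semantics: both queries return the same set of rows.
set_a = set(conn.execute(WITH_YEAR_JOIN).fetchall())
set_b = set(conn.execute(WITHOUT_YEAR_JOIN).fetchall())

# Bag semantics (DISTINCT dropped): the row counts differ.
bag_a = conn.execute(WITH_YEAR_JOIN.replace("DISTINCT", "")).fetchall()
bag_b = conn.execute(WITHOUT_YEAR_JOIN.replace("DISTINCT", "")).fetchall()

print(set_a == set_b, len(bag_a), len(bag_b))
```

The intuition: the condition N.year = M.year can always be satisfied by picking N = M, so it never filters out an answer.
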
8
Example of query minimization
  • E.g., q() ← r(X1,X2), r(X3,X2), r(X5,X2),
    r(X1,X4), r(X3,X4), r(X5,X4), r(X1,X6), r(X3,X6),
    r(X5,X6).
  • I.e., do there exist tuples in r which fit the
    above pattern?
  • ∃ a trivial O(n^6) algorithm, where n = number of
    constants in the DB!
  • Can show answer to this query is true iff r
    contains at least one tuple.
  • If so, can evaluate q() in O(1) time.
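
A minimal sketch of the two ways to answer q(), assuming r is given as a Python set of pairs. The function names are hypothetical. `q_bruteforce` tries every assignment of the six variables, as in the trivial algorithm above; `q_minimized` just checks that r is non-empty.

```python
from itertools import product

# Body of q(): r(Xi, Xj) for i in {1, 3, 5} and j in {2, 4, 6}.
BODY = [(f"X{i}", f"X{j}") for j in (2, 4, 6) for i in (1, 3, 5)]
VARS = sorted({v for atom in BODY for v in atom})

def q_bruteforce(r):
    """Try all assignments of the six variables to constants in r."""
    constants = sorted({c for t in r for c in t})
    for values in product(constants, repeat=len(VARS)):
        val = dict(zip(VARS, values))
        if all((val[a], val[b]) in r for a, b in BODY):
            return True
    return False

def q_minimized(r):
    """Equivalent test q() <- r(X1, X2): is r non-empty?"""
    return len(r) > 0

for r in (set(), {(1, 2)}, {(1, 2), (3, 4)}):
    print(sorted(r), q_bruteforce(r), q_minimized(r))
```
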

9
Example of query minimization
  • Proof: Suppose q() is true. Trivially, r must
    be non-empty. Conversely, suppose r contains just
    one tuple r(1,2). Then by mapping X1, X3, X5 → 1
    and X2, X4, X6 → 2, we can derive the answer
    true. The same mapping works with any tuple of a
    larger r. ∎
  • Lemma 1 in the paper illustrates some technical
    difficulties. Is ≡ always an equivalence
    relation?
  • How can we minimize queries?
  • We have covered Sections 1-3, from a different
    perspective.

10
A model of query eval.
  • Q: p(X1, ..., Xn) ← q1(Y1, ..., Yk), ..., qm(Z1, ...,
    Zp).
  • Input DB D.
  • A valuation: a function that maps variables to
    constants in D, and constants in Q, if any, to
    themselves.
  • Under a valuation, the atoms in Q's body are true
    or false in D.
  • Q(D) = { ν(p(X1, ..., Xn)) | ν is a valuation that
    makes all atoms in Q's body true in D }.

11
An example
  • Q: p(X,Y) ← q(X,Z), r(Z,Y), q(X,W).
  • D = {q(1,2), r(2,3)}.
  • ν = {X→1, Y→3, Z→2, W→2}.
  • ν makes Q's body true in D. Is there any other
    such valuation?
  • Q(D) = {p(1,3)}.
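
The valuation-based model of evaluation can be written down almost literally. The sketch below is one possible encoding, not the notation of the paper: an atom is a pair (predicate, argument tuple), variables are assumed to be strings starting with an uppercase letter, and `evaluate` is a hypothetical helper that enumerates every valuation and keeps those that make the whole body true.

```python
from itertools import product

def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def evaluate(head, body, db):
    """Return (answers, witnesses): head tuples and the valuations producing them."""
    variables = sorted({t for _, args in body for t in args if is_var(t)})
    constants = sorted({c for _, args in db for c in args})
    answers, witnesses = set(), []
    for values in product(constants, repeat=len(variables)):
        nu = dict(zip(variables, values))

        def apply(atom):
            # Variables are replaced by their value; constants map to themselves.
            return (atom[0], tuple(nu.get(t, t) for t in atom[1]))

        if all(apply(atom) in db for atom in body):
            answers.add(apply(head))
            witnesses.append(nu)
    return answers, witnesses

head = ("p", ("X", "Y"))
body = [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))]
db = {("q", (1, 2)), ("r", (2, 3))}

answers, witnesses = evaluate(head, body, db)
print(answers)
print(witnesses)
```

On this D the enumeration finds exactly one witnessing valuation, the one listed above, since q(1,2) and r(2,3) leave no other choice for X, Z, Y, and W.
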

12
Getting back to business
  • Recall, we are trying to figure out what it takes
    for Q1 ⊆ Q2, where both are CQs.
  • Note: Q1 ⊆ Q2 iff on every DB D, for every
    valuation ν1 that makes Q1's body true, there is
    a valuation ν2 that makes Q2's body true and
    further both map their respective heads to the
    same tuple.

13
Key result of CM77
  • Theorem: Q1 ⊆ Q2 iff there exists a homomorphism
    from Q2 to Q1.
  • Q1: p(X1, ..., Xn) ← g1(Y1, ..., Ym), ..., gj(Z1, ..., Zk).
  • Q2: p(U1, ..., Un) ← s1(V1, ..., Vr), ..., si(W1, ..., Wt).
  • What is a homomorphism?
  • h: Vars(Q2) → Vars(Q1) s.t. h turns each atom in
    Q2's body into an atom in Q1's body and turns
    Q2's head into Q1's head.

14
Example revisited
  • Q: p(X,Y) ← q(X,Z), r(Z,Y), q(X,W).
  • Q': p(X,Y) ← q(X,Z), r(Z,Y).
  • Claim: Q ≡ Q'.
  • Proof: Q ⊆ Q' trivially (why?). Now, Q' ⊆ Q as
    well, since h: X→X, Y→Y, Z→Z, W→Z is a
    homomorphism from Q to Q'.
  • Exercise: Show that the nine-subgoal CQ q() on
    slide 8 is equivalent to q() ← r(X1,X2).

15
Proof of Homomorphism Theorem
  • Proof of (⇐): Suppose a homomorphism h exists
    from Q2 to Q1. Let D be any DB and p(a1, ..., an)
    be a tuple in Q1(D). Let ν be the valuation that
    bears witness to this. Consider the function
    ν∘h: Vars(Q2) → Constants in D. It's a valuation
    that makes Q2's body true in D. Further,
    ν∘h(Q2's head) = p(a1, ..., an). Hence
    p(a1, ..., an) is in Q2(D), so Q1 ⊆ Q2.

16
Proof (contd.)
  • Proof of (⇒): Suppose Q1 ⊆ Q2. Then Q1(D) ⊆ Q2(D)
    on every input DB D. Make up a special DB D by
    freezing the vars in Q1's body: think of each
    var as a distinct constant. Q1(D) contains
    p(x1, ..., xn) trivially. So does Q2(D). ⇒ ∃ a
    valuation ν: Vars(Q2) → Constants in D that
    witnesses this. But the constants are frozen
    versions of Vars(Q1). Unfreeze them. Then ν with
    constants unfrozen into vars is a homomorphism
    from Q2 to Q1.

17
Revisit our toy example
  • Q: p(X,Y) ← q(X,Z), r(Z,Y), q(X,W).
  • Q': p(X,Y) ← q(X,Z), r(Z,Y).
  • Frozen DB (from the body of Q'): D = {q(x,z), r(z,y)}.
  • Q(D) contains p(x,y). The valuation witnessing
    this is essentially a homomorphism.
  • Note: The proof gives us a nice (if macabre) test
    for Q1 ⊆ Q2: freeze the body of Q1 into a special
    DB D. Then run Q2 on D and check whether Q2(D)
    contains the frozen head of Q1.
  • The homomorphism is sometimes called a
    containment mapping (c.m.).
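
The freeze-and-run test can be sketched directly. The encoding is an assumption made for illustration: atoms are (predicate, argument tuple), variables are uppercase-initial strings, and a frozen variable X becomes the constant ("frozen", "X"). The query R is a hypothetical third query added only to show a failing case.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def freeze(term):
    """Turn a variable into a distinct constant; leave constants alone."""
    return ("frozen", term) if is_var(term) else term

def match(nu, args, fact_args):
    """Extend valuation nu so that args matches fact_args; None on a clash."""
    new = dict(nu)
    for t, c in zip(args, fact_args):
        if is_var(t):
            if new.setdefault(t, c) != c:
                return None
        elif t != c:
            return None
    return new

def answers(head, body, db):
    """Evaluate a CQ on db by extending partial valuations one subgoal at a time."""
    partial = [{}]
    for pred, args in body:
        partial = [new
                   for nu in partial
                   for fact_pred, fact_args in db
                   if fact_pred == pred and len(fact_args) == len(args)
                   for new in [match(nu, args, fact_args)]
                   if new is not None]
    return {tuple(nu.get(t, t) for t in head[1]) for nu in partial}

def contained_in(q1, q2):
    """Test Q1 ⊆ Q2: freeze Q1's body, run Q2 on it, look for Q1's frozen head."""
    (head1, body1), (head2, body2) = q1, q2
    frozen_db = {(p, tuple(freeze(t) for t in args)) for p, args in body1}
    frozen_head = tuple(freeze(t) for t in head1[1])
    return head1[0] == head2[0] and frozen_head in answers(head2, body2, frozen_db)

Q = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))])
Q_PRIME = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y"))])
# Hypothetical query with an extra constraint r(Y,Y), not from the slides.
R = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("r", ("Y", "Y"))])

print(contained_in(Q_PRIME, Q), contained_in(Q, Q_PRIME))
print(contained_in(R, Q_PRIME), contained_in(Q_PRIME, R))
```
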

18
Summing up CM77
  • Key goal: Minimizing CQs by removing redundant
    subgoals (and hence joins).
  • Main test: is Q1 ⊆ Q2?
  • Can check by looking for existence of a
    homomorphism, OR by running Q2 on the frozen DB,
    i.e., frozen body of Q1, and checking whether it
    contains the frozen head of Q1 (called chase).
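
Putting the pieces together, minimization can be sketched as a loop that drops one subgoal at a time whenever the shorter query is still equivalent. Dropping subgoals can only make a query larger, so the only thing to check is a homomorphism from the current query into the shortened one. The encoding (atoms as (predicate, argument tuple), uppercase-initial strings as variables) and the names `homomorphism` and `minimize` are assumptions for illustration.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def unify(h, src, dst):
    """Extend mapping h so that atom src maps onto atom dst; None if impossible."""
    if src[0] != dst[0] or len(src[1]) != len(dst[1]):
        return None
    h = dict(h)
    for s, d in zip(src[1], dst[1]):
        if is_var(s):
            if h.setdefault(s, d) != d:
                return None
        elif s != d:
            return None
    return h

def homomorphism(src, dst):
    """Backtracking search for a homomorphism from query src to query dst."""
    (src_head, src_body), (dst_head, dst_body) = src, dst

    def search(i, h):
        if h is None:
            return None
        if i == len(src_body):
            return h
        for atom in dst_body:
            found = search(i + 1, unify(h, src_body[i], atom))
            if found is not None:
                return found
        return None

    return search(0, unify({}, src_head, dst_head))

def minimize(query):
    """Repeatedly drop a subgoal while the shortened query stays equivalent."""
    head, body = query[0], list(query[1])
    changed = True
    while changed:
        changed = False
        for i in range(len(body)):
            reduced = body[:i] + body[i + 1:]
            if homomorphism((head, body), (head, reduced)) is not None:
                body, changed = reduced, True
                break
    return head, body

Q = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))])
BIG = (("q", ()), [("r", (f"X{i}", f"X{j}")) for j in (2, 4, 6) for i in (1, 3, 5)])
print(minimize(Q))
print(minimize(BIG))
```

On the toy query this removes q(X,W); on the nine-subgoal boolean query it leaves a single r subgoal.
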

19
Summary (contd.)
  • The paper contains a number of complexity results.
  • Most relevant to us are:
  • Testing containment of arbitrary CQs:
    NP-complete.
  • Where does the complexity come from?
  • When no predicate repeats in the body of Q1
    (smaller query): PTIME.
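
One way to see where the complexity comes from: in general each subgoal of Q2 may have several candidate targets in Q1, and the choices interact. When no predicate repeats in Q1's body, each subgoal of Q2 has at most one possible target, so a single pass with no backtracking decides containment. The sketch below assumes the same illustrative encoding as before (atoms as (predicate, argument tuple), uppercase-initial strings as variables); `contained_no_repeats` and the query R are hypothetical.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def contained_no_repeats(q1, q2):
    """Decide Q1 ⊆ Q2 in one pass, assuming no predicate repeats in Q1's body."""
    (head1, body1), (head2, body2) = q1, q2
    target = {}
    for pred, args in body1:
        if pred in target:
            raise ValueError("predicate repeats in Q1's body")
        target[pred] = args

    # Each atom of Q2 has exactly one candidate image: forced choices only.
    pairs = [(head2, head1)]
    pairs += [(atom, (atom[0], target.get(atom[0]))) for atom in body2]

    h = {}
    for (pred2, args2), (pred1, args1) in pairs:
        if args1 is None or pred1 != pred2 or len(args1) != len(args2):
            return False
        for s, d in zip(args2, args1):
            if is_var(s):
                if h.setdefault(s, d) != d:
                    return False
            elif s != d:
                return False
    return True

Q = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))])
Q_PRIME = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y"))])
# Hypothetical query with an extra constraint r(Y,Y), not from the slides.
R = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("r", ("Y", "Y"))])

print(contained_no_repeats(Q_PRIME, Q))  # Q' has no repeated predicate
print(contained_no_repeats(Q_PRIME, R))
```
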

20
Final Remarks
  • CQ containment is fundamental to logical QO.
  • Containment has also been studied for CQs with
    all kinds of bells and whistles added.
  • Containment reasoning plays a pivotal role in
    answering queries using views.