CSE 636 Data Integration - PowerPoint PPT Presentation

Slides: 18
Provided by: michailpe
Learn more at: https://cse.buffalo.edu
1
CSE 636 Data Integration
  • Answering Queries Using Views
  • Bucket Algorithm

2
The Bucket Algorithm
  • Each subgoal g of Q must be covered by some
    view
  • Make a list of candidates (buckets) per query
    subgoal
  • Consider combinations of candidates from
    different buckets
  • Not all combos are compatible
  • Keep the compatible ones and minimize them
  • Discard the ones contained in another
  • Take their union

3
The Bucket Algorithm
  • q(X,Y,R) :- ForSale(X,Y,C,auto),
    Review(X,R,auto), Y > 1985
  • Step 1: For each subgoal, put the relevant
    sources into a bucket
  • V1(name, year) :- ForSale(name, year, France,
    auto), year > 1990 would be relevant
  • V3(name, year) :- ForSale(name, year, France,
    cheese) would be irrelevant
  • Step 2: Take the Cartesian product of the buckets
  • The algorithm produces a maximally contained
    rewriting
  • It ignores interactions between subgoals in Step 1
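
The relevance test of Step 1 can be sketched roughly as follows. This is only an illustration for the car example, not the algorithm as the slides define it: the `Const` and `Atom` types, the helper names `unifiable`, `overlaps` and `relevant`, and the encoding of a comparison such as year > 1990 as an open interval are all assumptions made here.

```python
import math
from typing import NamedTuple


class Const(NamedTuple):
    """A constant such as France or auto; plain strings stand for variables."""
    value: str


class Atom(NamedTuple):
    pred: str
    args: tuple


def unifiable(q: Atom, v: Atom) -> bool:
    """Same predicate and arity, and no position where two different constants meet."""
    if q.pred != v.pred or len(q.args) != len(v.args):
        return False
    return not any(
        isinstance(a, Const) and isinstance(b, Const) and a != b
        for a, b in zip(q.args, v.args)
    )


def overlaps(a, b) -> bool:
    """True if two open intervals (lo, hi) share at least one point."""
    return max(a[0], b[0]) < min(a[1], b[1])


def relevant(q_atom, q_year, v_atom, v_year) -> bool:
    """A view subgoal is a bucket candidate if it unifies and the year ranges overlap."""
    return unifiable(q_atom, v_atom) and overlaps(q_year, v_year)


AUTO, CHEESE, FRANCE = Const("auto"), Const("cheese"), Const("France")
ANY_YEAR = (-math.inf, math.inf)

query_subgoal = Atom("ForSale", ("X", "Y", "C", AUTO))
query_year = (1985, math.inf)      # Y > 1985

v1_subgoal = Atom("ForSale", ("name", "year", FRANCE, AUTO))
v1_year = (1990, math.inf)         # year > 1990
v3_subgoal = Atom("ForSale", ("name", "year", FRANCE, CHEESE))

v1_relevant = relevant(query_subgoal, query_year, v1_subgoal, v1_year)
v3_relevant = relevant(query_subgoal, query_year, v3_subgoal, ANY_YEAR)
print(v1_relevant, v3_relevant)
```

The sketch deliberately ignores repeated variables and looks at one subgoal at a time, which mirrors the remark that Step 1 ignores interactions between subgoals.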

4
The Bucket Algorithm Example
  • V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr),
    course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98
  • V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr),
    teaches(Prof,Crs,Qtr)
  • V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
  • V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr),
    course(Crs,Title), teaches(Prof,Crs,Qtr),
    Qtr ≤ Aut97
  • q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q),
    course(C,T), C ≥ 300, Q ≥ Aut95
  • Step 1: For each query subgoal, put the relevant
    sources into a bucket

5
The Bucket Algorithm Example
  • V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr),
    course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98
  • V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr),
    teaches(Prof,Crs,Qtr)
  • V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
  • V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr),
    course(Crs,Title), teaches(Prof,Crs,Qtr),
    Qtr ≤ Aut97
  • q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q),
    course(C,T), C ≥ 300, Q ≥ Aut95
  • Mapping for teaches(P,C,Q): P → Prof, C → Crs,
    Q → Qtr
  • Note: Arithmetic predicates don't pose a problem

Buckets
teaches: V2, V4
reg: (empty)
course: (empty)
6
The Bucket Algorithm Example
  • V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr),
    course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98
  • V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr),
    teaches(Prof,Crs,Qtr)
  • V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
  • V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr),
    course(Crs,Title), teaches(Prof,Crs,Qtr),
    Qtr ≤ Aut97
  • q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q),
    course(C,T), C ≥ 300, Q ≥ Aut95
  • Mapping for reg(S,C,Q): S → Std, C → Crs,
    Q → Qtr
  • Note: V3 doesn't work: its arithmetic predicate
    is not consistent with the query's
  • V4 doesn't work: S is not in the output of V4

Buckets
teaches: V2, V4
reg: V1, V2
course: (empty)
7
The Bucket Algorithm Example
  • V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr),
    course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98
  • V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr),
    teaches(Prof,Crs,Qtr)
  • V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
  • V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr),
    course(Crs,Title), teaches(Prof,Crs,Qtr),
    Qtr ≤ Aut97
  • q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q),
    course(C,T), C ≥ 300, Q ≥ Aut95
  • Mapping for course(C,T): C → Crs, T → Title

Buckets
teaches: V2, V4
reg: V1, V2
course: V1, V4
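
The bucket construction walked through on the last three slides can be sketched as below. This is a simplified illustration under stated assumptions, not the published algorithm: it assumes the comparisons read Crs ≥ 500 and Qtr ≥ Aut98 in V1, Qtr ≤ Aut94 in V3, Qtr ≤ Aut97 in V4, and C ≥ 300, Q ≥ Aut95 in the query; it encodes a quarter by its year only (Aut95 becomes 1995); and the dictionary layout and the names `intersects` and `build_buckets` are invented for the example.

```python
import math

INF = math.inf
ANY = (-INF, INF)

# Assumption: a quarter is encoded by its year (Aut95 -> 1995).
VIEWS = {
    "V1": {
        "head": ("Std", "Crs", "Qtr", "Title"),
        "body": [("reg", ("Std", "Crs", "Qtr")), ("course", ("Crs", "Title"))],
        "bounds": {"Crs": (500, INF), "Qtr": (1998, INF)},
    },
    "V2": {
        "head": ("Std", "Prof", "Crs", "Qtr"),
        "body": [("reg", ("Std", "Crs", "Qtr")),
                 ("teaches", ("Prof", "Crs", "Qtr"))],
        "bounds": {},
    },
    "V3": {
        "head": ("Std", "Crs"),
        "body": [("reg", ("Std", "Crs", "Qtr"))],
        "bounds": {"Qtr": (-INF, 1994)},
    },
    "V4": {
        "head": ("Prof", "Crs", "Title", "Qtr"),
        "body": [("reg", ("Std", "Crs", "Qtr")), ("course", ("Crs", "Title")),
                 ("teaches", ("Prof", "Crs", "Qtr"))],
        "bounds": {"Qtr": (-INF, 1997)},
    },
}
QUERY = {
    "head": ("S", "C", "P"),
    "body": [("teaches", ("P", "C", "Q")), ("reg", ("S", "C", "Q")),
             ("course", ("C", "T"))],
    "bounds": {"C": (300, INF), "Q": (1995, INF)},
}


def intersects(a, b):
    """True if two closed intervals (lo, hi) have a common point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def build_buckets(query, views):
    """One bucket per query subgoal (keyed by predicate, which is unique here)."""
    buckets = {}
    for pred, q_args in query["body"]:
        bucket = buckets.setdefault(pred, [])
        for name, view in views.items():
            for v_pred, v_args in view["body"]:
                if v_pred != pred:
                    continue
                mapping = dict(zip(q_args, v_args))  # query var -> view var
                # Every output variable of the query must be output by the view.
                head_ok = all(mapping[x] in view["head"]
                              for x in q_args if x in query["head"])
                # Arithmetic predicates on matched variables must be consistent.
                bounds_ok = all(
                    intersects(query["bounds"].get(x, ANY),
                               view["bounds"].get(v, ANY))
                    for x, v in mapping.items())
                if head_ok and bounds_ok and name not in bucket:
                    bucket.append(name)
    return buckets


buckets = build_buckets(QUERY, VIEWS)
print(buckets)
```

Under these assumptions the sketch rejects V3 for reg because its quarter range does not meet the query's, and rejects V4 for reg because S would map to Std, which V4 does not output.
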
8
The Bucket Algorithm Example
  • Step 2
  • Try all combos of views, one from each bucket
  • Test satisfaction of arithmetic predicates in
    each case
  • e.g., two views may not overlap, i.e., they may
    be inconsistent
  • Desired rewriting: union of the surviving ones
  • Query rewriting 1:
  • q1(S,C,P) :- V2(S',P,C,Q), V1(S,C,Q,T'),
    V1(S'',C,Q',T)
  • No problem from arithmetic predicates (none in
    V2)
  • May or may not be minimal (why?)

Buckets
teaches: V2, V4
reg: V1, V2
course: V1, V4
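
Step 2 can be pictured as a Cartesian product over the buckets followed by an arithmetic filter. The sketch below is a deliberately crude version: it assumes every view in a candidate constrains the same quarter, assumes the quarter ranges Qtr ≥ Aut98 for V1, Qtr ≤ Aut97 for V4 and Q ≥ Aut95 for the query (quarters encoded as years), and leaves out the containment check that the full algorithm still needs. The names `BUCKETS`, `QTR_RANGE` and `feasible` are made up for the illustration.

```python
from itertools import product
import math

INF = math.inf

# Buckets as listed on the slide, in query-subgoal order.
BUCKETS = {"teaches": ["V2", "V4"], "reg": ["V1", "V2"], "course": ["V1", "V4"]}

# Assumed quarter ranges, years only: V1 is Qtr >= Aut98, V4 is Qtr <= Aut97,
# V2 is unconstrained, and the query asks for Q >= Aut95.
QTR_RANGE = {"V1": (1998, INF), "V2": (-INF, INF), "V4": (-INF, 1997)}
QUERY_QTR = (1995, INF)


def feasible(combo):
    """Intersect the quarter ranges of all chosen views with the query's range."""
    lo, hi = QUERY_QTR
    for view in combo:
        v_lo, v_hi = QTR_RANGE[view]
        lo, hi = max(lo, v_lo), min(hi, v_hi)
    return lo <= hi


candidates = list(product(*BUCKETS.values()))   # one view per bucket
survivors = [c for c in candidates if feasible(c)]

for combo in candidates:
    print(combo, "ok" if feasible(combo) else "infeasible")
```

With two views per bucket there are eight candidates; under the shared-quarter assumption, every candidate that uses both V1 and V4 is filtered out.
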
9
The Bucket Algorithm Example
  • Unfolding of rewriting 1:
  • q1(S,C,P) :- r(S',C,Q), t(P,C,Q), r(S,C,Q),
    c(C,T'), r(S'',C,Q'), c(C,T), C ≥ 500,
    Q ≥ Aut98, C ≥ 500, Q' ≥ Aut98
  • The r subgoals with primed variables (black on
    the slide) can be mapped to r(S,C,Q) (green):
    S' → S, S'' → S, Q' → Q
  • c(C,T') (black) can be mapped to c(C,T) (green):
    just extend the above mapping with T' → T
  • Minimized unfolding of rewriting 1:
  • q1m(S,C,P) :- t(P,C,Q), r(S,C,Q), c(C,T),
    C ≥ 500, Q ≥ Aut98
  • Minimized rewriting 1:
  • q1m(S,C,P) :- V2(S,P,C,Q), V1(S,C,Q,T)
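
The minimization above can be mimicked with a small homomorphism search: a subgoal is redundant if the whole body can be mapped into the body without it while keeping the head variables fixed. The sketch below is a textbook-style illustration, not code from the lecture; it leaves the comparison predicates out, assumes all arguments are variables, and uses the hypothetical names S1, S2, Q1 and T1 for the fresh variables that the views introduce.

```python
def find_homomorphism(source, target, fixed):
    """Return a variable mapping sending every atom of `source` onto some
    atom of `target` while keeping the `fixed` variables unchanged, or None."""
    def extend(i, h):
        if i == len(source):
            return h
        pred, args = source[i]
        for t_pred, t_args in target:
            if t_pred != pred or len(t_args) != len(args):
                continue
            h2 = dict(h)
            # setdefault binds an unbound variable or returns its current image.
            if all(h2.setdefault(a, b) == b for a, b in zip(args, t_args)):
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None

    return extend(0, {v: v for v in fixed})


def minimize(head, body):
    """Drop subgoals one at a time while an equivalent smaller body exists."""
    body = list(body)
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(body)):
            rest = body[:i] + body[i + 1:]
            if find_homomorphism(body, rest, head) is not None:
                body, shrunk = rest, True
                break
    return body


# Unfolding of rewriting 1 without its comparison predicates.
unfolding = [
    ("r", ("S1", "C", "Q")), ("t", ("P", "C", "Q")),   # from V2
    ("r", ("S", "C", "Q")), ("c", ("C", "T1")),        # from V1 (reg bucket)
    ("r", ("S2", "C", "Q1")), ("c", ("C", "T")),       # from V1 (course bucket)
]
minimized = minimize(("S", "C", "P"), unfolding)
print(minimized)
```

Dropping the comparisons is harmless for this particular input because the constraint on the fresh quarter variable is the same as the one on Q; a general implementation would have to carry the comparisons through the mapping.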

10
The Bucket Algorithm Example
Buckets
teaches: V2, V4
reg: V1, V2
course: V1, V4
  • Query rewriting 2:
  • q2(S,C,P) :- V2(S',P,C,Q), V1(S,C,Q,T'),
    V4(P',C,T,Q')
  • Unfolding of rewriting 2:
  • q2(S,C,P) :- r(S',C,Q), t(P,C,Q),
    r(S,C,Q), c(C,T'), C ≥ 500, Q ≥ Aut98,
    r(S'',C,Q'), c(C,T), t(P',C,Q'), Q' ≤ Aut97
  • This combo is infeasible: consider the
    conjunction of the arithmetic predicates in V1
    and V4
  • Query rewriting 3:
  • q3(S,C,P) :- V2(S',P,C,Q), V2(S,P',C,Q),
    V4(P'',C,T,Q')
11
The Bucket Algorithm Example
  • Unfolding of rewriting 3:
  • q3(S,C,P) :- r(S',C,Q), t(P,C,Q), r(S,C,Q),
    t(P',C,Q), r(S'',C,Q'), c(C,T), t(P'',C,Q'),
    Q' ≤ Aut97
  • The green subgoals can cover the black ones under
    the mapping S' → S, S'' → S, P' → P, P'' → P,
    Q' → Q
  • Minimized rewriting 3:
  • q3m(S,C,P) :- V2(S,P,C,Q), V4(P,C,T,Q)
  • Verify that there are only two rewritings that
    are not covered by others
  • Maximally contained rewriting:
  • q' = q1m ∪ q3m
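
The unfolding step used in this example replaces each view atom by the body of the view's definition. A minimal sketch, assuming the view definitions shown on the earlier slides and omitting comparison predicates; the function name `unfold` and the `_Std1`-style names for fresh existential variables are invented here:

```python
from itertools import count

# View name -> (head variables, body atoms); comparisons are omitted.
VIEWS = {
    "V2": (("Std", "Prof", "Crs", "Qtr"),
           [("reg", ("Std", "Crs", "Qtr")),
            ("teaches", ("Prof", "Crs", "Qtr"))]),
    "V4": (("Prof", "Crs", "Title", "Qtr"),
           [("reg", ("Std", "Crs", "Qtr")),
            ("course", ("Crs", "Title")),
            ("teaches", ("Prof", "Crs", "Qtr"))]),
}


def unfold(rewriting, views):
    """Expand each view atom into the view's body, renaming variables."""
    fresh = count(1)
    body = []
    for view_name, args in rewriting:
        head, view_body = views[view_name]
        rename = dict(zip(head, args))          # head variable -> argument
        for pred, v_args in view_body:
            new_args = []
            for v in v_args:
                if v not in rename:             # existential variable of the view
                    rename[v] = f"_{v}{next(fresh)}"
                new_args.append(rename[v])
            body.append((pred, tuple(new_args)))
    return body


q3m = [("V2", ("S", "P", "C", "Q")), ("V4", ("P", "C", "T", "Q"))]
unfolded = unfold(q3m, VIEWS)
for atom in unfolded:
    print(atom)
```

In the unfolded body, the reg atom coming from V4 carries a fresh student variable because V4 does not output Std, which is why V4 alone cannot supply S.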

12
The Bucket Algorithm Example 2
  • Query
  • q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y)
  • Views
  • V4(A) :- cites(A,B), cites(B,A)
  • V5(C,D) :- sameTopic(C,D)
  • V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G)
  • Note: Should we list V4(X) twice in the buckets?

Buckets
cites(X,Y): V4, V6
cites(Y,X): V4, V6
sameTopic(X,Y): V5, V6
13
The Bucket Algorithm Example 2
  • Consider all combos: check for containment of
    the unfolded rewriting in Q
  • V4(X) cannot be combined with anything (why?)
  • Try q1(X) :- V4(X), V4(X), V5(X,Y)
  • Try q2(X) :- V4(X), V6(X,Y), V5(X,Y)
  • Do any of these work?
  • When can we discard a view from consideration?

14
The Bucket Algorithm Example 2
  • Here is a successful rewriting
  • q3(X) :- V6(X,Y), V6(X,Y), V6(X,Y)
  • By itself, it is not contained in Q
  • But with the subgoal X = Y added, it is!
  • By minimizing the rewriting, we get
  • q3m(X) :- V6(X,X)
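
The containment claims on this slide can be checked with a containment mapping from Q into the unfolded candidate. The sketch below is an illustration only: `containment_mapping` is a hypothetical helper, the unfoldings are written out by hand from the definition of V6, and G stands for the existential variable that V6 introduces.

```python
def containment_mapping(q_head, q_body, c_head, c_body):
    """Search for a mapping from the query's variables to the candidate's
    terms that sends head to head and every query atom onto a candidate atom.
    If one exists, the candidate is contained in the query."""
    def extend(i, h):
        if i == len(q_body):
            return h
        pred, args = q_body[i]
        for c_pred, c_args in c_body:
            if c_pred != pred or len(c_args) != len(args):
                continue
            h2 = dict(h)
            if all(h2.setdefault(a, b) == b for a, b in zip(args, c_args)):
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None

    return extend(0, dict(zip(q_head, c_head)))


Q_HEAD = ("X",)
Q_BODY = [("cites", ("X", "Y")), ("cites", ("Y", "X")), ("sameTopic", ("X", "Y"))]

# Unfolding of V6(X,Y): cites(X,G), cites(G,Y), sameTopic(X,G)
v6_xy = [("cites", ("X", "G")), ("cites", ("G", "Y")), ("sameTopic", ("X", "G"))]
# Unfolding of V6(X,X), i.e. V6(X,Y) with X = Y added
v6_xx = [("cites", ("X", "G")), ("cites", ("G", "X")), ("sameTopic", ("X", "G"))]

without_equality = containment_mapping(Q_HEAD, Q_BODY, ("X",), v6_xy)
with_equality = containment_mapping(Q_HEAD, Q_BODY, ("X",), v6_xx)
print(without_equality, with_equality)
```

No mapping exists for V6(X,Y) because nothing in its unfolding cites X, whereas V6(X,X) admits the mapping that sends Y to G.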

15
The Bucket Algorithm Example 2
  • Remarks
  • V4 didn't contribute to any rewriting, but the
    bucket algorithm doesn't recognize this ahead
    of time
  • Consider q2(X,Y) :- cites(X,Y), cites(Y,X)
  • Then both cites predicates can be folded into V4
  • This is not recognized by the bucket algorithm

16
The State of Affairs
  • Bucket algorithm
  • deals well with predicates
  • Cartesian product can be large (containment
    check required for every candidate rewriting)
  • Inverse rules
  • modular (extensible to binding patterns, FDs)
  • no treatment of predicates
  • resulting rewritings need significant further
    optimization
  • Neither scales up
  • The MINICON algorithm
  • change perspective: look at query variables

17
References
  • Querying Heterogeneous Information Sources Using
    Source Descriptions
  • By Alon Y. Levy, Anand Rajaraman and Joann J.
    Ordille
  • VLDB, 1996
  • Laks V.S. Lakshmanan
  • Lecture Slides
  • Alon Halevy
  • Answering Queries Using Views: A Survey
  • VLDB Journal, 2000
  • http://citeseer.ist.psu.edu/halevy00answering.html