Local as View: Some refinements

1
Local as View: Some refinements
  • IM: Filtering irrelevant sources
  • Views with restricted access patterns
  • A summary of IM

2
IM: Filtering irrelevant sources
  • When there are many sources, it is important to
    weed out those that are irrelevant to a query
  • Comparison constraints can help (e.g., qu > w98)
  • What more can be done?
  • The IM system suggests introducing classes, with
    a class hierarchy, into source descriptions

3
  • Example
  • [Figure: class hierarchy over car, usedCar, newCar,
    carForSale, AmericanCar, EuropeanCar, JapaneseCar,
    GermanCar, ItalianCar, FrenchCar; "--" marks
    disjoint classes]
  • Additionally, the global schema contains a
    relation
  • details(car, year, mileage, price, sellerContact),
    with variables abbreviated as c, y, mi, p, s
  • (we will also abbreviate class names)

4
  • The views
  • v1(c, y, mi, p, s) :- details(c, y, mi, p, s),
    cFSale(c), uCar(c), y > 1990
  • v2(c, y, p, s) :- details(c, y, mi, p, s),
    cFSale(c), EurCar(c)
  • v3(c, y, p, s) :- details(c, y, mi, p, s),
    cFSale(c), uCar(c), p > 25000 // luxury cars
  • v4(c, y, p) :- details(c, y, mi, p, s), cFSale(c),
    uCar(c), y < 1980 // vintage cars
  • v5(c, y, p, s) :- details(c, mc, y, p, s),
    cFSale(c), nCar(c), cToyota
  • Assume a query
  • Q: q(c, mc, y, p, s) :- details(c, y, mi, p, s),
    cFSale(c), JCar(c), y > 1992, p < 12000
  • Some candidate rewritings will be rejected, since
    they are inconsistent with Q

5
  • When each view is checked for consistency with
    Q:
  • v4 will be discarded: y < 1980 and y > 1992 are
    inconsistent
  • v3 will be discarded: p > 25000 and p < 12000 are
    inconsistent
  • v2 will be discarded: EurCar(c) and JCar(c) are
    inconsistent
  • v5: depends on what is known about the
    relationship between Toyota and the various car
    classes
  • Reasoning about disjointness of classes (given a
    hierarchy as above) is easy and efficient
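
The consistency filter described above can be sketched in a few lines of Python. This is only an illustration, not the IM implementation: the dictionary encoding of views, the helper names `bounds` and `consistent`, and the disjointness of uCar and nCar are assumptions made for the example (the slides only state that EurCar and JCar are disjoint). Comparisons are treated as strict bounds over the reals, and v5 is left out because its status depends on what is known about Toyota.

```python
from math import inf

# Assumed disjointness axioms, as unordered pairs of class names.
# EurCar/JCar comes from the slides; uCar/nCar is an assumption.
DISJOINT = {
    frozenset({"EurCar", "JCar"}),
    frozenset({"uCar", "nCar"}),
}

def bounds(comparisons):
    """Fold strict comparisons (var, op, const) into one open interval per variable."""
    box = {}
    for var, op, const in comparisons:
        lo, hi = box.get(var, (-inf, inf))
        if op == ">":
            lo = max(lo, const)
        elif op == "<":
            hi = min(hi, const)
        else:
            raise ValueError(f"unsupported operator: {op}")
        box[var] = (lo, hi)
    return box

def consistent(view, query):
    """True unless the view and the query contradict each other."""
    classes = view["classes"] | query["classes"]
    # Two disjoint classes on the same car make the conjunction unsatisfiable.
    if any(frozenset({a, b}) in DISJOINT for a in classes for b in classes):
        return False
    # An empty open interval for some variable is also a contradiction.
    box = bounds(view["comparisons"] + query["comparisons"])
    return all(lo < hi for lo, hi in box.values())

QUERY = {"classes": {"cFSale", "JCar"},
         "comparisons": [("y", ">", 1992), ("p", "<", 12000)]}

VIEWS = {
    "v1": {"classes": {"cFSale", "uCar"}, "comparisons": [("y", ">", 1990)]},
    "v2": {"classes": {"cFSale", "EurCar"}, "comparisons": []},
    "v3": {"classes": {"cFSale", "uCar"}, "comparisons": [("p", ">", 25000)]},
    "v4": {"classes": {"cFSale", "uCar"}, "comparisons": [("y", "<", 1980)]},
}

survivors = [name for name, view in VIEWS.items() if consistent(view, QUERY)]
print(survivors)
```

Under these assumptions only v1 survives, matching the slide: v2 fails on class disjointness, v3 and v4 on the comparison constraints.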

6
The true story (a side trip)
  • IM uses a (PTIME) Description Logic for source
    descriptions
  • A DL is a formalism that describes classes and
    binary relationships intensionally
  • For example, a class can be given by a name (e.g.
    JCar) or by an expression that describes its
    properties:
  • cheapJCar :- uCar and JCar and price < 9000
  • A DL also contains containment and disjointness
    axioms for class expressions (containment is
    called subsumption in DL jargon)
  • To be useful, a DL needs to support containment
    and disjointness queries on classes, and
    membership queries on individuals; this is an
    inference problem
7
  • Many DLs are known
  • Complexity (for subsumption) ranges from
    polynomial (rare), to NP-complete, to
    EXPTIME-complete, to undecidable
  • Recent interest focuses on using DLs for the
    Semantic Web
  • The W3C OWL standard is essentially a DL (this use
    is essentially the same as in IM)
  • That is it on DLs
8
Views with restricted access patterns
  • Many sources do not support full SQL
  • They are legacy systems, e.g.
  • finger on UNIX accepts email, returns other
    attributes
  • A bibliography source requires author, or title,
    or ..., but does not accept a year as input
  • They do not want to disclose all their data,
    e.g.,
  • a carSale source will not present all the cars it
    has for sale
  • An airline requires from and destination as input
    for flight info
  • The questions:
  • How do we describe such sources?
  • What are good rewritings, and how do we find them?

9
  • Restricted sources can be described by binding
    patterns
  • Two equivalent styles (there are more
    sophisticated schemes)
  • Example: assume global relations
  • email(F, L, E), office(F, L, O), phone(O, P)
  • (F-first, L-last, E-email,
    O-office, P-phone)
  • The views are finger and userId, described as
    follows
  • Adding $ to attributes that can be given as input:
  • finger(F, L, $E, O, P) :- email(F, L, E),
    office(F, L, O), phone(O, P)
  • userId($O, E) :- office(F, L, O),
    email(F, L, E)
  • Using b, f strings on predicates, where b means
    bound (i.e., in) and f means free (i.e., out):
  • finger^ffbff(F, L, E, O, P) :- email(F, L,
    E), office(F, L, O), phone(O, P)
  • userId^bf(O, E) :- office(F, L, O),
    email(F, L, E)
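
That the two styles carry the same information can be seen from a small conversion, sketched here in Python. The sketch assumes that input attributes are marked with a "$" prefix in the first style; the function name `adornment` and the list encoding of view heads are hypothetical.

```python
def adornment(args):
    """Return the b/f string for a head whose input attributes carry a '$' prefix."""
    return "".join("b" if a.startswith("$") else "f" for a in args)

# View heads in the marker style (assumed '$' prefix on inputs).
finger_head = ["F", "L", "$E", "O", "P"]
user_id_head = ["$O", "E"]

print("finger^" + adornment(finger_head))
print("userId^" + adornment(user_id_head))
```

Each position of the head maps to exactly one letter, so either notation can be recovered from the other.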

10
  • Example, cont'd
  • Q: q^bf(O, F) :- office(F, L, O) (or q(O, F)
    :- office(F, L, O))
  • Cannot be answered using finger alone: it requires
    E as input
  • Cannot be answered using userId alone: it does not
    return F
  • The following is a good rewriting:
  • q(O, F) :- userId(O, E), finger(F, L, E, O,
    P)
  • For two reasons:
  • It is executable with respect to the sources:
    executing the body left-to-right respects the
    access restrictions
  • O for userId comes from the query, E for finger
    comes from userId
  • Its expansion is contained in the query (check!)
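
The "(check!)" can be carried out mechanically. The Python sketch below expands the rewriting by replacing each view atom with its definition, then searches for a containment mapping from the query into the expansion. It is a brute-force illustration under stated assumptions: every term is a variable (no constants or comparisons), the tuple encoding of atoms is made up for the example, and the fresh names such as F_0 are assumed not to clash with variables already in the rewriting.

```python
from itertools import count

# View definitions over the global schema: name -> (head variables, body atoms).
VIEWS = {
    "finger": (("F", "L", "E", "O", "P"),
               [("email", ("F", "L", "E")),
                ("office", ("F", "L", "O")),
                ("phone", ("O", "P"))]),
    "userId": (("O", "E"),
               [("office", ("F", "L", "O")),
                ("email", ("F", "L", "E"))]),
}

def expand(body):
    """Replace each view atom by its definition, renaming existential variables apart."""
    fresh = count()
    out = []
    for name, args in body:
        head, view_body = VIEWS[name]
        sub = dict(zip(head, args))   # head variables -> actual arguments
        tag = next(fresh)             # one suffix per view occurrence
        for pred, pargs in view_body:
            out.append((pred, tuple(sub.get(v, f"{v}_{tag}") for v in pargs)))
    return out

def contained_in(exp_head, exp_body, q_head, q_body):
    """True if a containment mapping from the query to the expansion exists."""
    def extend(h, atoms):
        if not atoms:
            return True
        (pred, args), rest = atoms[0], atoms[1:]
        for tpred, targs in exp_body:
            if tpred != pred or len(targs) != len(args):
                continue
            h2 = dict(h)
            # Map each query variable consistently onto the target atom.
            if all(h2.setdefault(a, t) == t for a, t in zip(args, targs)):
                if extend(h2, rest):
                    return True
        return False

    h0 = {}
    for a, t in zip(q_head, exp_head):  # the mapping must send head to head
        if h0.setdefault(a, t) != t:
            return False
    return extend(h0, list(q_body))

rewriting_head = ("O", "F")
rewriting_body = [("userId", ("O", "E")), ("finger", ("F", "L", "E", "O", "P"))]
query_head = ("O", "F")
query_body = [("office", ("F", "L", "O"))]

expansion = expand(rewriting_body)
is_contained = contained_in(rewriting_head, expansion, query_head, query_body)
print(expansion)
print(is_contained)
```

The mapping found is the identity on O, F and L: the query atom office(F, L, O) lands on the office atom contributed by finger, while the office atom contributed by userId uses fresh variables and cannot serve as the target.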

11
  • These two reasons characterize a good rewriting:
  • It is executable with respect to the sources:
    executing the body left-to-right respects the
    access restrictions
  • Its expansion is contained in the query (check!)
  • Indeed:
  • If it is not a contained rewriting, then being
    executable is no good
  • Being contained but not executable is also no
    good

12
The IM approach
  • After a rewriting is found to be consistent and
    contained, it is checked for being executable:
  • can the sub-goals in the body be ordered so that
    the input required for each is supplied from the
    query or from the sub-goals to its left?
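
One simple way to perform such an ordering check is a greedy pass, sketched below in Python. This is an illustration only and not necessarily how IM implements it: the triple encoding (name, arguments, b/f string) and the function name `executable_order` are assumptions. Greedy selection suffices because the set of bound variables only grows, so a sub-goal that is executable now stays executable later.

```python
def executable_order(subgoals, query_inputs):
    """Order sub-goals so every 'b' argument is bound by the query or an earlier sub-goal.

    subgoals: list of (name, args, adornment) triples, adornment being a b/f string.
    Returns the ordered list, or None if no executable ordering exists.
    """
    bound = set(query_inputs)
    remaining = list(subgoals)
    order = []
    while remaining:
        for goal in remaining:
            _, args, adorn = goal
            if all(a in bound for a, ad in zip(args, adorn) if ad == "b"):
                order.append(goal)
                bound.update(args)      # calling the source binds all its outputs
                remaining.remove(goal)
                break
        else:
            return None                 # no remaining sub-goal can be called
    return order

# Body of the rewriting from the example, deliberately listed in the wrong order.
body = [
    ("finger", ("F", "L", "E", "O", "P"), "ffbff"),
    ("userId", ("O", "E"), "bf"),
]

plan = executable_order(body, query_inputs=["O"])
print([name for name, _, _ in plan])
print(executable_order(body, query_inputs=[]))
```

With O supplied by the query, userId is called first and supplies E for finger; without any input from the query, neither source can be called and no ordering exists.
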
13
A summary of IM
  • Introduced (with other concurrent systems) the
    notion of LAV and query rewriting using views
  • Also, detailed source descriptions using DLs
  • An efficient algorithm for finding contained and
    executable rewritings
  • Worked well, for about 100 sources

14
Here is a graph from the paper
15
  • But:
  • The fact that a contained rewriting needs a
    number of views at most the number of atoms in
    the query has been proved only for CQs without
  • comparisons,
  • access restrictions,
  • constraints on the global db
  • Does it hold for these cases? (see the example on
    slide 10)
  • For access-restricted sources, it has been proved
    that for equivalent rewritings one needs at most
    n + m views, where n is the number of atoms in the
    query and m is the number of different variables
    in it
  • The proof does not hold for contained rewritings

16
  • Even for pure CQs, is the bucket algorithm
    guaranteed to find all rewritings?
  • The answers to all these questions are negative!
  • The bucket algorithm does not find all rewritings
  • For the more general cases, longer rewritings are
    needed; actually, there may be an infinite number
    of them, with no bound on length
  • There is a need for another approach