Title: Local as View: Some refinements
1 Local as View: Some refinements
- IM: Filtering irrelevant sources
- Views with restricted access patterns
- A summary of IM
2 IM: Filtering irrelevant sources
- When there are many sources, it is important to weed out those that are irrelevant to a query
- Comparison constraints can help (e.g., qu gt w98)
- What more can be done?
- The IM system suggests introducing classes, with a class hierarchy, into source descriptions
3
- Example
- [Figure: class hierarchy over car, usedCar, newCar, carForSale, AmericanCar, EuropeanCar, JapaneseCar, GermanCar, ItalianCar, FrenchCar; the legend "--" marks disjoint classes]
- Additionally, the global schema contains a relation
  - details(car, year, mileage, price, sellerContact)
  - abbreviated as c, y, mi, p, s (we will also abbreviate class names)
4
- The views
  - v1(c, y, mi, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y > 1990
  - v2(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), EurCar(c)
  - v3(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), p > 25000 // luxury cars
  - v4(c, y, p) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y < 1980 // vintage cars
  - v5(c, y, p, s) :- details(c, mc, y, p, s), cFSale(c), nCar(c), c = Toyota
- Assume a query
  - Q: q(c, mc, y, p, s) :- details(c, y, mi, p, s), cFSale(c), JCar(c), y > 1992, p < 12000
- Some candidate rewritings will be rejected, since they are inconsistent with Q
5
- When a view is considered for consistency with Q,
  - v4 will be discarded: y < 1980, y > 1992 is inconsistent
  - v3 will be discarded: p > 25000, p < 12000 is inconsistent
  - v2 will be discarded: EurCar(c), JCar(c) is inconsistent
  - v5 depends on what is known about the relationship between Toyota and the various car classes
- Reasoning about disjointness of classes (given a hierarchy as above) is easy and efficient
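The consistency test on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not IM's actual procedure: the encoding of comparison atoms as open intervals, the `DISJOINT` table, and the function names are assumptions made for the example. The only disjointness fact used is the one the slide relies on (EurCar vs. JCar), and v5 is left out because its status depends on what is known about Toyota.

```python
import math

INF = math.inf

# Hypothetical encoding: (class atoms, {variable: (low, high)}), where the
# open interval (low, high) summarises the comparison atoms on that variable.
VIEWS = {
    "v1": ({"cFSale", "uCar"}, {"y": (1990, INF)}),
    "v2": ({"cFSale", "EurCar"}, {}),
    "v3": ({"cFSale", "uCar"}, {"p": (25000, INF)}),
    "v4": ({"cFSale", "uCar"}, {"y": (-INF, 1980)}),
}
QUERY = ({"cFSale", "JCar"}, {"y": (1992, INF), "p": (-INF, 12000)})

# Assumed disjointness axioms (only the pair used on the slide).
DISJOINT = {frozenset({"EurCar", "JCar"})}


def classes_consistent(cs1, cs2):
    """False if a class in cs1 is declared disjoint from a class in cs2."""
    return not any(frozenset({a, b}) in DISJOINT for a in cs1 for b in cs2)


def bounds_consistent(b1, b2):
    """False if the open intervals of some shared variable do not overlap."""
    for var in b1.keys() & b2.keys():
        low = max(b1[var][0], b2[var][0])
        high = min(b1[var][1], b2[var][1])
        if low >= high:
            return False
    return True


def consistent_views(views, query):
    """Names of the views that survive the consistency filter, sorted."""
    q_classes, q_bounds = query
    return sorted(
        name
        for name, (classes, bounds) in views.items()
        if classes_consistent(classes, q_classes)
        and bounds_consistent(bounds, q_bounds)
    )


print(consistent_views(VIEWS, QUERY))  # ['v1']
```

With a full class hierarchy, `DISJOINT` would be derived from the hierarchy rather than listed by hand.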
6 The true story (a side trip)
- IM uses a (PTIME) Description Logic for source description
- A DL is a formalism that describes classes and binary relationships intensionally
- For example, a class can be given by a name (e.g. JCar) or by an expression that describes its properties
  - cheapJCar :- uCar and JCar and price < 9000
- A DL also contains containment and disjointness axioms for class expressions (containment is called subsumption in DL jargon)
- To be useful, a DL needs to support containment and disjointness queries on classes and membership queries on individuals; this is an inference problem
7
- Many DLs are known
- Complexity (for subsumption) ranges from polynomial (rare), to NP-complete, to EXPTIME-complete, to undecidable
- Recent interest focuses on using DLs for the Semantic Web
- The W3C OWL standard is essentially a DL (this use is essentially the same as in IM)
- That is it on DLs
8 Views with restricted access patterns
- Many sources do not support full SQL
- They are legacy systems, e.g.
  - finger on UNIX accepts email, returns other attributes
  - A bibliography source requires author, or title, or ..., but does not accept a year as input
- They do not want to disclose all their data, e.g.,
  - a carSale source will not present all the cars it has for sale
  - An airline requires "from" and "destination" as input for flight info
- The questions
  - How do we describe such sources?
  - What are good rewritings and how do we find them?
9
- Restricted sources can be described by binding patterns
- Two equivalent styles (there are more sophisticated schemes)
- Example: assume global relations
  - email(F, L, E), office(F, L, O), phone(O, P)
  - (F-first, L-last, E-email, O-office, P-phone)
- The views are finger, userId, described as follows
- Adding $ to attributes that can be given as input
  - finger(F, L, $E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
  - userId($O, E) :- office(F, L, O), email(F, L, E)
- Using b, f strings on predicates, where b means bound (i.e., in)
  - finger^ffbff(F, L, E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
  - userId^bf(O, E) :- office(F, L, O), email(F, L, E)
10
- Example, cont'd
  - Q: q^bf(O, F) :- office(F, L, O) (or q($O, F) :- office(F, L, O))
- Cannot be answered by using finger: it requires E as input
- Cannot be answered by using userId: it does not return F
- The following is a good rewriting
  - q(O, F) :- userId(O, E), finger(F, L, E, O, P)
- For two reasons
  - It is executable with respect to the sources: executing the body left-to-right respects the access restrictions
    - O for userId from the query, E for finger from userId
  - Its expansion is contained in the query (check!)
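The first of the two reasons, left-to-right executability, can be checked mechanically. The sketch below assumes a hypothetical encoding in which each view carries a b/f string (one letter per argument) and a rewriting body is a list of (view, arguments) pairs; the name `is_executable` is made up for the illustration.

```python
# Hypothetical encoding of the binding patterns from the example:
# 'b' = the argument must be supplied as input, 'f' = it is returned.
ADORNMENTS = {"finger": "ffbff", "userId": "bf"}


def is_executable(bound_head_vars, body, adornments=ADORNMENTS):
    """Check that running the body left to right respects the access patterns.

    bound_head_vars: variables supplied by the query (its 'b' positions).
    body: list of (view_name, argument_tuple), in execution order.
    """
    known = set(bound_head_vars)
    for view, args in body:
        needed = {a for a, mode in zip(args, adornments[view]) if mode == "b"}
        if not needed <= known:
            return False      # a required input is not yet available
        known |= set(args)    # after the call, all its arguments are bound
    return True


rewriting = [("userId", ("O", "E")), ("finger", ("F", "L", "E", "O", "P"))]
print(is_executable({"O"}, rewriting))                  # True
print(is_executable({"O"}, list(reversed(rewriting))))  # False: finger needs E first
```

Here O reaches userId from the query, and E reaches finger from userId, which is the reasoning given on the slide.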
11
- These two reasons are a characterization of a good rewriting
  - It is executable with respect to the sources: executing the body left-to-right respects the access restrictions
  - Its expansion is contained in the query (check!)
- Indeed
  - If it is not a contained rewriting, then being executable is no good
  - Being contained but not executable is also no good
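The "(check!)" for containment can be worked through with a brute-force search for a containment mapping from the query into the expansion of the rewriting, which is the standard test for conjunctive queries without comparisons. The sketch below assumes a hypothetical encoding of atoms as (relation, arguments) pairs; `expand` and `contained_in` are names invented for the illustration, and the exhaustive search is only practical for tiny examples like this one.

```python
from itertools import product

# View definitions over the global schema, taken from the example:
# view name -> (head variables, body atoms).
VIEW_DEFS = {
    "finger": (("F", "L", "E", "O", "P"),
               [("email", ("F", "L", "E")),
                ("office", ("F", "L", "O")),
                ("phone", ("O", "P"))]),
    "userId": (("O", "E"),
               [("office", ("F", "L", "O")),
                ("email", ("F", "L", "E"))]),
}


def expand(body):
    """Replace each view atom by its definition; non-head variables of the
    i-th atom get a fresh name by appending the suffix _i."""
    atoms = []
    for i, (view, args) in enumerate(body):
        head, definition = VIEW_DEFS[view]
        subst = dict(zip(head, args))
        for rel, rel_args in definition:
            atoms.append((rel, tuple(subst.get(v, f"{v}_{i}") for v in rel_args)))
    return atoms


def contained_in(head, atoms, q_head, q_body):
    """True if a containment mapping sends the query (q_head, q_body) into
    the conjunctive query (head, atoms), i.e. the latter is contained in it."""
    q_vars = sorted({v for _, args in q_body for v in args} | set(q_head))
    targets = sorted({v for _, args in atoms for v in args} | set(head))
    atom_set = set(atoms)
    for image in product(targets, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        if tuple(h[v] for v in q_head) != tuple(head):
            continue  # the mapping must send head to head
        if all((rel, tuple(h[v] for v in args)) in atom_set for rel, args in q_body):
            return True
    return False


rewriting = [("userId", ("O", "E")), ("finger", ("F", "L", "E", "O", "P"))]
expansion = expand(rewriting)
query_body = [("office", ("F", "L", "O"))]
print(contained_in(("O", "F"), expansion, ("O", "F"), query_body))  # True
```

The identity mapping on F, L, O works because the expansion of finger contains the atom office(F, L, O).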
12 The IM approach
- After a rewriting is found to be consistent and contained, it is checked for being executable: can the sub-goals in the body be ordered so that the input required for each is supplied from the query or the sub-goals to its left?
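One way to search for such an ordering is a greedy pass: repeatedly pick any sub-goal whose required inputs are already bound. The sketch below is an assumption-laden illustration rather than IM's published algorithm; the b/f encoding and the name `executable_order` are hypothetical. Greedy choice is enough here because executing a sub-goal only ever adds bound variables, so a sub-goal that is runnable now stays runnable later.

```python
# Hypothetical encoding: 'b' = input required, 'f' = output.
ADORNMENTS = {"finger": "ffbff", "userId": "bf"}


def executable_order(bound_head_vars, subgoals, adornments=ADORNMENTS):
    """Greedily order sub-goals so that each one's inputs are already bound.

    Returns an executable ordering as a list, or None if none exists.
    """
    known = set(bound_head_vars)
    remaining = list(subgoals)
    ordered = []
    while remaining:
        for goal in remaining:
            view, args = goal
            needed = {a for a, m in zip(args, adornments[view]) if m == "b"}
            if needed <= known:
                break             # this sub-goal can run next
        else:
            return None           # nothing can run next: not executable
        remaining.remove(goal)
        ordered.append(goal)
        known |= set(args)        # its outputs are now available
    return ordered


goals = [("finger", ("F", "L", "E", "O", "P")), ("userId", ("O", "E"))]
print(executable_order({"O"}, goals))
# [('userId', ('O', 'E')), ('finger', ('F', 'L', 'E', 'O', 'P'))]
print(executable_order(set(), goals))  # None
```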
13 A summary of IM
- Introduced (with other concurrent systems) the notion of LAV and query rewriting using views
- Also, detailed source descriptions using DLs
- An efficient algorithm for finding contained and executable rewritings
- Worked well, for about 100 sources
14 Here is a graph from the paper
15
- But
- The fact that a contained rewriting needs a number of views at most the number of atoms in the query has been proved only for CQs, without
  - comparisons
  - access restrictions
  - constraints on the global db
- Does it hold for these cases? (see the example on p. 10)
- For access-restricted sources, it has been proved that for equivalent rewritings one needs at most n + m views, where n is the number of atoms in the query and m is the number of different variables in it
- The proof does not hold for contained rewritings
16
- Even for pure CQs, is the bucket algorithm guaranteed to find all rewritings?
- The answers to all these questions are negative!
  - The bucket algorithm does not find all rewritings
  - For the more general cases, longer rewritings are needed; actually, there may be an infinite number of them, with no bound on length
- There is a need for another approach