Local as View: Some refinements


Provided by: off9

1
Local as View: Some refinements

IM: Filtering irrelevant sources
Views with restricted access patterns
A summary of IM

2
IM: Filtering irrelevant sources

When there are many sources, it is important to weed out those that are irrelevant to a query
Comparison constraints can help (e.g., qu > w98)
What more can be done?
The IM system suggests introducing classes, with a class hierarchy, into source descriptions

Example
Additionally, the global schema contains a relation
details(car, year, mileage, price, sellerContact)
with attributes abbreviated as c, y, mi, p, s
(we will also abbreviate class names)

Class hierarchy (figure; "--" marks disjoint classes)
car
  usedCar, newCar
  carForSale
  AmericanCar, EuropeanCar, JapaneseCar
    GermanCar, ItalianCar, FrenchCar
4

The views
v1(c, y, mi, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y > 1990
v2(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), EurCar(c)
v3(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), p > 25000 // luxury cars
v4(c, y, p) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y < 1980 // vintage cars
v5(c, y, p, s) :- details(c, mc, y, p, s), cFSale(c), nCar(c), cToyota
Assume a query
Q: q(c, mc, y, p, s) :- details(c, y, mi, p, s), cFSale(c), JCar(c), y > 1992, p < 12000
Some candidate rewritings will be rejected, since they are inconsistent with Q

When a view is checked for consistency with Q:
v4 will be discarded: y < 1980, y > 1992 is inconsistent
v3 will be discarded: p > 25000, p < 12000 is inconsistent
v2 will be discarded: EurCar(c), JCar(c) is inconsistent
v5: depends on what is known about the relationship between Toyota and the various car classes
Reasoning about disjointness of classes (given a hierarchy as above) is easy and efficient
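
The consistency filter can be sketched in a few lines of Python. This is only an illustration under assumptions, not IM's actual implementation: the parent links and the two disjointness groups below are one reading of the class figure, AmCar is a made-up abbreviation, comparisons are stored as strict (low, high) bounds per variable, and the view and the query are assumed to use the same variable names. v5 is left out because its status depends on what is known about Toyota.

```python
from dataclasses import dataclass, field

INF = float("inf")

# Assumed reading of the class figure: child -> parent.
PARENT = {
    "uCar": "car", "nCar": "car", "cFSale": "car",
    "AmCar": "car", "EurCar": "car", "JCar": "car",
    "GermanCar": "EurCar", "ItalianCar": "EurCar", "FrenchCar": "EurCar",
}
# Assumed disjointness axioms: classes in one group are pairwise disjoint.
DISJOINT_GROUPS = [{"uCar", "nCar"}, {"AmCar", "EurCar", "JCar"}]


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    seen = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        seen.add(cls)
    return seen


def disjoint(a, b):
    """Two classes are disjoint if they fall under two different members
    of the same disjointness group."""
    up_a, up_b = ancestors(a), ancestors(b)
    for group in DISJOINT_GROUPS:
        if any(x != y for x in up_a & group for y in up_b & group):
            return True
    return False


@dataclass
class Description:
    classes: set                                  # class atoms on the car variable c
    bounds: dict = field(default_factory=dict)    # var -> (low, high), both strict


def consistent(view, query):
    """False if the view can be shown to contribute nothing to the query."""
    if any(disjoint(a, b) for a in view.classes for b in query.classes):
        return False
    for var in view.bounds.keys() & query.bounds.keys():
        low = max(view.bounds[var][0], query.bounds[var][0])
        high = min(view.bounds[var][1], query.bounds[var][1])
        if low >= high:        # empty interval: the comparisons contradict
            return False
    return True


VIEWS = {
    "v1": Description({"cFSale", "uCar"}, {"y": (1990, INF)}),
    "v2": Description({"cFSale", "EurCar"}),
    "v3": Description({"cFSale", "uCar"}, {"p": (25000, INF)}),
    "v4": Description({"cFSale", "uCar"}, {"y": (-INF, 1980)}),
}
Q = Description({"cFSale", "JCar"}, {"y": (1992, INF), "p": (-INF, 12000)})

relevant = [name for name, view in VIEWS.items() if consistent(view, Q)]
print(relevant)
```

Under these assumptions only v1 survives, matching the slide: v2 fails on class disjointness, v3 and v4 fail on the comparisons.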

6
The true story (a side trip)
IM uses a (PTIME) Description Logic (DL) for source descriptions
A DL is a formalism that describes classes and binary relationships intensionally
For example, a class can be given by a name (e.g., JCar) or by an expression that describes its properties:
cheapJCar :- uCar and JCar and price < 9000
A DL also contains containment and disjointness axioms for class expressions (containment is called subsumption in DL jargon)
To be useful, a DL needs to support containment and disjointness queries on classes, and membership queries on individuals: this is an inference problem
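
To make "subsumption is an inference problem" concrete, here is a toy sketch for a deliberately tiny fragment: a class expression is a conjunction of atomic class names plus an optional upper bound on price. The ClassExpr type and subsumed_by function are hypothetical names chosen for this illustration, no hierarchy axioms are used, and this is far weaker than the DL that IM uses.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class ClassExpr:
    atoms: frozenset          # atomic class names, read as a conjunction
    max_price: float = INF    # optional constraint: price < max_price


def subsumed_by(c, d):
    """True if every instance of c must also be an instance of d.

    In this fragment that holds when c carries every conjunct of d
    and c's price bound is at least as tight as d's.
    """
    return d.atoms <= c.atoms and c.max_price <= d.max_price


cheap_jcar = ClassExpr(frozenset({"uCar", "JCar"}), max_price=9000)
jcar = ClassExpr(frozenset({"JCar"}))

print(subsumed_by(cheap_jcar, jcar))   # cheapJCar is contained in JCar
print(subsumed_by(jcar, cheap_jcar))   # the converse does not follow
```

The check is purely structural, which is why it is cheap here; richer constructors are what push subsumption into the harder complexity classes.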
7
Many DLs are known
Complexity (for subsumption) ranges from polynomial (rare), to NP-complete, to EXPTIME-complete, to undecidable
Recent interest focuses on using DLs for the Semantic Web
The W3C OWL standard is essentially a DL (this use is essentially the same as in IM)
That is it on DLs
8
Views with restricted access patterns

Many sources do not support full SQL
They are legacy systems, e.g.,
finger on UNIX accepts an email address and returns other attributes
A bibliography source requires author, or title, or ..., but does not accept a year as input
They do not want to disclose all their data, e.g.,
a carSale source will not present all the cars it has for sale
An airline requires from and destination as input for flight info
The questions
How do we describe such sources?
What are good rewritings, and how do we find them?

Restricted sources can be described by binding patterns
Two equivalent styles (there are more sophisticated schemes)
Example: assume global relations
email(F, L, E), office(F, L, O), phone(O, P)
(F-first, L-last, E-email, O-office, P-phone)
The views are finger and userId, described as follows
Marking the attributes that must be given as input:
finger(F, L, E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P) // input: E
userId(O, E) :- office(F, L, O), email(F, L, E) // input: O
Using b, f strings on predicates, where b means bound (i.e., input) and f means free:
finger^ffbff(F, L, E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
userId^bf(O, E) :- office(F, L, O), email(F, L, E)

Example, contd
Q: q^bf(O, F) :- office(F, L, O) (or q(O, F) :- office(F, L, O), with O given as input)
Cannot be answered by using finger alone: it requires E as input
Cannot be answered by using userId alone: it does not return F
The following is a good rewriting
q(O, F) :- userId(O, E), finger(F, L, E, O, P)
For two reasons:
It is executable with respect to the sources: executing the body left-to-right respects the access restrictions
(O for userId comes from the query, E for finger comes from userId)
Its expansion is contained in the query (check!)
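
The "(check!)" can be carried out mechanically. The sketch below, with helper names expand and contained_in invented for this illustration, unfolds the view atoms of the rewriting into global relations and then searches by brute force for a containment mapping from the query into the expansion. It assumes plain conjunctive queries with no comparisons, and atoms are encoded as (predicate, argument tuple) pairs.

```python
from itertools import product

# View definitions: name -> (head variables, body over global relations).
VIEWS = {
    "userId": (("O", "E"),
               [("office", ("F", "L", "O")), ("email", ("F", "L", "E"))]),
    "finger": (("F", "L", "E", "O", "P"),
               [("email", ("F", "L", "E")), ("office", ("F", "L", "O")),
                ("phone", ("O", "P"))]),
}


def expand(body):
    """Unfold each view atom; variables local to a view get a fresh suffix."""
    out = []
    for i, (name, args) in enumerate(body):
        head, definition = VIEWS[name]
        sub = dict(zip(head, args))
        for pred, pargs in definition:
            out.append((pred, tuple(sub.get(v, f"{v}_{i}") for v in pargs)))
    return out


def contained_in(r_head, r_expansion, q_head, q_body):
    """True if a containment mapping exists from the query to the expansion:
    it sends the query head to the rewriting head and every query atom
    to some atom of the expansion."""
    q_vars = sorted({v for _, args in q_body for v in args} | set(q_head))
    terms = sorted({t for _, args in r_expansion for t in args})
    facts = set(r_expansion)
    for image in product(terms, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        if tuple(h[v] for v in q_head) != tuple(r_head):
            continue
        if all((p, tuple(h[v] for v in args)) in facts for p, args in q_body):
            return True
    return False


# Rewriting: q(O, F) :- userId(O, E), finger(F, L, E, O, P)
rewriting_head = ("O", "F")
rewriting_body = [("userId", ("O", "E")), ("finger", ("F", "L", "E", "O", "P"))]
expansion = expand(rewriting_body)

# Query: q(O, F) :- office(F, L, O)
ok = contained_in(rewriting_head, expansion, ("O", "F"), [("office", ("F", "L", "O"))])
print(expansion)
print(ok)
```

The expansion contains office(F, L, O) itself, so the identity mapping on F, L, O is a containment mapping and the check succeeds.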

These two reasons are a characterization of a good rewriting:
It is executable with respect to the sources: executing the body left-to-right respects the access restrictions
Its expansion is contained in the query (check!)
Indeed
If it is not a contained rewriting, then being executable is no good
Being contained but not executable is also no good

12
The IM approach
After a rewriting is found to be consistent and contained, it is checked for being executable: can the sub-goals in the body be ordered so that the input required for each is supplied from the query or from the sub-goals to its left?
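
One simple way to answer this ordering question is a greedy pass, sketched below; it is an illustration, not necessarily the procedure IM itself uses, and executable_order is a hypothetical name. Each sub-goal carries a b/f adornment string. A sub-goal is picked as soon as all of its b positions are bound, and once picked it binds all of its variables. Because the set of bound variables only grows, greedy choice never blocks a later sub-goal.

```python
def executable_order(subgoals, query_inputs):
    """Return an executable ordering of sub-goal names, or None if none exists.

    subgoals: list of (name, args, adornment), adornment a string over 'b'/'f'
    query_inputs: variables whose values are supplied with the query
    """
    bound = set(query_inputs)
    remaining = list(subgoals)
    order = []
    while remaining:
        for goal in remaining:
            name, args, adornment = goal
            needed = [a for a, mode in zip(args, adornment) if mode == "b"]
            if all(a in bound for a in needed):
                order.append(name)
                bound.update(args)      # the source returns all its attributes
                remaining.remove(goal)
                break
        else:
            return None                 # every remaining sub-goal lacks an input
    return order


body = [
    ("finger", ("F", "L", "E", "O", "P"), "ffbff"),
    ("userId", ("O", "E"), "bf"),
]
print(executable_order(body, {"O"}))    # O is the input of the query q^bf(O, F)
print(executable_order(body, set()))    # with no input, nothing can start
```

With O supplied by the query, userId goes first and provides E for finger; with no input at all, no ordering exists.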
13
A summary of IM

Introduced (with other concurrent systems) the
notion of LAV and query rewriting using views
Also, detailed source descriptions using DLs
An efficient algorithm for finding contained and
executable rewritings
Worked well, for about 100 sources

14
Here is a graph from the paper
15

But
The fact that a contained rewriting needs a number of views at most the number of atoms in the query has been proved only for CQs without
comparisons,
access restrictions,
constraints on the global db
Does it hold for these cases? (see example in p. 10)
For access-restricted sources, it has been proved that for equivalent rewritings one needs at most n + m views, where n is the number of atoms in the query and m is the number of different variables in it
The proof does not hold for contained rewritings

Even for pure CQs, is the bucket algorithm guaranteed to find all rewritings?
The answers to all these questions are negative!
The bucket algorithm does not find all rewritings
For the more general cases, longer rewritings are needed; actually, there may be an infinite number of them, with no bound on length
There is a need for another approach
