Views as Incomplete Databases Certain - PowerPoint PPT Presentation

Slides: 20

Provided by: off9

1
Views as Incomplete Databases: Certain and Possible Answers

Views: an incomplete representation
Certain and possible answers
Complexity results for certain answers

2
Views: an incomplete representation

Given a view definition V and a view extension I:
Sound V: I is contained in V(D), i.e. I ⊆ V(D)
Complete V: I contains V(D), i.e. I ⊇ V(D)
Precise V: I = V(D)
V may also be mixed: some views are sound, others are complete
In general, more than one db D may exist s.t. I is a possible extension of V over D

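Example: teams in the World Cup soccer tournament
Global schema: Team(country, group) (group assignment for the 1st round)
Source1: S-C(C) -- the countries that participate
Source2: S-Q(C) -- the countries that participated in qualifying games
Source3: S-T(C) -- the teams whose games will be on TV
For all three, the logical mapping is
v(X) :- Team(X, Y)
The sources differ only in their s/c/p specification: S-C is precise, S-Q is complete (a superset of the participants), S-T is sound (a subset of the participants)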
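The three specifications can be checked mechanically once a candidate global database is fixed. The following is a minimal sketch, not part of the original slides: the contents of `team` and of the three extensions are made-up illustration data, and `view` and `satisfies` are hypothetical helper names.

```python
# Hypothetical data: a tiny instance of the global relation Team(country, group).
team = {("Brazil", "A"), ("Norway", "A"), ("Italy", "B")}

def view(db):
    """Evaluate v(X) :- Team(X, Y): project Team onto its first column."""
    return {country for (country, _group) in db}

def satisfies(spec, extension, db):
    """Check whether a view extension fits db under a sound/complete/precise spec."""
    answer = view(db)
    if spec == "sound":
        return extension <= answer      # I is contained in V(D)
    if spec == "complete":
        return extension >= answer      # I contains V(D)
    if spec == "precise":
        return extension == answer      # I equals V(D)
    raise ValueError(f"unknown spec: {spec}")

# Made-up extensions for the three sources.
s_c = {"Brazil", "Norway", "Italy"}   # all participants
s_q = s_c | {"Ireland"}               # plus a country that only played qualifiers
s_t = {"Brazil", "Italy"}             # only the televised teams
```

With this data, `s_c` passes the precise check, `s_q` passes the complete check but not the sound one, and `s_t` passes the sound check but not the complete one.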

4
Given V (including a specification in s/c/p) and I:
poss(V, I) = { D | D is a db for which I is a possible view extension }
Since we have only the views, this is the set of possible databases.
For sound views: an infinite set
For complete views: contains the empty db
For precise views: may be empty -- inconsistent views
Example:
v1(X, Y) :- R(X, Y, Z), v1 = {(a, b), (b, c)}
v2(X, Z) :- R(X, Y, Z), v2 = {(a, d), (c, e)}
The above changes when the global db is known to satisfy constraints (e.g. keys)
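The inconsistency in the example can be verified directly. The sketch below is an illustration only, with hypothetical helper names; it relies on the observation that any R with the given projections must lie inside the join of v1 and v2 on X, so testing that join decides the precise case. The filler value `"null0"` is an arbitrary assumption.

```python
v1 = {("a", "b"), ("b", "c")}   # v1(X, Y) :- R(X, Y, Z)
v2 = {("a", "d"), ("c", "e")}   # v2(X, Z) :- R(X, Y, Z)

def project_xy(r):
    return {(x, y) for (x, y, _z) in r}

def project_xz(r):
    return {(x, z) for (x, _y, z) in r}

def sound_witness(v1_ext, v2_ext, filler="null0"):
    """Build one D with v1_ext <= V1(D) and v2_ext <= V2(D); always possible."""
    return ({(x, y, filler) for (x, y) in v1_ext} |
            {(x, filler, z) for (x, z) in v2_ext})

def precise_witness(v1_ext, v2_ext):
    """Return a D with V1(D) == v1_ext and V2(D) == v2_ext, or None if none exists."""
    # Any such D is a subset of the join on X, so the join itself must work.
    r = {(x, y, z) for (x, y) in v1_ext for (x2, z) in v2_ext if x == x2}
    if project_xy(r) == v1_ext and project_xz(r) == v2_ext:
        return r
    return None
```

Here `precise_witness(v1, v2)` returns `None`: v1 mentions X = b and v2 mentions X = c, and neither value appears in the other view. `sound_witness(v1, v2)` still succeeds, because sound views only require the extensions to be contained in the view answers.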
5
Certain and possible answers

Now, assume also a query Q
cert(Q, V, I) = ∩ { Q(D) | D in poss(V, I) } -- seems easier to compute, always finite
poss(Q, V, I) = ∪ { Q(D) | D in poss(V, I) } -- may be infinite, and where do we obtain values not in I?
A possible approach: a finite representation of a possibly infinite family of partially unknown databases

6
We concentrate on certain answers -- an absolute notion of answering queries using views
cert(Q, V, I) depends on soundness/completeness of views
Example: global relation p(x, y)
v1(x) :- p(x, y), v2(y) :- p(x, y)
I = {v1(a), v2(b)}
Q: q(x, y) :- p(x, y)
Sound views: cert(Q, V, I) is empty
Precise views: cert(Q, V, I) is {(a, b)}
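The two outcomes can be reproduced by brute force over a small domain. This is only a sketch under a loud assumption: the domain is restricted to {a, b, c}, whereas poss(V, I) for sound views is really infinite. Restricting the domain can only enlarge the intersection, so an empty result here does carry over to the unrestricted case. All function names are hypothetical.

```python
from itertools import chain, combinations, product

DOMAIN = ["a", "b", "c"]                      # assumed finite domain
ALL_TUPLES = list(product(DOMAIN, repeat=2))  # candidate p(x, y) facts

def all_databases():
    """Every subset of ALL_TUPLES, i.e. every instance of p over DOMAIN."""
    return (set(c) for c in chain.from_iterable(
        combinations(ALL_TUPLES, k) for k in range(len(ALL_TUPLES) + 1)))

def views(db):
    """Evaluate v1(x) :- p(x, y) and v2(y) :- p(x, y)."""
    return {x for (x, _y) in db}, {y for (_x, y) in db}

def possible(db, i1, i2, spec):
    """Is db in poss(V, I) for the extensions i1 of v1 and i2 of v2?"""
    a1, a2 = views(db)
    if spec == "sound":
        return i1 <= a1 and i2 <= a2
    if spec == "precise":
        return i1 == a1 and i2 == a2
    raise ValueError(f"unknown spec: {spec}")

def certain(i1, i2, spec):
    """Intersect Q(D) over all possible D; Q(D) = D since q(x, y) :- p(x, y)."""
    dbs = [db for db in all_databases() if possible(db, i1, i2, spec)]
    return set.intersection(*dbs) if dbs else None

sound_answers = certain({"a"}, {"b"}, "sound")      # empty set
precise_answers = certain({"a"}, {"b"}, "precise")  # {("a", "b")}
```

Under sound views, both {p(a, b)} and {p(a, c), p(c, b)} are possible databases and they share no tuple, which is why nothing is certain. Under precise views, p(a, b) is the only fact compatible with both extensions.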
7
An issue in query processing: for the same example, let Q: s(x) :- p(x, y)
To allow relational algebra manipulation of certain answers, we need more than a simple relational representation!
We need algorithms for performing operations on representations of partially unknown dbs (not in this course)
8

From now on: sound views, certain answers
Was investigated for
views defined in L1, query defined in L2,
where
L1, L2 in {CQ, CQ≠, NR-Datalog, Datalog, FO}
Results include
Complexity: lower bounds
Algorithms: upper bounds

9
Complexity results for certain answers

Thm: for V in L1, Q in L2, the following are equivalent:
(a) computing cert(Q, V, I)
(b) deciding containment: is Q1 (in L1) contained in Q2 (in L2)?
(a) is decidable iff (b) is
When decidable,
combined complexity of (a) = query complexity of (b)
data complexity of (a) ≤ query complexity of (b)
Data complexity: function of db size
Query complexity: function of query size
Combined: both

10
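Proof (sketch), (a) reduces to (b): given t, how hard is it to decide if t is in cert(Q, V, I)?
Let I = { vi(tij) }; define Q' by: Q' contains the rules that define V, and one more large rule
q'(t) :- v1(t11), ..., v1(t1n1), ..., vk(tk1), ..., vk(tknk)
(t follows from the facts in I)
Claim: t is in cert(Q, V, I) iff Q' is contained in Q
Hence deciding if t is in cert(Q, V, I) is no harder than this containment
(Note: for L1 = CQ, need to massage Q' into CQ)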
11

(b) reduces to (a): how hard is it to check containment of Q1 in Q2?
Let p be a new predicate
Define V by the rules of Q1, and v(c) :- q1(X), p(X)
Let I = {v(c)}
Define Q by the rules of Q2, and q(c) :- q2(X), p(X)
Then (c) is in cert(Q, V, I) iff Q1 is contained in Q2
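The reduction can be tried out on a toy case. The sketch below is illustrative only: it assumes unary base relations r and s, a two-element domain, and queries given as Python functions, and it enumerates every database over that domain instead of running a real containment test. All names are hypothetical.

```python
from itertools import product

DOMAIN = [0, 1]            # assumed finite domain
RELS = ["r", "s", "p"]     # r, s are base relations; p is the new predicate

def all_databases():
    """All instances of the unary relations r, s, p over DOMAIN."""
    facts = [(rel, x) for rel in RELS for x in DOMAIN]
    for bits in product([False, True], repeat=len(facts)):
        db = {rel: set() for rel in RELS}
        for (rel, x), keep in zip(facts, bits):
            if keep:
                db[rel].add(x)
        yield db

def c_is_certain(q1, q2):
    """Is (c) in cert(Q, V, I) for V: v(c) :- q1(X), p(X), I = {v(c)}, sound views?"""
    for db in all_databases():
        in_poss = bool(q1(db) & db["p"])        # v(c) is derivable, so I ⊆ V(D)
        if in_poss and not (q2(db) & db["p"]):  # q(c) :- q2(X), p(X) fails on D
            return False
    return True

def contained(q1, q2):
    """Containment of q1 in q2, checked over the enumerated databases only."""
    return all(q1(db) <= q2(db) for db in all_databases())

def r_and_s(db):           # q(X) :- r(X), s(X)
    return db["r"] & db["s"]

def only_r(db):            # q(X) :- r(X)
    return db["r"]
```

For these two queries, `c_is_certain(r_and_s, only_r)` holds and `c_is_certain(only_r, r_and_s)` does not, matching `contained` in both directions. The database with r = {0}, p = {0}, s = {} is the counterexample in the second case.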

Consequences: computing certain answers (depends on L1, L2)
is undecidable for Datalog, FO
is decidable if one side ≤ Datalog, other side ≤ NR-Datalog
For decidable cases, the above gives combined complexity
We are more interested in data complexity here
Co-NP data complexity is bad: impractical to compute, no Datalog plan!
We will not prove co-NP complexity results
13

Claim: For Q in Datalog, V in CQ(≠), let V' be the same view def, with inequalities omitted
Then cert(Q, V, I) = cert(Q, V', I)
(Computing the certain answers from I using V w/o the inequalities gives the same results)
Proof:
(a) poss(V, I) is contained in poss(V', I), so cert(Q, V', I) is contained in cert(Q, V, I)
(b) If t is in cert(Q, V, I), then for each D in poss(V, I), t is in Q(D)
Take any D in poss(V', I):
If D is also in poss(V, I) -- fine
If D is not in poss(V, I), there exists a larger D' in poss(V, I) s.t. t in Q(D') implies t in Q(D)
Hence, t is in cert(Q, V', I)

Proof of last claim:
Some s is in I, but s is not in V(D), because of some inequality
Since s is in V'(D), the inequality involves an attribute in the view body
Can add some tuples to D to obtain D1, s.t. s is in V(D1)
Adding tuples for all such s gives D' that contains D, s.t. D' is in poss(V, I)
If t is in Q(D'), since Q has no inequalities, t is also in Q(D)

For CQ views, Datalog queries:
Query plan: a Datalog program P on V
exp(P): replace views by their definitions (using fresh names for existential variables)
P is maximally-contained in Q if, for all D:
exp(P)(D) is contained in Q(D)
exp(P')(D) is contained in exp(P)(D) for all other plans P' contained in Q
Such a plan is best among all plans
(This is a language-dependent notion: given a more expressive language, P may not be best any more)
But, if a plan delivers cert(Q, V, I), it is absolutely best
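The expansion step exp(P) can be sketched for a single conjunctive plan rule. This is a simplified illustration, not the original author's code: it assumes view heads contain only distinct variables, view bodies contain no constants, and fresh names of the form `_E1`, `_E2` do not clash with plan variables. The view definitions are the v1, v2 of the earlier example.

```python
from itertools import count

# View definitions as (head variables, body atoms); variables are plain strings.
VIEWS = {
    "v1": (("X",), [("p", ("X", "Y"))]),   # v1(X) :- p(X, Y)
    "v2": (("Y",), [("p", ("X", "Y"))]),   # v2(Y) :- p(X, Y)
}

def expand(plan_body, views=VIEWS):
    """Replace each view atom by its definition, with fresh existential variables."""
    fresh = count(1)
    expanded = []
    for pred, args in plan_body:
        head_vars, body = views[pred]
        subst = dict(zip(head_vars, args))       # bind head variables to plan terms
        for body_pred, body_args in body:
            new_args = []
            for var in body_args:
                if var not in subst:             # existential variable: rename it
                    subst[var] = f"_E{next(fresh)}"
                new_args.append(subst[var])
            expanded.append((body_pred, tuple(new_args)))
    return expanded

# Plan rule body for q(A, B) :- v1(A), v2(B)
plan = [("v1", ("A",)), ("v2", ("B",))]
expansion = expand(plan)
```

The expansion is p(A, _E1), p(_E2, B). It does not force A and B into the same p fact, so this plan is not contained in q(x, y) :- p(x, y), in line with the empty set of certain answers for sound views in the earlier example.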

Thm: For CQ sound views, Datalog queries, the inverse rules algorithm computes cert(Q, V, I)
(Thus, for this case, a Datalog query plan can give the absolute best possible answer)
Corollary: If P is max-cont(Q) then, for all view instances I, P(I) = cert(Q, V, I)
We proceed to prove the theorem

Def: A tableau is a collection of atoms, with constants and variables
A tableau T represents a db D if there is a valuation h from T into D
Rep(T) = { D | for some h, D contains h(T) }
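Membership in Rep(T) amounts to a search for a valuation. The following is a minimal brute-force sketch with hypothetical names (`Var`, `represents`); atoms are encoded as `(relation, args)` pairs, and candidate values are taken from the database itself, which suffices because h(T) must land inside D.

```python
from itertools import product

class Var:
    """A tableau variable; anything that is not a Var is treated as a constant."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"?{self.name}"

def represents(tableau, db):
    """True if some valuation h maps every atom of the tableau to a fact of db."""
    variables = []
    for _rel, args in tableau:
        for a in args:
            if isinstance(a, Var) and a not in variables:
                variables.append(a)
    values = list({v for _rel, args in db for v in args})
    # Try every assignment of database values to the tableau variables.
    for choice in product(values, repeat=len(variables)):
        h = dict(zip(variables, choice))
        image = {(rel, tuple(h.get(a, a) for a in args)) for rel, args in tableau}
        if image <= db:                      # D contains h(T)
            return True
    return False

x, y = Var("x"), Var("y")
T = [("p", ("a", y)), ("p", (x, "b"))]       # tableau {p(a, ?y), p(?x, b)}
```

Both {p(a, b)} and {p(a, c), p(c, b)} are in Rep(T) for this tableau, while {p(c, c)} is not, because the constants a and b must be preserved by every valuation.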

Claim: For a Datalog query Q, tableau T:
cert(Q, rep(T)) = the tuples w/o variables in Q(T)
Proof:
(a) Can consider only D in rep(T) s.t. D = h(T):
every tuple in Q(D') but not in Q(D), where D' is larger than D = h(T), is not in cert(Q, rep(T))
(b) For such D, h(Q(T)) ⊆ Q(D)
⇒ a ground tuple in Q(T) is in cert(Q, rep(T))
(c) For a non-ground tuple t in Q(T), can find D1, D2 in rep(T) that give different values to the variables in t
⇒ no instance of this tuple is in cert(Q, rep(T))

The inverse rules of V create from a view instance I a database with elements that are Skolem terms.
Consider each Skolem term to be a distinct variable
This is a tableau T(V, I)
Claim: T(V, I) represents poss(V, I), i.e. rep(T(V, I)) = poss(V, I)
Proof: easy
Corollary: the set of tuples w/o variables in Q(T(V, I)) is cert(Q, V, I)
This is precisely what the inverse rules algorithm produces
For each I, the inverse rules produce T(V, I), then apply Q
End of story
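For the running example (v1(x) :- p(x, y), v2(y) :- p(x, y), I = {v1(a), v2(b)}), the procedure can be sketched as follows. This is a hand-written special case, not a general Datalog engine: the two inverse rules and the two queries are hard-coded, and the Skolem function names f1 and f2 are assumptions made for the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skolem:
    """A Skolem term such as f1(a); it stands for an unknown value."""
    fn: str
    args: tuple

def inverse_rules(v1_ext, v2_ext):
    """Apply p(x, f1(x)) :- v1(x) and p(f2(y), y) :- v2(y) to a view instance."""
    facts = {(x, Skolem("f1", (x,))) for x in v1_ext}
    facts |= {(Skolem("f2", (y,)), y) for y in v2_ext}
    return facts

def ground(rows):
    """Keep only the tuples that contain no Skolem term."""
    return {row for row in rows if not any(isinstance(v, Skolem) for v in row)}

def q(p):
    """q(x, y) :- p(x, y)"""
    return set(p)

def s(p):
    """s(x) :- p(x, y)"""
    return {(x,) for (x, _y) in p}

T = inverse_rules({"a"}, {"b"})   # the tableau {p(a, f1(a)), p(f2(b), b)}
certain_q = ground(q(T))          # every q tuple carries a Skolem term
certain_s = ground(s(T))          # only (a) is free of Skolem terms
```

For q, both derived tuples contain a Skolem term, so no answer is certain. For s, the tuple (a) is ground and therefore certain, while (f2(b)) is discarded.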
Next: one more (last) algorithm, for CQ queries and views, that is the fastest so far