Title: Views as Incomplete Databases: Certain and Possible Answers
1 Views as Incomplete Databases: Certain and Possible Answers
- Views: an incomplete representation
- Certain and possible answers
- Complexity results for certain answers
2 Views: an incomplete representation
- Given a view definition V and a view extension I:
- V is sound: I is contained in V(D)
- V is complete: I contains V(D)
- V is precise: I = V(D)
- V may also be mixed: some views are sound, others are complete
- In general, more than one db D may exist s.t. I is a possible view of D
3 Example: teams in the World Cup Soccer Tournament
- Global scheme: Team(country, group) (group assignment for 1st round)
- Source1: S-C(C) -- the countries that participate
- Source2: S-Q(C) -- countries that participated in qualifying games
- Source3: S-T(C) -- teams whose games will be on T.V.
- For all three, the logical mapping is
- v(X) :- Team(X, Y)
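The definitions of sound, complete and precise reduce to set comparisons between a source extension I and the view answer V(D). The sketch below is only an illustration: the Team instance, the three source extensions and the helper names `v` and `classify` are made up for this example and do not come from the slides.

```python
# Hypothetical global database: Team(country, group).
team = {("Brazil", "A"), ("Croatia", "A"), ("Spain", "B"), ("Chile", "B")}

def v(db):
    """The shared view definition v(X) :- Team(X, Y): project on country."""
    return {country for (country, _group) in db}

def classify(extension, db):
    """Return which of sound / complete / precise hold for I w.r.t. V(D)."""
    answer = v(db)
    labels = set()
    if extension <= answer:      # I is contained in V(D)
        labels.add("sound")
    if extension >= answer:      # I contains V(D)
        labels.add("complete")
    if extension == answer:      # I = V(D)
        labels.add("precise")
    return labels

# Made-up source extensions, chosen to match the descriptions of the sources.
s_c = {"Brazil", "Croatia", "Spain", "Chile"}   # all participants
s_q = s_c | {"Peru", "Wales"}                   # also teams that did not qualify
s_t = {"Brazil", "Spain"}                       # only the televised teams

print(classify(s_c, team))   # sound, complete and precise
print(classify(s_q, team))   # complete only
print(classify(s_t, team))   # sound only
```

With this data, the qualifying-games source is a superset of the participants and the T.V. source is a subset, which is why they come out as complete and sound respectively.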
4 Given V (including a specification of s/c/p) and I:
poss(V, I) = { D | D is a db for which I is a possible view }
Since we have only the views, this is the set of possible databases.
- For sound views: an infinite set
- For complete views: contains the empty db
- For precise views: may be empty -- inconsistent views
Example:
- v1(X, Y) :- R(X, Y, Z), with v1 = {(a, b), (b, c)}
- v2(X, Z) :- R(X, Y, Z), with v2 = {(a, d), (c, e)}
The above changes when the global db is known to satisfy constraints (e.g. keys)
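For the two projection views in the example, consistency under the precise assumption can be tested directly. Every tuple (x, y, z) of R must have (x, y) in v1 and (x, z) in v2, so any possible R is a subset of the join of the two extensions on X. Since the views are monotone, poss(V, I) is non-empty exactly when that join reproduces both extensions. The sketch below assumes this particular pair of view definitions; the function names are hypothetical.

```python
def precise_witness(v1, v2):
    """Largest candidate R for v1(X,Y) :- R(X,Y,Z) and v2(X,Z) :- R(X,Y,Z).

    Under precise views, R must be a subset of the join of v1 and v2 on X.
    """
    return {(x, y, z) for (x, y) in v1 for (x2, z) in v2 if x == x2}

def is_consistent_precise(v1, v2):
    """poss(V, I) is non-empty iff the join projects back onto v1 and v2."""
    r = precise_witness(v1, v2)
    back_v1 = {(x, y) for (x, y, _z) in r}
    back_v2 = {(x, z) for (x, _y, z) in r}
    return back_v1 == v1 and back_v2 == v2

v1 = {("a", "b"), ("b", "c")}
v2 = {("a", "d"), ("c", "e")}
print(precise_witness(v1, v2))        # {('a', 'b', 'd')}
print(is_consistent_precise(v1, v2))  # False
```

The only joinable pair is (a, b) with (a, d), so (b, c) and (c, e) can never be produced and the precise views are inconsistent.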
5 Certain and possible answers
- Now, assume also a query Q
- cert(Q, V, I) = the intersection of Q(D) over all D in poss(V, I)
- poss(Q, V, I) = the union of Q(D) over all D in poss(V, I)
- cert(Q, V, I) seems easier to compute, always finite
- poss(Q, V, I) may be infinite
- and where do we obtain values not in I?
- A possible approach: a finite representation of a possibly infinite family of partially unknown databases
6 We concentrate on certain answers -- an absolute notion of answering queries using views
cert(Q, V, I) depends on soundness/completeness of views
Example:
- global: p(x, y)
- v1(x) :- p(x, y), v2(y) :- p(x, y)
- I = { v1(a), v2(b) }
- Q: q(x, y) :- p(x, y)
- Sound views: cert(Q, V, I) is empty
- Precise views: cert(Q, V, I) is {(a, b)}
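The two outcomes of this example can be reproduced by brute force. The sketch below enumerates every instance of p over a small finite domain, keeps those compatible with I, and intersects the query answers. The finite domain {a, b, c} is an assumption made only so that the enumeration terminates (the real poss(V, I) ranges over all databases), and all function names are hypothetical.

```python
from itertools import combinations, product

DOMAIN = ("a", "b", "c")                      # assumed finite domain
ALL_TUPLES = list(product(DOMAIN, repeat=2))  # every possible p(x, y) fact

def databases():
    """Enumerate every instance of p over the finite domain."""
    for size in range(len(ALL_TUPLES) + 1):
        for subset in combinations(ALL_TUPLES, size):
            yield set(subset)

def views(db):
    """v1(x) :- p(x, y) and v2(y) :- p(x, y)."""
    return {x for (x, _y) in db}, {y for (_x, y) in db}

def possible(db, i1, i2, mode):
    """Is db in poss(V, I) when both views are sound (or both precise)?"""
    w1, w2 = views(db)
    if mode == "sound":
        return i1 <= w1 and i2 <= w2
    return i1 == w1 and i2 == w2              # mode == "precise"

def certain(i1, i2, mode):
    """Intersect Q(D) over all possible D, with Q: q(x, y) :- p(x, y)."""
    answers = None
    for db in databases():
        if possible(db, i1, i2, mode):
            # Q(D) is D itself for this query.
            answers = set(db) if answers is None else answers & db
    return answers

print(certain({"a"}, {"b"}, "sound"))     # set()
print(certain({"a"}, {"b"}, "precise"))   # {('a', 'b')}
```

Under sound views, both {(a, b)} and {(a, c), (c, b)} are possible, so no tuple survives the intersection. Under precise views, {(a, b)} is the only possible database.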
7 An issue in query processing
For the same example, let Q be s(x) :- p(x, y)
To allow relational algebra manipulation of certain answers, we need more than a simple relational representation!
We need algorithms for performing operations on representations of partially unknown dbs (not in this course)
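One way to see the issue: with sound views, the certain answer to s cannot be obtained by projecting the certain answer to q(x, y) :- p(x, y), because the latter is an empty relation. The sketch below checks this by brute force over an assumed finite domain {a, b, c}; the domain and the helper names are illustrative assumptions, not part of the slides.

```python
from itertools import chain, combinations, product

DOMAIN = ("a", "b", "c")                 # assumed finite domain
FACTS = list(product(DOMAIN, repeat=2))  # every possible p(x, y) fact

def sound_databases(i1, i2):
    """Instances of p over DOMAIN with i1 within v1(D) and i2 within v2(D)."""
    subsets = chain.from_iterable(
        combinations(FACTS, n) for n in range(len(FACTS) + 1)
    )
    for facts in subsets:
        db = set(facts)
        if i1 <= {x for (x, _) in db} and i2 <= {y for (_, y) in db}:
            yield db

def certain(query, i1, i2):
    """Intersection of query(D) over the sound possible databases."""
    results = [query(db) for db in sound_databases(i1, i2)]
    return set.intersection(*results)

def q(db):
    """q(x, y) :- p(x, y)"""
    return set(db)

def s(db):
    """s(x) :- p(x, y)"""
    return {(x,) for (x, _) in db}

cert_q = certain(q, {"a"}, {"b"})
cert_s = certain(s, {"a"}, {"b"})
print(cert_q)                       # set()
print({(x,) for (x, _) in cert_q})  # set(): projecting cert(q) loses the answer
print(cert_s)                       # {('a',)}
```

The value a is certain for s because every possible database has some tuple p(a, y), but the y differs between databases, and a plain relation for cert(q) has no way to record "a with an unknown partner".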
8 From now on: sound views, certain answers
- Was investigated for
- views defined in L1, query defined in L2, where
- L1, L2 in CQ, CQ≠, NR-Datalog, Datalog, FO
- Results include
- Complexity: lower bounds
- Algorithms: upper bounds
9 Complexity results for certain answers
- Thm: for V in L1, Q in L2, the following are equivalent
- (a) computing cert(Q, V, I)
- (b) deciding containment: is Q1 (in L1) contained in Q2 (in L2)?
- (a) is decidable iff (b) is
- When decidable,
- combined complexity of (a) = query complexity of (b)
- data complexity of (a) is at most the query complexity of (b)
- Data complexity: function of db size
- Query complexity: function of query size
- Combined: both
10 Proof (sketch)
=> Given t, how hard is it to decide if t is in cert(Q, V, I)?
Let I = { vi(tij) }. Define Q' as follows: Q' contains the rules that define V, and one more large rule
q'(t) :- the conjunction of all facts vi(tij) in I
(t follows from the facts in I)
Claim: t is in cert(Q, V, I) iff Q' is contained in Q
Hence deciding if t is in cert(Q, V, I) is no harder than this containment
(Note: for L1 = CQ, need to massage Q' into a CQ)
11 How hard is it to check containment of Q1 in Q2?
- Let p be a new predicate
- Define V by: the rules of Q1, and v(c) :- q1(X), p(X)
- Let I = { v(c) }
- Define Q by: the rules of Q2, and q(c) :- q2(X), p(X)
- Then (c) is in cert(Q, V, I) iff Q1 is contained in Q2
12 Consequences: computing certain answers (depends on L1, L2)
- Is undecidable for Datalog, FO
- Is decidable if one side is at most Datalog and the other side is at most nr-Datalog
- For decidable cases, the above gives combined complexity
- We are interested more in data complexity here
- Co-NP data complexity is bad: impractical to compute, no datalog plan!
- We will not prove co-NP complexity results
13 Claim: For Q in Datalog, V in CQ(≠), let V' be the same view def, with inequalities omitted
- Then cert(Q, V', I) = cert(Q, V, I)
- (Computing the certain answers from I using V w/o the inequalities gives the same results)
- Proof:
- (a) Every D in poss(V, I) is also in poss(V', I), hence cert(Q, V', I) is contained in cert(Q, V, I)
- (b) If t is in cert(Q, V, I), we must show that for each D in poss(V', I), t is in Q(D)
- If D is also in poss(V, I) -- fine
- If D is not in poss(V, I), there exists a larger D' in poss(V, I) s.t. t in Q(D') implies t in Q(D)
- Hence, t is in cert(Q, V', I)
14 Proof of last claim
- D is not in poss(V, I): some s is in I, but s is not in V(D), because of some inequality
- Since s is in V'(D), the inequality involves an attribute in the view body
- Can add some tuples to D to obtain D1, s.t. s is in V(D1)
- Adding for all such s gives D' that contains D, s.t. D' is in poss(V, I)
- If t is in Q(D'), since Q has no inequalities, t is also in Q(D)
15 For CQ views, Datalog queries
- Query plan: a datalog program P on V
- exp(P): replace views by their definitions
- (using fresh names for existential variables)
- P is maximally-contained in Q:
- exp(P)(D) is contained in Q(D)
- exp(P')(D) is contained in exp(P)(D) for all other plans P' contained in Q
- Such a plan is best among all plans
- (This is a language-dependent notion: given a more expressive language, P may not be best any more)
- But, if a plan delivers cert(Q, V, I) it is absolutely best
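The expansion step exp(P) can be sketched for the simple case of a single conjunctive plan rule. The representation below (atoms as `(predicate, args)` pairs, views as a dictionary) and the names `VIEWS` and `expand` are assumptions chosen for illustration. The sketch also assumes that view heads list distinct variables and that view bodies contain only variables.

```python
from itertools import count

# A view definition: head variables plus body atoms, e.g. v1(X) :- p(X, Y).
VIEWS = {
    "v1": (("X",), [("p", ("X", "Y"))]),
    "v2": (("Y",), [("p", ("X", "Y"))]),
}

def expand(plan_body, views=VIEWS):
    """Replace each view atom in a conjunctive plan body by its definition."""
    fresh = count(1)
    expanded = []
    for pred, args in plan_body:
        if pred not in views:
            expanded.append((pred, args))          # not a view atom: keep as is
            continue
        head_vars, body = views[pred]
        binding = dict(zip(head_vars, args))       # head variable -> plan term
        for body_pred, body_args in body:
            new_args = []
            for term in body_args:
                if term not in binding:            # existential variable
                    binding[term] = f"_E{next(fresh)}"
                new_args.append(binding[term])
            expanded.append((body_pred, tuple(new_args)))
    return expanded

# Plan body of s(A) :- v1(A), v2(B)
print(expand([("v1", ("A",)), ("v2", ("B",))]))
# [('p', ('A', '_E1')), ('p', ('_E2', 'B'))]
```

Each use of a view gets its own fresh names, so the two p atoms in the expansion do not accidentally share an existential variable.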
16 Thm: For CQ sound views, Datalog queries,
- the inverse rules algorithm computes cert(Q, V, I)
- (Thus, for this case, a Datalog query plan can give the absolute best possible answer)
- Corollary: If P is max-cont(Q) then, for all view instances I, P(I) = cert(Q, V, I)
- We proceed to prove the theorem
17 Def: A tableau is a collection of atoms, with constants and variables
- A tableau T represents a db D if there is a valuation h from T into D
- Rep(T) = { D | for some h, D contains h(T) }
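Membership of a database in Rep(T) amounts to searching for a valuation h that maps every atom of T onto some fact of D. A minimal backtracking sketch follows; the `Var` class, the atom encoding and the function name `represents` are assumptions made for this illustration.

```python
class Var:
    """A tableau variable; anything that is not a Var is a constant."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"?{self.name}"

def represents(tableau, db, h=None):
    """True if some valuation h maps every atom of the tableau into db."""
    h = {} if h is None else h
    if not tableau:
        return True
    (pred, args), rest = tableau[0], tableau[1:]
    for fact_pred, fact_args in db:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        trial = dict(h)                 # extend a copy so failures leave h intact
        ok = True
        for term, value in zip(args, fact_args):
            if isinstance(term, Var):
                if trial.setdefault(term, value) != value:
                    ok = False          # variable already bound to another value
                    break
            elif term != value:
                ok = False              # constant mismatch
                break
        if ok and represents(rest, db, trial):
            return True
    return False

x, y = Var("x"), Var("y")
T = [("p", ("a", y)), ("p", (x, "b"))]
print(represents(T, {("p", ("a", "b"))}))                     # True
print(represents(T, {("p", ("a", "c")), ("p", ("d", "b"))}))  # True
print(represents(T, {("p", ("a", "c"))}))                     # False
```

Both of the first two databases are in Rep(T) even though they share no tuple, which is the sense in which a tableau stands for many partially unknown databases at once.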
18 Claim: For a Datalog query Q, tableau T,
- cert(Q, rep(T)) = the tuples w/o variables in Q(T)
- Proof:
- Can consider only D in rep(T) s.t. D = h(T)
- every tuple that is in Q(D') but not in Q(D), where D' is larger than D = h(T), is not in cert(Q, rep(T))
- (b) For such D, h(Q(T)) is contained in Q(D)
- => a ground tuple in Q(T) is in cert(Q, rep(T))
- (c) For a non-ground tuple t in Q(T), can find D1, D2 in rep(T) that give different values to the variables in t
- => no instance of this tuple is in cert(Q, rep(T))
19 The inverse rules of V create from a view instance I a database with elements that are skolem functions.
- Consider each skolem term to be a distinct variable
- This is a tableau T(V, I)
- Claim: T(V, I) represents poss(V, I)
- Proof: easy
- Corollary: the set of tuples w/o variables in Q(T(V, I)) is cert(Q, V, I)
- This is precisely what the inverse rule algorithm produces
- For each I, the inverse rules produce T(V, I), then apply Q
- End of story
- Next: one more (last) algorithm, for CQ queries and views, that is fastest so far
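The construction can be sketched on the earlier example with views v1(x) :- p(x, y) and v2(y) :- p(x, y) and I = { v1(a), v2(b) }. This is a simplified illustration, not the full algorithm: it handles a single conjunctive query rather than a Datalog program, assumes that all arguments in view and query bodies are variables, and encodes a Skolem term as a Python tuple. All names are hypothetical.

```python
VIEWS = {  # view name -> (head variables, body atoms)
    "v1": (("x",), [("p", ("x", "y"))]),
    "v2": (("y",), [("p", ("x", "y"))]),
}

def inverse_rules(views, instance):
    """Build T(V, I): one fact per body atom per view tuple, with Skolem terms."""
    tableau = set()
    for view, tup in instance:
        head_vars, body = views[view]
        binding = dict(zip(head_vars, tup))
        for pred, args in body:
            # An existential variable z becomes the Skolem term ("f", view, z, tup).
            fact = tuple(binding.get(a, ("f", view, a, tup)) for a in args)
            tableau.add((pred, fact))
    return tableau

def evaluate(head, body, facts, binding=None):
    """Naive evaluation of a conjunctive query head :- body over facts."""
    binding = binding or {}
    if not body:
        yield tuple(binding[v] for v in head)
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, values in facts:
        if fact_pred != pred:
            continue
        trial = dict(binding)
        if all(trial.setdefault(a, val) == val for a, val in zip(args, values)):
            yield from evaluate(head, rest, facts, trial)

def is_skolem(term):
    """Skolem terms are tuples; constants are assumed to be plain strings."""
    return isinstance(term, tuple)

def certain(head, body, views, instance):
    """Evaluate Q on T(V, I) and keep only answers free of Skolem terms."""
    tableau = inverse_rules(views, instance)
    return {t for t in evaluate(head, body, tableau) if not any(map(is_skolem, t))}

I = [("v1", ("a",)), ("v2", ("b",))]
print(certain(("x", "y"), [("p", ("x", "y"))], VIEWS, I))  # set()
print(certain(("x",), [("p", ("x", "y"))], VIEWS, I))      # {('a',)}
```

Here T(V, I) holds the two facts p(a, f(a)) and p(g(b), b) for two distinct Skolem functions f and g. Every answer to q(x, y) contains a Skolem term and is dropped, while s(x) keeps the ground answer a.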