Title: CS240A: Databases and Knowledge Bases Recursion and Stratification
1CS240A: Databases and Knowledge Bases - Recursion and Stratification
Notes From Chapter 8 of Advanced Database
Systems by Zaniolo, Ceri, Faloutsos, Snodgrass,
Subrahmanian and Zicari. Morgan Kaufmann, 1997
- Carlo Zaniolo
- Department of Computer Science
- University of California, Los Angeles
- December, 2001.
2Transitive Closure Queries
- Transitive closure of the graph arc(X, Y) (right-linear rules)
- path(X, Y) <- arc(X, Y).
- path(X, Z) <- arc(X, Y), path(Y, Z).
- Transitive closure of the graph arc(X, Y) (left-linear rules)
- path(X, Y) <- arc(X, Y).
- path(X, Z) <- path(X, Y), arc(Y, Z).
- Transitive closure of the graph arc(X, Y) (non-linear rules)
- path(X, Y) <- arc(X, Y).
- path(X, Z) <- path(X, Y), path(Y, Z).
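The left-linear formulation above can be evaluated bottom-up. The following is a minimal Python sketch, assuming the graph is given as a set of (X, Y) pairs; the function name and the sample arcs are illustrative and not from the slides. It uses semi-naive evaluation, i.e. each round joins only the newly derived path facts with arc.

```python
def transitive_closure(arcs):
    """Bottom-up (semi-naive) evaluation of the left-linear path rules."""
    path = set(arcs)     # exit rule: path(X, Y) <- arc(X, Y)
    delta = set(arcs)    # path facts derived in the previous round
    while delta:
        # recursive rule: path(X, Z) <- path(X, Y), arc(Y, Z),
        # applied only to the path facts that are new since the last round
        derived = {(x, z) for (x, y) in delta for (y2, z) in arcs if y == y2}
        delta = derived - path
        path |= delta
    return path


# Hypothetical sample graph: a -> b -> c -> d
arc = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(arc)
print(sorted(closure))
```

The loop stops when a round derives nothing new, so it also terminates on cyclic graphs.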
3Relational Tables for a BoM application
4Subparts
- All subparts: a transitive-closure query
- all_subparts(Part, Sub) <- assembly(Part, Sub, _).
- all_subparts(Part, Sub2) <- all_subparts(Part, Sub1), assembly(Sub1, Sub2, _).
- For each part, basic or otherwise, find its basic subparts. A basic part is a subpart of itself:
- basic_subparts(BasicP, BasicP) <- part_cost(BasicP, _, _, _).
- basic_subparts(Prt, BasicP) <- assembly(Prt, SubP, _), basic_subparts(SubP, BasicP).
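To see the two basic_subparts rules at work, here is a small Python sketch. It assumes assembly holds (Part, Subpart, Qty) triples and part_cost holds (Part, Supplier, Cost, Time) tuples; the sample facts are made up for illustration and are not the tables from the slides.

```python
def basic_subparts(assembly, part_cost):
    """Naive bottom-up evaluation of the two basic_subparts rules."""
    # exit rule: basic_subparts(BasicP, BasicP) <- part_cost(BasicP, _, _, _)
    result = {(p, p) for (p, _sup, _cost, _time) in part_cost}
    while True:
        # recursive rule: basic_subparts(Prt, BasicP) <-
        #     assembly(Prt, SubP, _), basic_subparts(SubP, BasicP)
        new = {(prt, basic)
               for (prt, sub, _qty) in assembly
               for (s, basic) in result if s == sub} - result
        if not new:
            return result
        result |= new


# Hypothetical facts, for illustration only
assembly = {("bike", "frame", 1), ("bike", "wheel", 2),
            ("frame", "top_tube", 1),
            ("wheel", "rim", 1), ("wheel", "spoke", 36)}
part_cost = {("top_tube", "cinelli", 20.00, 14),
             ("rim", "mavic", 50.00, 3),
             ("spoke", "campagnolo", 0.60, 15)}

subparts = basic_subparts(assembly, part_cost)
print(sorted(b for (p, b) in subparts if p == "bike"))
```

Intermediate assemblies such as frame never appear as a second argument, because only parts listed in part_cost seed the recursion.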
5Negation and Recursion
- For each basic part, find the least time needed for delivery:
- fastest(Part, Time) <- part_cost(Part, Sup1, Cost, Time), ¬faster(Part, Time).
- faster(Part, Time) <- part_cost(Part, Sup2, Cost, Time), part_cost(Part, Sup1, Cost1, Time1), Time1 < Time.
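The use of negation to pick a minimum can be sketched in Python as follows. This is only an illustration, assuming part_cost is a set of (Part, Supplier, Cost, Time) tuples with made-up values: faster is computed in full first, and fastest then keeps the times that faster does not contain.

```python
def fastest(part_cost):
    """Least delivery time per part, computed via an auxiliary 'faster' relation."""
    # faster(Part, Time): some supplier delivers Part in strictly less than Time
    faster = {(p, t)
              for (p, _s, _c, t) in part_cost
              for (p1, _s1, _c1, t1) in part_cost
              if p == p1 and t1 < t}
    # fastest(Part, Time): a delivery time for which no faster one exists
    return {(p, t) for (p, _s, _c, t) in part_cost if (p, t) not in faster}


# Hypothetical facts, for illustration only
part_cost = {("top_tube", "cinelli", 20.00, 14),
             ("top_tube", "columbus", 15.00, 6),
             ("rim", "mavic", 50.00, 3)}

print(sorted(fastest(part_cost)))
```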
6Negation and Recursion (cont.)
- Times required for basic subparts of the given assembly:
- timeForbasic(AssPart, BasicSub, Time) <- basic_subparts(AssPart, BasicSub), fastest(BasicSub, Time).
- The maximum time required for basic subparts of the given assembly:
- howsoon(AssPart, Time) <- timeForbasic(AssPart, _, Time), ¬larger(AssPart, Time).
- larger(Part, Time) <- timeForbasic(Part, _, Time), timeForbasic(Part, _, Time1), Time1 > Time.
- Note: to compute howsoon, you must first compute larger completely.
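The note on evaluation order can be made concrete with a short Python sketch. It assumes basic_subparts and fastest have already been computed and are passed in as sets of pairs; the sample facts are hypothetical. The relation larger is materialized completely before the negated test in howsoon is applied.

```python
def howsoon(basic_subparts, fastest):
    """Evaluate timeForbasic, then larger, then howsoon, in that order."""
    # timeForbasic(AssPart, BasicSub, Time)
    time_for_basic = {(a, b, t)
                      for (a, b) in basic_subparts
                      for (b1, t) in fastest if b == b1}
    # larger(Part, Time): some basic subpart of Part needs more than Time
    larger = {(a, t)
              for (a, _b, t) in time_for_basic
              for (a1, _b1, t1) in time_for_basic
              if a == a1 and t1 > t}
    # howsoon(AssPart, Time): a time that no other subpart exceeds.
    # Testing "not in larger" is only sound because larger is complete here.
    return {(a, t) for (a, _b, t) in time_for_basic if (a, t) not in larger}


# Hypothetical facts, for illustration only
basic = {("bike", "top_tube"), ("bike", "rim"),
         ("top_tube", "top_tube"), ("rim", "rim")}
fast = {("top_tube", 6), ("rim", 3)}

print(sorted(howsoon(basic, fast)))
```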
7Predicate Dependency Graph
- The Predicate Dependency Graph for a program P, pdg(P), is a graph having as nodes the names of the predicates in P. The graph contains an arc a → b if there exists a rule with goal name a and head name b. If the goal is negated, then the arc is marked as a negative arc.
- The nodes and arcs of the strong components of pdg(P), respectively, identify the recursive predicates and recursive rules of P.
- A program is said to be stratifiable when none of its negative arcs belongs to a strong component.
- Programs that are stratifiable have a clear meaning; non-stratifiable programs are often ill-defined from a semantic viewpoint.
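The stratifiability test can be sketched in Python. The encoding is an assumption made for this example: each rule is a pair (head, body), where body lists (goal, negated) pairs, and only predicate names are kept. A negative arc a → b lies in a strong component exactly when a is reachable from b, which is what the check below tests. The win/move program is a standard textbook example of a non-stratifiable program and does not appear in these slides.

```python
def pdg(rules):
    """Arcs (goal, head, negated) of the predicate dependency graph."""
    return {(goal, head, neg) for head, body in rules for goal, neg in body}


def reachable(arcs, start):
    """Nodes reachable from start by following one or more arcs."""
    succ = {}
    for a, b, _neg in arcs:
        succ.setdefault(a, set()).add(b)
    seen, stack = set(), [start]
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_stratifiable(rules):
    arcs = pdg(rules)
    # a negative arc a -> b is inside a strong component iff b reaches a
    return not any(neg and a in reachable(arcs, b) for a, b, neg in arcs)


# The howsoon program from the earlier slides, reduced to predicate names
HOWSOON = [
    ("basic_subparts", [("part_cost", False)]),
    ("basic_subparts", [("assembly", False), ("basic_subparts", False)]),
    ("fastest", [("part_cost", False), ("faster", True)]),
    ("faster", [("part_cost", False)]),
    ("timeForbasic", [("basic_subparts", False), ("fastest", False)]),
    ("howsoon", [("timeForbasic", False), ("larger", True)]),
    ("larger", [("timeForbasic", False)]),
]
# win(X) <- move(X, Y), ¬win(Y): negation through recursion
WIN = [("win", [("move", False), ("win", True)])]

print(is_stratifiable(HOWSOON), is_stratifiable(WIN))
```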
8PCG for howsoon
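- [Figure: the predicate dependency graph for howsoon, with nodes howsoon, larger, timeForbasic, fastest, basic_subparts, faster, assembly, and part_cost. The arcs larger → howsoon and faster → fastest are negative.]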
9Stratification
- By sorting on pdg(P), the nodes of P can be partitioned into a finite set of n strata 1, ..., n, such that, for each rule r ∈ P, the predicate name of the head of r belongs to a stratum that
- is ≥ each stratum containing some positive goal, and also
- is strictly > each stratum containing some negated goal.
- A stratification of a program will be called strict if every stratum contains either a single predicate or a set of predicates that are mutually recursive.
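One common way to obtain such strata is to start every predicate at stratum 1 and raise heads until both conditions hold. The Python sketch below does this, assuming rules are encoded as (head, body) pairs with (goal, negated) entries; that encoding is chosen for the example only. The result satisfies the two conditions above but is not necessarily a strict stratification, since unrelated predicates can share a stratum.

```python
def stratify(rules):
    """Assign the lowest stratum numbers that satisfy both conditions."""
    preds = {h for h, _ in rules} | {g for _, body in rules for g, _ in body}
    stratum = dict.fromkeys(preds, 1)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for goal, negated in body:
                # head >= positive goal, head > negated goal
                required = stratum[goal] + (1 if negated else 0)
                if stratum[head] < required:
                    if required > len(preds):
                        # strata keep growing: a negative arc is in a cycle
                        raise ValueError("program is not stratifiable")
                    stratum[head] = required
                    changed = True
    return stratum


# The howsoon program from the earlier slides, reduced to predicate names
HOWSOON = [
    ("basic_subparts", [("part_cost", False)]),
    ("basic_subparts", [("assembly", False), ("basic_subparts", False)]),
    ("fastest", [("part_cost", False), ("faster", True)]),
    ("faster", [("part_cost", False)]),
    ("timeForbasic", [("basic_subparts", False), ("fastest", False)]),
    ("howsoon", [("timeForbasic", False), ("larger", True)]),
    ("larger", [("timeForbasic", False)]),
]

strata = stratify(HOWSOON)
for name in sorted(strata, key=lambda p: (strata[p], p)):
    print(strata[name], name)
```

Evaluating the strata in increasing order guarantees that larger is complete before howsoon consults its negation.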
10One-at-a-Time Computations Needed for Aggregates
- Set aggregates, such as count or sum in SQL, require that the elements of a set be visited one at a time. (These aggregates also require arithmetic predicates, which we will consider later.)
- Counting the elements in a set modulo an integer does not require arithmetic, but still requires that the elements of the set be visited one at a time.
- The parity query: how many tuples are in the base relation br(X), an even number or an odd number?
11The parity query: how many tuples in the base relation br?
- between(X, Z) <- br(X), br(Y), br(Z), X < Y, Y < Z.
- next(X, Y) <- br(X), br(Y), X < Y, ¬between(X, Y).
- next(nil, X) <- br(X), ¬smaller(X).
- smaller(X) <- br(X), br(Y), Y < X.
- even(nil).
- even(Y) <- odd(X), next(X, Y).
- odd(Y) <- even(X), next(X, Y).
- br_is_even <- even(X), ¬next(X, Y).
- next sorts the elements of br into an ascending chain, where the first link of the chain connects the distinguished node nil to the least element in br (third rule in the example). This works only if br is totally ordered.
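The parity program can be traced with a direct Python transcription. This is a sketch under two assumptions: br is a collection of mutually comparable values, and the distinguished node nil is modeled as None. No counting or arithmetic is used; the elements are only walked along the next chain.

```python
def br_is_even(br):
    """Decide parity of |br| by walking the 'next' chain, without counting."""
    br = set(br)
    between = {(x, z) for x in br for y in br for z in br if x < y < z}
    smaller = {x for x in br for y in br if y < x}
    # next(X, Y): Y is the immediate successor of X in the ordering
    nxt = {(x, y) for x in br for y in br if x < y and (x, y) not in between}
    # next(nil, X): nil (None here) points to the least element
    nxt |= {(None, x) for x in br if x not in smaller}

    even, odd = {None}, set()          # even(nil).
    changed = True
    while changed:
        new_even = {y for (x, y) in nxt if x in odd}
        new_odd = {y for (x, y) in nxt if x in even}
        changed = not (new_even <= even and new_odd <= odd)
        even |= new_even
        odd |= new_odd

    # br_is_even <- even(X), ¬next(X, Y): the last node of the chain is even
    has_next = {x for (x, _y) in nxt}
    return any(x not in has_next for x in even)


print(br_is_even([10, 20]), br_is_even([10, 20, 30]))
```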
12Expressive Power
- Data complexity: query languages are viewed as mappings from the DB to the answer. The big O is evaluated in terms of the size of the database, which is always finite.
- DB-PTIME: polynomial data complexity w.r.t. DB size
- 1. Use Turing machines as the general model of computation and encode the database as a tape of length n.
- 2. Then any computable function on the database can be encoded as a Turing machine.
- 3. Some of these machines halt (complete their computation) in O(n) steps, others in an exponential number of steps; others never terminate.
- 4. The set of machines that halt in a number of steps that is polynomial in n defines the class of DB-PTIME functions.
- Number of tuples, data items, bytes: what DB size are we talking about?
13Polynomial Data Complexity
- Are relational algebra expressions evaluable in DB-PTIME?
- Yes, and in practice we use indices and query optimizers to keep exponents and coefficients small.
- But these languages cannot express all DB-PTIME queries. For instance, they cannot express transitive closures or aggregates (thus the most frequently used aggregates were added to SQL in ad hoc fashion).
14The Expressive Power Hierarchy
- Safe, stratified Datalog programs
- can still be computed in polynomial time, and
- express every DB-PTIME query; thus
- they are DB-PTIME complete.
- But this is only true if we assume that there exists a total order in the database.
- Desideratum: order-independence of queries (genericity), i.e., queries insensitive to constant renaming.
- To express all DB-PTIME queries under genericity, a non-deterministic construct such as choice is needed (subject covered in the ADS book, but not in CS240A).
- DB-PTIME completeness is well below Turing completeness; for that you need an infinite universe.
15Functors and Complex Terms
- Flat parts, their number, shape and weight, following the schema part(Part, Shape, Weight):
- part(202, circle(11), actualkg(0.034)).
- part(121, rectangle(10, 20), unitkg(2.1)).
- part_weight(No, Kilos) <- part(No, _, actualkg(Kilos)).
- part_weight(No, Kilos) <- part(No, Shape, unitkg(K)), area(Shape, Area), Kilos = K * Area.
- area(circle(Dmtr), A) <- A = Dmtr * Dmtr * 3.14/4.
- area(rectangle(Base, Height), A) <- A = Base * Height.
- The complex terms circle(11), actualkg(0.034), rectangle(10, 20), and unitkg(2.1) are in logical parlance called functions (a functor followed by a list of arguments in parentheses).
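A rough Python analogue of these rules represents each complex term as a tuple whose first element is the functor name. That encoding is an assumption made for this sketch, not something prescribed by the slides. The functor then selects which rule applies, and the remaining fields are bound by unpacking.

```python
def area(shape):
    """area/2: dispatch on the functor of the shape term."""
    functor, *args = shape
    if functor == "circle":
        (dmtr,) = args
        return dmtr * dmtr * 3.14 / 4
    if functor == "rectangle":
        base, height = args
        return base * height
    raise ValueError(f"unknown shape functor: {functor}")


def part_weight(part):
    """part_weight/2: one branch per weight functor."""
    _no, shape, (functor, value) = part
    if functor == "actualkg":
        return value                   # the weight is stored directly
    if functor == "unitkg":
        return value * area(shape)     # weight per unit of area times area
    raise ValueError(f"unknown weight functor: {functor}")


# The two part facts from the slide, encoded as nested tuples
parts = [
    (202, ("circle", 11), ("actualkg", 0.034)),
    (121, ("rectangle", 10, 20), ("unitkg", 2.1)),
]

for p in parts:
    print(p[0], part_weight(p))
```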
16Functors (cont.)
- In actual applications, these complex terms do not represent functions to be evaluated; they are instead used as variable-length sub-records.
- Thus, circle(11) and rectangle(10, 20), respectively, denote that the shape of our first part is a circle with diameter 11 cm, while the shape of the second part is a rectangle with base 10 cm and height 20 cm. Any number of sub-arguments is allowed in such complex terms, recursively.
- Objects of arbitrary complexity, including solid objects, can be nested and represented in this fashion. Functors are then used as case discriminants.
17Lists
- [] is the empty list.
- [Head | Tail] represents a non-empty list.
- Example: [mary, mike, seattle]
- is a shorthand for [mary | [mike | [seattle | []]]].
- A list-based representation for suppliers of top_tube: part_sup_list(top_tube, [cinelli, columbus, mavic]).
- Lists are only syntactic sugaring for a particular function symbol.
18Normalizing a nested relation into a flat
relation
- flatten(P, S, L) <- part_sup_list(P, [S | L]).
- flatten(P, S, L) <- flatten(P, _, [S | L]).
- ps(Part, Sup) <- flatten(Part, Sup, _).
- This program, applied to the previous fact, yields:
- ps(top_tube, cinelli), ps(top_tube, columbus), ps(top_tube, mavic).
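The flatten rules peel one supplier off the head of the list at each step. A small Python sketch of the same idea follows, assuming part_sup_list is given as (Part, list) pairs; the function name is chosen for the example.

```python
def flatten(part_sup_list):
    """Turn nested (Part, [Sup, ...]) facts into flat ps(Part, Sup) pairs."""
    ps = set()
    for part, suppliers in part_sup_list:
        rest = list(suppliers)
        while rest:                  # stop when the empty list [] is reached
            sup, *rest = rest        # match the [S | L] pattern
            ps.add((part, sup))      # ps(Part, Sup) <- flatten(Part, Sup, _)
    return ps


part_sup_list = [("top_tube", ["cinelli", "columbus", "mavic"])]
print(sorted(flatten(part_sup_list)))
```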
19How to Reconstruct the Nested Relation
- between(P, X, Z) <- ps(P, X), ps(P, Y), ps(P, Z), X < Y, Y < Z.
- smaller(P, X) <- ps(P, X), ps(P, Y), Y < X.
- nested(P, [X]) <- ps(P, X), ¬smaller(P, X).
- nested(P, [Y | [X | W]]) <- nested(P, [X | W]), ps(P, Y), X < Y, ¬between(P, X, Y).
- ps_nested(P, W) <- nested(P, W), ¬nested(P, [X | W]).
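The reconstruction can be followed step by step in Python. The sketch below assumes ps is a set of (Part, Sup) pairs with comparable supplier names and returns a dictionary, which is a representation chosen for this example. It starts from the least supplier and repeatedly conses the immediate successor onto the head, so the resulting list is in descending order.

```python
def ps_nested(ps):
    """Rebuild one supplier list per part from flat ps(Part, Sup) pairs."""
    result = {}
    for part in {p for (p, _s) in ps}:
        sups = {s for (p, s) in ps if p == part}
        # nested(P, [X]) <- ps(P, X), ¬smaller(P, X): begin with the least
        current = [min(sups)]
        while True:
            x = current[0]
            # ps(P, Y), X < Y, ¬between(P, X, Y): Y is X's immediate successor
            larger = [y for y in sups if x < y]
            if not larger:
                break        # no nested(P, [X | W]) extends W: W is complete
            current = [min(larger)] + current
        result[part] = current
    return result


ps = {("top_tube", "cinelli"), ("top_tube", "columbus"), ("top_tube", "mavic")}
print(ps_nested(ps))
```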
20Conclusion
- Recursion and stratified negation assure DB-PTIME completeness for Datalog.
- Practical systems such as LDL also support function symbols, arithmetic expressions, and many other constructs.