CS240A: Databases and Knowledge Bases Recursion and Stratification - PowerPoint PPT Presentation

About This Presentation
Title:

CS240A: Databases and Knowledge Bases Recursion and Stratification

Description:

Recursion and Stratification Notes From Chapter 8 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari. – PowerPoint PPT presentation

Slides: 21
Provided by: Fushen6
Learn more at: http://web.cs.ucla.edu

Transcript and Presenter's Notes


1
CS240A: Databases and Knowledge Bases
Recursion and Stratification
Notes from Chapter 8 of Advanced Database
Systems by Zaniolo, Ceri, Faloutsos, Snodgrass,
Subrahmanian and Zicari. Morgan Kaufmann, 1997
  • Carlo Zaniolo
  • Department of Computer Science
  • University of California, Los Angeles
  • December, 2001.

2
Transitive Closure Queries
  • Transitive closure of the graph arc(X, Y) (right-linear rule):
  • path(X,Y) <- arc(X,Y).
  • path(X,Z) <- arc(X,Y), path(Y,Z).
  • Transitive closure of the graph arc(X, Y) (left-linear rule):
  • path(X,Y) <- arc(X,Y).
  • path(X,Z) <- path(X,Y), arc(Y,Z).
  • Transitive closure of the graph arc(X, Y) (non-linear rule):
  • path(X,Y) <- arc(X,Y).
  • path(X,Z) <- path(X,Y), path(Y,Z).
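
The right-linear formulation above can be evaluated bottom-up by applying the rules repeatedly until nothing new is derived. The following is a minimal Python sketch of that naive fixpoint, not the evaluation strategy of any particular Datalog system; the example graph is hypothetical.

```python
def transitive_closure(arcs):
    """Naive bottom-up evaluation of
         path(X,Y) <- arc(X,Y).
         path(X,Z) <- arc(X,Y), path(Y,Z).
    """
    path = set(arcs)  # exit rule: every arc is a path
    while True:
        # recursive rule: join arc(X,Y) with path(Y,Z)
        derived = {(x, z) for (x, y) in arcs for (y2, z) in path if y == y2}
        if derived <= path:  # nothing new: fixpoint reached
            return path
        path |= derived


# Hypothetical example graph: a -> b -> c -> d
arcs = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(arcs)
print(sorted(closure))
```

Because the set of possible pairs is finite and facts are only ever added, the loop terminates even on cyclic graphs.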

3
Relational Tables for a Bill of Materials (BoM) application

4
Subparts
  • All subparts: a transitive-closure query
  • all_subparts(Part, Sub) <- assembly(Part, Sub, _).
  • all_subparts(Part, Sub2) <- all_subparts(Part, Sub1),
    assembly(Sub1, Sub2, _).
  • For each part, basic or otherwise, find its basic
    subparts. A basic part is a subpart of itself:
  • basic_subparts(BasicP, BasicP) <- part_cost(BasicP, _, _, _).
  • basic_subparts(Prt, BasicP) <- assembly(Prt, SubP, _),
    basic_subparts(SubP, BasicP).
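
As an illustration of the basic_subparts rules, here is a small Python sketch that evaluates them bottom-up. The assembly and part_cost tuples are made-up sample data (the actual BoM tables are not part of this transcript), and the argument order follows the rules above.

```python
# Hypothetical sample data, shaped like assembly(Part, Subpart, Qty)
# and part_cost(BasicPart, Supplier, Cost, Time).
assembly = {
    ("bike", "frame", 1),
    ("bike", "wheel", 2),
    ("frame", "top_tube", 1),
    ("wheel", "rim", 1),
    ("wheel", "spoke", 36),
}
part_cost = {
    ("top_tube", "cinelli", 20.00, 14),
    ("rim", "mavic", 50.00, 3),
    ("spoke", "campagnolo", 0.60, 15),
}


def basic_subparts(assembly, part_cost):
    # Exit rule: basic_subparts(BasicP, BasicP) <- part_cost(BasicP, _, _, _).
    result = {(p, p) for (p, _sup, _cost, _time) in part_cost}
    changed = True
    while changed:
        # Recursive rule:
        # basic_subparts(Prt, BasicP) <- assembly(Prt, SubP, _),
        #                                basic_subparts(SubP, BasicP).
        new = {(prt, basic)
               for (prt, subp, _qty) in assembly
               for (s, basic) in result if s == subp}
        changed = not new <= result
        result |= new
    return result


subparts = basic_subparts(assembly, part_cost)
print(sorted(b for (p, b) in subparts if p == "bike"))
```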

5
Negation and Recursion
  • For each basic part, find the least time needed
    for delivery:
  • fastest(Part, Time) <- part_cost(Part, Sup, Cost, Time),
    ¬faster(Part, Time).
  • faster(Part, Time) <- part_cost(Part, Sup, Cost, Time),
    part_cost(Part, Sup1, Cost1, Time1), Time1 < Time.
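
A rough Python rendering of these two rules: faster is computed in full first, and only then is the negated goal ¬faster tested. The part_cost tuples below are invented for illustration.

```python
# Hypothetical part_cost(Part, Supplier, Cost, Time) tuples.
part_cost = {
    ("top_tube", "cinelli", 20.00, 14),
    ("top_tube", "columbus", 15.00, 6),
    ("rim", "mavic", 50.00, 3),
}


def fastest(part_cost):
    # faster(Part, Time): some other offer for the same part has a smaller time.
    faster = {(p, t)
              for (p, _s, _c, t) in part_cost
              for (p1, _s1, _c1, t1) in part_cost
              if p1 == p and t1 < t}
    # fastest(Part, Time): an offer whose time is not in faster.
    return {(p, t) for (p, _s, _c, t) in part_cost if (p, t) not in faster}


fastest_times = fastest(part_cost)
print(sorted(fastest_times))
```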

6
Negation and Recursion (cont.)
  • Times required for basic subparts of the given
    assembly:
  • timeForbasic(AssPart, BasicSub, Time) <-
    basic_subparts(AssPart, BasicSub),
    fastest(BasicSub, Time).
  • The maximum time required for basic subparts of
    the given assembly:
  • howsoon(AssPart, Time) <-
    timeForbasic(AssPart, _, Time),
    ¬larger(AssPart, Time).
  • larger(Part, Time) <- timeForbasic(Part, _, Time),
    timeForbasic(Part, _, Time1), Time1 > Time.
  • Note: to compute howsoon you must first compute
    larger completely.
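
The note above can be made concrete with a short Python sketch: larger is materialized in full before howsoon consults it through negation. The basic_subparts and fastest facts are assumed as given inputs here, with hypothetical values.

```python
# Hypothetical facts assumed to be already computed by the lower strata.
basic_subparts_facts = {
    ("bike", "top_tube"), ("bike", "rim"),
    ("wheel", "rim"),
    ("rim", "rim"), ("top_tube", "top_tube"),
}
fastest_facts = {("top_tube", 6), ("rim", 3)}


def howsoon(basic_subparts, fastest):
    # timeForbasic(AssPart, BasicSub, Time)
    time_for_basic = {(a, b, t)
                      for (a, b) in basic_subparts
                      for (b1, t) in fastest if b1 == b}
    # larger(Part, Time): Time is exceeded by another basic subpart's time.
    # This set must be complete before the negation below is evaluated.
    larger = {(p, t)
              for (p, _b, t) in time_for_basic
              for (p1, _b1, t1) in time_for_basic
              if p1 == p and t1 > t}
    # howsoon(AssPart, Time): a time that no other time exceeds.
    return {(p, t) for (p, _b, t) in time_for_basic if (p, t) not in larger}


result = howsoon(basic_subparts_facts, fastest_facts)
print(sorted(result))
```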

7
Predicate Dependency Graph
  • The Predicate Dependency Graph for a program P,
    pdg(P), is a graph having as nodes the names of the
    predicates in P. The graph contains an arc a → b
    if there exists a rule with goal name a and
    head name b. If the goal is negated, then the arc
    is marked as a negative arc.
  • The nodes and arcs of the strong components of
    pdg(P), respectively, identify the recursive
    predicates and recursive rules of P.
  • A program is said to be stratifiable when none of
    its negative arcs belongs to a strong component.
  • Programs that are stratifiable have a clear
    meaning; non-stratifiable programs are often
    ill-defined from a semantic viewpoint.
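
The stratifiability test can be sketched in Python as follows. The rule encoding (a head name plus a list of (goal, negated) pairs) is an assumption made for this example, and the check uses the fact that an arc a → b lies inside a strong component exactly when a is reachable from b.

```python
def dependency_arcs(rules):
    """Arcs (goal, head, negated) of the predicate dependency graph."""
    return {(goal, head, negated)
            for head, body in rules
            for goal, negated in body}


def reachable(arcs, start):
    """Nodes reachable from start by following one or more arcs."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for a, b, _neg in arcs:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_stratifiable(rules):
    arcs = dependency_arcs(rules)
    # A negative arc a -> b is inside a strong component iff b reaches a.
    return not any(neg and a in reachable(arcs, b) for a, b, neg in arcs)


# The howsoon program from the previous slides (True marks a negated goal).
howsoon_rules = [
    ("basic_subparts", [("part_cost", False)]),
    ("basic_subparts", [("assembly", False), ("basic_subparts", False)]),
    ("fastest", [("part_cost", False), ("faster", True)]),
    ("faster", [("part_cost", False)]),
    ("timeForbasic", [("basic_subparts", False), ("fastest", False)]),
    ("howsoon", [("timeForbasic", False), ("larger", True)]),
    ("larger", [("timeForbasic", False)]),
]
# Hypothetical non-stratifiable program: p <- ¬q.  q <- ¬p.
cyclic_rules = [("p", [("q", True)]), ("q", [("p", True)])]

print(is_stratifiable(howsoon_rules), is_stratifiable(cyclic_rules))
```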

8
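PDG for howsoon
(Figure: predicate dependency graph with nodes howsoon, larger, timeForbasic, fastest, basic_subparts, faster, assembly and part_cost. The arcs larger → howsoon and faster → fastest are negative.)

9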
Stratification
  • By sorting on pdg(P), the nodes of P can be
    partitioned into a finite set of n strata 1, ...
    , n, such that, for each rule r ∈ P, the
    predicate name of the head of r belongs to a
    stratum that
  • is ≥ each stratum containing some positive
    goal of r, and also
  • is strictly greater than each stratum containing
    some negated goal of r.
  • A stratification of a program will be called
    strict if every stratum contains either a single
    predicate or a set of predicates that are
    mutually recursive.
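
One way to obtain such a numbering is to start every predicate in stratum 1 and raise head strata until both conditions hold. The Python sketch below does this for the howsoon program; the rule encoding is an assumption of the example, and the stratification it yields is valid but not strict, since unrelated predicates may share a stratum.

```python
def stratify(rules):
    """Assign a stratum number to every predicate, or raise ValueError."""
    preds = ({head for head, _ in rules}
             | {goal for _, body in rules for goal, _ in body})
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for goal, negated in body:
                # head >= positive goal, head > negated goal
                need = stratum[goal] + 1 if negated else stratum[goal]
                if stratum[head] < need:
                    if need > len(preds):
                        # strata keep growing: negation inside a cycle
                        raise ValueError("program is not stratifiable")
                    stratum[head] = need
                    changed = True
    return stratum


howsoon_rules = [
    ("basic_subparts", [("part_cost", False)]),
    ("basic_subparts", [("assembly", False), ("basic_subparts", False)]),
    ("fastest", [("part_cost", False), ("faster", True)]),
    ("faster", [("part_cost", False)]),
    ("timeForbasic", [("basic_subparts", False), ("fastest", False)]),
    ("howsoon", [("timeForbasic", False), ("larger", True)]),
    ("larger", [("timeForbasic", False)]),
]
strata = stratify(howsoon_rules)
for name in sorted(strata, key=lambda p: (strata[p], p)):
    print(strata[name], name)
```

Evaluating the predicates in increasing stratum order guarantees that every negated goal refers to a relation that is already complete.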

10
One-at-a-Time Computations Needed for
Aggregates
  • Set aggregates, such as count or sum in SQL,
    require that the elements of a set be visited
    one at a time. (These aggregates also require
    arithmetic predicates, which we will consider
    later.)
  • Counting the elements in a set modulo an integer
    does not require arithmetic, but still requires
    that the elements of the set be visited
    one at a time.
  • The parity query: does the base relation br(X)
    contain an even number or an odd number of tuples?

11
The parity query: is the number of tuples in the
base relation br even or odd?
  • between(X, Z) <- br(X), br(Y), br(Z), X < Y,
    Y < Z.
  • next(X, Y) <- br(X), br(Y), X < Y,
    ¬between(X, Y).
  • next(nil, X) <- br(X), ¬smaller(X).
  • smaller(X) <- br(X), br(Y), Y < X.
  • even(nil).
  • even(Y) <- odd(X), next(X, Y).
  • odd(Y) <- even(X), next(X, Y).
  • br_is_even <- even(X), ¬next(X, Y).
  • next sorts the elements of br into an ascending
    chain, where the first link of the chain connects
    the distinguished node nil to the least element
    in br (third rule in the example). This works
    only if the elements of br are totally ordered.
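
The parity program can be mirrored almost rule for rule in Python. This is only a sketch: br is modelled as a set of mutually comparable values, and None stands in for the distinguished constant nil (both are choices made for this example).

```python
NIL = None  # stands in for the distinguished constant nil


def br_is_even(br):
    br = set(br)
    between = {(x, z) for x in br for y in br for z in br if x < y < z}
    smaller = {x for x in br for y in br if y < x}
    # next(X, Y): Y is the immediate successor of X in the ordering.
    nxt = {(x, y) for x in br for y in br if x < y and (x, y) not in between}
    # next(nil, X): X is the least element of br.
    nxt |= {(NIL, x) for x in br if x not in smaller}

    even, odd = {NIL}, set()  # even(nil).
    changed = True
    while changed:
        new_even = {y for (x, y) in nxt if x in odd}
        new_odd = {y for (x, y) in nxt if x in even}
        changed = not (new_even <= even and new_odd <= odd)
        even |= new_even
        odd |= new_odd

    # br_is_even <- even(X), ¬next(X, Y): the last chain element is even.
    has_successor = {x for (x, _y) in nxt}
    return any(x not in has_successor for x in even)


print(br_is_even({5, 9}), br_is_even({3, 1, 2}))
```

The chain walk visits the elements one at a time, which is exactly what the even and odd rules need in order to alternate.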

12
Expressive Power
  • Data complexity: query languages are viewed as
    mappings from the DB to the answer. The big O is
    evaluated in terms of the size of the database,
    which is always finite.
  • DB-PTIME: polynomial data complexity w.r.t. DB
    size.
  • 1.   Use Turing machines as the general model of
    computation and encode the database as a tape of
    length n.
  • 2.   Then any computable function on the database
    can be encoded as a Turing machine.
  • 3.   Some of these machines halt (complete their
    computation) in O(n) steps, others in an
    exponential number of steps, and others never
    terminate.
  • 4.   The set of machines that halt in a number of
    steps that is polynomial in n defines the class
    of DB-PTIME functions.
  • Number of tuples, data items, or bytes: what DB
    size are we talking about?

13
Polynomial Data Complexity
  • Are relational algebra expressions evaluable in
    DB-PTIME?
  • Yes, and in practice we use indices and query
    optimizers to keep exponents and coefficients
    small.
  • But these languages cannot express all DB-PTIME
    queries. For instance, they cannot express
    transitive closures or aggregates (thus the most
    frequently used aggregates were added to SQL in
    an ad hoc fashion).

14
The Expressive Power Hierarchy
  • Safe, stratified Datalog programs:
  • can still be computed in polynomial time, and
  • express every DB-PTIME query; thus
  • they are DB-PTIME complete.
  • But this is only true if we assume that there
    exists a total order in the database.
  • Desideratum: order-independence of queries
    (genericity), i.e., queries insensitive
    to constant renaming.
  • To express all DB-PTIME queries under genericity,
    a non-deterministic construct such as choice is
    needed (subject covered in the ADS book, but not
    in CS240A).
  • DB-PTIME completeness is well below Turing
    completeness: for that you need an infinite
    universe.

15
Functors and Complex Terms
  • Flat parts, their number, shape and weight,
    following the schema part(Part, Shape, Weight):
  • part(202, circle(11), actualkg(0.034)).
  • part(121, rectangle(10, 20), unitkg(2.1)).
  • part_weight(No, Kilos) <-
    part(No, _, actualkg(Kilos)).
  • part_weight(No, Kilos) <-
    part(No, Shape, unitkg(K)),
    area(Shape, Area), Kilos = K * Area.
  • area(circle(Dmtr), A) <- A = Dmtr * Dmtr * 3.14/4.
  • area(rectangle(Base, Height), A) <- A =
    Base * Height.
  • The complex terms circle(11), actualkg(0.034),
    rectangle(10, 20), and unitkg(2.1) are in logical
    parlance called functions (a functor followed by
    a list of arguments in parentheses).
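
To see how a functor acts as a tag on a sub-record, the rules above can be imitated in Python with tuples whose first component is the functor name. The tuple encoding is an assumption of this sketch; the facts and the constant 3.14 are taken from the slide.

```python
# Complex terms as tagged tuples: (functor, arg1, arg2, ...)
parts = [
    (202, ("circle", 11), ("actualkg", 0.034)),
    (121, ("rectangle", 10, 20), ("unitkg", 2.1)),
]


def area(shape):
    functor, *args = shape
    if functor == "circle":
        (dmtr,) = args
        return dmtr * dmtr * 3.14 / 4  # same constant as the slide
    if functor == "rectangle":
        base, height = args
        return base * height
    raise ValueError(f"unknown shape functor: {functor}")


def part_weight(part):
    no, shape, weight = part
    functor, value = weight
    if functor == "actualkg":  # the weight is given directly
        return no, value
    if functor == "unitkg":  # weight per unit area times area
        return no, value * area(shape)
    raise ValueError(f"unknown weight functor: {functor}")


for p in parts:
    print(part_weight(p))
```

Dispatching on the functor name plays the role that unification with circle(...) or unitkg(...) plays in the rules.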

16
Functors (cont.)
  • In actual applications, these complex terms do
    not represent functions to be evaluated; they are
    instead used as variable-length sub-records.
  • Thus, circle(11) and rectangle(10, 20),
    respectively, denote that the shape of our first
    part is a circle with diameter 11 cm, while the
    shape of the second part is a rectangle with base
    10 cm and height 20 cm. Any number of
    sub-arguments is allowed in such complex terms,
    recursively.
  • Objects of arbitrary complexity, including solid
    objects, can be nested and represented in this
    fashion. Functors are then used as case
    discriminants.

17
Lists
  • [ ] is the empty list.
  • [Head | Tail] represents a non-empty list.
  • Example: [mary, mike, seattle]
  • is a shorthand for [mary | [mike | [seattle | [ ]]]].
  • A list-based representation for the suppliers of
    top_tube:
    part_sup_list(top_tube, [cinelli, columbus, mavic]).
  • Lists are only syntactic sugar for a
    particular function symbol.
18
Normalizing a nested relation into a flat
relation
  • flatten(P, S, L) <- part_sup_list(P, [S | L]).
  • flatten(P, S, L) <- flatten(P, _, [S | L]).
  • ps(Part, Sup) <- flatten(Part, Sup, _).
  • This program applied to the previous fact yields:
  • ps(top_tube, cinelli). ps(top_tube, columbus).
    ps(top_tube, mavic).
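
The flattening step can be illustrated in Python by representing a list as nested (head, tail) pairs, which makes the "lists are one function symbol" view explicit. The helper cons_list and the pair encoding are assumptions introduced for this sketch.

```python
NIL = ()  # the empty list [ ]


def cons_list(items):
    """Build [a, b, c] as nested pairs: (a, (b, (c, NIL)))."""
    cell = NIL
    for item in reversed(items):
        cell = (item, cell)
    return cell


def flatten(part_sup_list):
    """Derive ps(Part, Sup) from part_sup_list(Part, List) facts."""
    ps = set()
    for part, cell in part_sup_list:
        while cell != NIL:
            sup, cell = cell  # match the list against [S | L]
            ps.add((part, sup))
    return ps


facts = [("top_tube", cons_list(["cinelli", "columbus", "mavic"]))]
ps = flatten(facts)
print(sorted(ps))
```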

19
How to Reconstruct the Nested Relation
  • between(P, X, Z) <- ps(P, X), ps(P, Y),
    ps(P, Z), X < Y, Y < Z.
  • smaller(P, X) <- ps(P, X), ps(P, Y), Y < X.
  • nested(P, [X]) <- ps(P, X), ¬smaller(P, X).
  • nested(P, [Y | [X | W]]) <- nested(P, [X | W]),
    ps(P, Y), X < Y, ¬between(P, X, Y).
  • ps_nested(P, W) <- nested(P, W),
    ¬nested(P, [X | W]).
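
A Python sketch of the reconstruction, following the rules above: lists grow at the head by the next larger supplier, so the final list comes out in descending order. Tuples are used in place of Datalog lists, an assumption made for this example.

```python
def ps_nested(ps):
    """Rebuild one supplier list per part from flat ps(Part, Sup) facts."""
    result = set()
    for part in {p for p, _s in ps}:
        sups = {s for p, s in ps if p == part}
        smaller = {x for x in sups for y in sups if y < x}
        between = {(x, z) for x in sups for y in sups for z in sups
                   if x < y < z}
        # nested(P, [X]): start from the least supplier.
        nested = {(x,) for x in sups if x not in smaller}
        changed = True
        while changed:
            # nested(P, [Y | [X | W]]): Y is the immediate successor of X.
            new = {(y,) + lst
                   for lst in nested for y in sups
                   if lst[0] < y and (lst[0], y) not in between}
            changed = not new <= nested
            nested |= new
        # ps_nested(P, W): W is not the tail of any longer nested list.
        tails = {lst[1:] for lst in nested}
        result |= {(part, lst) for lst in nested if lst not in tails}
    return result


flat = {("top_tube", "cinelli"), ("top_tube", "columbus"),
        ("top_tube", "mavic")}
print(ps_nested(flat))
```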

20
Conclusion
  • Recursion and stratified negation assure
    DB-PTIME completeness for Datalog
  • Practical systems such as LDL also support
    function symbols, arithmetic expressions, and
    many other constructs.