1
Deductive databases
  • Toon Calders
  • t.calders_at_tue.nl

2
Motivation Deductive DB
  • Motivation is two-fold
  • Add deductive capabilities to databases; the
    database contains
  • facts (extensional relations)
  • rules to generate derived facts (intensional
    relations)
  • Database is a knowledge base
  • Extend the querying
  • datalog allows for recursion

3
Motivation Deductive DB
  • Datalog as engine of deductive databases
  • similarities with Prolog
  • has facts and rules
  • rules define -possibly recursive- views
  • Semantics not always clear
  • safety
  • negation
  • recursion

4
Outline
  • Syntax of the Datalog language
  • Semantics of a Datalog program
  • Relational algebra = safe Datalog with negation
    and without recursion
  • Optimization techniques
  • Conclusions

5
Syntax of Datalog
  • Datalog query/program
  • facts: traditional relational tables
  • rules: define intensional views
  • Rules
  • if-then rules
  • can contain recursion
  • can contain negations
  • Semantics of program can be ambiguous

6
Syntax of Datalog
  • Example
  • father(X,Y) :- person(X,m), parent(X,Y).
  • grandson(X,Y) :- parent(Y,Z), parent(Z,X),
    person(X,m).
  • hbrothers(X,Y) :- person(X,m), person(Y,m),
    parent(Z,X), parent(Z,Y).

7
Syntax of Datalog
  • Variables: X, Y
  • Constants: m, f, rita, ...
  • Positive literal: p(t1, ..., tn)
  • p is the name of a relation (EDB or IDB)
  • t1, ..., tn are constants or variables
  • Negative literal: not p(t1, ..., tn)
  • In Datalog: correct negation (in contrast to
    Prolog's negation by failure)
  • Rule: h :- l1, ..., ln
  • h is a positive literal; l1, ..., ln are literals

8
Syntax of Datalog
  • Rule can be recursive
  • Arithmetic operations considered as special
    predicates
  • A < B: smaller(A,B)
  • A + B = C: plus(A,B,C)

9
Outline
  • Syntax of the Datalog language
  • Semantics of a Datalog program
  • non-recursive
  • recursive datalog
  • aggregation
  • Relational algebra = safe Datalog with negation
    and without recursion
  • Optimization techniques
  • Conclusions

10
Semantics of Non-Recursive Datalog Programs
  • Ground instantiation of a rule h :- l1, ..., ln:
    replace every variable in the rule by a constant
  • Example
  • father(X,Y) :- person(X,m), parent(X,Y).
  • instantiation
  • father(toon,an) :- person(toon,m),
    parent(toon,an).

11
Semantics of Non-Recursive Datalog Programs
  • Let I be a set of facts
  • The body of a rule instantiation R is satisfied
    by I if
  • every positive literal in the body of R is in I
  • no negative literal in the body of R is in I
  • Example
  • The body person(toon,m), parent(toon,an) is not
    satisfied by the facts given before

12
Semantics of Non-Recursive Datalog Programs
  • Let I be a set of facts
  • R is a rule h :- l1, ..., ln
  • Infer(R, I) = { h |
  • h :- l1, ..., ln is a ground instantiation of R
  • and l1, ..., ln is satisfied by I }
  • For a set of rules R = {R1, ..., Rn}:
    Infer(R, I) = Infer(R1, I) ∪ ... ∪ Infer(Rn, I)
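
The definition of Infer can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: atoms are encoded as tuples, variables are strings starting with an uppercase letter, and the facts and the domain below are hypothetical values chosen for the father rule.

```python
from itertools import product

def is_var(term):
    # Datalog convention: variables start with an uppercase letter.
    return term[0].isupper()

def ground(atom, binding):
    # Replace every variable of the atom by the constant bound to it.
    pred, *args = atom
    return (pred, *(binding.get(a, a) for a in args))

def infer_rule(head, positive, negative, facts, domain):
    """Heads of all ground instantiations whose body is satisfied by facts."""
    variables = sorted({a for atom in [head, *positive, *negative]
                        for a in atom[1:] if is_var(a)})
    derived = set()
    for values in product(sorted(domain), repeat=len(variables)):
        b = dict(zip(variables, values))
        pos_ok = all(ground(l, b) in facts for l in positive)
        neg_ok = not any(ground(l, b) in facts for l in negative)
        if pos_ok and neg_ok:
            derived.add(ground(head, b))
    return derived

def infer(rules, facts, domain):
    # Infer for a set of rules is the union over the individual rules.
    out = set()
    for head, positive, negative in rules:
        out |= infer_rule(head, positive, negative, facts, domain)
    return out

# Hypothetical facts; the rule is father(X,Y) :- person(X,m), parent(X,Y).
facts = {("person", "toon", "m"), ("person", "an", "f"), ("parent", "toon", "an")}
domain = {"toon", "an", "m", "f"}
father = (("father", "X", "Y"), [("person", "X", "m"), ("parent", "X", "Y")], [])
print(infer([father], facts, domain))
```

Enumerating every ground instantiation is exponential in the number of variables; it is used here only because it mirrors the definition literally.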

13
Semantics of Non-Recursive Datalog Programs
  • A rule h :- l1, ..., ln is in layer 1 if
  • l1, ..., ln only involve extensional predicates
  • A rule h :- l1, ..., ln is in layer i if
  • for all 0 < j < i, it is not in layer j
  • l1, ..., ln only involve predicates that are
    extensional or defined in the layers 1, ..., i-1

14
Semantics of Non-Recursive Datalog Programs
  • Let I0 be the facts in a datalog program
  • Let R1 be the rules at layer 1
  • ...
  • Let Rn be the rules at layer n
  • I1 = I0 ∪ Infer(R1, I0)
  • I2 = I1 ∪ Infer(R2, I1)
  • ...
  • In = In-1 ∪ Infer(Rn, In-1)
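
A small Python sketch of this layer-by-layer evaluation follows. The person and parent facts are hypothetical, and the join-based `infer` helper only handles positive body literals, which is enough for these two layers.

```python
def match(atom, fact, binding):
    """Extend binding so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    b = dict(binding)
    for a, v in zip(atom[1:], fact[1:]):
        if a[0].isupper():               # variable
            if b.setdefault(a, v) != v:
                return None
        elif a != v:                     # constant mismatch
            return None
    return b

def infer(rules, facts):
    """One application of every rule (positive bodies only)."""
    out = set()
    for head, body in rules:
        bindings = [{}]
        for atom in body:
            bindings = [b2 for b in bindings for f in facts
                        if (b2 := match(atom, f, b)) is not None]
        out |= {(head[0], *[b.get(a, a) for a in head[1:]]) for b in bindings}
    return out

# Hypothetical extensional facts I0.
I0 = {("person", p, s) for p, s in [("alex", "m"), ("jan", "m"), ("toon", "m"),
                                    ("an", "f"), ("bernd", "m"), ("mattijs", "m")]}
I0 |= {("parent", p, c) for p, c in [("alex", "toon"), ("jan", "an"),
                                     ("toon", "bernd"), ("toon", "mattijs"),
                                     ("an", "bernd"), ("an", "mattijs")]}

layer1 = [(("father", "X", "Y"), [("person", "X", "m"), ("parent", "X", "Y")])]
layer2 = [(("grandfather", "X", "Z"), [("father", "X", "Y"), ("parent", "Y", "Z")])]

I = set(I0)
for layer in (layer1, layer2):           # I_k = I_(k-1) ∪ Infer(R_k, I_(k-1))
    I |= infer(layer, I)

fathers = {f[1:] for f in I if f[0] == "father"}
grandfathers = {f[1:] for f in I if f[0] == "grandfather"}
print(sorted(fathers), sorted(grandfathers))
```

Because layer 2 only reads predicates that are complete after layer 1, each rule needs to be applied exactly once.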

15
Semantics of Non-Recursive Datalog Programs
  • Example

  • father(X,Y) :- person(X,m), parent(X,Y).
  • grandfather(X,Z) :- father(X,Y), parent(Y,Z).
  • hbrothers(X,Y) :- person(X,m), person(Y,m),
    parent(Z,X), parent(Z,Y).

16
Semantics of Non-Recursive Datalog Programs
  • Example

  • father(X,Y) :- person(X,m), parent(X,Y).
  • grandfather(X,Z) :- father(X,Y), parent(Y,Z).
  • hbrothers(X,Y) :- person(X,m), person(Y,m),
    parent(Z,X), parent(Z,Y).
  • [Figure: Stratum 0, the database tables person and parent]

17
Semantics of Non-Recursive Datalog Programs
  • Example
  • father(X,Y) :- person(X,m), parent(X,Y).
  • grandfather(X,Z) :- father(X,Y), parent(Y,Z).
  • hbrothers(X,Y) :- person(X,m), person(Y,m),
    parent(Z,X), parent(Z,Y).
  • [Figure: Stratum 0, the database tables person and parent]
  • Stratum 1
  • father: (alex,toon), (jan,an), (toon,bernd),
    (toon,mattijs)
  • hbrothers: (bernd,mattijs), (mattijs,bernd),
    (mattijs,mattijs), (bernd,bernd)

18
Semantics of Non-Recursive Datalog Programs
  • Example
  • father(X,Y) :- person(X,m), parent(X,Y).
  • grandfather(X,Z) :- father(X,Y), parent(Y,Z).
  • hbrothers(X,Y) :- person(X,m), person(Y,m),
    parent(Z,X), parent(Z,Y).
  • [Figure: Stratum 0, the database tables person and parent]
  • Stratum 1
  • father: (alex,toon), (jan,an), (toon,bernd),
    (toon,mattijs)
  • hbrothers: (bernd,mattijs), (mattijs,bernd),
    (mattijs,mattijs), (bernd,bernd)
  • Stratum 2
  • grandfather: (jan,mattijs), (jan,bernd),
    (alex,mattijs), (alex,bernd)

19
Caveat Correct Negation
  • Negation in Datalog ≠ negation in Prolog
  • Prolog negation (Negation by failure)
  • not(p(X)) is true if we fail to prove p(X)
  • Datalog negation (Correct negation)
  • not(p(X)) binds X to a value such that p(X) does
    not hold.

20
Caveat Correct Negation
  • Example
  • father(a,b). person(a). person(b).
  • nfather(X) :- person(X), not( father(X,Y) ),
    person(Y).
  • Datalog
  • ?- nfather(X) → (a), (b)
  • Prolog
  • ?- nfather(X) → (b)
  • ?- person(a), not(father(a,a)), person(a) → yes
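
The two answers can be reproduced with a short Python sketch. It only models the two readings of the rule body with set comprehensions; it is not a Prolog or Datalog engine.

```python
father = {("a", "b")}
person = {"a", "b"}

# Correct (Datalog) negation: X and Y range over all ground instantiations,
# so nfather(X) holds as soon as some Y exists with father(X,Y) false.
datalog = {x for x in person for y in person if (x, y) not in father}

# Negation by failure, goals tried left to right: Y is still unbound when
# not(father(X,Y)) is reached, so the goal fails if any father(X,_) is provable.
prolog = {x for x in person if not any(f[0] == x for f in father)}

print(sorted(datalog), sorted(prolog))
```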

21
Caveat Correct Negation
  • Prolog
  • Order of the literals in a rule body is important
  • nfather(X) :- person(X), not( father(X,Y) ),
    person(Y).
  • versus
  • nfather(X) :- person(X), person(Y), not(
    father(X,Y) ).
  • Order of the rules is important
  • Datalog
  • Order not important
  • More declarative

22
Caveat Correct Negation
  • Difference is not fundamental
  • Prolog
  • nfather(X) :- person(X), not( father(X,Y) ).
  • is equivalent to
  • Datalog
  • nfather(X) :- person(X), not( father_of_someone(X) ).
  • father_of_someone(X) :- father(X,Y).

23
Caveat Correct Negation
  • Difference is not fundamental
  • Many systems that claim to implement Datalog
    actually implement negation by failure.
  • Debating whether or not this is correct is
    pointless; both perspectives are useful
  • Check beforehand how an engine implements
    negation
  • Throughout the course, in all exercises, in the
    exam, ..., we assume correct negation.

24
Safety
  • A rule can make no sense if variables appear in
    funny ways
  • Examples
  • S(x) :- R(y)
  • S(x) :- not R(x)
  • S(x) :- R(y), x < y
  • In each of these cases the result is infinite
    even if the relation R is finite

25
Safety
  • Even when not leading to infinite relations, such
    Datalog Programs can be domain-dependent.
  • Example
  • s(a,b). s(a,a). r(a). r(b).
  • t(X) :- not(s(X,Y)), r(X).
  • If the domain is {a,b}
  • only t(b) holds.
  • If the domain is {a,b,c}
  • not only t(b), but also t(a) holds
  • (Ground instantiation: t(a) :- not(s(a,c)),
    r(a).)
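
The domain dependence is easy to check by brute force. The sketch below evaluates the unsafe rule over an explicitly given domain; the function name `t` simply mirrors the predicate.

```python
s = {("a", "b"), ("a", "a")}
r = {"a", "b"}

def t(domain):
    # t(X) :- not(s(X,Y)), r(X), with X and Y ranging over the given domain.
    return {x for x in domain for y in domain if (x, y) not in s and x in r}

print(sorted(t({"a", "b"})))        # only b
print(sorted(t({"a", "b", "c"})))   # a and b
```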

26
Safety
  • Therefore, we will only consider rules that are
    safe.
  • A rule h :- l1, ..., ln is safe if
  • every variable in the head of the rule also
    appears in a non-arithmetic positive literal in
    the body
  • every variable in a negative literal of the body
    also appears in some positive literal of the body
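
The two conditions translate directly into a syntactic check. The following is a sketch under assumptions: atoms are tuples, variables start with an uppercase letter, and the set `ARITHMETIC` lists the predicate names treated as arithmetic (here the two special predicates mentioned earlier).

```python
ARITHMETIC = {"smaller", "plus"}     # assumed names of the arithmetic predicates

def variables(atom):
    return {a for a in atom[1:] if a[0].isupper()}

def is_safe(head, positive, negative):
    # Variables occurring in non-arithmetic positive body literals.
    limited = set()
    for atom in positive:
        if atom[0] not in ARITHMETIC:
            limited |= variables(atom)
    # Variables occurring in any positive body literal.
    in_positive = set()
    for atom in positive:
        in_positive |= variables(atom)
    head_ok = variables(head) <= limited
    neg_ok = all(variables(atom) <= in_positive for atom in negative)
    return head_ok and neg_ok

# The three unsafe rules from the Safety slide, written with uppercase variables.
print(is_safe(("s", "X"), [("r", "Y")], []))                          # False
print(is_safe(("s", "X"), [], [("r", "X")]))                          # False
print(is_safe(("s", "X"), [("r", "Y"), ("smaller", "X", "Y")], []))   # False
# A safe rule: nfather(X) :- person(X), not father_of_someone(X).
print(is_safe(("nfather", "X"), [("person", "X")], [("father_of_someone", "X")]))
```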

27
Model-Theoretic Semantics
  • A model M of a Datalog program is
  • An instantiation of all intensional relations in
    the program
  • That satisfies all rules in the program
  • If the body of a ground instantiation of a rule
    holds in M, also the head must hold
  • Some models are special

28
Model-Theoretic Semantics
  • father(a,b).
  • person(X) :- father(X,Y).
  • person(Y) :- father(X,Y).
  • M1: father = {(a,b)}; person = {(a), (b)}
  • M2: father = {(a,b), (b,a), (a,a)};
    person = {(a), (b)}

29
Model-Theoretic Semantics
  • A model is minimal if we cannot remove tuples
    from it and still have a model
  • M1: father = {(a,b)}; person = {(a), (b)}
    (minimal)
  • M2: father = {(a,b), (b,a), (a,a)};
    person = {(a), (b)} (not minimal)

30
Model-Theoretic Semantics
  • For non-recursive, safe datalog programs the
    semantics is well defined
  • The model = all facts that can be derived from
    the program
  • Closed-World Assumption: if a fact cannot be
    derived from the database, then it is not true
  • This is a minimal model

31
Model-Theoretic Semantics
  • Minimal model is, however, not necessarily unique
  • Example
  • r(a).
  • t(X) :- r(X), not s(X).
  • Minimal models
  • M1: r = {(a)}; s = { }; t = {(a)}
  • M2: r = {(a)}; s = {(a)}; t = { }

32
Outline
  • Syntax of the Datalog language
  • Semantics of a Datalog program
  • non-recursive
  • recursive datalog
  • aggregation
  • Relational algebra = safe Datalog with negation
    and without recursion
  • Optimization techniques
  • Conclusions

33
Semantics of Recursive Datalog Programs
  • g(a,b). g(b,c). g(a,d).
  • reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y).
  • reach(X,Y) :- g(X,Y).
  • reach(X,Z) :- reach(X,Y), reach(Y,Z).
  • Fixpoint of a set of rules R, starting with set
    of facts I
  • repeat
  • Old_I := I; I := I ∪ infer(R,I)
  • until I = Old_I
  • Always terminates (inflationary fixpoint)

34
Semantics of Recursive Datalog Programs
  • g(a,b). g(b,c). g(a,d).
  • reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y).
  • reach(X,Y) :- g(X,Y).
  • reach(X,Z) :- reach(X,Y), reach(Y,Z).
  • Step 0: reach = { }
  • Step 1: reach = { (a,a), (b,b), (c,c), (d,d),
    (a,b), (b,c), (a,d) }
  • Step 2: reach = { (a,a), (b,b), (c,c), (d,d),
    (a,b), (b,c), (a,d), (a,c) }
  • Step 3: reach = { (a,a), (b,b), (c,c), (d,d),
    (a,b), (b,c), (a,d), (a,c) } STOP
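
The same three steps can be reproduced with a naive fixpoint loop. This is a minimal sketch: the tuple encoding of atoms and the helper names `match` and `infer` are illustrative choices, and only positive body literals are supported.

```python
def match(atom, fact, binding):
    """Extend binding so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    b = dict(binding)
    for a, v in zip(atom[1:], fact[1:]):
        if a[0].isupper():               # variable
            if b.setdefault(a, v) != v:
                return None
        elif a != v:                     # constant mismatch
            return None
    return b

def infer(rules, facts):
    """Heads of all rule instantiations whose body holds in facts."""
    out = set()
    for head, body in rules:
        bindings = [{}]
        for atom in body:
            bindings = [b2 for b in bindings for f in facts
                        if (b2 := match(atom, f, b)) is not None]
        out |= {(head[0], *[b.get(a, a) for a in head[1:]]) for b in bindings}
    return out

rules = [
    (("reach", "X", "X"), [("g", "X", "Y")]),
    (("reach", "Y", "Y"), [("g", "X", "Y")]),
    (("reach", "X", "Y"), [("g", "X", "Y")]),
    (("reach", "X", "Z"), [("reach", "X", "Y"), ("reach", "Y", "Z")]),
]

I = {("g", "a", "b"), ("g", "b", "c"), ("g", "a", "d")}
steps = 0
while True:
    old = set(I)
    I |= infer(rules, I)                 # I := I ∪ infer(R, I)
    steps += 1
    if I == old:                         # until I = Old_I
        break

reach = {f[1:] for f in I if f[0] == "reach"}
print(steps, sorted(reach))
```

Every iteration re-derives all earlier facts, which is the inefficiency that semi-naive evaluation addresses later in the deck.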

35
Semantics of Recursive Datalog Programs
  • Datalog without negation
  • Always a unique minimal model.
  • Semantics of recursive datalog with negation is
    less clear.
  • Example
  • T(a).
  • R(X) :- T(X), not S(X).
  • S(X) :- T(X), not R(X).
  • What about R(a)? S(a)?

36
Semantics of Recursive Datalog Programs
  • For some classes of Datalog queries with negation
    a natural semantics can still be defined
  • Important class: stratified programs
  • T depends on S if some rule with T in the head
    contains S, or (recursively) some predicate that
    depends on S, in the body.
  • Stratified program: if T depends on (not S),
    then S cannot depend on T or (not T).

37
Semantics of Recursive Datalog Programs
  • The program
  • T(a).
  • R(X) :- T(X), not S(X).
  • S(X) :- T(X), not R(X).
  • is not stratified
  • R depends negatively on S
  • S depends negatively on R
  • [Figure: dependency graph over R, T and S]

38
Semantics of Recursive Datalog Programs
  • g(a,b). g(b,c). g(a,d).
  • reach(X,X) :- g(X,Y).
  • reach(Y,Y) :- g(X,Y).
  • reach(X,Y) :- g(X,Y).
  • reach(X,Z) :- reach(X,Y), reach(Y,Z).
  • node(X) :- g(X,Y).
  • node(Y) :- g(X,Y).
  • unreach(X,Y) :- node(X), node(Y), not
    reach(X,Y).
  • [Figure: dependency graph over g, reach, node and unreach]

39
Semantics of Recursive Datalog Programs
  • If a program is stratified, the tables in the
    program can be partitioned into strata
  • Stratum 0: All database tables.
  • Stratum I: Tables defined in terms of tables in
    Stratum I and lower strata.
  • If T depends on (not S), S is in a lower stratum
    than T.
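
One common way to compute such a partition is to raise stratum numbers until all constraints hold, and to give up once a number exceeds the number of predicates. The sketch below follows that idea; the function name `stratify` and the encoding of dependencies as (head, body predicate, negated) triples are assumptions made for illustration.

```python
def stratify(edb, deps):
    """Return {predicate: stratum}, or None if the program is not stratified.

    deps holds one (head, body_predicate, negated) triple per body literal.
    """
    preds = set(edb) | {h for h, _, _ in deps} | {q for _, q, _ in deps}
    stratum = {p: (0 if p in edb else 1) for p in preds}
    changed = True
    while changed:
        changed = False
        for head, q, negated in deps:
            # Positive dependency: same stratum or lower is fine.
            # Negative dependency: q must be in a strictly lower stratum.
            need = stratum[q] + (1 if negated else 0)
            if need > stratum[head]:
                stratum[head] = need
                changed = True
                if need > len(preds):
                    return None          # cycle through negation
    return stratum

reach_program = [
    ("reach", "g", False), ("reach", "reach", False),
    ("node", "g", False),
    ("unreach", "node", False), ("unreach", "reach", True),
]
print(stratify({"g"}, reach_program))

rs_program = [("R", "T", False), ("R", "S", True),
              ("S", "T", False), ("S", "R", True)]
print(stratify({"T"}, rs_program))
```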

40
Semantics of Recursive Datalog Programs
  • g(a,b). g(b,c). g(a,d).
  • reach(X,X) :- g(X,Y).
  • reach(Y,Y) :- g(X,Y).
  • reach(X,Y) :- g(X,Y).
  • reach(X,Z) :- reach(X,Y), reach(Y,Z).
  • node(X) :- g(X,Y).
  • node(Y) :- g(X,Y).
  • unreach(X,Y) :- node(X), node(Y), not
    reach(X,Y).
  • [Figure: dependency graph with strata; stratum 0: g; stratum 1: reach, node; stratum 2: unreach]

41
Semantics of Recursive Datalog Programs
  • Semantics of a stratified program is given by
  • First, compute the least fixpoint of all tables
    in Stratum 1. (Stratum 0 tables are fixed.)
  • Then, compute the least fixpoint of tables in
    Stratum 2, then the least fixpoint of tables in
    Stratum 3, and so on, stratum by stratum.

42
Semantics of Recursive Datalog Programs
  • Fixpoint of a set of rules R, starting with set
    of facts I
  • repeat
  • Old_I := I; I := I ∪ infer(R,I)
  • until I = Old_I
  • Fixpoint within one stratum always terminates
  • Due to monotonicity within the strata
  • Only positive dependence between tables in
    the same stratum.
  • Because the program is finite, the number of
    strata is finite as well

43
Semantics of Recursive Datalog Programs
  • Stratum 0: g(a,b). g(b,c). g(a,d).
  • Stratum 1: node(a), node(b), node(c),
    node(d), reach(a,a), reach(b,b), reach(c,c),
    reach(d,d), reach(a,b), reach(b,c), ...
  • Stratum 2:
  • unreach(b,a), unreach(c,a), ...
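
The stratum-by-stratum computation for this program can be written out directly with Python sets. This is a sketch specialised to the reach/unreach rules rather than a general evaluator.

```python
# Stratum 0: the database table g.
g = {("a", "b"), ("b", "c"), ("a", "d")}

# Stratum 1: least fixpoint of node and reach (only positive dependencies).
node = {x for edge in g for x in edge}
reach = {(x, x) for x in node} | set(g)
while True:
    new = {(x, z) for (x, y) in reach for (y2, z) in reach if y == y2} - reach
    if not new:
        break
    reach |= new

# Stratum 2: reach is now complete, so "not reach(X,Y)" can be evaluated safely.
unreach = {(x, y) for x in node for y in node if (x, y) not in reach}

print(sorted(unreach))
```

Evaluating `unreach` before `reach` has reached its fixpoint would derive pairs such as (a,c) that later turn out to be reachable, which is why the strata are processed in order.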

44
Outline
  • Syntax of the Datalog language
  • Semantics of a Datalog program
  • non-recursive
  • recursive datalog
  • aggregation
  • Relational algebra safe Datalog with negation
    and without recursion
  • Optimization techniques
  • Conclusions

45
Aggregate Operators
degree(X, SUM(<Y>)) :- g(X,Y).
  • The < > notation in the head indicates
    grouping; the remaining arguments (X, in this
    example) are the GROUP BY fields.
  • In order to apply such a rule, we must have all
    of relation g available.
  • Stratification with respect to use of < > is
    similar to negation.
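
In Python terms the aggregate rule is a group-by over the complete relation g. The sketch below uses hypothetical numeric values in the second column so that SUM is meaningful.

```python
from collections import defaultdict

# Hypothetical contents of g; the second column is numeric so it can be summed.
g = {("a", 1), ("a", 2), ("b", 5)}

groups = defaultdict(list)
for x, y in g:                 # X is the GROUP BY field, <Y> collects the values
    groups[x].append(y)

degree = {(x, sum(ys)) for x, ys in groups.items()}
print(sorted(degree))
```

The sum for a group is only correct once every g tuple of that group is known, which is why g has to be fully computed first.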

46
Aggregate Operators
  • bi(X,Y) :- g(X,Y).
  • bi(Y,X) :- g(X,Y).
  • degree(X, SUM(<Y>)) :- bi(X,Y).
  • [Figure: dependency graph over g, bi and degree]

47
Aggregate Operators
  • bi(X,Y) :- g(X,Y).
  • bi(Y,X) :- g(X,Y).
  • degree(X, SUM(<Y>)) :- bi(X,Y).
  • [Figure: dependency graph with strata; stratum 0: g; stratum 1: bi; stratum 2: degree]

48
Aggregate Operators
  • bi(X,Y) :- g(X,Y).
  • bi(Y,X) :- g(X,Y).
  • degree(X, SUM(<Y>)) :- bi(X,Y).
  • Compute stratum by stratum
  • Assume strata 1 ... k are fixed when computing
    stratum k+1
  • [Figure: dependency graph with strata; stratum 0: g; stratum 1: bi; stratum 2: degree]

49
Aggregate Operators
  • r(a,b). r(a,c). s(a,d).
  • t(X,COUNT(<Y>)) :- r(X,Y).
  • r(X,Y) :- t(X,Z), Z=2, s(X,Y).

50
Aggregate Operators
  • r(a,b). r(a,c). s(a,d).
  • t(X,COUNT(<Y>)) :- r(X,Y).
  • r(X,Y) :- t(X,Z), Z=2, s(X,Y).
  • [Figure: dependency graph over t, r and s]

51
Aggregate Operators
  • r(a,b). r(a,c). s(a,d).
  • t(X,COUNT(<Y>)) :- r(X,Y).
  • r(X,Y) :- t(X,Z), Z=2, s(X,Y).
  • [Figure: dependency graph over t, r and s]
  • The rule for t is aggregating over a moving target
  • Step 1: t(a,2) is added
  • Step 2: r(a,d) is added
  • Step 3: t(a,3) is added, t(a,2) is no longer true,
    hence r(a,d) should not have been added
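
The three steps can be replayed in Python. The sketch assumes that the aggregate counts the Y values per X, which is what the values t(a,2) and t(a,3) imply; `count_t` is an illustrative helper name.

```python
r = {("a", "b"), ("a", "c")}
s = {("a", "d")}

def count_t(r_tuples):
    # t(X, n): n is the number of r tuples with first component X.
    counts = {}
    for x, _ in r_tuples:
        counts[x] = counts.get(x, 0) + 1
    return set(counts.items())

t1 = count_t(r)                                        # step 1
r2 = r | {(x, y) for (x, y) in s if (x, 2) in t1}      # step 2: fire the r rule
t2 = count_t(r2)                                       # step 3: re-aggregate

# With the new t, the body t(a,2) no longer holds, so r(a,d) has lost its support.
still_justified = {(x, y) for (x, y) in s if (x, 2) in t2}
print(t1, t2, still_justified)
```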

52
Outline
  • Syntax of the Datalog language
  • Semantics of a Datalog program
  • Relational algebra = safe Datalog with negation
    and without recursion
  • Optimization techniques
  • Conclusions

53
RA = Non-Recursive Datalog
  • Every operator of RA can be simulated by
    non-recursive datalog
  • Project on the first attribute of the ternary
    relation r: query(A) :- r(A, B, C).
  • Cartesian product of relations r1 and r2:
  • query(X1, X2, ..., Xn, Y1, Y2, ..., Ym) :-
    r1(X1, X2, ..., Xn), r2(Y1, Y2, ..., Ym).
  • Union of relations r1 and r2:
  • query(X1, X2, ..., Xn) :- r1(X1, X2, ..., Xn).
    query(X1, X2, ..., Xn) :- r2(X1, X2, ..., Xn).
  • Set difference of r1 and r2:
  • query(X1, X2, ..., Xn) :- r1(X1, X2, ..., Xn),
    not r2(X1, X2, ..., Xn).

54
RA = Non-Recursive Datalog
  • Every operator of RA can be simulated by
    non-recursive datalog
  • Result of our construction is always safe, and
    equivalent under the stratified semantics
  • σ_{1=3}((π_1 R) × R)
  • translates to
  • query1(A) :- R(A,B).
  • query2(A,B,C) :- query1(A), R(B,C).
  • result(A,B,A) :- query2(A,B,A).
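
A quick way to convince oneself that the translation is faithful is to evaluate both forms on a small relation. In the sketch below the contents of R are hypothetical, and the selection "column 1 = column 3" appears as the condition `a == c`.

```python
# Hypothetical binary relation R.
R = {("a", "b"), ("b", "a"), ("c", "c")}

# Relational algebra: selection 1=3 over (projection on column 1 of R) x R.
pi1 = {(t[0],) for t in R}
ra_result = {(a, b, c) for (a,) in pi1 for (b, c) in R if a == c}

# The three Datalog rules, evaluated one after the other.
query1 = {(a,) for (a, b) in R}                          # query1(A) :- R(A,B).
query2 = {(a, b, c) for (a,) in query1 for (b, c) in R}  # query2(A,B,C) :- query1(A), R(B,C).
result = {(a, b, c) for (a, b, c) in query2 if a == c}   # result(A,B,A) :- query2(A,B,A).

print(sorted(result))
```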

55
RA = Non-Recursive Datalog
  • Every rule can be expressed by one RA expression
  • Translate every atom separately
  • Negation/arithmetic: use the complement
    construction
  • Essential: safety
  • Combine atoms with Cartesian product
  • Do the joins with a selection
  • Project on the relevant attributes
  • Strata determine the order of evaluation
  • Because there is no recursion, every rule is
    executed only once.

56
RA = Non-Recursive Datalog
  • sister(X,Y) :- person(X,f), parent(Z,X),
    parent(Z,Y), not(X=Y).
  • person(X,f): σ_{2=f} Person
  • parent(Z,X) and parent(Z,Y): Parent
  • not(X=Y): complement construction
  • X comes from parent(Z,X) → π_2 Parent
  • Y comes from parent(Z,Y) → π_2 Parent
  • not(X=Y) → σ_{1≠2}(π_2 Parent × π_2 Parent)

57
RA = Non-Recursive Datalog
  • sister(X,Y) :- person(X,f), parent(Z,X),
    parent(Z,Y), not(X=Y).
  • π_{1,6}
  • σ_{1=4, 3=5, 1=7, 6=8}
  • (σ_{2=f} Person × Parent × Parent
  • × σ_{1≠2}(π_2 Parent × π_2 Parent))

58
RA = Non-Recursive Datalog
  • Hence, the following two are equivalent in
    expressive power
  • Safe Datalog with negation, without recursion or
    aggregation, under the stratified semantics
  • Relational Algebra
  • Every rule separately can be expressed by a
    relational algebra expression
  • Makes it very suitable for implementation on top
    of a relational database

59
Outline
  • Syntax of the Datalog language
  • Semantics of a Datalog program
  • Relational algebra = Datalog with negation and
    without recursion
  • Optimization techniques
  • Conclusions

60
Evaluation of Datalog Programs
  • Running example
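  • root(r). child(a,r). child(b,r). child(c,a).
  • child(d,a). child(e,c). child(f,d). child(h,b).
  • sg(X,Y) :- root(X), root(Y).
  • sg(X,Y) :- child(X,U), sg(U,V), child(Y,V).
  • [Figure: tree with root r; a and b are children of r; c and d are children of a; h is a child of b; e is a child of c; f is a child of d]
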
61
Evaluation of Datalog Programs Issues
  • Repeated inferences: recursive rules are
    repeatedly applied in the naïve way; the same
    inferences are made in several iterations.
  • Unnecessary inferences: if we just want to find
    sg of a particular node, say e, computing the
    fixpoint of the sg program and then selecting
    tuples with e in the first column is wasteful, in
    that we compute many irrelevant facts.

62
Evaluation of Datalog Programs
  • Running example
  • Query: ?- sg(e,X)
  • sg facts computed bottom-up:
  • (r,r)
  • (a,a), (b,b), (a,b), (b,a)
  • (c,c), (c,d), (c,h), (d,c), (d,d), ...
  • (e,e), (f,f), (e,f), (f,e)
  • [Figure: the tree of the running example]

63
Avoiding Repeated Inferences
  • Seminaive fixpoint evaluation: avoid repeated
    inferences by requiring that at least one of the
    body facts was generated in the most recent
    iteration.
  • For each recursive table P, use a table delta_P.
  • Rewrite the program to use the delta tables.
  • A second evaluation of the rule
  • r(X,Y) :- s(X), t(Y), u(X,Z), v(Y,Z).
  • only gives new tuples (X,Y) for ground
    instantiations in which at least one of the atoms
    is new.
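
A minimal sketch of semi-naive evaluation, applied to the reach program from the earlier slides instead of the abstract rule above. The variable `delta` plays the role of the delta table for reach, and `join` is an illustrative helper.

```python
g = {("a", "b"), ("b", "c"), ("a", "d")}

def join(left, right):
    # reach(X,Z) :- left(X,Y), right(Y,Z)
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}

nodes = {n for edge in g for n in edge}
reach = set()
delta = {(n, n) for n in nodes} | g       # the non-recursive rules fire only once
rounds = 0
while delta:
    reach |= delta
    # Recursive rule rewritten with delta tables: at least one of the two
    # body atoms must come from the most recent iteration.
    new = join(delta, reach) | join(reach, delta)
    delta = new - reach
    rounds += 1

print(rounds, sorted(reach))
```

The result equals the naive fixpoint, but each round only joins against the tuples that were new in the previous round.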

64
Avoiding Unnecessary Inferences
  • Still, in the running example
  • many unnecessary deductions when the query is
    ?- sg(e,X)
  • Compare with top-down
  • as in Prolog
  • only facts that are connected to the ultimate
    goal are being considered

65
The Prolog Way
  • sg(X,Y) :- root(X), root(Y).
  • sg(X,Y) :- child(X,U), sg(U,V), child(Y,V).
  • ?- sg(e,X).
  • try root(e): FAIL
  • try child(e,U)
  • → U = c
  • try sg(c,V)
  • try root(c): FAIL
  • try child(c,U)
  • → U = a
  • [Figure: the tree of the running example]

67
Magic Sets Idea
  • We want to do something similar for Datalog
  • Idea: define a filter table that computes all
    relevant values and restricts the computation of
    sg(e,X).
  • sg(X,Y) :- m(X), root(X), root(Y).
  • sg(X,Y) :- m(X), child(X,U), sg(U,V), child(Y,V).
  • m(X) :- m(Y), child(Y,X).
  • m(e).
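
The effect of the filter can be measured with a small Python sketch. It assumes the child facts are read as child(X,P), meaning X is a child of P, which is the reading under which the sg rules and the query trace are consistent; `sg_fixpoint` is an illustrative helper.

```python
# child(X, P): X is a child of P (assumed orientation, see lead-in).
child = {("a", "r"), ("b", "r"), ("c", "a"), ("d", "a"),
         ("e", "c"), ("f", "d"), ("h", "b")}
root = {"r"}

def sg_fixpoint(magic=None):
    """Bottom-up fixpoint of sg; if magic is given, keep only sg(X,_) with m(X)."""
    sg = set()
    while True:
        new = {(x, y) for x in root for y in root}
        new |= {(x, y) for (x, u) in child for (y, v) in child if (u, v) in sg}
        if magic is not None:
            new = {(x, y) for (x, y) in new if x in magic}
        if new <= sg:
            return sg
        sg |= new

# Magic set for the query sg(e,X): m(e).  m(X) :- m(Y), child(Y,X).
m = {"e"}
while True:
    new_m = {x for (y, x) in child if y in m} - m
    if not new_m:
        break
    m |= new_m

full = sg_fixpoint()
filtered = sg_fixpoint(magic=m)
answers_full = {y for (x, y) in full if x == "e"}
answers_filtered = {y for (x, y) in filtered if x == "e"}
print(len(full), len(filtered), sorted(answers_filtered))
```

Both evaluations return the same answers for sg(e,X), but the filtered one derives fewer sg facts because only e and its ancestors are allowed in the first argument.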

68
Magic Sets
  • It is always possible to do this in such a way
    that bottom-up becomes as efficient as top-down!
  • Different proposals exist in literature
  • how to introduce the magic filters

69
Optimization Techniques
  • Many other techniques exist as well
  • Standard relational indexing techniques
  • (partly) materializing intensional relations
    beforehand
  • Trade-off: memory ↔ query-time performance
  • (See also the OLAP-part for a similar technique)
  • Different representations for relations
  • BDD (Stanford)

70
Outline
  • Syntax of the Datalog language
  • Semantics of a Datalog program
  • Relational algebra = Datalog with negation and
    without recursion
  • Optimization techniques
  • Conclusions

71
Conclusions
  • Datalog adds deductive capabilities to databases
  • extensional relations
  • intensional relations
  • Datalog without recursion, with negation
  • safety requirement
  • stratification
  • equal in power to relational algebra
  • Closed World Assumption

72
Conclusions
  • Datalog without Negation
  • Always a unique minimal model
  • Datalog with negation and recursion
  • semantics not always clear
  • stratified negation
  • Evaluation of datalog queries
  • without negation RA-optimization
  • with recursion
  • semi-naive recursion
  • magic sets

73
Conclusions
  • Very nice idea, but
  • Deductive databases did not make it as a database
    paradigm
  • Yet, many ideas survived
  • recursion in SQL
  • And others may re-surface in future.
  • Increasing need for adding meta-information in
    databases