1
Learning Stochastic Logic Programs
  • Paper by Stephen Muggleton
  • Presented by Taeyoung Jeong

2
Contents
  • Stochastic
  • Logic Programming
  • Stochastic Logic Programming
  • Reconsideration of Bayes' Theorem
  • Results

3
Stochastic
  • Stochastic, from the Greek "stochos" or "aim, guess", means of,
    relating to, or characterized by conjecture and randomness. A
    stochastic process is one whose behavior is non-deterministic, in
    that a state does not fully determine its next state.
    (wikipedia.org, "stochastic")
  • Fuzzy? Monte Carlo? Statistical?

4
Logic Program
  • Given a theory P, a set of clauses
  • Find a model M that satisfies a goal clause G

5
Logic Program Example
  • 1) ancestor(x,y) ← parent(x,y)
    2) ancestor(x,y) ← parent(x,z) ∧ ancestor(z,y)
    3) parent(A,B)
    4) parent(B,C)
  • Find u, v that satisfy ancestor(u,v)

6
Proof in LP Resolution
  • Assume that for all (u,v), ancestor(u,v) is false. This is the
    same as the goal clause false ← ancestor(u,v).
  • If we derive a contradiction from this sentence, the assumption is
    refuted, so ancestor(u,v) is true for some (u,v), given by the
    answer substitution.

7
Proof in LP Resolution
  • Refutation completeness: if a set of sentences is not satisfiable,
    resolution will always be able to derive a contradiction. (The
    proof of refutation completeness uses Herbrand's Theorem, the
    Ground Resolution Theorem, the Lifting Lemma, etc.)
  • If we can derive the contradiction within a restriction D, that is
    the proof that the goal is satisfiable within D.

8
Logic Program Example again.
  • 1) ancestor(x,y) ← parent(x,y)
    2) ancestor(x,y) ← parent(x,z) ∧ ancestor(z,y)
    3) parent(A,B)
    4) parent(B,C)
  • Find u, v that satisfy ancestor(u,v)

9
Example continues
  • false ← ancestor(u,v)
  • With 2), substituting u/x1, v/y1: false ← parent(x1,z1) ∧ ancestor(z1,y1)
  • With 3), x1/A, z1/B: false ← ancestor(B,y1)
  • With 1), x2/B, y2/y1: false ← parent(B,y1)
  • With 4), y1/C: false ← true. Contradiction!
  • The linear resolution ⟨G, C2, C3, C1, C4⟩ was applied.
  • Composing u/x1, v/y1; x1/A, z1/B; x2/B, y2/y1; y1/C gives
    u = A, v = C (cross-checked in the sketch below).
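
The derivation above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original slides, that answers the same query by naive forward chaining rather than resolution; the parent facts and the two ancestor rules follow the example program.

    # Facts from the example: parent(A,B), parent(B,C).
    parents = {("A", "B"), ("B", "C")}

    def ancestors(parents):
        # Clause 1: ancestor(x,y) <- parent(x,y)
        anc = set(parents)
        changed = True
        # Clause 2: ancestor(x,y) <- parent(x,z) ^ ancestor(z,y),
        # applied to a fixed point (naive forward chaining).
        while changed:
            changed = False
            for (x, z) in parents:
                for (z2, y) in list(anc):
                    if z == z2 and (x, y) not in anc:
                        anc.add((x, y))
                        changed = True
        return anc

    # Prints {('A','B'), ('B','C'), ('A','C')}: the answer u = A, v = C appears.
    print(ancestors(parents))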

10
Stochastic Logic Program
  • Given an SLP S: a set of labeled clauses p : C, where C is a
    clause and p is a probability. For each predicate symbol q, the
    labels of the clauses whose head has predicate q must sum to
    Σp ≤ 1 (a representation sketch follows below).
  • Example (complete, Σp = 1):
    0.5 : coin(head) ←
    0.5 : coin(tail) ←
  • Example (incomplete, Σp < 1):
    0.3 : likes(X,Y) ← pet(Y,X), cat(Y)
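
As a concrete illustration, here is a minimal Python sketch, my own and not from the paper, of one possible representation of labeled clauses, with a check of the per-predicate label sums and a sampler that picks a clause in proportion to its label. The tuple layout and helper names are hypothetical.

    import random
    from collections import defaultdict

    # Hypothetical representation: (label, head predicate, clause as text).
    slp = [
        (0.5, "coin", "coin(head) <-"),
        (0.5, "coin", "coin(tail) <-"),
        (0.3, "likes", "likes(X,Y) <- pet(Y,X), cat(Y)"),
    ]

    def label_sums(slp):
        # Sum the labels per head predicate: 1 means complete, < 1 incomplete.
        sums = defaultdict(float)
        for p, pred, _ in slp:
            sums[pred] += p
        return dict(sums)

    def sample_clause(slp, pred):
        # Pick one clause for `pred` with probability proportional to its
        # label (this renormalizes, so incomplete predicates never fail here).
        choices = [(p, c) for (p, q, c) in slp if q == pred]
        r = random.random() * sum(p for p, _ in choices)
        for p, clause in choices:
            r -= p
            if r <= 0:
                return clause
        return choices[-1][1]  # guard against floating-point leftovers

    print(label_sums(slp))             # {'coin': 1.0, 'likes': 0.3}
    print(sample_clause(slp, "coin"))  # coin(head) or coin(tail), 50/50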

11
Proof of SLP
  • A refutation sequence ⟨G, C1, C2, ...⟩ becomes ⟨1:G, p1:C1, p2:C2, ...⟩.
  • At every step, p:G and q:Ck resolve to pq:R, so a derivation's
    probability is the product of the labels of the clauses it uses.
  • This yields the (possibly incomplete) probability
    Q(a|S) = Σ Q(D(S, ←a)), summed over all available derivations
    D(S, ←a) (a small numeric sketch follows below).
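
A small numeric sketch of these two rules, assuming the derivations have already been found and each is given simply as the list of clause labels it uses (a simplification of mine, not the paper's data structure):

    def derivation_prob(labels):
        # The label of a derivation is the product of its clause labels.
        prob = 1.0
        for p in labels:
            prob *= p
        return prob

    def Q(derivations):
        # Q(a|S): sum over all available derivations of <- a.
        return sum(derivation_prob(d) for d in derivations)

    print(Q([[0.5]]))       # one one-step derivation, e.g. coin(head): 0.5
    print(Q([[0.5, 0.5]]))  # one two-step chain, as in Example 2 later: 0.25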

12
Model of SLP
  • We want to find a distributional L-model M, one with
    Q(a|S) ≤ M(a) for each ground atom a. (An unreasonable
    definition?)
  • Because we do not have complete information, we cannot bound
    Q(a|S) exactly.

13
Stochastic LP Model Example 1
  • 0.5 : coin(head) ←
    0.5 : coin(tail) ←
  • Q(coin(head)|S) = 0.5, Q(coin(tail)|S) = 0.5
  • {0.5 : coin(head), 0.5 : coin(tail)} is a model of S.
  • {0.4 : coin(head), 0.6 : coin(tail)} is not a model of S
    (see the check below).
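
The model checks above can be replayed with the condition from the previous slide, read here as Q(a|S) ≤ M(a), which is my reconstruction of the garbled relation; the numbers are the slide's own.

    # Q(a|S) for the coin SLP.
    Q_coin = {"coin(head)": 0.5, "coin(tail)": 0.5}

    def is_model(M, Q):
        # Distributional model condition (assumed reading): Q(a|S) <= M(a).
        return all(Q[a] <= M[a] for a in Q)

    print(is_model({"coin(head)": 0.5, "coin(tail)": 0.5}, Q_coin))  # True
    print(is_model({"coin(head)": 0.4, "coin(tail)": 0.6}, Q_coin))  # False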

14
Stochastic LP Model Example 2
  • Suppose we have in language L: two predicate symbols, 'King' and
    'Human'; two constants, 'John' and 'Arthur'; and the SLP
    0.5 : Human(X) ← King(X)
    0.5 : King(John) ←
  • Derivation:
    0.5 : Human(John) ← King(John)
    0.5 : King(John)
    --------------------------
    0.25 : Human(John)

15
Example 2 Truth table(?)
  • Q(Human(John)S) 0.25Q(King(John)S) 0.5
  • ?(Human(John)?King(John)) ? ?Human(John)?King(Jo
    hn)
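
The equivalence can be verified by enumerating the four truth assignments, which is presumably what the slide's truth table did; the sketch below is mine.

    import itertools

    # Human(John) <- King(John) is "Human or not King"; its negation is
    # "not Human and King".
    for h, k in itertools.product([False, True], repeat=2):
        clause = h or not k
        negation = (not h) and k
        print(f"Human={h!s:5} King={k!s:5} clause={clause!s:5} negation={negation}")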

16
Example 2, continued
  • We cannot say anything about the incomplete area, because we do
    not have full information.
  • {1 : Human(John), 0 : Human(Arthur), 1 : King(John),
    0 : King(Arthur)} is a model of S.
  • {0.1 : Human(John), 0.9 : Human(Arthur), 0.5 : King(John),
    0.5 : King(Arthur)} is not a model of S, since
    0.1 < Q(Human(John)|S) = 0.25.

17
With Bayes' Theorem
  • Bayes' Theorem: p(S|E) = p(E|S) p(S) / p(E)
  • Our objective: maximize p(S|E)
  • Assuming the examples are drawn independently at random,
    p(E|S) = Π p(ei|S) = Π Q(ei|S) (scored in the sketch below)
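
In code, scoring a candidate SLP then amounts to one log-sum, sketched below; the prior value and the Q numbers are made up for illustration.

    import math

    def log_posterior(log_prior_S, example_Qs):
        # log p(S|E) up to the constant -log p(E):
        # log p(S) + sum_i log Q(ei|S).
        return log_prior_S + sum(math.log(q) for q in example_Qs)

    # Two hypothetical candidates scored on the same three examples.
    print(log_posterior(math.log(0.01), [0.25, 0.5, 0.5]))
    print(log_posterior(math.log(0.001), [0.9, 0.9, 0.9]))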

18
What is good learning?
  • Find sentences that cover all the examples
  • The sentences should be small enough
  • Without losing generality, keep the derivation cost low (this is
    what leads to the error!)
  • The size of the sentences and the generality of the hypothesis
    trade off against each other (Learning from Positive Data,
    S. Muggleton, 2000)

19
Bayes' Theorem, cont.
  • Taking logarithms converts the product into a sum: -log2 p(S) is
    the number of bits needed to represent S, while -Σ log2 Q(ei|S)
    is the number of bits needed to represent the derivations of the
    examples (see the sketch below).
  • We can calculate Π Q(ei|S) quickly, with LP resolution!
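
Equivalently, as a description length in bits, maximizing the posterior means minimizing the total code length; a minimal sketch, with the bit count for S taken as a given input:

    import math

    def description_length(bits_for_S, example_Qs):
        # Total bits: -log2 p(S) - sum_i log2 Q(ei|S).
        return bits_for_S - sum(math.log2(q) for q in example_Qs)

    # A hypothesis costing 40 bits whose three examples each have Q = 0.25:
    print(description_length(40, [0.25, 0.25, 0.25]))  # 40 + 3*2 = 46.0 bits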

20
So the author
  • Designed an SLP search strategy, covering both LP construction and
    parameter estimation, by redefining the compression function
    through the user-defined evaluation function of Progol4.5 (a rough
    sketch of a greedy search follows below)
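
The slides do not show the algorithm itself. The following is only a rough sketch of a greedy clause-by-clause search driven by a compression score, in the spirit of the conclusion's remark that LP construction is greedy; the function names and the score interface are hypothetical, and this is not Progol4.5's actual implementation.

    def greedy_slp_search(candidate_clauses, examples, score):
        # `score(slp, examples)` stands in for the redefined compression
        # function; higher is assumed to mean better compression.
        slp = []
        best = score(slp, examples)
        improved = True
        while improved:
            improved = False
            for c in candidate_clauses:
                if c in slp:
                    continue
                s = score(slp + [c], examples)
                if s > best:
                    best, best_clause, improved = s, c, True
            if improved:
                slp.append(best_clause)
        return slp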

21
Works like this
22
Conclusion
  • Works efficiently and finds meaningful solutions
  • Cannot find optimal solutions:
    a) LP construction is approximate, since it involves greedy
    clause-by-clause construction.
    b) Parameter estimation is only optimal in the case where each
    positive example has a unique derivation.

23
References
  • Learning Stochastic Logic Programs. S. Muggleton.
  • Learning from Positive Data. S. Muggleton. 2000.
  • Semantics and Derivation for Stochastic Logic Programs. S. Muggleton.
  • Artificial Intelligence: A Modern Approach, 2nd edition.
    S. Russell and P. Norvig.