1
Learning Stochastic Logic Programs
  • Paper by Stephen Muggleton
  • Presented by Taeyoung Jeong

2
Contents
  • Stochastic
  • Logic Programming
  • Stochastic Logic Programming
  • Reconsideration of Bayes' Theorem
  • Results

3
Stochastic
  • Stochastic, from the Greek "stochos" or "aim, guess", means of,
    relating to, or characterized by conjecture and randomness. A
    stochastic process is one whose behavior is non-deterministic, in
    that a state does not fully determine its next state.
    (wikipedia.org, "stochastic")
  • Fuzzy? Monte Carlo? Statistical?

4
Logic Program
  • Given a theory P, a set of clauses
  • Find a model M that satisfies a goal clause G

5
Logic Program Example
  • 1) ancestor(x,y) ← parent(x,y)
    2) ancestor(x,y) ← parent(x,z) ∧ ancestor(z,y)
    3) parent(A,B)
    4) parent(B,C)
  • Find u, v that satisfy ancestor(u,v)

6
Proof in LP Resolution
  • Assume that for all (u,v), ancestor(u,v) is false. This is the
    same as the goal clause false ← ancestor(u,v).
  • If we derive a contradiction from this sentence, the assumption is
    refuted, so ancestor(u,v) is true for some (u,v), given by the
    answer substitution.

7
Proof in LP Resolution
  • Refutation completeness: if a set of sentences is not satisfiable,
    resolution will always be able to derive a contradiction. (The
    proof of refutation completeness uses Herbrand's Theorem, the
    Ground Resolution Theorem, the Lifting Lemma, etc.)
  • If we can derive the contradiction within a restriction D, that is
    the proof that the goal is satisfiable within D.

8
Logic Program Example again.
  • 1) ancestor(x,y) ← parent(x,y)
    2) ancestor(x,y) ← parent(x,z) ∧ ancestor(z,y)
    3) parent(A,B)
    4) parent(B,C)
  • Find u, v that satisfy ancestor(u,v)

9
Example continues
  • false ← ancestor(u,v)
  • With 2), substituting u/x1, v/y1: false ← parent(x1,z1) ∧ ancestor(z1,y1)
  • With 3), x1/A, z1/B: false ← ancestor(B,y1)
  • With 1), x2/B, y2/y1: false ← parent(B,y1)
  • With 4), y1/C: false ← true. Contradiction!
  • The linear resolution ⟨G, C2, C3, C1, C4⟩ was applied.
  • Composing u/x1, v/y1; x1/A, z1/B; x2/B, y2/y1; y1/C gives
    u = A, v = C (cross-checked in the sketch below).
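
The derivation above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original slides, that answers the same query by naive forward chaining rather than resolution; the parent facts and the two ancestor rules follow the example program.

    # Facts from the example: parent(A,B), parent(B,C).
    parents = {("A", "B"), ("B", "C")}

    def ancestors(parents):
        # Clause 1: ancestor(x,y) <- parent(x,y)
        anc = set(parents)
        changed = True
        # Clause 2: ancestor(x,y) <- parent(x,z) ^ ancestor(z,y),
        # applied to a fixed point (naive forward chaining).
        while changed:
            changed = False
            for (x, z) in parents:
                for (z2, y) in list(anc):
                    if z == z2 and (x, y) not in anc:
                        anc.add((x, y))
                        changed = True
        return anc

    # Prints {('A','B'), ('B','C'), ('A','C')}: the answer u = A, v = C appears.
    print(ancestors(parents))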

10
Stochastic Logic Program
  • Given an SLP S: a set of labeled clauses p : C, where C is a
    clause and p is a probability. For each predicate symbol q, the
    labels of the clauses whose head has predicate q must sum to
    Σp ≤ 1 (a representation sketch follows below).
  • Example (complete, Σp = 1):
    0.5 : coin(head) ←
    0.5 : coin(tail) ←
  • Example (incomplete, Σp < 1):
    0.3 : likes(X,Y) ← pet(Y,X), cat(Y)
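
As a concrete illustration, here is a minimal Python sketch, my own and not from the paper, of one possible representation of labeled clauses, with a check of the per-predicate label sums and a sampler that picks a clause in proportion to its label. The tuple layout and helper names are hypothetical.

    import random
    from collections import defaultdict

    # Hypothetical representation: (label, head predicate, clause as text).
    slp = [
        (0.5, "coin", "coin(head) <-"),
        (0.5, "coin", "coin(tail) <-"),
        (0.3, "likes", "likes(X,Y) <- pet(Y,X), cat(Y)"),
    ]

    def label_sums(slp):
        # Sum the labels per head predicate: 1 means complete, < 1 incomplete.
        sums = defaultdict(float)
        for p, pred, _ in slp:
            sums[pred] += p
        return dict(sums)

    def sample_clause(slp, pred):
        # Pick one clause for `pred` with probability proportional to its
        # label (this renormalizes, so incomplete predicates never fail here).
        choices = [(p, c) for (p, q, c) in slp if q == pred]
        r = random.random() * sum(p for p, _ in choices)
        for p, clause in choices:
            r -= p
            if r <= 0:
                return clause
        return choices[-1][1]  # guard against floating-point leftovers

    print(label_sums(slp))             # {'coin': 1.0, 'likes': 0.3}
    print(sample_clause(slp, "coin"))  # coin(head) or coin(tail), 50/50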

11
Proof of SLP
  • A refutation sequence ⟨G, C1, C2, ...⟩ becomes ⟨1:G, p1:C1, p2:C2, ...⟩.
  • At every step, p:G and q:Ck resolve to pq:R, so a derivation's
    probability is the product of the labels of the clauses it uses.
  • This yields the (possibly incomplete) probability
    Q(a|S) = Σ Q(D(S, ←a)), summed over all available derivations
    D(S, ←a) (a small numeric sketch follows below).
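
A small numeric sketch of these two rules, assuming the derivations have already been found and each is given simply as the list of clause labels it uses (a simplification of mine, not the paper's data structure):

    def derivation_prob(labels):
        # The label of a derivation is the product of its clause labels.
        prob = 1.0
        for p in labels:
            prob *= p
        return prob

    def Q(derivations):
        # Q(a|S): sum over all available derivations of <- a.
        return sum(derivation_prob(d) for d in derivations)

    print(Q([[0.5]]))       # one one-step derivation, e.g. coin(head): 0.5
    print(Q([[0.5, 0.5]]))  # one two-step chain, as in Example 2 later: 0.25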

12
Model of SLP
  • We want to find a distributional L-model M, one with
    Q(a|S) ≤ M(a) for each ground atom a. (An unreasonable
    definition?)
  • Because we do not have complete information, we cannot bound
    Q(a|S) exactly.

13
Stochastic LP Model Example 1
  • 0.5 : coin(head) ←
    0.5 : coin(tail) ←
  • Q(coin(head)|S) = 0.5, Q(coin(tail)|S) = 0.5
  • {0.5 : coin(head), 0.5 : coin(tail)} is a model of S.
  • {0.4 : coin(head), 0.6 : coin(tail)} is not a model of S
    (see the check below).
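
The model checks above can be replayed with the condition from the previous slide, read here as Q(a|S) ≤ M(a), which is my reconstruction of the garbled relation; the numbers are the slide's own.

    # Q(a|S) for the coin SLP.
    Q_coin = {"coin(head)": 0.5, "coin(tail)": 0.5}

    def is_model(M, Q):
        # Distributional model condition (assumed reading): Q(a|S) <= M(a).
        return all(Q[a] <= M[a] for a in Q)

    print(is_model({"coin(head)": 0.5, "coin(tail)": 0.5}, Q_coin))  # True
    print(is_model({"coin(head)": 0.4, "coin(tail)": 0.6}, Q_coin))  # False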

14
Stochastic LP Model Example 2
  • Suppose we have in language L: two predicate symbols, 'King' and
    'Human'; two constants, 'John' and 'Arthur'; and the SLP
    0.5 : Human(X) ← King(X)
    0.5 : King(John) ←
  • Derivation:
    0.5 : Human(John) ← King(John)
    0.5 : King(John)
    --------------------------
    0.25 : Human(John)

15
Example 2 Truth table(?)
  • Q(Human(John)S) 0.25Q(King(John)S) 0.5
  • ?(Human(John)?King(John)) ? ?Human(John)?King(Jo
    hn)
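
The equivalence can be verified by enumerating the four truth assignments, which is presumably what the slide's truth table did; the sketch below is mine.

    import itertools

    # Human(John) <- King(John) is "Human or not King"; its negation is
    # "not Human and King".
    for h, k in itertools.product([False, True], repeat=2):
        clause = h or not k
        negation = (not h) and k
        print(f"Human={h!s:5} King={k!s:5} clause={clause!s:5} negation={negation}")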

16
Example 2, continued
  • We cannot say anything about the incomplete area, because we do
    not have full information.
  • {1 : Human(John), 0 : Human(Arthur), 1 : King(John),
    0 : King(Arthur)} is a model of S.
  • {0.1 : Human(John), 0.9 : Human(Arthur), 0.5 : King(John),
    0.5 : King(Arthur)} is not a model of S, since
    0.1 < Q(Human(John)|S) = 0.25.

17
With Bayes' Theorem
  • Bayes' Theorem: p(S|E) = p(E|S) p(S) / p(E)
  • Our objective: maximize p(S|E)
  • Assuming the examples are drawn independently at random,
    p(E|S) = Π p(ei|S) = Π Q(ei|S) (scored in the sketch below)
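
In code, scoring a candidate SLP then amounts to one log-sum, sketched below; the prior value and the Q numbers are made up for illustration.

    import math

    def log_posterior(log_prior_S, example_Qs):
        # log p(S|E) up to the constant -log p(E):
        # log p(S) + sum_i log Q(ei|S).
        return log_prior_S + sum(math.log(q) for q in example_Qs)

    # Two hypothetical candidates scored on the same three examples.
    print(log_posterior(math.log(0.01), [0.25, 0.5, 0.5]))
    print(log_posterior(math.log(0.001), [0.9, 0.9, 0.9]))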

18
What is good learning?
  • Find sentences that cover all the examples
  • The sentences should be small enough
  • Without losing generality, keep the derivation cost low (this is
    what leads to the error!)
  • The size of the sentences and the generality of the hypothesis
    trade off against each other (Learning from Positive Data,
    S. Muggleton, 2000)

19
Bayes' Theorem, cont.
  • Taking logarithms converts the product into a sum: -log2 p(S) is
    the number of bits needed to represent S, while -Σ log2 Q(ei|S)
    is the number of bits needed to represent the derivations of the
    examples (see the sketch below).
  • We can calculate Π Q(ei|S) quickly, with LP resolution!
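
Equivalently, as a description length in bits, maximizing the posterior means minimizing the total code length; a minimal sketch, with the bit count for S taken as a given input:

    import math

    def description_length(bits_for_S, example_Qs):
        # Total bits: -log2 p(S) - sum_i log2 Q(ei|S).
        return bits_for_S - sum(math.log2(q) for q in example_Qs)

    # A hypothesis costing 40 bits whose three examples each have Q = 0.25:
    print(description_length(40, [0.25, 0.25, 0.25]))  # 40 + 3*2 = 46.0 bits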

20
So the author
  • Designed an SLP search strategy, covering both LP construction and
    parameter estimation, by redefining the compression function
    through the user-defined evaluation function of Progol4.5 (a rough
    sketch of a greedy search follows below)
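
The slides do not show the algorithm itself. The following is only a rough sketch of a greedy clause-by-clause search driven by a compression score, in the spirit of the conclusion's remark that LP construction is greedy; the function names and the score interface are hypothetical, and this is not Progol4.5's actual implementation.

    def greedy_slp_search(candidate_clauses, examples, score):
        # `score(slp, examples)` stands in for the redefined compression
        # function; higher is assumed to mean better compression.
        slp = []
        best = score(slp, examples)
        improved = True
        while improved:
            improved = False
            for c in candidate_clauses:
                if c in slp:
                    continue
                s = score(slp + [c], examples)
                if s > best:
                    best, best_clause, improved = s, c, True
            if improved:
                slp.append(best_clause)
        return slp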

21
Works like this
22
Conclusion
  • Works efficiently and finds meaningful solutions
  • Cannot find optimal solutions:
    a) LP construction is approximate, since it involves greedy
    clause-by-clause construction.
    b) Parameter estimation is only optimal in the case where each
    positive example has a unique derivation.

23
References
  • Learning Stochastic Logic Programs. S. Muggleton.
  • Learning from Positive Data. S. Muggleton. 2000.
  • Semantics and Derivation for Stochastic Logic Programs. S. Muggleton.
  • Artificial Intelligence: A Modern Approach, 2nd edition.
    S. Russell and P. Norvig.