Title: Introduction to PLL
1. Overview
- Introduction to PLL
- Foundations of PLL
  - Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars
- Frameworks of PLL
  - Independent Choice Logic, Stochastic Logic Programs, PRISM,
  - Bayesian Logic Programs, Probabilistic Logic Programs, Probabilistic Relational Models
  - Logical Hidden Markov Models
- Applications
2. Probabilistic Logic Programs (PLPs)
Haddawy, Ngo
- Atoms = set of similar RVs
- First arguments = RV
- Last argument = state
- Clause = CPD entry, e.g. P(alarm = true | e, b) = 0.9
- Probability distribution over Herbrand interpretations

0.1 : burglary(true).      0.9 : burglary(false).
0.01 : earthquake(true).   0.99 : earthquake(false).
0.9 : alarm(true) <- burglary(true), earthquake(true).
...

Can burglary(true) and burglary(false) be true in the same interpretation?

false <- burglary(true), burglary(false).
burglary(true) ∨ burglary(false) <- true.
false <- earthquake(true), earthquake(false).
...

Integrity constraints
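The semantics above can be sketched in a few lines: each consistent Herbrand interpretation picks exactly one state per random variable, and its probability is the product of the corresponding clause probabilities. The slide only shows the alarm CPD entry for burglary(true), earthquake(true); the other three entries below are invented for illustration.

```python
from itertools import product

p_burglary = {True: 0.1, False: 0.9}
p_earthquake = {True: 0.01, False: 0.99}
p_alarm_true = {  # P(alarm = true | burglary, earthquake)
    (True, True): 0.9,    # from the clause on the slide
    (True, False): 0.8,   # assumed
    (False, True): 0.3,   # assumed
    (False, False): 0.01, # assumed
}

def joint(b, e, a):
    """Probability of one interpretation that satisfies the integrity
    constraints (exactly one state per random variable)."""
    pa = p_alarm_true[(b, e)]
    return p_burglary[b] * p_earthquake[e] * (pa if a else 1.0 - pa)

# Marginal P(alarm = true), summing over the consistent interpretations.
p_alarm = sum(joint(b, e, True) for b, e in product([True, False], repeat=2))
```

Interpretations violating the constraints (e.g. both burglary(true) and burglary(false)) simply receive probability zero and are never enumerated.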
3. Probabilistic Logic Programs (PLPs)
Haddawy, Ngo

father(rex,fred).   mother(ann,fred).
father(brian,doro). mother(utta,doro).
father(fred,henry). mother(doro,henry).

Qualitative Part / Quantitative Part:
1.0 : mc(P,a) <- mother(M,P), pc(M,a), mc(M,a).
0.0 : mc(P,b) <- mother(M,P), pc(M,a), mc(M,a).
...
0.5 : pc(P,a) <- father(F,P), pc(F,0), mc(F,a).
0.5 : pc(P,0) <- father(F,P), pc(F,0), mc(F,a).
...
1.0 : bt(P,a) <- mc(P,a), pc(P,a).

Variable Binding

false <- pc(P,a), pc(P,b), pc(P,0).
pc(P,a) ∨ pc(P,b) ∨ pc(P,0) <- person(P).
...
4. Probabilistic Logic Programs (PLPs)
Haddawy, Ngo

father(rex,fred).   mother(ann,fred).
father(brian,doro). mother(utta,doro).
father(fred,henry). mother(doro,henry).

1.0 : mc(P,a) <- mother(M,P), pc(M,a), mc(M,a).
0.0 : mc(P,b) <- mother(M,P), pc(M,a), mc(M,a).
...
0.5 : pc(P,a) <- father(F,P), pc(F,0), mc(F,a).
0.5 : pc(P,0) <- father(F,P), pc(F,0), mc(F,a).
...
1.0 : bt(P,a) <- mc(P,a), pc(P,a).

[Figure: Bayesian network induced over the ground atoms mc, pc and bt for rex, ann, brian, utta, fred, doro and henry]

false <- pc(P,a), pc(P,b), pc(P,0).
pc(P,a) ∨ pc(P,b) ∨ pc(P,0) <- person(P).
...
5. Probabilistic Logic Programs (PLPs)
Haddawy, Ngo
- Unique probability distribution over Herbrand interpretations
  - finite branching factor, finite proofs, no self-dependency
- Atoms = states
- Integrity constraints encode mutually exclusive states
- BN used to do inference
- Functors / Turing-complete programming language
- BNs, HMMs, DBNs, SCFGs, ...
- No learning
6. Probabilistic Relational Models (PRMs)
Getoor, Koller, Pfeffer
- Database theory
- Entity-Relationship Models
- Attributes = RVs

[Figure: the alarm-system Database; a Table carries the Attributes Earthquake, Burglary, Alarm, MaryCalls and JohnCalls]
7. Probabilistic Relational Models (PRMs)
Getoor, Koller, Pfeffer

[Figure: Person class with attributes M-chromosome, P-chromosome and Bloodtype; the (Father) and (Mother) slots point to further Person instances with the same attributes]
8. Probabilistic Relational Models (PRMs)
Getoor, Koller, Pfeffer

[Figure: the same Person/(Father)/(Mother) diagram, annotated with the predicates below]

father(Father,Person).
mother(Mother,Person).
bt(Person,BT).
pc(Person,PC).
mc(Person,MC).

Dependencies (CPDs associated with):
bt(Person,BT) <- pc(Person,PC), mc(Person,MC).
pc(Person,PC) <- pc_father(Person,PCf), mc_father(Person,MCf).

View:
pc_father(Person,PCf) <- father(Father,Person), pc(Father,PCf).
...
9. Probabilistic Relational Models (PRMs)
Getoor, Koller, Pfeffer

father(rex,fred).   mother(ann,fred).
father(brian,doro). mother(utta,doro).
father(fred,henry). mother(doro,henry).

pc_father(Person,PCf) <- father(Father,Person), pc(Father,PCf).
...
mc(Person,MC) <- pc_mother(Person,PCm), mc_mother(Person,MCm).
pc(Person,PC) <- pc_father(Person,PCf), mc_father(Person,MCf).
bt(Person,BT) <- pc(Person,PC), mc(Person,MC).
10. Probabilistic Relational Models (PRMs)
Getoor, Koller, Pfeffer
- Database view
- Unique probability distribution over finite Herbrand interpretations
- No self-dependency
- Discrete and continuous RVs
- BN used to do inference
- Highlight: graphical representation
- Focus on class level
- BNs
- Learning
11. Bayesian Logic Programs (BLPs)
Kersting, De Raedt

[Rule graph: earthquake/0 and burglary/0 feed alarm/0, which feeds maryCalls/0 and johnCalls/0]

alarm | earthquake, burglary.
12. Bayesian Logic Programs (BLPs)
Kersting, De Raedt

[Rule graph: pc/1 and mc/1 feed bt/1]

bt(Person) | pc(Person), mc(Person).
13. Bayesian Logic Programs (BLPs)
Kersting, De Raedt

mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
pc(Person) | father(Father,Person), pc(Father), mc(Father).
bt(Person) | pc(Person), mc(Person).
14. Bayesian Logic Programs (BLPs)
Kersting, De Raedt

father(rex,fred).   mother(ann,fred).
father(brian,doro). mother(utta,doro).
father(fred,henry). mother(doro,henry).

mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
pc(Person) | father(Father,Person), pc(Father), mc(Father).
bt(Person) | pc(Person), mc(Person).

Bayesian network induced over the least Herbrand model
15. Bayesian Logic Programs (BLPs)
Kersting, De Raedt
- Unique probability distribution over Herbrand interpretations
  - finite branching factor, finite proofs, no self-dependency
- Highlights
  - Separation of qualitative and quantitative parts
  - Functors
  - Graphical representation
  - Discrete and continuous RVs
- BNs, DBNs, HMMs, SCFGs, Prolog, ...
- Turing-complete programming language
- Learning
16. Declarative Semantics
- Dependency graph = (possibly infinite) Bayesian network
- Consequence operator: if the body of a clause C holds, then the head holds, too
- e.g. mc(fred) is true because mother(ann,fred), mc(ann) and pc(ann) are true
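The consequence operator can be sketched directly: whenever the body of a ground clause holds in the current interpretation, the head is added, and iterating to a fixpoint yields the least Herbrand model over which the network is induced. This is a minimal sketch of the logical skeleton of the bloodtype program; the mc/pc atoms for the founders rex, ann, brian and utta stand in for the probabilistic leaf facts.

```python
# Ground parent facts from the slides.
parents = {("mother", "ann", "fred"), ("father", "rex", "fred"),
           ("mother", "utta", "doro"), ("father", "brian", "doro"),
           ("mother", "doro", "henry"), ("father", "fred", "henry")}
# Founder atoms standing in for probabilistic facts (assumption of this sketch).
founders = {(p, x) for p in ("mc", "pc") for x in ("rex", "ann", "brian", "utta")}

def tp(interp):
    """One application of the immediate-consequence operator T_P."""
    new = set(interp)
    for rel, par, child in parents:
        # mc(P) <- mother(M,P), pc(M), mc(M)   /   pc(P) <- father(F,P), pc(F), mc(F)
        if ("mc", par) in interp and ("pc", par) in interp:
            new.add(("mc" if rel == "mother" else "pc", child))
    for atom in interp:
        # bt(P) <- mc(P), pc(P)
        if atom[0] == "mc" and ("pc", atom[1]) in interp:
            new.add(("bt", atom[1]))
    return new

model = parents | founders
while True:                 # iterate to the fixpoint = least Herbrand model
    step = tp(model)
    if step == model:
        break
    model = step
```

Note how mc(fred) is derived from mother(ann,fred) together with mc(ann) and pc(ann), exactly as on the slide.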
17. Procedural Semantics
P(bt(ann)) ?

18. Procedural Semantics
Bayes rule: P(bt(ann) | bt(fred)) = P(bt(ann), bt(fred)) / P(bt(fred))
P(bt(ann), bt(fred)) ?
[Figure: support network over the ground atoms mc, pc and bt for rex, ann, brian, utta, fred, doro and henry]
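The Bayes-rule step above is a one-line computation once the two probabilities have been obtained from the support network. The two numbers below are invented placeholders for illustration; the slide computes them by BN inference.

```python
def conditional(p_joint, p_evidence):
    """P(A | B) = P(A, B) / P(B) by Bayes' rule."""
    return p_joint / p_evidence

# Hypothetical values for P(bt(ann)=a, bt(fred)=a) and P(bt(fred)=a).
p = conditional(0.06, 0.24)
```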
19. Queries using And/Or trees
P(bt(fred)) ?

An Or node is proven if at least one of its successors is provable. An And node is proven if all of its successors are provable.
[Figure: And/Or tree for bt(fred), with the And node pc(fred), mc(fred) and Or branches via father(rex,fred), mc(rex), pc(rex) and via mother(ann,fred), mc(ann), pc(ann)]
...
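The Or/And evaluation rule above can be sketched as a small recursive check. The tree and fact set are a hand-built toy mirroring the bt(fred) query, not the output of an actual prover.

```python
def proven(node, facts):
    """Or node: at least one successor provable; And node: all provable."""
    kind, children = node[0], node[1]
    if kind == "leaf":
        return children in facts          # children is the atom itself
    if kind == "and":
        return all(proven(c, facts) for c in children)
    return any(proven(c, facts) for c in children)   # "or"

facts = {"mother(ann,fred)", "mc(ann)", "pc(ann)"}
# mc(fred): provable via the mother branch, not via the father branch.
tree = ("or", [("and", [("leaf", "mother(ann,fred)"),
                        ("leaf", "mc(ann)"),
                        ("leaf", "pc(ann)")]),
               ("and", [("leaf", "father(rex,fred)"),
                        ("leaf", "mc(rex)"),
                        ("leaf", "pc(rex)")])])
```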
20. Combining Partial Knowledge
...
discusses/2, read/2, prepared/2, passes/1

21. Combining Partial Knowledge

[Figure: Student reads Book, Book discusses Topic; prepared connects Student and Topic]

prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).

- Variable number of parents for prepared/2 due to read/2
- Whether a student has prepared a topic depends on the books she read
- CPD only for one book-topic pair
22. Combining Rules

[Figure: the Student/Book/Topic diagram with a combining rule (CR) node merging P(A|B) and P(A|C) into P(A|B,C)]

prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).

A combining rule is any algorithm which
- has an empty output if and only if the input is empty
- combines a set of CPDs into a single (combined) CPD
- e.g. noisy-or, regression, ...
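Noisy-or, the standard example of a combining rule, is easy to state concretely: it combines the per-parent probabilities and satisfies the "empty input, empty output" requirement. A minimal sketch:

```python
def noisy_or(probs):
    """Combine per-parent CPD entries P(A | Bi) with noisy-or.
    Empty input yields an empty (None) output, as required of a CR."""
    if not probs:
        return None
    fail_all = 1.0
    for p in probs:
        fail_all *= (1.0 - p)   # all causes fail independently
    return 1.0 - fail_all
```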
23-24. Aggregates
- Map multisets of values to summary values (e.g., sum, average, max, cardinality)
- grade_avg/1: deterministic
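An aggregate like grade_avg/1 is a deterministic function of its multiset of inputs, in contrast to a combining rule, which merges CPDs. A minimal sketch:

```python
def grade_avg(grades):
    """Deterministic aggregate: average of a multiset of (numeric) grades."""
    return sum(grades) / len(grades) if grades else None
```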
25. Summary: Model-Theoretic
- Underlying logic program: if the body holds, then the head holds, too (consequence operator)
- Conditional independencies encoded in the induced BN structure
- Local probability models: (macro) CPDs and CRs (noisy-or, ...)
- Joint probability distribution over the least Herbrand interpretation
26. Stochastic Relational Models (SRMs)
- Type I probabilities, i.e., frequencies in databases
- Probability that a select-join query succeeds
- Independently sample tuples ri from Ri; select as values for Ai the values ri.Ai
27. Stochastic Relational Models (SRMs)
WHO Mortality Database
Attributes: country.name, death.cause, pers.sex, pers.jCountry, pers.dyear, pers.jDeath, pers.dage

query(pers.dage = 75-79y, death.cause = k) = 0.012
query(pers.dage = 85-89y, death.cause = k) = 0.0012
query(pers.dage = 75-79y, death.cause = r) = 0.02
query(pers.dage = 85-89y, death.cause = r) = 0.114
query(pers.dage = 1-4y) = 0.00201
query(pers.dage = 25-29y) = 7.1 × 10^-5
query(pers.dage = 75-79y) = 0.12
query(pers.dage = 85-89y) = 0.176
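Under the Type-I (frequency) reading, a select query's probability is just the fraction of tuples satisfying it. A minimal sketch; the toy table is invented and does not reproduce the WHO figures above.

```python
pers = [{"dage": "75-79y", "sex": "m"}, {"dage": "85-89y", "sex": "f"},
        {"dage": "75-79y", "sex": "f"}, {"dage": "1-4y", "sex": "m"}]

def query_prob(table, attr, value):
    """Frequency of attr = value among the tuples of the relation."""
    return sum(1 for row in table if row[attr] == value) / len(table)
```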
28. Learning Tasks
Database + Learning Algorithm -> Model
- Parameter Estimation
  - numerical optimization problem
- Model Selection
  - combinatorial search
29. Differences between SL and PLL?
- Representation (cf. above)
- Structure on the search space becomes more complex
  - operators for traversing the space
- Algorithms remain essentially the same
30. What is the data about? (Model-Theoretic)

[Figure: alarm Bayesian network; Earthquake and Burglary feed Alarm, which feeds JohnCalls and MaryCalls]

Model(1): earthquake=yes, burglary=no, alarm=?, marycalls=yes, johncalls=no
Model(2): earthquake=no, burglary=no, alarm=no, marycalls=no, johncalls=no
Model(3): earthquake=?, burglary=?, alarm=yes, marycalls=yes, johncalls=yes
31. What is the data about? (Model-Theoretic)
- Data case: random variables + states = (partial) Herbrand interpretation
- Akin to learning from interpretations in ILP

Background: m(ann,dorothy), f(brian,dorothy), m(cecily,fred), f(henry,fred), f(fred,bob), m(kim,bob), ...
Model(1): pc(brian)=b, bt(ann)=a, bt(brian)=?, bt(dorothy)=a
Model(2): bt(cecily)=ab, pc(henry)=a, mc(fred)=?, bt(kim)=a, pc(bob)=b
Model(3): pc(rex)=b, bt(doro)=a, bt(brian)=?

Bloodtype example
32. Parameter Estimation (Model-Theoretic)
Database D + underlying logic program L -> Learning Algorithm -> parameters θ
33. Parameter Estimation (Model-Theoretic)
- Estimate the CPD entries θ that best fit the data
- Best fit = ML parameters θ*
  - θ* = argmax_θ P(data | logic program, θ)
       = argmax_θ log P(data | logic program, θ)
- Reduces to the problem of estimating the parameters of a Bayesian network
  - given structure,
  - partially observed random variables
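For fully observed data the ML estimate of a CPD entry reduces to counting: θ(x | pa) = N(x, pa) / N(pa). A minimal sketch on invented (child, parent) observations:

```python
from collections import Counter

data = [("a", "a"), ("a", "a"), ("a", "b"), ("b", "b")]  # invented (x, pa) pairs

def ml_cpd(pairs):
    """ML estimate theta(x | pa) = N(x, pa) / N(pa) from complete data."""
    joint = Counter(pairs)
    parent = Counter(pa for _, pa in pairs)
    return {(x, pa): n / parent[pa] for (x, pa), n in joint.items()}

theta = ml_cpd(data)
```

With partially observed variables this closed form no longer applies, which is exactly where the EM algorithm of the following slides comes in.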
34. Parameter Estimation (Model-Theoretic)

35. Excursus: Decomposable CRs
- Parameters are attached to the clauses, not to the support network
- Multiple ground instances of the same clause
- Deterministic CPD for the combining rule
36. Parameter Estimation (Model-Theoretic)

37. Parameter Estimation (Model-Theoretic)
Parameter tying
38. EM (Model-Theoretic)
EM algorithm: iterate until convergence
- Input: logic program L, initial parameters θ0
- Expectation: compute the expected counts of each clause under the current model (M, θk)
- Maximization: update the parameters (ML, MAP)
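The E-step/M-step loop can be made concrete on a deliberately tiny case: a single tied parameter θ = P(x = 1) with some observations missing ("?"), standing in for the tied clause parameters. The data is invented; expected counts replace the missing values in the E-step, and the M-step is the usual ML update.

```python
data = [1, 1, 0, "?", "?", 1]  # invented observations, "?" = unobserved

def em(observations, theta=0.5, iters=50):
    """EM for a single tied Bernoulli parameter with missing data."""
    for _ in range(iters):
        # E-step: expected count of x=1; each "?" contributes theta.
        exp_ones = sum(theta if x == "?" else x for x in observations)
        # M-step: ML update from the expected counts.
        theta = exp_ones / len(observations)
    return theta
```

Here the fixpoint satisfies θ = (3 + 2θ)/6, i.e. θ = 0.75, and the iteration converges to it quickly.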
39. Model Selection (Model-Theoretic)
Database + Language (Bayesian: bt/1, pc/1, mc/1) + Background Knowledge (Logical: mother/2, father/2) -> Learning Algorithm
40. Model Selection (Model-Theoretic)
- Combination of ILP and BN learning
- Combinatorial search for a hypothesis M s.t.
  - M logically covers the data D
  - M is optimal w.r.t. some scoring function, i.e., M* = argmax_M score(M, D)
- Highlights
  - Refinement operators
  - Background knowledge
  - Language bias
  - Search bias
41. Refinement Operators
- Add a fact, delete a fact, or refine an existing clause
- Specialization
  - add an atom
  - apply a substitution {X/Y} where X, Y already appear in the clause
  - apply a substitution {X/f(Y1, ..., Yn)} where the Yi are new variables
  - apply a substitution {X/c} where c is a constant
- Generalization
  - delete an atom
  - turn a term into a variable
    - p(a,f(b)) becomes p(X,f(b)) or p(a,f(X))
    - p(a,a) becomes p(X,X) or p(a,X) or p(X,a)
  - replace two occurrences of a variable X by X1 and X2
    - p(X,X) becomes p(X1,X2)
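The substitution-based specialization steps can be sketched with a tiny term representation: terms are tuples (functor, args...), variables are uppercase strings, and applying {X/c} replaces every occurrence of X. This is an illustrative toy, not an ILP library.

```python
def substitute(term, theta):
    """Apply a substitution (dict var -> term) to a term."""
    if isinstance(term, str):
        return theta.get(term, term)            # variable or constant
    return (term[0],) + tuple(substitute(a, theta) for a in term[1:])

# Specialize the clause atom p(X, f(Y)) with {X/a}:
clause = [("p", "X", ("f", "Y"))]
specialized = [substitute(atom, {"X": "a"}) for atom in clause]
```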
42-48. Example

[Figure, built up step by step over slides 42-48: ground network over mc(ann), pc(ann), mc(eric), pc(eric), mc(john), pc(john), m(ann,john), f(eric,john) and bt(john)]
...
49. Bias
- Many clauses can be eliminated a priori due to the type structure of clauses
  - e.g. atom(compound, atom, charge), bond(compound, atom, atom, bondtype), active(compound)
- eliminate e.g. active(C) <- atom(X,C,5)
  - does not conform to the type structure
50. Bias (continued)
- ... or due to the modes of predicates; a mode determines the calling pattern in queries
  - + input, - output
  - mode(atom(+,-,-))
  - mode(bond(+,+,-,-))
  - all variables in the head are + (input)
- active(C) <- bond(C,A1,A2,T) is not mode-conform
  - because A1 does not occur earlier in the clause although the argument is declared +
- active(C) <- atom(C,A,P), bond(C,A,A2,double) is mode-conform
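The mode check just described can be sketched as a left-to-right pass over the body: every + (input) argument must already be bound by the head or an earlier literal. A minimal sketch assuming the two mode declarations above; constants are treated like bound terms only after their literal, which suffices for the toy check.

```python
modes = {"atom": "+--", "bond": "++--"}   # mode strings from the slide

def mode_conform(head_vars, body):
    """Check that every '+' argument is bound by the head or earlier literals."""
    bound = set(head_vars)
    for pred, args in body:
        for mode, arg in zip(modes[pred], args):
            if mode == "+" and arg not in bound:
                return False
        bound.update(args)                # this literal binds its arguments
    return True
```

The two slide examples behave as stated: the bond-only clause fails on A1, while the atom-then-bond clause passes.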
51. Conclusions on Learning
- Algorithms remain essentially the same
- Not single edges but bunches of edges are modified
- Structure on the search space becomes more complex

Statistical Learning contributes: scores, independency, priors
Inductive Logic Programming / Multi-relational Data Mining contributes: refinement operators, bias, background knowledge
52. Overview
- Introduction to PLL
- Foundations of PLL
  - Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars
- Frameworks of PLL
  - Independent Choice Logic, Stochastic Logic Programs, PRISM,
  - Bayesian Logic Programs, Probabilistic Logic Programs, Probabilistic Relational Models
  - Logical Hidden Markov Models
- Applications
53. Logical (Hidden) Markov Models
Each state is trained independently. No sharing of experience, large state space.

54. Logical (Hidden) Markov Models

(0.7) dept(D) -> course(D,C).
(0.2) dept(D) -> lecturer(D,L).
...
(0.3) course(D,C) -> lecturer(D,L).
(0.3) course(D,C) -> dept(D).
(0.3) course(D,C) -> course(D,C).
...
(0.1) lecturer(D,L) -> course(D,C).
...

Abstract states
55. Logical (Hidden) Markov Models
- So far, only transitions between abstract states
- Needed: possible transitions and their probabilities for any ground state

lecturer(D,L)
Possible instantiations for each argument: {cs, math, bio, ...} × {luc, wolfram, ...}
Chance of instantiations: P(lecturer(cs,luc))
56. Logical (Hidden) Markov Models
- RMMs [Anderson et al. 03]: probability estimation trees over lecturer(D,L)
- LOHMMs [Kersting et al. 03]: naive Bayes, P(D) and P(L)
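Under the LOHMM-style naive Bayes model, a ground state's probability factorizes over its arguments, e.g. P(lecturer(D,L)) = P(D) · P(L). A minimal sketch with invented distributions over the instantiation sets from the previous slide:

```python
p_dept = {"cs": 0.5, "math": 0.3, "bio": 0.2}      # invented P(D)
p_lect = {"luc": 0.6, "wolfram": 0.4}              # invented P(L)

def p_ground(dept, lect):
    """Naive Bayes probability of the ground state lecturer(dept, lect)."""
    return p_dept[dept] * p_lect[lect]
```

Because the factors are proper distributions, the ground-state probabilities sum to one over all instantiations.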
57. What is the data about? (Intermediate)
- Data case: (partial) traces (or derivations)
- Akin to Shapiro's algorithmic program debugging

Trace(1): dept(cs), course(cs,dm), lecturer(pedro,cs), ...
Trace(2): dept(bio), course(bio,genetics), lecturer(mendel,bio), ...
Trace(3): dept(cs), course(cs,stats), dept(cs), course(cs,ml), ...
58. What is the data about? (Proof-Theoretic)

1.0 : S -> NP, VP
1/3 : NP -> i
1/3 : NP -> Det, N
1/3 : NP -> NP, PP
1.0 : Det -> the
0.5 : N -> man
0.5 : N -> telescope
0.5 : VP -> V, NP
0.5 : VP -> VP, PP
1.0 : PP -> P, NP
1.0 : V -> saw
1.0 : P -> with

Example(1): s([i, saw, the, man], []).
Example(2): s([the, man, saw, the, man], []).
Example(3): s([i, saw, the, man, with, the, telescope], []).

definite clause grammar
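In the proof-theoretic view, the probability of a derivation in a stochastic grammar is the product of the probabilities of the rules it applies. A minimal sketch with the grammar transcribed from the slide:

```python
rules = {
    "S":   [(1.0, ("NP", "VP"))],
    "NP":  [(1/3, ("i",)), (1/3, ("Det", "N")), (1/3, ("NP", "PP"))],
    "Det": [(1.0, ("the",))],
    "N":   [(0.5, ("man",)), (0.5, ("telescope",))],
    "VP":  [(0.5, ("V", "NP")), (0.5, ("VP", "PP"))],
    "PP":  [(1.0, ("P", "NP"))],
    "V":   [(1.0, ("saw",))],
    "P":   [(1.0, ("with",))],
}

def derivation_prob(derivation):
    """Product of rule probabilities along a derivation [(lhs, rhs), ...]."""
    prob = 1.0
    for lhs, rhs in derivation:
        prob *= dict((r, p) for p, r in rules[lhs])[rhs]
    return prob

# Leftmost derivation of Example(1), "i saw the man":
d = [("S", ("NP", "VP")), ("NP", ("i",)), ("VP", ("V", "NP")),
     ("V", ("saw",)), ("NP", ("Det", "N")), ("Det", ("the",)), ("N", ("man",))]
```

For this derivation the product is 1 · 1/3 · 0.5 · 1 · 1/3 · 1 · 0.5 = 1/36.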
59. What is the data about? (Proof-Theoretic)
- Data case: ground facts (or even clauses)
- Akin to learning from entailment (ILP)

Background: m(ann,dorothy), f(brian,dorothy), m(cecily,fred), f(henry,fred), f(fred,bob), m(kim,bob), ...
Example(1): bt(ann)=a.
Example(2): bt(fred)=ab.
Example(3): bt(brian)=ab.
Example(4): pc(brian)=a.
Example(5): mc(dorothy)=b.

Bloodtype example