Title: Learning the Structure of Markov Logic Networks
1. Learning the Structure of Markov Logic Networks
Stanley Kok and Pedro Domingos
Dept. of Computer Science and Engineering, University of Washington
2. Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion
3. Motivation
- Statistical Relational Learning (SRL) combines the benefits of:
- Statistical learning: uses probability to handle uncertainty in a robust and principled way
- Relational learning: models domains with multiple relations
4. Motivation
- Many SRL approaches combine a logical language and Bayesian networks
- e.g., Probabilistic Relational Models (Friedman et al., 1999)
- The need to avoid cycles in Bayesian networks causes many difficulties (Taskar et al., 2002)
- Researchers started using Markov networks instead
5-6. Motivation
- Relational Markov Networks (Taskar et al., 2002)
- Conjunctive database queries + Markov networks
- Require space exponential in the size of the cliques
- Markov Logic Networks (Richardson & Domingos, 2004)
- First-order logic + Markov networks
- Compactly represent large cliques
- Did not learn structure (used an external ILP system)
- This paper develops a fast algorithm that learns MLN structure
- Most powerful SRL learner to date
7. Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion
8. Markov Logic Networks
- A first-order KB is a set of hard constraints
- If a world violates even one formula, it has zero probability
- MLNs soften the constraints
- It is OK for a world to violate formulas
- The fewer formulas a world violates, the more probable it is
- Each formula is given a weight that reflects how strong a constraint it is
9. MLN Definition
- A Markov Logic Network (MLN) is a set of pairs (F, w), where
- F is a formula in first-order logic
- w is a real number
- Together with a finite set of constants, it defines a Markov network with
- One node for each grounding of each predicate in the MLN
- One feature for each grounding of each formula F in the MLN, with the corresponding weight w
10. Ground Markov Network
Formula (weight 2.7): AdvisedBy(S,P) ⇒ Student(S) ∧ Professor(P)
Constants: STAN, PEDRO
Ground predicates (one node each): AdvisedBy(STAN,STAN), AdvisedBy(STAN,PEDRO), AdvisedBy(PEDRO,STAN), AdvisedBy(PEDRO,PEDRO), Student(STAN), Student(PEDRO), Professor(STAN), Professor(PEDRO)
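To make the construction concrete, here is a minimal Python sketch (not from the paper) that enumerates the nodes of the ground network for the predicates and constants above:

```python
# A minimal sketch of how an MLN's predicates plus a set of constants
# yield the nodes of the ground Markov network.
from itertools import product

constants = ["STAN", "PEDRO"]
# Predicate name -> arity, as on the slide.
predicates = {"AdvisedBy": 2, "Student": 1, "Professor": 1}

# One node per grounding of each predicate.
nodes = [
    f"{pred}({','.join(args)})"
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
print(nodes)
# ['AdvisedBy(STAN,STAN)', 'AdvisedBy(STAN,PEDRO)', ..., 'Professor(PEDRO)']
```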
11-15. MLN Model
$P(X = x) = \frac{1}{Z} \exp\left(\sum_i w_i \, n_i(x)\right)$
- x: vector of value assignments to ground predicates
- Z: partition function; sums over all possible value assignments to ground predicates
- w_i: weight of ith formula
- n_i(x): # of true groundings of ith formula
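As a sanity check of the definition, the toy Python sketch below computes P(X = x) by brute force for a hypothetical world of three ground atoms and the single formula from the earlier slide; the feature count is a stand-in for illustration, not the paper's code:

```python
# Brute-force illustration of P(X=x) = (1/Z) exp(sum_i w_i n_i(x))
# on a hypothetical 3-atom world; only one grounding of the formula
# is checkable here, so n_1(x) is 0 or 1.
from itertools import product
from math import exp

atoms = ["Student(STAN)", "Professor(PEDRO)", "AdvisedBy(STAN,PEDRO)"]
weights = [2.7]  # one formula, weight from the earlier slide

def n_true_groundings(world):
    # AdvisedBy(STAN,PEDRO) => Student(STAN) ^ Professor(PEDRO):
    # the implication holds unless the antecedent is true and the
    # consequent is false.
    s, p, adv = world
    return 1 if (not adv) or (s and p) else 0

# Partition function Z: sum over all 2^3 truth assignments.
Z = sum(exp(weights[0] * n_true_groundings(w))
        for w in product([False, True], repeat=len(atoms)))

world = (True, True, True)
prob = exp(weights[0] * n_true_groundings(world)) / Z
print(f"P(x) = {prob:.4f}")
```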
16-18. MLN Weight Learning
- Likelihood is a concave function of the weights
- Quasi-Newton methods find the optimal weights
- e.g., L-BFGS (Liu & Nocedal, 1989)
- SLOW: evaluating the likelihood requires counting the # of true groundings of each formula (#P-complete)
- SLOW: computing its gradient requires inference over the ground network (also #P-complete)
19-20. MLN Weight Learning
- R&D instead optimized pseudo-likelihood (Besag, 1975):
$P^{\bullet}_w(X = x) = \prod_l P_w\big(X_l = x_l \mid MB_x(X_l)\big)$
- Each ground predicate X_l is conditioned only on its Markov blanket MB_x(X_l), so no inference over the full network is needed
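A minimal sketch of pseudo-likelihood optimization with L-BFGS, under assumed toy data: the precomputed per-atom feature deltas are hypothetical, and scipy's L-BFGS-B stands in for the L-BFGS of Liu & Nocedal:

```python
# Sketch: maximize pseudo-log-likelihood with L-BFGS (scipy minimizes,
# so we negate). For a log-linear model, P(X_l=1 | MB) = sigmoid(w . d_l),
# where d_l is the change in true-grounding counts when atom l flips.
import numpy as np
from scipy.optimize import minimize

# Hypothetical precomputed deltas: 3 ground atoms, 1 formula.
delta_n = np.array([[1.0], [2.0], [0.0]])
x = np.array([1, 1, 0])          # observed truth values of the atoms

def neg_pll(w):
    logits = delta_n @ w
    log_p_true = -np.logaddexp(0.0, -logits)   # log sigmoid(logits)
    log_p_false = -np.logaddexp(0.0, logits)   # log sigmoid(-logits)
    return -np.sum(np.where(x == 1, log_p_true, log_p_false))

result = minimize(neg_pll, x0=np.zeros(1), method="L-BFGS-B")
print("learned weight:", result.x)
```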
21. MLN Structure Learning
- R&D learned MLN structure in two disjoint steps:
- Learn first-order clauses with an off-the-shelf ILP system (CLAUDIEN; De Raedt & Dehaspe, 1997)
- Learn clause weights by optimizing pseudo-likelihood
- Unlikely to give the best results, because CLAUDIEN
- finds clauses that hold with some accuracy/frequency in the data
- doesn't find clauses that maximize the data's (pseudo-)likelihood
22. Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion
23. MLN Structure Learning
- This paper develops an algorithm that
- Learns first-order clauses by directly optimizing pseudo-likelihood
- Is fast enough to be practical
- Performs better than R&D, pure ILP, purely KB, and purely probabilistic approaches
24. Structure Learning Algorithm
- High-level algorithm
- REPEAT
-   MLN ← MLN ∪ FindBestClauses(MLN)
- UNTIL FindBestClauses(MLN) returns NULL
- FindBestClauses(MLN)
-   Create candidate clauses
-   FOR EACH candidate clause c
-     Compute increase in evaluation measure of adding c to MLN
-   RETURN k clauses with greatest increase
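The same loop in Python, with candidate_clauses() and score() (e.g., WPLL after re-learning weights) left as assumed stubs; this is a sketch of the slide's pseudocode, not the authors' implementation:

```python
# Structural sketch of the high-level algorithm above.
def learn_structure(mln, data, k=1):
    """Greedily grow the MLN until no candidate clause improves the score."""
    while True:
        best = find_best_clauses(mln, data, k)
        if not best:                 # FindBestClauses returned NULL
            return mln
        mln = mln + best             # MLN <- MLN U FindBestClauses(MLN)

def find_best_clauses(mln, data, k):
    """Score each candidate; return the k clauses with the largest gains."""
    base = score(mln, data)          # assumed stub: evaluation measure
    gains = [(score(mln + [c], data) - base, c)
             for c in candidate_clauses(mln)]  # assumed stub
    gains.sort(key=lambda g: g[0], reverse=True)
    return [c for gain, c in gains[:k] if gain > 0]
```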
25. Structure Learning
- Evaluation measure
- Clause construction operators
- Search strategies
- Speedup techniques
26-30. Evaluation Measure
- R&D used pseudo-log-likelihood:
$\log P^{\bullet}_w(X = x) = \sum_l \log P_w\big(X_l = x_l \mid MB_x(X_l)\big)$
- This gives undue weight to predicates with a large # of groundings
- Weighted pseudo-log-likelihood (WPLL):
$\log P^{\bullet}_w(X = x) = \sum_{r \in R} c_r \sum_{k=1}^{g_r} \log P_w\big(X_{r,k} = x_{r,k} \mid MB_x(X_{r,k})\big)$
- c_r: weight given to predicate r
- the inner sum is over the g_r groundings of predicate r
- each term is a CLL: conditional log-likelihood
- Plus a Gaussian weight prior and a structure prior
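A hedged sketch of the WPLL computation with a Gaussian weight prior; the per-grounding CLLs are assumed precomputed (the expensive part in practice), and defaulting c_r to 1/g_r, which equalizes each predicate's influence, is one natural choice:

```python
# Sketch: WPLL = sum_r c_r * sum_k CLL(r, k), plus the log of a
# zero-mean Gaussian prior on the clause weights.
import numpy as np

def wpll(cll_by_pred, weights, c_r=None, sigma=1.0):
    """cll_by_pred: {predicate: array of CLLs, one per grounding}.
    c_r: optional {predicate: weight}; defaults to 1 / (# groundings)."""
    total = 0.0
    for pred, clls in cll_by_pred.items():
        cr = (1.0 / len(clls)) if c_r is None else c_r[pred]
        total += cr * np.sum(clls)
    # Gaussian prior contributes -sum_i w_i^2 / (2 sigma^2) in log space.
    total -= np.sum(np.asarray(weights) ** 2) / (2 * sigma ** 2)
    return total
```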
31. Clause Construction Operators
- Add a literal (negative/positive)
- Remove a literal
- Flip signs of literals
- Limit the # of distinct variables to restrict the search space
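An illustrative sketch of the three operators on a clause represented as a list of (sign, (predicate, args)) literals; this representation is invented here for illustration:

```python
# The three clause-construction operators, each returning a new clause.
def add_literal(clause, literal, sign=True):
    return clause + [(sign, literal)]

def remove_literal(clause, i):
    return clause[:i] + clause[i + 1:]

def flip_sign(clause, i):
    sign, lit = clause[i]
    return clause[:i] + [(not sign, lit)] + clause[i + 1:]

def num_distinct_vars(clause):
    # Candidate clauses contain only variables; the search space is
    # restricted by capping this count.
    return len({v for _, (_, args) in clause for v in args})

# Clause: Student(S) v !AdvisedBy(S,P)
clause = [(True, ("Student", ("S",))), (False, ("AdvisedBy", ("S", "P")))]
print(num_distinct_vars(clause))  # 2
```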
32. Beam Search
- Same as that used in ILP rule induction
- Repeatedly find the single best clause
33. Shortest-First Search (SFS)
- Start from an empty or hand-coded MLN
- FOR L ← 1 TO MAX_LENGTH
-   Apply each literal addition and deletion to each clause to create clauses of length L
-   Repeatedly add the K best clauses of length L to the MLN until no clause of length L improves WPLL
- Similar to Della Pietra et al. (1997) and McCallum (2003)
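A compact sketch of SFS under the same assumed stubs as before (candidate generation and WPLL scoring); clause length is taken to be its number of literals:

```python
# Sketch of shortest-first search: grow the MLN length by length.
def shortest_first_search(mln, data, max_length, k):
    for length in range(1, max_length + 1):
        # Clauses of length L reachable by one literal addition/deletion.
        candidates = [c for c in candidate_clauses(mln) if len(c) == length]
        improved = True
        while improved:
            base = score(mln, data)      # assumed stub: WPLL
            gains = [(score(mln + [c], data) - base, i)
                     for i, c in enumerate(candidates)]
            gains.sort(key=lambda g: g[0], reverse=True)
            best = [candidates[i] for gain, i in gains[:k] if gain > 0]
            improved = bool(best)
            mln = mln + best             # add the K best length-L clauses
    return mln
```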
34-37. Speedup Techniques
- FindBestClauses(MLN)
-   Create candidate clauses (SLOW: many candidates)
-   FOR EACH candidate clause c
-     Compute increase in WPLL (using L-BFGS, which is not that fast) of adding c to MLN (SLOW: many CLLs, and each CLL involves a #P-complete counting problem)
-   RETURN k clauses with greatest increase
38-43. Speedup Techniques
- Clause Sampling
- Predicate Sampling
- Avoid Redundancy
- Loose Convergence Thresholds
- Ignore Unrelated Clauses
- Weight Thresholding
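As an illustration of the two sampling speedups, the sketch below (not from the paper's code) estimates a clause's true-grounding count from a uniform subsample and estimates WPLL from a subsample of each predicate's groundings:

```python
# Sketch of clause sampling and predicate sampling via uniform subsampling.
import random

def sampled_true_groundings(groundings, is_true, sample_size=1000):
    """Unbiased estimate of the # of true groundings from a subsample."""
    if len(groundings) <= sample_size:
        return sum(map(is_true, groundings))
    sample = random.sample(groundings, sample_size)
    return sum(map(is_true, sample)) * len(groundings) / sample_size

def sampled_wpll(groundings_by_pred, cll, sample_size=500):
    """Estimate WPLL (with c_r = 1/g_r) from at most sample_size
    groundings per predicate: the sample mean CLL estimates the
    population mean CLL."""
    total = 0.0
    for pred, gs in groundings_by_pred.items():
        sample = gs if len(gs) <= sample_size else random.sample(gs, sample_size)
        total += (1.0 / len(sample)) * sum(cll(pred, g) for g in sample)
    return total
```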
44. Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion
45. Experiments
- UW-CSE domain
- 22 predicates, e.g., AdvisedBy(X,Y), Student(X), etc.
- 10 types, e.g., Person, Course, Quarter, etc.
- # of ground predicates ≈ 4 million
- # of true ground predicates ≈ 3000
- Hand-crafted KB with 94 formulas
- e.g., each student has at most one advisor; if a student is an author of a paper, so is her advisor
- Cora domain
- Computer science research papers
- Collective deduplication of author, venue, title
46. Systems
- MLN(SLB): structure learning with beam search
- MLN(SLS): structure learning with SFS
47-49. Systems
- ILP and KB systems: KB (hand-coded KB), CL (CLAUDIEN), FO (FOIL), AL (Aleph)
- Each combined with MLN weight learning: MLN(KB), MLN(CL), MLN(FO), MLN(AL)
- Purely probabilistic: NB (Naïve Bayes), BN (Bayesian networks)
50. Methodology
- UW-CSE domain
- DB divided into 5 areas: AI, Graphics, Languages, Systems, Theory
- Leave-one-out testing by area
- Measured
- average CLL of the ground predicates
- average area under the precision-recall curve of the ground predicates (AUC)
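For reference, a sketch of the two metrics computed from predicted probabilities over test ground predicates, assuming scikit-learn is available:

```python
# Average conditional log-likelihood and precision-recall AUC.
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def avg_cll(y_true, p_pred, eps=1e-6):
    """Average conditional log-likelihood of the observed truth values."""
    p = np.clip(p_pred, eps, 1 - eps)
    return np.mean(np.where(y_true == 1, np.log(p), np.log(1 - p)))

def pr_auc(y_true, p_pred):
    """Area under the precision-recall curve."""
    precision, recall, _ = precision_recall_curve(y_true, p_pred)
    return auc(recall, precision)
```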
51-54. UW-CSE Results
[Bar charts of test-set CLL and AUC comparing MLN(SLB), MLN(SLS), MLN(CL), MLN(FO), MLN(AL), MLN(KB), CL, FO, AL, and KB]
55. UW-CSE Results
[Bar charts of test-set CLL and AUC comparing MLN(SLS), MLN(SLB), NB, and BN]
56. Timing
- MLN(SLS) on UW-CSE
- Cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines
- Without speedups: did not finish in 24 hrs
- With speedups: 5.3 hrs
57. Lesion Study
- Disable one speedup technique at a time; SFS on UW-CSE (one fold)
[Bar chart of runtime in hours for: all speedups, no clause sampling, no weight thresholding, no predicate sampling, don't avoid redundancy, no loose convergence threshold]
58. Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion
59. Future Work
- Speed up counting of the true groundings of a clause
- Probabilistically bound the loss in accuracy due to subsampling
- Probabilistic predicate discovery
60. Conclusion
- Markov logic networks: a powerful combination of first-order logic and probability
- Richardson & Domingos (2004) did not learn MLN structure
- We develop an algorithm that automatically learns both first-order clauses and their weights
- We develop speedup techniques to make our algorithm fast enough to be practical
- We show experimentally that our algorithm outperforms
- Richardson & Domingos
- Pure ILP
- Purely KB approaches
- Purely probabilistic approaches
- (For software, email koks@cs.washington.edu)