Learning the Structure of Markov Logic Networks - PowerPoint PPT Presentation

1
Learning the Structure of Markov Logic Networks

Stanley Kok, Pedro Domingos
Dept. of Computer Science and Eng.
University of Washington
2
Overview
  • Motivation
  • Background
  • Structure Learning Algorithm
  • Experiments
  • Future Work & Conclusion

3
Motivation
  • Statistical Relational Learning (SRL)
  • combines the benefits of
  • Statistical Learning: uses probability to handle
    uncertainty in a robust and principled way
  • Relational Learning: models domains with multiple
    relations

4
Motivation
  • Many SRL approaches combine a logical language
    and Bayesian networks
  • e.g., Probabilistic Relational Models
    [Friedman et al., 1999]
  • The need to avoid cycles in Bayesian networks
    causes many difficulties [Taskar et al., 2002]
  • Started using Markov networks instead

6
Motivation
  • Relational Markov Networks [Taskar et al., 2002]
  • conjunctive database queries + Markov networks
  • Require space exponential in the size of the
    cliques
  • Markov Logic Networks [Richardson & Domingos,
    2004]
  • First-order logic + Markov networks
  • Compactly represent large cliques
  • Did not learn structure (used external ILP
    system)
  • This paper develops a fast algorithm that
  • learns MLN structure
  • Most powerful SRL learner to date

7
Overview
  • Motivation
  • Background
  • Structure Learning Algorithm
  • Experiments
  • Future Work & Conclusion

8
Markov Logic Networks
  • First-order KB = set of hard constraints
  • If a world violates even one formula, it has
    zero probability
  • MLNs soften constraints
  • OK to violate formulas
  • The fewer formulas a world violates,
    the more probable it is
  • Gives each formula a weight that
    reflects how strong a constraint it is

9
MLN Definition
  • A Markov Logic Network (MLN) is a set of pairs
    (F, w) where
  • F is a formula in first-order logic
  • w is a real number
  • Together with a finite set of constants, it
    defines a Markov network with
  • One node for each grounding of each predicate
    in the MLN
  • One feature for each grounding of each formula F
    in the MLN, with the corresponding weight w

10
Ground Markov Network
Formula with weight 2.7:
AdvisedBy(S,P) ⇒ Student(S) ∧ Professor(P)
Constants: STAN, PEDRO
Ground predicates (one node per grounding):
AdvisedBy(STAN,STAN)
Student(STAN)
Professor(STAN)
AdvisedBy(STAN,PEDRO)
AdvisedBy(PEDRO,STAN)
Professor(PEDRO)
Student(PEDRO)
AdvisedBy(PEDRO,PEDRO)
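
To make the grounding concrete, here is a minimal Python sketch (illustrative only, not the authors' implementation) that enumerates the 8 nodes and 4 ground features of this example:

```python
from itertools import product

# Minimal sketch of laying out a ground Markov network:
# one node per grounding of each predicate, one feature per
# grounding of each formula. Representation is illustrative.
constants = ["STAN", "PEDRO"]
predicates = {"Student": 1, "Professor": 1, "AdvisedBy": 2}  # name -> arity

# One node for each grounding of each predicate: 2 + 2 + 4 = 8 nodes.
nodes = [f"{name}({','.join(args)})"
         for name, arity in predicates.items()
         for args in product(constants, repeat=arity)]

# One feature for each grounding of the example formula,
# AdvisedBy(S,P) => Student(S) ^ Professor(P), each with weight 2.7.
weight = 2.7
features = [(weight, f"AdvisedBy({s},{p}) => Student({s}) ^ Professor({p})")
            for s, p in product(constants, repeat=2)]

print(len(nodes), "nodes;", len(features), "features")  # 8 nodes; 4 features
```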
11
MLN Model

P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big)

x: vector of value assignments to ground predicates
Z: partition function; sums over all possible value
   assignments to ground predicates
w_i: weight of ith formula
n_i(x): # of true groundings of ith formula
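
For intuition, here is a brute-force sketch of this distribution for the tiny example above; it is illustrative only (real MLNs never enumerate all 2^N worlds), with the single formula and weight taken from the grounding slide:

```python
import itertools
import math

constants = ["STAN", "PEDRO"]
atoms = ([f"Student({c})" for c in constants]
         + [f"Professor({c})" for c in constants]
         + [f"AdvisedBy({s},{p})" for s in constants for p in constants])
w = 2.7  # weight of the single formula

def n_true(world):
    """# of true groundings of AdvisedBy(S,P) => Student(S) ^ Professor(P)."""
    return sum((not world[f"AdvisedBy({s},{p})"])
               or (world[f"Student({s})"] and world[f"Professor({p})"])
               for s in constants for p in constants)

# The partition function Z sums over all 2^8 = 256 possible worlds.
worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(math.exp(w * n_true(x)) for x in worlds)

def prob(world):
    return math.exp(w * n_true(world)) / Z  # P(X = x)
```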
16
MLN Weight Learning
  • Likelihood is a concave function of the weights
  • Quasi-Newton methods to find optimal weights
  • e.g., L-BFGS [Liu & Nocedal, 1989]

\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]

n_i(x): counting true groundings is SLOW, #P-complete
E_w[n_i(x)]: computing the expectation requires inference,
             also SLOW, #P-complete
19
MLN Weight Learning
  • R&D (Richardson & Domingos) used
    pseudo-likelihood [Besag, 1975]

PL_w(x) = \prod_l P_w(X_l = x_l \mid MB_x(X_l))

X_l: lth ground predicate
MB_x(X_l): state of X_l's Markov blanket in x

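A sketch of pseudo-likelihood on the same toy model (it reuses atoms, w, and n_true from the brute-force sketch): conditioning each ground predicate on the rest of the world makes the partition function Z cancel, which is exactly why pseudo-likelihood is cheap:

```python
import math

def pseudo_log_likelihood(world):
    """log PL(x) for the toy MLN above."""
    pll = 0.0
    for atom in atoms:
        flipped = dict(world)
        flipped[atom] = not world[atom]
        s_obs = w * n_true(world)     # score with the atom as observed
        s_flip = w * n_true(flipped)  # score with the atom flipped
        # P(X_l = x_l | rest) = exp(s_obs) / (exp(s_obs) + exp(s_flip));
        # the partition function Z cancels in this ratio.
        pll += s_obs - math.log(math.exp(s_obs) + math.exp(s_flip))
    return pll
```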
21
MLN Structure Learning
  • R&D learned MLN structure in two disjoint
    steps
  • Learn first-order clauses with an off-the-shelf
    ILP system (CLAUDIEN [De Raedt & Dehaspe, 1997])
  • Learn clause weights by optimizing
    pseudo-likelihood
  • Unlikely to give best results because CLAUDIEN
  • finds clauses that hold with some
    accuracy/frequency in the data
  • doesn't find clauses that maximize the data's
    (pseudo-)likelihood

22
Overview
  • Motivation
  • Background
  • Structure Learning Algorithm
  • Experiments
  • Future Work & Conclusion

23
MLN Structure Learning
  • This paper develops an algorithm that
  • Learns first-order clauses by directly optimizing
    pseudo-likelihood
  • Is fast enough to be practical
  • Performs better than R&D, pure ILP,
    purely KB and purely probabilistic approaches

24
Structure Learning Algorithm
  • High-level algorithm
  • REPEAT
  • MLN ← MLN ∪ FindBestClauses(MLN)
  • UNTIL FindBestClauses(MLN) returns NULL
  • FindBestClauses(MLN)
  • Create candidate clauses
  • FOR EACH candidate clause c
  • Compute increase in evaluation measure
  • of adding c to MLN
  • RETURN k clauses with greatest increase
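
The loop above as a short Python sketch; create_candidate_clauses, gain, and add_clauses are hypothetical stand-ins for the subroutines named on the slide, not a real API:

```python
def find_best_clauses(mln, data, k):
    candidates = create_candidate_clauses(mln)  # hypothetical helper
    scored = [(gain(mln, c, data), c) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    # Keep only clauses that actually increase the evaluation measure.
    return [c for g, c in scored[:k] if g > 0]

def learn_structure(mln, data, k=1):
    # REPEAT ... UNTIL FindBestClauses(MLN) returns NULL
    while True:
        best = find_best_clauses(mln, data, k)
        if not best:
            return mln
        mln = add_clauses(mln, best)  # MLN <- MLN U best clauses
```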

25
Structure Learning
  • Evaluation measure
  • Clause construction operators
  • Search strategies
  • Speedup techniques

26
Evaluation Measure
  • R&D used pseudo-log-likelihood
  • This gives undue weight to predicates with a
    large # of groundings

27
Evaluation Measure
  • Weighted pseudo-log-likelihood (WPLL)

WPLL_w(x) = \sum_r c_r \sum_{g \in G_r} \log P_w(X_g = x_g \mid MB_x(X_g))

c_r: weight given to predicate r
\sum_{g \in G_r}: sums over the groundings of predicate r
\log P_w(X_g = x_g \mid MB_x(X_g)): CLL, conditional log-likelihood
  • Gaussian weight prior
  • Structure prior
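
Continuing the toy sketch (it reuses atoms, w, and n_true from the earlier blocks), under the assumption that c_r = 1 / (# of groundings of r), one natural choice that equalizes predicate influence:

```python
import math
from collections import defaultdict

def weighted_pll(world):
    """WPLL for the toy model; c_r = 1 / (# of groundings of predicate r)."""
    by_pred = defaultdict(list)
    for atom in atoms:
        by_pred[atom.split("(")[0]].append(atom)  # group by predicate r
    wpll = 0.0
    for r, groundings in by_pred.items():
        c_r = 1.0 / len(groundings)    # weight given to predicate r
        for atom in groundings:        # sums over groundings of r
            flipped = dict(world)
            flipped[atom] = not world[atom]
            s, s_f = w * n_true(world), w * n_true(flipped)
            # CLL of this grounding given the rest of the world
            wpll += c_r * (s - math.log(math.exp(s) + math.exp(s_f)))
    return wpll
```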
31
Clause Construction Operators
  • Add a literal (negative/positive)
  • Remove a literal
  • Flip signs of literals
  • Limit # of distinct variables to restrict search
    space
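
A hypothetical sketch of the three operators, representing a clause as a list of (sign, literal) pairs; candidate clauses whose # of distinct variables exceeds the cap would then be filtered out before scoring:

```python
def add_literal(clause, literal_pool):
    # Try each pool literal with each sign (positive and negative).
    return [clause + [(sign, lit)]
            for lit in literal_pool for sign in (+1, -1)]

def remove_literal(clause):
    # Drop each literal in turn.
    return [clause[:i] + clause[i+1:] for i in range(len(clause))]

def flip_signs(clause):
    # Flip the sign of one literal at a time.
    return [clause[:i] + [(-s, lit)] + clause[i+1:]
            for i, (s, lit) in enumerate(clause)]
```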


32
Beam Search
  • Same as that used in ILP rule induction
  • Repeatedly find the single best clause


33
Shortest-First Search (SFS)
  • Start from empty or hand-coded MLN
  • FOR L ← 1 TO MAX_LENGTH
  • Apply each literal addition & deletion to
    each clause to create clauses of length L
  • Repeatedly add K best clauses of length L
    to the MLN until no clause of length L
    improves WPLL
  • Similar to Della Pietra et al. (1997),
    McCallum (2003)
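
A hedged sketch of SFS; add_literal and remove_literal are the operators sketched earlier, while gain, add_clauses, and clauses_of are hypothetical helpers:

```python
def shortest_first_search(mln, data, literal_pool, max_length, K=1):
    for L in range(1, max_length + 1):
        # Apply each literal addition & deletion to each clause,
        # keeping the results of length exactly L.
        candidates = [c2 for c in clauses_of(mln)
                      for c2 in add_literal(c, literal_pool) + remove_literal(c)
                      if len(c2) == L]
        while True:  # add K best length-L clauses while any improves WPLL
            scored = sorted(((gain(mln, c, data), c) for c in candidates),
                            key=lambda t: t[0], reverse=True)
            best = [c for g, c in scored[:K] if g > 0]
            if not best:
                break
            mln = add_clauses(mln, best)
    return mln
```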


34
Speedup Techniques
  • FindBestClauses(MLN)
  • Creates candidate clauses
    (SLOW: many candidates)
  • FOR EACH candidate clause c
  • Compute increase in WPLL (using L-BFGS)
    of adding c to MLN
    (NOT THAT FAST: L-BFGS; SLOW: many CLLs,
     and each CLL involves a #P-complete problem)
  • RETURN k clauses with greatest increase

38
Speedup Techniques
  • Clause Sampling
  • Predicate Sampling
  • Avoid Redundancy
  • Loose Convergence Thresholds
  • Ignore Unrelated Clauses
  • Weight Thresholding

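Two of these techniques admit a short sketch under assumed semantics: clause sampling estimates a clause's true-grounding count from a random subsample, and predicate sampling evaluates the WPLL on a random subsample of ground predicates:

```python
import random

def sampled_true_groundings(groundings, is_true, sample_size=1000):
    # Clause sampling: estimate n_i(x) from a subset of groundings
    # instead of counting all of them, then scale the estimate back up.
    sample = random.sample(groundings, min(sample_size, len(groundings)))
    frac_true = sum(is_true(g) for g in sample) / len(sample)
    return frac_true * len(groundings)

def sampled_wpll_atoms(atoms, sample_size=10000):
    # Predicate sampling: compute the WPLL over a random subset of
    # ground predicates rather than all of them.
    return random.sample(atoms, min(sample_size, len(atoms)))
```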
44
Overview
  • Motivation
  • Background
  • Structure Learning Algorithm
  • Experiments
  • Future Work & Conclusion

45
Experiments
  • UW-CSE domain
  • 22 predicates, e.g., AdvisedBy(X,Y), Student(X),
    etc.
  • 10 types, e.g., Person, Course, Quarter, etc.
  • # of ground predicates ≈ 4 million
  • # of true ground predicates ≈ 3000
  • Handcrafted KB with 94 formulas, e.g.:
  • Each student has at most one advisor
  • If a student is an author of a paper, so is her
    advisor
  • Cora domain
  • Computer science research papers
  • Collective deduplication of author, venue, title

46
Systems
  • MLN(SLB): structure learning with beam search
  • MLN(SLS): structure learning with SFS
  • KB: hand-coded KB    CL: CLAUDIEN
    FO: FOIL             AL: Aleph
  • MLN(KB), MLN(CL), MLN(FO), MLN(AL):
    MLNs with weights learned for the corresponding
    clause sets
  • NB: Naïve Bayes      BN: Bayesian networks

50
Methodology
  • UW-CSE domain
  • DB divided into 5 areas
  • AI, Graphics, Languages, Systems, Theory
  • Leave-one-out testing by area
  • Measured
  • average CLL of the ground predicates
  • average area under the precision-recall curve of
    the ground predicates (AUC)

51
UW-CSE
[Charts: CLL and AUC on UW-CSE for MLN(SLB), MLN(SLS), MLN(CL),
MLN(FO), MLN(AL), MLN(KB), CL, FO, AL, and KB]
55
UW-CSE
[Charts: CLL and AUC on UW-CSE for MLN(SLS), MLN(SLB), NB, and BN]
56
Timing
  • MLN(SLS) on UW-CSE
  • Cluster of 15 dual-CPU 2.8 GHz Pentium 4
    machines
  • Without speedups: did not finish in 24 hrs
  • With speedups: 5.3 hrs

57
Lesion Study
  • Disable one speedup technique at a time (SFS)

[Chart: hours on one fold of UW-CSE for: all speedups,
no clause sampling, no weight thresholding, no predicate sampling,
don't avoid redundancy, no loose convergence threshold]
58
Overview
  • Motivation
  • Background
  • Structure Learning Algorithm
  • Experiments
  • Future Work & Conclusion

59
Future Work
  • Speed up counting of true groundings of a clause
  • Probabilistically bound the loss in accuracy due
    to subsampling
  • Probabilistic predicate discovery

60
Conclusion
  • Markov logic networks: a powerful combination
    of first-order logic and probability
  • Richardson & Domingos (2004) did not learn
    MLN structure
  • We develop an algorithm that automatically learns
    both first-order clauses and their weights
  • We develop speedup techniques to make our
    algorithm fast enough to be practical
  • We show experimentally that our algorithm
    outperforms
  • Richardson & Domingos
  • Pure ILP
  • Purely KB approaches
  • Purely probabilistic approaches
  • (For software, email koks@cs.washington.edu)