Title: Machine Learning for the Web: A Unified View
1 Machine Learning for the Web: A Unified View
- Pedro Domingos
- Dept. of Computer Science & Engineering
- University of Washington
- Includes joint work with Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner, and Jue Wang
2 Overview
- Motivation
- Background
- Markov logic
- Inference
- Learning
- Software
- Applications
- Discussion
3 Web Learning Problems
- Hypertext classification
- Search ranking
- Personalization
- Recommender systems
- Wrapper induction
- Information extraction
- Information integration
- Deep Web
- Semantic Web
- Ad placement
- Content selection
- Auctions
- Social networks
- Mass collaboration
- Spam filtering
- Reputation systems
- Performance optimization
- Etc.
4 Machine Learning Solutions
- Naïve Bayes
- Logistic regression
- Max. entropy models
- Bayesian networks
- Markov random fields
- Log-linear models
- Exponential models
- Gibbs distributions
- Boltzmann machines
- ERGMs
- Hidden Markov models
- Cond. random fields
- SVMs
- Neural networks
- Decision trees
- K-nearest neighbor
- K-means clustering
- Mixture models
- LSI
- Etc.
5 How Do We Make Sense of This?
- Does a practitioner have to learn all the algorithms?
- And figure out which one to use each time?
- And which variations to try?
- And how to frame the problem as ML?
- And how to incorporate his/her knowledge?
- And how to glue the pieces together?
- And start from scratch each time?
- There must be a better way
6 Characteristics of Web Problems
- Samples are not i.i.d. (objects depend on each other)
- Objects have lots of structure (or none at all)
- Multiple problems are tied together
- Massive amounts of data (but unlabeled)
- Rapid change
- Too many opportunities . . . and not enough experts
7 We Need a Language
- That allows us to easily define standard models
- That provides a common framework
- That is automatically compiled into learning and inference code that executes efficiently
- That makes it easy to encode practitioners' knowledge
- That allows models to be composed and reused
8 Markov Logic
- Syntax: Weighted first-order formulas
- Semantics: Templates for Markov nets
- Inference: Lifted belief propagation, etc.
- Learning: Voted perceptron, pseudo-likelihood, inductive logic programming
- Software: Alchemy
- Applications: Information extraction, text mining, social networks, etc.
9 Overview (same outline as slide 2)
10 Markov Networks
- Undirected graphical models
- (figure: network over Smoking, Cancer, Cough, Asthma)
- Potential functions defined over cliques

  Smoking  Cancer  φ(S,C)
  False    False   4.5
  False    True    4.5
  True     False   2.7
  True     True    4.5
11 Markov Networks
- Undirected graphical models
- (figure: the same network over Smoking, Cancer, Cough, Asthma)
- Log-linear form: P(x) = (1/Z) exp( Σ_i w_i f_i(x) ), where w_i is the weight of feature i and f_i(x) is feature i (see the sketch below)
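A minimal sketch (illustrative code, not from the slides) of the two equivalent views above, treating the Smoking/Cancer clique as the whole model: multiplying the potential from the slide-10 table gives the same distribution as exponentiating weighted features with w = log φ.

```python
# Illustrative sketch (not from the slides): the Smoking/Cancer clique treated
# as the whole model, showing that the product-of-potentials view and the
# log-linear view (weights w = log phi) define the same distribution.
from itertools import product
from math import exp, log

phi = {(False, False): 4.5, (False, True): 4.5,
       (True, False): 2.7, (True, True): 4.5}   # potential phi(S,C) from slide 10

w = {sc: log(v) for sc, v in phi.items()}       # one indicator feature per (S,C) state

states = list(product([False, True], repeat=2))
Z = sum(phi[s] for s in states)
for s in states:
    p_potential = phi[s] / Z                    # product-of-potentials form
    p_loglinear = exp(w[s]) / Z                 # exp-of-weighted-features form
    assert abs(p_potential - p_loglinear) < 1e-12
    print(f"P(Smoking={s[0]}, Cancer={s[1]}) = {p_potential:.3f}")
```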
12 First-Order Logic
- Symbols: constants, variables, functions, predicates. E.g.: Anna, x, MotherOf(x), Friends(x, y)
- Logical connectives: conjunction, disjunction, negation, implication, quantification, etc.
- Grounding: replace all variables by constants. E.g.: Friends(Anna, Bob) (see the sketch below)
- World: assignment of truth values to all ground atoms
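A tiny sketch (illustrative, not Alchemy) of grounding: substitute every combination of constants for a predicate's variables.

```python
# Illustrative sketch of grounding: substitute every combination of constants
# for the variables of a predicate (here Friends/2 over {Anna, Bob}).
from itertools import product

def ground(predicate, arity, constants):
    return [f"{predicate}({','.join(args)})" for args in product(constants, repeat=arity)]

print(ground("Friends", 2, ["Anna", "Bob"]))
# ['Friends(Anna,Anna)', 'Friends(Anna,Bob)', 'Friends(Bob,Anna)', 'Friends(Bob,Bob)']
```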
13-15 Example: Friends & Smokers (figure-only slides; content not recovered in this text)
16 Overview (same outline as slide 2)
17 Markov Logic
- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight → stronger constraint)
18 Definition
- A Markov Logic Network (MLN) is a set of pairs (F, w) where
  - F is a formula in first-order logic
  - w is a real number
- Together with a set of constants, it defines a Markov network with
  - One node for each grounding of each predicate in the MLN
  - One feature for each grounding of each formula F in the MLN, with the corresponding weight w (see the size sketch below)
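A small sketch (illustrative) of the resulting network size: one node per predicate grounding and one feature per formula grounding, so counts grow as |C|^arity and |C|^#variables. The predicates and formulas below are the standard Friends & Smokers ones, used only as an example.

```python
# Illustrative sketch of the ground network size an MLN defines: one node per
# predicate grounding and one feature per formula grounding. The predicates and
# formulas are the standard Friends & Smokers example, not taken from a slide.
def ground_network_size(predicates, formulas, n_constants):
    """predicates: {name: arity}; formulas: {formula: number of distinct variables}."""
    nodes = {p: n_constants ** arity for p, arity in predicates.items()}
    features = {f: n_constants ** n_vars for f, n_vars in formulas.items()}
    return nodes, features

nodes, features = ground_network_size(
    predicates={"Smokes": 1, "Cancer": 1, "Friends": 2},
    formulas={"Smokes(x) => Cancer(x)": 1,
              "Friends(x,y) => (Smokes(x) <=> Smokes(y))": 2},
    n_constants=2)                       # constants: Anna (A), Bob (B)
print(nodes)      # {'Smokes': 2, 'Cancer': 2, 'Friends': 4}  ->  8 ground atoms
print(features)   # 2 groundings of the first formula, 4 of the second
```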
19 Example: Friends & Smokers
20 Example: Friends & Smokers
- Two constants: Anna (A) and Bob (B)
21 Example: Friends & Smokers
- Two constants: Anna (A) and Bob (B)
- Ground atoms so far: Smokes(A), Smokes(B), Cancer(A), Cancer(B)
22 Example: Friends & Smokers
- Two constants: Anna (A) and Bob (B)
- Ground atoms: Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)
23-24 Example: Friends & Smokers (same ground atoms as slide 22; the added network structure was figure-only)
25 Markov Logic Networks
- An MLN is a template for ground Markov nets
- Probability of a world x: P(x) = (1/Z) exp( Σ_i w_i n_i(x) ), where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x (see the sketch below)
- Typed variables and constants greatly reduce the size of the ground Markov net
- Functions, existential quantifiers, etc.
- Infinite and continuous domains
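A brute-force sketch (illustrative, not Alchemy) of this distribution for the Friends & Smokers MLN over constants A and B. The two formulas are the standard ones for this example; the weights 1.5 and 1.1 are assumptions for illustration.

```python
# Brute-force sketch (illustrative, not Alchemy) of the MLN distribution for
# Friends & Smokers with constants A and B. Formulas are the standard ones for
# this example; the weights 1.5 and 1.1 are assumptions for illustration.
from itertools import product
from math import exp

CONSTS = ["A", "B"]
ATOMS = ([("Smokes", (x,)) for x in CONSTS] +
         [("Cancer", (x,)) for x in CONSTS] +
         [("Friends", (x, y)) for x in CONSTS for y in CONSTS])

def n1(w):  # true groundings of  Smokes(x) => Cancer(x)
    return sum((not w[("Smokes", (x,))]) or w[("Cancer", (x,))] for x in CONSTS)

def n2(w):  # true groundings of  Friends(x,y) => (Smokes(x) <=> Smokes(y))
    return sum((not w[("Friends", (x, y))]) or (w[("Smokes", (x,))] == w[("Smokes", (y,))])
               for x in CONSTS for y in CONSTS)

W1, W2 = 1.5, 1.1                        # assumed formula weights

def weight(w):                           # unnormalized probability exp(sum_i w_i n_i(x))
    return exp(W1 * n1(w) + W2 * n2(w))

worlds = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS))]
Z = sum(weight(w) for w in worlds)
p_cancer_a = sum(weight(w) for w in worlds if w[("Cancer", ("A",))]) / Z
print(round(p_cancer_a, 3))              # marginal probability that A has cancer
```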
26 Relation to Statistical Models
- Special cases
- Markov networks
- Markov random fields
- Bayesian networks
- Log-linear models
- Exponential models
- Max. entropy models
- Gibbs distributions
- Boltzmann machines
- Logistic regression
- Hidden Markov models
- Conditional random fields
- Markov logic allows objects to be interdependent (non-i.i.d.)
- Markov logic makes it easy to combine and reuse these models
27 Relation to First-Order Logic
- Infinite weights → first-order logic
- Satisfiable KB, positive weights → satisfying assignments = modes of distribution
- Markov logic allows contradictions between formulas
28 Overview (same outline as slide 2)
29 Inference
- MAP/MPE state
  - MaxWalkSAT
  - LazySAT
- Marginal and conditional probabilities
  - MCMC: Gibbs, MC-SAT, etc.
  - Knowledge-based model construction
  - Lifted belief propagation
30 Inference (repeat of slide 29)
31 Lifted Inference
- We can do inference in first-order logic without grounding the KB (e.g., resolution)
- Let's do the same for inference in MLNs
- Group atoms and clauses into indistinguishable sets
- Do inference over those
- First approach: lifted variable elimination (not practical)
- Here: lifted belief propagation
32 Belief Propagation
- (figure: factor graph with feature nodes f and variable nodes x)
33 Lifted Belief Propagation
- (figure: feature nodes f and variable nodes x)
34 Lifted Belief Propagation
- (figure: feature nodes f and variable nodes x)
35Lifted Belief Propagation
?,? Functions of edge counts
?
?
Features (f)
Nodes (x)
36Lifted Belief Propagation
- Form lifted network composed of supernodesand
superfeatures - Supernode Set of ground atoms that all send
andreceive same messages throughout BP - Superfeature Set of ground clauses that all send
and receive same messages throughout BP - Run belief propagation on lifted network
- Guaranteed to produce same results as ground BP
- Time and memory savings can be huge
37 Forming the Lifted Network
- 1. Form initial supernodes: one per predicate and truth value (true, false, unknown)
- 2. Form superfeatures by doing joins of their supernodes
- 3. Form supernodes by projecting superfeatures down to their predicates. Supernode: groundings of a predicate with the same number of projections from each superfeature
- 4. Repeat until convergence (a simplified sketch of this refinement loop follows below)
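A simplified sketch (not the Alchemy implementation) of steps 1-4: evidence and truth values are ignored, and atoms are repeatedly regrouped by the roles they play in ground clauses, which is what makes their BP messages identical.

```python
# Simplified sketch of steps 1-4 (not the Alchemy implementation): evidence and
# truth values are ignored, and atoms are repeatedly regrouped by the roles they
# play in ground clauses. Atoms that end up with identical signatures form a
# supernode; clause groundings with identical group tuples form a superfeature.
from collections import defaultdict

def lift(atoms, ground_clauses):
    """atoms: {atom_id: predicate_name}
       ground_clauses: list of (formula_id, (atom_id, ...)) groundings"""
    group = dict(atoms)                       # step 1: one initial supernode per predicate
    n_groups = len(set(group.values()))
    while True:
        sig = defaultdict(list)
        for fid, clause in ground_clauses:    # step 2: join atoms through clause groundings
            eq = tuple(clause.index(a) for a in clause)      # repeated-argument pattern
            groups = tuple(group[a] for a in clause)
            for pos, a in enumerate(clause):
                sig[a].append((fid, eq, pos, groups))
        labels, new_group = {}, {}
        for a in atoms:                       # step 3: project back, regrouping by signature
            key = (group[a], tuple(sorted(sig[a])))
            new_group[a] = labels.setdefault(key, len(labels))
        if len(labels) == n_groups:           # step 4: stop when no further refinement occurs
            return new_group                  # atom_id -> supernode id
        group, n_groups = new_group, len(labels)
```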
38 Theorem
- There exists a unique minimal lifted network
- The lifted network construction algorithm finds it
- BP on the lifted network gives the same result as on the ground network
39 Representing Supernodes and Superfeatures
- List of tuples: simple but inefficient
- Resolution-like: use equality and inequality
- Form clusters (in progress)
40 Overview (same outline as slide 2)
41 Learning
- Data is a relational database
- Closed world assumption (if not EM)
- Learning parameters (weights)
- Generatively
- Discriminatively
- Learning structure (formulas)
42 Generative Weight Learning
- Maximize likelihood
- Use gradient ascent or L-BFGS
- No local maxima
- Requires inference at each step (slow!)
- Gradient: ∂/∂w_i log P_w(x) = n_i(x) - E_w[n_i(x)], i.e., (no. of true groundings of clause i in the data) minus (expected no. of true groundings according to the model); see the sketch below
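A minimal sketch (illustrative) of one gradient-ascent step; the expected counts would come from inference over the model (e.g., MCMC), stubbed out here as an assumed function.

```python
# Illustrative sketch of generative weight learning: one gradient-ascent step,
# with gradient n_i(data) - E_w[n_i] per clause i. expected_counts stands in
# for the inference step (e.g., MCMC over the model); it is an assumed
# placeholder, not an Alchemy call.
def gradient_step(weights, data_counts, expected_counts, lr=0.01):
    """weights, data_counts: {clause_id: value};
       expected_counts: function(weights) -> {clause_id: E_w[n_i]}."""
    expected = expected_counts(weights)
    return {i: w + lr * (data_counts[i] - expected[i]) for i, w in weights.items()}
```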
43 Pseudo-Likelihood
- Likelihood of each variable given its neighbors in the data [Besag, 1975] (see the sketch below)
- Does not require inference at each step
- Consistent estimator
- Widely used in vision, spatial statistics, etc.
- But PL parameters may not work well for long inference chains
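A brute-force sketch (illustrative) of the pseudo-log-likelihood, the sum over atoms of log P(x_j = observed value | all other atoms): each term needs only the weighted-count score of the data world and of the world with x_j flipped, so no global inference is required. The counts helper is an assumed input (for example, the brute-force counters from the sketch after slide 25).

```python
# Illustrative sketch of pseudo-log-likelihood: sum over atoms of
# log P(x_j = observed value | all other atoms). Each term only needs the
# weighted-count score of the data world and of the world with x_j flipped.
# counts(world) -> {clause_id: n_i(world)} is an assumed helper.
from math import exp, log

def pseudo_log_likelihood(world, weights, counts):
    """world: {atom: bool}; weights: {clause_id: float}."""
    def score(w):
        n = counts(w)
        return sum(weights[i] * n[i] for i in weights)
    pll = 0.0
    s_data = score(world)
    for atom, value in world.items():
        flipped = dict(world)
        flipped[atom] = not value
        s_flip = score(flipped)
        # P(x_j = value | rest) = exp(s_data) / (exp(s_data) + exp(s_flip))
        pll += s_data - log(exp(s_data) + exp(s_flip))
    return pll
```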
44 Discriminative Weight Learning
- Maximize conditional likelihood of query (y) given evidence (x)
- Gradient: ∂/∂w_i log P_w(y|x) = n_i(x,y) - E_w[n_i(x,y)], i.e., (no. of true groundings of clause i in the data) minus (expected no. of true groundings according to the model)
- Approximate the expected counts by the counts in the MAP state of y given x
45 Voted Perceptron
- Originally proposed for training HMMs discriminatively [Collins, 2002]
- Assumes network is a linear chain

  w_i ← 0
  for t ← 1 to T do
      y_MAP ← Viterbi(x)
      w_i ← w_i + η [count_i(y_Data) - count_i(y_MAP)]
  return Σ_t w_i / T
46 Voted Perceptron for MLNs
- HMMs are a special case of MLNs
- Replace Viterbi by MaxWalkSAT
- Network can now be an arbitrary graph (see the sketch below)

  w_i ← 0
  for t ← 1 to T do
      y_MAP ← MaxWalkSAT(x)
      w_i ← w_i + η [count_i(y_Data) - count_i(y_MAP)]
  return Σ_t w_i / T
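A compact sketch (illustrative, not Alchemy) of the loop above; map_state stands in for MaxWalkSAT and counts for the clause-grounding counter, both assumed placeholders.

```python
# Illustrative sketch of the voted perceptron for MLNs. map_state stands in for
# MaxWalkSAT (MAP inference over query atoms y given evidence x and the current
# weights) and counts returns {clause_id: n_i(x, y)}; both are assumed
# placeholders, not Alchemy calls.
def voted_perceptron(x, y_data, clause_ids, map_state, counts, T=100, lr=1.0):
    w = {i: 0.0 for i in clause_ids}
    w_sum = {i: 0.0 for i in clause_ids}
    n_data = counts(x, y_data)
    for _ in range(T):
        y_map = map_state(w, x)                       # Viterbi in the HMM case
        n_map = counts(x, y_map)
        for i in clause_ids:
            w[i] += lr * (n_data[i] - n_map[i])
            w_sum[i] += w[i]
    return {i: w_sum[i] / T for i in clause_ids}      # averaged ("voted") weights
```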
47 Structure Learning
- Generalizes feature induction in Markov nets
- Any inductive logic programming approach can be used, but . . .
- Goal is to induce any clauses, not just Horn
- Evaluation function should be likelihood
- Requires learning weights for each candidate
- Turns out not to be the bottleneck
- Bottleneck is counting clause groundings
- Solution: subsampling
48 Structure Learning
- Initial state: unit clauses or hand-coded KB
- Operators: add/remove literal, flip sign
- Evaluation function: pseudo-likelihood + structure prior
- Search (a beam-search skeleton follows below)
  - Beam [Kok & Domingos, 2005]
  - Shortest-first [Kok & Domingos, 2005]
  - Bottom-up [Mihalkova & Mooney, 2007]
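A generic beam-search skeleton (illustrative) matching the search described above; neighbors applies the add/remove-literal and flip-sign operators and score stands for pseudo-likelihood plus a structure prior, both assumed placeholders.

```python
# Illustrative beam-search skeleton for clause learning. neighbors(clause)
# applies the operators above (add/remove a literal, flip a sign) and
# score(clause) learns weights and returns pseudo-likelihood plus a structure
# prior; both are assumed placeholders, not Alchemy functions. Clauses are
# assumed hashable.
def beam_search(initial_clauses, neighbors, score, beam_width=5, max_steps=20):
    beam = sorted(initial_clauses, key=score, reverse=True)[:beam_width]
    best = beam[0]
    for _ in range(max_steps):
        candidates = {c2 for c in beam for c2 in neighbors(c)}
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]                  # keep searching while the beam improves
        else:
            break
    return best
```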
49 Overview (same outline as slide 2)
50 Alchemy
- Open-source software including:
  - Full first-order logic syntax
  - MAP and marginal/conditional inference
  - Generative & discriminative weight learning
  - Structure learning
  - Programming language features
- alchemy.cs.washington.edu
51 Overview (same outline as slide 2)
52 Applications
- Information extraction
- Entity resolution
- Link prediction
- Collective classification
- Web mining
- Natural language processing
- Social network analysis
- Ontology refinement
- Activity recognition
- Intelligent assistants
- Etc.
53 Information Extraction
- Example input: citation strings (reproduced verbatim, including their inconsistencies):

  Parag Singla and Pedro Domingos, Memory-Efficient Inference in Relational Domains (AAAI-06).

  Singla, P., & Domingos, P. (2006). Memory-efficent inference in relatonal domains. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 500-505). Boston, MA: AAAI Press.

  H. Poon & P. Domingos, Sound and Efficient Inference with Probabilistic and Deterministic Dependencies, in Proc. AAAI-06, Boston, MA, 2006.

  P. Hoifung (2006). Efficent inference. In Proceedings of the Twenty-First National Conference on Artificial Intelligence.
54 Segmentation
- Fields: Author, Title, Venue
- (same example citations as slide 53, with the author, title, and venue segments highlighted)
55 Entity Resolution
- (same example citations as slide 53, with matching fields/citations highlighted)
56 Entity Resolution
- (repeat of slide 55)
57 State of the Art
- Segmentation
  - HMM (or CRF) to assign each token to a field
- Entity resolution
  - Logistic regression to predict same field/citation
  - Transitive closure (see the sketch below)
- Alchemy implementation: seven formulas
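A small sketch (illustrative) of the transitive-closure step: pairwise "same citation" predictions are merged into clusters with union-find, so matches propagate along chains (if C1 matches C2 and C2 matches C3, all three are merged).

```python
# Illustrative sketch of the transitive-closure step: pairwise "same citation"
# predictions are merged into clusters with union-find, so matches propagate
# along chains of predicted pairs.
def transitive_closure(citations, predicted_matches):
    parent = {c: c for c in citations}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]    # path compression
            c = parent[c]
        return c

    for a, b in predicted_matches:           # e.g. output of the pairwise classifier
        parent[find(a)] = find(b)

    clusters = {}
    for c in citations:
        clusters.setdefault(find(c), []).append(c)
    return list(clusters.values())

print(transitive_closure(["C1", "C2", "C3", "C4"], [("C1", "C2"), ("C2", "C3")]))
# [['C1', 'C2', 'C3'], ['C4']]
```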
58Types and Predicates
token Parag, Singla, and, Pedro, ... field
Author, Title, Venue citation C1, C2,
... position 0, 1, 2, ... Token(token,
position, citation) InField(position, field,
citation) SameField(field, citation,
citation) SameCit(citation, citation)
59 Types and Predicates
- (same declarations, with field = {Author, Title, Venue, ...}; the additional fields are marked Optional)
60 Types and Predicates
- (same declarations, with the Evidence elements highlighted)
61 Types and Predicates
- (same declarations, with the Query elements highlighted)
62 Formulas

  Token(t, i, c) => InField(i, f, c)
  InField(i, f, c) <=> InField(i+1, f, c)
  f != f' => (!InField(i, f, c) v !InField(i, f', c))
  Token(t, i, c) ^ InField(i, f, c) ^ Token(t, i', c') ^ InField(i', f, c') => SameField(f, c, c')
  SameField(f, c, c') <=> SameCit(c, c')
  SameField(f, c, c') ^ SameField(f, c', c") => SameField(f, c, c")
  SameCit(c, c') ^ SameCit(c', c") => SameCit(c, c")
63-68 Formulas (repeats of slide 62, with different formulas highlighted in turn)
69Formulas
Token(t,i,c) gt InField(i,f,c) InField(i,f,c)
!Token(.,i,c) ltgt InField(i1,f,c) f ! f
gt (!InField(i,f,c) v !InField(i,f,c)) Token(
t,i,c) InField(i,f,c) Token(t,i,c)
InField(i,f,c) gt SameField(f,c,c) SameField(
f,c,c) ltgt SameCit(c,c) SameField(f,c,c)
SameField(f,c,c) gt SameField(f,c,c) SameCit
(c,c) SameCit(c,c) gt SameCit(c,c)
70 Results: Segmentation on Cora (results figure not recovered)
71 Results: Matching Venues on Cora (results figure not recovered)
72 Overview (same outline as slide 2)
73 Conclusion
- The Web provides a plethora of learning problems
- Machine learning provides a plethora of solutions
- We need a unifying language
- Markov logic: use weighted first-order logic to define statistical models
- Efficient inference and learning algorithms (but Web scale still requires manual coding)
- Many successful applications (e.g., information extraction)
- Open-source software / Web site: Alchemy (alchemy.cs.washington.edu)