Title: Conditional Random Fields - A Probabilistic Graphical Model (Stefan Mutter)
1. Conditional Random Fields - A Probabilistic Graphical Model
2. Motivation
[Overview diagram relating the models discussed in this talk: Bayesian networks, naive Bayes, logistic regression, HMMs, Markov random fields, linear-chain CRFs, and general CRFs]
3. Outline
- different views on building a conditional random field (CRF):
  - from directed to undirected graphical models
  - from generative to discriminative models
  - sequence models: from HMMs to CRFs
- CRFs and maximum entropy Markov models (MEMMs)
- parameter estimation / inference
- applications
4. Overview: directed graphical models
[Overview diagram, here focusing on the directed models: Bayesian networks, naive Bayes, HMMs]
5. Bayesian Networks (directed graphical models)
- in general: a graphical model is a family of probability distributions that factorise according to an underlying graph
- one-to-one correspondence between nodes and random variables
- a set V of random variables, consisting of a set X of input variables and a set Y of output variables to predict
- independence assumption via a topological ordering: a node v is conditionally independent of its predecessors given its direct parents π(v)
- direct probabilistic interpretation: the family of distributions factorises into p(x, y) = ∏_{v ∈ V} p(v | π(v))
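As a toy illustration of this factorisation (a hedged sketch; the network, tables, and numbers below are made up, not from the slides):

```python
# Toy Bayesian network y -> x1, y -> x2 (naive Bayes structure).
# All probability tables are made-up illustrative numbers.
p_y = {0: 0.6, 1: 0.4}                      # p(y)
p_x1_given_y = {0: {0: 0.7, 1: 0.3},        # p(x1 | y)
                1: {0: 0.2, 1: 0.8}}
p_x2_given_y = {0: {0: 0.9, 1: 0.1},        # p(x2 | y)
                1: {0: 0.5, 1: 0.5}}

def joint(y, x1, x2):
    # p(y, x1, x2) = p(y) * p(x1 | y) * p(x2 | y)
    return p_y[y] * p_x1_given_y[y][x1] * p_x2_given_y[y][x2]

# Sanity check: the joint sums to one over all assignments.
total = sum(joint(y, x1, x2) for y in (0, 1) for x1 in (0, 1) for x2 in (0, 1))
print(joint(1, 0, 1), total)  # -> 0.04 1.0
```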
6. Overview: undirected graphical models
[Overview diagram, here focusing on the undirected models: Markov random fields, linear-chain CRFs, general CRFs]
7. Markov Random Fields (undirected graphical models)
- an undirected graph for the joint probability p(x) allows no direct probabilistic interpretation
- instead, define potential functions Ψ_A on the maximal cliques A of the graph
  - they map a joint assignment to a non-negative real number
  - this requires normalisation: p(x) = (1/Z) ∏_A Ψ_A(x_A), with Z = Σ_x ∏_A Ψ_A(x_A)
[Figure: example graph with two clique potentials, Ψ_green and Ψ_red]
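A minimal sketch of how clique potentials and the normalisation constant Z combine (the chain structure and potential values below are illustrative, not from the slides):

```python
import itertools

# Chain MRF over three binary variables: cliques {x0,x1} and {x1,x2}.
# Potentials map clique assignments to non-negative reals (made-up values).
def psi_a(x0, x1):  # potential on clique A = {x0, x1}
    return 2.0 if x0 == x1 else 0.5

def psi_b(x1, x2):  # potential on clique B = {x1, x2}
    return 3.0 if x1 == x2 else 1.0

def unnormalised(x):
    return psi_a(x[0], x[1]) * psi_b(x[1], x[2])

# Z sums the unnormalised score over every joint assignment.
Z = sum(unnormalised(x) for x in itertools.product((0, 1), repeat=3))

def p(x):
    return unnormalised(x) / Z

print(p((0, 0, 0)))  # probability of one joint assignment
```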
8. Markov Random Fields and CRFs
- a CRF is a Markov random field globally conditioned on X
- what do the potential functions Ψ look like?
9. Overview: from generative to discriminative models
[Diagram of generative-discriminative pairs: naive Bayes → logistic regression, HMM → linear-chain CRF, and general directed models (Bayesian networks / MRFs) → general CRF]
10. Generative models
- based on the joint probability distribution p(y, x)
- include a model of p(x), which is not needed for classification
- interdependent features:
  - either enhance the model structure to represent them (complexity problems)
  - or make simplifying independence assumptions, e.g. naive Bayes: once the class label is known, all features are independent
11. Discriminative models
- based directly on the conditional probability p(y|x)
- need no model for p(x)
- simply p(y|x) = p(y, x) / p(x), where p(x) would in general have to be computed by inference
- make independence assumptions among y, but not among x
- in general, the conditional approach has more freedom to fit the data
12. Naive Bayes and logistic regression (1)
- naive Bayes and logistic regression form a generative-discriminative pair
- it can be shown that a Gaussian naive Bayes (GNB) classifier implies the parametric form of p(y|x) of its discriminative counterpart, logistic regression (LR)!
- LR is an MRF globally conditioned on X → use log-linear models as potential functions in CRFs → LR is a very simple CRF
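Written out, the log-linear form this refers to is the standard logistic regression model (consistent with the Sutton & McCallum tutorial cited in the references):

```latex
p(y \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big\{\lambda_y + \sum_j \lambda_{y,j}\, x_j\Big\},
\qquad
Z(\mathbf{x}) = \sum_{y'} \exp\Big\{\lambda_{y'} + \sum_j \lambda_{y',j}\, x_j\Big\}
```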
13. Naive Bayes and logistic regression (2)
- if the GNB assumptions hold, then GNB and LR converge asymptotically towards identical classifiers
- in generative models, one set of parameters must represent both the input distribution and the conditional well
- discriminative models are not as strongly tied to their input distribution
  - e.g. LR fits its parameters to the data even when the naive Bayes assumption is violated
  - in other words, there are more (complex) joint models than GNB whose conditionals also have the LR form
- the GNB-LR relationship mirrors the relationship between HMMs and linear-chain CRFs
14. Overview: sequence models
[Overview diagram, here focusing on the sequence models: HMMs and linear-chain CRFs]
15. Sequence models: HMMs
- the power of graphical models: modelling many interdependent variables
- an HMM models the joint distribution p(y, x) = ∏_t p(y_t | y_{t-1}) p(x_t | y_t)
- it uses two independence assumptions to do this tractably:
  - given its direct predecessor, each state is independent of all its ancestors
  - each observation depends only on the current state
16. From HMMs to linear-chain CRFs (1)
- key insight: the conditional distribution p(y|x) of an HMM is a CRF with a particular choice of feature functions
- the parameters are not required to be log probabilities, therefore introduce a normalisation constant Z
- using feature functions f_k(y_t, y_{t-1}, x_t) that encode the HMM's transition and emission indicators, the joint can be written as p(y, x) = (1/Z) exp{ Σ_t Σ_k λ_k f_k(y_t, y_{t-1}, x_t) }
17. From HMMs to linear-chain CRFs (2)
- last step: write the conditional probability for the HMM, p(y|x) = p(y, x) / Σ_{y'} p(y', x), in which the constant Z cancels
- this is a linear-chain CRF that includes only HMM-style features; richer features are possible
18. Linear-chain conditional random fields
- definition: p(y|x) = (1/Z(x)) ∏_{t=1}^{T} exp{ Σ_k λ_k f_k(y_t, y_{t-1}, x_t) }, with Z(x) = Σ_{y'} ∏_{t=1}^{T} exp{ Σ_k λ_k f_k(y'_t, y'_{t-1}, x_t) }
- for general CRFs, use potentials over arbitrary cliques instead of the chain structure
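A brute-force sketch of this definition (tiny label set, hypothetical feature functions and weights; real implementations compute Z(x) with forward-backward rather than enumeration):

```python
import itertools, math

LABELS = (0, 1)

def features(y_t, y_prev, x_t):
    # Two hypothetical binary features: a transition and an emission indicator.
    return [1.0 if y_t == y_prev else 0.0,
            1.0 if (y_t == 1 and x_t == 'b') else 0.0]

LAMBDAS = [0.5, 1.2]  # illustrative weights

def score(y, x):
    # sum_t sum_k lambda_k f_k(y_t, y_{t-1}, x_t); y_0 treated as dummy label 0
    s = 0.0
    for t in range(len(x)):
        y_prev = y[t - 1] if t > 0 else 0
        s += sum(l * f for l, f in zip(LAMBDAS, features(y[t], y_prev, x[t])))
    return s

def p(y, x):
    # Z(x) enumerates every label sequence (feasible only for tiny examples).
    Z = sum(math.exp(score(yp, x)) for yp in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / Z

print(p((0, 1, 1), ('a', 'b', 'b')))
```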
19. Side trip: maximum entropy Markov models
- entropy: a measure of the uniformity of a distribution
- a maximum entropy model maximises entropy, subject to constraints imposed by the training data
- MEMMs model the conditional probability of reaching a state s' given an observation o and the previous state s, instead of joint probabilities
- observations sit on the transitions
- split P(s'|s, o) into |S| separately trained transition functions P_s(s'|o)
- this leads to per-state normalisation
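The per-state transition functions take the standard maximum entropy (log-linear) form; a sketch of that form, not transcribed from the slides:

```latex
P_s(s' \mid o) = \frac{\exp\big(\sum_k \lambda_k f_k(o, s')\big)}
                      {\sum_{s''} \exp\big(\sum_k \lambda_k f_k(o, s'')\big)}
```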
20. Side trip: label bias problem
- CRFs are log-linear models too, but MEMMs suffer from the label bias problem
- per-state normalisation requires that the probabilities of all transitions leaving a state sum to one
- conservation of probability mass: states with only one outgoing transition effectively ignore their observation
21. Inference in a linear-chain CRF
- slight variants of the HMM algorithms
- Viterbi: use the recursion from the HMM, but define δ_t(j) = max_i δ_{t-1}(i) Ψ_t(j, i, x_t)
- this works because the CRF model can be written as p(y|x) = (1/Z(x)) ∏_t Ψ_t(y_t, y_{t-1}, x_t), where Ψ_t(y_t, y_{t-1}, x_t) = exp{ Σ_k λ_k f_k(y_t, y_{t-1}, x_t) }
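A compact Viterbi sketch under this factorisation, working in log space (it reuses the style of hypothetical features and weights from the earlier sketch):

```python
LABELS = (0, 1)

def log_psi(y_t, y_prev, x_t):
    # log Psi_t = sum_k lambda_k f_k; illustrative features and weights
    lam = [0.5, 1.2]
    f = [1.0 if y_t == y_prev else 0.0,
         1.0 if (y_t == 1 and x_t == 'b') else 0.0]
    return sum(l * v for l, v in zip(lam, f))

def viterbi(x):
    # delta[t][j]: best log-score of any label prefix ending in label j at step t
    delta = [{j: log_psi(j, 0, x[0]) for j in LABELS}]  # dummy start label 0
    back = []
    for t in range(1, len(x)):
        prev, row, bp = delta[-1], {}, {}
        for j in LABELS:
            best_i = max(LABELS, key=lambda i: prev[i] + log_psi(j, i, x[t]))
            row[j] = prev[best_i] + log_psi(j, best_i, x[t])
            bp[j] = best_i
        delta.append(row)
        back.append(bp)
    # Trace the best final label back through the stored backpointers.
    y = [max(LABELS, key=lambda j: delta[-1][j])]
    for bp in reversed(back):
        y.append(bp[y[-1]])
    return list(reversed(y))

print(viterbi(('a', 'b', 'b')))
```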
22. Parameter estimation in general
- so far, one major caveat has been ignored:
  - a generative model tends to have a higher asymptotic error, but
  - it approaches its asymptotic error faster than a discriminative one, needing a number of training examples only logarithmic, rather than linear, in the number of parameters
- remember: discriminative models make no independence assumptions about the observations x
23. Principles of parameter estimation
- basic principle: maximum likelihood estimation with the conditional log likelihood ℓ(λ) = Σ_i log p(y^(i) | x^(i))
- advantage: the conditional log likelihood is concave, therefore every local optimum is a global one
- use gradient descent or quasi-Newton methods
- runtime is O(t·m²·n·g): t the length of a sequence, m the number of labels, n the number of training instances, g the number of required gradient computations
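For the linear-chain case, the objective and its gradient take the standard form below (a sketch following the Sutton & McCallum tutorial, not transcribed from the missing slide formula); the gradient is the difference between empirical and expected feature counts:

```latex
\ell(\lambda) = \sum_{i=1}^{N}\Big[\sum_{t}\sum_{k}\lambda_k
    f_k\big(y_t^{(i)}, y_{t-1}^{(i)}, x_t^{(i)}\big)
    - \log Z\big(\mathbf{x}^{(i)}\big)\Big]
\qquad
\frac{\partial \ell}{\partial \lambda_k} = \sum_{i,t}
    f_k\big(y_t^{(i)}, y_{t-1}^{(i)}, x_t^{(i)}\big)
  - \sum_{i,t}\,\mathbb{E}_{p(y_t, y_{t-1} \mid \mathbf{x}^{(i)})}
    \big[f_k\big(y_t, y_{t-1}, x_t^{(i)}\big)\big]
```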
24. Application: gene prediction
- use finite-state CRFs to locate introns and exons in DNA sequences
- advantage of CRFs: the ability to straightforwardly incorporate homology evidence from protein databases
- feature functions used:
  - e.g. frequencies of base conjunctions and disjunctions in sliding windows over the 20 bases upstream and 40 bases downstream (motivated by splice-site detection)
  - e.g. how many times did C or G occur in the prior 40 bases, using a sliding window of size 5? (see the sketch after this list)
  - e.g. frequencies of how often a base appears in a related protein (found via BLAST search)
- outperforms a 5th-order hidden semi-Markov model by a 10% reduction in the error of the harmonic mean of precision and recall (86.09 vs. 84.55)
25. Summary: graphical models
26. The end
Questions?
27. References
- Charles Sutton and Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning. In: Introduction to Statistical Relational Learning, edited by Lise Getoor and Ben Taskar. MIT Press, 2006. (Source of figures and formulae.)
- H. Wallach. Efficient Training of Conditional Random Fields. Master's thesis, University of Edinburgh, 2002. http://citeseer.ist.psu.edu/wallach02efficient.html
- John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of ICML-01, pages 282-289, 2001.
- Aron Culotta, David Kulp, and Andrew McCallum. Gene Prediction with Conditional Random Fields. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005.
28. References (continued)
- Kevin Murphy. An Introduction to Graphical Models. Technical report, Intel Research, 2001. http://citeseer.ist.psu.edu/murphy01introduction.html
- Andrew Y. Ng and Michael Jordan. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. In NIPS 14, 2002.
- T. Minka. Discriminative Models, Not Discriminative Training. Technical report, Microsoft Research Cambridge, 2005.
- P. Blunsom. Maximum Entropy Classification. Lecture slides 433-680, 2005. http://www.cs.mu.oz.au/680/lectures/week06a.pdf