Conditional Random Fields - A probabilistic graphical model (Stefan Mutter)



1
Conditional Random Fields - A probabilistic
graphical model
  • Stefan Mutter

2
Motivation
[Figure: overview diagram relating Bayesian Networks, Naive Bayes, Hidden Markov Models, Markov Random Fields, Logistic Regression, Linear-Chain CRFs and General CRFs]
3
Outline
  • different views on building a conditional random
    field (CRF)
  • from directed to undirected graphical models
  • from generative to discriminative models
  • sequence models
  • from HMMs to CRFs
  • CRFs and maximum entropy Markov models (MEMM)
  • parameter estimation / inference
  • applications

4
Overview directed graphical models
[Figure: the overview diagram from slide 2, repeated to situate the directed graphical models]
5
Bayesian Networks directed graphical models
  • in general
  • a graphical model - a family of probability
    distributions that factorise according to an
    underlying graph
  • one-to-one correspondence between nodes and
    random variables
  • a set V of random variables consisting of a set X
    of input variables and a set Y of output
    variables to predict
  • independence assumption using topological
    ordering
  • a node v is conditionally independent of its
    predecessors given its direct parents pa(v)
    (Markov blanket)
  • direct probabilistic interpretation
  • the family of distributions factorises into
    p(x, y) = ∏_v p(v | pa(v)), the product running
    over all nodes v (see the sketch below)
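
A minimal sketch of this factorisation (not from the slides): a hypothetical two-node network Rain → WetGrass with made-up conditional probability tables, whose joint is the product of the local conditionals.

```python
# Minimal sketch: the joint of a Bayesian network factorises into a product
# of local conditionals p(v | pa(v)). Toy network: Rain -> WetGrass,
# with hand-picked (hypothetical) probability tables.

p_rain = {True: 0.2, False: 0.8}                      # p(Rain)
p_wet_given_rain = {True:  {True: 0.9, False: 0.1},   # p(Wet | Rain)
                    False: {True: 0.2, False: 0.8}}

def joint(rain: bool, wet: bool) -> float:
    """p(rain, wet) = p(rain) * p(wet | rain)  (the factorisation)."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Sanity check: the factorised joint sums to one over all assignments.
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
print(joint(True, True), total)  # 0.18, 1.0
```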

6
Overview undirected graphical models
[Figure: the overview diagram from slide 2, repeated to situate the undirected graphical models]
7
Markov Random Field undirected graphical models
  • undirected graph for joint probability p(x)
    allows no direct probabilistic interpretation
  • define potential functions Ψ_A on the maximal
    cliques A
  • they map a joint assignment to a non-negative
    real number
  • requires normalisation: p(x) = (1/Z) ∏_A Ψ_A(x_A),
    with Z = Σ_x ∏_A Ψ_A(x_A) (see the sketch below)

[Figure: example undirected graph with two maximal cliques and their potentials Ψ_green and Ψ_red]
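
A minimal sketch of clique potentials and the normalisation constant Z, using an assumed two-variable MRF with a single clique and an arbitrary non-negative potential:

```python
# Minimal sketch (assumptions, not the slide's example): an MRF over two
# binary variables a, b with a single clique potential psi(a, b). The
# potentials are just non-negative scores, so a global normaliser Z is needed.
from itertools import product

def psi(a: int, b: int) -> float:
    # Arbitrary non-negative potential: favour agreement between a and b.
    return 3.0 if a == b else 1.0

Z = sum(psi(a, b) for a, b in product((0, 1), repeat=2))  # normalisation

def p(a: int, b: int) -> float:
    """p(a, b) = psi(a, b) / Z."""
    return psi(a, b) / Z

print(p(0, 0), p(0, 1), Z)  # 0.375 0.125 8.0
```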
8
Markov Random Fields and CRFs
  • A CRF is a Markov Random Field globally
    conditioned on X
  • What do the potential functions Ψ look like?

9
Overview generative → discriminative models
[Figure: the overview diagram from slide 2, with arrows from each generative model to its discriminative counterpart]
10
Generative models
  • based on joint probability distribution p(y,x)
  • includes a model of p(x) which is not needed for
    classification
  • interdependent features
  • either enhance the model structure to represent
    them (leads to complexity problems)
  • or make simplifying independence assumptions
  • e.g. naive Bayes: once the class label is known,
    all features are independent

11
Discriminative models
  • based directly on the conditional probability p(y|x)
  • need no model for p(x): simply model p(y|x) directly
  • make independence assumptions among y, but not
    among x
  • in general p(y|x) = p(y, x) / Σ_y' p(y', x), where
    the denominator is computed by inference (see the
    sketch below)
  • the conditional approach has more freedom to fit
    the data
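
A minimal sketch (toy joint table, made-up numbers) of how a generative model obtains p(y|x) from the joint by inference:

```python
# Toy joint table p(y, x) with made-up numbers (y: label, x: single feature).
joint = {("spam", "offer"): 0.20, ("spam", "meeting"): 0.05,
         ("ham", "offer"): 0.10, ("ham", "meeting"): 0.65}

def p_y_given_x(y: str, x: str) -> float:
    """p(y|x) = p(y, x) / sum_y' p(y', x): the denominator needs inference."""
    return joint[(y, x)] / sum(joint[(y2, x)] for y2 in ("spam", "ham"))

print(p_y_given_x("spam", "offer"))  # 0.20 / 0.30 = 0.666...
```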
12
Naive Bayes and logistic regression (1)
  • naive Bayes and logistic regression form a
    generative-discriminative pair
  • naive Bayes: p(y, x) = p(y) ∏_k p(x_k | y)
  • it can be shown that a Gaussian naive Bayes (GNB)
    classifier implies the parametric form of p(y|x)
    of its discriminative pair, logistic regression!

LR is an MRF globally conditioned on X. Using
log-linear models as potential functions in CRFs,
LR is a very simple CRF (see the sketch below).
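
A minimal sketch of logistic regression written as a log-linear model, i.e. a CRF over a single output variable; the weights and feature functions below are hypothetical, only the functional form matters:

```python
# Logistic regression as a log-linear model:
#   p(y | x) = exp(sum_k lambda_k * f_k(x, y)) / Z(x)
import math

def features(x: list[float], y: int) -> list[float]:
    # One feature per input dimension plus a bias, active only for class 1.
    return [xi if y == 1 else 0.0 for xi in x] + [1.0 if y == 1 else 0.0]

lam = [0.5, -1.2, 0.3]          # hypothetical weights (last one is the bias)

def p_y_given_x(x: list[float], y: int) -> float:
    scores = {c: math.exp(sum(l * f for l, f in zip(lam, features(x, c))))
              for c in (0, 1)}
    return scores[y] / sum(scores.values())   # Z(x) normalises per input x

print(p_y_given_x([2.0, 1.0], 1))  # sigmoid of the linear score
```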
13
Naive Bayes and logistic regression (2)
  • if the GNB assumptions hold, then GNB and LR
    converge asymptotically toward identical
    classifiers
  • in generative models, one set of parameters must
    represent both the input distribution and the
    conditional well
  • discriminative models are not as strongly tied to
    their input distribution
  • e.g. LR fits its parameters to the data even if
    the naive Bayes assumption is violated
  • in other words, there are more (complex) joint
    models than GNB whose conditionals also have the
    LR form
  • GNB and LR mirror the relationship between HMMs
    and linear-chain CRFs

14
Overview sequence models
[Figure: the overview diagram from slide 2, repeated to situate the sequence models]
15
Sequence models HMMs
  • the power of graphical models: modelling many
    interdependent variables
  • an HMM models the joint distribution
    p(y, x) = ∏_t p(y_t | y_{t-1}) p(x_t | y_t)
    (see the sketch below)
  • it uses two independence assumptions to do this
    tractably
  • given its direct predecessor, each state is
    independent of its ancestors
  • each observation depends only on the current state
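
A minimal sketch of the HMM joint under these two assumptions; the states, symbols and probability tables are made up:

```python
# The HMM joint factorises using the two independence assumptions above:
#   p(y, x) = prod_t p(y_t | y_{t-1}) * p(x_t | y_t)

states, symbols = ("N", "V"), ("dog", "runs")
start = {"N": 0.7, "V": 0.3}                                    # p(y_1)
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}  # p(y_t | y_{t-1})
emit = {"N": {"dog": 0.9, "runs": 0.1}, "V": {"dog": 0.1, "runs": 0.9}}

def hmm_joint(y: list[str], x: list[str]) -> float:
    p = start[y[0]] * emit[y[0]][x[0]]
    for t in range(1, len(y)):
        p *= trans[y[t - 1]][y[t]] * emit[y[t]][x[t]]
    return p

print(hmm_joint(["N", "V"], ["dog", "runs"]))  # 0.7*0.9*0.7*0.9 = 0.3969
```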

16
From HMMs to linear chain CRFs (1)
  • key: the conditional distribution p(y|x) of an HMM
    is a CRF with a particular choice of feature
    functions
  • the parameters are then not required to be log
    probabilities, therefore introduce a normalisation
    constant Z
  • using feature functions, the HMM joint can be
    written as
    p(y, x) = (1/Z) ∏_t exp(Σ_k λ_k f_k(y_t, y_{t-1}, x_t))
  • with one feature per transition,
    f_ij = 1{y_t = i} 1{y_{t-1} = j}, and one per
    state-observation pair, f_io = 1{y_t = i} 1{x_t = o}
17
From HMMs to linear chain CRFs (2)
  • last step: write the conditional probability for
    the HMM, p(y|x) = p(y, x) / Σ_y' p(y', x)
  • this is a linear-chain CRF whose features are only
    the HMM features; richer features are possible

18
Linear chain conditional random fields
  • definition:
    p(y|x) = (1/Z(x)) ∏_t exp(Σ_k λ_k f_k(y_t, y_{t-1}, x_t))
    with Z(x) = Σ_y' ∏_t exp(Σ_k λ_k f_k(y'_t, y'_{t-1}, x_t))
    (see the sketch below)
  • for general CRFs, use arbitrary cliques instead of
    only (y_t, y_{t-1}, x_t)
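
A minimal sketch of this definition with hypothetical features and weights; Z(x) is computed by brute force over all label sequences, which is fine for a toy example but exponential in general (the forward algorithm fixes this):

```python
# Linear-chain CRF: p(y|x) = exp(score(y, x)) / Z(x), with brute-force Z(x).
import math
from itertools import product

LABELS = ("A", "B")

def feats(y_prev: str, y_t: str, x_t: str) -> list[float]:
    # One transition indicator and one emission indicator (assumed features).
    return [1.0 if (y_prev, y_t) == ("A", "B") else 0.0,
            1.0 if (y_t, x_t) == ("B", "b") else 0.0]

lam = [0.8, 1.5]   # hypothetical weights

def score(y: tuple[str, ...], x: list[str]) -> float:
    """Unnormalised log score: sum_t sum_k lambda_k f_k(y_{t-1}, y_t, x_t)."""
    y_prev, s = "START", 0.0
    for y_t, x_t in zip(y, x):
        s += sum(l * f for l, f in zip(lam, feats(y_prev, y_t, x_t)))
        y_prev = y_t
    return s

def p_y_given_x(y: tuple[str, ...], x: list[str]) -> float:
    Z = sum(math.exp(score(y2, x)) for y2 in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / Z

print(p_y_given_x(("A", "B"), ["a", "b"]))
```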
19
Side trip maximum entropy Markov models
  • entropy - measure of the uniformity of a
    distribution
  • maximum entropy model maximises entropy, subject
    to constraints imposed by training data
  • model conditional probabilities of reaching a
    state given an observation o and previous state
    s instead of joint probabilities
  • the observations sit on the transitions
  • split P(s'|s, o) into |S| separately trained
    transition functions P_s(s'|o)
  • this leads to per-state normalisation (see the
    sketch below)
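
A minimal sketch of per-state normalisation in a MEMM, with made-up scores; each previous state s gets its own locally normalised distribution P_s(s'|o):

```python
# MEMM: P(s' | s, o) is split into one model per previous state s, each
# normalised on its own (per-state), rather than globally like a CRF.
import math

STATES = ("rib", "rob")

def score(s_prev: str, s_next: str, obs: str) -> float:
    # Hypothetical unnormalised score for moving to s_next on observation obs.
    return 1.0 if obs in s_next else 0.0

def p_next(s_prev: str, obs: str) -> dict[str, float]:
    """P_{s_prev}(s' | obs): normalised separately for each previous state."""
    exps = {s: math.exp(score(s_prev, s, obs)) for s in STATES}
    z = sum(exps.values())            # per-state normaliser
    return {s: v / z for s, v in exps.items()}

print(p_next("rib", "i"))  # each such distribution sums to one on its own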

20
Side trip label bias problem
  • MEMMs are log-linear models like CRFs, but they
    suffer from the label bias problem
  • per-state normalisation requires that the
    probabilities of the transitions leaving a state
    must sum to one
  • conservation of probability mass
  • states with a single outgoing transition
    effectively ignore their observation

21
Inference in a linear chain CRF
  • inference uses slight variants of the HMM
    algorithms
  • Viterbi: keep the recursion from the HMM, but
    define δ_t(j) = max_i δ_{t-1}(i) Ψ_t(j, i, x_t)
  • because the CRF model can be written as
    p(y|x) = (1/Z(x)) ∏_t Ψ_t(y_t, y_{t-1}, x_t),
    where Ψ_t(y_t, y_{t-1}, x_t) = exp(Σ_k λ_k f_k(y_t, y_{t-1}, x_t))
    (see the sketch below)
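
A minimal sketch of Viterbi for a linear-chain CRF with an assumed toy potential; it is the HMM recursion with the potentials Ψ_t in place of transition and emission probabilities, and Z(x) cancels in the argmax:

```python
import math

LABELS = ("A", "B")

def log_psi(y_prev: str, y_t: str, x_t: str) -> float:
    # Assumed toy potential: reward a label matching its observation
    # and an A -> B transition.
    score = 1.5 if y_t.lower() == x_t else 0.0
    if (y_prev, y_t) == ("A", "B"):
        score += 0.5
    return score

def viterbi(x: list[str]) -> list[str]:
    """Most likely label sequence under the linear-chain CRF potentials."""
    delta = {y: log_psi("START", y, x[0]) for y in LABELS}
    backpointers = []
    for x_t in x[1:]:
        best_prev = {y: max(LABELS,
                            key=lambda yp: delta[yp] + log_psi(yp, y, x_t))
                     for y in LABELS}
        delta = {y: delta[best_prev[y]] + log_psi(best_prev[y], y, x_t)
                 for y in LABELS}
        backpointers.append(best_prev)
    best = max(LABELS, key=lambda y: delta[y])
    path = [best]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return list(reversed(path))

print(viterbi(["a", "b", "a"]))  # expected: ['A', 'B', 'A']
```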
22
Parameter estimation in general
  • so far, one major drawback has not been mentioned
  • generative models tend to have a higher asymptotic
    error, but
  • a generative model approaches its asymptotic error
    faster than a discriminative one, with a number of
    training examples logarithmic in the number of
    parameters rather than linear (Ng and Jordan, 2002)
  • remember: discriminative models make no
    independence assumptions for the observations x

23
Principles in parameter estimation
  • basic principle: maximum likelihood estimation
    with the conditional log likelihood
    ℓ(θ) = Σ_i log p(y(i) | x(i))
  • advantage: the conditional log likelihood is
    concave, therefore every local optimum is a global
    one
  • use gradient descent or quasi-Newton methods
  • runtime is in O(T M² N G): T length of the
    sequences, M number of labels, N number of
    training instances, G number of required gradient
    computations (see the sketch below)
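
A minimal sketch of the forward recursion used inside likelihood and gradient computations; the potential below is a toy stand-in for exp(Σ_k λ_k f_k(...)), and each sequence costs O(T M²), matching the runtime above:

```python
import math

LABELS = ("A", "B")

def log_psi(y_prev: str, y_t: str, x_t: str) -> float:
    # Toy stand-in for sum_k lambda_k f_k(y_t, y_{t-1}, x_t).
    return 1.0 if y_t.lower() == x_t else 0.0

def log_Z(x: list[str]) -> float:
    """log Z(x) via the forward algorithm: O(len(x) * |LABELS|^2)."""
    alpha = {y: log_psi("START", y, x[0]) for y in LABELS}
    for x_t in x[1:]:
        alpha = {y: math.log(sum(math.exp(alpha[yp] + log_psi(yp, y, x_t))
                                 for yp in LABELS))
                 for y in LABELS}
    return math.log(sum(math.exp(a) for a in alpha.values()))

# The conditional log likelihood of one pair (x, y) is score(y, x) - log_Z(x).
print(log_Z(["a", "b", "a"]))
```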

24
Application gene prediction
  • use finite-state CRFs to locate introns and exons
    in DNA sequences
  • advantages of CRFs
  • ability to straightforwardly incorporate homology
    evidence from protein databases.
  • feature functions used
  • e.g. frequencies of base conjunctions and
    disjunctions in sliding windows over 20 bases
    upstream and 40 bases downstream (motivation:
    splice-site detection)
  • e.g. how many times did C or G occur in the prior
    40 bases, with a sliding window of size 5? (see
    the sketch below)
  • e.g. how many times a base appears in a related
    protein (found via a BLAST search)
  • outperforms a 5th-order hidden semi-Markov model
    by a roughly 10% reduction in error, measured by
    the harmonic mean of precision and recall
    (86.09 vs. 84.55)
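
A minimal sketch of this kind of sliding-window count feature; the helper below is hypothetical, not the paper's code:

```python
# Sliding-window count feature: "how many times did C or G occur in the
# prior 40 bases, counted in windows of size 5?"

def window_cg_counts(seq: str, pos: int,
                     upstream: int = 40, window: int = 5) -> list[int]:
    """Counts of C/G in each size-5 window over the 40 bases before pos."""
    region = seq[max(0, pos - upstream):pos]
    return [sum(base in "CG" for base in region[i:i + window])
            for i in range(0, len(region), window)]

dna = "ATGCGCGTATACCGGTTAACCGGATCGATCGGGCCATATAGCGCGT" * 2
print(window_cg_counts(dna, 60))   # one count per window, used as features
```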

25
Summary graphical models
26
The end
Questions ?
27
References
  • An Introduction to Conditional Random Fields for
    Relational Learning. Charles Sutton and Andrew
    McCallum. In Introduction to Statistical
    Relational Learning. Edited by Lise Getoor and
    Ben Taskar. MIT Press. 2006.
  • (including figures and formulae)
  • H. Wallach. Efficient training of conditional
    random fields. Master's thesis, University of
    Edinburgh, 2002.
    http://citeseer.ist.psu.edu/wallach02efficient.html
  • John Lafferty, Andrew McCallum, and Fernando
    Pereira. Conditional random fields: Probabilistic
    models for segmenting and labeling sequence data.
    In Proceedings of ICML-01, pages 282-289, 2001.
  • Gene Prediction with Conditional Random Fields.
    Aron Culotta, David Kulp, and Andrew McCallum.
    Technical Report UM-CS-2005-028, University of
    Massachusetts, Amherst, April 2005.

28
References
  • Kevin Murphy. An introduction to graphical
    models. Intel Research Technical Report, 2001.
    http://citeseer.ist.psu.edu/murphy01introduction.html
  • On Discriminative vs. Generative Classifiers: A
    comparison of logistic regression and naive Bayes.
    Andrew Y. Ng and Michael Jordan. In NIPS 14, 2002.
  • T. Minka. Discriminative models, not
    discriminative training. Technical report,
    Microsoft Research Cambridge, 2005.
  • P. Blunsom. Maximum Entropy Classification.
    Lecture slides 433-680, 2005.
    http://www.cs.mu.oz.au/680/lectures/week06a.pdf