Title: Conditional Random Fields
1. Conditional Random Fields
- William W. Cohen
- Feb 13, 2007
2. One winter day in a certain unnamed small college town, there was a snowstorm of such epic proportions that many roads were closed down. However, one stalwart and dedicated student decided to make the trek to class anyway. Because of the treacherous conditions, she arrived at the lecture hall forty minutes late, only to find the room empty except for the professor, busy lecturing, and one other classmate. She took the seat next to him. After a few minutes, she leaned over and asked her fellow student, "What's the prof talking about?" The other student replied, "How should I know? I only got here ten minutes before you."
- Lillian Lee, Cornell CS
3. Announcements
- This week
- Office hours Fri 10:30-12:00
- Lecture 1: Sha & Pereira, Lafferty et al. 2001, Klein & Manning
- Lecture 2: Stacked Sequential Learning
- Three student presentations
4. Review: motivation for CMMs
Ideally we would like to use many, arbitrary, overlapping features of words.
[Figure: HMM graphical model with hidden states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}. Candidate features: identity of word; ends in "-ski"; is capitalized; is part of a noun phrase; is in a list of city names; is under node X in WordNet; is in bold font; is indented; is in hyperlink anchor. Example: "Wisniewski" is part of a noun phrase and ends in "-ski".]
5. Motivation for CMMs
[Figure: the same HMM diagram and feature list as the previous slide.]
Idea: replace the generative model in the HMM with a maxent model, where the state depends on the observations and the previous state.
6. Implications of the model
- Does this do what we want?
- Q: does Y_{i-1} depend on X_{i+1}?
- A: a node is conditionally independent of its non-descendants given its parents, so no: in this directed model an earlier label cannot depend on a later observation.
7. Inference for MXPOST
[Figure: trellis over the sentence "When will prof Cohen post the notes", with one column per word and states B, I, O in each column.]
(Approximate view) find the best path; weights are now on arcs from state to state.
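The "best path" view is ordinary Viterbi decoding over the trellis. A minimal sketch, assuming log-space arc weights are given as input (an illustration, not MXPOST's actual code):

```python
import numpy as np

STATES = ["B", "I", "O"]  # one trellis column of these per word

def viterbi(start, arcs):
    """start: log-weights for the first column, shape (3,).
    arcs: list of (3, 3) matrices; arcs[t][i, j] is the log-weight on
    the arc from state i at word t to state j at word t + 1.
    Returns the highest-weight B/I/O sequence."""
    delta, back = start, []
    for M in arcs:
        scores = delta[:, None] + M          # score of every arc into the next column
        back.append(scores.argmax(axis=0))   # best predecessor of each state
        delta = scores.max(axis=0)
    path = [int(delta.argmax())]             # best final state
    for bp in reversed(back):                # walk the backpointers
        path.append(int(bp[path[-1]]))
    return [STATES[s] for s in reversed(path)]
```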
8. Inference for MXPOST
[Figure: the same B/I/O trellis over "When will prof Cohen post the notes".]
More accurately: find the total flow to each node; weights are now on arcs from state to state. Flow out of a node is always fixed.
9. Label Bias Problem
- P(1 and 2 | ro) = P(2 | 1 and ro) · P(1 | ro) = P(2 | 1 and o) · P(1 | r)
- P(1 and 2 | ri) = P(2 | 1 and ri) · P(1 | ri) = P(2 | 1 and i) · P(1 | r)
- Since P(2 | 1 and x) = 1 for all x, P(1 and 2 | ro) = P(1 and 2 | ri)
- In the training data, label value 2 is the only label value observed after label value 1
- Therefore P(2 | 1) = 1, so P(2 | 1 and x) = 1 for all x
- However, we expect P(1 and 2 | ri) to be greater than P(1 and 2 | ro)
- Per-state normalization does not allow the required expectation
10. Label Bias Problem
- Consider this MEMM, and enough training data to model it perfectly
In the data: Pr(0123 | rib) = 1, Pr(0453 | rob) = 1
Pr(0123 | rob) = Pr(1 | 0, r)/Z1 · Pr(2 | 1, o)/Z2 · Pr(3 | 2, b)/Z3 = 0.5 · 1 · 1
Pr(0453 | rib) = Pr(4 | 0, r)/Z1 · Pr(5 | 4, i)/Z2 · Pr(3 | 5, b)/Z3 = 0.5 · 1 · 1
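The arithmetic is easy to check directly. A small sketch of this MEMM with its locally normalized transition table (the encoding is mine; the probabilities are the ones implied above):

```python
def memm_path_prob(path, word, P):
    """Product of per-state (locally normalized) transition probabilities."""
    prob = 1.0
    for s, x, s2 in zip(path, word, path[1:]):
        prob *= P.get((s, x, s2), 0.0)
    return prob

# From state 0, 'r' cannot prefer a branch: both get 0.5. States 1, 2, 4, 5
# each have a single successor, so they must pass all of their probability
# mass forward, whatever the observation is.
P = {(0, 'r', 1): 0.5, (0, 'r', 4): 0.5,
     (1, 'i', 2): 1.0, (1, 'o', 2): 1.0,
     (4, 'i', 5): 1.0, (4, 'o', 5): 1.0,
     (2, 'b', 3): 1.0, (5, 'b', 3): 1.0}

for word in ("rib", "rob"):
    for path in ((0, 1, 2, 3), (0, 4, 5, 3)):
        print(word, path, memm_path_prob(path, word, P))
# Every (word, path) pair prints 0.5: the middle observation is ignored,
# which is exactly the label bias problem.
```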
11. How important is label bias?
- Could be avoided in this case by changing the structure
- Our models are always wrong; is this wrongness a problem?
- See Klein & Manning's paper for more on this.
12. Another view of label bias [Sha & Pereira]
So what's the alternative?
13. Inference for MXPOST
[Figure: the B/I/O trellis again, repeated from slide 8.]
More accurately: find the total flow to each node; weights are now on arcs from state to state. Flow out of a node is always fixed.
14. Another max-flow scheme
[Figure: the same B/I/O trellis over "When will prof Cohen post the notes".]
More accurately: find the total flow to each node; weights are now on arcs from state to state. Flow out of a node is always fixed.
15. Another max-flow scheme: MRFs
[Figure: the B/I/O trellis, now as an undirected graph.]
Find the total flow to each node; weights are now on edges from state to state. The goal is to learn how to weight edges in the graph, given features from the examples.
16. CRFs vs MEMMs
- MEMMs
- Sequence classification f: x → y is reduced to many cases of ordinary classification, f: x_i → y_i
- combined with Viterbi or beam search
- CRFs
- Sequence classification f: x → y is done by
- Converting x, Y to an MRF
- Using flow computations on the MRF to compute some best y | x
[Figure: an input x1 … x6 and labels y1 … y6 converted to an MRF with pairwise potentials f(Y1,Y2), f(Y2,Y3), … and per-position scores such as Pr(Y | x2, y1), Pr(Y | x4, y3), Pr(Y | x5, y5).]
17. The math: review of maxent
18. Review of maxent/MEMM/CMMs
We know how to compute this.
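The equations on these two slides did not survive extraction. For reference, the standard maxent conditional model being reviewed has the form (a reconstruction, not necessarily the lost slide's exact notation):

\[
P(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_k \lambda_k f_k(x, y)\Big),
\qquad
Z(x) = \sum_{y'} \exp\Big(\sum_k \lambda_k f_k(x, y')\Big)
\]

A CMM/MEMM applies this classifier once per position, modeling P(y_i | y_{i-1}, x); the per-position Z is exactly the local normalization criticized below.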
19. Details on CMMs
20. From CMMs to CRFs
Recall why we're unhappy: we don't want local normalization.
How to compute this?
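The model we want (the equation lost from this slide) is, in the standard Lafferty et al. form, normalized once over the whole sequence rather than per state:

\[
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
\exp\Big(\sum_i \sum_k \lambda_k f_k(y_{i-1}, y_i, \mathbf{x}, i)\Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\Big(\sum_i \sum_k \lambda_k f_k(y'_{i-1}, y'_i, \mathbf{x}, i)\Big)
\]

Computing the global Z(x), a sum over all label sequences, is the "how to compute this?" in question.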
21. What's the new model look like?
What's independent?
22. What's the new model look like?
What's independent now?
[Figure: an undirected chain over labels y1, y2, y3, with each label also connected to the whole observation x.]
23. Hammersley-Clifford
- For positive distributions P(x1, …, xn):
- Pr(xi | x1, …, xi-1, xi+1, …, xn) = Pr(xi | Neighbors(xi))
- Pr(A | B, S) = Pr(A | S), where A, B are sets of nodes and S is a set that separates A and B
- P can be written as a normalized product of clique potentials
So this is very general: any Markov distribution can be written in this form (modulo nits like the positive-distribution requirement).
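For concreteness, the factorization the theorem guarantees is

\[
P(x_1, \dots, x_n) = \frac{1}{Z} \prod_{c \in \mathrm{cliques}(G)} \psi_c(\mathbf{x}_c)
\]

with each potential ψ_c a positive function of only the variables in clique c. In a chain the cliques are adjacent pairs, which is what licenses the edge-weight view of the earlier slides.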
24. Definition of CRFs
X is a random variable over data sequences to be labeled. Y is a random variable over corresponding label sequences.
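The defining condition (as in Lafferty et al. 2001) is that Y obeys the Markov property with respect to its graph once we condition on all of X:

\[
P(Y_v \mid X, \, Y_w, w \neq v) = P(Y_v \mid X, \, Y_w, w \sim v)
\]

where w ∼ v means that w and v are neighbors in the graph over Y.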
25. Example of CRFs
26. Graphical comparison among HMMs, MEMMs, and CRFs
[Figure: side-by-side graphical models of an HMM, an MEMM, and a CRF.]
27. Lafferty et al. notation
28. Conditional Distribution (cont'd)
- CRFs use the observation-dependent normalization Z(x) for the conditional distributions; Z(x) is a normalization over the data sequence x
- Learning
- Lafferty et al.'s IIS-based method is rather inefficient
- Gradient-based methods are faster
- The trickiest bit is computing the normalization, which is over exponentially many y vectors
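For reference, the gradient these methods climb is the standard observed-minus-expected feature-count form (a well-known result, reconstructed here rather than copied from the slide):

\[
\frac{\partial \log P(\mathbf{y} \mid \mathbf{x})}{\partial \lambda_k}
= \sum_i f_k(y_{i-1}, y_i, \mathbf{x}, i)
- \sum_{\mathbf{y}'} P(\mathbf{y}' \mid \mathbf{x}) \sum_i f_k(y'_{i-1}, y'_i, \mathbf{x}, i)
\]

The second term is the expectation over exponentially many y' vectors; the matrix construction on the next slides makes it tractable.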
29. CRF learning [from Sha & Pereira]
30. CRF learning [from Sha & Pereira]
31. CRF learning [from Sha & Pereira]
Something like forward-backward:
- Idea
- Define a matrix of y, y' affinities at stage i
- M_i[y, y'] = unnormalized probability of a transition from y to y' at stage i
- M_i · M_{i+1} = unnormalized probability of any path through stages i and i+1
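A minimal numpy sketch of the matrix idea, assuming the Sha & Pereira convention of dedicated start and stop labels (the indices and the feature-function list are illustrative):

```python
import numpy as np
from functools import reduce

START, STOP = 0, 1  # assumed indices of the added start/stop labels

def transition_matrix(weights, features, x, i, n_labels):
    """M_i[y, y2] = exp(sum_k lambda_k f_k(y, y2, x, i)).
    `features` is a hypothetical list of feature functions f_k."""
    M = np.zeros((n_labels, n_labels))
    for y in range(n_labels):
        for y2 in range(n_labels):
            M[y, y2] = np.exp(sum(w * f(y, y2, x, i)
                                  for w, f in zip(weights, features)))
    return M

def partition_function(M):
    """Z(x): multiplying the M_i sums the scores of *all* label paths,
    so the (START, STOP) entry of the full product is the normalizer."""
    return reduce(np.matmul, M)[START, STOP]
```

In practice this is done in log space to avoid overflow; the plain product form above matches the slide.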
32. [Figure: two copies of the y1, y2, y3 chain with observation x, repeated from slide 22.]
33. Forward-backward ideas
[Figure: a three-stage lattice with states "name" and "nonName" at each stage, and edge weights labeled a through h.]
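Continuing the sketch above, forward and backward flows through the same matrices yield the edge marginals needed for the expected feature counts, i.e. something like forward-backward:

```python
import numpy as np

START, STOP = 0, 1  # same assumed start/stop labels as before

def edge_marginals(M):
    """M: list of per-stage matrices as on slide 31. Returns one (S, S)
    array per stage giving P(y at i, y2 at i + 1 | x)."""
    S = M[0].shape[0]
    alpha = [np.eye(S)[START]]          # forward flow out of START
    for Mi in M:
        alpha.append(alpha[-1] @ Mi)
    beta = [np.eye(S)[STOP]]            # backward flow into STOP
    for Mi in reversed(M):
        beta.append(Mi @ beta[-1])
    beta.reverse()
    Z = alpha[-1][STOP]                 # same Z(x) as the matrix product
    return [np.outer(alpha[i], beta[i + 1]) * M[i] / Z
            for i in range(len(M))]
```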
34. CRF learning [from Sha & Pereira]
35. Sha & Pereira results
CRF beats MEMM (McNemar's test); MEMM probably beats voted perceptron.
36. Sha & Pereira results
[Table: training times in minutes, 375k examples.]
37. Some recent results
ICML 2006
38. Some recent results
39. POS tagging experiments in Lafferty et al.
- Compared HMMs, MEMMs, and CRFs on Penn Treebank POS tagging
- Each word in a given input sentence must be labeled with one of 45 syntactic tags
- Add a small set of orthographic features: whether a spelling begins with a number or an upper-case letter, whether it contains a hyphen, and whether it contains one of the suffixes -ing, -ogy, -ed, -s, -ly, -ion, -tion, -ity, -ies (a sketch of such features follows this list)
- oov = out-of-vocabulary (not observed in the training set)
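A quick sketch of features of this kind (function and feature names are illustrative, not from the paper):

```python
SUFFIXES = ("ing", "ogy", "ed", "s", "ly", "ion", "tion", "ity", "ies")

def orthographic_features(word):
    """The active binary spelling features for one token."""
    feats = {
        "starts_with_digit": word[0].isdigit(),
        "starts_with_upper": word[0].isupper(),
        "contains_hyphen": "-" in word,
    }
    for suf in SUFFIXES:
        feats["suffix_" + suf] = word.endswith(suf)
    return {name for name, on in feats.items() if on}

print(orthographic_features("Baking"))  # {'starts_with_upper', 'suffix_ing'}
```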
40. POS tagging vs MXPost