Title: Parameter Learning in MN
1. Parameter Learning in MN
2. Outline
- CRF
- Learning CRF for 2-d image segmentation
- IPF parameter sharing revisited
3. Log-linear Markov network (most common representation)
- A feature is some function f(D) over some subset of variables D
  - e.g., an indicator function
- Log-linear model over a Markov network H:
  - a set of features f_1(D_1), …, f_k(D_k)
    - each D_i is a subset of a clique in H
    - two f's can be over the same variables
  - a set of weights w_1, …, w_k
    - usually learned from data
- P(X_1, …, X_n) = (1/Z) exp( Σ_i w_i f_i(D_i) ) (a worked sketch follows below)
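To make the definition concrete, here is a minimal sketch, assuming a toy chain network and illustrative feature/weight values (none of these names come from the slides), that evaluates the log-linear distribution by brute-force enumeration:

```python
import itertools
import math

# Toy log-linear MN over three binary variables X1, X2, X3 (chain X1 - X2 - X3).
# Each feature f_i is defined over a subset D_i of a clique; note two features
# may share the same scope.
features = [
    lambda x: 1.0 if x[0] == x[1] else 0.0,  # f1(X1, X2): agreement indicator
    lambda x: 1.0 if x[1] == x[2] else 0.0,  # f2(X2, X3): agreement indicator
    lambda x: 1.0 if x[0] == 1 else 0.0,     # f3(X1): extra feature, reused scope
]
weights = [1.2, 0.8, -0.5]  # w_1, ..., w_k, normally learned from data

def score(x):
    """Sum_i w_i * f_i(x): the exponent of the log-linear model."""
    return sum(w * f(x) for w, f in zip(weights, features))

# Partition function Z: sum of exp(score) over all 2^3 joint assignments.
Z = sum(math.exp(score(x)) for x in itertools.product([0, 1], repeat=3))

def prob(x):
    """P(x) = (1/Z) exp( sum_i w_i f_i(x) )."""
    return math.exp(score(x)) / Z

print(prob((1, 1, 1)))
```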
4. Generative vs. Discriminative Classifiers: A Review
- Want to learn h: X → Y
  - X: features
  - Y: target classes
- Bayes optimal classifier: P(Y|X)
- Generative classifier, e.g., Naïve Bayes:
  - Assume some functional form for P(X|Y), P(Y)
  - Estimate parameters of P(X|Y), P(Y) directly from training data
  - Use Bayes rule to calculate P(Y|X = x)
  - This is a generative model
    - Indirect computation of P(Y|X) through Bayes rule
    - But it can generate a sample of the data: P(X) = Σ_y P(y) P(X|y)
- Discriminative classifiers, e.g., Logistic Regression:
  - Assume some functional form for P(Y|X)
  - Estimate parameters of P(Y|X) directly from training data
  - This is the discriminative model
    - Directly learn P(Y|X)
    - But cannot obtain a sample of the data, because P(X) is not available (both styles are sketched below)
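As a concrete contrast, a minimal sketch of both styles on a single binary feature (the toy data and every name here are illustrative assumptions, not from the slides):

```python
import numpy as np

X = np.array([0, 0, 1, 1, 1, 0, 1, 1])  # hypothetical feature values
Y = np.array([0, 0, 0, 1, 1, 0, 1, 1])  # hypothetical labels

# Generative (Naive-Bayes style): estimate P(Y) and P(X|Y), use Bayes rule.
p_y1 = Y.mean()
p_x1_given_y = [X[Y == c].mean() for c in (0, 1)]  # P(X=1 | Y=c)

def p_y_given_x_generative(x):
    joint = np.array([
        (1 - p_y1) * (p_x1_given_y[0] if x == 1 else 1 - p_x1_given_y[0]),
        p_y1 * (p_x1_given_y[1] if x == 1 else 1 - p_x1_given_y[1]),
    ])
    return joint / joint.sum()  # Bayes rule: normalize P(y) P(x|y) over y

def sample(rng):
    # Possible only because P(X) = sum_y P(y) P(X|y) is modeled.
    y = rng.random() < p_y1
    x = rng.random() < p_x1_given_y[int(y)]
    return int(x), int(y)

# Discriminative (logistic regression): model P(Y=1|X) = sigmoid(w*x + b) directly,
# fit by gradient ascent on the conditional log-likelihood.
w, b = 0.0, 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(w * X + b)))
    w += 0.1 * ((Y - p) * X).mean()
    b += 0.1 * (Y - p).mean()
# P(Y|X) is learned directly, but X cannot be sampled: P(X) was never modeled.
```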
5. Log-linear CRFs (most common representation)
- Graph H is only over the hidden vars Y_1, …, Y_P
- No assumptions about the dependency on the observed vars X
  - You must always observe all of X
- A feature is some function f(D) over some subset of variables D
  - e.g., an indicator function
- Log-linear model over a CRF H:
  - a set of features f_1(D_1), …, f_k(D_k)
    - each D_i is a subset of a clique in H
    - two f's can be over the same variables
  - a set of weights w_1, …, w_k
    - usually learned from data
- P(Y | X) = (1/Z(X)) exp( Σ_i w_i f_i(D_i) ), where the partition function Z(X) now depends on the observed X (sketch below)
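A minimal sketch of what changes relative to the plain MN case (the toy model and names are assumptions): the partition function is computed per observed input, summing only over the hidden labels:

```python
import itertools
import math

# Toy CRF: hidden labels y = (y1, y2) in {0,1}^2, observed x = (x1, x2).
W_NODE, W_EDGE = 1.5, 0.7  # shared weights, normally learned from data

def score(y, x):
    node = sum(x[i] if y[i] == 1 else -x[i] for i in range(2))  # label-observation fit
    edge = 1.0 if y[0] == y[1] else 0.0                         # agreement indicator
    return W_NODE * node + W_EDGE * edge

def p_y_given_x(y, x):
    """P(y|x) = exp(score(y, x)) / Z(x); Z(x) must be recomputed for every x."""
    Z = sum(math.exp(score(yy, x)) for yy in itertools.product([0, 1], repeat=2))
    return math.exp(score(y, x)) / Z

print(p_y_given_x((1, 1), (0.8, 0.3)))
```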
6. Example: Image Segmentation
- A set of features f_1(D_1), …, f_k(D_k)
  - each D_i is a subset of a clique in H
  - two f's can be over the same variables

[Figure: 3×3 grid CRF over hidden pixel labels y1, …, y9]

- We will define the features as follows
  - A node feature that measures the compatibility of a node's color and its segmentation label
  - A set of indicator features, one triggered for each edge labeling pair: ff, bb, fb, bf
    - This is allowed since we can define many features over the same subset of variables (a sketch follows below)
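A minimal sketch of these two feature families on the 3×3 grid ('f' = foreground, 'b' = background; the exact feature forms are illustrative assumptions):

```python
# 3x3 grid: nodes y1..y9 (indexed 0..8), 4-connected edges.
nodes = range(9)
edges = [(i, i + 1) for i in nodes if i % 3 != 2] + [(i, i + 3) for i in range(6)]

EDGE_PAIRS = [("f", "f"), ("b", "b"), ("f", "b"), ("b", "f")]  # ff, bb, fb, bf

def node_feature(y_i, x_i):
    """Compatibility of node color x_i (say, intensity in [0, 1]) and its label."""
    return x_i if y_i == "f" else 1.0 - x_i

def edge_features(y_i, y_j):
    """Four indicator features, all over the SAME pair of variables (y_i, y_j)."""
    return [1.0 if (y_i, y_j) == pair else 0.0 for pair in EDGE_PAIRS]
```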
7. Example: Image Segmentation
- A set of features f_1(D_1), …, f_k(D_k)
  - each D_i is a subset of a clique in H
  - two f's can be over the same variables

[Figure: 3×3 grid CRF over hidden pixel labels y1, …, y9]
8. Example: Image Segmentation
- A set of features f_1(D_1), …, f_k(D_k)
  - each D_i is a subset of a clique in H
  - two f's can be over the same variables
- Now we just need to sum these features (see the sketch below)
- We need to learn the parameters w_m
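Continuing the toy grid sketch above (and reusing its nodes, edges, node_feature, edge_features, and EDGE_PAIRS), the summed, parameter-shared score might look like:

```python
def total_score(y, x, w_node, w_edge):
    """sum_m w_m * C_m(y, x): each shared weight times its aggregated feature count."""
    # One shared node weight times the summed node compatibilities.
    node_count = sum(node_feature(y[i], x[i]) for i in nodes)
    # One shared weight per edge-labeling pair (ff, bb, fb, bf), each times the
    # count C_m of that pair over ALL edges.
    edge_counts = [0.0] * len(EDGE_PAIRS)
    for (i, j) in edges:
        for m, val in enumerate(edge_features(y[i], y[j])):
            edge_counts[m] += val
    return w_node * node_count + sum(w * c for w, c in zip(w_edge, edge_counts))
```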
9. Example: Image Segmentation
Given N data points (images and their segmentations), learn w by maximizing the conditional log-likelihood; the gradient for each shared parameter is
∂ℓ/∂w_m = Σ_n ( C_m[y_n, x_n] − E[ C_m | x_n ] )
where C_m[y_n, x_n] is the count for feature m in data point n, and the expectation E[ C_m | x_n ] requires inference using the current parameter estimates (one gradient step is sketched below).
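A sketch of one gradient evaluation under these definitions (brute-force enumeration for the expectation; `counts` is an assumed helper returning the vector [C_1, …, C_M] for a labeling):

```python
import itertools
import math

def gradient(w, data, counts, labels, n_nodes):
    """d(log-likelihood)/dw_m = sum_n ( C_m[y_n, x_n] - E_{P(y|x_n, w)}[C_m] )."""
    M = len(w)
    grad = [0.0] * M
    for (x_n, y_n) in data:
        emp = counts(y_n, x_n)  # empirical counts in data point n
        # Expected counts under the CURRENT parameters: this is the inference step.
        # Enumeration is exact but exponential; real models use belief propagation
        # or sampling instead.
        exp_counts, Z = [0.0] * M, 0.0
        for y in itertools.product(labels, repeat=n_nodes):
            c = counts(y, x_n)
            p = math.exp(sum(wm * cm for wm, cm in zip(w, c)))
            Z += p
            for m in range(M):
                exp_counts[m] += p * c[m]
        for m in range(M):
            grad[m] += emp[m] - exp_counts[m] / Z
    return grad
```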
10. Example: Inference for Learning
How to compute E[ C_fb | X_n ]?
11. Example: Inference for Learning
How to compute E[ C_fb | X_n ]? By linearity of expectation, sum the pairwise edge marginals obtained from inference:
E[ C_fb | X_n ] = Σ_{(i,j) ∈ edges} P( y_i = f, y_j = b | X_n )   (see the sketch below)
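Given pairwise marginals from inference (the dictionary layout here is an assumption), the computation is one line:

```python
def expected_fb_count(edges, edge_marginals):
    """E[C_fb | X_n] = sum over edges of P(y_i='f', y_j='b' | X_n).

    edge_marginals[(i, j)][(a, b)] holds P(y_i=a, y_j=b | X_n), e.g., from
    belief propagation; linearity of expectation turns the expected count of
    the per-edge fb-indicators into a sum of these marginals."""
    return sum(edge_marginals[e][("f", "b")] for e in edges)
```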
12. Representation Equivalence
Log-linear representation
Tabular MN representation from HW4
13. Representation Equivalence
Log-linear representation
Tabular MN representation from HW4
Now do it over the edge potential
This is correct since, for every assignment to (y_i, y_j), we select exactly one value from the table
14. Tabular MN representation from HW4
Now do it over the edge potential:
θ(y_i, y_j) = Π_m q_m^{1[(y_i, y_j) = m]}
This is correct since, for every assignment to (y_i, y_j), we select exactly one value q_m from the table
The cheap exp(log(·)) trick, just algebra:
θ(y_i, y_j) = exp( Σ_m log q_m · 1[(y_i, y_j) = m] )
Now let's combine it over all edges, assuming parameter sharing
Now use the same C_m trick: the sum over edges collapses to Σ_m log q_m · C_m(y), where C_m(y) counts the edges whose labels take assignment m
15. Representation Equivalence
Log-linear representation
Tabular MN representation from HW4
Now substitute (the derivation is written out below)
Equivalent, with w_m = log q_m, where q_m is the corresponding value in the tabular potential
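Written out, the equivalence the last three slides build up (a reconstruction of the algebra; q_m is the table entry for edge assignment m ∈ {ff, bb, fb, bf}):

```latex
% One edge potential, rewritten with the exp(log(.)) trick:
\theta(y_i, y_j) = \prod_{m} q_m^{\mathbf{1}[(y_i, y_j) = m]}
                 = \exp\Big( \sum_{m} \log q_m \, \mathbf{1}[(y_i, y_j) = m] \Big)

% Product over all edges with parameter sharing; the C_m trick collapses the sum:
\prod_{(i,j) \in E} \theta(y_i, y_j)
  = \exp\Big( \sum_{m} \log q_m \, C_m(\mathbf{y}) \Big),
\qquad
C_m(\mathbf{y}) = \sum_{(i,j) \in E} \mathbf{1}[(y_i, y_j) = m]

% Hence the tabular MN equals the log-linear model with w_m = \log q_m.
```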
16. Outline
- CRF
- Learning CRF for 2-d image segmentation
- IPF parameter sharing revisited
17. Iterative Proportional Fitting (IPF)
- Setting the derivative to zero yields a fixed-point equation:
  φ_C^(t+1)(c) = φ_C^(t)(c) · P̂(c) / P^(t)(c), where P̂(c) is the empirical marginal
- Iterate and converge to the optimal parameters
- Each iteration, must compute the current model marginals P^(t)(c), which requires inference (a minimal sketch follows)
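A minimal IPF sketch for a single clique table (the toy model and names are assumptions; with more cliques, the model marginal would come from inference rather than simple normalization):

```python
import itertools

def ipf_step(phi, empirical):
    """phi^{t+1}(c) = phi^t(c) * empirical(c) / model^t(c)."""
    Z = sum(phi.values())
    model = {c: v / Z for c, v in phi.items()}  # current model marginal P_t(c)
    return {c: phi[c] * empirical[c] / model[c] for c in phi}

# Clique over two binary variables; empirical marginal \hat{P}(c) from data.
phi = {c: 1.0 for c in itertools.product([0, 1], repeat=2)}
empirical = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
for _ in range(5):
    phi = ipf_step(phi, empirical)
# The normalized phi now matches the empirical marginal (fixed point reached).
```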
18. Parameter Sharing in your HW
- Note that I am using Y for the label
- All edge potentials are shared
- Also, we are learning a conditional model
19. IPF parameter sharing
We only have one data point (image) in this example, so we drop X_n and write just X
In total we have 4 shared parameters, as opposed to 4 parameters per edge
How do we calculate these quantities using parameter sharing? Aggregate the empirical counts and the model marginals over all edges that share the parameter (see the sketch below)
We can cancel the |E| (number of edges) factor due to the division
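A sketch of the shared update under these assumptions (q_m is one of the 4 shared edge parameters; the marginals would come from inference on the conditional model at the current parameters):

```python
def shared_ipf_step(q, edges, empirical_counts, model_marginals):
    """q_m^{t+1} = q_m^t * (empirical count of pattern m) / (expected count of m).

    empirical_counts[m]: number of edges whose labels take pattern m in the
        single observed segmentation.
    model_marginals[(i, j)][m]: P(y_i, y_j = m | X) for edge (i, j), from inference.
    The 1/|E| averaging factor would appear in both numerator and denominator,
    so it cancels in the ratio."""
    new_q = {}
    for m in q:
        expected = sum(model_marginals[e][m] for e in edges)  # aggregate over shared edges
        new_q[m] = q[m] * empirical_counts[m] / expected
    return new_q
```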