Title: Segmentation and Fitting Using Probabilistic Methods
1. Segmentation and Fitting Using Probabilistic Methods
- Or, How Expectation-Maximization Can Cure Your Computer Vision System of Almost Anything
- Well, maybe...
2. Departure Point
- Up to now, most of what we've done in the grouping and segmentation arena has been local.
- Now we want to model things globally, and in probabilistic terms.
- Explain a large collection of tokens with a few parameters. (Hmmm. Like the Hough transform?)
3. Missing Data Problems, Fitting, Segmentation
- Often, if some parameters were known, the maximum-likelihood problem would be easy
- Fitting: if you know which line each token comes from, getting the parameters is easy
- Segmentation: if you know the segment each pixel comes from, the segment's parameters are easily determined
- Fundamental matrix: if you know the correspondences
4. Missing Data Problem
- A missing data problem is one where:
  - Some terms in a data vector are missing in some instances, but present in others
  - An inference problem can be made simpler by rewriting it using some variables whose values are unknown
- Algorithm concept: take an expectation over the missing data
5. Missing Data Problems
- Strategy:
  - Estimate values for the missing data
  - Plug these in, now estimate the parameters
  - Re-estimate values for the missing data
  - Continue to convergence
- For example (a code sketch follows this list):
  - Guess a mapping of points to lines
  - Fit each line to its points
  - Reallocate points to the fitted lines
  - Loop to convergence
- Reminiscent of K-means, is it not?
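A minimal sketch of this hard-assignment loop in Python/NumPy. The helper names (fit_line, kmeans_lines), the total-least-squares fit, and the random initialization are illustrative assumptions, not something the slides prescribe.

import numpy as np

def fit_line(points):
    """Total-least-squares fit: returns (n, d) with n a unit normal and n.x + d = 0."""
    mean = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - mean)   # smallest right singular vector = line normal
    n = vt[-1]
    return n, -n @ mean

def kmeans_lines(points, n_lines, n_iter=20, seed=0):
    """Guess a point-to-line mapping, fit each line, reallocate points, repeat."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_lines, size=len(points))     # random initial mapping
    lines = [fit_line(points)] * n_lines                  # placeholder fits
    for _ in range(n_iter):
        for l in range(n_lines):
            members = points[labels == l]
            if len(members) >= 2:                         # keep the old fit if a line empties
                lines[l] = fit_line(members)
        # Reallocate each point to the line with the smallest perpendicular distance.
        dists = np.stack([np.abs(points @ n + d) for n, d in lines], axis=1)
        labels = dists.argmin(axis=1)
    return lines, labels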
6. Refining the Strategy
- The problem has parameters to be estimated, and missing variables (data)
- Iterate to convergence:
  - Replace the missing data with their expected values, given fixed parameter values
  - Fix the missing data, then do a maximum-likelihood estimate of the parameters, given that data
7. Refining the Example
- Allocate each point to a line with a weight equal to the probability of the point, given the line's parameters
- Refit the lines to the weighted set of points
- Converges to a local extremum (caution)
- Can be generalized (a weighted-fit sketch follows)
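Continuing the sketch above (again hedged: fit_line_weighted, em_lines, and the fixed noise scale sigma are assumptions for illustration), the refined version allocates every point to every line with a weight and refits to the weighted set:

def fit_line_weighted(points, w):
    """Weighted total-least-squares line fit."""
    mean = np.average(points, axis=0, weights=w)
    centered = points - mean
    scatter = (centered * w[:, None]).T @ centered
    eigvals, eigvecs = np.linalg.eigh(scatter)     # eigenvector of smallest eigenvalue = normal
    n = eigvecs[:, 0]
    return n, -n @ mean

def em_lines(points, lines, sigma=1.0, n_iter=20):
    """Soft allocation: weight of point j on line l is proportional to p(point | line l)."""
    for _ in range(n_iter):
        dists = np.stack([np.abs(points @ n + d) for n, d in lines], axis=1)
        w = np.exp(-0.5 * (dists / sigma) ** 2)    # Gaussian noise model on distance
        w /= w.sum(axis=1, keepdims=True)          # normalize weights across lines
        lines = [fit_line_weighted(points, w[:, l]) for l in range(len(lines))]
    return lines, w

As the slide warns, both loops converge only to a local extremum, so the initial guess matters.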
8. Image Segmentation
- α_l: the probability of choosing segment l at random (a priori); these are the mixing weights
- p(x | θ_l): the conditional density of feature vector x, given that it comes from segment l, for l = 1, ..., g
- Model: p(x | θ_l) is Gaussian, with θ_l = (μ_l, Σ_l)
- The total density for the feature vector of any pixel drawn at random is the weighted sum of these per-segment densities
[Figure: four Gaussian blobs, labeled Segment 1, θ_1 through Segment 4, θ_4]
This is known as a Mixture Model.
9. Mixture Model: Generative
- To produce a pixel (feature vector), do the following (a sampling sketch follows this list):
  - Pick an image segment l with prior probability α_l
  - Draw a sample from p(x | θ_l)
- The density in x space is a set of g Gaussian blobs, one per segment
- We want to determine:
  - The parameters of each blob (the μ_l and Σ_l values)
  - The mixing weights (the α_l values)
  - A mapping of pixels to components (the segmentation)
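A minimal sketch of this generative process (Python/NumPy; the toy 2-D parameter values and the variable names are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

# Toy mixture with g = 3 segments in a 2-D feature space (made-up values).
alphas = np.array([0.5, 0.3, 0.2])                      # mixing weights, sum to 1
mus    = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
Sigmas = np.array([np.eye(2) * s for s in (0.5, 1.0, 0.3)])

def sample_pixel():
    """Pick a segment l with prior alpha_l, then draw a feature vector from its Gaussian."""
    l = rng.choice(len(alphas), p=alphas)
    return l, rng.multivariate_normal(mus[l], Sigmas[l])

samples = [sample_pixel() for _ in range(1000)]         # (segment, feature vector) pairs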
10. Package All of These into a Parameter Vector
Θ collects the mixing weights and the blob parameters.
The mixture model then becomes a weighted sum over components, with each component a multivariate Gaussian (written out below).
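The formulas on this slide did not survive extraction; reconstructed from the definitions above (so take the exact form as a presumption rather than the slide's own equation), they are:

\Theta = (\alpha_1,\dots,\alpha_g,\ \theta_1,\dots,\theta_g), \qquad \theta_l = (\mu_l,\Sigma_l)

p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l\, p(x \mid \theta_l)

p(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_l|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}(x-\mu_l)^{\top} \Sigma_l^{-1} (x-\mu_l)\right)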
11. The Chicken and the Egg
- If we knew which pixel belonged to which component, estimating Θ would be straightforward:
  - Use maximum-likelihood estimates for each θ_l
  - The fraction of the image in each component gives α_l
- If we knew Θ, then for each pixel we could assign it to its most likely blob
- Unfortunately, we know neither
- That's where Expectation-Maximization (EM) comes in: iterate guesses until convergence
12. Formal Statement of Missing Data Problems
- X: the complete data space; Y: the incomplete data space; f: X → Y maps the complete data to what we actually measure.
- Segmentation example: X is the measurements at each pixel together with the set of variables matching pixels to mixture components; Y is the measurements at each pixel alone.
- Line-fitting example: X is the measurements at each token together with the mapping of tokens to lines; Y is the measurements at each token alone.
13. Missing, Formally
- U: the parameter space. Its elements are the mixing weights and the parameters (mean, covariance) of each mixture component (or the parameters of each line).
- We want to obtain a maximum-likelihood estimate of these parameters given incomplete data. If we had complete data, we could use the joint density function for the complete data space, p_c(x | u), and maximize the complete-data log-likelihood (written out below).
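The log-likelihood itself was an image on this slide; from the definitions above (x_j the complete data items, u the parameters), it is presumably:

L_c(u) \;=\; \sum_{j=1}^{n} \log p_c(x_j \mid u)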
14. OK. We maximize this to estimate each segment's parameters (image segmentation), or the mixing weights and the parameters of the lines given the mapping of tokens to lines (the line-fitting example). Problem: we don't have complete data. The density for the incomplete space is the marginal density of the complete space, where we've integrated out the missing values we don't know (written out below).
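The marginal was also an image on the slide. Writing the complete data as x = (y, z), with z the missing part, it is presumably:

p(y \mid u) \;=\; \int p_c(y, z \mid u)\, dz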
15. This is a pain in the neck. We don't know which of the many possible x values that could correspond to the y values we observe are correct. We've taken a projection (of some sort), and we cannot uniquely reconstruct the full joint density, so we have to average over all those possibilities to make our best guess. But all is not lost. We have the following strategy:
1. Obtain some estimate of the missing data using a guess at the parameters.
2. Form a maximum-likelihood estimate of the free parameters using the estimate of the missing data.
3. Iterate to (hopefully) convergence.
16. Strategy by Example
- Image segmentation:
  - Obtain an estimate of the component from which each pixel comes, using an estimate of the θ_l
  - Update the θ_l and the mixing weights using this estimate
- Tokens and lines:
  - Obtain an estimate of the correspondence between tokens and lines, using a guess at the line parameters
  - Revise the estimate of the line parameters using the estimated correspondences
17. Expectation-Maximization for Mixture Models
- Assume the complete-data log-likelihood is linear in the missing variables. (This is common.)
- Mixture model: the missing data indicate the mixture component from which each data item is drawn.
- Represent this by associating with each data point a bit vector z of g elements (one per component in the mixture).
18. About the z Vectors (as a Matrix)
Stack the z vectors into an n × g matrix: one row per data point (observation) j = 1, ..., n, and one column per Gaussian mixture component l = 1, ..., g.
Entry (j, l) is 1 if pixel (token) j was produced by mixture component l, and 0 otherwise; its expectation is the probability of that event.
19. So our complete information can be written as the observations together with their indicator vectors, x_j = (y_j, z_j). Writing the mixture model this way (line example), the complete-data log-likelihood (shown below) is linear in the missing variables. Good news! How did we ensure that would happen?
We will think of the entries in z as probabilities, that is, as expectations.
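The equations on this slide were images. Using the z indicators just defined, the complete-data log-likelihood of the mixture model is presumably:

L_c(u) \;=\; \sum_{j=1}^{n} \log p_c(x_j \mid u)
       \;=\; \sum_{j=1}^{n} \sum_{l=1}^{g} z_{jl}\,
             \big[\log \alpha_l + \log p(y_j \mid \theta_l)\big]

It is linear in the z_jl precisely because each z vector has exactly one nonzero entry, which is what the indicator representation was chosen to guarantee.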
20. EM: The Key Idea
- Obtain working values for the missing data, and so for x, by substituting the expectation for each missing value.
- That is, fix the parameters, then compute each expectation E[z_jl], given y_j and the parameter values (the formula follows this list).
- Plug the E[z_jl] into the complete-data log-likelihood and find the parameters that maximize it.
- The E[z_jl] have probably changed, so repeat.
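Since z_jl is a 0/1 indicator, its expectation given y_j is just the posterior probability that component l produced data point j; reconstructed from the model above:

\mathbb{E}[z_{jl} \mid y_j, u] \;=\; P(z_{jl} = 1 \mid y_j, u)
  \;=\; \frac{\alpha_l\, p(y_j \mid \theta_l)}{\sum_{k=1}^{g} \alpha_k\, p(y_j \mid \theta_k)}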
21. More Formally
Given u^(s), we form u^(s+1) as follows:
1. E-step: compute the expected value of the complete data using the incomplete data and the current parameter estimates. The y_j are observed, so we only need the expected value of z_jl for each j. Denote these values z̄_jl^(s); the superscript indicates that the expectation depends on the current parameter values at step s.
2. M-step: maximize the complete-data log-likelihood with respect to u, using the expectations from the E-step.
22. Image Segmentation in Practice (Warning: your text is a typo minefield)
Set up an n × g array of indicators I (each row is like a z vector).
E-step: the (j, l) element of I is 1 if pixel j comes from blob l, so E[I_jl] = Prob(pixel j comes from Gaussian blob l). Note: this is no longer a binary value! (A code sketch of this step follows.)
[Figure: a one-dimensional example; at a point x where two component densities take values a and b, the second component's expected indicator is b/(a+b).]
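A minimal NumPy/SciPy sketch of this E-step. Here features is an n × d array of per-pixel feature vectors, and alphas, mus, Sigmas hold the current parameter guesses; all of these names are my own, not the text's.

import numpy as np
from scipy.stats import multivariate_normal

def e_step(features, alphas, mus, Sigmas):
    """Return the n x g array of expected indicators E[I_jl] (the responsibilities)."""
    n, g = len(features), len(alphas)
    I = np.empty((n, g))
    for l in range(g):
        # alpha_l * p(x_j | theta_l) for every pixel j
        I[:, l] = alphas[l] * multivariate_normal.pdf(features, mean=mus[l], cov=Sigmas[l])
    I /= I.sum(axis=1, keepdims=True)   # each row now holds Prob(pixel j came from blob l)
    return I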
23. Practice
M-step: now form a maximum-likelihood estimate of Θ^(s+1) (a code sketch follows):
- Mixing weights: the average value in each column of I
- Means: the weighted-average feature vector for each column
- Covariances: the weighted-average covariance matrix for each column
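A matching M-step sketch, under the same assumptions as the E-step above:

def m_step(features, I):
    """Re-estimate (alphas, mus, Sigmas) from the expected indicators I (n x g)."""
    n, g = I.shape
    alphas = I.mean(axis=0)                                   # average value in each column
    mus, Sigmas = [], []
    for l in range(g):
        w = I[:, l] / I[:, l].sum()
        mu = w @ features                                     # weighted-average feature vector
        centered = features - mu
        Sigmas.append((centered * w[:, None]).T @ centered)   # weighted-average covariance
        mus.append(mu)
    return alphas, np.array(mus), np.array(Sigmas)

Alternating e_step and m_step until the estimates stop changing is the EM loop the preceding slides describe.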
24. When It Converges...
- We can make a maximum a posteriori (MAP) decision by assigning each pixel to the Gaussian for which it has the highest E[I_jl].
- We can also keep the probabilities and work with them in, for instance, a probabilistic relaxation framework (coming attractions).
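Given the responsibilities I from the E-step sketch above, the MAP decision is a one-liner:

labels = I.argmax(axis=1)   # each pixel assigned to the blob with the highest E[I_jl]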