Title: Probabilistic Relaxation
This is a method for probabilistically optimal
labeling, a slight variation on the consistent
labeling problem, which is particularly
effective when one has a large number of
objects and a fairly simple, regular arrangement
of relational constraints. We will develop
and motivate this technique by means of a
case study on a real image understanding
problem: coaxially viewed weld scenes.
There is a crisp version of this process, known
as discrete relaxation, which we will not
discuss. It's not as useful in most realistic
situations, and you don't need to see it
first to understand probabilistic relaxation.
In our problem (a typical one for this
formulation) we will have:
- A set of objects corresponding to predefined
  image regions (16x16 blocks of pixels, a
  tessellation).
- A set of labels corresponding to the scene
  entities.
- Relational constraints defined over a regular
  mesh: adjacent objects constrain each other's
  labels.
Image Tessellation and the Constraint Graph
Each tile of the image (16x16 pixels in our
example) is identified with a node in a graph
with a very regular, simple edge structure. This
graphical formulation is not absolutely required,
but problems that can be expressed in it are good
candidates for relaxation labeling.
Although it is not explicitly represented in the
graph, directional information will be used in
our solution.
Even though it appears that only adjacent nodes
constrain one another, we will see that the
constraints propagate across the full mesh: all
nodes are mutually constraining.
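The tile mesh described above can be sketched in a few lines. This is only an illustration: the image size, the helper names (`tile_grid_shape`, `tile_pixels`, `neighbors`) and the choice to drop partial border tiles are assumptions, not part of the original case study.

```python
import numpy as np

TILE = 16  # tile edge in pixels, as in the text

# Row/column offsets of the four compass neighbors (row 0 is the top of the image).
DIRECTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def tile_grid_shape(image):
    """Number of whole tiles along each axis (partial border tiles are dropped)."""
    rows, cols = image.shape[:2]
    return rows // TILE, cols // TILE


def tile_pixels(image, r, c):
    """Pixels belonging to the tile at grid position (r, c)."""
    return image[r * TILE:(r + 1) * TILE, c * TILE:(c + 1) * TILE]


def neighbors(r, c, shape):
    """Map each direction to the adjacent tile, skipping positions off the mesh."""
    found = {}
    for d, (dr, dc) in DIRECTIONS.items():
        rr, cc = r + dr, c + dc
        if 0 <= rr < shape[0] and 0 <= cc < shape[1]:
            found[d] = (rr, cc)
    return found


image = np.zeros((480, 640), dtype=np.uint8)  # hypothetical image size
shape = tile_grid_shape(image)
corner = neighbors(0, 0, shape)     # a corner tile has only two neighbors
interior = neighbors(5, 5, shape)   # an interior tile has all four
```

Each node touches at most four others, yet repeated local updates let information travel one tile per iteration, which is how distant tiles end up constraining each other.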
Basic System Structure
Notice the information structure, common to
most (all?) labeling problems.
For our weld pool problem, we have a set of six
possible labels. We extract a set of 13 features
(attributes) for each object (region, tile) in
the image, based on:
- Location
- Gray level
- Texture
- Range, standard deviation
- Cooccurrence attributes
- Run length attributes
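To compute the conditional probability that a
region takes label $\lambda_j$, given its
(13-D) feature vector $X$, we use Bayes' rule:
$$P(\lambda_j \mid X) = \frac{1}{P(X)}\, P(X \mid \lambda_j)\, P(\lambda_j)$$
Stacking this calculation for each label into a
vector gives us the labeling probability
vector:
$$\mathbf{p} = [P(\lambda_1 \mid X), \ldots, P(\lambda_6 \mid X)]^T$$
We can neglect the leading term $1/P(X)$
and simply normalize (at each step).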
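As a small illustration of why the leading term can be dropped, the sketch below multiplies likelihoods by priors and normalizes the result. The function name `label_posterior` and all numeric values are invented for the example.

```python
import numpy as np


def label_posterior(likelihoods, priors):
    """Return P(label | X) for every label by normalizing likelihood * prior."""
    priors = np.asarray(priors, dtype=float)
    unnormalized = np.asarray(likelihoods, dtype=float) * priors
    total = unnormalized.sum()  # equals P(X); it is never needed on its own
    if total == 0.0:
        # No label explains the observation: fall back to the priors.
        return priors / priors.sum()
    return unnormalized / total


# Hypothetical values of P(X | label) and P(label) for six labels.
likelihoods = [0.02, 0.10, 0.01, 0.05, 0.00, 0.02]
priors = [0.30, 0.10, 0.20, 0.20, 0.10, 0.10]
p = label_posterior(likelihoods, priors)
```

The sum of the unnormalized terms is exactly the total probability of X, so normalizing recovers the posterior without estimating that quantity separately.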
So we don't have to know the total probability
of X explicitly, which is a good thing.
In effect, all we really care about are
relative probabilities, and that's what we're
going to get. Now we build up estimates of the
prior probability of each label and the
conditional probability of the feature vector
X conditioned on each label from histograms of
training data. Yes, it is tedious, but we
only have to do it once! If we can assume that
the features are mutually independent, that
helps a lot. Otherwise, we encounter what is
known as the curse of dimensionality.
How does this help? Each training observation
produces one point in each of N (six) 1-D
histograms, instead of a single point in one
N-D (6-D) histogram. Good assumption? That
depends on your feature set, and on how you
select the features.
From our set of 13, about half were
reasonably independent and information-bearing;
those were kept.
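The histogram-based estimate under the independence assumption might look like the following sketch. The class name, the bin count of 16, the feature range of [0, 1], the add-one smoothing and the synthetic training data are all assumptions made for illustration.

```python
import numpy as np


class HistogramNaiveBayes:
    """Per-label, per-feature 1-D histograms combined under independence."""

    def __init__(self, n_labels, n_features, n_bins=16, value_range=(0.0, 1.0)):
        self.edges = np.linspace(value_range[0], value_range[1], n_bins + 1)
        # Start every count at 1 (add-one smoothing) so no bin has zero probability.
        self.counts = np.ones((n_labels, n_features, n_bins))
        self.label_counts = np.ones(n_labels)

    def _bins(self, X):
        # Bin index of each feature value; out-of-range values land in the end bins.
        return np.digitize(X, self.edges[1:-1])

    def fit(self, X, y):
        bins = self._bins(np.asarray(X, dtype=float))
        for row, label in zip(bins, np.asarray(y, dtype=int)):
            # One observation adds one point to each of the 1-D histograms.
            self.counts[label, np.arange(row.size), row] += 1
            self.label_counts[label] += 1
        return self

    def posterior(self, x):
        bins = self._bins(np.asarray(x, dtype=float))
        cond = self.counts / self.counts.sum(axis=2, keepdims=True)
        prior = self.label_counts / self.label_counts.sum()
        # Independence: P(X | label) is the product of the per-feature terms.
        log_p = np.log(prior) + np.log(cond[:, np.arange(bins.size), bins]).sum(axis=1)
        p = np.exp(log_p - log_p.max())
        return p / p.sum()  # normalize instead of computing P(X)


rng = np.random.default_rng(0)
X0 = rng.uniform(0.0, 0.4, size=(200, 6))  # synthetic samples for label 0
X1 = rng.uniform(0.6, 1.0, size=(200, 6))  # synthetic samples for label 1
X_train = np.vstack([X0, X1])
y_train = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]
model = HistogramNaiveBayes(n_labels=2, n_features=6).fit(X_train, y_train)
p = model.posterior(np.full(6, 0.2))
```

With 16 bins per feature, six independent histograms need 6 x 16 = 96 cells per label, whereas a joint 6-D histogram would need 16^6 = 16,777,216 cells per label, far more than any realistic training set can populate.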
Probabilistic Relaxation Labeling
Compatibility Coefficients
We will define four sets of coefficients
$r_d(\lambda, \lambda')$, one for each compass
direction d (N, S, E, W) from the current node
(region). Here $\lambda$ is a label at the current
region and $\lambda'$ is a label at the adjacent
region in direction d.
Almost any formulation works (converges to a
reasonable result) as long as the following
constraints are met:
C0: Bounded ...
C1: If the occurrence of label $\lambda'$ at the
adjacent region in direction d is independent of
the occurrence of $\lambda$ at the current region,
then $r_d(\lambda, \lambda') = 0$.
C2: If $\lambda'$ never occurs (or cannot occur)
at the adjacent region in direction d when
$\lambda$ occurs at the current region, then
$r_d(\lambda, \lambda') = -1$.
C3: If $\lambda'$ always occurs (or must occur)
at the adjacent region in direction d when
$\lambda$ occurs at the current region, then
$r_d(\lambda, \lambda') = 1$.
C4: If $\lambda'$ is unlikely at the adjacent
region in direction d when $\lambda$ occurs at
the current region, then
$r_d(\lambda, \lambda') < 0$.
C5: If $\lambda'$ is likely at the adjacent
region in direction d when $\lambda$ occurs at
the current region, then
$r_d(\lambda, \lambda') > 0$.
There are a number of ways of developing these
coefficients. I personally like one by Yamamoto,
which is based on information theory. I will not
explain his development, but simply state the
result, which is expressed in terms of
$P_d(\lambda \mid \lambda')$, the conditional
probability of label $\lambda$ at the current
node, given that label $\lambda'$ has been
assigned to the adjacent node in direction d.
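The directional conditional probabilities that such coefficients are built from can be estimated by counting label pairs in hand-labeled training maps. The sketch below shows only that counting step; it does not reproduce Yamamoto's coefficient formula. The function name `conditional_tables` and the tiny label map are hypothetical.

```python
import numpy as np

# Row/column offsets of the adjacent tile in each compass direction.
OFFSETS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def conditional_tables(label_map, n_labels):
    """For each direction d, T[d][a, b] = P(current = a | neighbor in d = b)."""
    label_map = np.asarray(label_map)
    rows, cols = label_map.shape
    tables = {}
    for d, (dr, dc) in OFFSETS.items():
        counts = np.zeros((n_labels, n_labels))
        for r in range(rows):
            for c in range(cols):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    counts[label_map[r, c], label_map[rr, cc]] += 1
        totals = counts.sum(axis=0, keepdims=True)  # occurrences of each neighbor label
        # Neighbor labels that were never observed keep an all-zero column.
        tables[d] = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return tables


# Hypothetical 2x2 training map: label 0 on the top row, label 1 on the bottom row.
training_map = [[0, 0],
                [1, 1]]
T = conditional_tables(training_map, n_labels=2)
```

In this toy map, `T["S"][0, 1]` is 1.0 because every tile with label 1 directly below it carries label 0, while `T["E"]` is the identity because labels never change along a row.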
Parallel-Iterative Operation
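Let $p_i^k(\lambda)$ be the estimated probability
that region i takes label $\lambda$ at iteration k.
Then the label probability vector for region i
is $\mathbf{p}_i^k = [p_i^k(\lambda_1), \ldots, p_i^k(\lambda_6)]^T$.
The initial labeling vectors for each region are
provided by the Bayesian preclassifier:
$p_i^0(\lambda) = P(\lambda \mid X_i)$, where $X_i$
is the feature vector of region i.
Now the (k+1)th label probability at region i,
$p_i^{k+1}(\lambda)$, is computed from
$p_i^k(\lambda)$, the compatibility coefficients,
and $p_{i,d}^k(\lambda')$, the k-th estimate of the
probability of label $\lambda'$ occurring at the
adjacent region in direction d.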
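One widely used form of this iteration is the classical Rosenfeld, Hummel and Zucker update. The sketch below assumes that form, which may differ in detail from the one used in this case study: a support term averages the compatibility-weighted neighbor probabilities over the four directions, and each probability is scaled by one plus its support and then renormalized. The array layout, the equal direction weights, the zero contribution from tiles beyond the border, and the example coefficients are all assumptions.

```python
import numpy as np

# Row/column offsets of the adjacent tile in each compass direction.
OFFSETS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def relaxation_step(P, r):
    """One parallel update of all label vectors.

    P has shape (rows, cols, n_labels); r[d][a, b] is the compatibility of
    label a at the current tile with label b at the neighbor in direction d.
    """
    rows, cols, _ = P.shape
    q = np.zeros_like(P)
    for d, (dr, dc) in OFFSETS.items():
        # neighbor[i, j] is the label vector of the tile in direction d;
        # tiles beyond the border contribute zeros (an assumption of this sketch).
        neighbor = np.zeros_like(P)
        dst_r = slice(max(-dr, 0), rows - max(dr, 0))
        src_r = slice(max(dr, 0), rows - max(-dr, 0))
        dst_c = slice(max(-dc, 0), cols - max(dc, 0))
        src_c = slice(max(dc, 0), cols - max(-dc, 0))
        neighbor[dst_r, dst_c] = P[src_r, src_c]
        q += neighbor @ r[d].T  # q[a] += sum over b of r[d][a, b] * neighbor[b]
    q /= len(OFFSETS)           # equal weight per direction keeps q in [-1, 1]
    updated = P * (1.0 + q)
    totals = updated.sum(axis=2, keepdims=True)
    # A tile whose updated vector sums to zero keeps its previous probabilities.
    return np.divide(updated, totals, out=P.copy(), where=totals > 0)


# Hypothetical coefficients: equal labels support each other, unequal ones conflict.
same_label = np.array([[1.0, -1.0],
                       [-1.0, 1.0]])
r = {d: same_label for d in OFFSETS}
P0 = np.array([[[0.6, 0.4], [0.9, 0.1]]])  # a 1x2 mesh with two labels
P1 = relaxation_step(P0, r)
```

In this example the left tile's probability for the first label rises from 0.6 to 0.72/1.04 (about 0.69) because its eastern neighbor strongly favors the same label.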
Performance Evaluation
How do we know when to stop and make our
decisions? Recall, it is a MAP decision, so we
will assign the label having the largest
estimated probability, once we stop. Two
options (there are more):
1. Watch the entropy of the labeling. When it
gets small enough, STOP. The entropy of the total
labeling is
$$H = \sum_i H_i$$
where the entropy of labeling vector i is
$$H_i = -\sum_{\lambda} p_i(\lambda) \log p_i(\lambda)$$
2. Watch the mean rate of change of the labeling
vectors. When they quit changing, STOP.
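Both stopping tests, and the final MAP decision, can be sketched as below. The function names, the use of the natural logarithm, the L1 measure of change and the threshold values are assumptions chosen for illustration.

```python
import numpy as np


def labeling_entropy(P, eps=1e-12):
    """Total entropy: sum over regions of -sum over labels of p * log(p)."""
    P = np.asarray(P, dtype=float)
    safe = np.clip(P, eps, 1.0)  # avoids log(0); zero-probability terms contribute 0
    return float(-(P * np.log(safe)).sum())


def mean_change(P_new, P_old):
    """Mean over regions of the L1 change in the label vector."""
    diff = np.abs(np.asarray(P_new, dtype=float) - np.asarray(P_old, dtype=float))
    return float(diff.sum(axis=-1).mean())


def should_stop(P_new, P_old, entropy_tol=0.05, change_tol=1e-3):
    """Stop when the labeling is nearly unambiguous or has stopped moving."""
    return labeling_entropy(P_new) <= entropy_tol or mean_change(P_new, P_old) <= change_tol


def map_labels(P):
    """MAP decision: pick the most probable label for every region."""
    return np.asarray(P).argmax(axis=-1)


undecided = np.array([[0.5, 0.5]])  # one region, two equally likely labels
decided = np.array([[1.0, 0.0]])    # the same region after an unambiguous labeling
h_undecided = labeling_entropy(undecided)
h_decided = labeling_entropy(decided)
stop_now = should_stop(decided, undecided)
```

A uniform two-label vector has entropy log 2, while a one-hot vector has entropy zero, so the entropy falls toward zero as the labeling becomes decisive.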