Cue Integration in Figure/Ground Labeling
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, U.C. Berkeley
Abstract
We present a model of edge and region grouping using a conditional random field built over a scale-invariant representation of images to integrate multiple cues. Our model includes potentials that capture low-level similarity, mid-level curvilinear continuity and high-level object shape. Maximum-likelihood parameters for the model are learned from human-labeled ground-truth on a large collection of horse images using belief propagation. Using held-out test data, we quantify the information gained by incorporating generic mid-level cues and high-level shape.
Constrained Delaunay Triangulation (CDT)
Bottom-up grouping
Construct a scale-invariant representation bottom-up:
- Compute a low-level edge map.
- Trace contours and recursively split them into piecewise-linear segments.
- Use Constrained Delaunay Triangulation to complete gaps and partition the image into dual edges and regions.
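For concreteness, a minimal sketch of this pipeline in Python, assuming the edge map and contour tracing have already produced a polyline; the recursive split is Douglas-Peucker style, and the CDT uses the third-party triangle package (a wrapper for Shewchuk's Triangle). Function names are illustrative, not the authors' code.

```python
import numpy as np
import triangle  # pip install triangle; wraps Shewchuk's Triangle

def split_recursive(pts, tol=2.0):
    """Recursively split a traced contour into piecewise-linear segments:
    keep the point farthest from the chord if it deviates by more than tol."""
    a, b = pts[0], pts[-1]
    chord = b - a
    n = np.linalg.norm(chord)
    if len(pts) <= 2 or n == 0:
        return [a, b]
    # perpendicular distance of each point to the chord a->b
    d = np.abs(chord[0] * (pts[:, 1] - a[1]) - chord[1] * (pts[:, 0] - a[0])) / n
    k = int(np.argmax(d))
    if d[k] < tol:
        return [a, b]
    left = split_recursive(pts[:k + 1], tol)
    return left[:-1] + split_recursive(pts[k:], tol)  # drop duplicate split point

# toy traced contour; in practice the output of edge detection + tracing
contour = np.array([[0, 0], [10, 2], [20, 0], [30, 8], [40, 10]], float)
verts = np.array(split_recursive(contour))
segs = [[i, i + 1] for i in range(len(verts) - 1)]

# Constrained Delaunay Triangulation: contour segments are forced to appear
# as triangulation edges; the remaining edges complete the gaps.
cdt = triangle.triangulate({'vertices': verts, 'segments': segs}, 'p')
print(cdt['triangles'])  # triangles = candidate regions; edges = candidate contours
```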
Output Marginals
Conditional Random Field
- a joint model over contours, regions and objects
- integrates low-, mid- and high-level cues
- easy to train and test on large datasets
Quantitative Analysis of Cue Integration
We train and test our approach on a large dataset of 344 grayscale horse images. We evaluate the performance of the grouping algorithm against both the contours and the regions in the human-marked ground-truth. We find that for this dataset, which has limited pose variation, high-level knowledge greatly boosts grouping performance; nevertheless, mid-level cues still play a significant role.
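As an illustration of the kind of scoring this involves, a minimal precision/recall sketch over binary edge labels; the actual benchmark matches predicted contours and regions against human annotations, so this simplified array version is an assumption for exposition.

```python
import numpy as np

def precision_recall(pred_on, gt_on):
    """Precision and recall of binary labels (one entry per CDT edge)."""
    tp = np.sum(pred_on & gt_on)             # correctly 'on' edges
    precision = tp / max(np.sum(pred_on), 1)
    recall = tp / max(np.sum(gt_on), 1)
    return precision, recall

pred = np.array([1, 1, 0, 1, 0], bool)  # thresholded model marginals
gt = np.array([1, 0, 0, 1, 1], bool)    # human-marked ground-truth
print(precision_recall(pred, gt))       # approx (0.667, 0.667)
```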
Global Cue Integration in a Random Field
Maximum-likelihood CRF parameters are fit via gradient descent. We use loopy belief propagation to perform inference, in particular to estimate the marginals of X, Y and Z.
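For reference, the standard maximum-likelihood gradient this setup uses for a log-linear CRF: for each cue weight, the empirical feature count minus its expectation under the model, with the expectation approximated from loopy-BP marginals. A minimal sketch; the toy marginals stand in for the output of a hypothetical BP routine.

```python
import numpy as np

def crf_gradient(features, gt_labels, marginals):
    """d logL / d w_k = E_data[f_k] - E_model[f_k] for a log-linear CRF.

    features:  (n_vars, n_cues) cue features, active when a variable is 1
    gt_labels: (n_vars,) binary ground-truth assignment
    marginals: (n_vars,) P(x_i = 1) estimated by loopy belief propagation
    """
    empirical = features.T @ gt_labels   # feature counts under the data
    expected = features.T @ marginals    # feature counts under the model
    return empirical - expected

F = np.array([[0.9, 0.1], [0.2, 0.7], [0.5, 0.5]])
y = np.array([1.0, 0.0, 1.0])
mu = np.array([0.8, 0.3, 0.6])           # stand-in for BP marginals
print(crf_gradient(F, y, mu))            # ascend: w += step * gradient
```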
We consider a conditional random field (CRF) on top of the CDT graph, with a binary random variable Xe for each edge in the CDT, a binary variable Yt for each triangle, and a special node Z which encodes object location.
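A minimal sketch of this variable layout, assuming edge and triangle counts from the CDT stage and a coarse grid of candidate locations for Z (the grid is an assumption; identifier names are illustrative):

```python
import numpy as np

def init_variables(n_edges, n_triangles, grid=(8, 8), seed=0):
    """One binary X_e per CDT edge (contour on/off), one binary Y_t per
    triangle (figure/ground), one discrete Z for object location
    (assumed here to range over a coarse grid of image cells)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, n_edges)
    Y = rng.integers(0, 2, n_triangles)
    Z = rng.integers(0, grid[0] * grid[1])
    return X, Y, Z

X, Y, Z = init_variables(n_edges=120, n_triangles=80)
print(X.sum(), Y.sum(), Z)  # random initial state before inference
```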
Junctions are parameterized by their degree.
Maximum-likelihood weights for the various junction types.
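To make the degree parameterization concrete: at each CDT vertex, a junction potential can look up a learned weight by the number of incident edges switched on. A minimal sketch under that reading; the weight values are placeholders, not the learned ones.

```python
import numpy as np

def junction_score(X, incident_edges, w_deg):
    """Sum of junction weights: for each vertex, count incident edges
    with X_e = 1 and look up the weight for that degree."""
    total = 0.0
    for edges in incident_edges:          # edge indices meeting at one vertex
        deg = int(np.sum(X[edges]))       # number of 'on' contour edges
        total += w_deg[min(deg, len(w_deg) - 1)]
    return total

X = np.array([1, 0, 1, 1, 0])                     # contour labels per edge
incident = [np.array([0, 1]), np.array([1, 2, 3, 4])]
w = np.array([0.0, -1.2, 0.8, -0.5])              # placeholder weights by degree
print(junction_score(X, incident, w))             # w[1] + w[2] = -0.4
```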
Low-level cues: edge energy (L1) and similarity of brightness/texture (L2).
Mid-level cues: contour continuity and junction frequency (M1) and contour/region labeling consistency (M2).
High-level cues: familiar texture (H1), object region support (H2) and object shape (H3).
Benchmark performance ordering: LMH > HL > ML > L.
We use a simple linear combination of low-, mid-
and high-level cues.
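Read literally, the score of a configuration is a weighted sum of the cue features listed above; a minimal sketch of that combination, with schematic feature values and placeholder weights:

```python
import numpy as np

def log_potential(f_low, f_mid, f_high, w_low, w_mid, w_high):
    """Log-potential as a linear combination of low- (L1, L2),
    mid- (M1, M2) and high-level (H1-H3) cue features."""
    return np.dot(w_low, f_low) + np.dot(w_mid, f_mid) + np.dot(w_high, f_high)

f_L = np.array([0.9, 0.4])       # L1 edge energy, L2 brightness/texture similarity
f_M = np.array([0.7, 1.0])       # M1 continuity/junctions, M2 labeling consistency
f_H = np.array([0.2, 0.8, 0.6])  # H1 familiar texture, H2 region support, H3 shape
w_L = np.array([1.5, 0.8])
w_M = np.array([1.1, 2.0])
w_H = np.array([0.5, 1.2, 0.9])
print(log_potential(f_L, f_M, f_H, w_L, w_M, w_H))
```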
Spatial distribution of the shapeme relative to the object center.
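The shape cue behaves like a voting scheme: each detected shapeme carries a learned spatial distribution of the object center relative to its position, so detections can cast votes for Z. A minimal sketch under that reading; the Gaussian vote model and all names are assumptions, not the authors' exact formulation.

```python
import numpy as np

def vote_for_center(detections, offsets, grid_shape, sigma=5.0):
    """Accumulate a score over candidate object centers: each shapeme
    detection votes with a Gaussian centered at (detection position +
    learned mean offset of the object center for that shapeme)."""
    H, W = grid_shape
    yy, xx = np.mgrid[0:H, 0:W]
    score = np.zeros(grid_shape)
    for (y, x), (dy, dx) in zip(detections, offsets):
        cy, cx = y + dy, x + dx  # predicted object center
        score += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return score / score.sum()   # normalized distribution over Z

dets = [(12, 20), (30, 22)]      # shapeme detections (toy coordinates)
offs = [(8, 2), (-10, 0)]        # learned center offsets per shapeme type
p_z = vote_for_center(dets, offs, (48, 48))
print(np.unravel_index(p_z.argmax(), p_z.shape))  # most likely center: (20, 22)
```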