Title: Learning To Segment Images Using Perceptual Edge Features
1 Recognition and Segmentation of Broad Categories of Scene Content
John Kaufhold and Anthony Hoogs
Visualization and Computer Vision Lab
GE Global Research Center
Niskayuna, NY
2 Motivation
3 Motivation
Cumulative scene content map
Per-frame scene content map
4 Problem Statement
- Joint segmentation and classification
- Classify every pixel in the image/video
- Recognition of very broad categories
- Large intra-class variation
- Large inter-class separability sometimes
- Applications
- Content-based retrieval
- Land cover classification
- Intelligent tracking
5 Related Work
- Color and/or texture, mostly in CBIR
- Konishi and Yuille, 00
- Manjunath et al, 02
- Region-based Classification
- Wang et al, 01
- Ren & Malik, 03
- Duygulu, Barnard, de Freitas, Forsyth, Jordan, 01-04
- Manmatha et al, 04
- Sarkar et al, 00
- Detection (2-class case)
- Kumar & Hebert, 03
- Unifying segmentation, detection and/or recognition
- Zhu et al, 00-03
- Ullman et al, 02
- Categorical object recognition
- You!
- Complementary approach: point/patch vs. image partition
- Classes are getting broader
6 Challenges
- Color does not solve the problem
- Helpful, but flaky
- E.g., many things are gray
- Texture does not solve the problem
- Scale-dependent
- Local
- Many things are locally smooth
- Segmentation is ill-posed
- Requires some form of higher-order regularization
- Joint segmentation and classification
- Regions are promising, but difficult to characterize
7 Overview
- Approach
- Region segmentation
- Region-based Perceptual Features
- Classification
- Results
8 Approach
- Estimate Image Structure: Dense Region Segmentation
- Label Structural Elements: Classify Regions
- Refine over Space/Time: Regularize Classifications
Key idea: Exploit perceptual features embedded in the region graph
9 Detailed Approach
- Training: Training Images -> Compute Region Segmentation -> Compute Region-Based Perceptual Features -> Learn Region Classifier -> Trained Classifier
- Testing: Test Images -> Compute Region Segmentation -> Compute Region-Based Perceptual Features -> Classify Regions -> Combine Frame Classifications (video)
- Evaluation: Score Segmentation Algorithm against Ground Truth Segmentations by Humans
10 Region Segmentation
Minimize Eq. 1 (Mumford-Shah)
90 seconds/image
Image
Edges, v, 0 < v < 1
- Typically 10^2 - 10^4 closed regions, depending on textures
- Eq. 1 similar to Mumford-Shah (Shah)
- L1 norms and Ambrosio-Tortorelli-ized
- u is piecewise constant, v is edge field (0 to 1)
- Region topology, polylines around each region
- Minimize Eq. 1 by coordinate descent
- hold regions fixed, estimate edges
- hold edges fixed, estimate regions
- Individual region-region edge segments indexed into regions
- Region estimation is similar to total variation
- Use half-quadratic minimization to solve (Vogel/Oman, Kaufhold)
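The alternation described above (hold regions fixed and estimate edges, then hold edges fixed and estimate regions) can be sketched on a 1-D signal. This is only an illustration under stated assumptions, not the slide's Eq. 1: it uses the standard quadratic Ambrosio-Tortorelli energy E(u, v) = beta * sum (u - g)^2 + alpha * sum (1 - v)^2 (du)^2 + (rho/2) * sum (dv)^2 + sum v^2 / (2 rho) rather than the L1 norms named on the slide, so each half-step is a plain linear solve instead of a half-quadratic iteration. The names `alpha`, `beta`, `rho` and the function name are hypothetical.

```python
import numpy as np

def alternating_minimization(g, alpha=10.0, beta=1.0, rho=0.5, iters=20):
    """Coordinate descent on a 1-D quadratic Ambrosio-Tortorelli energy (assumed form)."""
    g = np.asarray(g, dtype=float)
    n = len(g)
    D = np.diff(np.eye(n), axis=0)        # (n-1, n): (D @ u)[j] = u[j+1] - u[j]
    Dv = np.diff(np.eye(n - 1), axis=0)   # differences between adjacent edge sites
    L = Dv.T @ Dv                         # smoothness operator acting on v
    u = g.copy()
    v = np.zeros(n - 1)                   # edge field, one value between each pixel pair
    for _ in range(iters):
        # Hold regions (u) fixed, estimate edges:
        # (2*alpha*d^2 + 1/rho + rho*L) v = 2*alpha*d^2
        d2 = (D @ u) ** 2
        v = np.linalg.solve(np.diag(2 * alpha * d2 + 1 / rho) + rho * L,
                            2 * alpha * d2)
        # Hold edges (v) fixed, estimate regions:
        # (beta*I + alpha * D^T W D) u = beta*g, with W = diag((1 - v)^2)
        W = np.diag((1 - v) ** 2)
        u = np.linalg.solve(beta * np.eye(n) + alpha * D.T @ W @ D, beta * g)
    return u, v

# Noisy step: two flat "regions" separated by one edge between samples 31 and 32.
rng = np.random.default_rng(0)
g = np.concatenate([np.zeros(32), np.ones(32)]) + 0.05 * rng.standard_normal(64)
u, v = alternating_minimization(g)
```

In this toy setting `v` stays near 0 inside each flat segment and rises toward 1 at the step, which mirrors the slide's description of `u` as piecewise constant and `v` as an edge field between 0 and 1.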
11 Example Image ROI
Image
12 Computed Edge Image
Edges, v, 0 < v < 1
- Closed edges form regions
- Edge intensity indicates local gradient contrast
13 Region Label Image
Yellow dots are region centroids
Every yellow dot has an associated region-based perceptual feature vector, FR(i)
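As a minimal sketch of how the centroids could be obtained, assuming the region label image is an integer NumPy array with one non-negative id per pixel (the function name `region_centroids` is hypothetical, not from the slides):

```python
import numpy as np

def region_centroids(labels):
    """Return {region_id: (row, col)} centroids for an integer label image."""
    rows, cols = np.indices(labels.shape)
    flat = labels.ravel()
    area = np.bincount(flat)                              # pixels per region
    row_sum = np.bincount(flat, weights=rows.ravel())     # sum of row coordinates
    col_sum = np.bincount(flat, weights=cols.ravel())     # sum of column coordinates
    return {int(r): (float(row_sum[r] / area[r]), float(col_sum[r] / area[r]))
            for r in np.nonzero(area)[0]}

labels = np.array([[0, 0, 1],
                   [0, 0, 1]])
centroids = region_centroids(labels)
```

Each centroid can then serve as the index i at which a feature vector FR(i) is stored.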
14 Region Neighborhood
Edge image, v
Region highlighted
Region neighbors highlighted
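One way to recover each region's neighbors is to scan a label image for adjacent pixels with different ids. The sketch below assumes regions touch directly in the label image (no edge pixels left between them) and uses 4-connectivity; both are assumptions, and `region_neighbors` is a hypothetical name.

```python
import numpy as np

def region_neighbors(labels):
    """Return {region_id: set of adjacent region ids} using 4-connectivity."""
    neighbors = {int(r): set() for r in np.unique(labels)}
    # Compare each pixel with its right neighbor, then with the pixel below it.
    pairs = ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :]))
    for a, b in pairs:
        differs = a != b
        for p, q in zip(a[differs].tolist(), b[differs].tolist()):
            neighbors[p].add(q)
            neighbors[q].add(p)
    return neighbors

labels = np.array([[0, 0, 1, 1],
                   [2, 2, 3, 3]])
graph = region_neighbors(labels)
```

Here regions 0 and 3 meet only at a corner, so they are not neighbors under 4-connectivity.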
15 Region-based Perceptual Features
16 Region Features
- Every region's FR(i) contains the features shown
- Every region's FR(i) depends on its neighbor regions, too
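The feature list itself appears only in the slide figure, so the following is a purely hypothetical example of a feature that depends on neighbor regions: the contrast between a region's mean intensity and the average of its neighbors' means. The inputs (`mean_intensity`, `neighbors`) are assumed to be precomputed dictionaries keyed by region id.

```python
def neighbor_contrast(mean_intensity, neighbors):
    """|own mean - average of neighbors' means| per region (0.0 if no neighbors)."""
    feats = {}
    for region, own in mean_intensity.items():
        nbrs = neighbors.get(region, ())
        if nbrs:
            avg = sum(mean_intensity[q] for q in nbrs) / len(nbrs)
            feats[region] = abs(own - avg)
        else:
            feats[region] = 0.0
    return feats

mean_intensity = {0: 10.0, 1: 50.0, 2: 30.0}
neighbors = {0: {1, 2}, 1: {0}, 2: {0}}
features = neighbor_contrast(mean_intensity, neighbors)
```

Changing region 1's intensity changes region 0's feature as well, which is the dependence on neighbor regions noted above.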
17 Advantages of using Region Neighborhood
- Region-based features measure relationships between image elements at arbitrary distances in the image, as determined by image structure
- Not limited to fixed window or propagation methods
- Not a grid
- Boundaries are highly localized
- Can precisely locate texture-texture boundaries (Kaufhold & Hoogs, CVPR 04)
- Better generalization from training
- Perceptual features are more suitable for broad categories (Hoogs et al, PAMI 03)
- Same perceptual features can represent texture as well as large-scale structure
- No need to switch based on texture/region
- The learning system outputs a set of perceptual rules for distinguishing classes
- Not just a collection of support vectors, e.g.
- Rotation and scale invariance (for many features)
18 2D Rotation Invariance
19 2D Rotation Invariance
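To illustrate what rotation invariance means for a region feature, here is a small sketch with a hypothetical shape feature, elongation, defined as the square root of the ratio of the eigenvalues of the region's pixel-coordinate covariance. It is not taken from the slides; it is chosen only because it depends on the region's shape and not on its orientation.

```python
import numpy as np

def elongation(mask):
    """sqrt(largest / smallest eigenvalue) of the pixel-coordinate covariance."""
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.stack([xs, ys]).astype(float))
    evals = np.linalg.eigvalsh(cov)        # ascending order
    return float(np.sqrt(evals[1] / evals[0]))

mask = np.zeros((20, 20), dtype=bool)
mask[5:8, 2:14] = True                      # a 3 x 12 "long and thin" region
e_original = elongation(mask)
e_rotated = elongation(np.rot90(mask))      # same region rotated by 90 degrees
```

The two values agree up to floating-point error, while a grid-aligned feature such as bounding-box width would change under the same rotation.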
20 Limitations
- Must recover regions/edges
- Cannot infer missing contours directly
- Texture features below minimum gradient, or not formed into edges, are only treated statistically on intensity
- Not effective at discriminating between smooth textures
- E.g. sandpaper, fine-grained carpet
- Region instability
- Small gap in boundary will merge adjacent areas
- No direct representation of texture
- Exploring adding filter bank features
- Mitigations
- Over-segment then group later
- Features combining contrast and geometry
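The region instability noted above can be seen in a toy example: when regions are defined as connected components of non-edge pixels, a one-pixel gap in a boundary merges the two areas it separated. The sketch below is an assumption-level illustration using 4-connected flood fill on a tiny character grid (`#` for edge pixels), not the segmentation method from the slides.

```python
from collections import deque

def count_regions(edge_rows):
    """Count 4-connected components of non-edge cells; '#' marks an edge pixel."""
    h, w = len(edge_rows), len(edge_rows[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if edge_rows[r][c] == "#" or seen[r][c]:
                continue
            count += 1                       # found an unvisited region; flood-fill it
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and edge_rows[ny][nx] != "#"):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

closed = ["..#..",
          "..#..",
          "..#.."]
gapped = ["..#..",
          ".....",   # one missing edge pixel
          "..#.."]
n_closed = count_regions(closed)
n_gapped = count_regions(gapped)
```

The closed boundary yields two regions and the gapped one yields a single merged region, which is why over-segmenting first and grouping later is listed as a mitigation.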
21 Classification: AdaBoost.MH (multilabel)
Each h_t is calculated for all labels, l, and ε is the fraction of all errors on the mk decisions
Initialize D_1(i,l) = 1/(mk)
A sample can belong to more than 1 class!
For t = 1 to T
D now spread over k classes
Set of all label estimates for example i
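A compact sketch of discrete AdaBoost.MH follows, to make the roles of D, h_t and the error concrete. The weak learner here is an assumption: a single threshold test on one feature, shared by all labels, with a per-label sign. The slides do not say which weak learner was used, and the function names are hypothetical.

```python
import numpy as np

def fit_adaboost_mh(X, Y, T=10):
    """X: (m, d) features. Y: (m, k) in {-1, +1}; a row may contain several +1s."""
    m, k = Y.shape
    D = np.full((m, k), 1.0 / (m * k))                  # D_1(i, l) = 1/(mk)
    model = []
    for _ in range(T):
        best = None
        for j in range(X.shape[1]):
            for theta in np.unique(X[:, j]):
                b = np.where(X[:, j] > theta, 1.0, -1.0)
                corr = (D * Y * b[:, None]).sum(axis=0)  # one correlation per label
                err = 0.5 * (1.0 - np.abs(corr).sum())   # weighted error over mk decisions
                if best is None or err < best[0] - 1e-12:
                    best = (err, j, theta, np.where(corr >= 0, 1.0, -1.0))
        err, j, theta, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)            # keep alpha finite
        alpha = 0.5 * np.log((1 - err) / err)
        h = np.where(X[:, j] > theta, 1.0, -1.0)[:, None] * s[None, :]  # h_t(x_i, l)
        D = D * np.exp(-alpha * Y * h)                   # up-weight wrong (i, l) pairs
        D /= D.sum()
        model.append((alpha, j, theta, s))
    return model

def label_scores(model, X):
    """Sum of alpha_t * h_t(x, l); the sign gives each label estimate."""
    F = np.zeros((X.shape[0], len(model[0][3])))
    for alpha, j, theta, s in model:
        F += alpha * np.where(X[:, j] > theta, 1.0, -1.0)[:, None] * s[None, :]
    return F

# Toy data: label 0 follows feature 0, label 1 follows feature 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
model = fit_adaboost_mh(X, Y, T=3)
estimates = np.sign(label_scores(model, X))
```

The last sample carries both labels and the first carries neither, which is the multilabel case the slide highlights; the weights D are indexed by (example, label) pairs, so they are spread over all mk decisions.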
22 Detailed Approach
- Training: Training Images -> Compute Region Segmentation -> Compute Region-Based Perceptual Features -> Learn Region Classifier -> Trained Classifier
- Testing: Test Images -> Compute Region Segmentation -> Compute Region-Based Perceptual Features -> Classify Regions -> Combine Frame Classifications (video)
- Evaluation: Score Segmentation Algorithm against Ground Truth Segmentations by Humans
23 Preliminary Results: Aerial Images
- 6 classes
- Sampled from a single, long video (20 minutes, 10 miles)
- No spatial overlap between training and test images
Training images
Manual ground-truth segmentations
24 Training Image
25 Test Image
26 Test Images
27 Test Images
28 Preliminary Results: Broadcast Video
Trained on these and 34 other keyframes from one story
29 Boundary Detection (CVPR 04)
Figure panels: original, human boundaries, initial edges, final boundaries
30
Figure panels: original image, human boundaries, initial edges, final boundaries
- Boundary types
- region-region
- region-texture
- texture-texture
31
Figure panels: human, initial, final
32 Conclusions
- Method for segmentation and classification
- Classify every pixel into some category (could include "other")
- Region-based features to capture Gestalt properties
- Recognizes broad, generic categories
- But not specific categories or objects
- Exploits image structure directly
- Not limited to fixed window or scale
- Learns class geometry as well as appearance
- E.g. road regions tend to be long and thin
- Previously shown to learn boundary detection
- Same region-based features