Title: Using Background Knowledge to Improve Visual Learning
1. Using Background Knowledge to Improve Visual Learning
- Derek Hoiem
- Beckman Directors Seminar
- March 11, 2009
Work with Ali Farhadi, Ian Endres, Gang Wang, Santosh Divvala, James Hays, David Forsyth, Alexei Efros, Martial Hebert
2. What I'd like to make possible with computer vision
Household Robot
Intelligent Vehicle
Security
Photo Organization
3. What we can do (with the right dataset)
- Recognize faces
- Categorize scenes
- Detect, segment, and track objects
- 3D from multiple images or stereo
- Classify actions
4. What we can do
BEACH
Detect and Localize Objects
Categorize Scenes
Face Detection and Recognition
5. But we're a long way from Rosie
- Computer vision has been divided into many task- and dataset-specific problems
- Difficult to coordinate pieces
- Poor generalization to unfamiliar environments
- Massive engineering and data collection effort required for every task/dataset
6. Goal
- Use background knowledge to generalize known solutions to new problems or datasets
7. The Challenge
- How can we use what we know to make learning new things easier and more robust?
8. This Talk
- Three uses of background knowledge
  - Contextual knowledge
  - Compositional knowledge
  - Organizational knowledge
9. I. Contextual Knowledge
- Goal: Use knowledge of objects and spatial layout to better detect a new object.
Work with Santosh Divvala, James Hays, Alexei
Efros, Martial Hebert
10. Object Detection without Context
Search over many positions and scales
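The search over positions and scales can be sketched as a sliding-window enumeration. This is only an illustration, not the detector used in the talk: `sliding_windows` is a hypothetical helper, and the base window size, stride fraction, and scale factor are assumed values.

```python
def sliding_windows(img_w, img_h, base=64, stride_frac=0.5, scale=1.5):
    """Yield (x, y, size) square windows over positions and scales.

    base, stride_frac, and scale are illustrative assumptions,
    not values from the talk.
    """
    size = base
    while size <= min(img_w, img_h):
        stride = max(1, int(size * stride_frac))
        # Slide the window over every position that keeps it inside the image.
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size)
        # Move on to the next, larger scale.
        size = int(size * scale)


windows = list(sliding_windows(256, 128))
```

Each window would then be passed to a classifier that answers the question on the next slide: is this a cat?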
11. Object Detection without Context
In each window: is this a cat?
Cat?
Cat?
Cat?
12. Training a Detector
Classifier
Features
Examples
Color
Edges
Texture
13. Object Detection without Context
In each window: is this a cat?
14. Object Detection without Context
- Top five cat detections in a challenging dataset
Detector: Felzenszwalb et al., CVPR 2008
Dataset: PASCAL VOC 2008
15. What do we know that can help us?
16. What do we know that can help us?
Knowledge of Other Objects and Scenes
Similar Images
Large Set of Loosely Annotated Images
Associated Keywords
Helps tell us how likely the object is to appear
in this image.
Kitten
House
Baby
Puppy
Sand
17. What do we know that can help us?
Knowledge of Spatial Layout
Hoiem et al. 2005, 2007
Surface Layout
Occlusion Boundaries
Depth Estimates
Helps tell us where and how big the object is
likely to be.
18. Context: Likelihood of Presence
Contains Cat
No Cat
19. Context: Likelihood of Presence
Gist
Image
Surface Layout
Likely to contain a cat?
Associated Keywords
House
Kitten
Baby
Puppy
Sand
Gist: Torralba and Oliva 2003
20. Context: Likelihood of Position
- Predict likelihood that object appears at each
position given surface layout and gist
21. Context: Likelihood of Size
- Predict height of object based on depth, surface orientations, gist, and image position
Size from Gist: Torralba and Oliva 2003
22. Rescoring Candidate Objects
Independently Trained Classifiers
Appearance Score (from detector)
Presence Scores
Position Scores
Size Scores
Linear Weights: L1-Regularized Logistic Regression
Bounding Box Score
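The rescoring step can be sketched as follows: each candidate box carries an appearance score plus context scores, and an L1-regularized logistic regression learns linear weights that combine them into one bounding box score. This is a toy sketch under stated assumptions, not the authors' implementation: the trainer is a plain proximal gradient loop, the four-element feature layout is a simplification, and all numbers are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(w, t):
    # Proximal step for the L1 penalty: shrink w toward zero by t.
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logreg(X, y, lam=0.01, lr=0.5, iters=2000):
    """Proximal gradient descent for L1-regularized logistic regression.

    lam, lr, and iters are assumed hyperparameters for this toy example.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gb += err / n
            for j in range(d):
                gw[j] += err * xi[j] / n
        b -= lr * gb  # the bias is not regularized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
    return w, b


def rescore(features, w, b):
    """Combine the per-candidate scores into one bounding box score."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


# Toy candidates: [appearance, presence, position, size] scores (made up).
X = [[0.9, 0.8, 0.7, 0.6], [0.8, 0.9, 0.6, 0.7], [0.7, 0.7, 0.8, 0.8],
     [0.8, 0.1, 0.2, 0.1], [0.3, 0.2, 0.1, 0.3], [0.2, 0.3, 0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_l1_logreg(X, y)

# Same appearance score, different context: context should break the tie.
good = rescore([0.8, 0.9, 0.8, 0.7], w, b)
bad = rescore([0.8, 0.1, 0.1, 0.2], w, b)
```

In this toy data, two candidates with identical appearance scores end up ranked by their context scores, which is the effect the rescoring is meant to produce.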
23. Context improves detection
Top 5 Before Context
Top 5 After Context
24. Context improves detection accuracy
Average Precision (Higher is Better)
25. Context changes the error patterns
- More confusion
  - Cats and Dogs
  - Dogs and Sheep
  - Motorbike and Bicycle
- Less confusion
  - Objects and background
26. II. Compositional Knowledge
- Goal: Describe new objects using attributes learned from other objects.
Work with Ali Farhadi, Ian Endres, David Forsyth
27. A name doesn't tell us much
Known Objects
Name: Cat
Name: Dog
Name: Horse
New Object
Name: Unknown
28. But what if we learn attributes?
Known Objects
Name: Cat
Properties: four legs, tail, eyes, ears, furry, has stripes, gray
Name: Dog
Properties: four legs, eyes, ears, snout, tan, muscular
Name: Horse
Properties: four legs, tail, mane, eyes, ears, snout, tan
New Object
Name: Unknown
29. We can infer what the object is like
Known Objects
Name: Cat
Properties: four legs, tail, eyes, ears, furry, has stripes, gray
Name: Dog
Properties: four legs, eyes, snout, tan, muscular
Name: Horse
Properties: four legs, tail, mane, eyes, ears, snout, tan
New Object
Name: Unknown
Properties: four legs, eyes, ears, snout, stripes, mane
30. Learning Attributes
- Learn to distinguish between things that have an attribute and things that don't
- Train one classifier per attribute
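A minimal sketch of "one classifier per attribute" follows. The talk does not say which classifier is used at this step, so a nearest-centroid rule stands in here purely for illustration; the helper names, the two-dimensional features, and the labels are all made up.

```python
def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_attribute_classifiers(features, attribute_labels):
    """Fit one binary nearest-centroid model per attribute.

    features: list of feature vectors.
    attribute_labels: dict mapping attribute name -> list of 0/1 labels.
    """
    models = {}
    for name, labels in attribute_labels.items():
        pos = [f for f, l in zip(features, labels) if l == 1]
        neg = [f for f, l in zip(features, labels) if l == 0]
        models[name] = (mean(pos), mean(neg))
    return models


def predict_attributes(models, f):
    """Predict each attribute independently for one feature vector."""
    return {name: sq_dist(f, pos) < sq_dist(f, neg)
            for name, (pos, neg) in models.items()}


# Toy 2-D features (made up): [roundness, furriness].
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
attribute_labels = {"has wheels": [1, 1, 0, 0], "furry": [0, 0, 1, 1]}
models = train_attribute_classifiers(features, attribute_labels)
pred = predict_attributes(models, [0.85, 0.15])
```

Because each attribute has its own model, a new object can receive any combination of attributes, including combinations never seen together in training.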
31. Learning Correlated Attributes
- Problem
- Many attributes are strongly correlated through the object category
Most cars are made of metal and have wheels
When we try to learn "has wheels", we may accidentally learn "made of metal"
Has Wheels, Made of Metal?
32. Decorrelating Attributes
- Solution
- Select features that can distinguish between two classes:
- Things that have wheels
- Things that do not, but have other attributes in common
Has Wheels vs. No Wheels
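The selection idea above can be sketched by ranking features on a restricted comparison: only examples that share another attribute (here, "made of metal") are used, so that attribute cannot explain the difference. The slide does not specify the selection method; the difference-of-means ranking, the function name, and the data below are assumptions for illustration.

```python
def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def select_decorrelated_features(features, has_target, has_shared, k=1):
    """Rank features by how well they separate target-positive from
    target-negative examples, using only examples that share another
    attribute (has_shared), so that attribute cannot explain the gap."""
    pos = [f for f, t, s in zip(features, has_target, has_shared) if s and t]
    neg = [f for f, t, s in zip(features, has_target, has_shared) if s and not t]
    gaps = [abs(p - n) for p, n in zip(column_means(pos), column_means(neg))]
    ranked = sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)
    return ranked[:k], gaps


# Toy features (made up): [wheel-like shape response, metallic texture response].
features = [[0.9, 0.9], [0.8, 0.8],   # metal things with wheels
            [0.1, 0.9], [0.2, 0.8],   # metal things without wheels
            [0.1, 0.1], [0.2, 0.2]]   # non-metal things without wheels
has_wheels = [1, 1, 0, 0, 0, 0]
made_of_metal = [1, 1, 1, 1, 0, 0]

selected, gaps = select_decorrelated_features(features, has_wheels, made_of_metal)
```

Within the metal-only subset the metallic feature no longer separates the two groups, so only the wheel-like feature is selected for the "has wheels" classifier.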
33. Learning to Describe Objects
34. Describing New Objects
35. Identifying Unusual Attributes
Absence of Typical Attributes
752 reports, 68% are correct
Presence of Atypical Attributes
951 reports, 47% are correct
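Both kinds of report can be sketched as set differences between the attributes predicted for an object and the attributes typical of its category. The helper name, the attribute lists, and the example object are hypothetical.

```python
def unusual_attributes(predicted, typical):
    """Compare predicted attributes with those typical of the category."""
    return {
        "missing_typical": sorted(typical - predicted),   # absence of typical
        "atypical_present": sorted(predicted - typical),  # presence of atypical
    }


# Made-up example: a bicycle whose wheels are not detected and that looks furry.
typical_bicycle = {"has wheels", "has handlebars", "made of metal"}
predicted = {"has handlebars", "made of metal", "furry"}
report = unusual_attributes(predicted, typical_bicycle)
```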
36. Recognition from Description
- Learn new classes by describing them to the algorithm
- Goat: is furry, four legged, has snout, has horn
- 12-class classification accuracy: 32.5%
- Chance: 8%
- As good as having 8 visual examples with original image features
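Recognition from a description can be sketched as matching predicted attributes against each class's described attributes. The matching rule used here (smallest Hamming distance) is an assumption, as are the attribute vocabulary and the "car" entry; only the goat description comes from the slide. The last line reproduces the chance figure for 12 classes.

```python
ATTRIBUTES = ["furry", "four legged", "has snout", "has horn", "has wheels"]


def to_vector(description):
    """Turn a set of attribute names into a 0/1 vector over ATTRIBUTES."""
    return [1 if a in description else 0 for a in ATTRIBUTES]


def classify_from_description(predicted, descriptions):
    """Return the class whose described attributes best match the
    predicted attribute vector (smallest Hamming distance)."""
    def hamming(name):
        target = to_vector(descriptions[name])
        return sum(p != d for p, d in zip(predicted, target))
    return min(descriptions, key=hamming)


descriptions = {
    "goat": {"furry", "four legged", "has snout", "has horn"},  # from the slide
    "car": {"has wheels"},                                      # made-up entry
}

# Predicted attributes for a new image; the horn is missed (made-up values).
predicted = [1, 1, 1, 0, 0]
label = classify_from_description(predicted, descriptions)

# Chance level for 12 equally likely classes: 1/12, about 8%.
chance = 1 / 12
```

No image of a goat is needed to add the class; only its description is.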
37. III. Organizational Knowledge
- Goal: Help a person organize his photos using image similarity learned from Flickr groups.
Work with Gang Wang, David Forsyth
38. Taming the Digital Explosion
- Photos are easy to take and store.
- But it's still difficult to organize them.
39. Solution: Learn from photo sharing sites
- Billions of images in Flickr
- Hundreds of thousands of categories
40. Learn similarity
- Download hundreds of groups, each containing thousands of photos
- Train a classifier to predict whether a photo is likely to belong in each group
- Gang Wang created a super-fast online training method for kernelized SVMs
- Images are similar if they are likely to belong to the same group
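The last bullet can be sketched as follows: given per-group membership scores for each image (the outputs of the group classifiers), two images are similar when they score highly for the same groups. The sum-of-products similarity, the function names, and all scores are assumptions for illustration, not the method from the talk.

```python
def group_similarity(p_a, p_b):
    """Similarity of two images from their group-membership scores.

    p_a, p_b: dicts mapping group name -> score in [0, 1] from the
    per-group classifiers. The sum of products is an assumed choice.
    """
    return sum(p_a[g] * p_b.get(g, 0.0) for g in p_a)


def retrieve(query, library, top_k=2):
    """Return the names of the top_k library images most similar to query."""
    ranked = sorted(library,
                    key=lambda name: group_similarity(query, library[name]),
                    reverse=True)
    return ranked[:top_k]


# Made-up membership scores for three groups.
query = {"fireworks": 0.9, "weddings": 0.1, "horses": 0.0}
library = {
    "img_a": {"fireworks": 0.8, "weddings": 0.2, "horses": 0.1},
    "img_b": {"fireworks": 0.1, "weddings": 0.9, "horses": 0.0},
    "img_c": {"fireworks": 0.0, "weddings": 0.1, "horses": 0.9},
}
results = retrieve(query, library)
```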
41. We can find similar images
Retrieved Images Using Feature Similarity
Retrieved Images Using Similarity Learned from
Flickr
Query Image
42. We can say how two images are similar
Fireworks (15.6), Christmas (7.6), Rain (4.0), Water drops (2.5), Candles (2.0)
Sports (2.6), Dances (2.0), Weddings (1.0), Toys (0.5), Horses (0.5)
Painting (2.4), Art (1.2), Macro-flowers (0.9), Hands (0.9), Skateboarding (0.6)
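One way to produce such an explanation is to rank groups by how much each contributes to the similarity of an image pair. In this sketch the contribution of a group is the product of the two images' membership scores, which is an assumption; the scores on the slide are on a scale the talk does not specify, and the values below are made up.

```python
def explain_similarity(p_a, p_b, top_k=3):
    """List the groups that contribute most to the similarity of two images."""
    contributions = {g: p_a[g] * p_b[g] for g in p_a if g in p_b}
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Made-up group-membership scores for two images.
a = {"fireworks": 0.9, "christmas": 0.6, "rain": 0.3, "horses": 0.0}
b = {"fireworks": 0.8, "christmas": 0.5, "rain": 0.4, "horses": 0.1}
top = [name for name, _ in explain_similarity(a, b)]
```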
43. Conclusions
- Background knowledge is a key missing component in today's computer vision algorithms
- Existing knowledge can make learning easier
- Provides new abilities (say two things are similar or different)
- More complete visual models (better accuracy, more reasonable mistakes)
- Better able to handle new objects and situations
- We need to start designing systems that accumulate visual knowledge
44. Thank you