Title: Scenes and objects
1. 6.870 Object Recognition and Scene Understanding
http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
- Lecture 6
- Scenes and objects
2. Class business
3.
- Week 2: Objects without scenes
- Week 5: Scenes without objects
- Week 6: Scenes and objects
4. Why is detection hard?
5. Standard approach to scene analysis
6. Is local information enough?
7. With hundreds of categories
If we have 1000 categories (detectors), and each detector produces 1 false alarm (fa) every 10 images, we will have 100 false alarms per image: pretty much garbage.
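The arithmetic behind this slide is a sum of expectations. A minimal sketch using the slide's numbers, assuming (our assumption, not the slide's) that each detector independently fires at most once per image with probability 0.1:

```python
# Expected false alarms when many detectors run on the same image.
num_detectors = 1000          # one detector per category (from the slide)
fa_rate_per_image = 1 / 10    # each detector: 1 false alarm every 10 images

# Linearity of expectation: false alarms simply add up across detectors.
expected_false_alarms = num_detectors * fa_rate_per_image

# Probability that an image has no false alarm at all, under the assumed
# independence of detectors (each fires with probability 0.1 per image).
p_clean_image = (1 - fa_rate_per_image) ** num_detectors

print(f"expected false alarms per image: {expected_false_alarms:.0f}")
print(f"probability of an image with no false alarms: {p_clean_image:.2e}")
```

Under these assumptions, virtually every image carries some spurious detections, which is why the slide calls the raw output garbage.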
8. Is local information even enough?
9. Is local information even enough?
[Figure: information vs. distance, for contextual features and local features]
10. The system does not care about the scene, but we do
We know there is a keyboard present in this scene even if we cannot see it clearly.
11. The multiple personalities of a blob
12. The multiple personalities of a blob
16. Look-Alikes by Joan Steiner
17. Look-Alikes by Joan Steiner
18. Look-Alikes by Joan Steiner
19. Why is context important?
- Changes the interpretation of an object (or its function)
- Context defines what an unexpected event is
20. The influence of an object extends beyond its physical boundaries
21. The context challenge
How far can you go without using an object detector?
22. What are the hidden objects?
[Figure: images labeled 1 and 2]
23. What are the hidden objects?
Chance: 1/30000
25. The importance of context
- Cognitive psychology
  - Palmer 1975
  - Biederman 1981
- Computer vision
  - Noton and Stark (1971)
  - Hanson and Riseman (1978)
  - Barrow and Tenenbaum (1978)
  - Ohta, Kanade, Sakai (1978)
  - Haralick (1983)
  - Strat and Fischler (1991)
  - Bobick and Pinhanez (1995)
  - Campbell et al. (1997)
26. Biederman 1972
- An arrow appeared before or after the picture.
- The object was selected from 4 pictures.
29. Biederman 1972
- Better accuracy with a normal scene and with a pre-cue.
- Coherence of surroundings affected object perception.
- But jumbled pictures had unnatural edge artifacts.
30. Palmer 1975
- The scene preceded the object to identify.
- Better identification when preceded by a semantically consistent scene.
Objects seen for 20, 40, 60 or 120 ms.
31. Palmer
- Scenes shown ahead of time for 2 s.
- More accurate recognition of consistent objects than inconsistent objects.
- Similar-looking objects were misnamed, showing a bias effect.
32. Loftus and Mackworth
- Inconsistent objects were fixated earlier and longer.
- Suggested additional processing of objects out of context.
- Similar results found by Friedman (1979).
33. De Graef et al. 1990
- Were prior results due to the memory task?
- Measured eye movements during a non-object search task.
34. De Graef et al.
- Inconsistent objects were fixated longer than consistent objects.
- The consistency effect only occurred after several fixations (2 s).
- The consistency effect was not initially present in scene processing.
35. Object Detection
- Biederman et al. 1982, relational violations
37. Biederman 1982
- Pictures shown for 150 ms.
- Objects in an appropriate context were detected more accurately than objects in an inappropriate context.
- Scene consistency affects object detection.
38. Objects and Scenes
- Biederman's violations (1981)
39. Support
Golconde, René Magritte
40. Interposition
Blank Check, René Magritte
41. Size
The Listening Room, René Magritte
42. Position, Probability
Personal Values, René Magritte
43. Object Consistencies
Biederman et al. (1982), De Graef (1990).
44. Object Consistencies
Examples of inconsistencies
Biederman et al. (1982), De Graef (1990).
45. Contextual cueing
Chun and Jiang, 1998
46. Object priming
Increasing contextual information
Torralba, Sinha, Oliva, VSS 2001
47. Object priming
Torralba, Sinha, Oliva, VSS 2001
48. Object priming
Car, pedestrian, mailbox, ...?
p(object | scene)
Torralba, Sinha, Oliva, VSS 2001
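The quantity p(object | scene) can be estimated directly from labeled images. A minimal sketch, assuming a hypothetical list of (scene label, objects present) annotations; the data and the helper name `object_given_scene` are invented for illustration:

```python
# Hypothetical annotations: (scene label, set of objects present in the image).
annotations = [
    ("street", {"car", "pedestrian"}),
    ("street", {"car", "mailbox"}),
    ("street", {"car"}),
    ("street", {"pedestrian"}),
    ("office", {"keyboard", "screen"}),
    ("office", {"keyboard", "screen", "mouse"}),
]

def object_given_scene(annotations, obj, scene):
    """Estimate p(obj present | scene) as a relative frequency."""
    in_scene = [objs for s, objs in annotations if s == scene]
    if not in_scene:
        raise ValueError(f"no images of scene {scene!r}")
    return sum(obj in objs for objs in in_scene) / len(in_scene)

print(object_given_scene(annotations, "car", "street"))  # 0.75
print(object_given_scene(annotations, "car", "office"))  # 0.0
```

With realistic data one would smooth these counts (for example, add-one smoothing) so that unseen combinations do not get probability exactly zero.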
49. Object priming
Torralba, Sinha, Oliva, VSS 2001
50. Examples of consistent scenes (a), inconsistent scenes (b), and isolated objects and backgrounds (c) from Davenport and Potter, 2004
51. But do we really need context?
52. Hollingworth and Henderson
- Concerns with object detection studies:
  - The object label could bias results.
  - The location cue was selectively helpful for consistent objects.
- Controlled for false alarm biases with a post-cue and two-alternative forced choice (2AFC).
- Failed to find consistency effects.
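Why false-alarm bias matters can be made concrete with standard signal detection theory (our illustration; the slide names the controls, not this analysis). In a yes/no task, a "yes"-prone observer raises both hits and false alarms without any change in sensitivity d', so raw accuracy can mislead; a 2AFC design largely removes the response criterion from the measure. A minimal sketch under the usual equal-variance Gaussian assumption:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1
z = STD_NORMAL.inv_cdf     # z-transform (inverse standard normal CDF)
phi = STD_NORMAL.cdf

def d_prime(hit_rate, fa_rate):
    """Sensitivity in a yes/no task: d' = z(H) - z(FA)."""
    return z(hit_rate) - z(fa_rate)

def criterion(hit_rate, fa_rate):
    """Response bias c = -(z(H) + z(FA)) / 2; negative c = liberal ('yes'-prone)."""
    return -(z(hit_rate) + z(fa_rate)) / 2

def rates_from(dprime, c):
    """Hit and false-alarm rates produced by a given sensitivity and criterion."""
    return phi(dprime / 2 - c), phi(-dprime / 2 - c)

def two_afc_accuracy(dprime):
    """Expected proportion correct in 2AFC for an unbiased observer."""
    return phi(dprime / 2 ** 0.5)

# Two hypothetical observers: same sensitivity (d' = 2), different bias.
for c in (0.0, -0.8):
    hit, fa = rates_from(2.0, c)
    print(f"c={c:+.1f}: hits={hit:.2f} false alarms={fa:.2f} d'={d_prime(hit, fa):.2f}")
print(f"2AFC accuracy at d'=2: {two_afc_accuracy(2.0):.2f}")
```

The liberal observer scores more hits but also far more false alarms, while d' stays at 2 for both.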
53. Hollingworth and Henderson
- Post-cue
  - 2AFC with object labels
    - Both consistent or both inconsistent.
  - 2AFC with token discrimination
    - E.g., sports car or sedan.
- Proposed functional isolation model.
54. Who needs context anyway? We can recognize objects even out of context
Banksy
55. Getting stuck
56.
- We need some bottom-up signal to go up in order for top-down processing to work
57. Looking outside the bounding box
[Figure (axis: object size). Inside the object (intrinsic features): pixels, parts, global appearance. Outside the object (contextual features): local context, global context.]
Contextual features: Kruppa and Schiele (03), Fink and Perona (03), Carbonetto, de Freitas, Barnard (03), Kumar and Hebert (03), He, Zemel, Carreira-Perpinan (04), Moore, Essa, Monson, Hayes (99), Strat and Fischler (91), Torralba (03), Murphy, Torralba and Freeman (03)
Intrinsic features: Agarwal and Roth (02), Moghaddam and Pentland (97), Turk and Pentland (91), Vidal-Naquet and Ullman (03), Heisele et al. (01), Kremp, Geman, Amit (02), Dorko and Schmid (03), Fergus, Perona, Zisserman (03), Fei-Fei, Fergus, Perona (03), Schneiderman and Kanade (00), Lowe (99), etc.
58. CONDOR system
Strat and Fischler (1991)
- Guzman (SEE), 1968
- Noton and Stark, 1971
- Hanson and Riseman (VISIONS), 1978
- Barrow and Tenenbaum, 1978
- Brooks (ACRONYM), 1979
- Marr, 1982
- Ohta and Kanade, 1978
- Yakimovsky and Feldman, 1973
59. An Age of Scene Understanding
Ohta and Kanade 1978
60. Current approaches
- Scene to object dependencies
- Object to object dependencies
61. Levels of context
- Context in low-level vision
- Part-based models
- Object relations
Fixed graph structures can be useful approximations.
Long-range connections, weak constraints, multimodal
62. Current approaches
- Scene to object dependencies
- Object to object dependencies
63. Many object types co-occur
64. ...but this co-occurrence has a hidden common cause: the scene
[Figure: streets; offices]
It is easier to first recognize the scene and then predict object presence than to run local object classifiers.
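The "hidden common cause" argument can be checked on a toy model. A minimal sketch with made-up probabilities, assuming (our assumption) that car and pedestrian presence are conditionally independent given the scene S. Marginally they still co-occur more often than chance, because both depend on S:

```python
from fractions import Fraction

# Hypothetical toy model: the scene S is a hidden common cause.
p_scene = {"street": Fraction(1, 2), "office": Fraction(1, 2)}
p_car = {"street": Fraction(9, 10), "office": Fraction(1, 20)}   # p(car | S)
p_ped = {"street": Fraction(4, 5), "office": Fraction(1, 10)}    # p(ped | S)

# Marginals: sum over the hidden scene.
p_car_marg = sum(p_scene[s] * p_car[s] for s in p_scene)
p_ped_marg = sum(p_scene[s] * p_ped[s] for s in p_scene)

# Joint: the objects are independent only *within* each scene.
p_both = sum(p_scene[s] * p_car[s] * p_ped[s] for s in p_scene)

print("p(car) * p(ped) =", p_car_marg * p_ped_marg)  # 171/800
print("p(car, ped)     =", p_both)                   # 29/80
```

Once S is known, the objects carry no extra information about each other in this model, which is the sense in which recognizing the scene first "explains" the co-occurrence.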
65. The layered structure of scenes
Assuming a human observer standing on the ground
In a display with multiple targets present, the location of one target constrains the y coordinate of the remaining targets, but not the x coordinate.
66. The layered structure of scenes
Assuming a human observer standing on the ground
[Figure: p(x) and p(x2 | x1)]
In a display with multiple targets present, the location of one target constrains the y coordinate of the remaining targets, but not the x coordinate.
Torralba, Oliva, Castelhano, Henderson. In press.
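A toy simulation of the y-but-not-x claim, assuming a made-up generative model: in each image all targets lie near one horizontal band whose height varies across images (a stand-in for the observer's eye height and horizon), while horizontal position is unconstrained. Knowing one target's y then predicts another's y, but x carries no such information:

```python
import random

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(0)
x1s, x2s, y1s, y2s = [], [], [], []
for _ in range(10_000):
    # Hypothetical model: one shared band height per image, small per-target
    # jitter in y; x is drawn independently for each target.
    band = rng.uniform(0.3, 0.9)
    for xs, ys in ((x1s, y1s), (x2s, y2s)):
        xs.append(rng.uniform(0.0, 1.0))
        ys.append(band + rng.gauss(0.0, 0.03))

print(f"corr(y1, y2) = {correlation(y1s, y2s):.2f}")  # strongly positive
print(f"corr(x1, x2) = {correlation(x1s, x2s):.2f}")  # close to zero
```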
67. Detecting faces without a face detector
Torralba and Sinha, 01; Torralba, 03
68. Context-based vision system for place and object recognition
We use 17 annotated sequences for training
- Hidden states: location (63 values)
- Observations: v_t^G (80 dimensions)
- Transition matrix encodes the topology of the environment
- Observation model is a mixture of Gaussians centered on prototypes (100 views per place)
Torralba, Murphy, Freeman and Rubin. ICCV 2003
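The model above is a hidden Markov model filtered online. A minimal sketch of the forward (filtering) recursion, with everything shrunk for illustration: 3 hypothetical places on a corridor, 1-D observations standing in for the 80-D v_t^G, and 2 prototypes per place instead of 100. The names `PLACES`, `TRANS`, `PROTOTYPES` and their values are ours, not from the paper:

```python
import math

PLACES = [0, 1, 2]  # hypothetical corridor: 0 - 1 - 2
# Transition matrix encoding topology: stay, or move to an adjacent place.
TRANS = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
PROTOTYPES = {0: [0.0, 0.5], 1: [3.0, 3.4], 2: [6.0, 6.5]}  # 1-D "views"
SIGMA = 0.5

def gaussian(v, mean, sigma):
    return math.exp(-0.5 * ((v - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(v, place):
    """Equal-weight mixture of Gaussians centred on the place's prototypes."""
    protos = PROTOTYPES[place]
    return sum(gaussian(v, m, SIGMA) for m in protos) / len(protos)

def forward_filter(observations, prior=None):
    """Return p(place_t | v_1..v_t) for every t (HMM forward algorithm)."""
    belief = list(prior) if prior is not None else [1 / len(PLACES)] * len(PLACES)
    history = []
    for v in observations:
        # Predict: propagate the belief through the topology.
        predicted = [sum(belief[i] * TRANS[i][j] for i in PLACES) for j in PLACES]
        # Update: weight by the observation model, then normalise.
        unnorm = [predicted[j] * likelihood(v, j) for j in PLACES]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        history.append(belief)
    return history

beliefs = forward_filter([0.2, 0.4, 3.1, 3.3, 3.2])
print([round(p, 3) for p in beliefs[-1]])
```

The zero entries in `TRANS` are how topology enters: the filter cannot jump directly between places that are not connected.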
69. Our mobile rig
Torralba, Murphy, Freeman, Rubin. 2003
70. Place recognition demo
Shows the category and the identity of the place when the system is confident. Runs at 4 fps in Matlab.
Input image (120x160)
71. Identification and categorization of known places
[Figure: ground truth and system estimate vs. frame number, for specific location, location category, and indoor/outdoor; labeled places include Building 400 and Outdoor AI-lab]
72. [Diagram: previous place, place recognition, steerable pyramid, object priming, scene features, expected object position]
73. Application of object detection for image retrieval
Results using the keyboard detector alone
74. An integrated model of Scenes, Objects, and Parts
[Figure: model linking scene gist features, the scene S, and the number of cars Ncar; histograms of P(Ncar | S = street) and P(Ncar | S = park) plotted against N]
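One way to read this model (our interpretation, with invented numbers): the gist gives a distribution over scene categories, and each scene gives a prior on how many cars to expect. Marginalising out the scene yields P(Ncar | gist) = sum over s of P(Ncar | s) P(s | gist). The table `P_N_GIVEN_SCENE` is hypothetical:

```python
# Hypothetical P(N = n | scene) for n = 0..5 cars (each row sums to 1).
P_N_GIVEN_SCENE = {
    "street": [0.05, 0.15, 0.25, 0.25, 0.20, 0.10],
    "park":   [0.70, 0.20, 0.06, 0.02, 0.01, 0.01],
}

def n_car_given_gist(p_scene_given_gist):
    """Marginalise the scene: P(N | gist) = sum_s P(N | s) P(s | gist)."""
    n_values = len(next(iter(P_N_GIVEN_SCENE.values())))
    return [
        sum(p_scene_given_gist[s] * P_N_GIVEN_SCENE[s][n] for s in p_scene_given_gist)
        for n in range(n_values)
    ]

def expected(dist):
    """Mean number of cars under a distribution over n = 0, 1, ..."""
    return sum(n * p for n, p in enumerate(dist))

# Suppose a gist classifier says "street" with probability 0.9.
p_n = n_car_given_gist({"street": 0.9, "park": 0.1})
print([round(p, 3) for p in p_n])
print(f"E[N_car] = {expected(p_n):.2f}")
```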
75. Application of object detection for image retrieval
Results using the keyboard detector alone
Results using both the keyboard detector and the global scene features
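A minimal sketch of how the two sources might be combined for ranking, assuming (our assumption) that the detector outputs a log-likelihood ratio, that the scene features give p(keyboard | gist), and that the two are conditionally independent given keyboard presence. The function and image names are hypothetical:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def combined_log_odds(detector_log_lr, p_keyboard_given_gist):
    """Log-odds of 'keyboard present': local detector log-likelihood ratio
    plus the prior log-odds predicted from global scene features
    (a naive-Bayes style conditional independence assumption)."""
    return detector_log_lr + logit(p_keyboard_given_gist)

# Hypothetical images: (name, detector log-likelihood ratio, p(keyboard | gist)).
images = [
    ("office_desk", 1.0, 0.90),
    ("street_scene", 2.0, 0.02),  # strong but spurious local response
    ("living_room", 0.5, 0.30),
]

by_detector = sorted(images, key=lambda im: im[1], reverse=True)
by_combined = sorted(images, key=lambda im: combined_log_odds(im[1], im[2]), reverse=True)
print("detector only:", [im[0] for im in by_detector])
print("with gist:    ", [im[0] for im in by_combined])
```

The spurious street-scene response tops the detector-only ranking but drops to last once the scene prior is added.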
76. Global to local
- Use global context to predict objects, but there is no modeling of spatial relationships between objects.
Keyboards
Murphy, Torralba and Freeman (03)
77. 3D Scene Context
[Figure: image and world]
Hoiem, Efros, Hebert ICCV 2005
78. 3D Scene Context
[Figure: detections labeled Ped, Ped, Car]
Hoiem, Efros, Hebert ICCV 2005
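A standard geometric relation (our addition; it is not stated on the slide) shows how 3D scene layout constrains detections. For a level pinhole camera at height y_c with focal length f, a ground point at depth Z projects f*y_c/Z rows below the horizon, and an object of height H at that depth spans f*H/Z rows. Dividing gives H = y_c * h / (v_b - v_0), where h is the box height in pixels, v_b the row of its base and v_0 the horizon row. A minimal sketch with hypothetical numbers:

```python
def object_world_height(image_height_px, bottom_row_px, horizon_row_px, camera_height_m):
    """Approximate real-world height of an object standing on the ground plane.

    Assumes a level pinhole camera, image rows growing downward, and the
    object's base touching the ground below the horizon:
        H = camera_height * image_height / (bottom_row - horizon_row)
    """
    rows_below_horizon = bottom_row_px - horizon_row_px
    if rows_below_horizon <= 0:
        raise ValueError("object base must lie below the horizon")
    return camera_height_m * image_height_px / rows_below_horizon

# Hypothetical detection: an 85 px tall box whose base is 80 px below the
# horizon, seen from a camera 1.6 m above the ground.
h = object_world_height(85, 180, 100, 1.6)
print(f"implied height: {h:.2f} m")  # about 1.70 m, plausible for a pedestrian
```

The same box with its base only 10 px below the horizon would imply a 13.6 m "pedestrian", which the scene geometry lets a system reject.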
79. 3D City Modeling using Cognitive Loops
N. Cornelis, B. Leibe, K. Cornelis, L. Van Gool. CVPR'06
80. Current approaches
- Scene to object dependencies
- Object to object dependencies
81. Where should I put the silverware?
82. Sampling from the labels
83. Sampling from the labels
Cf. Hoiem et al.; Hays and Efros, SIGGRAPH 2007
84. Contextual object relationships
Carbonetto, de Freitas and Barnard (2004)
Kumar and Hebert (2005)
Torralba, Murphy and Freeman (2004)
E. Sudderth et al. (2005)
Fink and Perona (2003)
85. Object-Object Relationships
- Fink and Perona (NIPS 03)
- Use the output of boosting for other objects at previous iterations as input into boosting for this iteration
86. Pixel labeling using MRFs
- Enforce consistency between neighboring labels, and between labels and pixels
Carbonetto, de Freitas and Barnard, ECCV04
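The two kinds of consistency map onto the two terms of an MRF energy: a unary term tying each label to its pixel, and a pairwise term tying neighboring labels together. A minimal sketch on a hypothetical 1-D strip of pixels with a Potts smoothness penalty, using iterated conditional modes (ICM) for inference; ICM is our choice for the sketch, not necessarily the inference method of the cited work:

```python
# Hypothetical unary costs for labels 0 (sky) and 1 (building) at 5 pixels.
unary = [
    [0.1, 2.0],
    [0.2, 1.5],
    [1.2, 0.9],  # noisy pixel: locally prefers label 1
    [0.3, 1.8],
    [0.1, 2.2],
]
SMOOTHNESS = 1.0  # Potts penalty per pair of differing neighbouring labels

def energy(labels):
    """Unary (label-pixel) cost plus pairwise (label-label) cost."""
    data = sum(unary[i][lab] for i, lab in enumerate(labels))
    pairwise = sum(SMOOTHNESS for a, b in zip(labels, labels[1:]) if a != b)
    return data + pairwise

def icm(labels, sweeps=5):
    """Iterated conditional modes: greedily relabel one pixel at a time."""
    labels = list(labels)
    for _ in range(sweeps):
        changed = False
        for i in range(len(labels)):
            def local_cost(lab):
                cost = unary[i][lab]
                if i > 0 and labels[i - 1] != lab:
                    cost += SMOOTHNESS
                if i + 1 < len(labels) and labels[i + 1] != lab:
                    cost += SMOOTHNESS
                return cost
            best = min((0, 1), key=local_cost)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels

independent = [min((0, 1), key=lambda lab: u[lab]) for u in unary]
smoothed = icm(independent)
print("independent:", independent, "energy", round(energy(independent), 2))
print("ICM:        ", smoothed, "energy", round(energy(smoothed), 2))
```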
87. Beyond nearest-neighbor grids
- Most MRF/CRF models assume a nearest-neighbor graph topology
- This cannot capture long-distance correlations
88. Dynamically structured trees
- Each node picks its parents (Storkey and Williams, PAMI03)
- 2D SCFGs, stochastic context-free grammars (Pollak, Siskind, Harper and Bouman, ICASSP03)
89. Object-Object Relationships
- Use latent variables to induce long-distance correlations between labels in a Conditional Random Field (CRF)
He, Zemel and Carreira-Perpinan (04)
90. Object-Object Relationships
Kumar and Hebert 2005
91. Hierarchical Sharing and Context
E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky.
- Scenes share objects
- Objects share parts
- Parts share features
92. 3D Scene Context
[Figure: image and its geometric labels: support, vertical, sky; vertical subclasses V-Center, V-Left, V-Right, V-Porous, V-Solid]
Hoiem, Efros, Hebert ICCV 2005
93. Detecting difficult objects
Maybe there is a mouse
Office
Start by recognizing the scene
Torralba, Murphy, Freeman. NIPS 2004.
94. Detecting difficult objects
Detect simple objects first (reliable detectors) that provide strong contextual constraints on the target (screen -> keyboard -> mouse)
Torralba, Murphy, Freeman. NIPS 2004.
95. Detecting difficult objects
Detect simple objects first (reliable detectors) that provide strong contextual constraints on the target (screen -> keyboard -> mouse)
Torralba, Murphy, Freeman. NIPS 2004.
96. BRF (boosted random field) for screen/keyboard/mouse
Iteration
97. BRF for screen/keyboard/mouse
Iteration
98. BRF for screen/keyboard/mouse
Iteration
99. BRF for screen/keyboard/mouse
Iteration
100. BRF for screen/keyboard/mouse
Iteration
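How evidence can flow from easy objects to hard ones over iterations can be sketched with a simplified belief update of the form b = sigmoid(F + G), where F is local evidence and G is a contextual term computed from the other objects' current beliefs. This is only in the spirit of a boosted random field: here F and the context weights `W` are hypothetical hand-set numbers rather than functions learned by boosting, and each object has a single belief rather than a spatial map:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical local evidence F (log-odds from each object's own detector):
# the screen is easy to see; the mouse is small and ambiguous.
F = {"screen": 2.5, "keyboard": 0.3, "mouse": -0.4}
# Hypothetical context weights: W[(i, j)] = support that belief in j gives to i.
W = {
    ("keyboard", "screen"): 2.0,
    ("mouse", "keyboard"): 2.0,
    ("mouse", "screen"): 0.5,
}

def run(iterations):
    beliefs = {o: sigmoid(F[o]) for o in F}  # iteration 0: local evidence only
    history = [dict(beliefs)]
    for _ in range(iterations):
        # G_i: contextual term from the current beliefs of other objects,
        # centred so that an uninformative belief (0.5) contributes nothing.
        G = {i: sum(w * (beliefs[j] - 0.5) for (ii, j), w in W.items() if ii == i)
             for i in F}
        beliefs = {o: sigmoid(F[o] + G[o]) for o in F}
        history.append(dict(beliefs))
    return history

for t, b in enumerate(run(3)):
    print(t, {o: round(p, 2) for o, p in b.items()})
```

The screen belief never changes, the keyboard belief rises after one iteration, and the mouse crosses 0.5 only after the keyboard has been boosted, mirroring the screen -> keyboard -> mouse ordering.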
101. BRF for car detection: topology
102. BRF for car detection: results
103. A car out of context is less of a car
[Figure: maps labeled b, F and G for road, car and building; panels: from image, from detectors, thresholded beliefs]
104. Context