Title: Scene-Based Vision Localization
1Scene-Based Vision Localization
2Outline
- Localization as Landmark Detection
- Scenes as Landmarks
- Scene-based localization system 1
- Scene-based localization system 2
- Overall discussion
3Mobile Robot Localization
- Localization as landmark recognition
- Characteristics of good landmarks (and of the recognition process)
- Uniqueness: scalability in identifying large sets of unique landmarks
- Permanency: looking for static and reliable landmarks
- Viewpoint invariance and measurability: need accurate estimation for the observation model
- Fast and efficient computation: real-time decision-making constraint
4Detecting Scenes as Landmarks
- Treating the scene as a whole to obtain global features; no need for brute-force search.
- Obtaining the gist of the scene.
- Bypassing segmentation and grouping steps.
- Not as susceptible to dynamic changes.
- Background as source of information, foreground as source of noise/distraction.
- Ideal with a peripheral (wide-angle) vision system, because the foreground takes up a smaller percentage of the image area.
5Gist
- Definition
- Essence, holistic characteristics of an image
- Context information obtained within an eye saccade (approx. 150 ms)
- Evidence of place-recognizing cells in the Parahippocampal Place Area (PPA)
- No biologically plausible models of gist yet
- Tasks that have been shown to use gist
- Scene categorization/context recognition
- Region priming, layout recognition
6Scene-Based Recognition
- Advantages
- Using more stable features, no need to rely on permanency of objects chosen
- Foreground noise is averaged out.
- Scale and rotational invariance
- Can add coarse layout information
- Disadvantages
- Illumination normalization is a must
- Loss of detailed information for localization resolution may result in loss of feature expressiveness.
- Formulation of observation model
7Example of Approaches
- Color histogram of omni-view image: Ulrich and Nourbakhsh, 2002
- Wavelet transform of grid of sub-regions: Torralba, 2003
- Fourier transform of grid of sub-regions: Oliva and Torralba, 2001
- Histogram of learned prime-textures: Renninger and Malik, 2004
8System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Exploits visual context (a low-dimensional representation of the image): the gist of the image
- Argues that color is less constraining than the textural properties of an image and their spatial layout
- Coarse layout of scenes is included
- Lends itself to a topological map
- Platform is a wearable camera; the user runs through campus to obtain training and testing data.
9System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Representation
- Input image is a low-resolution (160 x 120), blurred, low-contrast image without normalization.
- Wavelet image decomposition: 6 orientations x 4 scales x (4 x 4) grid of sub-regions = 384 features
- Reduced using PCA to 80 features (the 384 features contain a lot of redundancy)
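The representation above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it swaps in a plain Gabor filter bank for the wavelet decomposition, and the function names, wavelengths, and kernel sizes are assumptions chosen only so that the feature count comes out to 6 x 4 x 16 = 384.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel (assumed stand-in for the wavelet filters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(1j * 2.0 * np.pi * xr / wavelength)

def filter_response(image, kernel):
    """Magnitude of the circular convolution of image and kernel (via FFT)."""
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w), dtype=complex)
    padded[:kh, :kw] = kernel
    # Shift the kernel so its center sits at the origin (no spatial offset).
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.abs(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def gist_features(image, n_orient=6, n_scales=4, grid=4):
    """Average filter magnitude in each cell of a grid x grid layout.

    Assumes a grayscale image at least 97 pixels on each side
    (the largest kernel used here is 97 x 97).
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    feats = []
    for s in range(n_scales):
        wavelength = 4.0 * 2 ** s          # assumed scale spacing
        sigma = 0.5 * wavelength
        size = int(6 * sigma) | 1          # force an odd kernel size
        for o in range(n_orient):
            theta = np.pi * o / n_orient
            resp = filter_response(image, gabor_kernel(size, wavelength, theta, sigma))
            for gy in range(grid):
                for gx in range(grid):
                    cell = resp[gy * h // grid:(gy + 1) * h // grid,
                                gx * w // grid:(gx + 1) * w // grid]
                    feats.append(cell.mean())
    return np.array(feats)

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T
```

For a 120 x 160 grayscale array, `gist_features` returns 384 values; `pca_reduce` would then be fitted on the stacked feature vectors of the training images (with `n_components=80` in the setting described on the slide, provided there are at least 80 training images).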
10System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Observation model
- Each place is modeled as a set of K spherical Gaussians over features taken from trial runs.
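A minimal sketch of such an observation model is given below. It assumes equal mixture weights, a single shared variance `sigma`, and Gaussians centered on K stored training feature vectors; these choices and the function name are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def place_log_likelihood(v, prototypes, sigma):
    """log p(v | place) under K equally weighted spherical Gaussians.

    v:          feature vector, shape (d,)
    prototypes: Gaussian centers for this place, shape (K, d)
    sigma:      shared standard deviation (assumed)
    """
    v = np.asarray(v, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    k, d = prototypes.shape
    sq_dist = ((prototypes - v) ** 2).sum(axis=1)
    log_comp = -0.5 * sq_dist / sigma ** 2 - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    # Log-sum-exp over the K components for numerical stability.
    m = log_comp.max()
    return float(m + np.log(np.exp(log_comp - m).sum()) - np.log(k))
```

Evaluating this for every place yields the per-location likelihoods that a recognition framework can combine with a prior over locations.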
11System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Localization/recognition framework
- Hidden Markov Model (HMM) over locations
- A(q', q) is a transition matrix (a map), obtained from trial runs by counting the number of transitions to and from each location.
- The transition matrix is further smoothed with Dirichlet smoothing
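The framework can be illustrated with the standard HMM forward (filtering) update, sketched below under stated assumptions: the smoothing is shown as a simple additive pseudo-count `alpha`, and both function names are hypothetical. The slides do not give the exact smoothing parameters.

```python
import numpy as np

def smoothed_transitions(counts, alpha=1.0):
    """Row-normalized transition matrix from counts, with additive smoothing.

    counts[i, j] is the number of observed transitions from place i to j;
    alpha is an assumed Dirichlet pseudo-count added to every entry.
    """
    a = np.asarray(counts, dtype=float) + alpha
    return a / a.sum(axis=1, keepdims=True)

def forward_step(prior, A, likelihood):
    """One filtering step: predict with A, weight by the likelihood, normalize.

    prior[q']     : P(Q_{t-1} = q' | observations up to t-1)
    A[q', q]      : P(Q_t = q | Q_{t-1} = q')
    likelihood[q] : p(v_t | Q_t = q)
    """
    predicted = np.asarray(prior, dtype=float) @ A
    posterior = np.asarray(likelihood, dtype=float) * predicted
    return posterior / posterior.sum()
```

Running `forward_step` once per frame, feeding each posterior back in as the next prior, gives a running estimate of the location that is smoothed by the map encoded in `A`.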
12System 1 Torralba, Murphy, Freeman, Rubin - 2003
13System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Results and discussion
- Recognizes 63 different locations at >70%
- Recognizes novel places under place categories
- Recognizes indoor vs. outdoor.
14System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Trial run for familiar locations
- Top: the solid line represents the true location, and the dots represent the posterior probability associated with each location, where shading intensity is proportional to probability.
- Middle: estimated category of each location
- Bottom: estimated probability of being indoors or outdoors.
15System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Trial run for unfamiliar locations (t = 1-1500)
- Place recognition system has low confidence everywhere
- Place categorization system is still able to classify offices, corridors and conference rooms.
- After returning to a known environment (after t = 1500), performance returns to normal
16System 1 Torralba, Murphy, Freeman, Rubin - 2003
- HMM improved performance from 50% to 70%.
- Filter-bank features work better than color and monochrome histograms. Note that they may not be normalized.
17System 2 Ulrich and Nourbakhsh - 2002
- Inspired by image retrieval techniques; sees scene features as a reduction of storage.
- Takes advantage of color's invariance to orientation and the diagnosticity of color (Oliva).
- Also lends itself to a topological map
- Platform is a passive robot pulled around the campus.
18System 2 Ulrich and Nourbakhsh - 2002
- Representation
- Panoramic color omni-camera: a simulation of peripheral vision, although the level of distortion is high, which makes edge-based histograms difficult.
19System 2 Ulrich and Nourbakhsh - 2002
- Representation, continued.
- Calculate HSV and RGB/normalized RGB values: 6 channels
- Build 6 one-dimensional histograms
- All histograms are low-pass filtered with an averaging kernel
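The steps above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes the six channels are H, S, V plus normalized R, G, B, and the bin count and kernel width are hypothetical defaults.

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Convert an (..., 3) RGB array with values in [0, 1] to HSV in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx = rgb.max(axis=-1)
    diff = mx - rgb.min(axis=-1)
    safe = np.where(diff == 0, 1.0, diff)
    h = np.where(mx == r, ((g - b) / safe) % 6,
                 np.where(mx == g, (b - r) / safe + 2, (r - g) / safe + 4))
    h = np.where(diff == 0, 0.0, h) / 6.0
    s = np.where(mx == 0, 0.0, diff / np.where(mx == 0, 1.0, mx))
    return np.stack([h, s, mx], axis=-1)

def channel_histograms(rgb, bins=32, kernel_width=3):
    """Six smoothed, normalized 1-D histograms: H, S, V, and normalized R, G, B."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True)
    norm_rgb = rgb / np.where(total == 0, 1.0, total)
    channels = np.concatenate([rgb_to_hsv(rgb), norm_rgb], axis=-1)
    kernel = np.ones(kernel_width) / kernel_width   # averaging (low-pass) kernel
    hists = []
    for c in range(channels.shape[-1]):
        counts, _ = np.histogram(channels[..., c], bins=bins, range=(0.0, 1.0))
        smooth = np.convolve(counts, kernel, mode="same")
        hists.append(smooth / smooth.sum())
    return np.array(hists)
```

Smoothing the histograms makes a bin-by-bin comparison less sensitive to small shifts of color values across bin boundaries.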
20System 2 Ulrich and Nourbakhsh - 2002
- Observation Model
- Use the bin-by-bin Jeffrey divergence as the similarity measure for comparison
- Each of the 6 color bands votes for the location with the minimum distance
- Calculate the confidence of the vote
- Only produce an answer if the vote is unanimous and above the confidence threshold.
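A rough sketch of this voting scheme follows. The Jeffrey divergence is written in its usual form, d(H, K) = sum_i [h_i log(h_i / m_i) + k_i log(k_i / m_i)] with m_i = (h_i + k_i) / 2. The confidence measure is an assumption made for illustration (one minus the ratio of the best to the second-best distance); the slides do not state the formula that was actually used, and the threshold value is likewise hypothetical.

```python
import numpy as np

def jeffrey_divergence(h, k, eps=1e-12):
    """Bin-by-bin Jeffrey divergence between two normalized histograms."""
    h = np.asarray(h, dtype=float)
    k = np.asarray(k, dtype=float)
    m = (h + k) / 2.0
    return float(np.sum(h * np.log((h + eps) / (m + eps))
                        + k * np.log((k + eps) / (m + eps))))

def classify(query, references, threshold=0.1):
    """Return a place name, or None when the bands disagree or are unsure.

    query:      array of shape (n_bands, n_bins)
    references: dict mapping place name -> array of the same shape
                (at least two places are required)
    threshold:  assumed minimum confidence per band
    """
    places = list(references)
    votes, confidences = [], []
    for band in range(len(query)):
        d = np.array([jeffrey_divergence(query[band], references[p][band])
                      for p in places])
        order = np.argsort(d)
        best, second = d[order[0]], d[order[1]]
        votes.append(places[order[0]])
        # Hypothetical confidence: how clearly the best match beats the runner-up.
        confidences.append(1.0 - best / second if second > 0 else 0.0)
    if len(set(votes)) == 1 and min(confidences) >= threshold:
        return votes[0]
    return None
```

Returning `None` corresponds to the system declining to answer rather than reporting a low-confidence location.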
21System 2 Ulrich and Nourbakhsh - 2002
- Localization/recognition framework
- Training run-through to collect images from each location
- User also creates an adjacency map to indicate the topological relationships between locations
- Scene representations of the images are stored to be compared with incoming features
- Computation is sped up by checking only the previous location and its neighbors -> needs an initial location and can't deal with the kidnapped-robot case.
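The neighbor restriction can be sketched as below. The adjacency map is assumed to be a dictionary from place name to a list of neighboring place names, and `distance_to` is a hypothetical callable that compares the current image against the stored references of one place.

```python
def candidate_locations(previous, adjacency):
    """The previous location followed by its neighbors in the adjacency map."""
    neighbors = [n for n in adjacency.get(previous, []) if n != previous]
    return [previous] + neighbors

def localize(previous, adjacency, distance_to):
    """Pick the closest match among the candidates only.

    Places outside the candidate set are never compared, which is why an
    initial location is required and a kidnapped robot cannot be recovered.
    """
    candidates = candidate_locations(previous, adjacency)
    return min(candidates, key=distance_to)
```

With three places where only "hall" and "lab" are adjacent, a robot last seen in "lab" would never be matched against "office", however similar the image.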
22System 2 Ulrich and Nourbakhsh - 2002
23System 2 Ulrich and Nourbakhsh - 2002
- Result and discussion
- Testing at 3 indoor locations and 1 outdoor location produces between 87 and 98 percent correct classifications, with no incorrect classifications at high confidence.
24System 2 Ulrich and Nourbakhsh - 2002
- Result and discussion, continued.
- Needs 250 ms to compare an input image to 100 reference images.
- Quick map-making/labeling process, since the dimensions of locations are not specified
- System is sensitive to illumination, which is part of the problem for outdoor navigation.
25Overall discussion
- Key limitation in scene-based localization: resolution
- Eliminating top-down labeling
26Key Limitation in scene-based localization: resolution
- The features do not provide a fine enough pose estimate, only an indication of a sub-region.
- Recognizing views from the features may stress the limits of the system.
- Include local features/objects for within-place (intra-place) localization
27Key Limitation in scene-based localization: resolution
- System 1 is able to prime detection and location of objects using context that is provided by the same scene features
- Can help in moving from the topological to the metric localization domain
- Still need to work on the pose estimation
- Have sub-maps within each place. Combining them into a global map could be an issue.
28System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Using context as priors to infer attributes of objects in the image
- The second term is the output of the place recognition module, and the first term can be computed using Bayes' rule
29System 1 Torralba, Murphy, Freeman, Rubin - 2003
- The authors then made two assumptions
- Objects are a priori conditionally independent
- Object properties only influence local features (and not, to a significant extent, global features), and thus they are independent of each other
- which allows them to discard local features and focus only on global effects
30System 1 Torralba, Murphy, Freeman, Rubin - 2003
- That is
- where the second term is the output of the place recognition module and the first term is the output of the object system discussed in the following slides
31System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Using context as priors for detecting objects; O_{t,i} is a binary random variable
- where F_i(q) = P(O_{t,i} = 1 | Q_t = q) can be obtained from data. The other term can be approximated using a mixture of spherical Gaussians
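One way to read the combination described above is sketched here: within each place, Bayes' rule turns the prior F_i(q) and the feature likelihoods into a presence probability, and the place posterior then weights these per-place values. The argument names are hypothetical, and the likelihood arrays are assumed to come from a separately fitted model (for example a mixture of spherical Gaussians per place and per presence state).

```python
import numpy as np

def object_presence_posterior(place_posterior, prior_present,
                              lik_present, lik_absent):
    """P(object present | features), marginalized over places.

    place_posterior[q] : P(Q_t = q | observations), from place recognition
    prior_present[q]   : F_i(q) = P(O_{t,i} = 1 | Q_t = q), from data
    lik_present[q]     : p(v_t | O_{t,i} = 1, Q_t = q)  (assumed given)
    lik_absent[q]      : p(v_t | O_{t,i} = 0, Q_t = q)  (assumed given)
    """
    place_posterior = np.asarray(place_posterior, dtype=float)
    prior_present = np.asarray(prior_present, dtype=float)
    num = np.asarray(lik_present, dtype=float) * prior_present
    den = num + np.asarray(lik_absent, dtype=float) * (1.0 - prior_present)
    per_place = num / den                      # Bayes' rule within each place
    return float(np.sum(per_place * place_posterior))
```

When the place posterior is concentrated on one location, the result reduces to that location's own presence probability.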
32System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Results: prediction over time
33System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Results: ROC curve for each object
34System 1 Torralba, Murphy, Freeman, Rubin - 2003
- Using context as priors for locating objects: O_{t,i}, X_{t,i}
- Using an 8x10 bit mask (M_{t,i}) for grid occupancy provides a crude way to represent location and size/shape
- where the first term is from the object detection module
35System 1 Torralba, Murphy, Freeman, Rubin - 2003
- and the second term is 0 if the object is absent; if it is present, it is estimated from the joint model below
- Adopting a product kernel density estimator to model the joint on v_t^G and M_{t,i}
36System 1 Torralba, Murphy, Freeman, Rubin - 2003
- The resulting expected map is a set of weighted prototypes, where the weights are given by how similar the image is to the previous ones with this object and place combination
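The weighted-prototype idea can be sketched as a kernel-weighted average of stored masks. The Gaussian similarity kernel, the bandwidth `sigma`, and the function name are assumptions for illustration; the slides only state that the weights reflect similarity to earlier images with the same object and place.

```python
import numpy as np

def expected_mask(v, proto_features, proto_masks, sigma):
    """Similarity-weighted average of prototype masks.

    v:              global feature vector of the current image, shape (d,)
    proto_features: features of earlier images with this object and place, (K, d)
    proto_masks:    their 8x10 occupancy masks, shape (K, 8, 10)
    sigma:          assumed kernel bandwidth
    """
    v = np.asarray(v, dtype=float)
    proto_features = np.asarray(proto_features, dtype=float)
    proto_masks = np.asarray(proto_masks, dtype=float)
    sq_dist = ((proto_features - v) ** 2).sum(axis=1)
    # Subtract the minimum before exponentiating to avoid underflow of all weights.
    w = np.exp(-0.5 * (sq_dist - sq_dist.min()) / sigma ** 2)
    w /= w.sum()
    return np.tensordot(w, proto_masks, axes=1)
```

Each cell of the returned 8x10 array can be read as the expected occupancy of that grid cell by the object.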
37System 1 Torralba, Murphy, Freeman, Rubin - 2003
38Eliminating top-down labeling
- Training is needed to label locations
- Need bottom-up clustering of places to truly deal with novel locations
- Needs a similarity measure, which depends on the nature of the information
- Recognition of gateways
- Know when to add a new landmark/place
- Need to take out useless (indistinct) images on-line
- Landmark quality measure to decide which frames are featureless or dominated by moving objects.
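As one possible illustration of bottom-up place creation, the sketch below adds a new place whenever the current feature vector is farther than a threshold from every known prototype. The Euclidean distance, the threshold, and the function name are all hypothetical; neither system discussed above works this way.

```python
import numpy as np

def assign_or_add_place(feature, prototypes, threshold):
    """Return the index of the matching place, adding a new one if none is close.

    feature:    feature vector of the current frame
    prototypes: list of stored feature vectors, one per known place (mutated)
    threshold:  assumed maximum distance for a frame to count as a known place
    """
    feature = np.asarray(feature, dtype=float)
    if prototypes:
        distances = [np.linalg.norm(feature - p) for p in prototypes]
        best = int(np.argmin(distances))
        if distances[best] <= threshold:
            return best
    prototypes.append(feature)
    return len(prototypes) - 1
```

A landmark quality measure would sit in front of this step, so that featureless frames or frames dominated by moving objects never become prototypes.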