Scene-Based Vision Localization (PowerPoint presentation)

Provided by: siag

1
Scene-Based Vision Localization

Christian Siagian

2
Outline

Localization as Landmark Detection
Scenes as Landmarks
Scene-based localization system 1
Scene-based localization system 2
Overall discussion

3
Mobile Robot Localization

Localization as landmark recognition
Characteristics of good landmarks (and of the recognition process):
Uniqueness: scalability in identifying large sets of unique landmarks.
Permanency: looking for static and reliable landmarks.
Viewpoint invariance and measurability: accurate estimates are needed for the observation model.
Fast and efficient computation: a constraint of real-time decision making.

4
Detecting Scenes as Landmarks

Treats the scene as a whole to obtain global features; no brute-force search is needed.
Obtains the gist of the scene.
Bypasses the segmentation and grouping steps.
Is less susceptible to dynamic changes.
Uses the background as a source of information and treats the foreground as a source of noise/distraction.
Well suited to a peripheral (wide-angle) vision system, because the foreground occupies a smaller percentage of the image area.

5
Gist

Definition
Essence, holistic characteristics of an image
Context information obtained within an eye saccade (approx. 150 ms)
Evidence of place-recognizing cells in the parahippocampal place area (PPA)
No biologically plausible model of gist exists yet
Tasks that have been shown to use gist:
Scene categorization/context recognition
Region priming, layout recognition

6
Scene-Based Recognition

Advantages
Uses more stable features; no need to rely on the permanency of the chosen objects
Foreground noise is averaged out
Scale and rotational invariance
Can add coarse layout information
Disadvantages
Illumination normalization is a must
Loss of the detailed information needed for localization resolution, which may result in a loss of feature expressiveness
Formulating the observation model

7
Example Approaches

Color histogram of an omni-view image (Ulrich and Nourbakhsh, 2002)
Wavelet transform of a grid of sub-regions (Torralba, 2003)
Fourier transform of a grid of sub-regions (Oliva and Torralba, 2001)
Histogram of learned prime textures (Renninger and Malik, 2004)

8
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Exploits visual context, a low-dimensional representation of the image (the gist of the image)
Argues that color is less constraining than the textural properties of an image and their spatial layout
The coarse layout of scenes is included
Lends itself to a topological map
The platform is a wearable camera; the user makes runs through campus to collect training and testing data.

9
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Representation
The input image is a low-resolution (160 x 120), blurred, low-contrast image without normalization.
Wavelet image decomposition: 6 orientations x 4 scales x (4 x 4) grid of sub-regions = 384 features
Reduced to 80 features using PCA, since there is a lot of redundancy
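A minimal sketch of this 384-dimensional representation and its PCA reduction, in Python with NumPy. The frequency-domain Gabor-like filters, their centre frequencies and bandwidths, and the names orientation_filters, gist_vector and pca_reduce are illustrative assumptions standing in for the wavelet decomposition on the slide; random arrays stand in for camera frames.

```python
import numpy as np


def orientation_filters(h, w, n_orient=6, n_scale=4):
    """Gabor-like filters built in the frequency domain (hypothetical stand-in
    for the wavelet decomposition described on the slide)."""
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    filters = []
    for s in range(n_scale):
        f0 = 0.25 / 2 ** s                   # centre frequency halves per scale
        radial = np.exp(-((radius - f0) ** 2) / (2 * (0.5 * f0) ** 2))
        radial = radial * (radius > 0)       # ignore the DC component (mean brightness)
        for o in range(n_orient):
            theta = np.pi * o / n_orient
            d = np.angle(np.exp(1j * (angle - theta)))   # wrapped angle difference
            angular = np.exp(-(d ** 2) / (2 * (np.pi / n_orient) ** 2))
            filters.append(radial * angular)
    return filters


def gist_vector(img, grid=4, n_orient=6, n_scale=4):
    """6 orientations x 4 scales x (4 x 4) grid cells = 384 averaged energies."""
    spectrum = np.fft.fft2(img)
    feats = []
    for g in orientation_filters(*img.shape, n_orient, n_scale):
        energy = np.abs(np.fft.ifft2(spectrum * g))      # filter-response magnitude
        for band in np.array_split(energy, grid, axis=0):
            for cell in np.array_split(band, grid, axis=1):
                feats.append(cell.mean())                # average over the grid cell
    return np.array(feats)


def pca_reduce(X, k=80):
    """Project the rows of X onto their top-k principal components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ vt[:k].T


rng = np.random.default_rng(0)
images = rng.random((100, 120, 160))     # stand-in for 160 x 120 frames
X = np.stack([gist_vector(im) for im in images])
Z = pca_reduce(X, k=80)
print(X.shape, Z.shape)                  # (100, 384) (100, 80)
```

Averaging each filter's energy over a 4 x 4 grid is what keeps coarse layout information while discarding fine detail.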

10
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Observation model
Each place is modeled as a mixture of K spherical Gaussians over the features taken from trial runs.
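To make the observation model concrete, here is a sketch of evaluating log p(v | Q = q) when place q is a mixture of K spherical Gaussians (covariance sigma_k^2 I). The function names and the toy "office"/"corridor" parameters are hypothetical; in the described system the weights, means and variances would be fitted to features from the trial runs (for example with EM, which is an assumption here).

```python
import numpy as np


def place_log_likelihood(v, weights, means, variances):
    """log p(v | Q = q) for one place modelled as K spherical Gaussians.

    v: (d,) feature vector; weights: (K,); means: (K, d); variances: (K,)
    (one scalar variance per component, i.e. covariance sigma_k^2 * I).
    """
    d = v.shape[0]
    sq_dist = ((means - v) ** 2).sum(axis=1)                 # (K,)
    log_comp = (np.log(weights)
                - 0.5 * d * np.log(2 * np.pi * variances)
                - 0.5 * sq_dist / variances)
    top = log_comp.max()
    return top + np.log(np.exp(log_comp - top).sum())        # log-sum-exp over components


def place_likelihoods(v, places):
    """Evaluate every place model; places maps a place id to (weights, means, variances)."""
    return {q: place_log_likelihood(v, *params) for q, params in places.items()}


places = {
    "office": (np.array([1.0]), np.zeros((1, 2)), np.array([1.0])),
    "corridor": (np.array([1.0]), np.full((1, 2), 3.0), np.array([1.0])),
}
scores = place_likelihoods(np.zeros(2), places)
print(max(scores, key=scores.get))   # office
```

Working in log space with log-sum-exp avoids underflow once the feature vector has tens of dimensions.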

11
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Localization/recognition framework
Hidden Markov Model (HMM)
where A(q', q) is a transition matrix (a map), obtained from the trial runs by counting the number of transitions to and from each location.
The transition matrix is further smoothed with Dirichlet smoothing.
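The two ingredients of this framework can be sketched as follows: counting transitions between labeled places in a training run, smoothing the counts with a symmetric Dirichlet prior (adding a pseudo-count alpha to every cell; the prior strength is an assumed value), and one step of the standard HMM forward recursion, P(Q_t = q | v_1..v_t) proportional to p(v_t | Q_t = q) sum_q' A(q', q) P(Q_{t-1} = q' | v_1..v_{t-1}). Function names and the toy place sequence are hypothetical.

```python
import numpy as np


def transition_matrix(place_sequence, n_places, alpha=1.0):
    """Row-stochastic A[q_prev, q] from counted transitions in a training run.

    Adding the pseudo-count alpha to every cell gives the posterior mean under a
    symmetric Dirichlet(alpha) prior on each row (alpha is an assumed value).
    """
    counts = np.zeros((n_places, n_places))
    for prev, cur in zip(place_sequence[:-1], place_sequence[1:]):
        counts[prev, cur] += 1
    counts += alpha
    return counts / counts.sum(axis=1, keepdims=True)


def hmm_filter_step(belief, A, likelihood):
    """One forward step of the HMM place filter.

    belief: P(Q_{t-1} = q' | v_1..v_{t-1}); likelihood: p(v_t | Q_t = q) per place.
    Returns P(Q_t = q | v_1..v_t), proportional to likelihood * (belief @ A).
    """
    predicted = belief @ A          # sum over q' of belief[q'] * A[q', q]
    posterior = predicted * likelihood
    return posterior / posterior.sum()


A = transition_matrix([0, 0, 1, 1, 2], n_places=3)
print(A[0])                                           # [0.4 0.4 0.2]
belief = hmm_filter_step(np.array([1.0, 0.0, 0.0]), A, np.array([0.1, 0.8, 0.1]))
print(int(np.argmax(belief)))                         # 1
```

The smoothing matters because a raw count matrix assigns zero probability to any transition never seen in training, which would make the filter unable to recover from such a transition.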

12
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Overall model

13
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Results and discussion
Recognizes 63 different locations with a recognition rate above 70%
Recognizes novel places by place category
Recognizes indoor vs. outdoor scenes

14
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Trial run for familiar locations
Top. The solid line represents the true location,
and the dots represent the posterior probability
associated with each location where shading
intensity is proportional to probability.
Middle. Estimated category of each location
Bottom. Estimated probability of being indoors or
outdoors.

15
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Trial run for unfamiliar locations (t = 1 to 1500)
The place recognition system has low confidence everywhere.
The place categorization system is still able to classify offices, corridors and conference rooms.
After returning to a known environment (after t = 1500), performance returns to normal.

16
System 1 Torralba, Murphy, Freeman, Rubin - 2003

The HMM improved performance from 50% to 70%.
Filter-bank features work better than color and monochrome histograms. Note that the histograms may not have been normalized.

17
System 2 Ulrich and Nourbakhsh - 2002

Inspired by image-retrieval techniques; treats scene features as a way to reduce storage.
Takes advantage of color's invariance to orientation and of the diagnosticity of color (Oliva).
Also lends itself to a topological map.
The platform is a passive robot pulled around the campus.

18
System 2 Ulrich and Nourbakhsh - 2002

Representation
Panoramic color omni-camera, simulating peripheral vision. However, the level of distortion is high, which makes edge-based histograms difficult to use.

19
System 2 Ulrich and Nourbakhsh - 2002

Representation, continued.
Calculate HSV and RGB/normalized RGB values (6 channels)
Build 6 one-dimensional histograms
All histograms are low-pass filtered with an averaging kernel
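A sketch of this representation with NumPy, assuming the six channels are H, S, V plus normalized r, g, b (the slide's "RGB/normalized RGB" leaves the exact channel set open), 32 bins per histogram and a 5-bin averaging kernel. These choices, and the function names, are illustrative assumptions; a random array stands in for an unwrapped panorama.

```python
import numpy as np


def rgb_to_hsv(rgb):
    """Vectorized RGB -> HSV for float images in [0, 1]; hue is scaled to [0, 1)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = mx - mn
    safe_delta = np.where(delta > 0, delta, 1.0)             # avoid division by zero
    hue = np.where(mx == r, ((g - b) / safe_delta) % 6,
                   np.where(mx == g, (b - r) / safe_delta + 2,
                            (r - g) / safe_delta + 4)) / 6.0
    hue = np.where(delta > 0, hue, 0.0)                      # gray pixels get hue 0
    sat = np.where(mx > 0, delta / np.where(mx > 0, mx, 1.0), 0.0)
    return np.stack([hue, sat, mx], axis=-1)


def six_histograms(rgb, bins=32, kernel=5):
    """Six smoothed 1-D histograms: H, S, V and normalized r, g, b (assumed channels)."""
    total = rgb.sum(axis=-1, keepdims=True)
    norm_rgb = rgb / np.where(total > 0, total, 1.0)         # chromaticity channels
    channels = np.concatenate([rgb_to_hsv(rgb), norm_rgb], axis=-1)
    box = np.ones(kernel) / kernel                           # averaging (low-pass) kernel
    hists = []
    for c in range(6):
        h, _ = np.histogram(channels[..., c], bins=bins, range=(0.0, 1.0))
        h = np.convolve(h.astype(float), box, mode="same")   # smooth neighboring bins
        hists.append(h / h.sum())                            # renormalize to sum 1
    return np.stack(hists)


rng = np.random.default_rng(1)
H = six_histograms(rng.random((60, 240, 3)))   # stand-in for an unwrapped panorama
print(H.shape)                                 # (6, 32)
```

Renormalizing after smoothing keeps each histogram a probability distribution, which a bin-by-bin divergence comparison assumes.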

20
System 2 Ulrich and Nourbakhsh - 2002

Observation Model
Uses the bin-by-bin Jeffrey divergence as the similarity measure for comparison
Each of the 6 color bands votes for the location with the minimum distance
The confidence of each vote is calculated
An answer is produced only if the vote is unanimous and its confidence is above a threshold.
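A sketch of the comparison and voting step. The Jeffrey divergence follows its standard definition, d_J(H, K) = sum_i [h_i log(h_i / m_i) + k_i log(k_i / m_i)] with m_i = (h_i + k_i) / 2. The confidence measure (a margin between the best match and the best match at a different location), the threshold value and the function names are hypothetical stand-ins, since the slide does not specify how confidence is computed.

```python
import numpy as np


def jeffrey_divergence(h, k, eps=1e-12):
    """Bin-by-bin Jeffrey divergence between two normalized histograms."""
    m = (h + k) / 2.0
    return float(np.sum(h * np.log((h + eps) / (m + eps))
                        + k * np.log((k + eps) / (m + eps))))


def vote(query, references, labels, min_confidence=0.5):
    """Unanimous six-band vote; returns a location name or None.

    query: (6, B) histograms; references: (N, 6, B); labels: N location names.
    Confidence is a hypothetical margin 1 - d_best / d_other, where d_other is the
    smallest distance to a reference from a *different* location.
    """
    winners, confidences = [], []
    for band in range(query.shape[0]):
        dists = np.array([jeffrey_divergence(query[band], ref[band]) for ref in references])
        best = int(np.argmin(dists))                 # this band's vote
        others = [d for d, lab in zip(dists, labels) if lab != labels[best]]
        if not others:
            conf = 1.0                               # only one location is known
        elif min(others) == 0.0:
            conf = 0.0                               # a different location matches exactly
        else:
            conf = 1.0 - dists[best] / min(others)
        winners.append(labels[best])
        confidences.append(conf)
    if len(set(winners)) == 1 and min(confidences) >= min_confidence:
        return winners[0]
    return None


a = np.full((6, 4), 0.25)                          # flat histograms at "hall"
b = np.tile([0.7, 0.1, 0.1, 0.1], (6, 1))          # peaked histograms at "lab"
refs, labels = np.stack([a, b]), ["hall", "lab"]
print(vote(a, refs, labels))                       # hall
print(vote((a + b) / 2, refs, labels))             # None
```

The second query sits halfway between the two locations, so every band's margin is small and the system declines to answer rather than risk a confident misclassification.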

21
System 2 Ulrich and Nourbakhsh - 2002

Localization/recognition framework
A training run collects images from each location.
The user also creates an adjacency map to indicate the topological relationships between locations.
The scene representations of the images are stored to be compared with incoming features.
Computation is sped up by checking only the previous location and its neighbors; this requires a known initial location and cannot deal with the kidnapped-robot case.
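A sketch of the neighbor-restricted search, assuming the adjacency map is stored as a dictionary from each location to its neighbors and that `distance` is any histogram comparison (for example a sum of per-band divergences). All names are hypothetical, and the toy example uses 1-D features only to show the restriction.

```python
import numpy as np


def candidate_references(adjacency, previous, labels):
    """Indices of reference images taken at `previous` or at one of its neighbors."""
    allowed = {previous, *adjacency.get(previous, ())}
    return [i for i, lab in enumerate(labels) if lab in allowed]


def localize(query, references, labels, adjacency, previous, distance):
    """Nearest reference among the restricted candidates; requires a known `previous`."""
    idx = candidate_references(adjacency, previous, labels)
    best = min(idx, key=lambda i: distance(query, references[i]))
    return labels[best]


adjacency = {"lobby": ["hall"], "hall": ["lobby", "lab"], "lab": ["hall"]}
labels = ["lobby", "hall", "lab"]
refs = np.array([[0.0], [1.0], [2.0]])            # toy 1-D "features", one per reference


def l1(x, y):
    return float(np.abs(x - y).sum())


print(localize(np.array([1.9]), refs, labels, adjacency, "lobby", l1))   # hall
```

In the toy run the query is closest to "lab", but starting from "lobby" only "lobby" and "hall" are considered, so the answer is "hall". The same restriction that makes the search fast is why the method cannot recover from a kidnapped-robot situation.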

22
System 2 Ulrich and Nourbakhsh - 2002

Example adjacency map

23
System 2 Ulrich and Nourbakhsh - 2002

Results and discussion
Testing at 3 indoor and 1 outdoor locations produced between 87 and 98 percent correct classification, with no incorrect classifications made with high confidence.

24
System 2 Ulrich and Nourbakhsh - 2002

Results and discussion, continued.
Needs 250 ms to compare an input image to 100 reference images.
The map-making/labeling process is quick, since the dimensions of locations are not specified.
The system is sensitive to illumination, which is part of the problem for outdoor navigation.

25
Overall discussion

Key Limitation in scene-based localization: resolution
Eliminating top-down labeling

26
Key Limitation in scene-based localization: resolution

The features do not provide fine enough pose estimation, only an indication of a sub-region.
Recognizing views from the features may stress the limits of the system.
Include local features/objects for within-place localization, in addition to inter-place localization.

27
Key Limitation in scene-based localization: resolution

System 1 is able to prime the detection and location of objects using context provided by the same scene features.
This can help in moving from the topological to the metric localization domain.
Pose estimation still needs work.
Have sub-maps within each place; combining them into a global map could be an issue.

28
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Using context as priors to infer attributes of objects in the image
The second term is the output of the place recognition module, and the first term can be computed using Bayes' rule.
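One way to write the decomposition described on this slide, assuming the object attributes O_t are obtained by marginalizing over places Q_t given the global features v^G, and assuming the object term depends only on the current frame's features v^G_t (a reconstruction, not necessarily the authors' exact notation):

```latex
P(O_t \mid v^G_{1:t}) = \sum_{q} P(O_t \mid Q_t = q,\, v^G_t)\; P(Q_t = q \mid v^G_{1:t}),
\qquad
P(O_t \mid Q_t = q,\, v^G_t) = \frac{p(v^G_t \mid O_t,\, Q_t = q)\; P(O_t \mid Q_t = q)}{p(v^G_t \mid Q_t = q)}
```

The second factor of the sum is the place posterior produced by the HMM; the first factor is expanded with Bayes' rule, where the denominator is the sum of the numerator over the possible values of O_t.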

29
System 1 Torralba, Murphy, Freeman, Rubin - 2003

The authors then make two assumptions:
Objects are a priori conditionally independent.
Object properties influence only local features (and not, to a significant extent, global features), and thus they are independent of each other.
These assumptions allow them to discard local features and focus only on global effects.

30
System 1 Torralba, Murphy, Freeman, Rubin - 2003

That is,
where the second term is the output of the place recognition module and the first term is the object system discussed in the following slides.

31
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Using context as priors for detecting objects
$O_{t,i}$ is a binary random variable,
where $F_i(q) = P(O_{t,i} = 1 \mid Q_t = q)$ can be obtained from data. The other term can be approximated using a mixture of spherical Gaussians.
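A sketch of this detection step for a single object i: Bayes' rule gives P(O_{t,i} = 1 | Q_t = q, v_t) from F_i(q) and the two class-conditional feature likelihoods, and the result is averaged over the place posterior. The numbers (two places, their F_i values and likelihoods) are made up for illustration, and the function names are hypothetical; in practice the likelihoods would come from the mixtures of spherical Gaussians.

```python
import numpy as np


def presence_given_place(F, lik_present, lik_absent):
    """P(O_{t,i} = 1 | Q_t = q, v_t) by Bayes' rule, vectorized over places q.

    F[q] = P(O_{t,i} = 1 | Q_t = q); lik_present[q] = p(v_t | O_{t,i} = 1, Q_t = q);
    lik_absent[q] = p(v_t | O_{t,i} = 0, Q_t = q).
    """
    num = F * lik_present
    return num / (num + (1.0 - F) * lik_absent)


def presence_probability(place_posterior, F, lik_present, lik_absent):
    """Marginalize over places: sum_q P(O = 1 | q, v_t) * P(Q_t = q | v_1..v_t)."""
    per_place = presence_given_place(F, lik_present, lik_absent)
    return float(np.dot(per_place, place_posterior))


place_posterior = np.array([0.7, 0.3])     # hypothetical HMM output for two places
F = np.array([0.9, 0.2])                   # hypothetical object frequencies per place
lik_present = np.array([2.0, 1.0])         # hypothetical feature likelihoods
lik_absent = np.array([1.0, 1.0])
p = presence_probability(place_posterior, F, lik_present, lik_absent)
print(round(p, 3))                         # 0.723
```

Because the place posterior weights each place's answer, an object that is common in the most likely place gets a high prior even before any local detector runs.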

32
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Results: prediction over time

33
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Results: ROC curve for each object

34
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Using context as priors for locating objects ($O_{t,i}$, $X_{t,i}$)
An 8 x 10 bit mask ($M_{t,i}$) of grid occupancy provides a crude way to represent location and size/shape,
where the first term comes from the object detection module.

35
System 1 Torralba, Murphy, Freeman, Rubin - 2003

and the second term is 0 if the object is absent. If it is present, the term is modeled by adopting a product kernel density estimator for the joint density of $v^G_t$ and $M_{t,i}$.

36
System 1 Torralba, Murphy, Freeman, Rubin - 2003

The resulting expected map is a set of weighted prototypes, where the weights are given by how similar the image is to previous images with this object and place combination.
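A sketch of the expected-map computation. With a product kernel p(v, M) proportional to sum_j K_v(v - v_j) K_M(M - M_j) and a symmetric kernel K_M centered on each stored mask, the conditional mean of M given v is the kernel-weighted average of the stored masks M_j. The Gaussian kernel, its bandwidth, the 8-rows-by-10-columns orientation of the mask and the function name are assumptions.

```python
import numpy as np


def expected_mask(v, train_features, train_masks, bandwidth=1.0):
    """Kernel-weighted average of stored 8 x 10 occupancy masks.

    train_features: (N, d) global features of training frames in which the object
    was present at this place; train_masks: (N, 8, 10) binary masks M_{t,i}.
    A Gaussian kernel with a hypothetical bandwidth weights each prototype by how
    similar its global features are to the current frame's.
    """
    sq = ((train_features - v) ** 2).sum(axis=1)
    logw = -0.5 * sq / bandwidth ** 2
    w = np.exp(logw - logw.max())          # subtract the max for numerical stability
    w /= w.sum()
    return np.tensordot(w, train_masks.astype(float), axes=1)   # (8, 10), values in [0, 1]


masks = np.zeros((2, 8, 10))
masks[0, :4, :5] = 1          # object in the top-left quadrant
masks[1, 4:, 5:] = 1          # object in the bottom-right quadrant
feats = np.array([[0.0, 0.0], [5.0, 5.0]])
E = expected_mask(np.array([0.1, 0.0]), feats, masks)
print(E[0, 0] > 0.99, E[7, 9] < 0.01)   # True True
```

Each cell of the result can be read as the probability that the object occupies that grid cell, which is the crude location and size/shape prior described on the slides.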

37
System 1 Torralba, Murphy, Freeman, Rubin - 2003

Preliminary results

38
Eliminating top-down labeling

Training is needed to label locations.
Bottom-up clustering of places is needed to truly deal with novel locations.
This needs a similarity measure that depends on the nature of the information.
Recognition of gateways
Knowing when to add a new landmark/place
Useless (indistinct) images need to be removed on-line.
A landmark quality measure is needed to decide which frames are featureless or dominated by moving objects.