Computer Science Readings: Reinforcement Learning

Provided by: Coz5

Learn more at: http://www.sci.brooklyn.cuny.edu

1
Computer Science Readings: Reinforcement Learning

Presentation by
Arif OZGELEN

2
How do we perform visual search?

We look in the usual places where the item is
likely to be.
If the item is small, we tend to move closer to
the area being searched to improve our ability to
detect it.
We look for properties of the target object that
distinguish it from the search space, e.g. color,
shape, size.

3
A Reinforcement Learning Model of Selective
Visual Attention (ACM, 2001)

Silviu Minut, Autonomous Agents Lab, Department
of Computer Science, Michigan State University.
Sridhar Mahadevan, Autonomous Agents Lab,
Department of Computer Science, Michigan State
University.

4
The Problem of Visual Search

Goal: To find small objects in a large, usually
cluttered environment,
e.g. a pen on a desk.
Wide field-of-view images are preferable.
Identifying small objects requires
high-resolution images.
Together, these requirements result in a very
high-dimensional input array.

5
Nature's Method: Foveated Vision - I

Fovea: Anatomically defined as the central region
of the retina, with a high density of receptive
cells.
The density of receptive cells decreases
exponentially from the fovea towards the periphery.

6
Nature's Method: Foveated Vision - II

Saccades: To make up for the loss of information
caused by the lower resolution in the periphery,
the eyes are re-oriented by rapid ballistic
motions (up to 900°/s) called saccades.
Fixations: Periods between saccades during which
the eyes remain relatively fixed, to process
visual information and to select the next
fixation point.

7
Foveated Vision: Eye Scan Patterns
8
Using Foveated Vision

Foveal image processing reduces the dimension of
the input data but in turn creates a sequential
decision problem: where to look next.
Choosing the next fixation point requires an
efficient gaze control mechanism that directs the
gaze to the most salient object.

9
Gaze Control - Salient Features

To solve the problem of gaze control, the next
fixation point must be chosen from low-resolution
image regions that do not fall on the fovea.
Saliency Map Theory (Koch and Ullman):
a task-independent, bottom-up model of visual
attention.
Itti and Koch: Building on Saliency Map Theory,
three types of feature maps (color map, edge map,
intensity map) are fused to form a saliency map.
Low-resolution images alone are usually not
sufficient for this decision problem.
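
As a rough illustration of the fusion step, the sketch below rescales three feature maps to a common range and averages them into a single saliency map. The function names, the min-max rescaling, and the equal weighting are assumptions made for this example; they are not taken from the slides, and the Itti and Koch model uses a more elaborate normalization.

```python
def normalize(feature_map):
    """Rescale a 2-D list of numbers to the range [0, 1]."""
    flat = [v for row in feature_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # a constant map carries no saliency information
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def fuse_saliency(color_map, edge_map, intensity_map):
    """Average the rescaled feature maps (equal weights are an assumption)."""
    maps = [normalize(m) for m in (color_map, edge_map, intensity_map)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
            for r in range(rows)]


def most_salient_point(saliency):
    """Return (row, col) of the largest saliency value."""
    value, row, col = max((v, r, c)
                          for r, line in enumerate(saliency)
                          for c, v in enumerate(line))
    return row, col
```

The point returned by `most_salient_point` would be the bottom-up candidate for the next fixation.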

10
Gaze Control - Control Mechanism Implementation

A high-level mechanism is required to control
low-level reactive attention.
Tsotsos' model proposes selective tuning of
visual processing via a hierarchical
winner-take-all process.
Information should be integrated from one
fixation to the next for a global understanding
of the scene.
Proposed model: combine top-down gaze control
with bottom-up reactive saliency map processing,
based on RL.

11
Problem Definition and General Approach - I

Given an object and an environment:
How do we build a vision agent that learns where
the object is likely to be found?
How does it direct its gaze to the object?
A set of landmarks L0, L1, ..., Ln represents
regions in the environment. A policy on this set
directs the camera to the region most likely to
contain the target object.

12
Problem Definition and General Approach - II

The approach does not require high-level feature
detectors.
The policy learned through RL is based on the
actual images seen by the camera.
Once the direction has been selected, the precise
location of the next fixation point is determined
by means of visual saliency.
The camera takes low-resolution, wide
field-of-view images at discrete time intervals.
Using these images, the system tries to recognize
the target object with a low-resolution template.

13
Problem Definition and General Approach - III

Since reliable detection of a small object is
difficult at low resolution, the system instead
looks for candidate locations of the target
object.
Foveated vision is simulated by zooming in and
grabbing high-resolution, narrow field-of-view
images centered at the candidate locations, which
are compared with a high-resolution template of
the target object.

14
Target Object and the Environment
Color template of the target object
(left). Environment (bottom).
15
Reinforcement Learning

The agent may or may not know a priori the
transition probabilities and the reward. When
they are known, dynamic programming techniques
can be used to compute an optimal policy.
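
To make the dynamic programming case concrete, here is a minimal value iteration sketch for a small Markov decision process whose transition probabilities and rewards are fully known. The data layout (`P`, `R`), the discount factor, and the stopping tolerance are assumptions chosen for this example and do not come from the slides.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Compute optimal state values for a known MDP.

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    V = [0.0] * len(P)
    while True:
        new_V = [
            max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                for a in range(len(P[s])))
            for s in range(len(P))
        ]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V


def greedy_policy(P, R, V, gamma=0.9):
    """Pick, in every state, the action with the best one-step lookahead."""
    return [
        max(range(len(P[s])),
            key=lambda a: R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a]))
        for s in range(len(P))
    ]
```

When `P` and `R` are not available, as in the visual search task, this computation cannot be carried out directly and a model-free method is needed instead.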

16
Q-Learning

In the visual search problem, the transition
probabilities and the reward are not known to the
agent.
A model-free Q-learning algorithm is used to find
the optimal policy.
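
The standard tabular Q-learning update can be sketched as follows. The learning rate `alpha`, the discount `gamma`, the epsilon-greedy exploration rule, and the dictionary-based table are assumptions made for illustration; the slides do not state which values or exploration scheme were used.

```python
import random
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])


def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise take the best known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


# Hypothetical usage: states are cluster labels, actions are saccade indices 0..8.
Q = defaultdict(float)
actions = list(range(9))
q_update(Q, "cluster_0", 3, 1.0, "cluster_1", actions, alpha=0.5)
```

No transition model appears anywhere in the update: the agent only needs the observed state, action, reward, and next state.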

17
States: Objects in the Environment

Recorded scan patterns show that people fixate
from object to object, so it is natural to
define the states as the objects in the
environment.
Paradox: Objects must be recognized as worth
attending to before they are fixated on.
However, an object cannot be recognized prior to
fixation, since it is perceived at low
resolution.

18
States: Clusters of Images

States are defined as clusters of images
representing the same region.
Each image is represented by color histograms
over a reduced number of bins (48 colors for the
lab environment).
Using histograms introduces perceptual aliasing,
since two different images can have identical
histograms.
To reduce aliasing, histograms are computed
separately for each image quadrant. This is
expected to help because natural environments
are sufficiently rich.
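
The effect of per-quadrant histograms can be seen in a small sketch. It assumes the image has already been quantized so that each pixel holds a color-bin index; that representation and the function names are assumptions for this example only.

```python
def quadrant_histograms(image, num_bins):
    """Return four histograms, one per image quadrant.

    image is a 2-D list of color-bin indices in range(num_bins).
    """
    rows, cols = len(image), len(image[0])
    mid_r, mid_c = rows // 2, cols // 2
    quadrants = [(0, mid_r, 0, mid_c), (0, mid_r, mid_c, cols),
                 (mid_r, rows, 0, mid_c), (mid_r, rows, mid_c, cols)]
    hists = []
    for r0, r1, c0, c1 in quadrants:
        h = [0] * num_bins
        for r in range(r0, r1):
            for c in range(c0, c1):
                h[image[r][c]] += 1
        hists.append(h)
    return hists


def global_histogram(image, num_bins):
    """Sum the quadrant histograms into one whole-image histogram."""
    return [sum(col) for col in zip(*quadrant_histograms(image, num_bins))]


# Two images with the same colors in different places.
a = [[0, 1], [1, 0]]
b = [[1, 0], [0, 1]]
same_globally = global_histogram(a, 2) == global_histogram(b, 2)
differ_by_quadrant = quadrant_histograms(a, 2) != quadrant_histograms(b, 2)
```

Here the two images are indistinguishable by their whole-image histograms but are separated once the histograms are computed per quadrant.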

19
Kullback Distance - I
20
Kullback Distance - II
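
The formulas from these two slides did not survive in the transcript. As a hedged stand-in, the sketch below shows the textbook symmetrized Kullback-Leibler distance between two histograms; whether the presentation used exactly this form is an assumption, as are the function names and the small constant `eps` that guards against division by zero.

```python
import math


def to_distribution(hist):
    """Normalize a histogram of counts so that it sums to 1."""
    total = sum(hist)
    return [v / total for v in hist]


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), with eps for empty bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(hist_a, hist_b):
    """Symmetrized distance: KL(a || b) + KL(b || a)."""
    p, q = to_distribution(hist_a), to_distribution(hist_b)
    return kl_divergence(p, q) + kl_divergence(q, p)
```

A distance of this kind could be used to decide whether a new image histogram belongs to an existing cluster or should start a new one.
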
21
Actions

Actions are defined as saccades to the most
salient point.
A1, ..., A8 represent saccades to the most
salient point in each of 8 directions. In
addition, A0 represents a saccade to the most
salient point in the whole image.
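
One way to realize this action set is sketched below. It assumes the current fixation is the image center, that each of the 8 directions covers a 45-degree sector, and that direction 1 points east with numbering running counter-clockwise; none of these conventions is given in the slides, so treat them as hypothetical.

```python
import math


def sector_of(d_row, d_col):
    """Return 1..8 for the 45-degree sector of an offset (rows grow downward)."""
    angle = math.degrees(math.atan2(-d_row, d_col)) % 360.0
    return int(((angle + 22.5) % 360.0) // 45.0) + 1


def saccade_target(saliency, action):
    """Most salient point in the sector of `action`, or anywhere if action == 0."""
    rows, cols = len(saliency), len(saliency[0])
    center = (rows // 2, cols // 2)
    best = None
    for r in range(rows):
        for c in range(cols):
            if (r, c) == center:
                continue  # the current fixation is not a saccade target
            if action != 0 and sector_of(r - center[0], c - center[1]) != action:
                continue
            if best is None or saliency[r][c] > best[0]:
                best = (saliency[r][c], r, c)
    return None if best is None else (best[1], best[2])
```

The learned policy only chooses the direction; the saliency map then fixes the exact landing point within it.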

22
Reward

The agent receives a positive reward for a
saccade that brings the object into the field of
view.
The agent receives a negative reward if the
object is not in the field of view after a
saccade.

23
Within Fixation Processing

This is the stage in which the eyes fixate on a
point while the agent processes visual
information and decides where to fixate next.
It comprises the computation of two components:
A set of two feature maps implementing low-level
visual attention, used to select the next
fixation point.
A recognizer, used at low resolution to detect
candidate target objects and at high resolution
to recognize the target.

24
Histogram Intersection

Histogram intersection matches two images, I
(the search image) and M (the model), by
comparing their color histograms.
With this method it is difficult to find a
threshold separating similar from dissimilar
images unless the model is pre-specified.
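
A minimal sketch of histogram intersection in its usual normalized form (the sum of bin-wise minima divided by the model's total count) is shown below. The function name is an assumption, and the slides do not say which normalization was used.

```python
def histogram_intersection(h_image, h_model):
    """Fraction of the model histogram that is matched by the image histogram.

    Both arguments are lists of counts over the same color bins.
    Returns a value between 0.0 (no overlap) and 1.0 (model fully covered).
    """
    overlap = sum(min(i, m) for i, m in zip(h_image, h_model))
    return overlap / sum(h_model)


# The image covers two of the model's three colors.
score = histogram_intersection([4, 2, 0], [2, 2, 2])
```

The score only becomes a decision once it is compared against a threshold, which is the difficulty noted above.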

25
Histogram Back-projection

Given two images I and M, histogram
back-projection locates M in I.
Color histograms hI and hM are computed on the
same number of color bins.
The operation requires one pass through I. For
every pixel (x,y), B(x,y) = R(j) iff I(x,y)
falls in bin j.
It always finds candidates.
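
The back-projection pass can be sketched as follows. The slide does not define R, so this example assumes it is the ratio histogram of Swain and Ballard's color indexing method, R(j) = min(hM(j) / hI(j), 1); the image is assumed to be a 2-D list of color-bin indices, and the function names are hypothetical.

```python
def histogram(image, num_bins):
    """Count how many pixels fall in each color bin."""
    h = [0] * num_bins
    for row in image:
        for pixel in row:
            h[pixel] += 1
    return h


def ratio_histogram(h_model, h_image):
    """R(j) = min(hM(j) / hI(j), 1), with 0 for bins absent from the image."""
    return [min(m / i, 1.0) if i > 0 else 0.0
            for m, i in zip(h_model, h_image)]


def back_project(image, h_model):
    """Replace every pixel by the ratio value of its color bin."""
    R = ratio_histogram(h_model, histogram(image, len(h_model)))
    return [[R[pixel] for pixel in row] for row in image]


# Bin 2 is rare in the image and present in the model, so it stands out.
B = back_project([[0, 0, 1], [0, 2, 1]], h_model=[0, 1, 1])
```

Because the peak of B exists whether or not the object is in view, the result is a list of candidates that still has to be verified at high resolution.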

26
Histogram Back-Projection Example
27
Symmetry Operator

To fixate on objects, a symmetry operator is
used, since most man-made objects have vertical,
horizontal or radial symmetry.
It first computes an edge map and then has each
pair of edge pixels pi, pj vote for its midpoint,
according to Eq. (9) of the paper.
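
The voting idea can be sketched as below. The weighting given by Eq. (9) is not reproduced in the transcript, so this example gives every pair of edge pixels one equal vote; that simplification and the function name are assumptions.

```python
from collections import Counter
from itertools import combinations


def symmetry_votes(edge_pixels):
    """Let every pair of edge pixels cast one vote for its midpoint.

    edge_pixels is a list of (row, col) tuples taken from an edge map.
    Returns a Counter mapping midpoints to vote counts.
    """
    votes = Counter()
    for (r1, c1), (r2, c2) in combinations(edge_pixels, 2):
        votes[((r1 + r2) / 2, (c1 + c2) / 2)] += 1
    return votes


# Four corners of a square: both diagonals vote for the center.
votes = symmetry_votes([(0, 0), (0, 4), (4, 0), (4, 4)])
center, count = votes.most_common(1)[0]
```

Midpoints that collect many votes mark centers of symmetric shapes and are therefore good fixation candidates.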

28
Symmetry Map
29
Model Description - I

Each low-resolution image is processed by two
main modules:
The top module (RL) learns a set of clusters
consisting of images with similar color
histograms. Clusters represent physical regions
and are used as states in the Q-learning method.
The second module consists of low-level visual
routines. Its purpose is to compute color and
symmetry maps for saliency and to recognize the
target object at both low and high resolution.

30
Model Description - II

31
Visual Search Agent Model
32
Algorithm - Initialization
33
Algorithm - If object found
34
Algorithm - If object not found
35
Results

The agent is trained to learn in which direction
to direct its gaze in order to reach the region
where the target object is most likely to be
found, with each trial lasting 400 epochs.
Epoch: a sequence of at most 100 fixations.
Every 5th epoch was used for testing, in which
the agent simply executed the learned policy.
The performance metric was the number of
fixations.
Within a single trial, the starting point was the
same in all test epochs.
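
The training and testing schedule described above can be written as a short loop. The `run_epoch` callback and its parameters are hypothetical placeholders for the agent; only the numbers 400, 100, and 5 come from the slide.

```python
def run_trial(run_epoch, num_epochs=400, max_fixations=100, test_every=5):
    """Run one trial and return the fixation counts of the test epochs.

    run_epoch(learn, max_fixations) is assumed to run a single epoch and
    return the number of fixations used. With learn=False the agent only
    executes its current policy without updating it.
    """
    test_results = []
    for epoch in range(1, num_epochs + 1):
        is_test = epoch % test_every == 0
        fixations = run_epoch(learn=not is_test, max_fixations=max_fixations)
        if is_test:
            test_results.append(fixations)
    return test_results
```

Plotting the returned list against the test epoch index would give a learning curve in terms of the number of fixations.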

36
Experimental Results - I
37
Experimental Results - II
38
Experimental Results - III
39
Sequence of Fixations
40
Experimental Results - IV
41
Experimental Results - V
42
Experimental Results - VI
43
Conclusion

Developed a model of selective attention for a
visual search task that combines visual
processing with control of attention.
Control is achieved by means of RL over a
low-level visual mechanism for selecting the next
fixation.
Color and symmetry are used to select the next
fixation, and it is not necessary to combine them
in a unique saliency map.
Information is integrated from saccade to
saccade.

44
Future Work

The goal is to extend this approach to a mobile
robot. The problem becomes more challenging
because the position, and consequently the
appearance, of the object changes with the
robot's position.
A single template is not sufficient.
In this paper it is assumed that the environment
is rich in color, so that perceptual aliasing
is not an issue. Extension to a mobile robot
will inevitably lead to learning in inherently
perceptually aliased environments.