Title: Computer Science Readings: Reinforcement Learning
1Computer Science Readings: Reinforcement Learning
- Presentation by
- Arif OZGELEN
2How do we perform visual search?
- Look at the usual places where the item is likely
to be.
- If the item is small, we tend to get closer to the
area that we are searching in order to heighten
our ability to detect it.
- We look for certain properties of the target
object which make it distinguishable from the
search space, e.g. color, shape, size, etc.
3A Reinforcement Learning Model of Selective
Visual Attention (ACM 2001)
- Silviu Minut, Autonomous Agents Lab, Department
of Computer Science, Michigan State University.
- Sridhar Mahadevan, Autonomous Agents Lab,
Department of Computer Science, Michigan State
University.
4The Problem of Visual Search
- Goal: To find small objects in a large, usually
cluttered environment.
- e.g. a pen on a desk.
- Preferable to use wide field-of-view images.
- Identifying small objects requires high resolution
images.
- Results in a very high dimensional input array.
5Nature's Method: Foveated Vision - I
- Fovea: Anatomically defined as the central region
of the retina with a high density of receptive
cells.
- Density of receptive cells decreases
exponentially from the fovea towards the periphery.
6Nature's Method: Foveated Vision - II
- Saccades: To make up for the loss of information
incurred by the decrease in resolution in the
periphery, eyes are re-oriented by rapid
ballistic motions (up to 900°/s) called saccades.
- Fixations: Periods between saccades during which
the eyes remain relatively fixed, to process
visual information and to select the next
fixation point.
7Foveated Vision: Eye Scan Patterns
8Using Foveated Vision
- Using foveal image processing reduces the
dimension of the input data, but in turn generates
a sequential decision problem.
- Choosing the next fixation point requires an
efficient gaze control mechanism in order to
direct the gaze to the most salient object.
9Gaze Control - Salient Features
- In order to solve the problem of gaze control,
the next fixation point must be decided based on
low resolution images of regions that do not fall
in the fovea.
- Saliency Map Theory (Koch and Ullman)
- Task-independent, bottom-up model for visual
attention.
- Itti and Koch: Based on Saliency Map Theory, 3
types of feature maps (color map, edge map,
intensity map) are fused together to form the
saliency map.
- Low resolution images alone are usually not
sufficient for this decision problem.
10Gaze Control - Control Mechanism Implementation
- Implementation of a high level mechanism is
required to control low level reactive attention.
- Tsotsos' model proposes selective tuning of
visual processing via a hierarchical
winner-take-all process.
- Information should be integrated from one
fixation to the next for a global understanding
of the scene.
- Model top-down gaze control with bottom-up
reactive saliency map processing based on RL.
11Problem Definition and General Approach - I
- Given an object and an environment:
- How to build a vision agent that learns where the
object is likely to be found?
- How to direct its gaze to the object?
- Set of landmarks L0, L1, ..., Ln representing
regions in the environment. A policy on this set
directs the camera to the most probable region
containing the target object.
12Problem Definition and General Approach - II
- The approach does not require high level feature
detectors.
- Policy learned through RL is based on actual
images seen by the camera.
- Once the direction has been selected, the precise
location of the next fixation point is determined
by means of visual saliency.
- Camera takes low resolution, wide field-of-view
images at discrete time intervals. Using these
low resolution images, the system tries to
recognize the target object using a low
resolution template.
13Problem Definition and General Approach - III
- Since reasonable detection of a small object is
difficult at low resolution, the system tries to
get candidate locations for the target object.
- Foveated vision is simulated by zooming in and
grabbing high resolution, narrow field-of-view
images centered at the candidate locations, which
are compared with a high resolution template of
the target object.
14Target Object and the Environment
Color template of the target object
(left). Environment (bottom).
15Reinforcement Learning
- The agent may or may not know a priori the
transition probabilities and the reward. When
they are known, dynamic programming techniques
can be used to compute an optimal policy.
16Q-Learning
- In the visual search problem, the transition
probabilities and the reward are not known to the
agent.
- A model-free Q-learning algorithm is used to find
the optimal policies.
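As a hedged illustration of the model-free update, the sketch below shows one-step tabular Q-learning. It assumes states are cluster indices and actions are the nine saccade choices A0..A8; the function names, the learning rate alpha, the discount gamma and the epsilon-greedy exploration are illustrative assumptions, not values taken from the paper.

```python
import random
from collections import defaultdict

N_ACTIONS = 9  # A0..A8 (assumed action set, see the Actions slide)


def make_q_table():
    # Q[state][action], initialised to zero the first time a state is seen.
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def choose_action(q, state, epsilon=0.1, rng=random):
    # Epsilon-greedy exploration (an assumed exploration scheme).
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q[state]
    return values.index(max(values))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # One-step Q-learning:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

No transition model is needed: each update uses only the observed tuple (state, action, reward, next state).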
17States: Objects in the Environment
- Recorded scan patterns show that people fixate
from object to object; therefore it is natural to
define the states as the objects in the
environment.
- Paradox: Objects must be recognized as worth
attending to before they are fixated on.
However, an object cannot be recognized prior to
the fixation, since it is perceived at low
resolution.
18States: Clusters of Images
- States are defined as clusters of images
representing the same region.
- Each image is represented with color histograms
on a reduced number of bins (48 colors for the
lab environment).
- Using histograms introduces perceptual aliasing,
as two different images can have identical
histograms.
- To reduce aliasing, histograms are computed
separately for each quadrant of the image. This
is expected to reduce aliasing since natural
environments are sufficiently rich.
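The effect of per-quadrant histograms can be sketched as follows. This is a minimal illustration, assuming each pixel has already been quantised to a color-bin index and the image is at least 2x2; the function name and the data layout are hypothetical.

```python
def quadrant_histograms(image, n_bins):
    """image: list of rows; each pixel is a bin index in [0, n_bins)."""
    height, width = len(image), len(image[0])
    mid_y, mid_x = height // 2, width // 2
    quadrants = [
        (0, mid_y, 0, mid_x),           # top-left
        (0, mid_y, mid_x, width),       # top-right
        (mid_y, height, 0, mid_x),      # bottom-left
        (mid_y, height, mid_x, width),  # bottom-right
    ]
    histograms = []
    for y0, y1, x0, x1 in quadrants:
        counts = [0] * n_bins
        for y in range(y0, y1):
            for x in range(x0, x1):
                counts[image[y][x]] += 1
        total = (y1 - y0) * (x1 - x0)
        # Normalise so each quadrant histogram sums to 1.
        histograms.append([c / total for c in counts])
    return histograms
```

Two images such as `[[0, 0], [1, 1]]` and `[[1, 1], [0, 0]]` have the same global histogram but different quadrant histograms, so they are no longer aliased.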
19Kullback Distance - I
20Kullback Distance - II
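No formula accompanies these two slides. As a hedged illustration only, one common way to compare two normalised histograms is a symmetrised Kullback-Leibler divergence; whether the paper uses exactly this form is an assumption, and the smoothing constant eps is likewise illustrative.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    # D(p || q) = sum_j p_j * log(p_j / q_j); eps guards against log(0)
    # and division by zero when a bin is empty.
    return sum(
        pj * math.log((pj + eps) / (qj + eps))
        for pj, qj in zip(p, q)
        if pj > 0
    )


def symmetric_kl(p, q, eps=1e-10):
    # Symmetrised so that the distance from p to q equals that from q to p.
    return kl_divergence(p, q, eps) + kl_divergence(q, p, eps)
```

Under this assumption, an image would be assigned to the cluster whose representative histogram is closest.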
21Actions
- Actions are defined as the saccades to the most
salient point.
- A1, ..., A8 represent 8 directions. In addition,
A0 represents the most salient point in the whole
image.
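One way to read these actions is sketched below, under loud assumptions: the saliency map is a 2D list, each direction A1..A8 is taken to be a 45-degree sector around the image centre (A1 pointing right, numbered counter-clockwise), and the function name is hypothetical. The slides do not specify how the directions are laid out.

```python
import math


def most_salient_point(saliency, action):
    """Return (row, col) of the most salient pixel allowed by `action`.

    action 0 searches the whole map; actions 1..8 search one 45-degree
    sector around the image centre (the sector layout is an assumption).
    Returns None if the sector contains no pixels.
    """
    height, width = len(saliency), len(saliency[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    best, best_value = None, float("-inf")
    for y in range(height):
        for x in range(width):
            if action != 0:
                if y == cy and x == cx:
                    continue  # the centre pixel has no direction
                # Rows grow downwards, so flip y for a conventional angle.
                angle = math.degrees(math.atan2(cy - y, x - cx)) % 360.0
                # Sector k (1..8) is centred on (k - 1) * 45 degrees.
                sector = int(((angle + 22.5) % 360.0) // 45.0) + 1
                if sector != action:
                    continue
            if saliency[y][x] > best_value:
                best, best_value = (y, x), saliency[y][x]
    return best
```

The learned policy then only has to pick one of nine discrete actions, while saliency determines the precise fixation point.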
22Reward
- Agent receives a positive reward for a saccade
that brings the object into the field of view.
- Agent receives a negative reward if the object is
not in the field of view after a saccade.
23Within Fixation Processing
- It is the stage when the eyes fixate on a point
and the agent processes visual information and
decides where to fixate next.
- Comprises computation of two components:
- A set of two feature maps implementing low level
visual attention, used to select the next
fixation point.
- A recognizer, used at low resolution for
detection of candidate target objects and at high
resolution for recognition of the target.
24Histogram Intersection
- It is a method used to match two images, I
(search image) and M (model).
- It is difficult to find a threshold between
similar and dissimilar images in this method
unless the model is pre-specified.
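A minimal sketch of the match score, assuming the slide refers to the normalised histogram intersection of Swain and Ballard (the sum of bin-wise minima divided by the total count of the model histogram). The function name is hypothetical.

```python
def histogram_intersection(h_image, h_model):
    # Normalised intersection: sum_j min(I_j, M_j) / sum_j M_j.
    if len(h_image) != len(h_model):
        raise ValueError("histograms must use the same bins")
    model_total = sum(h_model)
    if model_total == 0:
        raise ValueError("model histogram is empty")
    overlap = sum(min(i, m) for i, m in zip(h_image, h_model))
    return overlap / model_total
```

For example, `histogram_intersection([4, 0, 6], [2, 2, 6])` gives 0.8, and a histogram matched against itself gives 1.0. Where to place the cut-off between "similar" and "dissimilar" depends on the model.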
25Histogram Back-projection
- Given two images I and M, histogram
back-projection locates M in I.
- Color histograms hI and hM are computed on the
same number of color bins.
- Operation requires one pass through I. For every
pixel (x,y), B(x,y) = R(j) iff I(x,y) falls in
bin j.
- Always finds candidates.
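The single pass can be sketched as below. The slide does not define R; this sketch assumes it is the ratio histogram of Swain and Ballard, R(j) = min(hM(j) / hI(j), 1). Pixels are assumed to be pre-quantised bin indices, and the function names are hypothetical.

```python
def histogram(image, n_bins):
    # Count how many pixels fall in each color bin.
    counts = [0] * n_bins
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


def back_project(image, h_image, h_model):
    """image: rows of bin indices; h_image, h_model: counts per bin."""
    # Assumed ratio histogram R(j) = min(hM(j) / hI(j), 1);
    # bins that never occur in I get 0.
    ratio = [
        min(m / i, 1.0) if i > 0 else 0.0
        for i, m in zip(h_image, h_model)
    ]
    # Single pass over I: B(x, y) = R(j) where I(x, y) falls in bin j.
    return [[ratio[pixel] for pixel in row] for row in image]
```

The back-projected image B always has a maximum somewhere, which is consistent with the remark that the method always finds candidates, even when the target is absent.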
26Histogram Back-Projection Example
27Symmetry Operator
- In order to fixate on objects, a symmetry operator
is used, since most man-made objects have
vertical, horizontal or radial symmetry.
- It computes an edge map first and then has each
pair pi, pj of edge pixels vote for its
midpoint by (9).
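The voting scheme can be sketched as follows. This is an illustration only: every pair casts a unit vote, standing in for the weighting of (9), which is not reproduced here; the function name and the list-of-coordinates input are assumptions.

```python
from itertools import combinations


def symmetry_map(edge_pixels, height, width):
    """edge_pixels: list of (row, col) edge locations from an edge map."""
    votes = [[0.0] * width for _ in range(height)]
    for (y1, x1), (y2, x2) in combinations(edge_pixels, 2):
        # Each pair votes for its midpoint. The unit weight is a
        # placeholder for the weighting (9), which is not given here.
        my, mx = (y1 + y2) // 2, (x1 + x2) // 2
        votes[my][mx] += 1.0
    return votes
```

For the four corners of a square, the two diagonals vote for the same centre pixel, so the centre collects the most votes. The cost is quadratic in the number of edge pixels.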
28Symmetry Map
29Model Description - I
- Each low resolution image is processed by two
main modules.
- Top module (RL) learns a set of clusters
consisting of images with similar color
histograms. Clusters represent physical regions
and are used as states in the Q-learning method.
- Second module consists of low-level visual
routines. Its purpose is to compute color and
symmetry maps for saliency and to recognize the
target object at both low and high resolution.
30Model Description - II
31Visual Search Agent Model
32Algorithm - Initialization
33Algorithm - If object found
34Algorithm - If object not found
35Results
- The agent is trained to learn in which direction
to direct its gaze in order to reach the region
where the target object is most likely to be
found. Each trial consists of 400 epochs.
- Epoch: a sequence of at most 100 fixations.
- Every 5th epoch was used for testing, in which the
agent simply executed the learned policy.
- The performance metric was the number of fixations.
- Within a single trial, the starting point was the
same in all test epochs.
36Experimental Results - I
37Experimental Results - II
38Experimental Results - III
39Sequence of Fixations
40Experimental Results - IV
41Experimental Results - V
42Experimental Results - VI
43Conclusion
- Developed a model of selective attention for a
visual search task, which is a combination of
visual processing and control for attention.
- Control is achieved by means of RL over a low
level visual mechanism for selecting the next
fixation.
- Color and symmetry are used for selection of the
next fixation, and it is not necessary to combine
them in a unique saliency map.
- The information is integrated from saccade to
saccade.
44Future Work
- Goal is to extend this approach to a mobile
robot. The problem becomes more challenging
because the position, and consequently the
appearance, of the object changes with the
robot's position. A single template is not
sufficient.
- In this paper it is assumed that the environment
is rich in color, so that perceptual aliasing
is not an issue. Extension to a mobile robot
will inevitably lead to learning in
inherently perceptually aliased environments.