Computer Science Readings: Reinforcement Learning

Transcript and Presenter's Notes
1
Computer Science Readings: Reinforcement Learning
  • Presentation by
  • Arif OZGELEN

2
How do we perform visual search?
  • Look at the usual places where the item is likely to be.
  • If the item is small, we tend to move closer to the
    area we are searching in order to heighten
    our ability to detect it.
  • We look for properties of the target object
    that distinguish it from the search space,
    e.g. color, shape, size.

3
A Reinforcement Learning Model of Selective
Visual Attention (ACM 2001)
  • Silviu Minut, Autonomous Agents Lab, Department
    of Computer Science, Michigan State University.
  • Sridhar Mahadevan, Autonomous Agents Lab,
    Department of Computer Science, Michigan State
    University.

4
The Problem of Visual Search
  • Goal: To find small objects in a large, usually
    cluttered environment.
  • e.g. a pen on a desk.
  • It is preferable to use wide-field-of-view images.
  • Identifying small objects, however, requires
    high-resolution images.
  • This results in a very high-dimensional input array.

5
Nature's Method: Foveated Vision - I
  • Fovea: Anatomically defined as the central region
    of the retina, with a high density of receptive
    cells.
  • The density of receptive cells decreases
    exponentially from the fovea towards the periphery.

6
Nature's Method: Foveated Vision - II
  • Saccades: To make up for the loss of information
    incurred by the decreased resolution in the
    periphery, the eyes are re-oriented by rapid
    ballistic motions (up to 900°/s) called saccades.
  • Fixations: Periods between saccades during which
    the eyes remain relatively still, to process
    visual information and to select the next
    fixation point.

7
Foveated Vision Eye Scan Patterns
8
Using Foveated Vision
  • Using foveal image processing reduces the
    dimension of the input data, but in turn generates
    a sequential decision problem.
  • Choosing the next fixation point requires an
    efficient gaze-control mechanism in order to
    direct the gaze to the most salient object.

9
Gaze Control - Salient Features
  • To solve the gaze-control problem, the next
    fixation point must be decided from low-resolution
    images of regions that do not fall within the fovea.
  • Saliency Map Theory (Koch and Ullman): a
    task-independent, bottom-up model of visual
    attention.
  • Itti and Koch, building on saliency map theory, fuse 3
    types of feature maps (color map, edge map,
    intensity map) into a single saliency map.
  • Low-resolution images alone are usually not
    sufficient for this decision problem.
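A minimal sketch of this fusion step, assuming simple min-max normalization and equal-weight averaging (the actual Itti-Koch model also promotes maps containing a few isolated peaks, which is omitted here):

```python
import numpy as np

def normalize_map(m):
    """Min-max scale a feature map to [0, 1] (the full model also
    promotes maps with a few strong peaks, omitted here)."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)

def saliency_map(color_map, edge_map, intensity_map):
    """Fuse the three normalized feature maps by simple averaging."""
    maps = [normalize_map(m) for m in (color_map, edge_map, intensity_map)]
    return sum(maps) / len(maps)

# The next fixation point is the argmax of the fused map.
rng = np.random.default_rng(0)
s = saliency_map(rng.random((4, 4)), rng.random((4, 4)), rng.random((4, 4)))
y, x = np.unravel_index(np.argmax(s), s.shape)
```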

10
Gaze Control - Control Mechanism Implementation
  • A high-level mechanism is required to control
    low-level reactive attention.
  • Tsotsos's model proposes selective tuning of
    visual processing via a hierarchical winner-take-all
    process.
  • Information should be integrated from one
    fixation to the next for a global understanding
    of the scene.
  • This model combines top-down gaze control, based
    on RL, with bottom-up reactive saliency-map
    processing.

11
Problem Definition and General Approach - I
  • Given an object and an environment:
  • How to build a vision agent that learns where the
    object is likely to be found.
  • How to direct its gaze to the object.
  • A set of landmarks L0, L1, ..., Ln represents
    regions in the environment. A policy on this set
    directs the camera to the region most likely to
    contain the target object.

12
Problem Definition and General Approach II
  • The approach does not require high-level feature
    detectors.
  • The policy learned through RL is based on actual
    images seen by the camera.
  • Once a direction has been selected, the precise
    location of the next fixation point is determined
    by means of visual saliency.
  • The camera takes low-resolution, wide-field-of-view
    images at discrete time intervals. Using these
    low-resolution images, the system tries to
    recognize the target object using a low-resolution
    template.

13
Problem Definition and General Approach III
  • Since reliable detection of a small object is
    difficult at low resolution, the system instead
    generates candidate locations for the target
    object.
  • Foveated vision is simulated by zooming in and
    grabbing high-resolution, narrow-field-of-view
    images centered at the candidate locations, which
    are compared with a high-resolution template of
    the target image.

14
Target Object and the Environment
Color template of the target object
(left). Environment (bottom).
15
Reinforcement Learning
  • The agent may or may not know the priori the
    transition probabilities and the reward. In this
    case dynamic programming techniques could be
    used to compute an optimal policy.

16
Q-Learning
  • In the visual search problem, the transition
    probabilities and the reward are not known to the
    agent.
  • A model free Q-learning algorithm used to find
    the optimal policies.
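The tabular Q-learning backup can be sketched as follows (the state encoding and the learning parameters here are illustrative, not the paper's exact values; the 9 actions correspond to A0 plus the 8 directions introduced later):

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative parameter values
ACTIONS = list(range(9))                # A0 plus the 8 directions A1..A8

Q = defaultdict(float)                  # Q[(state, action)] -> value

def choose_action(state):
    """Epsilon-greedy selection over the 9 saccade actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next):
    """One backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
```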

17
States Objects in the Environment
  • Recorded scan patterns show that people fixate
    from object to object; it is therefore natural to
    define the states as the objects in the
    environment.
  • Paradox: Objects must be recognized as worth
    attending to before they are fixated on.
    However, an object cannot be recognized prior to
    fixation, since it is perceived at low
    resolution.

18
States Clusters of Images
  • States are defined as clusters of images
    representing the same region.
  • Each image is represented by color histograms
    over a reduced number of bins (48 colors for the
    lab environment).
  • Using histograms introduces perceptual aliasing,
    as two different images can have identical
    histograms.
  • To reduce aliasing, histograms are computed
    separately for each image quadrant. This is
    expected to reduce aliasing, since natural
    environments are sufficiently rich in color.
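A sketch of the quadrant-histogram representation, assuming images are already quantized to 48 color indices (the function name and layout are illustrative):

```python
import numpy as np

N_BINS = 48   # the paper reduces the lab environment to 48 colors

def quadrant_histograms(img, n_bins=N_BINS):
    """One normalized color histogram per quadrant, concatenated.
    img: 2-D array of color-bin indices in [0, n_bins)."""
    h, w = img.shape
    quads = [img[:h // 2, :w // 2], img[:h // 2, w // 2:],
             img[h // 2:, :w // 2], img[h // 2:, w // 2:]]
    hists = []
    for q in quads:
        hist, _ = np.histogram(q, bins=n_bins, range=(0, n_bins))
        hists.append(hist / max(hist.sum(), 1))
    return np.concatenate(hists)   # length 4 * n_bins
```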

19
Kullback Distance - I
20
Kullback Distance - II
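The Kullback-Leibler divergence, symmetrized so that it behaves as a distance between two normalized histograms, can be sketched as follows (the symmetrized variant and the epsilon smoothing of empty bins are assumptions about the exact form used):

```python
import math

def kl_divergence(p, q, eps=1e-6):
    """D(p || q) = sum_i p_i * log(p_i / q_i); eps guards empty bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kullback_distance(p, q):
    """Symmetrized form D(p || q) + D(q || p): zero iff p == q, usable
    as a distance between image histograms when clustering states."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```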
21
Actions
  • Actions are defined as the saccades to the most
    salient point.
  • A1,..,A8 to represent 8 directions. In addition
    A0 represents the most salient point in the whole
    image.

22
Reward
  • Agent receives positive reward for a saccade
    bringing the object in to the field of view.
  • Agent receives negative reward if the object is
    not in the field of view after a saccade.

23
Within Fixation Processing
  • It is the stage when the eyes fixate on a point
    and the agent processes visual information and
    decides where the fixate next.
  • Comprises computation of two components
  • A set of two feature maps implementing low level
    visual attention, used to select the next
    fixation point.
  • A recognizer, used at low resolution for
    detection of candidate target objects and at high
    resolution for recognition of target.

24
Histogram Intersection
  • It is a method used to match two images, I
    (search image) and M (model).
  • It is difficult to find a threshold between
    similar and dissimilar images in this method
    unless the model is pre-specified.
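The intersection itself, in the Swain-Ballard form normalized by the model histogram, can be sketched as:

```python
def histogram_intersection(h_i, h_m):
    """Swain-Ballard histogram intersection: the sum of bin-wise minima,
    normalized by the model histogram so a perfect match scores 1.0."""
    return sum(min(i, m) for i, m in zip(h_i, h_m)) / sum(h_m)
```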

25
Histogram Back-projection
  • Given two images I and M, histogram back
    projection locates M in I.
  • Color histograms hI and hM are computed on the
    same number of color bins.
  • Operation requires one pass through I. For every
    pixel (x,y), B(x,y) R(j) iff I(x,y) falls in
    bin j.
  • Always finds candidates.
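A sketch of the back-projection pass described above, where the image is taken as a 2-D array of precomputed color-bin indices:

```python
def backproject(image, h_i, h_m):
    """Histogram back-projection as described above.
    image: 2-D list of color-bin indices; h_i, h_m: histograms of I and M.
    Each pixel whose color falls in bin j receives R(j) = min(h_m(j) / h_i(j), 1),
    so colors prominent in the model but rare in the image score highest."""
    ratio = [min(m / i, 1.0) if i > 0 else 0.0 for i, m in zip(h_i, h_m)]
    return [[ratio[j] for j in row] for row in image]
```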

26
Histogram Back-Projection Example
27
Symmetry Operator
  • In order to fixate on objects a symmetry operator
    is used since most man-made objects have
    vertical, horizontal or radial symmetry.
  • It computes an edge map first and then has each
    pair pi, pj of an edge pixels vote for its
    midpoint by (9).
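A simplified voting loop in this spirit; the Gaussian weighting of the pair's separation stands in for Eq. (9), whose exact form (which also involves edge orientations) is not reproduced on this slide:

```python
import math

def symmetry_map(edge_pixels, shape, sigma=2.0):
    """Every pair of edge pixels votes for its midpoint, weighted by a
    Gaussian in the pair's separation (a stand-in for Eq. (9))."""
    h, w = shape
    votes = [[0.0] * w for _ in range(h)]
    for a in range(len(edge_pixels)):
        for b in range(a + 1, len(edge_pixels)):
            (y1, x1), (y2, x2) = edge_pixels[a], edge_pixels[b]
            my, mx = (y1 + y2) // 2, (x1 + x2) // 2   # midpoint pixel
            d = math.hypot(y2 - y1, x2 - x1)          # pair separation
            votes[my][mx] += math.exp(-d * d / (2 * sigma ** 2))
    return votes
```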

28
Symmetry Map
29
Model Description - I
  • Each low-resolution image is processed by two
    main modules.
  • The top module (RL) learns a set of clusters
    consisting of images with similar color
    histograms. Clusters represent physical regions
    and are used as states in the Q-learning method.
  • The second module consists of low-level visual
    routines. Its purpose is to compute color and
    symmetry maps for saliency, and to recognize the
    target object at both low and high resolution.

30
Model Description - II

31
Visual Search Agent Model
32
Algorithm - Initialization
33
Algorithm If object found
34
Algorithm If object not found
35
Results
  • The agent is trained to learn in which direction
    to direct its gaze in order to reach the region
    where the target object is most likely to be
    found; each trial lasts 400 epochs.
  • Epoch: a sequence of at most 100 fixations.
  • Every 5th epoch was used for testing, during
    which the agent simply executed the learned
    policy.
  • The performance metric was the number of fixations.
  • Within a single trial, the starting point was the
    same in all test epochs.

36
Experimental Results - I
37
Experimental Results - II
38
Experimental Results - III
39
Sequence of Fixations
40
Experimental Results - IV
41
Experimental Results - V
42
Experimental Results - VI
43
Conclusion
  • Developed a model of selective attention for a
    visual search task that combines visual
    processing and attentional control.
  • Control is achieved by means of RL over a
    low-level visual mechanism for selecting the next
    fixation.
  • Color and symmetry are used to select the next
    fixation; it is not necessary to combine them
    into a single saliency map.
  • Information is integrated from saccade to
    saccade.

44
Future Work
  • The goal is to extend this approach to a mobile
    robot. The problem becomes more challenging, as
    the position, and consequently the appearance, of
    the object changes with the robot's position; a
    single template is not sufficient.
  • This paper assumes that the environment is rich
    in color, so that perceptual aliasing is not an
    issue. Extension to a mobile robot will
    inevitably lead to learning in inherently
    perceptually aliased environments.