1
Reinforcement Learning applied to Vehicle Navigation
  • Nishit Desai (Y3111014)
  • Madhav Pandya (Y3111028)

2
Contents
  • Problem Definition
  • Motivation
  • Various approaches
  • Our approach

3
Problem Definition
  • Learning an optimal vehicle navigation policy
    using reinforcement learning in a
    non-deterministic environment containing hidden
    states, where the agent selects a utile fraction
    of its perception space for decision making

4
Motivation
  • Need for Navigation
  • Limited Resources
  • Too much sensory data
  • Aliased states
  • Too little sensory data

5
Selective Perception
  • Inspired by the human perceptual system
  • What subset of all available features determines
    the next internal state of the agent?
  • Types of selective perception
  • Overt Attention
  • Covert Attention

6
Hidden State: A Big Hurdle
  • Sensory limitations hide features of the world
    from the agent's current perceptual snapshot
  • The problem can be solved using short-term memory

7
Markov v/s Non-Markov
  • For an entity with the Markov property, the next
    value depends only on the current value; earlier
    values add no information
  • A non-Markov entity can be converted into a
    Markov one by constructing new states from
    subsets of the history

8
Model of Environment
  • X – finite set of discrete states of the world
  • A – finite set of discrete actions
  • T – transition function that models the result of
    actions
  • T(xk | xi, aj) – probability of the world state
    being xk after executing action aj in state xi
    (see the sketch below)
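
A minimal sketch of this (X, A, T) model in Python; the lane states, actions, and probabilities are hypothetical illustrations, not values from the slides:

```python
import random

# Hypothetical finite state and action sets.
X = ["left_lane", "right_lane"]
A = ["keep", "change_lane"]

# T[(xi, aj)][xk] = probability of the world state being xk
# after executing aj in xi; each row sums to 1.
T = {
    ("left_lane", "keep"):         {"left_lane": 0.9, "right_lane": 0.1},
    ("left_lane", "change_lane"):  {"left_lane": 0.2, "right_lane": 0.8},
    ("right_lane", "keep"):        {"left_lane": 0.1, "right_lane": 0.9},
    ("right_lane", "change_lane"): {"left_lane": 0.8, "right_lane": 0.2},
}

def step(x, a):
    """Sample the successor state xk with probability T(xk | x, a)."""
    dist = T[(x, a)]
    return random.choices(list(dist), weights=list(dist.values()))[0]
```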

9
Model of Interface
  • R – reward function
  • R(xt, at) = rt
  • O – set of observations
  • O – observation function
  • O(xt, at) = ot+1
  • S – set of internal states
  • S – internal state transition function
  • S(st-1, at-1, ot) = st
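
The interface functions can be sketched the same way; the reward and observation bodies below are placeholders, since the slides do not specify them:

```python
def R(x, a):
    """Reward function R(xt, at) = rt (illustrative values only)."""
    return -1.0 if a == "change_lane" else 0.0

def O(x, a):
    """Observation function O(xt, at) = ot+1: the agent receives a
    partial percept, not the world state x itself."""
    return ("lane_percept", x)   # possibly aliased view of x

def S(s_prev, a_prev, o):
    """Internal state transition S(st-1, at-1, ot) = st: here the
    new internal state folds in the latest action and observation."""
    return (s_prev, a_prev, o)
```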

10
(No Transcript)
11
Utility of a state
  • The utility of a state is a measure of the
    usefulness of the state in the optimal solution
  • The utility of a state can be measured as
  • U(s) = max_{a ∈ A} Q(s,a)
  • where Q(s,a) is the parameter learnt by
    Q-learning (see the sketch below)
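
A minimal tabular Q-learning sketch of this definition; the learning rate and discount are arbitrary assumed values:

```python
from collections import defaultdict

Q = defaultdict(float)      # Q[(s, a)], initialized to 0
alpha, gamma = 0.1, 0.9     # learning rate and discount (assumed)

def utility(s, actions):
    """U(s) = max over a in A of Q(s, a)."""
    return max(Q[(s, a)] for a in actions)

def q_update(s, a, r, s_next, actions):
    """One-step Q-learning: move Q(s,a) toward r + gamma * U(s')."""
    Q[(s, a)] += alpha * (r + gamma * utility(s_next, actions) - Q[(s, a)])
```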

12
Partially Observable Markov Decision Process
(PO-MDP)
  • The agent cannot determine the exact state of the
    world after an action, which is the real-world
    situation
  • PO-MDP models this situation as a vector of state
    occupation probabilities
  • (p1, p2, p3, ..., pN)
  • Σi pi = 1
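
The occupation vector is updated by Bayes' rule after each action and observation; this sketch assumes the transition model T from the environment slide and a hypothetical observation model Pobs[x][o] = P(o | x):

```python
def belief_update(b, a, o, X, T, Pobs):
    """Update the state-occupation vector (p1, ..., pN).
    b maps each state to its probability and sums to 1."""
    b_next = {}
    for xk in X:
        # Predict with T, then weight by the likelihood of observing o.
        b_next[xk] = Pobs[xk][o] * sum(b[xi] * T[(xi, a)][xk] for xi in X)
    z = sum(b_next.values())                 # normalizer
    return {x: p / z for x, p in b_next.items()}
```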

13
Algorithms based on PO-MDP
  • UDM (Utile Distinction Memory)
  • NSM (Nearest Sequence Memory)
  • USM (Utile Suffix Memory)
  • U-Tree

14
(No Transcript)
15
Utile Distinction Memory
  • Splits a node based on utility
  • Grows the state space from an initial set of
    states
  • Helps reduce the size of the state space
  • Able to discover hidden states
  • Splits are based on a statistical test (see the
    sketch below)
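
The slides do not name the statistical test; a two-sample Kolmogorov-Smirnov test on future-return samples is one plausible choice for deciding whether a distinction is utile:

```python
from scipy.stats import ks_2samp

def should_split(returns_a, returns_b, significance=0.05):
    """Split a state node when the return samples gathered under two
    candidate distinctions differ significantly in distribution."""
    stat, p_value = ks_2samp(returns_a, returns_b)
    return p_value < significance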

16
(No Transcript)
17
Nearest Sequence Memory
  • Concept similar to k-Nearest Neighbour
  • Instance-based learning over a variable-length
    time window
  • Keeps a record of raw experiences
  • Selects the best action by voting, as sketched
    below
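
A rough sketch of the voting step, assuming experiences is the raw chain of (observation, action, reward) tuples; the matching rule and the value of k are illustrative:

```python
experiences = []   # raw (observation, action, reward) chain

def match_length(t, history):
    """Length of the suffix shared by the current history and the
    experience chain ending at index t (the nearness measure)."""
    n = 0
    while n <= t and n < len(history) and experiences[t - n] == history[-1 - n]:
        n += 1
    return n

def vote_action(history, actions, k=8):
    """Let the k nearest past moments vote for the action that
    followed them."""
    nearest = sorted(range(len(experiences) - 1),
                     key=lambda t: match_length(t, history),
                     reverse=True)[:k]
    votes = {a: 0 for a in actions}
    for t in nearest:
        _, a_next, _ = experiences[t + 1]   # action taken just after t
        votes[a_next] += 1
    return max(votes, key=votes.get)
```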

18
(No Transcript)
19
Utile Suffix Memory
  • Combines the above two algorithms
  • Keeps the history in a data structure called a
    Prediction Suffix Tree [Ron94]
  • The leaf nodes of this tree are the states of the
    PO-MDP
  • The percept is considered indivisible (see the
    sketch below)
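
A minimal node structure for such a suffix tree, assuming histories are sequences of indivisible percepts; the field names are illustrative:

```python
class SuffixNode:
    """Prediction Suffix Tree node: internal nodes branch on
    successively older history elements; leaves play the role of
    PO-MDP states and hold Q-values."""
    def __init__(self):
        self.children = {}    # history element -> child node
        self.instances = []   # raw experiences sorted into this node
        self.q = {}           # Q-values, used when this node is a leaf

    def leaf_for(self, history):
        """Walk backwards through the history to the matching leaf."""
        node, i = self, len(history) - 1
        while node.children and i >= 0:
            child = node.children.get(history[i])
            if child is None:
                break
            node, i = child, i - 1
        return node
```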

20
(No Transcript)
21
U-Tree
  • The most successful algorithm for attacking this
    problem
  • Implements selective perception
  • Each node stores one dimension of the perception
    space, as sketched below
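
The difference from USM can be sketched in the node structure: each internal node tests one dimension of the percept at some history offset, so only utile features are ever distinguished. The field names here are illustrative:

```python
class UTreeNode:
    """U-Tree node: branches on a single perceptual dimension rather
    than on the whole (indivisible) percept."""
    def __init__(self, dimension=None, history_offset=0):
        self.dimension = dimension            # which sensor/feature to test
        self.history_offset = history_offset  # how far back in the history
        self.children = {}                    # feature value -> child node
        self.q = {}                           # Q-values at leaves

    def leaf_for(self, history):
        """Sort a history into a leaf, testing one dimension per node."""
        node = self
        while node.children:
            percept = history[-1 - node.history_offset]
            child = node.children.get(percept[node.dimension])
            if child is None:
                break
            node = child
        return node
```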

22
Suffix tree constructed by the algorithm
23
Vehicle Navigation Environment
  • Driving on a highway with vehicles classified by
    speed
  • Slow vehicles
  • Fast vehicles
  • Other vehicles don't change their lanes
  • Visibility is limited
  • And many other parameters...

24
Vehicle Navigation Environment
  • The agent has a limited number of sensors
  • Each sensor contributes a dimension to the
    agent's perceptual space
  • The agent also has a limited number of discrete
    actions
  • New vehicles enter the simulation world
    probabilistically (see the sketch below)
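
One possible encoding of this interface; the sensor dimensions, action names, and entry probability below are illustrative assumptions, not values from the slides:

```python
import random
from dataclasses import dataclass

# Hypothetical discrete action set for the agent.
ACTIONS = ["accelerate", "decelerate", "shift_left", "shift_right"]

@dataclass
class Percept:
    """Each sensor contributes one dimension of the perceptual space."""
    front_gap: int     # discretized distance to the vehicle ahead
    front_speed: str   # "slow" or "fast", the slide's classification
    lane: int          # agent's current lane

def maybe_spawn_vehicle(vehicles, p_enter=0.1):
    """New vehicles enter the simulation world probabilistically."""
    if random.random() < p_enter:
        vehicles.append({"speed": random.choice(["slow", "fast"]),
                         "lane": random.randint(0, 2)})
```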

25
Our Approach
  • We will use the U-Tree algorithm, as it can
    handle a multidimensional perceptual space
    efficiently
  • After getting good results in this environment,
    we will try to improve performance in a less
    restricted environment

26
Sample Simulation Display
27
Questions?
28
Thank You