1
Reinforcement Learning applied to Vehicle Navigation
  • Nishit Desai (Y3111014)
  • Madhav Pandya (Y3111028)

2
Contents
  • Problem Definition
  • Motivation
  • Various approaches
  • Our approach

3
Problem Definition
  • Learning an optimal vehicle navigation policy
    using reinforcement learning in a
    non-deterministic environment containing hidden
    states, where the agent selects a utile fraction
    of its perception space for decision making

4
Motivation
  • Need for Navigation
  • Limited Resources
  • Too much sensory data
  • Aliased states
  • Too little sensory data

5
Selective Perception
  • Inspired by the human perceptual system
  • What subset of all available features determines
    the next internal state of the agent?
  • Types of selective perception
  • Overt Attention
  • Covert Attention

6
Hidden State: A Big Hurdle
  • Sensory limitations hide features of the world
    from the agent's current perceptual snapshot
  • The problem can be solved using short-term memory

7
Markov v/s Non-Markov
  • For an entity with the Markov property, the next
    value depends only on the current value; earlier
    values add no information
  • A non-Markov entity can be converted into a
    Markov one by constructing new states from
    subsets of the history

8
Model of Environment
  • X – finite set of discrete states of the world
  • A – finite set of discrete actions
  • T – transition function that models the result of
    actions
  • T(xk | xi, aj) – probability of the world state
    being xk after executing action aj in state xi
    (see the sketch below)
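
A minimal sketch of this (X, A, T) model in Python; the lane states, actions, and probabilities are hypothetical illustrations, not values from the slides:

```python
import random

# Hypothetical finite state and action sets.
X = ["left_lane", "right_lane"]
A = ["keep", "change_lane"]

# T[(xi, aj)][xk] = probability of the world state being xk
# after executing aj in xi; each row sums to 1.
T = {
    ("left_lane", "keep"):         {"left_lane": 0.9, "right_lane": 0.1},
    ("left_lane", "change_lane"):  {"left_lane": 0.2, "right_lane": 0.8},
    ("right_lane", "keep"):        {"left_lane": 0.1, "right_lane": 0.9},
    ("right_lane", "change_lane"): {"left_lane": 0.8, "right_lane": 0.2},
}

def step(x, a):
    """Sample the successor state xk with probability T(xk | x, a)."""
    dist = T[(x, a)]
    return random.choices(list(dist), weights=list(dist.values()))[0]
```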

9
Model of Interface
  • R – reward function
  • R(xt, at) = rt
  • O – set of observations
  • O – observation function
  • O(xt, at) = ot+1
  • S – set of internal states
  • S – internal state transition function
  • S(st-1, at-1, ot) = st
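
The interface functions can be sketched the same way; the reward and observation bodies below are placeholders, since the slides do not specify them:

```python
def R(x, a):
    """Reward function R(xt, at) = rt (illustrative values only)."""
    return -1.0 if a == "change_lane" else 0.0

def O(x, a):
    """Observation function O(xt, at) = ot+1: the agent receives a
    partial percept, not the world state x itself."""
    return ("lane_percept", x)   # possibly aliased view of x

def S(s_prev, a_prev, o):
    """Internal state transition S(st-1, at-1, ot) = st: here the
    new internal state folds in the latest action and observation."""
    return (s_prev, a_prev, o)
```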

10
(No Transcript)
11
Utility of a state
  • The utility of a state is a measure of the
    usefulness of the state in the optimal solution
  • The utility of a state can be measured as
  • U(s) = max_{a ∈ A} Q(s,a)
  • where Q(s,a) is the parameter learnt by
    Q-learning (see the sketch below)
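
A minimal tabular Q-learning sketch of this definition; the learning rate and discount are arbitrary assumed values:

```python
from collections import defaultdict

Q = defaultdict(float)      # Q[(s, a)], initialized to 0
alpha, gamma = 0.1, 0.9     # learning rate and discount (assumed)

def utility(s, actions):
    """U(s) = max over a in A of Q(s, a)."""
    return max(Q[(s, a)] for a in actions)

def q_update(s, a, r, s_next, actions):
    """One-step Q-learning: move Q(s,a) toward r + gamma * U(s')."""
    Q[(s, a)] += alpha * (r + gamma * utility(s_next, actions) - Q[(s, a)])
```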

12
Partially Observable Markov Decision Process
(PO-MDP)
  • The agent cannot determine the exact state of the
    world after an action, which is the real-world
    situation
  • PO-MDP models this situation as a vector of state
    occupation probabilities
  • (p1, p2, p3, ..., pN)
  • Σi pi = 1
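
The occupation vector is updated by Bayes' rule after each action and observation; this sketch assumes the transition model T from the environment slide and a hypothetical observation model Pobs[x][o] = P(o | x):

```python
def belief_update(b, a, o, X, T, Pobs):
    """Update the state-occupation vector (p1, ..., pN).
    b maps each state to its probability and sums to 1."""
    b_next = {}
    for xk in X:
        # Predict with T, then weight by the likelihood of observing o.
        b_next[xk] = Pobs[xk][o] * sum(b[xi] * T[(xi, a)][xk] for xi in X)
    z = sum(b_next.values())                 # normalizer
    return {x: p / z for x, p in b_next.items()}
```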

13
Algorithms based on PO-MDP
  • UDM (Utile Distinction Memory)
  • NSM (Nearest Sequence Memory)
  • USM (Utile Suffix Memory)
  • U-Tree

14
(No Transcript)
15
Utile Distinction Memory
  • Splits a node based on utility
  • Grows the state space from an initial set of
    states
  • Helps reduce the size of the state space
  • Able to discover hidden states
  • Splits are based on a statistical test (see the
    sketch below)
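
The slides do not name the statistical test; a two-sample Kolmogorov-Smirnov test on future-return samples is one plausible choice for deciding whether a distinction is utile:

```python
from scipy.stats import ks_2samp

def should_split(returns_a, returns_b, significance=0.05):
    """Split a state node when the return samples gathered under two
    candidate distinctions differ significantly in distribution."""
    stat, p_value = ks_2samp(returns_a, returns_b)
    return p_value < significance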

16
(No Transcript)
17
Nearest Sequence Memory
  • Concept similar to k-Nearest Neighbour
  • Instance-based learning over a variable-length
    time window
  • Keeps a record of raw experiences
  • Selects the best action by voting, as sketched
    below
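
A rough sketch of the voting step, assuming experiences is the raw chain of (observation, action, reward) tuples; the matching rule and the value of k are illustrative:

```python
experiences = []   # raw (observation, action, reward) chain

def match_length(t, history):
    """Length of the suffix shared by the current history and the
    experience chain ending at index t (the nearness measure)."""
    n = 0
    while n <= t and n < len(history) and experiences[t - n] == history[-1 - n]:
        n += 1
    return n

def vote_action(history, actions, k=8):
    """Let the k nearest past moments vote for the action that
    followed them."""
    nearest = sorted(range(len(experiences) - 1),
                     key=lambda t: match_length(t, history),
                     reverse=True)[:k]
    votes = {a: 0 for a in actions}
    for t in nearest:
        _, a_next, _ = experiences[t + 1]   # action taken just after t
        votes[a_next] += 1
    return max(votes, key=votes.get)
```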

18
(No Transcript)
19
Utile Suffix Memory
  • Combines the above two algorithms
  • Keeps the history in a data structure called a
    Prediction Suffix Tree [Ron94]
  • The leaf nodes of this tree are the states of the
    PO-MDP
  • The percept is considered indivisible (see the
    sketch below)
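
A minimal node structure for such a suffix tree, assuming histories are sequences of indivisible percepts; the field names are illustrative:

```python
class SuffixNode:
    """Prediction Suffix Tree node: internal nodes branch on
    successively older history elements; leaves play the role of
    PO-MDP states and hold Q-values."""
    def __init__(self):
        self.children = {}    # history element -> child node
        self.instances = []   # raw experiences sorted into this node
        self.q = {}           # Q-values, used when this node is a leaf

    def leaf_for(self, history):
        """Walk backwards through the history to the matching leaf."""
        node, i = self, len(history) - 1
        while node.children and i >= 0:
            child = node.children.get(history[i])
            if child is None:
                break
            node, i = child, i - 1
        return node
```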

20
(No Transcript)
21
U-Tree
  • The most successful algorithm for attacking this
    problem
  • Implements selective perception
  • Each node stores one dimension of the perception
    space, as sketched below
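
The difference from USM can be sketched in the node structure: each internal node tests one dimension of the percept at some history offset, so only utile features are ever distinguished. The field names here are illustrative:

```python
class UTreeNode:
    """U-Tree node: branches on a single perceptual dimension rather
    than on the whole (indivisible) percept."""
    def __init__(self, dimension=None, history_offset=0):
        self.dimension = dimension            # which sensor/feature to test
        self.history_offset = history_offset  # how far back in the history
        self.children = {}                    # feature value -> child node
        self.q = {}                           # Q-values at leaves

    def leaf_for(self, history):
        """Sort a history into a leaf, testing one dimension per node."""
        node = self
        while node.children:
            percept = history[-1 - node.history_offset]
            child = node.children.get(percept[node.dimension])
            if child is None:
                break
            node = child
        return node
```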

22
Suffix tree constructed by the algorithm
23
Vehicle Navigation Environment
  • Driving on a highway with vehicles classified by
    speed
  • Slow vehicles
  • Fast vehicles
  • Other vehicles don't change their lanes
  • Visibility is limited
  • And many other parameters...

24
Vehicle Navigation Environment
  • The agent has a limited number of sensors
  • Each sensor contributes a dimension to the
    agent's perceptual space
  • The agent also has a limited number of discrete
    actions
  • New vehicles enter the simulation world
    probabilistically (see the sketch below)
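
One possible encoding of this interface; the sensor dimensions, action names, and entry probability below are illustrative assumptions, not values from the slides:

```python
import random
from dataclasses import dataclass

# Hypothetical discrete action set for the agent.
ACTIONS = ["accelerate", "decelerate", "shift_left", "shift_right"]

@dataclass
class Percept:
    """Each sensor contributes one dimension of the perceptual space."""
    front_gap: int     # discretized distance to the vehicle ahead
    front_speed: str   # "slow" or "fast", the slide's classification
    lane: int          # agent's current lane

def maybe_spawn_vehicle(vehicles, p_enter=0.1):
    """New vehicles enter the simulation world probabilistically."""
    if random.random() < p_enter:
        vehicles.append({"speed": random.choice(["slow", "fast"]),
                         "lane": random.randint(0, 2)})
```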

25
Our Approach
  • We will use the U-Tree algorithm, as it can
    handle a multidimensional perceptual space
    efficiently
  • After getting good results in this environment,
    we will try to improve performance in a less
    restricted environment

26
Sample Simulation Display
27
Questions?
28
Thank You