Title: Adaptive Intelligent Mobile Robotics
1 - Adaptive Intelligent Mobile Robotics
- William D. Smart, Presenter
- Leslie Pack Kaelbling, PI
- Artificial Intelligence Laboratory
- MIT
2 - Progress to Date
- Fast bootstrapped reinforcement learning
- algorithmic techniques
- demo on robot
- Optical-flow based navigation
- flow algorithm implemented
- pilot navigation experiments on robot
- pilot navigation experiments in simulation testbed
3 - Making RL Really Work
- Typical RL methods require far too much data to be practical in an online setting. Address the problem by:
- strong generalization techniques
- using human input to bootstrap
- Let humans do what they're good at
- Let learning algorithms do what they're good at
4 - JAQL
- Learning a value function in a continuous state and action space
- based on locally weighted regression (fancy version of nearest neighbor)
- algorithm knows what it knows
- use meta-knowledge to be conservative about dynamic-programming updates
5 - Problems with Q-Learning on Robots
- Huge state spaces/sparse data
- Continuous states and actions
- Slow to propagate values
- Safety during exploration
- Lack of initial knowledge
6 - Value Function Approximation
- Use a function approximator instead of a table
- generalization
- deals with continuous spaces and actions
- Q-learning with VFA has been shown to diverge, even in benign cases
- Which function approximator should we use to minimize problems?
(Diagram: function approximator F takes the state s and action a as inputs and outputs Q(s,a))
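To make the diagram concrete, here is a minimal sketch of a Q-learning backup routed through a function approximator F instead of a table. The approximator's predict/add interface, and the candidate-action set used for the max, are assumptions for illustration rather than the exact JAQL API.

import numpy as np

GAMMA = 0.99  # illustrative discount factor

def q_backup(approx, s, a, r, s_next, candidate_actions):
    # One Q-learning backup through the approximator:
    #   Q(s, a) <- r + gamma * max_a' Q(s', a')
    next_q = max(approx.predict(np.concatenate([s_next, a2]))
                 for a2 in candidate_actions)
    target = r + GAMMA * next_q
    # Store the new (state, action) -> target pair; the approximator
    # generalizes from it to nearby states and actions.
    approx.add(np.concatenate([s, a]), target)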
7 - Locally Weighted Regression
- Store all previous data points
- Given a query point, find k nearest points
- Fit a locally linear model to these points, giving closer ones more weight
- Use KD-trees to make lookups more efficient
- Fast learning from a single data point
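A minimal sketch of the procedure above, assuming a Gaussian distance weighting and an affine local model; the bandwidth, neighbor count, and the rebuild-the-tree-on-every-add shortcut are illustrative choices, not the robot implementation.

import numpy as np
from scipy.spatial import cKDTree

class LWR:
    def __init__(self, k=20, bandwidth=0.1):
        self.k, self.h = k, bandwidth
        self.X, self.Y = [], []

    def add(self, x, y):
        # Store every previous data point.
        self.X.append(np.asarray(x, dtype=float))
        self.Y.append(float(y))
        self.tree = cKDTree(np.vstack(self.X))  # rebuilt each time, for simplicity

    def predict(self, q):
        q = np.asarray(q, dtype=float)
        k = min(self.k, len(self.X))
        # Find the k nearest stored points (KD-tree lookup).
        dist, idx = self.tree.query(q, k=k)
        dist, idx = np.atleast_1d(dist), np.atleast_1d(idx)
        Xn = np.vstack([self.X[i] for i in idx])
        Yn = np.array([self.Y[i] for i in idx])
        # Closer points get more weight (Gaussian kernel with bandwidth h).
        w = np.exp(-(dist / self.h) ** 2)
        A = np.hstack([Xn, np.ones((len(Xn), 1))])  # affine (locally linear) model
        W = np.diag(w)
        beta, *_ = np.linalg.lstsq(W @ A, W @ Yn, rcond=None)
        return float(np.append(q, 1.0) @ beta)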
8 - Locally Weighted Regression
9 - Locally Weighted Regression
- Bandwidth 0.1, 500 training points
10 - Problems with Approximate Q-Learning
- Errors are amplified by backups
11 - One Source of Errors
12 - Independent Variable Hull
- Interpolation is safe; extrapolation is not, so:
- construct a hull around the known points
- do local regression if the query point is within the hull
- give a default prediction if not
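One way to make the "interpolation only" test concrete is the elliptic hull defined by the leverage of the local points: a query whose leverage exceeds that of every known point would require extrapolation. This is a sketch of that idea, not necessarily the exact construction used in JAQL.

import numpy as np

def inside_ivh(query, X):
    # True if `query` lies within the elliptic hull spanned by the rows of X.
    X = np.asarray(X, dtype=float)
    cov_inv = np.linalg.pinv(X.T @ X)
    leverage = lambda x: float(x @ cov_inv @ x)
    hull_bound = max(leverage(x) for x in X)  # largest leverage among known points
    return leverage(np.asarray(query, dtype=float)) <= hull_bound

def predict_or_default(lwr, query, neighbours, default=0.0):
    # Interpolate inside the hull; otherwise answer with a conservative default.
    return lwr.predict(query) if inside_ivh(query, neighbours) else default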
13 - Recap
- Use LWR to represent the value function
- generalization
- continuous spaces
- Use IVH and "don't know" predictions
- conservative predictions
- safer backups
14 - Incorporating Human Input
- Humans can help a lot, even if they can't perform the task very well.
- Provide some initial successful trajectories through the space
- Trajectories are not used for supervised learning, but to guide the reinforcement-learning methods through useful parts of the space
- Learn models of the dynamics of the world and of the reward structure
- Once the learned models are good, use them to update the value function and policy as well.
15 - Give Some Trajectories
- Supply an example policy
- Need not be optimal and might be very wrong
- Code or human-controlled
- Used to generate experience
- Follow example policy and record experiences
- Shows learner interesting parts of the space
- Bad initial policies might be better
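A sketch of how the example policy is used: it only generates experience, which is recorded rather than imitated. Here env and example_policy are hypothetical stand-ins for the robot interface and the supplied controller (code or a human driver).

def collect_trajectory(env, example_policy, max_steps=500):
    transitions = []
    s = env.reset()
    for _ in range(max_steps):
        a = example_policy(s)                  # the example policy chooses the action
        s_next, r, done = env.step(a)          # the robot executes it
        transitions.append((s, a, r, s_next))  # experience is recorded, not imitated
        s = s_next
        if done:
            break
    return transitions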
16 - Two Learning Phases
- Phase One (Diagram: learning system in the action A / reward R / observation O loop, with the supplied example policy in control)
17 - Two Learning Phases
- Phase Two (Diagram: learning system in the action A / reward R / observation O loop, with the learner itself in control)
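Putting the two phases together in one loop, reusing collect_trajectory from the sketch above; the learner's backup and greedy_policy names are illustrative, not the paper's API.

def run_two_phases(env, example_policy, learner, n_phase1=10, n_phase2=50):
    # Phase one: the supplied example policy is in control; the learner only
    # performs value-function backups on the recorded experience.
    for _ in range(n_phase1):
        for (s, a, r, s_next) in collect_trajectory(env, example_policy):
            learner.backup(s, a, r, s_next)
    # Phase two: the learner's own greedy policy takes over, and learning continues.
    for _ in range(n_phase2):
        for (s, a, r, s_next) in collect_trajectory(env, learner.greedy_policy):
            learner.backup(s, a, r, s_next)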
18 - What Does This Give Us?
- Natural way to insert human knowledge
- Keeps robot safe in early stages of learning
- Bootstraps information into the Q-function
19 - Experimental Results: Corridor-Following
20 - Corridor-Following
- 3 continuous state dimensions
- corridor angle
- offset from middle
- distance to end of corridor
- 1 continuous action dimension
- rotation velocity
- Supplied example policy
- Average 110 steps to goal
21 - Corridor-Following
- Experimental setup
- Initial training runs start from roughly the middle of the corridor
- Translation speed has a fixed policy
- Evaluation on a number of set starting points
- Reward
- 10 at end of corridor
- 0 everywhere else
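For concreteness, a sketch of the state vector and reward described above; the robot attributes and the goal threshold are assumptions made for illustration.

import numpy as np

def corridor_state(robot):
    # 3 continuous state dimensions
    return np.array([robot.corridor_angle,
                     robot.offset_from_middle,
                     robot.distance_to_end])

def corridor_reward(state, goal_threshold=0.2):
    # 10 at the end of the corridor, 0 everywhere else
    _, _, distance_to_end = state
    return 10.0 if distance_to_end < goal_threshold else 0.0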
22 - Corridor-Following
- (Plot: learning curves for Phase 1 and Phase 2, compared with the average training performance and the best possible performance)
23 - Corridor Following: Initial Policy
24 - Corridor Following: After Phase 1
25 - Corridor Following: After Phase 1
26 - Corridor Following: After Phase 2
27 - Conclusions
- VFA can be made more stable
- Locally weighted regression
- Independent variable hull
- Conservative backups
- Bootstrapping value function really helps
- Initial supplied trajectories
- Two learning phases
28 - Optical Flow
- Get range information visually by computing the optical flow field
- nearer objects cause flow of higher magnitude
- an expansion pattern means you're going to hit something
- rate of expansion tells you when
- elegant control laws based on the center and rate of expansion (derived from human and fly behavior)
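A rough sketch of the "rate of expansion tells you when" idea: estimate dense flow, use its divergence as the rate of expansion, and convert that to a time-to-contact. OpenCV's Farneback flow stands in for the implemented flow algorithm, and the divergence-based estimate is a crude approximation, not the control law used on the robot.

import cv2
import numpy as np

def time_to_contact(prev_gray, curr_gray, dt=1.0 / 30.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    # Divergence of the flow field approximates the rate of expansion.
    divergence = np.gradient(u, axis=1) + np.gradient(v, axis=0)
    rate = float(np.mean(divergence))   # pixels of expansion per frame
    if rate <= 0:
        return np.inf                   # no expansion: not approaching anything
    return 2.0 * dt / rate              # crude time-to-contact estimate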
29 - Approaching a Wall
30 - Balance Strategy
- Simple obstacle-avoidance strategy
- compute flow field
- compute the average magnitude of flow in each hemi-field
- turn away from the side with higher magnitude (because it has closer objects)
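A sketch of the balance strategy, again with Farneback flow standing in for the implemented flow algorithm; the turn gain and the sign convention are illustrative.

import cv2
import numpy as np

def balance_turn_rate(prev_gray, curr_gray, gain=1.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    mid = magnitude.shape[1] // 2
    left = magnitude[:, :mid].mean()
    right = magnitude[:, mid:].mean()
    # The side with higher average flow magnitude has closer objects,
    # so turn away from it (positive output = turn toward the right here).
    return gain * (left - right)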
31 - Balance Strategy in Action
32 - Crystal Space
33 - Crystal Space
34 - Crystal Space
35 - Next Steps
- Extend the RL architecture to include model-learning and planning
- Apply RL techniques to tune parameters in the optical-flow computation
- Build topological maps using visual information
- Build a highly complex simulated environment
- Integrate planning and learning in a multi-layer system