Title: Adaptive Intelligent Mobile Robotics
1 - Adaptive Intelligent Mobile Robotics
- William D. Smart, Presenter
- Leslie Pack Kaelbling, PI
- Artificial Intelligence Laboratory
- MIT
2 - Progress to Date
- Fast bootstrapped reinforcement learning
- algorithmic techniques
- demo on robot
- Optical-flow based navigation
- flow algorithm implemented
- pilot navigation experiments on robot
- pilot navigation experiments in simulation testbed
3 - Making RL Really Work
- Typical RL methods require far too much data to be practical in an online setting. Address the problem by:
- strong generalization techniques
- using human input to bootstrap
- Let humans do what they're good at
- Let learning algorithms do what they're good at
4 - JAQL
- Learning a value function in a continuous state and action space
- based on locally weighted regression (fancy version of nearest neighbor)
- algorithm knows what it knows
- use meta-knowledge to be conservative about dynamic-programming updates
5 - Problems with Q-Learning on Robots
- Huge state spaces/sparse data
- Continuous states and actions
- Slow to propagate values
- Safety during exploration
- Lack of initial knowledge
6 - Value Function Approximation
- Use a function approximator instead of a table
- generalization
- deals with continuous spaces and actions
- Q-learning with VFA has been shown to diverge, even in benign cases
- Which function approximator should we use to minimize problems?
(Diagram: function approximator F takes the state s and action a as inputs and outputs Q(s,a))
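To make the diagram concrete, here is a minimal sketch of a Q-learning backup routed through a function approximator F instead of a table. The approximator's predict/add interface, and the candidate-action set used for the max, are assumptions for illustration rather than the exact JAQL API.

import numpy as np

GAMMA = 0.99  # illustrative discount factor

def q_backup(approx, s, a, r, s_next, candidate_actions):
    # One Q-learning backup through the approximator:
    #   Q(s, a) <- r + gamma * max_a' Q(s', a')
    next_q = max(approx.predict(np.concatenate([s_next, a2]))
                 for a2 in candidate_actions)
    target = r + GAMMA * next_q
    # Store the new (state, action) -> target pair; the approximator
    # generalizes from it to nearby states and actions.
    approx.add(np.concatenate([s, a]), target)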
7 - Locally Weighted Regression
- Store all previous data points
- Given a query point, find k nearest points
- Fit a locally linear model to these points, giving closer ones more weight
- Use KD-trees to make lookups more efficient
- Fast learning from a single data point
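A minimal sketch of the procedure above, assuming a Gaussian distance weighting and an affine local model; the bandwidth, neighbor count, and the rebuild-the-tree-on-every-add shortcut are illustrative choices, not the robot implementation.

import numpy as np
from scipy.spatial import cKDTree

class LWR:
    def __init__(self, k=20, bandwidth=0.1):
        self.k, self.h = k, bandwidth
        self.X, self.Y = [], []

    def add(self, x, y):
        # Store every previous data point.
        self.X.append(np.asarray(x, dtype=float))
        self.Y.append(float(y))
        self.tree = cKDTree(np.vstack(self.X))  # rebuilt each time, for simplicity

    def predict(self, q):
        q = np.asarray(q, dtype=float)
        k = min(self.k, len(self.X))
        # Find the k nearest stored points (KD-tree lookup).
        dist, idx = self.tree.query(q, k=k)
        dist, idx = np.atleast_1d(dist), np.atleast_1d(idx)
        Xn = np.vstack([self.X[i] for i in idx])
        Yn = np.array([self.Y[i] for i in idx])
        # Closer points get more weight (Gaussian kernel with bandwidth h).
        w = np.exp(-(dist / self.h) ** 2)
        A = np.hstack([Xn, np.ones((len(Xn), 1))])  # affine (locally linear) model
        W = np.diag(w)
        beta, *_ = np.linalg.lstsq(W @ A, W @ Yn, rcond=None)
        return float(np.append(q, 1.0) @ beta)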
8 - Locally Weighted Regression
9 - Locally Weighted Regression
- Bandwidth 0.1, 500 training points
10 - Problems with Approximate Q-Learning
- Errors are amplified by backups
11 - One Source of Errors
12 - Independent Variable Hull
- Interpolation is safe; extrapolation is not, so:
- construct a hull around the known points
- do local regression if the query point is within the hull
- give a default prediction if not
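One way to make the "interpolation only" test concrete is the elliptic hull defined by the leverage of the local points: a query whose leverage exceeds that of every known point would require extrapolation. This is a sketch of that idea, not necessarily the exact construction used in JAQL.

import numpy as np

def inside_ivh(query, X):
    # True if `query` lies within the elliptic hull spanned by the rows of X.
    X = np.asarray(X, dtype=float)
    cov_inv = np.linalg.pinv(X.T @ X)
    leverage = lambda x: float(x @ cov_inv @ x)
    hull_bound = max(leverage(x) for x in X)  # largest leverage among known points
    return leverage(np.asarray(query, dtype=float)) <= hull_bound

def predict_or_default(lwr, query, neighbours, default=0.0):
    # Interpolate inside the hull; otherwise answer with a conservative default.
    return lwr.predict(query) if inside_ivh(query, neighbours) else default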
13 - Recap
- Use LWR to represent the value function
- generalization
- continuous spaces
- Use IVH and "don't know" predictions
- conservative predictions
- safer backups
14 - Incorporating Human Input
- Humans can help a lot, even if they can't perform the task very well.
- Provide some initial successful trajectories through the space
- Trajectories are not used for supervised learning, but to guide the reinforcement-learning methods through useful parts of the space
- Learn models of the dynamics of the world and of the reward structure
- Once the learned models are good, use them to update the value function and policy as well.
15 - Give Some Trajectories
- Supply an example policy
- Need not be optimal and might be very wrong
- Code or human-controlled
- Used to generate experience
- Follow example policy and record experiences
- Shows learner interesting parts of the space
- Bad initial policies might be better
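A sketch of how the example policy is used: it only generates experience, which is recorded rather than imitated. Here env and example_policy are hypothetical stand-ins for the robot interface and the supplied controller (code or a human driver).

def collect_trajectory(env, example_policy, max_steps=500):
    transitions = []
    s = env.reset()
    for _ in range(max_steps):
        a = example_policy(s)                  # the example policy chooses the action
        s_next, r, done = env.step(a)          # the robot executes it
        transitions.append((s, a, r, s_next))  # experience is recorded, not imitated
        s = s_next
        if done:
            break
    return transitions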
16 - Two Learning Phases
- Phase One (Diagram: learning system in the action A / reward R / observation O loop, with the supplied example policy in control)
17 - Two Learning Phases
- Phase Two (Diagram: learning system in the action A / reward R / observation O loop, with the learner itself in control)
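Putting the two phases together in one loop, reusing collect_trajectory from the sketch above; the learner's backup and greedy_policy names are illustrative, not the paper's API.

def run_two_phases(env, example_policy, learner, n_phase1=10, n_phase2=50):
    # Phase one: the supplied example policy is in control; the learner only
    # performs value-function backups on the recorded experience.
    for _ in range(n_phase1):
        for (s, a, r, s_next) in collect_trajectory(env, example_policy):
            learner.backup(s, a, r, s_next)
    # Phase two: the learner's own greedy policy takes over, and learning continues.
    for _ in range(n_phase2):
        for (s, a, r, s_next) in collect_trajectory(env, learner.greedy_policy):
            learner.backup(s, a, r, s_next)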
18 - What Does This Give Us?
- Natural way to insert human knowledge
- Keeps robot safe in early stages of learning
- Bootstraps information into the Q-function
19 - Experimental Results: Corridor-Following
20 - Corridor-Following
- 3 continuous state dimensions
- corridor angle
- offset from middle
- distance to end of corridor
- 1 continuous action dimension
- rotation velocity
- Supplied example policy
- Average 110 steps to goal
21 - Corridor-Following
- Experimental setup
- Initial training runs start from roughly the middle of the corridor
- Translation speed has a fixed policy
- Evaluation on a number of set starting points
- Reward
- 10 at end of corridor
- 0 everywhere else
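For concreteness, a sketch of the state vector and reward described above; the robot attributes and the goal threshold are assumptions made for illustration.

import numpy as np

def corridor_state(robot):
    # 3 continuous state dimensions
    return np.array([robot.corridor_angle,
                     robot.offset_from_middle,
                     robot.distance_to_end])

def corridor_reward(state, goal_threshold=0.2):
    # 10 at the end of the corridor, 0 everywhere else
    _, _, distance_to_end = state
    return 10.0 if distance_to_end < goal_threshold else 0.0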
22 - Corridor-Following
- (Plot: learning curves for Phase 1 and Phase 2, compared with the average training performance and the best possible performance)
23 - Corridor Following: Initial Policy
24 - Corridor Following: After Phase 1
25 - Corridor Following: After Phase 1
26 - Corridor Following: After Phase 2
27 - Conclusions
- VFA can be made more stable
- Locally weighted regression
- Independent variable hull
- Conservative backups
- Bootstrapping value function really helps
- Initial supplied trajectories
- Two learning phases
28 - Optical Flow
- Get range information visually by computing the optical flow field
- nearer objects cause flow of higher magnitude
- an expansion pattern means you're going to hit something
- rate of expansion tells you when
- elegant control laws based on the center and rate of expansion (derived from human and fly behavior)
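A rough sketch of the "rate of expansion tells you when" idea: estimate dense flow, use its divergence as the rate of expansion, and convert that to a time-to-contact. OpenCV's Farneback flow stands in for the implemented flow algorithm, and the divergence-based estimate is a crude approximation, not the control law used on the robot.

import cv2
import numpy as np

def time_to_contact(prev_gray, curr_gray, dt=1.0 / 30.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    # Divergence of the flow field approximates the rate of expansion.
    divergence = np.gradient(u, axis=1) + np.gradient(v, axis=0)
    rate = float(np.mean(divergence))   # pixels of expansion per frame
    if rate <= 0:
        return np.inf                   # no expansion: not approaching anything
    return 2.0 * dt / rate              # crude time-to-contact estimate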
29 - Approaching a Wall
30 - Balance Strategy
- Simple obstacle-avoidance strategy
- compute flow field
- compute the average magnitude of flow in each hemi-field
- turn away from the side with higher magnitude (because it has closer objects)
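A sketch of the balance strategy, again with Farneback flow standing in for the implemented flow algorithm; the turn gain and the sign convention are illustrative.

import cv2
import numpy as np

def balance_turn_rate(prev_gray, curr_gray, gain=1.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    mid = magnitude.shape[1] // 2
    left = magnitude[:, :mid].mean()
    right = magnitude[:, mid:].mean()
    # The side with higher average flow magnitude has closer objects,
    # so turn away from it (positive output = turn toward the right here).
    return gain * (left - right)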
31 - Balance Strategy in Action
32 - Crystal Space
33 - Crystal Space
34 - Crystal Space
35 - Next Steps
- Extend the RL architecture to include model-learning and planning
- Apply RL techniques to tune parameters in the optical-flow computation
- Build topological maps using visual information
- Build a highly complex simulated environment
- Integrate planning and learning in a multi-layer system