Programming Robots using High-Level Task Descriptions


1
Programming Robots using High-Level Task Descriptions

AAAI 2004 Workshop on Supervisory Control of Learning and Adaptive Systems

Andrew J. Martignoni III
Advisor: William D. Smart
http://www.cse.wustl.edu/ajm7/controller/
Supported by The Boeing Foundation
2
The Problem

Limited natural language task description
'Follow the red ball quickly while avoiding objects.'
Generate a controller to perform the task
Execute the controller
Respond to feedback
'Turn more aggressively', 'Drive faster'
Improve performance by learning
Use the generated controller to provide examples

3
The Target Audience

People who
Know how to perform a specific task themselves
Want a robot to perform the task
Don't know about robots (in detail)
Don't know how to program

4
Carefully follow the blue ball
[Control diagram. Block labels: blue, kP, Speed, Dist, X, Y, T, R, Distance, Found]
5
Carefully follow the blue ball
6
Current Issues

Responding to Feedback
Meaning of More/Less
Reinforcement Learning
Human as reward function
NOT the same as a pre-programmed reward function: rewards come early or late!

7
Changing the Rules

Transforms one MDP into another MDP
Rewards become spread out

8
Early or Late Rewards

Will it still converge to the optimal policy?
Can we bound the changes?
Will it converge more slowly?
Can we guarantee it converges?

9
Toy Example

10 States, 2 Actions (Left and Right)
Distribution of reward times: -2, -1, 0, 1, 2
Converges to optimal policy ? Pr(-1) Pr(0) Pr(1) > 0
Does this system satisfy these conditions?
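A minimal sketch for experimenting with the toy example, not the authors' implementation. The slide gives only the state count, the two actions, and the set of reward-time offsets; the dynamics (a clamped chain), the reward (1 whenever the next state is the rightmost one), the random behavior policy, and the learning rates below are all assumptions chosen for illustration.

```python
import random

N_STATES = 10
LEFT, RIGHT = 0, 1
OFFSETS = (-2, -1, 0, 1, 2)  # reward-time shifts, in time steps


def step(state, action):
    """Assumed dynamics: move one cell, clamped at both ends.
    Assumed reward: 1 whenever the next state is the rightmost one."""
    move = 1 if action == RIGHT else -1
    nxt = min(max(state + move, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)


def run_episode(rng, length=30):
    """Random-walk episode; returns (state, action, reward, next_state) tuples."""
    state = rng.randrange(N_STATES)
    transitions = []
    for _ in range(length):
        action = rng.choice((LEFT, RIGHT))
        nxt, reward = step(state, action)
        transitions.append((state, action, reward, nxt))
        state = nxt
    return transitions


def shift_rewards(transitions, probs, rng):
    """Move each nonzero reward early or late by an offset drawn from probs.
    Offsets that fall outside the episode are clamped to its ends."""
    shifted = [0.0] * len(transitions)
    last = len(transitions) - 1
    for t, (_, _, reward, _) in enumerate(transitions):
        if reward == 0.0:
            continue
        offset = rng.choices(OFFSETS, weights=probs)[0]
        shifted[min(max(t + offset, 0), last)] += reward
    return shifted


def q_learning(probs, episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on episodes whose rewards were shifted in time."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        transitions = run_episode(rng)
        rewards = shift_rewards(transitions, probs, rng)
        for (s, a, _, s2), r in zip(transitions, rewards):
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
    return q


def greedy_policy(q):
    """Best action per state under the learned Q-table."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Calling `greedy_policy(q_learning((0, 0, 1, 0, 0)))` gives the unshifted baseline; passing other weights for the five offsets lets you compare the resulting policy against it.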

10
Collected Data

Let real people give rewards
Measure timings
Fit distribution
What can we infer?
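One way the measure-and-fit steps could look, as a hedged sketch. The function names, the pairing of each reward with a single triggering event, and the conversion from seconds to time steps are assumptions; the slides do not say how timings were recorded or which distribution family was fitted, so this simply builds an empirical distribution.

```python
from collections import Counter


def reward_offsets(event_times, reward_times, step_seconds):
    """Offset of each human reward from its event, in whole time steps.
    Negative means the reward came early, positive means it came late."""
    return [round((r - e) / step_seconds)
            for e, r in zip(event_times, reward_times)]


def fit_empirical_distribution(offsets):
    """Relative frequency of each observed offset."""
    counts = Counter(offsets)
    total = len(offsets)
    return {k: counts[k] / total for k in sorted(counts)}
```

With made-up timings such as events at 1.0, 2.0, 3.0, 4.0 s and rewards at 1.5, 2.0, 2.5, 4.5 s on a 0.5 s step, the offsets are 1, 0, -1, 1.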

11
Conclusion

Questions
What properties are required?
Symmetry
Reversible actions
Grouped rewards
Directions
Looking at a variety of MDPs
Determine common elements

12
The Robot

RWI, Inc. B21r robot
2 rings of 24 sonar
24 infrared sensors
56 bump sensors
Laser range finder
Pan-tilt unit w/ camera

13
Dictionary Word List
14
Tuning Performance
kP: 1.5 -> 3.0

Tuning keywords
Each component contains a set of words
Categorized by motor output
Changes parameter to modify behavior

P
f(E) = kP * E
f(E) = 1.5 * E
f(E) = 3.0 * E
Turn: More Aggressively / Less Aggressively
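The tuning step can be sketched as a proportional controller whose gain is changed by a keyword. The class, the keyword table, and the halving for 'less aggressively' are assumptions for illustration; only the doubling of the gain from 1.5 to 3.0 for 'more aggressively' comes from the slides.

```python
class PController:
    """Proportional controller: f(E) = kP * E."""

    def __init__(self, kp):
        self.kp = kp

    def output(self, error):
        return self.kp * error


# Hypothetical keyword table: feedback phrase -> factor applied to the gain.
TUNING_KEYWORDS = {
    "more aggressively": 2.0,  # doubling, as on the slides
    "less aggressively": 0.5,  # assumed inverse of the above
}


def apply_feedback(controller, phrase):
    """Adjust the controller's gain according to a tuning phrase."""
    key = phrase.lower().strip()
    if key not in TUNING_KEYWORDS:
        raise ValueError(f"unknown tuning phrase: {phrase!r}")
    controller.kp *= TUNING_KEYWORDS[key]
```

For example, a turn controller built with `PController(1.5)` has gain 3.0 after `apply_feedback(turn, "more aggressively")`.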
15
Why not say what you mean?

'Turn more aggressively'
Double the gain on the P-controller connected to the rotational motor output

f(E) = 1.5 * E
f(E) = 3.0 * E
16
Space of All Possible Controllers
[Diagram of the controller space. Labels: Carefully follow the blue ball, X, Unsafe, Unsafe]
Controllers are all of this form
17
Adaptation

Small-scale adaptation
Adjust parameters based on feedback
Moves from one point in the controller space to another
Possibly closer to the optimal controller

P
18
Reinforcement Learning (RL)

RL Agent
Views state of the world
Receives rewards
Takes action
Learning
Tries to maximize reward
Produces a Value Function
Needs to see every state many times
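As a rough illustration of the loop above (view state, take action, receive reward, update a value function), here is a small tabular Q-learning agent. The slides do not name a specific algorithm, so Q-learning, the class name, and the default parameters are assumptions.

```python
from collections import defaultdict


class QAgent:
    """Tabular Q-learning agent; the Q-table is its value function."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.q = defaultdict(float)  # (state, action) -> estimated return

    def value(self, state):
        """Value of a state: the best action value available in it."""
        return max(self.q[(state, a)] for a in self.actions)

    def act(self, state):
        """Greedy action (no exploration in this sketch)."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        """One-step update toward reward + discounted next-state value."""
        target = reward + self.gamma * self.value(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Each call to `learn` adjusts a single table entry, which is why every state has to be seen many times before the values settle.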

19
Why RL?

Searches the entire controller space (in theory)
Can we find the optimal controller?
Need help (examples)

[Diagram of the controller space. Labels: RL, X, Unsafe, Unsafe]
20
Adaptation using RL

Small scale RL
Replace a component with RL (Scott et al. 1992)
Agent could be pre-trained
Continues to adapt
Generate initial reward function

RL
21
Issues with Small Scale RL

Issues
Accelerate RL to work quickly enough
Real robots can't try it 10,000 times
Trainers won't say 'Good Robot' 10,000 times
Choose when to substitute in an agent
Ensure safety when using RL

22
Adaptation using RL Revisited

Large scale RL
Use RL to replace the entire controller (Smart 2002)
Generate example trajectories to learn from using assembled control diagrams
Much faster than without examples (Smart 2002)
Use feedback (rewards) to train the whole system

23
Issues with Large Scale RL

Issues
Generate an automatic reward function
Harder than single component case
Know when to switch control
Monolithic
Every time step (Humphrys 1996)
Adapt to user feedback in realistic time
RL makes very small changes
Simulation: transfer experience to real robot

24
Avoid the blue ball and come here carefully
[Control diagram. Block labels: blue, kP, Speed, Dist, X, Y, T, R, Sel, Distance, Found]
25
Avoid the blue ball and come here carefully
26
Keep the wall left input at distance one point three and keep driving at speed one tenth
[Control diagram. Labels: 0.1 m/s, 1.3 m, T, R]
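The command above spells its numbers out in words ('one point three', 'one tenth'), and the diagram shows them as 1.3 m and 0.1 m/s. A small sketch of how such phrases might be turned into values follows; the word tables and the function are hypothetical and cover only simple digit-by-digit and 'one tenth'-style phrases, not the system's actual dictionary.

```python
DIGITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
FRACTIONS = {"half": 2, "third": 3, "quarter": 4, "fifth": 5, "tenth": 10}


def parse_spoken_number(words):
    """Convert phrases like 'one point three' or 'one tenth' to a float.
    Raises KeyError for words outside the small tables above."""
    tokens = words.lower().split()
    # 'one tenth' style: a digit followed by a fraction word
    if len(tokens) == 2 and tokens[0] in DIGITS and tokens[1] in FRACTIONS:
        return DIGITS[tokens[0]] / FRACTIONS[tokens[1]]
    # 'one point three' style: digits read out around the word 'point'
    if "point" in tokens:
        i = tokens.index("point")
        whole = "".join(str(DIGITS[t]) for t in tokens[:i])
        frac = "".join(str(DIGITS[t]) for t in tokens[i + 1:])
        return float(f"{whole}.{frac}")
    # plain digit sequence, e.g. 'two'
    return float("".join(str(DIGITS[t]) for t in tokens))
```

Under these assumptions, `parse_spoken_number("one point three")` yields the wall distance and `parse_spoken_number("one tenth")` yields the speed.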
27
Keep the wall left input at distance one point three and keep driving at speed one tenth
28
Follow me carefully
[Control diagram. Block labels: P, Speed, Dist, X, Y, T, R, Distance, Found]
29
Follow me carefully
30
Drive to here while avoiding objects
[Control diagram. Block labels: P, Speed, Dist, X, Y, T, R, Distance, Found]
31
Drive to here while avoiding objects
