Title: Programming Robots using High-Level Task Descriptions
1. Programming Robots using High-Level Task Descriptions
- AAAI 2004 Workshop on Supervisory Control of Learning and Adaptive Systems
Andrew J. Martignoni III
Advisor: William D. Smart
http://www.cse.wustl.edu/ajm7/controller/
Supported by The Boeing Foundation
2. The Problem
- Limited natural language task description
- "Follow the red ball quickly while avoiding objects."
- Generate a controller to perform the task
- Execute the controller
- Respond to feedback
- "Turn more aggressively", "Drive faster"
- Improve performance by learning
- Use the generated controller to provide examples
3. The Target Audience
- People who
- Know how to perform a specific task themselves
- Want a robot to perform the task
- Don't know about robots (in detail)
- Don't know how to program
4. Carefully follow the blue ball
[Control diagram. Labels: blue, kP, Speed, Dist, Distance, Found, X, Y, T, R]
5. Carefully follow the blue ball
6. Current Issues
- Responding to Feedback
- Meaning of More/Less
- Reinforcement Learning
- Human as reward function
- NOT the same as a pre-programmed reward function: rewards come early or late!
7. Changing the Rules
- Transforms one MDP into another MDP
- Rewards become spread out
8. Early or Late Rewards
- Will it still converge to the optimal policy?
- Can we bound the changes?
- Will it converge more slowly?
- Can we guarantee it converges?
9. Toy Example
- 10 States, 2 Actions (Left and Right)
- Distribution of reward times: -2, -1, 0, 1, 2
- Converges to optimal policy when Pr(-1) Pr(0) Pr(1) > 0
- Does this system satisfy these conditions?
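The toy setup above can be explored with a small simulation. The following is a minimal sketch, not the experiment from the talk: the chain layout, the reward of 1 for entering the last state, the random behaviour policy, and the values of `alpha`, `gamma`, and `episodes` are all assumptions chosen for illustration. Rewards are first recorded on time and then moved early or late by an offset drawn from the given distribution over -2..2.

```python
import random

N_STATES = 10                 # states 0..9, as on the slide
ACTIONS = (-1, +1)            # index 0 = Left, index 1 = Right
GOAL = N_STATES - 1           # assumption: entering the last state pays reward 1
OFFSETS = (-2, -1, 0, 1, 2)   # reward-time offsets from the slide


def run_episode(rng, max_steps=200):
    """Roll out a uniformly random policy; return transitions and rewards."""
    state = rng.randrange(GOAL)          # random non-goal start state
    transitions, rewards = [], []
    for _ in range(max_steps):
        action = rng.randrange(len(ACTIONS))
        nxt = min(max(state + ACTIONS[action], 0), GOAL)
        transitions.append((state, action, nxt))
        rewards.append(1.0 if nxt == GOAL else 0.0)
        if nxt == GOAL:
            break
        state = nxt
    return transitions, rewards


def shift_rewards(rewards, probs, rng):
    """Move each nonzero reward early or late by a sampled offset."""
    shifted = [0.0] * len(rewards)
    for t, r in enumerate(rewards):
        if r != 0.0:
            offset = rng.choices(OFFSETS, weights=probs)[0]
            t_new = min(max(t + offset, 0), len(rewards) - 1)  # clamp to episode
            shifted[t_new] += r
    return shifted


def q_learn(probs, episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on episodes whose rewards were time-shifted."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        transitions, rewards = run_episode(rng)
        rewards = shift_rewards(rewards, probs, rng)
        for (s, a, nxt), r in zip(transitions, rewards):
            bootstrap = 0.0 if nxt == GOAL else gamma * max(q[nxt])
            q[s][a] += alpha * (r + bootstrap - q[s][a])
    return q


def greedy_policy(q):
    """Greedy action name for every non-goal state."""
    return ["Right" if q[s][1] > q[s][0] else "Left" for s in range(GOAL)]


on_time = greedy_policy(q_learn(probs=(0, 0, 1, 0, 0)))
spread = greedy_policy(q_learn(probs=(0.2, 0.2, 0.2, 0.2, 0.2)))
print("on time:", on_time)
print("spread: ", spread)
```

Comparing the two printed policies for different `probs` gives a quick, informal way to probe the convergence question; it is no substitute for a proof.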
10. Collected Data
- Let real people give rewards
- Measure timings
- Fit distribution
- What can we infer?
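One simple way to carry out the "fit distribution" step is to take relative frequencies of the measured timing offsets. The sketch below assumes offsets are recorded as whole time steps in the range -2..2, matching the toy example; the function name `fit_offset_distribution` is hypothetical and the sample values are made up for illustration, not measured data.

```python
from collections import Counter

OFFSETS = (-2, -1, 0, 1, 2)   # assumed range of reward-time offsets


def fit_offset_distribution(observed_offsets, offsets=OFFSETS):
    """Categorical maximum-likelihood fit: relative frequency of each offset."""
    counts = Counter(observed_offsets)
    total = sum(counts[d] for d in offsets)
    if total == 0:
        raise ValueError("no observations fall inside the offset range")
    return {d: counts[d] / total for d in offsets}


# Made-up timings for illustration only; these are NOT collected data.
example = [0, 1, 1, 0, -1, 2, 0, 1]
dist = fit_offset_distribution(example)
print(dist)
```

A categorical fit makes no assumption about the shape of the distribution; a parametric family could be substituted once real timings suggest one.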
11. Conclusion
- Questions
- What properties are required?
- Symmetry
- Reversible actions
- Grouped rewards
- Directions
- Looking at a variety of MDPs
- Determine common elements
12. The Robot
- RWI, Inc. B21r robot
- 2 rings of 24 sonar sensors
- 24 infrared sensors
- 56 bump sensors
- Laser range finder
- Pan-tilt unit w/ camera
13. Dictionary Word List
14. Tuning Performance
- Tuning keywords
- Each component contains a set of words
- Categorized by motor output
- Changes parameter to modify behavior
- P-controller: f(E) = kP · E
- kP = 1.5 gives f(E) = 1.5E; kP = 3.0 gives f(E) = 3.0E
- Turn: More Aggressively, Less Aggressively
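The keyword-to-parameter idea on this slide can be sketched as follows. This is an illustrative sketch, not the system's actual code: the `PController` class, the `TURN_KEYWORDS` table, and `apply_feedback` are hypothetical names. The factor of 2 follows the slide's 1.5 to 3.0 example; the factor of 0.5 for "less aggressively" is an assumption made here for symmetry.

```python
from dataclasses import dataclass


@dataclass
class PController:
    """Proportional controller: output f(E) = kP * E for error E."""
    k_p: float

    def output(self, error: float) -> float:
        return self.k_p * error


# Hypothetical keyword table for the rotational ("Turn") motor output.
# 2.0 matches the slide's 1.5 -> 3.0 example; 0.5 is an assumed inverse.
TURN_KEYWORDS = {
    "more aggressively": 2.0,
    "less aggressively": 0.5,
}


def apply_feedback(controller: PController, phrase: str) -> PController:
    """Return a new controller with kP scaled by the matching keyword."""
    factor = TURN_KEYWORDS[phrase.lower()]
    return PController(k_p=controller.k_p * factor)


turn = PController(k_p=1.5)
tuned = apply_feedback(turn, "More Aggressively")
print(turn.k_p, "->", tuned.k_p)
```

Returning a new controller instead of mutating the old one keeps the previous setting available, which is convenient if the user later says "less aggressively".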
15. Why not say what you mean?
- "Turn more aggressively"
- "Double the gain on the P-controller connected to the rotational motor output"
- f(E) = 1.5E becomes f(E) = 3.0E
16. Space of All Possible Controllers
[Figure: controller space for "Carefully follow the blue ball". Labels: Unsafe (twice), X]
- Controllers are all of this form
17. Adaptation
- Small-scale adaptation
- Adjust parameters based on feedback
- Moves from one point in the controller space to another
- Possibly closer to the optimal controller
18. Reinforcement Learning (RL)
- RL Agent
- Views state of the world
- Receives rewards
- Takes action
- Learning
- Tries to maximize reward
- Produces a Value Function
- Needs to see every state many times
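For readers unfamiliar with the term, a value function is usually defined as the expected discounted return. The slides do not say which RL algorithm is used, so the discount factor \gamma, the step size \alpha, and the choice of the one-step Q-learning update below are assumptions made here for illustration.

```latex
% State-value function of a policy \pi (discounted return; \gamma is assumed)
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s \right]

% One-step Q-learning update (one common way to learn a value function)
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The update only changes the entry for the visited state-action pair, which is one reason the agent needs to see every state many times.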
19. Why RL?
- Searches the entire controller space (in theory)
- Can we find the optimal controller?
- Need help (examples)
[Figure: controller space searched by RL. Labels: RL, Unsafe (twice), X]
20. Adaptation using RL
- Small scale RL
- Replace a component with RL (Scott et al. 1992)
- Agent could be pre-trained
- Continues to adapt
- Generate initial reward function
21. Issues with Small Scale RL
- Issues
- Accelerate RL to work quickly enough
- Real robots can't try it 10,000 times
- Trainers won't say "Good Robot" 10,000 times
- Choose when to substitute in an agent
- Ensure safety when using RL
22. Adaptation using RL Revisited
- Large scale RL
- Use RL to replace the entire controller (Smart 2002)
- Generate example trajectories to learn from using assembled control diagrams
- Much faster than without examples (Smart 2002)
- Use feedback (rewards) to train the whole system
23. Issues with Large Scale RL
- Issues
- Generate an automatic reward function
- Harder than single component case
- Know when to switch control
- Monolithic
- Every time step (Humphrys 1996)
- Adapt to user feedback in realistic time
- RL makes very small changes
- Simulation: transfer experience to real robot
24. Avoid the blue ball and come here carefully
[Control diagram. Labels: blue, kP, Speed, Dist, Distance, Found, Sel, X, Y, T, R]
25. Avoid the blue ball and come here carefully
26. Keep the wall left input at distance one point three and keep driving at speed one tenth
[Control diagram. Labels: 0.1 m/s, 1.3 m, T, R]
27. Keep the wall left input at distance one point three and keep driving at speed one tenth
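The task description above spells its numbers out as words ("one point three", "one tenth"), while the diagram uses 1.3 m and 0.1 m/s. The sketch below shows one way such phrases could be turned into numeric set points. It is an assumption about how this step might work, not the parser used in the system; `parse_spoken_number` and both word tables are hypothetical and cover only a few simple forms.

```python
# Hypothetical word tables; only single digits and a few fractions are covered.
DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
FRACTIONS = {"half": 2, "third": 3, "quarter": 4, "fifth": 5, "tenth": 10}


def parse_spoken_number(words: str) -> float:
    """Parse forms like 'one point three', 'one tenth', or 'two'."""
    tokens = words.lower().split()
    if "point" in tokens:
        i = tokens.index("point")
        whole = "".join(str(DIGITS[w]) for w in tokens[:i]) or "0"
        frac = "".join(str(DIGITS[w]) for w in tokens[i + 1:]) or "0"
        return float(f"{whole}.{frac}")
    if len(tokens) == 2 and tokens[1] in FRACTIONS:
        return DIGITS[tokens[0]] / FRACTIONS[tokens[1]]
    if len(tokens) == 1 and tokens[0] in DIGITS:
        return float(DIGITS[tokens[0]])
    raise ValueError(f"unsupported number phrase: {words!r}")


wall_distance_m = parse_spoken_number("one point three")
drive_speed_mps = parse_spoken_number("one tenth")
print(wall_distance_m, drive_speed_mps)
```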
28. Follow me carefully
[Control diagram. Labels: P, Speed, Dist, Distance, Found, X, Y, T, R]
29. Follow me carefully
30. Drive to here while avoiding objects
[Control diagram. Labels: P, Speed, Dist, Distance, Found, X, Y, T, R]
31. Drive to here while avoiding objects