Transcript and Presenter's Notes

Title: Programming Robots using High-Level Task Descriptions


1
Programming Robots using High-Level Task Descriptions
  • AAAI 2004 Workshop on
  • Supervisory Control of Learning and Adaptive
    Systems

Andrew J. Martignoni III
Advisor: William D. Smart
http://www.cse.wustl.edu/ajm7/controller/
Supported by The Boeing Foundation
2
The Problem
  • Limited natural language task description
  • "Follow the red ball quickly while avoiding objects."
  • Generate a controller to perform the task
  • Execute the controller
  • Respond to feedback
  • "Turn more aggressively", "Drive faster"
  • Improve performance by learning
  • Use the generated controller to provide examples

3
The Target Audience
  • People who
  • Know how to perform a specific task themselves
  • Want a robot to perform the task
  • Don't know about robots (in detail)
  • Don't know how to program

4
Carefully follow the blue ball
[Controller diagram: a "blue" blob finder outputs X, Y, and Found; a Distance (Dist) estimate feeds a proportional gain (kP) that sets Speed, driving the translation (T) and rotation (R) outputs.]
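Read literally off the diagram, a minimal sketch of how such a controller might look in code; the blob-finder interface, gains, and target distance are assumptions, and "carefully" would presumably map to smaller gains than "quickly".

```python
# Hypothetical sketch of the "carefully follow the blue ball" controller:
# a blob finder gives (x, dist, found); proportional terms drive the
# translation (T) and rotation (R) outputs. Names and gains are assumed.

def follow_ball_controller(blob, k_speed=0.3, k_turn=0.8, target_dist=1.0):
    """blob: dict with 'x' (-1..1, image-centered), 'dist' (m), 'found'."""
    if not blob["found"]:
        return 0.0, 0.0                            # stop if the ball is not visible
    t = k_speed * (blob["dist"] - target_dist)     # translation: close the distance gap
    r = -k_turn * blob["x"]                        # rotation: center the ball in the image
    return t, r

# Example: ball 2 m away, slightly to the right of center.
print(follow_ball_controller({"x": 0.2, "dist": 2.0, "found": True}))
```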
5
Carefully follow the blue ball
6
Current Issues
  • Responding to Feedback
  • Meaning of More/Less
  • Reinforcement Learning
  • Human as reward function
  • NOT the same as a pre-programmed reward function:
    rewards come early or late!

7
Changing the Rules
  • Transforms one MDP into another MDP
  • Rewards become spread out

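One way to picture "rewards become spread out": if the trainer's reward for a given step can arrive a step or two early or late, the effective MDP delivers each reward at a shifted time. A rough illustration follows; the offsets and probabilities are made up, not the paper's formal construction.

```python
import random

# Illustrative sketch: redeliver each reward at a randomly shifted time step,
# turning the original reward sequence into a "spread out" one, as a human
# trainer might produce. Offsets and probabilities are assumptions.

def spread_rewards(rewards, offsets=(-2, -1, 0, 1, 2), probs=(0.1, 0.2, 0.4, 0.2, 0.1)):
    shifted = [0.0] * len(rewards)
    for t, r in enumerate(rewards):
        if r == 0.0:
            continue
        dt = random.choices(offsets, probs)[0]         # sample a delivery delay
        t_new = min(max(t + dt, 0), len(rewards) - 1)  # clamp to the episode
        shifted[t_new] += r
    return shifted

print(spread_rewards([0, 0, 0, 1.0, 0, 0]))  # the reward may land a step early or late
```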
8
Early or Late Rewards
  • Will it still converge to the optimal policy?
  • Can we bound the changes?
  • Will it converge more slowly?
  • Can we guarantee it converges?

9
Toy Example
  • 10 States, 2 Actions (Left and Right)
  • Distribution of reward times: -2, -1, 0, 1, 2
  • Converges to the optimal policy ⇔ Pr(-1) · Pr(0) · Pr(1) > 0
  • Does this system satisfy these conditions?

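A hedged sketch of the toy problem as described on the slide: a 10-state chain with Left/Right actions, where the reward for reaching the goal is delivered at a step offset drawn from {-2, ..., 2}. The dynamics and the offset distribution below are illustrative assumptions.

```python
import random

# Sketch of the toy example: 10 states, actions Left/Right, and a reward
# whose delivery time is shifted by an offset from {-2, -1, 0, 1, 2}.

N_STATES = 10
ACTIONS = {"Left": -1, "Right": +1}
OFFSET_PROBS = {-2: 0.05, -1: 0.2, 0: 0.5, 1: 0.2, 2: 0.05}  # assumed distribution

def run_episode(policy, goal=N_STATES - 1, max_steps=50):
    """Return the (state, action) trajectory and the step at which the
    (possibly shifted) reward is delivered."""
    s, trajectory = 0, []
    for step in range(max_steps):
        a = policy(s)
        trajectory.append((s, a))
        s = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        if s == goal:
            shift = random.choices(list(OFFSET_PROBS), list(OFFSET_PROBS.values()))[0]
            return trajectory, step + shift    # reward arrives early or late
    return trajectory, None

traj, reward_step = run_episode(lambda s: "Right")
print(len(traj), "steps, reward delivered at step", reward_step)
```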
10
Collected Data
  • Let real people give rewards
  • Measure timings
  • Fit distribution
  • What can we infer?

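Fitting the distribution could be as simple as tabulating the empirical frequencies of the measured reward delays; the delay values below are invented purely for illustration.

```python
from collections import Counter

# Sketch: estimate an empirical distribution over reward delays (reward time
# minus the time of the rewarded event), measured in decision steps.

measured_delays = [0, 1, 1, 0, -1, 2, 0, 1, 0, -1, 0, 1]   # made-up timings

counts = Counter(measured_delays)
total = sum(counts.values())
empirical = {d: counts[d] / total for d in sorted(counts)}
print(empirical)   # roughly {-1: 0.17, 0: 0.42, 1: 0.33, 2: 0.08}
```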
11
Conclusion
  • Questions
  • What properties are required?
  • Symmetry
  • Reversible actions
  • Grouped rewards
  • Directions
  • Looking at a variety of MDPs
  • Determine common elements

12
The Robot
  • RWI, Inc. B21r robot
  • 2 rings of 24 sonar
  • 24 infrared sensors
  • 56 bump sensors
  • Laser range finder
  • Pan-tilt unit w/ camera

13
Dictionary Word List
14
Tuning Performance
  • Tuning keywords
  • Each component contains a set of words
  • Categorized by motor output
  • Changes parameter to modify behavior

[Diagram: a P-controller f(E) = kP·E with tuning keywords "Turn", "More Aggressively", "Less Aggressively"; "more aggressively" raises kP from 1.5 to 3.0, so f(E) = 1.5E becomes f(E) = 3.0E.]
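A rough sketch of how keyword-based tuning might be wired up, assuming each component stores its trigger words and "more"/"less" scale the relevant gain by a fixed factor (2x here, matching the slide's 1.5 → 3.0 example); the class and word set are hypothetical, not the system's actual dictionary.

```python
# Hypothetical keyword-based tuning for a proportional component.

class PComponent:
    words = {"turn", "aggressively"}     # keywords tied to the rotational output

    def __init__(self, kp=1.5):
        self.kp = kp

    def output(self, error):
        return self.kp * error           # f(E) = kP * E

    def tune(self, keywords, direction):
        if self.words & set(keywords):   # this component was addressed
            self.kp *= 2.0 if direction == "more" else 0.5

turn = PComponent()
turn.tune({"turn", "aggressively"}, "more")
print(turn.kp)   # 3.0 -- "turn more aggressively" doubled the gain
```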
15
Why not say what you mean?
  • "Turn more aggressively"
  • Double the gain on the P-controller connected to the rotational motor output

f(E) = 1.5E → f(E) = 3.0E
16
Space of All Possible Controllers
[Diagram: the space of all possible controllers, with shaded "Unsafe" regions; the controller generated for "Carefully follow the blue ball" marks one point (X). Controllers are all of this form.]
17
Adaptation
  • Small-scale adaptation
  • Adjust parameters based on feedback
  • Moves from one point in the controller space to
    another
  • Possibly closer to the optimal controller

[Diagram: the P component shown as a point moving within the controller space.]
18
Reinforcement Learning (RL)
  • RL Agent
  • Views state of the world
  • Receives rewards
  • Takes action
  • Learning
  • Tries to maximize reward
  • Produces a Value Function
  • Needs to see every state many times

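For reference, a generic tabular Q-learning loop illustrating the cycle the slide describes (observe state, act, receive reward, update a value function). The environment interface (reset/step/actions) is an assumption for the sketch, not the robot's actual API.

```python
import random
from collections import defaultdict

# Generic tabular Q-learning sketch of the RL loop.

def q_learning(env, episodes=10_000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)                        # value function over (state, action)
    for _ in range(episodes):                     # many visits to every state
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.choice(env.actions)    # explore
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])   # exploit
            s2, r, done = env.step(a)             # act, receive reward, see next state
            best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```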
19
Why RL?
  • Searches the entire controller space (in theory)
  • Can we find the optimal controller?
  • Need help (examples)

[Diagram: RL searching the controller space, including the "Unsafe" regions; the generated controller is marked (X).]
20
Adaptation using RL
  • Small scale RL
  • Replace a component with RL [Scott et al., 1992]
  • Agent could be pre-trained
  • Continues to adapt
  • Generate initial reward function

[Diagram: an RL agent substituted for one component of the controller.]
21
Issues with Small Scale RL
  • Issues
  • Accelerate RL to work quickly enough
  • Real robots can't try it 10,000 times
  • Trainers won't say "Good Robot" 10,000 times
  • Choose when to substitute in an agent
  • Ensure safety when using RL

22
Adaptation using RL Revisited
  • Large scale RL
  • Use RL to replace the entire controller [Smart, 2002]
  • Generate example trajectories to learn from using assembled control diagrams
  • Much faster than without examples [Smart, 2002]
  • Use feedback (rewards) to train the whole system

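A sketch of the seeding idea: log transitions while the assembled controller drives, then replay them through the same value-function update before the RL agent takes control. The interfaces and hyperparameters here are assumptions; Smart [2002] describes the actual method.

```python
# Hypothetical example-trajectory seeding for a tabular Q-learner.

def collect_examples(env, controller, episodes=20):
    transitions = []
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = controller(s)                  # the generated controller chooses
            s2, r, done = env.step(a)
            transitions.append((s, a, r, s2, done))
            s = s2
    return transitions

def seed_q(Q, transitions, env, alpha=0.1, gamma=0.95):
    for s, a, r, s2, done in transitions:      # replay example experience
        best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```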
23
Issues with Large Scale RL
  • Issues
  • Generate an automatic reward function
  • Harder than single component case
  • Know when to switch control
  • Monolithic
  • Every time step [Humphrys, 1996]
  • Adapt to user feedback in realistic time
  • RL makes very small changes
  • Simulation: transfer experience to the real robot

24
Avoid the blue ball and come here carefully
[Controller diagram: two chains joined by a selector (Sel). One chain uses the "blue" blob finder (X, Y, Found), a Distance (Dist) estimate, and a kP gain; the other uses a Distance (Dist) estimate and a kP gain to set Speed toward the goal. The selector routes the chosen chain's translation (T) and rotation (R) commands to the motors.]
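Reading the diagram as a selector arbitrating between an avoidance chain and a go-to-goal chain, a minimal sketch might look like this; the threshold, sign conventions, and behavior details are assumptions.

```python
# Hypothetical selector (Sel) combining "avoid the blue ball" with "come here".

def avoid_ball(blob, k_turn=1.0):
    # Turn away from the side of the image where the ball appears.
    return 0.0, k_turn * (1.0 if blob["x"] < 0 else -1.0)

def go_to_goal(goal, k_speed=0.2, k_turn=0.5):
    # Drive toward the goal point, correcting heading proportionally.
    return k_speed * goal["dist"], -k_turn * goal["bearing"]

def selector(blob, goal, danger_dist=0.8):
    if blob["found"] and blob["dist"] < danger_dist:
        return avoid_ball(blob)          # ball too close: avoidance wins
    return go_to_goal(goal)              # otherwise head for the goal

print(selector({"found": True, "dist": 0.5, "x": 0.3},
               {"dist": 2.0, "bearing": 0.1}))
```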
25
Avoid the blue ball and come here carefully
26
Keep the wall left input at distance one point three and keep driving at speed one tenth
[Diagram: the spoken values become setpoints (1.3 m wall distance, 0.1 m/s speed) feeding the translation (T) and rotation (R) outputs.]
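Taking the spoken numbers as setpoints, a plausible sketch of the wall-keeping controller: fixed translation at 0.1 m/s and a proportional correction on the 1.3 m left-wall distance. The gain, sensor interface, and sign convention are assumed.

```python
# Hypothetical wall-keeping controller built from the spoken setpoints.

def keep_wall_left(left_dist, target_dist=1.3, speed=0.1, k_turn=0.6):
    """left_dist: measured distance to the wall on the left, in meters."""
    t = speed                                   # "speed one tenth" -> 0.1 m/s
    r = k_turn * (left_dist - target_dist)      # steer toward the 1.3 m setpoint
    return t, r

print(keep_wall_left(1.5))   # too far from the wall: turn gently toward it
```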
27
Keep the wall left input at distance one point three and keep driving at speed one tenth
28
Follow me carefully
[Controller diagram: a tracker's X, Y, and Found outputs and a Distance (Dist) estimate feed a proportional (P) gain that sets Speed, driving the translation (T) and rotation (R) outputs.]
29
Follow me carefully
30
Drive to here while avoiding objects
[Controller diagram: the goal point's X, Y, and Found signals and a Distance (Dist) estimate feed a proportional (P) gain that sets Speed, driving the translation (T) and rotation (R) outputs.]
31
Drive to here while avoiding objects