Learning from Observation Using Primitives - PowerPoint PPT Presentation

1
Learning from Observation Using Primitives
  • Darrin Bentivegna

2
Outline
  • Motivation
  • Test environments
  • Learning from observation
  • Learning from practice
  • Contributions
  • Future directions

3
Motivation
  • Reduce the learning time needed by robots.
  • Quickly learn skills from observing others.
  • Improve performance through practice.
  • Adapt to environment changes.
  • Create robots that can interact with and learn
    from humans in a human-like way.

4
Real World Marble Maze
5
Real World Air Hockey
6
Research Strategy
  • Domain knowledge: a library of primitives.
  • Manually defining primitives is a natural way to
    specify domain knowledge.
  • The focus of this research is on how to use a
    fixed library of primitives.

Marble Maze Primitives
Roll To Corner
Roll Off Wall
Leave Corner
Guide
Roll From Wall
7
Primitives in Air Hockey
Straight Shot
Right Bank Shot
Left Bank Shot
Defend Goal
Static Shot
Idle
8
Take-home message
  • Learning using primitives greatly speeds up
    learning and allows robots to perform more
    complex problems.
  • Memory-based learning makes learning from
    observation easy.
  • I created a way to do memory-based reinforcement
    learning.
  • Problem: there is no fixed set of parameters to
    adjust.
  • Solution: learn by adjusting the distance
    function.
  • I present algorithms that learn from both
    observation and practice.

9
Observe Critical Events in Marble Maze
Raw Data
10
Observe Critical Events in Marble Maze
Raw Data
Wall Contact Inferred
11
Observe Critical Events in Air Hockey
[Figure: human paddle movement (Paddle X, Paddle Y), puck movement
(Puck X, Puck Y), and the shots made by the human, plotted in the
table's x-y frame.]
12
Learning From Observation
  • Memory-based learner: learn by storing
    experiences.
  • Primitive selection: k-nearest neighbor.
  • Sub-goal generation: kernel regression (distance-
    weighted averaging) over remembered primitives of
    the appropriate type.
  • Action generation: learned or fixed policy.

13
Three Level Structure
Primitive Selection
Sub-goal Generation
Action Generation
14
Learning from Observation Framework
Learning from Observation
Primitive Selection
Sub-goal Generation
Action Generation
15
Observe Primitives Performed by a Human
Legend (one symbol per primitive type): Guide, Roll To
Corner, Roll Off Wall, Roll From Wall, Leave Corner.
16
Primitive Database
  • Create a data point for each observed primitive:
  • The primitive type performed (TYPE).
  • The state of the environment at the start of the
    primitive performance.
  • The state of the environment at the end of the
    primitive performance.

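The data point described above might be represented as follows (a minimal sketch; the class and field names are my own, not from the thesis):

```python
from dataclasses import dataclass

@dataclass
class PrimitiveDataPoint:
    """One observed primitive execution, as stored in the database."""
    ptype: str            # primitive type performed (TYPE), e.g. "guide"
    start_state: tuple    # environment state at the start of the primitive
    end_state: tuple      # environment state at the end of the primitive

# Toy database built from two hypothetical observations.
db = [
    PrimitiveDataPoint("guide", (0.10, 0.20, 0.0, 0.0), (0.30, 0.20, 0.1, 0.0)),
    PrimitiveDataPoint("roll_off_wall", (0.30, 0.20, 0.1, 0.0), (0.35, 0.25, 0.0, 0.1)),
]
```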
17
Marble Maze Example
18
Primitive Type Selection
  • Lookup using environment state.
  • Weighted nearest neighbor.
  • Many ways to select a primitive type.
  • Use closest point.
  • Use n nearest points to vote.
  • Highest frequency.
  • Weighted by distance from the query point.

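The distance-weighted voting scheme on this slide could be sketched as follows (a toy sketch; the database contents and the 1/d weighting are illustrative assumptions, since the slide lists several alternatives):

```python
import math
from collections import defaultdict

# Toy database of observed primitives: (type, start_state, end_state).
DB = [
    ("guide",         (0.10, 0.20), (0.30, 0.20)),
    ("guide",         (0.12, 0.22), (0.32, 0.21)),
    ("roll_off_wall", (0.30, 0.20), (0.35, 0.25)),
]

def select_primitive_type(query, db, k=3):
    """Distance-weighted vote among the k nearest stored start states."""
    nearest = sorted(db, key=lambda p: math.dist(query, p[1]))[:k]
    votes = defaultdict(float)
    for ptype, start, _end in nearest:
        votes[ptype] += 1.0 / (math.dist(query, start) + 1e-6)
    return max(votes, key=votes.get)

print(select_primitive_type((0.11, 0.21), DB))  # → guide
```

Using the single closest point or an unweighted majority vote are the simpler alternatives the slide mentions.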
19
Sub-goal Generation
  • Locally weighted average over nearby primitives
    (data points) of the same type.
  • Use a kernel function to control the influence
    of nearby data points.

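The kernel-weighted average over same-type data points might look like this (toy data; the Gaussian kernel and the bandwidth `h` are assumptions):

```python
import math

# Observed end states of "guide" primitives: (start_state, end_state) pairs.
GUIDE = [
    ((0.10, 0.20), (0.30, 0.20)),
    ((0.12, 0.24), (0.32, 0.24)),
    ((0.50, 0.50), (0.70, 0.50)),   # far away, so it barely contributes
]

def subgoal(query, examples, h=0.1):
    """Kernel regression (distance-weighted average) over the end states of
    remembered primitives of the selected type; h controls how quickly a
    data point's influence falls off with distance from the query."""
    ws = [math.exp(-(math.dist(query, s) / h) ** 2) for s, _ in examples]
    total = sum(ws)
    return tuple(sum(w * e[i] for w, (_, e) in zip(ws, examples)) / total
                 for i in range(len(examples[0][1])))
```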
20
Action Generation
  • Provides the action (motor command) to perform at
    each time step.
  • LWR, neural networks, physical model, etc.

21
Creating an Action Generation Module (Roll to
Corner)
  • Record at each time step, from the beginning to
    the end of the primitive:
  • The environment state.
  • The actions taken.
  • The end state.

22
Transform to a Local Coordinate Frame
  • Global information
  • Primitive-specific local information.

[Figure: the reference point and the distance to the
end of the wall in the local frame.]
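A 2-D version of such a transform might look like this (a hypothetical choice of frame; the slide names only the reference point and the distance to the end of the wall):

```python
import math

def to_local_frame(pos, ref_point, wall_angle):
    """Rotate/translate a global (x, y) position into a wall-attached frame:
    origin at the reference point, x-axis along the wall."""
    dx, dy = pos[0] - ref_point[0], pos[1] - ref_point[1]
    c, s = math.cos(wall_angle), math.sin(wall_angle)
    along = dx * c + dy * s     # distance along the wall
    across = -dx * s + dy * c   # offset away from the wall
    return along, across
```

Expressing states this way lets data gathered on one wall generalize to geometrically similar situations elsewhere in the maze.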
23
Learning the Maze Task from Only Observation
24
Related Research Primitive Recognition
  • Survey of research in human motion analysis and
    recognizing human activities from image
    sequences.
  • Aggarwal and Cai.
  • Recognize over time.
  • HMM, Brand, Oliver, and Pentland.
  • Template matching, Davis and Bobick.
  • Discover Primitives.
  • Fod, Mataric, and Jenkins.

25
Related Research Primitive Selection
  • Predefined sequence.
  • Virtual characters, Hodgins, et al., Faloutsos,
    et al., and Mataric et al.
  • Mobile robots, Balch, et al. and Arkin, et al.
  • Learn from observation.
  • Assembly, Kuniyoshi, Inaba, Inoue, and Kang.
  • Use a planning system.
  • Assembly, Thomas and Wahl.
  • RL, Ryan and Reid.

26
Related Research Primitive Execution
  • Predefine execution policy.
  • Virtual characters, Mataric et al., and Hodgins
    et al.
  • Mobile robots, Brooks et al. and Arkin.
  • Learn while operating in the environment.
  • Mobile robots, Mahadevan and Connell
  • RL, Kaelbling, Dietterich, and Sutton et al.
  • Learn from observation
  • Mobile robots, Larson and Voyles, Hugues and
    Drogoul, Grudic and Lawrence.
  • High DOF robots, Aboaf et al., Atkeson, and
    Schaal.

27
Review
Learning from Observation
Primitive Selection
Sub-goal Generation
Action Generation
28
Using Only Observed Data
  • Tries to mimic the teacher.
  • Cannot always perform primitives as well as the
    teacher.
  • Sometimes selects the wrong primitive type for the
    observed state.
  • Does not know what to do in states it has not
    observed.
  • Has no way to know it should try something
    different.
  • Solution: learning from practice.

29
Improving Primitive Selection and Sub-goal
Generation from Practice
30
Improving Primitive Selection and Sub-goal
Generation Through Practice
  • Need task-specification information to create a
    reward function.
  • Learn by adjusting the distance to the query:
    scale the distance function by the value of using
    a data point.
  • The scale factor f(data point location, query
    location) is related to the Q value, e.g. 1/Q or
    exp(-Q).
  • Associate scale values with each data point.
  • The scale values must be stored and learned.

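The exp(-Q) variant of the scaling above could be sketched as follows (a minimal sketch; the slide also mentions 1/Q as an alternative):

```python
import math

def scaled_distance(query, point_state, q_value):
    """Scale the plain distance to a data point by exp(-Q): points that have
    proven valuable (high Q) appear closer to the query, so the nearest-
    neighbor lookup selects them more often; harmful points (low Q) are
    pushed away without being deleted from the database."""
    return math.exp(-q_value) * math.dist(query, point_state)
```

With this scheme the learner has no fixed parameter vector to tune; practice reshapes the distance function itself, one data point at a time.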
31
Store Values in Function Approximator
  • Look-up table.
  • Fixed size.
  • Locally Weighted Projection Regression (LWPR),
    Schaal et al.
  • Create a model for each data point.
  • Indexed by the difference between the query
    point's and the data point's state (delta-state).

32
Learn Values Using a Reinforcement Learning
Strategy
  • State: delta-state.
  • Action: using this data point.
  • Reward assignment:
  • Positive: making progress through the maze.
  • Negative:
  • Falling into a hole.
  • Going backwards through the maze.
  • Taking time performing the primitive.

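One plausible shape for the reward just described is sketched below (the coefficients are hypothetical; the slide specifies only the signs):

```python
def primitive_reward(progress, fell_in_hole, went_backwards, duration):
    """Reward for one primitive execution in the marble maze: positive for
    progress through the maze, negative for falling into a hole, going
    backwards, and time spent performing the primitive."""
    r = 10.0 * max(progress, 0.0)    # forward progress along the maze path
    if fell_in_hole:
        r -= 100.0                   # large penalty for losing the marble
    if went_backwards:
        r -= 5.0
    r -= 0.1 * duration              # small per-second time penalty
    return r
```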
33
Learning the Value of Choosing a Data Point
(Simulation)
[Figure: the testing area, the incoming velocity vector, and the computed
scale values for an observed Roll Off Wall primitive at (12.9, 18.8).
Two marble positions, with the incoming velocity as shown, when the LWPR
model associated with that primitive is queried; scale values range from
BAD to GOOD.]
34
Maze Learning from Practice
[Graph: cumulative failures per meter, in simulation and in the real
world, comparing observation only, the look-up table, and LWPR.]
35
Learning New Strategies
36
Learning Action Generation from Practice
37
Improving Action Generation Through Practice
  • Environment changes over time.
  • Need to compensate for structural modeling error.
  • Can not learn everything from only observing
    others.

38
Knowledge for Making a Hit
[Figure: the path of the incoming puck, the hit location, the hit line,
the target line to the target location, and the absolute post-hit
velocity.]
  • After the hit location has been determined, the
    robot needs knowledge of:
  • Puck movement.
  • The puck-paddle collision.
  • Paddle placement.
  • Paddle movement timing.

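Once the hit location is chosen, the target line fixes the direction the puck must travel after the hit. A minimal helper (the function name is mine):

```python
import math

def posthit_direction(hit_location, target_location):
    """Unit vector along the 'target line' from the hit location to the
    target location: the direction the puck must leave the collision."""
    dx = target_location[0] - hit_location[0]
    dy = target_location[1] - hit_location[1]
    n = math.hypot(dx, dy)
    return dx / n, dy / n
```

The collision model then has to produce a paddle velocity at the hit point that sends the puck along this direction.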
39
Results of Learning Straight Shots (Simulation)
  • Observed 44 straight shots made by the human.
  • Results shown as a running average over 5 shots.
  • There was too much noise in the hardware sensing
    to repeat this experiment on the real robot.

40
Robot Model (Real World)
[Diagram: the robot trajectory and the incoming paddle velocity produce
an impact; the puck motion model carries the outgoing puck velocity
toward the target location.]
41
Obtaining Proper Robot Movement
  • Six preset robot configurations.
  • Interpolate between the four surrounding
    configurations.
  • Paddle command: the desired end location and time
    of the trajectory, (x, y, t).
  • The paddle follows a fifth-order polynomial
    trajectory with zero velocity and acceleration at
    the start and end.

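A fifth-order polynomial with zero boundary velocity and acceleration reduces, per coordinate, to the standard minimum-jerk blend; applied to the x and y of a paddle command this might look like:

```python
def quintic(p0, pf, T, t):
    """Position at time t on a fifth-order polynomial from p0 to pf over
    duration T, with zero velocity and acceleration at both ends.
    Apply once per coordinate of the (x, y, t) paddle command."""
    s = min(max(t / T, 0.0), 1.0)          # normalized time in [0, 1]
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return p0 + (pf - p0) * blend
```

The blend's first and second derivatives vanish at s = 0 and s = 1, so chained commands start and stop smoothly.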
42
Robot Model
[Diagram: the desired state of the puck at hit time is used to compute
the movement command (x, y, t); the robot trajectory is then generated
from the starting location, subject to a preset time delay.]
43
Robot Movement Errors
  • Movement accuracy determined by many factors.
  • Speed of the movement.
  • Friction between the paddle and the board.
  • Hydraulic pressure applied to the robot.
  • Operating within the designed performance
    parameters.

44
Robot Model
  • Learn to properly place the paddle.
  • Learn the timing of the paddle.
  • The robot observes its own actions:
  • The actual hit point (the highest-velocity point).
  • The time from when the command is given to when
    the paddle is observed at the hit position.
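One hypothetical way to fold these self-observations back into the robot model is a running estimate of the position and timing offsets (a sketch of the idea only; the class, the 1-D state, and the exponential average are my assumptions):

```python
class HitCalibrator:
    """Tracks the offset between the commanded hit point/time and what the
    robot actually does, from observations of its own hits."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha       # how quickly new observations dominate
        self.pos_offset = 0.0    # commanded minus observed hit position (1-D)
        self.time_delay = 0.0    # command time to observed arrival at the hit

    def observe(self, commanded_pos, observed_pos, delay):
        a = self.alpha
        self.pos_offset = (1 - a) * self.pos_offset + a * (commanded_pos - observed_pos)
        self.time_delay = (1 - a) * self.time_delay + a * delay

    def correct(self, desired_pos):
        """Position to command so the observed hit lands at desired_pos."""
        return desired_pos + self.pos_offset
```

The learned delay can likewise replace the preset time delay when scheduling the hit.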

45
Improving the Robot Model
[Diagram: from the desired state of the puck at hit time to the robot
trajectory, with the observed trajectory fed back to improve the model.]
46
Using the Improved Robot Model
[Figure: the desired hit location versus the location of highest paddle
velocity; the desired trajectory and the observed path of the paddle
from the starting location, in the x-y frame.]
47
Using the Improved Robot Model
48
Real-World Air Hockey
49
Major Contributions
  • A framework has been created as a tool for
    research in learning from observation using
    primitives.
  • Its flexible structure allows the use of various
    learning algorithms.
  • It can also learn from practice.
  • Presented learning methods that learn quickly from
    observed information and can also increase
    performance through practice.
  • Created a unique algorithm that lets a robot learn
    the effectiveness of the data points in a database
    and then use that information to change its
    behavior as it operates in the environment.
  • Presented a method of breaking the learning
    problem into small learning modules.
  • Individual modules have more opportunities to
    learn and generalize.

50
Some Future Directions
  • Automatically defining primitive types.
  • Explore how to represent learned information so
    it can be used in other tasks/environments.
  • Can robots learn about the world from playing
    these games?
  • Explore other ways to select primitives and
    sub-goals.
  • Use the observed information to create a planner.
  • Investigate methods of exploration at primitive
    selection and sub-goal generation.
  • Simultaneously learn primitive selection and
    action generation.