Learning from Observation Using Primitives - PowerPoint PPT Presentation

1
Learning from Observation Using Primitives
  • Darrin Bentivegna

2
Outline
  • Motivation
  • Test environments
  • Learning from observation
  • Learning from practice
  • Contributions
  • Future directions

3
Motivation
  • Reduce the learning time needed by robots.
  • Quickly learn skills from observing others.
  • Improve performance through practice.
  • Adapt to environment changes.
  • Create robots that can interact with and learn
    from humans in a human-like way.

4
Real World Marble Maze
5
Real World Air Hockey
6
Research Strategy
  • Domain knowledge: a library of primitives.
  • Manually defining primitives is a natural way to
    specify domain knowledge.
  • The focus of this research is on how to use a
    fixed library of primitives.

Marble Maze Primitives
Roll To Corner
Roll Off Wall
Leave Corner
Guide
Roll From Wall
7
Primitives in Air Hockey
Straight Shot
Right Bank Shot
Left Bank Shot
Defend Goal
Static Shot
Idle
8
Take-home message
  • Learning using primitives greatly speeds up
    learning and allows robots to perform more
    complex problems.
  • Memory-based learning makes learning from
    observation easy.
  • I created a way to do memory-based reinforcement
    learning.
  • Problem: there is no fixed set of parameters to
    adjust.
  • Solution: learn by adjusting the distance
    function.
  • I present algorithms that learn from both
    observation and practice.

9
Observe Critical Events in Marble Maze
Raw Data
10
Observe Critical Events in Marble Maze
Raw Data
Wall Contact Inferred
11
Observe Critical Events in Air Hockey
[Figure: human paddle movement (Paddle X, Paddle Y), puck movement
(Puck X, Puck Y), and the shots made by the human, plotted in the
table's x-y frame.]
12
Learning From Observation
  • Memory-based learner: learn by storing
    experiences.
  • Primitive selection: k-nearest neighbor.
  • Sub-goal generation: kernel regression (distance-
    weighted averaging) over remembered primitives of
    the appropriate type.
  • Action generation: learned or fixed policy.

13
Three Level Structure
Primitive Selection
Sub-goal Generation
Action Generation
14
Learning from Observation Framework
Learning from Observation
Primitive Selection
Sub-goal Generation
Action Generation
15
Observe Primitives Performed by a Human
Legend (one symbol per primitive type): Guide, Roll To
Corner, Roll Off Wall, Roll From Wall, Leave Corner.
16
Primitive Database
  • Create a data point for each observed primitive:
  • The primitive type performed (TYPE).
  • The state of the environment at the start of the
    primitive performance.
  • The state of the environment at the end of the
    primitive performance.

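The data point described above might be represented as follows (a minimal sketch; the class and field names are my own, not from the thesis):

```python
from dataclasses import dataclass

@dataclass
class PrimitiveDataPoint:
    """One observed primitive execution, as stored in the database."""
    ptype: str            # primitive type performed (TYPE), e.g. "guide"
    start_state: tuple    # environment state at the start of the primitive
    end_state: tuple      # environment state at the end of the primitive

# Toy database built from two hypothetical observations.
db = [
    PrimitiveDataPoint("guide", (0.10, 0.20, 0.0, 0.0), (0.30, 0.20, 0.1, 0.0)),
    PrimitiveDataPoint("roll_off_wall", (0.30, 0.20, 0.1, 0.0), (0.35, 0.25, 0.0, 0.1)),
]
```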
17
Marble Maze Example
18
Primitive Type Selection
  • Lookup using environment state.
  • Weighted nearest neighbor.
  • Many ways to select a primitive type.
  • Use closest point.
  • Use n nearest points to vote.
  • Highest frequency.
  • Weighted by distance from the query point.

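The distance-weighted voting scheme on this slide could be sketched as follows (a toy sketch; the database contents and the 1/d weighting are illustrative assumptions, since the slide lists several alternatives):

```python
import math
from collections import defaultdict

# Toy database of observed primitives: (type, start_state, end_state).
DB = [
    ("guide",         (0.10, 0.20), (0.30, 0.20)),
    ("guide",         (0.12, 0.22), (0.32, 0.21)),
    ("roll_off_wall", (0.30, 0.20), (0.35, 0.25)),
]

def select_primitive_type(query, db, k=3):
    """Distance-weighted vote among the k nearest stored start states."""
    nearest = sorted(db, key=lambda p: math.dist(query, p[1]))[:k]
    votes = defaultdict(float)
    for ptype, start, _end in nearest:
        votes[ptype] += 1.0 / (math.dist(query, start) + 1e-6)
    return max(votes, key=votes.get)

print(select_primitive_type((0.11, 0.21), DB))  # → guide
```

Using the single closest point or an unweighted majority vote are the simpler alternatives the slide mentions.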
19
Sub-goal Generation
  • Locally weighted average over nearby primitives
    (data points) of the same type.
  • Use a kernel function to control the influence
    of nearby data points.

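The kernel-weighted average over same-type data points might look like this (toy data; the Gaussian kernel and the bandwidth `h` are assumptions):

```python
import math

# Observed end states of "guide" primitives: (start_state, end_state) pairs.
GUIDE = [
    ((0.10, 0.20), (0.30, 0.20)),
    ((0.12, 0.24), (0.32, 0.24)),
    ((0.50, 0.50), (0.70, 0.50)),   # far away, so it barely contributes
]

def subgoal(query, examples, h=0.1):
    """Kernel regression (distance-weighted average) over the end states of
    remembered primitives of the selected type; h controls how quickly a
    data point's influence falls off with distance from the query."""
    ws = [math.exp(-(math.dist(query, s) / h) ** 2) for s, _ in examples]
    total = sum(ws)
    return tuple(sum(w * e[i] for w, (_, e) in zip(ws, examples)) / total
                 for i in range(len(examples[0][1])))
```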
20
Action Generation
  • Provides the action (motor command) to perform at
    each time step.
  • LWR, neural networks, physical model, etc.

21
Creating an Action Generation Module (Roll to
Corner)
  • Record at each time step, from the beginning to
    the end of the primitive:
  • The environment state.
  • The actions taken.
  • The end state.

22
Transform to a Local Coordinate Frame
  • Global information
  • Primitive-specific local information.

[Figure: the reference point and the distance to the
end of the wall in the local frame.]
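A 2-D version of such a transform might look like this (a hypothetical choice of frame; the slide names only the reference point and the distance to the end of the wall):

```python
import math

def to_local_frame(pos, ref_point, wall_angle):
    """Rotate/translate a global (x, y) position into a wall-attached frame:
    origin at the reference point, x-axis along the wall."""
    dx, dy = pos[0] - ref_point[0], pos[1] - ref_point[1]
    c, s = math.cos(wall_angle), math.sin(wall_angle)
    along = dx * c + dy * s     # distance along the wall
    across = -dx * s + dy * c   # offset away from the wall
    return along, across
```

Expressing states this way lets data gathered on one wall generalize to geometrically similar situations elsewhere in the maze.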
23
Learning the Maze Task from Only Observation
24
Related Research Primitive Recognition
  • Survey of research in human motion analysis and
    recognizing human activities from image
    sequences.
  • Aggarwal and Cai.
  • Recognize over time.
  • HMM, Brand, Oliver, and Pentland.
  • Template matching, Davis and Bobick.
  • Discover Primitives.
  • Fod, Mataric, and Jenkins.

25
Related Research Primitive Selection
  • Predefined sequence.
  • Virtual characters, Hodgins, et al., Faloutsos,
    et al., and Mataric et al.
  • Mobile robots, Balch, et al. and Arkin, et al.
  • Learn from observation.
  • Assembly, Kuniyoshi, Inaba, Inoue, and Kang.
  • Use a planning system.
  • Assembly, Thomas and Wahl.
  • RL, Ryan and Reid.

26
Related Research Primitive Execution
  • Predefine execution policy.
  • Virtual characters, Mataric et al., and Hodgins
    et al.
  • Mobile robots, Brooks et al. and Arkin.
  • Learn while operating in the environment.
  • Mobile robots, Mahadevan and Connell
  • RL, Kaelbling, Dietterich, and Sutton et al.
  • Learn from observation
  • Mobile robots, Larson and Voyles, Hugues and
    Drogoul, Grudic and Lawrence.
  • High DOF robots, Aboaf et al., Atkeson, and
    Schaal.

27
Review
Learning from Observation
Primitive Selection
Sub-goal Generation
Action Generation
28
Using Only Observed Data
  • Tries to mimic the teacher.
  • Cannot always perform primitives as well as the
    teacher.
  • Sometimes selects the wrong primitive type for the
    observed state.
  • Does not know what to do in states it has not
    observed.
  • Has no way to know it should try something
    different.
  • Solution: learning from practice.

29
Improving Primitive Selection and Sub-goal
Generation from Practice
30
Improving Primitive Selection and Sub-goal
Generation Through Practice
  • Need task-specification information to create a
    reward function.
  • Learn by adjusting the distance to the query:
    scale the distance function by the value of using
    a data point.
  • The scale factor f(data point location, query
    location) is related to the Q value, e.g. 1/Q or
    exp(-Q).
  • Associate scale values with each data point.
  • The scale values must be stored and learned.

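The exp(-Q) variant of the scaling above could be sketched as follows (a minimal sketch; the slide also mentions 1/Q as an alternative):

```python
import math

def scaled_distance(query, point_state, q_value):
    """Scale the plain distance to a data point by exp(-Q): points that have
    proven valuable (high Q) appear closer to the query, so the nearest-
    neighbor lookup selects them more often; harmful points (low Q) are
    pushed away without being deleted from the database."""
    return math.exp(-q_value) * math.dist(query, point_state)
```

With this scheme the learner has no fixed parameter vector to tune; practice reshapes the distance function itself, one data point at a time.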
31
Store Values in Function Approximator
  • Look-up table.
  • Fixed size.
  • Locally Weighted Projection Regression (LWPR),
    Schaal et al.
  • Create a model for each data point.
  • Indexed by the difference between the query
    point's and the data point's state (delta-state).

32
Learn Values Using a Reinforcement Learning
Strategy
  • State: delta-state.
  • Action: using this data point.
  • Reward assignment:
  • Positive: making progress through the maze.
  • Negative:
  • Falling into a hole.
  • Going backwards through the maze.
  • Taking time performing the primitive.

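One plausible shape for the reward just described is sketched below (the coefficients are hypothetical; the slide specifies only the signs):

```python
def primitive_reward(progress, fell_in_hole, went_backwards, duration):
    """Reward for one primitive execution in the marble maze: positive for
    progress through the maze, negative for falling into a hole, going
    backwards, and time spent performing the primitive."""
    r = 10.0 * max(progress, 0.0)    # forward progress along the maze path
    if fell_in_hole:
        r -= 100.0                   # large penalty for losing the marble
    if went_backwards:
        r -= 5.0
    r -= 0.1 * duration              # small per-second time penalty
    return r
```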
33
Learning the Value of Choosing a Data Point
(Simulation)
[Figure: the testing area, the incoming velocity vector, and the computed
scale values for an observed Roll Off Wall primitive at (12.9, 18.8).
Two marble positions, with the incoming velocity as shown, when the LWPR
model associated with that primitive is queried; scale values range from
BAD to GOOD.]
34
Maze Learning from Practice
[Graph: cumulative failures per meter, in simulation and in the real
world, comparing observation only, the look-up table, and LWPR.]
35
Learning New Strategies
36
Learning Action Generation from Practice
37
Improving Action Generation Through Practice
  • Environment changes over time.
  • Need to compensate for structural modeling error.
  • Can not learn everything from only observing
    others.

38
Knowledge for Making a Hit
[Figure: the path of the incoming puck, the hit location, the hit line,
the target line to the target location, and the absolute post-hit
velocity.]
  • After the hit location has been determined, the
    robot needs knowledge of:
  • Puck movement.
  • The puck-paddle collision.
  • Paddle placement.
  • Paddle movement timing.

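Once the hit location is chosen, the target line fixes the direction the puck must travel after the hit. A minimal helper (the function name is mine):

```python
import math

def posthit_direction(hit_location, target_location):
    """Unit vector along the 'target line' from the hit location to the
    target location: the direction the puck must leave the collision."""
    dx = target_location[0] - hit_location[0]
    dy = target_location[1] - hit_location[1]
    n = math.hypot(dx, dy)
    return dx / n, dy / n
```

The collision model then has to produce a paddle velocity at the hit point that sends the puck along this direction.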
39
Results of Learning Straight Shots (Simulation)
  • Observed 44 straight shots made by the human.
  • Results shown as a running average over 5 shots.
  • There was too much noise in the hardware sensing
    to repeat this experiment on the real robot.

40
Robot Model (Real World)
[Diagram: the robot trajectory and the incoming paddle velocity produce
an impact; the puck motion model carries the outgoing puck velocity
toward the target location.]
41
Obtaining Proper Robot Movement
  • Six preset robot configurations.
  • Interpolate between the four surrounding
    configurations.
  • Paddle command: the desired end location and time
    of the trajectory, (x, y, t).
  • The paddle follows a fifth-order polynomial
    trajectory with zero velocity and acceleration at
    the start and end.

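A fifth-order polynomial with zero boundary velocity and acceleration reduces, per coordinate, to the standard minimum-jerk blend; applied to the x and y of a paddle command this might look like:

```python
def quintic(p0, pf, T, t):
    """Position at time t on a fifth-order polynomial from p0 to pf over
    duration T, with zero velocity and acceleration at both ends.
    Apply once per coordinate of the (x, y, t) paddle command."""
    s = min(max(t / T, 0.0), 1.0)          # normalized time in [0, 1]
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return p0 + (pf - p0) * blend
```

The blend's first and second derivatives vanish at s = 0 and s = 1, so chained commands start and stop smoothly.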
42
Robot Model
[Diagram: the desired state of the puck at hit time is used to compute
the movement command (x, y, t); the robot trajectory is then generated
from the starting location, subject to a preset time delay.]
43
Robot Movement Errors
  • Movement accuracy determined by many factors.
  • Speed of the movement.
  • Friction between the paddle and the board.
  • Hydraulic pressure applied to the robot.
  • Operating within the designed performance
    parameters.

44
Robot Model
  • Learn to properly place the paddle.
  • Learn the timing of the paddle.
  • The robot observes its own actions:
  • The actual hit point (the highest-velocity point).
  • The time from when the command is given to when
    the paddle is observed at the hit position.
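One hypothetical way to fold these self-observations back into the robot model is a running estimate of the position and timing offsets (a sketch of the idea only; the class, the 1-D state, and the exponential average are my assumptions):

```python
class HitCalibrator:
    """Tracks the offset between the commanded hit point/time and what the
    robot actually does, from observations of its own hits."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha       # how quickly new observations dominate
        self.pos_offset = 0.0    # commanded minus observed hit position (1-D)
        self.time_delay = 0.0    # command time to observed arrival at the hit

    def observe(self, commanded_pos, observed_pos, delay):
        a = self.alpha
        self.pos_offset = (1 - a) * self.pos_offset + a * (commanded_pos - observed_pos)
        self.time_delay = (1 - a) * self.time_delay + a * delay

    def correct(self, desired_pos):
        """Position to command so the observed hit lands at desired_pos."""
        return desired_pos + self.pos_offset
```

The learned delay can likewise replace the preset time delay when scheduling the hit.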

45
Improving the Robot Model
[Diagram: from the desired state of the puck at hit time to the robot
trajectory, with the observed trajectory fed back to improve the model.]
46
Using the Improved Robot Model
[Figure: the desired hit location versus the location of highest paddle
velocity; the desired trajectory and the observed path of the paddle
from the starting location, in the x-y frame.]
47
Using the Improved Robot Model
48
Real-World Air Hockey
49
Major Contributions
  • A framework has been created as a tool for
    research in learning from observation using
    primitives.
  • Its flexible structure allows the use of various
    learning algorithms.
  • It can also learn from practice.
  • Presented learning methods that learn quickly from
    observed information and can also increase
    performance through practice.
  • Created a unique algorithm that lets a robot learn
    the effectiveness of the data points in a database
    and then use that information to change its
    behavior as it operates in the environment.
  • Presented a method of breaking the learning
    problem into small learning modules.
  • Individual modules have more opportunities to
    learn and generalize.

50
Some Future Directions
  • Automatically defining primitive types.
  • Explore how to represent learned information so
    it can be used in other tasks/environments.
  • Can robots learn about the world from playing
    these games?
  • Explore other ways to select primitives and
    sub-goals.
  • Use the observed information to create a planner.
  • Investigate methods of exploration at primitive
    selection and sub-goal generation.
  • Simultaneously learn primitive selection and
    action generation.