Learning optimal behavior

Transcript and Presenter's Notes

1
Learning optimal behavior
  • Twan van Laarhoven

2
AIBO robot Walking
  • Policy Gradient Reinforcement Learning for Fast
    Quadrupedal Locomotion, Nate Kohl, Peter Stone
    (2004)
  • Goal: speed

3
(No Transcript)
4
Parameterization
  • 12 parameters
  • Front ellipse
  • Rear ellipse
  • Body height
  • etc.

5
Learning
  • No simulator
  • every evaluation is a trial on the actual AIBO
  • expensive
  • Not an MDP
  • so no Q-Learning
  • instead Gradient Reinforcement Learning

6
Gradient Reinforcement Learning
  1. Parameter vector π = {θ1, ..., θN}
  2. Random policies Ri = {θ1 + Δ1, ..., θN + ΔN},
     each Δn chosen from {−ε, 0, +ε}
  3. For each parameter n, group the policies into
     S−ε,n / S0,n / S+ε,n by the sign of Δn
  4. Averages Avg−ε,n = avg. score(S−ε,n), likewise
     Avg0,n and Avg+ε,n
  5. Adjust An = 0 if Avg0,n is highest, otherwise
     An = Avg+ε,n − Avg−ε,n; step along normalized A
  6. Repeat (sketched in code below)
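
A minimal Python sketch of the loop above, under assumptions: evaluate(params) runs one trial (e.g. times a walk on the robot) and returns a score; n_policies and the per-parameter step sizes eps are illustrative, not the paper's exact values.

import random

def policy_gradient(theta, eps, eta, evaluate, n_policies=15, n_iters=100):
    # theta: parameter vector; eps: per-parameter perturbation sizes
    # evaluate(params) -> score, e.g. measured walk speed (assumed given)
    N = len(theta)
    for _ in range(n_iters):
        # steps 1-2: random policies, each parameter moved by -eps, 0, or +eps
        perturbs = [[random.choice((-e, 0.0, e)) for e in eps]
                    for _ in range(n_policies)]
        scores = [evaluate([t + d for t, d in zip(theta, p)]) for p in perturbs]
        overall = sum(scores) / len(scores)
        adjust = []
        for n in range(N):
            # steps 3-4: group scores by how parameter n was perturbed, average
            groups = {-1: [], 0: [], 1: []}
            for p, s in zip(perturbs, scores):
                groups[0 if p[n] == 0 else (1 if p[n] > 0 else -1)].append(s)
            avg = {k: (sum(v) / len(v) if v else overall)
                   for k, v in groups.items()}
            # step 5: A_n = 0 if leaving parameter n unchanged scored best
            adjust.append(0.0 if avg[0] >= max(avg[-1], avg[1])
                          else avg[1] - avg[-1])
        # move theta a fixed step eta along the normalized adjustment vector
        norm = sum(a * a for a in adjust) ** 0.5
        if norm > 0:
            theta = [t + eta * a / norm for t, a in zip(theta, adjust)]
    return theta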

7
(No Transcript)
8
Conclusion
  • Gradient Reinforcement Learning is very simple
    and gives good results
  • Evaluation can be done in parallel

9
Learning from experts
  • Apprenticeship Learning for Motion Planning with
    Application to Parking Lot Navigation, Pieter
    Abbeel, Dmitri Dolgov, Andrew Y. Ng, Sebastian
    Thrun (2008)

10
Parking lot navigation
  • Path planning
  • Many cost functions
  • length
  • backward
  • smoothness
  • off road
  • etc.

11
Cost functions
  • forward length φfwd = Σ(fwd segments) ‖xi − xi−1‖
  • reverse length φrev = Σ(rev segments) ‖xi − xi−1‖
  • off-road φroad = Σi δroad(i) ‖xi − xi−1‖
  • curvature φcurv = Σi (θi+1 − θi)²
  • in lane φlane = Σi D(xi, θi, G)
  • direction φdir = Σi sin²(2(θi − αi))
  • (θi = heading at xi, αi = preferred lane direction,
    G = lane graph, δroad(i) = off-road indicator;
    a feature computation sketch follows below)
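
A rough illustration of accumulating such features over a discretized path (not the paper's exact definitions). A path here is a list of (x, y, theta, reverse) states; off_road() and preferred_heading() are hypothetical stand-ins for the map queries.

import math

def seg_len(a, b):
    # Euclidean length of the segment between consecutive states
    return math.hypot(b[0] - a[0], b[1] - a[1])

def path_features(path, off_road, preferred_heading):
    phi_fwd = phi_rev = phi_road = phi_curv = phi_dir = 0.0
    for i in range(1, len(path)):
        d = seg_len(path[i - 1], path[i])
        if path[i][3]:
            phi_rev += d        # reverse-gear segment length
        else:
            phi_fwd += d        # forward segment length
        if off_road(path[i]):
            phi_road += d       # indicator-weighted off-road length
        # misalignment with the lane's preferred driving direction alpha_i
        phi_dir += math.sin(2 * (path[i][2] - preferred_heading(path[i]))) ** 2
    for i in range(1, len(path) - 1):
        phi_curv += (path[i + 1][2] - path[i][2]) ** 2  # squared heading change
    return (phi_fwd, phi_rev, phi_road, phi_curv, phi_dir)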

12
Path planning
  • Two-step approach
  • Coarse A* search
  • Refinement (a toy sketch follows below)
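
A toy version of the two steps over an occupancy grid; the paper's planner is far more elaborate (kinematic constraints, continuous optimization), so this only shows the coarse-then-refine shape.

import heapq

def coarse_astar(grid, start, goal):
    # 4-connected A*; grid[r][c] == 0 means free; assumes goal is reachable
    rows, cols = len(grid), len(grid[0])
    frontier = [(0, start)]
    came_from = {start: None}
    g = {start: 0}
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:
            break
        for nb in ((cur[0] + 1, cur[1]), (cur[0] - 1, cur[1]),
                   (cur[0], cur[1] + 1), (cur[0], cur[1] - 1)):
            if 0 <= nb[0] < rows and 0 <= nb[1] < cols and grid[nb[0]][nb[1]] == 0:
                ng = g[cur] + 1
                if ng < g.get(nb, float("inf")):
                    g[nb] = ng
                    came_from[nb] = cur
                    h = abs(nb[0] - goal[0]) + abs(nb[1] - goal[1])  # admissible
                    heapq.heappush(frontier, (ng + h, nb))
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = came_from[node]
    return path[::-1]

def refine(path, rounds=10):
    # crude smoothing stand-in for the paper's continuous refinement;
    # note: ignores obstacles, unlike the real refinement step
    pts = [list(p) for p in path]
    for _ in range(rounds):
        for i in range(1, len(pts) - 1):
            pts[i][0] = 0.5 * pts[i][0] + 0.25 * (pts[i - 1][0] + pts[i + 1][0])
            pts[i][1] = 0.5 * pts[i][1] + 0.25 * (pts[i - 1][1] + pts[i + 1][1])
    return pts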

13
Cost and paths
  • Total cost F(s) = Σk wk φk(s)
  • Best path s* = argmin(s ∈ S) F(s)
  • Many cost functions
  • how to weigh them?
  • learn from examples
  • Goal: match the expert's costs, φk(s) = φk(sE)
    (small example below)
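
A tiny, made-up example of picking the argmin path under a weighted cost; the candidates and feature values are invented for illustration.

def total_cost(w, phi):
    # F(s) = sum_k w_k * phi_k(s)
    return sum(wk * pk for wk, pk in zip(w, phi))

# hypothetical feature vectors (phi_fwd, phi_rev, phi_curv) per candidate path
candidates = {
    "direct":  (10.0, 0.0, 2.0),
    "detour":  (14.0, 0.0, 0.5),
    "reverse": ( 4.0, 6.0, 1.0),
}
w = (1.0, 3.0, 2.0)  # reversing weighted 3x as costly as driving forward
best = min(candidates, key=lambda s: total_cost(w, candidates[s]))
# -> "direct" (cost 14.0) beats "detour" (15.0) and "reverse" (24.0)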

14
Apprenticeship learning
  1. Random weights w(0)
  2. Find path s(i) = argmin(s) F(s) under the
     current weights
  3. Sum costs µ(i)k = φk(s(i))
  4. Find new weights w(j+1) that push µk toward µEk
  5. Repeat until ‖w(j+1) − w(j)‖ ≤ ε
     (schematic code below)
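
A schematic of the loop in Python. The weight update is a deliberately simple stand-in (raise the weight of any feature on which the planner pays more than the expert), not the paper's max-margin or projection update; plan() and features() are assumed to exist.

def apprenticeship(mu_E, plan, features, lr=0.1, eps=1e-3, max_iters=100):
    K = len(mu_E)
    w = [1.0 / K] * K                      # 1. initial weights
    for _ in range(max_iters):
        s = plan(w)                        # 2. best path under current costs
        mu = features(s)                   # 3. feature costs of that path
        gap = [m - e for m, e in zip(mu, mu_E)]
        if sum(g * g for g in gap) ** 0.5 <= eps:
            break                          # 5. costs match the expert's
        # 4. raise weights on features where we pay more than the expert
        w = [max(0.0, wk + lr * g) for wk, g in zip(w, gap)]
        z = sum(w) or 1.0                  # keep weights normalized
        w = [wk / z for wk in w]
    return w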

15
Results
  • Nice
  • Sloppy
  • Backwards

18
Conclusion
  • Always performs (to within ε) as well as the expert!
  • if ‖µ − µE‖ ≤ ε and ‖w‖ ≤ 1, then by Cauchy-Schwarz
    |w·µ − w·µE| ≤ ε
  • Algorithm is difficult to understand
  • Paper uses confusing notation

19
EOF
20
More information
  • Apprenticeship learning via inverse reinforcement
    learning, Pieter Abbeel, Andrew Y. Ng (2004)
  • Maximal margin

21
More information
  • Apprenticeship learning via inverse reinforcement
    learning, Pieter Abbeel, Andrew Y. Ng (2004)
  • Projection method

22
Apprenticeship learning
  • random weights w(0)
  • find path s(i) = argmin(s) Σk w(i)k φk(s)
  • sum costs µ(i)k = φk(s(i))
  • find weights min(w,x) ‖w‖ s.t. µk = Σj xj µ(j)k
    (a convex combination of the earlier cost vectors)
    with wk = µk − µEk
  • repeat with w(j+1) = w / ‖w‖
  • combine the paths found along the way
    (projection step sketched below)
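
One reading of the projection variant, sketched under stated assumptions (cost features rather than the original reward features; the function name and the running projection µ̄, here mu_bar, are mine): project µE onto the segment between mu_bar and the newest µ, then set w per the wk = µk − µEk line above and normalize.

def projection_step(mu_E, mu_bar, mu):
    # project mu_E onto the segment between the previous projection mu_bar
    # and the newest feature-cost vector mu
    d = [m - b for m, b in zip(mu, mu_bar)]
    num = sum(di * (e - b) for di, e, b in zip(d, mu_E, mu_bar))
    den = sum(di * di for di in d) or 1.0
    t = max(0.0, min(1.0, num / den))      # clamp to the segment
    new_bar = [b + t * di for b, di in zip(mu_bar, d)]
    w = [b - e for b, e in zip(new_bar, mu_E)]   # w_k = mu_k - muE_k
    norm = sum(x * x for x in w) ** 0.5 or 1.0
    return [x / norm for x in w], new_bar        # normalized w, new mu_bar
# start with mu_bar = mu(0) and call once per iteration of the loop above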