Title: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng
- 1. Motivating Application
- Planning footsteps for a quadruped robot over challenging, irregular, previously unseen terrain
- Good footsteps need to properly trade off several features: slope, proximity to drop-offs, stability of the robot's pose, etc.
- It is highly non-trivial to hand-specify the reward function for a planner, since this requires manually determining relative weights for all features
- 2. Apprenticeship Learning Background
- Key idea of apprenticeship learning: it is often easier to demonstrate good behavior than to specify a reward function that induces this behavior
- Two factors make apprenticeship learning hard to apply to large, complex problems such as quadruped planning:
- It is very difficult, even for a domain expert, to specify a good complete path (e.g., a full set of footsteps across the terrain)
- Even given a reward function, planning (e.g., finding a complete set of footsteps) is a hard, high-dimensional task
- 3. Hierarchical Apprenticeship Learning: Main Idea
- Decompose the planning task into multiple levels of abstraction
- Demonstrate good behavior at each level separately
- High-level demonstration: demonstrate a body path across the terrain (it is easier to specify a path in the reduced, abstract state space than in the full state space)
- Low-level demonstration: demonstrate greedy local footsteps at a few key locations (it is easier to demonstrate greedy actions than long-term optimal actions)

[Figure: Step 1 (high level): plan a path for the center of the robot body from the initial position to the goal. Step 2 (low level): plan footsteps along the body path, given the current foot positions, with individual footsteps specified by the teacher.]

- 4. Convex Formulation
- Two assumptions on the reward function:
- Reward is linear in the state features
- High-level rewards are averages of the low-level rewards
- High-level demonstrations imply constraints on the value function
- Low-level demonstrations imply constraints on the reward function
- The high- and low-level constraints (plus slack variables) combine to form a single, unified convex optimization problem
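Under these two assumptions, the combined problem can be sketched as a max-margin quadratic program. This is a sketch only: the symbols $\Phi$, $\phi$, $\lambda_H$, $\lambda_L$ are illustrative rather than taken from the poster, with $\Phi(p)$ the averaged state features along a body path $p$, $p^\ast$ a teacher-demonstrated path, and $s^\ast$ a teacher-demonstrated greedy footstep:

```latex
\begin{aligned}
\min_{w,\,\xi,\,\nu}\quad & \tfrac{1}{2}\|w\|^2 \;+\; \lambda_H \sum_i \xi_i \;+\; \lambda_L \sum_j \nu_j \\
\text{s.t.}\quad & w^\top \Phi(p_i^\ast) \;\ge\; w^\top \Phi(p_i) + 1 - \xi_i
    && \text{(high level: teacher's path preferred)} \\
& w^\top \phi(s_j^\ast) \;\ge\; w^\top \phi(s_j) + 1 - \nu_j
    && \text{(low level: teacher's greedy footstep preferred)} \\
& \xi_i \ge 0, \quad \nu_j \ge 0
\end{aligned}
```

Because the objective is quadratic and every constraint is linear in $w$ and the slack variables, the problem remains convex no matter how many demonstrations are added.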
- 5. Experimental Results

Multi-room Grid World
- 10x10 rooms connected by doors, where each room is itself a 10x10 grid world
- The high-level demonstration shows only the room-to-room path (generated using the true reward function)
- The low-level demonstration shows only the local greedy action at the grid level

Quadruped Robot
- Evaluated the algorithm on easier terrain for training and harder terrain for testing
- On the training terrain, demonstrated a single high-level body path and 20 greedy low-level foot placements (10 minutes to gather all the data)
- The system achieves state-of-the-art performance on this task

[Figure: planned footsteps across the terrain]
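To illustrate how low-level greedy demonstrations alone can pin down a linear reward, here is a small self-contained sketch in the spirit of the grid-world setup. Everything in it (the feature vectors, `TRUE_W`, the perceptron-style update) is illustrative and is not the paper's actual solver, which combines both constraint types into a single convex program:

```python
# Illustrative sketch: learn a linear reward w from greedy-action
# demonstrations via a simple perceptron-style update. Names and
# settings here are assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# The (hidden) true reward is linear in features: R(s) = TRUE_W . phi(s).
TRUE_W = np.array([1.0, -2.0, 0.5])

def teacher_greedy(candidates):
    """Teacher demonstrates the greedy action: the candidate successor
    state maximizing the true (unknown to the learner) linear reward."""
    return max(range(len(candidates)), key=lambda i: TRUE_W @ candidates[i])

# Each demonstration: a set of candidate successor-state feature vectors
# plus the index of the teacher's greedy choice among them.
demos = []
for _ in range(200):
    cands = rng.normal(size=(5, 3))
    demos.append((cands, teacher_greedy(cands)))

# Enforce the low-level constraints w . phi(s*) >= w . phi(s_alt)
# by updating w whenever the learner's greedy choice disagrees.
w = np.zeros(3)
for _ in range(50):
    for cands, best in demos:
        pred = max(range(len(cands)), key=lambda i: w @ cands[i])
        if pred != best:
            w += cands[best] - cands[pred]

# Fraction of demonstrations on which the learned w now ranks the
# teacher's greedy action first.
agree = sum(max(range(len(c)), key=lambda i: w @ c[i]) == b for c, b in demos)
print(agree / len(demos))
```

The demonstrations are separable by `TRUE_W` by construction, so the update converges and the learned `w` aligns with the true reward direction; the paper's formulation replaces this update with margin constraints inside one convex optimization.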
- 6. Related Work
- Apprenticeship learning: Abbeel and Ng (2004), Ratliff et al. (2006, 2007), Neu and Szepesvari (2007), Syed and Schapire (2007)
- Hierarchical reinforcement learning: Parr and Russell (1998), Sutton et al. (1999), Dietterich (2000), Barto and Mahadevan (2003)
- 7. Conclusion
- Presented a novel algorithm for applying apprenticeship learning to large, complex domains via hierarchical decomposition
- Demonstrated the algorithm on a multi-room grid world and a challenging quadruped task, where we achieve state-of-the-art performance
- More generally, the algorithm is applicable whenever the reward function can be hierarchically decomposed as described above
[Figure: performance on training and testing terrain, comparing no planning, high-level (body path) constraints only, low-level (footstep) constraints only, and full hierarchical apprenticeship learning]