Title: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng
- 1. Motivating Application
- Planning footsteps for a quadruped robot over challenging, irregular, previously unseen terrain
- Good footsteps need to properly trade off several features: slope, proximity to drop-offs, stability of the robot's pose, etc.
- It is highly non-trivial to hand-specify the reward function for a planner, since this requires manually determining relative weights for all features
- 2. Apprenticeship Learning Background
- Key idea of apprenticeship learning: it is often easier to demonstrate good behavior than to specify a reward function that induces this behavior
- Two factors make apprenticeship learning hard to apply to large, complex problems such as quadruped planning:
- It is very difficult, even for a domain expert, to specify a good complete path (e.g., a full set of footsteps across the terrain)
- Even given a reward function, planning (e.g., finding a complete set of footsteps) is a hard, high-dimensional task
- 3. Hierarchical Apprenticeship Learning: Main Idea
- Decompose the planning task into multiple levels of abstraction
- Demonstrate good behavior at each level separately
- High-level demonstration: demonstrate a body path across the terrain (it is easier to specify a path in the reduced, abstract state space than in the full state space)
- Low-level demonstration: demonstrate greedy local footsteps at a few key locations (it is easier to demonstrate greedy actions than long-term optimal actions)

[Figure: Step 1 (high level): plan a path for the center of the robot body from the initial position to the goal. Step 2 (low level): plan footsteps along the body path, given the current foot positions, with individual footsteps specified by the teacher.]

- 4. Convex Formulation
- Two assumptions on the reward function:
- Reward is linear in the state features
- High-level rewards are averages of the low-level rewards
- High-level demonstrations imply constraints on the value function
- Low-level demonstrations imply constraints on the reward function
- The high- and low-level constraints (plus slack variables) combine to form a single, unified convex optimization problem
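Under these two assumptions, the combined problem can be sketched as a max-margin quadratic program. This is a sketch only: the symbols $\Phi$, $\phi$, $\lambda_H$, $\lambda_L$ are illustrative rather than taken from the poster, with $\Phi(p)$ the averaged state features along a body path $p$, $p^\ast$ a teacher-demonstrated path, and $s^\ast$ a teacher-demonstrated greedy footstep:

```latex
\begin{aligned}
\min_{w,\,\xi,\,\nu}\quad & \tfrac{1}{2}\|w\|^2 \;+\; \lambda_H \sum_i \xi_i \;+\; \lambda_L \sum_j \nu_j \\
\text{s.t.}\quad & w^\top \Phi(p_i^\ast) \;\ge\; w^\top \Phi(p_i) + 1 - \xi_i
    && \text{(high level: teacher's path preferred)} \\
& w^\top \phi(s_j^\ast) \;\ge\; w^\top \phi(s_j) + 1 - \nu_j
    && \text{(low level: teacher's greedy footstep preferred)} \\
& \xi_i \ge 0, \quad \nu_j \ge 0
\end{aligned}
```

Because the objective is quadratic and every constraint is linear in $w$ and the slack variables, the problem remains convex no matter how many demonstrations are added.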
- 5. Experimental Results

Multi-room Grid World
- 10x10 rooms connected by doors, where each room is itself a 10x10 grid world
- The high-level demonstration shows only the room-to-room path (generated using the true reward function)
- The low-level demonstration shows only the local greedy action at the grid level

Quadruped Robot
- Evaluated the algorithm on easier terrain for training and harder terrain for testing
- On the training terrain, demonstrated a single high-level body path and 20 greedy low-level foot placements (10 minutes to gather all the data)
- The system achieves state-of-the-art performance on this task

[Figure: planned footsteps across the terrain]
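To illustrate how low-level greedy demonstrations alone can pin down a linear reward, here is a small self-contained sketch in the spirit of the grid-world setup. Everything in it (the feature vectors, `TRUE_W`, the perceptron-style update) is illustrative and is not the paper's actual solver, which combines both constraint types into a single convex program:

```python
# Illustrative sketch: learn a linear reward w from greedy-action
# demonstrations via a simple perceptron-style update. Names and
# settings here are assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# The (hidden) true reward is linear in features: R(s) = TRUE_W . phi(s).
TRUE_W = np.array([1.0, -2.0, 0.5])

def teacher_greedy(candidates):
    """Teacher demonstrates the greedy action: the candidate successor
    state maximizing the true (unknown to the learner) linear reward."""
    return max(range(len(candidates)), key=lambda i: TRUE_W @ candidates[i])

# Each demonstration: a set of candidate successor-state feature vectors
# plus the index of the teacher's greedy choice among them.
demos = []
for _ in range(200):
    cands = rng.normal(size=(5, 3))
    demos.append((cands, teacher_greedy(cands)))

# Enforce the low-level constraints w . phi(s*) >= w . phi(s_alt)
# by updating w whenever the learner's greedy choice disagrees.
w = np.zeros(3)
for _ in range(50):
    for cands, best in demos:
        pred = max(range(len(cands)), key=lambda i: w @ cands[i])
        if pred != best:
            w += cands[best] - cands[pred]

# Fraction of demonstrations on which the learned w now ranks the
# teacher's greedy action first.
agree = sum(max(range(len(c)), key=lambda i: w @ c[i]) == b for c, b in demos)
print(agree / len(demos))
```

The demonstrations are separable by `TRUE_W` by construction, so the update converges and the learned `w` aligns with the true reward direction; the paper's formulation replaces this update with margin constraints inside one convex optimization.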
- 6. Related Work
- Apprenticeship learning: Abbeel and Ng (2004), Ratliff et al. (2006, 2007), Neu and Szepesvari (2007), Syed and Schapire (2007)
- Hierarchical reinforcement learning: Parr and Russell (1998), Sutton et al. (1999), Dietterich (2000), Barto and Mahadevan (2003)
- 7. Conclusion
- Presented a novel algorithm for applying apprenticeship learning to large, complex domains via hierarchical decomposition
- Demonstrated the algorithm on a multi-room grid world and a challenging quadruped task, where we achieve state-of-the-art performance
- More generally, the algorithm is applicable whenever the reward function can be hierarchically decomposed as described above
[Figure: performance on training and testing terrain, comparing no planning, high-level (body path) constraints only, low-level (footstep) constraints only, and full hierarchical apprenticeship learning]