Title: Skill Acquisition via Transfer Learning and Advice Taking
1Skill Acquisition via Transfer Learning and Advice Taking
Lisa Torrey, Jude Shavlik, Trevor Walker (University of Wisconsin-Madison, USA)
Richard Maclin (University of Minnesota-Duluth, USA)
2Transfer Learning
Agent learns Task A
3Transfer Learning
The goal for the target task
[Plot: performance versus training, with transfer and without transfer]
4Reinforcement Learning Overview
- Observe world state: described by a set of features
- Take an action. Policy: choose the action with the highest Q-value in the current state
- Receive a reward: use the rewards to estimate the Q-values of actions in states
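The loop on this slide can be sketched as tabular Q-learning with a greedy policy. This is a minimal illustration, not the learner used in the talk: the update rule, the learning rate `alpha`, the discount `gamma`, and the toy state and action names are all assumptions.

```python
from collections import defaultdict

def greedy_action(q, state, actions):
    """Policy: choose the action with the highest Q-value in the current state."""
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Use the observed reward to move Q(state, action) toward a better estimate."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])

# Toy usage with hypothetical states and actions.
actions = ["pass", "move", "shoot"]
q = defaultdict(float)  # all Q-values start at 0.0
q_update(q, "near_goal", "shoot", reward=1.0, next_state="end", actions=actions)
best = greedy_action(q, "near_goal", actions)
```

After one rewarded step, the greedy policy already prefers the rewarded action in that state.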
5Transfer in Reinforcement Learning
- What knowledge will we transfer from the source?
- Q-functions (Taylor & Stone 2005)
- Policies (Torrey et al. 2005)
- Skills (this work)
- How will we extract that knowledge from the source?
- From Q-functions (Torrey et al. 2005)
- From observed behavior (this work)
- How will we apply that knowledge in the target?
- Model reuse (Taylor & Stone 2005)
- Advice taking (Torrey et al. 2005, this work)
6Advice Taking
- Advice: instructions for the learner
In these states, Q(action1) > Q(action2)
IF condition THEN prefer action
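One way to picture a piece of advice is as a condition over state features plus a preferred action. The sketch below is illustrative only: the `Advice` class, the feature name `dist_to_goal`, and the `is_respected` helper are hypothetical. It checks whether a set of Q-values agrees with the advice and does not show how an advice-taking learner incorporates it.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, float]

@dataclass
class Advice:
    """IF condition(state) THEN prefer `preferred` over `other`."""
    condition: Callable[[State], bool]
    preferred: str
    other: str

    def is_respected(self, state: State, q_values: Dict[str, float]) -> bool:
        # Advice only constrains states where its condition holds.
        if not self.condition(state):
            return True
        return q_values[self.preferred] > q_values[self.other]

# Hypothetical advice: when close to the goal, prefer shooting over passing.
advice = Advice(lambda s: s["dist_to_goal"] < 10, preferred="shoot", other="pass")

near = {"dist_to_goal": 6.0}
far = {"dist_to_goal": 25.0}
q = {"shoot": 0.2, "pass": 0.5}
near_ok = advice.is_respected(near, q)  # condition holds, Q-values disagree
far_ok = advice.is_respected(far, q)    # condition does not hold
```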
7Experimental Domain: RoboCup
BreakAway (BA)
Score a goal (Maclin et al. 2005)
Different objectives, but a transferable skill: passing to teammates
8A Challenge for Skill Transfer
- Shared skills are not exactly the same
- Skills have general and specific aspects
- Aspects of the pass skill in RoboCup
- General: teammate must be open
- Game-specific: where teammate should be located
- Player-specific: whether teammate is nearest or furthest
9Addressing the Challenge
- We focus on learning general skill aspects
- These should transfer better
- We learn skills that apply to multiple players
- This generalizes over player-specific aspects
- We allow humans to provide information
- They can point out game-specific aspects
10Human-Provided Information
- User provides a mapping to show task similarities
- May also provide user advice about task
differences
Pass               Ø                  Ø
Pass towards goal  Move towards goal  Shoot at goal
11Our Transfer Algorithm
Observe source task games to learn skills
Translate learned skills into transfer advice
Create advice for the target task
If there is user advice, add it in
Learn target task with KBKR
12Learning Skills By Observation
- Source-task games are sequences of (state, action) pairs
- Learning skills is like learning to classify states by their correct actions
- We use Inductive Logic Programming to learn classifiers
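Treating skill learning as classification means turning each observed (state, action) pair into a labeled example. The sketch below shows one plausible way to do that for a single skill. The dictionary-based state encoding, the `hold` action, and the rule that every non-matching step counts as a negative example are assumptions; an ILP system would consume relational facts rather than Python tuples.

```python
def examples_for_skill(games, skill):
    """Split observed steps into positive and negative examples for one skill.

    `games` is a list of games; each game is a list of (state, action) pairs.
    """
    positives, negatives = [], []
    for game in games:
        for state, action in game:
            (positives if action == skill else negatives).append(state)
    return positives, negatives

# Hypothetical source-task data: two short games.
games = [
    [({"dist": 15}, "pass"), ({"dist": 5}, "hold")],
    [({"dist": 20}, "pass")],
]
pos, neg = examples_for_skill(games, "pass")
```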
13Advantages of ILP
- Can produce first-order rules for skills
- Capture only the essential aspects of the skill
- We expect these aspects to transfer better
- Can incorporate background knowledge
pass(teammate1), ..., pass(teammateN)  vs.  pass(Teammate)
14Preparing Datasets for ILP
15Example of a Skill Learned
pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
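Read as logic, the rule says that passing to a teammate is appropriate when some binding of `Teammate` and `Opponent` satisfies all four conditions. The sketch below evaluates that reading in Python. The state layout (dictionaries of precomputed distances and angles) and the numbers in it are assumptions made for illustration.

```python
def can_pass(state, teammate):
    """True if pass(teammate) is supported by the learned rule in this state."""
    far_enough = state["dist_me"][teammate] > 14
    angle_ok = 30 < state["pass_angle"][teammate] < 150
    # Opponent is an unbound variable: any opponent within 7 satisfies it.
    opponent_close = any(d < 7 for d in state["dist_me_opponent"].values())
    return far_enough and angle_ok and opponent_close

state = {
    "dist_me": {"teammate1": 18.0, "teammate2": 9.0},
    "pass_angle": {"teammate1": 90.0, "teammate2": 45.0},
    "dist_me_opponent": {"opponent1": 5.0, "opponent2": 12.0},
}
recommended = [t for t in state["dist_me"] if can_pass(state, t)]
```

Only `teammate1` qualifies here, because `teammate2` is closer than 14.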
16Technical Challenges
- KBKR requires propositional advice
- We instantiate each rule head
- Variables in rule bodies create disjunctions
- We use tile features to translate them
- Variables can appear multiple times
- We create new features to translate them
17Two Experimental Scenarios
4-on-3 MKA:  Pass               Ø                  Ø
3-on-2 BA:   Pass towards goal  Move towards goal  Shoot at goal

3-on-2 MD:   Pass  MoveAhead  Ø
3-on-2 BA:   Pass  MoveAhead  Shoot at goal
18Skill Transfer Results
[Plot legend: From MKA, From MD, Without transfer]
19Breakdown of MKA Results
20What if User Advice is Bad?
21Related Work
- Q-function transfer in RoboCup
- Taylor & Stone (AAMAS 2005, AAAI 2005)
- Transfer via policy reuse
- Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
- Madden & Howley (AI Review 2004)
- Transfer via relational RL
- Driessens et al. (ICML workshop 2006)
22Summary of Contributions
- Transfer of shared skills in high-level logic
- Despite differences in shared skills
- Demonstration of the value of user guidance
- Easy to give and beneficial
- Effective transfer in the RoboCup domain
- Challenging and dissimilar tasks
23Future Work
- Learn more general skills by combining multiple source tasks
- Compare several transfer methods on RoboCup scenarios of varying difficulty
- Reach similar levels of transfer with less user input
24Acknowledgements
- DARPA Grant HR0011-04-1-0007
- US Naval Research Laboratory Grant
N00173-06-1-G002
Thank You
25User Advice
IF distBetween(me, goal) < 10 AND angle(goal, me, goalie) > 40 THEN prefer shoot
IF distBetween(me, goal) > 10 THEN prefer move_ahead
IF transferred conditions AND distBetween(Teammate, goal) < distBetween(me, goal) THEN prefer pass(Teammate)
("transferred conditions" is the part that came from transfer)
26Feature Tiling
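[Diagram: an original feature, spanning its min value to its max value, divided into several tilings of different resolutions. Labels shown: Tiling 1, Tiling 2, Tiling 8 (16 tiles), Tiling 9, Tiling 10 (8 tiles), Tiling 11 (8 tiles)]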
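Tile features are Boolean features that are true when the original feature falls within a given interval. A minimal sketch of equal-width tiling follows. The function names, the equal-width choice, and the example range are assumptions, since the slide only indicates that several tilings of different sizes cover the range between the min and max values.

```python
def tile_features(value, lo, hi, n_tiles):
    """Return one Boolean per tile; assumes lo <= value <= hi.

    Tile i covers [lo + i*w, lo + (i+1)*w), where w is the tile width.
    """
    width = (hi - lo) / n_tiles
    # Clamp so that value == hi falls in the last tile.
    index = min(int((value - lo) / width), n_tiles - 1)
    return [i == index for i in range(n_tiles)]

def multi_tiling(value, lo, hi, tile_counts):
    """Concatenate tile features from several tilings of different resolution."""
    features = []
    for n in tile_counts:
        features.extend(tile_features(value, lo, hi, n))
    return features

# A distance of 5 on a hypothetical range [0, 32], tiled into 2, 4, and 8 tiles.
feats = multi_tiling(5.0, 0.0, 32.0, [2, 4, 8])
```

Exactly one tile is active per tiling, so coarse tilings capture broad ranges while finer ones capture narrow ranges.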
27Propositionalizing Rules
pass(Teammate) :- distBetween(me, Teammate) > 14, ...
pass(teammate1) :- distBetween(me, teammate1) > 14, ...
...
pass(teammateN) :- distBetween(me, teammateN) > 14, ...
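Instantiating the rule head means copying the first-order rule once per concrete teammate, substituting the constant for the head variable everywhere it appears. A minimal sketch using whole-word string substitution follows. Representing rules as strings and the helper name `instantiate_head` are assumptions for illustration only.

```python
import re

def instantiate_head(rule, variable, constants):
    """Return one copy of `rule` per constant, replacing `variable` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(variable) + r"\b")
    return [pattern.sub(constant, rule) for constant in constants]

rule = "pass(Teammate) :- distBetween(me, Teammate) > 14"
ground_rules = instantiate_head(rule, "Teammate", ["teammate1", "teammate2"])
```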
28Propositionalizing Rules
- Step 2: single-variable disjunctions
distBetween(me, Opponent) < 7
distBetween(me, opponent1) < 7 OR ... OR distBetween(me, opponentN) < 7
distBetween(me, opponent1)[0,7] + ... + distBetween(me, opponentN)[0,7] ≥ 1
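Because each tile feature is 0 or 1, "at least one opponent is within 7" is equivalent to the sum of the corresponding tile features being at least 1. The sketch below checks that equivalence on toy data. The helper names and distances are hypothetical, and the tile is taken as the half-open interval [0, 7) to match the strict inequality.

```python
def tile(value, lo, hi):
    """Tile feature: 1 if lo <= value < hi, else 0."""
    return 1 if lo <= value < hi else 0

def some_opponent_close(distances, threshold=7):
    """Disjunction over opponents, written as a linear constraint on tile features."""
    return sum(tile(d, 0, threshold) for d in distances) >= 1

close = some_opponent_close([12.0, 5.0, 20.0])
none_close = some_opponent_close([12.0, 9.0, 20.0])
```

The linear form matters because it lets a disjunction be stated over numeric features without any logical OR.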
29Propositionalizing Rules
- Step 3: linked-variable disjunctions
distBetween(me, Player) > 14, distBetween(Player, goal) < 10
newFeature(player1) + ... + newFeature(playerN) ≥ 1
newFeature(Player) :- Dist1 is distBetween(me, Player), Dist2 is distBetween(Player, goal), Dist1 > 14, Dist2 < 10.
Add to target task feature space
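When one variable links two conditions, both must hold for the same player, so the conditions cannot be tiled separately. The sketch below computes the new Boolean feature per player and then applies the same sum-at-least-1 test. The distance data and function names are hypothetical.

```python
def new_feature(dist_me_player, dist_player_goal):
    """1 if this single player satisfies both linked conditions, else 0."""
    return 1 if dist_me_player > 14 and dist_player_goal < 10 else 0

def some_player_qualifies(players):
    """`players` maps a player name to (distance from me, distance to goal)."""
    return sum(new_feature(d1, d2) for d1, d2 in players.values()) >= 1

# Each condition is met, but by different players: the rule should not fire.
split = {"player1": (20.0, 30.0), "player2": (5.0, 8.0)}
# player1 meets both conditions at once: the rule should fire.
joint = {"player1": (20.0, 8.0), "player2": (5.0, 30.0)}
split_fires = some_player_qualifies(split)
joint_fires = some_player_qualifies(joint)
```

The `split` case is what separate per-condition disjunctions would get wrong, which is why a combined feature is introduced.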