Skill Acquisition via Transfer Learning and Advice Taking

Description:

distBetween(me,teammate1) = 10. distBetween(me,opponent1) = 5. action = pass(teammate2) ... Madden & Howley (AI Review 2004) Transfer via relational RL ...

Slides: 30
Provided by: david1067
Learn more at: https://ftp.cs.wisc.edu
Transcript and Presenter's Notes

1
Skill Acquisition via Transfer Learning and Advice Taking
Lisa Torrey, Jude Shavlik, Trevor Walker
University of Wisconsin-Madison, USA
Richard Maclin
University of Minnesota-Duluth, USA
2
Transfer Learning
Agent learns Task A
3
Transfer Learning
The goal for the target task
[Plot: performance vs. training, with one curve labeled "with transfer" and one labeled "without transfer"]
4
Reinforcement Learning Overview
Observe world state (described by a set of features)
Take an action
Receive a reward
Use the rewards to estimate the Q-values of actions in states
Policy: choose the action with the highest Q-value in the current state
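The loop above can be sketched in a few lines of Python. This is a minimal tabular illustration, assuming a standard one-step Q-learning update; the slides do not state the update rule, and the work itself learns Q-values over state features (with KBKR) rather than from a table. The state names, action names, `alpha` and `gamma` are hypothetical.

```python
from collections import defaultdict

def greedy_action(q, state, actions):
    """Policy from the slide: choose the action with the highest Q-value."""
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update (a standard rule, assumed for illustration)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])

# Hypothetical toy states and actions, not taken from the RoboCup tasks.
actions = ["pass", "shoot"]
q = defaultdict(float)          # every Q-value starts at 0.0
q_update(q, "near_goal", "shoot", 1.0, "scored", actions)
best = greedy_action(q, "near_goal", actions)
print(best, q[("near_goal", "shoot")])
```

After a single rewarded `shoot`, its estimated Q-value rises above that of `pass`, so the greedy policy picks it in that state.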
5
Transfer in Reinforcement Learning
  • What knowledge will we transfer from the source?
  • Q-functions (Taylor & Stone 2005)
  • Policies (Torrey et al. 2005)
  • Skills (this work)
  • How will we extract that knowledge from the
    source?
  • From Q-functions (Torrey et al. 2005)
  • From observed behavior (this work)
  • How will we apply that knowledge in the target?
  • Model reuse (Taylor & Stone 2005)
  • Advice taking (Torrey et al. 2005, this work)

6
Advice Taking
  • Advice: instructions for the learner

In these states, Q(action1) > Q(action2)
IF condition THEN prefer action
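One way to read a piece of advice is as a preference that costs something when the learner's Q-values disagree with it. The sketch below is only an illustration of that reading, not the KBKR formulation; the function name, the margin, the feature name and the Q-values are all hypothetical.

```python
def advice_penalty(state, q_values, condition, preferred, other, margin=0.0):
    """Return how far the Q-values fall short of the advice in this state.

    Advice form: IF condition THEN prefer `preferred` over `other`,
    read as Q(preferred) >= Q(other) + margin wherever the condition holds.
    """
    if not condition(state):
        return 0.0  # advice says nothing about states outside its condition
    return max(0.0, q_values[other] + margin - q_values[preferred])

# Hypothetical advice: IF distBetween(me,goal) < 10 THEN prefer shoot over pass
close_to_goal = lambda s: s["distBetween(me,goal)"] < 10

q_values = {"shoot": 0.2, "pass": 0.5}
violated = advice_penalty({"distBetween(me,goal)": 6.0}, q_values, close_to_goal, "shoot", "pass")
ignored = advice_penalty({"distBetween(me,goal)": 25.0}, q_values, close_to_goal, "shoot", "pass")
print(violated, ignored)
```

A learner that treats the penalty as a soft cost can still override advice that turns out to be wrong, which is the concern raised on the "What if User Advice is Bad?" slide.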
7
Experimental Domain: RoboCup
BreakAway (BA): score a goal (Maclin et al. 2005)
Different objectives, but a transferable skill: passing to teammates
8
A Challenge for Skill Transfer
  • Shared skills are not exactly the same
  • Skills have general and specific aspects
  • Aspects of the pass skill in RoboCup:
  • General: teammate must be open
  • Game-specific: where teammate should be located
  • Player-specific: whether teammate is nearest or
    furthest

9
Addressing the Challenge
  • We focus on learning general skill aspects
  • These should transfer better
  • We learn skills that apply to multiple players
  • This generalizes over player-specific aspects
  • We allow humans to provide information
  • They can point out game-specific aspects

10
Human-Provided Information
  • User provides a mapping to show task similarities
  • May also provide user advice about task
    differences

Pass | Ø | Ø
Pass towards goal | Move towards goal | Shoot at goal
11
Our Transfer Algorithm
Observe source task games to learn skills
Translate learned skills into transfer advice
Create advice for the target task
If there is user advice, add it in
Learn target task with KBKR
12
Learning Skills By Observation
  • Source-task games are sequences of (state, action) pairs
  • Learning skills is like learning to classify
    states by their correct actions
  • We use Inductive Logic Programming to learn
    classifiers
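Treating skill learning as classification means turning observed (state, action) pairs into labeled examples. The sketch below shows one simple way to do that, assuming every state where the skill's action was taken is a positive example and every other state is a negative one; the actual example-selection criteria are not given in this transcript, and the data layout and values are hypothetical.

```python
def make_examples(game, skill):
    """Split a game's (state, action) pairs into examples for one skill.

    `game` is a list of (state, (action_name, argument)) pairs.
    Positive examples: states where the skill's action was taken.
    Negative examples: states where some other action was taken.
    """
    positives, negatives = [], []
    for state, (name, argument) in game:
        if name == skill:
            positives.append((state, argument))
        else:
            negatives.append((state, argument))
    return positives, negatives

# Hypothetical trace: states are feature dictionaries.
game = [
    ({"distBetween(me,teammate1)": 10, "distBetween(me,opponent1)": 5}, ("pass", "teammate1")),
    ({"distBetween(me,teammate1)": 3, "distBetween(me,opponent1)": 20}, ("hold", None)),
    ({"distBetween(me,teammate1)": 16, "distBetween(me,opponent1)": 4}, ("pass", "teammate1")),
]
positives, negatives = make_examples(game, "pass")
print(len(positives), len(negatives))
```

An ILP system would then search for a first-order rule that covers the positives and excludes the negatives.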

13
Advantages of ILP
  • Can produce first-order rules for skills
  • Capture only the essential aspects of the skill
  • We expect these aspects to transfer better
  • Can incorporate background knowledge

pass(Teammate)  vs.  pass(teammate1), ..., pass(teammateN)
14
Preparing Datasets for ILP
15
Example of a Skill Learned
pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
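To make the rule's meaning concrete, here is a small Python check of it against one state. The thresholds come from the rule above; the state layout, player names and feature values are hypothetical. Note that `Opponent` appears only in the body, so it is read existentially: the rule fires if at least one opponent is closer than 7.

```python
def pass_rule(state, teammate, opponents):
    """Evaluate the learned pass(Teammate) rule for one candidate teammate."""
    far_enough = state["distBetween"][("me", teammate)] > 14
    good_angle = 30 < state["passAngle"][teammate] < 150
    # Existential variable: some opponent is within distance 7 of me.
    pressured = any(state["distBetween"][("me", o)] < 7 for o in opponents)
    return far_enough and good_angle and pressured

# Hypothetical state with two teammates and two opponents.
state = {
    "distBetween": {
        ("me", "teammate1"): 18.0,
        ("me", "teammate2"): 10.0,
        ("me", "opponent1"): 5.0,
        ("me", "opponent2"): 20.0,
    },
    "passAngle": {"teammate1": 45.0, "teammate2": 45.0},
}
opponents = ["opponent1", "opponent2"]
print(pass_rule(state, "teammate1", opponents))
print(pass_rule(state, "teammate2", opponents))
```

The same rule body is reused for every teammate, which is what makes the first-order form more general than one rule per player.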
16
Technical Challenges
  • KBKR requires propositional advice
  • We instantiate each rule head
  • Variables in rule bodies create disjunctions
  • We use tile features to translate them
  • Variables can appear multiple times
  • We create new features to translate them

17
Two Experimental Scenarios
4-on-3 MKA: Pass | Ø | Ø
3-on-2 BA:  Pass towards goal | Move towards goal | Shoot at goal

3-on-2 MD:  Pass | MoveAhead | Ø
3-on-2 BA:  Pass | MoveAhead | Shoot at goal
18
Skill Transfer Results
[Plot: results for transfer from MKA, transfer from MD, and without transfer]
19
Breakdown of MKA Results
20
What if User Advice is Bad?
21
Related Work
  • Q-function transfer in RoboCup
  • Taylor & Stone (AAMAS 2005, AAAI 2005)
  • Transfer via policy reuse
  • Fernandez & Veloso (AAMAS 2006, ICML workshop
    2006)
  • Madden & Howley (AI Review 2004)
  • Transfer via relational RL
  • Driessens et al. (ICML workshop 2006)

22
Summary of Contributions
  • Transfer of shared skills in high-level logic
  • Despite differences in shared skills
  • Demonstration of the value of user guidance
  • Easy to give and beneficial
  • Effective transfer in the RoboCup domain
  • Challenging and dissimilar tasks

23
Future Work
  • Learn more general skills by combining multiple
    source tasks
  • Compare several transfer methods on RoboCup
    scenarios of varying difficulty
  • Reach similar levels of transfer with less user
    input

24
Acknowledgements
  • DARPA Grant HR0011-04-1-0007
  • US Naval Research Laboratory Grant
    N00173-06-1-G002

Thank You
25
User Advice
IF distBetween(me,goal) < 10 AND
   angle(goal, me, goalie) > 40
THEN prefer shoot
IF distBetween(me,goal) > 10
THEN prefer move_ahead
IF [transferred conditions] AND
   distBetween(Teammate,goal) < distBetween(me,goal)
THEN prefer pass(Teammate)
([transferred conditions] is the part that came from transfer)
26
Feature Tiling
[Diagram: an original feature, ranging from its min value to its max value, divided into tilings: Tiling 1, Tiling 2, ..., Tiling 8 (16 tiles), Tiling 9, Tiling 10 (8 tiles), Tiling 11 (8 tiles)]
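Tiling turns one numeric feature into several Boolean interval features. The sketch below assumes equal-width tiles over the feature's range, which may not match the exact tiling scheme in the diagram; the function name and the example range are hypothetical.

```python
def tile_features(value, lo, hi, n_tiles):
    """Return one 0/1 feature per tile; exactly one is 1 for a value in [lo, hi].

    Assumes equal-width tiles; the last tile also includes the max value.
    """
    width = (hi - lo) / n_tiles
    features = []
    for i in range(n_tiles):
        left = lo + i * width
        right = lo + (i + 1) * width
        is_last = i == n_tiles - 1
        inside = left <= value < right or (is_last and value == hi)
        features.append(1 if inside else 0)
    return features

# Hypothetical feature with range [0, 32], tiled at two resolutions.
coarse = tile_features(5.0, 0.0, 32.0, 8)    # tiles of width 4
fine = tile_features(5.0, 0.0, 32.0, 16)     # tiles of width 2
print(coarse)
print(fine)
```

Using several tilings at different resolutions gives the learner both coarse and fine interval tests on the same underlying feature.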
27
Propositionalizing Rules
  • Step 1: rule head

pass(Teammate) :- distBetween(me, Teammate) > 14, ...
becomes
pass(teammate1) :- distBetween(me, teammate1) > 14, ...
...
pass(teammateN) :- distBetween(me, teammateN) > 14, ...
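Instantiating the rule head amounts to substituting each concrete teammate for the head variable throughout the rule. A minimal sketch using text substitution follows; representing rules as strings is an assumption made here for brevity, and the helper name is hypothetical.

```python
import re

def instantiate_head(rule, variable, constants):
    """Produce one propositional rule per constant by replacing the variable."""
    # Word boundaries keep the match to the whole variable name; the match is
    # case-sensitive, so the capitalized variable never matches constants.
    pattern = re.compile(r"\b" + re.escape(variable) + r"\b")
    return [pattern.sub(constant, rule) for constant in constants]

rule = "pass(Teammate) :- distBetween(me, Teammate) > 14"
ground_rules = instantiate_head(rule, "Teammate", ["teammate1", "teammate2"])
for ground_rule in ground_rules:
    print(ground_rule)
```

Every occurrence of the head variable in the body is replaced too, so each ground rule refers to a single named teammate.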

28
Propositionalizing Rules
  • Step 2: single-variable disjunctions

distBetween(me, Opponent) < 7
becomes
distBetween(me,opponent1) < 7 OR ... OR distBetween(me,opponentN) < 7
which is expressed with tile features as
distBetween(me,opponent1)[0,7] + ... + distBetween(me,opponentN)[0,7] ≥ 1
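The step above replaces a disjunction with a sum over Boolean tile features. The sketch below checks that the two forms agree on a couple of states; the half-open interval convention, the helper names and the distances are assumptions for illustration.

```python
def tile(value, lo, hi):
    """Boolean tile feature: 1 if lo <= value < hi, else 0."""
    return 1 if lo <= value < hi else 0

def some_opponent_close(distances, lo=0, hi=7):
    """Disjunction over opponents written as a linear constraint: sum of tiles >= 1."""
    return sum(tile(d, lo, hi) for d in distances.values()) >= 1

# Hypothetical distances from me to each opponent.
pressured = {"opponent1": 5, "opponent2": 12}
unpressured = {"opponent1": 9, "opponent2": 12}

print(some_opponent_close(pressured))
print(some_opponent_close(unpressured))
```

Because each tile feature is 0 or 1, the sum is at least 1 exactly when at least one disjunct holds, which keeps the advice in the linear, propositional form that KBKR requires.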
29
Propositionalizing Rules
  • Step 3: linked-variable disjunctions

distBetween(me, Player) > 14,
distBetween(Player, goal) < 10
becomes
newFeature(player1) + ... + newFeature(playerN) ≥ 1
where
newFeature(Player) :-
    Dist1 is distBetween(me, Player),
    Dist2 is distBetween(Player, goal),
    Dist1 > 14, Dist2 < 10.
Add newFeature to the target task feature space
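When the same variable links two conditions, both must hold for the same player, so separate per-condition disjunctions are not enough. The sketch below contrasts the two readings on a hypothetical state; the thresholds come from the rule above, while the state layout and player names are assumptions.

```python
def new_feature(state, player):
    """1 if this single player satisfies both linked conditions, else 0."""
    dist1 = state[("me", player)]
    dist2 = state[(player, "goal")]
    return 1 if dist1 > 14 and dist2 < 10 else 0

def linked_condition(state, players):
    """Some one player satisfies both conditions: sum of new features >= 1."""
    return sum(new_feature(state, p) for p in players) >= 1

def unlinked_condition(state, players):
    """Incorrect reading: each condition may be satisfied by a different player."""
    return (any(state[("me", p)] > 14 for p in players)
            and any(state[(p, "goal")] < 10 for p in players))

# Hypothetical state: player1 is far from me, player2 is near the goal.
players = ["player1", "player2"]
state = {
    ("me", "player1"): 20, ("player1", "goal"): 15,
    ("me", "player2"): 5,  ("player2", "goal"): 8,
}
print(linked_condition(state, players), unlinked_condition(state, players))
```

Here no single player meets both conditions, so the linked form is false while the unlinked form is wrongly true, which is why a new combined feature is created.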