Skill Acquisition via Transfer Learning and Advice Taking

Slides: 30

Provided by: david1067

Learn more at: https://ftp.cs.wisc.edu

Category:

more less

Transcript and Presenter's Notes

Title: Skill Acquisition via Transfer Learning and Advice Taking

1
Skill Acquisition via Transfer Learning and Advice Taking
Lisa Torrey, Jude Shavlik, Trevor Walker (University of Wisconsin-Madison, USA)
Richard Maclin (University of Minnesota-Duluth, USA)
2
Transfer Learning
Agent learns Task A
3
Transfer Learning
The goal for the target task
[Figure: performance plotted against training, with transfer and without transfer]
4
Reinforcement Learning Overview
Observe world state (described by a set of features)
Take an action
Receive a reward
Policy: choose the action with the highest Q-value in the current state
Use the rewards to estimate the Q-values of actions in states
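The loop on this slide can be sketched in a few lines. This is a minimal tabular illustration, not the learner used in the talk: the one-step Q-learning update, the learning rate `alpha`, the discount `gamma`, and the state and action names are all assumptions, since the slide only says that rewards are used to estimate Q-values.

```python
from collections import defaultdict

def greedy_action(q, state, actions):
    """Policy from the slide: pick the action with the highest Q-value."""
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update (an assumed rule; the slide does not name one)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

# Toy usage with hypothetical state and action names.
actions = ["pass", "hold"]
q = defaultdict(float)  # every Q-value starts at 0.0
q_update(q, "s0", "pass", reward=1.0, next_state="s1", actions=actions)
chosen = greedy_action(q, "s0", actions)
```

After the single update, `pass` has the only non-zero estimate in `s0`, so the greedy policy selects it.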
5
Transfer in Reinforcement Learning

What knowledge will we transfer from the source?
Q-functions (Taylor & Stone 2005)
Policies (Torrey et al. 2005)
Skills (this work)
How will we extract that knowledge from the source?
From Q-functions (Torrey et al. 2005)
From observed behavior (this work)
How will we apply that knowledge in the target?
Model reuse (Taylor & Stone 2005)
Advice taking (Torrey et al. 2005, this work)

6
Advice Taking

Advice: instructions for the learner

In these states, Q(action1) > Q(action2)
IF condition THEN prefer action
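One way to read the advice format above is as a check on the learner's current Q-values. The sketch below is only an illustration of that reading: the `Advice` class, the `violated` helper, the feature name `dist_to_goal`, and the example rule are hypothetical, and it does not show how the advice is actually enforced during learning.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, float]

@dataclass
class Advice:
    condition: Callable[[State], bool]  # the IF part
    preferred: str                      # action1: the action to prefer
    over: str                           # action2: the action it should beat

def violated(advice: Advice, state: State, q_values: Dict[str, float]) -> bool:
    """True when the advice applies but Q(preferred) is not above Q(over)."""
    if not advice.condition(state):
        return False
    return q_values[advice.preferred] <= q_values[advice.over]

# Hypothetical rule: near the goal, prefer shooting over passing.
rule = Advice(lambda s: s["dist_to_goal"] < 10, preferred="shoot", over="pass")
```

Advice whose condition is false in a state places no constraint on the Q-values there, which is why `violated` returns `False` in that case.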
7
Experimental Domain: RoboCup
BreakAway (BA)
Score a goal (Maclin et al. 2005)
Different objectives, but a transferable skill: passing to teammates
8
A Challenge for Skill Transfer

Shared skills are not exactly the same
Skills have general and specific aspects
Aspects of the pass skill in RoboCup
General: teammate must be open
Game-specific: where teammate should be located
Player-specific: whether teammate is nearest or furthest

9
Addressing the Challenge

We focus on learning general skill aspects
These should transfer better
We learn skills that apply to multiple players
This generalizes over player-specific aspects
We allow humans to provide information
They can point out game-specific aspects

10
Human-Provided Information

User provides a mapping to show task similarities

May also provide user advice about task differences

Pass                Ø                   Ø
Pass towards goal   Move towards goal   Shoot at goal
11
Our Transfer Algorithm
Observe source task games to learn skills
Translate learned skills into transfer advice
Create advice for the target task
If there is user advice, add it in
Learn target task with KBKR (Knowledge-Based Kernel Regression)
12
Learning Skills By Observation

Source-task games are sequences of (state, action) pairs
Learning skills is like learning to classify states by their correct actions
We use Inductive Logic Programming to learn classifiers
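Treating skill learning as classification means each observed state becomes a labeled example. The sketch below shows one simple labeling scheme and is an assumption throughout: the transcript does not say how the datasets were actually prepared, and the `make_examples` helper and the toy game are hypothetical.

```python
def make_examples(game, skill):
    """Split observed (state, action) pairs into positive and negative examples.

    Each action is a (name, argument) tuple, e.g. ("pass", "teammate1").
    """
    positives = [state for state, action in game if action[0] == skill]
    negatives = [state for state, action in game if action[0] != skill]
    return positives, negatives

# Toy game trace with hypothetical state identifiers.
game = [
    ("s0", ("hold", None)),
    ("s1", ("pass", "teammate1")),
    ("s2", ("pass", "teammate2")),
]
pos, neg = make_examples(game, "pass")
```

Grouping `pass(teammate1)` and `pass(teammate2)` under one skill name is what lets a learner generalize over players.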

13
Advantages of ILP

Can produce first-order rules for skills
Capture only the essential aspects of the skill
We expect these aspects to transfer better
Can incorporate background knowledge

pass(Teammate) vs. pass(teammate1), ..., pass(teammateN)
14
Preparing Datasets for ILP
15
Example of a Skill Learned
pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
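To make the semantics of this rule concrete, here is a small evaluation sketch. The thresholds come from the slide; the dictionary layout of `state` and the `can_pass` name are assumptions made for illustration. The point to notice is that `Opponent` appears only in the body, so it is existentially quantified: any one opponent within distance 7 satisfies the last literal.

```python
def can_pass(state, teammate):
    """Evaluate the learned pass rule for one candidate teammate."""
    return (
        state["dist"][("me", teammate)] > 14
        and 30 < state["pass_angle"][teammate] < 150
        and any(state["dist"][("me", o)] < 7 for o in state["opponents"])
    )

# Hypothetical state: distances and pass angles keyed by player name.
state = {
    "dist": {
        ("me", "teammate1"): 20,
        ("me", "teammate2"): 10,
        ("me", "opponent1"): 5,
    },
    "pass_angle": {"teammate1": 90, "teammate2": 90},
    "opponents": ["opponent1"],
}
```

In this state the rule fires for `teammate1` but not for `teammate2`, who is closer than 14.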
16
Technical Challenges

KBKR requires propositional advice
We instantiate each rule head
Variables in rule bodies create disjunctions
We use tile features to translate them
Variables can appear multiple times
We create new features to translate them

17
Two Experimental Scenarios
Scenario 1: 4-on-3 MKA to 3-on-2 BA
Pass                Ø                   Ø
Pass towards goal   Move towards goal   Shoot at goal
Scenario 2: 3-on-2 MD to 3-on-2 BA
Pass   MoveAhead   Ø
Pass   MoveAhead   Shoot at goal
18
Skill Transfer Results
[Figure legend: From MKA, From MD, Without transfer]
19
Breakdown of MKA Results
20
What if User Advice is Bad?
21
Related Work

Q-function transfer in RoboCup
Taylor & Stone (AAMAS 2005, AAAI 2005)
Transfer via policy reuse
Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
Madden & Howley (AI Review 2004)
Transfer via relational RL
Driessens et al. (ICML workshop 2006)

22
Summary of Contributions

Transfer of shared skills in high-level logic
Despite differences in shared skills
Demonstration of the value of user guidance
Easy to give and beneficial
Effective transfer in the RoboCup domain
Challenging and dissimilar tasks

23
Future Work

Learn more general skills by combining multiple source tasks
Compare several transfer methods on RoboCup scenarios of varying difficulty
Reach similar levels of transfer with less user input

24
Acknowledgements

DARPA Grant HR0011-04-1-0007
US Naval Research Laboratory Grant N00173-06-1-G002

Thank You
25
User Advice
IF distBetween(me,goal) < 10 AND angle(goal, me, goalie) > 40 THEN prefer shoot
IF distBetween(me,goal) > 10 THEN prefer move_ahead
IF [transferred conditions] AND distBetween(Teammate,goal) < distBetween(me,goal) THEN prefer pass(Teammate)
(The transferred conditions are the part that came from transfer.)
26
Feature Tiling
[Figure: the original feature, spanning min value to max value, covered by a series of tilings (Tiling 1 through Tiling 11) with tile counts such as 16 and 8 tiles]
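A tile feature is a Boolean indicator for one interval of a continuous feature. The sketch below assumes equal-width tiles over a known range, which may differ from the tilings in the figure; the function name and the example range are hypothetical.

```python
def tile_features(value, lo, hi, n_tiles):
    """Return n_tiles Boolean (0/1) features; exactly one tile covers value.

    Values outside [lo, hi] are clamped into the first or last tile.
    """
    width = (hi - lo) / n_tiles
    index = int((value - lo) / width)
    index = max(0, min(index, n_tiles - 1))
    return [1 if i == index else 0 for i in range(n_tiles)]

# A distance of 5 on a hypothetical 0..16 range, at two resolutions.
coarse = tile_features(5, 0, 16, 8)   # tiles of width 2
fine = tile_features(5, 0, 16, 16)    # tiles of width 1
```

Calling the function with several values of `n_tiles` gives several tilings of the same feature, so a threshold from a learned rule can be matched at whichever resolution fits it best.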
27
Propositionalizing Rules

Step 1: rule head

pass(Teammate) :- distBetween(me, Teammate) > 14, ...
pass(teammate1) :- distBetween(me, teammate1) > 14, ...
...
pass(teammateN) :- distBetween(me, teammateN) > 14, ...
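Instantiating the rule head amounts to substituting each concrete teammate for the head variable throughout the body. The following sketch shows that substitution on a hypothetical tuple encoding of literals, `(predicate, args, operator, bound)`; the encoding and the helper name are assumptions, not the representation used by the authors.

```python
def instantiate_head(head_var, constants, body):
    """Ground a rule on its head variable: one copy of the body per constant."""
    grounded = {}
    for const in constants:
        grounded[const] = [
            (pred, tuple(const if arg == head_var else arg for arg in args), op, bound)
            for pred, args, op, bound in body
        ]
    return grounded

# Only the first literal of the pass rule is shown on the slide.
body = [("distBetween", ("me", "Teammate"), ">", 14)]
rules = instantiate_head("Teammate", ["teammate1", "teammate2"], body)
```

Each entry of `rules` is a propositional rule for one specific action such as `pass(teammate1)`.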

28
Propositionalizing Rules

Step 2: single-variable disjunctions

distBetween(me, Opponent) < 7
distBetween(me,opponent1) < 7 OR ... OR distBetween(me,opponentN) < 7
distBetween(me,opponent1)[0,7] + ... + distBetween(me,opponentN)[0,7] >= 1
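The disjunction over opponents can be written as a linear constraint because each tile feature is 0 or 1, so their sum is at least 1 exactly when some opponent falls in the tile. A minimal sketch, assuming a half-open tile `[lo, hi)` and hypothetical helper names:

```python
def in_tile(value, lo, hi):
    """Boolean tile feature: 1 if lo <= value < hi, else 0."""
    return 1 if lo <= value < hi else 0

def some_opponent_close(dists_to_opponents, lo=0, hi=7):
    """Disjunction over opponents expressed as: sum of tile features >= 1."""
    return sum(in_tile(d, lo, hi) for d in dists_to_opponents) >= 1

# Hypothetical distances from me to each opponent.
near = some_opponent_close([12, 5, 20])
far = some_opponent_close([12, 9])
```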
29
Propositionalizing Rules

Step 3: linked-variable disjunctions

distBetween(me, Player) > 14, distBetween(Player, goal) < 10
newFeature(player1) + ... + newFeature(playerN) >= 1
newFeature(Player) :-
    Dist1 is distBetween(me, Player),
    Dist2 is distBetween(Player, goal),
    Dist1 > 14, Dist2 < 10.
Add to target task feature space
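When a variable links two literals, both conditions must hold for the same player, which is why a single combined feature is needed instead of two separate tile sums. The sketch below illustrates this with the thresholds from the slide; the function names and the pair-based input format are assumptions.

```python
def new_feature(dist_me, dist_goal):
    """1 if a single player satisfies both linked conditions, else 0."""
    return 1 if dist_me > 14 and dist_goal < 10 else 0

def some_player_matches(players):
    """players: list of (distBetween(me, p), distBetween(p, goal)) pairs."""
    return sum(new_feature(d1, d2) for d1, d2 in players) >= 1

# One player meets each condition, but no player meets both.
split = some_player_matches([(20, 15), (5, 3)])
# A single player meets both conditions.
joint = some_player_matches([(20, 5)])
```

The `split` case is the one that separate per-literal disjunctions would get wrong, since each literal is satisfied by a different player.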
