Title: Transfer Learning in Jean

1
Transfer Learning in Jean

Paul R. Cohen
Clayton T. Morrison
Yu-Han Chang
Joshua Moody

2
Outline

How Jean Does Transfer
ESS: Experimental State Splitting
The Domain: ISIS
The Games Lattice
Experiment and Evaluation
Experiment
Metrics
Results

3
Jean, ESS and Transfer Learning

Incrementally build knowledge of the world over time
Procedural knowledge is represented as Finite State Machines (FSMs)
Transitions occur when we choose to execute a new action
Experimental State Splitting (ESS) differentiates, or splits, states to increase the predictive power of our model
ESS experiments with many hypotheses to find causal accounts for the observed results
A theory of transfer learning includes both what to transfer and when it is appropriate to transfer.
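
As an illustration of procedural knowledge held as a state machine, here is a minimal sketch of a transition model that counts what happens after each action. The class name `TransitionModel`, its methods, and the state names are hypothetical and are not taken from the Jean system.

```python
from collections import Counter, defaultdict


class TransitionModel:
    """Hypothetical sketch: counts observed (state, action) -> next_state transitions."""

    def __init__(self):
        # Maps (state, action) to a Counter of the next states seen so far.
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        """Record one observed transition."""
        self.counts[(state, action)][next_state] += 1

    def next_state_probs(self, state, action):
        """Empirical next-state distribution, or {} if nothing was observed."""
        seen = self.counts[(state, action)]
        total = sum(seen.values())
        return {s: n / total for s, n in seen.items()} if total else {}


model = TransitionModel()
model.observe("not_goal", "fire", "goal")
model.observe("not_goal", "fire", "not_goal")
probs = model.next_state_probs("not_goal", "fire")
```

A state whose next-state distribution stays spread out like this is the kind of state a splitting procedure would try to differentiate.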

4
Domain: Military tactics

Goal: Eliminate all enemy troops
The enemy evades our troops if it can sight us

5
Sensors to States to Knowledge

Data from many sensors, often continuous
States are determined by intervals where some sensors stay constant or fall into certain ranges
FSMs control behavior based on state
Use ESS to expand FSM models and make them more predictive

[Figure: sensor data segmented into states labeled Approaching, Enemy evading, and Steady state chasing]
6
Experimental State Splitting

Building up a model of the world

[Diagram: FSM over the states ¬GOAL and GOAL. Action labels: fire; run, crawl, or fire; crawl or run. Transition probabilities shown: 1/2, 1/2, 1, 1.]
7
Experimental State Splitting

Splitting

[Diagram: state ¬GOAL, action fire, transition probability 1/2, state GOAL]
8
Experimental State Splitting
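
Splitting

[Diagram: the ¬GOAL state (fire, probability 1/2, GOAL) is split on distance D into two states, "¬GOAL, D ≥ 200" and "¬GOAL, D < 200", each with fire transitions. Transition probabilities shown after the split: 4/5, 0, 1/5, 1.]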
9
Transferring Causal Knowledge

Create splits in order to decrease the entropy of the next-state distributions
Transfer learned state machines (or causal sub-components) between domains / test problems
Store in memory a repository of causal state machines or components

[Diagram: state machines for two scenarios, Evasive Enemy and Enemy in Hilly Terrain. Both contain the chain ¬GOAL FAR (crawl), ¬GOAL NEAR (fire), GOAL; one also begins with ¬GOAL NOT VISIBLE (find unit).]
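
The idea of splitting a state to decrease next-state entropy can be sketched as an information-gain calculation. This is only an illustration: the slides do not give Jean's actual split criterion, and the function names, the record format, and the four observations below are hypothetical. Only the distance threshold of 200 comes from the splitting diagram.

```python
import math
from collections import Counter


def entropy(outcomes):
    """Shannon entropy (in bits) of a list of observed next states."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def split_gain(records, predicate):
    """Entropy reduction from splitting one state by a predicate on its features.

    records: list of (features, next_state) pairs observed in that state.
    Returns 0.0 if the predicate does not separate the records at all.
    """
    inside = [ns for f, ns in records if predicate(f)]
    outside = [ns for f, ns in records if not predicate(f)]
    if not inside or not outside:
        return 0.0
    n = len(records)
    before = entropy([ns for _, ns in records])
    after = (len(inside) / n) * entropy(inside) + (len(outside) / n) * entropy(outside)
    return before - after


# Hypothetical outcomes of "fire" from the not-GOAL state, with distance D.
records = [
    ({"D": 50}, "goal"),
    ({"D": 120}, "goal"),
    ({"D": 250}, "not_goal"),
    ({"D": 400}, "not_goal"),
]
gain_at_200 = split_gain(records, lambda f: f["D"] < 200)
gain_at_100 = split_gain(records, lambda f: f["D"] < 100)
```

On this made-up data the split at D < 200 separates the outcomes completely, so it scores higher than the split at D < 100.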
10
The Domain: ISIS
11
ISIS

What is ISIS?
A real-time tactical and strategic military simulation
Allows a first-person perspective
Military scenarios, simulated robot

Why is ISIS good for transfer learning (TL)?
Scenarios can be configured at varying complexity, ranging from single-unit maneuvers to complex coordinated operations
Scenarios require different types and combinations of knowledge

12
Game/Scenario Lattice
Schemas (static, dynamic, action) learned in one game are transferred to another. Not all transfer is relevant, and some may be detrimental (in the absence of other knowledge).
Lattice levels: Full, Intermediate, Basic
Link absence: little that is relevant to transfer.
Blue: useful transfer
Red: if SOLE transfer, then detrimental
Positive transfer speeds learning in a new game
13
Experiment
14
Scenarios

Restrained Mobile
< range iii
> range iii
Full Mobile: 50 <, 50 >
Mountains
Entrenched

Dependent variables: final unit strength, time to complete task
Engagement Ranges
15
Learning
[Figure panels: Early, Later]
Early failure: running at the opponent allows them to see you, and they escape.
Later success: sneak up on the opponent until close, then attack.
16
Scenario/Experiment Relations
17
Protocol
A Protocol for Generating Learning Curves

Tick - Jean gets the ISIS state, then selects and runs a controller for a fixed time in ISIS.
Trial - a series of 100 ticks in a given scenario.
Training Phase - a set of 20 learning trials.
Testing Phase - a fixed set of 10 test trials (no learning). The mean and variance of performance on the test are recorded.
Performance Unit - one training phase followed by one test phase. The test from a performance unit = 1 point on a learning curve.
Replication - a series of 10 performance units. A complete replication = 1 learning curve (with 10 points).

[Diagram labels: Performance Unit, Replication]
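
The nesting of ticks, trials, phases, performance units, and replications can be sketched as a set of loops. The counts (100, 20, 10, 10) come from the protocol above. `StubAgent`, its `run_trial` method, and the scoring rule are hypothetical stand-ins, since the slides do not describe the interface between Jean and ISIS.

```python
import random
from statistics import mean, pvariance

TICKS_PER_TRIAL = 100        # from the protocol
TRAIN_TRIALS = 20            # from the protocol
TEST_TRIALS = 10             # from the protocol
UNITS_PER_REPLICATION = 10   # from the protocol


class StubAgent:
    """Hypothetical stand-in for Jean: skill grows a little per learning trial."""

    def __init__(self):
        self.skill = 0.0

    def run_trial(self, scenario, learn, rng):
        # One trial is a series of ticks; each tick yields a made-up score.
        score = sum(self.skill + rng.random() for _ in range(TICKS_PER_TRIAL))
        if learn:
            self.skill += 0.01
        return score / TICKS_PER_TRIAL


def performance_unit(agent, scenario, rng):
    """One training phase followed by one test phase: one learning-curve point."""
    for _ in range(TRAIN_TRIALS):
        agent.run_trial(scenario, learn=True, rng=rng)
    # The protocol uses a fixed test set; this stub just draws fresh noise.
    scores = [agent.run_trial(scenario, learn=False, rng=rng) for _ in range(TEST_TRIALS)]
    return mean(scores), pvariance(scores)


def replication(agent, scenario, rng):
    """A series of performance units: one learning curve with 10 points."""
    return [performance_unit(agent, scenario, rng) for _ in range(UNITS_PER_REPLICATION)]


agent = StubAgent()
curve = replication(agent, "B", random.Random(0))
```

Each element of `curve` is a (mean, variance) pair for one test phase, matching the mean/variance that the protocol records.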
18
Protocol
A Protocol for Generating Learning Curves

Based on BEP
Test Condition 1
Administer the B scenario for one replication (epoch B, test B).
Copy eval data, copy Jean's memory, wipe Jean's memory.
Test Condition 2
Training Phase
Administer the A scenario for one replication (epoch A, test with A).
Copy eval data, copy Jean's memory, do NOT wipe Jean's memory.
Testing Phase
Administer the B scenario for one replication (epoch B, test B).
Copy eval data, copy Jean's memory, wipe Jean's memory.
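
The two test conditions differ only in whether memory survives between scenarios. The sketch below shows that bookkeeping under stated assumptions: `StubJean`, `run_replication`, and `wipe` are hypothetical names, and memory is modeled as a plain list rather than learned state machines.

```python
import copy


class StubJean:
    """Hypothetical stand-in: memory records the scenarios trained on so far."""

    def __init__(self):
        self.memory = []

    def run_replication(self, scenario):
        # Placeholder for one full replication (train and test) in a scenario.
        self.memory.append(scenario)
        return {"scenario": scenario, "memory_size": len(self.memory)}

    def wipe(self):
        self.memory = []


def condition_b(agent):
    """Test Condition 1: learn B from scratch, save the results, then wipe."""
    eval_b = agent.run_replication("B")
    saved_memory = copy.deepcopy(agent.memory)
    agent.wipe()
    return eval_b, saved_memory


def condition_ab(agent):
    """Test Condition 2: learn A, keep the memory, then learn B, then wipe."""
    eval_a = agent.run_replication("A")
    memory_after_a = copy.deepcopy(agent.memory)  # copied, NOT wiped
    eval_b = agent.run_replication("B")
    memory_after_b = copy.deepcopy(agent.memory)
    agent.wipe()
    return eval_a, eval_b, memory_after_a, memory_after_b


jean = StubJean()
eval_b_alone, mem_b = condition_b(jean)
eval_a, eval_b_after_a, mem_a, mem_ab = condition_ab(jean)
```

Comparing the B learning curve from `condition_b` with the B learning curve from `condition_ab` is what isolates the effect of transfer from A.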

19
Experiment 1 Results
20
Experiment 2 Results
21
Experiment 3 Results
22
Results Summary
23
END
24
Y1 Internal Results

Metric: ratio of the areas below each learning curve, Area(B) / Area(AB)
Experiment 1: 1.704
P-value: 0.035
The sampling distribution for the null hypothesis (AB is the same as B) was generated by randomization-bootstrap
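
One way to obtain a sampling distribution for the area ratio under the null hypothesis is a plain randomization test that shuffles condition labels across curves. This sketch may differ from the randomization-bootstrap procedure used for the reported figures. The function names, the trapezoidal area with unit spacing, the one-sided p-value, and the curves below are all assumptions for illustration, not experimental data.

```python
import random


def area(curve):
    """Trapezoidal area under a learning curve, assuming unit spacing."""
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:]))


def mean_area(curves):
    return sum(area(c) for c in curves) / len(curves)


def ratio_randomization_test(b_curves, ab_curves, n_resamples=2000, seed=0):
    """Return (observed Area(B)/Area(AB), one-sided randomization p-value)."""
    rng = random.Random(seed)
    observed = mean_area(b_curves) / mean_area(ab_curves)
    pooled = list(b_curves) + list(ab_curves)
    n_b = len(b_curves)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Under the null, condition labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        stat = mean_area(pooled[:n_b]) / mean_area(pooled[n_b:])
        if stat >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_resamples


# Made-up learning curves (one list per replication).
b_curves = [[10, 8, 6, 5], [11, 9, 7, 5], [10, 9, 6, 4]]
ab_curves = [[6, 5, 4, 4], [7, 5, 4, 3], [6, 4, 4, 3]]
observed_ratio, p_value = ratio_randomization_test(b_curves, ab_curves)
```

The p-value here is the fraction of shuffled labelings whose ratio is at least as large as the observed one.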

25
Y1 Internal Results

Sampling distribution for the null hypothesis: the difference between the areas of the B and AB curves
A vertical line marks our observed statistic
P-value: 0.035

[Figure: sampling distribution with markers labeled Observed, 2.5% quantile, and 97.5% quantile]
26
Publications

St. Amant, R., Morrison, C. T., Chang, Y., Mu, W., Cohen, P. R. and Beal, C. (2006). An Image Schema Language. In Proceedings of the International Conference on Cognitive Modeling (ICCM 2006).
Chang, Y., Morrison, C. T., Kerr, W., Galstyan, A., Cohen, P. R., Beal, C., St. Amant, R. and Oates, T. (2006). The Jean System. In Proceedings of the 5th International Conference on Development and Learning (ICDL 2006).
Chang, Y., Cohen, P., Morrison, C. T. and St. Amant, R. (2006). Piagetian Adaptation Meets Image Schemas: The Jean System. In Proceedings of the Ninth International Conference on the Simulation of Adaptive Behavior (SAB 2006).
Morrison, C. T., Chang, Y., Cohen, P. R., Moody, J. (2006). Transfer Learning with the Jean System. In Proceedings of the ICML 2006 Workshop on Structural Knowledge Transfer for Machine Learning.