Title: Transfer Learning in Jean
1. Transfer Learning in Jean
- Paul R. Cohen
- Clayton T. Morrison
- Yu-Han Chang
- Joshua Moody
2. Outline
- How Jean Does Transfer
- ESS: Experimental State Splitting
- The Domain: ISIS
- The Games Lattice
- Experiment and Evaluation
- Experiment
- Metrics
- Results
3. Jean, ESS and Transfer Learning
- Incrementally build knowledge of the world over time
- Procedural knowledge is represented as finite state machines (FSMs)
- Transitions occur when we choose to execute a new action
- Experimental State Splitting (ESS) differentiates, or splits, states to increase the predictive power of our model
- ESS experiments with many hypotheses to find causal accounts for the observed results
- A theory of transfer learning includes both what to transfer and when it is appropriate to transfer
4. Domain: Military Tactics
- Goal: eliminate all enemy troops
- The enemy evades our troops if it can see us
5. Sensors to States to Knowledge
- Data come from many sensors, often continuous
- States are determined by intervals in which some sensors stay constant or fall within certain regions
- FSMs control behavior based on state
- Use ESS to expand FSM models and make them more predictive
Figure labels: Approaching; Enemy evading; Steady-state chasing
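The mapping from continuous sensor data to discrete states can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not Jean's actual representation: the `Reading` fields, the 200-unit threshold, the state names and the `POLICY` table are hypothetical (the names echo state and action labels used elsewhere in this deck).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    distance: float  # hypothetical continuous sensor: distance to the enemy
    visible: bool    # hypothetical sensor: is the enemy currently in sight?


NEAR_THRESHOLD = 200.0  # assumed boundary between "near" and "far"


def to_state(reading):
    """Map one sensor reading to a discrete state label."""
    if not reading.visible:
        return "NOT_VISIBLE"
    return "NEAR" if reading.distance < NEAR_THRESHOLD else "FAR"


# Assumed controller table: the FSM picks one action per state.
POLICY = {"NOT_VISIBLE": "find unit", "FAR": "crawl", "NEAR": "fire"}


def segment(readings):
    """Collapse consecutive readings that map to the same state into runs."""
    runs = []
    for reading in readings:
        state = to_state(reading)
        if runs and runs[-1][0] == state:
            runs[-1] = (state, runs[-1][1] + 1)
        else:
            runs.append((state, 1))
    return runs


stream = [Reading(500, False), Reading(450, True),
          Reading(300, True), Reading(150, True)]
for state, length in segment(stream):
    print(state, length, "->", POLICY[state])
```

Each run of identical labels corresponds to one stretch of time in which the (assumed) sensors stay within the same region.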
6. Experimental State Splitting
- Building up a model of the world
Figure: state machine over the states ¬GOAL and GOAL. Transition labels: fire; crawl or run; run, crawl, or fire. Transition probabilities shown: 1/2, 1/2, 1, 1.
7. Experimental State Splitting
Figure: from ¬GOAL, the fire action has two outcomes, ¬GOAL and GOAL, each with probability 1/2.
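8. Experimental State Splitting
Figure: the ¬GOAL state before and after a split on D.
- Before the split: from ¬GOAL, the fire action has two outcomes, ¬GOAL and GOAL, each with probability 1/2.
- After the split: the new states are (¬GOAL, D ≥ 200) and (¬GOAL, D < 200). The fire transitions are labeled with probabilities 4/5 and 1/5 for the first state, and 0 and 1 for the second.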
9. Transferring Causal Knowledge
- Create splits in order to decrease the entropy of the next-state distributions
- Transfer learned state machines (or causal sub-components) between domains / test problems
- Store in memory a repository of causal state machines or components
Figure: state machines for two scenarios, Evasive Enemy and Enemy in Hilly Terrain. Both contain the states ¬GOAL FAR (action: crawl), ¬GOAL NEAR (action: fire) and GOAL; one machine also contains ¬GOAL NOT VISIBLE (action: find unit).
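The idea of choosing splits that decrease the entropy of next-state distributions can be sketched as below. This is a minimal illustration under stated assumptions, not Jean's implementation: the observation format (a feature dictionary paired with the observed next state), the `split_gain` helper, the feature name `D` and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(next_states):
    """Shannon entropy (in bits) of an empirical next-state distribution."""
    total = len(next_states)
    if total == 0:
        return 0.0
    counts = Counter(next_states)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_gain(observations, predicate):
    """Entropy reduction obtained by splitting one state on `predicate`.

    observations: list of (features, next_state) pairs gathered while
    executing a single action from a single state (assumed format).
    """
    yes = [ns for feats, ns in observations if predicate(feats)]
    no = [ns for feats, ns in observations if not predicate(feats)]
    if not yes or not no:
        return 0.0  # the predicate does not actually divide the state
    n = len(observations)
    before = entropy([ns for _, ns in observations])
    after = (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
    return before - after


# Toy data: outcomes of "fire" from the not-yet-split state.
observations = [
    ({"D": 300}, "NOT_GOAL"),
    ({"D": 250}, "NOT_GOAL"),
    ({"D": 150}, "GOAL"),
    ({"D": 100}, "GOAL"),
]
print(split_gain(observations, lambda f: f["D"] < 200))
print(split_gain(observations, lambda f: f["D"] % 100 == 0))
```

In this toy data the distance split separates the outcomes completely, while the second, arbitrary predicate leaves each half as uncertain as the original state, so only the first would be worth keeping.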
10. The Domain: ISIS
11. ISIS
- What is ISIS?
  - Real-time tactical and strategic military simulation
  - Allows first-person perspective
  - Military scenarios, simulated robot
- Why is ISIS good for TL?
  - Scenarios can be configured at varying complexity, ranging from single-unit maneuvers to complex coordinated operations
  - Scenarios require different types and combinations of knowledge
12. Game/Scenario Lattice
Schemas (static, dynamic, action) learned in one game are transferred to another. Not all transfer is relevant, and some may be detrimental (in the absence of other knowledge).
Lattice levels: Full, Intermediate, Basic
Legend:
- Link absence: little that is relevant to transfer
- Blue: useful transfer
- Red: if SOLE transfer, then detrimental
Positive transfer speeds learning in a new game.
13. Experiment
14. Scenarios
- Scenarios
- Restrained Mobile
- lt range iii
- gt range iii
- Full Mobile 50 lt, 50 gt
- Mountains
- Entrenched
Dependent variables: final unit strength, time to complete task
Engagement Ranges
15. Learning
Early failure: running at the opponent lets them see you, and they escape.
Later success: sneak up on the opponent until close, then attack.
16. Scenario/Experiment Relations
17. Protocol
A Protocol for Generating Learning Curves
- Tick: Jean gets the ISIS state, then selects and runs a controller for a fixed time in ISIS.
- Trial: a series of 100 ticks in a given scenario.
- Training Phase: a set of 20 learning trials.
- Testing Phase: a fixed set of 10 test trials (no learning). The mean and variance of performance on the test are recorded.
- Performance Unit: one training phase followed by one test phase. The test from a performance unit = 1 point on a learning curve.
- Replication: a series of 10 performance units. A complete replication = 1 learning curve (with 10 points).
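The nesting of ticks, trials, phases, performance units and replications can be sketched as a set of loops. This is only an illustration of the protocol's structure: `StubLearner` is a hypothetical stand-in for Jean, its success model is invented, and the per-trial score (number of successful ticks) is an assumed placeholder for the real performance measure.

```python
import random
import statistics

TICKS_PER_TRIAL = 100        # ticks in one trial
TRAIN_TRIALS = 20            # learning trials in one training phase
TEST_TRIALS = 10             # non-learning trials in one testing phase
UNITS_PER_REPLICATION = 10   # performance units in one replication


class StubLearner:
    """Hypothetical stand-in for Jean; improves only on learning ticks."""

    def __init__(self):
        self.learning_ticks = 0

    def tick(self, rng, learn):
        if learn:
            self.learning_ticks += 1
        # Invented success model: more learning ticks, higher success rate.
        p_success = min(1.0, 0.1 + self.learning_ticks / 40000)
        return 1 if rng.random() < p_success else 0


def run_trial(learner, rng, learn):
    """One trial: a series of ticks; returns a toy score."""
    return sum(learner.tick(rng, learn) for _ in range(TICKS_PER_TRIAL))


def performance_unit(learner, rng):
    """One training phase followed by one testing phase."""
    for _ in range(TRAIN_TRIALS):
        run_trial(learner, rng, learn=True)
    scores = [run_trial(learner, rng, learn=False) for _ in range(TEST_TRIALS)]
    return statistics.mean(scores), statistics.variance(scores)


def replication(learner, rng):
    """Ten performance units give one learning curve with ten points."""
    return [performance_unit(learner, rng) for _ in range(UNITS_PER_REPLICATION)]


curve = replication(StubLearner(), random.Random(0))
for mean_score, variance in curve:
    print(round(mean_score, 1), round(variance, 1))
```

Each printed line is one point on the learning curve: the mean and variance over the ten test trials of one performance unit.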
18. Protocol
A Protocol for Generating Learning Curves
- Based on BEP
- Test Condition 1
  - Administer the B scenario for one replication: epoch B, test B
  - Copy eval data, copy Jean's memory, wipe Jean's memory
- Test Condition 2
  - Training Phase
    - Administer the A scenario for one replication: epoch A, test with A
    - Copy eval data, copy Jean's memory, do NOT wipe Jean's memory
  - Testing Phase
    - Administer the B scenario for one replication: epoch B, test B
    - Copy eval data, copy Jean's memory, wipe Jean's memory
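The difference between the two test conditions is whether memory from scenario A is still present when scenario B is administered. A minimal sketch of that bookkeeping follows; `run_replication` is a hypothetical stand-in that only records which scenarios have been seen, and the list-based memory and archive dictionaries are assumptions made for illustration.

```python
import copy


def run_replication(memory, scenario):
    """Hypothetical stand-in for one replication; returns toy eval data."""
    eval_data = {"scenario": scenario, "prior_knowledge": list(memory)}
    memory.append(scenario)  # "learning" here just records the scenario seen
    return eval_data


def archive(eval_data, memory):
    """Copy eval data and memory so later wipes cannot alter them."""
    return {"eval": copy.deepcopy(eval_data), "memory": copy.deepcopy(memory)}


def condition_1():
    """B only: the no-transfer baseline."""
    memory = []
    saved_b = archive(run_replication(memory, "B"), memory)
    memory.clear()  # wipe memory
    return saved_b, memory


def condition_2():
    """A then B: memory from A is kept while B is administered."""
    memory = []
    saved_a = archive(run_replication(memory, "A"), memory)
    # Memory is deliberately NOT wiped here.
    saved_ab = archive(run_replication(memory, "B"), memory)
    memory.clear()  # wipe memory only after the B replication
    return saved_a, saved_ab, memory


baseline, _ = condition_1()
_, transfer, _ = condition_2()
print(baseline["eval"]["prior_knowledge"])
print(transfer["eval"]["prior_knowledge"])
```

Comparing the archived B results from the two conditions is what gives the B and AB learning curves.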
19. Experiment 1 Results
20. Experiment 2 Results
21. Experiment 3 Results
22. Results Summary
23. END
24. Y1 Internal Results
- Metric: ratio of the areas below each learning curve
  - Area(B) / Area(AB)
- Experiment 1: 1.704
- P-value: 0.035
- Sampling distribution for the null hypothesis (AB is the same as B) generated by randomization-bootstrap
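The area-ratio metric and a resampling test of the null hypothesis can be sketched as follows. This is a simplified illustration, not the procedure used to produce the numbers above: it swaps in a plain randomization (label-shuffling) test rather than the randomization-bootstrap named on the slide, it assumes several learning curves per condition, it uses the trapezoid rule with unit spacing for the area, and the one-sided direction (a larger ratio counts as more extreme) is an assumption. All function names and the toy curves are hypothetical.

```python
import random


def area(curve):
    """Trapezoid-rule area under a curve sampled at unit spacing."""
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:]))


def mean_curve(curves):
    """Pointwise mean of several equally long learning curves."""
    return [sum(values) / len(values) for values in zip(*curves)]


def area_ratio(b_curves, ab_curves):
    """Area(B) / Area(AB), computed on the mean curve of each condition."""
    return area(mean_curve(b_curves)) / area(mean_curve(ab_curves))


def randomization_p_value(b_curves, ab_curves, n_resamples=2000, seed=0):
    """Shuffle condition labels to approximate the null distribution."""
    rng = random.Random(seed)
    observed = area_ratio(b_curves, ab_curves)
    pooled = list(b_curves) + list(ab_curves)
    n_b = len(b_curves)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if area_ratio(pooled[:n_b], pooled[n_b:]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_resamples + 1)


# Toy curves (invented): each list is one replication's learning curve.
b_curves = [[9.0, 8.0, 7.0, 6.0], [10.0, 8.5, 7.5, 6.5], [9.5, 8.0, 7.0, 6.0]]
ab_curves = [[7.0, 5.0, 4.0, 3.5], [6.5, 5.5, 4.0, 3.0], [7.0, 5.0, 4.5, 3.5]]
ratio, p_value = randomization_p_value(b_curves, ab_curves)
print(ratio, p_value)
```

With a lower-is-better measure such as time to complete the task, a ratio above 1 would indicate that the AB condition accumulated less area than the B-only condition.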
25. Y1 Internal Results
- Sampling distribution for the null hypothesis
- Vertical line marks our observed statistic
- P-value: 0.035
- Sampling distribution of the difference between the areas of the B and AB curves
Figure labels: Observed; 2.5% quantile; 97.5% quantile
26. Publications
- St. Amant, R., Morrison, C. T., Chang, Y., Mu, W., Cohen, P. R. and Beal, C. (2006). An Image Schema Language. In Proceedings of the International Conference on Cognitive Modeling (ICCM 2006).
- Chang, Y., Morrison, C. T., Kerr, W., Galstyan, A., Cohen, P. R., Beal, C., St. Amant, R. and Oates, T. (2006). The Jean System. In Proceedings of the 5th International Conference on Development and Learning (ICDL 2006).
- Chang, Y., Cohen, P., Morrison, C. T. and St. Amant, R. (2006). Piagetian Adaptation Meets Image Schemas: The Jean System. In Proceedings of the Ninth International Conference on the Simulation of Adaptive Behavior (SAB 2006).
- Morrison, C. T., Chang, Y., Cohen, P. R. and Moody, J. (2006). Transfer Learning with the Jean System. In Proceedings of the ICML 2006 Workshop on Structural Knowledge Transfer for Machine Learning.