Title: Combining Genetics, Learning and Parenting
1. Combining Genetics, Learning and Parenting
Michael Berger
Based on "When to Apply the Fifth Commandment: The Effects of Parenting on Genetic and Learning Agents" by Michael Berger and Jeffrey S. Rosenschein, submitted to AAMAS 2004
2. Abstract Problem
- Hidden state
- Metric defined over state space
- Condition C1: when the state changes, it is only to an adjacent state
- Condition C2: state changes occur at a low, but positive rate
3. The Environment
4. Agent Definitions
- Reward: Food Presence (0 or 1)
- Perception: <Position, Food Presence>
- Action ∈ {NORTH, EAST, SOUTH, WEST, HALT}
- Memory: <<Per, Ac>, ..., <Per, Ac>, Per>
- Memory length Mem: no. of elements in memory
- No. of possible memories: (2 · Grid Width · Grid Height)^Mem · 5^(Mem-1)
- MAM - Memory-Action Mapper (see the sketch below)
  - Table
  - One entry for every possible memory
- ASF - Action-Selection Filter
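A minimal Python sketch of the agent definitions above, to make the memory/MAM structure concrete. The grid size, the restriction to Mem = 1, the default action, and all identifiers are assumptions made for this sketch, not definitions from the paper.

```python
from itertools import product

# Illustrative constants (assumed here, matching the 20x20 grid used later).
GRID_WIDTH, GRID_HEIGHT = 20, 20
ACTIONS = ["NORTH", "EAST", "SOUTH", "WEST", "HALT"]

# A perception is <Position, Food Presence>; a memory of length Mem interleaves
# Mem perceptions with Mem - 1 actions, hence the count formula above.
def count_possible_memories(mem_len):
    perceptions = 2 * GRID_WIDTH * GRID_HEIGHT      # food presence x position
    return perceptions ** mem_len * len(ACTIONS) ** (mem_len - 1)

# The MAM is a table with one entry per possible memory, mapping it to an action.
# For Mem = 1 a memory is just the latest perception.
def build_empty_mam():
    keys = product(range(GRID_WIDTH), range(GRID_HEIGHT), (0, 1))
    return {(x, y, food): "HALT" for x, y, food in keys}   # assumed default action

print(count_possible_memories(1))   # 800 possible memories for Mem = 1
```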
5. Genetic Algorithm (I)
- Algorithm operates on a complete population, not on a single agent
- Requires the introduction of generations
  - Every generation consists of a new group of agents
  - Each agent is created at the beginning of a generation, and is terminated at its end
- Agent's life cycle: Birth --> Run (foraging) --> Possible matings --> Death
6. Genetic Algorithm (II)
- Each agent carries a gene sequence
- Each gene has a key (memory) and a value (action)
- A given memory determines the resultant action
- Gene sequence remains constant during the life-time of an agent
- Gene sequence is determined at the mating stage of an agent's parents
7. Genetic Algorithm (III)
- Mating consists of two stages
- Selection stage - determining mating rights; should be performed according to two principles:
  - Survival of the fittest (as indicated by performance during the life-time)
  - Preservation of genetic variance
- Offspring creation stage
  - One or more parents create one or more offspring
  - Offspring inherit some combination of the parents' gene sequences
- Each of the stages has many variants
8. Genetic Algorithm Variant
- Selection
  - Will be discussed later
- Offspring creation (a minimal sketch follows this list)
  - Two parents mate and create two offspring
  - The gene sequences of the parents are aligned against one another, and then two processes occur:
    - Random crossover
    - Random mutation
  - The resultant pair of gene sequences is inherited by the offspring (one sequence by each offspring)
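A hedged Python sketch of this offspring-creation variant, assuming gene sequences are stored as memory-to-action dictionaries aligned on the same keys; the default probability values and names are placeholders, not taken from the paper.

```python
import random

ACTIONS = ["NORTH", "EAST", "SOUTH", "WEST", "HALT"]

def create_offspring(parent_a, parent_b, p_crossover=0.02, p_mutation=0.005):
    """parent_a, parent_b: dicts mapping memory (key) -> action (value),
    aligned on the same set of keys. Returns two offspring gene sequences."""
    child_a, child_b = dict(parent_a), dict(parent_b)
    for key in parent_a:
        # Random crossover: swap the aligned gene pair with probability p_crossover.
        if random.random() < p_crossover:
            child_a[key], child_b[key] = child_b[key], child_a[key]
        # Random mutation: independently replace each gene's action.
        for child in (child_a, child_b):
            if random.random() < p_mutation:
                child[key] = random.choice(ACTIONS)
    return child_a, child_b   # one sequence is inherited by each offspring
```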
9. Genetic Inheritance
10. Genetic Agent
- MAM
  - Every entry is considered a gene
  - First column - possible memory (key)
  - Second column - action to take (value)
  - No changes after creation
- Parameters
  - Memory length
  - Crossover probability for each gene pair
  - Mutation probability for each gene
11. Learning Algorithm
- Reinforcement Learning type algorithm
  - After performing an action, agents receive a signal informing them how good their choice of action was (in this case, the reward)
- Selected algorithm: Q-learning with Boltzmann exploration
12. Basic Q-Learning (I)
- γ: discount factor (non-negative, less than 1)
- r_j: reward at round j
- R_n = Σ_{j≥n} γ^(j-n) · r_j: discounted sum of rewards at round n
- Q-learning attempts to maximize an agent's expected discounted sum of rewards as a function of any given memory, at any round n
13. Basic Q-Learning (II)
- Q(s,a) - Q-value: the expected discounted sum of future rewards for an agent whose memory is s, when it selects action a and follows an optimal policy thereafter
- Q(s,a) is updated every time the agent selects action a at memory s. After executing the action, the agent receives reward r and holds the new memory s'. Q(s,a) is then updated with the standard one-step rule (with learning rate α):
  Q(s,a) <- Q(s,a) + α · [r + γ · max_a' Q(s',a') - Q(s,a)]
  (a small code sketch follows below)
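A minimal Python sketch of this tabular update; the defaultdict Q-table, the default parameter values, and the function name are assumptions for illustration only.

```python
from collections import defaultdict

ACTIONS = ["NORTH", "EAST", "SOUTH", "WEST", "HALT"]
q_table = defaultdict(float)          # Q(s, a), initialised to 0

def q_update(s, a, r, s_next, alpha=0.2, gamma=0.95):
    """One-step Q-learning update after taking action a at memory s,
    receiving reward r and arriving at memory s_next."""
    best_next = max(q_table[(s_next, a_next)] for a_next in ACTIONS)
    q_table[(s, a)] += alpha * (r + gamma * best_next - q_table[(s, a)])
```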
14. Basic Q-Learning (III)
- Q(s,a) values can be stored in different forms
  - Neural network
  - Table (nicknamed a Q-table)
- When saved as a Q-table, each row corresponds to a possible memory s, and each column to a possible action a
- When an agent holds memory s, it should simply select the action a that maximizes Q(s,a) - right? WRONG!
15. Boltzmann Exploration (I)
- Full exploitation of a Q-value might hide other, better Q-values
- Exploration of Q-values is needed, at least in the early stages
- Boltzmann exploration: the probability of selecting action a_i is proportional to e^(Q(s,a_i)/t), i.e.
  P(a_i | s) = e^(Q(s,a_i)/t) / Σ_j e^(Q(s,a_j)/t)
  (see the sketch below)
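A small Python sketch of Boltzmann action selection over a Q-table, plus the greedy selection used once the temperature freezes; the data structures and names are illustrative assumptions.

```python
import math
import random

ACTIONS = ["NORTH", "EAST", "SOUTH", "WEST", "HALT"]

def boltzmann_select(q_table, s, t):
    """Pick an action with probability proportional to exp(Q(s, a) / t)."""
    weights = [math.exp(q_table.get((s, a), 0.0) / t) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights, k=1)[0]

def greedy_select(q_table, s):
    """Full exploitation, used once t drops below the freezing temperature."""
    return max(ACTIONS, key=lambda a: q_table.get((s, a), 0.0))
```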
16. Boltzmann Exploration (II)
- t - an annealing temperature, given at round n by an annealing function t(n)
- t decreases => exploration decreases, exploitation increases
- For a given s, the probability of selecting its best Q-value approaches 1 as n increases
- The variant used here adds a freezing temperature: when t drops below it, exploration is replaced by full exploitation
17. Learning Agent
- MAM
  - A Q-table (dynamic)
- Parameters
  - Memory length
  - Learning rate
  - Rewards discount factor
  - Temperature annealing function
  - Freezing temperature
18. Parenting Algorithm
- There is no classical parenting algorithm around, so it needs to be simulated
- Selected algorithm: Monte-Carlo (another Reinforcement Learning type algorithm)
19. Monte-Carlo (I)
- Some similarity to Q-learning
  - A table (nicknamed an MC-table) stores values (MC-values) that describe how good it is to take action a given memory s
  - The table dictates a policy of action-selection
- Major differences from Q-learning
  - The table isn't modified after every round, but only after episodes of rounds (in our case, a generation)
  - Q-values and MC-values have different meanings
20. Monte-Carlo (II)
- Off-line version of Monte-Carlo
  - After completing an episode (generation) in which one table has dictated the action-selection policy, a new, second table is constructed from scratch to evaluate how good each action a is for a given memory s
  - The second table will dictate the policy in the next episode (generation)
  - Equivalent to considering the second table as being built during the current episode, as long as it isn't used in the current episode
21. Monte-Carlo (III)
- MC(s,a) is defined as the average of all rewards received after memory s was encountered and action a was selected
- What if (s,a) was encountered more than once?
  - Every-visit variant (sketched below)
    - The average of all subsequent rewards is calculated for each occurrence of (s,a)
    - MC(s,a) is the average of all the calculated averages
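A hedged Python sketch of this every-visit construction, assuming the episode is available as a list of (memory, action, reward) triples; whether the visiting step's own reward counts as "subsequent" is an assumption made here.

```python
from collections import defaultdict

def build_mc_table(episode):
    """episode: list of (memory, action, reward) triples for one generation.
    Returns the MC-table mapping (s, a) -> MC(s, a)."""
    per_visit_averages = defaultdict(list)
    for i, (s, a, _) in enumerate(episode):
        # Average of all rewards from this occurrence onward (assumption:
        # the reward of the visiting step itself is included).
        subsequent = [r for (_, _, r) in episode[i:]]
        per_visit_averages[(s, a)].append(sum(subsequent) / len(subsequent))
    # MC(s, a) is the average of the per-occurrence averages (every-visit).
    return {key: sum(avgs) / len(avgs) for key, avgs in per_visit_averages.items()}
```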
22. Monte-Carlo (IV)
- The every-visit variant is more suitable than the first-visit variant (where only the first encounter with (s,a) counts)
  - The environment can change a lot after the first encounter with (s,a)
- Exploration variants are not used here
  - For a given memory s, the action a with the highest MC-value is selected
  - Full exploitation is used here because we already have the experience of the previous episode of rounds
23. Parenting Agent
- MAM
  - An MC-table (doesn't matter if dynamic or static)
  - Dictates action-selection for offspring only
- ASF
  - Selects between the actions suggested by the two parents with equal chance
- Parameters
  - Memory length
24. Complex Agent (I)
- Contains a genetic agent, a learning agent and a parenting agent in a subsumption architecture
- Mating selection (the debt from before) occurs among complex agents
  - At a generation's end, each agent's average reward serves as its score
  - Agents receive mating rights according to score strata (determined by the scores' average and standard deviation), as sketched below
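A rough Python sketch of this selection step. The exact mapping from score strata to mating rights is not spelled out above, so the three strata and the number of matings per stratum below are assumptions for illustration only.

```python
import statistics

def assign_mating_rights(scores):
    """scores: dict mapping agent id -> average reward (eating-rate)."""
    mean = statistics.mean(scores.values())
    std = statistics.pstdev(scores.values())
    rights = {}
    for agent, score in scores.items():
        if score >= mean + std:
            rights[agent] = 2      # top stratum: assumed two matings
        elif score >= mean - std:
            rights[agent] = 1      # middle stratum: assumed one mating
        else:
            rights[agent] = 0      # bottom stratum: assumed none
    return rights
```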
25. Complex Agent (II)
- Mediates between the inner agents and the environment
  - Perceptions are passed directly to the inner agents
  - Actions suggested by all inner agents are passed through an ASF, which selects one of them (see the sketch below)
- Parameters
  - ASF's probability of selecting the genetic action
  - ASF's probability of selecting the learning action
  - ASF's probability of selecting the parenting action
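A minimal Python sketch of the complex agent's ASF, assuming the three probabilities sum to 1; the function and variable names are illustrative.

```python
import random

def asf_select(genetic_action, learning_action, parenting_action,
               p_gen, p_lrn, p_par):
    """Pick one of the three suggested actions with the ASF probabilities."""
    return random.choices(
        [genetic_action, learning_action, parenting_action],
        weights=[p_gen, p_lrn, p_par], k=1)[0]

# Example: the best combination found for the static environment, (0.7, 0, 0.3).
chosen = asf_select("NORTH", "EAST", "HALT", 0.7, 0.0, 0.3)
```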
26. Complex Agent - Mating (diagram)
27. Complex Agent - Perception (diagram)
28. Complex Agent - Action (diagram)
29. Experiment (I)
- Measures
  - Eating-rate: average reward of a given agent (throughout its generation)
  - BER: Best Eating-Rate (in a generation)
- Framework
  - 20 agents per generation
  - 9500 generations
  - 30000 rounds per generation
- Dependent variable
  - Success measure (Lambda): average of the BERs over the last 1000 generations (see the sketch below)
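These measures are simple enough to state as code; a small Python sketch under the assumption that per-agent reward histories are kept for each generation (all names are illustrative).

```python
def eating_rate(rewards):
    """Average reward of one agent over its generation."""
    return sum(rewards) / len(rewards)

def ber(generation_rewards):
    """Best Eating-Rate in a generation; generation_rewards is a list of
    per-agent reward lists."""
    return max(eating_rate(r) for r in generation_rewards)

def success_measure(bers_per_generation, tail=1000):
    """Lambda: average of the BERs over the last `tail` generations."""
    last = bers_per_generation[-tail:]
    return sum(last) / len(last)
```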
30. Experiment (II)
- Environment
  - Grid: 20 x 20
  - A single food patch, 5 x 5 in size
31. Experiment (III) - Agent parameter values
- Genetic agent: Memory length = 1, Crossover probability = 0.02, Mutation probability = 0.005
- Learning agent: Memory length = 1, Learning rate = 0.2, Rewards discount factor = 0.95, Temperature annealing function t(n) = 5 · 0.999^n, Freezing temperature = 0.2
- Parenting agent: Memory length = 1
32. Experiment (IV)
- Complex agent parameters: ASF probabilities (111 combinations)
- Environment parameter: Movement Probability - the probability that, in a given round, the food patch moves in a random direction (0, 10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1)
- One run for each combination of values
33. Results: Static Environment
- Best combination
  - Genetic-Parenting hybrid (P_Lrn = 0)
  - P_Gen > P_Par
- Pure genetics does not perform well
  - The GA converges more slowly if not assisted by learning or parenting
- Pure parenting performs poorly
- For a given P_Par, success improves as P_Lrn decreases
(Graph for movement probability 0)
34. Results: Low Dynamic Rate
- Best combination
  - Genetic-Learning-Parenting hybrid
  - P_Lrn > P_Gen + P_Par
  - P_Par > P_Gen
- Pure parenting performs poorly
(Graph for movement probability 10^-4)
35. Results: High Dynamic Rate
- Best combination
  - Pure learning (P_Gen = 0, P_Par = 0)
- Pure parenting performs poorly
  - Parenting loses effectiveness
  - Non-parenting agents have better success
(Graph for movement probability 10^-2)
36. Conclusions
- Pure parenting doesn't work
- Agent algorithm A will be defined as an action-augmentor of agent algorithm B if:
  - A and B are always used for receiving perceptions
  - B is applied for executing an action in most steps
  - A is applied for executing an action in at least 50% of the other steps
- In a static environment (C1 & C2), parenting helps when used as an action-augmentor for genetics
- In slowly changing environments (C1 & C2), parenting helps when used as an action-augmentor for learning
- In quickly changing environments (C1 only), parenting doesn't work - pure learning is best
37. Bibliography (I)
- Genetic Algorithm
  - R. Axelrod. The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press, 1997.
  - H.G. Cobb and J.J. Grefenstette. Genetic algorithms for tracking changing environments. In Proceedings of the Fifth International Conference on Genetic Algorithms, pages 523-530, San Mateo, 1993.
- Q-Learning
  - T.W. Sandholm and R.H. Crites. Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37:147-166, 1996.
- Monte-Carlo methods, Q-Learning, Reinforcement Learning
  - R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 1998.
38. Bibliography (II)
- Genetic-Learning combinations
  - G.E. Hinton and S.J. Nowlan. How learning can guide evolution. In Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 447-454. Addison-Wesley, 1996.
  - T.D. Johnston. Selective costs and benefits in the evolution of learning. In Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 315-358. Addison-Wesley, 1996.
  - M. Littman. Simulations combining evolution and learning. In Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 465-477. Addison-Wesley, 1996.
  - G. Mayley. Landscapes, learning costs and genetic assimilation. Evolutionary Computation, 4(3):213-234, 1996.
39. Bibliography (III)
- Genetic-Learning combinations (cont.)
  - S. Nolfi, J.L. Elman and D. Parisi. Learning and evolution in neural networks. Adaptive Behavior, 3(1):5-28, 1994.
  - S. Nolfi and D. Parisi. Learning to adapt to changing environments in evolving neural networks. Adaptive Behavior, 5(1):75-98, 1997.
  - D. Parisi and S. Nolfi. The influence of learning on evolution. In Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 419-428. Addison-Wesley, 1996.
  - P.M. Todd and G.F. Miller. Exploring adaptive agency II: Simulating the evolution of associative learning. In From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pages 306-315, San Mateo, 1991.
40. Bibliography (IV)
- Exploitation vs. Exploration
  - D. Carmel and S. Markovitch. Exploration strategies for model-based learning in multiagent systems. Autonomous Agents and Multi-Agent Systems, 2(2):141-172, 1999.
- Subsumption architecture
  - R.A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1):14-23, March 1986.
41. Backup - Qualitative Data
42. Qual. Data: Mov. Prob. 0
(Graph comparing Pure Parenting, Pure Genetics, Pure Learning, and the best combination (0.7, 0, 0.3))
43. Qual. Data: Mov. Prob. 10^-4
(Graph comparing Pure Parenting, Pure Learning, and the best combination (0.03, 0.9, 0.07))
44. Qual. Data: Mov. Prob. 10^-2
(Graph comparing Pure Parenting, the combination (0.09, 0.9, 0.01), and the best: Pure Learning)