Combining Genetics, Learning and Parenting

1
Combining Genetics, Learning and Parenting
Michael Berger
Based on: "When to Apply the Fifth Commandment:
The Effects of Parenting on Genetic and Learning
Agents" / Michael Berger and Jeffrey S.
Rosenschein. Submitted to AAMAS 2004.
2
Abstract Problem
  • Hidden state
  • Metric defined over state space
  • Condition C1: When the state changes, it is only to
    an adjacent state
  • Condition C2: State changes occur at a low but
    positive rate

3
The Environment
4
Agent Definitions
  • Reward: Food Presence (0 or 1)
  • Perception: <Position, Food Presence>
  • Action ∈ {NORTH, EAST, SOUTH, WEST, HALT}
  • Memory: <<Per, Ac>, ..., <Per, Ac>, Per>
  • Memory length Mem: No. of elements in memory
  • No. of possible memories: (2 · <Grid Width> · <Grid
    Height>)^Mem · 5^(Mem-1)
  • MAM - Memory-Action Mapper (sketched below)
  • A table
  • One entry for every possible memory
  • ASF - Action-Selection Filter
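
A minimal sketch of these definitions, under the assumption that a memory is a flat tuple of perceptions and actions and that the MAM is a plain lookup table (all names and the grid size shown here are illustrative):

```python
from collections import namedtuple

# Illustrative Python rendering of the agent definitions above.
Perception = namedtuple("Perception", ["position", "food_present"])  # <Position, Food Presence>

ACTIONS = ("NORTH", "EAST", "SOUTH", "WEST", "HALT")

GRID_WIDTH, GRID_HEIGHT = 20, 20  # grid size used later in the experiments
MEM = 1                           # memory length (value used in the experiments)

# Memory = <<Per, Ac>, ..., <Per, Ac>, Per>: Mem perceptions interleaved with
# Mem - 1 actions, giving
#   (2 * GRID_WIDTH * GRID_HEIGHT) ** MEM * 5 ** (MEM - 1)
# possible memories.
NUM_MEMORIES = (2 * GRID_WIDTH * GRID_HEIGHT) ** MEM * 5 ** (MEM - 1)

# MAM (Memory-Action Mapper): a table with one entry per possible memory,
# mapping a (hashable) memory tuple to an action.
mam = {}
```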

5
Genetic Algorithm (I)
  • Algorithm on a complete population, not on a
    single agent
  • Requires introduction of generations
  • Every generation consists of a new group of
    agents
  • Each agent is created at the beginning of a
    generation, and is terminated at its end
  • Agent's life cycle: Birth --> Run (foraging) -->
    Possible matings --> Death

6
Genetic Algorithm (II)
  • Each agent carries a gene sequence
  • Each gene has a key (memory) and a value (action)
  • A given memory determines the resultant action
  • Gene sequence remains constant during the
    life-time of an agent
  • Gene sequence is determined at the mating stage
    of an agent's parents

7
Genetic Algorithm (III)
  • Mating consists of two stages
  • Selection stage - Determining mating rights.
    Should be performed according to two principles
  • Survival of the fittest (as indicated by
    performance during the lifetime)
  • Preservation of genetic variance
  • Offspring creation stage
  • One or more parents create one or more offspring
  • Offspring inherit some combination of the parents'
    gene sequences
  • Each of the stages has many variants

8
Genetic Algorithm Variant
  • Selection
  • Will be discussed later.
  • Offspring creation
  • Two parents mate and create two offspring
  • Gene sequences of the parents are aligned against
    one another, and then two processes occur (sketched
    below)
  • Random crossover
  • Random mutation
  • The resultant pair of gene sequences is inherited by
    the offspring (one sequence per offspring)
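
A minimal sketch of this offspring-creation step, assuming "random crossover" means swapping each aligned gene pair independently with the crossover probability (uniform crossover) and "random mutation" means replacing a gene's action with a random one; function and parameter names are illustrative:

```python
import random

ACTIONS = ("NORTH", "EAST", "SOUTH", "WEST", "HALT")

def mate(parent_a, parent_b, p_crossover=0.02, p_mutation=0.005):
    """Align two gene sequences (lists of actions, indexed by the same memory
    order), apply random crossover and random mutation, and return the two
    offspring gene sequences."""
    child_a, child_b = list(parent_a), list(parent_b)
    # Random crossover: with probability p_crossover, swap the aligned gene pair.
    for i in range(len(child_a)):
        if random.random() < p_crossover:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    # Random mutation: with probability p_mutation, replace a gene's action.
    for child in (child_a, child_b):
        for i in range(len(child)):
            if random.random() < p_mutation:
                child[i] = random.choice(ACTIONS)
    return child_a, child_b
```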

9
Genetic Inheritance
10
Genetic Agent
  • MAM
  • Every entry is considered a gene
  • First column - Possible memory (key)
  • Second column - Action to take (value)
  • No changes after creation
  • Parameters

Memory length
Crossover probability for each gene pair
Mutation probability for each gene
11
Learning Algorithm
  • Reinforcement Learning type algorithm
  • After performing an action, agents receive a
    signal informing them how good their choice of
    action was (in this case, the reward)
  • Selected algorithm: Q-learning with Boltzmann
    exploration

12
Basic Q-Learning (I)
  • Definitions

Discount factor (non-negative, less than 1)
Reward at round j
Rewards' discounted sum at round n (written out below)
  • Q-learning attempts to maximize the agent's
    expected discounted sum of rewards as a function
    of any given memory at any round n
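
Written out, with γ denoting the discount factor and r_j the reward at round j (notation assumed, since only the verbal definitions appear above), the discounted sum at round n is:

```latex
D_n = \sum_{j=n}^{\infty} \gamma^{\,j-n}\, r_j , \qquad 0 \le \gamma < 1 .
```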

13
Basic Q-Learning (II)
  • Q(s,a) - Q-value. The expected discounted sum
    of future rewards for an agent when its memory is
    s and it selects action a and follows an optimal
    policy thereafter.
  • Q(s,a) is updated every time an agent
    selects action a when at memory s. After action
    execution, the agent receives reward r and its new
    memory is s'. Q(s,a) is updated as follows
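
The update is presumably the standard Q-learning rule, with α the learning rate and γ the rewards discount factor (both appear later among the learning agent's parameters):

```latex
Q(s,a) \;\leftarrow\; (1-\alpha)\, Q(s,a) \;+\; \alpha \left( r + \gamma \max_{a'} Q(s',a') \right)
```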

14
Basic Q-Learning (III)
  • Q(s,a) values can be stored in different forms
  • Neural network
  • Table (nicknamed a Q-table)
  • When saved as a Q-table, each row corresponds to
    a possible memory s, and each column to a
    possible action a
  • When an agent contains memory s, it should simply
    select an action a that maximizes Q(s,a) -
    right??? WRONG!!!

15
Boltzmann Exploration (I)
  • Full exploitation of a Q-value might hide other,
    better Q-values
  • Exploration of Q-values needed, at least in early
    stages
  • Boltzmann exploration: The probability of
    selecting action ai is given below
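
The standard Boltzmann (softmax) selection rule, which is presumably the formula intended here, with t the annealing temperature:

```latex
P(a_i \mid s) \;=\; \frac{e^{\,Q(s,a_i)/t}}{\sum_{j} e^{\,Q(s,a_j)/t}}
```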

16
Boltzmann Exploration (II)
  • t - An annealing temperature
  • At round n, t is given by an annealing schedule
  • t decreases --> exploration decreases,
    exploitation increases
  • For a given s, the probability of selecting its
    best Q-value approaches 1 as n increases
  • Variant here uses a freezing temperature

Freezing temperature - when t drops below it,
exploration is replaced by full exploitation
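
A minimal sketch of Boltzmann action selection with annealing and a freezing cutoff; the schedule t_n = 5 · 0.999^n and the freezing temperature 0.2 are taken from the experiment constants listed later, and the function names are illustrative:

```python
import math
import random

ACTIONS = ("NORTH", "EAST", "SOUTH", "WEST", "HALT")

def temperature(n, t0=5.0, decay=0.999):
    """Annealing schedule assumed from the experiment constants: t_n = 5 * 0.999^n."""
    return t0 * decay ** n

def boltzmann_select(q_row, n, freeze_temp=0.2):
    """Pick an action for the current memory from one Q-table row q_row
    (a dict action -> Q-value), exploring less as t decreases."""
    t = temperature(n)
    if t < freeze_temp:
        # Below the freezing temperature: full exploitation.
        return max(ACTIONS, key=lambda a: q_row[a])
    weights = [math.exp(q_row[a] / t) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]
```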
17
Learning Agent
  • MAM
  • A Q-table (dynamic)
  • Parameters

Memory length
Learning rate
Rewards discount factor
Temperature annealing function
Freezing temperature
18
Parenting Algorithm
  • There is no classical parenting algorithm
    available, so it needs to be simulated
  • Selected algorithm: Monte-Carlo (another
    Reinforcement Learning type algorithm)

19
Monte-Carlo (I)
  • Some similarity to Q-learning
  • A table (nicknamed an MC-table) stores values
    (MC-values) that describe how good it is to
    take action a given memory s
  • Table dictates a policy of action-selection
  • Major differences from Q-learning
  • Table isn't modified after every round, but only
    after episodes of rounds (in our case, a
    generation)
  • Q-values and MC-values have different meanings

20
Monte-Carlo (II)
  • Off-line version of Monte-Carlo
  • After completing an episode (generation) where
    one table has dictated the action-selection
    policy, a new, second table is constructed from
    scratch to evaluate how good any action a is for
    a given memory s
  • Second table will dictate policy in the next
    episode (generation)
  • Equivalent to considering the second table as
    being built during the current episode, as long
    as it isn't used in the current episode

21
Monte-Carlo (III)
  • MC(s,a) is defined as the average of all rewards
    received after memory s was encountered and
    action a was selected
  • What if (s,a) was encountered more than once?
  • Every-visit variant
  • The average of all subsequent rewards is
    calculated for each occurrence of (s,a)
  • MC(s,a) is the average of all calculated averages
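
A minimal sketch of the every-visit computation described above, assuming a generation is recorded as a list of (memory, action, reward) triples in round order (names are illustrative):

```python
from collections import defaultdict

def build_mc_table(episode):
    """Every-visit Monte-Carlo: for each occurrence of (s, a), average all
    rewards received from that round onward; MC(s, a) is then the average
    of those per-occurrence averages."""
    rewards = [r for (_, _, r) in episode]
    per_visit = defaultdict(list)  # (s, a) -> list of per-occurrence averages
    for i, (s, a, _) in enumerate(episode):
        tail = rewards[i:]  # rewards from this occurrence to the end of the episode
        per_visit[(s, a)].append(sum(tail) / len(tail))
    return {sa: sum(v) / len(v) for sa, v in per_visit.items()}
```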

22
Monte-Carlo (IV)
  • Every-visit variant more suitable than
    first-visit variant (where only the first
    encounter with (s,a) counts)
  • Environment can change a lot since the first
    encounter with (s,a)
  • Exploration variants not used here
  • For a given memory s, action a with the highest
    MC-value is selected
  • Full exploitation here because we have the
    experience of the previous episode of rounds

23
Parenting Agent
  • MAM
  • An MC-table (doesn't matter if dynamic or static)
  • Dictates action-selection for offspring only
  • ASF
  • Selects between the actions suggested by both
    parents with equal chance
  • Parameters

Memory length
24
Complex Agent (I)
  • Contains a genetic agent, a learning agent and a
    parenting agent in a subsumption architecture
  • Mating selection (deferred from the genetic
    algorithm discussion) occurs among complex agents
  • At a generation's end, each agent's average
    reward serves as its score
  • Agents receive mating rights according to score
    strata (determined by the scores' average and
    standard deviation)

25
Complex Agent (II)
  • Mediates between the inner agents and the
    environment
  • Perceptions passed directly to inner agents
  • Actions suggested by all inner agents passed
    through an ASF, which selects one of them
  • Parameters

ASF's prob. to select the genetic action
ASF's prob. to select the learning action
ASF's prob. to select the parenting action
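
A minimal sketch of the complex agent's ASF, assuming the three probabilities sum to 1 and that each inner agent has already suggested an action for the current memory (names are illustrative):

```python
import random

def asf_select(genetic_action, learning_action, parenting_action,
               p_gen, p_lrn, p_par):
    """Action-Selection Filter: choose one inner agent's suggested action
    according to the ASF probabilities (assumed to sum to 1)."""
    r = random.random()
    if r < p_gen:
        return genetic_action
    if r < p_gen + p_lrn:
        return learning_action
    return parenting_action
```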
26
Complex Agent - Mating
27
Complex Agent - Perception
28
Complex Agent - Action
29
Experiment (I)
  • Measures
  • Eating-rate: average reward for a given agent
    (throughout its generation)
  • BER: Best Eating-Rate (in a generation)
  • Framework
  • 20 agents per generation
  • 9500 generations
  • 30000 rounds per generation
  • Dependent variable
  • Success measure (Lambda) - Average of the BERs in
    the last 1000 generations

30
Experiment (II)
  • Environment
  • Grid: 20 x 20
  • A single food patch, 5 x 5 in size

31
Experiment (III)
  • Constant values

Genetic agent - Memory length: 1
Genetic agent - Crossover probability (per gene pair): 0.02
Genetic agent - Mutation probability (per gene): 0.005
Learning agent - Memory length: 1
Learning agent - Learning rate: 0.2
Learning agent - Rewards discount factor: 0.95
Learning agent - Temperature annealing function: 5 · 0.999^n
Learning agent - Freezing temperature: 0.2
Parenting agent - Memory length: 1
32
Experiment (IV)
  • Independent variables
  • Complex agent parameters: ASF probabilities (111
    combinations)
  • Environment parameter (Movement Probability):
    probability that in a given round, the food patch
    moves in a random direction (0, 10^-6, 10^-5,
    10^-4, 10^-3, 10^-2, 10^-1)
  • One run for each combination of values

33
Results: Static Environment
  • Best combination
  • Genetic-Parenting hybrid (PLrn = 0)
  • PGen > PPar
  • Pure genetics doesn't perform well
  • The GA converges more slowly if not assisted by
    learning or parenting
  • Pure parenting performs poorly
  • For a given PPar, success improves as PLrn
    decreases

(Graph for movement prob. 0)
34
Results: Low Dynamic Rate
  • Best combination
  • Genetic-Learning-Parenting hybrid
  • PLrn > PGen, PPar
  • PPar > PGen
  • Pure parenting performs poorly

(Graph for movement prob. 10-4)
35
Results: High Dynamic Rate
  • Best combination
  • Pure learning (PGen = 0, PPar = 0)
  • Pure parenting performs poorly
  • Parenting loses effectiveness
  • Non-parenting agents have better success

(Graph for movement prob. 10-2)
36
Conclusions
  • Pure parenting doesn't work
  • Agent algorithm A will be defined as an
    action-augmentor of agent algorithm B if
  • A and B are always used for receiving perceptions
  • B is applied for executing an action in most
    steps
  • A is applied for executing an action in at least
    50% of the other steps
  • In a static environment (C1, C2), parenting
    helps when used as an action-augmentor for
    genetics
  • In slowly changing environments (C1, C2),
    parenting helps when used as an action-augmentor
    for learning
  • In quickly changing environments (C1 only),
    parenting doesn't work - pure learning is best

37
Bibliography (I)
  • Genetic Algorithm
  • R. Axelrod. The Complexity of Cooperation:
    Agent-Based Models of Competition and
    Collaboration. Princeton University Press, 1997.
  • H.G. Cobb and J.J. Grefenstette. Genetic
    algorithms for tracking changing environments. In
    Proceedings of the Fifth International Conference
    on Genetic Algorithms, pages 523-530, San Mateo,
    1993.
  • Q-Learning
  • T.W. Sandholm and R.H. Crites. Multiagent
    reinforcement learning in the iterated prisoner's
    dilemma. Biosystems, 37: 147-166, 1996.
  • Monte-Carlo methods, Q-Learning, Reinforcement
    Learning
  • R.S. Sutton and A.G. Barto. Reinforcement
    Learning: An Introduction. The MIT Press, 1998.

38
Bibliography (II)
  • Genetic-Learning combinations
  • G.E. Hinton and S.J. Nowlan. How learning can
    guide evolution. In Adaptive Individuals in
    Evolving Populations: Models and Algorithms,
    pages 447-454. Addison-Wesley, 1996.
  • T.D. Johnston. Selective costs and benefits in
    the evolution of learning. In Adaptive
    Individuals in Evolving Populations: Models and
    Algorithms, pages 315-358. Addison-Wesley, 1996.
  • M. Littman. Simulations combining evolution and
    learning. In Adaptive Individuals in Evolving
    Populations: Models and Algorithms, pages
    465-477. Addison-Wesley, 1996.
  • G. Mayley. Landscapes, learning costs and genetic
    assimilation. Evolutionary Computation, 4(3):
    213-234, 1996.

39
Bibliography (III)
  • Genetic-Learning combinations (cont.)
  • S. Nolfi, J.L. Elman and D. Parisi. Learning and
    evolution in neural networks. Adaptive Behavior,
    3(1): 5-28, 1994.
  • S. Nolfi and D. Parisi. Learning to adapt to
    changing environments in evolving neural
    networks. Adaptive Behavior, 5(1): 75-98, 1997.
  • D. Parisi and S. Nolfi. The influence of learning
    on evolution. In Adaptive Individuals in Evolving
    Populations: Models and Algorithms, pages
    419-428. Addison-Wesley, 1996.
  • P.M. Todd and G.F. Miller. Exploring adaptive
    agency II: Simulating the evolution of
    associative learning. In From Animals to Animats:
    Proceedings of the First International Conference
    on Simulation of Adaptive Behavior, pages
    306-315, San Mateo, 1991.

40
Bibliography (IV)
  • Exploitation vs. Exploration
  • D. Carmel and S. Markovitch. Exploration
    strategies for model-based learning in multiagent
    systems. Autonomous Agents and Multi-agent
    Systems, 2(2): 141-172, 1999.
  • Subsumption architecture
  • R.A. Brooks. A robust layered control system for
    a mobile robot. IEEE Journal of Robotics and
    Automation, 2(1): 14-23, March 1986.

41
Backup - Qualitative Data
42
Qual. Data: Mov. Prob. 0
(Graph legend: Pure Parenting, Pure Genetics, Pure
Learning, Best = (0.7, 0, 0.3))
43
Qual. Data: Mov. Prob. 10^-4
(Graph legend: Pure Parenting, Pure Learning,
Best = (0.03, 0.9, 0.07))
44
Qual. Data: Mov. Prob. 10^-2
(Graph legend: Pure Parenting, (0.09, 0.9, 0.01),
Best = Pure Learning)