Title: Advanced Artificial Intelligence Lecture 18: Genetic Programming
1. Advanced Artificial Intelligence, Lecture 18: Genetic Programming
- Bob McKay
- School of Computer Science and Engineering
- College of Engineering
- Seoul National University
2. Outline
- Genetic Programming
- Introduction
- Applications
- Example
- Representation
- EDAs in GP
3. Evolutionary Computation: Underlying Idea
- If Darwinian evolution can create solutions to complex problems of survival in the natural world...
- ...why not apply it to creating solutions to problems of interest to us?
4. Evolutionary Computation: Generic Generational Algorithm
- Generate the initial population P(0) stochastically
- REPEAT
- Evaluate the fitness of each individual in P(t)
- Select parents from P(t) based on their fitness
- Use stochastic variation operators to get P(t+1)
- UNTIL termination conditions are satisfied
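A minimal Python sketch of this generic loop (illustrative only, not the interface of any particular GP system): random_individual, fitness and vary are assumed, user-supplied functions, and fitness-proportionate selection stands in for the generic "select parents based on fitness" step.

    import random

    def evolve(random_individual, fitness, vary, pop_size=100, generations=50):
        """Generic generational loop: P(0) is random; P(t+1) is bred from
        fitness-selected parents via a stochastic variation operator."""
        population = [random_individual() for _ in range(pop_size)]
        for _ in range(generations):
            scored = [(fitness(ind), ind) for ind in population]
            total = sum(f for f, _ in scored)
            # Fitness-proportionate selection; fall back to uniform if all fitnesses are zero
            weights = [f / total for f, _ in scored] if total > 0 else None
            parents = random.choices([ind for _, ind in scored], weights=weights, k=pop_size)
            # Stochastic variation produces the next generation P(t+1)
            population = [vary(random.choice(parents), random.choice(parents))
                          for _ in range(pop_size)]
        return max(population, key=fitness)

The representation-specific details (how individuals are built and varied) are exactly what the GP-specific slides that follow fill in.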
5. Evolutionary Computation: Variants
- Evolution Strategies
- Evolutionary Programming
- Genetic Algorithms
- Classifier Systems
- Genetic Programming
6. Tree-Based Genetic Programming
- Original idea
- Evolve populations of trees representing problem solutions
- Cramer (1985), Schmidhuber (1987), Koza (1992)
- Closure assumption: any function can apply to any argument
7. GP Initialisation
- Typical: ramped half-and-half initialisation
- Ramped
- Choose a lower and upper bound for tree depth
- Generate trees with maximum depths distributed uniformly between these bounds
- Half and half
- 50% full trees
- At the depth bound, nodes chosen uniformly randomly from the constant symbols
- Elsewhere, nodes chosen randomly from the function symbols
- 50% grow trees
- At the depth bound, nodes chosen randomly from the constant symbols
- Elsewhere, nodes chosen randomly from all symbols
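A sketch of ramped half-and-half in Python, using nested tuples for trees and an illustrative Boolean symbol set (the function and terminal tables here are placeholders for whatever the problem defines):

    import random

    FUNCTIONS = {'and': 2, 'or': 2, 'not': 1, 'if': 3}   # function symbol -> arity (example set)
    TERMINALS = ['a0', 'a1', 'd0', 'd1', 'd2', 'd3']

    def full(depth):
        """'Full' tree: function symbols everywhere except at the depth bound."""
        if depth == 0:
            return random.choice(TERMINALS)
        f = random.choice(list(FUNCTIONS))
        return (f, *[full(depth - 1) for _ in range(FUNCTIONS[f])])

    def grow(depth):
        """'Grow' tree: all symbols below the bound, terminals only at the bound."""
        n_all = len(TERMINALS) + len(FUNCTIONS)
        if depth == 0 or random.random() < len(TERMINALS) / n_all:
            return random.choice(TERMINALS)
        f = random.choice(list(FUNCTIONS))
        return (f, *[grow(depth - 1) for _ in range(FUNCTIONS[f])])

    def ramped_half_and_half(pop_size, min_depth=2, max_depth=6):
        """Depth bounds ramped uniformly between the limits; alternate full and grow trees."""
        pop = []
        for i in range(pop_size):
            bound = random.randint(min_depth, max_depth)
            pop.append(full(bound) if i % 2 == 0 else grow(bound))
        return pop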
8. GP Selection
- Truncation selection
- Select the best k of the population
- Generally too eager
- Fitness proportionate selection
- Probability of selection proportionate to fitness
- Tournament selection
- Choose k individuals uniformly randomly
- Select the best of those individuals
- Eagerness tunable by k
- Larger k gives a more eager algorithm
- The most commonly used today
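Tournament selection is simple enough to show directly; a sketch assuming a maximising fitness function:

    import random

    def tournament_select(population, fitness, k=3):
        """Choose k individuals uniformly at random and return the best of them.
        Larger k gives a more eager (higher-pressure) selection."""
        return max(random.sample(population, k), key=fitness)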
9. Stochastic Variation Operator: Mutation
- Randomly choose a node in the parent tree
- Delete the sub-tree below that node
- Generate a new random sub-tree
10. Stochastic Variation Operator: Crossover
- Randomly choose a node in each parent tree
- Exchange the sub-trees rooted at those points
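A sketch of both operators over the nested-tuple trees used above; random_subtree is an assumed generator (for example, the grow function from the initialisation sketch):

    import random

    def nodes(tree, path=()):
        """Enumerate (path, subtree) pairs; a path is a tuple of child indices."""
        yield path, tree
        if isinstance(tree, tuple):
            for i, child in enumerate(tree[1:], start=1):
                yield from nodes(child, path + (i,))

    def replace(tree, path, new_subtree):
        """Return a copy of tree with the subtree at the given path replaced."""
        if not path:
            return new_subtree
        i = path[0]
        return tree[:i] + (replace(tree[i], path[1:], new_subtree),) + tree[i + 1:]

    def subtree_mutation(parent, random_subtree):
        """Delete the subtree below a randomly chosen node and grow a fresh one there."""
        path, _ = random.choice(list(nodes(parent)))
        return replace(parent, path, random_subtree())

    def subtree_crossover(parent_a, parent_b):
        """Exchange the subtrees rooted at randomly chosen nodes of each parent."""
        path_a, sub_a = random.choice(list(nodes(parent_a)))
        path_b, sub_b = random.choice(list(nodes(parent_b)))
        return replace(parent_a, path_a, sub_b), replace(parent_b, path_b, sub_a)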
11. Grammar-Based Representations
- The chromosome is a derivation tree in a predefined grammar
- S → B
- B → B or B
- B → B and B
- B → not B
- B → if B B B
- B → a0 | a1
- B → d0 | d1 | d2 | d3
12. Graph-Based Representation
- PADO (Teller & Veloso 1995)
- Graph represents execution sequence
- Permits parallelism
- Wide range of variants
13. Logic-Based Representation
- Representation is a subset of Prolog
- A number of implementations subsequent to Giordana's REGAL (1993)
14. Linear Chromosomes
- Wide variety of approaches
- Machine-code Genetic Programming
- The genotype is a machine-code program
- Stack-based Genetic Programming
- The genotype is a program in a stack-based language
- Somewhat Forth-like
- Grammar-based (Grammatical Evolution)
- The genotype is a linear representation of a grammar derivation tree
15. Developmental GP
- The chromosome is a program for generating the phenotype to be evaluated
- Cellular developmental systems
- The program specifies rules for iteratively rewriting the graph representing the phenotype
- Best-known example: the phenotype is a circuit diagram
- L-systems
- The genotype is an L-system
- The phenotype is the generated tree
16. Turing Completeness and Genetic Programming
- Generally Turing Complete
- Stack-based GP
- Machine-code GP
- Graph-based GP
- Logic-based GP
- Generally Turing Incomplete
- Tree-based GP
- Grammar-based GP
- Grammatical Evolution
- Developmental GP
- Does Turing completeness matter?
- The overwhelming majority of applications don't use Turing completeness
- The primary focus of this tutorial will be on Turing-incomplete search spaces
17. Electronic Design
18. Quantum Algorithms
- Barnum, Bernstein and Spector: Depth-One OR Query
19. Control System Parameters
- Koza et al
- Parameter Equations for Proportional Integral
Derivative (PID) Controller
20. Bioinformatics
- Wide variety of applications
- Well known: motif detection for gene families
- D-E-A-D
- manganese superoxide dismutase
- Koza et al 1999
21. Medical Data Mining
22. Antenna Design
- Lohn et al (2003)
- Design of wire antenna for NASA spacecraft
23. Chemical Dynamics Modelling
- Evolving systems of differential equations
- Predicting discharge behaviour of a battery
- Cao et al 2000
24. Ecological Modelling
25. Some GP Implementations
- C/C++ based GP systems
- http://garage.cps.msu.edu/software/software-index.html
- http://beagle.gel.ulaval.ca/index.html
- Java-based
- http://cs.gmu.edu/eclab/projects/ecj/
- PushGP
- http://hampshire.edu/lspector/push.html
- Grammatical Evolution
- http://www.grammatical-evolution.org/src.html
- DCTG-GP
- http://sandcastle.cosc.brocku.ca/bross/research/
- TAG3P
- http://www.cs.adfa.edu.au/z3013620/we/hoai.htm
26. Using GP: Typical Steps
- Choose your favourite GP system
- Define the Chromosome
- Write code to implement the fitness function
- Set values for evolutionary parameters
- Population size
- Stopping criteria
- Minimum and maximum genotype sizes
- Tournament size
- etc
27. Simple Example: 6-Multiplexer
- Boolean Circuit
- Six inputs
- two address lines
- 4 data lines
- One output
28. The 6-Multiplexer Problem
- From the 64 input-output pairs
- Learn an appropriate program
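One possible fitness function for this task, sketched over the tuple trees used earlier; evaluate is a hypothetical interpreter for the Boolean function set, and the sketch assumes a0 is the low-order address bit:

    from itertools import product

    def multiplexer_target(a0, a1, d0, d1, d2, d3):
        """Ground truth: the two address bits select one of the four data bits."""
        return [d0, d1, d2, d3][a0 + 2 * a1]

    def evaluate(tree, env):
        """Interpret a Boolean expression tree over the input bits in env."""
        if isinstance(tree, str):
            return env[tree]
        op, *args = tree
        vals = [evaluate(a, env) for a in args]
        if op == 'and':
            return vals[0] & vals[1]
        if op == 'or':
            return vals[0] | vals[1]
        if op == 'not':
            return 1 - vals[0]
        if op == 'if':                      # (if condition then-branch else-branch)
            return vals[1] if vals[0] else vals[2]
        raise ValueError(op)

    def fitness(tree):
        """Proportion of the 64 input combinations predicted correctly."""
        names = ['a0', 'a1', 'd0', 'd1', 'd2', 'd3']
        cases = list(product([0, 1], repeat=6))
        correct = sum(evaluate(tree, dict(zip(names, bits))) == multiplexer_target(*bits)
                      for bits in cases)
        return correct / len(cases)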
29. 6-Multiplexer in DCTG-GP: Grammar and Semantics
    bool ::= terminal^^T <:>
        (value(Input,V) ::- T^^value(Input,V)).
    bool ::= [and], bool^^B1, bool^^B2 <:>
        (value(Input,V) ::-
            B1^^value(Input,V1), B2^^value(Input,V2),
            ((V1 = 0 ; V2 = 0) -> V = 0 ; V = 1)).
    bool ::= [or], bool^^B1, bool^^B2 <:>
        (value(Input,V) ::-
            B1^^value(Input,V1), B2^^value(Input,V2),
            ((V1 = 1 ; V2 = 1) -> V = 1 ; V = 0)).
    bool ::= [not], bool^^B1 <:>
        (value(Input,V) ::- B1^^value(Input,V1), V is 1 - V1).
    terminal ::= [a0] <:> (value([A0,_,_,_,_,_], A0) ::- true).
    terminal ::= [a1] <:> (value([_,A1,_,_,_,_], A1) ::- true).
    terminal ::= [d0] <:> (value([_,_,D0,_,_,_], D0) ::- true).
    terminal ::= [d1] <:> (value([_,_,_,D1,_,_], D1) ::- true).
    terminal ::= [d2] <:> (value([_,_,_,_,D2,_], D2) ::- true).
    terminal ::= [d3] <:> (value([_,_,_,_,_,D3], D3) ::- true).
30. 6-Multiplexer in DCTG-GP: Fitness and Parameters
- Fitness is the proportion of the 64 instances correctly predicted
- max_depth(8).
- The maximum acceptable depth of the derivation trees
- population_size(150).
- The number of individuals
- generations(500).
- How long to run for
- prob_crossover(0.9).
- Crossover rate
- prob_mutation(0.1).
- Mutation rate
- tournament_size(3).
- Determines the eagerness of the search
- Etc.
31. Representation in GP
- Representation is a key issue in intelligent systems
- Emphasis on
- Sufficiency: the representation can encode the class of problems
- Effectiveness: the representation permits simple search
- Also a key issue in evolutionary systems
- How to design representations which give rise to smoother fitness landscapes?
32. Representation: GP vs GA/ES
- Much of our insight into GP representation comes from studies in Genetic Algorithms and Evolution Strategies
- However, these insights must be tempered by key differences between GP and GA/ES representations
- Feasibility and connectivity
- Neighbourhood complexity
- Genotype complexity
33. Threshold Question
- What should we view as the underlying distance metric in GP genotype/phenotype spaces?
- We need a natural analogue of Manhattan distance in GA
- Fine-grained enough to underlie other distance metrics based on search operators
- Taking into account both
- Content variation (as in GA)
- Structure change
34. Edit Distance
- We follow O'Reilly (1997) in viewing edit distance as a natural underlying metric
- Edit distance is the number of operations required to transform one genotype into another
- Single node insertions
- Single node deletions
- Single node content substitutions
- Many variants, but they generally differ only by O(1)
- But edit distance ignores symmetries of the domain
- AB and BA can be arbitrarily far apart
- Depending on the complexity of A and B
- Can we design a better (perhaps domain-sensitive) metric?
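Proper tree edit distance needs a dedicated algorithm (Zhang-Shasha and relatives); purely as an illustration of the idea, the sketch below uses ordinary sequence edit distance over preorder traversals, a cheap proxy rather than the metric itself:

    def preorder(tree):
        """Flatten a nested-tuple GP tree into a preorder list of node labels."""
        if isinstance(tree, str):
            return [tree]
        out = [tree[0]]
        for child in tree[1:]:
            out.extend(preorder(child))
        return out

    def levenshtein(a, b):
        """Dynamic-programming edit distance (insert, delete, substitute) over two sequences."""
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            curr = [i]
            for j, y in enumerate(b, 1):
                curr.append(min(prev[j] + 1,              # delete x
                                curr[j - 1] + 1,          # insert y
                                prev[j - 1] + (x != y)))  # substitute x by y
            prev = curr
        return prev[-1]

    def rough_tree_distance(t1, t2):
        """Edit distance between preorder traversals: a crude stand-in for tree edit distance."""
        return levenshtein(preorder(t1), preorder(t2))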
35. Feasibility and Connectivity
- In GA, the Manhattan distance metric passes through the feasible region
- That is, for any distance δ < d(A,C), there is a valid genotype B with d(A,C) = d(A,B) + d(B,C) and d(A,B) = δ
- In most GP representations, this is not the case
- Deletion and insertion from a GP tree will usually result in an invalid tree
- At best, the wrong number of arguments for functions
- At worst, no longer a tree
- Feasible paths may be unboundedly longer than the direct (infeasible) path
36. Neighbourhood Complexity
- For an individual A, the δ-neighbourhood is defined as
- Nδ(A) = { X : d(A,X) < δ }
- Neighbourhood size |Nδ(A)|
- In GA/ES, |Nδ(A)| is generally independent of A
- In GP, |Nδ(A)| varies over the search space
- If the search space is unbounded, the neighbourhood size is generally monotonic in the size of A
- If we impose size or depth bounds, it may be non-monotonic
- Neighbourhood connectivity
- In GA/ES, neighbourhoods are graph-connected
- In many GP representations, neighbourhoods are not connected
37. Genotype Complexity
- Virtually all GP representations offer
- (in principle) unbounded size individuals
- (in practice) individuals of varying complexity
- GA / ES representation studies generally assume
fixed complexity
38. Problem-Specific Representation
- In most areas of evolutionary computation, there is a strong emphasis on tailoring problem representations
- There are many feasible representations for a problem
- The representation is chosen to optimise performance
- Suitable (problem-specific) operators
- Smooth fitness landscape
- Redundancy and neutral paths
39. Problem-Specific Representation: GP
- Most GP representations are generic
- One basic representation fits all problems
- Permit only the tailoring necessary to encode the problem
- Suitable function set
40. Problem-Specific Representation: Grammars
- Grammar-based GP permits further tailoring
- The same problem search space may be encoded in multiple ways
- Usually used to bias search toward particular subspaces
- Little emphasis on transforming the fitness landscape or tailoring operators
- It is unclear whether grammar tailoring can usefully transform the search space
- Standard context-free grammar GP introduces stronger feasibility constraints than standard GP
- The search spaces may be even more disconnected
41. Problem-Specific Representation: Summary
- Problem-specific encoding in current GP systems is very weak
- There is a need for more flexible representations permitting tailoring of the search space and fitness landscape
- For the moment, the focus is on the properties of generic representations
- There is still enormous potential for logic-based representations
42. Structural Difficulty and Connectivity
- Daida has demonstrated that standard tree-based GP cannot search some regions effectively
- GP cannot effectively search for very full or very narrow trees
- Not (as Daida argues) a consequence of the tree representation
- With variable-arity trees (TAG3P) we are able to solve even with hill-climbing search
- Rather, a consequence of poor neighbourhood connectivity
43. Genotype-Phenotype Mappings
- Most GP representations do not have an explicit genotype-phenotype mapping
- The genotype is the phenotype
- Grammar-guided systems usually do have an explicit genotype-phenotype mapping
- Desirable properties
- Redundancy
- Connectedness
- Extension
- Continuity
44. Redundant Genotype-Phenotype Mappings
- Mappings in which many genotypes map to one phenotype
- The pre-image of a point forms a neutral set
- Extensively studied (in GA) by Shipman et al.
- Identified two desirable characteristics
- Connectedness
- The pre-image of a point is connected
- Hence forms a neutral path
- Extension
- The pre-images of points are intertwined
- Permitting movement between neutral paths
- Most GP Genotype-Phenotype maps appear to be
neither connected nor extensive
45. Continuity in Genotype-Phenotype Mappings
- Continuity
- Neighbourhoods map to neighbourhoods
- I.e. small genotype changes result in small phenotype changes
- Sometimes known as strong causality
- Often not a property of GP mappings
46. Operators
- Feasibility constraints make it difficult to design fine-grained operators
- With tree-based GP, the minimum step size is determined by the height of the node
- Operators applied high in the tree cause large steps
- A key reason why standard GP converges top-down
- Feasibility constraints make it difficult to separate structure modification and content modification
- Search must optimise both at once
47. Context-Free Grammars
- Grammar represents structure of solution space
- S → B
- B → B or B
- B → B and B
- B → not B
- B → if B B B
- B → a0 | a1
- B → d0 | d1 | d2 | d3
48. Grammar-Guided Genetic Programming (GGGP)
- Problem space represented by a context-free grammar G
- Individuals are derivation trees in G
- Crossover uses sub-tree crossover
- But the root nodes must have the same label
- Mutation uses sub-tree mutation
- But the generated sub-tree must be consistent with the grammar
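A sketch of grammar-respecting crossover, assuming derivation trees stored as (nonterminal label, list of children) with terminal strings at the leaves; the representation is illustrative rather than that of any particular GGGP system:

    import random

    def nonterminal_nodes(tree, path=()):
        """Yield (path, label) for every nonterminal node of a derivation tree.
        A nonterminal node is (label, children); terminals are plain strings."""
        if isinstance(tree, tuple):
            label, children = tree
            yield path, label
            for i, child in enumerate(children):
                yield from nonterminal_nodes(child, path + (i,))

    def get_subtree(tree, path):
        for i in path:
            tree = tree[1][i]
        return tree

    def put_subtree(tree, path, sub):
        if not path:
            return sub
        label, children = tree
        i = path[0]
        return (label, children[:i] + [put_subtree(children[i], path[1:], sub)] + children[i + 1:])

    def gggp_crossover(p1, p2):
        """Swap subtrees only where both chosen nodes carry the same nonterminal label,
        so both offspring remain valid derivation trees of the grammar."""
        nodes1 = list(nonterminal_nodes(p1))
        nodes2 = list(nonterminal_nodes(p2))
        random.shuffle(nodes1)
        for path1, label1 in nodes1:
            matches = [path2 for path2, label2 in nodes2 if label2 == label1]
            if matches:
                path2 = random.choice(matches)
                return (put_subtree(p1, path1, get_subtree(p2, path2)),
                        put_subtree(p2, path2, get_subtree(p1, path1)))
        return p1, p2   # no compatible node pair; return the parents unchanged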
49. GGGP Representation Properties
- Problem-specific representation
- The same language can be represented in multiple ways
- S → B
- B → B Op B
- B → not B
- B → if B B B
- B → a0 | a1
- B → d0 | d1 | d2 | d3
- Op → and | or
- It is unclear whether this can usefully improve the fitness landscape
- However, it is clear that a poor choice of grammar can lead to serious search problems
- E.g. problems with initialisation
50. GGGP Properties (2)
- Structural Difficulty
- Likely to be worse than tree-based GP
- Neighbourhood connectivity very poor
- Because of the constraints imposed by grammar
- Genotype-Phenotype mapping
- Redundant
- Dis-connected
- Extensive
- Continuous
51. GGGP Properties (3)
- Operators
- Structure modification and content modification may be partially separated by choice of grammar
- Content as lexicon
- The second grammar above
- Step-size distribution similar to standard GP
- Extremely difficult to define new grammar-consistent operators
52. The Grammatical Evolution Transformation
- Inorder traversal of numbered productions
- 1: B → a0
- 2: B → a1
- 3: B → d0
- 4: B → d1
- 5: B → d2
- 6: B → d3
- 7: B → B or B
- 8: B → B and B
- 9: B → not B
- 10: B → if B B B
- Example genotype: 1 4 5 4 6 10 9 4 6 8 7
53. Grammatical Evolution
- Problem space represented by linear strings of integers
- Can apply normal GA-style operators
- Genotype-phenotype mapping uses the GE transformation
- Using modular arithmetic to guarantee feasibility
54. GE Representation Properties
- Problem specific representation
- Essentially the same issues as GGGP
- Genotype-Phenotype mapping
- Redundant
- Dis-connected
- Non-extensive
- Highly dis-continuous
- Genotype-Phenotype mapping is context-dependent
- The same genotypic component generates different
phenotypic components depending on context
55. GE Properties (2)
- Structural difficulty
- Unlikely to be better than GGGP?
- Neighbourhood connectivity very poor
- Because of the constraints imposed by the grammar
- Operators
- Difficult to separate structure and content modification
- Because of the discontinuity of the genotype-phenotype mapping
- Genotypic step size readily controllable
- But corresponds to a highly discontinuous phenotypic step size
- Simple to define new grammar-consistent operators
56. Outline
- Introduction
- Problems and Issues
- Representation
- GP vs GA/ES
- Issues in GP Representation
- Examples
- GGGP Representation
- GE Representation
- TAG Representation
- Tree Adjunct Grammars (TAGs)
- TAG for GP
- TAG3P Representation Properties
- Search Algorithms
- Structural Components and Incremental Learning
- Population Structure
- Biases in Learning
- Computational Cost
57. Tree Adjunct Grammars: Motivation
- A confession
- Originally conceived to address perceived problems with the many-one nature of the GE genotype-phenotype map
- Actually, it doesn't solve those problems
- Fortunately, it does have many other useful representational properties
58. Tree Adjunct Grammars (TAGs)
- Arise from more modern efforts to represent natural language
- Joshi et al. 1975
- Two types of elementary trees
- α trees
- Represent complete syntactic units
- β trees
- Represent insertable elements
- Must have an identical non-terminal at the root and at the frontier
- The foot node
59. TAG Elementary Trees
- α tree examples
- β tree example
60. TAG Operations
61. TAG to CFG Mapping
62. TAG Genetic Programming (TAG3P)
- Basic Form
- Problem space represented by a TAG Grammar G
- Individuals are derivation trees in G
- Crossover uses sub-tree crossover
- But the root nodes must have the same label
- Mutation uses sub-tree mutation
- But the generated sub-tree must be consistent
with the grammar
63. TAG3P vs GGGP
- Both tree-based representations
- What have we gained?
- GGGP trees have fixed arity
- Each production determines a fixed number of children
- TAG trees have flexible arity
- Any sub-tree may be deleted without affecting tree validity
- This non-fixed arity buys us flexibility
64. TAG3P Representation Properties
- Problem-specific representation
- The same language can be represented in multiple ways
- Similar issues to GGGP
- Tree adjunct languages are a superset of the context-free languages
65. TAG3P Properties (2)
- Operators
- Structure modification and content modification separated through substitution and adjunction
- Content as lexicon
- Structure as TAG structure
- Easy to define new operators
- Point insertion
- Point deletion
- Transposition
- Relocation
- Duplication
- Step size may be minimal (point
insertion/deletion)
66. TAG3P Properties (3)
- Structural difficulty
- Dramatically less than tree-based GP
- Especially using point insertion and deletion operators
- Neighbourhood connectivity good
- Because of the non-fixed-arity property
- Genotype-Phenotype mapping
- Redundant
- Dis-connected
- Extensive
- Continuous
67. Some GP Representations
68. Summary: Representation Issues
- Connectedness of Neighbourhoods
- Genotype-Phenotype Mappings
- Connectedness
- Extension
- Continuity
- Operators
- Grain
- Separation of Structure and Content Change
- Problem-specific Representation
69. The Issue
- GP research has demonstrated that population-based stochastic search is effective in GP problem spaces
- But evolutionary search isn't the only population-based stochastic search algorithm
- Could other search methods be effective?
70. Estimation of Distribution Algorithms
- EDAs have been extremely successful in fixed-complexity search spaces
- Could they extend to GP representations?
- EDA for GP must explicitly distinguish between
- Structure learning
- Content learning
- In EDA terms, explicitly distinguish
- Probability model
- Probabilities
- Two primary strands
- Prototype tree
- Grammar-based
71. EDA Algorithm
- Initialise the EDA probability model
- REPEAT
- Generate a population from the probability model
- Evaluate population fitness
- Optionally, generate a new probability model
- Update the probabilities in the model based on population fitness
- UNTIL stopping criteria are satisfied
- (GP algorithms to date use truncation selection, and update the probability tables to increase the probability of generating the truncated population)
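The loop is generic enough to sketch directly; model, sample, fitness and update are assumed, user-supplied components (for instance a prototype tree or a stochastic grammar, its sampler, and its probability-update rule):

    def eda(model, sample, fitness, update, pop_size=200, truncation=0.1, generations=50):
        """Generic EDA loop: sample a population from the probability model, keep the
        best fraction (truncation selection), and shift the model toward that set."""
        for _ in range(generations):
            population = [sample(model) for _ in range(pop_size)]
            population.sort(key=fitness, reverse=True)
            elite = population[:max(1, int(truncation * pop_size))]
            model = update(model, elite)    # optionally a new model structure, then new probabilities
        return model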
72. Exploration vs Exploitation in EDA
- If the algorithm simply learns the probability distribution from the current population, it will be overly exploitative
- Premature convergence
- EDAs avoid this by using a uniform prior probability
- Either explicitly, with a discount rate
- Or implicitly, by generating additional individuals at random
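One way the explicit (discount-rate) variant might look, sketched for a single probability table; the parameter names are illustrative:

    def discounted_update(old_probs, elite_counts, learn_rate=0.3, prior_weight=0.05):
        """Move the table toward the elite sample's empirical distribution, then mix in
        a uniform prior so no choice's probability can collapse to zero."""
        n = len(old_probs)
        total = sum(elite_counts)
        empirical = [c / total for c in elite_counts] if total else [1.0 / n] * n
        blended = [(1 - learn_rate) * o + learn_rate * e for o, e in zip(old_probs, empirical)]
        return [(1 - prior_weight) * b + prior_weight / n for b in blended]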
73. Prototype Tree EDAs
- The underlying model is a full tree of maximum arity
- Each node holds a probability table for the content of the node
- The original version (PIPE, Salustowicz & Schmidhuber 1997) has the node probabilities independent
- More recent versions learn dependent probabilities
- The prototype tree gives position-dependent probabilities
- It cannot learn position-independent building blocks
74. PIPE Prototype Tree
75. Grammar-Based EDAs
- The underlying model is a stochastic context-free grammar
- B → B or B (0.6)
- B → B and B (0.3)
- B → not B (0.1)
- Permits position-independent building blocks
- B → C or D (0.6)
- C → C and C (0.8)
- D → not D (0.8)
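A sketch of sampling from such a stochastic CFG; since the fragment above lists no terminal productions, the sketch assumes a terminal set and falls back to it when a depth budget runs out:

    import random

    SCFG = {
        'B': [(['B', 'or', 'B'], 0.6),
              (['B', 'and', 'B'], 0.3),
              (['not', 'B'], 0.1)],
    }
    TERMINALS = ['a0', 'a1', 'd0', 'd1', 'd2', 'd3']   # assumed; not part of the slide's fragment

    def sample(symbol='B', depth=6):
        """Recursive-descent sampling: pick a production with its grammar probability."""
        if symbol not in SCFG:
            return [symbol]                     # terminal symbol: emit it
        if depth == 0:
            return [random.choice(TERMINALS)]   # depth budget spent: force a terminal
        rhss = [rhs for rhs, _ in SCFG[symbol]]
        probs = [p for _, p in SCFG[symbol]]
        rhs = random.choices(rhss, weights=probs)[0]
        out = []
        for sym in rhs:
            out.extend(sample(sym, depth - 1))
        return out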
76. Learning Building Blocks
- Learning building blocks in a grammar-based EDA is (relatively) easy if the grammar already records the building block
- Just a matter of learning the probabilities
- Learning new building blocks requires learning new, more specific, grammar models
- B → C or D
- C → C and C
- D → not D
- From
- B → B or B
- B → B and B
- B → not B
- Grammar learning is extraordinarily computationally intensive
- Current grammar learning methods have been developed for other tasks and are not optimal for this purpose
77. Learning New Grammars
- Grammar learning can be
- Top-down or bottom-up (or inside-out)
- Specific-to-general or general-to-specific (or both)
- We have experimented with
- Inside-out, general-to-specific
- PEEL system, 2003
- Bottom-up, specific-to-general
- GMPE system, 2004
- Clearly there are many other possibilities
- But they may require identifying large repeated sub-trees
78. Ant-Based Search
- Closely related to EDA search
- Also uses probability tables to generate individuals
- Primary differences
- Learning is from individual ants rather than populations
- Somewhat akin to the distinction between generational and steady-state evolutionary algorithms
- Probability update functions are pragmatic
- Rather than statistically based
79. EDA and Ant Search Issues
- Can non-grammar representations learn position-independent building blocks?
- If the probability model changes, what uniform prior should be used?
- The uniform prior on the original model?
- The uniform prior on the new model?
- Or some combination?
- How should a change of model be triggered?
- How can grammar learning methods, developed for noise-free one-shot learning, be adapted for multi-shot learning from noisy data?
80. Summary
- Genetic Programming
- Introduction
- Applications
- Example
- Representation
- EDAs in GP