Title: Advanced Artificial Intelligence Lecture 18: Genetic Programming
1. Advanced Artificial Intelligence, Lecture 18: Genetic Programming
- Bob McKay
- School of Computer Science and Engineering
- College of Engineering
- Seoul National University
2. Outline
- Genetic Programming
- Introduction
- Applications
- Example
- Representation
- EDAs in GP
3. Evolutionary Computation: Underlying Idea
- If Darwinian evolution can create solutions to complex problems of survival in the natural world...
- ...why not apply it to creating solutions to problems of interest to us?
4. Evolutionary Computation: Generic Generational Algorithm
- Generate the initial population P(0) stochastically
- REPEAT
- Evaluate the fitness of each individual in P(t)
- Select parents from P(t) based on their fitness
- Use stochastic variation operators to get P(t+1)
- UNTIL termination conditions are satisfied
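A minimal Python sketch of this generic loop (illustrative only, not the interface of any particular GP system): random_individual, fitness and vary are assumed, user-supplied functions, and fitness-proportionate selection stands in for the generic "select parents based on fitness" step.

    import random

    def evolve(random_individual, fitness, vary, pop_size=100, generations=50):
        """Generic generational loop: P(0) is random; P(t+1) is bred from
        fitness-selected parents via a stochastic variation operator."""
        population = [random_individual() for _ in range(pop_size)]
        for _ in range(generations):
            scored = [(fitness(ind), ind) for ind in population]
            total = sum(f for f, _ in scored)
            # Fitness-proportionate selection; fall back to uniform if all fitnesses are zero
            weights = [f / total for f, _ in scored] if total > 0 else None
            parents = random.choices([ind for _, ind in scored], weights=weights, k=pop_size)
            # Stochastic variation produces the next generation P(t+1)
            population = [vary(random.choice(parents), random.choice(parents))
                          for _ in range(pop_size)]
        return max(population, key=fitness)

The representation-specific details (how individuals are built and varied) are exactly what the GP-specific slides that follow fill in.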
5. Evolutionary Computation: Variants
- Evolution Strategies
- Evolutionary Programming
- Genetic Algorithms
- Classifier Systems
- Genetic Programming
6. Tree-Based Genetic Programming
- Original idea
- Evolve populations of trees representing problem solutions
- Cramer (1985), Schmidhuber (1987), Koza (1992)
- Closure assumption: any function can apply to any argument
7. GP Initialisation
- Typical: ramped half-and-half initialisation
- Ramped
- Choose a lower and upper bound for tree depth
- Generate trees with maximum depths distributed uniformly between these bounds
- Half and half
- 50% full trees
- At the depth bound, nodes chosen uniformly randomly from the constant symbols
- Elsewhere, nodes chosen randomly from the function symbols
- 50% grow trees
- At the depth bound, nodes chosen randomly from the constant symbols
- Elsewhere, nodes chosen randomly from all symbols
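A sketch of ramped half-and-half in Python, using nested tuples for trees and an illustrative Boolean symbol set (the function and terminal tables here are placeholders for whatever the problem defines):

    import random

    FUNCTIONS = {'and': 2, 'or': 2, 'not': 1, 'if': 3}   # function symbol -> arity (example set)
    TERMINALS = ['a0', 'a1', 'd0', 'd1', 'd2', 'd3']

    def full(depth):
        """'Full' tree: function symbols everywhere except at the depth bound."""
        if depth == 0:
            return random.choice(TERMINALS)
        f = random.choice(list(FUNCTIONS))
        return (f, *[full(depth - 1) for _ in range(FUNCTIONS[f])])

    def grow(depth):
        """'Grow' tree: all symbols below the bound, terminals only at the bound."""
        n_all = len(TERMINALS) + len(FUNCTIONS)
        if depth == 0 or random.random() < len(TERMINALS) / n_all:
            return random.choice(TERMINALS)
        f = random.choice(list(FUNCTIONS))
        return (f, *[grow(depth - 1) for _ in range(FUNCTIONS[f])])

    def ramped_half_and_half(pop_size, min_depth=2, max_depth=6):
        """Depth bounds ramped uniformly between the limits; alternate full and grow trees."""
        pop = []
        for i in range(pop_size):
            bound = random.randint(min_depth, max_depth)
            pop.append(full(bound) if i % 2 == 0 else grow(bound))
        return pop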
8. GP Selection
- Truncation selection
- Select the best k of the population
- Generally too eager
- Fitness proportionate selection
- Probability of selection proportionate to fitness
- Tournament selection
- Choose k individuals uniformly randomly
- Select the best of those individuals
- Eagerness tunable by k
- Larger k gives a more eager algorithm
- The most commonly used today
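Tournament selection is simple enough to show directly; a sketch assuming a maximising fitness function:

    import random

    def tournament_select(population, fitness, k=3):
        """Choose k individuals uniformly at random and return the best of them.
        Larger k gives a more eager (higher-pressure) selection."""
        return max(random.sample(population, k), key=fitness)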
9. Stochastic Variation Operator: Mutation
- Randomly choose a node in the parent tree
- Delete the sub-tree below that node
- Generate a new random sub-tree
10. Stochastic Variation Operator: Crossover
- Randomly choose a node in each parent tree
- Exchange the sub-trees rooted at those points
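A sketch of both operators over the nested-tuple trees used above; random_subtree is an assumed generator (for example, the grow function from the initialisation sketch):

    import random

    def nodes(tree, path=()):
        """Enumerate (path, subtree) pairs; a path is a tuple of child indices."""
        yield path, tree
        if isinstance(tree, tuple):
            for i, child in enumerate(tree[1:], start=1):
                yield from nodes(child, path + (i,))

    def replace(tree, path, new_subtree):
        """Return a copy of tree with the subtree at the given path replaced."""
        if not path:
            return new_subtree
        i = path[0]
        return tree[:i] + (replace(tree[i], path[1:], new_subtree),) + tree[i + 1:]

    def subtree_mutation(parent, random_subtree):
        """Delete the subtree below a randomly chosen node and grow a fresh one there."""
        path, _ = random.choice(list(nodes(parent)))
        return replace(parent, path, random_subtree())

    def subtree_crossover(parent_a, parent_b):
        """Exchange the subtrees rooted at randomly chosen nodes of each parent."""
        path_a, sub_a = random.choice(list(nodes(parent_a)))
        path_b, sub_b = random.choice(list(nodes(parent_b)))
        return replace(parent_a, path_a, sub_b), replace(parent_b, path_b, sub_a)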
11. Grammar-Based Representations
- The chromosome is a derivation tree in a predefined grammar
- S → B
- B → B or B
- B → B and B
- B → not B
- B → if B B B
- B → a0 | a1
- B → d0 | d1 | d2 | d3
12. Graph-Based Representation
- PADO (Teller & Veloso 1995)
- Graph represents execution sequence
- Permits parallelism
- Wide range of variants
13. Logic-Based Representation
- Representation is a subset of Prolog
- A number of implementations subsequent to Giordana's REGAL (1993)
14. Linear Chromosomes
- Wide variety of approaches
- Machine-code Genetic Programming
- The genotype is a machine-code program
- Stack-based Genetic Programming
- The genotype is a program in a stack-based language
- Somewhat Forth-like
- Grammar-based (Grammatical Evolution)
- The genotype is a linear representation of a grammar derivation tree
15. Developmental GP
- The chromosome is a program for generating the phenotype to be evaluated
- Cellular developmental systems
- The program specifies rules for iteratively rewriting the graph representing the phenotype
- Best-known example: the phenotype is a circuit diagram
- L-systems
- The genotype is an L-system
- The phenotype is the generated tree
16. Turing Completeness and Genetic Programming
- Generally Turing Complete
- Stack-based GP
- Machine-code GP
- Graph-based GP
- Logic-based GP
- Generally Turing Incomplete
- Tree-based GP
- Grammar-based GP
- Grammatical Evolution
- Developmental GP
- Does Turing completeness matter?
- The overwhelming majority of applications don't use Turing completeness
- The primary focus of this tutorial will be on Turing-incomplete search spaces
17. Electronic Design
18. Quantum Algorithms
- Barnum, Bernstein and Spector: Depth-One OR Query
19. Control System Parameters
- Koza et al
- Parameter Equations for Proportional Integral
Derivative (PID) Controller
20. Bioinformatics
- Wide variety of applications
- Well known: motif detection for gene families
- D-E-A-D
- manganese superoxide dismutase
- Koza et al 1999
21. Medical Data Mining
22. Antenna Design
- Lohn et al (2003)
- Design of wire antenna for NASA spacecraft
23. Chemical Dynamics Modelling
- Evolving systems of differential equations
- Predicting discharge behaviour of a battery
- Cao et al 2000
24. Ecological Modelling
25. Some GP Implementations
- C/C++ based GP systems
- http://garage.cps.msu.edu/software/software-index.html
- http://beagle.gel.ulaval.ca/index.html
- Java-based
- http://cs.gmu.edu/eclab/projects/ecj/
- PushGP
- http://hampshire.edu/lspector/push.html
- Grammatical Evolution
- http://www.grammatical-evolution.org/src.html
- DCTG-GP
- http://sandcastle.cosc.brocku.ca/bross/research/
- TAG3P
- http://www.cs.adfa.edu.au/z3013620/we/hoai.htm
26. Using GP: Typical Steps
- Choose your favourite GP system
- Define the Chromosome
- Write code to implement the fitness function
- Set values for evolutionary parameters
- Population size
- Stopping criteria
- Minimum and maximum genotype sizes
- Tournament size
- etc
27. Simple Example: 6-Multiplexer
- Boolean Circuit
- Six inputs
- two address lines
- 4 data lines
- One output
28. The 6-Multiplexer Problem
- From the 64 input-output pairs
- Learn an appropriate program
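One possible fitness function for this task, sketched over the tuple trees used earlier; evaluate is a hypothetical interpreter for the Boolean function set, and the sketch assumes a0 is the low-order address bit:

    from itertools import product

    def multiplexer_target(a0, a1, d0, d1, d2, d3):
        """Ground truth: the two address bits select one of the four data bits."""
        return [d0, d1, d2, d3][a0 + 2 * a1]

    def evaluate(tree, env):
        """Interpret a Boolean expression tree over the input bits in env."""
        if isinstance(tree, str):
            return env[tree]
        op, *args = tree
        vals = [evaluate(a, env) for a in args]
        if op == 'and':
            return vals[0] & vals[1]
        if op == 'or':
            return vals[0] | vals[1]
        if op == 'not':
            return 1 - vals[0]
        if op == 'if':                      # (if condition then-branch else-branch)
            return vals[1] if vals[0] else vals[2]
        raise ValueError(op)

    def fitness(tree):
        """Proportion of the 64 input combinations predicted correctly."""
        names = ['a0', 'a1', 'd0', 'd1', 'd2', 'd3']
        cases = list(product([0, 1], repeat=6))
        correct = sum(evaluate(tree, dict(zip(names, bits))) == multiplexer_target(*bits)
                      for bits in cases)
        return correct / len(cases)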
29. 6-Multiplexer in DCTG-GP: Grammar and Semantics
    bool ::= terminal^^T <:>
        (value(Input,V) ::- T^^value(Input,V)).
    bool ::= [and], bool^^B1, bool^^B2 <:>
        (value(Input,V) ::-
            B1^^value(Input,V1), B2^^value(Input,V2),
            ((V1 = 0 ; V2 = 0) -> V = 0 ; V = 1)).
    bool ::= [or], bool^^B1, bool^^B2 <:>
        (value(Input,V) ::-
            B1^^value(Input,V1), B2^^value(Input,V2),
            ((V1 = 1 ; V2 = 1) -> V = 1 ; V = 0)).
    bool ::= [not], bool^^B1 <:>
        (value(Input,V) ::- B1^^value(Input,V1), V is 1 - V1).
    terminal ::= [a0] <:> (value([A0,_,_,_,_,_], A0) ::- true).
    terminal ::= [a1] <:> (value([_,A1,_,_,_,_], A1) ::- true).
    terminal ::= [d0] <:> (value([_,_,D0,_,_,_], D0) ::- true).
    terminal ::= [d1] <:> (value([_,_,_,D1,_,_], D1) ::- true).
    terminal ::= [d2] <:> (value([_,_,_,_,D2,_], D2) ::- true).
    terminal ::= [d3] <:> (value([_,_,_,_,_,D3], D3) ::- true).
30. 6-Multiplexer in DCTG-GP: Fitness and Parameters
- Fitness is the proportion of the 64 instances correctly predicted
- max_depth(8).
- The maximum acceptable depth of the derivation trees
- population_size(150).
- The number of individuals
- generations(500).
- How long to run for
- prob_crossover(0.9).
- Crossover rate
- prob_mutation(0.1).
- Mutation rate
- tournament_size(3).
- Determines the eagerness of the search
- Etc.
31. Representation in GP
- Representation is a key issue in intelligent systems
- Emphasis on
- Sufficiency: the representation can encode the class of problems
- Effectiveness: the representation permits simple search
- Also a key issue in evolutionary systems
- How to design representations which give rise to smoother fitness landscapes?
32. Representation: GP vs GA/ES
- Much of our insight into GP representation comes from studies in Genetic Algorithms and Evolution Strategies
- However, these insights must be tempered by key differences between GP and GA/ES representations
- Feasibility and connectivity
- Neighbourhood complexity
- Genotype complexity
33. Threshold Question
- What should we view as the underlying distance metric in GP genotype/phenotype spaces?
- We need a natural analogue of Manhattan distance in GA
- Fine-grained enough to underlie other distance metrics based on search operators
- Taking into account both
- Content variation (as in GA)
- Structure change
34. Edit Distance
- We follow O'Reilly (1997) in viewing edit distance as a natural underlying metric
- Edit distance is the number of operations required to transform one genotype into another
- Single node insertions
- Single node deletions
- Single node content substitutions
- Many variants, but they generally differ only by O(1)
- But edit distance ignores symmetries of the domain
- AB and BA can be arbitrarily far apart
- Depending on the complexity of A and B
- Can we design a better (perhaps domain-sensitive) metric?
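Proper tree edit distance needs a dedicated algorithm (Zhang-Shasha and relatives); purely as an illustration of the idea, the sketch below uses ordinary sequence edit distance over preorder traversals, a cheap proxy rather than the metric itself:

    def preorder(tree):
        """Flatten a nested-tuple GP tree into a preorder list of node labels."""
        if isinstance(tree, str):
            return [tree]
        out = [tree[0]]
        for child in tree[1:]:
            out.extend(preorder(child))
        return out

    def levenshtein(a, b):
        """Dynamic-programming edit distance (insert, delete, substitute) over two sequences."""
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            curr = [i]
            for j, y in enumerate(b, 1):
                curr.append(min(prev[j] + 1,              # delete x
                                curr[j - 1] + 1,          # insert y
                                prev[j - 1] + (x != y)))  # substitute x by y
            prev = curr
        return prev[-1]

    def rough_tree_distance(t1, t2):
        """Edit distance between preorder traversals: a crude stand-in for tree edit distance."""
        return levenshtein(preorder(t1), preorder(t2))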
35. Feasibility and Connectivity
- In GA, the Manhattan distance metric passes through the feasible region
- That is, for any distance δ < d(A,C), there is a valid genotype B with d(A,C) = d(A,B) + d(B,C) and d(A,B) = δ
- In most GP representations, this is not the case
- Deletion and insertion from a GP tree will usually result in an invalid tree
- At best, the wrong number of arguments for functions
- At worst, no longer a tree
- Feasible paths may be unboundedly longer than the direct (infeasible) path
36. Neighbourhood Complexity
- For an individual A, the δ-neighbourhood is defined as
- Nδ(A) = { X : d(A,X) < δ }
- Neighbourhood size |Nδ(A)|
- In GA/ES, |Nδ(A)| is generally independent of A
- In GP, |Nδ(A)| varies over the search space
- If the search space is unbounded, the neighbourhood size is generally monotonic in the size of A
- If we impose size or depth bounds, it may be non-monotonic
- Neighbourhood connectivity
- In GA/ES, neighbourhoods are graph-connected
- In many GP representations, neighbourhoods are not connected
37. Genotype Complexity
- Virtually all GP representations offer
- (in principle) unbounded size individuals
- (in practice) individuals of varying complexity
- GA / ES representation studies generally assume
fixed complexity
38. Problem-Specific Representation
- In most areas of evolutionary computation, there is a strong emphasis on tailoring problem representations
- There are many feasible representations for a problem
- The representation is chosen to optimise performance
- Suitable (problem-specific) operators
- Smooth fitness landscape
- Redundancy and neutral paths
39. Problem-Specific Representation: GP
- Most GP representations are generic
- One basic representation fits all problems
- Permit only the tailoring necessary to encode the problem
- Suitable function set
40. Problem-Specific Representation: Grammars
- Grammar-based GP permits further tailoring
- The same problem search space may be encoded in multiple ways
- Usually used to bias search toward particular subspaces
- Little emphasis on transforming the fitness landscape or tailoring operators
- It is unclear whether grammar tailoring can usefully transform the search space
- Standard context-free grammar GP introduces stronger feasibility constraints than standard GP
- The search spaces may be even more disconnected
41. Problem-Specific Representation: Summary
- Problem-specific encoding in current GP systems is very weak
- There is a need for more flexible representations permitting tailoring of the search space and fitness landscape
- For the moment, the focus is on the properties of generic representations
- There is still enormous potential for logic-based representations
42. Structural Difficulty and Connectivity
- Daida has demonstrated that standard tree-based GP cannot search some regions effectively
- GP cannot effectively search for very full or very narrow trees
- Not (as Daida argues) a consequence of the tree representation
- With variable-arity trees (TAG3P) we are able to solve even with hill-climbing search
- Rather, a consequence of poor neighbourhood connectivity
43. Genotype-Phenotype Mappings
- Most GP representations do not have an explicit genotype-phenotype mapping
- The genotype is the phenotype
- Grammar-guided systems usually do have an explicit genotype-phenotype mapping
- Desirable properties
- Redundancy
- Connectedness
- Extension
- Continuity
44. Redundant Genotype-Phenotype Mappings
- Mappings in which many genotypes map to one phenotype
- The pre-image of a point forms a neutral set
- Extensively studied (in GA) by Shipman et al.
- Identified two desirable characteristics
- Connectedness
- The pre-image of a point is connected
- Hence forms a neutral path
- Extension
- The pre-images of points are intertwined
- Permitting movement between neutral paths
- Most GP Genotype-Phenotype maps appear to be
neither connected nor extensive
45. Continuity in Genotype-Phenotype Mappings
- Continuity
- Neighbourhoods map to neighbourhoods
- I.e. small genotype changes result in small phenotype changes
- Sometimes known as strong causality
- Often not a property of GP mappings
46. Operators
- Feasibility constraints make it difficult to design fine-grained operators
- With tree-based GP, the minimum step size is determined by the height of the node
- Operators applied high in the tree cause large steps
- A key reason why standard GP converges top-down
- Feasibility constraints make it difficult to separate structure modification and content modification
- Search must optimise both at once
47. Context-Free Grammars
- Grammar represents structure of solution space
- S → B
- B → B or B
- B → B and B
- B → not B
- B → if B B B
- B → a0 | a1
- B → d0 | d1 | d2 | d3
48. Grammar-Guided Genetic Programming (GGGP)
- Problem space represented by a context-free grammar G
- Individuals are derivation trees in G
- Crossover uses sub-tree crossover
- But the root nodes must have the same label
- Mutation uses sub-tree mutation
- But the generated sub-tree must be consistent with the grammar
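A sketch of grammar-respecting crossover, assuming derivation trees stored as (nonterminal label, list of children) with terminal strings at the leaves; the representation is illustrative rather than that of any particular GGGP system:

    import random

    def nonterminal_nodes(tree, path=()):
        """Yield (path, label) for every nonterminal node of a derivation tree.
        A nonterminal node is (label, children); terminals are plain strings."""
        if isinstance(tree, tuple):
            label, children = tree
            yield path, label
            for i, child in enumerate(children):
                yield from nonterminal_nodes(child, path + (i,))

    def get_subtree(tree, path):
        for i in path:
            tree = tree[1][i]
        return tree

    def put_subtree(tree, path, sub):
        if not path:
            return sub
        label, children = tree
        i = path[0]
        return (label, children[:i] + [put_subtree(children[i], path[1:], sub)] + children[i + 1:])

    def gggp_crossover(p1, p2):
        """Swap subtrees only where both chosen nodes carry the same nonterminal label,
        so both offspring remain valid derivation trees of the grammar."""
        nodes1 = list(nonterminal_nodes(p1))
        nodes2 = list(nonterminal_nodes(p2))
        random.shuffle(nodes1)
        for path1, label1 in nodes1:
            matches = [path2 for path2, label2 in nodes2 if label2 == label1]
            if matches:
                path2 = random.choice(matches)
                return (put_subtree(p1, path1, get_subtree(p2, path2)),
                        put_subtree(p2, path2, get_subtree(p1, path1)))
        return p1, p2   # no compatible node pair; return the parents unchanged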
49. GGGP Representation Properties
- Problem-specific representation
- The same language can be represented in multiple ways
- S → B
- B → B Op B
- B → not B
- B → if B B B
- B → a0 | a1
- B → d0 | d1 | d2 | d3
- Op → and | or
- It is unclear whether this can usefully improve the fitness landscape
- However, it is clear that a poor choice of grammar can lead to serious search problems
- E.g. problems with initialisation
50. GGGP Properties (2)
- Structural Difficulty
- Likely to be worse than tree-based GP
- Neighbourhood connectivity very poor
- Because of the constraints imposed by grammar
- Genotype-Phenotype mapping
- Redundant
- Dis-connected
- Extensive
- Continuous
51. GGGP Properties (3)
- Operators
- Structure modification and content modification may be partially separated by choice of grammar
- Content as lexicon
- The second grammar above
- Step-size distribution similar to standard GP
- Extremely difficult to define new grammar-consistent operators
52. The Grammatical Evolution Transformation
- Inorder traversal of numbered productions
- 1: B → a0
- 2: B → a1
- 3: B → d0
- 4: B → d1
- 5: B → d2
- 6: B → d3
- 7: B → B or B
- 8: B → B and B
- 9: B → not B
- 10: B → if B B B
- Example genotype: 1 4 5 4 6 10 9 4 6 8 7
53. Grammatical Evolution
- Problem space represented by linear strings of integers
- Can apply normal GA-style operators
- Genotype-phenotype mapping uses the GE transformation
- Using modular arithmetic to guarantee feasibility
54. GE Representation Properties
- Problem specific representation
- Essentially the same issues as GGGP
- Genotype-Phenotype mapping
- Redundant
- Dis-connected
- Non-extensive
- Highly dis-continuous
- Genotype-Phenotype mapping is context-dependent
- The same genotypic component generates different
phenotypic components depending on context
55. GE Properties (2)
- Structural difficulty
- Unlikely to be better than GGGP?
- Neighbourhood connectivity very poor
- Because of the constraints imposed by the grammar
- Operators
- Difficult to separate structure and content modification
- Because of the discontinuity of the genotype-phenotype mapping
- Genotypic step size readily controllable
- But corresponds to a highly discontinuous phenotypic step size
- Simple to define new grammar-consistent operators
56. Outline
- Introduction
- Problems and Issues
- Representation
- GP vs GA/ES
- Issues in GP Representation
- Examples
- GGGP Representation
- GE Representation
- TAG Representation
- Tree Adjunct Grammars (TAGs)
- TAG for GP
- TAG3P Representation Properties
- Search Algorithms
- Structural Components and Incremental Learning
- Population Structure
- Biases in Learning
- Computational Cost
57. Tree Adjunct Grammars: Motivation
- A confession
- Originally conceived to address perceived problems with the many-one nature of the GE genotype-phenotype map
- Actually, it doesn't solve those problems
- Fortunately, it does have many other useful representational properties
58. Tree Adjunct Grammars (TAGs)
- Arise from more modern efforts to represent natural language
- Joshi et al. 1975
- Two types of elementary trees
- α trees
- Represent complete syntactic units
- β trees
- Represent insertable elements
- Must have an identical non-terminal at the root and at the frontier
- The foot node
59. TAG Elementary Trees
- α tree examples
- β tree example
60. TAG Operations
61. TAG to CFG Mapping
62. TAG Genetic Programming (TAG3P)
- Basic Form
- Problem space represented by a TAG Grammar G
- Individuals are derivation trees in G
- Crossover uses sub-tree crossover
- But the root nodes must have the same label
- Mutation uses sub-tree mutation
- But the generated sub-tree must be consistent
with the grammar
63. TAG3P vs GGGP
- Both tree-based representations
- What have we gained?
- GGGP trees have fixed arity
- Each production determines a fixed number of children
- TAG trees have flexible arity
- Any sub-tree may be deleted without affecting tree validity
- This non-fixed arity buys us flexibility
64. TAG3P Representation Properties
- Problem-specific representation
- The same language can be represented in multiple ways
- Similar issues to GGGP
- Tree adjunct languages are a superset of the context-free languages
65. TAG3P Properties (2)
- Operators
- Structure modification and content modification separated through substitution and adjunction
- Content as lexicon
- Structure as TAG structure
- Easy to define new operators
- Point insertion
- Point deletion
- Transposition
- Relocation
- Duplication
- Step size may be minimal (point
insertion/deletion)
66. TAG3P Properties (3)
- Structural difficulty
- Dramatically less than tree-based GP
- Especially using point insertion and deletion operators
- Neighbourhood connectivity good
- Because of the non-fixed-arity property
- Genotype-Phenotype mapping
- Redundant
- Dis-connected
- Extensive
- Continuous
67. Some GP Representations
68. Summary: Representation Issues
- Connectedness of Neighbourhoods
- Genotype-Phenotype Mappings
- Connectedness
- Extension
- Continuity
- Operators
- Grain
- Separation of Structure and Content Change
- Problem-specific Representation
69. The Issue
- GP research has demonstrated that population-based stochastic search is effective in GP problem spaces
- But evolutionary search isn't the only population-based stochastic search algorithm
- Could other search methods be effective?
70. Estimation of Distribution Algorithms
- EDAs have been extremely successful in fixed-complexity search spaces
- Could they extend to GP representations?
- EDA for GP must explicitly distinguish between
- Structure learning
- Content learning
- In EDA terms, explicitly distinguish
- Probability model
- Probabilities
- Two primary strands
- Prototype tree
- Grammar-based
71. EDA Algorithm
- Initialise the EDA probability model
- REPEAT
- Generate a population from the probability model
- Evaluate population fitness
- Optionally, generate a new probability model
- Update the probabilities in the model based on population fitness
- UNTIL stopping criteria are satisfied
- (GP algorithms to date use truncation selection, and update the probability tables to increase the probability of generating the truncated population)
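The loop is generic enough to sketch directly; model, sample, fitness and update are assumed, user-supplied components (for instance a prototype tree or a stochastic grammar, its sampler, and its probability-update rule):

    def eda(model, sample, fitness, update, pop_size=200, truncation=0.1, generations=50):
        """Generic EDA loop: sample a population from the probability model, keep the
        best fraction (truncation selection), and shift the model toward that set."""
        for _ in range(generations):
            population = [sample(model) for _ in range(pop_size)]
            population.sort(key=fitness, reverse=True)
            elite = population[:max(1, int(truncation * pop_size))]
            model = update(model, elite)    # optionally a new model structure, then new probabilities
        return model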
72. Exploration vs Exploitation in EDA
- If the algorithm simply learns the probability distribution from the current population, it will be overly exploitative
- Premature convergence
- EDAs avoid this by using a uniform prior probability
- Either explicitly, with a discount rate
- Or implicitly, by generating additional individuals at random
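One way the explicit (discount-rate) variant might look, sketched for a single probability table; the parameter names are illustrative:

    def discounted_update(old_probs, elite_counts, learn_rate=0.3, prior_weight=0.05):
        """Move the table toward the elite sample's empirical distribution, then mix in
        a uniform prior so no choice's probability can collapse to zero."""
        n = len(old_probs)
        total = sum(elite_counts)
        empirical = [c / total for c in elite_counts] if total else [1.0 / n] * n
        blended = [(1 - learn_rate) * o + learn_rate * e for o, e in zip(old_probs, empirical)]
        return [(1 - prior_weight) * b + prior_weight / n for b in blended]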
73. Prototype Tree EDAs
- The underlying model is a full tree of maximum arity
- Each node holds a probability table for the content of the node
- The original version (PIPE, Salustowicz & Schmidhuber 1997) has the node probabilities independent
- More recent versions learn dependent probabilities
- The prototype tree gives position-dependent probabilities
- It cannot learn position-independent building blocks
74. PIPE Prototype Tree
75. Grammar-Based EDAs
- The underlying model is a stochastic context-free grammar
- B → B or B (0.6)
- B → B and B (0.3)
- B → not B (0.1)
- Permits position-independent building blocks
- B → C or D (0.6)
- C → C and C (0.8)
- D → not D (0.8)
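A sketch of sampling from such a stochastic CFG; since the fragment above lists no terminal productions, the sketch assumes a terminal set and falls back to it when a depth budget runs out:

    import random

    SCFG = {
        'B': [(['B', 'or', 'B'], 0.6),
              (['B', 'and', 'B'], 0.3),
              (['not', 'B'], 0.1)],
    }
    TERMINALS = ['a0', 'a1', 'd0', 'd1', 'd2', 'd3']   # assumed; not part of the slide's fragment

    def sample(symbol='B', depth=6):
        """Recursive-descent sampling: pick a production with its grammar probability."""
        if symbol not in SCFG:
            return [symbol]                     # terminal symbol: emit it
        if depth == 0:
            return [random.choice(TERMINALS)]   # depth budget spent: force a terminal
        rhss = [rhs for rhs, _ in SCFG[symbol]]
        probs = [p for _, p in SCFG[symbol]]
        rhs = random.choices(rhss, weights=probs)[0]
        out = []
        for sym in rhs:
            out.extend(sample(sym, depth - 1))
        return out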
76. Learning Building Blocks
- Learning building blocks in a grammar-based EDA is (relatively) easy if the grammar already records the building block
- Just a matter of learning the probabilities
- Learning new building blocks requires learning new, more specific, grammar models
- B → C or D
- C → C and C
- D → not D
- From
- B → B or B
- B → B and B
- B → not B
- Grammar learning is extraordinarily computationally intensive
- Current grammar learning methods have been developed for other tasks and are not optimal for this purpose
77. Learning New Grammars
- Grammar learning can be
- Top-down or bottom-up (or inside-out)
- Specific-to-general or general-to-specific (or both)
- We have experimented with
- Inside-out, general-to-specific
- PEEL system, 2003
- Bottom-up, specific-to-general
- GMPE system, 2004
- Clearly there are many other possibilities
- But they may require identifying large repeated sub-trees
78. Ant-Based Search
- Closely related to EDA search
- Also uses probability tables to generate individuals
- Primary differences
- Learning is from individual ants rather than populations
- Somewhat akin to the distinction between generational and steady-state evolutionary algorithms
- Probability update functions are pragmatic
- Rather than statistically based
79. EDA and Ant Search Issues
- Can non-grammar representations learn position-independent building blocks?
- If the probability model changes, what uniform prior should be used?
- The uniform prior on the original model?
- The uniform prior on the new model?
- Or some combination?
- How should a change of model be triggered?
- How can grammar learning methods, developed for noise-free one-shot learning, be adapted for multi-shot learning from noisy data?
80. Summary
- Genetic Programming
- Introduction
- Applications
- Example
- Representation
- EDAs in GP