Title: Genetic Programming
1 Genetic Programming
- Using Simulated Natural Selection to Automatically Write Programs
2 Genetic Programming
- John Koza, Stanford University
- Principal proponent of GP
- Has obtained human-competitive results in a number of problem domains
- Reproduced existing patents
- Created new patentable designs
- Has written extensively on GP
- Four-volume set on Genetic Programming
- Numerous papers on GP
3 Genetic Programming
- Basic Algorithm
- Create a population of programs
- Each program attempts to solve a set of problems in a training set.
- Program fitness is determined by success in solving the training set
- More fit members have a better chance to produce offspring in the next generation
- Offspring are produced using some form of crossover
4 Tree Structure of Genetic Programs
- Various structures are used to represent genetic programs, but tree structures are the best known.
- Nonterminal nodes are functions that take their children as parameters.
[Figure: example program tree; the recoverable node labels are -, 2, and 1.]
5 Tree Structure
- Terminal nodes, the nodes that make up the leaves of a program tree, provide data to the program.
- Constants
- Parameterless functions
- Inputs
6 Genetic Program Components
- Terminal Set
- Works as the set of primitive data types
- Constants
- Parameterless functions
- Input Values
- Function set
- Set of available functions
- Often tailored specifically for the needs of the
program domain.
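To make the two components concrete, here is a minimal Python sketch of a program tree stored as nested tuples and evaluated recursively. The names `FUNCTIONS` and `evaluate`, the arithmetic primitives, and the input name `x` are illustrative assumptions, not taken from the slides; parameterless functions are omitted for brevity.

```python
import operator

# Hypothetical function set: name -> callable that takes its children as parameters.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "abs": abs}

def evaluate(tree, inputs):
    """Evaluate a program tree against a dict of input values."""
    if isinstance(tree, tuple):      # nonterminal: (function name, child, child, ...)
        func = FUNCTIONS[tree[0]]
        return func(*(evaluate(child, inputs) for child in tree[1:]))
    if isinstance(tree, str):        # terminal: named input
        return inputs[tree]
    return tree                      # terminal: constant

# (x * 2) - abs(-1)
program = ("-", ("*", "x", 2), ("abs", -1))
print(evaluate(program, {"x": 5}))
```

Tuples are used here only because they are immutable and easy to print; a node class would work just as well.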
7 Initializing the Population
- The following two parameters are specified
- Maximum depth of a program tree
- Maximum number of nodes in a program tree
- Three methods in common use (Koza)
- Full
- Nonterminals are used to build a complete tree down to the leaf level, which is then filled entirely with terminals. Every branch is grown to the maximum depth, so each tree has as many nodes as that depth allows.
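A minimal sketch of the full method, assuming a hypothetical function set of binary operators and counting the root as depth 0. `FUNCTIONS`, `TERMINALS`, `full_tree`, `depth`, and `size` are names invented for this illustration.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # hypothetical function set: name -> arity
TERMINALS = ["x", 1, 2]                # hypothetical terminal set

def full_tree(max_depth, rng=random):
    """Build a tree in which every branch reaches exactly max_depth."""
    if max_depth == 0:
        return rng.choice(TERMINALS)               # leaf level: terminals only
    name = rng.choice(list(FUNCTIONS))             # above the leaves: functions only
    children = tuple(full_tree(max_depth - 1, rng) for _ in range(FUNCTIONS[name]))
    return (name,) + children

def depth(tree):
    """Depth of a tree, with a lone terminal at depth 0."""
    return 0 if not isinstance(tree, tuple) else 1 + max(depth(c) for c in tree[1:])

def size(tree):
    """Total number of nodes in a tree."""
    return 1 if not isinstance(tree, tuple) else 1 + sum(size(c) for c in tree[1:])

tree = full_tree(3, random.Random(0))
print(depth(tree), size(tree))
```

With only binary functions, a full tree of depth 3 always has 2^4 - 1 = 15 nodes, whichever functions and terminals are drawn.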
8 Initializing the Population (continued)
- Three methods in common use (Koza)
- Grow
- The root node is chosen from the function set
- All other nodes not at maximum depth are chosen randomly from the functions and terminals.
- Growth for a branch ends when a terminal is chosen.
- Trees can have irregular shapes.
- Nodes at the maximum depth are chosen from the terminal set only.
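The grow rules above can be sketched as follows. The function and terminal sets are hypothetical, the root counts as depth 0, and interior nodes are drawn uniformly from the combined sets, which is one simple choice rather than the only one.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # hypothetical function set: name -> arity
TERMINALS = ["x", 1, 2]                # hypothetical terminal set

def grow_tree(max_depth, rng=random, is_root=True):
    """Grow an irregular tree whose branches stop at a terminal or at max_depth."""
    if max_depth == 0:
        return rng.choice(TERMINALS)               # maximum depth: terminals only
    pool = list(FUNCTIONS) if is_root else list(FUNCTIONS) + TERMINALS
    node = rng.choice(pool)
    if node not in FUNCTIONS:
        return node                                # a terminal ends this branch
    children = tuple(grow_tree(max_depth - 1, rng, False) for _ in range(FUNCTIONS[node]))
    return (node,) + children

def depth(tree):
    """Depth of a tree, with a lone terminal at depth 0."""
    return 0 if not isinstance(tree, tuple) else 1 + max(depth(c) for c in tree[1:])

for seed in range(3):
    print(grow_tree(4, random.Random(seed)))
```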
9 Initializing the Population (continued)
- Three methods in common use (Koza)
- Ramped Half and Half
- M is the max depth of the deepest partition in the population
- The population is separated into M partitions
- The ith partition (i ranges from 0 to M-1) has a max depth of M - i.
- Half of each partition is populated with grow, the other half is populated with full.
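A minimal sketch of ramped half and half under the partitioning described above. It assumes the population size divides evenly by M, alternates full and grow within each partition, and uses a hypothetical function and terminal set; all names are invented for this illustration.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # hypothetical function set: name -> arity
TERMINALS = ["x", 1, 2]                # hypothetical terminal set

def make_tree(max_depth, full, rng, is_root=True):
    """full=True applies the full method; full=False applies grow."""
    if max_depth == 0:
        return rng.choice(TERMINALS)
    functions_only = full or is_root
    pool = list(FUNCTIONS) if functions_only else list(FUNCTIONS) + TERMINALS
    node = rng.choice(pool)
    if node not in FUNCTIONS:
        return node
    children = tuple(make_tree(max_depth - 1, full, rng, False) for _ in range(FUNCTIONS[node]))
    return (node,) + children

def depth(tree):
    return 0 if not isinstance(tree, tuple) else 1 + max(depth(c) for c in tree[1:])

def ramped_half_and_half(pop_size, M, rng=random):
    """Partition i (0 <= i < M) gets max depth M - i, half full and half grow."""
    population = []
    per_partition = pop_size // M          # assumes pop_size is a multiple of M
    for i in range(M):
        for k in range(per_partition):
            population.append(make_tree(M - i, full=(k % 2 == 0), rng=rng))
    return population

population = ramped_half_and_half(40, 4, random.Random(1))
print(sorted(set(depth(t) for t in population)))
```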
10 Genetic Operators: Crossover
- Crossover
- Randomly select a node in the mother
- Randomly select a node in the father
- Swap the two nodes along with their subtrees
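The three steps above can be sketched on tuple-based trees as follows. The helper names (`paths`, `get`, `replace`, `crossover`) and the two parent trees are made up for this illustration, and every node, including the root, is assumed to be equally likely to be chosen.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(mother, father, rng=random):
    """Swap one randomly chosen subtree of each parent."""
    pm = rng.choice(list(paths(mother)))
    pf = rng.choice(list(paths(father)))
    child1 = replace(mother, pm, get(father, pf))
    child2 = replace(father, pf, get(mother, pm))
    return child1, child2

mother = ("-", ("/", 13, 4), ("abs", -7))
father = ("power", 2, ("-", 2, 1))
print(crossover(mother, father, random.Random(0)))
```

Because the trees are immutable tuples, the parents are left unchanged and the two children together always contain exactly as many nodes as the two parents.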
11 Crossover Example
[Figure: crossover example. Parent 1 and Parent 2 swap randomly selected subtrees, producing Child 1 and Child 2; node labels in the figure include /, power, -, abs, 13, 4, 2, 1, and -7.]
12 Genetic Operations: Mutation
- Mutation
- Randomly select a node in the program tree
- Remove that node and its subtree
- Replace the node with a new subtree, using the same method used to initially instantiate the population.
- Typically, mutation is applied to a small number of offspring after crossover.
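A minimal sketch of subtree mutation on tuple-based trees. It assumes the replacement subtree is produced by a grow-style generator over a hypothetical function and terminal set; the slides only require that it match whatever method initialized the population.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # hypothetical function set: name -> arity
TERMINALS = ["x", 1, 2]                # hypothetical terminal set

def random_subtree(max_depth, rng):
    """Grow-style generator, assumed here to match the initialization method."""
    if max_depth == 0:
        return rng.choice(TERMINALS)
    node = rng.choice(list(FUNCTIONS) + TERMINALS)
    if node not in FUNCTIONS:
        return node
    return (node,) + tuple(random_subtree(max_depth - 1, rng) for _ in range(FUNCTIONS[node]))

def paths(tree, prefix=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def mutate(tree, rng=random, max_depth=2):
    """Pick a random node and replace it, with its subtree, by a fresh subtree."""
    path = rng.choice(list(paths(tree)))
    return replace(tree, path, random_subtree(max_depth, rng))

original = ("+", ("*", "x", 2), 1)
print(mutate(original, random.Random(3)))
```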
13 Mutation Example
[Figure: mutation example. The left subtree is randomly selected for mutation, and the entire subtree is replaced.]
14 Fitness-based Selection
- Gives graded and continuous feedback about how well a program performs on the training set (Banzhaf et al.)
- Standardized Fitness
- Fitness scores are transformed so that 0 is the fitness of the most fit member.
- Normalized Fitness
- Fitness is transformed to values that are always between 0 and 1.
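One way to realize the two transformations is sketched below. It assumes raw scores where higher is better, and it normalizes via Koza's adjusted fitness 1 / (1 + s) divided by the population total; other mappings into the 0 to 1 range would satisfy the slide equally well.

```python
def standardize(raw_scores):
    """Shift raw scores (assumed: higher is better) so the most fit member gets 0."""
    best = max(raw_scores)
    return [best - r for r in raw_scores]

def normalize(standardized):
    """Map standardized scores into (0, 1); larger now means more fit."""
    adjusted = [1.0 / (1.0 + s) for s in standardized]
    total = sum(adjusted)
    return [a / total for a in adjusted]

raw = [3, 10, 7]
standardized = standardize(raw)
print(standardized)
print(normalize(standardized))
```

The normalized values sum to 1, so they can also be used directly as fitness-proportional selection probabilities.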
15 Different Selection Algorithms
- GA Scenario
- Same as that used in Genetic Algorithms
- Create gene pool by selecting parents based on fitness
- Next generation completely replaces current generation
- ES Scenario
- Same as used in Evolution Strategies
- Generate children first
- Apply fitness function to parents and children
- Select the next generation from children (and possibly parents too)
- Selection pressure can be tuned by adjusting the ratio of the number of offspring to the number of parents.
16 Selection Pressure
- Ratio of the best individual's selection probability to the average selection probability
- MostFitSelectionProbability / AverageFitSelectionProbability
- The larger this ratio, the greater the selection pressure.
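The ratio is small enough to compute directly. In this sketch the function name `selection_pressure` is an assumption, and the most fit individual is assumed to be the one with the highest selection probability.

```python
def selection_pressure(probabilities):
    """Best individual's selection probability divided by the average probability."""
    average = sum(probabilities) / len(probabilities)
    return max(probabilities) / average

print(selection_pressure([0.25, 0.25, 0.25, 0.25]))  # uniform selection
print(selection_pressure([0.5, 0.25, 0.25]))         # best is favored
```

Uniform selection gives a ratio of 1, meaning no pressure at all; the ratio rises as selection concentrates on the best individual.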
17 Sample Fitness Measures
- Error Fitness
- The sum of the absolute values of the differences between the computed result and the desired result.
$$f_p = \sum_{i=1}^{n} \lvert p_i - o_i \rvert$$
where $f_p$ is the fitness of the pth individual in the population, $o_i$ is the desired output for the ith example in the training set, $p_i$ is the output from the pth individual on the ith example in the training set, and $n$ (a symbol introduced here) is the number of examples in the training set.
Squaring the expression $(p_i - o_i)$ penalizes large errors more heavily.
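A minimal sketch of the error fitness measure with the squared variant as an option. The function name and the `squared` flag are assumptions made for this illustration; lower values mean a more fit individual.

```python
def error_fitness(outputs, targets, squared=False):
    """Sum of absolute (or squared) differences between outputs and desired values."""
    if squared:
        return sum((p - o) ** 2 for p, o in zip(outputs, targets))
    return sum(abs(p - o) for p, o in zip(outputs, targets))

outputs = [1, 2, 5]   # what one individual produced on three training examples
targets = [1, 4, 2]   # the desired outputs for the same examples
print(error_fitness(outputs, targets))
print(error_fitness(outputs, targets, squared=True))
```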
18 Fitness Measures can be as Varied as the Applications
- Examples
- Number of correct solutions
- Number of wins competing against other members of the population.
- Number of errors navigating a maze
- Time required to solve a puzzle
19 Truncation or (µ, λ) Selection
- A number of parents (µ) are allowed to breed and produce λ children. The µ best children are used to produce the next generation.
- A variation, (µ + λ) selection, includes the parents in those considered for selection into the next generation.
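Both variants can be sketched with one function. Real-valued individuals stand in for programs here, and `fitness`, `make_child`, and the `plus` flag are hypothetical names; lower fitness is assumed to be better, and λ must be at least µ in the comma variant.

```python
import random

def next_generation(parents, fitness, make_child, lam, plus=False, rng=random):
    """Keep the mu best of lam children; with plus=True the parents compete too."""
    mu = len(parents)
    children = [make_child(rng.choice(parents), rng) for _ in range(lam)]
    pool = children + list(parents) if plus else children
    return sorted(pool, key=fitness)[:mu]       # best (lowest fitness) first

def fitness(x):
    """Stand-in fitness: distance from a target value of 7."""
    return abs(x - 7.0)

def make_child(parent, rng):
    """Stand-in genetic operator: Gaussian perturbation of the parent."""
    return parent + rng.gauss(0.0, 1.0)

parents = [0.0, 10.0, 20.0]
print(next_generation(parents, fitness, make_child, lam=12, rng=random.Random(0)))
print(next_generation(parents, fitness, make_child, lam=12, plus=True, rng=random.Random(0)))
```

With `plus=True` the best individual can never get worse from one generation to the next, because the best parent stays in the pool.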
20 Ranking Selection
- Selection Based on Fitness Order
- The members of the population are ranked from best to worst.
- The selection probability is assigned based on the rank.
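A minimal sketch of rank-based probabilities, assuming a linear scheme in which the best of N members gets weight N and the worst gets weight 1. The slides do not fix the mapping from rank to probability, so this is one choice among several; lower fitness is assumed to be better.

```python
def rank_probabilities(fitnesses):
    """Return selection probabilities, in input order, based only on rank."""
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])   # indices, best first
    weights = [0.0] * n
    for rank, index in enumerate(order):
        weights[index] = n - rank                          # best gets n, worst gets 1
    total = n * (n + 1) / 2
    return [w / total for w in weights]

print(rank_probabilities([5.0, 0.1, 100.0]))
```

Only the ordering matters: the very poor score of 100.0 is treated the same as any other last place, which keeps a single outlier from distorting selection.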
21 Tournament Selection
- Select a subset of the population (the tournament size) randomly.
- More fit (winning) individuals are used to generate replacements for less fit (losing) individuals.
- Reduces processing time (compared with full competition)
- Facilitates parallel processing
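A single tournament can be sketched as below. The function returns population indices so that a caller can overwrite the loser in place; the name `tournament` and the lower-is-better fitness convention are assumptions.

```python
import random

def tournament(population, fitness, size, rng=random):
    """Draw `size` distinct competitors; return (winner index, loser index)."""
    competitors = rng.sample(range(len(population)), size)
    competitors.sort(key=lambda i: fitness(population[i]))   # best first
    return competitors[0], competitors[-1]

population = [9, 3, 7, 1, 8]          # stand-in individuals scored by their own value
winner, loser = tournament(population, lambda x: x, 3, random.Random(0))
print(population[winner], population[loser])
```

Only the competitors drawn are compared, which is where the saving over a full competition comes from; separate tournaments are independent and can run in parallel.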
22 The Basic GP Algorithm (from Banzhaf et al.)
- Define the terminal set
- Define the function set
- Define the fitness function
- Define parameters such as population size,
maximum individual size, crossover probability,
selection method, and termination criterion
23 Generational GP
- Similar to what we have seen in GAs
- New generation completely replaces the previous generation.
- Initialize the population
- Evaluate the individual programs
- Until a new population is fully populated, repeat
- Select an individual or individuals in the population using the selection algorithm
- Perform genetic operations on the selected individual or individuals
- Insert the result of the genetic operations into the new population
- The best individual is the resulting program.
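The generational scheme above can be sketched as a loop that is independent of the program representation. To keep the example short, real numbers stand in for program trees; `fitness`, `select`, and `vary` are hypothetical stand-ins for the fitness function, the selection algorithm, and the genetic operators, and a fixed generation count is assumed as the termination criterion.

```python
import random

def generational_loop(population, fitness, select, vary, generations, rng):
    """Each new population is built from scratch and fully replaces the old one."""
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]      # evaluate individuals
        new_population = []
        while len(new_population) < len(population):       # until fully populated
            a = select(population, scores, rng)
            b = select(population, scores, rng)
            new_population.append(vary(a, b, rng))         # insert the result
        population = new_population                        # complete replacement
    return min(population, key=fitness)                    # best individual

def fitness(x):
    """Stand-in fitness: distance from a target of 3 (lower is better)."""
    return abs(x - 3.0)

def select(population, scores, rng):
    """Stand-in selection: binary tournament on precomputed scores."""
    i, j = rng.sample(range(len(population)), 2)
    return population[i] if scores[i] <= scores[j] else population[j]

def vary(a, b, rng):
    """Stand-in operators: averaging 'crossover' plus a small bounded 'mutation'."""
    return (a + b) / 2.0 + rng.uniform(-0.1, 0.1)

rng = random.Random(42)
start = [rng.uniform(-10, 10) for _ in range(30)]
print(generational_loop(start, fitness, select, vary, 20, rng))
```

Swapping in tree-based individuals only requires replacing the three stand-in functions; the loop itself stays the same.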
24 Steady State GP
- There are no generations
1. Initialize the population
2. Randomly choose a subset of the population to take part in the tournament
3. Evaluate the fitness value of each competitor in the tournament.
4. Select the winner or winners from the competitors in the tournament using the selection algorithm.
5. Apply genetic operators to the winner or winners of the tournament
25 Steady State GP (continued)
6. Replace the losers in the tournament with the results of the application of the genetic operators to the winners of the tournament.
7. Repeat steps 2-6 until the termination criterion is met.
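The steady-state steps can be sketched as a loop that edits one population in place. Real numbers again stand in for program trees, `fitness` and `vary` are hypothetical stand-ins, the better half of each tournament is treated as the winners, and a fixed step count is assumed as the termination criterion.

```python
import random

def steady_state(population, fitness, vary, tournament_size, steps, rng):
    """Repeated tournaments; offspring of the winners overwrite the losers."""
    population = list(population)                 # initialized population (copied)
    for _ in range(steps):
        idx = rng.sample(range(len(population)), tournament_size)   # choose competitors
        idx.sort(key=lambda i: fitness(population[i]))              # evaluate, best first
        half = tournament_size // 2
        winners, losers = idx[:half], idx[-half:]                   # select winners
        for w, l in zip(winners, losers):
            mate = population[rng.choice(winners)]
            population[l] = vary(population[w], mate, rng)          # replace a loser
    return min(population, key=fitness)

def fitness(x):
    """Stand-in fitness: distance from a target of 3 (lower is better)."""
    return abs(x - 3.0)

def vary(a, b, rng):
    """Stand-in operators: averaging 'crossover' plus Gaussian 'mutation'."""
    return (a + b) / 2.0 + rng.gauss(0.0, 0.1)

rng = random.Random(7)
start = [rng.uniform(-10, 10) for _ in range(30)]
print(steady_state(start, fitness, vary, 4, 200, rng))
```

Because the best competitor in a tournament is always a winner, the best individual in the population is never overwritten, so the best fitness cannot get worse over a run.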
26 Introns
- Code sections (functions) that provide no real value for the problem at hand
- Introns do not directly affect the fitness of the individual.
- e.g., j = j + 0 or j = j * 1
- Early and middle sections of GP runs might include 40-60% introns.
- Later in the run, introns begin to dominate the code.
- Intron growth is exponential!
27 Why GP Introns Emerge
- Children tend to be less fit than parents
- Crossover and mutation can be extremely destructive
- Introns reduce the destructive effects of genetic operators
- Parents generate introns when it is easier to protect what they already can do, through the creation of introns, than to improve on what they are currently doing.
28 Effective Fitness
- Function of at least two factors
- The fitness of the parent
- Likelihood that genetic operators will affect the fitness of the parent's children
29 Effects of Introns
- Introns may have differing effects before and after exponential growth of introns begins
- Different systems may generate different types of introns with different probabilities.
- The extent to which genetic operators are destructive in their effect is likely to be a very important initial condition in intron growth.
- Mutation and crossover may affect different types of introns differently.
30 Problems Caused by Introns
- Run stagnation (no progress)
- Poor results (do-nothing code)
- Drain on memory and CPU time (storing and executing unnecessary code)
31 Possible Beneficial Effects of Introns
- Introns might serve to isolate useful code blocks
- This facilitates the building block model by
protecting useful building blocks from disruption
32 Methods of Handling Introns
- Reduce the destructiveness of genetic operators
- Reducing destructive crossover to 0 results in hill climbing
- Attach a fitness penalty to the length of the program.
- Change the fitness function
- Provides the GP with a way to improve that is better than just insulating the current best solution.
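The length-penalty idea from the list above can be sketched as follows. The penalty weight `alpha`, the node-count measure of length, and the lower-is-better error convention are all assumptions made for this illustration.

```python
def size(tree):
    """Number of nodes in a tuple-based program tree."""
    return 1 if not isinstance(tree, tuple) else 1 + sum(size(c) for c in tree[1:])

def penalized_fitness(error, tree, alpha=0.01):
    """Error plus a penalty proportional to program length (lower is better)."""
    return error + alpha * size(tree)

lean = ("+", "x", 1)
bloated = ("+", ("+", "x", 0), ("*", 1, 1))   # same result as lean, padded with introns

# Both programs are assumed to have the same raw error on the training set.
print(penalized_fitness(2.0, lean))
print(penalized_fitness(2.0, bloated))
```

With equal raw error, the shorter program scores better, so padding a program with introns is no longer free.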
33 References
- Genetic Programming: An Introduction. Wolfgang Banzhaf, Peter Nordin, Robert E. Keller, Frank D. Francone
- Genetic Programming Tutorial. John Koza, GECCO 2005
- Genetic Programming: The Movie. John Koza