Title: Evolutionary Computation I
1Evolutionary Computation I
- COMP4001/7001
- 5 September 2005
2Learning Objectives
At the end of this lecture students will understand:
- What is an evolutionary algorithm?
- Effects of evolutionary operators
- The course of computational evolution
- Interactions between evolution and
4Why look to evolution?
- Evolution is inherently interesting
  - Evolutionary theories are easy to generate, almost impossible to test
  - EC modelling can be used as a proof of principle, but never as an existence proof
- Evolution is an optimization technique
  - Biological evolution optimizes organisms to their environments
  - EC can be used to optimize programs / processes
- Evolution is the only process to date which has produced intelligence
  - EC can be used in an attempt to understand how human intelligence works
  - May be the best hope for AI
5History of EC
- 1957 Fraser (Australian geneticist)
  - bitstring representation of a chromosome and a stochastic Monte Carlo approach
  - investigating questions such as the effects of linkage on the efficiency of selection, the relationship between the fitnesses of alleles and factors such as population size and intensity of selection, and the comparison of the efficiencies of different breeding plans for varying degrees of inter-locus interactions
  - The algorithm was run on SILLIAC, the parallel computer at the University of Sydney. SILLIAC (Sydney ILLIAC) was a slightly modified version of the ILLIAC developed by the University of Illinois, Urbana, United States. It cost 50,000 to construct, had a store of 1024 40-bit words, could perform 13,333 additions/subtractions per second, and read its input off punched paper tape
6More history
- 1957 Box
  - evolutionary optimisation process for the improvement of processes in a chemical plant, involving carefully planned variations to the procedures used in the operation of the plant itself
- 1966 Fogel, Owens and Walsh
  - groundbreaking analysis of the possibilities of simulated evolution for the development of artificial intelligence
- 1975 John Holland
  - Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence
  - Classic GA
7Evolutionary algorithms
- All EC algorithms involve
  - a population of individuals
  - which undergo repeated generations of genetic modification, fitness evaluation and fitness-proportionate selection.
- The genetic operators used to perform the genetic modifications are simplified versions of those found in biological systems.
- Many operators have been described in the literature
- Lots of different flavours of EA
  - Each makes different decisions about implementation
9Operators
- Representation
  - Holland's GA used binary chromosomes (bitstrings)
  - representations ranging from strings of floating point numbers to entire Lisp programs are used for different problems by various practitioners
- Mutation
  - acts to introduce variability into the population by altering the chromosome
  - the most usual mutation operator for a bitstring chromosome consists of flipping a bit from 0 to 1 or vice versa, with a given probability, the mutation rate.
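
As a rough illustration of the bit-flip operator described above, the sketch below flips each bit independently with probability equal to the mutation rate. The function name `bit_flip_mutation` and the list-of-ints chromosome encoding are assumptions made for this example, not part of the lecture.

```python
import random


def bit_flip_mutation(chromosome, mutation_rate, rng=random):
    """Return a copy of a bitstring chromosome (list of 0/1 ints) in which
    each bit has been flipped independently with probability mutation_rate."""
    return [1 - bit if rng.random() < mutation_rate else bit
            for bit in chromosome]


if __name__ == "__main__":
    parent = [0, 1, 1, 0, 1, 0, 0, 1]
    # A rate of 1/length gives one expected flip per chromosome.
    child = bit_flip_mutation(parent, mutation_rate=1 / len(parent))
    print(parent, child)
```

With a rate of 0 the chromosome is returned unchanged, and with a rate of 1 every bit is inverted; useful rates lie well towards the lower end.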
10More operators
- Crossover
  - recombines parts of two (or more) chromosomes to form new individuals
  - Single point crossover
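
A minimal sketch of single point crossover on two equal-length bitstrings follows. The name `single_point_crossover` and the choice to return both complementary children are assumptions for illustration; some implementations keep only one child.

```python
import random


def single_point_crossover(parent_a, parent_b, rng=random):
    """Cut both parents at one random point and swap the tails.

    The cut point is chosen strictly inside the chromosome so that each
    child receives material from both parents.
    """
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


if __name__ == "__main__":
    print(single_point_crossover([0] * 8, [1] * 8))
```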
11Selection
- Selection should be fitness proportionate
  - fitter individuals should contribute more to the next generation, on average, than less fit individuals
  - selection method should have an element of stochasticity so that every individual, no matter how unfit, has a chance of becoming a parent
- If only the fittest individuals in each generation are allowed to breed, the population rapidly converges to the best solution found early, which is very unlikely to be the global best solution
- Lots of different selection algorithms, which produce different types of selection pressure
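
One common way to realise fitness-proportionate selection is a "roulette wheel", sketched below. This is only one of the many selection algorithms mentioned above; the function name and the assumption of non-negative fitness values are choices made for this example. Note that an individual with a fitness of exactly zero is never picked by this scheme unless a small offset is added to every fitness.

```python
import random


def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Assumes all fitness values are non-negative.
    """
    total = sum(fitnesses)
    if total <= 0:
        # No fitness information to exploit: fall back to a uniform choice.
        return rng.choice(population)
    spin = rng.random() * total          # a point on the wheel, in [0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if spin < running:
            return individual
    return population[-1]                # guard against rounding error


if __name__ == "__main__":
    picks = [roulette_wheel_select(["weak", "fit"], [1, 9]) for _ in range(1000)]
    print(picks.count("fit"), picks.count("weak"))
```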
12The Simple Genetic Algorithm
13Other Approaches
- Evolutionary Programming (EP)
  - Fogel in the early 1960s; it has no genomic representation. Each individual in the population is an algorithm chosen at random over an appropriate sample space. Mutation is the only genetic operator used; EP does not use crossover
- Evolution Strategies (ES)
  - Schwefel, also in the 1960s, as an optimisation tool. ES uses a real-valued chromosome with a population size of one and mutation as the only genetic operator. In each generation the parent is mutated to produce a descendant; if the descendant is fitter it becomes the parent for the next generation, otherwise the original parent is retained.
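
The single-parent scheme described for ES can be sketched as follows. The Gaussian mutation with a fixed step size `sigma` and the example fitness function `negative_sphere` are assumptions for illustration; the slide does not specify the mutation distribution, and practical ES implementations usually adapt the step size.

```python
import random


def one_plus_one_es(fitness, parent, sigma=0.1, generations=200, seed=0):
    """Mutate a single real-valued parent each generation and keep the
    descendant only if it is fitter than the parent."""
    rng = random.Random(seed)
    parent_fitness = fitness(parent)
    for _ in range(generations):
        # Assumed mutation: add zero-mean Gaussian noise to every gene.
        child = [x + rng.gauss(0.0, sigma) for x in parent]
        child_fitness = fitness(child)
        if child_fitness > parent_fitness:
            parent, parent_fitness = child, child_fitness
    return parent, parent_fitness


def negative_sphere(x):
    """Example fitness to maximise: the best value, 0, is at the origin."""
    return -sum(v * v for v in x)


if __name__ == "__main__":
    print(one_plus_one_es(negative_sphere, [1.0, -1.0]))
```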
14And more
- Classifier Systems
  - Holland (1975). A classifier takes inputs from the environment and produces outputs indicating a classification of the input events. A classifier system produces new classifiers through the action of a genetic algorithm on the system's population of classifiers
- Genetic Programming (GP)
  - Koza in the late 1980s; the aim of GP is the automatic programming of computers, allowing programs to evolve to solve a given problem. The population consists of programs expressed as parse trees; operators used include crossover, mutation and architecture-altering operations patterned after gene duplication and gene deletion in nature
- Many others, often tailored to the problem at hand
16Fitness landscapes
- Wright (1932): for a given set of genes, each possible combination of gene values (alleles) could be assigned a fitness value for a particular set of conditions
- The entire genotype space can then be visualized as a landscape, with genotypes of high fitness occupying peaks and those of low fitness forming troughs
- Generally very high-dimensional
17The course of evolution in silico
This EA has a chromosome length of 10 bits and a population of 10 individuals. The fitness function is simply a count of the number of 1s in the chromosome; maximum fitness is therefore 10. The EA uses elitism, where the fittest individual in each generation is retained. Elitism ensures that a good solution, once found, is never lost, and means that the maximum fitness in the population never decreases from one generation to the next.
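
The run described above can be approximated with the sketch below. Only the chromosome length, population size, ones-counting fitness function and elitism come from the slide; the selection scheme (fitness-proportionate with a +1 offset), single point crossover, mutation rate, generation count and seed are all assumptions made so that the example is runnable.

```python
import random

CHROMOSOME_LENGTH = 10
POPULATION_SIZE = 10


def count_ones(chromosome):
    """Fitness: the number of 1s in the chromosome (maximum 10)."""
    return sum(chromosome)


def evolve(generations=50, mutation_rate=0.1, seed=0):
    """Run a small elitist EA and return the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
                  for _ in range(POPULATION_SIZE)]
    best_per_generation = []
    for _ in range(generations):
        fitnesses = [count_ones(c) for c in population]
        elite = max(population, key=count_ones)
        best_per_generation.append(count_ones(elite))
        # Assumed offset so that all-zero chromosomes can still be parents.
        weights = [f + 1 for f in fitnesses]
        next_population = [elite[:]]      # elitism: copy the best unchanged
        while len(next_population) < POPULATION_SIZE:
            mother, father = rng.choices(population, weights=weights, k=2)
            point = rng.randint(1, CHROMOSOME_LENGTH - 1)
            child = mother[:point] + father[point:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_population.append(child)
        population = next_population
    return best_per_generation


if __name__ == "__main__":
    print(evolve())
```

Because the elite individual is copied unchanged into the next generation, the recorded best fitness can stay flat but cannot fall.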
18Computational evolution
- Fitness originally random
- Increases over time
- Faster at first
- Eventually converges to a local optimum
- Not necessarily the global optimum
- Stochastic, so usually must be repeated
- Can be time consuming
- Can produce good solutions that work in unexpected ways
19Schema Theorem
- Holland, 1975
- short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations
- If the chromosome is a bit string, a schema (building block) is a set of chromosomes described by a template consisting of ones, zeros and asterisks, where an asterisk matches either bit
- Template 10*001*1 can be
  - 10100111
  - 10100101
  - 10000101
  - 10000111
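
A small sketch of schema templates follows, assuming the template behind the four strings listed above is `10*001*1` (the only wildcard pattern consistent with all of them). The helper names `matches`, `instances`, `order` and `defining_length` are chosen for this example.

```python
from itertools import product


def matches(schema, bitstring):
    """True if the bitstring is an instance of the schema ('*' = wildcard)."""
    return (len(schema) == len(bitstring)
            and all(s == "*" or s == b for s, b in zip(schema, bitstring)))


def instances(schema):
    """List every bitstring described by the schema."""
    result = []
    for fill in product("01", repeat=schema.count("*")):
        bits = iter(fill)
        result.append("".join(next(bits) if s == "*" else s for s in schema))
    return result


def order(schema):
    """Number of fixed (non-wildcard) positions."""
    return sum(ch != "*" for ch in schema)


def defining_length(schema):
    """Distance between the first and last fixed positions."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


if __name__ == "__main__":
    print(instances("10*001*1"), order("10*001*1"), defining_length("10*001*1"))
```

"Short" and "low-order" in the schema theorem refer to a small defining length and a small number of fixed positions respectively; this particular template is neither, since it fixes six of eight positions and spans the whole string.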
20Schema theorem
- an evolutionary algorithm proceeds by identifying short schemas of high fitness in different individuals, and recombining them using crossover in order to produce longer schemas of higher fitness, and eventually entire individuals having high fitness
- attractive because it suggests that schemas can be identified and the effects of mutation and crossover upon schemas in a population of a given size can be calculated exactly
- mathematical tractability would potentially provide useful insights into the way in which an EA functions
21Testing schema theory
- Royal road functions: Mitchell, Forrest and Holland (1991)
  - structured to provide a smooth, easy path to maximum fitness under the assumptions of schema theory
  - hierarchical fitness landscape, in which crossover between instances of fit lower-order schemas tends to produce ever fitter higher-order schemas
  - relatively highly fit intermediate stages could in fact interfere with the finding of fit higher-order solutions, since once an instance of a fit intermediate schema is discovered its relatively high fitness allows it to spread quickly throughout the population, carrying with it "hitchhiking" genes in positions not included in the schema. Low-order schemas tend to be discovered more-or-less sequentially, rather than in parallel
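
To make the idea of a royal road function concrete, here is a sketch in the style of the simplest published variant (often called R1), in which a 64-bit string is divided into eight blocks of eight bits and each block that consists entirely of 1s adds its length to the fitness. The block size, string length and function name are assumptions for this example; the hierarchical variants discussed above also reward combinations of completed blocks.

```python
def royal_road(bitstring, block_size=8):
    """Sum block_size for every contiguous block made up entirely of 1s.

    Partially completed blocks earn nothing, so fitness rises in steps.
    """
    if len(bitstring) % block_size != 0:
        raise ValueError("length must be a multiple of block_size")
    score = 0
    for start in range(0, len(bitstring), block_size):
        block = bitstring[start:start + block_size]
        if all(bit == 1 for bit in block):
            score += block_size
    return score


if __name__ == "__main__":
    one_block = [1] * 8 + [0] * 56
    print(royal_road(one_block), royal_road([1] * 64))
```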
22Variability
- Basis of evolution
- Mostly mutation
- In EAs, mostly point mutations
23Mutation
24Mutation rate
- Mutational meltdown: a mutation rate so high that the species cannot survive in the face of the number of errors generated
  - about 1 mutation per genome per generation, given that mutations occur at random
  - maximum rate at which an organism can expect to produce at least one error-free offspring in its lifetime
- Many EC implementations use a mutation rate of 1/genome
- In RNA viruses, about one nucleotide per genome is incorrectly reproduced per replication; for retroviruses the rate is one nucleotide per ten genomic replications; and for DNA-based microbes it is about one per 300 replications
- Longer genomes do not have higher mutation rates: error-correcting machinery compensates
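
The arithmetic behind a rate of one mutation per genome can be sketched as follows, under the simplifying assumption that each site mutates independently with the same probability. The function name is chosen for this example.

```python
import math


def prob_error_free_copy(genome_length, per_site_rate):
    """Probability that a copy contains no mutations at all, assuming
    independent, equally likely errors at every site."""
    return (1.0 - per_site_rate) ** genome_length


if __name__ == "__main__":
    for length in (10, 100, 1000):
        # A per-site rate of 1/length means one expected mutation per genome.
        print(length, prob_error_free_copy(length, 1.0 / length))
    print("limit:", math.exp(-1))
```

At one expected mutation per genome the chance of a perfect copy approaches 1/e, a little over a third, whatever the genome length; at higher rates error-free offspring quickly become rare.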
25Error correction (Ridley, 2000)
- Autocopying: the first reproducers were probably molecules of RNA or something similar, that could copy themselves using bases from their environment
- Copying enzymes: the evolution of enzymes which catalysed the copying process would also have made the process more reliable
- Double stranded genetic material: organisms which use DNA rather than RNA have the advantage of a more stable information-carrying molecule, plus the advantage of having two complementary copies of the sequence, to facilitate error checking
- Suite of proofreading and repair enzymes
- Development: a developmental process translating a genotype into a phenotype allows for the correction of errors on the fly in the course of development; not all errors have to be corrected in the genome
- Ploidy: using two or more copies of each chromosome provides redundancy of the genetic information, permitting the identification and correction of errors
- Sex: recombination of genetic material from more than one individual introduces the possibility of concentrating genetic errors in a small proportion of "scapegoat" offspring, allowing the other offspring to be error-free
26Neutrality
- Before the details of the molecular basis of genetics were worked out in the late 1950s, it was generally assumed that most mutations cause phenotypic alterations that are immediately subject to selection. Under these circumstances all the variation in a population is adaptive
- Electrophoresis: revealed a huge amount of variability at the protein level
- Motoo Kimura (1968): evolution at the molecular level is driven primarily by random drift among equally well-adapted sequence variants
- Ohta (1973): "nearly neutral" variants, which do in fact have a small selective difference, can become effectively neutral in small populations, where random events become more important
- Neutral networks in EC have been demonstrated to affect the course of evolution by facilitating random drift to more useful areas of the search space
27Managing variability
- Variability is systematically eroded by selection, while at the same time being replenished via mutation and recombination
- Different flavours of EA emphasise the importance of mutation (e.g. evolutionary programming) versus recombination (e.g. genetic algorithms) in generating novelty
- Effects of selection tend to outweigh those of mutation and recombination, and the population converges towards a peak in fitness space
- Neutral mutations rarely occur, unless deliberately designed into the algorithm
28Premature convergence
- In most EAs the entire population eventually reaches a single peak and tends to stay there
- If this peak is not the global maximum, the EA is considered to have converged prematurely
- Premature convergence occurs when the population loses the genetic variability which is essential to continued evolution
- This almost complete loss of genetic diversity is never observed in biological populations
29Causes of premature convergence
- Haploid genotype exposes every mutation to selection
- Diploid genotypes have been used; they require a dominance map or equivalent
- EAs using diploid chromosomes do tend to maintain more genetic variability than haploid EAs, but they rarely find better solutions
- The benefit of recessively masked variability will only be realised if the environment in which the population is evolving changes
30Pseudo Founder Effects
- The Founder Effect occurs when a population passes through a population size bottleneck, from which only a few individuals emerge to establish a new population, for example when a small number of individuals colonize a new island
- In EAs a related phenomenon frequently occurs: when a very fit individual arises in the population it tends to dominate future generations
- since most individuals are descended from a single individual they tend to be very similar in sequence, and so the crossover operator will have little effect
- any genes which happen to be on the pseudo-founder's chromosome will also spread throughout the population, whether or not they are valuable, a phenomenon known as hitchhiking
31Other factors
- Intense, unidirectional selection pressure
- Development
- Troubleshooting mechanisms
- Added source of noise
- Environmental interactions
32Speciation
- Preselection: two individuals are mated to produce an offspring, which is compared with both the parents. If the fitness of the child is greater than that of the worse parent, it replaces that parent in the population. The idea is that individuals are replaced by others which are fitter than they are, but similar in sequence, so that a number of different solutions can be maintained in the population, improving gradually over time
- Crowding: the crowding of solutions in search space is discouraged, for example by comparing a new individual with a subset of the existing population, and replacing the most similar of that subset with the new individual.
- Fitness Sharing: when there are a number of individuals with very similar sequences, the fitness of that genotype is shared amongst them all. This is a very popular diversity maintenance operator, and there are a large number of variants on the scheme.
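
One widely used variant of fitness sharing can be sketched as below. The Hamming distance measure, the triangular sharing function and the niche radius `sigma_share` are assumptions for this example; as noted above, many other variants exist.

```python
def hamming(a, b):
    """Number of positions at which two equal-length chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitnesses(population, raw_fitnesses, sigma_share=2):
    """Divide each raw fitness by its niche count.

    The niche count sums a triangular sharing function over every member of
    the population (the individual itself included) that lies closer than
    sigma_share, so crowded genotypes are penalised most heavily.
    """
    shared = []
    for individual, raw in zip(population, raw_fitnesses):
        niche_count = 0.0
        for other in population:
            distance = hamming(individual, other)
            if distance < sigma_share:
                niche_count += 1.0 - distance / sigma_share
        shared.append(raw / niche_count)   # niche_count >= 1 (self at distance 0)
    return shared


if __name__ == "__main__":
    population = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
    print(shared_fitnesses(population, [4, 4, 1]))
```

In the usage example the two identical high-fitness individuals split their fitness between them, while the isolated individual keeps its raw fitness, which narrows the gap selection sees between the two niches.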
33More speciation
- Niching: encouraging the development of different ecological niches in the population, using an approach such as the spatially restrained grid-based algorithm
- Coevolution: evolving more than one type of individual at once, with different species attempting as part of their fitness function to maintain as much genetic distance from other species as possible.
- Restricted Mating: individuals are only allowed to mate if they are in the basin of attraction of the same optimum. Once again, this scheme attempts to replace like with like in the population. There are a number of variants on the restricted mating approach
34Hill Climbing
- Implicit parallelization: by maintaining a population of candidate solutions which are modified by mutation and/or crossover, the algorithm is, in effect, exploring different regions of its search space in parallel
- The simplest alternative to a population-based EA is a hill climber, an algorithm which has a population of one individual, and performs a strictly local search using mutation
- The parallel nature of an EA provides no advantages over multiple random restarts of a hill climber in terms of the number of solution evaluations performed
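
A hill climber with random restarts, as described above, might look like the following sketch for bitstring problems. The single-bit-flip neighbourhood, the acceptance of equal-fitness moves, and the default restart and step counts are assumptions made for this example.

```python
import random


def hill_climb(fitness, length, steps, rng):
    """Local search from one random bitstring using single-bit mutations."""
    current = [rng.randint(0, 1) for _ in range(length)]
    current_fitness = fitness(current)
    for _ in range(steps):
        neighbour = current[:]
        i = rng.randrange(length)
        neighbour[i] = 1 - neighbour[i]
        neighbour_fitness = fitness(neighbour)
        # Accept ties as well as improvements so the search can drift
        # across flat regions of the landscape.
        if neighbour_fitness >= current_fitness:
            current, current_fitness = neighbour, neighbour_fitness
    return current, current_fitness


def restart_hill_climb(fitness, length, restarts=5, steps=100, seed=0):
    """Run several independent climbs and return the best (solution, fitness)."""
    rng = random.Random(seed)
    runs = [hill_climb(fitness, length, steps, rng) for _ in range(restarts)]
    return max(runs, key=lambda run: run[1])


if __name__ == "__main__":
    print(restart_hill_climb(sum, length=10))
```

Comparing this against a population-based EA on an equal footing means counting fitness evaluations: here each run costs `steps + 1` evaluations, so the total is `restarts * (steps + 1)`.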
35When is an EA better?
- A population-based EA would be expected to outperform a hill climber if either
  - the action of the genetic operators used in the EA provided advantages over local search, which would, indeed, be the case if the schema theorem was acting as described, with useful partial solutions discovered by different individuals being recombined to produce fitter individuals more rapidly than could be done by mutation alone, or
  - the structure of the fitness landscape was such that the implicit memory of a population-based algorithm (i.e. the memory encoded into the structure of the population itself as a result of evolution) allowed it to concentrate its search in areas of high fitness in a manner that would not be possible for a hill climber
- In practice, hill climbers with multiple restarts often perform as efficiently as or better than population-based algorithms
37Coevolution
- The interactions between two or more species as they evolve
- Kauffman's rubber sheet: evolution by one species modifies the fitness landscape for both species; the coevolving species is thus given a spur to further evolution, as its environment changes
- Fitness landscape is constantly changing
- A powerful strategy for avoiding premature convergence in evolutionary algorithms: there is less chance of the population converging to a local optimum, since local (and global) optima are constantly forming and dissolving as the fitness landscape changes
38Using coevolution
- Samuel's checkers players (1963)
  - hill climber, in which two programs played against each other
  - In the course of the game one program modified its parameter settings, while the other remained static
  - If the modified copy won the game, it was accepted, otherwise the original was retained
  - eventually played checkers at the level of a human champion
- Fogel (2001): still using evolution to develop checkers players (Blondie24)
39Learning and evolution
- Neural networks may be evolved: architecture, connection weights, or both
- Baldwin Effect (Baldwin, 1896): learning on the part of individuals could guide the course of evolution in the population as a whole
- A particular trait may be learned, or it may be innate
- A learned trait has the advantage of providing flexibility, but the disadvantage of being slow to acquire; an innate trait is present from birth, but inflexible
- Traits which are initially learned may become, over time, encoded into the genotype of the population
40The Baldwin effect
- Two preconditions must be met
  - The trait in question (which may be a behaviour or a physical trait) must be influenced by several interacting genes, so that a mutation in one of these genes will make the phenotypic expression of the trait more likely; and
  - an individual bearing such a mutation can learn to express the trait
- learning acts to provide "partial credit" for a mutation
- An individual carrying a mutation that predisposes it towards an advantageous phenotype will learn the trait more easily than its less fortunately genetically endowed conspecifics, and thus will survive and pass on more copies of that allele to the next generation. Over time, multiple mutations will accumulate in the genes for the desirable trait, which will thus become innate in the population
41Baldwin landscapes
42Conclusions
- Evolutionary computation can be used to
  - Model biological evolution
  - Optimize arbitrary functions
  - Evolve artificial intelligence?
- EC is a very simplified version of real evolution
  - It is important to understand what these simplifications involve
- Can be combined with many of the other approaches discussed to date
- EC can be used to solve problems which are otherwise intractable
43Learning Objectives
At the end of this lecture students will understand:
- What is an evolutionary algorithm?
  - Population-based, adaptive, optimization, not necessarily global optimum
- Effects of evolutionary operators
  - Mutation, crossover, selection, others
- The course of computational evolution
  - Fitness increase, variability, premature convergence, etc
- Interactions between evolution and
  - Coevolution, evolution and learning