Title: Evolution strategies (ES)
1 Evolution strategies (ES)
2 Evolution strategies
- Overview of theoretical aspects
- Algorithm
- The general scheme
- Representation and operators
- Example
- Properties
- Applications
3 ES quick overview (I)
- Developed in Germany in the 1970s
- Early names: Ingo Rechenberg, Hans-Paul Schwefel
and Peter Bienert (1965), TU Berlin
- In the beginning, ESs were not devised to compute
minima or maxima of real-valued static functions
with fixed numbers of variables and without noise
during their evaluation. Rather, they came to the
fore as a set of rules for the automatic design
and analysis of consecutive experiments with
stepwise variable adjustments driving a suitably
flexible object / system into its optimal state
in spite of environmental noise.
- Search strategy: concurrent, guided by the
absolute quality of individuals
4 ES quick overview (II)
- Typically applied to
- numerical optimisation
- continuous parameter optimisation
- shape optimization, e.g. evolving a slender 3D
body in a wind tunnel flow into a shape with
minimal drag per volume
- computational fluid dynamics, e.g. the design of
a 3D convergent-divergent hot water flashing nozzle
- ESs are closer to Lamarckian evolution (which
states that acquired characteristics can be
passed on to offspring).
- The differences between GA and ES are the
representation and the survivor selection mechanism,
which lets part of the old population survive into
the new population.
5 ES quick overview (III)
- Attributed features
- fast
- good optimizer for real-valued optimisation
(real-valued vectors are used to represent
individuals)
- relatively much theory
- Strong emphasis on mutation for creating
offspring
- Mutation is implemented by adding random noise
drawn from a Gaussian distribution
- Mutation parameters are changed during a run of
the algorithm
- In ES the control parameters are included in the
chromosomes and co-evolve with the solutions.
- Special: self-adaptation of (mutation) parameters
is standard
6 ES Algorithm - The general scheme
- An Example Evolution Strategy
- Procedure ES
- t = 0
- Initialize P(t)
- Evaluate P(t)
- While (Not Done)
- Parents(t) = Select_Parents(P(t))
- Offspring(t) = Procreate(Parents(t))
- Evaluate(Offspring(t))
- P(t+1) = Select_Survivors(P(t), Offspring(t))
- t = t + 1
- The differences between GA and ES consist in
representation and survivor selection (in the
new population the best of parents and offspring
survive, unlike generational genetic algorithms
where children replace the parents).
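The general scheme above can be sketched in code. This is a minimal illustration only, not the canonical implementation: the sphere fitness function, the fixed step size, and all constants and names below are assumptions made for the example; survivor selection is the (μ+λ) variant described here (best of parents and offspring).

```python
# Minimal sketch of the general ES scheme (illustrative names/constants).
import random

MU, LAMBDA, N, SIGMA = 5, 35, 10, 0.1

def fitness(x):                      # minimise the sphere function (assumed)
    return sum(v * v for v in x)

def procreate(parents):
    offspring = []
    for _ in range(LAMBDA):
        p = random.choice(parents)   # uniform random parent selection
        offspring.append([v + random.gauss(0.0, SIGMA) for v in p])
    return offspring

def es(generations=200):
    pop = [[random.uniform(-5, 5) for _ in range(N)] for _ in range(MU)]
    for _ in range(generations):
        offspring = procreate(pop)
        # (mu + lambda) survivor selection: best of parents and offspring
        pop = sorted(pop + offspring, key=fitness)[:MU]
    return pop[0]

best = es()
```

Note how elitism falls out of the survivor selection line: the sorted pool contains the old parents, so the best individual can never be lost.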
7 ES technical summary tableau
Representation: real-valued vectors, also encoding the mutation rate
Recombination: discrete or intermediary
Mutation: Gaussian perturbation
Parent selection: uniform random
Survivor selection: (μ,λ) or (μ+λ)
Specialty: self-adaptation of mutation step sizes
8 Evolution Strategies
- There are basically 4 types of ESs
- The simple (1+1)-ES (in this strategy the aspect
of collective learning in a population is missing;
the population is composed of a single individual)
- The (μ+1)-ES (the first multimembered ES: μ
parents give birth to 1 offspring)
- For the next two ESs, μ parents give birth to λ
offspring
- The (μ+λ)-ES: P(t+1) = best μ of the μ+λ
individuals
- The (μ,λ)-ES: P(t+1) = best μ of the λ offspring
9 (1+1) - Evolution Strategies (two-membered
Evolution Strategy)
- Before the (1+1)-ES there were no more than two
rules:
- 1. Change all variables at a time, mostly
slightly and at random.
- 2. If the new set of variables does not diminish
the goodness of the device, keep it; otherwise
return to the old status.
- The simple (1+1)-ES (in this strategy the aspect
of collective learning in a population is missing;
the population is composed of a single individual).
- The (1+1)-ES is a stochastic optimization method
with similarities to Simulated Annealing.
- It represents a local search strategy that
performs exploitation of the current solution.
10 (1+1) - Evolution Strategies features
- the convergence velocity, i.e. the expected
distance traveled in the useful direction per
iteration, is inversely proportional to the number
of variables of the objective function
- linear convergence order can be achieved if the
mutation strength (or mean step size, or standard
deviation of each component of the normally
distributed mutation vector) is permanently
adjusted to the proper order of magnitude
- the optimal mutation strength corresponds to a
certain success probability that is independent
of the dimension of the search space and is in the
range of one fifth for both model functions
(sphere model and corridor model)
- the convergence (velocity) rate of a (1+1)-ES is
defined as the ratio of the Euclidean distance
traveled towards the optimal point to the
number of generations required for running this
distance
11 Introductory example
- Task: minimise f : R^n → R
- Algorithm: two-membered ES using
- Vectors from R^n directly as chromosomes
- Population size 1
- Only mutation, creating one child
- Greedy selection
12 Standard deviation. Normal distribution
- Consider X = (x1, x2, …, xn) an n-dimensional
random variable.
- The mean: μ = M(X) = (x1 + x2 + … + xn)/n
- The square of the standard deviation (also called
the variance): σ² = M((X − M(X))²) = Σ(xk − M(X))²/n
- Normal distribution: N(μ, σ)
- The distribution with μ = 0 and σ² = 1 is called
the standard normal.
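The definitions above can be checked numerically. The sample data below is an arbitrary illustration; only standard-library functions are used.

```python
# Numerical check of mean, variance, standard deviation, and a draw
# from the standard normal N(0, 1). The data set xs is illustrative.
import random

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)
mean = sum(xs) / n                               # mu = M(X)
variance = sum((x - mean) ** 2 for x in xs) / n  # sigma^2
sigma = variance ** 0.5                          # standard deviation

z = random.gauss(0.0, 1.0)  # one draw from the standard normal N(0, 1)
```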
13 Illustration of normal distribution
(plot made with http://fooplot.com/)
14 Introductory example: pseudocode
Minimization problem
- Set t = 0
- Create initial point x^t = (x1^t, …, xn^t)
- REPEAT UNTIL (termination condition satisfied) DO
- Draw zi from a normal distribution for all
i = 1, …, n
- yi^t = xi^t + zi, i.e. y^t = x^t + N(0, σ)
- IF f(x^t) < f(y^t) THEN x^(t+1) = x^t
- ELSE x^(t+1) = y^t
- endIF
- Set t = t + 1
- endDO
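The pseudocode above translates almost line for line into code. This is a sketch with assumed details: the sphere function as the minimisation target, a fixed step size σ, and a fixed iteration budget instead of a real termination condition.

```python
# Direct transcription of the (1+1)-ES pseudocode with a fixed sigma.
# The sphere function and all constants are illustrative assumptions.
import random

def f(x):                                # minimisation target (assumed)
    return sum(v * v for v in x)

def one_plus_one_es(n=5, sigma=0.3, iterations=500):
    x = [random.uniform(-5, 5) for _ in range(n)]   # initial point x^0
    for _ in range(iterations):
        z = [random.gauss(0.0, sigma) for _ in range(n)]
        y = [xi + zi for xi, zi in zip(x, z)]       # y = x + N(0, sigma)
        if f(y) < f(x):                  # greedy selection: keep the better
            x = y
    return x

x_best = one_plus_one_es()
```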
15 Introductory example: mutation mechanism
- z values are drawn from a normal distribution
N(μ, σ)
- Mean μ is set to 0
- Standard deviation σ is called the mutation step
size
- σ is varied on the fly by the 1/5 success rule
- This rule resets σ after every k iterations by
- σ = σ / c if Ps > 1/5 (foot of big hill →
increase σ)
- σ = σ · c if Ps < 1/5 (near the top of the
hill → decrease σ)
- σ = σ if Ps = 1/5
- where Ps is the proportion of successful
mutations (those in which the child is fitter than
the parent), 0.8 ≤ c ≤ 1, usually c = 0.817
- The mutation rule for the object variables (xi)
is additive, while the mutation rule for the
dispersion (σ) is multiplicative.
16 Rechenberg's 1/5th success rule
- The 1/5th success rule is a mechanism that
ensures efficient heuristic search at the price
of decreased robustness.
- The ratio of successful mutations to all
mutations should be one fifth (1/5).
- IF this ratio is greater than 1/5, the dispersion
must be increased (accelerates convergence).
- ELSE IF this ratio is less than 1/5, the
dispersion must be decreased.
17 The implementation of Rechenberg's 1/5th rule
1. Perform the (1+1)-ES for a number G of
generations:
- keep σ constant during this period
- count the number Gs of successful mutations
during this period
2. Determine an estimate of the success probability
Ps by Ps = Gs / G
3. Change σ according to:
- σ = σ / c, if Ps > 1/5
- σ = σ · c, if Ps < 1/5
- σ = σ, if Ps = 1/5
4. Goto 1.
The optimal value of the factor c depends on the
objective function to be optimized, the
dimensionality N of the search space, and on the
number G. If N is sufficiently large (N ≥ 30),
G = N is a reasonable choice. Under this condition
Schwefel (1975) recommended using 0.85 ≤ c < 1.
Rechenberg's 1/5 rule reduces the standard
deviation σ when the system has not been very
successful in finding better solutions: since we
are not finding better solutions, we have probably
reached the top of the hill.
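Steps 1-4 above can be sketched as follows. The sphere function, the epoch count, and the initial σ are assumptions for the example; G = N and c = 0.85 follow the recommendation in the text.

```python
# Sketch of the (1+1)-ES with Rechenberg's 1/5th success rule.
# Sphere function and loop budget are illustrative assumptions.
import random

def f(x):
    return sum(v * v for v in x)

def es_one_fifth(n=10, sigma=1.0, c=0.85, epochs=100):
    G = n                                   # G = N generations per epoch
    x = [random.uniform(-5, 5) for _ in range(n)]
    for _ in range(epochs):
        successes = 0
        for _ in range(G):                  # step 1: keep sigma constant
            y = [xi + random.gauss(0.0, sigma) for xi in x]
            if f(y) < f(x):
                x, successes = y, successes + 1
        ps = successes / G                  # step 2: estimate Ps = Gs / G
        if ps > 1 / 5:                      # step 3: adapt sigma
            sigma /= c                      # too many successes: widen
        elif ps < 1 / 5:
            sigma *= c                      # too few successes: narrow
    return x, sigma                         # step 4 is the outer loop

x_best, sigma_final = es_one_fifth()
```

Because σ is multiplied or divided by c once per epoch, it tracks the distance to the optimum and the search converges much faster than with a fixed step size.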
18 Another historical example: the jet nozzle
experiment
Task: to optimize the shape of a jet nozzle
Approach: random mutations to the shape, plus
selection
19 Another historical example: the jet nozzle
experiment (cont'd)
In order to be able to vary the length of the
nozzle and the position of its throat, gene
duplication and gene deletion were mimicked to
evolve even the number of variables, i.e., the
nozzle diameters at fixed distances. The perhaps
optimal, at least unexpectedly good and so far
best-known shape of the nozzle was
counter-intuitively strange, and it took a while
until the one-component two-phase supersonic flow
phenomena far from thermodynamic equilibrium,
involved in achieving such a good result, were
understood.
20 The disadvantages of (1+1)-ES
- The fragile nature of the point-by-point search
based on the 1/5 success rule may lead to
stagnation at a local minimum point.
- The dispersion (step size) is the same for each
dimension (coordinate) of the search space.
- It does not use recombination; it is not using a
real population.
- There is no mechanism to allow individual
adjustment of the step size for each coordinate
axis of the search space. The consequence of the
lack of such a mechanism is that the procedure
moves slowly towards the optimum point.
21 (μ+λ), (μ,λ) - multimembered Evolution
Strategies
μ parents give birth to λ offspring
22 Representation
- Chromosomes consist of three parts
- Object variables: x1, …, xn
- Strategy parameters
- Mutation step sizes: σ1, …, σnσ
- Rotation angles: α1, …, αnα
- Not every component is always present
- Full size: ⟨x1, …, xn, σ1, …, σn, α1, …, αk⟩
- where k = n(n−1)/2 (the number of i, j pairs)
23 Mutation
- Main mechanism: changing value by adding random
noise drawn from a normal distribution
- xi′ = xi + N(0, σ)
- Key idea
- σ is part of the chromosome ⟨x1, …, xn, σ⟩
- σ is also mutated into σ′ (see later how)
- Thus the mutation step size σ coevolves with the
solution x
24 Mutate σ first
- Net mutation effect: ⟨x, σ⟩ → ⟨x′, σ′⟩
- Order is important
- first σ → σ′ (see later how)
- then x → x′ = x + N(0, σ′)
- Rationale: the new ⟨x′, σ′⟩ is evaluated twice
- Primarily: x′ is good if f(x′) is good
- Secondarily: σ′ is good if the x′ it created is
good
- Reversing the mutation order would not work
25 Mutation case 1: Uncorrelated mutation with one σ
- Chromosomes: ⟨x1, …, xn, σ⟩
- σ′ = σ · exp(τ · N(0,1))
- xi′ = xi + σ′ · N(0,1)
- Typically the learning rate τ ∝ 1/n^½
- And we have a boundary rule: σ′ < ε0 ⇒ σ′ = ε0
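A sketch of this mutation operator, following the formulas above: τ = 1/√n and the floor constant ε0 is an assumed value for the boundary rule.

```python
# Uncorrelated mutation with one step size: mutate sigma first,
# then use the new sigma to mutate x. EPS0 is an assumed floor.
import math
import random

EPS0 = 1e-8                                  # boundary rule constant

def mutate_one_sigma(x, sigma):
    n = len(x)
    tau = 1.0 / math.sqrt(n)                 # learning rate tau ~ 1/sqrt(n)
    sigma_new = sigma * math.exp(tau * random.gauss(0.0, 1.0))
    sigma_new = max(sigma_new, EPS0)         # sigma' < eps0 => sigma' = eps0
    x_new = [xi + sigma_new * random.gauss(0.0, 1.0) for xi in x]
    return x_new, sigma_new

child, child_sigma = mutate_one_sigma([0.0] * 4, 0.5)
```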
26 Mutants with equal likelihood
- Circle: mutants having the same chance to be
created
27 Mutation case 2: Uncorrelated mutation with n σ's
- Chromosomes: ⟨x1, …, xn, σ1, …, σn⟩
- σi′ = σi · exp(τ′ · N(0,1) + τ · Ni(0,1))
- xi′ = xi + σi′ · Ni(0,1)
- Two learning rate parameters
- τ′: overall learning rate
- τ: coordinate-wise learning rate
- τ′ ∝ 1/(2n)^½ and τ ∝ 1/(2n^½)^½
- And σi′ < ε0 ⇒ σi′ = ε0
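The n-step-size variant can be sketched the same way. Note the single shared draw for the τ′ term versus a fresh draw Ni(0,1) per coordinate; ε0 is again an assumed floor constant.

```python
# Uncorrelated mutation with n step sizes (one sigma per coordinate).
# EPS0 is an assumed floor; learning rates follow the slide formulas.
import math
import random

EPS0 = 1e-8

def mutate_n_sigmas(x, sigmas):
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)           # overall learning rate
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))      # coordinate-wise rate
    g = random.gauss(0.0, 1.0)                     # one draw shared by all i
    new_x, new_sigmas = [], []
    for xi, si in zip(x, sigmas):
        si_new = si * math.exp(tau_prime * g + tau * random.gauss(0.0, 1.0))
        si_new = max(si_new, EPS0)                 # boundary rule
        new_sigmas.append(si_new)
        new_x.append(xi + si_new * random.gauss(0.0, 1.0))
    return new_x, new_sigmas

child, child_sigmas = mutate_n_sigmas([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
```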
28 Mutants with equal likelihood
- Ellipse: mutants having the same chance to be
created
29 Mutation case 3: Correlated mutations
- Chromosomes: ⟨x1, …, xn, σ1, …, σn, α1, …, αk⟩
- where k = n(n−1)/2
- and the covariance matrix C is defined as
- cii = σi²
- cij = 0 if i and j are not correlated
- cij = ½ (σi² − σj²) · tan(2αij) if i and
j are correlated
- Note the numbering / indices of the α's
30 Correlated mutations (cont'd)
- The mutation mechanism is then:
- σi′ = σi · exp(τ′ · N(0,1) + τ · Ni(0,1))
- αj′ = αj + β · N(0,1)
- x′ = x + N(0, C′)
- x stands for the vector ⟨x1, …, xn⟩
- C′ is the covariance matrix C after mutation of
the σ and α values
- τ′ ∝ 1/(2n)^½ and τ ∝ 1/(2n^½)^½ and β ≈ 5°
- σi′ < ε0 ⇒ σi′ = ε0 and
- |αj′| > π ⇒ αj′ = αj′ − 2π · sign(αj′)
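One way to realise the draw from N(0, C′) without forming the covariance matrix explicitly is to sample an uncorrelated vector σi · Ni(0,1) and then rotate it by each angle αij in turn. This is a sketch of that idea only; the function name and the two-dimensional example values are assumptions.

```python
# Correlated mutation step: rotate an uncorrelated Gaussian vector by
# each stored angle alpha_ij. Names and example values are illustrative.
import math
import random

def correlated_sample(sigmas, alphas):
    """alphas maps index pairs (i, j), i < j, to rotation angles."""
    n = len(sigmas)
    z = [s * random.gauss(0.0, 1.0) for s in sigmas]  # uncorrelated draw
    for (i, j), a in alphas.items():                  # apply each rotation
        zi, zj = z[i], z[j]
        z[i] = zi * math.cos(a) - zj * math.sin(a)
        z[j] = zi * math.sin(a) + zj * math.cos(a)
    return z

# Two variables correlated through a single 45-degree rotation angle
step = correlated_sample([0.5, 0.5], {(0, 1): math.pi / 4})
x_new = [xi + si for xi, si in zip([1.0, 2.0], step)]
```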
31 Mutants with equal likelihood
- Ellipse: mutants having the same chance to be
created
32 Recombination
- Creates one child
- Acts per variable / position by either
- Averaging parental values, or
- Selecting one of the parental values
- From two or more parents, by either
- Using two selected parents to make a child, or
- Selecting two parents anew for each position
33 Names of recombinations

                                 | Two fixed parents  | Two parents selected for each i
zi = (xi + yi)/2                 | Local intermediary | Global intermediary
zi is xi or yi chosen randomly   | Local discrete     | Global discrete
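The four variants in the table can be sketched directly. The function names and the sample population are assumptions made for the example; the "global" variants re-select parents from the population for every position i.

```python
# The four ES recombination variants from the table above.
# Function names and the sample population are illustrative.
import random

def local_intermediary(x, y):
    return [(a + b) / 2 for a, b in zip(x, y)]   # z_i = (x_i + y_i)/2

def local_discrete(x, y):
    return [random.choice(pair) for pair in zip(x, y)]

def global_intermediary(pop, n):
    # two parents re-selected anew for every position i
    return [(random.choice(pop)[i] + random.choice(pop)[i]) / 2
            for i in range(n)]

def global_discrete(pop, n):
    return [random.choice(pop)[i] for i in range(n)]

pop = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
child = local_intermediary(pop[0], pop[1])       # averages to [1.0, 1.0]
```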
34 Parent selection
- Parents are selected by uniform random
distribution whenever an operator needs one or more
- Thus ES parent selection is unbiased - every
individual has the same probability to be
selected
- Note that in ES "parent" means a population
member (in GAs it means a population member
selected to undergo variation)
35 Survivor selection
- Applied after creating λ children from the μ
parents by mutation and recombination
- Deterministically chops off the bad stuff
- The basis of selection is either
- The set of children only: (μ,λ)-selection
- The set of parents and children: (μ+λ)-selection
36 Survivor selection (cont'd)
- (μ+λ)-selection is an elitist strategy
- (μ,λ)-selection can "forget"
- Often (μ,λ)-selection is preferred, for being
- Better at leaving local optima
- Better at following moving optima
- Using the + strategy, bad σ values can survive in
⟨x, σ⟩ too long if their host x is very fit
- Selective pressure in ES is very high (λ ≈ 7μ
is the common setting)
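The two schemes differ only in the pool that gets ranked, which can be sketched in a few lines. The scalar toy individuals and the fitness function below are assumptions for the illustration; fitness is minimised.

```python
# (mu, lambda) vs (mu + lambda) survivor selection, fitness minimised.
# The scalar toy individuals are illustrative.
def comma_selection(parents, offspring, mu, fitness):
    return sorted(offspring, key=fitness)[:mu]        # parents forgotten

def plus_selection(parents, offspring, mu, fitness):
    return sorted(parents + offspring, key=fitness)[:mu]   # elitist

f = lambda x: x * x
survivors = comma_selection([0], [3, -1, 2], 1, f)    # best offspring: -1
elite = plus_selection([0], [3, -1, 2], 1, f)         # parent 0 survives
```

The example shows why (μ,λ) can "forget": the parent 0 is fitter than every child, yet comma selection discards it, which is exactly what lets misadapted σ values die out.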
37 Self-adaptation illustrated
- Given a dynamically changing fitness landscape
(optimum location shifted every 200 generations)
- A self-adaptive ES is able to
- follow the optimum and
- adjust the mutation step size after every shift!
38 Self-adaptation illustrated (cont'd)
Changes in the fitness values (left) and the
mutation step sizes (right)
39 Prerequisites for self-adaptation
- μ > 1, to carry different strategies
- λ > μ, to generate an offspring surplus
- Not too strong selection, e.g., λ ≈ 7μ
- (μ,λ)-selection, to get rid of misadapted σ's
- Mixing strategy parameters by (intermediary)
recombination on them
40 ES Applications
- Lens shape optimization for the required light
refraction
- Distribution of fluid in a blood network
- Brachistochrone curve
- Solving the Rubik's Cube
41 Example application: the Ackley function (Bäck
et al. '93)
- The Ackley function (here used with n = 30)
- Evolution strategy
- Representation
- −30 < xi < 30 (coincidence of 30's!)
- 30 step sizes
- (30, 200) selection
- Termination after 200,000 fitness evaluations
- Results: average best solution is 7.48 · 10^−8
(very good)