1 Evolution Strategies for Industrial Applications
Thomas Bäck, November 26, 2004

Natural Computing, Leiden Institute for Advanced Computer Science (LIACS)
Niels Bohrweg 1, NL-2333 CA Leiden
baeck_at_liacs.nl, Tel. +31 (0) 71 527 7108, Fax +31 (0) 71 527 6985

Managing Director / CTO, NuTech Solutions GmbH / Inc.
Martin-Schmeißer-Weg 15, D-44227 Dortmund
baeck_at_nutechsolutions.de, Tel. +49 (0) 231 / 72 54 63-10, Fax +49 (0) 231 / 72 54 63-29
2 Background I
Daniel Dennett: Biology is Engineering
3 Background II
Realistic Scenario ....
4 Background III
Phenotypic evolution ... and genotypic evolution
5 Optimization
- E.g. costs (min), quality (max), error (min), stability (max), profit (max), ...
- Difficulties:
  - High-dimensional.
  - Nonlinear, non-quadratic, multimodal.
  - Noisy, dynamic, discontinuous.
- Evolutionary landscapes are like that!
6 Overview
- Evolutionary Algorithm Applications Examples
- Evolutionary Algorithms Some Algorithmic Details
- Genetic Algorithms
- Evolution Strategies
- Some Theory of EAs
- Convergence Velocity Issues
- Other Examples
- Drug Design
- Inverse Design of CAs
- Summary
7 Modeling - Simulation - Optimization
(Diagram: Modeling / Data Mining, Simulation, and Optimization as three linked stages; open questions feed into each stage, answers come out.)
8 General Aspects
9 Examples I: Inflatable Knee Bolster Optimization
(MAZDA picture)
Low cost: ES 0.677; GA (Ford) 0.72; Hooke-Jeeves / DoE 0.88.
(Figure labels: initial position of knee bag model; deployed knee bag (unit only); tether (FEM 5); volume of 14 L; load distribution plate (FEM 3); support plate (FEM 4); knee bag (FEM 2); straps are defined in knee bag (FEM 2); vent hole.)
10 IKB: Previous Designs
11 IKB: Problem Statement
Objective: min P_total
Subject to: left femur load < 7000, right femur load < 7000.
12 IKB Results I: Hooke-Jeeves
Quality: 8.888; Simulations: 160.
13 IKB Results II: (1+1)-ES
Quality: 7.142; Simulations: 122.
14 Optical Coatings Design Optimization
- Nonlinear mixed-integer problem, variable dimensionality.
- Minimize deviation from desired reflection behaviour.
- Excellent synthesis method: robust and reliable results.
15 Dielectric Filter Design Problem
- Dielectric filter design.
- n = 40 layers assumed.
- Layer thicknesses xi in [0.01, 10.0].
- Quality function: sum of quadratic penalty terms.
- Penalty terms are 0 iff constraints are satisfied.
Client: Corning, Inc., Corning, NY
16 Results: Overview of Runs
- Factor 2 in quality.
- Factor 10 in effort.
- Reliable, repeatable results.
Benchmark
17 Problem Topology Analysis: An Attempt
- Grid evaluation for 2 variables.
- Close to the optimum (from a vector of quality 0.0199).
- Global view (left) vs. local view (right).
18 Examples II: Bridgman Casting Process
18 speed variables (continuous) for the casting schedule.
Turbine blade after casting.
FE mesh of 1/3 geometry: 98,610 nodes, 357,300 tetrahedra, 92,830 radiation surfaces. A large problem: run time varies from 16 h 30 min to 32 h (SGI Origin, R12000, 400 MHz); at each run, 38.3 GB of view factors (49 positions) are treated!
19 Examples II: Bridgman Casting Process
Global quality: comparison of the initial and optimized configurations.
(Plot labels: GCM (commercial gradient-based method), Initial (DoE), Evolution Strategy. Figure: turbine blade after casting.)
20 Examples IV: Traffic Light Control
Client: Dutch Ministry of Traffic, Rotterdam, NL
- Generates green times for the next switching schedule.
- Minimization of total delay / number of stops.
- Better results (3-5 %) / higher flexibility than with traditional controllers.
- Dynamic optimization, depending on actual traffic (measured by control loops).
21 Examples V: Elevator Control
Client: Fujitec Co. Ltd., Osaka, Japan
- Minimization of passenger waiting times.
- Better results (3-5 %) / higher flexibility than with traditional controllers.
- Dynamic optimization, depending on actual traffic.
22 Examples VI: Metal Stamping Process
Client: AutoForm Engineering GmbH, Dortmund
- Minimization of defects in the produced parts.
- Optimization of geometric parameters and forces.
- Fast algorithm finds very good results.
23 Examples VII: Network Routing
Client: SIEMENS AG, München
- Minimization of end-to-end blockings under service constraints.
- Optimization of routing tables for existing, hard-wired networks.
- 10-1000 % improvement.
24 Examples VIII: Nuclear Reactor Refueling
Client: SIEMENS AG, München
- Minimization of total costs.
- Creates new fuel assembly reload patterns.
- Clear improvements (1-5 %) over existing expert solutions.
- Huge cost saving.
25 Two-Phase Nozzle Design (Experimental)
Experimental design optimisation: optimise efficiency.
Initial design ... evolves ... final design: 32 % improvement in efficiency.
26 Multipoint Airfoil Optimization (1)
Client
(Figure: cruise condition: low drag! start: high lift!)
22 design parameters.
27 Multipoint Airfoil Optimization (2)
Three compromise wing designs.
Pareto set after 1000 simulations.
28 Evolutionary Algorithms: Some Algorithmic Details
29 Unifying Evolutionary Algorithm

t := 0;
initialize(P(t));
evaluate(P(t));
while not terminate do
    P'(t)  := mating_selection(P(t));
    P''(t) := variation(P'(t));
    evaluate(P''(t));
    P(t+1) := environmental_selection(P''(t) ∪ Q);
    t := t + 1;
od
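The loop above can be sketched in Python; the toy sphere objective, population size mu, and the uniform mating selection are illustrative assumptions, not part of the slide.

```python
import random

# Minimal sketch of the unified EA loop on a toy sphere function
# (minimization). Objective, mu, and mating scheme are assumptions.

def evaluate(pop):
    return [sum(x * x for x in ind) for ind in pop]

def mating_selection(pop, fit):
    # simplest scheme: uniform random choice of parents
    return [random.choice(pop) for _ in pop]

def variation(parents, sigma=0.1):
    # Gaussian perturbation of every variable
    return [[x + random.gauss(0.0, sigma) for x in p] for p in parents]

def environmental_selection(pop, fit, mu):
    # deterministic: keep the mu best individuals
    ranked = sorted(zip(fit, pop))
    return [ind for _, ind in ranked[:mu]]

random.seed(1)
mu, n = 10, 5
P = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(mu)]
for t in range(100):
    parents = mating_selection(P, evaluate(P))
    offspring = variation(parents)
    union = P + offspring            # P''(t) with Q = parents
    P = environmental_selection(union, evaluate(union), mu)

best = min(evaluate(P))
print(best)
```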
30 Evolutionary Algorithm Taxonomy
Evolution Strategies
Genetic Algorithms
Classifier Systems
Genetic Programming
Evolutionary Programming
Many mixed forms: agent-based systems, swarm systems, A-life systems, ...
31 Genetic Algorithms vs. Evolution Strategies

Evolution Strategies:
- Real-valued representation
- Normally distributed mutations
- Fixed recombination rate (= 1)
- Deterministic selection
- Creation of offspring surplus
- Self-adaptation of strategy parameters: variance(s), covariances

Genetic Algorithms:
- Binary representation
- Fixed mutation rate pm (about 1/n)
- Fixed crossover rate pc
- Probabilistic selection
- Identical population size
- No self-adaptation
32 Genetic Algorithms
- Often binary representation.
- Mutation by bit inversion with probability pm.
- Various types of crossover, applied with probability pc:
  - k-point crossover.
  - Uniform crossover.
- Probabilistic selection operators:
  - Proportional selection.
  - Tournament selection.
- Parent and offspring population sizes identical.
- Constant strategy parameters.
33 Mutation
(Figure: bitstring 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1, mutated by flipping individual bits.)
- Mutation by bit inversion with probability pm.
- pm identical for all bits.
- pm small (e.g., pm = 1/l).
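A minimal sketch of this bit-flip mutation, using the bitstring from the figure:

```python
import random

# GA bit-flip mutation with per-bit probability pm = 1/l,
# as described on the slide.

def mutate(bits, pm):
    return [1 - b if random.random() < pm else b for b in bits]

random.seed(0)
l = 15
parent = [0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]  # bitstring from the figure
child = mutate(parent, pm=1.0 / l)
print(child)
```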
34 Crossover
- Crossover applied with probability pc.
- pc identical for all individuals.
- k-point crossover: k points chosen randomly.
- Example: 2-point crossover.
35 Selection
- Fitness-proportional selection: individual a_i is chosen with probability p_i = f(a_i) / Σ_{j=1..l} f(a_j), where
  - f: fitness,
  - l: population size.
- Tournament selection:
  - Randomly select q << l individuals.
  - Copy the best of these q into the next generation.
  - Repeat l times.
  - q is the tournament size (often q = 2).
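Both selection operators can be sketched as follows; the population and fitness values are illustrative (maximization assumed):

```python
import random

# Sketch of the two probabilistic GA selection operators above:
# fitness-proportional and q-tournament selection (maximization).

def proportional_select(pop, fit):
    total = sum(fit)
    r = random.random() * total
    acc = 0.0
    for ind, f in zip(pop, fit):
        acc += f
        if r <= acc:
            return ind
    return pop[-1]

def tournament_select(pop, fit, q=2):
    contestants = random.sample(range(len(pop)), q)
    best = max(contestants, key=lambda i: fit[i])
    return pop[best]

random.seed(0)
pop = ["a", "b", "c", "d"]
fit = [1.0, 2.0, 3.0, 4.0]
next_gen = [tournament_select(pop, fit, q=2) for _ in range(len(pop))]
print(next_gen)
```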
36 Evolution Strategies
- Real-valued representation.
- Normally distributed mutations.
- Various types of recombination:
  - Discrete (exchange of variables).
  - Intermediate (averaging).
  - Involving two or more parents.
- Deterministic selection, offspring surplus λ >> μ:
  - Elitist (μ+λ).
  - Non-elitist (μ,λ).
- Self-adaptation of strategy parameters.
37 Mutation
Creation of a new solution:
- σ-adaptation by means of:
  - the 1/5-success rule,
  - self-adaptation.
- More complex / powerful strategies:
  - Individual step sizes σ_i.
  - Covariances.
Convergence speed? Ca. 10·n down to 5·n is possible.
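Self-adaptive mutation of this kind can be sketched with Schwefel-style log-normal adaptation of a single step size σ; the learning rate tau = 1/sqrt(n) is a common textbook choice, not taken from the slide:

```python
import math
import random

# Sketch of ES mutation with log-normal self-adaptation of one
# global step size sigma. tau = 1/sqrt(n) is an assumed setting.

def mutate(x, sigma):
    n = len(x)
    tau = 1.0 / math.sqrt(n)
    # mutate the strategy parameter first ...
    sigma_new = sigma * math.exp(tau * random.gauss(0.0, 1.0))
    # ... then use it to perturb the object variables
    x_new = [xi + sigma_new * random.gauss(0.0, 1.0) for xi in x]
    return x_new, sigma_new

random.seed(0)
x, sigma = [1.0, 2.0, 3.0, 4.0], 0.5
y, s = mutate(x, sigma)
print(y, s)
```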
38 Self-Adaptation
- Learning while searching: intelligent method.
- Different algorithmic approaches, e.g.:
  - Pure self-adaptation.
  - Mutational step size control (MSC).
  - Derandomized step size adaptation.
  - Covariance adaptation.
39 Self-Adaptive Mutation
(Figures: n = 2, n_σ = 1, n_α = 0; n = 2, n_σ = 2, n_α = 0; n = 2, n_σ = 2, n_α = 1.)
40 Self-Adaptation
- Motivation: a general search algorithm.
- Geometric convergence: arbitrarily slow, if σ is wrongly controlled!
- No deterministic / adaptive scheme for arbitrary functions exists.
- Self-adaptation: on-line evolution of strategy parameters.
- Various schemes:
  - Schwefel: one σ, n σ's, covariances; Rechenberg: MSA.
  - Ostermeier, Hansen: derandomized, Covariance Matrix Adaptation.
  - EP variants (meta-EP, Rmeta-EP).
  - Bäck: application to p_m in GAs.
(Figure: step size and direction.)
41 Self-Adaptation: Dynamic Sphere
- Optimum σ.
- Transition time proportional to n.
- Optimum σ learned by self-adaptation.
42 Selection
(μ,λ)
(μ+λ)
43 Possible Selection Operators
- (1+1)-strategy: one parent, one offspring.
- (1,λ)-strategies: one parent, λ offspring.
  - Example: (1,10)-strategy.
  - Derandomized / self-adaptive / mutative step size control.
- (μ,λ)-strategies: μ > 1 parents, λ > μ offspring.
  - Example: (2,15)-strategy.
  - Includes recombination.
  - Can overcome local optima.
- (μ+λ)-strategies: elitist strategies.
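A minimal sketch contrasting comma and plus selection (minimization; the fitness values are illustrative):

```python
# Deterministic (mu,lambda) vs (mu+lambda) selection (minimization).
# Individuals and fitness values below are illustrative.

def comma_selection(offspring, fit_off, mu):
    # non-elitist: parents are discarded, select only from offspring
    ranked = sorted(zip(fit_off, offspring))
    return [ind for _, ind in ranked[:mu]]

def plus_selection(parents, fit_par, offspring, fit_off, mu):
    # elitist: select from the union of parents and offspring
    ranked = sorted(zip(fit_par + fit_off, parents + offspring))
    return [ind for _, ind in ranked[:mu]]

parents, fit_par = ["p1", "p2"], [1.0, 9.0]
offspring = ["o%d" % i for i in range(6)]
fit_off = [5.0, 2.0, 8.0, 3.0, 7.0, 6.0]
print(comma_selection(offspring, fit_off, 2))              # → ['o1', 'o3']
print(plus_selection(parents, fit_par, offspring, fit_off, 2))  # → ['p1', 'o1']
```

Note how plus selection keeps the good parent `p1` (elitism), while comma selection forgets it.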
44 Advantages of Evolution Strategies
- Self-Adaptation of strategy parameters.
- Direct, global optimizers !
- Faster than GAs !
- Extremely good in solution quality.
- Very small number of function evaluations.
- Dynamical optimization problems.
- Design optimization problems.
- Discrete or mixed-integer problems.
- Experimental design optimisation.
- Combination with Meta-Modeling techniques.
45 Some Theory of EAs
46 Robust vs. Fast
- Global convergence with probability one:
  - General, but for practical purposes useless.
- Convergence velocity:
  - Local analysis only, specific functions only.
47 Convergence Velocity Analysis: ES
- A convex function (sphere model).
- Simplest case: (1,λ)-ES.
- Illustration: (1,4)-ES.
48 Convergence Velocity Analysis: ES
- Order statistics: p_{n:λ}(z) denotes the p.d.f. of Z_{n:λ}.
- Idea: the best offspring has the smallest r / largest z.
- The following holds from geometric considerations (equation omitted).
- One gets (equation omitted).
49 Convergence Velocity Analysis: ES
- Using the density of the best order statistic, one finally gets the normalized (dimensionless) progress rate as a function of λ (equations and plot over λ omitted).
50 Convergence Velocity Analysis: ES
- Convergence velocity of the (1,λ)-ES and (1+λ)-ES (equation omitted).
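The progress-rate formulas on these slides were lost in the export; as a hedged reconstruction, the standard normalized result for the (1,λ)-ES on the sphere model, with the normalizations σ* = σ n / R and φ* = φ n / R, is

```latex
\varphi^{*}_{1,\lambda}(\sigma^{*}) \;\approx\; c_{1,\lambda}\,\sigma^{*} \;-\; \frac{(\sigma^{*})^{2}}{2},
\qquad
\sigma^{*}_{\mathrm{opt}} = c_{1,\lambda},
\qquad
\varphi^{*}_{\max} = \frac{c_{1,\lambda}^{2}}{2}
```

where c_{1,λ} is the progress coefficient, i.e. the expectation of the maximum of λ standard normal variates. This textbook form is consistent with the slides' conclusions (dimensionless rate, dependence on λ only), but it is a reconstruction, not the slide's own equation.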
51 Convergence Velocity Analysis: GA
- (1+1)-GA, (1,λ)-GA, (1+λ)-GA.
- For the counting-ones function (definition omitted).
- Convergence velocity (equation omitted).
- Mutation rate p, q = 1 - p, k_max = l - f_a.
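A (1+1)-GA on the counting-ones function with mutation rate p = 1/l can be sketched as follows; string length and iteration budget are illustrative:

```python
import random

# (1+1)-GA on counting ones (OneMax) with mutation rate p = 1/l,
# matching the setting analyzed on the slide. l and the budget of
# 2000 iterations are illustrative choices.

def onemax(bits):
    return sum(bits)

random.seed(2)
l = 50
x = [random.randint(0, 1) for _ in range(l)]
fx = onemax(x)
for t in range(2000):
    y = [1 - b if random.random() < 1.0 / l else b for b in x]
    fy = onemax(y)
    if fy >= fx:          # (1+1): accept if not worse
        x, fx = y, fy
print(fx)
```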
52 Convergence Velocity Analysis: GA
- Optimum mutation rate?
- Absorption times from the transition matrix, in block form (equations omitted).
53 Convergence Velocity Analysis: GA
(Plot over p:)
- p too large: exponential.
- p too small: almost constant.
- Optimal: O(l ln l).
54 Convergence Velocity Analysis: GA
- (1,λ)-GA (k_min = -f_a), (1+λ)-GA (k_min = 0).
55 Convergence Velocity Analysis: EA
- (1,λ)-GA, (1+λ)-GA; (1,λ)-ES, (1+λ)-ES.
- Conclusion: a unifying, search-space independent theory!?
56 Current Drug Targets
http://www.gpcr.org/
GPCR
57 Goals (in Cooperation with LACDR)
- CI methods:
  - Automatic knowledge extraction from biological databases: fuzzy rules.
  - Automatic optimisation of structures: evolution strategies.
- Exploration for:
  - Drug discovery,
  - De novo drug design.
(Figures: initialisation vs. final (optimized); charge distribution on the VdW surface of CGS15943; a new derivative with good receptor affinity; fingerprint.)
58 Evolutionary DNA-Computing (with IMB)
- DNA molecule = solution candidate!
- Potential advantage: > 10^12 candidate solutions in parallel.
- Biological operators:
  - Cutting, splicing.
  - Ligating.
  - Amplification.
  - Mutation.
- Current approaches very limited.
- Our approach:
  - Suitable NP-complete problem.
  - Modern technology.
  - Scalability (n > 30).
59 UP of CAs (= Inverse Design of CAs)
- 1D CAs: earlier work by Mitchell et al., Koza, ...
- Transition rule: assigns each neighborhood configuration a new state.
- One rule can be expressed by 2^(2r+1) bits.
- There are 2^(2^(2r+1)) rules for a binary 1D CA.
(Figure: a 15-cell configuration 1 0 0 0 0 1 1 0 1 0 1 0 1 0 0, with a neighborhood of radius r = 2 highlighted.)
60 UP of CAs (rule encoding)
- Assume r = 1: rule length is 2^3 = 8 bits.
- Corresponding neighborhoods:

  Neighborhood: 000 001 010 011 100 101 110 111
  New state:      1   0   0   0   0   1   1   0
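Applying such an r = 1 rule table can be sketched as follows; the periodic boundary conditions are an assumption, not stated on the slide:

```python
# Applying an r = 1 transition rule to a 1D binary CA. The 8-bit
# rule table is the one shown above, indexed by neighborhood 000..111.

rule = {
    (0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 0, (0, 1, 1): 0,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 0,
}

def step(cells):
    n = len(cells)
    # periodic boundary conditions (an assumption for this sketch)
    return [rule[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

state = [0, 1, 1, 0, 1, 0, 0, 1]
print(step(state))  # → [1, 0, 1, 1, 0, 0, 0, 0]
```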
61 Inverse Design of CAs: 1D
62 Inverse Design of CAs: 1D
- Majority problem.
- Particle-based rules.
- Fitness values: 0.76, 0.75, 0.76, 0.73.
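Fitness values like these are typically the fraction of random initial configurations a rule classifies correctly; that evaluation can be sketched as follows. The toy local-majority rule, lattice size, and step budget are illustrative assumptions, not the evolved rules from the slide:

```python
import random

# Density-classification (majority task) fitness: fraction of random
# initial configurations driven to the correct uniform state. All
# parameters here are illustrative.

def majority_fitness(step, n_cells=21, n_tests=50, n_steps=50):
    correct = 0
    for _ in range(n_tests):
        cells = [random.randint(0, 1) for _ in range(n_cells)]
        target = 1 if sum(cells) > n_cells // 2 else 0
        for _ in range(n_steps):
            cells = step(cells)
        if all(c == target for c in cells):
            correct += 1
    return correct / n_tests

def toy_step(cells):
    # toy rule: each cell takes the local majority of its r = 1 neighborhood
    n = len(cells)
    return [1 if cells[(i - 1) % n] + cells[i] + cells[(i + 1) % n] >= 2 else 0
            for i in range(n)]

random.seed(0)
f = majority_fitness(toy_step)
print(f)
```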
63 Inverse Design of CAs: 1D
- Block-expanding rules.
- "Don't care about initial state" rules.
- Particle-communication-based rules.
64 Inverse Design of CAs: 1D Majority Records
- Gacs, Kurdyumov, Levin 1978 (hand-written): 81.6 %
- Davis 1995 (hand-written): 81.8 %
- Das 1995 (hand-written): 82.178 %
- David, Forrest, Koza 1996 (GP): 82.326 %
65 Inverse Design of CAs: 2D
- Generalization to 2D (nD) CAs?
- Von Neumann vs. Moore neighborhood (r = 1).
- Generalization to r > 1 possible (straightforward).
- Search space size for a GA: 2^(2^5) (von Neumann) vs. 2^(2^9) (Moore).
(Figure: a von Neumann neighborhood configuration 1 1 0 1 1 and a Moore neighborhood configuration 0 1 1 0 1 0 0 0 1.)
66 Inverse Design of CAs
- Learning an AND rule.
- Input boxes are defined.
- Some evolution plots
67 Inverse Design of CAs
- Learning an XOR rule.
- Input boxes are defined.
- Some evolution plots
68 Inverse Design of CAs
- Learning the majority task.
- 84/169 in a), 85/169 in b).
- Fitness value 0.715
69 Inverse Design of CAs
- Learning pattern compression tasks.
70 Evolution = Computation?
Yes: search and optimization are fundamental problems / tasks in many applications (learning, engineering, ...).
71 Summary
- Explicative models based on fuzzy rules.
- Descriptive models based on e.g. the Kriging method.
- Few data points necessary, high modeling accuracy.
- Used in product design, quality control, management decision support, prediction and optimization.
- Optimization based on Evolution Strategies (and traditional methods).
- Few function evaluations necessary.
- Robust, widely usable, excellent solution quality.
- Self-adaptivity (easy to use!).
- Patents: US no. 5,826,251; Germany nos. 43 08 083, 44 16 465, 196 40 635.
72 Questions?
Thank you very much for your time !
73 Evolutionary Algorithm Applications
- Few Evaluations: Response Surface Approximations
74 ES plus Response Surface Approximation
- Learning of an RSA.
- Utilization for selection of promising points.
(Figure: original function; approximations from 100, 200, and 400 points.)
Bäck et al., Metamodel-Assisted Evolution Strategies, PPSN VII, Granada, Spain (2002).
75 Data Infrastructure for Optimization with RSA
(Diagram: the Optimizer proposes parameter values, starting from parameter ranges and initial values; an Interface passes them to the SIMULATOR (e.g. CASTS, FLUENT) and to the Response Surface Approximator; trial results are stored in a database; quality and constraint values are aggregated and fed back to the Optimizer.)
76 Criterion for Selecting Trial Points
- Points with high expected quality: estimation of the quality value (from the database).
- Points located in unexplored regions of the search space: estimation of the approximation error.
77 Method: Kriging Interpolation
- Invented for prediction of gold distribution in African gold mines.
- Used here as interpolation method for RSA.
- Model assumption: realizations of an n-dimensional random walk.
- Reconstruction according to the Maximum Likelihood principle.
- Error: minimized quadratic error from Maximum Likelihood analysis.
78 Data-Driven Modeling by Kriging Interpolation
Given: a list of m trial results.
Searched for: an approximate estimation model for unknown points.
Estimated approximation error.
79 Kriging Interpolation: Theory
- Stochastic Gaussian process; covariance function (kernel).
- Maximum Likelihood estimation (model parameters of the Gaussian process).
- 1-dimensional optimization; matrix inversion at each iteration.
80 Kriging Interpolation: Theory
- Estimation of the predicted value (equation omitted).
- Estimated error of the prediction (equation omitted).
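A minimal 1-D sketch of the Kriging predictor and its estimated error follows. It uses simple Kriging with zero mean and a fixed Gaussian kernel; the method on these slides additionally fits the model parameters by maximum likelihood, which is omitted here:

```python
import math

# Simple-Kriging sketch: mean prediction and estimated error (MSE)
# at a point x, from trial results (xs, ys). Fixed hyperparameters
# are an assumption; the slides fit them by maximum likelihood.

def kernel(a, b, theta=1.0):
    return math.exp(-theta * (a - b) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def kriging_predict(xs, ys, x):
    K = [[kernel(a, b) for b in xs] for a in xs]
    k = [kernel(a, x) for a in xs]
    w = solve(K, k)                      # weights K^{-1} k
    mean = sum(wi * yi for wi, yi in zip(w, ys))
    mse = kernel(x, x) - sum(wi * ki for wi, ki in zip(w, k))
    return mean, mse

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]   # trial results (x, f(x))
m, e = kriging_predict(xs, ys, 1.0)
print(m, e)  # at a sampled point the model interpolates: mean ≈ 1, error ≈ 0
```

The estimated error vanishes at sampled points and grows in unexplored regions, which is exactly what the trial-point selection criterion above exploits.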
81 Kriging Method: Approximation / Error Estimation
(Plots: f(x) over x; MSE = estimated approximation error.)
82 Kriging Method: Airfoil Optimization Example
Comparison of various strategies:
- Fast sequential quadratic programming.
- Pattern search.
- MC sampling.
- (2+10)-ES.
- RSA-supported ES.
83 Evolutionary Algorithm Applications
- Noisy Objectives: Thresholding
84 Noisy Fitness Functions: Thresholding
- Fitness evaluation is disturbed by noise, e.g. the stochastic distribution of passengers within an elevator system.
- Traffic control problems in general.
- The probability of generating a real improvement is very small.
- Introduce an explicit barrier into the (1+1)-ES to distinguish real improvements from overvalued individuals: only accept an offspring if it outperforms the parent by at least a value of τ (threshold).
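The threshold acceptance rule can be sketched as follows; the noisy sphere objective, noise level, and τ value are illustrative assumptions:

```python
import random

# (1+1)-ES with threshold acceptance under noise (minimization):
# an offspring replaces the parent only if it appears better by at
# least tau. Objective, noise level, and tau are illustrative.

def noisy_f(x, noise=0.5):
    return sum(xi * xi for xi in x) + random.gauss(0.0, noise)

random.seed(0)
x, sigma, tau = [2.0, 2.0], 0.3, 0.6
fx = noisy_f(x)
for t in range(500):
    y = [xi + sigma * random.gauss(0.0, 1.0) for xi in x]
    fy = noisy_f(y)
    if fy < fx - tau:        # accept only clear improvements
        x, fx = y, fy
print(x, fx)
```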
85 Finding the Optimal Threshold
- For Gaussian noise: general optimal threshold (equation omitted).
- For the sphere model, where R is the distance to the optimum (equation omitted).
86 Influence of Thresholding (I)
(Plot: normalized progress rate φ* over normalized mutation strength σ*; solid lines: optimal threshold; dashed lines / crosses: data measured in ES runs (sphere); noise strength increasing from top to bottom.)
87 Influence of Thresholding (II)
(Plot: normalized progress rate φ* over normalized threshold τ*; noise strength increasing from top to bottom.)
88 Applications: Elevator Control
- Simulation of an elevator group controller takes a long time.
- Instead, use an artificial problem tightly related to the real-world problem: the S-Ring.
89 Application in Elevator Controller: S-Ring
- Only thresholding leads to a positive quality gain (τ = 0.3).
- A too large threshold value does not permit any progress (τ = 1.0).
(Plot: quality gain q over distance to optimum R; 0 = greedy, 100 = optimum.)
90 Evolutionary DNA-Computing
- Example: Maximum Clique Problem.
- Problem instance: a graph G = (V, E).
- Feasible solution: a subset V' of V whose vertices are pairwise connected.
- Objective function: size |V'| of clique V'.
- Optimal solution: a clique V' that maximizes |V'|.
- Example: {2,3,6,7} is a maximum clique (01100110); {4,5,8} is a clique (00011001).
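Checking these bitstring encodings can be sketched as follows; the full edge set is an assumption consistent with the slide's two cliques, since the example graph itself is not reproduced here:

```python
# Clique check for the bitstring encoding above (bit i = vertex i+1).
# The edge set is a hypothetical graph consistent with the example.

edges = {(2, 3), (2, 6), (2, 7), (3, 6), (3, 7), (6, 7),
         (4, 5), (4, 8), (5, 8)}

def is_clique(bits):
    chosen = [i + 1 for i, b in enumerate(bits) if b == 1]
    # every pair of chosen vertices must be connected
    return all((a, b) in edges or (b, a) in edges
               for i, a in enumerate(chosen) for b in chosen[i + 1:])

print(is_clique([0, 1, 1, 0, 0, 1, 1, 0]))  # {2,3,6,7} → True
print(is_clique([0, 0, 0, 1, 1, 0, 0, 1]))  # {4,5,8}   → True
```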
91 DNA-Computing: Classical Approach
1. X := randomly generated DNA strands representing all candidates.
2. Remove the set Y of all non-cliques from X: C := X - Y.
3. Identify the strand in C with smallest length (largest clique).
- Based on filtering out the optimal solution.
- Fails for large n (exponential growth).
- Applied in the lab for n = 6 (Ouyang et al., 1997); limited to n ≈ 36 (nanomole operations).
92 DNA-Computing: Evolutionary Approach
1. Generate an initial random population P.
2. while not terminate do
3.   P' := amplify and mutate P.
4.   Remove the set Y of all non-cliques from P': P'' := P' - Y.
5.   P := select shortest DNA strands from P''.
6. od
- Based on evolving a (near-)optimal solution.
- Also applicable for large n.
- Currently tested in the lab (Leiden, IMB).
93 Scalability Issues

Maximum Clique simulation results, (1,λ)-GA (best of 10):

  Problem      n    λ=10  λ=100  λ=1000  λ=10000  Opt.
  brock200_1   200   14    17     17      19       21
  brock200_2   200    6     9      8       9       12
  brock200_3   200   10    11     12      12       15
  brock200_4   200   11    12     13      14       17
  hamming8-4   256  ---    12     12      16       16
  p_hat300-1   300  ---     6      7       7        8
  p_hat300-2   300  ---    19     19      20       25

- Averages (not shown here) confirm trends.
- Theory for large (NOT infinite) population sizes (other than c_{μ,λ})?
94 Evolutionary Algorithm Applications
- Multiple Criteria Decision Making (MCDM)
95 Multi-Criteria Optimization (1)
- Most problems: more than one aspect to optimise.
- Conflicting criteria!
- Classical optimization techniques map multiple criteria to one single value, e.g. by a weighted sum.
- But: how can optimal weights be determined?
- Evolution Strategies can directly use the concept of Pareto dominance.
96 Multi-Criteria Optimization (2)
- Multi-criteria optimization does not mean:
  - Decide on what a good compromise is before optimization (e.g. by choosing weighting factors).
  - Find one single optimal solution.
- Multi-criteria optimization means:
  - Decide on a compromise after optimization.
  - Find a set of multiple compromise solutions.
- Evolutionary multi-criteria optimization means:
  - Use the population structure to represent the set of multiple compromise solutions.
  - Use the concept of Pareto dominance.
97 Multi-Criteria Optimization (3)
98 Pareto Dominance: Definition
- If all fi(a) are better than fi(b), then a dominates b.
- If all fi(b) are better than fi(a), then b dominates a.
- If there are i and j such that:
  - fi(a) is better than fi(b), but
  - fj(b) is better than fj(a), then
  - a and b do not dominate each other (they are equal, or are incomparable).
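The dominance relation can be sketched directly; minimization of all objectives is assumed here, and the objective vectors are illustrative:

```python
# Pareto dominance as defined above (minimization of all objectives):
# a dominates b iff a is no worse in every objective and strictly
# better in at least one.

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

a, b, c = (1.0, 2.0), (2.0, 3.0), (0.5, 4.0)
print(dominates(a, b))  # → True: a is better in both objectives
print(dominates(a, c))  # → False
print(dominates(c, a))  # → False: a and c are incomparable
```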
99 Multipoint Airfoil Optimization (3)
Pressure profile at high-lift flow conditions.
Pressure profile at low-drag flow conditions.
100 Evolutionary Multi-Criteria Optimization
- General idea: after selection, the parent population should ...
  - ... contain as many dominating individuals as possible,
  - ... be evenly distributed along the Pareto front.
- Two concepts to achieve this:
  - NSGA-II (Deb); NSES.
  - SPEA2 (Zitzler, Thiele).
101 Nondominated Sorting Evolutionary Algorithm (1)
- Selecting dominating individuals by nondominated sorting:
  1. Start with rank 1 and the complete population.
  2. In the current population: find the Pareto set.
  3. Assign the individuals of the Pareto set the same rank.
  4. Increment the rank.
  5. Remove the Pareto set from the current population.
  6. Start over at 2.
- Individuals with low ranks are preferred!
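Steps 1-6 can be sketched as follows (minimization assumed; rank 1 is the first front; the objective vectors are illustrative):

```python
# Nondominated sorting as described in steps 1-6 above (minimization).
# Returns a rank per individual; rank 1 = first Pareto front.

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_sort(points):
    remaining = dict(enumerate(points))
    ranks = {}
    rank = 1
    while remaining:
        # step 2: find the Pareto set of the current population
        front = [i for i in remaining
                 if not any(dominates(remaining[j], remaining[i])
                            for j in remaining if j != i)]
        for i in front:          # step 3: assign the same rank
            ranks[i] = rank
        for i in front:          # step 5: remove the Pareto set
            del remaining[i]
        rank += 1                # step 4: increment the rank
    return [ranks[i] for i in range(len(points))]

pts = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
print(nondominated_sort(pts))  # → [1, 1, 1, 2, 3]
```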
102 Nondominated Sorting Evolutionary Algorithm (2)
(Figure: first Pareto front = rank 1; second Pareto front = rank 2; third Pareto front = rank 3.)
103 Nondominated Sorting Evolutionary Algorithm (3)
- Estimating population density from the distances d1, d2 to the next neighbours.
- Prefer individuals with low density!
104 NSEA and Evolution Strategies
- Observation:
  - Pareto dominance is expressed as an integer value (rank).
  - Density estimation is expressed as a real value between 0 and 1.
- Idea: set fitness to rank + density.
  - With increasing density, the fitness increases towards rank + 1.
  - With decreasing density, the fitness decreases towards rank.
  - Dominating individuals have low fitness, independent of their density.
- Use this fitness value in a standard (μ+λ) Evolution Strategy (archive implicit!).