CHAPTER 10 EVOLUTIONARY COMPUTATION II: GENERAL METHODS AND THEORY

Slides: 21

Provided by: jims1

Learn more at: https://www.jhuapl.edu

1
CHAPTER 10 EVOLUTIONARY COMPUTATION II: GENERAL
METHODS AND THEORY
Slides for Introduction to Stochastic Search and
Optimization (ISSO) by J. C. Spall

Organization of chapter in ISSO
Introduction
Evolution strategy and evolutionary programming;
comparisons with GAs
Schema theory for GAs
What makes a problem hard?
Convergence theory
No free lunch theorems

2
Methods of EC

Genetic algorithms (GAs), evolution strategy
(ES), and evolutionary programming (EP) are most
common EC methods
Many modern EC implementations borrow aspects
from one or more EC methods
Generally, ES is used for function
optimization; EP is used for AI applications such as
automatic programming

3
ES Algorithm with Noise-Free Loss Measurements

Step 0 (initialization): Randomly or
deterministically generate initial population of
$N$ values of $\theta \in \Theta$ and evaluate $L$ for each of the
values.
Step 1 (offspring): Generate $\lambda$ offspring from
current population of $N$ candidate $\theta$ values such
that all $\theta$ values satisfy direct or indirect
constraints on $\theta$.
Step 2 (selection): For $(N + \lambda)$-ES, select $N$ best
values from combined population of $N$ original
values plus $\lambda$ offspring; for $(N, \lambda)$-ES, select $N$
best values from population of $\lambda > N$ offspring
only.
Step 3 (repeat or terminate): Repeat steps 1 and 2
or terminate.
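
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ISSO implementation: the slide does not say how offspring are generated, so the sketch assumes each offspring is a randomly chosen parent plus Gaussian noise, clipped to an interval constraint. The names `evolution_strategy`, `sigma`, `n_pop`, and `n_off` are hypothetical.

```python
import random

def evolution_strategy(loss, lower, upper, n_pop=5, n_off=20, sigma=0.5,
                       plus=True, iterations=50, seed=0):
    """Sketch of (N + lambda)-ES (plus=True) or (N, lambda)-ES (plus=False)
    for a scalar theta constrained to [lower, upper]."""
    rng = random.Random(seed)
    if not plus and n_off <= n_pop:
        raise ValueError("(N, lambda)-ES needs lambda > N")
    # Step 0: random initial population of N values of theta
    population = [rng.uniform(lower, upper) for _ in range(n_pop)]
    history = [min(loss(t) for t in population)]
    for _iteration in range(iterations):
        # Step 1: offspring = random parent + Gaussian noise (assumed operator),
        # clipped so that the constraint on theta holds
        offspring = []
        for _child in range(n_off):
            parent = rng.choice(population)
            child = parent + rng.gauss(0.0, sigma)
            offspring.append(min(max(child, lower), upper))
        # Step 2: select the N best from parents + offspring, or offspring only
        pool = population + offspring if plus else offspring
        population = sorted(pool, key=loss)[:n_pop]
        history.append(loss(population[0]))
    # Step 3: here termination is simply a fixed iteration budget
    return population[0], history

best, history = evolution_strategy(lambda t: (t - 2.0) ** 2, -10.0, 10.0)
```

With `plus=True` the best loss in `history` can never increase, because the parents stay in the selection pool; with `plus=False` the parents are discarded each iteration, so that guarantee is lost.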

4
Schema Theory for GAs

Key innovation in Holland (1975) is a form of
theoretical foundation for GAs based on schemas
Represents first attempt at serious theoretical
analysis
But not entirely successful, as leap of faith
required to relate schema theory to actual
convergence of GA
"GAs work by discovering, emphasizing, and
recombining good building blocks of solutions
in a highly parallel fashion." (Melanie Mitchell,
An Introduction to Genetic Algorithms, p. 27,
1996, paraphrasing John Holland)
Statement above is more intuitive than formal
Notion of building block is characterized via
schemas
Schemas are propagated or destroyed according to
the laws of probability

5
Schema Theory for GAs

Schema is template for chromosomes in GAs
Example: schema with fixed elements 1, 0, 1 and all
other positions filled by the symbol *, which
represents a don't care (or free) element
1 1 0 0 1 1 0 1 is specific instance of this
schema
Schemas sometimes called building blocks of GAs
Two fundamental results: schema theorem and
implicit parallelism
Schema theorem says that better templates
dominate the population as generations proceed
Implicit parallelism says that GA processes $\gg N$
schemas at each iteration
Schema theory is controversial
Not connected to algorithm performance in same
direct way as usual convergence theory for
iterates of algorithm
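
A short sketch may make the template idea and the implicit-parallelism count concrete. The schema string `"*10****1"` below is a hypothetical example chosen only because the 8-bit chromosome from the slide fits it; the function names are likewise illustrative. Each chromosome of length B is an instance of 2^B schemas, so even a small population touches many more schemas than it has members.

```python
from itertools import product

def matches(schema, chromosome):
    """True if the chromosome agrees with the schema at every fixed position."""
    return len(schema) == len(chromosome) and all(
        s == "*" or s == c for s, c in zip(schema, chromosome))

def schemas_of(chromosome):
    """All schemas the chromosome is an instance of: each position is either
    its own bit or the free symbol *, giving 2**B schemas."""
    options = [(bit, "*") for bit in chromosome]
    return {"".join(choice) for choice in product(*options)}

def schemas_in_population(population):
    """Distinct schemas represented by at least one population member."""
    found = set()
    for chromosome in population:
        found |= schemas_of(chromosome)
    return found

print(matches("*10****1", "11001101"))     # hypothetical schema, slide's instance
print(len(schemas_of("11001101")))         # 2**8 schemas for one chromosome
print(len(schemas_in_population(["11001101", "00110010"])))
```

The last call uses two complementary chromosomes, which share only the all-free schema, so a population of N = 2 represents 511 distinct schemas.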

6
Convergence Theory via Markov Chains

Schema theory inadequate
Mathematics behind schema theory not fully
rigorous
Unjustified claims about implications of schema
theory
More rigorous convergence theory exists
Pertains to noise-free loss (fitness)
measurements
Pertains to finite representation (e.g., bit
coding or floating point representation on
digital computer)
Convergence theory relies on Markov chains
Each state in chain represents possible
population
Markov transition matrix P contains all
information for Markov chain analysis

7
GA Markov Chain Model

GAs with binary bit coding can be modeled as
(discrete state) Markov chains
Recall states in chain represent possible
populations
$i$th element of probability vector $p_k$ represents
probability of achieving $i$th population at
iteration $k$
Transition matrix: the $(i, j)$ element of $P$
represents the probability of population $i$
producing population $j$ through the selection,
crossover and mutation operations
Depends on loss (fitness) function, selection
method, and reproduction and mutation parameters
Given transition matrix $P$, it is known that
$p_k^T = p_0^T P^k$
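
As a small numerical illustration of how the probability vector evolves under a transition matrix, the sketch below repeatedly applies the row-vector update $p_{k+1}^T = p_k^T P$. The 3-state matrix is hypothetical and stands in for the (vastly larger) matrix of a real GA; row i holds the probabilities of moving from state i to each state j.

```python
def step(p, P):
    """One iteration: p_{k+1}^T = p_k^T P (row vector times matrix)."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_at(p0, P, k):
    """Probability vector after k iterations, starting from p0."""
    p = list(p0)
    for _ in range(k):
        p = step(p, P)
    return p

# Hypothetical transition matrix; every row sums to 1 and all entries are positive
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

from_state_0 = distribution_at([1.0, 0.0, 0.0], P, 200)
from_state_2 = distribution_at([0.0, 0.0, 1.0], P, 200)
print(from_state_0)
print(from_state_2)
```

Because every entry of this example matrix is positive, the two printed vectors agree to within rounding: the long-run distribution does not depend on the starting state.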

8
Rudolph (1994) and Markov Chain Analysis for
Canonical GA

Rudolph (1994, IEEE Trans. Neural Nets.) uses
Markov chain analysis to study canonical GA
(CGA)
CGA includes binary bit coding, crossover,
mutation, and roulette wheel selection
CGA is focus of seminal book, Holland (1975)
CGA does not include elitism; lack of elitism is
critical aspect of theoretical analysis
CGA assumes mutation probability $0 < P_m < 1$ and
single-point crossover probability $0 \le P_c \le 1$
Key preliminary result: CGA is ergodic Markov
chain
Exists a unique limiting distribution for the
states of chain
Nonzero probability of being in any state
regardless of initial condition

9
Rudolph (1994) and Markov Chain Analysis for CGA
(cont'd)

Ergodicity for CGA provides a negative result on
convergence in Rudolph (1994)
Let $\hat{L}_{\min,k}$ denote lowest of $N$ (= population
size) loss values within population at iteration
$k$
$\hat{L}_{\min,k}$ represents loss value for the $\theta$ in
population $k$ that has maximum fitness value
Main theorem: CGA satisfies
$\lim_{k \to \infty} P(\hat{L}_{\min,k} = L(\theta^*)) < 1$
(above limit on left-hand side exists by
ergodicity)
Implies CGA does not converge to the global
optimum

10
Rudolph (1994) and Markov Chain Analysis for CGA
(cont'd)

Fundamental problem with CGA is that optimal
solutions are found but then lost
CGA has no mechanism for retaining optimal
solution
Rudolph discusses modification to CGA yielding
positive convergence results
Appends super individual to each population
Super individual represents best chromosome so
far
Not eligible for GA operations (selection,
crossover, mutation)
Not same as elitism
CGA with added super individual converges in
probability
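
The "found but then lost" behaviour and the super-individual remedy can be sketched with a toy canonical GA. This is an illustration only, not Rudolph's construction in detail: the loss (number of zero bits in the chromosome), the parameter values, and the name `run_cga` are all assumptions made for the example. The super individual is a copy of the best chromosome seen so far and is kept outside the population, so it never takes part in selection, crossover, or mutation.

```python
import random

def run_cga(n_pop=6, n_bits=8, p_c=0.8, p_m=0.05, iterations=200, seed=1):
    """Toy canonical GA (n_pop must be even) that also tracks a super individual."""
    rng = random.Random(seed)

    def loss(chrom):
        return chrom.count(0)          # hypothetical loss: number of zero bits

    def fitness(chrom):
        return 1 + chrom.count(1)      # strictly positive, as roulette selection needs

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_pop)]
    super_ind = min(pop, key=loss)[:]
    pop_best, super_best = [], []
    for _ in range(iterations):
        # Roulette wheel selection: sampling probability proportional to fitness
        weights = [fitness(c) for c in pop]
        parents = rng.choices(pop, weights=weights, k=n_pop)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_c:     # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        # Bitwise mutation: flip each bit with probability p_m
        pop = [[bit ^ int(rng.random() < p_m) for bit in c] for c in children]
        # Update the super individual; it is only ever copied, never operated on
        candidate = min(pop, key=loss)
        if loss(candidate) < loss(super_ind):
            super_ind = candidate[:]
        pop_best.append(loss(candidate))
        super_best.append(loss(super_ind))
    return pop_best, super_best

pop_best, super_best = run_cga()
```

In `super_best` the recorded loss can only stay the same or decrease, whereas nothing prevents `pop_best` from rising again after a good chromosome has been mutated or crossed away.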

11
Contrast of Suzuki (1995) and Rudolph (1994) in
Markov Chain Analysis for GA

Suzuki (1995, IEEE Trans. Systems, Man, and
Cyber.) uses Markov chain analysis to study GA
with elitism
Same as CGA of Rudolph (1994) except for elitism
Suzuki (1995) only considers unique states
(populations)
Rudolph (1994) includes redundant states
With $N$ = population size and $B$ = no. of
bits/chromosome:
$\binom{N + 2^B - 1}{N}$ unique states in Suzuki (1995),
$2^{NB}$ states in Rudolph (1994) (much larger than
number of unique states above)
Above affects bookkeeping; does not fundamentally
change relative results of Suzuki (1995) and
Rudolph (1994)

12
Convergence Under Elitism

In both CGA case (Rudolph, 1994) and case with
elitism (Suzuki, 1995) the limit
$\bar{p}^T = \lim_{k \to \infty} p_0^T P^k$ exists
(dimension of $\bar{p}$ differs according to
definition of states, unique or nonunique as on
previous slide)
Suzuki (1995) assumes each population includes
one elite element and that crossover probability
$P_c = 1$
Let $\bar{p}_j$ represent $j$th element of $\bar{p}$, and $J$
represent indices $j$ where population $j$ includes
chromosome achieving $L(\theta^*)$
Then from Suzuki (1995)
$\sum_{j \in J} \bar{p}_j = 1$
Implies GA with elitism converges in probability
to set of optima

13
Calculation of Stationary Distribution

Markov chain theory provides useful conceptual
device
Practical calculation difficult due to explosive
growth of number of possible populations (states)
Growth is in terms of factorials of N and bit
string length (B)
Practical calculation of $p_k$ usually impossible
due to difficulty in getting $P$
Transition matrix can be very large in practice
E.g., if $N = B = 6$, $P$ is $10^8 \times 10^8$ matrix!
Real problems have $N$ and $B$ much larger than 6
Ongoing work attempts to severely reduce
dimension by limiting states to only most
important (e.g., Spears, 1999; Moey and Rowe,
2004)
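
The size of the state space can be checked with a few lines of arithmetic. The sketch assumes that a unique state is an unordered population, i.e. a multiset of N chromosomes drawn from the 2^B possible bit strings, which gives a binomial-coefficient count; counting ordered populations instead gives 2^(NB). The function names are illustrative.

```python
from math import comb

def unique_states(n_pop, n_bits):
    """Unordered populations: multisets of size N from 2**B chromosomes."""
    return comb(n_pop + 2 ** n_bits - 1, n_pop)

def ordered_states(n_pop, n_bits):
    """Ordered populations: every one of the N*B bits is free."""
    return 2 ** (n_pop * n_bits)

print(unique_states(6, 6))    # 119877472, roughly 10**8
print(ordered_states(6, 6))   # 68719476736, roughly 7 * 10**10
print(unique_states(4, 4))    # 3876
```

For N = B = 6 the unordered count is about 1.2 × 10^8, which matches the matrix dimension quoted on the slide, while very small cases such as N = B = 4 still give a few thousand states.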

14
Example 10.2 from ISSO: Markov Chain Calculations
for Small-Scale Implementation

Consider $L(\theta)$, $\theta \in \Theta = [0, 15]$
Function has local and global minimum; plot on
next slide
Several GA implementations with very small
population sizes ($N$) and numbers of bits ($B$)
Small-scale implementations imply Markov
transition matrices are computable
But still not trivial, as matrix dimensions
range from approximately $2000 \times 2000$ to $4000 \times 4000$
15
Loss Function for Example 10.2 in ISSO

Markov chain theory provides probability of finding
solution ($\theta^* = 15$) in given number of iterations

16
Example 10.2 (cont'd): Probability Calculations
for Very Small-Scale GAs

17
Summary of GA Convergence Theory

Schema theory (Holland, 1975) was most popular
method for theoretical analysis until
approximately mid-1990s
Schema theory not fully rigorous and not fully
connected to actual algorithm performance
Markov chain theory provides more formal means of
convergence (and convergence rate) analysis
Rudolph (1994) used Markov chains to provide
largely negative result on convergence for
canonical GAs
Canonical GA does not converge to optimum
Suzuki (1995) considered GAs with elitism; unlike
Rudolph (1994), GA is now convergent
Challenges exist in practical calculation of
Markov transition matrix

18
No Free Lunch Theorems (Reprise, Chap. 1)

No free lunch (NFL) Theorems apply to EC
algorithms
Theorems imply there can be no universally
efficient EC algorithm
Performance of one algorithm when averaged over
all problems is identical to that of any other
algorithm
Suppose EC algorithm $A$ applied to loss $L$
Let $\hat{L}_n$ denote lowest loss value from most
recent $N$ population elements after $n \ge N$ unique
function evaluations
Consider the probability that $\hat{L}_n = \ell$ after $n$
unique evaluations of the loss:
$P(\hat{L}_n = \ell \mid L, A)$
NFL theorems state that the sum of above
probabilities over all loss functions is
independent of $A$
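
The NFL statement can be checked by brute force on a tiny discrete problem. The sketch below is an illustration under simplifying assumptions, not an EC algorithm: it enumerates every loss function from 4 search points to 3 loss values, runs two hypothetical deterministic search rules that never revisit a point, and tallies the lowest loss seen after n unique evaluations. The names `sweep`, `adaptive`, and `tally` are made up for the example.

```python
from collections import Counter
from itertools import product

X = range(4)          # search points
Y = (0, 1, 2)         # possible loss values

def sweep(history):
    """Algorithm A1: visit unvisited points in increasing order."""
    visited = {x for x, _ in history}
    return min(x for x in X if x not in visited)

def adaptive(history):
    """Algorithm A2: start at the last point; afterwards take the lowest
    unvisited point if the latest loss was 0, otherwise the highest."""
    visited = {x for x, _ in history}
    remaining = [x for x in X if x not in visited]
    if not history:
        return remaining[-1]
    return remaining[0] if history[-1][1] == 0 else remaining[-1]

def best_after(algorithm, loss_table, n):
    """Lowest loss seen after n unique evaluations of one loss function."""
    history = []
    for _ in range(n):
        x = algorithm(history)
        history.append((x, loss_table[x]))
    return min(value for _, value in history)

def tally(algorithm, n):
    """Count of each best-loss value, summed over all 3**4 loss functions."""
    return Counter(best_after(algorithm, table, n)
                   for table in product(Y, repeat=len(X)))

print(tally(sweep, 2))
print(tally(adaptive, 2))
```

Both tallies come out as 45 functions with best loss 0, 27 with best loss 1, and 9 with best loss 2: summed over all loss functions, the two rules are indistinguishable, even though they behave differently on any single function.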
19
Comparison of Algorithms for Stochastic
Optimization in Chaps. 2-10 of ISSO

Table on next slide is rough summary of relative
merits of several algorithms for stochastic
optimization
Comparisons based on semi-subjective impressions
from numerical experience (author and others) and
theoretical or analytical evidence
NFL theorems not generally relevant as only
considering typical problems of interest, not
all possible problems
Table does not consider root-finding per se
Table is for basic implementation forms of
algorithms
Ratings are L (low), ML (medium-low), M
(medium), MH (medium-high), and H (high)
These scales are for stochastic optimization
setting and have no meaning relative to classical
deterministic methods