Title: CHAPTER 2: DIRECT METHODS FOR STOCHASTIC SEARCH
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

Organization of chapter in ISSO:
- Introductory material
- Random search methods
  - Attributes of random search
  - Blind random search (algorithm A)
  - Two localized random search methods (algorithms B and C)
  - Random search with noisy measurements
- Nonlinear simplex (Nelder-Mead) algorithm
  - Noise-free and noisy measurements
2. Some Attributes of Direct Random Search with Noise-Free Loss Measurements
- Ease of programming
- Use of only L values (vs. gradient values)
  - Avoids artful contrivance of more complex methods
- Reasonable computational efficiency
- Generality
  - Algorithms apply to virtually any function
- Theoretical foundation
  - Performance guarantees, sometimes in finite samples
  - Global convergence in some cases
3. Algorithm A: Simple Random (Blind) Search
- Step 0 (initialization): Choose an initial value of θ inside of Θ. Set k = 0.
- Step 1 (candidate value): Generate a new independent value θnew(k+1) ∈ Θ, according to the chosen probability distribution. If L(θnew(k+1)) < L(θ̂k), set θ̂k+1 = θnew(k+1). Else take θ̂k+1 = θ̂k.
- Step 2 (return or stop): Stop if the maximum number of L evaluations has been reached or the user is otherwise satisfied with the current estimate for θ; else, return to step 1 with the new k set to the former k+1. (A Python sketch of these steps follows.)
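Below is a minimal Python sketch of Algorithm A. The quadratic loss, the box domain Θ = [-2, 2]^2, and the uniform sampling distribution are illustrative assumptions, not part of ISSO.

    import numpy as np

    def blind_random_search(loss, sample_theta, n_evals, theta0):
        # Algorithm A: keep the best of independently sampled candidates.
        theta_best = np.asarray(theta0, dtype=float)
        loss_best = loss(theta_best)
        for _ in range(n_evals):
            theta_new = sample_theta()       # step 1: independent candidate in Theta
            loss_new = loss(theta_new)
            if loss_new < loss_best:         # accept only if L improves
                theta_best, loss_best = theta_new, loss_new
        return theta_best, loss_best

    # Illustrative setup (an assumption): loss with solution theta* = [1, 1],
    # uniform sampling on Theta = [-2, 2]^2.
    rng = np.random.default_rng(0)
    loss = lambda th: float(np.sum((np.asarray(th) - 1.0) ** 2))
    sample = lambda: rng.uniform(-2.0, 2.0, size=2)
    theta_hat, L_hat = blind_random_search(loss, sample, 1000, [0.0, 0.0])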
4. First Several Iterations of Algorithm A on Problem with Solution θ* = [1.0, 1.0]^T (Example 2.1 in ISSO)
5. Algorithm B: Localized Random Search
- Step 0 (initialization): Choose an initial value of θ inside of Θ. Set k = 0.
- Step 1 (candidate value): Generate a random dk. Check whether θ̂k + dk ∈ Θ. If not, generate a new dk or move the point to the nearest valid point in Θ. Let θnew(k+1) ∈ Θ be θ̂k + dk or the modified point.
- Step 2 (check for improvement): If L(θnew(k+1)) < L(θ̂k), set θ̂k+1 = θnew(k+1). Else take θ̂k+1 = θ̂k.
- Step 3 (return or stop): Stop if the maximum number of L evaluations has been reached or if the user is satisfied with the current estimate; else, return to step 1 with the new k set to the former k+1. (A Python sketch of these steps follows.)
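A minimal Python sketch of Algorithm B, assuming a Gaussian perturbation dk and a box domain repaired by clipping to the nearest valid point; both choices are illustrative, since the perturbation distribution and the domain repair rule are left to the user.

    import numpy as np

    def localized_random_search(loss, theta0, n_evals, sigma=0.1,
                                lo=-2.0, hi=2.0, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        best = loss(theta)
        for _ in range(n_evals):
            d = rng.normal(0.0, sigma, size=theta.shape)  # random d_k
            cand = np.clip(theta + d, lo, hi)  # step 1: move to nearest point in Theta
            val = loss(cand)
            if val < best:                     # step 2: keep candidate only on improvement
                theta, best = cand, val
        return theta, best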
6. Algorithm C: Enhanced Localized Random Search
- Similar to algorithm B
- Exploits knowledge of good/bad directions
  - If a move in one direction produces a decrease in loss, add a bias to the next iteration to keep the algorithm moving in the good direction
  - If a move in one direction produces an increase in loss, add a bias to the next iteration to move the algorithm the opposite way
- Slightly more complex implementation than algorithm B (a sketch of one bias scheme follows)
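A sketch of the bias mechanism, using coefficients in the spirit of the Solis-Wets scheme (0.2, 0.4, 0.5); the exact bias update in ISSO's algorithm C may differ, so treat these values as assumptions.

    import numpy as np

    def enhanced_localized_search(loss, theta0, n_evals, sigma=0.1,
                                  lo=-2.0, hi=2.0, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        best = loss(theta)
        bias = np.zeros_like(theta)            # running bias toward good directions
        for _ in range(n_evals):
            d = rng.normal(0.0, sigma, size=theta.shape)
            cand = np.clip(theta + bias + d, lo, hi)
            val = loss(cand)
            if val < best:                     # decrease: bias further along d
                theta, best = cand, val
                bias = 0.2 * bias + 0.4 * d
            else:                              # increase: bias away from d
                bias = 0.5 * bias - 0.4 * d
        return theta, best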
7. Formal Convergence of Random Search Algorithms
- Well-known results exist on convergence of random search
  - Apply to convergence of θ and/or L values
  - Apply when noise-free L measurements are used in the algorithms
- Algorithm A (blind random search) converges under very general conditions
  - Applies to continuous or discrete functions
- Conditions for convergence of algorithms B and C are somewhat more restrictive, but still quite general
  - ISSO presents a theorem for continuous functions
  - Other convergence results exist
- Convergence rate theory also exists (how fast does the algorithm converge?)
  - Algorithm A is generally slow in high-dimensional problems
8. Functions for Convergence (Parts (a) and (b)) and Nonconvergence (Part (c)) of Blind Random Search
- (a) Continuous L(θ): probability density for θnew is > 0 on Θ = [0, ∞)
- (b) Discrete L(θ): discrete sampling for θnew with P(θnew = i) > 0 for i = 0, 1, 2, ...
- (c) Noncontinuous L(θ): probability density for θnew is > 0 on Θ = [0, ∞)
[Figure: three panels plotting L(θ) for cases (a), (b), and (c)]
9. Example: Comparison of Algorithms A, B, and C
- Relatively simple p = 2 problem (Examples 2.3 and 2.4 in ISSO)
  - Quartic loss function (plot on next slide)
  - One global solution; several local minima/maxima
- Started all algorithms at a common initial condition and compared them based on a common number of loss evaluations
- Algorithm A needed no tuning
- Algorithms B and C required trial runs to tune algorithm coefficients
10. Multimodal Quartic Loss Function for p = 2 Problem (Example 2.3 in ISSO)
11. Example 2.3 in ISSO (cont'd): Sample Means of Terminal Loss Values L(θ̂) in Multimodal Loss Function (with Approximate 95% Confidence Intervals)
12. Examples 2.3 and 2.4 in ISSO (cont'd): Typical Adjusted Loss Values (L(θ̂) − L(θ*)) and θ Estimates in Multimodal Loss Function (One Run)
13. Random Search Algorithms with Noisy Loss Function Measurements
- Basic implementation of random search assumes perfect (noise-free) values of L
- Some applications require use of noisy measurements y(θ) = L(θ) + noise
- Simplest modification is to form an average of y values at each iteration as an approximation to L
- Alternative modification is to set a threshold τ > 0 for improvement before a new value is accepted in the algorithm
- Thresholding in algorithm B via a modified step 2 (sketched in code below):
  - Step 2 (modified): If y(θnew(k+1)) < y(θ̂k) − τ, set θ̂k+1 = θnew(k+1). Else take θ̂k+1 = θ̂k.
- Very limited convergence theory with noisy measurements
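A minimal sketch of algorithm B with the thresholded step 2; the Gaussian measurement noise and the choice τ = 0.5 are illustrative assumptions.

    import numpy as np

    def thresholded_localized_search(y, theta0, n_evals, tau=0.5,
                                     sigma=0.1, lo=-2.0, hi=2.0, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        y_curr = y(theta, rng)                 # noisy measurement at current point
        for _ in range(n_evals):
            cand = np.clip(theta + rng.normal(0.0, sigma, theta.shape), lo, hi)
            y_new = y(cand, rng)
            if y_new < y_curr - tau:           # improvement must exceed threshold tau
                theta, y_curr = cand, y_new
        return theta

    # Illustrative noisy loss y(theta) = L(theta) + noise, minimum at [1, 1]
    noisy_y = lambda th, rng: float(np.sum((th - 1.0) ** 2)) + rng.normal(0.0, 0.1)
    theta_hat = thresholded_localized_search(noisy_y, [0.0, 0.0], 2000)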
14. Nonlinear Simplex (Nelder-Mead) Algorithm
- The nonlinear simplex method is a popular search method (e.g., fminsearch in MATLAB; a SciPy analogue is sketched below)
- A simplex is the convex hull of p + 1 points in ℝ^p
  - The convex hull is the smallest convex set enclosing the p + 1 points
  - For p = 2, the convex hull is a triangle
  - For p = 3, the convex hull is a pyramid
- The algorithm searches for θ* by moving the convex hull within Θ
- If the algorithm works properly, the convex hull shrinks/collapses onto θ*
- No injected randomness (contrast with algorithms A, B, and C), but allowance for noisy loss measurements
- Frequently effective, but no general convergence theory and many numerical counterexamples to convergence
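For reference, the SciPy analogue of MATLAB's fminsearch; the toy quadratic loss is an illustrative stand-in, not the ISSO example function.

    import numpy as np
    from scipy.optimize import minimize

    L = lambda th: (th[0] - 1.0) ** 2 + (th[1] - 1.0) ** 2  # toy loss, min at [1, 1]
    res = minimize(L, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
    print(res.x)  # approximately [1.0, 1.0]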
15. Steps of Nonlinear Simplex Algorithm
- Step 0 (Initialization): Generate an initial set of p + 1 extreme points in ℝ^p, θi (i = 1, 2, ..., p + 1), the vertices of the initial simplex.
- Step 1 (Reflection): Identify where the max, second-highest, and min loss values occur; denote them by θmax, θ2max, and θmin, respectively. Let θcent = centroid (mean) of all θi except θmax. Generate the candidate vertex θrefl by reflecting θmax through θcent using θrefl = (1 + α)θcent − αθmax (α > 0).
- Step 2a (Accept reflection): If L(θmin) ≤ L(θrefl) < L(θ2max), then θrefl replaces θmax; proceed to step 3. Else go to step 2b.
- Step 2b (Expansion): If L(θrefl) < L(θmin), then expand the reflection using θexp = γθrefl + (1 − γ)θcent, γ > 1; else go to step 2c. If L(θexp) < L(θrefl), then θexp replaces θmax; otherwise reject the expansion and replace θmax by θrefl. Go to step 3.
16. Steps of Nonlinear Simplex Algorithm (cont'd)
- Step 2c (Contraction): If L(θrefl) ≥ L(θ2max), then contract the simplex: either case (i) L(θrefl) < L(θmax), or case (ii) L(θmax) ≤ L(θrefl). The contraction point is θcont = βθmax/refl + (1 − β)θcent, 0 ≤ β ≤ 1, where θmax/refl = θrefl in case (i) and θmax/refl = θmax otherwise. In case (i), accept the contraction if L(θcont) ≤ L(θrefl); in case (ii), accept the contraction if L(θcont) < L(θmax). If accepted, replace θmax by θcont and go to step 3; otherwise go to step 2d.
- Step 2d (Shrink): If L(θcont) ≥ L(θmax), shrink the entire simplex using a factor between 0 and 1, retaining only θmin. Go to step 3.
- Step 3 (Termination): Stop if the convergence criterion or the maximum number of function evaluations is met; else return to step 1. (A compact Python sketch of these steps follows.)
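The following compact Python sketch implements steps 0-3 as stated above; the coefficient values (α = 1, γ = 2, β = 0.5, shrink factor 0.5) are common defaults and are assumptions here, and step 3 is simplified to a fixed iteration budget.

    import numpy as np

    def nelder_mead(L, simplex, n_iters=200, alpha=1.0, gamma=2.0,
                    beta=0.5, shrink=0.5):
        S = np.asarray(simplex, dtype=float)   # (p+1) x p array of vertices
        f = np.array([L(v) for v in S])
        for _ in range(n_iters):               # step 3 simplified: fixed budget
            order = np.argsort(f)              # sort vertices by loss, ascending
            S, f = S[order], f[order]
            cent = S[:-1].mean(axis=0)         # centroid excluding theta_max
            refl = (1 + alpha) * cent - alpha * S[-1]   # step 1: reflection
            f_refl = L(refl)
            if f[0] <= f_refl < f[-2]:         # step 2a: accept reflection
                S[-1], f[-1] = refl, f_refl
            elif f_refl < f[0]:                # step 2b: try expansion
                exp_pt = gamma * refl + (1 - gamma) * cent
                f_exp = L(exp_pt)
                if f_exp < f_refl:
                    S[-1], f[-1] = exp_pt, f_exp
                else:
                    S[-1], f[-1] = refl, f_refl
            else:                              # step 2c: contraction
                case_i = f_refl < f[-1]        # case (i) vs case (ii)
                point = refl if case_i else S[-1]
                cont = beta * point + (1 - beta) * cent
                f_cont = L(cont)
                accept = f_cont <= f_refl if case_i else f_cont < f[-1]
                if accept:
                    S[-1], f[-1] = cont, f_cont
                else:                          # step 2d: shrink toward theta_min
                    S = S[0] + shrink * (S - S[0])
                    f = np.array([L(v) for v in S])
        return S[np.argmin(f)]

    # Usage on a toy quadratic with theta* = [1, 1]; the initial simplex is arbitrary.
    L = lambda th: float(np.sum((th - 1.0) ** 2))
    theta_hat = nelder_mead(L, [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])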
17. Illustration of Steps of Nonlinear Simplex Algorithm with p = 2