Title: Stochastic Relaxation, Simulated Annealing, Global Minimizers
1 Stochastic Relaxation, Simulated Annealing, Global Minimizers
2 Different types of relaxation
- Variable-by-variable relaxation: strict minimization
- Changing a small subset of variables simultaneously: window strict minimization relaxation
- Stochastic relaxation: may increase the energy; should be followed by strict minimization
3 Complex landscape of E(x)
4 How to escape local minima?
- First go uphill; then one may hit a lower basin
- In order to go uphill, an increase in E(x) must be allowed
- Add stochasticity: allow E(x) to increase with a probability governed by an external temperature-like parameter T
- The Metropolis algorithm (Kirkpatrick et al. 1983)
- Assume x_old is the current state, define x_new to be a neighboring state, and ΔE = E(x_new) − E(x_old); then
- If ΔE < 0, replace x_old by x_new
- else choose x_new with probability P(x_new)
- and x_old with probability P(x_old) = 1 − P(x_new) (a code sketch follows below)
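A minimal sketch of one such acceptance step, using the standard choice P(x_new) = exp(−ΔE/T) from the next slide (the names `energy` and `propose_neighbor` are placeholders, not from the slides):

```python
import math
import random

def metropolis_step(x_old, energy, propose_neighbor, T):
    """One Metropolis step: downhill moves are always accepted,
    uphill moves with probability exp(-dE / T)."""
    x_new = propose_neighbor(x_old)   # a neighboring state of x_old
    dE = energy(x_new) - energy(x_old)
    if dE < 0 or random.random() < math.exp(-dE / T):
        return x_new                  # move accepted
    return x_old                      # move rejected
```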
5 The probability to accept an energy-increasing move: P(x_new) = exp(−ΔE/T)
6 The Metropolis Algorithm
- As T → 0, for ΔE > 0, P(x_new) → 0
- At T = 0: strict minimization
- High T randomizes the configuration away from the minimum
- Low T cannot escape local minima
- Starting from a high T, the slower T is decreased, the lower the E(x) that is achieved
- The slow reduction in T allows the material to reach a more ordered configuration: to increase the size of its crystals and reduce their defects
7 Fast cooling → amorphous solid
8 Slow cooling → crystalline solid
9–12 SA for the 2D Ising model: E = −Σ_{⟨ij⟩} s_i s_j, where i and j are nearest neighbors
[Figure: flipping one spin of the configuration]
E_old = −2
E_new = 2
ΔE = E_new − E_old = 4 > 0, so P(E_new) = exp(−4/T)
Demanding exp(−4/T) = 0.3 gives T = −4/ln 0.3 ≈ 3.3
Reduce T by a factor a, 0 < a < 1: T_{n+1} = a·T_n (a code sketch follows below)
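The whole procedure fits in a few lines; below is a minimal sketch for the 2D Ising model with periodic boundaries (the grid size, T0 = 3.3 and the cooling factor a = 0.9 are illustrative choices, not prescribed by the slides):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def sa_ising(n=32, T0=3.3, a=0.9, sweeps=200):
    """SA for E = -sum s_i s_j over nearest neighbors,
    with the geometric cooling T_{n+1} = a * T_n."""
    s = rng.choice([-1, 1], size=(n, n))
    T = T0
    for _ in range(sweeps):
        for _ in range(n * n):        # one relaxation sweep
            i, j = rng.integers(n, size=2)
            # sum of the 4 nearest neighbors (periodic boundaries)
            nb = (s[(i + 1) % n, j] + s[(i - 1) % n, j]
                  + s[i, (j + 1) % n] + s[i, (j - 1) % n])
            dE = 2 * s[i, j] * nb     # energy change of flipping s[i, j]
            if dE < 0 or rng.random() < math.exp(-dE / T):
                s[i, j] = -s[i, j]
        T *= a                        # reduce T by the factor a
    return s
```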
13 Exc7: SA for the 2D Ising model (see Exc1)
- Consider the following cases:
- 1. For h1 = h2 = 0, set a stripe of width 3, 6 or 12 with opposite sign
- 2. For h1 = −0.1, h2 = 0.4, set −1 at h1 and +1 at h2
- 3. Repeat 2. with two 8x8 squares of plus spins with h2 = 0.4, located apart from each other
- Calculate T0 to allow 10% flips of a spin surrounded by 4 neighbors of the same sign
- Use faster / slower cooling schedules
- a. What was the starting T0 and E in each case?
- b. How was T0 decreased, and how many sweeps were employed?
- c. What was the final configuration? Was the global minimum achievable? If not, try a different T0
- d. Is it harder to flip a wider stripe?
- e. Is it harder to flip 2 squares than just one?
14 SA for the bisectioning problem
[Figure: a graph bisected into two parts, with a node i on the cut line]
15 SA for the bisectioning problem: individual temperature
[Figure: the bisection, with a node i on the cut line]
The probability of i to belong to R depends on S_i = (Σ_{j∈R} a_ij) / (Σ_j a_ij)
P(i ∈ R) = 1 if ΔE < 0, and exp(−ΔE/(T·S_i)) if ΔE ≥ 0 (sketched in code below)
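As a sketch, the acceptance test then reads (assuming ΔE, T and S_i have already been computed for the candidate move):

```python
import math
import random

def accept_move(dE, T, S_i):
    """Individual-temperature acceptance for moving node i across
    the cut: a large S_i makes an uphill move easier to accept."""
    return dE < 0 or random.random() < math.exp(-dE / (T * S_i))
```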
16 SA for the bisectioning problem: individual temperature
[Figure: the bisection, with a node i on the cut line]
The probability of i to belong to R should increase if a bigger change along the cut line is made.
If ΔE is small enough, it is expected that further moves will indeed eventually produce a lower E.
17 SA for the bisectioning problem: how to choose T
[Figure: the bisection, with a node i on the cut line]
- Calculate ΔE/S_i along the cut line and sort the values
- Decide upon the % of changes desired
- Find the appropriate T by demanding P(%) = 0.5 (see the sketch below)
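One possible reading of this recipe as code (a sketch only; `fraction` stands for the desired % of changes, and the selected ΔE/S_i value is assumed positive):

```python
import math
import numpy as np

def choose_T(dE_over_S, fraction=0.2):
    """Sort the dE/S_i values along the cut line, take the value at the
    desired percentile, and pick T so that this move has P = 0.5."""
    vals = np.sort(np.asarray(dE_over_S))
    r = vals[min(int(fraction * len(vals)), len(vals) - 1)]
    # exp(-r / T) = 0.5  =>  T = r / ln 2
    return r / math.log(2.0)
```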
18 SA for the linear ordering problems: multiple choices for a variable
- Try to move node i up to k positions to the right and to the left; choose between the 2k+1 possibilities
- For j = −k,…,−1,1,…,k: P(j) = z·min{1, exp(−ΔE(j)/T(j))}
- For j = 0: P(0) = 1 − Σ_{j≠0} P(j)
- z is calculated from the normalization Σ_j P(j) = 1
- T(j) is calculated a priori for each j, aiming at a certain acceptance rate (e.g. 60%); a sketch of the resulting distribution follows below
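A sketch of this distribution (assuming ΔE(0) = 0, so the stay-put move j = 0 gets weight 1 before normalization):

```python
import numpy as np

def move_probabilities(dE, T):
    """Distribution over moving node i by j = -k..k positions.
    dE and T map each offset j != 0 to delE(j) and T(j)."""
    offsets = [0] + sorted(dE)
    weights = [1.0] + [min(1.0, np.exp(-dE[j] / T[j])) for j in sorted(dE)]
    w = np.array(weights)
    return offsets, w / w.sum()   # z = 1 / w.sum() gives sum_j P(j) = 1
```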
19 The Metropolis Algorithm (cont.)
- May result in very slow processing
- Still, SA is considered to be a powerful global minimizer
- Instead of a very slow cooling schedule, repeat heating-cooling several times
20 Heating-cooling scheduling
[Figure: T as a function of the relaxation sweeps, alternating heating and cooling]
21 The Metropolis Algorithm (cont.)
- Repeat heating-cooling several times and keep track of the best-so-far configuration
- The best-so-far has a non-increasing E
- It is an outside observer
- The best-so-far is actually the calculated minimum
22 Heating-cooling scheduling
[Figure: the heating-cooling schedule over the relaxation sweeps]
Store the best-so-far
23 The Metropolis Algorithm (cont.)
- Problem: heating may destroy already achieved minima in various subregions
- Add memory of the best-so-far for those subregions
24–27 Lowest Common Configuration
[Figure: an energy landscape showing the global minimum, two configurations C1 and C2, and their lowest common configuration LCC(C1, C2)]
E(LCC(C1, C2)) ≤ min{E(C1), E(C2)}
28–29 Heating-cooling scheduling
[Figure: the heating-cooling schedule; the LCC is applied at the end of each cooling]
best-so-far ← LCC(best-so-far, the new T = 0 configuration)
30 Exc8: LCC for the bisectioning problem
[Figure: the bisection, with a node i on the cut line]
Given 2 partitions, find a linear-time algorithm for the construction of their LCC
31 Exc8: LCC for linear ordering problems
- Find a (nearly) linear-time algorithm (e.g. sorting is allowed) for the LCC of 2 permutations, in which subpermutations are detected and chosen into the best-so-far
32 Multilevel Simulated Annealing
- Do not increase T by much: avoid destroying the global solution inherited from the coarser levels
- Reduce T quickly: typically 2–3 values of T > 0 (followed by strict minimization) are sufficient
- Repeat heating-cooling several times per level
- Accumulate the minimal solution into the best-so-far by applying the LCC at the end of T = 0
- Interpolate the best-so-far to the next level
33 Genetic algorithm: a global minimizer
34 Genetic algorithm
- A global search technique inspired by evolutionary biology
- Start from a population of individuals (randomly generated): this is the 1st generation
- The next generation follows by (a generic code sketch appears after this list):
- 1. selection of individuals from the current generation to breed the next generation, according to some fitness measure
- 2. crossover (recombination) of pairs of (randomly chosen) parents to produce an offspring
- 3. mutations applied randomly to enhance the diversity of the individuals in the generation
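A generic sketch of one generation (the `fitness`, `crossover` and `mutate` operators are placeholders to be supplied per problem; for a minimization problem, fitness could simply be -E(x)):

```python
import random

def next_generation(pop, fitness, crossover, mutate, elite=0.5):
    """One GA generation: select the fittest parents, recombine random
    pairs, mutate the offspring, keep the best of parents + children."""
    pop = sorted(pop, key=fitness, reverse=True)
    parents = pop[: max(2, int(elite * len(pop)))]             # 1. selection
    children = [mutate(crossover(*random.sample(parents, 2)))  # 2. crossover
                for _ in range(len(pop))]                      # 3. mutation
    return sorted(pop + children, key=fitness, reverse=True)[: len(pop)]
```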
35 A genetic algorithm for the linear arrangement problem P1
- Initial population: 1. select a starting vertex; 2. build the permutation by the greedy frontal-increase-minimization algorithm
36 A genetic algorithm for the linear arrangement problem P1 (cont.)
[Figure: the frontal set F_i; from F_i, choose the node that is most strongly connected to the already placed nodes]
37–41 A genetic algorithm for the linear arrangement problem P1 (cont.)
- Selection of survivors is based on E(x)
- Recombine two randomly chosen parents; the offspring is built in stages:
Parent 1:  5 7 2 3 8 6 9 1 4
Parent 2:  6 1 2 4 3 5 9 8 7
Offspring: _ _ 2 _ _ _ 9 _ _   (2 and 9 first: the entries on which both parents agree)
           _ 3 2 6 _ 7 9 5 _
           _ 3 2 6 8 7 9 5 4
           1 3 2 6 8 7 9 5 4
42 A genetic algorithm for the linear arrangement problem P1 (cont.)
- The next generation is constructed by:
- 1. Recombinations of 2 randomly chosen parents
- 2. Improving the E(x) of the offspring by local processing, e.g. by Simulated Annealing
- 3. Choosing the best individuals from the pool of parents and children
43 Spectral Sequencing: a global minimizer
44 Spectral Sequencing: a global minimizer
- Given a weighted graph where w_ij is the edge weight between the nodes i and j
- Define the graph Laplacian A by a_ij = −w_ij for i ≠ j, and a_ii = Σ_j w_ij
- A is symmetric positive semidefinite
- Consider the eigenvalue problem Ax = λx
- Arranging the nodes of the graph according to the eigenvector associated with the 2nd smallest eigenvalue has been shown by Hall (1970) to solve min Σ_ij w_ij (x_i − x_j)² for real variables x (normalized and orthogonal to the constant vector, which would otherwise be a trivial minimizer); a small code sketch follows below
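A small dense-matrix sketch of this ordering (assumes a symmetric nonnegative weight matrix W; for large graphs one would use the multilevel eigensolver mentioned on the next slide):

```python
import numpy as np

def fiedler_order(W):
    """Order the nodes by the eigenvector of the 2nd smallest
    eigenvalue of the graph Laplacian A = D - W."""
    A = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(A)    # eigenvalues in ascending order
    return np.argsort(vecs[:, 1])     # positions along the line
```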
45 Spectral Sequencing: a global minimizer
- SS has been used extensively to solve a large variety of ordering problems:
- Linear ordering problems P1, P2, …
- Partitioning problems
- Embedding into lower dimensions, etc.
- To calculate the eigenvectors, use multilevel
- The direct use of multilevel to solve the original problem produces better results than using the ordering dictated by SS
46 P2: Multilevel approach vs. Spectral method
[Plot: the ratio between the two methods' results, per graph]
The results of the multilevel approach were obtained without post-processing!
I. Safro, D. Ron, A. Brandt, J. Graph Alg. Appl. 10 (2006) 237–258