Title: This time
1This time
- Iterative improvement
- Hill climbing
- Simulated annealing
- Genetic Algorithms
2Iterative improvement
- In many optimization problems, the path is
irrelevant; the goal state itself is the solution.
- Then, state space = space of "complete"
configurations.
- Algorithm goal:
- - find optimal configuration (e.g., TSP), or,
- - find configuration satisfying constraints
(e.g., n-queens)
- In such cases, can use iterative improvement
algorithms: keep a single "current" state, and
try to improve it.
3Iterative improvement example: vacuum world
Simplified world: 2 locations, each may or may not
contain dirt, each may or may not contain the
vacuuming agent. Goal of agent: clean up the dirt.
If path does not matter, do not need to keep track
of it.
4Iterative improvement example: n-queens
- Goal: Put n chess-game queens on an n x n board,
with no two queens on the same row, column, or
diagonal.
- Here, goal state is initially unknown but is
specified by constraints that it must satisfy.
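A minimal sketch of how the n-queens constraints can be turned into a value to improve. The representation (one queen per row, `cols[r]` giving its column) and the names `conflicts` and `is_goal` are assumptions made for illustration; they are not from the slides.

```python
from itertools import combinations

def conflicts(cols):
    """Count pairs of queens that attack each other.

    cols[r] is the column of the queen in row r, so two queens can
    never share a row; only columns and diagonals need checking.
    """
    return sum(
        1
        for (r1, c1), (r2, c2) in combinations(enumerate(cols), 2)
        if c1 == c2 or abs(c1 - c2) == abs(r1 - r2)
    )

def is_goal(cols):
    """A configuration is a goal state when no constraint is violated."""
    return conflicts(cols) == 0
```

An iterative improvement algorithm can then try to drive `conflicts` down to zero without ever recording the path it took.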
5Hill climbing (or gradient ascent/descent)
- Iteratively maximize value of current state, by
replacing it by successor state that has highest
value, as long as possible.
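The loop described above can be sketched as follows. This is one possible reading (steepest-ascent, stopping as soon as no successor is strictly better); the toy `LANDSCAPE` and the helper names are hypothetical.

```python
def hill_climb(state, successors, value):
    """Steepest-ascent hill climbing.

    Repeatedly replace the current state by its highest-valued
    successor, and stop when no successor is strictly better.
    """
    while True:
        candidates = successors(state)
        if not candidates:
            return state
        best = max(candidates, key=value)
        if value(best) <= value(state):
            return state
        state = best

# Hypothetical 1-D landscape: a small peak at index 2, a higher one at index 6.
LANDSCAPE = [0, 1, 2, 1, 0, 3, 6, 3]

def neighbours(i):
    """Indices one step to the left and right, kept inside the landscape."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]

def value(i):
    return LANDSCAPE[i]
```

Starting from index 0 the climb stops at the lower peak (index 2), while starting from index 4 it reaches the higher peak (index 6): the result depends on the initial state.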
6Question: What is the difference between this
problem and our problem (finding global minima)?
7Hill climbing
- Note: minimizing a value function v(n) is
equivalent to maximizing -v(n),
- thus both notions are used interchangeably.
- Notion of "extremization": find extrema (minima
or maxima) of a value function.
8Hill climbing
- Problem: depending on initial state, may get
stuck in local extremum.
9Minimizing energy
- Let's now change the formulation of the problem a
bit, so that we can employ a new formalism:
- - let's compare our state space to that of a
physical system that is subject to natural
interactions,
- - and let's compare our value function to the
overall potential energy E of the system.
- On every updating, we have ΔE ≤ 0
10Minimizing energy
- Hence the dynamics of the system tend to move E
toward a minimum.
- We stress that there may be different such states;
they are local minima. Global minimization is
not guaranteed.
11Local Minima Problem
- Question: How do you avoid this local minimum?
[Figure: energy landscape with labels: starting
point, descent direction, local minimum, global
minimum, barrier to local search]
12Consequences of the Occasional Ascents
- Desired effect: helps escape local optima.
- Adverse effect: might pass the global optimum
after reaching it (easy to avoid by keeping track
of the best-ever state).
13Boltzmann machines
- The Boltzmann Machine of Hinton, Sejnowski, and
Ackley (1984) uses simulated annealing to escape
local minima.
- To motivate their solution, consider how one
might get a ball-bearing traveling along the
curve to "probably end up" in the deepest
minimum. The idea is to shake the box "about h
hard"; then the ball is more likely to go from
D to C than from C to D. So, on average, the
ball should end up in C's valley.
14Simulated annealing basic idea
- From current state, pick a random successor
state.
- If it has better value than current state, then
"accept the transition", that is, use successor
state as current state.
- Otherwise, do not give up, but instead flip a
coin and accept the transition with a given
probability (lower the worse the successor is).
- So we sometimes accept to "un-optimize" the value
function a little, with non-zero probability.
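The accept/reject step can be sketched as below. The specific probability exp(delta / T) is an assumption here, borrowed from the Boltzmann form used elsewhere in these slides; the function name `accept` and its parameters are hypothetical.

```python
import math
import random

def accept(delta, T, rng):
    """Decide whether to move to the successor.

    delta = value(successor) - value(current), with higher values better.
    Better successors are always accepted; worse ones are accepted with
    probability exp(delta / T), which shrinks as the successor gets worse.
    """
    if delta > 0:
        return True
    return rng.random() < math.exp(delta / T)
```

With `rng = random.Random(0)`, calling `accept(-1.0, 1.0, rng)` many times accepts roughly exp(-1), about 37%, of the bad moves.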
15Boltzmann's statistical theory of gases
- In the statistical theory of gases, the gas is
described not by a deterministic dynamics, but
rather by the probability that it will be in
different states.
- The 19th century physicist Ludwig Boltzmann
developed a theory that included a probability
distribution of temperature (i.e., every small
region of the gas had the same kinetic energy).
- Hinton, Sejnowski and Ackley's idea was that this
distribution might also be used to describe
neural interactions, where low temperature T is
replaced by a small noise term T (the neural
analog of random thermal motion of molecules).
- While their results primarily concern
optimization using neural networks, the idea is
more general.
16Boltzmann distribution
- At thermal equilibrium at temperature T, the
Boltzmann distribution gives the relative
probability that the system will occupy state A
vs. state B as
P(A) / P(B) = exp(-(E(A) - E(B)) / T) = exp((E(B) - E(A)) / T)
- where E(A) and E(B) are the energies associated
with states A and B.
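A small numerical sketch of this relative probability, assuming the standard Boltzmann form exp(-(E(A) - E(B)) / T) with the constant k absorbed into T. The function name is hypothetical.

```python
import math

def boltzmann_ratio(e_a, e_b, T):
    """Relative probability P(A) / P(B) of occupying state A vs. state B
    at temperature T, given their energies e_a and e_b."""
    return math.exp(-(e_a - e_b) / T)
```

For example, `boltzmann_ratio(0.0, 1.0, 1.0)` is greater than 1 (the lower-energy state A is more likely), and the ratio approaches 1 as T grows, so at high temperature all states become nearly equally likely.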
17Simulated annealing
- Kirkpatrick et al. (1983):
- Simulated annealing is a general method for
making likely the escape from local minima by
allowing jumps to higher energy states.
- The analogy here is with the process of annealing
used by a craftsman in forging a sword from an
alloy.
- He heats the metal, then slowly cools it as he
hammers the blade into shape.
- If he cools the blade too quickly the metal will
form patches of different composition.
- If the metal is cooled slowly while it is shaped,
the constituent metals will form a uniform alloy.
18Real annealing: Sword
19Simulated annealing in practice
- set T
- optimize for given T
- lower T (see Geman & Geman, 1984)
- repeat
20Simulated annealing in practice
MDSA: Molecular Dynamics Simulated Annealing
21Simulated annealing in practice
- Geman & Geman (1984): if T is lowered
sufficiently slowly (with respect to the number
of iterations used to optimize at a given T),
simulated annealing is guaranteed to find the
global minimum.
- Caveat: this algorithm has no end (Geman &
Geman's T decrease schedule is in the 1/log of
the number of iterations, so T will never reach
zero), so it may take an infinite amount of time
for it to find the global minimum.
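To see why a 1/log schedule never finishes, here is a sketch of a logarithmic cooling schedule of the form T_k = c / log(1 + k). The constant `c` and the function name are assumptions for illustration; the right value of c depends on the problem.

```python
import math

def log_schedule(k, c=1.0):
    """Temperature at iteration k >= 1 under a 1/log cooling schedule.

    T_k = c / log(1 + k): strictly decreasing, but it only approaches
    zero as k grows without bound, so it never actually reaches it.
    """
    return c / math.log(1 + k)
```

Evaluating `log_schedule(k)` for k = 10, 1000, and 10**6 shows how slowly T falls: even after a million iterations it is still clearly above zero.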
22Simulated annealing algorithm
- Idea: Escape local extrema by allowing "bad
moves," but gradually decrease their size and
frequency.
Note: goal here is to maximize E.
23Simulated annealing algorithm
Algorithm when goal is to minimize E.
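A minimal sketch of simulated annealing for minimizing E, written under stated assumptions rather than as a transcription of the slide's algorithm: the geometric cooling schedule, the toy `ENERGY` landscape, and all function names are hypothetical. It also keeps track of the best-ever state, so passing through the global minimum is never lost.

```python
import math
import random

def simulated_annealing(start, energy, random_neighbour, schedule, steps, rng):
    """Minimize energy(state); returns (best_state, best_energy) ever seen."""
    current, e_cur = start, energy(start)
    best, e_best = current, e_cur
    for k in range(1, steps + 1):
        T = schedule(k)
        if T <= 0:
            break
        candidate = random_neighbour(current, rng)
        e_cand = energy(candidate)
        delta = e_cand - e_cur
        # Always accept downhill moves; accept uphill ("bad") moves
        # with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            current, e_cur = candidate, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best

# Hypothetical 1-D landscape: local minimum at index 1, global minimum at index 5.
ENERGY = [4, 2, 3, 4, 1, 0, 1, 5]

def step(i, rng):
    """Random move one index left or right, clamped to the landscape."""
    return min(max(i + rng.choice((-1, 1)), 0), len(ENERGY) - 1)

def demo(seed=0):
    rng = random.Random(seed)
    # Assumed geometric cooling: T_k = 5 * 0.995**k.
    return simulated_annealing(1, ENERGY.__getitem__, step,
                               lambda k: 5.0 * 0.995 ** k, 5000, rng)
```

Here plain descent started at index 1 would stay there, whereas the early high-temperature phase lets the search cross the barrier at index 3.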
24Note on simulated annealing limit cases
- Boltzmann distribution: accept "bad move" with
ΔE < 0 (goal is to maximize E) with probability
P(ΔE) = exp(ΔE/T)
- If T is large: ΔE < 0
- - ΔE/T < 0 and very small in magnitude
- - exp(ΔE/T) close to 1
- - accept bad move with high probability
- If T is near 0: ΔE < 0
- - ΔE/T < 0 and very large in magnitude
- - exp(ΔE/T) close to 0
- - accept bad move with low probability
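The two limit cases can be checked numerically. This sketch simply evaluates exp(ΔE/T) for a bad move (ΔE < 0, maximizing E); the function name and the sample temperatures are chosen for illustration.

```python
import math

def p_accept_bad(delta_e, T):
    """Probability of accepting a bad move (delta_e < 0) at temperature T."""
    return math.exp(delta_e / T)

high_T = p_accept_bad(-1.0, 1000.0)  # ΔE/T = -0.001, probability close to 1
low_T = p_accept_bad(-1.0, 0.01)     # ΔE/T = -100, probability close to 0
```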
25Note on simulated annealing limit cases
- If T is large: bad moves are accepted with
probability close to 1, so the search behaves
like a random walk.
- If T is near 0: bad moves are accepted with
probability close to 0, so the search becomes
deterministic down-hill search (in the
energy-minimization picture).
26Summary
- Best-first search: general search, where the
minimum-cost nodes (according to some measure)
are expanded first.
- Greedy search: best-first with the estimated
cost to reach the goal as a heuristic measure.
- - Generally faster than uninformed search
- - not optimal
- - not complete.
- A* search: best-first with measure = path cost
so far + estimated path cost to goal.
- - combines advantages of uniform-cost and
greedy searches
- - complete, optimal and optimally efficient
- - space complexity still exponential
27Genetic Algorithms
28The Traditional Approach
- Ask an expert
- Adapt existing designs
- Trial and error
29Nature's Starting Point
Alison Everitt's A User's Guide to Men
30Optimised Man!
31Example: Pursuit and Evasion
- Using NNs and a genetic algorithm
- 0 learning
- 200 tries
- 999 tries
32Comparisons
- Traditional
- - best guess
- - may lead to local, not global optimum
- Nature
- - population of guesses
- - more likely to find a better solution
33More Comparisons
- Nature
- - not very efficient
- - at least a 20 year wait between generations
- - not all mating combinations possible
- Genetic algorithm
- - efficient and fast
- - optimization complete in a matter of minutes
- - mating combinations governed only by fitness
34The Genetic Algorithm Approach
- Define limits of variable parameters
- Generate a random population of designs
- Assess fitness of designs
- Mate selection
- Crossover
- Mutation
- Reassess fitness of new population
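The steps above can be sketched on bit-string designs as follows. This is one simple variant, not the slides' implementation: fitness-proportional selection, single-point crossover, and per-bit mutation are assumed choices, fitness must be non-negative, and every name and parameter is hypothetical.

```python
import random

def evolve(fitness, n_bits, pop_size, generations, p_mut, rng):
    """Run a simple genetic algorithm and return the best design ever seen.

    n_bits sets the limits of the design (must be >= 2); fitness maps a
    list of 0/1 values to a non-negative number.
    """
    # Generate a random population of designs.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Assess fitness (small offset keeps weights valid if all are zero).
        weights = [fitness(ind) + 1e-9 for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Mate selection: fitter designs are more likely to be picked.
            mum, dad = rng.choices(pop, weights=weights, k=2)
            # Crossover: take the head of one parent and the tail of the other.
            cut = rng.randint(1, n_bits - 1)
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        # Reassess fitness of the new population, remembering the best so far.
        best = max(pop + [best], key=fitness)
    return best
```

As a usage example, `evolve(sum, 20, 30, 60, 0.02, random.Random(1))` searches for a 20-bit string with as many ones as possible.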
35A Population
36Ranking by Fitness
37Mate Selection: Fittest are copied and replace
the less fit
38Mate Selection: Roulette. Increasing the
likelihood of, but not guaranteeing, reproduction
of the fittest
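A short sketch of roulette-wheel selection, assuming each design's slice of the wheel is proportional to its (non-negative) fitness. The designs "A", "B", "C" and their fitness values are made up for illustration.

```python
import random
from collections import Counter

def roulette_select(population, fitnesses, k, rng):
    """Pick k parents, each with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=k)

# Hypothetical designs with fitness 6, 3 and 1.
picks = roulette_select(["A", "B", "C"], [6, 3, 1], 10000, random.Random(0))
counts = Counter(picks)
```

The fittest design "A" is picked most often, but "B" and "C" still get selected some of the time, which is the "likelihood but not guarantee" behaviour.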
39Crossover: Exchanging information through some
part of the representation
40Mutation: Random change of binary digits from 0
to 1 and vice versa (to avoid local minima)
41Best Design
42The GA Cycle
43The Process
44Genetic Algorithms
- Advantage: good at finding a region of the
solution space that includes the optimal solution.
- But slow at giving the optimal solution itself.
45Genetic Approach
- When applied to strings of genes, the approaches
are classified as genetic algorithms (GA).
- When applied to pieces of executable programs,
the approaches are classified as genetic
programming (GP).
- GP operates at a higher level of abstraction than
GA.
46Typical Chromosome
47Summary
- Time complexity of heuristic algorithms depends
on the quality of the heuristic function. Good
heuristics can sometimes be constructed by
examining the problem definition or by
generalizing from experience with the problem
class.
- Iterative improvement algorithms keep only a
single state in memory.
- Can get stuck in local extrema; simulated
annealing provides a way to escape local extrema,
and is complete and optimal given a slow enough
cooling schedule.