Title: Discrete Optimization
1. Discrete Optimization
- Last Time
- Parallel Machine Scheduling
- Lagrange Relaxation for Flow Shop Scheduling
- Today
- Local Search
- Meta-Heuristic Search
- Specific Search Strategies
- Simulated Annealing
- Stochastic Machines
- Convergence Analysis
2. Reading Assignments
- S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by Simulated Annealing," Science, Vol. 220, No. 4598, pp. 671-680, 1983.
- "General Simulated Annealing Algorithm": an open-source MATLAB program for general simulated annealing exercises
- http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=10548&objectType=file
3. Combinatorial Optimization
- Combinatorial Optimization Problem
- Consists of a pair (R, C), where
- R is a set of configurations
- C : R → ℝ is a cost function
- Given (R, C), find s* ∈ R such that
- C(s*) = min_{s ∈ R} C(s)
Example 1: Travelling Salesman Problem (TSP)
Given n cities and a distance matrix d_ij, find the shortest tour of the n cities (visiting each city exactly once).
R = {all cyclic permutations π of the n cities}; C(π) = total length of the tour π.
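To make the (R, C) formulation concrete, here is a small Python sketch (not from the slides) that enumerates R for a toy TSP instance and picks the minimum-cost configuration; the 4-city distance matrix is a made-up example.

```python
from itertools import permutations

# Hypothetical symmetric distance matrix d[i][j] for 4 cities.
d = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
n = len(d)

def tour_cost(tour):
    """C(pi): total length of the cyclic tour."""
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))

# R: cyclic permutations of the cities; fixing city 0 removes equivalent rotations.
R = [(0,) + p for p in permutations(range(1, n))]
best = min(R, key=tour_cost)
print(best, tour_cost(best))
```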
4. Combinatorial Optimization
Example 2: Minimum Spanning Tree Problem (MST)
Given G = (V, E) and a symmetric distance matrix d_ij, find a spanning tree T of G with minimum total edge cost.
R = {T : T is a spanning tree of G = (V, E)}; C(T) = total edge cost of T.
Example 3: Linear Programming (LP)
5. Some Common Types of COP
- TSP, Quadratic Assignment Problem
- Bin Packing and Generalised Assignment Problems
- Hub allocation problems
- Graph Colouring & Partitioning
- Vehicle Routing
- Single & Multiple Knapsack
- Set Partitioning & Set Covering Problems
- Processor Allocation Problem
- Various Staff Scheduling Problems
- Job Shop Scheduling
6. Outline
- Today
- Local Search
- Meta-Heuristic Search
- Specific Search Strategies
- Simulated Annealing
- Stochastic Machines
- Convergence Analysis
7. Techniques for Tackling COPs
- COPs often formulated as Integer Linear Programs (ILPs)
- But far too slow to solve them this way
- Operations Research
- Branch & Bound, Cutting Plane Algorithms
- A.I. (excluding neural nets)
- A*, Constraint Logic Programming
- All these techniques can guarantee an optimal solution, but suffer from exponential runtime in the worst case
8. Techniques for Tackling COPs
- Heuristic solutions (do not guarantee optimality)
- Tailored heuristics
- e.g. Lin-Kernighan for TSP, GP
- In general run very fast, find good solutions
- But cannot be applied to other problems
- Meta-heuristic search algorithms (MHs)
- Guide the local search for new (improved) solutions
- Hill climbing is a kind of local search,
- may form part of a metaheuristic
9. Neighborhood Search Algorithms
- Start with a feasible solution x
- Define a neighborhood of x
- Identify an improved neighbor y
- Replace x by y and repeat
10. Generic Neighbourhood Search
Generic_Neighbourhood_Search
begin
  s ← initial solution s0
  repeat
    choose s' ∈ N(s)
    s ← s'
  until (terminating condition)
end
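The loop above, as a minimal Python sketch; `initial_solution`, `neighbours`, and `cost` are problem-specific stand-ins, and the terminating condition here is simply an iteration budget or reaching a local optimum.

```python
import random

def neighbourhood_search(initial_solution, neighbours, cost, max_iters=1000):
    """Generic neighbourhood search: repeatedly move to an improving neighbour of s."""
    s = initial_solution
    for _ in range(max_iters):               # terminating condition: iteration budget
        improving = [t for t in neighbours(s) if cost(t) < cost(s)]
        if not improving:                    # no better neighbour: local optimum
            return s
        s = random.choice(improving)         # choose s' in N(s) and accept it
    return s
```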
11. Example: Map Coloring (1)
- Start with a random coloring of the nodes
- Change the color of one node to reduce the number of conflicts
- Repeat step 2
12. Example: Graph Colouring (2)
Number of conflicts unchanged, but a different solution.
13. Neighbourhood Search (history)
- Neighbourhood Search dates back a long way
- Simplex Algorithm for LP
- Viewed as a search technique
- KL / FM algorithm for Graph Partitioning
- Augmenting path algorithms
- maximum flow in networks
- bipartite matching
- non-bipartite matching
- Iteratively improves the current solution.
14. Shape of the Cost Function (1-D)
15. Shape of the Cost Function (2-D)
- Multiple Local Optima
- Plateaus
- Ridges: going up only in a narrow direction
16. Outline
- Today
- Local Search
- Meta-Heuristic Search
- Specific Search Strategies
- Simulated Annealing
- Stochastic Machines
- Convergence Analysis
17. Meta-Heuristic Search
- Design Issues
- Solution Representation
- Initial Solution (where to start)
- Neighbourhood (strength & size)
- Cost Function (vis-à-vis the objective function)
- Search strategy (how to move)
- Refinements
- Termination (when to stop)
18. Local Search
- A local search procedure looks for the best solution that is near another solution
- by repeatedly making local changes to the current solution
- until no further improved solutions can be found
- Local search procedures are generally problem specific
- Most MHs use some kind of local search procedure to actually perform the search
19. Don't Settle for Second Best
- MHs are generally attracted to good solutions (local optima)
- Simple MHs find it difficult or impossible to escape these attractive points in the search space
- All of the popular MHs employed today have some mechanism(s) for escaping local optima
- These will be discussed for each MH we cover today
20. Ways of Getting Around (in Search Space)
- MHs can be roughly categorized into
- Iterative
- Population-based
- Constructive
- Of course, in an attempt to prove by exhaustive search that there really are an infinite number of research papers,
- researchers have combined these broad approaches in many and varied ways
21. Popular Iterative MHs
- Iterative MHs start with one or more feasible solutions
- Apply transitions to reach new solutions (i.e. local search)
- Transitions include move, swap, add, drop, invert, plus others
- The neighbourhood N(X) of a solution consists of those solutions reachable by applying (generally) one transition
- Simulated Annealing
- Tabu Search
22. Outline
- Today
- Local Search
- Meta-Heuristic Search
- Specific Search Strategies
- Simulated Annealing
- Stochastic Machines
- Convergence Analysis
23. Search Strategies
- Uninformed: use only information available in the problem formulation
- Breadth-first
- Uniform-cost
- Depth-first
- Depth-limited
- Iterative deepening
- Informed: use heuristics to guide the search
- Best first
- A*
24. Evaluation of search strategies
- Search algorithms are commonly evaluated according to the following four criteria
- Completeness: does it always find a solution if one exists?
- Time complexity: how long does it take as a function of the number of nodes?
- Space complexity: how much memory does it require?
- Optimality: does it guarantee the least-cost solution?
- Time and space complexity are measured in terms of
- b: max branching factor of the search tree
- d: depth of the least-cost solution
- m: max depth of the search tree (may be infinite)
25. Informed search
- Use heuristics to guide the search
- Best first
- A*
- Heuristics
- Hill-climbing
- Simulated annealing
26. Best-first search
- Idea
- use an evaluation function for each node = estimate of desirability
- expand the most desirable unexpanded node
- Implementation
- QueueingFn = insert successors in decreasing order of desirability
- Special cases
- greedy search
- A* search
27. Romania with step costs in km
28. Greedy search
- Estimation function
- h(n) = estimate of cost from n to the goal (heuristic)
- For example
- h_SLD(n) = straight-line distance from n to Bucharest
- Greedy search expands first the node that appears to be closest to the goal, according to h(n).
33. Properties of Greedy Search
- Complete? Does it always give a solution if one exists?
- Time? How long does it take?
- Space? How much memory is needed?
- Optimal? Does it give the optimal path?
34. A* search
- Idea: combine the greedy approach with avoiding expansion of paths that are already expensive
- evaluation function f(n) = g(n) + h(n), with
- g(n) = cost so far to reach n
- h(n) = estimated cost from n to the goal
- f(n) = estimated total cost of the path through n to the goal
- A* search uses an admissible heuristic, that is,
- h(n) ≤ h*(n), where h*(n) is the true cost from n.
- For example, h_SLD(n) never overestimates the actual road distance.
- Theorem: A* search is optimal
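A compact Python sketch of A* with f(n) = g(n) + h(n); the tiny graph and heuristic values below are illustrative placeholders (not the Romania data from the figures), with h chosen to be admissible.

```python
import heapq

def a_star(start, goal, successors, h):
    """A*: always expand the open node with the smallest f = g + h."""
    open_heap = [(h(start), 0, start, [start])]      # entries: (f, g, node, path)
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        for nxt, step_cost in successors(node):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(open_heap, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# Illustrative graph (edge lists with costs) and an admissible heuristic h.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
h_table = {"A": 3, "B": 2, "C": 1, "D": 0}
print(a_star("A", "D", lambda s: graph[s], lambda s: h_table[s]))  # (['A', 'B', 'C', 'D'], 4)
```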
41. Optimality of A* (standard proof)
- Suppose some suboptimal goal G2 has been generated and is in the queue. Let n be an unexpanded node on a shortest path to an optimal goal G1.
- Then f(G2) = g(G2) (since h(G2) = 0) > g(G1) (since G2 is suboptimal) ≥ f(n) (since h is admissible), so A* will never select G2 for expansion before n.
42. Optimality of A* (more useful proof)
43. f-contours
What do the contours look like when h(n) = 0?
44. Properties of A*
- Complete?
- Time?
- Space?
- Optimal?
45. Proof of lemma: pathmax
46. Admissible heuristics
47. Relaxed Problem
- How to determine an admissible heuristic?
- E.g. h1 and h2 in the 8-puzzle problem
- Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem.
- Example
- A tile can move from square A to square B
- if A is adjacent to B and
- if B is blank
- Possible relaxed problems
- A tile can move from square A to square B if A is adjacent to B
- A tile can move from square A to square B if B is blank
- A tile can move from square A to square B
48. Next
- Iterative improvement
- Hill climbing
- Simulated annealing
49. Iterative improvement
- In many optimization problems, the path is irrelevant;
- the goal state itself is the solution.
- Then, state space = space of complete configurations.
- Algorithm goals:
- find an optimal configuration (e.g., TSP), or
- find a configuration satisfying constraints (e.g., n-queens)
- In such cases, we can use iterative improvement algorithms: keep a single current state, and try to improve it.
50. Iterative improvement example: n-queens
- Goal: put n chess queens on an n × n board, with no two queens on the same row, column, or diagonal.
- Here, the goal state is initially unknown but is specified by the constraints it must satisfy.
51. Hill climbing (or gradient ascent/descent)
- Iteratively maximize the value of the current state, by replacing it with the successor state that has the highest value, as long as possible.
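A short Python sketch of hill climbing on the n-queens example from the previous slide (an illustrative implementation, not the one behind the figures): the state holds one queen per column, the value being minimized is the number of attacking pairs, and each step takes the single-queen move that most reduces conflicts.

```python
import random

def conflicts(state):
    """Number of attacking queen pairs; state[c] = row of the queen in column c."""
    n = len(state)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if state[a] == state[b] or abs(state[a] - state[b]) == b - a)

def hill_climb(n=8):
    state = [random.randrange(n) for _ in range(n)]        # random initial placement
    while True:
        current = conflicts(state)
        if current == 0:
            return state                                   # goal: no attacking pairs
        best_move, best_val = None, current
        for col in range(n):                               # examine all single-queen moves
            saved = state[col]
            for row in range(n):
                if row == saved:
                    continue
                state[col] = row
                val = conflicts(state)
                if val < best_val:
                    best_move, best_val = (col, row), val
            state[col] = saved
        if best_move is None:                              # local minimum or plateau
            return state
        state[best_move[0]] = best_move[1]

print(hill_climb(8))
```

Because it only moves downhill in conflicts, the procedure can stop at a local minimum or plateau; random restarts, or the simulated annealing discussed later, are the usual remedies.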
52. Hill climbing
- Note: minimizing a value function v(n) is equivalent to maximizing −v(n),
- thus both notions are used interchangeably.
- Notion of extremization: find extrema (minima or maxima) of a value function.
53. Hill climbing
- Problem: depending on the initial state, it may get stuck in a local extremum.
- Any suggestions?
Does it matter if you start from the left or from the right?
54. Outline
- Today
- Local Search
- Meta-Heuristic Search
- Specific Search Strategies
- Simulated Annealing
- Stochastic Machines
- Convergence Analysis
55. Minimizing energy
- A new formalism:
- let's compare our state space to that of a physical system that is subject to natural interactions,
- and let's compare our value function to the overall potential energy E of the system.
- On every update we have ΔE ≤ 0
56. Minimizing energy
- On every update we have ΔE ≤ 0
- Hence the dynamics of the system tend to move E toward a minimum.
- Note: there may be several such local minima.
- Global minimization is not guaranteed.
57. Local Minima Problem
- Question: how do you avoid local minima?
(Figure labels: starting point, descent direction, barrier to local search, local minima, global minima.)
58. Consequences of the Occasional Ascents
- Desired effect: helps escape the local optima.
- Adverse effect: might pass the global optimum after reaching it.
59. Boltzmann machines
- The Boltzmann Machine of Hinton, Sejnowski, and Ackley (1984) uses simulated annealing to escape local minima.
- Consider how one might get a ball-bearing traveling along the curve to "probably end up" in the deepest minimum.
- The idea is to shake the box "about h hard": then the ball is more likely to go from D to C than from C to D.
- So, on average, the ball should end up in C's valley.
60. Question: what is the difference between this problem and our problem (finding global minima)?
61. Simulated annealing: basic idea
- From the current state, pick a random successor state
- If it has a better value than the current state, then accept the transition,
- that is, use the successor state as the current state
- Otherwise, flip a coin and accept the transition with a given probability (that is lower the worse the successor is)
- So we sometimes accept a move that un-optimizes the value function a little, with a non-zero probability.
62. Boltzmann's statistical theory of gases
- In the statistical theory of gases, the gas is described not by deterministic dynamics, but rather by the probability that it will be in different states.
- The 19th-century physicist Ludwig Boltzmann developed a theory that included a probability distribution of temperature (i.e., every small region of the gas had the same kinetic energy).
63. Boltzmann distribution
- At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B,
- where E(A) and E(B) are the energies associated with states A and B.
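The relative probability referred to above (the formula itself did not survive the conversion) has the standard Boltzmann form, with the temperature expressed in units that absorb the Boltzmann constant:

\[
\frac{P(A)}{P(B)} \;=\; \exp\!\left(\frac{E(B) - E(A)}{T}\right)
\]

Lower-energy states are therefore exponentially more probable, and the preference sharpens as T decreases.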
64. Simulated annealing
- Kirkpatrick et al. 1983
- Simulated annealing is a general method for making likely the escape from local minima by allowing jumps to higher energy states.
- The analogy here is with the process of annealing used by a craftsman in forging a sword from an alloy.
65. Simulated annealing: Sword
- He heats the metal, then slowly cools it as he hammers the blade into shape.
- If he cools the blade too quickly, the metal will form patches of different composition.
- If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy.
66. Simulated annealing: Sword
- He heats the metal, then slowly cools it as he hammers the blade into shape.
- If he cools the blade too quickly, the metal will form patches of different composition.
- If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy.
- Example: arranging cubes in a box (sugar cubes)
67. Simulated annealing in practice
- set T
- optimize for the given T
- lower T
- repeat
- (see Geman & Geman, 1984)
68. Simulated annealing in practice
- Geman & Geman (1984): if T is lowered sufficiently slowly (with respect to the number of iterations used to optimize at a given T), simulated annealing is guaranteed to find the global minimum.
- Caveat: this algorithm has no end (Geman & Geman's T-decrease schedule is 1/log of the number of iterations, so T will never reach zero), so it may take an infinite amount of time to find the global minimum.
69. Simulated annealing algorithm
- Idea: escape local extrema by allowing bad moves, but gradually decrease their size and frequency.
Algorithm, when the goal is to minimize E:
- if ΔE < 0, accept the move;
- otherwise, accept it with probability exp(−ΔE / T).
70. Note on simulated annealing: limit cases
- Boltzmann distribution: accept a bad move with ΔE < 0 (goal is to maximize E) with probability P(ΔE) = exp(ΔE/T)
- If T is large: for ΔE < 0,
- ΔE/T is negative and very small in magnitude,
- so exp(ΔE/T) is close to 1,
- and bad moves are accepted with high probability → random walk
- If T is near 0: for ΔE < 0,
- ΔE/T is negative and very large in magnitude,
- so exp(ΔE/T) is close to 0,
- and bad moves are accepted with low probability → deterministic downhill
71. Summary
(Skip)
- A* search: best-first with measure = path cost so far + estimated path cost to the goal.
- combines the advantages of uniform-cost and greedy searches
- complete, optimal, and optimally efficient
- space complexity still exponential
72. Summary
- The time complexity of heuristic algorithms depends on the quality of the heuristic function. Good heuristics can sometimes be constructed by examining the problem definition or by generalizing from experience with the problem class.
- Iterative improvement algorithms keep only a single state in memory.
- They can get stuck in local extrema; simulated annealing provides a way to escape local extrema, and is complete and optimal given a slow enough cooling schedule.
73. Outline
- Today
- Local Search
- Meta-Heuristic Search
- Specific Search Strategies
- Simulated Annealing
- Stochastic Machines
- Convergence Analysis
74. Stochastic Machines
- CS679 Lecture Note
- by Jin Hyung Kim
- Computer Science Department
- KAIST
75. Statistical Machine
- Rooted in statistical mechanics
- derive thermodynamic properties of macroscopic bodies from microscopic elements
- probabilistic nature due to the enormous degrees of freedom
- the concept of entropy plays the central role
- Gibbs distribution
- Markov Chain
- Metropolis algorithm
- Simulated Annealing
- Boltzmann Machine
- a device for modeling the underlying probability distribution of a data set
76. Statistical Mechanics
- In thermal equilibrium, the probability of state i is
  p_i = exp(−E_i / (k_B T)) / Σ_j exp(−E_j / (k_B T)), where
- E_i is the energy of state i
- T is the absolute temperature
- k_B is the Boltzmann constant
77. Markov Chain
- Stochastic process with the Markov property
- the state X_{n+1} at time n+1 depends only on the state X_n
- Transition probability / stochastic matrix
- m-step transition probability
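Written out (standard definitions, assumed to match the slide's missing formulas): for transition probabilities p_ij collected in the stochastic matrix P,

\[
p_{ij} = P(X_{n+1} = j \mid X_n = i), \qquad \sum_{j} p_{ij} = 1, \qquad
p_{ij}^{(m)} = P(X_{n+m} = j \mid X_n = i) = \left[P^{m}\right]_{ij}.
\]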
78. Markov Chain
- Recurrent state
- P(ever returning to state i) = 1
- Transient state
- P(ever returning to state i) < 1
- Mean recurrence time of state i: the expectation of T_i(k),
- the time elapsed between the (k−1)th return and the kth return
- Steady-state probability of state i, π_i
- π_i = 1 / (mean recurrence time)
- Ergodicity
- the long-term proportion of time spent in state i approaches the steady-state probability
79. Convergence to stationary distribution
- State distribution vector
- Starting from an arbitrary initial distribution, the state distribution converges to the stationary distribution for an ergodic Markov chain
- independent of the initial distribution
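In symbols (a standard formulation; the slide's own notation was lost): with state distribution vector π(n) and transition matrix P, an ergodic chain satisfies

\[
\pi(n) = \pi(0)\,P^{\,n} \;\longrightarrow\; \pi \quad (n \to \infty), \qquad
\pi = \pi P, \qquad \sum_i \pi_i = 1,
\]

for every initial distribution π(0).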
80. Metropolis algorithm
- A modified Monte Carlo method
- Suppose our objective is to reach the state minimizing the energy function
- 1. Randomly generate a new state, Y, from state X
- 2. If ΔE (the energy difference between Y and X) < 0,
- then move to Y (set X ← Y) and go to 1
- 3. Else
- 3.1 select a random number ρ uniformly in [0, 1)
- 3.2 if ρ < exp(−ΔE / T),
- then move to Y (set X ← Y) and go to 1
- 3.3 else go to 1
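A minimal Python sketch of the Metropolis loop above at a fixed temperature T; `energy` and `propose` are problem-specific stand-ins you would supply.

```python
import math
import random

def metropolis(x0, energy, propose, T, steps=10000):
    """Metropolis algorithm at fixed temperature T (energy minimization)."""
    x, e = x0, energy(x0)
    for _ in range(steps):
        y = propose(x)                              # 1. random new state Y from X
        dE = energy(y) - e
        if dE < 0:                                  # 2. downhill: always accept
            x, e = y, e + dE
        elif random.random() < math.exp(-dE / T):   # 3. uphill: accept with prob exp(-dE/T)
            x, e = y, e + dE
    return x, e
```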
81. Metropolis algorithm and Markov Chain
- Choose the transition probabilities so that the Markov chain converges to a Gibbs distribution,
- i.e. the stationary probabilities are π_i = exp(−E_i / T) / Z,
- where Z = Σ_j exp(−E_j / T)
- The Metropolis algorithm is equivalent to a random step in the stationary Markov chain
- It can be shown that this choice satisfies the principle of detailed balance
82. Simulated Annealing
- Solves combinatorial optimization
- a variant of the Metropolis algorithm
- by S. Kirkpatrick (1983)
- finding a minimum-energy solution of a neural network ↔ finding a low-temperature state of a physical system
- To overcome the local minimum problem
- Key idea
- Instead of always going downhill, try to go downhill most of the time
83. Iterative → Statistical
- Simple Iterative Algorithm (TSP)
- 1. find a path p
- 2. make p', a variation of p
- 3. if p' is better than p, keep p' as p
- 4. go to 2
- Metropolis Algorithm
- 3'. if (p' is better than p) or (random < Prob), then keep p' as p
- a kind of Monte Carlo method
- Simulated Annealing
- T is reduced as time passes
84. About T
- Metropolis Algorithm
- Prob: p(ΔE) = exp(−ΔE / T)
- Simulated Annealing
- Prob: p_i(ΔE) = exp(−ΔE / T_i)
- if T_i is reduced too fast, the solution quality is poor
- if T_t ≥ T(0) / log(1+t) (Geman),
- the system will converge to the minimum configuration
- T_t = k / (1+t) (Szu)
- T_t = a · T_{t−1}, where a is between 0.8 and 0.99
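The three schedules above as small Python helpers (illustrative only; the constants T0, k, and a are placeholders to be tuned):

```python
import math

def geman_schedule(T0, t):
    """Logarithmic cooling, T_t = T0 / log(1 + t) for t >= 1; converges but very slowly."""
    return T0 / math.log(1 + t)

def szu_schedule(k, t):
    """Fast (Cauchy) cooling, T_t = k / (1 + t)."""
    return k / (1 + t)

def geometric_schedule(T_prev, a=0.95):
    """Geometric cooling, T_t = a * T_{t-1}, with a typically in [0.8, 0.99]."""
    return a * T_prev
```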
85. Function Simulated Annealing
- current ← select a node (initialize)
- for t ← 1 to ∞ do
-   T ← schedule[t]
-   if T = 0 then return current
-   next ← a randomly selected successor of current
-   ΔE ← value[next] − value[current]
-   if ΔE > 0 then current ← next
-   else current ← next only with probability e^(ΔE/T)
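A runnable Python sketch of the function above (the slide's maximization form: improving moves are always taken, worsening moves with probability e^(ΔE/T)); `schedule`, `successor`, and `value` are stand-ins, and a small T_min replaces the exact T = 0 test so that schedules that only approach zero still terminate.

```python
import math
import random

def simulated_annealing(initial, successor, value, schedule, T_min=1e-6):
    """Simulated annealing, maximization form of the slide's pseudocode."""
    current = initial
    t = 1
    while True:
        T = schedule(t)
        if T <= T_min:                     # stand-in for the "if T = 0" test
            return current
        nxt = successor(current)           # randomly selected successor of current
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt                  # accept: always if improving, else with prob e^(dE/T)
        t += 1
```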
86. Outline
- Today
- Local Search
- Meta-Heuristic Search
- Specific Search Strategies
- Simulated Annealing
- Stochastic Machines
- Convergence Analysis
87. Convergence Analysis of Simulated Annealing
- Ref: E. Aarts, J. Korst, and P. van Laarhoven, "Simulated Annealing," in Local Search in Combinatorial Optimization, E. Aarts and J. Lenstra (eds.), 1997, pp. 98-104.
88. Boltzmann distribution
- At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B as P(A)/P(B) = exp((E(B) − E(A)) / T),
- where E(A) and E(B) are the energies associated with states A and B.
89. Simulated annealing in practice
- Geman & Geman (1984): if T is lowered sufficiently slowly (with respect to the number of iterations used to optimize at a given T), simulated annealing is guaranteed to find the global minimum.
- Caveat: this algorithm has no end (Geman & Geman's T-decrease schedule is 1/log of the number of iterations, so T will never reach zero), so it may take an infinite amount of time to find the global minimum.
90. Simulated annealing algorithm
- Idea: escape local extrema by allowing bad moves, but gradually decrease their size and frequency.
Algorithm, when the goal is to minimize E:
- if ΔE < 0, accept the move;
- otherwise, accept it with probability exp(−ΔE / T).
91. Note on simulated annealing: limit cases
- Boltzmann distribution: accept a bad move with ΔE < 0 (goal is to maximize E) with probability P(ΔE) = exp(ΔE/T)
- If T is large: for ΔE < 0,
- ΔE/T is negative and very small in magnitude,
- so exp(ΔE/T) is close to 1,
- and bad moves are accepted with high probability → random walk
- If T is near 0: for ΔE < 0,
- ΔE/T is negative and very large in magnitude,
- so exp(ΔE/T) is close to 0,
- and bad moves are accepted with low probability → deterministic downhill
92. Markov Model
- Solution ↔ State
- Cost of a solution ↔ Energy of a state
- Generation probability of state j from state i: G_ij
93. Acceptance and Transition Probabilities
- Acceptance probability of state j as the next state at state i
- Transition probability from state i to state j
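Following the standard formulation in the Aarts-Korst reference (the slide's own formulas did not survive conversion), with control parameter c playing the role of temperature, these probabilities are usually written as

\[
A_{ij}(c) = \min\!\left\{1,\; \exp\!\left(-\frac{C(j) - C(i)}{c}\right)\right\},
\qquad
P_{ij}(c) =
\begin{cases}
G_{ij}(c)\,A_{ij}(c), & j \neq i,\\
1 - \sum_{k \neq i} G_{ik}(c)\,A_{ik}(c), & j = i.
\end{cases}
\]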
94. Irreducibility and Aperiodicity of M.C.
95. Theorem 1: Existence of a Unique Stationary State Distribution
- Finite homogeneous M.C.
- Irreducibility + Aperiodicity
- ⇒ existence of a unique stationary distribution
96. Theorem 2: Asymptotic Convergence of Simulated Annealing
- P(k): the transition matrix of the homogeneous M.C. associated with the S.A. algorithm
- c_k = c for all k (fixed control parameter)
- ⇒ existence of a unique stationary distribution
97. Asymptotic Convergence of Simulated Annealing
From Theorem 2