1
Another Stochastic Technique
2
Stochastic Methods
  • For problems of high dimensionality and/or
    complexity, i.e. a very large search space,
    stochastic methods are often the only feasible
    methods for global optimization.

3
Stochastic Techniques
  • Simulated annealing
  • Start at a random location in the solution space
    and update the parameter(s) by some random
    amount.
  • If the new solution is better, accept it.
  • If the new solution is not better, accept it
    probabilistically.
  • With each group of steps, reduce the probability
    of accepting a solution that does not improve.
  • Stop when this probability goes to 0 (a sketch of
    the loop follows).
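A minimal sketch of this loop in Python; the objective f, step size, starting temperature, and cooling rate below are illustrative assumptions, not prescribed by the slides:

    import math
    import random

    def anneal(f, x, T=1.0, T_min=1e-6, alpha=0.9, steps_per_T=100):
        # Minimize f over the parameter list x by simulated annealing.
        E = f(x)
        while T > T_min:
            for _ in range(steps_per_T):
                # random perturbation of the parameters
                cand = [xi + random.uniform(-0.1, 0.1) for xi in x]
                E_cand = f(cand)
                # better: always accept; worse: accept with prob e^(-dE/T)
                if E_cand < E or random.random() < math.exp((E - E_cand) / T):
                    x, E = cand, E_cand
            T *= alpha  # cooling: worse moves become ever less likely
        return x, E

    # toy usage: minimize a simple quadratic bowl
    print(anneal(lambda v: sum(vi * vi for vi in v),
                 [random.uniform(-2, 2) for _ in range(3)]))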

4
Simulated Annealing
  • The success of global search methods in finding a
    global maximum/minimum hinges on a balance
    between an exploration process, a guidance
    process, and a convergence-inducing process.
  • Exploration process: gives the search a mechanism
    for sampling a sufficiently diverse set of
    points. This process is usually stochastic in
    nature.
  • Guidance process: an explicit or implicit process
    that evaluates the relative quality of two search
    points. It biases the search toward regions of
    high quality or improving solutions.

5
Simulated Annealing
  • Convergence-inducing process: ensures the
    ultimate convergence of the search to a fixed
    optimal solution.
  • Simulated Annealing
  • Does not use gradient information to guide its
    search.
  • Is thus applicable to a wider range of problems,
    i.e. those for which the gradient is expensive to
    compute or cannot be computed.

6
Simulated Annealing
  • Has its foundation in metallurgy
  • If one heats a metal to a high temperature, then
    the crystalline structure to which it stabilizes
    as it cools is a function of the rate at which it
    cools
  • If cooled slowly enough, it will settle into a
    minimum-energy state, i.e. an optimal (strongest)
    crystalline structure.
  • In optimization applications, simulated annealing
    is used as a training/searching technique.

7
Simulated Annealing
  • Another, less accurate, way to think of simulated
    annealing is to consider a cylinder containing a
    bunch of objects of different shapes. How would
    you go about packing them in as efficiently as
    possible?
  • As you add them to the cylinder, keep shaking it;
    shake vigorously at first and more gently as you
    go, so the objects settle into a dense packing.

8
Simulated Annealing
  • The important thing to remember about simulated
    annealing is that if you start at a high enough
    temperature, and decrease the temperature (cool)
    slowly enough, you are guaranteed to find the
    global minimum.
  • Questions: How high is high enough, and how slow
    is slow enough?

9
SA Algorithm
  • As an example, we will use a neural network
    problem.
  • Let's say we have a two-layer NN with 3 hidden-layer
    neurons (tanh) and one linear output-layer neuron.
    The input vector is X = [x1, ..., x4]. We will
    combine the bias into the weight vector, so each
    first-layer weight vector is W1 = [w1, ..., w5].
    The output-layer weight vector is W2 = [w1, ..., w4].

10
SA Algorithm
  • We will assume this is a function approximation
    problem. Thus, the goal is to minimize the mean
    squared error.
  • Another way to think of this is to say that we
    want to find that set of weights W which will
    minimize the error for the training data.
  • For any given set of weights (there are 19 in
    total for this network), find the mean squared
    error using these weights and the training data
    to perform the test (a sketch follows).
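A sketch of that evaluation in Python. The weight layout (five weights per hidden neuron including bias, four for the output neuron) follows the slide; the (x, y) data format is an assumption:

    import math

    def mse(W, data):
        # Mean squared error of the 4-3-1 tanh network for the
        # 19-element weight vector W over (x, y) training pairs.
        # W[5j : 5j+4] are hidden neuron j's input weights, W[5j+4] its
        # bias; W[15:18] are the output weights, W[18] the output bias.
        err = 0.0
        for x, y in data:
            h = [math.tanh(sum(w * xi for w, xi in zip(W[5 * j:5 * j + 4], x))
                           + W[5 * j + 4])
                 for j in range(3)]                   # hidden layer (tanh)
            out = sum(w * hj for w, hj in zip(W[15:18], h)) + W[18]  # linear
            err += (out - y) ** 2
        return err / len(data)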

11
SA Algorithm
  • Start
  • First select an initial temperature and an
    annealing (cooling) schedule.
  • Initialization
  • Initialize the W's to small random values (just
    as with a BPNN algorithm).
  • Evaluate (compute) the mean squared error E(W_i)
    for the training data (sketched below).
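Continuing the sketch (the mse function above); the placeholder training set, starting temperature, and cooling rate are assumptions:

    import random

    train = [([random.random() for _ in range(4)], random.random())
             for _ in range(20)]                        # placeholder data

    W = [random.uniform(-0.1, 0.1) for _ in range(19)]  # small random weights
    E = mse(W, train)                                   # initial E(W_i)
    T, alpha = 1000.0, 0.9                              # assumed T0, cooling rate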

12
SA Algorithm
  • Select a single random weight and add a small
    random value to it (positive or negative).
  • Compute E(W_{i+1}).
  • If E(W_{i+1}) < E(W_i), let W_{i+1} be the new W_i.
  • Else
  • If E(W_{i+1}) > E(W_i), let W_{i+1} be the new W_i
    with probability e^(-ΔE/T), where ΔE = E(W_{i+1})
    - E(W_i) and T is the temperature.
  • Reduce the temperature according to the cooling
    schedule and repeat (a sketch of one step follows).
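One perturbation step of this loop, as a sketch building on the pieces above (the step size is an assumption):

    import math
    import random

    def sa_step(W, E, T, data, step=0.05):
        # Nudge one random weight, then accept or reject the change.
        i = random.randrange(len(W))
        W_new = list(W)
        W_new[i] += random.uniform(-step, step)   # small +/- random value
        E_new = mse(W_new, data)
        dE = E_new - E
        # accept improvements outright; accept worse moves with e^(-dE/T)
        if dE < 0 or random.random() < math.exp(-dE / T):
            return W_new, E_new
        return W, E                               # rejected: keep old weights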

13
How is the probability evaluated?
  • In the previous slide, we said that if the error
    increases, we still accept the change with
    probability e^(-ΔE/T).
  • Note this value is always < 1 (for ΔE > 0).
  • If ΔE does not change, then as T gets smaller,
    e^(-ΔE/T) gets smaller.
  • Let's say that e^(-ΔE/T) = 0.25. Then we generate
    a random number P from a uniform distribution on
    [0, 1), and accept the change only if P < 0.25
    (a sketch follows).
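In code, this acceptance test is one comparison against a uniform draw (a sketch; delta and T as defined above):

    import math
    import random

    def accept_worse(delta, T):
        # Accept a worsening move with probability e^(-delta/T).
        p = math.exp(-delta / T)       # always < 1 when delta > 0
        return random.random() < p     # e.g. p = 0.25 accepts ~25% of draws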

14
SA Algorithm
  • The question now is, when do we stop? Some
    candidates (two are sketched below):
  • T = 0?
  • No improvement for ?? epochs?
  • Validation set is worse N times?
  • Error < ε?
  • Continue until the number of accepted
    perturbations is small?
  • ???
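As a sketch, two of these criteria combined into one loop guard (both thresholds are assumptions):

    T_MIN, MAX_FLAT = 1e-4, 50        # assumed thresholds

    def should_stop(T, epochs_without_improvement):
        # Stop when T is effectively zero or progress has stalled.
        return T < T_MIN or epochs_without_improvement >= MAX_FLAT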

15
Annealing/Cooling Schedule
  • There are lots of cooling schedules
  • There are three questions to ask about a cooling
    schedule
  • What (how large) is the initial temperature?
  • The larger the initial temperature, the slower
    the cooling
  • How often do we cool?
  • After how many updates do we lower the
    temperature?
  • By how much do we cool?
  • By how much do we reduce the temperature?

16
Annealing/Cooling Schedule
  • How often do we cool?
  • One does not generally cool after every test
    value (perturbation) is presented
  • Since we are selecting values to change randomly,
    if we have N values or weights, it will likely
    take > N selections to change every weight at
    least once.
  • Generally, one would not want to reduce the
    temperature more often than every 10-100 epochs,
    where an epoch is the presentation of N
    perturbations (a sketch follows).
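A sketch of this cadence using the pieces above: one epoch is N perturbations, and we cool once every K epochs (K and T_min are assumptions):

    def anneal_nn(W, data, T=1000.0, alpha=0.9, T_min=1e-4, K=10):
        # Cool once per K epochs; one epoch = len(W) perturbations.
        E = mse(W, data)
        N = len(W)                    # 19 weights for this network
        while T > T_min:
            for _ in range(K * N):    # K epochs between coolings
                W, E = sa_step(W, E, T, data)
            T *= alpha                # reduce the temperature once
        return W, E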

17
Annealing/Cooling Schedule (one Example)
  • How much do we cool?
  • Let's say we have chosen an initial temperature
    T0; typically we want T1 / T0 = 0.7-0.9.
  • We would then perform K epochs of perturbations,
    and then update T.

18
Annealing/Cooling Schedule (one Example)
  • Let's say T1 = 900 and T0 = 1000, thus
    T1 / T0 = 0.9.
  • We test for one epoch with T = T1; we won't use
    T0 itself.
  • Now we compute T2 = (0.9)(900) = 810, and
    perturb for one epoch.

19
Annealing/Cooling Schedule (one Example)
  • Now, after the perturbations with T = 810, we
    compute T3 = (0.9)(810) = 729,
  • etc. (the sequence is verified below)
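The resulting temperature sequence is easy to check (a sketch):

    T = 1000.0
    for k in range(1, 5):
        T *= 0.9
        print(f"T{k} = {T:g}")   # T1 = 900, T2 = 810, T3 = 729, T4 = 656.1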

20
Other schedules
  • Let k be the epoch (time step) count.
  • T_{k+1} = T0 · α^(k+1), i.e. T_{k+1} = α · T_k
    (geometric cooling, as in the example above)
  • T_{k+1} = c / log(k+1), where c is a user-set
    constant that could be T0 (both schedules are
    sketched below)
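Both schedules as Python functions; note the logarithmic one needs k >= 1 so the log is nonzero (a sketch; the function names are mine):

    import math

    def geometric(T0, alpha, k):
        # T_{k+1} = T0 * alpha**(k+1), i.e. multiply by alpha each step.
        return T0 * alpha ** (k + 1)

    def logarithmic(c, k):
        # T_{k+1} = c / log(k+1); the classical "slow enough" schedule
        # behind the global-convergence guarantee. Requires k >= 1.
        return c / math.log(k + 1)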

21
Simulated Annealing
  • On the plus side, it can be shown (statistically)
    that if simulated annealing is used properly,
    i.e. cooled slowly enough, one is assured of
    reaching the global optimum.
  • On the minus side, simulated annealing is very
    time-intensive, i.e. slow.

23
Homework: Traveling Salesperson Problem
  • Given a 2D array in which each (row,column)
    element represents a distance from city row to
    city column.
  • Find the minimal length transit that visits all
    cities once and only once.
  • For 30 cities there are > 10^31 possible paths.

24
Traveling Salesperson Problem
  1. Choose a cooling schedule
  2. Choose an initial temperature
  3. Generate an initial list or city sequence
  4. Perturb this list
  5. Compute the new distance
  6. Accept if better, or with probability P accept
     if worse
  7. If at the end of an epoch, cool
  8. Go to 4 if not completed (a sketch follows)
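A sketch of the whole recipe in Python, using random city coordinates and a two-city swap as the perturbation; the swap move, epoch size, and all constants are assumptions, not part of the assignment:

    import math
    import random

    def tour_length(dist, tour):
        # total length of the closed tour
        return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
                   for i in range(len(tour)))

    def tsp_anneal(dist, T=1000.0, alpha=0.9, epochs=100):
        n = len(dist)
        tour = list(range(n))
        random.shuffle(tour)                  # step 3: initial city sequence
        length = tour_length(dist, tour)
        for _ in range(epochs):
            for _ in range(n * n):            # one epoch of perturbations
                i, j = random.sample(range(n), 2)
                cand = list(tour)
                cand[i], cand[j] = cand[j], cand[i]   # step 4: perturb (swap)
                cand_len = tour_length(dist, cand)    # step 5: new distance
                dE = cand_len - length
                # step 6: accept if better, else with probability e^(-dE/T)
                if dE < 0 or random.random() < math.exp(-dE / T):
                    tour, length = cand, cand_len
            T *= alpha                        # step 7: cool at epoch's end
        return tour, length

    # toy usage: 10 random cities in the unit square
    cities = [(random.random(), random.random()) for _ in range(10)]
    dist = [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in cities]
            for a in cities]
    print(tsp_anneal(dist))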