1
Another Stochastic Technique
2
Stochastic Methods
  • For problems of high dimensionality and/or
    complexity, i.e. a very large search space,
    stochastic methods are often the only feasible
    methods for global optimization.

3
Stochastic Techniques
  • Simulated annealing
  • Start at a random location in the solution space
    and update the parameter(s) by some random
    amount.
  • If the new solution is better, accept it.
  • If the new solution is not better, accept it
    probabilistically.
  • With each group of steps, reduce the probability
    of accepting a solution that does not improve.
  • Stop when this probability goes to 0 (a sketch of
    the loop follows).
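A minimal sketch of this loop in Python; the objective f, step size, starting temperature, and cooling rate below are illustrative assumptions, not prescribed by the slides:

    import math
    import random

    def anneal(f, x, T=1.0, T_min=1e-6, alpha=0.9, steps_per_T=100):
        # Minimize f over the parameter list x by simulated annealing.
        E = f(x)
        while T > T_min:
            for _ in range(steps_per_T):
                # random perturbation of the parameters
                cand = [xi + random.uniform(-0.1, 0.1) for xi in x]
                E_cand = f(cand)
                # better: always accept; worse: accept with prob e^(-dE/T)
                if E_cand < E or random.random() < math.exp((E - E_cand) / T):
                    x, E = cand, E_cand
            T *= alpha  # cooling: worse moves become ever less likely
        return x, E

    # toy usage: minimize a simple quadratic bowl
    print(anneal(lambda v: sum(vi * vi for vi in v),
                 [random.uniform(-2, 2) for _ in range(3)]))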

4
Simulated Annealing
  • The success of global search methods in finding a
    global maximum/minimum hinges on a balance
    between an exploration process, a guidance
    process, and a convergence-inducing process.
  • Exploration process: gives the search a mechanism
    for sampling a sufficiently diverse set of
    points. This process is usually stochastic in
    nature.
  • Guidance process: an explicit or implicit process
    that evaluates the relative quality of two search
    points. It biases the search toward regions of
    high quality or improving solutions.

5
Simulated Annealing
  • Convergence-inducing process: ensures the
    ultimate convergence of the search to a fixed
    optimal solution.
  • Simulated Annealing
  • Does not use gradient information to guide its
    search.
  • Is thus applicable to a wider range of problems,
    i.e. those for which the gradient is expensive to
    compute or cannot be computed.

6
Simulated Annealing
  • Has its foundation in metallurgy
  • If one heats a metal to a high temperature, then
    the crystalline structure to which it stabilizes
    as it cools is a function of the rate at which it
    cools
  • If cooled slowly enough, it will settle into a
    minimum-energy state, i.e. an optimal (strongest)
    crystalline structure.
  • In optimization applications, simulated annealing
    is used as a training/searching technique.

7
Simulated Annealing
  • Another, less accurate, way to think of simulated
    annealing is to consider a cylinder containing a
    bunch of objects of different shapes. How would
    you go about packing them in as efficiently as
    possible?
  • As you add them to the cylinder, keep shaking it;
    shake vigorously at first and more gently as you
    go, so the objects settle into a dense packing.

8
Simulated Annealing
  • The important thing to remember about simulated
    annealing is that if you start at a high enough
    temperature, and decrease the temperature (cool)
    slowly enough, you are guaranteed to find the
    global minimum.
  • Questions: How high is high enough, and how slow
    is slow enough?

9
SA Algorithm
  • As an example, we will use a neural network
    problem.
  • Let's say we have a two-layer NN with 3 hidden-layer
    neurons (tanh) and one linear output-layer neuron.
    The input vector is X = [x1, ..., x4]. We will
    combine the bias into the weight vector, so each
    first-layer weight vector is W1 = [w1, ..., w5].
    The output-layer weight vector is W2 = [w1, ..., w4].

10
SA Algorithm
  • We will assume this is a function approximation
    problem. Thus, the goal is to minimize the mean
    squared error.
  • Another way to think of this is to say that we
    want to find that set of weights W which will
    minimize the error for the training data.
  • For any given set of weights (there are 19 in
    total for this network), find the mean squared
    error using these weights and the training data
    to perform the test (a sketch follows).
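A sketch of that evaluation in Python. The weight layout (five weights per hidden neuron including bias, four for the output neuron) follows the slide; the (x, y) data format is an assumption:

    import math

    def mse(W, data):
        # Mean squared error of the 4-3-1 tanh network for the
        # 19-element weight vector W over (x, y) training pairs.
        # W[5j : 5j+4] are hidden neuron j's input weights, W[5j+4] its
        # bias; W[15:18] are the output weights, W[18] the output bias.
        err = 0.0
        for x, y in data:
            h = [math.tanh(sum(w * xi for w, xi in zip(W[5 * j:5 * j + 4], x))
                           + W[5 * j + 4])
                 for j in range(3)]                   # hidden layer (tanh)
            out = sum(w * hj for w, hj in zip(W[15:18], h)) + W[18]  # linear
            err += (out - y) ** 2
        return err / len(data)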

11
SA Algorithm
  • Start
  • First select an initial temperature and an
    annealing (cooling) schedule.
  • Initialization
  • Initialize the W's to small random values (just
    as with a BPNN algorithm).
  • Evaluate (compute) the mean squared error E(W_i)
    for the training data (sketched below).
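Continuing the sketch (the mse function above); the placeholder training set, starting temperature, and cooling rate are assumptions:

    import random

    train = [([random.random() for _ in range(4)], random.random())
             for _ in range(20)]                        # placeholder data

    W = [random.uniform(-0.1, 0.1) for _ in range(19)]  # small random weights
    E = mse(W, train)                                   # initial E(W_i)
    T, alpha = 1000.0, 0.9                              # assumed T0, cooling rate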

12
SA Algorithm
  • Select a single random weight and add a small
    random value to it (positive or negative).
  • Compute E(W_{i+1}).
  • If E(W_{i+1}) < E(W_i), let W_{i+1} be the new W_i.
  • Else
  • If E(W_{i+1}) > E(W_i), let W_{i+1} be the new W_i
    with probability e^(-ΔE/T), where ΔE = E(W_{i+1})
    - E(W_i) and T is the temperature.
  • Reduce the temperature according to the cooling
    schedule and repeat (a sketch of one step follows).
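One perturbation step of this loop, as a sketch building on the pieces above (the step size is an assumption):

    import math
    import random

    def sa_step(W, E, T, data, step=0.05):
        # Nudge one random weight, then accept or reject the change.
        i = random.randrange(len(W))
        W_new = list(W)
        W_new[i] += random.uniform(-step, step)   # small +/- random value
        E_new = mse(W_new, data)
        dE = E_new - E
        # accept improvements outright; accept worse moves with e^(-dE/T)
        if dE < 0 or random.random() < math.exp(-dE / T):
            return W_new, E_new
        return W, E                               # rejected: keep old weights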

13
How is the probability evaluated?
  • In the previous slide, we said that if the error
    increases, we still accept the change with
    probability e^(-ΔE/T).
  • Note this value is always < 1 (for ΔE > 0).
  • If ΔE does not change, then as T gets smaller,
    e^(-ΔE/T) gets smaller.
  • Let's say that e^(-ΔE/T) = 0.25. Then we generate
    a random number P from a uniform distribution on
    [0, 1), and accept the change only if P < 0.25
    (a sketch follows).
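In code, this acceptance test is one comparison against a uniform draw (a sketch; delta and T as defined above):

    import math
    import random

    def accept_worse(delta, T):
        # Accept a worsening move with probability e^(-delta/T).
        p = math.exp(-delta / T)       # always < 1 when delta > 0
        return random.random() < p     # e.g. p = 0.25 accepts ~25% of draws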

14
SA Algorithm
  • The question now is, when do we stop? Some
    candidates (two are sketched below):
  • T = 0?
  • No improvement for ?? epochs?
  • Validation set is worse N times?
  • Error < ε?
  • Continue until the number of accepted
    perturbations is small?
  • ???
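As a sketch, two of these criteria combined into one loop guard (both thresholds are assumptions):

    T_MIN, MAX_FLAT = 1e-4, 50        # assumed thresholds

    def should_stop(T, epochs_without_improvement):
        # Stop when T is effectively zero or progress has stalled.
        return T < T_MIN or epochs_without_improvement >= MAX_FLAT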

15
Annealing/Cooling Schedule
  • There are lots of cooling schedules
  • There are three questions to ask about a cooling
    schedule
  • What (how large) is the initial temperature?
  • The larger the initial temperature, the slower
    the cooling
  • How often do we cool?
  • After how many updates do we lower the
    temperature?
  • By how much do we cool?
  • By how much do we reduce the temperature?

16
Annealing/Cooling Schedule
  • How often do we cool?
  • One does not generally cool after every test
    value (perturbation) is presented
  • Since we are selecting values to change randomly,
    if we have N values or weights, it will likely
    take > N selections to change every weight at
    least once.
  • Generally, one would not want to reduce the
    temperature more often than every 10-100 epochs,
    where an epoch is the presentation of N
    perturbations (a sketch follows).
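A sketch of this cadence using the pieces above: one epoch is N perturbations, and we cool once every K epochs (K and T_min are assumptions):

    def anneal_nn(W, data, T=1000.0, alpha=0.9, T_min=1e-4, K=10):
        # Cool once per K epochs; one epoch = len(W) perturbations.
        E = mse(W, data)
        N = len(W)                    # 19 weights for this network
        while T > T_min:
            for _ in range(K * N):    # K epochs between coolings
                W, E = sa_step(W, E, T, data)
            T *= alpha                # reduce the temperature once
        return W, E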

17
Annealing/Cooling Schedule (one Example)
  • How much do we cool?
  • Let's say we have chosen an initial temperature
    T0; typically we want T1 / T0 = 0.7-0.9.
  • We would then perform K epochs of perturbations,
    and then update T.

18
Annealing/Cooling Schedule (one Example)
  • Let's say T1 = 900 and T0 = 1000, thus
    T1 / T0 = 0.9.
  • We test for one epoch with T = T1; we won't use
    T0 itself.
  • Now we compute T2 = (0.9)(900) = 810, and
    perturb for one epoch.

19
Annealing/Cooling Schedule (one Example)
  • Now, after the perturbations with T = 810, we
    compute T3 = (0.9)(810) = 729,
  • etc. (the sequence is verified below)
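The resulting temperature sequence is easy to check (a sketch):

    T = 1000.0
    for k in range(1, 5):
        T *= 0.9
        print(f"T{k} = {T:g}")   # T1 = 900, T2 = 810, T3 = 729, T4 = 656.1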

20
Other schedules
  • Let k be the epoch (time step) count.
  • T_{k+1} = T0 · α^(k+1), i.e. T_{k+1} = α · T_k
    (geometric cooling, as in the example above)
  • T_{k+1} = c / log(k+1), where c is a user-set
    constant that could be T0 (both schedules are
    sketched below)
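Both schedules as Python functions; note the logarithmic one needs k >= 1 so the log is nonzero (a sketch; the function names are mine):

    import math

    def geometric(T0, alpha, k):
        # T_{k+1} = T0 * alpha**(k+1), i.e. multiply by alpha each step.
        return T0 * alpha ** (k + 1)

    def logarithmic(c, k):
        # T_{k+1} = c / log(k+1); the classical "slow enough" schedule
        # behind the global-convergence guarantee. Requires k >= 1.
        return c / math.log(k + 1)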

21
Simulated Annealing
  • On the plus side, it can be shown (statistically)
    that if simulated annealing is used properly,
    i.e. cooled slowly enough, one is assured of
    reaching the global optimum.
  • On the minus side, simulated annealing is very
    time-intensive, i.e. slow.

23
Homework: Traveling Salesperson Problem
  • Given a 2D array in which each (row,column)
    element represents a distance from city row to
    city column.
  • Find the minimal length transit that visits all
    cities once and only once.
  • For 30 cities there are > 10^31 possible paths.

24
Traveling Salesperson Problem
  1. Choose a cooling schedule
  2. Choose an initial temperature
  3. Generate an initial list or city sequence
  4. Perturb this list
  5. Compute the new distance
  6. Accept if better, or with probability P accept
     if worse
  7. If at the end of an epoch, cool
  8. Go to 4 if not completed (a sketch follows)
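A sketch of the whole recipe in Python, using random city coordinates and a two-city swap as the perturbation; the swap move, epoch size, and all constants are assumptions, not part of the assignment:

    import math
    import random

    def tour_length(dist, tour):
        # total length of the closed tour
        return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
                   for i in range(len(tour)))

    def tsp_anneal(dist, T=1000.0, alpha=0.9, epochs=100):
        n = len(dist)
        tour = list(range(n))
        random.shuffle(tour)                  # step 3: initial city sequence
        length = tour_length(dist, tour)
        for _ in range(epochs):
            for _ in range(n * n):            # one epoch of perturbations
                i, j = random.sample(range(n), 2)
                cand = list(tour)
                cand[i], cand[j] = cand[j], cand[i]   # step 4: perturb (swap)
                cand_len = tour_length(dist, cand)    # step 5: new distance
                dE = cand_len - length
                # step 6: accept if better, else with probability e^(-dE/T)
                if dE < 0 or random.random() < math.exp(-dE / T):
                    tour, length = cand, cand_len
            T *= alpha                        # step 7: cool at epoch's end
        return tour, length

    # toy usage: 10 random cities in the unit square
    cities = [(random.random(), random.random()) for _ in range(10)]
    dist = [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in cities]
            for a in cities]
    print(tsp_anneal(dist))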