CS 478 Machine Learning - PowerPoint PPT Presentation
1
CS 478 - Machine Learning
  • Genetic Algorithms (I)

2
Darwin's Origin of Species: Basic Principles (I)
  • Individuals survive based on their ability to
    adapt to the pressures of their environment
    (i.e., their fitness)
  • Fitter individuals tend to have more offspring,
    thus driving the population as a whole towards
    favorable traits
  • During reproduction, the traits found in parents
    are passed onto their offspring
  • In sexual reproduction, the chromosomes of the
    offspring are a mix of those of their parents
  • The traits of offspring are partially inherited
    from their parents and partially the result of
    new genes/traits created during the process of
    reproduction
  • Nature produces individuals with differing traits
  • Over long periods, variations can accumulate,
    producing entirely new species whose traits make
    them especially suited to particular ecological
    niches

3
Darwin's Origin of Species: Basic Principles (II)
  • Evolution is effected via two main genetic
    mechanisms
  • Crossover
  • Take 2 candidate chromosomes
  • Randomly choose 1 or 2 crossover points
  • Swap the respective components to create 2 new
    chromosomes
  • Mutation
  • Choose a single offspring
  • Randomly change some aspect of it

4
Intuition
  • Essentially a pseudo-random walk through the
    population with the aim of maximizing some
    fitness function
  • From a starting population
  • Crossover ensures exploitation
  • Mutation ensures exploration
  • GAs are based on these principles

5
Natural vs Artificial
  • Individual ↔ Candidate solution
  • Population ↔ Set of candidate solutions
  • Fitness ↔ Measure of quality of solutions
  • Chromosome ↔ Encoding of a candidate solution
  • Gene ↔ Part of the encoding of a solution
  • Crossover and mutation ↔ Search operators
  • Natural selection ↔ Re-use of good (sub-)solutions

6
Phenotype vs. Genotype
  • In Genetic Algorithms (GAs), there is a clear
    distinction between phenotype (i.e., the actual
    individual or solution) and genotype (i.e., the
    individual's encoding or chromosome). The GA, as
    in nature, acts on genotypes only. Hence, the
    natural process of growth must be implemented as
    a genotype-to-phenotype decoding.
  • The original formulation of genetic algorithms
    relied on a binary-encoding of solutions, where
    chromosomes are strings of 0s and 1s. Individuals
    can then be anything so long as there is a way of
    encoding/decoding them using binary strings.

7
Simple GA
  • One often distinguishes between two types of
    genetic algorithms, based on whether there is a
    complete or partial replacement of the population
    between generations (i.e., whether there is
    overlap or not between generations).
  • When there is complete replacement, the GA is
    said to be generational, whilst when replacement
    is only partial, the GA is said to be steady-state.
    If you look carefully at the algorithms below,
    you will notice that even the generational GA
    gives only partial replacement when cloning takes
    place (i.e., cloning causes overlap between
    generations). Moreover, if steady-state is
    performed on the whole population (rather than on
    a proportion of fittest individuals), then the GA
    is generational.
  • Hence, the distinction is more a matter of how
    reproduction takes place than a matter of overlap.

8
Generational GA
  • Randomly generate a population of chromosomes
  • While (termination condition not met)
  • Decode chromosomes into individuals
  • Evaluate fitness of all individuals
  • Select fittest individuals
  • Generate new population by cloning, crossover and
    mutation

9
Steady-state GA
  • Randomly generate a population of chromosomes
  • While (termination condition not met)
  • Decode chromosomes into individuals
  • Evaluate fitness of all individuals
  • Select fittest individuals
  • Produce offspring by crossover and mutation
  • Replace weakest individuals with offspring

10
Genetic Encoding / Decoding
  • We focus on binary encodings of solutions
  • We first look at single parameters (i.e., single
    gene chromosomes) and then vectors of parameters
    (i.e., multi-gene chromosomes)

11
Integer Parameters
  • Let p be the parameter to be encoded. There are
    three distinct cases to consider
  • p takes values from {0, 1, ..., 2^N - 1} for some N
  • Then p can be encoded directly by its equivalent
    binary representation
  • p takes values from {M, M+1, ..., M+2^N - 1} for
    some M and N
  • Then (p - M) can be encoded directly by its
    equivalent binary representation
  • p takes values from {0, 1, ..., L-1} for some L
    such that there exists no N for which L = 2^N
  • Then there are two possibilities: clipping or
    scaling

12
Clipping
  • Clipping consists of taking N = ⌊log2(L)⌋ + 1
    bits and encoding all parameter values 0 ≤ p ≤ L-2
    by their equivalent binary representation, letting
    all other N-bit strings serve as encodings of
    p = L-1
  • For example, assume p takes values in {0, 1, 2,
    3, 4, 5}, i.e., L = 6. Then N = ⌊log2(6)⌋ + 1 = 3
  • Here, not only is 101 an (expected) encoding of
    p = L-1 = 5, but so are 110 and 111
  • Advantages: easy to implement
  • Disadvantages: strong representational bias,
    i.e., all parameter values between 0 and L-2 have
    a single encoding, whilst the single parameter
    value L-1 has 2^N - L + 1 encodings
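As a minimal sketch (the function name is ours, not from the slides), clipped decoding can be written as:

```python
import math

def clip_decode(bits: str, L: int) -> int:
    """Decode an N-bit string under clipping: any code >= L-1 maps to L-1."""
    return min(int(bits, 2), L - 1)

# L = 6 needs N = floor(log2(6)) + 1 = 3 bits
N = math.floor(math.log2(6)) + 1
assert N == 3
# 101, 110 and 111 all decode to p = 5; every other value has one encoding
assert [clip_decode(format(v, "03b"), 6) for v in range(8)] == [0, 1, 2, 3, 4, 5, 5, 5]
```

The final assertion makes the representational bias concrete: the single value 5 absorbs the 2^N - L + 1 = 3 surplus codes.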

13
Scaling
  • Scaling consists of taking N = ⌊log2(L)⌋ + 1 bits
    and encoding p by the binary representation of an
    integer value e such that p = round(e(L-1)/(2^N - 1))
  • For example, assume p takes values in {0, 1, 2,
    3, 4, 5}, i.e., L = 6. Then N = ⌊log2(6)⌋ + 1 = 3
  • Here, the binary encodings are not generally
    numerically equivalent to the integer values they
    code
  • Advantages: easy to implement and smaller
    representational bias than clipping (each value
    of p has 1 or 2 encodings, with double encodings
    evenly spread over the values of p)
  • Disadvantages: more computation needed and still
    a small representational bias
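A short sketch of scaled decoding (our helper name; rounding is one plausible reading of the slide's relation between p and e):

```python
def scale_decode(e: int, L: int, N: int) -> int:
    """Decode the N-bit integer e as p = round(e * (L-1) / (2**N - 1))."""
    return round(e * (L - 1) / (2**N - 1))

# With L = 6 and N = 3, the 8 codes spread over the 6 parameter values,
# so each value of p has 1 or 2 encodings, evenly spread
decoded = [scale_decode(e, 6, 3) for e in range(8)]
assert decoded == [0, 1, 1, 2, 3, 4, 4, 5]
```

Unlike clipping, no single value hoards the surplus codes; here 1 and 4 each get two encodings.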

14
Real-valued Parameters (I)
  • Real values may be encoded as fixed point numbers
    or integers via scaling and quantization
  • If p ranges over [min, max], then p is encoded by
    the binary representation of the integer part of
    (p - min)/(max - min) × (2^N - 1)
15
Real-valued Parameters (II)
  • Real values may also be encoded using thermometer
    encoding
  • Let T be an integer greater than 1
  • Thermometer encoding of real values on T bits
    consists of normalizing all real values to the
    interval [0, 1] and converting each normalized
    value x to a bit-string of ⌊xT⌋ 1s followed by
    trailing 0s as needed
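The encoding above can be sketched in a few lines (function name is ours):

```python
import math

def thermometer(x: float, T: int) -> str:
    """Encode normalized x in [0, 1] as floor(x*T) 1s followed by trailing 0s."""
    ones = min(math.floor(x * T), T)
    return "1" * ones + "0" * (T - ones)

assert thermometer(0.0, 5) == "00000"
assert thermometer(0.5, 5) == "11000"
assert thermometer(1.0, 5) == "11111"
```

The "temperature" rises monotonically with x, which is why a single bit-flip tends to change the decoded value only slightly.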

16
Vectors of Parameters
  • Vectors of parameters are encoded on multi-gene
    chromosomes by combining the encodings of each
    individual parameter
  • Let ei = bi0, ..., biN be the encoding of the ith
    of M parameters
  • There are two possibilities for combining the
    ei's onto a chromosome
  • Concatenating: individual encodings simply
    follow each other in some pre-defined order,
    e.g., b10, ..., b1N, ..., bM0, ..., bMN
  • Interleaving: the bits of each individual
    encoding are interleaved, e.g., b10, ..., bM0,
    ..., b1N, ..., bMN
  • The order of parameters in the vector (resp.,
    genes on the chromosome) is important, especially
    for concatenated encodings
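Both combination schemes are one-liners in Python (function names are ours):

```python
def concatenate(encodings: list[str]) -> str:
    """Genes follow each other: b10..b1N, ..., bM0..bMN."""
    return "".join(encodings)

def interleave(encodings: list[str]) -> str:
    """Bits alternate across genes: b10, ..., bM0, ..., b1N, ..., bMN."""
    return "".join("".join(column) for column in zip(*encodings))

genes = ["101", "010"]  # two 3-bit gene encodings
assert concatenate(genes) == "101010"
assert interleave(genes) == "100110"
```

Interleaving keeps corresponding bits of different genes close together, which changes which groupings 1- and 2-point crossover tend to preserve.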

17
Gray Coding (I)
  • A Gray code represents each number in the
    sequence of integers {0, 1, ..., 2^N - 1} as a
    binary string of length N, such that adjacent
    integers have representations that differ in only
    one bit position
  • A number of different Gray codes exist. One
    simple algorithm to produce Gray codes starts
    with all bits set to 0 and successively flips the
    right-most bit that produces a new string
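The standard reflected Gray code (one of the codes the slide alludes to) has a particularly compact conversion:

```python
def binary_to_gray(n: int) -> int:
    """Reflected Gray code: n XOR (n >> 1)."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Invert the Gray code by cumulative XOR of right-shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Adjacent integers differ in exactly one bit position under Gray coding
for i in range(2**4 - 1):
    assert bin(binary_to_gray(i) ^ binary_to_gray(i + 1)).count("1") == 1
```

The loop checks the defining property directly: no Hamming cliffs between consecutive integers.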

18
Gray Coding (II)
19
Gray Coding (III)
  • Advantages random bit-flips (e.g., during
    mutation) are more likely to produce small
    changes (i.e., there are no Hamming cliffs since
    adjacent integers' representations differ by
    exactly one bit).
  • Disadvantages big changes are rare but bigger
    than with binary codes.
  • For example, consider the string 001. There are 3
    possible bit flips, leading to the strings 000,
    011 and 101
  • With standard binary encoding, 2 of the 3 flips
    lead to relatively large changes (from 001 (=1) to
    011 (=3) and from 001 (=1) to 101 (=5),
    respectively)
  • With Gray coding, 2 of the 3 flips produce small
    changes (from 001 (=1) to 000 (=0) and from 001
    (=1) to 011 (=2), respectively)
  • However, the less probable (1 out of 3) flip from
    001 to 101 produces a bigger change under Gray
    coding (to 6) than under standard binary encoding
    (to 5)

20
GA operators
  • We will restrict our discussion to binary strings
  • The basic GA operators are
  • Selection
  • Crossover
  • Mutation

21
Selection
  • Selection is the operation by which chromosomes
    are selected for reproduction
  • Chromosomes corresponding to individuals with a
    higher fitness have a higher probability of being
    selected
  • There are a number of possible selection schemes
    (we discuss some here)
  • Fitness-based selection makes the following
    assumptions
  • There exists a known quality measure Q for the
    solutions of the problem
  • Finding a solution can be achieved by maximizing
    Q
  • For all potential solutions (good or bad), Q is
    positive.
  • A chromosome's fitness is taken to be the quality
    measure of the individual it encodes

22
Fitness-proportionate Selection
  • This selection scheme is the most widely used in
    GAs
  • Let fi be the fitness value of individual i and
    let favg be the average population fitness
  • Then, the probability of an individual i being
    selected is given by pi = fi / Σj fj = fi / (n · favg),
    where n is the population size

23
Roulette Wheel
  • Fitness-proportionate selection (FPS) can be
    implemented with the roulette-wheel algorithm
  • A wheel is constructed with markers corresponding
    to fitness values
  • For each fitness value fi, the size of the marker
    (i.e., the proportion of the wheel's
    circumference) associated to fi is given by pi as
    defined above
  • Hence, when the wheel is spun, the probability of
    the roulette landing on fi (and thus selecting
    individual i) is given by pi, as expected

24
Vector Representation
  • A vector v of M elements from {1, ..., N} is
    constructed so that each individual i in {1, ...,
    N} has M · pi entries in v
  • A random index r from {1, ..., M} is selected and
    individual v(r) is selected
  • Example
  • 4 individuals such that f1 = f2 = 10, f3 = 15 and
    f4 = 25
  • If M = 12, then v = (1, 1, 2, 2, 3, 3, 3, 4, 4,
    4, 4, 4)
  • Generate r = 6; then individual v(6) = 3 is
    selected
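The vector construction above can be sketched as follows (our helper name; rounding handles fractional M · pi):

```python
def build_selection_vector(fitnesses: list[float], M: int) -> list[int]:
    """Individual i (1-based) gets round(M * p_i) entries, p_i = f_i / sum(f)."""
    total = sum(fitnesses)
    v = []
    for i, f in enumerate(fitnesses, start=1):
        v.extend([i] * round(M * f / total))
    return v

# Slide example: f1 = f2 = 10, f3 = 15, f4 = 25 and M = 12
v = build_selection_vector([10, 10, 15, 25], 12)
assert v == [1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4]
assert v[6 - 1] == 3  # a random index r = 6 selects individual 3
```

Selecting a uniform random index into v then implements fitness-proportionate selection up to the quantization set by M.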

25
Cumulative Distribution
  • A random real-valued number r in [0, 1] is chosen
    and individual i such that Σj<i pj < r ≤ Σj≤i pj
    is selected (if i = 1, the lower bound sum is 0)
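A minimal roulette-wheel implementation of the cumulative-distribution scheme (function name is ours):

```python
import random

def roulette_select(fitnesses: list[float]) -> int:
    """Draw r uniformly from [0, total fitness) and return the first
    individual whose cumulative fitness exceeds r."""
    total = sum(fitnesses)
    r = random.uniform(0.0, total)
    running = 0.0
    for i, f in enumerate(fitnesses):
        running += f
        if r < running:
            return i
    return len(fitnesses) - 1  # guard against floating-point edge cases

# An individual holding all the fitness is always selected
assert roulette_select([0.0, 0.0, 1.0]) == 2
```

Working with raw fitness sums rather than normalized pi's avoids a division per individual.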

26
Discussion (I)
  • The implementation based on cumulative
    distribution is effective but relatively
    inefficient, whilst the implementation based on
    vector representation is efficient but its
    effectiveness depends on M (i.e., the value of M
    determines the level of quantization of the pi's
    and thus accuracy depends on M).
  • Assume that N individuals have to be selected for
    reproduction. The expected number of copies of
    each individual i in the mating pool is
    Ni = N · pi = fi / favg
  • Hence, individuals with above-average fitness
    tend to have more than one copy in the mating
    pool, whilst individuals with below-average
    fitness tend not to be copied. This leads to
    problems with FPS.

27
Discussion (II)
  • Premature convergence
  • Assume an individual X with fi >> favg but fi <<
    fmax is produced in an early generation. As Ni >>
    1, the genes of X quickly spread all over the
    population. At that point, crossover cannot
    generate any new solutions (only mutation can)
    and favg << fmax forever
  • Stagnation
  • Assume that at the end of a run (i.e., in one of
    the later generations) all individuals have a
    relatively high and similar fitness, i.e., fi is
    almost fmax for all i. Then, Ni is almost 1 for
    all i and there is virtually no selective
    pressure
  • Both of these problems can be solved with fitness
    scaling techniques

28
Fitness Scaling
  • Essentially, fitness values are scaled down at
    the beginning and scaled up towards the end
  • There are 3 general scaling methods
  • Linear scaling
  • f is replaced by f' = a·f + b, where a and b are
    chosen such that
  • f'avg = favg (i.e., the scaled average is the
    same as the raw average)
  • f'max = c·favg (c is the number of expected
    copies desired for the best individual; usually
    c = 2)
  • The scaled fitness function may take on negative
    values if there are a few bad individuals with
    fitness much lower than favg and favg is close to
    fmax. One solution is to arbitrarily assign the
    value 0 to all negative fitness values
  • Sigma truncation
  • f is replaced by f' = f - (favg - c·σ), where σ
    is the population standard deviation, c is a
    reasonable multiple of σ (usually 1 ≤ c ≤ 3) and
    negative results are arbitrarily set to 0.
    Truncation removes the problem of scaling to
    negative values. (Note that truncated fitness
    values may also be scaled if desired)
  • Power law scaling
  • f is replaced by f' = f^k for some suitable k.
    This method is not used very often. In general, k
    is problem-dependent and may require dynamic
    change to stretch or shrink the range as needed
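Linear scaling can be sketched by solving the two constraints for a and b (the closed form below is our derivation, not from the slides):

```python
def linear_scale(fitnesses: list[float], c: float = 2.0) -> list[float]:
    """Linear scaling f' = a*f + b with f'_avg = f_avg and f'_max = c*f_avg.
    Solving the two constraints gives a = (c-1)*f_avg/(f_max - f_avg) and
    b = f_avg*(1 - a); negative scaled values are clamped to 0."""
    favg = sum(fitnesses) / len(fitnesses)
    fmax = max(fitnesses)
    if fmax == favg:  # all fitnesses equal: nothing to stretch
        return list(fitnesses)
    a = (c - 1) * favg / (fmax - favg)
    b = favg * (1 - a)
    return [max(0.0, a * f + b) for f in fitnesses]

scaled = linear_scale([1.0, 2.0, 3.0], c=2.0)
assert scaled == [0.0, 2.0, 4.0]  # average preserved (2.0), max = 2 * avg
```

Note the clamping step corresponds to the slide's fix of assigning 0 to negative scaled fitness values.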

29
Rank Selection
  • All individuals are sorted by increasing values
    of their fitness
  • Then, each individual is assigned a probability
    pi of being selected from some prior probability
    distribution
  • Typical distributions include
  • Linear: pi = a·i + b
  • Negative exponential: pi = a·e^(b·i) + c
  • Rank selection (RS) has little biological
    plausibility. However, it has the following
    desirable features
  • No premature convergence. Because of the ranking
    and the probability distribution imposed on it,
    even less fit individuals will be selected (e.g.,
    let there be 3 individuals such that f1 = 90,
    f2 = 7, f3 = 3, and pi = -0.4·i + 1.3. With FPS,
    p1 = 0.9 >> p2 = 0.07 and p3 = 0.03, so that
    individual 1 comes to saturate the population.
    With RS, p1 = 0.9, p2 = 0.5 and p3 = 0.1, so that
    individual 2 is also selected)
  • No stagnation. Even at the end, N1 > N2 > ...
    (similar argument to above)
  • Explicit fitness values not needed. To order
    individuals, only the ability of comparing pairs
    of solutions is necessary.
  • However, rank selection introduces a reordering
    overhead and makes a theoretical analysis of
    convergence difficult
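A sketch of rank selection using the slide's linear prior (function name and the renormalization of the weights are ours; the slide's a = -0.4, b = 1.3 do not sum to 1 over three ranks):

```python
import random

def rank_select(fitnesses: list[float], a: float = -0.4, b: float = 1.3) -> int:
    """Rank selection with linear prior p_i = a*i + b, where rank i = 1 is
    the fittest (as in the slide's example). Draws from the cumulative sum
    of the (renormalized) rank weights; returns an index into the list."""
    order = sorted(range(len(fitnesses)), key=lambda j: -fitnesses[j])
    weights = [max(0.0, a * (i + 1) + b) for i in range(len(order))]
    r = random.uniform(0.0, sum(weights))
    running = 0.0
    for idx, w in zip(order, weights):
        running += w
        if r < running:
            return idx
    return order[-1]
```

Only the ordering of fitness values matters here: replacing f1 = 90 by f1 = 9000 leaves the selection probabilities unchanged, which is exactly why rank selection avoids premature convergence.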

30
Tournament Selection
  • Tournament selection can be viewed as a noisy
    version of rank selection.
  • The selection process is two-stage
  • Select a group of N (≥ 2) individuals
  • Select the individual with the highest fitness
    from the group and discard all others
  • Tournament selection inherits the advantages of
    rank selection. In addition, it does not require
    global reordering and is more naturally inspired.
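The two-stage process above is a few lines of Python (function name is ours):

```python
import random

def tournament_select(fitnesses: list[float], k: int = 2) -> int:
    """Pick k distinct individuals uniformly at random; return the fittest."""
    group = random.sample(range(len(fitnesses)), k)
    return max(group, key=lambda i: fitnesses[i])

# With a tournament over the whole population, the best always wins
assert tournament_select([1.0, 9.0, 4.0], k=3) == 1
```

The tournament size k tunes selective pressure: k = 2 is a gentle, noisy version of rank selection, while larger k favors the fittest more strongly.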

31
Elitist Selection
  • The idea behind elitism is that at least one copy
    of the best individual in the population is
    always passed onto the next generation.
  • The main advantage is that convergence is
    guaranteed (i.e., if the global maximum is
    discovered, the GA converges to that maximum). By
    the same token, however, there is a risk of being
    trapped in a local maximum.
  • One alternative is to save the best individual so
    far in some kind of register and, at the end of
    each run, to designate it as the solution instead
    of using the best of the last generation.

32
1-point Crossover
  • Here, the chromosomes of the parents are cut at
    some randomly chosen common point and the
    resulting sub-chromosomes are swapped
  • For example
  • P1 = 1010101010 and P2 = 1110001110
  • Crossover point between the 6th and 7th bits
  • Then the offspring are
  • O1 = 1010101110
  • O2 = 1110001010
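The slide's example can be reproduced directly (function name is ours):

```python
def one_point_crossover(p1: str, p2: str, cut: int) -> tuple[str, str]:
    """Swap the tails of two equal-length bit-strings after position cut."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

# Slide example: cut between the 6th and 7th bits
o1, o2 = one_point_crossover("1010101010", "1110001110", 6)
assert (o1, o2) == ("1010101110", "1110001010")
```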

33
2-point Crossover
  • Here, the chromosomes are thought of as rings
    with the first and last gene connected (i.e.,
    wrap-around structure)
  • The rings are cut in two sites and the resulting
    sub-rings are swapped
  • For example
  • P1 = 1010101010 and P2 = 1110001110
  • Crossover points are between the 2nd and 3rd
    bits, and between the 6th and 7th bits
  • Then the offspring are
  • O1 = 1110101110
  • O2 = 1010001010
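Under the ring interpretation, swapping the sub-rings exchanges the parts outside the two cut points, which reproduces the slide's offspring (function name is ours):

```python
def two_point_crossover(p1: str, p2: str, c1: int, c2: int) -> tuple[str, str]:
    """Ring interpretation: the segment outside [c1, c2) wraps around, so
    swapping sub-rings exchanges the outer parts of the two chromosomes."""
    o1 = p2[:c1] + p1[c1:c2] + p2[c2:]
    o2 = p1[:c1] + p2[c1:c2] + p1[c2:]
    return o1, o2

# Slide example: cuts after the 2nd and 6th bits
o1, o2 = two_point_crossover("1010101010", "1110001110", 2, 6)
assert (o1, o2) == ("1110101110", "1010001010")
```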

34
Uniform Crossover
  • Here, each gene of the offspring is selected
    randomly from the corresponding genes of the
    parents
  • For example
  • P1 = 1010101010 and P2 = 1110001110
  • Then the offspring could be
  • O = 1110101110
  • Note: this version produces a single offspring
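A sketch of the single-offspring version described above (function name is ours):

```python
import random

def uniform_crossover(p1: str, p2: str) -> str:
    """Each offspring gene is chosen at random from the matching parent genes."""
    return "".join(random.choice(pair) for pair in zip(p1, p2))

child = uniform_crossover("1010101010", "1110001110")
# Every child bit comes from one of the two parents at that position
assert all(c in (a, b) for c, a, b in zip(child, "1010101010", "1110001110"))
```

A two-offspring variant would give the complementary choice at each position to a second child.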

35
Mutation
  • Mutation consists of making (usually small)
    alterations to the values of one or more genes in
    a chromosome
  • In binary chromosomes, it consists of flipping
    random bits of the genotype. For example,
    1010101010 may become 1011101010 if the 4th bit
    is flipped.
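Per-bit mutation with a fixed rate can be sketched as follows (function name and the rate parameter are ours):

```python
import random

def mutate(chrom: str, rate: float = 0.1) -> str:
    """Flip each bit of a binary chromosome independently with probability rate."""
    return "".join(("1" if b == "0" else "0") if random.random() < rate else b
                   for b in chrom)

# With rate = 1.0 every bit flips deterministically
assert mutate("1010101010", rate=1.0) == "0101010101"
```

In practice the rate is kept small (often around 1/length), so mutation supplies exploration without destroying the structure crossover has built up.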

36
Prototypical Steady-state GA
  • P ← p randomly generated hypotheses
  • For each h in P, compute fitness(h)
  • While maxh fitness(h) < threshold
  • Ps ← Select r·p individuals from P (e.g., FPS,
    RS, tournament)
  • Apply crossover to random pairs in Ps and add all
    offspring to Po
  • Select m of the individuals in Po with uniform
    probability and apply mutation (i.e., flip one of
    their bits at random)
  • Pw ← r·p weakest individuals in P
  • P ← (P - Pw) ∪ Po
  • For each h in P, compute fitness(h)
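The loop above can be sketched end to end. This is a toy instantiation, not the course's reference code: we choose the OneMax problem (fitness = number of 1-bits), tournament selection, and 1-point crossover; p, r and m play the roles named on the slide.

```python
import random

def steady_state_ga(p=20, r=0.4, m=2, n_bits=16, threshold=16, seed=0):
    """Sketch of the slide's steady-state loop on the OneMax toy problem."""
    rng = random.Random(seed)
    fitness = lambda h: h.count("1")
    # P <- p randomly generated hypotheses
    P = ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(p)]
    while max(fitness(h) for h in P) < threshold:
        # Ps <- select r*p individuals (tournament selection of size 2)
        k = int(r * p)
        Ps = [max(rng.sample(P, 2), key=fitness) for _ in range(k)]
        # Apply crossover to random pairs in Ps, adding offspring to Po
        Po = []
        for i in range(0, k - 1, 2):
            cut = rng.randrange(1, n_bits)
            Po.append(Ps[i][:cut] + Ps[i + 1][cut:])
            Po.append(Ps[i + 1][:cut] + Ps[i][cut:])
        # Mutate m offspring: flip one random bit each
        for _ in range(m):
            j, pos = rng.randrange(len(Po)), rng.randrange(n_bits)
            bit = "1" if Po[j][pos] == "0" else "0"
            Po[j] = Po[j][:pos] + bit + Po[j][pos + 1:]
        # Replace the weakest individuals in P with the offspring
        P = sorted(P, key=fitness, reverse=True)[:p - len(Po)] + Po
    return max(P, key=fitness)

best = steady_state_ga()
assert best.count("1") == 16  # loop exits only once the threshold is met
```

Because replacement keeps the fittest p - r·p survivors, the best fitness in P never decreases between generations, and mutation retains a positive probability of improvement at every step.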