Discrete and Genetic Algorithms in Bioinformatics - PowerPoint PPT Presentation

Provided by: Hsu

Transcript and Presenter's Notes



1
Discrete and Genetic Algorithms in Bioinformatics

2
Discrete Algorithms
  • Discrete mathematics lies at the foundation of
    modern computer science
  • Most algorithms we learn in computer science
    are discrete
  • Discrete algorithms emphasize worst-case
    analysis
  • Many sequence manipulation algorithms in
    bioinformatics are discrete

3
Natural Problems (1)
  • Natural problems: problems arising from nature,
    which are guaranteed to have feasible solutions
    if the data are collected accurately.
  • But because of noise in the sampled data, such
    solutions are hard to find.
  • To tackle these problems, one should focus on
    real data rather than on worst-case analysis.

4
Natural Problems (2)
  • Techniques that take advantage of the natural
    constraints of these problems do not necessarily
    work for general data (especially in the worst
    case), but can perform very well on such
    well-structured problems.
  • Examples:
      • many computational problems arising from
        biology, speech recognition, and image
        processing

5
Constraints with Errors
  • In ordinary constraint optimization problems, one
    naturally assumes that the constraints are
    correct.
  • What if these constraints are inconsistent?
      • Then no feasible solution satisfies all of them
  • What if every constraint is only partially
    correct?

6
Explicit Solution Candidates
  • In ordinary optimization problems, most
    algorithms do not generate plausible solutions in
    the interim
  • However, having some solution candidates at
    hand is advantageous when the constraints
    contain errors.

7
Plausible Solution Candidates
  • For some optimization problems, machine learning
    approaches generate plausible solutions in the
    interim.
  • The solutions improve as the machine learning
    approach iteratively refines solution patterns.
  • A better solution emerges from the cooperation of
    plausible solution candidates.

8
Fitness Landscape
  • Each solution candidate has a fitness score for
    the optimization problem.
  • A fitness landscape shows the distribution of
    fitness over the whole search space.
  • Solution candidates are ranked by their fitness
    scores.

9
Genetic Algorithm
  • A search technique for finding exact or
    approximate solutions to optimization problems
  • Based on the principle of evolution:
      • survival of the fittest in natural selection
  • Two basic processes from evolution:
      • Inheritance (passing features from one
        generation to the next)
      • Competition (survival of the fittest)

10
Basic description of GA
  • The algorithm starts with a set of solutions
    (represented by chromosomes) called a population.
  • Solutions from one population are taken and used
    to form a new population.
  • The new population (offspring) is expected to be
    better than the old one (parents).
  • Solutions are selected to form new solutions
    according to their fitness: the fitter they are,
    the more chances they have to reproduce.
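Fitness-proportional ("roulette-wheel") selection is one common way to give fitter solutions more chances to reproduce. The slides do not say which selection scheme is used, so the following is only a minimal sketch under that assumption; the name `roulette_select` and the toy bit-string fitness are hypothetical, and fitness values are assumed to be non-negative.

```python
import random

def roulette_select(population, fitness, k, rng=random):
    """Pick k parents with probability proportional to fitness.

    Assumed scheme (roulette-wheel selection); fitness values must be
    non-negative with at least one positive. Fitness-0 individuals are never picked.
    """
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=k)

# Usage: bit strings scored by their number of 1s (a toy fitness function)
rng = random.Random(0)
population = ["0000", "0011", "0111", "1111"]
parents = roulette_select(population, lambda s: s.count("1"), k=10, rng=rng)
```

Selection is with replacement, so a very fit individual can be chosen several times; "1111" has four times the chance of "0011" of being picked on each draw.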

11
GA in Pseudo-code
  • Choose an initial population
  • Evaluate the fitness of each individual in the
    population
  • Repeat
      • Select the best-ranking individuals to reproduce
      • Breed a new generation through crossover and
        mutation (genetic operations) to produce
        offspring
      • Evaluate the fitness of each offspring
      • Replace the worst-ranked part of the population
        with the offspring
  • Until a termination condition is met
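The loop above can be sketched in Python on a toy problem. Everything problem-specific here is an assumption chosen for illustration: individuals are bit strings, the fitness is the number of 1s ("OneMax"), parents come from the better half of the ranked population, and offspring are produced by one-point crossover plus bit-flip mutation. The name `genetic_algorithm` and all parameter values are hypothetical.

```python
import random

def genetic_algorithm(fitness, length, pop_size=30, n_offspring=10,
                      mutation_rate=0.02, generations=200, target=None, seed=0):
    # Toy setting (assumption): individuals are lists of 0/1 bits
    rng = random.Random(seed)
    # Choose an initial population of random bit strings
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):                    # Repeat ...
        # Evaluate and rank the population, best first
        population.sort(key=fitness, reverse=True)
        if target is not None and fitness(population[0]) >= target:
            break                                   # ... until termination
        parents = population[: pop_size // 2]       # best-ranking individuals
        offspring = []
        for _ in range(n_offspring):
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, length)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        # Replace the worst-ranked part of the population with the offspring
        population = population[: pop_size - n_offspring] + offspring
    return max(population, key=fitness)

best = genetic_algorithm(fitness=sum, length=20, target=20)
print(sum(best), best)
```

Because the best-ranked individuals always survive into the next generation, the best fitness never decreases from one generation to the next.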

12
Building Block Hypothesis
  • Building block: a short, highly fit schema that
    benefits the solution.
  • Hypothesis: the globally optimal solution is
    composed of building blocks.
  • Identify, recombine, and resample small building
    blocks to form new solutions with potentially
    higher fitness.
  • Working with building blocks instead of whole
    solutions reduces the complexity of the problem.
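In standard GA terminology (Holland's schema theory), a schema is a template such as `1*0***`, where `*` means "don't care". "Short" refers to a small defining length (the distance between the outermost fixed positions), and the number of fixed positions is the schema's order. The helper names below are hypothetical; this is a small sketch of those definitions on bit strings.

```python
def matches(schema: str, individual: str) -> bool:
    """True if the bit string is an instance of the schema ('*' = don't care)."""
    return len(schema) == len(individual) and all(
        s == "*" or s == b for s, b in zip(schema, individual))

def order(schema: str) -> int:
    """Order of a schema: the number of fixed (non-'*') positions."""
    return sum(c != "*" for c in schema)

def defining_length(schema: str) -> int:
    """Distance between the first and last fixed positions (0 if none)."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

schema = "1*0***"
print(order(schema), defining_length(schema))                # 2 2
print(matches(schema, "110010"), matches(schema, "011111"))  # True False
```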

13
The Fitness Function
  • Plays the role of a judge
  • Gives a higher score to an individual that
    contains more building blocks
  • Is refined based on the results of evolution
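A fitness function that rewards building blocks could, for example, add a weight for every building-block schema an individual contains. The blocks and weights below are made up for illustration; the slides do not specify them. Refining the fitness function would then amount to adjusting the block list or the weights between runs.

```python
def contains(block: str, individual: str) -> bool:
    """True if the individual agrees with every fixed (non-'*') position of block."""
    return all(s == "*" or s == b for s, b in zip(block, individual))

def building_block_fitness(individual, blocks, weights=None):
    """Sum of the weights of the building blocks the individual contains.

    blocks and weights are hypothetical inputs; by default every block counts 1.
    """
    if weights is None:
        weights = [1] * len(blocks)
    return sum(w for block, w in zip(blocks, weights) if contains(block, individual))

blocks = ["11******", "**00****", "******11"]   # hypothetical building blocks
print(building_block_fitness("11001011", blocks))   # 3
print(building_block_fitness("01001000", blocks))   # 1
```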

14
Physical Mapping
15
Cutting and Reassembling a DNA Sequence
  • Cut a DNA sequence into small pieces in different
    ways and reassemble them
  • The small pieces (called clones) are still too
    large to be sequenced completely
  • Biologically, probes are used to mark the clones
  • Each probe can mark several clones, and each
    clone can contain several probes
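To make the clone/probe relationship concrete, suppose (hypothetically) that each clone is an interval of the DNA and each probe is a point position on it, so that a probe marks a clone when it falls inside the clone's interval. The coordinates below are invented, not taken from real data.

```python
def clone_probe_matrix(clones, probes):
    """Row i, column j is 1 if probe j lies inside clone i, else 0.

    clones: list of (start, end) intervals; probes: list of positions.
    Both use hypothetical coordinates along the DNA.
    """
    return [[1 if start <= p <= end else 0 for p in probes]
            for start, end in clones]

clones = [(0, 40), (25, 70), (60, 100)]
probes = [10, 30, 50, 65, 90]
for row in clone_probe_matrix(clones, probes):
    print(row)
# [1, 1, 0, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 0, 1, 1]
```

Probe 30 marks two clones, and the second clone contains three probes. Because the probes are listed here in their true order along the DNA, the 1s in each row are consecutive; physical mapping has to recover such an order when it is unknown.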

16
The Physical Mapping Problem with Noisy Genomic
Data (Journal of Computational Biology 10(5),
709-735, 2003)
  • Each row represents a clone; each column
    represents a probe
  • Diagram on the left: the input clone-probe matrix
  • Diagram on the right: after the probes are
    rearranged, the clones are put in their correct
    positions
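Arranging the probes amounts to permuting the columns of the clone-probe matrix. With error-free data, a correct arrangement gives every row a single contiguous run of 1s (the consecutive-ones property). The sketch below, with hypothetical helper names and a made-up 3 × 4 matrix, reorders columns and checks that property.

```python
def reorder_columns(matrix, order):
    """Return the matrix with its probe columns arranged in the given order."""
    return [[row[j] for j in order] for row in matrix]

def has_consecutive_ones(matrix):
    """True if the 1s in every row form a single contiguous run."""
    for row in matrix:
        ones = [j for j, v in enumerate(row) if v]
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False
    return True

matrix = [[1, 0, 1, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 1]]
print(has_consecutive_ones(matrix))                                 # False
print(has_consecutive_ones(reorder_columns(matrix, [0, 2, 1, 3])))  # True
```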

17
Consecutive Ones with Errors
18
False Positives and False Negatives
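In a clone-probe matrix, a false positive is a 1 recorded where the probe does not actually lie in the clone, and a false negative is a 0 recorded where it does. The comparison below uses a synthetic "true" matrix, which real experiments do not provide; it only illustrates the two error types and is not part of the method on the slides. The function name is hypothetical.

```python
def classify_errors(true_matrix, observed):
    """Return (false_positives, false_negatives) as lists of (clone, probe) cells."""
    false_pos, false_neg = [], []
    for i, (t_row, o_row) in enumerate(zip(true_matrix, observed)):
        for j, (t, o) in enumerate(zip(t_row, o_row)):
            if o and not t:
                false_pos.append((i, j))   # observed 1, truly 0
            elif t and not o:
                false_neg.append((i, j))   # observed 0, truly 1
    return false_pos, false_neg

true_matrix = [[1, 1, 1, 0, 0],
               [0, 0, 1, 1, 1]]
observed    = [[1, 0, 1, 0, 0],    # missing 1 at (0, 1)
               [1, 0, 1, 1, 1]]    # spurious 1 at (1, 0)
print(classify_errors(true_matrix, observed))   # ([(1, 0)], [(0, 1)])
```

Either error type breaks the consecutive-ones pattern: the false negative splits the run in row 0, and the false positive adds a stray 1 to row 1.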
19
A genetic algorithm for physical mapping
  • A two-stage genetic algorithm
  • First stage: generate the neighborhood
    information among probes
  • Second stage: find the longest sequence of
    connected probes

20
The first stage of GA (GA1)
  • Purpose: find a probe ordering with the highest
    fitness score for each clone.
  • Pseudocode
      • Randomly generate a population of probe
        permutations
      • Evaluate the fitness of each individual in the
        population
      • Repeat
          • Select the best-ranking individuals to
            reproduce
          • Breed a new generation through crossover and
            mutation (genetic operations) to produce
            offspring
          • Evaluate the fitness of each offspring
          • Replace the worst-ranked part of the
            population with the offspring
      • Until a termination condition is met
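The slides do not give GA1's fitness function. One plausible choice, assumed here purely for illustration, is to count how many separate runs of 1s the rows have under a candidate probe permutation: each row would ideally have exactly one run, so the score below is 0 when the ordering has the consecutive-ones property and becomes more negative as rows are split. The function names are hypothetical.

```python
def count_runs(row):
    """Number of maximal runs of consecutive 1s in a 0/1 row."""
    return sum(1 for j, v in enumerate(row) if v and (j == 0 or not row[j - 1]))

def ordering_fitness(matrix, order):
    """Hypothetical GA1 fitness for a probe permutation `order` (higher is better).

    Returns (#non-empty rows) - (total #runs of 1s); 0 means every row's 1s
    are consecutive under this ordering.
    """
    rows = [[row[j] for j in order] for row in matrix]
    non_empty = sum(1 for r in rows if any(r))
    return non_empty - sum(count_runs(r) for r in rows)

matrix = [[1, 0, 1, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 1]]
print(ordering_fitness(matrix, [0, 1, 2, 3]))   # -2
print(ordering_fitness(matrix, [0, 2, 1, 3]))   # 0
```

A permutation GA would then maximize this score; because the individuals are permutations, its crossover and mutation operators must keep every probe exactly once.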

21
The first stage of GA (GA1)
[Figure: clone-probe matrix with the probe columns in the order 4 1 2 3 5 8 6 9 11 12 13 14 15 17 18. Caption: Two building blocks that make partial consecutive ones]
22
Crossover Operation
P1:    2 3 6 8 1 9 10 12 13 5 11 14 15 17 18
P2:    9 10 11 12 13 14 8 18 17 6 5 3 2 1 15
Child: 2 3 6 8 1 9 10 11 12 13 14 18 17 5 15
Building the child step by step:
2 3 6 8 1 9
2 3 6 8 1 9 10
2 3 6 8 1 9 10 11
2 3 6 8 1 9 10 11 12
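The child on this slide is consistent with a one-point order crossover: keep the first six probes of P1 (2 3 6 8 1 9), then append the remaining probes in the order they appear in P2, skipping any already used. That reading is an inference from the example, not a description confirmed by the slides; the sketch below (function name hypothetical) reproduces the child exactly.

```python
def order_crossover(p1, p2, cut):
    """Child = first `cut` genes of p1, then the rest in p2's order.

    Skipping genes already taken from p1 keeps the child a valid permutation.
    """
    head = list(p1[:cut])
    used = set(head)
    return head + [g for g in p2 if g not in used]

p1 = [2, 3, 6, 8, 1, 9, 10, 12, 13, 5, 11, 14, 15, 17, 18]
p2 = [9, 10, 11, 12, 13, 14, 8, 18, 17, 6, 5, 3, 2, 1, 15]
print(order_crossover(p1, p2, cut=6))
# [2, 3, 6, 8, 1, 9, 10, 11, 12, 13, 14, 18, 17, 5, 15]
```

A plain one-point crossover, as used for bit strings, would not work here: copying the tail of P2 directly could repeat some probes and drop others.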
23
Mutations
Before: 2 3 6 8 1 9 10 12 13 5 11 14 15 17 18
After:  2 3 6 8 5 9 10 12 13 1 11 14 15 17 18
(probes 1 and 5 exchange positions)
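The mutation on this slide exchanges the positions of probes 1 and 5, which keeps the individual a valid permutation. The following is a minimal swap-mutation sketch, assuming the two positions are chosen uniformly at random (the slides do not say how they are chosen); the function names are hypothetical, and the parent used is P1 from the crossover slide.

```python
import random

def swap_positions(perm, i, j):
    """Return a copy of perm with the genes at positions i and j exchanged."""
    child = list(perm)
    child[i], child[j] = child[j], child[i]
    return child

def swap_mutation(perm, rng=random):
    """Swap mutation (assumed uniform choice): exchange two distinct random positions."""
    i, j = rng.sample(range(len(perm)), 2)
    return swap_positions(perm, i, j)

parent = [2, 3, 6, 8, 1, 9, 10, 12, 13, 5, 11, 14, 15, 17, 18]
print(swap_positions(parent, 4, 9))
# [2, 3, 6, 8, 5, 9, 10, 12, 13, 1, 11, 14, 15, 17, 18]
```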
24
Detection of False Negatives
[Figure: clone-probe matrix with probe columns numbered 1 to 15]
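Once the probes are well ordered, a 0 that sits between 1s in the same row breaks that row's consecutive ones and is therefore a candidate false negative. The slide does not state the exact detection rule; the sketch below simply flags every such interior gap. This is a hypothetical heuristic (function name included), and it would also flag gaps caused by a stray false-positive 1 at the edge of a row.

```python
def candidate_false_negatives(matrix):
    """List (row, column) cells that are 0 but lie between the first and
    last 1 of their row, assuming the columns are already in probe order."""
    candidates = []
    for i, row in enumerate(matrix):
        ones = [j for j, v in enumerate(row) if v]
        if ones:
            candidates += [(i, j) for j in range(ones[0], ones[-1] + 1) if not row[j]]
    return candidates

ordered = [[1, 1, 0, 1, 0, 0],
           [0, 1, 1, 1, 1, 0],
           [0, 0, 1, 1, 0, 1]]
print(candidate_false_negatives(ordered))   # [(0, 2), (2, 4)]
```

In row 2 the gap at column 4 could equally be explained by a false positive at column 5, so such candidates still need to be checked against the rest of the data.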
25
The first stage of GA (GA1)
  • Construct the probe neighboring information
    according to the GA1 results

Probe ordering result for probe segment 1:
1 2 3 5 6 8 9 10 11 12 13 14 15 17 18
Probe ordering result for probe segment 2:
5 6 7 8 9 10 11 13 14 15 16 17 18 19 20
...
Probe ordering result for probe segment 20:
83 85 86 87 88 89 90 91 92 93 95 96 97 98 99

Probe neighboring information (probe: neighbors)
From segment 1: 5: 3, 6 | 6: 5, 8 | 8: 6, 9 | ... | 18: 17
From segment 2: 5: 6 | 6: 5, 7 | 7: 6, 8 | ... | 20: 19
A neighboring probe list: 5: 3, 6 | 6: 5, 7, 8 | 7: 6, 8, 9 | ... | 20: 19
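Building the neighboring probe list amounts to recording, for every probe, the probes adjacent to it in any of the GA1 orderings and taking the union over segments. Here is a minimal sketch (function name hypothetical) that uses only the orderings given for segments 1 and 2.

```python
from collections import defaultdict

def neighbor_lists(orderings):
    """Map each probe to the set of probes adjacent to it in any ordering."""
    neighbors = defaultdict(set)
    for order in orderings:
        for a, b in zip(order, order[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

segment1 = [1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18]
segment2 = [5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20]
nb = neighbor_lists([segment1, segment2])
print(sorted(nb[5]), sorted(nb[6]), sorted(nb[20]))   # [3, 6] [5, 7, 8] [19]
```

With only these two segments, probe 7 gets the neighbors {6, 8}; the merged list on the slide also shows 9 for probe 7, which presumably comes from segments not reproduced here.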
26
The second stage of GA (GA2)
  • Purpose: find the longest connecting probe
    sequence according to the probe neighboring
    information.
  • Pseudocode
      • Randomly generate a population of probe
        permutations
      • Evaluate the fitness of each individual in the
        population
      • Repeat
          • Select the best-ranking individuals to
            reproduce
          • Breed a new generation through crossover and
            mutation (genetic operations) to produce
            offspring
          • Evaluate the fitness of each offspring
          • Replace the worst-ranked part of the
            population with the offspring
      • Until a termination condition is met

27
The second stage of GA (GA2)
  • Generate a probe ordering according to the probe
    neighboring information

Neighboring probe list (probe: neighbors):
1: 2 | 2: 1, 3 | 3: 2, 4, 5 | 4: 3, 5 | 5: 3, 4, 6 |
6: 5, 7, 8 | 7: 6, 8, 9 | ... | 99: 97, 98
2 3 5 4 71 72 73 55 56 57 99 98 97 96
1 2 3 4 5 6 7 93 94 95 96 97 98 99
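GA2 looks for the longest stretch of probes in which each consecutive pair is confirmed by the neighboring probe list. One way to score a candidate permutation, assumed here because the slides do not give GA2's fitness function, is the length of its longest such stretch. The function name is hypothetical, and the neighbor dictionary contains only the entries listed on this slide for probes 1 to 7.

```python
def longest_connected_run(order, neighbors):
    """Length of the longest stretch of `order` in which every adjacent
    pair of probes appears in the neighbor list (hypothetical GA2 fitness)."""
    if not order:
        return 0
    best = run = 1
    for a, b in zip(order, order[1:]):
        run = run + 1 if b in neighbors.get(a, ()) else 1
        best = max(best, run)
    return best

neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3, 5},
             5: {3, 4, 6}, 6: {5, 7, 8}, 7: {6, 8, 9}}
print(longest_connected_run([2, 3, 5, 4, 71, 72, 73], neighbors))   # 4
print(longest_connected_run([1, 2, 3, 4, 5, 6, 7], neighbors))      # 7
```

Probes 71 to 73 are absent from this partial dictionary, so the first score stops at 4 here; with the full neighbor list the value could differ.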