Genetic Threading - PowerPoint PPT Presentation

About This Presentation
Title:

Genetic Threading

Description:

X-ray crystallography requires significant amounts of purified protein molecules ... X-ray crystallography. NMR (Nuclear Magnetic Resonance) Protein structure ... – PowerPoint PPT presentation

Slides: 39
Provided by: aaa288
Learn more at: http://cse.unl.edu

Transcript and Presenter's Notes

Title: Genetic Threading


1
Genetic Threading
  • By J. Yadgari and A. Amir
  • Published in the special issue on Bioinformatics
    of the Journal of Constraints, June 2001

Alexandre Tchourbanov, University of Nebraska at
Lincoln, CSCE 421-821, December 4, 2001
2
Structure of the presentation
  • Introduction to protein native structure
  • Methods of finding a native structure
  • Physical
  • Computational
  • Common methods and principles
  • Protein threading method
  • Protein threading using genetic approach

3
Problem of protein structure prediction
  • Proteins are key molecules in all life processes
  • The function of a protein is directly related to
    its three dimensional structure
  • Knowing and understanding the structure of
    proteins will have a tremendous impact on
    understanding of biological processes, medical
    discoveries, and biotechnological inventions

4
Problem of protein structure prediction
  • Given a sequence of amino acids, predict the
    unique 3D fold of the molecule that minimizes
    its free energy

[Diagram labels: Primary structure (positions 1, 2, 3; residues Lys, Gly, Leu); Physical methods of prediction; Computational methods of prediction; Practical use of the 3D structural knowledge]
5
Protein structure
  • A protein is built up from a chain of amino acids
    linked by peptide bonds
  • There are 20 amino acids that can be divided into
    several classes based on size and other chemical
    and physical properties
  • Depending on the type of its residue, an amino
    acid can be either hydrophilic (water loving) or
    hydrophobic (water hating)

6
General structure of an amino acid
  • Each amino acid consists of
  • A common main chain part, containing the heavy
    atoms N, C, O, Cα, forming the amide plane
  • A side chain (residue) of size 0-10 additional
    atoms

7
Peptide bond
  • A peptide bond connects the carboxyl group of the
    first amino acid with the amino group of the
    second amino acid
  • Peptide bonds are planar and rigid

8
Sequence of amino acids
  • A sequence of amino acids, connected by peptide
    bonds, forms a protein
  • There is no flexibility for rotation around the
    peptide bond
  • There is more flexibility for the protein to
    rotate around the N-Cα bond (called the φ angle)
    and around the C-Cα bond (the ψ angle)
  • These angles are restricted to small regions in
    natural proteins

9
Part of Protein (PheAspAla)
10
Protein folding
  • Using the freedom of rotations, the protein can
    fold into a specific and unique three dimensional
    structure (called conformation), forming a native
    structure

11
Physical methods of determining protein native
structure
[Diagram labels: Physical methods; X-ray crystallography; NMR (Nuclear Magnetic Resonance)]
  • X-ray crystallography requires significant
    amounts of purified protein (10^14 molecules) to
    grow a crystal, and the protein must be able to
    crystallize
  • The NMR method is applicable to proteins of small
    and medium size, including those that do not
    crystallize
  • Both methods are expensive; they give consistent
    results on the same protein, which supports
    their correctness
  • The structure of many important proteins is still
    unknown

12
Protein structure in X-ray crystallography
  • X-ray diffraction pattern is recorded and
    processed using FFT to form electron density map
  • Regions of map with the highest electron density
    reveal the location of atomic nuclei

13
Family of structures in NMR method
  • Absorption of radio frequency energy is recorded
    as a 2D spectrum
  • Possible 3D structures are constructed by
    computer according to NMR signal

14
Computational methods to find a protein structure
  • The unique 3D arrangement of protein corresponds
    to lowest free energy conformation
  • Most computational approaches for solving the
    protein folding problem look for the lowest free
    energy conformation
  • Two principal methods are currently in use for
    computing the lowest energy conformation
  • Molecular dynamics
  • Monte Carlo

15
Molecular dynamics
  • Forces acting on each atom at a particular state
    of the system are calculated using an empirical
    force field
  • Atoms are allowed to move with the accelerations
    resulting from these forces, changing the
    conformation
  • Once the atoms have moved significantly, the
    acting forces are recalculated (every 10^-15 sec)
  • Even supercomputers can simulate only 10^-9 sec
    of folding time, which is insufficient
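
The move-then-recalculate loop described above can be sketched with a velocity Verlet integrator. This is only an illustration: the single spring used as a "force field", the function names, and all parameter values are assumptions, not part of the presentation. The last line reproduces the step count implied by the slide's numbers (10^-15 sec steps covering 10^-9 sec).

```python
def harmonic_force(x, k=1.0):
    """Toy 'force field' (assumption): one spring pulling the atom toward x = 0."""
    return -k * x


def velocity_verlet(x, v, mass, dt, steps, force=harmonic_force):
    """Advance one particle in 1D, recomputing the force after every move."""
    f = force(x)
    for _ in range(steps):
        x += v * dt + 0.5 * (f / mass) * dt * dt  # move with the current acceleration
        f_new = force(x)                           # recalculate forces at the new position
        v += 0.5 * (f + f_new) / mass * dt         # velocity update uses old and new force
        f = f_new
    return x, v


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


x, v = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, steps=1000)

# Number of force recalculations needed to cover 1e-9 s with 1e-15 s steps.
steps_needed = round(1e-9 / 1e-15)
```

In a real simulation every one of those steps requires forces on all atoms, which is why the reachable time span is so short.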

16
Monte Carlo method
  • Used with a simplified model of the protein (does
    not consider the structure of every amino acid)
  • The procedure makes a random move from the current
    conformation and evaluates the resulting energy
    change
  • If the new conformation is better, it replaces the
    old one, and the process repeats
  • The method is not powerful enough to find an
    optimal conformation even for simple cases
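
A minimal sketch of the procedure as the slide states it: random move, energy evaluation, keep the move only if it improves. The energy function, the angle representation, and the parameter values are hypothetical stand-ins. Metropolis-style variants, which also accept some uphill moves, are common but are not what is shown here.

```python
import random


def toy_energy(angles):
    """Hypothetical energy: zero when every angle equals -60 degrees."""
    return sum((a + 60.0) ** 2 for a in angles)


def monte_carlo_descent(angles, energy, steps, max_move=10.0, seed=0):
    """Random single-angle moves; a move is kept only if it lowers the energy."""
    rng = random.Random(seed)
    current = list(angles)
    current_e = energy(current)
    for _ in range(steps):
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-max_move, max_move)  # random move
        candidate_e = energy(candidate)
        if candidate_e < current_e:                        # better: replace the old one
            current, current_e = candidate, candidate_e
    return current, current_e


start = [0.0, 90.0, -120.0, 45.0]
final, final_e = monte_carlo_descent(start, toy_energy, steps=2000)
```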

17
Protein threading
  • Many proteins in nature are homologous: they have
    different primary structures but form the same
    conformation, carry out the same function in
    living matter, and share the same evolutionary
    origin
  • Most proteins share the same secondary structure
    motifs
  • Helices
  • Extended strands forming sheets
  • Specific turns
  • Random coils

18
Protein threading
  • Threading means mapping a given sequence to a
    given structure
  • To assign a structure to a sequence one would
    then need to thread the sequence through all
    known conformations, evaluating compatibility,
    and assign the most compatible structure to the
    sequence
  • Upon discovery of completely different structure
    from any known, enter it into database of
    structures

19
Protein threading
  • Structure is presented by the black trace
  • Sequence (at the top) is threaded through the
    structure, encoding an alignment (at the bottom)
  • Zero means structure deletion, values greater
    than one mean sequence deletion, while one is a
    fit
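
One way to read such an encoding is sketched below. It assumes that the value at each structure position is the number of consecutive sequence residues assigned to that position; the slide does not spell out this detail. The 14-residue sequence is made up for the example, and the encoding string is the one used later in the presentation.

```python
def decode_threading(encoding, sequence):
    """Return, for each structure position, the residues assigned to it.

    Assumed reading: 0 leaves the position empty (structure deletion),
    1 is a fit, and a value above 1 packs extra residues into the
    position (sequence deletion).
    """
    if sum(encoding) != len(sequence):
        raise ValueError("encoding must consume the whole sequence")
    alignment = []
    next_residue = 0
    for count in encoding:
        alignment.append(sequence[next_residue:next_residue + count])
        next_residue += count
    return alignment


encoding = [int(ch) for ch in "11111100111311"]
sequence = "ACDEFGHIKLMNPQ"  # hypothetical 14-residue sequence
alignment = decode_threading(encoding, sequence)
```

With this reading, positions 7 and 8 of the structure stay empty and position 12 receives three residues.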

20
Protein threading
  • The size of the search space for threading a
    sequence of length k into a structure of size n
    can be computed as a selection with repetition
  • The search space is huge, and the problem appears
    to be NP-complete (Unger, R., Moult, J., 1993)
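
The formula itself did not survive in this transcript. Assuming the string encoding used elsewhere in the presentation (n non-negative entries that sum to k) and no further constraints, the standard count for a selection with repetition is C(n + k - 1, k). The sketch below computes it and checks it by brute force on a small case; the function names are illustrative.

```python
from itertools import product
from math import comb


def search_space_size(n, k):
    """Selections of k items from n kinds with repetition: C(n + k - 1, k)."""
    return comb(n + k - 1, k)


def brute_force_count(n, k):
    """Count strings of length n over 0..k whose entries sum to k (small cases only)."""
    return sum(1 for s in product(range(k + 1), repeat=n) if sum(s) == k)


small_formula = search_space_size(4, 5)
small_enumerated = brute_force_count(4, 5)
size_14 = search_space_size(14, 14)  # the 14-position example used in this presentation
```

Even the 14-position example gives tens of millions of candidate threadings, and the count grows rapidly with realistic protein lengths.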

21
Protein threading
  • In order to reduce complexity of search task, (m
    1) core and m non-core regions are introduced
  • Usually α-helices and β-sheets are core regions,
    connected by loops
  • Total number of amino acids in core regions is c

22
Protein threading
  • Although it suffers from some inherent limitations
    (such as prediction of the right structure with
    completely wrong threading), the method has
    become a significant tool in protein structure
    prediction
  • Any threading procedure must contain two major
    components
  • An alignment algorithm to position a sequence on
    a structure
  • Score function to evaluate the energy of the
    sequence in given conformation

23
Protein threading possible implementations
  • Protein threading could be implemented using
  • Enumeration for small problems,
  • Dynamic programming to find core regions to
    freeze,
  • Monte Carlo variants with Gibbs sampling
  • Branch and bound search
  • Genetic programming with constraints seems to be
    a decent alternative in comparison with other
    methods

24
Protein threading using genetic programming
  • Genetic Algorithms are parallel computational
    tools that are based on the principle of
    diversity and selection
  • Solutions are represented as strings, for example
    11111100111311
  • The sum of all terms in the string must equal
    the number of amino acids in the sequence, and
    the length of the string must equal the length
    of the structure
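
The two constraints on a solution string can be checked directly. A minimal sketch, with a hypothetical function name:

```python
def is_valid_encoding(encoding, sequence_length, structure_length):
    """True if the string has one non-negative entry per structure
    position and the entries sum to the number of residues."""
    return (
        len(encoding) == structure_length
        and all(value >= 0 for value in encoding)
        and sum(encoding) == sequence_length
    )


example = [int(ch) for ch in "11111100111311"]
broken = example[:-1] + [2]  # last entry raised from 1 to 2, so the sum becomes 15
```

The example string passes for a 14-residue sequence on a 14-position structure, while the altered copy fails the sum test.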

25
Protein threading using genetic programming
  • These strings are maintained as a population that
    undergoes an evolutionary process via genetic
    operators such as
  • Replication (copying of the string to the next
    generation)
  • Mutation (changing values in the string)
  • Crossover (concatenating a prefix of one string
    with a suffix of another)
  • An energy function is a good candidate for
    evaluating the fitness of an offspring
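
A compact sketch of how replication, mutation, and crossover could fit together in one evolutionary loop. Everything here is an assumption made for illustration: the toy energy, the keep-the-better-half selection rule, the repair step inside crossover, and the parameter values are not taken from the authors' implementation.

```python
import random


def toy_energy(encoding):
    """Hypothetical score: penalizes every deviation from a one-to-one fit."""
    return sum((v - 1) ** 2 for v in encoding)


def mutate(encoding, rng):
    """Move one residue between two positions, so the sum stays constant."""
    child = list(encoding)
    src = rng.choice([i for i, v in enumerate(child) if v > 0])
    dst = rng.randrange(len(child))
    child[src] -= 1
    child[dst] += 1
    return child


def crossover(a, b, rng):
    """Prefix of a plus suffix of b; the sum is repaired on the last position."""
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    child[-1] += sum(a) - sum(child)
    return child if child[-1] >= 0 else list(a)  # fall back to a parent if repair fails


def evolve(population, generations, mutation_rate=0.3, seed=0):
    """Keep the better half each generation and refill with offspring."""
    rng = random.Random(seed)
    population = list(population)
    size = len(population)
    for _ in range(generations):
        population.sort(key=toy_energy)
        survivors = population[: size // 2]  # selection and replication
        children = []
        while len(survivors) + len(children) < size:
            child = crossover(rng.choice(survivors), rng.choice(survivors), rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = survivors + children
    return min(population, key=toy_energy)


# Build a random starting population of valid 14-position encodings.
init_rng = random.Random(1)
population = []
for _ in range(30):
    individual = [14] + [0] * 13
    for _ in range(20):
        individual = mutate(individual, init_rng)
    population.append(individual)

best = evolve(population, generations=50)
```

Because the best solutions are copied unchanged into the next generation, the best energy in this sketch can never get worse from one generation to the next.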

26
Energy function
  • Energy functions are subject to minimization
  • Energy functions are calculated by extracting
    from the structural database frequencies of
    interactions between pairs of residues as a
    function of amino acids types and distance
  • Tendency of certain hydrophilic residues to be on
    the surface can be approximated by energy term
    related to the position

27
Implementing mutation
  • An example of a mutation is the transformation of
    11111100111311 into 11111100211211, which is also
    a valid encoding
  • A validity check is needed after every mutation,
    with compensation for any violations
  • Reversing a substring is an especially
    interesting mutation, since it does not violate
    the validity of the solution
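
Both kinds of mutation can be sketched as follows, starting from the 14-position string 11111100111311. The function names and the choice of positions are illustrative. The compensated mutation adds one residue at one position and removes one at another, so the sum is preserved; the reversal keeps both the length and the sum by construction.

```python
def compensated_mutation(encoding, gain_at, loss_at):
    """Add a residue at one position and remove one at another."""
    child = list(encoding)
    if child[loss_at] == 0:
        raise ValueError("cannot take a residue from an empty position")
    child[gain_at] += 1
    child[loss_at] -= 1
    return child


def reverse_mutation(encoding, start, end):
    """Reverse encoding[start:end]; length and sum are unchanged."""
    child = list(encoding)
    child[start:end] = reversed(child[start:end])
    return child


parent = [int(ch) for ch in "11111100111311"]
mutated = compensated_mutation(parent, gain_at=8, loss_at=11)
reversed_child = reverse_mutation(parent, start=5, end=12)
```

With these positions the compensated mutation reproduces the string 11111100211211 from the slide.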

28
Implementing crossovers
Parent 1:    1 1 2 0 1 1 2 0 1 1 1 1 1 1
Parent 2:    1 1 1 1 1 1 0 0 1 1 1 3 1 1
Offspring 1: 1 1 2 0 1 1 2 0 1 1 1 3 1 1
Offspring 2: 1 1 1 1 1 1 0 0 1 1 1 1 1 1
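
The offspring on this slide are consistent with a one-point crossover of the two parents after position 8, which the sketch below reproduces. With these values the offspring sum to 16 and 12 rather than 14, so a validity check or repair is needed afterwards. The repair shown here is only one hypothetical choice and is not taken from the presentation.

```python
def one_point_crossover(parent1, parent2, cut):
    """Swap the suffixes of two parents at the given cut position."""
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]


def repair(encoding, target_sum):
    """Hypothetical repair: adjust entries from the right until the sum fits."""
    fixed = list(encoding)
    excess = sum(fixed) - target_sum
    i = len(fixed) - 1
    while excess != 0 and i >= 0:
        if excess > 0:
            take = min(fixed[i], excess)  # remove residues, never below zero
            fixed[i] -= take
            excess -= take
        else:
            fixed[i] += -excess           # add all missing residues here
            excess = 0
        i -= 1
    return fixed


parent1 = [1, 1, 2, 0, 1, 1, 2, 0, 1, 1, 1, 1, 1, 1]
parent2 = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 3, 1, 1]
offspring1, offspring2 = one_point_crossover(parent1, parent2, cut=8)
repaired1 = repair(offspring1, target_sum=14)
repaired2 = repair(offspring2, target_sum=14)
```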
29
The following issues were addressed
  • The linear trade-off between population size and
    the number of generations
  • Optimal level of mutation rate
  • Locality of mutation operator
  • Locality of the crossover operator
  • Regular mutations versus reverse mutations
  • Magnitude of the mutation operation
  • Quality control of the crossover operation

30
Results
  • For the authors' examples, the optimal performance
    is achieved with a population size of 300
    solutions and a duration of 1000 generations
  • The optimal mutation rate is 0.25 to 0.3 of
    the population

31
The minimal energy of threading runs
32
The average energy of the population during
threading
33
Structural comparisons
[Figure: plots of the difference between sequence deletions and structure deletions; panels: structural alignment, most similar threading alignment, least similar threading alignment]
34
Maximal mutation magnitude
[Figure panels: average score of 5 runs after 600 generations; average score of 5 runs after 2000 generations]
35
Summary
  • The running time of a GA depends linearly on the
    number of solutions in the population (i.e.
    population size) and also depends linearly on the
    number of generations the process is repeated
  • The genetic algorithm method is a feasible and
    efficient approach to threading
  • It is especially encouraging that the threading
    alignments are quite similar, quantitatively, to
    the structural alignments

36
Summary
  • Changing the locality of the mutation and
    crossover operation does not show a consistent
    change in the performance of the algorithm
  • Mutations of high magnitude are
    counterproductive, probably because changes
    between the template and the assigned structure
    do not tend to concentrate in single position
  • Using crossover under strict quality control was
    shown not to be effective, since the genetic
    mechanism has its own quality control

37
Summary
  • The success of the reverse mutation is quite
    surprising and should be further explored

38
Future work
  • Threading algorithms should be tested on their
    ability to assign a conformation to a new and
    unknown sequence
  • The authors plan to implement the genetic algorithm
    in a complete threading package, with all the
    necessary components and to test it in a
    realistic prediction setup.