Protein structure prediction - PowerPoint PPT Presentation

1
Protein structure prediction
Alexander Churbanov, University of Nebraska at Omaha
CSCI 8980, February 14, 2002
2
Structure of the presentation
  • Introduction
  • Protein native structure
  • Computational methods of finding a native
    structure
  • Common methods and principles
  • Specific methods
  • Homology finding
  • Threading
  • Modeling on lattice

3
Introduction
  • In Greek mythology, Sisyphus is condemned to an
    eternity of hard labor; his labor is
    frustrating and fruitless, for just as he is
    about to achieve his goal, his work is undone and
    he must start again from the beginning
  • Those who work in protein structure prediction
    seem to share the same fate

4
Problem of protein structure prediction
  • Proteins are key molecules in all life processes
  • The function of a protein is directly related to
    its three-dimensional structure
  • Knowing and understanding the structure of
    proteins will have a tremendous impact on our
    understanding of biological processes, on medical
    discoveries, and on biotechnological inventions

5
Problem of protein structure prediction
  • For over 30 years, there has been an ardent
    search for methods to predict the
    three-dimensional (3D) structure of a protein
    from its sequence
  • Many methods initially looked very promising,
    but each time the hope was dashed

6
Problem of protein structure prediction
  • Given a sequence of amino acids, predict the
    unique 3D fold of the molecule that minimizes its
    free energy

[Figure labels: primary structure (Lys, Gly, Leu); physical methods of prediction; computational methods of prediction; practical use of the 3D structural knowledge]
7
General structure of an amino acid
  • Each amino acid consists of
      • A common main-chain part containing the heavy
        atoms N, C, O and Cα, forming the amide plane
      • A side chain (residue) of 0 to 10 additional
        heavy atoms

8
Peptide bond
  • A peptide bond connects the carboxyl group of the
    first amino acid with the amino group of the
    second
  • Peptide bonds are planar and rigid

9
Sequence of amino acids
  • A sequence of amino acids connected by peptide
    bonds forms a protein
  • There is no freedom of rotation around the
    peptide bond
  • The chain is more flexible around the N-Cα bond
    (the φ angle) and around the Cα-C bond (the ψ
    angle)
  • These angles are restricted to small regions in
    natural proteins
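The φ and ψ angles are dihedral (torsion) angles: φ is defined by the atoms C of the previous residue, N, Cα and C, and ψ by N, Cα, C and N of the next residue. A minimal Python sketch of the usual torsion-angle formula for four points follows; the function name dihedral and the plain-tuple coordinates are illustrative assumptions, and real coordinates would come from a structure file.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle in degrees (-180..180) about the bond p1-p2.

    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane p0-p1-p2
    n2 = cross(b2, b3)  # normal of the plane p1-p2-p3
    x = dot(n1, n2)
    y = dot(cross(n1, n2), b2) / math.sqrt(dot(b2, b2))
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    # Cis arrangement (0 degrees) and trans arrangement (180 degrees).
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Using atan2 rather than acos keeps the sign of the angle, which matters because φ and ψ regions in natural proteins are distinguished by sign as well as magnitude.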

10
Part of a protein (Phe-Asp-Ala)
11
Protein folding
  • Using this rotational freedom, the protein can
    fold into a specific and unique three-dimensional
    structure (conformation), called its native
    structure

12
Computational methods to find a protein structure
  • The unique 3D arrangement of a protein
    corresponds to its lowest free-energy conformation
  • Most computational approaches to the protein
    folding problem look for the lowest free-energy
    conformation
  • Two principal methods are currently in use for
    computing the lowest-energy conformation
      • Molecular dynamics
      • Monte Carlo

13
Molecular dynamics
  • Forces acting on each atom in a particular state
    of the system are calculated using an empirical
    force field
  • Atoms are allowed to move with the accelerations
    resulting from these forces, changing the
    conformation
  • Once the atoms have moved significantly, the
    forces are recalculated (every 10⁻¹⁵ s)
  • Even supercomputers can simulate only 10⁻⁹ s
    of folding time, which is insufficient, since
    real proteins fold on much longer time scales
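The force/move/recompute loop above can be illustrated with the velocity Verlet integrator, one common choice; the slide does not name a specific integrator. In this toy sketch a single harmonic spring stands in for the empirical force field and the time unit is arbitrary (a real simulation would use a step of about a femtosecond). The names velocity_verlet and spring_force are hypothetical.

```python
import math
from typing import Callable, List, Tuple

def velocity_verlet(
    x: List[float],
    v: List[float],
    masses: List[float],
    force: Callable[[List[float]], List[float]],
    dt: float,
    steps: int,
) -> Tuple[List[float], List[float]]:
    """Advance positions x and velocities v by `steps` steps of length dt."""
    f = force(x)
    for _ in range(steps):
        # Half-step velocity update from the current forces (a = F / m).
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
        # Full-step position update: the conformation changes here.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Recompute the forces at the new positions.
        f = force(x)
        # Second half-step velocity update with the new forces.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
    return x, v

def spring_force(x: List[float], k: float = 1.0) -> List[float]:
    """Toy stand-in for a force field: each coordinate is pulled back to 0."""
    return [-k * xi for xi in x]

if __name__ == "__main__":
    dt = 0.001
    steps = round(2 * math.pi / dt)  # about one oscillation period for k = m = 1
    x, v = velocity_verlet([1.0], [0.0], [1.0], spring_force, dt, steps)
    print(x, v)  # x ends close to its starting value 1.0
```

Each step needs one force evaluation, which is why the tiny time step quoted above makes long folding trajectories so expensive.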

14
Monte Carlo method
  • Used with a simplified model of the protein
    (one that does not represent the structure of
    every amino acid)
  • The procedure makes a random move from the
    current conformation and evaluates the resulting
    energy change
  • If the new conformation is better, it replaces
    the old one, and the process repeats
  • The method is not powerful enough to find an
    optimal conformation even for simple cases
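The loop described above can be sketched generically in Python. With temperature=0 it is the strictly greedy rule from the slide; with temperature > 0 it uses the Metropolis criterion, a common variant that also accepts a worse conformation with probability exp(-ΔE/T) so that the search can leave local minima. The names monte_carlo_minimize, energy and random_move are hypothetical, and the toy energy (x - 3)² on integers stands in for a protein energy function.

```python
import math
import random
from typing import Callable, Optional, Tuple, TypeVar

State = TypeVar("State")

def monte_carlo_minimize(
    state: State,
    energy: Callable[[State], float],
    random_move: Callable[[State, random.Random], State],
    steps: int,
    temperature: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[State, float]:
    """Return the lowest-energy state visited during a Monte Carlo run."""
    rng = rng or random.Random(0)
    current_e = energy(state)
    best, best_e = state, current_e
    for _ in range(steps):
        candidate = random_move(state, rng)
        new_e = energy(candidate)
        delta = new_e - current_e
        # Greedy rule: keep the move if the energy drops.
        # With temperature > 0, also accept worse moves with probability exp(-delta / T).
        accept = delta < 0 or (
            temperature > 0 and rng.random() < math.exp(-delta / temperature)
        )
        if accept:
            state, current_e = candidate, new_e
            if current_e < best_e:
                best, best_e = state, current_e
    return best, best_e

if __name__ == "__main__":
    toy_energy = lambda x: (x - 3) ** 2
    step = lambda x, r: x + r.choice((-1, 1))
    print(monte_carlo_minimize(10, toy_energy, step, steps=1000))
```

On this one-dimensional toy the greedy rule is enough; on rugged protein energy landscapes it gets trapped, which is the weakness noted on the slide.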

15
Knowledge based structure prediction methods
  • The most successful structure prediction tools
    are knowledge-based, using a combination of
    statistical theory and empirical rules
  • The most successful theoretical approach is
    homology modelling

16
Homology modeling
  • Given a sequence of unknown fold (denoted U), if U
    has significant sequence similarity to a protein
    of known structure (T) (i.e., if the pairwise
    sequence identity is > 25%), it is possible to
    construct an approximate 3D model that has the
    correct fold but inaccurate loop regions
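Whether a pair clears the 25% threshold depends on how identity is computed from the alignment. A minimal Python sketch under one common convention (identical columns divided by columns where neither sequence has a gap) follows; other conventions divide by the shorter sequence or by the full alignment length. The function name percent_identity and the "-" gap symbol are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumed convention: identical columns / columns without a gap in either sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

if __name__ == "__main__":
    score = percent_identity("AC-GT", "ACAGA")
    print(score, score > 25.0)
```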

17
Homology modeling
  • The basic assumption of homology modelling is
    that U and the homologous template protein of
    known structure (T) have nearly identical
    backbone structure in the aligned regions
  • A newer generation of alignment methods is based
    on Hidden Markov Models, and another on genetic
    algorithms

18
Homology modeling
  • For sequence identities down to about 30%, U and
    T will still have the same fold, but the number
    of inserted loops grows and the divergence
    between U and T becomes considerable
  • Modelling of loop regions is still a difficult
    problem: even the best methods only rarely
    achieve atomic accuracy, and their loop models
    are often completely different from the correct
    structure

19
Homology modeling
  • A pessimistic view is that the accuracy of the
    resulting 3D predictions is typically at the
    level of ribbon plots, i.e., only the mutual
    orientation of elements such as helices and
    sheets can be identified
  • The optimistic view is that even down to about
    30% sequence identity, homology modelling
    occasionally yields correct predictions at
    atomic resolution

20
Three difficult problems of homology modeling
  • Remote homology modelling (< 25% sequence
    identity) has three obstacles to overcome
      • The remote homology between U and T has to be
        detected
      • U and T have to be aligned correctly
      • The homology modelling procedure has to be
        tailored to the harder problem of extremely
        low sequence identity

21
Solution to the first problem
  • In the early 1990s, there was a great deal of
    optimism that the first obstacle, the detection
    of similar folds, would be solved by threading
    methods
  • The basic idea is to thread the sequence of U
    into the backbone 3D structure of T, at each step
    evaluating the 'fitness of sequence for
    structure' using environment-based or
    knowledge-based mean-force-potentials

22
Protein threading
  • Many proteins in nature are homologous
  • They have different primary structures
  • They fold into similar conformations to carry out
    the same function in living matter
  • There are groups of proteins having the same
    evolutionary origin

23
Protein threading
  • Most proteins share common secondary structure
    motifs
      • Helices
      • Extended strands forming sheets
      • Specific turns
      • Random coils

24
Protein threading
  • Threading means mapping a given sequence to a
    given structure
  • To assign a structure to a sequence, one would
    need to thread the sequence through all known
    conformations, evaluate their compatibility,
    and assign the most compatible structure to the
    sequence
  • When a structure is found that is completely
    different from all known ones, it is entered
    into the database of structures

25
Protein threading
  • The structure is represented by the black trace
  • The sequence (at the top) is threaded through the
    structure, encoding an alignment (at the bottom)
  • Zero means structure deletion, values greater
    than one mean sequence deletion, and one is a
    fit

26
Protein threading
  • The size of the search space for threading a
    sequence of length k into a structure of size n
    can be counted as a selection with repetition
  • The search space is huge, and the problem
    appears to be NP-complete (Unger, R., Moult, J.,
    1993)
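The formula from the original slide did not survive extraction. Assuming the encoding described for the threading figure (each of the n structure positions receives zero, one or several of the k sequence residues, in order), a threading is a selection with repetition of k positions out of n, so the count is the multiset coefficient C(n + k - 1, k). This reconstruction is an assumption; the Python sketch below checks it against brute-force enumeration for small k and n. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def threading_count(k: int, n: int) -> int:
    """Ways to place k ordered residues on n structure positions,
    where a position may receive zero, one or several residues."""
    return comb(n + k - 1, k)

def threading_count_brute_force(k: int, n: int) -> int:
    """Enumerate the same placements: a non-decreasing list of k structure
    positions, one per residue, is exactly a selection with repetition."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))

if __name__ == "__main__":
    print(threading_count(3, 4))      # 20
    print(threading_count(100, 150))  # already astronomically large
```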

27
Protein threading
  • To reduce the complexity of the search task, (m
    1) core and m non-core regions are introduced
  • Usually α-helices and β-sheets are core regions,
    connected by loops
  • Total number of amino acids in core regions is c

28
Protein threading
  • Although it suffers from some inherent
    limitations (such as predicting the right
    structure with a completely wrong threading), the
    method has become a significant tool in protein
    structure prediction
  • Any threading procedure must contain two major
    components
      • An alignment algorithm to position a sequence
        on a structure
      • A score function to evaluate the energy of the
        sequence in a given conformation
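When the score function depends only on each residue and the environment of the structure position it occupies (no pairwise interactions between residues), the alignment component can be solved exactly by dynamic programming, much like sequence alignment. Once pairwise contact terms are included, this simple recurrence is no longer exact, which is where the hardness of threading comes from. In this Python sketch the "buried"/"exposed" environments, the hydrophobic set, the toy_fit scores and the gap penalty are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical hydrophobic set; published classifications differ.
HYDROPHOBIC = set("AVLIMFWC")

def toy_fit(residue: str, environment: str) -> float:
    """Hypothetical 3D-1D score: hydrophobic residues prefer buried positions."""
    buried = environment == "buried"
    if residue in HYDROPHOBIC:
        return 2.0 if buried else -1.0
    return -1.0 if buried else 1.0

def thread_dp(seq: str, envs: List[str], fit: Callable[[str, str], float],
              gap: float) -> Tuple[float, List[Tuple[int, int]]]:
    """Best order-preserving placement of seq onto structure positions.

    Returns the score and (sequence index, structure index) pairs.
    """
    k, n = len(seq), len(envs)
    best = [[float("-inf")] * (n + 1) for _ in range(k + 1)]
    back: List[List[Optional[str]]] = [[None] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for i in range(k + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:  # residue i-1 occupies structure position j-1
                s = best[i - 1][j - 1] + fit(seq[i - 1], envs[j - 1])
                if s > best[i][j]:
                    best[i][j], back[i][j] = s, "fit"
            if i > 0:  # residue i-1 gets no structure position
                s = best[i - 1][j] - gap
                if s > best[i][j]:
                    best[i][j], back[i][j] = s, "skip_residue"
            if j > 0:  # structure position j-1 is left empty
                s = best[i][j - 1] - gap
                if s > best[i][j]:
                    best[i][j], back[i][j] = s, "skip_position"
    # Trace back from the full alignment to recover the placement.
    pairs = []
    i, j = k, n
    while i > 0 or j > 0:
        step = back[i][j]
        if step == "fit":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_residue":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[k][n], pairs

if __name__ == "__main__":
    print(thread_dp("LV", ["buried", "exposed", "buried"], toy_fit, gap=1.0))
```

In the usage example both hydrophobic residues end up on buried positions and the exposed position is left empty.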

29
Protein threading: possible implementations
  • Protein threading can be implemented using
      • Enumeration, for small problems
      • Dynamic programming, to find core regions to
        freeze
      • Monte Carlo variants with Gibbs sampling
      • Branch and bound search
  • Genetic programming with constraints seems to be
    a decent alternative to the other methods

30
Protein structure prediction on lattice
  • Another way to model protein folding in 3D space
    is to assume certain simplifications
  • Modeling on a lattice is a way to fight the
    complexity of the prediction problem
  • Although the problem remains NP-complete on a
    lattice, the size of the proteins that can be
    modeled grows significantly

31
Protein simplification for lattice model
  • Monomers (or residues) are represented with a
    uniform size
  • Bond length is uniform
  • The positions of the monomers are restricted to
    positions in a lattice
  • A simplified energy function is used

32
HP - model
  • The 20-letter alphabet of amino acids is reduced
    to a two-letter alphabet, namely H and P
  • H represents a non-polar (hydrophobic) amino acid
  • P represents a polar (hydrophilic) amino acid
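The reduction itself is a per-residue lookup. The slides do not list which amino acids count as H, so the hydrophobic set in this Python sketch is an assumption (classifications differ, for example on G, P and Y), and the name to_hp is hypothetical.

```python
# Hypothetical hydrophobic (H) set; the slides do not specify one.
HYDROPHOBIC = frozenset("AVLIMFWC")

def to_hp(sequence: str) -> str:
    """Reduce a one-letter amino acid sequence to the two-letter HP alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())

if __name__ == "__main__":
    print(to_hp("LKVAD"))
```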

33
The energy function
  • The energy function for the HP model is given by
    the matrix
          H    P
      H  -1    0
      P   0    0
  • The energy contribution of a contact between two
    monomers is -1 if both are H-monomers, and 0
    otherwise

34
Contact energy
  • Two monomers form a contact in a specific
    conformation if they are not connected by a
    bond but occupy neighboring lattice positions
  • A conformation with minimal energy is just a
    conformation with the maximal number of contacts
    between H-monomers
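The energy definition can be checked with a short Python sketch on the two-dimensional square lattice. The conformation is encoded as one move per bond (U, D, L, R), which is an assumed encoding, and the function names are hypothetical. The example fold of PHPPHHPH was constructed for this sketch and is not necessarily the one pictured on the next slide; it has two H-H contacts and therefore energy -2.

```python
# Unit steps on the 2D square lattice; the move letters are an assumed encoding.
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def lattice_coords(moves: str) -> list[tuple[int, int]]:
    """Place monomer 0 at the origin and follow one unit step per move."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = STEPS[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def hp_energy(sequence: str, moves: str) -> int:
    """HP-model energy: -1 for every H-H contact between non-bonded monomers."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = lattice_coords(moves)
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips the bonded neighbour i + 1
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy

if __name__ == "__main__":
    print(hp_energy("PHPPHHPH", "URULLUR"))  # contacts H1-H4 and H4-H7
```

Minimizing this energy over all self-avoiding move strings is the lattice version of the folding problem; the number of such strings grows exponentially with chain length.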

35
Sample conformation
  • A sample conformation for the sequence PHPPHHPH
    on the two-dimensional lattice, with energy -2
    [figure]

36
Cubic lattice
  • A lattice in 3D space

37
Native conformation
38
Vertical and horizontal contribution to the
surface of a conformation in
[Figure panels: vertical contribution to the surface; horizontal contribution to the surface]
39
Conclusions
  • Native 3D structures of proteins are encoded by a
    linear sequence of amino acid residues
  • Predicting 3D structure from sequence is a task
    challenging enough to have occupied a generation
    of researchers
  • Have they finally succeeded in their goal? The
    bad news is no: we still cannot predict the
    structure of an arbitrary sequence
  • The good news is that we have come closer, and
    growing databases facilitate the task