1
Protein structure prediction
Alexander Churbanov, University of Nebraska at
Omaha, CSCI 8980, February 14, 2002
2
Structure of the presentation

Introduction
Protein native structure
Computational methods of finding a native
structure
Common methods and principles
Specific methods
Homology finding
Threading
Modeling on lattice

3
Introduction

In Greek mythology, Sisyphus is condemned to an
eternity of hard labor. His labor is frustrating
and fruitless, for just as he is about to achieve
his goal, his work is undone and he must start
again from the beginning
Those who work in protein structure prediction
seem to share the same fate

4
Problem of protein structure prediction

Proteins are key molecules in all life processes
The function of a protein is directly related to
its three-dimensional structure
Knowing and understanding the structure of
proteins will have a tremendous impact on
understanding of biological processes, medical
discoveries, and biotechnological inventions

5
Problem of protein structure prediction

For over 30 years, there has been an ardent
search for methods to predict the
three-dimensional (3D) structure from the
sequence
Many methods were found which looked initially
very promising - but always the hope has been
dashed

6
Problem of protein structure prediction

Given a sequence of amino acids, predict the
unique 3D fold of the molecule that minimizes its
free energy

[Diagram labels: Primary structure (Lys, Gly, Leu); Physical methods of prediction; Computational methods of prediction; Practical use of the 3D structural knowledge]
7
General structure of an amino acid

Each amino acid consists of
A common main chain part, containing the heavy
atoms N, C, O, Cα, forming the amide plane
A side chain (residue) of 0 to 10 additional atoms

8
Peptide bond

A peptide bond connects the carboxyl group of the
first amino acid with the amino group of the second
Peptide bonds are planar and rigid

9
Sequence of amino acids

A sequence of amino acids, connected by peptide
bonds, forms a protein
There is no flexibility for rotation around the
peptide bond
There is more flexibility for the protein to rotate
around the N-Cα bond (called the φ-angle) and
around the C-Cα bond (the ψ-angle)
These angles are restricted to small regions in
natural proteins

10
Part of Protein (PheAspAla)
11
Protein folding

Using the freedom of rotations, the protein can
fold into a specific and unique three dimensional
structure (called conformation), forming a native
structure

12
Computational methods to find a protein structure

The unique 3D arrangement of a protein corresponds
to its lowest free energy conformation
Most computational approaches for solving the
protein folding problem look for the lowest free
energy conformation
Two principal methods are currently in use for
computing the lowest energy conformation
Molecular dynamics
Monte Carlo

13
Molecular dynamics

Forces acting on each atom at a particular state
of the system are calculated using an empirical
force field
Atoms are allowed to move with the accelerations
resulting from these forces, changing the conformation
Once the atoms have moved significantly, the acting
forces are recalculated (every 10^-15 sec)
Even supercomputers can simulate only 10^-9 sec
of folding time, which is insufficient
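
The move-then-recalculate loop described above can be sketched for a single particle. This is a minimal illustration only: the velocity Verlet integrator, the harmonic "force field" spring_force, the constant K, and the reduced units are assumptions chosen for the example, not the scheme or force field used in the presentation.

```python
import math

K = 1.0  # hypothetical spring constant standing in for an empirical force field


def spring_force(x):
    """Toy force: a harmonic spring pulling the particle back to x = 0."""
    return -K * x


def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * K * x * x


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one particle in 1D by n_steps time steps of length dt."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # move the particle
        f = force(x)                       # recalculate the force after the move
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
    return x, v


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, spring_force, mass=1.0, dt=0.01, n_steps=1000)
x_exact = math.cos(10.0)  # analytic solution at t = 1000 * 0.01 for this spring
print(x1, x_exact, total_energy(x1, v1) - total_energy(x0, v0))
```

With the step sizes quoted on the slide, covering 10^-9 sec in steps of 10^-15 sec takes 10^6 force evaluations over every atom, which is why the reachable time scale is so short.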

14
Monte Carlo method

Used with a simplified model of the protein (does
not consider the structure of every amino acid)
The procedure makes a random move from the current
conformation and evaluates the resulting energy
change
If the new conformation is better, it replaces the
old one, and the process repeats
The method is not powerful enough to find an optimal
conformation even for simple cases
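
The procedure above can be sketched on a toy problem. Everything concrete here is an assumption made for illustration: the one-dimensional LANDSCAPE stands in for conformational energy, and propose stands in for a random conformational move. The acceptance rule is the one stated on the slide (accept only improvements); the Metropolis criterion, which sometimes accepts worse moves, is a widely used variant not shown here.

```python
import random

# Hypothetical energy landscape: a local minimum at index 2, the global one at index 8.
LANDSCAPE = [5, 3, 1, 2, 4, 6, 4, 2, 0, 3]


def energy(state):
    """Energy of a toy 'conformation', identified by an index into LANDSCAPE."""
    return LANDSCAPE[state]


def propose(state, rng):
    """Random move: step one position left or right, clamped to the landscape."""
    candidate = state + rng.choice((-1, 1))
    return min(max(candidate, 0), len(LANDSCAPE) - 1)


def monte_carlo_minimize(start, n_steps, rng):
    """Repeatedly propose a random move and keep it only if the energy drops."""
    current, e_current = start, energy(start)
    for _ in range(n_steps):
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        if e_candidate < e_current:
            current, e_current = candidate, e_candidate
    return current, e_current


state, e = monte_carlo_minimize(start=0, n_steps=1000, rng=random.Random(0))
print(state, e)
```

Started from index 0, the search walks downhill to index 2 and stays there, because both neighbours have higher energy. It never reaches the global minimum at index 8, which illustrates how such a search can miss the optimum even on a small problem.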

15
Knowledge based structure prediction methods

The most successful structure prediction tools
are knowledge-based, using a combination of
statistical theory and empirical rules
The most successful theoretical approach is
homology modelling

16
Homology modeling

Given a sequence of unknown fold (denoted U), if U
has significant sequence similarity to a protein
of known structure (T) (i.e., if the pairwise
sequence identity is > 25%), it is possible to
construct an approximate 3D model which has a
correct fold but inaccurate loop regions
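
The pairwise sequence identity used as the threshold can be computed from an existing alignment. The sketch below assumes one common definition (identical residues divided by the number of aligned, gap-free columns); other definitions use a different denominator. The two aligned strings are made-up examples, not real proteins.

```python
def percent_identity(aligned_u, aligned_t):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_u) != len(aligned_t):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_u, aligned_t):
        if a == "-" or b == "-":
            continue  # gap columns are ignored under this definition
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments of U and T.
u = "MKT-AYIAKQR"
t = "MKSLAYLA-QR"
identity = percent_identity(u, t)
print(round(identity, 1), identity > 25.0)
```

Here 7 of the 9 gap-free columns are identical, so this pair would be well above the 25% threshold.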

17
Homology modeling

The basic assumption of homology modelling is
that U and the homologous template protein of
known structure (T) have nearly identical
backbone structure in the aligned regions
One new generation of alignment methods is based
on hidden Markov models, and another on genetic
algorithms

18
Homology modeling

For sequence identities down to about 30%,
U and T will still have the same fold, but the
number of loops inserted grows and the divergence
between U and T becomes considerable
Modelling of loop regions is still a difficult
problem: even the best methods only rarely
achieve atomic accuracy, and their results are
often completely different from the correct structure

19
Homology modeling

A pessimistic view is that the accuracy of
resulting 3D predictions is typically at the
level of ribbon plots, i.e. the mutual
orientation of elements such as helices and
sheets can be identified
The optimistic version is that even down to
levels of 30% sequence identity, homology
modelling occasionally yields correct predictions
at atomic resolution

20
Three difficult problems of homology modeling

Remote homology modelling (< 25% sequence identity)
has three obstacles to overcome
The remote homology between U and T has to be
detected
U and T have to be aligned correctly
The homology modelling procedure has to be
tailored to the harder problem of extremely low
sequence identity

21
Solution to the first problem

In the early 1990s, there was a great deal of
optimism that the first obstacle, the detection
of similar folds, would be solved by threading
methods
The basic idea is to thread the sequence of U
into the backbone 3D structure of T, at each step
evaluating the 'fitness of sequence for
structure' using environment-based or
knowledge-based mean-force-potentials

22
Protein threading

Many proteins in nature are homologous
They have different primary structures
They form similar conformations to carry out the
same function in living matter
There are groups of proteins having the same
evolutionary origin

23
Protein threading

Most proteins share the same secondary structure
motifs
Helices
Extended strands forming sheets
Specific turns
Random coils

24
Protein threading

Threading means mapping a given sequence to a
given structure
To assign a structure to a sequence one would
then need to thread the sequence through all
known conformations, evaluating compatibility,
and assign the most compatible structure to the
sequence
When a structure completely different from any
known one is discovered, it is entered into the
database of structures

25
Protein threading

Structure is presented by the black trace
Sequence (at the top) is threaded through the
structure, encoding an alignment (at the bottom)
Zero means structure deletion, values greater
than one mean sequence deletion, while one is a
fit

26
Protein threading

The size of the search space for threading a
sequence of length k into a structure of size n
can be found as a selection with repetition
The search space is huge and the problem appears to
be NP-complete (Unger, R., Moult, J., 1993)
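
The formula itself does not survive in the transcript. One reading consistent with "selection with repetition" is sketched below, under a stated assumption: each of the n structure positions receives a non-negative count of residues, and the counts sum to k. The number of such assignments is the number of combinations with repetition, C(n + k - 1, k). The function names are hypothetical.

```python
from itertools import product
from math import comb


def count_threadings(k, n):
    """Combinations with repetition: ways to distribute k residues over n positions."""
    return comb(n + k - 1, k)


def brute_force_count(k, n):
    """Enumerate every count vector of length n and keep those summing to k."""
    return sum(1 for counts in product(range(k + 1), repeat=n) if sum(counts) == k)


small = count_threadings(3, 4)
large = count_threadings(100, 100)
print(small, brute_force_count(3, 4), large > 10**58)
```

Under this assumption, threading a 100-residue sequence into a 100-position structure already gives more than 10^58 candidates, so explicit enumeration is feasible only for very small cases.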

27
Protein threading

In order to reduce complexity of search task, (m
1) core and m non-core regions are introduced
Usually α-helices and β-sheets are core regions,
connected by loops
Total number of amino acids in core regions is c

28
Protein threading

Although suffering from some inherent limitations
(such as prediction of the right structure with a
completely wrong threading), the method became a
significant tool in protein structure prediction
Any threading procedure must contain two major
components
An alignment algorithm to position a sequence on
a structure
A score function to evaluate the energy of the
sequence in a given conformation
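
The two components can be illustrated with a deliberately small toy. All specifics are assumptions made for this sketch: the structure is reduced to a string of environments ('B' buried, 'E' exposed), the sequence uses a two-letter hydrophobic/polar alphabet ('H', 'P'), the score rewards hydrophobic residues in buried positions and polar residues in exposed ones, and the alignment is gapless. Real threading programs use gapped alignments and knowledge-based potentials.

```python
def fitness(sequence, environments):
    """Toy score function (lower is better): -1 per favourable placement, +1 otherwise."""
    score = 0
    for residue, env in zip(sequence, environments):
        favourable = (residue == "H" and env == "B") or (residue == "P" and env == "E")
        score += -1 if favourable else 1
    return score


def best_gapless_threading(sequence, environments):
    """Toy alignment: slide the sequence along the structure, keep the best offset."""
    best = None
    for offset in range(len(environments) - len(sequence) + 1):
        window = environments[offset:offset + len(sequence)]
        score = fitness(sequence, window)
        if best is None or score < best[1]:
            best = (offset, score)
    return best  # (offset, score), or None if the sequence is longer than the structure


structure = "EEBBEBBEEE"  # hypothetical burial pattern of a known structure
sequence = "HHPHH"        # hypothetical query in the two-letter alphabet
offset, score = best_gapless_threading(sequence, structure)
print(offset, score)
```

To assign a structure, the same search would be repeated against every structure in the database and the best-scoring one kept.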

29
Protein threading: possible implementations

Protein threading could be implemented using
Enumeration for small problems
Dynamic programming to find core regions to
freeze
Monte Carlo variants with Gibbs sampling
Branch and bound search
Genetic programming with constraints, which seems
to be a decent alternative in comparison with the
other methods

30
Protein structure prediction on lattice

Another way to model protein folding in 3D space
is to assume certain simplifications
Modeling on a lattice is a way to fight the
complexity of the prediction problem
Though the problem on a lattice is still
NP-complete, we can significantly expand the size
of the protein modeled

31
Protein simplification for lattice model

Monomers (or residues) are represented using a
unified size
Bond length is unified
The positions of the monomers are restricted to
positions in a lattice
Simplified energy function

32
HP - model

20 letter alphabet of amino acids is reduced to a
two letter alphabet, namely H and P
H represents non-polar or hydrophobic amino acid
P represents polar or hydrophilic amino acid

33
The energy function

The energy function for the HP-model is given by
the matrix

      H    P
 H   -1    0
 P    0    0

The energy contribution of a contact between two
monomers is -1 if both are H-monomers, and 0
otherwise

34
Contact energy

Two monomers form a contact in some specific
conformation if they are not connected via a
bond, but occupy neighboring positions in the
conformation
A conformation with minimal energy is just a
conformation with the maximal number of contacts
between H-monomers

35
Sample conformation

A sample conformation for the sequence PHPPHHPH
in the two-dimensional lattice with energy -2 is
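
A conformation like this can be written down as lattice coordinates and checked directly. The sketch below assumes that each H-H contact contributes -1, so that minimal energy corresponds to the maximal number of H-H contacts. The coordinates in FOLD are an illustrative fold chosen for this example, not necessarily the one drawn on the original slide. The exhaustive search is practical only because the chain has 8 monomers.

```python
SEQUENCE = "PHPPHHPH"
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # neighbours on the 2D square lattice


def hp_energy(sequence, coords):
    """Sum -1 for every pair of H-monomers that are lattice neighbours but not bonded."""
    position = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = position.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def min_energy(sequence):
    """Enumerate all self-avoiding walks from the origin and return the lowest energy."""
    best = 0

    def extend(coords, occupied):
        nonlocal best
        if len(coords) == len(sequence):
            best = min(best, hp_energy(sequence, coords))
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return best


# Illustrative fold: one lattice position per monomer, in chain order.
FOLD = [(1, 1), (0, 1), (-1, 1), (-1, 0), (0, 0), (1, 0), (1, -1), (0, -1)]
print(hp_energy(SEQUENCE, FOLD), min_energy(SEQUENCE))
```

In this fold the fifth monomer (H) touches both the second and the eighth (both H), giving two contacts. The exhaustive search confirms that no conformation of this sequence on the square lattice does better.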

36
Cubic lattice

Lattice 3D space

37
Native conformation
38
Vertical and horizontal contribution to the
surface of a conformation in
Vertical contribution to the surface
Horizontal contribution to the surface
39
Conclusions

Native 3D structures of proteins are encoded by a
linear sequence of amino acid residues
To predict 3D structure from sequence is a task
challenging enough to have occupied a generation
of researchers
Have they finally succeeded in their goal? The
bad news is no, we still cannot predict the
structure for an arbitrary sequence
The good news is that we have come closer, and
growing databases facilitate the task.