Protein Structure Prediction - PowerPoint PPT Presentation

About This Presentation

Title:

Protein Structure Prediction

Description:

Title: PowerPoint Presentation Author: Xiaole Shirley Liu Last modified by: Jun Liu Created Date: 1/3/2005 7:27:35 PM Document presentation format – PowerPoint PPT presentation

Slides: 48

Provided by: XiaoleSh

Transcript and Presenter's Notes

Title: Protein Structure Prediction

1
Protein Structure Prediction

Xiaole Shirley Liu
And
Jun Liu
STAT115

2
Protein Structure Prediction
Ram Samudrala, University of Washington
3
Outline

Motivations and introduction
Protein 2nd structure prediction
Protein 3D structure prediction
CASP
Homology modeling
Fold recognition
ab initio prediction
Manual vs automation
Structural genomics

4
Protein Structure

Sequence determines structure, structure
determines function
Most proteins can fold by themselves very quickly
Folded structure is the lowest energy state

5
Protein Structure

Main forces to consider
Steric complementarity
Secondary structure preferences (satisfy H bonds)
Hydrophobic/polar patterning
Electrostatics

6
Rationale for understanding protein structure and function

Protein sequence: large numbers of sequences, including whole genomes
?
Protein function:
rational drug design and treatment of disease
protein and genetic engineering
build networks to model cellular pathways
study organismal function and evolution

7
Protein Databases

SwissProt: protein knowledgebase
PDB: Protein Data Bank, 3D structure

8
View Protein Structure

Free interactive viewers
Download 3D coordinate file from PDB
Quick and dirty
VRML
Rasmol
Chime
More powerful
Swiss-PdbViewer

9
Compare Protein Structures

Structure is more conserved than sequence
Why compare?
Detect evolutionary relationships
Identify recurring structural motifs
Predicting function based on structure
Assess predicted structures
Protein structure comparison and classification
Manual: SCOP
Automated: DALI

10
Compare protein structures

Need ways to determine if two protein structures
are related and to compare predicted
models to experimental structures
A commonly used measure is the root mean square
deviation (RMSD) of the Cartesian coordinates of
equivalent atoms in two structures after optimal
superposition (McLachlan, 1979)
Usually use Cα atoms

Other measures include contact maps and torsion
angle RMSDs
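
The RMSD-after-superposition measure on this slide can be sketched in a few lines. This is a minimal illustration that uses the SVD-based Kabsch procedure, which may differ in detail from the McLachlan (1979) formulation cited above; the function name `kabsch_rmsd` and the four-point coordinate set are made up for the example, and NumPy is assumed to be installed.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the covariance matrix gives the optimal rotation (Kabsch).
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Hypothetical C-alpha trace, and the same trace rotated 90 degrees about z and shifted.
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0], [8.0, 4.0, 2.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = ca @ rot_z.T + np.array([10.0, -5.0, 2.0])
print(kabsch_rmsd(ca, moved))  # a rigid-body copy should give a value close to zero
```

The two structures must already list equivalent atoms in the same order; finding that correspondence is a separate alignment problem.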

11
SCOP

Compare protein structures, identify recurring
structural motifs, predict function
A. Murzin et al., 1995
Manual classification
A few folds are highly populated
5 folds contain 20% of all homologous superfamilies
Some folds are multifunctional

12
Determine Protein Structure

X-ray crystallography (gold standard)
Grow crystals, rate limiting, relies on the
repeating structure of a crystalline lattice
Collect a diffraction pattern
Map to real space electron density, build and
refine structural model
Painstaking and time consuming

13
Protein Structure Prediction

Since AA sequence determines structure, can we
predict protein structure from its AA sequence?
Requires predicting the three backbone torsion angles of every residue: unlimited degrees of freedom (DoF)!
Physical properties that determine fold
Rigidity of the protein backbone
Interactions among amino acids, including
Electrostatic interactions
van der Waals forces
Volume constraints
Hydrogen bonds, disulfide bonds
Interactions of amino acids with water

14
Protein folding landscape

Large multi-dimensional space of changing conformations
[Figure: folding landscape; axis labels: free energy, folding reaction]

15
Protein primary structure
16
2nd Structure Prediction

α helix, β sheet, turn/loop

17
2nd Structure Prediction

Chou-Fasman, 1974
Based on 15 proteins (2473 AAs) of known
conformation, determine Pα and Pβ, ranging from
about 0.5 to 1.5
Empirical rules for 2nd structure nucleation
α helix: 4 Hα or hα out of 6 AA, extend in both directions, Pα > 1.03, Pα > Pβ, no α breakers
β sheet: 3 Hβ or hβ out of 5 AA, extend in both directions, Pβ > 1.05, Pβ > Pα, no β breakers
About 50-60% accuracy
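
The helix nucleation rule above can be sketched as a sliding-window scan. This is only a rough illustration: the propensity values and the set of helix formers below are placeholders for a four-letter toy alphabet, not the published Chou-Fasman table, and the extension step and the breaker check are left out. The function name `helix_nuclei` is hypothetical.

```python
def helix_nuclei(seq, formers, p_alpha, p_beta, window=6, min_formers=4, min_mean=1.03):
    """Return start indices of windows that satisfy the helix nucleation rule."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        n_formers = sum(aa in formers for aa in segment)
        mean_a = sum(p_alpha[aa] for aa in segment) / window
        mean_b = sum(p_beta[aa] for aa in segment) / window
        # Rule: enough formers, mean P-alpha above threshold and above mean P-beta.
        if n_formers >= min_formers and mean_a > min_mean and mean_a > mean_b:
            hits.append(start)
    return hits

# Placeholder propensities (NOT the published values), toy alphabet only.
P_ALPHA = {"A": 1.4, "E": 1.5, "G": 0.6, "V": 1.0}
P_BETA = {"A": 0.8, "E": 0.4, "G": 0.8, "V": 1.7}
FORMERS = {"A", "E"}  # residues treated as H-alpha or h-alpha in this toy example

print(helix_nuclei("GGAEAEAAGG", FORMERS, P_ALPHA, P_BETA))
```

The sheet rule has the same shape with `window=5`, `min_formers=3`, `min_mean=1.05`, and the roles of the two propensity tables swapped.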

18
Pα and Pβ
19
2nd Structure Prediction

Garnier, Osguthorpe, Robson, 1978
Assumption: each AA is influenced by flanking
positions
GOR scoring tables (problem: limited dataset)
Add scores, assign the 2nd structure state with the highest score
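
The "add scores, pick the highest" step can be sketched as follows. The scoring tables here are toy values invented for the example, not the published GOR tables, and the default window of 8 flanking positions on each side is an assumption based on the commonly described form of the method. The function name `gor_predict` is hypothetical.

```python
def gor_predict(seq, tables, half_window=8):
    """Assign each residue the state with the highest summed window score."""
    states = sorted(tables)
    prediction = []
    for i in range(len(seq)):
        totals = {}
        for state in states:
            total = 0.0
            for offset in range(-half_window, half_window + 1):
                j = i + offset
                if 0 <= j < len(seq):
                    # tables[state][offset] maps amino acid -> score; missing entries count as 0.
                    total += tables[state].get(offset, {}).get(seq[j], 0.0)
            totals[state] = total
        prediction.append(max(states, key=lambda s: totals[s]))
    return "".join(prediction)

# Toy tables: H = helix, E = strand, C = coil. Values are illustrative only.
TOY_TABLES = {
    "H": {-1: {"A": 0.5}, 0: {"A": 1.0}, 1: {"A": 0.5}},
    "E": {-1: {"V": 0.5}, 0: {"V": 1.0}, 1: {"V": 0.5}},
    "C": {0: {"G": 1.0}},
}

print(gor_predict("AAAGGGVVV", TOY_TABLES))
```

Each table entry is indexed by state, by offset from the central residue, and by the amino acid found at that offset, which is how the flanking positions influence the call.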

20
2nd Structure Prediction

D. Eisenberg, 1986
Plot hydrophobicity as function of sequence
position, look for periodic repeats
Period 3-4 AA: α helix (3.6 AA / turn)
Period 2 AA: β sheet
Best overall: JPRED by Geoffrey Barton, which uses many
different approaches and takes the consensus
Overall accuracy 72.9%
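
One way to look for the periodic repeats mentioned above is a hydrophobic moment: sum the per-residue hydrophobicities as vectors rotated by a fixed angle per residue (100 degrees for 3.6 residues per turn, 180 degrees for a period of 2). The sketch below swaps in the Kyte-Doolittle hydropathy scale for convenience; the slide does not say which scale was used, so treat that choice, and the function name, as assumptions.

```python
import math

# Kyte-Doolittle hydropathy values (swapped in here; not named on the slide).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(seq, angle_deg):
    """Magnitude of the summed hydrophobicity vectors for a given per-residue angle."""
    angle = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * angle) for i, aa in enumerate(seq))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * angle) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum)

# An alternating hydrophobic/polar segment has period 2, so it should
# score higher at 180 degrees (strand-like) than at 100 degrees (helix-like).
segment = "LKLKLKLK"
print(hydrophobic_moment(segment, 180.0), hydrophobic_moment(segment, 100.0))
```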

21
3D Protein Structure Prediction

CASP contest: Critical Assessment of Structure
Prediction
Biennial meeting (every two years) since 1994 at Asilomar, CA
Experimentalists: before CASP, submit the sequence of a
to-be-solved structure to a central repository
Predictors: download the sequence and minimal
information, make predictions in three categories
Assessors: automatic programs and experts
evaluate prediction quality

22
CASP Category I

Homology modeling (sequences with high homology
to sequences of known structure)
Given a sequence with homology > 25-30% to a
known structure in PDB, use the known structure as a
starting point to create a model of the 3D
structure of the sequence
Takes advantage of knowledge of a closely related
protein. Use sequence alignment techniques to
establish correspondences between the known
template and the unknown.

23
CASP Category II

Fold recognition (sequences with no significant sequence
identity (< 30%) to sequences of known structure)
Given the sequence, and a set of folds observed
in PDB, see if the sequence could adopt
one of the known folds
Takes advantage of knowledge of existing
structures, and principles by which they are
stabilized (favorable interactions)

24
CASP Category III

Ab initio prediction (no known homology with any
sequence of known structure)
Given only the sequence, predict the 3D structure
from first principles, based on energetic or
statistical principles
Secondary structure prediction and multiple
alignment techniques are used to predict features of
these molecules. Then some method is necessary for
assembling the 3D structure.

25
Structure Prediction Evaluation

Hydrophobic core similar?
2nd struct identified?
Energy minimized? H-bond contacts?
Compare with solved crystal structure (gold standard)

26
Comparative modelling of protein structure
[Figure: comparative modelling workflow, including a refine step]
27
Homology Modeling Results

When sequence homology is > 70%, high resolution
models are possible (< 3 Å RMSD)
MODELLER (Sali et al.)
Find homologous proteins with known structure and align
Collect distance distributions between atoms in
known protein structures
Use these distributions to compute positions for
equivalent atoms in the alignment
Refine using energetics

28
Homology Modeling Results

Many places can go wrong
Bad template: it doesn't have the same structure
as the target after all
Bad alignment (a very common problem)
Good alignment to good template still gives wrong
local structure
Bad loop construction
Bad side chain positioning

29
Homology Modeling Results

Use of sensitive multiple alignment (e.g.
PSI-BLAST) techniques helped get best alignments
Sophisticated energy minimization techniques do
not dramatically improve upon initial guess

30
Fold Recognition Results

Also called protein threading
Given a new sequence and a library of known folds,
find the best alignment of the sequence to each fold,
return the most favorable one

31
Fold Recognition with Dynamic Programming

Environmental class for each AA based on known
folds (buried status, polarity, 2nd struct)

32
Protein Folding with Dynamic Programming

D. Eisenberg, 1994
Align sequence to each fold (a string of
environmental classes)
Advantages: fast and works pretty well
Disadvantages: does not consider AA contacts
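
Aligning a sequence to a string of environmental classes can be sketched as an ordinary global alignment by dynamic programming. This is a minimal illustration, not the published method: the two-class environment alphabet ("B" buried, "E" exposed), the scoring function `toy_score`, the linear gap penalty, and the fold library are all invented for the example.

```python
def thread_score(seq, envs, score, gap=-1.0):
    """Best global alignment score of a sequence to a string of environment classes."""
    n, m = len(seq), len(envs)
    # best[i][j]: best score for aligning seq[:i] to envs[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + score(seq[i - 1], envs[j - 1])
            best[i][j] = max(match, best[i - 1][j] + gap, best[i][j - 1] + gap)
    return best[n][m]

HYDROPHOBIC = set("AVLIMFWC")

def toy_score(aa, env):
    # Hypothetical profile entry: reward hydrophobic residues in buried ("B")
    # positions and polar residues in exposed ("E") positions.
    fits = (aa in HYDROPHOBIC) == (env == "B")
    return 1.0 if fits else -1.0

# Hypothetical fold library: each fold is a string of environment classes.
folds = {"fold_1": "BEBEBE", "fold_2": "BBBEEE"}
seq = "LKVELD"
best_fold = max(folds, key=lambda name: thread_score(seq, folds[name], toy_score))
print(best_fold, thread_score(seq, folds[best_fold], toy_score))
```

Because each residue is scored only against the environment of its own position, the recurrence stays simple and fast, which is also why residue-residue contacts are not captured.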

33
Fold Recognition Results

Each predictor can submit N top hits
Every predictor does well on something
Common folds (more examples) are easier to
recognize
Fold recognition was the surprise performer at
CASP1. Incremental progress at CASP2, CASP3,
CASP4

34
Fold Recognition Results

Alignment (seq to fold) is a big problem

35
ab initio

Predict interresidue contacts and then compute
structure (mild success)
Simplified energy terms, reduced search space
(phi/psi or lattice) (moderate success)
Creative ways to memorize sequence ↔ structure
correlations in short segments from the PDB, and
use these to model new structures (ROSETTA)

36
Ab initio prediction of protein structure
Sample conformational space such that native-like
conformations are found
Hard to design functions that are not fooled by
non-native conformations (decoys)
Astronomically large number of conformations: 5 states
per residue, 100 residues: $5^{100} \approx 10^{70}$
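
The size of that conformational space is easy to check directly. The snippet below only reproduces the slide's arithmetic (5 states per residue, 100 residues); the variable names are arbitrary.

```python
import math

states_per_residue = 5
n_residues = 100

# Exact count as a Python integer, and its order of magnitude.
n_conformations = states_per_residue ** n_residues
log10_count = n_residues * math.log10(states_per_residue)

print(f"5^100 is about 10^{log10_count:.1f} ({len(str(n_conformations))} digits)")
```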
37
Sampling conformational space: continuous
approaches

Most work in the field
Molecular dynamics
Continuous energy minimization (follow a valley)
Monte Carlo simulation
Genetic Algorithms

Like real polypeptide folding process
Cannot be sure if native-like conformations are
sampled

38
Molecular dynamics

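Force $= -\,dU/dx$ (slope of the potential $U$)
Acceleration: force $= m\,a(t)$
All atoms are moving, so forces between atoms are
complicated functions of time
Analytical solution for $x(t)$ and $v(t)$ is
impossible; numerical solution is trivial
Atoms move for very short time steps of $10^{-15}$ seconds,
or 0.001 picoseconds (ps)
$x(t+\Delta t) = x(t) + v(t)\,\Delta t + [4a(t) - a(t-\Delta t)]\,\Delta t^2/6$
$v(t+\Delta t) = v(t) + [2a(t+\Delta t) + 5a(t) - a(t-\Delta t)]\,\Delta t/6$
$U_{kinetic} = \tfrac{1}{2}\sum_i m_i v_i(t)^2 = \tfrac{1}{2}\,n\,k_B T$
Total energy ($U_{potential} + U_{kinetic}$) must not
change with time

In the update equations, $x(t)$ and $v(t)$ are the old position and old velocity, $a$ is the acceleration, and $x(t+\Delta t)$ and $v(t+\Delta t)$ are the new position and new velocity
$n$ is the number of coordinates (not atoms)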
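
The position and velocity updates on this slide match the scheme usually called Beeman's algorithm. The sketch below applies them to a single particle on a spring rather than a protein, purely to show the update order and the energy-conservation check; the harmonic force, the parameter values, and the choice to reuse a(0) as the "previous" acceleration on the first step are all assumptions of the example.

```python
import math

def beeman_step(x, v, a, a_prev, accel, dt):
    """One step: new position first, then new acceleration, then new velocity."""
    x_new = x + v * dt + (4.0 * a - a_prev) * dt * dt / 6.0
    a_new = accel(x_new)
    v_new = v + (2.0 * a_new + 5.0 * a - a_prev) * dt / 6.0
    return x_new, v_new, a_new, a

def simulate_oscillator(x0=1.0, v0=0.0, k=1.0, m=1.0, dt=0.01, steps=1000):
    """Integrate a 1D harmonic oscillator (toy stand-in for a force field)."""
    def accel(x):
        return -k * x / m  # a = F/m with F = -dU/dx and U = k x^2 / 2
    x, v = x0, v0
    a = accel(x)
    a_prev = a  # assumption: no earlier step exists, so reuse a(0)
    for _ in range(steps):
        x, v, a, a_prev = beeman_step(x, v, a, a_prev, accel, dt)
    energy = 0.5 * k * x * x + 0.5 * m * v * v
    return x, v, energy

x, v, energy = simulate_oscillator()
print(x, math.cos(10.0), energy)  # x should track cos(t); energy should stay near 0.5
```

Monitoring the total energy this way is a practical test that the time step is small enough.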
39
Energy minimization

For a given protein, the energy depends on
thousands of x, y, z Cartesian atomic coordinates,
so reaching a deep minimum is not trivial
Furthermore, we want to minimize the free energy,
not just the potential energy.
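
Why a deep minimum is hard to reach can be seen even in one dimension. The sketch below runs plain steepest descent on an invented double-well function; the energy function, step size, and starting points are all made up, and a real protein energy has thousands of coordinates rather than one.

```python
def energy(x):
    # Toy double-well potential with a deeper well on the left.
    return x ** 4 - 2.0 * x ** 2 + 0.5 * x

def gradient(x):
    return 4.0 * x ** 3 - 4.0 * x + 0.5

def steepest_descent(x, step=0.01, iterations=2000):
    """Follow the downhill direction in small steps; stops in the nearest valley."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

x_right = steepest_descent(1.5)   # starts on the right, ends in the shallow well
x_left = steepest_descent(-1.5)   # starts on the left, ends in the deep well
print(x_right, energy(x_right))
print(x_left, energy(x_left))
```

The run that starts on the right never finds the deeper well, which is the one-dimensional version of the problem described above.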

40
Monte Carlo Simulation

Propose moves in torsion or Cartesian
conformation space
Evaluate energy after every move, compute ΔE
Accept the new conformation based on the Metropolis criterion: always if ΔE ≤ 0, otherwise with probability $e^{-\Delta E / k_B T}$
If run for infinite time, the simulated conformations
follow the Boltzmann distribution
Many variations, including simulated annealing
and other heuristic approaches.
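
The propose, evaluate, accept loop can be sketched with a Metropolis acceptance rule on a system small enough to check against the Boltzmann distribution. The two-state "conformation", its energies, and the function names are invented for the example; energies are in units of kT.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def sample_two_state(energies, kT, steps, seed=0):
    """Monte Carlo over two states; returns the fraction of steps spent in each."""
    rng = random.Random(seed)
    state = 0
    counts = [0, 0]
    for _ in range(steps):
        proposal = 1 - state  # the only possible move: flip to the other state
        if metropolis_accept(energies[proposal] - energies[state], kT, rng):
            state = proposal
        counts[state] += 1
    return [c / steps for c in counts]

freqs = sample_two_state([0.0, 1.0], kT=1.0, steps=200000)
boltzmann = math.exp(-1.0) / (1.0 + math.exp(-1.0))
print(freqs[1], boltzmann)  # sampled vs. Boltzmann population of the higher state
```

Simulated annealing uses the same loop but lowers `kT` gradually as the run proceeds.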

41
Scoring/energy functions

Need a way to select native-like conformations
from non-native ones
Physics-based functions: electrostatics, van der
Waals, solvation, bond/angle terms.
Knowledge-based scoring functions
Derive information about atomic properties from a
database of experimentally determined
conformations
Common parameters include pairwise atomic
distances and amino acid burial/exposure.
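
One common way to turn database statistics into a score is an inverse-Boltzmann conversion, E = -kT ln(P_observed / P_reference), applied per distance bin. The slide does not say which conversion was used, so the formula, the pseudocount, and the counts below are assumptions of this sketch.

```python
import math

def pair_potential(observed, reference, kT=1.0, pseudocount=1.0):
    """Convert distance-bin counts into energies: E = -kT * ln(P_obs / P_ref)."""
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same number of bins")
    obs_total = sum(observed) + pseudocount * len(observed)
    ref_total = sum(reference) + pseudocount * len(reference)
    energies = []
    for o, r in zip(observed, reference):
        p_obs = (o + pseudocount) / obs_total
        p_ref = (r + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

# Hypothetical counts of one atom-pair type in four distance bins,
# versus a uniform reference state.
observed = [5, 60, 25, 10]
reference = [25, 25, 25, 25]
print(pair_potential(observed, reference))
```

Bins seen more often than the reference get negative (favorable) energies, and rarely seen bins get positive ones.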

42
Rosetta

D. Baker, U. Wash
Break sequence into short segments (7-9 AA)
Sample 3D structures from a library of known segment
structures, parallel computation
Use simulated annealing (Metropolis-type
algorithm) for global optimization
Propose a change; if the energy is better, accept it,
otherwise accept it with a smaller probability
Create 1000 structures, cluster and choose one
representative from each cluster to submit

43
Manual Improvements and Automation

Very often manual examination can improve
prediction
Catch errors
Need domain knowledge
A. Murzin's success at CASP2
CAFASP: Critical Assessment of Fully Automated
Structure Prediction
Murzin can't play!!
MetaServers: combine different methods to get
consensus

44
CAFASP Evaluation
45
Structural Genomics

With more and more solved structures and novel
folds, computational protein structure prediction
is going to improve
Structural genomics
Worldwide initiative for high-throughput determination of
many protein structures
In particular, solve structures that have no homology to known structures

46
Summary