Title: Protein Tertiary Structure Prediction
1Protein Tertiary Structure Prediction
Structural Bioinformatics
2The Different Levels of Protein Structure
Primary: the linear amino acid sequence.
Secondary: α-helices, β-sheets and loops.
Tertiary: the 3D shape of the fully folded
polypeptide chain.
3Predicting 3D Structure
An outstanding, difficult problem
- Comparative modeling (homology)
- Based on sequence homology
- Fold recognition (threading)
- Based on structural homology
4Comparative Modeling
Based on Sequence homology
Similar sequences suggest similar structures
5Sequence and Structure Alignments of Two Retinol
Binding Proteins
6Structure Alignments
There are many different algorithms for
structural alignment.
The outputs of a structural alignment are a
superposition of the atomic coordinates and a
minimal Root Mean Square Deviation (RMSD) between
the structures. The RMSD of two aligned
structures indicates their divergence from one
another.
Low RMSD values mean similar structures.
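As a rough illustration of the RMSD value itself, the sketch below computes it for two coordinate sets. It assumes the structures are already superposed and that atoms are paired by position in the lists; finding the optimal superposition is a separate step that is not shown. The function name and the toy coordinates are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superposed and that atom i in
    coords_a is paired with atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy coordinates in angstroms (illustrative values, not real protein data).
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

print(rmsd(model, target))  # every atom is displaced by 1 angstrom
```

Identical structures give an RMSD of 0; the value grows as paired atoms move further apart.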
7Comparative Modeling
Based on Sequence homology
Similar sequence suggests similar structure
- Builds a protein structure model based on its
alignment to one or more related protein
structures in the database
8Comparative Modeling
Based on Sequence homology
- Accuracy of the comparative model is related to
the sequence identity on which it is based
- >50% sequence identity: high accuracy
- 30-50% sequence identity: 90% modeled
- <30% sequence identity: low accuracy (many
errors)
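A minimal sketch of how the identity ranges above could be applied to a pairwise alignment. It assumes percent identity is counted over aligned (non-gap) columns only, which is one of several definitions in use; the function names and the short class labels are illustrative, not taken from any modeling package.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the non-gap columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0

def expected_accuracy(identity):
    """Map percent identity to the three ranges listed on the slide."""
    if identity > 50:
        return "high"
    if identity >= 30:
        return "intermediate"
    return "low"

# Toy alignment (made-up sequences): 6 identical residues in 7 aligned columns.
identity = percent_identity("MKT-AYIA", "MKTLAYLA")
print(round(identity, 1), expected_accuracy(identity))
```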
9Homology Threshold for Different Alignment
Lengths
[Plot: homology threshold (t) versus alignment length (L)]
A sequence alignment between two proteins is
considered to imply structural homology if the
sequence identity is equal to or above the
homology threshold t in a sequence region of a
given length L.
The threshold values t(L) are derived from the PDB.
10Comparative Modeling
- Similarity particularly high in core
- Alpha helices and beta sheets preserved
- Even near-identical sequences vary in loops
11Comparative Modeling Methods
Based on Sequence homology
MODELLER (Sali, Rockefeller/UCSF)
SCWRL (Dunbrack, UCSF)
SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html
12Comparative Modeling
Based on Sequence homology
Modeling of a sequence based on known
structures consists of four major steps:
1. Finding a known structure(s) related to the
sequence to be modeled (template), using sequence
comparison methods such as PSI-BLAST
2. Aligning the sequence with the templates
3. Building a model
4. Assessing the model
13Fold Recognition
Based on Structure homology
14Based on Secondary Structure
Protein folds: the sequential and spatial
arrangement of secondary structures
Hemoglobin
TIM
15Similar folds usually mean similar function
Transcription factors
Homeodomain
16The same fold can have multiple functions
Rossmann fold: 12 functions
TIM barrel: 31 functions
17Fold Recognition
Based on Structure homology
- Methods of protein fold recognition attempt to
detect similarities between protein 3D structures
that have no significant sequence similarity.
- Search for folds that are compatible with a
particular sequence.
- "They turn the protein folding problem on its
head: rather than predicting how a sequence will
fold, they predict how well a fold will fit a
sequence."
18Based on Structure homology
Basic steps in fold recognition: compare the
sequence against a library of all known protein
folds (finite number)
Query sequence
MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find the fold template that the sequence
fits best
There are different ways to evaluate
sequence-structure fit
19Based on Secondary Structure homology
There are different ways to evaluate
sequence-structure fit
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD...
fitted to fold templates 1) ... 56) ... n), with
fit scores -10 ... -123 ... 20.5]
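To make the idea of scoring a sequence against a fold library concrete, here is a deliberately simplified sketch. It is not how TOPITS, GenTHREADER or any real threading program works: it assumes each fold template is reduced to a string of buried (B) or exposed (E) positions, uses an ungapped position-by-position comparison, and treats a lower score as a better fit. The fold names, the environment strings, the hydrophobic residue set and the query are all hypothetical.

```python
# Residues treated as hydrophobic in this toy example (an assumption).
HYDROPHOBIC = set("AVLIMFWC")

def fit_score(sequence, environments):
    """Toy sequence-structure fit: lower is better.

    A hydrophobic residue at a buried position, or a polar residue at an
    exposed position, lowers the score by 1; a mismatch raises it by 1.
    """
    score = 0
    for residue, env in zip(sequence, environments):
        hydrophobic = residue in HYDROPHOBIC
        buried = env == "B"
        score += -1 if hydrophobic == buried else 1
    return score

def best_fold(sequence, library):
    """Score the sequence against every template and return the best one."""
    scores = {name: fit_score(sequence, envs) for name, envs in library.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical fold library: B = buried position, E = exposed position.
library = {
    "fold_1": "BBEEBBEE",
    "fold_2": "EBBEEBBE",
    "fold_3": "EEEEBBBB",
}
query = "LVKDAIRS"  # made-up query sequence

best, scores = best_fold(query, library)
print(best, scores)
```

Real methods replace the toy score with profiles, secondary structure predictions or pair potentials, and allow gaps in the sequence-to-template alignment.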
20Programs for fold recognition
Based on Secondary Structure homology
- TOPITS (Rost 1995)
- GenTHREADER (Jones 1999)
- SAM-T02 (UCSC HMM)
- 3D-PSSM http://www.sbg.bio.ic.ac.uk/3dpssm/
21Ab Initio Modeling
- Compute molecular structure from the laws of
physics and chemistry alone
- Theoretically the ideal solution
- Practically nearly impossible
- Why?
- Exceptionally complex calculations
- Biophysics understanding incomplete
22Ab Initio Methods
- Rosetta (Baker lab, Seattle)
- Undertaker (Karplus, UCSC)
23CASP - Critical Assessment of Structure Prediction
- Competition among different groups for resolving
the 3D structure of proteins that are about to be
solved experimentally.
- Current state:
- Ab initio - the worst, but greatly improved in
recent years.
- Modeling - performs very well when homologous
sequences with known structures exist.
- Fold recognition - performs well.
24What can you do? FOLDIT: Solve Puzzles for Science
- A computer game to fold proteins
- http://fold.it/portal/puzzles
25What's Next
- Predicting function from structure
26Structural Genomics: a large-scale structure
determination project designed to cover all
representative protein structures
ATP binding domain of protein MJ0577
Zarembinski et al., Proc. Natl. Acad. Sci. USA,
95:15189 (1998)
27As a result of the Structural Genomics initiative,
many structures of proteins with unknown
function will be solved
28Approaches for predicting function from structure
ConSurf - Mapping the evolutionary conservation
on the protein structure
http://consurf.tau.ac.il/
29Approaches for predicting function from structure
PFPlus - Identifying positive electrostatic
patches on the protein structure
http://pfp.technion.ac.il/
30A method to distinguish DNA-binding from
RNA-binding proteins
DNA binding interface
RNA binding interface
31RNA and DNA binding interfaces tend to have
different geometric features
DNA binding interface
RNA binding interface
32Applying Differential Geometry to characterize
DNA and RNA binding proteins
$k_1$ - minimal curvature
$k_2$ - maximal curvature
$H = (k_1 + k_2)/2$ - mean curvature
$K = k_1 k_2$ - Gaussian curvature
33Applying Differential Geometry to characterize
DNA and RNA binding proteins
Flat
Peak
Pit
Minimal Surface
Saddle ridge
Ridge
Valley
Saddle valley
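The eight labels above match the standard sign-based classification of a surface point by its mean curvature H and Gaussian curvature K, so a hedged sketch of that mapping is given here. It assumes H = (k1 + k2)/2 and K = k1*k2 as defined on the previous slide, and a sign convention in which negative H with positive K is a peak; the opposite normal orientation would swap peak with pit and ridge with valley. The tolerance `eps` and the function names are illustrative and are not taken from the cited work.

```python
def mean_and_gaussian(k1, k2):
    """Mean curvature H and Gaussian curvature K from principal curvatures."""
    return (k1 + k2) / 2.0, k1 * k2

def sign(value, eps):
    """Return -1, 0 or 1, treating values within eps of zero as zero."""
    if abs(value) < eps:
        return 0
    return 1 if value > 0 else -1

# (sign of H, sign of K) -> surface type, under the assumed sign convention.
SURFACE_TYPES = {
    (-1, 1): "peak",
    (1, 1): "pit",
    (-1, 0): "ridge",
    (1, 0): "valley",
    (0, 0): "flat",
    (-1, -1): "saddle ridge",
    (1, -1): "saddle valley",
    (0, -1): "minimal surface",
}

def surface_type(k1, k2, eps=1e-6):
    """Classify a surface point from its minimal (k1) and maximal (k2) curvature."""
    h, k = mean_and_gaussian(k1, k2)
    return SURFACE_TYPES.get((sign(h, eps), sign(k, eps)), "undefined")

for k1, k2 in [(-1, -1), (1, 1), (0, 0), (-1, 1), (-2, 1)]:
    print(k1, k2, surface_type(k1, k2))
```

Counting how often each type occurs over the points of a binding interface would give the kind of "frequency of points" profile shown on the following slides.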
34Applying Differential Geometry for DNA and RNA
function prediction
Frequency of points
35RNA binding surfaces are distinguished from DNA
binding surfaces based on Differential Geometric
features
76 RNA-binding
78 DNA binding
36Differential Geometry can correctly
determine whether a given binding domain binds
RNA or DNA
Frequency of points
RNA pattern
DNA pattern
Shazman et al., NAR 2011
37How can we view the protein structure?
- Download the coordinates of the structure from
the PDB: http://www.rcsb.org/pdb/
- Launch a 3D viewer program
- For example, we will use the program PyMOL
- The program can be downloaded freely from
the PyMOL homepage: http://pymol.org
- Load the coordinates into the viewer
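For readers curious about what a viewer reads from a coordinate file, the sketch below pulls the C-alpha atoms out of PDB-format text using the fixed column positions of ATOM records. The sample records are made up for illustration and are not taken from 1aqb; the function name is hypothetical, and real files contain many more record types that this sketch ignores.

```python
def parse_ca_atoms(pdb_text):
    """Return (residue name, chain, residue number, (x, y, z)) per C-alpha atom."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, HEADER, TER and other records
        if line[12:16].strip() != "CA":
            continue  # columns 13-16 hold the atom name
        res_name = line[17:20].strip()   # columns 18-20
        chain = line[21]                 # column 22
        res_num = int(line[22:26])       # columns 23-26
        xyz = (
            float(line[30:38]),          # x, columns 31-38
            float(line[38:46]),          # y, columns 39-46
            float(line[46:54]),          # z, columns 47-54
        )
        atoms.append((res_name, chain, res_num, xyz))
    return atoms

# Made-up records in PDB column layout (not real coordinates).
SAMPLE = "\n".join([
    "ATOM      1  N   GLU A   1      10.000  20.000  30.000",
    "ATOM      2  CA  GLU A   1      11.000  20.000  30.000",
    "ATOM      3  CA  ALA A   2      14.800  20.000  30.000",
    "HETATM    4  O   HOH A 101       0.000   0.000   0.000",
])

atoms = parse_ca_atoms(SAMPLE)
for atom in atoms:
    print(atom)
```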
38PyMOL example
- Launch PyMOL
- Open file 1aqb (PDB coordinate file)
- Display sequence
- Hide everything
- Show main chain / hide main chain
- Show cartoon
- Color by ss
- Color red
- Color green, resi 140
Help: http://pymol.org