Title: Computational Short Cuts to Protein Structure and Function: Fold Recognition Methods.
1Computational Short Cuts to Protein Structure and Function: Fold Recognition Methods
- Jarek Meller
- Biomedical Informatics
- Children's Hospital Research Foundation
2Outline
- Introduction
- Fold recognition: sequence similarity vs. threading
- Common models and algorithms
- Fold recognition servers and annotation strategies
- Discussion
3Introduction
- Protein machinery of life: from sequence to structure to function (from DNA to mRNA to protein sequence to protein structure to protein-protein/DNA/RNA/small-molecule interactions to phenotype)
- Deciphering protein structure: experiment vs. simulation (Computer-Aided Short Cuts, CASH)
- Fold recognition: nature as the best computational device
4Three lovely proteins: hemoglobin
- Four subunits carrying oxygen
- Sickle-cell anemia: an inherited disease
- The Glu6 -> Val6 mutation causes aggregation
5Three lovely proteins: gramicidin
- Transmembrane ion channel
- Bacteria killer: an antibiotic
6Three lovely proteins: ras p21
- Molecular switch based on GTP hydrolysis
- Cellular growth control and cancer
- Ras oncogene: single point mutations at positions Gly12 or Gln61
7Significance of Protein Folding Problem
A sequence (VLSEGEWQLVLV . . .) folds into a 3D structure to perform a function.
8Sequence Structure Function
- Same fold, different function
- Same function, different fold
- Homologous sequences
9Sequence Structure Function
- Continuous nature of folds, multiple functions
- SCOP: up to 7 folds per function and up to 15 functions per fold
- Divergent (common ancestor) vs. convergent (no common ancestor) evolution
- PDB: virtually all proteins with more than 30% seq. identity have similar structures; however, most of the similar structures share only up to 10% seq. identity!
- www.columbia.edu/rost/Papers/1997_evolution/paper.html (B. Rost)
- www.bioinfo.mbb.yale.edu/genome/foldfunc/ (H. Hegyi, M. Gerstein)
10Classifications of protein shapes and families
- SCOP (Structural Classification of Proteins, scop.berkeley.edu, Murzin et al.)
- 548 folds (major structural similarity in terms of secondary structures, e.g. globin-like, Rossmann fold); 1296 families (clear evolutionary relationship or homology, e.g. globins, Ras)
- CATH (Class, Architecture, Topology, Homologous Superfamily, www.biochem.ucl.ac.uk/bsm/cath/, Orengo et al.)
- 35 architectures (gross arrangement of secondary structures, e.g. non-bundle, sandwich); 580 topologies (connectivity of secondary structures, e.g. globin-like, Rossmann fold); 1846 families (clear homology, same function)
11Deciphering protein structure and function
- Experiment (X-ray, NMR): months
- Experiments can be lengthy and costly. Therefore computational methods are often used to focus and facilitate experimental research.
- Atomistic (physical principles based) simulations: weeks
- Homology based modeling: hours
- Sequence similarity based annotations: seconds
12Computational complexity: the price of accurate models
- Huge search problem: scaling with size in protein folding
- No. of conformations ~ 10^n
- Rugged energy landscape and local minima problem
Nature performs these computations efficiently, and one can use the solutions provided by nature as templates: from protein folding to protein recognition.
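To make the 10^n scaling concrete, here is a rough back-of-the-envelope sketch. It assumes that n is the number of residues and that each residue independently adopts one of 10 states; the evaluation rate of 10^12 conformations per second is an arbitrary illustrative assumption, not a figure from the slides.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365

def conformations(n_residues, states_per_residue=10):
    """Conformation count if every residue independently picks one of k states."""
    return states_per_residue ** n_residues

def exhaustive_search_years(n_residues, states_per_residue=10, rate_per_second=1e12):
    """Years needed to enumerate all conformations at an assumed evaluation rate."""
    seconds = conformations(n_residues, states_per_residue) / rate_per_second
    return seconds / SECONDS_PER_YEAR

if __name__ == "__main__":
    for n in (10, 50, 100):
        print(f"n = {n:3d}: {exhaustive_search_years(n):.3g} years")
```

Even for a small protein of 100 residues, exhaustive enumeration is hopeless under these assumptions, which is the motivation for reusing known structures as templates.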
13Assigning fold and function utilizing similarity to experimentally characterized proteins
- Sequence similarity: BLAST and others
- Beyond sequence similarity: matching sequences and shapes (threading)
14Sequence to structure matching (threading) may detect distantly related proteins due to conservation of structure.
In practice fold recognition methods are often mixtures of sequence matching and threading (D. Fischer and D. Eisenberg, Curr. Opinion in Struct. Biol. 1999, 9: 208).
15We need a scoring (energy) function to distinguish native structure from misfolded structures.
Ideally, each misfolded structure should have an energy higher than the native energy, i.e.
E_misfolded - E_native > 0
(Diagram: energy axis E, with the misfolded level above the native level.)
16Reduced Representations of Protein Structure
Each amino acid is represented by a point in 3D space. Simple contact model: two amino acids are in contact if their distance is smaller than a cutoff.
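The simple contact model can be sketched in a few lines. The 6.5 Angstrom cutoff, the exclusion of sequence neighbours, the two-letter H/P alphabet and the toy potential below are all illustrative assumptions; the slides do not specify them.

```python
import math
from itertools import combinations

def contacts(coords, cutoff=6.5, min_separation=2):
    """Index pairs (i, j), i < j, closer than the cutoff.

    Pairs fewer than min_separation apart along the chain are skipped,
    since chain neighbours are always close (an assumed convention).
    """
    pairs = []
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff:
            pairs.append((i, j))
    return pairs

def contact_energy(sequence, coords, potential, cutoff=6.5):
    """Sum a pairwise potential over all contacts; unknown pair types count as 0."""
    total = 0.0
    for i, j in contacts(coords, cutoff):
        pair = tuple(sorted((sequence[i], sequence[j])))
        total += potential.get(pair, 0.0)
    return total

# Toy example: four residues on the corners of a square, 3.8 Angstrom sides.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
toy_potential = {("H", "H"): -1.0}  # hypothetical: only H-H contacts are rewarded

if __name__ == "__main__":
    print(contacts(square))
    print(contact_energy("HPPH", square, toy_potential))
```

Threading a different sequence onto the same coordinates only changes the `sequence` argument, which is what makes such reduced models cheap to evaluate.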
18How to choose an energy function?
- Functional form
- contact potential?
- profile model?
- Accuracy vs. efficiency (R.H. Lathrop: the protein threading problem with contact potentials is NP-complete, Protein Eng. 7, 1994).
- Optimization of parameters
- Linear Programming!
- E_decoy - E_native > 0
- V.N. Maiorov and G.M. Crippen, JMB 227, 1992.
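Why linear programming fits here: for a contact potential the energy is linear in its parameters, E = sum over contact types of (count x weight), so each condition E_decoy - E_native > 0 becomes one linear inequality in the weights. The sketch below builds those inequalities for a toy H/P alphabet. All names and data are hypothetical, and the small perceptron-style loop is only a stand-in for a real LP solver; it is not the procedure used in the cited work.

```python
def count_contact_types(sequence, contact_pairs, types):
    """Count how many contacts of each residue-pair type a structure contains."""
    counts = [0] * len(types)
    for i, j in contact_pairs:
        pair = tuple(sorted((sequence[i], sequence[j])))
        counts[types.index(pair)] += 1
    return counts

def gap_constraints(native_counts, decoy_counts_list):
    """One row per decoy: a = n_decoy - n_native, so E_decoy - E_native = a . w."""
    return [[d - n for d, n in zip(decoy, native_counts)]
            for decoy in decoy_counts_list]

def dot(row, w):
    return sum(a * x for a, x in zip(row, w))

def satisfied(rows, w):
    """True if every decoy has a strictly higher energy than the native state."""
    return all(dot(row, w) > 0 for row in rows)

def perceptron_fit(rows, margin=1.0, max_iter=1000):
    """Stand-in for an LP solver: nudge w along the first violated constraint."""
    w = [0.0] * len(rows[0])
    for _ in range(max_iter):
        violated = next((r for r in rows if dot(r, w) < margin), None)
        if violated is None:
            return w
        w = [x + a for x, a in zip(w, violated)]
    return None  # no feasible weights found within max_iter

# Toy data: one native contact set and two decoys for the sequence "HPPH".
types = [("H", "H"), ("H", "P"), ("P", "P")]
native = count_contact_types("HPPH", [(0, 3)], types)
decoys = [count_contact_types("HPPH", [(0, 2)], types),
          count_contact_types("HPPH", [(1, 2)], types)]
rows = gap_constraints(native, decoys)
weights = perceptron_fit(rows)

if __name__ == "__main__":
    print(rows, weights)
```

In practice the number of decoys, and hence of inequalities, is very large, which is where a dedicated LP solver becomes necessary.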
21Methodological kit
- Dynamic programming: optimal string matching
- Neural networks: secondary structure predictions (PsiPRED, Jones DT, JMB 292: 195)
- Hidden Markov Models: family profiles, secondary and tertiary structure prediction (TMHMM by A. Krogh and co-workers, http://www.cbs.dtu.dk/krogh/refs.html)
- Monte Carlo: suboptimal solutions (Mirny LA, Shakhnovich EI, Protein Structure Prediction By Threading. Why It Works and Why It Does Not, JMB 283: 507)
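As an illustration of the first item in the kit, here is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch recurrence). The match, mismatch and gap scores are arbitrary assumptions; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Keeps only two rows of the DP table, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            deletion = prev[j] + gap       # a[i-1] aligned to a gap
            insertion = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(substitution, deletion, insertion))
        prev = curr
    return prev[-1]

if __name__ == "__main__":
    query = "VLSEGEWQLVLV"
    print(global_alignment_score(query, query))
    print(global_alignment_score(query, "VLSEGQLVLV"))
```

The same table-filling idea carries over to sequence-to-profile and sequence-to-structure alignment when the scoring of each position does not depend on the rest of the alignment.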
22Fold recognition servers
- PsiBLAST (Altschul SF et al., Nucl. Acids Res. 25: 3389)
- Live Bench evaluation (http://BioInfo.PL/LiveBench/1/)
- FFAS (L. Rychlewski, L. Jaroszewski, W. Li, A. Godzik (2000), Protein Science 9: 232): seq. profile against profile
- 3D-PSSM (Kelley LA, MacCallum RM, Sternberg MJE, JMB 299: 499): 1D-3D profile combined with secondary structures and solvation potential
- GenTHREADER (Jones DT, JMB 287: 797): seq. profile combined with pairwise interactions and solvation potential
- LOOPP: annotations of orphan sequences
- http://www.tc.cornell.edu/CBIO/loopp
23Annotation Strategies
- Use sequence methods first (with polypeptide chains if possible) and remember that profile methods (e.g. PsiBLAST, SAM) are much more sensitive than pairwise alignments! (Park et al., Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. JMB 284: 1201)
- Still nothing? Submit your sequence to transmembrane prediction servers (more than 90% reliability) and secondary structure prediction servers (70 to 80% reliability), e.g. TMHMM by A. Krogh et al., or PsiPRED (D.T. Jones, JMB 292: 195).
- Once you have a reasonably good feeling for the different domains in your beloved protein, submit alternative queries to fold recognition servers. Use all trustworthy servers and pay attention to their estimates of statistical significance.
- Re-evaluate: check consistency with expected sequence motifs, active sites, disulphide bridges etc., and validate predictions using all the knowledge about your protein! Use consensus, but without rejecting biologically interesting conclusions.