Computational Short Cuts to Protein Structure and Function: Fold Recognition Methods.


1
Computational Short Cuts to Protein Structure and Function: Fold Recognition Methods.

Jarek Meller
Biomedical Informatics
Children's Hospital Research Foundation

2
Outline

Introduction
Fold recognition: sequence similarity vs. threading
Common models and algorithms
Fold recognition servers and annotation strategies
Discussion

3
Introduction

Protein machinery of life: from sequence to structure to function (from DNA to mRNA to protein sequence to protein structure to protein-protein/DNA/RNA/small molecule interactions to phenotype)
Deciphering protein structure: experiment vs. simulation (Computer-Aided Short Cuts, CASH)
Fold recognition: nature as the best computational device

4
Three lovely proteins: hemoglobin

Four units carrying oxygen
Sickle-cell anemia: inherited disease
Glu6 -> Val6 mutation causes aggregation

5
Three lovely proteins: gramicidin

Transmembrane ion channel
Bacteria killer - antibiotic

6
Three lovely proteins: ras p21

Molecular switch based on GTP hydrolysis
Cellular growth control and cancer
Ras oncogene: single point mutations at positions Gly12 or Gln61

7
Significance of Protein Folding Problem

Sequence (VLSEGEWQLVLV . . .) folds into a 3D structure to perform a function.

8
Sequence -> Structure -> Function

Same fold, different function
Same function, different fold
Homologous sequences

9
Sequence -> Structure -> Function

Continuous nature of folds, multiple functions
SCOP: up to 7 folds per function and up to 15 functions per fold
Divergent (common ancestor) vs. convergent (no common ancestor) evolution
PDB: virtually all proteins with 30% seq. identity have similar structures; however, most of the similar structures share only up to 10% seq. identity!
www.columbia.edu/rost/Papers/1997_evolution/paper.html (B. Rost)
www.bioinfo.mbb.yale.edu/genome/foldfunc/ (H. Hegyi, M. Gerstein)

10
Classifications of protein shapes and families

SCOP (Structural Classification of Proteins, scop.berkeley.edu, Murzin et al.)
548 folds (major structural similarity in terms of secondary structures, e.g. globin-like, Rossmann fold)
1296 families (clear evolutionary relationship or homology, e.g. globins, Ras)
CATH (Class, Architecture, Topology, Homologous Superfamily, www.biochem.ucl.ac.uk/bsm/cath/, Orengo et al.)
35 architectures (gross arrangement of secondary structures, e.g. non-bundle, sandwich)
580 topologies (connectivity of secondary structures, e.g. globin-like, Rossmann fold)
1846 families (clear homology, same function)

11
Deciphering protein structure and function

Experiment (X-ray, NMR): months
Experiments can be lengthy and costly. Therefore, computational methods are often used to focus and facilitate experimental research.
Atomistic (physical principles based) simulations: weeks
Homology based modeling: hours
Sequence similarity based annotations: seconds

12
Computational complexity: price of accurate models

Huge search problem: scaling with size in protein folding
No. of conformations ~ 10^n
Rugged energy landscape and local minima problem

Nature performs these computations efficiently, and one can use solutions provided by nature as templates: from protein folding to protein recognition.

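
The 10^n scaling above can be made concrete with a few lines of arithmetic. This is only a sketch: it assumes n is the number of residues, that each residue independently adopts one of 10 states, and that a hypothetical machine evaluates 10^12 conformations per second. None of these numbers come from the slides beyond the 10^n form itself.

```python
def count_conformations(n_residues, states_per_residue=10):
    """Size of the search space if every residue picks one of a few states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues, rate_per_second=1e12, states_per_residue=10):
    """Years needed to enumerate every conformation at an assumed evaluation rate."""
    seconds = count_conformations(n_residues, states_per_residue) / rate_per_second
    return seconds / (3600 * 24 * 365)


# Assumed chain lengths, chosen only to show how quickly enumeration becomes hopeless.
for n in (10, 50, 100):
    print(n, "residues:", f"{exhaustive_search_years(n):.3g} years")
```

Even for short chains the enumeration time grows far beyond anything practical, which is the motivation for reusing known structures as templates.
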
13
Assigning fold and function utilizing similarity to experimentally characterized proteins

Sequence similarity: BLAST and others
Beyond sequence similarity: matching sequences and shapes (threading)

14
Sequence to structure matching (threading) may detect distantly related proteins due to conservation of structure.
In practice, fold recognition methods are often mixtures of sequence matching and threading. (D. Fischer and D. Eisenberg, Curr. Opinion in Struct. Biol. 1999, 9: 208)

15
We need a scoring (energy) function to distinguish the native structure from misfolded structures.
Ideally, each misfolded structure should have an energy higher than the native energy, i.e.
E_misfolded - E_native > 0
[Figure: energy axis E with the levels labeled "misfolded" and "native"]

16
Reduced Representations of Protein Structure
Each amino acid is represented by a point in 3D space.
Simple contact model: two amino acids are in contact if their distance is smaller than a cutoff.
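
A minimal sketch of the contact model described above, assuming one point per residue. The 6.5 Å cutoff, the minimum sequence separation of 3, the two-letter H/P alphabet, and the pair energies are all illustrative choices, not values from the presentation.

```python
import math


def contacts(coords, cutoff=6.5, min_separation=3):
    """Return residue pairs (i, j) whose points are closer than the cutoff.

    Pairs closer than min_separation along the chain are skipped, since
    chain neighbours are always near each other.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def contact_energy(sequence, coords, pair_energy, cutoff=6.5):
    """Sum the pair energies over all contacts (a simple contact potential)."""
    total = 0.0
    for i, j in contacts(coords, cutoff):
        key = tuple(sorted((sequence[i], sequence[j])))
        total += pair_energy.get(key, 0.0)
    return total


# Toy data: a six-residue chain, once as a hairpin and once fully extended.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]
toy_energy = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}

print(contact_energy("HHPPHH", hairpin, toy_energy))
print(contact_energy("HHPPHH", extended, toy_energy))
```

With these toy parameters the compact hairpin scores lower than the extended chain, which is the behaviour a scoring function should show for a native-like shape.
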

17
(No Transcript)
18
How to choose an energy function?

Functional form: contact potential? profile model?
Accuracy vs. efficiency (R.H. Lathrop: protein threading problem with contact potentials is NP-complete, Protein Eng. 7, 1994).
Optimization of parameters: Linear Programming!
E_decoy - E_native > 0
V.N. Maiorov and G.M. Crippen, JMB 227, 1992.
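
When the energy is linear in its parameters, each requirement E_decoy - E_native > 0 becomes a linear inequality in those parameters. The sketch below is not a linear programming solver: it swaps in a simple perceptron-style update that looks for any parameter vector satisfying the same inequalities with a margin. The contact types (HH, HP, PP), the contact counts, and the function name are made up for illustration.

```python
def fit_contact_parameters(differences, margin=1.0, max_epochs=1000):
    """Search for parameters e with sum(d_k * e_k) >= margin for every row d.

    Each row d holds (decoy contact counts - native contact counts), so the
    inequality is E_decoy - E_native >= margin for a linear contact energy.
    Returns None if no solution is found within max_epochs.
    """
    params = [0.0] * len(differences[0])
    for _ in range(max_epochs):
        violated = False
        for d in differences:
            gap = sum(dk * ek for dk, ek in zip(d, params))
            if gap < margin:
                # Move the parameters toward satisfying this inequality.
                params = [ek + dk for ek, dk in zip(params, d)]
                violated = True
        if not violated:
            return params
    return None


# Hypothetical contact counts per type (HH, HP, PP).
native_counts = (4, 1, 0)
decoy_counts = [(1, 3, 1), (2, 2, 1), (0, 2, 3)]
differences = [tuple(dc - nc for dc, nc in zip(decoy, native_counts))
               for decoy in decoy_counts]

fitted = fit_contact_parameters(differences)
print(fitted)
```

A real linear programming formulation would additionally let one optimize an objective, for example the size of the energy gap, rather than stopping at the first feasible point.
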

19
(No Transcript)
20
(No Transcript)
21
Methodological kit

Dynamic programming: optimal string matching
Neural networks: secondary structure predictions (PsiPRED, Jones DT, JMB 292: 195)
Hidden Markov Models: family profiles, secondary and tertiary structure prediction (TMHMM by A. Krogh and co-workers, http://www.cbs.dtu.dk/krogh/refs.html)
Monte Carlo: suboptimal solutions (Mirny LA, Shakhnovich EI, Protein Structure Prediction By Threading. Why It Works and Why It Does Not, JMB 283: 507)
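
As an illustration of the first item in the kit, here is a minimal sketch of dynamic programming for optimal string matching in the style of Needleman-Wunsch global alignment. The match, mismatch, and gap scores are arbitrary assumptions; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


print(global_alignment_score("VLSEGEWQLVLV", "VLSEGEWQLVLV"))
print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept, so memory grows with the length of one sequence while time still grows with the product of both lengths.
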

22
Fold recognition servers

PsiBLAST (Altschul SF et al., Nucl. Acids Res. 25: 3389)
LiveBench evaluation (http://BioInfo.PL/LiveBench/1/)
FFAS (L. Rychlewski, L. Jaroszewski, W. Li, A. Godzik (2000), Protein Science 9: 232): seq. profile against profile
3D-PSSM (Kelley LA, MacCallum RM, Sternberg MJE, JMB 299: 499): 1D-3D profile combined with secondary structures and solvation potential
GenTHREADER (Jones DT, JMB 287: 797): seq. profile combined with pairwise interactions and solvation potential
LOOPP: annotations of orphan sequences, http://www.tc.cornell.edu/CBIO/loopp

23
Annotation Strategies

Use sequence methods first (with polypeptide chains if possible) and remember that profile methods (e.g. PsiBLAST, SAM) are much more sensitive than pairwise alignments! (Park et al., Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. JMB 284: 1201)
Still nothing? Submit your sequence to transmembrane prediction servers (more than 90% reliability) and secondary structure prediction servers (70 to 80% reliability), e.g. TMHMM by A. Krogh et al.; PsiPRED, D.T. Jones, JMB 292: 195.
Having a reasonably good feeling about the different domains in your beloved protein, submit alternative queries to fold recognition servers. Use all trustworthy servers and pay attention to their estimates of statistical significance.
Re-evaluate: check consistency with expected sequence motifs, active sites, disulphide bridges etc., and validate predictions using all the knowledge about your protein! Use consensus, but without rejecting biologically interesting conclusions.
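
The consensus step of this strategy could be sketched as follows. Everything here is hypothetical: the hit list, the significance flags, and the two-server threshold are invented for illustration, and real servers report their own scores and formats.

```python
from collections import defaultdict


def summarize_hits(hits, min_servers=2):
    """Split significant fold assignments into consensus and single-server hits.

    hits is a list of (server, fold, is_significant) tuples, where
    is_significant reflects each server's own significance estimate.
    Hits below the threshold are kept for manual review, not discarded.
    """
    supporters = defaultdict(set)
    for server, fold, is_significant in hits:
        if is_significant:
            supporters[fold].add(server)
    consensus = sorted(f for f, s in supporters.items() if len(s) >= min_servers)
    review = sorted(f for f, s in supporters.items() if len(s) < min_servers)
    return consensus, review


# Invented example results for one query sequence.
example_hits = [
    ("3D-PSSM", "globin-like", True),
    ("GenTHREADER", "globin-like", True),
    ("FFAS", "Rossmann fold", True),
    ("LOOPP", "globin-like", False),
]

consensus, review = summarize_hits(example_hits)
print("consensus:", consensus)
print("check by hand:", review)
```

Returning the single-server hits separately reflects the advice above: consensus guides the choice, but a lone hit that fits known motifs or active sites may still be the interesting one.
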