Evolution teaches to predict protein structure and function

Sven Mika. Chien Peter Chen. Burkhard Rost. http://cubic.bioc.columbia.edu ... Hepan Tan. Columbia. Trevor Siggers. Columbia. Burkhard Rost (Columbia New York) ...

Transcript and Presenter's Notes

1
Evolution teaches to predict protein structure
and function
  • Burkhard Rost
  • CUBIC Columbia University
  • rost@columbia.edu
  • http://www.columbia.edu/rost
  • http://cubic.bioc.columbia.edu/

2
Evolution teaches prediction
  • Is Bioinformatics up to the data deluge?
  • Sequence comparison: do we know what we do?
  • conservation of structure and function
  • Structure prediction: where are we today?
  • How to learn from the evolutionary odyssey?
  • secondary structure
  • transmembrane proteins
  • solvent accessibility
  • Are 1D predictions useful?
  • sub-cellular localisation
  • whole genomes
  • 3D structure threading
  • floppy regions

3
http://cubic.bioc.columbia.edu/
  • Volker Eyrich
  • Rajesh Nair
  • Jinfeng Liu
  • Dariusz Przybylski
  • Yanay Ofran
  • Henry Bigelow
  • Kazimierz Wrzeszczynski
  • Sven Mika
  • Chien Peter Chen
  • Burkhard Rost
  • Miguel Andrade, EMBL
  • Sean O'Donoghue, LION
  • Andrej Sali, Marc Marti-Renom, Rockefeller
  • Alfonso Valencia, Florencio Pazos, Madrid
  • Michal Linial, Jerusalem
  • Claus Andersen, Copenhagen
  • Bastian Bruning, Nijmegen
  • Hepan Tan, Columbia
  • Trevor Siggers, Columbia
  • http://cubic.bioc.columbia.edu/

4
CUBIC: http://cubic.bioc.columbia.edu
Dariusz Przybylski
Trevor Siggers
Volker Eyrich
Murat Cokol
Jinfeng Liu
Hepan Tan
5
The Data Deluge
Conclusion: Bioinformatics will have a hell of a problem
6
Data Deluge: what do we want?
7
Data Deluge: numbers
50 1.200.000 500.000 2000 17.000 800 35.000
8
Data Deluge: what CAN we do?
9
Data Deluge: what CAN we do?
Not much yet
10
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Sequence comparison: do we know what we do?
  • conservation of structure and function
  • Structure prediction: where are we today?
  • How to learn from the evolutionary odyssey?
  • secondary structure
  • transmembrane proteins
  • solvent accessibility
  • Are 1D predictions useful?
  • sub-cellular localisation
  • whole genomes
  • 3D structure threading
  • floppy regions

11
Dynamic programming: optimal alignment
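A minimal sketch of the idea named on this slide, assuming the classic Needleman-Wunsch recurrence with a linear gap penalty. The scoring values (match, mismatch, gap) are illustrative assumptions and are not taken from the talk; real protein alignments would normally use a substitution matrix and affine gap costs.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (Needleman-Wunsch).

    Only two rows of the dynamic-programming matrix are kept in memory,
    because each cell depends on its left, upper and upper-left neighbours.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]
```

The score is optimal for the chosen scoring scheme only; changing the gap penalty can change which alignment wins.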
12
BLAST: fast matching of single words
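A rough illustration of word matching, assuming exact matches of fixed-length words. The function names and the word length k=3 are assumptions made for this sketch. The actual BLAST program also scores similar (neighbouring) words and extends each hit into a longer alignment, which is left out here.

```python
from collections import defaultdict


def index_words(sequence, k=3):
    """Map every word of length k in sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where identical k-words occur."""
    index = index_words(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits
```

Indexing one sequence once and looking up words from the other is what makes this step fast compared with filling a full dynamic-programming matrix.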
13
Profile-based comparison
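A small sketch of what a profile-based comparison can look like, assuming a gap-free scoring of one sequence against per-column residue frequencies. The pseudocount, the uniform background of 1/20 and all function names are assumptions for illustration; this is not the scoring used by any particular profile program.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Per-column residue frequencies from equal-length aligned sequences.

    Gap characters and other non-standard symbols are ignored when counting.
    """
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total
                        for aa in AMINO_ACIDS})
    return profile


def score_against_profile(sequence, profile):
    """Sum of log-odds scores (profile frequency vs. uniform background).

    The sequence must contain only the 20 standard residue letters.
    """
    background = 1.0 / len(AMINO_ACIDS)
    return sum(math.log(column[aa] / background)
               for aa, column in zip(sequence, profile))
```

A sequence that agrees with the conserved columns of the family scores above zero, an unrelated one below zero, which is the sense in which a profile carries more information than a single sequence.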
14
Zones
15
Sequence -> Structure
  • Sequence folds into unique structure: S -> T

16
Sequence -> Structure
  • Sequence folds into unique structure: S -> T
  • Similar sequences fold into similar structures: S S -> T

17
Sequence -> Structure
  • Sequence folds into unique structure: S -> T
  • Similar sequences fold into similar structures: S S -> T
  • Most sequences don't fold at all: S -> no T

18
Twilight zone: false positives explode
B Rost 1999 Prot. Engin. 12, 85-94
19
Significant sequence identity
B Rost 1999 Prot. Engin. 12, 85-94
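A hedged sketch of a fixed identity cutoff. The summary slides of this talk mention "30 over 100 residues", read here as 30% pairwise identity over at least 100 aligned residues; that reading, the function names and the default parameters are assumptions. The cited paper uses a length-dependent threshold curve, which this simplification does not reproduce.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned (non-gap) positions.

    Returns (identity_in_percent, number_of_aligned_positions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs), len(pairs)


def is_significant(aligned_a, aligned_b, min_identity=30.0, min_length=100):
    """True if the alignment is long enough and identical enough."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return length >= min_length and identity >= min_identity
```

Short alignments fail this test even at 100% identity, which is the point of tying the identity level to the alignment length.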
20
Evolution did it !
B Rost 1999 Prot. Engin. 12, 85-94
21
Similar sequence -> similar structure?
B Rost 1999 Prot. Engin. 12, 85-94
22
Detecting true hits in Twilight zone
B Rost 1999 Prot. Engin. 12, 85-94
23
Finding similar structures in Twilight zone
B Rost 1999 Prot. Engin. 12, 85-94
24
Secure thresholds for BLAST
B Rost 1999 Prot. Engin. 12, 85-94
25
Accuracy vs. coverage
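One way to compute the two quantities on this slide for a ranked hit list, sketched under assumed definitions: accuracy is the fraction of reported hits that are true, and coverage is the fraction of all true hits that are reported. These definitions and the function name are assumptions, since the slide itself only gives the title.

```python
def accuracy_and_coverage(hits, threshold):
    """hits: list of (score, is_true) pairs; a hit is reported if score >= threshold.

    Returns (accuracy, coverage) as fractions between 0 and 1.
    """
    total_true = sum(1 for _, is_true in hits if is_true)
    reported = [is_true for score, is_true in hits if score >= threshold]
    true_reported = sum(1 for is_true in reported if is_true)
    accuracy = true_reported / len(reported) if reported else 0.0
    coverage = true_reported / total_true if total_true else 0.0
    return accuracy, coverage
```

Lowering the threshold typically raises coverage and lowers accuracy, so sweeping the threshold traces out the trade-off curve.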
26
BLAST is not enough ...
B Rost 1999 Prot. Engin. 12, 85-94
27
Sequence Space Hopping
B Rost 1999 Prot. Engin. 12, 85-94
28
Success through sequence space hopping
B Rost 1999 Prot. Engin. 12, 85-94
29
Zones
30
Profile-based database search
B Rost 2001 Structural Bioinformatics, in press
31
Profile-based database search
32
Profile-based database search
33
Profile-based database search
34
Profile-based database search
35
Profile-based database search
36
Zones
37
Hypothetical distribution of similar structures
38
FAKE DATA
39
Midnight zone: real - random
AS Yang and B Honig 2000 J. Mol. Biol. 301, 679-689
B Rost 1997 Folding & Design 2, S19-S24
40
Evolution into the Midnight zone
B Rost and S O'Donoghue 1998 EMBL preprint
41
Protein structures evolved at random - almost
  • average < 10%
  • -> most pairs have random identity levels
  • 3 - 4 anchor residues
  • 4 billion years of evolution reached equilibrium
  • rate of creating new structures slower than drift towards mean
  • averages for convergent and divergent evolution similar
  • convergent evolution may have been a major event

42
Structure space
B Rost 1998 Structure 6, 259-263
43
Gold-mine out of reach!
Percentage of pairs
44
Conservation of function
Devos & Valencia 2000, Proteins 41, pp. 98
45
Conservation of EC number
46
Conservation of EC number 2
47
Conservation of EC number: BLAST
48
Conservation in detail
49
Accuracy vs. coverage: EC number
50
Conservation of EC numbers
51
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Structure prediction: where are we today?
  • How to learn from the evolutionary odyssey?
  • secondary structure
  • transmembrane proteins
  • solvent accessibility
  • Are 1D predictions useful?
  • sub-cellular localisation
  • whole genomes
  • 3D structure threading
  • floppy regions

52
Notation: protein structure 1D, 2D, 3D
55
Goal of structure prediction
Epstein & Anfinsen, 1961: sequence uniquely determines structure
INPUT: sequence
OUTPUT
56
Protein structure prediction in reality
58
Homology modelling/comparative modelling
  • assumption: H and U have homologous 3D structures
  • strategy: modelling of U based on H

59
Protein structure prediction in reality
60
Protein structure prediction in reality
Genome view
SWISS-PROT view
61
Structure prediction for protein universe
62
Improving prediction by waiting it out
1999
1995
1991
63
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • How to learn from the evolutionary odyssey?
  • secondary structure
  • transmembrane proteins
  • solvent accessibility
  • Are 1D predictions useful?
  • sub-cellular localisation
  • whole genomes
  • 3D structure threading
  • floppy regions

64
Evolution did it !
B Rost 1999 Prot. Engin. 12, 85-94
68
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • Evolutionary odyssey applied
  • secondary structure: 15 -> 76 10
  • transmembrane proteins
  • solvent accessibility
  • Are 1D predictions useful?
  • sub-cellular localisation
  • whole genomes
  • 3D structure threading
  • floppy regions

69
Membrane prediction
70
HTM prediction: waiting for database growth ...
1999
1996
1993
71
Topology for membrane helical proteins
72
PHDsec success on Poly-Valine
74
Refine by dynamic programming on NN energy
75
PHDhtm refine topology prediction
76
PHDhtm on Poly-Valine
77
Example IS representative
78
To be or not to be (HTM)
79
False positives: globular proteins
80
Details PHDsec: wrong alignment
  • single sequences: accuracy clearly lower
  • sufficient information in multiple alignment
  • many sequences
  • diversity
  • wrong alignment -> wrong prediction

ID          IDE   WSIM  IFIR  ILAS  JFIR  JLAS  LALI  NGAP  LGAP  LSEQ
ftsh_ecoli  1.00  1.00     1   644     1   644   644     0     0   644
ftsh_haein  0.76  0.84   256   635     1   380   380     0     0   381
ftsh_bacsu  0.50  0.62     3   630     6   637   623     6    14   637
ftsh_porpu  0.48  0.59     5   604     9   623   598     5    19   628
ftsh_lacla  0.46  0.57     1   638    12   695   635     7    52   695
ftsh_odosi  0.45  0.56     2   611     5   644   609     5    32   644
81
Details PHDhtm: wrong for safe alignment
         ....,....1....,....2....,....
AA       MAKNLILWLVIAVVLMSVFQSFGPSESNG
OBS htm  HHHHHHHHHHHHHHHHHHHH
PHD htm
Rel htm  99999999999888889999999999999
82
Details PHDhtm: correct for accurate alignment
         ....,....1....,....2....,....
AA       MAKNLILWLVIAVVLMSVFQSFGPSESNG
OBS htm  HHHHHHHHHHHHHHHHHHHH
PHD htm  HHHHHHHHHHH
Rel htm  88877651000000000001357899999
PHDRhtm  HHHHHHHHHHHHHHHHHH
PHDThtm  iiiiTTTTTTTTTTTTTTTTTTooooooo
83
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • Evolutionary odyssey applied
  • secondary structure: 15 -> 76 10
  • transmembrane proteins: 10 -> 65 topo ok
  • solvent accessibility
  • Are 1D predictions useful?
  • sub-cellular localisation
  • whole genomes
  • 3D structure threading
  • floppy regions

84
Defining residue solvent accessibility
86
Evolution for accessibility prediction
  • Detailed prediction problematic
  • Significant gain by evolutionary information: in/out with > 75% accuracy!

87
PHDacc: the un-g(l)ory details
  • accuracy > 75% (two states: buried, exposed)
  • distribution with 10
  • stronger predictions more accurate
  • WARNING: reliability index almost factor 2 too large for single sequences
  • accuracy below average for intermediate state
  • VERY dependent on alignment accuracy

88
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • Evolutionary odyssey applied
  • secondary structure: 15 -> 76 10
  • transmembrane proteins: 10 -> 65 topo ok
  • solvent accessibility: 5 -> 75
  • Are 1D predictions useful?
  • sub-cellular localisation
  • whole genomes
  • 3D structure threading
  • floppy regions

89
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • Evolutionary odyssey applied
  • secondary structure: 15 -> 76 10
  • transmembrane proteins: 10 -> 65 topo ok
  • solvent accessibility: 5 -> 75
  • Are 1D predictions useful? Of course to experts
  • sub-cellular localisation
  • whole genomes
  • 3D structure threading
  • floppy regions

95
Shuttle into the nucleus
Cytoplasm
Nucleus
96
How many NLS motifs in databases?
  • ONE in PROSITE: bi-partite motif

Coverage
97
Experimental NLS: positive charges
98
Experimental NLS: more complicated
99
In silico mutagenesis
100
Increasing accuracy and coverage
Coverage
101
Increasing accuracy and coverage
Coverage
102
Increasing accuracy and coverage
Coverage
103
Increasing accuracy and coverage
Coverage
104
Increasing accuracy and coverage
Coverage
105
Nuclear protein in proteomes
106
Un-annotated nuclear proteins with NLS
  • ATAXIN-1: GERGHGGG
  • Breast Cancer type2 (Brc2): RIKKKQR
  • Fibroblast Growth factor (fgf): KKRRRRR
  • Brg1: ERKRRQ
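As a simple illustration of how such motifs might be located in a sequence, the sketch below scans for the four motif strings listed on this slide. Exact string matching, the dictionary name and the function name are assumptions for illustration; the slide does not say how the motifs were actually matched.

```python
# Motif strings copied from the slide; the short keys are labels only.
NLS_MOTIFS = {
    "ATAXIN-1": "GERGHGGG",
    "Brc2": "RIKKKQR",
    "fgf": "KKRRRRR",
    "Brg1": "ERKRRQ",
}


def find_motifs(sequence, motifs=NLS_MOTIFS):
    """Return (label, start) for every exact motif occurrence, sorted by start.

    Start positions are 0-based; overlapping occurrences are reported too.
    """
    hits = []
    for label, motif in motifs.items():
        start = sequence.find(motif)
        while start != -1:
            hits.append((label, start))
            start = sequence.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])
```

For example, calling `find_motifs` on a made-up test sequence such as "MSTKKRRRRRAAERKRRQ" reports the fgf motif and the Brg1 motif with their positions.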

107
Using NLS to bind DNA
108
DNA-binding predictions in proteomes
109
Rotation @ CUBIC.bioc.columbia.edu
  • want: all cell-cycle proteins
  • search in SWISS-PROT, PROSITE
  • search literature
  • build expert set of known

110
Significant motifs
111
Rotation @ CUBIC.bioc.columbia.edu
  • want: all cell-cycle proteins
  • search in SWISS-PROT, PROSITE
  • search literature
  • build expert set of known
  • choose unique subset

112
Finding unique subsets of proteins
113
Similar sequence -> similar structure?
B Rost 1999 Prot. Engin. 12, 85-94
114
Rotation @ CUBIC.bioc.columbia.edu
  • want: all cell-cycle proteins
  • search in SWISS-PROT, PROSITE
  • search literature
  • build expert set of known
  • choose unique subset
  • find motifs. Sorry, time ran out here!

115
Retention signals in ER and Golgi
116
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • Evolutionary odyssey applied
  • secondary structure: 15 -> 76 10
  • transmembrane proteins: 10 -> 65 topo ok
  • solvent accessibility: 5 -> 75
  • High-throughput success of predictions
  • localisation: accessibility useful, but not enough!
  • whole genomes
  • 3D structure threading
  • floppy regions

118
Family size
[Figure: cumulative percentage of proteins vs. number of proteins in family; series labels: Prokaryotes, Archaea, Aeropyrum pernix K1, Eukaryotes]
119
Structure prediction for protein universe
120
Do we aim at getting one structure per fold?
  • Structural proteomics: hunt for new folds? Tough task for theory! -> Practice: shrink complexes 14747 technicians!
  • Can we avoid non-globular proteins?
  • Can we prioritise aspects of function?

121
Similar amino acid composition
122
Inventory of life: membrane proteins
Eukaryotes
Prokaryotes
Archaea
123
Number of membrane helices -> complexity?
124
Membrane proteins: kingdoms invented different tricks
125
The membrane LEGO
126
Length of globular regions in membrane proteins
127
Inventory of life: coiled-coil proteins
128
Coiled-coil proteins: details
129
Inventory of life: compartments
130
Protein structure universe
131
Distribution of protein length
132
Bottleneck 5: money ...
  • Goal: 500 in 5 years
  • money: total of 25 M in 5 years (50,000,000,000 Lire)

133
What will we get?
  • many new structures
  • the machinery for structural genomics
  • some weird structures ...

134
Recipe to determine targets
  • Is it a known structure?
  • Is it similar to a known structure?
  • Is it a membrane protein?
  • Does it look like a known fold?
  • Does it look like a globular protein?
  • Is it a big family?
  • Is it short (NMR)? Does it contain Met (MAD)?

135
Alternative recipe to determine targets
  • Do we have a crystal?
  • Is it a known structure?
  • Is it similar to a known structure?

136
Reality check: the invaluable contribution of bioinformatics to target selection
137
Target selection
138
Priority classes
  • Experimental feasibility
  • Biophysical properties
  • length
  • presence of Methionine
  • Bioinformatics criteria
  • similarity to known structure
  • family size
  • functional annotation
  • Functional genomics

139
Target selection machinery
140
Conclusions Structural Genomics
  • we get
    • most major functional elements
    • most structural scaffolds
    • evolutionary links
    • structure-based comparison
    • high-throughput techniques
  • we won't get
    • complexes
    • interaction between them
    • particular structures
  • when?
    • 70% of the human genome by 2010-2015
    • remainder: HTMs?

141
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • Evolutionary odyssey applied
  • secondary structure: 15 -> 76 10
  • transmembrane proteins: 10 -> 65 topo ok
  • solvent accessibility: 5 -> 75
  • High-throughput success of predictions
  • localisation: accessibility useful, but not enough!
  • whole genomes: kingdoms differ in some respects!
  • 3D structure threading
  • floppy regions

142
Midnight zone STRONGLY populated
143
What we are threading for
144
Goals of fold recognition, threading, remote homology modelling
  • Recognising similar fold(s) (entire proteins)
  • Detecting remote homologies for fragments (part of protein)
  • Align target and fold
  • Remote homology modelling (prediction in 3D)

145
Two paths to fold recognition
146
TOPITS
147
Prediction-based threading
148
Example of remote sequence identity
149
30% correct first, better if stronger
150
Other threading methods
  • TOPITS is not the best!
  • CASP: PredictionCenter.llnl.gov/content.html
  • CAFASP: www.cs.bgu.ac.il/dfischer/CAFASP2/
  • EVA: cubic.bioc.columbia.edu/eva/
  • CUBIC links: cubic.bioc.columbia.edu/doc/links_index.html

151
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • Evolutionary odyssey applied
  • secondary structure: 15 -> 76 10
  • transmembrane proteins: 10 -> 65 topo ok
  • solvent accessibility: 5 -> 75
  • High-throughput success of predictions
  • localisation: accessibility useful, but not enough!
  • whole genomes: kingdoms differ in some respects!
  • threading: better than sequence alignment!
  • floppy regions (NORS: no regular secondary structure)

152
Long floppy regions
  • less than 5% helix or strand over > 70 residues
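The criterion on this slide can be sketched as a sliding-window scan over a per-residue secondary structure string. The reading of the bullet as "below 5% helix or strand within a window of 70 residues", the state letters H and E, and the function name are assumptions made for this illustration; it is not the published NORS implementation.

```python
def find_floppy_regions(sec_struct, window=70, max_regular_fraction=0.05):
    """Find stretches with almost no helix (H) or strand (E).

    sec_struct: one character per residue; any character other than H or E
    counts as non-regular. Returns merged half-open (start, end) intervals
    covering every window whose H/E fraction is below max_regular_fraction.
    """
    regular = [1 if state in "HE" else 0 for state in sec_struct]
    regions = []
    if len(regular) < window:
        return regions
    count = sum(regular[:window])  # H/E residues in the first window
    for start in range(len(regular) - window + 1):
        if start > 0:
            # Slide by one residue: add the new right end, drop the old left end.
            count += regular[start + window - 1] - regular[start - 1]
        if count / window < max_regular_fraction:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # merge with previous region
            else:
                regions.append((start, end))
    return regions
```

Because qualifying windows are merged, a long loop flanked by helices is reported as one region that may include a few flanking helix or strand residues.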

153
Floppy loops between domains
Formate Dehydrogenase H (1aa6.pdb)
phiX174 virion (1al0F.pdb)
DNA-containing capsid of CPV (4dpv.pdb)
Isoamylase (1bf2.pdb)
154
Floppy ends
Pyruvate:ferredoxin oxidoreductase (1b0pA.pdb)
Capsid protein of CPV (1b35C.pdb)
Hexon from adenovirus type 2 (1dhx.pdb)
Myeloperoxidase (1mhlA.pdb)
Aspartate aminotransferase (2aat.pdb)
Prothrombin fragment 2 (2hppP.pdb)
SH3 domain of PLC-gamma (1hsq.pdb)
Hydroxylase component of MMOH (1mtyB.pdb)
155
Floppy-wrap
SH3 and adjacent ligand site (1awj.pdb)
Erythrocyte catalase (7cat.pdb)
GmDNV capsid protein (1dnx.pdb)
Cellulase (1tf4A.pdb)
Phosphoglycerate mutase (3pgm.pdb)
Carboxypeptidase T (1obr.pdb)
156
Weirdoes
Extracellular domain of T beta RI (1tbi.pdb)
HIVZ2 Tat protein (1tac.pdb)
Plasminogen Kringle 4 (1krn.pdb)
Gene 5 DNA binding protein (2gn5.pdb)
Recombinant Kringle 5 domain (5hpg.pdb)
Aspartate transcarbamoylase (9atc.pdb)
157
Weirdoes are not alone !
158
10% of biomass weird!
159
Length distribution of floppy regions
160
Weirdoes functional !
161
Yeast-2-hybrid interactions
162
Evolution teaches prediction
  • Bioinformatics up to the data deluge? NO, but work in progress!
  • Know what we do? Some do, 30% over 100 residues!
  • Where are we today? NO 3D prediction from sequence!
  • Evolutionary odyssey applied
  • secondary structure: 15 -> 76 10
  • transmembrane proteins: 10 -> 65 topo ok
  • solvent accessibility: 5 -> 75
  • High-throughput success of predictions
  • localisation: accessibility useful, but not enough!
  • whole genomes: kingdoms differ in some respects!
  • threading: better than sequence alignment!
  • NORS: weirdoes not alone AND functional!

163
Conclusions
  • no prediction of 3D structure
  • no prediction of function
  • but quantum leap through using frozen knowledge from evolution and protein structures
  • the data deluge floods bioinformatics
  • the unsolved urgent problems are legion
  • but it is still time to get it done: running BLAST is NOT all there is; the key is intelligent use of biological knowledge ...

164
Thanksgiving
  • Volker Eyrich Schrödinger, New York
  • Chris Sander Whitehead, Boston
  • Reinhard Schneider LION, Boston
  • Alfonso Valencia CNB, Madrid
  • Miguel Andrade EMBL, Heidelberg
  • Séan O'Donoghue LION, Heidelberg
  • Amos Bairoch SIB, Genève
  • Michael Braxenthaler La Roche, New York
  • Søren Brunak CBS, København
  • Rita Casadio Univ. Bologna
  • Antoine De Daruvar LION, Bordeaux
  • David Eisenberg UCLA, Los Angeles
  • Piero Fariselli Univ. Bologna
  • Barry Honig Columbia, New York
  • Tim Hubbard Sanger, Hinxton
  • Michael Levitt Univ. Stanford
  • Marc Marti-Renom Rockefeller, New York
  • Andrej Sali Rockefeller, New York
  • Michael Scharf Take 5, Heidelberg

.. in general
localisation
165
Availability of methods
  • email: PredictProtein@columbia.edu
  • subject: HELP
  • file
  • WWW: http://cubic.bioc.columbia.edu/predictprotein/
  • META: http://cubic.bioc.columbia.edu/predictprotein/submit_meta.html
  • EVA: http://cubic.bioc.columbia.edu/eva
  • CUBIC: http://cubic.bioc.columbia.edu/

Email address options protein name SEQWENCE