Title: Bioinformatics of membrane proteins
1. Bioinformatics of membrane proteins
Gunnar von Heijne, Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Stockholm University
2. I. The physical view
3. A simulated lipid bilayer (Jakobsson, TiBS 22:339)
Physical properties
4. Physical properties of the lipid bilayer (White et al., JBC 276:32395)
interactions
5. Molecular interactions in membrane proteins (White et al., JBC 276:32395)
2 structures
6. Only two basic structures (Quart. Rev. Biophys. 32:285)
β-barrel
Lipid/prot interactions
7. Lipid-protein interactions (Mitsuoka et al., JMB 286:861)
Exposed/buried
8. Lipid-exposed vs. buried residues (Prot. Sci. 6:808)
conservation
9. Lipid-exposed residues are less conserved than buried ones
Membrane assembly
10. II. Assembly in vivo
11. Protein sorting in a eukaryotic cell
SRP pathway
12. The SRP/Sec61 pathway
Ffh
13. The Ffh M-domain/4.5S complex (Batey et al., Science 287:1232)
ribosome
14. The nascent chain tunnel (Nissen et al., Science 289:920)
Rib-Sec61
15. The ribosome-Sec61p complex (Beckmann et al., Science 278:2123)
16. The translocation channel (Beckmann et al., Science 278:2123)
movie
17. The basic model (graphics by Bill Skach)
prediction
18. III. Prediction - basics
19. What we want
- know all membrane proteins
- know their topology
- know their 3D structure
- know their function
and more...
TM lengths
20. TM helix lengths are typically 20-30 residues (Bowie, JMB 272:780)
Trp, Tyr
21. Trp and Tyr are enriched in the region near the lipid headgroups (Prot. Sci. 6:808, 7:2026)
Loop lengths
22. Loops tend to be short (Tusnády & Simon, JMB 283:489)
PI rule
23. The positive inside rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)
Bacterial IM: in 16% KR, out 4% KR
Eukaryotic PM: in 17% KR, out 7% KR
Thylakoid membrane: in 13% KR, out 5% KR
Mitochondrial IM: in 10% KR, out 3% KR
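The positive inside rule can be turned into a simple orientation test: count Lys and Arg (K+R) in the loops on each side and place the richer side inside. The following is a minimal sketch, not the method of any cited paper; the function names and the example loop sequences are made up for illustration, and it assumes the loops are already known and alternate sides in N-to-C order.

```python
def count_kr(segment: str) -> int:
    """Count lysine (K) and arginine (R) residues in a sequence segment."""
    return sum(segment.count(aa) for aa in "KR")


def orient_by_positive_inside(loops: list[str]) -> dict:
    """Pick the orientation that puts more K+R residues on the inside.

    `loops` lists the non-membrane segments in N-to-C order; consecutive
    loops are assumed to lie on opposite sides of the membrane.
    """
    even_side = sum(count_kr(loop) for loop in loops[0::2])  # loops 0, 2, 4, ...
    odd_side = sum(count_kr(loop) for loop in loops[1::2])   # loops 1, 3, 5, ...
    n_term = "in" if even_side >= odd_side else "out"
    return {"n_terminus": n_term, "kr_bias": abs(even_side - odd_side)}


# Hypothetical loops of a three-helix protein (made-up sequences).
example_loops = ["MKRSK", "GSDE", "RRKLK", "TDG"]
result = orient_by_positive_inside(example_loops)
print(result)
```

A larger `kr_bias` would suggest a more clear-cut orientation; a value near zero means the rule alone cannot decide.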
prediction
24. IV. Topology prediction
25. Topology prediction - a classical problem in bioinformatics
4 characteristics
26. Four important characteristics
- short loops
- 20 hydrophobic residues
- positive inside rule
predictors
27. Popular topology predictors
- TMHMM (HMM)
- HMMTOP (HMM)
- TopPred (h-plot, PI-rule)
- MEMSAT (dynamic programming)
- TMAP (h-plot, mult. alignment)
- PHD (NN, mult. alignment)
toppred
28. TopPred (JMB 225:487)
- construct all possible topologies
- rank based on Δ (the difference in positively charged residues, K+R, between the two sides of the membrane)
E. coli LacY
http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html
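The "h-plot" step behind predictors of this kind can be sketched as a sliding-window hydropathy scan. The example below is only an illustration: it uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, which are assumptions chosen here and not necessarily the scale or parameters TopPred uses. The function names and the test sequence are made up.

```python
# Kyte-Doolittle hydropathy values (a swapped-in scale, see lead-in).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          cutoff: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent windows scoring at or above the cutoff.

    Returns (start, end) pairs, 0-based with end exclusive.
    """
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < cutoff:
            continue
        if segments and i <= segments[-1][1]:
            # Window touches the previous segment: extend it.
            segments[-1] = (segments[-1][0], i + window)
        else:
            segments.append((i, i + window))
    return segments


# Made-up sequence: two hydrophobic stretches separated by a polar loop.
toy_seq = "MSDKRE" + "L" * 20 + "GSDKRNEQ" * 4 + "I" * 20 + "KRDE"
segments = candidate_tm_segments(toy_seq)
print(segments)
```

The reported segments are wider than the hydrophobic cores because every window that passes the cutoff is merged in whole; a real predictor would trim candidates and then rank the possible topologies, for example by the K+R difference.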
TMHMM
29. TMHMM (Sonnhammer et al., ISMB 6:175; Krogh et al., JMB 305:567)
A hidden Markov model-based method
www.cbs.dtu.dk
helix and loop models
30. Helix and loop models in TMHMM
HMMTOP
31. TMHMM performance (Krogh et al., JMB 305:567)
- Discrimination globular/membrane: sensitivity and specificity > 98%
- Correct topology: 65-70%
- Single TM identification: sensitivity 96%, specificity 98%
- Training set: 160 membrane proteins, 650 globular proteins
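For readers unfamiliar with the terms, the sketch below shows how sensitivity and specificity could be computed from a membrane/globular confusion table. It assumes the common definitions (TP/(TP+FN) and TN/(TN+FP)); the slide does not state which definitions were used, and the counts below are invented for illustration, not taken from the paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real membrane proteins that are predicted as membrane."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of real globular proteins that are predicted as globular."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for a set of 160 membrane and 650 globular proteins.
sens = sensitivity(true_pos=158, false_neg=2)
spec = specificity(true_neg=640, false_pos=10)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```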
Number of TM proteins
32. How many membrane proteins are there?
- identify all TMHMM hits with at least 1 TM
- remove secretory proteins from the 1TM class using SignalP-HMM
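The two filtering steps can be sketched as follows. This is an illustration only: the `OrfPrediction` record and its fields are hypothetical stand-ins for parsed TMHMM and SignalP-HMM results, and the five ORFs are invented.

```python
from dataclasses import dataclass


@dataclass
class OrfPrediction:
    name: str
    n_tm: int                 # predicted TM helices (e.g. parsed from TMHMM)
    has_signal_peptide: bool  # e.g. parsed from a signal peptide predictor


def is_membrane_protein(orf: OrfPrediction) -> bool:
    """Keep hits with >= 1 TM, but drop 1TM hits that carry a signal peptide.

    A single predicted helix in a protein with a signal peptide is assumed
    to be the signal peptide itself, i.e. the protein is secretory.
    """
    if orf.n_tm < 1:
        return False
    if orf.n_tm == 1 and orf.has_signal_peptide:
        return False
    return True


def membrane_fraction(orfs: list[OrfPrediction]) -> float:
    """Fraction of ORFs classified as membrane proteins."""
    if not orfs:
        return 0.0
    return sum(is_membrane_protein(orf) for orf in orfs) / len(orfs)


# Invented example proteome of five ORFs.
orfs = [
    OrfPrediction("orfA", 0, False),
    OrfPrediction("orfB", 1, True),   # likely secreted, removed
    OrfPrediction("orfC", 1, False),  # kept as a 1TM membrane protein
    OrfPrediction("orfD", 7, False),
    OrfPrediction("orfE", 0, True),
]
fraction = membrane_fraction(orfs)
print(f"{fraction:.0%} of ORFs classified as membrane proteins")
```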
results
33. 20-25% of all ORFs encode membrane proteins
C. elegans 30%
D. melanogaster 20%
S. cerevisiae 21%
A. thaliana 23%
B. subtilis 24%
E. coli 21%
M. genitalium 20%
T. maritima 24%
A. fulgidus 20%
P. horikoshii 26%
consensus
34. Consensus predictions indicate reliability (FEBS Lett. 486:267)
60 E. coli proteins
5 prediction methods used
46% of 764 predicted E. coli IM proteins are in the 5/0 or 4/1 classes
[Plot: fraction correct and coverage vs. majority level]
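A majority-vote consensus over five predictors can be sketched as below. This is not the published procedure: it assumes each method's prediction is reduced to a hypothetical (number of TM helices, N-terminus location) pair, and it labels the class as "agreeing/remaining" methods, so a 3/1/1 split is reported as 3/2.

```python
from collections import Counter

Topology = tuple[int, str]  # (number of TM helices, N-terminus "in"/"out")


def consensus_class(predictions: list[Topology]) -> tuple[Topology, str]:
    """Return the most common topology and its agreement class, e.g. '4/1'."""
    counts = Counter(predictions)
    topology, n_agree = counts.most_common(1)[0]
    return topology, f"{n_agree}/{len(predictions) - n_agree}"


# Invented predictions from five methods for one protein.
five_methods = [(12, "in"), (12, "in"), (12, "in"), (11, "in"), (12, "in")]
topology, agreement = consensus_class(five_methods)
print(topology, agreement)
```

Proteins falling in the 5/0 or 4/1 classes would then be treated as the reliably predicted subset.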
Partial consensus
35. Partial consensus topologies (Prot. Sci. 11:2974)
- 89% of all predicted partial topologies correct (5/0 class)
- 72% of all predicted E. coli IM proteins covered (5/0 class)
TMHMM reliability
36. TMHMM reliability scores (Melén et al., JMB in press)
TMHMM output
1. Mean probability pmean
2. Minimum probability pmin(label)
3. PbestPath/PallPaths
S3 results
37. TMHMM (score 3)
Prediction accuracy vs. coverage
92 bacterial proteins
[Plot: percent correct vs. coverage; the values 70 and 45 are marked]
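An accuracy-versus-coverage curve of this kind can be built by ranking proteins by reliability score and tracking how accuracy changes as lower-scoring predictions are admitted. The sketch below assumes each protein comes with a score and a known correct/incorrect outcome; the function name and the four data points are invented.

```python
def accuracy_vs_coverage(scored: list[tuple[float, bool]]) -> list[tuple[float, float]]:
    """Return (coverage, accuracy) points as the score threshold is lowered.

    `scored` holds one (reliability_score, prediction_is_correct) pair per protein.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    curve = []
    n_correct = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        n_correct += is_correct
        # Coverage: fraction of proteins accepted; accuracy: correct among them.
        curve.append((rank / len(ranked), n_correct / rank))
    return curve


# Invented scores and outcomes for four proteins.
toy_scores = [(0.9, True), (0.4, True), (0.8, True), (0.6, False)]
curve = accuracy_vs_coverage(toy_scores)
for coverage, accuracy in curve:
    print(f"coverage {coverage:.2f}  accuracy {accuracy:.2f}")
```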
Test set bias
38. Experimentally known topologies are a biased sample
Estimate true performance
39. Correlation between accuracy and TMHMM S3 score
[Plot: percent correct vs. mean score]
genomes
40. Expected TMHMM performance on proteomes
[Plot: percent correct vs. coverage for the test set, C. elegans, E. coli and S. cerevisiae]
Add C-term.
41. Improved performance by use of experimental information
92 bacterial proteins
[Plot: percent correct vs. coverage, with C-terminus known and C-terminus not known; the values 75 and 45 are marked]
genomes
42. Expected improvement when C-term. location is known
TMHMM + MEMSAT
43. Improved performance by combining TMHMM and MEMSAT
Coverage 60%, accuracy 95%
Red: C/C; Blue: F/F; White: C/F; Black: F/C
[Scatter plot: MEMSAT score vs. TMHMM score]
Fusion analysis
44. Experimental topology determination by PhoA fusions (Manoil, Meth. Cell Biol. 34:61)
Periplasmic side
Cytoplasmic side
GFP
45. Experimental topology determination by GFP fusions (Drew et al., PNAS 99:2690)
Periplasmic side
Cytoplasmic side
12 topologies
46. GFP and PhoA activities of 12 E. coli IMPs (PNAS 99:2690)
[Chart: GFP and PhoA activity for YnfA, MarC, PstA, TatC, YaeL, YcbM, YddQ, YdgE, YedZ, YgjV, YiaB and YigG]
Medium-scale
47. Medium-scale mapping of E. coli IMPs (N = 47) (Rapp et al., in preparation)
results
48. C-in topologies dominate in E. coli
yeast
49. A dual reporter for topology mapping in yeast (Deak & Wolf, JBC 276:10663)
50. A dual reporter for topology mapping in yeast (Deak & Wolf, JBC 276:10663)
51. Large-scale topology mapping in yeast (Kim & von Heijne, JBC in press)
[Figure: glycosylation (EndoH) and growth-on-histidinol assays for YGR290W, YGR055W, YGR105W and YOR376W]
results
52. Results
- 39 proteins analysed
- 37 yield consistent results in the two assays
- One of the 2 inconsistent proteins is mitochondrial
- Predicted C-terminal location correct for 31 of 37
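As a quick check of what these counts mean in percentage terms, the arithmetic can be written out as below. Only the three counts come from the slide; the variable names are illustrative.

```python
analysed = 39        # proteins analysed
consistent = 37      # proteins with consistent results in the two assays
c_term_correct = 31  # consistent proteins with correctly predicted C-terminus

consistent_fraction = consistent / analysed
c_term_accuracy = c_term_correct / consistent

print(f"consistent in both assays: {consistent_fraction:.1%}")
print(f"C-terminal location predicted correctly: {c_term_accuracy:.1%}")
```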
End