Title: Bioinformatic analysis of protein complexes
1Bioinformatic analysis of protein complexes
- Roland Krause
- Cellzome AG, Heidelberg
2Overview
- The proteome: proteins and their interactions
- The yeast proteome project at Cellzome
- Experimental data generation
- Functional analysis
- Obtaining protein complexes
- Comparing protein complexes and interacting proteins
- Shared components: building blocks or biochemical artifacts?
3Proteomics
- The study of the protein repertoire expressed in the cell
- Protein expression levels
- Qualitative
- Quantitative
- Localization
- Protein interactions
- Powerful tool for the elucidation of protein function
- Pair-wise interactions
- Protein complexes
- Protein complexes
- Visible structural bodies
- Important players in molecular life
4Acknowledgements
- Cellzome AG
- Dr. Giulio Superti-Furga
- Dr. Gitte Neubauer
- Yeast biology
- Dr. Anne-Claude Gavin
- Dr. Paola Grandi
- Dr. Christina Rau
- Mass spectrometry
- Bernhard Küster, PhD
- Markus Bösche
- Bioinformatics
- Dr. Georg Casari
- Jens Rick
- Julien Gagneur
- EMBL
- Dr. Peer Bork
- Dr. Christian von Mering
- Biozentrum Würzburg
- Prof. Dr. Thomas Dandekar
- Prof. Dr. Jörg Schultz
5The Yeast Proteome Project at Cellzome
- Tandem Affinity Purification (TAP)
- Mass-Spectrometry (MS)
6Workflow TAP-MS
- Homologous transformation (addition of TAP-tag)
- Test for expression
- Large scale culture
- Purification
- Gel separation of complexes
- Mass spectrometry (MALDI)
7An integrated workflow
[Figure: workflow diagram; labels: Transformation, Large scale culture, Separation, Mass spectrometry, LIMS system]
8Handling laboratory information and annotation
- Selection of protocol according to features of gene
- Size, membrane association
- Process information / user management
- Annotation of new complexes and novel findings
- Collection of information for patent
- Database of known drug targets
9Key figures of the screen
- Purifications: 589
- Proteins retrieved: 1440, 300 novel
- Complexes discovered: 232, 60 appear as novel
- Overlap with the Y2H data: 165 of 1500 interactions
- 37 percent of proteins in complexes are shared
- Gavin, A.-C., Bösche, M., Krause, R., et al. (2002) Nature
10Functional analysis
- Protein complexes share many components
- The resulting network of complexes builds a higher order network
- Highly conserved and essential proteins tend to interact with each other
- All localizations are sampled well but for membrane proteins
- Small proteins are underrepresented
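The first two points can be made concrete with a small sketch: given complexes as sets of proteins, count the fraction of proteins that occur in more than one complex, and link any two complexes that share a component. The complex and protein names below are made-up placeholders, not data from the screen.

```python
from collections import Counter
from itertools import combinations

# Toy complexes (hypothetical membership, for illustration only).
complexes = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P3", "P4"},
    "C3": {"P4", "P5", "P6"},
    "C4": {"P7", "P8"},
}

def shared_fraction(complexes):
    """Fraction of proteins that are members of more than one complex."""
    counts = Counter(p for members in complexes.values() for p in members)
    shared = sum(1 for c in counts.values() if c > 1)
    return shared / len(counts)

def complex_network(complexes):
    """Edges between complexes that share at least one protein."""
    edges = {}
    for a, b in combinations(sorted(complexes), 2):
        common = complexes[a] & complexes[b]
        if common:
            edges[(a, b)] = common
    return edges

print(shared_fraction(complexes))
print(complex_network(complexes))
```

In this toy input two of eight proteins are shared, and the shared proteins chain C1, C2 and C3 into one connected component of the higher order network.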
11Examples of new findings
- New complexes
- 90S Pre-Ribosome
- Gives rise to the primordial, nucleolar ribosome
- Third largest complex in the yeast cell
- Established functionally by Grandi et al., 2002 (Mol. Cell), Dragon et al., 2002 (Nature)
- COP9/Signalosome
- Missing complex known in human, fly, Arabidopsis
- Known to be related to the 19S regulatory part of the proteasome
- Shares components with the proteasome in yeast
- New interactors for known complexes
- Iwr1 with RNA polymerase II
- YFL049w with SWI/SNF complex
- Apparent underestimate of protein complexes in
the reference literature
12A comprehensive list of protein complexes
- Cluster analysis for protein complexes
13Obtaining complexes
TAP purifications
TAP-tagged protein (entry point)
yTAP-complexes (232)
14A comprehensive list of protein complexes
- Assembly of individual interactions into physiological protein complexes
- Allows interpretation and annotation of results
- Manually performed for the publication in Nature
- Used known complexes as a guide
- Contains several inconsistencies
- An automatic procedure would be beneficial
- Cluster analysis
- Should preserve features of real complexes
- Possible clusters
- Of proteins
- Of purifications
- Large number of clusters compared to clustering
of transcription profiles
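The two clustering options are two views of the same purification-by-protein table. A minimal sketch, assuming each purification is stored as the set of proteins identified with its entry point (all names are hypothetical):

```python
# Toy purifications: entry point (bait) -> proteins identified.
# Names are placeholders, not data from the screen.
purifications = {
    "baitA": {"P1", "P2", "P3"},
    "baitB": {"P2", "P3", "P4"},
    "baitC": {"P5", "P6"},
}

def protein_profiles(purifications):
    """Invert the mapping: protein -> set of purifications it was found in."""
    profiles = {}
    for bait, proteins in purifications.items():
        for protein in proteins:
            profiles.setdefault(protein, set()).add(bait)
    return profiles

profiles = protein_profiles(purifications)
print(profiles["P2"])
```

Clustering of purifications compares the values of `purifications`; clustering of proteins compares the values of `profiles`.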
15Benchmarking protein complexes
- There is no standard for comparing sets of complexes
- How shall we treat the intricate structure of protein complexes?
- Variant complexes
- RSC complex
- Lsm1-Lsm7 vs Lsm2-Lsm8
- Cyclin dependent kinases
- Megacomplexes
- Assemblies of complexes
- Transient interactions
- Definitions vary
- Kinetics
- Cell cycle
- Functional associations
16Ribosomal biogenesis
From Schafer et al, EMBO Journal (2003)
17Clustering of proteins
- Clustering of proteins
- Shared components are not preserved
- Each protein is assigned to a single complex
- Megacomplexes did not allow for a good separation
- Very few data points: 80% of the proteins have fewer than 3 identifications
- Simpler approach: cluster of purifications
- A purification should contain complexes already
18Clustering of purifications
19Similarity indices for comparing complexes or
purifications
- Dice-Index
- Jaccard-Index
- na, nb: number of components in group a or b
- Geometric index
- Simpson-Index
- ni: number of components in the intersection
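The formulas on this slide did not survive extraction. The sketch below uses the common textbook definitions of the four indices in terms of na, nb and ni; that these match the slide exactly is an assumption. Both groups are assumed to be non-empty sets.

```python
def dice(a, b):
    """Dice index: 2*ni / (na + nb)."""
    ni = len(a & b)
    return 2 * ni / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard index: ni / (na + nb - ni), i.e. intersection over union."""
    ni = len(a & b)
    return ni / (len(a) + len(b) - ni)

def geometric(a, b):
    """Geometric index: ni^2 / (na * nb)."""
    ni = len(a & b)
    return ni * ni / (len(a) * len(b))

def simpson(a, b):
    """Simpson index: ni / min(na, nb)."""
    ni = len(a & b)
    return ni / min(len(a), len(b))

# Hypothetical groups: na = 4, nb = 3, ni = 2.
a = {"P1", "P2", "P3", "P4"}
b = {"P3", "P4", "P5"}
for index in (dice, jaccard, geometric, simpson):
    print(index.__name__, round(index(a, b), 3))
```

The Simpson index reaches 1 whenever the smaller group is fully contained in the larger one, so it is the most tolerant of size differences between purifications.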
20Experimental complications
- Sensitive identification of background proteins
- Ribosomal proteins
- Heat shock proteins
- Abundant enzymes
- Filtering by class and detection frequency
- Missing identifications
- Differences in expression levels
- Small proteins
- Membrane associated proteins
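Filtering by class and detection frequency could look like the sketch below. The frequency cut-off of 0.5, the excluded class list, and all protein names are illustrative assumptions, not the values used in the screen.

```python
from collections import Counter

def detection_frequency(purifications):
    """Fraction of purifications in which each protein was identified."""
    counts = Counter(p for proteins in purifications.values() for p in proteins)
    n = len(purifications)
    return {protein: c / n for protein, c in counts.items()}

def filter_background(purifications, max_frequency=0.5, excluded=frozenset()):
    """Drop proteins seen too often or belonging to an excluded class."""
    freq = detection_frequency(purifications)
    return {
        bait: {p for p in proteins
               if freq[p] <= max_frequency and p not in excluded}
        for bait, proteins in purifications.items()
    }

# Hypothetical input: "HSP" stands in for a frequently detected heat shock
# protein, "RPL" for a ribosomal protein excluded by class.
purifications = {
    "baitA": {"P1", "P2", "HSP"},
    "baitB": {"P3", "HSP", "RPL"},
    "baitC": {"P4", "HSP"},
    "baitD": {"P1", "P5"},
}
filtered = filter_background(purifications, max_frequency=0.5, excluded={"RPL"})
print(filtered)
```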
21Refinements of similarity indices
- Normalized Dice-like index (by column)
- Normalized Simpson-like-index
- f: frequency of detection
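The normalized formulas themselves are not preserved in this transcript. One plausible reading, shown here purely as an assumption, is to weight every protein by the inverse of its detection frequency f, so that frequently detected proteins contribute less to the similarity. The function names and toy frequencies are hypothetical.

```python
def weights(frequency):
    """Weight each protein by the inverse of its detection frequency f."""
    return {protein: 1.0 / f for protein, f in frequency.items()}

def weighted_dice(a, b, w):
    """Dice-like index with protein counts replaced by summed weights."""
    shared = sum(w[p] for p in a & b)
    return 2 * shared / (sum(w[p] for p in a) + sum(w[p] for p in b))

def weighted_simpson(a, b, w):
    """Simpson-like index with protein counts replaced by summed weights."""
    shared = sum(w[p] for p in a & b)
    return shared / min(sum(w[p] for p in a), sum(w[p] for p in b))

# Hypothetical detection frequencies; "HSP" is a frequent contaminant.
frequency = {"P1": 0.1, "P2": 0.1, "P3": 0.2, "HSP": 0.5}
w = weights(frequency)
a = {"P1", "P2", "HSP"}
b = {"P1", "P3", "HSP"}
print(weighted_dice(a, b, w), weighted_simpson(a, b, w))
```

With these weights the shared contaminant adds little to the score, while a shared rare protein adds a lot.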
22Comparing clustering results
- Manually refined the MIPS and YPD complex sets for benchmark
- Parameter exploration, comparing the results to the benchmark set
- 80 complexes are contained in the curated complex set
- No increase when expanding beyond 250 complexes
- 252 complexes from the TAP set using means clustering and a threshold of 0.3
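A threshold-based clustering of purifications can be sketched as follows. The slide says only "means clustering"; reading this as average-linkage agglomerative clustering, and using the Jaccard index as the similarity, are assumptions made for illustration. Purifications are assumed to be non-empty sets, and all names are placeholders.

```python
from itertools import combinations

def jaccard(a, b):
    ni = len(a & b)
    return ni / (len(a) + len(b) - ni)

def average_similarity(c1, c2, sets):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(jaccard(sets[x], sets[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))

def cluster_purifications(sets, threshold=0.3):
    """Merge the most similar clusters until none reaches the threshold."""
    clusters = [[name] for name in sorted(sets)]
    while len(clusters) > 1:
        best, (i, j) = max(
            (average_similarity(clusters[i], clusters[j], sets), (i, j))
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical purifications keyed by entry point.
purifications = {
    "baitA": {"P1", "P2", "P3"},
    "baitB": {"P2", "P3", "P4"},
    "baitC": {"P5", "P6"},
    "baitD": {"P5", "P6", "P7"},
}
result = cluster_purifications(purifications, threshold=0.3)
print(result)
```

Raising the threshold yields more, smaller clusters; lowering it merges purifications that share only a few components.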
23Results and conclusions
- Combined HMS-PCI and the TAP set
- 494 clusters (a reasonable total number of complexes)
- 46 of 94 identical entry points occur in the same cluster
- Refining the distance index is crucial to the clustering
- Future work
- Clustering of proteins (bi-clustering) and classification of proteins
- Different clustering algorithms
- Including more information into distance measure
- Bait protein
- Refine benchmarking
- Krause R., et al. (2003) Bioinformatics.
24Comparison of protein-protein interaction screens
- Differences between individual methods and
reference sets
25Comparison of different data sets
- Biochemical purifications
- Gavin et al. (2002) (TAP)
- Ho et al. (2002) (HMS-PCI)
- Yeast-two hybrid
- Ito et al. (2000, 2001), Uetz et al. (2000)
- mRNA-co-expression
- Eisen et al. (1998), Marcotte (2000)
- In silico predictions
- STRING (von Mering et al., 2003)
- Synthetic lethals
26Interaction density
27Functional biases
28Comparison
29Conclusions
- The overlap between the individual methods is surprisingly small
- Different methods complement each other
- Individual methods are not exhausted
- Single experimental methods can be as reliable as combined sets
- Integration
- Bader, G. and Hogue, C. (2002) Nat. Biot.
- Kemmeren H., et al. (2002) Mol. Cell
- Von Mering C., Krause, R., et al. (2002) Nature
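The overlap between data sets can be measured by storing each interaction as an unordered protein pair and intersecting the sets. The sketch below uses made-up data set names and interactions. How complexes from purification data are expanded into pairs is left out here.

```python
from itertools import combinations

def pair(a, b):
    """Order-independent representation of an interaction."""
    return frozenset((a, b))

# Hypothetical interaction sets, one per method.
datasets = {
    "set1": {pair("P1", "P2"), pair("P2", "P3"), pair("P4", "P5")},
    "set2": {pair("P2", "P1"), pair("P5", "P6")},
    "set3": {pair("P2", "P3"), pair("P1", "P2"), pair("P7", "P8")},
}

def pairwise_overlap(datasets):
    """Number of interactions shared by each pair of data sets."""
    return {
        (x, y): len(datasets[x] & datasets[y])
        for x, y in combinations(sorted(datasets), 2)
    }

def support(datasets):
    """Number of data sets that report each interaction."""
    counts = {}
    for interactions in datasets.values():
        for interaction in interactions:
            counts[interaction] = counts.get(interaction, 0) + 1
    return counts

print(pairwise_overlap(datasets))
print(support(datasets)[pair("P1", "P2")])
```

Counting the number of supporting data sets per interaction is one simple way to integrate methods: interactions reported by several methods can be ranked above those seen only once.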
30Shared components of protein complexes
- Biochemical artifacts or versatile building
blocks?
31Shared components in the Cellzome screen
Co-activator of Pol II transcription
SAGA complex
Cytoskeleton
NuA4 histone acetylase
Chromatin remodelling
Histone deacetylase complex
32Motivation and approach
- Artifacts or structural principle?
- Relevance to medical target discovery
- Target to several processes
- Understanding side effects
- Evolutionary insights
- Study of known shared components
33Functional arrangements
- Dihydrolipoamide dehydrogenase (Lpd1)
- 2-Oxoglutarate dehydrogenase
- Glycine decarboxylase
- Pyruvate dehydrogenase
- Common enzymatic function
- RNA polymerases
- Shared proteins are not the business end
- Regulatory structural roles
Cramer, P., et al. (2000) Science
34Structural arrangements for shared components
Examples: Spt6 tethers the exosome to the RNA polymerase for surveillance
Examples: Lsm1-7 complex, Lsm2-8 complex
Examples: signaling networks
Manuscript in preparation
35Research interests
- Improve clustering approaches
- Find a sensible structure for protein complexes and their interactions
- Benchmark set of protein complexes in yeast
- Functional properties of protein complexes
- Conquering the human proteome and experimental planning
- Hypothesis-free research
37Thank you!