Title: Proteomics technologies and protein-protein interaction
1Proteomics technologies and protein-protein
interaction
- Lars Kiemer
- Center for Biological Sequence Analysis
- The Technical University of Denmark
- Advanced bioinformatics November 2005
2Outlining the problem
- Around 30% of the human proteins still have no annotated function.
- Even if the function is known, we often don't know anything about the big picture (regulation? multiple functions? pathogenesis? mutations? splice variants?).
- In fact, the individual proteins are as interesting as bricks in a wall; what we want to know about is the system.
3Example signal transduction cascade
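[Figure: signal transduction cascade drawn across three compartments (EXTRACELLULAR, CYTOPLASM, NUCLEUS). Labelled components: NCAM, CB1, FGFR, Ras, bRaf, Raf, Frs2, PKC, Sos, Ca2+, Shc, Grb2, c-Fos, DAGL, MEK, Fyn, PLC?, PKA, CREB, MAPK, Rap1, Fak, CaMKII, GAP43.]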
4Example signal transduction cascade
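[Figure: the same cascade drawn across the EXTRACELLULAR, CYTOPLASM and NUCLEUS compartments, ending in transcription. Labelled components: NCAM, 2-AG, DAG, PIP2, IP3, Ras, Frs2, DAGL, Sos, Grb2, Fyn, PLC?, Shc, Fak, Raf, Ca2+, PKC, PKA, MEK, GAP43, MAPK, CaMKII, CREB, c-Fos, Transcription.]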
5Obtaining data
- High-throughput data can provide information about interactions with other proteins, protein abundance in different tissues, transcriptional regulation, etc.
- High-throughput experimental techniques provide large data sets, thus no manual curation is possible.
- These data sets often contain false positives.
- But combining several such data sets increases confidence.
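
The effect of combining data sets can be sketched numerically. The snippet below is a minimal illustration, assuming the data sets are independent and that each one has a known probability that a reported interaction is real. The function name `combined_confidence` and the example reliabilities are hypothetical and are not values from the talk.

```python
from math import prod

def combined_confidence(reliabilities):
    """Probability that at least one supporting observation is correct,
    assuming the observations are independent ('noisy-OR' combination)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical per-data-set reliabilities
# (fraction of reported pairs assumed to be real).
single = combined_confidence([0.5])
double = combined_confidence([0.5, 0.5])
triple = combined_confidence([0.5, 0.5, 0.6])

print(single, double, triple)
```

Under these assumptions an interaction seen in two 50%-reliable data sets is already 75% likely to be real. Real data sets are rarely fully independent, so the combined value should be read as an upper bound.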
6Protein interactions reveal a lot!
- Hints of the function of a protein are revealed when its interaction partners are known.
- Guilt by association!
- Complexes in which none of the interaction partners have known functions are even more interesting.
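
Guilt by association can be sketched as a majority vote over the annotated interaction partners. This is only one simple way to do it, not necessarily the method used in the talk; the function `predict_function`, the protein names and the annotations are all hypothetical.

```python
from collections import Counter

def predict_function(protein, interactions, annotations):
    """Return the most common annotation among the protein's interaction
    partners, or None if no partner is annotated."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    votes = Counter(annotations[p] for p in partners if p in annotations)
    if not votes:
        return None
    # Ties are broken arbitrarily in this sketch.
    return votes.most_common(1)[0][0]

# Hypothetical toy network in which P4 has no annotated function.
interactions = [("P1", "P4"), ("P2", "P4"), ("P3", "P4"), ("P1", "P2")]
annotations = {"P1": "DNA repair", "P2": "DNA repair", "P3": "transport"}

print(predict_function("P4", interactions, annotations))
```

Two of the three partners of P4 are annotated with the same function, so that function is proposed for P4.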
7Yeast-two-hybrid screening
- Has been widely used
- Only binary interactions
- High false positive rate
- Proteins must be able to enter the nucleus
8Affinity purification
- Large-scale
- Can be done on any preparation of cells
- Often complexes are purified and the order of binding is not obtained
- An extra step is needed to identify purified proteins
9Mass spectrometer
[Figure: mass spectrometer with its 3 principal components labelled Q1, q2 and TOF.]
10Mass spectrometry in short
- Extremely sensitive
- Weight precision of one atom
- In principle, detection of one relatively short peptide allows for unambiguous identification.
- Some proteins are difficult to chop up with proteases.
- Some peptides are very difficult to ionize.
- Due to the high sensitivity of the method, contamination is difficult to avoid.
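
The claim that a single short peptide can identify a protein can be illustrated with an in-silico digest. The sketch below assumes trypsin as the protease (cleavage after K or R, but not before P) and only checks sequence uniqueness within a toy database; a real search matches measured masses and fragment spectra instead. The sequences, protein names and the `min_length` cut-off are hypothetical.

```python
import re
from collections import defaultdict

def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]

def identifying_peptides(proteins, min_length=6):
    """Map each protein to the peptides found in no other protein."""
    owners = defaultdict(set)
    for name, seq in proteins.items():
        for pep in tryptic_digest(seq):
            if len(pep) >= min_length:
                owners[pep].add(name)
    unique = defaultdict(list)
    for pep, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(pep)
    return dict(unique)

# Hypothetical toy database; both proteins share the peptide LLDEVFR.
proteins = {
    "protA": "MSTAGKLLDEVFRAAPWYKGGSTNDER",
    "protB": "MQQLAKLLDEVFRTTPIVHEEK",
}

for name, peptides in identifying_peptides(proteins).items():
    print(name, sorted(peptides))
```

Observing any of the listed peptides points to exactly one protein in this toy database, whereas the shared peptide LLDEVFR does not.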
11Protein interaction databases Spoke/Matrix
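[Figure: an affinity pulldown (one bait, several preys) shown in two representations and compared with the unknown true topology ("Truth?").]
Spoke: the bait is linked to each prey.
Matrix: every pair of proteins in the pulldown is linked.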
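
The difference between the two representations is easy to see in code. The following is a minimal sketch; the function names `spoke` and `matrix` and the protein labels are chosen here for illustration only.

```python
from itertools import combinations

def spoke(bait, preys):
    """Spoke model: the bait is paired with each prey only."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}

def matrix(bait, preys):
    """Matrix model: every pair of proteins in the purification is paired."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}

bait, preys = "A", ["B", "C", "D"]
print(len(spoke(bait, preys)), len(matrix(bait, preys)))  # prints "3 6"
```

For a purification with n proteins the spoke model yields n - 1 pairs and the matrix model n(n - 1)/2, which is one reason the pair counts reported by different databases are hard to compare directly.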
12Protein interaction databases Overlap
Protein interaction data: a total of 18,629 articles are represented in the databases (June 2005).

Database              Unique article references   Interaction pairs in unique references   Representation
DIP                   1,353                       5,403                                    binary?
MINT                  1,406                       5,430                                    spoke
IntAct                355                         6,836                                    spoke
GRID                  1,232                       49,135                                   binary?
BIND (protein part)   5,733                       44,279                                   spoke/matrix
HPRD                  6,989                       14,533                                   matrix

Approx. 10% of the protein-protein interactions in BIND are database imports.
13Species bias in available data
- A few select organisms are very well-studied, while others are not.
- The BIND database, species distribution (Alfarano et al., NAR, 2005)
14Trans-organism protein interaction network
Orthologs? Orthologous genes are direct descendants of a gene in a common ancestor.
[Figure: species shown are S. cerevisiae, D. melanogaster and H. sapiens.]
(O'Brien K, Remm et al. 2005)
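
Transferring interactions between organisms through orthologs can be sketched as a simple mapping step. The snippet assumes an ortholog table is already available (for example from an orthology database); the identifiers, the table and the function `transfer_interactions` are hypothetical.

```python
def transfer_interactions(interactions, orthologs):
    """Map interactions observed in a source organism onto a target organism.
    A pair is transferred only if both partners have an ortholog there."""
    inferred = set()
    for a, b in interactions:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                if target_a != target_b:
                    inferred.add(frozenset((target_a, target_b)))
    return inferred

# Hypothetical yeast interactions and yeast -> human ortholog assignments.
yeast_interactions = [("yA", "yB"), ("yB", "yC")]
yeast_to_human = {"yA": ["hA"], "yB": ["hB1", "hB2"]}

inferred = transfer_interactions(yeast_interactions, yeast_to_human)
print(sorted(sorted(pair) for pair in inferred))
```

The yA-yB interaction is transferred to both human orthologs of yB, while yB-yC is dropped because yC has no assigned ortholog. Inferred pairs are predictions and still need supporting evidence.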
15Trans-organism protein interaction network
H. sapiens: MOSAIC
D. melanogaster: experimental
C. elegans: experimental
S. cerevisiae: experimental
16Repetition of experiments adds credibility
Light blue connection: 1 experiment.
Darker blue connection: >1 experiment, 1 organism.
Purple connection: >1 experiment, >1 organism.
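
The colour legend above amounts to counting distinct experiments and organisms per protein pair. A minimal sketch follows, assuming every observation has already been mapped to a common set of protein identifiers; the function `classify_edges` and the example records are hypothetical.

```python
from collections import defaultdict

def classify_edges(observations):
    """observations: iterable of (protein_a, protein_b, experiment, organism).
    Returns {pair: colour} following the legend above."""
    experiments = defaultdict(set)
    organisms = defaultdict(set)
    for a, b, experiment, organism in observations:
        pair = frozenset((a, b))  # unordered, so A-B equals B-A
        experiments[pair].add(experiment)
        organisms[pair].add(organism)
    colours = {}
    for pair in experiments:
        if len(experiments[pair]) == 1:
            colours[pair] = "light blue"
        elif len(organisms[pair]) == 1:
            colours[pair] = "darker blue"
        else:
            colours[pair] = "purple"
    return colours

# Hypothetical observations.
observations = [
    ("A", "B", "exp1", "yeast"),
    ("B", "A", "exp2", "fly"),
    ("A", "C", "exp1", "yeast"),
    ("C", "D", "exp1", "yeast"),
    ("C", "D", "exp3", "yeast"),
]
colours = classify_edges(observations)
for pair, colour in colours.items():
    print(sorted(pair), colour)
```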
17Adding co-expression data
Red connector: co-expression in 80 different tissues with a correlation coefficient above 0.7.
Grey nodes: no expression data available.
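
The co-expression criterion can be sketched as a thresholded correlation between two expression profiles. The snippet assumes the Pearson correlation coefficient is meant (the slide does not say which coefficient was used) and uses 5 made-up tissue values per protein instead of 80; all names and numbers are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def coexpressed(pair, expression, threshold=0.7):
    """True/False if both proteins have expression data,
    None if either lacks data (a 'grey node')."""
    a, b = pair
    if a not in expression or b not in expression:
        return None
    return pearson(expression[a], expression[b]) > threshold

# Hypothetical expression levels across 5 tissues.
expression = {
    "P1": [1, 2, 3, 4, 5],
    "P2": [2, 4, 5, 8, 10],
    "P3": [5, 3, 4, 1, 2],
}

for pair in [("P1", "P2"), ("P1", "P3"), ("P1", "P4")]:
    print(pair, coexpressed(pair, expression))
```

P1 and P2 rise together and pass the threshold, P1 and P3 are anti-correlated and fail it, and the pair involving P4 is left undecided because no expression data exist for P4.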
18Nucleolus dynamics
Nodes are coloured according to level of protein
in the nucleolus following transcriptional
inhibition (Andersen et al., Nature, 2005).
19Adding up to make high quality associations
Integration of various data sources builds up
confidence
20Upon integration comes enlightenment
21Upon integration comes enlightenment
22Identifying functional complexes
23Summary
- Protein-protein interactions can reveal hints about the function of a protein (guilt by association).
- Information about protein interactions is obtained with different technologies, each with its own advantages and weaknesses.
- Due to the high degree of systemic conservation, interactions can be inferred from observed interactions in other species.
- Data are always error-prone. Repeated observations build up confidence.
- Integrating different types of data can further build up confidence.