Title: Protein Evolution, Coevolution and Interaction Networks Day 2
1Protein Evolution, Co-evolution and Interaction
Networks (Day 2)
- Matteo Pellegrini
- Rosetta Inpharmatics
2Identifying The Components of Cellular Pathways
and Protein Complexes using Co-evolution
3Proteins are Components of Molecular Machines
Hartwell LH, Hopfield JJ, Leibler S, Murray AW.
From molecular to modular cell biology. Nature.
1999 Dec 2;402(6761 Suppl):C47-52.
4Techniques to Study Protein Interactions
Protein Interactions
5Bacterial Diversity
- 150 fully sequenced genomes in Genbank
- 30,000 species represented in Genbank
- Sea may support 2,000,000
- Soil may support 4,000,000
T.P. Curtis, W.T. Sloan, and J.W. Scannell.
2002. Estimating prokaryotic diversity and its
limits. Proc Natl Acad Sci USA 99:10494-10499.
6The Study of the Co-Evolution of Non-Homologous
Proteins
- Because selection generally acts to maintain or
delete entire complexes and pathways, pairs of
proteins that are part of these will appear to
co-evolve across bacteria
- By studying the co-evolution of non-homologous
proteins across these bacteria we attempt to
reconstruct the components of complexes and
pathways
7Methods to Infer Co-evolution
Method: Basis
- Phylogenetic Profile: pairs of genes that are
always present or absent together in genomes
- Rosetta Stone: pairs of proteins that are fused
in some organism
- Gene Neighbor: pairs of genes that are coded
nearby in multiple organisms
- Gene Cluster: gene proximity within genome
8Phylogenetic Profile
Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg
D, Yeates TO. Assigning protein functions by
comparative genome analysis: protein phylogenetic
profiles. Proc Natl Acad Sci U S A.
96(8):4285-8, 1999
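A minimal sketch of how a phylogenetic profile might be represented: each gene gets one presence/absence value per genome, and genes with matching profiles become candidate functional partners. The genome names, gene names, and presence calls below are invented for illustration; real profiles would come from homology searches across sequenced genomes.

```python
# Hypothetical presence (1) / absence (0) calls across four genomes.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D"]

PROFILES = {
    "gene1": (1, 0, 1, 1),
    "gene2": (1, 0, 1, 1),
    "gene3": (0, 1, 0, 1),
}


def shared_genomes(profile_a, profile_b):
    """Count genomes in which both genes are present."""
    return sum(a and b for a, b in zip(profile_a, profile_b))


def identical_profile_pairs(profiles):
    """Return gene pairs (in sorted order) whose profiles match exactly."""
    genes = sorted(profiles)
    return [
        (g, h)
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
        if profiles[g] == profiles[h]
    ]


print(identical_profile_pairs(PROFILES))  # [('gene1', 'gene2')]
print(shared_genomes(PROFILES["gene1"], PROFILES["gene3"]))  # 1
```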
9Flagellar Proteins Phylogenetic Profiles
10Hypergeometric Function
Symbols in the slide's formula: N, n, m, k
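The formula on this slide did not survive extraction; only its symbols did. A common way to score a pair of phylogenetic profiles is the hypergeometric distribution, P(k | n, m, N) = C(n, k) C(N - n, m - k) / C(N, m). The sketch below assumes that N is the number of genomes, n and m are the numbers of genomes containing each of the two genes, and k is the number containing both. That assignment of symbols is an assumption, not something stated on the slide.

```python
from math import comb


def hypergeom_pmf(k, n, m, N):
    """Chance that exactly k genomes contain both genes (assumed symbol roles)."""
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)


def cooccurrence_p_value(k, n, m, N):
    """Chance of k or more shared genomes if the genes were placed at random."""
    upper = min(n, m)
    return sum(hypergeom_pmf(i, n, m, N) for i in range(k, upper + 1))


# Hypothetical numbers: 10 genomes, each gene present in 4, all 4 shared.
p = cooccurrence_p_value(k=4, n=4, m=4, N=10)
print(f"{p:.5f}")  # 0.00476
```

A small p-value means the observed co-occurrence is unlikely under random placement, which is the sense in which the two genes appear to co-evolve.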
11Gene Neighbor Method
Pellegrini M, Thompson MJ, Fierro J, Bowers P. A
Computational Method to Assign Microbial Genes to
Pathways. Journal of Cellular Biochemistry Suppl
37:106-9, 2001
12Linking Dihydrofolate reductase and Thymidylate
synthase
13Gene Neighbor Probability
14Rosetta Stone Method Identifies Protein Fusions
- Monomeric proteins that are found fused in
another organism are likely to be functionally
related and physically interacting.
Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates
TO, Eisenberg D. Detecting protein function and
protein-protein interactions from genome
sequences. Science 285(5428):751-3, 1999
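A rough sketch of the idea: if two separate proteins both align to a single protein elsewhere, in non-overlapping regions, that single protein is a candidate fusion ("Rosetta Stone") linking the pair. The hit tuples, protein names, and coordinates below are hypothetical; a real pipeline would take them from a sequence-similarity search and would need further filtering that is not shown here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical alignment hits: (query protein, target protein, start, end on target).
HITS = [
    ("protA", "fusedX", 1, 180),
    ("protB", "fusedX", 200, 420),
    ("protC", "fusedX", 150, 300),
    ("protA", "otherY", 1, 175),
]


def overlaps(hit_a, hit_b):
    """True if two hits cover overlapping regions of the same target."""
    return hit_a[2] <= hit_b[3] and hit_b[2] <= hit_a[3]


def rosetta_stone_links(hits):
    """Map each query pair to the targets they hit in non-overlapping regions."""
    by_target = defaultdict(list)
    for hit in hits:
        by_target[hit[1]].append(hit)
    links = defaultdict(set)
    for target, target_hits in by_target.items():
        for hit_a, hit_b in combinations(target_hits, 2):
            if hit_a[0] != hit_b[0] and not overlaps(hit_a, hit_b):
                links[tuple(sorted((hit_a[0], hit_b[0])))].add(target)
    return dict(links)


print(rosetta_stone_links(HITS))  # {('protA', 'protB'): {'fusedX'}}
```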
15Rosetta Stone Probability
- Protein i has m homologs
- Protein j has n homologs
- K Rosetta Stone fusion proteins
16Gene Cluster
genes
genomic DNA
17Tryptophan Operon
P-values shown in the figure, in order: P=0.67,
P=0.53, P<0.01, P=0.09, P<0.01, P<0.01, P=0.91
Genes shown in the figure, in order: yciG, trpA,
trpB, trpC, trpD, trpE, trpL, yciV
Here, a p-value threshold of 0.1 captures all but
one of the genes for this operon.
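The thresholding step can be sketched as follows. The code assumes that the i-th P-value in the figure belongs to the gap between the i-th and (i+1)-th gene in the listed order; that pairing is a guess from the flattened figure, not something the slide states. Values shown as P<0.01 are entered as 0.01, their upper bound.

```python
GENES = ["yciG", "trpA", "trpB", "trpC", "trpD", "trpE", "trpL", "yciV"]
# Assumed: one P-value per adjacent gene pair, in slide order.
GAP_P_VALUES = [0.67, 0.53, 0.01, 0.09, 0.01, 0.01, 0.91]


def cluster_genes(genes, gap_p_values, threshold):
    """Group consecutive genes whose connecting P-value is below the threshold."""
    clusters, current = [], [genes[0]]
    for gene, p in zip(genes[1:], gap_p_values):
        if p < threshold:
            current.append(gene)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [gene]
    if len(current) > 1:
        clusters.append(current)
    return clusters


print(cluster_genes(GENES, GAP_P_VALUES, threshold=0.1))
# [['trpB', 'trpC', 'trpD', 'trpE', 'trpL']]
```

Under this reading, the single cluster misses only trpA, which is consistent with the remark that the threshold captures all but one gene of the operon.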
18Combining Inferences of Co-evolution from
Previous Methods
- We use a Bayesian approach to combine the
probabilities from the previous four methods to
arrive at a single probability that two proteins
co-evolve
Here, positive pairs are proteins with a common
pathway annotation, and negative pairs are
proteins with different annotations
Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ,
Chung S, Emili A, Snyder M, Greenblatt JF,
Gerstein M. A Bayesian networks approach for
predicting protein-protein interactions from
genomic data. Science. 2003 Oct
17;302(5644):449-53.
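The combining formula itself is missing from the extracted slide. One common formulation, in the spirit of the cited Jansen et al. paper, is a naive Bayes update in which each method contributes a likelihood ratio estimated from the positive and negative pairs. The sketch below assumes the methods are conditionally independent, which may not hold in practice; the prior and the likelihood ratios are invented numbers.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior with per-method likelihood ratios (naive Bayes).

    Each ratio is P(evidence | positive pair) / P(evidence | negative pair).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical ratios for phylogenetic profile, Rosetta Stone,
# gene neighbor, and gene cluster evidence, in that order.
ratios = [5.0, 1.0, 8.0, 2.0]
p = posterior_probability(prior=0.01, likelihood_ratios=ratios)
print(round(p, 3))  # 0.447
```

A ratio of 1.0 leaves the odds unchanged, so a method with no evidence for a pair simply drops out of the product.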
19True and False Interactions are derived from
Pathway Classification Schemes
Information Storage and Processing
- Translation, ribosomal structure and biogenesis
- Transcription
- DNA replication, recombination and repair
Cellular processes
- Cell division and chromosome partitioning
- Posttranslational modification, protein
turnover, chaperones
- Cell envelope biogenesis, outer membrane
- Cell motility and secretion
- Inorganic ion transport and metabolism
- Signal transduction mechanisms
Metabolism
- Energy production and conversion
- Carbohydrate transport and metabolism
- Amino acid transport and metabolism
- Nucleotide transport and metabolism
- Coenzyme metabolism
- Lipid metabolism
- Secondary metabolites biosynthesis, transport
and catabolism
Pathway categorization scheme
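A small sketch of how benchmark pairs might be derived from such a scheme: two annotated genes form a positive pair if they share a category and a negative pair otherwise. The gene names and their category assignments are hypothetical, and the sketch assumes one category per gene.

```python
from itertools import combinations

# Hypothetical annotations; category names are taken from the scheme above.
ANNOTATION = {
    "geneA": "Transcription",
    "geneB": "Transcription",
    "geneC": "Lipid metabolism",
}


def label_pairs(annotation):
    """Split all annotated gene pairs into positive and negative sets."""
    positives, negatives = set(), set()
    for g, h in combinations(sorted(annotation), 2):
        if annotation[g] == annotation[h]:
            positives.add((g, h))
        else:
            negatives.add((g, h))
    return positives, negatives


positives, negatives = label_pairs(ANNOTATION)
print(sorted(positives))  # [('geneA', 'geneB')]
print(sorted(negatives))  # [('geneA', 'geneC'), ('geneB', 'geneC')]
```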
20Networks of Co-evolving Proteins
- We can generate networks of co-evolution by
selecting only pairs of proteins whose
probability of co-evolution is above a threshold
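As a sketch, the thresholding step amounts to keeping only the edges whose combined probability exceeds a cutoff. The protein names, probabilities, and the cutoff of 0.5 are all hypothetical.

```python
from collections import defaultdict

# Hypothetical combined co-evolution probabilities for protein pairs.
PAIR_PROBABILITY = {
    ("prot1", "prot2"): 0.92,
    ("prot2", "prot3"): 0.75,
    ("prot1", "prot4"): 0.10,
}


def build_network(pair_probability, threshold):
    """Return an adjacency map containing only pairs above the threshold."""
    network = defaultdict(set)
    for (a, b), p in pair_probability.items():
        if p > threshold:
            network[a].add(b)
            network[b].add(a)
    return dict(network)


network = build_network(PAIR_PROBABILITY, threshold=0.5)
print(sorted(network))  # ['prot1', 'prot2', 'prot3']
```

Raising the threshold yields a sparser, higher-confidence network; lowering it adds coverage at the cost of more false links.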
21Bacterial Flagella Network Using Combined Methods
22Alternative Representations of Network
Strong M, Graeber TG, Beeby M, Pellegrini M,
Thompson MJ, Yeates TO, Eisenberg D. Inference
and Visualization of Protein Networks in
Mycobacterium tuberculosis Based on Hierarchical
Clustering of Whole Genome Functional Linkage
Maps. Submitted to Nucleic Acids Research.
23Hierarchical Clustering Reveals Modular Evolution
24Clusters are Enriched for Pathways and Complexes
25Examples of Clusters that Contain Components of
Biochemical Pathways
26Cluster Reveals Additional ORFs Involved in
Lipopolysaccharide Biosynthesis
27Clusters are also Enriched for Subunits of
Protein Complexes
True positive interactions are those between
subunits of the same known complex; false
positive interactions are those between subunits
of different complexes. For high-confidence
links, we recover one third of the true
interactions and only one thousandth of the
false positive ones.
28Clusters Containing Subunits of Protein Complexes
Cytochrome c oxidase controls the last step of
food oxidation
ATP Synthase
29Identification of an Uncharacterized Protein
Complex in Pseudomonas aeruginosa
30Parallel Pathways and Protein Complexes
- Clustered Maps of co-evolving genes may be used
not only to identify groups of proteins that are
part of a complex or pathway but also to identify
duplicated complexes and pathways
Li H, Pellegrini M, Eisenberg D. Discovering
parallel pathways and protein complexes from
genome sequences. In preparation.
31Schematic of Pathway Duplication Identification
32Nitrogenases in Rhodopseudomonas palustris
N2 + 8 e- + 8 H+ + 16 ATP -> 2 NH3 + H2 + 16 ADP + 16 Pi
Iron protein (NifH)
Mo-Fe protein (α2β2): NifD, α subunit; NifK, β
subunit
33Co-evolution Network of Nitrogenases
Full network
Distinct sub-networks
34Predicting Protein Functions In Yeast
Xiaoqun Joyce Duan, Matteo Pellegrini, and David
Eisenberg. Discovering Biological Modules and
Function from Various Genome-scale Protein
Networks. Submitted to PLoS Biology.
35Guilt By Association: Predicting MIPS Categories
for Yeast Genes
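Guilt by association can be sketched as a vote among network neighbors: an un-annotated gene is assigned the category that is most common among its annotated neighbors. The network, ORF names, and categories below are hypothetical, and simple majority voting is only one possible scoring rule.

```python
from collections import Counter

# Hypothetical network and annotations; None marks an un-annotated gene.
NETWORK = {
    "orf1": {"orf2", "orf3", "orf4"},
    "orf2": {"orf1"},
    "orf3": {"orf1"},
    "orf4": {"orf1"},
}
CATEGORY = {
    "orf1": None,
    "orf2": "transport",
    "orf3": "transport",
    "orf4": "metabolism",
}


def predict_category(gene, network, category):
    """Return the most common category among a gene's annotated neighbors."""
    votes = Counter(
        category[n] for n in network.get(gene, ()) if category.get(n) is not None
    )
    if not votes:
        return None  # no annotated neighbors, so no prediction
    return votes.most_common(1)[0][0]


print(predict_category("orf1", NETWORK, CATEGORY))  # transport
```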
36Combining Methods to Improve Function Prediction
Figure labels: Network 1, Network 2, P, S1, S2
37Using Bayesian Formalism to Combine Methods
38Benchmarking Combined Method
Figure axis labels: Accuracy, Coverage, Recovery, CS
39MIPS Category Accuracy and Coverage
Figure panels: Accuracy by method and Coverage by
method, for the methods CS, GN, PP, RS, and TFBP
- 67.07 C-compound transporters
- 67.04.07 anion transporters
- 67.04.01 cation transporters
- 67.04.01.07 other cation transport
- 67.04.01.01 heavy metal transporter
- 67.04 ion transporter
- 67 Transport facilitation
40Identifying the function of a Histone Deacetylase
Complex
ORFs shown (each label appears twice in the
figure): YDR155C, YIL112W, YOL068C, YMR273C,
YGL194C, YKR029C, YCR033W, YBR103W
Three un-annotated (empty circles), two that
share transcription control (dark gray circles),
and three annotated with various other functions
(light gray circles)
Our combined scoring algorithm assigns the
function transcriptional control to seven of the
eight proteins (dark gray circles)
41Conclusions
- Protein modules appear to co-evolve across
bacterial species
- Modules are enriched for proteins that
participate in the same pathway or complex
- We can identify and reconstruct duplicated
complexes and pathways
- Co-evolution may be used to identify functions of
yeast proteins
42PROLINKS Database
- We have constructed a database that contains
co-evolution links between the genes of 83 (soon
to be 150) fully sequenced genomes
- The Prolinks database may be accessed through the
Proteome Navigator web browser interface at
dip.doe-mbi.ucla.edu/pronav
Peter M. Bowers, Matteo Pellegrini, Mike J.
Thompson, Joe Fierro, Todd O. Yeates, David
Eisenberg. PROLINKS: A Database of Protein
Functional Linkages Derived from Co-evolution.
Genome Biology, in press
43Proteome Navigator Access Page
44Proteome Navigator Network Page
45Future Directions
- Distinguish pathways from complexes
- Combine multiple data types
- Compute statistics for the co-evolution of 3 or
more genes
- Account for phylogenies
46Acknowledgements
Michael Thompson
Peter Bowers
Michael Strong
Huiying Li
Todd Yeates
Joseph Fierro
David Eisenberg
Joyce Duan
Edward Marcotte