Title: Statistical Inference Using Graphs for Protein Complex Identification
1. Statistical Inference Using Graphs for Protein Complex Identification
- Denise Scholtens
- Robert Gentleman
- Marc Vidal
- Workshop on Statistical Inference, Computing, and Visualization for Graphs - Stanford University
- August 1-2, 2003
2. Graphic from U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis
3. High-throughput Protein Complex Identification
- Gavin, et al. (Nature, 2002)
- TAP: Tandem Affinity Purification
- Ho, et al. (Nature, 2002)
- HMS-PCI: High-throughput Mass Spectrometric Protein Complex Identification
4. Protein Complex Identification Using TAP Data
Spoke Model
Matrix Model
Bader, et al. (Nature Biotechnology, 2002)
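The two models differ in which protein pairs they count as interacting. The sketch below assumes a single purification with one bait and its hits; the function names and the example proteins are hypothetical and are not taken from Bader, et al.

```python
from itertools import combinations

def spoke_edges(bait, hits):
    """Spoke model: the bait is paired with each hit; hits are not paired with each other."""
    return {frozenset((bait, h)) for h in hits if h != bait}

def matrix_edges(bait, hits):
    """Matrix model: every pair among the bait and its hits is counted as interacting."""
    members = sorted({bait, *hits})
    return {frozenset(pair) for pair in combinations(members, 2)}

# Hypothetical purification: one bait pulls down three hits.
bait, hits = "B", ["H1", "H2", "H3"]
spoke = spoke_edges(bait, hits)
matrix = matrix_edges(bait, hits)

print(len(spoke), len(matrix))  # 3 6
```

Every spoke pair is also a matrix pair; the matrix model adds the hit-hit pairs, which the purification itself does not directly test.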
5. Protein-Complex Affiliation Network Incidence Matrix
A =
      C1  C2  C3  C4  C5  ...  Cm
P1     1   0   1   1   1
P2     1   0   0   1   1
P3     1   0   0   0   1
P4     0   0   1   1   1
P5     0   1   0   0   0
P6     0   1   0   0   0
P7     0   1   0   0   0
...
Pn     0   0   0   0   0
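As a minimal sketch of how such an incidence matrix can be assembled, the code below builds A from a membership table, with rows for proteins and columns for complexes. The memberships are hypothetical and chosen only for illustration; `incidence_matrix` is not a function from any package.

```python
def incidence_matrix(proteins, complexes):
    """Return (complex_names, A) where A[i][k] = 1 if proteins[i] belongs to complex k."""
    names = list(complexes)
    A = [[1 if p in complexes[c] else 0 for c in names] for p in proteins]
    return names, A

# Hypothetical memberships, for illustration only.
proteins = ["P1", "P2", "P3", "P4"]
complexes = {"C1": {"P1", "P2"}, "C2": {"P2", "P3", "P4"}}

names, A = incidence_matrix(proteins, complexes)
complexes_per_protein = [sum(row) for row in A]   # row sums
complex_sizes = [sum(col) for col in zip(*A)]     # column sums

for p, row in zip(proteins, A):
    print(p, row)
```

A row sum greater than 1 (here P2) marks a protein that belongs to more than one complex.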
6. Cohesive vs. Dynamic Protein Complexes
Cohesive Complex: a complex of invariable composition whose proteins are associated only with that complex and its particular function
7. Cohesive Complex Affiliation Network Incidence Matrix
A =
        C1
Bait     1
Hit 1    1
Hit 2    1
Hit 3    1
Hit 4    1
Hit 5    1
8. Cohesive vs. Dynamic Protein Complexes
Dynamic Complex: a complex composed of proteins that may also be involved in other complexes
9. Dynamic Complex Affiliation Network Incidence Matrices
A =
        C1  C2  C3  C4  C5
Bait     1   1   1   1   1
Hit 1    1   0   0   0   0
Hit 2    0   1   0   0   0
Hit 3    0   0   1   0   0
Hit 4    0   0   0   1   0
Hit 5    0   0   0   0   1
A =
        C1  C2
Bait     1   1
Hit 1    1   0
Hit 2    0   1
Hit 3    1   0
Hit 4    0   1
Hit 5    1   0
A =
        C1  C2
Bait     1   1
Hit 1    1   1
Hit 2    1   1
Hit 3    0   1
Hit 4    0   1
Hit 5    0   1
10. All 5 complexes above would yield the same TAP Data
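This can be checked on the cohesive complex from slide 7 and the three dynamic structures from slide 9. The sketch assumes an idealized, error-free purification in which the bait retrieves every protein that shares at least one complex with it; `tap_hits` is a hypothetical helper written for this illustration.

```python
def tap_hits(complexes, bait):
    """Proteins sharing at least one complex with the bait (idealized pull-down)."""
    found = set()
    for members in complexes:
        if bait in members:
            found |= members
    return found - {bait}

hits = ["Hit 1", "Hit 2", "Hit 3", "Hit 4", "Hit 5"]

# Each structure is a list of complexes; each complex is a set of member proteins.
cohesive = [{"Bait", *hits}]
dynamic_a = [{"Bait", h} for h in hits]
dynamic_b = [{"Bait", "Hit 1", "Hit 3", "Hit 5"}, {"Bait", "Hit 2", "Hit 4"}]
dynamic_c = [{"Bait", "Hit 1", "Hit 2"}, {"Bait", *hits}]

results = [tap_hits(c, "Bait") for c in (cohesive, dynamic_a, dynamic_b, dynamic_c)]
print(all(r == set(hits) for r in results))  # True
```

Under this assumption the bait's purification returns Hit 1 through Hit 5 in every case, so the bait's data alone cannot distinguish the structures.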
11. Statistical Inference Problem
- What is A?
- A captures the cohesive/dynamic distinction.
- At best, we observe all but the main diagonal of X = AA^T.
- Current analyses focus on X, not on A.
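Written entry by entry, the product of the incidence matrix with its transpose makes clear what the diagonal holds. The notation below assumes A is an n x m matrix of 0/1 entries with rows indexed by proteins and columns by complexes, as in the incidence matrices on the earlier slides.

```latex
X = A A^{\top}, \qquad
X_{ij} = \sum_{k=1}^{m} A_{ik} A_{jk}, \qquad
X_{ii} = \sum_{k=1}^{m} A_{ik}^{2} = \sum_{k=1}^{m} A_{ik}
```

For i ≠ j, X_ij counts the complexes that proteins i and j share. The diagonal entry X_ii counts the complexes that contain protein i, which is the quantity that separates a cohesive protein (X_ii = 1) from one that takes part in several complexes.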
12. Protein Complex Data as a Directed Graph
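A minimal sketch of the directed-graph view, assuming each purification contributes one edge from the bait to every protein it retrieves. The purification table is hypothetical and the variable names are illustrative.

```python
from collections import defaultdict

# Hypothetical purifications: bait -> proteins retrieved with it.
purifications = {
    "B1": ["B2", "H1", "H2"],
    "B2": ["B1", "H1"],
    "B3": ["H3"],
}

# One directed edge per (bait, hit) pair.
edges = {(bait, hit) for bait, found in purifications.items() for hit in found}

# Out-degree: number of distinct proteins a bait retrieves.
out_degree = {bait: len(set(found)) for bait, found in purifications.items()}

# In-degree: number of distinct baits that retrieve a protein.
in_degree = defaultdict(int)
for _, hit in edges:
    in_degree[hit] += 1

# Pairs of baits that retrieve each other.
reciprocated = {(u, v) for u, v in edges if (v, u) in edges and u < v}

print(out_degree)    # {'B1': 3, 'B2': 2, 'B3': 1}
print(reciprocated)  # {('B1', 'B2')}
```

Direction records which protein was the bait, information that an undirected interaction graph discards.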
13. Cohesive Complex described in Gavin, et al.
14. Dynamic Complex described in Gavin, et al.
15. Largest Connected Component in Gavin, et al. using Bait Proteins Only, Colored by Outdegree
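The steps behind such a figure can be sketched as: keep only edges between baits, then find the largest connected component. The sketch assumes connectivity is taken ignoring edge direction and that out-degree is counted within the bait-only subgraph; the slide does not state either choice. The edge list is hypothetical.

```python
def induced_edges(edges, keep):
    """Directed edges whose endpoints are both in `keep`."""
    return {(u, v) for u, v in edges if u in keep and v in keep}

def largest_weak_component(nodes, edges):
    """Largest component when edge direction is ignored (an assumption)."""
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # depth-first search from `start`
            for nxt in neighbors[stack.pop()] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical bait -> hit edges; B* are baits, H* are non-bait hits.
edges = {("B1", "B2"), ("B2", "B3"), ("B1", "H1"), ("B4", "H2")}
baits = {"B1", "B2", "B3", "B4"}

bait_edges = induced_edges(edges, baits)
component = largest_weak_component(baits, bait_edges)
out_degree = {b: sum(1 for u, _ in bait_edges if u == b) for b in component}

print(sorted(component))  # ['B1', 'B2', 'B3']
```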
16. [Figure panels: Gavin Data; Ho Data]
17. Subgraph of Bait Proteins from Previous Graphs with Outdegree 7
[Figure panels: Gavin Data; Ho Data]
18. Examples of Distinct Complexes Identified by Gavin, et al.
19. Back to Affiliation Networks
One Three-Way Conversation
A =
      C1
B1     1
B2     1
B3     1
X = AA^T =
      B1  B2  B3
B1     1   1   1
B2     1   1   1
B3     1   1   1
20. Affiliation Networks
Three Two-Way Conversations
A =
      C1  C2  C3
B1     1   1   0
B2     1   0   1
B3     0   1   1
X = AA^T =
      B1  B2  B3
B1     2   1   1
B2     1   2   1
B3     1   1   2
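The two affiliation matrices on slides 19 and 20 can be compared directly. The sketch below computes A times its transpose for each and checks that they agree everywhere except the main diagonal; the helper names are hypothetical.

```python
def comembership(A):
    """Product of a 0/1 incidence matrix (list of rows) with its transpose."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]

def off_diagonal(X):
    """Entries of X with i != j."""
    n = len(X)
    return {(i, j): X[i][j] for i in range(n) for j in range(n) if i != j}

# One three-way conversation: B1, B2, B3 all in a single complex.
A_one = [[1], [1], [1]]
# Three two-way conversations: each pair shares its own complex.
A_three = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]

X_one = comembership(A_one)
X_three = comembership(A_three)

print(X_one)    # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(X_three)  # [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
print(off_diagonal(X_one) == off_diagonal(X_three))  # True
```

The off-diagonal entries coincide, so the two structures differ only in the diagonal counts of complexes per protein.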
21. Statistical Inference Problem
- Which A is correct?
- A uniquely defines X, but the observable part of X does not uniquely define A.
- Extra information and directed graph model for the TAP data
- Cellular Component Data
- Gene Expression Data
- Hit Data
22. Possible Use of Hit Data to Help Estimate A
23. Conclusions
- In the protein complex setting, directed graphs are useful for EDA, as well as framing the correct questions for statistical inference.
- Statistical inference problem for cohesive and dynamic protein complex identification should focus on A, not X.
- Digraph model of the TAP data better reflects what we actually observe, and is informative for estimating A.