Statistical Inference Using Graphs for Protein Complex Identification


1
Statistical Inference Using Graphs for Protein
Complex Identification
  • Denise Scholtens
  • Robert Gentleman
  • Marc Vidal
  • Workshop on Statistical Inference, Computing, and
    Visualization for Graphs
  • Stanford University
  • August 1-2, 2003

2
Graphic from U.S. Department of Energy Human
Genome Program, http://www.ornl.gov/hgmis
3
High-throughput Protein Complex Identification
  • Gavin, et al. (Nature, 2002)
  • TAP: Tandem Affinity Purification
  • Ho, et al. (Nature, 2002)
  • HMS-PCI: High-throughput Mass Spectrometric Protein
    Complex Identification

4
Protein Complex Identification Using TAP
Data
Spoke Model
Matrix Model
Bader, et al. (Nature Biotechnology, 2002)
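The spoke and matrix models named above are two common ways to turn one bait purification into pairwise protein associations. The sketch below assumes the usual reading: the spoke model pairs the bait with each hit, and the matrix model pairs every protein in the purification with every other. The function names and the toy purification are hypothetical and are not taken from the slides.

```python
from itertools import combinations


def spoke_edges(bait, hits):
    """Spoke model: pair the bait with each hit; hits are not paired with each other."""
    return {frozenset((bait, hit)) for hit in hits if hit != bait}


def matrix_edges(bait, hits):
    """Matrix model: pair every two proteins found in the same purification."""
    members = sorted({bait, *hits})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical toy purification: one bait that pulls down three hits.
bait, hits = "Bait", ["Hit 1", "Hit 2", "Hit 3"]
spoke = spoke_edges(bait, hits)
matrix = matrix_edges(bait, hits)

print(len(spoke), "spoke pairs;", len(matrix), "matrix pairs")
```

Every spoke pair is also a matrix pair, so the two models differ only in whether hit-hit associations are assumed.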
5
Protein-Complex Affiliation Network Incidence
Matrix
A =
      C1 C2 C3 C4 C5 ... Cm
P1     1  0  1  1  1
P2     1  0  0  1  1
P3     1  0  0  0  1
P4     0  0  1  1  1
P5     0  1  0  0  0
P6     0  1  0  0  0
P7     0  1  0  0  0
...
Pn     0  0  0  0  0
6
Cohesive vs. Dynamic Protein Complexes
Cohesive Complex: a complex of invariable
composition whose proteins are associated only
with that complex and its particular function
7
Cohesive Complex Affiliation Network Incidence
Matrix
A =
        C1
Bait     1
Hit 1    1
Hit 2    1
Hit 3    1
Hit 4    1
Hit 5    1
8
Cohesive vs. Dynamic Protein Complexes
Dynamic Complex: a complex composed of proteins
that may also be involved in other complexes
9
Dynamic Complex Affiliation Network Incidence
Matrices
A =
        C1 C2 C3 C4 C5
Bait     1  1  1  1  1
Hit 1    1  0  0  0  0
Hit 2    0  1  0  0  0
Hit 3    0  0  1  0  0
Hit 4    0  0  0  1  0
Hit 5    0  0  0  0  1
A =
        C1 C2
Bait     1  1
Hit 1    1  0
Hit 2    0  1
Hit 3    1  0
Hit 4    0  1
Hit 5    1  0
A =
        C1 C2
Bait     1  1
Hit 1    1  1
Hit 2    1  1
Hit 3    0  1
Hit 4    0  1
Hit 5    0  1
10
All 5 complexes above would yield the same TAP
Data
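A minimal sketch of why different incidence matrices can be indistinguishable from one bait's point of view. It assumes an idealized pull-down in which a bait recovers every protein that shares at least one complex with it; that assumption, the function name tap_hits, and the variable names are illustrative and not from the slides. The four matrices are the cohesive and dynamic examples from slides 7 and 9.

```python
def tap_hits(A, bait="Bait"):
    """Idealized pull-down: proteins sharing at least one complex with the bait."""
    bait_row = A[bait]
    return sorted(
        protein
        for protein, row in A.items()
        if protein != bait and any(b and r for b, r in zip(bait_row, row))
    )


# Rows are proteins, columns are complexes (1 = protein belongs to complex).
cohesive = {p: [1] for p in ["Bait", "Hit 1", "Hit 2", "Hit 3", "Hit 4", "Hit 5"]}
dynamic_five = {
    "Bait": [1, 1, 1, 1, 1],
    "Hit 1": [1, 0, 0, 0, 0],
    "Hit 2": [0, 1, 0, 0, 0],
    "Hit 3": [0, 0, 1, 0, 0],
    "Hit 4": [0, 0, 0, 1, 0],
    "Hit 5": [0, 0, 0, 0, 1],
}
dynamic_split = {
    "Bait": [1, 1],
    "Hit 1": [1, 0], "Hit 2": [0, 1], "Hit 3": [1, 0],
    "Hit 4": [0, 1], "Hit 5": [1, 0],
}
dynamic_nested = {
    "Bait": [1, 1],
    "Hit 1": [1, 1], "Hit 2": [1, 1], "Hit 3": [0, 1],
    "Hit 4": [0, 1], "Hit 5": [0, 1],
}

results = [tap_hits(A) for A in (cohesive, dynamic_five, dynamic_split, dynamic_nested)]
for hits in results:
    print(hits)
```

Under this assumption each matrix yields the same hit list for the bait, so a single purification cannot separate the cohesive case from the dynamic ones.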
11
Statistical Inference Problem
  • What is A?
  • A captures the cohesive/dynamic distinction.
  • At best, we observe all but the main diagonal of
    X = AA^T.
  • Current analyses focus on X, not on A.

12
Protein Complex Data as a Directed Graph
13
Cohesive Complex described in Gavin, et al.
14
Dynamic Complex described in Gavin, et al.
15
Largest Connected Component in Gavin, et al.
using Bait Proteins Only, Colored by Outdegree
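A rough sketch of the quantities named on this slide, using a directed graph with an edge from each bait to each protein it pulls down. The pull-down table below is hypothetical toy data, not the Gavin data, and "connected" is assumed here to mean weakly connected (edge direction ignored); the slide does not say which notion was used.

```python
from collections import defaultdict

# Hypothetical bait -> hits table (B* are baits, H* are proteins never used as baits).
pulldowns = {
    "B1": ["B2", "B3", "H1"],
    "B2": ["B1"],
    "B3": [],
    "B4": ["B5", "H2"],
    "B5": [],
}
baits = set(pulldowns)

# Keep only edges whose target is itself a bait ("bait proteins only").
edges = [(b, h) for b, hits in pulldowns.items() for h in hits if h in baits]

# Outdegree = number of bait proteins each bait pulls down.
outdegree = {b: 0 for b in baits}
for source, _ in edges:
    outdegree[source] += 1


def weak_components(nodes, edges):
    """Connected components of the graph after ignoring edge direction."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


largest = max(weak_components(baits, edges), key=len)
print(sorted(largest), {b: outdegree[b] for b in sorted(largest)})
```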
16
Gavin Data
Ho Data
17
SubGraph of Bait Proteins from Previous Graphs
with Outdegree 7
Gavin Data
Ho Data
18
Examples of Distinct Complexes Identified by
Gavin, et al.
19
Back to Affiliation Networks
A =
     C1
B1    1
B2    1
B3    1
X = AA^T =
     B1 B2 B3
B1    1  1  1
B2    1  1  1
B3    1  1  1
One Three-Way Conversation
20
Affiliation Networks
A =
     C1 C2 C3
B1    1  1  0
B2    1  0  1
B3    0  1  1
X = AA^T =
     B1 B2 B3
B1    2  1  1
B2    1  2  1
B3    1  1  2
Three Two-Way Conversations
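The two conversation examples can be checked numerically. The sketch below computes X = AA^T for the one-complex and three-complex incidence matrices and compares only the off-diagonal entries, which is the part described earlier as observable at best. The helper names are illustrative.

```python
def co_membership(A):
    """X = A A^T: entry (i, j) counts the complexes shared by proteins i and j."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in A] for row_i in A]


def off_diagonal(X):
    """Drop the main diagonal, keeping only the pairwise (i != j) entries."""
    return [[x for j, x in enumerate(row) if j != i] for i, row in enumerate(X)]


# Rows are B1, B2, B3; columns are complexes.
A_one = [[1], [1], [1]]                      # one three-way conversation
A_three = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]  # three two-way conversations

X_one = co_membership(A_one)
X_three = co_membership(A_three)

print(X_one)
print(X_three)
print(off_diagonal(X_one) == off_diagonal(X_three))
```

The two X matrices differ only on the diagonal (1 versus 2), so once the diagonal is unobserved the two affiliation networks look identical.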
21
Statistical Inference Problem
  • Which A is correct?
  • A uniquely defines X, but the observable part of
    X does not uniquely define A.
  • Extra information and directed graph model for
    the TAP data:
      • Cellular Component Data
      • Gene Expression Data
      • Hit Data

22
Possible Use of Hit Data to Help Estimate A
23
Conclusions
  • In the protein complex setting, directed graphs
    are useful for EDA, as well as framing the
    correct questions for statistical inference.
  • Statistical inference problem for cohesive and
    dynamic protein complex identification should
    focus on A, not X.
  • Digraph model of the TAP data better reflects
    what we actually observe, and is informative for
    estimating A.