Protein Complex Detection in Large Protein Interaction Networks

Learn more at: http://www.baderlab.org
Transcript and Presenter's Notes

1
Protein Complex Detection in Large Protein
Interaction Networks
  • Gary Bader

May.21.2003
Chris Sander Lab Computational Biology Center
(cBio) Memorial Sloan-Kettering Cancer Center
http://cbio.mskcc.org/
2
Yeast Two-Hybrid
233 Interactions 145 Proteins
Fields, Drees, Boone, Tong
3
Highly Connected 6-Core Las17 Actin Assembly
Complex?
20 proteins
Tong et al. Science 2002; 295(5553)
4
Experimental Validation of Las17 Complex
[Figure: Las17 and its interaction partners Ypr154, Myo3, Yfr024, Bbc1, Ysc84, Bzz1, Rvs167, Ygr136 and Sho1]
Experimental: ELISA (Bbc1, Bzz1, Ygr136w, Ypr154w, Yfr024c, Ysc84); CoIP; Colocalized
Tong et al. Science 2002; 295(5553)
5
So...
  • Based on observations, densely interconnected
    regions of an interaction network may represent
    molecular complexes
  • Complexes are another level of annotation above
    other guilt by association methods
  • Methods that find dense network regions can help
    us understand biological systems (using only
    qualitative connectivity information)

6
Nuclear Complexes
7
k-core
  • A subgraph in which every node is connected to
    other nodes of the subgraph by at least k edges
    (k = 0, 1, 2, 3, ...)
  • The highest k-core is the central, most densely
    connected region of a graph
  • Therefore, high k-cores may be molecular complexes
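
The k-core definition above can be illustrated with a short Python sketch. It peels away nodes of degree below k until none remain, which is the standard way to compute a k-core. The function names and the toy network are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def k_core(adj, k):
    """Return the node set of the k-core: repeatedly drop nodes with degree < k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if len(adj[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive

def highest_k_core(adj):
    """Return (k, nodes) for the largest k whose k-core is non-empty."""
    k, nodes = 0, set(adj)
    while True:
        nxt = k_core(adj, k + 1)
        if not nxt:
            return k, nodes
        k, nodes = k + 1, nxt

# Toy network (hypothetical): a 4-clique A-D with a tail E-F hanging off A.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("A", "E"), ("E", "F")]
k, core = highest_k_core(build_graph(edges))
```

On this toy network the tail nodes E and F are peeled away and the clique survives as the highest core.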

8
k-core
Pajek - Batagelj, V., Mrvar, A.
9
A Better Complex Finder
  • The k-core method can find only a single complex
    in the middle of a network

SH3 Y2H data
10
2 Complexes
7/19 membrane, 10/19 unknown, 1/19 cytoskeletal
Signal transduction
6/17 cell polarity, 3/17 unknown role
Actin cytoskeleton rearrangement
other
11
(No Transcript)
12
Molecular Complex Detection
MCODE
  • MCODE finds densely connected regions of a
    network
  • Graph-theoretic clustering algorithm
  • Three stages: network weighting, complex
    detection, optional post-processing

Bader & Hogue, BMC Bioinformatics 2003 Jan 13; 4(1):2
13
Overview
  • MCODE Algorithm
  • Evaluation
  • Application

14
MCODE
  • Take a network
  • Give each node a score
  • A high score means the node is in a dense region
  • Find complexes
  • Optionally expand/contract complexes

15
Input Network
16
Find neighbors of Pti1
17
Find highest k-core (8-core)
Removes low-degree nodes in power-law networks
18
Find graph density
19
Calculate score for Pti1
20
Repeat for entire network
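
Slides 16 to 20 can be sketched in Python as follows. The slides do not state the scoring formula, so this sketch assumes the definition given in the cited MCODE paper: a node's score is the k of the highest k-core of its neighbourhood (including the node itself) multiplied by the density of that core. It also assumes density is computed without self-loops. All function names and the toy network are illustrative.

```python
def highest_k_core(adj, nodes):
    """Return (k, core) for the highest non-empty k-core of the subgraph on `nodes`."""
    k, core = 0, set(nodes)
    while True:
        nxt = set(core)  # the (k+1)-core is always inside the k-core
        changed = True
        while changed:
            changed = False
            for n in list(nxt):
                if len(adj[n] & nxt) < k + 1:
                    nxt.discard(n)
                    changed = True
        if not nxt:
            return k, core
        k, core = k + 1, nxt

def density(adj, nodes):
    """Edges present divided by the maximum possible (assumption: no self-loops)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    m = sum(len(adj[u] & nodes) for u in nodes) // 2
    return m / (n * (n - 1) / 2)

def vertex_score(adj, v):
    """Score one node: k of its neighbourhood's highest k-core times that core's density."""
    neighbourhood = adj[v] | {v}
    k, core = highest_k_core(adj, neighbourhood)
    return k * density(adj, core)

def score_all(adj):
    """Repeat the scoring for every node in the network."""
    return {v: vertex_score(adj, v) for v in adj}

# Toy network (hypothetical): a 4-clique A-D with a tail E-F hanging off A.
adj = {
    "A": {"B", "C", "D", "E"}, "B": {"A", "C", "D"},
    "C": {"A", "B", "D"}, "D": {"A", "B", "C"},
    "E": {"A", "F"}, "F": {"E"},
}
scores = score_all(adj)
```

Clique members receive the highest scores here, while the tail nodes score low because their neighbourhoods contain no dense core.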
21
Find dense regions
- Pick highest scoring vertex
- Paint outwards until threshold score reached
  (% of score from seed node)
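
The seed-and-expand step on this slide might look like the Python sketch below. It assumes node scores have already been computed, and it assumes the threshold is the seed score reduced by a fixed fraction (`vwp`, a name chosen here for the percentage parameter). The `min_size` filter, the function name and the toy data are also assumptions made for illustration.

```python
def find_complexes(adj, scores, vwp=0.2, min_size=2):
    """Grow complexes outwards from the highest-scoring unseen nodes.

    vwp: assumed fraction by which a member's score may fall below the seed's.
    min_size: assumed filter that drops single-node results.
    """
    seen = set()
    complexes = []
    # Visit candidate seeds from the highest score to the lowest.
    for seed in sorted(scores, key=scores.get, reverse=True):
        if seed in seen:
            continue
        threshold = scores[seed] * (1 - vwp)
        seen.add(seed)
        members, stack = {seed}, [seed]
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                # Include unseen neighbours whose score is above the threshold.
                if nbr not in seen and scores[nbr] > threshold:
                    seen.add(nbr)
                    members.add(nbr)
                    stack.append(nbr)
        if len(members) >= min_size:
            complexes.append(members)
    return complexes

# Toy network and precomputed scores (both hypothetical).
adj = {
    "A": {"B", "C", "D", "E"}, "B": {"A", "C", "D"},
    "C": {"A", "B", "D"}, "D": {"A", "B", "C"},
    "E": {"A", "F"}, "F": {"E"},
}
scores = {"A": 3.0, "B": 3.0, "C": 3.0, "D": 3.0, "E": 2 / 3, "F": 1.0}
complexes = find_complexes(adj, scores, vwp=0.2)
```

Each node joins at most one complex, and nodes that never pass a threshold are simply left unclustered.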
22
Post-process (optional)
- Fluff the boundary by fluff density threshold
- Haircut: 2-core
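
As a rough illustration of the haircut step only, the Python sketch below trims a predicted complex down to its 2-core by removing members that have fewer than two links inside the complex. The function name and toy data are assumptions, and the fluff step is not covered.

```python
def haircut(adj, members):
    """Reduce a complex to its 2-core by dropping members with < 2 internal links."""
    core = set(members)
    changed = True
    while changed:
        changed = False
        for node in list(core):
            if len(adj[node] & core) < 2:
                core.discard(node)
                changed = True
    return core

# Toy complex (hypothetical): triangle A-B-C with D dangling off A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
trimmed = haircut(adj, {"A", "B", "C", "D"})
```

The singly connected node D is removed, leaving the triangle.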
23
Polyadenylation Factor I Complex
Known: Cft1, Cft2, Fip1, Pap1, Pfs2, Pta1, Ysh1,
Yth1 and Ykl059c
Unknown: Yor179c and Pti1
Ideally -> test in wet lab
Continue with rest of network
24
Evaluation
  • Yeast
  • Requires a list of known complexes for
    comparison: Gavin et al. (221), MIPS (208)
  • Not perfect, but neither is the data

25
Modeling Copurification Data
  • E.g. Co-immunoprecipitation (CoIP) data
  • Population of complexes of unknown topology
  • Want to use this data with pairwise interactions
  • Must model CoIP as pairwise interactions

Bader & Hogue, Nature Biotech 2002; 20(10)
26
Spoke and Matrix Models
  • Vrp1 (bait), Las17, Rad51, Sla1, Tfp1, Ypt7

Possible actual topology
Spoke: simple model; intuitive, more accurate, but
can misrepresent
Matrix: theoretical max. number of interactions,
but many FPs
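
The two models can be compared on the slide's own example purification with a small Python sketch. The function names are illustrative; the spoke model links the bait to each co-purified protein, while the matrix model links every pair.

```python
from itertools import combinations

def spoke(bait, preys):
    """Spoke model: one interaction between the bait and each prey."""
    return {frozenset((bait, prey)) for prey in preys}

def matrix(bait, preys):
    """Matrix model: one interaction between every pair in the purification."""
    return {frozenset(pair) for pair in combinations([bait, *preys], 2)}

# Purification from the slide: Vrp1 is the bait.
bait = "Vrp1"
preys = ["Las17", "Rad51", "Sla1", "Tfp1", "Ypt7"]

spoke_edges = spoke(bait, preys)
matrix_edges = matrix(bait, preys)
```

For this six-protein purification the spoke model yields 5 pairwise interactions and the matrix model yields 15, which shows why the matrix model risks more false positives.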
27
TAP Benchmark
  • Only 88/221 coverage
  • Better predictions with better experimental
    coverage

28
Application to Yeast Network
  • From a list of 15,143 known yeast interactions
    among 4,825 proteins, 209 complexes were predicted
  • 100 random network permutations
  • Average of 27.4 complexes (SD = 4.4)
  • Random complexes were 5x larger
  • Random complexes did not match any known complexes
  • Large annotation spread
  • Thus, number, size, functional composition
    unlikely to occur by chance
  • Not affected by high number of false positives in
    high-throughput data sets

29
The Yeast 26S Proteasome
16/21 19S regulatory subunit
9/15 20S proteolytic subunit
Basic structure is evident
30
Cytoskeleton/Cytokinesis Complex?
31
Directed Mode
32
Directed Mode - Split
26S proteasome
Lsm mRNA Modification
snRNA associated
Allows fine-tuning without considering entire
network
33
Functional Connections Between Complexes
34
Advantages
  • Compared to other clustering algorithms
  • Directed mode
  • Complex connectivity mode
  • Does not force all data points into clusters
  • Makes visualization of large networks more
    manageable

35
Conclusions
  • Initial step in taking advantage of current
    purely qualitative connectivity information
  • Requires graph layout software (Pajek)
  • Future networks need to have more information
    about time, space, data quality, stoichiometry
    (from e.g. interaction databases) -> p-value
    weights on edges
  • Dynamic, not static
  • ftp.mshri.on.ca/pub/BIND/Tools/MCODE

36
Future MCODE Directions
  • Adaptive vertex scoring function (functional
    annotation, gene expression)
  • Interactive viewer for directed mode

www.cytoscape.org
37
Acknowledgements
Sander Group: Chris Sander, Mike Cary, Ethan
Cerami, Daniel Eisenbud, Anton Enright, Ronald
Jansen, Alex Lash, Boris Reva
Original work: Chris Hogue (Toronto)
Data: Boone lab, Tyers lab, MDSP, SLRI, Fields lab,
Cesareni lab
bicjobs_at_cbio.mskcc.org CB, SE, DBA, SA
38
(No Transcript)
39
Evaluation
  • Require interactions and known complexes
  • Interactions from Saccharomyces cerevisiae
  • List of known complexes for comparison: Gavin et
    al., MIPS
  • Predict and compare with parameter optimization

40
Evaluation with Gavin Data Set
  • Convert 588 raw copurifications to binary
    interactions using the spoke model
  • 3,225 interactions among 1,363 proteins
  • Run MCODE with 840 parameter combinations
  • Compare with 221 hand-annotated complexes
    (somewhat redundant)
  • Pick the parameters that match the largest number
    of known complexes

Gavin et al. Nature 2002; 415(6868)
41
Complex Comparison
  • Overlap score $\omega = i^2 / (a \cdot b)$
  • $i$ = size of intersection set of two complexes
    (predicted and known)
  • $a$ = size of predicted complex
  • $b$ = size of known complex
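
A minimal Python sketch of the overlap score follows. The function name and the protein labels are made up for illustration. The score is 1 for identical complexes and 0 for complexes with no shared members.

```python
def overlap_score(predicted, known):
    """Overlap score: (size of intersection)^2 / (size of predicted * size of known)."""
    predicted, known = set(predicted), set(known)
    if not predicted or not known:
        return 0.0
    i = len(predicted & known)
    return i * i / (len(predicted) * len(known))

# Hypothetical complexes: 3 shared members, sizes 4 and 5.
predicted = {"P1", "P2", "P3", "P4"}
known = {"P2", "P3", "P4", "P5", "P6"}
score = overlap_score(predicted, known)  # 3 * 3 / (4 * 5) = 0.45
```

Squaring the intersection size means a small overlap between two large complexes scores much lower than the same overlap between two small ones.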

42
Matched Known Complexes
43
Parameter Optimization
Large range; best parameters: hFfT/0.05/0.05
44
Evaluation with MIPS Data Set
  • More varied benchmark
  • 9,088 interactions among 4,379 proteins
    literature and large-scale (not including HTMS)
  • 208 MIPS curated complexes
  • Best parameters: hTfT/0.1/0.2
  • 166 predicted complexes
  • 52 matched 64 MIPS complexes ($\omega > 0.2$)

Gavin et al. Nature 2002; 415(6868)
45
Prediction Benchmark Overlap
MIPS complex catalogue incomplete; data set
incomplete; MCODE complex ≠ human definition
46
Effect of Data Set Properties
MCODE Predictions vs. MIPS Complexes
High FP rate doesn't affect sensitivity/specificity
47
Large-Scale Data Sets
  • Large-scale data has many false positives
  • Benchmark interaction data set augmented by
    large-scale data set interactions that only
    connect proteins in the Benchmark set with each
    other
  • >3,100 interactions added to the existing 3,300
  • Sensitivity/specificity was not affected
  • -> High FP rate doesn't affect prediction

48
Effect of Data Set Properties
MCODE Predictions vs. Gavin Complexes
Spoke is reasonable
49
Prediction Significance
  • 100 random network permutations
  • Average of 27.4 complexes (SD = 4.4), optimized
  • Random complexes 5x larger than MIPS
  • Did not match any known complexes
  • Large annotation spread
  • Thus, number, size, functional composition
    unlikely to occur by chance
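
The slides do not say how the networks were permuted. One common choice, assumed here, is a degree-preserving edge swap, where two edges exchange endpoints so every protein keeps its number of partners. The Python sketch below is one way to do that; the function names, swap count and toy network are illustrative.

```python
import random

def degree_counts(edges):
    """Count how many edges touch each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def shuffle_edges(edges, n_swaps, seed=None):
    """Randomise a simple undirected network while preserving every node's degree."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    present = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if len({a, b, c, d}) < 4:
            continue  # swapping would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swapping would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

# Toy network (hypothetical): a ring of 10 nodes.
ring = [(n, (n + 1) % 10) for n in range(10)]
shuffled = shuffle_edges(ring, n_swaps=20, seed=1)
```

Running a complex finder on many such shuffled networks gives a baseline for how many complexes appear by chance.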

50
Complex Score
  • MCODE score = complex density x size of complex
    ($D_C \times |V|$)
  • Ranks larger, denser complexes higher
  • Other scoring functions exist, this one developed
    empirically (heuristic)
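
The score can be sketched in Python as below, assuming density is the number of edges inside the complex divided by the maximum possible without self-loops. The function name and toy data are illustrative.

```python
def complex_score(adj, members):
    """Score a complex as its density times its number of members."""
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0
    # Each internal edge is counted once from each endpoint, so halve the total.
    m = sum(len(adj.get(u, set()) & members) for u in members) // 2
    dens = m / (n * (n - 1) / 2)  # assumption: no self-loops
    return dens * n

# Toy network (hypothetical): a 4-clique A-D and a 3-node path X-Y-Z.
adj = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D"},
    "C": {"A", "B", "D"}, "D": {"A", "B", "C"},
    "X": {"Y"}, "Y": {"X", "Z"}, "Z": {"Y"},
}
candidates = [{"X", "Y", "Z"}, {"A", "B", "C", "D"}]
ranked = sorted(candidates, key=lambda c: complex_score(adj, c), reverse=True)
```

The fully connected four-member complex ranks above the sparser three-member path, matching the intent of ranking larger, denser complexes higher.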

51
Top 5 Complexes
52
Complex Score Accuracy
53
Future Directions
  • Complexes and network connectivity are just one
    set of features that will be useful to compare
    between organisms
  • Evolution of complexity
  • Protein interaction network involved in actin
    cytoskeleton regulation, defined by SH3, PDZ
    domains

54
http://www.caida.org/tools/visualization/walrus/
55
Xerox.com high, low traffic (Apr 1997)
Chi & Card, INFOVIS 1999, Xerox PARC
56
new, deleted, unchanged
Chi et al. 1997 Xerox Parc
57
VWP Parameter Properties