Title: Discovering functional interaction patterns in Protein-Protein Interaction Networks
1. Discovering functional interaction patterns in Protein-Protein Interaction Networks
Authors:
- Mehmet E. Turanalp
- Tolga Can
Presented by: Sandeep Kumar
2. Background
- Availability of genome-scale protein interaction networks
- Understanding their topological organization
- Identifying conserved subnetworks across different species
- Discovering modules of interaction
- Predicting functions of uncharacterized proteins
- Improving the accuracy of currently available networks
3. Aim of the study
- Use the available functional annotations of proteins in a PPI network to look for overrepresented patterns of interactions in the network
- Present a new frequent pattern identification technique, PPISpan
4. Yeast as a model
- Why yeast? It is a model eukaryotic organism
- Its PPI network is well characterized
5. PPI Network
- A protein-protein interaction is shown as an edge between two proteins, indicating a physical association such as modification, transport, or complex formation
- Interesting interaction patterns are conserved among species
- Patterns correspond to specific biological processes
6. Frequent subgraphs
A graph (subgraph) is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold.
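The definition above can be made concrete with a short Python sketch. It assumes small vertex-labeled graphs and the graph-dataset setting of the example that follows, where support is the number of dataset graphs that contain the pattern. The `Graph` class, the helper names, and the toy dataset are hypothetical. Containment is checked by brute-force backtracking. That works for slide-sized examples, but it is not how gSpan-style miners scale.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Undirected graph with one label per vertex."""
    labels: dict                               # vertex id -> label
    edges: set = field(default_factory=set)    # set of frozenset({u, v})

    def add_edge(self, u, v):
        self.edges.add(frozenset((u, v)))

    def has_edge(self, u, v):
        return frozenset((u, v)) in self.edges


def contains(graph, pattern):
    """True if some injective, label-preserving map sends every pattern
    edge onto a graph edge (non-induced subgraph isomorphism)."""
    order = list(pattern.labels)
    mapping = {}

    def extend(i):
        if i == len(order):
            return True
        p = order[i]
        for g, lab in graph.labels.items():
            if lab != pattern.labels[p] or g in mapping.values():
                continue
            # already-mapped pattern neighbours of p must be graph neighbours of g
            if all(graph.has_edge(g, mapping[q])
                   for q in order[:i] if pattern.has_edge(p, q)):
                mapping[p] = g
                if extend(i + 1):
                    return True
                del mapping[p]
        return False

    return extend(0)


def support(dataset, pattern):
    """Number of graphs in the dataset that contain the pattern."""
    return sum(1 for g in dataset if contains(g, pattern))


def is_frequent(dataset, pattern, min_sup):
    return support(dataset, pattern) >= min_sup  # "no less than" min_sup


def make_graph(labels, edges):
    g = Graph(dict(labels))
    for u, v in edges:
        g.add_edge(u, v)
    return g


# Hypothetical three-graph dataset (not the one in the slide's figure)
dataset = [
    make_graph({1: "a", 2: "b", 3: "c"}, [(1, 2), (2, 3)]),
    make_graph({1: "a", 2: "b"}, [(1, 2)]),
    make_graph({1: "c", 2: "c"}, [(1, 2)]),
]
pattern = make_graph({"x": "a", "y": "b"}, [("x", "y")])
print(support(dataset, pattern))         # 2
print(is_frequent(dataset, pattern, 2))  # True
```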
7. Example frequent subgraphs
[Figure: a graph dataset of three graphs, (A), (B), and (C), and the frequent patterns (1) and (2) mined from it with a minimum support of 2.]
8. The Algorithm: PPISpan
- Based on gSpan
- Modified for PPI networks
- Candidate generation
- Frequency counting
9. Algorithm PPISpan(G, L, minSup)
- Set the vertex labels in G to the GO terms from the desired GO level L
- S <- all frequent 1-edge graphs in G, in frequency-based lexicographic order
- for each edge e in S (in ascending order of frequency) do
  - Subgraphs(e, minSup, e)
  - Remove e from G
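The first two steps of PPISpan, labeling the vertices and collecting the frequent 1-edge graphs, can be sketched in Python as below. This minimal sketch rests on a few assumptions. A 1-edge pattern is taken to be an unordered pair of labels. Support is the number of interactions that realize the pattern, and a non-overlapping count would be stricter. The function name and the toy network are hypothetical. The recursive Subgraphs call and the removal of e from G are not shown.

```python
from collections import Counter


def frequent_one_edge_patterns(edges, labels_at_level, min_sup):
    """Return frequent 1-edge patterns, least frequent first.

    edges:           iterable of (protein_u, protein_v) interactions
    labels_at_level: protein -> set of GO terms at the chosen level L;
                     a protein may carry several labels (non-unique labels)
    min_sup:         a pattern is kept if its count is >= min_sup
    A 1-edge pattern is an unordered label pair; each interaction adds
    at most one to the count of every label pair it realizes.
    """
    counts = Counter()
    for u, v in edges:
        pairs = {
            tuple(sorted((a, b)))
            for a in labels_at_level.get(u, ())
            for b in labels_at_level.get(v, ())
        }
        counts.update(pairs)
    frequent = [(pair, n) for pair, n in counts.items() if n >= min_sup]
    # ascending frequency, ties broken by label order
    return sorted(frequent, key=lambda item: (item[1], item[0]))


# Hypothetical toy network and labels, for illustration only
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
labels = {
    "P1": {"binding"},
    "P2": {"hydrolase"},
    "P3": {"binding"},
    "P4": {"hydrolase", "transferase"},
    "P5": {"binding"},
}
for pair, n in frequent_one_edge_patterns(edges, labels, min_sup=2):
    print(pair, n)
# ('binding', 'transferase') 2
# ('binding', 'hydrolase') 4
```

Processing the least frequent seed edges first, as the slide prescribes, means that once a seed has been explored and removed, later seeds are mined in a smaller graph.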
10. Algorithm Subgraphs(s, minSup, ext)
- if feasible(s, ext) then
  - if the DFS code of s is not its minimum DFS code then
    - return
  - C <- generate all children of s (by growing an edge, ext)
  - maximal <- true
  - for each c in C (in DFS lexicographic order) do
    - if support(c) > minSup then
      - Subgraphs(c, minSup, c.ext)
      - maximal <- false
  - if maximal then
    - output s
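The recursion above can be illustrated with a deliberately simplified Python sketch. It keeps the control flow of Subgraphs: prune duplicates, grow each pattern by one edge, recurse on frequent children, and output a pattern when none of its children is frequent. It swaps in simpler pieces elsewhere. Patterns are label paths rather than general graphs. The check `p <= p[::-1]` stands in for the minimum DFS code test. Support counts graphs in a small hypothetical dataset. The feasible(s, ext) test is left out because the slide does not define it. All names are illustrative.

```python
def subgraphs(s, min_sup, children, support, is_canonical, out):
    """Generic form of the Subgraphs(s, minSup, ext) recursion.

    Hypothetical hooks (names are illustrative, not from the paper):
      children(s)     -> candidate patterns grown from s by one edge
      support(c)      -> support of candidate c
      is_canonical(s) -> stand-in for "DFS code of s is its minimum DFS code"
    """
    if not is_canonical(s):
        return  # duplicate pattern: already reached through another growth order
    maximal = True
    for c in sorted(children(s)):  # stand-in for DFS lexicographic order
        if support(c) > min_sup:   # strict ">" as on the slide
            subgraphs(c, min_sup, children, support, is_canonical, out)
            maximal = False
    if maximal:
        out.append(s)  # no frequent extension: report s as maximal


# --- Toy instantiation: patterns are label paths such as ("a", "b", "c") ---
LABELS = ("a", "b", "c")


def path_children(p):
    return [p + (lab,) for lab in LABELS]


def path_is_canonical(p):
    return p <= p[::-1]  # a path and its reverse describe the same pattern


def occurs(adj, labels, path):
    """True if `path` matches the labels along some simple path in the graph."""
    def walk(v, i, used):
        if labels[v] != path[i]:
            return False
        if i == len(path) - 1:
            return True
        return any(walk(w, i + 1, used | {w}) for w in adj[v] if w not in used)
    return any(walk(v, 0, {v}) for v in adj)


# Hypothetical dataset: (adjacency, labels) pairs
dataset = [
    ({1: {2}, 2: {1, 3}, 3: {2}}, {1: "a", 2: "b", 3: "c"}),
    ({1: {2}, 2: {1, 3}, 3: {2}}, {1: "a", 2: "b", 3: "c"}),
    ({1: {2}, 2: {1}}, {1: "a", 2: "b"}),
]


def path_support(p):
    return sum(occurs(adj, labels, p) for adj, labels in dataset)


out = []
subgraphs(("a", "b"), 1, path_children, path_support, path_is_canonical, out)
print(out)  # [('a', 'b', 'c')]
```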
11. Datasets used
- Database of Interacting Proteins (DIP)
  - data constructed from high-throughput experiments
- STRING database
  - confidence-weighted predicted data
- WI-PHI
  - weighted yeast interactome enriched for direct physical interactions
12. Gene Ontology annotations
- Used to assign functional category labels to the proteins in the PPI network
- A collaborative effort to address the need for consistent descriptions of gene products across different databases
- Provides descriptions of biological processes, cellular components, and molecular functions
13. GO slim terms
- Provide a broad overview of the functional categories in GO
- GO slim molecular function terms for S. cerevisiae (excerpt):

  Term ID      Definition
  GO:0003674   molecular function unknown
  GO:0016787   hydrolase activity
  GO:0016740   transferase activity
  GO:0005515   protein binding
  ...

- A total of 22 broad functional categories
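Mapping a protein's specific GO annotations to GO slim categories amounts to walking up the ontology until slim terms are reached. The sketch below assumes the hierarchy is available as a child-to-parents dictionary containing only is_a edges. The term ids `T_leaf1`, `S_hydrolase`, and so on are hypothetical placeholders, not real GO identifiers. The function reports every slim ancestor it reaches, so one protein can receive several category labels.

```python
def slim_categories(terms, parents, slim):
    """Return the GO slim terms that subsume any of `terms`.

    terms:   GO term ids annotated to one protein
    parents: term id -> set of parent term ids (is_a edges assumed)
    slim:    set of GO slim term ids used as category labels
    """
    found, seen, stack = set(), set(), list(terms)
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        if t in slim:
            found.add(t)
        stack.extend(parents.get(t, ()))  # keep climbing toward the root
    return found


# Hypothetical mini-ontology; ids are placeholders, not real GO terms
parents = {
    "T_leaf1": {"T_mid"},
    "T_mid": {"S_hydrolase"},
    "T_leaf2": {"S_binding"},
    "S_hydrolase": {"ROOT"},
    "S_binding": {"ROOT"},
}
slim = {"S_hydrolase", "S_binding"}
print(sorted(slim_categories({"T_leaf1", "T_leaf2"}, parents, slim)))
# ['S_binding', 'S_hydrolase']
```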
14. Research steps
- Label the nodes with functional categories from GO annotations
- Consider the molecular function hierarchy
- Focus on functional interaction patterns of arbitrary topology
- Find non-overlapping embeddings using PPISpan
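Counting support by non-overlapping embeddings can be sketched as a greedy selection. The sketch assumes that "non-overlapping" means sharing no protein (vertex-disjoint) and scans embeddings in the order given. PPISpan's exact rule may differ, for example by using edge-disjointness or an exact maximum disjoint set. The function name and the embeddings are hypothetical.

```python
def non_overlapping_support(embeddings):
    """Greedy count of embeddings that share no protein with a kept one.

    embeddings: sequence of embeddings, each a collection of protein ids.
    Scanning in the given order yields a maximal (not necessarily maximum)
    set of vertex-disjoint embeddings, so the count depends on that order.
    """
    used, kept = set(), []
    for emb in embeddings:
        proteins = set(emb)
        if used.isdisjoint(proteins):
            kept.append(emb)
            used |= proteins
    return len(kept), kept


# Hypothetical embeddings of one 3-protein pattern
embeddings = [
    ("P1", "P2", "P3"),
    ("P3", "P4", "P5"),   # shares P3 with the first embedding
    ("P6", "P7", "P8"),
    ("P2", "P9", "P10"),  # shares P2 with the first embedding
]
count, kept = non_overlapping_support(embeddings)
print(count)  # 2
```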
15. Problems faced
- Noise in the PPI network
  - False positives
  - False negatives
- Accuracy and specificity of protein annotations
16. Supporting embedding
- A specific instance of a functional pattern, realized by particular proteins in the PPI network
17. Experiment details
- Implemented in C
- Searched for frequent interaction patterns with support > 15
18. Pattern frequency in different datasets
19. Observations
- Most of the patterns are trees
- The star topology is the most abundant
- Cycles are rare
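The topology classes in these observations can be checked mechanically. The sketch below assumes that patterns are connected, as patterns grown edge by edge from a single seed edge are, and that they contain no repeated edges. It treats any tree whose hub touches every other vertex as a star, so a three-vertex path also counts as a star.

```python
from collections import defaultdict


def topology(edges):
    """Classify a connected pattern given as a list of distinct (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n, m = len(adj), len(edges)
    if m != n - 1:
        return "cyclic"  # connected with m != n - 1 means m >= n, so a cycle exists
    if n >= 3 and max(len(nbrs) for nbrs in adj.values()) == n - 1:
        return "star"    # one hub adjacent to every other vertex
    return "tree"


print(topology([(1, 2), (1, 3), (1, 4)]))  # star
print(topology([(1, 2), (2, 3), (3, 4)]))  # tree
print(topology([(1, 2), (2, 3), (3, 1)]))  # cyclic
```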
20. Comparison with known molecular complexes and pathways
- Ignore topology and treat patterns as sets of proteins for comparison
- Molecular complexes from the MIPS (Munich Information Center for Protein Sequences) complex catalogue database
- Signaling, transport, and regulatory pathways from the KEGG database
- Use high-quality complexes
21. cpcount
- Average number of distinct complexes or pathways that the embeddings of a frequent interaction pattern overlap with
- Used to speculate on the location of interaction patterns
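Here is a minimal sketch of cpcount. It assumes that an embedding "overlaps" a complex or pathway when the two share at least one protein. The threshold used in the study may differ. Embeddings are treated as plain protein sets, because the comparison ignores topology. The complex names and proteins are hypothetical.

```python
def cpcount(embeddings, complexes):
    """Average number of distinct complexes/pathways hit per embedding.

    embeddings: list of embeddings, each a collection of protein ids
    complexes:  dict name -> set of member protein ids
    An embedding overlaps a complex if they share at least one protein
    (assumed threshold).
    """
    if not embeddings:
        return 0.0
    total = 0
    for emb in embeddings:
        proteins = set(emb)
        total += sum(1 for members in complexes.values() if proteins & members)
    return total / len(embeddings)


# Hypothetical complexes and embeddings
complexes = {"C1": {"P1", "P2"}, "C2": {"P3", "P9"}, "C3": {"P7"}}
embeddings = [("P1", "P2", "P3"), ("P6", "P7", "P8")]
print(cpcount(embeddings, complexes))  # 1.5
```

A value near 1 suggests that a pattern's embeddings tend to sit inside single complexes. Larger values suggest embeddings that span several complexes.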
22. cpoverlap
- Quantifies the overlap between the proteins in an embedding and known complexes and pathways
- The ratio of proteins in an embedding that are members of known functional modules
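cpoverlap can be sketched in the same spirit, as the fraction of an embedding's proteins that belong to at least one known complex or pathway. The slide does not settle how per-embedding values are aggregated for a whole pattern, so averaging them over its embeddings is only an assumption. The module data are hypothetical.

```python
def cpoverlap(embedding, modules):
    """Fraction of an embedding's proteins found in any known module.

    modules: dict name -> set of member protein ids (complexes or pathways)
    """
    proteins = set(embedding)
    if not proteins:
        return 0.0
    known = set().union(*modules.values())
    return len(proteins & known) / len(proteins)


# Hypothetical modules and embedding
modules = {"C1": {"P1", "P2"}, "C2": {"P3", "P9"}, "C3": {"P7"}}
print(cpoverlap(("P1", "P2", "P3", "P4"), modules))  # 0.75
```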
23. Observations from the comparison
- For some of the observed patterns, topology is more important than the underlying functional annotations
- Comparison of all the patterns with random patterns in terms of overlap with MIPS complexes
- Comparison of all the patterns with random patterns in terms of overlap with transport and signaling pathways
24. Analysis of patterns with MIPS complexes
- Selected patterns from the DIP and WI-PHI networks
- Selected patterns from the STRING network
- cpoverlap of the selected patterns with respect to MIPS complexes
- cpcount of the selected patterns with respect to MIPS complexes
25. Analysis of patterns with KEGG pathways
- Selected patterns from the DIP, STRING, and WI-PHI networks
- cpoverlap of the selected patterns with respect to transport and signaling pathways
- cpcount of the selected patterns with respect to transport and signaling pathways
26. Some interesting functional interaction patterns
- A frequent functional interaction pattern in the DIP network
- A frequent functional interaction pattern in the WI-PHI network
- A functional interaction pattern related to the MAPK signaling pathway
- A functional interaction pattern related to SNARE interactions in vesicular transport
27. Conclusions
- Proposed a new frequent pattern identification technique, PPISpan
- Utilized molecular function Gene Ontology annotations to assign non-unique labels to the proteins of a PPI network
- Identified significantly frequent functional interaction patterns
- Frequent patterns offer a new perspective on the modular organization of protein-protein interaction networks
28. QUESTIONS?
29. THANK YOU