Topology-Free Querying of Protein Interaction
Networks
Sharon Bruckner¹, Falk Hüffner¹, Richard M.
Karp², Ron Shamir¹, Roded Sharan¹
¹Blavatnik School of Computer Science, Tel Aviv University,
Israel
²Int. Computer Science Institute, Berkeley, USA
Introduction
Methods
Experiments Results
Goal: Network querying. Given a protein complex
from species A, identify the connected region
most similar to it in the protein-protein
interaction network of species B.
- Method 1
- Used when the complex size is 4-10
- A fixed-parameter algorithm, uses dynamic
programming
- Running time O(3^k · m · ins)
- Can handle multiple colors per vertex using color
coding [3]
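The dynamic programming idea behind Method 1 can be sketched for the basic case (one color per vertex, no insertions or deletions). This is a minimal illustration, not the authors' implementation: the function name `colorful_connected_exists`, the bitmask encoding, and the 0-based color indices are assumptions made for the example. For every vertex it records which color sets can be covered by a tree rooted there, and it grows those sets by merging disjoint color sets across an edge, which is where the 3^k factor comes from.

```python
from collections import defaultdict

def colorful_connected_exists(edges, color, k):
    """Return True if some connected subgraph uses each of the k colors exactly once.

    edges: iterable of undirected (u, v) pairs
    color: dict vertex -> color in range(k); vertices missing from the dict
           are treated as non-colored and are never used (no insertions here)
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    full = (1 << k) - 1
    # reach[v] holds bitmasks S such that some tree rooted at v uses
    # exactly the colors in S, each exactly once.
    reach = {v: {1 << c} for v, c in color.items()}

    # Build color sets by increasing size, so smaller sets are complete
    # before they are combined into larger ones.
    for size in range(2, k + 1):
        for v in color:
            new_sets = set()
            for u in adj[v]:
                if u not in color:
                    continue
                for s1 in reach[v]:          # colors of the part containing v
                    for s2 in reach[u]:      # colors of the subtree hanging at u
                        if s1 & s2 == 0 and bin(s1 | s2).count("1") == size:
                            new_sets.add(s1 | s2)
            reach[v] |= new_sets

    return any(full in sets for sets in reach.values())
```

Because every vertex carries a single color here, two disjoint color sets always correspond to disjoint vertex sets, so merging them across an edge yields a valid tree. Handling several colors per vertex, insertions, or deletions needs the extensions listed on the poster and is not covered by this sketch.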
- Experiment species
- We applied our method to query complexes within
- yeast (5430 proteins, 39936 interactions),
- fly (6650 proteins, 21275 interactions),
- human (7915 proteins, 28972 interactions).
- We queried complexes from
- yeast, fly, human (some interaction information
is available)
- bovine, mouse, and rat (not enough interaction
information is available)
- Why network querying?
- A match hints at an evolutionarily conserved region
- We may infer the function of the matched
region from that of the complex.
- Evaluation Methods
- Comparison to another method
- Tested all complexes with known topology (from
fly, yeast, and human) with QNet [1], and
counted the number of matched complexes and
the quality of the match.
- Functional coherence
- Used GO TermFinder for functional enrichment.
- Corrected for multiple testing using FDR.
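The poster does not say which FDR procedure was used. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up procedure, one common choice; the function name `benjamini_hochberg` and the default level `alpha=0.05` are assumptions for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest 1-based rank r with p_(r) <= (r / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

In this setting each p-value would come from one enrichment test of a matched region, and a match would count as functionally coherent if at least one of its tests is rejected.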
- Previous Methods
- Assume knowledge of the interactions within the
query complex (the topology). Look for a match
in the network with the same topology. Allow
flexibility: deleting nodes from the query
(deletions), adding nodes to the match
(insertions).
- Examples: QNet [1], GraphFind [2].
Selected Results
Our method: Remove the requirement for query
topology. The query is now just a list of proteins!
Find the best connected region in the network
whose proteins are similar to the query proteins.
Examples of the dynamic programming formula. The
vertex is a non-colored vertex used for
insertions.
Why no topology? Interaction information is noisy
and incomplete, and for some species not
available. We claim that the connectivity of the
target region is enough to find good matches.
Total number of matches as compared with
QNet, when querying species with better-known
topology. Feasible complexes are all the
complexes for which there were enough similar
proteins in the network to make a match possible.
Definitions
Examples of colorful, connected solutions
- Graph G = (V, E): A protein-protein interaction
network of some species.
- Color set C = {1, 2, 3, ..., k}: Given a set of
proteins from another species that compose a
complex, each vertex is assigned a color
corresponding to the protein most
sequence-similar to it.
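The color assignment in the definition above can be illustrated with a short sketch. The input format (a dictionary of pairwise similarity scores), the function name `assign_colors`, the `min_score` cutoff, and the 0-based color indices are all assumptions for the example; the poster does not specify how similarity is scored or thresholded.

```python
def assign_colors(similarity, query_proteins, min_score=0.0):
    """Color each network vertex by its most similar query protein.

    similarity: dict (network_vertex, query_protein) -> score, higher is more similar
    query_proteins: list of the k query proteins; color i stands for query_proteins[i]
    Returns dict network_vertex -> color index. Vertices with no score above
    min_score are left out, i.e. they stay non-colored.
    """
    color_of = {q: i for i, q in enumerate(query_proteins)}
    best = {}  # vertex -> (best score so far, color of that query protein)
    for (v, q), score in similarity.items():
        if q not in color_of or score <= min_score:
            continue
        if v not in best or score > best[v][0]:
            best[v] = (score, color_of[q])
    return {v: c for v, (_, c) in best.items()}
```

For example, a network protein that scores 80 against query protein B and 50 against query protein A receives the color of B.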
- Method 2
- Used when complex size is 11-25.
- Integer Linear Programming approach.
- Formulate colorfulness
- Formulate connectivity
Quality matches are the matches that were
functionally coherent. The same trend occurs in
all experiments, between all species pairs.
These complexes could not be tested with QNet
since there is not sufficient topology information
about them.
- The basic problem
- Given a graph G with colors as above, find a
connected subgraph containing all k colors exactly once
(colorful subgraph).
- The problem is NP-complete!
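Although finding a colorful connected subgraph is hard, checking a proposed solution is easy. The sketch below verifies the two conditions of the basic problem for a candidate vertex set; the function name `is_colorful_connected` and the 0-based colors are assumptions for the example.

```python
def is_colorful_connected(vertices, edges, color, k):
    """Check that `vertices` induce a connected subgraph of the network
    in which each of the k colors appears exactly once.

    edges: iterable of undirected (u, v) pairs of the whole network
    color: dict vertex -> color in range(k)
    """
    chosen = set(vertices)
    if not chosen or any(v not in color for v in chosen):
        return False  # empty set, or a non-colored vertex (no insertions allowed)

    # Colorful: the colors of the chosen vertices are exactly 0..k-1, once each.
    if sorted(color[v] for v in chosen) != list(range(k)):
        return False

    # Connected: depth-first search restricted to edges inside the chosen set.
    adj = {v: set() for v in chosen}
    for u, w in edges:
        if u in chosen and w in chosen:
            adj[u].add(w)
            adj[w].add(u)
    start = next(iter(chosen))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == chosen
```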
- Flexibility
- Allow insertions of
- Non-colored vertices, similar to no query
protein.
- Colored vertices.
- Allow deletions
- Allow a network vertex to have more than one
color.
TORQUE server
http://www.cs.tau.ac.il/~bnet/torque.html
- Connectivity idea
- Find a flow such that
- Every source has a connection to the sink via
flow edges. Therefore, all vertices of the
solution are connected!
- Only vertices selected for the solution can be
involved in the flow.
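The two flow conditions can be made concrete with a small checker for a candidate solution. This is only a sketch of the stated conditions, not the ILP itself: the function name `flow_connects_solution`, the choice of one selected vertex as the sink, and the representation of the flow as a dictionary over directed edges are assumptions for the example.

```python
def flow_connects_solution(selected, sink, flow):
    """Check the connectivity conditions for a candidate solution.

    selected: vertices chosen for the solution (all but the sink act as sources)
    sink: one selected vertex that absorbs the flow (assumed convention)
    flow: dict (u, v) -> nonnegative amount of flow on the directed edge u -> v
    """
    selected = set(selected)
    if sink not in selected:
        return False

    # Only vertices selected for the solution may be involved in the flow.
    for (u, v), amount in flow.items():
        if amount > 0 and (u not in selected or v not in selected):
            return False

    # Every source must reach the sink along flow edges. Equivalently, walking
    # the flow edges backwards from the sink must reach every selected vertex.
    incoming = {}
    for (u, v), amount in flow.items():
        if amount > 0:
            incoming.setdefault(v, []).append(u)
    seen, stack = {sink}, [sink]
    while stack:
        x = stack.pop()
        for u in incoming.get(x, []):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == selected
```

If the check passes, every selected vertex is joined to the sink by a path inside the solution, so the solution is connected.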
- Coloring Constraints idea
- Binary variables for each vertex-color
combination
- Every vertex should get at most one color
- Every color should be given to at most one
vertex
- A vertex gets a color only if it is selected for
the solution
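The coloring constraints can be written as simple sums over the binary variables. The sketch below checks a candidate 0/1 assignment against the three constraints listed above; it does not solve the ILP. The names `x` (vertex-color variables), `selected` (vertex selection variables), and `coloring_constraints_hold` are assumptions for the example, as are the 0-based colors.

```python
def coloring_constraints_hold(x, selected, k):
    """Check a candidate 0/1 assignment against the coloring constraints.

    x: dict (vertex, color) -> 0 or 1, 1 meaning the vertex gets that color
    selected: dict vertex -> 0 or 1, 1 meaning the vertex is in the solution
    k: number of colors, indexed 0..k-1
    Missing entries in either dict are treated as 0.
    """
    vertices = set(selected) | {v for v, _ in x}

    for v in vertices:
        colors_given = sum(x.get((v, c), 0) for c in range(k))
        if colors_given > 1:
            return False  # every vertex gets at most one color
        if colors_given > selected.get(v, 0):
            return False  # a vertex gets a color only if it is selected

    for c in range(k):
        if sum(x.get((v, c), 0) for v in vertices) > 1:
            return False  # every color is given to at most one vertex

    return True
```

In an ILP solver the same three conditions would appear as linear inequalities over these variables, alongside the connectivity constraints.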
Network query problems. Left: the network, where
vertex j is non-colored. Right: queries. For the
basic problem disallowing indels, Q1 is solved by
c, b, i, while Q2 and Q4 have no solution. When
allowing a single arbitrary insertion, Q2 has
solution a, d, h, i and Q4 has the solution a,
b, c, d, i. When allowing a single
special insertion, Q3 has the solution a, b, g,
j. When allowing one deletion, Q2 has the
solutions a, d, i, f. When allowing repeated
nodes and no indels, Q5 has the solution b, c,
i, f, j.
Left: TORQUE homepage, allowing users to query
complexes in predefined target species or
a user-provided one. Right: the results of a sample
TORQUE query.
Acknowledgements
We thank Noga Alon for his help in analyzing the
case of multiple color constraints. We thank
Banu Dost for providing us with the QNet code,
and Nir Yosef for providing the PPI networks.
R. Shamir and R. Sharan were supported in part
by the Israel Science Foundation (grant no.
385/06). F. Hüffner was supported by a
postdoctoral fellowship from the Edmond J. Safra
Bioinformatics Program at Tel Aviv University.
References
[1] R. Sharan, B. Dost, T. Shlomi, N. Gupta, E.
Ruppin, and V. Bafna. QNet: A tool for querying
protein interaction networks. Journal of
Computational Biology, 15(7):913-925, 2008.
[2] A. Ferro, R. Giugno, M. Mongiovì, A.
Pulvirenti, D. Skripin, and D. Shasha. GraphFind:
enhancing graph searching by low support data
mining techniques. BMC Bioinformatics, 9(Suppl
4):1471-2105, 2008.
[3] N. Alon, R. Yuster, and U. Zwick. Color coding.
Journal of the ACM, 42:844-856, 1995.