Title: Approximating the Number of Network Motifs
1 Approximating the Number of Network Motifs
- Mira Gonen
- gonenmir_at_post.tau.ac.il
- Joint Work with Yuval Shavitt
2 Talk Outline
- Background and Motivation
- Our Algorithm for Counting Network Motifs
- The Color-Coding Technique
- High-Level Description of the Algorithm
- Summary of Main Results
- An Example: Counting the Number of Cycles with a Chord
- Conclusions and Future Work
3 Background and Motivation
- The World Wide Web, the Internet, coupled biological and chemical systems, neural networks, and socially interacting species are only a few examples of systems composed of a large number of highly interconnected dynamical units.
- The first approach to capturing the properties of such systems is to model them as graphs whose nodes represent the dynamical units and whose links stand for the interactions between them.
4 Background and Motivation
- Such networks have been studied extensively by exploring their global topology, such as a power-law degree distribution and the existence of a dense core.
- However, two networks with similar global features can differ significantly in structure.
- Hence, local structures must be examined.
- These networks contain characteristic patterns, termed network motifs, which occur far more often than in randomized networks with the same degree sequence.
- Different motifs were found in different networks. The motifs reflect the underlying processes that generate each type of network.
6 Background and Motivation
- Milo, Shen-Orr, Itzkovitz, Kashtan, Chklovskii, and Alon found motifs in the World Wide Web and in networks from biochemistry and neurobiology.
- The graphlet distribution of a vertex is a new systematic measure of a network's local topology, suggested by Przulj. For each vertex, it counts the number of all motifs of size at most five that contain the vertex.
- Gordon, Livneh, Pinter, and Rubin discussed counting the number of motifs a node is part of as a method to classify the nodes of a network.
- Hales and Arteconi presented results from a motif analysis of networks produced by peer-to-peer protocols. They showed that the motif profiles of such networks closely match those of protein structure networks.
8 $(\epsilon,\delta)$-Approximation
- An algorithm for a counting problem $f$ is an $(\epsilon,\delta)$-approximation if it takes an input instance and two real values $\epsilon, \delta$ and produces an output $y$ such that
- $\Pr[(1-\epsilon)\cdot f \le y \le (1+\epsilon)\cdot f] \ge 1-2\delta$
9 The Color Coding Technique
- A combinatorial approach introduced by Alon, Yuster, and Zwick to detect simple paths, trees, and bounded-treewidth subgraphs in unlabeled graphs.
- It is based on:
- assigning random colors to the vertices of an input graph;
- considering only subgraphs in which each vertex has a unique color (colorful subgraphs).
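The two ingredients of color coding can be illustrated with a minimal Python sketch. The function names are hypothetical and do not come from the talk. The sketch colors the vertices uniformly at random with k colors and tests whether a vertex set is colorful. It then compares the empirical colorful rate of a fixed 4-vertex set with the exact probability $k!/k^k$, which is the factor the algorithm later undoes by multiplying by $k^k/k!$.

```python
import math
import random


def random_coloring(vertices, k, rng):
    """Color every vertex independently and uniformly at random with one of k colors."""
    return {v: rng.randrange(k) for v in vertices}


def is_colorful(vertex_set, coloring):
    """True if every vertex in vertex_set received a distinct color."""
    colors = [coloring[v] for v in vertex_set]
    return len(set(colors)) == len(colors)


def colorful_probability(k):
    """Exact probability that a fixed k-vertex subgraph is colorful: k!/k^k."""
    return math.factorial(k) / k ** k


if __name__ == "__main__":
    rng = random.Random(0)
    k = 4
    vertices = range(10)
    target = [0, 1, 2, 3]  # a hypothetical fixed 4-vertex subgraph
    trials = 20_000
    hits = sum(is_colorful(target, random_coloring(vertices, k, rng)) for _ in range(trials))
    print(f"empirical: {hits / trials:.4f}, exact k!/k^k: {colorful_probability(k):.4f}")
```

For k = 4 the exact probability is 24/256 = 0.09375. Only about one coloring in eleven makes a given 4-vertex motif colorful, so many colorings are needed.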
10 Main Contribution
- Using the color coding technique to approximate the number of network motifs a node is part of. This covers k-length cycles, k-length cycles with a chord, and (k-1)-length paths, where $k = O(\log|V|)$, as well as all motifs of size at most 4.
- The time complexity of our algorithm is $O(e^{2k}\cdot|E|\cdot|V|^2\cdot\log(1/\delta)/\epsilon^2)$.
11-17 Counting the Number of Motifs v Is Part Of
[Figures: motifs containing the vertex v]
18 4-Node Motifs
[Figure: motifs numbered 1 to 6]
19 $O(\log|V|)$-Node Motifs
[Figure: motifs numbered 7 to 9]
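20 Our Main Results
Time complexity per motif:
- Motif 1: $O(|E|\cdot|V|)$
- Motif 2: $O(|E|\cdot\log(1/\delta)/\epsilon^2)$
- Motif 3: $O(|E|)$
- Motif 4: $O(|E|^2 + |V|\cdot|E|\cdot\log(1/\delta)/\epsilon^2)$
- Motif 5: $O(|E|\cdot|V|\cdot\log|V| + |E|^2)$
- Motif 6: $O(|E|\cdot|V|)$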
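21 Our Main Results
Time complexity per motif:
- Motif 7: $O(e^{2k}\cdot|E|\cdot\log(1/\delta)/\epsilon^2)$
- Motif 8: $O(e^{2k}\cdot|E|\cdot|V|^2\cdot\log(1/\delta)/\epsilon^2)$
- Motif 9: $O(e^{2k}\cdot|E|\cdot|V|^2\cdot\log(1/\delta)/\epsilon^2)$
- where $k = O(\log|V|)$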
22 An Example Result
- Approximation Algorithm for Counting the Number of Cycles with a Chord
23 Mathematical Notations
- $C(v,u,S)$: the number of colorful paths from v to u in a specific coloring, using the colors in S.
- $P(v,u,w,S)$: the number of colorful paths from u to w that pass through v in a specific coloring, using the colors in S.
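24 The Algorithm
- Algorithm's input:
- A graph $G(V,E)$, a vertex $v$, an approximation error $\epsilon$, and an error probability $\delta$
- Notation: for a set U of vertices, let $A_{U,z,b}(S)$ be the set of all pairs $(S_1,S_2)$ such that the following hold:
- $|S_1| = z+1$,
- $|S_2| = b-z+1$,
- $S_1 \cup S_2 = S$,
- $(S_1 \setminus \{col(u) : u \in U\}) \cap (S_2 \setminus \{col(u) : u \in U\}) = \emptyset$.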
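The set $A_{U,z,b}(S)$ can be enumerated directly from its four conditions. The following Python sketch is illustrative only. The name `pair_sets` is hypothetical, and the vertex set U is passed in already reduced to its colors, $\{col(u) : u \in U\}$. The sketch brute-forces all candidate pairs, which is fine for the small k involved.

```python
from itertools import combinations


def pair_sets(S, shared_colors, z, b):
    """Enumerate A_{U,z,b}(S) as defined on the slide.

    S: the color set to split.
    shared_colors: {col(u) : u in U}, the colors of the shared vertices.
    Returns every pair (S1, S2) with |S1| = z + 1, |S2| = b - z + 1,
    S1 | S2 == S, and S1, S2 disjoint once the shared colors are removed.
    """
    S = frozenset(S)
    shared = frozenset(shared_colors)
    size1, size2 = z + 1, b - z + 1
    if size1 < 0 or size2 < 0:
        return []
    pairs = []
    for c1 in combinations(sorted(S), size1):
        S1 = frozenset(c1)
        for c2 in combinations(sorted(S), size2):
            S2 = frozenset(c2)
            if S1 | S2 == S and not ((S1 - shared) & (S2 - shared)):
                pairs.append((S1, S2))
    return pairs


if __name__ == "__main__":
    # U = {v, w} with col(v) = 0 and col(w) = 1, split of [k] = {0, 1, 2, 3} with z = 2, b = 4.
    for S1, S2 in pair_sets(range(4), {0, 1}, z=2, b=4):
        print(sorted(S1), sorted(S2))
```

The usage example prints two pairs, `[0, 1, 2] [0, 1, 3]` and `[0, 1, 3] [0, 1, 2]`. The two color sets always share exactly the colors of v and w, as the two paths of a cycle through the chord (v, w) should.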
25 The Algorithm
Repeat $t = \log(1/\delta)$ times:
- Repeat $s = 4k^k/(\epsilon^2 k!)$ times:
  - 1. Color each vertex of G independently and uniformly at random with one of the k colors.
  - 2. Compute the number of k-length cycles with a chord in the coloring. There are two cases (Case 1 and Case 2):
    - 2.1 Compute $X_{1,v}$, the number of k-length cycles with a chord in Case 1.
    - 2.2 Compute $X_{2,v}$, the number of k-length cycles with a chord in Case 2.
- 3. Let $Y_v$ be the average of all the s values of $X_{1,v} + X_{2,v}$.
4. Return the median of all the t values $Y_v$, multiplied by $k^k/k!$.
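The repetition structure above is a median-of-means estimator. The Python sketch below shows only that outer structure, under stated assumptions. `count_in_coloring` is a hypothetical callback standing in for steps 1 and 2 (draw a coloring and return $X_{1,v} + X_{2,v}$). The logarithm in t is taken as natural, and s and t are rounded up to integers.

```python
import math
import random
import statistics


def estimate_count(count_in_coloring, k, eps, delta, rng):
    """Median-of-means estimator with the repetition counts from the slide.

    count_in_coloring(rng) is a hypothetical callback: it should draw one random
    k-coloring and return X_{1,v} + X_{2,v} for that coloring.
    """
    t = max(1, math.ceil(math.log(1 / delta)))                  # outer rounds (natural log assumed)
    s = math.ceil(4 * k ** k / (eps ** 2 * math.factorial(k)))  # colorings per round
    round_averages = []
    for _ in range(t):
        total = sum(count_in_coloring(rng) for _ in range(s))
        round_averages.append(total / s)                        # Y_v for this round
    # A fixed k-vertex occurrence is colorful with probability k!/k^k, so rescale.
    return statistics.median(round_averages) * k ** k / math.factorial(k)


def simulated_counter(n_occurrences, k):
    """Toy stand-in for steps 1-2: each of n_occurrences hypothetical motif
    occurrences is colorful independently with probability k!/k^k."""
    p = math.factorial(k) / k ** k
    return lambda rng: sum(rng.random() < p for _ in range(n_occurrences))


if __name__ == "__main__":
    rng = random.Random(42)
    estimate = estimate_count(simulated_counter(50, 4), k=4, eps=0.2, delta=0.05, rng=rng)
    print(f"estimate for a true count of 50: {estimate:.2f}")
```

The toy counter treats occurrences as independent, which real motif occurrences sharing vertices are not. It only exercises the averaging, median, and $k^k/k!$ rescaling.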
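26 Computing the Number of Paths Between v and w for Every Color-Set S
- For all $S \subseteq [k]$ such that $|S| = 1$: $C(v,w,S) = 1$ if $v = w$ and $S = \{col(v)\}$, and 0 otherwise.
- For $q = 2$ to $k$, for all $S \subseteq [k]$ such that $|S| = q$:
- $C(v,w,S) = \sum_{u \in N(v)} C(u,w,S \setminus \{col(v)\})$ if $col(v) \in S$, and 0 otherwise.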
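The recursion above can be written as a dynamic program over color-set bitmasks. The following Python sketch rests on two assumptions. First, the base case is read as the one-vertex path ($v = w$, $S = \{col(v)\}$). Second, the graph is given as a hypothetical adjacency dict `adj` with a coloring dict `col`. The sketch stores only nonzero counts. Because $col(v)$ is removed before recursing, v cannot reappear on the rest of the path, so every counted path is simple.

```python
def colorful_path_counts(adj, col, k):
    """Compute C(v, w, S) for every pair v, w and every color set S.

    adj: dict mapping each vertex to a list of its neighbors (undirected graph).
    col: dict mapping each vertex to its color in range(k).
    Color sets are bitmasks over range(k). Returns a dict keyed by (v, w, S)
    holding only the nonzero counts of colorful paths from v to w whose
    vertex colors are exactly S.
    """
    C = {}
    # Base case |S| = 1: the one-vertex path from v to itself.
    for v in adj:
        C[(v, v, 1 << col[v])] = 1
    # Larger sets in order of size, so that every S minus {col(v)} is already done.
    for q in range(2, k + 1):
        for S in range(1 << k):
            if bin(S).count("1") != q:
                continue
            for v in adj:
                bit = 1 << col[v]
                if not S & bit:          # col(v) must be one of the colors in S
                    continue
                rest = S & ~bit
                for w in adj:
                    total = sum(C.get((u, w, rest), 0) for u in adj[v])
                    if total:
                        C[(v, w, S)] = total
    return C


if __name__ == "__main__":
    # A triangle 0-1-2 whose vertices all have distinct colors.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    col = {0: 0, 1: 1, 2: 2}
    C = colorful_path_counts(adj, col, k=3)
    print(C[(0, 2, 0b101)], C[(0, 2, 0b111)])  # the edge 0-2, and the path 0-1-2
```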
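27 Case 1
- $P(v,u,w,S) = \sum_{1 \le z \le l-1} \sum_{(S_1,S_2)} C(v,w,S_1)\cdot C(v,u,S_2)$, where $l = |S| - 1$ is the length of the path from u to w.
- $C(v,w,S_1)$: z-length colorful paths between v and w using the colors in $S_1$.
- $C(v,u,S_2)$: $(l-z)$-length colorful paths between v and u using the colors in $S_2$.
- The sum is over all $(S_1,S_2)$ in $A_{\{v\},z,l}(S)$.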
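The formula for $P(v,u,w,S)$ can be evaluated directly once the C values are known. The self-contained Python sketch below does this, assuming bitmask color sets and the hypothetical helper names shown. It repeats a compact version of the C dynamic program so it runs on its own. It then sums $C(v,w,S_1)\cdot C(v,u,S_2)$ over every split in $A_{\{v\},z,l}(S)$, using the fact that removing $col(v)$ from both sets and intersecting equals $(S_1 \cap S_2) \setminus \{col(v)\}$.

```python
def colorful_path_counts(adj, col, k):
    """C(v, w, S) for all v, w and all color-set bitmasks S (nonzero entries only)."""
    C = {(v, v, 1 << col[v]): 1 for v in adj}
    for q in range(2, k + 1):
        for S in range(1 << k):
            if bin(S).count("1") != q:
                continue
            for v in adj:
                bit = 1 << col[v]
                if not S & bit:
                    continue
                for w in adj:
                    total = sum(C.get((u, w, S & ~bit), 0) for u in adj[v])
                    if total:
                        C[(v, w, S)] = total
    return C


def submasks_of_size(S, size):
    """Every submask of bitmask S with exactly `size` bits set."""
    out, sub = [], S
    while True:
        if bin(sub).count("1") == size:
            out.append(sub)
        if sub == 0:
            return out
        sub = (sub - 1) & S


def paths_through(C, col, v, u, w, S):
    """P(v, u, w, S): colorful paths from u to w passing through v, colors exactly S."""
    l = bin(S).count("1") - 1              # path length in edges
    cv = 1 << col[v]
    total = 0
    for z in range(1, l):                  # 1 <= z <= l - 1
        for S1 in submasks_of_size(S, z + 1):
            for S2 in submasks_of_size(S, l - z + 1):
                # (S1, S2) in A_{{v},z,l}(S): they cover S and share only col(v)
                if S1 | S2 == S and (S1 & S2) & ~cv == 0:
                    total += C.get((v, w, S1), 0) * C.get((v, u, S2), 0)
    return total


if __name__ == "__main__":
    # A 4-cycle 0-1-2-3-0 with all colors distinct.
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    col = {0: 0, 1: 1, 2: 2, 3: 3}
    C = colorful_path_counts(adj, col, 4)
    print(paths_through(C, col, 1, 0, 2, 0b0111))  # the path 0-1-2
    print(paths_through(C, col, 1, 0, 3, 0b1111))  # the path 0-1-2-3
```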
28 Case 1
- Number of cycles with a chord in Case 1 $= \sum_{l} \sum_{(u,w) \in E} \sum_{(S_3,S_4)} P(v,u,w,S_3)\cdot C(u,w,S_4)$
- $P(v,u,w,S_3)$: l-length colorful paths between u and w that pass through v, using the colors in $S_3$.
- $C(u,w,S_4)$: $(k-l)$-length colorful paths between u and w using the colors in $S_4$.
- The sum is over all $(S_3,S_4)$ in $A_{\{u,w\},l,k}([k])$ and over all $(S_3,S_4)$ in $A_{\{u,w\},k-l,k}([k])$.
29 Case 2
- Number of cycles with a chord in Case 2 $= \sum_{l} \sum_{w \in N(v)} \sum_{(S_1,S_2)} C(v,w,S_1)\cdot C(v,w,S_2)$
- $C(v,w,S_1)$: l-length colorful paths between v and w using the colors in $S_1$.
- $C(v,w,S_2)$: $(k-l)$-length colorful paths between v and w using the colors in $S_2$.
- The sum is over all $(S_1,S_2)$ in $A_{\{v,w\},l,k}([k])$.
30 Time Complexity
Case 1
The time complexity of computing $C(v,w,S)$ for every color-set S and every pair of vertices v, w in a given coloring is $O(2^k\cdot|E|\cdot|V|^2)$. The time complexity of computing $P(v,u,w,S)$ for every color-set S and fixed v, u, w (assuming $C(v,w,S)$ is known) is
Choosing the colors of the path between u and w
that is going through v
Choosing the colors of the path between v and w
31 Time Complexity
Case 1
The time complexity of the first case, for every edge (u,w), every vertex v, and every color-set S, in a given coloring is
32 Time Complexity
Case 2
The time complexity of the second case, for every
vertex v, every neighbor of v, and every
color-set S, in a given coloring is
Choosing the colors of the path between v and w
33 Time Complexity
The total time complexity, for every vertex v and every color-set S, in a given coloring, is $O(e^k\cdot|E|\cdot|V|^2)$.
Overall time complexity (over all $s \cdot t$ colorings): $O(e^{2k}\cdot|E|\cdot|V|^2\cdot\log(1/\delta)/\epsilon^2)$
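The jump from $e^k$ per coloring to $e^{2k}$ overall comes from the number of colorings, $s\cdot t$ with $s = 4k^k/(\epsilon^2 k!)$. Since $k^k/k!$ is a single term of the series $e^k = \sum_j k^j/j!$, we have $k^k/k! \le e^k$ and therefore $s = O(e^k/\epsilon^2)$. The small Python check below illustrates this. The helper `repetitions` is hypothetical, and it assumes a natural logarithm and rounding up for s and t.

```python
import math


def repetitions(k, eps, delta):
    """Inner (s) and outer (t) repetition counts from the algorithm slide."""
    s = math.ceil(4 * k ** k / (eps ** 2 * math.factorial(k)))
    t = max(1, math.ceil(math.log(1 / delta)))  # natural log assumed
    return s, t


if __name__ == "__main__":
    # k^k/k! <= e^k, because k^k/k! is one term of e^k = sum_j k^j / j!.
    for k in range(1, 16):
        assert k ** k / math.factorial(k) <= math.exp(k)
    for k in (4, 6, 8):
        s, t = repetitions(k, eps=0.1, delta=0.01)
        bound = 4 * math.exp(k) / 0.1 ** 2
        print(f"k={k}: s={s}, t={t}, colorings={s * t}, 4e^k/eps^2={bound:.0f}")
```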
34 Conclusions and Future Work
- We have presented a fast algorithm for approximating the number of network motifs a node is part of.
- We also presented algorithms for counting the total number of occurrences of these network motifs when no efficient algorithm exists.
- Can we find sublinear algorithms for counting these motifs?
35 Thank You!
- gonenmir_at_post.tau.ac.il