DOULION: Counting Triangles in Massive Graphs with a Coin

1
DOULION: Counting Triangles in Massive Graphs with a Coin
  • Charalampos (Babis) Tsourakakis
  • Carnegie Mellon University, KDD '09, Paris

Joint work with U Kang, Gary L. Miller, Christos Faloutsos
2
Outline
  • Motivation
  • Related Work
  • Proposed Method
  • Results
  • Conclusion
  • Extra

3
Why is Triangle Counting important?
  • Clustering coefficient
  • Transitivity ratio
  • Social network analysis: "friends of friends are friends" (e.g., WF94)
    [Figure: triangle on vertices A, B, C]
  • Hidden thematic structure of the Web (Eckmann and Moses, PNAS, EM02)
  • Motif detection (e.g., YPSB05)
  • Web spam detection (Becchetti et al., KDD '08, BBCG08)
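To make the first two bullets concrete, here is a minimal sketch of the standard transitivity ratio, 3 × (number of triangles) / (number of connected triples). This definition is textbook material rather than something spelled out on the slide, and the adjacency-dict input format and function names are our own assumptions.

```python
from itertools import combinations

def count_triangles(adj):
    """Count triangles in an undirected graph given as {node: set(neighbors)}."""
    count = 0
    for v in adj:
        for u, w in combinations(sorted(adj[v]), 2):
            if w in adj[u]:
                count += 1
    return count // 3  # each triangle is seen once from each of its 3 corners

def transitivity_ratio(adj):
    """3 * (#triangles) / (#connected triples, i.e. paths of length 2)."""
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    return 3 * count_triangles(adj) / triples if triples else 0.0

# Triangle A-B-C plus a pendant node D attached to C.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
ratio = transitivity_ratio(g)  # 3 * 1 / 5 = 0.6
```

Both quantities are driven by the triangle count, which is why a fast (approximate) triangle counter immediately gives fast clustering statistics.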

4
Personal Motivation (CET08)
[Figure: Political Blogs graph, eigenvalues of the adjacency matrix; axis label: i-th eigenvector]
Keep only 3!
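The motivation slide rests on a known identity: for an undirected graph with adjacency matrix A and eigenvalues λ_i, the number of triangles equals trace(A³)/6 = (1/6) Σ λ_i³. We read "Keep only 3!" as keeping only a few eigenvalues of largest magnitude, since the cubes of the top eigenvalues tend to dominate the sum. The sketch below assumes a dense symmetric NumPy matrix; the function name and the `k` parameter are hypothetical.

```python
import numpy as np

def triangles_from_eigenvalues(A, k=None):
    """Estimate #triangles as (1/6) * sum(lambda_i^3) over eigenvalues of A.

    With k=None all eigenvalues are used (exact up to floating point);
    otherwise only the k eigenvalues of largest magnitude are kept.
    """
    eigvals = np.linalg.eigvalsh(A)  # A must be symmetric (undirected graph)
    if k is not None:
        eigvals = eigvals[np.argsort(-np.abs(eigvals))[:k]]
    return float(np.sum(eigvals ** 3) / 6.0)

# Complete graph K4: eigenvalues 3, -1, -1, -1, so (27 - 3) / 6 = 4 triangles.
K4 = np.ones((4, 4)) - np.eye(4)
exact = triangles_from_eigenvalues(K4)        # about 4.0
top1 = triangles_from_eigenvalues(K4, k=1)    # 27 / 6 = 4.5
```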
5
Outline
  • Motivation
  • Related Work
  • Proposed Method
  • Results
  • Conclusion
  • Extra

6
Counting methods
  • Dense graphs
  • Sparse graphs: matrix multiplication is not practical
  • M. Latapy: theory and experiments
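As an illustration of the matrix-multiplication approach (and of why it does not suit sparse graphs), the sketch below counts triangles as trace(A³)/6 with a naive O(n³) product on a dense 0/1 list-of-lists matrix. This is a plain textbook product, not a fast matrix-multiplication algorithm, and the function name is ours.

```python
def triangles_by_matrix_multiplication(A):
    """Count triangles as trace(A^3) / 6 for a 0/1 symmetric adjacency matrix.

    Naive O(n^3) dense multiplication: acceptable for small or dense graphs,
    but it touches all n^2 entries even when almost all of them are zero.
    """
    n = len(A)
    # A2 = A * A
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    # trace(A^3) = sum over i, k of A2[i][k] * A[k][i]
    trace_a3 = sum(A2[i][k] * A[k][i] for i in range(n) for k in range(n))
    return trace_a3 // 6  # each triangle yields 6 closed walks of length 3

K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
count = triangles_by_matrix_multiplication(K4)  # 4
```

Storing and multiplying the full n × n matrix is exactly what becomes infeasible for graphs with millions of nodes but only tens of edges per node.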
7
Naive Sampling
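  • r independent samples of three distinct vertices
[Figure: each sampled triple falls into one of the classes T0, T1, T2, T3 (0, 1, 2, or 3 edges among its vertices); X = 1 for T3 and X = 0 for T0, T1, T2]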
8
Naive Sampling
  • r independent samples of three distinct vertices
  • Then the following holds with probability at least $1 - \delta$: [equation not preserved]
  • Works, e.g., for graphs with $T_3 \sim n^2 \log n$
  • Prohibitive for graphs with $T_3 = o(n^2)$
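A minimal sketch of the naive estimator, assuming nodes are labeled 0..n-1 and edges are given as pairs: sample r triples uniformly, let X_i = 1 when the triple is a triangle (class T3), and scale the hit rate by C(n, 3). Because each sample hits with probability T3 / C(n, 3), the number of samples needed for a good estimate grows roughly like C(n, 3) / T3, which is the source of the blow-up when T3 is small. The function name is hypothetical.

```python
import random
from math import comb

def naive_sampling_estimate(n, edges, r, seed=None):
    """Estimate #triangles from r uniformly sampled triples of distinct vertices.

    X_i = 1 if the sampled triple is a triangle (class T3), else 0.
    Estimate = (mean of X_i) * C(n, 3).
    """
    rng = random.Random(seed)
    E = {frozenset(e) for e in edges}
    hits = 0
    for _ in range(r):
        a, b, c = rng.sample(range(n), 3)  # three distinct vertices
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= E:
            hits += 1
    return hits / r * comb(n, 3)
```

On the complete graph K5 every triple is a triangle, so any number of samples returns exactly C(5, 3) = 10.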
9
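Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler (BFLSS06)
  • Sample uniformly at random an edge (i,j) and a node k in $V \setminus \{i,j\}$
  • Check whether edges (i,k) and (j,k) exist in E(G)
  • Number of samples: [formula not preserved]
[Figure: edge (i,j) and a candidate node k; edges (i,k) and (j,k) marked "?"]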
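The edge-plus-node sampling step above can be sketched in memory as follows. The original BFLSS06 algorithm is a streaming algorithm; this simplified version only illustrates the estimator. It assumes nodes 0..n-1 with n ≥ 3 and a simple graph, and the function name is ours. Each triangle is hit by exactly 3 of the m(n - 2) possible (edge, node) pairs, which gives the scaling factor.

```python
import random

def edge_node_sampling_estimate(n, edges, r, seed=None):
    """Sample an edge (i, j) and a node k not in {i, j}; check (i, k) and (j, k).

    Every triangle is hit by exactly 3 of the m * (n - 2) (edge, node) pairs,
    so #triangles ~= (hit fraction) * m * (n - 2) / 3.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    E = {frozenset(e) for e in edge_list}
    m = len(edge_list)
    hits = 0
    for _ in range(r):
        i, j = rng.choice(edge_list)
        k = rng.randrange(n)
        while k == i or k == j:  # resample until k lies in V - {i, j}
            k = rng.randrange(n)
        if frozenset((i, k)) in E and frozenset((j, k)) in E:
            hits += 1
    return hits / r * m * (n - 2) / 3
```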
10
Outline
  • Motivation
  • Related Work
  • Proposed Method
  • Results
  • Conclusion
  • Extra

11
Our Sampling Approach
  • G(V,E): toss a coin (heads with probability p) for each edge
  • HEADS! (i,j) survives, with weight 1/p
12
Our Sampling Approach
  • G(V,E): toss the same coin for every edge
  • TAILS! (k,m) dies
13
Sampling approach
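The coin-tossing step on the previous two slides amounts to keeping each edge independently with probability p. A minimal sketch, assuming edges are given as a list of pairs; the function name `sparsify` is our own.

```python
import random

def sparsify(edges, p, seed=None):
    """Toss a biased coin for every edge: HEADS (prob. p) keeps it, TAILS drops it.

    Returns the surviving edges; each survivor implicitly carries weight 1/p.
    """
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]

path = [(i, i + 1) for i in range(10000)]
kept = sparsify(path, 0.5, seed=42)  # roughly 5000 edges survive
```

Since every edge is treated independently, this pass is a single scan over the edge list and is trivially parallel or streamable.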
14
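Our Sampling Approach on $K_n$
  • Initially: $K_n$ has $\binom{n}{3}$ triangles
  • After sparsification with $p = 0.5$ we get $G_{n,0.5}$, which has $\binom{n}{3}/8$ triangles in expectation
  • Weighted: each surviving triangle counts $1/p^3 = 8$, so the estimate equals $\binom{n}{3}$ in expectation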
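A quick Monte Carlo check of the $K_n$ example, as a sketch: sparsify $K_{10}$ with p = 0.5 many times, count surviving triangles by brute force, weight each by 1/p³ = 8, and average. The helper name and the run count are arbitrary choices of ours; the average should land near C(10, 3) = 120.

```python
import random
from itertools import combinations
from math import comb

def doulion_estimate(n_nodes, edges, p, rng):
    """One run: keep each edge w.p. p, count surviving triangles, weight by 1/p^3."""
    kept = {frozenset(e) for e in edges if rng.random() < p}
    triangles = sum(
        1
        for a, b, c in combinations(range(n_nodes), 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= kept
    )
    return triangles / p ** 3

n, p = 10, 0.5
K_n = list(combinations(range(n), 2))
expected = comb(n, 3)  # 120 triangles in K_10
rng = random.Random(0)
runs = [doulion_estimate(n, K_n, p, rng) for _ in range(2000)]
average = sum(runs) / len(runs)  # close to `expected`
```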

15
Mean and Variance
  • G has Δ triangles; k non-edge-disjoint triangles
  • X: random variable, our estimate
  • E[X] = Δ
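The mean and variance can be derived directly from the coin model. In this sketch δ_t is the indicator that all three edges of triangle t survive, and we assume k denotes the number of unordered pairs of triangles that share an edge; the notation on the original slide may differ. Two distinct triangles share at most one edge, so an edge-sharing pair spans 5 distinct edges, and triangles with no common edge depend on disjoint coins.

```latex
\begin{aligned}
X &= \frac{1}{p^{3}} \sum_{t=1}^{\Delta} \delta_t,
  \qquad \delta_t \sim \mathrm{Bernoulli}(p^{3}) \\
\mathbb{E}[X] &= \frac{1}{p^{3}} \cdot \Delta \cdot p^{3} = \Delta \\
\operatorname{Var}[\delta_t] &= p^{3} - p^{6},
  \qquad
  \operatorname{Cov}[\delta_t, \delta_{t'}] =
  \begin{cases}
    p^{5} - p^{6} & \text{if } t \neq t' \text{ share an edge} \\
    0 & \text{otherwise}
  \end{cases} \\
\operatorname{Var}[X] &= \frac{1}{p^{6}} \Bigl( \Delta\,(p^{3} - p^{6}) + 2k\,(p^{5} - p^{6}) \Bigr)
  = \Delta \Bigl( \frac{1}{p^{3}} - 1 \Bigr) + 2k \Bigl( \frac{1}{p} - 1 \Bigr)
\end{aligned}
```

The first term shrinks as Δ grows only relative to Δ², while the second term shows that many triangles sharing edges inflate the variance.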
16
Outline
  • Motivation
  • Related Work
  • Proposed Method
  • Results
  • Conclusion
  • Extra

17
Doulion and NodeIterator
  • Sparsify first, then use Node Iterator to count triangles.
  • Node Iterator: consider each node and count how many edges there are among its neighbors.
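A minimal sketch of "sparsify, then Node Iterator", assuming a simple undirected graph given as a list of pairs (no self-loops or duplicate edges). The names `node_iterator` and `doulion` are hypothetical and this is an illustration of the idea, not the authors' released code.

```python
import random
from collections import defaultdict
from itertools import combinations

def node_iterator(edges):
    """Exact count: for every node, count edges among its neighbors.

    Each triangle is found once at each of its 3 corners, hence the division by 3.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                found += 1
    return found // 3

def doulion(edges, p, seed=None):
    """Sparsify with a p-biased coin, count with Node Iterator, scale by 1/p^3."""
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return node_iterator(kept) / p ** 3
```

With p = 1 no edge is dropped and `doulion` reduces to the exact Node Iterator count; smaller p trades accuracy for speed.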

18
Expected Speedup
  • Expected speedup: $1/p^2$
  • Proof: let R be the running time of Node Iterator after the sparsification. [equation not preserved]
  • Therefore, the expected speedup is $1/p^2$.
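A short derivation sketch of the $1/p^2$ figure, assuming Node Iterator's cost is proportional to the number of neighbor pairs it examines, $\sum_v d_v^2$, and reading "expected speedup" as the original cost divided by the expected cost R after sparsification. After the coin tosses, node v keeps $D_v \sim \mathrm{Binomial}(d_v, p)$ of its edges.

```latex
\begin{aligned}
\text{cost on } G &\approx \sum_{v} d_v^{2},
  \qquad D_v \sim \mathrm{Binomial}(d_v, p) \\
\mathbb{E}[R] &\approx \sum_{v} \mathbb{E}\bigl[D_v^{2}\bigr]
  = \sum_{v} \Bigl( p^{2} d_v^{2} + p(1-p)\, d_v \Bigr) \\
\frac{\sum_{v} d_v^{2}}{\mathbb{E}[R]} &\approx \frac{1}{p^{2}}
  \quad \text{when the } p^{2} d_v^{2} \text{ terms dominate, i.e. } p\, d_v \gg 1
\end{aligned}
```

The $p(1-p)\,d_v$ correction comes from $\mathbb{E}[D_v^2] = \operatorname{Var}[D_v] + \mathbb{E}[D_v]^2$; it is negligible for the high-degree nodes that dominate Node Iterator's running time.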

19
Some results (I)
[Figures: results on two graphs; sizes (nodes, edges): 3M, 35M and 400K, 2.1M]
20
Some results (II)
[Figures: results on two graphs; sizes (nodes, edges): 3.1M, 37M and 3.6M, 42M]
21
Outline
  • Motivation
  • Related Work
  • Proposed Method
  • Results
  • Conclusion
  • Extra

22
Conclusions
  • New sampling approach that counts triangles approximately.
  • Basic analysis of the estimate (expectation, variance, expected speedup).
  • Experiments on many real-world datasets, where we showed that constant p gives high-quality estimates and constant $1/p^2$ speedups.

23
Question
  • Can p be smaller than a constant? How small can we afford p to be while still guaranteeing concentration?
  • Could p be, e.g., as small as 1/???
  • Motivation

24
Outline
  • Motivation
  • Related Work
  • Proposed Method
  • Results
  • Conclusion
  • Extra

25
Approximate Triangle Counting
  • Approximate Triangle Counting, arXiv preprint:
    http://arxiv.org/PS_cache/arxiv/pdf/0904/0904.3761v1.pdf
  • C.E.T., M.N. Kolountzakis, G.L. Miller

26
Theorem (C.E.T., Kolountzakis, Miller 2009)
Mildness, pick p1
How to choosep?
Concentration
27
Practitioner's Guide
  • Wikipedia 2005: 1.6M nodes, 18.5M edges
  • Pick p = 1/…, then keep doubling p until concentration
[Figure: estimates as p doubles, annotated "Concentration appears" and "Concentration becomes stronger"]
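The doubling rule can be sketched as a loop. The slide does not say how concentration is judged, so the criterion here is a hypothetical one of ours: the relative standard deviation of several independent DOULION estimates must fall below a tolerance `tol`. The starting value `p0` is left to the caller because the slide's starting formula is not preserved; all names are illustrative.

```python
import random
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev

def count_triangles(edges):
    """Exact count via Node Iterator (each triangle seen at its 3 corners)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(1 for v in adj for a, b in combinations(adj[v], 2) if b in adj[a]) // 3

def doulion(edges, p, rng):
    """One DOULION estimate: keep each edge w.p. p, count, scale by 1/p^3."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3

def pick_p_by_doubling(edges, p0, runs=10, tol=0.05, seed=0):
    """Start at p0 and double p until repeated estimates agree (hypothetical test).

    Returns (p, mean estimate); stops at p = 1, where the count is exact.
    """
    rng = random.Random(seed)
    p = p0
    while True:
        estimates = [doulion(edges, p, rng) for _ in range(runs)]
        avg = mean(estimates)
        if p >= 1.0 or (avg > 0 and pstdev(estimates) / avg < tol):
            return p, avg
        p = min(1.0, 2 * p)
```

Any other concentration test (for example, comparing successive means) can be dropped into the `if` condition; the doubling loop itself stays the same.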
28
Bad Instances
  • Remove edge (1,2)
  • Remove any weighted edge, w sufficiently large
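One graph family consistent with the "Remove edge (1,2)" annotation (we assume the figure showed something like it) is a "book": an edge (1,2) plus many extra nodes adjacent to both endpoints, so every triangle contains (1,2). DOULION keeps that edge only with probability p, so with probability 1 - p the estimate is 0 no matter how many triangles there are, and no concentration is possible. The helper names are ours.

```python
from collections import defaultdict
from itertools import combinations

def count_triangles(edges):
    """Exact count via Node Iterator (each triangle seen at its 3 corners)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(1 for v in adj for a, b in combinations(adj[v], 2) if b in adj[a]) // 3

def book_graph(pages):
    """Nodes 1 and 2 joined by an edge, plus `pages` extra nodes adjacent to both.

    Every triangle contains the edge (1, 2).
    """
    edges = [(1, 2)]
    for k in range(3, 3 + pages):
        edges += [(1, k), (2, k)]
    return edges

g = book_graph(5)
without_12 = [e for e in g if e != (1, 2)]
before, after = count_triangles(g), count_triangles(without_12)  # 5, then 0
```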
29
Thanks!
  • http://www.cs.cmu.edu/~ctsourak/projects.html
  • Code and datasets available
  • graphminingtoolbox_at_gmail.com
  • HADOOP, MATLAB, and JAVA implementations, along with small real-world graphs; all datasets used are on the web
  • "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software environment and the complete set of instructions which generated the figures." (Buckheit and Donoho, BD95)

30
References
  • Efficient semi-streaming algorithms for local triangle counting in massive graphs. Becchetti, Boldi, Castillo, Gionis (BBCG08)
  • Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast. Ye, Peyser, Spencer, Bader (YPSB05)

31
References
  • Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Eckmann, Moses (EM02)

32
References
  • Fast Counting of Triangles in Large Real-World Networks: Algorithms and Laws. C. Tsourakakis (CET08)
  • WaveLab and reproducible research. Buckheit, Donoho (BD95)

33
References
  • Social Network Analysis: Methods and Applications. Wasserman, Faust (WF94)
  • Counting triangles in data streams. Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler (BFLSS06)

34
Doulion