Large Graph Mining: Power Tools and a Practitioner's Guide

Provided by: mathCmuEd
Learn more at: https://www.math.cmu.edu
1
Large Graph Mining: Power Tools and a
Practitioner's Guide
  • Christos Faloutsos
  • Gary Miller
  • Charalampos (Babis) Tsourakakis
  • CMU

2
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors, e.g., bipartite
    graphs
  • Walks of length k
  • Laplacian
  • Connected components
  • Intuition: adjacency vs. Laplacian
  • Cheeger inequality and sparsest cut
  • Derivation, intuition
  • Example
  • Normalized Laplacian

KDD'09
Faloutsos, Miller, Tsourakakis
P7-2
3
Matrix Representations of G(V,E)
  • Associate a matrix to a graph
  • Adjacency matrix
  • Laplacian
  • Normalized Laplacian

Main focus
4
Recall: Intuition
  • A as a vector transformation

[Figure: a vector x and its image under the matrix A]
5
Intuition
  • By defn., eigenvectors remain parallel to
    themselves (fixed points)

$A v_1 = \lambda_1 v_1$, with $\lambda_1 = 3.62$

6
Intuition
  • By defn., eigenvectors remain parallel to
    themselves (fixed points)
  • And orthogonal to each other

7
Keep in mind!
  • For the rest of the slides we consider square n×n
    symmetric matrices, i.e., $M = M^T$

9
Adjacency matrix
Undirected
[Figure: undirected graph on nodes 1-4 and its adjacency matrix A]
10
Adjacency matrix
Undirected Weighted
[Figure: undirected weighted graph on nodes 1-4 with edge weights 10, 4, 0.3 and 2, and its adjacency matrix A]
11
Adjacency matrix
Directed
[Figure: directed graph on nodes 1-4 and its adjacency matrix A]
Observation: if G is undirected, $A = A^T$
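A minimal Python sketch of the observation above: build the adjacency matrix of a small graph and check whether it equals its transpose. The edge list and the helper names `adjacency` and `is_symmetric` are illustrative assumptions; the slide's actual edges are not recoverable from the transcript.

```python
def adjacency(n, edges, directed=False):
    """Return the n x n adjacency matrix for 1-based node labels."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        if not directed:
            A[v - 1][u - 1] = 1  # an undirected edge is stored in both directions
    return A


def is_symmetric(A):
    """True when A equals its transpose."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


# Hypothetical 4-node edge list (assumed for illustration only).
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
A_undirected = adjacency(4, edges)
A_directed = adjacency(4, edges, directed=True)

print(is_symmetric(A_undirected), is_symmetric(A_directed))
```

Reading the same edge list as directed breaks the symmetry, which is why the spectral theorem on the next slides is stated for the undirected (symmetric) case.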
12
Spectral Theorem
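  • Theorem (Spectral Theorem)
  • If $M = M^T$, then $M = X \Lambda X^T$, with the eigenvectors $x_i$ as the columns of $X$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ (zeros off the diagonal)

Reminder 1: $x_i$, $x_j$ orthogonal
[Figure: orthogonal eigenvectors $x_1$ and $x_2$]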
13
Spectral Theorem
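  • Theorem (Spectral Theorem)
  • If $M = M^T$, then $M = X \Lambda X^T$, with the eigenvectors $x_i$ as the columns of $X$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ (zeros off the diagonal)

Reminder 2: $x_i$ is the i-th principal axis; $\lambda_i$ is the length of the i-th principal
axis
[Figure: principal axes labeled $\lambda_1$ and $\lambda_2$]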
15
Eigenvectors
  • Eigenvectors give groups
  • Specifically for bipartite graphs, we get each
    of the two sets of nodes
  • Details

16
Bipartite Graphs
Any graph with no cycles of odd length is
bipartite
[Figure: K3,3 with nodes 1, 2, 3 on one side and 4, 5, 6 on the other]
Q1: Can we check if a graph is bipartite
via its spectrum?
Q2: Can we get the partition of
the vertices into the two sets of nodes?
17
Bipartite Graphs
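Adjacency matrix of K3,3 (nodes 1, 2, 3 on one side; 4, 5, 6 on the other):
$A = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}$, where $B$ is the 3×3 all-ones matrix
Eigenvalues: $\Lambda = [3, -3, 0, 0, 0, 0]$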
18
Bipartite Graphs
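Adjacency matrix of K3,3 (nodes 1, 2, 3 on one side; 4, 5, 6 on the other):
$A = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}$, where $B$ is the 3×3 all-ones matrix
  • Why $\lambda_1 = -\lambda_2 = 3$? Recall: $Ax = \lambda x$, with $(\lambda, x)$ an
    eigenvalue-eigenvector pair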
19
Bipartite Graphs
[Figure: K3,3 with value 1 at every node; after multiplying by A, each node holds the sum of its three neighbors' values, 1 + 1 + 1 = 3 = 3 × 1]
Value at each node: e.g., enthusiasm about a product
20
Bipartite Graphs
[Figure: K3,3 with value 1 at every node; after multiplying by A, each node holds 3 = 3 × 1]
The all-ones vector remains unchanged (it just grows by $3 = \lambda_1$)
21
Bipartite Graphs
[Figure: K3,3 with value 1 at every node; after multiplying by A, each node holds 3 = 3 × 1]
Which other vector remains unchanged?
22
Bipartite Graphs
[Figure: K3,3 with value 1 at nodes 1, 2, 3 and value -1 at nodes 4, 5, 6; after multiplying by A, each of nodes 1, 2, 3 holds -1 - 1 - 1 = -3 = (-3) × 1]
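The two "unchanged" vectors on these slides can be checked directly. The sketch below builds the adjacency matrix of K3,3 (assuming nodes 1-3 on one side and 4-6 on the other, as in the figure) and multiplies it by the all-ones vector and by the ±1 vector. The helper name `matvec` is an illustrative choice.

```python
def matvec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# K3,3: indices 0-2 are nodes 1-3, indices 3-5 are nodes 4-6.
# Two nodes are adjacent exactly when they lie on different sides.
n = 6
A = [[1 if (i < 3) != (j < 3) else 0 for j in range(n)] for i in range(n)]

ones = [1] * n
u2 = [1, 1, 1, -1, -1, -1]

A_ones = matvec(A, ones)  # each node sums three neighbors holding 1
A_u2 = matvec(A, u2)      # each node sums three neighbors of the opposite sign

print(A_ones)  # 3 times the all-ones vector
print(A_u2)    # -3 times u2
```

The signs of `u2` split the nodes into the two sides of the bipartite graph.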
23
Bipartite Graphs
  • Observation: $u_2$ gives the partition of the nodes
    into the two sets S, V-S!

[Figure: nodes 1-6 split into the two sets S and V-S]
Question: Were we just lucky?
Answer: No
Theorem: $\lambda_2 = -\lambda_1$ iff G is bipartite. $u_2$ gives the
partition.
25
Walks
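  • A walk of length r in a directed graph: a sequence of nodes
    $v_0, v_1, \ldots, v_r$ with $(v_{i-1}, v_i) \in E$ for $i = 1, \ldots, r$, where a
    node can be used more than once.
  • Closed walk when $v_0 = v_r$

[Figure: two copies of a directed graph on nodes 1-4]
Walk of length 2: 2-1-4
Closed walk of length 3: 2-1-3-2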
26
Walks
  • Theorem: G(V,E) directed graph, adjacency matrix
    A. The number of walks from node u to node v in G
    with length r is $(A^r)_{uv}$
  • Proof: Induction on r. See Doyle-Snell, p. 165

27
Walks
  • Theorem: G(V,E) directed graph, adjacency matrix
    A. The number of walks from node u to node v in G
    with length r is $(A^r)_{uv}$
  • $A_{ij}$ counts edges $(i, j)$
  • $(A^2)_{ij}$ counts pairs $(i, i_1), (i_1, j)$
  • $(A^r)_{ij}$ counts sequences $(i, i_1), \ldots, (i_{r-1}, j)$
28
Walks
[Figure: the directed graph on nodes 1-4 next to a power of A, with entry $i = 2$, $j = 4$ highlighted]
29
Walks
[Figure: the directed graph on nodes 1-4 next to a power of A, with entry $i = 3$, $j = 3$ highlighted]
30
Walks
[Figure: the directed graph on nodes 1-4 next to a power of A]
Always 0: node 4 is a sink
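A small Python check of the walk-counting theorem, comparing entries of $A^r$ with brute-force enumeration. The edge list (2→1, 1→3, 3→2, 1→4) is an assumption pieced together from the example walks 2-1-4 and 2-1-3-2 and the remark that node 4 is a sink; the slide's full graph may differ. Helper names are illustrative.

```python
from itertools import product


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(A, r):
    """A raised to the r-th power by repeated multiplication."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(r):
        P = matmul(P, A)
    return P


def count_walks_brute(A, u, v, r):
    """Count walks of length r from u to v by trying every node sequence."""
    n = len(A)
    count = 0
    for middle in product(range(n), repeat=r - 1):
        path = (u, *middle, v)
        if all(A[a][b] for a, b in zip(path, path[1:])):
            count += 1
    return count


# Assumed directed edges (1-based labels), stored with 0-based indices.
edges = [(2, 1), (1, 3), (3, 2), (1, 4)]
A = [[0] * 4 for _ in range(4)]
for u, v in edges:
    A[u - 1][v - 1] = 1

A2 = matpow(A, 2)
A3 = matpow(A, 3)
print(A2[1][3])  # walks of length 2 from node 2 to node 4
print(A3[2][2])  # closed walks of length 3 at node 3
print(A3[3])     # row of the sink node 4
```

Brute force tries $n^{r-1}$ intermediate sequences, so it is only useful as a cross-check on tiny graphs.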
31
Walks
  • Corollary: If A is the adjacency matrix of an
    undirected G(V,E) (no self loops) with e edges and t
    triangles, then the following hold:
    a) trace(A) = 0
    b) trace($A^2$) = 2e
    c) trace($A^3$) = 6t

[Figure: example with nodes 1, 2, 3]
32
Walks
  • Corollary: If A is the adjacency matrix of an
    undirected G(V,E) (no self loops) with e edges and t
    triangles, then the following hold:
    a) trace(A) = 0
    b) trace($A^2$) = 2e
    c) trace($A^3$) = 6t

Computing $A^r$ may be expensive!
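The corollary can be tried on a toy graph. This sketch uses an assumed example (a triangle 1-2-3 with a pendant edge 3-4) and recovers the edge and triangle counts from the traces; the dense matrix products cost $O(n^3)$ each, which is the expense the slide warns about.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


# Hypothetical undirected graph: triangle 1-2-3 plus the pendant edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
n = 4
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u - 1][v - 1] = 1
    A[v - 1][u - 1] = 1

A2 = matmul(A, A)
A3 = matmul(A2, A)

num_edges = trace(A2) // 2      # each edge gives two closed walks of length 2
num_triangles = trace(A3) // 6  # each triangle gives six closed walks of length 3
print(trace(A), num_edges, num_triangles)
```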
33
Remark: virus propagation
  • The earlier result makes sense now
  • The higher the first eigenvalue, the more paths
    available ->
  • Easier for a virus to survive

35
Main upcoming result
  • the second eigenvector of the Laplacian (u2)
  • gives a good cut
  • Nodes with positive scores should go to one group
  • And the rest to the other

36
Laplacian
[Figure: graph on nodes 1-4 and its Laplacian]
$L = D - A$
D: diagonal matrix, $d_{ii} = d_i$
37
Weighted Laplacian
[Figure: weighted graph on nodes 1-4 with edge weights 10, 4, 0.3 and 2, and its weighted Laplacian]
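A minimal sketch of $L = D - A$ for a weighted graph, where the degree $d_i$ is taken as the sum of the weights on node i's edges. The weights 10, 4, 0.3 and 2 come from the figure labels, but which edge carries which weight is assumed here, as is the helper name `laplacian`.

```python
def laplacian(W):
    """L = D - W, where D is diagonal with d_ii = sum of row i of W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]


# Assumed placement of the weights on a 4-node cycle 1-2-3-4-1.
weighted_edges = [(1, 4, 10), (1, 2, 4), (2, 3, 2), (3, 4, 0.3)]
n = 4
W = [[0.0] * n for _ in range(n)]
for u, v, w in weighted_edges:
    W[u - 1][v - 1] = w
    W[v - 1][u - 1] = w

L = laplacian(W)
for row in L:
    print(row)
```

Every row of L sums to zero, so the all-ones vector is always an eigenvector with eigenvalue 0.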
39
Connected Components
  • Lemma: Let G be a graph with n vertices and c
    connected components. If L is the Laplacian of G,
    then rank(L) = n - c.
  • Proof: see p. 279, Godsil-Royle

40
Connected Components
[Figure: graph G(V,E) on nodes 1-7, its Laplacian L, and eig(L)]
#zeros = #components
41
Connected Components
[Figure: graph G(V,E) on nodes 1-7 with an edge of weight 0.01, its Laplacian L, and eig(L)]
#zeros = #components
A near-zero eigenvalue indicates a good cut
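The lemma rank(L) = n - c gives a way to count components without an eigensolver. The sketch below computes the rank by exact Gaussian elimination over fractions. The 7-node graph (a 4-cycle and a triangle) is an assumed stand-in for the figure, and the helper names are illustrative.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Unweighted Laplacian L = D - A for 1-based node labels."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u - 1][u - 1] += 1
        L[v - 1][v - 1] += 1
        L[u - 1][v - 1] -= 1
        L[v - 1][u - 1] -= 1
    return L


def rank(M):
    """Matrix rank via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    found = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


# Hypothetical graph with two components: {1,2,3,4} and {5,6,7}.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6), (6, 7), (7, 5)]
n = 7
components = n - rank(laplacian(n, edges))
print(components)
```

Adding a single edge between the two groups, for example (4, 5), raises the rank by one and leaves one component.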
43
Adjacency vs. Laplacian: Intuition
details
Let x be an indicator vector: $x_i = 1$ if $i \in S$, $x_i = 0$ if $i \in V-S$
Consider now $y = Lx$
k-th coordinate: $y_k = d_k x_k - \sum_{j:(k,j) \in E} x_j$
44
Adjacency vs. Laplacian: Intuition
details
[Figure: random graph $G_{30,0.5}$ with the set S and a node k marked]
Consider now $y = Lx$
45
Adjacency vs. Laplacian: Intuition
details
[Figure: random graph $G_{30,0.5}$ with the set S and a node k marked]
Consider now $y = Lx$
46
Adjacency vs. Laplacian: Intuition
details
[Figure: random graph $G_{30,0.5}$ with the set S and two nodes k marked]
Consider now $y = Lx$
Laplacian: connectivity; Adjacency: paths
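To make the "Laplacian: connectivity" reading concrete, the sketch below computes $y = Lx$ for an indicator vector x on a small assumed graph (two triangles joined by one edge; the slides use a random $G_{30,0.5}$ instead). For a node k inside S, $y_k$ is its number of neighbors outside S; for a node outside S, $y_k$ is minus its number of neighbors inside S.

```python
def laplacian(n, edges):
    """Unweighted Laplacian L = D - A for 1-based node labels."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u - 1][u - 1] += 1
        L[v - 1][v - 1] += 1
        L[u - 1][v - 1] -= 1
        L[v - 1][u - 1] -= 1
    return L


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by the edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
n = 6
S = {1, 2, 3}

x = [1 if k in S else 0 for k in range(1, n + 1)]  # indicator vector of S
y = matvec(laplacian(n, edges), x)
print(y)  # non-zero only at the endpoints of the crossing edge
```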
48
Why Sparse Cuts?
  • Clustering, Community Detection
  • And more: telephone network design, VLSI layout,
    sparse Gaussian elimination, parallel computation

[Figure: graph on nodes 1-9 with a cut marked]
49
Quality of a Cut
  • Isoperimetric number $\phi$ of a cut S:
    $\phi(S) = \frac{\#\text{edges across}}{\#\text{nodes in smallest partition}}$

[Figure: example graph on nodes 1-4]
50
Quality of a Cut
  • Isoperimetric number $\phi$ of a graph = score of best
    cut: $\phi(G) = \min_S \phi(S)$

[Figure: example graph on nodes 1-4 and its best cut]
51
Quality of a Cut
  • Isoperimetric number $\phi$ of a graph = score of best
    cut

Best cut: hard to find, BUT
Cheeger's inequality gives
bounds; $\lambda_2$ plays a major role
[Figure: example graph on nodes 1-4]
Let's see the intuition behind $\lambda_2$
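A brute-force sketch of the isoperimetric number, to show why the best cut is hard to find: it scores every subset S with at most n/2 nodes, which is exponential in n. The graph (two triangles joined by one edge) and the function names are assumptions for illustration.

```python
from itertools import combinations


def phi_of_cut(n, edges, S):
    """#edges across the cut divided by #nodes in the smaller side."""
    across = sum(1 for u, v in edges if (u in S) != (v in S))
    return across / min(len(S), n - len(S))


def isoperimetric_number(n, edges):
    """Return (best score, best S) over all non-trivial cuts."""
    best_score, best_S = None, None
    # S and its complement score the same, so sizes up to n // 2 suffice.
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(1, n + 1), size):
            score = phi_of_cut(n, edges, set(subset))
            if best_score is None or score < best_score:
                best_score, best_S = score, set(subset)
    return best_score, best_S


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by the edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
best_score, best_S = isoperimetric_number(6, edges)
print(best_score, best_S)
```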
52
Laplacian and cuts - overview
  • A cut corresponds to an indicator vector (i.e.,
    0/1 scores for each node)
  • Relaxing the 0/1 scores to real numbers
    eventually gives an alternative definition of the
    eigenvalues and eigenvectors

53
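Why $\lambda_2$?
Characteristic vector x: $x_i = 1$ if $i \in S$, $x_i = 0$ if $i \in V-S$
Then $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$ = #edges across the cut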
54
Why $\lambda_2$?
[Figure: graph on nodes 1-9 with the cut between S and V-S marked]
$x = [1,1,1,1,0,0,0,0,0]^T$
$x^T L x = 2$
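A sketch that reproduces $x^T L x = 2$ on a 9-node graph with S = {1, 2, 3, 4}. The edge list is hypothetical (the figure's edges are not in the transcript); it is only chosen so that exactly two edges cross the cut. The quadratic form is computed from L and compared with a direct count of crossing edges.

```python
def laplacian(n, edges):
    """Unweighted Laplacian L = D - A for 1-based node labels."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u - 1][u - 1] += 1
        L[v - 1][v - 1] += 1
        L[u - 1][v - 1] -= 1
        L[v - 1][u - 1] -= 1
    return L


def quadratic_form(L, x):
    """x^T L x for a list-of-lists matrix."""
    n = len(L)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


# Assumed edges: a dense group {1..4}, a dense group {5..9}, two crossing edges.
inside_S = [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)]
outside_S = [(5, 6), (6, 7), (7, 8), (8, 9), (5, 9), (6, 8)]
crossing = [(4, 5), (3, 6)]
edges = inside_S + outside_S + crossing

x = [1, 1, 1, 1, 0, 0, 0, 0, 0]
L = laplacian(9, edges)

xLx = quadratic_form(L, x)
cut_edges = sum((x[u - 1] - x[v - 1]) ** 2 for u, v in edges)
print(xLx, cut_edges)
```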
55
Why $\lambda_2$?
details
Ratio cut
Sparsest ratio cut: NP-hard
Relax the constraint, then normalize: ?
56
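Why $\lambda_2$?
details
Sparsest ratio cut: NP-hard
Relax the constraint, then normalize: $\lambda_2$,
because of the Courant-Fischer theorem (applied to
L): $\lambda_2 = \min_{x \perp \mathbf{1},\, x \neq 0} \frac{x^T L x}{x^T x}$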
57
Why $\lambda_2$?
OSCILLATE
[Figure: oscillating balls at positions $x_1, \ldots, x_n$]
Each ball: 1 unit of mass
Dfn of eigenvector: $L x = \lambda x$
Matrix viewpoint
58
Why $\lambda_2$?
OSCILLATE
[Figure: oscillating balls at positions $x_1, \ldots, x_n$]
Each ball: 1 unit of mass
Equation labels: force due to neighbors, displacement, Hooke's constant
Physics viewpoint
59
Why $\lambda_2$?
OSCILLATE
[Figure: oscillating balls at positions $x_1, \ldots, x_n$, with a plot of eigenvector value against node id]
Each ball: 1 unit of mass
For the first eigenvector: all nodes have the same
displacement (= value)
60
Why $\lambda_2$?
OSCILLATE
[Figure: oscillating balls at positions $x_1, \ldots, x_n$, with a plot of eigenvector value against node id]
Each ball: 1 unit of mass
61
Why $\lambda_2$?
Fundamental mode of vibration along the
separator
62
Cheeger Inequality
$\frac{\phi^2}{2 d_{\max}} \le \lambda_2 \le 2\phi$
  • $\phi$: score of best cut (hard to compute)
  • $\lambda_2$: 2nd smallest eigenvalue (easy to compute)
  • $d_{\max}$: max degree
63
Cheeger Inequality and graph partitioning
heuristic
  • Step 1: Sort vertices in non-decreasing order
    according to their score in the second
    eigenvector
  • Step 2: Decide where to cut. Two common heuristics:
  • Bisection
  • Best ratio cut
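The two steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the tutorial's implementation: the second eigenvector is approximated by power iteration on $cI - L$ (with $c = 2 d_{\max}$, kept orthogonal to the all-ones vector), which assumes a connected graph whose $\lambda_2$ is a simple eigenvalue. The graph and all function names are hypothetical. Step 2 uses the best ratio cut rule.

```python
import math


def laplacian(n, edges):
    """Unweighted Laplacian L = D - A for 1-based node labels."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u - 1][u - 1] += 1
        L[v - 1][v - 1] += 1
        L[u - 1][v - 1] -= 1
        L[v - 1][u - 1] -= 1
    return L


def second_eigenvector(L, iterations=500):
    """Approximate the eigenvector of the 2nd smallest eigenvalue of L."""
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))  # c is at least the largest eigenvalue
    x = [float(i) for i in range(n)]        # deterministic starting vector
    for _ in range(iterations):
        x = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean = sum(x) / n
        x = [xi - mean for xi in x]         # remove the all-ones component
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    return x


def best_ratio_cut(n, edges, scores):
    """Step 1: sort nodes by score. Step 2: pick the best prefix cut."""
    order = sorted(range(1, n + 1), key=lambda v: scores[v - 1])
    best_phi, best_S = None, None
    for k in range(1, n):  # bisection would simply take k = n // 2
        S = set(order[:k])
        across = sum(1 for u, v in edges if (u in S) != (v in S))
        phi = across / min(k, n - k)
        if best_phi is None or phi < best_phi:
            best_phi, best_S = phi, S
    return best_phi, best_S


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by the edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
n = 6
L = laplacian(n, edges)
u2 = second_eigenvector(L)
# Rayleigh quotient of the unit vector u2 approximates lambda_2.
lambda2 = sum(u2[i] * L[i][j] * u2[j] for i in range(n) for j in range(n))
best_phi, best_S = best_ratio_cut(n, edges, u2)
print(lambda2, best_phi, best_S)
```

Only n - 1 prefix cuts are scored, compared with the exponentially many subsets a brute-force search would need. In practice a sparse eigensolver would replace the power iteration.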
65
Example: Spectral Partitioning
  • K500
  • K500

dumbbell graph
A = zeros(1000);
A(1:500,1:500) = ones(500) - eye(500);
A(501:1000,501:1000) = ones(500) - eye(500);
myrandperm = randperm(1000);
B = A(myrandperm,myrandperm);
In social network analysis, such clusters are
called communities
66
Example: Spectral Partitioning
  • This is how the adjacency matrix B looks

spy(B)
67
Example: Spectral Partitioning
  • This is what the 2nd eigenvector of B looks like.

L = diag(sum(B)) - B;
[u, v] = eigs(L, 2, 'SM');
plot(u(:,1), 'x')
Not so much information yet
68
Example: Spectral Partitioning
  • This is how the 2nd eigenvector looks if we sort
    it.

[ign, ind] = sort(u(:,1));
plot(u(ind), 'x')
But now we see the two communities!
69
Example: Spectral Partitioning
  • This is how the adjacency matrix B looks now

spy(B(ind,ind))
[Figure: reordered matrix with blocks labeled Community 1 and Community 2, and "Cut here!" between them]
Observation: both heuristics are equivalent for
the dumbbell
71
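Why Normalized Laplacian
  • K500
  • K500

[Figure: two K500 cliques joined by the only weighted edge, with two candidate cuts marked "Cut here"]
$\phi$ of one cut > $\phi$ of the other
So, $\phi$ is not good here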
72
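Why Normalized Laplacian
  • K500
  • K500

[Figure: two K500 cliques joined by the only weighted edge, with two candidate cuts marked "Cut here"]
$\phi$ of one cut > $\phi$ of the other
Optimize the Cheeger constant h(G), which favors balanced cuts:
$h(G) = \min_S \frac{\#\text{edges across}}{\min(\mathrm{vol}(S), \mathrm{vol}(V-S))}$,
where $\mathrm{vol}(S) = \sum_{i \in S} d_i$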
73
Extensions
  • Normalized Laplacian
  • Ng, Jordan, Weiss: spectral clustering
  • Laplacian Eigenmaps for manifold learning
  • Computer vision and many more applications

Standard reference: Spectral Graph
Theory, monograph by Fan Chung Graham
74
Conclusions
  • Spectrum tells us a lot about the graph
  • Adjacency: paths
  • Laplacian: sparse cut
  • Normalized Laplacian: normalized cuts, which tend to
    avoid unbalanced cuts

75
References
  • Fan R. K. Chung: Spectral Graph Theory (AMS)
  • Chris Godsil and Gordon Royle: Algebraic Graph
    Theory (Springer)
  • Bojan Mohar and Svatopluk Poljak: Eigenvalues in
    Combinatorial Optimization, IMA Preprint Series
    939
  • Gilbert Strang: Introduction to Applied
    Mathematics (Wellesley-Cambridge Press)