Large Graph Mining: Power Tools and a Practitioner's Guide

1
Large Graph Mining: Power Tools and a Practitioner's Guide

Christos Faloutsos
Gary Miller
Charalampos (Babis) Tsourakakis
CMU

2
Outline

Reminders
Adjacency matrix
Intuition behind eigenvectors: e.g., bipartite graphs
Walks of length k
Laplacian
Connected Components
Intuition: Adjacency vs. Laplacian
Cheeger Inequality and Sparsest Cut
Derivation, intuition
Example
Normalized Laplacian

KDD'09
Faloutsos, Miller, Tsourakakis
P7-2
3
Matrix Representations of G(V,E)

Associate a matrix to a graph
Adjacency matrix
Laplacian
Normalized Laplacian

Main focus
4
Recall: Intuition

A as vector transformation

[Figure: the matrix A maps a vector x to a new vector x' = Ax]
5
Intuition

By defn., eigenvectors remain parallel to themselves (fixed points):

$A v_1 = \lambda_1 v_1$, with $\lambda_1 = 3.62$
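As a rough illustration of the fixed-point idea, the sketch below runs power iteration on an assumed 2 x 2 symmetric matrix. The matrix `[[2, 1], [1, 3]]` is an assumption (it is one symmetric matrix whose largest eigenvalue is about 3.62), and the helper name `power_iteration` is hypothetical.

```python
import math

def power_iteration(A, iters=100):
    """Estimate the dominant eigenpair of a small square matrix A."""
    n = len(A)
    x = [1.0] * n  # arbitrary nonzero start vector
    lam = 0.0
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = A x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]  # keep unit length
        # Rayleigh quotient x^T A x for the unit vector x
        lam = sum(x[i] * sum(A[i][j] * x[j] for j in range(n)) for i in range(n))
    return lam, x

A = [[2.0, 1.0], [1.0, 3.0]]  # assumed example matrix
lam1, v1 = power_iteration(A)
Av1 = [sum(A[i][j] * v1[j] for j in range(2)) for i in range(2)]
print(round(lam1, 2))  # 3.62
print(Av1, [lam1 * v for v in v1])  # A v1 and lambda1 * v1 agree
```

Repeatedly applying A pulls any generic start vector toward the direction that A leaves unchanged, which is why the loop settles on $v_1$.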

6
Intuition

By defn., eigenvectors remain parallel to themselves (fixed points)
and orthogonal to each other

7
Keep in mind!

For the rest of the slides we will be talking about square n x n matrices that are symmetric, i.e., $M = M^T$

9
Adjacency matrix
Undirected
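[Figure: an undirected graph on nodes 1-4 and its adjacency matrix A]
10
Adjacency matrix
Undirected Weighted
[Figure: an undirected weighted graph on nodes 1-4, with edge weights 10, 4, 0.3 and 2, and its adjacency matrix A]
11
Adjacency matrix
Directed
[Figure: a directed graph on nodes 1-4 and its adjacency matrix A]
Observation: If G is undirected, $A = A^T$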
12
Spectral Theorem

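Theorem (Spectral Theorem)
If $M = M^T$, then $M = X \Lambda X^T = \sum_{i=1}^{n} \lambda_i x_i x_i^T$, where $X = [x_1, \dots, x_n]$ holds the eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is zero off the diagonal

Reminder 1: $x_i$, $x_j$ orthogonal
[Figure: orthogonal axes $x_1$ and $x_2$]
13
Spectral Theorem

Theorem (Spectral Theorem)
If $M = M^T$, then $M = X \Lambda X^T = \sum_{i=1}^{n} \lambda_i x_i x_i^T$, where $X = [x_1, \dots, x_n]$ holds the eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is zero off the diagonal

Reminder 2: $x_i$ is the i-th principal axis, $\lambda_i$ the length of the i-th principal axis
[Figure: principal axes with lengths $\lambda_1$ and $\lambda_2$]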
15
Eigenvectors

Give groups
Specifically, for bipartite graphs, we get each of the two sets of nodes
Details

16
Bipartite Graphs
Any graph with no cycles of odd length is bipartite
[Figure: K3,3 with nodes 1, 2, 3 on one side and 4, 5, 6 on the other]
Q1: Can we check if a graph is bipartite via its spectrum?
Q2: Can we get the partition of the vertices into the two sets of nodes?
17
Bipartite Graphs
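Adjacency matrix of K3,3:
$A = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}$, where $B$ is the 3 x 3 all-ones matrix
[Figure: K3,3 with nodes 1, 2, 3 on one side and 4, 5, 6 on the other]
Eigenvalues: $\Lambda = [3, -3, 0, 0, 0, 0]$
18
Bipartite Graphs
Adjacency matrix of K3,3:
$A = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}$, where $B$ is the 3 x 3 all-ones matrix
[Figure: K3,3 with nodes 1, 2, 3 on one side and 4, 5, 6 on the other]

Why $\lambda_1 = -\lambda_2 = 3$? Recall: $Ax = \lambda x$, with $(\lambda, x)$ an eigenvalue-eigenvector pair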
19
Bipartite Graphs
[Figure: K3,3 with the value 1 at every node; multiplying by A, each node sums its three neighbors' values: 1 + 1 + 1 = 3 = 3 x 1]
Value at each node: e.g., enthusiasm about a product
20
Bipartite Graphs
[Figure: K3,3 with the value 1 at every node, mapped by A to 3 = 3 x 1 at every node]
The all-ones vector remains unchanged (it just grows by $3 = \lambda_1$)
21
Bipartite Graphs
[Figure: K3,3 with the value 1 at every node, mapped by A to 3 = 3 x 1 at every node]
Which other vector remains unchanged?
22
Bipartite Graphs
[Figure: K3,3 with the value 1 at nodes 1, 2, 3 and -1 at nodes 4, 5, 6; multiplying by A, node 1 sums its neighbors' values: -1 - 1 - 1 = -3 = (-3) x 1]
23
Bipartite Graphs

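Observation: $u_2$ gives the partition of the nodes into the two sets S, V-S!

[Figure: the six nodes of K3,3 split into the two sides S and V-S]
Question: Were we just lucky?
Answer: No
Theorem: $\lambda_2 = -\lambda_1$ iff G is bipartite (for connected G, with $\lambda_2$ the most negative eigenvalue, as in the K3,3 example). $u_2$ gives the partition.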
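The K3,3 observations can be checked directly. This is a minimal sketch in plain Python; the helper `matvec` and the variable names are illustrative choices, and nodes are stored 0-indexed but reported with the slide's 1-6 labels.

```python
# Adjacency matrix of K3,3: nodes 0-2 on one side, 3-5 on the other (0-indexed).
n = 6
A = [[1 if (i < 3) != (j < 3) else 0 for j in range(n)] for i in range(n)]

def matvec(M, x):
    """Plain matrix-vector product M x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

ones = [1] * n
u2 = [1, 1, 1, -1, -1, -1]

A_ones = matvec(A, ones)  # equals 3 * ones, so lambda_1 = 3
A_u2 = matvec(A, u2)      # equals -3 * u2, so lambda_2 = -3

# The sign pattern of u2 recovers the two sides (reported with 1-based labels).
side_S = [i + 1 for i in range(n) if u2[i] > 0]
side_rest = [i + 1 for i in range(n) if u2[i] < 0]
print(A_ones, A_u2)       # [3, 3, 3, 3, 3, 3] [-3, -3, -3, 3, 3, 3]
print(side_S, side_rest)  # [1, 2, 3] [4, 5, 6]
```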
25
Walks

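A walk of length r in a directed graph is a sequence of nodes $v_0, v_1, \dots, v_r$ with $(v_{i-1}, v_i) \in E$ for every step, where a node can be used more than once.
Closed walk: when $v_0 = v_r$

[Figure: two copies of a directed graph on nodes 1-4, one for each example]
Closed walk of length 3: 2-1-3-2
Walk of length 2: 2-1-4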
26
Walks

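Theorem: Let G(V,E) be a directed graph with adjacency matrix A. The number of walks from node u to node v in G with length r is $(A^r)_{uv}$.
Proof: Induction on r. See Doyle-Snell, p. 165

27
Walks

Theorem: Let G(V,E) be a directed graph with adjacency matrix A. The number of walks from node u to node v in G with length r is $(A^r)_{uv}$.

$A_{ij}$ counts $(i,j)$
$(A^2)_{ij}$ counts $(i,i_1),(i_1,j)$
$(A^r)_{ij}$ counts $(i,i_1),\dots,(i_{r-1},j)$
28
Walks
[Figure: the directed graph on nodes 1-4 next to a power of A, with the entry i = 2, j = 4 highlighted]
29
Walks
[Figure: the directed graph on nodes 1-4 next to a power of A, with the entry i = 3, j = 3 highlighted]
30
Walks
[Figure: the directed graph on nodes 1-4 next to a power of A, with the entries for node 4 highlighted]
Always 0: node 4 is a sink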
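The walk-counting theorem can be tried on a small directed graph. The edge list below is an assumption: it contains only the edges implied by the walks named on the slides (2-1-3-2 and 2-1-4), so the graph in the figure may have more. The helpers `matmul` and `matpow` are illustrative names.

```python
def matmul(X, Y):
    """Product of two square integer matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, r):
    """A raised to the r-th power by repeated multiplication."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(r):
        R = matmul(R, A)
    return R

# Assumed directed edges (1-indexed): 2->1, 1->3, 3->2, 1->4; node 4 has no out-edges.
edges = [(2, 1), (1, 3), (3, 2), (1, 4)]
n = 4
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u - 1][v - 1] = 1

A2, A3 = matpow(A, 2), matpow(A, 3)
walks_2_to_4 = A2[1][3]   # walks of length 2 from node 2 to node 4 (2-1-4)
closed_3_at_3 = A3[2][2]  # closed walks of length 3 at node 3 (3-2-1-3)
rows_of_sink = [matpow(A, r)[3] for r in range(1, 5)]  # row of node 4 in A^1..A^4
print(walks_2_to_4, closed_3_at_3)  # 1 1
print(rows_of_sink)                 # all zeros: no walk ever leaves the sink
```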
31
Walks

Corollary: Let A be the adjacency matrix of an undirected graph G(V,E) (no self loops) with e edges and t triangles. Then the following hold:
a) trace(A) = 0
b) trace($A^2$) = 2e
c) trace($A^3$) = 6t

[Figure: three small examples, on nodes {1}, {1, 2}, and {1, 2, 3}]
32
Walks

Corollary: Let A be the adjacency matrix of an undirected graph G(V,E) (no self loops) with e edges and t triangles. Then the following hold:
a) trace(A) = 0
b) trace($A^2$) = 2e
c) trace($A^3$) = 6t

Computing $A^r$ may be expensive!
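A quick sanity check of the corollary on a hypothetical graph: a triangle with one extra pendant edge, so e = 4 and t = 1. The graph and the helper names are assumptions chosen for illustration.

```python
def matmul(X, Y):
    """Product of two square integer matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# Hypothetical undirected graph: triangle 0-1-2 plus the pendant edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n, e, t = 4, len(edges), 1
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1  # symmetric, no self loops

A2 = matmul(A, A)
A3 = matmul(A2, A)
print(trace(A), trace(A2), trace(A3))  # 0 8 6, i.e. 0, 2e, 6t
```

Each edge contributes two closed walks of length 2 (one from each endpoint), and each triangle contributes six closed walks of length 3 (three starting nodes times two directions).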
33
Remark virus propagation

The earlier result makes sense now
The higher the first eigenvalue, the more paths available ->
easier for a virus to survive

35
Main upcoming result

The second eigenvector of the Laplacian ($u_2$) gives a good cut:
Nodes with positive scores should go to one group
And the rest to the other

36
Laplacian
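[Figure: an undirected graph on nodes 1-4 and its Laplacian L]
$L = D - A$
D: diagonal matrix, $d_{ii} = d_i$ (the degree of node i)
37
Weighted Laplacian
[Figure: an undirected weighted graph on nodes 1-4, with edge weights 10, 4, 0.3 and 2, and its weighted Laplacian]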
39
Connected Components

Lemma: Let G be a graph with n vertices and c connected components. If L is the Laplacian of G, then rank(L) = n - c.
Proof: see p. 279, Godsil-Royle

40
Connected Components
[Figure: a graph G(V,E) on nodes 1-7, its Laplacian L, and eig(L)]
#zeros = #components
41
Connected Components
[Figure: the graph G(V,E) on nodes 1-7 with the value 0.01 marked, its Laplacian L, and eig(L)]
#zeros = #components
Indicates a good cut
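The lemma rank(L) = n - c can be verified exactly on a small graph. The 7-node graph below (a triangle and a 4-cycle) is hypothetical, not the one in the figure; `laplacian`, `rank` and `count_components` are illustrative helpers, with exact arithmetic via `fractions.Fraction` to avoid floating-point rank issues.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Build L = D - A for an undirected, unweighted graph on nodes 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def rank(M):
    """Rank via Gaussian elimination over exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def count_components(n, edges):
    """Count connected components with a small union-find."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(a) for a in range(n)})

# Hypothetical graph: triangle {0,1,2} and 4-cycle {3,4,5,6}, so c = 2.
n = 7
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (5, 6), (6, 3)]
L = laplacian(n, edges)
c = count_components(n, edges)
print(rank(L), n - c)  # 5 5
```

Since L is symmetric, n - rank(L) is also the number of zero eigenvalues, which matches the "#zeros = #components" reading of eig(L).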
43
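Adjacency vs. Laplacian: Intuition
details
Let x be an indicator vector: $x_i = 1$ if $i \in S$, $x_i = 0$ if $i \in V-S$
Consider now $y = Lx$ and its k-th coordinate
44
Adjacency vs. Laplacian: Intuition
details
[Figure: a random graph $G_{30,0.5}$ with the set S and a node k marked]
Consider now $y = Lx$
45
Adjacency vs. Laplacian: Intuition
details
[Figure: a random graph $G_{30,0.5}$ with the set S and a node k marked]
Consider now $y = Lx$
46
Adjacency vs. Laplacian: Intuition
details
[Figure: a random graph $G_{30,0.5}$ with the set S and two nodes k marked]
Consider now $y = Lx$
Laplacian: connectivity; Adjacency: paths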
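The k-th coordinate can be worked out from $L = D - A$. The following is a sketch for an unweighted, undirected graph, where $N(k)$ (the set of neighbors of k) is notation introduced here rather than taken from the slides:

```latex
y_k = (Lx)_k = d_k x_k - \sum_{j \in N(k)} x_j =
\begin{cases}
  \phantom{-}|N(k) \cap (V-S)| & \text{if } k \in S \\
  -|N(k) \cap S| & \text{if } k \in V-S
\end{cases}
\qquad \text{whereas} \qquad
(Ax)_k = \sum_{j \in N(k)} x_j = |N(k) \cap S|
```

So $|y_k|$ counts the edges at k that cross between S and V-S (connectivity), while $(Ax)_k$ counts the one-step walks from k into S (paths).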
48
Why Sparse Cuts?

Clustering, Community Detection
And more: Telephone Network Design, VLSI layout, Sparse Gaussian Elimination, Parallel Computation

[Figure: a graph on nodes 1-9 with a cut separating it into two parts]
49
Quality of a Cut

Isoperimetric number $\phi$ of a cut S:

$\phi(S) = \frac{\#\,\text{edges across}}{\#\,\text{nodes in smallest partition}}$
[Figure: example graph on nodes 1-4]
50
Quality of a Cut

Isoperimetric number $\phi$ of a graph = score of best cut:

$\phi(G) = \min_S \phi(S)$
[Figure: example graph on nodes 1-4 with the resulting value of $\phi$]
51
Quality of a Cut

Isoperimetric number $\phi$ of a graph = score of best cut

Best cut: hard to find, BUT Cheeger's inequality gives bounds
$\lambda_2$: plays major role
[Figure: example graph on nodes 1-4]
Let's see the intuition behind $\lambda_2$
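For very small graphs the isoperimetric number can be found by brute force, which also shows why the best cut is hard to find in general: the search visits exponentially many subsets. The graph below (two triangles joined by one edge) and the function names are hypothetical.

```python
from itertools import combinations

def phi_of_cut(S, n, edges):
    """Edges across the cut divided by the size of the smaller side."""
    S = set(S)
    across = sum(1 for u, v in edges if (u in S) != (v in S))
    return across / min(len(S), n - len(S))

def isoperimetric_number(n, edges):
    """Exhaustive search over all cuts; only feasible for tiny n."""
    best_phi, best_S = None, None
    # Sides of size up to n // 2 suffice, since a cut and its complement score the same.
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            p = phi_of_cut(S, n, edges)
            if best_phi is None or p < best_phi:
                best_phi, best_S = p, S
    return best_phi, best_S

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
n = 6
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi, S = isoperimetric_number(n, edges)
print(phi, S)  # one edge across, three nodes on the smaller side
```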
52
Laplacian and cuts - overview

A cut corresponds to an indicator vector (i.e., 0/1 scores for each node)
Relaxing the 0/1 scores to real numbers eventually gives an alternative definition of the eigenvalues and eigenvectors

53
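Why $\lambda_2$?
Characteristic vector x: $x_i = 1$ if $i \in S$, $x_i = 0$ if $i \in V-S$
Then $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$ = number of edges across the cut
54
Why $\lambda_2$?
[Figure: a graph on nodes 1-9 with a cut separating S = {1, 2, 3, 4} from V-S = {5, 6, 7, 8, 9}]
$x = [1,1,1,1,0,0,0,0,0]^T$
$x^T L x = 2$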
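The identity between the quadratic form and the cut size is easy to check numerically. The 9-node edge list below is hypothetical (the slide's graph is not recoverable); it is chosen so that exactly two edges cross between {1, 2, 3, 4} and {5, ..., 9}.

```python
def laplacian(n, edges):
    """Build L = D - A for an undirected, unweighted graph on nodes 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def quad_form(L, x):
    """Compute x^T L x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical edges with 1-based labels; (4, 5) and (3, 6) are the two crossing edges.
edges_1idx = [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3),
              (5, 6), (6, 7), (7, 8), (8, 9), (5, 9),
              (4, 5), (3, 6)]
edges = [(u - 1, v - 1) for u, v in edges_1idx]

x = [1, 1, 1, 1, 0, 0, 0, 0, 0]  # characteristic vector of S = {1, 2, 3, 4}
L = laplacian(9, edges)
cut_edges = sum(1 for u, v in edges if x[u] != x[v])
xLx = quad_form(L, x)
print(xLx, cut_edges)  # 2 2
```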
55
Why $\lambda_2$?
details
Ratio cut
Sparsest ratio cut: NP-hard
Relax the constraint
Normalize
= ?
56
Why $\lambda_2$?
details
Sparsest ratio cut: NP-hard
Relax the constraint
Normalize
$\lambda_2$, because of the Courant-Fischer theorem (applied to L)
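One standard way to fill in these steps is sketched below. It assumes the ratio cut of S is defined as $|E(S, V-S)| / (|S|\,|V-S|)$; that definition and the normalization $\sum_i x_i = 0$ are assumptions here, so constants may differ from the slide's own formulas.

```latex
\begin{aligned}
\min_{\emptyset \ne S \subsetneq V} \frac{|E(S, V-S)|}{|S|\,|V-S|}
  &= \min_{x \in \{0,1\}^n,\ x \ne 0,\ x \ne \mathbf{1}}
     \frac{x^T L x}{\sum_{i<j} (x_i - x_j)^2}
  && \text{(sparsest ratio cut, NP-hard)} \\
  &\ge \min_{x \in \mathbb{R}^n,\ x \notin \mathrm{span}(\mathbf{1})}
     \frac{x^T L x}{\sum_{i<j} (x_i - x_j)^2}
  && \text{(relax the constraint)} \\
  &= \min_{x \perp \mathbf{1},\ x \ne 0} \frac{x^T L x}{n\, x^T x}
  && \text{(normalize: } \textstyle\sum_i x_i = 0\text{)} \\
  &= \frac{\lambda_2}{n}
  && \text{(Courant-Fischer applied to } L\text{)}
\end{aligned}
```

The first line uses $x^T L x = |E(S, V-S)|$ and $\sum_{i<j} (x_i - x_j)^2 = |S|\,|V-S|$ for an indicator vector; the third uses $\sum_{i<j} (x_i - x_j)^2 = n \sum_i x_i^2 - (\sum_i x_i)^2$ together with the fact that the ratio does not change when a constant is added to every $x_i$.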
57
Why $\lambda_2$?
[Figure: balls $x_1, \dots, x_n$ that OSCILLATE]
Each ball: 1 unit of mass
Dfn of eigenvector: $Lx = \lambda x$
Matrix viewpoint
58
Why $\lambda_2$?
[Figure: balls $x_1, \dots, x_n$ that OSCILLATE]
Each ball: 1 unit of mass
$Lx = \lambda x$: $Lx$ is the force due to neighbors, $x$ the displacement, $\lambda$ Hooke's constant
Physics viewpoint
59
Why $\lambda_2$?
[Figure: balls $x_1, \dots, x_n$ that OSCILLATE, with a plot of eigenvector value against node id]
Each ball: 1 unit of mass
For the first eigenvector: all nodes have the same displacement (= value)
60
Why $\lambda_2$?
[Figure: balls $x_1, \dots, x_n$ that OSCILLATE, with a plot of eigenvector value against node id]
Each ball: 1 unit of mass
61
Why $\lambda_2$?
Fundamental mode of vibration: along the separator
62
Cheeger Inequality
$\frac{\phi^2}{2 d_{\max}} \le \lambda_2 \le 2\phi$
$\phi$: score of best cut (hard to compute)
$\lambda_2$: 2nd smallest eigenvalue (easy to compute)
$d_{\max}$: max degree
63
Cheeger Inequality and graph partitioning heuristic

Step 1: Sort vertices in non-decreasing order according to their score in the second eigenvector
Step 2: Decide where to cut. Two common heuristics:
Bisection
Best ratio cut
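The two steps can be sketched in plain Python on a small stand-in graph. Everything here is an assumption made for illustration: the 10-node dumbbell (two K5 cliques on the even and odd node ids, joined by one edge), the helper names, and the use of power iteration on $cI - L$ with the all-ones direction projected out as a simple substitute for a proper eigensolver.

```python
import math

def laplacian(n, edges):
    """Build L = D - A for an undirected, unweighted graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L

def second_eigenvector(L, iters=500):
    """Approximate the eigenvector of the 2nd smallest eigenvalue of L."""
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))   # c >= largest eigenvalue of L
    x = [i - (n - 1) / 2.0 for i in range(n)]  # start vector, sums to zero
    for _ in range(iters):
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]              # stay orthogonal to the all-ones vector
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

def phi(S, n, edges):
    """Edges across the cut divided by the size of the smaller side."""
    across = sum(1 for u, v in edges if (u in S) != (v in S))
    return across / min(len(S), n - len(S))

# Hypothetical dumbbell: K5 on the even ids, K5 on the odd ids, plus the edge 0-1.
n = 10
evens, odds = list(range(0, n, 2)), list(range(1, n, 2))
edges = [(a, b) for grp in (evens, odds) for i, a in enumerate(grp) for b in grp[i + 1:]]
edges.append((0, 1))

u2 = second_eigenvector(laplacian(n, edges))
order = sorted(range(n), key=lambda i: u2[i])  # Step 1: sort by eigenvector score
bisection = set(order[: n // 2])               # Step 2a: cut in the middle
best_k = min(range(1, n), key=lambda k: phi(set(order[:k]), n, edges))
best_ratio = set(order[:best_k])               # Step 2b: best prefix in sorted order
print(sorted(bisection), sorted(best_ratio), phi(best_ratio, n, edges))
```

On this graph both heuristics return one of the two cliques. The sweep in Step 2b only examines the n - 1 prefixes of the sorted order, which is what makes it cheap compared with searching over all cuts.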
65
Example: Spectral Partitioning

[Figure: dumbbell graph made of two K500 cliques]

A = zeros(1000);
A(1:500,1:500) = ones(500) - eye(500);        % first K500 clique
A(501:1000,501:1000) = ones(500) - eye(500);  % second K500 clique
myrandperm = randperm(1000);
B = A(myrandperm, myrandperm);                % randomly relabel the nodes
In social network analysis, such clusters are called communities
66
Example: Spectral Partitioning

This is how the adjacency matrix B looks

spy(B)
67
Example: Spectral Partitioning

This is what the 2nd eigenvector of B looks like

L = diag(sum(B)) - B;                         % Laplacian L = D - B
[u, v] = eigs(L, 2, 'SM');                    % two smallest-magnitude eigenpairs
plot(u(:,1), 'x')
Not so much information yet
68
Example: Spectral Partitioning

This is how the 2nd eigenvector looks if we sort it

[ign, ind] = sort(u(:,1));                    % ind: node order by eigenvector score
plot(u(ind), 'x')
But now we see the two communities!
69
Example: Spectral Partitioning

This is how the adjacency matrix B looks now

spy(B(ind,ind))
[Figure: the reordered matrix with two diagonal blocks, Community 1 and Community 2; "Cut here!" marks the boundary between them]
Observation: Both heuristics are equivalent for the dumbbell
71
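Why Normalized Laplacian

[Figure: two K500 cliques joined by the only weighted edge, with two candidate cuts ("Cut here") and their $\phi$ values compared, one $\phi$ > the other]
So, $\phi$ is not good here
72
Why Normalized Laplacian

[Figure: two K500 cliques joined by the only weighted edge, with two candidate cuts ("Cut here") and their $\phi$ values compared]
Optimize the Cheeger constant h(G): balanced cuts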
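For reference, the standard definitions from spectral graph theory (as in Chung's monograph) are sketched below. The slide's own formula is not reproduced here, so the notation ($\mathcal{L}$, $\mathrm{vol}$) should be read as an assumption.

```latex
\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2},
\qquad
h(G) = \min_{S} \frac{|E(S, V-S)|}{\min\bigl(\mathrm{vol}(S), \mathrm{vol}(V-S)\bigr)},
\qquad
\mathrm{vol}(S) = \sum_{i \in S} d_i
```

```latex
\frac{h(G)^2}{2} \le \lambda_2(\mathcal{L}) \le 2\, h(G)
```

Dividing by the volume (total degree) of the smaller side, rather than by its node count, penalizes cuts that split off a tiny, poorly connected piece, which is why this objective favors balanced cuts.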
73
Extensions

Normalized Laplacian
Ng, Jordan, Weiss: Spectral Clustering
Laplacian Eigenmaps for Manifold Learning
Computer Vision and many more applications

Standard reference: Spectral Graph Theory, monograph by Fan Chung Graham
74
Conclusions

Spectrum tells us a lot about the graph
Adjacency: Paths
Laplacian: Sparse Cut
Normalized Laplacian: Normalized cuts, tend to avoid unbalanced cuts

75
References

Fan R. K. Chung: Spectral Graph Theory (AMS)
Chris Godsil and Gordon Royle: Algebraic Graph Theory (Springer)
Bojan Mohar and Svatopluk Poljak: Eigenvalues in Combinatorial Optimization, IMA Preprint Series 939
Gilbert Strang: Introduction to Applied Mathematics (Wellesley-Cambridge Press)
