Title: A tutorial on spectral clustering
Slide 1: A tutorial on spectral clustering
- Ulrike von Luxburg
- Presented by Fanbin Bu
- Nov. 20, 2008
Slide 2: Outline
- Introduction
- Graph Laplacians and their basic properties
- Spectral clustering algorithms
- Why do these algorithms work?
- Practical details
Slide 3: Introduction
- Graph notation: G = (V, E)
- Adjacency (weight) matrix W
- Degree matrix D
- |A|: the number of vertices in A
- vol(A): the sum of the degrees of the vertices in A
Slide 4: Introduction
- Similarity graphs
  - The epsilon-neighborhood graph
    - Connect all points whose pairwise distances are smaller than epsilon.
  - The k-nearest neighbor graph
    - Connect vertex v_i with vertex v_j if v_j is among the k nearest neighbors of v_i.
  - The mutual k-nearest neighbor graph
    - Connect v_i and v_j only if each is among the k nearest neighbors of the other.
  - The fully connected graph
    - Connect all points with positive similarity.
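The graph constructions above can be sketched in a few lines of numpy. This is not from the slides; it is a minimal illustration using plain pairwise Euclidean distances as the (dis)similarity:

```python
import numpy as np

def epsilon_graph(X, eps):
    """Epsilon-neighborhood graph: connect all points whose pairwise
    Euclidean distance is smaller than eps (unweighted, no self-loops)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (D < eps).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def knn_graph(X, k):
    """k-nearest-neighbor graph: connect v_i to v_j if v_j is among the
    k nearest neighbors of v_i, then symmetrize (the non-mutual variant)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]   # skip index 0: the point itself
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)
```

The mutual k-nearest-neighbor graph would keep an edge only when each point is among the other's neighbors, i.e. `np.minimum(W, W.T)` instead of the maximum.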
Slide 5: (figure slide, no transcript)
Slide 6: Graph Laplacians
- Every author calls his matrix "the graph Laplacian."
- Assume that G is an undirected, weighted graph.
- Different graph Laplacians:
  - Unnormalized: L = D - W
  - Normalized:
    - Symmetric: L_sym = D^{-1/2} L D^{-1/2}
    - Random walk: L_rw = D^{-1} L
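All three Laplacians can be computed directly from the weight matrix. A minimal numpy sketch (assuming W is symmetric and all degrees are positive):

```python
import numpy as np

def graph_laplacians(W):
    """Return the unnormalized Laplacian L = D - W together with the
    normalized variants L_sym = D^{-1/2} L D^{-1/2} and L_rw = D^{-1} L.
    Assumes W is symmetric with positive degrees."""
    d = W.sum(axis=1)                       # vertex degrees
    L = np.diag(d) - W                      # unnormalized
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = d_inv_sqrt @ L @ d_inv_sqrt     # symmetric normalization
    L_rw = np.diag(1.0 / d) @ L             # random-walk normalization
    return L, L_sym, L_rw
```

As a quick sanity check, L and L_rw annihilate the constant vector, while L_sym annihilates D^{1/2} times the constant vector.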
Slide 7: Properties of the graph Laplacian L
Slide 8: Properties of the graph Laplacian L
Slide 9: Properties of L_sym and L_rw
Slide 10: Properties of L_sym and L_rw
Slide 11: Spectral clustering algorithms
Slide 12: Spectral clustering algorithms
Slide 13: Spectral clustering algorithms
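The generic recipe behind these algorithms is: embed the vertices via the first k eigenvectors of a Laplacian, then run k-means on the rows of the embedding. A minimal numpy sketch of the unnormalized variant (with a toy Lloyd's k-means standing in for a library call; not from the slides):

```python
import numpy as np

def spectral_clustering(W, k, n_iter=50):
    """Unnormalized spectral clustering: build L = D - W, take the first k
    eigenvectors of L as columns of U, then k-means the rows of U."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    # np.linalg.eigh returns eigenvalues in ascending order for symmetric L
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                          # embedding: one row per vertex

    # minimal Lloyd's k-means with a deterministic farthest-point init
    centers = [U[0]]
    for _ in range(1, k):
        dmin = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[dmin.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels
```

The normalized variants differ only in which eigenproblem is solved (L_rw, or L_sym with row renormalization of U).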
Slide 14: A toy example
- 200 points, Gaussian distribution
- Similarity function
- Similarity graph: fully connected; 10-nearest neighbor
- Graph Laplacians: unnormalized L; normalized L_rw
Slide 15: (figure slide, no transcript)
Slide 16: Why do these algorithms work?
- Graph cut point of view
- Random walks point of view
- Perturbation theory point of view
Slide 17: Graph cut point of view
- For two disjoint subsets A and B: cut(A, B) = Σ_{i ∈ A, j ∈ B} w_ij
- For k subsets, we want to minimize cut(A_1, ..., A_k) = (1/2) Σ_{i=1}^{k} cut(A_i, \bar{A_i})
- Problem: the minimum is often achieved by simply separating one individual vertex from the rest of the graph.
- Solution: explicitly request that the subsets be reasonably large (RatioCut, Ncut).
Slide 18: RatioCut
- RatioCut(A_1, ..., A_k) = Σ_{i=1}^{k} cut(A_i, \bar{A_i}) / |A_i|
- Encode a partition by indicator vectors h_j with entries h_{i,j} = 1/√|A_j| if v_i ∈ A_j, 0 otherwise; then RatioCut(A_1, ..., A_k) = Tr(H^T L H) with H^T H = I.
- Relaxation: min_{H ∈ R^{n×k}} Tr(H^T L H) subject to H^T H = I.
- Standard trace minimization problem; the solution is given by the Rayleigh-Ritz theorem: H = U, the matrix which contains the first k eigenvectors of L as columns.
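The Rayleigh-Ritz step can be checked numerically: over all orthonormal H, Tr(H^T L H) is bounded below by the value attained at the first k eigenvectors, namely the sum of the k smallest eigenvalues. A small numpy sanity check (the graph is made up for illustration):

```python
import numpy as np

# A small weighted graph and its unnormalized Laplacian.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
vals, vecs = np.linalg.eigh(L)          # eigenvalues ascending

k = 2
H_opt = vecs[:, :k]                     # first k eigenvectors as columns
opt = np.trace(H_opt.T @ L @ H_opt)     # equals vals[0] + vals[1]
assert np.isclose(opt, vals[:k].sum())

# Any other orthonormal H can only do worse (or tie).
rng = np.random.default_rng(1)
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((4, k)))   # random orthonormal H
    assert np.trace(Q.T @ L @ Q) >= opt - 1e-9
```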
Slide 19: Ncut
- Ncut(A_1, ..., A_k) = Σ_{i=1}^{k} cut(A_i, \bar{A_i}) / vol(A_i)
- Encode a partition by indicator vectors h_j with entries h_{i,j} = 1/√vol(A_j) if v_i ∈ A_j, 0 otherwise; then Ncut(A_1, ..., A_k) = Tr(H^T L H) with H^T D H = I.
- Relaxation: min_{H ∈ R^{n×k}} Tr(H^T L H) subject to H^T D H = I.
- Standard trace minimization problem (after substituting T = D^{1/2} H); the solution is given by the Rayleigh-Ritz theorem: H is the matrix which contains the first k eigenvectors of L_rw as columns.
Slide 20: Random walks point of view
- Transition matrix: P = D^{-1} W
- Graph Laplacian: L_rw = I - P
- Relation between Ncut and random walks:
  - When minimizing Ncut, we actually look for a cut through the graph such that a random walk seldom transitions from A to its complement \bar{A} and vice versa.
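These identities are easy to verify numerically. A small numpy check (the weight matrix is made up for illustration):

```python
import numpy as np

# P = D^{-1} W is the transition matrix of a random walk on the graph,
# and L_rw = I - P, so small eigenvalues of L_rw correspond to
# eigenvalues of P close to 1.
W = np.array([[0., 2., 1.],
              [2., 0., 1.],
              [1., 1., 0.]])
d = W.sum(axis=1)
P = np.diag(1.0 / d) @ W                 # rows sum to 1: stochastic matrix
assert np.allclose(P.sum(axis=1), 1.0)

L_rw = np.diag(1.0 / d) @ (np.diag(d) - W)
assert np.allclose(L_rw, np.eye(3) - P)  # L_rw = I - P
```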
Slide 21: Perturbation point of view
- Ideal case: between-cluster similarity is 0.
  - The first k eigenvectors of L / L_rw are cluster indicator vectors.
  - k-means finds the clusters trivially.
- Nearly ideal case: between-cluster similarity is close to 0.
  - The eigenvectors are close to the ideal indicator vectors.
  - Formal perturbation argument: the Davis-Kahan theorem.
Slide 22: Perturbation point of view
- In the ideal case, the eigenvectors of L and L_rw are indicator vectors. No problem!
- In the ideal case, the eigenvectors of L_sym are the indicator vectors multiplied by D^{1/2}.
- Problem: vertices with low degree get very small eigenvector entries.
- The problem persists even after row-normalization.
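The ideal case can be demonstrated with a block-diagonal weight matrix: the first k eigenvectors of L are constant on each cluster, while those of L_sym pick up a factor of sqrt(d_i) per vertex and so are not constant when degrees differ. A numpy sketch (not from the slides):

```python
import numpy as np

# Two disconnected clusters, {0, 1} and {2, 3, 4}, with unequal degrees.
W = np.zeros((5, 5))
W[0, 1] = W[1, 0] = 1.0
W[2, 3] = W[3, 2] = 2.0
W[3, 4] = W[4, 3] = 1.0
d = W.sum(axis=1)
L = np.diag(d) - W

# First 2 eigenvectors of L: constant on each cluster (indicator-like).
_, vecs = np.linalg.eigh(L)
U = vecs[:, :2]
assert np.allclose(U[0], U[1])
assert np.allclose(U[2], U[3]) and np.allclose(U[3], U[4])

# First 2 eigenvectors of L_sym: entries scale with sqrt(d_i), so they
# are NOT constant on the cluster whose vertices have unequal degrees.
d_is = np.diag(1.0 / np.sqrt(d))
L_sym = d_is @ L @ d_is
_, vecs_sym = np.linalg.eigh(L_sym)
V = vecs_sym[:, :2]
assert not np.allclose(V[2], V[3])       # d_2 = 2 but d_3 = 3
```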
Slide 23: Practical details
- Constructing the similarity graph
  - Similarity function
  - Choice of similarity graph: type and parameters
- Computing the eigenvectors
  - Exploit the sparsity of the matrix
  - Krylov subspace methods, e.g. the Lanczos method
- The number of clusters
  - Eigengap heuristic
- Choosing the graph Laplacian
  - The normalized graph Laplacian L_rw is recommended.
  - For a regular graph (all vertices have the same degree), the choice makes little difference.
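The eigengap heuristic can be sketched as: choose k so that the first k eigenvalues of the Laplacian are small and the jump to lambda_{k+1} is large. A minimal numpy illustration on three disjoint triangles, where the heuristic recovers k = 3 (not from the slides):

```python
import numpy as np

def eigengap_k(L, k_max=10):
    """Eigengap heuristic: return the k <= k_max with the largest gap
    lambda_{k+1} - lambda_k in the ascending Laplacian spectrum."""
    vals = np.linalg.eigvalsh(L)[:k_max + 1]
    gaps = np.diff(vals)                 # lambda_{i+1} - lambda_i
    return int(gaps.argmax()) + 1

# Three disjoint triangles: eigenvalues 0, 0, 0, then a jump to 3.
W = np.kron(np.eye(3), np.ones((3, 3)) - np.eye(3))
L = np.diag(W.sum(axis=1)) - W
assert eigengap_k(L, k_max=8) == 3
```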
Slide 24: Thank you!