A tutorial on spectral clustering - PowerPoint PPT Presentation

1
A tutorial on spectral clustering
  • Ulrike von Luxburg
  • Presented by Fanbin Bu
  • Nov. 20, 2008

2
Outline
  • Introduction
  • Graph Laplacians and their basic properties
  • Spectral clustering algorithms
  • Why do these algorithms work?
  • Practical details

3
Introduction
  • Graph notation: G = (V, E), an undirected weighted
    graph
  • Weighted adjacency matrix W, with weights w_ij >= 0
  • Degree matrix D: diagonal, with degrees
    d_i = sum_j w_ij
  • |A|: the number of vertices in A
  • vol(A): the sum of the degrees of the vertices in A

4
Introduction
  • Similarity graphs
  • The epsilon-neighborhood graph
  • Connect all points whose pairwise distances are
    smaller than epsilon.
  • The k-nearest neighbor graph
  • Connect vertex vi with vertex vj if vj is among
    the k nearest neighbors of vi (then ignore edge
    directions to obtain an undirected graph).
  • The mutual k-nearest neighbor graph
  • Keep an edge only if each vertex is among the k
    nearest neighbors of the other.
  • The fully connected graph
  • Connect all pairs of points with positive
    similarity, weighted by the similarity function.
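The graph constructions above can be sketched in a few lines. The following is an illustrative NumPy version using dense distance matrices (the function names are my own, not from the tutorial):

```python
import numpy as np

def epsilon_graph(X, eps):
    """Epsilon-neighborhood graph: connect points whose
    pairwise Euclidean distance is smaller than eps."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (D < eps).astype(float)
    np.fill_diagonal(W, 0.0)          # no self-loops
    return W

def knn_graph(X, k, mutual=False):
    """k-nearest-neighbor graph. With mutual=True, keep an edge
    only when each point is among the other's k nearest neighbors."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    # nbrs[i, j] = 1 if j is among the k nearest neighbors of i
    idx = np.argsort(D, axis=1)[:, :k]
    nbrs = np.zeros_like(D)
    np.put_along_axis(nbrs, idx, 1.0, axis=1)
    # Symmetrize: AND for the mutual graph, OR for the plain one
    return nbrs * nbrs.T if mutual else np.maximum(nbrs, nbrs.T)
```

Both functions return a symmetric 0/1 adjacency matrix; in practice one would weight the kept edges by the similarity function.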

5
(No Transcript)
6
Graph Laplacians
  • There is no unique convention: every author calls
    his matrix "the graph Laplacian."
  • Assume that G is an undirected, weighted graph.
  • Different graph Laplacians
  • Unnormalized: L = D - W
  • Normalized
  • Symmetric: L_sym = D^(-1/2) L D^(-1/2)
  • Random walk: L_rw = D^(-1) L
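The three Laplacians follow directly from these definitions. A minimal NumPy sketch, assuming a symmetric weight matrix with strictly positive degrees:

```python
import numpy as np

def graph_laplacians(W):
    """Build the three graph Laplacians from a symmetric weight
    matrix W: unnormalized L = D - W,
    symmetric L_sym = D^(-1/2) L D^(-1/2),
    and random-walk L_rw = D^(-1) L."""
    d = W.sum(axis=1)                       # vertex degrees
    L = np.diag(d) - W                      # unnormalized Laplacian
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = d_inv_sqrt @ L @ d_inv_sqrt
    L_rw = np.diag(1.0 / d) @ L
    return L, L_sym, L_rw
```

Note that L and L_sym are symmetric while L_rw in general is not, although L_rw and L_sym share the same eigenvalues.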

7
Properties of Graph Laplacian L
8
Properties of Graph Laplacian L
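The slide content is not reproduced in this transcript. The key properties stated in the tutorial are: L is symmetric and positive semi-definite, its smallest eigenvalue is 0 with the constant vector as eigenvector, and the multiplicity of the eigenvalue 0 equals the number of connected components of the graph. A small numerical check, assuming a dense NumPy representation:

```python
import numpy as np

# Two disconnected triangles: the multiplicity of eigenvalue 0
# of L should equal the number of connected components (here 2).
W = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0

L = np.diag(W.sum(axis=1)) - W
evals = np.linalg.eigvalsh(L)                # ascending eigenvalues
assert np.all(evals >= -1e-10)               # positive semi-definite
assert np.sum(np.abs(evals) < 1e-10) == 2    # two zero eigenvalues
```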
9
Properties of L_sym and L_rw
10
Properties of L_sym and L_rw
11
Spectral clustering algorithms
12
Spectral clustering algorithms
13
Spectral clustering algorithms
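The algorithm slides are not reproduced in this transcript. The unnormalized variant can be sketched as follows; the k-means subroutine here is a minimal illustrative stand-in with deterministic initialization, not the tutorial's implementation:

```python
import numpy as np

def _kmeans(X, k, iters=100):
    """Minimal Lloyd's algorithm with farthest-point initialization
    (illustrative stand-in, not production-grade k-means)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def spectral_clustering(W, k):
    """Unnormalized spectral clustering: build L = D - W, embed each
    vertex as a row of the matrix U of the first k eigenvectors of L,
    then cluster the rows of U with k-means."""
    L = np.diag(W.sum(axis=1)) - W
    _, evecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    U = evecs[:, :k]                   # n x k spectral embedding
    return _kmeans(U, k)
```

The normalized variants differ only in which Laplacian's eigenvectors are used (and, for L_sym, in row-normalizing U before k-means).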
14
A toy example
  • 200 points, Gaussian distribution
  • Similarity function
  • Similarity graph: fully connected / 10-nearest
    neighbor
  • Graph Laplacians: unnormalized L / normalized L_rw
15
(No Transcript)
16
Why do these algorithms work?
  • Graph cut point of view
  • Random walks point of view
  • Perturbation theory point of view

17
Graph cut point of view
  • For two disjoint subsets A and B, cut(A, B) sums
    the weights of the edges between them.
  • For k subsets, we want to minimize the cut.
  • Problem: the minimum-cut solution often simply
    consists in separating one individual vertex from
    the rest of the graph.
  • Solution: explicitly request that the subsets be
    reasonably large (RatioCut, Ncut).
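For reference, the definitions behind these bullets (standard notation, consistent with the tutorial) can be written out as:

```latex
\operatorname{cut}(A, B) = \sum_{i \in A,\; j \in B} w_{ij},
\qquad
\operatorname{cut}(A_1, \dots, A_k)
  = \tfrac{1}{2} \sum_{i=1}^{k} \operatorname{cut}(A_i, \bar{A}_i),
```

and the two balanced objectives, which divide each cut by the size or the volume of the subset:

```latex
\operatorname{RatioCut}(A_1, \dots, A_k)
  = \sum_{i=1}^{k} \frac{\operatorname{cut}(A_i, \bar{A}_i)}{|A_i|},
\qquad
\operatorname{Ncut}(A_1, \dots, A_k)
  = \sum_{i=1}^{k} \frac{\operatorname{cut}(A_i, \bar{A}_i)}{\operatorname{vol}(A_i)}.
```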

18
RatioCut
  • Relaxing the RatioCut problem yields a standard
    trace minimization problem; the solution is given
    by the Rayleigh-Ritz theorem.
  • H is the matrix containing the first k eigenvectors
    of L as columns.
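The relaxed problem referred to above can be written explicitly as:

```latex
% Relaxed RatioCut minimization:
\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}(H^{\top} L H)
\quad \text{subject to} \quad H^{\top} H = I.
% By the Rayleigh--Ritz theorem, an optimal H has the first k
% eigenvectors of L (smallest eigenvalues) as its columns.
```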

19
Ncut
  • Relaxing the Ncut problem also yields a standard
    trace minimization problem; the solution is given
    by the Rayleigh-Ritz theorem.
  • H is the matrix containing the first k eigenvectors
    of L_rw as columns.

20
Random walks point of view
  • Transition matrix P = D^(-1) W
  • Graph Laplacian: L_rw = I - P
  • Relation between Ncut and random walks
  • When minimizing Ncut, we actually look for a cut
    through the graph such that a random walk seldom
    transitions from A to its complement and vice
    versa.
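The transition matrix and its relation to L_rw can be checked numerically. A small sketch on a triangle graph (recall that the stationary distribution of the walk is d / vol(G)):

```python
import numpy as np

# Random walk on the graph: P = D^{-1} W, hence L_rw = I - P.
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
d = W.sum(axis=1)
P = W / d[:, None]                  # row-stochastic transition matrix
L_rw = np.eye(3) - P
pi = d / d.sum()                    # stationary distribution d / vol(G)

assert np.allclose(P.sum(axis=1), 1.0)
assert np.allclose(pi @ P, pi)      # pi is stationary
assert np.allclose(L_rw, np.diag(1 / d) @ (np.diag(d) - W))
```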

21
Perturbation point of view
  • Ideal case: between-cluster similarity is 0.
  • The first k eigenvectors of L / L_rw are cluster
    indicator vectors.
  • K-means then finds the clusters trivially.
  • Nearly ideal case: between-cluster similarity is
    close to 0.
  • The eigenvectors are close to the ideal indicator
    vectors.
  • The formal perturbation argument is the Davis-Kahan
    theorem.
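The Davis-Kahan theorem mentioned above can be stated roughly as follows (up to the exact choice of matrix norm):

```latex
% Davis--Kahan sin-theta theorem, informal statement:
% if \tilde{L} = L + E and \delta is the eigengap separating the
% first k eigenvalues of L from the rest, then the eigenspaces
% V, \tilde{V} spanned by the first k eigenvectors satisfy
\| \sin \Theta(V, \tilde{V}) \|_F \;\le\; \frac{\| E \|_F}{\delta}.
```

So a small perturbation E combined with a large eigengap guarantees that the computed eigenvectors stay close to the ideal indicator vectors.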

22
Perturbation point of view
  • In the ideal case, the eigenvectors of L and L_rw
    are indicator vectors. No problem!
  • In the ideal case, the eigenvectors of L_sym are
    D^(1/2) times the indicator vectors.
  • Problem: vertices with low degree get very small
    eigenvector entries.
  • The problem still exists even after
    row-normalization.

23
Practical details
  • Constructing the similarity graph
  • Similarity function
  • Choice of similarity graph
  • Type
  • Parameters
  • Computing the eigenvectors
  • Use sparse matrix representations.
  • Krylov subspace methods, e.g. the Lanczos method.
  • The number of clusters
  • Eigengap heuristic
  • Choosing the graph Laplacian
  • The normalized graph Laplacian L_rw is recommended.
  • For a regular graph, where all vertices have
    roughly the same degree, the choice makes little
    difference.
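The eigengap heuristic mentioned above picks k such that the first k eigenvalues of the Laplacian are small and the gap to the (k+1)-th is large. A minimal sketch (the function name is my own):

```python
import numpy as np

def choose_k_by_eigengap(L, k_max=10):
    """Eigengap heuristic: return the k for which the gap
    lambda_{k+1} - lambda_k between consecutive (ascending)
    eigenvalues of the Laplacian L is largest."""
    evals = np.linalg.eigvalsh(L)[:k_max + 1]   # ascending order
    gaps = np.diff(evals)                       # consecutive gaps
    return int(np.argmax(gaps)) + 1
```

For a graph with k well-separated clusters, the first k eigenvalues are near 0 and the (k+1)-th jumps up, so the largest gap sits exactly at position k.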

24
Thank you!