Isomap Algorithm - PowerPoint Presentation

Slides: 26

Provided by: csJoe

Isomap Algorithm
http://isomap.stanford.edu/
Yuri Barseghyan
Yasser Essiarab

Linear Methods for Dimensionality Reduction
- PCA (Principal Component Analysis): rotate the data so that the principal axes lie in the directions of maximum variance
- MDS (Multi-Dimensional Scaling): find coordinates that best preserve pairwise distances
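A minimal NumPy sketch of both linear methods, assuming Euclidean input distances. PCA is computed from the SVD of the centered data, and MDS is the classical (Torgerson) variant that double-centers the squared distance matrix; the helper names `pca` and `classical_mds` and the toy data are illustrative. With Euclidean distances the two methods give the same coordinates up to the sign of each axis, which is why both count as purely linear methods.

```python
import numpy as np

def pca(X, d):
    """Project the rows of X onto the top-d principal axes (directions of max variance)."""
    Xc = X - X.mean(axis=0)                    # center the data
    # The right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def classical_mds(D, d):
    """Coordinates whose pairwise Euclidean distances best match the distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]           # keep the d largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * [5.0, 2.0, 0.5]
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y_pca = pca(X, 2)
Y_mds = classical_mds(D, 2)
# The two embeddings agree up to the sign of each axis
print(np.allclose(np.abs(Y_pca), np.abs(Y_mds)))
```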

Limitations of Linear Methods
- What if the data do not lie within a linear subspace?
- Do all convex combinations of the measurements generate plausible data?
- Low-dimensional non-linear manifold embedded in a higher-dimensional space

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf

Non-linear Dimensionality Reduction
- What about data that cannot be described by a linear combination of latent variables?
  - Examples: swiss roll, S-curve
- In the end, linear methods do nothing more than globally transform (rotate/translate/scale) the data. Sometimes the data must be unwrapped first.

[Figure: PCA]
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

Non-linear Dimensionality Reduction
- Unwrapping the data: manifold learning
- Assume the data can be embedded on a lower-dimensional manifold
- Given a data set $X = \{x_i\}_{i=1}^{n}$, find a representation $Y = \{y_i\}_{i=1}^{n}$ where Y lies on a lower-dimensional manifold
- Instead of preserving global pairwise distances, non-linear dimensionality reduction tries to preserve only the geometric properties of local neighborhoods

Isometry
- From MathWorld: two Riemannian manifolds M and N are isometric if there is a diffeomorphism such that the Riemannian metric from one pulls back to the metric on the other.
- For a complete Riemannian manifold, d(x, y) denotes the geodesic distance between x and y, i.e. the length of a shortest curve in the manifold joining them.
- Informally, an isometry is a smooth invertible mapping that looks locally like a rotation plus translation.
- Intuitively, in the 2-dimensional case, isometries include whatever physical transformations one can perform on a sheet of paper without introducing tears, holes, or self-intersections.
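The "metric pulls back" condition can be checked numerically for the paper-sheet intuition: a smooth map f from the (u, v) plane into R³ is a local isometry exactly when the pulled-back metric JᵀJ (J being the 3×2 Jacobian of f) is the identity everywhere. The sketch below estimates J with central finite differences, an assumption made for brevity; `roll` and `stretch` are hypothetical example maps, not taken from the slides.

```python
import numpy as np

def pullback_metric(f, u, v, h=1e-5):
    """Metric tensor J^T J induced on the (u, v) plane by the map f: R^2 -> R^3."""
    du = (f(u + h, v) - f(u - h, v)) / (2 * h)   # central difference in u
    dv = (f(u, v + h) - f(u, v - h)) / (2 * h)   # central difference in v
    J = np.column_stack([du, dv])                # 3 x 2 Jacobian
    return J.T @ J                               # 2 x 2 metric tensor

def roll(u, v):
    """Roll a flat sheet into a unit-radius cylinder (bending, no stretching)."""
    return np.array([np.cos(u), np.sin(u), v])

def stretch(u, v):
    """Stretch the sheet to twice its width along u (not an isometry)."""
    return np.array([2.0 * u, v, 0.0])

for u, v in [(0.0, 0.0), (1.3, -2.0), (3.0, 0.7)]:
    print(np.round(pullback_metric(roll, u, v), 6))     # identity: lengths preserved
print(np.round(pullback_metric(stretch, 0.5, 0.5), 6))  # diag(4, 1): lengths change
```

Rolling the sheet changes its shape in R³ but not any distance measured within the sheet, which is exactly the kind of transformation Isomap is designed to undo.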

Trustworthiness [2]
- Trustworthiness quantifies how trustworthy a projection of a high-dimensional data set onto a low-dimensional space is.
- Specifically, a projection is trustworthy if the t nearest neighbors of each data point in the low-dimensional space are also close by in the original space.
- r(i, j) is the rank of data point j in the ordering by distance from i in the original data space.
- U_t(i) denotes the set of data points that are among the t nearest neighbors of data point i in the low-dimensional space but not in the original space.
- The maximal value that the trustworthiness M(t) can take is one. The closer M(t) is to one, the better the low-dimensional space describes the original data.
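The formula for M(t) is not reproduced in the text above. To our understanding, the standard definition (Venna and Kaski [1]) for N data points and t < N/2 is $M(t) = 1 - \frac{2}{N t (2N - 3t - 1)} \sum_{i=1}^{N} \sum_{j \in U_t(i)} \bigl(r(i,j) - t\bigr)$, built from exactly the r(i, j) and U_t(i) defined above. The following is a direct NumPy sketch of that formula, assuming Euclidean distances and no tied distances; the function names are illustrative. scikit-learn ships a comparable implementation as `sklearn.manifold.trustworthiness`.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def trustworthiness(X_high, X_low, t):
    """Trustworthiness M(t) of an embedding X_low of the data X_high (assumes t < N/2)."""
    N = X_high.shape[0]
    D_high = pairwise_dist(X_high)
    D_low = pairwise_dist(X_low)
    np.fill_diagonal(D_high, np.inf)        # a point is never its own neighbor
    np.fill_diagonal(D_low, np.inf)

    # r(i, j): rank of j (1 = nearest) when sorting by distance from i in the original space
    order = np.argsort(D_high, axis=1)
    rank = np.empty_like(order)
    rank[np.arange(N)[:, None], order] = np.arange(1, N + 1)

    penalty = 0
    for i in range(N):
        nn_high = set(order[i, :t].tolist())
        nn_low = np.argsort(D_low[i])[:t]
        # U_t(i): neighbors in the embedding that are not neighbors in the original space
        for j in nn_low:
            if int(j) not in nn_high:
                penalty += rank[i, j] - t
    return 1.0 - 2.0 / (N * t * (2 * N - 3 * t - 1)) * penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
print(trustworthiness(X, X, t=5))          # identical neighborhoods give exactly 1.0
print(trustworthiness(X, X[:, :2], t=5))   # dropping three coordinates typically lowers it
```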

Several methods to learn a manifold
- Two to start:
  - Isomap (Tenenbaum et al., 2000)
  - Locally Linear Embedding (LLE) (Roweis and Saul, 2000)
- More recently:
  - Semidefinite Embedding (SDE) (Weinberger and Saul, 2005)

An important observation

- Small patches on a non-linear manifold look linear
- These locally linear neighborhoods can be defined in two ways:
  - k-nearest neighbors: find the k nearest points to a given point, under some metric. Guarantees that all items are similarly represented; limits the dimension to k-1
  - ε-ball: find all points that lie within ε of a given point, under some metric. Best if the density of items is high and every point has a sufficient number of neighbors

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf
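The trade-off between the two neighborhood definitions shows up on a toy data set with one dense cluster and one isolated point (hypothetical data; the helper names are illustrative). k-NN always returns k neighbors, even for an outlier far from everything, while an ε-ball adapts to density but can leave sparse points with no neighbors at all.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def knn_neighbors(X, k):
    """Indices of the k nearest neighbors of every point (the point itself excluded)."""
    D = pairwise_dist(X)
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]

def eps_ball_neighbors(X, eps):
    """For every point, the indices of all other points within distance eps."""
    D = pairwise_dist(X)
    np.fill_diagonal(D, np.inf)
    return [np.flatnonzero(row <= eps) for row in D]

# One tight cluster of four points plus an isolated point at (5, 5)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
print(knn_neighbors(X, 2)[4])               # k-NN: the outlier still gets 2 (distant) neighbors
print(len(eps_ball_neighbors(X, 0.5)[4]))   # eps-ball: the outlier gets no neighbors
print(eps_ball_neighbors(X, 0.5)[0])        # inside the cluster, all three other points qualify
```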

Isomap
- Find coordinates on a lower-dimensional manifold that preserve geodesic distances instead of Euclidean distances
- Key observation: if the goal is to discover the underlying manifold, geodesic distance makes more sense than Euclidean distance

[Figure: two points with a small Euclidean distance but a large geodesic distance]
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf
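To make the key observation concrete, here is a small sketch that uses an Archimedean spiral r = t as a stand-in for a cross-section of a swiss roll (our choice, not from the slides). Two points one full turn apart are close in Euclidean terms but far apart along the curve; the geodesic distance is approximated by summing segment lengths of a fine polyline.

```python
import numpy as np

def spiral(t):
    """Archimedean spiral r = t in the plane, a 1-D curled-up manifold."""
    return np.stack([t * np.cos(t), t * np.sin(t)], axis=-1)

t0, t1 = 2 * np.pi, 4 * np.pi                # same angle, adjacent turns
euclidean = np.linalg.norm(spiral(t1) - spiral(t0))

# Geodesic distance = arc length along the spiral, approximated by a fine polyline
ts = np.linspace(t0, t1, 100_001)
pts = spiral(ts)
geodesic = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

print(f"Euclidean: {euclidean:.2f}, geodesic: {geodesic:.2f}")
```

A linear method only sees the short Euclidean gap between the turns, whereas the roughly tenfold longer geodesic distance reflects how far apart the points are on the manifold.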

Calculating geodesic distance
- We know how to calculate Euclidean distance
- Locally linear neighborhoods mean that we can approximate the geodesic distance within a neighborhood by the Euclidean distance
- A graph is constructed by connecting each point to its K nearest neighbours
- Approximate geodesic distances are calculated by finding the length of the shortest path in the graph between points
- Use Dijkstra's algorithm to fill in the remaining distances

http://www.maths.lth.se/bioinformatics/calendar/20040527/NilssonJ_KI_27maj04.pdf
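A sketch of the graph approximation on the spiral r = t, assuming a symmetric K-NN graph whose edges are weighted by Euclidean distance. For brevity it computes all-pairs shortest paths with the Floyd-Warshall algorithm instead of Dijkstra (same distances, O(N³) time, fine for a few hundred points); the graph distance between the two ends of one turn is then compared with the exact arc length.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-NN graph: W[i, j] = Euclidean distance for neighbors, inf otherwise."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    for i in range(n):
        W[i, nbrs[i]] = D[i, nbrs[i]]
        W[nbrs[i], i] = D[i, nbrs[i]]          # keep the graph undirected
    return W

def floyd_warshall(W):
    """All-pairs shortest path lengths in the weighted graph W."""
    G = W.copy()
    for m in range(G.shape[0]):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def spiral_arc_length(a, b):
    """Exact arc length of the spiral r = t between t = a and t = b."""
    def antiderivative(s):                     # integral of sqrt(s^2 + 1)
        return 0.5 * (s * np.sqrt(s * s + 1) + np.arcsinh(s))
    return antiderivative(b) - antiderivative(a)

t = np.linspace(2 * np.pi, 4 * np.pi, 400)
X = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)
G = floyd_warshall(knn_graph(X, k=6))

euclid = np.linalg.norm(X[0] - X[-1])
arc = spiral_arc_length(t[0], t[-1])
print(f"Euclidean {euclid:.2f}, graph {G[0, -1]:.2f}, exact geodesic {arc:.2f}")
```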

Dijkstra's Algorithm
- Greedy algorithm, a weighted generalization of breadth-first search, that computes the shortest paths from one point to all other points (edge weights must be non-negative)

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
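A textbook binary-heap version of Dijkstra's algorithm in pure Python, shown on a hypothetical four-node neighborhood graph where the direct edge A-D is longer than the detour through B and C. The greedy step always settles the closest unsettled vertex, which is only correct because Euclidean edge lengths are non-negative.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from source; adj maps vertex -> list of (neighbor, weight)."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)             # closest vertex not yet settled
        if d > dist[u]:
            continue                           # stale heap entry; u was already settled
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:                   # relax edge (u, v)
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {
    "A": [("B", 1.0), ("D", 5.0)],
    "B": [("A", 1.0), ("C", 1.0)],
    "C": [("B", 1.0), ("D", 1.0)],
    "D": [("A", 5.0), ("C", 1.0)],
}
print(dijkstra(adj, "A"))   # {'A': 0.0, 'B': 1.0, 'C': 2.0, 'D': 3.0}
```

Running it once from every point yields the full matrix of approximate geodesic distances that Isomap passes to MDS.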
Isomap Algorithm

1. Compute the fully-connected neighborhood of points for each item
   - Can be k nearest neighbors or an ε-ball
2. Calculate pairwise Euclidean distances within each neighborhood
3. Use Dijkstra's algorithm to compute the shortest path from each point to non-neighboring points
4. Run MDS on the resulting distance matrix

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
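The steps above can be sketched end to end as follows, assuming NumPy, a symmetric k-NN neighborhood, and classical MDS. This is written for clarity (O(N²) memory, pure-Python Dijkstra), not speed; `isomap` is an illustrative name, and scikit-learn's `sklearn.manifold.Isomap` is a production implementation of the same idea.

```python
import heapq
import numpy as np

def isomap(X, n_neighbors, n_components):
    """Minimal Isomap sketch: k-NN graph -> Dijkstra geodesics -> classical MDS."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Steps 1-2: k nearest neighbors, edges weighted by Euclidean distance
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    adj = [dict() for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            adj[i][j] = adj[j][i] = D[i, j]        # undirected graph

    # Step 3: Dijkstra from every point gives approximate geodesic distances
    G = np.full((n, n), np.inf)
    for s in range(n):
        G[s, s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > G[s, u]:
                continue
            for v, w in adj[u].items():
                if d + w < G[s, v]:
                    G[s, v] = d + w
                    heapq.heappush(heap, (d + w, v))
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Step 4: classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Usage: unroll one turn of the spiral r = t into a single coordinate
t = np.linspace(2 * np.pi, 4 * np.pi, 200)
X = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)
Y = isomap(X, n_neighbors=6, n_components=1)
print(abs(np.corrcoef(Y[:, 0], t)[0, 1]))   # close to 1: the curve has been unrolled
```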

Isomap Algorithm [3]

Time Complexity of Algorithm

http://www.cs.rutgers.edu/elgammal/classes/cs536/lectures/NLDR.pdf

Isomap Results
Find a 2D embedding of the 3D S-curve

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

Residual Fitting Error
- Plotting the eigenvalues from MDS reveals the intrinsic dimensionality of the data

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
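A sketch of both diagnostics on hypothetical data: a flat 2-D square rotated into 6-D space, with Euclidean distances standing in for the geodesic distances Isomap would use. The MDS eigenvalue spectrum has two clearly non-zero values. A common way to quantify the residual fitting error, and to our understanding the one used in the original Isomap paper, is the residual variance 1 − R², where R is the correlation between input distances and embedding distances; it drops to about zero once d reaches the intrinsic dimension.

```python
import numpy as np

def classical_mds(D, d):
    """Return all MDS eigenvalues (largest first) and a d-dimensional embedding."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs[:, :d] * np.sqrt(np.maximum(vals[:d], 0.0))

def residual_variance(D, Y):
    """1 - R^2 between the input distances D and the distances in the embedding Y."""
    DY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    iu = np.triu_indices_from(D, k=1)          # each pair counted once
    r = np.corrcoef(D[iu], DY[iu])[0, 1]
    return 1.0 - r ** 2

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))   # random orthogonal matrix
P = np.hstack([rng.uniform(-1, 1, size=(100, 2)), np.zeros((100, 4))])
X = P @ Q.T                                    # 2-D data living in 6-D
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

vals, _ = classical_mds(D, 1)
print(np.round(vals[:4], 4))                   # only two clearly non-zero eigenvalues
for d in range(1, 5):
    print(d, round(residual_variance(D, classical_mds(D, d)[1]), 6))
```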

Neighborhood Graph

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

More Isomap Results

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

Results of projecting the face dataset to two dimensions (trustworthiness and continuity) [1]

More Isomap Results

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

Isomap Failures
- Isomap has problems with closed manifolds of arbitrary topology

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
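One way to see the failure on closed manifolds: Isomap assumes the geodesic distances can be reproduced by a flat Euclidean embedding, and classical MDS can reproduce a distance matrix exactly only if its double-centered matrix B has no negative eigenvalues. For points on a circle (an illustrative choice of closed 1-D manifold), exact arc-length distances yield clearly negative eigenvalues, while straight-line chord distances for the same points do not.

```python
import numpy as np

def mds_eigenvalues(D):
    """Eigenvalues (largest first) of the double-centered matrix B used by classical MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return np.sort(np.linalg.eigvalsh(-0.5 * J @ (D ** 2) @ J))[::-1]

# N points evenly spaced on a unit circle
N = 60
steps = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
geodesic = np.minimum(steps, N - steps) * (2 * np.pi / N)   # arc length along the circle
chord = 2 * np.sin(geodesic / 2)                             # straight-line distance

print(np.round(mds_eigenvalues(geodesic)[[0, 1, -2, -1]], 2))  # large negative eigenvalues
print(np.round(mds_eigenvalues(chord)[[0, 1, -2, -1]], 2))     # two positive, the rest ~0
```

The negative eigenvalues mean that no Euclidean configuration, in any dimension, preserves all the geodesic distances of the circle, so Isomap's output is necessarily distorted.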

Isomap Advantages
- Nonlinear
- Globally optimal
  - Still produces a globally optimal low-dimensional Euclidean representation even though the input space is highly folded, twisted, or curved
- Guaranteed asymptotically to recover the true dimensionality

Isomap Disadvantages
- Recovery of the geometric structure of nonlinear manifolds is guaranteed only asymptotically
  - As N increases, shortest-path distances in the graph provide better approximations to geodesics by hugging the surface more closely
  - Graph discreteness overestimates $d_M(i,j)$
- K must be small enough to avoid linear shortcuts near regions of high surface curvature, yet large enough to keep the neighborhood graph connected
- Mapping novel test images into the manifold space is not directly supported
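The shortcut problem can be reproduced on a hypothetical "hairpin" curve: two parallel arms 1 unit apart, joined by a half circle. Along the curve the two arm tips are about 10 + π/2 + 10 ≈ 21.6 apart. With a small K the neighborhood graph follows the curve; with a large K a single edge jumps straight across the gap, and the estimated geodesic collapses to the Euclidean distance of 1. The data and `graph_distance` helper are illustrative assumptions.

```python
import heapq
import numpy as np

def graph_distance(X, k, source, target):
    """Shortest path length between two points in the symmetric k-NN graph (Dijkstra)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]
    adj = [set() for _ in range(len(X))]
    for i, row in enumerate(nbrs):
        for j in row:
            adj[i].add(int(j))
            adj[int(j)].add(i)                 # undirected edges
    dist = np.full(len(X), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v in adj[u]:
            if d + D[u, v] < dist[v]:
                dist[v] = d + D[u, v]
                heapq.heappush(heap, (dist[v], v))
    return dist[target]

x = np.linspace(0.0, 10.0, 101)
theta = np.linspace(-np.pi / 2, np.pi / 2, 17)[1:-1]
arm_a = np.stack([x, np.zeros_like(x)], axis=1)
bend = np.stack([10 + 0.5 * np.cos(theta), 0.5 + 0.5 * np.sin(theta)], axis=1)
arm_b = np.stack([x[::-1], np.ones_like(x)], axis=1)
X = np.vstack([arm_a, bend, arm_b])            # tips are X[0] = (0, 0) and X[-1] = (0, 1)

for k in (4, 30):
    print(k, round(graph_distance(X, k, 0, len(X) - 1), 2))
```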

Literature
[1] Jarkko Venna and Samuel Kaski, "Nonlinear dimensionality reduction viewed as information retrieval," NIPS 2006 Workshop on Novel Applications of Dimensionality Reduction, 9 Dec. 2006. http://www.cis.hut.fi/projects/mi/papers/nips06_nldrws_poster.pdf
[2] Claudio Varini, "Visual Exploration of Multivariate Data in Breast Cancer by Dimensional Reduction," March 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=98073472x&dok_var=d1&dok_ext=pdf&filename=98073472x.pdf
[3] Yiming Wu and Kap Luk Chan, "An Extended Isomap Algorithm for Learning Multi-Class Manifold," Proceedings of the 2004 International Conference on Machine Learning and Cybernetics, Aug. 2004. http://ww2.cs.fsu.edu/ywu/PDF-files/ICMLC2004.pdf