Isomap Algorithm - PowerPoint PPT Presentation

Slides: 26
Provided by: csJoe
Transcript and Presenter's Notes

1
  • Isomap Algorithm
  • http://isomap.stanford.edu/
  • Yuri Barseghyan
  • Yasser Essiarab

2
  • Linear Methods for Dimensionality Reduction
  • PCA (Principal Component Analysis): rotate data
    so that principal axes lie in direction of
    maximum variance
  • MDS (Multi-Dimensional Scaling): find coordinates
    that best preserve pairwise distances
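
As an illustration of the PCA bullet, here is a minimal sketch using NumPy's SVD. The function name `pca` and the toy data are assumptions made for this example, not part of the original slides.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the directions of maximum variance."""
    Xc = X - X.mean(axis=0)                      # center the data
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = S ** 2 / (len(X) - 1)   # variance along each axis
    return Xc @ Vt[:n_components].T, explained_variance

# Toy data (hypothetical): 2-D points lying close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=200)])

Y, var = pca(X, 1)   # 1-D coordinates along the dominant axis
```

Because the toy data is almost exactly linear, nearly all of the variance falls on the first axis. This is the situation in which a linear method is sufficient.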

3
  • Limitations of Linear methods
  • What if the data does not lie within a linear
    subspace?
  • Do all convex combinations of the measurements
    generate plausible data?
  • Low-dimensional non-linear manifold embedded in a
    higher-dimensional space

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf
4
  • Non-linear Dimensionality Reduction
  • What about data that cannot be described by
    linear combination of latent variables?
  • Ex.: Swiss roll, S-curve
  • In the end, linear methods do nothing more than
    globally transform (rotate/translate/scale)
    data. Sometimes we need to unwrap the data first

[Figure: PCA]
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
5
  • Non-linear Dimensionality Reduction
  • Unwrapping the data: manifold learning
  • Assume data can be embedded on a
    lower-dimensional manifold
  • Given data set $X = \{x_i\}_{i=1}^{n}$, find
    representation $Y = \{y_i\}_{i=1}^{n}$ where Y lies
    on a lower-dimensional manifold
  • Instead of preserving global pairwise distances,
    non-linear dimensionality reduction tries to
    preserve only the geometric properties of local
    neighborhoods

6
  • Isometry
  • From MathWorld: two Riemannian manifolds M and N
    are isometric if there is a diffeomorphism such
    that the Riemannian metric from one pulls back to
    the metric on the other.
  • For a complete Riemannian manifold:
  • d(x, y) = geodesic distance between x and y
  • Informally, an isometry is a smooth invertible
    mapping that looks locally like a rotation plus
    translation
  • Intuitively, for 2-dimensional case, isometries
    include whatever physical transformations one can
    perform on a sheet of paper without introducing
    tears, holes, or self-intersections
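
In symbols, the definitions on this slide can be written roughly as follows. The notation ($f$, $g_M$, $g_N$, $\gamma$) follows common textbook usage and is an assumption, since the slide's own formulas did not survive in the transcript.

```latex
% Isometry between Riemannian manifolds (M, g_M) and (N, g_N):
% a diffeomorphism f whose pullback of g_N equals g_M.
f : M \to N, \qquad f^{*} g_N = g_M

% Geodesic distance: infimum of the lengths of curves joining x and y.
d_M(x, y) = \inf_{\gamma} \int_0^1
  \sqrt{ g_M\big(\dot\gamma(t), \dot\gamma(t)\big) } \, dt,
\qquad \gamma(0) = x, \ \gamma(1) = y

% Consequence: an isometry preserves geodesic distances.
d_N\big(f(x), f(y)\big) = d_M(x, y)
```

The last line is the property Isomap relies on: if the data manifold is isometric to a region of Euclidean space, geodesic distances on the manifold equal Euclidean distances in the unrolled coordinates.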

7
  • Trustworthiness [2]
  • The trustworthiness quantifies how trustworthy
    a projection of a high-dimensional data set onto
    a low-dimensional space is.
  • Specifically, a projection is trustworthy if the
    set of the t nearest neighbors of each data point
    in the low-dimensional space are also close by in
    the original space.
  • r(i, j) is the rank of the data point j in the
    ordering according to the distance from i in the
    original data space
  • Ut(i) denotes the set of those data points that
    are among the t-nearest neighbors of the data
    point i in the low-dimensional space but not in
    the original space.
  • The maximal value that trustworthiness can take
    is equal to one. The closer M(t) is to one, the
    better the low-dimensional space describes the
    original data.
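
The formula for M(t) did not survive in this transcript. A minimal sketch follows, assuming the commonly cited Venna and Kaski definition M(t) = 1 - 2 / (N t (2N - 3t - 1)) * sum over i of sum over j in Ut(i) of (r(i, j) - t), with N the number of points. The function names and the toy data are hypothetical.

```python
import math

def ranks_from(points, i):
    """Map j -> rank of point j by distance from point i (1 = nearest; i excluded)."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    return {j: rank for rank, j in enumerate(order, start=1)}

def trustworthiness(X_high, Y_low, t):
    """M(t) under the assumed definition; the normalization expects t < N / 2."""
    n = len(X_high)
    penalty = 0
    for i in range(n):
        r_high = ranks_from(X_high, i)   # r(i, j) in the original space
        r_low = ranks_from(Y_low, i)     # ranks in the projection
        for j in range(n):
            # j is in Ut(i): a t-nearest neighbor in the projection
            # but not in the original space.
            if j != i and r_low[j] <= t and r_high[j] > t:
                penalty += r_high[j] - t
    return 1.0 - 2.0 / (n * t * (2 * n - 3 * t - 1)) * penalty

# Toy data (hypothetical): 8 points on a line in 3-D.
X_high = [(float(i), 0.0, 0.0) for i in range(8)]
Y_good = [(float(i),) for i in range(8)]                      # order preserved
Y_bad = [(float(p),) for p in [0, 4, 1, 5, 2, 6, 3, 7]]       # neighbors scrambled

m_good = trustworthiness(X_high, Y_good, 2)
m_bad = trustworthiness(X_high, Y_bad, 2)
```

A projection that keeps every neighborhood intact scores exactly one. The scrambled projection scores lower because it places points that were far apart next to each other.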

8
  • Several methods to learn a manifold
  • Two to start:
  • Isomap (Tenenbaum et al., 2000)
  • Locally Linear Embedding (LLE) (Roweis and Saul,
    2000)
  • Recently:
  • Semidefinite Embedding (SDE) (Weinberger and
    Saul, 2005)

9
An important observation
  • Small patches on a non-linear manifold look
    linear
  • These locally linear neighborhoods can be defined
    in two ways
  • k-nearest neighbors: find the k nearest points to
    a given point, under some metric. Guarantees all
    items are similarly represented; limits dimension
    to k-1
  • ε-ball: find all points that lie within ε of a
    given point, under some metric. Best if density
    of items is high and every point has a sufficient
    number of neighbors

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf
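
The two neighborhood definitions can be sketched as follows, assuming the Euclidean metric. The function names and the toy points are hypothetical, chosen to show how the two rules treat an isolated point.

```python
import math

def knn_neighbors(points, i, k):
    """Indices of the k points nearest to points[i]."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]

def eps_ball_neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i]."""
    return [j for j in range(len(points))
            if j != i and math.dist(points[i], points[j]) <= eps]

# Toy data (hypothetical): a dense cluster plus one isolated point (index 4).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

knn_isolated = knn_neighbors(points, 4, 2)         # always returns k neighbors
ball_isolated = eps_ball_neighbors(points, 4, 0.5)  # may return none
ball_dense = eps_ball_neighbors(points, 0, 0.5)
```

The isolated point still gets k neighbors under the k-nearest rule, however far away they are, but none under the ε-ball rule. This is why the ε-ball works best when the sampling density is high everywhere.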
10
  • Isomap
  • Find coordinates on lower-dimensional manifold
    that preserve geodesic distances instead of
    Euclidean distances
  • Key Observation
  • If the goal is to discover the underlying
    manifold, geodesic distance makes more sense
    than Euclidean

[Figure: small Euclidean distance, large geodesic distance]
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf
11
  • Calculating geodesic distance
  • We know how to calculate Euclidean distance
  • Locally linear neighborhoods mean that we can
    approximate geodesic distance within a
    neighborhood using Euclidean distance
  • A graph is constructed by connecting each point
    to its K nearest neighbours.
  • Approximate geodesic distances are calculated by
    finding the length of the shortest path in the
    graph between points
  • Use Dijkstra's algorithm to fill in remaining
    distances

http://www.maths.lth.se/bioinformatics/calendar/20040527/NilssonJ_KI_27maj04.pdf
12
  • Dijkstra's Algorithm
  • Greedy, best-first algorithm (breadth-first
    search generalized to weighted edges) to compute
    shortest paths from one point to all other points

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
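
A minimal sketch of Dijkstra's algorithm using a binary heap from the Python standard library. The adjacency-dictionary graph format and the small example graph are assumptions made for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path lengths from source.

    graph maps each node to a dict {neighbor: edge_weight}; weights must be
    non-negative for the greedy choice to be correct.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)       # closest unsettled node
        if d > dist.get(u, float("inf")):
            continue                     # stale queue entry, already improved
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical undirected graph with four nodes.
graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0, "d": 5.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"b": 5.0, "c": 1.0},
}
dist = dijkstra(graph, "a")
```

In Isomap the nodes would be data points and the weights the Euclidean lengths of neighborhood edges, with one run per source point filling one row of the geodesic distance matrix.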
13
Isomap Algorithm
  • Compute fully-connected neighborhood of points
    for each item
  • Can be k nearest neighbors or ε-ball
  • Calculate pairwise Euclidean distances within
    each neighborhood
  • Use Dijkstra's Algorithm to compute shortest path
    from each point to non-neighboring points
  • Run MDS on resulting distance matrix

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
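
The steps on this slide can be put together in a compact sketch. It assumes NumPy, a symmetrized k-nearest-neighbor graph, and classical (eigendecomposition-based) MDS. The function name `isomap`, its parameters, and the half-circle test data are hypothetical, and this is not the reference implementation from isomap.stanford.edu.

```python
import heapq
import numpy as np

def isomap(X, n_neighbors, n_components):
    n = len(X)
    # Steps 1-2: k-NN graph with Euclidean edge lengths.
    # (Assumes no duplicate points, so each point is its own nearest match.)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    graph = [dict() for _ in range(n)]
    for i in range(n):
        for j in np.argsort(D[i])[1:n_neighbors + 1]:
            graph[i][int(j)] = D[i, j]
            graph[int(j)][i] = D[i, j]           # keep the graph symmetric
    # Step 3: Dijkstra from every point gives approximate geodesic distances.
    G = np.full((n, n), np.inf)
    for s in range(n):
        G[s, s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > G[s, u]:
                continue                          # stale queue entry
            for v, w in graph[u].items():
                if d + w < G[s, v]:
                    G[s, v] = d + w
                    heapq.heappush(heap, (d + w, v))
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Step 4: classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (G ** 2) @ J                   # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)              # ascending eigenvalues
    top = np.argsort(evals)[::-1][:n_components]
    Y = evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))
    return Y, G

# Hypothetical test data: 50 points on a half circle of radius 1.
theta = np.linspace(0.0, np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Y, G = isomap(X, n_neighbors=2, n_components=1)
span = Y[:, 0].max() - Y[:, 0].min()
```

On the half circle the two end points are 2 apart in the plane but about π apart along the curve. The 1-D embedding should therefore span roughly π, with points ordered as they are along the arc.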
14
  • Isomap Algorithm [3]

15
  • Time Complexity of Algorithm

http://www.cs.rutgers.edu/elgammal/classes/cs536/lectures/NLDR.pdf
16
  • Isomap Results
  • Find a 2D embedding of the 3D S-curve

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
17
  • Residual Fitting Error
  • Plotting the eigenvalues from MDS indicates the
    intrinsic dimensionality of your data (look for
    the point where the eigenvalues drop off)

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
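
To illustrate how the MDS eigenvalue spectrum reveals dimensionality, here is a small sketch with NumPy. The data (a flat 2-D patch placed isometrically in 5-D) and all names are assumptions for the example. With Isomap the input matrix would hold geodesic rather than Euclidean distances.

```python
import numpy as np

def mds_eigenvalues(D):
    """Eigenvalues (descending) of the double-centered squared-distance matrix."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    return np.sort(np.linalg.eigvalsh(B))[::-1]

# Hypothetical data: intrinsically 2-D points embedded in 5-D.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(100, 2))
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # 5x2 matrix, orthonormal columns
X = latent @ Q.T                               # 100 points in 5-D

D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
evals = mds_eigenvalues(D)

# Fraction of the positive spectrum captured by the first 1, 2, ... dimensions.
explained = np.cumsum(evals[:5]) / evals[evals > 0].sum()
```

Here two eigenvalues carry essentially the whole spectrum and the rest are numerical noise. This is the "elbow" one would look for in the plot.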
18
  • Neighborhood Graph

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
19
  • More Isomap Results

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
20
  • Results on projecting the face dataset to two
    dimensions (Trustworthiness-Continuity) [1]

21
  • More Isomap Results

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
22
  • Isomap Failures
  • Isomap has problems on closed manifolds of
    arbitrary topology

http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
23
  • Isomap Advantages
  • Nonlinear
  • Globally optimal
  • Still produces globally optimal low-dimensional
    Euclidean representation even though input space
    is highly folded, twisted, or curved.
  • Guaranteed asymptotically to recover the true
    dimensionality.

24
  • Isomap Disadvantages
  • Guaranteed asymptotically to recover geometric
    structure of nonlinear manifolds
  • As N increases, pairwise distances provide better
    approximations to geodesics by hugging surface
    more closely
  • Graph discreteness overestimates $d_M(i,j)$
  • K must be small enough to avoid linear shortcuts
    near regions of high surface curvature (while
    staying large enough to keep the graph connected)
  • No built-in way to map novel test images to
    manifold space
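
The sensitivity to the neighborhood size K can be illustrated with a small sketch. It builds a symmetrized K-nearest-neighbor graph on an almost-closed circle and computes graph distances with Floyd-Warshall, used here only for brevity in place of Dijkstra. All names and the test curve are hypothetical.

```python
import math

def graph_geodesics(points, k):
    """All-pairs shortest paths over a symmetrized k-NN graph (Floyd-Warshall)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    g = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Skip index 0 of the sorted list: it is the point itself.
        nearest = sorted(range(n), key=lambda j: d[i][j])[1:k + 1]
        for j in nearest:
            g[i][j] = g[j][i] = d[i][j]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

# Points on a unit circle with a 0.6 rad gap: the two ends are close in the
# plane but far apart along the curve (arc length 2*pi - 0.6).
n = 40
angles = [0.3 + (2 * math.pi - 0.6) * i / (n - 1) for i in range(n)]
points = [(math.cos(a), math.sin(a)) for a in angles]

small_k = graph_geodesics(points, 2)[0][n - 1]   # path follows the curve
large_k = graph_geodesics(points, 6)[0][n - 1]   # ends become direct neighbors
```

With K = 2 the graph distance between the two ends stays close to the arc length. With K = 6 the ends are joined by a single edge across the gap, and the estimated geodesic collapses to the short Euclidean chord.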

25
  • Literature
  • [1] Jarkko Venna and Samuel Kaski, Nonlinear
    dimensionality reduction viewed as information
    retrieval, NIPS 2006 Workshop on Novel
    Applications of Dimensionality Reduction, 9 Dec
    2006
  • http://www.cis.hut.fi/projects/mi/papers/nips06_nldrws_poster.pdf
  • [2] Claudio Varini, Visual Exploration of
    Multivariate Data in Breast Cancer by Dimensional
    Reduction, March 2006
  • http://deposit.ddb.de/cgi-bin/dokserv?idn98073472xdok_vard1dok_extpdffilename98073472x.pdf
  • [3] Yiming Wu, Kap Luk Chan, An Extended Isomap
    Algorithm for Learning Multi-Class Manifold,
    Machine Learning and Cybernetics, 2004.
    Proceedings of 2004 International Conference,
    Aug. 2004
  • http://ww2.cs.fsu.edu/ywu/PDF-files/ICMLC2004.pdf