1
Euclidean embedding of metric spaces
  • Complicated metrics arise naturally in a number
    of applications
  • Image databases
  • Biological databases
  • Usefulness of embedding into Euclidean spaces
  • Space reduction
  • Apply existing algorithms

2
Lipschitz embedding
  • Each reference set of points Ai defines a
    dimension; there are k sets in total:
    R = {A1, A2, ..., Ak}.
  • Compute the minimum distance from each data point
    to each reference set and divide by k.
  • Map each data point to a vector of k dimensions
    (a minimal sketch follows this list).
  • Special case: the L∞ embedding considered earlier
  • Each reference set is a singleton
  • Each point defines a reference set
  • Isometric embedding
  • |d(x, Ai) − d(y, Ai)| ≤ d(x, y)
  • Follows from the triangle inequality
  • Implies a contractive mapping under L1
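A minimal sketch of the mapping, assuming the caller supplies the
points, the metric dist, and the reference sets (all names here are
illustrative, not from the slides):

```python
def lipschitz_embed(x, dist, reference_sets):
    """Lipschitz embedding of a single point x.

    dist: the metric d of the original space.
    reference_sets: the k reference sets A_1, ..., A_k.
    Coordinate i is d(x, A_i) / k, where d(x, A) = min over a in A.
    Dividing by k makes the map contractive under the L1 metric.
    """
    k = len(reference_sets)
    return [min(dist(x, a) for a in A) / k for A in reference_sets]

# Quick check of contractiveness on points of the real line:
d = lambda a, b: abs(a - b)
refs = [[0.0], [2.0, 7.0]]   # k = 2 reference sets
fx, fy = lipschitz_embed(1.0, d, refs), lipschitz_embed(5.0, d, refs)
assert sum(abs(u - v) for u, v in zip(fx, fy)) <= d(1.0, 5.0)
```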

3
Results
  • [Bourgain 85; Linial, London, Rabinovich 95] Any
    n-point metric space can be embedded into an
    O(log² n)-dimensional space, under the Euclidean
    or the L1 metric, with O(log n) distortion.
  • Contractive mapping
  • Randomized construction
  • Possible to extend to any Lp metric.

4
LLR Embedding Algorithm
  • Let q = c log n.
  • For i = 1, 2, ..., log n do:
      for j = 1 to q do:
        Ai,j ← a random subset of size n/2^i.
  • Map x to the vector ⟨di,j / Q⟩, where di,j is the
    distance from x to the closest point in Ai,j and
    Q is the total number of subsets (a runnable
    sketch follows).
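A sketch of the construction, assuming points is a list and dist is
the metric (the constant c = 2 and the seed are assumptions; the
slides only require q = c log n):

```python
import math
import random

def llr_embed(points, dist, c=2, seed=0):
    rng = random.Random(seed)
    n = len(points)
    q = max(1, round(c * math.log2(n)))
    # For each scale i = 1, ..., log n, draw q random subsets A_ij
    # of size n / 2^i.
    subsets = [rng.sample(points, max(1, n // 2 ** i))
               for i in range(1, int(math.log2(n)) + 1)
               for _ in range(q)]
    Q = len(subsets)  # total number of subsets = number of dimensions
    # Coordinate (i, j) of f(x) is d(x, A_ij) / Q.
    return [[min(dist(x, a) for a in A) / Q for A in subsets]
            for x in points]
```

For the Lp version discussed on a later slide, divide by Q^(1/p)
instead of Q.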

5
Proof of contraction
  • Let fij(x) = d(x, Aij) / Q.
  • dij(f(x), f(y)) = |fij(x) − fij(y)|
    = |d(x, Aij) − d(y, Aij)| / Q
  • dij(f(x), f(y)) ≤ d(x, y) / Q
  • Σi,j dij(f(x), f(y)) ≤ d(x, y)
  • Hence dL1(f(x), f(y)) ≤ d(x, y).

6
Proof of O(log n) distortion
  • Consider two arbitrary points x and y.
  • Let ri be the smallest radius such that B(x, ri)
    contains at least 2^i points; define ri′
    correspondingly for y.
  • Let ρi = max(ri, ri′).
  • |B(x, ρi)| ≥ 2^i and |B(y, ρi)| ≥ 2^i
  • ρi−1 ≤ ρi
  • ρ0 = 0
  • Consider values of i (up to t − 1) as long as
    ρi < d(x, y)/4.
  • Choose ρt = d(x, y)/2.

7
  • For each i, define an evil ball Ei and a good
    ball Gi, one centered at x and the other at y.
  • If ρi = ri then Ei = B°(x, ρi) and Gi = B(y, ρi−1);
    otherwise flip the choices.
  • Ei contains at most 2^i points and Gi contains at
    least 2^(i−1) points.
  • They are also disjoint, due to the distance
    restriction.
  • With constant probability (> 1/16), the subset
    Ai,j includes a point from Gi and none from Ei
    (a simulation sketch follows this list):
  • inclusion probability ≥ ¼
  • non-inclusion probability ≥ ¼
  • Then d(x, Aij) ≥ ρi and d(y, Aij) ≤ ρi−1, so
  • dij(f(x), f(y)) · Q ≥ ρi − ρi−1.
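A Monte Carlo sketch of the hitting argument, on an abstract ground
set range(n) (the function name and the example parameters are
illustrative, not from the slides):

```python
import random

def hit_and_miss_probability(n, i, good, evil, trials=100_000, seed=0):
    """Estimate the probability that a random subset of size n / 2^i
    contains a point of `good` (a set of >= 2^(i-1) indices) and
    avoids the disjoint set `evil` (<= 2^i indices)."""
    rng = random.Random(seed)
    size = n // 2 ** i
    hits = 0
    for _ in range(trials):
        sample = set(rng.sample(range(n), size))
        if sample & good and not (sample & evil):
            hits += 1
    return hits / trials

# n = 1024, i = 5: |good| = 2^(i-1) = 16, |evil| = 2^i = 32.
est = hit_and_miss_probability(1024, 5, set(range(16)),
                               set(range(16, 48)))
print(est, est > 1 / 16)   # empirically around 0.14, above 1/16
```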

8
  • Repeating this sampling q = O(log n) times, at
    least k log n of these sets intersect Gi and miss
    Ei, with probability at least 1 − 1/n³.
  • di(f(x), f(y)) · Q ≥ (ρi − ρi−1)(k log n)
  • Summing over all i, with probability at least
    1 − 1/n²,
  • dL1(f(x), f(y)) · Q ≥ Σi (ρi − ρi−1)(k log n)
  • dL1(f(x), f(y)) · Q ≥ ρt · k log n (the sum
    telescopes)
  • dL1(f(x), f(y)) ≥ d(x, y)(k log n) / (2Q)
  • dL1(f(x), f(y)) ≥ d(x, y) / O(log n)
  • Repeating over all point pairs, with probability
    at least 1/2 the distortion is at most O(log n).

9
Using an arbitrary Lp metric
  • The same embedding also works for any Lp metric.
  • Normalize the vector by dividing the entries by
    Q^(1/p) instead of Q.
  • Proof for the L2 metric here; it can be extended
    to other values of p.
  • Let Q = c log² n be the number of dimensions of
    the embedding f.
  • ‖f(x) − f(y)‖₂ = √( Σ (fij(x) − fij(y))² )
  •   = (1/√Q) √( Σ (d(x, Aij) − d(y, Aij))² )
  •   ≤ (1/√Q) √( Σ d(x, y)² )
  •   = d(x, y)

10
  • Let a = ⟨aij⟩ with aij = |fij(x) − fij(y)|, and
    b = ⟨bij⟩ with bij = 1.
  • ‖a‖₂ ‖b‖₂ ≥ ⟨a, b⟩ (Cauchy–Schwarz inequality)
  • √( Σ (fij(x) − fij(y))² ) · √Q ≥ Σ |fij(x) − fij(y)|
  • ‖f(x) − f(y)‖₂ · √Q ≥ ‖f(x) − f(y)‖₁
  • Since f divides entries by √Q while fL1 divides
    them by Q, ‖f(x) − f(y)‖₁ = √Q ‖fL1(x) − fL1(y)‖₁, so
  • ‖f(x) − f(y)‖₂ ≥ √Q ‖fL1(x) − fL1(y)‖₁ / √Q
  • ‖f(x) − f(y)‖₂ ≥ ‖fL1(x) − fL1(y)‖₁
  •   ≥ d(x, y) / O(log n), from the L1 case
  • dL2(f(x), f(y)) ≥ d(x, y) / O(log n)

11
Practical aspects
  • Number and size of reference sets
  • The proof uses 576 log² n reference sets!
  • High cost of transformation
  • SparseMap
  • Optimizes LLR by using heuristics for
  • Reducing the number of distance computations
  • Greedy resampling to reduce the number of
    dimensions further
  • No guarantees on distortion
  • No longer contractive

12
Embedding of Euclidean spaces
  • (Johnson–Lindenstrauss) There exists an
    embedding of n points in (R^m, L2) into (R^k, L2),
    where k = O((log n)/ε²), with distortion 1 + ε.
  • Randomized construction
  • Choose vectors r1, r2, ..., rk, where each
    component rij is drawn from a normal distribution
    N(0, 1): zero mean and unit variance.
  • f(v), the embedding of vector v, is
    ⟨⟨v, r1⟩, ⟨v, r2⟩, ..., ⟨v, rk⟩⟩.
  • Normalize by dividing by √k (a sketch follows).
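A sketch of the construction with numpy (the function name and the
example sizes are illustrative):

```python
import numpy as np

def jl_embed(X, k, seed=0):
    """Map each row v of X to (<v, r_1>, ..., <v, r_k>) / sqrt(k),
    where each component of each r_i is drawn from N(0, 1)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k))  # columns are r_1..r_k
    return X @ R / np.sqrt(k)

# Pairwise distances are approximately preserved after projection:
X = np.random.default_rng(1).standard_normal((100, 1000))
Y = jl_embed(X, k=200)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```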

13
Proof
  • Given a point v and a random vector r, consider
    X = ⟨v, r⟩. Let ‖v‖ = l.
  • E(X) = E(⟨v, r⟩) = 0
  • Var(X) = Var(⟨v, r⟩) = Σj Var(vj rj)
    = Σj vj² Var(rj) = ‖v‖² = l²
  • For the unnormalized embedding (before dividing
    by √k):
  • E(‖f(v)‖²) = E( Σi ⟨v, ri⟩² )
  •   = Σi E(⟨v, ri⟩²)
  •   = Σi E( Σj vj² rij² + Σj≠k vj vk rij rik )
  •   = Σi ( l² + Σj≠k vj vk E(rij) E(rik) )
  •   = k l²
14
Proof
  • If k = O(log n / ε²), then ‖f(v)‖² is
    concentrated around its mean with high
    probability:
  • (1 − ε) k l² ≤ ‖f(v)‖² ≤ (1 + ε) k l² with
    probability at least 1 − 1/n².
  • Set v = x − y.
  • (1 − ε) k ‖x − y‖² ≤ ‖f(x − y)‖² ≤ (1 + ε) k ‖x − y‖²
    with probability at least 1 − 1/n².
  • Since f is linear, f(x − y) = f(x) − f(y), so
    (1 − ε) k ‖x − y‖² ≤ ‖f(x) − f(y)‖² ≤ (1 + ε) k ‖x − y‖²
    with probability at least 1 − 1/n².
  • By a union bound, the bounds hold for all pairs
    x, y simultaneously with probability at least 1/2.

15
Other possibilities for basis vectors
  • [Achlioptas, PODS 2001] Using the following
    entries avoids real-number multiplications:
  • element Rij = √3 × ( +1 with prob. 1/6,
    0 with prob. 2/3, −1 with prob. 1/6 )
  • element Rij = +1 with prob. 1/2, −1 with prob. 1/2
  • (A sketch of both follows.)
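A sketch of the two distributions (the function name and numpy usage
are illustrative; both distributions have zero mean and unit
variance, so either matrix can replace the Gaussian one in the
jl_embed sketch above):

```python
import numpy as np

def achlioptas_matrix(m, k, sparse=True, seed=0):
    """Database-friendly projection matrix.

    sparse=True : sqrt(3) * (+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6)
    sparse=False: +1 w.p. 1/2, -1 w.p. 1/2
    """
    rng = np.random.default_rng(seed)
    if sparse:
        return np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(m, k),
                                       p=[1 / 6, 2 / 3, 1 / 6])
    return rng.choice([1.0, -1.0], size=(m, k))
```

The sparse variant leaves two thirds of the entries zero, so
computing the projection touches only a third of the data on average.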
16
Applications
  • Useful when the dimension is very high
  • n = 10⁶, ε = 0.2 implies k ≈ 500
    (k ≈ (log₂ n)/ε² = 20/0.04 = 500)
  • Speeding up the computation of SVD for LSI
    [Papadimitriou et al., PODS 1998]
  • One-pass summaries of streams [Indyk, FOCS 2000]
  • Clustering [Schulman, STOC 2000]
  • NN searches [Indyk and Motwani, STOC 1998]
  • Comparison with SVD for image and text data
    [Bingham and Mannila, KDD 2001]
  • Motifs in biological sequences [Buhler and Tompa,
    JCB 2002]

17
Other results
  • [LLR 95] There is an n-point metric space such
    that any embedding into (R^k, L2) has distortion
    at least Ω(log n).

18
Locality Sensitive Hashing
  • Approximate Nearest Neighbors
  • Construct a set L of hash functions, each with
    the property that
  • if two points are close, they collide with high
    probability;
  • if two points are far, they collide with low
    probability.
  • Combine the results from the L functions to
    answer the query.
  • Random projections can be used as the hash
    functions (a sketch follows this list).
  • Other variations in recent literature
  • Discussed later (NN searches)
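A minimal sketch instantiating the hash functions with random
hyperplane (sign-of-projection) hashes, one standard choice; the
class name and parameters are illustrative, not from the slides:

```python
import numpy as np

class RandomProjectionLSH:
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        # One random projection matrix per hash table.
        self.planes = [rng.standard_normal((dim, n_bits))
                       for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]

    def _key(self, planes, v):
        # The hash is the sign pattern of n_bits random projections;
        # points at a small angle often share the same pattern.
        return tuple((v @ planes > 0).astype(int))

    def insert(self, v, label):
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(self._key(planes, v), []).append(label)

    def query(self, v):
        # Union of candidates over the tables; verify them with
        # exact distance computations to answer the query.
        out = set()
        for planes, table in zip(self.planes, self.tables):
            out.update(table.get(self._key(planes, v), []))
        return out
```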

19
Bibliography
  • G. Hjaltason and H. Samet, Properties of
    Embedding Methods for Similarity Searching in
    Metric Spaces, IEEE Transactions on Pattern
    Analysis and Machine Intelligence, 25(5), 2003,
    530-549.
  • J. Bourgain, On Lipschitz embedding of finite
    metric spaces in Hilbert space, Israel J. of
    Math., 52, 1985, 46-52.
  • N. Linial, E. London and Y. Rabinovich, The
    geometry of graphs and some of its algorithmic
    applications, Combinatorica, 15, 1995, 215-245.
  • D. Achlioptas, Database-friendly random
    projections, PODS 2001.

20
Bibliography
  • W.B. Johnson and J. Lindenstrauss, Extensions of
    Lipschitz mappings into a Hilbert space,
    Contemporary Mathematics, 26, 1984, 189-206.
  • C.H. Papadimitriou, P. Raghavan, H. Tamaki and
    S. Vempala, Latent Semantic Indexing: A
    probabilistic analysis, PODS 1998, 159-168.
  • P. Indyk and R. Motwani, Approximate nearest
    neighbors: Towards removing the curse of
    dimensionality, STOC 1998, 604-613.
  • P. Indyk, Stable distributions, pseudorandom
    generators, embeddings and data stream
    computations, FOCS 2000, 189-197.
  • L.J. Schulman, Clustering for edge-cost
    minimization, STOC 2000, 547-555.
  • E. Bingham and H. Mannila, Random projection in
    dimensionality reduction: Applications to image
    and text data, KDD 2001, 245-250.
  • J. Buhler and M. Tompa, Finding motifs using
    random projections, JCB, 9(2), 2002, 225-242.