1
Euclidean embedding of metric spaces
  • Complicated metrics arise naturally in a number
    of applications
  • Image databases
  • Biological databases
  • Usefulness of embedding into Euclidean spaces
  • Space reduction
  • Apply existing algorithms

2
Lipschitz embedding
  • Each reference set of points Ai defines a
    dimension; there are k sets in total:
    R = {A1, A2, ..., Ak}.
  • Compute the minimum distance from each data point
    to each reference set and divide by k.
  • Map each data point to a vector of k dimensions
    (a minimal sketch follows this list).
  • Special case: the L∞ embedding considered earlier
  • Each reference set is a singleton
  • Each point defines a reference set
  • Isometric embedding
  • |d(x, Ai) − d(y, Ai)| ≤ d(x, y)
  • Follows from the triangle inequality
  • Implies a contractive mapping under L1
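A minimal sketch of the mapping, assuming the caller supplies the
points, the metric dist, and the reference sets (all names here are
illustrative, not from the slides):

```python
def lipschitz_embed(x, dist, reference_sets):
    """Lipschitz embedding of a single point x.

    dist: the metric d of the original space.
    reference_sets: the k reference sets A_1, ..., A_k.
    Coordinate i is d(x, A_i) / k, where d(x, A) = min over a in A.
    Dividing by k makes the map contractive under the L1 metric.
    """
    k = len(reference_sets)
    return [min(dist(x, a) for a in A) / k for A in reference_sets]

# Quick check of contractiveness on points of the real line:
d = lambda a, b: abs(a - b)
refs = [[0.0], [2.0, 7.0]]   # k = 2 reference sets
fx, fy = lipschitz_embed(1.0, d, refs), lipschitz_embed(5.0, d, refs)
assert sum(abs(u - v) for u, v in zip(fx, fy)) <= d(1.0, 5.0)
```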

3
Results
  • [Bourgain 85; Linial, London, Rabinovich 95] Any
    n-point metric space can be embedded into an
    O(log² n)-dimensional space, under the Euclidean
    or the L1 metric, with O(log n) distortion.
  • Contractive mapping
  • Randomized construction
  • Possible to extend to any Lp metric.

4
LLR Embedding Algorithm
  • Let q = c log n.
  • For i = 1, 2, ..., log n do:
      for j = 1 to q do:
        Ai,j ← a random subset of size n/2^i.
  • Map x to the vector ⟨di,j / Q⟩, where di,j is the
    distance from x to the closest point in Ai,j and
    Q is the total number of subsets (a runnable
    sketch follows).
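A sketch of the construction, assuming points is a list and dist is
the metric (the constant c = 2 and the seed are assumptions; the
slides only require q = c log n):

```python
import math
import random

def llr_embed(points, dist, c=2, seed=0):
    rng = random.Random(seed)
    n = len(points)
    q = max(1, round(c * math.log2(n)))
    # For each scale i = 1, ..., log n, draw q random subsets A_ij
    # of size n / 2^i.
    subsets = [rng.sample(points, max(1, n // 2 ** i))
               for i in range(1, int(math.log2(n)) + 1)
               for _ in range(q)]
    Q = len(subsets)  # total number of subsets = number of dimensions
    # Coordinate (i, j) of f(x) is d(x, A_ij) / Q.
    return [[min(dist(x, a) for a in A) / Q for A in subsets]
            for x in points]
```

For the Lp version discussed on a later slide, divide by Q^(1/p)
instead of Q.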

5
Proof of contraction
  • Let fij(x) = d(x, Aij) / Q.
  • dij(f(x), f(y)) = |fij(x) − fij(y)|
    = |d(x, Aij) − d(y, Aij)| / Q
  • dij(f(x), f(y)) ≤ d(x, y) / Q
  • Σi,j dij(f(x), f(y)) ≤ d(x, y)
  • Hence dL1(f(x), f(y)) ≤ d(x, y).

6
Proof of O(log n) distortion
  • Consider two arbitrary points x and y.
  • Let ri be the smallest radius such that B(x, ri)
    contains at least 2^i points; define ri′
    correspondingly for y.
  • Let ρi = max(ri, ri′).
  • |B(x, ρi)| ≥ 2^i and |B(y, ρi)| ≥ 2^i
  • ρi−1 ≤ ρi
  • ρ0 = 0
  • Consider values of i (up to t − 1) as long as
    ρi < d(x, y)/4.
  • Choose ρt = d(x, y)/2.

7
  • For each i, define an evil ball Ei and a good
    ball Gi, one centered at x and the other at y.
  • If ρi = ri then Ei = B°(x, ρi) and Gi = B(y, ρi−1);
    otherwise flip the choices.
  • Ei contains at most 2^i points and Gi contains at
    least 2^(i−1) points.
  • They are also disjoint, due to the distance
    restriction.
  • With constant probability (> 1/16), the subset
    Ai,j includes a point from Gi and none from Ei
    (a simulation sketch follows this list):
  • inclusion probability ≥ ¼
  • non-inclusion probability ≥ ¼
  • Then d(x, Aij) ≥ ρi and d(y, Aij) ≤ ρi−1, so
  • dij(f(x), f(y)) · Q ≥ ρi − ρi−1.
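A Monte Carlo sketch of the hitting argument, on an abstract ground
set range(n) (the function name and the example parameters are
illustrative, not from the slides):

```python
import random

def hit_and_miss_probability(n, i, good, evil, trials=100_000, seed=0):
    """Estimate the probability that a random subset of size n / 2^i
    contains a point of `good` (a set of >= 2^(i-1) indices) and
    avoids the disjoint set `evil` (<= 2^i indices)."""
    rng = random.Random(seed)
    size = n // 2 ** i
    hits = 0
    for _ in range(trials):
        sample = set(rng.sample(range(n), size))
        if sample & good and not (sample & evil):
            hits += 1
    return hits / trials

# n = 1024, i = 5: |good| = 2^(i-1) = 16, |evil| = 2^i = 32.
est = hit_and_miss_probability(1024, 5, set(range(16)),
                               set(range(16, 48)))
print(est, est > 1 / 16)   # empirically around 0.14, above 1/16
```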

8
  • Repeating this sampling q = O(log n) times, at
    least k log n of these sets intersect Gi and miss
    Ei, with probability at least 1 − 1/n³.
  • di(f(x), f(y)) · Q ≥ (ρi − ρi−1)(k log n)
  • Summing over all i, with probability at least
    1 − 1/n²,
  • dL1(f(x), f(y)) · Q ≥ Σi (ρi − ρi−1)(k log n)
  • dL1(f(x), f(y)) · Q ≥ ρt · k log n (the sum
    telescopes)
  • dL1(f(x), f(y)) ≥ d(x, y)(k log n) / (2Q)
  • dL1(f(x), f(y)) ≥ d(x, y) / O(log n)
  • Repeating over all point pairs, with probability
    at least 1/2 the distortion is at most O(log n).

9
Using an arbitrary Lp metric
  • The same embedding also works for any Lp metric.
  • Normalize the vector by dividing the entries by
    Q^(1/p) instead of Q.
  • Proof for the L2 metric here; it can be extended
    to other values of p.
  • Let Q = c log² n be the number of dimensions of
    the embedding f.
  • ‖f(x) − f(y)‖₂ = √( Σ (fij(x) − fij(y))² )
  •   = (1/√Q) √( Σ (d(x, Aij) − d(y, Aij))² )
  •   ≤ (1/√Q) √( Σ d(x, y)² )
  •   = d(x, y)

10
  • Let a = ⟨aij⟩ with aij = |fij(x) − fij(y)|, and
    b = ⟨bij⟩ with bij = 1.
  • ‖a‖₂ ‖b‖₂ ≥ ⟨a, b⟩ (Cauchy–Schwarz inequality)
  • √( Σ (fij(x) − fij(y))² ) · √Q ≥ Σ |fij(x) − fij(y)|
  • ‖f(x) − f(y)‖₂ · √Q ≥ ‖f(x) − f(y)‖₁
  • Since f divides entries by √Q while fL1 divides
    them by Q, ‖f(x) − f(y)‖₁ = √Q ‖fL1(x) − fL1(y)‖₁, so
  • ‖f(x) − f(y)‖₂ ≥ √Q ‖fL1(x) − fL1(y)‖₁ / √Q
  • ‖f(x) − f(y)‖₂ ≥ ‖fL1(x) − fL1(y)‖₁
  •   ≥ d(x, y) / O(log n), from the L1 case
  • dL2(f(x), f(y)) ≥ d(x, y) / O(log n)

11
Practical aspects
  • Number and size of reference sets
  • The proof uses 576 log² n reference sets!
  • High cost of transformation
  • SparseMap
  • Optimizes LLR by using heuristics for
  • Reducing the number of distance computations
  • Greedy resampling to reduce the number of
    dimensions further
  • No guarantees on distortion
  • No longer contractive

12
Embedding of Euclidean spaces
  • (Johnson–Lindenstrauss) There exists an
    embedding of n points in (R^m, L2) into (R^k, L2),
    where k = O((log n)/ε²), with distortion 1 + ε.
  • Randomized construction
  • Choose vectors r1, r2, ..., rk, where each
    component rij is drawn from a normal distribution
    N(0, 1): zero mean and unit variance.
  • f(v), the embedding of vector v, is
    ⟨⟨v, r1⟩, ⟨v, r2⟩, ..., ⟨v, rk⟩⟩.
  • Normalize by dividing by √k (a sketch follows).
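A sketch of the construction with numpy (the function name and the
example sizes are illustrative):

```python
import numpy as np

def jl_embed(X, k, seed=0):
    """Map each row v of X to (<v, r_1>, ..., <v, r_k>) / sqrt(k),
    where each component of each r_i is drawn from N(0, 1)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k))  # columns are r_1..r_k
    return X @ R / np.sqrt(k)

# Pairwise distances are approximately preserved after projection:
X = np.random.default_rng(1).standard_normal((100, 1000))
Y = jl_embed(X, k=200)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```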

13
Proof
  • Given a point v and a random vector r, consider
    X = ⟨v, r⟩. Let ‖v‖ = l.
  • E(X) = E(⟨v, r⟩) = 0
  • Var(X) = Var(⟨v, r⟩) = Σj Var(vj rj)
    = Σj vj² Var(rj) = ‖v‖² = l²
  • For the unnormalized embedding (before dividing
    by √k):
  • E(‖f(v)‖²) = E( Σi ⟨v, ri⟩² )
  •   = Σi E(⟨v, ri⟩²)
  •   = Σi E( Σj vj² rij² + Σj≠k vj vk rij rik )
  •   = Σi ( l² + Σj≠k vj vk E(rij) E(rik) )
  •   = k l²
14
Proof
  • If k = O(log n / ε²), then ‖f(v)‖² is
    concentrated around its mean with high
    probability:
  • (1 − ε) k l² ≤ ‖f(v)‖² ≤ (1 + ε) k l² with
    probability at least 1 − 1/n².
  • Set v = x − y.
  • (1 − ε) k ‖x − y‖² ≤ ‖f(x − y)‖² ≤ (1 + ε) k ‖x − y‖²
    with probability at least 1 − 1/n².
  • Since f is linear, f(x − y) = f(x) − f(y), so
    (1 − ε) k ‖x − y‖² ≤ ‖f(x) − f(y)‖² ≤ (1 + ε) k ‖x − y‖²
    with probability at least 1 − 1/n².
  • By a union bound, the bounds hold for all pairs
    x, y simultaneously with probability at least 1/2.

15
Other possibilities for basis vectors
  • [Achlioptas, PODS 2001] Using the following
    entries avoids real-number multiplications:
  • element Rij = √3 × ( +1 with prob. 1/6,
    0 with prob. 2/3, −1 with prob. 1/6 )
  • element Rij = +1 with prob. 1/2, −1 with prob. 1/2
  • (A sketch of both follows.)
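A sketch of the two distributions (the function name and numpy usage
are illustrative; both distributions have zero mean and unit
variance, so either matrix can replace the Gaussian one in the
jl_embed sketch above):

```python
import numpy as np

def achlioptas_matrix(m, k, sparse=True, seed=0):
    """Database-friendly projection matrix.

    sparse=True : sqrt(3) * (+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6)
    sparse=False: +1 w.p. 1/2, -1 w.p. 1/2
    """
    rng = np.random.default_rng(seed)
    if sparse:
        return np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(m, k),
                                       p=[1 / 6, 2 / 3, 1 / 6])
    return rng.choice([1.0, -1.0], size=(m, k))
```

The sparse variant leaves two thirds of the entries zero, so
computing the projection touches only a third of the data on average.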
16
Applications
  • Useful when the dimension is very high
  • n = 10⁶, ε = 0.2 implies k ≈ 500
    (k ≈ (log₂ n)/ε² = 20/0.04 = 500)
  • Speeding up the computation of SVD for LSI
    [Papadimitriou et al., PODS 1998]
  • One-pass summaries of streams [Indyk, FOCS 2000]
  • Clustering [Schulman, STOC 2000]
  • NN searches [Indyk and Motwani, STOC 1998]
  • Comparison with SVD for image and text data
    [Bingham and Mannila, KDD 2001]
  • Motifs in biological sequences [Buhler and Tompa,
    JCB 2002]

17
Other results
  • [LLR 95] There is an n-point metric space such
    that any embedding into (R^k, L2) has distortion
    at least Ω(log n).

18
Locality Sensitive Hashing
  • Approximate Nearest Neighbors
  • Construct a set L of hash functions, each with
    the property that
  • if two points are close, they collide with high
    probability;
  • if two points are far, they collide with low
    probability.
  • Combine the results from the L functions to
    answer the query.
  • Random projections can be used as the hash
    functions (a sketch follows this list).
  • Other variations in recent literature
  • Discussed later (NN searches)
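A minimal sketch instantiating the hash functions with random
hyperplane (sign-of-projection) hashes, one standard choice; the
class name and parameters are illustrative, not from the slides:

```python
import numpy as np

class RandomProjectionLSH:
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        # One random projection matrix per hash table.
        self.planes = [rng.standard_normal((dim, n_bits))
                       for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]

    def _key(self, planes, v):
        # The hash is the sign pattern of n_bits random projections;
        # points at a small angle often share the same pattern.
        return tuple((v @ planes > 0).astype(int))

    def insert(self, v, label):
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(self._key(planes, v), []).append(label)

    def query(self, v):
        # Union of candidates over the tables; verify them with
        # exact distance computations to answer the query.
        out = set()
        for planes, table in zip(self.planes, self.tables):
            out.update(table.get(self._key(planes, v), []))
        return out
```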

19
Bibliography
  • G. Hjaltason and H. Samet, Properties of
    Embedding Methods for Similarity Searching in
    Metric Spaces, IEEE Transactions on Pattern
    Analysis and Machine Intelligence, 25(5), 2003,
    530-549.
  • J. Bourgain, On Lipschitz embedding of finite
    metric spaces in Hilbert space, Israel J. of
    Math., 52, 1985, 46-52.
  • N. Linial, E. London and Y. Rabinovich, The
    geometry of graphs and some of its algorithmic
    applications, Combinatorica, 15, 1995, 215-245.
  • D. Achlioptas, Database-friendly random
    projections, PODS 2001.

20
Bibliography
  • W.B. Johnson and J. Lindenstrauss, Extensions of
    Lipschitz mappings into a Hilbert space,
    Contemporary Mathematics, 26, 1984, 189-206.
  • C.H. Papadimitriou, P. Raghavan, H. Tamaki and
    S. Vempala, Latent Semantic Indexing: A
    probabilistic analysis, PODS 1998, 159-168.
  • P. Indyk and R. Motwani, Approximate nearest
    neighbors: Towards removing the curse of
    dimensionality, STOC 1998, 604-613.
  • P. Indyk, Stable distributions, pseudorandom
    generators, embeddings and data stream
    computations, FOCS 2000, 189-197.
  • L.J. Schulman, Clustering for edge-cost
    minimization, STOC 2000, 547-555.
  • E. Bingham and H. Mannila, Random projection in
    dimensionality reduction: Applications to image
    and text data, KDD 2001, 245-250.
  • J. Buhler and M. Tompa, Finding motifs using
    random projections, JCB, 9(2), 2002, 225-242.