1
Motivation
Histograms are everywhere in vision: object
recognition/classification, appearance-based
tracking. How do we compare two histograms
{p_i}, {q_j}? Information-theoretic measures
like chi-square, the Bhattacharyya coefficient,
and KL-divergence are very prevalent. They are
based on bin-to-bin comparisons of mass.
Example: the Bhattacharyya coefficient,
BC(p, q) = Σ_i sqrt(p_i · q_i).
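As a concrete illustration (a minimal sketch of my own, not from the
slides; the function names are mine, and I use one common symmetric
form of the chi-square distance):

    import numpy as np

    def bhattacharyya(p, q):
        # Bhattacharyya coefficient: sum_i sqrt(p_i * q_i).
        # Equals 1.0 for identical normalized histograms.
        return np.sum(np.sqrt(p * q))

    def chi_square(p, q, eps=1e-12):
        # A common symmetric chi-square histogram distance.
        return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

    p = np.array([0.25, 0.25, 0.25, 0.25, 0.00])
    q = np.array([0.00, 0.25, 0.25, 0.25, 0.25])
    print(bhattacharyya(p, q))  # 0.75
    print(chi_square(p, q))     # 0.25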
2
Motivation
Problem: the bin-to-bin comparison measures are
sensitive to the binning of the data, and also to
shifts of data across bins (say due to an
intensity gain/offset). Example: which of these
is more similar to the black circle?
[Figure: three intensity histograms, each over
the range 0-255.]
3
Earth Mover's Distance
[Figure]
(example borrowed from Efros @ CMU)
4
Earth Mover's Distance
[Figure]
5
Earth Mover's Distance
[Figure]
6
The Difference?
(amount moved)
7
The Difference?
(amount moved) × (distance moved)
Thought Experiment
  • move the books on your bookshelf one space to
    the right
  • you are lazy, so you want to minimize the sum
    of distances moved

9
Thought Experiment
More than one minimal solution. Not unique!
dist = 4
dist = 1 + 1 + 1 + 1 = 4
10
Thought Experiment
now minimize the sum of squared distances
strategy 1: dist = 4² = 16
            (was dist = 4)
strategy 2: dist = 1² + 1² + 1² + 1² = 4
            (was dist = 1 + 1 + 1 + 1 = 4)
11
How Do We Know?
How do we know those are the minimal
solutions? Is that all of them? Let's go back to
abs distance |new - old|.
Form a table of distances |new - old|:

             new
         A   B   C   D   E
     A   0   1   2   3   4
     B   1   0   1   2   3
old  C   2   1   0   1   2
     D   3   2   1   0   1
     E   4   3   2   1   0
12
How Do We Know?
How do we know those are the minimal
solutions? Is that all of them? Let's go back to
abs distance |new - old|.
Form a table of distances |new - old|, and X off
the entries that are not admissible:

             new
         A   B   C   D   E
     A   x   1   2   3   4
     B   x   0   1   2   3
old  C   x   1   0   1   2
     D   x   2   1   0   1
     E   x   x   x   x   x
13
How Do We Know?
How do we know those are the minimal
solutions? Is that all of them? Let's go back to
abs distance |new - old|.
Consider all permutations where there is a single
1 in each admissible row and column.

             new
         A   B   C   D   E
     A   x   1   2   3   4
     B   x   0   1   2   3
old  C   x   1   0   1   2
     D   x   2   1   0   1
     E   x   x   x   x   x
14
How Do We Know?
How do we know those are the minimal
solutions? Is that all of them? Let's go back to
abs distance |new - old|.
Consider all permutations where there is a single
1 in each admissible row and column.

             new
         A   B   C   D   E
     A   x   1   2   3   4
     B   x   0   1   2   3
old  C   x   1   0   1   2
     D   x   2   1   0   1
     E   x   x   x   x   x

sum = 1 + 3 + 0 + 1 = 5
15
How Do We Know?
How do we know those are the minimal
solutions? Is that all of them? Let's go back to
abs distance |new - old|.
Consider all permutations where there is a single
1 in each admissible row and column.

             new
         A   B   C   D   E
     A   x   1   2   3   4
     B   x   0   1   2   3
old  C   x   1   0   1   2
     D   x   2   1   0   1
     E   x   x   x   x   x

sum = 2 + 2 + 2 + 2 = 8
16
How Do We Know?
How do we know those are the minimal
solutions? Let's go back to using absolute
distance |new - old|.
Consider all permutations where there is a single
1 in each admissible row and column. Try to find
the minimum one!

             new
         A   B   C   D   E
     A   x   1   2   3   4
     B   x   0   1   2   3
old  C   x   1   0   1   2
     D   x   2   1   0   1
     E   x   x   x   x   x

There are 4 · 3 · 2 · 1 = 24 permutations in this
example. We can try them all.
sum = 4 + 2 + 0 + 2 = 8
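Since 24 permutations is small, we really can try them all. A minimal
brute-force sketch (mine, not from the deck):

    from itertools import permutations

    old = [0, 1, 2, 3]   # books at positions A-D
    new = [1, 2, 3, 4]   # admissible targets B-E

    for p in (1, 2):     # p=1: abs distance, p=2: squared distance
        costs = {perm: sum(abs(n - o) ** p for o, n in zip(old, perm))
                 for perm in permutations(new)}
        best = min(costs.values())
        mins = [perm for perm, c in costs.items() if c == best]
        print(p, best, len(mins))
    # prints: 1 4 8   (8 minimal solutions under abs distance)
    #         2 4 1   (a unique minimum under squared distance)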
17
8 min solutions!
It turns out that lots of solutions are minimal
when we use absolute distance.
18
How Do We Know?
The two we had before are there. But there are
others!! Each row below gives the new position
assigned to books A-D, with B, C, D, E numbered
1-4:
4 1 2 3   (A→E, B→B, C→C, D→D)
1 2 3 4   (A→B, B→C, C→D, D→E)
3 1 2 4   (A→D, B→B, C→C, D→E)
19
Recall Thought Experiment
now minimize the sum of squared distances
strategy 1: dist = 4² = 16
            (was dist = 4)
strategy 2: dist = 1² + 1² + 1² + 1² = 4
            (was dist = 1 + 1 + 1 + 1 = 4)
20
             new
         A   B   C   D   E
     A   x   1   4   9  16
     B   x   0   1   4   9
old  C   x   1   0   1   4
     D   x   4   1   0   1
     E   x   x   x   x   x

Only one unique min solution when we use
|new - old|². This turns out to be the case for
|new - old|^p for any p > 1, because then the
cost function is strictly convex.
21
Other Ways to Look at It
The way we've set it up so far, this problem is
equivalent to the linear assignment problem. We
can therefore solve it using the Hungarian
algorithm.
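For instance (a sketch of mine, not from the talk): SciPy's
linear_sum_assignment solves the same assignment problem (via a
Jonker-Volgenant-style algorithm rather than the classical Hungarian
method):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Book example: old positions A-D (rows) to new positions B-E
    # (columns), cost = |new - old|.
    cost = np.array([[abs(n - o) for n in (1, 2, 3, 4)]
                     for o in (0, 1, 2, 3)])
    rows, cols = linear_sum_assignment(cost)
    print(cost[rows, cols].sum())  # minimal total distance: 4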
22
Other Ways to Look at It
We can also look at it as a min-cost flow problem
on a bipartite graph.
[Figure: bipartite graph with old positions A-E
as sources (supply 1 each) and new positions A-E
as sinks (demand -1 each); edge costs
cost(old, new) = |new - old|^p, e.g. cost(A,B),
cost(D,E).]
Instead of books, we can think of these nodes as
factories and consumers, or whatever. Why? We can
then think about relaxing the problem to consider
fractional assignments between old and new
positions (e.g. half of A goes to B, and the
other half goes to C). More about this in a
moment.
23
Monge-Kantorovich Transportation Problem
24
Mallows (Wasserstein) Distance
Let X and Y be d-dimensional random variables.
The probability distribution of X is P, and the
distribution of Y is Q. Also, consider some
unknown distribution F over the two of them taken
jointly: (X, Y), on the d × d product space. The
Mallows distance is

    M_p(P, Q) = min_F ( E_F[ ||X - Y||^p ] )^(1/p),
    subject to (X, Y) ~ F, X ~ P, Y ~ Q.

In words: we are trying to find the minimum
expected value of the distance between X and Y.
The expected value is taken over some unknown
joint distribution F! F is constrained such that
its marginal w.r.t. X is P, and its marginal
w.r.t. Y is Q.
25
Understanding Mallows Distance
For discrete variables, the joint distribution F
becomes a matrix of nonnegative assignments f_ij,
and the definition becomes a transportation
problem with costs d_ij:

    M_p^p(P, Q) = min_{f_ij} Σ_ij f_ij · d_ij,
    subject to f_ij ≥ 0, Σ_j f_ij = p_i,
    Σ_i f_ij = q_j.
26
Mallows Versus EMD
EMD:

    EMD(P, Q) = ( Σ_ij f*_ij · d_ij ) / ( Σ_ij f*_ij ),
    where f* is the minimizing flow

Mallows:

    M_p^p(P, Q) = min_F E_F[ d(X, Y) ]

For distributions they are the same. They are
also the same when the total masses are equal.
27
Mallows vs EMD
Main difference: EMD allows partial matches in
the case of unequal masses.
[Figure: two histograms with unequal total
masses; EMD = 0, Mallows = 1/2 (note: using the
L1 norm).]
As the paper points out, you have to be careful
when allowing partial matches to make sure what
you are doing is sensible.
28
Linear Programming
Mallows/EMD for general d-dimensional data is
solved via linear programming, for example by the
simplex algorithm. This makes it OK for low
values of d (up to dozens), but makes it
unsuitable for very large d. As a result, EMD is
typically applied after clustering the data (say
using k-means) into a smaller set of clusters.
The coarse descriptors based on clusters are
often called signatures.
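To make the LP concrete, here is a minimal sketch (mine, not from the
talk) of the discrete transportation problem solved with
scipy.optimize.linprog; mallows_lp is a hypothetical helper name:

    import numpy as np
    from scipy.optimize import linprog

    def mallows_lp(p, q, D):
        # min sum_ij f_ij * D_ij  s.t.  row sums = p, col sums = q, f >= 0
        m, n = D.shape
        A_eq = np.zeros((m + n, m * n))
        for i in range(m):
            A_eq[i, i * n:(i + 1) * n] = 1.0   # sum_j f_ij = p_i
        for j in range(n):
            A_eq[m + j, j::n] = 1.0            # sum_i f_ij = q_j
        b_eq = np.concatenate([p, q])
        res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        return res.fun

    # Book example: cost = |new - old|, unit mass per book.
    old, new = np.array([0, 1, 2, 3]), np.array([1, 2, 3, 4])
    D = np.abs(new[None, :] - old[:, None]).astype(float)
    print(mallows_lp(np.ones(4), np.ones(4), D))  # 4.0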
29
Transportation Problem
Mallows is a special case of the linear
programming transportation problem,
formulated as a min-cost flow problem in a graph.
[Figure: bipartite flow graph with supplies
p_1, ..., p_m and demands -q_1, ..., -q_n.]
30
Assignment Problem
Some discrete cases (like our book example)
simplify further, to the assignment problem,
formulated as a min-cost flow problem in a graph.
[Figure: the same bipartite flow graph, but with
all supplies p_i = 1 and demands -q_j = -1.]
All x_ij are 0 or 1, and there is only one 1 in
each row or column.
31
Linear Programming
Mallows/EMD for general d-dimensional data is
solved via linear programming, for example by the
simplex algorithm. This makes it OK for low
values of d (up to dozens), but makes it
unsuitable for very large d. As a result, EMD is
typically applied after clustering the data (say
using k-means) into a smaller set of clusters.
The coarse descriptors based on clusters are
often called signatures.
However, if we use marginal distributions, so
that we have 1D histograms, something wonderful
happens!!!
32
One-Dimensional Data
One-dimensional data (like we've been using for
illustration during this whole talk) is an
important special case. Mallows/EMD distance
computation greatly simplifies! First of all,
for 1D, we can represent densities by their
cumulative distribution functions.
33
One-Dimensional Data
One-dimensional data (like we've been using for
illustration during this whole talk) is an
important special case. Mallows/EMD distance
computation greatly simplifies! First of all,
for 1D, we can represent densities by their
cumulative distribution functions, and the min
distance can be computed as

    ∫ |F(x) - G(x)| dx
34
One-Dimensional Data
[Figure: cdfs F(x) and G(x) over intensity 0-255
(for example), and their inverses F⁻¹(t) and
G⁻¹(t) for t in [0, 1].]
It is just the area between the two cumulative
distribution function curves.
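In code (a sketch of mine, not from the talk), for the book example
viewed as two 1D histograms:

    import numpy as np
    from scipy.stats import wasserstein_distance

    positions = np.arange(5)
    p = np.array([1.0, 1.0, 1.0, 1.0, 0.0])   # old book positions
    q = np.array([0.0, 1.0, 1.0, 1.0, 1.0])   # new book positions

    # 1D Mallows/EMD = area between the cdfs (bin spacing 1).
    print(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))       # 4.0

    # SciPy normalizes the weights to unit mass, so it returns
    # the per-unit-mass distance 4 / 4 = 1.0.
    print(wasserstein_distance(positions, positions, p, q))  # 1.0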
35
Proof?
It is easy to find papers that state the previous
1D simplified solution, but quite hard to find
one with a proof! One is [citation on slide], but
you still have to work at it. I did, one week,
and here is what I came up with.
First, recall the quantile transform: given a cdf
F(x), we can generate samples from it by
uniformly sampling t_i ~ U(0,1) and then
outputting x_i = F⁻¹(t_i).
[Figure: cdf F(x) over intensity 0-255 (for
example); a sample t_0 on the vertical axis maps
to x_0 = F⁻¹(t_0) on the horizontal axis.]
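A minimal sketch (mine) of quantile-transform sampling from a
discrete histogram:

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.25, 0.25, 0.25, 0.25, 0.0])  # histogram over bins 0-4
    cdf = np.cumsum(p)

    t = rng.uniform(size=10_000)          # t_i ~ U(0, 1)
    x = np.searchsorted(cdf, t)           # x_i = F^{-1}(t_i)

    # Empirical bin frequencies should approximate p.
    print(np.bincount(x, minlength=5) / len(x))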
36
Proof?
This allows us to understand that the area
between the cdfs can be rewritten in terms of
their inverses:

    ∫ |F(x) - G(x)| dx = ∫₀¹ |F⁻¹(t) - G⁻¹(t)| dt = E[ |X - Y| ],

where X = F⁻¹(t) and Y = G⁻¹(t) are generated
from the same t ~ U(0,1).
37
But so what? Why does this minimize the Mallows
distance?
38
Expected cost is the sum of the 4x4 array of
products p_ij · d_ij. To compute the Mallows
distance, we want to choose the p_ij to minimize
this expected cost.
39
P. Major says that at the minimum solution, for
any p_ab and p_cd on opposite sides of the
diagonal, one or both of them should be zero. If
not, we can construct a lower-cost solution.
Example:
[Figure: mass a is moved from entries with costs
9 and 4 onto entries with costs 0 and 1.]
Our new cost differs from the old one by
a(0 + 1) - a(9 + 4) = -12a, so it is a
lower-cost solution.
40
Connection (and a missing piece of the proof in
P. Major's paper): the above procedure serves to
concentrate all the mass of the joint
distribution along the diagonal, and apparently
also yields the min-cost solution. However,
concentration of mass along the diagonal is also
a property of joint distributions of correlated
random variables. Therefore... generating
maximally correlated random variables via the
quantile transformation should serve to generate
a joint distribution clustered as tightly as
possible around the diagonal of the cost matrix,
and therefore should yield the minimum expected
cost. QED!
41
Example CDF Distance
p_i: masses (.25, .25, .25, .25, 0) on bins 1-5
q_j: masses (0, .25, .25, .25, .25) on bins 1-5
[Figure: black bars P_i = cdf of p; white bars
Q_i = cdf of q.]

sum |P_i - Q_i| = .25 + .25 + .25 + .25 + 0 = 1

Note: we get 1 instead of 4, the number we got
earlier for the books, because we didn't divide
by the total mass (4) earlier.
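A quick numerical check of this example (my sketch):

    import numpy as np

    p = np.array([0.25, 0.25, 0.25, 0.25, 0.00])
    q = np.array([0.00, 0.25, 0.25, 0.25, 0.25])
    P, Q = np.cumsum(p), np.cumsum(q)
    print(np.abs(P - Q))        # [0.25 0.25 0.25 0.25 0.  ]
    print(np.abs(P - Q).sum())  # 1.0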
42
Example Application
  • convert 3D color data into three 1D marginals
  • compute the CDF of the marginal color data in
    a circular region
  • compute the CDF of the marginal color data in
    a ring around that circle
  • compare the two CDFs using the Mallows
    distance (see the sketch below)
  • select peaks in the distance function as
    interest regions
  • repeat, at a range of scales...
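A minimal sketch (mine; the helper name, bin count, and random
"pixels" are illustrative assumptions) of the per-channel CDF
comparison at the heart of this pipeline; region and ring extraction
and the scale search are omitted:

    import numpy as np

    def marginal_cdf_distance(region_a, region_b, nbins=32):
        # Sum of 1D Mallows/EMD distances between the per-channel
        # marginal histograms of two sets of RGB pixels (N x 3).
        total = 0.0
        for c in range(3):
            ha, _ = np.histogram(region_a[:, c], bins=nbins, range=(0, 256))
            hb, _ = np.histogram(region_b[:, c], bins=nbins, range=(0, 256))
            Fa = np.cumsum(ha / ha.sum())
            Fb = np.cumsum(hb / hb.sum())
            total += np.sum(np.abs(Fa - Fb)) * (256 / nbins)
        return total

    rng = np.random.default_rng(1)
    circle = rng.integers(0, 256, size=(500, 3))   # pixels in the circle
    ring = rng.integers(0, 256, size=(800, 3))     # pixels in the ring
    print(marginal_cdf_distance(circle, ring))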