Contrastive Divergence Learning - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Contrastive Divergence Learning
  • Geoffrey E. Hinton
  • A discussion led by Oliver Woodford

2
Contents
  • Maximum Likelihood learning
  • Gradient descent based approach
  • Markov Chain Monte Carlo sampling
  • Contrastive Divergence
  • Further topics for discussion:
    - Result biasing of Contrastive Divergence
    - Product of Experts
    - High-dimensional data considerations

3
Maximum Likelihood learning
  • Given:
  • Probability model: p(x; θ) = f(x; θ) / Z(θ)
  • θ - the model parameters
  • Z(θ) = ∫ f(x; θ) dx - the partition function
  • Training data: X = {x_k}, k = 1, …, K
  • Aim:
  • Find θ that maximizes the likelihood of the training
    data, p(X; θ) = ∏_k f(x_k; θ) / Z(θ)
  • Or, equivalently, that minimizes the negative log of
    the likelihood (the "energy"):
    E(X; θ) = log Z(θ) − ⟨log f(x; θ)⟩_X,
    where ⟨·⟩_X denotes the average over the training
    data (a toy sketch of this setup follows below)

(Figure: toy example with known result.)
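To make the setup concrete, here is a minimal Python sketch of these definitions, assuming a toy 1-D model f(x; θ) = exp(−(x − θ)²/2) with a single mean parameter θ; the model choice and all names are illustrative, not from the slides.

```python
import numpy as np

def log_f(x, theta):
    # Unnormalised model: f(x; theta) = exp(-(x - theta)**2 / 2)
    return -(x - theta) ** 2 / 2.0

def log_Z(theta):
    # Partition function Z(theta) = integral of f(x; theta) dx.
    # Tractable for this toy model: Z = sqrt(2 * pi), independent of theta.
    return 0.5 * np.log(2.0 * np.pi)

def energy(X, theta):
    # E(X; theta) = log Z(theta) - <log f(x; theta)>_X
    return log_Z(theta) - np.mean(log_f(X, theta))

X = np.random.default_rng(0).normal(loc=1.5, size=1000)  # training data
print(energy(X, 0.0), energy(X, 1.5))  # energy is lower at the true mean
```

For this toy model the ML answer is known in advance (the sample mean), which is what makes it a useful "known result" to check the methods below against.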
4
Maximum Likelihood learning
  • Method: solve ∂E(X; θ)/∂θ = 0, since the gradient is
    zero at the minimum
  • Let us assume that there is no closed-form solution,
    so the minimum must be found numerically

5
Gradient descent-based approach
  • Move a fixed step size, η, in the direction of
    steepest gradient. (Not line search - see why
    later.)
  • This gives the following parameter update equation
    (sketched in code below):
    θ_{t+1} = θ_t − η ∂E(X; θ_t)/∂θ
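A sketch of this update rule on the toy Gaussian above, for which ∂E/∂θ = θ − mean(X) happens to be available in closed form, so we can verify convergence; in the intractable case this gradient must be estimated, as the following slides show.

```python
import numpy as np

X = np.random.default_rng(0).normal(loc=1.5, size=1000)  # training data
theta, eta = 0.0, 0.1                                    # start point, step size

for t in range(200):
    grad_E = theta - X.mean()     # dE/dtheta for the toy Gaussian (closed form)
    theta = theta - eta * grad_E  # fixed step in the steepest-descent direction

print(theta, X.mean())            # theta converges to the sample mean
```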

6
Gradient descent-based approach
  • Recall Z(θ) = ∫ f(x; θ) dx. Sometimes this integral
    will be algebraically intractable.
  • This means we can calculate neither E(X; θ) nor
    ∂E(X; θ)/∂θ (hence no line search).
  • However, with some clever substitution one can show
    ∂log Z(θ)/∂θ = ⟨∂log f(x; θ)/∂θ⟩_{p(x; θ)}
  • so
    ∂E(X; θ)/∂θ = ⟨∂log f(x; θ)/∂θ⟩_{p(x; θ)} − ⟨∂log f(x; θ)/∂θ⟩_X
  • where ⟨·⟩_{p(x; θ)}, the expectation under the model
    distribution, can be estimated numerically (see the
    sketch below).
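A sketch of this identity for the toy model, where ∂log f/∂θ = x − θ. Samples from p(x; θ) stand in for the intractable expectation; here they are drawn exactly, which only works because the toy model is a Gaussian (in general MCMC is needed, as the next slide shows).

```python
import numpy as np

def dlogf_dtheta(x, theta):
    # For log f = -(x - theta)**2 / 2, d(log f)/dtheta = x - theta.
    return x - theta

def grad_E(theta, X_data, model_samples):
    # dE/dtheta = <dlogf/dtheta>_{p(x; theta)} - <dlogf/dtheta>_X
    return dlogf_dtheta(model_samples, theta).mean() \
         - dlogf_dtheta(X_data, theta).mean()

rng = np.random.default_rng(0)
X_data = rng.normal(loc=1.5, size=1000)          # training data
model_samples = rng.normal(loc=0.0, size=1000)   # samples from p(x; theta=0)
print(grad_E(0.0, X_data, model_samples))        # ~ -1.5: the step raises theta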

7
Markov Chain Monte Carlo sampling
  • To estimate ⟨∂log f(x; θ)/∂θ⟩_{p(x; θ)} we must draw
    samples from p(x; θ).
  • Since Z(θ) is unknown, we cannot draw samples
    directly from a cumulative distribution curve.
  • Markov Chain Monte Carlo (MCMC) methods turn
    random samples into samples from a proposed
    distribution, without knowing Z(θ).
  • Metropolis algorithm (see the sampler sketch below):
  • Perturb samples, e.g. x'_k = x_k + random noise
  • Reject x'_k if f(x'_k; θ) / f(x_k; θ) < u, where
    u ~ Uniform[0, 1]; otherwise accept it
  • Repeat the cycle for all samples until the
    distribution stabilizes.
  • Stabilization takes many cycles, and there is no
    accurate criterion for determining when it has
    occurred.
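A minimal Metropolis sampler matching this recipe, written for the toy model; note that only the unnormalised f is ever evaluated, so Z(θ) never appears. The step size and cycle count are arbitrary choices, not values from the slides.

```python
import numpy as np

def metropolis_cycle(x, log_f, step, rng):
    x_prop = x + rng.normal(scale=step, size=x.shape)  # perturb each sample
    log_ratio = log_f(x_prop) - log_f(x)               # log of f(x')/f(x)
    accept = np.log(rng.uniform(size=x.shape)) < log_ratio
    return np.where(accept, x_prop, x)                 # reject -> keep old x

rng = np.random.default_rng(0)
log_f = lambda x: -(x - 1.5) ** 2 / 2.0       # toy model with theta = 1.5
x = rng.uniform(-5.0, 5.0, size=1000)         # arbitrary starting samples

for _ in range(1000):                         # many cycles until stabilization
    x = metropolis_cycle(x, log_f, step=0.5, rng=rng)

print(x.mean(), x.std())                      # approach 1.5 and 1.0
```

The 1000-cycle burn-in here is a guess; as the slide notes, there is no reliable test that equilibrium has actually been reached.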

8
Markov Chain Monte Carlo sampling
  • Let us use the training data, X, as the starting
    point for our MCMC sampling, writing X^n for the
    samples after n MCMC cycles (so X^0 = X).
  • Our parameter update equation becomes:
    θ_{t+1} = θ_t + η (⟨∂log f(x; θ)/∂θ⟩_{X^0} − ⟨∂log f(x; θ)/∂θ⟩_{X^∞})

9
Contrastive divergence
  • Let us make the number of MCMC cycles per
    iteration small, say even 1.
  • Our parameter update equation is now:
    θ_{t+1} = θ_t + η (⟨∂log f(x; θ)/∂θ⟩_{X^0} − ⟨∂log f(x; θ)/∂θ⟩_{X^1})
  • Intuition: 1 MCMC cycle is enough to move the data
    from the target distribution towards the proposed
    distribution, and so to suggest in which direction
    the proposed distribution should move to better
    model the training data (see the CD-1 sketch below).
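Putting the pieces together, a sketch of CD-1 on the toy model under the same illustrative assumptions as before: the chain starts at the training data (X^0 = X), runs for a single Metropolis cycle to give X^1, and the update uses the difference of the two averages of ∂log f/∂θ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=1.5, size=1000)   # training data, true mean 1.5
theta, eta = 0.0, 0.1

def metropolis_cycle(x, theta, step=0.5):
    x_prop = x + rng.normal(scale=step, size=x.shape)
    log_ratio = (-(x_prop - theta) ** 2 + (x - theta) ** 2) / 2.0
    return np.where(np.log(rng.uniform(size=x.shape)) < log_ratio, x_prop, x)

for t in range(500):
    X1 = metropolis_cycle(X, theta)           # X^1: one cycle from X^0 = X
    grad0 = (X - theta).mean()                # <dlogf/dtheta>_{X^0}
    grad1 = (X1 - theta).mean()               # <dlogf/dtheta>_{X^1}
    theta = theta + eta * (grad0 - grad1)     # CD-1 update

print(theta)                                  # close to the true mean 1.5
```

Because the chain never needs to reach equilibrium, each iteration is cheap; the cost of this shortcut is the bias discussed on the next slide.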

10
Contrastive divergence bias
  • Writing p^0 for the training-data distribution and
    p^∞ = p(x; θ) for the model's equilibrium distribution,
    ML learning is equivalent to minimizing KL(p^0 ‖ p^∞),
    where KL(p ‖ q) = ∫ p(x) log (p(x)/q(x)) dx
    (Kullback-Leibler divergence).
  • CD instead attempts to minimize
    KL(p^0 ‖ p^∞) − KL(p^1 ‖ p^∞),
    where p^1 is the distribution of the data after one
    MCMC cycle.
  • The gradient of this objective contains an extra term
    involving ∂p^1/∂θ, which CD assumes to be negligible.
    Usually it is small, but it can sometimes bias results.
  • See "On Contrastive Divergence Learning",
    Carreira-Perpiñán & Hinton, AISTATS 2005, for more
    details.

11
Product of Experts
12
Dimensionality issues