Contrastive Divergence Learning - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Contrastive Divergence Learning
  • Geoffrey E. Hinton
  • A discussion led by Oliver Woodford

2
Contents
  • Maximum Likelihood learning
  • Gradient descent based approach
  • Markov Chain Monte Carlo sampling
  • Contrastive Divergence
  • Further topics for discussion:
    - Result biasing of Contrastive Divergence
    - Product of Experts
    - High-dimensional data considerations

3
Maximum Likelihood learning
  • Given:
  • Probability model: p(x; θ) = f(x; θ) / Z(θ)
  • θ - the model parameters
  • Z(θ) = ∫ f(x; θ) dx - the partition function
  • Training data: X = {x_k}, k = 1, …, K
  • Aim:
  • Find θ that maximizes the likelihood of the training
    data, p(X; θ) = ∏_k f(x_k; θ) / Z(θ)
  • Or, equivalently, that minimizes the negative log of
    the likelihood (the "energy"):
    E(X; θ) = log Z(θ) − ⟨log f(x; θ)⟩_X,
    where ⟨·⟩_X denotes the average over the training
    data (a toy sketch of this setup follows below)

(Figure: toy example with known result.)
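To make the setup concrete, here is a minimal Python sketch of these definitions, assuming a toy 1-D model f(x; θ) = exp(−(x − θ)²/2) with a single mean parameter θ; the model choice and all names are illustrative, not from the slides.

```python
import numpy as np

def log_f(x, theta):
    # Unnormalised model: f(x; theta) = exp(-(x - theta)**2 / 2)
    return -(x - theta) ** 2 / 2.0

def log_Z(theta):
    # Partition function Z(theta) = integral of f(x; theta) dx.
    # Tractable for this toy model: Z = sqrt(2 * pi), independent of theta.
    return 0.5 * np.log(2.0 * np.pi)

def energy(X, theta):
    # E(X; theta) = log Z(theta) - <log f(x; theta)>_X
    return log_Z(theta) - np.mean(log_f(X, theta))

X = np.random.default_rng(0).normal(loc=1.5, size=1000)  # training data
print(energy(X, 0.0), energy(X, 1.5))  # energy is lower at the true mean
```

For this toy model the ML answer is known in advance (the sample mean), which is what makes it a useful "known result" to check the methods below against.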
4
Maximum Likelihood learning
  • Method: solve ∂E(X; θ)/∂θ = 0, since the gradient is
    zero at the minimum
  • Let us assume that there is no closed-form solution,
    so the minimum must be found numerically

5
Gradient descent-based approach
  • Move a fixed step size, η, in the direction of
    steepest gradient. (Not line search - see why
    later.)
  • This gives the following parameter update equation
    (sketched in code below):
    θ_{t+1} = θ_t − η ∂E(X; θ_t)/∂θ
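A sketch of this update rule on the toy Gaussian above, for which ∂E/∂θ = θ − mean(X) happens to be available in closed form, so we can verify convergence; in the intractable case this gradient must be estimated, as the following slides show.

```python
import numpy as np

X = np.random.default_rng(0).normal(loc=1.5, size=1000)  # training data
theta, eta = 0.0, 0.1                                    # start point, step size

for t in range(200):
    grad_E = theta - X.mean()     # dE/dtheta for the toy Gaussian (closed form)
    theta = theta - eta * grad_E  # fixed step in the steepest-descent direction

print(theta, X.mean())            # theta converges to the sample mean
```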

6
Gradient descent-based approach
  • Recall Z(θ) = ∫ f(x; θ) dx. Sometimes this integral
    will be algebraically intractable.
  • This means we can calculate neither E(X; θ) nor
    ∂E(X; θ)/∂θ (hence no line search).
  • However, with some clever substitution one can show
    ∂log Z(θ)/∂θ = ⟨∂log f(x; θ)/∂θ⟩_{p(x; θ)}
  • so
    ∂E(X; θ)/∂θ = ⟨∂log f(x; θ)/∂θ⟩_{p(x; θ)} − ⟨∂log f(x; θ)/∂θ⟩_X
  • where ⟨·⟩_{p(x; θ)}, the expectation under the model
    distribution, can be estimated numerically (see the
    sketch below).
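A sketch of this identity for the toy model, where ∂log f/∂θ = x − θ. Samples from p(x; θ) stand in for the intractable expectation; here they are drawn exactly, which only works because the toy model is a Gaussian (in general MCMC is needed, as the next slide shows).

```python
import numpy as np

def dlogf_dtheta(x, theta):
    # For log f = -(x - theta)**2 / 2, d(log f)/dtheta = x - theta.
    return x - theta

def grad_E(theta, X_data, model_samples):
    # dE/dtheta = <dlogf/dtheta>_{p(x; theta)} - <dlogf/dtheta>_X
    return dlogf_dtheta(model_samples, theta).mean() \
         - dlogf_dtheta(X_data, theta).mean()

rng = np.random.default_rng(0)
X_data = rng.normal(loc=1.5, size=1000)          # training data
model_samples = rng.normal(loc=0.0, size=1000)   # samples from p(x; theta=0)
print(grad_E(0.0, X_data, model_samples))        # ~ -1.5: the step raises theta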

7
Markov Chain Monte Carlo sampling
  • To estimate ⟨∂log f(x; θ)/∂θ⟩_{p(x; θ)} we must draw
    samples from p(x; θ).
  • Since Z(θ) is unknown, we cannot draw samples
    directly from a cumulative distribution curve.
  • Markov Chain Monte Carlo (MCMC) methods turn
    random samples into samples from a proposed
    distribution, without knowing Z(θ).
  • Metropolis algorithm (see the sampler sketch below):
  • Perturb samples, e.g. x'_k = x_k + random noise
  • Reject x'_k if f(x'_k; θ) / f(x_k; θ) < u, where
    u ~ Uniform[0, 1]; otherwise accept it
  • Repeat the cycle for all samples until the
    distribution stabilizes.
  • Stabilization takes many cycles, and there is no
    accurate criterion for determining when it has
    occurred.
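A minimal Metropolis sampler matching this recipe, written for the toy model; note that only the unnormalised f is ever evaluated, so Z(θ) never appears. The step size and cycle count are arbitrary choices, not values from the slides.

```python
import numpy as np

def metropolis_cycle(x, log_f, step, rng):
    x_prop = x + rng.normal(scale=step, size=x.shape)  # perturb each sample
    log_ratio = log_f(x_prop) - log_f(x)               # log of f(x')/f(x)
    accept = np.log(rng.uniform(size=x.shape)) < log_ratio
    return np.where(accept, x_prop, x)                 # reject -> keep old x

rng = np.random.default_rng(0)
log_f = lambda x: -(x - 1.5) ** 2 / 2.0       # toy model with theta = 1.5
x = rng.uniform(-5.0, 5.0, size=1000)         # arbitrary starting samples

for _ in range(1000):                         # many cycles until stabilization
    x = metropolis_cycle(x, log_f, step=0.5, rng=rng)

print(x.mean(), x.std())                      # approach 1.5 and 1.0
```

The 1000-cycle burn-in here is a guess; as the slide notes, there is no reliable test that equilibrium has actually been reached.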

8
Markov Chain Monte Carlo sampling
  • Let us use the training data, X, as the starting
    point for our MCMC sampling, writing X^n for the
    samples after n MCMC cycles (so X^0 = X).
  • Our parameter update equation becomes:
    θ_{t+1} = θ_t + η (⟨∂log f(x; θ)/∂θ⟩_{X^0} − ⟨∂log f(x; θ)/∂θ⟩_{X^∞})

9
Contrastive divergence
  • Let us make the number of MCMC cycles per
    iteration small, say even 1.
  • Our parameter update equation is now:
    θ_{t+1} = θ_t + η (⟨∂log f(x; θ)/∂θ⟩_{X^0} − ⟨∂log f(x; θ)/∂θ⟩_{X^1})
  • Intuition: 1 MCMC cycle is enough to move the data
    from the target distribution towards the proposed
    distribution, and so to suggest in which direction
    the proposed distribution should move to better
    model the training data (see the CD-1 sketch below).
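Putting the pieces together, a sketch of CD-1 on the toy model under the same illustrative assumptions as before: the chain starts at the training data (X^0 = X), runs for a single Metropolis cycle to give X^1, and the update uses the difference of the two averages of ∂log f/∂θ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=1.5, size=1000)   # training data, true mean 1.5
theta, eta = 0.0, 0.1

def metropolis_cycle(x, theta, step=0.5):
    x_prop = x + rng.normal(scale=step, size=x.shape)
    log_ratio = (-(x_prop - theta) ** 2 + (x - theta) ** 2) / 2.0
    return np.where(np.log(rng.uniform(size=x.shape)) < log_ratio, x_prop, x)

for t in range(500):
    X1 = metropolis_cycle(X, theta)           # X^1: one cycle from X^0 = X
    grad0 = (X - theta).mean()                # <dlogf/dtheta>_{X^0}
    grad1 = (X1 - theta).mean()               # <dlogf/dtheta>_{X^1}
    theta = theta + eta * (grad0 - grad1)     # CD-1 update

print(theta)                                  # close to the true mean 1.5
```

Because the chain never needs to reach equilibrium, each iteration is cheap; the cost of this shortcut is the bias discussed on the next slide.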

10
Contrastive divergence bias
  • Writing p^0 for the training-data distribution and
    p^∞ = p(x; θ) for the model's equilibrium distribution,
    ML learning is equivalent to minimizing KL(p^0 ‖ p^∞),
    where KL(p ‖ q) = ∫ p(x) log (p(x)/q(x)) dx
    (Kullback-Leibler divergence).
  • CD instead attempts to minimize
    KL(p^0 ‖ p^∞) − KL(p^1 ‖ p^∞),
    where p^1 is the distribution of the data after one
    MCMC cycle.
  • The gradient of this objective contains an extra term
    involving ∂p^1/∂θ, which CD assumes to be negligible.
    Usually it is small, but it can sometimes bias results.
  • See "On Contrastive Divergence Learning",
    Carreira-Perpiñán & Hinton, AISTATS 2005, for more
    details.

11
Product of Experts
12
Dimensionality issues