Relative Entropy Part II

Transcript and Presenter's Notes

1
Relative Entropy Part II
  • Vasileios Hatzivassiloglou
  • University of Texas at Dallas

2
Relative Entropy
  • This last quantity, D(p||q) = Σ_x p(x) log2(p(x)/q(x)),
  • is known as relative entropy or Kullback-Leibler
    distance or Kullback-Leibler divergence
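
A minimal Python sketch of this definition; the function name
kl_divergence and the dict representation of p and q are illustrative
choices, not from the slides. Log base 2 is used so that the result is
in bits.

    import math

    def kl_divergence(p, q):
        # D(p || q) = sum over x of p(x) * log2(p(x) / q(x)), in bits.
        # p and q map outcomes to probabilities.  Terms with p(x) = 0
        # contribute 0 by convention; q(x) = 0 with p(x) > 0 makes the
        # divergence infinite.
        total = 0.0
        for x, px in p.items():
            if px == 0:
                continue
            qx = q.get(x, 0.0)
            if qx == 0:
                return float("inf")
            total += px * math.log2(px / qx)
        return total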

3
Relative entropy is non-negative
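
The body of this slide did not survive the export; the standard
argument via Jensen's inequality, which it most likely presented, is
sketched below (the sum runs over the x with p(x) > 0).

    \begin{align*}
    D(p\|q) &= \sum_x p(x)\log_2\frac{p(x)}{q(x)}
             = -\sum_x p(x)\log_2\frac{q(x)}{p(x)} \\
            &\ge -\log_2\sum_x p(x)\,\frac{q(x)}{p(x)}
             \qquad\text{(Jensen's inequality; $\log_2$ is concave)} \\
            &= -\log_2\sum_x q(x) \;\ge\; -\log_2 1 = 0.
    \end{align*}
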
4
Is relative entropy a distance?
  • It is non-negative
  • D(p||q) = 0 if and only if p(x) = q(x) for all x (the
    identity of indiscernibles property)
  • It is not symmetric: D(p||q) ≠ D(q||p) (checked
    numerically below)
  • It does not obey the triangle inequality d(x,y) ≤
    d(x,z) + d(z,y) for all x,y,z
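
A quick numerical check of the asymmetry, reusing the kl_divergence
sketch above; the distributions p and q are made-up examples, not from
the slides.

    p = {"a": 0.5, "b": 0.5}
    q = {"a": 0.9, "b": 0.1}

    # The two directions give different values, so D is not symmetric.
    print(kl_divergence(p, q))  # about 0.737 bits
    print(kl_divergence(q, p))  # about 0.531 bits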

5
Interpreting KL divergence
  • In coding theory, if a message is generated from
    Q, but is coded with an optimal code for P, the
    KL divergence is the extra message length
  • In Bayesian inference, if Q is the prior
    distribution (before seeing the data) and P is
    the posterior distribution, the KL divergence
    measures the information gain from seeing the data
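
The coding-theory interpretation above corresponds to a standard
identity (not shown on the slide): coding samples from Q with a code
that is optimal for P costs the cross entropy H(Q,P) bits per symbol on
average, which exceeds the entropy H(Q) by exactly the KL divergence.

    H(Q,P) \;=\; -\sum_x Q(x)\log_2 P(x)
           \;=\; \underbrace{-\sum_x Q(x)\log_2 Q(x)}_{H(Q)}
             \;+\; \underbrace{\sum_x Q(x)\log_2\frac{Q(x)}{P(x)}}_{D(Q\|P)}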

6
Entropy in evaluation
  • We can measure how well a model fits the (test)
    data by calculating the cross entropy or,
    equivalently for comparison purposes, the KL
    divergence (a small sketch follows this list)
  • This is an intrinsic measure of evaluation
  • it does not measure success at a task
  • it does not require labeled data
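
A minimal sketch of this kind of intrinsic evaluation, assuming a model
that assigns a probability to each test token; the unigram table and
the tiny test set are made up for illustration.

    import math

    def cross_entropy(model_prob, test_tokens):
        # Average -log2 of the probability the model assigns to each
        # test token: the cross entropy in bits per token.
        total = 0.0
        for token in test_tokens:
            total -= math.log2(model_prob(token))
        return total / len(test_tokens)

    # Hypothetical unigram model and test data, for illustration only.
    unigram = {"the": 0.07, "cat": 0.001, "sat": 0.0005}
    test = ["the", "cat", "sat"]
    print(cross_entropy(lambda w: unigram[w], test))  # lower = better fit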

7
Perplexity
  • Defined as 2 raised to the cross entropy:
    perplexity = 2^H
  • Often used in speech recognition
  • Helps in making improvements look better
  • reducing perplexity from 950 to 540 sounds more
    impressive than
  • reducing cross entropy from 9.9 bits to 9.1 bits
    (the same change, as checked below)
  • Interpretation as the number of equiprobable
    choices at each step
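
A quick arithmetic check (in Python, with an illustrative helper name)
that the perplexity and cross-entropy figures on this slide describe
the same improvement.

    import math

    def perplexity(cross_entropy_bits):
        # Perplexity is 2 raised to the cross entropy (in bits).
        return 2 ** cross_entropy_bits

    print(perplexity(9.9))   # about 955, the slide's ~950
    print(perplexity(9.1))   # about 549, the slide's ~540
    print(math.log2(950), math.log2(540))  # back to about 9.9 and 9.1 bits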

8
Entropy of English
  • For 27 symbols (letters plus space)
  • Uniform model (no information)
  • log2 27 ≈ 4.75 bits
  • Independence model
  • about 4 bits (E occurs 13% of the time, but Q
    and Z only about 0.1% of the time)
  • Markov models
  • 3 bits for bigrams, 2.4 to 2.8 bits for trigrams
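
The uniform-model figure can be reproduced directly; the small Python
sketch below (the entropy helper is illustrative) also notes how the
independence-model figure would be obtained, without reproducing a full
frequency table here.

    import math

    def entropy(p):
        # H(p) = -sum over x of p(x) * log2 p(x), in bits.
        return -sum(px * math.log2(px) for px in p.values() if px > 0)

    # Uniform model over 27 symbols (26 letters plus space).
    uniform = {c: 1 / 27 for c in "abcdefghijklmnopqrstuvwxyz "}
    print(entropy(uniform))   # = log2(27), about 4.75 bits
    print(math.log2(27))      # same value, computed directly

    # Plugging real English character frequencies (E about 13%, Q and Z
    # about 0.1%) into entropy() gives roughly 4 bits, the slide's
    # independence-model figure.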

9
Entropy of English
  • Measuring directly from human subjects
  • values close to 1.3 bits
  • varies according to text and genre
  • best estimate is around 1.1-1.2 bits
  • State-of-the-art compression of English text
    achieves
  • 1.5 bits per character

10
Reading
  • Sections 2.2.5 and 2.2.8 on relative entropy and
    perplexity
  • Section 2.2.7 on the entropy of English