Title: Announcement
1. Announcement
- Homework 1 is due on Tuesday, 01/27/2003, not
01/26/2003
2. Statistical Inference
3. Statistical Inference
Assume that the height of people follows a Gaussian distribution. Question: based on the measurements of 100 people, what is the Gaussian distribution that best fits the data?
4. Statistical Inference
- Problem
- Likelihood function
- Approach: maximum likelihood estimation (MLE), i.e., maximize the log-likelihood
5. Example I: Flip Coins
6. Example I: Flip Coins (cont'd)
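A minimal numerical sketch of the coin-flip MLE, assuming a Bernoulli model and hypothetical counts (h heads in n tosses); the closed-form MLE h/n is shown for comparison:

```python
import numpy as np
from scipy.optimize import minimize_scalar

h, n = 62, 100  # hypothetical data: 62 heads in 100 tosses

def neg_log_likelihood(p):
    # log L(p) = h*log(p) + (n - h)*log(1 - p)
    return -(h * np.log(p) + (n - h) * np.log(1 - p))

# Maximize the log-likelihood numerically over p in (0, 1).
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"numerical MLE: {res.x:.4f}")    # ~0.62
print(f"closed form h/n: {h / n:.4f}")  # analytic MLE = relative frequency of heads
```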
7. Example II: Normal Distribution
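Similarly for the normal-distribution case (the height example above); the 100 measurements are not given here, so simulated data stand in for them. The MLE is the sample mean and the (1/n) sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(loc=170.0, scale=8.0, size=100)  # simulated stand-in data (cm)

# Gaussian MLE: mu_hat = sample mean, sigma2_hat = (1/n) * sum (x_i - mu_hat)^2
mu_hat = heights.mean()
sigma2_hat = ((heights - mu_hat) ** 2).mean()  # note the 1/n (not 1/(n-1)) factor

print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {np.sqrt(sigma2_hat):.2f}")
```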
8. Information Theory
9. Outline
- Information
- Entropy
- Mutual information
- Noisy channel model
10. Information
- Information ≠ knowledge
- Information = reduction in uncertainty
- Example
- (1) flip a coin
- (2) roll a die
- The outcome of (2) is more uncertain than the outcome of (1)
- Therefore, more information is provided by the outcome of (2) than of (1)
11. Definition of Information
- Let E be some event that occurs with probability P(E). If we are told that E has occurred, then we say we have received I(E) = log2(1/P(E)) bits of information
- Example
- Result of a fair coin flip: log2 2 = 1 bit
- Result of a fair die roll: log2 6 ≈ 2.585 bits
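These numbers follow directly from the definition; a minimal sketch:

```python
import math

def information(p):
    """I(E) = log2(1 / P(E)), in bits."""
    return math.log2(1.0 / p)

print(information(1 / 2))  # fair coin flip: 1.0 bit
print(information(1 / 6))  # fair die roll: ~2.585 bits
```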
12. Information is Additive
- I(k fair coin tosses) = log2(2^k) = k bits
- Example: information conveyed by words
- Random word from a 100,000-word vocabulary
- I(word) = log2(100,000) ≈ 16.6 bits
- A 1000-word document from the same source
- I(document) ≈ 16,600 bits
- A 480x640-pixel, 16-greyscale video picture
- I(picture) = 307,200 × log2(16) = 1,228,800 bits
- ⇒ A picture is worth a 1000 words!
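The arithmetic on this slide can be reproduced directly; a short sketch:

```python
import math

# k independent fair coin tosses carry log2(2^k) = k bits.
k = 10
print(math.log2(2 ** k))            # 10.0

# One random word from a 100,000-word vocabulary.
i_word = math.log2(100_000)
print(round(i_word, 1))             # 16.6 bits

# A 1000-word document from the same source (words assumed independent).
print(round(1000 * i_word))         # ~16,610 (the slide rounds 16.6 x 1000 to 16,600)

# A 480 x 640 picture with 16 grey levels per pixel.
print(480 * 640 * math.log2(16))    # 1,228,800.0 bits
```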
13. Outline
- Information
- Entropy
- Mutual Information
- Cross Entropy and Learning
14. Entropy
- A zero-memory information source S is a source that emits symbols from an alphabet {s1, s2, ..., sk} with probabilities p1, p2, ..., pk, respectively, where the symbols emitted are statistically independent.
- What is the average amount of information in observing the output of the source S?
- Call this entropy: H(S) = Σi pi I(si) = Σi pi log2(1/pi)
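A minimal sketch of the entropy computation, following the definition above:

```python
import math

def entropy(probs):
    """H(S) = sum_i p_i * log2(1 / p_i), in bits; p_i = 0 terms contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin source: 1.0 bit/symbol
print(entropy([1 / 6] * 6))   # fair die source: ~2.585 bits/symbol
```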
15. Explanation of Entropy
- Average amount of information provided per symbol
- Average amount of surprise when observing a symbol
- Uncertainty an observer has before seeing the symbol
- Average number of bits needed to communicate each symbol
16. Properties of Entropy
- Non-negative: H(P) ≥ 0
- For any other probability distribution {q1, ..., qk}: -Σi pi log2 qi ≥ H(P) (Gibbs' inequality)
- H(P) ≤ log2 k, with equality iff pi = 1/k for all i
- The further P is from uniform, the lower the entropy.
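A quick numerical check of these properties (the entropy function is repeated so the snippet stands alone):

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

k = 4
uniform = [1 / k] * k
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform), math.log2(k))   # 2.0 and 2.0: equality at the uniform distribution
print(entropy(skewed))                  # ~1.357 < log2(4): lower when further from uniform

# Gibbs' inequality: -sum_i p_i*log2(q_i) >= H(P)
cross = -sum(p * math.log2(q) for p, q in zip(skewed, uniform))
print(cross, cross >= entropy(skewed))  # 2.0 >= 1.357: True
```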
17. Entropy: k = 2
- Notice
- zero information at the edges (p = 0 or p = 1)
- maximum information at p = 0.5 (1 bit)
- drops off more quickly near the edges than in the middle
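The curve this slide plots is the binary entropy function H(p); a sketch of a few values shows the behaviour described in the bullets:

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1 - p)*log2(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.01, 0.1, 0.3, 0.5):
    print(f"H({p}) = {binary_entropy(p):.3f}")
# Output: 0.000, 0.081, 0.469, 0.881, 1.000 -- steep near the edges, flat near 0.5
```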
18. The Entropy of English
- 27 characters (A-Z, space)
- 100,000 words (6.5 characters each on average)
- Assuming independence between successive characters:
- Uniform character distribution: log2 27 ≈ 4.75 bits/character
- True character distribution: 4.03 bits/character
- Assuming independence between successive words:
- Uniform word distribution: log2(100,000)/6.5 ≈ 2.55 bits/character
- True word distribution: 9.45/6.5 ≈ 1.45 bits/character
- The true entropy of English is much lower!
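The uniform-case figures can be reproduced directly; the 4.03 bits/character and 9.45 bits/word values are empirical and taken as given. A sketch:

```python
import math

avg_word_len = 6.5  # average characters per word, as stated above

# Uniform distribution over 27 characters (A-Z plus space)
print(math.log2(27))                       # ~4.75 bits/character

# Uniform distribution over 100,000 words, spread over 6.5 characters each
print(math.log2(100_000) / avg_word_len)   # ~2.55 bits/character

# Empirical word entropy of ~9.45 bits/word (value from the slide)
print(9.45 / avg_word_len)                 # ~1.45 bits/character
```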
19. Entropy of Two Sources
Temperature T: P(T = hot) = 0.3, P(T = mild) = 0.5, P(T = cold) = 0.2
⇒ H(T) = H(0.3, 0.5, 0.2) = 1.485 bits
Humidity M: P(M = low) = 0.6, P(M = high) = 0.4
⇒ H(M) = H(0.6, 0.4) = 0.971 bits
- The random variables T and M are not independent:
- P(T = t, M = m) ≠ P(T = t) P(M = m)
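Both entropies follow from the marginal distributions given above; a sketch:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

p_T = {"hot": 0.3, "mild": 0.5, "cold": 0.2}   # temperature
p_M = {"low": 0.6, "high": 0.4}                # humidity

print(round(entropy(p_T.values()), 3))   # H(T) = 1.485
print(round(entropy(p_M.values()), 3))   # H(M) = 0.971
```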
20. Joint Entropy
- H(T) = 1.485
- H(M) = 0.971
- H(T) + H(M) = 2.456
- Joint Entropy
- H(T, M) = H(0.1, 0.4, 0.1, 0.2, 0.1, 0.1) = 2.321
- H(T, M) ≤ H(T) + H(M)
Joint Probability P(T, M)
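A sketch of the joint-entropy computation. The joint table below is not printed in the text; it is a reconstruction chosen to be consistent with the marginals P(T) = (0.3, 0.5, 0.2), P(M) = (0.6, 0.4) and the stated value H(T, M) ≈ 2.321:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Reconstructed joint distribution P(T, M) (first index: temperature, second: humidity)
p_TM = {
    ("hot", "low"): 0.1,  ("hot", "high"): 0.2,
    ("mild", "low"): 0.4, ("mild", "high"): 0.1,
    ("cold", "low"): 0.1, ("cold", "high"): 0.1,
}

h_joint = entropy(p_TM.values())
print(round(h_joint, 3))            # H(T, M) = 2.322
print(h_joint <= 1.485 + 0.971)     # H(T, M) <= H(T) + H(M): True
```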
21. Conditional Entropy
- Conditional Entropy
- H(T | M = low) = 1.252
- H(T | M = high) = 1.5
- Average conditional entropy: H(T | M) = 0.6 × 1.252 + 0.4 × 1.5 = 1.351
- How much is M telling us on average about T?
- H(T) - H(T | M) = 1.485 - 1.351 = 0.134 bits
Conditional Probability P(T | M)
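A sketch continuing the reconstructed joint table from the previous snippet; it reproduces the conditional entropies and the 0.134-bit difference:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Same reconstructed joint distribution P(T, M) as in the previous sketch.
p_TM = {
    ("hot", "low"): 0.1,  ("hot", "high"): 0.2,
    ("mild", "low"): 0.4, ("mild", "high"): 0.1,
    ("cold", "low"): 0.1, ("cold", "high"): 0.1,
}
p_M = {"low": 0.6, "high": 0.4}

def cond_entropy_given(m):
    """H(T | M = m): entropy of the column for M = m, renormalized by P(M = m)."""
    return entropy(p / p_M[m] for (_, mm), p in p_TM.items() if mm == m)

print(round(cond_entropy_given("low"), 3))    # H(T | M = low)  = 1.252
print(round(cond_entropy_given("high"), 3))   # H(T | M = high) = 1.5

h_T_given_M = sum(p_M[m] * cond_entropy_given(m) for m in p_M)
h_T = entropy([0.3, 0.5, 0.2])
print(round(h_T_given_M, 3))         # average conditional entropy = 1.351
print(round(h_T - h_T_given_M, 3))   # H(T) - H(T | M) = 0.134 bits
```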
22. Mutual Information
- I(X; Y) = H(X) - H(X | Y) = H(Y) - H(Y | X)
- Properties
- Indicates the amount of information one random variable provides about another
- Symmetric: I(X; Y) = I(Y; X)
- Non-negative: I(X; Y) ≥ 0
- Zero iff X and Y are independent
23. Review
Diagram relating H(X, Y), H(X), H(Y), H(X | Y), H(Y | X), and I(X; Y)
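These relationships can be checked numerically on the temperature/humidity example, using the values computed above; a sketch:

```python
# Values for the temperature (T) / humidity (M) example, in bits.
h_T, h_M = 1.485, 0.971
h_TM = 2.322          # joint entropy H(T, M)
h_T_given_M = 1.351   # conditional entropy H(T | M)

# Chain rule: H(T, M) = H(M) + H(T | M)
print(round(h_M + h_T_given_M, 3))        # 2.322

# Mutual information two ways: I(T; M) = H(T) - H(T | M) = H(T) + H(M) - H(T, M)
print(round(h_T - h_T_given_M, 3))        # 0.134
print(round(h_T + h_M - h_TM, 3))         # 0.134
```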
24. A Distance Measure Between Distributions
- Kullback-Leibler distance: K(PD, PM) = Σx PD(x) log2( PD(x) / PM(x) )
- Properties of the Kullback-Leibler distance
- Non-negative: K(PD, PM) ≥ 0, with K(PD, PM) = 0 iff PD = PM
- Minimizing the KL distance ⇒ PM gets close to PD
- Non-symmetric: K(PD, PM) ≠ K(PM, PD)
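A minimal sketch of the KL distance with made-up distributions, illustrating non-negativity and asymmetry:

```python
import math

def kl(p, q):
    """K(p, q) = sum_x p(x) * log2(p(x) / q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_data  = [0.3, 0.5, 0.2]        # stand-in for the data distribution P_D
p_model = [1 / 3, 1 / 3, 1 / 3]  # stand-in for the model distribution P_M

print(round(kl(p_data, p_model), 4))   # ~0.0995, non-negative
print(round(kl(p_model, p_data), 4))   # ~0.1013, a different value (non-symmetric)
print(kl(p_data, p_data))              # 0.0 when the two distributions coincide
```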
25. The Noisy Channel
- Prototypical case
- Input → The channel (adds noise) → Output (noisy)
- 0,1,1,1,0,1,0,1,... → 0,1,1,0,0,1,1,0,...
- Model: probability of error (noise)
- Example: p(0|1) = 0.3, p(1|1) = 0.7, p(1|0) = 0.4, p(0|0) = 0.6
- The Task
- Knowing the noisy output, we want to recover the input (decoding)
- Source coding theorem
- Channel coding theorem
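A sketch of the channel model with the error probabilities given above; the uniform prior over input bits is an added assumption, used only to illustrate posterior decoding of a single output bit:

```python
import random

# Channel model from the slide: p(output | input)
p_out_given_in = {
    1: {1: 0.7, 0: 0.3},   # p(1|1) = 0.7, p(0|1) = 0.3
    0: {0: 0.6, 1: 0.4},   # p(0|0) = 0.6, p(1|0) = 0.4
}

def transmit(bit, rng=random):
    """Pass one bit through the noisy channel."""
    return bit if rng.random() < p_out_given_in[bit][bit] else 1 - bit

def posterior(observed):
    """p(input | output) by Bayes' rule, assuming a uniform prior p(input) = 0.5."""
    prior = {0: 0.5, 1: 0.5}
    joint = {b: prior[b] * p_out_given_in[b][observed] for b in (0, 1)}
    z = sum(joint.values())
    return {b: joint[b] / z for b in (0, 1)}

print([transmit(b) for b in (0, 1, 1, 1, 0, 1, 0, 1)])  # a noisy copy of the input
print(posterior(1))   # p(input | output = 1): {0: ~0.36, 1: ~0.64}
```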
26. Noisy Channel Applications
- OCR
- straightforward: text → print (adds noise) → scanned image
- Handwriting recognition
- text → neurons, muscles (add noise) → scanned/digitized image
- Speech recognition (dictation, commands, etc.)
- text → conversion to acoustic signal (adds noise) → acoustic waves
- Machine Translation
- text in target language → translation (adds noise) → text in source language