Title: 2. Mathematical Foundations
2. Mathematical Foundations
Foundations of Statistical Natural Language Processing
Contents: Part 1
- 1. Elementary Probability Theory
- Conditional probability
- Bayes' theorem
- Random variables
- Joint and conditional distributions
- Standard distributions
Conditional probability (1/2)
- P(A): the probability of the event A
- Ex 1> A coin is tossed 3 times.
- Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
- A = {HHT, HTH, THH} (exactly 2 heads), P(A) = 3/8
- B = {HHH, HHT, HTH, HTT} (first toss is a head), P(B) = 1/2
- Conditional probability: P(A|B) = P(A ∩ B) / P(B)
- Here A ∩ B = {HHT, HTH}, so P(A|B) = (2/8) / (1/2) = 1/2
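A quick enumeration check of Ex 1; a minimal Python sketch, with variable names of my own choosing:

```python
from itertools import product
from fractions import Fraction

# Sample space for three tosses of a fair coin (uniform probability).
omega = list(product("HT", repeat=3))
P = lambda event: Fraction(len(event), len(omega))

A = [w for w in omega if w.count("H") == 2]   # exactly two heads
B = [w for w in omega if w[0] == "H"]         # first toss is a head
A_and_B = [w for w in A if w in B]

# P(A|B) = P(A ∩ B) / P(B)
print(P(A), P(B), P(A_and_B) / P(B))          # 3/8 1/2 1/2
```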
Conditional probability (2/2)
- Multiplication rule: P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)
- Chain rule: P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) ... P(An | A1 ∩ ... ∩ An−1)
- Two events A, B are independent if P(A ∩ B) = P(A) P(B)
- If P(B) > 0, this is equivalent to P(A|B) = P(A)
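A small sketch checking independence by enumeration, using two fair dice as an assumed example (the events here are not from the slide):

```python
from itertools import product
from fractions import Fraction

# Sample space: ordered outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))
P = lambda event: Fraction(len(event), len(omega))

A = [w for w in omega if w[0] == 6]           # first die shows a six
B = [w for w in omega if sum(w) % 2 == 0]     # the sum is even
A_and_B = [w for w in A if w in B]

# Independence test: does P(A ∩ B) equal P(A) P(B)?
print(P(A_and_B), P(A) * P(B), P(A_and_B) == P(A) * P(B))   # 1/12 1/12 True
```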
Bayes' theorem (1/2)
- Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
- Generally, if A ⊆ ∪_i Bi and the Bi are disjoint:
  P(A) = Σ_i P(A|Bi) P(Bi)
- Bayes' theorem then reads: P(Bj|A) = P(A|Bj) P(Bj) / Σ_i P(A|Bi) P(Bi)
Bayes' theorem (2/2)
- Ex 2> G: the event of a sentence having a parasitic gap; T: the event of the test being positive
- Even when the test is positive, the probability that the sentence really contains a parasitic gap, P(G|T), remains very small.
- This poor result comes about because the prior probability of a sentence containing a parasitic gap is so low.
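A worked Bayes calculation for Ex 2; a sketch that assumes the figures from Manning & Schütze's parasitic-gap example (P(G) = 0.00001, P(T|G) = 0.95, P(T|¬G) = 0.005):

```python
# Assumed figures from the textbook's parasitic-gap example.
p_G = 0.00001           # prior: a sentence contains a parasitic gap
p_T_given_G = 0.95      # test fires when a gap is present
p_T_given_notG = 0.005  # false-positive rate

# Total probability of a positive test, then Bayes' theorem.
p_T = p_T_given_G * p_G + p_T_given_notG * (1 - p_G)
p_G_given_T = p_T_given_G * p_G / p_T
print(round(p_G_given_T, 4))   # ~0.0019: the posterior is still tiny
```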
Random variables
- A random variable is a function X: Ω → R.
- Ex 3> Random variable X for the sum of two dice, with values in S = {2, 3, ..., 12}
- Probability mass function (pmf): p(x) = P(X = x), with Σ_x p(x) = 1
- If X: Ω → {0, 1}, then X is called an indicator random variable, or a Bernoulli trial.
- Expectation: E(X) = Σ_x x p(x); for the sum of two dice, E(X) = 7
- Variance: Var(X) = E((X − E(X))²) = E(X²) − E²(X)
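A sketch computing the pmf, expectation, and variance of the two-dice sum in Ex 3 by enumeration:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# pmf of X = sum of two fair dice, built by enumerating the 36 outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(n, 36) for x, n in counts.items()}

E = sum(x * p for x, p in pmf.items())               # expectation E(X)
Var = sum(x * x * p for x, p in pmf.items()) - E**2  # E(X^2) - E(X)^2
print(E, Var)                                        # 7 35/6
```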
Joint and conditional distributions
- The joint pmf for two discrete random variables X, Y:
  p(x, y) = P(X = x, Y = y)
- Marginal pmfs, which total up the probability mass for the values of each variable separately:
  p_X(x) = Σ_y p(x, y),  p_Y(y) = Σ_x p(x, y)
- Conditional pmf:
  p_{X|Y}(x|y) = p(x, y) / p_Y(y), for y such that p_Y(y) > 0
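A minimal sketch of marginalization and conditioning on a small made-up joint pmf (the table below is illustrative only, not from the slide):

```python
from fractions import Fraction as F

# A small made-up joint pmf p(x, y).
joint = {("x1", "y1"): F(1, 4), ("x1", "y2"): F(1, 4),
         ("x2", "y1"): F(1, 2), ("x2", "y2"): F(0)}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

p_X = {x: sum(joint[x, y] for y in ys) for x in xs}   # marginal over y
p_Y = {y: sum(joint[x, y] for x in xs) for y in ys}   # marginal over x
# Conditional pmf p(x|y), defined only for y with p_Y(y) > 0.
p_X_given_Y = {(x, y): joint[x, y] / p_Y[y]
               for x in xs for y in ys if p_Y[y] > 0}
print(p_X, p_Y, p_X_given_Y)
```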
Standard distributions (1/3)
- Discrete distributions: the binomial distribution
- Arises when one has a series of trials with only two outcomes, each trial being independent of all the others.
- The number r of successes out of n trials, given that the probability of success in any trial is p:
  b(r; n, p) = C(n, r) p^r (1 − p)^(n−r), where C(n, r) = n! / ((n − r)! r!)
- Expectation: np; variance: np(1 − p)
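A sketch of the binomial pmf with a numeric check that its mean and variance come out as np and np(1 − p); the parameters n = 10, p = 0.7 are just an example choice:

```python
from math import comb

def binom_pmf(r, n, p):
    """b(r; n, p) = C(n, r) p^r (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 10, 0.7                        # example parameters
mean = sum(r * binom_pmf(r, n, p) for r in range(n + 1))
var = sum(r * r * binom_pmf(r, n, p) for r in range(n + 1)) - mean**2
print(round(mean, 6), round(var, 6))  # 7.0 2.1, i.e. np and np(1 - p)
```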
Standard distributions (2/3)
- Discrete distributions: the binomial distribution
- (Figure: plot of the binomial pmf b(r; n, p))
Standard distributions (3/3)
- Continuous distributions: the normal distribution
- For mean μ and standard deviation σ, the probability density function (pdf) is
  n(x; μ, σ) = exp(−(x − μ)² / (2σ²)) / (√(2π) σ)
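A sketch of the normal pdf as written above; the evaluation point and parameters are arbitrary:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """n(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

print(round(normal_pdf(0.0, 0.0, 1.0), 4))   # ~0.3989, the standard normal at its mean
```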
Contents: Part 2
- 2. Essential Information Theory
- Entropy
- Joint entropy and conditional entropy
- Mutual information
- The noisy channel model
- Relative entropy or Kullback-Leibler divergence
Shannon's Information Theory
- The goal is to maximize the amount of information that one can transmit over an imperfect communication channel such as a noisy phone line.
- Theoretical maximum for data compression: the entropy H
- Theoretical maximum for the transmission rate: the channel capacity
Entropy (1/4)
- The entropy H (or self-information) is the average uncertainty of a single random variable X.
- Entropy is a measure of uncertainty: the more we know about something, the lower the entropy will be.
- We can use entropy as a measure of the quality of our models.
- Entropy measures the amount of information in a random variable (measured in bits).
- H(X) = −Σ_x p(x) log₂ p(x), where p(x) is the pmf of X
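A minimal entropy function matching the definition above, evaluated on a fair and a weighted coin (this anticipates the figure on the next slide):

```python
from math import log2

def entropy(pmf):
    """H(X) = -sum_x p(x) log2 p(x), in bits; zero-probability terms contribute 0."""
    return -sum(p * log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit for a fair coin
print(entropy([0.9, 0.1]))   # ~0.47 bits for a heavily weighted coin
```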
Entropy (2/4)
- (Figure) The entropy of a weighted coin. The horizontal axis shows the probability of the weighted coin coming up heads; the vertical axis shows the entropy of tossing the corresponding coin once.
Entropy (3/4)
- Ex 7> The result of rolling an 8-sided die (uniform distribution):
  H(X) = −Σ_{i=1}^{8} (1/8) log₂(1/8) = 3 bits
- Entropy is the average length of the message needed to transmit an outcome of that variable.
- In terms of the expectation E: H(X) = E(log₂(1/p(X)))
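The same calculation for the 8-sided die of Ex 7, as a two-line check:

```python
from math import log2

# Uniform 8-sided die: each of the 8 outcomes has probability 1/8.
H = -sum((1 / 8) * log2(1 / 8) for _ in range(8))
print(H)   # 3.0 bits
```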
Entropy (4/4)
- Ex 8> Simplified Polynesian: six letters with per-letter probabilities P(t) = P(a) = 1/4 and P(p) = P(k) = P(i) = P(u) = 1/8
- H(P) = −Σ_i P(i) log₂ P(i) = 2.5 bits
- We can design a code that on average takes 2.5 bits to transmit a letter.
- Entropy can be interpreted as a measure of the size of the search space consisting of the possible values of a random variable.
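A sketch of the per-letter entropy calculation, assuming the letter probabilities of the textbook's simplified Polynesian example:

```python
from math import log2

# Per-letter probabilities, assumed from Manning & Schütze's simplified Polynesian example.
pmf = {"p": 1/8, "t": 1/4, "k": 1/8, "a": 1/4, "i": 1/8, "u": 1/8}
H = -sum(p * log2(p) for p in pmf.values())
print(H)   # 2.5 bits per letter
```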
Joint entropy and conditional entropy (1/3)
- The joint entropy of a pair of discrete random variables X, Y ~ p(x, y):
  H(X, Y) = −Σ_x Σ_y p(x, y) log₂ p(x, y)
- The conditional entropy:
  H(Y|X) = Σ_x p(x) H(Y|X = x) = −Σ_x Σ_y p(x, y) log₂ p(y|x)
- The chain rule for entropy:
  H(X, Y) = H(X) + H(Y|X)
  H(X1, ..., Xn) = H(X1) + H(X2|X1) + ... + H(Xn|X1, ..., Xn−1)
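A sketch verifying the chain rule H(X, Y) = H(X) + H(Y|X) numerically on a small made-up joint pmf:

```python
from math import log2

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A small made-up joint pmf p(x, y), used only to check the chain rule.
joint = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.4, ("b", 1): 0.1}
p_X = {"a": 0.5, "b": 0.5}   # marginal of X for the table above

H_XY = H(joint.values())
# H(Y|X) = sum_x p(x) H(Y | X = x)
H_Y_given_X = sum(px * H([joint[x, y] / px for y in (0, 1)]) for x, px in p_X.items())
print(round(H_XY, 4), round(H(p_X.values()) + H_Y_given_X, 4))   # equal: 1.861 1.861
```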
Joint entropy and conditional entropy (2/3)
- Ex 9> Simplified Polynesian revisited
- All words consist of sequences of CV (consonant-vowel) syllables.
- (Tables: the joint distribution P(C, V) with its marginal probabilities on a per-syllable basis, and the corresponding probabilities on a per-letter basis.)
Joint entropy and conditional entropy (3/3)
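A sketch of the Ex 9 computation, assuming the joint per-syllable distribution from Manning & Schütze's worked example:

```python
from math import log2

# Joint per-syllable distribution P(C, V), assumed from the textbook's example.
joint = {("p", "a"): 1/16, ("p", "i"): 1/16, ("p", "u"): 0,
         ("t", "a"): 3/8,  ("t", "i"): 3/16, ("t", "u"): 3/16,
         ("k", "a"): 1/16, ("k", "i"): 0,    ("k", "u"): 1/16}

H = lambda probs: -sum(p * log2(p) for p in probs if p > 0)

p_C = {c: sum(p for (c2, _), p in joint.items() if c2 == c) for c in "ptk"}
H_C = H(p_C.values())
H_V_given_C = sum(pc * H([joint[c, v] / pc for v in "aiu"]) for c, pc in p_C.items())
print(round(H_C, 3), round(H_V_given_C, 3), round(H_C + H_V_given_C, 3))
# ~1.061 1.375 2.436 bits: H(C), H(V|C), and H(C, V) by the chain rule
```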
Mutual information (1/2)
- By the chain rule for entropy: H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
- Therefore H(X) − H(X|Y) = H(Y) − H(Y|X) = I(X; Y), the mutual information
- Mutual information between X and Y: the amount of information one random variable contains about another (symmetric, non-negative).
- It is 0 only when the two variables are independent.
- It grows not only with the degree of dependence, but also according to the entropy of the variables.
- It is actually better to think of it as a measure of independence.
Mutual information (2/2)
- I(X; Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X, Y) = Σ_{x,y} p(x, y) log₂ ( p(x, y) / (p(x) p(y)) )
- Since H(X|X) = 0, I(X; X) = H(X); this is why entropy is also called self-information.
- Conditional MI: I(X; Y | Z) = H(X|Z) − H(X|Y, Z)
- Chain rule for MI: I(X1, ..., Xn; Y) = Σ_i I(Xi; Y | X1, ..., Xi−1)
- Pointwise MI: I(x, y) = log₂ ( p(x, y) / (p(x) p(y)) )
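A sketch computing I(X; Y) directly from a joint pmf, using two made-up distributions, one perfectly dependent and one independent:

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

# Perfectly dependent pair: I(X;Y) = H(X) = 1 bit.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))   # 1.0
# Independent pair: I(X;Y) = 0.
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                          (1, 0): 0.25, (1, 1): 0.25}))  # 0.0
```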
Noisy channel model
- Channel capacity: the optimal rate at which one can transmit information through the channel
  C = max_{p(X)} I(X; Y)
- Binary symmetric channel (each input bit is flipped with probability p):
  C = 1 − H(p)
- Since entropy is non-negative, C ≤ 1.
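A sketch of the binary symmetric channel capacity C = 1 − H(p):

```python
from math import log2

def binary_entropy(p):
    """H(p) for a Bernoulli(p) variable, in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def bsc_capacity(p):
    """Capacity C = 1 - H(p) of a binary symmetric channel with flip probability p."""
    return 1 - binary_entropy(p)

for p in (0.0, 0.1, 0.5):
    print(p, round(bsc_capacity(p), 4))   # 1.0 for a noiseless channel, 0.0 at p = 0.5
```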
Relative entropy or Kullback-Leibler divergence
- Relative entropy for two pmfs p(x), q(x):
  D(p || q) = Σ_x p(x) log₂ ( p(x) / q(x) )
- A measure of how close two pmfs are.
- Non-negative, and D(p || q) = 0 iff p = q
- Mutual information as a relative entropy: I(X; Y) = D( p(x, y) || p(x) p(y) )
- Conditional relative entropy: D( p(y|x) || q(y|x) ) = Σ_x p(x) Σ_y p(y|x) log₂ ( p(y|x) / q(y|x) )
- Chain rule: D( p(x, y) || q(x, y) ) = D( p(x) || q(x) ) + D( p(y|x) || q(y|x) )
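A sketch of the KL divergence on two small made-up pmfs:

```python
from math import log2

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log2(p(x) / q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]          # a made-up pmf
q = [1/3, 1/3, 1/3]            # the uniform pmf on three outcomes
print(round(kl_divergence(p, q), 4))   # ~0.085: p and q differ
print(kl_divergence(p, p))             # 0.0: D(p || p) = 0
```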