1
2. Mathematical Foundations
Foundations of Statistical Natural Language
Processing
  • 2001. 7. 10.

2
Contents Part 1
  • 1. Elementary Probability Theory
  • Conditional probability
  • Bayes theorem
  • Random variable
  • Joint and conditional distributions
  • Standard distributions

3
Conditional probability (1/2)
  • P(A): the probability of the event A
  • Ex1> A coin is tossed 3 times. Sample space
    Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  • A = {HHT, HTH, THH} (exactly 2 heads), P(A) = 3/8
  • B = {HHH, HHT, HTH, HTT} (first toss is a head), P(B) = 1/2
  • Conditional probability (see the formula below)
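The standard definition that the slide's missing formula expresses, in LaTeX, together with the value it gives for the example above:

  P(A \mid B) = \frac{P(A \cap B)}{P(B)}

  P(A \mid B) = \frac{2/8}{1/2} = \frac{1}{2}, \quad \text{since } A \cap B = \{HHT, HTH\}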

4
Conditional probability (2/2)
  • Multiplication rule (see the formulas below)
  • Chain rule (see the formulas below)
  • Two events A, B are independent
    if P(A ∩ B) = P(A)P(B)

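In LaTeX, the standard forms of the two rules named above (reconstructed; the slide's own formulas were not captured):

  \text{Multiplication rule: } P(A \cap B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)

  \text{Chain rule: } P(A_1 \cap \cdots \cap A_n)
    = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)\cdots P\Big(A_n \,\Big|\, \textstyle\bigcap_{i=1}^{n-1} A_i\Big)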
5
Bayes theorem (1/2)
  • Generally, if A ⊆ ∪i Bi and the Bi
    are disjoint, then P(A) = Σi P(A | Bi) P(Bi)
  • Bayes theorem (see the formula below)
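A LaTeX reconstruction of Bayes theorem as it is usually stated (the slide's formula image did not survive):

  P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}

  \text{and, with the disjoint } B_i \text{ above: } \quad
  P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_i P(A \mid B_i)\,P(B_i)}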

6
Bayes theorem (2/2)
  • Ex2> G: the event of the sentence having a
    parasitic gap
  • T: the event of the test being positive
  • This poor result comes about because the prior
    probability of a sentence containing a parasitic
    gap is so low.
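A minimal Python sketch of the kind of calculation behind this remark. The numeric values below are the ones the textbook's version of this example uses; they are recalled here as assumptions rather than read off the slide:

  # Posterior P(G | T): probability of a parasitic gap given a positive test,
  # via Bayes theorem. Parameter values are assumed (textbook-style), not from the slide.
  p_g = 0.00001            # prior P(G): 1 sentence in 100,000 has a parasitic gap
  p_t_given_g = 0.95       # P(T | G): test fires when a gap is present
  p_t_given_not_g = 0.005  # P(T | not G): false-positive rate

  p_t = p_t_given_g * p_g + p_t_given_not_g * (1 - p_g)  # total probability of T
  p_g_given_t = p_t_given_g * p_g / p_t                  # Bayes theorem

  print(f"P(G | T) = {p_g_given_t:.4f}")  # about 0.002: the tiny prior dominates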

7
Random variable
  • Ex3> Random variable X for the sum of two dice.

Expectation E(X), variance Var(X)
Sample space S = {2, ..., 12}
Probability mass function (pmf): p(x) = P(X = x), X ~ p(x)
If X: Ω → {0, 1}, then X is called an
indicator random variable or a Bernoulli trial
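The standard definitions behind the labels above, in LaTeX, with the values they give for two fair, independent dice:

  E(X) = \sum_x x\,p(x), \qquad
  \mathrm{Var}(X) = E\big((X - E(X))^2\big) = E(X^2) - E(X)^2

  \text{For the two-dice sum: } E(X) = 7, \quad \mathrm{Var}(X) = \tfrac{35}{6} \approx 5.83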
8
Joint and conditional distributions
  • The joint pmf for two discrete random variables
    X, Y
  • Marginal pmfs, which total up the probability
    mass for the values of each variable separately.
  • Conditional pmf, defined for y such that
    pY(y) > 0 (see the formulas below)
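In LaTeX, the standard forms of the three pmfs listed above (reconstructed):

  p_{X,Y}(x, y) = P(X = x,\, Y = y)

  p_X(x) = \sum_y p_{X,Y}(x, y), \qquad p_Y(y) = \sum_x p_{X,Y}(x, y)

  p_{X \mid Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)} \quad \text{for } y \text{ such that } p_Y(y) > 0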
9
Standard distributions (1/3)
  • Discrete distributions: the binomial distribution
  • Used when one has a series of trials with only
    two outcomes, each trial being independent of
    all the others
  • The number r of successes out of n trials given
    that the probability of success in any trial is
    p.
  • Expectation: np, variance: np(1 - p)

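The binomial pmf this slide presents, in its standard LaTeX form:

  b(r;\, n, p) = \binom{n}{r} p^r (1 - p)^{n - r}, \qquad
  \binom{n}{r} = \frac{n!}{r!\,(n - r)!}, \qquad 0 \le r \le n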
10
Standard distributions (2/3)
  • Discrete distributions: the binomial distribution

11
Standard distributions (3/3)
  • Continuous distributions: the normal distribution
  • For the mean μ and the standard deviation σ

Probability density function (pdf)
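In standard LaTeX form, the normal pdf the slide refers to:

  n(x;\, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}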
12
Contents Part 2
  • 2. Essential Information Theory
  • Entropy
  • Joint entropy and conditional entropy
  • Mutual information
  • The noisy channel model
  • Relative entropy or Kullback-Leibler divergence

13
Shannon's Information Theory
  • Maximizing the amount of information that one can
    transmit over an imperfect communication channel
    such as a noisy phone line.
  • Theoretical maximum for data compression:
    entropy H
  • Theoretical maximum for the transmission rate:
    channel capacity

14
Entropy (1/4)
  • The entropy H (or self-information) is the
    average uncertainty of a single random variable
    X.
  • Entropy is a measure of uncertainty.
  • The more we know about something, the lower the
    entropy will be.
  • We can use entropy as a measure of the quality of
    our models.
  • Entropy measures the amount of information in a
    random variable (measured in bits).
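The entropy formula itself, in its standard LaTeX form (reconstructed):

  H(X) = -\sum_x p(x) \log_2 p(x) = E\!\left[\log_2 \frac{1}{p(X)}\right]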

where p(x) is the pmf of X
15
Entropy (2/4)
  • The entropy of a weighted coin. The horizontal
    axis shows the probability that the coin comes
    up heads; the vertical axis shows the entropy
    of tossing the corresponding coin once.
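The curve described above is the binary entropy function, in standard LaTeX form:

  H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

It is 0 at p = 0 and p = 1, and reaches its maximum of 1 bit at p = 1/2.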

16
Entropy (3/4)
  • Ex7> The result of rolling an 8-sided die
    (uniform distribution)
  • Entropy: the average length of the message
    needed to transmit an outcome of that variable
  • For the expectation E (see the identity below)
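In LaTeX, the computation for the uniform 8-sided die and the expectation identity referred to above (standard forms):

  H(X) = -\sum_{i=1}^{8} \frac{1}{8} \log_2 \frac{1}{8} = 3 \text{ bits}

  H(X) = E\!\left[\log_2 \frac{1}{p(X)}\right]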

17
Entropy (4/4)
  • Ex8> Simplified Polynesian
  • We can design a code that, on average, takes
    H(X) bits to transmit a letter (computed below)
  • Entropy can be interpreted as a measure of the
    size of the search space consisting of the
    possible values of a random variable.

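A short Python check of the entropy computation. The six letter probabilities are the ones the textbook uses for Simplified Polynesian; they are assumed here, since the slide's table was not captured:

  import math

  # Assumed Simplified Polynesian letter distribution (textbook values):
  probs = {"p": 1/8, "t": 1/4, "k": 1/8, "a": 1/4, "i": 1/8, "u": 1/8}

  # H(X) = -sum_x p(x) log2 p(x)
  entropy = -sum(p * math.log2(p) for p in probs.values())
  print(f"H(X) = {entropy} bits per letter")  # 2.5 bits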
18
Joint entropy and conditional entropy (1/3)
  • The joint entropy of a pair of discrete random
    variables X, Y ~ p(x, y)
  • The conditional entropy
  • The chain rule for entropy (all three are given below)
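The three quantities above, in their standard LaTeX forms (the slide's formulas were not captured):

  H(X, Y) = -\sum_x \sum_y p(x, y) \log_2 p(x, y)

  H(Y \mid X) = \sum_x p(x)\, H(Y \mid X = x) = -\sum_x \sum_y p(x, y) \log_2 p(y \mid x)

  H(X, Y) = H(X) + H(Y \mid X)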

19
Joint entropy and conditional entropy (2/3)
  • Ex9> Simplified Polynesian revisited
  • All words consist of sequences of CV
    (consonant-vowel) syllables

Marginal probabilities (per-syllable basis)
Per-letter basis probabilities
20
Joint entropy and conditional entropy (3/3)

21
Mutual information (1/2)
  • By the chain rule for entropy, we obtain the
    mutual information (see the formulas below)
  • Mutual information between X and Y
  • The amount of information one random variable
    contains about another. (symmetric, non-negative)
  • It is 0 only when two variables are independent.
  • It grows not only with the degree of dependence,
    but also according to the entropy of the
    variables.
  • It is actually better to think of it as a measure
    of independence.
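In LaTeX, the standard definitions behind the bullets above (reconstructed):

  H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)

  I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
          = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)}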

22
Mutual information (2/2)
  • Since I(X; X) = H(X) - H(X | X) = H(X),
    entropy is also called self-information
  • Conditional MI and a chain rule (see below)
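The standard forms of the conditional mutual information and its chain rule, reconstructed in LaTeX:

  I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)

  I(X_{1..n}; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{1..i-1})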

Pointwise MI: I(x, y) = log2 ( p(x, y) / (p(x) p(y)) )
23
Noisy channel model
  • Channel capacity: the rate at which one can
    optimally transmit information through the
    channel
  • Binary symmetric channel (see below)
  • Since entropy is non-negative, C ≤ 1 bit
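In LaTeX, the standard capacity definition and the binary symmetric channel result pointed to above (reconstructed):

  C = \max_{p(X)} I(X; Y)

  \text{Binary symmetric channel with crossover probability } p: \quad C = 1 - H(p) \le 1 \text{ bit}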

24
Relative entropy or Kullback-Leibler divergence
  • Relative entropy for two pmfs p(x), q(x)
    (defined below)
  • A measure of how close two pmfs are
  • Non-negative, and D(p || q) = 0 if p = q
  • Conditional relative entropy and a chain rule
    (also below)
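The standard LaTeX forms of relative entropy, its conditional version, and the chain rule (reconstructed):

  D(p \parallel q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}

  D\big(p(y \mid x) \,\big\|\, q(y \mid x)\big) = \sum_x p(x) \sum_y p(y \mid x) \log_2 \frac{p(y \mid x)}{q(y \mid x)}

  D\big(p(x, y) \,\big\|\, q(x, y)\big) = D\big(p(x) \,\big\|\, q(x)\big) + D\big(p(y \mid x) \,\big\|\, q(y \mid x)\big)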