1
CPSC 503 Computational Linguistics
  • Intro to probability and information theory
  • Lecture 5
  • Giuseppe Carenini

2
Today 28/1
  • Why do we need probabilities and information
    theory?
  • Basic Probability Theory
  • Basic Information Theory

3
Why do we need probabilities?
  • For spelling errors: what is the most probable
    correct word?
  • For real-word spelling errors, speech and
    handwriting recognition: what is the most
    probable next word?
  • Part-of-speech tagging, word-sense
    disambiguation, probabilistic parsing. Basic
    question: what is the probability of a sequence
    of words (e.g., of a sentence)?

4
Disambiguation Tasks
Example: "I made her duck"
Part-of-speech tagging
  • duck → V / N
  • her → possessive pronoun / dative pronoun

Word sense disambiguation
  • make → create / cook

Syntactic disambiguation
(I (made (her duck))) vs. (I (made (her) (duck)))
  • make → transitive (single direct obj.) /
    ditransitive (two objs.) / cause (direct obj. +
    verb)

5
Why do we need information theory?
  • How much information is contained in a particular
    probabilistic model (PM)?
  • How predictive is a PM?
  • Given two PMs, which one better matches a corpus?

Entropy, Mutual Information, Relative Entropy,
Cross-Entropy, Perplexity
6
Basic Probability/Info Theory
  • An overview (not complete! sometimes imprecise!)
  • Clarify basic concepts you may encounter in NLP
  • Try to address common misunderstandings

7
Experiments and Sample Spaces
  • Uncertain situation → experiment, process, test
  • Set of possible basic outcomes: sample space Ω
  • Coin toss (Ω = {head, tail}), die (Ω = {1..6})
  • Opinion poll (Ω = {yes, no})
  • Quality test (Ω = {bad, good})
  • Lottery (|Ω| ≅ 10^5 – 10^7)
  • # of traffic accidents in Canada in 2005 (Ω = N)
  • Missing word (|Ω| ≅ vocabulary size)

8
Events
  • Event A is a set of basic outcomes
  • A ⊆ Ω, and A ∈ 2^Ω (the event space)
  • Ω is the certain event; Ø is the impossible event
  • Examples
  • Experiment: three coin tosses
  • Ω = {HHH, HHT, HTH, THH, TTH, HTT, THT, TTT}
  • Cases with exactly two tails: A = {TTH, HTT, THT}
  • All heads: A = {HHH}

9
Probability Function/Distribution
  • Intuition: a measure of how likely an event is
  • Formally:
  • P: 2^Ω → [0,1], with P(Ω) = 1
  • If A and B are disjoint events: P(A∪B) = P(A) + P(B)
  • Immediate consequences:
  • P(Ø) = 0;  P(¬A) = 1 - P(A);  A ⊆ B ⇒ P(A) ≤ P(B)
  • Σ_{a∈Ω} P(a) = 1
  • How to estimate P(A)? (see the sketch below)
  • Repeat the experiment n times
  • c times the outcome is in A
  • P(A) ≅ c/n
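
A minimal sketch of this relative-frequency estimate in Python; the function names are my own, and the event is the two-tails example from the previous slide:

```python
import random

def estimate_probability(event, experiment, n=100_000):
    """Estimate P(A) as c/n: run the experiment n times, count outcomes in A."""
    c = sum(1 for _ in range(n) if experiment() in event)
    return c / n

# Experiment: three coin tosses; event A = exactly two tails.
def toss3():
    return "".join(random.choice("HT") for _ in range(3))

A = {"TTH", "HTT", "THT"}
print(estimate_probability(A, toss3))  # close to 3/8 = 0.375
```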

10
Missing Word from Book
11
Joint and Conditional Probability
  • P(A,B) = P(A∩B)
  • P(A|B) = P(A,B) / P(B)

Bayes' Rule
P(A,B) = P(B,A) (since P(A∩B) = P(B∩A))
⇒ P(A|B) P(B) = P(B|A) P(A)
⇒ P(A|B) = P(B|A) P(A) / P(B)
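
A quick numeric check of these definitions and of Bayes' rule; all probability values below are made up for illustration:

```python
# Hypothetical probabilities over two events A and B.
p_A_and_B = 0.12   # P(A ∩ B)
p_B = 0.30         # P(B)
p_A = 0.40         # P(A)

p_A_given_B = p_A_and_B / p_B   # P(A|B) = P(A,B)/P(B)
p_B_given_A = p_A_and_B / p_A   # P(B|A) = P(A,B)/P(A)

# Bayes' rule recovers P(A|B) from P(B|A):
assert abs(p_A_given_B - p_B_given_A * p_A / p_B) < 1e-9
print(p_A_given_B)  # 0.4
```
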
12
Missing Word Independence
13
Independence
  • How does P(A|B) relate to P(A)?

If knowing that B is the case does not change the
probability of A (i.e., P(A|B) = P(A)), then A and B
are independent. Immediate consequence:
P(A,B) = P(A) P(B) (checked numerically below)
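
A small sketch checking this consequence by exhaustive enumeration over two fair dice; the events and helper names are my own. A and B below are independent, while A and C are not:

```python
import math
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two fair dice: 36 equally likely outcomes

def P(event):
    """Exact probability by enumeration over the uniform sample space."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

A = lambda o: o[0] == 6              # first die shows 6
B = lambda o: o[1] % 2 == 0          # second die is even
C = lambda o: o[0] + o[1] >= 11      # sum is at least 11

print(math.isclose(P(lambda o: A(o) and B(o)), P(A) * P(B)))  # True: independent
print(math.isclose(P(lambda o: A(o) and C(o)), P(A) * P(C)))  # False: dependent
```
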
14
Chain Rule
  • The rule:
    P(A,B,C,D,…) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) P(…|A,B,C,D)
  • The proof (a telescoping product):

P(A,B,C,D,…) = P(A) × P(A,B)/P(A)
  × P(A,B,C)/P(A,B) × P(A,B,C,D)/P(A,B,C)
  × P(…,A,B,C,D)/P(A,B,C,D)
= P(A) P(B|A) P(C|A,B) P(D|A,B,C) P(…|A,B,C,D)
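
In NLP this is exactly how a sentence probability decomposes into word-by-word conditionals; a toy sketch with invented values:

```python
# Chain rule on a word sequence:
# P(w1, w2, w3) = P(w1) * P(w2 | w1) * P(w3 | w1, w2)
# The conditional probabilities below are made-up numbers for illustration.
p_w1 = 0.10           # P("I")
p_w2_given_w1 = 0.05  # P("made" | "I")
p_w3_given_12 = 0.02  # P("her" | "I", "made")

p_sequence = p_w1 * p_w2_given_w1 * p_w3_given_12
print(p_sequence)  # 1e-4
```
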
15
Random Variables and pmf
  • Random variables (RVs) X allow us to talk about
    the probabilities of numerical values that are
    related to the event space
  • Examples: die (natural numbering, [1,6]); English
    word length ([1,∞))
  • Probability mass function: p(x) = P(X = x)

16
Example: English word length
[Plot: pmf p(x) of word length x, for x roughly 1–25]
Sampling: how to do it? (see the sketch below)
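
One standard answer to the sampling question is inverse-transform sampling from the pmf; a minimal sketch, with an invented word-length pmf:

```python
import random

# Hypothetical pmf over word lengths (values sum to 1).
pmf = {1: 0.05, 2: 0.15, 3: 0.20, 4: 0.25, 5: 0.20, 6: 0.15}

def sample(pmf):
    """Inverse-transform sampling: walk the CDF until it passes a uniform draw."""
    u, cdf = random.random(), 0.0
    for x, p in pmf.items():
        cdf += p
        if u < cdf:
            return x
    return x  # guard against floating-point round-off

draws = [sample(pmf) for _ in range(10_000)]
print(draws.count(4) / len(draws))  # close to pmf[4] = 0.25
```

Python's built-in random.choices(list(pmf), weights=pmf.values()) does the same job in one call.
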
17
Expectation and Variance
  • The expectation is the (expected) mean or average
    of an RV: E[X] = Σ_x x p(x)
  • Example: rolling one die (E[X] = 3.5)
  • The variance of an RV measures whether its values
    tend to be consistent over samples or to vary a
    lot: Var(X) = E[(X - E[X])²] = E[X²] - E[X]²
  • σ is the standard deviation: σ = √Var(X)
    (worked out in code below)
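
The die example translated directly from E[X] = Σ_x x p(x) and Var(X) = E[X²] - E[X]²:

```python
# Fair six-sided die: uniform pmf.
pmf = {x: 1 / 6 for x in range(1, 7)}

E = sum(x * p for x, p in pmf.items())        # expectation: 3.5
E2 = sum(x * x * p for x, p in pmf.items())   # E[X^2]
var = E2 - E ** 2                             # variance: 35/12 ≈ 2.9167
sigma = var ** 0.5                            # standard deviation ≈ 1.7078

print(E, var, sigma)
```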

18
Joint, Marginal and Conditional RV/Distributions
Joint: p(x,y) = P(X = x, Y = y)
Marginal: p_X(x) = Σ_y p(x,y)
Conditional: p(x|y) = p(x,y) / p_Y(y)
Bayes' rule and the chain rule also apply!
19
Joint Distribution (word length X × word class Y)
[Table of p(x,y): rows X ∈ {1, 2, 3, 4} (word length);
columns Y ∈ {N, V, Adj, Adv} (word class); cell values
not preserved in this transcript]
Note: fictional numbers (a made-up table is used in the
sketch below)
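
Since the table's numbers did not survive, here is the same computation on a made-up joint distribution: marginals by summing out a variable, conditionals by dividing by a marginal.

```python
# Hypothetical joint distribution p(x, y): word length x, word class y.
joint = {
    (1, "N"): 0.05, (1, "V"): 0.10, (1, "Adj"): 0.01, (1, "Adv"): 0.04,
    (2, "N"): 0.10, (2, "V"): 0.10, (2, "Adj"): 0.02, (2, "Adv"): 0.03,
    (3, "N"): 0.15, (3, "V"): 0.05, (3, "Adj"): 0.05, (3, "Adv"): 0.05,
    (4, "N"): 0.10, (4, "V"): 0.05, (4, "Adj"): 0.07, (4, "Adv"): 0.03,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Marginals: sum the joint over the other variable.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (1, 2, 3, 4)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y)
       for y in ("N", "V", "Adj", "Adv")}

# Conditional: p(y | X = 2) = p(2, y) / p_X(2)
p_y_given_x2 = {y: joint[(2, y)] / p_x[2] for y in ("N", "V", "Adj", "Adv")}
print(p_x)           # ≈ {1: 0.20, 2: 0.25, 3: 0.30, 4: 0.25}
print(p_y_given_x2)  # sums to 1.0
```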
20
Conditional Distributions and Independence (word length × word class)
21
Standard Distributions
  • Discrete
  • Binomial
  • Multinomial
  • Continuous
  • Normal

Go back to your Stats textbook
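
For reference, both families are easy to simulate with Python's standard library alone; a quick sampling sketch:

```python
import random

def binomial(n, p):
    """One Binomial(n, p) draw: count successes in n Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))

b = [binomial(10, 0.5) for _ in range(10_000)]       # discrete: Binomial
g = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # continuous: Normal

print(sum(b) / len(b))  # ≈ n*p = 5.0
print(sum(g) / len(g))  # ≈ 0.0
```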
22
Today 28/1
  • Why do we need probabilities and information
    theory?
  • Basic Probability Theory
  • Basic Information Theory

23
Entropy
  • Def. 1: a measure of uncertainty
  • Def. 2: a measure of the information that we need
    to resolve an uncertain situation
  • Def. 3: a measure of the information that we
    obtain from an experiment that resolves an
    uncertain situation
  • Let p(x) = P(X = x), where x ∈ X
  • H(p) = H(X) = - Σ_{x∈X} p(x) log2 p(x)
  • It is normally measured in bits (see the sketch
    below)
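
The definition translates directly into code; a minimal sketch (by convention 0·log 0 = 0, handled here by skipping zero-probability outcomes):

```python
from math import log2

def entropy(pmf):
    """H(p) = -sum_x p(x) log2 p(x), in bits; 0 log 0 treated as 0."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

print(entropy({"H": 0.5, "T": 0.5}))  # 1.0 bit: a fair coin
print(entropy({"H": 0.9, "T": 0.1}))  # ≈ 0.469 bits: less uncertain
print(entropy({"H": 1.0, "T": 0.0}))  # 0.0 bits: no uncertainty
```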

24
Entropy (extra-slides)
  • Using the formula: an example
  • Example: binary outcome
  • The limits
  • (why exactly that formula?)
  • Entropy and expectation
  • Coding interpretation
  • Joint and conditional entropy
  • Summary of key properties

25
Mutual Information
  • Chain rule for entropy: H(X,Y) = H(X) + H(Y|X)
  • By the chain rule for entropy, we have
    H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
  • Therefore: H(X) - H(X|Y) = H(Y) - H(Y|X)
  • This difference is called the mutual information
    between X and Y, written I(X,Y)
  • The reduction in uncertainty of one random
    variable due to knowing about another
  • The amount of information one random variable
    contains about another
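
A sketch computing I(X,Y) on a small invented joint distribution, using the equivalent form I(X,Y) = H(X) + H(Y) - H(X,Y), which follows from the identities above since H(X|Y) = H(X,Y) - H(Y):

```python
from math import log2

def H(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# Made-up joint distribution over (X, Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}  # marginal of X for the joint above
p_y = {0: 0.5, 1: 0.5}  # marginal of Y

I = H(p_x) + H(p_y) - H(joint)
print(I)  # ≈ 0.278 bits: knowing Y reduces uncertainty about X
```
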

26
Relative Entropy or Kullback-Leibler Divergence
  • Def. Relative entropy is a measure of how
    different two probability distributions (over the
    same event space) are
  • D(p||q) = Σ_{x∈X} p(x) log2(p(x)/q(x))
  • The average number of bits wasted by encoding
    events from distribution p with a code based on
    distribution q
  • I(X,Y) = D(p(x,y) || p(x)p(y))
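
The same toy joint distribution from the previous slide confirms the identity I(X,Y) = D(p(x,y) || p(x)p(y)); the kl helper below is my own:

```python
from math import log2

def kl(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)); requires q(x) > 0 wherever p(x) > 0."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
indep = {(x, y): 0.5 * 0.5 for x in (0, 1) for y in (0, 1)}  # p(x) p(y)

print(kl(joint, indep))  # ≈ 0.278 bits: equals I(X,Y) from the previous slide
```
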

27
Next Time
  • Probabilistic models applied to spelling
  • Read Chp. 5 up to page 156