1
6. Statistical Inference: n-gram Models over Sparse Data
Foundations of Statistical Natural Language Processing
  • 2002. 1. 18.

2
Outline (1)
  • Bins: Forming Equivalence Classes
  • Reliability vs. discrimination
  • n-gram models
  • Building n-gram models
  • Statistical Estimators
  • Maximum Likelihood Estimation (MLE)
  • Laplace's law, Lidstone's law and the
    Jeffreys-Perks law
  • Held-out estimation
  • Cross-validation
  • Good-Turing estimation

3
Outline (2)
  • Combining Estimators
  • Simple linear interpolation
  • Katz's backing-off
  • General linear interpolation
  • Language models for Austen
  • Conclusions

4
1. Bins: Forming Equivalence Classes
Reliability vs. discrimination
  • classification task
  • classificatory feature
  • target feature
  • equivalence classing helps to predict the value of
    the target feature
  • independence assumption
  • compromise is needed
  • Discrimination: how finely the data are divided into bins
  • Reliability: the number of training instances in each bin

5
n-gram models
  • predicting the next word (probability function P)
  • Markov Assumption
  • (n-1)th order Markov model (or n-gram model)
  • histories ending in the same last n-1 words form one
    equivalence class
  • number of parameters grows as V^n (e.g. V = 20,000); see
    the worked numbers below
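A worked version of the parameter arithmetic in the last bullet, using the vocabulary size quoted above (the per-model counts are simple arithmetic added here for illustration, not from the slides):

    Markov assumption:  P(w_k | w_1 ... w_(k-1))  ≈  P(w_k | w_(k-n+1) ... w_(k-1))
    (only the previous n-1 words matter)

    Rough parameter counts for a vocabulary of V = 20,000 words:
      bigram model  (n = 2):  V^2 = 4 x 10^8  parameters
      trigram model (n = 3):  V^3 = 8 x 10^12 parameters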

6
2. Statistical Estimators
  • probability estimate
  • target feature
  • estimating the unknown probability distribution of
    n-grams

7
Notation for the statistical estimation
8
Maximum Likelihood Estimation (MLE)
  • Probability estimates for the next word
  • The MLE assigns a zero probability to unseen
    events
  • These zero probabilities propagate and give us bad
    estimates for the probability of longer word sequences
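A minimal sketch of MLE for a bigram model, illustrating the zero-probability problem; the toy corpus and variable names are illustrative, not from the slides:

    from collections import Counter

    corpus = "the cat sat on the mat the cat ate".split()   # toy corpus (assumption)

    bigrams = Counter(zip(corpus, corpus[1:]))   # C(w_prev, w)
    histories = Counter(corpus[:-1])             # C(w_prev)

    def p_mle(w_prev, w):
        """MLE estimate P(w | w_prev) = C(w_prev, w) / C(w_prev)."""
        if histories[w_prev] == 0:
            return 0.0
        return bigrams[(w_prev, w)] / histories[w_prev]

    print(p_mle("the", "cat"))   # seen bigram   -> 2/3
    print(p_mle("cat", "the"))   # unseen bigram -> 0.0 (the zero-probability problem)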

9
Laplace's law
  • add a little bit of probability space to unseen
    events
  • but Laplace's law actually gives too much of the
    probability space to unseen events
  • when the number of bins B exceeds the number of training
    instances N, Laplace's method is completely unsatisfactory
  • too much of the probability space goes to unseen
    bigrams
  • 46.5% of it in the Church & Gale (1991) bigram data

10
Lidstone's law and the Jeffreys-Perks law
  • Lidstone's law
  • add some positive value λ (not necessarily 1) to each count
  • Jeffreys-Perks law
  • λ = 0.5
  • also called ELE (Expected Likelihood Estimation)
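A minimal sketch of Lidstone smoothing for bigram probabilities (toy corpus and λ values are illustrative); λ = 1 recovers Laplace's law and λ = 0.5 gives the Jeffreys-Perks law (ELE):

    from collections import Counter

    corpus = "the cat sat on the mat the cat ate".split()   # toy corpus (assumption)
    V = len(set(corpus))                 # vocabulary size
    B = V * V                            # number of bigram bins
    N = len(corpus) - 1                  # number of training bigram tokens
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p_lidstone(w_prev, w, lam):
        """Lidstone estimate P(w_prev, w) = (C + lambda) / (N + B * lambda)."""
        return (bigrams[(w_prev, w)] + lam) / (N + B * lam)

    print(p_lidstone("cat", "the", 1.0))   # Laplace's law (add one)
    print(p_lidstone("cat", "the", 0.5))   # Jeffreys-Perks law / ELE (add one half)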

11
Held-out estimation
C_1(w_1 ... w_n) : frequency of w_1 ... w_n in the training data
C_2(w_1 ... w_n) : frequency of w_1 ... w_n in the held-out data
T_r : total count in the held-out data of all n-grams whose
      training frequency C_1 is r
P_ho(w_1 ... w_n) = T_r / (N_r N)   where C_1(w_1 ... w_n) = r and
      N_r is the number of n-grams with training frequency r
  • idea: use further text (the held-out data) to see how often
    bigrams that appeared r times in the training text actually occur
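A minimal sketch of the computation (the toy training and held-out texts are illustrative):

    from collections import Counter

    train = "the cat sat on the mat the cat ate".split()            # toy training text (assumption)
    heldout = "the cat sat on the rug and the cat slept".split()    # toy held-out text (assumption)

    c1 = Counter(zip(train, train[1:]))       # C_1: bigram counts in the training data
    c2 = Counter(zip(heldout, heldout[1:]))   # C_2: bigram counts in the held-out data

    N_r = Counter(c1.values())                # N_r: number of bigram types occurring r times in training
    T_r = Counter()                           # T_r: their total held-out count
    for bigram, r in c1.items():
        T_r[r] += c2[bigram]

    N = len(train) - 1                        # number of training bigram tokens
    for r in sorted(N_r):
        p_ho = T_r[r] / (N_r[r] * N)          # held-out estimate for one bigram with training count r
        print(f"r={r}: N_r={N_r[r]}, T_r={T_r[r]}, P_ho={p_ho:.4f}")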

12
Cross-validation (deleted estimation)
  • cross-validation: the training data is used both as
  • initial training data
  • held-out data
  • On large training corpora, deleted estimation
    works better than held-out estimation
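Deleted estimation combines the two directions of held-out estimation over two halves of the training data. In a standard two-way form (superscripts 0 and 1 index the two halves; this formulation is added here for reference, it is not spelled out on the slide):

    P_del(w_1 ... w_n) = (T_r^01 + T_r^10) / (N (N_r^0 + N_r^1))   where C(w_1 ... w_n) = r

    T_r^01 : total count in part 1 of the n-grams occurring r times in part 0
    (T_r^10 is the reverse); N_r^0, N_r^1 : numbers of n-gram types occurring
    r times in part 0 and part 1 respectively.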

13
Good-Turing estimation
  • suitable for a large number of observations of data drawn
    from a large vocabulary
  • works well for n-grams

r* = (r + 1) E[N_(r+1)] / E[N_r]
P_GT(w_1 ... w_n) = r* / N   for an n-gram seen r times
( r* is an adjusted frequency; N_r is the number of n-grams seen r times )
( E denotes the expectation of a random variable )
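A minimal sketch using the usual practical simplification E[N_r] ≈ N_r (the observed "counts of counts"); the toy corpus is illustrative, and in practice N_r is smoothed before use because raw counts of counts are zero or noisy for large r:

    from collections import Counter

    train = "the cat sat on the mat the cat ate".split()   # toy corpus (assumption)
    counts = Counter(zip(train, train[1:]))                # bigram counts
    N = sum(counts.values())                               # total bigram tokens
    N_r = Counter(counts.values())                         # counts of counts

    def good_turing_r_star(r):
        """Adjusted frequency r* = (r + 1) N_(r+1) / N_r, using observed N_r for E[N_r]."""
        if N_r[r] == 0:
            return None                                    # no observed n-grams with this count
        return (r + 1) * N_r[r + 1] / N_r[r]

    print("total probability mass reserved for unseen bigrams:", N_r[1] / N)
    for r in sorted(N_r):
        print(f"r={r}: r*={good_turing_r_star(r)}, P_GT={good_turing_r_star(r) / N:.4f}")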
14
3. Combining Estimators
  • Consider how to combine multiple probability
    estimates from different models

Simple linear interpolation
  • a weighted combination of trigram, bigram and unigram
    estimates (see the sketch below)
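The standard form, with an illustrative weighting (the λ values below are assumptions; in practice they are tuned on held-out data):

    P_li(w_n | w_(n-2), w_(n-1)) =
        λ1 P1(w_n) + λ2 P2(w_n | w_(n-1)) + λ3 P3(w_n | w_(n-2), w_(n-1))
    with 0 ≤ λi and λ1 + λ2 + λ3 = 1

    def p_interp(p_uni, p_bi, p_tri, lambdas=(0.2, 0.3, 0.5)):
        """Simple linear interpolation of unigram, bigram and trigram estimates."""
        l1, l2, l3 = lambdas   # illustrative weights summing to 1
        return l1 * p_uni + l2 * p_bi + l3 * p_tri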

15
Katz's backing-off
  • used to smooth or to combine information sources
  • if the n-gram appeared more than k times
  • use the (discounted) n-gram estimate
  • if it appeared k times or fewer
  • back off to the estimate from a shorter n-gram
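Schematically (d is a discount and α a normalizing constant chosen so the probabilities sum to 1; in Katz's method the discount is derived from Good-Turing, which the slide does not spell out):

    P_bo(w_i | w_(i-n+1) ... w_(i-1)) =
        (1 - d) C(w_(i-n+1) ... w_i) / C(w_(i-n+1) ... w_(i-1))   if C(w_(i-n+1) ... w_i) > k
        α P_bo(w_i | w_(i-n+2) ... w_(i-1))                       otherwise (back off)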

16
General linear interpolation
  • weights are a function of the history
  • a very general and commonly used way to combine models
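In general form (h is the history; the weights must form a proper mixture):

    P_li(w | h) = Σ_i λ_i(h) P_i(w | h)    with λ_i(h) ≥ 0 and Σ_i λ_i(h) = 1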

17
4. Conclusions
  • problems of sparse data
  • Good-Turing, linear interpolation or back-off
  • Good-Turing smoothing is good
  • Church & Gale (1991)
  • Active research
  • combining probability models
  • dealing with sparse data