1
Maximum Entropy (ME) Theory and Examples
  • Mihai Rotaru
  • ITSPOKE Presentation

2
Introduction
  • Maximum entropy
  • Estimate a probability distribution given a set
    of constraints
  • Principle
  • Model what is known
  • Assume nothing else
  • (Berger et al., 1996) example
  • Model translation of the word "in" from English
    to French
  • Need to model P(word_French)
  • Constraints (worked out in equations below)
  • 1. Possible translations: dans, en, à, au cours
    de, pendant
  • 2. dans or en used 30% of the time
  • 3. dans or à used 50% of the time
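These constraints written out as equations (a sketch following the Berger et al., 1996 example; the equations are reconstructed here, not copied from the slides):

  \begin{aligned}
  p(\text{dans}) + p(\text{en}) + p(\text{à}) + p(\text{au cours de}) + p(\text{pendant}) &= 1 \\
  p(\text{dans}) + p(\text{en}) &= 3/10 \\
  p(\text{dans}) + p(\text{à}) &= 1/2
  \end{aligned}

With only the first two constraints, the flattest distribution can be read off directly: p(dans) = p(en) = 3/20, and p(à) = p(au cours de) = p(pendant) = 7/30. Once the third constraint is added, the flattest choice is no longer obvious, which is what motivates the iterative procedure on the next slide.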

3
Theory
  • Model what is known (conditions)
  • Feature functions (subspaces)
  • Expected value of feature functions
  • Assume nothing else
  • → Flattest distribution
  • → Distribution with the maximum entropy
  • Entropy
  • Maximizing entropy = minimizing the
    Kullback-Leibler distance from the uniform
    distribution
  • Unique solution guaranteed
  • Generalized Iterative Scaling algorithm
  • Weights (λi) interpretation (model form and GIS
    update sketched below)
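A standard formulation of the conditional ME model, its constraints, and the Generalized Iterative Scaling update (the notation below is the usual textbook one and is assumed here, not taken from the slides):

  p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)} \exp\Big(\sum_i \lambda_i f_i(x, y)\Big),
  \qquad Z_\lambda(x) = \sum_{y'} \exp\Big(\sum_i \lambda_i f_i(x, y')\Big)

  \text{subject to } E_{p_\lambda}[f_i] = E_{\tilde p}[f_i] \text{ for every feature } f_i

  \text{GIS update, with } C = \max_{x,y} \sum_i f_i(x, y):\qquad
  \lambda_i \leftarrow \lambda_i + \frac{1}{C} \log \frac{E_{\tilde p}[f_i]}{E_{p_\lambda}[f_i]}

Each weight λi can then be read as how strongly feature fi pushes probability toward (λi > 0) or away from (λi < 0) the outcomes it fires on, relative to the flat baseline.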

4
ME in practice
  • Expected values of feature functions
  • Computed from the empirical distribution
  • Conditional form of ME (see the sketch below)
  • What features to use?
  • Feature templates
  • Feature selection algorithms
  • Cutoffs
  • Basic Feature Selection (Berger et al., 1996)
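A minimal Python sketch of the two quantities above: the conditional ME probability p(y | x) and the empirical expected values of the feature functions. All names (p_conditional, feature_funcs, ...) are illustrative, not from any particular ME package.

import math

def p_conditional(x, y, labels, feature_funcs, weights):
    # p(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x)
    def score(label):
        return sum(w * f(x, label) for f, w in zip(feature_funcs, weights))
    z = sum(math.exp(score(label)) for label in labels)  # normalizing constant Z(x)
    return math.exp(score(y)) / z

def empirical_expectations(data, feature_funcs):
    # E_~p[f_i]: average value of each feature over the (x, y) training pairs
    n = len(data)
    return [sum(f(x, y) for x, y in data) / n for f in feature_funcs]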

5
ME applications
  • Part of Speech (POS) Tagging (Ratnaparkhi, 1996)
  • P(POS tag | context) (feature template sketch
    below)
  • Information sources
  • Word window (4)
  • Word features (prefix, suffix, capitalization)
  • Previous POS tags
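An illustrative feature-template sketch in the spirit of Ratnaparkhi (1996); the window size, exact templates, and cutoffs in the actual paper differ, and all names here are made up for the example.

def pos_features(words, prev_tags, i):
    # Binary features describing word i and its context, keyed by string name.
    w = words[i]
    feats = {
        f"word={w}": 1,
        f"prefix3={w[:3]}": 1,                    # word-internal features
        f"suffix3={w[-3:]}": 1,
        f"capitalized={w[0].isupper()}": 1,
    }
    if i > 0:
        feats[f"prev_word={words[i-1]}"] = 1      # word window
        feats[f"prev_tag={prev_tags[i-1]}"] = 1   # previous POS tags
    if i + 1 < len(words):
        feats[f"next_word={words[i+1]}"] = 1
    return feats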

6
ME applications (continued)
  • Abbreviation expansion (Pakhomov, 2002)
  • Information sources
  • Word window (4)
  • Document title
  • Word Sense Disambiguation (WSD) (Chao & Dyer,
    2002)
  • Information sources
  • Word window (4)
  • Structurally related words (4)
  • Sentence Boundary Detection (Reynar &
    Ratnaparkhi, 1997)
  • Information sources
  • Token features (prefix, suffix, capitalization,
    abbreviation)
  • Word window (2)

7
ME applications (continued)
  • Machine translation
  • Word translation model - P(French word | English
    word)
  • Information sources
  • Word window (6)

8
ME applications (continued)
  • Machine translation (continued)
  • Full ME modeling P(English sentence | French
    sentence) (Och & Ney, 2002)
  • Source-channel model as a simplification of ME
  • Information sources
  • Sentence Length Model
  • Conventional Lexicon
  • Additional English language models

9
Why ME?
  • Advantages
  • Combine multiple knowledge sources
  • Local
  • Word prefix, suffix, capitalization (POS:
    Ratnaparkhi, 1996)
  • Word POS, POS class, suffix (WSD: Chao & Dyer,
    2002)
  • Token prefix, suffix, capitalization,
    abbreviation (Sentence Boundary: Reynar &
    Ratnaparkhi, 1997)
  • Global
  • N-grams (Rosenfeld, 1997)
  • Word window
  • Document title (Pakhomov, 2002)
  • Structurally related words (Chao & Dyer, 2002)
  • Sentence length, conventional lexicon (Och &
    Ney, 2002)
  • Combine dependent knowledge sources

10
Why ME? (continued)
  • Advantages
  • Add additional knowledge sources
  • Implicit smoothing
  • Disadvantages
  • Computational
  • Expected values at each iteration (see the
    sketch below)
  • Normalizing constant
  • Overfitting
  • Feature selection
  • Cutoffs
  • Basic Feature Selection (Berger et al., 1996)
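A sketch of where this cost comes from: the model expectation of every feature needs the normalizing sum over all candidate labels for every training example, repeated at every iteration. (p_conditional is the illustrative function from the earlier sketch; nothing here is from a specific ME implementation.)

def model_expectations(data, labels, feature_funcs, weights):
    # E_p[f_i] under the current model: a sum over every label (i.e. the
    # normalizing constant Z(x)) for every training x, at every iteration.
    n = len(data)
    expectations = [0.0] * len(feature_funcs)
    for x, _ in data:
        for y in labels:
            p = p_conditional(x, y, labels, feature_funcs, weights)
            for i, f in enumerate(feature_funcs):
                expectations[i] += p * f(x, y) / n
    return expectations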

11
References
  • See my comprehensive write-up and reading list
  • http://www.cs.pitt.edu/mrotaru/comp/
  • ME software available on the net
  • YASMET (http://www.fjoch.com/YASMET.html)
  • yasmetFS (http://www.isi.edu/ravichan/YASMET/)
  • OpenNLP MaxEnt (http://maxent.sourceforge.net/)