CS 904: Natural Language Processing STATISTICS AND LINGUISTICS

1
CS 904: Natural Language Processing
STATISTICS AND LINGUISTICS
  • L. Venkata Subramaniam
  • January 10, 2002

2
The Problem with Algebraic Grammars
  • All grammars leak.
  • Their predictions about grammaticality and
    ambiguity are simply not in accord with human
    perceptions.
  • Degrees of grammaticality are not accommodated:
    a structure is either grammatical or not.

3
Stochastic Grammars
  • Grammars obtained by adding probabilities in a
    fairly transparent way to algebraic (i.e.,
    non-probabilistic) grammars.
  • Stochastic grammars supplement the underlying
    algebraic grammars.

4
Stochastic Grammars: Useful Properties
  • Robustness: Real user input is noisy; it is full
    of misspellings, unanticipated syntactic
    constructions, and so on. Stochastic grammars are
    noise tolerant.
  • Portability: They can be adapted to new domains or
    new languages by training them on language corpora.
  • Generalization: They can act on data never seen in
    training.

5
The Case for Stochastic Grammars
  • Language Acquisition: Different language rules
    coexist. The learning process involves assigning
    higher probability to correct grammar rules.
  • Language Change: Language change is slow,
    measured in years and centuries.
  • Language Variation: With geographic distance,
    the relative frequencies of various constructions
    change.

6
The Case for Stochastic Grammars (Cont.)
  • Error Correction:
      • Thanks for all you help.
      • Thanks for all your help.
      • Thanks for all those who you help.
  • Learning on the fly:
      • A hectare is a hundred ares.

7
Stochastic Grammars: How They Are Used
  • We need to identify the correct structure from
    among the in-principle possible structures.
  • A weight is assigned to each aspect of structure
    permitted by the grammar, and the weight of a
    particular analysis is the combined weight of the
    structural features that make it up.
  • The analysis with the greatest weight is
    predicted to be the perceived analysis for a
    given sentence.

8
Disambiguation
  • John walks
  • 1. S  -> NP V     .7
  • 2. S  -> NP       .3
  • 3. NP -> N        .8
  • 4. NP -> N N      .2
  • 5. N  -> John     .6
  • 6. N  -> walks    .4
  • 7. V  -> walks   1.0
  • As a sentence, the rules used are 1, 3, 5, 7, so the
    weight is (.7)(.8)(.6)(1.0) = .336
  • As a noun phrase, the rules used are 2, 4, 5, 6, so the
    weight is (.3)(.2)(.6)(.4) = .0144
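
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the names RULE_PROB, ANALYSES and weight are made up for illustration, and rule 3 is taken as 0.8, the value that appears in the product for the sentence reading.

```python
from math import prod

# Rule probabilities keyed by the rule numbers used on the slide.
# Rule 3 is taken as 0.8, the value used in the sentence product.
RULE_PROB = {
    1: 0.7,  # S  -> NP V
    2: 0.3,  # S  -> NP
    3: 0.8,  # NP -> N
    4: 0.2,  # NP -> N N
    5: 0.6,  # N  -> John
    6: 0.4,  # N  -> walks
    7: 1.0,  # V  -> walks
}

# Each analysis is listed by the rules it uses.
ANALYSES = {
    "sentence": [1, 3, 5, 7],
    "noun phrase": [2, 4, 5, 6],
}


def weight(rule_numbers):
    """Weight of an analysis: the product of its rule probabilities."""
    return prod(RULE_PROB[r] for r in rule_numbers)


weights = {name: weight(rules) for name, rules in ANALYSES.items()}

# The analysis with the greatest weight is the predicted reading.
best = max(weights, key=weights.get)

for name, w in weights.items():
    print(f"{name}: {w:.4f}")
print("preferred analysis:", best)
```

The sentence reading outweighs the noun-phrase reading by a factor of more than twenty, so it is the one predicted to be perceived.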

9
Complexity
  • As the range of possible grammar constructions
    increases, the number of undesired parses for a
    sentence increases dramatically.
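
A small illustration of this effect, using the toy grammar from the Disambiguation slide (words lower-cased). The sketch below counts the analyses of "John walks" first with only the rules needed for the sentence reading, then with the extra rules added. The function count_parses and the rule encoding are assumptions made for this example, and the counter assumes rules with one or two right-hand symbols and no unary cycles.

```python
from functools import lru_cache


def count_parses(rules, start, words):
    """Count the derivations of `words` from the symbol `start`.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of one
    or two symbols. A symbol that never appears as a left-hand side is
    treated as a word. Assumes there are no unary cycles.
    """
    nonterminals = {lhs for lhs, _ in rules}
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # A word matches only a span of length one holding that word.
        if symbol not in nonterminals:
            return int(j - i == 1 and words[i] == symbol)
        total = 0
        for lhs, rhs in rules:
            if lhs != symbol:
                continue
            if len(rhs) == 1:
                total += count(rhs[0], i, j)
            else:
                left, right = rhs
                # Try every split point of the span between the two symbols.
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words))


# Only the rules needed for the sentence reading.
core = [
    ("S", ("NP", "V")),
    ("NP", ("N",)),
    ("N", ("john",)),
    ("V", ("walks",)),
]

# The same grammar with the remaining rules from the slide added.
full = core + [
    ("S", ("NP",)),
    ("NP", ("N", "N")),
    ("N", ("walks",)),
]

sentence = ["john", "walks"]
parses_core = count_parses(core, "S", sentence)
parses_full = count_parses(full, "S", sentence)
print(parses_core, parses_full)  # prints "1 2"
```

Three added rules already double the number of analyses for a two-word input, and the extra analysis is the undesired noun-phrase reading.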

10
Conclusion
  • Probabilities are associated with CFG rules.
  • This allows modeling language learning, language
    perception, language errors, language change and
    dialect continua.