Transcript and Presenter's Notes

Title: CSA2050: Natural Language Processing


1
CSA2050 Natural Language Processing
  • Tagging 2
  • Rule-Based Tagging
  • Stochastic Tagging
  • Hidden Markov Models (HMMs)
  • N-Grams

2
Tagging 2 Lecture
  • Slides based on Mike Rosner and Marti Hearst
    notes
  • Additions from NLTK tutorials

3
Rule-Based Tagger
  • Basic Idea
  • Assign all possible tags to words
  • Remove tags according to a set of rules, e.g.: if
    word+1 is an adjective, adverb, or quantifier, and
    word+2 is a sentence boundary, and word-1 is not a
    verb like "consider" (which takes an adjective
    complement), then eliminate all non-ADV tags;
    otherwise eliminate the ADV tag.
  • Typically more than 1,000 hand-written rules are
    used, but they may also be machine-learned (a toy
    sketch follows).
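  • A minimal sketch of this assign-then-eliminate
    idea, in Python with an invented toy lexicon (a
    real system like ENGTWOL uses a 56,000-entry
    lexicon and over 1,000 constraints):

    # Toy sketch; LEXICON and the rule interface are illustrative only.
    LEXICON = {
        "the":    {"DET"},
        "tables": {"N", "V"},
        "that":   {"ADV", "PRON", "DET", "CS"},
        "odd":    {"A"},
    }

    def assign_all_tags(words):
        # Stage 1: attach every tag the lexicon allows (default N for unknowns).
        return [[w, set(LEXICON.get(w, {"N"}))] for w in words]

    def apply_rules(tagged, rules):
        # Stage 2: each rule returns tags to discard at position i;
        # a word's last remaining reading is never removed.
        for i in range(len(tagged)):
            for rule in rules:
                doomed = rule(tagged, i) & tagged[i][1]
                if doomed and tagged[i][1] - doomed:
                    tagged[i][1] -= doomed
        return tagged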

4
ENGTWOL
  • Based on two-level morphology
  • 56,000 entries for English word stems
  • Each entry annotated with morphological and
    syntactic features

5
Sample ENGTWOL Lexicon
6
ENGTWOL Tagger
  • Stage 1: Run words through the morphological
    analyzer to get all possible parts of speech.
  • E.g. for the phrase "the tables", we get the
    following output:
    "<the>"    "the"   <Def> DET CENTRAL ART SG/PL
    "<tables>" "table" N NOM PL
               "table" <SVO> V PRES SG3 VFIN
  • Stage 2: Apply constraints to rule out incorrect
    POS readings.

7
Examples of Constraints
  • Discard all verb readings if to the left there is
    an unambiguous determiner, and between that
    determiner and the ambiguous word itself, there
    are no nominals (nouns, abbreviations etc.).
  • Discard all finite verb readings if the
    immediately preceding word is "to".
  • Discard all subjunctive readings if to the left
    there are no instances of the subordinating
    conjunction "that" or "lest".
  • The first constraint would discard the verb
    reading in the representation of "the tables"
    above.
  • There are about 1,100 constraints; one is
    sketched below.
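  • The first constraint could be written as a rule
    for the sketch above (a rough approximation: real
    ENGTWOL constraints distinguish many nominal
    readings, not just N):

    def discard_verb_after_det(tagged, i):
        # Discard V readings if an unambiguous DET lies to the left
        # with no possible nominal reading in between.
        for j in range(i - 1, -1, -1):
            if tagged[j][1] == {"DET"}:
                return {"V"}
            if "N" in tagged[j][1]:      # a possible nominal intervenes
                return set()
        return set()

    # "the tables": the V reading of "tables" is discarded.
    print(apply_rules(assign_all_tags("the tables".split()),
                      [discard_verb_after_det]))
    # [['the', {'DET'}], ['tables', {'N'}]]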

8
Example
  • Pavlov      PAVLOV N NOM SG PROPER
  • had         HAVE V PAST VFIN SVO
  •             HAVE PCP2 SVO
  • shown       SHOW PCP2 SVOO SVO SV
  • that        ADV
  •             PRON DEM SG
  •             DET CENTRAL DEM SG
  •             CS
  • salivation  N NOM SG

9
Actual Constraint Syntax
  • Given input: "that"
    If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A)
    Then eliminate non-ADV tags
    Else eliminate ADV tag
  • This rule selects the adverbial sense of "that",
    as in "it isn't that odd", and eliminates it
    elsewhere (a toy version is sketched below).
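  • In the same toy interface, this constraint might
    look as follows (the SENT-LIM and SVOC/A checks
    are crude stand-ins for ENGTWOL's real features):

    ADJ_ADV_QUANT = {"A", "ADV", "QUANT"}

    def adverbial_that(tagged, i):
        # If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A):
        # eliminate non-ADV tags; else eliminate the ADV tag.
        if tagged[i][0] != "that" or "ADV" not in tagged[i][1]:
            return set()
        next_ok = i + 1 < len(tagged) and tagged[i + 1][1] & ADJ_ADV_QUANT
        at_end = i + 2 >= len(tagged)           # crude sentence-limit test
        prev_ok = i == 0 or "SVOC/A" not in tagged[i - 1][1]
        if next_ok and at_end and prev_ok:
            return tagged[i][1] - {"ADV"}       # keep only the ADV reading
        return {"ADV"}

    # "it is not that odd": "that" keeps only its ADV reading.
    print(apply_rules(assign_all_tags("it is not that odd".split()),
                      [adverbial_that]))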

10
3 Approaches to Tagging
  • Rule-Based Tagger: ENGTWOL Tagger (Voutilainen
    1995)
  • Stochastic Tagger: HMM-based Tagger
  • Transformation-Based Tagger: Brill Tagger (Brill
    1995)

11
Stochastic Tagging
  • Based on the probability of each candidate tag,
    given the surrounding words and tags.
  • Requires a training corpus.
  • Difficulties:
  • There are no probabilities for words that are not
    in the training corpus → smoothing (see the sketch
    below).
  • The training corpus may be too different from the
    test corpus.
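  • A minimal sketch of one standard remedy, add-one
    (Laplace) smoothing, for tag-conditioned word
    probabilities (the count dictionaries and toy
    numbers are assumed inputs):

    from collections import Counter

    def smoothed_word_given_tag(word, tag, wt_counts, tag_counts, vocab_size):
        # Add-one smoothing: unseen (word, tag) pairs get a small
        # non-zero probability instead of zero.
        return (wt_counts[(word, tag)] + 1) / (tag_counts[tag] + vocab_size)

    wt = Counter({("race", "NN"): 4, ("race", "VB"): 1})
    tg = Counter({"NN": 100, "VB": 50})
    print(smoothed_word_given_tag("race", "NN", wt, tg, vocab_size=1000))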

12
Stochastic Tagging
  • Simple method: choose the most frequent tag in
    the training text for each word (sketched below)!
  • Result: 90% accuracy!
  • But we can do better than that by employing a
    more elaborate statistical model.
  • Hidden Markov Models (HMMs) are a class of such
    models.
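  • A minimal sketch of this most-frequent-tag
    baseline (NLTK's UnigramTagger implements the
    same idea); the tiny corpus is invented:

    from collections import Counter, defaultdict

    def train_baseline(tagged_words):
        # For each word, remember the tag it bears most often in training.
        counts = defaultdict(Counter)
        for word, tag in tagged_words:
            counts[word][tag] += 1
        return {w: c.most_common(1)[0][0] for w, c in counts.items()}

    corpus = [("the", "DT"), ("race", "NN"), ("the", "DT"),
              ("race", "VB"), ("race", "NN")]
    print(train_baseline(corpus)["race"])   # NN - its most frequent tag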

13
Hidden Markov Model (for pronunciation)
Observation sequences (read off the slide's state
diagram):
  start ax b aw end
  start ix b aw dx end
  start ax b ae t end
14
Three Fundamental Questions for HMMs
  • Given an HMM, how likely is a given observation
    sequence? (solved by the forward algorithm)
  • Given an observation sequence, how do we choose a
    state sequence that best explains the
    observations? (solved by the Viterbi algorithm)
  • Given an observation sequence and a space of
    possible HMMs, how do we find the HMM that best
    fits the observed data? (solved by Baum-Welch
    training)

15
Two Observation Sequences for Tagging
16
Two Kinds of Probability Involved in Generating a
Sequence
  • The slide's diagram contrasts candidate tag
    sequences (t1 t2 t3 t5 t6 vs. t1 t2 t4 t5 t6)
    over the same words w1 ... w5.
  • Transitional probability: P(tag | previous n tags)
  • Output probability: P(w | t)
17
Simplifying Assumptions (cannot handle all
phenomena)
  • Limited horizon: a given tag depends only upon
    the N previous tags; usually N = 2.
  • Central embedding: The cat the dog the bird saw
    bark meowed.
  • Long-distance dependencies: It is easy to consider
    it impossible for anyone but a genius to try to
    talk to Chris.
  • Time (sentence position) invariance: a tag pair
    such as (P, V) may not be equally likely at the
    beginning and at the end of a sentence.

18
Estimating N-gram probabilities
  • To estimate the probability that Z appears
    after XY:
  • count how many times XYZ appears: call this A
  • count how many times XY appears: call this B
  • estimate P(Z | XY) as A/B
  • The same principle applies to tags (see the
    sketch below).
  • We can use these estimates to rank alternative
    tags for a given word.
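  • A minimal sketch of this estimate over a tag
    sequence (the toy tag list is invented):

    from collections import Counter

    def trigram_prob(tags, x, y, z):
        # P(z | x y) is estimated as count(x y z) / count(x y).
        a = Counter(zip(tags, tags[1:], tags[2:]))[(x, y, z)]
        b = Counter(zip(tags, tags[1:]))[(x, y)]
        return a / b if b else 0.0

    tags = ["DT", "NN", "VB", "DT", "NN", "VB", "DT", "JJ"]
    print(trigram_prob(tags, "DT", "NN", "VB"))   # 1.0 in this toy sequence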

19
Data Used for Training a Hidden Markov Model
  • Estimate the probabilities from relative
    frequencies (see the training sketch below).
  • Transitional probabilities: the probability that
    a sequence of tags t1, ..., tn is followed by a
    tag t:
    P(t | t1..tn) = count(t1..tn followed by t) / count(t1..tn)
  • Output probabilities: the probability that a
    given tag t is realised as a word w:
    P(w | t) = count(w tagged as t) / count(t)
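  • A minimal training sketch with a bigram horizon
    (N = 1 previous tag); the corpus format, a list
    of (word, tag) sentences, is an assumption:

    from collections import Counter

    def train_hmm(tagged_sents):
        # Relative-frequency estimates for transition and output
        # probabilities (sentence-final positions are not treated
        # specially - a simplification).
        emit, bigram, unigram = Counter(), Counter(), Counter()
        for sent in tagged_sents:
            tags = [t for _, t in sent]
            for word, tag in sent:
                emit[(word, tag)] += 1
                unigram[tag] += 1
            bigram.update(zip(tags, tags[1:]))
        p_trans = {bt: n / unigram[bt[0]] for bt, n in bigram.items()}
        p_emit = {wt: n / unigram[wt[1]] for wt, n in emit.items()}
        return p_trans, p_emit

    sents = [[("the", "DT"), ("race", "NN")],
             [("to", "TO"), ("race", "VB")]]
    p_trans, p_emit = train_hmm(sents)
    print(p_trans[("TO", "VB")], p_emit[("race", "VB")])   # 1.0 1.0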

20
An Example
  • Secretariat/NNP is/VBZ expected/VBN to/TO
    race/VB tomorrow/NN
  • People/NNS continue/VBP to/TO inquire/VB the/DT
    reason/NN for/IN the/DT race/NN for/IN outer/JJ
    space/NN
  • Considering the first sentence, choose between:
  • A: to/TO race/VB    B: to/TO race/NN
  • We choose the reading with maximum probability:
  • P(A) = P(VB|TO) × P(race|VB)
  • P(B) = P(NN|TO) × P(race|NN)

21
Calculating Maximum
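  • Using the corpus-derived values usually quoted
    for this example (in Jurafsky and Martin's
    textbook treatment), roughly:
  • P(VB|TO) = .34 and P(race|VB) = .00003, so
    P(A) ≈ .34 × .00003 ≈ .00001
  • P(NN|TO) = .021 and P(race|NN) = .00041, so
    P(B) ≈ .021 × .00041 ≈ .0000086
  • P(A) > P(B), so race is tagged VB in the first
    sentence.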
22
Remarks
  • We have shown how to calculate the most probable
    tag for one word.
  • Normally we are interested in the most probable
    sequence of tags for the entire sentence.
  • The Viterbi algorithm is used to calculate the
    most probable tag sequence over the entire
    sentence (sketched below).
  • For a quick introduction, have a look at
    http://en.wikipedia.org/wiki/Viterbi_algorithm
    (PDF on website).
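  • A minimal Viterbi sketch for a bigram HMM; the
    dictionary-based model format matches the training
    sketch above and is an assumption, not the only
    possible representation:

    def viterbi(words, tagset, p_init, p_trans, p_emit):
        # Dynamic programming over (probability, backpointer) pairs.
        V = [{t: (p_init.get(t, 0.0) * p_emit.get((words[0], t), 0.0), None)
              for t in tagset}]
        for w in words[1:]:
            V.append({t: max((V[-1][p][0] * p_trans.get((p, t), 0.0)
                              * p_emit.get((w, t), 0.0), p) for p in tagset)
                      for t in tagset})
        best = max(tagset, key=lambda t: V[-1][t][0])   # best final tag
        path = [best]
        for col in reversed(V[1:]):                     # follow backpointers
            path.append(col[path[-1]][1])
        return list(reversed(path))

    tagset = ["TO", "VB", "NN"]
    p_init = {"TO": 1.0}
    p_trans = {("TO", "VB"): .34, ("TO", "NN"): .021}
    p_emit = {("to", "TO"): 1.0, ("race", "VB"): .00003,
              ("race", "NN"): .00041}
    print(viterbi(["to", "race"], tagset, p_init, p_trans, p_emit))
    # ['TO', 'VB']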

23
Next Sessions
  • Transformation Based Tagging
  • Chunking