Title: Fall 2005
1. EECS 595 / LING 541 / SI 661
Natural Language Processing
- Fall 2005
- Lecture Notes 8
2. Evaluation of NLP systems
3. The classical pipeline (for supervised learning)
- Training set/dev set/test set
- Dumb baseline
- Intelligent baseline
- Your algorithm
- Human ceiling
- Accuracy/precision/recall
- Multiple references
- Statistical significance
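The accuracy/precision/recall bullet can be made concrete with a minimal sketch. The function names and the toy data are illustrative assumptions, not part of the lecture.

```python
def precision_recall(gold, predicted):
    """Precision and recall of a predicted item set against a gold item set."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def accuracy(gold_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not gold_labels:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


# Toy data (hypothetical): relevant document ids vs. retrieved document ids.
p, r = precision_recall({1, 2, 3, 4}, {3, 4, 5})
# Toy data (hypothetical): gold tags vs. system tags for three words.
acc = accuracy(["NN", "VB", "DT"], ["NN", "NN", "DT"])
```

Precision and recall suit set-valued outputs such as retrieved documents, while accuracy suits one-label-per-item tasks such as tagging.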
4. Special cases
- Document retrieval systems
- Part of speech tagging
- Parsing
- Labeled recall
- Labeled precision
- Crossing brackets
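The three parsing measures can be sketched as follows, assuming each constituent is represented as a (label, start, end) triple with an exclusive end index. That representation, the function names, and the toy brackets are assumptions made for illustration.

```python
def spans_cross(a, b):
    """True if spans a and b overlap without one containing the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def bracket_scores(gold, predicted):
    """Return (labeled precision, labeled recall, crossing-bracket count)."""
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)  # same label and same span
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    gold_spans = {(s, e) for _, s, e in gold_set}
    # Count predicted constituents that cross at least one gold constituent.
    crossing = sum(
        any(spans_cross((s, e), g) for g in gold_spans) for _, s, e in pred_set
    )
    return precision, recall, crossing


# Hypothetical brackets over a five-word sentence.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 3), ("VP", 3, 5), ("NP", 3, 5)]
lp, lr, cb = bracket_scores(gold, pred)
```

Here the predicted NP over words 0-2 crosses the gold VP over words 2-4, so one crossing bracket is counted.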
5. Word classes and part-of-speech tagging
6. Part of speech tagging
- Problems: transport, object, discount, address
- More problems: content
- French: est, président, fils
- "Book that flight": what is the part of speech associated with "book"?
- POS tagging: assigning parts of speech to words in a text.
- Three main techniques: rule-based tagging, stochastic tagging, transformation-based tagging
7. Rule-based POS tagging
- Use dictionary or FST to find all possible parts of speech
- Use disambiguation rules (e.g., ART V)
- Typically hundreds of constraints can be designed manually
8. Example in French
Word       Tags                   Gloss
<S>                               beginning of sentence
La         rf b nms u             article
teneur     nfs nms                noun feminine singular
Moyenne    jfs nfs v1s v2s v3s    adjective feminine singular
en         p a b                  preposition
uranium    nms                    noun masculine singular
des        p r                    preposition
rivieres   nfp                    noun feminine plural
,          x                      punctuation
bien_que   cs                     subordinating conjunction
délicate   jfs                    adjective feminine singular
À          p                      preposition
calculer   v                      verb
9. Sample rules
- BS3 BI1: A BS3 (3rd person subject personal pronoun) cannot be followed by a BI1 (1st person indirect personal pronoun). In the example "il nous faut" (we need), "il" has the tag BS3MS and "nous" has the tags BD1P, BI1P, BJ1P, BR1P, and BS1P. The negative constraint "BS3 BI1" rules out "BI1P", and thus leaves only 4 alternatives for the word "nous".
- N K: The tag N (noun) cannot be followed by a tag K (interrogative pronoun). An example in the test corpus would be "... fleuve qui ..." (...river, that...). Since "qui" can be tagged both as an "E" (relative pronoun) and a "K" (interrogative pronoun), the "E" will be chosen by the tagger, since an interrogative pronoun cannot follow a noun ("N").
- R V: A word tagged with R (article) cannot be followed by a word tagged with V (verb). For example, "l' appelle" (calls him/her): the word "appelle" can only be a verb, but "l'" can be either an article or a personal pronoun. Thus, the rule will eliminate the article tag, giving preference to the pronoun.
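A minimal sketch of how such negative constraints could prune candidate tags. It assumes a constraint name matches any tag that begins with it (so BS3 matches BS3MS); that matching rule, the function names, and the placeholder tag PRON for the personal-pronoun reading of "l'" are assumptions, not the original tagger's design.

```python
# Negative constraints from the slide: (left, right) pairs that may not be adjacent.
CONSTRAINTS = [("BS3", "BI1"), ("N", "K"), ("R", "V")]


def forbidden(left_tag, right_tag, constraints=CONSTRAINTS):
    """True if some constraint forbids left_tag immediately followed by right_tag."""
    # Assumption: prefix matching, e.g. the constraint BS3 covers the tag BS3MS.
    return any(left_tag.startswith(a) and right_tag.startswith(b) for a, b in constraints)


def prune(candidates, constraints=CONSTRAINTS):
    """Drop tags that cannot combine with any remaining tag of a neighbouring word."""
    cands = [set(c) for c in candidates]
    changed = True
    while changed:  # tag sets only shrink, so this loop terminates
        changed = False
        for i in range(len(cands) - 1):
            left, right = cands[i], cands[i + 1]
            keep_left = {t for t in left
                         if any(not forbidden(t, u, constraints) for u in right)}
            if keep_left and keep_left != left:  # never empty a word's tag set
                cands[i] = left = keep_left
                changed = True
            keep_right = {u for u in right
                          if any(not forbidden(t, u, constraints) for t in left)}
            if keep_right and keep_right != right:
                cands[i + 1] = keep_right
                changed = True
    return cands


il_nous = prune([{"BS3MS"}, {"BD1P", "BI1P", "BJ1P", "BR1P", "BS1P"}])
fleuve_qui = prune([{"N"}, {"E", "K"}])
l_appelle = prune([{"R", "PRON"}, {"V"}])  # PRON is a hypothetical placeholder tag
```

Pruning works in both directions: an unambiguous left word removes tags on its right ("il nous"), and an unambiguous right word removes tags on its left ("l' appelle").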
10. Confusion matrix
       IN    JJ    NN    NNP   RB    VBD   VBN
IN     -     .2                .7
JJ     .2    -     3.3   2.1   1.7   .2    2.7
NN           8.7   -                       .2
NNP    .2    3.3   4.1   -     .2
RB     2.2   2.0   .5          -
VBD          .3    .5                -     4.4
VBN          2.8                     2.6   -
Most confusing: NN vs. NNP vs. JJ, VBD vs. VBN vs. JJ
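A confusion matrix of this kind can be tallied from a gold tagging and a system tagging. The sketch below is illustrative only: the function names and the toy tag sequences are assumptions, and the percentages are shares of all tagging errors in the toy data.

```python
from collections import Counter


def confusion_counts(gold, predicted):
    """Count (gold tag, predicted tag) pairs at positions where the tagger was wrong."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


def error_percentages(gold, predicted):
    """Express each confusion as a percentage of all tagging errors."""
    counts = confusion_counts(gold, predicted)
    total = sum(counts.values())
    return {pair: 100.0 * c / total for pair, c in counts.items()} if total else {}


# Hypothetical gold and system tags for five tokens.
gold_tags = ["NN", "NN", "JJ", "VBD", "NN"]
system_tags = ["JJ", "NN", "NN", "VBN", "JJ"]
counts = confusion_counts(gold_tags, system_tags)
shares = error_percentages(gold_tags, system_tags)
worst = counts.most_common(1)[0]
```

`Counter.most_common` gives the most confusing pairs directly, which is how a summary line such as "NN vs. JJ" could be read off.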
11. HMM Tagging
- $\hat{T} = \arg\max_{T} P(T \mid W)$, where $T = t_1, t_2, \ldots, t_n$
- By Bayes's theorem: $P(T \mid W) = P(T)\,P(W \mid T) / P(W)$
- Thus we are attempting to choose the sequence of tags that maximizes the rhs of the equation
- $P(W)$ can be ignored
- $P(T)\,P(W \mid T) = {?}$
- $P(T)$ is called the prior, $P(W \mid T)$ is called the likelihood.
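The reason $P(W)$ can be ignored is that it is the same for every candidate tag sequence, so it does not change which $T$ attains the maximum. Written out with the symbols used on this slide:

```latex
\begin{align*}
\hat{T} &= \arg\max_{T} P(T \mid W) \\
        &= \arg\max_{T} \frac{P(T)\,P(W \mid T)}{P(W)} \\
        &= \arg\max_{T} P(T)\,P(W \mid T)
\end{align*}
```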
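12. HMM tagging (cont'd)
- $P(T)\,P(W \mid T) = \prod_{i=1}^{n} P(w_i \mid w_1 t_1 \ldots w_{i-1} t_{i-1} t_i)\, P(t_i \mid t_1 \ldots t_{i-2} t_{i-1})$
- Simplification 1: $P(W \mid T) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$
- Simplification 2: $P(T) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})$
- $\hat{T} = \arg\max_{T} P(T \mid W) \approx \arg\max_{T} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$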
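13. Estimates
- $P(\mathrm{NN} \mid \mathrm{DT}) = C(\mathrm{DT}, \mathrm{NN}) / C(\mathrm{DT}) = 56509 / 116454 = .49$
- $P(\mathit{is} \mid \mathrm{VBZ}) = C(\mathrm{VBZ}, \mathit{is}) / C(\mathrm{VBZ}) = 10073 / 21627 = .47$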
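These relative-frequency estimates can be computed from any tagged corpus. The sketch below is a minimal illustration: the function name and the two-sentence toy corpus are assumptions, and no smoothing is applied. The last two lines simply redo the slide's divisions.

```python
from collections import Counter


def mle_estimates(tagged_sentences):
    """Relative-frequency estimates of P(tag | previous tag) and P(word | tag)."""
    tag_count, bigram_count, emit_count = Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            tag_count[tag] += 1
            emit_count[(tag, word)] += 1
            if prev is not None:
                bigram_count[(prev, tag)] += 1
            prev = tag
    # As on the slide, the denominator is the total count of the conditioning tag.
    trans_p = {(a, b): c / tag_count[a] for (a, b), c in bigram_count.items()}
    emit_p = {(t, w): c / tag_count[t] for (t, w), c in emit_count.items()}
    return trans_p, emit_p


# Hypothetical two-sentence corpus of (word, tag) pairs.
corpus = [
    [("the", "DT"), ("race", "NN")],
    [("the", "DT"), ("old", "JJ"), ("race", "NN")],
]
trans_p, emit_p = mle_estimates(corpus)

# The counts quoted on the slide.
p_nn_given_dt = 56509 / 116454
p_is_given_vbz = 10073 / 21627
```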
14. Example
- Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
- People/NNS continue/VBP to/TO inquire/VB the/AT reason/NN for/IN the/AT race/NN for/IN outer/JJ space/NN
- TO: to+VB (to sleep), to+NN (to school)
15. Example
Two candidate tag sequences for "Secretariat is expected to race tomorrow":
- Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
- Secretariat/NNP is/VBZ expected/VBN to/TO race/NN tomorrow/NR
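16. Example (cont'd)
- $P(\mathrm{NN} \mid \mathrm{TO}) = .00047$
- $P(\mathrm{VB} \mid \mathrm{TO}) = .83$
- $P(\mathit{race} \mid \mathrm{NN}) = .00057$
- $P(\mathit{race} \mid \mathrm{VB}) = .00012$
- $P(\mathrm{NR} \mid \mathrm{VB}) = .0027$
- $P(\mathrm{NR} \mid \mathrm{NN}) = .0012$
- $P(\mathrm{VB} \mid \mathrm{TO})\,P(\mathrm{NR} \mid \mathrm{VB})\,P(\mathit{race} \mid \mathrm{VB}) = .00000027$
- $P(\mathrm{NN} \mid \mathrm{TO})\,P(\mathrm{NR} \mid \mathrm{NN})\,P(\mathit{race} \mid \mathrm{NN}) = .00000000032$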
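The two products can be checked with a few lines of code. The dictionary layout and the function name are assumptions made for this sketch; the probabilities are the ones listed on this slide.

```python
# Probabilities from the slide, keyed as (previous tag, tag) and (word, tag).
TRANSITION = {
    ("TO", "VB"): 0.83,
    ("TO", "NN"): 0.00047,
    ("VB", "NR"): 0.0027,
    ("NN", "NR"): 0.0012,
}
EMISSION = {("race", "VB"): 0.00012, ("race", "NN"): 0.00057}


def race_score(tag):
    """Score of tagging 'race' as `tag` between to/TO and tomorrow/NR."""
    return TRANSITION[("TO", tag)] * TRANSITION[(tag, "NR")] * EMISSION[("race", tag)]


vb_score = race_score("VB")
nn_score = race_score("NN")
best_tag = max(["VB", "NN"], key=race_score)
```

Only the factors that differ between the two candidate sequences are multiplied. The shared factors for "Secretariat is expected" cancel when the two readings are compared, and the VB reading comes out several hundred times more probable.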
17. Decoding
- Finding what sequence of states is the source of a sequence of observations
- Viterbi decoding (dynamic programming): finding the optimal sequence of tags
- Input: HMM and sequence of words; output: sequence of states
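A minimal sketch of Viterbi decoding for a bigram HMM tagger, working in log space. The function signature and table layout are assumptions. The toy model reuses the transition and emission values from the "race" example, while the start probability and the emissions for "to" and "tomorrow" are placeholders set to 1.0 purely for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    def trans(prev, tag):
        return logp(trans_p.get(prev, {}).get(tag, 0.0))

    def emit(tag, word):
        return logp(emit_p.get(tag, {}).get(word, 0.0))

    # best[i][t]: best log-probability of a tag sequence for words[:i+1] ending in t.
    best = [{t: logp(start_p.get(t, 0.0)) + emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + trans(p, t))
            best[i][t] = best[i - 1][prev] + trans(prev, t) + emit(t, words[i])
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


TAGS = ["TO", "VB", "NN", "NR"]
START = {"TO": 1.0}  # placeholder value, not from the lecture
TRANS = {"TO": {"VB": 0.83, "NN": 0.00047}, "VB": {"NR": 0.0027}, "NN": {"NR": 0.0012}}
EMIT = {
    "TO": {"to": 1.0},        # placeholder value
    "NR": {"tomorrow": 1.0},  # placeholder value
    "VB": {"race": 0.00012},
    "NN": {"race": 0.00057},
}
decoded = viterbi(["to", "race", "tomorrow"], TAGS, START, TRANS, EMIT)
```

Each table cell depends only on the previous column, so the work is proportional to the sentence length times the square of the number of tags, rather than enumerating every tag sequence.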
18. Transformation-based learning
- $P(\mathrm{NN} \mid \mathit{race}) = .98$
- $P(\mathrm{VB} \mid \mathit{race}) = .02$
- Change NN to VB when the previous tag is TO
- Types of rules:
- The preceding (following) word is tagged z
- The word two before (after) is tagged z
- One of the two preceding (following) words is tagged z
- One of the three preceding (following) words is tagged z
- The preceding word is tagged z and the following word is tagged w
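A minimal sketch of applying one learned transformation of the first rule type ("the preceding word is tagged z"). The function name is an assumption, and the initial tags are taken to be each word's most likely tag, so "race" starts out as NN. Learning which rules to keep is not shown.

```python
def apply_prev_tag_rule(tags, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the preceding word is tagged prev_tag."""
    out = list(tags)
    for i in range(1, len(out)):
        # Assumption: the rule is applied left to right and sees earlier changes.
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out


# "Secretariat is expected to race tomorrow", with race initially tagged NN.
initial = ["NNP", "VBZ", "VBN", "TO", "NN", "NR"]
corrected = apply_prev_tag_rule(initial, "NN", "VB", "TO")

# "the race for": no TO precedes race, so the NN tag is left alone.
unchanged = apply_prev_tag_rule(["AT", "NN", "IN"], "NN", "VB", "TO")
```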