Title: Fall 2005
EECS 595 / LING 541 / SI 661/761
Natural Language Processing
- Fall 2005
- Lecture Notes 2
Course logistics
- Instructor: Prof. Dragomir Radev (radev_at_umich.edu), Ph.D., Computer Science, Columbia University; formerly at IBM TJ Watson Research Center
- Times: Thursdays 2:40-5:25 PM, in 411 West Hall
- Office hours: TBA, 3080 West Hall Connector
- Course home page: http://www.si.umich.edu/radev/NLP-fall2005
Linguistic Fundamentals
Syntactic categories
Nathalie likes {black, Persian, tabby, small} cats.
- Open (lexical) and closed (functional) categories
- Open: no-fly-zone, yadda yadda yadda
- Closed: the, in
Morphology
The dog chased the yellow bird.
- Parts of speech: eight (or so) general types
- Inflection (number, person, tense)
- Derivation (adjective-adverb, noun-verb)
- Compounding (separate words or single word)
- Part-of-speech tagging
- Morphological analysis (prefix, root, suffix, ending)
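The prefix/root/suffix/ending split above can be illustrated with a deliberately naive affix stripper. This is a minimal sketch that assumes small, hypothetical affix lists (`PREFIXES`, `DERIVATIONAL_SUFFIXES`, `INFLECTIONAL_ENDINGS`). It strips at most one affix of each kind and does not check the remaining root against a dictionary. A real analyzer would do that check, so the sketch mis-splits words such as "reading".

```python
# Hypothetical affix inventories, far smaller than a real lexicon would need.
PREFIXES = ("un", "re", "dis")
DERIVATIONAL_SUFFIXES = ("ness", "ment", "ful")
INFLECTIONAL_ENDINGS = ("ing", "ed", "s")


def analyze(word, min_root=3):
    """Naively split a word into prefix, root, suffix, and ending.

    Strips at most one affix of each kind, outermost first (prefix, then the
    inflectional ending, then a derivational suffix), and never leaves a root
    shorter than min_root characters.
    """
    prefix = suffix = ending = ""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_root:
            prefix, word = p, word[len(p):]
            break
    for e in INFLECTIONAL_ENDINGS:
        if e == "s" and word.endswith("ss"):
            continue  # e.g. "-ness", "glass": the final s is not a plural ending
        if word.endswith(e) and len(word) - len(e) >= min_root:
            ending, word = e, word[: -len(e)]
            break
    for s in DERIVATIONAL_SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_root:
            suffix, word = s, word[: -len(s)]
            break
    return {"prefix": prefix, "root": word, "suffix": suffix, "ending": ending}


print(analyze("replacements"))
# {'prefix': 're', 'root': 'place', 'suffix': 'ment', 'ending': 's'}
```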
Part of speech tags
From Church (1991): 79 tags
- NN: singular noun
- IN: preposition
- AT: article
- NP: proper noun
- JJ: adjective
- , : comma
- NNS: plural noun
- CC: conjunction
- RB: adverb
- VB: uninflected verb
- VBN: verb +en (taken, looked; passive, perfect)
- VBD: verb +ed (took, looked; past tense)
- CS: subordinating conjunction
Jabberwocky (Lewis Carroll)
'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!"
Nouns
- Nouns: dog, tree, computer, idea
- Nouns vary in number (singular, plural), gender (masculine, feminine, neuter), and case (nominative, genitive, accusative, dative)
- Latin: filius (m), filia (f), filium (object)
- German: Mädchen ("girl", grammatically neuter)
- Clitics ('s)
Pronouns
- Pronouns: she, ourselves, mine
- Pronouns vary in person, gender, number, and case (in English: nominative, accusative, possessive, 2nd possessive, reflexive)
Mary saw her in the mirror. Mary saw herself in the mirror.
- Anaphors: herself, each other
Determiners and adjectives
- Articles: the, a
- Demonstratives: this, that
- Adjectives describe properties
- Attributive and predicative adjectives
- Agreement in gender, number
- Comparative and superlative (derivative and periphrastic)
- Positive form
Verbs
- Actions, activities, and states (throw, walk, have)
- English: four verb forms
- tenses: present, past, future
- other inflection: number, person
- gerunds and infinitive
- aspect: progressive, perfective
- voice: active, passive
- participles, auxiliaries
- irregular verbs
- French and Finnish: many more inflections than English
Other parts of speech
- Adverbs, prepositions, particles
- phrasal verbs (the plane took off, take it off)
- particles vs. prepositions (she ran up a bill/hill)
- Coordinating conjunctions: and, or, but
- Subordinating conjunctions: if, because, that, although
- Interjections: Ouch!
Phrase structure
- Constraints on word order
- Constituents: NP, PP, VP, AP
- Phrase structure grammars
(S (NP (PN Spot)) (VP (V chased) (NP (Det a) (N bird))))
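A phrase-structure tree such as the one for "Spot chased a bird" can be held in a simple nested data structure. The sketch below uses nested tuples of the form (label, child, ...), with words as plain strings. The tuple encoding and the helper names `leaves` and `bracketed` are illustrative choices, not a standard API. The sketch recovers the sentence from the tree's fringe and prints the tree in labeled-bracket notation.

```python
# A parse tree as nested tuples: (label, child, child, ...); leaves are strings.
TREE = ("S",
        ("NP", ("PN", "Spot")),
        ("VP", ("V", "chased"),
               ("NP", ("Det", "a"), ("N", "bird"))))


def leaves(tree):
    """Return the words at the fringe of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def bracketed(tree):
    """Render the tree in labeled-bracket notation."""
    if isinstance(tree, str):
        return tree
    return "(" + tree[0] + " " + " ".join(bracketed(c) for c in tree[1:]) + ")"


print(" ".join(leaves(TREE)))  # Spot chased a bird
print(bracketed(TREE))
```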
Phrase structure
- Paradigmatic relationships (e.g., constituency)
- Syntagmatic relationships (e.g., collocations)
(S (NP That man) (VP (VBD caught) (NP the butterfly) (PP (IN with) (NP a net))))
Phrase-structure grammars
Peter gave Mary a book. Mary gave Peter a book.
- Constituent order (SVO, SOV)
- imperative forms
- sentences with auxiliary verbs
- interrogative sentences
- declarative sentences
- start symbol and rewrite rules
- context-free view of language
Sample phrase-structure grammar
S → NP VP
NP → AT NNS
NP → AT NN
NP → NP PP
VP → VP PP
VP → VBD
VP → VBD NP
PP → IN NP

AT → the
NNS → children | students | mountains
VBD → slept | ate | saw
IN → in | of
NN → cake
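To make the grammar above concrete, here is a minimal CKY-style recognizer for it. It is a sketch and assumes the rules exactly as listed, with the single unary rule VP → VBD handled by a closure step after each cell is filled. It only answers yes/no and does not build parse trees. A chart parser is used here because the left-recursive rules (NP → NP PP, VP → VP PP) would send a naive top-down parser into an infinite loop.

```python
from itertools import product

# Binary rules: (left child, right child) -> parent categories.
BINARY = {
    ("NP", "VP"): {"S"},
    ("AT", "NNS"): {"NP"},
    ("AT", "NN"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("VBD", "NP"): {"VP"},
    ("IN", "NP"): {"PP"},
}
UNARY = {"VBD": {"VP"}}  # child -> parents
LEXICON = {
    "the": {"AT"},
    "children": {"NNS"}, "students": {"NNS"}, "mountains": {"NNS"},
    "slept": {"VBD"}, "ate": {"VBD"}, "saw": {"VBD"},
    "in": {"IN"}, "of": {"IN"},
    "cake": {"NN"},
}


def unary_closure(labels):
    """Add every category reachable from `labels` through unary rules."""
    result, agenda = set(labels), list(labels)
    while agenda:
        for parent in UNARY.get(agenda.pop(), ()):
            if parent not in result:
                result.add(parent)
                agenda.append(parent)
    return result


def recognize(words, start="S"):
    """Return True if the word list can be derived from `start`."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = unary_closure(LEXICON.get(w, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            found = set()
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    found |= BINARY.get((left, right), set())
            chart[i][j] = unary_closure(found)
    return start in chart[0][n]


print(recognize("the children ate the cake".split()))  # True
print(recognize("children the slept".split()))         # False
```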
Phrase structure grammars
- Local dependencies
- Non-local dependencies
- Subject-verb agreement
The women who found the wallet were given a
reward.
Should Peter buy a book? Which book should Peter
buy?
Dependency: arguments and adjuncts
Sue watched the man at the next table.
- Event dependents (verb arguments are usually NPs)
- agent, patient, instrument, goal: semantic roles
- subject, direct object, indirect object
- transitive, intransitive, and ditransitive verbs
- active and passive voice
Subcategorization
- Arguments: subject, complements
- adjuncts vs. complements
- adjuncts are optional and describe time, place, manner
- subordinate clauses
- subcategorization frames
Subcategorization
- Subject: The children eat candy.
- Object: The children eat candy.
- Prepositional phrase: She put the book on the table.
- Predicative adjective: We made the man angry.
- Bare infinitive: She helped me walk.
- To-infinitive: She likes to walk.
- Participial phrase: She stopped singing that tune at the end.
- That-clause: She thinks that it will rain tomorrow.
- Question-form clauses: She asked me what book I was reading.
Subcategorization frames
- Intransitive verbs: The woman walked.
- Transitive verbs: John loves Mary.
- Ditransitive verbs: Mary gave Peter flowers.
- Intransitive with PP: I rent in Paddington.
- Transitive with PP: She put the book on the table.
- Sentential complement: I know that she likes you.
- Transitive with sentential complement: She told me that Gary is coming on Tuesday.
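Subcategorization frames like those above can be stored as a lexicon that maps each verb to the complement sequences it licenses. This is a minimal sketch with a hypothetical `FRAMES` table. It is built only from the example verbs on this slide and lists just the one frame each example illustrates, whereas real verbs usually allow several. Complements are abstracted to category labels (NP, PP, S) after the subject.

```python
# Hypothetical mini-lexicon built from the slide's examples: each verb maps to
# the complement sequences (after the subject) that it licenses.
FRAMES = {
    "walk": {()},            # intransitive
    "love": {("NP",)},       # transitive
    "give": {("NP", "NP")},  # ditransitive
    "rent": {("PP",)},       # intransitive with PP
    "put": {("NP", "PP")},   # transitive with PP
    "know": {("S",)},        # sentential complement
    "tell": {("NP", "S")},   # transitive with sentential complement
}


def licensed(verb, complements):
    """True if the verb's entry lists this sequence of complement categories."""
    return tuple(complements) in FRAMES.get(verb, set())


print(licensed("put", ["NP", "PP"]))  # True:  She put the book on the table.
print(licensed("put", ["NP"]))        # False: *She put the book.
```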
Selectional restrictions and preferences
- Subcategorization frames capture syntactic regularities about complements
- Selectional restrictions and preferences capture semantic regularities: bark prefers a dog-like subject, eat prefers food as its object
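Selectional restrictions can be sketched as semantic-class checks on a verb's arguments. The noun classes and restriction tables below (`NOUN_CLASS`, `SUBJECT_CLASSES`, `OBJECT_CLASSES`) are hypothetical toy values for bark and eat. A real system would draw the classes from a lexical resource and would usually treat them as graded preferences rather than hard yes/no constraints.

```python
# Hypothetical semantic classes and restrictions, for illustration only.
NOUN_CLASS = {"dog": "canine", "cat": "feline", "children": "human",
              "cake": "food", "ideas": "abstract"}
SUBJECT_CLASSES = {"bark": {"canine"}}
OBJECT_CLASSES = {"eat": {"food"}}


def plausible(verb, subject, obj=None):
    """True if the arguments satisfy the verb's (hypothetical) restrictions."""
    allowed = SUBJECT_CLASSES.get(verb)
    if allowed is not None and NOUN_CLASS.get(subject) not in allowed:
        return False
    allowed = OBJECT_CLASSES.get(verb)
    if obj is not None and allowed is not None and NOUN_CLASS.get(obj) not in allowed:
        return False
    return True


print(plausible("bark", "dog"))  # True
print(plausible("bark", "cat"))  # False: "The cat barked." is semantically odd
```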
Phrase structure ambiguity
- Grammars are used for generating and parsing sentences
- Parses
- Syntactic ambiguity: Our company is training workers.
- Attachment ambiguity: The children ate the cake with a spoon.
- High vs. low attachment
- Garden path sentences: The horse raced past the barn fell. Is the book on the table red?
Ungrammaticality vs. semantic abnormality
- Ungrammatical: Slept children the.
- Grammatical but semantically abnormal: Colorless green ideas sleep furiously. The cat barked.
Semantics and pragmatics
- Lexical semantics and compositional semantics
- Hypernyms, hyponyms, antonyms, meronyms and holonyms (part-whole relationship: tire is a meronym of car), synonyms, homonyms
- Senses of words, polysemous words
- Homophony (bass)
- Collocations: white hair, white wine
- Idioms: to kick the bucket
Discourse analysis
1. Mary helped Peter get out of the car. He thanked her.
2. Mary helped the other passenger out of the car. The man had asked her for help because of his foot injury.
- Information extraction problems (entity cross-referencing)
Hurricane Hugo destroyed 20,000 Florida homes. At an estimated cost of one billion dollars, the disaster has been the most costly in the state's history.
Pragmatics
- The study of how knowledge about the world and language conventions interact with literal meaning
- Speech acts
- Research issues: resolution of anaphoric relations, modeling of speech acts in dialogues
Other areas of NLP
- Linguistics is traditionally divided into phonetics, phonology, morphology, syntax, semantics, and pragmatics.
- Sociolinguistics: interactions of social organization and language
- Historical linguistics: change over time
- Linguistic typology
- Language acquisition
- Psycholinguistics: real-time production and perception of language
Word classes and part-of-speech tagging
Part of speech tagging
- Problems: transport, object, discount, address
- More problems: content
- French: est, président, fils
- Book that flight: what is the part of speech associated with book?
- POS tagging: assigning parts of speech to words in a text
- Three main techniques: rule-based tagging, stochastic tagging, transformation-based tagging
Rule-based POS tagging
- Use a dictionary or FST to find all possible parts of speech
- Use disambiguation rules (e.g., an article (ART) cannot be followed by a verb (V))
- Typically hundreds of constraints can be designed manually
Example in French
Each word is followed by its candidate tags and the intended reading.
- <S>: beginning of sentence
- La (rf b nms u): article
- teneur (nfs nms): noun feminine singular
- Moyenne (jfs nfs v1s v2s v3s): adjective feminine singular
- en (p a b): preposition
- uranium (nms): noun masculine singular
- des (p r): preposition
- rivieres (nfp): noun feminine plural
- , (x): punctuation
- bien_que (cs): subordinating conjunction
- délicate (jfs): adjective feminine singular
- À (p): preposition
- calculer (v): verb
Sample rules
- BS3 BI1: A BS3 (3rd person subject personal pronoun) cannot be followed by a BI1 (1st person indirect personal pronoun). In the example "il nous faut" (we need), "il" has the tag BS3MS and "nous" has the tags BD1P BI1P BJ1P BR1P BS1P. The negative constraint "BS3 BI1" rules out "BI1P", and thus leaves only 4 alternatives for the word "nous".
- N K: The tag N (noun) cannot be followed by a tag K (interrogative pronoun); an example in the test corpus would be "... fleuve qui ..." (...river, that...). Since "qui" can be tagged either as E (relative pronoun) or as K (interrogative pronoun), the tagger chooses E, because an interrogative pronoun cannot follow a noun (N).
- R V: A word tagged with R (article) cannot be followed by a word tagged with V (verb), for example "l' appelle" (calls him/her). The word "appelle" can only be a verb, but "l'" can be either an article or a personal pronoun. Thus, the rule eliminates the article tag, giving preference to the pronoun.
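The three negative constraints above can be applied mechanically. This minimal sketch assumes that a constraint "A B" means "a tag beginning with A cannot be immediately followed by a tag beginning with B". A candidate tag is dropped only when it conflicts with every remaining tag of its neighbour, and a word's tag set is never emptied. The slide does not give the pronoun tag for "l'", so `BD3S` below is a hypothetical placeholder.

```python
# Negative bigram constraints from the slide, matched on tag prefixes.
CONSTRAINTS = [("BS3", "BI1"), ("N", "K"), ("R", "V")]


def violates(left_tag, right_tag):
    """True if some constraint forbids left_tag immediately before right_tag."""
    return any(left_tag.startswith(a) and right_tag.startswith(b)
               for a, b in CONSTRAINTS)


def disambiguate(candidates):
    """candidates: one set of possible tags per word, in sentence order.

    Repeatedly removes a tag that conflicts with *every* remaining tag of its
    neighbour, never emptying a set. Returns a new list of sets.
    """
    tags = [set(c) for c in candidates]
    changed = True
    while changed:
        changed = False
        for i in range(len(tags) - 1):
            left, right = tags[i], tags[i + 1]
            new_right = {rt for rt in right
                         if not all(violates(lt, rt) for lt in left)}
            new_left = {lt for lt in left
                        if not all(violates(lt, rt) for rt in right)}
            if new_right and new_right != right:
                tags[i + 1], changed = new_right, True
            if new_left and new_left != left:
                tags[i], changed = new_left, True
    return tags


# "il nous faut": BI1P is ruled out, leaving 4 tags for "nous".
print(sorted(disambiguate([{"BS3MS"}, {"BD1P", "BI1P", "BJ1P", "BR1P", "BS1P"}])[1]))
# "l' appelle": BD3S is a hypothetical pronoun tag; the article reading R is removed.
print(disambiguate([{"R", "BD3S"}, {"V"}])[0])  # {'BD3S'}
```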
Stochastic POS tagging
- HMM tagger
- Pick the most likely tag for this word
- $P(\mathrm{word} \mid \mathrm{tag}) \times P(\mathrm{tag} \mid \text{previous } n \text{ tags})$: find the tag sequence that maximizes this probability formula
- A bigram-based HMM tagger chooses the tag $t_i$ for word $w_i$ that is most probable given the previous tag $t_{i-1}$ and the current word $w_i$: $t_i = \arg\max_j P(t_j \mid t_{i-1}, w_i)$
- $t_i = \arg\max_j P(t_j \mid t_{i-1}) \, P(w_i \mid t_j)$: the HMM equation for a single tag
Example
- Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/ADV
- People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
- $P(\mathrm{VB} \mid \mathrm{TO}) \, P(\mathrm{race} \mid \mathrm{VB})$
- $P(\mathrm{NN} \mid \mathrm{TO}) \, P(\mathrm{race} \mid \mathrm{NN})$
- TO: to+VB (to sleep), to+NN (to school)
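Example (cont'd)
- $P(\mathrm{NN} \mid \mathrm{TO}) = .021$
- $P(\mathrm{VB} \mid \mathrm{TO}) = .34$
- $P(\mathrm{race} \mid \mathrm{NN}) = .00041$
- $P(\mathrm{race} \mid \mathrm{VB}) = .00003$
- $P(\mathrm{VB} \mid \mathrm{TO}) \, P(\mathrm{race} \mid \mathrm{VB}) \approx .00001$
- $P(\mathrm{NN} \mid \mathrm{TO}) \, P(\mathrm{race} \mid \mathrm{NN}) \approx .0000086$
- The VB product is larger, so race after to is tagged VB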
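The arithmetic of this single-tag decision is easy to check directly. The sketch uses only the four probabilities from the slide. It scores each candidate tag for "race" after "to" with the bigram HMM product $P(t \mid \mathrm{TO}) \, P(\mathrm{race} \mid t)$ and picks the larger.

```python
# Probabilities from the slide.
p_tag = {("VB", "TO"): 0.34, ("NN", "TO"): 0.021}            # P(tag | previous tag)
p_word = {("race", "VB"): 0.00003, ("race", "NN"): 0.00041}  # P(word | tag)

# Score each candidate tag for "race" when the previous tag is TO.
scores = {t: p_tag[(t, "TO")] * p_word[("race", t)] for t in ("VB", "NN")}
best = max(scores, key=scores.get)

for t, s in scores.items():
    print(f"{t}: {s:.7f}")  # VB: 0.0000102, then NN: 0.0000086
print("chosen tag for 'race' after 'to':", best)  # VB
```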
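HMM Tagging
- $\hat{T} = \arg\max_T P(T \mid W)$, where $T = t_1, t_2, \ldots, t_n$
- By Bayes' rule: $P(T \mid W) = P(T) \, P(W \mid T) / P(W)$
- Thus we are attempting to choose the sequence of tags that maximizes the right-hand side of the equation
- $P(W)$ can be ignored, since it is the same for every candidate tag sequence
- $P(T) \, P(W \mid T) = \prod_{i=1}^{n} P(w_i \mid w_1 t_1 \ldots w_{i-1} t_{i-1} t_i) \, P(t_i \mid w_1 t_1 \ldots w_{i-1} t_{i-1})$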
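Under the usual bigram simplification ($P(w_i \mid t_i)$ and $P(t_i \mid t_{i-1})$), the argmax over whole tag sequences can be found with the Viterbi algorithm instead of by enumeration. This is a minimal sketch in log space with dictionary-based model tables. The function and parameter names are illustrative. In the usage example, P(VB|TO), P(NN|TO), P(race|VB) and P(race|NN) come from the example slides, while the start probability and P(to|TO) are hypothetical placeholders.

```python
import math


def logp(p):
    """Log-probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, tags, start, trans, emit):
    """Most probable tag sequence under a bigram HMM.

    start[t]         = P(t | sentence start)
    trans[(prev, t)] = P(t | prev)
    emit[(t, w)]     = P(w | t)
    Missing entries are treated as probability 0.
    """
    if not words:
        return []
    # best[t] = (log score of the best path ending in tag t, that path)
    best = {t: (logp(start.get(t, 0.0)) + logp(emit.get((t, words[0]), 0.0)), [t])
            for t in tags}
    for w in words[1:]:
        new_best = {}
        for t in tags:
            # Best predecessor for t: maximize score(prev) + log P(t | prev).
            candidates = [(s + logp(trans.get((prev, t), 0.0)), path)
                          for prev, (s, path) in best.items()]
            score, path = max(candidates, key=lambda c: c[0])
            new_best[t] = (score + logp(emit.get((t, w), 0.0)), path + [t])
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]


tags = ["TO", "VB", "NN"]
start = {"TO": 1.0}                                # hypothetical
trans = {("TO", "VB"): 0.34, ("TO", "NN"): 0.021}  # from the example slide
emit = {("TO", "to"): 1.0,                         # hypothetical
        ("VB", "race"): 0.00003, ("NN", "race"): 0.00041}
print(viterbi(["to", "race"], tags, start, trans, emit))  # ['TO', 'VB']
```

Working in log space avoids numeric underflow when the many small probabilities of a long sentence are multiplied together.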
Transformation-based learning
- $P(\mathrm{NN} \mid \mathrm{race}) = .98$
- $P(\mathrm{VB} \mid \mathrm{race}) = .02$
- Change NN to VB when the previous tag is TO
- Types of rules:
- The preceding (following) word is tagged z
- The word two before (after) is tagged z
- One of the two preceding (following) words is tagged z
- One of the three preceding (following) words is tagged z
- The preceding word is tagged z and the following word is tagged w
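One greedy step of transformation-based learning can be sketched for the first rule template, "change a to b when the preceding word is tagged z". The sketch assumes the initial tags come from each word's most likely tag, so "race" starts as NN given $P(\mathrm{NN} \mid \mathrm{race}) = .98$. It scores every instantiation of the template by net tags fixed against a gold tagging and returns the best one. The tiny training sequence is hypothetical, and a full learner would repeat this step and cover the other templates.

```python
from itertools import product


def apply_rule(tags, rule):
    """Rule (a, b, z): change tag a to b when the previous tag is z.

    Context is read from the input tags, so changes do not cascade.
    """
    a, b, z = rule
    return [b if i > 0 and t == a and tags[i - 1] == z else t
            for i, t in enumerate(tags)]


def num_correct(seq, gold):
    """Number of positions where seq agrees with the gold tags."""
    return sum(x == g for x, g in zip(seq, gold))


def best_rule(current, gold, tagset):
    """Pick the instantiation of the template with the largest net gain."""
    ordered = sorted(tagset)
    candidates = [(a, b, z) for a, b, z in product(ordered, repeat=3) if a != b]
    base = num_correct(current, gold)
    return max(candidates,
               key=lambda r: num_correct(apply_rule(current, r), gold) - base)


# Hypothetical training data: "to race the race" with most-likely initial tags.
current = ["TO", "NN", "DT", "NN"]
gold = ["TO", "VB", "DT", "NN"]
rule = best_rule(current, gold, {"TO", "NN", "VB", "DT"})
print(rule)                       # ('NN', 'VB', 'TO')
print(apply_rule(current, rule))  # ['TO', 'VB', 'DT', 'NN']
```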
Confusion matrix
Most confusing: NN vs. NNP vs. JJ; VBD vs. VBN vs. JJ
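The off-diagonal cells of a tagging confusion matrix can be collected with a counter keyed by (gold tag, predicted tag). The slide gives no counts, so the two tag sequences below are invented purely to show the data structure. Correct predictions are left out so that only the confusions remain.

```python
from collections import Counter


def confusions(gold, predicted):
    """Count (gold tag, predicted tag) pairs where the tagger was wrong."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


# Invented example sequences, for illustration only.
gold = ["NN", "NNP", "JJ", "VBD", "VBN", "NN"]
predicted = ["NN", "NN", "NN", "VBN", "VBD", "JJ"]

for (g, p), n in confusions(gold, predicted).most_common():
    print(f"gold {g:4} tagged as {p:4} {n}")
```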
Readings
- J&M Chapters 1, 2, 3, 8
- "What is Computational Linguistics" by Hans Uszkoreit: http://www.coli.uni-sb.de/hansu/what_is_cl.html
- Lecture notes 1
Readings
- J&M Chapters 3, 8
- Lecture notes 2