Title: Natural Language Processing
1 Natural Language Processing
- Machine Translation
- Predicate argument structures
- Syntactic parses
- Lexical Semantics
- Probabilistic Parsing
- Ambiguities in sentence interpretation
2 Machine Translation
- One of the first applications for computers
- bilingual dictionary word-word translation
- Good translation requires understanding!
- War and Peace, The Sound and The Fury?
- What can we do? Sublanguages.
- technical domains, static vocabulary
- Meteo in Canada, Caterpillar Tractor Manuals,
Botanical descriptions, Military Messages
3 Example translation
4 Translation Issues: Korean to English
- Word order
- Dropped arguments
- Lexical ambiguities
- Structure vs morphology
5 Common Thread
- Predicate-argument structure
- Basic constituents of the sentence and how they are related to each other
- Constituents
- John, Mary, the dog, pleasure, the store.
- Relations
- Loves, feeds, go, to, bring
6 Abstracting away from surface structure
7 Transfer lexicons
8 Machine Translation Lexical Choice: Word Sense Disambiguation
- Iraq lost the battle.
- Ilakuka centwey ciessta.
- Iraq battle lost.
- John lost his computer.
- John-i computer-lul ilepelyessta.
- John computer misplaced.
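The lexical choice above can be sketched as a lookup in a small transfer lexicon. This is only an illustration: the semantic class labels and the function name `choose_translation` are assumptions, and only the two Korean verb forms come from the examples above.

```python
# Hypothetical transfer lexicon: the target verb chosen for English "lose"
# depends on the semantic class of its object. Class labels are assumptions.
SEMANTIC_CLASS = {
    "battle": "competition",
    "computer": "physical-object",
}

TRANSFER = {
    ("lose", "competition"): "ciessta",           # "Iraq lost the battle."
    ("lose", "physical-object"): "ilepelyessta",  # "John lost his computer."
}


def choose_translation(verb, obj):
    """Return the target verb for (verb, object), or None if no entry applies."""
    obj_class = SEMANTIC_CLASS.get(obj)
    return TRANSFER.get((verb, obj_class))


print(choose_translation("lose", "battle"))
print(choose_translation("lose", "computer"))
```

An object whose class is unknown returns None, which is where a real system would need a default sense or more context.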
9 Natural Language Processing
- Syntax
- Grammars, parsers, parse trees, dependency structures
- Semantics
- Subcategorization frames, semantic classes, ontologies, formal semantics
- Pragmatics
- Pronouns, reference resolution, discourse models
10 Syntactic Categories
- Nouns, pronouns, proper nouns
- Verbs, intransitive verbs, transitive verbs, ditransitive verbs (subcategorization frames)
- Modifiers, Adjectives, Adverbs
- Prepositions
- Conjunctions
11 Syntactic Parsing
- The cat sat on the mat.
- Det Noun Verb Prep Det Noun
- Time flies like an arrow.
- Noun Verb Prep Det Noun
- Fruit flies like a banana.
- Noun Noun Verb Det Noun
12 Parses
The cat sat on the mat
[S [NP [Det the] [N cat]] [VP [V sat] [PP [Prep on] [NP [Det the] [N mat]]]]]
13 Parses
Time flies like an arrow.
[S [NP [N time]] [VP [V flies] [PP [Prep like] [NP [Det an] [N arrow]]]]]
14 Parses
Time flies like an arrow.
[S [NP [N time] [N flies]] [VP [V like] [NP [Det an] [N arrow]]]]
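15 Recursive transition nets for CFGs
[Figure: recursive transition networks. S network: states S1, S2, S3 with arcs NP and VP. NP network: states S4, S5, S6 with arcs det, adj, noun, pronoun, NP and pp.]
- s --> np, vp.
- np --> pronoun | noun | det, adj, noun | np, pp.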
16 Lexicon
noun(cat). noun(mat). det(the). det(a).
verb(sat). prep(on).
noun(flies). noun(time). noun(arrow).
det(an). verb(flies). verb(time).
prep(like).
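A lexicon like this already exposes the part-of-speech ambiguity behind "Time flies like an arrow." The sketch below is an assumed Python rendering of the facts above (the names `LEXICON`, `categories` and `tag_sequences` are hypothetical) that enumerates every tag sequence the lexicon allows.

```python
from itertools import product

# Same facts as the lexicon above, grouped by category.
LEXICON = {
    "noun": {"cat", "mat", "flies", "time", "arrow"},
    "det": {"the", "a", "an"},
    "verb": {"sat", "flies", "time"},
    "prep": {"on", "like"},
}


def categories(word):
    """All categories the lexicon lists for a word, in alphabetical order."""
    return sorted(cat for cat, words in LEXICON.items() if word.lower() in words)


def tag_sequences(sentence):
    """Every combination of per-word categories for the sentence."""
    words = sentence.rstrip(".").split()
    options = [categories(w) for w in words]
    return [list(seq) for seq in product(*options)]


for seq in tag_sequences("Time flies like an arrow."):
    print(seq)
```

Because "time" and "flies" are each listed as both noun and verb, four tag sequences come out. This lexicon has no verb entry for "like", so the "Fruit flies like a banana" style reading is not among them.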
17 Lexicon with Roots
noun(flies,fly). noun(time,time). noun(arrow,arrow). det(an,an). verb(flies,fly). verb(time,time). prep(like,like).
noun(cat,cat). noun(mat,mat). det(the,the). det(a,a). verb(sat,sit). prep(on,on).
18 Parses
The old can can hold the water.
[S [NP [det the] [adj old] [N can]] [VP [aux can] [V hold] [NP [det the] [N water]]]]
19 Structural ambiguities
- That factory can can tuna.
- That factory cans cans of tuna and salmon.
- Have the students in cse91 finish the exam in 212.
- Have the students in cse91 finished the exam in 212?
20 Lexicon: The old can can hold the water.
Noun(can,can) Noun(cans,can) Noun(water,water) Noun(hold,hold) Noun(holds,hold) Det(the,the)
Verb(hold,hold) Verb(holds,hold) Aux(can,can) Adj(old,old)
21 Simple Context Free Grammar in BNF notation
- S → NP VP
- NP → Pronoun | Noun | Det Adj Noun | NP PP
- PP → Prep NP
- V → Verb | Aux Verb
- VP → V | V NP | V NP NP | V NP PP | VP PP
22 Top-down parse in progress: The, old, can, can, hold, the, water
- S → NP VP
- NP → NP?
- NP → Pronoun?
- Pronoun? fail
- NP → Noun?
- Noun? fail
- NP → Det Adj Noun?
- Det? the
- Adj? old; Noun? can
- Succeed.
- Succeed.
- VP?
23 Top-down parse in progress: can, hold, the, water
- VP → V?
- V → Verb?
- Verb? fail
- V → Aux Verb?
- Aux? can
- Verb? hold
- succeed
- succeed
- fail: the, water left over
24 Top-down parse in progress: can, hold, the, water
- VP → V NP?
- V → Verb?
- Verb? fail
- V → Aux Verb?
- Aux? can
- Verb? hold
- NP → Pronoun?
- Pronoun? fail
- NP → Noun?
- Noun? fail
- NP → Det Adj Noun?
- Det? the
- Adj? fail
25 Lexicon
Verb(hold,hold) Verb(holds,hold) Aux(can,can) Adj(old,old) Adj( , )
Noun(can,can) Noun(cans,can) Noun(water,water) Noun(hold,hold) Noun(holds,hold) Det(the,the)
26 Top-down parse in progress: can, hold, the, water
- VP → V NP?
- V → Verb?
- Verb? fail
- V → Aux Verb?
- Aux? can
- Verb? hold
- NP → Pronoun?
- Pronoun? fail
- NP → Noun?
- Noun? fail
- NP → Det Adj Noun?
- Det? the
- Adj? (empty); Noun? water
- SUCCEED
- SUCCEED
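The trace above can be reproduced with a small backtracking recognizer. This is a sketch under stated assumptions, not the parser used in the lecture: the left-recursive rules NP → NP PP and VP → VP PP are left out because a naive top-down search would loop on them, the empty adjective entry Adj( , ) is modelled as an empty string, and all function names are hypothetical.

```python
# Grammar without the left-recursive alternatives (assumption, see lead-in).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["Noun"], ["Det", "Adj", "Noun"]],
    "V": [["Verb"], ["Aux", "Verb"]],
    "VP": [["V"], ["V", "NP"]],
}

# Lexicon from the slides; "" stands for the empty adjective entry Adj( , ).
LEXICON = {
    "Det": {"the"},
    "Adj": {"old", ""},
    "Noun": {"can", "cans", "water", "hold", "holds"},
    "Verb": {"hold", "holds"},
    "Aux": {"can"},
    "Pronoun": set(),
}


def parse(symbol, words, pos):
    """Yield every input position reachable after parsing `symbol` from `pos`."""
    if symbol in LEXICON:
        if "" in LEXICON[symbol]:
            yield pos  # empty entry matches without consuming a word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each alternative in order
        yield from parse_sequence(rhs, words, pos)


def parse_sequence(symbols, words, pos):
    """Yield end positions after parsing all `symbols` in order from `pos`."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], words, pos):
        yield from parse_sequence(symbols[1:], words, mid)


def recognize(sentence):
    """True if some derivation of S consumes the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return any(end == len(words) for end in parse("S", words, 0))


print(recognize("The old can can hold the water."))
```

As in the trace, VP → V succeeds on "can hold" but leaves "the water" unconsumed, so the search backs up and retries with VP → V NP.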
27 Lexicon
Verb(hold,hold) Verb(holds,hold) Aux(can,can) Adj(old,old) Adj( , )
Noun(can,can) Noun(cans,can) Noun(water,water) Noun(hold,hold) Noun(holds,hold) Det(the,the) Noun(old,old)
28 Top-down approach
- Start with goal of sentence
- S → NP VP
- S → Wh-word Aux NP VP
- Will try to find an NP 4 different ways before trying a parse where the verb comes first.
- What does this remind you of?
- search
- What would be better?
29 Bottom-up approach
- Start with words in sentence.
- What structures do they correspond to?
- Once a structure is built, it is kept on a CHART.
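A minimal sketch of the chart idea, assuming a chart indexed by word spans: every category found for words[i:j] is stored once in chart[i, j] and reused by larger spans. The rule NP → Det Noun is an added assumption so that "the water" parses without the empty adjective entry, and the names `RULES`, `covers` and `build_chart` are hypothetical.

```python
from collections import defaultdict

RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Pronoun",)), ("NP", ("Noun",)),
    ("NP", ("Det", "Adj", "Noun")), ("NP", ("NP", "PP")),
    ("NP", ("Det", "Noun")),  # assumption: not in the slide grammar
    ("PP", ("Prep", "NP")),
    ("V", ("Verb",)), ("V", ("Aux", "Verb")),
    ("VP", ("V",)), ("VP", ("V", "NP")), ("VP", ("V", "NP", "NP")),
    ("VP", ("V", "NP", "PP")), ("VP", ("VP", "PP")),
]

LEXICON = {
    "the": {"Det"}, "old": {"Adj"}, "can": {"Noun", "Aux"},
    "hold": {"Noun", "Verb"}, "water": {"Noun"},
}


def covers(rhs, i, j, chart):
    """True if the symbols of rhs can be laid end to end over words[i:j]."""
    if len(rhs) == 1:
        return rhs[0] in chart[i, j]
    return any(rhs[0] in chart[i, k] and covers(rhs[1:], k, j, chart)
               for k in range(i + 1, j))


def build_chart(words):
    """Fill chart[i, j] with every category that spans words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[i, i + 1] |= LEXICON.get(word, set())
    for length in range(1, n + 1):           # shortest spans first
        for i in range(n - length + 1):
            j = i + length
            changed = True
            while changed:  # repeat so unary rules (NP -> Noun, VP -> V) chain
                changed = False
                for lhs, rhs in RULES:
                    if lhs not in chart[i, j] and covers(rhs, i, j, chart):
                        chart[i, j].add(lhs)
                        changed = True
    return chart


WORDS = "the old can can hold the water".split()
chart = build_chart(WORDS)
print("S" in chart[0, len(WORDS)])
print(sorted(chart[3, 5]))  # categories found over "can hold"
```

The chart also ends up holding structures that no full parse uses, such as an S over "can hold", which is the cost of working bottom-up.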
30 Bottom-up parse in progress
det   adj    noun       aux         verb   det   noun
The   old    can        can         hold   the   water.
det   noun   aux/verb   noun/verb   noun   det   noun
33 Top-down vs. Bottom-up
Top-down:
- Helps with POS ambiguities: only considers relevant POS
- Rebuilds the same structure repeatedly
- Spends a lot of time on impossible parses
Bottom-up:
- Has to consider every POS
- Builds each structure once
- Spends a lot of time on useless structures
What would be better?
34 Hybrid approach
- Top-down with a chart
- Use look ahead and heuristics to pick most likely sentence type
- Use probabilities for POS tagging, PP attachments, etc.
35 Features
- C for Case, Subjective/Objective
- She visited her.
- P for Person agreement, (1st, 2nd, 3rd)
- I like him, You like him, He likes him.
- N for Number agreement, Subject/Verb
- He likes him, They like him.
- G for Gender agreement, Subject/Verb
- English, reflexive pronouns: He washed himself.
- Romance languages, det/noun
- T for Tense
- auxiliaries, sentential complements, etc.
- "will finished" is bad
36 Example Lexicon Entries Using Features: Case, Number, Gender, Person
pronoun(subj, sing, fem, third, she, she).
pronoun(obj, sing, fem, third, her, her).
pronoun(obj, Num, Gender, second, you, you).
pronoun(subj, sing, Gender, first, I, I).
noun(Case, plural, Gender, third, flies, fly).
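One way to read these entries: a capitalized argument such as Num or Gender is an unbound variable that matches anything, while a constant such as subj must match exactly. The sketch below is an assumed Python rendering in which None plays the role of the unbound variable; `unify`, `case_ok` and `check_svo` are hypothetical names.

```python
# (case, number, gender, person) per pronoun, taken from the entries above.
# None stands for an unbound variable such as Num or Gender.
PRONOUNS = {
    "she": ("subj", "sing", "fem", "third"),
    "her": ("obj", "sing", "fem", "third"),
    "you": ("obj", None, None, "second"),
    "i": ("subj", "sing", None, "first"),
}


def unify(value, required):
    """Two feature values are compatible if either is unbound or they are equal."""
    return value is None or required is None or value == required


def case_ok(word, required_case):
    """Check the Case feature of a pronoun against the position it fills."""
    case = PRONOUNS[word.lower()][0]
    return unify(case, required_case)


def check_svo(subject, obj):
    """Subject position requires subj case, object position requires obj case."""
    return case_ok(subject, "subj") and case_ok(obj, "obj")


print(check_svo("She", "her"))  # "She visited her."
print(check_svo("Her", "she"))  # "Her visited she."
```

The same `unify` test would apply to the Number, Gender and Person slots; only Case is checked here to keep the sketch short.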
37 Language to Logic: How do we get there?
- John went to the book store.
- ∃ John ∃ store1, go(John, store1)
- John bought a book.
- buy(John,book1)
- John gave the book to Mary.
- give(John,book1,Mary)
- Mary put the book on the table.
- put(Mary,book1,table1)
38 Lexical Semantics: Same event - different sentences
- John broke the window with a hammer.
- John broke the window with the crack.
- The hammer broke the window.
- The window broke.
39 Same event - different syntactic frames
- John broke the window with a hammer.
- SUBJ VERB OBJ MODIFIER
- John broke the window with the crack.
- SUBJ VERB OBJ MODIFIER
- The hammer broke the window.
- SUBJ VERB OBJ
- The window broke.
- SUBJ VERB
40 Semantics - predicate arguments
- break(AGENT, INSTRUMENT, PATIENT)
- John [AGENT] broke the window [PATIENT] with a hammer [INSTRUMENT].
- The hammer [INSTRUMENT] broke the window [PATIENT].
- The window [PATIENT] broke.
- Fillmore 68 - The case for case
41
- John [AGENT, SUBJ] broke the window [PATIENT, OBJ] with a hammer [INSTRUMENT, MODIFIER].
- The hammer [INSTRUMENT, SUBJ] broke the window [PATIENT, OBJ].
- The window [PATIENT, SUBJ] broke.
42 Constraint Satisfaction
- break (Agent: animate, Instrument: tool, Patient: physical-object)
- Agent: subj
- Instrument: subj, with-pp
- Patient: obj, subj
- ACL81, ACL85, ACL86, MT90, CUP90, AIJ93
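The frame for break can be turned into a simple check. The sketch below is a greedy illustration rather than a full constraint solver: each syntactic position is given the first unfilled role whose position list and semantic type both fit. The semantic classes assigned to John, hammer and window are assumptions, as is the function name `assign_roles`.

```python
# Assumed semantic classes for the nouns in the examples.
SEMANTIC_CLASSES = {
    "John": {"animate", "physical-object"},
    "hammer": {"tool", "physical-object"},
    "window": {"physical-object"},
}

# Frame for "break": required type and allowed positions per role.
BREAK = {
    "Agent": {"type": "animate", "positions": ["subj"]},
    "Instrument": {"type": "tool", "positions": ["subj", "with-pp"]},
    "Patient": {"type": "physical-object", "positions": ["obj", "subj"]},
}


def assign_roles(frame, fillers):
    """Map each position's noun to a role, or return None if some noun fits no role.

    `fillers` maps positions (subj, obj, with-pp) to nouns.
    """
    roles = {}
    for position, noun in fillers.items():
        for role, spec in frame.items():
            if role in roles:
                continue  # role already filled by an earlier position
            fits_position = position in spec["positions"]
            fits_type = spec["type"] in SEMANTIC_CLASSES.get(noun, set())
            if fits_position and fits_type:
                roles[role] = noun
                break
        else:
            return None  # no role accepts this noun in this position
    return roles


print(assign_roles(BREAK, {"subj": "John", "obj": "window", "with-pp": "hammer"}))
print(assign_roles(BREAK, {"subj": "hammer", "obj": "window"}))
print(assign_roles(BREAK, {"subj": "window"}))
print(assign_roles(BREAK, {"subj": "John", "obj": "window", "with-pp": "crack"}))
```

The last call returns None because "crack" has no tool class here, so the with-phrase cannot be an Instrument of break and has to be attached elsewhere, for example as a modifier of "the window".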
43 Syntax/semantics interaction
- Parsers will produce syntactically valid parses for semantically anomalous sentences
- Lexical semantics can be used to rule them out
44 Constraint Satisfaction
- give (Agent: animate, Patient: physical-object, Recipient: animate)
- Agent: subj
- Patient: object
- Recipient: indirect-object, to-pp
45 Subcategorization Frequencies
- The women kept the dogs on the beach.
- Where keep? Keep on beach 95%
- NP XP 81%
- Which dogs? Dogs on beach 5%
- NP 19%
- The women discussed the dogs on the beach.
- Where discuss? Discuss on beach 10%
- NP PP 24%
- Which dogs? Dogs on beach 90%
- NP 76%
Ford, Bresnan, Kaplan 82; Jurafsky 98; Roland and Jurafsky 99
46 Reading times
- NP-bias (slower times to bold word)
- The waiter confirmed the reservation was made yesterday.
- The defendant accepted the verdict would be decided soon.
47 Reading times
- S-bias (no slower times to bold word)
- The waiter insisted the reservation was made yesterday.
- The defendant wished the verdict would be decided soon.
Trueswell, Tanenhaus and Kello 93; Trueswell and Kim 98
48 Probabilistic Context Free Grammars
- Adding probabilities
- Lexicalizing the probabilities
49 Simple Context Free Grammar in BNF
- S → NP VP
- NP → Pronoun
- | Noun
- | Det Adj Noun
- | NP PP
- PP → Prep NP
- V → Verb
- | Aux Verb
- VP → V
- | V NP
- | V NP NP
- | V NP PP
- | VP PP
50 Simple Probabilistic CFG
- S → NP VP
- NP → Pronoun 0.10
- | Noun 0.20
- | Det Adj Noun 0.50
- | NP PP 0.20
- PP → Prep NP 1.00
- V → Verb 0.20
- | Aux Verb 0.20
- VP → V 0.10
- | V NP 0.40
- | V NP NP 0.10
- | V NP PP 0.20
- | VP PP 0.20
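With rule probabilities in place, the probability of a parse is the product of the probabilities of the rules it uses. The sketch below computes that product for one parse of "The old can can hold the water". Assumptions: S → NP VP is given probability 1.00 because the slide lists no value for it, lexical probabilities are ignored, "the water" uses the empty adjective entry, and the tree encoding and the name `tree_probability` are hypothetical.

```python
import math

RULE_PROB = {
    ("S", ("NP", "VP")): 1.00,  # assumption: no value is listed for S
    ("NP", ("Pronoun",)): 0.10,
    ("NP", ("Noun",)): 0.20,
    ("NP", ("Det", "Adj", "Noun")): 0.50,
    ("NP", ("NP", "PP")): 0.20,
    ("PP", ("Prep", "NP")): 1.00,
    ("V", ("Verb",)): 0.20,
    ("V", ("Aux", "Verb")): 0.20,
    ("VP", ("V",)): 0.10,
    ("VP", ("V", "NP")): 0.40,
    ("VP", ("V", "NP", "NP")): 0.10,
    ("VP", ("V", "NP", "PP")): 0.20,
    ("VP", ("VP", "PP")): 0.20,
}


def tree_probability(tree):
    """Product of rule probabilities; a leaf is (category, word-string)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0  # lexical probabilities are ignored in this sketch
    rhs = tuple(child[0] for child in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_probability(child)
    return p


TREE = ("S",
        ("NP", ("Det", "the"), ("Adj", "old"), ("Noun", "can")),
        ("VP",
         ("V", ("Aux", "can"), ("Verb", "hold")),
         ("NP", ("Det", "the"), ("Adj", ""), ("Noun", "water"))))

p = tree_probability(TREE)
print(math.isclose(p, 0.02))
```

The rules used are S → NP VP, NP → Det Adj Noun (twice), VP → V NP and V → Aux Verb, so the product is 1.00 × 0.50 × 0.40 × 0.20 × 0.50, about 0.02.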
51 Simple Probabilistic Lexicalized CFG
- S → NP VP
- NP → Pronoun 0.10
- | Noun 0.20
- | Det Adj Noun 0.50
- | NP PP 0.20
- PP → Prep NP 1.00
- V → Verb 0.20
- | Aux Verb 0.20
- VP → V 0.87 (sleep, cry, laugh)
- | V NP 0.03
- | V NP NP 0.00
- | V NP PP 0.00
- | VP PP 0.10
52 Simple Probabilistic Lexicalized CFG
- VP → V 0.30
- | V NP 0.60 (break, split, crack..)
- | V NP NP 0.00
- | V NP PP 0.00
- | VP PP 0.10
- VP → V 0.10
- | V NP 0.40
- | V NP NP 0.10
- | V NP PP 0.20
- | VP PP 0.20
- What about leave? leave1, leave2?
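Lexicalizing means the distribution over VP expansions depends on the head verb. A minimal sketch, assuming verbs are grouped into classes that share one distribution and that an unlisted verb backs off to the unlexicalized numbers; the class names and the back-off policy are assumptions, the probabilities are the ones listed above.

```python
VP_FRAMES = ["V", "V NP", "V NP NP", "V NP PP", "VP PP"]

# Probabilities per VP expansion, in the order of VP_FRAMES.
CLASS_PROBS = {
    "sleep-class": [0.87, 0.03, 0.00, 0.00, 0.10],  # sleep, cry, laugh
    "break-class": [0.30, 0.60, 0.00, 0.00, 0.10],  # break, split, crack
}
UNLEXICALIZED = [0.10, 0.40, 0.10, 0.20, 0.20]

VERB_CLASS = {  # hypothetical class labels
    "sleep": "sleep-class", "cry": "sleep-class", "laugh": "sleep-class",
    "break": "break-class", "split": "break-class", "crack": "break-class",
}


def frame_distribution(verb):
    """Distribution over VP expansions for a verb, backing off if unlisted."""
    probs = CLASS_PROBS.get(VERB_CLASS.get(verb), UNLEXICALIZED)
    return dict(zip(VP_FRAMES, probs))


def most_likely_frame(verb):
    """The VP expansion with the highest probability for this verb."""
    dist = frame_distribution(verb)
    return max(dist, key=dist.get)


for verb in ("sleep", "break", "leave"):
    print(verb, most_likely_frame(verb))
```

A verb like leave, which has both an intransitive and a transitive sense, falls through to the back-off distribution here; splitting it into leave1 and leave2 would give each sense its own row.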
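53 A TreeBanked Sentence
[Figure: Treebank parse tree of a sentence beginning "Analysts"; surviving node labels: S, NP-SBJ, VP, NP, embedded S with NP-SBJ (T-1) and VP ("would"), NP, PP-LOC]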
54 The same sentence, PropBanked
[Figure: PropBank annotation; predicate "have been expecting", Arg0 "Analysts", Arg1 (text not recovered)]
55 Headlines
- Police Begin Campaign To Run Down Jaywalkers
- Iraqi Head Seeks Arms
- Teacher Strikes Idle Kids
- Miners Refuse To Work After Death
- Juvenile Court To Try Shooting Defendant
56 Events
57 Context Sensitivity
- Programming languages are Context Free
- Natural languages are Context Sensitive?
- Movement
- Features
- respectively
- John, Mary and Bill ate peaches, pears and
apples, respectively.
58 The Chomsky Grammar Hierarchy
- Regular grammars, aabbbb
- S → aS | nil | bS
- Context free grammars, aaabbb
- S → aSb | nil
- Context sensitive grammars, aaabbbccc
- xSy → xby
- Turing Machines
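The three example languages can be told apart by how much bookkeeping a recognizer needs. The functions below are an illustrative sketch with hypothetical names: a regular expression for the regular grammar, a recursive check that mirrors S → aSb | nil, and a direct count comparison for strings of the aaabbbccc kind (equal numbers of a, b and c in that order), which is how the context-sensitive example is read here.

```python
import re


def is_regular_example(s):
    """S -> aS | bS | nil generates every string over {a, b}."""
    return re.fullmatch(r"[ab]*", s) is not None


def is_anbn(s):
    """S -> aSb | nil: peel off a matching outer a...b pair, then recurse."""
    if s == "":
        return True
    return s[0] == "a" and s[-1] == "b" and is_anbn(s[1:-1])


def is_anbncn(s):
    """Equal-length blocks of a, b and c in that order (assumed reading)."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


print(is_regular_example("aabbbb"))
print(is_anbn("aaabbb"), is_anbn("aabbbb"))
print(is_anbncn("aaabbbccc"), is_anbncn("aabbbccc"))
```

The regular check needs no memory of how many a's were seen, the context-free check needs one stack-like nesting, and the last check has to keep three counts in agreement.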
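59 Recursive transition nets for CFGs
[Figure: recursive transition networks. S network: states S1, S2, S3 with arcs NP and VP. NP network: states S4, S5, S6 with arcs det, adj, noun, pronoun, NP and pp.]
- s --> np, vp.
- np --> pronoun | noun | det, adj, noun | np, pp.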
60 Most parsers are Turing Machines
- To give a more natural and comprehensible treatment of movement
- For a more efficient treatment of features
- Not because of respectively: most parsers can't handle it.
61 Nested Dependencies and Crossing Dependencies
- CF: The dog chased the cat that bit the mouse that ran.
- CF: The mouse the cat the dog chased bit ran.
- CS: John, Mary and Bill ate peaches, pears and apples, respectively
62 Movement
- What did John give to Mary?
- Where did John give to Mary?
- John gave cookies to Mary.
- John gave to Mary.
63 Handling Movement: Hold registers/Slash Categories
- S --> Wh, S/NP
- S/NP --> VP
- S/NP --> NP VP/NP
- VP/NP --> Verb
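The hold-register idea can be sketched for the four Movement examples: a fronted wh-word is stored in a register, and a verb that is missing its object may take it from the register instead of from the input. This is a toy recognizer under strong assumptions: the tiny word lists, the category assigned to each wh-word and the fixed subject-verb-object-to-PP order are all hypothetical simplifications.

```python
WH_CATEGORY = {"what": "NP", "where": "PP"}  # category of the moved phrase
NAMES = {"john", "mary"}
NOUNS = {"cookies"}
VERBS = {"give", "gave"}


def accepts(sentence):
    """True if the sentence fits subject-verb-object-(to NP), using the hold register."""
    words = sentence.lower().rstrip(".?").split()
    hold = None
    pos = 0
    # S --> Wh, S/NP: a fronted wh-word goes into the hold register.
    if words and words[0] in WH_CATEGORY:
        hold = WH_CATEGORY[words[0]]
        pos = 1
        if pos < len(words) and words[pos] == "did":
            pos += 1
        else:
            return False
    # Subject NP.
    if pos >= len(words) or words[pos] not in NAMES:
        return False
    pos += 1
    # Verb.
    if pos >= len(words) or words[pos] not in VERBS:
        return False
    pos += 1
    # Direct object: read it from the input, or take an NP from the register.
    if pos < len(words) and words[pos] in NOUNS:
        pos += 1
    elif hold == "NP":
        hold = None
    else:
        return False
    # Optional recipient: "to" + name.
    if pos + 1 < len(words) and words[pos] == "to" and words[pos + 1] in NAMES:
        pos += 2
    # A held PP can be discharged as an optional modifier of the VP.
    if hold == "PP":
        hold = None
    # All input must be consumed and the register must be empty.
    return pos == len(words) and hold is None


for s in ("What did John give to Mary?", "Where did John give to Mary?",
          "John gave cookies to Mary.", "John gave to Mary."):
    print(s, accepts(s))
```

The register must be empty at the end, so a sentence with a fronted "what" and an overt object is rejected as well, since the filler never finds a gap.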