Title: Computational Linguistics
1Computational Linguistics
- Based on Dan Jurafsky's textbook, Speech and Language Processing, Ch. 13. Thanks to Dan Jurafsky and Jim Martin and James Pustejovsky for many of these slides!
2Outline for Grammar/Parsing Week
- Context-Free Grammars and Constituency
- Some common CFG phenomena for English
- Sentence-level constructions
- NP, PP, VP
- Coordination
- Subcategorization
- Top-down and Bottom-up Parsing
- Dynamic Programming Parsing
- Quick sketch of probabilistic parsing
3Review
- Parts-of-speech tagging: for each word in a given sentence, determine its part of speech
- Assigning parts of speech to all the words in a sentence
- In the context of the sentence
- POS: syntactic/morphological categories of a word
- Syntax: how words group together
- Morphology: surface word = stem + affixes
4Syntax
- Syntax
- Refers to the way words are arranged together, and the relationship between them.
- Distinction
- Prescriptive grammar: how people should speak or write
- Descriptive grammar: how people actually do
- Goal of syntax (grammar)
- model the unconscious knowledge of people about
how to use their native language
5Syntax
- Why should we care?
- Grammar checkers
- Question answering
- Information extraction
- Machine translation
6key ideas of syntax
- Constituency
- Recursive structure
- Part of a structure
- Subcategorization
- Variation of a structure
- Type of head word causing the variation
- Grammatical relations
- Relation of words in a structure
- Dependency relations
- Non-adjacent grammatical relations
7Context-Free Grammars (CFG)
- Capture constituency and ordering
- Ordering: rules governing the ordering of words, phrases, and clauses in the language
- E.g., noun phrases such as a high-class spot, a high-class spot like Mindy's
- Constituency: how words group into units and how
the various kinds of units behave - E.g., noun phrase in prepositional phrase and
verb phrase
8Constituency
- Examples of Noun phrases (NPs)
- Three parties from Brooklyn
- A high-class spot such as Mindy's
- The Broadway coppers
- They
- Harry the Horse
- The reason he comes into the Hot Box
- How do we know these form a constituent?
- Substitution principle
9Constituency (II)
- They can all appear before a verb
- Three parties from Brooklyn arrive
- A high-class spot such as Mindy's attracts
- The Broadway coppers love
- They sit
- But individual words can't always appear before verbs
- *from arrive
- *as attracts
- *the is
- *spot is
- Must be able to state generalizations like
- Noun phrases occur before verbs
10Constituency (III)
- Preposing and postposing
- On September 17th, I'd like to fly from Atlanta to Denver
- I'd like to fly on September 17th from Atlanta to Denver
- I'd like to fly from Atlanta to Denver on September 17th.
- But not
- On September, I'd like to fly 17th from Atlanta to Denver
- On I'd like to fly September 17th from Atlanta to Denver
- Express constituency formally
- Rewriting rules
- Grammars
- Context-free grammar
11CFG example
- S -> NP VP
- NP -> Det NOMINAL
- NOMINAL -> Noun
- VP -> Verb
- Det -> a
- Noun -> flight
- Verb -> left
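To make the rule notation concrete, here is a minimal sketch of this toy grammar as Python data, together with a naive top-down expander. The dictionary encoding and the function expand are our own illustration, not part of the slides.

    # A hedged sketch: the slide's grammar as a dict from non-terminal
    # to a list of right-hand sides (each a list of symbols).
    grammar = {
        "S":       [["NP", "VP"]],
        "NP":      [["Det", "NOMINAL"]],
        "NOMINAL": [["Noun"]],
        "VP":      [["Verb"]],
        "Det":     [["a"]],
        "Noun":    [["flight"]],
        "Verb":    [["left"]],
    }

    def expand(symbol):
        # Expand a symbol top-down using the first rule for each non-terminal;
        # anything without a rule of its own is treated as a terminal word.
        if symbol not in grammar:
            return [symbol]
        return [word for child in grammar[symbol][0] for word in expand(child)]

    print(" ".join(expand("S")))   # -> a flight left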
12CFGs set of rules
- S -> NP VP
- This says that there are units called S, NP, and
VP in this language - That an S consists of an NP followed immediately
by a VP - Doesn't say that that's the only kind of S
- Nor does it say that this is the only place that
NPs and VPs occur
13Generativity
- As with FSAs you can view these rules as either
analysis or synthesis machines - Generate strings in the language
- Reject strings not in the language
- Impose structures (trees) on strings in the
language - How can we define grammatical vs. ungrammatical
sentences?
14Derivations
- A derivation is a sequence of rules applied to a
string that accounts for that string - Covers all the elements in the string
- Covers only the elements in the string
15Derivations as Trees
16CFGs more formally
- A context-free grammar has 4 parameters
- A set of non-terminal symbols (variables) N
- A set of terminal symbols Σ (disjoint from N)
- A set of productions P, each of the form
- A -> α
- Where A is a non-terminal and α is a string of symbols from the infinite set of strings (Σ ∪ N)*
- A start symbol S
17Defining a CF language via derivation
- A string A derives a string B if
- A can be rewritten as B via some series of rule
applications - More formally
- If A -> β is a production of P
- and α and γ are any strings in the set (Σ ∪ N)*
- Then we say that
- αAγ directly derives αβγ, written αAγ ⇒ αβγ
- Derivation is a generalization of direct derivation
- Let α1, α2, ..., αm be strings in (Σ ∪ N)*, m > 1, s.t.
- α1 ⇒ α2, α2 ⇒ α3, ..., αm-1 ⇒ αm
- We say that α1 derives αm, written α1 ⇒* αm
- We then formally define the language LG generated by grammar G
- as the set of strings composed of terminal symbols derived from S
- LG = { w | w is in Σ* and S ⇒* w }
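As a concrete illustration of the definitions above, using the toy grammar from the earlier CFG example slide, one leftmost derivation of a flight left is

S ⇒ NP VP ⇒ Det NOMINAL VP ⇒ a NOMINAL VP ⇒ a Noun VP ⇒ a flight VP ⇒ a flight Verb ⇒ a flight left

so S ⇒* a flight left, and the string is in the language generated by that grammar.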
18Parsing
- Parsing is the process of taking a string and a
grammar and returning one or many parse trees for
the input sentence
19Context?
- The notion of context in CFGs
- LHS symbol of a rule determines the RHS by itself
(free of context) - Other parts have no influence on the derivation
- Nom -> N N means that I can rewrite a Nom as N N
regardless of the context around the Nom
20Key Constituents of English
- Key Constituents
- Sentences
- Noun phrases
- Verb phrases
- Prepositional phrases
- Similarity exists among languages
21Sentence-Types
- Declaratives: A plane left
- S -> NP VP
- Imperatives: Leave!
- S -> VP
- Yes-No Questions: Did the plane leave?
- S -> Aux NP VP
- WH Questions: When did the plane leave?
- S -> WH Aux NP VP
22NPs
- NP -> Pronoun
- I came, you saw it, they conquered
- NP -> Proper-Noun
- Los Angeles is west of Texas
- John Hennessy is the president of Stanford
- NP -> Det Noun
- The president
- NP -> Nominal
- Nominal -> Noun Noun
- A morning flight to Denver
23PPs
- PP -> Preposition NP
- From LA
- To the store
- On Tuesday morning
- With lunch
24Recursion
- We'll have to deal with rules such as the
following where the non-terminal on the left also
appears somewhere on the right (directly)
- NP -> NP PP: [The flight] [to Boston]
- VP -> VP PP: [departed Miami] [at noon]
25Recursion
- Of course, this is what makes syntax interesting
- Flights from Denver
- Flights from Denver to Miami
- Flights from Denver to Miami in February
- Flights from Denver to Miami in February on a
Friday
- Flights from Denver to Miami in February on a Friday under $300
- Flights from Denver to Miami in February on a Friday under $300 with lunch
26Recursion
- Flights from Denver
- Flights from Denver to Miami
- Flights from Denver to Miami in
February - Flights from Denver to Miami in
February on a Friday - Etc.
- NP -> NP PP
27Implications of recursion and context-freeness
- If you have a rule like
- VP -> V NP
- The important thing: the thing after the verb is an NP
- Unimportant: the internal affairs of that NP
28The point
- VP -> V NP
- (I) hate
- flights from Denver
- flights from Denver to Miami
- flights from Denver to Miami in February
- flights from Denver to Miami in February on a
Friday
- flights from Denver to Miami in February on a Friday under $300
- flights from Denver to Miami in February on a Friday under $300 with lunch
29Bracketed Notation
- [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
30Coordination Constructions
- S -> S and S
- John went to NY and Mary followed him
- NP -> NP and NP
- VP -> VP and VP
- In fact the right rule for English is
- X -> X and X (Metarule)
- However we can say
- He was longwinded and a bully.
31Problems
- Agreement
- Subcategorization
- Movement (for want of a better term)
32Agreement
- This dog
- Those dogs
- This dog eats
- Those dogs eat
- *This dogs
- *Those dog
- *This dog eat
- *Those dogs eats
33Possible CFG Solution
- S -> NP VP
- NP -> Det Nominal
- VP -> V NP
- SgS -> SgNP SgVP
- PlS -> PlNP PlVP
- SgNP -> SgDet SgNom
- PlNP -> PlDet PlNom
- PlVP -> PlV NP
- SgVP -> SgV NP
34CFG Solution for Agreement
- It works and stays within the power of CFGs
- But it's ugly
- And it doesn't scale all that well
35Subcategorization
- Sneeze: John sneezed
- *John sneezed the book
- Say: You said [S United has a flight]
- Prefer: I prefer [TO-VP to leave earlier]
- *I prefer United has a flight
- Give: Give [NP me] [NP a cheaper fare]
- Help: Can you help [NP me] [PP with a flight]
- *Give with a flight
36Subcategorization
- Subcat expresses the constraints that a predicate
(verb for now) places on the number and syntactic
types of arguments it wants to take (occur with).
37So?
- So the various rules for VPs overgenerate
- They permit the presence of strings containing
verbs and arguments that don't go together
- For example
- VP -> V NP
- therefore
- "Sneezed the book" is a VP, since "sneeze" is a verb and "the book" is a valid NP
38Possible CFG Solution
- VP -> V
- VP -> V NP
- VP -> V NP PP
- VP -> IntransV
- VP -> TransV NP
- VP -> TransVwPP NP PP
39Forward Pointer
- It turns out that verb subcategorization facts
will provide a key element for semantic analysis
(determining who did what to whom in an event).
40Movement
- Core example
- My travel agent booked the flight
- [S [NP My travel agent] [VP booked [NP the flight]]]
- i.e., book is a straightforward transitive verb.
It expects a single NP arg within the VP as an
argument, and a single NP arg as the subject.
41Movement
- What about?
- Which flight do you want me to have the travel
agent book? - The direct object argument to book isnt
appearing in the right place. It is in fact a
long way from where its supposed to appear. - And note that its separated from its verb by 2
other verbs.
42CFGs a summary
- CFGs appear to be just about what we need to
account for a lot of basic syntactic structure in
English. - But there are problems
- That can be dealt with adequately, although not
elegantly, by staying within the CFG framework. - There are simpler, more elegant, solutions that
take us out of the CFG framework (beyond its
formal power). - Syntactic theories HPSG, LFG, CCG, Minimalism,
etc.
43Other syntactic stuff
- Grammatical relations
- Subject
- I booked a flight to New York
- The flight was booked by my agent
- Object
- I booked a flight to New York
- Complement
- I said that I wanted to leave
44Dependency parsing
- Word to word links instead of constituency
- Based on the European rather than American
traditions - But dates back to the Greeks
- The original notions of Subject, Object and the
progenitor of subcategorization (called
valence) came out of Dependency theory. - Dependency parsing is quite popular as a
computational model - since relationships between words are quite
useful
45Dependency parsing
Parse tree: nesting of multi-word constituents
Typed dependency parse: grammatical relations between individual words
46Why are dependency parses useful?
- Example multi-document summarization
- Need to identify sentences from different
documents that each say roughly the same thing
- phrase structure trees of paraphrasing sentences
which differ in word order can be significantly
different - but dependency representations will be very
similar
47Parsing
- Parsing: assigning correct trees to input strings
- Correct tree: a tree that covers all and only the elements of the input and has an S at the top
- For now: enumerate all possible trees
- A further task: disambiguation means choosing
the correct tree from among all the possible
trees.
48Treebanks
- Parsed corpora in the form of trees
- The Penn Treebank
- The Brown corpus
- The WSJ corpus
- Tgrep
- http://www.ldc.upenn.edu/ldc/online/treebank/
- Tregex
- http://www-nlp.stanford.edu/nlp/javadoc/javanlp/
49Parsing involves search
- As with everything of interest, parsing involves
a search which involves the making of choices - Well start with some basic (meaning bad) methods
before moving on to the one or two that you need
to know
50For Now
- Assume
- You have all the words already in some buffer
- The input isn't POS tagged
- We won't worry about morphological analysis
- All the words are known
51Top-Down Parsing
- Since we're trying to find trees rooted with an S (Sentences), start with the rules that give us an
S. - Then work your way down from there to the words.
52Top Down Space
53Bottom-Up Parsing
- Of course, we also want trees that cover the
input words. So start with trees that link up
with the words in the right way. - Then work your way up from there.
54Bottom-Up Space
55Control
- Of course, in both cases we left out how to keep
track of the search space and how to make choices - Which node to try to expand next
- Which grammar rule to use to expand a node
56Top-Down, Depth-First, Left-to-Right Search
57Example
58Example
59Example
60Top-Down and Bottom-Up
- Top-down
- Only searches for trees that can be answers (i.e.
Ss) - But also suggests trees that are not consistent
with the words - Bottom-up
- Only forms trees consistent with the words
- Suggests trees that make no sense globally
61So Combine Them
- There are a million ways to combine top-down
expectations with bottom-up data to get more
efficient searches - Most use one kind as the control and the other as
a filter - As in top-down parsing with bottom-up filtering
62Adding Bottom-Up Filtering
63Three problems with the TDDFLtR (top-down, depth-first, left-to-right) parser
- Left-Recursion
- Ambiguity
- Inefficient reparsing of subtrees
64Left-Recursion
- What happens in the following situation
- S -> NP VP
- S -> Aux NP VP
- NP -> NP PP
- NP -> Det Nominal
- With the sentence starting with
- Did the flight
65Lots of ambiguity
- VP -> VP PP
- NP -> NP PP
- Show me the meal on flight 286 from SF to Denver
- 14 parses!
66Lots of ambiguity
- Church and Patil (1982)
- The number of parses for such sentences grows at the rate of the number of parenthesizations of arithmetic expressions
- Which grow with the Catalan numbers
- PPs  Parses
- 1    2
- 2    5
- 3    14
- 4    42
- 5    132
- 6    429
- 7    1430
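As a quick sanity check on that growth, here is a small sketch using the standard closed form C(n) = C(2n, n) / (n + 1); reading k PPs as giving the (k+1)-th Catalan number of attachments is our interpretation of the table above.

    from math import comb

    def catalan(n):
        # n-th Catalan number: the number of binary bracketings of n + 1 items
        return comb(2 * n, n) // (n + 1)

    print([catalan(k + 1) for k in range(1, 8)])
    # [2, 5, 14, 42, 132, 429, 1430] -- the parse counts in the table above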
67Avoiding Repeated Work
- Parsing is hard, and slow. It's wasteful to redo
stuff over and over and over. - Consider an attempt to top-down parse the
following as an NP - A flight from Indianapolis to Houston on TWA
68(No Transcript)
69(No Transcript)
70(No Transcript)
71(No Transcript)
72We're done with Part I
- Part I
- Context-Free Grammars and Constituency
- Some common CFG features of English
- Baby parsers Top-down/Bottom-up Parsing
73- Part II
- CYK and Probabilistic parsers
74Grammars and Parsing
- Context-Free Grammars and Constituency
- Some common CFG phenomena for English
- Baby parsers Top-down and Bottom-up Parsing
- Today: real parsers (dynamic programming parsing)
- CKY
- Probabilistic parsing
- Optional section the Earley algorithm
75Dynamic Programming
- We need a method that fills a table with partial
results that - Does not do (avoidable) repeated work
- Does not fall prey to left-recursion
- Can find all the pieces of an exponential number
of trees in polynomial time. - Two popular methods
- CKY
- Earley
76The CKY (Cocke-Kasami-Younger) Algorithm
- Requires the grammar be in Chomsky Normal Form
(CNF) - All rules must be in following form
- A -> B C
- A -> w
- Any grammar can be converted automatically to
Chomsky Normal Form
77Converting to CNF
- Rules that mix terminals and non-terminals
- Introduce a new dummy non-terminal that covers
the terminal - INFVP -gt to VP replaced by
- INFVP -gt TO VP
- TO -gt to
- Rules that have a single non-terminal on right
(unit productions) - Rewrite each unit production with the RHS of
their expansions - Rules whose right hand side length gt2
- Introduce dummy non-terminals that spread the
right-hand side
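A minimal sketch of that last step; the dummy non-terminal names (S_X1, S_X2, ...) are invented here for illustration.

    def binarize(lhs, rhs):
        # Split a rule whose right-hand side is longer than 2 into a chain of
        # binary rules, introducing fresh dummy non-terminals along the way.
        rules, i = [], 0
        while len(rhs) > 2:
            i += 1
            new = f"{lhs}_X{i}"
            rules.append((new, rhs[:2]))
            rhs = [new] + rhs[2:]
        rules.append((lhs, rhs))
        return rules

    print(binarize("S", ["Aux", "NP", "VP"]))
    # [('S_X1', ['Aux', 'NP']), ('S', ['S_X1', 'VP'])]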
78Automatic Conversion to CNF
79Sample Grammar
80Back to CKY Parsing
- Given rules in CNF
- Consider the rule A -> B C
- If there is an A in the input then there must be a B followed by a C in the input.
- If the A goes from i to j in the input then there must be some k s.t. i < k < j
- I.e., the B splits from the C someplace.
81CKY
- So let's build a table so that an A spanning from i to j in the input is placed in cell [i, j] in the table.
- So a non-terminal spanning an entire string will sit in cell [0, n]
- If we build the table bottom-up we'll know that the parts of the A must go from i to k and from k to j
82CKY
- Meaning that for a rule like A -> B C we should look for a B in [i, k] and a C in [k, j].
- In other words, if we think there might be an A spanning [i, j] in the input AND
- A -> B C is a rule in the grammar THEN
- There must be a B in [i, k] and a C in [k, j] for some i < k < j
- So just loop over the possible k values
83CKY Table
- Filling the [i, j]th cell in the CKY table
84CKY Algorithm
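The algorithm itself appears on this slide only as a figure. As a stand-in, here is a hedged Python sketch of a CKY recognizer; the grammar encoding (a lexical map for A -> w rules and a map keyed by right-hand-side pairs for A -> B C rules) and the function name cky_recognize are our own assumptions, not the slide's pseudocode.

    def cky_recognize(words, lexical, binary, start="S"):
        # lexical: {word: {A, ...}}   for rules A -> w
        # binary:  {(B, C): {A, ...}} for rules A -> B C (grammar must be in CNF)
        n = len(words)
        # table[i][j] holds the set of non-terminals that span words[i:j]
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            table[i][i + 1] = set(lexical.get(w, set()))
        for span in range(2, n + 1):               # fill longer spans bottom-up
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):          # split point: B in [i,k], C in [k,j]
                    for B in table[i][k]:
                        for C in table[k][j]:
                            table[i][j] |= binary.get((B, C), set())
        return start in table[0][n]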
85Note
- We arranged the loops to fill the table a column
at a time, from left to right, bottom to top. - This assures us that whenever we're filling a
cell, the parts needed to fill it are already in
the table (to the left and below) - Are there other ways to fill the table?
86Indexed input string: 0 Book 1 the 2 flight 3 through 4 Houston 5
87CYK Example
- S -> NP VP
- VP -> V NP
- NP -> NP PP
- VP -> VP PP
- PP -> P NP
- NP -> John, Mary, Denver
- V -> called
- P -> from
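For concreteness, this grammar can be fed to the CKY recognizer sketched above; the dictionary encoding is our own.

    binary = {
        ("NP", "VP"): {"S"},
        ("V", "NP"): {"VP"},
        ("NP", "PP"): {"NP"},
        ("VP", "PP"): {"VP"},
        ("P", "NP"): {"PP"},
    }
    lexical = {
        "John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
        "called": {"V"}, "from": {"P"},
    }
    print(cky_recognize("John called Mary from Denver".split(), lexical, binary))
    # True; the sentence is ambiguous, as the next slides illustrate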
88Example
(Parse 1 of John called Mary from Denver, not drawn here: S -> NP VP, VP -> VP PP, VP -> V NP, PP -> P NP; the PP from Denver attaches to the VP.)
89Example
(Parse 2 of John called Mary from Denver, not drawn here: S -> NP VP, VP -> V NP, NP -> NP PP; the PP from Denver attaches to the NP Mary.)
90Example
91Example
92Example
93Example
94Example
95Example
96Example
97Example
98Example
99Example
100Example
101Example
102Example
103Example
104Back to Ambiguity
105Ambiguity
106Ambiguity
- No
- Both CKY and Earley will result in multiple S structures for the [0, n] table entry.
- They both efficiently store the sub-parts that are shared between multiple parses.
- But neither can tell us which one is right.
- Not a parser, a recognizer
- The presence of an S state with the right attributes in the right place indicates a successful recognition.
- But no parse tree means no parser
- That's how we solve (not) an exponential problem in polynomial time
107Converting CKY from Recognizer to Parser
- With the addition of a few pointers we have a
parser - Augment each new cell in chart to point to where
we came from.
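A hedged sketch of that augmentation: let each cell map a non-terminal to either the word it covers or a backpointer (k, B, C), so that a tree can be read back out recursively. The helper build_tree and the hand-filled mini table below are illustrations only, not the textbook's algorithm.

    def build_tree(table, i, j, A):
        # A cell entry is either a word (lexical) or a backpointer (k, B, C),
        # recording that A was built from a B in [i, k] and a C in [k, j].
        entry = table[i][j][A]
        if isinstance(entry, str):
            return (A, entry)
        k, B, C = entry
        return (A, build_tree(table, i, k, B), build_tree(table, k, j, C))

    # Hand-filled cells for "John called Mary" under the example grammar above:
    table = [[{} for _ in range(4)] for _ in range(4)]
    table[0][1]["NP"], table[1][2]["V"], table[2][3]["NP"] = "John", "called", "Mary"
    table[1][3]["VP"] = (2, "V", "NP")
    table[0][3]["S"] = (1, "NP", "VP")
    print(build_tree(table, 0, 3, "S"))
    # ('S', ('NP', 'John'), ('VP', ('V', 'called'), ('NP', 'Mary')))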
108Optional section the Earley alg
109Problem (minor)
- We said CKY requires the grammar to be binary
(i.e., in Chomsky Normal Form).
- We showed that any arbitrary CFG can be converted to Chomsky Normal Form, so that's not a huge deal
- Except when you change the grammar the trees come out wrong
- All things being equal we'd prefer to leave the
grammar alone.
110Earley Parsing
- Allows arbitrary CFGs
- Where CKY is bottom-up, Earley is top-down
- Fills a table in a single sweep over the input
words - Table is length N1 N is number of words
- Table entries represent
- Completed constituents and their locations
- In-progress constituents
- Predicted constituents
111States
- The table-entries are called states and are
represented with dotted rules.
- S -> • VP: a VP is predicted
- NP -> Det • Nominal: an NP is in progress
- VP -> V NP •: a VP has been found
112States/Locations
- It would be nice to know where these things are
in the input, so
- S -> • VP, [0,0]: a VP is predicted at the start of the sentence
- NP -> Det • Nominal, [1,2]: an NP is in progress; the Det goes from 1 to 2
- VP -> V NP •, [0,3]: a VP has been found starting at 0 and ending at 3
113Graphically
114Earley
- As with most dynamic programming approaches, the
answer is found by looking in the table in the
right place.
- In this case, there should be an S state in the final column that spans from 0 to n+1 and is complete.
- If that's the case you're done.
- S -> α •, [0, n+1]
115Earley Algorithm
- March through chart left-to-right.
- At each step, apply 1 of 3 operators
- Predictor
- Create new states representing top-down
expectations - Scanner
- Match word predictions (rule with word after dot)
to words - Completer
- When a state is complete, see what rules were
looking for that completed constituent
116Predictor
- Given a state
- With a non-terminal to right of dot
- That is not a part-of-speech category
- Create a new state for each expansion of the
non-terminal - Place these new states into same chart entry as
generated state, beginning and ending where
generating state ends. - So predictor looking at
- S -> • VP, [0,0]
- results in
- VP -> • Verb, [0,0]
- VP -> • Verb NP, [0,0]
117Scanner
- Given a state
- With a non-terminal to right of dot
- That is a part-of-speech category
- If the next word in the input matches this
part-of-speech - Create a new state with dot moved over the
non-terminal - So scanner looking at
- VP -> • Verb NP, [0,0]
- If the next word, book, can be a verb, add new state
- VP -> Verb • NP, [0,1]
- Add this state to chart entry following current
one - Note Earley algorithm uses top-down input to
disambiguate POS! Only POS predicted by some
state can get added to chart!
118Completer
- Applied to a state when its dot has reached right
end of rule. - Parser has discovered a category over some span
of input. - Find and advance all previous states that were
looking for this category - copy state, move dot, insert in current chart
entry - Given
- NP -> Det Nominal •, [1,3]
- VP -> Verb • NP, [0,1]
- Add
- VP -> Verb NP •, [0,3]
119Earley how do we know we are done?
- How do we know when we are done?
- Find an S state in the final column that spans from 0 to n+1 and is complete.
- If that's the case you're done.
- S -> α •, [0, n+1]
120Earley
- So sweep through the table from 0 to n+1
- New predicted states are created by starting
top-down from S - New incomplete states are created by advancing
existing states as new constituents are
discovered - New complete states are created in the same way.
121Earley
- More specifically
- Predict all the states you can upfront
- Read a word
- Extend states based on matches
- Add new predictions
- Go to 2
- Look at chart entry N+1 to see if you have a winner (see the sketch below)
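A compact recognizer sketch following the Predictor / Scanner / Completer description above; the grammar and POS-lookup encodings are our own assumptions, and epsilon (empty) rules are not handled.

    def earley_recognize(words, grammar, pos, start="S"):
        # grammar: {NT: [rhs-list, ...]}; rhs symbols are non-terminals or POS tags
        # pos:     {word: {POS, ...}} lexical lookup used by the Scanner
        n = len(words)
        chart = [set() for _ in range(n + 1)]        # states: (lhs, rhs, dot, origin)
        for rhs in grammar[start]:
            chart[0].add((start, tuple(rhs), 0, 0))
        for i in range(n + 1):
            agenda = list(chart[i])
            while agenda:
                lhs, rhs, dot, origin = agenda.pop()
                if dot < len(rhs):
                    nxt = rhs[dot]
                    if nxt in grammar:                                  # Predictor
                        for exp in grammar[nxt]:
                            s = (nxt, tuple(exp), 0, i)
                            if s not in chart[i]:
                                chart[i].add(s); agenda.append(s)
                    elif i < n and nxt in pos.get(words[i], set()):     # Scanner
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                                   # Completer
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            s = (l2, r2, d2 + 1, o2)
                            if s not in chart[i]:
                                chart[i].add(s); agenda.append(s)
        return any(l == start and d == len(r) and o == 0
                   for l, r, d, o in chart[n])

    g = {"S": [["NP", "VP"], ["VP"]],
         "NP": [["Det", "Noun"]],
         "VP": [["Verb"], ["Verb", "NP"]]}
    p = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}
    print(earley_recognize("book that flight".split(), g, p))   # True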
122Example
- Book that flight
- We should find an S from 0 to 3 that is a
completed state
123Example
124Example
125Example
126Details
- What kind of algorithms did we just describe
(both Earley and CKY)? - Not parsers: recognizers
- The presence of an S state with the right
attributes in the right place indicates a
successful recognition. - But no parse tree means no parser
- That's how we solve (not) an exponential problem
in polynomial time
127Back to Ambiguity
128Ambiguity
129Ambiguity
- No
- Both CKY and Earley will result in multiple S structures for the [0, n] table entry.
- They both efficiently store the sub-parts that are shared between multiple parses.
- But neither can tell us which one is right.
- Not a parser, a recognizer
- The presence of an S state with the right attributes in the right place indicates a successful recognition.
- But no parse tree means no parser
- That's how we solve (not) an exponential problem in polynomial time
130Converting Earley from Recognizer to Parser
- With the addition of a few pointers we have a
parser - Augment the Completer to point to where we came
from.
131Augmenting the chart with structural information
(Chart figure with backpointers, states S8 through S13; not reproduced.)
132Retrieving Parse Trees from Chart
- All the possible parses for an input are in the
table - We just need to read off all the backpointers
from every complete S in the last column of the
table - Find all the S -gt X . 0,N1
- Follow the structural traces from the Completer
- Of course, this won't be polynomial time, since
there could be an exponential number of trees - So we can at least represent ambiguity
efficiently
133How to do parse disambiguation
- Probabilistic methods
- Augment the grammar with probabilities
- Then modify the parser to keep only most probable
parses - And at the end, return the most probable parse
134Probabilistic CFGs
- The probabilistic model
- Assigning probabilities to parse trees
- Getting the probabilities for the model
- Parsing with probabilities
- Slight modification to dynamic programming
approach - Task is to find the max probability tree for an
input
135Probability Model
- Attach probabilities to grammar rules
- The expansions for a given non-terminal sum to 1
- VP -> Verb           .55
- VP -> Verb NP        .40
- VP -> Verb NP NP     .05
- Read this as P(specific rule | LHS)
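A tiny sketch of that reading, using the slide's VP numbers (the dictionary encoding is our own): each figure is P(rule | LHS), so the expansions of VP must sum to 1.

    # P(rule | LHS = VP), taken from the slide above
    vp_rules = {("Verb",): 0.55, ("Verb", "NP"): 0.40, ("Verb", "NP", "NP"): 0.05}
    assert abs(sum(vp_rules.values()) - 1.0) < 1e-9   # expansions of VP sum to 1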
136PCFG
137PCFG
138Probability Model (1)
- A derivation (tree) consists of the set of
grammar rules that are in the tree - The probability of a tree is just the product of
the probabilities of the rules in the derivation.
139Probability model
- P(T, S) = P(T) P(S|T) = P(T), since P(S|T) = 1
140Probability Model (1.1)
- The probability of a word sequence P(S) is the
probability of its tree in the unambiguous case. - It's the sum of the probabilities of the trees in
the ambiguous case.
141Getting the Probabilities
- From an annotated database (a treebank)
- So for example, to get the probability for a
particular VP rule just count all the times the
rule is used and divide by the number of VPs
overall.
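A minimal sketch of that relative-frequency estimate; the tiny rule multiset below is made up purely for illustration, not real treebank counts.

    from collections import Counter

    # Pretend these are all the rule occurrences read off a treebank's trees.
    treebank_rules = ([("VP", ("Verb", "NP"))] * 8
                      + [("VP", ("Verb",))] * 11
                      + [("VP", ("Verb", "NP", "NP"))] * 1)
    rule_counts = Counter(treebank_rules)
    lhs_counts = Counter(lhs for lhs, _ in treebank_rules)

    def mle(lhs, rhs):
        # P(lhs -> rhs | lhs) = count(lhs -> rhs) / count(lhs)
        return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

    print(mle("VP", ("Verb", "NP")))   # 8 / 20 = 0.4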
142TreeBanks
143Treebanks
144Treebanks
145Treebank Grammars
146Lots of flat rules
147Example sentences from those rules
- Total over 17,000 different grammar rules in the
1-million word Treebank corpus
148Probabilistic Grammar Assumptions
- We're assuming that there is a grammar to be used to parse with.
- We're assuming the existence of a large robust dictionary with parts of speech
- We're assuming the ability to parse (i.e., a parser)
- Given all that we can parse probabilistically
149Typical Approach
- Bottom-up (CKY) dynamic programming approach
- Assign probabilities to constituents as they are
completed and placed in the table - Use the max probability for each constituent
going up
150What's that last bullet mean?
- Say we're talking about a final part of a parse
- S -> NP VP, where the NP spans [0, i] and the VP spans [i, j]
- The probability of the S is
- P(S -> NP VP) * P(NP) * P(VP)
- P(NP) and P(VP) are already known, since we're doing bottom-up parsing
151Max
- I said the P(NP) is known.
- What if there are multiple NPs for the span of
text in question (0 to i)? - Take the max (where?)
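Stated as a recurrence (a sketch of the "take the max" step, in the same bottom-up table notation used for CKY above): for each non-terminal A and span [i, j], keep only the best way of building it,

best[i][j](A) = max over k and rules A -> B C of  P(A -> B C) * best[i][k](B) * best[k][j](C)

so the max is taken in the cell where the competing NPs (or other constituents) over the same span meet.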
152Problems with PCFGs
- The probability model we're using is just based on the rules in the derivation
- Doesn't use the words in any real way
- Doesn't take into account where in the derivation
a rule is used
153Solution
- Add lexical dependencies to the scheme
- Infiltrate the predilections of particular words
into the probabilities in the derivation - I.e. Condition the rule probabilities on the
actual words
154Heads
- To do that we're going to make use of the notion
of the head of a phrase - The head of an NP is its noun
- The head of a VP is its verb
- The head of a PP is its preposition
- (It's really more complicated than that, but this
will do.)
155Example (right)
Attribute grammar
156Example (wrong)
157How?
- We used to have
- VP -> V NP PP, with P(rule | VP)
- That's the count of this rule divided by the number of VPs in a treebank
- Now we have
- VP(dumped) -> V(dumped) NP(sacks) PP(in)
- P(r | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
- Not likely to have significant counts in any
treebank
158Declare Independence
- When stuck, exploit independence and collect the
statistics you can - Well focus on capturing two things
- Verb subcategorization
- Particular verbs have affinities for particular
VPs - Objects' affinities for their predicates (mostly
their mothers and grandmothers) - Some objects fit better with some predicates than
others
159Subcategorization
- Condition particular VP rules on their head so
- r: VP -> V NP PP, with P(r | VP)
- Becomes
- P(r | VP ∧ dumped)
- What's the count?
- How many times was this rule used with (head)
dump, divided by the number of VPs that dump
appears (as head) in total
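Written out as the count ratio the slide describes (a sketch; the notation VP(dumped) for "a VP whose head is dumped" follows the earlier lexicalized-rule slide):

P(VP -> V NP PP | VP, head = dumped) = Count(VP(dumped) -> V NP PP) / Count(VP(dumped))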
160Preferences
- Subcat captures the affinity between VP heads
(verbs) and the VP rules they go with. - What about the affinity between VP heads and the
heads of the other daughters of the VP - Back to our examples
161Example (right)
162Example (wrong)
163Preferences
- The issue here is the attachment of the PP. So
the affinities we care about are the ones between
dumped and into vs. sacks and into. - So count the places where dumped is the head of a
constituent that has a PP daughter with into as
its head and normalize - Vs. the situation where sacks is a constituent
with into as the head of a PP daughter.
164Preferences (2)
- Consider the VPs
- Ate spaghetti with gusto
- Ate spaghetti with marinara
- The affinity of gusto for eat is much larger than
its affinity for spaghetti - On the other hand, the affinity of marinara for
spaghetti is much higher than its affinity for
ate
165Preferences (2)
- Note the relationship here is more distant and
doesn't involve a headword, since gusto and
marinara aren't the heads of the PPs.
(Tree pair figure, not reproduced: in Ate spaghetti with gusto the PP(with) attaches to VP(ate); in Ate spaghetti with marinara the PP(with) attaches to NP(spag).)
166Summary
- Context-Free Grammars
- Parsing
- Top Down, Bottom Up Metaphors
- Dynamic Programming Parsers: CKY, Earley
- Disambiguation
- PCFG
- Probabilistic Augmentations to Parsers
- Treebanks