
1
Probabilistic and Lexicalized Parsing
  • CS 4705

2
Probabilistic CFGs (PCFGs)
  • Weighted CFGs
  • Attach weights to the rules of a CFG
  • Compute weights of derivations
  • Use weights to choose preferred parses
  • Utility: pruning and ordering the search space,
    disambiguation, language modeling for ASR
  • Parsing with weighted grammars: find the parse T
    that maximizes the weights of the derivations in
    the parse tree, over all possible parses of S
  • T(S) = argmax_{T ∈ τ(S)} W(T, S)
  • Probabilistic CFGs are one form of weighted CFG

3
Rule Probability
  • Attach probabilities to grammar rules
  • Expansions for a given non-terminal sum to 1
  • R1: VP → V            .55
  • R2: VP → V NP         .40
  • R3: VP → V NP NP      .05
  • Estimate probabilities from annotated corpora
  • E.g. Penn Treebank
  • P(R1) = count(R1) / count(VP)
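A minimal sketch of this relative-frequency estimate (the rule counts below are hypothetical, assumed to have already been extracted from a treebank):

from collections import Counter

# Hypothetical rule counts from a treebank; each key is (LHS, RHS) of one rule.
rule_counts = Counter({
    ("VP", ("V",)): 550,
    ("VP", ("V", "NP")): 400,
    ("VP", ("V", "NP", "NP")): 50,
})

# How often each non-terminal was expanded at all.
lhs_counts = Counter()
for (lhs, rhs), c in rule_counts.items():
    lhs_counts[lhs] += c

# P(rule) = count(rule) / count(LHS), as on the slide.
rule_prob = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
print(rule_prob[("VP", ("V",))])  # 0.55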

4
Derivation Probability
  • For a derivation T = R1 … Rn
  • Probability of the derivation:
  • Product of the probabilities of the rules expanded
    in the tree
  • Most likely parse = most probable derivation
  • Probability of a sentence:
  • Sum over all possible derivations for the
    sentence
  • Note the independence assumption: parse
    probability does not change based on where in the
    tree a rule is expanded.
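As a small worked illustration (all probabilities here other than R2's .40 are assumed for the example, not taken from the slides):

P(T) = ∏_{i=1..n} P(Ri)        P(S) = Σ_{T ∈ τ(S)} P(T)

A derivation that expands S → NP VP (say 1.0), VP → V NP (.40), and three lexical rules of probability .5 each has P(T) = 1.0 × .40 × .5 × .5 × .5 = .05; if the only other derivation of the same sentence has probability .02, then P(S) = .05 + .02 = .07.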

5
One Approach: the CYK Parser
  • Bottom-up parsing via dynamic programming
  • Assign probabilities to constituents as they are
    completed and placed in a table
  • Use the maximum probability for each constituent
    type going up the tree to S
  • The Intuition
  • We know the probabilities for constituents lower in
    the tree, so as we construct higher-level
    constituents we don't need to recompute them

6
CYK (Cocke-Younger-Kasami) Parser
  • Bottom-up parser with top-down filtering
  • Uses dynamic programming to store intermediate
    results (cf. Earley algorithm for top-down case)
  • Input: PCFG in Chomsky Normal Form
  • Rules of the form A → w or A → B C; no ε-productions
  • Chart: array [i, j, A] holding the probability that
    non-terminal A spans input positions i through j
  • Start state(s): (i, i+1, A) for each rule A → w_{i+1}
  • End state: (1, n, S) where n is the input size
  • Next-state rule: (i, k, B) and (k, j, C) yield
    (i, j, A) if A → B C
  • Maintain back-pointers to recover the parse
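A minimal sketch of the probabilistic CKY recurrence described above, assuming the grammar is given as dictionaries of lexical and binary rule probabilities (function and argument names are illustrative; spans here are 0-based and half-open rather than the 1-based indexing on the slide):

from collections import defaultdict

def pcky(words, lexical, binary, start="S"):
    # Probabilistic CKY over a CNF grammar.
    # lexical: {(A, w): prob} for rules A -> w
    # binary:  {(A, B, C): prob} for rules A -> B C
    # Returns (best probability of `start` over the whole input, back-pointers).
    n = len(words)
    table = defaultdict(float)  # (i, j, A) -> best probability that A spans words[i:j]
    back = {}                   # (i, j, A) -> back-pointer used to recover the parse

    # Base case: A -> w_i fills the width-1 cells.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                table[(i, i + 1, A)] = p
                back[(i, i + 1, A)] = w

    # Recursive case: combine (i,k,B) and (k,j,C) into (i,j,A) whenever A -> B C.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    cand = p * table[(i, k, B)] * table[(k, j, C)]
                    if cand > table[(i, j, A)]:
                        table[(i, j, A)] = cand
                        back[(i, j, A)] = (k, B, C)

    return table[(0, n, start)], back

With the grammar on the next slide (which is already in CNF) and probabilities attached to its rules, the back-pointers would let us rebuild the highest-probability parse of "John called Mary from Denver".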

7
Structural Ambiguity
  • S → NP VP
  • VP → V NP
  • NP → NP PP
  • VP → VP PP
  • PP → P NP
  • NP → John | Mary | Denver
  • V → called
  • P → from

John called Mary from Denver

[Parse tree for "John called Mary from Denver": S → NP VP, with the
PP "from Denver" attached inside the object NP "Mary from Denver"]
8
Example





John called Mary from Denver
9
Base Case: A → w

[CKY chart with the diagonal filled in: NP(John), V(called),
NP(Mary), P(from), NP(Denver)]
10
Recursive Case: A → B C

[CKY chart: the cell spanning "John called" is marked X (no
constituent can be built there)]
[Slides 11-22: successive snapshots of the CKY chart for "John
called Mary from Denver". New constituents are added span by span:
VP("called Mary"), PP("from Denver"), S("John called Mary"),
NP("Mary from Denver"), then two competing VPs over "called Mary
from Denver" (VP1 and VP2, one for each attachment of the PP), and
finally S spanning the whole input. Cells marked X admit no
constituent; the final chart keeps the higher-probability VP.]
23
Problems with PCFGs
  • Probability model is based only on the rules in the
    derivation
  • Lexical insensitivity
  • Doesn't use words in any real way
  • But structural disambiguation is lexically driven
  • PP attachment often depends on the verb, its
    object, and the preposition
  • I ate pickles with a fork.
  • I ate pickles with relish.
  • Context insensitivity of the derivation
  • Doesn't take into account where in the derivation
    a rule is used
  • Pronouns are more often subjects than objects
  • She hates Mary.
  • Mary hates her.
  • Solution: Lexicalization
  • Add lexical information to each rule
  • I.e., condition the rule probabilities on the
    actual words

24
An Example: Phrasal Heads
  • Phrasal heads can take the place of whole
    phrases, defining the most important characteristics
    of the phrase
  • Phrases are generally identified by their heads
  • The head of an NP is a noun, of a VP the main
    verb, of a PP the preposition
  • In a lexicalized CFG, each PCFG rule's LHS shares a
    lexical item (its head) with a non-terminal in its RHS

25
Increase in Size of Rule Set in Lexicalized CFG
  • If R is the number of binary branching rules in the
    CFG and Σ is the lexicon, the lexicalized rule set is
    O(2 · |Σ| · |R|)
  • For unary rules: O(|Σ| · |R|)
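To make the blow-up concrete (hypothetical numbers, not from the
slides): with a lexicon of |Σ| = 50,000 words and |R| = 100 binary
branching rules, 2 · |Σ| · |R| is already 2 × 50,000 × 100 =
10,000,000 potential lexicalized rules, far too many to estimate
reliably from a treebank.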

26
Example (correct parse)
Attribute grammar
27
Example (less preferred)
28
Computing Lexicalized Rule Probabilities
  • We started with rule probabilities as before
  • VP → V NP PP        P(rule | VP)
  • E.g., the count of this rule divided by the number of
    VPs in a treebank
  • Now we want lexicalized probabilities
  • VP(dumped) → V(dumped) NP(sacks) PP(into)
  • i.e., P(rule | VP, dumped is the verb, sacks is
    the head of the NP, into is the head of the PP)
  • Not likely to have significant counts in any
    treebank

29
Exploit the Data You Have
  • So, exploit the independence assumption and
    collect the statistics you can
  • Focus on capturing
  • Verb subcategorization
  • Particular verbs have affinities for particular
    VPs
  • Objects' affinity for their predicates
  • Mostly their mothers and grandmothers
  • Some objects fit better with some predicates than
    others

30
Verb Subcategorization
  • Condition particular VP rules on their heads
  • E.g., for a rule r: VP → V NP PP
  • P(r | VP) becomes P(r | VP headed by dumped)
  • How do you get the probability? (see the counting
    sketch below)
  • How many times was rule r used with dumped,
    divided by the total number of VPs that dumped
    appears in
  • How predictive of r is the verb dumped?
  • Captures the affinity between VP heads (verbs) and VP
    rules
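A minimal counting sketch of this head-conditioned estimate, assuming per-VP events have already been extracted from a treebank (the function names and data are illustrative):

def subcat_prob(vp_events, rule, head):
    # P(rule | VP headed by `head`), by relative frequency.
    # vp_events: (head_verb, rule_id) pairs, one per VP in the treebank.
    rule_count = sum(1 for h, r in vp_events if h == head and r == rule)
    head_count = sum(1 for h, r in vp_events if h == head)
    return rule_count / head_count if head_count else 0.0

# Hypothetical events: each VP contributes (its head verb, the rule that expanded it).
events = [("dumped", "VP -> V NP PP"), ("dumped", "VP -> V NP"),
          ("dumped", "VP -> V NP PP")]
print(subcat_prob(events, "VP -> V NP PP", "dumped"))  # 2/3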

31
Example (correct parse)
32
Example (less preferred)
33
Affinity of Phrasal Heads for Other Heads: PP Attachment
  • Verbs with preps vs. nouns with preps
  • E.g., dumped with into vs. sacks with into
  • How often is dumped the head of a VP which
    includes a PP daughter with into as its head,
    relative to other PP heads? I.e., what's
    P(into | PP, dumped is the mother VP's head)?
  • vs. how often is sacks the head of an NP with a PP
    daughter whose head is into, relative to other PP
    heads? I.e., P(into | PP, sacks is the mother's head)
    (the two estimates are compared in the sketch below)
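A minimal sketch of comparing the two attachment probabilities (data and names are hypothetical):

def prep_affinity(pp_events, prep, mother_head):
    # P(prep | PP, mother's head), by relative frequency.
    # pp_events: (mother_head_word, preposition) pairs, one per PP daughter in a treebank.
    both = sum(1 for h, p in pp_events if h == mother_head and p == prep)
    total = sum(1 for h, p in pp_events if h == mother_head)
    return both / total if total else 0.0

# Attach the PP to whichever head has the stronger affinity for "into".
events = [("dumped", "into"), ("dumped", "into"), ("dumped", "with"), ("sacks", "of")]
verb_score = prep_affinity(events, "into", "dumped")   # 2/3
noun_score = prep_affinity(events, "into", "sacks")    # 0.0
print("attach to VP" if verb_score >= noun_score else "attach to NP")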

34
But Other Relationships Do Not Involve Heads
(Hindle & Rooth '91)
  • The affinity of gusto for eat is greater than for
    spaghetti, and the affinity of marinara for spaghetti
    is greater than for ate

[Figure: two lexicalized parse trees, "ate spaghetti with marinara"
with PP(with) attached to NP(spaghetti), and "ate spaghetti with
gusto" with PP(with) attached to VP(ate)]
35
Log-linear models for Parsing
  • Why restrict the conditioning to the elements
    of a rule?
  • Use even larger context: the word sequence, word
    types, sub-tree context, etc.
  • Compute P(y|x) ∝ exp(Σ_i λ_i f_i(x, y)), where f_i(x, y)
    tests properties of the context and λ_i is the weight
    of feature i
  • Use these as scores in the CKY algorithm to find the
    best parse
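A minimal sketch of a log-linear (maximum-entropy) scorer of the kind described, with made-up feature functions and weights:

import math

def loglinear_probs(x, candidates, features, weights):
    # P(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x), over candidate analyses y.
    scores = {y: math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
              for y in candidates}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

# Hypothetical features over (sentence, attachment decision) pairs.
features = [
    lambda x, y: 1.0 if y == "vp-attach" and "gusto" in x else 0.0,
    lambda x, y: 1.0 if y == "np-attach" and "marinara" in x else 0.0,
]
weights = [1.2, 1.5]
print(loglinear_probs("ate spaghetti with gusto",
                      ["vp-attach", "np-attach"], features, weights))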

36
Supertagging: Almost Parsing
Poachers now control the underground trade
[Figure: candidate supertags (elementary tree fragments, with labels
such as S, VP, NP, V, N, Adj and empty nodes e) for the words of
"Poachers now control the underground trade", e.g. an NP tree for
"poachers" and an Adj tree for "underground"]
37
Summary
  • Parsing context-free grammars
  • Top-down and Bottom-up parsers
  • Mixed approaches (CKY, Earley parsers)
  • Preferences over parses using probabilities
  • Parsing with PCFG and PCKY algorithms
  • Enriching the probability model
  • Lexicalization
  • Log-linear models for parsing
  • Super-tagging