
1
Probabilistic and Lexicalized Parsing
  • CS 4705

2
Probabilistic CFGs (PCFGs)
  • Weighted CFGs
  • Attach weights to the rules of a CFG
  • Compute weights of derivations
  • Use weights to choose preferred parses
  • Utility: pruning and ordering the search space,
    disambiguation, language modeling for ASR
  • Parsing with weighted grammars: find the parse T
    that maximizes the weights of the derivations in
    the parse tree, over all possible parses of S
  • T(S) = argmax_{T ∈ τ(S)} W(T, S)
  • Probabilistic CFGs are one form of weighted CFG

3
Rule Probability
  • Attach probabilities to grammar rules
  • Expansions for a given non-terminal sum to 1
  • R1: VP → V            .55
  • R2: VP → V NP         .40
  • R3: VP → V NP NP      .05
  • Estimate probabilities from annotated corpora
  • E.g. Penn Treebank
  • P(R1) = count(R1) / count(VP)
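A minimal sketch of this relative-frequency estimate (the rule counts below are hypothetical, assumed to have already been extracted from a treebank):

from collections import Counter

# Hypothetical rule counts from a treebank; each key is (LHS, RHS) of one rule.
rule_counts = Counter({
    ("VP", ("V",)): 550,
    ("VP", ("V", "NP")): 400,
    ("VP", ("V", "NP", "NP")): 50,
})

# How often each non-terminal was expanded at all.
lhs_counts = Counter()
for (lhs, rhs), c in rule_counts.items():
    lhs_counts[lhs] += c

# P(rule) = count(rule) / count(LHS), as on the slide.
rule_prob = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
print(rule_prob[("VP", ("V",))])  # 0.55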

4
Derivation Probability
  • For a derivation T = R1 … Rn
  • Probability of the derivation:
  • Product of the probabilities of the rules expanded
    in the tree
  • Most likely parse = most probable derivation
  • Probability of a sentence:
  • Sum over all possible derivations for the
    sentence
  • Note the independence assumption: parse
    probability does not change based on where in the
    tree a rule is expanded.
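As a small worked illustration (all probabilities here other than R2's .40 are assumed for the example, not taken from the slides):

P(T) = ∏_{i=1..n} P(Ri)        P(S) = Σ_{T ∈ τ(S)} P(T)

A derivation that expands S → NP VP (say 1.0), VP → V NP (.40), and three lexical rules of probability .5 each has P(T) = 1.0 × .40 × .5 × .5 × .5 = .05; if the only other derivation of the same sentence has probability .02, then P(S) = .05 + .02 = .07.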

5
One Approach: the CYK Parser
  • Bottom-up parsing via dynamic programming
  • Assign probabilities to constituents as they are
    completed and placed in a table
  • Use the maximum probability for each constituent
    type going up the tree to S
  • The Intuition
  • We know the probabilities for constituents lower in
    the tree, so as we construct higher-level
    constituents we don't need to recompute them

6
CYK (Cocke-Younger-Kasami) Parser
  • Bottom-up parser with top-down filtering
  • Uses dynamic programming to store intermediate
    results (cf. Earley algorithm for top-down case)
  • Input: PCFG in Chomsky Normal Form
  • Rules of the form A → w or A → B C; no ε-productions
  • Chart: array [i, j, A] holding the probability that
    non-terminal A spans input positions i through j
  • Start state(s): (i, i+1, A) for each rule A → w_{i+1}
  • End state: (1, n, S) where n is the input size
  • Next-state rule: (i, k, B) and (k, j, C) yield
    (i, j, A) if A → B C
  • Maintain back-pointers to recover the parse
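A minimal sketch of the probabilistic CKY recurrence described above, assuming the grammar is given as dictionaries of lexical and binary rule probabilities (function and argument names are illustrative; spans here are 0-based and half-open rather than the 1-based indexing on the slide):

from collections import defaultdict

def pcky(words, lexical, binary, start="S"):
    # Probabilistic CKY over a CNF grammar.
    # lexical: {(A, w): prob} for rules A -> w
    # binary:  {(A, B, C): prob} for rules A -> B C
    # Returns (best probability of `start` over the whole input, back-pointers).
    n = len(words)
    table = defaultdict(float)  # (i, j, A) -> best probability that A spans words[i:j]
    back = {}                   # (i, j, A) -> back-pointer used to recover the parse

    # Base case: A -> w_i fills the width-1 cells.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                table[(i, i + 1, A)] = p
                back[(i, i + 1, A)] = w

    # Recursive case: combine (i,k,B) and (k,j,C) into (i,j,A) whenever A -> B C.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    cand = p * table[(i, k, B)] * table[(k, j, C)]
                    if cand > table[(i, j, A)]:
                        table[(i, j, A)] = cand
                        back[(i, j, A)] = (k, B, C)

    return table[(0, n, start)], back

With the grammar on the next slide (which is already in CNF) and probabilities attached to its rules, the back-pointers would let us rebuild the highest-probability parse of "John called Mary from Denver".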

7
Structural Ambiguity
  • S → NP VP
  • VP → V NP
  • NP → NP PP
  • VP → VP PP
  • PP → P NP
  • NP → John | Mary | Denver
  • V → called
  • P → from

John called Mary from Denver

[Parse tree for "John called Mary from Denver": S → NP VP, with the
PP "from Denver" attached inside the object NP "Mary from Denver"]
8
Example





John called Mary from Denver
9
Base Case: A → w

[CKY chart with the diagonal filled in: NP(John), V(called),
NP(Mary), P(from), NP(Denver)]
10
Recursive Case: A → B C

[CKY chart: the cell spanning "John called" is marked X (no
constituent can be built there)]
[Slides 11-22: successive snapshots of the CKY chart for "John
called Mary from Denver". New constituents are added span by span:
VP("called Mary"), PP("from Denver"), S("John called Mary"),
NP("Mary from Denver"), then two competing VPs over "called Mary
from Denver" (VP1 and VP2, one for each attachment of the PP), and
finally S spanning the whole input. Cells marked X admit no
constituent; the final chart keeps the higher-probability VP.]
23
Problems with PCFGs
  • Probability model is based only on the rules in the
    derivation
  • Lexical insensitivity
  • Doesn't use words in any real way
  • But structural disambiguation is lexically driven
  • PP attachment often depends on the verb, its
    object, and the preposition
  • I ate pickles with a fork.
  • I ate pickles with relish.
  • Context insensitivity of the derivation
  • Doesn't take into account where in the derivation
    a rule is used
  • Pronouns are more often subjects than objects
  • She hates Mary.
  • Mary hates her.
  • Solution: Lexicalization
  • Add lexical information to each rule
  • I.e., condition the rule probabilities on the
    actual words

24
An Example: Phrasal Heads
  • Phrasal heads can take the place of whole
    phrases, defining the most important characteristics
    of the phrase
  • Phrases are generally identified by their heads
  • The head of an NP is a noun, of a VP the main
    verb, of a PP the preposition
  • In a lexicalized CFG, each PCFG rule's LHS shares a
    lexical item (its head) with a non-terminal in its RHS

25
Increase in Size of Rule Set in Lexicalized CFG
  • If R is the number of binary branching rules in the
    CFG and Σ is the lexicon, the lexicalized rule set is
    O(2 · |Σ| · |R|)
  • For unary rules: O(|Σ| · |R|)
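To make the blow-up concrete (hypothetical numbers, not from the
slides): with a lexicon of |Σ| = 50,000 words and |R| = 100 binary
branching rules, 2 · |Σ| · |R| is already 2 × 50,000 × 100 =
10,000,000 potential lexicalized rules, far too many to estimate
reliably from a treebank.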

26
Example (correct parse)
Attribute grammar
27
Example (less preferred)
28
Computing Lexicalized Rule Probabilities
  • We started with rule probabilities as before
  • VP → V NP PP        P(rule | VP)
  • E.g., the count of this rule divided by the number of
    VPs in a treebank
  • Now we want lexicalized probabilities
  • VP(dumped) → V(dumped) NP(sacks) PP(into)
  • i.e., P(rule | VP, dumped is the verb, sacks is
    the head of the NP, into is the head of the PP)
  • Not likely to have significant counts in any
    treebank

29
Exploit the Data You Have
  • So, exploit the independence assumption and
    collect the statistics you can
  • Focus on capturing
  • Verb subcategorization
  • Particular verbs have affinities for particular
    VPs
  • Objects' affinity for their predicates
  • Mostly their mothers and grandmothers
  • Some objects fit better with some predicates than
    others

30
Verb Subcategorization
  • Condition particular VP rules on their heads
  • E.g., for a rule r: VP → V NP PP
  • P(r | VP) becomes P(r | VP headed by dumped)
  • How do you get the probability? (see the counting
    sketch below)
  • How many times was rule r used with dumped,
    divided by the total number of VPs that dumped
    appears in
  • How predictive of r is the verb dumped?
  • Captures the affinity between VP heads (verbs) and VP
    rules
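A minimal counting sketch of this head-conditioned estimate, assuming per-VP events have already been extracted from a treebank (the function names and data are illustrative):

def subcat_prob(vp_events, rule, head):
    # P(rule | VP headed by `head`), by relative frequency.
    # vp_events: (head_verb, rule_id) pairs, one per VP in the treebank.
    rule_count = sum(1 for h, r in vp_events if h == head and r == rule)
    head_count = sum(1 for h, r in vp_events if h == head)
    return rule_count / head_count if head_count else 0.0

# Hypothetical events: each VP contributes (its head verb, the rule that expanded it).
events = [("dumped", "VP -> V NP PP"), ("dumped", "VP -> V NP"),
          ("dumped", "VP -> V NP PP")]
print(subcat_prob(events, "VP -> V NP PP", "dumped"))  # 2/3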

31
Example (correct parse)
32
Example (less preferred)
33
Affinity of Phrasal Heads for Other Heads: PP Attachment
  • Verbs with preps vs. nouns with preps
  • E.g., dumped with into vs. sacks with into
  • How often is dumped the head of a VP which
    includes a PP daughter with into as its head,
    relative to other PP heads? I.e., what's
    P(into | PP, dumped is the mother VP's head)?
  • vs. how often is sacks the head of an NP with a PP
    daughter whose head is into, relative to other PP
    heads? I.e., P(into | PP, sacks is the mother's head)
    (the two estimates are compared in the sketch below)
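A minimal sketch of comparing the two attachment probabilities (data and names are hypothetical):

def prep_affinity(pp_events, prep, mother_head):
    # P(prep | PP, mother's head), by relative frequency.
    # pp_events: (mother_head_word, preposition) pairs, one per PP daughter in a treebank.
    both = sum(1 for h, p in pp_events if h == mother_head and p == prep)
    total = sum(1 for h, p in pp_events if h == mother_head)
    return both / total if total else 0.0

# Attach the PP to whichever head has the stronger affinity for "into".
events = [("dumped", "into"), ("dumped", "into"), ("dumped", "with"), ("sacks", "of")]
verb_score = prep_affinity(events, "into", "dumped")   # 2/3
noun_score = prep_affinity(events, "into", "sacks")    # 0.0
print("attach to VP" if verb_score >= noun_score else "attach to NP")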

34
But Other Relationships Do Not Involve Heads
(Hindle & Rooth '91)
  • The affinity of gusto for eat is greater than for
    spaghetti, and the affinity of marinara for spaghetti
    is greater than for ate

[Figure: two lexicalized parse trees, "ate spaghetti with marinara"
with PP(with) attached to NP(spaghetti), and "ate spaghetti with
gusto" with PP(with) attached to VP(ate)]
35
Log-linear models for Parsing
  • Why restrict the conditioning to the elements
    of a rule?
  • Use even larger context: the word sequence, word
    types, sub-tree context, etc.
  • Compute P(y|x) ∝ exp(Σ_i λ_i f_i(x, y)), where f_i(x, y)
    tests properties of the context and λ_i is the weight
    of feature i
  • Use these as scores in the CKY algorithm to find the
    best parse
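A minimal sketch of a log-linear (maximum-entropy) scorer of the kind described, with made-up feature functions and weights:

import math

def loglinear_probs(x, candidates, features, weights):
    # P(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x), over candidate analyses y.
    scores = {y: math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
              for y in candidates}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

# Hypothetical features over (sentence, attachment decision) pairs.
features = [
    lambda x, y: 1.0 if y == "vp-attach" and "gusto" in x else 0.0,
    lambda x, y: 1.0 if y == "np-attach" and "marinara" in x else 0.0,
]
weights = [1.2, 1.5]
print(loglinear_probs("ate spaghetti with gusto",
                      ["vp-attach", "np-attach"], features, weights))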

36
Supertagging: Almost Parsing
Poachers now control the underground trade
[Figure: candidate supertags (elementary tree fragments, with labels
such as S, VP, NP, V, N, Adj and empty nodes e) for the words of
"Poachers now control the underground trade", e.g. an NP tree for
"poachers" and an Adj tree for "underground"]
37
Summary
  • Parsing context-free grammars
  • Top-down and Bottom-up parsers
  • Mixed approaches (CKY, Earley parsers)
  • Preferences over parses using probabilities
  • Parsing with PCFG and PCKY algorithms
  • Enriching the probability model
  • Lexicalization
  • Log-linear models for parsing
  • Super-tagging