Title: PARSING
1. PARSING
2. Analyzing Linguistic Units
Task | Goal | Formal Mechanism | Algorithm | Resulting Representation
Morphology | Analyze words into morphemes | Context dependency rules | FST composition | Morphological structure
Phonology | Analyze words into phonemes | Context dependency rules | FST composition | Phonemic structure
Syntax | Analyze sentences for syntactic relations between words | Grammars (CFGs), PDAs | Top-down, bottom-up, Earley, CKY parsing | Parse tree, derivation tree
- Why should we parse a sentence?
- to detect relations among words
- used to normalize surface syntactic variations.
- invaluable for a number of NLP applications
3. Some Concepts
- Grammar: a generative device that prescribes a set of valid strings.
- Parser: a device that uncovers the sequence of grammar rules that might have generated the input sentence.
  - Input: grammar, sentence
  - Output: parse tree, derivation tree
- Recognizer: a device that returns "yes" if the input string could be generated by the grammar.
  - Input: grammar, sentence
  - Output: boolean
4. Searching for a Parse
- The grammar (a rewrite procedure) encodes
  - all strings generated by the grammar: L(G)
  - all parse trees for each string s it generates: T(G) = ∪_s T_s(G)
- Given an input sentence I, the set of its parse trees is T_I(G).
- Parsing is searching for T_I(G) ⊆ T(G).
- Ideally, the parser finds the appropriate parse for the sentence.
5. CFG for a Fragment of English
S → NP VP
S → Aux NP VP
S → VP
NP → Det Nom
NP → PropN
Nom → N
Nom → N Nom
Nom → Nom PP
VP → V
VP → V NP
PP → Prep NP
Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Prep → from | to | on
PropN → Houston | TWA

[Figure: parse tree for "Book that flight" (S dominates VP; VP → V NP with V = Book; NP → Det Nom with Det = that, Nom → N, N = flight), annotated with the bottom-up and top-down parsing directions]
6. Top-down/Bottom-up Parsing
 | Top-down (recursive descent parser) | Bottom-up (shift-reduce parser)
Starts from | S (the goal) | the words (the input)
Algorithm (parallel) | a. pick non-terminals; b. pick rules from the grammar to expand the non-terminals | a. match a sequence of input symbols against the RHS of some rule; b. replace the sequence by the LHS of the matching rule
Termination | Success: the leaves of a tree match the input. Failure: no more non-terminals to expand in any of the trees | Success: S is reached. Failure: no more rewrites are possible
Pros/Cons | Pro: goal-driven, starts with S. Con: constructs trees that may not match the input | Pro: constrained by the input string. Con: constructs constituents that may not lead to the goal S
- Control strategy: how do we explore the search space?
  - Pursue all parses in parallel, or backtrack, or ...?
  - Which rule to apply next?
  - Which node to expand next?
- Work through how top-down and bottom-up parsing proceed on the board for "Book that flight".
7. Top-down, Depth-First, Left-to-Right Parser
- Systematic, incremental expansion of the search space, in contrast to a parallel parser.
- Start state: (S, 0)
- End state: (ε, n), where n is the length of the input to be parsed
- Next-state rules:
  - (w_{j+1} β, j) → (β, j+1)  (the next input word matches the first symbol)
  - (B β, j) → (γ β, j) if B → γ  (B is the left-most non-terminal)
- Agenda: a data structure to keep track of the states to be expanded.
- Expansion is depth-first if the Agenda is a stack. (A sketch of this procedure follows below.)
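To make the state expansion concrete, here is a minimal Python sketch of such a top-down, depth-first, left-to-right recognizer. The toy grammar encoding, the function name recognize_topdown, and the example are illustrative assumptions, not from the slides; left-recursive rules are deliberately left out, since (as a later slide notes) they would make this search loop forever.

```python
# Top-down, depth-first, left-to-right recognizer (sketch).
# State = (tuple of symbols still to be derived, number of input words consumed).
# A stack agenda gives depth-first expansion, as described above.

GRAMMAR = {                      # toy CFG; LHS -> list of possible RHSs
    "S":   [["NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"]],
    "Nom": [["N"]],
    "VP":  [["V"], ["V", "NP"]],
    "Det": [["that"]], "N": [["flight"]], "V": [["book"]],
}

def is_terminal(sym):
    return sym not in GRAMMAR

def recognize_topdown(words, start="S"):
    agenda = [((start,), 0)]                 # start state (S, 0)
    while agenda:
        symbols, j = agenda.pop()            # stack => depth-first
        if not symbols:
            if j == len(words):              # end state (epsilon, n)
                return True
            continue
        first, rest = symbols[0], symbols[1:]
        if is_terminal(first):
            # (w_{j+1} beta, j) -> (beta, j+1) if the next word matches
            if j < len(words) and words[j] == first:
                agenda.append((rest, j + 1))
        else:
            # (B beta, j) -> (gamma beta, j) for each rule B -> gamma
            for gamma in GRAMMAR[first]:
                agenda.append((tuple(gamma) + rest, j))
    return False

print(recognize_topdown(["book", "that", "flight"]))   # True
```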
8. Fig. 10.7: CFG
9. Left Corners
- Can we help top-down parsers with some bottom-up information?
  - Unnecessary states are created if there are many B → γ rules.
  - If after successive expansions B ⇒* w δ and w does not match the input, the whole series of expansions is wasted.
- The leftmost symbol derivable from B needs to match the input: look ahead to the left corner of the tree.
  - B is a left corner of A if A ⇒* B γ.
- Build a table of the left corners of all non-terminals in the grammar and consult it before applying a rule.
  - At a given point in state expansion (B β, j), pick the rule B → C γ only if a left corner of C matches the input w_{j+1}. (A sketch follows below.)
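One possible way to precompute and consult such a table, sketched in Python under the same illustrative toy-grammar encoding as above (left_corner_table, could_start, and the preterminal lists are assumptions for illustration):

```python
# Sketch: build a left-corner table for a CFG and use it to filter
# top-down predictions against the next input word.

GRAMMAR = {
    "S":   [["NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"]],
    "Nom": [["N"], ["Nom", "PP"]],
    "VP":  [["V"], ["V", "NP"]],
    "PP":  [["Prep", "NP"]],
}
PRETERMINALS = {"Det": {"that", "this", "a"},
                "N": {"book", "flight"},
                "V": {"book", "include"},
                "Prep": {"from", "to"}}

def left_corner_table(grammar):
    # direct left corners: the first symbol of each right-hand side
    lc = {A: {rhs[0] for rhs in rules} for A, rules in grammar.items()}
    changed = True
    while changed:                      # transitive closure: A => B ... => C ...
        changed = False
        for A in lc:
            for B in list(lc[A]):
                for C in lc.get(B, ()):
                    if C not in lc[A]:
                        lc[A].add(C)
                        changed = True
    return lc

def could_start(nonterminal, word, lc):
    """Can `word` be the first word of a phrase of type `nonterminal`?"""
    corners = lc.get(nonterminal, set()) | {nonterminal}
    return any(word in PRETERMINALS.get(X, ()) for X in corners)

LC = left_corner_table(GRAMMAR)
print(could_start("NP", "that", LC))   # True:  NP -> Det Nom, Det -> that
print(could_start("NP", "book", LC))   # False: no NP rule in this toy grammar starts with V or N
```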
10. Limitation of Top-down Parsing: Left Recursion
- Depth-first search will never terminate if the grammar is left-recursive (e.g. NP → NP PP).
- Solutions:
  - Rewrite the grammar to a weakly equivalent one that is not left-recursive
    - NP → NP PP
    - NP → Nom PP
    - NP → Nom
    - This may make the rules unnatural.
  - Fix the depth of the search explicitly.
- Other book-keeping needed in top-down parsing:
  - Memoization, for reusing previously parsed substrings
  - A packed representation, for parse ambiguity
NP → Nom NP′    NP′ → PP NP′    NP′ → ε   (see the general transformation scheme below)
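For reference, the rewrite above is an instance of the standard textbook transformation for removing immediate left recursion (stated here in general form, which the slide does not spell out):

```latex
% Standard removal of immediate left recursion:
% the rule set  A -> A\alpha \mid \beta  is replaced, using a new non-terminal A',
% by the weakly equivalent rules
A \rightarrow \beta\, A' \qquad A' \rightarrow \alpha\, A' \mid \epsilon
% e.g.  NP -> NP\ PP \mid Nom   becomes   NP -> Nom\ NP',\quad NP' -> PP\ NP' \mid \epsilon
```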
11. Dynamic Programming for Parsing
- Memoization
  - Create a table of solutions to sub-problems (e.g. subtrees) as the parse proceeds
  - Look up subtrees for each constituent rather than re-parsing
  - Since all parses are implicitly stored, all are available for later disambiguation
  - Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms
- Earley parser: an O(n³) parser
  - Top-down parser with bottom-up information
  - State: (i, A → α • β, j)
    - j is the position in the string that has been parsed
    - i is the position in the string where A begins
  - Top-down prediction: S ⇒* w_1 ... w_i A γ
  - Bottom-up completion: α w_{j+1} ... w_n ⇒* w_{i+1} ... w_n
12. Earley Parser
- Data structure: an array of n+1 cells called the chart
  - For each word position, the chart contains the set of states representing all partial parse trees generated to date.
  - E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence.
- Chart entries represent three types of constituents:
  - predicted constituents (top-down predictions)
  - in-progress constituents (we're in the midst of ...)
  - completed constituents (we've found ...)
- Progress in the parse is represented by dotted rules:
  - the position of the dot (•) indicates the type of constituent
  - 0 Book 1 that 2 flight 3
  - (0, S → • VP, 0)        (predicting a VP)
  - (1, NP → Det • Nom, 2)  (finding an NP)
  - (0, VP → V NP •, 3)     (found a VP)
13. Earley Parser: Parse Success
- The final answer is found by looking at the last entry in the chart.
- If an entry resembles (0, S → α •, n), then the input was parsed successfully.
- But note that the chart will also contain a record of all possible parses of the input string, given the grammar -- not just the successful one(s).
- Why is this useful?
14. Earley Parsing Steps
- Start state: (0, S′ → • S, 0)
- End state: (0, S → α •, n), where n is the input size
- Next-state rules:
  - Scanner (read input):
    - (i, A → α • w_{j+1} β, j) → (i, A → α w_{j+1} • β, j+1)
  - Predictor (add top-down predictions):
    - (i, A → α • B β, j) → (j, B → • γ, j) if B → γ  (B is the non-terminal right after the dot)
  - Completer (move the dot to the right when a new constituent is found):
    - (i, B → α • A β, k), (k, A → γ •, j) → (i, B → α A • β, j)
- No backtracking, and no states are removed: keep the complete history of the parse.
- Why is this useful? (A sketch of the recognizer follows below.)
15. Earley Parser Steps
 | Scanner | Predictor | Completer
When does it apply? | when a terminal is to the right of the dot, e.g. (0, VP → • V NP, 0) | when a non-terminal is to the right of the dot, e.g. (0, S → • VP, 0) | when the dot reaches the end of a rule, e.g. (1, NP → Det Nom •, 3)
Which chart cell is affected? | new states are added to the next cell | new states are added to the current cell | new states are added to the current cell
What goes into the cell? | the dot is moved over the terminal: (0, VP → V • NP, 1) | one new state for each expansion of the non-terminal in the grammar: (0, VP → • V, 0), (0, VP → • V NP, 0) | one state for each rule that was waiting for the constituent, e.g. (0, VP → V • NP, 1) becomes (0, VP → V NP •, 3)
16. Book that flight (Chart[0])
- Seed the chart with top-down predictions for S from the grammar.
17. CFG for a Fragment of English
Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Nom → N
Nom → N Nom
NP → PropN
VP → V
Nom → Nom PP
VP → V NP
PP → Prep NP
18. Chart[1]
V → book • is passed to the Completer, which finds 2 states in Chart[0] whose left corner is V and adds them to Chart[1], moving their dots to the right.
19. (No transcript)
20. Retrieving the Parses
- Augment the Completer to add a pointer to the prior states it advances, as a field in the current state
  - i.e. which states combined to arrive here?
- Read the pointers back from the final state.
- What if the final cell does not contain the final state? Error handling.
- Is it a total loss? No...
  - The chart contains every constituent and combination of constituents possible for the input, given the grammar.
  - Useful for partial parsing or shallow parsing, as used in information extraction.
21. Alternative Control Strategies
- Change Earley's top-down strategy to bottom-up, or ...
- Change to a best-first strategy based on the probabilities of constituents
  - Compute and store the probabilities of constituents in the chart as you parse
  - Then, instead of expanding states in a fixed order, let the probabilities control the order of expansion
22. Probabilistic and Lexicalized Parsing
23. Probabilistic CFGs
- Weighted CFGs
  - Attach weights to the rules of the CFG
  - Compute the weights of derivations
  - Use the weights to pick preferred parses
- Utility: pruning and ordering the search space, disambiguation, language model for ASR
- Parsing with weighted grammars (like weighted FAs):
  - T* = argmax_T W(T, S)
- Probabilistic CFGs are one form of weighted CFGs.
24. Probability Model
- Rule probability
  - Attach probabilities to grammar rules
  - The expansions for a given non-terminal sum to 1
    - R1: VP → V        .55
    - R2: VP → V NP     .40
    - R3: VP → V NP NP  .05
  - Estimate the probabilities from annotated corpora: P(R1) = count(R1) / count(VP)
- Derivation probability
  - Derivation: T = R1 ... Rn
  - Probability of a derivation
  - Most likely parse
  - Probability of a sentence: sum over all possible derivations for the sentence
  - (The formulas are written out below.)
  - Note the independence assumption: the parse probability does not change based on where in the derivation a rule is expanded.
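Spelled out, these quantities are the standard PCFG definitions (here T ranges over the derivations T_S(G) of the sentence S, with T = R1 ... Rn):

```latex
P(T) = \prod_{i=1}^{n} P(R_i)                     % probability of a derivation
\hat{T} = \arg\max_{T \in T_S(G)} P(T)            % most likely parse of S
P(S) = \sum_{T \in T_S(G)} P(T)                   % probability of the sentence
```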
25. Structural Ambiguity
- S → NP VP
- VP → V NP
- NP → NP PP
- VP → VP PP
- PP → P NP
- NP → John | Mary | Denver
- V → called
- P → from
John called Mary from Denver
[Figure: parse trees for "John called Mary from Denver", illustrating the PP-attachment ambiguity: the PP "from Denver" can attach to the VP ("called ... from Denver") or to the NP ("Mary from Denver")]
26. Cocke-Younger-Kasami Parser
- Bottom-up parser with top-down filtering
- Start state(s): (A, i, i+1) for each A → w_{i+1}
- End state: (S, 0, n), where n is the input size
- Next-state rule:
  - (B, i, k), (C, k, j) → (A, i, j) if A → B C
- (A sketch of the recognizer follows below.)
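A minimal Python sketch of this CKY recognizer, assuming a grammar already in Chomsky normal form for the "John called Mary from Denver" example (the dictionaries and the function name are illustrative assumptions):

```python
# CKY recognizer (sketch): binary rules A -> B C plus lexical rules A -> w.
BINARY = {                       # A -> B C rules, indexed by (B, C)
    ("NP", "VP"): {"S"},
    ("V", "NP"):  {"VP"},
    ("VP", "PP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("P", "NP"):  {"PP"},
}
LEXICAL = {"John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
           "called": {"V"}, "from": {"P"}}

def cky_recognize(words, start="S"):
    n = len(words)
    chart = {}                                     # chart[(i, j)] = non-terminals spanning words i..j-1
    for i, w in enumerate(words):                  # base case: (A, i, i+1) for each A -> w_{i+1}
        chart[(i, i + 1)] = set(LEXICAL.get(w, ()))
    for span in range(2, n + 1):                   # widen spans bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):              # (B, i, k) + (C, k, j) => (A, i, j)
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        cell |= BINARY.get((B, C), set())
            chart[(i, j)] = cell
    return start in chart[(0, n)]                  # end state (S, 0, n)

print(cky_recognize("John called Mary from Denver".split()))   # True
```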
27. Example
John called Mary from Denver
28. Base Case: A → w
29.–41. Recursive Cases: A → B C
[Figure sequence: the CKY chart for "John called Mary from Denver" being filled in. The base case places NP (John), V (called), NP (Mary), P (from) and NP (Denver) on the diagonal; the recursive steps successively add VP → V NP, PP → P NP, NP → NP PP, VP → VP PP and S → NP VP, with X marking cells where no constituent can be built, until S analyses spanning the whole input are found (one attaching the PP to the VP, one attaching it to the NP)]
42. Probabilistic CKY
- Assign probabilities to constituents as they are completed and placed in the table.
- Computing the probability:
  - Since we are interested in the max, P(S, 0, n)
  - Use the max probability for each constituent (see the recurrence below)
  - Maintain back-pointers to recover the parse.
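Concretely, the chart is filled with maximum probabilities using the standard probabilistic CKY recurrence (base case from the lexical rules):

```latex
P(A, i, i{+}1) = P(A \to w_{i+1})
P(A, i, j) = \max_{A \to B\,C,\; i < k < j} P(A \to B\,C)\; P(B, i, k)\; P(C, k, j)
% the probability of the best parse is then P(S, 0, n)
```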
43. Problems with PCFGs
- The probability model we're using is based only on the rules in the derivation.
- Lexical insensitivity
  - Doesn't use the words in any real way
  - But structural disambiguation is lexically driven
    - PP attachment often depends on the verb, its object, and the preposition
    - I ate pickles with a fork.
    - I ate pickles with relish.
- Context insensitivity of the derivation
  - Doesn't take into account where in the derivation a rule is used
    - Pronouns are more often subjects than objects
    - She hates Mary.
    - Mary hates her.
- Solution: lexicalization
  - Add lexical information to each rule
44. An example of lexical information: Heads
- Make use of the notion of the head of a phrase
  - The head of an NP is its noun
  - The head of a VP is its main verb
  - The head of a PP is its preposition
- The LHS of each rule in the PCFG carries a lexical item (its head)
- Each RHS non-terminal carries a lexical item
- One of these lexical items is shared with the LHS
- If R is the number of binary-branching rules in the CFG, the lexicalized CFG has O(2ℓR) rules, where ℓ is the number of lexical items
- Unary rules contribute O(ℓR)
45. Example (correct parse)
[Figure: attribute grammar (lexicalized parse tree)]
46. Example (less preferred)
47. Computing Lexicalized Rule Probabilities
- We started with rule probabilities:
  - VP → V NP PP        P(rule | VP)
  - e.g., the count of this rule divided by the number of VPs in a treebank
- Now we want lexicalized probabilities:
  - VP(dumped) → V(dumped) NP(sacks) PP(in)
  - P(rule | VP, dumped is the verb, sacks is the head of the NP, in is the head of the PP)
  - Not likely to have significant counts in any treebank
48. Another Example
- Consider the VPs:
  - ate spaghetti with gusto
  - ate spaghetti with marinara
- The dependency is not between mother and child (see the trees below).
[Figure: lexicalized trees for the two VPs. In "ate spaghetti with gusto" the PP(with) attaches to VP(ate); in "ate spaghetti with marinara" the PP(with) attaches to NP(spaghetti).]
49. Log-linear Models for Parsing
- Why restrict the conditioning to the elements of a rule?
- Use an even larger context:
  - word sequence, word types, sub-tree context, etc.
- In general, compute P(y|x), where each feature f_i(x, y) tests a property of the context and λ_i is the weight of that feature (see the formula below).
- Use these as scores in the CKY algorithm to find the best-scoring parse.
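The conditional model alluded to here has the usual log-linear (maximum-entropy) form:

```latex
P(y \mid x) = \frac{\exp\big(\sum_i \lambda_i\, f_i(x, y)\big)}
                   {\sum_{y'} \exp\big(\sum_i \lambda_i\, f_i(x, y')\big)}
```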
50. Supertagging: Almost parsing
Poachers now control the underground trade
[Figure: candidate elementary trees (supertags) anchored by the words of the sentence, e.g. NP and S trees for "poachers", transitive-verb trees for "control", an Adj tree for "underground"]
51. Summary
- Parsing context-free grammars
- Top-down and Bottom-up parsers
- Mixed approaches (CKY, Earley parsers)
- Preferences over parses using probabilities
- Parsing with PCFG and PCKY algorithms
- Enriching the probability model
- Lexicalization
- Log-linear models for parsing