Title: Context-Free Parsing
Basic issues
- Top-down vs. bottom-up
- Handling ambiguity
  - Lexical ambiguity
  - Structural ambiguity
- Breadth-first vs. depth-first
- Handling recursive rules
- Handling empty rules
Some terminology
- Rules written A → B c
- Terminal vs. non-terminal symbols
- Left-hand side (head): always a non-terminal
- Right-hand side (body): can be a mix of terminals and non-terminals, any number of them
- Unique start symbol (usually S)
- → means "rewrites as", but is not directional (an = sign would be better)
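One way to make this terminology concrete is to store each rule as a (head, body) pair. The sketch below is illustrative only: the names `RULES`, `START`, `nonterminals` and `terminals` are assumptions, and the rules are the simple grammar and lexicon used in the examples that follow, with each lexicon entry written as an ordinary rule.

```python
# Each rule is (head, body): the head is a single non-terminal,
# the body is a tuple of terminal and/or non-terminal symbols.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
    ("det", ("the",)), ("det", ("an",)),
    ("n", ("man",)), ("n", ("elephant",)),
    ("v", ("shot",)),
]
START = "S"  # the unique start symbol


def nonterminals(rules):
    """Symbols that occur as the head of at least one rule."""
    return {head for head, _ in rules}


def terminals(rules):
    """Symbols that occur in some body but never as a head."""
    heads = nonterminals(rules)
    return {sym for _, body in rules for sym in body if sym not in heads}


print(sorted(nonterminals(RULES)))
print(sorted(terminals(RULES)))
```

Treating the words themselves as terminals is a design choice; the slides keep the lexicon separate, which amounts to the same thing.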
1. Top-down with simple grammar
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Lexicon: det → an, the; n → elephant, man; v → shot
Input: the man shot an elephant
Rules applied:
  S → NP VP
  NP → det n
  det → an, the
  n → elephant, man
  VP → v; VP → v NP
  v → shot
No more rules, but the input is not completely accounted for. So we must backtrack and try the other VP rule.
Lexicon: det → an, the; n → elephant, man; v → shot
Rules applied after backtracking to VP → v NP:
  v → shot
  NP → det n
  det → an, the
  n → elephant, man
No more rules, and the input is completely accounted for.
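The walk-through above can be sketched as a small depth-first recognizer with backtracking. This is a minimal illustration, not a definitive implementation: the names `GRAMMAR`, `LEXICON`, `expand` and `recognize` are assumptions, and Python generators stand in for the backtracking.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "n"]],
    "VP": [["v"], ["v", "NP"]],
}
LEXICON = {"det": {"an", "the"}, "n": {"elephant", "man"}, "v": {"shot"}}


def expand(symbols, words, pos):
    """Yield each input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        # Lexical category: it must match the next word of the input.
        if pos < len(words) and words[pos] in LEXICON[first]:
            yield from expand(rest, words, pos + 1)
    else:
        # Non-terminal: try its rules in order; moving on to the next
        # rule after a failure is the backtracking step.
        for body in GRAMMAR[first]:
            yield from expand(body + rest, words, pos)


def recognize(sentence):
    words = sentence.split()
    # A derivation only counts if it accounts for the whole input.
    return any(end == len(words) for end in expand(["S"], words, 0))


print(recognize("the man shot an elephant"))  # True
```

With VP → v listed first, the recognizer first reaches position 3 with "an elephant" left over, rejects that result, and only then tries VP → v NP, mirroring the two steps shown above.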
Breadth-first vs. depth-first (1)
- When we came to the VP rule we were faced with a choice of two rules
- Depth-first means following the first choice through to the end
- Breadth-first means keeping all your options open
- We'll see this distinction more clearly later, and also see that it is quite significant
2. Bottom-up with simple grammar
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Lexicon: det → an, the; n → elephant, man; v → shot
Input: the man shot an elephant
Rules applied:
  det → an, the; n → elephant, man; v → shot
  NP → det n
  VP → v
  S → NP VP
We've reached the top, but the input is not completely accounted for. So we must backtrack and try the other VP rule.
  VP → v NP
  S → NP VP
We've reached the top, and the input is completely accounted for.
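The bottom-up walk-through can be sketched as a shift-reduce recognizer with backtracking. This is one possible realisation under stated assumptions: the slides do not name a specific bottom-up algorithm, and `RULES`, `LEXICON`, `parse` and `recognize` are hypothetical names.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
]
# Simple (unambiguous) lexicon: each word has exactly one category.
LEXICON = {"the": "det", "an": "det", "man": "n", "elephant": "n", "shot": "v"}


def parse(stack, remaining):
    """True if `stack` plus the remaining words can be reduced to a single S."""
    if stack == ("S",) and not remaining:
        return True
    # Reduce: replace a rule body found on top of the stack by its head.
    for head, body in RULES:
        n = len(body)
        if stack[-n:] == body and parse(stack[:-n] + (head,), remaining):
            return True
    # Shift: if no reduction led to success, move the next word's
    # category onto the stack (this is where backtracking resumes).
    if remaining and remaining[0] in LEXICON:
        return parse(stack + (LEXICON[remaining[0]],), remaining[1:])
    return False


def recognize(sentence):
    return parse((), tuple(sentence.split()))


print(recognize("the man shot an elephant"))  # True
```

Because reductions are tried before shifting, the recognizer first builds S over "the man shot" via VP → v, finds "an elephant" left over, and backtracks to the VP → v NP analysis, as in the slide.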
Same again but with lexical ambiguity
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Lexicon: det → an, the; n → elephant, man, shot; v → shot
"shot" can be v or n.
3. Top-down with lexical ambiguity
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Lexicon: det → an, the; n → elephant, man, shot; v → shot
Input: the man shot an elephant
Rules applied:
  S → NP VP
  NP → det n
  det → an, the
  n → elephant, man, shot
  VP → v; VP → v NP
Same as before: at this point we are looking for a v, and "shot" fits the bill. The n reading never comes into play.
4. Bottom-up with lexical ambiguity
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Lexicon: det → an, the; n → elephant, man, shot; v → shot
Input: the man shot an elephant
Rules applied:
  det → an, the; n → elephant, man, shot; v → shot
  NP → det n
  VP → v
  VP → v NP
  S → NP VP
Terminology: graph, nodes, arcs (edges)
4. Bottom-up with lexical ambiguity (continued)
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Lexicon: det → an, the; n → elephant, man, shot; v → shot
Let's get rid of all the unused arcs.
[Chart figure: arcs labelled det, n and v over the input "the man shot an elephant"]
4. Bottom-up with lexical ambiguity (continued)
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Lexicon: det → an, the; n → elephant, man, shot; v → shot
And let's clear away all the arcs.
[Chart figure: category labels det, n and v over the input "the man shot an elephant"]
Breadth-first vs. depth-first (2)
- In chart parsing, the distinction is more clear-cut
- At any point there may be a choice of things to do: which arcs to develop
- Breadth-first vs. depth-first can be seen as what order they are done in
- Queue (FIFO = breadth-first) vs. stack (LIFO = depth-first)
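The queue-versus-stack point can be sketched with a single agenda of pending work, where the only difference between the two strategies is which end of the agenda the next item is taken from. This is a simplified top-down recognizer rather than a full chart parser, and the names `recognize`, `agenda` and `depth_first` are assumptions.

```python
from collections import deque

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "n")],
    "VP": [("v",), ("v", "NP")],
}
LEXICON = {"det": {"an", "the"}, "n": {"elephant", "man", "shot"}, "v": {"shot"}}


def recognize(sentence, depth_first=True):
    """Return (accepted, number of agenda items processed)."""
    words = sentence.split()
    # Each agenda item is (symbols still to find, position in the input).
    agenda = deque([(("S",), 0)])
    processed = 0
    while agenda:
        # Stack (LIFO) gives depth-first; queue (FIFO) gives breadth-first.
        symbols, pos = agenda.pop() if depth_first else agenda.popleft()
        processed += 1
        if not symbols:
            if pos == len(words):
                return True, processed
            continue  # dead end: input left over
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:
            if pos < len(words) and words[pos] in LEXICON[first]:
                agenda.append((rest, pos + 1))
        else:
            bodies = GRAMMAR[first]
            # On a stack the last item pushed is tried first, so push in
            # reverse to keep "first rule first" for depth-first search.
            for body in (reversed(bodies) if depth_first else bodies):
                agenda.append((body + rest, pos))
    return False, processed


print(recognize("the man shot an elephant", depth_first=True))
print(recognize("the man shot an elephant", depth_first=False))
```

Both calls accept the sentence; what changes is the order in which alternatives are explored, and hence how much work is done before the first success.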
Same again but with structural ambiguity
Input: the man shot an elephant in his pyjamas
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
Lexicon: det → an, the, his; n → elephant, man, shot, pyjamas; v → shot; prep → in
We introduce PP in two places: NP → det n PP and VP → v NP PP.
5. Top-down with structural ambiguity
Input: the man shot an elephant in his pyjamas
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
Rules applied:
  S → NP VP
  NP → det n; NP → det n PP
  det → an, the, his
  n → elephant, man, shot, pyjamas
At this point, depending on our strategy (breadth-first vs. depth-first), we may consider the NP complete and look for the VP, or we may try the second NP rule. Let's see what happens in the latter case.
  PP → prep NP
  prep → in
The next word, "shot", isn't a prep, so this rule simply fails.
5. Top-down with structural ambiguity
Input: the man shot an elephant in his pyjamas
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
Rules applied:
  S → NP VP
  NP → det n; NP → det n PP
  det → an, the, his
  n → elephant, man, shot, pyjamas
  VP → v; VP → v NP; VP → v NP PP
  v → shot
As before, the first VP rule works, but does not account for all the input.
  NP → det n; NP → det n PP
  det → an, the, his
  n → elephant, man, shot, pyjamas
Similarly, if we try the second VP rule and the first NP rule, the rules succeed but "in his pyjamas" is left unaccounted for.
5. Top-down with structural ambiguity
Input: the man shot an elephant in his pyjamas
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
Rules applied:
  S → NP VP
  NP → det n; NP → det n PP
  det → an, the, his
  n → elephant, man, shot, pyjamas
  VP → v; VP → v NP; VP → v NP PP
  v → shot
  NP → det n; NP → det n PP
So what do we try next: the second NP rule (NP → det n PP), or the third VP rule (VP → v NP PP)?
Depth-first: it's a stack, LIFO.
Breadth-first: it's a queue, FIFO.
5. Top-down with structural ambiguity (depth-first)
Input: the man shot an elephant in his pyjamas
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
Rules applied:
  S → NP VP
  NP → det n; NP → det n PP
  det → an, the, his
  n → elephant, man, shot, pyjamas
  VP → v; VP → v NP; VP → v NP PP
  v → shot
  NP → det n; NP → det n PP
  det → an, the, his
  n → elephant, man, shot, pyjamas
  PP → prep NP
  prep → in
5. Top-down with structural ambiguity (breadth-first)
Input: the man shot an elephant in his pyjamas
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
Rules applied:
  S → NP VP
  NP → det n; NP → det n PP
  det → an, the, his
  n → elephant, man, shot, pyjamas
  VP → v; VP → v NP; VP → v NP PP
  v → shot
  NP → det n; NP → det n PP
  det → an, the, his
  n → elephant, man, shot, pyjamas
  PP → prep NP
  prep → in
Recognizing ambiguity
- Notice how the choice of strategy determines which result we get (first).
- In both strategies, there are often rules left untried on the list (whether queue or stack).
- If we want to know whether our input is ambiguous, at some time we do have to follow these through.
- As you will see later, trying out alternative paths can be computationally quite intensive.
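To check for ambiguity, the untried alternatives have to be followed through rather than stopping at the first success. A minimal sketch, assuming an exhaustive top-down search over the structurally ambiguous grammar above (the names `derivations` and `count_parses` are hypothetical):

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "n"], ["det", "n", "PP"]],
    "VP": [["v"], ["v", "NP"], ["v", "NP", "PP"]],
    "PP": [["prep", "NP"]],
}
LEXICON = {
    "det": {"an", "the", "his"},
    "n": {"elephant", "man", "shot", "pyjamas"},
    "v": {"shot"},
    "prep": {"in"},
}


def derivations(symbols, words, pos):
    """Yield an end position once for every distinct way of matching `symbols`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        if pos < len(words) and words[pos] in LEXICON[first]:
            yield from derivations(rest, words, pos + 1)
    else:
        for body in GRAMMAR[first]:
            yield from derivations(body + rest, words, pos)


def count_parses(sentence):
    words = sentence.split()
    # Count only the derivations that account for the whole input.
    return sum(1 for end in derivations(["S"], words, 0) if end == len(words))


print(count_parses("the man shot an elephant in his pyjamas"))  # 2
print(count_parses("the man shot an elephant"))                 # 1
```

The two analyses of the longer sentence correspond to attaching the PP inside the object NP (NP → det n PP) or directly to the VP (VP → v NP PP).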
6. Bottom-up with structural ambiguity
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
Input: the man shot an elephant in his pyjamas
Rules applied:
  VP → v
  NP → det n
  PP → prep NP
  NP → det n PP
  VP → v NP
  VP → v NP PP
  S → NP VP
6. Bottom-up with structural ambiguity (continued)
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
[Chart figure: S, VP, NP and PP constituents built over the categories det n v det n prep det n for "the man shot an elephant in his pyjamas"]
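The bottom-up process can be sketched as a table of completed constituents, each stored as (category, start, end), that is grown until no rule adds anything new. This is a deliberately simplified sketch: it records only finished constituents, not partially matched rules or how many ways each constituent was built, and the names `ends` and `build_chart` are assumptions.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")), ("NP", ("det", "n", "PP")),
    ("VP", ("v",)), ("VP", ("v", "NP")), ("VP", ("v", "NP", "PP")),
    ("PP", ("prep", "NP")),
]
LEXICON = {
    "det": {"an", "the", "his"},
    "n": {"elephant", "man", "shot", "pyjamas"},
    "v": {"shot"},
    "prep": {"in"},
}


def ends(body, start, chart):
    """Yield each position where `body` can end if matched from `start`."""
    if not body:
        yield start
        return
    for cat, s, e in chart:
        if cat == body[0] and s == start:
            yield from ends(body[1:], e, chart)


def build_chart(sentence):
    words = sentence.split()
    # Seed the chart with every lexical category of every word.
    chart = {(cat, i, i + 1)
             for i, w in enumerate(words)
             for cat, entries in LEXICON.items() if w in entries}
    changed = True
    while changed:  # repeat until no rule can add a new constituent
        changed = False
        for head, body in RULES:
            for start in range(len(words)):
                # Materialise the matches before modifying the chart.
                for end in list(ends(body, start, chart)):
                    if (head, start, end) not in chart:
                        chart.add((head, start, end))
                        changed = True
    return chart


chart = build_chart("the man shot an elephant in his pyjamas")
print(("S", 0, 8) in chart)  # True: an S spans the whole input
print(("S", 0, 3) in chart)  # True: a "useless" S over "the man shot"
```

The second line shows the cost of working bottom-up: constituents such as an S over just "the man shot" are built even though they cannot be part of a complete analysis.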
Recursive rules
- Recursive rules call themselves
- We already have some recursive rule pairs
  - NP → det n PP
  - PP → prep NP
- Rules can be immediately recursive
  - AdjG → adj AdjG
  - (the) big fat ugly (man)
Recursive rules
Left recursive: AdjG → AdjG adj; AdjG → adj
Right recursive: AdjG → adj AdjG; AdjG → adj
[Tree figures: the left-recursive and right-recursive AdjG structures over "big fat rich old"]
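7. Top-down with left recursion
Grammar: NP → det n; NP → det AdjG n; AdjG → AdjG adj; AdjG → adj
Input: the big fat rich old man
Rules applied:
  NP → det n; NP → det AdjG n
  AdjG → AdjG adj; AdjG → adj
You can't have left-recursive rules with a top-down parser, even if the non-recursive rule is first. Expanding AdjG → AdjG adj predicts another AdjG at the same input position without consuming a word, so the expansion can repeat indefinitely.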
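The problem can be made visible with a depth-first recognizer that carries an explicit depth limit. This is a sketch under stated assumptions: the lexicon entries are assumed (the slides do not list a lexicon for this grammar), and `DepthExceeded`, `max_depth` and the limit of 50 are hypothetical choices made only so that the demonstration terminates.

```python
LEFT_RECURSIVE = {
    "NP": [("det", "n"), ("det", "AdjG", "n")],
    "AdjG": [("AdjG", "adj"), ("adj",)],
}
RIGHT_RECURSIVE = {
    "NP": [("det", "n"), ("det", "AdjG", "n")],
    "AdjG": [("adj", "AdjG"), ("adj",)],
}
# Assumed lexicon: not given on the slides for this grammar.
LEXICON = {"det": {"the"}, "adj": {"big", "fat", "rich", "old"}, "n": {"man"}}


class DepthExceeded(Exception):
    """Raised when expansions nest deeper than the (arbitrary) limit."""


def expand(grammar, symbols, words, pos, depth, max_depth):
    if depth > max_depth:
        raise DepthExceeded(f"still at word {pos} after {depth} nested expansions")
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        if pos < len(words) and words[pos] in LEXICON[first]:
            yield from expand(grammar, rest, words, pos + 1, depth + 1, max_depth)
    else:
        for body in grammar[first]:
            # A left-recursive body puts the same non-terminal back at the
            # front of the list without moving `pos` forward.
            yield from expand(grammar, body + rest, words, pos, depth + 1, max_depth)


def recognize(grammar, sentence, max_depth=50):
    words = sentence.split()
    return any(end == len(words)
               for end in expand(grammar, ("NP",), words, 0, 0, max_depth))


sentence = "the big fat rich old man"
print(recognize(RIGHT_RECURSIVE, sentence))  # True
try:
    recognize(LEFT_RECURSIVE, sentence)
except DepthExceeded as err:
    print("gave up:", err)
```

Without the depth limit, the left-recursive case would simply keep expanding AdjG until the interpreter ran out of stack.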
7. Top-down with right recursion
Grammar: NP → det n; NP → det AdjG n; AdjG → adj AdjG; AdjG → adj
Input: the big fat rich old man
Rules applied:
  NP → det n; NP → det AdjG n
  AdjG → adj AdjG; AdjG → adj
8. Bottom-up with left and right recursion
Grammar: NP → det n; NP → det AdjG n; AdjG → AdvG adj AdjG; AdjG → adj; AdvG → AdvG adv; AdvG → adv
Input: the very very fat ugly man
The AdjG rule is right recursive; the AdvG rule is left recursive.
Rules applied:
  AdjG → adj
  AdvG → adv
  AdvG → AdvG adv
  AdjG → AdvG adj AdjG
  NP → det AdjG n
Quite a few useless paths, but overall no difficulty.
8. Bottom-up with left and right recursion (continued)
[Chart figure: the completed analysis, with NP, AdjG and AdvG constituents built over the categories det adv adv adj adj n for "the very very fat ugly man"]
Empty rules
- For example
  - NP → det AdjG n
  - AdjG → adj AdjG
  - AdjG → ε
- Equivalent to
  - NP → det AdjG n
  - NP → det n
  - AdjG → adj
  - AdjG → adj AdjG
- Or
  - NP → det (AdjG) n
  - AdjG → adj (AdjG)
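The claimed equivalence can be spot-checked by enumerating every category string of bounded length that each grammar derives and comparing the two sets. This is only a sanity check up to a length bound, not a proof, and the names `WITH_EMPTY`, `WITHOUT_EMPTY` and `language` are assumptions. The pruning relies on every expansion in these two grammars eventually adding a terminal; it is not a general-purpose enumerator.

```python
# The empty rule is written as an empty tuple.
WITH_EMPTY = {
    "NP": [("det", "AdjG", "n")],
    "AdjG": [("adj", "AdjG"), ()],
}
WITHOUT_EMPTY = {
    "NP": [("det", "AdjG", "n"), ("det", "n")],
    "AdjG": [("adj",), ("adj", "AdjG")],
}


def language(grammar, start, max_len):
    """All category strings of length <= max_len derivable from `start`."""
    results, seen, agenda = set(), set(), [(start,)]
    while agenda:
        form = agenda.pop()
        if form in seen:
            continue
        seen.add(form)
        n_terminals = sum(1 for s in form if s not in grammar)
        if n_terminals > max_len:
            continue  # already too long: prune this branch
        if n_terminals == len(form):
            results.add(form)  # no non-terminals left
            continue
        # Rewrite the leftmost non-terminal with each of its rules.
        i = next(k for k, s in enumerate(form) if s in grammar)
        for body in grammar[form[i]]:
            agenda.append(form[:i] + body + form[i + 1:])
    return results


a = language(WITH_EMPTY, "NP", 5)
b = language(WITHOUT_EMPTY, "NP", 5)
print(a == b)     # True
print(sorted(a))
```

Up to length 5 both grammars derive det n, det adj n, det adj adj n and det adj adj adj n.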
7. Top-down with empty rules
Grammar: NP → det AdjG n; AdjG → adj AdjG; AdjG → ε
Inputs: the man; the big fat man
Rules applied (for each input):
  NP → det AdjG n
  AdjG → adj AdjG; AdjG → ε
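8. Bottom-up with empty rules
Grammar: NP → det AdjG n; AdjG → adj AdjG; AdjG → ε
Input: the fat man
Rules applied:
  AdjG → ε
  AdjG → adj AdjG
  NP → det AdjG n
Lots of useless paths, especially in a long sentence, since an empty AdjG can be hypothesised at every position in the input, but otherwise no difficulty.
[Chart figure: NP and AdjG built over the categories det adj n for "the fat man"]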
Top-down vs. bottom-up
- Bottom-up builds many useless trees
- Top-down can propose false trails, sometimes quite long, which are only abandoned when they reach the word level
  - Especially a problem if breadth-first
- Bottom-up is very inefficient with empty rules
- Top-down CANNOT handle left recursion
- Top-down cannot do partial parsing
  - Partial parsing is especially useful for speech
- Wouldn't it be nice to combine them to get the advantages of both?
Left-corner parsing
- The left corner of a rule is the first symbol after the rewrite arrow
  - e.g. in S → NP VP, the left corner is NP.
- Left-corner parsing starts bottom-up, taking the first item off the input and finding a rule for which it is the left corner.
- This provides a top-down prediction, but we continue working bottom-up until the prediction is fulfilled.
- When a rule is completed, apply the left-corner principle again: is that completed constituent a left corner?
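The steps above can be sketched as a small left-corner recognizer for the simple grammar from the earlier slides. This is a basic version under stated assumptions: it omits refinements such as checking in advance whether a category can be a left corner of the current goal, it assumes a grammar without empty rules, and the names `parse`, `complete` and `parse_seq` are hypothetical.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
]
LEXICON = {"the": {"det"}, "an": {"det"}, "man": {"n"},
           "elephant": {"n"}, "shot": {"v"}}


def parse(goal, words, pos):
    """Yield end positions after recognizing `goal` starting at `pos`."""
    if pos < len(words):
        # Bottom-up step: take the next word off the input.
        for cat in LEXICON.get(words[pos], ()):
            yield from complete(cat, goal, words, pos + 1)


def complete(found, goal, words, pos):
    """`found` has been recognized; try to grow it into `goal`."""
    if found == goal:
        yield pos
    for head, body in RULES:
        if body[0] == found:  # `found` is the left corner of this rule
            # Top-down prediction: the rest of the body must follow.
            for end in parse_seq(body[1:], words, pos):
                # Rule completed: is its head itself a left corner?
                yield from complete(head, goal, words, end)


def parse_seq(symbols, words, pos):
    """Recognize a sequence of predicted symbols, one after another."""
    if not symbols:
        yield pos
    else:
        for mid in parse(symbols[0], words, pos):
            yield from parse_seq(symbols[1:], words, mid)


def recognize(sentence):
    words = sentence.split()
    return any(end == len(words) for end in parse("S", words, 0))


print(recognize("the man shot an elephant"))  # True
```

For "the man shot an elephant" the order of events is: det is found and is the left corner of NP → det n; the completed NP is the left corner of S → NP VP, which predicts a VP; VP → v completes first but leaves input over, so VP → v NP is tried next.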
9. Left-corner with simple grammar
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Input: the man shot an elephant
Rules applied:
  NP → det n
  S → NP VP
  VP → v
But the text is not all accounted for, so try VP → v NP.
  NP → det n