Title: CPSC 503 Computational Linguistics
1CPSC 503 Computational Linguistics
- Parsing
- Lecture 12
- Giuseppe Carenini
2Today 27/2
- Top-down (TD)
- Bottom-up (BU)
- Comparing TD and BU
- TD depth-first left-to-right
- Adding BU Filtering
- The Earley Algorithm
3Parsing with CFGs
I prefer a morning flight
Parser
CFG
- Assign valid trees: each tree covers all and only the elements of the input and has an S at the top
4Parsing as Search
- Search space of possible parse trees
- S -> NP VP
- S -> Aux NP VP
- NP -> Det Noun
- VP -> Verb
- Det -> a
- Noun -> flight
- Verb -> left
- Aux -> do | does
- Parsing: find all trees that cover all and only the words in the input
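As a rough illustration of what "cover all and only the words" means, the sketch below writes the slide's toy grammar as a Python dict and checks one candidate tree against it. The tuple encoding of trees and the helper names (`leaves`, `is_licensed`, `is_valid_parse`) are assumptions made for this example, not part of the lecture.

```python
# Toy grammar from the slide: left-hand side -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
    "Aux": [["do"], ["does"]],
}


def leaves(tree):
    """Return the words at the leaves of a tree encoded as (label, child, ...)."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def is_licensed(tree, grammar=GRAMMAR):
    """True if every internal node is expanded by some rule of the grammar."""
    if isinstance(tree, str):
        return True
    label, children = tree[0], tree[1:]
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    return rhs in grammar.get(label, []) and all(
        is_licensed(c, grammar) for c in children
    )


def is_valid_parse(tree, words, grammar=GRAMMAR):
    """A valid parse has S at the top and covers all and only the input words."""
    return tree[0] == "S" and is_licensed(tree, grammar) and leaves(tree) == list(words)


tree = ("S", ("NP", ("Det", "a"), ("Noun", "flight")), ("VP", ("Verb", "left")))
print(is_valid_parse(tree, ["a", "flight", "left"]))
```

Parsing as search then amounts to finding every tree for which a check like `is_valid_parse` succeeds, rather than testing one tree supplied by hand.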
5Constraints on Search
I prefer a morning flight
Parser
CFG (search space)
- Search Strategies
- Top-down or goal-directed
- Bottom-up or data-directed
6Top-Down Parsing
- Since we're trying to find trees rooted with an S (Sentence), start with the rules that give us an S.
- Then work your way down from there to the words.
7Next step: Top-Down Space
- When POS categories are reached, reject trees
whose leaves fail to match all words in the input
8Bottom-Up Parsing
- Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
- Then work your way up from there.
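One way to picture the bottom-up search space is as repeated reduction: replace any substring that matches a rule's right-hand side with its left-hand side, and succeed if the whole input reduces to S. The sketch below does this breadth-first over the toy grammar from the earlier slide. The function name `bottom_up_recognize` is hypothetical, and the code assumes a grammar with no empty right-hand sides.

```python
# Toy grammar from the "Parsing as Search" slide.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
    "Aux": [["do"], ["does"]],
}


def bottom_up_recognize(words, grammar, start="S"):
    """Breadth-first search over symbol sequences, reducing RHS -> LHS."""
    rules = [(lhs, tuple(rhs)) for lhs, alts in grammar.items() for rhs in alts]
    frontier = {tuple(words)}
    seen = set(frontier)          # never revisit a sequence
    while frontier:
        next_frontier = set()
        for seq in frontier:
            if seq == (start,):   # the whole input reduced to S
                return True
            for lhs, rhs in rules:
                n = len(rhs)
                for i in range(len(seq) - n + 1):
                    if seq[i:i + n] == rhs:
                        reduced = seq[:i] + (lhs,) + seq[i + n:]
                        if reduced not in seen:
                            seen.add(reduced)
                            next_frontier.add(reduced)
        frontier = next_frontier
    return False


print(bottom_up_recognize("a flight left".split(), GRAMMAR))
print(bottom_up_recognize("flight a left".split(), GRAMMAR))
```

Every sequence explored is consistent with the words, but many of them (for example, an S built over only part of the input) can never lead to a complete parse.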
9Two more steps: Bottom-Up Space
10Top-Down vs. Bottom-Up
- Top-down
- Only searches for trees that can be answers
- But suggests trees that are not consistent with the words
- Bottom-up
- Only forms trees consistent with the words
- But suggests trees that make no sense globally
11So Combine Them
- Top-down control strategy to generate trees
- Bottom-up to filter out inappropriate parses
- Top-down Control strategy
- Depth vs. Breadth first
- Which node to try to expand next
- Which grammar rule to use to expand a node
12Top-Down, Depth-First, Left-to-Right Search
Sample sentence: Does this flight include a meal?
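A top-down, depth-first, left-to-right search can be sketched as a backtracking recursive-descent recognizer. The grammar below is an assumption for this example (the slides show the trees but do not list the rules), and the names `parse`, `expand`, and `recognize` are hypothetical. Rules are tried in the order listed, and the leftmost symbol of each right-hand side is expanded first.

```python
# Assumed grammar fragment for the sample sentence; not taken from the slides.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["this"], ["a"]],
    "Noun": [["flight"], ["meal"]],
    "Verb": [["include"]],
    "Aux": [["does"]],
}


def parse(symbol, words, pos, grammar):
    """Yield every input position reachable after deriving `symbol` from `pos`."""
    if symbol not in grammar:                 # terminal: must match the next word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in grammar[symbol]:               # try rules in the order listed
        yield from expand(rhs, words, pos, grammar)


def expand(rhs, words, pos, grammar):
    """Derive the symbols of `rhs` one by one, leftmost first, with backtracking."""
    if not rhs:
        yield pos
        return
    for mid in parse(rhs[0], words, pos, grammar):
        yield from expand(rhs[1:], words, mid, grammar)


def recognize(words, grammar, start="S"):
    """True if `start` derives exactly the whole input."""
    return any(end == len(words) for end in parse(start, words, 0, grammar))


print(recognize("does this flight include a meal".split(), GRAMMAR))
```

On the sample sentence the recognizer first tries S -> NP VP, fails because "does" cannot start an NP, and only then backtracks to S -> Aux NP VP. Note that this sketch would recurse without end on a left-recursive rule such as NP -> NP PP.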
13Example
Does this flight include a meal?
14Example
Does this flight include a meal?
15Example
Does this flight include a meal?
16Adding Bottom-up Filtering
- The following sequence was a waste of time
because an NP cannot generate a parse tree
starting with an AUX
17Bottom-Up Filtering
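Bottom-up filtering is commonly implemented with a left-corner table: for each category, the set of symbols that can appear first in something it derives. The sketch below computes such a table by transitive closure. The grammar fragment (with part-of-speech tags as the terminals) and the names `left_corners` and `can_start` are assumptions for illustration.

```python
# Assumed phrase-structure rules; part-of-speech tags act as terminals here.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def left_corners(grammar):
    """Map each nonterminal to every symbol that can begin one of its derivations."""
    table = {lhs: {rhs[0] for rhs in alts if rhs} for lhs, alts in grammar.items()}
    changed = True
    while changed:                            # repeat until no set grows
        changed = False
        for corners in table.values():
            for sym in list(corners):
                new = table.get(sym, set()) - corners
                if new:
                    corners |= new
                    changed = True
    return table


def can_start(category, pos_tag, table):
    """Filter: is it worth expanding `category` when the next word has `pos_tag`?"""
    return pos_tag == category or pos_tag in table.get(category, set())


TABLE = left_corners(GRAMMAR)
print(sorted(TABLE["NP"]))
print(can_start("NP", "Aux", TABLE))
```

With this table, a top-down parser looking at "Does" (an Aux) can skip S -> NP VP immediately, because Aux is not a left corner of NP.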
18Problems with TD-BU-filtering
- Left recursion
- Ambiguity
- Repeated Parsing
- SOLUTION: Earley Algorithm
- (once again dynamic programming!)
19(1) Left-Recursion
- These rules appear in most English grammars
- S -> S and S
- VP -> VP PP
- NP -> NP PP
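To see why such rules trouble a top-down, depth-first, left-to-right parser, the small sketch below mimics what the parser does when it keeps expanding the leftmost symbol with the same left-recursive rule. The function name `leftmost_expansions` and the step limit are assumptions for this example; a real parser of that kind would have no such limit.

```python
def leftmost_expansions(symbol, rule_rhs, steps):
    """Expand the leftmost symbol with the same rule `steps` times, recording each frontier."""
    frontier = [symbol]
    history = [list(frontier)]
    for _ in range(steps):
        frontier = list(rule_rhs) + frontier[1:]   # rewrite only the leftmost symbol
        history.append(list(frontier))
    return history


# NP -> NP PP: the leftmost symbol is always NP again, so no word is ever consumed.
for frontier in leftmost_expansions("NP", ["NP", "PP"], 3):
    print(" ".join(frontier))
```

The frontier grows by one PP per step while the leftmost symbol stays NP, so the parser never reaches a point where it compares its prediction against the input.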
20(2) Ambiguity
- I shot an elephant in my pajamas
21(3) Repeated Work
- Parsing is hard, and slow. It's wasteful to redo stuff over and over and over.
- Consider an attempt to top-down parse the following as an NP:
- A flight from Indi to Houston on TWA
22- NP -> Det Nom
- NP -> NP PP
- Nom -> Noun
23- NP -> Det Nom
- NP -> NP PP
- Nom -> Noun
24
25 26 27Dynamic Programming
- Fills tables with solutions to subproblems
- Parsing: sub-trees consistent with the input, once discovered, are stored and can be reused
- Does not fall prey to left-recursion
- Stores ambiguous parses compactly
- Does not do (avoidable) repeated work
28Earley Parsing
- Fills a table in a single sweep over the input words
- Table is length N+1, where N is the number of words
- Table entries represent
- Completed constituents and their locations
- In-progress constituents
- Predicted constituents
29States
- The table entries are called states and are represented with dotted rules.
- S -> • VP (a VP is predicted)
- NP -> Det • Nominal (an NP is in progress)
- VP -> V NP • (a VP has been found)
30States/Locations
- Each state has a location indicating the portion of the input it applies to
- S -> • VP [0,0] (a VP is predicted at the start of the sentence)
- NP -> Det • Nominal [1,2] (an NP is in progress; the Det goes from 1 to 2)
- VP -> V NP • [0,3] (a VP has been found starting at 0 and ending at 3)
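A dotted rule plus its location maps naturally onto a small immutable record. The sketch below is one possible encoding; the class name `State` and its field and method names are assumptions for illustration, not the lecture's notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str        # left-hand side of the rule
    rhs: tuple      # right-hand side symbols
    dot: int        # how many RHS symbols have been recognized so far
    start: int      # input position where the constituent begins
    end: int        # input position reached so far

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """The symbol just after the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


predicted = State("S", ("VP",), 0, 0, 0)
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)
found = State("VP", ("V", "NP"), 2, 0, 3)
for state in (predicted, in_progress, found):
    print(state, "| complete:", state.is_complete())
```

Making the record frozen means a state can be stored in a set or compared for equality, which is convenient for checking whether a state is already in the table.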
31Graphically
S -> • VP [0,0]
NP -> Det • Nominal [1,2]
VP -> V NP • [0,3]
32Earley answer
- As with most dynamic programming approaches, the answer is found by looking in the table in the right place.
- In this case, the following state should be in the final column:
S -> α • [0,N]
- i.e., an S state that spans from 0 to N and is complete.
33Earley processes
- So sweep through the table from 0 to N
- New predicted states are created
- E.g., S -> • VP [0,0] => VP -> • Verb [0,0]
- New incomplete states are created by advancing existing states as new constituents are discovered
- E.g., VP -> • Verb NP [..] => VP -> Verb • NP [..]
- New complete states are created in the same way.
- E.g., VP -> Verb • NP [..] => VP -> Verb NP • [..]
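The three processes are usually called predictor, scanner, and completer. The following is a minimal recognizer sketch under several assumptions: states are plain tuples `(lhs, rhs, dot, origin)` stored in the column where they end, the grammar has no empty rules, words are looked up in a separate `LEXICON`, and a dummy `GAMMA -> S` state seeds the table. The grammar, lexicon, and all names are illustrative, not taken from the slides.

```python
# Assumed grammar and lexicon for illustration.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"},
           "flight": {"Noun"}, "does": {"Aux"}}


def earley_recognize(words, grammar, lexicon, start="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]        # N+1 columns for N words

    def add(state, column):
        if state not in chart[column]:        # never add a state twice
            chart[column].append(state)

    add(("GAMMA", (start,), 0, 0), 0)         # dummy start state
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):              # the column may grow as we go
            lhs, rhs, dot, origin = chart[i][j]
            j += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:            # predictor
                    for alt in grammar[nxt]:
                        add((nxt, tuple(alt), 0, i), i)
                elif i < n and nxt in lexicon.get(words[i], ()):   # scanner
                    add((nxt, (words[i],), 1, i), i + 1)
            else:                             # completer
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        add((plhs, prhs, pdot + 1, porigin), i)
    # Success: a complete S spanning the whole input, seen via the dummy state.
    return ("GAMMA", (start,), 1, 0) in chart[n]


print(earley_recognize("book that flight".split(), GRAMMAR, LEXICON))
```

Because the completer builds a new tuple instead of modifying the waiting state, the original incomplete state stays in its column and can be advanced again by a later constituent.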
34Example
Book that flight
- We should find an S from 0 to 3 that is a
completed state
35Example
Book that flight
36So far only a recognizer
- To generate all parses
- When old states waiting for the just-completed constituent are updated, add a pointer from each updated state to the completed state
- Then simply read off all the backpointers from every complete S in the last column of the table
37Earley and Left Recursion
- So Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search.
- Never place a state into the chart that's already there
- Copy states before advancing them
38Earley and Left Recursion 1
- S -> NP VP
- NP -> NP PP
- The first rule predicts
- S -> • NP VP [0,0], which adds
- NP -> • NP PP [0,0]
- and stops there, since adding any subsequent prediction would be fruitless
39Earley and Left Recursion 2
- When a state gets advanced, make a copy and leave the original alone
- Say we have NP -> • NP PP [0,0]
- We find an NP from 0 to 2, so we create
- NP -> NP • PP [0,2]
- But we leave the original state as is
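Both points can be checked on the two-rule grammar from the previous slide. In the sketch below, states are tuples `(lhs, rhs, dot, start, end)`; the names `predict_all` and `advance` are hypothetical. Prediction terminates because a state already in the column is not added again, and advancing returns a new tuple, so the original state is untouched.

```python
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["NP", "PP"]]}


def predict_all(seed, grammar, position=0):
    """Close a column under prediction, adding each state at most once."""
    column = [seed]
    j = 0
    while j < len(column):
        lhs, rhs, dot, start, end = column[j]
        j += 1
        if dot < len(rhs) and rhs[dot] in grammar:
            for alt in grammar[rhs[dot]]:
                new = (rhs[dot], tuple(alt), 0, position, position)
                if new not in column:         # duplicate check stops the recursion
                    column.append(new)
    return column


def advance(state, new_end):
    """Return a copy of `state` with the dot moved one symbol to the right."""
    lhs, rhs, dot, start, end = state
    return (lhs, rhs, dot + 1, start, new_end)


column = predict_all(("S", ("NP", "VP"), 0, 0, 0), GRAMMAR)
original = column[1]                          # NP -> • NP PP [0,0]
advanced = advance(original, 2)               # NP -> NP • PP [0,2]
print(len(column), original, advanced)
```

Predicting NP from NP -> • NP PP [0,0] would produce the same state again, so the column stays at two entries instead of growing forever.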
40Dynamic Programming Approaches
- Earley
- Top-down, no filtering, no restriction on grammar form
- CYK
- Bottom-up, no filtering, grammars restricted to Chomsky Normal Form (CNF)
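For contrast with Earley, a CYK recognizer can be sketched in a few lines once the grammar is in CNF. The small CNF grammar below is an assumption for illustration (it is not given in the slides), as are the names `cyk_recognize`, `BINARY_RULES`, and `LEXICAL_RULES`. Cell `table[i][j]` holds every category that spans words i up to (but not including) j.

```python
# Assumed CNF grammar: binary rules A -> B C and lexical rules A -> word.
BINARY_RULES = [
    ("S", ("Verb", "NP")),
    ("VP", ("Verb", "NP")),
    ("NP", ("Det", "Noun")),
]
LEXICAL_RULES = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def cyk_recognize(words, binary_rules, lexical_rules, start="S"):
    n = len(words)
    if n == 0:
        return False
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):          # spans of length 1
        table[i][i + 1] = set(lexical_rules.get(word, ()))
    for span in range(2, n + 1):              # longer spans, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # every split point
                for lhs, (left, right) in binary_rules:
                    if left in table[i][k] and right in table[k][j]:
                        table[i][j].add(lhs)
    return start in table[0][n]


print(cyk_recognize("book that flight".split(), BINARY_RULES, LEXICAL_RULES))
```

The table is filled purely from the words upward, with no top-down prediction, which is why the grammar must first be converted to the restricted binary form.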
41Next Time
- Read Chapter 11: Features and Unification