Chapter 9: Parsing with Context-Free Grammars

Slides: 54

Provided by: Inderje9

1
Chapter 9 Parsing with Context-Free Grammars

Heshaam Faili
hfaili_at_ece.ut.ac.ir
University of Tehran

2
Context-Free Grammars

Context-Free Grammars are of the form
A → α, where α is a string of terminals and/or
non-terminals
Note that the regular grammars are a proper
subset of the context-free grammars.
This means that every regular grammar is
context-free, but there are context-free grammars
that aren't regular
CFGs only specify what trees look like, not how
they should be computationally derived → We need
to parse the sentences
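
Illustration (not from the original slides): a standard example of a context-free language that is not regular is a^n b^n, generated by S → a S b | ε. The Python sketch below enumerates its short strings. The names GRAMMAR and derive are hypothetical, and the search assumes that every recursive rule adds at least one terminal, so that the length bound guarantees termination.

```python
# Grammar as a dict: non-terminal -> list of right-hand sides (lists of symbols).
# An empty list stands for the empty string (epsilon).
GRAMMAR = {"S": [["a", "S", "b"], []]}


def derive(grammar, start, max_len):
    """Return all terminal strings with at most max_len symbols derivable from start."""
    results = set()
    seen = set()
    agenda = [(start,)]
    while agenda:
        form = agenda.pop()
        n_terminals = sum(1 for sym in form if sym not in grammar)
        if form in seen or n_terminals > max_len:
            continue  # already explored, or too long to be of interest
        seen.add(form)
        # Find the leftmost non-terminal, if any.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add("".join(form))  # only terminals left: a derived string
            continue
        for rhs in grammar[form[idx]]:
            agenda.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return results


print(sorted(derive(GRAMMAR, "S", 6), key=len))
```

Every derived string has matching numbers of a's and b's, which a finite-state device cannot enforce for unbounded n.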

3
Parsing Intro

Input: a string
Output: a (single) parse tree
A useful step in the process of obtaining meaning
We can view the problem as searching through all
possible parses (tree structures) to find the
right one
Strategies
Top-Down (goal-directed) vs. Bottom-Up
(data-directed)
Breadth-First vs. Depth-First
Adding Bottom-Up to Top-Down: Left-Corner Parsing
Example:
Book that flight!

4
Grammar and Desired Tree
5
Top-Down Parsing

Expand rules, starting with S and working down to
leaves
Replace the left-most non-terminal with each of
its possible expansions.
The full search is on p. 361, Fig. 10.3
While we guarantee that any parse in progress
will be S-rooted, it will expand non-terminals
that can't lead to the existing input
e.g., 5 of 6 trees in third ply level of the
search space
None of the trees take the properties of the
lexical items into account until the last stage

6
Top-down (breadth-first) parsing
7
Expansion techniques

Breadth-First Expansion (shown in figure)
All the nodes at each level are expanded once
before going to the next (lower) level.
This is memory intensive when many grammar rules
are involved
Depth-First (shown on p. 367, Fig. 10.7)
Expand a particular node at a level, only
considering an alternate node at that level if
the parser fails as a result of the earlier
expansion
i.e., expand the tree all the way down until you
can't expand any more

8
Top-down (depth-first) parsing
Does this flight include a meal?
9
Top-Down Depth-First Parsing

There are still some choices that have to be
made
1. Which leaf node should be expanded first?
Left-to-right strategy moves through the leaf
nodes in a left-to-right fashion
2. Which rule should be applied first?
There are multiple NP rules: which should be used
first?
Can just use the textual order of rules from the
grammar
There may be reasons to take rules in a
particular order (e.g., probabilities)

10
Parsing with an agenda

Search states are kept in an agenda
Search states consist of partial trees and a
pointer to the next input word in the sentence
Based on what we've seen before, apply the next
item on the agenda to the current tree
Add new items to (the front of) the agenda, based
on the rules in the grammar which can expand at
the (leftmost) node
We maintain the depth-first strategy by adding
new hypotheses (rules) to the front of the agenda
If we added them to the back, we would have a
breadth-first strategy
See Figure 10.6, p. 366
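
A minimal sketch of the agenda idea above, assuming a recognizer rather than a tree builder: each search state is simplified to the tuple of symbols still to be expanded plus a pointer to the next input word. The grammar fragment, the lexicon and the name top_down_recognize are illustrative assumptions, and the grammar is deliberately not left-recursive.

```python
from collections import deque

GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun", "Nominal"], ["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
LEXICON = {
    "book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"},
    "does": {"Aux"}, "Houston": {"ProperNoun"},
}


def top_down_recognize(words, depth_first=True):
    """Return True if the words can be derived from S by top-down search."""
    agenda = deque([(("S",), 0)])
    while agenda:
        pending, pos = agenda.popleft()
        if not pending:
            if pos == len(words):
                return True  # nothing left to expand and all input consumed
            continue
        first, rest = pending[0], pending[1:]
        if first in GRAMMAR:
            # Expand the leftmost non-terminal with each rule, in textual order.
            new_states = [(tuple(rhs) + rest, pos) for rhs in GRAMMAR[first]]
        elif pos < len(words) and first in LEXICON.get(words[pos], set()):
            # The leftmost symbol is a part of speech matching the next word.
            new_states = [(rest, pos + 1)]
        else:
            new_states = []  # dead end: this hypothesis is dropped
        if depth_first:
            agenda.extendleft(reversed(new_states))  # front of agenda
        else:
            agenda.extend(new_states)  # back of agenda gives breadth-first
    return False


print(top_down_recognize("book that flight".split()))
```

Switching depth_first to False changes only where new hypotheses are placed on the agenda, which is the point made on the slide.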

11
Bottom-Up Parsing

Bottom-Up Parsing is input-driven → start from
the words and move up to form a tree
Here we match one or more nodes on the upper
fringe of the parse tree against the right-hand
side of a CFG rule, building the left-hand side
as a parent node of those nodes.
We can also have breadth-first and depth-first
approaches
The example on the next slide (p. 362, Fig. 10.4)
moves in a breadth-first fashion
While any parse in progress will be tied to the
input, many may not lead to an S!
e.g., left-most trees in plies 1-4 of Fig 10.4
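
A rough sketch of breadth-first bottom-up recognition, under the assumption that lexical entries are written as ordinary rules and that the search state is just the upper fringe of the partial trees (a tuple of symbols). RULES and bottom_up_recognize are hypothetical names, and the rule set is a small fragment chosen for "Book that flight".

```python
from collections import deque

# Each rule is (left-hand side, right-hand side); words appear as terminals.
RULES = [
    ("S", ("NP", "VP")), ("S", ("VP",)), ("S", ("Aux", "NP", "VP")),
    ("NP", ("Det", "Nominal")), ("NP", ("ProperNoun",)),
    ("Nominal", ("Noun", "Nominal")), ("Nominal", ("Noun",)),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
    ("Det", ("that",)), ("Noun", ("book",)), ("Noun", ("flight",)),
    ("Verb", ("book",)),
]


def bottom_up_recognize(words, rules=RULES, start="S"):
    """Return True if repeated reductions can turn the words into the start symbol."""
    first_fringe = tuple(words)
    queue = deque([first_fringe])
    seen = {first_fringe}
    while queue:
        fringe = queue.popleft()
        if fringe == (start,):
            return True
        for lhs, rhs in rules:
            n = len(rhs)
            for i in range(len(fringe) - n + 1):
                if fringe[i:i + n] == rhs:
                    # Replace the matched right-hand side by its parent node.
                    reduced = fringe[:i] + (lhs,) + fringe[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        queue.append(reduced)
    return False


print(bottom_up_recognize("book that flight".split()))
```

Many of the fringes explored (for example those that treat "book" as a Noun) never reduce to S, which is the inefficiency noted above.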

12
Bottom-up parsing
13
Comparing Top-Down and Bottom-Up Parsing

Top-Down
While we guarantee that any parse in progress
will be S-rooted, it will expand non-terminals
that can't lead to the existing input, e.g.,
first 4 trees in third ply.
Bottom-Up
While any parse in progress will be tied to the
input, many may not lead to an S, e.g., left-most
trees in plies 1-4 of p. 362, Fig 10.4.
So, both pure top-down and pure bottom-up
approaches are highly inefficient.

14
Left-Corner Parsing

Motivation
Both pure top-down and bottom-up approaches are
inefficient
The correct top-down parse has to be consistent
with the left-most word of the input
Left-corner parsing: a way of using bottom-up
constraints as part of a top-down strategy.
Left-corner rule: expand a node with a grammar
rule only if the current input can serve as the
left corner of this rule.
Left corner of a rule: the first word along the
left edge of a derivation from the rule
Put the left-corners into a table, which can then
guide parsing

15
Left-Corner Example

S → NP VP
S → VP
S → Aux NP VP
NP → Det Nominal | ProperNoun
Nominal → Noun Nominal | Noun
VP → Verb | Verb NP
Noun → book | flight | meal | money
Verb → book | include | prefer
Aux → does
ProperNoun → Houston | TWA

Left Corners
S > NP > Det, ProperNoun
VP > Verb
Aux > Aux
NP > Det, ProperNoun
VP > Verb
Nominal > Noun
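
The left-corner table for this grammar can be computed mechanically. The following is a sketch, assuming the grammar has no empty right-hand sides and that any symbol without rules of its own is a part of speech; GRAMMAR and left_corner_table are hypothetical names.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun", "Nominal"], ["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def left_corner_table(grammar):
    """Map each non-terminal to the parts of speech that can start it."""
    table = {}
    for nonterminal in grammar:
        corners, visited, stack = set(), set(), [nonterminal]
        while stack:
            symbol = stack.pop()
            if symbol in visited:
                continue
            visited.add(symbol)
            for rhs in grammar[symbol]:
                first = rhs[0]
                if first in grammar:
                    stack.append(first)  # follow the left edge further down
                else:
                    corners.add(first)   # reached a part of speech
        table[nonterminal] = corners
    return table


for category, corners in left_corner_table(GRAMMAR).items():
    print(category, sorted(corners))
```

A top-down parser can consult this table before expanding a node: if no part of speech of the current word is in the node's entry, the expansion cannot succeed.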

16
Other problems: Left-Recursion

Left-corner parsers are still guided by top-down
parsing
Consider rules like
S → S and S
NP → NP PP
A top-down left-to-right depth-first parser could
apply a rule to expand a node (e.g., S), and then
apply that same rule again, and again, ad
infinitum.
Left Recursion: A grammar is left-recursive if a
non-terminal leads to a derivation that includes
itself as its leftmost immediate or non-immediate
child (i.e., along its leftmost branch).
PROBLEM: Top-Down parsers may not terminate on a
left-recursive grammar
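
Left recursion as defined here can be detected by following leftmost children. This is a small sketch, assuming no rule has an empty right-hand side (with empty rules, later symbols could also end up leftmost); the grammar dictionaries and the function name are illustrative.

```python
def left_recursive_nonterminals(grammar):
    """Return the non-terminals that can reach themselves along a leftmost branch."""
    def reaches_itself(target):
        stack = [rhs[0] for rhs in grammar[target] if rhs]
        visited = set()
        while stack:
            symbol = stack.pop()
            if symbol == target:
                return True
            if symbol in visited or symbol not in grammar:
                continue  # already explored, or a terminal / part of speech
            visited.add(symbol)
            stack.extend(rhs[0] for rhs in grammar[symbol] if rhs)
        return False

    return {nt for nt in grammar if reaches_itself(nt)}


IMMEDIATE = {
    "S": [["S", "and", "S"], ["NP", "VP"]],
    "NP": [["NP", "PP"], ["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}
# Non-immediate left recursion: A starts with B, and B starts with A.
NON_IMMEDIATE = {"A": [["B", "x"]], "B": [["A", "y"], ["z"]]}

print(sorted(left_recursive_nonterminals(IMMEDIATE)))
print(sorted(left_recursive_nonterminals(NON_IMMEDIATE)))
```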

17
Other problems: Repeated Parsing of Subtrees

When parser backtracks to an alternative
expansion of a non-terminal, it loses all parses
of subconstituents that it built.
There is a good chance that it will rebuild the
parses of some of those constituents again.
This can occur repeatedly.
a flight from Indianapolis to Houston on TWA
NP → Det Nom
Will build an NP for "a flight", before failing
when the parser realizes the input PPs aren't
covered
NP → NP PP
Will again build an NP for "a flight", before
failing when the parser realizes the two
remaining PPs in the input aren't covered

18
Other problems: Ambiguity

Repeated parsing of subtrees is even more of a
problem for ambiguous sentences
PP attachment
NP or VP: I shot an elephant in my pajamas.
NP bracketing: the meal on flight 286 from SF
to Denver
Coordination
[old men] and [women] vs. old [men and women]
3 kinds of ambiguities: attachment, coordination,
noun-phrase bracketing.
Parsers have to disambiguate between lots of
valid parses or return all parses
Will repeat a lot of work parsing the
commonalities of each ambiguity

19
Ambiguity
20
Addressing the problems: Chart Parsing

More or less a standard method for carrying out
parsing: it keeps tables of constituents that have
been parsed earlier, so it doesn't duplicate
the work.
Each possible sub-tree is represented exactly
once.
This makes it a form of dynamic programming
(which we saw with minimum edit distance and the
Viterbi algorithm)
Combines bottom-up and top-down parsing
Rather simple and elegant in the way it works!

21
Earley Chart Parsing Representation

The parser uses a representation for parse state
based on dotted rules: S → NP • VP
Dotted rules distinguish what has been seen so
far from what has not been seen (i.e., the
remainder).
The constituents seen so far are to the left of
the dot in the rule, the remainder is to the
right.
Parse information is stored in a chart,
represented as a graph.
The nodes represent word positions.
The labels represent the portion (using the dot
notation) of the grammar rule that spans that
word position.
→ In other words, at each position, there is a
set of labels (each of which is a dotted rule,
also called a state), indicating the partial
parse tree produced until then.

22
Example: Chart for "A Dog"

Given a trivial grammar
NP → D N
D → a
N → dog
Here's the chart for the complete parse of "a
dog"
0 D → a • 1 (scan)
1 N → dog • 2 (scan)
0 NP → • D N 0 (predict)
0 NP → D • N 1 (complete)
0 NP → D N • 2 (complete)

23
More Chart Parsing Terminology

A state is complete if it has a dot at the
right-hand side of its rule. Otherwise, it is
incomplete.
At each position, there is a list (actually, a
queue) of states.
The parser moves through the N+1 sets of states
in the chart left-to-right, processing the states
in each set in order.
States will be stored in a FIFO (first-in
first-out) queue at each start position
The processing applies one of three operators,
each of which takes a state and produces new
states added to the chart.
Scanner, Predictor, Completer
There is no backtracking.

24
Earley Parsing Algorithm

The parsing algorithm is just a few lines long,
as can be seen on p. 381, Figure 10.16
In the top level loop, for each position, for
each state, it calls the predictor, or else the
scanner, or else the completer.
The algorithm never backtracks and never removes
states, so we don't redo any work
The goal is to have S → α • as the last chart
entry, i.e. the dot has moved over the entire
input to derive an S

25
The Earley Algorithm
26
The 3 Operators: Predictor, Scanner, Completer

Procedure PREDICTOR((A → α • B β, [i, j]))
  For each (B → γ) in grammar do
    Enqueue((B → • γ, [j, j]), chart[j])
  End
Procedure SCANNER((A → α • B β, [i, j]))
  If B is a part-of-speech for word[j] then
    Enqueue((B → word[j] •, [j, j+1]), chart[j+1])
Procedure COMPLETER((B → γ •, [j, k]))
  For each (A → α • B β, [i, j]) in chart[j] do
    Enqueue((A → α B • β, [i, k]), chart[k])
  End
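
The three operators can be put together into a compact recognizer. The Python below is a sketch, not the textbook's code: it assumes a grammar without empty rules, treats any symbol without grammar rules as a part of speech, and uses a dummy start state named GAMMA. State, GRAMMAR, LEXICON and earley_recognize are names introduced for illustration. The grammar includes the left-recursive rule NP → NP PP to show that it causes no looping here.

```python
from collections import namedtuple

# A dotted rule plus the span [start, end] it covers.
State = namedtuple("State", "lhs rhs dot start end")

GRAMMAR = {
    "S": [("NP", "VP"), ("VP",), ("Aux", "NP", "VP")],
    "NP": [("NP", "PP"), ("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun", "Nominal"), ("Noun",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"},
    "from": {"Prep"}, "Houston": {"ProperNoun"},
}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    chart = [[] for _ in range(len(words) + 1)]

    def enqueue(state, k):
        if state not in chart[k]:  # each state is stored exactly once
            chart[k].append(state)

    enqueue(State("GAMMA", (start,), 0, 0, 0), 0)
    for k in range(len(words) + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] grows while it is processed
            st = chart[k][i]
            i += 1
            if st.dot < len(st.rhs):
                nxt = st.rhs[st.dot]
                if nxt in grammar:  # PREDICTOR
                    for rhs in grammar[nxt]:
                        enqueue(State(nxt, rhs, 0, k, k), k)
                elif k < len(words) and nxt in lexicon.get(words[k], ()):
                    # SCANNER: nxt is a part of speech of the next word
                    enqueue(State(nxt, (words[k],), 1, k, k + 1), k + 1)
            else:  # COMPLETER
                for prev in list(chart[st.start]):
                    if prev.dot < len(prev.rhs) and prev.rhs[prev.dot] == st.lhs:
                        enqueue(State(prev.lhs, prev.rhs, prev.dot + 1,
                                      prev.start, k), k)
    return State("GAMMA", (start,), 1, 0, len(words)) in chart[-1]


print(earley_recognize("book that flight from Houston".split()))
```

As on the slides, this is only a recognizer; returning trees would require each state to carry pointers to the completed states it was built from.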

27
Prediction

Procedure PREDICTOR((A → α • B β, [i, j]))
  For each (B → γ) in grammar do
    Enqueue((B → • γ, [j, j]), chart[j])
  End
Predicting is the task of saying what kinds of
input we expect to see
Add a rule to the chart saying that we have not
seen γ, but when we do, it will form a B
The rule covers no input, so it goes from j to j
Such rules provide the top-down aspect of the
algorithm

28
Scanning

Procedure SCANNER((A → α • B β, [i, j]))
  If B is a part-of-speech for word[j] then
    Enqueue((B → word[j] •, [j, j+1]), chart[j+1])
Scanning reads in lexical items
We add a dotted rule indicating that a word has
been seen between j and j+1
This is then added to the following (j+1) chart
Such a completed dotted rule can be used to
complete other dotted rules
These rules also show how the Earley parser has a
bottom-up component

29
Completion

Procedure COMPLETER((B → γ •, [j, k]))
  For each (A → α • B β, [i, j]) in chart[j] do
    Enqueue((A → α B • β, [i, k]), chart[k])
End
Completion combines two rules in order to move
the dot, i.e., indicate that something has been
seen
A rule covering B has been seen, so any rule A
which refers to B in its RHS moves the dot
Instead of spanning from i to j, A now spans from
i to k, which is where B ended
Once the dot is moved, the rule will not be
created again

30
Example (Book that flight)
31
Example (Book that flight)
32
Example (Book that flight), cont
33
Example (Book that flight), cont
34
Example (Book that flight), cont
35
Earley parsing

The Earley algorithm is efficient, running in
polynomial time.
Technically, however, it is a recognizer, not a
parser
To make it a parser, each state needs to be
augmented with a pointer to the states that its
rule covers
For example, a VP would point to the state where
its V was completed and the state where its NP
was completed

36
Other Dynamic Programming methods

CYK (Cocke-Younger-Kasami) Parser
Using CNF grammar rules
Chart Parsing
Modified version of Earley parsing with dynamic
ordering of states in the algorithm

37
CYK Parsing

A DP method that uses a grammar in Chomsky Normal
Form (CNF)
A → B C
A → m
Any CFG can be converted to CNF,
so we don't lose anything
A → B unit productions can be rewritten as A → γ
for every rule B → γ
Like other DP methods, a simple (n+1) × (n+1)
matrix is used to encode the structure of the
sentence (n = sentence length)
Indices refer to the gaps between words
0 Book 1 that 2 flight 3
[i, j] is the set of non-terminals that represent
all the constituents that span positions i
through j of the input

38
CYK Parsing, cont'd

Since our grammar is in CNF, the non-terminal
entries in the table have exactly two daughters
in the parse.
For each constituent represented by an entry [i,
j] in the table there must be a position in the
input, k, where it can be split into two parts
such that i < k < j.
Given such a k, the first constituent [i, k] must
lie to the left of entry [i, j] somewhere along
row i, and the second entry [k, j] must lie
beneath it, along column j
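
The table-filling scheme just described can be sketched as follows. The CNF rules in BINARY and LEXICAL are an assumed, hand-converted fragment for "Book that flight" (unit productions such as S → VP have been folded into the other rules), and cyk_recognize is a hypothetical name.

```python
# Binary rules A -> B C, indexed by the pair (B, C).
BINARY = {
    ("NP", "VP"): {"S"},
    ("Verb", "NP"): {"S", "VP"},
    ("Det", "Nominal"): {"NP"},
    ("Noun", "Nominal"): {"Nominal"},
}
# Lexical rules A -> word, indexed by the word.
LEXICAL = {
    "book": {"Noun", "Nominal", "Verb", "VP", "S"},
    "that": {"Det"},
    "flight": {"Noun", "Nominal"},
}


def cyk_recognize(words, binary=BINARY, lexical=LEXICAL, start="S"):
    n = len(words)
    # table[i][j] holds the non-terminals spanning positions i through j.
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        table[j - 1][j] = set(lexical.get(words[j - 1], ()))
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):  # split point with i < k < j
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]


print(cyk_recognize("book that flight".split()))
```

Columns are filled left to right and each column bottom to top, so both [i, k] and [k, j] are complete before [i, j] is computed.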

39
CYK Algorithm
40
(No Transcript)
41
CYK example (CNF Grammar)
42
CYK example (Book the flight through Houston)
43
CYK in practice

Has no major theoretical problems
The resulting parse trees are not the ones
syntacticians expect (because of the CNF form)
The syntax-to-semantics mapping becomes complicated
Post-processing is needed to convert the result
back to a more acceptable form

44
Chart Parser

In both the CYK and Earley algorithms, the order
in which events occur (adding entries to the
table, reading words, making predictions, etc.)
is statically determined by the procedures that
make up these algorithms.
Unfortunately, dynamically determining the order
in which events occur based on the current
information is often necessary
Chart Parsing facilitates just such dynamic
determination of the order in which chart entries
are processed.
Using an agenda

45
Chart Parser

The fundamental rule generalizes the ideas in CYK
and Earley:
if the chart contains two edges A → α • B β, [i,
j] and B → γ •, [j, k], then we should add the new
edge A → α B • β, [i, k] to the chart
Prediction can be top-down or bottom-up
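
The fundamental rule itself fits in a few lines. A sketch, assuming edges are stored as dotted rules with a span; Edge and fundamental_rule are names chosen for illustration only.

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int    # number of right-hand-side symbols already found
    start: int
    end: int

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)


def fundamental_rule(active: Edge, passive: Edge) -> Optional[Edge]:
    """Combine an incomplete edge with an adjacent complete edge, if they fit."""
    if (not active.is_complete()
            and passive.is_complete()
            and active.end == passive.start
            and active.rhs[active.dot] == passive.lhs):
        # Move the dot over B and extend the span to where B ended.
        return Edge(active.lhs, active.rhs, active.dot + 1,
                    active.start, passive.end)
    return None


vp = Edge("VP", ("Verb", "NP"), 1, 0, 1)      # VP → Verb • NP, [0, 1]
np = Edge("NP", ("Det", "Nominal"), 2, 1, 3)  # NP → Det Nominal •, [1, 3]
print(fundamental_rule(vp, np))
```

An agenda-driven chart parser would apply this function to each edge taken off the agenda against the edges already in the chart, in whatever order the agenda policy dictates.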

46
(No Transcript)
47
Prediction in Chart Parser
48
Inadequacies of parsing with plain CFGs

While the Earley algorithm works well for CFGs,
we have to at some point question the validity of
using plain CFGs
We'll show this by looking at two phenomena
(although, there are many more)
Subject-verb agreement
Subcategorization frames

49
Modeling Subject-Verb Agreement in CFGs

The flights leave vs. The flight leaves.
S → 3sgNP 3sgVP
S → PluralNP PluralVP
3sgVP → 3sgVerb (flies)
      | 3sgVerb NP (wants a flight)
      | 3sgVerb NP PP (leaves Boston in the morning)
      | 3sgVerb PP (leaves on Thursday)
3sgNP → Pronoun (I)
      | ProperNoun (Denver)
      | Det 3sgNominal (a flight)
3sgNominal → Noun 3sgNominal (morning flight)
           | 3sgNoun (flight)

50
Problems with Modeling Agreement in CFGs

You can see how messy this is, resulting in a
massive increase in the size of the grammar.
Of course, once we add in determiner-noun
agreement (e.g., a flight vs. (the) flights),
it would get even larger.
Other languages which have gender agreement
(e.g., French) will make it even worse.
Furthermore, we miss generalizations: all
transitive verbs have an NP object, regardless of
whether the verb is 3rd singular or not
We will need to go to feature-based grammars to
address these problems.

51
Subcategorization Frames in CFGs

V1. ∅ (eat, sleep): I want to eat
V2. NP (prefer, find, leave): Find [NP the flight
from Pittsburgh to Boston]
V3. NP NP (show, give, find): Show [NP me] [NP the
airlines with flights from Pittsburgh]
V4. PPfrom PPto (fly, travel): I would like to fly
[PP from Boston] [PP to Philadelphia]
V5. NP PPwith (help, load): Can you help [NP me]
[PP with a flight]
V6. VPto (prefer, want, need): I would prefer [VP
to go by United Airlines]
V7. VPbare_stem (can, would, might): I can [VP go
from Boston]
V8. S (mean, imply): Does this mean [S American
has a hub in Boston]

52
CFG Grammar For Subcategorization

VP → V1
   | V2 NP
   | V3 NP NP
   | V4 PPfrom PPto
   | V5 NP PPwith
   | V6 VPto
   | V7 VPbare_stem
   | V8 S
V1 → eat | sleep, ...

53
Problem with Modeling Subcat in CFGs

Again, this results in an explosion in the number
of rules, especially when a full set of
subcategorization frames is included.
If we combine these rules with the agreement
rules, it gets even worse
Nouns, adjectives, and prepositions can
also subcategorize for complements.
And again, we have no way to state what's in
common about these rules
So, we turn to feature-based grammars