Title: Basic Parsing with Context-Free Grammars
1- Basic Parsing with Context-Free Grammars
2. Analyzing Linguistic Units
- Morphological parsing
  - analyze words into morphemes and affixes
  - rule-based, FSAs, FSTs
- N-grams for language modeling
- POS tagging
- Syntactic parsing
  - identify constituents and their relationships
  - to see if a sentence is grammatical
  - to assign an abstract representation of meaning
3. Syntactic Parsing
- Declarative formalisms like CFGs define the legal strings of a language -- but don't specify how to recognize or assign structure to them
- Parsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses
- Parsing is useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition, and almost every task in NLP
4. Parsing as a Form of Search
- Searching FSAs
  - Finding the right path through the automaton
  - Search space defined by structure of FSA
- Searching CFGs
  - Finding the right parse tree among all possible parse trees
  - Search space defined by the grammar
- Constraints provided by the input sentence and the automaton or grammar
5. CFG for Fragment of English
S → NP VP
S → Aux NP VP
S → VP
NP → Det Nom
NP → PropN
Nom → N Nom
Nom → N
Nom → Nom PP
VP → V NP
VP → V
PP → Prep NP
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Prep → from | to | on
PropN → Houston | TWA
Det → that | this | a
6. Parse Tree for "Book that flight" for Prior CFG
[S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]
7. Rule Expansion
Det → that | this | a
8. Top-Down Parser
- Builds from the root S node to the leaves
- Assuming we build all trees in parallel:
  - Find all trees with root S (or all rules w/ lhs S)
  - Next expand all constituents in these trees/rules
  - Continue until leaves are POS
  - Candidate trees failing to match POS of input string are rejected (e.g. "Book that flight" matches only one subtree)
9. Top-Down Search Space for CFG (expanding only leftmost leaves)
[Figure: plies of the top-down search space]
- Ply 1: S
- Ply 2: S → NP VP; S → Aux NP VP; S → VP
- Ply 3: leftmost leaf of each ply-2 tree expanded (NP → Det Nom or NP → PropN; VP → V NP or VP → V)
10. Bottom-Up Parsing
- Parser begins with words of input and builds up trees, applying grammar rules whose rhs match
- Two initial POS assignments for "Book that flight":
  - N Det N
  - V Det N
- "Book" is ambiguous (2 POS appear in grammar)
- Parse continues until an S root node is reached or no further node expansion is possible
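The bottom-up idea can be sketched as a search over sequences of categories. This is a minimal sketch, not the textbook algorithm: it starts from every POS assignment of the input and repeatedly replaces a substring matching some rule's rhs with that rule's lhs, until it reaches the single symbol S or runs out of states. The names `RULES`, `LEXICON`, and `bottom_up_recognize` are made up for illustration; the grammar is the fragment from slide 5.

```python
from collections import deque

# Phrase-structure rules of the English fragment, as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("N", "Nom")), ("Nom", ("N",)), ("Nom", ("Nom", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)),
    ("PP", ("Prep", "NP")),
]

# Word -> set of POS categories ("book" is ambiguous between N and V).
LEXICON = {
    "book": {"N", "V"}, "flight": {"N"}, "meal": {"N"}, "money": {"N"},
    "include": {"V"}, "prefer": {"V"}, "does": {"Aux"},
    "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"},
    "houston": {"PropN"}, "twa": {"PropN"},
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"},
}

def bottom_up_recognize(words):
    """Return True if some sequence of reductions turns the input into S."""
    # One start state per POS assignment of the whole input.
    starts = [()]
    for w in words:
        tags = sorted(LEXICON.get(w.lower(), ()))
        starts = [s + (t,) for s in starts for t in tags]
    seen = set(starts)
    queue = deque(starts)
    while queue:
        state = queue.popleft()
        if state == ("S",):
            return True
        # Try every rule at every position where its rhs matches.
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(state) - n + 1):
                if state[i:i + n] == rhs:
                    new = state[:i] + (lhs,) + state[i + n:]
                    if new not in seen:
                        seen.add(new)
                        queue.append(new)
    return False

print(bottom_up_recognize("Book that flight".split()))
```

Note that the search also visits dead ends such as (VP, NP), which no rule can reduce to S. That is the wasted work bottom-up parsers are criticized for.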
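11. Two Candidates: One Successful Parse
- Candidate 1 (fails): [VP [V Book]] [NP [Det that] [Nom [N flight]]], which would need a rule S → VP NP that is not in the grammar
- Candidate 2 (succeeds): [S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]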
12. What's right/wrong with...
- Top-Down parsers: they never explore illegal parses (e.g. which can't form an S) -- but waste time on trees that can never match the input
- Bottom-Up parsers: they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root)
- For both: find a control strategy -- how to explore the search space efficiently?
  - Pursue all parses in parallel, or backtrack, or ...?
  - Which rule to apply next?
  - Which node to expand next?
13. A Possible Top-Down Parsing Strategy
- Depth-first search
  - Agenda of search states: expand search space incrementally, exploring most recently generated state (tree) each time
  - When you reach a state (tree) inconsistent with input, backtrack to most recent unexplored state (tree)
- Which node to expand?
  - Leftmost or rightmost
- Which grammar rule to use?
  - Order in the grammar? How?
14. Top-Down, Depth-First, Left-Right Strategy
- Initialize agenda with S tree and ptr to first word and make this current search state (cur)
- Loop until successful parse or empty agenda
  - Apply all applicable grammar rules to leftmost unexpanded node of cur
    - If this node is a POS category and matches that of the current input, push this onto agenda
    - O.w. push new trees onto agenda
  - Pop new cur from agenda
- Example: "Does this flight include a meal?"
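The agenda loop above can be sketched as a small recognizer. This is a minimal sketch under stated assumptions, not the textbook algorithm: a search state is (symbols still to expand, input pointer) rather than a full tree, so it answers yes/no instead of building a parse. The names `GRAMMAR`, `LEXICON`, and `top_down_recognize` are hypothetical. The length check is an added guard that is not on the slide: since no rule in this fragment derives the empty string, any state with more pending symbols than remaining words can be dropped, which also keeps the left-recursive rule Nom → Nom PP from looping forever.

```python
# Rules are tried in the order listed (rule ordering = order in the grammar).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP": [["V", "NP"], ["V"]],
    "PP": [["Prep", "NP"]],
}

# POS category -> words of that category.
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a"},
}

def top_down_recognize(words):
    """Depth-first, left-to-right, top-down recognition with an agenda."""
    words = [w.lower() for w in words]
    agenda = [(("S",), 0)]          # stack: most recent state explored first
    while agenda:
        symbols, pos = agenda.pop()  # pop new cur from agenda
        if not symbols:
            if pos == len(words):
                return True          # all symbols expanded, all input used
            continue                 # leftover input: backtrack
        # Assumed guard: every symbol must cover at least one word.
        if len(symbols) > len(words) - pos:
            continue
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:
            # POS category: must match the current input word.
            if words[pos] in LEXICON[first]:
                agenda.append((rest, pos + 1))
        else:
            # Push expansions in reverse so the first rule is popped first.
            for rhs in reversed(GRAMMAR[first]):
                agenda.append((tuple(rhs) + rest, pos))
    return False

print(top_down_recognize("Does this flight include a meal".split()))
```

On the example sentence the parser first tries S → NP VP, fails because "does" is neither Det nor PropN, and backtracks to S → Aux NP VP.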
15. Fig 10.7
CFG
16. Left Corners: Top-Down Parsing with Bottom-Up Filtering
- We saw Top-Down, depth-first, L2R parsing
  - Expands non-terminals along the tree's left edge down to leftmost leaf of tree
  - Moves on to expand down to next leftmost leaf
- Note: in a successful parse, the current input word will be the first word in the derivation of the node the parser is currently processing
- So: look ahead to the left-corner of the tree
  - B is a left-corner of A if A ⇒* B α
  - Build table with left-corners of all non-terminals in grammar and consult before applying rule
17. Left Corners
18. Left-Corner Table for CFG
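A left-corner table for the English fragment can be computed as a transitive closure over the first symbol of each rule's rhs. The sketch below is one possible way to do it, not necessarily how the textbook builds the table; `GRAMMAR`, `POS_TAGS`, `left_corner_table`, and `pos_left_corners` are names assumed for illustration.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP": [["V", "NP"], ["V"]],
    "PP": [["Prep", "NP"]],
}
POS_TAGS = {"Det", "PropN", "Aux", "V", "N", "Prep"}

def left_corner_table(grammar):
    """Map each non-terminal A to every B such that A =>* B alpha."""
    # Direct left corners: the first symbol of each rhs.
    table = {lhs: {rhs[0] for rhs in rules} for lhs, rules in grammar.items()}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for corners in table.values():
            for c in list(corners):
                extra = table.get(c, set()) - corners
                if extra:
                    corners |= extra
                    changed = True
    return table

def pos_left_corners(grammar, pos_tags):
    """Keep only POS categories, the part a parser checks against the input."""
    return {lhs: corners & pos_tags
            for lhs, corners in left_corner_table(grammar).items()}

for lhs, corners in pos_left_corners(GRAMMAR, POS_TAGS).items():
    print(lhs, sorted(corners))
```

A top-down parser would consult this table before applying a rule: expand a non-terminal A only if some POS of the current input word is among A's left corners.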
19. Left Recursion vs. Right Recursion
- Depth-first search will never terminate if grammar is left recursive (e.g. NP → NP PP)
20- Solutions
- Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive
  - e.g. "The man on the hill with the telescope"
  - NP → NP PP (wanted: Nom plus a sequence of PPs)
  - NP → Nom PP
  - NP → Nom
  - Nom → Det N
- becomes
  - NP → Nom NP'
  - Nom → Det N
  - NP' → PP NP' (wanted: a sequence of PPs)
  - NP' → ε
- Not so obvious what these rules mean
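The rewrite can be done mechanically with the standard transformation for immediate left recursion: A → A α | β becomes A → β A' and A' → α A' | ε. The function below is a sketch of that standard transformation; the name `eliminate_immediate_left_recursion` and the list-of-lists rule encoding (with `[]` standing for ε) are assumptions for illustration.

```python
def eliminate_immediate_left_recursion(lhs, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    alternatives is a list of right-hand sides (lists of symbols).
    An empty list in the result stands for epsilon.
    """
    # alpha parts: what follows the left-recursive occurrence of lhs.
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == lhs]
    # beta parts: alternatives that do not start with lhs.
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != lhs]
    if not alphas:
        return {lhs: [list(rhs) for rhs in alternatives]}  # nothing to do
    new = lhs + "'"
    return {
        lhs: [list(beta) + [new] for beta in betas],
        new: [list(alpha) + [new] for alpha in alphas] + [[]],
    }

rewritten = eliminate_immediate_left_recursion(
    "NP", [["NP", "PP"], ["Nom", "PP"], ["Nom"]]
)
for lhs, rules in rewritten.items():
    for rhs in rules:
        print(lhs, "->", " ".join(rhs) or "epsilon")
```

Applied to the NP rules, this yields NP → Nom PP NP' and NP → Nom NP', plus NP' → PP NP' and NP' → ε. The slide's version keeps only NP → Nom NP', which generates the same strings because NP' can already produce the PP.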
21- Harder to detect and eliminate non-immediate left recursion
  - NP → Nom PP
  - Nom → NP
- Fix depth of search explicitly
- Rule ordering: non-recursive rules first
  - NP → Det Nom
  - NP → NP PP
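Detecting left recursion, including the non-immediate kind, amounts to asking whether a non-terminal can reach itself through a chain of leftmost rhs symbols. A minimal sketch, with the hypothetical name `left_recursive_nonterminals` and assuming a grammar with no ε rules:

```python
def left_recursive_nonterminals(grammar):
    """Return the non-terminals A with A =>+ A alpha (directly or indirectly)."""
    # reach[A] = symbols that can appear leftmost in something derived from A.
    reach = {a: {rhs[0] for rhs in rules if rhs} for a, rules in grammar.items()}
    changed = True
    while changed:                      # transitive closure by iteration
        changed = False
        for a in reach:
            for b in list(reach[a]):
                new = reach.get(b, set()) - reach[a]
                if new:
                    reach[a] |= new
                    changed = True
    return {a for a in reach if a in reach[a]}

# Non-immediate left recursion: NP -> Nom PP, Nom -> NP.
indirect = {"NP": [["Nom", "PP"]], "Nom": [["NP"]], "PP": [["Prep", "NP"]]}
print(sorted(left_recursive_nonterminals(indirect)))

# Immediate left recursion: NP -> NP PP.
direct = {"NP": [["Det", "Nom"], ["NP", "PP"]]}
print(sorted(left_recursive_nonterminals(direct)))
```

In the first grammar neither rule is left-recursive on its own, yet both NP and Nom are flagged because NP ⇒ Nom PP ⇒ NP PP.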
22. The city hall parking lot in town
- NP → NP NP PP
- NP → Det Nom
- NP → Adj Nom
- NP → Nom Nom
- Nom → NP Nom
- Nom → N
- PP → Prep NP
- N → city | hall | lot | town
- Adj → parking
- Prep → to | for | in
23. Structural ambiguity
- Multiple legal structures
  - Attachment (e.g. I saw a man on a hill with a telescope)
  - Coordination (e.g. younger cats and dogs)
  - NP bracketing (e.g. Spanish language teachers)
24. NP vs. VP Attachment
25- Solution?
  - Return all possible parses and disambiguate using other methods
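Returning all parses can be sketched with a memoized enumerator over input spans. The grammar and lexicon below are a toy fragment assumed only for this example (they are not the CFG from slide 5), chosen so the attachment example sentence is covered; `RULES`, `LEXICON`, and `all_parses` are hypothetical names. Because every binary rule splits its span into two non-empty parts, left-recursive rules such as NP → NP PP do not cause looping here.

```python
from functools import lru_cache

# Toy grammar (an assumption for illustration): unary or binary rules only.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Pro",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {
    "i": "Pro", "saw": "V", "a": "Det", "man": "N", "hill": "N",
    "telescope": "N", "on": "P", "with": "P",
}

def all_parses(words):
    """Return every parse tree (nested tuples) rooted in S for the input."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def parses(sym, i, j):
        trees = []
        # Lexical case: a POS category spanning exactly one word.
        if j - i == 1 and LEXICON.get(words[i]) == sym:
            trees.append((sym, words[i]))
        for rhs in RULES.get(sym, ()):
            if len(rhs) == 1:
                trees += [(sym, t) for t in parses(rhs[0], i, j)]
            else:
                left, right = rhs
                for k in range(i + 1, j):      # every split point
                    for lt in parses(left, i, k):
                        for rt in parses(right, k, j):
                            trees.append((sym, lt, rt))
        return tuple(trees)

    return list(parses("S", 0, len(words)))

for tree in all_parses("I saw a man with a telescope".split()):
    print(tree)
```

For "I saw a man with a telescope" this returns two trees, one attaching the PP to the NP and one attaching it to the VP; choosing between them is left to other methods.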
26. Summing Up
- Parsing is a search problem which may be implemented with many control strategies
  - Top-Down or Bottom-Up approaches each have problems
  - Combining the two solves some but not all issues
    - Left recursion
    - Syntactic ambiguity
- Next time: making use of statistical information about syntactic constituents
  - Read Ch 11