Title: Coarse-to-Fine Efficient Viterbi Parsing
1. Coarse-to-Fine Efficient Viterbi Parsing
- Nathan Bodenstab
- OGI RPE Presentation
- May 8, 2006
2. Outline
- What is Natural Language Parsing?
- Data Driven Parsing
- Hypergraphs and Parsing Algorithms
- High Accuracy Parsing
- Coarse-to-Fine
- Empirical Results
3. What is Natural Language Parsing?
- Provides a sentence with syntactic information by hierarchically clustering and labeling its constituents.
- A constituent is a group of one or more words that function together as a unit.
5. Why Parse Sentences?
- Syntactic structure is useful in
- Speech Recognition
- Machine Translation
- Language Understanding
- Word Sense Disambiguation (e.g., "bottle")
- Question-Answering
- Document Summarization
6. Outline
- What is Natural Language Parsing?
- Data Driven Parsing
- Hypergraphs and Parsing Algorithms
- High Accuracy Parsing
- Coarse-to-Fine
- Empirical Results
7. Data Driven Parsing
- Parsing = Grammar + Algorithm
- Probabilistic Context-Free Grammar:
  P(children = Determiner, Adjective, Noun | parent = NounPhrase)
8. Data Driven Parsing
- Find the maximum likelihood parse tree from all grammatically valid candidates.
- The probability of a parse tree is the product of all its grammar rule (constituent) probabilities.
- The number of grammatically valid parse trees increases exponentially with the length of the sentence.
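The product-of-rule-probabilities scoring above can be sketched in a few lines of Python. The grammar, tree encoding, and probability values below are invented toy assumptions, not figures from the talk:

```python
# Toy PCFG (probabilities are illustrative assumptions, not treebank estimates).
# Each entry is P(children | parent), e.g. P(Det Noun | NP) = 0.4.
pcfg = {
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("Det", "Noun")): 0.4,
    ("VP", ("Verb",)): 0.3,
    ("Det", ("the",)): 0.5,
    ("Noun", ("dog",)): 0.1,
    ("Verb", ("barks",)): 0.2,
}

def tree_probability(tree):
    """Probability of a parse tree = product of its rule probabilities.

    A tree is (label, *subtrees); a terminal word is a bare string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_probability(c)
    return p

t = ("S", ("NP", ("Det", "the"), ("Noun", "dog")), ("VP", ("Verb", "barks")))
tree_probability(t)  # 0.9 * 0.4 * 0.3 * 0.5 * 0.1 * 0.2 = 0.00108
```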
9. Outline
- What is Natural Language Parsing?
- Data Driven Parsing
- Hypergraphs and Parsing Algorithms
- High Accuracy Parsing
- Coarse-to-Fine
- Empirical Results
10. Hypergraphs
- A directed hypergraph can facilitate dynamic programming (Klein and Manning, 2001).
- A hyperedge connects a set of tail nodes to a set of head nodes.
(Figure: a standard edge vs. a hyperedge)
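As a data-structure sketch, a hyperedge in a parse hypergraph can be modeled as a head chart item plus a tuple of tail items; the class and field names below are my own illustration, not from the talk:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    label: str          # non-terminal, e.g. "NP"
    start: int          # index of the first word in the span
    end: int            # index one past the last word

@dataclass(frozen=True)
class Hyperedge:
    head: Node          # the constituent being built
    tails: tuple        # child constituents; a single tail is a standard edge
    weight: float       # probability of the grammar rule applied

# An NP over words 0..2 built from Det[0,1] and Noun[1,2] with rule prob 0.4.
e = Hyperedge(Node("NP", 0, 2), (Node("Det", 0, 1), Node("Noun", 1, 2)), 0.4)
```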
11. Hypergraphs
12. The CYK Algorithm
- Separates the hypergraph into levels
- Exhaustively traverses every hyperedge, level by level
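A minimal exhaustive CYK over a toy binary grammar might look like the sketch below; the lexicon and rule probabilities are invented for illustration:

```python
from collections import defaultdict

def cyk(words, lexicon, rules):
    """Exhaustive CYK: fill the chart level by level (by span length),
    keeping the best (Viterbi) probability per (start, end, label)."""
    n = len(words)
    chart = defaultdict(float)                    # (i, j, label) -> best probability
    for i, w in enumerate(words):                 # level 1: lexical entries
        for pos, p in lexicon.get(w, {}).items():
            chart[(i, i + 1, pos)] = p
    for length in range(2, n + 1):                # higher levels: binary rules
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):             # each (rule, split) is one hyperedge
                for (lhs, (b, c)), p in rules.items():
                    score = p * chart[(i, k, b)] * chart[(k, j, c)]
                    if score > chart[(i, j, lhs)]:
                        chart[(i, j, lhs)] = score
    return chart

# Toy grammar (probabilities are invented for illustration).
lexicon = {"the": {"Det": 0.5}, "dog": {"Noun": 0.1}, "barks": {"Verb": 0.2}}
rules = {("NP", ("Det", "Noun")): 0.4, ("S", ("NP", "Verb")): 0.9}
chart = cyk(["the", "dog", "barks"], lexicon, rules)
chart[(0, 3, "S")]  # 0.9 * 0.4 * 0.5 * 0.1 * 0.2 = 0.0036
```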
13. The A* Algorithm
- Maintains a priority queue of traversable hyperedges
- Traverses best-first until a complete parse tree is found
(Figure: the priority queue)
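A best-first counterpart (A* with a trivial zero heuristic, i.e. uniform-cost search) can be sketched with Python's heapq. This is a simplification of agenda-based parsing, and the toy lexicon and rules are invented:

```python
import heapq
from math import log

def best_first_parse(words, lexicon, rules, goal="S"):
    """Best-first search (A* with a zero heuristic): pop the cheapest item
    (highest probability) from a priority queue until a complete parse
    of the goal symbol over the whole sentence is found."""
    n = len(words)
    agenda, best = [], {}                 # agenda entries: (-log prob, (i, j, label))
    for i, w in enumerate(words):
        for pos, p in lexicon.get(w, {}).items():
            heapq.heappush(agenda, (-log(p), (i, i + 1, pos)))
    while agenda:
        cost, item = heapq.heappop(agenda)
        if item in best:
            continue                      # already settled with a better score
        best[item] = cost
        i, j, label = item
        if item == (0, n, goal):
            return -cost                  # log probability of the Viterbi parse
        # combine the settled item with adjacent settled items via binary rules
        for (lhs, (b, c)), p in rules.items():
            for (i2, j2, lab2), cost2 in list(best.items()):
                if (b, c) == (label, lab2) and j == i2:
                    heapq.heappush(agenda, (cost + cost2 - log(p), (i, j2, lhs)))
                if (b, c) == (lab2, label) and j2 == i:
                    heapq.heappush(agenda, (cost2 + cost - log(p), (i2, j, lhs)))
    return None

lexicon = {"the": {"Det": 0.5}, "dog": {"Noun": 0.1}, "barks": {"Verb": 0.2}}
rules = {("NP", ("Det", "Noun")): 0.4, ("S", ("NP", "Verb")): 0.9}
logp = best_first_parse(["the", "dog", "barks"], lexicon, rules)
```

Because rule probabilities are at most 1, every hyperedge has a non-negative negative-log cost, so the first time the goal item is popped its score is optimal.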
14. Outline
- What is Natural Language Parsing?
- Data Driven Parsing
- Hypergraphs and Parsing Algorithms
- High Accuracy Parsing
- Coarse-to-Fine
- Empirical Results
15. High(er) Accuracy Parsing
- Modify the grammar to include more context
- (Grand)Parent Annotation (Johnson, 1998):
  P(children = Determiner, Adjective, Noun | parent = NounPhrase, grandparent = Sentence)
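Parent annotation itself is a simple tree transformation; a sketch, where the `^` separator and the ROOT symbol are conventional choices rather than necessarily those used in the talk:

```python
def parent_annotate(tree, parent="ROOT"):
    """Annotate every non-terminal with its parent's label, so NP under S
    becomes NP^S (Johnson, 1998). Words (bare strings) are left unchanged."""
    label, *children = tree
    annotated = [c if isinstance(c, str) else parent_annotate(c, label)
                 for c in children]
    return (f"{label}^{parent}", *annotated)

t = ("S", ("NP", ("Det", "the"), ("Noun", "dog")), ("VP", ("Verb", "barks")))
parent_annotate(t)
# ('S^ROOT', ('NP^S', ('Det^NP', 'the'), ('Noun^NP', 'dog')), ('VP^S', ('Verb^VP', 'barks')))
```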
16. Increased Search Space
(Figure: Original Grammar vs. Parent Annotated Grammar)
21. Grammar Comparison
- Exact inference with the CYK algorithm becomes intractable.
- Most algorithms using lexical models resort to greedy search strategies.
- We want to efficiently find the globally optimal (Viterbi) parse tree for these high-accuracy models.
22. Outline
- What is Natural Language Parsing?
- Data Driven Parsing
- Hypergraphs and Parsing Algorithms
- High Accuracy Parsing
- Coarse-to-Fine
- Empirical Results
23. Coarse-to-Fine
- Efficiently find the optimal parse tree of a large, context-enriched model (Fine) by following hyperedges suggested by solutions of a simpler model (Coarse).
- To evaluate the feasibility of Coarse-to-Fine, we use:
  - Coarse: WSJ grammar
  - Fine: Parent-annotated grammar
24. Increased Search Space
(Figure: Coarse Grammar vs. Fine Grammar)
25. Coarse-to-Fine: Build the Coarse hypergraph
26. Coarse-to-Fine: Choose a Coarse hyperedge
27. Coarse-to-Fine: Replace the Coarse hyperedge with a Fine hyperedge (modifies probability)
28. Coarse-to-Fine: Propagate the probability difference
29. Coarse-to-Fine: Repeat until the optimal parse tree has only Fine hyperedges
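The replace-and-propagate loop above can be caricatured over an explicitly enumerated set of candidate trees. A real parser works on the shared hypergraph and propagates score differences rather than re-scoring whole trees; the rule ids and probabilities here are invented:

```python
from math import prod

def coarse_to_fine(trees, coarse_p, fine_p):
    """Caricature of the CTF loop: repeatedly take the best-scoring tree under
    the current mixed Coarse/Fine scores; if it still uses a Coarse-scored
    rule, replace that rule's score with its Fine score and re-score.
    Requires coarse_p[r] >= fine_p[r] (an upper-bound Coarse grammar)."""
    refined = {}                                    # rule id -> Fine probability
    score = lambda t: prod(refined.get(r, coarse_p[r]) for r in t)
    while True:
        best = max(trees, key=score)                # current Viterbi candidate
        coarse_rules = [r for r in best if r not in refined]
        if not coarse_rules:                        # all hyperedges Fine: optimal,
            return best, score(best)                # since Coarse scores are upper bounds
        refined[coarse_rules[0]] = fine_p[coarse_rules[0]]

trees = [("a", "b"), ("a", "c")]                    # each tree = a tuple of rule ids
coarse_p = {"a": 0.9, "b": 0.8, "c": 0.7}           # upper-bound (Coarse) scores
fine_p = {"a": 0.9, "b": 0.3, "c": 0.6}             # true (Fine) scores
best_tree, p = coarse_to_fine(trees, coarse_p, fine_p)
best_tree, p  # (('a', 'c'), 0.54)
```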
30. Upper-Bound Grammar
- Replacing a Coarse hyperedge with a Fine hyperedge can increase or decrease its probability.
- Once we have found a parse tree with only Fine hyperedges, how can we be sure it is optimal?
- Modify the probability of Coarse grammar rules to be an upper bound on the probability of Fine grammar rules:
  P'(A -> B C) = max over X in N of P(A^X -> B^A C^A),
where N is the set of non-terminals and A -> B C is a grammar rule.
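One way to construct such an upper-bound grammar is to give each Coarse rule the maximum probability of any Fine rule that projects onto it; the projection function and probabilities below are illustrative assumptions:

```python
from collections import defaultdict

def upper_bound_coarse(fine_rules, project):
    """Give each Coarse rule the max probability of any Fine rule that
    projects onto it, so Coarse scores never underestimate Fine scores."""
    coarse = defaultdict(float)
    for (lhs, rhs), p in fine_rules.items():
        key = (project(lhs), tuple(project(x) for x in rhs))
        coarse[key] = max(coarse[key], p)
    return dict(coarse)

strip = lambda sym: sym.split("^")[0]   # NP^S -> NP (drop the parent annotation)
fine = {("NP^S", ("Det^NP", "Noun^NP")): 0.3,
        ("NP^VP", ("Det^NP", "Noun^NP")): 0.5}
upper_bound_coarse(fine, strip)
# {('NP', ('Det', 'Noun')): 0.5}
```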
31. Outline
- What is Natural Language Parsing?
- Data Driven Parsing
- Hypergraphs and Parsing Algorithms
- High Accuracy Parsing
- Coarse-to-Fine
- Empirical Results
32. Results
33. Summary and Future Research
- Coarse-to-Fine is a new exact inference algorithm that efficiently traverses a large hypergraph space by using the solutions of simpler models.
- Full probability propagation through the hypergraph hinders computational performance.
- Full propagation is not necessary; lower bound of log2(n) operations.
- Over 95% reduction in search space compared to the baseline CYK algorithm.
- Should prune even more space with higher-accuracy (lexical) models.
34. Thanks
35. Choosing a Coarse Hyperedge: Top-Down vs. Bottom-Up
36. Top-Down vs. Bottom-Up
- Top-Down
  - Traverses more hyperedges
  - Hyperedges are closer to the root
  - Requires less propagation (1/2)
- Bottom-Up
  - Traverses fewer hyperedges
  - Hyperedges are near the leaves (words) and shared by many trees
  - True probability of trees isn't known at the beginning of CTF
37. Coarse-to-Fine Motivation
(Figure: the optimal Fine tree vs. the optimal Coarse tree)