Learning and Inference for Hierarchically Split PCFGs

Provided by: EECS
1
Learning and Inference for Hierarchically Split
PCFGs
  • Slav Petrov and Dan Klein

2
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson 98]

3
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson 98]
  • Head lexicalization [Collins 99, Charniak 00]

4
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson 98]
  • Head lexicalization [Collins 99, Charniak 00]
  • Automatic clustering?

5
Learning Latent Annotations
[Matsuzaki et al. 05]
  • EM algorithm
  • Brackets are known
  • Base categories are known
  • Only induce subcategories

Just like Forward-Backward for HMMs.
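
Because the brackets and base categories are fixed, the E-step only has to sum over subcategory assignments on the observed tree. The following is a minimal sketch of the inside (bottom-up) half of that computation, not the authors' implementation; the tree encoding, the `binary` and `lexical` tables, and all probabilities are invented for illustration.

```python
def inside(tree, binary, lexical):
    """Inside scores for every subcategory of the root of an observed tree.

    tree    : (tag, word) for a preterminal, or (label, left, right)
    binary  : (A, B, C) -> rule[a][b][c] = P(A_a -> B_b C_c)
    lexical : (tag, word) -> [P(tag_a -> word) for each subcategory a]
    (This encoding is an assumption made for the sketch.)
    """
    if isinstance(tree[1], str):            # preterminal directly above a word
        return lexical[(tree[0], tree[1])]
    label, left, right = tree
    in_left = inside(left, binary, lexical)
    in_right = inside(right, binary, lexical)
    rule = binary[(label, left[0], right[0])]
    # Sum over the children's subcategories b and c; this plays the role
    # of the forward recursion of an HMM, but over a tree instead of a chain.
    return [
        sum(rule[a][b][c] * in_left[b] * in_right[c]
            for b in range(len(in_left))
            for c in range(len(in_right)))
        for a in range(len(rule))
    ]


# Toy grammar (all numbers invented): S has one subcategory, NP and VP have two.
binary = {("S", "NP", "VP"): [[[0.5, 0.1], [0.1, 0.3]]]}
lexical = {("NP", "she"): [0.2, 0.4], ("VP", "left"): [0.5, 0.1]}
tree = ("S", ("NP", "she"), ("VP", "left"))
likelihood = inside(tree, binary, lexical)[0]
```

A matching top-down (outside) pass would give the posterior rule counts that the M-step renormalizes.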
6
Overview
  • Hierarchical Training
  • Adaptive Splitting
  • Parameter Smoothing
7
Refinement of the DT tag
DT
8
Refinement of the DT tag
DT
9
Hierarchical refinement of the DT tag
DT
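
One round of hierarchical refinement can be sketched as splitting every existing subcategory in two before retraining with EM. The sketch below assumes each subcategory is represented only by its emission distribution and that a small random perturbation is used to break the symmetry between the two copies; the function name, the noise level, and the example words are illustrative assumptions.

```python
import random


def split_in_two(emissions, noise=0.01, seed=0):
    """Split each subcategory into two slightly perturbed copies.

    emissions : list of dicts, word -> P(word | subcategory)
    noise     : relative size of the perturbation (assumed value)
    """
    rng = random.Random(seed)
    refined = []
    for dist in emissions:
        for _ in range(2):
            perturbed = {w: p * (1.0 + rng.uniform(-noise, noise))
                         for w, p in dist.items()}
            total = sum(perturbed.values())
            # Renormalize so each copy is again a proper distribution.
            refined.append({w: p / total for w, p in perturbed.items()})
    return refined


# Hypothetical single DT category, refined twice: 1 -> 2 -> 4 subcategories.
dt = [{"the": 0.6, "a": 0.3, "this": 0.1}]
dt_2 = split_in_two(dt)
dt_4 = split_in_two(dt_2, seed=1)
```

Without the perturbation the two copies would receive identical EM updates and never diverge.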
10
Hierarchical Estimation Results
11
Refinement of the , tag
  • Splitting all categories the same amount is
    wasteful

12
Adaptive Splitting
  • Want to split complex categories more
  • Idea: split everything, then roll back the splits which
    were least useful

13
Adaptive Splitting
  • Want to split complex categories more
  • Idea: split everything, then roll back the splits which
    were least useful
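
The roll-back step can be sketched as ranking splits by how much likelihood would be lost if each were undone, then merging the cheapest ones. This is an illustrative sketch only: the loss values, the 50% fraction, and both helper names are assumptions, and estimating the loss itself is not shown.

```python
def choose_merges(loss_by_split, fraction=0.5):
    """Pick the splits to undo: those whose removal costs the least.

    loss_by_split : split id -> estimated likelihood loss if merged back
    fraction      : share of splits to roll back (assumed value)
    """
    ranked = sorted(loss_by_split, key=loss_by_split.get)
    return set(ranked[:int(len(ranked) * fraction)])


def merge_pair(dist1, dist2, count1, count2):
    """Merge two subcategories, weighting each by its expected count."""
    total = count1 + count2
    words = set(dist1) | set(dist2)
    return {w: (count1 * dist1.get(w, 0.0) + count2 * dist2.get(w, 0.0)) / total
            for w in words}


# Invented losses: splitting the comma tag gains nothing, splitting NP gains a lot.
losses = {",": 0.0, "DT": 0.1, "VP": 3.0, "NP": 5.0}
to_merge = choose_merges(losses)
merged = merge_pair({"the": 1.0}, {"a": 1.0}, count1=3, count2=1)
```

With these numbers the comma and DT splits are rolled back while NP and VP keep their refinements.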

14
Adaptive Splitting Results
15
Number of Phrasal Subcategories
16
Number of Phrasal Subcategories
NP
VP
PP
17
Number of Phrasal Subcategories
NAC
X
18
Number of Lexical Subcategories
POS
TO
,
19
Number of Lexical Subcategories
NNP
JJ
NNS
NN
20
Smoothing
  • Heavy splitting can lead to overfitting
  • Idea: smoothing allows us to pool statistics
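
One way to pool statistics, sketched here as an assumption rather than the exact scheme used in the talk, is to interpolate each subcategory's probability for a rule with the average over all subcategories of the same base category. The function name and the interpolation weight `alpha` are illustrative.

```python
def smooth(probs, alpha=0.01):
    """Shrink each subcategory's rule probability toward the shared mean.

    probs : [P(rule | subcategory x) for each subcategory x of one category]
    alpha : interpolation weight (assumed value)
    """
    mean = sum(probs) / len(probs)
    return [(1.0 - alpha) * p + alpha * mean for p in probs]


# A rule never seen with the first subcategory still gets some mass.
smoothed = smooth([0.0, 0.4], alpha=0.5)
```

Because every value moves toward the same mean, the total over subcategories is unchanged, so distributions that were normalized stay normalized.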

21
Result Overview
22
Linguistic Candy
  • Proper Nouns (NNP)
  • Personal pronouns (PRP)

23
Linguistic Candy
  • Comparative adverbs (RBR)
  • Cardinal Numbers (CD)

24
Inference
  • She heard the noise.

Exhaustive parsing: 1 min per sentence
25
Coarse-to-Fine Parsing
[Goodman 97, Charniak & Johnson 05]
26
Hierarchical Pruning
< t
  • Consider again the span 5 to 12

coarse
split in two
split in four
split in eight
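
A single pruning pass between two levels of the hierarchy can be sketched as follows: chart items whose posterior under the coarser grammar falls below the threshold t are dropped, and only the refinements of the surviving items are considered at the next level. The data layout, symbol names, posteriors, and threshold value below are invented for illustration.

```python
def allowed_fine_items(coarse_posteriors, refinements, threshold):
    """Return the refined chart items that survive pruning.

    coarse_posteriors : (symbol, start, end) -> posterior under the coarser grammar
    refinements       : coarse symbol -> list of its refined symbols
    threshold         : items with posterior < threshold are pruned
    """
    allowed = set()
    for (symbol, start, end), posterior in coarse_posteriors.items():
        if posterior < threshold:
            continue                      # pruned: no refinement is explored
        for fine in refinements[symbol]:
            allowed.add((fine, start, end))
    return allowed


# The span 5 to 12 again, with made-up posteriors.
coarse = {("NP", 5, 12): 0.9, ("VP", 5, 12): 1e-6}
refinements = {"NP": ["NP-0", "NP-1"], "VP": ["VP-0", "VP-1"]}
survivors = allowed_fine_items(coarse, refinements, threshold=1e-4)
```

Repeating the same pass from coarse to two, four, and eight subcategories keeps each finer chart small.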
27
Intermediate Grammars
X-Bar = G0
G
28
Projected Grammars
X-Bar = G0
G
29
Final Results (Efficiency)
  • Parsing the development set (1600 sentences)
  • Berkeley Parser
      • 10 min
      • Implemented in Java
  • Charniak & Johnson 05 Parser
      • 19 min
      • Implemented in C

30
Final Results (Accuracy)
31
Extensions
  • Acoustic modeling
  • Infinite Grammars
  • Nonparametric Bayesian Learning

[Petrov, Pauls & Klein 07]
[Liang, Petrov, Jordan & Klein 07]
32
Conclusions
  • Split & Merge Learning
      • Hierarchical Training
      • Adaptive Splitting
      • Parameter Smoothing
  • Hierarchical Coarse-to-Fine Inference
      • Projections
      • Marginalization
  • Multi-lingual Unlexicalized Parsing

33
Thank You!
  • http://nlp.cs.berkeley.edu