Title: Prototype-Driven Grammar Induction
1 Prototype-Driven Grammar Induction
- Aria Haghighi and Dan Klein
- Computer Science Division
- University of California, Berkeley
2 Grammar Induction
[Figure: The/DT screen/NN was/VBD a/DT sea/NN of/IN red/NN]
3 First Attempt
[Figure: The/DT screen/NN was/VBD a/DT sea/NN of/IN red/NN]
4 Central Questions
- How do we specify what we want to learn?
- How do we fix observed errors?
What's an NP?
That's not quite it!
5 Experimental Set-up
- Binary Grammar
- X1, X2, …, Xn plus POS tags
- Data
- WSJ-10: 7k sentences
- Evaluate on Labeled F1
- Grammar Upper Bound: 86.1
[Figure: binary rule Xi → Xj Xk]
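Labeled F1 compares predicted and gold constituents as (label, start, end) triples. A minimal sketch, assuming each tree has already been converted into a list of such triples; conventions about which spans to count (for example the whole-sentence span or single words) differ between evaluations and are left to the caller here. The example trees are hypothetical.

```python
from collections import Counter


def labeled_f1(gold, predicted):
    """Labeled bracket F1 between two lists of (label, start, end) constituents.

    Matching uses multiset intersection, so a constituent that appears twice
    in one list can match at most as many copies as the other list contains.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the predicted tree mislabels one span.
gold = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4)]
predicted = [("S", 0, 4), ("NP", 0, 2), ("NP", 2, 4)]
print(labeled_f1(gold, predicted))  # 2 of 3 constituents match in each direction
```

Because the label is part of each triple, a correctly bracketed but mislabeled span counts as a miss, which is what distinguishes labeled from unlabeled F1.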
6 Experiment Roadmap
- Unconstrained Induction
- Need bracket constraint!
- Gold Bracket Induction
- Prototypes and Similarity
- CCM Bracket Induction
7 Unconstrained PCFG Induction
- Learn PCFG with EM
- Inside-Outside Algorithm
- Lari and Young 93
- Results
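The inside pass of the inside-outside algorithm can be sketched for a binary grammar over POS tags. A minimal sketch, assuming rules are given as a dictionary from (parent, left, right) to probability and that POS tags act as terminals which binary rules may use directly as children; the symbol names and probabilities in the example are hypothetical. A full EM step would also need outside scores and expected rule counts.

```python
from collections import defaultdict


def inside(tags, rules):
    """Inside scores for a binary PCFG over a POS-tag sequence.

    rules maps (parent, left, right) -> probability; children may be
    nonterminals (e.g. "X1") or POS tags, which act as terminals here.
    Returns a dict (i, j, symbol) -> inside probability of symbol over tags[i:j].
    """
    n = len(tags)
    beta = defaultdict(float)
    for i, tag in enumerate(tags):
        beta[(i, i + 1, tag)] = 1.0  # a POS tag trivially covers its own position
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between left and right child
                for (parent, left, right), p in rules.items():
                    left_score = beta.get((i, k, left), 0.0)
                    right_score = beta.get((k, j, right), 0.0)
                    if left_score and right_score:
                        beta[(i, j, parent)] += p * left_score * right_score
    return beta


# Hypothetical grammar: each parent's rule probabilities sum to 1.
rules = {
    ("X1", "DT", "NN"): 1.0,
    ("X2", "VBD", "X1"): 1.0,
    ("X3", "X1", "X2"): 0.5,
    ("X3", "X2", "X1"): 0.5,
}
tags = ["DT", "NN", "VBD", "DT", "NN"]
beta = inside(tags, rules)
print(beta[(0, 5, "X3")])  # probability that X3 generates the whole tag sequence
```

The triple loop over widths, start points, and split points gives the usual cubic cost in sentence length, multiplied by the number of rules.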
8 Constrained PCFG Induction
- Gold Brackets
- Pereira and Schabes 93
- Results
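With gold brackets, training only considers spans that do not cross any gold bracket. A minimal sketch of that compatibility check, assuming brackets are half-open (start, end) pairs; in a constrained inside pass, any span missing from the returned list would simply receive zero inside and outside score. The function names and the example bracketing are hypothetical.

```python
def crosses(span, bracket):
    """True if two half-open spans overlap without either one containing the other."""
    (i, j), (k, l) = span, bracket
    return (i < k < j < l) or (k < i < l < j)


def compatible_spans(n, gold_brackets):
    """All spans (i, j) of width >= 2 in an n-tag sentence that cross no gold bracket."""
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 2, n + 1)
        if not any(crosses((i, j), b) for b in gold_brackets)
    ]


# Hypothetical gold bracketing [[w0 w1] [w2 w3]] for a four-tag sentence.
gold = [(0, 2), (2, 4)]
print(compatible_spans(4, gold))  # [(0, 2), (0, 4), (2, 4)]
```

Nested and disjoint spans are both compatible; only partial overlaps such as (1, 3) against (0, 2) are ruled out.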
9 Encoding Knowledge
What's an NP?
Semi-Supervised Learning
10 Encoding Knowledge
What's an NP?
For instance: DT NN, JJ NNS, NNP NNP
Prototype Learning
11 Grammar Induction Experiments
- Add Prototypes
- Manually constructed
12 How to use prototypes?
[Figure: tree over The/DT koala/NN sat/VBD in/IN the/DT tree/NN with nodes S, VP, PP, NP, NP, each label marked "?"]
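The simplest use of prototypes is exact matching: a span whose POS-tag yield is literally a prototype is a strong hint for that label. A minimal sketch, using only the NP prototypes listed earlier (DT NN, JJ NNS, NNP NNP); the dictionary layout and the function name are illustrative assumptions.

```python
# NP prototypes from the slides; prototypes for other labels are not listed here.
PROTOTYPES = {
    ("DT", "NN"): "NP",
    ("JJ", "NNS"): "NP",
    ("NNP", "NNP"): "NP",
}


def prototype_matches(tags, prototypes):
    """Return {(i, j): label} for every span whose tag yield exactly equals a prototype."""
    matches = {}
    n = len(tags)
    for i in range(n):
        for j in range(i + 1, n + 1):
            label = prototypes.get(tuple(tags[i:j]))
            if label is not None:
                matches[(i, j)] = label
    return matches


tags = ["DT", "NN", "VBD", "IN", "DT", "NN"]  # The koala sat in the tree
print(prototype_matches(tags, PROTOTYPES))  # {(0, 2): 'NP', (4, 6): 'NP'}
```

On "The hungry koala" (DT JJ NN) no span matches, which is why exact matching alone is not enough and a similarity-based notion of "looks like a prototype" is needed.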
13 How to use prototypes?
[Figure: tree over The/DT hungry/JJ koala/NN sat/VBD in/IN the/DT tree/NN with nodes S, VP, PP, NP, NP; one node's label is "?"]
14 Distributional Similarity
- Context Distribution
- $\phi(\text{DT JJ NN})$: __ VBD 0.3, VBD __ 0.2, IN __ VBD 0.1, …
- Similarity
[Figure: context distributions $\phi(\text{DT NN})$, $\phi(\text{NNP NNP})$, $\phi(\text{DT JJ NN})$, $\phi(\text{JJ NNS})$ and the label NP]
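A context distribution records which tags surround a given tag yield across a corpus. A minimal sketch, assuming a context is the pair (tag to the left, tag to the right) with "#" marking a sentence edge, and assuming Jensen-Shannon divergence as the similarity measure (the measure is not spelled out here); the corpus and all names are hypothetical.

```python
import math
from collections import Counter


def context_distribution(corpus, yield_tags, boundary="#"):
    """Empirical distribution over (left tag, right tag) contexts of a tag yield."""
    target = tuple(yield_tags)
    width = len(target)
    counts = Counter()
    for tags in corpus:
        padded = [boundary] + list(tags) + [boundary]
        for i in range(len(tags) - width + 1):
            if tuple(tags[i:i + width]) == target:
                # padded[i] is just left of the span, padded[i + width + 1] just right.
                counts[(padded[i], padded[i + width + 1])] += 1
    total = sum(counts.values())
    return {ctx: c / total for ctx, c in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so between 0 and 1) of two sparse distributions."""
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}

    def kl_to_m(a):
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical POS-tagged corpus.
corpus = [
    ["DT", "NN", "VBD", "IN", "DT", "NN"],
    ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"],
    ["NNP", "NNP", "VBD", "DT", "NN"],
]
phi_dt_nn = context_distribution(corpus, ["DT", "NN"])
phi_dt_jj_nn = context_distribution(corpus, ["DT", "JJ", "NN"])
print(phi_dt_nn)
print(js_divergence(phi_dt_jj_nn, phi_dt_nn))  # small value = similar contexts
```

Yields that occur in the same slots (before the verb, after a preposition) end up with low divergence, even when they share no tags.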
15 Distributional Similarity
- Prototype Approximation
- $\phi(\text{NP}) \approx \mathrm{Uniform}\big(\phi(\text{DT NN}), \phi(\text{JJ NNS}), \phi(\text{NNP NNP})\big)$
- Prototype Similarity Feature
- span(DT JJ NN) emits protoNP
- span(MD NNS) emits protoNONE
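The prototype approximation and the resulting feature can be sketched together: a label's context distribution is the uniform average of its prototypes' distributions, and a span emits the feature of its most similar label, or protoNONE when nothing is close. A minimal sketch; the uniform mixture follows the approximation above, while Jensen-Shannon divergence, the threshold of 0.5, and all context distributions in the example are assumptions.

```python
import math


def uniform_mixture(distributions):
    """Approximate a label's context distribution by averaging its prototypes' distributions."""
    mix = {}
    for dist in distributions:
        for ctx, p in dist.items():
            mix[ctx] = mix.get(ctx, 0.0) + p / len(distributions)
    return mix


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two sparse distributions."""
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}

    def kl_to_m(a):
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def proto_feature(span_dist, label_dists, threshold):
    """Emit 'proto<LABEL>' for the most similar label, or 'protoNONE' if none is within threshold."""
    best_label, best_div = None, float("inf")
    for label, dist in label_dists.items():
        div = js_divergence(span_dist, dist)
        if div < best_div:
            best_label, best_div = label, div
    if best_label is None or best_div > threshold:
        return "protoNONE"
    return "proto" + best_label


# Hypothetical (left tag, right tag) context distributions; "#" marks a sentence edge.
phi_np = uniform_mixture([
    {("#", "VBD"): 0.5, ("IN", "#"): 0.5},   # DT NN
    {("#", "VBD"): 0.5, ("VBD", "#"): 0.5},  # JJ NNS
    {("#", "VBD"): 1.0},                     # NNP NNP
])
labels = {"NP": phi_np}
print(proto_feature({("#", "VBD"): 0.7, ("IN", "#"): 0.3}, labels, threshold=0.5))  # protoNP
print(proto_feature({("NN", "VB"): 1.0}, labels, threshold=0.5))  # protoNONE
```

The hypothetical DT JJ NN and MD NNS distributions reproduce the two outcomes listed above.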
16 Prototype CFG Model
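[Figure: tree over The/DT hungry/JJ koala/NN sat/VBD in/IN the/DT tree/NN with nodes S, VP, PP, NP, NP, NP; the NP over "The hungry koala" is scored by $P(\text{DT NP} \mid \text{NP})\, P(\text{protoNP} \mid \text{NP})$]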
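In the prototype CFG model, each node contributes its rule probability times the probability of its prototype feature given its label. A minimal sketch for the subject NP "The hungry koala", assuming a tree is a nested (label, proto_feature, children) tuple with POS tags as leaves; all probabilities and the inner node's feature are hypothetical.

```python
def tree_score(node, rule_probs, proto_probs):
    """Probability of a (sub)tree under a prototype-augmented PCFG.

    node is (label, proto_feature, children); each child is either a nested
    node or a POS-tag string. Every internal node contributes
    P(child labels | label) * P(proto_feature | label).
    """
    label, proto, children = node
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    score = rule_probs[(label, child_labels)] * proto_probs[(label, proto)]
    for child in children:
        if not isinstance(child, str):
            score *= tree_score(child, rule_probs, proto_probs)
    return score


# Hypothetical parameters for the subject NP "The hungry koala".
rule_probs = {("NP", ("DT", "NP")): 0.3, ("NP", ("JJ", "NN")): 0.1}
proto_probs = {("NP", "protoNP"): 0.8, ("NP", "protoNONE"): 0.2}
subject = ("NP", "protoNP", ["DT", ("NP", "protoNONE", ["JJ", "NN"])])
print(tree_score(subject, rule_probs, proto_probs))
```

Since the prototype feature is generated from the label, EM is pushed toward labels whose feature distributions agree with the prototype similarity signal.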
17 Prototype CFG Induction
- Experimental Set-Up
- BLLIP corpus
- Gold Brackets
- Results
18 Summary So Far
- Bracket constraint and prototypes give good performance!
19 Constituent-Context Model
20 Product Model
- Different Aspects of Syntax
- CCM: yield and context properties
- CFG: hierarchical properties
- Intersected EM (Klein 2005)
- Encourages mass on trees compatible with both CCM and CFG
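The product model can be illustrated on an explicit list of candidate trees: CCM scores every span by its yield and context given whether it is a bracket, the CFG scores the tree, and the product is renormalized. A minimal sketch; intersected EM sums over all trees with dynamic programming rather than enumerating candidates, the CCM prior over bracketings is treated as uniform here, and every parameter value is hypothetical.

```python
def ccm_score(tags, brackets, p_yield, p_context, boundary="#"):
    """CCM-style score: product over all spans (i, j) of P(yield | b) * P(context | b),
    where b is True if (i, j) is in the bracketing. Prior over bracketings omitted."""
    padded = [boundary] + list(tags) + [boundary]
    n = len(tags)
    score = 1.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            is_bracket = (i, j) in brackets
            span_yield = tuple(tags[i:j])
            context = (padded[i], padded[j + 1])  # tags just outside the span
            score *= p_yield(span_yield, is_bracket) * p_context(context, is_bracket)
    return score


def product_posterior(cfg_scores, ccm_scores):
    """Renormalize P_CFG(T) * P_CCM(brackets(T)) over the candidate trees."""
    joint = {t: cfg_scores[t] * ccm_scores[t] for t in cfg_scores}
    z = sum(joint.values())
    return {t: s / z for t, s in joint.items()}


# Hypothetical CCM parameters: only the yield ("DT", "NN") prefers being a bracket.
def p_yield(span_yield, is_bracket):
    if span_yield == ("DT", "NN"):
        return 0.4 if is_bracket else 0.05
    return 0.1


def p_context(context, is_bracket):
    return 0.1


tags = ["DT", "NN", "VBD"]
left = frozenset({(0, 2), (0, 3)})   # [[DT NN] VBD]
right = frozenset({(1, 3), (0, 3)})  # [DT [NN VBD]]
ccm = {t: ccm_score(tags, t, p_yield, p_context) for t in (left, right)}
cfg = {left: 0.4, right: 0.6}  # hypothetical: the CFG alone prefers `right`
post = product_posterior(cfg, ccm)
print(post[left], post[right])
```

Here the CFG alone slightly favours the right-branching tree, but the product shifts most of the mass to the tree that both models find acceptable.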
21 Grammar Induction Experiments
- Intersected CFG and CCM
- No prototypes
- Results
22 Grammar Induction Experiments
- Intersected CFG and CCM
- Add Prototypes
- Results
23 Reacting to Errors
[Figure: Our Tree vs. Correct Tree]
24 Reacting to Errors
- Add Prototype NP-POS: NN POS
[Figure: New Analysis]
25 Error Analysis
[Figure: Our Tree vs. Correct Tree]
26 Reacting to Errors
- Add Prototype VP-INF: VB NN
[Figure: New Analysis]
27 Fixing Errors
- Supplement Prototypes
- NP-POS and VP-INF
- Results
28 Results Summary
29 Conclusion
- Prototype-Driven Learning
- Flexible Weakly Supervised Framework
- Merged distributional clustering techniques with supervised structured models
30 Thank You!
- http://www.cs.berkeley.edu/aria42
31 Unconstrained PCFG Induction
- Binary Grammar
- X1, X2, …, Xn
- Learn PCFG with EM
- Inside-Outside Algorithm
- Lari and Young 93
[Figure: binary rules with parent Xi and children drawn from Xj, Xk, V, N]