Improved Inference for Unlexicalized Parsing

Description:

Parser available at http://nlp.cs.berkeley.edu. To conclude, ... The dynamic programming results are significantly better than the reranking results, ...

Provided by: EECS

Transcript and Presenter's Notes

1
Improved Inference for Unlexicalized Parsing
  • Slav Petrov and Dan Klein

2
Unlexicalized Parsing
Petrov et al. 06
  • Hierarchical, adaptive refinement

  • 91.2 F1 score on dev set (1600 sentences)
  • 1,140 nonterminal symbols
  • 531,200 rewrites
  • 1621 min parsing time
3
  • 1621 min

4
Coarse-to-Fine Parsing
Goodman 97, Charniak & Johnson 05

5
Prune?
  • For each chart item X[i,j], compute its posterior probability and prune the item if the posterior < threshold
  • E.g. consider the span 5 to 12
    coarse: QP NP VP
    refined: (figure)
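The pruning test on this slide can be sketched as follows. This is a minimal illustration, not the parser's actual code: it assumes the usual inside-outside definition of an item posterior (inside score times outside score, divided by the sentence probability), and the function name `posterior_prune`, the dictionary layout, and all numbers are hypothetical.

```python
def posterior_prune(inside, outside, sentence_prob, threshold):
    """Keep chart items whose posterior probability reaches the threshold.

    inside / outside map a chart item (symbol, i, j) to its inside / outside
    score under the coarse grammar; sentence_prob is the inside score of the
    root over the whole sentence. All names here are illustrative assumptions.
    """
    kept = {}
    for item, inside_score in inside.items():
        posterior = inside_score * outside.get(item, 0.0) / sentence_prob
        if posterior >= threshold:
            kept[item] = posterior
    return kept


if __name__ == "__main__":
    # Made-up scores for two coarse items over the span 5 to 12.
    inside = {("NP", 5, 12): 2e-6, ("QP", 5, 12): 1e-9}
    outside = {("NP", 5, 12): 3e-4, ("QP", 5, 12): 1e-5}
    print(posterior_prune(inside, outside, sentence_prob=1e-9, threshold=1e-4))
```

Refined symbols would then be built over a span only if their coarse projection survived this test.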


6
  • 1621 min
  • 111 min
  • (no search error)

7
Multilevel Coarse-to-Fine Parsing
Charniak et al. 06
  • Add more rounds of pre-parsing
  • Grammars coarser than X-bar

8
Hierarchical Pruning
  • Consider again the span 5 to 12

coarse: QP NP VP
split in two: QP1 QP2 NP1 NP2 VP1 VP2
split in four: QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4
split in eight
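A rough sketch of how pruning decisions could cascade through the split levels for one fixed span. Several things here are assumptions made for illustration: symbols are represented as `(base, level, index)` tuples, subsymbol `index` at one level is assumed to split into `2*index - 1` and `2*index` at the next (so QP1 splits into QP1 and QP2, QP2 into QP3 and QP4), and the posteriors for each level are passed in as made-up numbers rather than computed by a parser.

```python
def children(symbol):
    """Return the two subsymbols one level finer (assumed numbering scheme)."""
    base, level, index = symbol
    return [(base, level + 1, 2 * index - 1), (base, level + 1, 2 * index)]


def hierarchical_prune(coarse_symbols, posteriors_by_level, threshold):
    """Prune level by level for a single span.

    posteriors_by_level[l] maps each level-l symbol to its posterior for the
    span. A symbol is only considered if its parent survived the level before.
    """
    kept = [s for s in coarse_symbols
            if posteriors_by_level[0].get(s, 0.0) >= threshold]
    for level_posteriors in posteriors_by_level[1:]:
        candidates = [child for s in kept for child in children(s)]
        kept = [c for c in candidates
                if level_posteriors.get(c, 0.0) >= threshold]
    return kept


if __name__ == "__main__":
    coarse = [("QP", 0, 1), ("NP", 0, 1), ("VP", 0, 1)]
    posteriors = [
        {("QP", 0, 1): 0.0001, ("NP", 0, 1): 0.6, ("VP", 0, 1): 0.4},
        {("NP", 1, 1): 0.5, ("NP", 1, 2): 0.001,
         ("VP", 1, 1): 0.3, ("VP", 1, 2): 0.1},
    ]
    print(hierarchical_prune(coarse, posteriors, threshold=0.01))
```

In this toy run QP is pruned at the coarse level, so none of its subsymbols are ever scored at the finer level.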

9
Intermediate Grammars
X-Bar = G0
G
10
  • 1621 min
  • 111 min
  • 35 min
  • (no search error)

11
State Drift (DT tag)
12
Projected Grammars
X-Bar = G0
G
13
Estimating Projected Grammars
  • Nonterminals?

NP0
NP1
VP1
VP0
S0
S1
Nonterminals in π(G)
Nonterminals in G
14
Estimating Projected Grammars
  • Rules?

S → NP VP
S1 → NP1 VP1  0.20
S1 → NP1 VP2  0.12
S1 → NP2 VP1  0.02
S1 → NP2 VP2  0.03
S2 → NP1 VP1  0.11
S2 → NP1 VP2  0.05
S2 → NP2 VP1  0.08
S2 → NP2 VP2  0.12
15
Estimating Projected Grammars
Corazza & Satta 06
Estimating Grammars
0.56
16
Calculating Expectations
  • Nonterminals
  • c_k(X): expected counts up to depth k
  • Converges within 25 iterations (few seconds)
  • Rules
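The two steps listed above might be sketched like this: iterate expected nonterminal counts from the root, then use those counts to weight the refined rules when estimating a projected rule probability. The sketch assumes a count-weighted average as the estimator. The helper names (`expected_counts`, `project_rules`, `strip_index`), the ROOT rules, and their 0.5 weights are hypothetical. Only the S1/S2 rule probabilities are taken from the earlier slide.

```python
from collections import defaultdict


def expected_counts(rules, root, iterations=25):
    """Approximate c_k(X): expected occurrences of X in trees up to depth k.

    rules is a list of (lhs, rhs_tuple, probability).
    """
    counts = {root: 1.0}
    for _ in range(iterations):
        new_counts = defaultdict(float)
        new_counts[root] = 1.0
        for lhs, rhs, prob in rules:
            mass = counts.get(lhs, 0.0) * prob
            for child in rhs:
                new_counts[child] += mass
        counts = dict(new_counts)
    return counts


def project_rules(rules, counts, project):
    """Estimate projected rule probabilities as count-weighted averages."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for symbol, count in counts.items():
        denominator[project(symbol)] += count
    for lhs, rhs, prob in rules:
        key = (project(lhs), tuple(project(s) for s in rhs))
        numerator[key] += counts.get(lhs, 0.0) * prob
    return {key: value / denominator[key[0]]
            for key, value in numerator.items() if denominator[key[0]] > 0}


def strip_index(symbol):
    """Hypothetical projection: drop the subsymbol index, e.g. NP1 -> NP."""
    return symbol.rstrip("0123456789")


if __name__ == "__main__":
    rules = [
        ("ROOT", ("S1",), 0.5), ("ROOT", ("S2",), 0.5),  # assumed weights
        ("S1", ("NP1", "VP1"), 0.20), ("S1", ("NP1", "VP2"), 0.12),
        ("S1", ("NP2", "VP1"), 0.02), ("S1", ("NP2", "VP2"), 0.03),
        ("S2", ("NP1", "VP1"), 0.11), ("S2", ("NP1", "VP2"), 0.05),
        ("S2", ("NP2", "VP1"), 0.08), ("S2", ("NP2", "VP2"), 0.12),
    ]
    counts = expected_counts(rules, "ROOT")
    print(project_rules(rules, counts, strip_index))
```

With S1 and S2 equally likely, the projected probability is the plain average of the two refined totals. Unequal expected counts would shift it toward the more frequent subsymbol.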

17
  • 1621 min
  • 111 min
  • 35 min
  • 15 min
  • (no search error)

18
Parsing times
X-Bar = G0
G
19
Bracket Posteriors
(after G0)
20
Bracket Posteriors (after G1)
21
Bracket Posteriors
(Movie)
(Final Chart)
22
Bracket Posteriors (Best Tree)
23
Parse Selection
  • Computing most likely unsplit tree is NP-hard
  • Settle for best derivation.
  • Rerank n-best list.
  • Use alternative objective function.

24
Parse Risk Minimization
Titov & Henderson 06
  • Expected loss according to our beliefs
  • T_T: true tree
  • T_P: predicted tree
  • L: loss function (0/1, precision, recall, F1)
  • Use n-best candidate list and approximate expectation with samples.
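A minimal sketch of risk-minimizing selection over a candidate list, under stated assumptions: each candidate tree is reduced to a set of labeled brackets, the candidates' own renormalized probabilities stand in for the belief over true trees (rather than drawing samples), and the loss is 1 - F1. The function names and the toy candidates are hypothetical.

```python
def f1(predicted, gold):
    """Bracket F1 between two sets of (label, start, end) brackets."""
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


def min_risk_parse(candidates, loss):
    """Return (brackets, risk) for the candidate with the lowest expected loss.

    candidates is a list of (bracket_set, probability). Each candidate is
    scored as a prediction against every candidate treated as the true tree.
    """
    total = sum(prob for _, prob in candidates)
    best, best_risk = None, float("inf")
    for predicted, _ in candidates:
        risk = sum(prob / total * loss(predicted, truth)
                   for truth, prob in candidates)
        if risk < best_risk:
            best, best_risk = predicted, risk
    return best, best_risk


if __name__ == "__main__":
    a = frozenset({("NP", 0, 2), ("VP", 2, 5)})
    b = frozenset({("NP", 0, 1), ("VP", 1, 5), ("PP", 3, 5)})
    c = frozenset({("NP", 0, 1), ("VP", 1, 5), ("NP", 3, 5)})
    candidates = [(a, 0.40), (b, 0.35), (c, 0.25)]
    print(min_risk_parse(candidates, lambda p, t: 1.0 - f1(p, t)))
```

In this toy list the most probable candidate `a` is not selected, because `b` and `c` share most of their brackets and so `b` has the lower expected loss.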

25
Reranking Results
Objective              Precision  Recall  F1    Exact
BEST DERIVATION
Viterbi Derivation     89.6       89.4    89.5  37.4
RERANKING
Precision (sampled)    91.1       88.1    89.6  21.4
Recall (sampled)       88.2       91.3    89.7  21.5
F1 (sampled)           90.2       89.3    89.8  27.2
Exact (sampled)        89.5       89.5    89.5  25.8
Exact (non-sampled)    90.8       90.8    90.8  41.7
Exact/F1 (oracle)      95.3       94.4    95.0  63.9
26
Dynamic Programming
  • Matsuzaki et al. 05: Approximate posterior parse distribution
  • à la Goodman 98: Maximize number of expected correct rules
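The slide's own equations did not survive in this transcript. As a hedged reconstruction, one common way to write the rule-based objectives named on the results slide is given below, assuming inside and outside scores P_IN and P_OUT under the refined grammar, with the subsymbols x, y, z of each unsplit symbol summed out. The notation is an assumption and may differ from what was shown.

```latex
% r: posterior mass of an unsplit rule over a span, subsymbols summed out
r(A \to B\,C, i, k, j) = \sum_{x}\sum_{y}\sum_{z}
    P_{\mathrm{OUT}}(A_x, i, j)\, P(A_x \to B_y\, C_z)\,
    P_{\mathrm{IN}}(B_y, i, k)\, P_{\mathrm{IN}}(C_z, k, j)

% q: the same quantity normalized by the sentence probability
q(A \to B\,C, i, k, j) = \frac{r(A \to B\,C, i, k, j)}{P_{\mathrm{IN}}(\mathrm{root}, 0, n)}

% Max-Rule-Sum maximizes the expected number of correct rules;
% Max-Rule-Product maximizes the product of the rule posteriors
T_{\text{max-rule-sum}} = \arg\max_{T} \sum_{e \in T} q(e)
\qquad
T_{\text{max-rule-product}} = \arg\max_{T} \prod_{e \in T} q(e)
```

Both objectives decompose over the rules of a tree, so the best tree can be found with a CKY-style dynamic program over the q scores instead of reranking a candidate list.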
27
Dynamic Programming Results
Objective              Precision  Recall  F1    Exact
BEST DERIVATION
Viterbi Derivation     89.6       89.4    89.5  37.4
DYNAMIC PROGRAMMING
Variational            90.7       90.9    90.8  41.4
Max-Rule-Sum           90.5       91.3    90.9  40.4
Max-Rule-Product       91.2       91.1    91.2  41.4
28
Final Results (Efficiency)
  • Berkeley Parser
      • 15 min
      • 91.2 F-score
      • Implemented in Java
  • Charniak & Johnson 05 Parser
      • 19 min
      • 90.7 F-score
      • Implemented in C

29
Final Results (Accuracy)
Language  Parser                              ≤ 40 words F1  all F1
ENG       Charniak & Johnson 05 (generative)  90.1           89.6
ENG       This Work                           90.6           90.1
ENG       Charniak & Johnson 05 (reranked)    92.0           91.4

GER       Dubey 05                            76.3           -
GER       This Work                           80.8           80.1

CHN       Chiang et al. 02                    80.0           76.6
CHN       This Work                           86.3           83.4
30
Conclusions
  • Hierarchical coarse-to-fine inference
  • Projections
  • Marginalization
  • Multi-lingual unlexicalized parsing

31
Thank You!
  • Parser available at http://nlp.cs.berkeley.edu