Boosting-based parse re-ranking with subtree features

Slides: 30

Provided by: Taku9

Learn more at: http://chasen.org

Transcript and Presenter's Notes

1
Boosting-based parse re-ranking with subtree
features

Taku Kudo
Jun Suzuki
Hideki Isozaki
NTT Communication Science Labs.

2
Discriminative methods for parsing

Discriminative methods have shown remarkable performance compared to
traditional generative models, e.g., PCFG
Two approaches:
Re-ranking [Collins 00, Collins 02]
  discriminative machine learning algorithms are used to re-rank the
  n-best outputs of generative/conditional parsers
Dynamic programming
  max-margin parsing [Taskar 04]

3
Reranking
Let x be an input sentence, and y be a parse tree for x
Let G(x) be a function that returns a set of n-best results for x
A re-ranker gives a score to each candidate result in G(x) and
selects the result that has the highest score

(Figure: x = "I buy cars with money" is passed to G(x), which returns the n-best results y1, y2, y3, ...)
4
Scoring with linear model

$\hat{y} = \arg\max_{y \in G(x)} w \cdot \Phi(y)$
$\Phi(y)$ is a feature function that maps an output y into the space $R^d$
$w$ is a parameter vector (weights) estimated from training data
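
The linear scoring rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the candidate labels, the feature names, and the weight values are invented for the example, and features are assumed to be sparse counts stored in a dict.

```python
def score(weights, features):
    """Dot product w . Phi(y) for sparse feature counts."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rerank(candidates, weights):
    """Return the (label, features) pair with the highest linear score."""
    return max(candidates, key=lambda cand: score(weights, cand[1]))


# Hypothetical 3-best list for "I buy cars with money"; feature names are invented.
n_best = [
    ("y1", {"PP-attaches-to-NP": 1, "VP->V NP": 1}),
    ("y2", {"PP-attaches-to-VP": 1, "VP->V NP PP": 1}),
    ("y3", {"PP-attaches-to-NP": 1, "NP->N PP": 1}),
]
# Hypothetical weights; unseen features implicitly get weight 0.
weights = {"PP-attaches-to-VP": 0.8, "PP-attaches-to-NP": -0.3, "VP->V NP": 0.1}

best_label, _ = rerank(n_best, weights)
print(best_label)
```

Storing only non-zero weights keeps scoring proportional to the number of active features per candidate, which is the property the later slides rely on for fast parsing.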

5
Two issues in linear model 1/2

How to estimate the weights?
try to minimize a loss on the given training data
the definition of the loss differs by method:
  ME: log loss
  SVMs: hinge loss
  Boosting: exponential loss
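
The slide's loss formulas did not survive in the transcript. As a rough reminder, the three losses are commonly written as functions of a margin m. The sketch below assumes the textbook margin-based forms, with m taken as the score of the correct parse minus the score of a competing parse; it is not necessarily the exact formulation on the original slide.

```python
import math


def log_loss(margin):
    """Logistic (ME-style) loss: smooth, never exactly zero."""
    return math.log(1.0 + math.exp(-margin))


def hinge_loss(margin):
    """SVM-style loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def exp_loss(margin):
    """Boosting-style loss: grows exponentially for negative margins."""
    return math.exp(-margin)


for m in (-1.0, 0.0, 1.0, 2.0):
    print(m, round(log_loss(m), 3), round(hinge_loss(m), 3), round(exp_loss(m), 3))
```

All three penalize negative margins (mis-ranked pairs); they differ mainly in how harshly they do so and in whether well-ranked pairs still contribute.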
6
Two issues in linear model 2/2

How to define the feature set?
use all subtrees
Pros:
  - natural extension of CFG rules
  - can capture long contextual information
Cons:
  - naive enumeration of all subtrees is computationally prohibitive
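
To get a feel for why naive enumeration is a problem, the following sketch counts subtrees of a small ordered tree. It assumes a subtree is any connected fragment in which each kept node may keep or drop each of its children (other definitions, such as DOP fragments, give different counts). The tuple encoding `(label, children)` and the helper names are choices made for this example only.

```python
def count_rooted(tree):
    """Number of connected subtrees whose root is the root of `tree`."""
    _, children = tree
    total = 1
    for child in children:
        # Each child is either dropped (1 way) or kept with one of its own rooted subtrees.
        total *= 1 + count_rooted(child)
    return total


def count_all(tree):
    """Number of connected subtrees rooted at any node of `tree`."""
    _, children = tree
    return count_rooted(tree) + sum(count_all(child) for child in children)


def full_binary(depth):
    """Unlabeled complete binary tree of the given depth (depth 0 is a single node)."""
    if depth == 0:
        return ("X", [])
    return ("X", [full_binary(depth - 1), full_binary(depth - 1)])


for depth in range(5):
    print(depth, count_all(full_binary(depth)))
```

Even for this toy tree shape the count grows very quickly with depth, which is the motivation for selecting features rather than materializing all of them.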

7
A question for all subtrees

Do we always need all subtrees?
only a small set of subtrees is informative
most subtrees are redundant
Goal: automatic feature selection from all subtrees
  can perform fast parsing
  can give a good interpretation of the selected subtrees
Boosting meets our demand!

8
Why Boosting?

Different regularization strategies for the weights
L1 (Boosting)
  better when most given features are irrelevant
  can remove redundant features
L2 (SVMs)
  better when most given features are relevant
  uses as many features as it can
Boosting meets our demand, because most subtrees
are irrelevant and redundant
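
One standard way to see why L1 regularization yields sparse weights while L2 does not is to compare their shrinkage (proximal) steps. The sketch below is generic and is not how boosting is actually run; the weight values and the penalty strength `lam` are arbitrary numbers chosen for illustration.

```python
def l1_shrink(w, lam):
    """Soft-thresholding: proximal step for the penalty lam * |w|."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # small weights are set exactly to zero


def l2_shrink(w, lam):
    """Proximal step for the penalty (lam / 2) * w**2: rescales, never zeroes."""
    return w / (1.0 + lam)


weights = [2.0, 0.3, -0.1, -1.5]
l1 = [l1_shrink(w, 0.5) for w in weights]
l2 = [l2_shrink(w, 0.5) for w in weights]
print(l1)
print(l2)
```

Under the L1 step the two small weights vanish, while under the L2 step every weight stays non-zero, which mirrors the "removes redundant features" versus "uses as many features as it can" contrast on the slide.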

9
RankBoost [Freund 03]
10
How to find the optimal subtree?

Set of all subtrees is huge
Need to find the optimal subtree efficiently

11
Ad-hoc techniques

Size constraints
Use subtrees whose size is less than s (s 68)
Frequency constraints
Use subtrees that occur no less than f times in
training data (f 2 5)
Pseudo iterations
After several (5 or 10) iterations of boosting, we
alternately perform 100 or 300 pseudo iterations,
in which the optimal subtree is selected from a
cache that maintains the features explored in the
previous iterations.

12
Relation to previous work
Boosting vs Kernel methods [Collins 00]
Boosting vs Data Oriented Parsing [Bod 98]
13
Kernels [Collins 00]

Kernel methods reduce the problem to a dual form
that depends only on dot products between two
instances (parse trees)
Pros:
  No need to provide an explicit feature vector
  Dynamic programming is used to calculate dot
  products between trees, which is very efficient!
Cons:
  Requires a large number of kernel evaluations in
  testing
  Parsing is slow
  Difficult to see which features are relevant

14
DOP [Bod 98]

DOP is not based on re-ranking
DOP deals with the all-subtrees representation
explicitly, like our method
Pros:
  high accuracy
Cons:
  exact computation is NP-complete
  cannot always provide a sparse feature
  representation
  very slow, since the number of subtrees that DOP
  uses is huge

15
Kernels vs DOP vs Boosting
Criterion | Kernel | DOP | Boosting
How to enumerate all the subtrees? | implicitly | explicitly | explicitly
Complexity in training | polynomial | NP-hard | NP-hard (worst case); branch-and-bound
Sparse feature representations | No | No | Yes
Parsing speed | slow | slow | fast
Can see relevant features? | No | Yes, but difficult because of redundant features | Yes
16
Experiments
WSJ parsing
Shallow parsing
17
Experiments

WSJ parsing
  Standard data: training on sections 2-21, test on section 23 of PTB
  Model 2 of [Collins 99] was used to obtain the n-best
  results
  exactly the same setting as [Collins 00]
  (Kernels)
Shallow parsing
  CoNLL 2000 shared task
  training on sections 15-18, test on section 20 of PTB
  CRF-based parser [Sha 03] was used to obtain the
  n-best results

18
Tree representations

WSJ parsing
lexicalized tree
each non-terminal has a special node labeled
with a head word
Shallow parsing
right-branching tree in which adjacent phrases are
in a child/parent relation
special node for right/left boundaries

19
Results WSJ parsing
LR/LP: labeled recall/precision. CBs is the
average number of crossing brackets per sentence.
0 CBs and ≤2 CBs are the percentages of sentences
with 0 and with at most 2 crossing brackets, respectively

Comparable to other methods
Better than the kernel method, which uses the same
all-subtrees representation with a different
parameter estimation
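
For readers unfamiliar with the metrics named on this slide, here is a small sketch of how labeled precision, labeled recall, and crossing brackets can be computed from bracket sets. It is a simplification (real evaluation tools apply extra conventions such as ignoring punctuation), and the `(label, start, end)` span encoding and the example trees are assumptions made for this illustration.

```python
def labeled_pr(gold, pred):
    """Labeled precision and recall over sets of (label, start, end) brackets."""
    gold, pred = set(gold), set(pred)
    correct = len(gold & pred)
    return correct / len(pred), correct / len(gold)


def crossing_brackets(gold, pred):
    """Count predicted brackets that overlap some gold bracket without nesting."""
    def crosses(a, b):
        (i, j), (k, l) = a, b
        return i < k < j < l or k < i < l < j

    gold_spans = {(start, end) for _, start, end in gold}
    return sum(
        any(crosses((start, end), g) for g in gold_spans)
        for _, start, end in pred
    )


# "I buy cars with money": hypothetical gold tree attaches the PP to the VP.
gold = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 3), ("PP", 3, 5), ("NP", 4, 5)]
# Hypothetical candidate attaches the PP to the object NP, adding one extra bracket.
pred = gold + [("NP", 2, 5)]

lp, lr = labeled_pr(gold, pred)
print(lp, lr, crossing_brackets(gold, pred))
```

Here the extra NP bracket lowers precision but crosses nothing, because it nests cleanly around the gold brackets.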

20
Results Shallow parsing
F_{β=1} is the harmonic mean of precision and
recall

Comparable to other methods
Our method is also comparable to Zhang's method,
even without extra linguistic features
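
The F-measure mentioned on this slide can be written out as a short function. This is the standard F-beta definition; the function name and the zero-division convention are choices made for this sketch.

```python
def f_beta(precision, recall, beta=1.0):
    """F-measure; with beta = 1 it is the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention chosen here to avoid dividing by zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


print(f_beta(0.5, 0.5))
print(f_beta(1.0, 0.5))
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a chunker cannot score well by maximizing only precision or only recall.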

21
Advantages

Compact feature set
  WSJ parsing: 8,000
  Shallow parsing: 3,000
  Kernels implicitly use a huge number of features
Parsing is very fast
  WSJ parsing: 0.055 sec./sentence
  Shallow parsing: 0.042 sec./sentence

(n-best parsing time is NOT included)
22
Advantages, contd

Sparse feature representations allow us to
analyze which kinds of subtrees are relevant

(Figure: examples of positive and negative subtrees selected for WSJ parsing and for shallow parsing)
23
Conclusions

All subtrees are potentially used as features
Boosting
  L1 norm regularization performs automatic
  feature selection
Branch and bound
  enables us to find the optimal subtrees
  efficiently
Advantages
  comparable accuracy to other parsing methods
  fast parsing
  good interpretability

24
Efficient computation
25
Rightmost extension [Asai 02, Zaki 02]

Extend a given tree of size (n-1) by adding a new
node to obtain trees of size n
  the node is attached to a node on the rightmost path
  the node is added as the rightmost sibling

26
Rightmost extension, cont.

Recursive applications of rightmost extension
create a search space
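
The extension step can be sketched compactly if an ordered tree is stored as its preorder list of `(depth, label)` pairs. That encoding and the function names are assumptions made for this illustration, not something stated on the slides. With it, the nodes on the rightmost path are exactly the ancestors of the last node in the list, so a new rightmost child can be appended at any depth from 1 to (depth of last node + 1).

```python
def rightmost_extensions(tree, labels):
    """All trees obtained by adding one node as a rightmost child on the rightmost path."""
    if not tree:
        return [[(0, label)] for label in labels]  # size-1 trees: a single root
    last_depth = tree[-1][0]
    return [
        tree + [(depth, label)]
        for depth in range(1, last_depth + 2)
        for label in labels
    ]


def enumerate_trees(max_size, labels):
    """Map each size n <= max_size to the list of all ordered labeled trees of that size."""
    frontier = rightmost_extensions([], labels)
    by_size = {1: frontier}
    for size in range(2, max_size + 1):
        frontier = [ext for tree in frontier for ext in rightmost_extensions(tree, labels)]
        by_size[size] = frontier
    return by_size


trees = enumerate_trees(4, ["a"])
print({size: len(found) for size, found in trees.items()})
```

Each ordered tree is generated exactly once, since removing its last preorder node gives a unique parent in the search space. That is what makes the recursion usable as an enumeration tree for a later pruned search.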

27
Pruning

For all $t' \supseteq t$, propose an upper bound $\mu(t)$
such that $gain(t') \le \mu(t)$
Can prune the node $t$ if $\mu(t) \le \tau$,
where $\tau$ is a suboptimal gain
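
The pruning idea can be illustrated with a deliberately simplified stand-in. In the sketch below, patterns are item sets rather than subtrees, the gain is the absolute weighted difference between positive and negative examples containing the pattern, and the bound is the larger of the two weighted sums. None of this is the gain or bound from the presentation; the data, names, and bound are assumptions chosen so that the example stays short and self-contained. The bound is valid here because any extension of a pattern matches a subset of the examples the pattern matches.

```python
def best_pattern(examples, items):
    """Branch-and-bound search for the item set with the largest gain.

    examples: list of (item_set, label in {+1, -1}, weight).
    """
    best = {"gain": 0.0, "pattern": None}

    def search(pattern, start):
        for k in range(start, len(items)):
            cand = pattern | {items[k]}
            pos = sum(d for s, y, d in examples if y > 0 and cand <= s)
            neg = sum(d for s, y, d in examples if y < 0 and cand <= s)
            gain = abs(pos - neg)
            if gain > best["gain"]:
                best["gain"], best["pattern"] = gain, cand
            # No extension of cand can have gain above max(pos, neg).
            if max(pos, neg) <= best["gain"]:
                continue  # prune: skip every superset of cand
            search(cand, k + 1)

    search(frozenset(), 0)
    return best["pattern"], best["gain"]


# Toy weighted data set (hypothetical).
examples = [
    ({"a", "b"}, +1, 0.4),
    ({"a", "c"}, -1, 0.3),
    ({"b", "c"}, +1, 0.2),
    ({"a", "b", "c"}, -1, 0.1),
]
pattern, gain = best_pattern(examples, ["a", "b", "c"])
print(sorted(pattern), round(gain, 3))
```

The better the current best gain, the more branches the bound cuts off, which is why finding a good pattern early matters in practice.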

28
Upper bound of the gain
