Boosting-based parse re-ranking with subtree features - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Boosting-based parse re-ranking with subtree features


1
Boosting-based parse re-ranking with subtree
features
  • Taku Kudo
  • Jun Suzuki
  • Hideki Isozaki
  • NTT Communication Science Labs.

2
Discriminative methods for parsing
have shown remarkable performance compared to
    traditional generative models, e.g., PCFG
  • two approaches
  • re-ranking Collins 00, Collins 02
  • discriminative machine learning algorithms are
    used to rerank n-best outputs of
    generative/conditional parsers.
  • dynamic programming
  • Max-margin parsing Taskar 04

3
Reranking
[Figure: the input sentence x ("I buy cars with money") is passed to G(x), which returns the n-best parse candidates y1, y2, y3, ...]
  • Let x be an input sentence, and y be a parse tree
    for x
  • Let G(x) be a function that returns a set of
    n-best results for x
  • A re-ranker gives a score to each candidate parse
    and selects the result with the highest score

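A minimal sketch of the re-ranking step above; get_nbest stands in for G(x) and score for whatever scoring function the re-ranker uses (both names are illustrative, not from the slides):

def rerank(x, get_nbest, score):
    """Pick the highest-scoring parse among the n-best candidates for x.

    get_nbest : stands in for G(x); returns the n-best parse trees y1..yn.
    score     : assigns a real-valued score to a single parse tree.
    """
    candidates = get_nbest(x)
    return max(candidates, key=score)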
4
Scoring with linear model
  • Score(x, y) = w · Φ(y)
  • Φ(y) is a feature function that maps the output
    y into a d-dimensional feature space
  • w is a parameter vector (weights) estimated from
    training data
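A minimal sketch of the linear scoring model, assuming the subtree features of a parse are stored as a sparse dict of counts (the example feature strings are illustrative only):

def linear_score(phi_y, w):
    """Compute the dot product w . Phi(y) over sparse dictionaries."""
    return sum(w.get(f, 0.0) * v for f, v in phi_y.items())

# Toy example with two subtree features.
phi_y = {"(NP (DT the) (NN car))": 1, "(VP (VB buy) NP)": 1}
w = {"(NP (DT the) (NN car))": 0.8, "(VP (VB buy) NP)": -0.2}
print(linear_score(phi_y, w))  # 0.6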

5
Two issues in linear model 1/2
  • How to estimate the weights w?
  • try to minimize a loss over the given training data
  • each framework differs in its definition of the loss
  • ME log (logistic) loss
  • SVMs hinge loss
  • Boosting exponential loss
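A minimal sketch of the three losses, written per training example in terms of the margin m (the score difference between the correct result and an incorrect one); the exact formulations used by each framework may differ in detail:

import math

def log_loss(m):        # ME (logistic) loss
    return math.log1p(math.exp(-m))

def hinge_loss(m):      # SVM loss
    return max(0.0, 1.0 - m)

def exp_loss(m):        # Boosting loss
    return math.exp(-m)

for m in (-1.0, 0.0, 2.0):
    print(m, round(log_loss(m), 3), hinge_loss(m), round(exp_loss(m), 3))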
6
Two issues in linear model 2/2
  • How to define the feature set?
  • use all subtrees
  • Pros - natural extension of CFG rules
  •      - can capture long contextual
    information
  • Cons - naïve enumeration leads to huge
    computational complexity

7
A question for all subtrees
  • Do we always need all subtrees?
  • only a small set of subtrees is informative
  • most subtrees are redundant
  • Goal: automatic feature selection from all
    subtrees
  • can perform fast parsing
  • can give a good interpretation of the selected
    subtrees
  • Boosting meets our demand!

8
Why Boosting?
  • Different regularization strategies for the
    weights w
  • L1 (Boosting)
  • better when most given features are irrelevant
  • can remove redundant features
  • L2 (SVMs)
  • better when most given features are relevant
  • uses all the features as much as possible
  • Boosting meets our demand, because most subtrees
    are irrelevant and redundant
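A minimal illustration of this contrast, using generic proximal-style weight updates (an illustration only, not the estimation procedure of the paper): the L1 update sets small weights exactly to zero, while the L2 update only shrinks them.

def l1_update(w, lam):
    """Soft-thresholding: weights with |w| <= lam become exactly 0 (sparse)."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def l2_update(w, lam):
    """Shrinkage: weights get smaller but stay non-zero."""
    return w / (1.0 + lam)

weights = [0.05, -0.02, 1.3, -0.9]                     # mostly near-irrelevant features
print([round(l1_update(w, 0.1), 3) for w in weights])  # [0.0, 0.0, 1.2, -0.8]
print([round(l2_update(w, 0.1), 3) for w in weights])  # all remain non-zero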

9
RankBoost Freund 03
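The algorithm itself appears as a figure on this slide; below is a hedged sketch of one RankBoost-style round for re-ranking with subtree features under the exponential loss. The square-root gain and closed-form weight update are the standard forms for binary features; details may differ from the paper.

import math

def rankboost_round(pairs, D, weights, features, eps=1e-6):
    """One boosting round: pick the best feature and update its model weight.

    pairs    : list of (phi_correct, phi_other) sparse feature dicts, one per
               (correct parse, competing n-best parse) pair.
    D        : importance weights over the pairs (non-negative, sum to 1).
    weights  : dict feature -> current model weight, updated in place.
    features : candidate (subtree) features considered as weak hypotheses.
    """
    best_f, best_gain, best_wp, best_wm = None, 0.0, 0.0, 0.0
    for f in features:
        # wp: pair mass the feature ranks correctly, wm: incorrectly.
        wp = sum(d for (pc, po), d in zip(pairs, D) if pc.get(f, 0) > po.get(f, 0))
        wm = sum(d for (pc, po), d in zip(pairs, D) if pc.get(f, 0) < po.get(f, 0))
        gain = abs(math.sqrt(wp) - math.sqrt(wm))
        if gain > best_gain:
            best_f, best_gain, best_wp, best_wm = f, gain, wp, wm
    if best_f is None:
        return None

    delta = 0.5 * math.log((best_wp + eps) / (best_wm + eps))
    weights[best_f] = weights.get(best_f, 0.0) + delta

    # Re-weight pairs: those the chosen feature still ranks wrongly gain mass.
    for i, (pc, po) in enumerate(pairs):
        D[i] *= math.exp(-delta * (pc.get(best_f, 0) - po.get(best_f, 0)))
    z = sum(D)
    for i in range(len(D)):
        D[i] /= z
    return best_f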
10
How to find the optimal subtree?
  • Set of all subtrees is huge
  • Need to find the optimal subtree efficiently

11
Ad-hoc techniques
  • Size constraints
  • use subtrees whose size is less than s (s = 6-8)
  • Frequency constraints
  • use subtrees that occur no less than f times in
    the training data (f = 2-5)
  • Pseudo iterations
  • after every 5 or 10 boosting iterations, we
    alternately perform 100 or 300 pseudo
    iterations, in which the optimal subtree is
    selected from a cache that maintains the
    features explored in the previous iterations
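A minimal sketch of the size and frequency constraints, assuming a hypothetical enumerate_subtrees helper that yields (subtree, size) pairs for each training tree:

from collections import Counter

def candidate_features(trees, enumerate_subtrees, max_size=8, min_freq=2):
    """Apply the size and frequency constraints to the candidate subtrees."""
    freq = Counter()
    for tree in trees:
        for subtree, size in enumerate_subtrees(tree):
            if size < max_size:            # size constraint (s)
                freq[subtree] += 1
    # frequency constraint (f): keep subtrees seen at least min_freq times
    return {t for t, c in freq.items() if c >= min_freq}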

12
Relation to previous work
  • Boosting vs. Kernel methods Collins 00
  • Boosting vs. Data Oriented Parsing (DOP) Bod 98
13
Kernels Collins 00
  • Kernel methods reduce the problem into the dual
    form that only depends on dot products of two
    instances (parsed trees)
  • Pros
  • No need to provide explicit feature vector
  • Dynamic programming is used to calculate dot
    products between trees, which is very efficient!
  • Cons
  • Require a large number of kernel evaluations in
    testing
  • Parsing is slow
  • Difficult to see which features are relevant

14
DOP Bod 98
  • DOP is not based on re-ranking
  • DOP deals with the all-subtrees representation
    explicitly, as our method does
  • Pros
  • high accuracy
  • Cons
  • exact computation is NP-complete
  • cannot always provide sparse feature
    representation
  • very slow, since the number of subtrees DOP
    uses is huge

15
Kernels vs DOP vs Boosting
  • How to enumerate all the subtrees?
    Kernel: implicitly / DOP: explicitly / Boosting: explicitly
  • Complexity in training
    Kernel: polynomial / DOP: NP-hard / Boosting: NP-hard in the
    worst case, branch-and-bound in practice
  • Sparse feature representations
    Kernel: No / DOP: No / Boosting: Yes
  • Parsing speed
    Kernel: slow / DOP: slow / Boosting: fast
  • Can see relevant features?
    Kernel: No / DOP: Yes, but difficult because of redundant
    features / Boosting: Yes
16
Experiments
  • WSJ parsing
  • Shallow parsing
17
Experiments
  • WSJ parsing
  • Standard data sections 2-21 of the PTB for
    training, section 23 for testing
  • Model 2 of Collins 99 was used to obtain n-best
    results
  • exactly the same setting as Collins 00
    (Kernels)
  • Shallow parsing
  • CoNLL 2000 shared task
  • sections 15-18 of the PTB for training, section
    20 for testing
  • CRF-based parser Sha 03 was used to obtain
    n-best results

18
Tree representations
  • WSJ parsing
  • lexicalized tree
  • each non-terminal has a special node labeled
    with a head word
  • Shallow parsing
  • right-branching tree in which adjacent phrases
    are in a parent/child relation
  • special nodes for left/right boundaries

19
Results WSJ parsing
LR/LP labeled recall/precision. CBs is the
average number of crossing brackets per sentence.
0 CBs and ≤2 CBs are the percentages of sentences
with zero or at most two crossing brackets,
respectively
  • Comparable to other methods
  • Better than the kernel method, which uses the
    same all-subtrees representation with a
    different parameter estimation method

20
Results Shallow parsing
Fβ=1 is the harmonic mean of precision and
recall
  • Comparable to other methods
  • Our method is also comparable to Zhang's method
    even without extra linguistic features

21
Advantages
  • Compact feature set
  • WSJ parsing 8,000 features
  • Shallow parsing 3,000 features
  • Kernels implicitly use a huge number of features
  • Parsing is very fast
  • WSJ parsing 0.055 sec./sentence
  • Shallow parsing 0.042 sec./sentence

(n-best parsing time is NOT included)
22
Advantages, cont'd
  • Sparse feature representations allow us to
    analyze which kinds of subtrees are relevant

[Figure: examples of positive and negative subtrees selected for shallow parsing and WSJ parsing]
23
Conclusions
  • All subtrees are potentially used as features
  • Boosting
  • L1 norm regularization performs automatic
    feature selection
  • Branch and bound
  • enables us to find the optimal subtrees
    efficiently
  • Advantages
  • comparable accuracy to other parsing methods
  • fast parsing
  • good interpretability

24
Efficient computation
25
Rightmost extension Asai 02, Zaki 02
  • Extend a given tree of size (n-1) by adding a new
    node to obtain trees of size n
  • the new node is attached to a node on the
    rightmost path
  • the new node is added as the rightmost sibling

26
Rightmost extension, cont.
  • Recursive application of rightmost extension
    creates a search space that covers all subtrees
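A minimal sketch of rightmost extension, using the preorder (depth, label) encoding common in frequent-subtree mining (the encoding is an assumption; the slides illustrate the operation graphically):

def rightmost_extensions(tree, labels):
    """Yield every tree obtained by adding one node via rightmost extension.

    tree   : tuple of (depth, label) pairs in preorder; the root has depth 0.
    labels : node labels that the new rightmost node may take.
    """
    if not tree:
        for label in labels:
            yield ((0, label),)
        return
    # The new node's parent must lie on the rightmost path, so its depth
    # ranges from 1 up to (depth of the current rightmost leaf) + 1.
    last_depth = tree[-1][0]
    for depth in range(1, last_depth + 2):
        for label in labels:
            yield tree + ((depth, label),)

# Grow all 3-node trees from a single root node.
for t1 in rightmost_extensions(((0, "S"),), ["NP", "VP"]):
    for t2 in rightmost_extensions(t1, ["NP", "VP"]):
        print(t2)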

27
Pruning
  • For every extension t' of a subtree t, propose an
    upper bound μ(t) such that gain(t') ≤ μ(t)
  • Can prune the node t if μ(t) < τ,
  • where τ is the suboptimal gain (the best gain
    found so far)
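A minimal sketch of how this pruning rule plugs into the rightmost-extension search; gain and upper_bound are placeholders for the boosting gain and the bound μ(t) from the next slide, and are not defined here:

def find_optimal_subtree(seeds, extend, gain, upper_bound):
    """Branch-and-bound search for the subtree with the largest gain."""
    best, tau = None, 0.0                 # tau: suboptimal (best-so-far) gain
    stack = list(seeds)
    while stack:
        t = stack.pop()
        g = gain(t)
        if g > tau:
            best, tau = t, g
        if upper_bound(t) < tau:          # prune: no extension of t can beat tau
            continue
        stack.extend(extend(t))           # e.g., rightmost_extensions(t, labels)
    return best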

28
Upper bound of the gain