Simple Training of Dependency Parsers via Structured Boosting

1
Simple Training of Dependency Parsers via Structured Boosting
Qin Iris Wang, University of Alberta
Joint work with Dekang Lin and Dale Schuurmans
Hyderabad, India, Jan 11, 2007
2
Structured Boosting
  • A simple variant of standard boosting algorithms
  • Global optimization
  • As cheap as local methods
  • Can be easily applied to any local predictor
  • Successfully applied to dependency parsing

3
Dependency Tree
  • A dependency tree structure for a sentence
  • Syntactic relationships between word pairs in
    the sentence

4
Increasing Interest
  • Dependency parsing has been an active research
    area
  • Dependency trees are much easier to understand
    and annotate than constituency trees
  • Dependency relations have been widely used
  • Machine translation (Fox 2002, Cherry & Lin 2003,
    Ding & Palmer 2005)
  • Information extraction (Culotta & Sorensen 2004)
  • Question answering (Pinchak & Lin 2006)
  • Coreference resolution (Bergsma & Lin 2006)

5
Overview
  • Dependency parsing model
  • Local training methods for dependency parsing
  • Global training methods
  • Structured boosting
  • Experimental results
  • Conclusion

6
Dependency Parsing Model
  • W = (w1, …, wn): an input sentence
  • T: a candidate dependency tree
  • T(W): the set of possible dependency trees spanning W
  • Scoring function (Eisner 1996; McDonald et al. 2005):

    score(W, T) = Σ_{(i,j) ∈ T} θ ⋅ f(wi, wj)

    where θ is the vector of feature weights and f(wi, wj) is the
    vector of feature functions defined on each candidate link
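As a concrete sketch, the edge-factored score can be computed as below
in Python; the feature template names and the sparse dict representation
of θ are illustrative assumptions, not the authors' actual feature set:

    def link_features(words, i, j):
        """Toy feature map f(w_i, w_j) for a candidate link between
        positions i and j; real features include the word pair, POS
        tags, PMI, and distance (template names are illustrative)."""
        return [f"W1_{words[i]}", f"W2_{words[j]}",
                f"W1W2_{words[i]}_{words[j]}", f"Dist_{abs(i - j)}"]

    def tree_score(words, tree, theta):
        """score(W, T) = sum over links (i, j) in T of theta . f(w_i, w_j),
        with theta stored as a sparse dict of feature weights."""
        return sum(theta.get(feat, 0.0)
                   for (i, j) in tree
                   for feat in link_features(words, i, j))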
7
Features for a Link
Lots!
  • Word pair indicator
  • POS tags of that word pair
  • Pointwise Mutual Information (PMI) for that word
    pair
  • Distance between the words

8
Static Features
p-word / c-word: word of the parent / child
p-pos / c-pos: POS tag of the parent / child
9
Dynamic Features
  • Take into account the link labels of the
    surrounding components when predicting the label
    of a target
  • Commonly used in sequential labeling tasks
    (McCallum et al. 2000, Toutanova et al. 2003)
  • A simple but useful idea for improving structured
    predictors
  • Can also be easily employed for dependency
    parsing (Wang et al. 2005, McDonald and Pereira
    2006)
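A minimal sketch of the idea, assuming links are predicted in some
fixed order so that earlier decisions are visible when later pairs are
scored; the PrevLink template is a hypothetical dynamic feature, and
link_features is reused from the sketch above:

    def dynamic_features(words, i, j, predicted_links):
        """Static features for the pair (i, j) plus a dynamic feature
        exposing the link label already predicted for a nearby pair."""
        feats = link_features(words, i, j)              # static part
        prev = predicted_links.get((i - 1, i), "NONE")  # earlier decision
        feats.append(f"PrevLink_{prev}")                # dynamic part
        return feats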

10
Local Training Examples
  • Given training examples (S, T)

local examples (for a sentence like "The boy skipped school regularly"):

link label   features
L            W1_The, W2_boy, W1W2_The_boy, T1_DT, T2_NN, T1T2_DT_NN, Dist_1
L            W1_boy, W2_skipped, …
R            W1_skipped, W2_school, …
R            W1_skipped, W2_regularly, …
N            W1_The, W2_skipped, …
N            W1_The, W2_school, …
N            W1_The, W2_regularly, …

L = left link, R = right link, N = no link
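A sketch of this decomposition, assuming the gold tree is given as a
set of (parent, child) index pairs and reusing link_features from the
earlier sketch:

    def local_examples(words, tree):
        """Decompose a sentence and its gold tree into one 3-way
        classification example (L / R / N) per word pair."""
        links = set(tree)        # gold links as (parent, child) pairs
        examples = []
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                if (j, i) in links:
                    label = "L"  # arc points left: j is the parent of i
                elif (i, j) in links:
                    label = "R"  # arc points right: i is the parent of j
                else:
                    label = "N"  # no link between this pair
                examples.append((link_features(words, i, j), label))
        return examples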
11
Local Training Methods
  • Learn a local link predictor given feature inputs
  • A purely local approach to the learning problem
  • For each word pair in a sentence: no link,
    left link, or right link?

12
But, If We Only Use a Local Classifier
  • The output may not be a tree
  • Dependency parsing is a structured classification
    problem, i.e., the output has to be a dependency
    tree
  • Need to satisfy the constraints between the
    classifications of different local links

How to perform a structured prediction using a
local prediction model?
13
Local Training + Parsing Algorithm
Training sentences (S, T) → local training examples → local predictor
→ link scores → dependency parsing algorithm (Eisner 1996)
→ a projective spanning tree → dependency trees
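The whole pipeline, sketched below; train_classifier and eisner_parse
are placeholders for an off-the-shelf 3-way classifier and the
projective parsing algorithm of Eisner (1996), and clf.link_score is
an assumed scoring interface:

    def train_and_parse(train_data, test_sentences):
        """Local training, then structured prediction at parse time."""
        # 1. Decompose the gold trees into local (features, label) examples.
        examples = [ex for words, tree in train_data
                    for ex in local_examples(words, tree)]
        # 2. Train any off-the-shelf local link predictor on them.
        clf = train_classifier(examples)                # placeholder learner
        parses = []
        for words in test_sentences:
            # 3. Score every candidate link with the local predictor.
            scores = {(i, j): clf.link_score(link_features(words, i, j))
                      for i in range(len(words))
                      for j in range(len(words)) if i != j}
            # 4. Let the parser assemble the best projective spanning tree.
            parses.append(eisner_parse(words, scores))  # placeholder parser
        return parses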
14
Parsing With a Local Link Predictor
  • Support vector machines (Yamada and Matsumoto
    2003)
  • Logistic regression / Maximum entropy models
    (Ratnaparkhi 1999 and Charniak 2000)
  • But, we can do better if we use global training

15
Global Training for Structured Prediction
  • Recent global training algorithms for learning
    structured predictors
  • CRFs (Lafferty et al. 2001)
  • Structured SVMs (Tsochantaridis et al. 2004,
    Altun et al. 2003)
  • Max-Margin Markov Networks (Taskar et al. 2003)
  • Incorporate the effects of the structured
    predictor directly into the training algorithm
  • These training algorithms have been applied to
    parsing (Taskar et al. 2004, McDonald et al.
    2005, Wang et al. 2006)

16
But, Drawbacks
  • Unfortunately, for a complicated task like
    dependency parsing, these structured training
    techniques are
  • Expensive
  • Specialized
  • Complex to implement

17
Our Idea: Structured Boosting
  • Global optimization
  • Almost as cheap as local methods
  • Promising results

18
A Generic Boosting Algorithm
  • Train a local classifier (weak hypothesis), h1
  • Re-weight the local training examples
  • Re-train the local link predictor using the
    re-weighted training examples, getting h2
  • Finally we have h1, h2, …, hk
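A minimal AdaBoost.M1-style version of this loop, assuming
train_classifier accepts per-example weights and returns a hypothesis
with a predict method (both placeholders):

    import math

    def boost(examples, rounds):
        """Generic boosting: keep one weight per local example and
        up-weight the examples the current hypothesis gets wrong."""
        n = len(examples)
        weights = [1.0 / n] * n
        hypotheses = []
        for _ in range(rounds):
            h = train_classifier(examples, weights)     # placeholder learner
            wrong = [h.predict(x) != y for x, y in examples]
            err = sum(w for w, m in zip(weights, wrong) if m)
            if err == 0.0 or err >= 0.5:                # AdaBoost.M1 stopping rule
                break
            beta = err / (1.0 - err)
            # Down-weight correct examples, then renormalize.
            weights = [w * (1.0 if m else beta)
                       for w, m in zip(weights, wrong)]
            z = sum(weights)
            weights = [w / z for w in weights]
            hypotheses.append((h, math.log(1.0 / beta)))  # hypothesis vote
        return hypotheses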

19
Standard Boosting for Classification
Training sentences (S, T) → local training examples → local predictor
→ classification → re-weight the local examples → (back to the local
predictor for the next round)
20
Structured Boosting
  • Train a local link predictor, h1
  • Re-parse training data using h1
  • Re-weight local examples
  • Compare the parser outputs with the gold standard
    trees
  • Increase the weight of mis-parsed local examples
  • Re-train local link predictor, getting h2
  • Finally we have h1, h2, …, hk
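A sketch of the structured variant; the key difference from standard
boosting is that weights are driven by the parser's mistakes on whole
trees rather than by the local classifier's own errors. train_weighted
and parse_with are placeholders, and the fixed 2.0 up-weighting factor
is illustrative (the paper uses an AdaBoost.M1 variant):

    def structured_boost(train_data, rounds):
        """Structured boosting: re-weight local examples according to
        the links the *parser* gets wrong on the training data."""
        hypotheses = []
        # One weight per (sentence, word-pair) local example.
        weights = {(s, i, j): 1.0
                   for s, (words, _) in enumerate(train_data)
                   for i in range(len(words))
                   for j in range(i + 1, len(words))}
        for _ in range(rounds):
            h = train_weighted(train_data, weights)     # placeholder learner
            hypotheses.append(h)
            for s, (words, gold_tree) in enumerate(train_data):
                # Re-parse the training sentence with the current predictor.
                predicted = parse_with(h, words)        # placeholder parser call
                # Up-weight the local examples on mis-parsed links.
                for (i, j) in set(gold_tree) ^ set(predicted):
                    key = (s, min(i, j), max(i, j))
                    weights[key] *= 2.0                 # illustrative factor
        return hypotheses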

21
Structured Boosting for Dependency Parsing
Training sentences (S, T) → local training examples → local predictor
→ link scores → dependency parsing algorithm (Eisner 1996)
→ a projective spanning tree → re-parse the training data
→ dependency trees → re-weight the local examples → (back to the
local predictor for the next round)
22
Experimental Design
  • Learning dependency parsers for English and
    Chinese
  • Data set
  • English Penn Treebank 3.0 (PTB3)
  • Chinese Treebank 4.0 & 5.0 (CTB4, CTB5)
  • Features
  • Static
  • Dynamic

23
Experimental Design Cont.
  • Local training algorithm
  • Logistic regression / maximum entropy model
  • Parser
  • Standard bi-lexical CKY parser (O(n^5))
  • Boosting method
  • A variant of AdaBoost.M1 (Freund & Schapire 1997)

24
Experimental Results - 1
Table 1: Boosting with static features
25
Experimental Results - 2
Table 2: Boosting with dynamic features
26
Comparison With Others
Comparison with other systems (IWPT 2005) on Chinese Treebank 4.0 and
Chinese Treebank 5.0 (@ = personal communication)
27
Conclusion
  • Structured boosting is a simple and general way
    to coordinate a local link predictor with global
    optimization
  • Successfully applied to natural language
    dependency parsing

28
Thanks!
Questions?
29
Advantages
  • Structured boosting is a simple variant of
    standard boosting algorithms
  • Local parameter optimization is coordinated with
    global structured prediction (parsing)
  • Training (parameter estimation) is directly
    influenced by the resulting global accuracy of
    the parser
  • Simpler, more general, and can be easily applied
    to any local predictor

30
Decomposition of Training Data
  • Given training examples (S, T)
  • Decomposed into a set of local examples:
    arbitrary word pairs and their link labels (none,
    left, right) in context (S, T)
  • Learn a weight vector over a set of features
    defined on the local examples