Simple Training of Dependency Parsers via Structured Boosting

1
Simple Training of Dependency Parsers via Structured Boosting
Qin Iris Wang, University of Alberta
Joint work with Dekang Lin and Dale Schuurmans
Hyderabad, India, Jan 11, 2007
2
Structured Boosting
  • A simple variant of standard boosting algorithms
  • Global optimization
  • As cheap as local methods
  • Can be easily applied to any local predictor
  • Successfully applied to dependency parsing

3
Dependency Tree
  • A dependency tree structure for a sentence
  • Syntactic relationships between word pairs in
    the sentence

4
Increasing Interest
  • Dependency parsing has been an active research
    area
  • Dependency trees are much easier to understand
    and annotate than constituency trees
  • Dependency relations have been widely used
  • Machine translation (Fox 2002, Cherry & Lin 2003,
    Ding & Palmer 2005)
  • Information extraction (Culotta & Sorensen 2004)
  • Question answering (Pinchak & Lin 2006)
  • Coreference resolution (Bergsma & Lin 2006)

5
Overview
  • Dependency parsing model
  • Local training methods for dependency parsing
  • Global training methods
  • Structured boosting
  • Experimental results
  • Conclusion

6
Dependency Parsing Model
  • W = (w1, …, wn): an input sentence
  • T: a candidate dependency tree
  • T(W): the set of possible dependency trees spanning W
  • Scoring function (Eisner 1996; McDonald et al. 2005):

    score(W, T) = Σ_{(i,j) ∈ T} θ ⋅ f(wi, wj)

    where θ is the vector of feature weights and f(wi, wj) is the
    vector of feature functions defined on each candidate link
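As a concrete sketch, the edge-factored score can be computed as below
in Python; the feature template names and the sparse dict representation
of θ are illustrative assumptions, not the authors' actual feature set:

    def link_features(words, i, j):
        """Toy feature map f(w_i, w_j) for a candidate link between
        positions i and j; real features include the word pair, POS
        tags, PMI, and distance (template names are illustrative)."""
        return [f"W1_{words[i]}", f"W2_{words[j]}",
                f"W1W2_{words[i]}_{words[j]}", f"Dist_{abs(i - j)}"]

    def tree_score(words, tree, theta):
        """score(W, T) = sum over links (i, j) in T of theta . f(w_i, w_j),
        with theta stored as a sparse dict of feature weights."""
        return sum(theta.get(feat, 0.0)
                   for (i, j) in tree
                   for feat in link_features(words, i, j))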
7
Features for a Link
Lots!
  • Word pair indicator
  • POS tags of that word pair
  • Pointwise Mutual Information (PMI) for that word
    pair
  • Distance between the words

8
Static Features
p-word / c-word: word of the parent / child
p-pos / c-pos: POS tag of the parent / child
9
Dynamic Features
  • Take into account the link labels of the
    surrounding components when predicting the label
    of a target
  • Commonly used in sequential labeling tasks
    (McCallum et al. 2000, Toutanova et al. 2003)
  • A simple but useful idea for improving structured
    predictors
  • Can also be easily employed for dependency
    parsing (Wang et al. 2005, McDonald and Pereira
    2006)
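A minimal sketch of the idea, assuming links are predicted in some
fixed order so that earlier decisions are visible when later pairs are
scored; the PrevLink template is a hypothetical dynamic feature, and
link_features is reused from the sketch above:

    def dynamic_features(words, i, j, predicted_links):
        """Static features for the pair (i, j) plus a dynamic feature
        exposing the link label already predicted for a nearby pair."""
        feats = link_features(words, i, j)              # static part
        prev = predicted_links.get((i - 1, i), "NONE")  # earlier decision
        feats.append(f"PrevLink_{prev}")                # dynamic part
        return feats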

10
Local Training Examples
  • Given training examples (S, T)

local examples (for a sentence like "The boy skipped school regularly"):

link label   features
L            W1_The, W2_boy, W1W2_The_boy, T1_DT, T2_NN, T1T2_DT_NN, Dist_1
L            W1_boy, W2_skipped, …
R            W1_skipped, W2_school, …
R            W1_skipped, W2_regularly, …
N            W1_The, W2_skipped, …
N            W1_The, W2_school, …
N            W1_The, W2_regularly, …

L = left link, R = right link, N = no link
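A sketch of this decomposition, assuming the gold tree is given as a
set of (parent, child) index pairs and reusing link_features from the
earlier sketch:

    def local_examples(words, tree):
        """Decompose a sentence and its gold tree into one 3-way
        classification example (L / R / N) per word pair."""
        links = set(tree)        # gold links as (parent, child) pairs
        examples = []
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                if (j, i) in links:
                    label = "L"  # arc points left: j is the parent of i
                elif (i, j) in links:
                    label = "R"  # arc points right: i is the parent of j
                else:
                    label = "N"  # no link between this pair
                examples.append((link_features(words, i, j), label))
        return examples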
11
Local Training Methods
  • Learn a local link predictor given feature inputs
  • A purely local approach to the learning problem
  • For each word pair in a sentence: no link,
    left link, or right link?

12
But, If We Only Use a Local Classifier
  • The output may not be a tree
  • Dependency parsing is a structured classification
    problem, i.e., the output has to be a dependency
    tree
  • Need to satisfy the constraints between the
    classifications of different local links

How to perform a structured prediction using a
local prediction model?
13
Local Training + Parsing Algorithm
Training sentences (S, T) → local training examples → local predictor
→ link scores → dependency parsing algorithm (Eisner 1996)
→ a projective spanning tree → dependency trees
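The whole pipeline, sketched below; train_classifier and eisner_parse
are placeholders for an off-the-shelf 3-way classifier and the
projective parsing algorithm of Eisner (1996), and clf.link_score is
an assumed scoring interface:

    def train_and_parse(train_data, test_sentences):
        """Local training, then structured prediction at parse time."""
        # 1. Decompose the gold trees into local (features, label) examples.
        examples = [ex for words, tree in train_data
                    for ex in local_examples(words, tree)]
        # 2. Train any off-the-shelf local link predictor on them.
        clf = train_classifier(examples)                # placeholder learner
        parses = []
        for words in test_sentences:
            # 3. Score every candidate link with the local predictor.
            scores = {(i, j): clf.link_score(link_features(words, i, j))
                      for i in range(len(words))
                      for j in range(len(words)) if i != j}
            # 4. Let the parser assemble the best projective spanning tree.
            parses.append(eisner_parse(words, scores))  # placeholder parser
        return parses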
14
Parsing With a Local Link Predictor
  • Support vector machines (Yamada and Matsumoto
    2003)
  • Logistic regression / Maximum entropy models
    (Ratnaparkhi 1999 and Charniak 2000)
  • But, we can do better if we use global training

15
Global Training for Structured Prediction
  • Recent global training algorithms for learning
    structured predictors
  • CRFs (Lafferty et al. 2001)
  • Structured SVMs (Tsochantaridis et al. 2004,
    Altun et al. 2003)
  • Max-Margin Markov Networks (Taskar et al. 2003)
  • Incorporate the effects of the structured
    predictor directly into the training algorithm
  • These training algorithms have been applied to
    parsing (Taskar et al. 2004, McDonald et al.
    2005, Wang et al. 2006)

16
But, Drawbacks
  • Unfortunately, for a complicated task like
    dependency parsing, these structured training
    techniques are
  • Expensive
  • Specialized
  • Complex to implement

17
Our Idea: Structured Boosting
  • Global optimization
  • Almost as cheap as local methods
  • Promising results

18
A Generic Boosting Algorithm
  • Train a local classifier (weak hypothesis), h1
  • Re-weight the local training examples
  • Re-train the local link predictor using the
    re-weighted training examples, getting h2
  • Finally we have h1, h2, …, hk
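A minimal AdaBoost.M1-style version of this loop, assuming
train_classifier accepts per-example weights and returns a hypothesis
with a predict method (both placeholders):

    import math

    def boost(examples, rounds):
        """Generic boosting: keep one weight per local example and
        up-weight the examples the current hypothesis gets wrong."""
        n = len(examples)
        weights = [1.0 / n] * n
        hypotheses = []
        for _ in range(rounds):
            h = train_classifier(examples, weights)     # placeholder learner
            wrong = [h.predict(x) != y for x, y in examples]
            err = sum(w for w, m in zip(weights, wrong) if m)
            if err == 0.0 or err >= 0.5:                # AdaBoost.M1 stopping rule
                break
            beta = err / (1.0 - err)
            # Down-weight correct examples, then renormalize.
            weights = [w * (1.0 if m else beta)
                       for w, m in zip(weights, wrong)]
            z = sum(weights)
            weights = [w / z for w in weights]
            hypotheses.append((h, math.log(1.0 / beta)))  # hypothesis vote
        return hypotheses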

19
Standard Boosting for Classification
Training sentences (S, T) → local training examples → local predictor
→ classification → re-weight the local examples → (back to the local
predictor for the next round)
20
Structured Boosting
  • Train a local link predictor, h1
  • Re-parse training data using h1
  • Re-weight local examples
  • Compare the parser outputs with the gold standard
    trees
  • Increase the weight of mis-parsed local examples
  • Re-train local link predictor, getting h2
  • Finally we have h1, h2, …, hk
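A sketch of the structured variant; the key difference from standard
boosting is that weights are driven by the parser's mistakes on whole
trees rather than by the local classifier's own errors. train_weighted
and parse_with are placeholders, and the fixed 2.0 up-weighting factor
is illustrative (the paper uses an AdaBoost.M1 variant):

    def structured_boost(train_data, rounds):
        """Structured boosting: re-weight local examples according to
        the links the *parser* gets wrong on the training data."""
        hypotheses = []
        # One weight per (sentence, word-pair) local example.
        weights = {(s, i, j): 1.0
                   for s, (words, _) in enumerate(train_data)
                   for i in range(len(words))
                   for j in range(i + 1, len(words))}
        for _ in range(rounds):
            h = train_weighted(train_data, weights)     # placeholder learner
            hypotheses.append(h)
            for s, (words, gold_tree) in enumerate(train_data):
                # Re-parse the training sentence with the current predictor.
                predicted = parse_with(h, words)        # placeholder parser call
                # Up-weight the local examples on mis-parsed links.
                for (i, j) in set(gold_tree) ^ set(predicted):
                    key = (s, min(i, j), max(i, j))
                    weights[key] *= 2.0                 # illustrative factor
        return hypotheses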

21
Structured Boosting for Dependency Parsing
Training sentences (S, T) → local training examples → local predictor
→ link scores → dependency parsing algorithm (Eisner 1996)
→ a projective spanning tree → re-parse the training data
→ dependency trees → re-weight the local examples → (back to the
local predictor for the next round)
22
Experimental Design
  • Learning dependency parsers for English and
    Chinese
  • Data set
  • English Penn Treebank 3.0 (PTB3)
  • Chinese Treebank 4.0 & 5.0 (CTB4, CTB5)
  • Features
  • Static
  • Dynamic

23
Experimental Design Cont.
  • Local training algorithm
  • Logistic regression / maximum entropy model
  • Parser
  • Standard bi-lexical CKY parser (O(n^5))
  • Boosting method
  • A variant of AdaBoost.M1 (Freund & Schapire 1997)

24
Experimental Results - 1
Table 1: Boosting with static features
25
Experimental Results - 2
Table 2: Boosting with dynamic features
26
Comparison With Others
Comparison with other systems (IWPT 2005) on Chinese Treebank 4.0 and
Chinese Treebank 5.0 (@ = personal communication)
27
Conclusion
  • Structured boosting is a simple and general way
    to coordinate a local link predictor with global
    optimization
  • Successfully applied to natural language
    dependency parsing

28
Thanks!
Questions?
29
Advantages
  • Structured boosting is a simple variant of
    standard boosting algorithms
  • Local parameter optimization is coordinated with
    global structured prediction (parsing)
  • Training (parameter estimation) is directly
    influenced by the resulting global accuracy of
    the parser
  • Simpler, more general, and can be easily applied
    to any local predictor

30
Decomposition of Training Data
  • Given training examples (S, T)
  • Decomposed into a set of local examples:
    arbitrary word pairs and their link labels (none,
    left, right) in context (S, T)
  • Learn a weight vector over a set of features
    defined on the local examples