Title: Semi-Supervised Boosting for Statistical Word Alignment
1. Semi-Supervised Boosting for Statistical Word Alignment
2. Outline
- Introduction to semi-supervised learning
- Introduction to boosting
- Semi-supervised boosting for word alignment
- Evaluation results
- Conclusion
3. Machine Learning Methods
- Supervised learning
  - Labeled data
- Unsupervised learning
  - Unlabeled data
- Semi-supervised learning
  - Combines both labeled and unlabeled data
4. Semi-Supervised Learning in NLP
- Word sense disambiguation
  - (Yarowsky, 1995; Pham et al., 2005)
- Classification
  - (Blum and Mitchell, 1998; Thorsten, 1999)
- Clustering
  - (Basu et al., 2004)
- Named entity classification
  - (Collins and Singer, 1999)
- Parsing
  - (Sarkar, 2001)
5. Boosting (Supervised Learning)
[Flowchart: initialization → call learner (supervised learning) → calculate error rate → re-weight training data → loop while boosting continues ("Yes" branch); on termination, build the ensemble]
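For concreteness, here is a minimal AdaBoost.M1-style sketch of the loop in the flowchart. The 1-D stump learner, the toy data, and all function names are illustrative assumptions, not code from the paper:

```python
import math

def boost(xs, ys, weak_learner, rounds=10):
    """Minimal AdaBoost.M1-style loop: initialize uniform weights,
    call the learner, compute the weighted error, re-weight, repeat."""
    m = len(xs)
    weights = [1.0 / m] * m                      # initialization
    ensemble = []                                # (alpha, classifier) pairs
    for _ in range(rounds):
        h = weak_learner(xs, ys, weights)        # call learner
        error = sum(w for x, y, w in zip(xs, ys, weights) if h(x) != y)
        if error >= 0.5:                         # learner too weak: stop
            break
        beta = max(error, 1e-10) / (1.0 - max(error, 1e-10))
        ensemble.append((math.log(1.0 / beta), h))
        if error == 0.0:                         # perfect fit: stop early
            break
        # re-weight: shrink the weight of correctly handled examples
        weights = [w * beta if h(x) == y else w
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]       # renormalize
    return ensemble

def predict(ensemble, x):
    """Weighted vote over the ensemble members."""
    votes = {}
    for alpha, h in ensemble:
        votes[h(x)] = votes.get(h(x), 0.0) + alpha
    return max(votes, key=votes.get)

def stump_learner(xs, ys, weights):
    """Weighted 1-D decision stump; labels are +1 / -1."""
    best, best_err = None, float("inf")
    for thr in xs:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (pol if x >= thr else -pol) != y)
            if err < best_err:
                best, best_err = (thr, pol), err
    thr, pol = best
    return lambda x, thr=thr, pol=pol: pol if x >= thr else -pol

# Usage on toy 1-D data.
xs, ys = [0.1, 0.3, 0.5, 0.8], [-1, -1, 1, 1]
ensemble = boost(xs, ys, stump_learner, rounds=5)
print([predict(ensemble, x) for x in xs])        # [-1, -1, 1, 1]
```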
6. Boosting in NLP
- Tagging and PP attachment
  - (Abney et al., 1999)
- Word sense disambiguation
  - (Escudero et al., 2000)
- Parser construction
  - (Haruno et al., 1999; Henderson and Brill, 2000)
- Sentence generation
  - (Walker et al., 2001)
7. Semi-Supervised Boosting
- Three main problems
  - Semi-supervised learner
    - How to combine labeled data and unlabeled data
  - Reference set
    - How to automatically construct a reference set for the unlabeled data
  - Error rate calculation
    - How to calculate the error rate with both labeled data and unlabeled data
8. Semi-Supervised Boosting Applied to Word Alignment
[Flowchart: labeled data → supervised training; unlabeled data → unsupervised training; the two models are combined by model interpolation; error rate calculation uses the real reference set (from the labeled data) and a pseudo reference set (for the unlabeled data); the training data is re-weighted and the loop repeats ("Yes" branch); on termination, build the ensemble]
9. Semi-Supervised Boosting Applied to Word Alignment
- Five main components
  - Word alignment model interpolation
  - Pseudo reference set construction for unlabeled data
  - Error rate calculation
  - Weight update
  - Final ensemble
10. Word Alignment Model
- Supervised alignment model
  - Calculate the probabilities for IBM Model 4 based on the labeled data
- Unsupervised alignment model
  - Use GIZA to train IBM Model 4
- Perform model interpolation
  - $p(f, a \mid e) = \lambda\, p_s(f, a \mid e) + (1 - \lambda)\, p_u(f, a \mid e)$, where $p_s$ is the supervised model, $p_u$ is the unsupervised model, and $\lambda$ is the interpolation weight
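As an illustration, a minimal sketch of linear interpolation over two lexical translation tables; the dictionary layout, function name, and λ value are assumptions here, and the actual system interpolates full Model 4 distributions rather than a single table:

```python
def interpolate(p_supervised, p_unsupervised, lam=0.7):
    """Linearly interpolate two translation probability tables.

    Each table maps (target_word, source_word) -> probability; lam is
    the weight on the supervised model (a value to tune on held-out
    data, not taken from the paper).
    """
    interpolated = {}
    for pair in set(p_supervised) | set(p_unsupervised):
        p_s = p_supervised.get(pair, 0.0)
        p_u = p_unsupervised.get(pair, 0.0)
        interpolated[pair] = lam * p_s + (1.0 - lam) * p_u
    return interpolated

# Usage: a table estimated from labeled data vs. one trained with GIZA.
p_s = {("maison", "house"): 0.8, ("la", "the"): 0.4}
p_u = {("maison", "house"): 0.6, ("le", "the"): 0.5}
print(interpolate(p_s, p_u))
```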
11. Pseudo Reference Set Construction
- Obtain bi-directional word alignment sets $S_1$ and $S_2$ on the training data
- Obtain the intersection set of these two alignment sets
- Filter the union set of the two alignment sets
- Build the pseudo reference set
  - $R_p = (S_1 \cap S_2) \cup S_F$, where $S_F$ is the subset of the union $S_1 \cup S_2$ that survives the filtering
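A sketch of this construction over link sets, assuming links are (source position, target position) pairs; the slide does not spell out the filtering criterion, so it is passed in as a predicate:

```python
def build_pseudo_reference(s1, s2, keep_link):
    """Build a pseudo reference set from bi-directional alignments.

    s1, s2: sets of links (i, j) from the two alignment directions.
    keep_link: predicate that decides which union-only candidate links
    survive the filtering step.
    """
    intersection = s1 & s2                   # high-precision links
    candidates = (s1 | s2) - intersection    # union-only links to filter
    filtered = {link for link in candidates if keep_link(link)}
    return intersection | filtered

# Usage with a toy filter: keep a candidate only if its source position
# is not already aligned by an intersection link.
s1 = {(0, 0), (1, 1), (2, 3)}
s2 = {(0, 0), (1, 2), (2, 3)}
aligned_sources = {i for i, _ in s1 & s2}
print(build_pseudo_reference(s1, s2, lambda l: l[0] not in aligned_sources))
```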
12. Error Rate Calculation
- For each sentence pair, calculate the error of the aligner
- Calculate the error rate based on the labeled data instead of the whole data
- $\epsilon_l = \sum_i \tilde{w}_l(i)\, \mathrm{err}_l(i)$, where $\tilde{w}_l(i)$ is the normalized weight of the $i$-th sentence pair at the $l$-th round and $\mathrm{err}_l(i)$ is the alignment error on that pair
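The computation is a weighted average of per-pair errors; a minimal sketch, where the per-pair error definition (e.g. one minus the F-score against the reference) is an assumption:

```python
def weighted_error_rate(weights, per_pair_errors):
    """Weighted error of an aligner over the labeled sentence pairs.

    weights: normalized per-pair weights w_l(i) at the current round.
    per_pair_errors: err_l(i) in [0, 1] for each pair, e.g. one minus
    the F-score of the predicted links against the reference.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must be normalized"
    return sum(w * e for w, e in zip(weights, per_pair_errors))

# Usage: three labeled pairs with uniform weights.
print(weighted_error_rate([1/3, 1/3, 1/3], [0.0, 0.25, 0.5]))  # ~0.25
```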
13. Re-Weight the Training Data
- Re-weight each sentence pair in the training set
- For each sentence pair, there may exist correct links and incorrect links as compared with the pseudo reference set
- Calculate the weight of each sentence pair according to the correct and incorrect links
- AdaBoost-style update: $w_{l+1}(i) \propto w_l(i)\, \beta_l^{\,1 - K/n}$, where $K$ is the number of error links, $n$ is the total number of links in the reference, and $\beta_l = \epsilon_l / (1 - \epsilon_l)$
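A sketch of an AdaBoost-style update under the assumption that the exponent is the fraction of correct links, 1 − K/n; the paper's exact rule may differ:

```python
def reweight(weights, error_fractions, beta):
    """AdaBoost-style re-weighting of sentence pairs.

    error_fractions: K/n per pair, the fraction of error links against
    the (pseudo) reference; beta = error_rate / (1 - error_rate).
    Pairs with fewer error links are multiplied by a larger power of
    beta (< 1), so the hard pairs gain relative weight.
    """
    updated = [w * beta ** (1.0 - f)
               for w, f in zip(weights, error_fractions)]
    z = sum(updated)
    return [w / z for w in updated]              # renormalize to sum to 1

# Usage: the error-free pair (K/n = 0) loses weight to the hard ones.
print(reweight([1/3, 1/3, 1/3], [0.0, 0.5, 1.0], beta=0.25))
```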
14. Final Ensemble
- Obtain the final ensemble from the word aligners trained on each round
- $h_f(s, t) = \sum_{l=1}^{L} \alpha_l\, w_l(s, t)$, where $h_f$ is the final ensemble for word alignment, $w_l(s, t)$ is the weight of the alignment pair $(s, t)$ produced by the $l$-th word aligner, and $\alpha_l$ is the weight of the $l$-th aligner
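A sketch of the combination, assuming a link is kept when its combined weight clears a threshold; the decision rule and threshold value are assumptions:

```python
from collections import defaultdict

def final_ensemble(aligners, threshold=0.5):
    """Combine the per-round aligners into final alignment links.

    aligners: list of (alpha, link_weights) pairs, where alpha is the
    weight of the aligner and link_weights maps each link (s, t) to the
    weight that aligner assigns it. A link is kept when its combined
    weight clears the threshold.
    """
    combined = defaultdict(float)
    for alpha, link_weights in aligners:
        for link, w in link_weights.items():
            combined[link] += alpha * w
    return {link for link, w in combined.items() if w >= threshold}

# Usage: both kinds of weights contribute to the final decision.
a1 = (0.9, {(0, 0): 1.0, (1, 2): 0.4})
a2 = (0.4, {(0, 0): 1.0, (1, 1): 1.0})
print(final_ensemble([a1, a2]))                  # {(0, 0)}
```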
15. Evaluation
- Training set
  - Unlabeled data: 320,000 English-Chinese sentence pairs
  - Labeled data: 30,000 English-Chinese sentence pairs
- Held-out set
  - 1,500 sentence pairs
- Testing set
  - 1,000 bilingual English-Chinese sentence pairs
  - 8,651 alignment links in total
16. Evaluation Metrics
- Word alignment
  - Precision and Recall
  - Alignment Error Rate (AER)
- Phrase-based machine translation
  - System: Pharaoh
  - Metrics: NIST and BLEU
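For reference, the standard precision/recall/AER computation follows Och and Ney's definitions with sure links S and possible links P; whether the gold standard here distinguishes sure from possible links is not stated on the slide:

```python
def alignment_scores(predicted, sure, possible_only=frozenset()):
    """Precision, recall, and AER for a set of predicted links.

    predicted: proposed links A; sure: gold sure links S;
    possible_only: extra gold possible links, so P = S | possible_only.
    precision = |A & P| / |A|, recall = |A & S| / |S|,
    AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).
    """
    a, s = predicted, sure
    p = s | set(possible_only)
    precision = len(a & p) / len(a)
    recall = len(a & s) / len(s)
    aer = 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    return precision, recall, aer

# Usage: with no possible-only links, AER reduces to 1 - F-measure.
pred = {(0, 0), (1, 1), (2, 2)}
gold = {(0, 0), (1, 1), (2, 3)}
print(alignment_scores(pred, gold))              # ≈ (0.667, 0.667, 0.333)
```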
17. Word Alignment Results
18. Weights in Ensembles
- Two kinds of weights
  - Weights for the individual aligners
  - Weights for the individual alignment links
- The baseline uses only the first kind of weight; our method uses both kinds
19. Translation Results
20. Conclusion
- Features of our semi-supervised boosting method
  - Perform model interpolation
  - Automatically build a pseudo reference set
  - Calculate the error rate of the training set with the labeled data
  - Use two kinds of weights in the ensemble: one for the aligners, the other for the alignment links
- Boosting does improve word alignment and translation quality
- Semi-supervised boosting performs the best
21. Thanks!