Title: Discriminative Modeling of Extraction Sets for Machine Translation
1 Discriminative Modeling of Extraction Sets for Machine Translation
- Authors
- John DeNero and Dan Klein, UC Berkeley
- Presenter
- Justin Chiu
2 Contribution
- Extraction sets
- Nested collections of all the overlapping phrase pairs consistent with an underlying word alignment
- Advantages over word-factored alignment models
- Can incorporate features on phrase pairs, not just word links
- Optimizes an extraction-based loss function directly tied to generating translations
- Performs better than both supervised and unsupervised baselines
3 Progress of Statistical MT
- Generate translated sentences word by word
- Using whole fragments of training examples to build translation rules
- Aligned at the word level
- Extract fragment-level rules from word-aligned sentence pairs
- Tree-to-string translation
- Extraction set models
- The set of all overlapping phrasal translation rule alignments
4 Outline
- Extraction Set Models
- Model Estimation
- Model Inference
- Experiments
5 Extraction Set Models
6 Extraction Set Models
- Input
- Unaligned sentence pair
- Output
- Extraction set of phrasal translation rules
- Word alignment
7 Extraction Sets from Word Alignments
8 Extraction Sets from Word Alignments
9 Extraction Sets from Word Alignments
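The slides above illustrate how an extraction set is read off a word alignment. A minimal sketch of the underlying idea, in the style of standard phrase-pair extraction: enumerate every bispan (source span paired with a target span) that no alignment link crosses. The function name and the `max_len` cutoff are illustrative choices, and the sketch omits the expansion over unaligned (null) boundary words that a full extractor performs.

```python
def extract_phrase_pairs(alignment, src_len, tgt_len, max_len=4):
    """Enumerate bispans (i, j, k, l) -- source span [i, j) paired with
    target span [k, l) -- consistent with the word alignment: no link
    connects a word inside the bispan to a word outside it."""
    pairs = []
    for i in range(src_len):
        for j in range(i + 1, min(i + max_len, src_len) + 1):
            # Target positions linked to any source word inside [i, j)
            linked = [t for (s, t) in alignment if i <= s < j]
            if not linked:
                continue
            k, l = min(linked), max(linked) + 1
            # Consistency check: no target word in [k, l) may link
            # to a source word outside [i, j)
            if any(k <= t < l and not (i <= s < j) for (s, t) in alignment):
                continue
            pairs.append((i, j, k, l))
    return pairs
```

For the alignment {(0,0), (1,1), (2,1)} over a 3-word source and 2-word target, this yields the nested, overlapping bispans (0,1,0,1), (1,3,1,2), and (0,3,0,2), while rejecting (1,2,1,2), whose target side also links outside the span.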
10Possible and Null Alignment Links
- Possible links has two types
- Function words that is unique in its language
- Short phrase that has no lexical equivalent
- Null alignment
- Express content that isabsent in its translation
11 Interpreting Possible and Null Alignment Links
12 Interpreting Possible and Null Alignment Links
13 Linear Model for Extraction Sets
14 Scoring Extraction Sets
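A linear model scores an extraction set as a weight vector dotted with features summed over its bispans. A toy sketch of that scoring loop, where `bispan_features` is a hypothetical feature map standing in for the paper's richer phrase-pair features:

```python
def bispan_features(bispan):
    """Hypothetical feature map: source-span length plus a bias feature."""
    i, j, k, l = bispan
    return {"src_len": j - i, "bias": 1.0}

def score_extraction_set(weights, bispans, featurize):
    """Linear model: score of an extraction set is the dot product of the
    weight vector with the feature vector summed over every bispan."""
    total = 0.0
    for bispan in bispans:
        for name, value in featurize(bispan).items():
            total += weights.get(name, 0.0) * value
    return total
```

Because the score decomposes as a sum over bispans, it can be optimized with the dynamic program described later in the talk.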
15 Model Estimation
16 MIRA (Margin-Infused Relaxed Algorithm)
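A minimal sketch of a single 1-best MIRA step, the kind of online large-margin update the talk refers to: move the weights toward the gold features just enough that gold outscores the model's guess by a margin equal to the guess's loss, with the step size clipped at a constant C (the clip value here is an illustrative default, not the paper's).

```python
import numpy as np

def mira_update(w, feat_gold, feat_guess, loss, C=0.01):
    """One clipped 1-best MIRA step on weight vector w."""
    delta = feat_gold - feat_guess
    margin = w @ delta            # current score gap, gold minus guess
    norm_sq = delta @ delta
    if norm_sq == 0.0:
        return w                  # identical features: nothing to update
    # Smallest step that makes margin >= loss, clipped at C
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    return w + tau * delta
```

For example, from zero weights with gold features [1, 0], guess features [0, 1], loss 1, and a large C, the update yields w = [0.5, -0.5], at which point the gold analysis outscores the guess by exactly the loss.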
17 Extraction Set Loss Function
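The loss the model optimizes is defined on extraction sets rather than on word links alone. A simplified stand-in for that idea: count the bispan mismatches between a predicted extraction set and the gold one (the paper's actual loss additionally weights phrase-pair and word-link errors, which this unweighted sketch omits).

```python
def extraction_loss(pred, gold):
    """Unweighted extraction-level loss: false-positive bispans plus
    false-negative bispans between predicted and gold extraction sets."""
    pred, gold = set(pred), set(gold)
    return len(pred - gold) + len(gold - pred)
```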
18 Model Inference
19 Possible Decompositions
20 DP for Extraction Sets
21 DP for Extraction Sets
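Inference searches over ITG derivations with a CKY-style dynamic program over bispans. A toy sketch of that bitext DP under simplifying assumptions: a bispan is built from two smaller bispans either in order (monotone) or swapped (inverted), each bispan contributes a score as it is built, and spans with a single word on either side are terminals. The paper's DP additionally handles null alignments and extraction-set features, which this sketch leaves out; note the memoized recursion is O(n^6), so it is only meant for short toy inputs.

```python
from functools import lru_cache

def itg_best(src_len, tgt_len, score):
    """Best-scoring simplified ITG derivation over the full sentence pair.
    `score(i, j, k, l)` scores each bispan used in the derivation."""
    @lru_cache(maxsize=None)
    def best(i, j, k, l):
        base = score(i, j, k, l)
        if j - i == 1 or l - k == 1:
            return base                       # terminal bispan
        options = []
        for m in range(i + 1, j):             # source split point
            for n in range(k + 1, l):         # target split point
                options.append(best(i, m, k, n) + best(m, j, n, l))  # monotone
                options.append(best(i, m, n, l) + best(m, j, k, n))  # inverted
        return base + max(options)
    return best(0, src_len, 0, tgt_len)
```

With a constant bispan score of 1.0 on a 2x2 sentence pair, the best derivation uses three bispans (the full bispan plus two one-word cells), for a total of 3.0.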
22 Finding Pseudo-Gold ITG Alignments
23 Experiments
24 Five Systems for Comparison
- Unsupervised baselines
- GIZA++
- Joint HMM
- Supervised baseline
- Block ITG
- Extraction set coarse pass
- Does not score bispans that cross brackets of ITG derivations
- Full extraction set model
25 Data
- Discriminative training and alignment evaluation
- Trained the baseline HMM on 11.3 million words of FBIS newswire data
- Hand-aligned portion of the NIST MT02 test set
- 150 training and 191 test sentences
- End-to-end translation experiments
- Trained on a 22.1-million-word parallel corpus of GALE program newswire sentences of up to 40 words
- NIST MT04/MT05 test sets
26 Results
27 Discussion
- Syntax labels vs. words
- Word alignments to rules, or rules to word alignments?
- Information from two directions
- 65% of type 1 errors