Title: Machine Translation
1Machine Translation
- A Presentation by
- Julie Conlonova,
- Rob Chase,
- and Eric Pomerleau
2Overview
- Language Alignment System
- Datasets
- Sentence-aligned sets for training (e.g., the Hansards Corpus, the European Parliament Proceedings Parallel Corpus)
- A word-aligned set for testing and evaluation, to measure accuracy and precision
- Decoding
3Language Alignment
- Goal: Produce a word-aligned set from a sentence-aligned dataset
- First step on the road toward Statistical Machine Translation
- Example Problem:
- The motion to adjourn the House is now deemed to have been adopted.
- La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.
4IBM Models 1 and 2 (Kevin Knight, A Statistical MT Tutorial Workbook, 1999)
- Each can be used on its own to produce a word-aligned dataset.
- EM Algorithm
- Model 1 produces T-values based on normalized fractional counting of corresponding words.
- Additionally, Model 2 uses A-values for reverse distortion probabilities based on the positions of the words.
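As a hedged illustration of how T-values and A-values combine, the two models are commonly written as below, up to a normalizing constant. The notation is an assumption, loosely following Knight's workbook: e is the English sentence of length l, f is the foreign sentence of length m, and a_j is the English position that foreign word f_j aligns to.

```latex
% Model 1: only the translation table t (the T-values) matters.
P_{1}(f, a \mid e) \;\propto\; \prod_{j=1}^{m} t(f_j \mid e_{a_j})

% Model 2: each link is also weighted by a position-based A-value.
P_{2}(f, a \mid e) \;\propto\; \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \, a(a_j \mid j, l, m)
```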
5Training Data
- European Parliament Proceedings Parallel Corpus, 1996-2003
- Aligned Languages:
- English - French
- English - Dutch
- English - Italian
- English - Finnish
- English - Portuguese
- English - Spanish
- English - Greek
6Training Data cont.
- Eliminated
- Misaligned sentences
- Sentences with 50 or more words
- XML tags
- Symbols and numerical characters other than commas and periods
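The filtering steps above could be sketched roughly as follows. This is a minimal illustration, not the presenters' actual code: the names `clean_sentence` and `clean_pairs` are hypothetical, and treating a pair with an empty side as "misaligned" is an assumption.

```python
import re

MAX_WORDS = 50  # sentences with 50 or more words are dropped, per the slide

TAG_RE = re.compile(r"<[^>]+>")           # XML tags such as <P>
JUNK_RE = re.compile(r"[^\w\s,.]|[\d_]")  # symbols and digits, keeping , and .


def clean_sentence(text):
    """Strip XML tags, then symbols and digits other than commas and periods."""
    text = TAG_RE.sub(" ", text)
    text = JUNK_RE.sub(" ", text)
    return " ".join(text.split())  # collapse runs of whitespace


def clean_pairs(pairs):
    """Yield cleaned (source, target) pairs, skipping empty or overlong ones."""
    for src, tgt in pairs:
        src, tgt = clean_sentence(src), clean_sentence(tgt)
        if not src or not tgt:
            continue  # assumption: an empty side means the pair is misaligned
        if len(src.split()) >= MAX_WORDS or len(tgt.split()) >= MAX_WORDS:
            continue
        yield src, tgt
```

Note that this sketch also removes apostrophes, so "s'ajourne" becomes "s ajourne"; whether that is desirable depends on the tokenization used downstream.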
7Ideally
http://www.cs.berkeley.edu/~klein/cs294-5
8Bypassing Interlingua: Models I-III
- Variables contributing to the probability of a sentence:
- Correlation between words in the source/target languages
- Fertility of a word
- Correlation between order of words in source sentence and order of words in target
9A Translation Matrix
        Rob   Cat   is    Dog
Rob     1     0     0     0
Gato    0     1     0     0
es      0     0     .5    0
esta    0     0     .5    0
Perro   0     0     0     1
10Building the Translation Matrix: Starting from Alignments
- Find the sentence alignment
- If a word in the source aligns with a word in the target, then increment the translation matrix.
- Normalize the translation matrix
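The count-then-normalize procedure above might look like this in outline. It is a sketch under assumptions: the function name `build_translation_matrix` and the input layout (word lists plus index links) are hypothetical, and each source word's row is normalized to sum to 1.

```python
from collections import defaultdict


def build_translation_matrix(aligned_pairs):
    """aligned_pairs: iterable of (source_words, target_words, links), where
    links is a list of (source_index, target_index) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1.0  # increment for each aligned pair
    # Normalize each source word's row so that it sums to 1.
    matrix = {}
    for s, row in counts.items():
        total = sum(row.values())
        matrix[s] = {t: c / total for t, c in row.items()}
    return matrix


# Toy data echoing the matrix on the previous slide.
data = [
    (["Cat", "is"], ["Gato", "es"], [(0, 0), (1, 1)]),
    (["Cat", "is"], ["Gato", "esta"], [(0, 0), (1, 1)]),
]
matrix = build_translation_matrix(data)
```

With this toy data, "is" splits its probability evenly between "es" and "esta", as in the example matrix.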
11Can't Find Alignments
- Most sentences in the Hansards corpus are 60 words long. Many are over 100.
- 100^100 possible alignments
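To make the scale concrete, here is a small arithmetic sketch. It assumes each target word independently links to exactly one source word, which is what gives the 100^100 figure; the helper name `num_alignments` is hypothetical.

```python
def num_alignments(source_len, target_len):
    """Each target word independently picks one source position."""
    return source_len ** target_len


n = num_alignments(100, 100)
print(len(str(n)))  # number of decimal digits in the count
```

Since 100^100 equals 10^200, enumerating every alignment is out of the question, which motivates the counting approach on the next slide.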
12Counting
- Rob is a boy. Rob es nino.
- Rob is tall. Rob es alto.
- Eric is tall. Eric es alto.
- Base counts on co-occurrence, weighting based on sentence length.
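One possible reading of "weighting based on sentence length" is sketched below: every co-occurring word pair in a sentence pair receives a count of 1 divided by the length of the target sentence. That weighting choice and the name `cooccurrence_counts` are assumptions, not something the slide specifies. Periods are omitted from the example sentences for simplicity.

```python
from collections import defaultdict

pairs = [
    ("Rob is a boy", "Rob es nino"),
    ("Rob is tall", "Rob es alto"),
    ("Eric is tall", "Eric es alto"),
]


def cooccurrence_counts(pairs):
    """Return counts[english_word][spanish_word], weighted by sentence length."""
    counts = defaultdict(lambda: defaultdict(float))
    for english, spanish in pairs:
        e_words, s_words = english.split(), spanish.split()
        weight = 1.0 / len(s_words)  # assumed weighting: 1 / target length
        for e in e_words:
            for s in s_words:
                counts[e][s] += weight
    return counts


counts = cooccurrence_counts(pairs)
```

Here "is" and "es" co-occur in all three pairs, so they accumulate the largest count, while "boy" and "nino" co-occur only once.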
13Iterative Convergence
- Use the Expectation Maximization (EM) algorithm
- Creates translation matrix
        Rob   is    tall  boy
Rob     .66   .33   .25   .25
es      .30   .66   .25   .25
alto    .2    .05   .5    0
nino    .2    .05   0     .5
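The iterative procedure can be sketched as a bare-bones IBM Model 1 trainer. This is an illustration under assumptions, not the presenters' code: it omits the NULL word, starts from a uniform table, and stores T-values in a dictionary keyed by (foreign word, English word). It is not expected to reproduce the exact numbers in the matrix above.

```python
from collections import defaultdict

pairs = [
    ("Rob is a boy".split(), "Rob es nino".split()),
    ("Rob is tall".split(), "Rob es alto".split()),
    ("Eric is tall".split(), "Eric es alto".split()),
]


def train_model1(pairs, iterations=10):
    """Return t[(f, e)], an estimate of P(foreign word f | English word e)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    # Start with the same value for every co-occurring word pair.
    t = {(f, e): uniform for es, fs in pairs for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: split each foreign word's unit count across the English
        # words in its sentence, in proportion to the current T-values.
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: normalize the fractional counts for each English word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


t = train_model1(pairs)
```

Because "is" appears with "es" in all three pairs, EM gradually assigns "es" to "is", which frees "alto" to be explained by "tall".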
14Distorting the Sentence
- Word order changes between languages
- How is a sentence with 2 words distorted?
- How is a sentence with 3 words distorted?
- How is a sentence with
- To keep track of this information we use
15A tesseract!
- (A quadruply nested default dictionary)
- This could be a problem if there are more than 100 words in a sentence.
- 100x100x100x100 too big for RAM and takes too much time
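A quadruply nested default dictionary could be built as in the sketch below. The helper name `nested_dict` and the index order a[i][j][l][m] (source position, target position, source length, target length) are assumptions made for illustration.

```python
from collections import defaultdict


def nested_dict(depth, leaf=float):
    """Return a defaultdict nested `depth` levels deep with `leaf` values."""
    if depth == 1:
        return defaultdict(leaf)
    return defaultdict(lambda: nested_dict(depth - 1, leaf))


# a[i][j][l][m]: A-value for linking source position i to target position j
# when the source has l words and the target has m words (assumed layout).
a = nested_dict(4)
a[1][2][3][3] += 0.5

# A dense table covering sentences of up to 100 words needs 100**4 cells.
dense_cells = 100 ** 4
```

The nested dictionary only stores entries that are actually touched, but a fully populated table for 100-word sentences would still hold 100 million values.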
16Broad Look at MT
- The translation process can be described simply as:
- Decoding the meaning of the source text, and
- Re-encoding this meaning in the target language.
- (Translation Process, Wikipedia, May 2006)
17Decoding
- How to go from the T-matrix and A-matrix to a word alignment?
- There are several approaches
18Viterbi
- If only doing alignment, much smaller memory and time requirements.
- Returns optimal path.
- T-Matrix probabilities function as the emission matrix
- A-Matrix probabilities concerned with the positioning of words
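For alignment only, the best path can be found cheaply because, under Models 1 and 2, each target word's link is scored independently of the others. The sketch below picks, for every target position, the source position with the highest product of T-value and A-value. The dictionary layouts and the name `best_alignment` are hypothetical, and a missing A-value falls back to a uniform 1/l, which reduces the scoring to Model 1. The source sentence is assumed to be non-empty.

```python
def best_alignment(source, target, t, a):
    """Return, for each target position j, the best source position i.

    t maps (target_word, source_word) to a T-value; a maps (i, j, l, m)
    to an A-value. Both layouts are assumptions made for this sketch.
    """
    l, m = len(source), len(target)
    alignment = []
    for j, f in enumerate(target):
        scores = [t.get((f, e), 0.0) * a.get((i, j, l, m), 1.0 / l)
                  for i, e in enumerate(source)]
        alignment.append(max(range(l), key=scores.__getitem__))
    return alignment


# Illustrative T-values, taken loosely from the earlier example matrix.
t_values = {("Rob", "Rob"): 0.66, ("es", "Rob"): 0.30,
            ("es", "is"): 0.66, ("alto", "tall"): 0.5}
links = best_alignment(["Rob", "is", "tall"], ["Rob", "es", "alto"],
                       t_values, {})
```

In this toy case each Spanish word links to the English word in the same position.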
19Decoding as a Translator
- When no translated sentence is supplied, the program can act as a stand-alone translator instead of a word aligner.
- However, while the Viterbi algorithm with pruning runs quickly for decoding, its run time skyrockets for translating.
20Greedy Hill Climbing (Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Best first search
- 2-step look ahead to avoid getting stuck in most
probable local maxima
21Beam Search (Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Optimization of Best First Search with heuristics and beam of choices
- Exponential tradeoff when increasing the beam width
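A heavily simplified beam search is sketched below to show the pruning idea. It assumes monotone, word-by-word translation with no reordering, a candidate table derived from the T-values, and a hypothetical bigram language model supplied as a function. None of these details come from the slides.

```python
import math


def beam_search(source, candidates, bigram, beam_width=3):
    """candidates[word] -> {translation: prob}; bigram(prev, word) -> prob."""
    beam = [(0.0, ["<s>"])]  # each hypothesis is (log score, output so far)
    for word in source:
        expanded = []
        for score, output in beam:
            for trans, p in candidates[word].items():
                lm = bigram(output[-1], trans)
                if p > 0 and lm > 0:
                    new_score = score + math.log(p) + math.log(lm)
                    expanded.append((new_score, output + [trans]))
        # Prune: keep only the beam_width best partial translations.
        beam = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam_width]
        if not beam:
            return [], float("-inf")
    best_score, best_output = beam[0]
    return best_output[1:], best_score


# Toy inputs; the probabilities are made up for illustration.
candidates = {"Cat": {"Gato": 1.0}, "is": {"es": 0.5, "esta": 0.5}}


def toy_bigram(prev, word):
    return 0.6 if (prev, word) == ("Gato", "es") else 0.4


output, score = beam_search(["Cat", "is"], candidates, toy_bigram)
```

A wider beam keeps more hypotheses alive at each step, trading run time for a smaller chance of pruning the best translation.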
22Other Decoding Methods (Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Finite State Transducer
- Mapping between languages based on a finite automaton
- Parsing
- String to Tree Model
23Problem: One to Many
- Necessary to take all alignments over a certain probability in order to capture the probability that e has fertility at least a given value (Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999)
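Keeping every link above a probability threshold, rather than only the single best one, might be sketched as follows. The function name `links_above_threshold`, the threshold of 0.2, and the example T-values are all hypothetical.

```python
def links_above_threshold(source, target, t, threshold=0.2):
    """Return every (source_index, target_index) link whose T-value is at
    least `threshold`; t maps (target_word, source_word) to a probability."""
    return [(i, j)
            for i, e in enumerate(source)
            for j, f in enumerate(target)
            if t.get((f, e), 0.0) >= threshold]


# Made-up values: one English word producing two French words.
t_values = {("ne", "not"): 0.45, ("pas", "not"): 0.45}
links = links_above_threshold(["not"], ["ne", "pas"], t_values)
```

The number of links kept for a source word then serves as an estimate of its fertility; here "not" would have fertility 2.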
24Results
- Study done in 2003 on word alignment error rates in the Hansards corpus
- Model 2
- 29.3% on 8K training sentence pairs
- 19.5% on 1.47M training sentence pairs
- Optimized Model 6
- 20.3% on 8K training sentence pairs
- 8.7% on 1.47M training sentence pairs
- Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003
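For reference, the alignment error rate can be computed as in the sketch below, using the usual definition with "sure" and "possible" gold links: AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). The function name and the representation of links as (source_index, target_index) tuples are assumptions.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), where S is a subset of P."""
    predicted, sure = set(predicted), set(sure)
    possible = set(possible) | sure  # sure links also count as possible
    if not predicted and not sure:
        return 0.0
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))


aer = alignment_error_rate(
    predicted=[(0, 0), (1, 1), (2, 2)],
    sure=[(0, 0), (1, 1)],
    possible=[(2, 1)],
)
```

Lower is better: a predicted alignment identical to the sure links scores 0.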
25Expected Accuracy
- 70% overall
- Language performance
- Dutch
- French
- Italian, Spanish, Portuguese
- Greek
- Finnish
26Possible Future Work
- Given more time, we would have implemented IBM Model 3
- Additionally uses n, p, and d parameters for weighted alignments
- N, number of words produced by one word
- D, distortion
- P, parameter involving words that aren't involved directly
- Invokes Model 2 for scoring
27Another Possible Translation Scheme
- Example-Based Machine Translation
- Translation-by-Analogy
- Can sometimes achieve better than the gist
translations from other models
28Why Is Improving Machine Translation Necessary?
29A Chinese to English Translation
30The End
- Are there any questions/comments?