Title: Direct Translation Approaches: Statistical Machine Translation
1 Direct Translation Approaches: Statistical Machine Translation
- Stephan Vogel, Alicia Tribble
- Interactive Systems Lab
- Carnegie Mellon University
- University Karlsruhe
Speech-to-Speech Translation Workshop, ESSLLI 2002, Trento, Italy
2 Overview
- Translation Approaches
- Statistical Machine Translation
- Translating with Cascaded Transducers
- Experiments on Nespole Data
3 Translation Approaches
- Interlingua based
- Transfer based
- Direct
- Example based
- Statistical
4 Statistical Machine Translation
Based on Bayes decision rule:
$\hat{e} = \arg\max_e p(e \mid f) = \arg\max_e p(e)\, p(f \mid e)$
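The decision rule can be sketched in a few lines of Python. This is only an illustration: the candidate list and the probability tables are made-up toy values, and `best_translation` is a hypothetical helper name, not part of the system described in these slides.

```python
import math

# Toy log-probabilities (illustrative values only, not from the slides).
lm_logprob = {"the house": math.log(0.6), "house the": math.log(0.1)}   # log p(e)
tm_logprob = {("das haus", "the house"): math.log(0.5),                  # log p(f | e)
              ("das haus", "house the"): math.log(0.5)}

def best_translation(f, candidates):
    """Return argmax over e of p(e) * p(f | e), computed in log space."""
    return max(candidates, key=lambda e: lm_logprob[e] + tm_logprob[(f, e)])

result = best_translation("das haus", ["the house", "house the"])
```

Here both candidates get the same translation model score, so the language model p(e) decides in favour of the well-formed word order.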
5 Tasks in SMT
- Modelling: build statistical models which capture characteristic features of translation equivalences and of the target language
- Training: train translation model on bilingual corpus, train language model on monolingual corpus
- Decoding: find best translation for new sentences according to models
6 Alignment Example
7 Translation Models
- IBM1: lexical probabilities only
- IBM2: lexicon plus absolute position
- HMM: lexicon plus relative position
- IBM3: plus fertilities
- IBM4: inverted relative position alignment
- IBM5: non-deficient version of model 4
(Brown et al. 1993; Vogel et al. 1996)
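As an illustration of the simplest model in this list, the following is a minimal sketch of EM training for IBM1 lexical probabilities. It assumes a tiny toy corpus, omits the NULL word, and uses the hypothetical function name `train_ibm1`; it is not the training code used for the experiments.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """corpus: list of (source_words, target_words). Returns t[(f, e)], an estimate of p(f | e)."""
    src_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected counts of (f, e) links.
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise per target word e.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy parallel corpus (German source, English target), for illustration only.
corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = train_ibm1(corpus)
```

After a few iterations the probability mass for "the" concentrates on "das" and the mass for "book" on "buch", because those pairs co-occur most consistently.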
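8 HMM Alignment Model
$p(f \mid e) = \sum_a p(f_1^J, a_1^J \mid e_1^I)$
$= \sum_a \prod_j p(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I)$
$\approx \sum_a \prod_j p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j})$
$\approx \max_a \prod_j p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j})$
Alignment $a_j$ of current word $f_j$ depends on alignment $a_{j-1}$ of previous word $f_{j-1}$.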
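The maximum over alignments can be found with the Viterbi algorithm. The sketch below assumes toy lexicon and jump tables, a uniform start distribution, and a small floor probability for unseen events; the function name `viterbi_alignment` and all values are hypothetical.

```python
import math

def viterbi_alignment(f_words, e_words, lex, jump, floor=1e-6):
    """Best alignment under prod_j p(a_j | a_{j-1}) * p(f_j | e_{a_j}).

    lex[(f, e)] is p(f | e); jump[d] is the probability of a jump of width
    d = a_j - a_{j-1}. Unseen entries fall back to `floor` (an assumption).
    """
    I = len(e_words)

    def emit(j, i):
        return math.log(lex.get((f_words[j], e_words[i]), floor))

    def trans(i_prev, i):
        return math.log(jump.get(i - i_prev, floor))

    # Uniform start distribution over target positions (an assumption).
    score = [emit(0, i) - math.log(I) for i in range(I)]
    back = []
    for j in range(1, len(f_words)):
        new_score, ptr = [], []
        for i in range(I):
            best_prev = max(range(I), key=lambda k: score[k] + trans(k, i))
            new_score.append(score[best_prev] + trans(best_prev, i) + emit(j, i))
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    a = [max(range(I), key=lambda i: score[i])]
    for ptr in reversed(back):
        a.append(ptr[a[-1]])
    return a[::-1]

lex = {("das", "the"): 0.9, ("haus", "house"): 0.8,
       ("ist", "is"): 0.9, ("klein", "small"): 0.8}
jump = {0: 0.2, 1: 0.6, 2: 0.1, -1: 0.1}
alignment = viterbi_alignment("das haus ist klein".split(),
                              "the house is small".split(), lex, jump)
```

The returned list gives, for each source position j, the index of the aligned target word.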
9 Phrase Translation
- Why?
  - To capture context
  - Local word reordering
- How?
  - Train alignment model
  - Extract phrase-to-phrase translations from Viterbi path
- Notes
  - Often better results when training target to source for extraction of phrase translations
  - Phrases are not fully integrated into alignment model, they are extracted only after training is completed
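The slides do not spell out the extraction procedure. As a rough sketch, the code below applies a commonly used consistency criterion to a Viterbi path: a source span is paired with the target span it aligns to, provided no source word outside the span aligns into that target span. The function name `extract_phrases`, the `max_len` limit, and the example sentence are assumptions for illustration.

```python
def extract_phrases(f_words, e_words, alignment, max_len=3):
    """alignment[j] = index of the target word aligned to source word j.

    Returns the set of (source phrase, target phrase) pairs that are
    consistent with the alignment.
    """
    pairs = set()
    J = len(f_words)
    for j1 in range(J):
        for j2 in range(j1, min(j1 + max_len, J)):
            targets = [alignment[j] for j in range(j1, j2 + 1)]
            i1, i2 = min(targets), max(targets)
            # Reject if a source word outside [j1, j2] aligns into [i1, i2].
            outside = [j for j in range(J)
                       if not j1 <= j <= j2 and i1 <= alignment[j] <= i2]
            if not outside:
                pairs.add((" ".join(f_words[j1:j2 + 1]),
                           " ".join(e_words[i1:i2 + 1])))
    return pairs

f = "ich will morgen fahren".split()
e = "i want to leave tomorrow".split()
viterbi_path = [0, 1, 4, 3]   # toy alignment: morgen -> tomorrow, fahren -> leave
pairs = extract_phrases(f, e, viterbi_path)
```

The pair ("morgen fahren", "leave tomorrow") shows how a phrase entry captures local word reordering that a word-by-word lexicon would miss.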
10 Translation with Transducers
- Transducer
  - Finite state machine
  - Read sequence of words, write sequence of words
  - Output vocabulary can be different from input vocabulary
- Transducer used in current implementation
  - Tree transducer, i.e. prefix tree over input strings
  - Output from final states
  - Used to encode lexicon, phrase translations, bilingual word classes and grammars
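A prefix tree with outputs attached to final states might look like the following minimal sketch. The class name `TreeTransducer`, its methods, and the entries are hypothetical and only illustrate the data structure, not the actual implementation.

```python
class TreeTransducer:
    """Prefix tree over input word sequences; outputs are stored at final states."""

    def __init__(self):
        self.root = {"next": {}, "out": []}

    def add(self, source_words, target_words):
        node = self.root
        for w in source_words:
            node = node["next"].setdefault(w, {"next": {}, "out": []})
        node["out"].append(target_words)

    def matches(self, words, start):
        """Yield (end, output) for every stored entry that matches words[start:end]."""
        node = self.root
        for end in range(start, len(words)):
            node = node["next"].get(words[end])
            if node is None:
                return
            for out in node["out"]:
                yield end + 1, out

lexicon = TreeTransducer()
lexicon.add(["guten"], ["good"])
lexicon.add(["guten", "morgen"], ["good", "morning"])
lexicon.add(["morgen"], ["tomorrow"])
found = list(lexicon.matches("guten morgen".split(), 0))
```

A single walk down the tree returns both the one-word lexicon entry and the longer phrase translation, which is why the same structure can hold a lexicon and phrase translations together.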
11 Cascaded Transducers
- Generalization through cascaded transducers
- Replace words by category labels and have a transducer for each category
(Vogel and Ney 2000)
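The idea of replacing words by category labels can be sketched as a two-level lookup. The category `@DAY`, the dictionaries, and the function name below are invented for illustration, and the sketch assumes that labels appear in the same order on both sides.

```python
# Hypothetical category transducer: German weekday names to English.
WEEKDAY = {"montag": "monday", "dienstag": "tuesday"}
# Hypothetical top-level transducer whose entries contain the category label.
TOP_LEVEL = {("am", "@DAY"): ["on", "@DAY"]}

def translate_with_categories(words):
    # First level: replace category members by the label, keep their translations.
    labels, fillers = [], []
    for w in words:
        if w in WEEKDAY:
            labels.append("@DAY")
            fillers.append(WEEKDAY[w])
        else:
            labels.append(w)
    # Second level: translate the labelled sequence, then fill the labels back in.
    output = TOP_LEVEL.get(tuple(labels))
    if output is None:
        return None
    fill = iter(fillers)
    return [next(fill) if tok == "@DAY" else tok for tok in output]

monday = translate_with_categories(["am", "montag"])
tuesday = translate_with_categories(["am", "dienstag"])
```

One top-level entry now covers every weekday, which is the generalization effect named on the slide.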
12 Language Model
- Standard n-gram model
- $p(w_1 \ldots w_n) = \prod_i p(w_i \mid w_1 \ldots w_{i-1})$
- $\approx \prod_i p(w_i \mid w_{i-2}\, w_{i-1})$ (trigram)
- $\approx \prod_i p(w_i \mid w_{i-1})$ (bigram)
- Many events not seen -> smoothing required
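A minimal bigram model shows why smoothing is needed. The sketch uses add-one smoothing as the simplest stand-in; the slides do not say which smoothing method was used, and the function names and training sentences are made up.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect history and bigram counts; returns (unigrams, bigrams, vocabulary size)."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])            # counts of each word as a history
        bigrams.update(zip(words, words[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)

def logprob(sentence, model):
    """Sum of log p(w_i | w_{i-1}) with add-one smoothing."""
    unigrams, bigrams, v = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(h, w)] + 1) / (unigrams[h] + v))
               for h, w in zip(words, words[1:]))

model = train_bigram(["i want a room", "i want a single room"])
seen = logprob("i want a room", model)
unseen = logprob("room a want i", model)
```

Without the added count, every bigram in the second sentence would have probability zero and the log score would be undefined.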
13 Decoding Strategies
- Sequential construction of target sentence
  - Extend partial translation by words which are translations of words in the source sentence
  - Language model can be applied immediately
  - Mechanism to ensure proper coverage of source sentence required
- Left to right over source sentence
  - Find translations for sequences of words
  - Construct translation lattice
  - Apply language model and select best path
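The second strategy, a left-to-right pass over a translation lattice, can be sketched as a small dynamic program. The edge format, the toy scores, and the names `best_path` and `toy_lm` are assumptions; each edge must end after it starts, and the lattice must cover the whole sentence.

```python
import math

def best_path(n, edges, lm_logprob):
    """edges: (start, end, target_words, tm_logprob) over source positions 0..n.

    lm_logprob(prev_word, word) is a bigram score. The search state is
    (source position, last target word), so the bigram model is applied exactly.
    Returns (score, target_words) of the best full path.
    """
    chart = {(0, "<s>"): (0.0, [])}
    for pos in range(n):
        for (p, last), (score, words) in list(chart.items()):
            if p != pos:
                continue
            for start, end, target, tm in edges:
                if start != pos:
                    continue
                s, prev = score + tm, last
                for w in target:
                    s += lm_logprob(prev, w)
                    prev = w
                key = (end, prev)
                if key not in chart or s > chart[key][0]:
                    chart[key] = (s, words + target)
    finals = [v for (p, _), v in chart.items() if p == n]
    return max(finals, key=lambda v: v[0])

def toy_lm(prev, word):
    # Illustrative bigram scores only.
    return math.log(0.5) if (prev, word) == ("good", "morning") else math.log(0.05)

# Lattice for the source sentence "guten morgen" (two source positions).
edges = [(0, 1, ["good"], math.log(0.9)),
         (1, 2, ["tomorrow"], math.log(0.6)),
         (1, 2, ["morning"], math.log(0.4)),
         (0, 2, ["good", "morning"], math.log(0.5))]
score, words = best_path(2, edges, toy_lm)
```

Although "tomorrow" has the higher lexicon score for "morgen", the language model prefers "good morning", and the phrase edge gives the best path overall.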
14 Translation Graph
15 Speech Recognition and Translation
- Search best string in target language for given acoustic signal in source language
- $\hat{e} = \arg\max_e p(e)\, p(x \mid e)$
- $= \arg\max_e p(e) \sum_f p(f, x \mid e)$
- $= \arg\max_e p(e) \sum_f p(f \mid e)\, p(x \mid f, e)$
- $\approx \arg\max_e p(e) \sum_f p(f \mid e)\, p(x \mid f)$
- i.e. recognizer language model not needed!?
(Ney 2001)
16 Coupling Recognition and Translation
- Sequential: first recognition, then translation
  - First-best recognition hypothesis
  - N-best list: translate n times
  - Word lattice: translate all paths in lattice, reuse results from partial paths
- Integrated: recognition and translation in combined search
  - Subsequential transducer approach uses this
- Note: in the EuTrans project, best results were obtained when translating the first-best hypothesis
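The N-best variant of sequential coupling can be sketched as follows. The `translate` argument stands for any translation system that returns a target sentence with a log score; adding the recognizer and translation log scores is an assumed combination rule, and the table values are invented.

```python
def translate_nbest(nbest, translate):
    """nbest: list of (source_sentence, recognizer_logprob).

    translate: hypothetical function returning (target_sentence, translation_logprob).
    Translates each hypothesis once and returns the target with the best combined score.
    """
    scored = []
    for source, rec_logprob in nbest:
        target, trans_logprob = translate(source)
        scored.append((rec_logprob + trans_logprob, target))
    return max(scored)[1]

# Toy stand-in for a translation system (illustrative values only).
toy_table = {"guten morgen": ("good morning", -1.0),
             "guten morgan": ("good morgan", -6.0)}
nbest = [("guten morgan", -2.0), ("guten morgen", -2.5)]
best = translate_nbest(nbest, toy_table.__getitem__)
```

In this toy case the second recognition hypothesis wins because its translation scores much better, which is the effect N-best coupling is meant to exploit.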
17 Example-Based Machine Translation
- Re-use translations to create new translations
- Store bilingual corpus with (partial) alignment
- Find partial matches, i.e. sequences of words in stored corpus to cover a new sentence
- Extract translation(s) and build translation lattice
- Apply language model to find best path, i.e. best translation
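The matching step can be sketched with a greedy longest-match cover of the new sentence. This is a simplification: it looks only at the source side of the stored corpus and keeps one match per position, whereas the approach on the slide collects matches into a lattice. The function name and sentences are made up.

```python
def cover_with_matches(sentence, corpus_sentences):
    """Greedy left-to-right cover using the longest word sequences found in the corpus.

    Returns a list of (word sequence, found_in_corpus) pairs.
    """
    words = sentence.split()
    stored = [s.split() for s in corpus_sentences]

    def occurs(seq):
        n = len(seq)
        return any(s[i:i + n] == seq
                   for s in stored for i in range(len(s) - n + 1))

    cover, pos = [], 0
    while pos < len(words):
        end = len(words)
        # Shrink the candidate span until it is found or only one word is left.
        while end > pos + 1 and not occurs(words[pos:end]):
            end -= 1
        cover.append((" ".join(words[pos:end]), occurs(words[pos:end])))
        pos = end
    return cover

stored_corpus = ["i would like a room", "a room with a view please"]
cover = cover_with_matches("i would like a room with a balcony", stored_corpus)
```

Each matched sequence would then be looked up through the stored alignment to extract its translation; the unmatched word "balcony" would need another knowledge source.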
18 Nespole Experiments
- Application of direct translation techniques to dialogue data collected in Nespole!
- Testing the effect of phrase translation
- Experiments with additional knowledge sources
  - Preexisting monolingual data for the LM and publicly available lexica
  - Engineered handwritten rules for fixed expressions and knowledge extracted from semantic grammars
19 Nespole Project Data
- CMU database of dialogues in the travel domain
- German, English (Italian, French)
- Speech recognizer hypotheses and human transcriptions both available
- Segmented into SDUs (Speech Dialogue Units)
20 Nespole Corpus: Training
3182 parallel SDUs
Language     English   German
Tokens       15572     14992
Vocabulary   1032      1338
Singletons   404       620
21 Nespole Corpus: Testing
70 parallel SDUs
             German         Reference A   Reference B
Tokens       437            610           607
Vocabulary   183 (45 OOV)   165           160
22 Corpus Challenges: Sentence Length
[Figure panels: Training Data, Testing Data]
23 Evaluation
- Human Scoring
  - Good, Okay, Bad (cf. Nespole evaluation)
  - Collapsed into a human score on [0,1]
- Bleu Score
  - Average of n-gram precisions for n = 1..N, typically N = 3 or 4
  - Penalty for short translations to substitute for recall measure
(Papineni et al. 2001)
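A sentence-level sketch of the Bleu computation follows. It assumes the standard formulation (geometric mean of clipped n-gram precisions, brevity penalty against the closest reference length); the original metric is computed over a whole test set, and this simplified version returns 0 as soon as one n-gram order has no match.

```python
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hypothesis, references, max_n=4):
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for r in refs:
            for gram, c in ngrams(r, n).items():
                max_ref[gram] = max(max_ref[gram], c)
        matched = sum(min(c, max_ref[gram]) for gram, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matched == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty against the reference length closest to the hypothesis length.
    ref_len = min((len(r) for r in refs), key=lambda l: (abs(l - len(hyp)), l))
    bp = 1.0 if len(hyp) >= ref_len else math.exp(1 - ref_len / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

exact = bleu("i want a room", ["i want a room"], max_n=3)
short = bleu("a room", ["i want a room"], max_n=2)
```

The second call shows the role of the length penalty: every n-gram of "a room" is correct, yet the score drops because the output is only half as long as the reference.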
24 Phrase Translation
- Unequal sentence lengths mean that training can be improved directionally (S -> T or T -> S)
- German compounds are better for 1-to-many alignments with English multiword phrases, so direction is important
Statistical lexicon alone: 0.1903
Statistical lexicon, phrases from S -> T training: 0.2350
Statistical lexicon, phrases from bidir. training: 0.2654
25 Language Model
- Monolingual text available from Verbmobil
- 500,000 words (32x the size of orig. English corpus)
- Helps to choose among translation hypotheses but will not generate new ones
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and small LM: 0.2613
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and large LM: 0.3172
26 General-Purpose Lexicon
Statistical lexicon, phrases, and fixed exps with small LM: 0.2654
Adding general-purpose lexicon as a transducer: 0.2522
Using large instead of small LM: 0.3141
General-purpose lexicon as training data instead of separate transducer: 0.3275
27 Fixed Expression Rules
- Transducer rules are human readable and can be added by hand
- Fixed expressions for times and dates are re-usable, require less time to build than domain-specific rules and improve coverage of some semi-idiomatic constructions
Statistical lexicon with small LM: 0.1893
Statistical lexicon and fixed-expression transducer with small LM: 0.1903
28 Knowledge from Existing Grammars
- Could help in domain portability but not language portability
- Benefit mostly in additional vocabulary
Statistical lexicon, fixed exps, phrases, and general lexicon with large LM: 0.3141
Statistical lexicon, fixed exps, phrases, general lexicon and I-transducer with large LM: 0.3172
29 Comparative Evaluation Results
Input   System  Good  Okay  Bad  Score  Bleu
Text    IF        77   104  227   0.32  0.068
Text    SMT      127    80  205   0.40  0.333
Speech  IF        64   101  243   0.28  0.059
Speech  SMT       95    83  227   0.34  0.262
30 Selected References
- Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), pp. 263-311, 1993.
- Stephan Vogel, Hermann Ney, Christoph Tillmann. HMM-Based Word Alignment in Statistical Translation. Int. Conf. on Computational Linguistics, Copenhagen, Denmark, pp. 836-841, August 1996.
- Stephan Vogel, Hermann Ney. Translation with Cascaded Finite State Transducers. 36th Annual Conference of the Association for Computational Linguistics, pp. 23-30, Hong Kong, China, October 2000.
- Stephan Vogel, Alicia Tribble. Improving Statistical Machine Translation for a Speech-to-Speech Translation Task. To appear in ICSLP 2002.
- H. Ney. The Statistical Approach to Spoken Language Translation. Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio, Trento, Italy, 8 pages, CD ROM, IEEE Catalog No. 01EX544, December 2001.
- Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. IBM Research Report RC22176 (W0109-022), September 17, 2001.