Title: Direct Translation Approaches: Statistical Machine Translation

1
Direct Translation Approaches: Statistical Machine Translation

Stephan Vogel, Alicia Tribble
Interactive Systems Lab
Carnegie Mellon University
University Karlsruhe

Speech-to-Speech Translation Workshop, ESSLLI 2002, Trento, Italy
2
Overview

Translation Approaches
Statistical Machine Translation
Translating with Cascaded Transducers
Experiments on Nespole Data

3
Translation Approaches

Interlingua based
Transfer based
Direct
Example based
Statistical

4
Statistical Machine Translation
Based on Bayes decision rule:
$$\hat{e} = \arg\max_e p(e \mid f) = \arg\max_e p(e)\, p(f \mid e)$$
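
As a minimal sketch of this decision rule, the snippet below scores a handful of candidate translations and picks the one with the highest product of language model and translation model probability. The candidate strings and all probability values are invented for illustration only.

```python
import math

# Hypothetical toy scores for one source sentence f ("das Haus").
# P_LM[e] stands in for p(e), P_TM[e] for p(f | e); all values are made up.
P_LM = {"the house": 0.20, "house the": 0.01, "the home": 0.10}
P_TM = {"the house": 0.30, "house the": 0.30, "the home": 0.25}

def decide(candidates):
    """Return the candidate e that maximizes log p(e) + log p(f | e)."""
    return max(candidates, key=lambda e: math.log(P_LM[e]) + math.log(P_TM[e]))

best = decide(list(P_LM))
print(best)
```

The language model is what separates "the house" from "house the" here: both get the same translation model score in this toy setup.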
5
Tasks in SMT

Modelling: build statistical models which capture characteristic features of translation equivalences and of the target language
Training: train translation model on bilingual corpus, train language model on monolingual corpus
Decoding: find best translation for new sentences according to models

6
Alignment Example
7
Translation Models

IBM1: lexical probabilities only
IBM2: lexicon plus absolute position
HMM: lexicon plus relative position
IBM3: plus fertilities
IBM4: inverted relative position alignment
IBM5: non-deficient version of model 4

Brown et al. 1993; Vogel et al. 1996
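
To make the simplest of these models concrete, here is a rough sketch of EM training for IBM1 lexical probabilities t(f | e). It leaves out the NULL word for brevity, and the three-sentence corpus is a made-up toy example, not Nespole data.

```python
from collections import defaultdict

def train_ibm1(bitext, iterations=10):
    """EM estimate of t(f | e) for IBM Model 1 (simplified: no NULL word)."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm  # posterior that f aligns to e
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Hypothetical toy corpus: (source words f, target words e)
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(bitext)
print(t[("das", "the")], t[("Haus", "the")])
```

Even without any position information, co-occurrence alone pushes probability mass toward the pairs das/the and Buch/book.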
8
HMM Alignment Model
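$$p(f \mid e) = \sum_{a} p(f_1^J, a_1^J \mid e_1^I) = \sum_{a} \prod_{j} p(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I)$$
$$\approx \sum_{a} \prod_{j} p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j}) \approx \max_{a} \prod_{j} p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j})$$
Alignment $a_j$ of current word $f_j$ depends on alignment $a_{j-1}$ of previous word $f_{j-1}$.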
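
The maximum over alignments can be found with the usual Viterbi recursion. The sketch below assumes a lexicon table `lex` and a jump distribution `jump` that depends only on the distance between consecutive alignment positions. Both are hypothetical toy inputs, and the jump probabilities are not renormalized over valid positions.

```python
import math

def log_p(x):
    return math.log(x) if x > 0 else float("-inf")

def viterbi_alignment(f_words, e_words, lex, jump, floor=1e-6):
    """Best alignment a_1..a_J under p(a_j | a_{j-1}) * p(f_j | e_{a_j})."""
    I, J = len(e_words), len(f_words)
    # delta[j][i]: best log score for f_1..f_j with f_j aligned to e_i
    delta = [[float("-inf")] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):
        delta[0][i] = log_p(1.0 / I) + log_p(lex.get((f_words[0], e_words[i]), floor))
    for j in range(1, J):
        for i in range(I):
            emit = log_p(lex.get((f_words[j], e_words[i]), floor))
            k = max(range(I), key=lambda p: delta[j - 1][p] + log_p(jump(i - p)))
            delta[j][i] = delta[j - 1][k] + log_p(jump(i - k)) + emit
            back[j][i] = k
    # Trace back from the best final position
    path = [max(range(I), key=lambda i: delta[J - 1][i])]
    for j in range(J - 1, 0, -1):
        path.append(back[j][path[-1]])
    return path[::-1]

# Hypothetical toy models
lex = {("das", "the"): 0.9, ("Haus", "house"): 0.9,
       ("ist", "is"): 0.9, ("klein", "small"): 0.9}

def jump(d):
    return {1: 0.6, 0: 0.2, -1: 0.1}.get(d, 0.05)

alignment = viterbi_alignment(["das", "Haus", "ist", "klein"],
                              ["the", "house", "is", "small"], lex, jump)
print(alignment)
```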
9
Phrase Translation

Why?
To capture context
Local word reordering
How?
Train alignment model
Extract phrase-to-phrase translations from
Viterbi path
Notes
Often better results when training target to
source for extraction of phrase translations
Phrases are not fully integrated into alignment
model, they are extracted only after training is
completed
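
One way to read "extract phrase-to-phrase translations from Viterbi path" is the alignment-consistency heuristic sketched below: a source span and a target span form a phrase pair if no alignment link leaves the pair. This is a commonly used heuristic and not necessarily the exact procedure behind these slides. The sentence pair and its alignment are invented.

```python
def extract_phrases(f_words, e_words, links, max_len=3):
    """links: set of (j, i) pairs linking source position j to target position i."""
    pairs = set()
    J = len(f_words)
    for j1 in range(J):
        for j2 in range(j1, min(j1 + max_len, J)):
            # Target positions linked to the source span [j1, j2]
            tgt = [i for (j, i) in links if j1 <= j <= j2]
            if not tgt:
                continue
            i1, i2 = min(tgt), max(tgt)
            if i2 - i1 + 1 > max_len:
                continue
            # Consistency: every link into the target span starts inside the source span
            if all(j1 <= j <= j2 for (j, i) in links if i1 <= i <= i2):
                pairs.add((" ".join(f_words[j1:j2 + 1]),
                           " ".join(e_words[i1:i2 + 1])))
    return pairs

# Hypothetical example with local reordering of the verb
f = ["ich", "habe", "ein", "Zimmer", "reserviert"]
e = ["I", "have", "reserved", "a", "room"]
links = {(0, 0), (1, 1), (2, 3), (3, 4), (4, 2)}
phrases = extract_phrases(f, e, links)
print(sorted(phrases))
```

The pair "ein Zimmer reserviert" / "reserved a room" captures the local reordering as a single unit, which is the motivation given on the slide.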

10
Translation with Transducers

Transducer
Finite state machine
Read sequence of words, write sequence of words
Output vocabulary can be different from input vocabulary
Transducer used in current implementation
Tree Transducer, i.e. prefix tree over input strings
Output from final states
Used to encode lexicon, phrase translations, bilingual word classes and grammars
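
A prefix-tree transducer of this kind can be sketched in a few lines. The class name, method names and lexicon entries below are hypothetical and are not taken from the authors' implementation.

```python
class TreeTransducer:
    """Prefix tree over input word sequences; outputs are attached to final states."""

    def __init__(self):
        self.root = {"next": {}, "out": []}

    def add(self, source_words, target_words):
        node = self.root
        for w in source_words:
            node = node["next"].setdefault(w, {"next": {}, "out": []})
        node["out"].append(target_words)

    def matches(self, words, start=0):
        """Return (end, output) for every stored entry that matches words[start:end]."""
        found = []
        node = self.root
        for end in range(start, len(words)):
            node = node["next"].get(words[end])
            if node is None:
                break
            found.extend((end + 1, out) for out in node["out"])
        return found

# Hypothetical lexicon and phrase entries
td = TreeTransducer()
td.add(["guten"], ["good"])
td.add(["Tag"], ["day"])
td.add(["guten", "Tag"], ["hello"])
print(td.matches(["guten", "Tag"]))
```

A single pass over the input finds both the word entry and the longer phrase entry, because they share the prefix "guten" in the tree.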

11
Cascaded Transducers

Generalization through cascaded transducers
Replace words by category labels and have a
transducer for each category

Vogel, Ney 2000
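
The idea of generalizing through category labels can be illustrated with a two-level toy cascade: a first transducer maps words to a category label and remembers their translation, a second one translates patterns over words and labels. The `@DAY` label, the weekday lexicon and the pattern are all invented for this sketch.

```python
# Hypothetical category transducer: German weekday names -> English
DAY = {"montag": "Monday", "dienstag": "Tuesday"}

# Hypothetical top-level transducer over words and category labels
PATTERNS = {("am", "@DAY"): ["on", "@DAY"]}

def translate(words):
    labels, fillers = [], []
    for w in words:
        key = w.lower()
        if key in DAY:
            labels.append("@DAY")      # replace the word by its category label
            fillers.append(DAY[key])   # keep the category-level translation
        else:
            labels.append(key)
    out = PATTERNS.get(tuple(labels))
    if out is None:
        return None
    it = iter(fillers)
    # Re-insert the category translations in order of appearance
    return [next(it) if tok == "@DAY" else tok for tok in out]

print(translate(["am", "Montag"]))
print(translate(["am", "Dienstag"]))
```

One pattern covers every weekday, so adding a new day only requires a new entry in the category transducer.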
12
Language Model

Standard n-gram model
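$$p(w_1 \ldots w_n) = \prod_i p(w_i \mid w_1 \ldots w_{i-1})$$
$$\approx \prod_i p(w_i \mid w_{i-2}\, w_{i-1}) \quad \text{(trigram)}$$
$$\approx \prod_i p(w_i \mid w_{i-1}) \quad \text{(bigram)}$$
Many events not seen -> smoothing required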
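
A small bigram model shows why smoothing is needed. The slides do not say which smoothing method was used; add-one smoothing is chosen here only because it is the shortest to write down. The two training sentences are invented.

```python
from collections import Counter

def train_bigram(sentences):
    """Return prob(word, prev): add-one smoothed bigram probability p(word | prev)."""
    history, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        history.update(toks[:-1])                   # counts of each word as history
        bigrams.update(zip(toks[:-1], toks[1:]))
    vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(word, prev):
        return (bigrams[(prev, word)] + 1) / (history[prev] + len(vocab))

    return prob, vocab

prob, vocab = train_bigram([["i", "want", "a", "room"],
                            ["i", "want", "a", "flight"]])
print(prob("want", "i"), prob("room", "i"))
```

The bigram "i room" never occurs in the training data, yet it still receives a small non-zero probability.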

13
Decoding Strategies

Sequential construction of target sentence
Extend partial translation by words which are
translations of words in the source sentence
Language model can be applied immediately
Mechanism to ensure proper coverage of source
sentence required
Left to right over source sentence
Find translations for sequences of words
Construct translation lattice
Apply language model and select best path
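
The second strategy (left to right over the source, build a lattice, pick the best path with the language model) can be sketched as a small dynamic program. The search is monotone, i.e. without reordering across phrase boundaries. The phrase table, the bigram scores and the fallback probability 0.01 are hypothetical.

```python
import math

def decode(source, phrase_table, lm_logprob, max_len=3):
    """Best path through the translation lattice.
    Search state: (number of source words covered, last target word)."""
    best = {(0, "<s>"): (0.0, [])}
    for pos in range(len(source)):
        # Snapshot: all states at `pos` are final, new states lie further right
        for (p, last), (score, out) in list(best.items()):
            if p != pos:
                continue
            for end in range(pos + 1, min(pos + max_len, len(source)) + 1):
                for target, tm_prob in phrase_table.get(tuple(source[pos:end]), []):
                    s, prev = score + math.log(tm_prob), last
                    for w in target:            # apply the LM word by word
                        s += lm_logprob(w, prev)
                        prev = w
                    if (end, prev) not in best or s > best[(end, prev)][0]:
                        best[(end, prev)] = (s, out + target)
    finals = [v for (p, _), v in best.items() if p == len(source)]
    return max(finals, key=lambda v: v[0])[1] if finals else None

# Hypothetical toy models
phrase_table = {
    ("guten",): [(["good"], 0.9)],
    ("Tag",): [(["day"], 0.9)],
    ("guten", "Tag"): [(["hello"], 0.5)],
}
BIGRAMS = {("<s>", "hello"): 0.5, ("<s>", "good"): 0.2, ("good", "day"): 0.1}

def lm_logprob(word, prev):
    return math.log(BIGRAMS.get((prev, word), 0.01))

result = decode(["guten", "Tag"], phrase_table, lm_logprob)
print(result)
```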

14
Translation Graph
15
Speech Recognition and Translation

Search best string in target language for given acoustic signal in source language
$$\hat{e} = \arg\max_e p(e)\, p(x \mid e)$$
$$= \arg\max_e p(e) \sum_f p(f, x \mid e)$$
$$= \arg\max_e p(e) \sum_f p(f \mid e)\, p(x \mid f, e)$$
$$\approx \arg\max_e p(e) \sum_f p(f \mid e)\, p(x \mid f)$$
i.e. recognizer language model p(f) not needed!?

Ney, 2001
16
Coupling Recognition and Translation

Sequential: first recognition, then translation
First-best recognition hypothesis
N-best list: translate n times
Word lattice: translate all paths in lattice, reuse results from partial paths
Integrated: recognition and translation in combined search
Subsequential transducer approach uses this
Note: in the Eutrans project, best results were obtained when translating the first-best hypothesis

17
Example-Based Machine Translation

Re-use translations to create new translations
Store bilingual corpus with (partial) alignment
Find partial matches, i.e. sequences of words in
stored corpus to cover a new sentence
Extract translation(s) and build translation
lattice
Apply language model to find best path, i.e. best
translation

18
Nespole Experiments

Application of direct translation techniques to
dialogue data collected in Nespole!
Testing the effect of phrase translation
Experiments with additional knowledge sources
Preexisting monolingual data for the LM and publicly available lexica
Engineered handwritten rules for fixed
expressions and knowledge extracted from semantic
grammars

19
Nespole Project Data

CMU database of dialogues in the travel domain
German, English (Italian, French)
Speech recognizer hypotheses and human
transcriptions both available
Segmented into SDUs (Speech Dialogue Units)

20
Nespole Corpus: Training
3182 parallel SDUs
Language     English   German
Tokens       15572     14992
Vocabulary   1032      1338
Singletons   404       620
21
Nespole Corpus: Testing
70 parallel SDUs
             German         Reference A   Reference B
Tokens       437            610           607
Vocabulary   183 (45 OOV)   165           160
22
Corpus Challenges: Sentence Length
[Figure panels: Training Data, Testing Data]
23
Evaluation

Human Scoring
Good, Okay, Bad (cf. Nespole evaluation)
Collapsed into a human score on [0,1]
Bleu Score
Average of n-gram precisions for n = 1..N, typically N = 3 or 4
Penalty for short translations to substitute for recall measure

Papineni et al. 2001
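
For orientation, here is a simplified sentence-level sketch of the Bleu computation: clipped n-gram precisions, combined by a geometric mean, times a brevity penalty. Bleu is normally accumulated over a whole test set, so this is an illustration rather than a replacement for an evaluation script. The example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, refs, max_n=4):
    """Sentence-level Bleu sketch for one hypothesis and a list of references."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h = ngrams(hyp, n)
        # Clip each n-gram count by its maximum count in any single reference
        max_ref = Counter()
        for r in refs:
            for g, c in ngrams(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        matched = sum(min(c, max_ref[g]) for g, c in h.items())
        total = sum(h.values())
        if matched == 0 or total == 0:
            return 0.0
        log_prec += math.log(matched / total) / max_n  # geometric mean
    # Brevity penalty against the reference length closest to the hypothesis length
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    bp = 1.0 if len(hyp) >= ref_len else math.exp(1 - ref_len / len(hyp))
    return bp * math.exp(log_prec)

ref = "i would like a room".split()
print(bleu(ref, [ref]))
print(bleu(["a", "room"], [ref], max_n=2))
```

The second call shows the penalty at work: every n-gram of "a room" is found in the reference, but the hypothesis is too short and is scored down accordingly.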
24
Phrase Translation

Unequal sentence lengths mean that training can be improved directionally: S -> T or T -> S
German compounds are better for 1-to-many alignments with English multiword phrases, so direction is important

Configuration                                           Bleu
Statistical lexicon alone                               0,1903
Statistical lexicon, phrases from S -> T training       0,2350
Statistical lexicon, phrases from bidir. training       0,2654
25
Language Model

Monolingual text available from Verbmobil
500.000 words (32x the size of orig. English
corpus)
Helps to choose among translation hypotheses but
will not generate new ones

Configuration                                                                  Bleu
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and small LM    0,2613
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and large LM    0,3172
26
General-Purpose Lexicon
Configuration                                                               Bleu
Statistical lexicon, phrases, and fixed exps with small LM                  0,2654
Adding general-purpose lexicon as a transducer                              0,2522
Using large instead of small LM                                             0,3141
General-purpose lexicon as training data instead of separate transducer     0,3275
27
Fixed Expression Rules

Transducer rules are human readable and can be
added by hand
Fixed expressions for times and dates are
re-usable, require less time to build than
domain-specific rules and improve coverage of
some semi-idiomatic constructions.

Configuration                                                        Bleu
Statistical lexicon with small LM                                    0,1893
Statistical lexicon and fixed-expression transducer with small LM    0,1903
28
Knowledge from Existing Grammars

Could help with domain portability but not with language portability
Benefit mostly in additional vocabulary

Configuration                                                                              Bleu
Statistical lexicon, fixed exps, phrases, and general lexicon with large LM               0,3141
Statistical lexicon, fixed exps, phrases, general lexicon and I-transducer with large LM  0,3172
29
Comparative Evaluation Results
               Good   Okay   Bad   Score   Bleu
Text     IF    77     104    227   0,32    0,068
         SMT   127    80     205   0,40    0,333
Speech   IF    64     101    243   0,28    0,059
         SMT   95     83     227   0,34    0,262
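
The slides do not say how Good/Okay/Bad judgments are collapsed into the Score column. One weighting that comes close is Good = 1, Okay = 0.5, Bad = 0, averaged over all judgments. The sketch below checks this assumed weighting against the four rows of the table. It is an inference from the numbers, not a documented formula.

```python
def human_score(good, okay, bad):
    # Assumed weights (not stated in the slides): Good = 1, Okay = 0.5, Bad = 0
    return (good + 0.5 * okay) / (good + okay + bad)

# (Good, Okay, Bad, reported Score) copied from the table
rows = {
    "Text IF": (77, 104, 227, 0.32),
    "Text SMT": (127, 80, 205, 0.40),
    "Speech IF": (64, 101, 243, 0.28),
    "Speech SMT": (95, 83, 227, 0.34),
}
diffs = {name: human_score(g, o, b) - reported
         for name, (g, o, b, reported) in rows.items()}
for name, d in diffs.items():
    print(name, round(d, 4))
```

Under this assumption every row agrees with the reported score to within 0.01.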
30
Selected References
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), pp. 263-311, 1993.
Stephan Vogel, Hermann Ney, Christoph Tillmann. HMM-Based Word Alignment in Statistical Translation. Int. Conf. on Computational Linguistics, Copenhagen, Denmark, pp. 836-841, August 1996.
Stephan Vogel, Hermann Ney. Translation with Cascaded Finite State Transducers. 36th Annual Conference of the Association for Computational Linguistics, pp. 23-30, Hong Kong, China, October 2000.
Stephan Vogel, Alicia Tribble. Improving statistical machine translation for a speech-to-speech translation task. To appear in ICSLP 2002.
H. Ney. The Statistical Approach to Spoken Language Translation. Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio, Trento, Italy, 8 pages, CD ROM, IEEE Catalog No. 01EX544, December 2001.
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. IBM Research Report RC22176(W0109-022), September 17, 2001.
