Direct Translation Approaches: Statistical Machine Translation

1
Direct Translation Approaches: Statistical Machine Translation
  • Stephan Vogel, Alicia Tribble
  • Interactive Systems Lab
  • Carnegie Mellon University
  • University Karlsruhe

Speech-to-Speech Translation Workshop, ESSLLI 2002, Trento, Italy
2
Overview
  • Translation Approaches
  • Statistical Machine Translation
  • Translating with Cascaded Transducers
  • Experiments on Nespole Data

3
Translation Approaches
  • Interlingua based
  • Transfer based
  • Direct
  • Example based
  • Statistical

4
Statistical Machine Translation
Based on Bayes' decision rule:
$$\hat{e} = \arg\max_e p(e \mid f) = \arg\max_e p(e)\, p(f \mid e)$$
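
The decision rule can be sketched in a few lines of Python. This is only an illustration: the two probability tables are invented toy values, and the function name `decode` is hypothetical. It picks the candidate e that maximises log p(e) + log p(f | e).

```python
import math

# Toy probabilities, invented purely for illustration.
LM = {"the house": 0.6, "house the": 0.1}             # language model p(e)
TM = {("das Haus", "the house"): 0.5,                 # translation model p(f | e)
      ("das Haus", "house the"): 0.5}

def decode(f, candidates):
    """Return the candidate e maximising log p(e) + log p(f | e)."""
    def score(e):
        return math.log(LM[e]) + math.log(TM[(f, e)])
    return max(candidates, key=score)

best = decode("das Haus", ["the house", "house the"])
```

Both candidates are equally good under the toy translation model, so the language model decides in favour of the fluent word order.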
5
Tasks in SMT
  • Modelling: build statistical models which capture
    characteristic features of translation
    equivalences and of the target language
  • Training: train translation model on bilingual
    corpus, train language model on monolingual
    corpus
  • Decoding: find best translation for new sentences
    according to models

6
Alignment Example
7
Translation Models
  • IBM1: lexical probabilities only
  • IBM2: lexicon plus absolute position
  • HMM: lexicon plus relative position
  • IBM3: plus fertilities
  • IBM4: inverted relative position alignment
  • IBM5: non-deficient version of Model 4

Brown et al. 1993; Vogel et al. 1996
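
As an illustration of the simplest model in this list, the following is a minimal sketch of EM training for IBM Model 1 lexical probabilities. It assumes a tiny invented corpus, omits the NULL word for brevity, and the function name `train_ibm1` is hypothetical.

```python
from collections import defaultdict

def train_ibm1(bitext, iterations=10):
    """bitext: list of (source_words, target_words). Returns t[(f, e)] = p(f | e)."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))    # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)                 # expected counts c(f, e)
        total = defaultdict(float)                 # expected counts c(e)
        for fs, es in bitext:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm           # posterior that f aligns to e
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():            # M-step: renormalise per target word
            t[(f, e)] = c / total[e]
    return t

bitext = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split()),
          ("ein Buch".split(), "a book".split())]
t = train_ibm1(bitext)
```

On this toy corpus the co-occurrence pattern alone is enough for "Haus" to become the preferred translation of "house", even though "das" appears in the same sentence pair.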
8
HMM Alignment Model
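$$p(f \mid e) = \sum_a p(f_1^J, a_1^J \mid e_1^I) = \sum_a \prod_j p(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I)$$
$$\approx \sum_a \prod_j p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j}) \approx \max_a \prod_j p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j})$$
The alignment $a_j$ of the current word $f_j$ depends on the
alignment $a_{j-1}$ of the previous word $f_{j-1}$.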
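
The maximum approximation in the last step can be computed with the Viterbi algorithm. The sketch below is an assumption-laden illustration: the lexicon values, the unnormalised jump weights in `toy_jump`, the uniform start distribution and the floor value 1e-6 for unseen word pairs are all invented for the example.

```python
import math

def viterbi_alignment(f_words, e_words, lex, jump):
    """Return the best alignment a_1..a_J as 0-based target positions.

    lex[(f, e)] is p(f | e); jump(d) is p(a_j | a_{j-1}) for d = a_j - a_{j-1}.
    """
    J, I = len(f_words), len(e_words)
    # delta[j][i]: best log-probability of f_1..f_j with a_j = i
    delta = [[-math.inf] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):
        emit = math.log(lex.get((f_words[0], e_words[i]), 1e-6))
        delta[0][i] = math.log(1.0 / I) + emit
    for j in range(1, J):
        for i in range(I):
            emit = math.log(lex.get((f_words[j], e_words[i]), 1e-6))
            prev = max(range(I), key=lambda k: delta[j - 1][k] + math.log(jump(i - k)))
            delta[j][i] = delta[j - 1][prev] + math.log(jump(i - prev)) + emit
            back[j][i] = prev
    # Trace back from the best final state.
    a = [max(range(I), key=lambda i: delta[J - 1][i])]
    for j in range(J - 1, 0, -1):
        a.append(back[j][a[-1]])
    return a[::-1]

def toy_jump(d):
    # Invented, unnormalised weights that favour moving one position forward.
    return {0: 0.2, 1: 0.5}.get(d, 0.1)

lex = {("das", "the"): 0.9, ("Haus", "house"): 0.9,
       ("ist", "is"): 0.9, ("klein", "small"): 0.9}
alignment = viterbi_alignment("das Haus ist klein".split(),
                              "the house is small".split(), lex, toy_jump)
```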
9
Phrase Translation
  • Why?
      • To capture context
      • Local word reordering
  • How?
      • Train alignment model
      • Extract phrase-to-phrase translations from
        Viterbi path
  • Notes
      • Often better results when training target to
        source for extraction of phrase translations
      • Phrases are not fully integrated into alignment
        model, they are extracted only after training is
        completed
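
Extraction from a Viterbi path could look like the sketch below. It uses one common consistency heuristic (a source span is kept only if no word outside it aligns into the covered target span); the slides do not say which criterion was actually used, so this is an assumption, as are the function name and the example sentence.

```python
def extract_phrases(f_words, e_words, a, max_len=3):
    """a[j] is the target position aligned to source position j (Viterbi path)."""
    pairs = set()
    J = len(f_words)
    for j1 in range(J):
        for j2 in range(j1, min(J, j1 + max_len)):
            i1 = min(a[j1:j2 + 1])
            i2 = max(a[j1:j2 + 1])
            # Consistency: no source word outside [j1, j2] may align into [i1, i2].
            if any(i1 <= a[j] <= i2 for j in range(J) if j < j1 or j > j2):
                continue
            pairs.add((" ".join(f_words[j1:j2 + 1]),
                       " ".join(e_words[i1:i2 + 1])))
    return pairs

f = "ich habe das Buch gelesen".split()
e = "I have read the book".split()
a = [0, 1, 3, 4, 2]            # gelesen -> read, das -> the, Buch -> book
phrases = extract_phrases(f, e, a)
```

The pair ("das Buch gelesen", "read the book") shows how a phrase entry captures local word reordering that a word-by-word lexicon cannot.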

10
Translation with Transducers
  • Transducer
      • Finite state machine
      • Read sequence of words, write sequence of words
      • Output vocabulary can be different from input
        vocabulary
  • Transducer used in current implementation
      • Tree transducer, i.e. prefix tree over input
        strings
      • Output from final states
      • Used to encode lexicon, phrase translations,
        bilingual word classes and grammars
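
A prefix-tree transducer of this kind can be sketched as follows. The class and method names are hypothetical and the entries are invented; the sketch only shows the idea of reading input words along the tree and emitting output at final states.

```python
class TreeTransducer:
    """Prefix tree over input word sequences; outputs are stored at final states."""

    def __init__(self):
        self.children = {}    # input word -> child node
        self.outputs = []     # translations emitted if this node is final

    def add(self, source, target):
        node = self
        for word in source.split():
            node = node.children.setdefault(word, TreeTransducer())
        node.outputs.append(target)

    def matches(self, words, start):
        """Yield (end, output) for every entry matching words[start:end]."""
        node = self
        for end in range(start, len(words)):
            node = node.children.get(words[end])
            if node is None:
                return
            for out in node.outputs:
                yield end + 1, out

lexicon = TreeTransducer()
lexicon.add("guten", "good")
lexicon.add("guten Morgen", "good morning")
found = list(lexicon.matches("guten Morgen zusammen".split(), 0))
```

Because entries share prefixes, a single left-to-right pass finds both the one-word and the two-word match starting at the same position.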

11
Cascaded Transducers
  • Generalization through cascaded transducers
  • Replace words by category labels and have a
    transducer for each category

Vogel and Ney 2000
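
The generalization step can be illustrated with two levels: a category transducer for weekday names and a top-level pattern written over the category label. Everything here (the `@DAY` label, the dictionaries, the function name) is a hypothetical toy, not the format used in the cited work.

```python
# Hypothetical category transducer: German weekday names to English.
DAY = {"Montag": "Monday", "Dienstag": "Tuesday"}
# Hypothetical top-level patterns over words and category labels.
PATTERNS = {("am", "@DAY"): "on @DAY"}

def translate(words):
    labels, fillers = [], []
    for w in words:
        if w in DAY:                  # first level: replace the word by its label
            labels.append("@DAY")
            fillers.append(DAY[w])
        else:
            labels.append(w)
    out = PATTERNS.get(tuple(labels))
    if out is None:
        return None
    for filler in fillers:            # second level: re-insert category output
        out = out.replace("@DAY", filler, 1)
    return out
```

A single pattern then covers every weekday: `translate(["am", "Montag"])` and `translate(["am", "Dienstag"])` both succeed without a separate entry for each.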
12
Language Model
  • Standard n-gram model
  • $p(w_1 \dots w_n) = \prod_i p(w_i \mid w_1 \dots w_{i-1})$
  • $\approx \prod_i p(w_i \mid w_{i-2}\, w_{i-1})$
    (trigram)
  • $\approx \prod_i p(w_i \mid w_{i-1})$
    (bigram)
  • Many events not seen -> smoothing required
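
A minimal bigram model with add-one smoothing shows why smoothing matters. Add-one is chosen here only because it is short; the slides do not say which smoothing method was used, and the training sentences are invented.

```python
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])                  # history counts
        bigrams.update(zip(words[:-1], words[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def prob(word, prev):
        # Add-one smoothing: unseen bigrams still get non-zero probability.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))
    return prob

p = train_bigram(["the house is small", "the book is small"])
seen = p("house", "the")      # bigram observed in training
unseen = p("is", "the")       # bigram never observed
```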

13
Decoding Strategies
  • Sequential construction of target sentence
      • Extend partial translation by words which are
        translations of words in the source sentence
      • Language model can be applied immediately
      • Mechanism to ensure proper coverage of source
        sentence required
  • Left to right over source sentence
      • Find translations for sequences of words
      • Construct translation lattice
      • Apply language model and select best path
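
The second strategy can be sketched as a dynamic program over a translation lattice. The edge format, the toy scores and the stand-in bigram scorer `toy_lm` are assumptions made for this example; edges are assumed to move strictly forward in the source sentence.

```python
import math

def best_path(n, edges, lm_logprob):
    """edges: (start, end, target_phrase, log_tm_score) over source positions 0..n."""
    # best[(pos, last_word)] = (score, phrases chosen so far)
    best = {(0, "<s>"): (0.0, [])}
    for pos in range(n):
        for (p, last), (score, out) in list(best.items()):
            if p != pos:
                continue
            for start, end, phrase, tm in edges:
                if start != pos:
                    continue
                s, prev = score + tm, last
                for w in phrase.split():          # apply the LM word by word
                    s += lm_logprob(w, prev)
                    prev = w
                key = (end, prev)
                if key not in best or s > best[key][0]:
                    best[key] = (s, out + [phrase])
    finals = [v for (p, _), v in best.items() if p == n]
    return max(finals, key=lambda v: v[0])[1] if finals else None

GOOD = {("<s>", "the"), ("the", "house")}
def toy_lm(word, prev):
    return math.log(0.5 if (prev, word) in GOOD else 0.01)

# Lattice for the two-word source sentence "das Haus".
edges = [(0, 1, "the", math.log(0.9)), (0, 1, "that", math.log(0.1)),
         (1, 2, "house", 0.0), (0, 2, "the home", math.log(0.3))]
result = best_path(2, edges, toy_lm)
```

Keeping the last target word in the state is what lets the bigram model be applied across edge boundaries.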

14
Translation Graph
15
Speech Recognition and Translation
  • Search for the best string in the target language given the
    acoustic signal x in the source language
  • $\hat{e} = \arg\max_e p(e)\, p(x \mid e)$
  • $= \arg\max_e p(e) \sum_f p(f, x \mid e)$
  • $= \arg\max_e p(e) \sum_f p(f \mid e)\, p(x \mid f, e)$
    $\approx \arg\max_e p(e) \sum_f p(f \mid e)\, p(x \mid f)$
  • i.e. recognizer language model not needed !?

Ney 2001
16
Coupling Recognition and Translation
  • Sequential: first recognition, then translation
      • First-best recognition hypothesis
      • N-best list: translate n times
      • Word lattice: translate all paths in lattice,
        reuse results from partial paths
  • Integrated: recognition and translation in
    combined search
      • Subsequential transducer approach uses this
  • Note: in the Eutrans project, best results were obtained
    when translating the first-best hypothesis

17
Example-Based Machine Translation
  • Re-use translations to create new translations
  • Store bilingual corpus with (partial) alignment
  • Find partial matches, i.e. sequences of words in
    stored corpus to cover a new sentence
  • Extract translation(s) and build translation
    lattice
  • Apply language model to find best path, i.e. best
    translation
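
The matching step can be sketched as a search for word sequences shared between the new sentence and the source side of stored examples. The function name, the minimum match length and the two stored examples are invented; extracting the aligned target side of each match is left out.

```python
def partial_matches(sentence, corpus, min_len=2):
    """Return (start, end, example_index) for word sequences of `sentence`
    that also occur in the source side of a stored example."""
    words = sentence.split()
    found = []
    for idx, (src, _tgt) in enumerate(corpus):
        src_words = src.split()
        src_ngrams = {tuple(src_words[i:j])
                      for i in range(len(src_words))
                      for j in range(i + min_len, len(src_words) + 1)}
        for i in range(len(words)):
            for j in range(i + min_len, len(words) + 1):
                if tuple(words[i:j]) in src_ngrams:
                    found.append((i, j, idx))
    return found

corpus = [("ich möchte ein Zimmer", "I would like a room"),
          ("ein Zimmer mit Bad", "a room with bath")]
found = partial_matches("ich möchte ein Zimmer mit Bad", corpus)
```

Here the spans (0, 4) from the first example and (2, 6) from the second together cover the whole new sentence, which is the situation the lattice construction then has to resolve.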

18
Nespole Experiments
  • Application of direct translation techniques to
    dialogue data collected in Nespole!
  • Testing the effect of phrase translation
  • Experiments with additional knowledge sources
      • Preexisting monolingual data for the LM and
        publicly available lexica
      • Engineered handwritten rules for fixed
        expressions and knowledge extracted from semantic
        grammars

19
Nespole Project Data
  • CMU database of dialogues in the travel domain
  • German, English (Italian, French)
  • Speech recognizer hypotheses and human
    transcriptions both available
  • Segmented into SDUs (Speech Dialogue Units)

20
Nespole Corpus: Training
3182 parallel SDUs
Language      English   German
Tokens        15572     14992
Vocabulary    1032      1338
Singletons    404       620
21
Nespole Corpus: Testing
70 parallel SDUs
              German         Reference A   Reference B
Tokens        437            610           607
Vocabulary    183 (45 OOV)   165           160
22
Corpus Challenges: Sentence Length
[Figure panels: Training Data, Testing Data]
23
Evaluation
  • Human scoring
      • Good, Okay, Bad (cf. Nespole evaluation)
      • Collapsed into a human score in [0,1]
  • Bleu score
      • Average of n-gram precisions for n = 1..N,
        typically N = 3 or 4
      • Penalty for short translations to substitute for
        recall measure

Papineni et al. 2001
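
A sentence-level sketch of the Bleu computation follows. It uses the usual definition (geometric mean of clipped n-gram precisions times a brevity penalty); the exact variant and settings used for the experiments are not given in the slides, so treat the details as assumptions.

```python
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hypothesis, references, max_n=4):
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for r in refs:
            for g, c in ngrams(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        matched = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matched == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty against the reference closest in length.
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    bp = 1.0 if len(hyp) > ref_len else math.exp(1 - ref_len / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

perfect = bleu("the house is small", ["the house is small"])
short = bleu("the house", ["the house is small"], max_n=2)
```

The second call has perfect precision but is only half as long as the reference, so its score is reduced by the brevity penalty alone.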
24
Phrase Translation
  • Unequal sentence lengths mean that training can
    be improved by choosing the direction, S -> T or
    T -> S
  • German compounds are better suited to one-to-many
    alignments with English multiword phrases, so
    direction is important

Statistical lexicon alone                                  0.1903
Statistical lexicon, phrases from S -> T training          0.2350
Statistical lexicon, phrases from bidirectional training   0.2654
25
Language Model
  • Monolingual text available from Verbmobil
  • 500,000 words (32x the size of the original English
    corpus)
  • Helps to choose among translation hypotheses but
    will not generate new ones

Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and small LM   0.2613
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and large LM   0.3172
26
General-Purpose Lexicon
Statistical lexicon, phrases, and fixed expressions with small LM         0.2654
Adding general-purpose lexicon as a transducer                            0.2522
Using large instead of small LM                                           0.3141
General-purpose lexicon as training data instead of separate transducer   0.3275
27
Fixed Expression Rules
  • Transducer rules are human readable and can be
    added by hand
  • Fixed expressions for times and dates are
    re-usable, require less time to build than
    domain-specific rules and improve coverage of
    some semi-idiomatic constructions.

Statistical lexicon with small LM                                     0.1893
Statistical lexicon and fixed-expression transducer with small LM     0.1903
28
Knowledge from Existing Grammars
  • Could help with domain portability but not language
    portability
  • Benefit mostly in additional vocabulary

Statistical lexicon, fixed expressions, phrases, and general lexicon with large LM                  0.3141
Statistical lexicon, fixed expressions, phrases, general lexicon and I-transducer with large LM     0.3172
29
Comparative Evaluation Results
               Good   Okay   Bad   Score   Bleu
Text     IF    77     104    227   0.32    0.068
Text     SMT   127    80     205   0.40    0.333
Speech   IF    64     101    243   0.28    0.059
Speech   SMT   95     83     227   0.34    0.262
30
Selected References
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), pp. 263-311, 1993.

Stephan Vogel, Hermann Ney, Christoph Tillmann. HMM-Based Word Alignment in Statistical Translation. Int. Conf. on Computational Linguistics, Copenhagen, Denmark, pp. 836-841, August 1996.

Stephan Vogel, Hermann Ney. Translation with Cascaded Finite State Transducers. 36th Annual Conference of the Association for Computational Linguistics, pp. 23-30, Hong Kong, China, October 2000.

Stephan Vogel, Alicia Tribble. Improving statistical machine translation for a speech-to-speech translation task. To appear in ICSLP 2002.

H. Ney. The Statistical Approach to Spoken Language Translation. Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio, Trento, Italy, 8 pages, CD ROM, IEEE Catalog No. 01EX544, December 2001.

Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. IBM Research Report RC22176 (W0109-022), September 17, 2001.