MEMT: MultiEngine Machine Translation

Provided by: AlonL

Transcript and Presenter's Notes

1
MEMT: Multi-Engine Machine Translation
  • Faculty
  • Alon Lavie, Robert Frederking, Ralf Brown, Jaime
    Carbonell
  • Students
  • Shyamsundar Jayaraman, Satanjeev Banerjee

2
Goals and Approach
  • Combine the output of multiple MT engines into a
    synthetic output that outperforms the originals
    in translation quality
  • Synthetic combination of the originals, NOT
    selecting the best system
  • Experimented with two approaches
  • Approach-1: Merging of lattice outputs + joint
    decoding
  • Each MT system produces a lattice of translation
    fragments, indexed based on source word positions
  • Lattices are merged into a single common lattice
  • Statistical MT decoder selects a translation
    path through the lattice
  • Approach-2: Align best output from engines + new
    decoder
  • Each MT system produces a sentence translation
    output
  • Establish an explicit word matching between all
    words of the various MT engine outputs
  • Decoding: create a collection of synthetic
    combinations of the original strings based on
    matched words, target LM, and constraints, with
    re-combination and pruning
  • Score resulting hypotheses and select a final
    output

3
Approach-1: Lattice MEMT
  • Approach
  • Multiple MT systems produce a lattice of output
    segments
  • Create a union lattice of the various systems
  • Decode the joint lattice and select best
    synthetic output
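
The three steps above can be illustrated with a minimal sketch. It assumes a hypothetical edge format `(start, end, phrase, log_score)` indexed by source word positions, merges the edges from all engines, and picks the best monotone path with dynamic programming. The edge scores and engine names are invented for illustration; the slides describe a statistical MT decoder, which this simplified search only approximates.

```python
from collections import defaultdict

def merge_lattices(lattices):
    """Union the edges of all engines into one list, tagging each edge with its engine."""
    merged = []
    for engine, edges in lattices.items():
        for start, end, phrase, score in edges:
            merged.append((start, end, phrase, score, engine))
    return merged

def best_path(edges, n_source_words):
    """Dynamic programming over source positions 0..n (edges must have end > start).

    Returns (total_log_score, phrases) for the best path covering the whole
    source sentence, or None if no path reaches position n.
    """
    by_start = defaultdict(list)
    for edge in edges:
        by_start[edge[0]].append(edge)
    best = {0: (0.0, [])}
    for pos in range(n_source_words):
        if pos not in best:
            continue  # no path reaches this source position
        score, phrases = best[pos]
        for _, end, phrase, edge_score, _ in by_start[pos]:
            candidate = (score + edge_score, phrases + [phrase])
            if end not in best or candidate[0] > best[end][0]:
                best[end] = candidate
    return best.get(n_source_words)

# Hypothetical lattices for a three-word source sentence.
lattices = {
    "engine_a": [(0, 1, "north korea", -0.5), (1, 3, "stands ready", -1.5)],
    "engine_b": [(0, 1, "korea", -1.0), (1, 2, "is", -0.3), (2, 3, "prepared", -0.4)],
}
score, phrases = best_path(merge_lattices(lattices), 3)
print(" ".join(phrases), round(score, 2))
```

In this toy case the selected path mixes edges from both engines, which is the sense in which the output is synthetic rather than a choice of one system.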

4
Lattice MEMT
  • Main Drawbacks
  • Requires MT engines to provide lattice output
    → difficult to obtain!
  • Lattice output from all engines must be
    compatible: common indexing based on source word
    positions → difficult to standardize!
  • Common TM used for scoring edges may not work
    well for all engines
  • Decoding does not take into account any
    reinforcements from multiple engines proposing
    the same translation for any portion of the input

5
Approach-2: Sentence MEMT
  • Idea
  • Start with output sentences of the various MT
    engines
  • Explicitly align the words that are common
    between any pair of systems, and apply
    transitivity
  • Use the alignments as reinforcement and as
    indicators of possible locations for the words
  • Each engine has a weight that is used for the
    words that it contributes
  • Decoder searches for an optimal synthetic
    combination of words and phrases that optimizes a
    scoring function that combines the alignment
    weights and a LM score
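
As a rough illustration of pairwise alignment with transitivity, the sketch below links identical (case-insensitive) words between every pair of engine outputs and merges the links with a union-find structure, so that a match between A and B plus a match between B and C also groups A with C. The greedy first-match pairing, the `support` function, and the engine weights are assumptions made for this example, not details given in the slides.

```python
def align_with_transitivity(sentences):
    """Group matching words across engines; tokens are (engine, position) pairs."""
    words = {eng: s.lower().split() for eng, s in sentences.items()}
    parent = {(eng, i): (eng, i) for eng, ws in words.items() for i in range(len(ws))}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    engines = list(words)
    for a, e1 in enumerate(engines):
        for e2 in engines[a + 1:]:
            taken = set()  # positions in e2 already matched for this pair
            for i, w1 in enumerate(words[e1]):
                for j, w2 in enumerate(words[e2]):
                    if w1 == w2 and j not in taken:
                        parent[find((e1, i))] = find((e2, j))
                        taken.add(j)
                        break

    groups = {}
    for tok in parent:
        groups.setdefault(find(tok), []).append(tok)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

def support(group, weights):
    """Assumed reinforcement: sum of the weights of the engines proposing the word."""
    return sum(weights[eng] for eng in {eng for eng, _ in group})

sentences = {
    "IBM": "korea stands ready",
    "ISI": "North Korea Is Prepared",
    "CMU": "North Korea prepared",
}
weights = {"IBM": 0.3, "ISI": 0.4, "CMU": 0.3}  # hypothetical engine weights
groups = align_with_transitivity(sentences)
for g in groups:
    print(g, round(support(g, weights), 2))
```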

6
The Sentence Matcher
  • Developed by Satanjeev Banerjee as a component in
    our METEOR Automatic MT Evaluation metric
  • Finds maximal alignment match with minimal
    crossing branches
  • Implementation: clever search algorithm for best
    match using pruning of sub-optimal sub-solutions

7
Matcher Example
  • IBM: the sri lankan prime minister criticizes
    head of the country's
  • ISI: The President of the Sri Lankan Prime
    Minister Criticized the President of the Country
  • CMU: Lankan Prime Minister criticizes her country

8
The MEMT Algorithm
  • Algorithm builds collections of partial
    hypotheses of increasing length
  • Partial hypotheses are extended by selecting the
    next available word from one of the original
    systems
  • Sentences are assumed synchronous
  • Each word is either aligned with another word or
    is an alternative of another word
  • Extending a partial hypothesis with a word
    pulls and uses its aligned words with it, and
    marks its alternatives as used; vectors keep
    track of this
  • Partial hypotheses are scored and ranked
  • Pruning and re-combination
  • Hypothesis can end if any original system
    proposes an end of sentence as next word
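
A single extension step of this kind might look like the sketch below. It assumes a hypothesis is a pair `(words, used)`, where `used` is a set of `(engine, position)` tokens standing in for the "used" vectors, and that `aligned` and `alternatives` are precomputed maps from a token to related tokens. Scoring, pruning, and re-combination are left out, and all names here are hypothetical.

```python
def next_unused(sentence, engine, used):
    """Position of the first word of this engine not yet marked as used."""
    for i in range(len(sentence)):
        if (engine, i) not in used:
            return i
    return None

def extend(hyp, sentences, aligned, alternatives):
    """Extend a partial hypothesis with the next available word of each engine.

    Choosing a word also marks its aligned words and its alternatives as used.
    """
    words, used = hyp
    extensions = []
    for engine, sentence in sentences.items():
        i = next_unused(sentence, engine, used)
        if i is None:
            continue
        tok = (engine, i)
        new_used = used | {tok} | aligned.get(tok, set()) | alternatives.get(tok, set())
        extensions.append((words + (sentence[i],), new_used))
    return extensions

def can_end(hyp, sentences):
    """A hypothesis may end once some engine has no words left to propose."""
    _, used = hyp
    return any(
        all((engine, i) in used for i in range(len(sentence)))
        for engine, sentence in sentences.items()
    )

sentences = {"A": ["north", "korea", "is"], "B": ["korea", "stands"]}
aligned = {("A", 1): {("B", 0)}, ("B", 0): {("A", 1)}}
start = ((), frozenset())
first = extend(start, sentences, aligned, {})
second = extend(first[1], sentences, aligned, {})
print([w for w, _ in first], [w for w, _ in second])
```

After choosing "korea" from engine B, engine A still offers "north" as its next available word, which is the lingering-word situation that the parameters on a later slide are meant to control.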

9
The MEMT Algorithm
  • Scoring
  • Alignment score based on reinforcement from
    alignments of the words
  • LM score based on trigram LM
  • Sum logs of alignment score and LM score
    (equivalent to product of probabilities)
  • Select best scoring hypothesis based on:
  • Total score (bias towards shorter hypotheses)
  • Average score per word
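
The effect of the two selection criteria can be seen in a small sketch. The per-word log scores below are made up, and treating the combined alignment and LM score as a per-word log value is an assumption for illustration.

```python
import math

def combined_log_score(alignment_prob, lm_prob):
    """Sum of logs, which equals the log of the product of the two values."""
    return math.log(alignment_prob) + math.log(lm_prob)

def total_score(word_scores):
    return sum(word_scores)

def average_score(word_scores):
    return sum(word_scores) / len(word_scores)

# Summing logs is equivalent to multiplying probabilities.
assert math.isclose(combined_log_score(0.5, 0.2), math.log(0.5 * 0.2))

# Hypothetical per-word log scores for a short and a long hypothesis.
short_hyp = [-1.0, -1.0]
long_hyp = [-0.8, -0.8, -0.8, -0.8]

by_total = max([short_hyp, long_hyp], key=total_score)
by_average = max([short_hyp, long_hyp], key=average_score)
print(len(by_total), len(by_average))
```

Because every word adds a negative log term, the total score favors the shorter hypothesis even though its words score worse individually, while the per-word average removes that length bias.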

10
The MEMT Algorithm
  • Parameters
  • lingering word horizon: how long is a word
    allowed to linger when words following it have
    already been used?
  • lookahead horizon: how far ahead can we look
    for an alternative for a word that is not
    aligned?
  • POS matching: limit search for an alternative
    to only words of the same POS
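
One possible reading of these three parameters is sketched below. The default values, the window definition (from the word's own position forward, relying on the synchrony assumption), and the function names are all assumptions; the slides do not specify them.

```python
from dataclasses import dataclass

@dataclass
class MEMTParams:
    lingering_word_horizon: int = 3  # hypothetical default
    lookahead_horizon: int = 2       # hypothetical default
    pos_matching: bool = True

def has_lingered_too_long(word_pos, furthest_used_pos, params):
    """True if words further ahead than the horizon have already been used."""
    return furthest_used_pos - word_pos > params.lingering_word_horizon

def find_alternative(pos, tag, other_tags, other_is_aligned, params):
    """Look ahead in another engine's sentence for an unaligned word that could
    serve as an alternative, optionally requiring the same POS tag."""
    last = min(len(other_tags) - 1, pos + params.lookahead_horizon)
    for j in range(pos, last + 1):
        if other_is_aligned[j]:
            continue
        if params.pos_matching and other_tags[j] != tag:
            continue
        return j
    return None

params = MEMTParams(lingering_word_horizon=2, lookahead_horizon=1, pos_matching=True)
tags = ["NOUN", "DET", "VERB", "VERB"]
is_aligned = [True, False, False, False]
print(find_alternative(1, "VERB", tags, is_aligned, params))
```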

11
Example
  • IBM: korea stands ready to allow visits to
    verify that it does not manufacture nuclear
    weapons 0.7407
  • ISI: North Korea Is Prepared to Allow
    Washington to Verify that It Does Not Make
    Nuclear Weapons 0.8007
  • CMU: North Korea prepared to allow Washington to
    the verification of that is to manufacture
    nuclear weapons 0.7668
  • Selected MEMT Sentence:
  • north korea is prepared to allow washington to
    verify that it does not manufacture nuclear
    weapons . 0.8894 (-2.75135)

12
Example
  • IBM: victims russians are one man and his wife
    and abusing their eight year old daughter plus a
    ( 11 and 7 years ) man and his wife and driver ,
    egyptian nationality . 0.6327
  • ISI: The victims were Russian man and his wife,
    daughter of the most from the age of eight years
    in addition to the young girls ) 11 7 years ( and
    a man and his wife and the bus driver Egyptian
    nationality. 0.7054
  • CMU: the victims Cruz man who wife and daughter
    both critical of the eight years old addition to
    two Orient ( 11 ) 7 years ) woman , wife of bus
    drivers Egyptian nationality . 0.5293
  • MEMT Sentence:
  • Selected: the victims were russian man and his
    wife and daughter of the eight years from the age
    of a 11 and 7 years in addition to man and his
    wife and bus drivers egyptian nationality .
    0.7647 (-3.25376)
  • Oracle: the victims were russian man and wife
    and his daughter of the eight years old from the
    age of a 11 and 7 years in addition to the man
    and his wife and bus drivers egyptian nationality
    young girls . 0.7964 (-3.44128)

13
Example
  • IBM: the sri lankan prime minister criticizes
    head of the country's 0.8862
  • ISI: The President of the Sri Lankan Prime
    Minister Criticized the President of the Country
    0.8660
  • CMU: Lankan Prime Minister criticizes her
    country 0.6615
  • MEMT Sentence:
  • Selected: the sri lankan prime minister
    criticizes president of the country . 0.9353
    (-3.27483)
  • Oracle: the sri lankan prime minister criticizes
    president of the country's . 0.9767 (-3.75805)

14
Current System
  • Some features of decoding algorithm and final
    scoring still under experimentation
  • Initial development tests performed on TIDES 2003
    Arabic-to-English MT data, using IBM, ISI and CMU
    SMT system output
  • Further development tests performed on
    Arabic-to-English EBMT Apptek and SYSTRAN system
    output and on three Chinese-to-English COTS
    systems
  • Integrated within CACI REFLEX Demonstration
    Platform

15
Experimental Results: Chinese-to-English

16
Experimental Results: Arabic-to-English

17
Other Examples
  • http://www-2.cs.cmu.edu/afs/cs/user/alavie/Students/Shyam/Comps100

18
Conclusions
  • New sentence-level MEMT approach with promising
    performance
  • Easy to run on both research and COTS systems
  • Tuning of parameter space for hypothesis
    generation: too tuned to METEOR?
  • Decoding is still suboptimal
  • Oracle scores show there is much room for
    improvement
  • Need for additional discriminant features
  • Some ideas currently under investigation