Title: Wrapper Syntax for Example-Based Machine Translation
- Karolina Owczarzak, Bart Mellebeek, Declan Groves, Josef Van Genabith, Andy Way
- National Centre for Language Technology
- School of Computing
- Dublin City University
Overview
- TransBooster wrapper technology for MT
  - motivation
  - decomposition process
  - variables and template contexts
  - recomposition
- Example-Based Machine Translation
  - marker-based EBMT
- Experiment
  - English-Spanish
  - Europarl, Wall Street Journal section of Penn II Treebank
  - automatic and manual evaluation
- Comparison with previous experiments
TransBooster wrapper technology for MT
- Assumption: MT systems perform better at translating short sentences than long ones.
- Decompose long sentences into shorter, syntactically simpler chunks, send the chunks to translation, and recompose them on output.
- Decomposition is linguistically guided by a syntactic parse of the sentence.
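The decompose/translate/recompose idea can be sketched as a function that treats the MT engine as an opaque callable. This is a minimal illustration under stated assumptions, not the TransBooster implementation: the names `boost` and `max_words`, the lookup-table "engine" and the naive split on " and " are hypothetical stand-ins, whereas the real decomposition is guided by a Penn II parse.

```python
from typing import Callable, Dict, List

Translate = Callable[[str], str]  # any MT engine, used as a black box


def boost(sentence: str, translate: Translate,
          decompose: Callable[[str], List[str]],
          recompose: Callable[[List[str]], str],
          max_words: int = 5) -> str:
    """Translate short input directly; otherwise decompose, translate chunks, recompose."""
    if len(sentence.split()) <= max_words:
        return translate(sentence)
    translated = [translate(chunk) for chunk in decompose(sentence)]
    return recompose(translated)


# Hypothetical stand-ins: a lookup-table "engine" and a naive split on " and ".
LEXICON: Dict[str, str] = {
    "the man sleeps": "el hombre duerme",
    "the dog runs": "el perro corre",
}


def engine(text: str) -> str:
    return LEXICON.get(text, text)  # unknown input passes through untranslated


def split_and(text: str) -> List[str]:
    return text.split(" and ")


def join_y(parts: List[str]) -> str:
    return " y ".join(parts)


result = boost("the man sleeps and the dog runs", engine, split_and, join_y)
print(result)  # el hombre duerme y el perro corre
```

Because the engine is only ever called as `translate(text)`, any MT system with a text-in, text-out interface can be plugged in unchanged.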
TransBooster wrapper technology for MT
- TransBooster technology is universal and can be applied to any MT system.
- Experiments to date:
  - TB and Rule-Based MT (Mellebeek et al., 2005a,b)
  - TB and Statistical MT (Mellebeek et al., 2006a)
  - TB and Multi-Engine MT (Mellebeek et al., 2006b)
- In these experiments, TransBooster outperforms the baseline MT systems.
TransBooster decomposition
- Input: syntactically parsed sentence (Penn II format)
- Decompose into pivot and satellites:
  - pivot: usually the main predicate (plus additional material)
  - satellites: arguments and adjuncts
- Recursively decompose satellites if they are longer than x leaves
- Replace satellites around the pivot with variables:
  - static: simple phrases of the same type, with a known translation
  - dynamic: simplified version of the original satellites
  - send off to translation
- Insert each satellite into a template context:
  - static: simple predicate with a known translation
  - dynamic: simpler version of the original clause (pivot + simplified arguments, no adjuncts)
  - send off to translation
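The pivot/satellite split on a Penn II tree can be sketched as follows, using the parse from the next slide. This assumes the simplest clause shape, S → NP VP with a verb and an NP object inside the VP; the helper names, the threshold x = 5 and the mapping of subject and object to ARG1/ARG2 are illustrative assumptions, and adjuncts, multi-word pivots and other clause types are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Tree:
    label: str
    children: List[Union["Tree", str]] = field(default_factory=list)

    def leaves(self) -> List[str]:
        out: List[str] = []
        for child in self.children:
            out.extend([child] if isinstance(child, str) else child.leaves())
        return out


def parse_penn(text: str) -> Tree:
    """Parse one bracketed Penn II tree such as '(S (NP (DT the) (NN man)) ...)'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        pos += 1                    # skip "("
        tree = Tree(tokens[pos])    # node label, e.g. "NP"
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                tree.children.append(node())
            else:
                tree.children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                    # skip ")"
        return tree

    return node()


def first(tree: Tree, prefix: str) -> Tree:
    """First child subtree whose label starts with prefix."""
    return next(c for c in tree.children
                if isinstance(c, Tree) and c.label.startswith(prefix))


def decompose(s: Tree, x: int) -> Tuple[str, Dict[str, Tree], Dict[str, bool]]:
    """Split S -> NP VP into a pivot (main verb) and satellites ARG1/ARG2."""
    vp = first(s, "VP")
    pivot = " ".join(first(vp, "VB").leaves())
    satellites = {"ARG1": first(s, "NP"), "ARG2": first(vp, "NP")}
    # Satellites longer than x leaves would be decomposed again, recursively.
    recurse = {name: len(t.leaves()) > x for name, t in satellites.items()}
    return pivot, satellites, recurse


PARSE = ("(S (NP (NP (DT the) (NN chairman)) (, ,) (NP (NP (DT a) (JJ long-time) (NN rival)) "
         "(PP (IN of) (NP (NNP Bill) (NNP Gates)))) (, ,)) (VP (VBZ likes) "
         "(NP (ADJP (JJ fast) (CC and) (JJ confidential)) (NNS deals))) (. .))")
pivot, satellites, recurse = decompose(parse_penn(PARSE), x=5)
print(pivot, recurse)  # likes {'ARG1': True, 'ARG2': False}
```

With this hypothetical x, the 10-leaf subject would be decomposed further while the 4-leaf object would not.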
TransBooster decomposition example
- Input parse:
  (S (NP (NP (DT the) (NN chairman)) (, ,) (NP (NP (DT a) (JJ long-time) (NN rival)) (PP (IN of) (NP (NNP Bill) (NNP Gates)))) (, ,)) (VP (VBZ likes) (NP (ADJP (JJ fast) (CC and) (JJ confidential)) (NNS deals))) (. .))
- Decomposition:
  [The chairman, a long-time rival of Bill Gates,]ARG1 [likes]pivot [fast and confidential deals]ARG2.
- Static variables and contexts:
  [The man]V1 [likes]pivot [cars]V2.
  [The chairman, a long-time rival of Bill Gates,]ARG1 [is sleeping]V1.
  [The man sees]V1 [fast and confidential deals]ARG2.
- Dynamic variables and contexts:
  [The chairman]V1 [likes]pivot [deals]V2.
  [The chairman, a long-time rival of Bill Gates,]ARG1 [likes deals]V1.
  [The chairman likes]V1 [fast and confidential deals]ARG2.
[Figure: the strings above are sent to the MT engine]
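Given a pivot, its satellites and their substitutes, the six strings in the example above can be generated mechanically. In this sketch the dynamic variables ("The chairman", "deals"), static variables and static contexts are supplied by hand; how TransBooster derives them automatically is not shown, and `build_requests` and `realize` are hypothetical names.

```python
from typing import Dict, List, Tuple


def realize(order: List[str], fill: Dict[str, str]) -> str:
    """Join slot fillers in surface order and add final punctuation."""
    return " ".join(fill[slot] for slot in order) + "."


def build_requests(order: List[str], full: Dict[str, str], dynamic: Dict[str, str],
                   static: Dict[str, str],
                   static_ctx: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    pivot = {"pivot": full["pivot"]}
    requests = {
        # Pivot surrounded by (dynamic or static) variables.
        ("pivot", "dynamic"): realize(order, {**dynamic, **pivot}),
        ("pivot", "static"): realize(order, {**static, **pivot}),
    }
    for arg in dynamic:
        # Dynamic context: this satellite in full, the others as dynamic variables.
        requests[(arg, "dynamic")] = realize(order, {**dynamic, **pivot, arg: full[arg]})
        # Static context: a fixed frame with a known translation.
        requests[(arg, "static")] = static_ctx[arg].format(full[arg])
    return requests


ORDER = ["ARG1", "pivot", "ARG2"]
FULL = {"ARG1": "The chairman, a long-time rival of Bill Gates,", "pivot": "likes",
        "ARG2": "fast and confidential deals"}
DYNAMIC = {"ARG1": "The chairman", "ARG2": "deals"}  # supplied by hand here
STATIC = {"ARG1": "The man", "ARG2": "cars"}
STATIC_CTX = {"ARG1": "{} is sleeping.", "ARG2": "The man sees {}."}

requests = build_requests(ORDER, FULL, DYNAMIC, STATIC, STATIC_CTX)
for key, text in requests.items():
    print(key, text)
```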
TransBooster recomposition
- MT output: a set of translations, with dynamic and static variables and contexts, for a sentence S
- Remove the translations of the dynamic variables and contexts from the translation of S
- If unsuccessful, back off to the translation with static variables and contexts, and remove those
- Recombine the translated pivot and satellites into the output sentence
TransBooster recomposition example
Input: The chairman, a long-time rival of Bill Gates, likes fast and confidential deals.
- [The chairman]V1 [likes]pivot [deals]V2. → El presidente tiene gusto de repartos.
- [The chairman, a long-time rival of Bill Gates,]ARG1 [likes deals]V1. → El presidente, un rival de largo plazo de Bill Gates, tiene gusto de repartos.
- [The chairman likes]V1 [fast and confidential deals]ARG2. → El presidente tiene gusto de repartos rápidos y confidenciales.
- [The man]V1 [likes]pivot [cars]V2. → El hombre tiene gusto de automóviles.
- [The chairman, a long-time rival of Bill Gates,]ARG1 [is sleeping]V1. → El presidente, un rival de largo plazo de Bill Gates, está durmiendo.
- [The man sees]V1 [fast and confidential deals]ARG2. → El hombre ve repartos rápidos y confidenciales.
TransBooster output: El presidente, un rival de largo plazo de Bill Gates, tiene gusto de repartos rápidos y confidenciales.
Original translation: El presidente, rival de largo plazo de Bill Gates, gustos ayuna y los repartos confidenciales.
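The removal, back-off and recombination steps can be sketched on the example above. The sketch assumes the translations of the variables and contexts to be removed are already known and appear verbatim as substrings of the MT output; plain substring matching is a simplification, and `strip_known` and `retrieve` are hypothetical names.

```python
from typing import Iterable, Optional, Sequence, Tuple

Attempt = Tuple[str, Sequence[str]]  # (MT output, known translations to remove)


def strip_known(translation: str, known: Iterable[str]) -> Optional[str]:
    """Remove each known translation once; None if any of them is not found."""
    rest = translation.strip().rstrip(".")
    for part in known:
        if part not in rest:
            return None  # this attempt fails
        rest = rest.replace(part, " ", 1)
    rest = " ".join(rest.split())  # tidy leftover whitespace
    return rest or None


def retrieve(dynamic: Attempt, static: Attempt) -> Optional[str]:
    """Try the dynamic variables/contexts first, back off to the static ones."""
    for translation, known in (dynamic, static):
        piece = strip_known(translation, known)
        if piece is not None:
            return piece
    return None


# MT outputs from the example; the known parts are assumed to be available.
arg1 = retrieve(
    ("El presidente, un rival de largo plazo de Bill Gates, tiene gusto de repartos.",
     ["tiene gusto de repartos"]),
    ("El presidente, un rival de largo plazo de Bill Gates, está durmiendo.",
     ["está durmiendo"]))
pivot = retrieve(
    ("El presidente tiene gusto de repartos.", ["El presidente", "repartos"]),
    ("El hombre tiene gusto de automóviles.", ["El hombre", "automóviles"]))
arg2 = retrieve(
    ("El presidente tiene gusto de repartos rápidos y confidenciales.",
     ["El presidente tiene gusto de"]),
    ("El hombre ve repartos rápidos y confidenciales.", ["El hombre ve"]))

# Recombine in source order: ARG1 pivot ARG2.
result = " ".join([arg1, pivot, arg2]) + "."
print(result)
```

This reproduces the TransBooster output shown above. If a known part cannot be found in the dynamic output, `retrieve` falls back to the static attempt, mirroring the back-off step.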
EBMT Overview
- An aligned bilingual corpus
- Input text is matched against this corpus
- The best match is found and a translation is produced
[Figure: EX (input) → search → F2, F4 → FX (output)]
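The matching step can be illustrated with a toy retrieval over a two-pair corpus. This is a simplification under stated assumptions: it scores whole sentences with Python's `difflib.SequenceMatcher` over word lists (a swapped-in similarity measure, not necessarily what an EBMT system uses) and returns only the single best match, whereas a full EBMT system also recombines matched fragments. The corpus pairs are borrowed from the recomposition example; `best_match` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def best_match(source: str, corpus: List[Tuple[str, str]]) -> Tuple[str, str, float]:
    """Return the (source, target) pair whose source side is most similar, plus its score."""
    words = source.lower().split()

    def score(pair: Tuple[str, str]) -> float:
        # Similarity over word sequences: 2 * matched words / total words.
        return SequenceMatcher(None, words, pair[0].lower().split()).ratio()

    src, tgt = max(corpus, key=score)
    return src, tgt, score((src, tgt))


CORPUS = [
    ("the man likes cars", "el hombre tiene gusto de automóviles"),
    ("the man is sleeping", "el hombre está durmiendo"),
]
match = best_match("the man likes deals", CORPUS)
print(match)  # ('the man likes cars', 'el hombre tiene gusto de automóviles', 0.75)
```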
EBMT Marker-Based Chunking
- Marker words
  - English: the, a, these, ...; on, of, ...
  - French: le, la, l', une, un, ces, ...; sur, d', ...
- English phrase: on virtually all uses of asbestos
- French translation: sur virtuellement tous usages d'asbeste
- Marker chunks
  - on virtually → sur virtuellement
  - all uses → tous usages
  - of asbestos → d'asbeste
- Lexical chunks
  - on → sur
  - virtually → virtuellement
  - all → tous
  - uses → usages
  - of → d'
  - asbestos → asbeste
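Chunking at marker words can be sketched as below: a new chunk starts at each marker word, provided the current chunk already contains a non-marker word. The marker sets are tiny hypothetical samples (with "all"/"tous" added as quantifier markers so the example segments as shown), and pairing chunks by position is a naive alignment assumption that only works when both sides yield the same number of chunks.

```python
import re
from typing import List, Set, Tuple

# Hypothetical, tiny marker sets (determiners, quantifiers, prepositions).
EN_MARKERS: Set[str] = {"the", "a", "these", "all", "on", "of"}
FR_MARKERS: Set[str] = {"le", "la", "l'", "une", "un", "ces", "tous", "sur", "d'"}


def tokenize(text: str) -> List[str]:
    # Split elided forms such as "d'asbeste" into "d'" + "asbeste".
    return re.findall(r"\w+'|[\w-]+", text.lower())


def join(tokens: List[str]) -> str:
    out = ""
    for tok in tokens:
        # No space after an elided form like "d'".
        out += tok if not out or out.endswith("'") else " " + tok
    return out


def marker_chunks(text: str, markers: Set[str]) -> List[str]:
    chunks: List[List[str]] = []
    for tok in tokenize(text):
        # Open a new chunk at a marker, unless the current chunk has only markers so far.
        if not chunks or (tok in markers and any(t not in markers for t in chunks[-1])):
            chunks.append([tok])
        else:
            chunks[-1].append(tok)
    return [join(c) for c in chunks]


def align_by_position(src: List[str], tgt: List[str]) -> List[Tuple[str, str]]:
    if len(src) != len(tgt):
        raise ValueError("naive positional alignment needs equal chunk counts")
    return list(zip(src, tgt))


EN = marker_chunks("on virtually all uses of asbestos", EN_MARKERS)
FR = marker_chunks("sur virtuellement tous usages d'asbeste", FR_MARKERS)
pairs = align_by_position(EN, FR)
for en, fr in pairs:
    print(f"{en} -> {fr}")
```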
EBMT System Overview
Experiment
- English-Spanish
- Two test sets:
  - Wall Street Journal section of Penn II Treebank: 800 sentences
  - Europarl: 800 sentences
- Out-of-domain factor:
  - TransBooster developed on perfect Penn II trees
  - EBMT trained on 958K English-Spanish Europarl sentences
Experiment Results
Automatic evaluation
- Results for EBMT vs. TransBooster on 741-sentence test set from Europarl.
- Results for EBMT vs. TransBooster on 800-sentence test set from Penn II Treebank.
Experiment Results
Manual evaluation
- 100 randomly selected sentences from the Europarl (EP) test set, each presented as:
  - source English sentence
  - EBMT translation
  - EBMT + TransBooster translation
- 3 judges, native speakers of Spanish fluent in English
- Accuracy and fluency rated on a relative scale comparing the two translations
- Inter-judge agreement (kappa): fluency 0.948, accuracy 0.926
- Absolute quality gain when using TransBooster: fluency 19.33% of sentences, accuracy 15.67% of sentences
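The slides report kappa values but do not say which kappa variant was used. With three judges, Fleiss' kappa is one common choice, κ = (P̄ − P_e) / (1 − P_e), where P̄ is the mean per-sentence agreement and P_e the agreement expected by chance. The sketch below computes it from a table of per-sentence category counts; the category layout (TB better / EBMT better / equal) and the toy counts are hypothetical, not the study's data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = number of judges who put sentence i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])  # assumes every sentence is rated by the same number of judges
    # Per-sentence agreement P_i and its mean.
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = sum(p_items) / n_items
    # Category proportions p_j and chance agreement P_e.
    n_cats = len(counts[0])
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy table: 4 sentences, 3 judges, categories (TB better, EBMT better, equal).
TOY = [[3, 0, 0], [3, 0, 0], [0, 3, 0], [2, 0, 1]]
print(round(fleiss_kappa(TOY), 3))  # 0.657
```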
Experiment Results
TB improvements
Example 1
- Source: women have decided that they wish to work, that they wish to make their work compatible with their family life.
- EBMT: hemos decidido su deseo de trabajar, su deseo de hacer su trabajo compatible con su vida familiar. empresarias
- TB: mujeres han decidido su deseo de trabajar, su deseo de hacer su trabajo compatible con su vida familiar.
Example 2
- Source: if this global warming continues, then part of the territory of the eu member states will become sea or desert.
- EBMT: si esto continúa calentamiento global, tanto dentro del territorio de los estados miembros tendrán tornarse altamar o desértico
- TB: si esto calentamiento global perdurará, entonces parte del territorio de los estados miembros de la unión europea tendrán tornarse altamar o desértico
Previous experiments
- TransBooster vs. SMT on 800-sentence test set from Europarl.
- TransBooster vs. EBMT on 800-sentence test set from Europarl.
- TransBooster vs. EBMT on 800-sentence test set from Penn II Treebank.
Previous experiments
- TransBooster vs. SMT on 800-sentence test set from Europarl.
- TransBooster vs. EBMT on 800-sentence test set from Europarl.
Previous experiments
- TransBooster vs. Rule-Based MT on 800-sentence test set from Penn II Treebank.
- TransBooster vs. SMT on 800-sentence test set from Penn II Treebank.
- TransBooster vs. EBMT on 800-sentence test set from Penn II Treebank.
Summary
- TransBooster is a universal wrapper technology that decomposes MT input and recomposes MT output
- Net improvement in translation quality over EBMT:
  - fluency: 19.33% of sentences
  - accuracy: 15.67% of sentences
- Successful experiments to date: rule-based MT, phrase-based SMT, multi-engine MT, EBMT
- Journal article in preparation
Thank You