Title: Avenue Architecture
1 Avenue Architecture
2 Interactive and Automatic Refinement of Translation Rules
- Problem: Improve machine translation (MT) quality.
- Proposed solution: Put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar.
- Approach: Automate post-editing efforts by feeding them back into the MT system.
- Automatic refinement of the translation rules that caused an error, beyond post-editing.
- Goal: Improve MT coverage and overall quality.
3 Technical Challenges
- Automatic evaluation of the refinement process
- Elicit minimal MT information from non-expert users
4 Error Typology for Automatic Rule Refinement (simplified)
Interactive elicitation of error information
- Missing word
- Extra word
- Wrong word order
- Incorrect word
- Wrong agreement
5 TCTool (Demo)
Interactive elicitation of error information
Actions:
- Add a word
- Delete a word
- Modify a word
- Change word order
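One way to picture how the four TCTool actions relate to the simplified error typology is a small lookup table. The sketch below is illustrative only: the names `Action`, `ErrorType`, `CANDIDATE_ERRORS` and `candidate_errors`, as well as the specific action-to-error pairing, are assumptions made for this example and not the tool's actual data model.

```python
from enum import Enum


class Action(Enum):
    """The four correction actions offered to the user."""
    ADD_WORD = "add a word"
    DELETE_WORD = "delete a word"
    MODIFY_WORD = "modify a word"
    CHANGE_WORD_ORDER = "change word order"


class ErrorType(Enum):
    """The simplified error typology."""
    MISSING_WORD = "missing word"
    EXTRA_WORD = "extra word"
    WRONG_WORD_ORDER = "wrong word order"
    INCORRECT_WORD = "incorrect word"
    WRONG_AGREEMENT = "wrong agreement"


# Assumed pairing: each user action narrows down the error types that
# could have motivated it. A modified word is ambiguous between an
# incorrect lexical choice and an agreement error.
CANDIDATE_ERRORS = {
    Action.ADD_WORD: [ErrorType.MISSING_WORD],
    Action.DELETE_WORD: [ErrorType.EXTRA_WORD],
    Action.MODIFY_WORD: [ErrorType.INCORRECT_WORD, ErrorType.WRONG_AGREEMENT],
    Action.CHANGE_WORD_ORDER: [ErrorType.WRONG_WORD_ORDER],
}


def candidate_errors(action):
    """Return the error types compatible with a single correction action."""
    return CANDIDATE_ERRORS[action]
```

Keeping the mapping one-to-many reflects that a non-expert user only performs the action; deciding between the candidate error types is left to the system.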
6 Types of Refinement Operations
Automatic Rule Adaptation
- 1. Refine a translation rule
- R0 → R1 (change R0 to make it more specific or more general)
R0: a nice house → una casa bonito (incorrect agreement)
R1 (adds the constraint N gender = ADJ gender): a nice house → una casa bonita
7 Types of Refinement Operations
Automatic Rule Adaptation
- 2. Bifurcate a translation rule
- R0 → R0 (same, general rule)
- → R1 (add a new, more specific rule)
R0: a nice house → una casa bonita
R1 (ADJ type: pre-nominal): a great artist → un gran artista
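The difference between the two operations can be sketched on a toy grammar: refining replaces R0 with R1, while bifurcating keeps R0 and adds R1 next to it. This is a minimal sketch under stated assumptions; the `Rule` class, the `refine` and `bifurcate` helpers, and the representation of constraints as plain strings are all hypothetical and are not the system's rule formalism.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a translation rule: a name plus constraints."""
    name: str
    constraints: frozenset = frozenset()


def refine(grammar, name, extra_constraint):
    """R0 -> R1: replace the named rule with a more specific version."""
    return [
        replace(r, constraints=r.constraints | {extra_constraint})
        if r.name == name else r
        for r in grammar
    ]


def bifurcate(grammar, name, new_name, extra_constraint):
    """R0 -> R0 + R1: keep the general rule and add a more specific copy."""
    result = []
    for r in grammar:
        result.append(r)
        if r.name == name:
            result.append(Rule(new_name, r.constraints | {extra_constraint}))
    return result


grammar = [Rule("NP")]
refined = refine(grammar, "NP", "N gender = ADJ gender")
split = bifurcate(grammar, "NP", "NP-prenominal", "ADJ type = pre-nominal")
```

After `refine` the grammar still holds one rule, now carrying the agreement constraint; after `bifurcate` it holds two, the untouched general rule followed by the more specific one.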
8 Automatic Rule Adaptation
A concrete example
Error Information Elicitation
- Action: Change word order
- SL: Gaudí was a great artist
- TL (MT system output): Gaudí era un artista grande
- User correction: Gaudí era un artista grande → Gaudí era un gran artista
(Figure callouts: error, correction, clue word)
Refinement Operation Typology
9 Automatic Rule Adaptation
- Finding triggering feature(s)
- Compare the features of the error word (grande) and the corrected word (gran): the two lexical entries below carry identical feature values, so the difference is empty
- → need to postulate a new binary feature: feat1
- Blame assignment (from MT system output)
tree: <((S,1 (NP,2 (N,51 "GAUDI") )
        (VP,3 (VB,2 (AUX,172 "ERA") )
              (NP,8 (DET,03 "UN")
                    (N,45 "ARTISTA")
                    (ADJ,54 "GRANDE")
        ) ) ) )>
Lexical entries:
ADJ::ADJ great -> grande ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc))
ADJ::ADJ great -> gran ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc))
Grammar: S,1; NP,1; NP,8
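Both steps on this slide can be sketched in a few lines. The code below is an illustration under stated assumptions, not the system's algorithm: the tree is re-encoded as nested tuples, `blame` assumes that the rule to blame is the lowest one covering both the misplaced adjective and the noun it moved across, and all function names are hypothetical.

```python
# Parse tree from the MT output as (rule_id, child, ...); words are leaves.
TREE = ("S,1",
        ("NP,2", ("N,51", "GAUDI")),
        ("VP,3",
         ("VB,2", ("AUX,172", "ERA")),
         ("NP,8",
          ("DET,03", "UN"),
          ("N,45", "ARTISTA"),
          ("ADJ,54", "GRANDE"))))

# The two lexical entries before refinement: identical feature values.
GRANDE = {"x0 form": "great", "y0 agr num": "sg", "y0 agr gen": "masc"}
GRAN = {"x0 form": "great", "y0 agr num": "sg", "y0 agr gen": "masc"}


def leaves(node):
    """Collect the words dominated by a node."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]


def blame(node, words):
    """Return the id of the lowest rule whose span covers all given words."""
    if isinstance(node, str):
        return None
    for child in node[1:]:
        found = blame(child, words)
        if found is not None:
            return found
    return node[0] if set(words) <= set(leaves(node)) else None


def feature_delta(a, b):
    """Features whose values differ between two lexical entries."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


def triggering_features(error_entry, corrected_entry, new_feature="feat1"):
    """Use existing distinguishing features, else postulate a new one."""
    return feature_delta(error_entry, corrected_entry) or {new_feature}


blamed_rule = blame(TREE, {"ARTISTA", "GRANDE"})
features = triggering_features(GRANDE, GRAN)
```

With the inputs above, `blamed_rule` is `"NP,8"` and `features` is `{"feat1"}`, because no existing feature separates the two entries.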
10 Refining Rules
Automatic Rule Adaptation
- Bifurcate NP,8 → NP,8 (R0) + NP,8 (R1)
- (flip order of ADJ-N)
- NP,8 (R1):
    NP::NP DET ADJ N -> DET ADJ N
    ( (X1::Y1) (X2::Y2) (X3::Y3)
    ((x0 def) = (x1 def))
    (x0 = x3)
    ((y1 agr) = (y3 agr)) ; det-noun agreement
    ((y2 agr) = (y3 agr)) ; adj-noun agreement
    (y2 = x3)
    ((y2 feat1) =c +))
11 Refining Lexical Entries
Automatic Rule Adaptation
    ADJ::ADJ great -> grande
    ((X1::Y1)
    ((x0 form) = great)
    ((y0 agr num) = sg)
    ((y0 agr gen) = masc)
    ((y0 feat1) = -))

    ADJ::ADJ great -> gran
    ((X1::Y1)
    ((x0 form) = great)
    ((y0 agr num) = sg)
    ((y0 agr gen) = masc)
    ((y0 feat1) = +))
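The interaction between the refined lexical entries and the bifurcated rule can be illustrated with a toy generator. This is a sketch under assumptions: `r0` stands for the unchanged general rule (adjective after the noun, no feat1 constraint), `r1` for the new rule that requires feat1 to be +, and the dictionaries and function names are hypothetical simplifications of the formalism.

```python
# Refined lexical entries: the new binary feature separates the two forms.
LEXICON = [
    {"src": "great", "tgt": "grande", "feat1": "-"},
    {"src": "great", "tgt": "gran", "feat1": "+"},
]


def r0(det, adj, noun):
    """General rule R0: post-nominal adjective, no feat1 constraint."""
    return f"{det} {noun} {adj['tgt']}"


def r1(det, adj, noun):
    """Specific rule R1: pre-nominal adjective, only if feat1 is '+'."""
    if adj.get("feat1") != "+":
        return None  # constraint fails, rule does not apply
    return f"{det} {adj['tgt']} {noun}"


def translate_np(det, adj_src, noun):
    """Apply both rules to every lexical entry matching the source adjective."""
    outputs = []
    for entry in LEXICON:
        if entry["src"] != adj_src:
            continue
        for rule in (r0, r1):
            out = rule(det, entry, noun)
            if out is not None:
                outputs.append(out)
    return outputs


candidates = translate_np("un", "great", "artista")
```

Here `candidates` contains "un artista grande", "un artista gran" and "un gran artista"; "un grande artista" is never produced because the entry for grande fails the feat1 constraint of `r1`.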
12 Evaluating Improvement
Automatic Rule Adaptation
- Given the initial and final translation lattices, the Rule Refinement module needs to take into account whether the following are present:
- Corrected translation sentence
- Original translation sentence (labelled as incorrect by the user)
Candidate translations shown: un artista gran; un gran artista; un grande artista; un artista grande
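The check described above amounts to set membership tests on the two lattices. The sketch below is a hypothetical illustration: lattices are simplified to sets of strings, the function name and the returned keys are assumptions, and the contents of the example lattices are made up for the demonstration since the slide does not say which candidates belong to which lattice.

```python
def evaluate_refinement(initial, final, corrected, original):
    """Report whether the corrected and original sentences are in each lattice."""
    return {
        "corrected_in_initial": corrected in initial,
        "corrected_in_final": corrected in final,
        "original_in_final": original in final,
        # The refinement helped if the correction became reachable.
        "improved": corrected in final and corrected not in initial,
    }


# Hypothetical lattices, before and after refinement.
initial_lattice = {"un artista grande"}
final_lattice = {"un artista grande", "un artista gran", "un gran artista"}

report = evaluate_refinement(
    initial_lattice,
    final_lattice,
    corrected="un gran artista",
    original="un artista grande",
)
```

In this example `report["improved"]` is true while `report["original_in_final"]` is also true, which shows why both conditions need to be tracked: the correction is now generated, but the sentence the user rejected has not yet been ruled out.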
14 Challenges and Future Work
- Credit and blame assignment from TCTool log files and the Xfer engine's trace
- Order of corrections matters: explore rule interactions
- Explore the space between batch mode and a fully interactive system
- Online TCTool: always running to collect corrections from bilingual speakers
- → make it into a game with rewards for the best users
15 Publications
- Font Llitjós, A., J.G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation". EAMT 10th Annual Conference, 30-31 May 2005, Budapest, Hungary.
- Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine translation systems for indigenous languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), 27-29 October 2005, Texas, USA.
- Font Llitjós, A., K. Probst and J.G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". AMTA, 2004, Washington, USA.
- Font Llitjós, A. and J.G. Carbonell. "The Translation Correction Tool: English-Spanish user studies". LREC, 2004, Lisbon, Portugal.
16 Quechua → Spanish MT
- V-Unit funded summer project in Cusco (Peru), June-August 2005 (preparations and data collection started earlier)
- Intensive Quechua course in Centro Bartolome de las Casas (CBC)
- Worked together with two native and one non-native Quechua speakers on developing infrastructure (correcting elicited translations, segmenting and translating a list of the most frequent words)
17 Quechua → Spanish prototype MT system
- Stem lexicon (semi-automatically generated): 753 lexical entries
- Suffix lexicon: 21 suffixes
- (150 Cusihuaman)
- Quechua morphology analyzer
- 25 translation rules
- Spanish morphology generation module
- User studies: 10 sentences, 3 users (2 native, 1 non-native)