Title: MT with Limited Resources: Approaches and Results
1. MT with Limited Resources: Approaches and Results
- Ralf Brown, Stephan Vogel, Alon Lavie, Lori Levin, Jaime Carbonell
- Students: Christian Monson, Erik Peterson, Kathrin Probst, Ashish Venugopal, Ying Zhang
- Carnegie Mellon University
2. RADD: CMU's "incubator" for MT technologies
- Multiple techniques
  - Statistical
  - Example-Based
  - Transfer-Rule
- Common pre-processing
  - Segmentation
  - Conversion of numbers to Arabic numerals
  - Translation of month names to English month names
- Multi-engine combinations
3. Statistical MT
- The major improvement for the June evaluation was phrase-to-phrase alignments.
- Performance (NIST score):
    ME-Compatible-SMT   5.7354
    Full-SMT            6.1361
4. Example-Based MT
- Given an indexed training corpus:
  - find phrases in the corpus which occur in the input to be translated,
  - retrieve the sentence pairs containing matches, and
  - perform a word-level alignment to determine translations (a sketch of this lookup follows).
- Our "standard" EBMT system is actually a multi-engine combination of phrasal EBMT, the LDC lexicon, and a statistical dictionary extracted from the training text.
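Below is a minimal sketch of the phrasal lookup just described; the corpus index, the word-level aligner, and all names here are illustrative stand-ins, not the CMU system's actual data structures:

  def ngrams(tokens, max_len=6):
      """All contiguous phrases of the input, up to max_len words."""
      for i in range(len(tokens)):
          for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
              yield tuple(tokens[i:j])

  def ebmt_candidates(input_tokens, corpus_index, word_align):
      """corpus_index: phrase -> list of (source_sent, target_sent) pairs.
      word_align(phrase, src, tgt): returns the target phrase that the
      matched source phrase aligns to, or None if no alignment is found."""
      candidates = {}
      for phrase in ngrams(input_tokens):
          for src_sent, tgt_sent in corpus_index.get(phrase, []):
              translation = word_align(phrase, src_sent, tgt_sent)
              if translation:
                  candidates.setdefault(phrase, []).append(translation)
      return candidates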
5. Example-Based MT (2)
- Inexact matching: a phrase can match even if one of the words inside the phrase differs, provided that the dictionary can provide a more-or-less unambiguous translation for the unmatched word.
- "More-or-less unambiguous" means that the translation with the second-highest frequency has a frequency less than THRESHOLD times the highest frequency.
- We experimentally determined the best threshold to be 0.55 (a sketch of this test follows).
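A minimal sketch of the "more-or-less unambiguous" test, assuming translation candidates come with corpus frequencies; only the 0.55 threshold is taken from the slide, and the function and its interface are hypothetical:

  THRESHOLD = 0.55   # experimentally determined value from the slide

  def is_unambiguous(translation_freqs, threshold=THRESHOLD):
      """translation_freqs: corpus frequencies of the candidate translations
      of the unmatched word, in any order."""
      if not translation_freqs:
          return False               # no translation available at all
      if len(translation_freqs) == 1:
          return True                # single candidate, trivially unambiguous
      freqs = sorted(translation_freqs, reverse=True)
      return freqs[1] < threshold * freqs[0]

  # Best candidate seen 20 times, runner-up 8 times: 8 < 0.55 * 20 = 11,
  # so the unmatched word counts as "more-or-less unambiguous".
  print(is_unambiguous([20, 8]))    # True
  print(is_unambiguous([20, 15]))   # False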
6. Transfer Rule MT
- Manually-developed transfer rules for translation with our newly developed Transfer Engine (71 hours development time),
- and a transfer lexicon automatically derived from the LDC 10k-word lexicon.
- A language model selects from among ambiguous translations.
- Performance (NIST score): XFER+LM 4.8404
7. Multi-Engine MT
- Hypothesis: by combining multiple translation methods, we can mitigate weaknesses and enhance strengths of individual methods.
- Each engine generates whatever partial translations it can and assigns an approximate quality score.
- The partial translations are then combined into a lattice, and a trigram model of the output language (plus other scoring heuristics) is used to select the best path through the lattice (sketched below).
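Below is a minimal sketch of the selection step under simplifying assumptions: each partial translation is an edge over source positions carrying an engine score, and a Viterbi search with a bigram stand-in for the trigram model (and no extra heuristics) picks the best path. Names and interfaces are illustrative, not the actual multi-engine code:

  import math

  def best_path(n_words, edges, lm_logprob, lm_weight=1.0):
      """edges: list of (start, end, target_words, engine_score) covering
      source positions [start, end); lm_logprob(prev, word) is a stand-in
      for the output language model."""
      best = {0: (0.0, [])}                      # position -> (score, output so far)
      for pos in range(n_words):                 # positions in source order
          if pos not in best:
              continue
          score, words = best[pos]
          for start, end, tgt, eng_score in edges:
              if start != pos:
                  continue
              prev = words[-1] if words else '<s>'
              lm = sum(lm_logprob(p, w)
                       for p, w in zip([prev] + list(tgt), tgt))
              cand = score + eng_score + lm_weight * lm
              if end not in best or cand > best[end][0]:
                  best[end] = (cand, words + list(tgt))
      return best.get(n_words, (-math.inf, []))[1]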
8. Multi-Engine MT Results
- Most combinations outperform the individual engines.
- We submitted two combinations and the engines they combined for official scores, with the results (NIST scores) shown below.

    Combination 1            Combination 2
    PhrEBMT   3.9668         PhrEBMT   3.9668
    SMT       5.7354         XFER      4.8404
    Combo     5.9524         Combo     5.2170

- Additionally, we see the effect of combining lexica with phrasal EBMT:

    PhrEBMT       3.9668
    PhrEBMT+lex   5.2883
9. CMU Small Data Results
- Official results were submitted with a segmentor trained on the full LDC word list (same as the large data track).
- We retrained our segmentor with only the 10K small dictionary and the words from the 100K Chinese treebank, and re-evaluated our latest systems.
- Results with the new (small) segmentation are reported in parentheses.
10. CMU Small Data Results: SMT
- Results with the full segmentor and with the re-trained small segmentor, for different versions of our SMT system.
11. Learning Transfer-Rules for Languages with Limited Resources
- Rationale:
  - Large bilingual corpora not available
  - Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using an elicitation tool
  - Elicitation corpus designed to be typologically comprehensive and compositional
  - Transfer-rule engine and new learning approach support acquisition of generalized transfer rules from the data
12. AVENUE Transfer
13. Sample Transfer Rule
- Rules contain the necessary information for analysis, transfer, and generation
- Unification equations are used to build source and target feature structures
- Example: transfer of Chinese questions, which are formed by appending the particle MA, into English

    {S,2}                          ; Rule ID
    S::S [NP VP MA] -> [AUX NP VP] ; Source and target production rules
    (
     (x1::y2)                      ; Source NP aligns with target NP
     (x2::y3)                      ; Source VP aligns with target VP
     ((x0 subj) = x1)              ; Build the source feature structure
     ((x0 subj case) = nom)
     ((x0 act) = quest)
     (x0 = x2)
     ((y1 form) = do)              ; Set inserted constituent AUX's base form to "do"
     ((y3 vform) =c inf)           ; Constrain verb to infinitive form
     ((y1 agr) = (y2 agr))         ; Enforce agreement between "do" and subject
    )
14. Transfer Overview
- The AVENUE translation engine was developed internally and follows a three-step transfer approach:
  - Analysis
  - Transfer
  - Generation
- The engine can be run with manually developed transfer rules as a stand-alone system, or operate as part of our larger rule-learning system.
15. RADD Transfer Development
- Total Chinese-specific rule and lexicon development time: 71 hours
- Small and Large Tracks used the same transfer rules but different-sized lexicons (10K vs. 50K)
- Rules were developed by a bilingual speaker with a linguistics background, based upon manual evaluation of training data and personal grammatical knowledge.
- Development concentrated on translating noun phrases and structures where Chinese and English word order differ.
16. Analysis
- Analysis uses a unification-based chart parser to find the input sentence's grammatical structure.
- All possible analyses and transfer paths are efficiently packed together in a packed forest for later use.
17. Transfer
- Transfer rules manipulate the parse tree(s) created during analysis.
- Constituents (such as noun and verb phrases) can be reordered, inserted, or deleted.
- Words are translated using a transfer lexicon.
- For sentences without a complete parse, transfer occurs on the longest sub-parses found during analysis. (A simplified sketch of rule application follows.)
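As a rough illustration only (not the actual Transfer Engine, with feature-structure unification omitted and a simplified alignment representation), applying one rule to a matched constituent sequence could look like this: constituents are reordered per the rule's alignments, unaligned target constituents are inserted, and words are looked up in a transfer lexicon:

  def apply_rule(rule, src_constituents, lexicon):
      """rule: dict with 'rhs' (target constituent labels) and 'alignments'
      (target position -> source position, 1-based, missing = inserted);
      src_constituents: (label, word) pairs matched by the rule's source side;
      lexicon: source word -> target word."""
      output = []
      for tgt_pos, label in enumerate(rule['rhs'], start=1):
          src_pos = rule['alignments'].get(tgt_pos)
          if src_pos is None:
              output.append((label, None))            # inserted constituent, e.g. AUX
          else:
              word = src_constituents[src_pos - 1][1]
              output.append((label, lexicon.get(word, word)))
      return output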
18. Generation
- During generation, the engine checks that the target-language tree produced by transfer satisfies target-language constraints (e.g. subject-verb agreement in English).
- Finally, the target sentence is read from the leaves of the target tree and returned.
19. Rule Learning: Overview
- Goal: acquisition of syntactic transfer rules
- 1) Flat Seed Generation: produce rules from word-aligned sentence pairs, abstracted only to the POS level; no syntactic structure
- 2) Compositionality: add compositional structure to seed rules by exploiting previously learned rules
- 3) Seeded Version Space Learning: group seed rules by constituent sequences and alignments; seed rules form the s-boundary of the version space; generalize with validation
20. Flat Seed Generation
- Create a seed rule that is specific to the sentence pair, but abstracted to the POS level. Use SL information (e.g. parses) and any TL information. E.g. (a sketch of this step follows the example):
  - The highly qualified applicant visited the company.
  - Der äußerst qualifizierte Bewerber besuchte die Firma.
  - Alignment: ((1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7))

    S::S [det adv adj n v det n] -> [det adv adj n v det n]
    (
     (alignments: (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x5::y5) (x6::y6) (x7::y7))
     (constraints:
      ((x1 def) = +) ((x4 agr) = 3-sing) ((x5 tense) = past) ...
      ((y1 def) = +) ((y3 case) = nom) ((y4 agr) = 3sg))
    )
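A minimal sketch of this step, producing a flat seed rule as a plain data structure from the POS sequences, word alignments, and whatever feature constraints are already known; names and representation are illustrative, not the AVENUE learner's:

  def flat_seed_rule(src_pos, tgt_pos, word_alignments, constraints):
      """src_pos/tgt_pos: POS-tag sequences; word_alignments: 1-based (i, j)
      pairs; constraints: feature constraints already known from SL/TL
      information, written as plain strings."""
      return {
          'lhs': list(src_pos),                       # source side, POS level only
          'rhs': list(tgt_pos),                       # target side, POS level only
          'alignments': [(f'x{i}', f'y{j}') for i, j in word_alignments],
          'constraints': list(constraints),
      }

  # Usage for the example above (constraints abbreviated):
  seed = flat_seed_rule(
      ['det', 'adv', 'adj', 'n', 'v', 'det', 'n'],
      ['det', 'adv', 'adj', 'n', 'v', 'det', 'n'],
      [(i, i) for i in range(1, 8)],
      ['((x4 agr) = 3-sing)', '((x5 tense) = past)', '((y4 agr) = 3sg)'])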
21. Compositionality
- If there is a previously learned rule that can account for part of the sentence, adjust the seed rule to reflect this compositional element.
- Adjust constituent sequences, alignments, and constraints; add context constraints (from possible translations), remove unnecessary ones.

    Seed rule:
    S::S [det adv adj n v det n] -> [det adv adj n v det n]
    (
     (alignments: (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x5::y5) (x6::y6) (x7::y7))
     (constraints:
      ((x1 def) = +) ((x4 agr) = 3-sing) ((x5 tense) = past) ...
      ((y1 def) = +) ((y4 agr) = 3sg))
    )

    Previously learned rule:
    NP::NP [det adv adj n] -> [det adv adj n]
    ((x1::y1) ... ((y4 agr) = (x4 agr)) ...)

    Adjusted seed rule:
    S::S [NP v det n] -> [NP v det n]
    (
     (alignments: (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x5::y5) (x6::y6) (x7::y7))
     (constraints:
      ((x5 tense) = past) ...
      ((y1 def) = +) ((y1 case) = nom) ((y1 agr) = 3sg))
    )
22. Seeded Version Space Learning
- (figure: version space over constituent sequences, e.g. [NP v det n], [NP VP])
- 1. Group seed rules into version spaces as above.
- 2. Make use of the partial order of rules in the version space. The partial order is defined via the f-structures satisfying the constraints.
- 3. Generalize in the space by repeated merging of rules (both merge operators are sketched after this slide):
  - Deletion of a constraint
  - Moving value constraints to agreement constraints, e.g.
      ((x1 num) = pl), ((x3 num) = pl)  ->  ((x1 num) = (x3 num))
- 4. Check the translation power of the generalized rules against sentence pairs.
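A minimal sketch of the two merge operators named in step 3, with constraints kept as (path, value) pairs; this illustrates the operators only, not the actual learner:

  def delete_constraint(constraints, path):
      """Operator 1: drop a value constraint entirely."""
      return [(p, v) for p, v in constraints if p != path]

  def to_agreement(constraints, path_a, path_b):
      """Operator 2: if two paths are constrained to the same value, replace
      both value constraints by a single agreement constraint path_a = path_b."""
      vals = dict(constraints)
      if path_a in vals and path_b in vals and vals[path_a] == vals[path_b]:
          rest = [(p, v) for p, v in constraints if p not in (path_a, path_b)]
          return rest + [(path_a, path_b)]            # agreement: (x1 num) = (x3 num)
      return constraints

  # ((x1 num) = pl), ((x3 num) = pl)  ->  ((x1 num) = (x3 num))
  cs = [(('x1', 'num'), 'pl'), (('x3', 'num'), 'pl')]
  print(to_agreement(cs, ('x1', 'num'), ('x3', 'num')))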
23. Future Work
- Baseline evaluation
- Adjust generalization step size
- Revisit generalization operators
- Introduce specialization operators to retract from overgeneralizations (including seed rules)
- Learn from an unstructured bilingual corpus
- Evaluate merges to pick the optimal one at any step, based on cross-validation and the number of sentences a rule can translate