Title: Recent Approaches to Machine Translation
1 Recent Approaches to Machine Translation
- Vincent Vandeghinste
- Centre for Computational Linguistics
2 Outline
- Traditional MT
- Rule-based MT
- Data-driven MT
- Statistical Machine Translation
- Example-based Machine Translation
- Recent Approaches to MT
- Without parallel corpora
- MATADOR (Habash, 2003)
- Context-based Machine Translation (Carbonell, 2006)
- METIS (Dologlou, 2003; Vandeghinste, 2006)
- With parallel corpora
- Data-oriented Translation (Hearne, 2005)
- PaCo-MT (Vandeghinste, 2007)
3 Outline
- Automated Metrics for MT Evaluation
- BLEU / NIST / METEOR
- TER
- Other Experimental Metrics
4 Traditional MT: Rule-based MT
- Rules are hand-made
- Linguistic knowledge can be incorporated
- Rules are human readable
- Transfer-based MT
- Cf. yesterday's presentation
- A new set of transfer rules is needed for each new language pair
- Shift as much of the labour as possible to monolingual components
- Interlingual MT
- Analyse the source sentence into an abstract interlingual representation
- Generate the target sentence from the interlingual representation
- New languages only need conversion to or from the interlingua
- The interlingua needs to take the set of target languages into account
5 Traditional MT: Data-driven MT
- Makes use of aligned parallel corpora
- Not always available for new language pairs
- Sentence alignment
- Sub-sentential alignment / Word alignment
- Statistical Machine Translation
- Language model gives the probability of a target-language sentence: P(e)
- Based on n-grams
- Based on probabilistic grammars
- Translation model gives the probability of translating a source segment into a target segment (based on alignment): P(f|e)
- Decoder looks for the maximisation using the language model and the translation model
- ê = argmax_e P(e|f) = argmax_e P(e)P(f|e)/P(f) = argmax_e P(e)P(f|e)
6 Traditional MT: Data-driven MT
- Example-based Machine Translation
- Inspired by Translation Memories
- Match the source sentence or fragment with the source side of the parallel corpus
- Retrieve the translations of the sentence or fragment
- Recombine the fragments into a target-language sentence: how?
- What about contradicting examples?
7 Recent Approaches to MT
- Without parallel corpora
- For many language pairs parallel corpora are
- Unavailable
- Not big enough
- Too domain dependent
- Creation of parallel corpora is expensive
- Hybrid taking ideas from
- RBMT
- SMT
- EBMT
- MATADOR, Context-based MT, METIS
8 Recent Approaches to MT
- Using parallel corpora
- parsing both source and target side: parallel treebanks
- Sub-sentential alignment
- sub-tree alignment
- word alignment
- Linguistically motivated data-driven MT
- Data-oriented Translation, Parse and Corpus-based Translation
9 MATADOR (Habash, 2003)
- Spanish to English
- Analysis
- Target language independent
- Full parse: deep syntactic dependencies
- Normalizing over syntactic phenomena such as
- Passivization
- Morphological expressions (tense, number, etc.)
- Maria puso la mantequilla en el pan
- (Mary put the butter on the bread)
- (puso subj Maria
- obj (mantequilla mod la)
- mod (en obj (pan mod el)))
10 MATADOR
- Translation
- Dictionary lookup
- Convert Spanish words into bags of English words
- Maintain Spanish dependency structure
- ((lay locate place put render set stand)
- subj Maria
- obj ((butter bilberry) mod the)
- mod ((on in into at) obj ((bread loaf)
- mod the)))
11 MATADOR
- Generation
- Lexical and structural manipulation of the input to produce English sentences
- Using symbolic resources
- Word-class lexicon: defines verbs and prepositions in terms of subcategorization frames
- Categorial variation database: relates words to their categorial variants (hunger_V, hunger_N, hungry_ADJ)
- Syntactic-thematic linking map: relates syntactic relations (subj/obj) and prepositions to their thematic roles (goal / source / benefactor / ...)
- (put subj Maria
- obj ((butter bilberry) mod the)
- mod (on obj ((bread loaf) mod the)))
- (lay subj Maria
- obj ((butter bilberry) mod the)
- mod (at obj ((bread loaf) mod the)))
- (butter subj Maria
- obj ((bread loaf) mod the))
12 MATADOR
- Generation
- Using statistical resources
- Surface n-gram model: like in SMT, n-grams of surface forms of words
- Structural n-gram model: relationships between words in the dependency representation, without using structure at the phrase level
- Linearization
- (OR (SEQ Maria (OR puts put) the (OR butter bilberry) (OR on into) (OR bread loaf))
-     (SEQ Maria (OR lays laid) the (OR butter bilberry) (OR at into) (OR bread loaf))
-     (SEQ Maria (OR butters buttered) the (OR bread loaf)))
13 MATADOR: Generation n-grams
- Maria buttered the bread -47.0841
- Maria butters the bread -47.2994
- Maria breaded the butter -48.7334
- Maria breads the butter -48.835
- Maria buttered the loaf -51.3784
- Maria butters the loaf -51.5937
- Maria put the butter on bread -54.128
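Scores like the ones above come from an n-gram language model evaluated in log space. A minimal sketch with an add-alpha-smoothed bigram model; all counts and the vocabulary size are invented for illustration:

```python
import math

# Hypothetical bigram/unigram counts; a real model is trained on a large corpus.
bigram = {("maria", "buttered"): 5, ("buttered", "the"): 50,
          ("the", "bread"): 40, ("the", "butter"): 20,
          ("maria", "breaded"): 1, ("breaded", "the"): 1}
unigram = {"maria": 100, "buttered": 60, "the": 1000,
           "bread": 80, "butter": 50, "breaded": 5}

def logprob(sentence, alpha=1.0, vocab=10000):
    """Add-alpha smoothed bigram log-probability of a sentence."""
    words = sentence.lower().split()
    score = 0.0
    for w1, w2 in zip(words, words[1:]):
        num = bigram.get((w1, w2), 0) + alpha
        den = unigram.get(w1, 0) + alpha * vocab
        score += math.log(num / den)
    return score

cands = ["Maria buttered the bread", "Maria breaded the butter"]
best = max(cands, key=logprob)
```

As on the slide, the candidate built from frequent, well-attested n-grams gets the higher (less negative) log-probability.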
14 Context-based Machine Translation (Carbonell, 2006)
- Spanish to English
- Analysis
- source sentence is segmented into overlapping n-grams
- 3 < n < 9
- sliding window
- size of n can be based on the number of non-function words
- Translation
- bilingual dictionary to generate candidate translations (1.8M inflected forms)
- full-form dictionary generated by inflection rules on a lemma dictionary
- cross-language inflection mapping table (including tense mapping)
- multi-word entries
- Generation
- search in a very large target corpus (50 GB to 1 TB via Web crawling)
- multi-layered inverted indexing to allow fast identification of n-grams from component words
- containing the maximal number of lexical translation candidates in context
- minimal number of spurious content words
- word order may differ
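The segmentation step above can be sketched by sliding a window of each allowed size across the sentence; the 4-to-8 bounds implement 3 < n < 9:

```python
def overlapping_ngrams(words, n_min=4, n_max=8):
    """All overlapping n-grams for 3 < n < 9, taken by sliding a window
    of each size one word at a time across the sentence."""
    return [tuple(words[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(words) - n + 1)]

# A 5-word sentence yields two 4-grams and one 5-gram.
grams = overlapping_ngrams("el gato come el pescado".split())
```

Carbonell's system additionally varies n based on non-function-word counts; this sketch uses fixed bounds only.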
15 Context-based MT
- Generation
- combining target n-gram translation candidates by finding maximal left and right overlaps with the translation candidates of the previous and following n-grams
- retained target n-grams are contextually anchored both left and right
- (near-)synonyms are generated
- unsupervised method for contextual clustering on a monolingual corpus
- scored by the overlap decoder
16 METIS approach
- METIS-I: Dologlou et al. (2003), feasibility
- METIS-II: Vandeghinste et al. (2006)
- Dutch, Greek, German, Spanish to English
- no parallel corpora
- no full parsers
- using techniques from RBMT, SMT and EBMT
- target language corpus
- Shallow source analysis
- PoS-tagging (statistical HMM tagger)
- lemmatization (lexicon + rule-based lemmatization)
- chunks: NPs, PPs, Verb Groups
- head marking
- clauses: relative phrase, subordinate clause
- no functions (subj, obj)
17 METIS approach
- de grote hond blaft naar de postbode
- (the big dog barks at the postman)
- S (NP (tok=de/lemma=de/tag=LID(bep))
-        (tok=grote/lemma=groot/tag=ADJ(basis))
-        (tok=hond/lemma=hond/tag=N(ev)))
-   (VG (tok=blaft/lemma=blaffen/tag=WW(pv,met-t)))
-   (PP (tok=naar/lemma=naar/tag=VZ)
-       (NP (tok=de/lemma=de/tag=LID(bep))
-           (tok=postbode/lemma=postbode/tag=N(ev))))
18 METIS approach
- Translation
- lemma-based dictionary
- structural changes can be modeled through the dictionary
- tag mapping between the source tag set and the target tag set
- S (NP (lemma=the/tag=AT0)
-        (lemma=big/tag=AJ0  lemma=large/tag=AJ0
-         lemma=grown-up/tag=AJ0  lemma=tall/tag=AJ0)
-        (lemma=dog/tag=NN1))
-   (VG (lemma=bark/tag=VVZ))
-   (PP (lemma=to/tag=PRP  lemma=at/tag=PRP  lemma=toward/tag=PRP)
-       (NP (lemma=the/tag=AT0)
-           (lemma=postman/tag=NN1  lemma=mailman/tag=NN1)))
19 METIS approach
- Generation
- bottom-up fuzzy matching of chunks with the target corpus
- determine word/chunk order
- determine lexical selection: which translation alternatives are likely to occur together
- when no perfect match is found, PoS-tags can be used as slots for words from the bag
- (the/AT0, big/AJ0, dog/NN1) > the hot/AJ0 dog
- (dog/NP, bark/VG, to/PP)
- (dog/NP, bark/VG, at/PP)
- preprocessing of the target corpus
- chunking, head detection, indexing
- weights for matching accuracy
- token generation: lemma + PoS tag > token
- bark + VVZ > barks
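The fuzzy chunk matching with PoS-tag slots can be sketched as below. A chunk is a list of (lemma, tag) pairs; the scoring weights (1.0 for an exact lemma+tag match, 0.5 for a tag-only "slot" match) are invented stand-ins for the METIS matching weights:

```python
# Sketch of METIS-style fuzzy chunk matching against a chunked target corpus.
def match_chunk(chunk, corpus_chunks):
    """Return the corpus chunk that best matches the translated chunk."""
    def score(cand):
        s = 0.0
        for lemma, tag in chunk:
            if (lemma, tag) in cand:
                s += 1.0                           # exact lemma+tag match
            elif any(t == tag for _, t in cand):
                s += 0.5                           # PoS tag used as a slot
        return s / len(chunk)
    return max(corpus_chunks, key=score)

corpus = [
    [("the", "AT0"), ("hot", "AJ0"), ("dog", "NN1")],
    [("a", "AT0"), ("big", "AJ0"), ("house", "NN1")],
]
best = match_chunk([("the", "AT0"), ("big", "AJ0"), ("dog", "NN1")], corpus)
```

This reproduces the slide's example: (the/AT0, big/AJ0, dog/NN1) matches "the hot/AJ0 dog", with hot/AJ0 filling the adjective slot.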
20 Data-oriented Translation (Hearne, 2005)
- Using a parallel treebank
- tree-to-tree alignment
- nodes are linked only when the substrings they dominate represent the same meaning and could serve as translation units outside the current sentence context
- provides explicit details about the occurrence and nature of translational divergences between the source and target language
- Representation
- many different linguistic formalisms can be used to annotate the parallel corpus
- Fragmentation
- extraction of pairs of linked generalized subtrees from the linked tree pairs contained in the example base
- Composition
- each pair of linked frontier nodes constitutes an open substitution site; fragments whose linked source and target root nodes are of the same syntactic category as the linked source and target substitution-site categories can be substituted at these frontiers
21 Data-oriented Translation
[Figure: aligned English-French tree pairs (VPv, NPpp nodes) showing a treebank example with "scanning", "images", "documents" / "numérisation", "de", "documents", its translation, and the extracted fragment pair for "scanning images".]
22 Parse and Corpus-based MT (Vandeghinste, 2007)
- Dutch <-> English
- Dutch <-> French
- project starting now
- Analysis: full parse, dependency tree
- Translation
- structured dictionary / parallel corpus / translation memory
- manual entries (like METIS)
- automatic entries based on sub-sententially aligned parallel data
- weighted entries
- structure mapping (no lemmas)
- manual entries (like METIS)
- automatic entries based on sub-sententially aligned parallel data
- weighted entries
23 Parse and Corpus-based MT
- Generation
- match translation candidates with a large target treebank
- surface form generation
- preprocessing of the target corpus: parsing, indexing
- Post-editing
- human translator improves sentences
- human-corrected sentences are inserted into the dictionary / parallel corpus / translation memory
- weights in the translation information are updated
24 Automated metrics for MT evaluation
- Types of metrics
- Metrics using reference translations
- Metrics looking only at generated output: Turing test
- Testing specific MT difficulties
- Usefulness of metrics is beyond any doubt
- Reliability of metrics is questionable
- how well do they correlate with human judgement?
- tuning towards automated metrics
- When a system improves on a whole set of different metrics, one can trust the improvement
- When metrics disagree, it is not clear whether there is progress
25 MT Metrics using References
- metrics compare MT output with reference translations
- reference translations are human translations
- BLEU (Papineni et al., 2002)
- counts the number of n-grams in an MT output sentence which are in common with one or more reference translations
- MT1: It is a guide to action which ensures that the military always obeys the commands of the party.
- MT2: It is to ensure the troops forever hearing the activity guidebook that party direct.
- R1: It is a guide to action that ensures that the military will forever heed party commands.
- R2: It is the guiding principle which guarantees the military forces always being under the command of the party.
- R3: It is the practical guide for the army always to heed the directions of the party.
26 MT Evaluation Metrics: BLEU
- A reference word or phrase should be considered exhausted after a matching candidate word or phrase is identified: modified n-gram precision
- Candidate translations longer than the references are penalized by the modified n-gram precision
- Candidate translations which are shorter than the references are not yet penalized: brevity penalty
- The brevity penalty is computed over the entire test corpus to allow some freedom at sentence level
- BLEU scores are between 0 and 1
- Correlating with human judgment
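A toy sentence-level version of BLEU's modified n-gram precision and brevity penalty; real BLEU computes the brevity penalty over the whole test corpus, and this sketch uses no smoothing:

```python
import math
from collections import Counter

def ngram_counts(words, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(candidate, references, max_n=4):
    """Toy sentence-level BLEU: clipped ('modified') n-gram precisions
    combined by a geometric mean, times the brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        max_ref = Counter()
        for ref in refs:                  # clip to max count in any reference,
            for g, c in ngram_counts(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0:
            return 0.0                    # no smoothing in this sketch
        log_prec += math.log(clipped / total) / max_n
    closest = min(refs, key=lambda r: abs(len(r) - len(cand)))
    bp = 1.0 if len(cand) >= len(closest) else math.exp(1 - len(closest) / len(cand))
    return bp * math.exp(log_prec)
```

The clipping step implements "exhausting" reference n-grams: a candidate cannot be rewarded for repeating a word more often than any reference contains it.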
27 MT Evaluation Metrics: NIST
- (Doddington, 2002)
- variant of BLEU
- arithmetic mean instead of geometric mean
- weights more heavily n-grams that are less frequent: informativeness
- harder to interpret (no maximum score)
- score depends, a.o., on average sentence length
- correlating with human judgment
28 MT Evaluation Metrics
- SMT systems are often tuned to maximize the BLEU score!
- It is unclear whether, when this is done, the BLEU score still reflects translation quality and still correlates with human judgment
- the number of references influences the score
- METEOR: a variant of BLEU, using unigram matches on words and stems (requires a TL stemmer)
29 MT Evaluation Metrics: TER
- Translation Edit Rate (Olive, 2005)
- measures the number of edits needed to change a translation hypothesis so that it exactly matches one of the references, normalized over the average length of the references
- TER = nr of edits / avg nr of reference words
- edits are
- insertion
- deletion
- substitution
- edits performed on
- single words
- word sequences
- all edits have equal cost
- Modelling the cost for a post-editor to adapt the
sentence to the reference
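A simplified TER can be sketched as word-level edit distance against the references; note that real TER also allows block shifts of word sequences, which are omitted here:

```python
def ter(hypothesis, references):
    """Simplified TER: minimum word-level edit distance (insertions,
    deletions, substitutions; all cost 1) to any reference, divided by
    the average reference length. Block shifts are omitted."""
    def edits(h, r):
        # Standard Levenshtein distance over word lists.
        d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
        for i in range(len(h) + 1):
            d[i][0] = i
        for j in range(len(r) + 1):
            d[0][j] = j
        for i in range(1, len(h) + 1):
            for j in range(1, len(r) + 1):
                cost = 0 if h[i - 1] == r[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(h)][len(r)]
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    avg_len = sum(len(r) for r in refs) / len(refs)
    return min(edits(hyp, r) for r in refs) / avg_len
```

Lower is better: a hypothesis identical to a reference scores 0, and one substitution in a four-word sentence scores 0.25.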
30 MT Evaluation Metrics: Turing test
- More experimental metrics
- check whether machine-generated sentences differ from human-generated sentences
- X-score: the distribution of elementary linguistic information in MT output should be similar to the distribution in human output (Rajman and Hartley, 2001); bad correlation with human judgment
- Machine Translationness: using the WWW as a corpus to look up sentences and sentence fragments; if the sentence is found, the translation is good
31 MT Evaluation Metrics
- Methods that test specific MT difficulties
- The test set is not a random sample, but contains a number of difficult translation cases
- For each difficult translation case, it is checked whether the MT system can handle this difficulty
32 MT Metrics: Conclusions
- BLEU, NIST, METEOR and TER are widely used
- scripts to calculate these metrics are downloadable
- good to have these metrics to measure internal progress between different versions
- dangerous to compare scores for different systems over different test sets
- SMT systems are often tuned to maximize scores; this does not imply better translations
- The ultimate evaluation should be human evaluation
- Economic evaluation: speed-up in post-editing