Title: Chapter 21: Machine Translation
1Chapter 21 Machine Translation
- Heshaam Faili
- hfaili_at_ece.ut.ac.ir
- University of Tehran
2What is MT?
- Machine Translation (MT) means translation using computers.
- Machine-aided human translation (MAHT)
- Human-aided machine translation (HAMT)
- Fully automated machine translation (FAMT)
- Fully human translation
3Some definitions
- Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. (EAMT)
- Machine Translation (MT) as it is generally known --- the attempt to automate all, or part of the process of translating from one human language to another. (Arnold, D. J., Machine Translation: An Introductory Guide)
- "... presumably means going by algorithm from machine-readable source text to useful target text, without recourse to human translation or editing." (ALPAC report, 1966)
4An Example: Translation between Chinese and English
5Different tasks with MT
- Tasks for which a rough translation is adequate
- Tasks where a human post-editor is used
- Tasks limited to small sublanguage domains in which fully automatic high quality translation (FAHQT) is still achievable
- Software localization tasks
6Machine Translation History
- 1946-1954: Optimistic attitude towards the new technologies in MT
- 1949: Informal memorandum
- Word-to-word translation, especially Russian-English
- 1954: The Georgetown University demonstration
- Vocabulary: 250 words; grammar: 6 rules; corpus: a few simple Russian sentences
7Machine Translation History
- 1954-1966: Criticism of MT
- 1966: ALPAC Report (Automatic Language Processing Advisory Committee)
- MT is slower, not very reliable, and twice as expensive as human translation
8Machine Translation History
- 1966-1975: Revision of the aims and goals of MT
- Definition of more realistic goals
- Limitation of the research to technical languages
- Syntactical analysis of the source text
- Development of different translation strategies
9Machine Translation History
- 1975-1989: Increasing interest in and promotion of MT
- Rapid increase in the demand for translations
- Improvements in hardware and software
- The use of artificial intelligence methods is now possible
10Machine Translation History
- 1990-2000
- Development of commercial products based on personal computers
- Specialized supplementary information (medicine, law, economics...)
- Translation of spoken language (VERBMOBIL)
11Machine Translation History
- 2000-Now
- Statistical Approaches and Hybrid Models
- Google Translation Engine ( http://translate.google.com )
- Yearly official MT evaluation campaign ( http://www.nist.gov )
- Automated MT Evaluation (NIST, BLEU)
12Machine Translation History
13What happened between ALPAC and Now?
- Need for MT and other NLP applications confirmed
- Change in expectations
- Computers have become faster, more powerful
- WWW
- Political state of the world
- Maturation of Linguistics
- Development of hybrid statistical/symbolic
approaches
14Language Similarities or Differences
- Universals: aspects that hold for every language
- E.g. every language has words referring to people, or every language has nouns and verbs
- Typology: the study of systematic cross-linguistic similarities and differences
- Morphological aspects
- Isolating vs. polysynthetic
- Agglutinative vs. fusional
- Syntactic aspects
- SVO, SOV or VSO
- Syntactic-morphological aspects
- Head-marking vs. dependent-marking
- Specific differences: date formats and standards, verb tense differences
- Lexical differences: different scenes
15Lexical Differences
English: leg, foot, paw
French: étape, patte, jambe, pied
16Different Machine Translation Systems
- Rule-based
- Statistical Approaches
- Hybrid Systems (using a statistical approach in a rule-based architecture, or ...)
17Three MT Approaches: Direct, Transfer, Interlingua
18Machine Translation Architectures
- Direct architecture
- Direct architecture was used for most MT systems of the first generation
- There are no intermediate stages in the process of translation
19Direct Architecture, 4 Steps
20Machine Translation Architectures
- Characteristics of direct MT systems
- No complex linguistic theories or parsing strategy
- Make use of syntactic, semantic and lexical similarities between the source and the target language
- Based on a single language pair
- Direct MT systems are robust; they even translate sentences with incomplete information
- Dictionaries are the most important components of direct MT systems
21Machine Translation Architectures
- Transfer architecture
- It consists of three separate stages
- analysis
- Transfer (Syntactical or Lexical)
- synthesis/generation
22Transfer Architecture
23Transfer Example: English -> Spanish
Mary did not slap the green witch
24Transfer: English -> Japanese
25Some Examples
26Persian Example
- I ate the apple → [Persian sentence garbled in the source]
- VP → V NP ⇒ VP → NP RA V
- I asked the man → [Persian sentence garbled in the source]
- VP → V NP ⇒ VP → AZ NP V
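The first reordering rule above (VP → V NP ⇒ VP → NP RA V) can be sketched as a small tree rewrite. This is only an illustration of syntactic transfer: the nested-tuple tree encoding and the names `transfer_vp` and `leaves` are assumptions made for this sketch, and lexical transfer (replacing the English words with Persian ones) is deliberately left out.

```python
def transfer_vp(tree):
    """Rewrite every VP -> V NP node as VP -> NP RA V (hypothetical encoding).

    A tree is a tuple (label, child, ...); a leaf child is a plain string.
    """
    label, *children = tree
    # Transfer the subtrees first, bottom-up.
    children = [transfer_vp(c) if isinstance(c, tuple) else c for c in children]
    is_v_np = (
        label == "VP"
        and len(children) == 2
        and all(isinstance(c, tuple) for c in children)
        and [c[0] for c in children] == ["V", "NP"]
    )
    if is_v_np:
        verb, obj = children
        # Object first, then the object marker RA, then the verb.
        return ("VP", obj, ("RA", "RA"), verb)
    return (label, *children)


def leaves(tree):
    """Collect the leaf strings of a tree from left to right."""
    _, *children = tree
    out = []
    for child in children:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out


english_tree = ("S", ("NP", "I"), ("VP", ("V", "ate"), ("NP", "the apple")))
persian_order = transfer_vp(english_tree)
print(leaves(persian_order))  # ['I', 'the apple', 'RA', 'ate']
```

The second rule (VP → AZ NP V) would need a lexical condition on the verb, which is why real transfer systems pair such rules with dictionary information.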
27Machine Translation Architectures
- Characteristics of transfer MT systems
- Consist of complete linguistic conceptions, not only single grammatical or syntactic rules
- The analysis and generation components can be reused for further language pairs, if the components are strictly separated
- The dictionaries of transfer MT systems are also separated
28Machine Translation Architectures
- Interlingua architecture
- The interlingua system consists of two stages
- The source text is analysed into an interlingual representation from which the text of the target language will be directly generated
- Semantic Analyzer
29Interlingua Architecture
30Machine Translation Architectures
- Interlingua architecture
- Advantage
- The interlingua representation can be used for any other language
- Disadvantage
- It is difficult to create language-independent representations
31Statistical Approaches
32Statistical Approach
- 3 stages
- Language model P(E)
- Translation model P(F|E)
- Decoder
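How the three parts fit together can be shown with a toy example: the decoder searches for the target sentence E that maximises P(E) · P(F|E). The sketch below assumes a fixed candidate list and made-up probability tables (`LANGUAGE_MODEL`, `TRANSLATION_MODEL`); a real decoder would search a very large hypothesis space and estimate both models from corpora.

```python
import math

# Hypothetical toy models; all probabilities are invented for illustration.
LANGUAGE_MODEL = {  # P(E): how fluent is the English candidate?
    "the green witch": 0.6,
    "the witch green": 0.1,
    "green the witch": 0.05,
}
TRANSLATION_MODEL = {  # P(F | E): how well does E account for the foreign F?
    ("la bruja verde", "the green witch"): 0.4,
    ("la bruja verde", "the witch green"): 0.5,
    ("la bruja verde", "green the witch"): 0.3,
}


def decode(foreign, candidates):
    """Return the candidate E with the highest log P(E) + log P(F | E)."""

    def score(english):
        p_e = LANGUAGE_MODEL.get(english, 0.0)
        p_f_given_e = TRANSLATION_MODEL.get((foreign, english), 0.0)
        if p_e == 0.0 or p_f_given_e == 0.0:
            return float("-inf")  # unseen events get no probability mass here
        return math.log(p_e) + math.log(p_f_given_e)

    return max(candidates, key=score)


best = decode("la bruja verde", list(LANGUAGE_MODEL))
print(best)  # the green witch
```

In this toy setting the translation model alone would prefer the word-for-word order "the witch green"; the language model is what pushes the decoder towards fluent English.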
33SYSTRAN
- Developed in the late 1950s by Peter Toma
- Initial system for Russian-English translations
- Later adapted for US Air Force and NASA
- Adaptation for other languages
- Important because it had a big influence on many
Japanese MT systems
34SYSTRAN
- Rule-based System
- Using finite state grammar (ATN)
- Using a large knowledge-base
- Works with 23 languages, especially EU languages
- Customers: AltaVista, Lycos, AOL, Compuserve, Terra, Google, Apple ...
35AppTek TranSphere
- Rule-based System
- Using LFG (Lexical Functional Grammar)
- Analyzes the semantic, morphological and syntactic structures in English and produces their equivalents in the target language
- Utilizes a general-purpose lexicon in addition to special domain micro-dictionaries
- Translates English to Arabic, Korean, Chinese, Turkish, Persian/Dari and Pashto-English
- Bidirectional translation: French, German, Italian, Portuguese, Russian, Spanish, Ukrainian, Hebrew and Dutch
36MÉTÉO
- Development of an English-French translation system by the TAUM Group to cope with the bilingual policy of the Canadian government
- 1975: Contract to develop a system to translate public weather forecasts
- 1984: Development of Météo 2
- This program proved to be more reliable, faster and more cost-effective
- 1989: Development of a French-English version
37Sakhr Enterprise Machine Translation
- Using transfer Architecture
- Analysis on all linguistic levels: morphological, lexical, syntactic and semantic
- Arabic - English
38CiyaTran MT
- English - Arabic-script languages: Arabic, Persian, Pashto
- Analyzing the semantic, morphological and syntactical structure of input text
- Utilizing Fuzzy Logic and Statistical Analysis
- Using a general-purpose lexicon, as well as 85 domain-specific databases with over 3,000,000 words and phrases
39ARIANE (GETA)
- 1960-1970: Development of the CETA system for three language pairs
- Name changed to ARIANE (GETA) when the system was changed into a transfer system
40EUROTRA
- Developed for the translation requirements within the European Community
- A system designed to replace the Systran system because of its several limitations
- 3 phases in the development of the program
- One of the biggest MT projects in terms of expenditure, organizations and people involved
41Google Translation
- Launched in 2004
- Beta version for English-Arabic and English-Chinese
- Fully statistical
- Commercial usage; no technical document found
- In 2005, became the best translator for these two language pairs ( http://www.nist.gov )
42Shiraz Project
- This project involved the creation of an extensible research prototype of a Persian to English machine translation system
- Persian to English
- Transfer Based Translation
- Syntactic Analysis
- Unification Based context free grammar
- Stopped
43Moses statistical MT
- Open source, written in C++
- Allows you to automatically train translation models for any language pair
- All you need is a collection of translated texts (parallel corpus)
- Beam-search decoding
- Phrase-based
44PSMT (Prolog Statistical Machine Translation)
- Used Prolog to Translate simple structures
- 3 sections
- Language Model Learner
- Dictionary Learner
- Search Program
45Phramer Statistical Machine Translation
- Phrase-based
- Open-Source with Java
- Using Bayesian model
46EGYPT
- Statistical MT
- French-English
- Academic
- Some workshops related to EGYPT established
47MT Challenges Ambiguity
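- Syntactic ambiguity: I saw the man with the telescope
- Parse 1 (PP attached to the VP): [S [NP I] [VP [VP [V saw] [NP the man]] [PP with the telescope]]]
- Parse 2 (PP attached to the NP): [S [NP I] [VP [V saw] [NP [NP the man] [PP with the telescope]]]]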
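How quickly attachment ambiguity grows can be checked by counting parses with a CKY-style chart. The grammar and lexicon below are a hypothetical toy fragment written for this sketch (binary rules only); they are not taken from any of the systems discussed here.

```python
from collections import defaultdict

# Hypothetical toy grammar in binary form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "hill": ["N"], "telescope": ["N"],
    "with": ["P"], "on": ["P"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees for `words` with a CKY chart."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, label) -> number of subtrees
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[(i, i + 1, label)] += 1
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, root)]


one_pp = count_parses("I saw the man with the telescope".split())
two_pps = count_parses("I saw the man on the hill with the telescope".split())
print(one_pp, two_pps)  # 2 5
```

Adding a second prepositional phrase raises the count from 2 to 5, which is why an MT system cannot simply enumerate all analyses for long sentences.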
48MT Challenges Ambiguity
- Syntactic ambiguity: I saw the man on the hill with the telescope
- Lexical ambiguity
- E: book
- Semantic ambiguity
- Homography: ball (E) = pelota, baile (S)
- Polysemy: kill (E) = matar, acabar (S)
- Semantic granularity: esperar (S) = wait, expect, hope (E); be (E) = ser, estar (S); fish (E) = pez, pescado (S)
49How do we evaluate MT?
- Human-based Metrics
- Semantic Invariance
- Pragmatic Invariance
- Lexical Invariance
- Structural Invariance
- Spatial Invariance
- Fluency
- Accuracy
- Do you get it?
- Automatic metrics: BLEU
50BiLingual Evaluation Understudy (BLEU; Papineni, 2001)
- Automatic technique, but ...
- Requires the pre-existence of human (reference) translations
- Produce a corpus of high-quality human translations
- Judge closeness numerically (word-error rate)
- Compare n-gram matches between the candidate translation and 1 or more reference translations
51Other Evaluation metrics
- BLEU, NIST, TER, Precision and Recall, METEOR
- BLEU ranks each MT output by a weighted average of the number of n-gram overlaps with the human translations
52BLEU metric
53BLEU: bad results on 1-gram BLEU, better on modified BLEU
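The difference between plain and modified n-gram precision can be seen on a degenerate candidate. The sketch below follows the usual clipping idea (each candidate n-gram is credited at most as often as it occurs in any single reference); the function names and the example sentences are chosen for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the maximum count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
# Every candidate word occurs in a reference, so plain unigram precision is 7/7.
# With clipping, "the" is credited only twice (its maximum count in a reference).
print(modified_precision(candidate, references, 1))  # 2/7, about 0.2857
```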
54BLEU metric
- Brevity penalty for a candidate that is too short
- Effective reference length
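A minimal sketch of the brevity penalty for a single sentence, using the standard BLEU definition: BP = 1 if the candidate length c is greater than the effective reference length r, and exp(1 - r/c) otherwise. Taking r as the reference length closest to c, with ties broken towards the shorter reference, is an assumption of this sketch; implementations differ on this detail, and the original metric sums these lengths over a whole test corpus.

```python
import math


def effective_reference_length(candidate_len, reference_lens):
    """Reference length closest to the candidate length.

    Assumption: ties are broken towards the shorter reference.
    """
    return min(reference_lens, key=lambda r: (abs(r - candidate_len), r))


def brevity_penalty(candidate_len, reference_lens):
    """BP = 1 if c > r, else exp(1 - r / c)."""
    if candidate_len == 0:
        return 0.0
    r = effective_reference_length(candidate_len, reference_lens)
    if candidate_len > r:
        return 1.0
    return math.exp(1 - r / candidate_len)


print(brevity_penalty(7, [6, 7]))  # 1.0, the candidate matches a reference length
print(brevity_penalty(3, [6, 7]))  # exp(-1), about 0.368, the candidate is too short
```

Without this factor, a very short candidate consisting only of safe n-grams could reach a high precision score.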