Title: Making machine translation work
1Making machine translation work
- By Stefan, Simon, Lisa, Nina and Dennis
2Making machine translation work
- Introduction
- Human versus Machine Translation
- Methods in Machine Translation
- Example-Based Machine Translation
3Making machine translation work
- Group work: HT vs. MT
- Try to translate the following proverb
- → Wer A sagt, muss auch B sagen.
- HT: use your language knowledge
- MT: use Babel Fish (http://babelfish.altavista.com/tr)
4Making machine translation work
HT: In for a penny, in for a pound.
MT: Who says A, also B must say.
To what extent is such a translation suitable/appropriate?
5Human and Machine Translation
- HT and MT differ in two main points
- 1. Mode of process
- 2. Mode of product
- based on different specifications and theoretical positions
- both modes are used for comparison
6Human and Machine Translation
- Mode of process
- By comparing the modes of process you
- gain knowledge about the respective stages and intersections
- can make decisions about choices of alternative methods
- and about new designs of translation methods
7Human and Machine Translation
- Mode of product
- By comparing the modes of product you
- check the appropriateness of the translation
- figure out the most efficient method
- → the MT product must be usable in the same way as the human product
- → secure a basis of equality
8Human and Machine Translation
- Another criterion for comparison
- text input must be a constant so that the products are comparable
- → helps to formulate guidelines for HT or MT texts
9Human and Machine Translation - translation processes
- Translation as problem solving
10Human and Machine Translation - translation processes
- Four major steps
- (a) SL linguistic de-composition
- (b) Problem identification at the SL linguistic and cognitive level
- (c) Problem solution at the cognitive and TL linguistic level (knowledge base)
- (d) TL linguistic re-composition
11Human and Machine Translation - translation processes
- Characteristics of HT
- Knowledge base is flexible
- Problems can be transferred
- Intuition/experience of the translator
- Knowledge base expands constantly
12Human and Machine Translation - translation processes
- MT model of problem solving
13Human and Machine Translation - translation processes
- Characteristics of MT
- Knowledge base is relatively limited and rigid
- Has fixed and pre-established connections
- Limited possibility of transferring problems
- Less experience at the semantic and pragmatic level
- Lack of essential world-knowledge
14Human and Machine Translation - translation processes
Major levels of comparison
Human modules | Machine modules
Comprehension | Analysis
Matching | Transfer
Writing | Generation/Synthesis
15Human and Machine Translation - translation processes
- Comprehension vs. Analysis
Human | Machine
adapts to innovations | works retrospectively
high amount of interpretative capacity | limited amount of interpretative capacity
inferencing |
16Human and Machine Translation - translation processes
- Matching vs. Transfer
Human | Machine
compensation of items which cannot be matched in conventional ways | equivalents cannot be pre-planned or incorporated
17Human and Machine Translation - translation processes
- Writing vs. Generation/Synthesis
Human | Machine
can respond to syntactic or lexical innovations or deviations | works prospectively
can create equivalences |
18Human and Machine Translation - translation products
- Products can be compared with regard to
- the nature of the output language
- the produced text
19Human and Machine Translation - translation products
- The nature of MT language
- MT language is constructed and artificial (the computer can't produce sentences on its own)
- it corresponds to the designer's perception of SL and TL
- has no creative potential (it is not as flexible and multifunctional as HT language)
- It excludes emotive, aesthetic or other meanings
- → each MT system produces its own language (e.g. Weidner English or Atlas English)
20Human and Machine Translation - translation products
- The nature of MT language
- MT systems are one-way converters (they only recognize words that belong to the system)
- MT language often needs post-editing
21Human and Machine Translation - translation products
- Flexibility vs. rigidity in text types
- MT language is conceived on the sentence level
- → no distinctions between text types possible
- → MT systems can only handle text types they have been programmed for
- → unknown text types cause unacceptable output
22Human and Machine Translation - translation products
23Human and Machine Translation - translation products
- Challenge for MT language
- construction of a user-friendly artificial language
- optimum transfer of information from SL/NL to AL
- to convince users that AL is as efficient as NL
24The Pragmatic Circumstances of Automation in Translation
- Methods of MT
- Linguistic approach
- Semantic approach
- Users of MT systems
- Some MT systems
- Functional types of MT
25Methods of MT: Linguistic approach
- three strategies
- Analysis of the source text
- Mode of transfer
- Generation of target text
26Linguistic approach: Three main subtypes
- a) Language-pair-specific direct systems
- Earliest type of system
- Reflects the design philosophy of the 1950s and 1960s
- Exploited direct correspondences between two languages
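A minimal sketch of what "direct correspondences" can mean in practice: each SL word is replaced by one TL word and the SL word order is kept. The word list and the function name are hypothetical and only cover the proverb from the group work; this is not how any particular historical system was implemented.

```python
# Hypothetical German-English word list; a real direct system would be far larger.
WORD_FOR_WORD = {
    "wer": "who", "a": "A", "sagt": "says", "muss": "must",
    "auch": "also", "b": "B", "sagen": "say",
}

def direct_translate(sentence):
    """Replace each word by its listed equivalent, keeping the source word order."""
    # Separate commas from words so they survive the lookup unchanged.
    tokens = sentence.replace(",", " ,").rstrip(".").split()
    out = [WORD_FOR_WORD.get(token.lower(), token) for token in tokens]
    return " ".join(out).replace(" ,", ",") + "."

print(direct_translate("Wer A sagt, muss auch B sagen."))
```

The output keeps the German word order, which illustrates why such output is hard to read and why the idiomatic equivalent cannot be reached this way.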
27Linguistic approach: Three main subtypes
- b) Interlingual systems
- SL text is transformed into a semantic and syntactic representation (equivalent of the transfer phase) which is common to at least two languages
- Text in another language can be generated from this representation
- "transform from a source language A into a target language B, using rules expressed in a third language C." (Cherry, 1966)
- Two phases: 1. Analysis in terms of the interlingual representation 2. TL sentences are produced from this representation.
28Linguistic approach: Three main subtypes
- c) Transfer systems
- Analysis phase: SL text is processed to the depth required by the rules of its grammar
- Transfer phase: oriented towards the target language; the analysis is transformed into a representation for the generation of a target language text
- Generation phase: the transfer representation is then transformed into a text in the TL without any further back-reference to the results of analysis.
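The three phases can be sketched as three functions chained together. This is a toy under strong assumptions: the lexicon, the fixed clause pattern (subject, auxiliary, article, noun, participle) and all function names are hypothetical, and a real transfer system would use a full grammar.

```python
# Toy bilingual lexicon (hypothetical, for illustration only).
LEXICON = {"ich": "I", "habe": "have", "den": "the",
           "apfel": "apple", "gegessen": "eaten"}

def analyse(sentence):
    """Analysis: map a German 'subject aux article noun participle' clause to a structure."""
    subj, aux, det, noun, participle = sentence.lower().rstrip(".").split()
    return {"subj": subj, "aux": aux, "obj": (det, noun), "verb": participle}

def transfer(sl):
    """Transfer: replace SL lexical items by TL items, keeping the structure."""
    return {
        "subj": LEXICON[sl["subj"]],
        "aux": LEXICON[sl["aux"]],
        "obj": tuple(LEXICON[word] for word in sl["obj"]),
        "verb": LEXICON[sl["verb"]],
    }

def generate(tl):
    """Generation: linearise in English order, using only the transfer representation."""
    words = [tl["subj"], tl["aux"], tl["verb"], *tl["obj"]]
    return " ".join(words) + "."

def translate(sentence):
    return generate(transfer(analyse(sentence)))

print(translate("Ich habe den Apfel gegessen."))
```

Note that `generate` never looks back at the German sentence; the reordering of participle and object happens purely on the transferred structure.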
29The semantic approach
- Semantic processes only operate after the identification of syntactic structures.
- Chief components are semantic parsing, i.e. analysis of semantic features instead of, or in addition to, grammatical categories.
- The system understands the SL text before translation begins.
30Users of MT systems
- The translator as producer
- Machine to provide cheaper, faster and a larger volume of production, without significant loss of quality
- Clearly seen as an industry product
31Users of MT systems
- The writer as translation producer
- Writers gain a certain degree of independence from translators, who previously determined the form and quality of the end product exclusively
- Writers may want to develop bi- or multilingual texts directly rather than write a text for subsequent translation
32Users of MT systems
- Readers of translation
- to be able to by-pass the time-consuming and
costly human translation circuit, and instead
obtain instant translations produced by an MT
system
33Users of MT systems
- The information supplier
- possibilities of providing translated versions
automatically as part of the general information
supply, e.g. multilingual versions of electronic
journals or databases
34Some MT systems
- ATLAS
- Japanese system, based on structural transfer, for specialised technical texts
- CULT
- Interactive system, for on-line translation of texts in the field of mathematics from Chinese into English
35Some MT systems
- METEO
- The Canadian Federal Government system for the production of bilingual French-English weather reports
- SYSTRAN
- Oldest commercially available MT system; used for un-edited output, for post-editing use, for restricted-language document input and for general use in the French Minitel system
- Largest number of language pairs, all EC languages
36Function types of machine translation
- Two possible modes of viewing automatic translation
- See the computer as an aid to human translation
- Accept that the computer provides a translation service sui generis which is not comparable to the human variety
37MT as human translation aid
- MT as an aid to translators
- Intended to accelerate the human process of translation
- Output is artificial to the extent that it does not conform to certain expectations
- End user still wants a human product, but will accept MT as long as it is either cheaper or produced more quickly
38MT as human translation aid
- Systems are greatly improved by concentrating on particular text types and ranges of vocabulary
- Systems offer subject-specific modules of vocabulary and phraseology that can be switched into the process
39Machine assisted human translation
40Machine assisted human translation
- Check text against an automated dictionary
- Ignores common words and function words
- Looks up translation equivalents for special vocabulary items
- Speeds up the process
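The dictionary check described here could look roughly like the following sketch. The stop-word list, the two term entries and the function name are assumptions made up for the illustration, not taken from any of the systems named in these slides.

```python
# Hypothetical stop-word list and term dictionary, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "and", "to"}
TERM_DICTIONARY = {"valve": "Ventil", "pressure": "Druck"}

def look_up_terms(text):
    """Return translation equivalents for special vocabulary found in the text."""
    found = {}
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if word in STOP_WORDS:
            continue  # common words and function words are ignored
        if word in TERM_DICTIONARY:
            found[word] = TERM_DICTIONARY[word]
    return found

print(look_up_terms("The pressure in the valve is high."))
```

The translator still writes the sentence; the tool only saves the look-up of the special terms.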
41Machine assisted human translation
42Machine assisted human translation
- Text is pre-translated automatically
- Output not adequate for direct use or post-editing
- Offers words and expressions
- Translators reduce the time for dictionary look-up
- Saves the time of actually typing the found translation equivalents
43Machine assisted human translation
44Machine assisted human translation
- MT produces artificial language (AL2)
- Post-editing effort must be less than that required for a full human translation
45Machine assisted human translation: Three-stage machine assistance
46Machine assisted human translation
- Text is prepared for MT by human pre-editing
- System produces output in AL2 which post-editors can convert into an NL2 document
- Final document is not distinguishable from a human translation
47Machine assisted translation
- These models of MT hide the true nature of MT
- Rather an aid than an alternative to human translation
- Application is limited
- simplest and the most difficult types of MT systems to design
- Examples: ALPS, ATLAS, WEIDNER, SYSTRAN
48Translation by reference to existing models
- System scans existing documents by a text-deconstruction method of text comparison
- Identifies similar passages and offers these to the translator as models for the new task
49MT as text-type specific independent systems
- automatic in the sense that human intervention is not required between input and output
- Is used
- Without the intervention of a human translator
- As a text-production system for previously edited
50MT as text-type specific independent systems
- Three forms of output
- Raw translation in AL2 suitable for post-editing and possible conversion to NL2
- A final AL2 version which can be used almost in the same way as natural language text, provided the input has been pre-edited
- Unedited final translation, i.e. an artificial language, which is acceptable for readers
51Reader-oriented MT
- Readers accept difficult-to-read texts if they
are cheap and above all fast
52Reader-oriented MT
- Output is machine-produced and therefore by definition an artificial product which may be easier or more difficult to understand than a NL text
- Not comparable to a human translation
- L2 readers receive a text in L1
- They submit the text to MT in full knowledge that the output is a machine-translated text
53Writer-oriented MT
- Writers know better than anybody else what they want to say
- Translators have to interpret what writers have said
- Machine asks questions about elements which it cannot analyse
54Writer-oriented MT
55Writer-oriented editing of pre-translated text
- System offers menus of existing SL text segments which are pre-translated
- E.g. business letters: choice of type of letter, separate menus within the types
56EBMT
57Definition
- "Man does not translate a simple sentence by doing deep linguistic analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, then by translating these phrases into other language phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference." (Nagao)
58Model EBMT
59- EBMT does not presuppose an analytic translation
- → it is an analogy-based translation system
- Founded on
- 1) translation by decomposing
- 2) translation of phrases
- 3) composing fragments into one long sentence
60- EBMT relies on a bilingual corpus
- 1) a fixed corpus of sentence pairs (e.g. containing "how much is")
- Example: How much is the bread? Wie teuer ist das Brot?
- How much is the car? Wie teuer ist das Auto?
- → the question varies by just one element (minimal pair: the bread/the car)
61- Often linked with translation memory (TM)
- "it must in fact be possible to produce a programme, which would enable the word processor to remember whether any part of a new text typed into it had already been translated."
- → compare T9 typing on mobile phones
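The idea of a word processor that "remembers" earlier translations can be sketched as a lookup table keyed by sentence. The class name, the normalisation and the sentence splitting are simplifying assumptions for this illustration; commercial TM tools work with far richer segmentation and fuzzy matching.

```python
import re

class TranslationMemory:
    """Very small translation memory: exact lookup of previously translated sentences."""

    def __init__(self):
        self._segments = {}

    @staticmethod
    def _normalise(segment):
        # Ignore case and repeated whitespace when comparing segments.
        return " ".join(segment.lower().split())

    def store(self, source, target):
        self._segments[self._normalise(source)] = target

    def pretranslate(self, text):
        """Return (sentence, stored translation or None) for each sentence of the text."""
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        return [(s, self._segments.get(self._normalise(s))) for s in sentences]

tm = TranslationMemory()
tm.store("How much is the bread?", "Wie teuer ist das Brot?")
result = tm.pretranslate("How much is the bread? How much is the car?")
print(result)
```

Sentences that come back with `None` are the ones a translator, or an EBMT component, still has to handle.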
62History
- Until the 80s → rule-based translations
- Since then, research dominated by corpus-based approaches
- 1) statistical machine translation
- 2) EBMT
- first suggested by Nagao Makoto in 1984
- soon attracted the attention of scientists in the
field of natural language processing.
63Matching
- First task in an EBMT system
- Searches for a word or phrase that closely matches the source language input
- → Example: Where is the plate
- → Correct translation: Wo ist der Teller
- → and not: Wo ist die Platte
- most appropriate word is inserted
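One simple way to realise the matching step is to score every stored example against the input and take the closest one. The sketch below uses word-level similarity from Python's standard `difflib`; the three example pairs and the function names are made up for the illustration, and real EBMT systems use a variety of other distance measures.

```python
import re
from difflib import SequenceMatcher

# Hypothetical example base of (source, target) sentence pairs.
EXAMPLES = [
    ("Where is the plate?", "Wo ist der Teller?"),
    ("Where is the station?", "Wo ist der Bahnhof?"),
    ("The steel plate is thick.", "Die Stahlplatte ist dick."),
]

def words(sentence):
    """Lower-cased word tokens without punctuation."""
    return re.findall(r"\w+", sentence.lower())

def best_match(query, examples=EXAMPLES):
    """Return the example pair whose source side is closest to the query, plus its score."""
    def score(pair):
        return SequenceMatcher(None, words(query), words(pair[0])).ratio()
    best = max(examples, key=score)
    return best, score(best)

pair, similarity = best_match("Where is the plate")
print(pair[1], similarity)
```

Because whole sentences are compared, "plate" is matched in its tableware context and the translation with "Teller" is retrieved.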
64Problems
- Long passages → low probability of complete match
- Short passages → high probability of ambiguity
- Sentences are not translated completely but are divided into smaller sections
- → often incoherent translation results
65- Problem of the size of the example database
- Some of the systems are more experimental than others
- Adding examples improves translation performance
- No further improvement once the number of examples becomes too large
66Problem: suitability of examples
- Some examples have identical translations
- The same phrase may have two different translations caused by inconsistency
- Too great a variety of examples may cause problems with the choice of the exact word
- Ambiguity
- Can lead to overgeneralization
67Problem: storage of examples
- Normally words are stored with no further information
- To avoid ambiguity and to limit the choice
- → Expansion of examples by adding contextual markers
- → context is taken into account in order to help find the right word
68Suitable translation problems
- EBMT best suited for sublanguage translation
- EBMT is often more suitable than rule-based MT
- Antidote to "structure-preserving translation as first choice"
69Adaptability
- Most difficult step in the EBMT process
- Appropriate fragments have to be extracted from the text
- Problem 1) words have to be aligned with their correspondences in the matched portions
- 2) find the correct recombination, which is appropriate and grammatical
70Boundary Friction: problem of inflection
- → Example I: I ate the apple
- Translation: Ich aß der Apfel (incorrect; the accusative "den Apfel" is required)
- Example II: The apple is on the table
- Translation: Der Apfel liegt auf dem Tisch.
- To solve the problem, the translation system has to contain a grammar of the target language
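Boundary friction can be reproduced with two stored fragments: simply concatenating them gives the wrong case, while a small piece of target-language grammar repairs it. The fragment table, the article table and both function names are hypothetical and only cover this one example.

```python
# Fragments as they might be extracted from examples; "der Apfel" comes from
# the nominative context "Der Apfel liegt auf dem Tisch." (illustrative data).
FRAGMENTS = {"I ate": "Ich aß", "the apple": "der Apfel"}

# Hypothetical mini-grammar: nominative -> accusative definite articles.
ACCUSATIVE_ARTICLE = {"der": "den", "die": "die", "das": "das"}

def naive_recombine(parts):
    """Glue translated fragments together without any grammatical adjustment."""
    return " ".join(FRAGMENTS[part] for part in parts)

def recombine_with_case(verb_part, object_part):
    """Put the object fragment into the accusative before attaching it to the verb."""
    article, noun = FRAGMENTS[object_part].split()
    return f"{FRAGMENTS[verb_part]} {ACCUSATIVE_ARTICLE[article]} {noun}"

print(naive_recombine(["I ate", "the apple"]))    # boundary friction
print(recombine_with_case("I ate", "the apple"))  # adjusted to the new context
```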
71- Examples should be similar in internal and external context
- Example retrieval can be scored on two counts
- → closeness of the match between the input text and the example
- → the adaptability of the example, on the basis of the relationship between the representations of the example and its translation
72Recombination
- New generation of the target text
- Last action in the translation process
- Often not possible to simply put translated phrases together
- → Example: "It's raining outside" / "Es regnet nach draußen" (incorrect)
- Recombination has to make sure that the phrases fit together
- → Example: "It's raining outside" / "Es regnet draußen"
73Computational Problems
- Huge costs in terms of
- → storage
- → creation
- → matching/retrieval algorithms
- Speed as a main issue
- → A computer translation has to be as fast as a speech translation
74Flavours of EBMT
- Used as a component in an MT system
- EBMT can be used
- → with other engines
- → for certain problems
- → when some other component cannot deliver a result
- EBMT as a bitter rival to the existing engines
75Example-based transfer
- Examples are stored as trees or other complex structures in example-based transfer systems.
- → "In these systems, source language input strings are analysed into structured representations in a conventional manner, only transfer is on the basis of examples rather than rules, and then generation of the target language output is again done in a traditional way." (H. Somers)
76Generalization
- Syntactic category
- Example
- → play baseball: yakyu o suru
- → play tennis: tenisu o suru
- → play the piano: piano o hiku
- → play the violin: baiorin o hiku
- Different vocabulary for "play" in Japanese; the engine has to distinguish whether an instrument or a sport is meant
- Play x (NP/sport) → x (NP) o suru
- Play x (NP/instrument) → x (NP) o hiku
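The two generalized templates can be written down directly. In this sketch the category lists contain only the four nouns from the examples, and the function name is made up; a real system would derive the categories from a thesaurus or from the corpus.

```python
# Category lists built from the four examples only (illustrative).
SPORTS = {"baseball": "yakyu", "tennis": "tenisu"}
INSTRUMENTS = {"piano": "piano", "violin": "baiorin"}

def translate_play(phrase):
    """Translate 'play X' with the template that fits the category of X."""
    tokens = [word for word in phrase.lower().split() if word != "the"]
    if len(tokens) != 2 or tokens[0] != "play":
        return None
    noun = tokens[1]
    if noun in SPORTS:           # Play x (NP/sport) -> x o suru
        return f"{SPORTS[noun]} o suru"
    if noun in INSTRUMENTS:      # Play x (NP/instrument) -> x o hiku
        return f"{INSTRUMENTS[noun]} o hiku"
    return None                  # unknown category: no template applies

print(translate_play("play tennis"))
print(translate_play("play the violin"))
```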
78- Semantic category
- A word must be chosen first
- Word is generated
- Word-level rule is made up
- The quality of the translation rules depends on the quality of the thesaurus
- Works best with non-idiomatic texts
79- Automatic category
- A simpler approach
- Less initial analysis of the corpora
- ?I am coming geliyorum
- ?I am going gidiyorum
- ?I am comeing gelHyoryHm
- ?I am going gidHyoryHm
- → "I am" stays fixed, while "come" and "go" differ
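A very naive way to find the fixed and the varying parts automatically is to compare two examples character by character and strip off what they share at the beginning and at the end. The function names are made up, and this string comparison is only a stand-in for the morphological analysis a real system would use.

```python
import os

def common_suffix(a, b):
    """Longest string that both a and b end with."""
    return os.path.commonprefix([a[::-1], b[::-1]])[::-1]

def split_fixed_variable(a, b):
    """Split two strings into (shared prefix, (varying part of a, of b), shared suffix)."""
    prefix = os.path.commonprefix([a, b])
    rest_a, rest_b = a[len(prefix):], b[len(prefix):]
    suffix = common_suffix(rest_a, rest_b)
    var_a = rest_a[:len(rest_a) - len(suffix)]
    var_b = rest_b[:len(rest_b) - len(suffix)]
    return prefix, (var_a, var_b), suffix

source = split_fixed_variable("I am coming", "I am going")
target = split_fixed_variable("geliyorum", "gidiyorum")
print(source)
print(target)
```

On the English side this isolates "I am " and "ing" as fixed. The varying parts it finds ("com"/"go", "el"/"id") are cut at character level and are not true stems, which shows why some linguistic knowledge is still needed.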
80Multi-engine system
- EBMT plus two other techniques: knowledge-based MT and a lexical transfer engine
- Multi-engine system combines EBMT with rule-based and corpus-based approaches
- User can
- → modify the results
- → intervene in the choice of translation
- → edit the output
81Conclusion
- What counts as EBMT?
- Use of a bilingual corpus
- Use of a reference corpus
- What is the aim of EBMT?
- → to generalize the examples as much as possible
- What is the problem of EBMT?
- → Some translations are suitable, some are not
82- Advantages of EBMT
- → Examples are real language data, so overgeneration is reduced
- → Linguistic knowledge can be more easily enriched by adding more examples
- → can be quickly developed
- → not as a rival but as an alternative
83References
- Somers, H. (2003). An overview of EBMT. In Michael Carl and Andy Way (eds) Recent advances in Example-Based Machine Translation, Dordrecht: Kluwer, 3-57.
- Sager, J. (1994). Language engineering and translation: consequences of automation. Amsterdam. 267-292