Title: Machine Translation MT
1Machine Translation MT Computer-Assisted
Translation CAT
2Machine Translation
- Introduction to MT systems
- Generations of MT systems
- Different types of MT systems
- Construction of MT systems
- Knowledge representation
- Knowledge processing
- MT engines
- New directions of MT systems
- Evaluation of MT CAT systems
3Machine Translation Introduction
- Machine translation ( MT) is a long-term
scientific dream of enormous social, political
and commercial importance. - It was one of the earliest applications suggested
for digital computers, but turning this dream
into reality has turned out to be a much harder. - Despite different problems and difficulties, some
degree of Machine translation is now a daily
reality and it is likely that in the future, the
bulk of routine technical and business
translation will be done with some kind of
machine translation tools.
4Machine Translation History
- The history of MT research has gone through a
number of phases in which certain frameworks have
dominated. - First generation From the late 1960s the
syntactic orientation was dominant with syntactic
transfer approaches. - Second generation In the 1980s the AI
orientation was popular and more attention was
paid to semantics. - Third generation from 1990s the corpus-based
model with example-based methodologies is the
focus of much translation activity. (e.g. old
versions of Electronic Dictionaries) - Forth generation from 2000s research on spoken
translation has developed into a major focus of
MT activity. (e.g. latest versions of Electronic
Dictionaries ) - Last ten years research on Computer-Assisted
Translation CAT has developed into a major focus
of translation activity.
5How To Construct an MT system
6Knowledge Representation
- Different kinds of knowledge are generally needed
for Machine translation and must be represented
in such a way it can be processed automatically
by MTs - Knowledge of the source language
- Knowledge of the target language
- Knowledge of the various correspondences between
source language and target language (at least
knowledge of how individual words can be
translated) - Knowledge of the culture, social conventions,
etc. - Etc.
- Several kinds of linguistic knowledge are usually
distinguished - Phonological knowledge
- Morphological knowledge
- Syntactic knowledge
- Semantic knowledge
7Knowledge Representation Dictionary
- The central and largest component of an MT system
is Dictionary. - The size and quality of the dictionary limits the
scope and coverage of a system and the quality of
translation that can be expected. - Electronic dictionaries of MT must at least
represent the information we can find in paper
dictionaries in an appropriate fashion.
8Knowledge Representation Paper
Dictionaries
9Knowledge Representation Electronic Dictionaries
- Entries in MT monolingual-dictionary will be
equivalent to collection of attributes and
values, like the following - Lex button, catn, ntypecommon, numbersing,
humanno, concreteyes. - Lexbutton, catv, vtypemain, finite, person,
number,. - Entries can be implemented as records in a
database. - Entries in MT bilingual-dictionary are generally
represented by translation rules, like the
following - Button ?? ?? / ???? / ???/ ??? ... ???
- This allows the replacement of certain source
language oriented information with corresponding
target language information.
10Knowledge Representation Morphology
- Morphology is concerned with the internal
structure of words and how words can be formed. - MT CAT systems must add a morphological
components that can recognize different word
formation processes - Inflection a word is derived from another word
form by maintaining the same part of speech or
category walk ? walks - Describe regular inflections by general rules,
like - Lex walks, catv, finite, person3rd,numbersing
, tensepres) ?? Vs - Describe irregular inflections by explicit rules,
like - Lexbe, catv, finite, person3rd,numbersing,
tensepres) ?? is -
- Derivation a word of a different category is
derived from another word or word stem by
application of a process involving stems and
affixes grammar, ?grammatical, arrive ?arrival - regular derivational processes can be described
by rules - Irregular derivations can be solved simply by
listing all derived words - Compounding a new word or unit is formed by
combination of two or more words
11Knowledge Representation Syntax and Grammars
- Syntax is concerned with how sentences can be
made up out of words. - To describe syntax, a grammar (set of rules) is
generally used in MT CAT. - For the first kind of information, programmers
and developers with consultations of linguists
have to represent the concerned divisions of the
sentence into their constituent parts and the
categorization of these parts as nominal, verbal,
and so on. - Consider that in English a sentence consists of
noun phrase followed by an auxiliary verb
followed by a verb phrase. Noun phrase consists
of etc. We can represent these knowledge by
the following grammar - S ? NP (AUX) VP
- NP ? (DET) (ADJ) N PP
- VP ? V (NP) PP
- PP ? P NP
- N ? user printer
- V ? clean
- AUX ? should
- DET ? the a
- P ? with
- a user should clean the printer is a sentence
in the above grammar
12Knowledge Representation Meaning
- Knowledge about the meaning of sentences are an
important part of the translation process and
allow MT CAT systems to produce better results. - Three useful kinds of knowledge relating to the
meaning can be distinguished - Semantic knowledge meaning of words and
sentences independently of the context they
appear in. - Pragmatic knowledge meaning of expressions in
situations - Real world or common sense knowledge
- It is useful to represent these kind of knowledge
in MT CAT systems in order to increase their
performance. Accomplishing this goal proved to be
the most difficult task in the developing the MT
CAT systems.
13Knowledge Processing
- We give now an idea of how knowledge can be
manipulated automatically by MT systems - This can be done in two stages parsing and
generation - Parsing is the process of taking an input string
of expressions and producing representations
appropriate to the translation - Generation is the process of taking an
appropriate representation and producing the
corresponding sentence - A graphical representation will be used for
parsing and generation processes. However, the
internal representations are lists (very useful
data structures).
14Knowledge Processing Parsing
- The task of a parser is to take a formal grammar
and a sentence and - Check if it is indeed grammatical
- Show how the words are combined into phrases
- Different parsing methods exist and are
subdivided into two categories Top-Down parsing
method and Bottom-Up parsing method. - Examples of parsing using grammars defined in the
previous sections and sentence the user should
clean the printer are given bellow.
15Parsing Bottom-Up algorithm
16Parsing Top-Down algorithm
17Latest Engines in MT
- Speech Recognition MT trying to apply to MT
techniques which have been highly successful in
Automatic Speech Recognition. - Computer-assisted Translation the idea is to
collect a bilingual corpus of translation pairs
and then use a best match algorithm to find the
closest example to the source phrase in question.
Ex Trados, Worsfast etc.
18What is a CAT Tool?
- CAT stands for "Computer Aided Translation Tool".
The terms "Translation Memory" and "TM" are
sometimes used to refer to the same type of tool.
A CAT tool is a computer program that helps a
translator to work efficiently. This is
achieved through three main functions - A CAT tool breaks texts into segments (sentences
or sentence fragments) and presents the segments
in a convenient way, to make translating easier
and faster. In some tools, for example Tardos ,
each segment is presented in a special box, and
the translation can be entered in another box
right below the source text.
19- The translation of each segment is saved together
with the source text. Source text and translation
will always be treated and presented as a
translation units (TU). You can return to a
segment at any time to check the translation.
There are special functions which help to
navigate through the text and to find segments
which need to be translated or revised (quality
control). - The main function of a CAT tool is to save the
translation units in a database, called
translation memory , so that they can be re-used
for any other text, or even in the same text.
Through special "search" features. The search
functions of CAT tools can also find segments
which do not match 100. This saves time and
effort and helps the translator to use consistent
terminology.
20Evaluation of MT CAT Systems
- The evaluation of MT CAT systems is a complex
task. This is not only because many different
factors are involved, but because measuring
translation performance is itself difficult. - Clarity a traditionally way of assessing the
quality of translation is to assign scores to
output sentences. - Accuracy It is important to check whether the
meaning of the source is preserved in the
translation. - Error Analysis tries to establish how seriously
errors affect the translation output. - Test Suite running the system on a large corpus
of test texts will reveal different possible
problems.
21How to start using Trados?
- Steps to follow for creating, opening and
exporting a translation memory, and further basic
features of the software. -
- You have to take into account that these steps
correspond to SDL Trados 2006, so some menus can
be different in other Trados versions.
22To create a translation memory
- 1. Go to Windows / Start /All programas/ SDL
Internacional SDL Trados 2006 / Translators
Workbench. The software will start running and
will request the user name. - 2. Go to File / New.
- 3. A window will show where you have to choose
the source and target language by clicking on
Add. Then, click on Create. - 4. A window will display where you have to enter
the name for the TM and browse where to save it. - NoteNext time you open Translators Workbench,
the last memory used will be opened by default.
23(No Transcript)
24(No Transcript)
25To open a translation memory
- There are two ways of opening a translation
memory - 1. You can double click on the icon of the TM you
wish to open, - or open Trados Translators Workbench.
- 2. Go to File / Open
- 3. A window will be displayed, were you have to
look for the TM you want to open, and once found,
click on it.
26- The Trados TM will provide an existing equivalent
sentence in the TL if it matches 100. - The Trados TM will provide suggested words or
phrases in different colors if the equivalent
sentence does not match 100. - Easily select from the possible suggestions
offered by the MT. - Confirm these suggestions offered by MT or
simply type your own words or phrases. - click Ctrl Enter to confirm and move on to
another sentence. - Once finished translating the whole text, click
File . Save as .. rename the file . Saving is
accomplished .
27Some pieces of advice
- Dont press Enter when you are inside a
translation unit since you can break it. - Using the commands from the keyboard speeds up
the job. - If you have any problems go to Help/Help Topics
in Translators Workbench.
28Creating a Multiterm Termbase to Use in SDL
Trados Studio
- TWO IMPORTANT NOTES BEFORE YOU GET STARTED 1.
Multiterm is a separate program, it's not part of
Trados or Studio. It needs to be downloaded and
installed separately, and it appears as a
standalone program in your SDL folder in your All
Programs list in Windows. If you don't see it
there, make sure to go to your SDL account and
download and install the program from the My
Downloads page.2. Termbases cannot be created
in Trados or Studio. The "Create New Termbase"
you see in the SDL Trados main page or the
"Terminology Management" button in the Studio
home page are merely links that will take you to
Multiterm, if it's installed in your computer.
29Creating a simple Multiterm termbase
- Multiterm can be as simple or as complex as you
want it to be. In this example, the simplest kind
of termbase will created - source term
target term. - No other index fields will be included.1. Open
SDL Multiterm Desktop, Go to File, then select
Create Termbase then Save your termbase in the
dialog box that opens
30- Click Next on Step 1 to 5 of the Termbase Wizard
.choose your languages..Click Finish, - In this case the termbase has been created but
it's empty.
31- To manually add terms, click on the Terms tab on
the bottom left of your screen, click F3 or click
on the Add New Entry icon right under the Edit
menu. You will see the Entry screen, as shown
below.
32- Double click on the little box next to the pencil
icon and enter the term for each entry.
33- Press F12 to save the changes.
- The term is now part of your termbase and
therefore will be available when you use the
termbase in Studio.This concludes the basics of
termbase creation.