Title: Translating Subtitles using Machine Translation
1 Translating Subtitles using Machine Translation
- Practices, Problems, Methodology
- Elsa Sklavounou, Ph. D.
- Linguist, Co-funded Projects Technical
Coordinator - SYSTRAN
2 SYSTRAN MT Customization Methodology: Overview
- A customization project involves three
customization levels that provide incrementally
higher translation quality:
- Basic Terminology
- Complex Terminology
- Linguistic Rules
3 SYSTRAN MT Customization Methodology: Overview
- Basic Terminology
- The first step entails the creation of a User
Dictionary that covers most of the noun
terminology in the corpus, and various simple
adjective and verb terms.
- Complex Terminology
- The second level concerns the coding of complex
terminological entries, such as complex verbs
with their complements (subject, object) and
their translations.
- Linguistic Rules
- The third level involves language-specific code
modifications in the SYSTRAN linguistic modules.
4 SYSTRAN MT Customization Methodology: Level 1,
Level 2
- Customization levels 1 and 2 focus on
implementing specialized terminology from the
corpus in the systems. Level 1 and 2 tasks
include:
- Simple and complex term extraction
- Simple and complex term translation
- Simple and complex term coding
- Simple and complex term review
5 SYSTRAN MT Customization Methodology: Level 1,
Level 2
- Step 1: Corpus installation and analysis
- Prerequisite 1: a formatted corpus
- Step 2: Term extraction
- Simple terms (nouns and noun expressions)
- Complex terms (verb patterns)
- DNT (Do Not Translate) integration
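The term extraction step can be pictured with a minimal sketch. This is not the SYSTRAN extraction tool: the function name `extract_candidates`, the frequency threshold, and the toy corpus are illustrative assumptions. The sketch counts single words and two-word sequences, then separates Do Not Translate (DNT) items from candidate terms that would go on to translation and coding.

```python
import re
from collections import Counter

def extract_candidates(sentences, dnt, min_count=2):
    """Count unigrams and bigrams; split frequent ones into candidates and DNT items."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z]+", sentence)
        counts.update(tokens)
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    frequent = {term for term, n in counts.items() if n >= min_count}
    dnt_found = sorted(term for term in frequent if term in dnt)
    candidates = sorted(term for term in frequent if term not in dnt)
    return candidates, dnt_found

# Toy corpus (hypothetical); "MMR" is treated as a Do Not Translate item.
corpus = [
    "the triple vaccine protects against measles",
    "MMR is the triple vaccine",
    "parents refuse MMR",
]
candidates, dnt_found = extract_candidates(corpus, dnt={"MMR"})
print(candidates)
print(dnt_found)
```

A real extraction would also filter function words and use part-of-speech information to find noun expressions and verb patterns; the frequency count here only stands in for that analysis.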
6 SYSTRAN MT Customization Methodology: Level 3
- Customization level 3 focuses on the
implementation of linguistic rules uniquely
adapted to language-specific syntactic and
semantic issues found in translations taken from
the corpus. Level 3 tasks include:
- Detailed linguistic evaluations and the
development of a comprehensive customization
plan
- Implementation of customized rules
- Regression tests
- Correction of linguistic translation errors
- Acceptance testing before release
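The regression test task can be sketched as a comparison between previously validated translations and the output produced after a rule change. The function name `find_regressions` and the dictionary-based data layout are assumptions made for illustration; the sentence fragments are taken from the "shot" example elsewhere in this deck.

```python
def find_regressions(validated, new_output):
    """Return (source, expected, actual) for validated sentences whose output changed."""
    return [
        (src, expected, new_output.get(src))
        for src, expected in validated.items()
        if new_output.get(src) != expected
    ]

# Validated output before the rule change, and hypothetical output after it.
validated = {"60 pounds a shot": "60 livres une injection"}
after_rule_change = {"60 pounds a shot": "60 livres un coup de feu"}

regressions = find_regressions(validated, after_rule_change)
for src, expected, actual in regressions:
    print(f"REGRESSION: {src!r}: expected {expected!r}, got {actual!r}")
```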
7 SYSTRAN MT Customization Methodology: Quality
Levels
- Estimate of the quality levels that may be
achieved for each customization level.
8 SYSTRAN MT Customization Methodology: Software
Tools
- The process for coding simple and complex terms
and related dictionary maintenance is managed by
the SYSTRAN Linguistics Platform, which integrates
the following two tools, both required to complete
customization levels 1 and 2.
9 SYSTRAN MT Customization Methodology: Software
Tools
- SYSTRAN Dictionary Manager
- The SYSTRAN Dictionary Manager (SDM) enables
translators to build and manage multilingual
dictionaries. SDM includes preparation steps for
dictionary coding tasks, an online dictionary
lookup (via an HTML interface), and a compiler
for runtime machine translation dictionaries. It
is composed of three main components: a database,
an HTML query form (dictionary lookup, reports,
logs, import and export), and a Windows client
(interactive coding tool).
10 SYSTRAN Customization Methodology: Software
Tools
- The SYSTRAN Review Manager (SRM) is a
productivity tool used for:
- the review,
- quality assessment, and
- maintenance of linguistic resources used in
combination with a SYSTRAN system.
11 SYSTRAN Customization Methodology, Prerequisite
1: a formatted grammatical corpus
- Grammar Writing Rules
- Using Articles
- Avoiding Speech Ambiguity
- Using Enumeration
- Ensuring Subject-Verb Agreement
- Using Prepositions
- Using Infinitives at the Beginning of Sentences
- Using Imperatives
- Observing Punctuation Rules
- Using Main Clauses
- Using Subordinate Clauses
- Using Relative Clauses
- Avoiding Multiple Stacking
- Using Compound Words
- Using Capitalization
- Using Spelling Variations
- Lexical Ambiguities
- Disambiguation of Product Names and Menus
- Avoiding Lexical Ambiguities
- Using Compounds
- Format and Typographical Issues
- Segmentation
12 SYSTRAN Customization Methodology for MUSA
- Corpus generated fully automatically by two
processes:
- Speech Recognition (KU Leuven)
- Automatic Sentence Compression (CNTS)
- First priority
- Subtitle constraints
- Second priority
- The least ambiguous content possible
- Lesson learned: no prerequisite
13 SYSTRAN MT Customization Methodology: Upgraded
Software Tools (Client Tools v5)
14 SYSTRAN Translation Project Manager: Terminology
Review, Not Found Words Extraction
- Reviewing Terminology and Sentences
- The Terminology Review tab in the Review window
lets you identify expressions such as Not Found
Words or Terminology extracted by the software.
15 SYSTRAN Translation Project Manager: Terminology
Review, Not Found Words Extraction, Examples
- SRC_Id
- these parents know measles can be dangerous, but
they don't want their child to have MMR, the
triple vaccine which protects them from measles,
mumps and rubella.
- Raw MT
- ces parents savent la rougeole peut être
dangereuse, mais ils ne veulent pas que leur
enfant a MMR, le vaccin triple qui les protège
contre la rougeole, les oreillons et la rubéole.
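The idea behind Not Found Words extraction can be illustrated with a short sketch: any source token with no dictionary entry is reported for review. The function name `not_found_words` and the toy dictionary are assumptions for illustration and do not reflect how the Translation Project Manager is implemented.

```python
import re

def not_found_words(sentence, dictionary):
    """Return tokens with no dictionary entry, in order of first appearance."""
    missing = []
    for token in re.findall(r"[A-Za-z']+", sentence):
        if token.lower() not in dictionary and token not in missing:
            missing.append(token)
    return missing

# Toy dictionary (hypothetical) covering part of the source sentence above.
toy_dictionary = {
    "the", "triple", "vaccine", "which", "protects", "them",
    "from", "measles", "mumps", "and", "rubella",
}
source = "MMR, the triple vaccine which protects them from measles, mumps and rubella"
missing = not_found_words(source, toy_dictionary)
print(missing)
```

With this toy dictionary the only reported token is "MMR", which is also left untranslated in the raw MT output above.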
16 SYSTRAN Translation Project Manager: Alternative
Meanings
- Alternative Meanings: shows alternative
translations based on different meanings of a
source word or expression.
- The Alternative Meanings tab in the Review window
shows alternative meanings for expressions in
SYSTRAN or User Dictionaries.
17 SYSTRAN Translation Project Manager: Alternative
Meanings, Examples
- SRC_Id
- they'd rather pay for single vaccines at 60
pounds a shot, even though the government insists
MMR is safe.
- Raw MT
- ils payeraient plutôt les vaccins uniques à 60
livres un coup de feu, quoique le gouvernement
exige que MMR est sûr.
- Customized MT
- ils payeraient plutôt les vaccins uniques à 60
livres une injection, quoique le gouvernement
exige que MMR est sûr.
18 SYSTRAN Dictionary Manager: User Dictionaries
(UDs)
- User Dictionaries (UDs) let you increase the
quality of source language analyses, which also
improves the translation output for all
associated target languages. UDs can be used for
a number of functions, including:
- Automatically translating words not found in the
SYSTRAN dictionary (Not Found Words).
- Overriding the target-language meaning of a word
or expression in the SYSTRAN dictionaries, a
capability that lets you customize translation
output to fit specific needs.
- Ensuring that an expression is always treated as
a unit by SYSTRAN analysis programs.
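The three functions listed above can be mimicked with a toy lookup, shown here as a sketch only. It is a plain word substitution, not a model of SYSTRAN analysis; the function name `apply_user_dictionary` and the dictionary layout are assumptions, and the entries reuse word pairs from the examples in this deck. User entries override base entries, fill in words the base dictionary lacks, and multiword entries are matched first so that they are handled as a unit.

```python
def apply_user_dictionary(tokens, base, user):
    """Look up tokens left to right, preferring the longest matching entry."""
    merged = {**base, **user}  # user entries override base entries
    max_len = max((len(key.split()) for key in merged), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = " ".join(tokens[i:i + n])
            if chunk in merged:
                out.append(merged[chunk])
                i += n
                break
        else:
            out.append(tokens[i])  # not found: keep the source word
            i += 1
    return out

base = {"shot": "coup de feu"}
user = {
    "shot": "injection",                      # override a base meaning
    "autism": "autisme",                      # translate a not found word
    "bowel disease": "maladie d'entrailles",  # treat an expression as a unit
}

print(apply_user_dictionary(["a", "shot"], base, {}))
print(apply_user_dictionary(["a", "shot"], base, user))
print(apply_user_dictionary("autism and bowel disease".split(), base, user))
```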
19 SYSTRAN Dictionary Manager: User Dictionaries
(UDs), Metrics
- Type of Dictionary: entries for ENFR and ENEL
- Do Not Translate Words: 3532 entries (enxx)
- Proper Nouns: 1495 entries (enfr), 1495 entries
(enel)
- MUSA Terminology: 1443 entries (enfr), 5228
entries (enel)
20 SYSTRAN Dictionary Manager: User Dictionaries
(UDs), Examples
- SRC_ID
- Andrew Wakefield ignited the debate over MMR by
announcing the findings of research into a group
with autism and bowel disease.
- Raw MT
- Andrew Wakefield a enflammé la discussion
au-dessus de MMR en annonçant les résultats de la
recherche dans un groupe avec la maladie d'autism
et d'entrailles.
- Customized MT
- Andrew Wakefield a enflammé la discussion
au-dessus de MMR en annonçant les résultats de la
recherche dans un groupe avec autisme et maladie
d'entrailles.
21 SYSTRAN Translation Project Manager: Source
Analysis, Interactive Disambiguation
- The Source Analysis tab in the Review window
shows how the software handled source ambiguities
and allows you to override the software
selections.
22 SYSTRAN Translation Project Manager: Source
Analysis, Interactive Disambiguation, Examples
- ID 523
- At first we thought it was parts of the building
but it was people, literally people falling all
around us.
- Raw MT
- D'abord nous avons pensé que ce faisait partie du
bâtiment mais c'était les gens, peuplent
littéralement la chute tout autour de nous.
- Customized MT
- D'abord nous avons pensé que c'était des
fragments du bâtiment, mais c'était des gens,
littéralement des gens qui tombaient autour de
nous.
23 SYSTRAN Dictionary Manager: Normalization
Dictionaries (NDs)
- Normalization Dictionaries (NDs)
- There are two types of Normalization Dictionaries
(NDs): source normalization and target
normalization.
- Source normalization normalizes the source
document before translation.
- Target normalization adapts translation output to
user needs in terms of terminology consistency.
- It can also provide a way to replace expressions
chosen by the software's translation engine with
user-defined expressions.
24 SYSTRAN Dictionary Manager: Normalization
Dictionaries (NDs), Examples
- SRC_IDs
- we did n't know she had measles but we do.
- I mean I ca n't help...
- Raw MT
- nous avons fait le n't savons qu'il a eu la
rougeole mais nous faisons.
- Je veux dire l'aide de n't d'I ca
- Customized MT via SRC Normalization
- nous n'avons pas su qu'il a eu la rougeole mais
nous faisons.
- Je veux dire que je ne peux pas aider
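The source normalization used in these examples can be approximated with a single rewrite rule that rejoins a split negative contraction before translation. This is a minimal sketch, assuming a regex-based rule list; the names `RULES` and `normalize_source` are hypothetical, and the actual format of SYSTRAN Normalization Dictionaries is not shown in this deck.

```python
import re

# Each rule is (compiled pattern, replacement); more rules could be appended.
RULES = [
    # Rejoin a detached "n't" with the preceding word: "did n't" -> "didn't".
    (re.compile(r"\b(\w+) n't\b"), r"\1n't"),
]

def normalize_source(text):
    """Apply every normalization rule, in order, to a source sentence."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(normalize_source("we did n't know she had measles but we do."))
print(normalize_source("I mean I ca n't help..."))
```

Because the speech recognition output splits "can't" as "ca n't", the same rule restores "can't" as well as "didn't".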
25 SYSTRAN Translation Project Manager: Sentence
Review for Translation Memory Construction
- The Sentence Review tab in the Review window
compares sentences in the source and target.
- You can then check the sentences you want to send
to User Dictionaries, where you can work with
them further in order to post-edit them and
construct Translation Memories.
26 SYSTRAN Dictionary Manager: Translation Memories
(TMs)
- Translation Memory (TM)
- A set of translated and validated sentences that
can be integrated into the translation process.
Translation Memories (TMs) are databases of
aligned pre-translated sentences.
- Unlike Dictionaries, TM entries can be formatted
(for example, italic or bold) and are used by the
translation engine to perform matches on full
sentences in the source document. TMs are not
usually created manually, but are built using
SYSTRAN's Translation Project Export or from TMX
files.
27 SYSTRAN Dictionary Manager: Translation Memories
(TMs), Examples
- ID 370
- Now people kind of started panicking and said
we've got to leave no matter what.
- Raw MT
- Maintenant sorte de personnes de panique
commencée et dite nous avons pour laisser
n'importe ce que.
- Customized MT
- Les gens maintenant avaient l'air de paniquer
disant qu'ils devaient à tout prix partir.
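Full-sentence matching against a Translation Memory can be sketched as a lookup that takes priority over the translation engine. The class `TranslationMemory`, the helper `normalize_key`, and the stand-in `fake_engine` are all hypothetical; ignoring case and extra whitespace when matching is an assumption of this sketch, not documented SYSTRAN behavior.

```python
def normalize_key(sentence):
    """Lowercase and collapse whitespace so trivial differences still match."""
    return " ".join(sentence.lower().split())

class TranslationMemory:
    def __init__(self):
        self._entries = {}

    def add(self, source, target):
        self._entries[normalize_key(source)] = target

    def lookup(self, source):
        return self._entries.get(normalize_key(source))

def translate(sentence, tm, engine):
    """Use the validated TM entry when the full sentence matches, else the engine."""
    hit = tm.lookup(sentence)
    return hit if hit is not None else engine(sentence)

def fake_engine(sentence):
    # Stand-in for a real MT engine call (hypothetical).
    return "<raw MT> " + sentence

tm = TranslationMemory()
tm.add(
    "Now people kind of started panicking and said we've got to leave no matter what.",
    "Les gens maintenant avaient l'air de paniquer disant qu'ils devaient à tout prix partir.",
)

matched = translate(
    "Now people kind of started panicking and said we've got to leave no matter what.",
    tm, fake_engine,
)
unmatched = translate("MMR is safe.", tm, fake_engine)
print(matched)
print(unmatched)
```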
28 SYSTRAN Dictionary Manager: Translation Memories
(TMs)
- Translation Memory Import/Export
- Existing files in the standard TMX (Translation
Memory eXchange) format can be imported and
exported via SYSTRAN Dictionary Manager.
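To give a sense of what a TMX file contains, the sketch below reads aligned sentence pairs from a small TMX document using Python's standard `xml.etree.ElementTree` module. It is independent of SYSTRAN Dictionary Manager; the function name `read_tmx_pairs`, the sample header values, and the choice of language codes are assumptions, and the sentence pair is the one from the Translation Memory example in this deck.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Now people kind of started panicking and said we've got to leave no matter what.</seg></tuv>
      <tuv xml:lang="fr"><seg>Les gens maintenant avaient l'air de paniquer disant qu'ils devaient à tout prix partir.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) sentence pairs for the two requested languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also keeps text nested inside inline markup.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs

pairs = read_tmx_pairs(TMX_SAMPLE, "en", "fr")
for source, target in pairs:
    print(source, "=>", target)
```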