Translating Subtitles using Machine Translation



1
  • Translating Subtitles using Machine Translation
  • Practices, Problems, Methodology
  • Elsa Sklavounou, Ph.D.
  • Linguist, Co-funded Projects Technical
    Coordinator
  • SYSTRAN

2
SYSTRAN MT Customization Methodology: Overview
  • A customization project involves three different
    customization levels that provide incrementally
    higher translation quality:
  • Basic Terminology
  • Complex Terminology
  • Linguistic Rules

3
SYSTRAN MT Customization Methodology: Overview
  • Basic Terminology
  • The first step entails the creation of a User
    Dictionary that covers most of the noun
    terminology in the corpus, and various simple
    adjective and verb terms.
  • Complex Terminology
  • The second level concerns the coding of complex
    terminological entries, such as complex verbs
    with their complements (subject, object) and
    their translations.
  • Linguistic Rules
  • The third level involves language-specific code
    modifications in the SYSTRAN linguistic modules.

4
SYSTRAN MT Customization Methodology: Level 1 and Level 2
  • Customization levels 1 and 2 focus on
    implementing specialized terminology from the
    corpus in the systems. Level 1 and 2 tasks
    include:
  • Simple and complex terms extraction
  • Simple and complex terms translations
  • Simple and complex terms coding
  • Simple and complex terms review

5
SYSTRAN MT Customization Methodology: Level 1 and Level 2
  • Step 1: Corpus installation and analysis
  • Prerequisite 1: a formatted corpus
  • Step 2: Term extraction
  • Simple terms (nouns and noun expressions)
  • Complex terms (verb patterns)
  • DNT (Do Not Translate) integration
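One common way to integrate a Do Not Translate list is to mask the listed terms before translation and restore them afterwards. The sketch below illustrates that idea only; the slides do not say how SYSTRAN handles DNT entries internally, and `protect_dnt`, `restore_dnt`, and the placeholder format are hypothetical.

```python
import re

def protect_dnt(text, dnt_terms):
    """Replace each Do Not Translate term with a numbered placeholder."""
    mapping = {}
    # Longest terms first, so a long term is not split by a shorter one.
    for i, term in enumerate(sorted(dnt_terms, key=len, reverse=True)):
        placeholder = f"__DNT{i}__"  # hypothetical placeholder format
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = term
    return text, mapping

def restore_dnt(text, mapping):
    """Put the original terms back after translation."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text

source = "the government insists MMR is safe."
protected, mapping = protect_dnt(source, ["MMR"])
restored = restore_dnt(protected, mapping)
```

In a real pipeline the translation step would run between `protect_dnt` and `restore_dnt`, on the masked text.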

6
SYSTRAN MT Customization Methodology: Level 3
  • Customization level 3 focuses on the
    implementation of linguistic rules uniquely
    adapted to language-specific syntactic and
    semantic issues found in translations taken from
    the corpus. Level 3 tasks include:
  • Detailed linguistic evaluations and the
    development of a comprehensive customization
    plan
  • Implementation of customized rules
  • Regression tests
  • Correction of linguistic translation errors
  • Acceptance testing before release
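As an illustration of the regression-test task, the sketch below compares translations from a baseline run with those from a customized run and lists the segments that changed. The segment IDs, the dictionary layout, and the name `regression_report` are assumptions made for this example, not part of the SYSTRAN tools.

```python
def regression_report(baseline, current):
    """Return {segment_id: (old, new)} for every changed translation."""
    changed = {}
    for seg_id, old in baseline.items():
        new = current.get(seg_id)  # None if the segment is now missing
        if new != old:
            changed[seg_id] = (old, new)
    return changed

# Hypothetical outputs of two runs, keyed by segment ID.
baseline = {1: "un coup de feu", 2: "la rougeole"}
current = {1: "une injection", 2: "la rougeole"}
report = regression_report(baseline, current)
```

Each changed segment can then be reviewed to decide whether the customization improved or degraded the output.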

7
SYSTRAN MT Customization Methodology: Quality Levels
  • Estimate of the quality levels that may be
    achieved for each customization level.

8
SYSTRAN MT Customization Methodology: Software Tools
  • The process for coding simple and complex terms
    and related dictionary maintenance is managed by
    the SYSTRAN Linguistics Platform, which integrates
    the following two tools, required to complete
    customization levels 1 and 2.

9
SYSTRAN MT Customization Methodology: Software Tools
  • SYSTRAN Dictionary Manager
  • The SYSTRAN Dictionary Manager (SDM) enables
    translators to build and manage multilingual
    dictionaries. SDM includes preparation steps for
    dictionary coding tasks, an online dictionary
    lookup (via an HTML interface), and a compiler
    for runtime machine translation dictionaries. It
    is composed of three main components: a database,
    an HTML query form (dictionary lookup, reports,
    logs, import and export), and a Windows client
    (interactive coding tool).

10
SYSTRAN Customization Methodology: Software Tools
  • The SYSTRAN Review Manager (SRM) is a
    productivity tool used for:
  • the review,
  • quality assessment, and
  • maintenance of linguistic resources used in
    combination with a SYSTRAN system.

11
SYSTRAN Customization Methodology: Prerequisite 1, a formatted grammatical corpus
  • Grammar Writing Rules
  • Using Articles
  • Avoiding Speech Ambiguity
  • Using Enumeration
  • Ensuring Subject-Verb Agreement
  • Using Prepositions
  • Using Infinitives at the Beginning of Sentences
  • Using Imperatives
  • Observing Punctuation Rules
  • Using Main Clauses
  • Using Subordinate Clauses
  • Using Relative Clauses
  • Avoiding Multiple Stacking
  • Using Compound Words
  • Using Capitalization
  • Using Spelling Variations
  • Lexical Ambiguities
  • Disambiguation of Product Names and Menus
  • Avoiding Lexical Ambiguities
  • Using Compounds
  • Format and Typographical Issues
  • Segmentation

12
SYSTRAN Customization Methodology for MUSA
  • Corpus generated fully automatically by two
    processes:
  • Speech Recognition (KU Leuven)
  • Automatic Sentence Compression (CNTS)
  • First priority: subtitle constraints
  • Second priority: the least ambiguous content
    possible
  • Lesson learned: no prerequisite

13
SYSTRAN MT Customization Methodology: Upgraded Software Tools (Client Tools v5)

14
SYSTRAN Translation Project Manager: Terminology Review: Not Found Words Extraction
  • Reviewing Terminology and Sentences
  • The Terminology Review tab in the Review window
    lets you identify expressions such as Not Found
    Words or Terminology extracted by the software.
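The idea behind Not Found Words extraction can be sketched as a lookup of every source token against a lexicon, collecting the tokens that are missing. This is an illustration under simple assumptions (a flat set of known words and a basic regular-expression tokenizer); it is not the extraction logic of the SYSTRAN software, and `not_found_words` is a hypothetical name.

```python
import re
from collections import Counter

def not_found_words(sentences, known_words):
    """Count tokens that are absent from the lexicon."""
    counts = Counter()
    for sentence in sentences:
        # Words, optionally with an internal apostrophe (e.g. "don't").
        for token in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence):
            if token.lower() not in known_words:
                counts[token] += 1
    return counts

# Hypothetical lexicon; a real system dictionary is far larger.
known = {"they", "don't", "want", "their", "child", "to", "have"}
nfw = not_found_words(["they don't want their child to have MMR"], known)
```

The resulting list is what a reviewer would then code into a User Dictionary or a Do Not Translate list.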

15
SYSTRAN Translation Project Manager: Terminology Review: Not Found Words Extraction: Examples
  • SRC_Id
  • these parents know measles can be dangerous, but
    they don't want their child to have MMR, the
    triple vaccine which protects them from measles,
    mumps and rubella.
  • Raw MT
  • ces parents savent la rougeole peut être
    dangereuse, mais ils ne veulent pas que leur
    enfant a MMR, le vaccin triple qui les protège
    contre la rougeole, les oreillons et la rubéole.

16
SYSTRAN Translation Project Manager: Alternative Meanings
  • Alternative Meanings shows alternative
    translations based on different meanings of a
    source word or expression.
  • The Alternative Meanings tab in the Review window
    shows alternative meanings for expressions in
    SYSTRAN or User Dictionaries.

17
SYSTRAN Translation Project Manager: Alternative Meanings: Examples
  • SRC_Id
  • they'd rather pay for single vaccines at 60
    pounds a shot, even though the government insists
    MMR is safe.
  • Raw MT
  • ils payeraient plutôt les vaccins uniques à 60
    livres un coup de feu, quoique le gouvernement
    exige que MMR est sûr.
  • Customized MT
  • ils payeraient plutôt les vaccins uniques à 60
    livres une injection, quoique le gouvernement
    exige que MMR est sûr.

18
SYSTRAN Dictionary Manager: User Dictionaries (UDs)
  • User Dictionaries (UDs) let you increase the
    quality of source language analyses, which also
    improves the translation output for all
    associated target languages. UDs can be used for
    a number of functions, including:
  • Automatically translating words that are not
    found in the SYSTRAN dictionary (Not Found
    Words).
  • Overriding the target-language meaning of a word
    or expression in the SYSTRAN dictionaries, a
    capability that lets you customize translation
    output to fit specific needs.
  • Ensuring that an expression is always treated as
    a unit by SYSTRAN analysis programs.
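The functions listed above can be sketched with plain Python dictionaries: a user entry takes priority over the main dictionary, and multiword user entries are grouped into a single unit by greedy longest match. The entries reuse terms from the examples in this presentation; the data layout and the names `segment` and `translate_unit` are assumptions for illustration, not the SYSTRAN dictionary format.

```python
def segment(tokens, user_dict, max_len=4):
    """Group tokens into units, preferring the longest user-dictionary entry."""
    units, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            # A single token is always accepted as a fallback unit.
            if n == 1 or candidate in user_dict:
                units.append(candidate)
                i += n
                break
    return units

def translate_unit(unit, user_dict, main_dict):
    """User dictionary overrides the main dictionary; unknown units pass through."""
    if unit in user_dict:
        return user_dict[unit]
    return main_dict.get(unit, unit)

main_dict = {"shot": "coup de feu"}              # illustrative entries only
user_dict = {"shot": "injection",
             "bowel disease": "maladie d'entrailles"}

units = segment("autism and bowel disease".split(), user_dict)
shot = translate_unit("shot", user_dict, main_dict)
```

Here `units` keeps "bowel disease" together, and "shot" resolves to the user-defined meaning rather than the default one.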

19
SYSTRAN Dictionary Manager: User Dictionaries (UDs): Metrics
  • Type of Dictionary (ENFR, ENEL)
  • Do Not Translate Words: 3532 entries (enxx)
  • Proper Nouns: 1495 entries (enfr), 1495 entries
    (enel)
  • MUSA Terminology: 1443 entries (enfr), 5228
    entries (enel)

20
SYSTRAN Dictionary Manager: User Dictionaries (UDs): Examples
  • SRC_ID
  • Andrew Wakefield ignited the debate over MMR by
    announcing the findings of research into a group
    with autism and bowel disease.
  • Raw MT
  • Andrew Wakefield a enflammé la discussion
    au-dessus de MMR en annonçant les résultats de la
    recherche dans un groupe avec la maladie d'autism
    et d'entrailles.
  • Customized MT
  • Andrew Wakefield a enflammé la discussion
    au-dessus de MMR en annonçant les résultats de la
    recherche dans un groupe avec autisme et maladie
    d'entrailles.

21
SYSTRAN Translation Project Manager: Source Analysis: Interactive Disambiguation
  • The Source Analysis tab in the Review window
    shows how the software handled source ambiguities
    and allows you to override the software
    selections.

22
SYSTRAN Translation Project Manager: Source Analysis: Interactive Disambiguation: Examples
  • ID 523
  • At first we thought it was parts of the building
    but it was people, literally people falling all
    around us.
  • Raw MT
  • D'abord nous avons pensé que ce faisait partie du
    bâtiment mais c'était les gens, peuplent
    littéralement la chute tout autour de nous.
  • Customized MT
  • D'abord nous avons pensé que c'était des
    fragments du bâtiment, mais c'était des gens,
    littéralement des gens qui tombaient autour de
    nous.

23
SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs)
  • Normalization Dictionaries (NDs)
  • There are two types of Normalization Dictionaries
    (NDs): source normalization and target
    normalization.
  • Source normalization normalizes the source
    document before translation.
  • Target normalization adapts translation output to
    user needs in terms of terminology consistency.
  • It can also provide a way to replace expressions
    chosen by the software's translation engine with
    user-defined expressions.

24
SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs): Examples
  • SRC_IDs
  • we did n't know she had measles but we do.
  • I mean I ca n't help...
  • Raw MT
  • nous avons fait le n't savons qu'il a eu la
    rougeole mais nous faisons.
  • Je veux dire l'aide de n't d'I ca
  • Customized MT via SRC Normalization
  • nous n'avons pas su qu'il a eu la rougeole mais
    nous faisons.
  • Je veux dire que je ne peux pas aider
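The split contractions in the source sentences above ("did n't", "ca n't") are the kind of input that source normalization can repair before translation. A minimal sketch with two rewrite rules follows; the rules, their order, and the name `normalize_source` are assumptions for illustration and do not reproduce the actual Normalization Dictionary entries used in the project.

```python
import re

# Ordered rules: the specific case must come before the general one.
SOURCE_RULES = [
    (re.compile(r"\bca n't\b"), "cannot"),
    (re.compile(r"\b(\w+) n't\b"), r"\1 not"),
]

def normalize_source(sentence):
    """Apply each rewrite rule in order to the source sentence."""
    for pattern, replacement in SOURCE_RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence

first = normalize_source("we did n't know she had measles but we do.")
second = normalize_source("I mean I ca n't help...")
```

Irregular forms such as "wo n't" would need their own rule placed before the general pattern, in the same way as "ca n't".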

25
SYSTRAN Translation Project Manager: Sentence Review for Translation Memory Construction
  • The Sentence Review tab in the Review window
    compares sentences in the source and target.
  • You can then check the sentences you want to send
    to User Dictionaries, where you can work with
    them further in order to post-edit them and
    construct Translation Memories.

26
SYSTRAN Dictionary Manager: Translation Memories (TMs)
  • Translation Memory (TM)
  • A set of translated and validated sentences that
    can be integrated into the translation process.
    Translation Memories (TMs) are databases of
    aligned pre-translated sentences.
  • Unlike Dictionaries, TM entries can be formatted
    (for example, italic or bold) and are used by the
    translation engine to perform matches on full
    sentences in the source document.
  • TMs are not usually created manually, but are
    built using SYSTRAN's Translation Project Export
    or from TMX files.
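Full-sentence matching against a Translation Memory can be sketched as a lookup that runs before the MT engine. The sketch assumes exact matching after lowercasing and whitespace normalization; the slides do not describe how SYSTRAN matches TM entries, and `TranslationMemory`, `translate`, and the stand-in `mt_engine` are hypothetical.

```python
def normalize_key(sentence):
    """Lowercase and collapse whitespace so trivial differences still match."""
    return " ".join(sentence.lower().split())

class TranslationMemory:
    def __init__(self):
        self._entries = {}

    def add(self, source, target):
        self._entries[normalize_key(source)] = target

    def lookup(self, source):
        """Return the stored translation, or None when there is no match."""
        return self._entries.get(normalize_key(source))

def translate(sentence, tm, mt_engine):
    """Prefer a validated TM sentence; otherwise fall back to MT."""
    hit = tm.lookup(sentence)
    return hit if hit is not None else mt_engine(sentence)

tm = TranslationMemory()
tm.add("Now people kind of started panicking and said we've got to leave no matter what.",
       "Les gens maintenant avaient l'air de paniquer disant qu'ils devaient à tout prix partir.")

fake_mt = lambda s: "<MT:" + s + ">"   # stand-in for a real MT engine
matched = translate("now people kind of started panicking  and said we've got to leave no matter what.", tm, fake_mt)
unmatched = translate("We have to leave.", tm, fake_mt)
```

Exact matching is the simplest policy; fuzzy matching with a similarity threshold is a common alternative but is not shown here.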

27
SYSTRAN Dictionary Manager: Translation Memories (TMs): Examples
  • ID 370
  • Now people kind of started panicking and said
    we've got to leave no matter what.
  • Raw MT
  • Maintenant sorte de personnes de panique
    commencée et dite nous avons pour laisser
    n'importe ce que.
  • Customized MT
  • Les gens maintenant avaient l'air de paniquer
    disant qu'ils devaient à tout prix partir.

28
SYSTRAN Dictionary Manager: Translation Memories (TMs)
  • Translation Memory Import/Export
  • Existing files in the TMX standard translation
    memory exchange format can be imported and
    exported via SYSTRAN Dictionary Manager.
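To show what a TMX file contains, the sketch below reads aligned sentence pairs from a small TMX document with Python's standard `xml.etree.ElementTree`. It relies only on the generic TMX structure (`tu`, `tuv`, `seg`); the sample content and the function name `read_tmx` are made up for illustration, and the import path inside SYSTRAN Dictionary Manager is not shown.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs for the two requested languages."""
    root = ET.fromstring(tmx_text.strip())
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs

SAMPLE_TMX = """
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour</seg></tuv>
    </tu>
  </body>
</tmx>
"""

pairs = read_tmx(SAMPLE_TMX, "en", "fr")
```

Using `itertext()` keeps the text of a segment even when it contains inline formatting elements.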