Leaving the Last Century - PowerPoint PPT Presentation

Title: Leaving the Last Century


1
Leaving the Last Century
  • New Solutions for TM and MT

ESTeam AB, Dublin, 9/1/2002
2
Rule Based vs. Data Driven Methods
  • Translation Memories are the extreme of data
    driven methods
  • Systran contains an old fashioned lexicon
    structure
  • Logos, Globalink, PTrans, etc. are rule based MT
    solutions

3
Rule based systems in MT
  • 30 years of proven failure
  • 30 years of commercial failure
  • 30 years of still being used since there was no
    alternative

4
Rule Based MT
  • Development time: 2 years minimum per language
    pair
  • Development cost: 1 mil. per language pair
  • Equally low quality in all domains
  • Systran has cost the EU 200 man-years (MY) and
    200 million euro for 20 language pairs

5
Where is the problem?
  • Rules do not work everywhere. In fact, a parse
    succeeds on real data only about 1 time in 100.
    Why?
  • Words not in the lexicon, spelling mistakes, and
    sentences (translation units) not covered by a rule
  • The link between real texts and grammar is missing
  • Peter doesn't love Mary; we just think he does

6
Where is the problem? (cont.)
  • The biggest failure in translation solutions of
    all time is Eurotra: 220 million euro spent on
    writing analysis rules
  • All analysis is MONOLINGUAL, but translation
    isn't: it is the relationship between two
    languages, the source and its translation, and
    the world (domain) that they represent

7
Data Driven Solutions not new
  • In the 1970s: corpus-based research
  • Computers were too slow and too small
  • Ideas were there but couldn't be applied

8
The First Data Driven Solutions
  • Translation Memories date back to ca. 1980 and
    have been in use since ca. 1985.
  • In the translation tool market there is only one
    commercial success: Trados
  • Moving beyond TM

9
Data Driven Methods in MT
  • Several tests during the last 10 years by IBM,
    Sharp Labs and many more
  • Counting alone doesn't do it.
  • Credible solutions come from merging with
    Translation Memory methods (a type of example
    based MT)

10
Integrating TM and MT
  • Where do we select TM and where MT?
  • Issues
  • How deep can we go with TM without losing the
    sentence structure?
  • MT is always low quality; how can we improve it
    by using the TM?
  • Is there a difference between Sub-sentence TM and
    MT?

11
Requirements for Success
  • Translation Memories
  • Internet data
  • Lexicons
  • Any monolingual language material
  • Good computers and serious data storage and
    access tools

12
Building Resources for TMT
  • Global structure of domain
  • Monolingual issues in each domain
  • Sentence Phrase Word distribution
  • Word context statistics
  • etc.
  • Translation issues in each domain
  • Sentence Phrase Word alignment

13
State of the Art
  • MT Rule based
  • Disambiguation on word class
  • Cannot be tuned to a domain
  • Language pair based (source language dependent
    because of analysis)
  • TM Sentence based
  • Single user
  • Language pair based
  • Project based

14
ESTeam Translator
  • Multi-domain
  • Multi-User (Client Server)
  • Multi-lingual
  • TM on Sentence and Sub-Sentence levels
  • MT on the remaining parts of the sentence, using
    rules and statistical methods
  • Improves MT translations in a domain by:
  • Statistical disambiguation filters
  • Structuring lexicons automatically
  • Post-Editing MT (Target Language Verification)

15
ESTeam Goals
  • Maximum reuse of data resources
  • Highest possible translation quality
  • Optimal control of data
  • Full operational control (Workflow)
  • Multiple usage with the same tools
  • Translation Support
  • Information Browsing

16
Client Example
  • Translation Agency with 11 subsidiary translation
    companies and freelance translators in more than
    11 countries
  • Human Translation supported optimally at all
    levels
  • ESTeam Translation Workflow controls operations

17
Translation Tools
  • ESTeam Translator
  • Translation Memory
  • Machine Translation
  • Term Tool
  • Translation Memory Admin Tool
  • Concordancer
  • Aligner
  • Translation Tools Administration

18
Translation Memory
  • Unique on the market
  • Multi-lingual
  • Multi-domain
  • Multi-level
  • sentence
  • subsentence
  • Client Server

19
Link Architecture
  • Linked languages: Greek, Italian, Spanish, German,
    French, Danish, English, Dutch, Finnish,
    Portuguese, Swedish
  • New language

20
Multilingual Linking in TM and MT
[Figure: links among FR, EN, IT, EL, ES, FI, PT, SV, DE, DA and NL]
21
2-Level TM
  • Sentence
  • Chemical and pharmaceutical products, all
    intended for industrial purposes.

  • Sub-Sentence
  • Chemical
  • and
  • pharmaceutical products
  • ,
  • all intended for industrial purposes
  • .
22
Creating Sub-Sentence Resources
  • Analysing segmentation points
  • Aligning sentence TM data
  • Automatic statistical processing
  • Manual intervention
  • Loading TM with sub-sentence data
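
The first step above, finding segmentation points, can be illustrated with a minimal sketch. It assumes that punctuation and the conjunction "and" are the only segmentation points; that rule set is hypothetical, chosen to reproduce the sub-sentence units of the 2-level TM example, and is not ESTeam's actual segmentation logic.

```python
import re

# Hypothetical segmentation points: comma, full stop, and the word "and".
# The capturing group makes re.split keep the separators as units.
SPLIT_PATTERN = re.compile(r"\s*(,|\.|\band\b)\s*")


def split_sub_sentences(sentence):
    """Split a sentence into sub-sentence units, keeping the separators."""
    parts = SPLIT_PATTERN.split(sentence)
    return [part for part in parts if part]  # drop empty strings


segments = split_sub_sentences(
    "Chemical and pharmaceutical products, all intended for industrial purposes."
)
print(segments)
```

Each unit can then be stored and looked up independently of the sentence it came from, which is what lets a sub-sentence TM match parts of unseen sentences.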

23
Existing Data Resources
  • Available TM Sentence Data
  • Language pair: number of sentences
  • English to French: 191,680
  • English to Portuguese: 167,287
  • Portuguese to French: 3,740
  • French to Portuguese: 42,910

24
TM Multilingual Linking Effect
  • ESTeam Multi-lingual and Multi-level Approach
    with the same data
  • Language combination: sentences / sub-sentences
  • English to/from French: 191,680 / 61,848
  • English to/from Portuguese: 167,287 / 60,695
  • French to/from Portuguese: 150,048 / 56,562
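
One way to read the growth in the French to/from Portuguese count is that units sharing the same English sentence become linked to each other. The sketch below shows that idea only; the function name and the toy sentences are assumptions made for illustration, not ESTeam data or code.

```python
def link_via_pivot(pivot_to_a, pivot_to_b):
    """Pair up translations that share the same pivot-language sentence."""
    return {
        a_text: pivot_to_b[pivot_text]
        for pivot_text, a_text in pivot_to_a.items()
        if pivot_text in pivot_to_b
    }


# Toy memories keyed by the English (pivot) sentence; illustrative only.
en_fr = {
    "battery chargers": "chargeurs de batterie",
    "chemical products": "produits chimiques",
}
en_pt = {
    "battery chargers": "carregadores de bateria",
    "industrial purposes": "fins industriais",
}

fr_pt = link_via_pivot(en_fr, en_pt)
print(fr_pt)  # only the unit present in both memories is linked
```

Only units present in both memories can be linked this way, so the linked total is bounded by the smaller of the two pivot memories.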

25
MT and TM Integration
  • Theoretical assumptions
  • MT is erroneous, thus the less MT the better
  • Conclusions
  • Cost-effective MT development
  • Minimize MT compared to TM
  • Tailor TM to work for information browsing as
    well as translation support
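
The "less MT the better" principle can be sketched as a fallback chain: sentence TM first, then sub-sentence TM, and MT only for whatever remains. Everything named here (`translate`, the `segment` and `machine_translate` callables, the toy data) is a hypothetical stand-in, not the ESTeam Translator API.

```python
def translate(sentence, sentence_tm, sub_sentence_tm, segment, machine_translate):
    """Translate with TM where possible; return the text and the MT call count."""
    if sentence in sentence_tm:  # level 1: whole-sentence match
        return sentence_tm[sentence], 0
    output, mt_calls = [], 0
    for unit in segment(sentence):
        if unit in sub_sentence_tm:  # level 2: sub-sentence match
            output.append(sub_sentence_tm[unit])
        else:  # last resort: machine translation
            output.append(machine_translate(unit))
            mt_calls += 1
    return " ".join(output), mt_calls


# Toy stand-ins for the segmenter and the MT engine.
def split_on_comma(text):
    return text.split(", ")


def fake_mt(unit):
    return "<MT:" + unit + ">"


text, mt_calls = translate(
    "chemical products, all intended for industrial purposes",
    sentence_tm={},
    sub_sentence_tm={"chemical products": "produits chimiques"},
    segment=split_on_comma,
    machine_translate=fake_mt,
)
print(text, mt_calls)
```

Counting the MT calls makes the slide's goal measurable: the larger the share of units served from TM, the smaller that count.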

26
Lexical Machine Translation
  • Information in the lexicon
  • Domain, Frequency and Category info on Source and
    Target
  • Disambiguation through shared info and
    reliability on the link
  • Lexical/Category based rules that are part of the
    database
  • Multi-lingual
  • As in TM, all languages CAN be linked to each
    other by translating into a previously existing
    language
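
Disambiguation through shared lexicon information might look like the following sketch, which prefers a link whose domain matches the text and otherwise falls back on frequency. The `LexiconLink` structure, the selection rule, and the sample entries are assumptions for illustration; the slides do not specify how the information is combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconLink:
    """One source-to-target link with the info the lexicon stores about it."""
    target: str
    domain: str
    frequency: int


def choose_translation(links, text_domain):
    """Prefer links in the text's domain; break ties by frequency.

    Assumes `links` is non-empty.
    """
    in_domain = [link for link in links if link.domain == text_domain]
    pool = in_domain or links  # no domain match: fall back to all links
    return max(pool, key=lambda link: link.frequency).target


# Toy entries for the English word "bank"; illustrative only.
bank_links = [
    LexiconLink(target="banque", domain="finance", frequency=900),
    LexiconLink(target="rive", domain="geography", frequency=120),
]

print(choose_translation(bank_links, "geography"))
print(choose_translation(bank_links, "chemistry"))
```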

27
Target Language Verification
  • Assumption: MT is erroneous, data is correct
  • Example
  • Source: τροφοδοτικά μπαταριών
  • MT: chargers batteries
  • TLV: battery chargers
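
A minimal sketch of the verification idea, assuming it works by generating word-order and word-form variants of the MT output and keeping the one best attested in target-language data. The variant table and phrase counts are invented for illustration, and the real TLV component may work quite differently.

```python
from itertools import permutations, product


def verify_target(mt_output, variants, target_counts):
    """Return the best-attested reordering/inflection of the MT output."""
    words = mt_output.split()
    # Each word may be replaced by any of its known forms.
    options = [variants.get(word, [word]) for word in words]
    candidates = {
        " ".join(order)
        for choice in product(*options)
        for order in permutations(choice)
    }
    # sorted() keeps the choice deterministic when counts tie.
    best = max(sorted(candidates), key=lambda c: target_counts.get(c, 0))
    # If nothing is attested in the data, leave the MT output unchanged.
    return best if target_counts.get(best, 0) > 0 else mt_output


# Hypothetical morphology table and target-language phrase counts.
variants = {
    "chargers": ["chargers", "charger"],
    "batteries": ["batteries", "battery"],
}
target_counts = {"battery chargers": 12, "battery charger": 7}

print(verify_target("chargers batteries", variants, target_counts))
```

Trying all permutations is only practical for short phrases, which fits the example of a two-word noun phrase.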

28
Languages in MT and TM
  • Current: all EU languages into all, plus Norwegian
    (TM is Unicode, so any language can be catered for)
  • Goal for 2002: start work on all EU minority
    languages and Icelandic, linking them to each
    other and to all EU languages
  • Eastern European languages to be integrated on
    request (approximate development duration: 6
    months for each new language linked to all
    current EU languages)

29
Developing New Languages
  • Semi-Automatic Lexicon Building
  • Manual entry of Special Words and Morphology for
    TLV and TM Fuzzy
  • Re-using translators' work for automatic solutions
  • Sub-Sentence TM
  • Phrases in Lexicon
  • Automatic Alignment
  • sentence
  • phrase/word

30
Term Tool
  • User friendly interface to all the functionality
    of the terminology and translation lexicons
  • For Information Browsing when translating
  • For building new languages and enhancing existing
    lexicons

31
TM Admin Tool
  • User friendly interface to all the functionality
    of the Translation Memory including editing and
    viewing
  • For Information Browsing when translating
  • For enhancing, correcting and building new
    translation memories

32
Concordancer
  • Bilingual support tool for freelance translators
  • Easy to install and use
  • Directly linked to interface and stores the
    translations carried out by the translator
  • Database: MySQL

33
Aligner
  • Goal: to find as many good matches as possible to
    build an application, and discard the rest
  • Uses the lexicon to verify alignment
  • Manual intervention minimal
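
Using the lexicon to verify alignment could be sketched as a coverage score: the share of source words that have a known translation present in the candidate target. The scoring rule, the 0.5 threshold, and the toy lexicon are assumptions for illustration only.

```python
def lexicon_support(source, target, lexicon):
    """Fraction of source words with a lexicon translation found in the target."""
    source_words = source.lower().split()
    if not source_words:
        return 0.0
    target_words = set(target.lower().split())
    hits = sum(
        1 for word in source_words if lexicon.get(word, set()) & target_words
    )
    return hits / len(source_words)


def keep_good_pairs(candidates, lexicon, threshold=0.5):
    """Keep candidate pairs the lexicon supports; discard the rest."""
    return [
        (source, target)
        for source, target in candidates
        if lexicon_support(source, target, lexicon) >= threshold
    ]


# Toy English-to-French lexicon and candidate alignments.
lexicon = {"chemical": {"chimiques"}, "products": {"produits"}}
candidates = [
    ("chemical products", "produits chimiques"),
    ("chemical products", "fins industrielles"),
]

print(keep_good_pairs(candidates, lexicon))
```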

34
Translation Tools Admin
  • Specify how to run a translation
  • Operate TM
  • select sentence or sub-sentence or both
  • set fuzzy threshold
  • etc
  • Operate MT
  • activate rules
  • activate statistics
  • set quality threshold
  • etc
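
The run settings listed above could be modelled as a small configuration object. In this sketch every field name and default value is hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever fuzzy-match measure the tool really uses; only the TM switches and the fuzzy threshold are exercised.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TranslationRunConfig:
    """Hypothetical settings mirroring the options on the slide."""
    use_sentence_tm: bool = True
    use_sub_sentence_tm: bool = True
    fuzzy_threshold: float = 0.8
    mt_rules: bool = True
    mt_statistics: bool = True
    mt_quality_threshold: float = 0.5


def fuzzy_lookup(source, memory, config):
    """Return the best TM target at or above the fuzzy threshold, else None."""
    if not config.use_sentence_tm:
        return None
    best_score, best_target = 0.0, None
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_target if best_score >= config.fuzzy_threshold else None


memory = {"battery chargers": "chargeurs de batterie"}
config = TranslationRunConfig()

print(fuzzy_lookup("battery chargers", memory, config))
print(fuzzy_lookup("zzz", memory, config))
```

Raising `fuzzy_threshold` trades coverage for precision: fewer sentences are served from TM, but those that are need less post-editing.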

35
Translation Workflow
  • Local Workflow
  • Automatic Translation
  • On-line Manual Translation / Revision
  • Subsidiary Workflow
  • Local Distribution / Collection
  • Off-line Manual Translation

36
Local Workflow Features
  • Roles
  • Workflow Manager
  • Translators / Revisers
  • Enterprise Integration
  • Address book personnel and clients
  • Distribution of work, availability
  • Administration Data
  • Archiving of processed files

37
Local Workflow (pt. 1)
38
Subsidiary Workflow
39
Local Workflow (pt. 2)
40
What is different?
  • Multi-lingual
  • Multi-level TM
  • Single Global TM (client server and structure of
    database)
  • Data used to decide on selection in MT
  • Data used to change MT output

41
Effect
  • For information browsing
  • serious cost savings in both development and
    production
  • speed of development for new languages
  • For translation support
  • Client example using TM data from another system
    provided 6 translation saving using TM only

42
Technical details
  • All Tools are
  • Client Server
  • Unicode compliant
  • developed in C and Oracle
  • running on Unix and Windows
  • All Interfaces are developed in Java
  • Workflow is developed in C, Java and Lotus Domino

43
ESTeam AB
  • Developer of Automatic Translation Solutions
  • Legal Residence Gothenburg, Sweden, since 1995
  • Development Site in Athens, Greece
  • Marketing site in UK
  • Products
  • ESTeam Translator (2000)
  • ESTeam Translation WorkFlow (2001/2)
  • ESTeam BTR (1996)

44
Contact Info
  • ESTeam AB
  • Markou Botsari 15
  • 145 61 Athens
  • Greece
  • Email esteam_at_otenet.gr
  • Tel 30 10 8085704