xml:tm - PowerPoint PPT Presentation

Provided by: xmli

1
xml:tm
  • XML Text Memory
  • Using XML technology to reduce the cost of
    translating XML documents

2
Computational Linguistic Methodologies
  • Machine Translation
  • Translation Memory
  • Hybrid Linguistic Inferencing Engines
  • Terminology

3
Automating Translation
  • Machine translation
  • 40 year history
  • Rigorous control of grammar and terminology can
    produce very good results
  • Enormous amount of work left to achieve free
    format translation.

4
Translation Memory
  • Align source and target text
  • Look up new text against memory
  • Relatively primitive technology
  • No advance over past 30 years
  • Need for proofing
  • Proprietary translation memory formats

5
Translating XML Documents
  • XML inherently easier to translate
  • Separation of form and content
  • Support for Unicode and other international
    encoding formats.
  • Allows multiple output formats - PDF, XHTML, WAP

6
XML Translation Standards
  • LISA - Localization Industry Standards
    Association http://www.lisa.org
  • OASIS - Organization for the Advancement of
    Structured Information Standards
    http://www.oasis-open.org
  • W3C - World Wide Web Consortium
    http://www.w3c.org
  • OLIF Consortium http://www.olif.net

7
LISA Standards
  • TMX - Translation Memory Exchange format
    http://www.lisa.org/tmx
  • TBX - Termbase Exchange format
    http://www.lisa.org/tbx
  • SRX - Segmentation Rules Exchange format
    http://www.lisa.org/srx
  • GMX - GILT Metrics Exchange format
    http://www.lisa.org/gmx

8
OASIS L10N Standards
  • XLIFF - XML Localization Interchange File
    Format
    http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
  • TransWS - Translation Web Services
    http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws
  • DITA - Darwin Information Typing Architecture
    http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita

9
W3C and OLIF
  • W3C ITS
  • http://www.w3.org/International/
  • http://www.w3.org/International/its
  • OLIF - Open Lexicon Interchange Format
    http://www.olif.net

10
XML namespace
  • Major feature of XML
  • Allows the mapping of different ontological
    entities onto the same representation
  • Allows different ways to look at the same data
  • Namespaces can be made transparent

11
xml:tm
  • XML based text memory
  • Revolutionary approach to translating XML
    documents
  • First significant advance in translation memory
    technology
  • Uses XML namespace to transparently embed
    contextual information

12
xml:tm namespace
  • Text Memory namespace
  • Can be mapped onto any XML document
  • Vertical view of document in terms of text
    segments
  • Can be totally transparent
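One way to read "totally transparent" is that the original document can be recovered by discarding every element in the tm namespace while keeping its content. The sketch below illustrates that idea with Python's standard `xml.etree.ElementTree`. It assumes the namespace URI `urn:xml-Intl-tm` shown in the example slide; the helper names (`strip_tm`, `_copy_content`, `_append_text`) and the `SAMPLE` document are hypothetical and not part of the xml:tm specification.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI as it appears in the slide example


def _append_text(parent, text):
    """Append text after whatever content parent already holds."""
    if not text:
        return
    if len(parent):                       # parent already has child elements
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_content(src, dst):
    """Copy the content of src into dst, unwrapping tm-namespace elements."""
    _append_text(dst, src.text)
    for child in src:
        if child.tag.startswith("{" + TM_NS + "}"):
            _copy_content(child, dst)     # keep the content, drop the wrapper
        else:
            copy = ET.SubElement(dst, child.tag, dict(child.attrib))
            _copy_content(child, copy)
        _append_text(dst, child.tail)


def strip_tm(root):
    """Return a new tree equal to root without any tm-namespace elements."""
    plain = ET.Element(root.tag, dict(root.attrib))
    _copy_content(root, plain)
    return plain


# Hypothetical sample modelled on the example slide.
SAMPLE = (
    '<document xmlns:tm="urn:xml-Intl-tm"><tm:tm><section><para><tm:te>'
    "<tm:tu>Namespace is very flexible.</tm:tu> "
    "<tm:tu>It is very easy to use.</tm:tu>"
    "</tm:te></para></section></tm:tm></document>"
)

plain_xml = ET.tostring(strip_tm(ET.fromstring(SAMPLE)), encoding="unicode")
print(plain_xml)
```

The result contains only `document`, `section` and `para`, with the sentence text joined back into the paragraph, which is what a tool unaware of the tm namespace would expect to see.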

13
xml:tm namespace
Example of the use of tm namespace in an XML
document

<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>

14
xml:tm namespace
[Diagram: the original document view (doc, title, section, para, text) shown beside the tm namespace view (tm, te, tu) of the same document. Each block of text corresponds to a te element and each sentence within it to a tu element.]

15
xml:tm namespace
original document view

<para>
  Namespace is very simple. It is easy to use.
</para>

tm namespace view (the paragraph text becomes a te, each sentence a tu)

<para>
  <tm:te id="e1">
    <tm:tu id="u1.1">Namespace is very simple.</tm:tu>
    <tm:tu id="u1.2">It is easy to use.</tm:tu>
  </tm:te>
</para>

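The tm namespace view on this slide wraps the text of a paragraph in a `tm:te` element and each sentence in a `tm:tu` element with identifiers such as `e1` and `u1.1`. A minimal sketch of that transformation follows. The sentence splitter is a naive regular expression chosen only for illustration (real segmentation would follow rules such as those exchanged via SRX), the function name `segment_para` is hypothetical, and paragraphs containing inline markup are not handled.

```python
import re
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"            # namespace URI from the example slide
ET.register_namespace("tm", TM_NS)   # serialize with the "tm" prefix


def segment_para(para, te_number):
    """Wrap the plain text of para in one tm:te holding one tm:tu per sentence.

    Assumption: para holds text only (no child elements).
    """
    text = (para.text or "").strip()
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if not sentences:
        return para
    para.text = None
    te = ET.SubElement(para, f"{{{TM_NS}}}te", {"id": f"e{te_number}"})
    for n, sentence in enumerate(sentences, start=1):
        tu = ET.SubElement(te, f"{{{TM_NS}}}tu", {"id": f"u{te_number}.{n}"})
        tu.text = sentence
        if n < len(sentences):
            tu.tail = " "            # keep the space between sentences
    return para


para = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
segment_para(para, 1)
print(ET.tostring(para, encoding="unicode"))
```

Because the space between sentences is kept as the tail of the first `tm:tu`, concatenating all text in the paragraph gives back the original string.
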
16
xml:tm Text Memory
  • Author memory
      • Maintain memory of source text
      • Authoring statistics
      • Authoring tool input
  • Translation memory
      • Automatic alignment
      • Maintain perfect link of source and target text
      • Reduce translation costs

17
xml:tm DOM differencing

Source Document    Updated Source Document
tu id1             tu id1
tu id2             (deleted)
tu id3             tu id3
tu id4             tu id4
tu id5             tu id7, origid 5 (modified)
tu id6             tu id6
                   tu id8 (new)

18
xml:tm Author Memory
  • Namespace aware DOM differencing
  • Identify changes from the previous version
  • Unique text unit identifiers are maintained
  • Modification history
  • Text units can be loaded into a database
  • Authoring environment integration
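
The identifier bookkeeping described above can be sketched in a simplified form. The slides describe namespace aware DOM differencing; the example below substitutes a flat sequence diff over text unit strings using Python's `difflib`, which is enough to show how unchanged units keep their ids, modified units get a new id plus an `origid`, and new units get fresh ids. The function name `diff_units`, the dictionary layout and the sample sentences are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def diff_units(old_units, new_texts, next_id):
    """Compare old (id, text) units with the new list of texts.

    Returns (units, deleted_ids). Each unit is a dict with "id", "text",
    "status" and, for modified units, "origid".
    """
    old_texts = [text for _, text in old_units]
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    units, deleted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":                      # same text: keep the old id
            for k in range(i2 - i1):
                units.append({"id": old_units[i1 + k][0],
                              "text": new_texts[j1 + k],
                              "status": "unchanged"})
        elif tag == "delete":
            deleted.extend(uid for uid, _ in old_units[i1:i2])
        else:                                   # "replace" or "insert"
            old_span = old_units[i1:i2]         # empty for "insert"
            for k, text in enumerate(new_texts[j1:j2]):
                unit = {"id": next_id, "text": text, "status": "new"}
                if k < len(old_span):           # pair with the unit it replaces
                    unit["status"] = "modified"
                    unit["origid"] = old_span[k][0]
                units.append(unit)
                next_id += 1
            # old units left without a replacement count as deleted
            deleted.extend(uid for uid, _ in old_span[j2 - j1:])
    return units, deleted


old = [(1, "One."), (2, "Two."), (3, "Three."),
       (4, "Four."), (5, "Five."), (6, "Six.")]
new = ["One.", "Three.", "Four.", "Five, revised.", "Six.", "Eight."]
units, deleted = diff_units(old, new, next_id=7)
for unit in units:
    print(unit)
print("deleted:", deleted)
```

With this sample input the outcome mirrors the differencing slide: unit 2 is deleted, unit 5 reappears as unit 7 with `origid` 5, and a new unit 8 is added.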

19
xml:tm Translation Memory
  • The tm namespace can be used to create XLIFF
    files
  • Automatic alignment of source and target
    languages
  • Allows for more focused translation matching
  • Perfect matching
  • Leveraged matching from document - identical text
  • Leveraged matching from database
  • Modified text unit matching
  • Linguistically enhanced fuzzy matching
  • Non translatable text unit identification

20
xml:tm translation

Source Document    XLIFF Document     Translated Document
tu id1             trans-unit id1     tu id1
tu id2             trans-unit id2     tu id2
tu id3             trans-unit id3     tu id3
tu id4             trans-unit id4     tu id4
tu id5             trans-unit id5     tu id5
tu id6             trans-unit id6     tu id6

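The mapping from text units to XLIFF trans-units can be sketched as follows. The example assumes the XLIFF 1.2 namespace and its basic `xliff/file/body/trans-unit/source` skeleton; the deck does not say which XLIFF version it targets. The function name `build_xliff`, the file name `manual.xml` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"                              # from the example slide
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"     # assumed XLIFF version


def build_xliff(doc_root, original, source_lang, target_lang):
    """Create one XLIFF trans-unit per tm:tu, reusing the tu id."""
    def q(name):
        return f"{{{XLIFF_NS}}}{name}"

    xliff = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(xliff, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, q("body"))
    for tu in doc_root.iter(f"{{{TM_NS}}}tu"):
        unit = ET.SubElement(body, q("trans-unit"), {"id": tu.get("id")})
        ET.SubElement(unit, q("source")).text = "".join(tu.itertext())
    return xliff


# Hypothetical source document with two text units.
source_doc = ET.fromstring(
    '<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">'
    '<tm:tu id="u1.1">Namespace is very simple.</tm:tu> '
    '<tm:tu id="u1.2">It is easy to use.</tm:tu>'
    "</tm:te></para>"
)
xliff_doc = build_xliff(source_doc, "manual.xml", "en", "pl")
print(ET.tostring(xliff_doc, encoding="unicode"))
```

Keeping the same id on the `tm:tu` and the `trans-unit` is what lets the translated text be merged back into the right place later.
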
21
xml:tm translated document
[Diagram: the translated document view (doc, title, section, para) shown beside the translated tm namespace view (tm, te, tu). The structure is identical to the source document; only the content is translated, shown in Polish as "tekst" (text) and "zdanie" (sentence).]

22
xml:tm perfect alignment

Source Document    Translated Document
tu id1             tu id1
tu id2             tu id2
tu id3             tu id3
tu id4             tu id4
tu id5             tu id5
tu id6             tu id6

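Because source and translated documents carry the same text unit ids, alignment reduces to a lookup by id rather than a statistical guess. The sketch below merges translations into a copy of the source tree and then reads aligned pairs back out. The function names `merge_translations` and `align`, the sample document and the placeholder "[pl]" strings are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"        # namespace URI from the example slide
TU = f"{{{TM_NS}}}tu"


def merge_translations(source_root, translations):
    """Return a copy of the source tree with tu content replaced by id."""
    target_root = copy.deepcopy(source_root)
    for tu in target_root.iter(TU):
        translated = translations.get(tu.get("id"))
        if translated is not None:
            for child in list(tu):       # drop any inline markup
                tu.remove(child)
            tu.text = translated
    return target_root


def align(source_root, target_root):
    """Pair source and target text units that share the same id."""
    targets = {tu.get("id"): "".join(tu.itertext())
               for tu in target_root.iter(TU)}
    return [(tu.get("id"), "".join(tu.itertext()), targets.get(tu.get("id")))
            for tu in source_root.iter(TU)]


source_root = ET.fromstring(
    '<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">'
    '<tm:tu id="u1.1">Namespace is very simple.</tm:tu> '
    '<tm:tu id="u1.2">It is easy to use.</tm:tu>'
    "</tm:te></para>"
)
# Placeholder strings stand in for real translations.
translations = {"u1.1": "[pl] sentence 1", "u1.2": "[pl] sentence 2"}
target_root = merge_translations(source_root, translations)
pairs = align(source_root, target_root)
print(pairs)
```
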
23
xml:tm matching
[Diagram: text units of the updated source document (tu id1, id2, id3, id4, id7, id6, id8, id9) paired with the same ids in the matched target document. Labels on the pairs: perfect matching (requires no translation), non translatable, requires translation, fuzzy match, doc leveraged match (new text, same as existing text), DB leveraged match (from the database), requires proofing.]

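The match categories named on this slide can be illustrated with a small classifier. The order in which the checks run, the 0.7 fuzzy threshold, the plain character-based similarity from `difflib` (the deck mentions linguistically enhanced fuzzy matching, which this is not), and the shapes of `previous` and `database` are all assumptions made for the sketch.

```python
import re
from difflib import SequenceMatcher


def classify(unit, previous, database, threshold=0.7):
    """Return (match_type, suggested_target) for one updated text unit.

    unit:     dict with "id", "text" and optionally "origid"
    previous: {id: (source_text, target_text)} from the last translated version
    database: {source_text: target_text} from earlier projects
    """
    text = unit["text"]
    if not re.search(r"[^\W\d_]", text):        # no letters: numbers, symbols
        return "non translatable", text
    old = previous.get(unit["id"])
    if old and old[0] == text:                  # same id, same text
        return "perfect match", old[1]
    for source, target in previous.values():    # identical text elsewhere
        if source == text:
            return "doc leveraged match", target
    if text in database:
        return "DB leveraged match", database[text]
    old = previous.get(unit.get("origid"))
    if old and SequenceMatcher(None, old[0], text).ratio() >= threshold:
        return "fuzzy match", old[1]            # needs proofing by a translator
    return "requires translation", None


previous = {1: ("Hello.", "T1"), 5: ("Press the red button.", "T5")}
database = {"Save the file.": "T-db"}
results = [
    classify({"id": 1, "text": "Hello."}, previous, database),
    classify({"id": 2, "text": "12.5"}, previous, database),
    classify({"id": 8, "text": "Hello."}, previous, database),
    classify({"id": 9, "text": "Save the file."}, previous, database),
    classify({"id": 7, "text": "Press the green button.", "origid": 5},
             previous, database),
    classify({"id": 4, "text": "A brand new sentence."}, previous, database),
]
for result in results:
    print(result)
```

Only the perfect match relies on the unit id; the leveraged and fuzzy categories fall back to comparing text, which is why the slide marks them as requiring proofing.
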
24
Traditional Translation Scenario
[Diagram: traditional workflow split between a Publishing side and a Translation side. Boxes: source text, extracted text, tm process, prepared text, translate, QA, translated text.]

25
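xml:tm Translation Scenario
[Diagram: xml:tm workflow. On the Publishing side, the xml source text passes through an automatic process (extracted text, leveraged matching, tm process, prepared text), and a second automatic process produces the xml target text. The translator works through a web service/interface (Web): translate, QA.]
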
26
xml:tm benefits
  • Enterprise level scalability
  • Totally integrated within the XML framework
  • Source text is automatically extracted and
    matched
  • Word counts are controlled by the customer
  • Text can be presented for translation via the web
  • Online composition
  • The most up to date translation is held by the
    customer
  • Data is merged automatically at end of
    translation cycle
  • All memory operations are totally automated
  • Can be used transparently for relay translations
  • Much cheaper to run
  • More accurate, better matching

27
xml:tm
  • Fully specified XML based standard
  • http://www.xml-intl.com/docs/specification/xml-tm.html
  • Maintained by xml-intl.com
  • http://www.xml-intl.com/dtd/tm.dtd
  • http://www.xml-intl.com/dtd/tm.xsd
  • Detailed article on www.xml.com
  • Offered for consideration as a LISA standard

28
  • Any questions?