Title: xml:tm
1 xml:tm
- XML Text Memory
- Using XML technology to reduce the cost of translating XML documents
2 Computational Linguistic Methodologies
- Machine Translation
- Translation Memory
- Hybrid Linguistic Inferencing Engines
- Terminology
3 Automating Translation
- Machine translation
- 40 year history
- Rigorous control of grammar and terminology can produce very good results
- Enormous amount of work left to achieve free format translation
4 Translation Memory
- Align source and target text
- Look up new text against memory
- Relatively primitive technology
- No advance over past 30 years
- Need for proofing
- Proprietary translation memory formats
5 Translating XML Documents
- XML inherently easier to translate
- Separation of form and content
- Support for Unicode and other international encoding formats
- Allows multiple output formats: PDF, XHTML, WAP
6 XML Translation Standards
- LISA - Localization Industry Standards Association http://www.lisa.org
- OASIS - Organization for the Advancement of Structured Information Standards http://www.oasis-open.org
- W3C - World Wide Web Consortium http://www.w3c.org
- OLIF Consortium http://www.olif.net
7 LISA Standards
- TMX - Translation Memory Exchange format http://www.lisa.org/tmx
- TBX - Termbase Exchange format http://www.lisa.org/tbx
- SRX - Segmentation Rules Exchange format http://www.lisa.org/srx
- GMX - GILT Metrics Exchange format http://www.lisa.org/gmx
8 OASIS L10N Standards
- XLIFF - XML Localization Interchange File Format http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
- TransWS - Translation Web Services http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws
- DITA - Darwin Information Typing Architecture http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita
9 W3C and OLIF
- W3C ITS
- http://www.w3.org/International/
- http://www.w3.org/International/its
- OLIF - Open Lexicon Interchange Format http://www.olif.net
10 XML namespace
- Major feature of XML
- Allows the mapping of different ontological entities onto the same representation
- Allows different ways to look at the same data
- Namespaces can be made transparent
11 xml:tm
- XML based text memory
- Revolutionary approach to translating XML documents
- First significant advance in translation memory technology
- Uses XML namespace to transparently embed contextual information
12 xml:tm namespace
- Text Memory namespace
- Can be mapped onto any XML document
- Vertical view of document in terms of text segments
- Can be totally transparent
13 xml:tm namespace
Example of the use of tm namespace in an XML document
<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
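As a rough illustration of how transparent the tm namespace can be, the sketch below reads a completed version of the example above with Python's standard xml.etree.ElementTree. The closing tags added to the sample, the helper names text_units and plain_text, and the exact spelling of the namespace URI are assumptions made for this sketch, not part of the xml:tm specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI as read from the slide example (assumed spelling).
NS = {"tm": "urn:xml-Intl-tm"}

# The slide example, with the closing tags filled in (assumption).
SAMPLE = """<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>"""


def text_units(xml_text: str) -> list[str]:
    """Return the text of every tm:tu element, in document order."""
    root = ET.fromstring(xml_text)
    return [(tu.text or "").strip() for tu in root.iterfind(".//tm:tu", NS)]


def plain_text(xml_text: str) -> str:
    """Return the document text with all markup, tm included, ignored."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    print(text_units(SAMPLE))
    print(plain_text(SAMPLE))
```

The first helper gives the text memory view (one entry per text unit); the second shows that a consumer that ignores the tm elements still sees the original running text.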
14 xml:tm namespace
[Diagram: two side-by-side tree views of the same document. The original document view shows doc, title, section, para and text nodes. The tm namespace view shows a tm root with one te for each text-bearing element, each te holding its sentences as tu elements.]
15 xml:tm namespace
original document view (text)
<para>
Namespace is very simple. It is easy to use.
</para>
tm namespace view (te wraps the text, each sentence is a tu)
<para>
  <tm:te id="e1">
    <tm:tu id="u1.1">Namespace is very simple.</tm:tu>
    <tm:tu id="u1.2">It is easy to use.</tm:tu>
  </tm:te>
</para>
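The mapping from the original view to the tm namespace view could be sketched as follows. This is only an assumption-laden illustration: the regular-expression sentence splitter stands in for proper segmentation rules (such as SRX), the function names are hypothetical, and the paragraph is assumed to contain plain text with no inline elements.

```python
import re
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # assumed namespace URI, read from the slide example
ET.register_namespace("tm", TM_NS)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def add_text_memory(para: ET.Element, te_number: int) -> None:
    """Wrap the text of a para in one tm:te holding one tm:tu per sentence.

    Assumes the para holds only text (no child elements).
    """
    sentences = split_sentences(para.text or "")
    para.text = None
    te = ET.SubElement(para, f"{{{TM_NS}}}te", {"id": f"e{te_number}"})
    for i, sentence in enumerate(sentences, start=1):
        tu = ET.SubElement(te, f"{{{TM_NS}}}tu", {"id": f"u{te_number}.{i}"})
        tu.text = sentence


para = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
add_text_memory(para, 1)
print(ET.tostring(para, encoding="unicode"))
```

The identifiers follow the pattern seen on the slide (e1, u1.1, u1.2); how real identifiers are allocated is not stated in the slides.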
16 xml:tm Text Memory
- Author memory
  - Maintain memory of source text
  - Authoring statistics
  - Authoring tool input
- Translation memory
  - Automatic alignment
  - Maintain perfect link of source and target text
  - Reduce translation costs
17 xml:tm DOM differencing
Source Document    Updated Source Document    Change
tu id="1"          tu id="1"
tu id="2"          (none)                     deleted
tu id="3"          tu id="3"
tu id="4"          tu id="4"
tu id="5"          tu id="7" origid="5"       modified
tu id="6"          tu id="6"
(none)             tu id="8"                  new
18 xml:tm Author Memory
- Namespace aware DOM differencing
- Identify changes from the previous version
- Unique text unit identifiers are maintained
- Modification history
- Text units can be loaded into a database
- Authoring environment integration
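The identifier bookkeeping described on the two slides above can be sketched with a simple comparison of text unit sequences. Note the substitution: this uses difflib.SequenceMatcher over a flat list of sentences rather than a namespace aware DOM comparison, and the function name, the dictionary keys and the sample sentences are all hypothetical.

```python
from difflib import SequenceMatcher

# Previous version: (id, text) pairs. Sample sentences are made up.
OLD = [(1, "A."), (2, "B."), (3, "C."), (4, "D."), (5, "E."), (6, "F.")]
# Updated version: sentence 2 removed, sentence 5 edited, one sentence added.
NEW_TEXTS = ["A.", "C.", "D.", "E changed.", "F.", "G."]


def diff_text_units(old, new_texts):
    """Compare old (id, text) units with the sentences of an updated document.

    Returns (units, deleted_ids). Unchanged units keep their id; modified
    units get a fresh id plus an origid pointing at the unit they replace.
    """
    next_id = max((uid for uid, _ in old), default=0) + 1
    old_texts = [text for _, text in old]
    units, deleted = [], []
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op == "equal":
            for uid, text in old[a1:a2]:
                units.append({"id": uid, "text": text,
                              "status": "unchanged", "origid": None})
            continue
        if op == "delete":
            deleted.extend(uid for uid, _ in old[a1:a2])
            continue
        # "replace" or "insert": pair replaced units one-to-one, rest are new
        for k, text in enumerate(new_texts[b1:b2]):
            paired = op == "replace" and k < a2 - a1
            units.append({"id": next_id, "text": text,
                          "status": "modified" if paired else "new",
                          "origid": old[a1 + k][0] if paired else None})
            next_id += 1
        if op == "replace" and (a2 - a1) > (b2 - b1):
            deleted.extend(uid for uid, _ in old[a1 + (b2 - b1):a2])
    return units, deleted


if __name__ == "__main__":
    units, deleted = diff_text_units(OLD, NEW_TEXTS)
    for unit in units:
        print(unit)
    print("deleted:", deleted)
```

With the sample data the sketch reproduces the pattern on the differencing slide: unit 2 is reported as deleted, the edited unit 5 reappears as unit 7 with origid 5, and the added sentence becomes unit 8.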
19 xml:tm Translation Memory
- The tm namespace can be used to create XLIFF files
- Automatic alignment of source and target languages
- Allows for more focused translation matching
- Perfect matching
- Leveraged matching from document - identical text
- Leveraged matching from database
- Modified text unit matching
- Linguistically enhanced fuzzy matching
- Non translatable text unit identification
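One possible way to order the match types listed above is sketched below. Everything here is an assumption for illustration: the function name classify, the data shapes, the 0.75 threshold, and the use of plain character similarity from difflib in place of linguistically enhanced fuzzy matching. Modified text unit matching is not covered, and the target strings are placeholders rather than real translations.

```python
import re
from difflib import SequenceMatcher

# previous: unit id -> (source text, translation) from the last translated version
PREVIOUS = {
    1: ("Press the red button.", "[pl] Press the red button."),
    2: ("Close the cover.", "[pl] Close the cover."),
}
# database: source text -> translation from earlier projects
DATABASE = {"Switch off the device.": "[pl] Switch off the device."}


def classify(unit_id, text, previous, database, threshold=0.75):
    """Return (match_type, suggested_target_or_None) for one text unit."""
    if not re.search(r"[^\W\d_]", text):
        # No letters at all (numbers, punctuation): nothing to translate.
        return "non-translatable", text
    if unit_id in previous and previous[unit_id][0] == text:
        return "perfect", previous[unit_id][1]
    for source, target in previous.values():
        if source == text:
            return "document leveraged", target
    if text in database:
        return "database leveraged", database[text]
    scored = [(SequenceMatcher(None, text, source).ratio(), target)
              for source, target in database.items()]
    best_ratio, best_target = max(scored, key=lambda p: p[0], default=(0.0, None))
    if best_ratio >= threshold:
        return "fuzzy", best_target
    return "no match", None


if __name__ == "__main__":
    for uid, text in [(1, "Press the red button."), (9, "Close the cover."),
                      (9, "Switch off the devices."), (9, "12.5")]:
        print(text, "->", classify(uid, text, PREVIOUS, DATABASE)[0])
```

The checks run from the most to the least trustworthy source, so a unit whose id and text are both unchanged is never sent to the fuzzy matcher.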
20 xml:tm translation
Source Document    XLIFF Document      Translated Document
tu id="1"          trans-unit id="1"   tu id="1"
tu id="2"          trans-unit id="2"   tu id="2"
tu id="3"          trans-unit id="3"   tu id="3"
tu id="4"          trans-unit id="4"   tu id="4"
tu id="5"          trans-unit id="5"   tu id="5"
tu id="6"          trans-unit id="6"   tu id="6"
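The round trip through XLIFF shown above could look roughly like this. The slides do not say which XLIFF version is used, so the XLIFF 1.2 namespace and the minimal file/body/trans-unit skeleton are assumptions, as are the helper names build_xliff and read_targets and the file name doc.xml.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace (the version is an assumption, not stated in the slides).
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_xliff(units, original, source_lang, target_lang):
    """Create one trans-unit per (id, text) text unit, reusing the tu id."""
    root = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, q("body"))
    for uid, text in units:
        unit = ET.SubElement(body, q("trans-unit"), {"id": str(uid)})
        ET.SubElement(unit, q("source")).text = text
    return root


def read_targets(root):
    """Map trans-unit id -> translated text, for merging back by tu id."""
    targets = {}
    for unit in root.iter(q("trans-unit")):
        target = unit.find(q("target"))
        if target is not None and target.text:
            targets[unit.get("id")] = target.text
    return targets


xliff = build_xliff([(1, "Namespace is very simple."), (2, "It is easy to use.")],
                    "doc.xml", "en", "pl")
print(ET.tostring(xliff, encoding="unicode"))
```

Because each trans-unit carries the id of the tu it came from, the translated text can be written back into the matching tu without any separate alignment step.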
21 xml:tm translated document
[Diagram: the same two tree views as for the source document, now for the translated document. The translated document view shows doc, title, section and para nodes with Polish text ("tekst"). The translated tm namespace view shows the matching tm, te and tu structure with Polish sentences ("zdanie").]
22 xml:tm perfect alignment
Source Document    Translated Document
tu id="1"          tu id="1"
tu id="2"          tu id="2"
tu id="3"          tu id="3"
tu id="4"          tu id="4"
tu id="5"          tu id="5"
tu id="6"          tu id="6"
Perfect alignment
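Since source and translated documents share text unit identifiers, alignment reduces to a lookup by id. The sketch below assumes both documents carry tm:tu elements with matching id attributes. The sample documents reuse the "text"/"tekst" and "sentence"/"zdanie" labels from the diagrams, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tm": "urn:xml-Intl-tm"}  # assumed namespace URI, read from the slide example

SOURCE = """<doc xmlns:tm="urn:xml-Intl-tm"><para><tm:te id="e1">
<tm:tu id="u1.1">text</tm:tu><tm:tu id="u1.2">sentence</tm:tu>
</tm:te></para></doc>"""

TARGET = """<doc xmlns:tm="urn:xml-Intl-tm"><para><tm:te id="e1">
<tm:tu id="u1.1">tekst</tm:tu><tm:tu id="u1.2">zdanie</tm:tu>
</tm:te></para></doc>"""


def units_by_id(xml_text: str) -> dict[str, str]:
    """Map each tm:tu id to its text."""
    root = ET.fromstring(xml_text)
    return {tu.get("id"): (tu.text or "").strip()
            for tu in root.iterfind(".//tm:tu", NS)}


def align(source_xml: str, target_xml: str) -> list[tuple[str, str, str]]:
    """Pair source and target text units that share an id."""
    source = units_by_id(source_xml)
    target = units_by_id(target_xml)
    return [(uid, text, target[uid]) for uid, text in source.items() if uid in target]


if __name__ == "__main__":
    for uid, src, tgt in align(SOURCE, TARGET):
        print(uid, src, "=", tgt)
```

No sentence-level alignment heuristics are needed here, which is the point of keeping the identifiers stable through translation.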
23 xml:tm matching
[Diagram: text units of the updated source document (tu id 1, 2, 3, 4, 7, 6, 8, 9) are each paired with the same-numbered unit in the matched target document and labelled with a match status. Labels in the diagram: perfect matching, requires no translation, non translatable, requires translation, fuzzy match, requires proofing, doc leveraged match, new/same, DB leveraged match (from the DB).]
24 Traditional Translation Scenario
[Diagram: workflow between Publishing and Translation. Labels: source text, Extracted text, tm process, Prepared text, Translate, Translated text, QA.]
25 xml:tm Translation Scenario
[Diagram: workflow starting and ending in Publishing. Labels: xml source text, Automatic Process, Extracted text, tm process, leveraged matching, Prepared text, Web service/interface, Web, Translator, Translate, QA, Automatic Process, xml target text.]
26 xml:tm benefits
- Enterprise level scalability
- Totally integrated within the XML framework
- Source text is automatically extracted and matched
- Word counts are controlled by the customer
- Text can be presented for translation via the web
- Online composition
- The most up to date translation is held by the customer
- Data is merged automatically at end of translation cycle
- All memory operations are totally automated
- Can be used transparently for relay translations
- Much cheaper to run
- More accurate, better matching
27 xml:tm
- Fully specified XML based standard
- http://www.xml-intl.com/docs/specification/xml-tm.html
- Maintained by xml-intl.com
- http://www.xml-intl.com/dtd/tm.dtd
- http://www.xml-intl.com/dtd/tm.xsd
- Detailed article on www.xml.com
- Offered for consideration as a LISA standard