Title: Localization Enablers
1. Localization Enablers
Swaran Lata, Director (slata_at_mit.gov.in)
Technology Development for Indian Languages (TDIL) Programme
Department of Information Technology, Ministry of Communications and Information Technology, Govt. of India
Elitex-2008, January 17, 2008
2. Globalization of IT
3. GLOBALIZATION
4. Focus of National Knowledge Commission of India
- The National Knowledge Commission focuses on the objective of transforming India into a knowledge society.
- It has concentrated on five focus areas of the knowledge paradigm:
- Access
- Creation
- Concepts
- Application
- Services
- Information technology applications, services, tools and resources based on natural language processing techniques would be key enablers for the above five focus areas of the knowledge paradigm.
5. Local area portals for globalizing local knowledge: digitize earlier existing communities
6. Internationalized application to be Localized
Localized Application
7. The Tree of Localization Complexities
Quality Assurance
- Testing methodologies
- Metrics for Linguistic Testing
- Certification by Government for linguistic compliance
Language Technologies
- Machine Translation
- Optical Character Recognition
- Speech Technologies
- Cross Lingual Information Retrieval
Training
- Certified Localization professionals
- PG Specialization in Localization
- PhD Programmes
Locale Data
- Presentation of dates, times, numbers, lists, and other values
- Collation and sorting
- Alternate calendars, which may include holidays, work rules, weekday/weekend
- Currency
- Tax or regulatory regime
Standards
- Encoding Standards
- Multimodal input device standards
- Fonts and Rendering Engines
- Transliteration and Translation
Education and Outreach
- Guidelines
- Best Practices
- Case Studies
- Consultancy
- Showcasing of Tools and Technologies
Localization Tools
- Project Management
- Translation Memory
- Translation Tools
- Natural language tools for text processing: parsing, spell checking, grammar checking, etc.
- Automatic Testing Tools
Shipping issues
- Minimizing Time lag
- Benchmarking w.r.t. English version
- Political sensitivity
- Pricing issues
Linguistic resources
- Parallel Corpora
- Speech Corpora
- Lexical resources
- Ontologies
- Dictionaries
- Thesaurus
- Reference Terminologies
8. Guidelines for enhancing the Localizability
- Design and develop information and applications in a way that meets the needs of the international user
- Design that allows for easy localization at the point of need
- Means to reduce the cost and length of localization
- Checklists grouped by task, and supported by backup examples and explanations, for example browser feature applicability charts
- Which browsers and browser versions support which i18n features (e.g. ruby, bidi, UTF-8, the lang attribute, lang, white-space handling, writing-mode: lr-tb, etc.). This would help us implement pages that use the most up-to-date internationalization features appropriate to our audience without the pain of trial and error (or, perhaps more likely, erring too far on the side of caution).
- Use of constructs in existing markup languages (e.g. (X)HTML) to either enable interoperability in a globalised system or improve the localizability of data, for example avoiding the use of deprecated HTML tags
9.
- I18n considerations applicable to document and UI design also include such things as navigation, screen space and layout, implementing graphics, creating source text, designing interoperable systems, choosing and implementing fonts and complex script rendering, multimedia design, handling data format conventions, supplying data for translation/localization, etc.
- For example, standard icons (allowing for regional variation) to point to a list of (or link to) country/language site selections
- Text-based approaches can be problematic in two ways:
- they may not be understood, which is often why you are going to the selection list (e.g. how would the average American find the 'global sites' link on a page in Arabic or Japanese? These are not made-up examples!)
- they may make the user feel like his/her needs are secondary.
- Separation of localizable data from style sheets and templates, for example use of CSS for separating presentation aspects from the content while designing websites.
- Guidelines focusing on content development, DTD design and stylesheet development relating to implementation in XHTML, XML, XSL, XSLT, CSS, XForms, SVG, and other similar specifications.
10.
- Guidelines for developing internationalized DTDs, such as white space handling, use of markup vs. Unicode control characters, use of alternative content or entities for different markets, provision of metadata to describe document structure for localization tools, provision of information about available space and other aspects of content affected by localization, and the ability to tag terminology and semantics within content
- Language Tags
- RFC 3066 for 'language tagging' in XML and HTML has inherent difficulties in distinguishing between language and dialect, as well as historical variations. The aim is to devise a way of expanding the language tag concept to adequately cover the locale- and script-oriented needs of the localization community, and to incorporate markup to support international script features (such as ruby and Arabic directionality)
- Internationalization tag set
- Develop a set of tags that others could use for creating DTDs
- In the form of a namespace for inclusion in a schema, or simply a partial DTD and set of recommendations.
- Methodology for identifying non-translatable content for automatic identification by the localization tool
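As a rough illustration of the last point, the sketch below shows one way a localization tool might skip content marked as non-translatable. The `translate="no"` attribute, the sample document, and the function name `translatable_segments` are assumptions made for this example; the slides do not prescribe a particular markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: translate="no" marks content that a
# localization tool should leave untouched (an assumed convention).
SAMPLE = """<doc>
  <p>Press the <code translate="no">Ctrl</code> key to continue.</p>
  <p translate="no">TDIL-2008</p>
  <p>Welcome</p>
</doc>"""


def translatable_segments(elem, inherited=True):
    """Collect text segments that should be sent for translation.

    The translate flag is inherited from the parent element unless the
    element sets its own translate attribute.
    """
    flag = elem.get("translate")
    translate = inherited if flag is None else (flag != "no")
    segments = []
    if translate and elem.text and elem.text.strip():
        segments.append(elem.text.strip())
    for child in elem:
        segments.extend(translatable_segments(child, translate))
        # Text following a child element belongs to the current element,
        # so it follows the current element's flag, not the child's.
        if translate and child.tail and child.tail.strip():
            segments.append(child.tail.strip())
    return segments


segments = translatable_segments(ET.fromstring(SAMPLE))
for segment in segments:
    print(segment)
```

Here the key name `Ctrl` and the identifier `TDIL-2008` are left out of the extracted segments, while the surrounding sentence text is still offered for translation.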
11. Internationalized data formats
- Time and date formats are just two of many cases in which people represent the same or similar information differently. Other examples include numbers, currencies, temperatures, weights, dimensions, addresses, telephone numbers, personal names, paper sizes, etc.
- It would be great if there were a way of capturing this information in a non-culturally-specific way and rendering and (more difficult) recognising it automatically in a culture-specific format, which could be used by people implementing web-based communication, be it web page forms or exchange of information between machines.
- The work involved in this is not trivial, but it is desperately needed. Whether the W3C should attempt to produce this or work with others to achieve it is for discussion, but either way I believe it would be very useful.
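A minimal sketch of the idea, assuming values are stored in a culture-neutral form (an ISO 8601 date string and a plain decimal number) and rendered per locale only at display time. The `CONVENTIONS` table and the function names are hypothetical and hand-written for illustration; a real system would rely on a maintained locale data repository.

```python
from datetime import date
from decimal import Decimal

# Hypothetical convention table, for illustration only.
CONVENTIONS = {
    "en-US": {"date": "{m:02d}/{d:02d}/{y}", "grouping": "thousands",
              "group_sep": ",", "decimal_sep": "."},
    "en-IN": {"date": "{d:02d}/{m:02d}/{y}", "grouping": "indian",
              "group_sep": ",", "decimal_sep": "."},
    "de-DE": {"date": "{d:02d}.{m:02d}.{y}", "grouping": "thousands",
              "group_sep": ".", "decimal_sep": ","},
}


def group_digits(digits, style):
    """Split a digit string into display groups, left to right."""
    if style == "indian":
        # Last three digits form one group, the rest are grouped in twos.
        if len(digits) <= 3:
            return [digits]
        head, tail = digits[:-3], digits[-3:]
        groups = []
        while head:
            groups.insert(0, head[-2:])
            head = head[:-2]
        return groups + [tail]
    groups = []
    while digits:
        groups.insert(0, digits[-3:])
        digits = digits[:-3]
    return groups


def render_number(value, tag):
    """Render a Decimal using the conventions assumed for a locale tag."""
    conv = CONVENTIONS[tag]
    sign = "-" if value < 0 else ""
    whole, _, frac = format(abs(value), "f").partition(".")
    text = conv["group_sep"].join(group_digits(whole, conv["grouping"]))
    if frac:
        text += conv["decimal_sep"] + frac
    return sign + text


def render_date(iso_text, tag):
    """Render an ISO 8601 date string using the locale's date pattern."""
    d = date.fromisoformat(iso_text)
    return CONVENTIONS[tag]["date"].format(d=d.day, m=d.month, y=d.year)


# The same neutral record rendered for three audiences.
record = {"date": "2008-01-17", "amount": Decimal("1234567.5")}
for tag in CONVENTIONS:
    print(tag, render_date(record["date"], tag), render_number(record["amount"], tag))
```

This covers only the rendering direction. Recognising culture-specific input, which the slide calls the more difficult half, is not attempted here.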
12. Thank You