Title: Internationalization & Localization
1Internationalization, Localization & Unicode
- Karunesh Arora
- Vijay Gugnani
- C-DAC Noida
2Everyone has the right... to seek, receive and
impart information and ideas through any media
regardless of frontiers -- Universal
Declaration of Human Rights
3Internationalization
- Internationalization, often referred to as i18n, is the practice of designing and developing an application, product, or document in a way that makes it easily localizable for target audiences that vary in culture, region, or language.
4Why Internationalization?
- To remove barriers to local and international access
- Adaptation to local, regional, linguistic or cultural needs
- To provide global reach
- ROI, revenue generation
5Internationalization Vs. Localization
- Localization is the actual adaptation to meet the language, cultural, and other requirements of a specific target audience.
- While internationalization gives us the technology and tools to target a given audience, it is the act of localization that makes the product accessible.
6What goes with localization?
- Localization is much more than translation.
- Specifically, localization refers to adaptation to another language, which involves appropriate:
- Language translation
- Locale transformation and cultural aspects
7Language Translation
Languages and Countries
- Most languages are used in many countries, not just those where they are dominant or official
- People migrate and take languages with them
- Over enough time, most languages evolve differently in different locations
8Scripts and Languages
Language Translation
- A script may be defined as a collection of related characters
- It is common for several languages to share most, but not all, characters from a given script
- Scripts are often given the same name as one of the languages that use them
- e.g. the Arabic script is used for the Arabic, Farsi, and Urdu languages
- Scripts may also be given a common name for a group of languages
- e.g. the Devanagari script is used for Hindi, Marathi, Nepali, Konkani, etc.
9Language Translation
Some Points to consider
- Identify translatable and non-translatable strings
- Gender and number agreement, ordering of segments in a sentence
- e.g. Page number ->
- e.g. Number of pages ->
- Many languages can take at least 30% more space
- e.g. Tool - ????? (HI); ?????? - customer (EN)
- Design should be compatible, or else the UI may have to be redesigned
- Narrow columns often cannot accommodate long target-language equivalent words
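The points on translatable strings and segment ordering can be sketched in a few lines. The catalog `CATALOG` and the helpers `translate` and `expansion_ratio` are hypothetical names introduced here, not part of the slides, and the Hindi template is only an illustrative translation. Named placeholders let each translation put the segments in the order its grammar needs, while the key `page_of` is a non-translatable identifier.

```python
# Hypothetical message catalog: keys are non-translatable identifiers,
# values are translatable templates with named placeholders.
CATALOG = {
    "en": {"page_of": "Page {page} of {total}"},
    # Illustrative Hindi translation; the segments appear in reverse order.
    "hi": {"page_of": "{total} में से पृष्ठ {page}"},
}


def translate(key: str, lang: str, **params: object) -> str:
    """Look up the template for `lang` and fill in its named placeholders."""
    template = CATALOG[lang][key]
    return template.format(**params)


def expansion_ratio(source: str, target: str) -> float:
    """Length of the translated text relative to the source text."""
    return len(target) / len(source)


if __name__ == "__main__":
    print(translate("page_of", "en", page=3, total=10))
    print(translate("page_of", "hi", page=3, total=10))
```

A check such as `expansion_ratio(source, target) > 1.3` could flag strings that are likely to overflow a layout sized only for the source language.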
10Language Translation
Some Points to consider Contd.
- Avoid ambiguous phrases
- e.g. "Display options" can be read in two ways
- As noun + noun: options of the display
- As verb + noun: show the options (all of them)
- Proverbs and metaphors may not have equivalents in the target language
- Keep Web pages and paragraphs short.
- Avoid text in graphics.
- Use simple grammatical structures.
- Use everyday language.
- Provide clues.
11Language Translation
Some Points to consider Contd.
- Follow source language conventions.
- Avoid acronyms.
- Abbreviations may have to be expanded when translated
- Check spelling and grammar.
- The more compact the source writing, the longer the translation
- Brief translators about the purpose and target audience
- All items in a menu or set of check boxes should have the same grammatical structure
12Locale
- A set of parameters that defines the user's language, country, and cultural preferences
13Different aspects of locale
- Names & titles
- Calendars
- Numeric, date and time formats, addresses
- Currencies, paper size, weights & measures
- Input mechanism
- Language selection
- Oral pronunciation
14Titles and Names
- In India, it is required to specify titles (..., etc.); these titles do not necessarily translate
- Family name is not always last (in the south-west part of the country)
- Sorting can be based on last name or first name
- Salutations in letters (e.g. Dear) are different in different locales
15Titles and Names
Source: Delhi Press Prakashan
16Calendars
- The Gregorian calendar should not always be assumed
- Proper localization of some software requires the use (at least as an option) of calendars distinct to a culture
- e.g. Vikram Samvat / Saka / Hijri calendars in India
- Calendars of various religions, where year 0 was not 2006 years ago
- Fiscal-year based calendars vary widely
- Some have 13 months of 28 days (364 days) or 53 weeks
17Date formats
- Date separators depend on locale: "/", "-", "."
- "am" and "pm" are not used universally (many cultures use a 24-hour clock)
- ISO standard dates are unambiguous: yyyy-mm-dd hh:mm:ss
- A non-ISO date such as 01-03-02 means different things in different locales.
- If not using ISO, then display dates in the locale of the user
- Preferably use a long form with the month spelled out (in the correct language)
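The ambiguity of a date like 01-03-02 can be shown with a minimal sketch that parses the same digits under three field orders and prints each result in ISO form. The variable and function names are illustrative, and the two-digit year is resolved by Python's own `%y` rule.

```python
from datetime import date, datetime

AMBIGUOUS = "01-03-02"

# The same digits parsed under three common field orders.
as_day_month_year = datetime.strptime(AMBIGUOUS, "%d-%m-%y").date()
as_month_day_year = datetime.strptime(AMBIGUOUS, "%m-%d-%y").date()
as_year_month_day = datetime.strptime(AMBIGUOUS, "%y-%m-%d").date()


def to_iso(d: date) -> str:
    """Render a date in the unambiguous ISO yyyy-mm-dd form."""
    return d.isoformat()


if __name__ == "__main__":
    for label, parsed in [
        ("day-month-year", as_day_month_year),
        ("month-day-year", as_month_day_year),
        ("year-month-day", as_year_month_day),
    ]:
        print(f"{AMBIGUOUS} read as {label}: {to_iso(parsed)}")
```

The three readings are three different calendar days, which is why storing and exchanging dates in ISO form and localizing only at display time is the safer design.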
18Formatting Numbers
- Number formatting depends on the locale, not on the language of the application
- Group separation
- Number of digits in a group
- In English and ISO it is 3, while for Indic languages it is different: 1,23,456, i.e. the last three digits form one group and the remaining digits are grouped in twos
- Group separator
- In English ",", but ISO uses a space, and some locales use "." or none
- Decimal separator: "." or ","
- Negative symbol: "-" or "()"
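The difference between three-digit grouping and Indic grouping can be sketched as follows. The function `group_digits` and its `style` parameter are hypothetical names chosen for this illustration; a production system would normally take these rules from locale data rather than hard-code them.

```python
def group_digits(number: int, style: str = "western", separator: str = ",") -> str:
    """Insert group separators into an integer.

    style="western": groups of three digits (123,456).
    style="indic":   last three digits, then groups of two (1,23,456).
    """
    sign = "-" if number < 0 else ""
    digits = str(abs(number))

    # The rightmost group always holds up to three digits.
    head, groups = digits[:-3], [digits[-3:]]
    step = 2 if style == "indic" else 3

    # Peel further groups off the right-hand end of the remaining digits.
    while head:
        groups.insert(0, head[-step:])
        head = head[:-step]

    return sign + separator.join(groups)


if __name__ == "__main__":
    print(group_digits(123456))                 # western grouping
    print(group_digits(123456, style="indic"))  # Indic grouping
    print(group_digits(123456, separator=" "))  # space as group separator
```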
19Currency
- Use the currency symbol of the data
- i.e. INR doesn't automatically translate to another currency symbol when the locale changes
- Format depends on the user's locale, not the currency
- Differences in formats
- Symbol
- Position (before or after the amount)
- Blanks separating the symbol from the data
20Currency contd
- Different ways of expressing Rs. 1000
- Rs.1000 or Rs. 1000/- or Rs.1,000/- or Rs. 1000.00
- INR 1000
- 1000 Rupees, 1000 ?????
- Strong currencies like the Indian rupee need decimal precision (e.g. 2 digits after the decimal point for paisa)
21Language selection
- Avoid using national flags to choose the preferred language
- Multiple countries use the same language
- In what order should the language choices be displayed?
- In which language should the language names be displayed?
- In the language itself, or with a translation in the default language of the operating system
22Pronunciation
- Important for speech-based systems
- Higher recognition accuracy can be obtained by tailoring voice input to regional dialects
- Voice output in the wrong dialect can make an application sound foreign
- Applications that support regional dialects have a better impact
23Culture
- Culture is a complex collection of experiences which condition daily life
- It includes
- history
- social structure
- geographical effects
- religion
- traditional customs and everyday usage
24Cultural issues
- Icons, symbols and images
- Colors, myths, beliefs and feelings
- Humour
- Geographical & environmental effects
- Customs & traditions
- Social Security Numbers
25Icons & Symbols
- Icons that are a play on words do not translate
- e.g.
- A dust bin for dumping files
- A rocket for launching an application
- Scissors for cutting in an edit operation
- B, I, U for bold, italic, and underline
- Some concepts have been found extremely hard to represent as an icon
- e.g. sorting (A->Z is not universal)
- Images of people or body parts such as hands
- Considered inappropriate in some cultures
- What skin color do you use?
- Images of people need to be localized for each country
26Colors & Humour
- The color white may represent purity and green prosperity in the Indian context, but it may not be the same in another culture.
- Humour generally does not translate
- People are sensitive to different things in different cultures
- Jokes/cartoons can be offensive
27Customs & Traditions
- In Indian culture, people show respect to their elders and renowned personalities by addressing them in the plural.
- e.g. Dr. Manmohan Singh is the prime minister of India.
- ??. ?????? ???? ???? ?? ???????????? ????
- Similarly, in social relationships, there are several words to address a relation
- e.g. for uncle - ????, ???, ????
28Unicode?
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
Source: http://unicode.org
29Universal Character Encoding
- Unique number for every character
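The "unique number" is the character's code point. A minimal sketch using Python's standard `unicodedata` module prints the code point and official name of a Latin, a Devanagari, and an Arabic letter; the helper name `describe` is an assumption made for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # Latin, Devanagari, and Arabic letters side by side in one string.
    for ch in "A\u0915\u0627":
        print(describe(ch))
```

All three characters live in one string with no mode switching, which is the property the following slides build on.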
30Unifies all Languages
- 96 thousand characters, so far
- All characters accessible at the same time, in the same document
- ?, ?, ?
31Wide Spread Support
- Developed & supported by industry leaders
- Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, etc.
- Supported in standards
- XML, HTML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, Perl, etc.
- Implemented in
- All modern operating systems, browsers, and other products
32IDN
33Information about Unicode
- www.unicode.org
- Online Standard
- Technical Reports
- FAQs
- General Information
- Discussion Forums, Conferences
34Resources Availability
- System APIs
- Windows, Java, Unix, Oracle, DB2, Sybase, Mac, Linux
- Languages
- Java, JavaScript, C, C++, C#, Perl 5.6.0, SQL
- Cross-platform libraries
- ICU, Rosette
35Indic Support in Unicode
- ISCII is the basis for characters and allocation
- DIT is a member of the Consortium
- Reports have been submitted on missing
characters, clarifications or corrections of usage
36ISCII Similarities
- Within a script, layout and contents are nearly identical
- Independent & dependent vowels
- Halant model for representing conjuncts
- conjuncts / half-forms not directly encoded
- represented by sequences instead
- Phonetic sequence order in syllables
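The halant model can be illustrated with a short sketch: a conjunct is not one encoded character but a sequence of consonant, halant (named VIRAMA in Unicode), and consonant. The conjunct chosen here (KA + VIRAMA + SSA) and the helper `code_point_names` are this example's own choices, not taken from the slides.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (the halant)
SSA = "\u0937"     # DEVANAGARI LETTER SSA

# The conjunct is stored as three code points in phonetic order;
# a font renders them as a single combined shape.
conjunct = KA + VIRAMA + SSA


def code_point_names(text: str) -> list[str]:
    """List the Unicode name of every code point in `text`."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    print(len(conjunct), "code points")
    for name in code_point_names(conjunct):
        print(name)
```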
37ISCII Differences
- Unicode is stateless
- No shifting to get different scripts
- Each character has a unique number
- Unicode is uniform
- No extension bytes necessary
- All characters coded in the same space
38Advantages
- Accessible Information across the globe
- Seamless multilingual documents
- Opens up software export market, beyond English
- Connects India to the world
39The Future
- The world is moving rapidly to Unicode
- Unicode makes India open to the world
- The world comes to you, and
- You go to the world
40Multiple Forms
- UTF-8: maximal compatibility with 8-bit systems
- UTF-16: good storage, interoperability with Windows/Java
- UTF-32: simplest processing
- Fast, lossless conversion
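A small sketch can compare the three encoding forms on the same text and confirm that conversion is lossless. The sample string (the word "Hindi" in Devanagari) and the helper names are assumptions of this example; the little-endian codec variants are used so that no byte-order mark is added to the byte counts.

```python
# "Hindi" written in Devanagari: six code points.
TEXT = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Little-endian variants are used so no byte-order mark inflates the sizes.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict[str, int]:
    """Number of bytes the text occupies in each encoding form."""
    return {name: len(text.encode(name)) for name in ENCODINGS}


def round_trips(text: str) -> bool:
    """True if encoding then decoding returns the original text in every form."""
    return all(text.encode(name).decode(name) == text for name in ENCODINGS)


if __name__ == "__main__":
    print(encoded_sizes(TEXT))
    print("lossless:", round_trips(TEXT))
```

For Devanagari text each code point takes 3 bytes in UTF-8, 2 in UTF-16, and 4 in UTF-32, which matches the slide's remark that UTF-16 gives good storage.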
41W3C Internationalization Activity
42Some Issues under discussion in IL
- Presentation / Styling issues
- Styling of first character
- If a styling feature is to be applied to the starting character, it must be decided whether it applies to a single character, a conjunct character, a syllable, or a grapheme cluster.
- e.g.
?????? (Position) ???????? (Departure)
???? (Vowel) ??? (Dictionary)
????? (Hindi)
?????? (Hindi)
????????? (Regional)
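The choice between styling one code point and styling a larger unit can be made concrete with a rough sketch. The function `first_cluster` below is a simplified approximation written for this example, not the Unicode grapheme cluster algorithm: it keeps combining marks with the first letter and continues across a Devanagari halant so that a conjunct stays together. The sample word (Hindi for "position") is spelled out by this example as escape sequences.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)


def first_cluster(text: str) -> str:
    """Approximate the first orthographic unit of `text`.

    Simplified rule (an assumption of this sketch): take the first code
    point, then keep absorbing combining marks, and also absorb whatever
    follows a virama so a conjunct is not split.
    """
    if not text:
        return ""
    end = 1
    while end < len(text):
        if unicodedata.category(text[end]).startswith("M"):
            end += 1  # vowel signs, virama, and other combining marks
        elif text[end - 1] == VIRAMA:
            end += 1  # consonant joined to the previous one by the halant
        else:
            break
    return text[:end]


if __name__ == "__main__":
    word = "\u0938\u094d\u0925\u093f\u0924\u093f"  # SA VIRAMA THA I TA I
    print("first code point:", word[0])
    print("first cluster:   ", first_cluster(word))
```

Styling only `word[0]` would break the conjunct apart, whereas styling the four code points returned by `first_cluster` keeps the first syllable intact.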
43Some Issues under discussion in IL
- Presentation / Styling issues
- Styling of first character
44Some Issues under discussion in IL
- Presentation / Styling issues
- In cursive text, like Arabic and Urdu, the styling is applied to the whole word
- e.g. Saabiq -> Former (Urdu)
Source: Rashtriya Sahara
45Some Issues under discussion in IL
- Presentation / Styling issues
- Vertical arrangement of characters
- If a string is written in vertical mode, then writing each character on a new line may not be suitable
http://www.w3.org/International/notes/firstletter.html
46Some Issues under discussion in IL
- Presentation / Styling issues
- Horizontal spacing
47Some Issues under discussion in IL
- Presentation / Styling issues
- Bullets and numbers
- Numbering schemes should also be supported for Indian languages.
48Some Issues under discussion in IL
- Presentation / Styling issues
- Collation
- A means to search and order data in a way that makes sense to users in their particular culture
- Myths
- One collation is good enough
- Unicode-enabled sorting is already covered
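The myth that one collation is good enough can be illustrated with a minimal sketch: sorting by raw code point gives an order that users rarely expect. Here `str.casefold` is only a stand-in for a culture-specific sort key (an assumption of this example); real locale-aware collation would normally come from a library such as ICU, which the deck lists under cross-platform libraries.

```python
WORDS = ["banana", "Zebra", "apple"]


def by_code_point(words: list[str]) -> list[str]:
    """Default sort: compares raw code points, so capitals come first."""
    return sorted(words)


def by_caseless_key(words: list[str]) -> list[str]:
    """Sort with a custom key; a stand-in for a locale-specific collation key."""
    return sorted(words, key=str.casefold)


if __name__ == "__main__":
    print(by_code_point(WORDS))
    print(by_caseless_key(WORDS))
```

The two orders differ even for plain English words; for Indian languages the expected order also varies by language within the same script, so a sort key has to be chosen per locale.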
49Some Issues in Indian Languages
- Presentation / Styling issues
50Some Issues under discussion in IL
- Presentation issues
- Underlining of the characters
- ???? ?????? ??? ?? ??????
51Some Issues
- Searching issues
- Searching is problematic across languages that share the same script, where some words are spelled the same but differ in meaning
52Issues on presentation on other devices
- Addressing input mechanisms and predictive input for vernacular languages
- Handling display issues on hand-held devices with smaller screens, in cases of translation
- Standardizing encodings in communication to take care of the cost of bandwidth (ISCII / Unicode / Compressed Unicode), connectivity, and on-the-fly conversion of encodings
53References and acknowledgements
- http://www.w3.org/international
- Articles by Richard Ishida, Felix Sasaki, W3C
- http://macchiato.com/slides/UnicodeAndIndia.ppt, presentation by Mark Davis
- www.site.uottawa.ca/ftppub/courses/Winter/csi5122/coursenotes/5122Internationalization.ppt