An Introduction to W3C

Provided by: TDIL

1
Web Internationalization Initiative
Manoj Jain
Department of Information Technology
Ministry of Communication and IT
Government of India
August 3, 2006
2
W3C and DIT
The Department of Information Technology (DIT)
became a member of the World Wide Web Consortium
(W3C) to ensure adequate representation of Indian
languages and scripts in the various Web
technology standards being evolved by W3C.
3
The Project: Web Internationalization Initiative
With the above objective, DIT initiated the
project Web Internationalization Initiative for
Indian languages. C-DAC regional units (Pune,
Noida, Kolkata and Trivandrum) and the Industry
Consortium for Language Technologies
(CoILTech-MAIT) participate in various activity
working groups to evolve specifications,
guidelines and test suites, and to develop
translations and interoperable technologies for
their assigned cluster of languages. They also
organize sensitization workshops in their regions
and promote the participation of local
industries.
4

Indian Languages/ Scripts
There are 22 constitutionally recognized
languages in India. Apart from these 22
languages, many more dialects are also spoken in
various regions of the country.
These 22 languages use 12 different scripts.
Some scripts serve several languages, e.g. Hindi,
Sanskrit, Marathi, Konkani, Sindhi, Maithili,
Nepali and Dogri use the Devanagari script.
Some languages are written in more than one
script, such as Urdu, Sindhi, Manipuri and
Santhali.

5
WII Project: Implementing agencies and their
assigned languages
6
Web Internationalization Initiative

These centres are to participate in various W3C
activities.
The present focus of the project is to
participate in the Internationalization/
Localization related activities.
Encoding issues with respect to these languages
are being addressed in Unicode forum.

7
Web Internationalization Initiative

CDAC, Pune is working on the Tag Set.
CDAC, Noida has initiated a web based Discussion
Board to build consensus among the experts.
CDAC Kolkata has proposed a three-tier linguistic
markup, which will help in the translation of
tags. It is under discussion at the W3C forum.
CDAC Trivandrum is participating in Device
Independence and XML related activities.

8
... Web Internationalization Initiative

MAIT-COILTech has been assigned the
responsibility of interacting with the Indian IT
industry to get feedback on various issues and
sample implementations.
This also includes interaction with browser
vendors and other web tool manufacturers to
ensure adequate support for Indian languages in
these tools and applications.

9
Localization Standards

Encoding Standards
Input Standards/ Keyboard Managers
Fonts and Rendering
Locale Data
Database Storage and Retrieval

10
Internationalization/ Localization: Some
important issues

Content language: It is very important to declare
the language of the content so that it can easily
be searched, rendered and displayed.
Presentation of the content: The content should
be presented in a way that reflects the cultural
and traditional values of the region.
Images, animation and examples in the content:
Using regional images, animation and examples
makes the content much more user friendly. An
internationalized product should be able to
handle this aspect.

11
... Internationalization/ Localization: Some
important issues

Forms: Databases and scripts that receive data
from FORMs on pages in multiple languages must
also be able to support the characters of all
those languages simultaneously.
This is highly relevant to the e-Gov applications
being developed for Indian languages.
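
A minimal sketch of the form-handling point, assuming the form is submitted and read back as UTF-8 so that several scripts can travel in one request. The field names and values are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, one per script; names are illustrative only.
form = {
    "name_hi": "भारत",       # Devanagari
    "name_ta": "இந்தியா",     # Tamil
    "name_bn": "ভারত",       # Bengali
}

# urlencode percent-encodes each value as UTF-8 bytes by default,
# so the submitted body is plain ASCII.
body = urlencode(form)

# The receiving script decodes with the same encoding and recovers
# every script at once.
decoded = {key: values[0] for key, values in parse_qs(body, encoding="utf-8").items()}
```

Because a single encoding covers all the scripts, the receiving side does not need to know in advance which language a field was filled in with.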

12
WII Project: Tasks undertaken
Character Encoding Issues
Locale Specific Data
Text Formatting Issues
Font Rendering Issues
Indian Language Tag Set
Inputs for Mobile Web Initiative
RFC-3066 for Identification of Languages
Feedback on RFC-3490 (Internationalizing Domain
Names in Applications (IDNA))
RFC-3491, RFC-3492 and RFC-3987 (Stringprep
profile, Punycode and handling of paths for
Internationalized Domain Names (IDN))
Reference Implementations of the draft standard
Speech Synthesis Grammar

13
General Formatting Issues

Absolute/relative positioning, Layering, and
Transparency
Copyfitting
Cropping and Scaling of Images
Hyphenation
Non-rectangular Areas

14
Text Formatting: Indic-specific issues

Alignment of scripts and baseline shifts
Support for automatic alignment of text from
multiple scripts with different alignment rules.
Ability to handle subscripts and superscripts.
Justification/Word and Letter Spacing
Justification/spacing policy controls.
Sorting/Collating/Data processing
Support for sorting and collating data (for
example in index entries, but more generally
wherever it is required for proper presentation).
Support for other kinds of data-processing
functions may be required as well.

15
... Text Formatting: Indic-specific issues

Fonts
Indic scripts raise some further issues when
formatting a document:
Prefix, suffix, and stand-alone glyph variants
No hyphenation (?)
Justification (how it is to be accomplished, e.g.
through the stretching of letters or syllables)
Vowel relocation and/or resequencing

16
ISO 639-1 and ISO 639-2: Codes for the
Representation of Names of Languages

ISO 639-1 and ISO 639-2 are two- and three-letter
codes for the representation of names of
languages.
ISO 639-1 is a two-letter code
For example hi for Hindi and kn for Kannada
ISO 639-2 is a three-letter code
For example mar for Marathi and san for Sanskrit
A few more Indian languages still need to be
assigned codes, such as Bodo, Apbhransh and
Bundelkhandi.
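
The two code lengths can be illustrated with a small lookup. This is only a sketch: the tables hold the four languages named on this slide (with the matching code from the other part of the standard filled in), and the function name is hypothetical.

```python
# Two-letter (ISO 639-1) and three-letter (ISO 639-2) codes for the
# four languages used as examples on this slide.
ISO_639_1 = {"hi": "Hindi", "kn": "Kannada", "mr": "Marathi", "sa": "Sanskrit"}
ISO_639_2 = {"hin": "Hindi", "kan": "Kannada", "mar": "Marathi", "san": "Sanskrit"}


def language_name(code):
    """Return the language name for a 2- or 3-letter code, or None if unknown."""
    code = code.lower()
    table = ISO_639_1 if len(code) == 2 else ISO_639_2
    return table.get(code)
```

A language that has not yet been assigned a code simply has no entry, so the lookup returns None for it.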

17
Language Tags: RFC 3066bis

Language Tags are used to help identify languages
whether spoken, written, signed or otherwise
signaled for the purpose of communication.
Applications, protocols or specifications that
use language tags are often faced with the
problem of identifying sets of content that share
certain language attributes.
A Language Tag consists of a Primary Language
subtag and a series of subsequent subtags, each
of which refines or narrows the range of language
identified by the overall tag.
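
The subtag structure described above can be sketched as a small parser. This is a deliberately simplified illustration, not the full RFC 3066bis grammar: it only recognises a primary language, an optional four-letter script and an optional region, and puts everything else in "other". The function name and the result layout are assumptions.

```python
def parse_language_tag(tag):
    """Split a language tag into primary language, script, region and the rest."""
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None, "region": None, "other": []}
    for sub in subtags[1:]:
        letters = sub.isascii() and sub.isalpha()
        if letters and len(sub) == 4 and result["script"] is None and result["region"] is None:
            # Four letters: a script subtag, conventionally title-cased.
            result["script"] = sub.title()
        elif result["region"] is None and ((letters and len(sub) == 2) or (sub.isdigit() and len(sub) == 3)):
            # Two letters or three digits: a region subtag, conventionally upper-cased.
            result["region"] = sub.upper()
        else:
            result["other"].append(sub)
    return result
```

For a language written in more than one script, such as Sindhi, a tag like sd-Deva-IN narrows the bare tag sd to Sindhi in Devanagari as used in India.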

18
Internationalized Tag Set

This is a set of elements and attributes that
can be used with Document Type Definitions (DTDs)
or Schemas to support internationalization and
localization.
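
As a rough illustration of how such attributes might be consumed, the sketch below collects only the text marked as translatable. It assumes the ITS namespace http://www.w3.org/2005/11/its and its translate attribute; the sample document, the function name and the simplified inheritance rule (a child inherits its parent's setting unless it overrides it) are illustrative assumptions, not part of the presentation.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"   # assumed ITS namespace
TRANSLATE = f"{{{ITS_NS}}}translate"


def translatable_text(elem, inherited=True):
    """Collect text runs whose nearest its:translate setting is not 'no'."""
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    parts = []
    if translate and elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        parts.extend(translatable_text(child, translate))
        # Text after a child belongs to the current element.
        if translate and child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


# Hypothetical sample document.
doc = ET.fromstring(
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<p>Press <code its:translate="no">Ctrl+S</code> to save.</p>'
    '</doc>'
)
segments = translatable_text(doc)
```

Here the key name inside the code element is left out of the text handed to a translator, while the surrounding sentence is kept.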

19
IRI and URI

Work on IRIs and URIs is an important activity
towards internationalization / localization of
the web.
The e-infrastructure division of DIT is working
towards internationalization of domain names.
CDAC centres under the WII project are to provide
feedback on the various IETF RFCs and related
IDNA and IANA specifications, so that these
recommendations support Indian languages
adequately.

20
RFC 3987: Internationalized Resource Identifiers

A Uniform Resource Identifier is a sequence of
characters chosen from a limited subset of the
repertoire of US-ASCII characters.
RFC 3987 defines a new protocol element called
the Internationalized Resource Identifier (IRI)
by extending the syntax of URIs to a much wider
repertoire of characters, covering all the
written scripts of the world.
Indian scripts are complex in nature, so IRIs
need to be studied from the perspective of Indian
languages.
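
The relationship between the two can be sketched by mapping the path of an IRI to a URI path: each non-ASCII character is replaced by the percent-encoded bytes of its UTF-8 form. This is a simplified sketch covering the path only; the function name and the sample path are hypothetical.

```python
from urllib.parse import quote, unquote


def iri_path_to_uri_path(path):
    """Percent-encode the UTF-8 bytes of non-ASCII characters, keeping '/' as is.

    Note: quote() also escapes some ASCII characters (such as spaces),
    which goes slightly beyond the IRI-to-URI mapping; that is acceptable
    for this path-only sketch.
    """
    return quote(path, safe="/")


iri_path = "/विकि/भारत"          # hypothetical Devanagari path
uri_path = iri_path_to_uri_path(iri_path)
restored = unquote(uri_path)      # decoding gives back the original path
```

The resulting URI path contains only US-ASCII characters, so it can pass through software that knows nothing about IRIs.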

21
RFC 3491

RFC 3491 specifies processing rules that will
allow users to enter internationalized domain
names (IDNs) into applications.
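
A small sketch of this preparation step, using the nameprep helper that ships with Python's encodings.idna module (an implementation of the RFC 3491 profile). The sample labels are illustrative.

```python
from encodings.idna import nameprep

# Case differences in what the user typed are folded away,
# so "Example" and "example" name the same label.
prepared_latin = nameprep("Example")

# A Devanagari label with nothing to fold or normalise passes through unchanged.
prepared_devanagari = nameprep("भारत")
```

Only after this step is a label converted to its ASCII form for lookup.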

22
RFC 3454

RFC 3454 (Stringprep) specifies a framework of
processing rules for Unicode text. It is used
mainly in connection with Internationalized
Domain Names.

23
RFC 3492: Punycode Encoding of Unicode for IDNA

Punycode is a transfer encoding syntax designed
for use with Internationalized Domain Names in
Applications (IDNA). It uniquely and reversibly
transforms a Unicode string into an ASCII string.
This is important for the implementation of IDNs
in non-Latin scripts and languages, such as the
Indian languages.
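
The reversible transformation can be tried with the punycode and idna codecs in Python's standard library. The label below is only an example.

```python
label = "भारत"   # example Devanagari label

# Raw Punycode (RFC 3492): an ASCII-only form of the Unicode string.
puny = label.encode("punycode")

# The idna codec applies the preparation step, Punycode and the
# "xn--" prefix that marks an encoded label in the DNS.
ace = label.encode("idna")

# Both directions are lossless.
from_puny = puny.decode("punycode")
from_ace = ace.decode("idna")
```

The ASCII form is what is stored and looked up in the DNS, while applications show the user the Unicode form.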

24
Numeric Character References (NCRs)

Escapes such as NCRs and entities are ways of
representing any Unicode character in markup
using only ASCII characters.
For example, the character á can be written in
X/HTML as &#xE1; or &#225; or &aacute;.
These are useful for unambiguously representing
ambiguous or invisible characters and for
preventing problems with syntax characters such
as ampersands and angle brackets. NCRs can also
be used for otherwise unsupported characters.
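
The three escape forms can be produced and checked with Python's standard library, using á (U+00E1), the character in the example above. This is a small sketch; the variable names are illustrative.

```python
import html

ch = "á"   # U+00E1

# Decimal NCR: let the ASCII encoder substitute a character reference.
decimal_ncr = ch.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Hexadecimal NCR, built from the code point.
hex_ncr = f"&#x{ord(ch):X};"

# Named entity defined by HTML for the same character.
entity = "&aacute;"

# All three forms decode back to the same character.
decoded = [html.unescape(s) for s in (decimal_ncr, hex_ncr, entity)]
```

The same encoder setting works for any character, so text in an Indian script can be carried through an ASCII-only channel as a run of NCRs.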

25
Mobile Web Initiative

W3C Group on Mobile Web Initiative
In India, many people have started using mobile
devices to access the web.
Standard keyboard layouts for entering content in
various Indian languages on mobile devices are
being evolved.

26
Voice Browser and SSML

Voice Browser for Indian languages.
Speech Synthesis Markup Language (SSML): ensuring
the representation of Indian languages.

27
Reference Implementations of the draft standard

The Web Internationalization Initiative project
envisages implementation of the draft W3C
standards for Indian languages and scripts.

28
Other issues

Display and Font Rendering Issues
Keyboard Issues
Transliteration Issues

Thank you

30
Upcoming event

As Bangalore is a major IT hub in India, a
workshop on Internationalization/ Localization is
also planned there for August 24-25, 2006.
More details at www.mait.com
