An Introduction to W3C

Slides: 31
Provided by: TDIL


Transcript and Presenter's Notes



1
Web Internationalization Initiative
Manoj Jain
Department of Information Technology
Ministry of Communication and IT
Government of India
August 3, 2006
2
W3C and DIT
The Department of Information Technology (DIT)
became a member of the World Wide Web Consortium
(W3C) to ensure adequate representation of Indian
languages/ scripts in the various Web technology
standards being evolved by W3C.
3
The project: Web Internationalization Initiative
With the above-mentioned objective, DIT initiated
the project "Web Internationalization Initiative"
for Indian languages. C-DAC regional units (Pune,
Noida, Kolkata and Trivandrum) and the Industry
Consortium for Language Technologies
(CoILTech-MAIT) participate in various activity
working groups to evolve specifications,
guidelines and test suites, and to develop
translations and interoperable technologies for
their assigned cluster of languages. They also
organize sensitization workshops in their regions
and promote the participation of local
industries.
4
Indian Languages/ Scripts
  • There are 22 constitutionally recognized
    languages in India. Apart from these 22
    languages, many more dialects are spoken in
    various regions of the country.
  • These 22 languages use 12 different scripts.
    Several languages share a single script, e.g.
    Hindi, Sanskrit, Marathi, Konkani, Sindhi,
    Maithili, Nepali and Dogri use the Devanagari
    script.
  • Some languages are written in more than one
    script, such as Urdu, Sindhi, Manipuri and
    Santhali.

5
WII Project: Implementing agencies and their
assigned languages
6
Web Internationalization Initiative
  • These centres are to participate in various W3C
    activities.
  • The present focus of the project is on
    participating in Internationalization/
    Localization related activities.
  • Encoding issues with respect to these languages
    are being addressed in the Unicode forum.

7
Web Internationalization Initiative
  • CDAC, Pune is working on the Tag Set.
  • CDAC, Noida has initiated a web-based Discussion
    Board to build consensus among the experts.
  • CDAC, Kolkata has proposed a three-tier
    linguistic markup that will help in the
    translation of tags. This is under discussion at
    the W3C forum.
  • CDAC Trivandrum is participating in Device
    Independence and XML related activities.

8
... Web Internationalization Initiative
  • MAIT-CoILTech has been assigned the
    responsibility of interacting with the Indian IT
    industry to get feedback on various issues and
    sample implementations.
  • This also includes interaction with the makers
    of various browsers and other web tools to
    ensure adequate support for Indian languages in
    these tools and applications.

9
Localization Standards
  • Encoding Standards
  • Input Standards/ Keyboard Managers
  • Fonts and Rendering
  • Locale Data
  • Database Storage and Retrieval

10
Internationalization/ Localization: Some
important issues
  • Content language: It is very important to
    declare the language of the content so that it
    can easily be searched, rendered and displayed.
  • Presentation of the content: The content should
    be presented in a way that reflects the cultural
    and traditional values of the region.
  • Images, animation and examples in the content:
    Using regional images, animation and examples
    makes the content friendlier to the viewer/
    user. An internationalized product should be
    able to handle this aspect.

11
... Internationalization/ Localization: Some
important issues
  • Forms: Databases and scripts that receive data
    from FORMs on pages in multiple languages must
    be able to support the characters of all
    those languages simultaneously.
  • This is highly relevant to the e-Gov
    applications being developed for Indian
    languages.
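
As a rough illustration of the point about forms, the sketch below stores and restores text in several scripts through a single encoding. UTF-8 is an assumption here (the slides only refer to Unicode), and the field names and sample values are hypothetical.

```python
# Hypothetical form submission containing text in several scripts.
form_fields = {
    "name_hi": "भारत",      # Devanagari
    "name_ta": "இந்தியா",   # Tamil
    "name_ur": "بھارت",     # Perso-Arabic
}

# Assumption: the database column stores raw UTF-8 bytes.
stored = {key: value.encode("utf-8") for key, value in form_fields.items()}

# Reading the bytes back with the same encoding restores every script intact.
restored = {key: raw.decode("utf-8") for key, raw in stored.items()}

print(restored == form_fields)
```

Because one encoding covers all the scripts, the receiving script does not need to know in advance which language a given field was typed in.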

12
WII Project: Tasks undertaken
  • Character Encoding Issues
  • Locale Specific Data
  • Text Formatting Issues
  • Font Rendering Issues
  • Indian Language Tag Set
  • Inputs for Mobile Web Initiative
  • RFC-3066 for Identification of Languages
  • Feedback on RFC-3490 (Internationalizing Domain
    Names in Applications (IDNA))
  • RFC-3491, RFC-3492 and RFC-3987 (Stringprep
    profile for Internationalized Domain Names
    (IDN), Punycode, and Internationalized Resource
    Identifiers for handling paths)
  • Reference Implementations of the draft standard
  • Speech Synthesis Grammar

13
General Formatting Issues
  • Absolute/relative positioning, Layering, and
    Transparency
  • Copyfitting
  • Cropping and Scaling of Images
  • Hyphenation
  • Non-rectangular Areas

14
Text Formatting: Indic-specific issues
  • Alignment of scripts and baseline shifts
  • Support for automatic alignment of text from
    multiple scripts with different alignment rules.
    Ability to handle subscripts and superscripts.
  • Justification/Word and Letter Spacing
  • Justification/spacing policy controls.
  • Sorting/Collating/Data processing
  • Support for sorting and collating data (for
    example in index entries, but more generally
    wherever it is required for proper presentation).
    Support for other sorts of data-processing
    functions may be required as well.

15
... Text Formatting: Indic-specific issues
  • Fonts
  • Indic languages are script-based languages. Some
    other issues with formatting a document in
    these languages are:
  • Prefix, suffix, and stand-alone glyph variants
  • No hyphenation (?)
  • Justification (how it is accomplished through the
    stretching of letters or syllables)
  • Vowel relocation and/or resequencing

16
ISO 639-1 and ISO 639-2: Codes for the
Representation of Names of Languages
  • ISO 639-1 and ISO 639-2 are two-letter and
    three-letter codes for the representation of
    names of languages.
  • ISO 639-1 is a two-letter code
  • For example, hi for Hindi and kn for Kannada
  • ISO 639-2 is a three-letter code
  • For example, mar for Marathi and san for Sanskrit
  • A few more Indian languages still need to be
    assigned codes, such as Bodo, Apbhransh and
    Bundelkhandi.
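
A minimal sketch of how the two code families sit side by side, limited to the four languages named above. The table and the helper function are hypothetical names used for illustration; a real application would load the complete ISO 639 code lists instead.

```python
# Illustrative table: language name -> (two-letter code, three-letter code).
# Only the languages mentioned on this slide are included.
LANGUAGE_CODES = {
    "Hindi": ("hi", "hin"),
    "Kannada": ("kn", "kan"),
    "Marathi": ("mr", "mar"),
    "Sanskrit": ("sa", "san"),
}


def code_for(name, letters=2):
    """Return the two-letter or three-letter code for a language name."""
    two, three = LANGUAGE_CODES[name]
    return two if letters == 2 else three


print(code_for("Hindi"))       # hi
print(code_for("Marathi", 3))  # mar
```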

17
Language Tags: RFC 3066bis
  • Language Tags are used to help identify languages
    whether spoken, written, signed or otherwise
    signaled for the purpose of communication.
  • Applications, protocols or specifications that
    use language tags are often faced with the
    problem of identifying sets of content that share
    certain language attributes.
  • A Language Tag consists of a Primary Language
    subtag and a series of subsequent subtags, each
    of which refines or narrows the range of language
    identified by the overall tag.
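
The subtag structure described above can be sketched as follows. This is a simplified illustration and not a validating parser: it only recognises script and region subtags by their length, the function name is hypothetical, and the sample tags are chosen for illustration.

```python
def split_language_tag(tag):
    """Split a language tag into a primary language subtag and later subtags.

    Simplified: a 4-letter subtag is treated as a script, a 2-letter or
    3-digit subtag as a region, and anything else is kept under "other".
    """
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None,
              "region": None, "other": []}
    for sub in subtags[1:]:
        is_script = len(sub) == 4 and sub.isalpha()
        is_region = (len(sub) == 2 and sub.isalpha()) or \
                    (len(sub) == 3 and sub.isdigit())
        if is_script and result["script"] is None and result["region"] is None:
            result["script"] = sub.title()
        elif is_region and result["region"] is None:
            result["region"] = sub.upper()
        else:
            result["other"].append(sub)
    return result


# Hindi written in Devanagari as used in India.
print(split_language_tag("hi-Deva-IN"))
# Sindhi written in the Arabic script as used in India.
print(split_language_tag("sd-Arab-IN"))
```

Each further subtag narrows the range identified by the primary language subtag, which is why the script subtag is useful for languages written in more than one script.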

18
Internationalized Tag Set
  • This is a set of elements and attributes that
    can be used with Document Type Definitions
    (DTDs)/ Schemas to support internationalization/
    localization.

19
IRI and URI
  • IRIs and URIs are an important area of activity
    towards internationalization/ localization of
    the web. The e-infrastructure division of DIT is
    working towards internationalization of domain
    names.
  • CDAC centres under the WII project are to provide
    feedback on the various RFCs issued by IETF,
    IANA etc. (including those on IDNA), so that
    these recommendations adequately support Indian
    languages.

20
RFC 3987: Internationalized Resource Identifiers
  • A Uniform Resource Identifier (URI) is a sequence
    of characters chosen from a limited subset of
    the US-ASCII character repertoire.
  • RFC 3987 defines a new protocol element called
    the Internationalized Resource Identifier (IRI)
    by extending the syntax of URIs to a much wider
    repertoire of characters, covering all the
    written scripts of the world.
  • Indian scripts are complex in nature. IRIs may
    be studied from the perspective of Indian
    languages.
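
To make the relationship between the two concrete, here is a rough sketch of mapping an IRI onto an ASCII-only URI by percent-encoding non-ASCII characters as UTF-8. It is a simplification of the mapping described in RFC 3987 (for instance, it drops any user information), and the function name and sample address are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Map an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)
    # Host: converted with Python's built-in "idna" codec.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port:
        host += ":" + str(parts.port)
    # Other components: non-ASCII characters become %XX escapes of UTF-8 bytes.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


# A Devanagari path segment on an ASCII host.
print(iri_to_uri("http://example.org/पथ"))
# http://example.org/%E0%A4%AA%E0%A4%A5
```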

21
RFC 3491
  • RFC 3491 specifies processing rules that will
    allow users to enter internationalized domain
    names (IDNs) into applications.
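
A short sketch of what this looks like from an application's side, using Python's built-in "idna" codec, which applies Nameprep (RFC 3491) to each label before producing the ASCII form used by IDNA (RFC 3490). The domain name is a made-up example.

```python
# Hypothetical domain name with a Devanagari label.
name = "उदाहरण.example"

# Unicode name as typed by the user -> ASCII form that can be looked up in DNS.
ascii_name = name.encode("idna").decode("ascii")
print(ascii_name)  # non-ASCII labels are emitted with an "xn--" prefix

# Decoding restores the Unicode form for display.
round_trip = ascii_name.encode("ascii").decode("idna")
print(round_trip == name)
```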

22
RFC 3454
  • RFC 3454 specifies a framework of processing
    rules for Unicode text. This RFC mainly relates
    to the Internationalized Domain Names.
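
The flavour of these processing rules can be sketched with Python's standard `stringprep` module, which exposes the RFC 3454 tables. The helper below is hypothetical and applies only the mapping and normalization steps; the checks for prohibited characters and bidirectional text are left out.

```python
import stringprep
import unicodedata


def simple_prepare(text):
    """Apply two stringprep steps: mapping (tables B.1, B.2), then NFKC."""
    mapped = []
    for ch in text:
        if stringprep.in_table_b1(ch):
            continue  # B.1: characters that are mapped to nothing
        mapped.append(stringprep.map_table_b2(ch))  # B.2: case folding
    return unicodedata.normalize("NFKC", "".join(mapped))


print(simple_prepare("Example"))  # example

# ZERO WIDTH JOINER (U+200D), which occurs in Indic text, is listed in
# table B.1 and is therefore removed by the mapping step.
print(stringprep.in_table_b1("\u200d"))  # True
```

The removal of the zero width joiner and non-joiner is one reason these rules deserve a close look from the Indian-language perspective.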

23
RFC 3492: Punycode Encoding of Unicode for IDNA
  • Punycode is a transfer encoding syntax designed
    for use with Internationalized Domain Names in
    Applications. It uniquely and reversibly
    transforms a Unicode string into an ASCII string.
  • This is important for implementing IDNs in
    non-Latin scripts/ languages such as Indian
    languages.
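
The reversibility can be checked with Python's built-in "punycode" codec. The sample word is chosen for illustration only; note that this is the bare encoding, without the "xn--" prefix that IDNA adds to domain labels.

```python
text = "नमस्ते"  # sample Devanagari string

encoded = text.encode("punycode")     # Unicode string -> ASCII-only bytes
decoded = encoded.decode("punycode")  # ASCII-only bytes -> Unicode string

print(encoded)
print(decoded == text)  # the transformation is reversible
```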

24
Numeric Character References (NCRs)
  • Escapes such as NCRs and entities are ways of
    representing any Unicode Character in Markup
    using only ASCII characters.
  • For example:
  • The character á can be written in X/HTML as
    &#xE1; or &#225; or &aacute;.
  • These are useful for clearly representing
    ambiguous or invisible characters and for
    preventing problems with syntax characters such
    as ampersands and angle brackets. NCRs can also
    be used for unsupported characters.
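
A small sketch of producing and decoding these escapes with the Python standard library. The sample character is U+00E1 (a with acute), and the Hindi word at the end is chosen for illustration.

```python
import html

ch = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE

hex_ncr = f"&#x{ord(ch):X};"  # hexadecimal numeric character reference
dec_ncr = f"&#{ord(ch)};"     # decimal numeric character reference
print(hex_ncr, dec_ncr)       # &#xE1; &#225;

# Both NCRs and the named entity decode back to the same character.
for escape in (hex_ncr, dec_ncr, "&aacute;"):
    assert html.unescape(escape) == ch

# Escape every non-ASCII character of a string as a decimal NCR.
print("हिंदी".encode("ascii", "xmlcharrefreplace").decode("ascii"))
```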

25
Mobile Web Initiative
  • W3C Group on Mobile Web Initiative
  • In India, many people have started using mobile
    devices to access the web.
  • Standard keyboard layouts for inputting content
    in various Indian languages on mobile devices
    are being evolved.

26
Voice Browser and SSML
  • Voice Browser for Indian languages.
  • Speech Synthesis Markup Language (SSML), to
    ensure representation of Indian languages.

27
Reference Implementations of the draft standard
  • The project "Web Internationalization
    Initiative" envisages implementation of the
    draft W3C standards for Indian languages/
    scripts.

28
Other issues
  • Display and Font Rendering Issues
  • Keyboard Issues
  • Transliteration Issues

29
  • Thank you

30
Upcoming event
  • Bangalore being the major IT hub in India, a
    workshop on Internationalization/ Localization
    is also planned for August 24-25, 2006 in
    Bangalore. More details at www.mait.com