Title: CENTRAL INSTITUTE OF INDIAN LANGUAGES
1st International Conference
In association with
CIIL-Mysore, IIT-Mumbai, IIIT-Hyderabad
Words unite people. Words can divide nations;
they indulge in war of words
- Word-smiths fashion texts
- Word-mongers talk nineteen to the dozen
- Word-lords don't tell you that they
double-speak
- Word-poets open the inner abyss of lanes and
bye-lanes of meaning
- And so do WordNets
Which is why we are all here!
Welcome to the 1st Global WordNet Conference
MY ADDRESS HAS TWO PARTS
- First, I shall tell you a little about what the
Indian linguistic scene is like, and what we at
CIIL have been doing
- Then, we will offer our suggestions on what we in
India could do in WordNet
CENTRAL INSTITUTE OF INDIAN LANGUAGES
- भारतीय भाषा संस्थान, शिक्षा विभाग, भारत सरकार
- Initiatives in LANGUAGE TECHNOLOGY
CIIL in the first three decades
- Equipping language teachers and analysts
technologically
1. An Apex Institution under Languages Division,
MHRD
- Completed 32 years in July 2001
- This 287-person institution works for the development
of Indian languages.
- CIIL has five Centers with Research Groups (16)
and Service Groups (6).
- 7 Regional Language Centers are at Bhubaneswar,
Guwahati, Lucknow, Mysore, Patiala, Pune and
Solan.
2. Four Main Objectives
- 1. Develops languages by creating content,
corpus, techniques and technologies.
- 2. Protects and documents minority and tribal
languages.
- 3. Creates linguistic harmony by teaching 15
Indian tongues to non-native learners.
- 4. Above all, advises both Central and State
governments on matters related to language.
3. Functionality and Multi-disciplinarity
- Although the mainstay is Indian languages and
linguistics, the focus of all projects and
programmes is on developing materials and products
in print, audio, video and computational form.
- In addition, there is enough interest in Comp.
Lit., Education, Language Technology and NLP,
Folklore, Geography, Statistics,
Psychology, Sociology and Translation.
4. Coverage of CIIL - sizable
- Archived data on 118 lgs
- Creating Voice Corpora
- Studied 80 Tribal lgs
- 35 grammars on-line soon
- Published 490 books
- Cassette Courses in Assamese, Urdu, Bengali,
Kashmiri, Marathi
- Radio courses in Hindi through Kannada
5. Major Publications: 490 books, all produced
in-house
- 22 Grammars
- 30 Intensive Courses
- 24 2nd Lg Textbooks
- 5 Common Vocab.
- 18 Dictionaries
- 49 Apni Boli (KVS)
- 15 Pictorial Glossaries
- 16 Literacy Books
- 12 Folklore
- 9 Bibliographies
- 12 Rhymes/Lg Games
- 16 Proceedings
6. The Challenge before CIIL: Enormous
A truly plural world of languages
- 1,576 rationalized mother-tongues
- 1,796 other mother-tongues
- 114 languages with 10,000 or more speakers
- Large variation: Hindi (337 m) to Maram of
Manipur with 10,144
- Large non-scheduled lgs: Bhili (6 m) and Santali
(5 m)
- 146 radio lgs / 69 school lgs / 35 lg dailies.
7. Programs - Modes of Delivery
- 10-month L2 teaching: 8000 teachers trained
- Distance Courses in Tamil/Telugu/Bengali/Urdu
- On-line Programs in 15 Indian languages
- Kannada for officials in Karnataka
- Radio courses with AIR's collaboration
- 3-month Courses in Communication
- Orientation for Mother-tongue teachers
- Refresher Courses in Linguistics
- NLP Training modules
8. Language Technology: Further Goals
- Enlargement of 3-million word Corpora
- 100 m word corpora for Hindi-Urdu
- Multilingual multidirectional E-Dictionaries
- On-line Administrative Glossaries
- Lexical databases for MT Programs
- Tagging and Corpus Tools
- E-Zines and E-Journals
- Language Information Services
- Anukriti: Web-based Translation services
9. Indian Lgs and IT at CIIL
- 132-node LAN set up
- V-SAT through STPI
- Browsing centre
- Has 2400 E-Journals and 350 paper journals.
- Collaborating with Schoolnet for electronic
materials
- New generation Lg Labs
- Focus: Visual Phonetics
10. LIS-India Website
- Type Language Name
- Type Area Name
- Home or http://www.ciil.org/
- General Information
- Language/Area Profile: Geolinguistic,
Sociolinguistic, Cultural, Literary
- Language/Area History: Genealogical,
Archaeological, Cultural, Textual
- Language Vitality: Attitudinal, Utilitarian,
Socio-political, Referential
- Grammatical Information: Phonetic, Graphemic,
Phonological, Morphological, Lexical, Syntactic,
Semantic, Stylistic
- Biblio search
11. Anukriti: A Translation with NBT/SA
- Electronic lexicon
- Corpus tools
- Parallel corpora
- Cultural Glossaries
- Thesauri
- Word finders
- WordNets
- WEB-BASED SERVICE SITE called ANUKRITI.
- To be maintained with NBT/Sahitya Akademi
- E-journals
- Technological Tools
12. Bhasha Bharati Project
To be set up in collaboration with
- Sahitya Akademi
- Sangeet Natak Akademi
- All India Radio
- Doordarshan
- National Library
- National Archives
- National Book Trust
- Major TV Channels
- Films Division
- Major Newspaper houses
- Numerous Foundations
- Individual writers
- Heirs of writers
- Personal libraries
- Little magazines
- This rich manuscriptorium will display the plural
literary and linguistic landscape of India.
13. Doctoral Programs under planning
- Already available through 22 Universities
- Linguistics, Psychology
- Now being planned in
- NLP
- Folklore/Communication
- Translation
- Indian Gram. Tradition
14. Future Programs
- Dip in Experimental Phonetics
- Masters by Research in Field Linguistics
- Courses in Statistical Linguistics
- Diploma in Translation Studies
- Dip in Folklore/Comp. Lit. and Semiotics
- Internship in Linguistic Geography
- Internship in NLP and Corpus Linguistics
WHAT COULD WE DO TO CREATE AN
India has already had a strong lexicographical
tradition
- Working on WordNet, therefore, should come
naturally to us.
- Efforts have already begun, as we see in Hindi,
Tamil, Oriya and a few other languages.
- There does not seem to be any academic
coordination, however.
- Early 20th century Indian linguistics was
dominated by studies on sound-system and
etymologies
- Mid-20th C focussed on word-formation patterns
- Late 20th C emphasized syntax
We haven't so far worked seriously on Lexical
Semantics
- While Sociolinguistics was a favourite, serious
Psycholinguistics was almost absent
- Formal Syntax was highly valued, but intricacies
of Semantics were not so attractive.
- Making of Dictionaries continued throughout, but
major concerted efforts in each language were
highly individualistic or had happened long ago.
- While writing or applying software means
money, and is hence a crowded field, Language
Technology has so far been neglected.
So, what do we need to do now?
- Create an Indian WordNet Association
- Work in a coordinated manner
- Remember to focus on areal semantic features
because, with so much linguistic and cultural
diversity, India is ideal to test and validate
the concept of WordNet.