Title: Sanskrit and Natural Language Processing
1Sanskrit and Natural Language Processing
- Dr. Srinivasa Varakhedi
- Center for Advanced Studies and Research in Shabdabodha and NLP
- Rashtriya Sanskrit Vidyapeetha (Deemed University)
- Tirupati (A.P.)
2Dream of a bee..
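- raatrir gamishyati bhavishyati suprabhaatam
- bhaasvaan udeshyati hasishyati pankajashriih
- ittham vichintayati koshagate dvirephe
- haa hanta hanta naliniim gaja ujjahaara
- ("The night will pass and a fair dawn will come; the sun will rise and the lotus will bloom again." While the bee shut inside the bud was thinking thus, alas, an elephant uprooted the lotus.)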
3Present situation of Sanskrit
- Sanskrit colleges are like a 'zoo'!
- No Govt. support unless we are productive
- Humanities and languages are being neglected
- How long will this support continue?
- Great tradition of learning is being lost
- No scope for novel research
4Innovation is the key
- Sanskrit Shastras are competent enough to enter the world of science
- Move out of the Humanities and merge with the sciences
- Analogy: Maths, Psychology, Logic
- We must find a practical approach for these Sanskrit sciences
5We have lost 80%
- Meemamsa: no practical approach!
- Nyaya: no use in modern dialectics?
- Vyakarana: no application??
- What to do?
6Relevance of Sanskrit Shastras in Modern Technology
- Fortunately, these shastras are found relevant in today's technology
- Computing ideas in Panini
- Text processing principles in Meemamsa
- Formal languages in Nyaya
- We lack the technology and the application area
- Story of Babbage!!!
7Message of Acharya Shankara Bhagavatpada
- avidyayaa mrtyum tiirtvaa..
- vidyayaa amrtamashnute.. - Ishavasya Upanishad
- ("Having crossed death through avidyaa, one attains immortality through vidyaa.")
- Sri Shankara Bhagavatpada comments on this ..
- avidyaa = karma; vidyaa = knowledge
8Opportunity
- Emerging information technology has provided a great opportunity to survive
- grihniiyaat tintriniishaakhaam, shigrushaakhaagrahanena kim? (Hold on to the branch of the tamarind tree; what use is holding the branch of the drumstick tree?)
- Solve a major contemporary problem, such as MT, on the basis of the shastras
- Get new openings for Sanskritists
- Open a new avenue for research
9Know How
- Ultimate aim: finding an appropriate place for the Sanskrit Shastras
- Method: solutions to contemporary problems by adopting modern technology
- Resource needed: adequate manpower to act as a bridge between modern scientists and technologists on one side and Sanskrit scholars on the other
10Change the scenario
- Technology
- Western Theories
- INDIAN THEORIES
11Opportunities missed
- Industrial revolution
- We missed this through some hasty decisions
- IT revolution
- Indians are serving at the coding level, not at the design level!
- Knowledge revolution
- We should take advantage of this
12Need of the hour
- we need
- to understand how technology works
- to understand the contemporary problems
- Then
- we will be able to give solutions in the light of the shastras and show the relevance of Indian theories
13History and Progress
- The conference on Knowledge Representation and Sanskritam, held at Bangalore in Dec 1987, generated tremendous interest
- Nothing much has been achieved, except some small-scale efforts and projects here and there, and those too in technical institutions
- Time is running out! What progress has been made since then?
14Complexity of the problem
- Different goals: the two disciplines, Technology and the Shastras, developed in different contexts
- Paradigm difference: modern scholars are accustomed to visual teaching methods; traditional pandits, on the other hand, prefer the oral tradition
- Language barrier: neither side understands the other's language!
- Tuning in to the dialogue will take time
15Who would bell the cat ?
- It needs a long interaction between technologists and traditional Sanskrit scholars
- Technical institutions are always ready for such activities
- Not much interest is seen in Sanskrit institutions
- It is we Sanskritists who should bell the cat
16Long process like extraction of ghee from milk
- No miracle happens in the initial stage
- It's a big challenge; one or two persons are not enough
- We need hundreds of dedicated persons to achieve even a small goal
- A person can climb a small hill; a team can climb Everest
17Identifying the problem
- Analogy: Brahman in the Upanishads
- What is Brahman?
- We cannot show it, as it is imperceptible.
- We cannot describe it, as it is beyond words.
- Hence,
- we can direct you towards it by negating what we know.
- (apoha)
- shaakhaachandra-nyaaya, arundhatii-nyaaya
18Platform For Innovation
- To achieve this, Rashtriya Sanskrit Vidyapeetha has set up a new innovative centre for advanced study and research in shabdabodha and language technology
- The Center has faculty from shabdabodha (Nyaya, Vyakarana, Meemamsa), NLP and computer science
- The Center has a full-fledged computer lab
19Possible areas
- Machine Translation
- Speech Processing
- Summary Extraction from huge texts
- Indo Wordnet as a base for IL-wordnets
- Developing Tools for IL Researchers
- Knowledge Representation schemes
20Machine Translation
- English To Indian Languages
- Word sense disambiguation
- Karaka Syntax Relation
- Word-grouping
- Idiomatic Expression
- Shabdasutra
- MT among Indian Languages
- Bi-language Electronic Dictionaries
- Karaka Vibhakti Relation
21Major MT systems
- India
- Angla-Bharati, IIT Kanpur
- Shakti, IIIT Hyderabad
- Mantra, CDAC Pune
- SaHiT (Sanskrit Hindi Translator), CSS, JNU
- Anusaaraka (RSV, HCU, IIIT)
22Major MT systems
- Outside India
- UNITRAN
- BabelFish AltaVista (Systran)
- ATR (bimodal, Japan)
- JANUS (bimodal, US-Germany)
- SLT (SRI, Cambridge)
- VERBMOBIL (Germany)
- DIPLOMAT (Carnegie-Mellon)
- Get a 125-page directory of available MT systems at http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-11.pdf
23Summary Extraction
- Meemamsa principles are applied to extract the summary of a text
- The Upakramaadi Tatparya Lingas are used to extract the summary of a text at the Indian Institute of Science, Bangalore, under our consultancy
24 Wordnet / Concept-net based on NN ontology
- Wordnet is an electronic lexical reference system designed on the basis of semantic relations between words
- Synonymy: griha, nivaasa, ...
- Hypernymy: Amra, vriksha, vanaspati
- Antonymy: Shreemaan, akinchana
- Meronymy: nAsika, mukha, shariira, ...
- Gradation: Shushka, -tara, -tama
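The relations listed on this slide can be pictured as labelled links between words. The sketch below is only an illustration under stated assumptions: the class `MiniWordnet`, its method names and the relation labels are hypothetical, and reading the first synonym on the slide as griha (house) is also an assumption. It is not the design of any existing wordnet.

```python
from collections import defaultdict


class MiniWordnet:
    """A toy lexical network: labelled, directed links between words."""

    def __init__(self):
        # relation name -> source word -> set of target words
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, relation, source, target, symmetric=False):
        """Record `source --relation--> target` (and the reverse if symmetric)."""
        self._edges[relation][source].add(target)
        if symmetric:
            self._edges[relation][target].add(source)

    def related(self, relation, word):
        """Words directly linked to `word` by `relation`."""
        return sorted(self._edges[relation][word])

    def chain(self, relation, word):
        """Follow a relation step by step, e.g. all hypernyms of a word."""
        seen, stack = [], [word]
        while stack:
            current = stack.pop()
            for nxt in sorted(self._edges[relation][current]):
                if nxt not in seen:
                    seen.append(nxt)
                    stack.append(nxt)
        return seen


# Example entries taken from the slide (transliteration simplified).
wn = MiniWordnet()
wn.add("synonym", "griha", "nivaasa", symmetric=True)
wn.add("hypernym", "amra", "vriksha")        # mango tree -> tree
wn.add("hypernym", "vriksha", "vanaspati")   # tree -> plant
wn.add("antonym", "shreemaan", "akinchana", symmetric=True)
wn.add("part_of", "nasika", "mukha")         # nose -> face
wn.add("part_of", "mukha", "sharira")        # face -> body

print(wn.chain("hypernym", "amra"))
print(wn.chain("part_of", "nasika"))
```

Storing each relation under its own label keeps synonymy and antonymy symmetric while hypernymy and meronymy stay directed, which is what allows a chain such as amra, vriksha, vanaspati to be followed upwards.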
25Sanskrit Corpus
- Annotating the relation in Sanskrit Texts
- Tagging Samasas
- Identifying the topics of the texts
- Making Sanskrit texts available, along with simple translations, on the web and on CD-R
- Statistical analysis of Sanskrit texts
26Knowledge Engineering
- Representation
- For data representation, several database management systems are available.
- For representing and retrieving useful information, there are various worked-out methodologies
- Finally, Knowledge Representation needs special treatment, where Indian knowledge systems can be applied
27Knowledge and its importance in AI
- AI researchers are interested in building intelligent systems
- Web technologies are moving towards the Semantic Web instead of the syntactic web
- Knowledge is more valuable than data and information
- Data: a simple date of birth (DoB). Information: the age calculated from it.
- Knowledge: the judgment about suitability for the job at hand, etc. This requires a lot of inputs from various knowledge sources.
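The data / information / knowledge distinction on this slide can be made concrete with a small sketch. Everything named here is an assumption made for illustration: the functions `age_on` and `suitable_for_job`, the age limits and the single skill flag are hypothetical stand-ins for the "various knowledge sources" the slide mentions.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Information: the age in completed years, derived from the raw DoB."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def suitable_for_job(age: int, min_age: int, max_age: int,
                     has_required_skill: bool) -> bool:
    """Knowledge: a judgment that combines the age with other inputs."""
    return min_age <= age <= max_age and has_required_skill


dob = date(1990, 6, 15)                      # data
age = age_on(dob, date(2024, 6, 14))         # information
decision = suitable_for_job(age, 21, 35, has_required_skill=True)  # knowledge
print(age, decision)
```

The first step is plain arithmetic on stored data; the last step needs rules and facts that are not in the record itself, which is where knowledge representation comes in.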
28Computational Linguistics and Panini's Grammar
- The structure of Paninian grammar is nothing but a computer program (Babbage!)
- It has captured the base of the universal principles of all languages
- CL requires formal rules for the analysis and generation of language
- Slowly, Chomsky and others are turning towards Panini
29The System of Panini
- Phonetic component
- Phonemes
- pratyahara
- Rule base
- Vidhi (operations)
- Samjna
- paribhasha (metarules)
- adhikara (headings)
- atidesha (extension)
- niyama (restriction)
- Lexicon
- Dhatupaatha
- Ganapaatha
- Lists
- Affixes
- Rule specific items
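As an illustration of the phonetic component, a pratyahara names a block of sounds by pairing a starting sound with a marker letter from the Shiva Sutras. The sketch below is a simplified model under stated assumptions: the function name `expand_pratyahara` is hypothetical, sounds are written in an ASCII transliteration (R, lR for the vocalic r and l; capital letters for retroflexes and nasals), and the first matching marker after the starting sound is taken, so the two sutras ending in the same marker N are not disambiguated.

```python
# The fourteen Shiva Sutras as (sounds, marker) pairs.
# Markers are kept in a separate field, so they never clash with sounds.
SHIVA_SUTRAS = [
    (["a", "i", "u"], "N"),
    (["R", "lR"], "K"),
    (["e", "o"], "G"),
    (["ai", "au"], "C"),
    (["h", "y", "v", "r"], "T"),
    (["l"], "N"),
    (["J", "m", "G", "N", "n"], "M"),
    (["jh", "bh"], "J"),
    (["gh", "Dh", "dh"], "S"),
    (["j", "b", "g", "D", "d"], "Z"),
    (["kh", "ph", "ch", "Th", "th", "c", "T", "t"], "V"),
    (["k", "p"], "Y"),
    (["z", "S", "s"], "R"),
    (["h"], "L"),
]


def expand_pratyahara(start, marker):
    """Return the sounds from `start` up to the first sutra ending in `marker`."""
    collected = []
    started = False
    for sounds, sutra_marker in SHIVA_SUTRAS:
        for sound in sounds:
            if not started and sound == start:
                started = True
            if started:
                collected.append(sound)
        if started and sutra_marker == marker:
            return collected
    raise ValueError(f"no pratyahara for start={start!r}, marker={marker!r}")


print(expand_pratyahara("a", "C"))   # aC: the vowels
print(expand_pratyahara("i", "K"))   # iK
print(expand_pratyahara("y", "N"))   # yaN: the semivowels
```

A two-symbol name thus stands for a whole class of sounds, which is the kind of compact indexing a rule base can refer to.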
30Paninian Model for Sentence Analysis
- Action: the central theme
- Karakas: syntactico-semantic roles
- Visheshana-Visheshyabhava
- Concept of anabhihite in switching to a different voice
- Vivakshaa: intention of the speaker
- Form and meaning
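The anabhihite idea can be sketched as follows: the karaka already expressed by the verb ending takes the first case (prathama), and every other karaka takes its own default case. This is a minimal sketch under stated assumptions: the table names, the function `assign_vibhakti` and the restriction to active and passive voice are choices made for illustration, not a full account of the Paninian rules.

```python
# Default case (vibhakti) of each karaka when the verb does NOT express it.
DEFAULT_VIBHAKTI = {
    "karta": "tritiya",         # agent -> instrumental
    "karma": "dvitiya",         # object -> accusative
    "karana": "tritiya",        # instrument -> instrumental
    "sampradana": "caturthi",   # recipient -> dative
    "apadana": "panchami",      # point of departure -> ablative
    "adhikarana": "saptami",    # locus -> locative
}

# Which karaka the verb ending itself expresses in each voice.
EXPRESSED_BY_VERB = {"active": "karta", "passive": "karma"}


def assign_vibhakti(karakas, voice):
    """Map each word to a case, given its karaka role and the voice."""
    expressed = EXPRESSED_BY_VERB[voice]
    return {
        word: "prathama" if role == expressed else DEFAULT_VIBHAKTI[role]
        for word, role in karakas.items()
    }


roles = {"Rama": "karta", "grantha": "karma"}   # "Rama reads a book"
print(assign_vibhakti(roles, "active"))
print(assign_vibhakti(roles, "passive"))
```

The karaka roles stay the same in both voices; only the surface cases change, which is why the karaka level is a convenient pivot between form and meaning.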
31Navya Nyaya -> AI ?
- Classify Nyaya into five parts ..
- 1. Ontology
- 2. Epistemology
- 3. Technical Language
- 4. Semantics
- 5. Art of debate and fallacies
32Ontology
- Includes
- Categories: Substance, Quality, etc.
- Relations: SamavAya, SvarUpa
- Universals: types or classes
- Ontology helps in various areas such as NLP, K-Repr and K-Engg, and especially in the cognitive sciences.
33Epistemology
- Deals with
- Cognitive process
- Cognitive structure
- It helps to solve the problems of cognitive
sciences and K-repr.
34Technical Language
- NNL is a restricted language that has both features: the power of mechanism of artificial languages and the power of expression of natural languages.
- The basic ideas behind this language will be helpful in Knowledge Representation.
35Semantics
- The way of analysing semantics shown by the Navya Naiyayikas has been found crucially helpful in NLP and Machine Translation
- E.g. classification of words: rUdha, yoga
- Syntactical analysis
- Power of definitions
- KR NN
36Semantics in MT
- Lexicography
- Word/concept nets based on NN ontology
- Classification of padas (words)
- Rudha: a word that has a conventional meaning, i.e. names
- Yaugika: a word that has an etymological meaning, e.g. cook, driver
- Yoga-rudha: a word that has an etymological as well as a conventional meaning, e.g. CD-driver
37WSD using different techniques
- Definitions of the karaka relations without any overlap
- Kartrtvam: kriyAnukUlakritimattvam
- Karmatvam: para-samaveta-kriyA-janya-phala-Ashrayatvam
- Going: Rama and the forest
- Who is going where?
- The result, contact, is possible in Rama too..
- To avoid such overlap, this definition is useful
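The point of the refined definition of karma can be sketched in code. In "Rama goes to the forest", the result (contact) is found in both Rama and the forest, so "locus of the result" alone would pick out both. Requiring that the action inhere in another (para-samaveta) excludes Rama, in whom the going itself resides. The sketch below is a simplified reading under stated assumptions: the class `ActionInstance` and the two functions are hypothetical names, and the definition is reduced to a single set filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionInstance:
    verb: str
    action_locus: str        # the participant in whom the activity inheres
    result: str              # the result produced by the activity
    result_loci: frozenset   # every participant in which the result is found


def naive_karma(action):
    """Karma as 'any locus of the result': this over-generates."""
    return set(action.result_loci)


def refined_karma(action):
    """Karma as 'locus of the result of an action inhering in another'."""
    return {p for p in action.result_loci if p != action.action_locus}


going = ActionInstance(
    verb="go",
    action_locus="Rama",
    result="contact",
    result_loci=frozenset({"Rama", "forest"}),
)

print(sorted(naive_karma(going)))
print(sorted(refined_karma(going)))
```

With the extra condition only the forest remains as karma, so the question "who is going where?" has a single answer.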
38Refinement of karaka Relations
- Classification of Karma
- Karma: reachable, understandable and so on.
- Analysis of root semantics
- Leave: He left the place / left from the place
- Analysis of expectancy (AkAnkshA)
- Rats killed cats
39To-infinitive relation
- I stand up to speak
- I want to speak
- He goes to London to study law
- He wants to study law in London
- To walk in mornings is good for health
40Computer as a Tool
- story of Greek research
- Not only the sciences but also humanities subjects benefit from the aid of computers
- We can use computers
- to improve our education methods
- to improve the quality of research
41 Power of computers
- Memory: store any amount of data on discs
- Speed: process and access it fast
- Search
- Replace / Edit/ Add
- Get statistical info
- Create hyperlinks
- Present it in a better way
- Reproduce it several times at less cost
- Distribute it in easy ways
42Sansk-Net
- A gigantic online electronic library of Sanskrit works
- More than 500 works (3,00,00,000 pages of e-content)
- www.sansknet.ac.in
- Dhaturatnakara is available on the web. It can be accessed at http://sanskrit.nic.ac.in
43CD R Production
- Paniniya Udaharanakosha is now available in CD form
- 'Koshas' will be made available in CD form: Vachaspathyam, Sabdakalpadruma
- Dhaturatnakara: all the forms of all roots will be made available on CD-R.
- Morphological analyzer for Sanskrit
44Valmiki Ramayana on the Net
- Valmiki Ramayana moolam (original text) in all Indian scripts
- Audio recording
- Translation in five foreign languages
- Eight Sanskrit commentaries
- English translation and commentaries
- Summary, glossary
- Beautiful picture gallery
- http://www.rsvpramayana.ac.in
45Machine translation
- English to Sanskrit
- Circular translation
- English Sanskrit dictionaries
- Sanskrit wordnet
46Sanskrit readers (accessors)
- Ramayana accessor
- Bhagavathgeeta reader
- Nyaya Classics Reader
- Vyakarana Reader
47Sanskrit language processing tools
- Sandhi concatenator - Ready
- Morphological analyser - Hosted on web
- Sandhi splitter (Under progress)
- Samasa tag interpreter - Ready
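To give a feel for what a sandhi concatenator does, here is a minimal sketch that joins two words using only a few vowel-sandhi rules (a + a gives A, a + i gives e, a + u gives o, and like vowels lengthen). It is an illustration under stated assumptions, not the Center's tool: the names `SANDHI_RULES` and `join_with_sandhi` are hypothetical, long vowels are written as capital A, I, U, and all other sandhi rules (consonant sandhi, visarga, vriddhi and so on) are left out.

```python
SHORT_LONG_A = ("a", "A")
SHORT_LONG_I = ("i", "I")
SHORT_LONG_U = ("u", "U")

# (last sound of first word, first sound of second word) -> merged sound
SANDHI_RULES = {}
for x in SHORT_LONG_A:
    for y in SHORT_LONG_A:
        SANDHI_RULES[(x, y)] = "A"   # a/A + a/A -> A
    for y in SHORT_LONG_I:
        SANDHI_RULES[(x, y)] = "e"   # a/A + i/I -> e
    for y in SHORT_LONG_U:
        SANDHI_RULES[(x, y)] = "o"   # a/A + u/U -> o
for x in SHORT_LONG_I:
    for y in SHORT_LONG_I:
        SANDHI_RULES[(x, y)] = "I"   # i/I + i/I -> I
for x in SHORT_LONG_U:
    for y in SHORT_LONG_U:
        SANDHI_RULES[(x, y)] = "U"   # u/U + u/U -> U

DIPHTHONGS = ("ai", "au")


def join_with_sandhi(first, second):
    """Join two transliterated words, applying a vowel rule if one matches."""
    if not first or not second:
        return first + second
    # Diphthongs are two letters here; this sketch leaves them untouched.
    if first.endswith(DIPHTHONGS) or second.startswith(DIPHTHONGS):
        return first + second
    merged = SANDHI_RULES.get((first[-1], second[0]))
    if merged is None:
        return first + second
    return first[:-1] + merged + second[1:]


print(join_with_sandhi("deva", "Alaya"))
print(join_with_sandhi("sUrya", "udaya"))
print(join_with_sandhi("muni", "Iza"))
```

A sandhi splitter has to run these rules in reverse, where one surface form can have several possible splits; that is one reason splitting is the harder of the two tasks.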
48Future Projects
- Text-to-speech for Sanskrit texts
- High quality search engine for Sanskrit E-library
- Hypertext archive for Sanskrit Literature
49Dream Projects
- Paninian Grammar for English (MT)
- Ground work is done
- A national symposium was conducted
- Validity checking of the Paninian system through computing
- Basic teaching material is ready
- Sanskrit Wordnet
- A prototype project has been undertaken by a student
50Namaste!
Thank you
- Special thanks to
- The authorities of
- Sri Chandrashekharendra Sarasvati Vishvamahavidyalaya, Kanchipuram