Resource Creation for Sanskrit-Hindi, Maithili Automatic Translation

Provided by: ritu8

1
Resource Creation for Sanskrit-Hindi, Maithili
Automatic Translation
  • Project Proposal
  • Submitted to
  • The Department of Information Technology
  • Ministry of Communications and Information
    Technology,
  • Govt. of India
  • by
  • Dr. Girish Nath Jha
  • Special Center for Sanskrit Studies
  • Jawaharlal Nehru University, New Delhi-110067.

2
Summary
  • Title of Project: Resource Creation for
    Sanskrit-Hindi, Maithili Automatic Translation
  • Proposer: Girish Nath Jha, Asstt. Professor
  • Institution: Jawaharlal Nehru University,
    New Delhi 110 067
  • Language/Language pair: Sanskrit-Hindi, Maithili
  • Nature of Project: Application-oriented research,
    design and development with production potential

3
Components to be implemented
  • Sanskrit Karaka analyzer
  • Sanskrit POS Tagger
  • Annotated corpora for Sanskrit
  • Multilingual online Amarakosha
  • Lexical resources for Sanskrit-Hindi
  • Lexical resources for Sanskrit-Maithili
  • Text generator for Hindi
  • Text generator for Maithili
  • Paradigm sets for Maithili
  • Paradigm sets for Hindi

4
Techniques to be used
  • Paninian Karaka techniques for analysis
  • Sanskrit lexicography and morphological
    approaches for POS tagging
  • Ontological lexicography (Amarakosha) and
    WordNet approaches
  • Lexical resource creation techniques for
    Sanskrit, Hindi and Maithili
  • Paradigm and text generation techniques for
    Hindi and Maithili
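
To make the paradigm-based generation idea concrete, here is a minimal sketch, not the project's actual generator. It assumes a paradigm is a table of suffixes keyed by case and number, applied to a stem obtained by stripping the lemma's ending. The table name `PARADIGMS`, the class label `masc_a`, the function `generate_form`, and the rough ASCII romanization of Hindi are all hypothetical choices made for illustration.

```python
# Hypothetical paradigm table for Hindi masculine nouns ending in -a,
# written in a rough ASCII romanization (e.g. "ladka" = boy).
# The structure and names are assumptions, not the project's data format.
PARADIGMS = {
    "masc_a": {
        "strip": "a",  # ending removed from the lemma to obtain the stem
        "suffixes": {
            ("direct", "singular"): "a",
            ("direct", "plural"): "e",
            ("oblique", "singular"): "e",
            ("oblique", "plural"): "on",
        },
    },
}


def generate_form(lemma, paradigm, case, number):
    """Return the inflected form of `lemma` for the given case and number."""
    entry = PARADIGMS[paradigm]
    ending = entry["strip"]
    if not lemma.endswith(ending):
        raise ValueError(f"{lemma!r} does not fit paradigm {paradigm!r}")
    stem = lemma[: len(lemma) - len(ending)]
    return stem + entry["suffixes"][(case, number)]


if __name__ == "__main__":
    for case in ("direct", "oblique"):
        for number in ("singular", "plural"):
            print(case, number, generate_form("ladka", "masc_a", case, number))
```

One paradigm entry covers every noun of the same class, so under this assumption adding a word to the lexicon only requires recording its lemma and paradigm label.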

5
Previous project experience
  • Online Multilingual Amarakosha (2005-), as
    director; funded by JNU under the UPOE program
    of UGC
  • Database system for dialects of Hindi
    (2002-2003), as co-director
  • ALED (Active Living Everyday), 2001-2002, at
    Human Kinetics, IL, USA
  • BISS database development project at the
    University of Illinois's Beckman Institute for
    Advanced Science and Technology, 1998-1999
  • Analyzing schizophrenic speech at the Dept. of
    Psychology, University of Illinois,
    Urbana-Champaign, 1997
  • Developing technical terminology for Hindi
    (participated in a series of workshops organized
    by the Scientific and Technical Terminology
    Commission, Min. of HRD, 1994-1996)
  • Computer Assisted Sanskrit Teaching Learning
    Environment (DoE-sponsored project at JNU,
    1991-1994), as research investigator
6
Workforce
  • JNU is a leading university with a strong
    multidisciplinary approach
  • A large number of students from the linguistics
    and Sanskrit centers are taking courses in
    computational linguistics and are available for
    project work
  • 4 Ph.D. students
  • 4 M.Phil. students
  • 30 MA students

7
Performance of these techniques in other languages
  • The Paninian Karaka approach has been used in
    Anusaaraka (Sanskrit-English)
  • Morphological analysis approaches are generally
    used in POS tagging for Indian languages
  • Ontological principles are embedded in the
    WordNet approaches (e.g. for Hindi by IIT Mumbai)
  • Lexical and data-oriented approaches are used in
    most statistical or hybrid systems (e.g. Shakti
    system of IIIT)
  • Paradigm and text generation techniques have been
    used in AnglaBharati

8
Work done so far
  • The English-Hindi lexical semantic component
  • Sanskrit Subanta generator
  • English SQL NLI
  • Online Multilingual Amarakosha
  • Sanskrit Analysis System
  • Sandhi analyzer
  • Subanta analyzer
  • Dhatu database
  • POS tagging
  • Tagged MW Dictionary
  • Karaka analyzer
  • E-corpora (sandhi free, with sandhi)
  • Sample online lexicon-based tagger
  • Sample online lexicon-based translator
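
As an illustration of what a lexicon-based tagger does in its simplest form, the sketch below looks each token up in a word-to-tag dictionary and falls back to an unknown tag. It is not the sample tagger listed above. The lexicon entries, the tag labels, the ASCII romanization of the Sanskrit words, and the names `LEXICON` and `tag_sentence` are all assumptions made for the example.

```python
# Hypothetical toy lexicon mapping romanized Sanskrit word forms to coarse
# POS tags. The entries and tag labels are illustrative assumptions only.
LEXICON = {
    "ramah": "NOUN",
    "vanam": "NOUN",
    "gacchati": "VERB",
}


def tag_sentence(sentence, lexicon=LEXICON, unknown_tag="UNK"):
    """Tag whitespace-separated tokens by direct lexicon lookup."""
    return [
        (token, lexicon.get(token.lower(), unknown_tag))
        for token in sentence.split()
    ]


if __name__ == "__main__":
    print(tag_sentence("ramah vanam gacchati"))
```

A pure lookup like this cannot handle sandhi or unseen inflected forms; in this sketch such tokens simply receive the unknown tag.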