Cross-Lingual Information Retrieval (English-Tamil)

Provided by: tdilM

1
Cross-Lingual Information Retrieval
(English-Tamil)
  • Proposers: Dr. T. V. Geetha
  • Dr. Ranjani Parthasarathi
  • Department of Computer Science and Engg.
  • CEG, Anna University
  • Chennai 600025

2
Proposed Components and Resources
  • Components
    • Morphological Analyser
    • Keyword translator
    • IE engine (POS, Shallow Parser, NER, UNL)
    • English-to-Tamil Transfer (UNL/Template)
    • Summary Generator
  • Lexical Resources
    • Domain-specific Enhancement of Bilingual
      Dictionary (English-Tamil)
    • Domain-specific Parallel corpus

3
Components

4
Morphological Analyser (Tamil)
  • Task: Enhancement of the existing MA
  • Technique used: Rule-based
  • Expected performance: > 90% for the chosen domains (standard corpus)
  • Domains chosen: Popular science
  • Evaluation metrics: Cross-check robustness with standard corpus/MA
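
A rule-based analyser of the kind named on this slide can be pictured as an ordered table of suffix rules. The sketch below is a toy illustration, not the project's analyser: the `SUFFIX_RULES` table, the `analyse` function, and the two plural rules are assumptions chosen for the example. Real Tamil morphology needs many more rules, sandhi handling, and a root lexicon to reject false matches.

```python
# Toy rule-based suffix stripping for Tamil nouns.
# Each rule: (surface suffix, text restored on the stem, feature tag).
# The rules are illustrative assumptions, listed longest suffix first.
SUFFIX_RULES = [
    ("ங்கள்", "ம்", "PLURAL"),  # e.g. மரங்கள் -> மரம் + PLURAL
    ("கள்", "", "PLURAL"),      # e.g. வீடுகள் -> வீடு + PLURAL
]


def analyse(word):
    """Return (root, [features]) using the first matching suffix rule."""
    for suffix, restore, tag in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            root = word[: -len(suffix)] + restore
            return root, [tag]
    return word, []  # no rule fired: treat the word as a bare root
```

Ordering the rules longest-first lets the sandhi form be tried before the plain plural marker, which is the usual reason such tables are ordered.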

5
Keyword translator
  • Task: Sense-aware translation of keywords
  • Technique used: Statistical and language-model-based word
    sense disambiguation
  • Expected performance: > 90% for the chosen domains
  • Domains chosen: Popular science
  • Evaluation metrics: Accuracy and precision (corpus-based evaluation)
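
One simple statistical way to translate a keyword "with sense" is to score each candidate translation by how often it co-occurs with the other (already translated) query words. The sketch below uses that co-occurrence heuristic as a stand-in. The slides do not specify the actual model, and the `CANDIDATES` entries, the `COOCCURRENCE` counts, and the function name are all hypothetical.

```python
import math

# Hypothetical bilingual entries: one English keyword, several Tamil senses.
CANDIDATES = {"bank": ["வங்கி", "கரை"]}  # financial bank, river bank

# Hypothetical co-occurrence counts from a Tamil corpus:
# (candidate translation, context word) -> count.
COOCCURRENCE = {
    ("வங்கி", "பணம்"): 40,  # context word "money"
    ("வங்கி", "ஆறு"): 1,    # context word "river"
    ("கரை", "பணம்"): 2,
    ("கரை", "ஆறு"): 35,
}


def translate_keyword(keyword, context):
    """Pick the candidate that co-occurs most with the context words."""
    def score(candidate):
        # Add-one smoothing keeps unseen pairs from giving log(0).
        return sum(math.log(COOCCURRENCE.get((candidate, c), 0) + 1)
                   for c in context)
    return max(CANDIDATES[keyword], key=score)
```

With the toy counts above, a query that also mentions "river" selects the river-bank sense, and one that mentions "money" selects the financial sense.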

6
Information Extraction for Tamil
  • Task: Extract information from search results into a
    Template / UNL representation
  • Involves: POS tagger, Shallow Parser, Named Entity
    Recognizer, Template matching, Tamil-to-UNL
    enconverter
  • Techniques used: Statistical and rule-based
  • Expected performance: > 85% precision and recall for the chosen domains
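
The precision and recall target can be checked by comparing extracted template slots with a hand-built gold standard. The sketch below assumes slots are represented as `(slot, value)` pairs. That representation and the function name are assumptions for illustration, not part of the proposal.

```python
def precision_recall(extracted, gold):
    """Compare extracted (slot, value) pairs with a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    # Precision: share of extracted pairs that are right.
    precision = correct / len(extracted) if extracted else 0.0
    # Recall: share of gold pairs that were found.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

For example, if 3 of 4 extracted pairs are correct and the gold standard holds 4 pairs, both precision and recall are 0.75, which would fall short of the stated target.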

7
English-to-Tamil Transfer (UNL/Template)
  • Task: Transfer information from English search engine
    results, held in Template / UNL representation, to Tamil
  • Involves: Parallel-corpus-based matching /
    UNL-to-Tamil deconverter
  • Techniques used: Statistical and rule-based
  • Expected performance: > 85% for the chosen domains

8
Summary Generator
  • Task: From Template to Summary
  • Involves: Morphological generation, multi-sentence
    fusion
  • Techniques used: Statistical and rule-based sentence generation,
    and reference resolution
  • Expected performance: > 85% precision for the chosen domains

9
Lexical Resources

10
Domain-specific Enhancement of Bilingual
Dictionary (English-Tamil)
  • Existing dictionary (50,000 generic root words)
    to be enhanced
  • 20,000 domain-specific words to be added from a
    chosen corpus of 15K sentences
  • Expected size: 70,000 root words
  • Other Indian languages: 20K-50K words (being
    developed)
  • Evaluation metrics: Corpus-based evaluation
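
The expected size of 70,000 follows from 50,000 + 20,000 only if none of the domain words is already in the generic dictionary. The sketch below shows one way to merge the two lists and to run a simple corpus-based check. Both function names are hypothetical, and "coverage" is assumed here as the evaluation measure since the slide does not name one.

```python
def enhance_dictionary(existing_roots, domain_words):
    """Add domain words to the dictionary; duplicates are counted once."""
    return set(existing_roots) | set(domain_words)


def coverage(dictionary, corpus_tokens):
    """Fraction of corpus tokens (assumed already reduced to root
    forms) that are found in the dictionary."""
    if not corpus_tokens:
        return 0.0
    hits = sum(1 for token in corpus_tokens if token in dictionary)
    return hits / len(corpus_tokens)
```

Measuring coverage on the domain corpus before and after the merge gives a direct view of how much the enhancement helps.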

11
Domain-specific Parallel corpus
  • Expected outcome: 15K sentences in the chosen
    domain, fully tagged (manually / semi-automatically)
    in both languages
  • Not available for many languages
  • 10K sentences at the end of phase 1, and the
    remaining 5K at the end of phase 2
  • Evaluation metric: Accuracy
  • Indirect metric: Precision of search results

12
Thank you