Title: English to Indian Language
1 English to Indian Language Machine Translation System
Rajeev Sangal, Dipti Misra Sharma
IIIT Hyderabad
2 Components and Modules
Title: English to Indian Language MT System
Proposers: Rajeev Sangal, Dipti M Sharma, Soma Paul, Lakshmi Bai
Institution: International Institute of Information Technology, Hyderabad
Language pairs: English-Hindi, English-Bengali
Components:
English Parser
Transfer Grammar Engine
Sentence Generator Engine
Hindi Word Generator
Bengali Word Generator
3 English Parser
Language: English
Technique that will be used: English dependency parser. The parser will contain two modules: 1) Collins/Charniak model for phrase structure parsing, 2) phrase structure to dependency tree converter.
Performance of these techniques in other languages: English phrase structure parsers, 88-90
Estimate of the expected performance: Parser 88; Converter 1st year 85, 2nd year 95
Domain for which the performance will be optimized: To be decided. Possibly Business or Popular Science
Evaluation metrics: Accuracy
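The second parser module, the phrase structure to dependency tree converter, can be illustrated with a minimal sketch. It assumes a head-rule approach (pick one child of each phrase as its head and attach the other children to it), which is a common way to do this conversion but is not confirmed by the slides as the project's method. The HEAD_RULES table, the tuple tree encoding, and the function names are all hypothetical.

```python
# Toy head rules: for each phrase label, child labels to try as head, in priority order.
# Illustrative assumption only; not the project's actual head table.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VBD", "VBZ", "VB", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
    "PP": ["IN"],
}


def head_index(label, children):
    """Return the position of the head child of a phrase."""
    for wanted in HEAD_RULES.get(label, []):
        for i, child in enumerate(children):
            if child[0] == wanted:
                return i
    return len(children) - 1  # fallback: rightmost child


def to_dependencies(tree, deps):
    """Return the head word of `tree`, appending (dependent, head) pairs to `deps`."""
    label, children = tree[0], tree[1:]
    # A preterminal looks like ("NN", "company"): a tag plus a single word.
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]
    words = [to_dependencies(child, deps) for child in children]
    h = head_index(label, children)
    for i, word in enumerate(words):
        if i != h:
            deps.append((word, words[h]))
    return words[h]


def convert(tree):
    """Convert a phrase structure tree into (root word, list of dependencies)."""
    deps = []
    root = to_dependencies(tree, deps)
    return root, deps


# Phrase structure tree for "the company sold shares", as nested tuples.
TREE = ("S",
        ("NP", ("DT", "the"), ("NN", "company")),
        ("VP", ("VBD", "sold"), ("NP", ("NNS", "shares"))))

if __name__ == "__main__":
    root, deps = convert(TREE)
    print("root:", root)
    for dependent, head in deps:
        print(dependent, "->", head)
```

Words stand in for token positions here to keep the sketch short; a real converter would use token indices so that repeated words stay distinct.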
4 Transfer Grammar Component
Language/Language pair: Generic Engine
Technique: Rule-based
Performance of these techniques in other languages: (see below)
Estimate of the expected performance: (see below)
Domain for which the performance will be optimized: To be decided. Most probably Business or Popular Science
Evaluation metrics: No quantitative evaluation is available for this specific component nationally or internationally. We will try to evolve a metric for this.
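To make the idea of a generic, rule-based transfer engine concrete, here is a minimal sketch in which the engine is language-independent and the rules are data. It assumes rules that reorder the dependents of a head by relation name, for example moving the English verb to clause-final position. The Node class, the relation names, and the TRANSFER_RULES table are hypothetical; the slides do not specify the rule formalism.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    pos: str                      # coarse category, e.g. "verb", "noun", "prep"
    rel: str = "root"             # relation to the parent node
    children: list = field(default_factory=list)


# Hypothetical rule table: for a head of the given category, the target-side
# order of its dependents (by relation) and of the head itself ("HEAD").
TRANSFER_RULES = {
    "verb": ["subj", "obj", "HEAD"],   # verb-medial source order -> verb-final
    "prep": ["pobj", "HEAD"],          # preposition -> postposition
}


def linearize(node, rules):
    """Return the words under `node` in target-language order."""
    slots = rules.get(node.pos, ["HEAD"])
    listed = {s for s in slots if s != "HEAD"}
    out = []
    # Dependents not mentioned by the rule are placed first, in source order.
    for child in node.children:
        if child.rel not in listed:
            out.extend(linearize(child, rules))
    for slot in slots:
        if slot == "HEAD":
            out.append(node.word)
        else:
            for child in node.children:
                if child.rel == slot:
                    out.extend(linearize(child, rules))
    return out


# Dependency tree for "the company sold shares".
SENTENCE = Node("sold", "verb", children=[
    Node("company", "noun", "subj", [Node("the", "det", "det")]),
    Node("shares", "noun", "obj"),
])

if __name__ == "__main__":
    print(" ".join(linearize(SENTENCE, TRANSFER_RULES)))
```

Keeping the rules in a table separate from the traversal code is what would let one engine serve both English-Hindi and English-Bengali, with only the rule set changing per language pair.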
5 Word Generators
Language: Hindi, Bengali
Technique that will be used: Paradigm based
Performance of these techniques in other languages: 100
Estimate of the expected performance: 1st year 96-98, 2nd year 100
Domain for which the performance will be optimized: General purpose
Evaluation metrics: Accuracy, coverage
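A paradigm-based word generator can be sketched as a table lookup: each root is assigned to a paradigm, and the paradigm lists the ending for every feature combination. The example below is a minimal illustration under stated assumptions: the two Hindi masculine nouns are written in an ad hoc ASCII romanization, and the PARADIGMS and LEXICON tables and the generate function are hypothetical, not the project's data or interface.

```python
# Paradigm table: paradigm name -> (ending stripped from the root,
#                                   {(number, case): ending added to the stem}).
# Illustrative entry for Hindi masculine nouns ending in -A (ad hoc romanization).
PARADIGMS = {
    "ladakA": ("A", {
        ("sg", "direct"): "A",
        ("sg", "oblique"): "e",
        ("pl", "direct"): "e",
        ("pl", "oblique"): "oM",
    }),
}

# Lexicon: root -> paradigm it inflects like.
LEXICON = {
    "ladakA": "ladakA",   # 'boy'
    "kamarA": "ladakA",   # 'room', inflects like 'ladakA'
}


def generate(root, number, case):
    """Generate the inflected form of `root` for the given features."""
    strip, endings = PARADIGMS[LEXICON[root]]
    if not root.endswith(strip):
        raise ValueError(f"{root!r} does not fit its paradigm")
    stem = root[: len(root) - len(strip)]
    return stem + endings[(number, case)]


def coverage(roots):
    """Fraction of roots the generator has a paradigm for."""
    if not roots:
        return 0.0
    return sum(1 for r in roots if r in LEXICON) / len(roots)


if __name__ == "__main__":
    for features in [("sg", "direct"), ("sg", "oblique"),
                     ("pl", "direct"), ("pl", "oblique")]:
        print(features, generate("kamarA", *features))
    print("coverage:", coverage(["ladakA", "kamarA", "ghara"]))
```

Adding a new word then only requires naming its paradigm, which is why coverage is a natural metric alongside accuracy for this component.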
6 Sentence Generator Engine
Language: Generic (Horizontal)
Technique that will be used: Rule-based
Performance of these techniques in other languages: No quantitative figures available
Estimate of the expected performance: -
Domain for which the performance will be optimized: To be decided. Most probably Business or Popular Science
Evaluation metrics: Naturalness
7 Lexical Resources
Language pairs: English-Hindi, English-Bengali
Lexical resources that will be built:
English-Hindi Dictionary
English-Bengali Dictionary
Transfer Grammar Rules for English-Hindi
Transfer Grammar Rules for English-Bengali
An English-Hindi dictionary is already available. It will be enhanced.
The English-Bengali and English-Hindi dictionaries will be parallel dictionaries with aligned senses.
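One way to picture parallel dictionaries with aligned senses is to key both dictionaries by the same English sense identifier, so that an entry in one can be looked up directly in the other. The sketch below assumes that design; the sense identifiers, the two tiny tables, and the romanized target words are illustrative placeholders, not entries from the actual resources.

```python
# Both dictionaries share English sense identifiers such as "bank.1".
# Entries are illustrative, in an ad hoc ASCII romanization.
EN_HI = {
    "bank": {
        "bank.1": "baiMka",    # financial institution
        "bank.2": "kinArA",    # edge of a river
    },
}
EN_BN = {
    "bank": {
        "bank.1": "byAMka",    # financial institution
        # "bank.2" not yet entered: shows up as a gap in the alignment
    },
}


def aligned_entries(root):
    """Return {sense_id: (hindi, bengali)} for an English root.

    A missing equivalent is reported as None, which makes gaps between the
    two dictionaries easy to spot.
    """
    hi = EN_HI.get(root, {})
    bn = EN_BN.get(root, {})
    return {sid: (hi.get(sid), bn.get(sid)) for sid in sorted(set(hi) | set(bn))}


def unaligned_senses(root):
    """List the sense ids present in only one of the two dictionaries."""
    return [sid for sid, (h, b) in aligned_entries(root).items()
            if h is None or b is None]


if __name__ == "__main__":
    print(aligned_entries("bank"))
    print("gaps:", unaligned_senses("bank"))
```

With shared sense identifiers, work done on sense distinctions for the English-Hindi dictionary carries over when the English-Bengali dictionary is built.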
8 Bilingual Dictionary
Language pair: English-Bengali
Final size: 25k root words
Average size of such resources in other languages: 20k
Estimate of the expected size: 1st year 12k, 2nd year 25k
Domain for which the performance will be optimized: To be decided. Most probably Business or Popular Science
Evaluation metrics: Quality, coverage
The dictionary will be sense aligned with the English-Hindi dictionary.
9 Transfer Grammar Rules
Language pairs: English-Hindi, English-Bengali
Final size: 150-200 rules
Estimate of the expected size: 1st year 150, 2nd year 200
Domain for which the performance will be optimized: To be decided. Most probably Business or Popular Science
Evaluation metrics: Comprehensibility