Title: Agro Explorer
1 Agro Explorer UNL
CS 671 ICT For Development, 19th Sep 2008
- Vishal Vachhani
- CFILT and DIL, IIT Bombay
2 Agro Explorer: A Meaning Based Multilingual Search Engine
3 Introduction to aAQUA
- Web site for Indian farmers
- Farmers can submit problems related to their crops
- Queries are answered by agricultural experts at KVK, Baramati
- Languages supported: Marathi, Hindi, English
4 Why Need Multilingual Search
- A vast amount of information is available on the Web
- Almost 70% of that information is in English
- The Indian rural populace is not English-literate
- Result: a big language barrier
- Information has to be made available to them in their local languages
5 Why Need Meaning Based Search
- Most current search engines are keyword based
- They do not consider the semantics of the query
- The result set contains a large number of extraneous documents
- Search based on the meaning of the query helps narrow down to the desired information quickly
6 Multilinguality
7 Meaning Based Search
Same keywords, different semantics:
- "Moneylenders Exploit Farmers": Found 1 Result
- "Farmers Exploit Moneylenders": Found 0 Results
8 Agro Explorer System
- Provides both
- Meaning Based Search
- Cross-Lingual Information Access
9 System Architecture
15 Conclusion
- Provides two independent features
- Multilinguality
- Meaning Based Search
- Because of UNL, the multilingual and meaning-based properties can be incorporated together, rather than by adding separate language translators to a search engine
- The scheme lends itself to the integration of multiple languages in a seamless, scalable manner
16 UNL: Universal Networking Language
17 UNL System
[Figure: UNL linked to Hindi, English, French, Tamil, and Marathi]
18 Approaches of MT System
- Direct translation
- - Translation is done directly between each pair of languages
- - N(N-1) translators are needed to translate among N languages
- Intermediate language
- - An intermediate language is used for translation
- - Only 2N translators are required
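The two counts above can be compared directly. The following is a minimal sketch; the function names are illustrative and do not come from the slides.

```python
def direct_translators(n: int) -> int:
    """Translators needed when every ordered language pair is handled directly."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """Translators needed with an intermediate language:
    one into it and one out of it for each language."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, direct_translators(n), interlingua_translators(n))
```

For the five languages shown on the UNL System slide, this gives 20 direct translators against 10 with an intermediate language, and the gap widens as N grows.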
19 UNL Interlingua
- UNL is an acronym for Universal Networking Language
- UNL is a computer language that enables computers to process information and knowledge across language barriers
- UNL is a language for representing information and knowledge provided by natural languages
- Unlike natural languages, UNL expressions are unambiguous
20 UNL Interlingua
- Although UNL is a language for computers, it has all the components of a natural language
- It is composed of Universal Words (UWs), Relations, and Attributes
- Knowledge is represented as a semantic graph
- Nodes: concepts
- Arcs: relations between concepts
21 Universal Words (UWs)
- A UW represents a simple or compound concept. There are two classes of UWs:
- unit concepts
- compound structures of binary relations grouped together (indicated with Compound UW-IDs)
- A UW is made up of a character string (an English-language word) followed by a list of constraints
- <UW> ::= <Head Word><Constraint List>
- Examples
- state(icl>express)
- state(icl>country)
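A UW string such as state(icl>express) can be split into its head word and constraint list with a short helper. This is a sketch under simple assumptions: the name `parse_uw` is hypothetical, constraints are taken to be comma-separated inside one pair of parentheses, and attribute labels are not handled.

```python
import re

# Head word, then an optional parenthesised, comma-separated constraint list.
_UW_PATTERN = re.compile(r"^(?P<head>[^()]+?)(?:\((?P<constraints>.*)\))?$")


def parse_uw(uw: str) -> tuple[str, list[str]]:
    """Split a UW such as 'state(icl>express)' into (head word, constraints)."""
    match = _UW_PATTERN.match(uw.strip())
    if match is None:
        raise ValueError(f"not a well-formed UW: {uw!r}")
    raw = match.group("constraints")
    constraints = [c.strip() for c in raw.split(",")] if raw else []
    return match.group("head").strip(), constraints
```

The two example UWs share the head word "state" and differ only in their constraints, which is how a single English word is tied to distinct concepts.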
22 Relations
- A relation label is a string of three characters or fewer
- Relations between UWs are binary: rel(UW1, UW2)
- Labels differ according to the roles they play
- At present, there are 46 relations in UNL
- For example: agt (agent), ins (instrument), pur (purpose)
23 Attribute Labels
- Attribute labels express additional information about the Universal Words that appear in a sentence
- They show what is said from the speaker's point of view, that is, how the speaker views what is said (time, reference, emphasis, attitude, etc.)
- @entry, @present, @progressive, @topic, etc.
24 UNL Interlingua
- Example
- Ram eats rice.
- {unl}
- agt(eat.@entry.@present, Ram)
- obj(eat.@entry.@present, rice(icl>eatable))
- {/unl}
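Each line of a UNL expression is a binary relation rel(UW1, UW2), so it can be read into a small record. The sketch below assumes the "@" attribute notation of the UNL specification; the names `Relation`, `parse_relation` and `split_attributes` are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    label: str   # relation label, e.g. "agt"
    source: str  # first UW, attributes still attached
    target: str  # second UW


def parse_relation(line: str) -> Relation:
    """Parse one binary relation of the form rel(UW1, UW2)."""
    label, _, rest = line.strip().partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"malformed relation: {line!r}")
    body = rest[:-1]
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # The top-level comma separates the two UWs.
            return Relation(label.strip(), body[:i].strip(), body[i + 1:].strip())
    raise ValueError(f"expected two UWs in: {line!r}")


def split_attributes(uw: str) -> tuple[str, list[str]]:
    """Separate '.@attr' labels from a UW, e.g. 'eat.@entry.@present'."""
    base, *attrs = uw.split(".@")
    return base, ["@" + a for a in attrs]
```

Collecting the parsed relations gives the arcs of the semantic graph, and the UW carrying @entry marks the node from which the graph is read.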
25 UNL as graph
[Figure: UNL graph with nodes eat, Ram, and rice; arc labels agt and plc]
26 UNL Interlingua
- Example
- The boy who works here went to school.
- {unl}
- agt(go(icl>move).@entry.@past, :01)
- plt(go(icl>occur).@entry.@past, school(icl>institution))
- agt:01(work(icl>do), boy(icl>person).@entry)
- plc:01(work(icl>do), here)
- {/unl}
27 UNL as graph
28 Intermediate Language
[Figure: source language -> EnConverter -> intermediate language -> DeConverter -> target language]
29 DeConverter
- It is a language-independent generator
- It can deconvert UNL expressions into a variety of native languages, using linguistic data such as the word dictionary and grammatical rules of each language
- The DeConverter transforms a sentence represented by a UNL expression into a natural-language sentence
30 DeConverter Block Diagram
31 Block diagram of the natural language generator
[Figure: UNLDoc -> UNL Parser -> Case Marking Module -> Morphology Module -> Syntax Planning Module -> HindiDoc; resources: Dictionary, Case Marking Rules, Morphology Rules, Syntax Planning Rules; modules are labelled as language dependent or language independent]
32 UNL Parser
- The UNL parser module performs the following tasks
- Check the input format of the UNL document
- Separate attributes from UWs
- Separate attributes from dictionary entries
- Replace UWs with Hindi root words
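The attribute-separation and lookup tasks can be sketched as follows. Everything here is an assumption made for illustration: the dictionary entries, the lexical attribute names ("V", "N", "M") and the function name `parse_node` are hypothetical and are not taken from the system's real dictionary. Input format checking is left out.

```python
# Hypothetical UW -> (Hindi root, lexical attributes) entries;
# not the system's real dictionary.
DICTIONARY = {
    "eat": ("खा", ["V"]),
    "Ram": ("राम", ["N", "M"]),
    "rice(icl>eatable)": ("चावल", ["N", "M"]),
}


def parse_node(uw_with_attrs: str) -> dict:
    """Separate UNL attributes from a UW and look up its Hindi root word."""
    uw, *attrs = uw_with_attrs.split(".@")
    if uw not in DICTIONARY:
        raise KeyError(f"no dictionary entry for UW {uw!r}")
    root, lexical = DICTIONARY[uw]
    return {
        "uw": uw,
        "root": root,
        "unl_attributes": ["@" + a for a in attrs],
        "lexical_attributes": list(lexical),
    }
```

Keeping the UNL attributes and the dictionary's lexical attributes in separate fields mirrors the two separation tasks listed above, and both sets remain available to the later modules.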
33 Case Marking Module
- Case: a category of morpho-syntactic properties that distinguish the various relations a noun phrase may bear to a governing head
- ??, ??, ??, ??, ??, etc.
- A rule base based on
- UNL attributes
- lexical attributes from dictionary
34 Case Marking Module
- Case marking is implemented using rules
- We analyze all UNL attributes as well as dictionary attributes to decide the next and previous case markers
- We also use the relation with the parent to select the right case marker
35 Rule for Case Marking
- agtnullnullnull??@pastVVINTNnull
- Structure
- relName
- parent previous case marker
- parent next case marker
- child previous case marker
- child next case marker
- The remaining four fields have the form
- attr'REL'relationname
- and attr will be separated by
- also relation name are separated by
36 Morphology Module
- What is morphology?
- The study of morphemes
- Their formation into words, including inflection, derivation and composition
37 Types of Morphology
- Noun, verb and adjective morphology
- Depends on the phonetic properties of the Hindi word
- Noun morphology
- Depends on the gender, number and vowel ending of the noun
- Adjective morphology
- ????? ????, ????? ????, ????? ????
- adjective ???? changes, lexical attribute AdjA
- Verb morphology
- Depends on tense, gender, number, person, etc.
38 Verb Morphology
- Verbs are categorized by
- Tense (past, present, future)
- Gender (male, female)
- Person (1st, 2nd, 3rd)
- Number (sg, pl)
- Example
- Ladaka khana kha raha hai. (The boy is eating food.)
- The verb form is present continuous tense, male, sg, 3rd person
39 Syntax Planning
- Arranging words according to the structure of the language
- Rule-based module
- Priority-based graph traversal
40 General strategy
- Algorithm for Syntax Planning
- 1) Start traversing the UNL graph from the entry node.
- 2) If a node has no children, add it to the final string.
- 3) If a node has more than one child, sort the children by the priority of their relations. The relation with the highest priority is traversed first.
- 4) Mark that node as visited.
- 5) Repeat steps 3 and 4 until all the children of that node have been visited.
- 6) Once all the children of that node have been visited, add that node to the final string.
- 7) Repeat steps 2 to 4 until all the nodes have been traversed.
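The steps above amount to a depth-first traversal in which children are visited in priority order and a node is emitted only after its children. The following is a minimal sketch, not the system's actual implementation: the graph, the relation labels on its arcs and the priority values are all hypothetical, chosen so that the traversal reproduces the word order reported for the example sentence "Also, spray 5% Neemark solution."

```python
# Hypothetical priorities: a higher value is traversed earlier.
PRIORITY = {"qua": 4, "mod": 3, "obj": 2, "man": 1}

# Hypothetical UNL graph for "Also, spray 5% Neemark solution."
# Each node maps to a list of (relation, child) arcs.
GRAPH = {
    "spray": [("man", "also"), ("obj", "solution")],
    "solution": [("mod", "Neemark"), ("qua", "percent")],
    "percent": [("qua", "5")],
}


def plan_syntax(graph, entry, priority):
    """Return node labels in the order produced by priority-based traversal."""
    output, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        # Children with higher-priority relations are traversed first.
        children = sorted(
            graph.get(node, []),
            key=lambda edge: priority.get(edge[0], 0),
            reverse=True,
        )
        for _, child in children:
            visit(child)
        # A node is emitted only after all of its children.
        output.append(node)

    visit(entry)
    return output


if __name__ == "__main__":
    print(" ".join(plan_syntax(GRAPH, "spray", PRIORITY)))
```

Because the entry node is emitted last, the verb ends up in final position, which matches Hindi's verb-final word order; the relative priorities of the relations decide the order of everything before it.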
41 Example
- Also, spray 5% Neemark solution.
42-56 Flow
Output built up step by step:
- 5
- 5 percent
- 5 percent Neemark
- 5 percent Neemark solution
- 5 percent Neemark solution also
- 5 percent Neemark solution also spray
57 General strategy (continued)
- Output
- 5 percent Neemark solution also spray
- 5 ??????? ??????? ??? ?? ??????
58 Output at each stage
Input sentence: Its roots are affected by bacterial infection.
Module: Output
- Input: Its roots are affected by bacterial infection.
- UNL parser: ???? ???????? ????????? ????????
- Case marking: ???? ???????? ????????? ???????? ??
- Morphology: ???? ????? ????????? ???????? ???? ??? ??????? ??
- Syntax planning: ????????? ??????? ?? ???? ????? ???????? ???? ???
Output: ????????? ??????? ?? ???? ????? ???????? ???? ???
59 References
- UNL 2005 Specifications, http://www.undl.org/unlsys/unl/unl2005/
- S. Singh, M. Dalal, V. Vachhani, P. Bhattacharyya and O. Damani, Hindi generation from interlingua, MT Summit 2007 (www.cse.iitb.ac.in/vishalv)
- Mrugank Surve, Sarvjeet Singh, Satish Kagathara, Venkatasivaramasastry K, Sunil Dubey, Gajanan Rane, Jaya Saraswati, Salil Badodekar, Akshay Iyer, Ashish Almeida, Roopali Nikam, Carolina Gallardo Perez, Pushpak Bhattacharyya, AgroExplorer Group, AgroExplorer: a Meaning Based Multilingual Search Engine, International Conference on Digital Libraries (ICDL), New Delhi, India, Feb 2004.
- Agro Explorer, http://agro.mlasia.iitb.ac.in
- aAQUA, http://www.aaqua.org