Agro Explorer - PowerPoint PPT Presentation


1
Agro Explorer UNL
CS 671: ICT for Development, 19th Sep 2008
  • Vishal Vachhani
  • CFILT and DIL,
  • IIT Bombay

2
Agro Explorer: A Meaning Based Multilingual Search Engine
3
Introduction to aAqua
  • Web-site for Indian farmers
  • Farmers can submit their problems related to
    their crops
  • Queries are answered by Agricultural Experts at
    KVK, Baramati
  • Languages supported: Marathi, Hindi, English

4
Why Need Multilingual Search
  • Vast Amount of Information available on the Web
  • Almost 70% of the Information is in English
  • The Indian rural populace is not
    English-Literate
  • → A Big Language Barrier
  • Information has to be made available to them in
    their local languages.

5
Why Need Meaning Based Search
  • Most of the current Search Engines are Keyword
    Based.
  • They do not consider the semantics of the query
  • The result set contains a large number of
    extraneous documents.
  • Search based on the Meaning of the query will
    help narrow down on the desired information
    quickly.

6
Multilinguality
7
Meaning Based Search
Same Keywords, Different Semantics
"Moneylenders Exploit Farmers": Found 1 Result
"Farmers Exploit Moneylenders": Found 0 Results
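
The contrast on this slide can be sketched in a few lines. This is a minimal illustration, not the Agro Explorer implementation: the one-document index and the hand-written (relation, head, dependent) triples are assumptions made for the example.

```python
# Toy index holding a single document, "Moneylenders exploit farmers",
# stored both as a bag of keywords and as hand-written relation triples.
# (In a UNL-based system the triples would come from the enconverted text.)
DOC_KEYWORDS = {"moneylenders", "exploit", "farmers"}
DOC_TRIPLES = {
    ("agt", "exploit", "moneylenders"),  # who exploits
    ("obj", "exploit", "farmers"),       # who is exploited
}


def keyword_match(query_words):
    """Keyword search: every query word must occur in the document."""
    return set(query_words) <= DOC_KEYWORDS


def meaning_match(query_triples):
    """Meaning-based search: every query relation must occur in the document."""
    return set(query_triples) <= DOC_TRIPLES


QUERY_1 = {("agt", "exploit", "moneylenders"), ("obj", "exploit", "farmers")}
QUERY_2 = {("agt", "exploit", "farmers"), ("obj", "exploit", "moneylenders")}

# Both word orders contain the same keywords, so keyword search cannot
# tell them apart; the relation triples can.
same_keywords = keyword_match(["farmers", "exploit", "moneylenders"])
hit_1 = meaning_match(QUERY_1)
hit_2 = meaning_match(QUERY_2)
```

With this toy index, keyword search matches both queries, while the relation-based match accepts only the first one.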
8
Agro Explorer System
  • Provides both
  • Meaning Based Search
  • Cross-Lingual Information Access

9
System Architecture
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15

Conclusion
  • Provides two independent features
  • Multi-Linguality
  • Meaning Based Search.
  • Because of UNL both multi-lingual and meaning
    based properties can be incorporated together
    rather than using separate language translators
    in search engines.
  • The scheme lends itself to the integration of
    multiple languages in a seamless, scalable
    manner.

16
UNL: Universal Networking Language
17
UNL System
(Figure: UNL at the centre, connected to Hindi, English, French, Tamil and Marathi)
18
Approaches of MT System
  • Direct translation
  • - Translation is done directly between each pair
    of languages
  • - N(N-1) translators are needed for N languages
  • Intermediate Language
  • - An intermediate language is used for
    translation
  • - Only 2N translators are required.
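
The two counts can be compared directly. The sketch below only evaluates the formulas from this slide; the function names are illustrative.

```python
def direct_translators(n):
    """Direct approach: one translator per ordered language pair."""
    return n * (n - 1)


def interlingua_translators(n):
    """Intermediate-language approach: one enconverter and one
    deconverter per language."""
    return 2 * n


# Print both counts for a few values of N.
for n in (2, 3, 5, 10):
    print(n, direct_translators(n), interlingua_translators(n))
```

For the five languages on the UNL System slide, this gives 20 direct translators against 10 with an intermediate language. The two approaches break even at N = 3, and the intermediate language needs fewer translators for every N above that.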

19
UNL Interlingua
  • UNL is an acronym for Universal Networking
    Language.
  • UNL is a computer language that enables computers
    to process information and knowledge across the
    language barriers.
  • UNL is a language for representing information
    and knowledge provided by natural languages
  • Unlike natural languages, UNL expressions are
    unambiguous.

20
UNL Interlingua
  • Although the UNL is a language for computers, it
    has all the components of a natural language.
  • It is composed of Universal Words (UWs),
    Relations, Attributes.
  • Knowledge is represented as a semantic graph
  • Nodes → concepts
  • Arcs → relations between concepts

21
Universal Words (UWs)
  • A UW represents simple or compound concepts.
    There are two classes of UWs
  • unit concepts
  • compound structures of binary relations grouped
    together (indicated with Compound UW-IDs)
  • A UW is made up of a character string (an
    English-language word) followed by a list of
    constraints.
  • <UW> ::= <Head Word><Constraint List>
  • Example:
  • state(icl>express)
  • state(icl>country)
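
As a minimal sketch of the head word / constraint list structure, the function below splits a UW written in the usual notation, e.g. state(icl>express). The regular expression and the function name are assumptions for illustration, and nested constraints are not handled.

```python
import re

# Head word, optionally followed by a parenthesised constraint list.
UW_PATTERN = re.compile(r"^(?P<head>[^()]+?)(?:\((?P<constraints>.*)\))?$")


def parse_uw(uw):
    """Split a UW into (head word, list of constraints)."""
    match = UW_PATTERN.match(uw.strip())
    if match is None:
        raise ValueError(f"not a well-formed UW: {uw!r}")
    raw = match.group("constraints")
    constraints = [c.strip() for c in raw.split(",")] if raw else []
    return match.group("head"), constraints


verb_sense = parse_uw("state(icl>express)")
noun_sense = parse_uw("state(icl>country)")
bare_word = parse_uw("Ram")
```

The two senses of "state" share a head word and differ only in their constraint lists, which is what makes the UW unambiguous.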

22
Relations
  • A relation label is a string of three
    characters or fewer.
  • The relations between UWs are binary.
  • rel (UW1, UW2)
  • They have different labels according to the
    different roles they play.
  • At present, there are 46 relations in UNL
  • For example, agt (agent), ins (instrument), pur
    (purpose), etc.

23
Attribute Labels
  • Attribute labels express additional information
    about the Universal Words that appear in a
    sentence.
  • They show what is said from the speaker's point
    of view: how the speaker views what is said
    (time, reference, emphasis, attitude, etc.).
  • @entry, @present, @progressive, @topic, etc.

24
UNL Interlingua
  • Example
  • Ram eats rice.
  • {unl}
  • agt(eat.@entry.@present, Ram)
  • obj(eat.@entry.@present, rice(icl>eatable))
  • {/unl}
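
A binary relation written as rel(UW1, UW2), with attributes attached by ".@", can be taken apart as sketched below. The helper names are hypothetical and the parser covers only the simple form shown in this example.

```python
def split_args(body):
    """Split 'UW1, UW2' at the comma that is not inside parentheses."""
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"expected two arguments in {body!r}")


def split_node(node):
    """Split 'eat.@entry.@present' into ('eat', ['entry', 'present'])."""
    uw, *attrs = node.split(".@")
    return uw, attrs


def parse_relation(line):
    """Parse 'rel(UW1, UW2)' into (label, (uw1, attrs1), (uw2, attrs2))."""
    label, _, rest = line.strip().partition("(")
    if not label or not rest.endswith(")"):
        raise ValueError(f"not a relation: {line!r}")
    left, right = split_args(rest[:-1])
    return label, split_node(left), split_node(right)


agent = parse_relation("agt(eat.@entry.@present, Ram)")
obj = parse_relation("obj(eat.@entry.@present, rice(icl>eatable))")
```

Here `agent` records that Ram is the agent of "eat", and `obj` keeps the constrained UW rice(icl>eatable) intact as the second argument.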

25
UNL as graph
(Figure: UNL graph with the nodes eat, Ram and rice; arc labels shown: agt, plc)
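
A UNL expression maps naturally onto an adjacency structure. The sketch below builds one from the two relations of "Ram eats rice" given on the previous slide; the dictionary-of-lists representation is an assumption chosen for brevity.

```python
from collections import defaultdict


def build_graph(relations):
    """Map each head node to its outgoing (relation label, child) arcs."""
    graph = defaultdict(list)
    for label, head, dependent in relations:
        graph[head].append((label, dependent))
    return dict(graph)


# Relations of "Ram eats rice", attributes omitted for brevity.
RELATIONS = [("agt", "eat", "Ram"), ("obj", "eat", "rice")]
graph = build_graph(RELATIONS)
```

The entry node "eat" ends up with two labelled arcs, one to each concept it relates to.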
26
UNL Interlingua
  • Example
  • The boy who works here went to school.
  • {unl}
  • agt(go(icl>move).@entry.@past, :01)
  • plt(go(icl>occur).@entry.@past, school(icl>institution))
  • agt:01(work(icl>do), boy(icl>person).@entry)
  • plc:01(work(icl>do), here)
  • {/unl}

27
UNL as graph
28
Intermediate Language

Source language → EnConverter → Intermediate Language → DeConverter → Target language
29
DeConverter
  • It is a language-independent generator.
  • It can deconvert UNL expressions into a variety
    of native languages, using linguistic data such
    as the word dictionary and grammatical rules of
    each language.
  • The DeConverter transforms the sentence
    represented by a UNL expression into Natural
    language sentence.

30
DeConverter Block Diagram
31
Block diagram of the natural language generator
Rule bases and resources: Dictionary, Syntax Planning Rules, Case Marking Rules, Morphology Rules
Pipeline: UNLDoc → UNL Parser → Case Marking Module → Morphology Module → Syntax Planning Module → HindiDoc
Legend: Language Dependent Module, Language Independent Module
32
UNL Parser
  • The UNL parser module performs the following tasks:
  • Check the input format of the UNL document
  • Separate attributes from UWs
  • Separate attributes from dictionary entries
  • Replace UWs with Hindi root words
33
Case Marking Module
  • Case: a category of morpho-syntactic properties
    which distinguish the various relations that a
    noun phrase may bear to a governing head.
  • ??, ?? ,??, ??, ??,etc.
  • A rule base based on
  • UNL attributes
  • lexical attributes from dictionary

34
Case Marking Module.
  • Case marking is implemented using rules.
  • We analyze all UNL attributes as well as
    dictionary attributes to decide the next and
    previous case markers.
  • We also use the relation with the parent to
    select the right case marker.

35
Rule for Case Marking
  • agtnullnullnull??@pastVVINTNnull
  • Structure
  • relName
  • parent previous case marker
  • parent next case marker
  • child previous case marker
  • child next case marker
  • the remaining four fields are of the form
  • attr'REL'relationname
  • and attributes will be separated by
  • also relation names are separated by

36
Morphology Module
  • What is Morphology?
  • The study of morphemes
  • Their formation into words, including inflection,
    derivation and composition

37
Types of Morphology
  • Noun, Verb and Adjective Morphology
  • Depends on the phonetic properties of the Hindi
    word
  • Noun Morphology
  • Depends on gender, number and vowel ending of the
    noun
  • Adjective Morphology
  • ????? ????, ????? ????, ????? ????
  • adjective ???? changes, lexical attribute AdjA
  • Verb Morphology
  • Depends upon tense, gender, number , person etc.

38
Verb Morphology
  • Verbs are categorized by
  • Tense (past, present, future)
  • Gender (male, female)
  • Person (1st, 2nd, 3rd)
  • Number (sg, pl)
  • Example
  • Ladaka khana kha raha hai.
  • It is in the present continuous tense, male, sg,
    and 3rd person
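
To illustrate how these features select the verb form, here is a minimal sketch restricted to the present continuous tense in the third person. The tables use romanised Hindi, and both the tables and the function name are assumptions; a real morphology module would work from rule files and cover far more cases.

```python
# Progressive auxiliary, selected by gender and number.
PROGRESSIVE_AUX = {
    ("male", "sg"): "raha",
    ("male", "pl"): "rahe",
    ("female", "sg"): "rahi",
    ("female", "pl"): "rahi",
}

# Present-tense auxiliary for the third person, selected by number.
PRESENT_AUX_3RD = {"sg": "hai", "pl": "hain"}


def present_continuous_3rd(root, gender, number):
    """Build a third-person present continuous verb group from a root."""
    return f"{root} {PROGRESSIVE_AUX[(gender, number)]} {PRESENT_AUX_3RD[number]}"


example = present_continuous_3rd("kha", "male", "sg")
```

For the root "kha" with male, sg, this reproduces the verb group of the example sentence, "kha raha hai".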

39
Syntax Planning
  • Arranging words according to the language
    structure
  • Rule based module
  • It is priority based graph traversal

40
General strategy
  • Algorithm for Syntax Planning
  • 1) Start traversing the UNL graph from the entry
    node.
  • 2) If a node has no children, add it to the
    final string.
  • 3) If a node has more than one child, sort the
    children by the priority of their relations;
    the relation with the highest priority is
    traversed first.
  • 4) Mark the node as visited.
  • 5) Repeat steps 3 and 4 until all the children of
    the node have been visited.
  • 6) Once all the children of the node have been
    visited, add the node to the final string.
  • 7) Repeat steps 2 to 6 until all the nodes have
    been traversed.
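
The steps above amount to a post-order traversal in which children are visited in relation-priority order. The sketch below applies it to the sentence of the following example slide. The graph, the relation labels on its arcs and the priority values are all assumptions made for illustration; the actual rules of the system are not given in these slides.

```python
# Hypothetical relation priorities: a lower number is traversed first.
PRIORITY = {"qua": 0, "mod": 1, "obj": 2, "man": 3}

# Assumed UNL graph for "Also, spray 5 percent Neemark solution."
GRAPH = {
    "spray": [("man", "also"), ("obj", "solution")],
    "solution": [("mod", "Neemark"), ("qua", "percent")],
    "percent": [("qua", "5")],
}


def plan_syntax(entry, graph, priority):
    """Return the words in the order produced by the traversal."""
    order = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)  # mark the node as visited
        children = sorted(graph.get(node, []), key=lambda arc: priority[arc[0]])
        for _label, child in children:  # highest-priority relation first
            visit(child)
        order.append(node)  # add the node after all its children

    visit(entry)
    return " ".join(order)


planned = plan_syntax("spray", GRAPH, PRIORITY)
```

Because a node is emitted only after its children, the verb comes last, which gives the verb-final order "5 percent Neemark solution also spray".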

41
Example
  • Also, spray 5% Neemark solution.

42
Flow
43
Flow
44
Flow
45
Flow
46
Flow
47
Flow
48
Flow
49
Flow
50
Flow
51
Flow
Output: 5
52
Flow
Output: 5 percent
53
Flow
Output: 5 percent Neemark
54
Flow
Output: 5 percent Neemark solution
55
Flow
Output: 5 percent Neemark solution also
56
Flow
Output: 5 percent Neemark solution also spray
57
General strategy (continued)
  • Output
  • 5 percent Neemark solution also spray
  • 5 ??????? ??????? ??? ?? ??????
  • 5 ??????? ??????? ??? ?? ??????

58
Output at each stage
Input sentence: Its roots are affected by bacterial infection.
Module            Output
Input             Its roots are affected by bacterial infection.
UNL parser        ???? ???????? ????????? ????????
Case marking      ???? ???????? ????????? ???????? ??
Morphology        ???? ????? ????????? ???????? ???? ??? ??????? ??
Syntax Planning   ????????? ??????? ?? ???? ????? ???????? ???? ???
Output            ????????? ??????? ?? ???? ????? ???????? ???? ???
59
References
  • UNL 2005 Specifications: http://www.undl.org/unlsys/unl/unl2005/
  • S. Singh, M. Dalal, V. Vachhani, P. Bhattacharyya and
    O. Damani, "Hindi generation from interlingua",
    MT Summit 2007 (www.cse.iitb.ac.in/vishalv)
  • Mrugank Surve, Sarvjeet Singh, Satish Kagathara,
    Venkatasivaramasastry K, Sunil Dubey,
    Gajanan Rane, Jaya Saraswati, Salil Badodekar,
    Akshay Iyer, Ashish Almeida, Roopali Nikam,
    Carolina Gallardo Perez, Pushpak Bhattacharyya,
    AgroExplorer Group, "AgroExplorer: a Meaning Based
    Multilingual Search Engine", International
    Conference on Digital Libraries (ICDL), New
    Delhi, India, Feb 2004.
  • Agro Explorer: http://agro.mlasia.iitb.ac.in
  • aAQUA: http://www.aaqua.org