Title: MEANING: a Roadmap to Knowledge Technologies


1
Meaning
  • MEANING: a Roadmap to Knowledge Technologies
  • German Rigau. TALP Research Center. UPC.
    Barcelona. rigau_at_lsi.upc.es
  • Bernardo Magnini. ITC-IRST. Povo-Trento.
    magnini_at_itc.it
  • Eneko Agirre. IXA group. EHU. Donostia.
    eneko_at_si.ehu.es
  • Piek Vossen. Irion Technologies. Delft.
    Piek.Vossen_at_irion.nl
  • John Carroll. COGS. U. Sussex. Brighton.
    johnca_at_cogs.susx.ac.uk
  • http://www.lsi.upc.es/nlp/meaning/meaning.html

2
Meaning: Introduction
  • Knowledge technologies (semantic web) make
    sense of petabytes of information
  • Range of techniques to automate knowledge
    lifecycle
  • Lexical KB (ontologies)
  • Text understanding (IE or other)
  • extract high-level meaning
  • represent and manage in a KB
  • HLT to enable knowledge technologies

3
Meaning: Introduction
  • Building large and rich KB by hand
  • Expensive, e.g. CYC, WordNet (EuroWordNet)
  • Introspection fails to reflect reality in texts,
    domains
  • Is a saint an animate being? Not always: it can
    be an image.
  • Contradictions
  • → Hamper applications of HLT and KT
  • Richer KBs (ontologies)
  • Domain knowledge
  • Contradictory subsets
  • → Semi-automatic means

4
Meaning: Introduction
  • Crucial intermediate tasks
  • Word Sense Disambiguation → from words to
    concepts (word sense = concept in KB)
  • Large scale enrichment of (multilingual) Lexical
    KB → enable semantic processing
  • Goal
  • → Large-scale extraction of shallow meaning
    relations among concepts

5
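Meaning: Shallow semantics
[Figure: shallow semantic graph for the sentence below. Act node: Invite (s456). Role labels: object, source, destination. Concept nodes: s378, s412, s933.]
  • (Chirac) (invita) (al Dalai_Lama) (a un almuerzo
    oficial)
  • (Chirac) (invites) (the Dalai_Lama) (to an
    official lunch)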
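
A shallow meaning relation of this kind can be held in a very small data structure. The sketch below is illustrative only: the class name `ShallowRelation` is hypothetical, and the pairing of each role with a chunk (source = Chirac, object = the Dalai_Lama, destination = an official lunch) is an assumption, since the slide does not say which role points to which concept node.

```python
from dataclasses import dataclass, field


@dataclass
class ShallowRelation:
    """One act plus its role fillers (hypothetical structure)."""
    act: str                      # label of the act, e.g. "Invite"
    act_concept: str              # concept identifier of the act in the KB
    roles: dict = field(default_factory=dict)  # role name -> filler chunk

    def filler(self, role):
        """Return the chunk filling a role, or None if the role is absent."""
        return self.roles.get(role)


# Role-to-chunk pairing below is an assumption made for illustration.
invite = ShallowRelation(
    act="Invite",
    act_concept="s456",
    roles={
        "source": "Chirac",
        "object": "the Dalai_Lama",
        "destination": "an official lunch",
    },
)
```

Keying the fillers by role name keeps the representation independent of word order, so the Spanish and English sentences map to the same structure.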
6
Meaning: Introduction
  • Crucial intermediate tasks
  • Word Sense Disambiguation
  • Large scale enrichment of (multilingual) Lexical
    KB
  • Problems (research goals)
  • Enriching LKBs, acquisition of linguistic
    knowledge
  • Corpora need to be accurately tagged with
    concepts
  • Accurate WSD needs
  • Hand-tagged data OR richer LKB
  • Multilinguality
  • Words in several languages linked to common
    concepts

7
Meaning: Outline
  • Major research goals
  • Knowledge acquisition into LKBs
  • WSD into LKB concepts
  • Multilingualism
  • Meaning roadmap
  • Overview of the project

8
Meaning: Knowledge acquisition into LKBs
  • Semi-automatic acquisition of linguistic
    knowledge from corpora is working
  • Subcategorization information
  • Selectional preferences
  • Thematic role assignments
  • Diathesis alternations
  • Domain information
  • Topic signatures
  • Rich lexico-semantic relations between words
    (dictionaries)
  • Large bodies of text with (fast) shallow
    processors

9
Meaning: Knowledge acquisition into LKBs
  • Knowledge for words is not enough
  • Verb senses have different selectional
    preferences for e.g. the subject
  • The car ate all the petrol (WN)
  • Verb senses may have different subcat. frames
  • Better to key into word senses: source corpora
    should be tagged
  • Better reflect linguistic phenomena
  • Detect new senses
  • Clustering senses
  • Integrate easily into the multilingual LKB

10
Meaning: WSD into LKB concepts
  • Senseval-2 uses word senses (concepts) from WN
    1.7
  • No large-scale broad-coverage WSD system is
    available
  • Accuracy around 60-70% (V/A/N) when hand-tagged
    data is available
  • Use hand-tagged data to train ML systems
  • Ng's estimate: 16 person-years (short)
  • Promising research lines
  • Automatically create training corpus using
    semantic relations in the LKB (WN)
  • Use untagged data to improve performance
  • Higher precision if more knowledgeable features
    are used (subcat, sel. preferences, domains)
  • Coarse grained Domain tagging / Clusters of
    senses

11
Meaning: Exploiting EWN Semantic Relations: WSD
12
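Meaning: Exploiting EWN Semantic Relations
partido 1
  • Pero España puso al partido intensidad, ritmo y
    coraje. (But Spain brought intensity, rhythm and
    courage to the match.)
  • El seleccionador cree que el partido de hoy
    contra Italia dará la medida de España. (The
    national coach believes today's match against
    Italy will give the measure of Spain.)
  • El Racing no gana en su campo desde hace seis
    partidos. (Racing has not won at home for six
    matches.)
partido 2
  • Todos los partidos piden reformas legales para
    TV3. (All parties call for legal reforms for TV3.)
  • La derecha planea agruparse en un partido. (The
    right plans to group into one party.)
  • El diputado reiteró que ni él ni UDC, como
    partido, han recibido dinero de Pellerols. (The
    deputy reiterated that neither he nor UDC, as a
    party, has received money from Pellerols.)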
13
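Meaning: Exploiting EWN Semantic Relations
partido 1
  • Rivera pide el soporte de la afición para
    encarrilar las semifinales. (Rivera asks for the
    fans' support to get the semifinals on track.)
  • Sólo el equipo de Valero Ribera puede sentenciar
    una semifinal como lo hizo ayer en un Palau
    Blaugrana completamente entregado. (Only Valero
    Ribera's team can settle a semifinal the way it
    did yesterday in a fully devoted Palau Blaugrana.)
  • El Racing ganó los cuartos de final en su campo.
    (Racing won the quarter-finals at home.)
partido 2
  • No negociaremos nunca con un partido político que
    sea partidario de la independencia de Taiwan. (We
    will never negotiate with a political party that
    favours the independence of Taiwan.)
  • Una vez más es noticia la desviación de fondos
    destinados a la formación ocupacional hacia la
    financiación de un partido político. (Once again
    the diversion of funds intended for occupational
    training towards the financing of a political
    party is in the news.)
  • Estas leyes fueron votadas gracias a un consenso
    general de los partidos políticos. (These laws
    were passed thanks to a general consensus among
    the political parties.)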
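
The idea illustrated by these examples, collecting training sentences for a word sense by searching for words related to that sense in the LKB, can be sketched as follows. This is a minimal illustration and not the project's tool: the relative lists in `SENSE_RELATIVES` are assumptions read off the example sentences (they are not taken from EWN), and `harvest_examples` is a hypothetical helper name.

```python
import re

# Assumed relatives per sense, inferred from the example sentences above.
SENSE_RELATIVES = {
    "partido_1": ["semifinal", "semifinales", "cuartos de final"],
    "partido_2": ["partido político", "partidos políticos"],
}


def harvest_examples(sentences, sense_relatives):
    """Assign each sentence to a sense if it mentions relatives of exactly one sense."""
    examples = {sense: [] for sense in sense_relatives}
    for sentence in sentences:
        lowered = sentence.lower()
        matched = [
            sense
            for sense, relatives in sense_relatives.items()
            if any(
                re.search(r"\b" + re.escape(rel) + r"\b", lowered)
                for rel in relatives
            )
        ]
        # Keep only unambiguous sentences; discard the rest as unreliable.
        if len(matched) == 1:
            examples[matched[0]].append(sentence)
    return examples


corpus = [
    "El Racing ganó los cuartos de final en su campo.",
    "Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.",
    "Hoy hace sol en Barcelona.",
]
harvested = harvest_examples(corpus, SENSE_RELATIVES)
```

Sentences that match relatives of more than one sense, or of none, are dropped, which trades coverage for cleaner automatically labelled data.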
14
Meaning: Multilingualism
  • Language diversity is a barrier
  • Language diversity is helpful
  • Languages realize meaning in different ways
  • Use EuroWN multilingual architecture
    Interlingual Index (ILI) links translation
    equivalents via interlingual concepts
  • head ---------- s984574 --------- cabeza
  • head ---------- s984557 --------- jefe
  • Research on how linguistic knowledge behaves when
    ported to another language (e.g. subcat information)
  • Very important for resource-poor languages
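
A minimal sketch of how an interlingual index can link translation equivalents, using the two concept identifiers from the slide. The dictionary layout and the function name `translation_equivalents` are assumptions made for illustration; they are not the EuroWordNet data format. The link from "head" to s984557 is read off the slide's diagram.

```python
# Interlingual concept -> words per language (toy data from the slide).
ILI = {
    "s984574": {"en": ["head"], "es": ["cabeza"]},
    "s984557": {"en": ["head"], "es": ["jefe"]},
}


def translation_equivalents(word, source_lang, target_lang, ili=ILI):
    """Map each concept expressed by `word` to its words in the target language."""
    result = {}
    for concept, words_by_lang in ili.items():
        if word in words_by_lang.get(source_lang, []):
            result[concept] = list(words_by_lang.get(target_lang, []))
    return result


equivalents = translation_equivalents("head", "en", "es")
```

Because the lookup goes through concepts rather than words, an ambiguous word such as "head" yields one translation per sense instead of a flat list.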

15
Meaning: Multilingualism
  • Selectional preference for the object of the
    first sense of know
  • sense 1: know, cognize -- (be cognizant or aware
    of a fact or a specific piece of information;
    possess knowledge or information about)
  • 0.1128 <communication>
  • 0.0615 <measure quantity amount quantum>
  • 0.0535 <attribute>
  • 0.0389 <object physical_object>
  • 0.0307 <cognition knowledge>
  • In EuroWordNet (http://ixa.si.ehu.es)
  • antzeman_1, jakin_2 and ezagutu_1 in Basque.
  • conocer_1 and saber_1 in Spanish
  • conèixer_1 and saber_1 in Catalan
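
Since the preferences above are attached to a concept rather than to an English word, porting them amounts to copying the table to every verb sense linked to the same concept. The sketch below assumes exactly that simplification; `port_preferences` is a hypothetical name, and a real system would presumably need to check whether the knowledge actually holds in the target language.

```python
# Object preferences for know_1, as listed on the slide.
KNOW_1_OBJECT_PREFS = {
    "communication": 0.1128,
    "measure quantity amount quantum": 0.0615,
    "attribute": 0.0535,
    "object physical_object": 0.0389,
    "cognition knowledge": 0.0307,
}

# Verb senses linked to the same concept, per language (from the slide).
EQUIVALENT_SENSES = {
    "eu": ["antzeman_1", "jakin_2", "ezagutu_1"],
    "es": ["conocer_1", "saber_1"],
    "ca": ["conèixer_1", "saber_1"],
}


def port_preferences(prefs, equivalent_senses):
    """Copy a concept-level preference table to every equivalent verb sense."""
    return {
        (lang, sense): dict(prefs)  # independent copy per target sense
        for lang, senses in equivalent_senses.items()
        for sense in senses
    }


ported = port_preferences(KNOW_1_OBJECT_PREFS, EQUIVALENT_SENSES)
```

Keying the result by (language, sense) keeps the Spanish and Catalan `saber_1` entries separate even though they share a label.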

16
Meaning: MEANING roadmap
  • Solutions have been tried with relative success
    in isolation
  • Combination for significant advances (which?)
  • Web as corpus: BNC (100 Mw) is small for many
    phenomena
  • Incremental design
  • WSD using whatever knowledge available at the
    time for bootstrapping
  • Acquisition of linguistic knowledge using WSD
    available at the time (may discard low accuracy
    examples)
  • Integrating acquired knowledge in the
    Multilingual Central Repository and porting
    knowledge from one language to the other
  • Series of cycles: WSD0, WSD1, WSD2, ACQ0, ACQ1,
    ACQ2, PORT0, PORT1, PORT2
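
The incremental design can be sketched as a loop in which each cycle tags the corpora with the knowledge available so far, acquires new knowledge from the tagged text, and ports it across languages. Everything here is an assumption made for illustration: the function `run_roadmap`, the interleaved ordering WSD, ACQ, PORT within a cycle, and the toy stand-in functions, which do no real disambiguation.

```python
def run_roadmap(corpora, repository, wsd, acquire, port, n_cycles=3):
    """Run n_cycles rounds of WSD -> ACQ -> PORT and log the step names."""
    log = []
    for i in range(n_cycles):
        # WSDi: tag every corpus using whatever knowledge exists so far.
        tagged = {lang: wsd(text, repository) for lang, text in corpora.items()}
        log.append(f"WSD{i}")
        # ACQi: acquire knowledge from the tagged examples of each language.
        for lang, examples in tagged.items():
            repository = acquire(lang, examples, repository)
        log.append(f"ACQ{i}")
        # PORTi: share the acquired knowledge across languages.
        repository = port(repository)
        log.append(f"PORT{i}")
    return repository, log


# Toy stand-ins so the loop can be executed.
def toy_wsd(text, repository):
    return [(word, f"{word}_1") for word in text.split()]


def toy_acquire(lang, examples, repository):
    return repository | {(lang, sense) for _, sense in examples}


def toy_port(repository):
    return repository


repo, log = run_roadmap(
    {"es": "partido campo"}, frozenset(), toy_wsd, toy_acquire, toy_port
)
```

Passing the three steps in as functions lets each cycle swap in a better WSD or acquisition component without changing the loop.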

17
Meaning
Architecture
[Figure: architecture diagram. Components: a Web corpus and an EWN for each of Italian, English, Basque, Spanish and Catalan; WSD, ACQ, UPLOAD and PORT steps; a Multilingual Central Repository at the centre.]
18
Meaning
Project overview
  • 3-year research project (started March 2002)
  • 1.610 M Euro
  • 2 contracted people per site
  • Consortium
  • TALP, UPC (German Rigau)
  • ITC-IRST (Bernardo Magnini)
  • IXA, UPV/EHU (Eneko Agirre)
  • University of Sussex (John Carroll)
  • Irion Technologies (Piek Vossen)

19
Meaning: Project results
  • A tool set that uses the semantic knowledge of
    EWN to automatically obtain from the Web large
    collections of examples for each word sense.
  • A tool set for enriching EWN with the knowledge
    acquired automatically from the Web.
  • A tool set for accurately selecting the senses of
    the open-class words in the languages involved
    in the project.
  • Multilingual Central Repository to maintain
    compatibility between WordNets of different
    languages and versions, past and new.
  • A semantically annotated corpus for each WordNet
    word sense, that is, a multilingual Web corpus
    with semantic annotation.
  • Demonstration: CLIR, Q/A system.
  • The results of MEANING will be public and free
    for research.

20
Meaning: Why now?
  • Huge amounts of data: throw out the unreliable
  • Syntactic dependencies with high enough accuracy
  • Supervised WSD with high enough accuracy
  • Coarser grains, sense domain tagging
  • Bootstrapping
  • Success coping with multilingualism
  • Porting linguistic knowledge from one language to
    another using MT / comparable corpora
  • CLIR as good as monolingual IR

21
Meaning
  • MEANING: a Roadmap to Knowledge Technologies
  • German Rigau. TALP Research Center. UPC.
    Barcelona. rigau_at_lsi.upc.es
  • Bernardo Magnini. ITC-IRST. Povo-Trento.
    magnini_at_itc.it
  • Eneko Agirre. IXA group. EHU. Donostia.
    eneko_at_si.ehu.es
  • Piek Vossen. Irion Technologies. Delft.
    Piek.Vossen_at_irion.nl
  • John Carroll. COGS. U. Sussex. Brighton.
    johnca_at_cogs.susx.ac.uk
  • http://www.lsi.upc.es/nlp/meaning/meaning.html