Language Technology and the Semantic Web - PowerPoint PPT Presentation

1
Language Technology and the Semantic Web
  • Thierry Declerck and Paul Buitelaar (Saarland
    University and DFKI GmbH)

2
  • We present collaborative research work on the
    combination of language technology (LT) and
    technologies for encoding (domain) knowledge in
    ontologies, supporting the emergence of the
    Semantic Web (SW), or perhaps more appropriately,
    Semantic Webs. The work draws on three projects:
  • MUMIS (multimedia content indexing and
    searching in the soccer domain, finished in
    December 2002)
  • MuchMore (cross-lingual information retrieval
    in the medical domain, finished in May 2003)
  • Esperonto (developing a Semantic Annotation
    Service for upgrading the current Web to the
    Semantic Web, Sept. 2002 - May 2005)

3
Semantic Web Applications of LT
  • Supporting accurate ontology-based semantic
    annotation of multilingual web documents
    (Knowledge Markup)
  • Supporting Ontology Learning/Construction from
    linguistically/semantically annotated
    multilingual text (Knowledge Extraction)
  • See also the Special Interest Group (SIG-5)
    OntoWeb-lt on Language Technology in Ontology
    Development and Use: http://ontoweb-lt.dfki.de

4
Knowledge Markup and Knowledge Extraction
Text/Speech
Text/Speech Mining
Linguistic and Semantic Annotations
Concepts, Relations, Events
 
Linguistic Analysis: Morpho-Syntactic Analysis and
Tagging, Semantic Class Tagging, Term/NE
Recognition, Grammatical Function Tagging,
Dependency Structure Analysis
5
Knowledge Markup and Knowledge Extraction (2)
Text/Speech/Image-Video
Text/Speech/Media Mining
Linguistic, Low-level Image and Semantic
Annotations
Concepts, Relations, Events
 
Linguistic and Media Analysis
6
Integration of Language Technology and Domain
Knowledge
7
Linguistic Analysis
Language technology tools are needed to support
the upgrade of the current web to the Semantic Web
(SW) by providing an automatic analysis of the
linguistic structure of textual documents. Free
text documents that undergo linguistic analysis
become available as semi-structured documents,
from which meaningful units can be extracted
automatically (information extraction) and
organized through clustering or classification
(text mining). Here we focus on the following
linguistic analysis steps that underlie the
extraction tasks: morphological analysis,
part-of-speech tagging, chunking, dependency
structure analysis, and semantic tagging.
8
Morphological Analysis
Morphological analysis is concerned with the
inflectional, derivational, and compounding
processes in word formation, in order to determine
properties such as stem and inflectional
information. Together with part-of-speech (PoS)
information this process delivers the
morpho-syntactic properties of a word. When
processing the German word Häusern (houses), the
following morphological information should be
produced: PoS=N NUM=PL CASE=DAT GEN=NEUT
STEM=HAUS
9
Part-of-Speech Tagging
Part-of-Speech (PoS) tagging is the process of
determining the correct syntactic class (a
part-of-speech, e.g. noun, verb, etc.) for a
particular word given its current context. The
word works in the following sentences will be
either a verb or a noun: He works [N,V] the
whole day for nothing. His works [N,V] have all
been sold abroad. PoS tagging involves
disambiguating between multiple part-of-speech
tags, as well as guessing the correct
part-of-speech tag for unknown words on the basis
of context information.
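The disambiguation step can be sketched with a single context rule. This is a toy illustration only, not the tagger used in the projects: the word lists and the function name tag_works are assumptions made for the example, and a real tagger would use far richer context.

```python
# Toy context rule for the ambiguous word "works" (illustrative only).
SUBJECT_PRONOUNS = {"he", "she", "it"}          # assumed trigger words for a verb reading
POSSESSIVES = {"his", "its", "their", "my", "your", "our"}  # assumed triggers for a noun reading


def tag_works(tokens):
    """Return (token, tag) pairs; only the word 'works' receives a tag."""
    tagged = []
    for i, tok in enumerate(tokens):
        tag = None
        if tok.lower() == "works":
            prev = tokens[i - 1].lower() if i > 0 else ""
            if prev in SUBJECT_PRONOUNS:
                tag = "V"        # a subject pronoun precedes: verb reading
            elif prev in POSSESSIVES:
                tag = "N"        # a possessive precedes: noun reading
            else:
                tag = "N,V"      # context does not decide: stay ambiguous
        tagged.append((tok, tag))
    return tagged


s1 = "He works the whole day for nothing .".split()
s2 = "His works have all been sold abroad .".split()
print(tag_works(s1)[1], tag_works(s2)[1])
```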
10
Chunking
Following Abney, we define chunks as the
non-recursive parts of core phrases, such as
nominal, prepositional, adjectival and adverbial
phrases and verb groups. Chunk parsing is an
important step towards making natural language
processing robust, since the goal of chunk parsing
is not to deliver a full analysis of sentences,
but to extract just the linguistic fragments that
can be reliably identified. Even if this strategy
fails to produce an analysis for the whole
sentence, the partial linguistic information
gained will still be useful for many
applications, such as information extraction and
text mining.
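A minimal sketch of non-recursive chunking over PoS-tagged input follows. The pattern (optional determiner, any adjectives, one or more nouns), the Penn-style tag names and the function name are assumptions for illustration; the chunkers used in the projects are not described at this level of detail.

```python
NOUN_TAGS = {"NN", "NNS", "NNP"}  # assumed Penn-style noun tags


def chunk_noun_phrases(tagged):
    """Greedy, non-recursive NP chunker: optional DT, any JJ, one or more nouns."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < n and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUN_TAGS:
            k += 1
        if k > j:
            # At least one noun was found: emit the chunk and continue after it.
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            # No chunk starts here: skip one token, keep the partial results.
            i += 1
    return chunks


sentence = [("The", "DT"), ("shot", "NN"), ("by", "IN"), ("Christian", "NNP"),
            ("Ziege", "NNP"), ("goes", "VBZ"), ("over", "IN"),
            ("the", "DT"), ("goal", "NN")]
print(chunk_noun_phrases(sentence))
```

Tokens that fit no pattern are simply skipped, which mirrors the robustness argument: the chunker returns whatever fragments it can identify instead of failing on the whole sentence.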
11
Named Entity Detection
Related to chunking is the recognition of
so-called named entities (names of institutions
and companies, date expressions, etc.). The
extraction of named entities is mostly based on a
strategy that combines look-up in gazetteers
(lists of companies, cities, etc.) with the
definition of regular expression patterns. Named
entity recognition can be included as part of the
linguistic chunking procedure. The sentence
fragment "the secretary-general of the United
Nations, Kofi Annan," will then be annotated as
a nominal phrase that includes two named
entities: United Nations with named entity
class organization, and Kofi Annan with named
entity class person.
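The combination of gazetteer look-up and regular expression patterns can be sketched as follows. The gazetteer contents, the date pattern and the function name are assumptions chosen for the example, not the resources used in the projects.

```python
import re

# Hypothetical gazetteer: surface string -> named entity class.
GAZETTEER = {
    "United Nations": "organization",
    "Kofi Annan": "person",
}

# Hypothetical pattern for date expressions such as "12 March 2003".
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def find_named_entities(text):
    """Return (surface, class) pairs in order of appearance."""
    found = []
    for name, ne_class in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), ne_class))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "date"))
    return [(surface, ne_class) for _, surface, ne_class in sorted(found)]


text = ("the secretary-general of the United Nations, Kofi Annan, "
        "spoke on 12 March 2003")
print(find_named_entities(text))
```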
12
Dependency Structure Analysis
A dependency structure consists of two or more
linguistic units, one of which immediately
dominates the others in a syntax tree. The
detection of such structures is generally not
provided by chunking but builds on top of it.
There are two main types of dependencies that are
relevant for our purposes: on the one hand, the
internal dependency structure of phrasal units or
chunks, and on the other hand the so-called
grammatical functions (like subject and direct
object).
13
Internal Dependency Structure
To describe the internal dependency structure,
linguistic analysis uses the terms head,
complements and modifiers, where the head is the
dominating node in the syntax tree of a phrase
(chunk), complements are necessary qualifiers
thereof, and modifiers are optional qualifiers.
Consider the following example: The shot by
Christian Ziege goes over the goal. The
prepositional phrase by Christian Ziege
(containing the named entity Christian Ziege)
depends on (and modifies) the head noun shot.
14
Grammatical Functions
Grammatical functions determine the role
(function) of each of the linguistic chunks in
the sentence and make it possible to identify the
actors involved in certain events. For example,
in the following sentence the syntactic (and also
the semantic) subject is the NP constituent The
shot by Christian Ziege: The shot by Christian
Ziege goes over the goal. This nominal phrase
depends on (and complements) the verb goes,
whereas the noun shot is the head of the NP (it
is the shot going over the goal, and not
Christian Ziege!).
15
Semantic Tagging
Automatic semantic annotation has developed
within language technology in recent years in
connection with more integrated tasks like
information extraction, which require a certain
level of semantic analysis. Semantic tagging
consists in annotating each content word in a
document with a semantic category. Semantic
categories are assigned on the basis of semantic
resources such as WordNet for English or
EuroWordNet, which links words across many
European languages through a common interlingua
of concepts.
16
Semantic Resources
  • Semantic resources are captured in dictionaries,
    thesauri, and semantic networks, all of which
    express, either implicitly or explicitly, an
    ontology of the world in general or of more
    specific domains, such as medicine.
  • They can be roughly distinguished into the
    following three groups:
  • Thesauri: Semantic resources that group
    together similar words or terms according to a
    standard set of relations, including broader
    term, narrower term, sibling, etc. (like Roget)
  • Semantic Lexicons: Semantic resources that
    group together words (or more complex lexical
    items) according to lexical semantic relations
    like synonymy, hyponymy, meronymy, and antonymy
    (like WordNet)
  • Semantic Networks: Semantic resources that
    group together objects denoted by natural
    language expressions (terms) according to a set
    of relations that originate in the nature of the
    domain of application (like UMLS in the medical
    domain)

17
The MeSH Thesaurus
MeSH (Medical Subject Headings) is a thesaurus
for indexing articles and books in the medical
domain, which may then be used for searching
MeSH-indexed databases. MeSH provides for each
term a number of term variants that refer to the
same concept. It currently includes a vocabulary
of over 250,000 terms. The following is a sample
entry for the term gene library (MH is the term
itself, ENTRY are term variants):
MH = Gene Library
ENTRY = Bank, Gene
ENTRY = Banks, Gene
ENTRY = DNA Libraries
ENTRY = Gene Bank
etc.
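How term variants lead back to one heading can be sketched with a small index. The line-oriented "FIELD = value" layout and the function name are assumptions for this example; only the field names MH and ENTRY and the sample values come from the entry above.

```python
# Sample record, assuming a line-oriented "FIELD = value" layout.
RECORD = """\
MH = Gene Library
ENTRY = Bank, Gene
ENTRY = Banks, Gene
ENTRY = DNA Libraries
ENTRY = Gene Bank
"""


def build_variant_index(record_text):
    """Map every term variant (lower-cased) to its main heading (MH)."""
    heading = None
    index = {}
    for line in record_text.splitlines():
        if "=" not in line:
            continue
        field, value = (part.strip() for part in line.split("=", 1))
        if field == "MH":
            heading = value
            index[value.lower()] = heading
        elif field == "ENTRY" and heading is not None:
            index[value.lower()] = heading
    return index


index = build_variant_index(RECORD)
print(index["gene bank"])
```

With such an index, a document mentioning any variant can be indexed under the same concept, which is what makes variant lists useful for retrieval.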
18
The WordNet Semantic Lexicon
WordNet has primarily been designed as a
computational account of the human capacity of
linguistic categorization and covers an extensive
set of semantic classes (called synsets). Synsets
are collections of synonyms, grouping together
lexical items according to meaning similarity.
Strictly speaking, synsets are not made up of
lexical items, but rather of lexical meanings
(i.e. senses).
19
WordNet: An example
The word 'tree' has two meanings that roughly
correspond to the class of plants and that of
diagrams, each with its own hierarchy of
classes that are included in more general
super-classes:
09396070 tree 0
  09395329 woody_plant 0 ligneous_plant 0
    09378438 vascular_plant 0 tracheophyte 0
      00008864 plant 0 flora 0 plant_life 0
        00002086 life_form 0 organism 0 being 0 living_thing 0
          00001740 entity 0 something 0
10025462 tree 0 tree_diagram 0
  09987563 plane_figure 0 two-dimensional_figure 0
    09987377 figure 0
      00015185 shape 0 form 0
        00018604 attribute 0
          00013018 abstraction 0
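The two hierarchies can be represented as a simple child-to-parent map and walked upwards. This is only a sketch: the dictionary layout and the function name are assumptions, and the synset offsets are copied from the example rather than looked up in a WordNet installation.

```python
# Child synset offset -> more general super-class offset (from the example).
HYPERNYM = {
    "09396070": "09395329",  # tree -> woody_plant
    "09395329": "09378438",  # woody_plant -> vascular_plant
    "09378438": "00008864",  # vascular_plant -> plant
    "00008864": "00002086",  # plant -> life_form
    "00002086": "00001740",  # life_form -> entity
    "10025462": "09987563",  # tree (diagram) -> plane_figure
    "09987563": "09987377",  # plane_figure -> figure
    "09987377": "00015185",  # figure -> shape
    "00015185": "00018604",  # shape -> attribute
    "00018604": "00013018",  # attribute -> abstraction
}


def hypernym_chain(offset):
    """Follow super-class links from a synset up to the top of its hierarchy."""
    chain = [offset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


plant_sense = hypernym_chain("09396070")
diagram_sense = hypernym_chain("10025462")
print(plant_sense[-1], diagram_sense[-1])
```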
20
CYC: A Semantic Network
CYC is a semantic network of over 1,000,000
manually defined rules that cover a large part of
common-sense knowledge about the world. For
example, CYC knows that trees are usually
outdoors, or that people who have died stop buying
things. Each concept in this semantic network is
defined as a constant, which can represent a
collection (e.g. the set of all people), an
individual object (e.g. a particular person), a
word (e.g. the English word person), a quantifier
(e.g. there exist), or a relation (e.g. a
predicate, function, slot, attribute). The entry
for the predicate mother:
(mother ANIM FEM)
isa FamilyRelationSlot BinaryPredicate
This says that the predicate mother takes two
arguments, the first of which must be an element
of the collection Animal, and the second of
which must be an element of the collection
FemaleAnimal.
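The argument constraints of such a predicate can be sketched as a type check. This is not CYC's own machinery: the individuals Fido and Lassie, the dictionaries and the function name are hypothetical, and only the collections Animal and FemaleAnimal come from the entry above.

```python
# Hypothetical individuals and the collections they belong to.
MEMBER_OF = {
    "Fido": {"Animal"},
    "Lassie": {"Animal", "FemaleAnimal"},
}

# Argument constraints of the binary predicate, as described in the entry.
ARG_COLLECTIONS = {"mother": ("Animal", "FemaleAnimal")}


def well_typed(predicate, *args):
    """Check that each argument is an element of the required collection."""
    expected = ARG_COLLECTIONS[predicate]
    if len(args) != len(expected):
        return False
    return all(collection in MEMBER_OF.get(arg, set())
               for arg, collection in zip(args, expected))


print(well_typed("mother", "Fido", "Lassie"))   # arguments fit the constraints
print(well_typed("mother", "Lassie", "Fido"))   # second argument is not a FemaleAnimal
```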
21
Word Sense Disambiguation
Most words have more than one interpretation,
or sense. If natural language were completely
unambiguous, there would be a one-to-one
relationship between words and senses. In fact,
things are much more complicated, because for
most words not even a fixed number of senses can
be given. Therefore, only in certain
circumstances, and depending on what exactly we
mean by sense, can we give restricted
solutions to the problem of Word Sense
Disambiguation (WSD).
22
A simplified Example of a Domain Ontology
Instances
 

23
Example of RDF Schema for the Movie Ontology
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
  xmlns:NS0='http://webode.dia.fi.upm.es/RDFS/MovieOntology#'>
<rdf:Description rdf:about='http://webode.dia.fi.upm.es/RDFS/MovieOntology#SpecialEffectsCompanyActing'>
  <rdf:type rdf:resource='http://www.w3.org/2000/01/rdf-schema#Class'/>
  <rdfs:comment>Details of company that created special effects in this movie</rdfs:comment>
  <rdfs:subClassOf rdf:resource='http://webode.dia.fi.upm.es/RDFS/MovieOntology#CompanyActing'/>
</rdf:Description>
<rdf:Description rdf:about='http://webode.dia.fi.upm.es/RDFS/MovieOntology#Police'>
  <rdf:type rdf:resource='http://www.w3.org/2000/01/rdf-schema#Class'/>
  <rdfs:comment>Films that deal solely with police activity</rdfs:comment>
  <rdfs:subClassOf rdf:resource='http://webode.dia.fi.upm.es/RDFS/MovieOntology#Crime'/>
</rdf:Description>
 

etc
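To show what such a schema encodes, the following sketch reads the class hierarchy back out with Python's standard XML parser. It assumes the namespace URIs end in '#' (the separators are not visible in the transcript) and closes the rdf:RDF element so the sample is well-formed; the function name is made up for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
MOVIE = "http://webode.dia.fi.upm.es/RDFS/MovieOntology#"  # '#' separator assumed

DOC = f"""<rdf:RDF xmlns:rdf='{RDF}' xmlns:rdfs='{RDFS}'>
  <rdf:Description rdf:about='{MOVIE}SpecialEffectsCompanyActing'>
    <rdf:type rdf:resource='{RDFS}Class'/>
    <rdfs:subClassOf rdf:resource='{MOVIE}CompanyActing'/>
  </rdf:Description>
  <rdf:Description rdf:about='{MOVIE}Police'>
    <rdf:type rdf:resource='{RDFS}Class'/>
    <rdfs:comment>Films that deal solely with police activity</rdfs:comment>
    <rdfs:subClassOf rdf:resource='{MOVIE}Crime'/>
  </rdf:Description>
</rdf:RDF>"""


def subclass_pairs(xml_text):
    """Return (class, super-class) local names found in the RDF/XML text."""
    root = ET.fromstring(xml_text)
    pairs = []
    for description in root.findall(f"{{{RDF}}}Description"):
        child = description.get(f"{{{RDF}}}about")
        for link in description.findall(f"{{{RDFS}}}subClassOf"):
            parent = link.get(f"{{{RDF}}}resource")
            pairs.append((child.split("#")[-1], parent.split("#")[-1]))
    return pairs


print(subclass_pairs(DOC))
```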
24
Integration of Ontology and Semantic Lexicon
  • An example of a Semantic Lexicon is WordNet
    (sometimes also referred to as a Linguistic
    Ontology)
  • Ontologies are domain-specific models, usually
    lacking linguistic information (PoS, Morphology,
    Syntax etc.)
  • To be Integrated in One Resource or Kept/Accessed
    Separately?
  • Standardization
  • Format: Web-based Standards for Lexical Semantic
    Representation will Increase their Uptake
    (Easy Plug-and-Play, Remote Access, etc.)
  • Content: Widely Used (Lexical) Semantic Resources
    will lead to (Further) Semantic Standardization

25
Defining a Linguistic Ontology for the Art World
(Tentative)
  • <rdf:RDF
  • xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  • xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  • xmlns:xsd="http://www.daml.org/2000/10/XMLSchema#"
  • xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
  • xmlns:art="http://www.art-world.org/art-world"
  • >
  • <daml:Ontology rdf:about="Concepts in the Art World">
  • <daml:imports rdf:resource="http://www.daml.org/2001/03/daml+oil"/>
  • </daml:Ontology>

26
Defining Art World Concepts (Classes,
Synsets) (Tentative)
<daml:Class rdf:ID="art-world.01">
  <rdfs:label>art-world.01</rdfs:label>
  <rdfs:subClassOf rdf:resource="http://www.art-world.org/art-world.00"/>
</daml:Class>
<art-world.01 rdf:ID="work"/>
<art-world.01 rdf:ID="painting"/>
<daml:Class rdf:ID="art-world.02"/>
<art-world.02 rdf:ID="beautiful"/>
<art-world.02 rdf:ID="colourful"/>
<daml:Class rdf:ID="art-world.03"/>
<art-world.03 rdf:ID="paper"/>
<art-world.03 rdf:ID="canvas"/>
27
Defining Properties (Selection
Restrictions) (Tentative)
<daml:ObjectProperty rdf:ID="manner">
  <rdfs:range rdf:resource="art-world.02"/>
  <rdfs:domain rdf:resource="art-world.01"/>
</daml:ObjectProperty>
<daml:ObjectProperty rdf:ID="medium">
  <rdfs:range rdf:resource="art-world.03"/>
  <rdfs:domain rdf:resource="art-world.01"/>
</daml:ObjectProperty>
</rdf:RDF>
28
(Semantic) Lexicons will be
  • an Important Part of the Semantic Web
  • Represented Using Markup Languages (RDF)
  • Accessible in a Remote, Distributed Fashion
  • Central to Further Semantic Standardization

29
Multilingual terminological lexicon, attached to
a domain ontology (MUMIS)
  • <lex-element id="ID" concept="Shot-on-goal">
  • <... lang="DE" type="main">Torschuss</term>
  • <... lang="EN" type="main">shot on goal</term>
  • <... lang="NL" type="main">schot op doel</term>
  • <definition>ein Angriffsspieler kickt den
    Ball zu den gegnerischen Tor</definition>
  • <... lang="DE" type="synonym">Distanzschuss</term>
  • <... lang="DE" type="synonym">Nachschuss</term>
  • <... lang="DE" type="synonym">Schuss</term>
  • <... lang="DE" type="synonym">abzieh</term>
  • </lex-element>
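A sketch of how such a lexicon entry can be used to map terms in different languages onto the same ontology concept. The opening tag name term is inferred from the closing tags, and the wrapper element lexicon and the function name are assumptions added to make the sample parseable.

```python
import xml.etree.ElementTree as ET

# Sample entry; <lexicon> wrapper and <term> opening tags are assumptions.
LEXICON = """<lexicon>
<lex-element id="ID" concept="Shot-on-goal">
  <term lang="DE" type="main">Torschuss</term>
  <term lang="EN" type="main">shot on goal</term>
  <term lang="NL" type="main">schot op doel</term>
  <term lang="DE" type="synonym">Distanzschuss</term>
  <term lang="DE" type="synonym">Nachschuss</term>
</lex-element>
</lexicon>"""


def build_term_index(xml_text):
    """Map (language, lower-cased term) to the concept of its lex-element."""
    index = {}
    for element in ET.fromstring(xml_text).iter("lex-element"):
        for term in element.iter("term"):
            index[(term.get("lang"), term.text.lower())] = element.get("concept")
    return index


term_index = build_term_index(LEXICON)
print(term_index[("DE", "nachschuss")], term_index[("EN", "shot on goal")])
```

Because main terms and synonyms in all three languages point to the same concept, annotations produced from German, English and Dutch texts become comparable at the ontology level.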

30
Extension and Formalization of the multilingual
terminological lexicon, including
syncategorematic information. Supporting WSD.
  • <lex-element id="ID" concept="Shot-on-goal">
  • <... lang="DE" type="main" pos N mod
    von concept Player concept player
    gender gen pos posspron>Torschuss</term>
  • <... lang="DE" type="synonym" pos V comp
    SUBJ concept Player>abzieh</term>
  • <definition>URL DFB home page/glossary</definition>
  • </lex-element>

31
Integrating Syntactic and Domain Knowledge
Including Syntactic Analysis for a more accurate
tagging of domain specific semantic annotation
 

32
Abstraction over Syntactic Annotation
Ontology_3: Dependencies: Head, Comp, Mod, Spec
 
Ontology_4: Grammatical Functions: Subject,
Object, Ind. Object, NP Adjunct, PP Adjunct, etc.

33
Merging of Syntactic and Domain Knowledge
Example of a possible rule for conceptual
annotation: If (Head of Subj_NP of a Verb of
type soccer:shot-on-goal is a person) =>
annotate head of NP with semantic class
soccer:player. Example of a rule for
Instance Filling: If (term annotated with
concept soccer:player) => try to find
information about relations Team, Age etc.
(Template Filling in Information Extraction).
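The two rules can be sketched as plain functions over an already analysed clause. The clause dictionary, the entity-type table and the function names are hypothetical stand-ins for the output of the linguistic analysis; only the concept names and the relations Team and Age come from the rules above.

```python
def annotate_player(clause, entity_types):
    """Rule 1: if the subject head of a shot-on-goal verb is a person,
    annotate that head with the semantic class soccer:player."""
    head = clause["subject_head"]
    if (clause["verb_type"] == "soccer:shot-on-goal"
            and entity_types.get(head) == "person"):
        return {head: "soccer:player"}
    return {}


def open_templates(annotations):
    """Rule 2: for each term annotated as soccer:player, open a template
    whose relations (Team, Age) still have to be filled from the text."""
    return [{"player": term, "Team": None, "Age": None}
            for term, concept in annotations.items()
            if concept == "soccer:player"]


# Hypothetical analysis result for one clause.
clause = {"verb_type": "soccer:shot-on-goal", "subject_head": "Ziege"}
entity_types = {"Ziege": "person"}

annotations = annotate_player(clause, entity_types)
templates = open_templates(annotations)
print(annotations, templates)
```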
 

34
NLP-based knowledge markup
35
MuchMore DTD for Annotation
[Figure: MuchMore DTD diagram. Elements shown: document, sentence, text, token, chunks, chunk, gramrels, gramrel, ewnterms, ewnterm, sense, umlsterms, umlsterm, xrceterms, xrceterm, msh, semrels, semrel. Attributes shown: id, pos, lemma, from, to, type, offset, pref, tui, cui, code, term1, term2.]
36
MuchMore: Linguistic Annotation (Lemmatization,
POS, Basic Chunking)
Balint syndrom is a combination of symptoms
including simultanagnosia, a disorder of spatial
and object-based attention, disturbed spatial
perception and representation, and optic ataxia
resulting from bilateral parieto-occipital
lesions.
<text>
<token id="w1" pos="NN">Balint</token>
<token id="w2" pos="NN">syndrom</token>
<token id="w3" pos="VBZ" lemma="be">is</token>
<token id="w4" pos="DT" lemma="a">a</token>
<token id="w5" pos="NN" lemma="combination">combination</token>
<token id="w6" pos="IN" lemma="of">of</token>
<token id="w7" pos="NNS" lemma="symptom">symptoms</token>
...
<token id="w20" pos="JJ" lemma="spatial">spatial</token>
<token id="w21" pos="NN" lemma="perception">perception</token>
<token id="w22" pos="CC" lemma="and">and</token>
<token id="w23" pos="NN" lemma="representation">representation</token>
...
</text>
<chunks>
<chunk id="c1" from="w1" to="w2" type="NP"/>
<chunk id="c7" from="w20" to="w23" type="NP"/>
</chunks>
37
MuchMore: Semantic Annotation (UMLS, EuroWordNet)
Balint syndrom is a combination of symptoms
including simultanagnosia, a disorder of spatial
and object-based attention, disturbed spatial
perception and representation, and optic ataxia
resulting from bilateral parieto-occipital
lesions.
<umlsterm id="t7" from="w20" to="w21">
  <concept id="t7.1" cui="C0037744" preferred="Space Perception" tui="T041">
    <msh code="F2.463.593.778"/>
    <msh code="F2.463.593.932.869"/>
  </concept>
</umlsterm>
<umlsterm id="t8" from="w26" to="w26">
  <concept id="t8.1" cui="C0029144" preferred="Optics" tui="T090">
    <msh code="H1.671.606"/>
  </concept>
</umlsterm>
<semrel id="r7" term1="t7.1" term2="t8.1" reltype="issue_in"/>
<ewnterm id="e2" from="w21" to="w21">
  <sense offset="0487490"/>
  <sense offset="3955418"/>
  <sense offset="4002483"/>
</ewnterm>
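The semantic layer points back into the token layer through the from and to attributes. The following sketch resolves a UMLS term to its surface string. The nesting under sentence, text and umlsterms is an assumption based on the element names of the MuchMore DTD, and the sample keeps only two tokens and one term.

```python
import xml.etree.ElementTree as ET

# Reduced sample; the wrapper elements are assumed, values come from the example.
SAMPLE = """<sentence>
<text>
<token id="w20" pos="JJ" lemma="spatial">spatial</token>
<token id="w21" pos="NN" lemma="perception">perception</token>
</text>
<umlsterms>
<umlsterm id="t7" from="w20" to="w21">
<concept id="t7.1" cui="C0037744" preferred="Space Perception" tui="T041"/>
</umlsterm>
</umlsterms>
</sentence>"""


def resolve_terms(xml_text):
    """Return (surface string, CUI, preferred name) for every annotated term."""
    root = ET.fromstring(xml_text)
    tokens = root.findall("./text/token")
    position = {token.get("id"): i for i, token in enumerate(tokens)}
    resolved = []
    for term in root.iter("umlsterm"):
        start, end = position[term.get("from")], position[term.get("to")]
        surface = " ".join(token.text for token in tokens[start:end + 1])
        for concept in term.iter("concept"):
            resolved.append((surface, concept.get("cui"), concept.get("preferred")))
    return resolved


print(resolve_terms(SAMPLE))
```

Keeping each annotation layer separate and linking it by token identifiers means that new layers can be added without touching the token layer.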
38
MUMIS DTD for Linguistic Annotation
[Figure: DTD diagram. Elements shown: Document, Paragraph, Sentence, Subord-Clause, NP, PP, AP, AdvP, VG, NE.]
39
MUMIS DTD for Linguistic Annotation
[Figure: DTD diagram for the AP element. Labels shown: TYPE, STRUK, AP_AGR, STRING, W, AP_HEAD.]
40
MUMIS DTD for Linguistic Annotation
[Figure: DTD diagram for the VG element. Labels shown: TYPE, VG_TYPE, VG_SUBCAT_STEM, VG_AGR, STRING, SENT_STRING, KLAMMER, STRUK, VG_STRG, W, VG_HEAD, ...]
41
MUMIS DTD for Linguistic Annotation
[Figure: DTD diagram for the CLAUSE element. Labels shown: W (STEM, INFL, POS, TC), TYPE, STRING, SENT_STRING, CLAUSE_TYPE, CLAUSE_SUBJ, CLAUSE_PP_ADJUNKT, CLAUSE_PRED_SUBCAT, CLAUSE_VG_LIST, CLAUSE_PP_LIST, CLAUSE_NP_LIST, CLAUSE_PRED_STRG, CLAUSE_PRED_AGR, ...]
42
MUMIS: Linguistic Annotation (Lemmatization,
Dependency Structure)
Industrie, Handel und Dienstleistungen werden in
der ersten Liste aufgeführt, wobei die in
Klammern gesetzten Zahlen auf die Mutterfirmen
hinweisen. (Industry, trade and services are
mentioned in the first list, in which numbers
within brackets point to parent companies.)
<chunks>
<chunk id="c1" from="w1" to="w5" type="NP" head="w1,w3,w5"/>
<chunk id="c2" from="w6" to="w6" type="VG"/>
<chunk id="c3" from="w7" to="w10" type="PP" head="w7" complement="w8,w9,w10"/>
<chunk id="c4" from="w11" to="w11" type="VG"/>
...
</chunks>
<clauses>
<clause id="cl1" from="c1" to="c4" pred_struct="c2 c4" GF_Subj="c1"/>
<clause id="cl2" from="c6" to="c9" pred_struct="c9" GF_Subj="c6"/>
</clauses>
43
MUMIS: Semantic Annotation (Events)
7. Ein Freistoss von Christian Ziege aus 25
Metern geht über das Tor. (7. A free kick by
Christian Ziege from 25 metres goes over the
goal.)
<chunks>
<chunk id="c1" from="w1" to="w5" type="NP" head="w2" pp_modifier="w3 w4 w5"/>
<chunk id="c2" from="w6" to="w8" type="PP" head="w6" complement="w7 w8"/>
<chunk id="c3" from="w9" to="w9" type="VG"/>
<chunk id="c4" from="w10" to="w12" type="PP" head="w10" complement="w11 w12"/>
</chunks>
<clauses>
<clause id="cls1" from="c1" to="c4" pred_struct="c3" GF_Subj="c1"/>
</clauses>
<events>
<event id="e1" clause="cls1" event-name="free-kick">
  <arguments>
    <argument id="arg1" name="player" value="w4, w5"/>
    <argument id="arg2" name="location" value="25-meter"/>
    <argument id="arg3" name="time" value="07:00"/>
  </arguments>
</event>
<event id="e2" clause="cls1" event-name="goal-scene-fail">
  <arguments>
    <argument id="arg1" name="player" value="w4, w5"/>
    <argument id="arg2" name="location" value="25-meter"/>
    <argument id="arg3" name="time" value="07:00"/>
  </arguments>
</event>
</events>
 

44
Conceptual Annotations for Multimedia Indexing
and Retrieval: A multilingual cross-document and
incremental IE approach (MUMIS)
  • Technology development to automatically index
    (with formal annotations) lengthy multimedia
    recordings (off-line process): Find and annotate
    relevant entities, relations and events
  • Technology development to exploit indexed
    multimedia archives (on-line process): Search for
    interesting scenes and play them via Internet
  • Test Domain: Soccer Games / UEFA Tournament 2000
 

45
Off-line Task
Indexing by...
  • Automatic Speech Recognition (Radio/TV
    Broadcasts)
  • Automatically transforms the speech signals
    into texts (for 3 languages: Dutch, English and
    German)
  • Natural Language Processing (Information
    Extraction)
  • Analyse all available textual documents
    (newspapers, speech transcripts, tickers, formal
    texts ...), identify and extract interesting
    entities, relations and events
  • Merging all the annotations produced so far
  • Create a database with formal annotations
  • Use video processing to adjust time marks

46
Information Extraction
  • Information Extraction (IE) is the task of
    identifying, collecting and normalizing relevant
    information for a specific application or user.
  • The relevant information is typically
    represented in the form of predefined templates,
    which are filled by means of Natural Language
    (NL) analysis.
  • IE combines pattern matching mechanisms,
    (shallow) NLP and domain knowledge (terminology
    and ontology).

47
Information Extraction (2)
  • IE is generally subdivided into the following
    tasks:
  • Named Entity task (NE)
  • Template Element task (TE)
  • Template Relation task (TR)
  • Scenario Template task (ST)
  • Co-reference task (CO)

48
Subtasks of IE
  • Named Entity task (NE): Mark in the text each
    string that represents a person, organization,
    or location name, a date or time, or a
    currency or percentage figure.
  • Template Element task (TE): Extract basic
    information related to organization, person, and
    artifact entities, drawing evidence from
    everywhere in the text.

49
Subtasks of IE (2)
  • Template Relation task (TR): Extract relational
    information on employee_of, manufacture_of,
    location_of relations etc. (TR expresses
    domain-independent relationships).
  • Scenario Template task (ST): Extract
    pre-specified event information and relate the
    event information to particular organization,
    person, or artifact entities (ST identifies
    domain- and task-specific entities and relations).
  • Co-reference task (CO): Capture information on
    co-referring expressions, i.e. all mentions of a
    given entity, including those marked in NE and
    TE.

50
IE applied to soccer
  • Terms as descriptors for the NE task
  • Team: Titelverteidiger Brasilien, den
    respektlosen Außenseiter Schottland
  • Player: Superstar Ronaldo, von Bewacher Calderwood
    noch von Abwehrchef Hendry, von Jackson als
    drittem Stürmer, Torschütze Cesar, von Roberto
    Carlos (16.),
  • Referee: vom spanischen Schiedsrichter Garcia
    Aranda
  • Trainer: Schottlands Trainer Brown, Kapitän
    Hendry seinen Keeper Leighton
  • Location: im Stade de France von St. Denis (a more
    fine-grained location detection would be
    Stadion: im Stade de France and City: von St.
    Denis)
  • Attendance: Vor 80000 Zuschauern

51
IE applied to soccer (2)
  • Terms for NE Task
  • Time: in der 73. Minute, nach gerade einmal 350
    Minuten, von Roberto Carlos (16.), nach einer
    knappen halben Stunde, scheiterte Rivaldo
    (49./52.) jeweils nur knapp, das vor der Pause
    Versäumte versuchten die Brasilianer nach
    Wiederbeginn, ...
  • Date: am Mittwoch, der Turnierstart (?), im
    WM-Eröffnungsspiel (?)
  • Score/Result: Brasilien besiegt Schottland 2:1,
    einen 2:1 (1:1)-Sieg, der zwischenzeitliche
    Ausgleich, in der 4. Minute in Führung gebracht,
    köpfte zum 1:0 ein

52
IE applied to soccer (3)
  • Relations for TR Task
  • Opponents: Brasilien besiegt Schottland, feierte
    der Top-Favorit ... einen glücklichen 2:1
    (1:1)-Sieg über den respektlosen Außenseiter
    Schottland,
  • Player_of: hatte Cesar Sampaio den vierfachen
    Weltmeister ... in Führung gebracht, Collins
    gelang ... der zwischenzeitliche Ausgleich für
    die Schotten, der Keeper des FC Aberdeen,
    Brasiliens Keeper Taffarel
  • Trainer_of: Schottlands Trainer Brown
  • ...

53
IE applied to soccer (4)
  • Events for ST task
  • Goal: in der 4. Minute in Führung gebracht, das
    schnellste Tor ... markiert, Cesar Sampaio köpfte
    zum 1:0 ein, Collins (38.) verwandelte den
    Strafstoß, hätte Kapitän Hendry seinen Keeper
    Leighton um ein Haar zum zweiten Mal bezwungen,
    von dem der Ball ins Tor prallte
  • Foul: als er den durchlaufenden Gallacher im
    Strafraum allzu energisch am Trikot zog
  • Substitution: und mußte in der 59. Minute für
    Crespo Platz machen...

54
NL Processing and Knowledge Markup of (German)
soccer texts with the SCHUG system
  • A multilingual ontological lexicon
  • Formal Text1
  • Formal Text2
  • XML Soccer Annotation for Text1
  • XML Soccer Annotation for Text2
  • Merging of Annotations for Formal Texts
  • Semi-Formal Text
  • Semi-Formal Text annotated with Soccer
    Information (XML)

 
 

55
Multilingual Extension: Spanish (Esperonto)
  • Ontology
  • <lex-element id="ID" concept="Second-half">
  • <... lang="DE" type="main">zweite
    Halbzeit</term>
  • <... lang="EN" type="main">second
    half</term>
  • <... lang="ES" type="main">reanudacion</term>
  • </lex-element>
  • ...
  • Processing with the SCHUG system
  • Example

 
 

56
Conceptual Annotations for Multimedia Indexing
and Retrieval: MUMIS
 

57
The first user interface of MUMIS
 
 

58
Esperonto: Partners
  • Intelligent Software Components (Coord):
    Semantic Web, Annotation Services.
  • UPM: ontology development and evaluation.
  • University of Innsbruck: Semantic Web languages.
  • Saarland University: multilingual Annotation
    services, using Information Extraction.
  • UNILIV: Semantic indexation of Semantic Web
    content. Routing solutions. Visualization and
    navigation to make content presentation more
    user-friendly.
  • Residencia de Estudiantes: Content provider.
    Cultural tour test case. Evaluation.
  • CIDEM: Content provider. Fund finder test case.
    Evaluation.
  • BioVista: Content provider. Scientific Discovery
    test case.
 
 

59
Aim
Application Service Provision of Semantic
Annotation, Aggregation, Indexing and Routing of
Textual, Multimedia, and Multilingual Web Content.
The project aims at bridging the gap between the
current World Wide Web and the Semantic Web by
providing a service to "upgrade" existing content
to Semantic Web content. Ontologies play a key
role in this effort, together with multilingual
Natural Language Analysis of textual documents
currently on the web as free or HTML-encoded
texts.
 
 

60
Main Goals
  • To bridge the gap between the current web and the
    Semantic Web: SemASP
  • Ontology-based annotation
  • Sources
  • Static pages
  • Pages dynamically generated from DB
  • Textual and multimedia information
  • Web services
  • Added-value knowledge-based services on top of
    the constructed semantic web
  • Routing based on P2P communication
  • Semantic aggregation
  • Meaning negotiation
  • Support multilinguality in ontology construction,
    ...

 
 

61
Applications

[Figure: Esperonto architecture diagram. Labels shown: Agent; Visualization Service Provider; Multilingual NL Generation; Semantic Web (semantic indices, concept instances); Tagger/Wrapper (four instances); Certificate; Multilinguality; Ontology Repository Service; Workbench; Reengineering; SemASP; Maintenance; Mapping; Multilingual NL Understanding; World Wide Web (Static Information Provider, Web Server Provider, Dynamic Information Provider, Multimedia Data Provider).]
62
Ontology-based Annotation
  • Accurately annotate documents with concepts and
    terms described in various semantic resources:
    EuroWordNet, UMLS, Soccer ontology etc.
  • Annotate documents with relations defined in
    the ontology

 

63
Ontology construction from Text
Various methodologies are under investigation
for extracting/learning knowledge from text and
encoding it in an ontology (see Ontology
Learning Overview - OntoWeb D1.5,
http://www.ontoweb.org). Many are based on
Machine Learning techniques. We discuss here the
possibility of a rule-based approach for partial
and shallow ontology construction from text,
based on various levels of syntactic patterns
annotated in the documents.
 

64
Ontology construction from Text: A starting
experiment (Medicine)
Document Set: 65 sample phrases that link
symptoms with Rheumatoid Arthritis (RA).
 

65
Ontology construction from Text: Apposition and
Parenthesis (1)
The effects of rheumatoid arthritis on bone
include structural joint damage (erosions) and
osteoporosis.
Linguistic Structure: The effects of rheumatoid
arthritis on bone include structural joint damage
( erosions ) and osteoporosis
=> The apposition (2 syntactic heads, joint and
erosions, in one NP), including a parenthesis
construction, suggests a synonymy relation or a
definition.
Heuristic: Establishing semantic relations on top
of linguistic head-modifier constructions
 

66
Ontology construction from Text: Apposition with
Parenthesis (2)
  • For symptoms of rheumatoid arthritis (pain,
    joint stiffness), the reference treatment is a
    nonsteroidal antiinflammatory drug (NSAID) such
    as diclofenac or ibuprofen.
  • Linguistic Structure:
  • For symptoms of rheumatoid arthritis ( pain ,
    joint stiffness ) , the reference treatment
    is a nonsteroidal antiinflammatory drug (
    NSAID )
  • This suggests a semantic relation between pain
    and joint stiffness
  • Classify pain and joint stiffness as symptoms
    of RA. The word symptoms is linguistically
    annotated as the head of the Compl-NP of the PP
    starting with For.

 

67
Ontology construction from Text: Apposition with
Parenthesis (3)

But the hypothesis needs to be constrained. In
"In patients with rheumatoid arthritis (RA)",
RA is an abbreviation of rheumatoid arthritis.
And in the sentence "Fourteen consecutive elbows
have been treated for rheumatoid arthritis (9
elbows) and for post-traumatic osteoarthrosis (5
elbows) by total elbow replacement with the GSB
III implant.", the parentheses (9 elbows) and
(5 elbows) have no semantic relation to the
preceding head nouns!
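The parenthesis heuristic and its two constraints can be sketched with a regular expression. This is a rough approximation under stated assumptions: the three-word left context stands in for the preceding NP chunk that a real linguistic annotation would supply, and the relation labels and function name are made up for the example.

```python
import re

# Up to three words followed by a parenthesis; the fixed window is a crude
# stand-in for the NP chunk preceding the parenthesis.
PAREN = re.compile(r"\b((?:[A-Za-z-]+ ){0,2}[A-Za-z-]+) \(([^)]+)\)")


def parenthetical_relations(sentence):
    """Propose relations between a parenthesis and the phrase before it."""
    relations = []
    for match in PAREN.finditer(sentence):
        left, inside = match.group(1), match.group(2)
        if re.match(r"\d", inside):
            continue  # constraint: quantities such as "(9 elbows)" carry no relation
        initials = "".join(word[0] for word in left.split()).upper()
        if inside.isupper() and initials.endswith(inside):
            # constraint: the parenthesis spells the initials of the preceding words
            expansion = " ".join(left.split()[-len(inside):])
            relations.append((inside, "abbreviation_of", expansion))
        else:
            for item in inside.split(","):
                relations.append((item.strip(), "related_to", left))
    return relations


print(parenthetical_relations("In patients with rheumatoid arthritis (RA)"))
```

The generic label related_to is deliberate: whether the relation is synonymy, a definition or class membership still has to be decided from the wider linguistic context.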
 

68
Ontology construction from Text: Apposition with
commas

Etoricoxib, a selective COX2 inhibitor, has been
shown to be as effective as non-selective
non-steroidal anti-inflammatory drugs in the
management of chronic pain in rheumatoid
arthritis and osteoarthritis.
Linguistic Structure: Etoricoxib, a selective
COX2 inhibitor, has been shown ...
The same hypothesis as in the former examples: a
semantic relation between Etoricoxib and selective
COX2 inhibitor. Probably an isa relation.

69
Ontology construction from Text: Compound Analysis
Joint destruction, joint damage, joint
disease, joint stiffness, but joint
cartilage. Knee joints vs. tender joints.
What can happen to joints, and where are joints
located? Can synsets be used to detect relations?
Joint cartilage is not a disease.
 

70
Ontology construction from Text: PP
post-modification
inflammation of joints, synovial lining of
joints. Here, can synsets be used to group the
things that can happen to joints?
 

71
Ontology construction from Text: Phrase Internal
Coordination
  • The effects of rheumatoid arthritis on bone
    include structural joint damage (erosions) and
    osteoporosis
  • Linguistic Structure:
  • The effects of rheumatoid arthritis on bone
    include structural joint damage ( erosions )
    and osteoporosis
  • RA causes structural joint damage AND
    osteoporosis (interpreting the head noun
    effects as a causation).
  • Hypothesis: The two heads of an NP coordination
    are somehow related.

 

72
Ontology construction from Text: Phrase Internal
Coordination (2)
  • A study was conducted to determine the incidence
    of ulnar and peripheral neuropathy
  • Linguistic Structure:
  • The incidence of ulnar and peripheral
    neuropathy
  • The AP ulnar and peripheral modifies the
    head noun neuropathy. The AP is a coordinated
    one, having two adjectival heads.
  • Hypothesis: They correspond to two types of
    neuropathy
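The coordination hypothesis can be sketched as an expansion of a coordinated adjective phrase into one candidate term per adjective. The function assumes its input is an NP that the linguistic annotation has already identified as "ADJ and ADJ NOUN"; the relation label type_of and the function name are made up for the example.

```python
def expand_coordinated_modifiers(noun_phrase):
    """Turn 'A and B N' into [('A N', 'type_of', 'N'), ('B N', 'type_of', 'N')].

    Assumes the NP ends in two coordinated adjectival heads and a head noun.
    """
    words = noun_phrase.split()
    if len(words) >= 4 and words[-3] == "and":
        head = words[-1]
        return [(f"{words[-4]} {head}", "type_of", head),
                (f"{words[-2]} {head}", "type_of", head)]
    return []  # the pattern does not apply


print(expand_coordinated_modifiers("ulnar and peripheral neuropathy"))
```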

 

73
Ontology construction from Text: Subject Verb
Objects (Ind. Obj. etc.)
Rheumatoid arthritis is an immunologically
mediated inflammation of joints of unknown
aetiology and often leads to disability.
=> RA leads to Disability (an effect of ellipsis
resolution: RA is detected as the subject of the
verb leads, even though it is not realised in the
text. Reference resolution is very important for
knowledge extraction.)
=> Lexical semantic info: collect all objects of
"RA leads to".
=> Suggests causality (verb lead to).
 

74
Ontology construction from Text: Subject Verb
Objects (Ind. Obj. etc.)
These changes constitute hallmarks of synovial
cell activation and contribute to both chronic
inflammation and hyperplasia. (Online exercise!)
 

75
Future Work
We still have to accurately identify the subset
of linguistic tags describing syntactic/semantic
patterns that are relevant for ontology
extraction (or even ontology mark-up).
 

76
First Conclusions
Construction of partial and shallow ontologies
from (complex) syntactic patterns seems feasible.
It might seem expensive in the sense that
documents first have to be (automatically)
linguistically annotated. But Machine Learning
methods also need a lot of semi-automatically
annotated data for training. There is a need to
conduct a comparative evaluation that takes into
account as many parameters as possible.
 

77
Practical Sessions (Adrian Raschip)
  • Exercise 1: Semi-automatic terminological
    extension to Romanian and other languages, on
    the basis of the TMX-encoded MUMIS multilingual
    terminology
  • Exercise 2: (Manual) linguistic annotation of
    English and Romanian text on soccer
  • Exercise 3: Define a soccer ontology in Protégé
  • Exercise 4: Search for possible mapping rules
    between linguistic annotations and relations that
    might be relevant to extract