Title: Nicoletta Calzolari ILC - CNR - Pisa, Italy
1 Nicoletta Calzolari, ILC - CNR - Pisa, Italy
Language Resources and Semantic Web
2 To make the Semantic Web a reality ...
- need to tackle the twofold challenge of
  - content availability and
  - multilinguality
- Natural convergence with HLT (Human Language Technology)
  - multilingual semantic processing
  - ontologies
  - semantic-syntactic computational lexicons
3 Computational Multilingual Lexicons: an essential component for the Semantic Web
- Language, and lexicons in particular, are the gateway to knowledge
- Semantic Web developers need repositories of words and terms, together with knowledge of their relations in language use and their ontological classification.
- The cost of adding this structured and machine-understandable lexical information can be one of the factors that delays its full deployment.
- The effort of making available millions of words for dozens of languages is something that no small group is able to afford.
- A radical shift in the lexical paradigm, whereby many participants add linguistic content descriptions in an open distributed lexical framework, is required to make the Web usable.
4 Infrastructure of Language Resources ...
... static
- Semantic network: Euro-/ItalWordNet
- Lexicons: PAROLE/SIMPLE/CLIPS
- TreeBank
- Software
International Standards
But they will never be complete
... dynamic
- Lexical acquisition systems (syntactic and semantic) from text corpora
- Robust systems of morphosyntactic and syntactic analysis
- Word-sense disambiguation systems
5 Italian Semantic Network: Italian module of EuroWordNet (http://www.hum.uva.nl/ewn/)
- 50,000 lemmas organized in synonym groups (synsets), structured in hierarchies linked by 130,000 semantic relations
  - 50,000 hyperonymy/hyponymy relations
  - 16,000 relations among different POS (role, cause, derivation, etc.)
  - 2,000 part-whole relations
  - 1,500 antonymy relations, etc.
- Synsets linked to the InterLingual Index (ILI = Princeton WordNet)
  - Through the ILI, linked to all the European WordNets (de-facto standard)
  - and to the common Top Ontology
- Possibility of plug-in with domain terminological lexicons
- Usable in IR, CLIR, IE, QA, ...
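The linking of synsets through the ILI can be pictured with a minimal sketch. It assumes a simplified model in which each synset stores a single ILI record identifier; the class name `Synset`, the helper functions, the identifiers such as `ili-0001`, and the toy entries are all hypothetical and are not taken from EuroWordNet itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A group of synonymous lemmas in one language's wordnet."""
    language: str   # language code, e.g. "it" or "en"
    lemmas: tuple   # lemmas treated as synonyms in this toy model
    ili_id: str     # hypothetical InterLingual Index record identifier


def build_ili_index(synsets):
    """Group synsets from all wordnets by the ILI record they point to."""
    index = defaultdict(list)
    for synset in synsets:
        index[synset.ili_id].append(synset)
    return index


def translations(lemma, source_lang, target_lang, synsets):
    """Return target-language lemmas that share an ILI record with `lemma`."""
    index = build_ili_index(synsets)
    found = set()
    for synset in synsets:
        if synset.language == source_lang and lemma in synset.lemmas:
            for other in index[synset.ili_id]:
                if other.language == target_lang:
                    found.update(other.lemmas)
    return sorted(found)


# Toy data: the ILI identifiers and groupings are invented for illustration.
TOY_SYNSETS = [
    Synset("it", ("cuocere", "cucinare"), "ili-0001"),
    Synset("en", ("cook",), "ili-0001"),
    Synset("it", ("mangiare",), "ili-0002"),
    Synset("en", ("eat",), "ili-0002"),
    Synset("es", ("comer",), "ili-0002"),
]

print(translations("mangiare", "it", "en", TOY_SYNSETS))  # ['eat']
```

Because every wordnet points at the same index rather than at each other, adding a new language only requires links from its synsets to ILI records, which is what makes cross-lingual applications such as CLIR possible without pairwise mappings.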
6 Domain - Semantic class
[Figure: the verb "mangiare"]
7 Domain - Semantic class
[Figure: Italian lexical items linked to semantic classes.
Verbs: mangiare, cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare.
Nouns: zucchero, alloro, tartufo, mestolo, pentola, friggitrice, carne, tavola, forchetta, ristorante, mela, posata, cuoco, carota, coniglio, bollitore, pesce, arrosto, pesciera.
Semantic classes: NATURAL_SUBSTANCE, FLAVOURING, VEGETAL_ENTITY, BUILDING, FURNITURE, FOOD, FRUIT, VEGETABLES, SUBSTANCE_FOOD, INSTRUMENT, CONTAINER, PROFESSION, ARTIFACT_FOOD.]
8 machine language learning
9 machine language learning
- linguistic learning
- development of conceptual networks
- linguistic change models
- language usage models
- adaptive classification systems
- information extraction
- bootstrapping of lexical information
- bootstrapping of grammars
10 Beyond MILE: towards open distributed lexicons
Ontology URI: http://www.zzz
Semantic Lexicon URI: http://www.xxx
Syntactic Constructions URI: http://www.yyy
Lex_object semFeature URI: http://www.xxx#HUMAN
Lex_object syntagmaNT URI: http://www.zzz#NP
Monolingual/Multilingual Lexicon
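The idea of a lexicon whose entries point to shared objects in separately hosted resources can be sketched as follows. This is an illustration only, not the MILE data model: the classes `LexObject` and `LexicalEntry` and the example entry are hypothetical, the URIs are the placeholders from the slide, and the sketch assumes that a shared object is addressed as a `#fragment` of its repository URI.

```python
from dataclasses import dataclass, field
from urllib.parse import urldefrag


@dataclass(frozen=True)
class LexObject:
    """A shared lexical object referenced by URI instead of being stored locally."""
    kind: str   # e.g. "semFeature" or "syntagmaNT", as named on the slide
    uri: str    # assumed form: repository URI + "#" + object name


@dataclass
class LexicalEntry:
    """A local lexicon entry that reuses objects published elsewhere."""
    lemma: str
    language: str
    objects: list = field(default_factory=list)

    def repositories(self):
        """Return the distinct repository URIs this entry depends on."""
        return sorted({urldefrag(obj.uri).url for obj in self.objects})

    def object_names(self, kind):
        """Return the fragment names of all referenced objects of one kind."""
        return [urldefrag(obj.uri).fragment for obj in self.objects if obj.kind == kind]


# Hypothetical entry; the placeholder URIs do not resolve to real resources.
entry = LexicalEntry(
    lemma="cuoco",
    language="it",
    objects=[
        LexObject("semFeature", "http://www.xxx#HUMAN"),
        LexObject("syntagmaNT", "http://www.zzz#NP"),
    ],
)

print(entry.repositories())
print(entry.object_names("semFeature"))
```

Keeping only URIs in the entry means the feature and phrase inventories can be maintained by different groups, while a monolingual or multilingual lexicon simply references them.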
11 Target: Multilingual Knowledge Management
Technical Feasibility
- Prerequisite: is a commonly agreed text/lexicon annotation protocol an achievable goal, also for the semantic/conceptual level (to be able to automatically establish links among different languages)?
  - Yes, at the lexical level
  - More complex for corpus annotation?
EAGLES/ISLE
12 A few issues for discussion: lexicon standards
- Semantic Web standards and the needs of content processing technologies
- importance of reaching consensus on (linguistic and non-linguistic) content, in addition to agreement on formats and encoding issues (words convey content knowledge)
- short/medium term requirements with respect to standards for multilingual lexicons and content encoding, also industrial requirements
- Relation with the spoken language community
- MILE and Asian languages: how to cooperate concretely?
- Define further steps necessary to converge on common priorities
13 A few issues for discussion: content, priorities ...
- In which types of resources should we invest, with respect to short- vs. medium-term results?
- Need for robust systems, able to acquire/tune lexical/linguistic (also multilingual) knowledge, to auto-enrich static basic resources?
- What is the relation between lexical standards and text annotation protocols?
- Knowledge management is critical. For content interoperability, is the field mature enough to converge around agreed standards also for the semantic/conceptual level (e.g. to automatically establish links among different languages)?
- Is the field of multilingual lexical resources ready to tackle the challenges set by the Semantic Web development?
Towards a new paradigm?
14 A new paradigm for LR?
- Where the focus is on cooperation
- New Strategic Vision?
  - towards a Distributed Open Lexical Infrastructure?
  - for distributed cooperative creation, management, etc. of Lexical Resources
  - technical and organisational requirements
15 ELITE (expression of interest for the 6th FP)
European Lexical Infrastructure and Technology
Language Resources and Semantic Web
- New proposed paradigm for lexicon development
- Open Distributed Lexical Infrastructure
  - for content description and content interoperability,
  - to make lexical resources usable within the emerging Semantic Web scenario