Title: Presentazione di PowerPoint
1 Lexicons and Complex
Expressions towards Multilingual Linking
Nicoletta Calzolari Copenhagen, October 2001
2What is SIMPLE?
A set of 12 harmonised computational lexicons for
HLT applications, geared for multilingual links
- A common
- rich model
- representation language
- methodology of building the lexicon
- common Template Types, with default obligatory
info (Type defining), and indication of optional
info - First time on a large scale, for so many
languages - Lexical meaning represented in terms of
integrated combinations of different sorts of
information (semantic type, argument structure,
relations, features, etc. ) - Ontology-based information comes together with
predicative representation and syntactic linking - A shared set of SemUs (from EWN) (about 700) of
the 12 Lexicons cross-lingually related
3PAROLE/SIMPLE Architecture CLIPS Italian
National Project
60,000 lemmas
55,000 lemmas
55,000 SemU
MuS
SynU
Sem Info
Sem Info
Sem Info
Sem Info
TEMPLATE
Lexical Rel
Sem. Rel
Sem. Feat
4Semantic information in SIMPLE
Word senses encoded as Semantic Units (SemUs),
containing the following info
- Semantic type
- Domain
- Lexicographic gloss
- Extdended Qualia structure
- Reg. Polysemy altern.
- Event type
- Derivation relations
- Synonymy
- Collocations
- Argument structure for predicative SemUs
- Selection restrictions on the arguments
- Link of the arguments to the syntactic
subcategorization frames (represented in the
PAROLE lexicons)
5Semantic Multidimensionality and NLP
NLP tasks (IE, WSD, NP Recognition, etc.) need to
access multidimensional aspects of word meaning,
represented in SIMPLE with the Extended Qualia
Relations
Is_a_part_of
Member_of
la pagina del libro (the page of the book) il
difensore della Juventus (Juventus fullback) il
suonatore di liuto (the lute player) il tavolo di
legno (the wooden table)
Telic
Made_of
6Overall Organization
...
Greek lexicon
Danish lexicon
Type Ontology ?150 types
Catalan lexicon
Template
Instantiation
Italian lexicon
Pred. Layer
7Semantic Information
The SIMPLE Way
The Core Ontology represents a first level of
organization of the semantic type system
Each type is associated to a Template consisting
of a cluster of information (relations,
features, argument structure, event type, etc.)
that defines the type
The information characterizing a Semantic Unit
includes
a. The type defining information (associated
to the template the SemU instantiates)
b. Additional information (other relations or
features, selectional restrictions, terminology,
cross-part of speech relations, polysemy, etc.)
8Template
redundancy
9 Verb Examples hear, smell, etc. Noun
Examples sight, look, etc. Linguistic
Tests . Levin Class 30.1 (See verb, e.g.
detect, see, notice), 30.4 (Stimulus subject,
e.g. look,
smell) Comments Processes involving an
experiencing relation, . SemU
1
ltguardare_2gt (look) Usyn BC
Number 105 Template_Type
Perception Template_Supertype
Psychological_event Domain
General Semantic Class
Perception Gloss
//free//
osservare con attenzione Event type
process Pred _Rep. Lex_Pred
(ltarg0gt,ltarg1gt) Derivation ltDerivational
relationgt Selectional Restr. arg0
Animate //concept// arg1default Entity
Formal isa
(1,ltSemUgtPerceptiongt)
ltpercepiregtPsych_ev Agentive
ltNilgt Constitutive instrument (1,
ltSemUgtBody_part) ltocchiogt
intentionality yes,no
//optional// yes Telic
ltNilgt Collocates Collocates
(ltSemU1gt,...ltSemUngt) Complex ltNilgt
Template for Perception
10Semantic Relations
Modular Representation of a SemU
Flexibility an extendable framework to allow
coherent future extensions tuning for specific
applications/text types
Pred. Layer
Predicate, arguments, selection restrictions, ..
Rel. Layer
Relations betw. SemUs
Features
Qualia multiple meaning dimensions in a sense
Derivation cross-PoS relations
Polysemy regular polysemous classes
Collocation collocational information
11Semantic Relations
..
Activity
..
..
?100 Rels.
- The targets of relations identify
- prototypical semantic information associated with
a SemU - elements of dictionary definitions of SemUs
- typical corpus collocates of the SemU
12Semantic Relations
Ala (wing)
ltfabbricaregt make
Agentive
SemU 3232 Type Part Part of an airplane
ltvolaregt fly
Used_for
Is_a_part_of
ltaeroplanogt airplane
Isa
SemU 3268 Type Part Part of a building
ltpartegt part
Isa
Used_for
Isa
SemU D358 Type Body_part Organ of birds for
flying
ltedificiogt building
Is_a_part_of
Is_a_part_of
SemU 3467 Type Role Role in football
ltuccellogt bird
ltgiocatoregt player
Isa
13Relations and Predicates
Pred_SELL ltARG0gt, ltARG1gt, ltARG2gt, ltARG3gt
SemU Sell V
Is_the_agent_of
SemU Sale N
SemU Seller N
Event_noun
14Argument Structure
Comprendere V
Comprensione N
SemU 61725 Type Cognitive_event To understand
SemU 61726 Type Cognitive_event Understanding
master
SemU 6962 Type Constitutive_state To include
verb_nominalization
Comprendere1 ltArg1 Humangt, ltArg2 Semioticgt
Comprendere2 ltArg1 Groupgt, ltArg2gt
master
problems with selection restrictions !!!
15SIMPLE/CLIPS figures (now)
(?11,000 Lex. Units) 16,903 SemUs
- Nouns 12161
- Verbs 3476
- Adjectives 1266
- Predicates 4368
- Templates
- Instrument 734
- Human 712
- PsychologicalProperty 586
- Profession 541
- Purpose_Act 535
- Part 503
- Human_Group 502
- Relational_Act 521
- AgentTemporaryActivity 320
- Domain 303
- Features Relations
- Agentive 1945
- EventTypeProcess 1846
- EventTypeTransition 1463
- AgentiveCause 1175
- Usedfor 1488
- Synonym 1258
- ResultingState 1197
- Isapartof 909
- Hasaspart 800
- Istheactivityof 611
- Objectoftheactivity 598
- AntonymGrad 575
- Createdby 525
- Agentverb 454
- Concerns 421
16Core Lexicons enlarged in National Projects
- PAROLE/SIMPLE/EWN start providing the common
platform - For the subsidiarity concept the process started
at the EU level is continued at the national
level -
- extended in (at least) 9 National Projects
- (Danish, Greek, Italian, Portuguese, Swedish,
...) - (to be) used in applications
- True Infrastructure of harmonised LRs in EU
- Basis for Multilingual LR
- ENABLER (coord. A. Zampolli)
17HarmonisationNeed for a Global View
- Interaction/sharing of data software/tools
- Need of compatibility among various components
- An exemplary cycle
- Formalisms
- Grammars
- Software Taggers,
- Chunkers, Parsers
- Representation
Annotation - Lexicon
Corpora - Software
- Acquisition Systems
- I/O Interfaces
Languages
18SIMPLE wrt EAGLES/ISLEStandards for
Multilingual Lexical resources
EAGLES guidelines for syntactic and semantic
lexicons
PAROLE/SIMPLE Lexicons
MT systems
ISLE recommendations for multilingual lexicons
Multilingual Lexicons
19Mission(http//lingue.ilc.pi.cnr.it/EAGLES96/isle
/ISLE_Home_Page.htm)
- MT and multilingual HLT need to enhance
production, maintenance extension of
computational lexical resources - ISLE goals
- provide a common environment for the development,
integration, interchange sharing of lexical
resources with various types of linguistic
information - establish a virtuous circle betw. research,
applications, standardization process lay down
a bridge betw. the worlds of research and
application - mark the boundary between well-consolidated
practice and theoretical achievements in
multilingual HLT, and areas still open to
research but critical for future technological
improvements - Crucial role of intercontinental cooperation for
preparing ISLE recommendations and for their
validation
20ISLE and MT
- Academic and industrial members of the MT
community actively involved in the ISLE group - Microsoft, NMSU, Sail Labs, Systran, UMIACS,
UPenn, ISI, etc. - Survey phase
- a number of lexical resources for MT systems
surveyed by ISLE - MT systems requirements provide the main
reference points for ISLE work, to determine - types of lexical information critical to SL ? TL
mapping - criteria to create bilingual resources from
existing monolingual ones - common data structures to develop reusable
multilingual resources - critical areas of the lexicon MWEs, complex
transfer cases, collocational/example-based
information, etc.
MWE parenthesis
21MWE in ISLE XMELLT - 2 types of MWE
- (Deverbal) nominalisations support (light) verbs
- make an acquisition1 (noun.act
verb.possession) - complete an acquisition1
- undertake an acquisition1
- make an application1 (noun/verb.communication)
- have an application1 in
- decide on an application1 (consider, hear)
- get an application1 (receive, take)
- submit an application1 (file)
- Noun(/Adj/Poss)Noun MW (Ital.
NPP/NAdj/NVinf/...) - air pollution
- job application
- murder suspect
- police action police scandal
- coltello da macellaio butcher's
knife - carta di credito credit card
1st
2nd
No equivalent structures
22The BoundariesSupport Verbs more than Light
Verbs? Nominalisations . to a broader set
1st
- Both verbs, combined with an event noun, whose
subjects are - participants in the event identified by the noun
- related to some scenario associated with the
event - Type 1 take an exam, give an exam
- Type 2 pass an exam, fail an exam, grade
(evaluate) an exam - Type 1 perform an operation, undergo an
operation - Type 2 survive an operation
- But also enlarge the concept of nominalisation
to - event/result/abstract nouns not morphologically
derived - dare un ceffone (to slap)
- provare rancore (to bear sb. a grudge)
- fare una festa (to have a party)
- fare festa (to have a holiday)
- fare festa a qno (to give sb. a warm welcome)
- prestare attenzione (to pay attention)
No verb (for diachronic reason)
23Hypothesis for encodingMelcuk type Lexical
Functions (LF)
1st
- to record semantic contribution and/or aspectual
properties conveyed by the V - to express argument-sharing betw 2 arg structures
- Oper1 perform an operation made an apology
- Oper2 undergo an operation merits discussion
had a visit - Func0 silence reign
- Laborij take into consideration
- Incep start the attack
- Cont maintain influence
- Fin complete the acquisition
- Liqu eradicate the disease
- Real keep the promise, approve the application
- AntiReal turn down, withdraw the application
- .
24Nominalisations examples from Corpus
1st
- accusa
- (supp-v formulare, lanciare, muovere,
rivolgere,... (Oper1) - subiredefault, beccarsi,
attirarsi, rischiare,... (Oper2) - mettere, porre,... sotto a.
(Laborij) - rintuzzare, rigettare, smontare,
(Liqu) - Problematic?
- ritorcere, rovesciare (...)
- sostenere, (...)
- ripetere, (...)
- ..
- __________________________________________________
__________ - acquisizione
- (supp-v (fare)default, condurre,
curare,effettuare,... (Oper1) - varare,... (Incep)
- perfezionare, completare,
concludere, (Fin) - evitare, compromettere, (Liqu)
- sfumare, (LiquFunc0)
- Problematic?
- annuciare, dichiarare, (say)
Automatic acquisition
25Support Verbs what to list for multilingual
lexicons?
1st
- Decide if to include/list, for a noun
- all the verbs usable for a Melcukian LF
- INCEP cominciare default vs. varare,
intraprendere, - INCEP begin default vs. open (an
investigation), - OPER1say a prayer (not make, like with other
speech act nouns) - OPER1pay attention
- only those lexically dedicated to that noun
(needed for generation) (not the general
available by default for a LF) - begin an exam/operation or finish an
exam/operation -
- similar words preferentially select different
verbs to express similar meanings (same lexical
functions) lexical preference
26Complex nominals in a multilingual framework
2nd
- Different syntactic patterns in L1 L2
- NNh ( head noun) in English is usually NhPP in
Italian - tooth brush spazzolino da denti
- the syntactic pattern is not predictable
- hair/clothes brush spazzola per capelli/abiti
- nail brush spazzola per le unghie
- travel agency agenzia di viaggi
- real estate agency agenzia immobiliare
- marriage bureau agenzia matrimoniale
- A MWE in L1 corresponding to a fully
compositional phrase - cucchiaino da caffè coffee spoon???
- For MT implies some conceptual (interlingual?)
representation -
- but the encoding process must find an
appropriate MWE if it is called for - analogous to blocking/pre-emption a
regular/compositional process is not carried out
(dispreferred) because the semantic space
occupied by the concept associated with that
formation is already claimed by some ready-made
expression?
Fillmore
27Broader scope extension to non MWE?
2nd
- If look at devices in grammar that allow to
produce new MWEs - a continuum
- NPPgtcollocationgtmulti-wordgtidiom
- productive mechanisms in the language
- but idiosyncratic
- information at the borderline betw. grammar
lexicon - Amounts to
- describe productive modification relation of N in
general - in particular those lexically selected/preferred
by a N (its semantic paradigm) - MWE are a subset of these
- (give good hints to discover most prominent
relations??) - look at the semantic structure of Nouns i.e. at
the variety of modifiers they can select by
virtue of their meaning
Fillmore
28Noun Compounds/Complex Nominalsare pervasive
2nd
- There is a motivation in most NN construction
- the context provides it
- The FrameNet (SIMPLE) way
- appeal to specific frame structures (qualia
structures) associated with the head noun, - determine from corpus attestations which frame
elements (qualia) can get instantiated as a
modifier word - container complex nominals can specify
- material (aluminium c., glass c., )
- contents (food c., trash c., )
- size (3 quart c., )
- function (shipping c., storage c., )
- ...
Fillmore Busa
29Noun Compounds/Complex Nominals
multidimensional semantic approaches
2nd
- a. FrameNet
- Container Frame Frame Elements
Material,Contents,Size,Function - Material aluminum container, glass c., metal c.,
tin c. - Contents food container, beverage c., trash c.,
water c., milk c., fuel c. - Size 3 quart container
- Function shipping container, storage c.
- b. SIMPLE
- Qualia Relations of "container" used in
compounds - Constitutive made_of MATERIAL
- aluminum container, glass c., metal c., tin
c. - Telic contains ENTITY
- food container, beverage c., trash c., water
c., milk c., fuel c. - Constitutivesize QUANTITY
- 3 quart container
- Telicis_used_for EVENT
- shipping container, storage c.
30Complex Nominals/Lexical Constructions in a
multilingual context
2nd
- describe vs. list?
- if a compound noun is clearly lexicalized, it's
simply one of the words in L1 - but if it is an instance of some productive
word-formation rule, we should describe it - both describe list
- list explicitly in the lexical entry
- what is idiomatic/idiosyncratic wrt generation
for - lexical selection
- mucca pazza vs. matta
- prestare attenzione vs. pay attention
- structural pattern
- travel agency agenzia di viaggi
- marriage bureau agenzia matrimoniale (di
matrimonio) - real estate agency agenzia immobiliare
31In a multilingual context
2nd
- ...regularities in each language, but they dont
match - Both for decoding encoding, we need both
- a linguistic apparatus for interpretation
- (e.g. to go to a language where it is not a
MWE - cucchiaino da caffè for a
Japanese useful to know used for) - lists for idioms, for unpredictable/idiosyncrati
c - Same apparatus to interpret both MWE regular N
constructions (similar power of expressiveness)
general principles of semantic constitution of
lex. items their combinatorics in terms e.g.
of frames/qualia/ - basic sem. notions
- a general schema to characterise the problem,
e.g. - frame (qualia) structure of the headN
- semantic Type of the modifier N
- allow the headN to impose its interpretation on
the modification rel. - ...
32Complex nominals, e.g. knife (coltello) triggers
2nd
- a cutting frame (FrameNet)
- specific SIMPLE dimensions of meaning
- extensively evaluate whether qualia roles
(already) encoded in SIMPLE correspond to what is
necessary to interpret N-N modification
relations - SIMPLE Extended Qualia structure
- for the interpretation of the semantic relation
betw. Ns - (internal relational structure of MWE)
- butchers knife (coltello da macellaio) ? TELIC
(used_by) Y Human ? PPda - plastic knife (coltello di plastica) ?
CONST (made_of) X Material ?PPdi - table knife (coltello da tavola) ? TELIC
(used_in) Z Location ?PPda - hunting knife (coltello da caccia) ? TELIC
(used_in_activity) E Activity ?Ppda - piatto di legno ? CONST (made_of) X Material
?PPdi - piatto di pasta ? CONST (contains) X Food ?PPdi
PP disambig.
33In SIMPLE possible extension
2nd
- Deverbal nominalisation
- noun murder (uccisione, delitto, omicidio
(different sem. pref.) - ?PPdi
PREDMURDER(uccidere) - ?PPda_parte_di, di
ARG1agentHum/Anim? - verb murder (uccidere)
ARG2patientHum/Anim? - ? subjNP
MOD1instrWeapon - ? objNP
MOD2meansAction - MOD3......
- instr PPcon Weapon (knife m., con coltello)
- means PPper Action (strangulation m., per
strangolamento) - loc Ppplocdi Location (Kent State murders,
nel ...) - time Ppptimedi Time (1983 murders, del 1983)
As if it were a Situation
34 Monolingual Linguistic Representation
Strategy
- consider as the starting point for MILE the
edited union of the basic notions represented in
the existing syntactic/semantic lexicons (their
models) - evaluate their notions wrt EAGLES recommendations
for syntax and semantics - evaluate their usefulness adequacy for
multilingual tasks - evaluate integrability of their notions in a
unitary MILE - look for deficient areas, e.g. MWE
- ...
To be decided should ISLE reach a consensus at
the level of the types of information only, or
also at the level of their token values? .
different answers for diff. notions
35 the Multilingual ISLE Lexical Entry
(MILE)
- General methodological principles (from EAGLES)
- Basic requirements for the MILE
- Discover and list the (maximal) set of basic
notions needed to describe the MILE (up to which
level standardisation is feasible?) - Granularity
- The leading principle for the design of the MILE
the edited union of existing lexicons/models
(redundancy is not a problem) - Modular and layered various degrees of
specification possible - Allow for underspecification ( hierarchical
structure)
36The MILE
- Main features
- factor out primitive units of lexical information
- explicit representation of information to be
targeted by multilingual NLP tools - rely on lexical analyses with the highest degree
of inter-theoretical agreement - avoid framework-specific representational
solutions - open to different paradigms of multilinguality
- oriented to the creation of large-scale lexical
databases
37MILE
- Objective definition of the MILE
- as a meta-entry to act as a common format for
resource sharing and integration/architecture for
lexical data encoding - ? its basic notions
- ? general architecture
- formalized as an entity-rel.
- model (XML, RDF, etc.)
- with a tool to support it
-
-
- open to task- system-dependent parameterisation
38Agreed Principles
- MILE builds on the monolingual entry expands it
- MILE incorporates previous EAGLES recommendations
-
- is the complete entry
- adopt as starting point the PAROLE/SIMPLE DTD
- to be revised, augmented, ...
- We consider 2 broad categories of
applications - MT
- CLIR (linking module may be simpler/ontology
based) - (label info types wrt application)
39Modularity in MILE
- Advantages
- Flexibility of representation
- Easy to customise and update
- Easy integration of existing resources
- High versatility towards different applications
- Modularity at least under three respects
- in the macrostructure and general architecture of
the MILE - in the microstructure of the MILE
- monolingual linguistic representation (previous
EAGLES revised/updated) - collocational/corpus-driven information (new)
- multilingual apparatus (e.g. transfer conditions
and actions interlingua) (new) - in the specific microstructure of the MILE
word-sense
40Modularity in MILE
A. MILE Macrostructure
C. Word-Sense Microstructure
MILE
B. MILE Microstructure
41The MILE Architecture Monolingual Lexical
Description
- three independent and yet linked layers
characterising the MILE in a source language - possibly corresponds to the typology of
information contained in major existing lexicons,
such as PAROLE-SIMPLE, (Euro)WordNet, COMLEX,
FrameNet, etc. - simple and complex lexical unit (to account for
MWEs) - various degrees of granularity of lexical units
representation
semantic layer
correspondence conditions
syntactic layer
morphological layer
42The MILE ArchitectureMultilingual Layer
- acts as an (independent) interface layer between
monolingual lexicons
multilingual layer
semantic layer
correspondence conditions
syntactic layer
Lexicon 1
Lexicon 2
morphological layer
43The MILE Multilingual Layer.(NEW)
- Correspondences can be established between
different types of linguistic objects (strings,
syntactic descriptions, semantic elements,
predicates, etc.) - Transfer tests and actions to target various
types of lexical information in the monolingual
layers - constrain syntactic positions and their fillers
- lexicalize syntactic positions
- add positions or arguments
- add new features to define more fine-grained
sense distinctions relevant at the multilingual
level - restructuring argument configurations
- collocational information
- ...
44Paths to Discover theBasic Notions of MILE
- clues in dictionaries to decide on target
equivalent - guidelines for lexicographers
- clues (to disambiguate/translate) in corpus
concordances - lexical requirements from various types of
transfer conditions and actions in MT systems - lexical requirements from interlingua-based
systems
45- Organisational Proposal
- division of labour
- Highlighted some hot issues assigned tasks
- sense indicators (EU)
- selection preferences (EU)
- lexicographic relevance (EU)
- argument structure (US)
- MWE (EU US)
- collocations parallel corpora (US)
- modifiers (EU)
- semantic relations (EU)
- transfer conditions (EU US)
- collocational patterns (US)
- ontology (US)
- metaphors (EU)
- interlingua requirements (US)
- spoken lexicon (EU)
- meta-representation (US EU)
- ...
46Organisational ProposalThe tasks will lead to
- an in-depth analysis of each area aiming at
identifying - the most stable solutions adopted in the
community - linguistic specifications and criteria
- possible representational solutions, their
compatibility, etc. - evaluation of their respective weight/importance
in a multilingual lexicon (towards a layered
approach to recommendations) - open issues and current boundaries of the
state-of-the-art (which cannot be standardised
yet) - model limitations through creation of a sample
dictionary -
- see how the various pieces fit together can be
merged in a unified proposal - evaluate if we can combine in a hybrid
super-model the transfer interlingua approaches
47Information Types examples
48CLWG Ongoing Activities
- to prepare a preliminary proposal of the MILE
-
- existing models for lexical representation and
data interchange (Genelex, Olif, etc.) are
explored - model limitations and expressive power are tested
through creation of sample entries in a few
languages - groups at work
- lexical description and information types of
relevant info - lexicographic exploration systematic summary
classification of types of transfer tests (also
extracted from MRDs) - multilingual correspondences
- lexical data modeling format representation
issues - tool development
49Representation issues
- Working with GENELEX, lexicon development work is
(can be) affected by - impossibility (or difficulty) of defining
abstract and general classes or types of objects - lack of inheritance mechanisms
- lack of default expression and default rewriting
mechanisms - Cf. Lexical templates in SIMPLE
- not included in the GENELEX data-structure
- implemented in the editing sw. tool
- very useful to capture relevant lexical
generalizations, enhance consistency in encoding,
speed-up lexicographers work, etc.
50CLWG Ongoing Activity
MILE Lexical Objects Formal Specifications
MILE Lexical Entry Formal Specifications
MILE Shared Lexical Objects
User Defined Lexical Objects
Monolingual Multilingual Lexicons
51- MILE Repository of Shared Lexical Objects
- Basic syntactic constructions (e.g. transitive,
etc.) - (Micro-)semantic objects (e.g. features,
relations) - (Macro-)semantic objects (e.g. lexical templates)
- Multilingual constructions (e.g. basic transfer
conditions and actions) -
MILE Shared Lexical Objects
Simplify using MILE
- - New Lexical objects defined by the User
according to the common MILE formal
data-structure specification. - Sub-types of the Shared MILE Objects
- Possibly enriched with metadata defining their
semantics and usage
User-Defined Lexical Objects
Monolingual Multilingual Lexicons
- Lexical entries obtained by referring to
various lexical objects (both Shared and
User-defined) - The MILE lexical entry model
specifies how lexical objects can be combined to
achieve the proper lexical representation
52Involvement of Asian Languages
- participation in last meetings
- some input from Asia
- formal cooperation EU-ASIA steps to put in
motion
53Impact synergies
- real impact to be evaluated later
- through the use in applications
- already its being a US/EU project
- the Asian interest
- synergies now, e.g.
- PAROLE/SIMPLE (also instantiated in 9 national
projects) main input - EuroWordNet provides input
- XMELLT (NSF) provides input
- OLIF expects ( provides) input
- SALT complementary
- ENABLER validation ( expects input)
- ELSNET validation
- SENSEVAL validation
- NIMM WG for Metadata for CL(also with the US
OLAC) - ...
54Target . Multilingual Content
Management the
Resources viewpoint
- The relevance/impact of (good vs. less good) LRs
for high-quality Cross/Multilingual systems is
high, even if not easily measurable. - Different applications, component technologies -
approaches within - need different info types
(e.g. CLIR or content access systems wrt MT) - For each, need to specify (not an easy task)
- clear lexical/linguistic/conceptual requirements
- priority info types (which, how encoded, etc.)
- the respective role of e.g. annotated corpora,
mono- bi- multilingual lexicons (with different
info types), ontologies, KBs
55Economic Feasibilityfor which (Multilingual)
Resources to invest?
- Wrt short- vs. medium-term impact
- Basic, general purpose bi-/multilingual lexicons,
but to be tuned, adapted to different
applications - need of robust systems able to acquire/tune
- (multilingual) lexical/linguistic/conceptual
knowledge, - to accompany static basic resources
- We shouldnt rely only on parallel
corpora. More advisable to aim at - reliable methods for acquisition
use of comparable corpora, - accompanied by
- robust technologies for annotation
(at different levels morphosyntactic, - syntactic/functional, semantic,
), and by - a shared set of (text) annotation schemata
56Target.. Multilingual
Knowledge Management
Technical Feasibility
- Prerequisite is it an achievable goal a commonly
agreed text/lexicon annotation protocol also for
the semantic/conceptual level (to be able to
automatically establish links among different
languages)? - Yes, at the lexical level
-
- More complex, for corpus annotation?
EAGLES/ISLE
57Content for practical use Gap betw. Resources
and Systems?
?
- If we had real-size lexicons with very
fine-grained semantic/conceptual info, would
there be systems (non ad-hoc toy systems) able to
use them? - A vicious circle between
- i) lack of suitable, large-size and knowledge
intensive, resources (lexicons and corpora, with
many different types of syntactic and semantic
information encoded), and - ii) systems ability to use them effectively
- The two targets should be pursued in parallel,
- should closely interact with each other, and
- be gradually integrated