Title:

Presentazione di PowerPoint

Slides: 58
Provided by: nicole191
1

Lexicons and Complex Expressions: towards Multilingual Linking
Nicoletta Calzolari, Copenhagen, October 2001
2
What is SIMPLE?
A set of 12 harmonised computational lexicons for
HLT applications, geared for multilingual links
  • A common rich model, representation language and
    methodology of building the lexicon
  • Common Template Types, with default obligatory
    (type-defining) info and an indication of optional
    info
  • First time on a large scale, for so many
    languages
  • Lexical meaning represented in terms of
    integrated combinations of different sorts of
    information (semantic type, argument structure,
    relations, features, etc. )
  • Ontology-based information comes together with
    predicative representation and syntactic linking
  • A shared set of about 700 SemUs (from EWN),
    cross-lingually related across the 12 lexicons

3
PAROLE/SIMPLE Architecture: CLIPS Italian
National Project
[Diagram labels: MuS, SynU, SemU; 60,000 lemmas; 55,000 lemmas; 55,000 SemU; Sem Info; TEMPLATE; Lexical Rel, Sem. Rel, Sem. Feat]
4
Semantic information in SIMPLE
Word senses are encoded as Semantic Units (SemUs),
containing the following info:
  • Semantic type
  • Domain
  • Lexicographic gloss
  • Extended Qualia structure
  • Regular polysemy alternations
  • Event type
  • Derivation relations
  • Synonymy
  • Collocations
  • Argument structure for predicative SemUs
  • Selection restrictions on the arguments
  • Link of the arguments to the syntactic
    subcategorization frames (represented in the
    PAROLE lexicons)
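
A minimal sketch of how the information types listed above might sit together in one record. The class and field names (`SemU`, `Argument`, `syntactic_position`) are illustrative assumptions, not the PAROLE/SIMPLE data model; the example values are borrowed from the Perception template shown later in the deck, except the subject/object links, which are assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    name: str                     # e.g. "arg0"
    selection_restriction: str    # semantic type required of the filler
    syntactic_position: str = ""  # assumed link to a subcategorization slot


@dataclass
class SemU:
    lemma: str
    semantic_type: str
    domain: str = "General"
    gloss: str = ""
    event_type: str = ""
    qualia: dict = field(default_factory=dict)        # relation -> target lemmas
    synonyms: list = field(default_factory=list)
    collocations: list = field(default_factory=list)
    arguments: list = field(default_factory=list)     # only for predicative SemUs

    def is_predicative(self) -> bool:
        """A SemU is predicative when it carries an argument structure."""
        return bool(self.arguments)


look = SemU(
    lemma="guardare",
    semantic_type="Perception",
    gloss="osservare con attenzione",
    event_type="process",
    qualia={"Constitutive:instrument": ["occhio"]},
    arguments=[
        Argument("arg0", "Animate", "subject"),   # position is an assumption
        Argument("arg1", "Entity", "object"),     # position is an assumption
    ],
)
print(look.is_predicative())
```

Keeping the argument list optional mirrors the slide's wording: argument structure is recorded only for predicative SemUs.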

5
Semantic Multidimensionality and NLP
NLP tasks (IE, WSD, NP Recognition, etc.) need to
access multidimensional aspects of word meaning,
represented in SIMPLE with the Extended Qualia
Relations:
  • Is_a_part_of: la pagina del libro (the page of the book)
  • Member_of: il difensore della Juventus (Juventus fullback)
  • Telic: il suonatore di liuto (the lute player)
  • Made_of: il tavolo di legno (the wooden table)
6
Overall Organization

[Diagram labels: Type Ontology (≈150 types); Template; Instantiation; Pred. Layer; Greek lexicon, Danish lexicon, Catalan lexicon, Italian lexicon, ...]

7
Semantic Information
The SIMPLE Way
The Core Ontology represents a first level of
organization of the semantic type system.
Each type is associated with a Template consisting
of a cluster of information (relations,
features, argument structure, event type, etc.)
that defines the type.
The information characterizing a Semantic Unit
includes:
  a. The type-defining information (associated
     with the template the SemU instantiates)
  b. Additional information (other relations or
     features, selectional restrictions, terminology,
     cross-part-of-speech relations, polysemy, etc.)
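
The split between type-defining and additional information can be sketched as follows. The template is modelled as a plain dictionary, and `instantiate` is a hypothetical helper. The rule that additional information may not overwrite type-defining slots is an assumption of this sketch, not something the slides state.

```python
# Type-defining information for one semantic type (values from the Perception slide)
PERCEPTION_TEMPLATE = {
    "template_type": "Perception",
    "template_supertype": "Psychological_event",
    "event_type": "process",
}


def instantiate(template: dict, lemma: str, **additional) -> dict:
    """Build a SemU: copy the type-defining info, then add SemU-specific info."""
    clash = set(template) & set(additional)
    if clash:
        # Assumption of this sketch: type-defining slots cannot be overridden
        raise ValueError(f"cannot override type-defining info: {sorted(clash)}")
    semu = {"lemma": lemma, **template}
    semu.update(additional)
    return semu


guardare = instantiate(
    PERCEPTION_TEMPLATE,
    "guardare",
    gloss="osservare con attenzione",
    constitutive=("instrument", "occhio"),
)
print(guardare["template_type"], guardare["gloss"])
```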
8
Template
redundancy
9
Template for Perception
  • Verb Examples: hear, smell, etc.
  • Noun Examples: sight, look, etc.
  • Linguistic Tests: ...
  • Levin Class: 30.1 (See verbs, e.g. detect, see,
    notice), 30.4 (Stimulus subject, e.g. look, smell)
  • Comments: Processes involving an experiencing
    relation, ...
  • SemU: 1 <guardare_2> (look)
  • Usyn:
  • BC Number: 105
  • Template_Type: Perception
  • Template_Supertype: Psychological_event
  • Domain: General
  • Semantic Class: Perception
  • Gloss: //free// osservare con attenzione (to
    observe attentively)
  • Event type: process
  • Pred_Rep.: Lex_Pred (<arg0>, <arg1>)
  • Derivation: <Derivational relation>
  • Selectional Restr.: arg0 Animate //concept//;
    arg1 default Entity
  • Formal: isa (1, <SemU> Perception); <percepire>
    Psych_ev
  • Agentive: <Nil>
  • Constitutive: instrument (1, <SemU> Body_part)
    <occhio>; intentionality yes, no //optional//: yes
  • Telic: <Nil>
  • Collocates: Collocates (<SemU1>, ... <SemUn>)
  • Complex: <Nil>
10
Semantic Relations
Modular Representation of a SemU
Flexibility: an extendable framework to allow
coherent future extensions & tuning for specific
applications/text types

  • Pred. Layer: predicate, arguments, selection
    restrictions, ...
  • Rel. Layer: relations betw. SemUs
  • Features
  • Qualia: multiple meaning dimensions in a sense
  • Derivation: cross-PoS relations
  • Polysemy: regular polysemous classes
  • Collocation: collocational information
11
Semantic Relations
[Diagram: hierarchy of relations (Activity, ...), ≈100 Rels.]
  • The targets of relations identify:
  • prototypical semantic information associated with
    a SemU
  • elements of dictionary definitions of SemUs
  • typical corpus collocates of the SemU

12
Semantic Relations
Ala (wing)
  • SemU 3232, Type Part: Part of an airplane
    Isa <parte> (part); Is_a_part_of <aeroplano>
    (airplane); Used_for <volare> (fly); Agentive
    <fabbricare> (make)
  • SemU 3268, Type Part: Part of a building
    Isa <parte> (part); Is_a_part_of <edificio>
    (building)
  • SemU D358, Type Body_part: Organ of birds for
    flying
    Isa <parte> (part); Is_a_part_of <uccello> (bird);
    Used_for <volare> (fly)
  • SemU 3467, Type Role: Role in football
    Isa <giocatore> (player)
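
The four senses of "ala" can be read as a small relation graph, and relation targets can then serve as disambiguation clues. The sketch below assumes an assignment of relations to senses inferred from the flattened diagram (the relation counts match, but the arrows themselves are not recoverable from the transcript). `disambiguate` is a hypothetical toy scorer, not a SIMPLE tool.

```python
# Senses of "ala"; relation targets per sense are an inferred reading of the diagram
ALA_SENSES = {
    "3232": {"type": "Part", "gloss": "part of an airplane",
             "relations": {"Isa": "parte", "Is_a_part_of": "aeroplano",
                           "Used_for": "volare", "Agentive": "fabbricare"}},
    "3268": {"type": "Part", "gloss": "part of a building",
             "relations": {"Isa": "parte", "Is_a_part_of": "edificio"}},
    "D358": {"type": "Body_part", "gloss": "organ of birds for flying",
             "relations": {"Isa": "parte", "Is_a_part_of": "uccello",
                           "Used_for": "volare"}},
    "3467": {"type": "Role", "gloss": "role in football",
             "relations": {"Isa": "giocatore"}},
}


def disambiguate(context_lemmas):
    """Return the sense ids whose relation targets overlap most with the context."""
    context = set(context_lemmas)
    scores = {
        sense_id: len(set(sense["relations"].values()) & context)
        for sense_id, sense in ALA_SENSES.items()
    }
    best = max(scores.values())
    if best == 0:
        return []  # no relation target occurs in the context
    return sorted(sense_id for sense_id, score in scores.items() if score == best)


print(disambiguate(["uccello", "volare"]))
print(disambiguate(["giocatore"]))
```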
13
Relations and Predicates
Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>
  • SemU Sell V
  • SemU Sale N: Event_noun
  • SemU Seller N: Is_the_agent_of
14
Argument Structure
Comprendere V
Comprensione N
SemU 61725 Type Cognitive_event To understand
SemU 61726 Type Cognitive_event Understanding
master
SemU 6962 Type Constitutive_state To include
verb_nominalization
Comprendere1 <Arg1 Human>, <Arg2 Semiotic>
Comprendere2 <Arg1 Group>, <Arg2>
master
problems with selection restrictions !!!
15
SIMPLE/CLIPS figures (now)
(≈11,000 Lex. Units) 16,903 SemUs
  • Nouns 12161
  • Verbs 3476
  • Adjectives 1266
  • Predicates 4368
  • Templates:
  • Instrument 734
  • Human 712
  • PsychologicalProperty 586
  • Profession 541
  • Purpose_Act 535
  • Part 503
  • Human_Group 502
  • Relational_Act 521
  • AgentTemporaryActivity 320
  • Domain 303
  • Features & Relations:
  • Agentive 1945
  • EventTypeProcess 1846
  • EventTypeTransition 1463
  • AgentiveCause 1175
  • Usedfor 1488
  • Synonym 1258
  • ResultingState 1197
  • Isapartof 909
  • Hasaspart 800
  • Istheactivityof 611
  • Objectoftheactivity 598
  • AntonymGrad 575
  • Createdby 525
  • Agentverb 454
  • Concerns 421
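
As a consistency check on the figures above, the noun, verb and adjective counts add up exactly to the stated 16,903 SemUs. The snippet below only re-does that arithmetic and derives percentage shares; the variable names are illustrative.

```python
# Part-of-speech counts and the SemU total as given on the slide
pos_counts = {"Nouns": 12161, "Verbs": 3476, "Adjectives": 1266}
total_semus = 16903

# The three counts account for every SemU
assert sum(pos_counts.values()) == total_semus

# Share of each part of speech, in percent, rounded to one decimal
shares = {pos: round(100 * n / total_semus, 1) for pos, n in pos_counts.items()}
for pos, share in shares.items():
    print(f"{pos}: {share}%")
```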

16
Core Lexicons enlarged in National Projects
  • PAROLE/SIMPLE/EWN start providing the common
    platform
  • Following the subsidiarity principle, the process
    started at the EU level is continued at the
    national level
  • extended in (at least) 9 National Projects
  • (Danish, Greek, Italian, Portuguese, Swedish,
    ...)
  • (to be) used in applications
  • True Infrastructure of harmonised LRs in EU
  • Basis for Multilingual LR
  • ENABLER (coord. A. Zampolli)

17
Harmonisation: Need for a Global View
  • Interaction/sharing of data & software/tools
  • Need of compatibility among various components
  • An exemplary cycle:
  • Formalisms, Grammars
  • Software: Taggers, Chunkers, Parsers
  • Representation, Annotation
  • Lexicon, Corpora
  • Software: Acquisition Systems, I/O Interfaces
  • Languages
18
SIMPLE wrt EAGLES/ISLE: Standards for
Multilingual Lexical Resources
EAGLES guidelines for syntactic and semantic
lexicons
PAROLE/SIMPLE Lexicons
MT systems
ISLE recommendations for multilingual lexicons
Multilingual Lexicons
19
Mission (http://lingue.ilc.pi.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm)
  • MT and multilingual HLT need to enhance
    production, maintenance & extension of
    computational lexical resources
  • ISLE goals:
  • provide a common environment for the development,
    integration, interchange & sharing of lexical
    resources with various types of linguistic
    information
  • establish a virtuous circle betw. research,
    applications & standardization process; lay down
    a bridge betw. the worlds of research and
    application
  • mark the boundary between well-consolidated
    practice and theoretical achievements in
    multilingual HLT, and areas still open to
    research but critical for future technological
    improvements
  • Crucial role of intercontinental cooperation for
    preparing ISLE recommendations and for their
    validation

20
ISLE and MT
  • Academic and industrial members of the MT
    community actively involved in the ISLE group
  • Microsoft, NMSU, Sail Labs, Systran, UMIACS,
    UPenn, ISI, etc.
  • Survey phase
  • a number of lexical resources for MT systems
    surveyed by ISLE
  • MT systems' requirements provide the main
    reference points for ISLE work, to determine:
  • types of lexical information critical to SL → TL
    mapping
  • criteria to create bilingual resources from
    existing monolingual ones
  • common data structures to develop reusable
    multilingual resources
  • critical areas of the lexicon: MWEs, complex
    transfer cases, collocational/example-based
    information, etc.

MWE parenthesis
21
MWE in ISLE & XMELLT: 2 types of MWE
  • (Deverbal) nominalisations + support (light) verbs
  • make an acquisition1 (noun.act,
    verb.possession)
  • complete an acquisition1
  • undertake an acquisition1
  • make an application1 (noun/verb.communication)
  • have an application1 in
  • decide on an application1 (consider, hear)
  • get an application1 (receive, take)
  • submit an application1 (file)
  • Noun(/Adj/Poss)+Noun MW (Ital.
    N+PP/N+Adj/N+Vinf/...)
  • air pollution
  • job application
  • murder suspect
  • police action, police scandal
  • coltello da macellaio (butcher's knife)
  • carta di credito (credit card)

1st
2nd
No equivalent structures
22
The Boundaries: Support Verbs more than Light
Verbs? Nominalisations ... to a broader set
1st
  • Both verbs, combined with an event noun, whose
    subjects are:
  • participants in the event identified by the noun
  • related to some scenario associated with the
    event
  • Type 1: take an exam, give an exam
  • Type 2: pass an exam, fail an exam, grade
    (evaluate) an exam
  • Type 1: perform an operation, undergo an
    operation
  • Type 2: survive an operation
  • But also enlarge the concept of nominalisation
    to:
  • event/result/abstract nouns not morphologically
    derived
  • dare un ceffone (to slap)
  • provare rancore (to bear sb. a grudge)
  • fare una festa (to have a party)
  • fare festa (to have a holiday)
  • fare festa a qno (to give sb. a warm welcome)
  • prestare attenzione (to pay attention)

No verb (for diachronic reason)
23
Hypothesis for encoding: Mel'čuk-type Lexical
Functions (LF)
1st
  • to record semantic contribution and/or aspectual
    properties conveyed by the V
  • to express argument-sharing betw. 2 arg structures
  • Oper1: perform an operation, made an apology
  • Oper2: undergo an operation, merits discussion,
    had a visit
  • Func0: silence reigns
  • Laborij: take into consideration
  • Incep: start the attack
  • Cont: maintain influence
  • Fin: complete the acquisition
  • Liqu: eradicate the disease
  • Real: keep the promise, approve the application
  • AntiReal: turn down, withdraw the application
  • ...
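
One way to picture this encoding hypothesis is a table keyed by lexical function and noun. The sketch below uses only the examples from the slide, with verbs reduced to their base form. The dictionary layout and the helper names `support_verbs` and `functions_of` are assumptions for illustration.

```python
# Lexical function -> noun -> support verbs (examples taken from the slide)
LF_EXAMPLES = {
    "Oper1": {"operation": ["perform"], "apology": ["make"]},
    "Oper2": {"operation": ["undergo"], "discussion": ["merit"], "visit": ["have"]},
    "Func0": {"silence": ["reign"]},
    "Laborij": {"consideration": ["take into"]},
    "Incep": {"attack": ["start"]},
    "Cont": {"influence": ["maintain"]},
    "Fin": {"acquisition": ["complete"]},
    "Liqu": {"disease": ["eradicate"]},
    "Real": {"promise": ["keep"], "application": ["approve"]},
    "AntiReal": {"application": ["turn down", "withdraw"]},
}


def support_verbs(lf, noun):
    """Verbs recorded for a noun under a given lexical function (may be empty)."""
    return LF_EXAMPLES.get(lf, {}).get(noun, [])


def functions_of(noun):
    """All lexical functions for which the noun has at least one recorded verb."""
    return sorted(lf for lf, nouns in LF_EXAMPLES.items() if noun in nouns)


print(support_verbs("Oper2", "operation"))
print(functions_of("application"))
```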

24
Nominalisations examples from Corpus
1st
  • accusa (accusation)
  • supp-v: formulare, lanciare, muovere,
    rivolgere, ... (Oper1)
  • subire (default), beccarsi,
    attirarsi, rischiare, ... (Oper2)
  • mettere, porre, ... sotto a.
    (Laborij)
  • rintuzzare, rigettare, smontare, ...
    (Liqu)
  • Problematic?
  • ritorcere, rovesciare (...)
  • sostenere, (...)
  • ripetere, (...)
  • ...

  • acquisizione (acquisition)
  • supp-v: (fare) (default), condurre,
    curare, effettuare, ... (Oper1)
  • varare, ... (Incep)
  • perfezionare, completare,
    concludere, ... (Fin)
  • evitare, compromettere, ... (Liqu)
  • sfumare, ... (LiquFunc0)
  • Problematic?
  • annunciare, dichiarare, ... (say)

Automatic acquisition
25
Support Verbs: what to list for multilingual
lexicons?
1st
  • Decide whether to include/list, for a noun:
  • all the verbs usable for a Mel'čukian LF
  • INCEP: cominciare (default) vs. varare,
    intraprendere, ...
  • INCEP: begin (default) vs. open (an
    investigation), ...
  • OPER1: say a prayer (not make, as with other
    speech act nouns)
  • OPER1: pay attention
  • only those lexically dedicated to that noun
    (needed for generation), not the general ones
    available by default for an LF
  • begin an exam/operation or finish an
    exam/operation
  • similar words preferentially select different
    verbs to express similar meanings (same lexical
    functions): lexical preference
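
The second option, listing only lexically dedicated verbs and falling back on an LF-wide default, can be sketched as a two-level lookup. The tables hold only the examples from the slide. `realise` is a hypothetical helper, and treating "finish" as the default for Fin is an assumption based on "finish an exam/operation".

```python
# Default verb per lexical function, used when a noun lists nothing of its own
LF_DEFAULTS = {"Incep": "begin", "Fin": "finish"}  # "finish" for Fin is assumed

# Only lexically dedicated choices are stored with the noun
DEDICATED = {
    ("Incep", "investigation"): "open",
    ("Oper1", "prayer"): "say",
    ("Oper1", "attention"): "pay",
}


def realise(lf, noun):
    """Pick the verb for (LF, noun): dedicated entry first, LF default otherwise."""
    verb = DEDICATED.get((lf, noun)) or LF_DEFAULTS.get(lf)
    if verb is None:
        raise KeyError(f"no dedicated verb and no default for {lf}({noun})")
    return verb


print(realise("Incep", "exam"))           # falls back on the default
print(realise("Incep", "investigation"))  # uses the dedicated verb
```

The design keeps noun entries small: generation still gets "open an investigation" right, while "begin an exam" costs no extra lexical listing.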

26
Complex nominals in a multilingual framework
2nd
  • Different syntactic patterns in L1 & L2
  • N+Nh (h = head noun) in English is usually Nh+PP
    in Italian
  • tooth brush: spazzolino da denti
  • the syntactic pattern is not predictable
  • hair/clothes brush: spazzola per capelli/abiti
  • nail brush: spazzola per le unghie
  • travel agency: agenzia di viaggi
  • real estate agency: agenzia immobiliare
  • marriage bureau: agenzia matrimoniale
  • A MWE in L1 corresponding to a fully
    compositional phrase
  • cucchiaino da caffè: coffee spoon???
  • For MT, this implies some conceptual
    (interlingual?) representation
  • but the encoding process must find an
    appropriate MWE if it is called for
  • analogous to blocking/pre-emption: a
    regular/compositional process is not carried out
    (dispreferred) because the semantic space
    occupied by the concept associated with that
    formation is already claimed by some ready-made
    expression?

Fillmore
27
Broader scope: extension to non-MWE?
2nd
  • If we look at the devices in grammar that allow
    new MWEs to be produced:
  • a continuum
  • N+PP > collocation > multi-word > idiom
  • productive mechanisms in the language
  • but idiosyncratic
  • information at the borderline betw. grammar &
    lexicon
  • Amounts to:
  • describe productive modification relation of N in
    general
  • in particular those lexically selected/preferred
    by a N (its semantic paradigm)
  • MWE are a subset of these
  • (give good hints to discover most prominent
    relations??)
  • look at the semantic structure of Nouns, i.e. at
    the variety of modifiers they can select by
    virtue of their meaning

Fillmore
28
Noun Compounds/Complex Nominals are pervasive
2nd
  • There is a motivation in most NN constructions
  • the context provides it
  • The FrameNet (SIMPLE) way:
  • appeal to specific frame structures (qualia
    structures) associated with the head noun,
  • determine from corpus attestations which frame
    elements (qualia) can get instantiated as a
    modifier word
  • container: complex nominals can specify
  • material (aluminium c., glass c., ...)
  • contents (food c., trash c., ...)
  • size (3 quart c., ...)
  • function (shipping c., storage c., ...)
  • ...

Fillmore Busa
29
Noun Compounds/Complex Nominals:
multidimensional semantic approaches
2nd
  • a. FrameNet
  • Container Frame; Frame Elements:
    Material, Contents, Size, Function
  • Material: aluminum container, glass c., metal c.,
    tin c.
  • Contents: food container, beverage c., trash c.,
    water c., milk c., fuel c.
  • Size: 3 quart container
  • Function: shipping container, storage c.
  • b. SIMPLE
  • Qualia Relations of "container" used in
    compounds
  • Constitutive: made_of MATERIAL
  • aluminum container, glass c., metal c., tin
    c.
  • Telic: contains ENTITY
  • food container, beverage c., trash c., water
    c., milk c., fuel c.
  • Constitutive: size QUANTITY
  • 3 quart container
  • Telic: is_used_for EVENT
  • shipping container, storage c.
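
The two analyses of "container" line up one to one, which a small lookup can make explicit. The pairing of frame elements with qualia relations follows the shared examples on the slide. The table and function names are assumptions, and the modifier lists are just the slide's examples rather than a real lexicon.

```python
# FrameNet frame element -> (SIMPLE qualia role, relation, semantic type of modifier)
FE_TO_QUALIA = {
    "Material": ("Constitutive", "made_of", "MATERIAL"),
    "Contents": ("Telic", "contains", "ENTITY"),
    "Size": ("Constitutive", "size", "QUANTITY"),
    "Function": ("Telic", "is_used_for", "EVENT"),
}

# Modifiers of "container" attested on the slide, grouped by frame element
FE_EXAMPLES = {
    "Material": ["aluminum", "glass", "metal", "tin"],
    "Contents": ["food", "beverage", "trash", "water", "milk", "fuel"],
    "Size": ["3 quart"],
    "Function": ["shipping", "storage"],
}


def analyse(modifier):
    """Return (frame element, qualia analysis) for '<modifier> container', or None."""
    for frame_element, modifiers in FE_EXAMPLES.items():
        if modifier in modifiers:
            return frame_element, FE_TO_QUALIA[frame_element]
    return None


print(analyse("milk"))
print(analyse("storage"))
```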

30
Complex Nominals/Lexical Constructions in a
multilingual context
2nd
  • describe vs. list?
  • if a compound noun is clearly lexicalized, it's
    simply one of the words in L1
  • but if it is an instance of some productive
    word-formation rule, we should describe it
  • both describe & list
  • list explicitly in the lexical entry
  • what is idiomatic/idiosyncratic wrt generation,
    for:
  • lexical selection
  • mucca pazza (mad cow) vs. matta
  • prestare attenzione vs. pay attention
  • structural pattern
  • travel agency: agenzia di viaggi
  • marriage bureau: agenzia matrimoniale (di
    matrimonio)
  • real estate agency: agenzia immobiliare

31
In a multilingual context
2nd
  • ...regularities in each language, but they don't
    match
  • For both decoding & encoding, we need both:
  • a linguistic apparatus for interpretation
  • (e.g. to go to a language where it is not a
    MWE:
  • cucchiaino da caffè: for Japanese, it is
    useful to know "used for")
  • lists for idioms, for unpredictable/idiosyncratic
  • Same apparatus to interpret both MWE & regular N
    constructions (similar power of expressiveness):
    general principles of semantic constitution of
    lex. items & their combinatorics, in terms e.g.
    of frames/qualia/...
  • basic sem. notions
  • a general schema to characterise the problem,
    e.g.
  • frame (qualia) structure of the headN
  • semantic Type of the modifier N
  • allow the headN to impose its interpretation on
    the modification rel.
  • ...

32
Complex nominals: e.g. knife (coltello) triggers
2nd
  • a cutting frame (FrameNet)
  • specific SIMPLE dimensions of meaning
  • extensively evaluate whether qualia roles
    (already) encoded in SIMPLE correspond to what is
    necessary to interpret N-N modification
    relations
  • SIMPLE Extended Qualia structure
  • for the interpretation of the semantic relation
    betw. Ns
  • (internal relational structure of MWE)
  • butcher's knife (coltello da macellaio) → TELIC
    (used_by) Y: Human → PPda
  • plastic knife (coltello di plastica) →
    CONST (made_of) X: Material → PPdi
  • table knife (coltello da tavola) → TELIC
    (used_in) Z: Location → PPda
  • hunting knife (coltello da caccia) → TELIC
    (used_in_activity) E: Activity → PPda
  • piatto di legno → CONST (made_of) X: Material
    → PPdi
  • piatto di pasta → CONST (contains) X: Food → PPdi
PP disambig.
33
In SIMPLE: possible extension
2nd
  • Deverbal nominalisation
  • noun murder (uccisione, delitto, omicidio:
    different sem. pref.)
  • → PPdi
  • → PPda_parte_di, di
  • verb murder (uccidere)
  • → subjNP
  • → objNP
  • PRED: MURDER (uccidere)
  • ARG1: agent, Hum/Anim?
  • ARG2: patient, Hum/Anim?
  • MOD1: instr, Weapon
  • MOD2: means, Action
  • MOD3: ......
  • instr: PPcon, Weapon (knife m., con coltello)
  • means: PPper, Action (strangulation m., per
    strangolamento)
  • loc: PP (p_loc, di), Location (Kent State murders,
    nel ...)
  • time: PP (p_time, di), Time (1983 murders, del 1983)
As if it were a Situation
34
Monolingual Linguistic Representation
Strategy
  • consider as the starting point for MILE the
    edited union of the basic notions represented in
    the existing syntactic/semantic lexicons (their
    models)
  • evaluate their notions wrt EAGLES recommendations
    for syntax and semantics
  • evaluate their usefulness & adequacy for
    multilingual tasks
  • evaluate integrability of their notions in a
    unitary MILE
  • look for deficient areas, e.g. MWE
  • ...

To be decided: should ISLE reach a consensus at
the level of the types of information only, or
also at the level of their token values? ...
different answers for diff. notions
35
The Multilingual ISLE Lexical Entry
(MILE)
  • General methodological principles (from EAGLES)
  • Basic requirements for the MILE
  • Discover and list the (maximal) set of basic
    notions needed to describe the MILE (up to which
    level is standardisation feasible?)
  • Granularity
  • The leading principle for the design of the MILE:
    the edited union of existing lexicons/models
    (redundancy is not a problem)
  • Modular and layered: various degrees of
    specification possible
  • Allow for underspecification (& hierarchical
    structure)

36
The MILE
  • Main features
  • factor out primitive units of lexical information
  • explicit representation of information to be
    targeted by multilingual NLP tools
  • rely on lexical analyses with the highest degree
    of inter-theoretical agreement
  • avoid framework-specific representational
    solutions
  • open to different paradigms of multilinguality
  • oriented to the creation of large-scale lexical
    databases

37
MILE
  • Objective: definition of the MILE
  • as a meta-entry to act as a common format for
    resource sharing and integration/architecture for
    lexical data encoding
  • → its basic notions
  • → general architecture
  • formalized as an entity-rel. model (XML, RDF,
    etc.)
  • with a tool to support it
  • open to task- & system-dependent parameterisation

38
Agreed Principles
  • MILE builds on the monolingual entry & expands it
  • MILE incorporates previous EAGLES recommendations
  • is the complete entry
  • adopt as starting point the PAROLE/SIMPLE DTD
  • to be revised, augmented, ...
  • We consider 2 broad categories of
    applications
  • MT
  • CLIR (linking module may be simpler/ontology
    based)
  • (label info types wrt application)

39
Modularity in MILE
  • Advantages
  • Flexibility of representation
  • Easy to customise and update
  • Easy integration of existing resources
  • High versatility towards different applications
  • Modularity in at least three respects:
  • in the macrostructure and general architecture of
    the MILE
  • in the microstructure of the MILE
  • monolingual linguistic representation (previous
    EAGLES, revised/updated)
  • collocational/corpus-driven information (new)
  • multilingual apparatus (e.g. transfer conditions
    and actions, interlingua) (new)
  • in the specific microstructure of the MILE
    word-sense

40
Modularity in MILE
A. MILE Macrostructure
C. Word-Sense Microstructure
MILE
B. MILE Microstructure
41
The MILE Architecture: Monolingual Lexical
Description
  • three independent and yet linked layers
    characterising the MILE in a source language
  • possibly corresponds to the typology of
    information contained in major existing lexicons,
    such as PAROLE-SIMPLE, (Euro)WordNet, COMLEX,
    FrameNet, etc.
  • simple and complex lexical unit (to account for
    MWEs)
  • various degrees of granularity of lexical units
    representation

semantic layer
correspondence conditions
syntactic layer
morphological layer
42
The MILE Architecture: Multilingual Layer
  • acts as an (independent) interface layer between
    monolingual lexicons

multilingual layer
semantic layer
correspondence conditions
syntactic layer
Lexicon 1
Lexicon 2
morphological layer
43
The MILE Multilingual Layer (NEW)
  • Correspondences can be established between
    different types of linguistic objects (strings,
    syntactic descriptions, semantic elements,
    predicates, etc.)
  • Transfer tests and actions to target various
    types of lexical information in the monolingual
    layers
  • constrain syntactic positions and their fillers
  • lexicalize syntactic positions
  • add positions or arguments
  • add new features to define more fine-grained
    sense distinctions relevant at the multilingual
    level
  • restructure argument configurations
  • collocational information
  • ...
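
A transfer test paired with an action can be sketched as a rule object applied on top of two monolingual entries. The example reuses "prestare attenzione" / "pay attention" from earlier in the deck. The `Correspondence` class, the `translate` function and the dictionary-based context are all assumptions of this sketch, not the MILE specification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Correspondence:
    source: str                     # source-language lexical unit
    target: str                     # target unit chosen when the test succeeds
    test: Callable[[dict], bool]    # transfer test on the source context
    action: Callable[[dict], dict]  # transfer action applied to the target side


def translate(unit, context, rules, default):
    """Apply the first rule whose test succeeds, else use the default equivalent."""
    for rule in rules:
        if rule.source == unit and rule.test(context):
            return rule.action({"verb": rule.target, **context})
    return {"verb": default, **context}


rules = [
    Correspondence(
        source="prestare",
        target="pay",
        # Test: constrain a syntactic position and its filler
        test=lambda ctx: ctx.get("object") == "attenzione",
        # Action: lexicalize that position on the target side
        action=lambda tgt: {**tgt, "object": "attention"},
    ),
]

print(translate("prestare", {"object": "attenzione"}, rules, default="lend"))
print(translate("prestare", {"object": "libro"}, rules, default="lend"))
```

Keeping the rules outside both monolingual entries reflects the idea of an independent interface layer between lexicons.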

44
Paths to Discover the Basic Notions of MILE
  • clues in dictionaries to decide on target
    equivalent
  • guidelines for lexicographers
  • clues (to disambiguate/translate) in corpus
    concordances
  • lexical requirements from various types of
    transfer conditions and actions in MT systems
  • lexical requirements from interlingua-based
    systems

45
  • Organisational Proposal: division of labour
  • Highlighted some hot issues & assigned tasks
  • sense indicators (EU)
  • selection preferences (EU)
  • lexicographic relevance (EU)
  • argument structure (US)
  • MWE (EU & US)
  • collocations & parallel corpora (US)
  • modifiers (EU)
  • semantic relations (EU)
  • transfer conditions (EU & US)
  • collocational patterns (US)
  • ontology (US)
  • metaphors (EU)
  • interlingua requirements (US)
  • spoken lexicon (EU)
  • meta-representation (US & EU)
  • ...

46
Organisational Proposal: The tasks will lead to
  • an in-depth analysis of each area aiming at
    identifying
  • the most stable solutions adopted in the
    community
  • linguistic specifications and criteria
  • possible representational solutions, their
    compatibility, etc.
  • evaluation of their respective weight/importance
    in a multilingual lexicon (towards a layered
    approach to recommendations)
  • open issues and current boundaries of the
    state-of-the-art (which cannot be standardised
    yet)
  • model limitations through creation of a sample
    dictionary
  • see how the various pieces fit together & can be
    merged in a unified proposal
  • evaluate if we can combine in a hybrid
    super-model the transfer & interlingua approaches

47
Information Types examples
48
CLWG Ongoing Activities
  • to prepare a preliminary proposal of the MILE
  • existing models for lexical representation and
    data interchange (Genelex, Olif, etc.) are
    explored
  • model limitations and expressive power are tested
    through creation of sample entries in a few
    languages
  • groups at work
  • lexical description and information types of
    relevant info
  • lexicographic exploration: systematic summary &
    classification of types of transfer tests (also
    extracted from MRDs)
  • multilingual correspondences
  • lexical data modeling: format & representation
    issues
  • tool development

49
Representation issues
  • Working with GENELEX, lexicon development work is
    (can be) affected by
  • impossibility (or difficulty) of defining
    abstract and general classes or types of objects
  • lack of inheritance mechanisms
  • lack of default expression and default rewriting
    mechanisms
  • Cf. Lexical templates in SIMPLE
  • not included in the GENELEX data-structure
  • implemented in the editing sw. tool
  • very useful to capture relevant lexical
    generalizations, enhance consistency in encoding,
    speed up lexicographers' work, etc.

50
CLWG Ongoing Activity
MILE Lexical Objects Formal Specifications
MILE Lexical Entry Formal Specifications
MILE Shared Lexical Objects
User Defined Lexical Objects
Monolingual & Multilingual Lexicons
51
  • MILE Repository of Shared Lexical Objects
  • Basic syntactic constructions (e.g. transitive,
    etc.)
  • (Micro-)semantic objects (e.g. features,
    relations)
  • (Macro-)semantic objects (e.g. lexical templates)
  • Multilingual constructions (e.g. basic transfer
    conditions and actions)

MILE Shared Lexical Objects
Simplify using MILE
  • - New Lexical objects defined by the User
    according to the common MILE formal
    data-structure specification.
  • Sub-types of the Shared MILE Objects
  • Possibly enriched with metadata defining their
    semantics and usage

User-Defined Lexical Objects
Monolingual & Multilingual Lexicons
  • Lexical entries obtained by referring to
    various lexical objects (both Shared and
    User-defined)
  • The MILE lexical entry model specifies how
    lexical objects can be combined to achieve the
    proper lexical representation
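
The three-part picture (a shared repository, user-defined sub-types, entries that only refer to objects) can be sketched as follows. The object identifiers (`syn:transitive`, `tmpl:Instrument`, ...) and both helper functions are hypothetical; the slides describe the idea, not this interface.

```python
# Repository of shared lexical objects (identifiers are illustrative)
SHARED = {
    "syn:transitive": {"kind": "syntactic construction"},
    "sem:Used_for": {"kind": "semantic relation"},
    "tmpl:Instrument": {"kind": "lexical template"},
}


def define_user_object(registry, name, subtype_of, **metadata):
    """Register a user-defined object as a sub-type of a shared object."""
    if subtype_of not in SHARED:
        raise KeyError(f"unknown shared object: {subtype_of}")
    registry[name] = {**SHARED[subtype_of], "subtype_of": subtype_of,
                      "metadata": metadata}
    return registry[name]


def build_entry(lemma, object_refs, user_objects):
    """Build an entry that refers to lexical objects; reject dangling references."""
    known = {**SHARED, **user_objects}
    missing = [ref for ref in object_refs if ref not in known]
    if missing:
        raise KeyError(f"undefined lexical objects: {missing}")
    return {"lemma": lemma, "objects": list(object_refs)}


user_objects = {}
define_user_object(user_objects, "user:Cutting_instrument", "tmpl:Instrument",
                   usage="instruments whose typical use is cutting")
entry = build_entry("coltello", ["user:Cutting_instrument", "sem:Used_for"],
                    user_objects)
print(entry)
```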
52
Involvement of Asian Languages
  • participation in last meetings
  • some input from Asia
  • formal EU-Asia cooperation: steps to be put in
    motion

53
Impact & synergies
  • real impact to be evaluated later
  • through the use in applications
  • already: its being a US/EU project
  • the Asian interest
  • synergies now, e.g.
  • PAROLE/SIMPLE (also instantiated in 9 national
    projects): main input
  • EuroWordNet: provides input
  • XMELLT (NSF): provides input
  • OLIF: expects (& provides) input
  • SALT: complementary
  • ENABLER: validation (& expects input)
  • ELSNET: validation
  • SENSEVAL: validation
  • NIMM WG for Metadata for CL (also with the US
    OLAC)
  • ...

54
Target ... Multilingual Content
Management: the
Resources viewpoint
  • The relevance/impact of (good vs. less good) LRs
    for high-quality Cross/Multilingual systems is
    high, even if not easily measurable.
  • Different applications, component technologies,
    and approaches within them need different info
    types (e.g. CLIR or content access systems wrt MT)
  • For each, need to specify (not an easy task):
  • clear lexical/linguistic/conceptual requirements
  • priority info types (which, how encoded, etc.)
  • the respective role of e.g. annotated corpora,
    mono-, bi- & multilingual lexicons (with different
    info types), ontologies, KBs

55
Economic Feasibility: in which (Multilingual)
Resources to invest?
  • Wrt short- vs. medium-term impact
  • Basic, general purpose bi-/multilingual lexicons,
    but to be tuned, adapted to different
    applications
  • need of robust systems able to acquire/tune
  • (multilingual) lexical/linguistic/conceptual
    knowledge,
  • to accompany static basic resources
  • We shouldn't rely only on parallel
    corpora. More advisable to aim at:
  • reliable methods for acquisition &
    use of comparable corpora,
  • accompanied by
  • robust technologies for annotation
    (at different levels: morphosyntactic,
    syntactic/functional, semantic, ...), and by
  • a shared set of (text) annotation schemata

56
Target ... Multilingual
Knowledge Management:
Technical Feasibility
  • Prerequisite: is a commonly agreed text/lexicon
    annotation protocol, also for the
    semantic/conceptual level, an achievable goal
    (to be able to automatically establish links
    among different languages)?
  • Yes, at the lexical level
  • More complex, for corpus annotation?

EAGLES/ISLE
57
Content for practical use: Gap betw. Resources
and Systems?
  • If we had real-size lexicons with very
    fine-grained semantic/conceptual info, would
    there be systems (non ad-hoc toy systems) able to
    use them?
  • A vicious circle between
  • i) lack of suitable, large-size and knowledge
    intensive, resources (lexicons and corpora, with
    many different types of syntactic and semantic
    information encoded), and
  • ii) systems' ability to use them effectively
  • The two targets should be pursued in parallel,
  • should closely interact with each other, and
  • be gradually integrated