XMELLT - PowerPoint PPT Presentation

About This Presentation
Title:

XMELLT

Description:

Cross-lingual Multi-word Expression Lexicons for Language Technology ... word expression lexicon incorporating both morpho-syntactic and semantic information ...

Slides: 18
Provided by: drewfn
Tags: xmellt | lexicon

Transcript and Presenter's Notes

Title: XMELLT


1
XMELLT
  • Cross-lingual Multi-word Expression Lexicons
  • for Language Technology
  • Multilingual Information Access and Management
  • International Research Co-operation

Nancy Ide, Department of Computer Science, Vassar College
2
Participants
  • Department of Computer Science, Vassar College
  • International Computer Science Institute,
    University of California, Berkeley
  • Department of Computer Science, New York
    University
  • Computing Research Laboratory, New Mexico State
    University

3
Framework
  • Planning project
  • one-year time frame
  • Originally submitted as a joint NSF-EU project
    with additional European partners
  • Istituto di Linguistica Computazionale, CNR, Pisa
  • Institut für Maschinelle Sprachverarbeitung,
    Stuttgart
  • LexiQuest, Paris

4
Overall goal
  • define a core international infrastructure to
    support the creation of a multi-lingual
    multi-word expression lexicon incorporating both
    morpho-syntactic and semantic information

5
Specific aims
  • determine the type and dimensions of information
    to serve the needs of critical NLP applications
  • specify an overall architecture for a joint
    software and lingware development project

6
Aims...
  • Explore the possibilities for recognizing and
    acquiring multi-word lexical units from corpora
    by means of partial parsing, statistics, etc.
  • Outline a collaborative project to acquire and
    represent multi-word lexical entries for multiple
    languages
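
One statistical route to the corpus-based acquisition mentioned above is to score adjacent word pairs by pointwise mutual information (PMI). The following is a minimal sketch, not the project's method: the function name `pmi_bigrams`, the `min_count` threshold, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; min_count filters out pairs
    that are too rare for the estimate to be meaningful.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the
        # probability expected if the two words were independent.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs are the strongest multi-word candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus (invented for this example).
corpus = (
    "the prime minister took a walk and the prime minister "
    "took a decision about the walk"
).split()
candidates = pmi_bigrams(corpus)
```

Pairs that co-occur more often than their individual frequencies predict score higher; in practice such scores would likely be combined with partial parsing to filter candidates by syntactic pattern.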

7
Motivation
  • Multi-word constructions are extremely frequent
    in language
  • 30% of the lexical stock
  • Existing resources do not adequately treat
    multi-word expressions

8
Limitations
  • constructed for a particular system or application
  • incorporate tailored information (e.g., primarily
    syntax with little semantics)
  • not reusable
  • most devoted to a single language and/or approach

9
Limitations...
  • not flexible or expandable to multiple languages
  • MT systems' lexicons are typically little more
    than "translation memories"
  • No interface among single-word entries,
    multi-word entries, syntax, and semantics

10
XMELLT Approach
  • Broad view of multi-word expressions
  • idioms, compounds, collocations, co-occurrence
    patterns
  • focus on linking of individual language lexicons
  • individual words and multi-word expressions
  • different types of multi-word expressions
  • e.g., English noun-noun vs Romance noun-PP

11
Considerations
  • internal variation
  • sub-categorization properties
  • idiosyncratic constraints on inflection
  • meaning (non-)compositionality

12
Encoding Model
  • Compatible and integrated with existing and de
    facto standards
  • e.g., EAGLES, PAROLE/SIMPLE, NOMLEX

13
Activities
  • Assessment of existing lexical resources for
    multi-word expressions
  • Delivery of survey

14
Activities...
  • Creation of a small set of sample entries
  • add lexical information on support verb
    constructions to 50 nouns drawn from NOMLEX for
    English, Italian, German, and French
  • create lexical entries for 50 N-N English
    constructs from the PAROLE/SIMPLE lexicons and
    corresponding constructs in Italian, German, and
    French

15
Activities...
  • Develop preliminary specifications for
    structuring and encoding multi-lingual,
    multi-word expression lexicons
  • required linguistic information
  • harmonized data architecture and encoding format
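
As a rough illustration of what a harmonized entry might carry, the sketch below models a multi-word entry with morpho-syntactic and semantic fields plus cross-lingual links. The class `MWEEntry`, its field names, the `link` helper, and the sample entries are hypothetical; they do not reproduce the EAGLES, PAROLE/SIMPLE, or NOMLEX formats.

```python
from dataclasses import dataclass, field


@dataclass
class MWEEntry:
    """Hypothetical multi-word expression entry (illustrative only)."""
    lemma: str
    language: str          # language code, e.g. "en"
    pattern: str           # syntactic pattern, e.g. "N-N" or "N-PP"
    compositional: bool    # is the meaning derivable from the parts?
    components: list = field(default_factory=list)
    translations: dict = field(default_factory=dict)  # language -> lemmas


def link(a, b):
    """Record a bidirectional cross-lingual link between two entries."""
    a.translations.setdefault(b.language, []).append(b.lemma)
    b.translations.setdefault(a.language, []).append(a.lemma)


# An English noun-noun compound linked to an Italian noun-PP construction.
en = MWEEntry("coffee cup", "en", "N-N", True, ["coffee", "cup"])
it = MWEEntry("tazza da caffè", "it", "N-PP", True, ["tazza", "da", "caffè"])
link(en, it)
```

Keeping the syntactic pattern on each entry, rather than on the link, lets an English noun-noun compound point to a Romance noun-PP equivalent without forcing both into one shape.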

16
Activities...
  • Exploration of techniques for automatic
    acquisition
  • Months 1-6: Survey of acquisition techniques,
    typology of MWE
  • Months 7-12: Design of architecture for MWE
    acquisition

17
Project information
  • Start date: June (?)
  • Web site
  • Contact

http://www.cs.vassar.edu/ide/XMELLT.html
Nancy Ide (PI), Department of Computer Science, Vassar College, ide@cs.vassar.edu