Multilingual Lexical Resources: application requirements

Title: Multilingual Lexical Resources application requirements


1
Multilingual Lexical Resources: application requirements
  • Gr. Thurmair
  • 3-6-2000

2
Outline
  • 1 Multilingual Applications
  • 2 Lexicon Requirements
  • 3 Paradigms of ML resources
  • 4 Synthesis proposal
  • 5 Challenges and actions

3
ML applications (1)
  • 1. Cross-Lingual Information Retrieval
  • multilingual Concept Net
  • words with representation of senses
  • words with links to concepts
  • ontology relations between concepts
  • multilingual equivalents between concepts
  • > ML Concept Net, EuroWordNet
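The role of such a concept net in cross-lingual retrieval can be illustrated with a minimal sketch. It assumes a toy layout in which each (language, word) pair links to one or more concepts, and a query word is expanded to every target-language word that shares a concept with it. The concept identifiers and sample words below are hypothetical and are not taken from EuroWordNet.

```python
# Toy multilingual concept net. All identifiers and words are illustrative
# assumptions, not data from an existing resource.
WORD_TO_CONCEPTS = {
    ("en", "bank"): ["c_bank_institution", "c_bank_river"],
    ("de", "Bank"): ["c_bank_institution", "c_bench"],
    ("de", "Ufer"): ["c_bank_river"],
    ("en", "bench"): ["c_bench"],
}


def expand_query(word, source_lang, target_lang):
    """Return target-language words sharing at least one concept with `word`."""
    concepts = set(WORD_TO_CONCEPTS.get((source_lang, word), []))
    hits = []
    for (lang, candidate), linked in WORD_TO_CONCEPTS.items():
        if lang == target_lang and concepts & set(linked):
            hits.append(candidate)
    return sorted(hits)
```

Because every sense of the query word contributes candidates, an expansion of this kind tends to favor recall over precision.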

4
ML Applications (2)
  • 2. ML Information Extraction
  • Domain objects with special semantics
  • person - river - company designator - drug
  • Application specific semantic frames
  • (immigrate -> agent <person>)
  • Ontology (domain model) for objects
  • multilingual Links to concept nodes
  • (concepticon)

5
ML Applications (3)
  • 3. Multilingual Key Topic identification and
    classification
  • key topic identifier (key terms / key objects)
  • key topic translation
  • standard terminology lookup tool

6
ML Applications (4)
  • 4. Machine Translation
  • full morphosyntactic coding of mono entries
  • usually entry-based, not reading-based
  • transfer lexicons
  • bilingual, directed
  • linguistic tests and actions
  • sense disambiguation as part of
    transfer selection (subject area codes)
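Transfer selection by subject area code could look roughly like the sketch below. It reuses the kill example from the lexicon-conversion slide of this deck (töten tagged gv, abbrechen tagged dp). The deck does not expand these codes; treating "gv" as the default area is an assumption, and the function name and data layout are hypothetical.

```python
# Directed bilingual transfer lexicon (en -> de). Each translation carries a
# subject area code; "gv" and "dp" are copied from the kill example in this
# deck, everything else about the layout is an assumption.
TRANSFER_EN_DE = {
    "kill": [("töten", "gv"), ("abbrechen", "dp")],
}


def select_transfer(word, subject_area, lexicon=TRANSFER_EN_DE, default_area="gv"):
    """Pick the translation matching the subject area, else the default one."""
    candidates = lexicon.get(word, [])
    # First pass: exact subject area match.
    for translation, area in candidates:
        if area == subject_area:
            return translation
    # Second pass: fall back to the assumed default area.
    for translation, area in candidates:
        if area == default_area:
            return translation
    return None
```

The sense choice here happens inside the transfer step, which matches the entry-based (not reading-based) organization described on this slide.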

7
Requirements for ML lexicons
  • ML lexicons must support
  • some monolingual descriptions
  • morphosyntax, word senses, pragmatics
    (application-specific issues)
  • monolingual hierarchies
  • conceptual hierarchies / ontologies
  • subject area classifications
  • multilingual links between the monos

8
Paradigms of ML resources
  • 2 Paradigms are implemented
  • word-to-word links in translation technology
  • concept-to-concept links in ML concept nets

9
Paradigms of ML resources
  • Concept-to-concept links
  • are more general, support recall
  • have broader language coverage
  • are not sufficient for translation
  • Word-to-word links
  • are more specific, cover meaning nuances
  • support precision
  • cost more effort to maintain
  • empty transfers must not occur

10
Synthesis Proposal (1)
  • Combine word and concept links
  • Base the resource on word-to-word links
  • Promote some words to concepts
  • for concepts, word-to-word links are identical
    to concept-to-concept links
  • for synset members, use
  • word-to-word transfer if appropriate and
    available
  • conceptual transfer otherwise
  • give attributes to links
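The lookup order proposed here (word-to-word transfer if available, conceptual transfer otherwise) can be sketched as follows. The dictionaries, the concept label KILL, and the synset member "slay" are hypothetical illustration data; only the fallback order comes from the slide.

```python
# Hypothetical toy data: word-level links are sparse, concept links are broad.
WORD_LINKS = {("en", "de", "kill"): "töten"}
WORD_TO_CONCEPT = {("en", "kill"): "KILL", ("en", "slay"): "KILL"}
CONCEPT_TO_WORD = {("de", "KILL"): "töten"}


def translate(word, src, tgt):
    """Return (translation, link_kind), preferring word-to-word transfer."""
    direct = WORD_LINKS.get((src, tgt, word))
    if direct is not None:
        return direct, "word"
    # No word-level link: go through the concept the word belongs to.
    concept = WORD_TO_CONCEPT.get((src, word))
    if concept is not None:
        fallback = CONCEPT_TO_WORD.get((tgt, concept))
        if fallback is not None:
            return fallback, "concept"
    return None, "none"
```

Returning the kind of link alongside the translation is one simple way to "give attributes to links": a caller can tell a specific transfer from a conceptual default.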

11
Synthesis Proposal (2)
  • Keep synset relationships for IR applications
  • Keep transfer relations for MT applications

12
Synthesis Proposal (3)
  • Consequences
  • we save all 1:1 concept - synset links
  • (this is the majority) (kill -> <kill>)
  • all concept nodes must also be words
  • (can of course be multiwords)
  • we translate on both general and specific level
  • we can offer default translations
  • in cases no word-to-word links are available

13
Consequences (1)
  • Design a database to support this
  • monolingual entries with annotations
  • transfer links for these entries
  • multilingual, directed
  • cross-reference links between entries
  • on word level: abb_for, forbidden_for
  • between words and concepts (synset)
  • between concepts (hierarchical)
  • attributes for the links

14
Consequences (1)
  • Entry
  • canonical_form: char 40
  • language: int
  • part_of_speech: char
  • reading_number: char
  • subject_area: char
  • concept_type: word - term - concept
  • TransferLink
  • EQ-type: full - partial - none
  • CrossrefLink
  • Linkstatus: t-t, t-c, c-c
  • Linktype: hyponym, ....
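The three record types on this slide could be modeled along these lines. This is a sketch under assumptions: the field names follow the slide, but the `source` and `target` fields on the links, the integer language codes (1 for English, 2 for German), and the validation rules are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ConceptType(Enum):
    WORD = "word"
    TERM = "term"
    CONCEPT = "concept"


class EqType(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


@dataclass(frozen=True)
class Entry:
    canonical_form: str        # slide: char 40
    language: int              # slide: int (code-to-language mapping assumed)
    part_of_speech: str
    reading_number: str
    subject_area: str
    concept_type: ConceptType

    def __post_init__(self):
        if len(self.canonical_form) > 40:
            raise ValueError("canonical_form exceeds 40 characters")


@dataclass(frozen=True)
class TransferLink:
    source: Entry              # assumed field: directed link start
    target: Entry              # assumed field: directed link end
    eq_type: EqType


@dataclass(frozen=True)
class CrossrefLink:
    source: Entry              # assumed field
    target: Entry              # assumed field
    link_status: str           # slide: t-t, t-c, c-c
    link_type: str             # slide: hyponym, ....

    def __post_init__(self):
        if self.link_status not in {"t-t", "t-c", "c-c"}:
            raise ValueError(f"unknown link status: {self.link_status}")
```

A usage example with hypothetical values: `TransferLink(Entry("kill", 1, "verb", "1", "gv", ConceptType.WORD), Entry("töten", 2, "verb", "1", "gv", ConceptType.WORD), EqType.FULL)`.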

15
Consequences (2)
  • Convert existing resources
  • MT lexicons are entry-based
  • entries can have several readings
  • entries have several translations
  • kill -> töten (gv)
  • -> abbrechen (dp)
  • Concept nets / term bases are concept-based
  • kill_1 -> töten
  • kill_2 -> abbrechen_1
  • interrupt_3 <- abbrechen_2

16
Consequences (2)
  • CN -> MT: merge readings
  • MT -> CN: create readings (SL and TL!)
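The two conversion directions can be sketched as below, using the kill example from the previous slide. The reading notation word_N follows that slide; the function names are hypothetical, and the numbering scheme in `mt_to_cn` is a naive assumption (one new source reading per translation), whereas a real conversion would need lexicographic judgment about which translations are separate senses.

```python
from collections import defaultdict


def base_form(reading):
    """Strip a trailing reading number: 'kill_2' -> 'kill', 'töten' -> 'töten'."""
    head, sep, tail = reading.rpartition("_")
    return head if sep and tail.isdigit() else reading


def cn_to_mt(concept_links):
    """CN -> MT: merge readings into one entry with several translations."""
    merged = defaultdict(list)
    for source_reading, target_reading in concept_links:
        source = base_form(source_reading)
        target = base_form(target_reading)
        if target not in merged[source]:
            merged[source].append(target)
    return dict(merged)


def mt_to_cn(entries):
    """MT -> CN: create numbered readings on source and target side."""
    target_counts = defaultdict(int)
    links = []
    for source, translations in entries.items():
        for number, target in enumerate(translations, start=1):
            # Each use of a target word gets its own target-side reading.
            target_counts[target] += 1
            links.append((f"{source}_{number}", f"{target}_{target_counts[target]}"))
    return links
```

For instance, `cn_to_mt([("kill_1", "töten"), ("kill_2", "abbrechen_1")])` collapses both readings into a single entry for kill with two translations.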

17
Consequences for MT systems
  • Lexical resources
  • same mono lexicons used in many language
    pairs (impact on other languages?)
  • Combinatorics
  • ambiguous words cause combinatoric problems in
    parsing (impact on performance?)
  • Disambiguation Strategies
  • disambiguation in analysis, not transfer (impact
    on output?)

18
Thank you for your attention