1
From term extraction to terminology compilation
The challenge for computational terminology in an
era of unlimited corpora availability
  • Kyo Kageura
  • Library and Information Science Course
  • Graduate School of Education
  • University of Tokyo
  • Oct 9, 2008
  • TAMA 2008

2
Background (1)
  • Automatic term recognition (ATR): is it actually used?
  • yes, in its basic form, by terminographers and
    lexicographers
  • yes, in closed situations such as patent
    translations, etc.
  • no, not by most users such as online volunteer
    translators.
  • Which ATR method/system?
  • ease of use, good maintenance, etc.
  • recall, precision and F-measure are unimportant
    beyond a certain level (these measures are
    sketched right below).
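As a reminder of what these evaluation figures measure, here is a minimal sketch (mine, not from the slides) that scores an ATR system's output against a gold-standard term list; both term lists are invented for illustration.

```python
# Minimal sketch: precision, recall and F-measure for an ATR output
# against a gold-standard term list (both sets are invented examples).
gold = {"term extraction", "terminology", "corpus", "translation memory"}
extracted = {"term extraction", "terminology", "corpus availability", "corpus"}

tp = len(gold & extracted)            # correctly extracted terms
precision = tp / len(extracted)       # share of the output that is correct
recall = tp / len(gold)               # share of gold terms that were found
f_measure = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

The point of the slide is that, once these numbers are good enough, the remaining questions (ease of use, maintenance) dominate.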

3
Background (2)
  • End users' (or more precisely, my) situation
  • type of texts to be translated is open
  • potential range of reference is also open
  • term lookup is frequently needed
  • refer to the Web as the universal information
    resource.
  • Results of terminographers' work: are they
    available?
  • yes, for in-house use
  • no, in most cases, for general users, including
    online/volunteer translators and other
    non-technical translators (myself!).
  • Do these translators (me!) need ATR-type help?
  • yes, if it constitutes something equivalent to
    terminological dictionaries or supplements
    existing terminological dictionaries
  • not term extraction, but terminology compilation

4
Three possible models of reference tools
  • Maximal model
  • you can stop looking for information there
    (libraries)
  • Quality model
  • you can believe in what it says (high-quality
    dictionaries)
  • Singularity model
  • you have nothing better/else (Google)
  • These models are intimately related to one
    another
  • a good reference tool represents at least two of
    these models
  • completely different from recall, precision and
    F-measure.
  • you need to make decisions by yourself beyond that
    point.

5
Requirements of each model
  • Maximal model
  • entry: what you can find by Google should be
    available
  • translation: what you can find by Google should
    be available
  • examples: what you can find by Google should be
    available.
  • Quality model
  • entry: entries should be coherent as a set and
    match users' expectations (e.g. what you tend to
    check by association should be there)
  • translation: give basic translations consistently,
    from which users can derive possible extensions
    of translations
  • examples: give a basic range of examples.
  • Singularity model
  • no general internal requirements

6
Technologies for the maximal model
  • Entries: universal term crawler
  • collect all the technical terms existing on the
    web
  • but what is a term?
  • collect all terms and non-terms on the web (a
    rough sketch follows this slide).
  • Translations: maximum range of candidates
  • still needs ordering by goodness; converges to
    the technology for the quality model
  • Examples: maximum range of candidates
  • still needs ordering by goodness; converges to
    the technology for the quality model
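The slides do not specify how the universal term crawler would recognize candidate terms; as an assumption of mine (not the author's method), a common baseline is to harvest word n-grams that do not start or end with a stopword and rank them by frequency.

```python
import re
from collections import Counter

# Rough sketch of a term-candidate harvester (an assumed baseline, not the
# author's system): collect word n-grams that do not start or end with a
# stopword and rank them by frequency. A real crawler would add POS
# patterns, termhood/unithood scores, language identification, etc.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "for", "to", "is", "are"}

def candidate_terms(text, max_len=3):
    tokens = re.findall(r"[a-z][a-z-]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # skip n-grams that start or end with a stopword
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts

page = ("Automatic term recognition extracts candidate terms from corpora. "
        "Candidate terms are ranked before terminology compilation.")
print(candidate_terms(page).most_common(5))
```

As the slide notes, such a harvester deliberately collects non-terms as well; deciding what counts as a term is pushed to later ranking and validation stages.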

7
Technologies for the quality model
  • Entries
  • stratify texts in accordance with their register
    and text types (an illustrative sketch follows
    this slide)
  • collect the maximal set of terms from the relevant
    set of texts
  • make explicit the coherency of entries as a set.
  • Translations
  • make correspondences between source and target
    text classes
  • make explicit the coherency of translations as a
    set.
  • Examples
  • something equivalent to entries and translations.
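The slides leave open how texts would be stratified by register and text type; purely as an illustration (my own crude proxy, not the proposed method), one could bucket documents by surface features such as sentence length and the share of domain vocabulary.

```python
# Illustrative sketch only: bucket documents into rough register strata
# using crude surface features. The thresholds and vocabulary are invented.
def register_stratum(text, domain_vocab):
    sentences = [s for s in text.split(".") if s.strip()]
    words = [w.strip(".,;:") for w in text.lower().split()]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    domain_share = sum(w in domain_vocab for w in words) / max(len(words), 1)
    if domain_share > 0.15 and avg_sentence_len > 15:
        return "specialized"
    if domain_share > 0.05:
        return "semi-technical"
    return "general"

vocab = {"terminology", "corpus", "extraction", "lexicon"}
sample = "Terminology extraction from a corpus builds a lexicon for translators."
print(register_stratum(sample, vocab))   # -> semi-technical
```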

8
Technologies for the singularity model
  • Tear down and/or hide other sources
  • Run a huge propaganda campaign
  • Blackmail
  • Block other people's research and development
  • Bundle the system with hardware.

9
The overall picture
[Slide diagram connecting the two models. Components shown: universal
crawler of entries, translations and examples; maximal dictionary
(maximal model); relevant text collector; maximal/coherent sets of
relevant texts; entry, translation and example extractor; preference
assigner; consistency validator; quality dictionary (quality model).]
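Read as a processing pipeline, the diagram could be wired roughly as follows; every function below is a stub with a placeholder name of mine, not an existing component, and only marks where each box of the diagram would sit.

```python
# Hypothetical skeleton of the pipeline on this slide. All names are
# placeholders of mine; each stub stands in for one box of the diagram.
def universal_crawler(seed_urls):
    """Maximal model: crawl entries, translations and examples very widely."""
    return [{"entry": "terminology", "translation": "terminologie",
             "example": "terminology compilation for translators"}]

def relevant_text_collector(items):
    """Quality model: keep only material from relevant, stratified texts."""
    return items

def preference_assigner(items):
    """Order candidate translations and examples by goodness."""
    return sorted(items, key=lambda it: it["entry"])

def consistency_validator(items):
    """Check that entries and translations are coherent as a set."""
    return items

def build_dictionaries(seed_urls):
    crawled = universal_crawler(seed_urls)
    maximal_dictionary = crawled                 # maximal dictionary keeps everything
    relevant = relevant_text_collector(crawled)
    quality_dictionary = consistency_validator(preference_assigner(relevant))
    return maximal_dictionary, quality_dictionary

print(build_dictionaries(["https://example.org"]))
```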
10
Research agenda: theoretical issues
  • How many terms are there on the web (in source
    and target language)?
  • What is the consistency of entries?
  • What is a good example?
  • How is their order defined?
  • What is a relevant set of texts for term
    extraction?
  • How are their preferences stratified?

11
Research agenda: technical issues
  • How can all the terms be identified and
    extracted?
  • How can consistencies be measured? (one possible
    reading is sketched after this slide)
  • How can examples be clustered and classified?
  • How can these examples be ordered?
  • How can features be identified for relevant text
    classification?
  • How can these texts be further classified and
    ordered?
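As one possible reading of the consistency question (my interpretation, not the author's proposal), the coherence of an entry list could be approximated by checking how often the head word of a multiword term is itself an entry, i.e. the "check by association" idea from the quality model.

```python
# Assumed illustration of one way to quantify consistency of entries:
# the share of multiword entries whose head word is also an entry.
# The entry list is invented for illustration.
def entry_coherence(entries):
    entry_set = set(entries)
    multiword = [e for e in entries if " " in e]
    if not multiword:
        return 1.0
    covered = sum(1 for e in multiword if e.split()[-1] in entry_set)
    return covered / len(multiword)

entries = ["extraction", "term extraction", "terminology",
           "computational terminology", "reference tool"]
print(f"entry coherence: {entry_coherence(entries):.2f}")   # -> 0.67
```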

12
Last words
  • Do I really intend to pursue these agendas?
  • Yes, if my current funding proposal is accepted
    (if not, I'll still carry out basic research
    related to these agendas, but will not be able to
    build operational real-world systems)
  • What is the prospect of the research?
  • In 3 years, we will make the maximal crawler
    available
  • In 2 years, we will make the relevant text
    collector open
  • I am not sure how the coherency checker can be
    made; maybe with the interaction of human experts,
    a la ontology research, which I do not like that
    much...

13
  • Merci
  • Thanks