1. From term extraction to terminology compilation
The challenge for computational terminology in an era of unlimited corpora availability
- Kyo Kageura
- Library and Information Science Course
- Graduate School of Education
- University of Tokyo
- Oct 9, 2008
- TAMA 2008
2. Background (1)
- Automatic term recognition (ATR): is it actually used?
  - yes, in its basic form, by terminographers and lexicographers
  - yes, in closed settings such as patent translation, etc.
  - no, not by most users, such as online volunteer translators
- Which ATR method/system?
  - ease of use, good maintenance, etc. are what matter
  - recall, precision and f-measure are unimportant beyond a certain level (a small sketch of these measures follows this slide)
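As a side note on that last point: a minimal sketch of how recall, precision and f-measure are computed for an ATR candidate list against a gold-standard term list. The function name and toy data are illustrative, not from any particular ATR system.

```python
# Minimal sketch: evaluating an ATR candidate list against a gold-standard
# term list. All names here are illustrative, not from any real system.

def evaluate_atr(candidates: set[str], gold_terms: set[str]) -> dict[str, float]:
    """Compute precision, recall and F1 for extracted term candidates."""
    true_positives = len(candidates & gold_terms)
    precision = true_positives / len(candidates) if candidates else 0.0
    recall = true_positives / len(gold_terms) if gold_terms else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

print(evaluate_atr({"term extraction", "web", "corpus"},
                   {"term extraction", "corpus", "terminology"}))
# -> precision 0.67, recall 0.67, f1 0.67 (2 of 3 candidates are correct)
```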
3. Background (2)
- End users' (or, more precisely, my) situation
  - the type of texts to be translated is open
  - the potential range of reference is also open
  - term lookup is frequently needed
  - the Web is the universal information resource to refer to
- Are the results of terminographers' work available?
  - yes, for in-house use
  - no, in most cases, for general users, including online/volunteer translators and other non-technical translators (myself!)
- Do these translators (me!) need ATR-type help?
  - yes, if it constitutes something equivalent to terminological dictionaries, or supplements existing terminological dictionaries
  - so: not term extraction, but terminology compilation
4. Three possible models of reference tools
- Maximal model
  - you can stop looking for information there (libraries)
- Quality model
  - you can believe what it says (high-quality dictionaries)
- Singularity model
  - you have nothing better/else (Google)
- These models are intimately related to one another
  - a good reference tool represents at least two of these models
  - this is completely different from recall, precision and f-measure
  - beyond that point, you need to make decisions by yourself
5. Requirements of each model
- Maximal model
  - entries: what you can find via Google should be available
  - translations: what you can find via Google should be available
  - examples: what you can find via Google should be available
- Quality model
  - entries: should be coherent as a set and match users' expectations (e.g. what you tend to check by association should be there)
  - translations: give basic translations consistently, from which users can derive possible extensions of translations
  - examples: give a basic range of examples
- Singularity model
  - no general internal requirements
6. Technologies for the maximal model
- Entries: a universal term crawler
  - collect all the technical terms existing on the web
  - but what is a term?
  - so: collect all terms and non-terms on the web (a candidate-collection sketch follows this slide)
- Translations: the maximal range of candidates
  - still needs ordering by goodness; converges with the technology for the quality model
- Examples: the maximal range of candidates
  - still needs ordering by goodness; converges with the technology for the quality model
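A minimal sketch of the candidate-collection step only: harvesting word n-grams from a fetched page and filtering them with a stoplist. A real universal crawler would add POS patterns, termhood scoring (e.g. C-value) and large-scale fetching; the stoplist, token pattern and sample text below are invented.

```python
# Minimal sketch of the candidate-collection step of a "universal term
# crawler": harvest contiguous word n-grams from a page and filter them
# with a stoplist. All names and data are illustrative assumptions.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "and", "to", "is", "from"}

def candidate_terms(text: str, max_len: int = 3) -> Counter:
    """Return n-gram candidates (1..max_len words) that do not start or
    end with a stopword, together with their frequencies."""
    tokens = re.findall(r"[a-z][a-z-]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts

page = "Automatic term recognition extracts term candidates from text."
print(candidate_terms(page).most_common(5))
```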
7. Technologies for the quality model
- Entries
  - stratify texts in accordance with their register and text type (a stratification sketch follows this slide)
  - collect the maximal set of terms from the relevant set of texts
  - make the coherency of entries as a set explicit
- Translations
  - establish correspondences between source and target text classes
  - make the coherency of translations as a set explicit
- Examples
  - something equivalent to what is done for entries and translations
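A minimal sketch of the stratification step, assuming a small labelled sample of documents is available: a supervised classifier assigns text types, and only the relevant stratum is kept for term extraction. The labels, documents and model choice are placeholders, not a claim about how this should actually be done.

```python
# Minimal sketch: classify fetched texts by register/text type with a
# supervised model, then keep only the stratum relevant for term
# extraction. Labels and documents are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_docs = [
    "we propose a novel method for statistical machine translation",
    "click here to buy cheap flights and hotel deals",
    "the patient presented with acute respiratory symptoms",
    "win free prizes now limited offer",
]
train_labels = ["technical", "other", "technical", "other"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_docs, train_labels)

crawled = ["a parser for dependency grammar is described",
           "subscribe to our newsletter for discounts"]
relevant = [d for d, y in zip(crawled, classifier.predict(crawled))
            if y == "technical"]
print(relevant)  # only the "technical" stratum is kept
```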
8. Technologies for the singularity model
- Tear down and/or hide other sources
- Run a huge propaganda campaign
- Blackmail
- Block other people's research and development
- Bundle the system with hardware
9. The overall picture
[Diagram. Maximal model: a universal crawler of entries, translations and examples feeds a preference assigner, yielding the maximal dictionary. Quality model: a relevant text collector builds maximal/coherent sets of relevant texts; an entry, translation and example extractor plus a consistency validator turn these into the quality dictionary. A rough sketch of this pipeline follows.]
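To make the diagram concrete, here is a rough sketch of the pipeline as plain function composition. Every function is a stub standing in for an entire component; the names mirror the boxes in the slide, not any implemented system.

```python
# Rough sketch of the overall picture as function composition.
# Every function below is a stub for a whole component.

def universal_crawler(seed_urls):
    """Maximal model: harvest entries, translations and examples."""
    return [{"term": "t", "translation": "tr", "example": "ex"}]  # stub

def relevant_text_collector(seed_urls):
    """Quality model: build maximal/coherent sets of relevant texts."""
    return ["relevant document ..."]  # stub

def extractor(texts):
    """Extract entries, translations and examples from relevant texts."""
    return [{"term": "t", "translation": "tr", "example": "ex"}]  # stub

def consistency_validator(records):
    """Keep only records coherent with the rest of the set."""
    return [r for r in records if r["term"]]  # stub filter

def preference_assigner(records):
    """Order candidates by goodness."""
    return sorted(records, key=lambda r: r["term"])  # stub ordering

maximal_dictionary = preference_assigner(universal_crawler(["..."]))
quality_dictionary = consistency_validator(
    extractor(relevant_text_collector(["..."])))
```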
10. Research agenda: theoretical issues
- How many terms are there on the web (in the source and target languages)? (an estimation sketch follows this slide)
- What is the consistency of entries?
- What is a good example?
- How is their order defined?
- What is a relevant set of texts for term extraction?
- How are their preferences stratified?
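One way to attack the first question is population-size estimation from sample frequencies, for instance the Chao1 lower-bound estimator from species-richness statistics. A minimal sketch with invented counts; whether this is the right estimator for web terminology is itself an open question.

```python
# Minimal sketch: estimating how many terms exist in a population from an
# observed sample, using the Chao1 lower-bound estimator. The sample
# frequencies below are invented for illustration.
from collections import Counter

def chao1(term_frequencies: Counter) -> float:
    """Chao1 estimate: S_obs + f1^2 / (2 * f2), where f1/f2 are the
    numbers of terms observed exactly once/twice in the sample."""
    s_obs = len(term_frequencies)
    f1 = sum(1 for c in term_frequencies.values() if c == 1)
    f2 = sum(1 for c in term_frequencies.values() if c == 2)
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2  # bias-corrected variant
    return s_obs + f1 ** 2 / (2 * f2)

sample = Counter({"term a": 5, "term b": 1, "term c": 1, "term d": 2})
print(chao1(sample))  # 4 observed + 2**2 / (2 * 1) = 6.0 terms estimated
```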
11. Research agenda: technical issues
- How can all the terms be identified and extracted?
- How can consistencies be measured?
- How can examples be clustered and classified? (a clustering sketch follows this slide)
- How can these examples be ordered?
- How can features be identified for relevant text classification?
- How can these texts be further classified and ordered?
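A minimal sketch for the clustering and ordering of examples: group example sentences by tf-idf similarity with k-means, then order each cluster by distance to its centroid (most typical first). The sentences and the number of clusters are invented placeholders.

```python
# Minimal sketch: cluster usage examples by tf-idf similarity, then order
# each cluster by closeness to its centroid. All data are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

examples = [
    "the term is defined in the glossary",
    "a glossary lists each term with its definition",
    "the crawler downloads pages from the web",
    "web pages are fetched by the crawler",
]
vectors = TfidfVectorizer().fit_transform(examples)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

for label in range(2):
    members = [i for i, y in enumerate(km.labels_) if y == label]
    # distance of each member to its own centroid: most typical first
    dists = km.transform(vectors[members])[:, label]
    for i in np.argsort(dists):
        print(label, examples[members[i]])
```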
12. Last words
- Do I really intend to pursue these agendas?
  - Yes, if my current funding proposal is accepted (if not, I will still carry out basic research related to these agendas, but will not be able to build operational real-world systems)
- What are the prospects of the research?
  - in 3 years, we will make a maximal crawler available
  - in 2 years, we will make a relevant text collector open
  - I am not sure how the coherency checker can be built; maybe through interaction with human experts? A la ontology research, which I do not like that much...