1 TKE2005 - 7th International Conference on Terminology and Knowledge Engineering
Thesaurus classification and relational structure: the EARTh experience
Fulvio MAZZOCCHI, Paolo PLINI
Thursday, August 18th 2005 - Terminology and Classification
2 Overview
This contribution will present:
- General aspects of environmental terminology
- The CNR project concerning the development of a general environmental thesaurus (EARTh)
- EARTh classification structure and semantic model
- EARTh relational structure
- EARTh terminological content
- SuperThes, the thesaurus management software that has been used
3 Some general considerations on environmental terminology
- In contemporary society the environment is a crucial topic
  - new issues continually arise (e.g., biological pollution)
  - there is a rapid evolution of environmental knowledge
  - new technologies are created
This dynamism of the domain is reflected in the development of environmental terminology.
4 Some general considerations on environmental terminology (2)
- The environment is a multidisciplinary domain.
- Each term can be defined in different ways depending on the context in which it is considered.
- For example, the term benzene:
  - an environmental planner may consider it a pollutant that can enter biogeochemical cycles and potentially damage the environment
  - a biologist may consider its toxicity and the different routes by which it can enter an organism
  - an engineer would consider it a fuel for a combustion engine
  - a chemist may see it as the precursor of a class of chemical compounds
  - etc.
5 Some general considerations on environmental terminology (3)
- There are problems of semantic overlap
  - e.g., environmental conservation, environmental protection, environmental preservation, environmental safeguard
- Biocultural issues
  - The environment may be conceptualised in different ways according to different cultural points of view. There is a strong relationship among language, knowledge and the environment (see, for example, the Terralingua initiatives, http://www.terralingua.org).
6 ECOTerm
The need for systems able to rationalise environmental information management is a much-debated topic. A relevant initiative on environmental terminology is ECOinformatics/ECOTerm (http://ecoinfo.eionet.eu.int/). It was set up to bring together the major providers of environmental terminologies to discuss the status of their terminologies, how they are applying new technologies, and how these resources can be made more valuable to the community through integration and collaboration. It involves several institutions (UNEP, FAO, EEA, US EPA, USGS, JRC, CCLRC, CNR, UBA). Two meetings have been organised, in Geneva and Berlin; the next one should be held in Rome in 2006.
7 CNR involvement in environmental terminology / EARTh genesis
8 The Vision
We are working on a new thesaurus model to be applied in the environmental domain. The thesaurus should:
- be a highly structured and refined tool able to combine a strong conceptual basis with flexibility towards different applications
- represent an updated semantic and terminological map of the environmental domain
- take into account the cultural dimension of knowledge organization
- allow different levels of comprehensibility and applicability for users with different expertise
- be portable to different technological applications.
9 EARTh Architecture
- Matrix semantic structure
  - Vertical structure based on a system of categories
  - Thematic set-up to be developed for local applications
- Thesaurus Relations
  - Differentiation and better semantic expression of relations
  - In particular, the RT transversal structure (thesaurus as semantic connector) will be strengthened
10 EARTh Classification Scheme
The Classification Scheme of EARTh is based on categories. Following a bottom-up perspective, terms can be analysed along a progressive hierarchical scale. Moving up that scale, conceptual features are progressively discarded from the intension, while the extension (the number of things associated with that intension) increases, until the maximum level of generality is reached. The categories represent the top of this vertical structure.
[Figure: EARTh first and second level categories]
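The trade-off between intension and extension described on this slide can be sketched in Python. The feature sets below are hypothetical and only illustrate the principle; they are not taken from EARTh.

```python
# Hypothetical intensions: each term is described by a set of conceptual features.
# The feature names are illustrative assumptions, not EARTh categories.
INTENSIONS = {
    "benzene": frozenset({"substance", "organic", "aromatic", "C6H6"}),
    "toluene": frozenset({"substance", "organic", "aromatic", "C7H8"}),
    "ethanol": frozenset({"substance", "organic", "alcohol"}),
    "water": frozenset({"substance", "inorganic"}),
}


def extension(features):
    """Return, in alphabetical order, the terms whose intension contains every given feature."""
    wanted = frozenset(features)
    return sorted(term for term, traits in INTENSIONS.items() if wanted <= traits)


# Climbing the scale bottom-up: each step discards one feature,
# so the set of terms covered can only stay the same or grow.
ladder = [
    {"substance", "organic", "aromatic"},
    {"substance", "organic"},
    {"substance"},
]
for level in ladder:
    print(sorted(level), "->", extension(level))
```

The last step of the ladder plays the role of a top-level category: it has the poorest intension and the widest extension.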
11 EARTh Classification Scheme: why adopt a categorial approach?
- A categorial approach ensures
  - a stable conceptual basis for knowledge organization
  - a tool to classify concepts starting from their most basic meaning, as referred to by the logic inherent to the system
  - stronger control over semantic arrangement
  - top-level applicability to different domains, enhancing interdisciplinarity.
12 EARTh Matrix Model - vertical structure
Vertical structure
The vertical structure of EARTh is based on different classification and hierarchical levels. It is an operative tool that, by providing a categorial interpretation of the meaning of the terms and by placing them in the semantic tree, aims to orient users towards the most essential characteristics of their semantics. Nevertheless, it does not limit the conceptual analysis of terms to a static and univocal view.
13 EARTh Matrix Model - Themes
Thematic setup for applications
The model envisages the possibility of developing additional arrangements of the terminology. The vertical structure may be complemented by micro-worlds of thematically linked terms (themes). While the tree structure tends to scatter terms under their respective categories, themes reassemble them according to their own perspective. This model should also allow meaning to be represented according to different second-order senses.
suolo - soil
14 Meaning Representation: application to the case of Benzene
Benzene is an aromatic organic substance.
"Aromatic", "organic" and "substance" seem to be essential semantic traits that cannot be ignored (in our present Western conceptualization).
Benzene is toxic. Benzene is a pollutant. Benzene is dangerous.
"Toxic", "pollutant" and "dangerous" are typical traits. They carry less weight in meaning representation, even though they all represent important properties in the environmental context.
In EARTh, themes provide an additional perspective for term interpretation and act as a tool to represent other semantic traits.
- Theme HEALTH → benzene seen as a toxic substance.
- Theme POLLUTION → benzene seen as a pollutant.
- Theme SAFETY → benzene seen as a dangerous substance.
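A minimal Python sketch of this two-axis representation follows. The `Term` class, its field names and the category path are assumptions made for illustration; only the three themes and their traits come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A term placed on both axes of the matrix (hypothetical structure)."""
    label: str
    category_path: tuple                        # vertical axis: essential traits
    themes: dict = field(default_factory=dict)  # thematic axis: theme -> typical trait


benzene = Term(
    label="benzene",
    # Assumed path built from the essential traits named on the slide.
    category_path=("substance", "organic substance", "aromatic organic substance"),
    themes={"HEALTH": "toxic", "POLLUTION": "pollutant", "SAFETY": "dangerous"},
)


def describe(term, theme=None):
    """Describe a term by its category, optionally as seen from one theme."""
    base = f"{term.label}: {term.category_path[-1]}"
    trait = term.themes.get(theme) if theme else None
    return base if trait is None else f"{base}, seen as {trait} ({theme})"


print(describe(benzene))
print(describe(benzene, "HEALTH"))
```

The essential traits stay fixed in the category path, while each theme adds one typical trait without altering the term's position in the tree.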
15 EARTh Matrix Model: the case of Benzene
16 Semantic relations of traditional thesauri: some limitations
Traditional thesauri provide a limited set of relationships between terms, distinguishing only among hierarchical, associative and equivalence relationships. Moreover, thesaurus relationships are often applied inconsistently. This causes ambiguity in interpretation and can result in unpredictable semantic structures. The generic hierarchical relation is perhaps the most misused. Many existing thesauri provide relations that are labelled as BT/NT but would be better interpreted as associative relations.
Monitoring
  NT Monitoring technique (GEMET, 1999)
Recycling
  NT Recycling ratio (GEMET, 1999)
Many relations are also indicated as associative, but their nature is not specified.
Remote sensing
  RT Cartography (EnVoc, 1997)
Air quality management
  RT Air quality (EnVoc, 1997)
Eutrophication
  RT Sewage (EnVoc, 1997)
17 Refinement of thesaurus relational structure
- One commonly proposed solution to overcome these limitations entails re-engineering traditional thesauri into systems provided with an extended network of well-defined relationships.
- The augmentation of thesaurus relationships
  - should support better semantic control
  - should open up new possibilities for information retrieval
  - could be used for automated processing.
- In EARTh, a more refined set of semantic relationships is currently being implemented. Standard relationships will be arranged into richer subtypes whose semantic content is specified. Linguistic structures will express semantic relations.
18 Hierarchical relation
Thesaurus standards and the scientific literature recognise three kinds of hierarchical relation (Generic, Partitive and Instance), which are conflated into a single, undifferentiated hierarchical relationship. In EARTh, Generic, Partitive and Instance relations will be differentiated. We will also try to identify different subtypes for each of them.
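As a rough sketch, the differentiation could be modelled in Python by tagging every hierarchical link with its kind. The tags NTG, NTP and NTI are used here as conventional abbreviations, and the sample relations are hypothetical, not EARTh content.

```python
from enum import Enum


class Hierarchical(Enum):
    """The three kinds of hierarchical relation, with an assumed narrower-term tag."""
    GENERIC = "NTG"
    PARTITIVE = "NTP"
    INSTANCE = "NTI"


# (broader term, kind, narrower term); illustrative examples only.
RELATIONS = [
    ("pollutant", Hierarchical.GENERIC, "air pollutant"),
    ("Europe", Hierarchical.PARTITIVE, "Italy"),
    ("sea", Hierarchical.INSTANCE, "Mediterranean Sea"),
    ("sea", Hierarchical.GENERIC, "inland sea"),
]


def narrower(term, kind=None):
    """List (tag, narrower term) pairs for a term, optionally restricted to one kind."""
    return [
        (k.value, n)
        for broader, k, n in RELATIONS
        if broader == term and (kind is None or k is kind)
    ]


print(narrower("sea"))
print(narrower("sea", Hierarchical.INSTANCE))
```

Keeping the kind on the link itself means an undifferentiated NT listing can still be produced by ignoring the tag.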
19 Application of node labels
Node labels will also indicate the use of
different subdivision criteria in generic
hierarchical relations.
20 Associative relation
The associative relation covers a heterogeneous and undifferentiated set of relations. It can express many kinds of association between terms that are not hierarchically based. ISO 704 defines it as a relation that exists when a thematic connection can be established between concepts by virtue of experience. In our work we will try to specify the nature of the relations and to differentiate RTs into subtypes.
Specifying and increasing associative relations will allow the development of a net-like structure that emphasizes the system of interrelations: the connecting ties that limit the degree of separation of a conceptual field and cannot be represented by the taxonomic-hierarchic tree-like model (this is crucial in the environmental domain).
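The idea of a net-like structure that limits the degree of separation between terms can be illustrated with a small Python sketch. The subtype names and most of the links are hypothetical; only the term pairs eutrophication/sewage and remote sensing/cartography echo the RT examples cited from EnVoc.

```python
from collections import defaultdict, deque

# (term, hypothetical RT subtype, related term)
LINKS = [
    ("eutrophication", "caused_by", "sewage"),
    ("sewage", "processed_by", "water treatment"),  # assumed link for illustration
    ("remote sensing", "supports", "cartography"),
]


def build_graph(links):
    """Build an undirected adjacency map; associative links are treated as symmetric."""
    graph = defaultdict(set)
    for a, _subtype, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def separation(graph, start, goal):
    """Breadth-first search: number of RT links between two terms, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


GRAPH = build_graph(LINKS)
print(separation(GRAPH, "eutrophication", "water treatment"))
print(separation(GRAPH, "eutrophication", "cartography"))
```

Each additional associative link can only shorten or preserve such distances, which is one way to read the claim that a richer RT network ties the conceptual field together.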
21 Equivalence relation
The equivalence relationship covers at least the following basic types: synonyms, lexical variants and near-synonyms. Actual synonyms and lexical variants will be distinguished, and different subtypes will be identified.
Synonymy refers to similarity of meaning. It has also been defined as interchangeability between terms, although it is very hard to conceive of an absolute or perfect synonymy in which terms are interchangeable in all contexts. Lexical variants are different word forms for the same expression and derive from morphological and grammatical variations. The category of near-synonyms as such will not be included in the system at this stage.
22 Portability vs. different users
Ensuring high modularity of the system is another important requirement. Not all kinds of users may be interested in such a fine distinction of the thesaurus relations. It will be possible to navigate the thesaurus structure at different levels of granularity, starting from the traditional version of the thesaurus relational structure.
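One way to picture navigation at different levels of granularity is to store the refined relation and derive the traditional one from it. The sketch below assumes a hypothetical "TYPE:subtype" notation and invented sample data; it is not the EARTh or SuperThes data model.

```python
# Refined relations stored as (source, "TYPE:subtype", target).
# Both the notation and the subtype names are assumptions for illustration.
RELATIONS = [
    ("sea", "NT:instance", "Mediterranean Sea"),
    ("sea", "NT:generic", "inland sea"),
    ("eutrophication", "RT:cause", "sewage"),
]


def traditional(tag):
    """Collapse a refined tag such as 'NT:instance' to its traditional form 'NT'."""
    return tag.split(":", 1)[0]


def navigate(term, refined=False):
    """Return the relations of a term at the requested level of granularity."""
    return [
        (tag if refined else traditional(tag), target)
        for source, tag, target in RELATIONS
        if source == term
    ]


print(navigate("sea"))                # traditional view
print(navigate("sea", refined=True))  # refined view
```

Because the coarse view is computed from the fine one, both kinds of users see the same underlying structure.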
23 EARTh terminology: collection and selection
- Our goal is to produce an updated and sound semantic map of the environmental domain.
- The main source of environmental terms (about 4,000 terms selected) is GEMET, the General European Multilingual Environmental Thesaurus (1999), developed by CNR-EKOLab and UBA-Umweltbundesamt for the European Environment Agency.
- Other sources (the terminological base is approx. 20,000 terms):
  - sources of general environmental terminology
    - UN Environment and Development (1992)
  - sources of domain-specific terminology
    - Italian Thesaurus of Earth Sciences (2000)
    - Inland Water terminology (2001)
    - Snow and Ice Terminology (2003)
    - Thesaurus for Emergency and Disasters (1998/2003)
    - Remote Sensing Terminology (2004)
  - other reference documents in specific fields, concerning contemporary science (chaos theory, complexity) or related to biocultural diversity issues.
24 EARTh terminological content
At present, EARTh contains about 7,500 terms already selected and arranged:
- 1,500 terms concern environmental pressures (e.g., industrial and agricultural activities).
- 2,500 terms describe the state of the environment (e.g., natural components and processes).
- 1,000 terms are about environmental impacts (e.g., waste, pollution, biodiversity loss).
- 2,500 terms concern social responses (e.g., legislative measures, environmental education, research).
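As a quick consistency check, the four groups can be summed and expressed as shares of the total. The dictionary keys below are shorthand labels chosen for this sketch; the counts are the approximate figures given on the slide.

```python
# Approximate term counts per group, as reported on the slide.
counts = {
    "pressures": 1500,
    "state": 2500,
    "impacts": 1000,
    "responses": 2500,
}

total = sum(counts.values())

# Share of each group as a percentage of the total, rounded to one decimal place.
shares = {group: round(100 * n / total, 1) for group, n in counts.items()}

print(total)
print(shares)
```

The groups add up to the stated total of about 7,500 terms.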
25 SuperThes
- SuperThes is a thesaurus management software package developed by TBHS and financed within the framework of a Memorandum of Understanding between CNR, UBA-A, UBA-D and TBHS.
- It relies on an open source client-server DB technology (Interbase-Firebird).
- For small installations, client and server may reside on the same computer.
- It is fully Unicode compliant and stores all data in UCS-2 format.
- All languages defined in ISO 639-1 are pre-defined.
- Perspectives and ongoing activities
  - Visualizer for SuperThes-based thesauri
  - Web interface for SuperThes-based thesauri
  - Further expansion of multilingual capabilities (sorting, UTF-8 and UTF-32 encodings)
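To illustrate why support for further encodings matters, the Python sketch below compares the storage size of a label in three Unicode encodings. It says nothing about SuperThes internals. Python has no codec named UCS-2, so `utf-16-le` is used as a stand-in, which yields the same bytes for characters in the Basic Multilingual Plane; the helper names are hypothetical.

```python
def fits_in_ucs2(text):
    """Simplified check: True if every character lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def encoded_sizes(text):
    """Return the number of bytes needed for the text in each encoding."""
    return {codec: len(text.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")}


print(encoded_sizes("suolo"))     # ASCII-only label
print(fits_in_ucs2("suolo"))      # BMP characters fit in fixed two-byte units
print(fits_in_ucs2("\U00010348")) # a character outside the BMP does not
```

Characters outside the Basic Multilingual Plane need either UTF-16 surrogate pairs or another encoding such as UTF-8 or UTF-32.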
26 Main features
- A graphical user interface with drag-and-drop features and context menus allows quick and efficient data handling
- Powerful word processor plug-in
  - supports tables and images
  - reads and writes RTF and HTML formats
  - reads and writes MS Word documents
- Multimedia editors for sounds and images
  - support various file formats (jpg, bmp, ico, emf, wmf)
- Data exchange with other applications via files, clipboard and drag and drop
- SuperThes supports a wide range of additional data types
  - Boolean, decimal, list, memo, short and long text, geographic coordinates, others (customisable)
27 Thank You!
- http://uta.iia.cnr.it
- uta_at_iia.cnr.it
- ? 39 06 90672 712/270
- ? 39 06 90672 660
Information on SuperThes: rudolf.legat_at_umweltbundesamt.at