1
TKE2005 - 7th International Conference on
Terminology and Knowledge Engineering
Thesaurus classification and relational
structure: the EARTh experience
Fulvio MAZZOCCHI, Paolo PLINI
Thursday, August 18th 2005 - Terminology and
Classification
2
Overview
  • This contribution will present:
  • General aspects of environmental terminology
  • The CNR project concerning the development of a
    general environmental thesaurus (EARTh)
  • EARTh classification structure and semantic model
  • EARTh relational structure
  • EARTh terminological content
  • SuperThes, the thesaurus management software that
    has been used

3
Some general considerations on environmental
terminology
  • In contemporary society the environment is
    a crucial topic:
  • new issues continually arise (e.g., biological
    pollution)
  • there is a rapid evolution of environmental
    knowledge
  • new technologies are created

This dynamism of the domain is reflected in the
development of environmental terminology
4
Some general considerations on environmental
terminology (2)
  • The environment is a multidisciplinary domain.
  • Each term could be defined in different ways
    depending on the context in which it is
    considered.
  • For example, the term benzene:
  • an environmental planner may consider it as a
    pollutant that could enter
    biogeochemical cycles, potentially damaging
    the environment
  • a biologist may consider its toxicity and the
    different routes by which it can enter an
    organism
  • an engineer would consider it as a fuel for a
    combustion engine
  • a chemist may see it as the precursor of a class
    of chemical compounds
  • etc.

5
Some general considerations on environmental
terminology (3)
  • There are problems of semantic overlap
  • e.g., environmental conservation, environmental
    protection, environmental preservation,
    environmental safeguard
  • Biocultural issues
  • The environment may be conceptualised in
    different ways according to different cultural
    points of view. There is a strong relationship
    among language, knowledge and the environment
    (see for example the Terralingua initiatives,
    http://www.terralingua.org).

6
ECOTerm
The need for systems able to rationalise
environmental information management is a
much-debated topic. A relevant initiative on
environmental terminology is
ECOinformatics/ECOTerm (http://ecoinfo.eionet.eu.int/).
It was created to bring together the major
providers of environmental terminologies to
discuss the status of their terminologies, how
they are applying new technologies, and how these
resources can be made more valuable to the
community by integration and collaboration. It
involves several institutions (UNEP, FAO, EEA, US
EPA, USGS, JRC, CCLRC, CNR, UBA). Two meetings
have been organised, in Geneva and Berlin; the next one
is expected to be held in Rome in 2006.
7
CNR involvement in environmental
terminology/EARTh genesis
8
The Vision
We are working on a new thesaurus model to be
applied in the environmental domain. The
thesaurus should:
  • be a highly structured and refined tool able to
    combine a strong conceptual basis with
    flexibility towards different applications
  • represent an updated semantic and terminological
    map of the environmental domain
  • take into account the cultural dimension of
    knowledge organization
  • allow different levels of comprehensibility and
    applicability for users with different expertise
  • ensure the porting of the thesaurus into
    different technological applications.

9
EARTh Architecture
  • Matrix semantic structure
  • Vertical structure based on a system of
    categories
  • Thematic set-up to be developed for local
    applications
  • Thesaurus Relations
  • Differentiation and better semantic expression of
    relations
  • In particular, the RT transversal structure
    (thesaurus as semantic connector) will be
    strengthened

10
EARTh Classification Scheme
The Classification Scheme of EARTh is based on
categories. Following a bottom-up perspective,
terms could be analysed according to a
progressive hierarchical scale. In that scale
conceptual features are progressively discarded
following an intensional perspective (while in an
extensional perspective the number of things
associated with that intension is increased). The
maximum level of generality is thus reached. The
categories represent the top of this vertical
structure.
EARTh first and second level categories
11
EARTh Classification Scheme: why adopt a
categorial approach?
  • A categorial approach ensures:
  • stable conceptual basis for knowledge
    organization
  • a tool to classify concepts starting from their
    most basic meaning, as referred to by the logic
    inherent to the system
  • a stronger control over semantic arrangement
  • top level applicability to different domains,
    enhancing interdisciplinarity.

12
EARTh Matrix Model - vertical structure
Vertical structure: The vertical structure of
EARTh is based on different classification and
hierarchical levels. It is an
operative tool that, by providing the categorial
interpretation of the meaning of the terms and by
placing them in the semantic tree, aims to
orient users towards the most essential
characteristics of their semantics. Nevertheless,
it does not limit the conceptual analysis of
terms to a static and univocal view.
13
EARTh Matrix Model - Themes
Thematic setup for applications: The model
envisages the possibility of developing additional
arrangements of the terminology. The vertical
structure may be complemented by micro-worlds of
thematically linked terms (themes). While the
tree structure tends to scatter terms under their
respective categories, themes reassemble them
according to their own perspective. This model
should also allow meaning to be represented
according to different second-order
senses.
suolo - soil
14
Meaning Representation: application to the case
of Benzene
Benzene is an aromatic organic substance.
"Aromatic", "organic" and "substance" seem to be
essential semantic traits that cannot be
ignored (in our present Western conceptualization).
Benzene is toxic. Benzene is a pollutant. Benzene
is dangerous.
"Toxic", "pollutant" and "dangerous" are typical traits.
They carry less weight in meaning
representation, even if they all represent
important properties in the environmental context.
In EARTh, themes provide additional perspectives
for term interpretation and act as a tool to
represent other semantic traits.
  • Theme HEALTH → benzene seen as a toxic substance.
  • Theme POLLUTION → benzene seen as a pollutant.
  • Theme SAFETY → benzene seen as a dangerous
    substance.
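The distinction between essential and typical traits can be sketched as a small data structure. This is only an illustration of the idea on this slide, not the EARTh data model: the class name `Concept` and its fields are assumptions, while the traits and theme names are the ones listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Concept:
    """Hypothetical container for a term and its semantic traits."""
    label: str
    # Essential traits: fixed by the categorial (vertical) structure.
    essential: Tuple[str, ...]
    # Typical traits, each foregrounded by one theme (theme name -> trait).
    by_theme: Dict[str, str] = field(default_factory=dict)

    def traits(self, theme: Optional[str] = None) -> Tuple[str, ...]:
        """Return the essential traits, plus the typical trait of `theme` if known."""
        if theme is None or theme not in self.by_theme:
            return self.essential
        return self.essential + (self.by_theme[theme],)


benzene = Concept(
    label="benzene",
    essential=("aromatic", "organic", "substance"),
    by_theme={"HEALTH": "toxic", "POLLUTION": "pollutant", "SAFETY": "dangerous"},
)

print(benzene.traits())          # ('aromatic', 'organic', 'substance')
print(benzene.traits("HEALTH"))  # ('aromatic', 'organic', 'substance', 'toxic')
```

The essential traits never change with the theme; a theme only adds one further perspective on top of them.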
15
EARTh Matrix Model: the case of Benzene
16
Semantic relations of traditional thesauri: some
limitations
Traditional thesauri provide a limited set of
relationships between terms, distinguishing only
among hierarchical relationships, associative
relationships and equivalence relationships.
Moreover, thesaurus relationships are often applied
inconsistently. This causes ambiguity in
interpretation and can result in unpredictable
semantic structures. The generic
hierarchical relation is perhaps the most misused. Many
existing thesauri provide relations that are
labelled as BT/NT but could be better
interpreted as associative relations:
  • Monitoring NT Monitoring technique (GEMET, 1999)
  • Recycling NT Recycling ratio (GEMET, 1999)
Many relations are also indicated as associative,
but their nature is not specified:
  • Remote sensing RT Cartography (EnVoc, 1997)
  • Air quality management RT Air quality (EnVoc,
    1997)
  • Eutrophication RT Sewage (EnVoc, 1997)
17
Refinement of thesaurus relational structure
  • One of the solutions that is commonly proposed to
    overcome these limitations entails a
    reengineering of the traditional thesauri into
    systems provided with an extended network of
    well-defined relationships.
  • The augmentation of thesaurus relationships:
  • should support a better semantic control
  • should open up new possibilities for information
    retrieval
  • could be used for automated processing.
  • In EARTh, the implementation of a more refined
    set of semantic relationships is at present under
    construction. Standard relationships will be
    arranged into richer subtypes, whose semantic
    content is specified. Linguistic structures will
    express semantic relations.

18
Hierarchical relation
Thesaurus standards and the scientific literature
include three kinds of hierarchical relations:
Generic, Partitive and Instance, which are
conflated into one generic hierarchical
relationship. In EARTh, Generic, Partitive and
Instance relations will be differentiated. We
will also try to identify for each of them
different subtypes.
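A minimal sketch of what differentiating the three hierarchical relations could look like in code. The names `HierType`, `HierLink` and `narrower_terms` are hypothetical, the tag strings are only labels chosen for this example, and the sample links are illustrative rather than taken from EARTh.

```python
from enum import Enum
from typing import List, NamedTuple, Optional


class HierType(Enum):
    GENERIC = "BTG"     # genus/species
    PARTITIVE = "BTP"   # whole/part
    INSTANCE = "BTI"    # class/individual


class HierLink(NamedTuple):
    narrower: str
    broader: str
    kind: HierType


# Illustrative links only; not taken from EARTh.
LINKS = [
    HierLink("lake", "water body", HierType.GENERIC),
    HierLink("lake shore", "lake", HierType.PARTITIVE),
    HierLink("Lake Garda", "lake", HierType.INSTANCE),
]


def narrower_terms(term: str, kind: Optional[HierType] = None) -> List[str]:
    """Narrower terms of `term`; with kind=None all three kinds are conflated."""
    return [
        link.narrower
        for link in LINKS
        if link.broader == term and (kind is None or link.kind == kind)
    ]


print(narrower_terms("lake"))                     # conflated, traditional NT view
print(narrower_terms("lake", HierType.INSTANCE))  # instances only
```

Calling the function without a `kind` reproduces the single generic hierarchical relationship of a traditional thesaurus; passing a `kind` uses the finer distinction.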
19
Application of node labels
Node labels will also indicate the use of
different subdivision criteria in generic
hierarchical relations.
20
Associative relation
The associative relation covers a heterogeneous
and undifferentiated set of relations. It can
express many kinds of association between terms
that are not hierarchically based. ISO 704
defines it as a relation that exists when a
thematic connection can be established between
concepts by virtue of experience. In our work we
will try to specify the nature of the relations
and to differentiate RTs into subtypes.
Specifying and increasing associative relations
will allow the development of a net-like
structure that emphasizes the system of
interrelations, the connecting ties that limit
the degree of separation of a conceptual field
and cannot be represented by the
taxonomic-hierarchic tree-like model (this is
crucial in the environmental domain).
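The net-like structure can be pictured as an undirected graph whose edges are typed RT links. The sketch below is an assumption-laden illustration: the first three pairs are the EnVoc examples quoted earlier in this talk, but every subtype label is hypothetical, and the fourth link is invented purely to connect the graph. The degree of separation is computed with a plain breadth-first search.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# (term, related term, subtype). Subtype labels are hypothetical.
RT_LINKS = [
    ("Remote sensing", "Cartography", "technique/application"),
    ("Air quality management", "Air quality", "activity/object"),
    ("Eutrophication", "Sewage", "effect/cause"),
    # Hypothetical extra link, added only to make the example connected:
    ("Air quality", "Remote sensing", "object/technique"),
]

Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(links: List[Tuple[str, str, str]]) -> Graph:
    """Store each associative link in both directions."""
    graph: Graph = {}
    for a, b, subtype in links:
        graph.setdefault(a, []).append((b, subtype))
        graph.setdefault(b, []).append((a, subtype))
    return graph


def separation(graph: Graph, start: str, goal: str) -> Optional[int]:
    """Number of RT links on the shortest path, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, dist = queue.popleft()
        if term == goal:
            return dist
        for neighbour, _subtype in graph[term]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_graph(RT_LINKS)
print(separation(graph, "Air quality management", "Cartography"))  # 3
print(separation(graph, "Eutrophication", "Cartography"))          # None
```

Each associative link that is added can only shorten such paths or join previously unconnected terms, which is one way to read the claim that these ties limit the degree of separation within a conceptual field.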
21
Equivalence relation
The equivalence relationship covers at least the
following basic types: synonyms, lexical variants
and near-synonyms. Actual synonyms and lexical
variant forms will be distinguished and different
subtypes will be identified.
Synonymy refers to meaning similarity. It has
also been defined as interchangeability between
terms, although it is very difficult to think
about the existence of an absolute or perfect
synonymy where there is interchangeability in all
contexts. Lexical variants are different word
forms for the same expression and derive from
morphological and grammatical variations. The
category of near-synonyms as such will not be
included at this stage in the system.
22
Portability vs. different users
Ensuring a high modularity of the system is
another important requisite to be achieved. Not
all kinds of users may be interested in such a
fine distinction of the thesaurus relations. It
will be possible to navigate the thesaurus
structure with different levels of granularity,
starting from the traditional version of the
thesaurus relational structure.
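One way to offer several levels of granularity is to store the refined relation and derive the traditional one on demand. The sketch below assumes hypothetical refined tags (`NTG`, `RT:cause`, and so on) and a hypothetical `view` function; neither is described in the talk, and the sample relations are illustrative.

```python
from typing import Dict, List, Tuple

# Refined tag -> traditional tag. The refined tags are hypothetical placeholders.
TRADITIONAL: Dict[str, str] = {
    "NTG": "NT", "NTP": "NT", "NTI": "NT",
    "RT:cause": "RT", "RT:instrument": "RT",
    "UF:synonym": "UF", "UF:variant": "UF",
}

Relation = Tuple[str, str, str]  # (source term, tag, target term)

# Illustrative relations with hypothetical subtypes.
RELATIONS: List[Relation] = [
    ("Eutrophication", "RT:cause", "Sewage"),
    ("Recycling", "NTG", "Glass recycling"),
]


def view(relations: List[Relation], refined: bool) -> List[Relation]:
    """Return relations as stored (refined) or collapsed to BT/NT/RT/UF."""
    if refined:
        return list(relations)
    return [(src, TRADITIONAL.get(tag, tag), dst) for src, tag, dst in relations]


for src, tag, dst in view(RELATIONS, refined=False):
    print(src, tag, dst)
```

Because the coarse view is computed from the fine one, users who do not need the subtypes see an ordinary thesaurus, and nothing has to be maintained twice.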
23
EARTh terminology collection and selection
  • Our goal is to produce an updated and sound
    semantic map of the environmental domain.
  • The main source of environmental terms (about
    4,000 terms selected) is GEMET-General European
    Multilingual Environmental Thesaurus (1999),
    developed by CNR-EKOLab and UBA-Umweltbundesamt
    for the European Environment Agency.
  • Other sources (the overall terminological base
    is approx. 20,000 terms):
  • sources of general environmental terminology
  • UN Environment and Development (1992)
  • sources of domain-specific terminology
  • Italian Thesaurus of Earth Sciences (2000)
  • Inland Water terminology (2001)
  • Snow and Ice Terminology (2003)
  • Thesaurus for Emergency and Disasters (1998/2003)
  • Remote Sensing Terminology (2004)
  • Other reference documents in specific fields,
    concerning contemporary science (chaos theory,
    complexity) or related to biocultural diversity
    issues.

24
EARTh terminological content
At present, EARTh contains about 7,500 terms
already selected and arranged:
  • 1,500 terms concern environmental pressures
    (e.g., industrial and agricultural activities)
  • 2,500 terms describe the state of the environment
    (e.g., natural components and processes)
  • 1,000 terms concern environmental impacts (e.g.,
    waste, pollution, biodiversity loss)
  • 2,500 terms concern social responses (e.g.,
    legislative measures, environmental education,
    research)
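As a quick consistency check, the four groups do add up to the stated total of about 7,500 terms. The short sketch below uses the figures from this slide and also derives each group's share; the variable names are arbitrary.

```python
# Approximate term counts per group, as given on this slide.
counts = {
    "environmental pressures": 1500,
    "state of the environment": 2500,
    "environmental impacts": 1000,
    "social responses": 2500,
}

total = sum(counts.values())
shares = {group: round(100 * n / total, 1) for group, n in counts.items()}

print(total)  # 7500
for group, share in shares.items():
    print(f"{group}: {share}%")
```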
25
  • SuperThes is a thesaurus management software
    developed by TBHS and financed in the frame of a
    Memorandum of Understanding between CNR, UBA-A,
    UBA-D and TBHS.
  • It relies on an open-source client-server DB
    technology (Interbase-Firebird)
  • For small installations, client and server may
    reside on the same computer.
  • It is fully Unicode compliant and stores all data
    in UCS-2 format.
  • All languages defined in ISO 639-1 are
    pre-defined.
  • Perspectives and ongoing activities
  • Visualizer for SuperThes-based Thesauri
  • Web interface for SuperThes-based Thesauri
  • Further expanding multilingual capabilities
    (sorting, UTF-8 and UTF-32 encodings)
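To make the encoding remarks concrete: UCS-2 stores every character in exactly two bytes and therefore covers only the Basic Multilingual Plane, which is one reason UTF-8 and UTF-32 support is a natural next step. The sketch below is not SuperThes code. Python has no dedicated UCS-2 codec, so the hypothetical helper `ucs2_encode` uses `utf-16-le`, which yields the same bytes for BMP characters; the little-endian byte order is an assumption.

```python
def ucs2_encode(text: str) -> bytes:
    """Encode text as UCS-2 (little-endian assumed); reject non-BMP characters."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} is outside the BMP, not storable in UCS-2")
    # For BMP characters, UTF-16 and UCS-2 produce identical bytes.
    return text.encode("utf-16-le")


for term in ("suolo", "土壤"):  # "soil" in Italian and in Chinese
    print(
        term,
        len(ucs2_encode(term)),          # always 2 bytes per character
        len(term.encode("utf-8")),       # 1 to 3 bytes per BMP character
        len(term.encode("utf-32-le")),   # always 4 bytes per character
    )
```

For the Italian term the sizes are 10, 5 and 20 bytes; for the two-character Chinese term they are 4, 6 and 8 bytes.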

26
Main features
  • A graphical user interface with drag and drop
    features and context menus allows a quick and
    efficient data handling
  • Powerful word processor plug-in
  • supports tables and images
  • reads and writes RTF and HTML formats
  • reads and writes MS Word documents
  • Multimedia editors for sounds and images
  • Supports various file formats (jpg, bmp, ico,
    emf, wmf)
  • Data exchange with other applications via files,
    clipboard and drag and drop
  • SuperThes supports a wide range of additional
    data types
  • Boolean, decimal, list, memo, short and long text,
    geographic coordinates, others (customisable)

27
Thank You!
  • http://uta.iia.cnr.it
  • uta_at_iia.cnr.it
  • +39 06 90672 712/270
  • +39 06 90672 660

Information on SuperThes: rudolf.legat_at_umweltbundesamt.at