Thesaurus Mapping

Transcript and Presenter's Notes

Title: Thesaurus Mapping


1
Thesaurus Mapping
Martin Doerr
Centre for Cultural Informatics and Documentation Systems
Institute of Computer Science
Foundation for Research and Technology - Hellas
Bath, UK, January 11, 2000
2
Thesaurus Mapping: The Problem
  • Logical aspects
    • Semantics of involved entities
    • Notions of translation
    • Objectives and logics of mapping
  • Production of mappings
    • Human
    • Language engineering, cluster analysis
  • Architecture
    • Mapping management
    • Mapping service
    • Integration in IT environment

3
Thesaurus Mapping: Why do we need mapping?
  • Thesauri for information retrieval depend on:
    • Viewpoint (e.g. functional, morphological, social, special database fields etc.)
    • Language or social group (experts, common people etc.)
    • Size and distribution of target material (effective partitioning)
  • Therefore:
    • Concepts differ
    • Use of concepts differs
    • Semantic embedding differs
  • Even if we agree on the same world
  • Research topic: formalisation of views and context

4
Thesaurus Mapping: Semantics of entities
  • Concepts are defined by agreement, e.g. orange (colour)
  • Concepts identify sets of real-world objects
  • Concepts are identified by scope notes, literature references, examples, images
  • Concepts should not be changed:
    • they should be created or abandoned
    • they should be understood, accepted or rejected
  • A descriptor is a concept identifier

5
Thesaurus Mapping: Semantics of entities
  • Links should express opinions and differences:
    • about set relations between concepts (subsumption, disjointness etc.)
    • about derived concepts
    • about term usage
    • opinions may be human or computational!
  • Terms (noun phrases) should be used:
    • by social groups to refer to (multiple) concepts
    • without direct linguistic meaning
    • one term is selected as concept identifier

6
Thesaurus Mapping: Semantics of entities
  • concept - concept relations:
    • set semantics: BT, between thesauri/versions - for query expansion, users
    • associative: RTs, BTP, etc. - for user guidance
  • concept - term relations:
    • authoritative: preferred, used for - for cataloguers, users
    • statistical, possible synonyms - for information retrieval
  • term - term relations:
    • dictionary entries - limited precision, within LE tools
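
The distinction between concept - concept and concept - term relations can be made concrete with a small data model. The following is only a minimal sketch under stated assumptions: the class `Concept`, the `USED_FOR` table, the function `expand` and all sample entries are invented for illustration and do not come from the talk or from any thesaurus tool. It shows BT links used with set semantics for query expansion, and a term that refers to more than one concept.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    descriptor: str                               # the term chosen as concept identifier
    scope_note: str = ""
    broader: set = field(default_factory=set)     # descriptors of BT concepts


def expand(descriptor, concepts):
    """Return the descriptor plus all transitively narrower descriptors."""
    result = {descriptor}
    changed = True
    while changed:
        changed = False
        for concept in concepts.values():
            # A concept is narrower if one of its BT links points into the result set.
            if concept.descriptor not in result and concept.broader & result:
                result.add(concept.descriptor)
                changed = True
    return result


# Invented sample data.
CONCEPTS = {
    "mill": Concept("mill"),
    "watermill": Concept("watermill", broader={"mill"}),
    "tide mill": Concept("tide mill", broader={"watermill"}),
    "church": Concept("church"),
}

# Concept - term relation: a term may refer to several concepts.
USED_FOR = {"water mill": {"watermill"}, "mill": {"mill", "watermill"}}

expanded = expand("mill", CONCEPTS)
```

Searching with `expanded` instead of the single descriptor `mill` retrieves records indexed under any narrower concept, which is the query-expansion use of BT named on the slide.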

7
Thesaurus Mapping: What is a Multilingual Thesaurus?
  • A translated thesaurus: for comprehension
    • Established concepts and terms from one user group
    • Optimally interpreted in words of one or more other languages
    • Translations are not established terms
  • Mapped thesauri (ISO 5964): for transition
    • Independent thesauri, each one from another user group
    • Established concepts and terms
    • Links declare overlap between concepts
  • Interlingua: for communication and knowledge sharing
    • Compromise to share concepts between many user groups
    • Optimally interpreted in words of another language

8
Thesaurus Mapping: Functionality of Mapping
  • Transparent query transformation (Z39.50!)
    • Replace a Boolean term combination from thesaurus A with the optimal term combination from thesaurus B to retrieve equivalent results
    • Guaranteed transition needed (possibly to higher concepts)
    • Need controlled loss of precision or recall (research!)
  • Combinatorial explosion
    • Need cascading: Thes A -> Thes B -> Thes C
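
The query transformation and cascading described on this slide can be sketched as follows. This is a minimal illustration, not an implementation from the talk: it assumes a mapping table is a plain dictionary from a source descriptor to a Boolean expression over target descriptors, and that each source thesaurus offers a single BT per term. The function names `translate` and `cascade` and all sample terms are hypothetical.

```python
# Queries are nested tuples: ("AND", q1, q2, ...), ("OR", q1, q2, ...),
# ("NOT", q), or a bare descriptor string.

def translate(query, mapping, broader):
    """Rewrite a Boolean query over a source thesaurus into the target thesaurus."""
    if isinstance(query, str):
        term = query
        # Guaranteed transition: climb to broader source concepts until one is mapped.
        # Raises KeyError if no mapped ancestor exists.
        while term not in mapping:
            term = broader[term]
        return mapping[term]
    op, *args = query
    return (op, *(translate(arg, mapping, broader) for arg in args))


def cascade(query, steps):
    """Apply several mappings in sequence, e.g. Thes A -> Thes B -> Thes C."""
    for mapping, broader in steps:
        query = translate(query, mapping, broader)
    return query


# Invented example data (not taken from any real thesaurus).
A_BROADER = {"a:tide mill": "a:watermill"}
A_TO_B = {"a:watermill": "b:watermill", "a:windmill": "b:windmill"}
B_BROADER = {}
B_TO_C = {
    "b:watermill": ("OR", "c:watermill", "c:boat mill"),
    "b:windmill": "c:windmill",
}

query_a = ("AND", "a:tide mill", ("NOT", "a:windmill"))
query_c = cascade(query_a, [(A_TO_B, A_BROADER), (B_TO_C, B_BROADER)])
```

Here `a:tide mill` has no mapping of its own, so the sketch falls back to its broader concept, which trades precision for a guaranteed transition. The same fallback under a `NOT` would exclude more than intended, so a real service would need to track the direction of each approximation.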

9
Thesaurus Mapping: Logics of Mapping
  • Interthesaurus relations (ISO 5964), from a descriptor of Thes. A to a descriptor of Thes. B:
    • partial equivalence
      • Better: broader equivalence, narrower equivalence
    • exact equivalence
    • inexact equivalence (+/-)
      • good for FTR only
    • single to multiple equivalence
      • Better: exact equivalence to a Boolean combination of target terms: AND (intersection), OR (union), NOT (complement)
10
Thesaurus Mapping: Translation and Mapping
[Diagram labels: English Heritage Thesaurus, Merimee Thesaurus, interthesaurus relations, linguistic translation, Interlingua, English vocabulary, French vocabulary]
11
Thesaurus Mapping: Boolean OR-Combinations
  • Combines instances of B and C
  • Uses properties of either B or C
  • Is BT of B, C and NT of their common broader terms

[Diagram: A is linked by exact equivalence to the Boolean compound B OR C, which is BT of B and of C.]
12
Thesaurus Mapping: Boolean AND-Combinations
  • Uses instances of both B and C
  • Combines properties of B and C
  • Is NT of B, C and BT of their common narrower terms

[Diagram: A is linked by exact equivalence to the Boolean compound B AND C; B and C are each BT of the compound.]
13
Thesaurus Mapping: Approximation by Inclusion
[Diagram labels: A, B, C, BT, broader equivalence, narrower equivalences]
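
Under set semantics, a broader equivalence contains the source concept and narrower equivalences are contained in it, so each gives a one-sided approximation. The sketch below illustrates that trade-off with invented record sets; the assumption that A is bounded as C1 ∪ C2 ⊆ A ⊆ B, the function name `precision_recall` and the data are illustrative only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Invented record sets: what concept A denotes, and what its mapped targets retrieve.
A = {1, 2, 3, 4}
B = {1, 2, 3, 4, 5, 6}       # broader equivalence: a superset of A
C1, C2 = {1, 2}, {3}         # narrower equivalences: subsets of A

via_broader = precision_recall(B, A)          # full recall, reduced precision
via_narrower = precision_recall(C1 | C2, A)   # full precision, reduced recall
```

Substituting the broader target loses no relevant record but adds noise, while substituting the union of narrower targets adds no noise but misses records; which loss is acceptable is the kind of control the earlier slide on mapping functionality asks for.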
14
Thesaurus Mapping: Avoid redundant linking!
[Diagram labels: A, B, BT, broader equivalence, exact equivalence, narrower equivalences]
15
Thesaurus Mapping: Problems of Mapping
  • Consistency and reasoning (Description Logics!)
  • Optimal substitution of combined query terms
  • Protocol to propagate recall/precision control
  • Inverse reading of one-to-many links
  • Postcoordination: unclear semantics!
    • e.g. "grinding factories"; solution by DL?

16
Thesaurus Mapping: Production of Mappings
  • Human assessment needs (see Term-IT):
    • CSCW, workflow, decentralised management tools
    • Excellent comparative presentation of thesaurus contents
  • Language engineering (see Term-IT):
    • termhood recognition, automatic translation by parallel texts, filtering by occurrence in target indexing language
    • Excellent for preprocessing!
  • Analysis of use:
    • Cluster analysis with doubly indexed entries
    • Libraries: the problem of identifying the same work!
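
Analysis of doubly indexed entries can be approximated very simply by counting how often descriptors from the two thesauri are assigned to the same item. The sketch below uses Jaccard overlap as a stand-in for a real cluster analysis, which the talk does not specify; the function name `candidate_links`, the threshold and the sample records are assumptions. Its output is a list of candidate links for human assessment, not a finished mapping.

```python
from collections import defaultdict


def candidate_links(records, threshold=0.5):
    """Propose A-to-B descriptor links from items indexed in both thesauri.

    records: iterable of (a_terms, b_terms) pairs, one pair per item.
    Returns (a_term, b_term, score) tuples, highest score first.
    """
    items_a = defaultdict(set)
    items_b = defaultdict(set)
    for item_id, (a_terms, b_terms) in enumerate(records):
        for a in a_terms:
            items_a[a].add(item_id)
        for b in b_terms:
            items_b[b].add(item_id)

    links = []
    for a, ids_a in items_a.items():
        for b, ids_b in items_b.items():
            # Jaccard overlap of the two sets of items.
            score = len(ids_a & ids_b) / len(ids_a | ids_b)
            if score >= threshold:
                links.append((a, b, score))
    return sorted(links, key=lambda link: -link[2])


# Invented doubly indexed records.
RECORDS = [
    ({"a1"}, {"b1"}),
    ({"a1"}, {"b1"}),
    ({"a2"}, {"b1", "b2"}),
    ({"a2"}, {"b2"}),
]

links = candidate_links(RECORDS)
```

The approach depends on knowing that two index entries describe the same item, which is exactly the identification problem the slide raises for libraries.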

17
SIS - Thesaurus Management System: Co-operative linking
[Diagram labels: Group 1 and Group 2, each with Version 0, Version 1 and a New Workspace; Version 2; obsolete term]
18
Thesaurus Mapping: Users Environment
19
Thesaurus Mapping: Three-level Architecture
[Diagram labels: End User; National Authority Providers; Local TMS (two instances); CMS and CMS Maintainer (two instances); links labelled concept proposal, thesaurus initialization, update term use]
20
Thesaurus Mapping: Architectural Considerations
  • We propose to distinguish:
    • Collection Management Systems with local term management
    • National authority providers
    • Mapping service
  • Mapping service
    • Co-operative mapping production environment and system, for few languages (3?), domain-specific?
    • Large-scale mapping tables detached from the production system, accessible as a replicated Web resource
  • Integration
    • Access engines connect to mapping resources on demand
    • Provision of suitable metadata for CMS capabilities

21
Thesaurus Mapping: Conclusions
  • Thesaurus mapping is feasible and is the best means of coherently accessing multiple CMS with controlled vocabulary
  • Thesaurus mapping is a major investment in human resources and IT environment
  • Targeted research can considerably improve what is currently feasible:
    • quality of mapping
    • quality of service
    • production cost