Controlled vocabularies: Thesauri and information retrieval

1
Controlled vocabularies: Thesauri and information retrieval
  • Michael Middleton
  • QUT School of Information Systems, Brisbane,
    Australia
  • m.middleton_at_qut.edu.au
  • for
  • STIMULATE 5
  • Vrije Universiteit Brussel
  • Brussels, Belgium
  • July, 2005

2
Introduction
  • Context and history
  • Vocabulary principles
  • Thesaurus software
  • Thesaurus building and application
  • Thesaurus evaluation
  • The future

3
Context: Information life cycle
  • Organise to maintain
  • (Cycle diagram; stages shown: create, distribute, dispose, store,
    use, reuse, maintain, recall)

4
Context: Information management
  • Domains
  • Operational
  • Analytical
  • Strategic

5
Context: Indexing
  • Producing representations of records or documents
    that constitute a finding aid to the records in a
    database or to part of a document
  • Assigned indexing
  • Derived indexing

6
Indexer qualities
  • The Art of assigned indexing
  • Empathy
  • Meticulousness
  • Consistency
  • General knowledge
  • Patience

7
Indexing guidelines
  • Conceptual analysis and assigning
  • Aboutness
  • Elements of the document to consider
  • Exhaustivity
  • Specificity
  • Index what is in the item
  • Co-ordination

8
Assigned index representations
  • Alphabetical Subject
  • Classified
  • Alphabetical
  • Notation
  • Chain

9
Indexing exercise
  • How consistent is database indexing?
  • Example: the same paper in multiple databases
  • Middleton, M. Skills expectations of library
    graduates. http://eprints.qut.edu.au/archive/00000094/
  • Index it yourself
  • Compare your indexing with others
  • Compare the indexing in ERIC and INSPEC
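
One way to put a number on the comparison in this exercise is to measure the overlap between two indexers' term sets. The sketch below uses a simple intersection-over-union score; the function name and the sample terms are hypothetical and invented for illustration, not taken from ERIC or INSPEC.

```python
def indexing_consistency(terms_a, terms_b):
    """Share of index terms two indexers agree on (intersection over union)."""
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    if not a and not b:
        return 1.0  # two empty term sets agree trivially
    return len(a & b) / len(a | b)


# Hypothetical term sets for the same paper, invented for illustration only.
indexer_1 = ["Librarians", "Job skills", "Graduates", "Employment"]
indexer_2 = ["Graduates", "Job skills", "Information science"]

score = indexing_consistency(indexer_1, indexer_2)
print(f"consistency = {score:.2f}")
```

A score of 1.0 means both indexers chose exactly the same terms; 0.0 means no term in common.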

10
Context: Metadata
  • Agent
  • Document description
  • Responsibility
  • Administrative
  • Provenance
  • Connections
  • Conditions of use

11
Context: Metadata
  • Content
  • Topic (application of vocabulary control)
  • Coverage
  • Role

12
Controlled vocabulary
  • Thesaurus
  • A controlled vocabulary of terms in natural
    language that are designed for post-coordination
  • Classification scheme
  • A scheme for organisation by categories in a
    systematic manner; this may involve grouping by
    subject, function or other criteria, or
    determining document naming conventions
  • Often involves notation

13
Purpose
  • Indexing: translating diverse natural language
    to consistent terminology
  • Establishing relationships among terms
  • Information retrieval: improving precision and
    recall
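
The first purpose, translating diverse natural language to consistent terminology, can be sketched as a lookup from non-preferred entry terms to preferred terms. The mapping, the term list and the function name below are hypothetical and only illustrate the idea; a real thesaurus would supply its own entry vocabulary.

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred term.
USE = {
    "motion picture cameras": "MOVING PICTURE CAMERAS",
    "movie cameras": "MOVING PICTURE CAMERAS",
    "photo cameras": "STILL CAMERAS",
}
# Hypothetical list of preferred (postable) terms.
PREFERRED = {"MOVING PICTURE CAMERAS", "STILL CAMERAS", "CAMERAS"}


def to_preferred(term):
    """Return the preferred term for a natural-language term, or None if unknown."""
    key = " ".join(term.lower().split())  # normalise case and spacing
    if key in USE:
        return USE[key]
    if key.upper() in PREFERRED:
        return key.upper()
    return None


raw_terms = ["Movie cameras", "motion picture  cameras", "cameras"]
indexed = sorted({to_preferred(t) for t in raw_terms} - {None})
print(indexed)
```

Two different natural-language phrasings collapse to one preferred term, which is what lets indexers and searchers meet on the same vocabulary.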

14
History
  • Bibliographic databases
  • Many applications, list of online associated
    thesauri and classification schemes at
    http://sky.fit.qut.edu.au/middletm/cont_voc.html
  • Standards
  • ISO 2788, ISO 5964
  • ANSI Z39.19

15
Thesaurus principles
  • Term relationships
  • Continuing evolution
  • Internally consistent hierarchies to support
    database searching

16
The Thesaurus
  • The vocabulary of a controlled indexing language
    formally organised so that the a priori
    relationships between concepts are made explicit.
  • A thesaurus is an example of metadata

17
Thesaurus extract (ISO sample)
  • 35 mm CAMERAS
    BT MINIATURE CAMERAS
  • CAMERAS
    BT OPTICAL EQUIPMENT
    NT MOVING PICTURE CAMERAS
       STEREO CAMERAS
       STILL CAMERAS
       UNDERWATER CAMERAS
    RT PHOTOGRAPHY
  • CINE CAMERAS
    BT MOVING PICTURE CAMERAS
    NT UNDERWATER CINE CAMERAS
    RT CINEMA
  • CINEMA
    RT CINE CAMERAS
  • INSTANT PICTURE CAMERAS
    SN Cameras which produce a finished print directly
    BT STILL CAMERAS
  • Land cameras USE VIEW CAMERAS
  • MICROSCOPES
    BT OPTICAL EQUIPMENT
  • MINIATURE CAMERAS
    BT STILL CAMERAS
    NT 35 mm CAMERAS
  • MOVING PICTURE CAMERAS
    BT CAMERAS
    NT CINE CAMERAS
       TELEVISION CAMERAS
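
An extract like this can be held as a simple data structure and checked for reciprocity: every BT should be answered by an NT, and every RT by an RT. The sketch below is a minimal illustration covering part of the extract. The dictionary layout and function name are assumptions, and TELEVISION CAMERAS is read here as a second NT of MOVING PICTURE CAMERAS.

```python
# Each relationship type and the type expected on the other side.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}

# Part of the extract; TELEVISION CAMERAS as an NT is an assumed reading.
THESAURUS = {
    "35 mm CAMERAS": {"BT": ["MINIATURE CAMERAS"]},
    "CAMERAS": {
        "BT": ["OPTICAL EQUIPMENT"],
        "NT": ["MOVING PICTURE CAMERAS", "STEREO CAMERAS",
               "STILL CAMERAS", "UNDERWATER CAMERAS"],
        "RT": ["PHOTOGRAPHY"],
    },
    "CINE CAMERAS": {
        "BT": ["MOVING PICTURE CAMERAS"],
        "NT": ["UNDERWATER CINE CAMERAS"],
        "RT": ["CINEMA"],
    },
    "CINEMA": {"RT": ["CINE CAMERAS"]},
    "MINIATURE CAMERAS": {"BT": ["STILL CAMERAS"], "NT": ["35 mm CAMERAS"]},
    "MOVING PICTURE CAMERAS": {
        "BT": ["CAMERAS"],
        "NT": ["CINE CAMERAS", "TELEVISION CAMERAS"],
    },
}


def missing_reciprocals(thesaurus):
    """List (term, relation, target) links whose target lacks the return link."""
    problems = []
    for term, relations in thesaurus.items():
        for rel, targets in relations.items():
            for target in targets:
                entry = thesaurus.get(target)
                if entry is None:
                    continue  # target has no entry in this extract
                if term not in entry.get(RECIPROCAL[rel], []):
                    problems.append((term, rel, target))
    return problems


print(missing_reciprocals(THESAURUS))
```

Targets without their own entry in the extract (for example OPTICAL EQUIPMENT) are skipped rather than reported, since the extract is deliberately partial.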

19
Standardising the Vocabulary
  • Types of entities, forms of terms
  • Singular vs plural
  • Homonyms
  • Choice of terms
  • Scope notes and history notes

20
Compound terms
  • Terms should be factored into simpler elements to
    improve users' understanding.
  • Semantic factoring
  • Syntactic factoring

21
Semantic Relationships
  • Equivalence
  • Establishing relationships between preferred
    (postable) and non-preferred (non-postable) terms
  • Hierarchical
  • Establishing relationships between subordinate
    and superordinate terms. These may be
    distinguished as
  • Generic
  • Whole-part
  • Instance
  • Associative
  • Establishing relationships between terms that are
    mentally associated, but not equivalent or
    hierarchical
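
Hierarchical relationships are what allow a search on a broad term to pick up everything beneath it. The sketch below shows that kind of expansion over a small narrower-term table whose entries are borrowed from the ISO sample extract; the table layout and function name are assumptions made for illustration.

```python
# Narrower-term (NT) links, borrowed from the ISO sample extract.
NARROWER = {
    "CAMERAS": ["MOVING PICTURE CAMERAS", "STILL CAMERAS"],
    "MOVING PICTURE CAMERAS": ["CINE CAMERAS"],
    "STILL CAMERAS": ["MINIATURE CAMERAS"],
    "MINIATURE CAMERAS": ["35 mm CAMERAS"],
}


def expand(term, narrower=NARROWER):
    """Return the term plus all of its narrower terms, at every level."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against accidental cycles in the hierarchy
        seen.append(current)
        stack.extend(narrower.get(current, []))
    return sorted(seen)


print(expand("STILL CAMERAS"))
```

Searching on the expanded set tends to raise recall, because documents indexed only under a narrower term are still retrieved.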

22
But: the functions thesaurus
  • Whereas agenda papers might ordinarily have
    the broader term documents,
  • in a functions thesaurus agenda papers might have
    the broader term meetings

23
Applying a functional thesaurus
  • Top Term: PERSONNEL
  • Scope Notes: The function of managing all
    employees
  • Related Terms: COMPENSATION, ESTABLISHMENT,
    INDUSTRIAL RELATIONS, etc.
  • Narrower Terms: ALLOWANCES, APPEALS (Decisions),
    APPOINTMENT, ARRANGEMENTS, AUTHORISATION,
    COMMITTEES, COMPLIANCE, etc.
  • Use For Terms

25
Thesaurus Display
  • Alphabetical hierarchies
  • One level above and below entry term
  • Complete hierarchy for each term or separate TT
    display
  • Permuted term lists
  • Combination with classification notation
  • Graphic Displays
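
A permuted term list gives every word of a multi-word term its own access point. As a rough sketch of how such a display could be generated (the function name is hypothetical and the sample terms are taken from the ISO extract):

```python
def permuted_index(terms):
    """Return (access word, full term) pairs sorted by access word, then term."""
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word.upper(), term))
    return sorted(entries)


terms = ["CINE CAMERAS", "STILL CAMERAS", "OPTICAL EQUIPMENT"]
for word, term in permuted_index(terms):
    print(f"{word:<10} {term}")
```

A user who looks under CAMERAS finds both CINE CAMERAS and STILL CAMERAS, even though neither term files under C or S in the way they expect.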

26
Applying a thesaurus
  • Download Term Tree from http://www.termtree.com.au
  • Free trial download from

27
Thesaurus software
  • Assigned
  • Integrated database
  • Deriving terminology

28
Thesaurus software - assigned
  • Terms are assigned by vocabulary specialists in
    an independent database
  • a.k.a.
  • Synercon Management Consulting
  • MultiTes
  • OpenCyc
  • SuperTHES
  • from THESmain/THESshow for mono-/multilingual
    thesauri
  • Term Tree 2000
  • WebChoir
  • Wordmap

29
Thesaurus software - integrated database
  • Terms are assigned by specialists; the thesaurus
    works like an active data dictionary to control
    the database
  • BASIS
  • InMagic Bibliotech PRO
  • BRS/Search
  • STAR

30
Thesaurus software for deriving terminology
  • Terms are created automatically from text
  • Entrieva
  • SemioTagger, SemioMap and SemioSkyline for
    viewing
  • Intology
  • taxonomy builder
  • Verity
  • Thematic Mapping
  • Autonomy
  • taxonomy generation and categorization
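
As a very rough illustration of deriving candidate terms from text, the sketch below counts word frequencies after removing stopwords. It is a minimal sketch only and does not describe how any of the products listed above work; the stopword list, function name and sample text are all assumptions.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "of", "or", "the", "to"}


def derive_terms(text, top_n=3):
    """Return the top_n most frequent non-stopword words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)


sample = ("A thesaurus is a controlled vocabulary of terms. "
          "Terms in the thesaurus are linked by relationships, "
          "and the relationships support retrieval of terms.")
print(derive_terms(sample))
```

The output is only a list of candidates; deciding which candidates become preferred terms, and how they relate, is still vocabulary work.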

31
Thesaurus Building - 1
  • Users
  • Define
  • Identify needs
  • Define thesaurus range and depth
  • Raw vocabulary building
  • Identify sources
  • Collect and record terms

32
Thesaurus Building - 2
  • Vocabulary organisation
  • Cluster terms
  • Establish relationships using symbols
  • Maintenance

33
Business application
  • Not long-term collaborative efforts of
    classification specialists
  • Instead, adapt to business changes
  • Not just descriptions of present business
    processes
  • Instead, reflect strategic planning, competitors
  • Not necessarily a single taxonomy
  • Instead, multiple overlapping taxonomies

34
Content management
  • Describe content as it is being created rather
    than classify after creation
  • User-needs orientation

35
Integrating taxonomies
  • Accurate reporting
  • Exchange of data
  • Assist resource discovery
  • Information retrieval

36
Thesaurus evaluation
  • Qualities
  • Information retrieval evaluation

37
Thesaurus Qualities
  • Scope and features description
  • Display forms
  • Correctness of hierarchies
  • Use of scope, history and qualification
  • Adherence to standards
  • Syndetic measures
  • Connectedness
  • Accessibility
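
The slide names the syndetic measures without defining them. The sketch below assumes one common reading: connectedness as the share of terms that take part in at least one BT, NT or RT link, and accessibility as the average number of links leading to a term. Both definitions, the function name and the sample data are assumptions made for this illustration.

```python
def syndetic_measures(links, all_terms):
    """links: dict mapping a term to the set of terms it refers to (BT, NT or RT).

    Returns (connectedness, accessibility) under the assumed definitions:
    share of terms with at least one link, and mean incoming links per term.
    """
    incoming = {t: 0 for t in all_terms}
    connected = set()
    for source, targets in links.items():
        for target in targets:
            connected.update((source, target))
            if target in incoming:
                incoming[target] += 1
    connectedness = len(connected & set(all_terms)) / len(all_terms)
    accessibility = sum(incoming.values()) / len(all_terms)
    return connectedness, accessibility


# Small hypothetical vocabulary: CINEMA is deliberately left unlinked.
terms = ["CAMERAS", "STILL CAMERAS", "MINIATURE CAMERAS", "CINEMA"]
links = {
    "CAMERAS": {"STILL CAMERAS"},
    "STILL CAMERAS": {"CAMERAS", "MINIATURE CAMERAS"},
    "MINIATURE CAMERAS": {"STILL CAMERAS"},
}
connectedness, accessibility = syndetic_measures(links, terms)
print(connectedness, accessibility)
```

An unlinked ("orphan") term such as CINEMA in this sample lowers connectedness, which is one quick signal that relationships may be missing.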

38
Thesauri: Retrieval evaluation
  • Cranfield experiments since
  • Recall and precision
  • Influence on indexing
  • Conceptual analysis
  • Translation failure
  • Omissions
  • Exhaustivity/Specificity
  • Syntax and false drops
  • Maintenance costs
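
Recall and precision are the two ratios at the centre of this kind of evaluation: recall is the share of relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. A minimal sketch, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers: d1 and d3 are false drops, d7-d9 were missed.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7", "d8", "d9"}
p, r = precision_recall(retrieved, relevant)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

False drops lower precision, while omissions and translation failures at indexing time tend to lower recall.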

39
Post-controlled vocabularies
  • Use of a hedge of terms to represent a broad
    concept, e.g.
  • psychological aspects of..........
  • ........in Australia
  • ....review items on.....
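
A hedge can be thought of as a stored list of alternative terms that is OR-ed together and then combined with the searcher's topic. The sketch below is one way to picture that. The hedge contents, function name and Boolean syntax are assumptions for illustration and do not follow any particular search system.

```python
# Hypothetical stored hedge; a real one would list many more place names.
HEDGES = {
    "in Australia": ["Australia", "Queensland", "Victoria"],
}


def build_query(topic, hedge_name, hedges=HEDGES):
    """AND a topic with the OR-ed terms of a stored hedge."""
    alternatives = " OR ".join(f'"{t}"' for t in hedges[hedge_name])
    return f'"{topic}" AND ({alternatives})'


query = build_query("thesauri", "in Australia")
print(query)
```

The control is applied at search time rather than at indexing time, which is why this approach is described as post-controlled.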

40
Still to come
  • Research areas
  • Metathesauri
  • Super interlinked vocabularies (e.g. NLM)
  • Semantic Web
  • Enhancing word association with usage statistics
    like links (e.g. THESUS)

41
Review
  • Controlled vocabulary types
  • Software support
  • Business processes
  • Website
  • http://sky.fit.qut.edu.au/middletm/cont_voc.html
  • (about to move to a database-driven site;
    redirection will be applied)

42
Questions?