Thesaurus Construction and Use - PowerPoint PPT Presentation

1
Thesaurus Construction and Use
  • University of California, Berkeley
  • School of Information
  • IS 245: Organization of Information in Collections

2
Lecture Overview
  • Review
  • Facetted Classification
  • Traditional vs. Facetted Classification
  • Designing Facetted Classifications
  • Today
  • Thesaurus design
  • Steps in Thesaurus development
  • Indexing

3
Hierarchical Classification
Slide author: Marti Hearst
4
Labeled Categories for Hierarchical Classification
  • LITERATURE
      • 100 English Literature
          • 110 English Prose
              • English Prose 16th Century
              • English Prose 17th Century
              • English Prose 18th Century
              • ...
          • 111 English Poetry
              • 121 English Poetry 16th Century
              • 122 English Poetry 17th Century
              • ...
          • 112 English Drama
              • 130 English Drama 16th Century
      • 200 French Literature

Slide author: Marti Hearst
5
Facetted Categories
  • Mutually exclusive
      • Non-overlapping, distinct categories
  • Relational
      • Relations between facets, subfacets, and foci
        (elements) are not restricted to hierarchical
        generalization-specialization relations
  • Composable
      • Combined using grammars of order and relation to
        form compound descriptions

6
Facetted Classification Along With Labeled Categories
  • A Language
      • a English
      • b French
      • c Spanish
  • B Genre
      • a Prose
      • b Poetry
      • c Drama
  • C Period
      • a 16th Century
      • b 17th Century
      • c 18th Century
      • d 19th Century
  • Aa English Literature
  • AaBa English Prose
  • AaBaCa English Prose 16th Century
  • AbBbCd French Poetry 19th Century
  • BcCd Drama 19th Century

Slide author: Marti Hearst
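
The compound codes on this slide can be read mechanically: each capital letter names a facet and the lowercase letter after it names a focus within that facet. The sketch below is only an illustration of that reading. The facet tables are copied from the slide, while the function names decode and label are made up for this example.

```python
import re

# Facet tables transcribed from the slide; everything else is illustrative.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}

def decode(code):
    """Split a compound code such as 'AaBaCa' into (facet, focus) pairs."""
    pairs = re.findall(r"([A-Z])([a-z])", code)
    # Reject anything that is not a clean sequence of facet/focus letters.
    if "".join(facet + focus for facet, focus in pairs) != code:
        raise ValueError(f"malformed code: {code!r}")
    return [(FACETS[facet][0], FACETS[facet][1][focus]) for facet, focus in pairs]

def label(code):
    """Join the foci of a compound code into a readable description."""
    return " ".join(focus for _, focus in decode(code))

print(label("AaBaCa"))  # English Prose 16th Century
print(label("AbBbCd"))  # French Poetry 19th Century
```

Note that label("Aa") yields just "English"; the slide's "English Literature" adds the name of the overall subject, which this sketch does not model.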
7
Ranganathan
  • PMEST Facets
      • P(ersonality)
          • WHO: The most important types or names of things
            for the particular discipline
      • M(atter)
          • WHAT: Constituent materials
      • E(nergy)
          • HOW: Action or activity terms
      • S(pace)
          • WHERE: Where things occur
      • T(ime)
          • WHEN: When things occur

8
Classical CRG/BC2 Facet Analysis
  • Entity
  • Kind
  • Part
  • Property
  • Material
  • Process
  • Operation
  • Patient
  • Product
  • By-Product
  • Agent
  • Space
  • Time

9
Classical Facet Analysis
  • What is being done?
      • Entity
      • Kind
      • Product
      • By-Product
  • What are its parts?
      • Part
  • What are its properties?
      • Property
      • Material
  • How is this achieved?
      • Process
  • By what means?
      • Operation
  • By whom?
      • Agent
      • Patient
  • Where?
      • Space
  • When?
      • Time

10
Classical Facet Analysis
  • Nouns
      • Entity
      • Kind
      • Part
      • Patient
      • Product
      • By-Product
      • Agent
  • Adjectives
      • Property
      • Material
  • Intransitive Verb
      • Process
  • Transitive Verb
      • Operation
  • Adverb
      • Space
      • Time

11
Semantic and Syntactic Relationships
  • Semantic relationships
      • Is-A (thing/kind, genus/species)
          • Mammals
              • Primates
                  • Humans
      • Has-Parts
          • Human
              • Head
                  • Eyes
  • Syntactic relationships
      • Compounds
          • Wheat + harvesting = wheat harvesting
          • Object + operation = operation on object
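
A minimal sketch of the two kinds of relationship, using the slide's own examples. It assumes the examples are nested as Mammals > Primates > Humans and Human > Head > Eyes; the dictionary and function names are invented for illustration.

```python
# Semantic relationships hold between terms regardless of any document.
IS_A = {"Humans": "Primates", "Primates": "Mammals"}   # species -> genus
HAS_PARTS = {"Human": ["Head"], "Head": ["Eyes"]}       # whole -> parts

def genus_chain(term):
    """Follow Is-A links upward and return every broader kind."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

def all_parts(whole):
    """Follow Has-Parts links downward and return every nested part."""
    parts = []
    for part in HAS_PARTS.get(whole, []):
        parts.append(part)
        parts.extend(all_parts(part))
    return parts

def compound(obj, operation):
    """Syntactic relationship: combine an object term with an operation term."""
    return f"{obj} {operation}"

print(genus_chain("Humans"))           # ['Primates', 'Mammals']
print(all_parts("Human"))              # ['Head', 'Eyes']
print(compound("wheat", "harvesting")) # wheat harvesting
```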

12
Facetted Classification
  • Clearly distinguishes between semantic
    relationships and syntactic relationships
  • Semantic relationships
      • Within a facet
      • Containment relations
  • Syntactic relationships
      • Across facets
      • Combinatoric relations
  • Have a syntax for syntactic combination of
    semantic terms

13
Power of Facet Combinations
  • The syntactic relations of facetted
    classifications enable a small controlled
    vocabulary to produce:
      • Many, many structured descriptions
      • Complex, but formally structured descriptions
        using nested compound descriptions
      • Descriptions for things we do not have words for
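
A rough way to see this effect is to count the descriptions that a small vocabulary can generate. The sketch below reuses the Language, Genre, and Period facets from the earlier labeled-categories slide and assumes a deliberately simple syntax: at most one focus per facet, in fixed facet order. Real facetted schemes allow richer combinations, so the count here is a lower bound for illustration only.

```python
from itertools import product

facets = {
    "Language": ["English", "French", "Spanish"],
    "Genre": ["Prose", "Poetry", "Drama"],
    "Period": ["16th Century", "17th Century", "18th Century", "19th Century"],
}

def compound_descriptions(facets):
    """Yield every description that uses at most one focus per facet."""
    # None stands for "this facet is not used in the description".
    options = [[None] + foci for foci in facets.values()]
    for combo in product(*options):
        chosen = [focus for focus in combo if focus is not None]
        if chosen:  # skip the empty description
            yield " ".join(chosen)

descriptions = list(compound_descriptions(facets))
vocabulary_size = sum(len(foci) for foci in facets.values())
print(vocabulary_size, len(descriptions))  # 10 79
```

Ten controlled terms yield 79 distinct descriptions under this simple syntax, since (3+1) x (3+1) x (4+1) - 1 = 79.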

14
Today
  • More on thesaurus standards and examples

15
Types of Indexing Languages
  • Uncontrolled keyword indexing
  • Indexing languages
      • Controlled, but not structured
  • Thesauri
      • Controlled and structured
  • Classification systems
      • Controlled, structured, and coded
  • Facetted classification systems

16
Thesauri
  • A Thesaurus is a collection of selected
    vocabulary (preferred terms or descriptors) with
    links among synonymous, equivalent, broader,
    narrower and other related terms
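
As a sketch of how such links might be stored, the toy class below records each link together with its reciprocal, using the conventional relationship tags USE, UF (used for), BT (broader term), NT (narrower term), and RT (related term). The class name, method names, and sample terms are all invented for this example.

```python
from collections import defaultdict

class Thesaurus:
    """Toy thesaurus that keeps reciprocal links consistent."""

    RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}

    def __init__(self):
        # term -> relation tag -> set of linked terms
        self.links = defaultdict(lambda: defaultdict(set))

    def link(self, term, relation, other):
        """Record a link and its reciprocal in one step."""
        self.links[term][relation].add(other)
        self.links[other][self.RECIPROCAL[relation]].add(term)

    def entry(self, term):
        """Format one thesaurus entry as indented text."""
        lines = [term]
        for relation in ("USE", "UF", "BT", "NT", "RT"):
            for other in sorted(self.links[term][relation]):
                lines.append(f"  {relation} {other}")
        return "\n".join(lines)

t = Thesaurus()
t.link("Poetry", "BT", "Literature")  # hypothetical sample terms
t.link("Poetry", "UF", "Verse")
t.link("Poetry", "RT", "Poets")
print(t.entry("Poetry"))
print(t.entry("Verse"))
```

Storing the reciprocal at link time means the entry for "Verse" automatically points back to the preferred term "Poetry".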

17
Thesaurus Standards
  • National and International Standards for Thesauri
      • ANSI/NISO Z39.19-1994 -- American National
        Standard Guidelines for the Construction, Format
        and Management of Monolingual Thesauri
      • ANSI/NISO Draft Standard Z39.4-199x -- American
        National Standard Guidelines for Indexes in
        Information Retrieval
      • ISO 2788 -- Documentation -- Guidelines for the
        establishment and development of monolingual
        thesauri
      • ISO 5964 -- Documentation -- Guidelines for the
        establishment and development of multilingual
        thesauri

18
Thesaurus Examples
  • Examples
      • Non-Facetted
          • The ERIC Thesaurus of Descriptors
      • Semi-Facetted
          • The Medical Subject Headings (MeSH) of the
            National Library of Medicine
      • Facetted
          • The Art and Architecture Thesaurus

19
ERIC Thesaurus Entry
20
ERIC Thesaurus Alphabetic
21
ERIC Thesaurus KWIC Index
22
ERIC Thesaurus Hierarchies
23
ERIC Thesaurus Groups
24
ERIC Thesaurus Online
http://www.ericfacility.net/extra/pub/thessearch.cfm
25
MeSH Entry
26
MeSH Alphabetic
27
MeSH Tree Structures
28
MeSH KWOC Index
29
MeSH - Online
http://www.nlm.nih.gov/mesh/meshhome.html
30
AAT Facets
31
AAT Hierarchies (print)
32
AAT Hierarchies (online)
http://www.getty.edu/research/tools/vocabulary/aat/
33
AAT Entry (online)
34
Lecture Overview
  • Thesaurus Design and Development
  • Controlled Vocabularies for topical description
  • Thesaurus Design
  • Steps In Thesaurus Development (intro)

35
Why Develop a Thesaurus?
  • To provide a conceptual structure or space for
    a body of information
  • To make it possible to adequately describe the
    topical content of information resources at an
    appropriate level of generality or specificity
  • To provide enhanced search capabilities and to
    improve the effectiveness of searching (i.e., to
    retrieve most of the relevant material without
    too much irrelevant material)

36
Why Develop a Thesaurus?
  • To provide vocabulary (or terminological) control
  • When there are several possible terms designating
    a single concept, the thesaurus should lead the
    indexer or searcher to the appropriate concept,
    regardless of the terms they start with
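
One simple way to picture vocabulary control is an entry vocabulary: a lookup table in which every known variant of a term points to the single preferred term for its concept. The sketch below is an assumption-laden illustration; the class, the normalization rule (lowercasing and collapsing whitespace), and the sample terms are not from the lecture.

```python
def normalize(term):
    """Fold case and collapse whitespace so trivial variants match."""
    return " ".join(term.lower().split())

class EntryVocabulary:
    def __init__(self):
        self._preferred = {}  # normalized entry term -> preferred term

    def add_concept(self, preferred, *variants):
        """Register a preferred term and the variants that lead to it."""
        for term in (preferred, *variants):
            self._preferred[normalize(term)] = preferred

    def lookup(self, term):
        """Return the preferred term, or None if the term is unknown."""
        return self._preferred.get(normalize(term))

vocab = EntryVocabulary()
vocab.add_concept("Automobiles", "Cars", "Motor cars", "Autos")  # hypothetical

print(vocab.lookup("motor  cars"))  # Automobiles
print(vocab.lookup("AUTOS"))        # Automobiles
print(vocab.lookup("Bicycles"))     # None
```

Indexers and searchers both go through the same lookup, so they arrive at the same descriptor whichever term they start with.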

37
Preliminary Considerations
  • What is used now?
  • Continue using an existing thesaurus?
  • Ad hoc modification of existing thesaurus?
  • Develop a new well-structured thesaurus?
  • What is the scope and complexity of the subject
    field?
  • What kind of retrieval objects or data will be
    dealt with?
  • How exhaustive and specific is the desired
    description of objects?

38
Preliminary Considerations
  • The scope and complexity of the field will
    provide some indication of the scope and
    complexity of the thesaurus
  • It is better to plan for a larger and more
    comprehensive system than for a smaller system that
    will rapidly become inadequate as the database
    grows
  • Development of a good thesaurus requires a major
    intellectual effort as well as clerical
    operations like data entry and production of
    sorted lists

39
Development of a Thesaurus
  • Term Selection.
  • Merging and Development of Concept Classes.
  • Definition of Broad Subject Fields and Subfields.
  • Development of Classificatory Structure.
  • Review, Testing, Application, Revision.

40
1. Term Selection
  • Select sources for the collection of terms.
      • Prearranged Sources
      • Open-ended Sources
  • Assign codes to each source.
  • Selection of terms
      • For part of the prearranged sources and for all
        open-ended sources
  • Enter terms into database with all information.

41
1.1 Kinds of Sources
  • Prearranged Sources
      • Existing descriptor lists, classification schemes,
        and thesauri. This includes universal schemes like
        DDC or LCSH.
      • Nomenclatures of single disciplines
      • Treatises on the terminology of a field
      • Encyclopedias, lexica, dictionaries and
        glossaries.
      • Tables of contents of textbooks and handbooks
      • Indexes of journals or abstracting journals
      • Indexes of other publications in the field

42
1.1 Kinds of Sources
  • Open-ended sources
      • Lists of search requests or interest profiles
      • Description of projects/activities to be served
        by the information retrieval system.
      • Discussion with specialists in the field
      • Sample of documents in the field
      • Ask users why and how these documents relate to
        the field.
      • Have documents indexed by experts in the field
      • Lists of titles of documents in the field
      • Abstracts and reviews of documents
      • Your own knowledge

43
Selection of sources
  • Prearranged sources require less effort in
    gathering the material, and may already indicate
    some relationships between terms and concepts and
    among the terms themselves.
  • Open-ended sources can reflect current
    terminology and may provide more complete
    coverage.
  • Choose a set of sources that are current, as
    complete as possible, and considered
    authoritative.

44
Selection of Sources
  • Each selected source is assigned an ID for
    tracking its use in the development of the
    thesaurus.
  • Useful when making decisions about which terms to
    prefer
  • Useful for backtracking when questions arise
    (where did this come from?)
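
A minimal sketch of source tracking, assuming each source gets a short ID and every recorded term keeps the set of IDs it was taken from. The IDs, source descriptions, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical source registry: ID -> description of the source.
sources = {
    "P1": "existing thesaurus (prearranged)",
    "O1": "log of search requests (open-ended)",
}
term_sources = defaultdict(set)  # term -> IDs of the sources it came from

def record(term, source_id):
    """Record that a term was collected from a registered source."""
    if source_id not in sources:
        raise KeyError(f"unknown source ID: {source_id}")
    term_sources[term].add(source_id)

def where_from(term):
    """Answer 'where did this come from?' for a term."""
    return sorted(term_sources.get(term, ()))

record("automobiles", "P1")
record("cars", "O1")
record("cars", "P1")

print(where_from("cars"))      # ['O1', 'P1']
print(where_from("scooters"))  # []
```

A term backed by several sources, or by a source considered authoritative, is a stronger candidate when choosing which term to prefer.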

45
Selection of Terms
  • Terms can be transferred directly from
    prearranged sources to the recording medium
    (cards or database)
  • Have to decide which terms and references to
    include, or to take the whole source

46
Selection of Terms
  • In open-ended sources you read through the source
    and pick out terms (i.e., words and phrases) that
    might be useful in retrieval or as references to
    other terms.
  • Alternatively, use keyword and phrase extraction
    software to create lists of terms and select from
    those.
  • Transfer selected terms to the recording medium
    (cards or database).
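
The slides do not name any particular extraction software, so the sketch below is only a crude stand-in for it: it counts single words and adjacent word pairs that contain no stopwords and keeps those that recur. The stopword list, the threshold, and the sample texts are assumptions made for the example.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is",
             "are", "on", "with"}

def candidate_terms(texts, min_count=2):
    """Return (term, count) pairs for recurring words and two-word phrases."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        # Single words that are not stopwords.
        counts.update(w for w in words if w not in STOPWORDS)
        # Adjacent word pairs in which neither word is a stopword.
        counts.update(f"{a} {b}" for a, b in zip(words, words[1:])
                      if a not in STOPWORDS and b not in STOPWORDS)
    return [(term, n) for term, n in counts.most_common() if n >= min_count]

texts = [
    "Thesaurus construction for information retrieval",
    "Information retrieval with a controlled vocabulary",
    "Controlled vocabulary and thesaurus construction",
]
for term, n in candidate_terms(texts):
    print(n, term)
```

The output is a candidate list only; a person still decides which candidates are useful for retrieval.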

47
2. Merging and Development of Concept Classes
  • Sort the term database into alphabetical order.
  • First round: merge information for identical
    terms, possibly pulling information from additional
    sources.
  • Second round: merge synonyms or terms in the same
    concept class.
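
The two merge rounds can be sketched as follows. This is an illustration under simplifying assumptions: "identical" is taken to mean equal after lowercasing, and the synonym pairs are supplied by hand, since deciding what counts as a synonym is an editorial judgment. The records and names are invented.

```python
from collections import defaultdict

# Hypothetical (term, source ID) records from the term selection step.
records = [("Cars", "S1"), ("cars", "S2"), ("Automobiles", "S1"), ("Trucks", "S3")]
synonym_pairs = [("cars", "automobiles")]  # editorial decisions, made by hand

def merge_identical(records):
    """First round: sort, then pool sources for terms that differ only in case."""
    merged = defaultdict(set)
    for term, source in sorted(records, key=lambda r: r[0].lower()):
        merged[term.lower()].add(source)
    return dict(merged)

def merge_synonyms(merged, synonym_pairs):
    """Second round: group synonymous terms into concept classes."""
    parent = {term: term for term in merged}

    def find(term):
        # Walk up to the representative term of the concept class.
        while parent[term] != term:
            term = parent[term]
        return term

    for a, b in synonym_pairs:
        parent[find(a)] = find(b)

    classes = defaultdict(lambda: {"terms": set(), "sources": set()})
    for term, term_source_ids in merged.items():
        root = find(term)
        classes[root]["terms"].add(term)
        classes[root]["sources"] |= term_source_ids
    return list(classes.values())

merged = merge_identical(records)
concept_classes = merge_synonyms(merged, synonym_pairs)
for concept in concept_classes:
    print(sorted(concept["terms"]), sorted(concept["sources"]))
```

Every term named in synonym_pairs must already be in the merged table, otherwise the lookup raises a KeyError.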

48
3. Definition of Broad Subject Fields and Subfields
  • Define Broad Subject fields and sort terms into
    these broad fields
  • Define subfields within each broad field and sort
    terms into these subfields.
  • Work out the detailed structure
      • Select Preferred Terms
      • Merge information for terms in the same concept
        class
  • Repeat these steps
      • for each subfield within a broad field
      • and for each broad field
      • until all terms have been consolidated and
        preferred terms selected

49
4. Development of Classificatory Structure
  • Produce preliminary version of classified index
    and update the working database.
  • Improve classificatory structure.
  • Reality check: produce a version of the classified
    index and distribute it to users/experts.

50
5. Final Stages
  • Review
  • Testing
  • Application
  • Revision

51
Review
  • Discuss classified index with users/experts.
  • Select descriptors and checklist descriptors.
  • Assign Notational Symbols
  • Produce Main Thesaurus Indexes

52
Review (cont.)
  • Check cross references and insert where needed
  • Produce Test Version
  • Test by Indexing
  • Modify as needed
  • Produce Production Version.

53
Testing a Thesaurus
  • Assign descriptors to a sample set of NEW
    documents (use enough to get an idea of any gaps
    in the thesaurus).
  • Test retrieval using sample questions and see
    how effectively the thesaurus maps them to the
    appropriate descriptors.
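
Both tests can be approximated with a few lines of code once the thesaurus exists as a lookup table from entry terms to descriptors. The sketch below assumes such a table with lowercase keys; the sample documents, questions, and function names are hypothetical, and the concepts for each document are assumed to have been identified by a human indexer.

```python
# Hypothetical entry vocabulary: lowercase entry term -> descriptor.
entry_vocabulary = {
    "cars": "Automobiles",
    "automobiles": "Automobiles",
    "trucks": "Trucks",
}

def find_gaps(doc_concepts, entry_vocabulary):
    """Return, per document, the concepts that have no descriptor."""
    gaps = {}
    for doc_id, concepts in doc_concepts.items():
        missing = [c for c in concepts if c.lower() not in entry_vocabulary]
        if missing:
            gaps[doc_id] = missing
    return gaps

def mapping_rate(questions, entry_vocabulary):
    """Fraction of sample query terms that reach the expected descriptor."""
    hits = sum(entry_vocabulary.get(term.lower()) == expected
               for term, expected in questions)
    return hits / len(questions)

doc_concepts = {"doc1": ["Cars", "Trucks"], "doc2": ["Automobiles", "Scooters"]}
questions = [("cars", "Automobiles"), ("lorries", "Trucks")]

print(find_gaps(doc_concepts, entry_vocabulary))  # {'doc2': ['Scooters']}
print(mapping_rate(questions, entry_vocabulary))  # 0.5
```

Here "Scooters" exposes a missing concept and "lorries" exposes a missing entry term for an existing descriptor.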

54
Flow of Work in Thesaurus Construction
55
The Indexing Process
  • Concept identification
  • Term selection (via thesaurus)
  • Term assignment
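
The three steps can be sketched as a small pipeline. In practice concept identification is a human judgment about what a document is about; the phrase matching used here is only a stand-in for it. The entry vocabulary (lowercase, single-spaced keys), the sample text, and the function names are assumptions for this example.

```python
import re

# Hypothetical entry vocabulary: lowercase entry term -> descriptor.
entry_vocabulary = {
    "cars": "Automobiles",
    "automobiles": "Automobiles",
    "road safety": "Traffic safety",
}

def identify_concepts(text, entry_vocabulary):
    """Step 1: find entry terms that occur in the text as whole words."""
    padded = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
    return [term for term in entry_vocabulary if f" {term} " in padded]

def select_terms(concepts, entry_vocabulary):
    """Step 2: map each identified concept to its descriptor."""
    return sorted({entry_vocabulary[concept] for concept in concepts})

def index_document(doc_id, text, entry_vocabulary, index):
    """Step 3: assign the descriptors to the document in the index."""
    descriptors = select_terms(identify_concepts(text, entry_vocabulary),
                               entry_vocabulary)
    for descriptor in descriptors:
        index.setdefault(descriptor, set()).add(doc_id)
    return descriptors

index = {}
assigned = index_document("d1", "Road safety rules for cars and trucks.",
                          entry_vocabulary, index)
print(assigned)  # ['Automobiles', 'Traffic safety']
print(index)
```

"Trucks" is not assigned because the sample vocabulary has no entry for it, which is the kind of gap that testing by indexing is meant to reveal.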

56
Application: The Indexing Process (Manual)
57
Thesaurus Revision and Updates
  • There will always be new concepts, products, or
    expressions that need to be added to the
    thesaurus.
  • Set a regular schedule of reviews and revisions.
  • Collect complaints, problems, etc. and fold into
    revision of the thesaurus

58
References
  • Soergel, D. Indexing Languages and Thesauri:
    Construction and Maintenance. Los Angeles:
    Melville Publishing Co., 1974.
  • Foskett, A.C. The Subject Approach to
    Information. London: Clive Bingley, 1982.
  • Standards
      • ANSI/NISO Z39.19-1994 -- American National
        Standard Guidelines for the Construction, Format
        and Management of Monolingual Thesauri
      • ANSI/NISO Draft Standard Z39.4-199x -- American
        National Standard Guidelines for Indexes in
        Information Retrieval
      • ISO 2788 -- Documentation -- Guidelines for the
        establishment and development of monolingual
        thesauri
      • ISO 5964 -- Documentation -- Guidelines for the
        establishment and development of multilingual
        thesauri