Title: Enhancing search: An update on taxonomies, metadata and thesauri
1. Enhancing search: An update on taxonomies, metadata and thesauri
- Leonard Will
- Willpower Information
2. Summary
- Metadata creation is cataloguing
- Taxonomies are classifications
- Thesauri and classifications are complementary ways of grouping concepts
- Facet analysis is a useful technique for constructing schemes systematically
- Most computer search interfaces are inadequate
3. Metadata = catalogue records
- Resources: any things that can be identified
  - documents, web pages, images, sound files, teaching packages, books, museum objects, people, organisations
- Metadata: structured information about resources
  - May be included with resources (e.g. CIP) or collected in separate union catalogues (e.g. OAI-PMH)
  - Some from the resource itself (size, format), some from external sources (provenance, location, accessibility)
4. Metadata standards
- Anglo-American Cataloguing Rules (AACR)
- Encoded Archival Description (EAD)
- Learning Object Metadata (LOM)
- Spectrum standard for museum information
- Friend of a Friend (FOAF) and vCard
- e-Government Metadata Standard (eGMS)
- Dublin Core - lowest common denominator
5. Kinds of standards
- Content standards: which pieces of information are to be recorded (DC, AACR)
- Value standards: how the information is to be recorded (DC encoding schemes)
  - formats (ISO date format, NCA name formats, AACR)
  - lists of valid values (thesauri, authority files)
- Structure standards: how the information is to be grouped and labelled for use by computers and humans (XML schemas, MARC)
- Application profiles: choices from the above
6. Dublin Core metadata
- Title
- Creator
- Subject
- Description
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
- element refinements
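A minimal sketch of how the fifteen elements above might be used as a simple content standard, with an ISO date check as one example of a value standard. The function name `check_record` and the sample records are illustrative assumptions, not part of Dublin Core itself.

```python
from datetime import date

# The fifteen Dublin Core elements listed on the slide, in lower case.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def check_record(record):
    """Return a list of problems found in a simple element -> value record.

    Hypothetical helper: checks element names (content standard) and,
    if a date is present, that it is a full ISO date (value standard).
    """
    problems = []
    for name in record:
        if name not in DC_ELEMENTS:
            problems.append(f"unknown element: {name}")
    if "date" in record:
        try:
            date.fromisoformat(record["date"])
        except ValueError:
            problems.append(f"date not in ISO format: {record['date']}")
    return problems


# Made-up records, for illustration only.
good = {"title": "A guide to film cameras", "date": "2004-06-15"}
bad = {"title": "A guide to film cameras", "date": "15/06/2004", "colour": "red"}

print(check_record(good))
print(check_record(bad))
```

A real application profile would also say which elements are mandatory and which encoding scheme applies to each one; this sketch checks only names and one date format.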
7. Subject
- Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource.
- Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.
8. Taxonomies / controlled vocabularies
- Taxonomy: woolly meaning -> confusion
  - keep it for biological classification systems
- Knowledge organization systems (KOS)
  - a better expression for the general concept
- Main types are
  - thesauri
  - classification schemes
  - ontologies
9. Thesauri and classification schemes
- Thesauri and classification schemes are alternative ways of showing concepts and their relationships
- They are complementary and both approaches are needed
- They can both be built on the principles of facet analysis
10. Building blocks of all knowledge organisation schemes

35 mm cameras
  CC H012
  BT film cameras

aqualungs
  CC D002
  BT diving equipment

camera accessories
  CC H002
  BT photographic equipment
  NT flash guns
     light meters
     tripods
  RT cameras
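11. Relationships are between concepts, not words
[Diagram: two linked concepts, each carrying several labels and notations]
- Broader concept (BT): vehicles, road vehicles, conveyances, voitures; 388.34, 629.2
- Narrower concept (NT): cars, automobiles, autos, private cars; 388.342, 629.222
Choose one term as a descriptor to label the concept: cars USE automobiles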
12. Preferred term substitution
"I use the term agriculture for farming, so I'll search for that."
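Preferred term substitution can be sketched as a lookup from entry terms to descriptors. The table below uses only the USE references mentioned on these slides; the names `USE` and `preferred_term` are assumptions made for illustration.

```python
# Entry term -> preferred term (descriptor), as in "cars USE automobiles".
USE = {
    "farming": "agriculture",
    "cars": "automobiles",
    "autos": "automobiles",
    "private cars": "automobiles",
}


def preferred_term(term):
    """Return the descriptor for a term; unknown terms pass through unchanged."""
    key = term.strip().lower()
    return USE.get(key, key)


print(preferred_term("Farming"))
print(preferred_term("agriculture"))
```

A searcher who types "farming" is then searching on "agriculture", the term the indexers actually used.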
13. Relationships between concepts
- Paradigmatic, or a priori: apply generally, independently of any specific document (a thesaurus can show these)
  - shoes BT footwear
  - shoes RT shoemakers
- Syntagmatic, or a posteriori: concepts that are related only in the context of a specific document (a classification scheme can also show these)
  - shoes: history
  - shoes: prices
14. Searching hierarchies
"I need information on road vehicles."
"I know that buses, cars and lorries are all kinds of road vehicles, so I'll search for these terms as well as for road vehicles."
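The hierarchical expansion described in this exchange can be sketched as a walk down the NT (narrower term) links. The `NT` table holds only the example from the slide, and `expand` is a hypothetical helper name.

```python
# Broader term -> its narrower terms (NT), taken from the slide's example.
NT = {
    "road vehicles": ["buses", "cars", "lorries"],
}


def expand(term, narrower=NT):
    """Return the term followed by all of its narrower terms, at any depth."""
    found = [term]
    for current in found:  # the list grows while we walk it (breadth-first)
        for child in narrower.get(current, []):
            if child not in found:
                found.append(child)
    return found


print(expand("road vehicles"))
```

The resulting terms would normally be combined with OR, so that a document indexed only under "lorries" is still retrieved by a search for road vehicles.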
15. Searching related terms
"Please give me information about agriculture."
"OK, I'll look for that. Would you also be interested in items dealing with forestry, livestock or pet breeding?"
16. Paradigmatic relationships in a thesaurus
- Many relationships are indicated as RT/RT, but their nature is not specified, so they cannot be used for systematic grouping (ontologies overcome this)
- The hierarchical generic-specific relationship (BT/NT) allows (and requires) grouping of concepts into facets - the terms have to be in the same facet
17. What is a facet? (Sometimes called a fundamental facet)
- A high-level grouping of concepts of the same inherent category, e.g. activities, disciplines, people, materials, places, times. For example:
  - animals, mice, daffodils and bacteria could all be members of a living organisms facet
  - digging, writing and cooking could all be members of an activities facet
  - birthdays, wars and football matches could all be members of an events facet
- A concept cannot belong to more than one facet
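One way to picture the rule that a concept belongs to exactly one facet is a plain mapping from concept to facet, which cannot hold two facets for the same concept. The sketch below uses the slide's examples; `same_facet` is a hypothetical check that a proposed BT/NT pair stays within one facet, and the pairs tested are illustrative.

```python
# Concept -> its single facet. A dict key can hold only one value,
# which mirrors "a concept cannot belong to more than one facet".
FACET_OF = {
    "animals": "living organisms",
    "mice": "living organisms",
    "daffodils": "living organisms",
    "bacteria": "living organisms",
    "digging": "activities",
    "writing": "activities",
    "cooking": "activities",
    "birthdays": "events",
    "wars": "events",
    "football matches": "events",
}


def same_facet(narrower, broader):
    """True if a proposed BT/NT pair lies within a single facet."""
    return FACET_OF[narrower] == FACET_OF[broader]


print(same_facet("mice", "animals"))   # a valid BT/NT candidate
print(same_facet("cooking", "wars"))   # different facets: not BT/NT
```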
18. Facets in the AAT (Art & Architecture Thesaurus)
- associated (i.e. abstract) concepts
- physical attributes
- styles and periods
- agents
- activities
- materials
- objects
19. What is an array? (Sometimes called a subfacet)
- A grouping of concepts within a facet by some stated characteristic of division.
- vehicles
  - <vehicles by number of wheels>
    - bicycles
    - tricycles
    - four-wheeled vehicles
      - automobiles
  - <vehicles by load carried>
    - goods vehicles
      - lorries
    - passenger vehicles
      - automobiles
      - buses
The labels in angle brackets are node labels showing characteristics of division; the concepts grouped under each node label form an array.
A concept may occur in more than one array
20. Parametric search
- Searching for resources that have one or more specified characteristics
  - e.g. vehicles which
    - have three wheels AND
    - are used for carrying passengers
- This is an important and useful aspect of post-coordinate searching, but it is not faceted classification
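A minimal sketch of parametric search as filtering on recorded characteristics. The vehicle records and the field names `wheels` and `carries` are made up for illustration.

```python
# Hypothetical resource descriptions with two recorded characteristics.
VEHICLES = [
    {"name": "tricycle", "wheels": 3, "carries": "passengers"},
    {"name": "three-wheeled van", "wheels": 3, "carries": "goods"},
    {"name": "automobile", "wheels": 4, "carries": "passengers"},
]


def parametric_search(resources, **required):
    """Return names of resources matching every required characteristic (AND)."""
    return [
        r["name"]
        for r in resources
        if all(r.get(key) == value for key, value in required.items())
    ]


print(parametric_search(VEHICLES, wheels=3, carries="passengers"))
```

Nothing here arranges the concepts into a classification; the search simply intersects independent attribute values, which is the distinction the slide draws.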
21. Ways of displaying concepts and their paradigmatic relationships
- 1. Alphabetically, with their relationships

35 mm cameras
  BT film cameras

aqualungs
  BT diving equipment

camera accessories
  BT photographic equipment
  NT flash guns
     light meters
     tripods
  RT cameras
22. Ways of displaying concepts and their paradigmatic relationships
- 2. Hierarchically - one tree for each facet

(fields of work)
. diving
. photography
. physics
. . optics

(people)
<people by age>
. infants
. children
. adults
<people by occupation>
. divers
. models (people)
. photographers
. physicists

(equipment)
. diving equipment
. . aqualungs
. . diving suits
. . . dry suits
. . . wet suits
. . face masks
. photo equipment
. . cameras
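The dotted hierarchical display can be generated from the same BT/NT data that drives the alphabetical display. The sketch below rebuilds the (equipment) tree from the slide; the `NT` table and the `tree_lines` function are illustrative assumptions about how the relationships might be stored.

```python
# Broader term -> narrower terms for the (equipment) facet on the slide.
NT = {
    "diving equipment": ["aqualungs", "diving suits", "face masks"],
    "diving suits": ["dry suits", "wet suits"],
    "photo equipment": ["cameras"],
}


def tree_lines(term, depth=1):
    """Return display lines for a term and its narrower terms, one dot per level."""
    lines = [". " * depth + term]
    for child in sorted(NT.get(term, [])):
        lines.extend(tree_lines(child, depth + 1))
    return lines


print("(equipment)")
for top_term in ["diving equipment", "photo equipment"]:
    print("\n".join(tree_lines(top_term)))
```

Because both displays come from one set of relationships, a change made in the thesaurus editor shows up in each view without separate maintenance.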
23. Ways of displaying concepts and their paradigmatic relationships
- 3. In subject groups or categories (microthesauri) - one tree for each facet in each category

770 PHOTOGRAPHY
(equipment)
. photo equipment
. . cameras
(fields of work)
. photography
. . colour photography
(people)
. models (people)
. photographers

797.23 DIVING
(fields of work)
. diving
. . scuba diving
. . snorkel diving
(people)
. divers
(equipment)
. diving equipment
. . aqualungs
. . diving suits
. . . dry suits
24. Combining concepts: syntagmatic relationships
- (places)
  - A1 Italy
  - A2 The Netherlands
  - A3 Russia
- (people)
  - B1 potters
  - B2 repairers
  - B3 ceramicists
- (activities)
  - C1 moulding
  - C2 throwing
  - C3 decoration
- (objects)
  - D1 earthenware
  - D2 porcelain
  - D3 stoneware
The labels in parentheses are node labels showing facet names.
Combine to express compound subjects:
- either post-coordinate, for searching: porcelain AND decoration AND Russia
- or pre-coordinate, for browsing: porcelain decoration in Russia (D2C3A3)
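The two ways of combining concepts can be sketched side by side. The notation table is copied from the slide; the document index and the function names are hypothetical.

```python
# Notation for each concept, as shown on the slide.
NOTATION = {
    "Italy": "A1", "The Netherlands": "A2", "Russia": "A3",
    "potters": "B1", "repairers": "B2", "ceramicists": "B3",
    "moulding": "C1", "throwing": "C2", "decoration": "C3",
    "earthenware": "D1", "porcelain": "D2", "stoneware": "D3",
}

# Made-up documents, each indexed with separate concepts.
INDEX = {
    "doc1": {"porcelain", "decoration", "Russia"},
    "doc2": {"porcelain", "throwing", "Italy"},
    "doc3": {"earthenware", "decoration", "Russia"},
}


def post_coordinate(index, *terms):
    """Combine concepts at search time: documents indexed with ALL the terms."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)


def pre_coordinate(*terms):
    """Combine concepts at indexing time into one compound class mark."""
    return "".join(NOTATION[term] for term in terms)


print(post_coordinate(INDEX, "porcelain", "decoration", "Russia"))
print(pre_coordinate("porcelain", "decoration", "Russia"))
```

The post-coordinate form returns a set of matches, while the pre-coordinate form yields a single string that can be filed in a browsable sequence.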
25. Order of combining facets
- thing - kind - part - property - material - process - operation - system operated on - product - by-product - agent - space - time - form
- e.g.
  - porcelain (thing) -
  - decoration (process) -
  - in Russia (space)
- A facet may occur more than once in a string
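Applying the order above can be sketched as a sort keyed on each concept's position in the facet sequence. The list is copied from the slide; `in_citation_order` is a hypothetical helper, and the input pairs assume each concept has already been assigned to a facet.

```python
# Facet sequence from the slide, in combining order.
CITATION_ORDER = [
    "thing", "kind", "part", "property", "material", "process",
    "operation", "system operated on", "product", "by-product",
    "agent", "space", "time", "form",
]


def in_citation_order(concepts):
    """Sort (term, facet) pairs into the combining order and return the terms."""
    ranked = sorted(concepts, key=lambda pair: CITATION_ORDER.index(pair[1]))
    return [term for term, _facet in ranked]


compound = [("in Russia", "space"), ("decoration", "process"), ("porcelain", "thing")]
print(" - ".join(in_citation_order(compound)))
```

Python's sort is stable, so two concepts from the same facet keep the order in which they were supplied, which leaves room for a facet to occur more than once in a string.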
26. Faceted classification with processes subordinated to objects
- (processes)
- A ceramic production processes in general
- AA forming in general
- AAA coiling
- AAB moulding
- AAC throwing
- AB decoration in general
- ABA glazing
- ABB transfer printing
- (objects)
- B ceramics in general
- BB earthenware in general
- (processes)
- BB.AA forming of earthenware
- BB.AAB moulding of earthenware
- BB.AB decoration of earthenware
- BB.ABA glazing of earthenware
- BB.ABB transfer printing of earthenware
- BC porcelain in general
Qualifying words such as "in general" and "of earthenware" (shown in blue on the original slide) may be omitted as they are implied by the hierarchical structure
27. Faceted classification: generation of subject strings
- (objects)
- B ceramics
- BB earthenware
- (processes)
- BB.AA forming
- BB.AAB moulding
- BB.AB decoration
- BB.ABA glazing
- BB.ABB transfer printing
- BC porcelain
- (processes)
- BC.AA forming
- BC.AAB moulding
- ceramics > earthenware > forming
- ceramics > earthenware > forming > moulding
- ceramics > earthenware > decoration
- ceramics > earthenware > decoration > glazing
- ceramics > earthenware > decoration > transfer printing
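The subject strings above can be derived mechanically from the notation, because each class mark spells out its own chain of broader classes. This is a sketch under that assumption; the `CAPTIONS` table uses the shortened captions from the slide, and the top-level process class A is deliberately left without a caption so that it is skipped, as in the strings shown.

```python
# Shortened captions for object classes (B...) and process classes (A...).
CAPTIONS = {
    "B": "ceramics", "BB": "earthenware", "BC": "porcelain",
    "AA": "forming", "AAB": "moulding",
    "AB": "decoration", "ABA": "glazing", "ABB": "transfer printing",
}


def subject_string(notation):
    """Expand a class mark such as 'BB.AAB' into its chain of captions."""
    steps = []
    for segment in notation.split("."):
        # Every prefix of a segment is one step up the hierarchy: B, then BB.
        for end in range(1, len(segment) + 1):
            caption = CAPTIONS.get(segment[:end])
            if caption:
                steps.append(caption)
    return " > ".join(steps)


print(subject_string("BB.AAB"))
print(subject_string("BB.ABB"))
```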
28. Alphabetical index
- ceramic production processes A
- ceramics B
- coiling forming ceramic production AAA
- decoration ceramic production AB
- decoration earthenware ceramics BB.AB
- earthenware ceramics BB
- forming ceramic production AA
- forming earthenware ceramics BB.AA
- forming porcelain ceramics BC.AA
- glazing decoration ceramic production ABA
- glazing decoration earthenware ceramics BB.ABA
- moulding earthenware ceramics BB.AAB
- moulding forming ceramic production AAB
- moulding porcelain ceramics BC.AAB
- porcelain ceramics BC
- throwing forming ceramic production AAC
- transfer printing decoration ceramic production ABB
- transfer printing decoration earthenware ceramics BB.ABB
29. The same concepts viewed in different ways
- Classification view
  - Good for browsing or surveying a topic
  - Like a map
  - Like a book's contents page
  - Shows related concepts together
  - Usually arranged by discipline
  - Shows syntagmatic and paradigmatic relationships
  - Shows compound topics as pre-combined subject strings
- Thesaurus view
  - Good for searching if you know what you want
  - Like a gazetteer
  - Like a book's index
  - Gets quickly to individual concepts
  - Usually arranged by facet
  - Shows paradigmatic relationships
  - Lets you combine concepts when searching
30. Some clarifications
- A classification can be both hierarchical and faceted
- A classification built on faceted principles can be enumerative
- A symbolic notation is not essential, and should not determine the structure
- A classification can arrange compound topics in a useful linear sequence - a thesaurus cannot
- One-to-one mapping between a thesaurus and a classification is not possible
- A guide to popular topics may be used to supplement a systematic classification
31. Use of a thesaurus
- A thesaurus as a search aid with unindexed material
  - Allows searching on terms linked to the term asked for
- Software support for formulating questions
  - Browsing the thesaurus to choose terms
  - Combining terms with AND, OR, NOT and ( )
32. An ambiguous search interface
Does this mean (lorries OR cars) AND diesel? Or does it mean lorries OR (cars AND diesel)?
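The two readings really do retrieve different things, which a small set example makes concrete. The document identifiers below are made up; `|` and `&` are Python's set union (OR) and intersection (AND).

```python
# Hypothetical postings: which documents are indexed under each term.
lorries = {"doc1", "doc2"}
cars = {"doc3", "doc4"}
diesel = {"doc2", "doc3"}

# Reading 1: (lorries OR cars) AND diesel
reading_1 = (lorries | cars) & diesel

# Reading 2: lorries OR (cars AND diesel)
reading_2 = lorries | (cars & diesel)

print(sorted(reading_1))
print(sorted(reading_2))
```

Here the second reading also returns doc1, a document about lorries that has nothing to do with diesel. An interface that does not show or let the user set the grouping leaves the searcher unable to tell which result they are looking at.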
33. Thesaurus creation and management
- Standards
  - BS/ISO standards give helpful guidance
  - Draft revised BS standard now out for comments
- Software
  - Many packages available
  - Best if integrated with database used for cataloguing
- Cooperative thesaurus development and use
  - DIY is a major and continuing task
34. Thesaurus development never ends
- It is an ongoing task
- It needs a knowledgeable thesaurus editor
- It needs cooperation and input from indexers and users
- User feedback
35. What we need
- Software for the combined development of thesaurus and classification
  - Thesaurofacet, Classaurus, ROOT, Bliss, Taxomita
- Software support for combining facets when searching, using a thesaurus. Often referred to as faceted classification, but not the same thing
  - Flamenco, View-based searching, No zero match (NZM)
- Software support for browsing in a classified catalogue with notation, captions and an alphabetical index
36. Links and further information
- <http://www.willpowerinfo.co.uk/>