Title: Recap
Slide 1: Recap
- Descriptive metadata elements can be used for access or selection
- For access, it is important to have good authority control to enable users to:
  - Find known items from the information they have available
  - Gather all the items of a similar nature together
  - Choose the right one from among retrieved items
- Authority control takes time and effort, but pays off in better results for users
- Need to balance cost against benefits and make a decision on your approach for each project
- Don't do it halfway, because it's not worth it
Slide 2: Module 5b: Subject Analysis and Indexing
- IMT530 Organization of Information Resources
- Winter 2008
- Michael Crandall
Slide 3: Module 5b Outline
- Subject analysis
  - Definition
  - Why do this?
  - Mai's domain-centered analysis
  - Consistency
- Subject indexing
  - Definition and purpose of subject indexing
  - Types of subject indexing
  - Indexing non-text objects
  - Types of terms used in subject indexing
  - The subject indexing process
Slide 4: Some Questions
- Library catalogs often lump fiction into one subject heading. Why?
- Would you describe the subject of The Organization of Information to your mother the same way you would to a classmate?
- Would you use the same subjects to describe Chapter 9 in Taylor that you would to describe the whole book?
- If you wanted to assign a subject to your kitchen or garage, what would it be?
- What if you had to describe snow to a Papua New Guinea native? What words would you use? Would they be the same for an Inuit?
- How do you describe the subject of a picture or film?
Slide 5: Subject Analysis: Definition
- The process of determining the subject and other content-related attributes of an object
- The purpose of subject analysis is to come to an understanding of, or judgment regarding:
  - what an object is about, in the context of how it might be used
  - what an object exemplifies
  - what discipline (or other aspect, including community) an object reflects (for classification)
Slide 6: Why Subject Analysis?
- One of the primary means of access to information is through subjects
- In order for a computer to access those subjects, there has to be some way to get to them: an index of some kind
- Remember Soergel's model, and the necessity for a means to match user requests to information objects
- Automatic indexing works for some situations, but not all
  - As we'll see, subject concepts are not necessarily contained in words (especially not in images!)
- A specific audience may dictate specific analysis
Slide 7: Wilson on Subjects
- One of the main purposes of Wilson's chapter on subjects is to analyze the subject analysis process: to take it apart
  - Starts with the words, then the sentences, then the work itself, and asks questions about how you can elicit descriptions of aboutness
- Wilson suggests four different ways to approach this:
  - Purposive: why did the author write the work?
  - Figure-ground: what stands out among all the possible subjects?
  - Objective: count what is most frequently mentioned
  - Appeal to unity and completeness: what questions are answered within the work?
- Ultimately, he concludes that any extraction will miss some part of the work, and fail to satisfy some user
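Wilson's "objective" approach is the one most easily handed to a computer. The following is a minimal Python sketch, assuming a simple lowercase word tokenizer and a small hypothetical stopword list (`STOPWORDS`, `objective_terms`, and the sample text are illustrative, not taken from Wilson): it ranks words by raw frequency. It also shows the limits Wilson points to, since counting says nothing about the author's purpose or about which topic stands out as figure against ground.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (real systems use far larger ones).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "that", "with", "as", "are"}


def objective_terms(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Rank candidate subject words by raw frequency, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Ties keep the order in which words first appeared in the text.
    return counts.most_common(top_n)


sample = (
    "Commuter marriages place strain on dual-career couples. "
    "Couples in commuter marriages report that trust and communication "
    "help commuter couples succeed."
)
print(objective_terms(sample, 3))
# [('commuter', 3), ('couples', 3), ('marriages', 2)]
```

Here "commuter" and "couples" tie at three mentions each, but nothing in the counts tells you whether the work is about marriage, careers, or counseling.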
Slide 8: Subject Analysis in Context
- Subject analysis should always be done in context
- Context considerations include:
  - user (children, medical practitioners, etc.)
  - uses (developing egg substitutes, learning how to cook)
  - the document itself (the text of a document, intended audience, uses, etc.)
  - institution (public library, corporate intranet)
  - administrative and information systems context
Slide 9: Mai's Domain-Centered Approach
Slide 10: Relevance
- Taylor's stages in the development of an information need:
  - The visceral need
  - The conscious need
  - The formalized need
  - The compromised need
- Relevance is usually measured against the last of these, while ignoring the more complex situational aspects that affect the other stages
- Mai concludes that evaluation should be less mechanistic (focused on terminology matches) and more humanistic (focused on the visceral needs)
  - Requires contextual analysis and qualitative research rather than just precision/recall measures
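For contrast with Mai's critique, it helps to see what the mechanistic measures actually compute. A minimal Python sketch of set-based precision and recall for a single query, assuming yes/no relevance judgments; the document IDs are hypothetical. Everything hinges on the `relevant` set, which can only reflect a stated (compromised) need, not the visceral one.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for one query with yes/no relevance judgments."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0  # share of results that are relevant
    recall = len(hits) / len(relevant) if relevant else 0.0        # share of relevant items retrieved
    return precision, recall


# Hypothetical document IDs for a single query.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d7"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Retrieving more documents tends to raise recall at the expense of precision, which is the trade-off discussed in the Rowley article.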
Slide 11: Consistency
- Taylor points out the difficulty of getting people to assign similar subjects to objects
- But when controlled vocabularies and rules for selecting subject terms from those vocabularies are used, consistency is much better
  - Assumes trained subject indexers
  - Not likely to be the case in most settings other than libraries
- Again points out the need to determine what your objectives in building a taxonomy are before you make the investment
- So how do you go about subject indexing?
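One way to quantify indexing consistency is to compare the term sets two indexers assign to the same object. The sketch below uses the overlap ratio C / (A + B - C), where C is the number of terms both indexers chose and A and B are the numbers each chose (Hooper's measure, equivalent to the Jaccard coefficient). The mapping `cv` and all terms are hypothetical and exist only to illustrate why controlled vocabularies raise consistency: indexers who phrase things differently can still land on the same preferred terms.

```python
def consistency(terms_a: set[str], terms_b: set[str]) -> float:
    """Overlap of two indexers' term sets: C / (A + B - C)."""
    common = len(terms_a & terms_b)
    union = len(terms_a) + len(terms_b) - common
    return common / union if union else 1.0  # two empty sets count as fully consistent


# Free indexing: two indexers describe the same article in their own words.
free_a = {"commuter marriage", "dual-career couples", "trust"}
free_b = {"long-distance marriage", "two-career families", "trust"}

# Hypothetical controlled vocabulary mapping (illustration only).
cv = {
    "commuter marriage": "Marriage",
    "long-distance marriage": "Marriage",
    "dual-career couples": "Dual-Career-Family",
    "two-career families": "Dual-Career-Family",
    "trust": "Trust",
}
cv_a = {cv[t] for t in free_a}
cv_b = {cv[t] for t in free_b}

print(consistency(free_a, free_b))  # 0.2
print(consistency(cv_a, cv_b))      # 1.0
```

The gain depends entirely on indexers applying the vocabulary's rules the same way, which is why the slide stresses trained indexers.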
Slide 12: Definition and Purpose of Subject Indexing
- Subject indexing is the process or technique of identifying and selecting terms (words, phrases, sentences, taxonomic categories, notation) used in a domain of information to indicate the subject content of a resource for users and to provide subject access
- Purposes of subject indexing may be seen in light of Cutter's objects of the catalog:
  - To facilitate finding a particular object on the basis of its subject content (finding function)
  - To display to a user all of the objects that exhibit particular subject content (collocating function)
  - To aid a user in the selection of a particular object (choice function)
Slide 13: Rowley Article
- Trade-off between precision and recall
- 4 eras in indexing:
  - Era 1, pre-computer access: title indexing
  - Era 2, online age: Cranfield and other retrieval studies showed free indexing worked as well as controlled indexing in abstract databases
  - Era 3, full-text vs. subject indexing: shown to complement each other (Taylor also points out the trade-off between summarization for document retrieval vs. depth indexing for information retrieval)
  - Era 4, tests with real users instead of controlled experiments: difficulty in using search interfaces because of complex and varied systems
Slide 14: Types of Subject Indexing: Derived Indexing
- Derived indexing: terms used for indexing are limited to those that actually appear in the document or resource
- Derived indexing may be done manually or automatically
  - Search engine indexes are examples of automatic derived indexing
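Automatic derived indexing is essentially what a search engine's inverted index does: each word that occurs in a document becomes an index term pointing back to that document. A minimal Python sketch, assuming a naive lowercase word tokenizer and hypothetical sample documents; real search engines add a great deal more (stopwords, stemming, ranking).

```python
import re
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Derived indexing: every index term is a word that occurs in some document."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)  # map the word back to the document it came from
    return dict(index)


docs = {
    "d1": "Egg substitutes for vegan baking",
    "d2": "Learning how to cook eggs",
    "d3": "Substitutes in the kitchen",
}
index = build_inverted_index(docs)
print(sorted(index["substitutes"]))  # ['d1', 'd3']
print("vegetarian" in index)         # False
```

Because terms are limited to words in the text, a query for "vegetarian" finds nothing, and "egg" and "eggs" are separate entries; the stemming used in automatic indexing addresses the second problem but not the first.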
Slide 15: Assigned Indexing
- Assigned indexing: terms used for indexing are not limited to those in the object, but may come from the object, the mind of the indexer, or a controlled vocabulary
- There are two types of assigned indexing: free indexing and indexing from controlled vocabularies
Slide 16: Free Indexing
- In free indexing, the indexer or indexing program is free to assign terms from anywhere inside or outside the object
  - The indexer may take terms from the object, or use any terms that occur to them
- In some free indexing settings, very detailed instructions guide indexers in their selection of terms
- Other settings are much looser; users can pick any terms that mean something to them or others
  - Pictures (http://flickr.com)
  - Folksonomies (http://del.icio.us)
Slide 17: Controlled Vocabulary Indexing
- In indexing from controlled vocabularies, indexers are constrained by the terms available in lists called controlled vocabularies; they must assign one or more terms from the controlled vocabulary
- Controlled vocabulary indexing is much like choosing terms from a very large drop-down menu
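The drop-down analogy can be made concrete. A minimal Python sketch, assuming a tiny hypothetical vocabulary: `PREFERRED` holds the allowed terms, and `USE_FOR` is an entry vocabulary that redirects non-preferred wording to a preferred term, in the style of a thesaurus USE reference. Candidate terms that match neither are rejected rather than added.

```python
# Hypothetical miniature controlled vocabulary: the only terms that may be assigned.
PREFERRED = {"Marriage", "Dual-Career-Family", "Job-Satisfaction", "Trust"}

# Hypothetical entry vocabulary: non-preferred wording that points ("USE") to a preferred term.
USE_FOR = {
    "wedlock": "Marriage",
    "two-career families": "Dual-Career-Family",
    "work satisfaction": "Job-Satisfaction",
}


def assign_terms(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Map candidate terms onto the vocabulary; return (assigned, rejected).

    Preferred terms must match exactly; entry terms are matched case-insensitively.
    """
    assigned: list[str] = []
    rejected: list[str] = []
    for term in candidates:
        preferred = term if term in PREFERRED else USE_FOR.get(term.lower())
        if preferred is None:
            rejected.append(term)          # not on the "menu" at all
        elif preferred not in assigned:
            assigned.append(preferred)     # avoid assigning the same term twice
    return assigned, rejected


print(assign_terms(["wedlock", "Trust", "commuting", "Marriage"]))
# (['Marriage', 'Trust'], ['commuting'])
```

Rejected terms such as "commuting" are one signal that the vocabulary may need a new entry term, which is part of maintaining a CV based on indexing experience.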
Slide 18: Automatic Indexing
- In automatic indexing, it is common for indexing software applications to use derived indexing techniques only, enhanced with word stemming and spelling algorithms to improve matching
- However, more advanced programs are being developed that mimic free indexing (e.g., text summarization programs)
- Some advanced automatic indexing programs (particularly those in medicine) are making use of controlled vocabularies in term selection and identification
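To see why stemming helps derived indexing, compare exact word matching with matching after suffix stripping. A minimal Python sketch; `crude_stem` is a hypothetical toy that strips a few English suffixes and is not the Porter stemmer or any production algorithm, and spelling correction is not shown.

```python
import re

# Hypothetical toy suffix list; production systems use tested stemmers instead.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of the stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text: str, stem: bool) -> set[str]:
    """Derived index terms for a text, optionally stemmed."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(w) for w in words} if stem else set(words)


def matches(query: str, document: str, stem: bool) -> bool:
    """True if every query term occurs among the document's index terms."""
    return terms(query, stem) <= terms(document, stem)


doc = "How to cook an egg"
print(matches("cooking eggs", doc, stem=False))  # False
print(matches("cooking eggs", doc, stem=True))   # True
```

Even with stemming, this is still derived indexing: a query for "omelette" would not match, because the word never appears in the text.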
Slide 19: Mai's Conceptions of Indexing
- Simplistic conception of indexing
  - automatic extraction (derived indexing)
- Document-oriented indexing
  - focus on the document and document parts
- Content-oriented indexing
  - focus on content in the document (still document-oriented)
- User-oriented indexing
  - focus on users and possible uses of the document
- Requirement-oriented indexing
  - relies on in-depth knowledge of users' uses of documents and complete knowledge of context
Slide 20: Types of Terms Used in Subject Indexing
- Words or short phrases
  - descriptors, identifiers, subject headings, or keywords
- Sentences: derived indexing may use whole sentences, but this is rarely done; used in some web documents and for derived abstracts
  - abstracts, summaries, or annotations
- Taxonomic categories (such as the type used in the Yahoo directory)
- Notation (such as the type used in the Dewey Decimal Classification)
Slide 21: Sample ERIC Indexing Record
- PERSONAL AUTHOR: Magnuson,-Sandy; Norem,-Ken
- TITLE: Challenges for Higher Education Couples in Commuter Marriages: Insights for Couples and Counselors Who Work with Them.
- PUBLICATION YEAR: 1999
- SOURCE (JOURNAL CITATION): Family-Journal-Counseling-and-Therapy-for-Couples-and-Families; v7 n2 p125-34 Apr 1999
- DOCUMENT TYPE: Journal-Articles (080); Reports-Research (143)
- LANGUAGE: English
- MAJOR DESCRIPTORS: Counseling-Techniques; Dual-Career-Family; Job-Satisfaction; Marital-Satisfaction; Marriage-
- MINOR DESCRIPTORS: Trust-Psychology
- MAJOR IDENTIFIERS: Career-Commitment
- MINOR IDENTIFIERS: Quality-Time
- ABSTRACT: Focuses on the experiences of dual-career couples that maintain two homes to attain career satisfaction. Findings include support for the potential strength and satisfaction of commuting relationships. Trust, commitment, regular communication, and quality shared time were endorsed as factors contributing to successful distance marriages. (Author/GCP)
Slide 22: Indexing Non-text Objects
- Layne discusses the indexing of images and points out some useful distinctions
- Defines four general types of attributes:
  - Biographical
  - Subject
  - Exemplified
  - Relationship
- While she discusses these in the context of images, they can prove useful when indexing almost any object
Slide 23: Identification of Concepts
- Taylor lists several concepts that can be helpful in teasing out subject terms:
  - Topics
  - Names
    - Persons, corporations, geographic, other
  - Time periods
  - Form (genre)
- http://isotropic.org/papers/chicken.pdf
- See the appendix in Taylor for an example and checklist
Slide 24: Indexing Policies
- Many indexers are guided by indexing policies that determine the types of terms that are finally used in indexing
- Three characteristics of indexing upon which indexing policies may be built:
  - Exhaustivity
  - Specific entry (sometimes called specificity, but incorrectly)
  - Coextensivity
Slide 25: ISO 5963
- Despite Wilson's assertion that subject analysis is impossible, a variety of standards exist prescribing how it should be done; the British Standard ISO 5963 in your readings this week is one of them
- Viewed from Wilson's or Mai's perspective (and your own), what are the problems with this standard?
Slide 27: Steps in Free and Assigned Indexing
1. Identify subject content
2. Identify disciplinary context or domain (for classifications or taxonomies)
3. Express or describe content (steps 1-3 describe the subject analysis process)
4. Select or create terms and add them to the document representation
5. If working with a controlled vocabulary (CV), update and maintain the CV based on the indexing experience
Slide 28: Questions?
Slide 29: Exercise 5
- Purpose is to try different methods of extracting concepts from an article, so you can see the impact on users
- Spend the rest of class working through the questions in Exercise 5
- We'll discuss before the end of class
Slide 30: Differences
- Hopefully, this exercise gave you a chance to see a couple of things:
  - How difficult it can be to actually determine what something is about
  - How different methods of assigning terms would result in very different access for users
- We didn't throw in Mai's perspective on domain indexing in this exercise, which makes it even more difficult
- This is obviously not a simple thing to do well
- But you are now aware of the issues, and can keep them in mind when working in this area
Slide 31: Next Week
- We'll start looking in more detail at controlled vocabularies and discuss how they might interact with emergent social tagging systems
- Remember to read assignments BEFORE class
- Important: your mid-term assignments are due at the start of class next week!!