Title: Indexing
1Indexing retrieval
2Approaches to indexing
Key word indexing
Concept indexing
Social indexing
Non-text indexing
3Keyword Indexing
4Keyword indexing (1)
Entity-oriented - draw terms from entity itself
Advantages
- Quick
- Inexpensive
- No vocabulary lag
- Multiple access points
- Accuracy
- No intellectual effort needed
5Keyword indexing (2)
Disadvantages
- No control over synonyms, near synonyms
- No control over homographs
- Dependent on authors for informative and accurate titles
- No control over word forms
- No cross reference structure
6Historical key word indexing methodologies
Edge-notched cards
Optical coincidence cards
Key word in context (KWIC)
Spatial indexing
7Pre- versus post-coordinate indexing
Mortimer Taube
Pre-coordinate: China--Folklore, China--History, China--Politics,
France--Folklore, France--History, France--Politics,
Germany--Folklore, Germany--History, Germany--Politics,
Russia--Folklore, Russia--History, Russia--Politics (12 terms)
Post-coordinate: China, France, Germany, Russia, Folklore,
History, Politics (7 terms)
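The term counts above can be reproduced with a short sketch. The `--` separator used to join a place and a topic is an assumption for illustration; the slide does not show the original punctuation.

```python
from itertools import product

places = ["China", "France", "Germany", "Russia"]
topics = ["Folklore", "History", "Politics"]

# Pre-coordinate: every place/topic combination is its own heading.
# The "--" separator is an assumed convention, not taken from the slide.
pre_coordinate = [f"{place}--{topic}" for place, topic in product(places, topics)]

# Post-coordinate: each concept is listed once and combined at search time.
post_coordinate = places + topics

print(len(pre_coordinate), len(post_coordinate))  # 12 7
```

The pre-coordinate list grows with the product of the two facets (4 x 3), the post-coordinate list with their sum (4 + 3).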
8Post-coordinate index searching
- History of France → France, History
Two sets of documents
France
History
Boolean AND search yields intersection of the two
sets
France AND History
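A minimal sketch of the Boolean AND step, assuming each term's postings are held as a Python set. The document numbers and the `boolean_and` helper are invented for illustration.

```python
# Hypothetical postings: the document numbers are invented for illustration.
postings = {
    "France": {3, 8, 15, 21},
    "History": {8, 12, 21, 30},
}

def boolean_and(index, *terms):
    """Return the documents indexed under every one of the given terms."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

result = boolean_and(postings, "France", "History")
print(sorted(result))  # [8, 21]
```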
9Advantages to Taube's system
- No need to develop a list of authorized terms: terms are pulled from the documents themselves
- No need to articulate rules of punctuation for representing complex concepts (France--History)
- No need to delineate citation order (France--History v. History--France)
- No need to formulate rules for subheadings ("May subdivide geog.")
10Uniterm cards
Document no. 102: "Arrest statistics of the Arizona State Police"
state: 31, 102, 53, 24, 75, 96, 107, 68, 49, 70, 34, 95, 117, 59, 115, 147, 109
police: 11, 102, 23, 85, 96, 87, 68, 49, 60, 91, 115, 107, 79
11Searching with uniterm cards
- Query looking for documents about state police
state: 31, 102, 53, 24, 75, 96, 107, 68, 49, 70, 34, 95, 117, 59, 115, 147, 109
police: 11, 102, 23, 85, 96, 87, 68, 49, 60, 91, 115, 107, 79
102 Arrest statistics of the Arizona State Police.
107 A short history of the Wisconsin State Police.
115 The modern police state.
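The card comparison can be sketched as a set intersection, using the document numbers from the two uniterm cards. Only three of the matching documents have titles on the slide; the others are printed with a placeholder rather than an invented title.

```python
# Document numbers as listed on the "state" and "police" uniterm cards.
state = {31, 102, 53, 24, 75, 96, 107, 68, 49, 70, 34, 95, 117, 59, 115, 147, 109}
police = {11, 102, 23, 85, 96, 87, 68, 49, 60, 91, 115, 107, 79}

# Titles are known only for the three documents shown on the slide.
titles = {
    102: "Arrest statistics of the Arizona State Police.",
    107: "A short history of the Wisconsin State Police.",
    115: "The modern police state.",
}

# A document number that appears on both cards satisfies "state AND police".
matches = sorted(state & police)
for doc_no in matches:
    print(doc_no, titles.get(doc_no, "(title not shown)"))
```

Document 115 is retrieved because both words occur in it, although it is about a police state rather than state police. This is the false coordination that post-coordinate keyword searching cannot prevent.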
12Edge-notched cards
- One card per bibliographic item
Notch labels: pet-care, bears, pterodactyls
Whirdeaux, Ima. Caring for your pet pterodactyl / by Ima Whirdeaux. Call no. Q54321 .W45
Turner, Paige. Caring for your pet grizzly / by Paige Turner. Call no. Q12345 .T8
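A rough sketch of needle sorting with the two example cards. Which notches each card carries is an assumption inferred from the titles, and the `needle` function is a hypothetical name.

```python
# Assumed notch assignments; the slide does not state them explicitly.
cards = {
    "Whirdeaux, Caring for your pet pterodactyl": {"pet-care", "pterodactyls"},
    "Turner, Caring for your pet grizzly": {"pet-care", "bears"},
}

def needle(deck, position):
    """Pass a needle through one hole position and lift the deck.

    Cards notched at that position fall off the needle (they match);
    cards with an intact hole stay on it.
    """
    dropped = [title for title, notches in deck.items() if position in notches]
    kept = [title for title, notches in deck.items() if position not in notches]
    return dropped, kept

dropped, kept = needle(cards, "bears")
print(dropped)  # ['Turner, Caring for your pet grizzly']
```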
13Pyramid coding for edge-notched cards
[Figure: two edge-coding layouts. 20 dots: the digits 0-9 printed twice, one position per digit. 10 dots: the digits 0-9 arranged in two pyramids.]
They hadn't heard of the Y2K problem yet.
14Optical coincidence cards
- Pre-printed cards with numbers for entire database
 0  1  2  3  4  5  6  7  8  9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
Term on card: fleas
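Stacking optical coincidence cards and looking for light can be modeled as a bitwise AND. This is a minimal sketch: the second term ("dogs") and all punched document numbers are invented for illustration.

```python
def punch(doc_numbers):
    """Return a card as an integer bitmask with one bit per punched position."""
    card = 0
    for n in doc_numbers:
        card |= 1 << n
    return card

def coincidences(*cards, size=100):
    """Positions where every stacked card has a hole, so light passes through."""
    if not cards:
        return []
    stacked = cards[0]
    for card in cards[1:]:
        stacked &= card
    return [n for n in range(size) if (stacked >> n) & 1]

# Hypothetical punch patterns for two term cards.
fleas = punch([4, 17, 33, 58, 83])
dogs = punch([17, 20, 33, 71, 83, 99])
print(coincidences(fleas, dogs))  # [17, 33, 83]
```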
15Key Word in Context (KWIC) Index
- Doc 15 title "A comparison of OCLC and WLN hit rates for monographs and an analysis of the types of records retrieved"
Stop words (A, of, and, for, an, the) do not generate index entries.
                   CONTEXT  KEY WORDS                       POINTER
  ttems of remote users an  analysis of the types of        15
 hit rates for monograph/A  comparison of OCLC and WLN      15
comparison of OCLC and WLN  hit rates for monographs and /  15
OCLC and WLN hit rates for  monographs and an analysi/      15
onographs/ A comparison of  OCLC and WLN hit rates for      15
arison of OCLC and WLN hit  rates for monographs and /      15
n analysis of the types of  records retrieved. A com/       15
 s of the types of records  retrieved. A comparison /       15
phs and an analysis of the  types of records retrieve/      15
  A comparison of OCLC and  WLN hit rates for monogra/      15
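A simplified sketch of how KWIC entries could be generated from the Doc 15 title. It assumes a small stop list and a fixed column width, and unlike the slide it does not wrap the end of the title around into the left context.

```python
# Assumed stop list: the title words that produce no entry on the slide.
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}

def kwic(title, doc_no, width=26):
    """Return (left context, key word plus right context, pointer) tuples."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue
        left = " ".join(words[:i])[-width:]   # text before the key word
        right = " ".join(words[i:])[:width]   # key word and what follows
        entries.append((left, right, doc_no))
    # File the entries alphabetically by key word.
    entries.sort(key=lambda e: e[1].lower())
    return entries

title = ("A comparison of OCLC and WLN hit rates for monographs "
         "and an analysis of the types of records retrieved")
for left, right, doc in kwic(title, 15):
    print(f"{left:>26}  {right:<26}  {doc}")
```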
16Key Word Out of Context (KWOC) Index
- aardvark 101
- baggage 123
- banyan 128, 159, 179
- coconut 955, 654
- driving 196, 488, 788
- elementary 455, 785
- elephant 128, 465, 783
- garage 678, 398
- hardware 849, 483, 399
- meter 768
- nadir 877
- noxious 112
- opium 289
- opus 985, 159, 849
- people 629, 458
- quark 137, 492
- radar 968, 295
- radio 430, 206, 749
- stereo 294, 837, 873
- television 745, 727, 883
- ultraviolet 958, 774
- zebra 276
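A KWOC list like the one above can be built by inverting titles into an alphabetical key word list. This sketch reuses the three state police titles from the uniterm example and assumes a three-word stop list.

```python
from collections import defaultdict

STOP_WORDS = {"a", "of", "the"}  # assumed stop list for this example

def kwoc(titles):
    """Map each non-stop word to the sorted document numbers whose titles contain it."""
    index = defaultdict(list)
    for doc_no in sorted(titles):
        for word in titles[doc_no].lower().split():
            if word not in STOP_WORDS and doc_no not in index[word]:
                index[word].append(doc_no)
    return dict(sorted(index.items()))

titles = {
    102: "Arrest statistics of the Arizona State Police",
    107: "A short history of the Wisconsin State Police",
    115: "The modern police state",
}
for word, docs in kwoc(titles).items():
    print(word, ", ".join(str(d) for d in docs))
```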
17Vector space model (VSM)
- Each document represented by a vector
[Figure: vector for the document entitled "Assistive technology for libraries", plotted on three axes: assistive, technology, libraries]
18Vector space model matching
- Similarity between query and document vectors
[Figure: vectors for document 1, document 2, and the query, plotted on three axes: assistive, technology, libraries]
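One common way to measure similarity between a query vector and document vectors is the cosine of the angle between them. The slide does not name a specific measure, so cosine is an assumption here, and the vector values are invented.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Axes: (assistive, technology, libraries). Values are hypothetical.
doc1 = (1, 1, 1)
doc2 = (0, 1, 0)
query = (1, 1, 0)

ranked = sorted(
    [("document 1", cosine(query, doc1)), ("document 2", cosine(query, doc2))],
    key=lambda pair: -pair[1],
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these values document 1 ranks above document 2 because it shares two of the query's terms rather than one.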
19VSM term weighting
- Assign high weights to terms that appear
frequently in the document but infrequently in
the database
Term                        conclusion  information  blind
Freq. w/in document         low         high         high
No. of documents with term  high        high         low
Query "I'm looking for articles about assistive technology for the blind."
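The weighting rule can be illustrated with tf-idf, one widely used formulation; the slide does not commit to a specific formula. The database size and all frequencies below are hypothetical stand-ins for the table's "low" and "high".

```python
import math

N_DOCS = 1000  # hypothetical database size

# Hypothetical counts standing in for "low" and "high" in the table.
term_freq = {"conclusion": 1, "information": 8, "blind": 8}     # within the document
doc_freq = {"conclusion": 900, "information": 900, "blind": 20}  # documents with term

def tf_idf(tf, df, n_docs):
    """Term frequency times the log of the inverse document frequency."""
    return tf * math.log(n_docs / df)

weights = {t: tf_idf(term_freq[t], doc_freq[t], N_DOCS) for t in term_freq}
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:12s} {w:6.2f}")
```

Only "blind" is both frequent in the document and rare in the database, so it receives by far the highest weight.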
20VSM refinements
- Adding semantic and syntactical parsing.
Bill is going to the store to make a purchase.
Bill is going to purchase the store.
21Concept indexing
22Concept indexing
- Rather than pulling terms from documents, assign a concept identifier (e.g. France--History) to documents dealing with the history of France
- Requires intellectual effort
- Takes more time than key word indexing, so less economical
- Avoids problems of false coordination and synonymy through use of vocabulary control
23Vocabulary control (1)
- One indexing term or phrase to represent a concept
- Unidentified flying objects not flying saucers
- Point user to correct term with "use" reference
- Reduces number of searches needed to find items
about a particular topic
24Vocabulary control (2)
- One form of a word to represent the concept
- Dictionaries not dictionary
25Vocabulary control (3)
- One usage of a homographic term
- Fault (geologic) not fault (responsibility for error)
- Usage identified through scope note
- Consistency among indexers as well as one indexer over time
- Helps user to avoid false drops
26Vocabulary control (4)
- Syndetic structure
- Broader terms
- Narrower terms
- Related terms (see also)
- User can negotiate structure to find most
appropriate term, as well as identify additional
related terms of potential use in finding
relevant documents
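The "use" references and the syndetic structure can be modeled with two small lookup tables. The "use" pairs come from the examples above; the broader and related terms are hypothetical, as are the function names.

```python
# USE references point from a non-preferred term to the authorized one.
USE = {
    "flying saucers": "unidentified flying objects",
    "dictionary": "dictionaries",
}

# Syndetic structure. The BT/NT/RT entries here are hypothetical.
THESAURUS = {
    "unidentified flying objects": {
        "BT": ["aerial phenomena"],
        "NT": [],
        "RT": ["extraterrestrial life"],
    },
    "dictionaries": {"BT": ["reference works"], "NT": [], "RT": ["glossaries"]},
}

def authorized(term):
    """Follow a USE reference, if any, to the authorized form of a term."""
    key = term.strip().lower()
    return USE.get(key, key)

def expand(term):
    """Return the authorized term plus its broader, narrower, and related terms."""
    head = authorized(term)
    entry = THESAURUS.get(head, {})
    return {"term": head, **{rel: entry.get(rel, []) for rel in ("BT", "NT", "RT")}}

print(authorized("Flying saucers"))  # unidentified flying objects
print(expand("dictionary"))
```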
27Social network indexing
- Tags
- Tag clouds
- User-created tags providing access to library
resources
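A tag cloud typically scales each tag's font size by how often the tag is used. This is a minimal sketch with linear scaling; the counts, the size range, and the function name are assumptions for illustration.

```python
def tag_cloud_sizes(counts, min_size=10, max_size=32):
    """Map tag usage counts linearly onto a font-size range (in points)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: round(min_size + (n - lo) / span * (max_size - min_size))
        for tag, n in counts.items()
    }

# Hypothetical usage counts for a few tags.
counts = {"architecture": 120, "medieval": 45, "river": 80, "snow": 10}
sizes = tag_cloud_sizes(counts)
print(sizes)
```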
28flickr
http://www.flickr.com/
29Tags
30Tags
Tags architecture Bohemian South Country
Czech Republic Europe European
historical medieval old Old Town
Other Keywords River Snow town Vltava
31Tags
32Tags
33Tags
(177,583 photos)
34Tags
35Tag clouds
36Geotagging
37Librarian tagging
38Library using flickr
39Peace Palace Library (PPL)
40Social bookmarking http://del.icio.us/
41http://del.icio.us/mauicclibrary
42University of Pennsylvania
http://www.library.upenn.edu/
43PennTags
44Item list with PennTags
45Adding a PennTag
Add to PennTags
46Non-text indexing
47Indexing Music
48Indexing music - transcription
1 1 5 5 6 6 5
49Indexing Music - melodic contour
R U R U R D
- / - / - \
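The contour can be derived from the transcription by comparing each note with the one before it. A minimal sketch, using the slide's letters U (up), D (down), and R (repeat):

```python
def contour(pitches):
    """Encode a melody as U (up), D (down), or R (repeat) between successive notes."""
    symbols = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            symbols.append("U")
        elif cur < prev:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)

# The transcription from the previous slide, as scale degrees.
print(contour([1, 1, 5, 5, 6, 6, 5]))  # RURURD
```

Because only the direction of each step is kept, the same contour results whatever key the melody is sung in.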
50Query by humming
51Query by humming (2)
[Diagram components: Digital Audio, Pitch Tracker, Melodic contour, Query Engine, Ranked List Of Matching Melodies, MIDI Songs, Melody Database]
Source: Ghias, Asif; Logan, Jonathan; Chamberlin, David; and Brian C. Smith. 1995. Query by humming--musical information retrieval in an audio database. ACM Multimedia 95 - Electronic Proceedings. http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming.html
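The query engine step, matching a hummed contour against a melody database and returning a ranked list, can be sketched with approximate string matching. This sketch swaps in Python's `difflib.SequenceMatcher` as a stand-in similarity measure; it is not the matching algorithm of the cited paper, and the database contents are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical melody database: names and contour strings are invented.
melody_db = {
    "melody A": "RURURDRDRDRD",
    "melody B": "UUDDUUDD",
    "melody C": "DDRUUDRR",
}

def rank(query, db):
    """Return melody names ordered from most to least similar to the query contour."""
    scored = [
        (SequenceMatcher(None, query, contour).ratio(), name)
        for name, contour in db.items()
    ]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]

# A hummed query that may contain errors still finds its closest contour.
print(rank("RURURD", melody_db)[0])  # melody A
```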
52Indexing images
Source: Trust Territory archives.
53Indexing images - chair (1)
54Indexing images - ?
55Indexing images - chair (2)
56Biometrics - face
57Biometrics - differences
58Biometrics - similarities
- Look at ratios of distances between marker points
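Ratios of distances are useful because they do not change when the same face is photographed at a different size. A minimal sketch, with invented marker points and coordinates:

```python
import math
from itertools import combinations

def distance_ratios(points):
    """Pairwise distances between marker points, each divided by the first distance."""
    names = sorted(points)
    dists = [math.dist(points[a], points[b]) for a, b in combinations(names, 2)]
    ref = dists[0]  # assumes the first pair of points is not coincident
    return [d / ref for d in dists]

# Hypothetical marker points (x, y) on a face image.
face = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60), "mouth": (50, 80)}

# The same face at twice the size: every coordinate doubled.
face_scaled = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}

same = all(
    math.isclose(a, b)
    for a, b in zip(distance_ratios(face), distance_ratios(face_scaled))
)
print(same)  # True
```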
59Indexing images
60Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
67Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English