Indexing - PowerPoint PPT Presentation

About This Presentation
Title:

Indexing

Description:

Caring for your pet pterodactyl / by Ima Whirdeaux. Call no. Q54321 .W45 ... pterodactyls. Pyramid coding for edge-notched cards. Coding the year 1947* 20 dots ... – PowerPoint PPT presentation

Slides: 73
Provided by: donnaba

Transcript and Presenter's Notes

Title: Indexing


1
Indexing and retrieval
2
Approaches to indexing
Key word indexing
Concept indexing
Social indexing
Non-text indexing
3
Keyword Indexing
4
Keyword indexing (1)
Entity-oriented - draw terms from entity itself
Advantages
  • Quick
  • Inexpensive
  • No vocabulary lag
  • Multiple access points
  • Accuracy
  • No intellectual effort needed

5
Keyword indexing (2)
Disadvantages
  • No control over synonyms, near synonyms
  • No control over homographs
  • Dependent on authors for informative and accurate
    titles
  • No control over word forms
  • No cross reference structure

6
Historical key word indexing methodologies
  • Uniterm cards
  • Edge-notched cards
  • Optical coincidence cards
  • Key word in context (KWIC)
  • Spatial indexing
7
Pre- versus post-coordinate indexing
Mortimer Taube
Pre-coordinate: China--Folklore, China--History,
China--Politics, France--Folklore, France--History,
France--Politics, Germany--Folklore,
Germany--History, Germany--Politics,
Russia--Folklore, Russia--History,
Russia--Politics (12 terms)
Post-coordinate: China, France, Germany, Russia,
Folklore, History, Politics (7 terms)
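The two term counts on this slide can be checked with a short sketch. The variable names and the "--" separator are illustrative assumptions, not part of the presentation.

```python
from itertools import product

places = ["China", "France", "Germany", "Russia"]
topics = ["Folklore", "History", "Politics"]

# Pre-coordinate: every place/topic combination is its own heading.
pre_coordinate = [f"{place}--{topic}" for place, topic in product(places, topics)]

# Post-coordinate: each single term is indexed once and combined at search time.
post_coordinate = places + topics

print(len(pre_coordinate))   # 12
print(len(post_coordinate))  # 7
```

The pre-coordinate list grows as the product of the two facets, while the post-coordinate list grows only as their sum.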
8
Post-coordinate index searching
  • History of France -> France, History

Two sets of documents: France; History
Boolean AND search yields intersection of the two
sets: France AND History
9
Advantages to Taube's system
No need to develop a list of authorized
terms: pull terms from the documents themselves
No need to articulate rules of punctuation for
representing complex concepts (France--History)
No need to delineate citation order
(France--History v. History--France)
No need to formulate rules for subheadings ("May
subdivide geog.")
10
Uniterm cards
  • One card per term

Document no. 102: "Arrest statistics of the
Arizona State Police"
state: 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
police: 11 102 23 85 96 87 68 49 60 91 115 107 79
11
Searching with uniterm cards
  • Query looking for documents about state police

state: 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
police: 11 102 23 85 96 87 68 49 60 91 115 107 79
102 Arrest statistics of the Arizona State Police.
107 A short history of the Wisconsin State Police.
115 The modern police state.
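A uniterm search amounts to comparing two cards for shared document numbers. The sketch below uses the numbers from the "state" and "police" cards; the function name `search` is hypothetical.

```python
# Posting lists copied from the "state" and "police" uniterm cards.
uniterm_cards = {
    "state": {31, 102, 53, 24, 75, 96, 107, 68, 49,
              70, 34, 95, 117, 59, 115, 147, 109},
    "police": {11, 102, 23, 85, 96, 87, 68, 49,
               60, 91, 115, 107, 79},
}


def search(cards, *terms):
    """Return the sorted document numbers that appear on every named card."""
    postings = [cards[term] for term in terms]
    return sorted(set.intersection(*postings))


hits = search(uniterm_cards, "state", "police")
print(hits)  # [49, 68, 96, 102, 107, 115]
```

Document 115 ("The modern police state") is retrieved even though it is not about state police: the cards record only that both terms occur, not how they relate.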
12
Edge-notched cards
  • One card per bibliographic item

pet-care
Whirdeaux, Ima. Caring for your pet pterodactyl /
by Ima Whirdeaux. Call no. Q54321 .W45
bears
Turner, Paige. Caring for your pet grizzly / by
Paige Turner. Call no. Q12345 .T8
pterodactyls
13
Pyramid coding for edge-notched cards
  • Coding the year 1947

20 dots
0 1 2 3 4 5 6 7 8 9    0 1 2 3 4 5 6 7 8 9
10 dots
9 5 2 0    9 5 2 0
 8 4 1      8 4 1
  7 3        7 3
   6          6
They hadn't heard of the Y2K problem yet.
14
Optical coincidence cards
  • Pre-printed cards with numbers for entire database

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
fleas
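Optical coincidence cards are searched by stacking term cards and seeing where light passes through every card. A minimal sketch of that idea, treating each card as a bitmask: the posting numbers and the second term ("dogs") are hypothetical, and only the "fleas" label comes from the slide.

```python
DATABASE_SIZE = 100  # positions 0-99, as on the pre-printed card


def punch(doc_numbers):
    """Return a card as an integer bitmask with one bit set per punched hole."""
    card = 0
    for n in doc_numbers:
        if not 0 <= n < DATABASE_SIZE:
            raise ValueError(f"document number {n} is not on the card")
        card |= 1 << n
    return card


def coincidences(*cards):
    """Stack the cards and list the positions where every card has a hole."""
    stacked = cards[0]
    for card in cards[1:]:
        stacked &= card
    return [n for n in range(DATABASE_SIZE) if (stacked >> n) & 1]


# Hypothetical postings for two term cards.
fleas = punch([3, 17, 42, 88])
dogs = punch([5, 17, 42, 60])
print(coincidences(fleas, dogs))  # [17, 42]
```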
15
Key Word in Context (KWIC) Index
Stop word
Stop word
  • Doc 15 title "A comparison of OCLC and WLN hit
    rates for monographs and an analysis of the types
    of records retrieved"

CONTEXT                      KEY WORDS                        POINTER
ttems of remote users an     analysis of the types of         15
hit rates for monograph/A    comparison of OCLC and WLN       15
comparison of OCLC and WLN   hit rates for monographs and /   15
OCLC and WLN hit rates for   monographs and an analysi/       15
onographs/ A comparison of   OCLC and WLN hit rates for       15
arison of OCLC and WLN hit   rates for monographs and /       15
n analysis of the types of   records retrieved. A com/        15
s of the types of records    retrieved. A comparison /        15
phs and an analysis of the   types of records retrieve/       15
A comparison of OCLC and     WLN hit rates for monogra/       15
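A KWIC index of this kind can be generated mechanically from a title and a stop list. The sketch below is a simplification: it truncates the context at a fixed width instead of wrapping the title around as the slide does, and the stop list and the function name `kwic` are assumptions.

```python
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}


def kwic(title, doc_id, width=26):
    """Return (context, key words, pointer) rows, sorted by key word."""
    words = title.split()
    rows = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue  # stop words never become index entries
        left = " ".join(words[:i])[-width:]   # text before the key word
        right = " ".join(words[i:])[:width]   # key word and what follows
        rows.append((left, right, doc_id))
    return sorted(rows, key=lambda row: row[1].lower())


title = ("A comparison of OCLC and WLN hit rates for monographs "
         "and an analysis of the types of records retrieved")
for left, right, pointer in kwic(title, 15):
    print(f"{left:>26}  {right:<26}  {pointer}")
```

Each non-stop word in the title yields one row, so this title produces ten entries, all pointing to document 15.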
16
Key Word Out of Context (KWOC) Index
  • aardvark 101
  • baggage 123
  • banyan 128, 159, 179
  • coconut 955, 654
  • driving 196, 488, 788
  • elementary 455, 785
  • elephant 128, 465, 783
  • garage 678, 398
  • hardware 849, 483, 399
  • meter 768
  • nadir 877
  • noxious 112
  • opium 289
  • opus 985, 159, 849
  • people 629, 458
  • quark 137, 492
  • radar 968, 295
  • radio 430, 206, 749
  • stereo 294, 837, 873
  • television 745, 727, 883
  • ultraviolet 958, 774
  • zebra 276
17
Vector space model (VSM)
  • Each document represented by a vector

Axes: assistive, technology, libraries
Vector for document entitled "Assistive
technology for libraries"
18
Vector space model matching
  • Similarity between query and document vectors

Axes: assistive, technology, libraries
Vector for document 1
Vector for document 2
Vector for query
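One common way to measure the similarity between a query vector and a document vector is the cosine of the angle between them. The slide does not name a measure, so cosine is an assumption here, as are the second document and the query text.

```python
import math

AXES = ("assistive", "technology", "libraries")


def to_vector(text):
    """Count how often each axis term occurs in the text."""
    words = text.lower().split()
    return [words.count(axis) for axis in AXES]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


doc1 = to_vector("Assistive technology for libraries")  # title from the slide
doc2 = to_vector("Technology in libraries")              # hypothetical title
query = to_vector("assistive technology")                # hypothetical query

print(cosine(query, doc1))
print(cosine(query, doc2))
```

With these inputs document 1 scores higher than document 2, because it shares both query terms rather than one.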
19
VSM term weighting
  • Assign high weights to terms that appear
    frequently in the document but infrequently in
    the database

Term                         conclusion   information   blind
Freq. w/in document          low          high          high
No. of documents with term   high         high          low

Query: "I'm looking for articles about assistive
technology for the blind."
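The weighting rule can be sketched with the familiar tf-idf formula (term frequency times the log of inverse document frequency). The formula is one common choice rather than something the slide specifies, and the four-document corpus is hypothetical, built to mirror the table.

```python
import math


def tf_idf(term, document, corpus):
    """Weight = frequency in the document * log(corpus size / documents with term)."""
    tf = document.count(term)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Hypothetical corpus: each document is a list of words.
corpus = [
    ["information", "information", "blind", "blind", "conclusion"],
    ["information", "conclusion", "libraries"],
    ["information", "conclusion", "technology"],
    ["conclusion", "information", "music"],
]
document = corpus[0]

for term in ("conclusion", "information", "blind"):
    print(term, round(tf_idf(term, document, corpus), 3))
```

Only "blind" is both frequent in the document and rare in the database, so it alone receives a high weight.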
20
VSM refinements
  • Adding semantic and syntactical parsing.

Bill is going to the store to make a purchase.
Bill is going to purchase the store.
21
Concept indexing
22
Concept indexing
  • Rather than pulling terms from documents, assign
    a concept identifier (e.g. France--History) to
    documents dealing with the history of France
  • Requires intellectual effort
  • Takes more time than key word indexing, so less
    economical
  • Avoids problems of false coordination and
    synonymy through use of vocabulary control
23
Vocabulary control (1)
  • One indexing term or phrase to represent a
    concept
  • Unidentified flying objects not flying saucers
  • Point user to correct term with "use" reference
  • Reduces number of searches needed to find items
    about a particular topic

24
Vocabulary control (2)
  • One form of a word to represent the concept
  • Dictionaries not dictionary

25
Vocabulary control (3)
  • One usage of a homographic term
  • Fault (geologic) not fault (responsibility for
    error)
  • Usage identified through scope note
  • Consistency among indexers as well as one indexer
    over time
  • Helps user to avoid false drops

26
Vocabulary control (4)
  • Syndetic structure
  • Broader terms
  • Narrower terms
  • Related terms (see also)
  • User can negotiate structure to find most
    appropriate term, as well as identify additional
    related terms of potential use in finding
    relevant documents
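The four vocabulary-control slides can be pictured as a small data structure: "use" references lead to the authorized term, and each authorized term carries its scope note and syndetic links. In this sketch the broader, narrower and related terms are hypothetical; only the authorized terms and the scope of "Fault" come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One authorized term with its scope note and syndetic links."""
    scope_note: str = ""
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


# "Use" references: non-preferred entry -> authorized term.
USE = {
    "flying saucers": "Unidentified flying objects",
    "dictionary": "Dictionaries",
}

# Hypothetical links between authorized terms.
THESAURUS = {
    "Unidentified flying objects": Term(related=["Extraterrestrial life"]),
    "Dictionaries": Term(broader=["Reference books"]),
    "Reference books": Term(narrower=["Dictionaries"]),
    "Fault": Term(scope_note="geologic"),
}


def authorized(entry):
    """Follow a "use" reference, if any, and return the authorized term."""
    return USE.get(entry.strip().lower(), entry.strip())


def see_also(entry):
    """List broader, narrower and related terms for the authorized form."""
    term = THESAURUS[authorized(entry)]
    return {"BT": term.broader, "NT": term.narrower, "RT": term.related}


print(authorized("flying saucers"))  # Unidentified flying objects
print(see_also("dictionary"))
```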

27
Social network indexing
  • Tags
  • Tag clouds
  • User-created tags providing access to library
    resources

28
flickr
http://www.flickr.com/
29
Tags
30
Tags
Tags architecture Bohemian South Country
Czech Republic Europe European
historical medieval old Old Town
Other Keywords River Snow town Vltava
31
Tags
32
Tags
33
Tags
(177,583 photos)
34
Tags
35
Tag clouds
36
Geotagging
37
Librarian tagging
38
Library using flickr
39
Peace Palace Library (PPL)
40
Social bookmarking http://del.icio.us/
41
http://del.icio.us/mauicclibrary
42
University of Pennsylvania
http://www.library.upenn.edu/
43
PennTags
44
Item list with PennTags
45
Adding a PennTag
Add to PennTags
46
Non-text indexing
47
Indexing Music
48
Indexing music - transcription
1 1 5 5 6 6 5
49
Indexing Music - melodic contour
R U R U R D
- / - / - \
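A melodic contour can be derived from a transcription by comparing each note with the one before it. The sketch assumes that U, D and R stand for up, down and repeat, and applies that reading to the transcription 1 1 5 5 6 6 5; the result corresponds to the "- / - / - \" pattern.

```python
def melodic_contour(pitches):
    """Map each step between successive notes to U (up), D (down) or R (repeat)."""
    steps = []
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            steps.append("U")
        elif current < previous:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)


print(melodic_contour([1, 1, 5, 5, 6, 6, 5]))  # RURURD
```

Because only the direction of each step is kept, the same contour results whatever key the melody is sung in.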
50
Query by humming
51
Query by humming (2)
  • Hummed Queries

Digital Audio
Pitch Tracker
MIDI Songs
Melodic contour
Ranked List Of Matching Melodies
Melody Database
Query Engine
Source: Ghias, Asif; Logan, Jonathan; Chamberlin,
David; and Brian C. Smith. 1995. Query by
humming--musical information retrieval in an
audio database. ACM Multimedia 95 - Electronic
Proceedings.
http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming.html
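The query engine step can be sketched as approximate string matching between the hummed contour and the contours stored in the melody database. Edit distance is used here as one plausible measure, not necessarily the one in the cited system, and the melody names and contours are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def rank(query, database):
    """Return melody names ordered from closest to farthest contour."""
    return sorted(database, key=lambda name: edit_distance(query, database[name]))


# Hypothetical melody database: name -> contour string.
melody_database = {
    "melody A": "RURURD",
    "melody B": "UUDDUU",
    "melody C": "RURDRD",
}
print(rank("RURURD", melody_database))
```

Tolerating a few edits matters because hummed input is rarely exact: a singer may drop a note or misjudge one interval.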
52
Indexing images
Source: Trust Territory archives.
53
Indexing images - chair (1)
54
Indexing images - ?
55
Indexing images - chair (2)
56
Biometrics - face
57
Biometrics - differences
58
Biometrics - similarities
  • Look at ratios of distances between marker points
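Ratios of distances are useful because they do not change when the same face is photographed at a different size. A minimal sketch, assuming hypothetical marker points and dividing every distance by the longest one (the slide does not say which distance serves as the reference):

```python
import math
from itertools import combinations


def distance_ratios(points):
    """Return each pairwise marker distance divided by the longest distance."""
    names = sorted(points)
    distances = {
        (a, b): math.dist(points[a], points[b])
        for a, b in combinations(names, 2)
    }
    longest = max(distances.values())
    return {pair: d / longest for pair, d in distances.items()}


# Hypothetical marker points (x, y) on a face image.
face = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose": (50.0, 60.0),
    "chin": (50.0, 100.0),
}
# The same face at twice the size.
enlarged = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}

small = distance_ratios(face)
large = distance_ratios(enlarged)
print(all(abs(small[pair] - large[pair]) < 1e-12 for pair in small))
```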

59
Indexing images
  • Color
  • Layout

60
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
61
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
62
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
63
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
64
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
65
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
66
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
67
Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English
68
Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English
69
Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English
70
Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English
71
Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English
72
Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English