LIBR 557 Advanced Information Retrieval

Provided by: sheilapo

1
How Information is Organized
  • Databases and the Structure of Information

2
The structure of information
  • Understanding the structure of how something is
    organized is the first huge step in effective
    retrieval and controlling the information anxiety
    beast.
  • - Richard Saul Wurman, Information Anxiety guru

3
Class Objective
  • to introduce the organization of information in
    the traditional information retrieval model

4
Class Outline
  • What is a database?
  • Structure of a simple database
  • Making a database searchable

5
1. What is a Database?
  • A computerized collection of information that is
    arranged in a way that makes it easy to retrieve
    information
  • A collection of machine-readable information
    accessible through a computer

6
Why study databases?
  • Often contain unique information
  • Information is often reliable
  • Cover hundreds of publications in a single search
    request
  • Detailed, human indexing
  • Search syntax is sophisticated and powerful

7
2. Structure of a Simple Database
  • Data: raw information
  • Records: discrete units of information
  • Fields: distinct parts or sections of a record

8
The repetitive lyrics database
  • 3 songs with highly irritating, repetitive lyrics

9
Repetitive lyrics database song 1
  • Oh yeah, I'll tell you something,
  • I think you'll understand.
  • When I'll say that something
  • I want to hold your hand,
  • I want to hold your hand,
  • I want to hold your hand.

10
Repetitive lyrics database song 2
  • I, I will always, always love you
  • I will always love you
  • I will always love you
  • I will always love you
  • (Note: change this to something with more words,
    something tricky like "I'll", "you'll")

11
Repetitive lyrics database song 3
  • I'll tell you what I want, what I really really
    want,
  • So tell me what you want, what you really really
    want,
  • I wanna, I wanna, I wanna, I wanna, I wanna
    really really really wanna zigazig ah

12
Structure of a Simple Database
  • Data: raw information
  • Records: discrete units of information
  • Fields: distinct parts or sections of a record
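
As a rough illustration of these three terms, the repetitive lyrics database could be sketched in Python as below. The field names `id` and `lyrics` and the record numbers are assumptions made for this sketch, and only one line from each song is stored.

```python
# Data: the raw text of the lyrics.
# Record: one discrete unit of information (here, one song).
# Field: a distinct, named part of a record.
lyrics_db = [
    {"id": 1, "lyrics": "I want to hold your hand"},
    {"id": 2, "lyrics": "I will always love you"},
    {"id": 3, "lyrics": "I'll tell you what I want, what I really really want"},
]


def field_names(database):
    """Return the sorted set of field names used across all records."""
    names = set()
    for record in database:
        names.update(record)
    return sorted(names)


print(len(lyrics_db), "records with fields", field_names(lyrics_db))
```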

13
Types of Electronic Databases
  • Bibliographic databases
  • Full-text databases
  • Numeric databases
  • Directory databases

14
Typical Fields in a Bibliographic Database
  • Unique identifier
  • Title
  • Author
  • Date published
  • Journal name
  • Publisher
  • Document type (e.g. book review)
  • Subjects or descriptors
  • Abstract
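
One way to picture these fields is as a single typed record. The sketch below is only illustrative: the class name, the attribute names and every value in the example are placeholders invented here, not a real citation or the schema of any particular database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BibliographicRecord:
    """One record; each attribute corresponds to one field from the list above."""
    unique_id: str
    title: str
    author: str
    date_published: str
    journal_name: str
    publisher: str
    document_type: str
    subjects: List[str] = field(default_factory=list)
    abstract: str = ""


# Placeholder values only; this is not a real publication.
example = BibliographicRecord(
    unique_id="000001",
    title="An example title",
    author="Surname, Forename",
    date_published="2000-01-01",
    journal_name="Example Journal",
    publisher="Example Publisher",
    document_type="book review",
    subjects=["information retrieval", "databases"],
)

print(example.title, "|", example.document_type)
```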

15
Types of Electronic Databases
  • Bibliographic databases
  • Full-text databases
  • Numeric databases
  • Directory databases

16
3. Making a Database Searchable
  • Which fields will be searchable?
  • What type of indexing will be used for each
    searchable field?
  • Are there any words that shouldn't be indexed?

17
Inverted Indexing
  • In a back-of-the-book index, entries point to a
    specific page or paragraph in the book
  • In an inverted index, entries point to a specific
    record in the database
  • Inverted indexing is done by software
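
To make the "inversion" concrete, here is a minimal sketch that turns a record-to-words listing into a word-to-records listing. The record numbers and word lists are made up for the example.

```python
from collections import defaultdict

# Forward view: record identifier -> words found in that record.
# The record numbers and word lists are invented for this sketch.
forward = {
    1: ["want", "hold", "hand"],
    2: ["always", "love"],
    3: ["tell", "want"],
}


def invert(forward_index):
    """Turn record -> words into word -> sorted list of record identifiers."""
    inverted = defaultdict(set)
    for record_id, words in forward_index.items():
        for word in words:
            inverted[word].add(record_id)
    # Alphabetical order mirrors the layout of a printed index.
    return {word: sorted(ids) for word, ids in sorted(inverted.items())}


inverted = invert(forward)
print(inverted["want"])  # [1, 3]
```

Looking up a word is then a single dictionary access rather than a scan of every record, which is where the speed advantage comes from.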

18
Advantages of Inverted Indexing
  • Speed of Retrieval
  • Word Position
  • Field searching
  • Proximity and phrase searching
  • Word Frequency

19
Steps in Inverted Indexing
  • Provide unique identifier for each document if
    none exists
  • Analyze each record for significant words
  • Generate alphabetical list of significant words
    with a pointer to the unique identifier

20
Inverted Indexing, Step 2
  • What is a Significant Word?
  • Significant words are all words except stop words
    (words like AND, AN, FOR, TO, THE)
  • Stop words slow down the indexing and searching
    process
  • The position of each stop word is still marked
    to enable proximity searching
  • Stop words can be indexed as part of a phrase

21
Inverted Indexing, Step 2
  • Analyze record for significant words
  • Divide record into fields
  • Label field (e.g. AU, TI)
  • Note position of each word in the field
  • Significant words are identified
  • Position of stopwords also noted

22
Inverted Indexing, Step 2: Quotations Database
  • Record 123
  • Quote field (QU)
  • To be or not to be, that is the question

23
Inverted Indexing, Step 2: Quotations Database
  • To be or not: be = QU002, or = QU003, not = QU004
  • to be, that is: be = QU006, that = QU007, is = QU008
  • the question: question = QU0010
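
The position numbering on this slide can be reproduced with a short sketch. It assumes the stop list is exactly the five example stop words given earlier (AND, AN, FOR, TO, THE) and that a word is any run of letters or digits; the function name `analyze_field` is hypothetical. Note that the sketch pads positions to three digits, so the tenth word prints as QU010 where the slide writes QU0010.

```python
import re

# Stop list taken from the example stop words on the earlier slide.
STOP_WORDS = {"and", "an", "for", "to", "the"}


def analyze_field(text, stop_words=STOP_WORDS):
    """Return (word, position) pairs for the significant words in a field.

    Every word, stop word or not, consumes a position, so the gaps left
    by stop words are preserved for proximity searching.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [
        (word, position)
        for position, word in enumerate(words, start=1)
        if word not in stop_words
    ]


quote = "To be or not to be, that is the question"
for word, position in analyze_field(quote):
    print(f"{word:<10} QU{position:03d}")
```

Positions 1, 5 and 9 are missing from the output because they belong to the stop words "to", "to" and "the".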

24
Inverted Indexing, Step 3
  • Generate (parse) a list of significant words with
    a pointer to a record's unique identifier
  • Sort these words alphabetically
  • Remove duplicates

25
Inverted Indexing, Step 3: Generate List of Significant Words
  • Word      File  Position
  • be        123   QU002
  • or        123   QU003
  • not       123   QU004
  • be        123   QU006
  • that      123   QU007
  • is        123   QU008
  • question  123   QU0010

26
Inverted Indexing, Step 3: Sort List Alphabetically
  • Word      File  Position
  • be        123   QU002
  • be        123   QU006
  • is        123   QU008
  • not       123   QU004
  • or        123   QU003
  • question  123   QU0010
  • that      123   QU007

27
Inverted Indexing, Step 3: Remove Duplicates
  • Word      File  Frequency  Position
  • be        123   2          QU002,QU006
  • is        123   1          QU008
  • not       123   1          QU004
  • or        123   1          QU003
  • question  123   1          QU0010
  • that      123   1          QU007
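
The three parts of Step 3 (generate, sort, remove duplicates) can be sketched as follows. The input triples are the word, file and position values from the slides for record 123, with positions stored as plain integers; the function name `merge_postings` is an assumption of this sketch.

```python
from itertools import groupby

# (word, record number, position) triples as listed for record 123.
postings = [
    ("be", 123, 2), ("or", 123, 3), ("not", 123, 4), ("be", 123, 6),
    ("that", 123, 7), ("is", 123, 8), ("question", 123, 10),
]


def merge_postings(entries):
    """Sort entries, then merge duplicates of the same word in the same record."""
    ordered = sorted(entries)  # alphabetical by word, then record, then position
    merged = []
    for (word, record), group in groupby(ordered, key=lambda e: (e[0], e[1])):
        positions = [position for _, _, position in group]
        # Frequency is simply the number of positions that were merged.
        merged.append((word, record, len(positions), positions))
    return merged


for word, record, frequency, positions in merge_postings(postings):
    print(word, record, frequency, positions)
```

The two entries for "be" collapse into one entry with frequency 2 and positions 2 and 6, matching the final table.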

28
Inverted Indexing: Word and Phrase Indexing
  • Individual words indexed in fields like abstract,
    title
  • Phrases indexed in fields like author, journal
    name
  • Both words and phrases can be indexed in the same
    field
  • Word fragments (in scientific databases)

29
Word and Phrase Indexing in Dialog
  • Remember
  • A word in Dialog is a set of alphabetical or
    numeric characters surrounded by either
    punctuation or a space
  • A phrase in Dialog is an entire entry in a
    field
  • British General Electric Co.
  • Basch, Reva
  • Basch, R.
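
The difference between the two definitions can be shown with a small sketch. It follows the wording of the definitions above literally and is not a description of how Dialog itself is implemented; the function names are hypothetical.

```python
import re


def word_terms(entry):
    """Word indexing: each run of letters or digits becomes its own entry."""
    return re.findall(r"[A-Za-z0-9]+", entry)


def phrase_term(entry):
    """Phrase indexing: the entire field entry is kept as one index entry."""
    return entry.strip()


for entry in ["British General Electric Co.", "Basch, Reva", "Basch, R."]:
    print(word_terms(entry), "|", phrase_term(entry))
```

Under this reading, "Basch, Reva" and "Basch, R." are two different phrase entries, even though both produce the word entry "Basch".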

30
Inverted Indexing in the Repetitive Lyrics Database
  • Step 1: Unique identifier (colour of post-it)
  • Step 2: Analyze each record for significant words
  • Step 3: Generate alphabetical list of significant
    words with a pointer to the unique identifier

31
Searching Inverted Indexes
  • Presence of term
      • Boolean logic
  • Position of term
      • Field searching
      • Absolute position
      • Proximity
      • Post-coordinated phrase searching
  • Frequency of term
      • Ranking
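
A minimal sketch of several of these search types is given below, using one line from each of the three songs. The record numbers, the function names and the choice to index every word (no stop list) are simplifications made for this example.

```python
import re
from collections import defaultdict

# One line from each song; the record numbers 1 to 3 are arbitrary.
RECORDS = {
    1: "I want to hold your hand",
    2: "I will always love you",
    3: "I'll tell you what I want, what I really really want",
}


def build_index(records):
    """Map each word to {record_id: [positions]}; no stop list is applied."""
    index = defaultdict(dict)
    for record_id, text in records.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words, start=1):
            index[word].setdefault(record_id, []).append(position)
    return index


def records_with(index, term):
    """Presence of term: the set of records containing it."""
    return set(index.get(term, {}))


def within(index, first, second, distance=1):
    """Proximity: records where `second` follows `first` within `distance` words."""
    hits = set()
    for record_id in records_with(index, first) & records_with(index, second):
        starts = index[first][record_id]
        ends = index[second][record_id]
        if any(0 < e - s <= distance for s in starts for e in ends):
            hits.add(record_id)
    return hits


def rank_by_frequency(index, term):
    """Frequency of term: record ids ordered by how often the term occurs."""
    counts = {rid: len(pos) for rid, pos in index.get(term, {}).items()}
    return sorted(counts, key=lambda rid: (-counts[rid], rid))


index = build_index(RECORDS)
print(records_with(index, "want") & records_with(index, "you"))   # Boolean AND
print(records_with(index, "hand") | records_with(index, "love"))  # Boolean OR
print(records_with(index, "want") - records_with(index, "you"))   # Boolean NOT
print(within(index, "really", "want"))                            # adjacent words
print(rank_by_frequency(index, "want"))                           # ranking
```

Boolean logic needs only the record sets, while proximity and phrase searching depend on the stored word positions, and ranking depends on the stored frequencies.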

32
What Next?
  • Dialog lab
  • Last week's practice exercises
  • Expand, Thesaurus, Multiple Files
  • Next week's lecture
  • Search strategies and tactics
  • Database publishing trends
  • February 1 class
  • Management of library databases
  • More about database publishing trends