1
CSA1013: Historical Perspectives of
Information Search and Retrieval
  • Dr. Christopher Staff
  • Department of Computer Science & AI
  • University of Malta

2
Aims and Objectives
  • What is Information Search and Retrieval?
  • What's the state-of-the-art?
  • How did we get here?
  • What are the issues?
  • Where are we likely to go next?

3
What's Information Search and Retrieval?
  • What's information?
  • Structured vs. unstructured
  • Where is it?
  • Question answering vs. Information lack or
    information need

4
What's the state-of-the-art?
  • Information Retrieval in the real world
  • Web-based search engines
  • Google, AllTheWeb, AltaVista, etc.
  • Web directories
  • Yahoo, Excite, etc.

5
What's the state-of-the-art?
  • Google, and Google-like search engines
  • Index > 24 billion web pages (pdf, doc, html, ...)
  • User expresses Query
  • terms, natural language query, etc.
  • System compares query to indexed documents
  • Returns list of relevant documents
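
The index, query, compare, return cycle above can be sketched with a tiny inverted index. This is only an illustration under simplifying assumptions: the three sample documents and the function names `build_index` and `search` are invented here, matching is exact on lower-cased whitespace-separated terms, and every query term must be present. Real engines add ranking, stemming and much more.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # .get() avoids creating empty entries for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical mini-collection, invented for this sketch.
docs = {
    1: "history of information retrieval",
    2: "web search engines index web pages",
    3: "information need and question answering",
}
index = build_index(docs)
print(search(index, "information retrieval"))  # [1]
print(search(index, "information"))            # [1, 3]
```

The index is built once, so each query only touches the posting sets of its own terms rather than rescanning every document.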

6
What's the state-of-the-art?
  • Recent study by Jansen & Spink [Jansen] shows
  • Average query: 2.14 terms [Spink]
  • Queries with 1 term: 53%!
  • 54% of users are satisfied with first page of
    results (list of 10 documents)
  • 80% of users view not more than 10 - 20 results
  • 27.6% read only one document!
  • 66% read < 5 documents

7
Has life always been this good?
  • It would seem that we're living in information
    heaven
  • Any info we seek is just a couple of query terms
    away
  • Although the majority of queries appear to be
    trivial, the reality is quite different

8
Has life always been this good?
  • What if we want to find all relevant information?
    (The Invisible Web)
  • What if we want to find something that is
    difficult to describe?
  • What if we don't know what we're looking for?
  • What tools do we use to find info in
    encyclopaedias, dictionaries, newspapers,
    reference manuals, novels and other books?

9
Here beginneth the history lesson
  • People have devised tools to find information
    again ever since we learnt to write things down
  • Think of information stored on your personal
    computers: how do you find something that you
    wrote last month, last year?

10
Prehistory!
  • Well, nearly!
  • Early writings
  • Papyrus scrolls
  • No paragraph, page numbers, etc.
  • Couldn't scroll to the end to read an index
  • Instead, Greek/Roman libraries used
    sillybus/index of title

11
Greeks/Romans
  • 3BC, Greeks probably use alphabetization in
    Library of Alexandria
  • Around 2BC (Rome), evidence of hierarchies of
    information/classification systems
  • Greeks probably earlier
  • Also, Tables of Contents date from around 2BC
    (Pliny the Elder reports before 79AD)

12
Printing Press
  • Not much else was to happen until 1455, with the
    advent of the printing press
  • Previously, still difficult to refer to
    information within a book, because copies were
    inaccurate
  • Info on one page in one book could be on a
    different page in other copies

13
Indices and the Printing Press
  • Still, alphabetization was on initial letter,
    then on first four letters
  • Not until 18th Century did full alphabetization
    occur!
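
The three stages of alphabetization mentioned above can be compared by changing the sort key. This is a sketch only: the word list and function names are invented for illustration, and Python's stable sort stands in for a scribe who leaves entries with equal keys in their original order.

```python
# Hypothetical index entries, invented for this sketch.
words = ["index", "indent", "information", "indices", "alphabet", "idea"]


def by_initial_letter(entries):
    """Order on the first letter only; ties keep their original order."""
    return sorted(entries, key=lambda w: w[0])


def by_first_four_letters(entries):
    """Order on the first four letters only."""
    return sorted(entries, key=lambda w: w[:4])


def fully_alphabetized(entries):
    """Order on every letter of the word."""
    return sorted(entries)


print(by_initial_letter(words))
# ['alphabet', 'index', 'indent', 'information', 'indices', 'idea']
print(by_first_four_letters(words))
# ['alphabet', 'idea', 'index', 'indent', 'indices', 'information']
print(fully_alphabetized(words))
# ['alphabet', 'idea', 'indent', 'index', 'indices', 'information']
```

Only the last ordering lets a reader find "indent" before "index" without scanning every entry that begins with "inde".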

14
The Second World War and beyond
  • In 1945, Vannevar Bush publishes "As We May
    Think" in the Atlantic Monthly
  • In 1949, Warren Weaver writes that if Chinese is
    English + codification, then Machine Translation
    should be possible
  • These give rise to "intelligent" and
    "statistical" (or surface-based) approaches to
    Information Search and Retrieval respectively
    (amongst other things ;-))

15
Intelligent vs. Surface-based
  • Concepts (intelligent approaches)
  • 1950s
  • Lay in waiting for years, because
    hardware/software not around
  • Words (surface-based approaches)
  • 1950s
  • First approaches were Key Words in Context
    (KWIC)
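
A Key Words in Context index lists every significant word of a title alphabetically, each shown with the words around it. The sketch below is a minimal illustration, not a reconstruction of any historical system: the stopword list, the two sample titles and the function names are assumptions made for this example.

```python
# Illustrative stopword list; real KWIC indexes used longer ones.
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the"})


def kwic_entries(titles, stopwords=STOPWORDS):
    """Return (keyword, left context, word, right context), sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stopwords:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word.lower(), left, word, right))
    return sorted(entries, key=lambda entry: entry[0])


def format_kwic(entries, width=30):
    """Right-align the left context so the keywords line up in one column."""
    lines = []
    for _, left, word, right in entries:
        line = f"{left[-width:]:>{width}}  {word}  {right[:width]}"
        lines.append(line.rstrip())
    return lines


# Hypothetical titles, invented for this sketch.
titles = ["The Retrieval of Information", "Information Search and Retrieval"]
entries = kwic_entries(titles)
for line in format_kwic(entries):
    print(line)
```

Each title appears once per significant word, so a reader can look it up under any of them.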

16
Intelligent vs. Surface-based
  • 1960s (intelligent approaches)
  • Generality in AI (John McCarthy)
  • 1960s (surface-based approaches)
  • Boolean Search
  • Measures of performance effectiveness
  • Thesaural Lookup
  • Vector Space Model
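
In the Vector Space Model, queries and documents become term vectors and documents are ranked by the angle between them. The sketch below makes loud simplifications: it uses raw term frequencies as weights (no idf or length normalisation beyond the cosine itself), and the sample documents and function names are invented for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as a bag of lower-cased terms with raw counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return ids of documents with non-zero similarity, best match first."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(text)), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


# Hypothetical mini-collection, invented for this sketch.
docs = {
    "d1": "boolean search of library catalogues",
    "d2": "vector space model for search",
    "d3": "expert systems",
}
print(rank("vector space search", docs))  # ['d2', 'd1']
```

Unlike Boolean search, a document that matches only some of the query terms is still returned, just lower in the list.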

17
Intelligent vs. Surface-based
  • 1970s (intelligent approaches)
  • Expert Systems
  • Still about understanding information and
    reasoning with and about it
  • 1970s (surface-based approaches)
  • Explosion in availability of electronic text
    collections
  • Library Retrieval Systems
  • Full-text indexing
  • Probabilistic IR
  • Relevance Feedback

18
Intelligent vs. Surface-based
  • 1980s (intelligent approaches)
  • Conceptual IR
  • Knowledge Representation Languages
  • Lenat's CYC
  • Contextual Reasoning
  • 5th Generation Computing, Japan
  • LSI feeds Statistical IR
  • 1980s (surface-based approaches)
  • OPACs
  • IR used by non-specialists
  • Extended Boolean IR
  • Word Sense Disambiguation
  • Statistical IR (LSI, etc.)
  • Internet

19
Intelligent vs. Surface-based
  • 1990s (intelligent approaches)
  • Better language processing
  • information extraction
  • entity name recognition
  • Advances in contextual reasoning, ontologies
  • 1990s (surface-based approaches)
  • WWW (1995: c. 10M pages; 2003: c. 3B!)
  • Multimedia Indexing & Retrieval
  • Web-based search engines

20
Intelligent vs. Surface-based
  • 2000s (intelligent approaches)
  • Semantic Web
  • 2000s (surface-based approaches)
  • Faster processors
  • More memory
  • Cheaper storage space
  • More superficial comparisons

21
Intelligent vs. Surface-based
  • The future (intelligent approaches)
  • Computers that can find precisely the information
    you seek
  • Even if the answer is non-obvious
  • Or the answer needs to be the result of reasoning
  • MyLifeBits
  • The future (surface-based approaches)
  • Computers that can approximate the information
    you seek
  • At much less cost
  • At the expense of correctness
  • MyLifeBits

23
Main Issues
  • Architecture to handle ever increasing numbers of
    docs (efficient data structures)
  • Freshness, indexing and retrieval speed
    (Efficient algorithms)
  • What is relevance? (Better, cheaper and more
    accurate algorithms to understand what the user
    really wants)

24
Main References
  • Paijmans, J.J., last updated 2004, "The Retrieval
    of Information from historical perspective",
    http://pi0959.kub.nl/Paai/Onderw/V-I/Content/history.html
  • American Society of Indexers, last updated 2005,
    "How Information Retrieval Started",
    http://www.asindexing.org/site/history.shtml
  • [Jansen] Jansen, B.J., and Spink, A., 2003, "An
    Analysis of Web Documents Retrieved and Viewed",
    in Proceedings of the 4th International
    Conference on Internet Computing, Las Vegas,
    Nevada, 23-26 June 2003.
    http://ist.psu.edu/faculty_pages/jjansen/academic/pubs/pages_viewed.pdf
  • [Spink] Spink, A., et al., 2001, "Searching the
    Web: The Public and their Queries", in JASIST
    2001.
    http://jimjansen.tripod.com/academic/pubs/jasist2001/jasist2001.html