1. CSA1013: Historical Perspectives of
Information Search and Retrieval
- Dr. Christopher Staff
- Department of Computer Science & AI
- University of Malta
2. Aims and Objectives
- What is Information Search and Retrieval?
- What's the state-of-the-art?
- How did we get here?
- What are the issues?
- Where are we likely to go next?
3. What's Information Search and Retrieval?
- What's information?
- Structured vs. unstructured
- Where is it?
- Question answering vs. Information lack or
information need
4. What's the state-of-the-art?
- Information Retrieval in the real world
- Web-based search engines
  - Google, AllTheWeb, AltaVista, etc.
- Web directories
  - Yahoo, Excite, etc.
5. What's the state-of-the-art?
- Google, and Google-like search engines
- Index > 24 billion web pages (pdf, doc, html, ...)
- User expresses query
  - terms, natural language query, etc.
- System compares query to indexed documents
- Returns list of relevant documents
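The index-then-compare loop on this slide can be illustrated with a toy inverted index. This is a minimal sketch, not how Google or any real engine works: the sample documents, the function names `build_index` and `search`, and the "count of matching query terms" score are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical three-document collection.
docs = {
    "d1": "history of information retrieval",
    "d2": "web search engines index web pages",
    "d3": "retrieval of web pages by search engines",
}
index = build_index(docs)
results = search(index, "web retrieval")
print(results)
```

Here `d3` is ranked first because it contains both query terms; `d1` and `d2` each match one term.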
6. What's the state-of-the-art?
- Recent study by Jansen & Spink [Jansen] shows
- Average query length: 2.14 terms [Spink]
- Queries with 1 term: 53%!
- 54% of users are satisfied with first page of
results (list of 10 documents)
- 80% of users view not more than 10-20 results
- 27.6% read only one document!
- 66% read < 5 documents
7. Has life always been this good?
- It would seem that we're living in information
heaven
- Any info we seek is just a couple of query terms
away
- Although the majority of queries appear
to be trivial, the reality is quite different
8. Has life always been this good?
- What if we want to find all relevant information?
(The Invisible Web)
- What if we want to find something that is
difficult to describe?
- What if we don't know what we're looking for?
- What tools do we use to find info in
encyclopaedias, dictionaries, newspapers,
reference manuals, novels and other books?
9. Here beginneth the history lesson
- People have devised tools to find information
again ever since we learnt to write things down
- Think of information stored on your personal
computers: how do you find something that you
wrote last month, last year?
10. Prehistory!
- Well, nearly!
- Early writings
  - Papyrus scrolls
  - No paragraphs, page numbers, etc.
  - Couldn't scroll to the end to read an index
  - Instead, Greek/Roman libraries used
    sillybus/index of title
11. Greeks/Romans
- 3BC, Greeks probably use alphabetization in
Library of Alexandria
- Around 2BC (Rome), evidence of hierarchies of
information/classification systems
  - Greeks probably earlier
- Also, Tables of Contents date from around 2BC
(Pliny the Elder reports before 79AD)
12. Printing Press
- Not much else was to happen until 1455, with the
advent of the printing press
- Previously, still difficult to refer to
information within a book, because copies were
inaccurate
- Info on one page in one book could be on a
different page in other copies
13. Indices and the Printing Press
- Still, alphabetization was on initial letter,
then on first four letters
- Not until 18th Century did full alphabetization
occur!
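The difference between the three stages of alphabetization can be seen by sorting the same word list on a prefix of varying length. This is only an illustrative sketch: the word list and the helper `alphabetize` are invented here, and a stable sort stands in for an indexer who leaves entries in their original order once the compared letters match.

```python
# Hypothetical index entries, in the order they were first written down.
words = ["index", "indent", "ink", "idea", "bible", "bibliography", "book"]

def alphabetize(words, prefix_len=None):
    """Stable sort on the first prefix_len letters (None means the whole word)."""
    return sorted(words, key=lambda w: w.lower()[:prefix_len])

by_initial = alphabetize(words, 1)   # initial letter only
by_four = alphabetize(words, 4)      # first four letters
full = alphabetize(words)            # full alphabetization

print(by_initial)
print(by_four)
print(full)
```

With initial-letter ordering, "idea" still follows "ink"; with four letters, "index" still precedes "indent"; only the full comparison puts every entry in its modern position.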
14. The Second World War and beyond
- In 1945, Vannevar Bush publishes "As We May
Think" in the Atlantic Monthly
- In 1949, Warren Weaver writes that if Chinese is
English + codification, then Machine Translation
should be possible
- These give rise to "intelligent" and
"statistical" (or surface-based) approaches to
Information Search and Retrieval respectively
(amongst other things ;-))
15. Intelligent vs. Surface-based
- Concepts
  - 1950s
  - Lay in waiting for years, because the
    hardware/software was not around
- Words
  - 1950s
  - First approaches were Key Words in Context
    (KWIC)
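A Key Words in Context index lists every significant word of a title alphabetically, each shown with the words around it. The sketch below is one simple way to produce such a listing; the function name `kwic`, the stopword list, the column width and the sample titles are all assumptions for illustration, not a reconstruction of any historical system.

```python
def kwic(titles, stopwords=frozenset({"a", "an", "and", "in", "of", "the"}), width=20):
    """Return KWIC lines: keywords aligned in one column, context on both sides."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stopwords:
                continue  # stopwords are not indexed as keywords
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word.lower(), left, word, right))
    entries.sort(key=lambda e: e[0])  # alphabetical by keyword, stable
    return [
        f"{left[-width:]:>{width}} {word} {right[:width]}".rstrip()
        for _, left, word, right in entries
    ]

# Hypothetical titles.
lines = kwic(["The History of Information Retrieval", "Retrieval of Web Pages"])
for line in lines:
    print(line)
```

Each keyword starts in the same column, so a reader can scan down the middle of the page for a word and read its context to either side.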
16. Intelligent vs. Surface-based
- 1960s (intelligent approaches)
  - Generality in AI (John McCarthy)
- 1960s (surface-based approaches)
  - Boolean Search
  - Measures of performance effectiveness
  - Thesaural Lookup
  - Vector Space Model
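Boolean search and the Vector Space Model differ in what they return: the former gives an unranked set of documents that satisfy a logical condition, the latter gives a ranking by similarity between query and document vectors. The following is a minimal sketch under simplifying assumptions (raw term frequencies as weights, whitespace tokenisation, a hypothetical three-document collection); the function names are invented for this example.

```python
import math
from collections import Counter

# Hypothetical collection, tokenised on whitespace.
docs = {
    "d1": "boolean search uses and or not",
    "d2": "vector space model ranks documents",
    "d3": "search engines rank documents by similarity",
}
tokens = {d: text.split() for d, text in docs.items()}

def boolean_and(terms):
    """Unranked set of documents containing every term."""
    return {d for d, toks in tokens.items() if all(t in toks for t in terms)}

def boolean_or(terms):
    """Unranked set of documents containing at least one term."""
    return {d for d, toks in tokens.items() if any(t in toks for t in terms)}

def cosine(a, b):
    """Cosine of the angle between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query):
    """Documents with non-zero similarity to the query, most similar first."""
    q = Counter(query.split())
    scored = {d: cosine(q, Counter(toks)) for d, toks in tokens.items()}
    return sorted((d for d in scored if scored[d] > 0), key=lambda d: (-scored[d], d))

print(boolean_and(["search", "documents"]))
print(boolean_or(["search", "documents"]))
print(rank("search documents"))
```

For the query "search documents", the Boolean AND matches only `d3`, while the vector ranking returns all three documents with `d3` first, since partial matches still receive a non-zero score.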
17. Intelligent vs. Surface-based
- 1970s (intelligent approaches)
  - Expert Systems
  - Still about understanding information and
    reasoning with and about it
- 1970s (surface-based approaches)
  - Explosion in availability of electronic text
    collections
  - Library Retrieval Systems
  - Full-text indexing
  - Probabilistic IR
  - Relevance Feedback
18. Intelligent vs. Surface-based
- 1980s (intelligent approaches)
  - Conceptual IR
  - Knowledge Representation Languages
  - Lenat's CYC
  - Contextual Reasoning
  - 5th Generation Computing, Japan
  - LSI feeds Statistical IR
- 1980s (surface-based approaches)
  - OPACs
  - IR used by non-specialists
  - Extended Boolean IR
  - Word Sense Disambiguation
  - Statistical IR (LSI, etc.)
  - Internet
19. Intelligent vs. Surface-based
- 1990s (intelligent approaches)
  - Better language processing
    - information extraction
    - entity name recognition
  - Advances in contextual reasoning, ontologies
- 1990s (surface-based approaches)
  - WWW (1995 c. 10M pages, 2003 c. 3B!)
  - Multimedia Indexing & Retrieval
  - Web-based search engines
20. Intelligent vs. Surface-based
- 2000s
  - Faster processors
  - More memory
  - Cheaper storage space
  - More superficial comparisons
21. Intelligent vs. Surface-based
- The future (intelligent approaches)
  - Computers that can find precisely the information
    you seek
    - Even if the answer is non-obvious
    - Or the answer needs to be the result of reasoning
  - MyLifeBits
- The future (surface-based approaches)
  - Computers that can approximate the information
    you seek
    - At much less cost
    - At the expense of correctness
  - MyLifeBits
23. Main Issues
- Architecture to handle ever-increasing numbers of
docs (efficient data structures)
- Freshness, indexing and retrieval speed
(efficient algorithms)
- What is relevance? (Better, cheaper and more
accurate algorithms to understand what the user
really wants)
24. Main References
- Paijmans, J.J., last updated 2004, The Retrieval
of Information from historical perspective,
http://pi0959.kub.nl/Paai/Onderw/V-I/Content/history.html
- American Society of Indexers, last updated 2005,
How Information Retrieval Started,
http://www.asindexing.org/site/history.shtml
- [Jansen] Jansen, B.J., and Spink, A., 2003, An
Analysis of Web Documents Retrieved and Viewed,
in Proceedings of the 4th International
Conference on Internet Computing, Las Vegas,
Nevada, 23-26 June 2003.
http://ist.psu.edu/faculty_pages/jjansen/academic/pubs/pages_viewed.pdf
- [Spink] Spink, A., et al., 2001, Searching the
Web: The Public and their Queries, in JASIST
2001.
http://jimjansen.tripod.com/academic/pubs/jasist2001/jasist2001.html