Thomas Carnell - PowerPoint PPT Presentation

About This Presentation
Title:

Thomas Carnell

Description:

Investigation into Internet search technology Thomas Carnell South Bank University, London SE1 0AA, UK carnelta_at_sbu.ac.uk – PowerPoint PPT presentation

Transcript and Presenter's Notes

Title: Thomas Carnell


1
Investigation into Internet search technology
  • Thomas Carnell
  • South Bank University,
  • London SE1 0AA, UK
  • carnelta_at_sbu.ac.uk

2
What is a search engine?
  • An Index
  • Categorised information structure
  • Thousands of millions of documents
  • Search mechanism
  • Searches quickly through the index
  • Identifies relevance between a document and a
    user's query
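
The index and search mechanism described on this slide can be sketched as a small inverted index. This is a minimal illustration, not how any particular search engine is built: the three-document collection, the whitespace tokenisation, and the names `build_index` and `search` are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical three-document collection, used only for illustration.
DOCS = {
    1: "football results from the weekend",
    2: "soccer league tables and results",
    3: "cooking with seasonal vegetables",
}
INDEX = build_index(DOCS)
```

With this collection, `search(INDEX, "results")` finds documents 1 and 2, while `search(INDEX, "football")` finds only document 1: the lookup is fast, but it matches literal terms only.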

3
Scope
  • Identifying whether a piece of information is
    relevant to a user's query.
  • Not about computer hardware that enables the
    search engine.
  • Not about language-language translation

4
Objectives
  • Find as many relevant documents as possible.
  • Identify and ignore irrelevant documents.
  • Effectively assess the relevance of a document
  • What is a relevant document?
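
The first two objectives correspond to what the information retrieval literature calls recall (finding as many relevant documents as possible) and precision (avoiding irrelevant ones). A minimal sketch of how the two are computed follows. The function name `precision_recall` and the document ids are assumptions for illustration, and the set of relevant documents is taken as given, which is exactly the hard question the last bullet raises.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against known relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents returned, three actually relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
```

Here two of the four returned documents are relevant (precision 0.5), and two of the three relevant documents were found (recall 2/3).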

5
Why Change?
  • Search engines are extremely fast
  • Almost always find results
  • Results are usually relevant

6
The Term Mismatch problem
  • No understanding of what the user wants
  • Do not extend the boundaries of the user's query
  • No spell checking
  • If we search for football we may not receive
    highly relevant documents about soccer

7
Improving search engines
  • Build intelligence into the search mechanism
  • Query expansion
  • Vector space model
  • Neural Networks
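
Of the approaches listed above, the vector space model is the simplest to sketch: queries and documents become term vectors, and relevance is scored by the cosine of the angle between them. The example below is a minimal version that assumes raw term counts as weights (real systems typically use a weighting such as tf-idf); the helper names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent text as a bag-of-words vector of raw term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return ids of documents with non-zero similarity, best match first."""
    q = to_vector(query)
    scored = [(cosine_similarity(q, to_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


# Hypothetical collection for illustration.
DOCS = {
    "a": "football match report",
    "b": "football football transfer news",
    "c": "gardening tips",
}
RANKING = rank("football news", DOCS)
```

Unlike a boolean match, this produces a graded ordering: document "b" shares both query terms and ranks above "a", while "c" shares none and is dropped.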

8
Techniques: Query Expansion
  • Analyse terms in the user's query
  • Alter query and add more terms, or specify not to
    search for irrelevant terms
  • Improve the accuracy of the query
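
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the presenter's implementation: the tiny `THESAURUS`, the `expand_query` and `matches` helpers, and the "any of / none of" query shape are all invented for the example.

```python
# Hypothetical hand-built thesaurus; a real system would use a much larger one.
THESAURUS = {
    "football": ["soccer"],
    "car": ["automobile"],
}


def expand_query(query, thesaurus=THESAURUS, exclude=()):
    """Add thesaurus synonyms to the query terms and record terms to avoid."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return {"any_of": expanded, "none_of": list(exclude)}


def matches(document, expanded):
    """True if the document has any wanted term and no excluded term."""
    words = set(document.lower().split())
    if any(term in words for term in expanded["none_of"]):
        return False
    return any(term in words for term in expanded["any_of"])


EXPANDED = expand_query("football")
```

`EXPANDED["any_of"]` is `["football", "soccer"]`, so a document that only mentions soccer now matches. Passing `exclude=["american"]` shows the other half of the slide: specifying terms that should not be searched for.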

9
Query Expansion Example

10
Query Expansion Issues
  • How do we know what to add?
  • Use dictionaries and thesauri
  • Ask the user
  • Relevance feedback
  • Does it work?
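
One way to answer "how do we know what to add?" is relevance feedback: the user marks some results as relevant, and frequent terms from those documents are added to the query. The sketch below is a deliberately simple term-frequency variant, not a standard algorithm such as Rocchio; the stopword list, the function name `feedback_terms`, and the sample documents are assumptions.

```python
from collections import Counter

# Hypothetical, very short stopword list for the example.
STOPWORDS = {"the", "a", "and", "of", "in", "to", "from"}


def feedback_terms(query, relevant_docs, max_new_terms=3):
    """Extend the query with the most frequent terms from user-approved documents."""
    query_terms = query.lower().split()
    counts = Counter()
    for text in relevant_docs:
        for term in text.lower().split():
            if term not in STOPWORDS and term not in query_terms:
                counts[term] += 1
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:max_new_terms]]


# Documents the (hypothetical) user marked as relevant to "football".
MARKED = [
    "football league results",
    "soccer league tables",
    "soccer results from the weekend",
]
NEW_QUERY = feedback_terms("football", MARKED)
```

With these three documents the query grows from `football` to `football league results soccer`, so the term the user never typed is picked up from the documents they approved.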

11
Techniques: Dimensionality Reduction
  • Avoid the situation where two people refer to the
    same concept using different words.
  • Reduce the vocabulary of the search engine.
  • One implementation uses Latent Semantic Indexing
    (LSI; Deerwester et al. 1990)
  • The terms football and soccer can both be
    encompassed by a term that represents the
    concept of football
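
The effect described in the last bullet can be illustrated with a hand-written concept map. Note the assumption: LSI derives its concepts statistically from a term-document matrix, whereas this sketch hard-codes them, so it shows only the outcome (a smaller vocabulary in which football and soccer coincide), not LSI itself. The `CONCEPTS` table and helper names are hypothetical.

```python
# Hypothetical hand-written map from surface terms to concept labels.
CONCEPTS = {
    "football": "FOOTBALL",
    "soccer": "FOOTBALL",
    "footie": "FOOTBALL",
}


def to_concepts(text, concepts=CONCEPTS):
    """Replace each term by its concept label; unmapped terms stay as they are."""
    return {concepts.get(term, term) for term in text.lower().split()}


def concept_match(query, document):
    """True if every concept in the query also occurs in the document."""
    return to_concepts(query) <= to_concepts(document)
```

After mapping, `concept_match("football", "soccer league tables")` succeeds even though the two texts share no literal term.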

12
Techniques: Dimensionality Reduction
  • Does this actually help?
  • Won't a generalisation of information type
    increase the number of results?
  • Is it suited to a typical Internet query?
  • The majority of Internet search queries consist
    of only a few words, so using dimensionality
    reduction would probably lead to excessive
    numbers of results.

13
Techniques: Neural Networks
  • Highly robust structures used to create
    associations between queries and documents
  • Different types of neural network (for example,
    back-propagation networks)
  • Spreading Activation Network
  • COSIMIR (COgnitive Similarity Learning in
    Information Retrieval)

14
Techniques: Neural Networks
  • Spreading Activation Network
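
The slide's diagram is not preserved in this transcript, so the following is a generic sketch of the idea rather than the network shown: activation starts at the query terms, flows to documents containing them, flows back to the other terms in those documents, and then reaches further documents. The `decay` factor, the number of passes, and the sample collection are assumptions.

```python
def spreading_activation(documents, query, decay=0.5, passes=2):
    """Score documents by activation spread over a term-document network."""
    doc_terms = {doc_id: set(text.lower().split())
                 for doc_id, text in documents.items()}
    # Query terms start with full activation.
    term_act = {term: 1.0 for term in query.lower().split()}
    doc_act = {}
    for step in range(passes):
        # Terms activate the documents that contain them.
        doc_act = {doc_id: sum(term_act.get(t, 0.0) for t in terms)
                   for doc_id, terms in doc_terms.items()}
        if step < passes - 1:
            # Active documents pass a damped share back to all of their terms.
            for doc_id, terms in doc_terms.items():
                for t in terms:
                    term_act[t] = term_act.get(t, 0.0) + decay * doc_act[doc_id]
    return doc_act


# Hypothetical collection for illustration.
DOCS = {
    1: "football league results",
    2: "soccer league tables",
    3: "cooking tips",
}
SCORES = spreading_activation(DOCS, "football")
```

Document 2 never mentions football, yet it ends up with non-zero activation because it shares the term "league" with document 1; document 3 shares nothing and stays at zero.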

15
Techniques: Neural Networks
  • The COSIMIR Model

16
Techniques: Neural Networks
  • Both of the models mentioned previously have
    proved to be highly successful.
  • BUT
  • J. Mothe (1994) proved that the spreading
    activation model is actually identical to the
    Vector space model
  • The extremely promising COSIMIR model is
    computationally extremely expensive

17
Conclusions
  • Current search engines are not sufficient
  • The increasing volume of information on the
    Internet will make the problems of current search
    engines ever more significant
  • Some, if not all, of these ideas will be
    implemented
  • Search engines will have to change, and soon!

18
References 1
  • Thomas Mandl "Tolerant and Adaptive Information
    Retrieval with Neural Networks"
  • www.shaping-the-future.de/pages/abstracts/abstract_190.htm
  • Fabio Crestani (1998) Exploiting the Similarity
    of Non-Matching Terms at Retrieval Time
    (University of Glasgow)
  • Jörg Tiedemann (1999)
  • http://stp.ling.uu.se/joerg/dh99/dh16/

19
References 2
  • CJ Van Rijsbergen (1979) Information
    Retrieval (University of Glasgow)
  • Mark Sanderson (1999) Retrieving with good
    sense (University of Glasgow)