Chapter 2: The Web and the Problem of Search

Provided by: RB6

1
Chapter 2: The Web and the Problem of Search
  • The size of the web and how it is measured.
  • Search engine usage statistics.
  • The bow-tie structure of the web.
  • The small-world web.
  • Web information seeking strategies.
  • A taxonomy of web searches.
  • Web search versus Information Retrieval.
  • Differences between global and local search.
  • Differences between search and navigation.

2
Web size statistics
  • Number of accessible web pages: the latest estimate
    (May 2005) is 11.5 billion.
  • The deep (or hidden, or invisible) web contains
    400-550 times more information than the surface web.
  • Coverage (i.e., the proportion of the web that is
    indexed) is crucial for search engines.

3
Measuring the size of the web
  • Capture-recapture method:
    • SE1 is the number of pages indexed by the first
      search engine.
    • QSE2 is the number of pages returned by the second
      search engine for typical queries.
    • OVR is the number of pages returned by both
      search engines for those queries.
    • Size estimate = (SE1 × QSE2) / OVR.
  • An estimated 64.81 million web sites existed as of
    June 2005.
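The capture-recapture estimate rests on one assumption: the fraction OVR/QSE2 of the second engine's results that the first engine also returns approximates the fraction of the whole web that the first engine covers, so web size ≈ SE1 / (OVR/QSE2), which rearranges to the formula on the slide. A minimal Python sketch of the arithmetic follows; the function name `capture_recapture_estimate` and the sample figures are hypothetical, not measured data.

```python
def capture_recapture_estimate(se1: int, qse2: int, ovr: int) -> float:
    """Estimate the size of the web as (SE1 x QSE2) / OVR.

    se1:  number of pages indexed by the first search engine
    qse2: number of pages returned by the second engine for sample queries
    ovr:  number of those pages returned by both engines
    """
    if ovr <= 0:
        # With no overlap the coverage ratio is zero, so no estimate is possible.
        raise ValueError("overlap must be positive")
    if ovr > qse2:
        raise ValueError("overlap cannot exceed the second engine's sample")
    # OVR / QSE2 approximates the first engine's coverage of the web,
    # so dividing SE1 by that coverage scales it up to the whole web.
    return se1 * qse2 / ovr


# Hypothetical figures for illustration only (not real measurements).
estimate = capture_recapture_estimate(se1=8_000_000_000, qse2=1_000, ovr=800)
print(f"{estimate:,.0f} pages")  # 10,000,000,000 pages
```

Here the first engine returns 80% of the second engine's sample, so its 8 billion indexed pages scale up to an estimate of 10 billion. The estimate also assumes the two engines index the web independently; if both favour the same popular pages, the overlap is inflated and the size is underestimated.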

4
Web usage statistics
  • Over 10% of the world's population were online as
    of late 2004.
  • The number of broadband users is growing (over 50%
    of connected Americans use broadband).
  • Search engine usage as of June 2004:
    • Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL
      (13.6%), Ask Jeeves (7%).
  • 200 million hits per day to Google (mid 2004).

5
Tabular Data versus Web Data
  • Figure 2.1 A database table versus a web site

6
Structure of the web
  • Figure 2.2 Map of the Internet (1998)

7
Structure of the web
  • Figure 2.3 Web pages related to dcs.bbk.ac.uk
  • (see www.touchgraph.com)

8
Structure of the web
  • Figure 2.4 Bow-tie shape of the web
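The bow-tie picture in Figure 2.4 is usually described in terms of a strongly connected core, an IN set of pages that can reach the core but cannot be reached from it, and an OUT set reachable from the core that cannot link back to it. The sketch below assumes that description and takes the largest strongly connected component as the core; it lumps everything else (tendrils, tubes, disconnected pieces) into one OTHER group, which is a simplification. The function names and the toy link graph are hypothetical, and the quadratic search for the core only suits small examples.

```python
def reachable(adj: dict[str, set[str]], source: str) -> set[str]:
    """All pages reachable from source by following links (iterative DFS)."""
    seen = {source}
    stack = [source]
    while stack:
        page = stack.pop()
        for nxt in adj.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse_links(adj: dict[str, set[str]]) -> dict[str, set[str]]:
    """Flip every link so that u -> v becomes v -> u."""
    rev: dict[str, set[str]] = {}
    for page, targets in adj.items():
        for target in targets:
            rev.setdefault(target, set()).add(page)
    return rev


def bow_tie(adj: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split pages into CORE, IN, OUT and OTHER (simplified bow-tie)."""
    pages = set(adj) | {t for targets in adj.values() for t in targets}
    if not pages:
        return {"CORE": set(), "IN": set(), "OUT": set(), "OTHER": set()}
    rev = reverse_links(adj)
    core: set[str] = set()
    for page in sorted(pages):
        # A page's strongly connected component is the set of pages it can
        # reach that can also reach it back.
        scc = reachable(adj, page) & reachable(rev, page)
        if len(scc) > len(core):
            core = scc
    seed = next(iter(core))
    out_set = reachable(adj, seed) - core   # reachable from the core only
    in_set = reachable(rev, seed) - core    # can reach the core only
    other = pages - core - in_set - out_set
    return {"CORE": core, "IN": in_set, "OUT": out_set, "OTHER": other}


# Hypothetical toy web: A, B and C link in a cycle; X feeds in, Y hangs off.
links = {
    "X": {"A", "Z"},
    "A": {"B"},
    "B": {"C"},
    "C": {"A", "Y"},
    "W": set(),
}
for part, members in bow_tie(links).items():
    print(part, sorted(members))
```

On this toy graph the cycle A, B, C forms the core, X lands in IN, Y in OUT, and the tendril page Z and the isolated page W fall into OTHER.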

9
The small-world web
  • Over 75% of the time there is no directed path
    from one random web page to another.
  • When a directed path exists, its average length is
    16 clicks.
  • When an undirected path exists, its average length
    is 7 clicks.
  • A short average path length between pairs of nodes
    is characteristic of a small-world network.
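The path statistics above can be computed on a small link graph with breadth-first search, which finds the fewest clicks between two pages. The sketch below reports the fraction of ordered page pairs with no path and the mean path length over the pairs that are connected, first following links only forwards and then ignoring their direction. The toy graph and the function names are hypothetical; a web-scale measurement would need sampling rather than all-pairs search.

```python
from collections import deque


def bfs_distances(adj: dict[str, set[str]], source: str) -> dict[str, int]:
    """Return {page: clicks} for every page reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for nxt in adj.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def path_stats(adj: dict[str, set[str]]) -> tuple[float, float]:
    """Return (fraction of ordered pairs with no path,
    mean path length over the pairs that do have a path)."""
    pages = set(adj) | {t for targets in adj.values() for t in targets}
    total_pairs = 0
    unreachable = 0
    lengths = []
    for s in pages:
        dist = bfs_distances(adj, s)
        for t in pages:
            if t == s:
                continue
            total_pairs += 1
            if t in dist:
                lengths.append(dist[t])
            else:
                unreachable += 1
    mean = sum(lengths) / len(lengths) if lengths else float("inf")
    return unreachable / total_pairs, mean


def undirected(adj: dict[str, set[str]]) -> dict[str, set[str]]:
    """Ignore link direction by adding the reverse of every link."""
    und: dict[str, set[str]] = {}
    for page, targets in adj.items():
        for target in targets:
            und.setdefault(page, set()).add(target)
            und.setdefault(target, set()).add(page)
    return und


# Hypothetical toy link graph; D has no outgoing links.
toy_links = {"A": {"B"}, "B": {"C"}, "C": {"A", "D"}, "E": {"A"}}
print(path_stats(toy_links))              # directed paths only
print(path_stats(undirected(toy_links)))  # link direction ignored
```

For this toy graph, 35% of ordered pairs have no directed path and connected pairs are about 1.92 clicks apart; ignoring link direction connects every pair and brings the mean down to 1.6 clicks, echoing the contrast between directed and undirected paths on the slide.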

10
Web information seeking strategies
  • Direct navigation:
    • Enter the URL directly into the browser.
  • Navigation within a directory:
    • Use a web portal as an entry point to the web.
  • Information seeking on the web is problematic and
    more users are turning to search engines.

11
Navigation using a search engine
  • Figure 2.5 Information seeking

12
A taxonomy of web searches
  • Informational: acquire some information about a
    topic from web pages.
  • Navigational: find a site to start navigation
    from.
  • Transactional: perform some activity mediated by
    a web site.

13
Web search versus Information Retrieval
  • The scale of web search is far beyond that of
    traditional information retrieval.
  • The web is very dynamic.
  • The web contains an enormous amount of
    duplication.
  • The quality of web pages is not uniform.
  • The range of topics on the web is open.
  • The web is globally distributed.
  • Users' typical habits are different (short
    queries, inspect only the top-10 pages).
  • The web is hypertextual.

14
Information retrieval evaluation
  • Figure 2.6 Recall versus precision
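Figure 2.6 plots recall against precision. Precision is the fraction of retrieved pages that are relevant, and recall is the fraction of relevant pages that are retrieved. Because web users typically inspect only the top-10 pages, precision at rank 10 is a natural companion measure. The sketch below assumes binary relevance judgements; the function names and document identifiers are hypothetical.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top k ranked results that are relevant.

    Divides by k even if fewer than k results were returned,
    which is one common convention.
    """
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


# Hypothetical ranked result list d1..d20 and relevance judgements.
ranked = [f"d{i}" for i in range(1, 21)]
relevant = {"d1", "d3", "d5", "d12", "d30"}
print(precision_recall(ranked, relevant))  # (0.2, 0.8)
print(precision_at_k(ranked, relevant))    # 0.3
```

Here the 20 results contain 4 of the 5 relevant pages, giving high recall (0.8) but low precision (0.2); d30 was never retrieved, so recall cannot reach 1. A user who reads only the first page of results sees 3 relevant pages out of 10.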

15
Differences between global and local search
  • Local search engines on web sites have a bad
    reputation.
  • Users often turn to a web search engine such as
    Google or Yahoo! to find information on web
    sites, rather than the local web site search
    engine.
  • Many companies do not invest in local search.
  • Content management is a problem.
  • Language may be a problem.
  • Information needs on web sites may be different.

16
Differences between search and navigation
  • Search: employing a search engine to find
    information.
  • Navigation (or surfing): employing a
    link-following strategy to find information.
  • The web encourages a combination of search,
    navigation and browsing.