1
WWW Search and Navigation
  • Mark Levene
  • SCIS, Birkbeck College
  • University of London
  • www.dcs.bbk.ac.uk/mark/

2
Talk Overview
  • Hypertext and the navigation problem
  • NavigationZones solution
  • Problems being researched
  • A Demonstration

3
Hypertext and Navigation
  • Long history
  • Bush 1945, memex trail blazing
  • Nelson 1965, Xanadu - network of documents
  • Problem of getting lost in hyperspace
  • Navigation aids
  • Bookmarks
  • History
  • Overview diagrams
  • Recommendations

4
State-of-the-Art Navigation Aids
  • Novel User-Interfaces to visualise web sites
  • Clustering (e.g. Self-Organising Maps)
  • Web data mining finding user patterns
  • Semi-automated navigation, BestTrail algorithm
    motivation to follow

5
Typical corporate search
6
A typical search scenario
  • Submit a query to a search engine
  • Is it too broad / too specific?
  • Does it capture my information needs?
  • Select a URL from the result set
  • Have I made the right choice?
  • Start manual navigation
  • Where am I? Where have I come from? Where am I going to?
  • Go back to the first step to reformulate the query

7
Content centric approach
8
Problems with standard Search
  • Page level relevance scoring
  • sensitive to query terms
  • No look ahead
  • click and discover
  • No context
  • results are totally isolated
  • No navigation support
  • Users are left on their own to find their way

9
Possible solutions (information retrieval)
  • Improve basic IR
  • Link analysis, e.g. PageRank and HITS
  • Meta data tagging
  • Keywords and taxonomies (semantic web)
  • Natural language
  • QA, sentence analysis, synonyms
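
Note on link analysis: a minimal sketch of PageRank by power iteration, to show what this kind of link analysis computes. The three-page graph, the damping factor of 0.85 and the iteration count are illustrative assumptions, not values from the talk.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph: page -> list of outlinks. Returns page -> rank (ranks sum to 1)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outlinks in graph.items():
            targets = [t for t in outlinks if t in graph]
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
            else:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a links to b and c, b links to c, c links to a.
rank = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph page c collects links from both other pages, so it ends up with the highest rank.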

10
Possible solutions (information seeking)
  • Suggestion engines
  • Link and content generation
  • Categories and directories
  • Explicit manual construction
  • Automatic classification
  • Machine learning techniques

11
Are these feasible?
  • Re-architecting corporate information
    infrastructure is extremely expensive
  • Sophisticated approaches are not always intuitive
    and are yet to be proven
  • Same problem every couple of years
  • Mergers and acquisitions

12
There is, actually, a better way!
  • Treat sequences of pages, or trails, as
    first-class citizens for search
  • Consider the topology of the area in which you
    are searching
  • Employ navigational aids

13
Context centric approach
14
The information value of a trail is higher than
the sum of its parts!
15
Our approach
  • Provide information retrieval of the highest
    quality and in addition,
  • Find out what is beyond the most relevant pages
    by exploring the area
  • Present users with precise and relevant trails
  • Provide navigation assistance within the UI

16
NavZone user interface
17
NavZone Usability Study
First Monday paper
Task: find answers to 5 types of questions
  1. Fact Finding: What are the term dates?
  2. Judgement: Is CSIS a good place to do
    research?
  3. Fact Comparison: Which train station is closest
    to the college?
  4. Judgement Comparison: Is the research in deptA
    better than that in deptB?
  5. General Navigational: How do you get to the
    checkout?

18
NavZone vs. Google and Compass
% of subjects, 4 questions correct
  • 59 Google
  • 75 Compass
  • 83 NavZone

19
Average clicks to complete task
  • 44 Google
  • 40 Compass
  • 27 NavZone
  • NavZone is bandwidth green!

20
Average time taken per task (min)
  • 18 Compass
  • 17 Google
  • 13 NavZone

Wilcoxon Test - Statistically Significant
21
The ingredients of the System
  • State-of-the-art web crawler
  • Highly efficient document Indexer
  • Competitive IR
  • Patent protected trail engine and UI

22
The main ingredients
23
The crawler
  • Pick a URL from the queue
  • Download the page
  • Parse and extract main features
  • Replace URL in queue with outlinks
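
Note on the crawler: the four steps above can be sketched as a small breadth-first crawl loop. This is an illustration under assumptions, not the system's crawler: the `fetch` callback, the in-memory `SITE` dictionary and the `max_pages` limit are hypothetical, and the only "feature" extracted here is the list of outlinks.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags while a page is parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Crawl from seed; fetch(url) returns HTML text, or None on failure."""
    queue = deque([seed])
    seen = {seed}
    graph = {}  # url -> list of absolute outlink urls
    while queue and len(graph) < max_pages:
        url = queue.popleft()        # pick a URL from the queue
        html = fetch(url)            # download the page
        if html is None:
            continue
        parser = LinkExtractor()     # parse and extract features (links only)
        parser.feed(html)
        outlinks = [urljoin(url, href) for href in parser.links]
        graph[url] = outlinks
        for link in outlinks:        # replace the URL with its unseen outlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph


# Hypothetical three-page site held in memory instead of fetched over HTTP.
SITE = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/b">B</a>',
    "http://example.org/b": '<a href="/">home</a>',
}
graph = crawl("http://example.org/", SITE.get)
```

Passing `fetch` as a parameter keeps the loop testable without network access; a real crawler would also need politeness delays and robots.txt handling, which are omitted here.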

24
The indexer
  • Compute page statistics for IR
  • Compute page navigability potential (PG)
  • Compute page authority ranking (GR)
  • Build page summary information
  • Build inverted index
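
Note on the indexer: a minimal sketch of the "page statistics" and "inverted index" steps only. The slides do not define how the navigability potential or the authority ranking are computed, so they are left out. The tokenizer and the two sample pages are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages: url -> text. Returns (inverted index, page lengths)."""
    index = defaultdict(dict)   # term -> {url: term frequency}
    lengths = {}                # url -> number of tokens on the page
    for url, text in pages.items():
        tokens = tokenize(text)
        lengths[url] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][url] = tf
    return dict(index), lengths


# Hypothetical pages, for illustration only.
PAGES = {
    "/terms": "Term dates for the autumn term",
    "/research": "Research groups and research degrees",
}
index, lengths = build_index(PAGES)
```

Looking up `index["term"]` then gives the pages containing that word together with its frequency on each, which is the raw material for page-level relevance scoring.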

25
The trail engine
  • Compute page scores for query
  • Explore graph from good starting nodes
  • Rank candidate trails
  • Build result set
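
Note on the trail engine: the steps above can be illustrated with a simple bounded exploration. This is not the patented trail engine or the BestTrail algorithm mentioned earlier; it is a sketch assuming that a trail is scored by the summed relevance of its pages, with hypothetical parameters `n_starts`, `max_len` and `top_k`.

```python
def find_trails(graph, scores, n_starts=2, max_len=3, top_k=3):
    """graph: url -> outlinks; scores: url -> query relevance score."""
    # Good starting nodes: the pages with the highest query scores.
    starts = sorted(scores, key=scores.get, reverse=True)[:n_starts]
    candidates = []

    def extend(trail):
        candidates.append(trail)
        if len(trail) == max_len:
            return
        for nxt in graph.get(trail[-1], []):
            if nxt not in trail:      # do not revisit a page within a trail
                extend(trail + [nxt])

    for start in starts:
        extend([start])

    def trail_score(trail):
        # Assumed scoring rule: total relevance of the pages on the trail.
        return sum(scores.get(url, 0.0) for url in trail)

    # Rank candidate trails and keep the best ones as the result set.
    return sorted(candidates, key=trail_score, reverse=True)[:top_k]


# Hypothetical site graph and query scores, for illustration only.
GRAPH = {
    "home": ["courses", "research"],
    "courses": ["dates"],
    "research": ["staff"],
    "dates": [],
    "staff": [],
}
SCORES = {"home": 0.1, "courses": 0.6, "dates": 0.9, "research": 0.0, "staff": 0.0}
trails = find_trails(GRAPH, SCORES)
```

With these scores the two-page trail from courses to dates ranks above either page on its own, which is the point of returning trails rather than isolated pages.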

26
Under Development
  • Alternative User-Interfaces
  • Seamless integration with relational databases
    and file systems
  • Data mining and personalisation
  • Mobile/PDA support

27
Open Problem
  • How do we make use of statistical regularities
    that are present in the web to improve search and
    navigation?
  • See Levene et al., "A stochastic model for the
    evolution of the web", Condensed Matter Archive,
    cond-mat/0110016, 2001: many distributions
    related to the web graph follow a power law
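
Note on power laws: a quantity follows a power law when its probability density is proportional to x^(-alpha) above some minimum value. The sketch below draws synthetic samples from such a distribution and recovers the exponent with the standard maximum-likelihood estimator for the continuous case. The exponent 2.1, the sample size and the seed are arbitrary choices for the demonstration, and no real web data is used.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n samples with density proportional to x**(-alpha) for x >= x_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling of the continuous power-law distribution.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_exponent(values, x_min=1.0):
    """Maximum-likelihood estimate of alpha from the values at or above x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


data = sample_power_law(20000, alpha=2.1)
alpha_hat = estimate_exponent(data)
```

On real link-count data the same estimator would need a carefully chosen `x_min`, since power-law behaviour usually holds only in the tail.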