1 WWW Search and Navigation
- Mark Levene
- SCIS, Birkbeck College
- University of London
- www.dcs.bbk.ac.uk/mark/
2 Talk Overview
- Hypertext and the navigation problem
- NavigationZones solution
- Problems being researched
- A Demonstration
3 Hypertext and Navigation
- Long history
- Bush 1945, memex trail blazing
- Nelson 1965, Xanadu - network of documents
- Problem of getting lost in hyperspace
- Navigation aids
- Bookmarks
- History
- Overview diagrams
- Recommendations
4 State-of-the-Art Navigation Aids
- Novel User-Interfaces to visualise web sites
- Clustering (e.g. Self-Organising Maps)
- Web data mining finding user patterns
- Semi-automated navigation, BestTrail algorithm (motivation to follow)
5 Typical corporate search
6 A typical search scenario
- Submit a query to a search engine
- Is it too broad / too specific?
- Does it capture my information needs?
- Select a URL from the result set
- Have I made the right choice?
- Start manual navigation
- Where am I? Where have I come from? Where am I going?
- Go back to the first step to reformulate the query
7 Content centric approach
8 Problems with standard Search
- Page level relevance scoring
- sensitive to query terms
- No look ahead
- click and discover
- No context
- results are totally isolated
- No navigation support
- Users are left on their own to find their way
9 Possible solutions (information retrieval)
- Improve basic IR
- Link analysis, e.g. PageRank and HITS
- Meta data tagging
- Keywords and taxonomies (semantic web)
- Natural language
- QA, sentence analysis, synonyms
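To make the link-analysis bullet concrete, here is a minimal sketch of PageRank computed by power iteration. The four-page `site` graph, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions, not values taken from the talk.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page shares its rank equally among its outlinks.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page site, used only for illustration.
site = {
    "home": ["about", "research"],
    "about": ["home"],
    "research": ["home", "staff"],
    "staff": ["home"],
}
scores = pagerank(site)
```

Because every other page links back to `home`, it ends up with the highest score; the scores always sum to 1.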
10 Possible solutions (information seeking)
- Suggestion engines
- Link and content generation
- Categories and directories
- Explicit manual construction
- Automatic classification
- Machine learning techniques
11 Are these feasible?
- Re-architecting corporate information infrastructure is extremely expensive
- Sophisticated approaches are not always intuitive and are yet to be proven
- Same problem every couple of years
- Mergers and acquisitions
12 There is, actually, a better way!
- Treat sequences of pages, or trails, as first-class citizens for search
- Consider the topology of the area in which you are searching
- Employ navigational aids
13 Context centric approach
14 The information value of a trail is higher than the sum of its parts!
15 Our approach
- Provide information retrieval of the highest quality and, in addition,
- Find out what is beyond the most relevant pages by exploring the area
- Present users with precise and relevant trails
- Provide navigation assistance within the UI
16 NavZone user interface
17 NavZone Usability Study
First Monday paper
Task: find answers to 5 types of questions
- Fact Finding: What are the term dates?
- Judgement: Is CSIS a good place to do research?
- Fact Comparison: Which train station is closest to the college?
- Judgement Comparison: Is the research in deptA better than that in deptB?
- General Navigational: How do you get to the
18 NavZone vs. Google and Compass
% of subjects, 4 questions correct
- 59 Google
- 75 Compass
- 83 NavZone
19 Average clicks to complete task
- 44 Google
- 40 Compass
- 27 NavZone
- NavZone is bandwidth green!
20 Average time taken per task (min)
- 18 Compass
- 17 Google
- 13 NavZone
Wilcoxon Test - Statistically Significant
21 The ingredients of the System
- State-of-the-art web crawler
- Highly efficient document Indexer
- Competitive IR
- Patent protected trail engine and UI
22 The main ingredients
23 The crawler
- Pick a URL from the queue
- Download the page
- Parse and extract main features
- Replace URL in queue with outlinks
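The four crawler steps above can be sketched as a short queue-driven loop. This is an illustration only: the in-memory `PAGES` dictionary stands in for real HTTP downloads, and the `seen` set (which stops a URL being queued twice) is an added assumption rather than something stated on the slide.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outlinks). A real crawler
# would download and parse pages over HTTP instead.
PAGES = {
    "/": ("Welcome to the college", ["/courses", "/research"]),
    "/courses": ("Term dates and courses", ["/", "/research"]),
    "/research": ("Research groups", ["/staff"]),
    "/staff": ("Staff list", []),
}


def crawl(seed, pages, limit=100):
    """Breadth-first crawl from seed; returns URL -> extracted features."""
    queue = deque([seed])
    seen = {seed}
    features = {}
    while queue and len(features) < limit:
        url = queue.popleft()            # pick a URL from the queue
        if url not in pages:             # skip links that lead nowhere
            continue
        text, outlinks = pages[url]      # "download" the page
        features[url] = {                # parse and extract main features
            "words": text.lower().split(),
            "outlinks": list(outlinks),
        }
        for link in outlinks:            # replace the URL with its outlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return features


crawled = crawl("/", PAGES)
```

Using a FIFO queue gives a breadth-first visiting order; a production crawler would typically prioritise the queue and respect politeness rules, which this sketch leaves out.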
24 The indexer
- Compute page statistics for IR
- Compute page navigability potential (PG)
- Compute page authority ranking (GR)
- Build page summary information
- Build inverted index
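As an illustration of the last indexer step, the sketch below builds a small inverted index that maps each term to the pages containing it, together with term frequencies. The whitespace tokeniser, the two sample documents and the boolean-AND `search` helper are simplifying assumptions, not a description of the system's indexer.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def search(index, query):
    """Return the doc ids that contain every query term (boolean AND)."""
    postings = [set(index.get(term, {})) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical documents, used only for illustration.
docs = {
    "/courses": "term dates and course list",
    "/research": "research groups and research staff",
}
index = build_inverted_index(docs)
```

For example, `search(index, "term dates")` returns only `/courses`, while the term frequencies stored in each posting are the kind of page statistic an IR scoring function would use.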
25 The trail engine
- Compute page scores for query
- Explore graph from good starting nodes
- Rank candidate trails
- Build result set
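The four trail-engine steps can be illustrated with a deliberately simplified sketch. It is not the BestTrail algorithm or the patented engine: the term-count `page_score`, the exhaustive expansion up to `max_len` pages and the sum-of-scores ranking are all assumptions chosen to keep the example short.

```python
def page_score(text, query):
    """Toy IR score: number of occurrences of query terms in the page text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def find_trails(pages, links, query, starts=2, max_len=3, top=3):
    """Return the top-ranked (trail, score) pairs for a query."""
    # 1. Compute page scores for the query.
    scores = {url: page_score(text, query) for url, text in pages.items()}

    def trail_score(trail):
        return sum(scores[url] for url in trail)

    # 2. Explore the graph from good starting nodes (highest non-zero scores).
    ordered = sorted(scores, key=lambda u: (-scores[u], u))
    seeds = [u for u in ordered if scores[u] > 0][:starts]
    candidates = []
    stack = [[s] for s in seeds]
    while stack:
        trail = stack.pop()
        candidates.append(trail)
        if len(trail) < max_len:
            for nxt in links.get(trail[-1], []):
                if nxt in pages and nxt not in trail:   # no repeated pages
                    stack.append(trail + [nxt])
    # 3. Rank candidate trails: higher total score first, then shorter trails.
    candidates.sort(key=lambda t: (-trail_score(t), len(t), t))
    # 4. Build the result set.
    return [(t, trail_score(t)) for t in candidates[:top]]


# Hypothetical site, used only for illustration.
pages = {
    "/": "college home",
    "/research": "research groups",
    "/staff": "research staff",
    "/courses": "courses and term dates",
}
links = {
    "/": ["/research", "/courses"],
    "/research": ["/staff", "/"],
    "/staff": ["/research"],
    "/courses": ["/"],
}
results = find_trails(pages, links, "research")
```

Here the two-page trail from `/research` to `/staff` ranks first because both pages match the query, which illustrates how a trail can carry more information than either page alone.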
26 Under Development
- Alternative User-Interfaces
- Seamless integration with relational databases and file systems
- Data mining and personalisation
- Mobile/PDA support
27 Open Problem
- How do we make use of statistical regularities that are present in the web to improve search and navigation?
- See Levene et al., "A stochastic model for the evolution of the web", Condensed Matter Archive, cond-mat/0110016, 2001
- Many distributions related to the web graph follow a power law
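For reference, the general form of a power law can be written as follows. The symbols are assumptions introduced here for illustration: $k$ stands for a measured quantity such as the number of inlinks of a page, $\gamma$ for the exponent and $c$ for a constant.

```latex
P(k) \propto k^{-\gamma}, \qquad \gamma > 0
\quad\Longleftrightarrow\quad
\log P(k) = -\gamma \log k + c
```

The second form shows why such distributions appear as straight lines with slope $-\gamma$ on a log-log plot.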