Smarter Search Engines - PowerPoint PPT Presentation

Slides: 15
Provided by: dantm9
Learn more at: http://web.cs.wpi.edu
Transcript and Presenter's Notes


1
Smarter Search Engines
  • Using Personalization to Improve Search Results
  • Eugene Cushman
  • Dan Murphy
  • George Stuart
  • Advised by Professor Mark Claypool

2
The Problem
  • There are billions of web pages on the Internet
  • They vary greatly in quality
  • Growth is Exponential
  • Search engines must adapt to keep up

3
Existing Systems
  • Google
      • Layered Architecture
      • PageRank
  • GroupLens
      • Applied to USENET
      • Different domain space
      • Uses collaborative filtering

4
Personalization
  • Qualitative rankings
      • Example: 'Good Low-Fat Dessert Recipes'
      • Example: 'Theories of dinosaur extinction'
  • Contrast with specific, factual searches
      • Example: 'The batting lineup for the Boston Red
        Sox on October 28, 1986'
  • Exploratory versus narrow-band searches

5
Collaborative Filtering
  • Uses aggregate data to predict user preference
      • User A likes Foo
      • User B trusts User A's preferences
      • User B can be predicted to prefer Foo
      • (extremely simplified)
  • Algorithms
      • Pearson correlation coefficient
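
The prediction idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Foible's actual code: the slides name the Pearson correlation coefficient but do not say how predictions are formed, so the mean-offset weighting in `predict` is an assumption (a common neighbourhood-based scheme), and the users, pages, and ratings are made up.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two users over the items both have rated."""
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def predict(ratings, user, item):
    """Predict a rating as the user's mean plus correlation-weighted
    deviations of other users from their own means (assumed scheme)."""
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = pearson(mine, theirs)
        their_mean = sum(theirs.values()) / len(theirs)
        num += weight * (theirs[item] - their_mean)
        den += abs(weight)
    return my_mean if den == 0 else my_mean + num / den

# Hypothetical ratings on a 1-5 scale; "foo" is the item User B has not seen.
ratings = {
    "A": {"p1": 5, "p2": 1, "p3": 4, "foo": 5},
    "B": {"p1": 4, "p2": 2, "p3": 5},
    "C": {"p1": 1, "p2": 5, "p3": 2, "foo": 1},
}
prediction_for_b = predict(ratings, "B", "foo")
```

Here User B correlates positively with User A and negatively with User C, so A's liking of Foo and C's dislike of it both push B's predicted rating upward.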

6
Foible: the best of both worlds
  • Foible integrates disparate technologies to
    provide a powerful web-searching experience
  • Search Engine Indexing
  • Collaborative Filtering
  • Results in demonstrable improvement in search
    results

7
Foible Architecture
  • Spider
  • Analyzer
  • Cache
  • Collaborative Engine
  • Search Engine
  • Web Interface

8
Web Spider
  • Parallelized depth-first crawl of the web
  • Creates lists of nodes by parsing HTML and
    looking for links
  • Starts with a link-heavy seed node
      • Custom seed node incorporating search results on
        dinosaurs from Yahoo, Google, and others
  • Foible Statistics
      • Over 27,000 web pages crawled
      • In excess of 500 MB of web data cached
      • Total database size of 1 GB
      • 7.269 million rows in the word-frequency table
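
A depth-first crawl that builds its node list by parsing links out of HTML might look roughly like the sketch below. It is sequential rather than parallelized, and it reads pages from an in-memory dictionary instead of the network so that it can run anywhere. The `crawl` function, the `fetch` callback, and the `example.test` URLs are all hypothetical names chosen for illustration, not part of Foible.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed, fetch, max_pages=100):
    """Depth-first crawl from seed; fetch(url) returns HTML or None."""
    visited, order = set(), []
    stack = [seed]
    while stack and len(order) < max_pages:
        url = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        parser = LinkParser(url)
        parser.feed(html)
        # Push in reverse so the first link on a page is explored first.
        stack.extend(reversed(parser.links))
    return order

# A tiny stand-in for the web: the seed page is the link-heavy one.
SITE = {
    "http://example.test/seed": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/c">C</a>',
    "http://example.test/b": '<a href="/seed">back to seed</a>',
    "http://example.test/c": "<p>No links here.</p>",
}
crawl_order = crawl("http://example.test/seed", SITE.get)
```

With this toy site the crawler follows the seed's first link all the way down (seed, a, c) before returning to the second link (b), which is what distinguishes depth-first from breadth-first order.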

9
Analyzer
  • Parses HTML to derive descriptive attributes of
    each web page
      • Document Size, Number of Sentences
      • Reading Level (Fog, Flesch-Kincaid)
      • Number of Images
      • Content-to-HTML ratio
      • Number of Links
  • Precomputes word-frequency tables
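
As a rough illustration of the attributes listed above, the following sketch computes them for a single page. It uses the standard Flesch-Kincaid grade level and Gunning fog formulas, but the syllable counter is a crude vowel-group heuristic, the sentence splitter only looks at ".", "!" and "?", and text inside script or style tags is not filtered out. The `analyze` function and the sample page are assumptions made for the example, not the Foible analyzer itself.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class PageStats(HTMLParser):
    """Gathers visible text plus image and link counts from one page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.images = 0
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag == "a":
            self.links += 1

    def handle_data(self, data):
        self.text_parts.append(data)

def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def analyze(html):
    stats = PageStats()
    stats.feed(html)
    text = " ".join(stats.text_parts).strip()
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(1, len(words))
    n_sent = max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return {
        "size_bytes": len(html.encode("utf-8")),
        "sentences": len(sentences),
        "words": len(words),
        "images": stats.images,
        "links": stats.links,
        "content_ratio": len(text) / max(1, len(html)),
        # Flesch-Kincaid grade level
        "flesch_kincaid": 0.39 * n_words / n_sent
                          + 11.8 * syllables / n_words - 15.59,
        # Gunning fog index
        "fog": 0.4 * (n_words / n_sent + 100 * complex_words / n_words),
        # Word-frequency table for this page
        "word_freq": Counter(w.lower() for w in words),
    }

SAMPLE_HTML = (
    "<html><body><p>Dinosaurs were big. They are extinct now.</p>"
    '<img src="t.png"><a href="/more">More</a></body></html>'
)
page = analyze(SAMPLE_HTML)
```

Precomputing `word_freq` per page at analysis time means a query only needs table lookups later, which is presumably why the slides mention a large word-frequency table.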

10
Collaborative Searching
  • Three components of search algorithm
      • Word Frequency
      • Profile Correlation
      • Recommender System
  • Computes ranking of all pages
  • Returns results to user
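
The slides do not say how the three components are combined into one ranking. Purely as an illustration, the sketch below assumes each component yields a per-page score in [0, 1] and merges them with a weighted sum. The function name `rank_pages`, the weights, and the scores are all hypothetical.

```python
def rank_pages(word_freq, profile, recommender, weights=(0.5, 0.25, 0.25)):
    """Rank page ids by a weighted sum of three component scores.

    Each argument maps page id -> score in [0, 1]; a page missing from
    a component is treated as scoring 0 there. The weighted sum and the
    default weights are assumptions, not Foible's documented method.
    """
    w_wf, w_pc, w_rs = weights
    pages = set(word_freq) | set(profile) | set(recommender)
    combined = {
        p: w_wf * word_freq.get(p, 0.0)
           + w_pc * profile.get(p, 0.0)
           + w_rs * recommender.get(p, 0.0)
        for p in pages
    }
    # Highest combined score first; ties broken by page id for stability.
    return sorted(combined, key=lambda p: (-combined[p], p))

# Made-up scores for three pages.
word_freq_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.2}
profile_scores = {"p1": 0.1, "p2": 0.9}
recommender_scores = {"p2": 0.7, "p3": 0.9}

collaborative_ranking = rank_pages(
    word_freq_scores, profile_scores, recommender_scores
)
word_freq_only_ranking = rank_pages(
    word_freq_scores, {}, {}, weights=(1.0, 0.0, 0.0)
)
```

In this toy data, word frequency alone puts p1 first, while the combined score promotes p2 because the two collaborative components favour it.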

11
User Study
  • Approximately 50 users
      • 20 completed the study in its entirety
  • Consisted of 5 searches
      • Predefined broad topics
  • Users provided explicit feedback
  • Search results presented in two-column format
      • Enhanced: collaborative results
      • Control: word frequency only

12
User Study Data 1

13
User Study Data 2

14
Results and Conclusion
  • Users unanimously prefer collaborative ratings to
    non-collaborative
  • Smarter searches produced pages ranked in better
    order according to study
  • Introducing collaborative filtering into
    traditional search engine technology results in
    better search results!