Computing Page Rank in a Distributed Internet Search System - PowerPoint PPT Presentation

1
Computing Page Rank in a Distributed Internet Search System
  • Anupam Jain
  • CS 586

2
A Web Search Engine
WWW
3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
Behind Google
  • Web Crawlers
  • Central Servers
  • Parsing
  • Indexing

7
Page Ranking
Query: USC Bookstore
Results before ranking:
  • My Personal Homepage
  • USC Football Tickets Page
  • USC Bookstore Homepage
  • Trojan Vision Web Site
8
Page Ranking
Query: USC Bookstore
Results after ranking:
  • USC Bookstore Homepage
  • USC Football Tickets Page
  • My Personal Homepage
  • USC Trojan Vision Website
9
What's Wrong
  • Scalability
  • Slow Updates
  • The Hidden Web
  • Robot Exclusion Rule
  • High Maintenance (enormous resource requirements)
  • Large Data Cache
  • Single Points of Failure
  • Network Problems

10
  • Distributed Internet Search Engine Framework

11
(No Transcript)
12
How It Works
  • Query Routing
  • Query Execution (local)
  • Page Ranking
  • Server Ranking
  • Result Fusion

13
Page Rank Review
  • Random Surfer Model
  • T1, …, Tn: the pages that link to page A
  • C(Ti): number of outgoing links of page Ti
  • d: damping factor (0.85)
  • PR(A): PageRank of page A
  • $PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$
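The formula above can be evaluated by simple power iteration. The sketch below is a minimal illustration under stated assumptions, not the presenter's implementation: the hypothetical `pagerank` function takes a dictionary mapping each page to its out-links, every page starts at 1.0, the loop runs a fixed number of iterations instead of testing for convergence, and pages with no out-links get no special treatment.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)) over the pages Ti linking to A.

    links maps each page to the list of pages it links to (its out-links).
    Assumption: fixed iteration count; dangling pages are not handled specially.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # C(Ti): number of outgoing links of each page.
    out_count = {p: len(links.get(p, [])) for p in pages}
    pr = {p: 1.0 for p in pages}  # initial guess for every page
    for _ in range(iterations):
        incoming = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                # Each page passes PR(Ti) / C(Ti) to every page it links to.
                incoming[dst] += pr[src] / out_count[src]
        pr = {p: (1 - d) + d * incoming[p] for p in pages}
    return pr
```

For a three-page graph where A links to B and C, B links to C, and C links to A, `pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})` ranks C highest, then A, then B, and the three ranks sum to 3, which is what this non-normalized form gives when every page has at least one out-link.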

14
Other Factors
  • Occurrences of a term on a page
  • Position of the keywords
  • Font size of the terms

15
Server Linkage Structure
16
Server Linkage Structure
  • Experimental data: 1,049,271 pages
  • Intra-server hyperlinks: 865,765
  • Inter-server hyperlinks: 255,856
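Counting intra-server and inter-server hyperlinks requires deciding which server each end of a link belongs to. A minimal sketch, assuming a server is identified by the hostname in the URL (the slides do not say how servers are delimited); the function name `split_links` is hypothetical.

```python
from urllib.parse import urlsplit


def split_links(hyperlinks):
    """Partition (source_url, target_url) pairs into intra-server and inter-server links.

    Assumption: two URLs are on the same server when their hostnames match.
    urlsplit(...).hostname is lower-cased, so A.edu and a.edu compare equal.
    """
    intra, inter = [], []
    for src, dst in hyperlinks:
        if urlsplit(src).hostname == urlsplit(dst).hostname:
            intra.append((src, dst))
        else:
            inter.append((src, dst))
    return intra, inter
```

Comparing hostnames rather than full origins means that http and https pages on the same host count as one server; a deployment that splits servers differently would only need to change the comparison.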

17
Outline of the Algorithm
  • Construct the Web link graph and compute each server's local PageRank vector
  • Exchange inter-server hyperlink information
  • Compute the server rank vector
  • Merge the results to generate the final link list

18
Evaluation Metrics
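  • Kendall's Distance
  • d: number of pages in the page set (the domain)
  • p1, p2: the two page rankings being compared
  • K(p1, p2): number of page pairs that p1 and p2 order differently
  • $K_{dist}(p_1, p_2) = \frac{K(p_1, p_2)}{d(d-1)/2}$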
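The normalized Kendall distance can be computed by counting discordant pairs directly. This is a minimal O(d²) sketch; the function name `kendall_distance` is hypothetical, both rankings are assumed to cover the same pages and to use the same convention (for example, position 1 = best), and tied pairs are not counted as discordant. Note that d here is the size of the page set, not the damping factor from the PageRank formula.

```python
from itertools import combinations


def kendall_distance(rank1, rank2):
    """Normalized Kendall distance K(p1, p2) / (d(d-1)/2).

    rank1 and rank2 map each page to its position in the respective ranking.
    """
    pages = list(rank1)
    d = len(pages)  # size of the page set (not the damping factor)
    if d < 2:
        return 0.0
    discordant = 0
    for a, b in combinations(pages, 2):
        # A pair is discordant when the two rankings order it in opposite directions.
        if (rank1[a] - rank1[b]) * (rank2[a] - rank2[b]) < 0:
            discordant += 1
    return discordant / (d * (d - 1) / 2)
```

A distance of 0 means the rankings agree on every pair and 1 means one is the exact reverse of the other; swapping a single adjacent pair in a three-page ranking gives 1/3.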

19
Local Page Rank
  • Number of referrals (in-links)
  • Damping factor: models users jumping to other pages
  • Out-links
  • Random Surfer Model
  • Kendall's Distance

20
Compute Local Page Ranks
  • LPR-2: servers exchange counts of inter-server links
  • LPR-1: remove all inter-server links (shown in pink in the slide figure), then apply the PageRank algorithm

Source: Distributed Search System, David DeWitt, UW-Madison
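LPR-1 can be sketched by filtering the link graph down to links whose two ends are on the same server and then running the usual PageRank iteration. Assumptions: servers are identified by hostname, the hypothetical `local_pagerank_lpr1` receives a dictionary of out-links, and C(Ti) is recomputed after the inter-server links are dropped. LPR-2 is not sketched because the slides do not give its exact update rule.

```python
from urllib.parse import urlsplit


def local_pagerank_lpr1(server, links, d=0.85, iterations=100):
    """LPR-1 sketch: drop inter-server links, then run PageRank on one server's pages.

    links maps each page URL to the URLs it links to.
    Assumption: a page belongs to `server` when its hostname equals `server`.
    """
    # Keep only pages on this server and only their links to pages on this server.
    local = {}
    for src, targets in links.items():
        if urlsplit(src).hostname != server:
            continue
        local[src] = [t for t in targets if urlsplit(t).hostname == server]
    pages = set(local)
    for targets in local.values():
        pages.update(targets)
    # C(Ti) counts only the remaining intra-server out-links.
    out_count = {p: len(local.get(p, [])) for p in pages}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        incoming = {p: 0.0 for p in pages}
        for src, targets in local.items():
            for dst in targets:
                incoming[dst] += pr[src] / out_count[src]
        pr = {p: (1 - d) + d * incoming[p] for p in pages}
    return pr
```

Because every server runs this on its own pages only, no link data crosses the network at this stage; the price is that pages whose only out-links point to other servers become dangling in the local graph.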
21
Server Rank
  • Relative importance of servers
  • Inter-server links
  • Kendall's distance for server ranks
  • Page rank and hyperlink exchange
  • Local ranks ↔ global ranks
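One plausible reading of the server-rank step is PageRank run over a graph whose nodes are servers and whose edges come from inter-server links. The sketch below assumes that reading; it is not confirmed by the slides. Servers are identified by hostname, each inter-server hyperlink adds weight 1 to the edge between its two servers, intra-server links are ignored, and the name `server_rank` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def server_rank(hyperlinks, d=0.85, iterations=100):
    """ServerRank sketch: weighted PageRank over the server-level link graph.

    hyperlinks is a list of (source_url, target_url) pairs.
    """
    weight = defaultdict(lambda: defaultdict(int))  # weight[s][t] = links from s to t
    servers = set()
    for src, dst in hyperlinks:
        s, t = urlsplit(src).hostname, urlsplit(dst).hostname
        servers.update((s, t))
        if s != t:  # only inter-server links contribute edges
            weight[s][t] += 1
    out_total = {s: sum(weight[s].values()) for s in servers}
    rank = {s: 1.0 for s in servers}
    for _ in range(iterations):
        incoming = {s: 0.0 for s in servers}
        for s, targets in weight.items():
            for t, w in targets.items():
                # A server passes its rank in proportion to its share of outgoing links.
                incoming[t] += rank[s] * w / out_total[s]
        rank = {s: (1 - d) + d * incoming[s] for s in servers}
    return rank
```

Weighting edges by link count lets a server that many pages point to gain more rank than one that receives a single link, at the cost of letting link-heavy servers dominate; collapsing parallel links to weight 1 is the obvious alternative.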

22
Result Fusion
  • Local page ranks weighted by server ranks
  • Sorted list of all weighted local page ranks
  • Global ranks
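Result fusion can be sketched as weighting every local page rank by its server's rank and sorting the merged list. The slides do not specify the weighting, so this sketch assumes a simple product; the function name `fuse_results` and the nested-dictionary input layout are hypothetical.

```python
def fuse_results(local_ranks, server_ranks):
    """Result-fusion sketch: server-weighted local page ranks, sorted best first.

    local_ranks:  {server: {url: local_page_rank}}
    server_ranks: {server: server_rank}
    Assumption: global score = server rank * local page rank.
    """
    merged = []
    for server, pages in local_ranks.items():
        weight = server_ranks.get(server, 0.0)  # unknown servers get zero weight
        for url, lpr in pages.items():
            merged.append((url, weight * lpr))
    # Highest approximate global rank first.
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged
```

A product keeps the local ordering within each server intact while letting a page on a highly ranked server outrank a locally stronger page on a weaker server.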

23
Query Evaluation
  • Meta Terms
  • Subject Terms
  • Positions of terms
  • IR Score
  • Result: IR score combined with page ranks
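A toy version of query evaluation might score term occurrences, favor earlier positions and meta terms, and then combine the IR score with the page rank. Every weighting here is a hypothetical choice made for illustration: the 1/(1 + position) decay, the `meta_weight` of 2.0, treating subject terms as the page body's words, and combining the two scores by multiplication. The function names `ir_score` and `final_score` are also hypothetical.

```python
def ir_score(query_terms, body_terms, meta_terms, meta_weight=2.0):
    """Toy IR score: query-term occurrences, earlier positions and meta terms weigh more.

    Assumptions: body_terms are the page's subject terms in order of appearance;
    meta_weight and the 1 / (1 + position) decay are illustrative only.
    """
    query = {t.lower() for t in query_terms}
    score = 0.0
    for position, term in enumerate(body_terms):
        if term.lower() in query:
            score += 1.0 / (1 + position)  # earlier occurrences count more
    # Each matching meta term adds a fixed bonus.
    score += meta_weight * sum(1 for t in meta_terms if t.lower() in query)
    return score


def final_score(query_terms, body_terms, meta_terms, page_rank):
    # Assumption: the result score is the IR score multiplied by the page rank.
    return ir_score(query_terms, body_terms, meta_terms) * page_rank
```

For the query "USC Bookstore" against a page whose body starts "usc bookstore hours" and whose meta terms include "bookstore", the IR score is 1 + 1/2 + 2 = 3.5 under these assumed weights.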

24
Looking Ahead
  • Distributed Environment
  • Enhanced flexibility
  • Prevents overwhelming load on any single server
  • Smaller cache
  • Up-to-date indexes
  • Cohesive approach