Title: Algorithms for Large Data Sets
1. Algorithms for Large Data Sets
Lecture 4: April 9, 2006
http://www.ee.technion.ac.il/courses/049011
2. Crash Course in Algebra and Markov Chains
3. Ranking Algorithms
4. PageRank, Attempt 1
- Additional conditions:
- r is non-negative: r ≥ 0
- r is normalized: ||r||1 = 1
- B: normalized adjacency matrix
- Then: r·B = r
- That is, r is a non-negative normalized left eigenvector of B with eigenvalue 1
5. PageRank, Attempt 1
- A solution exists only if B has 1 as an eigenvalue
- Problem: B may not have 1 as an eigenvalue, because some of its rows are all 0
- Example: a graph containing a sink (a page with no outlinks)
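The sink problem can be seen concretely. Below is a minimal sketch in pure Python with a hypothetical 3-node graph (not from the slides): page 2 has no outlinks, so its row in B is all zeros, and one step of r → r·B loses rank mass.

```python
# Hypothetical 3-node graph: 0 -> 1, 0 -> 2, 1 -> 2; page 2 is a sink.
def normalized_adjacency(links, n):
    """B[p][q] = 1/outdeg(p) if p links to q, else 0."""
    B = [[0.0] * n for _ in range(n)]
    for p, q in links:
        B[p][q] = 1.0  # mark the link
    for p in range(n):
        deg = sum(B[p])
        if deg > 0:
            B[p] = [x / deg for x in B[p]]  # normalize the row
    return B

B = normalized_adjacency([(0, 1), (0, 2), (1, 2)], 3)

# The sink's row is all zeros, so B is not row-stochastic:
print(B[2])  # [0.0, 0.0, 0.0]

# One step of r -> r·B leaks the mass sitting on the sink:
r = [1 / 3, 1 / 3, 1 / 3]
r_next = [sum(r[p] * B[p][q] for p in range(3)) for q in range(3)]
print(sum(r_next))  # total mass < 1, so eigenvalue 1 is lost
```

Because total mass shrinks at every step, no normalized fixed point r = r·B can exist for this graph.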
6. PageRank, Attempt 2
- ℓ: normalization constant
- Then: ℓ·r·B = r
- That is, r is a non-negative normalized left eigenvector of B with eigenvalue 1/ℓ
7. PageRank, Attempt 2
- Any nonzero eigenvalue λ of B may give a solution:
- ℓ = 1/λ
- r: any non-negative normalized left eigenvector of B with eigenvalue λ
- Which solution to pick?
- Pick a principal eigenvector (i.e., one corresponding to the maximal λ)
- How to find a solution?
- Power iterations
8. PageRank, Attempt 2
- Problem 1: The maximal eigenvalue may have multiplicity > 1
- Several possible solutions; happens, for example, when the graph is disconnected
- Problem 2: Rank accumulates at sinks
- Only sinks, or nodes from which no sink is reachable, can have nonzero rank mass
9. PageRank, Final Definition
- e: rank source vector
- Standard setting: e(p) = α/n for all p (α < 1)
- 1: the all-1s vector
- Then: ℓ·r·(B + 1eᵀ) = r
- Note that r·(B + 1eᵀ) = r·B + eᵀ, since r·1 = ||r||1 = 1
- That is, r is a non-negative normalized left eigenvector of (B + 1eᵀ) with eigenvalue 1/ℓ
10. PageRank, Final Definition
- Any nonzero eigenvalue of (B + 1eᵀ) may give a solution
- Pick r to be a principal left eigenvector of (B + 1eᵀ)
- Will show:
- The principal eigenvalue has multiplicity 1, for any graph
- There exists a non-negative left eigenvector
- Hence, PageRank always exists and is uniquely defined
- Due to the rank source vector, rank no longer accumulates at sinks
11. An Alternative View of PageRank: The Random Surfer Model
- When visiting a page p, a random surfer:
- With probability 1 - d, selects a random outlink p → q and goes to visit q (focused browsing)
- With probability d, jumps to a random web page q (loss of interest)
- If p has no outlinks, assume it has a self loop
- P: probability transition matrix
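The surfer process can be simulated directly; in the limit, visit frequencies approximate the stationary distribution. A minimal sketch in pure Python, where the 4-node graph, d = 0.15, the seed, and the step count are all hypothetical choices:

```python
import random

random.seed(0)
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # hypothetical web graph
n, d = 4, 0.15
visits = [0] * n

page = 0
steps = 200_000
for _ in range(steps):
    visits[page] += 1
    if random.random() < d:
        page = random.randrange(n)            # loss of interest: random jump
    elif links[page]:
        page = random.choice(links[page])     # focused browsing: random outlink
    # else: a sink with a self loop -> stay on the same page

freq = [v / steps for v in visits]
print(freq)  # visit frequencies approximate the PageRank vector r
```

Page 2, which collects the most inlinks, ends up with the highest visit frequency.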
12. PageRank = Random Surfer Model
- Suppose the rank source vector is uniform: e = (α/n)·1 with α = d/(1 - d)
- Then: P = (1 - d)·(B + 1eᵀ)
- Since P is a positive scalar multiple of (B + 1eᵀ), the two matrices have the same left eigenvectors
- Therefore, r is a principal left eigenvector of (B + 1eᵀ) if and only if it is a principal left eigenvector of P
13. PageRank & Markov Chains
- The PageRank vector is the normalized principal left eigenvector of (B + 1eᵀ)
- Hence, the PageRank vector is also a principal left eigenvector of P
- Conclusion: PageRank is the unique stationary distribution of the random surfer Markov Chain
- PageRank(p) = r(p) = probability of the random surfer visiting page p, in the limit
- Note: the random jump guarantees the Markov Chain is ergodic
14. PageRank Computation
- Power iterations: repeatedly apply r ← r·P, keeping r normalized
- In practice, about 50 iterations suffice
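The computation can be sketched in pure Python. The example graph is hypothetical, and d = 0.15 and 50 iterations follow the standard settings mentioned in the slides:

```python
def pagerank(links, n, d=0.15, iters=50):
    """Power iteration on the random-surfer transition matrix P.

    links[p] = list of pages p links to; sinks get a self loop."""
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [d / n] * n                    # random-jump mass, spread uniformly
        for p in range(n):
            out = links.get(p, [])
            if out:
                share = (1 - d) * r[p] / len(out)
                for q in out:                # focused-browsing mass
                    nxt[q] += share
            else:
                nxt[p] += (1 - d) * r[p]     # sink: self loop keeps the mass
        r = nxt
    return r

r = pagerank({0: [1, 2], 1: [2], 2: [0], 3: [2]}, 4)
print(r)  # sums to 1; page 2, with the most inlinks, ranks highest
```

Each iteration preserves total mass (d from the jump term plus (1 - d) redistributed along links), so no renormalization step is needed here.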
15. HITS: Hubs and Authorities [Kleinberg, 1997]
- HITS = Hyperlink Induced Topic Search
- Main principle: every page p is associated with two scores:
- Authority score: how authoritative a page is about the query's topic
- Ex: query "IR"; authorities: scientific IR papers
- Ex: query "automobile manufacturers"; authorities: Mazda, Toyota, and GM web sites
- Hub score: how good the page is as a resource list about the query's topic
- Ex: query "IR"; hubs: surveys and books about IR
- Ex: query "automobile manufacturers"; hubs: KBB, car link lists
16. Mutual Reinforcement
- HITS principles:
- p is a good authority if it is linked to by many good hubs
- p is a good hub if it points to many good authorities
17. HITS: Algebraic Form
- a: authority vector
- h: hub vector
- A: adjacency matrix
- Mutual reinforcement: a ∝ Aᵀh and h ∝ A·a
- Hence, a is a principal eigenvector of AᵀA
- And h is a principal eigenvector of AAᵀ
- Need to deal with the same issues as in PageRank
18. Co-Citation and Bibliographic Coupling
- AᵀA: co-citation matrix
- (AᵀA)p,q = # of pages that link to both p and q
- Thus, authority scores propagate through co-citation
- AAᵀ: bibliographic coupling matrix
- (AAᵀ)p,q = # of pages that both p and q link to
- Thus, hub scores propagate through bibliographic coupling
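These counting identities are easy to verify on a toy graph. A minimal sketch in pure Python, with a hypothetical 4-page graph in which pages 0 and 1 both link to pages 2 and 3:

```python
def matmul(X, Y):
    """Plain matrix product of lists-of-lists."""
    n, m, k = len(X), len(Y[0]), len(Y)
    return [[sum(X[i][t] * Y[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[0, 0, 1, 1],   # page 0 links to 2 and 3
     [0, 0, 1, 1],   # page 1 links to 2 and 3
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

cocitation = matmul(transpose(A), A)  # (A^T A)[p][q] = # pages linking to both p and q
coupling = matmul(A, transpose(A))    # (A A^T)[p][q] = # pages both p and q link to

print(cocitation[2][3])  # 2: pages 0 and 1 each link to both 2 and 3
print(coupling[0][1])    # 2: pages 0 and 1 both link to 2 and 3
```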
19. HITS Computation
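The mutual-reinforcement iterations can be sketched in pure Python; the graph and the iteration count are hypothetical choices, and L1 normalization stands in for whichever norm the original slides used:

```python
def hits(links, n, iters=50):
    """links[p] = pages p points to; returns (authority, hub) vectors."""
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iters):
        # authority score of q: sum of hub scores of pages linking to q
        a_new = [0.0] * n
        for p, outs in links.items():
            for q in outs:
                a_new[q] += h[p]
        # hub score of p: sum of authority scores of pages p points to
        h_new = [sum(a_new[q] for q in links.get(p, [])) for p in range(n)]
        # normalize to unit L1 norm to keep the values bounded
        sa, sh = sum(a_new), sum(h_new)
        a = [x / sa for x in a_new] if sa else a_new
        h = [x / sh for x in h_new] if sh else h_new
    return a, h

a, h = hits({0: [2, 3], 1: [2, 3], 2: [], 3: []}, 4)
# pages 2 and 3 collect the authority mass; pages 0 and 1 are the hubs
```

These updates are exactly power iterations on AᵀA (for a) and AAᵀ (for h), interleaved.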
20. Principal Eigenvector Computation
- E: an n × n matrix
- λ1 > λ2 > … > λn: eigenvalues of E
- v1, …, vn: corresponding eigenvectors
- Eigenvectors are linearly independent
- Input:
- The matrix E
- The principal eigenvalue λ1
- A unit vector u, which is not orthogonal to v1
- Goal: compute v1
21. The Power Method
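A minimal sketch of the method in pure Python: iterate w ← w·E from the start vector u, renormalizing each step to avoid overflow. The example matrix and the 100-iteration budget are hypothetical:

```python
def power_method(E, u, iters=100):
    """Iterate w <- w·E from start vector u.

    Converges (up to scaling) to the principal eigenvector v1 when
    |lambda1| > |lambda2| and u is not orthogonal to v1."""
    n = len(E)
    w = list(u)
    for _ in range(iters):
        w = [sum(w[i] * E[i][j] for i in range(n)) for j in range(n)]
        norm = max(abs(x) for x in w)   # renormalize to avoid overflow
        w = [x / norm for x in w]
    return w

# Diagonal example: eigenvalues 3 and 1, principal eigenvector (1, 0)
E = [[3.0, 0.0],
     [0.0, 1.0]]
w = power_method(E, [1.0, 1.0])
print(w)  # close to [1.0, 0.0]
```

On this diagonal matrix the second component shrinks by a factor of λ2/λ1 = 1/3 per iteration, which is exactly the convergence rate stated on the next slide.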
22. Why Does It Work?
- Theorem: as t → ∞, wt/λ1ᵗ → c·v1 (c is a constant)
- Convergence rate: proportional to (λ2/λ1)ᵗ
- The larger the spectral gap λ1 - λ2, the faster the convergence
23. End of Lecture 4