PageRank Algorithm (Presentation Transcript)
1
PageRank Algorithm
  • Directed graphs, probability, and Markov chains

2
(No Transcript)
3
N = 0
4
N = 1
5
N = 2
6
N = 3
7
N = 4
8
N = 5
9
N = 6
10
N = 7
11
N = 8
12
Ranking: 6, 5, 4, 1, 3, 8, 7, 2
13
Goal
  • Our goal is to find a probability distribution of
    rankings that accurately represents the network
    topology.

14
Link Structure

A
B
C
15
Adjacency Matrix
        A   B   C
    A
    B
    C

16
Link Structure

A
B
C
17
Adjacency Matrix
        A   B   C
    A   0   1   1
    B
    C

18
Link Structure

A
B
C
19
Adjacency Matrix
        A   B   C
    A   0   1   1
    B   0   0   1
    C

20
Link Structure

A
B
C
21
Adjacency Matrix
        A   B   C
    A   0   1   1
    B   0   0   1
    C   1   0   0

22
Binary to Probabilistic
  • We need to know not only whether a page links to
    another page, but also the relative importance of
    that particular link.
  • So, we transform (or normalize) the binary
    adjacency matrix into a probabilistic, or
    stochastic, matrix.

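As a concrete illustration of this normalization step, here is a minimal Python/NumPy sketch for the three-page example (the variable names are ours, not from the slides):

    import numpy as np

    # Binary adjacency matrix from the slides: row = source page, column = target.
    A = np.array([[0, 1, 1],   # A links to B and C
                  [0, 0, 1],   # B links to C
                  [1, 0, 0]],  # C links to A
                 dtype=float)

    # Divide each row by its number of outlinks (no dead ends in this example).
    P = A / A.sum(axis=1, keepdims=True)
    print(P)  # [[0. 0.5 0.5] [0. 0. 1.] [1. 0. 0.]]
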
23
Normalization
        A   B   C
    A   0   1   1   (2 links)
    B   0   0   1
    C   1   0   0

24
Normalization
        A   B   C
    A   0   ½   ½   (2 links)
    B   0   0   1
    C   1   0   0

25
Normalization
        A   B   C
    A   0   ½   ½
    B   0   0   1   (1 link)
    C   1   0   0

26
Normalization
        A   B   C
    A   0   ½   ½
    B   0   0   1
    C   1   0   0

27
Stochastic Matrix
        A   B   C
    A   0   ½   ½
    B   0   0   1
    C   1   0   0

28
The Stationary State
  • R(A) = R(C)
  • R(B) = ½ R(A)
  • R(C) = ½ R(A) + R(B)
  • R(A) + R(B) + R(C) = 1

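A quick worked check: substituting R(C) = R(A) and R(B) = ½ R(A) into the last equation gives 2.5 R(A) = 1, so R(A) = 0.4, R(B) = 0.2, R(C) = 0.4. The same answer falls out of a linear solve (the equation R(C) = ½ R(A) + R(B) is redundant, so we drop it); a NumPy sketch:

    import numpy as np

    # R(A) - R(C) = 0;  -0.5 R(A) + R(B) = 0;  R(A) + R(B) + R(C) = 1
    M = np.array([[ 1.0, 0.0, -1.0],
                  [-0.5, 1.0,  0.0],
                  [ 1.0, 1.0,  1.0]])
    b = np.array([0.0, 0.0, 1.0])
    print(np.linalg.solve(M, b))  # [0.4 0.2 0.4]
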
29
Markov
[Diagram: the three-page graph with edges labeled by the rank flowing along them: ½R(A) on A→B, ½R(A) on A→C, R(B) on B→C, R(C) on C→A.]
30
  • R(A) = R(C)
  • R(B) = ½ R(A)
  • R(C) = ½ R(A) + R(B)

[Diagram: the same three-page graph, edges labeled ½R(A) (A→B), ½R(A) (A→C), R(B) (B→C), R(C) (C→A).]
31
Markov Chain
  • We can think of this probability matrix
    intuitively as describing the path of a random
    surfer.
  • Then, we measure importance, or PageRank R, by
    the probability that the random surfer ends up at
    a given page at any particular point in time.

32
Stationary State
33
Stationary State
[Diagram: the stationary state. Node values: A = 0.4, B = 0.2, C = 0.4. Edge flows: 0.2 along A→B, 0.2 along A→C, 0.2 along B→C, 0.4 along C→A.]
34
The Setup
  • Let R(u) be the rank of page u.
  • n_u = the number of outlinks from page u.
  • So, summing over the pages v linked to u:
  • R(u) = Σ_{v → u} R(v) / n_v

35
The Problem
  • Let R(u) be the rank of page u.
  • n_u = the number of outlinks from page u.
  • So, summing over the pages v linked to u:
  • R(u) = Σ_{v → u} R(v) / n_v
  • Solving N equations in N unknowns is a very
    different problem when N is HUGE (O(10^6))!

36
Power Method
  • We want R^T = R^T P.
  • Notice R^T is the (left) eigenvector associated
    with the eigenvalue 1.
  • It is well known that the dominant eigenvalue of
    a stochastic matrix is 1.

37
Power Method
  • So we iterate:
  • x_{m+1}^T = x_m^T P
  • Here, x is any starting vector, and m is the
    number of iterations.
  • This method converges to the dominant (left)
    eigenvector of P, regardless of x_0.

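A minimal power-iteration sketch in Python/NumPy for the three-page example (the tolerance, iteration cap, and names are illustrative choices, not from the slides):

    import numpy as np

    def power_method(P, tol=1e-10, max_iters=1000):
        """Iterate x_{m+1}^T = x_m^T P until the vector stops changing."""
        n = P.shape[0]
        x = np.full(n, 1.0 / n)        # any starting distribution works here
        for _ in range(max_iters):
            x_next = x @ P             # one left-multiplication by P
            if np.abs(x_next - x).sum() < tol:
                return x_next
            x = x_next
        return x

    P = np.array([[0, 0.5, 0.5],
                  [0, 0,   1  ],
                  [1, 0,   0  ]])
    print(power_method(P))             # approximately [0.4, 0.2, 0.4]
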
38
Power Method
  • If |λ_i| < 1 for all the other eigenvalues λ_i, then
  • x_m^T P → R^T, where R^T P = R^T.
  • This is the stationary state of the Markov chain.
  • In our case: PageRank!

39
What's Going On Here?
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
Three Disjoint Subsets!
47
Reducible Matrix
48
Uniqueness
  • Note: a reducible matrix can have more than one
    eigenvector associated with the dominant
    eigenvalue.
  • So, the power method fails to converge to a
    unique vector.
  • We need irreducibility.

49
Irreducibility
  • An irreducible stochastic matrix means there
    exists a path from any state i to any state j.
  • That is, for each pair (i, j) there exists an m
    such that
  • (P^m)_ij ≠ 0
  • Here, the problem is with bidirectional
    connectivity.

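A small sketch of this reachability test (boolean matrix squaring is our choice of method; the slides only state the definition):

    import numpy as np

    def is_irreducible(P):
        """True iff for every pair (i, j) some power (P^m)_ij is nonzero."""
        n = P.shape[0]
        reach = ((P > 0) | np.eye(n, dtype=bool)).astype(int)
        for _ in range(n):                 # squaring doubles covered path length
            new = ((reach @ reach) > 0).astype(int)
            if (new == reach).all():
                break
            reach = new
        return bool(reach.all())

    P = np.array([[0, 0.5, 0.5], [0, 0, 1], [1, 0, 0]])
    print(is_irreducible(P))  # True for the three-page example
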
50
Disjoint Subsets
  • With an undirected graph, reducibility is
    equivalent to the nodes splitting into disjoint,
    non-empty subsets.

51
Solution to Disjoint Subsets
  • Here it is possible to rank each of these
    subsets.
  • But then we have the issue of meshing these
    disjoint rankings back together in a meaningful
    way.
  • And ...

52
The Sink
  • Since we are dealing with a directed graph, we
    also have to be concerned with the elusive
    "sink."
  • It is much more difficult to pinpoint.

53
The Sink
54
The Issue
  • The question of irreducibility is very difficult
    to answer when a matrix contains thousands or
    even millions of rows and columns.
  • A sink could itself contain thousands, even
    millions, of nodes.

55
The Sink
  • We experimented with several methods to solve or
    alleviate this sink problem.
  • Add a master node which connects to and from all
    the others.
  • This alters the structure, but not significantly.

56
Master Node
57
Alternative Method
  • Alternatively, we perturb the whole matrix.
  • This models the possibility of visiting a page by
    means other than a link (e.g., word of mouth,
    advertisements, etc.).
  • It also gives a base rank to a page with no
    inlinks.
  • Again, we alter the structure, but, again, not
    significantly.

58
Perturbation
59
Perturbation
  • We now have, for 0 < a < 1:
  • R(u) = a Σ_{v → u} R(v) / n_v + (1 - a) / N
  • a is often chosen as 0.85.

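A sketch of this perturbed ranking for the three-page example, with a = 0.85 as on the slide (the uniform-jump matrix and iteration count are our illustrative choices):

    import numpy as np

    P = np.array([[0, 0.5, 0.5], [0, 0, 1], [1, 0, 0]])
    N, a = P.shape[0], 0.85

    # Perturb: with probability 1-a the surfer jumps to a uniformly random page.
    G = a * P + (1 - a) / N * np.ones((N, N))

    R = np.full(N, 1.0 / N)
    for _ in range(100):          # power iteration on the perturbed matrix
        R = R @ G
    print(R, R.sum())             # ranks close to [0.4, 0.2, 0.4], summing to 1
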
60
Justification
  • By doing this, our matrix is now everywhere
    connected!
  • It is stochastic and irreducible.
  • So we can use the famous Power Method.

61
Furthermore
  • Continued research:
  • Comparison analysis of methods to handle:
  • Dead ends (these always exist on the boundary)
  • Sinks
  • Convergence analysis:
  • Master node
  • Perturbation
  • Implementation and results analysis:
  • Analyzing micro and macro network topology
  • Exploring implementation methods

62
THE END
63
Stochastics and the Dead End
  • What if a node links to no other nodes,
  • i.e., we have a row of all zeros?
  • THE MATRIX IS NOT STOCHASTIC!
  • We call these problem links, or "dead ends," in
    our model.
  • Some are actual dead ends.
  • But dead ends always exist at the boundary of
    our data gathering.

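One common repair, sketched below, sends each all-zero row to the uniform distribution, anticipating the W = d w^T construction on a later slide (the example matrix and names are ours):

    import numpy as np

    def fix_dead_ends(A):
        """Row-normalize A; dead-end rows become the uniform distribution."""
        A = A.astype(float)
        n = A.shape[0]
        out = A.sum(axis=1)
        dead = out == 0                # d_i = 1 exactly where row i is all zeros
        A[dead] = 1.0 / n              # redistribute rank equally from dead ends
        A[~dead] /= out[~dead, None]
        return A

    A = np.array([[0, 1, 1], [0, 0, 0], [1, 0, 0]])  # page B is a dead end
    print(fix_dead_ends(A))
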
64
The Dead End
65
Solutions
  • There are several methods to solve this problem:
  • 1) Remove the dead ends.
  • This is an alteration of the incoming data.
  • We see it does not drastically affect the
    PageRank, in particular of the higher-ranked
    pages. These are the most important for rank
    relevance.

66
(No Transcript)
67
  • 2) Alter the dead ends.
  • This is an alteration of the processing.
  • Add a master node.
  • Equally redistribute this PageRank to every other
    page.
  • Again we change the matrix, but we see it, again,
    doesn't affect the top pages.

68
Redistribution
1 1 1 0
69
(No Transcript)
70
Adjacency Matrix
71
(No Transcript)
72
Adjacency Matrix
73
Continuing
74
Adjacency Matrix
75
Adjacency Matrix
76
Adjacency Matrix
77
Matrix Format
  • We have the adjacency matrix A.
  • But, we want the probability matrix with entries
  • P_ij = a_ij / Σ_k a_ik

78
Normalization
79
Normalization
80
Normalization
81
Normalization
82
The Stochastic Matrix
83
The Stationary State
84
Mathematically Speaking
  • Let A_i be the binary vector of outlinks from
    page i:
  • A_i = (a_i1, a_i2, ..., a_iN)
  • ‖A_i‖_1 = Σ_J a_iJ
  • P_iJ = a_iJ / ‖A_i‖_1

85
  • P_i = (p_i1, ..., p_iN)
  • So, Σ_J p_iJ = 1.
  • We now have a row-stochastic probability matrix
    (i.e., we are "normal").
  • Unless, of course, a page (node) points to no
    others:
  • ‖A_i‖ = ‖P_i‖ = 0

86
Stochastics
  • Let w^T = (1/N, ..., 1/N),
  • i.e., w_i = 1/N for i = 1, ..., N.
  • Also, let
  • d_i = 0 if i is not a dead end,
  •        1 if i is a dead end.
  • W = d w^T
  • ⇒ S = W + P
  • is a stochastic matrix.

87
Mathematically Speaking
  • The stochastic matrix S becomes
  • G = a S + (1 - a) D
  • where D = e w^T,
  • e = <1, 1, ..., 1, 1>,
  • w^T = <1/N, ..., 1/N> as before.

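Putting the last two slides together, a sketch that builds S = P + d w^T and then G = a S + (1 - a) e w^T from a binary adjacency matrix (the function name and example are ours):

    import numpy as np

    def google_matrix(A, a=0.85):
        """G = a*(P + d w^T) + (1-a)*e w^T for a binary adjacency matrix A."""
        A = A.astype(float)
        n = A.shape[0]
        w = np.full(n, 1.0 / n)                        # w^T = (1/N, ..., 1/N)
        out = A.sum(axis=1)
        d = (out == 0).astype(float)                   # dead-end indicator
        P = A / np.where(out == 0, 1.0, out)[:, None]  # avoid dividing by zero
        S = P + np.outer(d, w)                         # S = P + d w^T
        return a * S + (1 - a) * np.outer(np.ones(n), w)

    A = np.array([[0, 1, 1], [0, 0, 0], [1, 0, 0]])
    G = google_matrix(A)
    print(G.sum(axis=1))                               # every row sums to 1
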
88
Stochastics and the Dead End
  • We call these problem links, degenerate links, or
    "dead ends," in our random walk model.
  • Some are actual dead ends.
  • But dead ends always exist at the boundary of
    our data gathering.

89
Convergence
  • It has been proven that, in the case of a
    reducible matrix S with at least 2 irreducible
    subclasses, we actually have λ_2(S) = 1.
    Moreover, the eigenvalues of G become
    λ_1 = 1 > λ_2 = a ≥ λ_3 ≥ ...
  • Here, a dominates the convergence of the method!
  • Note: if a = 1, there is no guaranteed convergence.

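A numerical illustration of this eigenvalue result, assuming a toy reducible S made of two disjoint 2-cycles (two irreducible subclasses; the construction is ours, not from the slides):

    import numpy as np

    a = 0.85
    S = np.array([[0, 1, 0, 0],   # subclass {0, 1}
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],   # subclass {2, 3}
                  [0, 0, 1, 0]], dtype=float)
    G = a * S + (1 - a) / 4 * np.ones((4, 4))
    mags = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]
    print(mags)  # [1. 0.85 0.85 0.85]: the subdominant magnitude equals a
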
90
The Dead End
91
Solutions
  • There are several methods to solve this problem:
  • 1) Remove the dead ends.
  • This is an alteration of the incoming data.
  • We see it does not drastically affect the
    PageRank, in particular of the higher-ranked
    pages. These are the most important for rank
    relevance.

92
(No Transcript)
93
  • 2) Alter the dead ends.
  • This is an alteration of the processing.
  • Equally redistribute this PageRank to every other
    page.
  • Again we change the matrix, but we see it, again,
    doesn't affect the top pages.

94
1 1 1 1
95
Stochastics
  • Let w^T = (1/N, ..., 1/N),
  • i.e., w_i = 1/N for i = 1, ..., N.
  • Also, let
  • d_i = 0 if i is not a dead end,
  •        1 if i is a dead end.
  • W = d w^T
  • ⇒ S = W + P
  • is a stochastic matrix.

96
A Stationary State
[Diagram: the stationary state. Node values: A = 0.4, B = 0.2, C = 0.4. Edge flows: 0.2 along A→B, 0.2 along A→C, 0.2 along B→C, 0.4 along C→A.]
97
  • R(A) = 0.5 R(B) + 0.5 R(C)
  • R(B) = R(C)
  • R(C) = R(A)

98
Revisited
  • Let R(u) be the rank of page u.
  • n_u = the number of outlinks from page u.
  • So, summing over the pages v linked to u:
  • R(u) = Σ_{v → u} R(v) / n_v

99
Explanation
                      [ 0    0.5  0.5 ]
  (R(A) R(B) R(C)) ×  [ 0    0    1   ]  =  (R(A) R(B) R(C))
                      [ 1    0    0   ]

100
A Stationary State
                   [ 0    0.5  0.5 ]
  (0.4 0.2 0.4) ×  [ 0    0    1   ]  =  (0.4 0.2 0.4)
                   [ 1    0    0   ]

101
(No Transcript)
102
Implementation
  • Now, of course, G is dense.
  • But the sparse nature of the original adjacency
    matrix can still be exploited.

103
Manipulation
  • Since G = aP + a d w^T + (1 - a) e w^T,
  • if we multiply both sides by x^T, we get
  • x^T G = a x^T P + x^T ((1 - a) e + a d) w^T
  •        = a x^T P + (1 - a + a (x^T 1 - x^T P 1)) w^T
  • (using x^T e = x^T 1 = 1 for a probability vector
    x, and d = 1 - P 1).

104
Manipulation
  • Simply let x_m^T P = y_m^T. Then let
  • x_{m+1}^T = a y_m^T + (1 - a + a (x_m^T 1 - y_m^T 1)) w^T
  • This is equivalent to x_m^T G = x_{m+1}^T, where
    we can utilize the properties of a stochastic
    irreducible matrix with one recurrent class.

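A sketch of this iteration (one product with the sparse P per step plus a rank-one correction, so the dense G is never formed; the dead-end example and names are ours):

    import numpy as np

    def step(x, P, w, a=0.85):
        """x_{m+1}^T = a y^T + (1 - a + a (x^T 1 - y^T 1)) w^T, with y^T = x^T P."""
        y = x @ P                                  # the only matrix product
        return a * y + (1 - a + a * (x.sum() - y.sum())) * w

    P = np.array([[0, 0.5, 0.5], [0, 0, 0], [1, 0, 0]])  # page B is a dead end
    w = np.full(3, 1.0 / 3.0)
    x = w.copy()
    for _ in range(100):
        x = step(x, P, w)
    print(x, x.sum())                              # the PageRank vector; sums to 1
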
105
  • This is a relatively easy and inexpensive
    calculation at each iteration of the method.
    (Fortunately, since we are dealing with 600,000
    or more dimensions.)

106
A Stationary State
[Diagram: the stationary state. Node values: A = 0.4, B = 0.2, C = 0.4. Edge flows: 0.2 along A→B, 0.2 along A→C, 0.2 along B→C, 0.4 along C→A.]
107
Explanation
                      [ 0    0.5  0.5 ]
  (R(A) R(B) R(C)) ×  [ 0    0    1   ]  =  (R(A) R(B) R(C))
                      [ 1    0    0   ]

108
A Stationary State
                   [ 0    0.5  0.5 ]
  (0.4 0.2 0.4) ×  [ 0    0    1   ]  =  (0.4 0.2 0.4)
                   [ 1    0    0   ]

109
Displacement
         [ 0    0.5  0.5 ]           [ 1/3  1/3  1/3 ]
  0.85 × [ 0    0    1   ]  + 0.15 × [ 1/3  1/3  1/3 ]
         [ 1    0    0   ]           [ 1/3  1/3  1/3 ]

     =   [ 0.05  0.475  0.475 ]
         [ 0.05  0.05   0.9   ]
         [ 0.9   0.05   0.05  ]
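A quick NumPy check of this arithmetic:

    import numpy as np

    P = np.array([[0, 0.5, 0.5], [0, 0, 1], [1, 0, 0]])
    G = 0.85 * P + 0.15 * np.full((3, 3), 1.0 / 3.0)
    print(G)  # [[0.05 0.475 0.475] [0.05 0.05 0.9] [0.9 0.05 0.05]]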