Title: Fan Chung
1 The PageRank of a graph
Fan Chung, University of California, San Diego
3 What is PageRank?
- Ranking the vertices of a graph
- Partially ordered sets graph theory
- A new graph invariant for dealing with
  - Applications: web search, IT data sets, partitioning algorithms
  - Theory: correlations among vertices
4 fan
5 Outline of the talk
- Local cuts and the Cheeger constant
- Four versions of the Cheeger inequality using
  - eigenvectors
  - random walks
  - PageRank
  - heat kernel
- Four partitioning algorithms
- Green's functions and hitting times
6 What is PageRank? What is Rank?
9 What is PageRank?
PageRank is defined on any graph.
10 An induced subgraph of the collaboration graph with authors of Erdős number 2.
11 A subgraph of the Hollywood graph.
12 A subgraph of a BGP graph
13 The Octopus graph
Yahoo IM graph (Reid Andersen, 2005)
14 Graph theory has 250 years of history.
Leonhard Euler (1707-1783)
The Bridges of Königsberg
Is it possible to walk over every bridge once and only once?
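Euler's answer can be checked mechanically. A connected multigraph has a walk that crosses every edge exactly once if and only if zero or two vertices have odd degree. The sketch below applies that criterion. The edge list is our own labelling of the seven bridges between four land masses A-D, with A as the central island. Connectivity is assumed, not checked.

```python
from collections import Counter

def has_euler_walk(edges):
    """Euler's criterion for a connected multigraph given as an edge list:
    a walk using every edge exactly once exists iff 0 or 2 vertices have odd degree.
    Connectivity is assumed, not checked."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return odd in (0, 2)

# Hypothetical labelling: A is the central island, B, C, D the other land masses.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]

print(has_euler_walk(konigsberg))  # False: the degrees 5, 3, 3, 3 are all odd
```

Adding an eighth bridge between B and C leaves exactly two odd vertices (A and D), and the criterion then reports that such a walk exists.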
15 Geometric graphs
Topological graphs
Algebraic graphs
General graphs
16 Massive data, massive graphs
- WWW graphs
- Call graphs
- Acquaintance graphs
- Graphs from any database
Protein interaction network (Hawoong Jeong)
17 Big and bigger graphs
New directions in graph theory
18 Many basic questions
- Correlations among vertices?
- The geometry of a network? Distance, flow, cut, eigenvalues, rapid mixing, ...
19 Google's answer
The definition of PageRank?
20 A measure for the importance of a website
The importance of a website is proportional to
the sum of the importance of all the sites that
link to it.
x ? A x
Adjacency matrix
23 A solution for the importance of a website x
Solve $x = \lambda A x$ for $x$, where $A$ is the adjacency matrix.
Eigenvalue problems!
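A minimal sketch of the simplest way to attack this eigenvalue problem is power iteration: multiply by A repeatedly and rescale. We assume an undirected graph that is connected and not bipartite, so the iteration converges to the principal (Perron) eigenvector. The graph and function names are illustrative only.

```python
def principal_eigenvector(adj, iterations=500):
    """Power iteration for x proportional to A x on an undirected graph.

    adj maps each vertex to its list of neighbours, so (A x)(v) is the sum of
    x(u) over the neighbours u of v. Assumes a connected, non-bipartite graph.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        y = {v: sum(x[u] for u in adj[v]) for v in adj}
        total = sum(y.values())
        x = {v: y[v] / total for v in adj}     # rescale so the scores sum to 1
    ax = {v: sum(x[u] for u in adj[v]) for v in adj}
    mu = sum(ax.values())                      # A x = mu x and x sums to 1
    return x, mu

# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 0.
paw = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
scores, mu = principal_eigenvector(paw)
print({v: round(s, 3) for v, s in scores.items()}, "lambda =", round(1 / mu, 3))
```

Since $A x = \mu x$, the scalar in $x = \lambda A x$ is $\lambda = 1/\mu$. The most connected vertex, 0, gets the highest score.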
24 Graph models: (undirected) graphs
[Figure: an example graph labeled with numerical values]
27 In a directed graph, there are two types of importance: authority and hub (Jon Kleinberg, 1998).
28 Two types of importance of a website
x: importance as authorities
y: importance as hubs
Solve $x = r A^T y$ and $y = s A x$, that is,
$x = rs\, A^T A x$ and $y = rs\, A A^T y$.
Singular value problems!
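The coupled equations can be solved by alternating the two updates and rescaling after each round. This is a sketch, and it assumes $A(u,v) = 1$ when page u links to page v. Under that convention, authority scores are $A^T y$ and hub scores are $A x$. The rescaling plays the role of the constants r and s. The example link set is hypothetical.

```python
import math

def hits(edges, iterations=100):
    """Alternate x = A^T y (authorities) and y = A x (hubs) on a directed graph.

    edges lists links (u, v) meaning u -> v. Both score vectors are rescaled to
    unit length every round.
    """
    nodes = sorted({w for edge in edges for w in edge})
    x = {v: 1.0 for v in nodes}   # authority scores
    y = {v: 1.0 for v in nodes}   # hub scores
    for _ in range(iterations):
        x = {v: 0.0 for v in nodes}
        for u, v in edges:
            x[v] += y[u]          # good authorities are linked from good hubs
        y = {v: 0.0 for v in nodes}
        for u, v in edges:
            y[u] += x[v]          # good hubs link to good authorities
        nx = math.sqrt(sum(val * val for val in x.values()))
        ny = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / nx for v, val in x.items()}
        y = {v: val / ny for v, val in y.items()}
    return x, y

# Hypothetical links: pages 0 and 1 point to 2 and 3, and page 2 points to 3.
links = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
authority, hub = hits(links)
print(authority, hub)
```

Page 3, which every other page links to, ends up as the top authority. Pages 0 and 1 tie as the top hubs.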
29 Eigenvalue problem for an n × n matrix, where n ≈ 30 billion websites.
Hard to compute eigenvalues; even harder to compute eigenvectors.
30 In the old days, we computed for a given (whole) graph.
In reality, we can only afford to compute locally.
32 A traditional algorithm: input a given graph on n vertices.
Efficient algorithm means polynomial algorithm: $n^3$, $n^2$, $n \log n$, $n$ (exponential vs. polynomial).
New algorithmic paradigm: input is access to a (huge) graph (e.g., for a vertex v, find its neighbors), with a bounded number of accesses (infinite vs. finite).
33 The definition of PageRank given by Brin and Page is based on random walks.
34 Random walks in a graph
G: a graph
P: the transition probability matrix, $P(u,v) = 1/d_u$ if $u \sim v$ and $0$ otherwise, where $d_u$ is the degree of u.
A lazy walk: $W = \frac{1}{2}(I + P)$.
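To make the walk matrices concrete, here is a small sketch using exact fractions. We assume the common convention that the lazy walk is $W = (I + P)/2$: stay put with probability 1/2, otherwise take a step of P. The sketch also checks that $\pi(v) = d_v / \mathrm{vol}\,G$ is stationary for W. The example graph is hypothetical.

```python
from fractions import Fraction

def transition_matrix(adj):
    """P(u, v) = 1/d_u if u ~ v and 0 otherwise (row u, column v)."""
    return {u: {v: Fraction(1, len(adj[u])) if v in adj[u] else Fraction(0)
                for v in adj}
            for u in adj}

def lazy_walk(adj):
    """W = (I + P) / 2: stay put with probability 1/2, else move like P."""
    P = transition_matrix(adj)
    half = Fraction(1, 2)
    return {u: {v: (half if u == v else 0) + half * P[u][v] for v in adj}
            for u in adj}

def step(dist, M):
    """One step on a row vector: (dist M)(v) = sum over u of dist(u) M(u, v)."""
    return {v: sum(dist[u] * M[u][v] for u in M) for v in M}

def stationary(adj):
    """pi(v) = d_v / vol G."""
    vol = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(adj[v]), vol) for v in adj}

paw = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
W = lazy_walk(paw)
pi = stationary(paw)
print(step(pi, W) == pi)  # True: d_v / vol G is stationary for W
```

Distributions are treated as row vectors multiplied on the left of the matrix, which matches the convention that a seed is a row vector.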
35 Original definition of PageRank
A (bored) surfer either
- surfs to a random webpage, with probability α, or
- follows a random link from the current webpage, with probability 1 - α.
α: the jumping constant
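36 Definition of personalized PageRank
Two equivalent ways to define PageRank pr(α, s):
(1) $\mathrm{pr}(\alpha,s) = \alpha s + (1-\alpha)\,\mathrm{pr}(\alpha,s)\,W$, where W is the lazy walk
s: the seed, as a row vector
α: the jumping constant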
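37 Definition of PageRank
Two equivalent ways to define PageRank ppr(α, s), where W is the lazy walk:
(1) $\mathrm{ppr}(\alpha,s) = \alpha s + (1-\alpha)\,\mathrm{ppr}(\alpha,s)\,W$
(2) $\mathrm{ppr}(\alpha,s) = \alpha \sum_{k=0}^{\infty} (1-\alpha)^k\, s W^k$
With $s = \frac{1}{n}\mathbf{1}$ (uniform): the (original) PageRank.
With s some seed, e.g., the indicator vector of a single vertex: personalized PageRank.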
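The two definitions can be compared numerically. The sketch below has two functions. The first iterates the recurrence $p \leftarrow \alpha s + (1-\alpha) p W$ to its fixed point. The second sums the truncated geometric series $\alpha \sum_k (1-\alpha)^k s W^k$. We assume W is the lazy walk $(I + D^{-1}A)/2$ and take α = 0.15; the graph and seeds are hypothetical.

```python
def lazy_step(dist, adj):
    """Row vector times W = (I + D^{-1} A) / 2, without forming W."""
    out = {v: 0.5 * dist[v] for v in adj}
    for u, nbrs in adj.items():
        share = 0.5 * dist[u] / len(nbrs)
        for v in nbrs:
            out[v] += share
    return out

def pagerank_recurrence(adj, alpha, seed, iterations=500):
    """Definition (1): iterate p <- alpha s + (1 - alpha) p W to its fixed point."""
    p = dict(seed)
    for _ in range(iterations):
        pw = lazy_step(p, adj)
        p = {v: alpha * seed[v] + (1 - alpha) * pw[v] for v in adj}
    return p

def pagerank_series(adj, alpha, seed, terms=500):
    """Definition (2): alpha * sum_k (1 - alpha)^k s W^k, truncated after `terms` terms."""
    total = {v: 0.0 for v in adj}
    term = dict(seed)                      # s W^k, starting at k = 0
    for k in range(terms):
        coeff = alpha * (1 - alpha) ** k
        for v in adj:
            total[v] += coeff * term[v]
        term = lazy_step(term, adj)
    return total

paw = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
n = len(paw)
uniform = {v: 1 / n for v in paw}                     # original PageRank
personal = {v: 1.0 if v == 3 else 0.0 for v in paw}   # seed at vertex 3
for seed in (uniform, personal):
    a = pagerank_recurrence(paw, 0.15, seed)
    b = pagerank_series(paw, 0.15, seed)
    print(max(abs(a[v] - b[v]) for v in paw))   # tiny: the two definitions agree
```

The recurrence is a contraction with factor 1 - α, so both computations converge geometrically.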
38 How good is PageRank as a measure of correlation?
Does it depend on the application?
How good is the cut? Isoperimetric properties.
39 Isoperimetric properties
What is the shortest curve enclosing a unit area?
Given a graph G and an integer m, what is the minimum cut disconnecting a subgraph of m vertices?
In a graph G, what is the cut e(S,V-S) for which $\frac{e(S,V-S)}{\mathrm{vol}\,S}$ is smallest?
40 How good is the cut?
Two types of cuts
[Figure: a vertex set S and the edge cut E(S,V-S)]
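41 $\frac{e(S,V-S)}{\mathrm{vol}\,S}$, where $\mathrm{vol}\,S = \sum_{v \in S} \deg(v)$, versus $\frac{e(S,V-S)}{|S|}$, where $|S| = \sum_{v \in S} 1$.
[Figure: a set S and its complement V-S]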
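42 The Cheeger constant for graphs
The Cheeger constant: $h_G = \min_{S} \frac{e(S,V-S)}{\min(\mathrm{vol}\,S,\ \mathrm{vol}(V-S))}$
The volume of S is $\mathrm{vol}\,S = \sum_{v \in S} d_v$.
$h_G$ and its variations are sometimes called conductance, isoperimetric number, ...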
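For a tiny graph, the Cheeger constant can be computed straight from the definition by trying every nonempty proper subset. The number of subsets is exponential, which is exactly why sweeps are needed on real graphs. This brute-force sketch is for illustration only. The example graph, two triangles joined by one edge, is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def volume(adj, S):
    """vol S = sum of the degrees of the vertices in S."""
    return sum(len(adj[v]) for v in S)

def edge_cut(adj, S):
    """e(S, V - S): the number of edges with exactly one endpoint in S."""
    return sum(1 for u in S for v in adj[u] if v not in S)

def cheeger_ratio(adj, S):
    S = set(S)
    rest = set(adj) - S
    return Fraction(edge_cut(adj, S), min(volume(adj, S), volume(adj, rest)))

def cheeger_constant(adj):
    """h_G by brute force over every nonempty proper subset: exponential time."""
    nodes = list(adj)
    return min(cheeger_ratio(adj, S)
               for k in range(1, len(nodes))
               for S in combinations(nodes, k))

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
barbell = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(cheeger_constant(barbell))  # 1/7: cut one edge, each side has volume 7
```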
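43 The Cheeger inequality
The Cheeger constant: $h_G = \min_{S} \frac{e(S,V-S)}{\min(\mathrm{vol}\,S,\ \mathrm{vol}(V-S))}$
The Cheeger inequality: $2h_G \ge \lambda \ge \frac{h_G^2}{2}$,
where λ is the first nontrivial eigenvalue of the (normalized) Laplacian of a connected graph.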
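The inequality can be checked numerically on a small example. The sketch below estimates $\lambda_1$ of the normalized Laplacian $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$. It runs power iteration on $I - \mathcal{L}/2$ after projecting out the known eigenvector $\sqrt{d_v/\mathrm{vol}\,G}$ for eigenvalue 0. It then compares the result with a brute-force $h_G$. This is a sketch for small connected graphs, not a production eigensolver. For this example graph, solving the eigen-equations by hand gives $\lambda_1 = (11 - \sqrt{73})/12 \approx 0.2047$.

```python
import math
from itertools import combinations

def apply_normalized_adjacency(adj, x):
    """(D^{-1/2} A D^{-1/2} x)(v) = sum over u ~ v of x(u) / sqrt(d_u d_v)."""
    return {v: sum(x[u] / math.sqrt(len(adj[u]) * len(adj[v])) for u in adj[v])
            for v in adj}

def first_nontrivial_eigenvalue(adj, iterations=500):
    """Estimate lambda_1 of L = I - D^{-1/2} A D^{-1/2} for a connected graph.

    Power iteration on M = I - L/2, whose eigenvalues 1 - lambda_i/2 lie in
    [0, 1], after projecting out phi_0(v) = sqrt(d_v / vol G).
    """
    nodes = list(adj)
    vol = sum(len(adj[v]) for v in nodes)
    phi0 = {v: math.sqrt(len(adj[v]) / vol) for v in nodes}
    x = {v: float(i) for i, v in enumerate(nodes)}   # arbitrary start vector
    for _ in range(iterations):
        c = sum(x[v] * phi0[v] for v in nodes)
        x = {v: x[v] - c * phi0[v] for v in nodes}   # stay orthogonal to phi_0
        ax = apply_normalized_adjacency(adj, x)
        y = {v: 0.5 * (x[v] + ax[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    ax = apply_normalized_adjacency(adj, x)
    mu = sum(x[v] * 0.5 * (x[v] + ax[v]) for v in nodes)  # Rayleigh quotient of M
    return 2 * (1 - mu)

def cheeger_constant(adj):
    """h_G by brute force (fine for tiny graphs only)."""
    nodes, vol = list(adj), sum(len(n) for n in adj.values())
    best = math.inf
    for k in range(1, len(nodes)):
        for S in map(set, combinations(nodes, k)):
            cut = sum(1 for u in S for v in adj[u] if v not in S)
            vol_s = sum(len(adj[u]) for u in S)
            best = min(best, cut / min(vol_s, vol - vol_s))
    return best

barbell = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
lam = first_nontrivial_eigenvalue(barbell)
h = cheeger_constant(barbell)
print(round(lam, 4), round(h, 4), 2 * h >= lam >= h * h / 2)  # 0.2047 0.1429 True
```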
44 The spectrum of a graph
Many ways to define the spectrum of a graph.
How are the eigenvalues related to properties of graphs?
46 The spectrum of a graph
A: adjacency matrix; D: diagonal degree matrix
Matrix tree theorem: the number of spanning trees equals any cofactor of D - A.
Gustav Robert Kirchhoff (1824-1887)
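The matrix tree theorem gives a quick sanity check. Delete one row and the matching column of D - A, and the determinant of what remains counts the spanning trees. The sketch below uses exact rational Gaussian elimination and assumes a simple graph. $K_4$ should give $4^{4-2} = 16$ trees (Cayley's formula), and the 5-cycle should give 5.

```python
from fractions import Fraction

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]   # a row swap flips the sign
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def spanning_tree_count(adj):
    """Matrix tree theorem: delete row and column 0 of D - A, take the determinant."""
    nodes = sorted(adj)
    laplacian = [[(len(adj[u]) if u == v else 0) - (1 if v in adj[u] else 0)
                  for v in nodes] for u in nodes]
    reduced = [row[1:] for row in laplacian[1:]]
    return int(determinant(reduced))

k4 = {u: [v for v in range(4) if v != u] for u in range(4)}
c5 = {u: [(u - 1) % 5, (u + 1) % 5] for u in range(5)}
print(spanning_tree_count(k4), spanning_tree_count(c5))  # 16 5
```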
47 The spectrum of a graph
A: adjacency matrix; D: diagonal degree matrix
Random walks: rate of convergence
Gustav Robert Kirchhoff (1824-1887)
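48 The spectrum of a graph (loopless, simple)
Discrete Laplace operator $\Delta = I - D^{-1}A$: not symmetric in general.
Symmetric normalized Laplacian $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$, with eigenvalues $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1} \le 2$.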
50 The spectrum dictates many properties of a graph:
- expander
- diameter
- discrepancy
- subgraph containment
- ...
Spectral implications for finding good cuts?
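51 Finding a cut by a sweep
Order the vertices so that $f(v_1) \ge f(v_2) \ge \cdots \ge f(v_n)$.
For $1 \le i < n$, consider the sets $S_i = \{v_1, \ldots, v_i\}$ and the Cheeger ratio of $S_i$,
$h(S_i) = \frac{e(S_i, V-S_i)}{\min(\mathrm{vol}\,S_i,\ \mathrm{vol}(V-S_i))}$.
Define $h_f = \min_i h(S_i)$.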
53 Finding a cut by a sweep
Using a sweep by the eigenvector, one can reduce the exponential number of choices of subsets to a linear number.
Still, the Cheeger inequality provides a lower-bound guarantee.
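A sweep only needs the n - 1 prefix sets, and each prefix's cut and volume can be updated incrementally from the previous one. The sketch below is a minimal version that assumes a dict-based graph and an arbitrary real vector f. For the demonstration, f is the eigenvector of $\Delta = I - D^{-1}A$ for $\lambda_1$ on the two-triangle example graph, worked out by hand for that graph only. On this graph the sweep recovers the optimal cut.

```python
import math
from fractions import Fraction

def sweep(adj, f):
    """Sweep over f: the best Cheeger ratio among S_i = {v_1, ..., v_i}.

    Vertices are ordered so that f(v_1) >= f(v_2) >= ...; cut and volume are
    updated incrementally, so all n - 1 prefixes cost O(number of edges).
    """
    order = sorted(adj, key=lambda v: f[v], reverse=True)
    total_vol = sum(len(adj[v]) for v in adj)
    in_s, vol, cut = set(), 0, 0
    best_ratio, best_size = None, 0
    for i, v in enumerate(order[:-1], start=1):   # proper, nonempty prefixes
        inside = sum(1 for u in adj[v] if u in in_s)
        cut += len(adj[v]) - 2 * inside   # edges into S stop being cut, the rest start
        vol += len(adj[v])
        in_s.add(v)
        ratio = Fraction(cut, min(vol, total_vol - vol))
        if best_ratio is None or ratio < best_ratio:
            best_ratio, best_size = ratio, i
    return best_ratio, set(order[:best_size])

barbell = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
# Eigenvector of Delta = I - D^{-1}A for lambda_1 on this particular graph.
lam = (11 - math.sqrt(73)) / 12
b = 1 - 2 * lam
f = {0: 1.0, 1: 1.0, 2: b, 3: -b, 4: -1.0, 5: -1.0}
print(sweep(barbell, f))  # (Fraction(1, 7), {0, 1, 2})
```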
54 Four types of Cheeger inequalities, with four proofs using
- eigenvectors
- random walks
- PageRank
- heat kernel
leading to four different one-sweep partitioning algorithms.
55 Four proofs of Cheeger inequalities
- graph spectral method (spectral partition algorithm)
- random walks
- PageRank
- heat kernel
(the last three lead to local partition algorithms)
56 Graph partitioning
Local graph partitioning
57 What is a local graph partitioning algorithm?
A local graph partitioning algorithm finds a small cut near the given seed(s), with running time depending only on the size of the output.
58-62 Examples of local partitioning (figures)
65 Four proofs of Cheeger inequalities
- graph spectral method: Cheeger, 60s; Fiedler, 73; Alon, 86; Jerrum, Sinclair, 89
- random walks: Lovász, Simonovits, 90, 93; Spielman, Teng, 04
- PageRank: Andersen, Chung, Lang, 06
- heat kernel: Chung, PNAS, 08
66 The Cheeger inequality
Partition algorithm using the eigenvector f:
the Cheeger inequality can be stated as $2h_G \ge \lambda \ge \frac{h_f^2}{2}$,
where λ is the first nontrivial eigenvalue of the Laplacian and $h_f$ is the minimum Cheeger ratio in a sweep using the eigenvector f.
67 Proof of the Cheeger inequality
from the definition
by the Cauchy-Schwarz inequality
from the definition
summation by parts
68 A Cheeger inequality using random walks
Lovász, Simonovits, 90, 93
This leads to a Cheeger inequality in terms of the minimum Cheeger ratio over sweeps using a lazy walk of k steps from every vertex, for an appropriate range of k.
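69 A Cheeger inequality using PageRank
Using the PageRank vector.
Recall the definition of PageRank ppr(α, s), where W is the lazy walk:
(1) $\mathrm{ppr}(\alpha,s) = \alpha s + (1-\alpha)\,\mathrm{ppr}(\alpha,s)\,W$
(2) $\mathrm{ppr}(\alpha,s) = \alpha \sum_{k=0}^{\infty} (1-\alpha)^k\, s W^k$
PageRank organizes the random walks by a single scalar α.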
70 Random walks versus PageRank
How fast is the convergence to the stationary distribution?
PageRank: choose α to satisfy the required property.
Random walks: for what k can one have the required property?
71 A Cheeger inequality using PageRank with seed a subset S
Using the PageRank vector, a Cheeger inequality can be obtained in terms of $\lambda_S$, the Dirichlet eigenvalue of the Laplacian, and the minimum Cheeger ratio over sweeps using the appropriate personalized PageRank with seed S.
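72 Dirichlet eigenvalues for a subset
$\lambda_S = \inf_f \frac{\sum_{u \sim v} (f(u) - f(v))^2}{\sum_{v} f(v)^2 d_v}$
over all f satisfying the Dirichlet boundary condition
$f(v) = 0$ for all $v \notin S$.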
73 Local Cheeger constant for a subset
75 Algorithmic aspects of PageRank
- Fast approximation algorithm for personalized PageRank: a greedy-type algorithm with almost linear complexity. The jumping constant can be used to approximate PageRank with a support of the desired size.
- Errors can be effectively bounded.
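One greedy scheme of this kind is a "push" procedure, which is our reading of the Andersen-Chung-Lang (2006) local partitioning work cited in the talk. The sketch keeps an estimate p and a residual r with $p + \mathrm{pr}(\alpha, r) = \mathrm{pr}(\alpha, s)$. It repeatedly pushes at any vertex u whose residual is at least $\varepsilon d_u$. We assume the lazy walk $W = (I + D^{-1}A)/2$. The threshold ε and the function names are illustrative. At termination every residual is below $\varepsilon d_u$, so the total error is at most $\varepsilon\,\mathrm{vol}\,G$. A brute-force reference solver is included for comparison.

```python
from collections import deque

def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-style approximation of pr(alpha, chi_seed) for the lazy walk W.

    Maintains p + pr(alpha, r) = pr(alpha, chi_seed) and pushes at any vertex u
    whose residual r(u) is at least eps * d_u.
    """
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du, ru = len(adj[u]), r.get(u, 0.0)
        if ru < eps * du:
            continue                            # too little residual to push
        p[u] = p.get(u, 0.0) + alpha * ru       # settle an alpha fraction at u
        r[u] = (1 - alpha) * ru / 2             # the lazy half stays at u
        share = (1 - alpha) * ru / (2 * du)     # the other half goes to neighbours
        for v in adj[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            if before < eps * len(adj[v]) <= r[v]:
                queue.append(v)                 # v just crossed the threshold
        if r[u] >= eps * du:
            queue.append(u)
    return p, r

def exact_ppr(adj, seed, alpha=0.15, iterations=1000):
    """Reference value: iterate p <- alpha chi_seed + (1 - alpha) p W."""
    p = {v: 0.0 for v in adj}
    for _ in range(iterations):
        pw = {v: 0.5 * p[v] for v in adj}
        for u, nbrs in adj.items():
            for v in nbrs:
                pw[v] += 0.5 * p[u] / len(nbrs)
        p = {v: alpha * (v == seed) + (1 - alpha) * pw[v] for v in adj}
    return p

barbell = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
approx, residual = approximate_ppr(barbell, seed=0, eps=1e-8)
exact = exact_ppr(barbell, seed=0)
print(max(abs(approx.get(v, 0.0) - exact[v]) for v in barbell))
```

Each push moves at least $\alpha \varepsilon d_u$ of mass from r into p, so the total work is bounded in terms of $1/(\alpha\varepsilon)$ rather than the size of the graph. This is the property that makes the procedure local.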
76 A graph partition algorithm using PageRank
Given a set S with
randomly choose a vertex v in S.
With probability at least
the one-sweep algorithm using
has an initial segment with the Cheeger
constant at most
77 Graph partitioning using a PageRank vector
A graph with 198,430 nodes and 1,133,512 edges
80 Kevin Lang, 2007
81 Four proofs of Cheeger inequalities
- graph spectral method: Fiedler, 73; Cheeger, 60s; Alon, 86
- random walks: Lovász, Simonovits, 90, 93; Spielman, Teng, 04
- PageRank: Andersen, Chung, Lang, 06
- heat kernel: Chung, PNAS, 08
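83 PageRank versus heat kernel
PageRank, a geometric sum: $\mathrm{pr}(\alpha,s) = \alpha \sum_{k=0}^{\infty} (1-\alpha)^k\, s W^k$, satisfying the recurrence $\mathrm{pr}(\alpha,s) = \alpha s + (1-\alpha)\,\mathrm{pr}(\alpha,s)\,W$.
Heat kernel pagerank, an exponential sum: $\rho_{t,s} = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!}\, s P^k$, satisfying the heat equation $\frac{\partial}{\partial t}\rho_{t,s} = -\rho_{t,s}\,\Delta$, where $\Delta = I - P$.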
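The exponential sum can be evaluated by truncating it after enough terms. This sketch assumes $P = D^{-1}A$ (the non-lazy walk) and a seed given as a row vector. It computes $\rho_{t,s} = e^{-t}\sum_k \frac{t^k}{k!} s P^k$ and checks the heat equation $\partial_t \rho = -\rho(I - P)$ with a central difference in t. The truncation at 60 terms is an assumption that suits moderate t; larger t needs more terms. The graph and seed are hypothetical.

```python
import math

def walk_step(dist, adj):
    """Row vector times P = D^{-1} A."""
    out = {v: 0.0 for v in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            out[v] += dist[u] / len(nbrs)
    return out

def heat_kernel_pagerank(adj, seed, t, terms=60):
    """rho_{t,s} = e^{-t} * sum_k t^k / k! * s P^k, truncated after `terms` terms."""
    total = {v: 0.0 for v in adj}
    term = dict(seed)            # s P^k, starting at k = 0
    weight = math.exp(-t)        # e^{-t} t^k / k!, starting at k = 0
    for k in range(terms):
        for v in adj:
            total[v] += weight * term[v]
        term = walk_step(term, adj)
        weight *= t / (k + 1)
    return total

paw = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
seed = {v: 1.0 if v == 3 else 0.0 for v in paw}
t, dt = 2.0, 1e-4
rho = heat_kernel_pagerank(paw, seed, t)

# Heat equation check: d/dt rho = rho P - rho, using a central difference in t.
rho_plus = heat_kernel_pagerank(paw, seed, t + dt)
rho_minus = heat_kernel_pagerank(paw, seed, t - dt)
rho_p = walk_step(rho, paw)
err = max(abs((rho_plus[v] - rho_minus[v]) / (2 * dt) - (rho_p[v] - rho[v]))
          for v in paw)
print(sum(rho.values()), err)
```

The PageRank recurrence weights the k-th step by the geometric factor $\alpha(1-\alpha)^k$. The heat kernel uses the Poisson weights $e^{-t}t^k/k!$ instead, so the bulk of the weight sits near k ≈ t.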
84 A Cheeger inequality using the heat kernel
Theorem
where is the minimum Cheeger ratio over
sweeps by using heat kernel pagerank over all u
in S.
Theorem For
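85 Definition of heat kernel
$H_t = e^{-t\Delta} = e^{-t}\sum_{k=0}^{\infty} \frac{t^k}{k!} P^k$, where $\Delta = I - P$ and $P = D^{-1}A$.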
86 A Cheeger inequality using the heat kernel
Using the upper and lower bounds, a Cheeger inequality can be obtained in terms of $\lambda_S$, the Dirichlet eigenvalue of the Laplacian, and the minimum Cheeger ratio over sweeps using the heat kernel with seed S, for an appropriate t.
90 Many applications of PageRank for problems in graph theory
- Graph drawing using PageRank
- Graph embedding using PageRank
- Pebbling and routing using PageRank
- Covering and packing using PageRank
- Relating graph invariants of subgraphs to the host graph using PageRank
- Your favorite old problem using PageRank?
91 New directions in graph theory for information networks
Topics:
- Random graphs with general degrees
- PageRank
- Algorithmic game theory, graphical games
Using:
- Spectral methods
- Probabilistic methods
- Quasirandomness