Title: A Random-Surfer Web-Graph Model
1A Random-Surfer Web-Graph Model
Mugizi Rwebangira
- (Joint work with Avrim Blum and Hubert Chan)
2The Web as a Graph
Consider the World Wide Web as a graph, with web
pages as nodes and hyperlinks between pages as
edges.
3Studying the Web
- Since the Web emerged, there has been a lot of interest in:
  - Empirically studying properties of the Web Graph.
  - Modeling the Web Graph mathematically.
- Benefits of generative models:
  - Simulation: when real data is scarce.
  - Extrapolation: how will the graph change?
  - Understanding: inspire further research on real data.
4Power Law
$f(x) \sim g(x)$ if $\lim_{x \to \infty} f(x)/g(x) = 1$,
e.g. $(x+1) \sim (x+2)$.
The distribution of a random variable X follows a power law if $\Pr[X = k] \sim C k^{-a}$.
Example: $\Pr[X = k] \sim k^{-2}$.
5Power Law: $\Pr[X = k] \sim k^{-2}$
6Power Law
$\Pr[X = k] \sim C k^{-a}$
$\log \Pr[X = k] \approx \log C - a \log k$
$\Pr[X = k] \sim k^{-2}$
$\log \Pr[X = k] \approx -2 \log k$
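A quick numerical check of the log-log relation. The Python sketch below fits a least-squares line to the points $(\log k, \log \Pr[X = k])$ for the example $\Pr[X = k] \propto k^{-2}$. The function name `loglog_slope` is ours, for illustration. The distribution is truncated to $k \le 1000$ so that it can be normalized. Normalization only shifts the intercept $\log C$, so the fitted slope is $-2$.

```python
import math

def loglog_slope(ks, probs):
    """Least-squares slope of log(prob) against log(k).

    For an exact power law Prob[X = k] = C * k**(-a), every point lies on
    the line log Prob = log C - a log k, so the slope equals -a.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(q) for q in probs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Prob[X = k] proportional to k^-2, truncated to k = 1..1000 and normalized.
ks = range(1, 1001)
weights = [k ** -2 for k in ks]
total = sum(weights)
probs = [w / total for w in weights]

print(round(loglog_slope(ks, probs), 6))
```

On sampled data the tail is noisy, so a least-squares fit on a log-log plot is only a rough diagnostic.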
7Power Law: Log-Log plot
8Power Law contd.
More general definition: $\Pr[X \ge k] \sim C k^{-a}$.
Particularly useful if X takes on real values.
Sometimes referred to as "heavy-tailed" or "scale-free".
9Power Laws in Degree distribution
Let G be a graph.
Let $X_k$ be the proportion of nodes with degree k in G.
Then if $X_k \sim C k^{-a}$, we say that G has a power-law degree distribution.
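The sketch below makes $X_k$ concrete. It computes the proportion of nodes with each degree from an undirected edge list. The helper `degree_distribution` is hypothetical. It assumes every node appears in at least one edge, so isolated nodes are not counted. A self-loop counts twice toward its node's degree.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: X_k}, where X_k is the proportion of nodes with degree k.

    `edges` is a list of (u, v) pairs. Both endpoints gain one degree per
    edge, so a self-loop (u, u) adds 2 to deg(u). Nodes that appear in no
    edge are not seen and therefore not counted.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    counts = Counter(deg.values())          # how many nodes have each degree
    return {k: c / n for k, c in sorted(counts.items())}

# A star with centre 0 and four leaves: four nodes of degree 1, one of degree 4.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_distribution(star))  # {1: 0.8, 4: 0.2}
```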
10Properties of the Web Graph
A power-law degree distribution has been observed
in a wide variety of graphs including citation
networks, social networks, protein-protein
interaction networks and so on.
It has also been observed in the Web Graph [Barabási & Albert].
11Outline
- Background/Previous Work
- Motivation
- Models
- Theoretical results
- Experimental results
- Conclusions
12Classic Random Graph Models
- In the G(n,p) random graph model:
  - There are n nodes.
  - There is an edge between any two nodes with probability p.
- Proposed by Erdős and Rényi in the 1960s.
13Online G(n,p)
- In this model, each new node makes k connections to existing nodes chosen uniformly at random.
For this talk we will focus on k = 1, hence the graph will be a tree.
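Here is a minimal simulation of online G(n,p) with k = 1. It assumes the process starts from a single node with no self-loop, and the names are illustrative. When t nodes are present, the new node picks its one neighbor uniformly from them, so the result is a random recursive tree.

```python
import random

def online_uniform_tree(n, seed=None):
    """Online G(n,p) with k = 1 (sketch).

    Node 0 is the first node; each later node t links to one of the t
    existing nodes 0..t-1 chosen uniformly at random. Returns parent,
    where parent[t] is the node that t linked to (parent[0] is None).
    """
    rng = random.Random(seed)
    parent = [None]
    for t in range(1, n):
        parent.append(rng.randrange(t))
    return parent

def degree_fractions(parent):
    """Return {k: proportion of nodes with (undirected) degree k}."""
    n = len(parent)
    deg = [0] * n
    for child, par in enumerate(parent):
        if par is not None:
            deg[child] += 1   # the new node's own link
            deg[par] += 1     # the link received by the chosen node
    counts = {}
    for d in deg:
        counts[d] = counts.get(d, 0) + 1
    return {k: c / n for k, c in sorted(counts.items())}

frac = degree_fractions(online_uniform_tree(100_000, seed=1))
for k in range(1, 6):
    print(k, round(frac.get(k, 0.0), 4))
```

With n = 100,000, the proportions of nodes with degree 1, 2, 3, ... come out close to 1/2, 1/4, 1/8, .... This is geometric decay, not a power law.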
14Online G(n,p)
15Properties of Online G(n,p)
- $\mathbb{E}[\text{degree of first node}] = 1 + 1/2 + 1/3 + 1/4 + \cdots + 1/n = \Theta(\log n)$
- $X_k$ = proportion of nodes with degree k
- $\mathbb{E}[X_k] = \Theta((1/2)^k)$
NOT POWER-LAWED!!
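This is where the harmonic sum comes from. A node arriving when j nodes are already present picks the first node with probability 1/j. By linearity of expectation, the first node's expected degree is therefore a harmonic sum like the one above. Whether the range starts at 1 depends on how the first node's initial degree is counted.

The sketch below (function name ours) evaluates $1 + 1/2 + \cdots + 1/n$. It shows that the sum stays within a constant of $\ln n$: the difference approaches the Euler-Mascheroni constant, about 0.5772.

```python
import math

def expected_first_node_degree(n):
    """The sum 1 + 1/2 + ... + 1/n from the slide (the n-th harmonic number)."""
    return sum(1.0 / j for j in range(1, n + 1))

for n in (10**2, 10**4, 10**6):
    h = expected_first_node_degree(n)
    # H_n - ln n tends to the Euler-Mascheroni constant (about 0.5772),
    # so the expected degree grows like ln n.
    print(n, round(h, 4), round(h - math.log(n), 4))
```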
16Online G(n,p) (n = 100,000, average of 100 runs)
17Preferential Attachment
In the Preferential Attachment model, each
new node connects to the existing nodes with a
probability proportional to their degree.
[Barabási & Albert]
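A common way to simulate preferential attachment with one link per new node is sketched below. It is an illustration, not the authors' code. The idea is to keep a list that contains each node once per unit of degree. A uniform pick from that list then selects a node with probability proportional to its degree. Giving the first node a self-loop (degree 2) is an assumption, made so that the first pick is well defined.

```python
import random

def preferential_attachment_tree(n, seed=None):
    """Each new node makes one link to an existing node chosen with
    probability proportional to its current degree (sketch).

    Assumption for illustration: node 0 starts with a self-loop, so it
    has degree 2 when node 1 arrives.
    """
    rng = random.Random(seed)
    endpoints = [0, 0]      # node v appears deg(v) times in this list
    degree = [2]
    for t in range(1, n):
        target = rng.choice(endpoints)   # uniform endpoint => degree-proportional node
        degree.append(1)
        degree[target] += 1
        endpoints.extend((t, target))
    return degree

deg = preferential_attachment_tree(100_000, seed=1)
print(deg[0], max(deg))
```

Here the largest degree grows roughly like $\sqrt{n}$. That is far above the logarithmic maximum degree of online G(n,p).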
18Preferential Attachment
19Preferential Attachment
$\mathbb{E}[\text{degree of 1st node}] = \Theta(\sqrt{n})$
Preferential Attachment gives a power-law degree distribution [Mitzenmacher], [Cooper & Frieze 03], [KRRSTU00].
20Preferential Attachment
21Other Models
Kumar et al. proposed the copying model [KRRSTU00]. Leskovec et al. propose a forest-fire model, which has some similarities to this work [LKF05].
22Outline
- Background/Previous Work
- Motivation
- Models
- Theoretical results
- Experimental results
- Conclusions
23Motivating Questions
- Why would a new node connect to nodes of high degree?
  - Are high-degree nodes more attractive?
  - Or are there other explanations?
How does a new node find out what the high-degree nodes are?
24Motivating Questions
Motivating Observation
- Suppose each page has a small probability p of
being interesting.
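- Suppose a user does an (undirected) random walk until they find an interesting page.
- If p is small, then this is approximately the same as preferential attachment: the walk runs long enough to approach its stationary distribution, which is proportional to degree.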
- What about other processes and directed graphs?
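The observation can be checked on a toy graph. The sketch below (all names hypothetical) provides a sampler for the walk described above. It also computes exactly where the walk stops, by propagating probability mass instead of sampling.

The test graph is small and non-bipartite, with p = 0.01. Its stopping distribution is already close to degree divided by twice the number of edges, which is the probability a preferential-attachment step would assign.

```python
import random

def undirected_coin_flip_target(adj, p, rng):
    """Sample the page a new node links to: start at a uniformly random page,
    then repeatedly flip a coin of bias p; heads -> stop here ('interesting'),
    tails -> move to a uniformly random neighbour."""
    u = rng.choice(list(adj))
    while rng.random() >= p:          # tails
        u = rng.choice(adj[u])
    return u

def landing_distribution(adj, p, steps=2000):
    """Exact (up to truncation after `steps` coin flips) distribution of the
    page returned by undirected_coin_flip_target."""
    nodes = list(adj)
    walking = {u: 1.0 / len(nodes) for u in nodes}   # uniform start
    land = {u: 0.0 for u in nodes}
    for _ in range(steps):
        moved = {u: 0.0 for u in nodes}
        for u in nodes:
            land[u] += p * walking[u]                       # heads at u
            share = (1 - p) * walking[u] / len(adj[u])      # tails: spread to neighbours
            for w in adj[u]:
                moved[w] += share
        walking = moved
    return land

# A triangle 0-1-2 with an extra page 3 hanging off page 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
two_m = sum(len(nbrs) for nbrs in adj.values())   # 2 * number of edges = 8
land = landing_distribution(adj, p=0.01)
for u in adj:
    print(u, round(land[u], 3), len(adj[u]) / two_m)
```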
25Outline
- Background/Previous Work
- Motivation
- Models
- Theoretical results
- Experimental results
- Conclusions
26Directed 1-step Random Surfer, p = 0.5
27Directed 1-step Random Surfer
It turns out this model is a mixture of
connecting to nodes uniformly at random and
preferential attachment.
It has a power-law degree distribution.
But taking one step is not very natural.
What about doing a real random walk?
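The slide that defines this model is a figure that does not survive in this text. The sketch below therefore assumes the natural reading: a new node picks a uniformly random existing node u and links to u with probability p; otherwise it takes one step and links to u's out-neighbor.

With out-degree 1 and t existing nodes, node v is then chosen with probability $p/t + (1-p)\,\mathrm{indeg}(v)/t$. That is a mixture of uniform attachment and (in-degree) preferential attachment. The function name and the self-loop on the first node are assumptions.

```python
import random

def one_step_surfer_tree(n, p, seed=None):
    """Directed 1-step random surfer with out-degree 1 (sketch).

    Returns (out, indeg): out[v] is the node v links to, indeg[v] counts
    links into v. Node 0 starts with a self-loop (assumption).
    """
    rng = random.Random(seed)
    out = [0]      # node 0 links to itself
    indeg = [1]    # ...which counts as one in-link of node 0
    for t in range(1, n):                 # t nodes currently exist
        u = rng.randrange(t)              # uniformly random existing node
        # Probability p: link to u; otherwise take one step along u's out-link.
        target = u if rng.random() < p else out[u]
        out.append(target)
        indeg.append(0)
        indeg[target] += 1
    return out, indeg

out, indeg = one_step_surfer_tree(100_000, 0.5, seed=2)
print(max(indeg), sorted(indeg, reverse=True)[:5])
```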
28Directed Coin Flipping model
1. Pick a node uniformly at random.
2. Flip a coin of bias p.
3. If HEADS, connect to the current node; else walk to a neighbor and go back to step 2.
[Figure: a NEW NODE starts at a RANDOM STARTING NODE A and walks through nodes A, B and C (a node D is also shown). Coin tosses: 1. TAIL at node A; 2. TAIL at node B; 3. HEAD at node C, so the new node connects to C.]
29Directed Coin Flipping model
1. At time 1, we start with a single node with a self-loop.
2. At time t, we choose a node u uniformly at random.
3. We then flip a coin of bias p.
4. If the coin comes up heads, we connect to the current node.
5. Else, we walk to a random neighbor (following an out-link) and go to step 3.
Each page has equal probability p of being interesting to us.
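The steps above translate almost line for line into a simulation. This sketch uses out-degree 1, so "walk to a random neighbor" means following the current node's single out-link. The first node's self-loop keeps a walk that reaches it in place. Names are illustrative.

```python
import random
from collections import Counter

def directed_coin_flipping_tree(n, p, seed=None):
    """Directed coin-flipping model with out-degree 1 (sketch).

    Returns out, where out[v] is the node v links to.
    """
    rng = random.Random(seed)
    out = [0]                      # time 1: a single node with a self-loop
    for t in range(1, n):          # t nodes currently exist
        u = rng.randrange(t)       # choose a node uniformly at random
        while rng.random() >= p:   # tails: walk along the out-link
            u = out[u]
        out.append(u)              # heads: connect to the current node
    return out

out = directed_coin_flipping_tree(100_000, 0.5, seed=3)
indeg = Counter(out)               # in-degrees (node 0's self-loop included)
print(indeg.most_common(5))
```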
30Outline
- Background/Previous Work
- Motivation
- Models
- Theoretical results
- Experimental results
- Conclusions
31Is Directed Coin-Flipping Power-lawed?
We don't know, but we do have some partial results...
32Virtual Degree
Definitions
Let $l_i(u)$ be the number of level-$i$ descendants of node u: $l_1(u)$ = number of children, $l_2(u)$ = number of grandchildren, etc.
Let $\beta = (\beta_1, \beta_2, \dots)$ be a sequence of real numbers with $\beta_1 = 1$.
Then $v_\beta(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \cdots$. We'll call $v_\beta(u)$ the virtual degree of u with respect to $\beta$.
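The sketch below applies the definition to a tree stored as an array `parent`, where `parent[v]` is the node v links to. This representation is hypothetical, and the first node's self-loop is skipped when listing children. `level_counts` returns $l_1(u), l_2(u), \dots$, and `virtual_degree` takes $\beta$ as a function of i.

```python
from collections import defaultdict

def level_counts(parent, u):
    """Return [l_1(u), l_2(u), ...]: the number of descendants of u at each level.

    parent[v] is the node that v links to; a self-loop (parent[v] == v)
    is not treated as a child edge.
    """
    children = defaultdict(list)
    for v, par in enumerate(parent):
        if par != v:
            children[par].append(v)
    counts = []
    frontier = children[u]
    while frontier:
        counts.append(len(frontier))
        frontier = [c for v in frontier for c in children[v]]
    return counts

def virtual_degree(parent, u, beta):
    """v_beta(u) = 1 + beta(1) * l_1(u) + beta(2) * l_2(u) + ..."""
    levels = level_counts(parent, u)
    return 1 + sum(beta(i) * l for i, l in enumerate(levels, start=1))

# Node 0 (self-loop) has children 1 and 2; node 1 has children 3 and 4; node 3 has child 5.
parent = [0, 0, 0, 1, 1, 3]
print(level_counts(parent, 0))                        # [2, 2, 1]
print(virtual_degree(parent, 0, lambda i: 0.5 ** i))  # 2.625
```

With $\beta_1 = 1$ and $\beta_i = 0$ for $i \ge 2$, the virtual degree reduces to $1 + l_1(u)$. That is the actual degree of any non-root node when each node has exactly one out-link.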
33Virtual Degree
34Virtual Degree
Easy observation: if we set $\beta_i = (1-p)^i$, then the expected increase in deg(u) is proportional to $v(u)$:
$\mathbb{E}[\text{increase in } \deg(u)] = \frac{p}{t} + \frac{(1-p)\,p\,l_1(u)}{t} + \frac{(1-p)^2\,p\,l_2(u)}{t} + \cdots = \frac{p}{t}\,v(u)$
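The observation can be verified exactly on a small tree. The sketch below uses hypothetical names, out-degree 1, a self-loop on node 0, and t = the number of existing nodes. It propagates the walk's probability mass with exact fractions. It then compares the probability that a new node links to each non-root node x with $(p/t)\,v(x)$ for $\beta_i = (1-p)^i$. The root is excluded because walks that reach it stay there until they stop, so it collects extra mass.

```python
from collections import defaultdict
from fractions import Fraction

def level_counts(parent, u):
    """[l_1(u), l_2(u), ...] for a tree where parent[v] is v's out-neighbour."""
    children = defaultdict(list)
    for v, par in enumerate(parent):
        if par != v:                      # skip the root's self-loop
            children[par].append(v)
    counts, frontier = [], children[u]
    while frontier:
        counts.append(len(frontier))
        frontier = [c for v in frontier for c in children[v]]
    return counts

def attach_probabilities(parent, p):
    """Exact probability that a new node links to each existing node."""
    t = len(parent)
    walking = [Fraction(1, t)] * t        # uniformly random starting node
    attach = [Fraction(0)] * t
    for _ in range(t):                    # t rounds exceed the depth of the tree
        moved = [Fraction(0)] * t
        for x in range(t):
            attach[x] += p * walking[x]                 # heads: link to x
            moved[parent[x]] += (1 - p) * walking[x]    # tails: follow the out-link
        walking = moved
    attach[0] += sum(walking)  # leftover mass sits at the root and eventually stops there
    return attach

def degree_weighted_virtual_degree(parent, u, p):
    """v(u) with beta_i = (1 - p)^i."""
    levels = level_counts(parent, u)
    return 1 + sum((1 - p) ** i * l for i, l in enumerate(levels, start=1))

parent = [0, 0, 0, 1, 1, 3]
p = Fraction(1, 3)
t = len(parent)
probs = attach_probabilities(parent, p)
for x in range(1, t):
    print(x, probs[x], p / t * degree_weighted_virtual_degree(parent, x, p))
print(sum(probs))  # 1
```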
35Virtual Degree
- Theorem: There always exist $\beta_i$ such that:
  - For $i \ge 1$, $|\beta_i| \le 1$.
  - As $i \to \infty$, $\beta_i \to 0$ exponentially.
  - The expected increase in $v(u)$ is proportional to $v(u)$.
Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$.
E.g., for $p = 3/4$: $\beta_i$ = 1, 3/4, 1/2, 5/16, 3/16, 7/64, ...
for $p = 1/2$: $\beta_i$ = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, ...
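The recurrence is easy to iterate with exact rational arithmetic. This sketch (function name ours) reproduces both example sequences above.

```python
from fractions import Fraction

def beta_sequence(p, count):
    """beta_1 = 1, beta_2 = p, beta_{i+1} = beta_i - (1 - p) * beta_{i-1}."""
    p = Fraction(p)
    betas = [Fraction(1), p]
    while len(betas) < count:
        betas.append(betas[-1] - (1 - p) * betas[-2])
    return betas[:count]

print([str(b) for b in beta_sequence(Fraction(3, 4), 6)])
# ['1', '3/4', '1/2', '5/16', '3/16', '7/64']
print([str(b) for b in beta_sequence(Fraction(1, 2), 8)])
# ['1', '1/2', '0', '-1/4', '-1/4', '-1/8', '0', '1/16']
```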
36Virtual Degree, continued
Let $v_t(u)$ be the virtual degree of node u at time t, and let $t_u$ be the time when node u first appears.
Theorem: For any node u and time $t \ge t_u$, $\mathbb{E}[v_t(u)] = \Theta((t/t_u)^p)$.
So, the expected virtual degrees follow a power law.
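Here is a heuristic derivation of the exponent. It makes two assumptions. First, as in the degree observation, the constant of proportionality is p/s when s nodes are present, i.e. $\mathbb{E}[v_{s+1}(u) \mid G_s] = (1 + p/s)\,v_s(u)$. Second, $v_{t_u}(u) = 1$, because u has no descendants when it arrives. Taking expectations and iterating gives:

```latex
\mathbb{E}[v_t(u)] = \prod_{s=t_u}^{t-1}\Bigl(1+\frac{p}{s}\Bigr),
\qquad
\ln \mathbb{E}[v_t(u)] = \sum_{s=t_u}^{t-1}\Bigl(\frac{p}{s}+O\bigl(s^{-2}\bigr)\Bigr)
= p\ln\frac{t}{t_u}+O(1)
```

Hence $\mathbb{E}[v_t(u)] = \Theta((t/t_u)^p)$, matching the theorem.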
37Actual Degree
We can also obtain lower bounds on the expected values of the actual degrees:
Theorem: For any node u and time $t \ge t_u$, $\mathbb{E}[\deg(u)] = \Omega((t/t_u)^{p(1-p)})$.
38Outline
- Background/Previous Work
- Motivation
- Models
- Theoretical results
- Experimental results
- Conclusions
39Experiments
- Random graphs of n = 100,000 nodes.
- Compute statistics averaged over 100 runs.
- k = 1 (every node has out-degree 1).
40Online Erdős-Rényi
41Directed 1-Step Random Surfer, p = 3/4
42Directed 1-Step Random Surfer, p = 1/2
43Directed 1-Step Random Surfer, p = 1/4
44Directed Coin Flipping, p = 1/2
45Directed Coin Flipping, p = 1/4
46Undirected Coin Flipping, p = 1/2
47Undirected Coin Flipping, p = 0.05
48Outline
- Background/Previous Work
- Motivation
- Models
- Theoretical results
- Experimental results
- Conclusions
49Conclusions
- Directed random-walk models appear to generate power laws (with partial theoretical results).
- Power laws can naturally emerge, even if all nodes have the same intrinsic attractiveness.
50Open questions
- Can we prove that the degrees in the directed coin-flipping model do indeed follow a power law?
- Can we analyze the degree distribution for the undirected coin-flipping model with p = 1/2?
- Suppose page i has interestingness $p_i$. Can we analyze the degree as a function of t, i and $p_i$?
51Questions?