Title: The Complex Network of Wikipedia
1 The Complex Network of Wikipedia
F. Colaiori, V. Servedio, G. Caldarelli, A.C.: Physics Dept., La Sapienza, Rome (Italy)
D. Donato, S. Leonardi: Computer Science Dept., La Sapienza, Rome (Italy)
L. Salete Buriol: Computer Science Dept., University of Porto Alegre, Rio Grande do Sul (Brazil)
2 The Complex Network of Wikipedia
- Network description
- Statistical analysis of Wikipedia
- Model and interpretation
5 The Complex Network of Wikipedia
7 How does Wikipedia work?
- Thanks to the Wiki technology, any user can:
  - add new entries to the encyclopedia
  - modify the content of existing entries
  - modify their connections
- NB: on the World Wide Web, each user is responsible only for the out-degree of his or her own web page (the links leaving it).
8 Nodes and edges in Wikipedia
- Nodes are encyclopedia entries.
- Edges are citations (links) between entries, directed from the citing entry to the cited one.
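As a small illustration of this representation, the sketch below turns a list of (citing, cited) pairs into in- and out-degree counts. The function name `degree_counts`, the toy entries, and the choice to count a repeated citation only once are all hypothetical assumptions, not taken from the authors' data processing.

```python
from collections import Counter


def degree_counts(citations):
    """Return (in_degree, out_degree) Counters for directed citation pairs.

    Each pair (a, b) means "entry a cites entry b", i.e. an edge a -> b.
    Assumption: a citation repeated between the same two entries counts once.
    """
    edges = set(citations)
    k_out = Counter(a for a, _ in edges)  # links leaving each entry
    k_in = Counter(b for _, b in edges)   # links pointing to each entry
    return k_in, k_out


# Hypothetical toy example, not real Wikipedia data
links = [("Rome", "Italy"), ("Sapienza", "Rome"), ("Sapienza", "Italy"), ("Rome", "Italy")]
k_in, k_out = degree_counts(links)
print(k_in["Italy"], k_out["Sapienza"])  # 2 2
```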
9 Statistical Properties
The number of entries grows exponentially in time.
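Exponential growth N(t) = N0 exp(rate * t) becomes a straight line after taking logarithms, so the growth rate can be estimated by ordinary least squares on log N(t). The minimal sketch below assumes a list of (time, number of entries) observations; the function name and the noise-free synthetic counts are hypothetical and are not Wikipedia measurements.

```python
import math


def fit_exponential_growth(times, counts):
    """Least-squares fit of log N(t) = log N0 + rate * t; returns (n0, rate)."""
    ys = [math.log(c) for c in counts]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    # slope of the regression line of log N(t) against t
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    var = sum((t - t_mean) ** 2 for t in times)
    rate = cov / var
    return math.exp(y_mean - rate * t_mean), rate


# Synthetic, noise-free monthly counts (hypothetical, NOT Wikipedia data)
months = list(range(24))
entries = [1000 * math.exp(0.05 * m) for m in months]
n0, rate = fit_exponential_growth(months, entries)
print(round(n0), round(rate, 3))  # 1000 0.05
```

With real counts the points scatter around the line, and the doubling time follows as ln 2 / rate.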
10 Statistical Properties
Preliminary results by Voss and by Zlatic et al. show that Wikipedia is indeed a complex network, with power-law degree distributions.
- J. Voss, Proceedings of the 10th International Conference of the International Society for Scientometrics and Informetrics (Stockholm, Sweden), 2005.
- V. Zlatic, M. Bozicevic, H. Stefancic, and M. Domazet, Phys. Rev. E 74, 016115 (2006).
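To make "power-law degree distribution" concrete, the sketch below estimates the exponent gamma of p(k) ~ k^-gamma with the standard continuous maximum-likelihood estimator, gamma = 1 + n / sum ln(k_i / k_min). This estimator is swapped in for illustration and is not necessarily the method used in the works cited above; the function names are hypothetical, and the check uses synthetic samples with an arbitrary exponent of 2.1.

```python
import math
import random


def powerlaw_exponent_mle(samples, x_min):
    """Continuous MLE of gamma for p(x) ~ x^-gamma, using only samples >= x_min.

    For integer degrees this is an approximation that works best when x_min
    is not too small.
    """
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_powerlaw(n, gamma, x_min, rng):
    """Draw n continuous power-law samples by inverse-transform sampling."""
    # P(X > x) = (x / x_min)^-(gamma - 1); 1 - rng.random() lies in (0, 1]
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


rng = random.Random(42)
data = sample_powerlaw(100_000, gamma=2.1, x_min=1.0, rng=rng)
estimate = powerlaw_exponent_mle(data, x_min=1.0)
print(round(estimate, 2))  # close to 2.1
```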
11 Degree distribution
12 Preferential attachment
To detect preferential attachment, we adopted the method introduced by Newman (2001):
- at each time step t, one builds the histogram Π(k) of the degrees of the vertices acquiring new edges,
- dividing each contribution by the fraction n(k,t)/N(t) of nodes with degree k (i.e. weighting it by N(t)/n(k,t)), so that the number of available degree-k nodes does not bias the histogram. Here N(t) is the number of nodes at time t and n(k,t) is the number of nodes with degree k at time t.
If Π(k) grows approximately linearly with k, this suggests that there is preferential attachment.
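The procedure above can be sketched on a synthetic network where the answer is known. The toy growth model here is an assumption, not the Wikipedia data: each new node cites one existing node chosen with probability proportional to k_in + 1 (the shift lets uncited nodes be chosen), so the weighted histogram Π(k) should come out roughly proportional to k + 1. Function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def attachment_kernel(events):
    """Weighted histogram Pi(k), summed over all recorded time steps.

    events: (k, n_k, n_nodes) per new edge, where k is the degree of the node
    acquiring the edge, n_k = n(k,t) and n_nodes = N(t) at that moment.
    Each event is weighted by N(t)/n(k,t).
    """
    hist = defaultdict(float)
    for k, n_k, n_nodes in events:
        hist[k] += n_nodes / n_k
    return dict(hist)


def simulate_linear_growth(steps, burn_in, seed=0):
    """Toy model (assumption): new node cites an old one with prob ~ k_in + 1."""
    rng = random.Random(seed)
    in_deg = [0]              # in-degree of each node
    count = Counter({0: 1})   # n(k, t): number of nodes with in-degree k
    targets = [0]             # node i appears k_in(i) + 1 times
    events = []
    for t in range(steps):
        target = rng.choice(targets)
        k = in_deg[target]
        if t >= burn_in:      # skip the noisy early growth
            events.append((k, count[k], len(in_deg)))
        # the chosen node gains one in-link
        count[k] -= 1
        count[k + 1] += 1
        in_deg[target] += 1
        targets.append(target)
        # the new node enters with in-degree 0
        in_deg.append(0)
        count[0] += 1
        targets.append(len(in_deg) - 1)
    return events


pi = attachment_kernel(simulate_linear_growth(steps=20000, burn_in=1000))
for k in range(5):
    print(k, round(pi[k] / pi[0], 2))  # roughly k + 1 for this linear kernel
```

Because each event is divided by the fraction of degree-k nodes, a linear trend in Π(k) reflects the attachment rule rather than how many nodes happen to have degree k.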
13 Preferential attachment
[Figure legend: circles, English Wikipedia; triangles, Portuguese Wikipedia; filled symbols, in-degree; open (white) symbols, out-degree]
14 Lack of correlations (in-in)
[Figure panels: English; Portuguese]
15 A model for Wikipedia
- At each time step, one adds a node and M edges. The direction of each edge is random:
  1. with probability R1, the edge leaves the new node and points to an existing node chosen with probability proportional to its in-degree.
16 A model for Wikipedia
  2. with probability R2, the edge points to the new node and leaves an existing node chosen with probability proportional to its out-degree.
17 A model for Wikipedia
  3. with probability R3 = 1 - R1 - R2, the edge points to an existing node chosen with probability proportional to its in-degree and leaves an existing node chosen with probability proportional to its out-degree.
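The three rules can be simulated directly. In this minimal sketch, choosing a node "proportionally to its in-degree" is done by picking a random entry of the list of all edge targets, where a node appears once per in-link (and likewise for out-degree with the list of sources). The seed graph (a directed 3-cycle), the use of only edges from earlier steps when choosing existing nodes, and allowing self-loops and repeated edges in case 3 are all assumptions; the function name is hypothetical.

```python
import random


def grow_wikipedia_model(steps, m, r1, r2, seed=0):
    """Hypothetical sketch of the directed growth model; returns (k_in, k_out)."""
    rng = random.Random(seed)
    # Seed graph (assumption): directed cycle 0 -> 1 -> 2 -> 0
    targets = [1, 2, 0]   # target of each edge: node appears once per in-link
    sources = [0, 1, 2]   # source of each edge: node appears once per out-link
    n_nodes = 3
    for _ in range(steps):
        new = n_nodes
        n_nodes += 1
        # only edges from earlier steps are used to choose existing nodes
        n_t, n_s = len(targets), len(sources)
        for _ in range(m):
            u = rng.random()
            if u < r1:                   # case 1: new node -> existing node
                src, dst = new, targets[rng.randrange(n_t)]
            elif u < r1 + r2:            # case 2: existing node -> new node
                src, dst = sources[rng.randrange(n_s)], new
            else:                        # case 3 (R3): existing -> existing
                src = sources[rng.randrange(n_s)]
                dst = targets[rng.randrange(n_t)]
            sources.append(src)
            targets.append(dst)
    k_in = [0] * n_nodes
    k_out = [0] * n_nodes
    for s, d in zip(sources, targets):
        k_out[s] += 1
        k_in[d] += 1
    return k_in, k_out


k_in, k_out = grow_wikipedia_model(steps=5000, m=10, r1=0.026, r2=0.091)
print(len(k_in), sum(k_in), max(k_in), max(k_out))
```

The parameter values in the usage line are the English-Wikipedia figures quoted on slide 18 (R1 = 0.026, R2 = 0.091, M = 10).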
18 Parameters in real data
- The parameters have a physical meaning and can be measured on real data. For the English Wikipedia, for instance, this yields R1 = 0.026 and R2 = 0.091 (hence R3 = 1 - R1 - R2 = 0.883).
- In the data we have, M = 10.
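One way such parameters could be measured is to classify every edge created during growth by whether it leaves the new node (case 1), points to it (case 2), or joins two existing nodes (case 3). The sketch assumes a hypothetical edge record (source, target, new_node), where new_node is the node added in the step when the edge appeared; the record format and function name are illustrative, not the authors' pipeline.

```python
from collections import Counter


def estimate_direction_probs(edges):
    """Estimate (R1, R2, R3) as the fractions of edges of each kind.

    edges: iterable of (source, target, new_node) records (hypothetical format).
    """
    kinds = Counter()
    for src, dst, new in edges:
        if src == new:
            kinds["R1"] += 1   # new node -> existing node
        elif dst == new:
            kinds["R2"] += 1   # existing node -> new node
        else:
            kinds["R3"] += 1   # existing node -> existing node
    total = sum(kinds.values())
    return tuple(kinds[r] / total for r in ("R1", "R2", "R3"))


# Hypothetical toy records: node 5 is added in one step, node 6 in the next
toy = [(5, 1, 5), (2, 5, 5), (1, 3, 5), (6, 2, 6)]
print(estimate_direction_probs(toy))  # (0.5, 0.25, 0.25)
```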
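19 Rate equations for in- and out-degree
- By approximating the discrete time variation with derivatives with respect to the continuous variable t, one can write and solve the following rate equations for the in- and out-degree:
  - $\frac{dk_{\rm in}}{dt} = (R_1 + R_3)\,\frac{k_{\rm in}}{t}$
  - $\frac{dk_{\rm out}}{dt} = (R_2 + R_3)\,\frac{k_{\rm out}}{t}$
- At each step, a fraction R1 + R3 of the M new edges points to a node chosen in proportion to its in-degree, and the total in-degree at time t is about Mt, so a node gains in-links at rate M(R1 + R3) k_in/(Mt); the out-degree equation follows in the same way with R2 + R3.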
20 Distribution of in- and out-degree
- By solving the rate equations, one obtains the time evolution of each node's degree and, with a little algebra, the in- and out-degree distributions.
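The slides' own expressions are not in the transcript, so the following is only a sketch of the standard continuum argument, assuming the rate equation dk_in/dt = (R1 + R3) k_in/t, that every node is born at time s with the same initial in-degree k_0 > 0, and that one node is born per time step (so birth times are uniform in [0, t]). It may differ in detail from the authors' derivation.

```latex
% rate equation for a node born at time s with initial in-degree k_0 > 0
\frac{dk_{\rm in}}{dt} = a\,\frac{k_{\rm in}}{t}, \qquad a = R_1 + R_3 = 1 - R_2
\quad\Longrightarrow\quad
k_{\rm in}(t) = k_0 \left(\frac{t}{s}\right)^{a}

% birth times are uniform in [0, t]
P\big(k_{\rm in}(t) > k\big) = P\!\left(s < t\left(\frac{k_0}{k}\right)^{1/a}\right)
  = \left(\frac{k_0}{k}\right)^{1/a}, \qquad k \ge k_0

% differentiating the cumulative distribution
P(k_{\rm in}) \propto k_{\rm in}^{-\gamma_{\rm in}}, \qquad
\gamma_{\rm in} = 1 + \frac{1}{R_1 + R_3} = 1 + \frac{1}{1 - R_2}

% same argument for the out-degree, with b = R_2 + R_3 = 1 - R_1
P(k_{\rm out}) \propto k_{\rm out}^{-\gamma_{\rm out}}, \qquad
\gamma_{\rm out} = 1 + \frac{1}{R_2 + R_3} = 1 + \frac{1}{1 - R_1}
```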
21 Distribution of in- and out-degree
- These distributions can be checked against real data by plugging the measured coefficients R1, R2, R3 into the theoretical expressions.
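As a worked example of "plugging in" the coefficients, the sketch below assumes the exponents that the standard continuum solution of the rate equations gives, gamma_in = 1 + 1/(R1 + R3) and gamma_out = 1 + 1/(R2 + R3) with R3 = 1 - R1 - R2. These formulas are an assumption here (the slides' expressions are not in the transcript); the input values are the English-Wikipedia figures quoted on slide 18, and the function name is hypothetical.

```python
def degree_exponents(r1, r2):
    """Power-law exponents under the continuum-solution assumption.

    gamma_in = 1 + 1/(R1 + R3), gamma_out = 1 + 1/(R2 + R3), R3 = 1 - R1 - R2.
    """
    r3 = 1.0 - r1 - r2
    return 1.0 + 1.0 / (r1 + r3), 1.0 + 1.0 / (r2 + r3)


g_in, g_out = degree_exponents(r1=0.026, r2=0.091)  # English values, slide 18
print(round(g_in, 2), round(g_out, 2))  # 2.1 2.03
```

Comparing such predicted exponents with slopes fitted on the measured distributions is one way to carry out the check described above.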
22 Distribution of in- and out-degree
23 Correlations
- The rate equations also allow one to compute the in-degree/in-degree correlations.
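On measured data, one common way to look for in-in correlations is to average the in-degree of the cited node over all edges whose citing node has in-degree k; a flat curve means no correlation. Which end of the edge is conditioned on is an assumption of this sketch, as are the function name and the toy graph.

```python
from collections import defaultdict


def in_in_correlation(edges):
    """Mean in-degree of cited nodes, as a function of the citing node's in-degree.

    edges: list of directed pairs (a, b), meaning a -> b.
    Returns {k: average k_in(b) over edges a -> b with k_in(a) == k}.
    """
    k_in = defaultdict(int)
    for _, b in edges:
        k_in[b] += 1
    sums, counts = defaultdict(float), defaultdict(int)
    for a, b in edges:
        sums[k_in[a]] += k_in[b]   # nodes never cited get k_in = 0
        counts[k_in[a]] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Hypothetical toy graph, not Wikipedia data
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
corr = in_in_correlation(toy_edges)
print({k: round(v, 2) for k, v in corr.items()})  # {1: 1.67, 2: 1.0}
```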
24 Lack of correlations
[Figure legend: model; model 0.5]
25 Naive interpretation
- Hypothesis:
  - in-degree measures popularity
  - out-degree measures quality
- If the probability of increasing the in-degree depends on the in-degree itself, then in Wikipedia popularity prevails over quality. Just as on the World Wide Web?
26 Community structure
- Wikipedia displays a strong community structure
27 Conclusions
- Wikipedia entries form a complex network with preferential attachment, power-law distributions for both in- and out-degree, and a lack of correlations.
- Preferential attachment explains the main statistical properties.
- A naive interpretation would imply that the Wiki technology alone is not enough to provide better dissemination of information than the World Wide Web.
- The community structure requires further understanding.
28 Thank You
Reference: A.C., V. D. P. Servedio, F. Colaiori, L. S. Buriol, D. Donato, S. Leonardi, and G. Caldarelli, "Preferential attachment in the growth of social networks: The internet encyclopedia Wikipedia", Phys. Rev. E 74, 036116 (2006).