Title: Power-Laws in Distributed Systems
1Power-Laws in Distributed Systems
- Mikko Vapa
- mikko.vapa_at_jyu.fi
- TIES427 Distributed Systems
2Contents
- Network models
- Stanley Milgram's studies on social networks
- Normal distribution and random graphs
- Power-law distribution and scale-free graphs
- Systems with power-law properties
- Fault-tolerance characteristics of random and
scale-free distributed systems
3Network Models: Why?
- Using network models it is possible to understand and foresee the structure and behavior of a specific network
- How do the distances between nodes grow when the network grows? Is it possible to keep the distances low in a large network by adding links at strategic points? Where?
- How resilient is the network to failures of nodes and links? How many nodes can be removed from a connected network before it breaks down into isolated clusters?
- What would be the best way to route messages from one node to another if the structure of the network is known only locally?
- Is it possible to identify, using only neighbor information, those nodes of a network (for example the WWW) that have important content or belong to a related group of nodes?
- Orponen, P., Internet ja muut informaatioverkostot, Tieteen päivät 2005
4Research Experiment on Social Networks
- In 1967 Stanley Milgram conducted a social experiment to find out the distance between any two people in the United States
- 160 people around the country were selected as starting points and 2 people as destinations for letters (each destination identified by the name, photo and address of the person)
- The letters could only be passed from hand to hand between acquaintances
5Six Degrees of Separation
- 42 of the 160 letters arrived at the destination, and the median number of intermediate persons was 5.5
- The result came to be known as Six Degrees of Separation
- Considering the number of people in the USA (about 200 million in those days) the distance was considered very short
- Thus the saying: It's a small world!
- Milgram S., The Small World Problem, Psychology Today 1(1), 60-67 (1967)
6Six Degrees of Separation
- The results of the study raised a fundamental question
- Why is the distance so short?
- The question was left unanswered for years
- In the meantime some progress was being made in graph theory
7Normal Distribution
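- Traditional graph theory has been based on graphs whose number of links per node is normally distributed (the bell curve) (Erdős & Rényi, 1959; Solomonoff & Rapoport, 1951)
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
where µ = mean and σ = standard deviation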
8Random Graphs
- Bell curve graphs can be generated by randomly adding links between nodes
- If the number of links per node follows a bell curve distribution, then
- each node has nearly the same number of links (mean ± standard deviation)
- the number of nodes in the network can be approximated by the equation $n = k^d$, where n = number of nodes, k = average number of links per node and d = network distance
- thus the network distance between any two nodes is approximated by the equation $d = \log n / \log k$
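The approximation for the network distance in a random graph can be tried out with a few lines of Python. This is only a minimal sketch: the function name `estimated_distance` and the example values are illustrative assumptions, not part of the original slides.

```python
import math

def estimated_distance(n: int, k: float) -> float:
    """Approximate distance d in a random graph from n = k**d, i.e. d = log n / log k."""
    if n < 1 or k <= 1:
        raise ValueError("need n >= 1 and k > 1")
    return math.log(n) / math.log(k)

# Illustrative values only: one million nodes, ten links per node on average.
print(round(estimated_distance(1_000_000, 10), 2))  # 6.0
```

Because the distance grows with the logarithm of n, multiplying the number of nodes by k adds only one step to the estimate.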
9Random Graphs
- Many networks were modelled as random graphs
- social networks
- distributed systems (Internet, WWW)
- virus infection pathways etc.
- But the bell curve distribution was found to be inadequate for describing real-world networks, where links are not evenly distributed among the nodes
10Indications of Power-Law Structure in WWW
- Albert, Jeong and Barabási found in 1999 that World Wide Web links do not follow a normal distribution
- There are hubs that gather many links, and there are many web pages that are linked to only a few pages
- The power-law structure of the WWW was found
11Power-Law Distribution
- The number of web pages with exactly k incoming links, denoted by N(k), follows $N(k) \sim k^{-\gamma}$, where the parameter γ is the degree exponent
- For WWW incoming links γ was found to be 2.1
- For outgoing links γ = 2.5
- To illustrate the difference between bell curve and power-law distributions, let's compare them using highway and airport maps
12Bell Curve and Power-Law Distributions
13Bell Curve and Power-Law Distributions
- A power-law distribution has many nodes with only a few links and a few nodes with many links
- This characteristic is also called the 80/20 rule (for example, 20% of customers bring in 80% of net sales)
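The contrast between the two distributions can be made concrete numerically. The sketch below is an illustration under stated assumptions: it normalizes a power law with γ = 2.1 over degrees 1 to 1000 (the cut-off is arbitrary) and uses a Poisson distribution with mean 4 as a stand-in for the bell curve. The function names are hypothetical.

```python
import math

def power_law_pmf(gamma: float, k_max: int) -> dict[int, float]:
    """Normalized P(k) proportional to k**(-gamma) for k = 1..k_max."""
    weights = {k: k ** -gamma for k in range(1, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def poisson_pmf(mean: float, k: int) -> float:
    """Poisson probability, used here as a stand-in for the bell curve."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

pmf = power_law_pmf(gamma=2.1, k_max=1000)
few_links = sum(p for k, p in pmf.items() if k <= 3)
print(f"power law: share of nodes with at most 3 links = {few_links:.2f}")
print(f"power law: P(100) = {pmf[100]:.2e}")
print(f"bell curve (Poisson, mean 4): P(100) = {poisson_pmf(4, 100):.2e}")
```

Under these assumptions most nodes have very few links, yet a node with 100 links remains far more likely under the power law than under the bell-shaped distribution, where such hubs are practically impossible.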
14Scale-Free Model
- To understand how these kinds of networks form and how they behave, a scale-free network model was developed
- Note that already in 1955 Simon described the Matthew effect as a rule that scientific credit does not go to the person who proposes the new results but to the person who has the most influence in the network, and in 1965 De Solla Price interpreted this as a cumulative advantage principle
- Now the scale-free model provided a tool for analysing such behavior
15Scale-Free Model
- The scale-free link distribution follows a power law
- the proportion of nodes having a given number of links n is $P(n) = 1/n^k$
- it has no term related to the size of the network (no characteristic scale as in a random graph)
- hence the name scale-free
- most nodes have only a few connections
- some have a lot of links
- these hubs are important for binding disparate regions together
- they guarantee short paths between nodes in the network
- they guarantee multiple paths between any two nodes
16Scale-Free Model
- The model uses growth and preferential attachment for generating the network
- Each new node is connected to two existing nodes
- The two nodes are selected based on their number of links, with a probability proportional to that number, $\Pi(k_i) = k_i / \sum_j k_j$
- The nodes that are early in the network acquire most of the neighbours (rich get richer principle)
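Growth and preferential attachment can be sketched in Python as follows. This is a minimal illustration, not the original authors' code: the function name `grow_scale_free`, the node numbering and the seed are assumptions, and drawing again until two distinct targets are found is a simplification of the selection rule.

```python
import random

def grow_scale_free(n_nodes: int, links_per_node: int = 2, seed: int = 0):
    """Grow a network by preferential attachment; returns a list of (new, old) links."""
    rng = random.Random(seed)
    edges = [(0, 1)]            # start from two connected nodes
    endpoints = [0, 1]          # every link contributes both of its endpoints
    for new in range(2, n_nodes):
        targets = set()
        # Picking uniformly from `endpoints` selects a node with probability
        # (its number of links) / (total number of link endpoints).
        while len(targets) < min(links_per_node, new):
            targets.add(rng.choice(endpoints))
        for old in sorted(targets):
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges

edges = grow_scale_free(1000)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
richest = sorted(degree, key=degree.get, reverse=True)[:5]
print("five best-connected nodes:", richest)
```

Keeping a flat list of link endpoints makes the preferential choice a single uniform draw, which is why no explicit probability table is needed.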
17Scale-Free Model
- First there are two nodes (A and B) and the third one (C) will connect to them
- The fourth one (D) connects to each existing node with a probability of (endpoints)/(all endpoints) = 2/6
- The fifth one (E) connects with a probability of 3/10 to A and to B and with 2/10 to C and to D, and so on
[Figure: the network after each step, as nodes C, D and E join the initial nodes A and B]
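The probabilities in this example can be reproduced with exact fractions. The sketch below assumes that A, B and C are fully connected when D arrives and that D then attaches to A and B, which is what the 3/10 and 2/10 figures imply. The function name is hypothetical.

```python
from fractions import Fraction

def attachment_probabilities(edges):
    """Probability of each node being chosen: its endpoints / all endpoints."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    total = sum(degree.values())
    return {node: Fraction(d, total) for node, d in degree.items()}

# A, B and C are fully connected when D arrives.
before_d = attachment_probabilities([("A", "B"), ("C", "A"), ("C", "B")])
# Assumption: D attached to A and B.
before_e = attachment_probabilities(
    [("A", "B"), ("C", "A"), ("C", "B"), ("D", "A"), ("D", "B")]
)
print(before_d["A"], before_e["A"], before_e["C"])  # 1/3 3/10 1/5
```

Here 1/3 is the reduced form of 2/6 and 1/5 the reduced form of 2/10.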
18Scale-Free Network
- Scale-free network of 50 nodes (1, 2, 3 and 4 are
the rich ones)
19Systems with Power-Law Properties
- Surprisingly many systems follow a power law, for example
- Internet (intra-domain routing and inter-domain routing topologies)
- World Wide Web
- Peer-to-Peer networks (Gnutella, Freenet)
- E-mail users
- Telephone call graphs
- Molecules and chemical reactions in living
organisms (H2O, ATP, ADP and CO2 molecules as
hubs)
20Internet
- The power-law characteristics of the Internet were found in 1999
- The power-law topology applies at both the router level and the autonomous system (domain) level
- Faloutsos M., Faloutsos P. and Faloutsos C., On power-law relationships of the Internet topology, Computer Communication Review 29(4):251-262 (1999)
21Internet
- Router level and inter-domain level (autonomous systems of the Border Gateway routing protocol)
- By knowing the structure of the Internet, the average number of links could be estimated (router level $\langle k \rangle \approx 3.5$, inter-domain level $\langle k \rangle \approx 2.6$), as well as the diameter (router level $d \approx 9$, autonomous systems $d \approx 4$)
22Internet
- The degree distribution of the Internet's autonomous systems follows a power law with exponent 2.16 (and the router level with exponent 2.48)
- The number of nodes having degree k ~ $1/k^{2.16}$
[Figure: log-log plot of log(number of nodes) against log(degree), a straight line with slope -2.16]
Orponen, 2005
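On a log-log plot a power law appears as a straight line whose slope is the negative of the degree exponent. The sketch below recovers that slope with an ordinary least-squares fit. The data are synthetic and noise-free, generated from the exponent 2.16 quoted above, and the function name `loglog_slope` is an assumption. On real measurements such a simple fit is only a rough estimate.

```python
import math

def loglog_slope(counts: dict[int, float]) -> float:
    """Least-squares slope of log(count) against log(degree)."""
    xs = [math.log(k) for k, c in counts.items() if c > 0]
    ys = [math.log(c) for k, c in counts.items() if c > 0]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic, noise-free data generated from the quoted exponent (not measured data).
counts = {k: 10_000 / k ** 2.16 for k in range(1, 101)}
print(round(loglog_slope(counts), 2))  # -2.16
```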
24Web
- Also, because WWW hyperlinks form a directed power-law network, a diameter of 19 hops between any two documents could be calculated and the average number of outgoing links estimated as $\langle k \rangle \approx 7$
- Albert R., Jeong H., Barabási A.-L., Diameter of the World Wide Web, Nature 401:130-131 (1999)
25Web
- The number of web pages with k outgoing links ~ $1/k^{2.4}$
- The number of web pages with k web pages pointing at them ~ $1/k^{2.1}$
Orponen, 2005
26Web
- Note that because of the directed nature of web links the system also has IN, OUT and Central Core / Strongly Connected Component (SCC) regions
- The IN continent is hard for search engines to index
- Broder et al., 1999
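The IN, OUT and SCC regions can be computed from link data with two graph traversals. The following sketch is an illustration only: the function names and the four-page toy graph are hypothetical. It also suggests why IN is hard to index: no chain of links leads from the core to those pages, so a crawler that starts in the core never reaches them.

```python
from collections import deque

def reachable(start, neighbors):
    """All nodes reachable from `start` following `neighbors` (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bow_tie(links, core_node):
    """Split a directed graph into SCC, IN and OUT relative to `core_node`."""
    forward, backward = {}, {}
    for src, dst in links:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)
    downstream = reachable(core_node, forward)    # pages the core links towards
    upstream = reachable(core_node, backward)     # pages that link towards the core
    scc = downstream & upstream
    return scc, upstream - scc, downstream - scc

# Toy link structure (hypothetical pages): a -> b <-> c -> d
links = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")]
scc, in_part, out_part = bow_tie(links, "b")
print(sorted(scc), sorted(in_part), sorted(out_part))  # ['b', 'c'] ['a'] ['d']
```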
27Web
28Web
- The network growth models (for example the scale-free model) explain well the average degree, degree distribution and diameter
- However, they do not yet explain clustering: neighbor nodes are usually also neighbors of each other
- No simple model yet exists for explaining this behavior
Orponen, 2005
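Clustering can be quantified per node as the fraction of its neighbor pairs that are themselves linked. The sketch below uses this common definition of the local clustering coefficient. The function name and the toy graph are assumptions made for illustration.

```python
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Fraction of pairs of `node`'s neighbors that are themselves linked."""
    neighbors = adjacency[node]
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for a, b in pairs if b in adjacency[a])
    return linked / len(pairs)

# Hypothetical toy graph: a triangle x-y-z plus a pendant node w attached to x.
adjacency = {
    "x": {"y", "z", "w"},
    "y": {"x", "z"},
    "z": {"x", "y"},
    "w": {"x"},
}
print(clustering_coefficient(adjacency, "y"))            # 1.0
print(round(clustering_coefficient(adjacency, "x"), 2))  # 0.33
```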
29Systems and Their Degree Exponents
- Different kinds of systems can be compared using their degree exponents
30Other Characteristics
- In addition to being searchable (Milgram's experiment) and having a low network diameter, the power-law structure also has fault-tolerance benefits
- For example, resilient systems should remain stable even when a large number of connections are broken
- A useful property in distributed systems
31Fault-Tolerance
- Two types of failure scenarios
- random failures, where the destroyed node is selected randomly
- targeted attack, where the most highly connected node is selected for destruction
- Scale-free networks have good resilience to random failures, but are fragile under attacks (the system has an Achilles' heel)
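The two failure scenarios can be compared in a small simulation. This is a sketch under stated assumptions, not the experiment from the cited studies: the network is grown by preferential attachment with two links per new node, 20% of the nodes are removed in each scenario, the attack removes the nodes with the highest initial degree without recomputing degrees, and all names, sizes and the seed are illustrative.

```python
import random
from collections import deque

def grow_scale_free(n_nodes, rng):
    """Preferential attachment with two links per new node; returns adjacency sets."""
    adjacency = {0: {1}, 1: {0}}
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        targets = set()
        while len(targets) < 2:
            targets.add(rng.choice(endpoints))
        adjacency[new] = set(targets)
        for old in sorted(targets):
            adjacency[old].add(new)
            endpoints.extend((new, old))
    return adjacency

def largest_cluster(adjacency, removed):
    """Size of the largest connected cluster once `removed` nodes are gone."""
    seen, best = set(removed), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best

rng = random.Random(1)
network = grow_scale_free(2000, rng)
n_removed = 400  # 20% of the nodes
random_failures = rng.sample(sorted(network), n_removed)
targeted_attack = sorted(network, key=lambda v: len(network[v]), reverse=True)[:n_removed]

after_failures = largest_cluster(network, random_failures)
after_attack = largest_cluster(network, targeted_attack)
print("largest cluster after random failures:", after_failures)
print("largest cluster after targeted attack:", after_attack)
```

With these settings the largest cluster should come out clearly smaller after the targeted attack than after the same number of random failures, because the attack removes exactly the hubs that hold the network together.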
32Fault-Tolerance
- In scale-free networks, the few hubs play an important role in communication
- Albert et al. 2000: the average distance between nodes in case of random failures and targeted attack
[Figure: average distance between nodes (y) against the number of removed nodes (x), with one curve for targeted attacks and one for random failures]
Orponen, 2005
33Random Failures and Attacks
- Random and scale-free networks under different failure scenarios
[Figure: y = network diameter, x = fraction of nodes destroyed]
Source: R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks, 2000
34Random Failures and Attacks
- A random network has a critical point ($f_c$) under both failure scenarios
- A scale-free network has one only under attacks
[Figure: y1 = average size of the isolated clusters $\langle s \rangle$, y2 = size of the largest cluster relative to all nodes S, x = fraction of nodes destroyed]
Source: R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks, 2000
35Fault-Tolerance: Internet and WWW
- Internet and WWW failures follow the same pattern as scale-free networks
[Figure: y = network diameter, x = fraction of nodes destroyed]
Source: R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks, 2000
36Summary
- Many phenomena have been identified as following a power-law curve
- Power laws are common in distributed systems (the result of natural processes: growth and preferential attachment)
- Work is going on to develop algorithms that utilize these properties
- Reference: Barabási, A.-L., Linked: The New Science of Networks, Perseus Publishing, 2002