PowerLaws in Distributed Systems - PowerPoint PPT Presentation

About This Presentation

Title:

PowerLaws in Distributed Systems

Slides: 37

Provided by: mikv

Transcript and Presenter's Notes

Title: PowerLaws in Distributed Systems

1
Power-Laws in Distributed Systems

Mikko Vapa
mikko.vapa_at_jyu.fi
TIES427 Distributed Systems

2
Contents

Network models
Stanley Milgram's studies on social networks
Normal distribution and random graphs
Power-law distribution and scale-free graphs
Systems with power-law properties
Fault-tolerance characteristics of random and
scale-free distributed systems

3
Network Models: Why?

Network models make it possible to understand and
predict the structure and behavior of a specific
network
How do the distances between nodes grow as the
network grows? Is it possible to keep distances
short in a large network by adding links at
strategic points? Where?
How resilient is the network to failures of nodes
and links? How many nodes can be removed from a
connected network before it breaks down into
isolated clusters?
What is the best way to route messages from one
node to another if the structure of the network
is known only locally?
Is it possible to identify, using only neighbor
information, the nodes of a network (for example
the WWW) that have important content or belong to
a related group of nodes?
Orponen, P., Internet ja muut informaatioverkostot,
Tieteen päivät 2005

4
Research Experiment on Social Networks

In 1967 Stanley Milgram conducted a social
experiment to find out the distance between any
two people in the United States
160 people around the country were selected as
starting points and 2 people as destinations for
letters (each destination identified by the
person's name, photo and address)
The letters could only be passed from hand to
hand between acquaintances

5
Six Degrees of Separation

42 of the 160 letters arrived at the destination
and the median number of intermediate persons was
5.5
The result came to be known as Six Degrees of
Separation
Considering the population of the USA (about
200 million at the time) the distance was
considered very short
Hence the saying "It's a small world!"
Milgram S., The Small World Problem, Psychology
Today 1(1), 60-67 (1967)

6
Six Degrees of Separation

The results of the study raised a fundamental
question:
Why is the distance so short?
The question was left unanswered for years
Meanwhile, progress was being made in graph
theory

7
Normal Distribution

Traditional graph theory has been based on
normally distributed (also called bell curve)
graphs (Erdős and Rényi, 1959; Solomonoff and
Rapoport, 1951)
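
For reference, and assuming the slide showed the standard form of the normal (bell curve) density, the formula can be written as:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```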

Where µ = mean and σ = standard deviation

8
Random Graphs

Bell curve graphs can be generated by randomly
connecting nodes with links
If the number of links per node follows a bell
curve distribution, then
each node has nearly the same number of links
(mean ± standard deviation)
the number of nodes in the network can be
approximated by the equation $n \approx k^d$, where
n = number of nodes, k = average number of links
per node and d = network distance
thus the network distance between any two nodes
is approximated by $d \approx \log n / \log k$
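
The distance estimate above can be tried out with a few lines of Python. This is only a sketch of the formula d ≈ log n / log k; the function name `estimated_distance` and the network size of one million nodes are assumptions made for illustration, not values from the slides.

```python
import math


def estimated_distance(n: float, k: float) -> float:
    """Estimate the network distance d from n ≈ k**d, i.e. d ≈ log n / log k."""
    if n < 1 or k <= 1:
        raise ValueError("need n >= 1 and k > 1")
    return math.log(n) / math.log(k)


# Hypothetical random graph with one million nodes and different
# average numbers of links per node.
for k in (10, 100, 1000):
    print(k, round(estimated_distance(1_000_000, k), 2))
```

With k = 10 the estimate is about 6 hops, and every tenfold increase in k shortens it further, which shows how slowly the distance grows with the size of a random graph.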

9
Random Graphs

Many networks were modelled as random graphs
social networks
distributed systems (Internet, WWW)
virus infection pathways etc.
But the random graph model was found inadequate
for describing real-world networks, where links
are not distributed evenly between nodes

10
Indications of Power-Law Structure in WWW

Albert, Jeong and Barabási found in 1999 that
World Wide Web links do not follow a normal
distribution
There are hubs that gather many links, and there
are many web pages that are linked to only a few
pages
This revealed the power-law structure of the WWW

11
Power-Law Distribution

The number of web pages with exactly k incoming
links, denoted by N(k), follows $N(k) \sim k^{-\gamma}$,
where the parameter $\gamma$ is the degree exponent
For WWW incoming links $\gamma$ was found to be 2.1
For outgoing links $\gamma \approx 2.5$
To illustrate the difference between the bell
curve and the power-law distribution, let's
compare them using highway and airport maps
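
What the degree exponent means in practice can be sketched numerically, assuming nothing more than the relation N(k) ∼ k^-γ. The function name `relative_frequency` is made up for this illustration.

```python
def relative_frequency(k: float, k_ref: float, gamma: float) -> float:
    """How common degree k is relative to degree k_ref when N(k) ~ k**-gamma."""
    if k <= 0 or k_ref <= 0:
        raise ValueError("degrees must be positive")
    return (k / k_ref) ** -gamma


# Incoming WWW links, gamma = 2.1: pages with 100 incoming links
# compared with pages that have 10 incoming links.
print(relative_frequency(100, 10, 2.1))
```

With γ = 2.1, a tenfold increase in k makes such pages roughly 126 times rarer (10^2.1). Far from the mean, a bell curve falls off much faster than that, so under a power law very highly linked pages remain rare but not negligible.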

12
Bell Curve and Power-Law Distributions
13
Bell Curve and Power-Law Distributions

The power-law distribution has many nodes with
only a few links and a few nodes with many links
This characteristic is also called the 80/20 rule
(for example, 20% of customers bring 80% of net
sales)

14
Scale-Free Model

To understand how these kinds of networks form
and how they behave, a scale-free network model
was developed
Note that already in 1955 Simon described the
Matthew effect as a rule that scientific credit
does not go to the person who proposes the new
results but to the person who has the most
influence in the network, and in 1965 De Solla
Price interpreted this as a cumulative advantage
principle
The scale-free model now provides a tool for
analysing such behavior

15
Scale-Free Model

The scale-free link distribution follows a power
law: the proportion of nodes having a given
number of links n is $P(n) \sim 1/n^k$
it has no term related to the size of the network
(no characteristic scale as in a random graph),
hence the name scale-free
most nodes have only a few connections
some have a lot of links
important for binding disparate regions together
guarantees short paths between nodes in the
network
guarantees multiple paths between any two nodes

16
Scale-Free Model

The model uses growth and preferential attachment
for generating the network
Each new node is connected to two existing nodes
The two nodes are selected with a probability
proportional to their number of links
The nodes that joined the network early acquire
most of the neighbours ("rich get richer"
principle)

17
Scale-Free Model

First there are two linked nodes (A and B), and
the third one (C) connects to both of them
The fourth one (D) connects to each existing node
with a probability of (node's endpoints)/(all
endpoints) = 2/6
The fifth one (E) connects with a probability of
3/10 to A and to B and with 2/10 to C and to D,
and so on
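
The probabilities in this example can be reproduced with a short Python sketch. It assumes that A and B start out linked and that D happened to attach to A and B (this is what makes the 3/10 and 2/10 figures come out); the function names `attachment_probabilities` and `grow` are invented here, and `grow` is a minimal illustration of growth with preferential attachment rather than the exact procedure used for the slides.

```python
import random
from collections import Counter
from fractions import Fraction


def attachment_probabilities(edges):
    """Probability of picking each node = its link endpoints / all endpoints."""
    degree = Counter(node for edge in edges for node in edge)
    total = sum(degree.values())
    return {node: Fraction(d, total) for node, d in degree.items()}


def grow(n_nodes, m=2, seed=0):
    """Start from a triangle and attach each new node to m distinct
    existing nodes, chosen in proportion to their current degree."""
    rng = random.Random(seed)
    edges = [(0, 1), (0, 2), (1, 2)]
    # Each node appears in this list once per link it has, so a uniform
    # choice from the list is a degree-proportional choice of node.
    endpoints = [node for edge in edges for node in edge]
    for new in range(3, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints += [new, target]
    return edges


# Slide example: A-B linked, C joins both, then D joins A and B (assumed).
abc = [("A", "B"), ("A", "C"), ("B", "C")]
print(attachment_probabilities(abc))    # every node: 2/6, shown reduced as 1/3
abcd = abc + [("D", "A"), ("D", "B")]
print(attachment_probabilities(abcd))   # A, B: 3/10; C, D: 2/10, shown as 1/5

# Degrees in a larger grown network: a few early nodes collect many links.
degrees = Counter(node for edge in grow(1000) for node in edge)
print(degrees.most_common(5))
```

Keeping a flat list of endpoints is a common trick for degree-proportional sampling: no probabilities need to be recomputed when a link is added.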

[Figure: successive stages of the network as nodes C, D and E are added to A and B]

18
Scale-Free Network

Scale-free network of 50 nodes (1, 2, 3 and 4 are
the rich ones)

19
Systems with Power-Law Properties

Surprisingly many systems follow a power law, for
example:
Internet (intra-domain routing and inter-domain
routing topologies)
World Wide Web
Peer-to-Peer networks (Gnutella, Freenet)
E-mail users
Telephone call graphs
Molecules and chemical reactions in living
organisms (H2O, ATP, ADP and CO2 molecules as
hubs)

20
Internet

The power-law characteristics of the Internet
were found in 1999
The power-law topology applies at both the router
level and the autonomous system (domain) level
Faloutsos M., Faloutsos P. and Faloutsos C., On
power-law relationships of the Internet
topology, Computer Communication Review
29(4):251-262 (1999)

21
Internet

Router level and inter-domain level (autonomous
systems of the Border Gateway routing protocol)
Knowing the structure of the Internet, the
average number of links could be estimated
(router level <k> = 3.5, inter-domain level
<k> = 2.6), as well as the diameter (router level
d = 9, autonomous systems d = 4)

22
Internet

The degree distribution of the Internet's
autonomous systems follows a power law with
exponent -2.16 (router level: exponent -2.48)
The number of nodes having degree k is
proportional to $1/k^{2.16}$
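
On a log-log plot a power law appears as a straight line whose slope is the exponent. The sketch below fits that slope by ordinary least squares. The data are synthetic points generated to follow 1/k^2.16 exactly, not measurements from the Internet, and the function name `loglog_slope` is an assumption of this example.

```python
import math


def loglog_slope(points):
    """Least-squares slope of log(count) against log(degree)."""
    xs = [math.log(degree) for degree, _ in points]
    ys = [math.log(count) for _, count in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic counts that follow 1/k^2.16 exactly (not measured data).
synthetic = [(k, 10000 * k ** -2.16) for k in range(1, 50)]
print(round(loglog_slope(synthetic), 2))   # -2.16
```

Real degree data are noisy in the tail, so a plain least-squares fit like this one is only a rough first estimate there.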

[Figure: log-log plot of number of nodes against degree, with slope -2.16]
Orponen, 2005
24
Web

Because WWW hyperlinks also follow a directed
power-law network structure, a diameter of 19
hops between any two documents could be
calculated and the average number of outgoing
links estimated as <k> = 7
Albert R., Jeong H., Barabási A.-L., Diameter
of the World Wide Web, Nature 401:130-131 (1999)

25
Web

The number of web pages with k outgoing links is
proportional to $1/k^{2.4}$
The number of web pages with k web pages pointing
at them is proportional to $1/k^{2.1}$

Orponen, 2005
26
Web

Note that because of the directed nature of web
links, the system also has IN, OUT and Central
Core / Strongly Connected Component (SCC) regions
The IN continent is hard for search engines to
index
Broder et al., 1999
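
The IN / SCC / OUT split can be sketched with two reachability searches from a page known to lie in the core: pages reachable both forwards and backwards form the SCC, pages that can only reach the core form IN, and pages that can only be reached from it form OUT. The toy graph and the function names `reachable` and `bow_tie` are hypothetical, chosen only for this illustration.

```python
def reachable(graph, start):
    """All nodes reachable from `start` by following directed links."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(graph, core_node):
    """Split the nodes connected to `core_node` into IN, SCC and OUT."""
    reverse = {}
    for src, targets in graph.items():
        for dst in targets:
            reverse.setdefault(dst, set()).add(src)
    forward = reachable(graph, core_node)      # core and everything after it
    backward = reachable(reverse, core_node)   # core and everything before it
    scc = forward & backward
    return {"IN": backward - scc, "SCC": scc, "OUT": forward - scc}


# Hypothetical toy web: p -> a <-> b -> q
toy = {"p": {"a"}, "a": {"b"}, "b": {"a", "q"}, "q": set()}
print(bow_tie(toy, "a"))
```

In the toy graph, page p ends up in IN: nothing in the core links to it, so a crawler that starts inside the core never discovers it.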

27
Web
28
Web

Network growth models (for example the scale-free
model) explain the average degree, degree
distribution and diameter well
However, they do not yet explain clustering:
neighbors of a node are usually also neighbors of
each other
No simple model yet exists for explaining this
behavior
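
Clustering is usually quantified per node as the fraction of pairs of its neighbors that are themselves linked. The sketch below computes that local clustering coefficient for a small hand-made graph; the graph and the function name `clustering_coefficient` are assumptions of this example.

```python
def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = list(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked_pairs = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in adj[neighbours[i]]
    )
    return 2 * linked_pairs / (k * (k - 1))


# Hypothetical undirected graph: a triangle A-B-C with an extra node D on A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
for node in sorted(graph):
    print(node, clustering_coefficient(graph, node))
```

Node B scores 1.0 because its two neighbors are linked, while A scores 1/3 because only one of its three neighbor pairs (B and C) is linked.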

Orponen, 2005
29
Systems and Their Degree Exponents

Different kinds of systems can be compared using
their degree exponents

30
Other Characteristics

In addition to being searchable (Milgram's
experiment) and having a low network diameter,
the power-law structure also has fault-tolerance
benefits
For example, resilient systems should remain
stable even when a high number of connections are
broken
A useful property in distributed systems

31
Fault-Tolerance

Two types of failure scenarios:
random failures, where the destroyed node is
selected randomly
targeted attacks, where the most highly connected
node is selected for destruction
Scale-free networks are resilient to random
failures, but fragile under attacks
(the system has an Achilles' heel)
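
The two failure scenarios can be compared in a small simulation. The sketch below grows a network by preferential attachment, removes 5% of the nodes either at random or in order of decreasing degree, and reports how large the biggest surviving cluster is. All names, the network size and the 5% figure are assumptions of this example, and the attack uses the degrees of the intact network without recomputing them after each removal.

```python
import random
from collections import defaultdict


def scale_free_graph(n, m=2, seed=1):
    """Adjacency sets of a graph grown by preferential attachment (sketch)."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    endpoints = []
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        adj[a].add(b)
        adj[b].add(a)
        endpoints += [a, b]
    for new in range(3, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            adj[new].add(target)
            adj[target].add(new)
            endpoints += [new, target]
    return adj


def largest_cluster_share(adj, removed):
    """Largest connected cluster among surviving nodes, as a share of all nodes."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour in alive and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best / len(adj)


def compare(n=2000, fraction=0.05, seed=1):
    """Return (share after random failures, share after targeted attack)."""
    adj = scale_free_graph(n, seed=seed)
    count = int(n * fraction)
    rng = random.Random(seed)
    random_failures = rng.sample(sorted(adj), count)
    hubs_first = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:count]
    return (largest_cluster_share(adj, random_failures),
            largest_cluster_share(adj, hubs_first))


print(compare())
```

The first number (random failures) should stay close to the share of surviving nodes, while the second (hubs removed first) should be clearly smaller, since many low-degree nodes lose both of their links to the removed hubs.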

32
Fault-Tolerance

In scale-free networks, the few hubs play an
important role in communication
Albert et al. (2000) measured the average
distance between nodes under random failures and
targeted attacks

[Figure: average distance between nodes against the number of removed nodes, for targeted attacks and random failures]
Orponen, 2005
33
Random Failures and Attacks

Random and scale-free networks under different
failure scenarios

[Figure: y = network diameter, x = fraction of nodes destroyed]
Source: R. Albert, H. Jeong, A.-L. Barabási,
Error and attack tolerance of complex networks
34
Random Failures and Attacks

A random network has a critical point (fc) under
both failure scenarios
A scale-free network only under attacks

[Figure: y1 = average size of the isolated clusters <s>, y2 = size of the largest cluster relative to all nodes, S; x = fraction of nodes destroyed]
Source: R. Albert, H. Jeong, A.-L. Barabási,
Error and attack tolerance of complex networks,
2000
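
The two quantities plotted on this slide are simple to compute once the cluster sizes are known: S is the largest cluster's share of all nodes and <s> is the mean size of the remaining, isolated clusters. The sketch below assumes the cluster sizes have already been found; the function name `cluster_statistics` and the example numbers are hypothetical.

```python
def cluster_statistics(cluster_sizes, total_nodes):
    """Return (S, <s>): S is the largest cluster's share of all nodes,
    <s> is the mean size of the remaining (isolated) clusters."""
    if not cluster_sizes:
        return 0.0, 0.0
    ordered = sorted(cluster_sizes, reverse=True)
    largest, rest = ordered[0], ordered[1:]
    s_largest = largest / total_nodes
    s_isolated = sum(rest) / len(rest) if rest else 0.0
    return s_largest, s_isolated


# Hypothetical network of 100 nodes: 4 nodes destroyed, and the 96
# survivors have fallen apart into four clusters.
print(cluster_statistics([90, 3, 2, 1], 100))   # (0.9, 2.0)
```

Near the critical point fc, S drops towards zero because no single cluster dominates any more.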
35
Fault-Tolerance: Internet and WWW

Internet and WWW failures follow the same pattern
as scale-free networks

[Figure: y = network diameter, x = fraction of nodes destroyed]
Source: R. Albert, H. Jeong, A.-L. Barabási,
Error and attack tolerance of complex networks
36
Summary

Many phenomena have been identified that follow a
power-law curve
Power laws are common in distributed systems
(the result of natural processes: growth and
preferential attachment)
Work is under way to develop algorithms that
exploit these properties
Reference: Barabási, A.-L., Linked: The New
Science of Networks, Perseus Publishing, 2002
