Title: Network Science
1. Network Science: Universal Structure and Models of Formation
- Networked Life
- CIS 112
- Spring 2009
- Prof. Michael Kearns
2. Natural Networks and Universality
- Consider the many kinds of networks we have examined: social, technological, business, economic, content, ...
- These networks tend to share certain informal properties:
  - large scale; continual growth
  - distributed, organic growth: vertices decide who to link to
  - interaction (largely) restricted to links
  - mixture of local and long-distance connections
  - abstract notions of "distance": geographical, content, social, ...
- Do natural networks share more quantitative universals?
- What would these universals be?
- How can we make them precise and measure them?
- How can we explain their universality?
- This is the domain of network science
3. Some Interesting Quantities
- Connected components
  - how many, and how large?
- Network diameter
  - the "small-world" phenomenon
- Clustering
  - to what extent do links tend to cluster locally?
  - what is the balance between local and long-distance connections?
  - what roles do the two types of links play?
- Degree distribution
  - what is the typical degree in the network?
  - what is the overall distribution?
- Etc., etc., etc.
4. A Canonical Natural Network Has...
- Few connected components
  - often only 1, or a small number (compared to network size)
- Small diameter
  - often a constant independent of network size (like 6)
  - or perhaps growing only very slowly with network size
  - typically look at the average; exclude infinite distances
- A high degree of edge clustering
  - considerably more so than for a random network
  - in tension with small diameter
- A heavy-tailed degree distribution
  - a small but reliable number of high-degree vertices
  - quantifies Gladwell's "connectors"
  - often of power law form
5. Some Models of Network Formation
- Random graphs (Erdos-Renyi model)
  - gives few components and small diameter
  - does not give high clustering or heavy-tailed degree distributions
  - is the mathematically best-studied and best-understood model
- Watts-Strogatz and related models
  - give few components, small diameter, and high clustering
  - do not give heavy-tailed degree distributions
- Preferential attachment
  - gives few components, small diameter, and heavy-tailed degree distributions
  - does not give high clustering
- Hierarchical networks
  - few components, small diameter, high clustering, heavy tails
- Affiliation networks
  - model group-actor formation
- Nothing magic about any of the measures or models
6. Combining and Formalizing Familiar Ideas
- Explaining universal behavior through statistical models
  - our models will always generate many networks
  - almost all of them will share certain properties (universals)
- Explaining tipping through incremental growth
  - we gradually add edges, or gradually increase edge probability p
  - many properties will emerge very suddenly during this process
[figure: probability the network is connected, plotted against the number of edges]
7. Approximate Roadmap
- Examine a series of models of network formation
  - macroscopic properties they do and do not entail
  - tipping behavior during network formation
  - pros and cons of each model
- Examine some real-life case studies
- Study some dynamics issues (e.g. search/navigation)
- Move on to an in-depth study of the web as a network
8. Models of Network Formation and Their Properties
9. Probabilistic Models of Networks
- The network formation models we will study are probabilistic or statistical
  - later in the course: economic formation models
- They can generate networks of any size
  - we will typically ask what happens when N is very large, or N → infinity
- They often have various parameters that can be set/chosen
  - size of network generated
  - probability of an edge being present or absent
  - fraction of long-distance vs. local connections
  - etc., etc., etc.
- The models each generate a distribution over networks
- Statements are always statistical in nature
  - "with high probability, the diameter is small"
  - "on average, the degree distribution has a heavy tail"
10. Optional Background on Probability and Statistics
Next three slides.
11. Probability and Random Variables
- A random variable X is simply a variable that probabilistically assumes values in some set
  - the set of possible values is sometimes called the sample space S of X
  - the sample space may be small and simple, or large and complex
    - S = {Heads, Tails}: X is the outcome of a coin flip
    - S = {0, 1, ..., U.S. population size}: X is the number voting Democratic
    - S = all networks of size N: X is generated by Erdos-Renyi
- The behavior of X is determined by its distribution (or density)
  - for each specific value x in S, specify Pr[X = x]
  - these probabilities sum to exactly 1 (mutually exclusive outcomes)
  - for complex sample spaces (such as large networks):
    - the distribution is often defined implicitly by simpler components
    - e.g. specify the probability that each edge appears, independently
    - this induces a probability distribution over networks
    - it may be difficult to compute the induced distribution
12. Some Basic Notions and Laws
- Independence
  - let X and Y be random variables
  - independence: for any x and y, Pr[X = x, Y = y] = Pr[X = x] Pr[Y = y]
  - intuition: the value of X does not influence the value of Y, and vice-versa
  - dependence:
    - e.g. X, Y coin flips, but Y is always the opposite of X
- Expected (mean) value of X
  - only makes sense for numeric random variables
  - the "average" value of X according to its distribution
  - formally, E[X] = Σ (Pr[X = x] × x), where the sum is over all x in S
  - often denoted by μ
  - always true: E[X + Y] = E[X] + E[Y]
  - for independent random variables: E[XY] = E[X] E[Y]
- Variance of X
  - Var(X) = E[(X − μ)²], often denoted by σ²
  - standard deviation is sqrt(Var(X)) = σ
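The linearity and independence rules above can be checked exhaustively on a small example. A quick sketch with two independent fair dice (an illustrative setup of my own, not from the slides):

```python
from itertools import product

# Two independent fair dice X and Y: the joint distribution is uniform
# over all 36 (x, y) pairs.
faces = range(1, 7)
pairs = list(product(faces, faces))

def E(f):
    """Expectation of f(X, Y) under the uniform joint distribution."""
    return sum(f(x, y) for x, y in pairs) / len(pairs)

EX = E(lambda x, y: x)                 # E[X] = 3.5
EXplusY = E(lambda x, y: x + y)        # E[X + Y] = E[X] + E[Y] = 7.0
EXY = E(lambda x, y: x * y)            # independence: E[XY] = E[X]E[Y] = 12.25
VarX = E(lambda x, y: (x - EX) ** 2)   # Var(X) = 35/12
print(EX, EXplusY, EXY, VarX)
```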
13. Convergence to Expectations
- Let X1, X2, ..., Xn be
  - independent random variables
  - with the same distribution Pr[X = x]
  - with expectation μ = E[X] and variance σ²
  - "independent and identically distributed" (i.i.d.)
  - essentially n repeated trials of the same experiment
- Natural to examine the r.v. Z = (1/n) Σ Xi, where the sum is over i = 1, ..., n
  - example: the number of heads in a sequence of coin flips
  - example: the degree of a vertex in the random graph model
- E[Z] = E[X]; what can we say about the distribution of Z?
- Central Limit Theorem:
  - as n becomes large, Z becomes normally distributed
  - with expectation μ and variance σ²/n
  - here's a demo
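The shrinking variance σ²/n is easy to see empirically. A minimal simulation of Z for fair coin flips (illustrative code, not the course demo):

```python
import random

def sample_mean(n, rng):
    """Z = average of n i.i.d. fair coin flips (each flip is 0 or 1)."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

def spread(zs):
    """Empirical variance of a list of samples."""
    m = sum(zs) / len(zs)
    return sum((z - m) ** 2 for z in zs) / len(zs)

rng = random.Random(0)
# For a fair coin, mu = 0.5 and sigma^2 = 0.25, so Var(Z) ~ 0.25/n:
# the distribution of Z tightens around 0.5 as n grows.
small = [sample_mean(10, rng) for _ in range(1000)]
large = [sample_mean(1000, rng) for _ in range(1000)]
print(spread(small))  # roughly 0.25/10
print(spread(large))  # roughly 0.25/1000 -- much tighter
```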
14. The Erdos-Renyi Model
15. The Erdos-Renyi (E-R) Model (Random Networks)
- A model in which all edges
  - are equally probable and appear independently
- Two parameters: network size N > 1 and edge probability p
  - each edge (u,v) appears with probability p, and is absent with probability 1 − p
  - N(N−1)/2 independent trials of a biased coin flip
  - results in a probability distribution over networks of size N
  - especially easy to generate networks from this distribution
- About the simplest (dumbest?) imaginable formation model
- The usual regime of interest is when p ~ 1/N and N is large
  - e.g. p = 1/(2N), p = 1/N, p = 2/N, p = 150/N, p = log(N)/N, etc.
  - in expectation, each vertex will have a small number of neighbors (~ pN)
  - Gladwell's "Magic Number 150" and cognitive bounds on degree
  - mathematical interest: just near the boundary of connectivity
  - will then examine what happens when N → infinity
  - can thus study properties of large networks with bounded degree
- Degree distribution of a typical E-R network G
  - draw G according to E-R with N, p; look at a random vertex u in G
  - what is Pr[deg(u) = k] for any fixed k? (or the histogram of degrees)
  - Poisson distribution with mean λ = p(N−1) ≈ pN
16. The Poisson Distribution
- The Poisson distribution
  - often used to model counts of events
    - number of phone calls placed in a given time period
    - number of times a neuron fires in a given time period
  - single free parameter λ
  - probability of exactly x events: exp(−λ) λ^x / x!
  - mean and variance are both λ
- Here are some examples; again, compare to heavy tails
- Similar to a normal (bell-shaped) distribution, but only takes on nonnegative integer values
17. Another Version of Erdos-Renyi
- In Erdos-Renyi:
  - expected number of edges in the network: pN(N−1)/2 = m
  - the actual number of edges will be extremely close to m
  - so suppose that instead of fixing p, we fix the number of edges m
- Incremental Erdos-Renyi model:
  - start with N vertices and no edges
  - at each time step, add a new edge, up to m edges total
  - choose each new edge randomly from among all missing edges
- Allows study of the evolution or emergence of properties
  - as the number of edges m grows (in relation to N)
  - equivalently, as p is increased (in relation to N)
  - again, let's look at the Erdos-Renyi giant component demo
- For our purposes, these models are equivalent under pN(N−1)/2 = m
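One way to sketch the incremental model: shuffle all possible edges and take the first m, which is equivalent to repeatedly choosing a uniform missing edge (illustrative code, not the course demo):

```python
import random

def incremental_er(N, m, rng):
    """Start with N isolated vertices; add m distinct edges uniformly at
    random (equivalent to adding one uniform missing edge at a time)."""
    possible = [(u, v) for u in range(N) for v in range(u + 1, N)]
    rng.shuffle(possible)   # a uniformly random order of all N(N-1)/2 edges
    return possible[:m]     # the first m edges form the incremental E-R graph

rng = random.Random(2)
N, m = 50, 100
edges = incremental_er(N, m, rng)
print(len(edges))
# Equivalent fixed-p model: p = m / (N(N-1)/2)
p_equiv = m / (N * (N - 1) / 2)
print(p_equiv)
```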
18. The Evolution of a Random Network
- We have a large number N of vertices
- We start randomly adding edges, one at a time (or increasing p)
- At what point will the network
  - have at least one large connected component?
  - have a single connected component?
  - have small diameter?
  - have a large clique?
- How gradually or suddenly do these properties appear?
19. Monotone Network Properties
- Often interested in monotone network properties:
  - suppose G has the property (e.g. G is connected)
  - now add edges to G to obtain G'
  - then G' must have the property also (e.g. G' is connected)
- Examples:
  - G is connected
  - G has diameter < d (not exactly d)
  - G has a clique of size > k (not exactly k)
- Interesting/nontrivial monotone properties:
  - G has no edges → G does not have the property
  - G has all edges (complete) → G has the property
  - so we know that as p goes from 0 to 1, the property emerges at some point
20. Formalizing "Tipping" for Monotone Properties
- Consider the standard Erdos-Renyi model
  - each edge appears with probability p, absent with probability 1 − p
- Pick a monotone property P of networks (e.g. being connected)
- Say that P has a tipping point at q if:
  - when p < q, the probability the network obeys P is ~ 0
  - when p > q, the probability the network obeys P is ~ 1
- Aside to math weenies:
  - formalize by asking that the probabilities converge to 0 or 1 as N → infinity
- Incremental E-R version: replace q by a tipping number of edges
- A purely structural definition of tipping
  - tipping results from incremental increases in connectivity
- No obvious reason any given property should tip
21. So Which Properties Tip?
- The following properties all have tipping points:
  - having a giant component
  - being connected
  - having small diameter
- In fact (1996): all monotone network properties have tipping points!
- So, at least in one setting, tipping is the rule, not the exception
- Demo: look at the following progression
  - giant component → connectivity → small diameter
  - in the Incremental Erdos-Renyi model (add one new edge at a time)
  - with remarkable consistency (N = 50):
    - giant component: ~40 edges; connected: ~100; small diameter: ~180
  - number of possible edges: N(N−1)/2 = 1225
  - example 1, example 2, example 3, example 4, example 5
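The progression in the demo can be reproduced with a small union-find simulation that adds random edges and records when each property first holds (my own approximation of the course demo; the exact edge counts vary from run to run):

```python
import random

def emergence(N, rng):
    """Add random edges one at a time; return the edge counts at which a
    giant component (size > N/2) appears and the graph becomes connected."""
    parent = list(range(N))
    size = [1] * N

    def find(u):
        # union-find with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = [(u, v) for u in range(N) for v in range(u + 1, N)]
    rng.shuffle(edges)
    giant_at = connected_at = None
    biggest, comps = 1, N
    for t, (u, v) in enumerate(edges, start=1):
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            comps -= 1
            biggest = max(biggest, size[ru])
        if giant_at is None and biggest > N / 2:
            giant_at = t
        if comps == 1:
            connected_at = t
            break
    return giant_at, connected_at

giant_at, connected_at = emergence(50, random.Random(3))
print(giant_at, connected_at)  # giant component appears before connectivity
```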
22. More Precise
- Connected component of size > N/2
  - tipping point: p ~ 1/N
  - note: full connectivity is still virtually impossible here
- Fully connected
  - tipping point: p ~ log(N)/N
  - the network remains extremely sparse: only ~log(N) edges per vertex
- Small diameter
  - tipping point: p ~ 2/sqrt(N) for diameter 2
  - the fraction of possible edges is still ~ 2/sqrt(N) → 0
  - generates very small worlds
- Upshot: right around/beyond p = 1/N, a lot suddenly happens
23. Other Tipping Points
- Perfect matchings
  - consider only even N
  - tipping point: p ~ log(N)/N
  - same as for connectivity!
- Cliques
  - k-clique tipping point: p ~ 1/N^(2/(k−1))
  - edges appear "immediately"; triangles appear at ~N/2 edges; etc.
24. Erdos-Renyi Summary
- A model in which all connections are equally likely
  - each of the N(N−1)/2 edges chosen randomly and independently
- As we add edges, a precise sequence of events unfolds:
  - the network acquires a giant component
  - the network becomes connected
  - the network acquires small diameter
  - etc., etc., etc.
- Properties appear very suddenly (tipping, thresholds)
  - and this is the rule, not the exception!
  - all statements are mathematically precise
- All happen shortly around/after edge density p = 1/N
  - very efficient use of edges!
- But is this how natural networks form?
- If not, which aspects are unrealistic?
  - maybe all edges are not equally likely
25. The Clustering Coefficient of a Network
- Let nbr(u) denote the set of neighbors of u in a network
  - all vertices v such that the edge (u,v) is in the graph
- The clustering coefficient of u:
  - let k = |nbr(u)| (i.e., the number of neighbors of u = degree of u)
  - choose(k,2) = max possible number of edges between vertices in nbr(u)
  - c(u) = (actual number of edges between vertices in nbr(u)) / choose(k,2)
  - 0 ≤ c(u) ≤ 1; a measure of the "cliquishness" of u's neighborhood
- Clustering coefficient of a graph:
  - the average of c(u) over all vertices u
[figure: example vertex u with k = 4, choose(k,2) = 6, and 4 of those 6 edges present, so c(u) = 4/6 ≈ 0.666]
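The definition of c(u) translates directly into code. This sketch reproduces the slide's k = 4, c(u) = 4/6 example (the vertex names and adjacency are my own reconstruction of such a graph):

```python
from itertools import combinations

def clustering(adj, u):
    """c(u) = (edges among u's neighbors) / choose(k, 2), for degree k >= 2."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return links / (k * (k - 1) / 2)

# u has k = 4 neighbors; 4 of the choose(4,2) = 6 possible edges among
# them are present (a-b, a-c, b-d, c-d), so c(u) = 4/6.
adj = {
    "u": {"a", "b", "c", "d"},
    "a": {"u", "b", "c"},
    "b": {"u", "a", "d"},
    "c": {"u", "a", "d"},
    "d": {"u", "b", "c"},
}
print(clustering(adj, "u"))  # 4/6 ~ 0.666
```

The clustering coefficient of the whole graph is then just `sum(clustering(adj, v) for v in adj) / len(adj)`.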
26. Erdos-Renyi Clustering Coefficient
- Generate a network G according to Erdos-Renyi with N, p
- Examine a typical vertex u in G
  - choose u at random among all vertices in G
  - what do we expect c(u) to be?
- Answer: exactly p!
- In E-R, the typical c(u) is entirely determined by the overall density
- Baseline for comparison with more clustered models
  - Erdos-Renyi has no bias towards clustered or local edges
- The clustering coefficient is meaningless in isolation
  - must compare it to the "background rate" of connectivity
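The claim that a typical c(u) equals p can be checked empirically. A rough sketch (sampling noise means only approximate agreement; names are my own):

```python
import random
from itertools import combinations

def er_avg_clustering(N, p, rng):
    """Average clustering coefficient of one sampled Erdos-Renyi graph."""
    adj = {u: set() for u in range(N)}
    for u in range(N):
        for v in range(u + 1, N):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    total = 0.0
    for u in range(N):
        k = len(adj[u])
        if k < 2:
            continue  # c(u) treated as 0 for degree < 2
        links = sum(1 for v, w in combinations(adj[u], 2) if w in adj[v])
        total += links / (k * (k - 1) / 2)
    return total / N

c_hat = er_avg_clustering(300, 0.1, random.Random(4))
print(c_hat)  # close to p = 0.1
```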
27. Caveman and Solaria
- Erdos-Renyi:
  - sharing a common neighbor makes two vertices no more likely to be directly connected than two very distant vertices
  - every edge appears entirely independently of existing structure
- But in many settings, the opposite is true:
  - you tend to meet new friends through your old friends
  - two web pages pointing to a third might share a topic
  - two companies selling goods to a third are in related industries
- Watts' "Caveman world":
  - overall density of edges is low
  - but two vertices with a common neighbor are likely connected
- Watts' "Solaria world":
  - overall density of edges is low; no special bias towards local edges
  - like Erdos-Renyi
28. Making it More Precise: the α-model
- An incremental formation model
- Pick network size N
- Throw down a few random "seed" edges
- Then for each pair of vertices u and v:
  - compute the probability of adding an edge between u and v
  - the probability will depend on the current network structure
  - the more common neighbors u and v have, the more likely we are to add the edge
  - provide "knobs" that let us adjust how weak/strong the effect is
29. Making it More Precise: the α-model
[figure: probability of adding the edge (u,v) as a function of x, the number of common neighbors, apparently of the form p + (1 − p)(x/N)^α; curves shown for smaller α, α = 1, and larger α]
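Assuming the form p + (1 − p)(x/N)^α reconstructed from the figure, a hypothetical helper shows how the α knob controls the strength of the common-neighbor effect (this exact functional form is my reading of the slide, not a quoted formula):

```python
def alpha_edge_prob(x, N, p, alpha):
    """Probability of adding edge (u,v) given x common neighbors,
    assuming the reconstructed form p + (1 - p) * (x/N)**alpha."""
    return p + (1 - p) * (x / N) ** alpha

p, N = 0.01, 100
# For the same number of common neighbors (x = 10), smaller alpha gives a
# much higher connection probability, i.e. a stronger clustering bias.
for alpha in (0.5, 1.0, 5.0):
    print(alpha, alpha_edge_prob(10, N, p, alpha))
# x = 0 gives the baseline probability p; x = N gives probability 1.
```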
30. (No transcript)
31. Small Worlds and Occam's Razor
- For small α, the model should generate large clustering coefficients
  - after all, we programmed it to do so!
- But we do not want a new model for every little property
  - Erdos-Renyi → small diameter
  - α-model → high clustering coefficient
  - etc., etc., etc.
- In the interests of Occam's Razor, we would like to find
  - a single, simple model of network generation
  - that simultaneously captures many properties
- Watts' "small world": small diameter and high clustering
  - here is a figure showing that this can be captured in the α-model
32. An Alternative Model
- The α-model programmed high clustering into the formation process
  - and then we got small diameter "for free" (at certain α)
- A different model:
  - start with all vertices arranged on a ring or cycle
  - connect each vertex to all others that are within k steps
  - with probability p, rewire each local connection to a random vertex
- The initial cyclical structure models "local" or geographic connectivity
- Long-distance rewiring models long-distance connectivity
- p = 0: high clustering, high diameter
- p = 1: low clustering, low diameter (essentially E-R)
- In between: look at this simulation
- Which of these models do you prefer?
  - sociology vs. math
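A minimal sketch of the ring-plus-rewiring construction described above (a simplified variant of the Watts-Strogatz procedure; the duplicate-edge handling is deliberately naive, and the function names are my own):

```python
import random

def ring_rewire(N, k, p, rng):
    """Ring of N vertices, each joined to its k nearest neighbors on each
    side; each local edge is then rewired to a random endpoint with prob. p."""
    edges = set()
    for u in range(N):
        for j in range(1, k + 1):
            edges.add(tuple(sorted((u, (u + j) % N))))
    result = set()
    for (u, v) in sorted(edges):
        if rng.random() < p:
            # rewire v to a random vertex, avoiding self-loops and
            # edges we have already placed
            w = rng.randrange(N)
            while w == u or tuple(sorted((u, w))) in result:
                w = rng.randrange(N)
            result.add(tuple(sorted((u, w))))
        else:
            result.add((u, v))
    return result

rng = random.Random(5)
ring = ring_rewire(20, 2, 0.0, rng)  # p = 0: pure ring lattice, N*k edges
some = ring_rewire(20, 2, 0.2, rng)  # p = 0.2: a few long-range shortcuts
print(len(ring), len(some))
```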
33. Meanwhile, Back in the Real World
- Watts examines three real networks as case studies:
  - the Kevin Bacon graph
  - the Western states power grid
  - the C. elegans nervous system
- For each of these networks, he
  - computes its size, diameter, and clustering coefficient
  - compares diameter and clustering to the best Erdos-Renyi approximation
  - shows that the best α-model approximation is better
  - important to be fair to each model by finding its best fit
- Overall moral:
  - if we care only about diameter and clustering, the α-model is better than E-R
34. (No transcript)
35. Case 1: The Kevin Bacon Graph
- Vertices: actors and actresses
- Edge between u and v if they appeared in a film together
- Here is the data
36. Case 2: Western States Power Grid
- Vertices: power stations in the Western U.S.
- Edges: high-voltage power transmission lines
- Here is the network and data
37. Case 3: C. elegans Nervous System
- Vertices: neurons in the C. elegans worm
- Edges: axons/synapses between neurons
- Here is the network and data
38. Two More Examples
- M. Newman on scientific collaboration networks
  - coauthorship networks in several distinct communities
  - differences in degrees (papers per author)
  - empirical verification of:
    - giant components
    - small diameter (mean distance)
    - high clustering coefficient
- Alberich et al. on the "Marvel Universe"
  - a purely fictional social network
  - two characters linked if they appeared together in an issue
  - empirical verification of:
    - heavy-tailed distribution of degrees (issues and characters)
    - giant component
    - rather small clustering coefficient