Network Science: - PowerPoint PPT Presentation

About This Presentation

Slides: 37

Provided by: CIS4

Learn more at: https://www.cis.upenn.edu

Transcript and Presenter's Notes

1
Network Science: Universal Structure and Models
of Formation

Networked Life
CIS 112
Spring 2009
Prof. Michael Kearns

2
Natural Networks and Universality

Consider the many kinds of networks we have
examined
social, technological, business, economic,
content, ...
These networks tend to share certain informal
properties
large scale; continual growth
distributed, organic growth: vertices decide
who to link to
interaction (largely) restricted to links
mixture of local and long-distance connections
abstract notions of distance: geographical,
content, social, ...
Do natural networks share more quantitative
universals?
What would these universals be?
How can we make them precise and measure them?
How can we explain their universality?
This is the domain of network science

3
Some Interesting Quantities

Connected components
how many, and how large?
Network diameter
the small-world phenomenon
Clustering
to what extent do links tend to cluster
locally?
what is the balance between local and
long-distance connections?
what roles do the two types of links play?
Degree distribution
what is the typical degree in the network?
what is the overall distribution?
Etc. etc. etc.

4
A Canonical Natural Network has

Few connected components
often only 1 or a small number (compared to
network size)
Small diameter
often a constant independent of network size
(like 6)
or perhaps growing only very slowly with network
size
typically look at the average distance; exclude
infinite distances
A high degree of edge clustering
considerably more so than for a random network
in tension with small diameter
A heavy-tailed degree distribution
a small but reliable number of high-degree
vertices
quantifies Gladwell's "connectors"
often of power law form

5
Some Models of Network Formation

Random graphs (Erdos-Renyi model)
gives few components and small diameter
does not give high clustering or heavy-tailed
degree distributions
is the mathematically most well-studied and
understood model
Watts-Strogatz and related models
give few components, small diameter and high
clustering
do not give heavy-tailed degree distributions
Preferential attachment
gives few components, small diameter and
heavy-tailed distribution
does not give high clustering
Hierarchical networks
few components, small diameter, high clustering,
heavy-tailed
Affiliation networks
models group-actor formation
Nothing magic about any of the measures or
models

6
Combining and Formalizing Familiar Ideas

Explaining universal behavior through statistical
models
our models will always generate many networks
almost all of them will share certain properties
(universals)
Explaining tipping through incremental growth
we gradually add edges, or gradually increase
edge probability p
many properties will emerge very suddenly during
this process

[Figure: probability that the network is connected, plotted against the number of edges]

7
Approximate Roadmap

Examine a series of models of network formation
macroscopic properties they do and do not entail
tipping behavior during network formation
pros and cons of each model
Examine some real life case studies
Study some dynamics issues (e.g.
search/navigation)
Move on to an in-depth study of the web as
network

8
Models of Network Formation and Their Properties
9
Probabilistic Models of Networks

Network formation models we will study are
probabilistic or statistical
later in the course: economic formation models
They can generate networks of any size
we will typically ask what happens when N is very
large or N → infinity
They often have various parameters that can be
set/chosen
size of network generated
probability of an edge being present or absent
fraction of long-distance vs. local connections
etc. etc. etc.
The models each generate a distribution over
networks
Statements are always statistical in nature
with high probability, diameter is small
on average, degree distribution has heavy tail

10
Optional Background on Probability and
Statistics
Next three slides.
11
Probability and Random Variables

A random variable X is simply a variable that
probabilistically assumes values in some set
set of possible values sometimes called the
sample space S of X
sample space may be small and simple, or large
and complex
S = {Heads, Tails}; X is outcome of a coin flip
S = {0, 1, ..., U.S. population size}; X is number
voting Democratic
S = all networks of size N; X is generated by
Erdos-Renyi
Behavior of X determined by its distribution (or
density)
for each specific value x in S, specify Pr[X = x]
these probabilities sum to exactly 1 (mutually
exclusive outcomes)
complex sample spaces (such as large networks)
distribution often defined implicitly by simpler
components
might specify the probability that each edge
appears independently
this induces a probability distribution over
networks
may be difficult to compute induced distribution

12
Some Basic Notions and Laws

Independence
let X and Y be random variables
independence: for any x and y, Pr[X = x and Y = y] =
Pr[X = x] Pr[Y = y]
intuition: value of X does not influence value
of Y, and vice-versa
dependence
e.g. X, Y coin flips, but Y is always opposite of
X
Expected (mean) value of X
only makes sense for numeric random variables
average value of X according to its
distribution
formally, E[X] = Σ (Pr[X = x] · x), sum is over
all x in S
often denoted by μ
always true: E[X + Y] = E[X] + E[Y]
for independent random variables: E[XY] =
E[X] E[Y]
Variance of X
Var(X) = E[(X - μ)^2]; often denoted by σ^2
standard deviation is sqrt(Var(X)) = σ
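
The laws above can be checked exactly on the slide's own coin-flip examples. The following is a minimal sketch, not part of the original deck: the helper `expectation` and the two joint distributions are illustrative names, with heads encoded as 1 and tails as 0 (an assumption made here so that the random variables are numeric).

```python
from fractions import Fraction

def expectation(dist, f):
    """E[f(outcome)] for a finite distribution given as {outcome: probability}."""
    return sum(prob * f(outcome) for outcome, prob in dist.items())

half, quarter = Fraction(1, 2), Fraction(1, 4)

# Joint distributions over pairs (x, y), with 1 = heads and 0 = tails.
independent = {(x, y): quarter for x in (0, 1) for y in (0, 1)}
opposite = {(0, 1): half, (1, 0): half}  # Y is always the opposite of X

for name, joint in (("independent", independent), ("opposite", opposite)):
    ex = expectation(joint, lambda o: o[0])
    ey = expectation(joint, lambda o: o[1])
    e_sum = expectation(joint, lambda o: o[0] + o[1])
    e_prod = expectation(joint, lambda o: o[0] * o[1])
    var_x = expectation(joint, lambda o: (o[0] - ex) ** 2)
    # Linearity should hold in both cases; the product rule only under independence.
    print(name, e_sum == ex + ey, e_prod == ex * ey, var_x)
```

In the dependent case the sum rule still holds, but E[XY] is 0 while E[X] E[Y] is 1/4, which is why the product rule is stated only for independent variables.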

13
Convergence to Expectations

Let X1, X2, ..., Xn be
independent random variables
with the same distribution Pr[X = x]
expectation μ = E[X] and variance σ^2
independent and identically distributed (i.i.d.)
essentially n repeated trials of the same
experiment
natural to examine r.v. Z = (1/n) Σ Xi, where sum
is over i = 1, ..., n
example: number of heads in a sequence of coin
flips
example: degree of a vertex in the random graph
model
E[Z] = E[X]; what can we say about the
distribution of Z?
Central Limit Theorem
as n becomes large, Z becomes normally
distributed
with expectation μ and variance σ^2/n
here's a demo
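
In place of the linked demo, here is a small simulation sketch of the same idea. It assumes fair coin flips (p = 0.5), and the function name `sample_means` is illustrative rather than anything from the course materials. It checks only the mean and variance of Z against μ and σ^2/n; it does not test normality itself.

```python
import random
import statistics

def sample_means(n, trials, p=0.5, seed=0):
    """Return `trials` independent values of Z = (1/n) * sum(X_i), X_i ~ Bernoulli(p)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

if __name__ == "__main__":
    for n in (10, 100, 1000):
        zs = sample_means(n, trials=1000)
        # Theory for p = 0.5: E[Z] = 0.5 and Var(Z) = p(1-p)/n = 0.25/n.
        print(n, round(statistics.mean(zs), 4),
              round(statistics.pvariance(zs), 6), 0.25 / n)
```

The empirical variance should shrink roughly in proportion to 1/n, matching the σ^2/n term on the slide.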

14
The Erdos-Renyi Model
15
The Erdos-Renyi (E-R) Model (Random Networks)

A model in which all edges
are equally probable and appear independently
Two parameters: NW size N > 1 and edge
probability p
each edge (u,v) appears with probability p, is
absent with probability 1-p
N(N-1)/2 independent trials of a biased coin flip
results in a probability distribution over
networks of size N
especially easy to generate networks from this
distribution
About the simplest (dumbest?) imaginable
formation model
The usual regime of interest is when p ~ 1/N, N
is large
e.g. p = 1/(2N), p = 1/N, p = 2/N, p = 150/N, p =
log(N)/N, etc.
in expectation, each vertex will have a small
number of neighbors (~ pN)
Gladwell's Magic Number 150 and cognitive
bounds on degree
mathematical interest: just near the boundary of
connectivity
will then examine what happens when N → infinity
can thus study properties of large networks with
bounded degree
Degree distribution of a typical E-R network G
draw G according to E-R with N, p; look at a
random vertex u in G
what is Pr[deg(u) = k] for any fixed k? (or
histogram of degrees)
Poisson distribution with mean λ = p(N-1) ~ pN
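
The generation procedure described above (one biased coin flip per possible edge) can be sketched directly. This is a minimal illustration under stated assumptions: the names `erdos_renyi` and `degree_summary` are made up for this example, and the quadratic loop over all pairs is chosen for clarity, not speed.

```python
import random
from collections import Counter
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Adjacency sets for one draw from E-R: each of the n(n-1)/2 edges appears w.p. p."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:  # one biased coin flip per possible edge
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree_summary(adj):
    """Return (mean degree, degree variance, histogram of degrees)."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    mean = sum(degrees) / len(degrees)
    var = sum((d - mean) ** 2 for d in degrees) / len(degrees)
    return mean, var, Counter(degrees)

if __name__ == "__main__":
    n, lam = 1000, 4.0
    mean, var, hist = degree_summary(erdos_renyi(n, lam / n, seed=1))
    print("mean", mean, "variance", var)  # both should be near lam = pN
    for k in sorted(hist):
        print(k, hist[k] / n)  # empirical Pr[deg(u) = k]
```

A mean and variance that are both close to pN are consistent with the Poisson degree distribution stated on the slide.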

16
The Poisson Distribution

The Poisson distribution
often used to model counts of events
number of phone calls placed in a given time
period
number of times a neuron fires in a given time
period
single free parameter λ
probability of exactly x events:
exp(-λ) λ^x / x!
mean and variance are both λ
here are some examples; again compare to heavy
tails
similar to a normal (bell-shaped) distribution,
but only takes on nonnegative integer values
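
As a worked check of the formula, the sketch below evaluates the Poisson probabilities numerically and confirms that the mean and variance both come out near λ. The function names are illustrative. The power law k^-2 printed alongside the tail is not from the slides; it is an arbitrary heavy-tailed curve included only to show how much faster the Poisson tail decays.

```python
import math

def poisson_pmf(x, lam):
    """Pr[X = x] = exp(-lam) * lam**x / x!, computed in log space to avoid overflow."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def poisson_tail(k, lam, terms=200):
    """Pr[X >= k], approximated by summing `terms` consecutive probabilities."""
    return sum(poisson_pmf(x, lam) for x in range(k, k + terms))

if __name__ == "__main__":
    lam = 4.0
    xs = range(100)
    mean = sum(x * poisson_pmf(x, lam) for x in xs)
    var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
    print(mean, var)  # both should be close to lam
    for k in (5, 10, 20, 40):
        # Second column: Poisson tail. Third column: illustrative power law k**-2.
        print(k, poisson_tail(k, lam), k ** -2.0)
```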

17
Another Version of Erdos-Renyi

In Erdos-Renyi
expected number of edges in the network is
pN(N-1)/2 = m
actual number of edges will be extremely close
to m
so suppose that instead of fixing p, we fix the
number of edges m
Incremental Erdos-Renyi model
start with N vertices and no edges
at each time step, add a new edge, up to m edges
total
choose new edge randomly from among all missing
edges
Allows study of the evolution or emergence of
properties
as the number of edges m grows (in relation to N)
equivalently, as p is increased (in relation to
N)
again, let's look at Erdos-Renyi giant component
demo
For our purposes, these models are equivalent
under pN(N-1)/2 = m
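
The incremental model can be sketched as follows. This is one possible implementation, not the demo referenced on the slide: the name `incremental_er` is illustrative, and the union-find bookkeeping is an implementation choice made here so that the largest component can be tracked cheaply as edges arrive.

```python
import random
from itertools import combinations

def incremental_er(n, m, seed=None):
    """Yield (edges_added, largest_component_size) as m random edges are added one by one."""
    rng = random.Random(seed)
    # Drawing m distinct pairs in random order is the same as repeatedly
    # choosing a new edge uniformly from among all missing edges.
    edges = rng.sample(list(combinations(range(n), 2)), m)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest = 1
    for t, (u, v) in enumerate(edges, start=1):
        ru, rv = find(u), find(v)
        if ru != rv:  # the new edge merges two components
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        yield t, largest

if __name__ == "__main__":
    n = 200
    for t, largest in incremental_er(n, m=400, seed=0):
        if t % 50 == 0:
            print(t, largest)
```

Printing the largest component size against the number of edges gives a rough picture of how suddenly the giant component emerges.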

18
The Evolution of a Random Network

We have a large number N of vertices
We start randomly adding edges one at a time (or
increasing p)
At what point will the network
have at least one large connected component?
have a single connected component?
have small diameter?
have a large clique?
How gradually or suddenly do these properties
appear?

19
Monotone Network Properties

Often interested in monotone network properties
suppose G has the property (e.g. G is connected)
now add edges to G to obtain G'
then G' must have the property also (e.g. G' is
connected)
Examples
G is connected
G has diameter ≤ d (not exactly d)
G has a clique of size ≥ k (not exactly k)
Interesting/nontrivial monotone properties
G has no edges → G does not have the property
G has all edges (complete) → G has the property
so we know as p goes from 0 to 1, property
emerges

20
Formalizing Tipping for Monotone Properties

Consider the standard Erdos-Renyi model
each edge appears with probability p, absent with
probability 1-p
Pick a monotone property P of networks (e.g.
being connected)
Say that P has a tipping point at q if
when p < q, probability network obeys P is 0
when p > q, probability network obeys P is 1
Aside to math weenies
formalize by asking that probabilities converge
to 0 or 1 as N → infinity
Incremental E-R version
replace q by tipping number of edges
A purely structural definition of tipping
tipping results from incremental increase in
connectivity
No obvious reason any given property should tip

21
So Which Properties Tip?

The following properties all have tipping points
having a giant component
being connected
having small diameter
in fact
1996: All monotone network properties have
tipping points!
So at least in one setting, tipping is the rule,
not the exception
Demo: look at the following progression
giant component → connectivity → small diameter
in Incremental Erdos-Renyi model (add one new
edge at a time)
with remarkable consistency (N = 50)
giant component ~ 40 edges, connected ~ 100,
small diameter ~ 180
Number of possible edges: N(N-1)/2 = 1225

22
More Precise

Connected component of size > N/2
tipping point: p ~ 1/N
note: full connectivity virtually impossible
Fully connected
tipping point is p ~ log(N)/N
NW remains extremely sparse: only ~ log(N) edges
per vertex
Small diameter
tipping point is p ~ 2/sqrt(N) for diameter 2
fraction of possible edges still ~ 2/sqrt(N) → 0
generates very small worlds
Upshot: right around/beyond p ~ 1/N, lots
suddenly happens
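
The connectivity tipping point can be explored empirically. The sketch below is an illustration under assumptions: `is_connected` and `prob_connected` are hypothetical helper names, the network sizes are small enough that the transition is blurrier than the N → infinity statement, and the multiples of log(N)/N swept in the loop are arbitrary choices.

```python
import math
import random
from itertools import combinations

def is_connected(n, p, rng):
    """Sample one E-R network and report whether it is a single connected component."""
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {0}, [0]  # depth-first search from vertex 0
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def prob_connected(n, p, trials=50, seed=0):
    """Fraction of `trials` sampled networks that are connected."""
    rng = random.Random(seed)
    return sum(is_connected(n, p, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    n = 150
    threshold = math.log(n) / n
    for c in (0.25, 0.5, 1.0, 1.5, 2.0, 3.0):
        print(c, prob_connected(n, c * threshold))
```

The estimated probability should move from near 0 to near 1 as the multiplier c passes through values around 1.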

23
Other Tipping Points

Perfect matchings
consider only even N
tipping point is p ~ log(N)/N
same as for connectivity!
Cliques
k-clique tipping point is p ~ 1/N^(2/(k-1))
edges appear immediately; triangles at ~ N/2
edges; etc.
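
One way to see where the clique exponent comes from is to compute the expected number of k-cliques, which by linearity of expectation is choose(N,k) · p^choose(k,2). The sketch below (the name `expected_cliques` is illustrative) evaluates this just below, at, and just above p = 1/N^(2/(k-1)). A vanishing expectation does imply that cliques are unlikely; the converse direction needs a separate argument that is not shown here.

```python
import math

def expected_cliques(n, k, p):
    """E[number of k-cliques] = C(n, k) * p**C(k, 2), by linearity of expectation."""
    return math.comb(n, k) * p ** math.comb(k, 2)

if __name__ == "__main__":
    n = 10_000
    for k in (3, 4, 5):
        threshold = n ** (-2 / (k - 1))
        for scale in (0.1, 1.0, 10.0):
            # Expect values well below 1 for scale = 0.1 and well above 1 for scale = 10.
            print(k, scale, expected_cliques(n, k, scale * threshold))
```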

24
Erdos-Renyi Summary

A model in which all connections are equally
likely
each of the N(N-1)/2 edges chosen randomly and
independently
As we add edges, a precise sequence of events
unfolds
network acquires a giant component
network becomes connected
network acquires small diameter
etc. etc. etc.
Properties appear very suddenly (tipping,
thresholds)
and this is the rule, not the exception!
All statements are mathematically precise
All happen shortly around/after edge density p ~
1/N
very efficient use of edges!
But is this how natural networks form?
If not, which aspects are unrealistic?
maybe all edges are not equally likely

25
The Clustering Coefficient of a Network

Let nbr(u) denote the set of neighbors of u in a
network
all vertices v such that the edge (u,v) is in the
graph
The clustering coefficient of u
let k = |nbr(u)| (i.e., number of neighbors of u =
degree of u)
choose(k,2) = max possible # of edges between
vertices in nbr(u)
c(u) = (actual # of edges between vertices in
nbr(u))/choose(k,2)
0 ≤ c(u) ≤ 1; measure of cliquishness of u's
neighborhood
Clustering coefficient of a graph
average of c(u) over all vertices u

[Figure: example vertex u with k = 4, choose(k,2) = 6, c(u) = 4/6 = 0.666...]
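
The definition translates almost line for line into code. The sketch below is illustrative: the function names are made up, the small graph is one possible network matching the figure's numbers (u has 4 neighbors with 4 of the 6 possible edges among them), and returning 0 for vertices with fewer than two neighbors is a convention assumed here, since choose(k,2) = 0 leaves c(u) undefined.

```python
from itertools import combinations

def clustering(adj, u):
    """c(u) = (# edges among neighbors of u) / choose(k, 2), where k = degree of u."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: c(u) = 0 when choose(k, 2) = 0
    links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return links / (k * (k - 1) / 2)

def graph_clustering(adj):
    """Average of c(u) over all vertices u."""
    return sum(clustering(adj, u) for u in adj) / len(adj)

# Example network: u is linked to a, b, c, d, which form a 4-cycle among themselves.
edges = [("u", "a"), ("u", "b"), ("u", "c"), ("u", "d"),
         ("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

print(clustering(adj, "u"))   # 4 of 6 possible neighbor edges are present
print(graph_clustering(adj))
```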
26
Erdos-Renyi Clustering Coefficient

Generate a network G according to Erdos-Renyi
with N, p
Examine a typical vertex u in G
choose u at random among all vertices in G
what do we expect c(u) to be?
Answer: exactly p!
In E-R, typical c(u) entirely determined by
overall density
Baseline for comparison with more clustered
models
Erdos-Renyi has no bias towards clustered or
local edges
Clustering coefficient meaningless in isolation
Must compare to the background rate of
connectivity
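
The claim that a typical c(u) equals p can be checked by simulation. This is a rough sketch with an illustrative function name; it skips vertices with fewer than two neighbors (where c(u) is undefined), which is an assumption of this example and matters mainly for very sparse networks.

```python
import random
from itertools import combinations

def er_average_clustering(n, p, seed=0):
    """Sample one E-R network and return the average c(u) over vertices of degree >= 2."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    values = []
    for u in range(n):
        k = len(adj[u])
        if k >= 2:  # skip vertices where choose(k, 2) = 0
            links = sum(1 for v, w in combinations(adj[u], 2) if w in adj[v])
            values.append(links / (k * (k - 1) / 2))
    return sum(values) / len(values)

if __name__ == "__main__":
    for p in (0.05, 0.1, 0.3):
        print(p, round(er_average_clustering(300, p), 4))
```

Each printed average should sit close to the corresponding p, which is the baseline against which more clustered models are compared.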

27
Caveman and Solaria

Erdos-Renyi
sharing a common neighbor makes two vertices no
more likely to be directly connected than two
very distant vertices
every edge appears entirely independently of
existing structure
But in many settings, the opposite is true
you tend to meet new friends through your old
friends
two web pages pointing to a third might share a
topic
two companies selling goods to a third are in
related industries
Watts' Caveman world
overall density of edges is low
but two vertices with a common neighbor are
likely connected
Watts' Solaria world
overall density of edges low; no special bias
towards local edges
like Erdos-Renyi