1
Network Science: Universal Structure and Models of Formation
  • Networked Life
  • CIS 112
  • Spring 2009
  • Prof. Michael Kearns

2
Natural Networks and Universality
  • Consider the many kinds of networks we have
    examined
  • social, technological, business, economic,
    content, …
  • These networks tend to share certain informal
    properties
  • large-scale, continual growth
  • distributed, organic growth: vertices decide
    who to link to
  • interaction (largely) restricted to links
  • mixture of local and long-distance connections
  • abstract notions of distance: geographical,
    content, social, …
  • Do natural networks share more quantitative
    universals?
  • What would these universals be?
  • How can we make them precise and measure them?
  • How can we explain their universality?
  • This is the domain of network science

3
Some Interesting Quantities
  • Connected components
  • how many, and how large?
  • Network diameter
  • the small-world phenomenon
  • Clustering
  • to what extent do links tend to cluster
    locally?
  • what is the balance between local and
    long-distance connections?
  • what roles do the two types of links play?
  • Degree distribution
  • what is the typical degree in the network?
  • what is the overall distribution?
  • Etc. etc. etc.

4
A Canonical Natural Network has
  • Few connected components
  • often only 1 or a small number (compared to
    network size)
  • Small diameter
  • often a constant independent of network size
    (like 6)
  • or perhaps growing only very slowly with network
    size
  • typically look at the average distance,
    excluding infinite distances
  • A high degree of edge clustering
  • considerably more so than for a random network
  • in tension with small diameter
  • A heavy-tailed degree distribution
  • a small but reliable number of high-degree
    vertices
  • quantifies Gladwell's connectors
  • often of power law form
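The convention above (average distance, ignoring infinite distances) can be computed with breadth-first search. A minimal Python sketch, assuming an undirected network stored as a dict mapping each vertex to a set of neighbors; `bfs_distances` and `distance_stats` are illustrative names, not from the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_stats(adj):
    """Average and maximum distance over ordered pairs of distinct, mutually
    reachable vertices; unreachable (infinite-distance) pairs are skipped."""
    total, count, longest = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                count += 1
                longest = max(longest, d)
    return (total / count if count else 0.0), longest

# Path 0-1-2 plus an isolated vertex 3: pairs involving 3 are ignored.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
print(distance_stats(adj))  # average 4/3, longest finite distance 2
```

Because infinite distances are dropped, the "longest" value is the largest diameter among components, not the (infinite) diameter of a disconnected network.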

5
Some Models of Network Formation
  • Random graphs (Erdos-Renyi model)
  • gives few components and small diameter
  • does not give high clustering or heavy-tailed
    degree distributions
  • is the mathematically most well-studied and
    understood model
  • Watts-Strogatz and related models
  • give few components, small diameter and high
    clustering
  • do not give heavy-tailed degree distributions
  • Preferential attachment
  • gives few components, small diameter and
    heavy-tailed distribution
  • does not give high clustering
  • Hierarchical networks
  • few components, small diameter, high clustering,
    heavy-tailed
  • Affiliation networks
  • models group-actor formation
  • Nothing magic about any of the measures or
    models

6
Combining and Formalizing Familiar Ideas
  • Explaining universal behavior through statistical
    models
  • our models will always generate many networks
  • almost all of them will share certain properties
    (universals)
  • Explaining tipping through incremental growth
  • we gradually add edges, or gradually increase
    edge probability p
  • many properties will emerge very suddenly during
    this process

[Figure: probability that the network is connected vs. number of edges]
7
Approximate Roadmap
  • Examine a series of models of network formation
  • macroscopic properties they do and do not entail
  • tipping behavior during network formation
  • pros and cons of each model
  • Examine some real life case studies
  • Study some dynamics issues (e.g.
    search/navigation)
  • Move on to an in-depth study of the web as a
    network

8
Models of Network Formation and Their Properties
9
Probabilistic Models of Networks
  • Network formation models we will study are
    probabilistic or statistical
  • later in the course: economic formation models
  • They can generate networks of any size
  • we will typically ask what happens when N is very
    large or $N \to \infty$
  • They often have various parameters that can be
    set/chosen
  • size of network generated
  • probability of an edge being present or absent
  • fraction of long-distance vs. local connections
  • etc. etc. etc.
  • The models each generate a distribution over
    networks
  • Statements are always statistical in nature
  • with high probability, diameter is small
  • on average, degree distribution has heavy tail

10
Optional Background on Probability and
Statistics (next three slides)
11
Probability and Random Variables
  • A random variable X is simply a variable that
    probabilistically assumes values in some set
  • set of possible values sometimes called the
    sample space S of X
  • sample space may be small and simple, or large
    and complex
  • $S = \{\mathrm{Heads}, \mathrm{Tails}\}$: X is the outcome of a coin flip
  • $S = \{0, 1, \ldots, \text{U.S. population size}\}$: X is the number
    voting Democratic
  • S = all networks of size N: X is generated by
    Erdos-Renyi
  • Behavior of X determined by its distribution (or
    density)
  • for each specific value x in S, specify $\Pr[X = x]$
  • these probabilities sum to exactly 1 (mutually
    exclusive outcomes)
  • for complex sample spaces (such as large networks)
  • distribution often defined implicitly by simpler
    components
  • might specify the probability that each edge
    appears independently
  • this induces a probability distribution over
    networks
  • may be difficult to compute induced distribution
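To see how a per-edge specification induces a distribution over whole networks, and why computing it directly gets hard, here is a brute-force Python sketch for tiny N. It assumes labeled networks and independent edges with a common probability p (as in Erdos-Renyi); `induced_distribution` is a hypothetical helper.

```python
from itertools import combinations, product

def induced_distribution(n, p):
    """Map each labeled network on n vertices (a frozenset of edges) to its
    probability when every edge appears independently with probability p."""
    pairs = list(combinations(range(n), 2))
    dist = {}
    for included in product([False, True], repeat=len(pairs)):
        edges = frozenset(e for e, keep in zip(pairs, included) if keep)
        k = len(edges)
        dist[edges] = p ** k * (1 - p) ** (len(pairs) - k)
    return dist

dist = induced_distribution(3, 0.3)
print(len(dist))                       # 8 possible networks on 3 labeled vertices
print(round(sum(dist.values()), 10))   # 1.0: the probabilities sum to exactly 1
```

Already at N = 20 there are $2^{190}$ labeled networks, so enumeration is hopeless; this is the sense in which the induced distribution may be difficult to compute.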

12
Some Basic Notions and Laws
  • Independence
  • let X and Y be random variables
  • independence: for any x and y,
    $\Pr[X = x, Y = y] = \Pr[X = x] \cdot \Pr[Y = y]$
  • intuition: the value of X does not influence the
    value of Y, and vice versa
  • dependence
  • e.g. X, Y coin flips, but Y is always the opposite
    of X
  • Expected (mean) value of X
  • only makes sense for numeric random variables
  • average value of X according to its
    distribution
  • formally, $E[X] = \sum_{x \in S} \Pr[X = x] \cdot x$
  • often denoted by $\mu$
  • always true: $E[X + Y] = E[X] + E[Y]$
  • for independent random variables: $E[XY] = E[X]E[Y]$
  • Variance of X
  • $\mathrm{Var}(X) = E[(X - \mu)^2]$, often denoted by $\sigma^2$
  • standard deviation is $\sqrt{\mathrm{Var}(X)} = \sigma$
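The laws above can be checked exactly on the slide's coin examples. A minimal sketch using exact fractions, assuming coins coded as 0/1; `expect` is an illustrative helper over a joint distribution. Linearity holds in both cases, while $E[XY] = E[X]E[Y]$ fails for the dependent pair.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

def expect(joint, f):
    """E[f(X, Y)] for a joint distribution given as {(x, y): probability}."""
    return sum(prob * f(x, y) for (x, y), prob in joint.items())

# Two independent fair coin flips, coded as 0/1.
independent = {(x, y): half * half for x, y in product([0, 1], repeat=2)}
# The dependent example from the slide: Y is always the opposite of X.
opposite = {(0, 1): half, (1, 0): half}

for name, joint in (("independent", independent), ("opposite", opposite)):
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    assert expect(joint, lambda x, y: x + y) == ex + ey   # linearity always holds
    print(name, expect(joint, lambda x, y: x * y), ex * ey)
# independent 1/4 1/4
# opposite 0 1/4

mu = expect(independent, lambda x, y: x)
var = expect(independent, lambda x, y: (x - mu) ** 2)   # Var(X) = 1/4, so sigma = 1/2
```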

13
Convergence to Expectations
  • Let $X_1, X_2, \ldots, X_n$ be
  • independent random variables
  • with the same distribution $\Pr[X = x]$
  • expectation $\mu = E[X]$ and variance $\sigma^2$
  • independent and identically distributed (i.i.d.)
  • essentially n repeated trials of the same
    experiment
  • natural to examine the r.v. $Z = \frac{1}{n}\sum_{i=1}^{n} X_i$
  • example: fraction of heads in a sequence of coin
    flips
  • example: degree of a vertex in the random graph
    model (a sum of independent edge indicators)
  • $E[Z] = E[X]$; what can we say about the
    distribution of Z?
  • Central Limit Theorem
  • as n becomes large, Z becomes (approximately)
    normally distributed
  • with expectation $\mu$ and variance $\sigma^2/n$
  • here's a demo
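A small simulation in the spirit of the demo, assuming fair coin flips ($\mu = 1/2$, $\sigma^2 = 1/4$). It checks that Z concentrates around $\mu$ with variance close to $\sigma^2/n$; seeing the bell shape itself would need a histogram of `zs`, which this sketch leaves out. The seed and sample sizes are arbitrary.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Draw `trials` independent values of Z = (1/n) * (sum of n fair 0/1 coin flips)."""
    return [sum(rng.random() < 0.5 for _ in range(n)) / n for _ in range(trials)]

rng = random.Random(0)
for n in (10, 100, 1000):
    zs = sample_means(n, 1000, rng)
    # For a fair coin mu = 0.5 and sigma^2 = 0.25, so n * Var(Z) should stay near 0.25.
    print(n, round(statistics.mean(zs), 3), round(n * statistics.variance(zs), 3))
```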

14
The Erdos-Renyi Model
15
The Erdos-Renyi (E-R) Model (Random Networks)
  • A model in which all edges
  • are equally probable and appear independently
  • Two parameters: NW size $N > 1$ and edge
    probability p
  • each edge (u,v) appears with probability p, is
    absent with probability 1-p
  • N(N-1)/2 independent trials of a biased coin flip
  • results in a probability distribution over
    networks of size N
  • especially easy to generate networks from this
    distribution
  • About the simplest (dumbest?) imaginable
    formation model
  • The usual regime of interest is when $p \sim 1/N$ and N
    is large
  • e.g. $p = 1/(2N)$, $p = 1/N$, $p = 2/N$, $p = 150/N$,
    $p = \log(N)/N$, etc.
  • in expectation, each vertex will have a small
    number of neighbors ($\approx pN$)
  • Gladwell's Magic Number 150 and cognitive
    bounds on degree
  • mathematical interest: just near the boundary of
    connectivity
  • will then examine what happens when $N \to \infty$
  • can thus study properties of large networks with
    bounded degree
  • Degree distribution of a typical E-R network G
  • draw G according to E-R with N, p; look at a
    random vertex u in G
  • what is $\Pr[\deg(u) = k]$ for any fixed k? (or
    histogram of degrees)
  • approximately a Poisson distribution with mean $\lambda = p(N-1) \approx pN$
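The recipe above (N(N-1)/2 independent biased coin flips, then a degree histogram) fits in a few lines of Python. This is a sketch; N = 2000, $pN \approx 3$ and the seed are arbitrary choices. The empirical fraction of vertices with degree k should track the Poisson probability with $\lambda = p(N-1)$.

```python
import math
import random
from collections import Counter
from itertools import combinations

def erdos_renyi(n, p, rng):
    """Adjacency sets for E-R(N, p): each of the N(N-1)/2 pairs is an independent biased coin flip."""
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

n, p = 2000, 3 / 2000
adj = erdos_renyi(n, p, random.Random(42))
counts = Counter(len(nbrs) for nbrs in adj.values())
lam = p * (n - 1)
for k in range(8):
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # columns: degree k, empirical fraction of vertices, Poisson prediction
    print(k, counts[k] / n, round(poisson, 4))
```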

16
The Poisson Distribution
  • The Poisson distribution
  • often used to model counts of events
  • number of phone calls placed in a given time
    period
  • number of times a neuron fires in a given time
    period
  • single free parameter $\lambda$
  • probability of exactly x events:
  • $e^{-\lambda}\lambda^x/x!$
  • mean and variance are both $\lambda$
  • here are some examples; again, compare to heavy
    tails
  • similar to a normal (bell-shaped) distribution,
    but only takes on non-negative integer values
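The Poisson formula can be checked numerically. A minimal sketch with $\lambda = 4$ as an arbitrary example, truncating the infinite support at x = 59 (the neglected tail is astronomically small for this $\lambda$); the mean and variance both come out as $\lambda$.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x events: exp(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 4.0
xs = range(60)  # truncated support 0..59
probs = [poisson_pmf(x, lam) for x in xs]
mean = sum(x * q for x, q in zip(xs, probs))
var = sum((x - mean) ** 2 * q for x, q in zip(xs, probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))  # 1.0 4.0 4.0
```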

17
Another Version of Erdos-Renyi
  • In Erdos-Renyi
  • expected number of edges in the network is
    $pN(N-1)/2 = m$
  • actual number of edges will be extremely close
    to m
  • so suppose that instead of fixing p, we fix the
    number of edges m
  • Incremental Erdos-Renyi model
  • start with N vertices and no edges
  • at each time step, add a new edge, up to m edges
    total
  • choose new edge randomly from among all missing
    edges
  • Allows study of the evolution or emergence of
    properties
  • as the number of edges m grows (in relation to N)
  • equivalently, as p is increased (in relation to
    N)
  • again, let's look at the Erdos-Renyi giant component
    demo
  • For our purposes, these models are equivalent
    under $pN(N-1)/2 = m$
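The incremental model is easy to simulate: repeatedly choosing a uniformly random missing edge is the same as shuffling all N(N-1)/2 possible edges and taking them in order. A Python sketch; `incremental_er` and `matching_p` are illustrative names, and `matching_p` just solves $pN(N-1)/2 = m$ for p.

```python
import random
from itertools import combinations

def incremental_er(n, m, rng):
    """Return the first m edges added by the incremental Erdos-Renyi process,
    in the order they are added (each a uniformly random missing edge)."""
    if m > n * (n - 1) // 2:
        raise ValueError("m exceeds the number of possible edges")
    pairs = list(combinations(range(n), 2))
    rng.shuffle(pairs)  # a uniform random order of all possible edges
    return pairs[:m]

def matching_p(n, m):
    """Edge probability p satisfying p * N(N-1)/2 = m."""
    return m / (n * (n - 1) / 2)

edges = incremental_er(50, 100, random.Random(0))
print(len(edges), matching_p(50, 100))  # 100 edges, comparable to p = 100/1225
```

Every prefix of the shuffled list is itself a uniformly random m-edge network, which is what lets one run of the process show the whole evolution.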

18
The Evolution of a Random Network
  • We have a large number N of vertices
  • We start randomly adding edges one at a time (or
    increasing p)
  • At what point will the network
  • have at least one large connected component?
  • have a single connected component?
  • have small diameter?
  • have a large clique?
  • How gradually or suddenly do these properties
    appear?

19
Monotone Network Properties
  • Often interested in monotone network properties
  • suppose G has the property (e.g. G is connected)
  • now add edges to G to obtain $G'$
  • then $G'$ must have the property also (e.g. $G'$ is
    connected)
  • Examples
  • G is connected
  • G has diameter $\le d$ (not exactly d)
  • G has a clique of size $\ge k$ (not exactly k)
  • Interesting/nontrivial monotone properties
  • G has no edges $\Rightarrow$ G does not have the property
  • G has all edges (complete) $\Rightarrow$ G has the property
  • so we know that as p goes from 0 to 1, the property
    emerges

20
Formalizing Tipping for Monotone Properties
  • Consider the standard Erdos-Renyi model
  • each edge appears with probability p, absent with
    probability 1-p
  • Pick a monotone property P of networks (e.g.
    being connected)
  • Say that P has a tipping point at q if
  • when $p < q$, the probability the network obeys P is 0
  • when $p > q$, the probability the network obeys P is 1
  • Aside to math weenies:
  • formalize by asking that probabilities converge
    to 0 or 1 as $N \to \infty$
  • Incremental E-R version
  • replace q by tipping number of edges
  • A purely structural definition of tipping
  • tipping results from incremental increase in
    connectivity
  • No obvious reason any given property should tip

21
So Which Properties Tip?
  • The following properties all have tipping points
  • having a giant component
  • being connected
  • having small diameter
  • in fact
  • 1996: all monotone network properties have
    tipping points!
  • So at least in one setting, tipping is the rule,
    not the exception
  • Demo: look at the following progression
  • giant component $\to$ connectivity $\to$ small diameter
  • in Incremental Erdos-Renyi model (add one new
    edge at a time)
  • with remarkable consistency ($N = 50$):
  • giant component $\approx$ 40 edges, connected $\approx$ 100,
    small diameter $\approx$ 180
  • Number of possible edges $N(N-1)/2 = 1225$
  • example 1 example 2 example 3 example 4
    example 5
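The first two steps of the demo's progression can be reproduced with a union-find structure. A sketch, assuming "giant component" means a component with more than N/2 vertices (the definition used on the next slide); the numbers it prints vary with the random seed and are not the slide's demo data. A BFS-based diameter check could be bolted on for the third step.

```python
import random
from itertools import combinations

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def tipping_edges(n, rng):
    """Add random missing edges one at a time (incremental E-R) and return the
    edge counts at which (a) some component exceeds n/2 vertices and
    (b) the whole network becomes connected."""
    parent, size = list(range(n)), [1] * n
    pairs = list(combinations(range(n), 2))
    rng.shuffle(pairs)
    giant_at, components = None, n
    for t, (u, v) in enumerate(pairs, start=1):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru  # union by size: attach the smaller tree
            parent[rv] = ru
            size[ru] += size[rv]
            components -= 1
            if giant_at is None and size[ru] > n / 2:
                giant_at = t
        if components == 1:
            return giant_at, t
    return giant_at, len(pairs)

for seed in range(5):
    print(tipping_edges(50, random.Random(seed)))  # (giant component at, connected at)
```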

22
More Precise
  • Connected component of size $> N/2$
  • tipping point $p = 1/N$
  • note: at this point, full connectivity is virtually impossible
  • Fully connected
  • tipping point is $p = \log(N)/N$
  • NW remains extremely sparse: only $\approx \log(N)$ edges
    per vertex
  • Small diameter
  • tipping point is $p = 2/\sqrt{N}$ for diameter 2
  • fraction of possible edges still $2/\sqrt{N} \to 0$
  • generates very small worlds
  • Upshot: right around/beyond $p = 1/N$, lots
    suddenly happens
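A quick Monte Carlo check of the connectivity threshold, assuming log is the natural logarithm (the standard form of the $\log(N)/N$ result). A sketch with arbitrary N = 300 and 40 trials per setting: at half the threshold almost every sample should be disconnected, at twice the threshold almost every sample connected.

```python
import math
import random
from itertools import combinations

def is_connected_gnp(n, p, rng):
    """Sample E-R(N, p) and report whether it is connected (DFS from vertex 0)."""
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n

def prob_connected(n, p, trials, rng):
    """Fraction of `trials` independent E-R(N, p) samples that are connected."""
    return sum(is_connected_gnp(n, p, rng) for _ in range(trials)) / trials

n, rng = 300, random.Random(7)
threshold = math.log(n) / n
for factor in (0.5, 1.0, 2.0):
    print(factor, prob_connected(n, factor * threshold, 40, rng))
```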

23
Other Tipping Points
  • Perfect matchings
  • consider only even N
  • tipping point is $p = \log(N)/N$
  • same as for connectivity!
  • Cliques
  • k-clique tipping point is $p = 1/N^{2/(k-1)}$
  • edges appear immediately; triangles at $p = 1/N$ (about N/2 edges); etc.

24
Erdos-Renyi Summary
  • A model in which all connections are equally
    likely
  • each of the N(N-1)/2 edges chosen randomly and
    independently
  • As we add edges, a precise sequence of events
    unfolds
  • network acquires a giant component
  • network becomes connected
  • network acquires small diameter
  • etc. etc. etc.
  • Properties appear very suddenly (tipping,
    thresholds)
  • and this is the rule, not the exception!
  • All statements are mathematically precise
  • All happen at or shortly after edge density
    $p = 1/N$
  • very efficient use of edges!
  • But is this how natural networks form?
  • If not, which aspects are unrealistic?
  • maybe all edges are not equally likely

25
The Clustering Coefficient of a Network
  • Let nbr(u) denote the set of neighbors of u in a
    network
  • all vertices v such that the edge (u,v) is in the
    graph
  • The clustering coefficient of u
  • let $k = |\mathrm{nbr}(u)|$ (i.e., number of neighbors of u
    = degree of u)
  • choose(k,2) = max possible # of edges between
    vertices in nbr(u)
  • c(u) = (actual # of edges between vertices in
    nbr(u))/choose(k,2)
  • $0 \le c(u) \le 1$: measure of cliquishness of u's
    neighborhood
  • Clustering coefficient of a graph
  • average of c(u) over all vertices u

[Figure: example vertex u with $k = 4$, $\mathrm{choose}(k,2) = 6$, $c(u) = 4/6 \approx 0.667$]
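The definition translates directly into code. A minimal sketch over an adjacency dict of neighbor sets; returning 0 when a vertex has fewer than two neighbors (so choose(k,2) = 0) is our own convention, since the slide leaves that case undefined. The usage builds a hypothetical graph matching the slide's numbers (k = 4, with 4 of the 6 neighbor pairs linked).

```python
from itertools import combinations

def clustering_coefficient(adj, u):
    """c(u) = (# edges among u's neighbors) / choose(k, 2), where k = deg(u)."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: choose(k, 2) = 0, so treat c(u) as 0
    links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Clustering coefficient of the graph: average of c(u) over all vertices."""
    return sum(clustering_coefficient(adj, u) for u in adj) / len(adj)

edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4), (1, 4)]
adj = {v: set() for v in range(5)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
print(clustering_coefficient(adj, 0))  # 4/6, about 0.667
```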
26
Erdos-Renyi Clustering Coefficient
  • Generate a network G according to Erdos Renyi
    with N, p
  • Examine a typical vertex u in G
  • choose u at random among all vertices in G
  • what do we expect c(u) to be?
  • Answer: exactly p, in expectation!
  • In E-R, typical c(u) entirely determined by
    overall density
  • Baseline for comparison with more clustered
    models
  • Erdos-Renyi has no bias towards clustered or
    local edges
  • Clustering coefficient meaningless in isolation
  • Must compare to the background rate of
    connectivity
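An empirical check of the "exactly p" answer, as a sketch: generate an E-R network and average c(u). Vertices of degree below 2 are skipped here (an assumption, since c(u) is undefined for them); with the arbitrary choices N = 600 and p = 0.05 almost every vertex qualifies.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, rng):
    """Adjacency sets for E-R(N, p)."""
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def mean_local_clustering(adj):
    """Average c(u) over vertices with degree >= 2."""
    values = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
            values.append(links / (k * (k - 1) / 2))
    return sum(values) / len(values)

p = 0.05
adj = erdos_renyi(600, p, random.Random(1))
print(round(mean_local_clustering(adj), 4), p)  # the two numbers should be close
```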

27
Caveman and Solaria
  • Erdos-Renyi
  • sharing a common neighbor makes two vertices no
    more likely to be directly connected than two
    very distant vertices
  • every edge appears entirely independently of
    existing structure
  • But in many settings, the opposite is true
  • you tend to meet new friends through your old
    friends
  • two web pages pointing to a third might share a
    topic
  • two companies selling goods to a third are in
    related industries
  • Watts' Caveman world
  • overall density of edges is low
  • but two vertices with a common neighbor are
    likely connected
  • Watts' Solaria world
  • overall density of edges low; no special bias
    towards local edges
  • like Erdos-Renyi

28
Making it More Precise: the α-model
  • An incremental formation model
  • Pick network size N
  • Throw down a few random seed edges
  • Then for each pair of vertices u and v
  • compute probability of adding edge between u and
    v
  • probability will depend on current network
    structure
  • the more common neighbors u and v have, more
    likely to add edge
  • provide knobs that let us adjust how weak/strong
    the effect is

29
Making it More Precise: the α-model
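[Figure: probability of adding edge (u,v) vs. the number x of common neighbors of u and v, $p + (1-p)(x/N)^\alpha$, shown for smaller $\alpha$, $\alpha = 1$, and larger $\alpha$]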
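The curve above can be turned into a simulation. This is a hypothetical sketch, not Watts' exact procedure: it reads the slide's formula as $p + (1-p)(x/N)^\alpha$ with x the current number of common neighbors, and it assumes candidate pairs are proposed uniformly at random (the first `seeds` proposals are accepted unconditionally as the "random seed edges").

```python
import random

def alpha_edge_prob(x, n, p, alpha):
    """Probability of adding (u, v) when they share x common neighbors:
    p + (1 - p) * (x / N) ** alpha (our reading of the slide's curve)."""
    return p + (1 - p) * (x / n) ** alpha

def alpha_model(n, m, p, alpha, seeds, rng):
    """Hypothetical sketch: propose uniformly random missing pairs (u, v); accept
    the first `seeds` outright, then accept each with alpha_edge_prob, until the
    network has m edges. Needs p > 0, alpha > 0 and seeds <= m <= N(N-1)/2."""
    adj = {u: set() for u in range(n)}
    edges = 0
    while edges < m:
        u, v = rng.sample(range(n), 2)
        if v in adj[u]:
            continue
        x = len(adj[u] & adj[v])  # current number of common neighbors
        if edges < seeds or rng.random() < alpha_edge_prob(x, n, p, alpha):
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj

adj = alpha_model(100, 300, 0.01, 1.0, 10, random.Random(0))
```

Running this for a small and a large α with the same N and m, and comparing clustering coefficients, is the natural experiment; per the next slide, smaller α should give higher clustering.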
30
(No Transcript)
31
Small Worlds and Occam's Razor
  • For small α, should generate large clustering
    coefficients
  • after all, we programmed the model to do so!
  • But we do not want a new model for every little
    property
  • Erdos-Renyi $\to$ small diameter
  • α-model $\to$ high clustering coefficient
  • etc. etc. etc.
  • In the interests of Occam's Razor, we would like
    to find
  • a single, simple model of network generation
  • that simultaneously captures many properties
  • Watts' small world: small diameter and high
    clustering
  • here is a figure showing that this can be
    captured in the α-model

32
An Alternative Model
  • The α-model programmed high clustering into the
    formation process
  • and then we got small diameter for free (at
    certain α)
  • A different model
  • start with all vertices arranged on a ring or
    cycle
  • connect each vertex to all others that are within
    k steps
  • with probability p, rewire each local connection
    to a random vertex
  • Initial cyclical structure models local or
    geographic connectivity
  • Long-distance rewiring models long-distance
    connectivity
  • $p = 0$: high clustering, high diameter
  • $p = 1$: low clustering, low diameter (E-R)
  • In between: look at this simulation
  • Which of these models do you prefer?
  • sociology vs. math
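A Python sketch of the ring-plus-rewiring model. Assumptions not fixed by the slide: each lattice edge (u, u+j) is considered once, and when rewired its far end moves to a uniformly random vertex that is neither u nor already a neighbor of u (so the edge count is preserved). As a sanity check, the p = 0 lattice with k = 3 has 9 of 15 neighbor pairs linked at every vertex, so its clustering is 0.6.

```python
import random
from itertools import combinations

def ring_rewire(n, k, p, rng):
    """Ring of n vertices, each linked to all others within k steps; then each
    lattice edge (u, u+j) is rewired with probability p to (u, w), w random."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            if v in adj[u] and rng.random() < p:
                choices = [w for w in range(n) if w != u and w not in adj[u]]
                if choices:  # avoid self-loops and duplicate edges
                    w = rng.choice(choices)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj

def average_clustering(adj):
    """Average c(u) over all vertices, with c(u) = 0 when deg(u) < 2."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

rng = random.Random(0)
for p in (0.0, 0.01, 0.1, 1.0):
    print(p, round(average_clustering(ring_rewire(500, 3, p, rng)), 3))  # p = 0 gives 0.6
```

Adding a BFS-based average distance to the loop would show the other half of the story: distances should fall quickly with p while clustering stays high for small p.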

33
Meanwhile, Back in the Real World
  • Watts examines three real networks as case
    studies
  • the Kevin Bacon graph
  • the Western states power grid
  • the C. elegans nervous system
  • For each of these networks, he
  • computes its size, diameter, and clustering
    coefficient
  • compares diameter and clustering to best
    Erdos-Renyi approx.
  • shows that the best α-model approximation is
    better
  • important to be fair to each model by finding
    best fit
  • Overall moral
  • if we care only about diameter and clustering, the α-model
    is better than E-R

34
(No Transcript)
35
Case 1: Kevin Bacon Graph
  • Vertices: actors and actresses
  • Edge between u and v if they appeared in a film
    together
  • Here is the data

36
Case 2: Western States Power Grid
  • Vertices: power stations in Western U.S.
  • Edges: high-voltage power transmission lines
  • Here is the network and data

37
Case 3: C. Elegans Nervous System
  • Vertices: neurons in the C. elegans worm
  • Edges: axons/synapses between neurons
  • Here is the network and data

38
Two More Examples
  • M. Newman on scientific collaboration networks
  • coauthorship networks in several distinct
    communities
  • differences in degrees (papers per author)
  • empirical verification of
  • giant components
  • small diameter (mean distance)
  • high clustering coefficient
  • Alberich et al. on the Marvel Universe
  • purely fictional social network
  • two characters linked if they appeared together
    in an issue
  • empirical verification of
  • heavy-tailed distribution of degrees (issues and
    characters)
  • giant component
  • rather small clustering coefficient