Network Statistics - PowerPoint PPT Presentation

Slides: 17
Provided by: gesiner


Network Statistics
  • Gesine Reinert

Yeast protein interactions
Summary statistics
  • Vertex degree distribution (the degree of a
    vertex is the number of vertices connected with
    it via an edge)
  • Clustering coefficient: the average proportion of
    neighbours of a vertex that are themselves
    neighbours
  • Shortest distance between two vertices; also
    average shortest distance, maximal distance,
    average of inverse distance (efficiency)
  • Betweenness of a vertex: the number of shortest
    paths that go through a given vertex (similarly
    for an edge)
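
As an illustration of the first three summary statistics, here is a minimal sketch in plain Python. The adjacency-dictionary representation, the function names and the toy graph are assumptions made for this example, as is the convention that a vertex with fewer than two neighbours contributes 0 to the clustering coefficient.

```python
from collections import deque


def degrees(adj):
    """Degree of each vertex: the number of vertices joined to it by an edge."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def local_clustering(adj, v):
    """Proportion of pairs of neighbours of v that are themselves neighbours."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for vertices with fewer than 2 neighbours
    links = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]]
    )
    return links / (k * (k - 1) / 2)


def clustering_coefficient(adj):
    """Average of the local proportions over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def bfs_distances(adj, source):
    """Shortest distances (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_shortest_distance(adj):
    """Mean distance over pairs of distinct, mutually reachable vertices."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Toy graph (hypothetical): triangle 0-1-2 with a pendant vertex 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degrees(toy))
print(clustering_coefficient(toy))
print(average_shortest_distance(toy))
```

Breadth-first search is enough here because all edges have unit length; betweenness would need the shortest paths themselves and is left out of this sketch.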

Some examples for real networks (in averages)

Network                  Size       Vertex  Shortest  Shortest path in     Clustering  Clustering in
                                    degree  path      fitted random graph              random graph
Film actors              225,226    61      3.65      2.99                 0.79        0.00027
MEDLINE coauthorship     1,520,251  18.1    4.6       4.91                 0.43        1.8 x 10^-4
E. coli substrate graph  282        7.35    2.9       3.04                 0.32        0.026
C. elegans               282        14      2.65      2.25                 0.28        0.05

Underlying model assumptions
  • Network consisting of vertices and edges
  • Randomness in edges
  • Here assume edges undirected, no self-loops, no
    multiple edges

Main model 1: Random graph
  • Bernoulli random graph (Erdős-Rényi 1959, 1960)
  • L vertices, any two connected by an edge with
    probability p, independently of all other pairs
  • The graph need not be connected
  • Phase transition: when the edge probability p(L)
    exceeds about (log L)/L, the random graph becomes
    connected with high probability
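
The connectivity threshold can be explored by simulation. The sketch below generates Bernoulli random graphs just below and just above (log L)/L and records how often they are connected. The function names, the choice L = 200, the factors 0.5 and 2.0, and the number of trials are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def bernoulli_random_graph(L, p, rng):
    """L vertices; each of the L(L-1)/2 pairs is joined independently with probability p."""
    adj = {v: set() for v in range(L)}
    for u in range(L):
        for v in range(u + 1, L):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_connected(adj):
    """Breadth-first search from one vertex; connected iff every vertex is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)


def connected_fraction(L, p, trials, seed=0):
    """Fraction of simulated graphs that turn out to be connected."""
    rng = random.Random(seed)
    hits = sum(is_connected(bernoulli_random_graph(L, p, rng)) for _ in range(trials))
    return hits / trials


L = 200
threshold = math.log(L) / L
below = connected_fraction(L, 0.5 * threshold, trials=20)
above = connected_fraction(L, 2.0 * threshold, trials=20)
print(f"p = 0.5 log(L)/L: connected in {below:.0%} of trials")
print(f"p = 2.0 log(L)/L: connected in {above:.0%} of trials")
```

For finite L the transition is gradual rather than sharp, so values of p close to the threshold give intermediate fractions.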

Main model 2: Watts-Strogatz small world (1998)
  • L vertices, each connected to its m nearest
    neighbours; in addition, random links, each
    present with probability p
  • Originally, rewiring edges instead of adding
    edges was proposed, but then the resulting
    network need not be connected
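
A minimal sketch of the variant with added (rather than rewired) links follows. It assumes the vertices sit on a ring, that "m nearest neighbours" means m on each side, and that every pair not already joined receives a shortcut independently with probability p; the function name and the parameter values are hypothetical.

```python
import random


def small_world(L, m, p, rng):
    """Ring of L vertices with hard-wired links to the m nearest neighbours on
    each side (assumed reading), plus independent shortcuts with probability p."""
    if L <= 2 * m:
        raise ValueError("need L > 2m so that the ring neighbourhoods do not wrap around")
    adj = {v: set() for v in range(L)}
    # Hard-wired links along the ring.
    for v in range(L):
        for step in range(1, m + 1):
            w = (v + step) % L
            adj[v].add(w)
            adj[w].add(v)
    # Shortcuts between pairs that are not already joined.
    for u in range(L):
        for v in range(u + 1, L):
            if v not in adj[u] and rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


L, m, p = 50, 2, 0.02
graph = small_world(L, m, p, random.Random(42))
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
# Under the assumptions above there are L*m ring links and
# L*(L - 2*m - 1)/2 candidate pairs for shortcuts.
print("ring links:", L * m, "shortcuts:", n_edges - L * m)
```

Because links are only added, the ring stays intact and the graph is always connected, which is the advantage over rewiring noted above.
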
Main model 3: Scale-free network
  • Network growth models: start with one vertex; a
    new vertex attaches to existing vertices by
    preferential attachment, i.e. it tends to choose
    a vertex according to its vertex degree
    (Barabási-Albert 1999, Price 1965)
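
The growth rule can be sketched as follows. This is a simplified illustration, not the exact model of the cited papers: it assumes each new vertex makes exactly one link, chosen with probability proportional to the current degree, and that the first arrival attaches to the initial vertex (whose degree is still zero). The function name is hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n, rng):
    """Grow a graph on vertices 0..n-1, one link per new vertex (assumed)."""
    if n < 1:
        raise ValueError("need at least one vertex")
    degree = [0] * n
    edges = []
    # Each vertex appears in this list once per incident edge, so a uniform
    # draw from it picks a vertex with probability proportional to its degree.
    endpoints = []
    for new in range(1, n):
        target = rng.choice(endpoints) if endpoints else 0
        edges.append((new, target))
        endpoints.extend((new, target))
        degree[new] += 1
        degree[target] += 1
    return edges, degree


edges, degree = preferential_attachment(2000, random.Random(7))
histogram = Counter(degree)
for k in sorted(histogram)[:5]:
    print(f"degree {k}: {histogram[k]} vertices")
print("maximum degree:", max(degree))
```

Early vertices tend to accumulate many links, which produces the heavy-tailed degree distribution associated with scale-free networks.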

Watts-Strogatz Small World
  • Amenable to mathematical analysis
  • More realistic than random graphs
  • Shortest path length
  • Motif counts
  • Vertex degrees
  • Predicting links
  • Generalization: hard-wired links only present
    with a certain probability

Shortest path length
  • Put ρ = 2(L-2m-1)p, where p is the probability of
    a shortcut
  • Approximation: the continuous model gives that the
    expected shortest path length is approximately
    (1/ρ) (1/2 log(Lρ) + 0.2886)
  • (The distribution is also available; Barbour and
    Reinert)
  • In the discrete case, the distribution may be
    concentrated on one or two points.

Example: 6 degrees of separation?
  • If the number of vertices is L = 200,000,000, and
    we observe l = 6, then we can estimate ρ as
    approximately 1.54
  • This gives for L = 60,000,000 that the expected
    shortest path length is approximately 5.81
  • For L = 100,000 it gives approximately 3.73
  • For L = 6,500,000,000 it gives approximately 7.33

Motif counts
  • Triangles: relate to the clustering coefficient
  • Cycles: biologically relevant
  • Distributions are approximately compound Poisson
  • Can get joint distribution for cycle counts of
    different lengths (also using compound Poisson)
  • Goal: assess statistical significance of counts
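
To make the idea of assessing significance concrete, the sketch below counts triangles in a graph and compares the count with simulated Bernoulli random graphs of the same size and edge density. This Monte Carlo comparison is a stand-in chosen for illustration; it is not the compound Poisson approximation mentioned above. All function names, the ring-lattice test graph and the number of trials are assumptions.

```python
import random


def count_triangles(adj):
    """Count each triangle u < v < w once (vertex labels must be comparable)."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def bernoulli_random_graph(n, p, rng):
    """n vertices; each pair joined independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def ring_lattice(L, m):
    """Ring in which each vertex is joined to its m nearest neighbours on each side."""
    adj = {v: set() for v in range(L)}
    for v in range(L):
        for step in range(1, m + 1):
            w = (v + step) % L
            adj[v].add(w)
            adj[w].add(v)
    return adj


def triangle_p_value(adj, trials, rng):
    """Monte Carlo p-value for the triangle count under a Bernoulli null model
    with the same number of vertices and the same edge density."""
    n = len(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    p = n_edges / (n * (n - 1) / 2)
    observed = count_triangles(adj)
    at_least = sum(
        count_triangles(bernoulli_random_graph(n, p, rng)) >= observed
        for _ in range(trials)
    )
    return observed, (at_least + 1) / (trials + 1)


observed, p_value = triangle_p_value(ring_lattice(40, 2), trials=200, rng=random.Random(0))
print("observed triangles:", observed)
print("Monte Carlo p-value:", p_value)
```

A small p-value says only that the graph has more triangles than this particular null model would produce; a different null model (for example a small world) can lead to a different conclusion.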

Vertex degrees
  • Random graph superimposed on hard-wired networks
  • Poisson approximation for number of vertices with
    degree at least k, say
  • Normal approximation for joint distribution of
    some vertex degrees
  • Goal: assess scale-free appearance
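
As a simplified illustration of the Poisson approximation, the sketch below uses a plain Bernoulli random graph with no hard-wired part, which is an assumption made to keep the example short. There each degree is Binomial(L-1, p), so the expected number of vertices with degree at least k is λ = L P(Bin(L-1, p) >= k), and a Poisson approximation suggests that the probability of seeing no such vertex is about exp(-λ). Function names and parameter values are hypothetical.

```python
import math
import random


def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def count_high_degree(L, p, k, rng):
    """Simulate one Bernoulli random graph and count vertices with degree >= k."""
    degree = [0] * L
    for u in range(L):
        for v in range(u + 1, L):
            if rng.random() < p:
                degree[u] += 1
                degree[v] += 1
    return sum(1 for d in degree if d >= k)


L, p, k, trials = 100, 0.05, 10, 200
lam = L * binomial_upper_tail(L - 1, p, k)  # exact expected count
rng = random.Random(0)
counts = [count_high_degree(L, p, k, rng) for _ in range(trials)]
simulated_mean = sum(counts) / trials
simulated_zero = sum(1 for c in counts if c == 0) / trials

print(f"expected count (lambda): {lam:.3f}, simulated mean: {simulated_mean:.3f}")
print(f"exp(-lambda): {math.exp(-lam):.3f}, simulated P(count = 0): {simulated_zero:.3f}")
```

Degrees of different vertices are weakly dependent because they share edges, which is why the Poisson statement is an approximation rather than an exact result.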

Predicting links
  • Use Bayesian analysis and biochemical properties
    to predict which proteins might interact
  • Use H. pylori interactions to construct a prior
    for E. coli interactions
  • Assess whether the network has small-world
    structure; if so, use a parametric model

Statistical significance
  • Clustering coefficient, vertex degrees, shortest
    path length are not independent
  • Long-term goal: joint distribution of summary
    statistics to assess whether networks are similar
    or not

Research students
  • Kaisheng Lin (motif counts, metabolic networks,
    vertex degrees)
  • Pao-Yang Chen (protein interaction networks)
  • KimHuat Lim (epidemics on networks)

Collaborators
  • Andrew Barbour (shortest path length)
  • Charlotte Deane (protein interaction networks)
  • Susan Holmes (bottlenecks)