The Probability Mechanics of Social Networks - PowerPoint PPT Presentation

Description:

Center for Computational Analysis of Social and Organizational Systems ... Degrees of nodes (Newman, Scott, Wasserman and Faust) ... – PowerPoint PPT presentation

1
The Probability Mechanics of Social Networks
  • Ian McCulloh
  • Carnegie Mellon University
  • Pittsburgh, PA 15213
  • Joshua Lospinoso
  • United States Military Academy
  • West Point, NY 10996

2
Agenda
  • Background
  • Motivation
  • Probability Spaces
  • Complications with Social Networks
  • Previous work
  • Network Probability Matrices
  • Statistical Distributions
  • Applications
  • References

3
Background
  • Two broad areas of modeling: deterministic and
    stochastic.
  • Differential Equations
  • Predator-prey systems
  • Regression Analysis
  • Economics
  • Design and Analysis of Experiments
  • Social Network Analysis (SNA) should be
    interested in what regression analysis ignores
    and differential equations assume away.

4
Motivation
  • Regression analysis enjoys a century of study and
    rigorous research.
  • The SNA community needs the same rigorous set of
    probability mechanics.
  • Enable application of statistical techniques
  • Error analysis
  • Design of efficient sociological experiments
  • There are approaches currently being refined to
    answer this call, and all rely on assumptions
    about underlying entity and edge behaviors.
  • We wish to provide a framework flexible to a wide
    array of these initial assumptions.

5
What is a probability space?
  • Formalized by the mathematician Andrey
    Kolmogorov, the probability space is the
    foundation of all probability theory
  • A probability space consists of a sample space
    (the set of possible outcomes), an event space
    (a collection of subsets of the sample space),
    and a probability measure on those events
  • (Omega, F, P)
  • Consider the simple example of two flips of a
    coin
  • Assume the two flips are independent and the
    coin is fair, landing either heads or tails.
  • The possible outcomes (the sample space Omega)
    for two flips are as follows, where H is heads,
    T is tails, and outcomes are of the form
    (Trial1, Trial2)
  • (H,H) (H,T) (T,H) (T,T)
  • There are four equally likely outcomes in Omega,
    so the probability (P) of each outcome is
    1/4 = 0.25. The event space (F) can be taken as
    all 16 subsets of Omega.
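The coin example can be enumerated directly. The following is a minimal sketch, assuming a fair coin and independent flips; the names `omega`, `prob`, and `probability` are illustrative and do not come from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair of results for two flips.
omega = list(product("HT", repeat=2))

# Assumption: fair coin and independent flips, so outcomes are equally likely.
prob = {outcome: Fraction(1, len(omega)) for outcome in omega}

def probability(event):
    """P(event) for an event given as a set of outcomes (a subset of omega)."""
    return sum(prob[o] for o in event)

# Example event: at least one head in the two flips.
at_least_one_head = {o for o in omega if "H" in o}

print(len(omega))
print(probability(at_least_one_head))
```

Any subset of `omega` can be passed to `probability`, which is the sense in which the event space contains every subset of the sample space.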

6
Probability Spaces and Social Network Analysis
  • At its core, a social network is stochastically
    arrayed, because human behavior governs its
    construction and maintenance.
  • If we accept the notion that there is a
    probability p that two nodes form an edge (given
    sufficient conditions), adjacency matrices must
    have probability spaces.
  • These probability spaces must be explored to
    truly understand and research dynamic networks.

7
Definitions
  • The following terms depend largely on the domain
    and convenience
  • Vertices (aka nodes, entities)
  • Edges (relationships)
  • Adjacency matrix: a data structure that holds
    edge information
  • Weighted, directed

8
Complications
  • Simulation can sometimes exhaustively explore the
    probability space of a system.
  • Unfortunately, graphs get extremely large very
    quickly.
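  • Let n = number of nodes and e = number of edges
  • Fixing n and e, the number of possible directed
    graphs (network structures) without self-loops
    is $\binom{n(n-1)}{e}$
  • And if we only fix n, the number of possible
    graphs is given by $2^{n(n-1)}$
  • For example, a network of 30 nodes has
    $2^{870} \approx 7.87 \times 10^{261}$
    configurations.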
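The 30-node figure can be checked with exact integer arithmetic. This sketch assumes directed graphs without self-loops, which is the reading under which a 30-node network has about 7.87 x 10^261 configurations; the function names are illustrative.

```python
from math import comb

def possible_edges(n):
    """Number of ordered node pairs (directed edges, no self-loops)."""
    return n * (n - 1)

def graphs_with_e_edges(n, e):
    """Number of directed graphs on n labeled nodes with exactly e edges."""
    return comb(possible_edges(n), e)

def graphs_total(n):
    """Number of directed graphs on n labeled nodes with any number of edges."""
    return 2 ** possible_edges(n)

total = graphs_total(30)
exponent = len(str(total)) - 1      # power of ten of the leading digit
mantissa = total / 10 ** exponent   # leading digits as a float

print(f"{mantissa:.2f} x 10^{exponent}")
```

Summing `graphs_with_e_edges(n, e)` over every possible `e` gives `graphs_total(n)`, which ties the two counts on this slide together.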

9
Previous Work with Random Graphs and Social
Networks
  • Based on assumptions about node and edge
    behavior, researchers have postulated about how
    dynamic networks array themselves (termed random
    graphs)
  • Degrees of nodes (Newman, Scott, Wasserman and
    Faust)
  • If each p in an adjacency matrix is equal, then
    the degree of each node as well as network size
    follow a binomial distribution which
    asymptotically approaches a Poisson distribution.
  • Empirical work shows that the equal p assumption
    is too strong for many applications
  • Scale-free graphs (Yule-Simon distribution) such
    as the internet
  • McCulloh et al. (2007)
  • Small world, six degrees of separation (Travers
    and Milgram 1969)
  • Watts and Strogatz (1998) propose the clustering
    coefficient to explore scale-free networks
  • Translation: consider the neighborhood of a node
    i (consists of each node, or neighbor, directly
    connected to i). The clustering coefficient of
    node i is the ratio of connections among its
    neighborhood to the total number of possible
    connections in its neighborhood
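The verbal definition of the clustering coefficient maps onto a few lines of code. This is a minimal sketch for an undirected graph stored as a dictionary of neighbor sets; the toy graph and the convention of returning 0 for nodes with fewer than two neighbors are assumptions made for illustration.

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """Local clustering coefficient of node i in an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs, report 0
    # Count the neighbor pairs that are themselves connected.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    # Divide by the number of possible pairs among the k neighbors.
    return links / (k * (k - 1) / 2)

# Toy graph: a triangle a-b-c plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering_coefficient(adj, "a"))
```

Node `a` has three neighbors and only the pair (b, c) is connected, so one of three possible neighbor pairs is present.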

10
Previous Work with Probability Spaces and Social
Networks
  • Albert and Barabasi (2002)
  • Using varied datasets, Albert and Barabasi show
    that empirical social networks have higher
    clustering coefficients than random networks of
    equal dimensions.
  • Many empirical social networks follow a power-law
    statistical distribution (as measured by node
    degree)
  • The verdict is still out: is demonstrating that
    node degree follows a power-law distribution
    sufficient to apply scale-free properties?
  • Instead of analyzing the degree distribution, we
    propose that it may be advantageous to estimate
    the stochastic process that dynamically generates
    degree over time.

11
Statistical Distributions and Social Networks
  • We introduce the idea of a Network Probability
    Matrix (NPM), which describes a network of size
    N = 15

12
Statistical Distributions and Social Networks
  • To illustrate how our original NPM may be applied
    to the scale-free question:
  • This NPM models three groups which interact
    within their neighborhood with 80% probability
    and outside their neighborhood with 20%
    probability.
  • The clustering coefficient is .463 for this
    graph, compared to a clustering coefficient of
    .329 for an NPM with equal probabilities
    (according to a Monte-Carlo simulation of the
    previous NPM).
  • The conclusion should be intuitive and obvious.
    Instead of starting with a theoretical NPM, we
    would like to generate NPMs from real world data
    to draw similar conclusions.
  • So what do Network Probability Matrices (NPMs)
    look like in real dynamic networks?
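A Monte Carlo comparison of this kind can be sketched as follows. Several details are assumptions, since the slides do not give them: 15 nodes split into three equal groups, undirected edges drawn independently, and an equal-probability NPM that uses the average off-diagonal probability of the structured one. The sketch is not expected to reproduce the exact figures quoted above, only the direction of the difference.

```python
import random
from itertools import combinations

def block_npm(n=15, groups=3, p_in=0.8, p_out=0.2):
    """NPM with p_in inside each group and p_out between groups."""
    size = n // groups
    return [[0.0 if i == j else (p_in if i // size == j // size else p_out)
             for j in range(n)] for i in range(n)]

def sample_adjacency(npm, rng):
    """Draw one undirected graph: edge {i, j} appears with probability npm[i][j]."""
    n = len(npm)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < npm[i][j]:
            adj[i][j] = adj[j][i] = 1
    return adj

def mean_clustering(adj):
    """Average local clustering coefficient (nodes with < 2 neighbors count as 0)."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        k = len(nbrs)
        if k >= 2:
            links = sum(adj[u][v] for u, v in combinations(nbrs, 2))
            total += links / (k * (k - 1) / 2)
    return total / n

def monte_carlo_clustering(npm, runs=500, seed=0):
    """Mean clustering coefficient over many graphs sampled from the NPM."""
    rng = random.Random(seed)
    return sum(mean_clustering(sample_adjacency(npm, rng)) for _ in range(runs)) / runs

structured = block_npm()
n = len(structured)
p_bar = sum(map(sum, structured)) / (n * (n - 1))  # average edge probability
uniform = [[0.0 if i == j else p_bar for j in range(n)] for i in range(n)]

print(monte_carlo_clustering(structured), monte_carlo_clustering(uniform))
```

Holding the average edge probability fixed isolates the effect of group structure, so any gap between the two printed values comes from how the probability mass is arranged rather than from how many edges there are.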

13
Distribution Theory and Social Networks
  • We can use statistical distributions to generate
    adjacency matrices over time.
  • One source of variation in SNA experiments is
    that researchers must choose how to define edges
    in an interaction matrix.
  • McCulloh et al. (2007) studied email traffic to
    analyze shifts in network structure.
  • In order to define edges, the researchers had to
    decide on what blocks of time to analyze.
  • We looked at the email data from this experiment
    and fitted well known statistical distributions
    using parameter estimation techniques to each
    directed edge.
  • All of the arrival times were log-normally
    distributed

14
Distribution Theory and Social Networks
  • The probability that an email is sent from i to j
    within some period of time t is
    $p_{ij}(t) = \int_0^t f_{ij}(s)\,ds$
  • (p, as a function of t, is a CDF; f is the PDF
    that best fits cell ij in an NPM)
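Since the fitted distributions were reported as log-normal, the probability for one cell can be evaluated with the log-normal CDF. This is a minimal sketch; the parameter values `mu_ij` and `sigma_ij` and the time unit are hypothetical and are not taken from the email data.

```python
from math import erf, log, sqrt

def lognormal_cdf(t, mu, sigma):
    """P(T <= t) for a log-normal T with log-scale parameters mu and sigma."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + erf((log(t) - mu) / (sigma * sqrt(2.0))))

# Hypothetical fitted parameters for a single cell ij (time in hours).
mu_ij, sigma_ij = 1.0, 0.5

for t in (1.0, 4.0, 24.0):
    print(t, lognormal_cdf(t, mu_ij, sigma_ij))
```

Evaluating the function for every cell at a chosen t would turn a matrix of fitted distributions into a matrix of edge probabilities for that time window.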

15
Distribution Theory and Social Networks
  • The probability that two emails are sent from i
    to j within some period of time t is

16
Distribution Theory and Social Networks
  • The probability that x emails are sent from i to
    j within some period of time t is
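The closed-form expression for this slide is not available in the transcript. One way to approximate such a probability is by simulation. The sketch below rests on assumptions that the slides do not state: inter-arrival times are independent and identically log-normal, the clock starts at zero, and "x emails within t" is read as "at least x emails by time t". The parameter values are hypothetical.

```python
import random

def prob_at_least_x(x, t, mu, sigma, runs=20000, seed=0):
    """Monte Carlo estimate of P(at least x emails are sent by time t).

    Assumes i.i.d. log-normal inter-arrival times with parameters mu, sigma.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Time of the x-th email is the sum of x inter-arrival times.
        elapsed = sum(rng.lognormvariate(mu, sigma) for _ in range(x))
        if elapsed <= t:
            hits += 1
    return hits / runs

for x in (1, 2, 3):
    print(x, prob_at_least_x(x, t=6.0, mu=1.0, sigma=0.5))
```

Under this reading, the estimate for x = 1 approximates the CDF of a single inter-arrival time, which matches the one-email case.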

17
From NPM to Adjacency
  • An adjacency matrix is normally treated as a
    structure of scalar values.
  • It is imperative to understand that an adjacency
    matrix is a function of many elements, including
    definitional considerations (weighted, directed)
    and time.
  • If we accept the notion of an NPM, the adjacency
    matrix is a structure of random variables.
  • Analogy: you may remember from stochastic
    processes that if inter-arrival times are
    exponentially distributed, then the number of
    arrivals in a given interval is Poisson
    distributed.
  • The NPM can be regarded as the exponential
    distribution and the adjacency matrix as the
    Poisson in this case.
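The exponential-to-Poisson relationship in the analogy can be checked by simulation. This is a minimal sketch; the rate, the interval length, and the number of runs are arbitrary illustrative values.

```python
import random
from math import exp, factorial

def count_arrivals(rate, horizon, rng):
    """Count arrivals in [0, horizon] when inter-arrival times are exponential."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return count
        count += 1

def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)

rng = random.Random(0)
rate, horizon, runs = 2.0, 3.0, 20000
counts = [count_arrivals(rate, horizon, rng) for _ in range(runs)]

# Compare simulated frequencies with the Poisson(rate * horizon) probabilities.
for k in range(0, 13, 3):
    print(k, counts.count(k) / runs, round(poisson_pmf(k, rate * horizon), 4))
```

The time-to-event distribution plays the role of the NPM cell, and the resulting count in a window plays the role of the adjacency matrix entry.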

18
From NPM to Adjacency
  • Analytically, the distribution of an adjacency
    matrix can be derived from an NPM, but doing so
    will require future research of varying
    complexity.
  • Simulations are much more practical for applied
    work and can be constructed for very specific
    applications. We are developing an extension of
    this for a variety of applications (sampling
    distributions for network measures, network
    perturbations, etc.)
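As an illustration of the simulation route, the sketch below builds an empirical sampling distribution for one simple network measure (density) from an NPM and reads off a percentile interval. It is not the authors' extension; the directed, independent-edge model, the choice of measure, and the 10-node NPM are assumptions made for the example.

```python
import random

def sample_density(npm, rng):
    """Density of one directed graph sampled from the NPM (no self-loops)."""
    n = len(npm)
    edges = sum(1 for i in range(n) for j in range(n)
                if i != j and rng.random() < npm[i][j])
    return edges / (n * (n - 1))

def density_interval(npm, runs=2000, level=0.95, seed=0):
    """Percentile interval for density from repeated sampling of the NPM."""
    rng = random.Random(seed)
    draws = sorted(sample_density(npm, rng) for _ in range(runs))
    lo_index = round((1 - level) / 2 * runs)
    hi_index = round((1 + level) / 2 * runs) - 1
    return draws[lo_index], draws[hi_index]

# Hypothetical NPM: 10 nodes, every directed edge has probability 0.3.
npm = [[0.0 if i == j else 0.3 for j in range(10)] for i in range(10)]
print(density_interval(npm))
```

Swapping `sample_density` for another measure gives the corresponding sampling distribution without any change to the interval logic.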

19
Why does it matter to you?
  • Understanding the probability space of the
    adjacency matrix and how it relates to the
    definition of an edge aids both applied and
    theoretical research
  • Hypothesis testing/confidence intervals on
    network measures
  • Mitigation of time chunking considerations by
    researchers.
  • Organizational Simulations
  • Error analysis (in measurement)
  • A flexible framework for analytics based on
    myriad initial assumptions

20
Future work
  • We are currently refining an algorithmic approach
    to determining the sampling distributions of
    network measures given an NPM. We are interested
    in a practical approach to creating sampling
    distributions of network measures over time.
  • An application of this technique to McCulloh's
    IkeNET data is undergoing final revision
  • Developing the closed form relationships between
    NPMs and adjacency matrices could provide a
    platform for exploring random graphs.

21
Back Ups
  • Works Cited
  • Albert, R. and Barabási, A. (2002) Statistical
    Mechanics of Complex Networks. Reviews of Modern
    Physics, 74, 47-97.
  • Barabási, A. and Albert, R. (1999) Emergence of
    Scaling in Random Networks. Science, 286,
    509-512.
  • Barabási, A. (2003) Linked: How Everything is
    Connected to Everything Else and What It Means
    for Business, Science, and Everyday Life. Plume,
    New York. ISBN 0-452-28439-2
  • Barabási, A. (2003) Scale-Free Networks.
    Scientific American, 288, 60-69.
  • Dorogovtsev, S.N. and Mendes, J.F.F. (2003)
    Evolution of Networks: From Biological Networks
    to the Internet and WWW. Oxford University Press.
    ISBN 0-19-851590-1
  • Dorogovtsev, S.N., Mendes, J.F.F. and
    Samukhin, A.N. (2000) Structure of Growing
    Networks: Exact Solution of the
    Barabási-Albert's Model. Physical Review
    Letters, 85, 4633.
  • Erdős, P. and Rényi, A. (1960) On the Evolution
    of Random Graphs. Mathematical Institute of the
    Hungarian Academy of Science, 5, 17-61.
  • Faloutsos, M., Faloutsos, P. and Faloutsos, C.
    (1999) On Power-Law Relationships of the
    Internet Topology. Computer Communication Review,
    29, 251.
  • Guare, J. (1990) Six Degrees of Separation: A
    Play. Vintage Books, New York.
  • Milgram, S. (1967) The Small World Problem.
    Psychology Today, 2, 60-67.
  • Newman, M. (2005) The Mathematics of Complex
    Networks. Unpublished paper.
  • Newman, M. (2003) The Structure and Function of
    Complex Networks. SIAM Review, 45(2), 167-256.
  • Travers, J. and Milgram, S. (1969) An
    Experimental Study of the Small World Problem.
    Sociometry, 32(4), 425-443.
  • Watts, D.J. and Strogatz, S.H. (1998) Collective
    Dynamics of Small-World Networks. Nature,
    393(6684), 440-442.

22
Statistical Distributions
  • Statistical distributions of stochastic processes
    are rarely known in practice, and often assumed
    to be normal
  • Statistical distributions can be estimated by
    using a variety of techniques (Maximum Likelihood
    Estimation, Least Squares Estimation, Method of
    Moments)
  • If we know the distributions associated with a
    stochastic process, a whole world of statistical
    tools is made available.

23
Statistical Distributions
  • For a random variable X
  • A probability density function f(x) defines the
    relative likelihood that X takes on a certain
    value x.
  • A cumulative distribution function F(x) defines
    the probability (from 0 to 1) that X takes on a
    value less than or equal to x.
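For a continuous random variable, the two functions are linked by a standard identity. The symbols x (a particular value) and u (a dummy integration variable) are introduced here for illustration.

```latex
F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du,
\qquad
f(x) = \frac{d}{dx} F(x)
```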

24
Distribution Fitting
  • Maximum Likelihood Estimation
  • Maximizes the likelihood function (the product
    of the probability density function evaluated at
    each empirical data point, for a given set of
    parameters)
  • Least squares estimation
  • Minimizes the sum of squared error between the
    empirical distribution function and the
    cumulative distribution function for a given set
    of parameters
  • Method of Moments Estimation
  • Fits a set of moment equations (number equal to
    the number of parameters required) by solving the
    set for the parameters and substituting sample
    statistics for the moments.
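Two of these techniques have closed-form solutions for the log-normal distribution, which makes them easy to compare. The sketch below fits synthetic data, not the email data from the study; the true parameters (1.0 and 0.5) and the sample size are arbitrary choices for illustration.

```python
import random
from math import log, sqrt
from statistics import fmean, pvariance

def lognormal_mle(data):
    """Maximum likelihood fit: mean and population std of the log-data."""
    logs = [log(x) for x in data]
    mu = fmean(logs)
    sigma = sqrt(pvariance(logs, mu))
    return mu, sigma

def lognormal_mom(data):
    """Method of moments fit: match the sample mean and variance to
    E[X] = exp(mu + sigma^2 / 2) and Var[X] = (exp(sigma^2) - 1) * E[X]^2."""
    m = fmean(data)
    v = pvariance(data, m)
    sigma2 = log(1.0 + v / m ** 2)
    return log(m) - sigma2 / 2.0, sqrt(sigma2)

# Synthetic sample with known parameters, used only to exercise the fits.
rng = random.Random(0)
data = [rng.lognormvariate(1.0, 0.5) for _ in range(5000)]

print("MLE:", lognormal_mle(data))
print("MoM:", lognormal_mom(data))
```

Both estimators should land near the parameters used to generate the sample, with the method of moments typically somewhat noisier because it depends on the heavy right tail.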