The Probability Mechanics of Social Networks - PowerPoint PPT Presentation

1
The Probability Mechanics of Social Networks

Ian McCulloh
Carnegie Mellon University
Pittsburgh, PA 15213
Joshua Lospinoso
United States Military Academy
West Point, NY 10996

2
Agenda

Background
Motivation
Probability Spaces
Complications with Social Networks
Previous work
Network Probability Matrices
Statistical Distributions
Applications
References

3
Background

Two broad areas of modeling: deterministic and
stochastic.
Differential Equations
Predator-prey systems
Regression Analysis
Economics
Design and Analysis of Experiments
Social Network Analysis (SNA) should be
interested in what regression analysis ignores
and differential equations assume away.

4
Motivation

Regression analysis enjoys a century of study and
rigorous research.
SNA community needs the same rigorous set of
probability mechanics.
Enable application of statistical techniques
Error analysis
Design of efficient sociological experiments
There are approaches currently being refined to
answer this call, and all rely on assumptions
about underlying entity and edge behaviors.
We wish to provide a framework flexible enough to
accommodate a wide array of these initial assumptions.

5
What is a probability space?

Introduced by the mathematician Andrey Kolmogorov,
the probability space is the foundation of all
probability theory
A probability space is a triple $(\Omega, \mathcal{F}, P)$:
a sample space $\Omega$ of possible outcomes, a
collection $\mathcal{F}$ of events (subsets of $\Omega$),
and a probability measure $P$ that assigns a
probability to each event
Consider the simple example of two flips of a
coin
Assume the two flips are independent and the coin
is fair, landing either heads or tails.
The possible outcomes for two flips are as follows,
where H is heads, T is tails, and outcomes are of
the form (Trial1, Trial2)
(H,H) (H,T) (T,H) (T,T)
These four outcomes make up the sample space
$\Omega$, so the probability $P$ of each outcome is
1/4 = 0.25
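A short Python sketch of the two-flip example, assuming a fair coin. It enumerates the sample space Ω, builds the event space F as the power set of Ω, and defines P by summing the probabilities of the outcomes in an event. The helper names (`power_set`, `P`) are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space Omega: every ordered pair (Trial1, Trial2) of H/T.
omega = list(product("HT", repeat=2))

# Fair coin, independent flips: each outcome has probability 1/2 * 1/2.
prob = {outcome: Fraction(1, 2) * Fraction(1, 2) for outcome in omega}

def power_set(items):
    """Event space F: every subset of the sample space."""
    subsets = [[]]
    for item in items:
        subsets += [s + [item] for s in subsets]
    return subsets

events = power_set(omega)

def P(event):
    """Probability measure: sum the probabilities of the outcomes in the event."""
    return sum((prob[o] for o in event), Fraction(0))

at_least_one_head = [o for o in omega if "H" in o]
print(omega)                   # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(len(events))             # 16
print(P([("H", "H")]))         # 1/4
print(P(at_least_one_head))    # 3/4
```

Using `Fraction` keeps the probabilities exact, so P(Ω) sums to exactly 1.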

6
Probability Spaces and Social Network Analysis

At its core, a social network is stochastic,
because human behavior governs its construction
and maintenance.
If we accept the notion that there is a
probability p that two nodes form an edge (given
sufficient conditions), then an adjacency matrix
is a random object with its own probability space.
These probability spaces must be explored to
truly understand and research dynamic networks.

7
Definitions

The following terms vary largely with the domain
and with convenience
Vertices (aka nodes, entities)
Edges (relationships)
Adjacency matrix: a data structure that holds
edge information
Weighted, directed
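To show how these definitional choices change the adjacency matrix, here is a minimal Python sketch built from a hypothetical email log of (sender, recipient) pairs. The log and the variable names are illustrative assumptions. The weighted, directed matrix counts messages. The binary and undirected versions are derived from it.

```python
# Hypothetical email log: (sender, recipient) pairs among 4 people.
emails = [(0, 1), (0, 1), (1, 2), (2, 0), (3, 1), (0, 1)]
n = 4

# Weighted, directed: cell [i][j] counts messages sent from i to j.
weighted = [[0] * n for _ in range(n)]
for i, j in emails:
    weighted[i][j] += 1

# Binary, directed: an edge exists if at least one message was sent.
binary = [[1 if w > 0 else 0 for w in row] for row in weighted]

# Binary, undirected: an edge exists if either person wrote to the other.
undirected = [[1 if binary[i][j] or binary[j][i] else 0 for j in range(n)]
              for i in range(n)]

for row in weighted:
    print(row)
```

The same log yields three different matrices. This is one reason the definition of an edge matters for any downstream measure.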

8
Complications

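Simulation can sometimes exhaustively explore the
probability space of a system.
Unfortunately, the number of possible graphs grows
extremely quickly.
n = number of nodes and e = number of possible edges
Fixing n and e, the number of possible graphs
(network structures) is $2^{e}$, since each
possible edge is either present or absent
And if we fix only n, the number of possible
directed graphs without self-loops is given by
$2^{n(n-1)}$, since $e = n(n-1)$
For example, a network of 30 nodes has
$2^{870} \approx 7.87 \times 10^{261}$ configurations.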
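The count can be checked directly with Python's arbitrary-precision integers. This sketch assumes binary edges, as in the example above, and treats the network as directed without self-loops. That assumption is what reproduces the 30-node figure. The function names are hypothetical.

```python
def possible_edges(n: int, directed: bool = True) -> int:
    """Number of possible edges among n nodes, excluding self-loops."""
    return n * (n - 1) if directed else n * (n - 1) // 2

def possible_graphs(e: int) -> int:
    """Each of the e possible edges is either present or absent: 2**e graphs."""
    return 2 ** e

e = possible_edges(30)        # 870 possible directed edges
count = possible_graphs(e)    # exact integer
digits = str(count)
print(e, len(digits))                                       # 870 262
print(f"{digits[0]}.{digits[1:3]} x 10^{len(digits) - 1}")  # 7.87 x 10^261
```

An undirected 30-node network has only 435 possible edges. Even so, it still has about 10^131 configurations, far too many to enumerate.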

9
Previous Work with Random Graphs and Social
Networks

Based on assumptions about node and edge
behavior, researchers have proposed models of how
dynamic networks arrange themselves (termed random
graphs)
Degrees of nodes (Newman, Scott, Wasserman and
Faust)
If each p in an adjacency matrix is equal, then
the degree of each node, as well as the network
size (total number of edges), follows a binomial
distribution, which approaches a Poisson
distribution as the number of nodes grows with
the mean degree held fixed.
Empirical work shows that the equal p assumption
is too strong for many applications
Scale-free graphs (Yule-Simon distribution), such
as the Internet
McCulloh et al. 2007
Small world, six degrees of separation (Travers
and Milgram 1969)
Watts and Strogatz (1998) proposed the clustering
coefficient to explore small-world networks
Translation: consider the neighborhood of a node
i (each node, or neighbor, directly connected to
i). The clustering coefficient of node i is the
ratio of the number of edges among its neighbors
to the total number of possible edges among them
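The verbal definition above translates into a few lines of Python. This sketch assumes an undirected 0/1 adjacency matrix. It also assumes one convention: a node with fewer than two neighbors gets a coefficient of 0, although some authors leave it undefined. The 5-node network is hypothetical.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Edges present among i's neighbors divided by k*(k-1)/2 possible edges."""
    neighbors = [j for j, a in enumerate(adj[i]) if a and j != i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention for degree < 2 (an assumption)
    links = sum(1 for u, v in combinations(neighbors, 2) if adj[u][v])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, i) for i in range(len(adj))) / len(adj)

# Hypothetical network: triangle 0-1-2, plus node 3 tied to nodes 0 and 4.
adj = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(local_clustering(adj, 0))  # 1 of 3 neighbor pairs linked: 0.333...
print(local_clustering(adj, 1))  # 1.0
print(average_clustering(adj))   # (1/3 + 1 + 1 + 0 + 0) / 5
```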

10
Previous Work with Probability Spaces and Social
Networks

Albert and Barabasi (2002)
Using varied datasets, Albert and Barabasi show
that empirical social networks have higher
clustering coefficients than random networks of
equal dimensions.
Many empirical social networks follow a power-law
statistical distribution (as measured by node
degree)
The verdict is still out: is demonstrating that
node degree follows a power-law distribution
sufficient to conclude that a network has
scale-free properties?
Instead of analyzing the degree distribution, we
propose that it may be advantageous to estimate
the stochastic process that dynamically generates
degree over time.

11
Statistical Distributions and Social Networks

We introduce the idea of a Network Probability
Matrix (NPM), which describes a network of size
N = 15

12
Statistical Distributions and Social Networks

To illustrate how our original NPM may be applied
to the scale-free question:
This NPM models three groups whose members interact
within their own neighborhood with 80% probability
and outside it with 20% probability.
The clustering coefficient is .463 for this
graph, compared to a clustering coefficient of
.329 for an NPM with equal probabilities
(according to a Monte Carlo simulation of the
previous NPM).
The conclusion should be intuitive and obvious.
Instead of starting with a theoretical NPM, we
would like to generate NPMs from real world data
to draw similar conclusions.
So what do Network Probability Matrices (NPMs)
look like in real dynamic networks?
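The Monte Carlo comparison above can be sketched as follows. Each off-diagonal NPM cell is treated as an independent Bernoulli probability. The sketch draws undirected adjacency matrices and averages the clustering coefficient over many draws. Several details are assumptions: the groups are taken to be three groups of five nodes, the graphs are undirected, and the equal-probability NPM uses the block NPM's mean off-diagonal probability. Because of these choices, the estimates need not match .463 and .329.

```python
import random
from itertools import combinations

def block_npm(sizes, p_in, p_out):
    """NPM where same-group pairs connect with p_in and all others with p_out."""
    group = [g for g, size in enumerate(sizes) for _ in range(size)]
    n = len(group)
    return [[0.0 if i == j else (p_in if group[i] == group[j] else p_out)
             for j in range(n)] for i in range(n)]

def sample_adjacency(npm, rng):
    """One undirected 0/1 matrix: each pair i<j is Bernoulli(npm[i][j])."""
    n = len(npm)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < npm[i][j]:
            adj[i][j] = adj[j][i] = 1
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for i in range(len(adj)):
        nbrs = [j for j, a in enumerate(adj[i]) if a]
        k = len(nbrs)
        if k >= 2:
            links = sum(adj[u][v] for u, v in combinations(nbrs, 2))
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

def monte_carlo_clustering(npm, trials=1000, seed=1):
    rng = random.Random(seed)
    draws = (average_clustering(sample_adjacency(npm, rng)) for _ in range(trials))
    return sum(draws) / trials

block = block_npm([5, 5, 5], p_in=0.8, p_out=0.2)
n = len(block)
# Equal-probability NPM matched to the block NPM's mean edge probability (assumption).
p_bar = sum(block[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
uniform = [[0.0 if i == j else p_bar for j in range(n)] for i in range(n)]

print(round(monte_carlo_clustering(block), 3), round(monte_carlo_clustering(uniform), 3))
```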

13
Distribution Theory and Social Networks

We can use statistical distributions to generate
adjacency matrices over time.
One source of variation in SNA experiments is
that researchers must choose how to define edges
in an interaction matrix.
McCulloh et al. (2007) studied email traffic to
analyze shifts in network structure.
In order to define edges, the researchers had to
decide on what blocks of time to analyze.
We looked at the email data from this experiment
and, using parameter estimation techniques, fitted
well-known statistical distributions to the
arrival times on each directed edge.
All of the arrival times were log-normally
distributed.
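For a log-normal distribution, maximum likelihood fitting has a closed form: take logarithms and compute their mean and their standard deviation with divisor n. The Python sketch below applies this to a synthetic stand-in for one directed edge's email timings. The data are simulated here with assumed parameters (1.0, 0.5), not taken from the study.

```python
import math
import random

def fit_lognormal_mle(times):
    """Closed-form log-normal MLE: mean and (1/n) std. deviation of log(times)."""
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)  # MLE divides by n
    return mu, sigma

# Synthetic timings for one hypothetical edge (hours); NOT the study's data.
rng = random.Random(42)
times = [rng.lognormvariate(1.0, 0.5) for _ in range(5000)]

mu_hat, sigma_hat = fit_lognormal_mle(times)
print(round(mu_hat, 2), round(sigma_hat, 2))  # close to the assumed 1.0 and 0.5
```

Repeating the fit for every directed edge gives one (mu, sigma) pair per NPM cell.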

14
Distribution Theory and Social Networks

The probability that an email is sent from i to j
within some period of time t is
$p_{ij}(t) = F_{ij}(t) = \int_0^t f_{ij}(s)\,ds$
(p, as a function of t, is a CDF; f is the PDF
that best fits cell ij in an NPM)
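If the best-fitting PDF for cell ij is log-normal, the probability of an email within time t is that distribution's CDF evaluated at t. This Python sketch computes the CDF with `math.erf` and checks it against a midpoint-rule integral of the PDF from 0 to t. The parameter values `mu_ij` and `sigma_ij` are hypothetical.

```python
import math

def lognormal_pdf(s, mu, sigma):
    """Log-normal density at s > 0."""
    return math.exp(-((math.log(s) - mu) ** 2) / (2 * sigma ** 2)) / (
        s * sigma * math.sqrt(2 * math.pi))

def lognormal_cdf(t, mu, sigma):
    """P(T <= t) for a log-normal T with log-scale parameters mu, sigma."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def integrate_pdf(t, mu, sigma, steps=20000):
    """Midpoint-rule approximation of the integral of the PDF from 0 to t."""
    h = t / steps
    return sum(lognormal_pdf((k + 0.5) * h, mu, sigma) for k in range(steps)) * h

# Hypothetical fitted parameters for cell ij (time in hours).
mu_ij, sigma_ij = 1.0, 0.5
for t in (1.0, math.exp(1.0), 8.0):
    print(t, round(lognormal_cdf(t, mu_ij, sigma_ij), 3),
          round(integrate_pdf(t, mu_ij, sigma_ij), 3))
```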

15
Distribution Theory and Social Networks

The probability that two emails are sent from i
to j within some period of time t is

16
Distribution Theory and Social Networks

The probability that x emails are sent from i to
j within some period of time t is
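One way to estimate the probability of two emails, or of x emails, within time t is simulation. The Python sketch below assumes that the gaps between successive emails on edge ij are independent draws from the fitted log-normal, starting at time 0. This renewal-process assumption is ours. The sketch then estimates the distribution of the count N(t), from which both P(N(t) = x) and P(N(t) ≥ x) follow. The parameters and the 10-hour window are hypothetical.

```python
import random
from collections import Counter

def count_in_window(t, draw_gap):
    """Number of arrivals in [0, t] when successive gaps come from draw_gap()."""
    elapsed, count = draw_gap(), 0
    while elapsed <= t:
        count += 1
        elapsed += draw_gap()
    return count

def count_distribution(t, mu, sigma, trials=20000, seed=7):
    """Monte Carlo estimate of P(N(t) = x), assuming i.i.d. log-normal gaps."""
    rng = random.Random(seed)

    def draw_gap():
        return rng.lognormvariate(mu, sigma)

    counts = Counter(count_in_window(t, draw_gap) for _ in range(trials))
    return {x: c / trials for x, c in sorted(counts.items())}

# Hypothetical parameters for cell ij and a 10-hour window.
dist = count_distribution(t=10.0, mu=1.0, sigma=0.5)
p_exactly_two = dist.get(2, 0.0)
p_at_least_two = sum(p for x, p in dist.items() if x >= 2)
print(dist)
print(round(p_exactly_two, 3), round(p_at_least_two, 3))
```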

17
From NPM to Adjacency

An adjacency matrix is normally treated as a
structure of scalar values.
It is imperative to understand that an adjacency
matrix is a function of many elements, including
definitional considerations (weighted, directed)
and time.
If we accept the notion of an NPM, the adjacency
matrix is a structure of random variables.
Analogy: you may remember from stochastic
processes that if interarrival times are
exponentially distributed, then the number of
arrivals in a given interval is Poisson
distributed.
The NPM can be regarded as the exponential
distribution and the adjacency matrix as the
Poisson in this case.
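The exponential-to-Poisson relationship in the analogy can be checked numerically. This Python sketch draws Exponential(rate) interarrival times and counts the arrivals in [0, t]. It then compares the empirical count distribution with the Poisson probability mass function with mean rate × t. The rate and window are hypothetical.

```python
import math
import random

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def simulate_counts(rate, t, trials, seed=3):
    """Count arrivals in [0, t] when interarrival times are Exponential(rate)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        elapsed = rng.expovariate(rate)
        k = 0
        while elapsed <= t:
            k += 1
            elapsed += rng.expovariate(rate)
        counts.append(k)
    return counts

rate, t, trials = 2.0, 3.0, 50000   # hypothetical: 2 emails per hour over 3 hours
counts = simulate_counts(rate, t, trials)
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(round(mean, 2), round(var, 2))   # both should be near rate * t = 6
for k in range(4, 9):
    print(k, round(counts.count(k) / trials, 3), round(poisson_pmf(k, rate * t), 3))
```

The mean and variance come out roughly equal, which is the signature of a Poisson count. With log-normal gaps, as in the email data, this equality is not expected to hold.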

18
From NPM to Adjacency

An adjacency matrix can in principle be derived
analytically from an NPM, but doing so would
require future research of varying complexity.
Simulations are much more practical for applied
work and can be constructed for very specific
applications. We are developing an extension of
this for a variety of applications (sampling
distributions for network measures, network
perturbations, etc.)
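A simulation of the kind described above can produce a sampling distribution for a network measure. This Python sketch treats each off-diagonal cell of a directed NPM as an independent Bernoulli probability and computes network density for each simulated adjacency matrix. It then reports a simple 95% percentile interval. The 4-node NPM and the choice of density as the measure are hypothetical. Any function of an adjacency matrix could be passed as `measure`.

```python
import random

def sample_directed(npm, rng):
    """One directed 0/1 matrix: cell ij ~ Bernoulli(npm[i][j]), no self-loops."""
    n = len(npm)
    return [[1 if i != j and rng.random() < npm[i][j] else 0 for j in range(n)]
            for i in range(n)]

def density(adj):
    """Fraction of the n*(n-1) possible directed edges that are present."""
    n = len(adj)
    return sum(map(sum, adj)) / (n * (n - 1))

def sampling_distribution(npm, measure, trials=5000, seed=11):
    rng = random.Random(seed)
    return sorted(measure(sample_directed(npm, rng)) for _ in range(trials))

def percentile_interval(values, level=0.95):
    """Percentile interval from a sorted list of simulated values."""
    lo = values[int((1 - level) / 2 * (len(values) - 1))]
    hi = values[int((1 + level) / 2 * (len(values) - 1))]
    return lo, hi

# Hypothetical 4-node NPM (diagonal ignored).
npm = [
    [0.0, 0.9, 0.1, 0.5],
    [0.8, 0.0, 0.3, 0.2],
    [0.1, 0.4, 0.0, 0.7],
    [0.6, 0.2, 0.5, 0.0],
]
sims = sampling_distribution(npm, density)
print(round(sum(sims) / len(sims), 3), percentile_interval(sims))
```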

19
Why does it matter to you?

Understanding the probability space of the
adjacency matrix, and how it relates to the
definition of an edge, aids both applied and
theoretical research:
Hypothesis testing/confidence intervals on
network measures
Mitigation of time chunking considerations by
researchers.
Organizational Simulations
Error analysis (in measurement)
A flexible framework for analytics based on
myriad initial assumptions

20
Future work

We are currently refining an algorithmic approach
to determining the sampling distributions of
network measures given an NPM; we are interested
in a practical approach to creating sampling
distributions of network measures over time.
An application of this technique to McCulloh's
IkeNET data is undergoing final revision
Developing the closed form relationships between
NPMs and adjacency matrices could provide a
platform for exploring random graphs.

21
Backups

Works Cited
Albert, R. and Barabási, A.-L. (2002) Statistical
Mechanics of Complex Networks. Reviews of Modern
Physics, 74, 47-97.
Barabási, A.-L. and Albert, R. (1999) Emergence of
Scaling in Random Networks. Science, 286, 509-512.
Barabási, A.-L. (2003) Linked: How Everything Is
Connected to Everything Else and What It Means
for Business, Science, and Everyday Life. Plume,
New York. ISBN 0-452-28439-2.
Barabási, A.-L. and Bonabeau, E. (2003) Scale-Free
Networks. Scientific American, 288, 60-69.
Dorogovtsev, S.N. and Mendes, J.F.F. (2003)
Evolution of Networks: From Biological Nets to the
Internet and WWW. Oxford University Press.
ISBN 0-19-851590-1.
Dorogovtsev, S.N., Mendes, J.F.F. and Samukhin,
A.N. (2000) Structure of Growing Networks: Exact
Solution of the Barabási-Albert Model. Physical
Review Letters, 85, 4633.
Erdős, P. and Rényi, A. (1960) On the Evolution of
Random Graphs. Publications of the Mathematical
Institute of the Hungarian Academy of Sciences, 5,
17-61.
Faloutsos, M., Faloutsos, P. and Faloutsos, C.
(1999) On Power-Law Relationships of the Internet
Topology. Computer Communication Review, 29, 251.
Guare, J. (1990) Six Degrees of Separation: A
Play. Vintage Books, New York.
Milgram, S. (1967) The Small World Problem.
Psychology Today, 2, 60-67.
Newman, M. (2003) The Structure and Function of
Complex Networks. SIAM Review, 45(2), 167-256.
Newman, M. (2005) The Mathematics of Complex
Networks. Unpublished paper.
Travers, J. and Milgram, S. (1969) An Experimental
Study of the Small World Problem. Sociometry,
32(4), 425-443.
Watts, D.J. and Strogatz, S.H. (1998) Collective
Dynamics of "Small-World" Networks. Nature,
393(6684), 440-442.

22
Statistical Distributions

Statistical distributions of stochastic processes
are rarely known in practice and are often
assumed to be normal
Statistical distributions can be estimated by
using a variety of techniques (Maximum Likelihood
Estimation, Least Squares Estimation, Method of
Moments)
If we know the distributions associated with a
stochastic process, a whole world of statistical
tools is made available.

23
Statistical Distributions

For a random variable X
A probability density function f(x) describes the
relative likelihood that X takes on a value near x.
A cumulative distribution function F(x) = P(X ≤ x)
gives the probability (from 0 to 1) that X takes
on a value less than or equal to x.

24
Distribution Fitting

Maximum Likelihood Estimation
Maximizes the likelihood function (the product of
the probability density function evaluated at each
empirical data point, for a given set of parameters)
Least squares estimation
Minimizes the sum of squared errors between the
empirical distribution function and the cumulative
distribution function for a given set of parameters
Method of Moments Estimation
Fits a set of moment equations (number equal to
the number of parameters required) by substituting
sample moments for the theoretical moments and
solving the set for the parameters.
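For the log-normal, both maximum likelihood and the method of moments have closed forms. The method of moments matches the sample mean m and variance s² to E[X] = exp(μ + σ²/2) and Var[X] = (exp(σ²) − 1)·exp(2μ + σ²). This gives σ² = ln(1 + s²/m²) and μ = ln m − σ²/2. The Python sketch below compares the two estimators on synthetic data drawn with assumed true parameters (1.0, 0.5). Least squares is omitted because it needs a numerical optimizer.

```python
import math
import random

def fit_lognormal_mom(xs):
    """Method of moments: match the sample mean and variance of the data."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    sigma2 = math.log(1.0 + s2 / m ** 2)
    return math.log(m) - sigma2 / 2.0, math.sqrt(sigma2)

def fit_lognormal_mle(xs):
    """Maximum likelihood: mean and (1/n) standard deviation of log(x)."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))

rng = random.Random(2024)
data = [rng.lognormvariate(1.0, 0.5) for _ in range(10000)]  # synthetic data
print("MoM:", [round(v, 3) for v in fit_lognormal_mom(data)])
print("MLE:", [round(v, 3) for v in fit_lognormal_mle(data)])
```

Both estimators should land near the true values here. For log-normal data, the MLE generally has lower variance, because the moment estimates depend on the heavy right tail.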