Title: Introduction to ERGM/p* model
1Introduction to ERGM/p* model
- Kayo Fujimoto, Ph.D.
- Based on presentation slides by Nosh Contractor
and Mengxiao Zhu
2Four parts of ERGM
- Observed network data
- Network statistics (or counts) of each configuration
- ERG Modeling
- Conditional probability and Change statistics
- Estimation and Simulation
- Estimate Parameters by Simulation
- Method: MCMC ML estimation
- Goodness of fit test (convergence t-test)
- Compare observed and simulated graphs
- Recent development in ERGM
- New model specification
3Exponential Random Graph Model (ERGM)
- ERGMs take the form of a probability distribution of graphs
- Y is a set of tie indicator variables Yij
- y is a realization, the observed network
- g(y) is a vector of network statistics
- θ is a parameter vector corresponding to g(y)
- k(θ) is a normalizing factor calculated by summing exp{θ'g(y)} over all possible network configurations
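Read together, the definitions on this slide correspond to the standard ERGM form. The following is a reconstruction from the listed symbols (the parameter vector theta, g(y), and the normalizing factor k); the symbol \mathcal{Y} for the set of all possible networks is notation assumed here:

```latex
\Pr(Y = y) = \frac{\exp\{\theta^{\top} g(y)\}}{k(\theta)},
\qquad
k(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
```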
4Observed network
- Graph statistics (or counts) of each configuration
5Network Statistics Examples for Undirected Networks
Example (network on nodes a, b, c, d, e)
Edge: 6
2-Star: 1 + 3 + 1 + 6 + 0 = 11
3-Star: 0 + 1 + 0 + 4 + 0 = 5
4-Star: 1
Triangle: 2
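The counts on this slide (6 edges, 11 two-stars, 5 three-stars, 1 four-star, 2 triangles) can be checked with a short Python sketch. The edge list below is an assumption: it is one five-node graph that reproduces those counts, and the slide's actual figure may be wired differently. The function names are illustrative.

```python
from itertools import combinations
from math import comb

# Hypothetical edge list: one five-node graph consistent with the slide's counts.
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("b", "d"), ("c", "d"), ("d", "e")]

nodes = sorted({v for edge in edges for v in edge})
neighbors = {v: set() for v in nodes}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

def count_k_stars(k):
    """A node of degree d is the centre of C(d, k) k-stars."""
    return sum(comb(len(neighbors[v]), k) for v in nodes)

def count_triangles():
    """Count unordered node triples in which all three ties are present."""
    return sum(
        1
        for u, v, w in combinations(nodes, 3)
        if v in neighbors[u] and w in neighbors[u] and w in neighbors[v]
    )

stats = {
    "edge": len(edges),
    "2-star": count_k_stars(2),
    "3-star": count_k_stars(3),
    "4-star": count_k_stars(4),
    "triangle": count_triangles(),
}
print(stats)
```

Counting stars through node degrees explains the per-node sums in the example: each node contributes C(d, 2) two-stars and C(d, 3) three-stars.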
6A Simple Example of ERGM
7A Simple ERG model
- Predict network using edge count
- θ can take different values
- θ = 0, θ = -0.69, θ = 0.69
- L(y) can take the following values
- L(y) = 0, L(y) = 1, L(y) = 2, L(y) = 3
8Example 1: θ = 0, L = 0
9Example 1: θ = 0, L = 1
10Example 1: θ = 0, L = 2
11Example 1: θ = 0, L = 3
12Example 1: θ = 0
13Example 2: θ = -0.69
14Example 3: θ = 0.69
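The three examples can be worked through by brute-force enumeration. The sketch below assumes an undirected network on three nodes (so the edge count L(y) ranges from 0 to 3, as on the earlier slide) and the edge-only model P(Y = y) = exp(theta * L(y)) / k(theta); the function names are illustrative.

```python
from itertools import product
from math import exp

N_DYADS = 3  # assumption: three nodes, hence three possible undirected ties

def graph_probabilities(theta):
    """Return {tie-vector: probability} under P(y) = exp(theta * L(y)) / k(theta)."""
    graphs = list(product((0, 1), repeat=N_DYADS))
    weights = [exp(theta * sum(g)) for g in graphs]  # L(y) = number of ties
    k = sum(weights)                                 # normalizing factor
    return {g: w / k for g, w in zip(graphs, weights)}

def edge_count_distribution(theta):
    """Aggregate graph probabilities by edge count L(y) = 0, 1, 2, 3."""
    dist = [0.0] * (N_DYADS + 1)
    for g, p in graph_probabilities(theta).items():
        dist[sum(g)] += p
    return dist

for theta in (0.0, -0.69, 0.69):
    print(theta, [round(p, 3) for p in edge_count_distribution(theta)])
```

With theta = 0 every graph is equally likely; a negative theta shifts probability toward sparse graphs and a positive theta toward dense ones.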
15Why Change Statistics?
16ERG modeling
- Conditional Probability and Change Statistics
17Conditional Probability vs. Total Probability
- Total probability of the whole network
- It becomes infeasible to calculate when the size of the network gets large, because the normalizing factor sums over all possible graphs
- Introduce the Conditional Probability of edges
- Reduce sample space
18Avoid the Calculation on Sample Space
- Conditional Probability of an Edge to exist
- Conditional Probability of an Edge to be absent
- Logit p* model: model the log odds that Yij exists
19Change Statistics (logit p* model)
- From the end of last slide, we have
- Define Change Statistics as
- Model log odds of a tie being present to absent
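The derivation these bullets refer to can be sketched as follows. This is a reconstruction under the standard ERGM form, not the slide's own equations; the notation y_{ij}^{+} and y_{ij}^{-} (the network with tie ij forced present or absent), y_{ij}^{c} (all other ties), and \delta_{ij}(y) are assumptions introduced here. The normalizing factor cancels, which is why the sample space never has to be enumerated:

```latex
\begin{align}
\Pr(Y_{ij} = 1 \mid y_{ij}^{c})
  &= \frac{\exp\{\theta^{\top} g(y_{ij}^{+})\}}
          {\exp\{\theta^{\top} g(y_{ij}^{+})\} + \exp\{\theta^{\top} g(y_{ij}^{-})\}} \\
\Pr(Y_{ij} = 0 \mid y_{ij}^{c})
  &= \frac{\exp\{\theta^{\top} g(y_{ij}^{-})\}}
          {\exp\{\theta^{\top} g(y_{ij}^{+})\} + \exp\{\theta^{\top} g(y_{ij}^{-})\}} \\
\delta_{ij}(y) &= g(y_{ij}^{+}) - g(y_{ij}^{-}) \\
\log \frac{\Pr(Y_{ij} = 1 \mid y_{ij}^{c})}{\Pr(Y_{ij} = 0 \mid y_{ij}^{c})}
  &= \theta^{\top} \delta_{ij}(y)
\end{align}
```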
20Estimation and Simulation
- (Markov Chain Monte Carlo Maximum Likelihood Method)
21Review Maximum Likelihood Estimation (MLE)
- Likelihood functions
- Estimate parameter θ given the observed network.
- Maximum Likelihood Estimation
- Find θ values such that the observed statistics are equal to the expected statistics
- Approximate MLE by simulation
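The condition that observed statistics equal expected statistics follows from setting the gradient of the log-likelihood to zero. A brief sketch, assuming the standard ERGM form with normalizing factor k and writing y_obs for the observed network (notation assumed here):

```latex
\begin{align}
\ell(\theta) &= \theta^{\top} g(y_{\mathrm{obs}}) - \log k(\theta) \\
\frac{\partial \ell(\theta)}{\partial \theta}
  &= g(y_{\mathrm{obs}}) - \mathrm{E}_{\theta}[g(Y)] = 0
  \quad\Longleftrightarrow\quad
  \mathrm{E}_{\theta}[g(Y)] = g(y_{\mathrm{obs}})
\end{align}
```

The expectation cannot be computed exactly for realistic network sizes, which is why it is approximated by simulation.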
22Procedures for simulating ERG distribution
- Markov Chain Monte Carlo Maximum Likelihood Estimation (MCMCMLE)
- 1. Simulate a distribution of random graphs from a starting set of parameter values
- 2. Refine the parameter values by comparing the distribution of graphs against the observed graph
- 3. Repeat this process until the parameter estimates stabilize
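The three steps can be illustrated with a minimal Python sketch for the edge-only model. This is not the algorithm of any particular package: it assumes a plain Metropolis sampler that toggles one tie at a time and a simple fixed-gain stochastic-approximation update, and the network size, gain, and round counts are arbitrary illustrative choices.

```python
import random
from math import exp, log

def simulate_edge_counts(theta, n_dyads, n_steps, burn_in, rng):
    """Metropolis sampler for the edge-only ERGM; returns sampled edge counts."""
    ties = [0] * n_dyads
    n_edges = 0
    samples = []
    for step in range(burn_in + n_steps):
        d = rng.randrange(n_dyads)
        change = 1 - 2 * ties[d]  # +1 if the toggle adds a tie, -1 if it removes one
        # Accept the toggle with probability min(1, exp(theta * change statistic)).
        if rng.random() < min(1.0, exp(theta * change)):
            ties[d] = 1 - ties[d]
            n_edges += change
        if step >= burn_in:
            samples.append(n_edges)
    return samples

def fit_edge_parameter(observed_edges, n_dyads, n_rounds=30, gain=0.2, seed=1):
    """Repeat simulate-and-refine for a fixed number of rounds."""
    rng = random.Random(seed)
    theta = 0.0  # starting parameter value
    history = []
    for _ in range(n_rounds):
        samples = simulate_edge_counts(theta, n_dyads, 3000, 500, rng)  # step 1
        mean_sim = sum(samples) / len(samples)
        theta -= gain * (mean_sim - observed_edges)                     # step 2
        history.append(theta)
    # Step 3 is approximated by averaging the last rounds instead of a stopping rule.
    return sum(history[-10:]) / 10

theta_hat = fit_edge_parameter(observed_edges=5, n_dyads=15)
exact = log(5 / 10)  # closed-form MLE for the edge-only model: logit of density 5/15
print(round(theta_hat, 2), round(exact, 2))
```

The edge-only model is used because its maximum likelihood estimate has a closed form, so the simulated estimate can be compared against it.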
23Convergence T-statistics
- Test adequacy of parameter values estimated
- T-statistics for each configuration
- |t| < 0.1 → good fit
- NOTE: If the parameter estimates do not converge, the model is likely degenerate
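A convergence t-statistic of this kind is commonly computed per configuration as the difference between the mean simulated statistic and the observed statistic, divided by the standard deviation of the simulated statistic. The sketch below assumes that definition (the sign convention varies between programs); the function name and the simulated counts are hypothetical.

```python
from statistics import mean, stdev

def convergence_t(observed, simulated):
    """t-ratio: (mean of simulated statistic - observed statistic) / simulated sd."""
    return (mean(simulated) - observed) / stdev(simulated)

# Hypothetical edge counts from graphs simulated at the estimated parameters.
simulated_edges = [4, 6, 5, 5, 7, 3, 5, 6, 4, 5]
t = convergence_t(observed=5, simulated=simulated_edges)
print(abs(t) < 0.1)
```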
24A Simple Example of MCMCMLE
- Model
- Observed Network y
- Goal: Find the θ value such that the observed number of edges is equal to the expected number of edges
25If θ can be chosen from the following 3 cases, θ = -0.69 is preferred because it gives the highest probability for the observed network
- Given the observed Network y
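The comparison of the three candidate values can be reproduced numerically. The sketch below rests on two assumptions that are inferred rather than stated on the slide: the network has three possible ties, and the observed network y has exactly one edge (the case for which -0.69, roughly the logit of 1/3, is the best candidate). The function names are illustrative.

```python
from math import exp, log

N_DYADS = 3          # assumption: three possible ties, as in the simple example
OBSERVED_EDGES = 1   # assumption: the observed network y has a single edge

def probability_of_observed(theta):
    """P(Y = y) = exp(theta * L(y)) / k(theta), with k(theta) = (1 + e^theta)^3."""
    return exp(theta * OBSERVED_EDGES) / (1 + exp(theta)) ** N_DYADS

def expected_edges(theta):
    """Each tie is present independently with probability e^theta / (1 + e^theta)."""
    return N_DYADS * exp(theta) / (1 + exp(theta))

candidates = (0.0, -0.69, 0.69)
for theta in candidates:
    print(theta, round(probability_of_observed(theta), 3), round(expected_edges(theta), 2))

best = max(candidates, key=probability_of_observed)
# Closed-form MLE for the edge-only model: log odds of the observed density.
closed_form_mle = log(OBSERVED_EDGES / (N_DYADS - OBSERVED_EDGES))
print(best, round(closed_form_mle, 3))
```

Under these assumptions the candidate with the highest probability is also the one whose expected edge count is closest to the observed count, which is the link between the likelihood view and the goal of matching observed and expected statistics.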
26Markov dependence (Frank and Strauss, 1986)
- Potential ties are dependent only if they share a common actor
- Two possible network ties are conditionally independent unless they share a common actor
- Once the homogeneity assumption is imposed, we obtain the following configurations
27Markov random graph models (non-directed networks)
Two-star (σ2)
Density or edge (θ)
Triangle (τ)
Three-star (σ3)
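With these four configurations, a Markov random graph model for a non-directed network can be written out as below. This is a sketch in commonly used notation, which is an assumption here: L(y) is the edge count, S_2(y) and S_3(y) the two-star and three-star counts, T(y) the triangle count, and \theta, \sigma_2, \sigma_3, \tau the corresponding parameters.

```latex
\Pr(Y = y) = \frac{1}{k}
  \exp\{\theta L(y) + \sigma_2 S_2(y) + \sigma_3 S_3(y) + \tau T(y)\}
```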
28Problems of degeneracy for Markov random models
- Certain parameter values place almost all of the probability mass on either the empty or the full graph
- Simulation studies showed that Markov random graph models are degenerate for many empirical networks with a high level of clustering
- A few very high degree nodes
- Some regions of high triangulation
29Two possibilities for the degeneracy problem (Snijders et al., 2006)
- Markov dependence assumption may be too restrictive
- The representation of transitivity by the total number of triangles might be too simplistic
- → New specification of higher order network dependency
30New development in ERGM
- Partial conditional dependence assumption
- New model specification
31Partial conditional dependence (Social circuit dependence)
- Two possible network ties are conditionally dependent if their observation would lead to a 4-cycle
[Figure: nodes i, j, k, l; legend: possible edges, observed edges]
32Partial conditional dependence (Example)
[Figure: nodes Father A, Daughter A, Father B, Daughter B]
33Difference between the two types of dependence assumptions
[Figure: two panels on nodes i, j, k, l, "Markov dependence assumptions" and "Partial conditional dependence assumptions"; legend: potential tie, ties which affect the formation of the potential tie, ties with no effect on the potential tie]
34New Specifications of ERGM
- Represent structural parameters similar to the Markov parameters
- Effects are incorporated within the one configuration parameter
- Three new statistics for non-directed networks
- Alternating k-stars
- Alternating k-triangles
- Alternating independent two-paths
35Examples of new specifications
- Alternating k-star configuration (degree distribution)
- Alternating k-triangle (tendency to form triads)
- Alternating k-two-path (tendency to form cycles)
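As an illustration of how one such statistic folds many configurations into a single term, the alternating k-star statistic of Snijders et al. (2006) combines all k-star counts with alternating signs and weights that shrink by a factor lambda. The sketch below computes it two ways, from star counts and from the equivalent degree-based expression; the degree sequence and the choice lambda = 2 are illustrative assumptions, and the function names are hypothetical.

```python
from math import comb

def alternating_k_star(degrees, lam=2.0):
    """Alternating sum of k-star counts: sum over k >= 2 of (-1)^k * S_k / lam^(k-2)."""
    total = 0.0
    for k in range(2, max(degrees) + 1):
        s_k = sum(comb(d, k) for d in degrees)  # number of k-stars in the graph
        total += (-1) ** k * s_k / lam ** (k - 2)
    return total

def alternating_k_star_closed_form(degrees, lam=2.0):
    """Equivalent degree-based form: lam^2 * sum_i [(1 - 1/lam)^d_i + d_i/lam - 1]."""
    return lam ** 2 * sum((1 - 1 / lam) ** d + d / lam - 1 for d in degrees)

# Hypothetical degree sequence with 11 two-stars, 5 three-stars and 1 four-star.
degrees = [2, 3, 2, 4, 1]
print(alternating_k_star(degrees), alternating_k_star_closed_form(degrees))
```

The two forms agree by the binomial theorem, and the degree-based form makes clear why this statistic is read as a model of the degree distribution.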
36Interpretation of the parameter
- Positive alternating k-star parameter
- Networks with some higher degree nodes are highly probable. → Core-periphery structure
- Positive alternating k-triangle parameter
- Triangulation in the network, as well as a tendency for triangles themselves to group together in larger higher-order clumps
- Positive alternating k-two-path parameter
- Tendency for 4-cycles in the network
37Summary for model construction
- Random variables
- Each network tie (Yij) among nodes of a network
- A random tie variable Yij = 1 if a tie from i to j exists, Yij = 0 otherwise
- yij: the observed value of the variable Yij
- Dependence assumptions
- Define contingencies among network variables
- Determine the type of parameters in the model
- Ties also depend on node-level attributes (homophily)
- Homogeneity assumption
- Simplify parameters by imposing homogeneity constraints.
- Estimation procedures
- Find the best parameter values based on the observed network
- Use simulation (MCMCMLE)
38Software for ERGM
- SIENA (Snijders and colleagues)
- PNet (Robins and colleagues)
- Statnet (Butts and colleagues)
39Reference
- Harrigan, Nicholas. Exponential Random Graph (ERG) models and their application to the study of corporate elites.
- Robins, Garry (manuscript). Exponential Random Graph (p*) models for social networks, published on the MelNet website.
- Robins, G., Pattison, P., Kalish, Y., & Lusher, D. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks, 29, 173-191.
- Snijders, T.A.B., Pattison, P., Robins, G., & Handcock, M. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99-153.
40Thank you for your attention