Statistical Analysis of Social Networks
(PowerPoint presentation transcript, 70 slides; source: https://people.duke.edu)

Transcript and Presenter's Notes
1
Statistical Analysis of Social Networks
  1. From description to inference: confidence intervals for measures
  2. QAP models
  3. Networks as independent variables
  4. Networks as dependent variables
  5. p* models and Markov Chain Monte Carlo (MCMC)

2
Statistical Analysis of Social Networks
Confidence Intervals: Bootstraps and Jackknives (Snijders & Borgatti, 1999)
Goal: It is useful to have an indication of how precise a given description is, particularly when making comparisons between groups. Assumes that a researcher is interested in some descriptive statistic and wishes to have a standard error for this descriptive statistic without making implausibly strong assumptions about how the network came about.
3
Confidence Intervals: Bootstraps and Jackknives (Snijders & Borgatti, 1999)
Jackknives.
Given a dataset with N sample elements, N artificial datasets are created by deleting each sample element in turn from the observed dataset. In standard practice, the formula for the standard error is then:
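The formula itself is not included in the transcript; the standard jackknife standard-error estimator referred to here has the form

  SE_jack = sqrt[ ((N - 1)/N) * Sum_{i=1..N} ( Z_{-i} - Z_bar )^2 ]

where Z_{-i} is the statistic computed with element i deleted and Z_bar is the average of the N leave-one-out values.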
4
Jackknives: Example on regular data

  Obs   i     x     s1    s2    s3    s4    s5    s6    s7    s8    s9   s10
    1   1   0.85     .  0.85  0.85  0.85  0.85  0.85  0.85  0.85  0.85  0.85
    2   2   0.70  0.70     .  0.70  0.70  0.70  0.70  0.70  0.70  0.70  0.70
    3   3   1.00  1.00  1.00     .  1.00  1.00  1.00  1.00  1.00  1.00  1.00
    4   4   0.59  0.59  0.59  0.59     .  0.59  0.59  0.59  0.59  0.59  0.59
    5   5   0.22  0.22  0.22  0.22  0.22     .  0.22  0.22  0.22  0.22  0.22
    6   6   0.69  0.69  0.69  0.69  0.69  0.69     .  0.69  0.69  0.69  0.69
    7   7   0.43  0.43  0.43  0.43  0.43  0.43  0.43     .  0.43  0.43  0.43
    8   8   0.32  0.32  0.32  0.32  0.32  0.32  0.32  0.32     .  0.32  0.32
    9   9   0.50  0.50  0.50  0.50  0.50  0.50  0.50  0.50  0.50     .  0.50
   10  10   0.67  0.67  0.67  0.67  0.67  0.67  0.67  0.67  0.67  0.67     .
  MEAN      0.60  0.57  0.58  0.55  0.60  0.64  0.59  0.61  0.63  0.61  0.59

(Each column s_k holds the data with observation k deleted; the MEAN row gives the mean of the full data and of each leave-one-out sample.)

5
Jackknives: Example on regular data
SE_j = 0.0753; classical SE = 0.0753. (For the sample mean, the jackknife and classical standard errors coincide.)
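A minimal Python sketch (not part of the original slides) that reproduces this calculation for the ten observations above; only numpy is assumed.

  import numpy as np

  # The ten observations from the example slide.
  x = np.array([0.85, 0.70, 1.00, 0.59, 0.22, 0.69, 0.43, 0.32, 0.50, 0.67])
  N = len(x)

  # Leave-one-out (jackknife) replicates of the mean: the column means s1..s10 above.
  reps = np.array([np.delete(x, i).mean() for i in range(N)])

  # Jackknife standard error and the classical standard error of the mean.
  se_jack = np.sqrt((N - 1) / N * np.sum((reps - reps.mean()) ** 2))
  se_classic = x.std(ddof=1) / np.sqrt(N)
  print(round(se_jack, 4), round(se_classic, 4))  # both are about 0.075

For the sample mean the two estimators agree exactly, which is the point of the example; any small difference from the slide's 0.0753 presumably reflects rounding in the displayed data.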
6
Jackknives: For networks
For networks, we need to adjust the scaling parameter:
where Z_{-i} is the network statistic calculated without vertex i, and Z_bar is the average of Z_{-1} ... Z_{-N}. This procedure will work for any network statistic Z, and UCINET uses it to test differences in network density.
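A Python sketch of the vertex-deletion version for density (not from the slides). The exact scaling constant for networks should be taken from Snijders & Borgatti (1999); it is left as a parameter here, with the ordinary jackknife factor as a stand-in.

  import numpy as np

  def density(a):
      # Density of a binary adjacency matrix, ignoring the diagonal.
      n = a.shape[0]
      return a[~np.eye(n, dtype=bool)].mean()

  def jackknife_density(a, scale=None):
      # Leave-one-vertex-out replicates of density, then a jackknife-style SE.
      n = a.shape[0]
      reps = np.array([density(np.delete(np.delete(a, i, 0), i, 1)) for i in range(n)])
      if scale is None:
          scale = (n - 1) / n   # stand-in; substitute the Snijders-Borgatti constant here
      return reps, np.sqrt(scale * np.sum((reps - reps.mean()) ** 2))

  # Example with a random directed 0/1 network:
  rng = np.random.default_rng(0)
  A = (rng.random((20, 20)) < 0.3).astype(int)
  np.fill_diagonal(A, 0)
  reps, se = jackknife_density(A)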
7
Jackknives: For networks
An example based on the Trade data: density, jackknife standard errors, and confidence intervals for each matrix.

  Matrix   DEN        SEJ        UB         LB
  DIP      0.6684783  0.0636125  0.7931588  0.5437978
  CRUDE    0.5561594  0.0676669  0.6887866  0.4235323
  FOOD     0.5561594  0.0633776  0.6803794  0.4319394
  MAN      0.5615942  0.0724143  0.7035263  0.4196621
  MIN      0.2445652  0.0530224  0.3484891  0.1406414

  (DEN = density, SEJ = jackknife standard error, UB/LB = upper/lower confidence bounds)
8
Bootstrap
In general, bootstrap techniques effectively treat the given sample as the population, then draw samples, with replacement, from the observed distribution. For networks, we draw random samples of the vertices, creating a new network Y. If i(k) = i(h), then randomly fill in the dyad from the set of all possible dyads (i.e., fill in this cell with a random draw from the population of observed dyads).
9
Bootstrap
  • For each bootstrap sample:
  • Draw N random numbers, with replacement, from 1 to N, denoted i(1)..i(N)
  • Construct Y based on i(1)..i(N)
  • Calculate the statistic of interest, called Z_m
  • Repeat this process M times (typically thousands of iterations); see the sketch below
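A sketch of this vertex bootstrap in Python (an illustration of the logic, not UCINET's implementation); density is used as the statistic, and duplicate draws are filled from the observed dyads as described above.

  import numpy as np

  def vertex_bootstrap_density(a, m=5000, seed=0):
      # Resample N vertices with replacement, rebuild Y, and record its density.
      rng = np.random.default_rng(seed)
      n = a.shape[0]
      off = ~np.eye(n, dtype=bool)
      pool = a[off]                              # observed off-diagonal dyad values
      stats = np.empty(m)
      for b in range(m):
          idx = rng.integers(0, n, size=n)       # i(1)..i(N), drawn with replacement
          y = a[np.ix_(idx, idx)].astype(float)
          # Cells where the same vertex was drawn twice are undefined;
          # fill them with random draws from the observed dyads.
          dup = (idx[:, None] == idx[None, :]) & off
          y[dup] = rng.choice(pool, size=int(dup.sum()))
          stats[b] = y[off].mean()
      return stats

  # Bootstrap standard error of density: vertex_bootstrap_density(A).std(ddof=1)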

10
Bootstraps: Comparing density
11
Bootstraps: Comparing density

BOOTSTRAP PAIRED SAMPLE T-TEST
-------------------------------------------------------------------------------
Density of trade_min is 0.2446
Density of trade_dip is 0.6685
Difference in density is -0.4239
Number of bootstrap samples: 5000
Variance of ties for trade_min: 0.1851
Variance of ties for trade_dip: 0.2220
Classical standard error of difference: 0.0272
Classical t-test (indep samples): -15.6096
Estimated bootstrap standard error for density of trade_min: 0.0458
Estimated bootstrap standard error for density of trade_dip: 0.0553
Bootstrap standard error of the difference (indep samples): 0.0719
95% confidence interval for the difference (indep samples): -0.5648, -0.2831
Bootstrap t-statistic (indep samples): -5.8994
Bootstrap SE for the difference (paired samples): 0.0430
95% bootstrap CI for the difference (paired samples): -0.5082, -0.3396
t-statistic: -9.8547
Average bootstrap difference: -0.3972
Proportion of absolute differences as large as observed: 0.0002
Proportion of differences as large as observed: 1.0000
Proportion of differences as small as observed: 0.0002
12
Measurement Sensitivity
  • A related question: How confident can you be in any measure on an observed network, given the likelihood that observed ties are, in fact, observed with error?
  • This implies that some of the observed 0s are in fact 1s, and some of the 1s are in fact 0s.
  • It suggests that we view the network not as a binary array of 0s and 1s, but instead as a set of probabilities, such that
  • P_ij = f(A_ij)
  • We can then calculate the statistic of interest M times under different realizations of the network given P_ij and get a distribution of the statistic of interest.

13
Measurement Sensitivity
  • A reasonable approach to assessing the effect of measurement error on the ties in a network is to ask how the network measures would change if the true ties differed from those observed. This question can be answered with Monte Carlo simulations on the observed network. Thus, the procedure I propose is to:
  • Generate a probability matrix from the set of observed ties,
  • Generate many realizations of the network based on these underlying probabilities, and
  • Compare the distribution of generated statistics to those observed in the data (a sketch follows below).
  • How do we set p_ij?
  • Range based on observed features (sensitivity analysis)
  • Outcome of a model based on observed patterns (ERGM)
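A sketch of the Monte Carlo step in Python. The probability matrix is a pure illustration: the 0.90 / 0.50 / 0.05 values for reciprocated ties, unreciprocated nominations, and observed zeros are assumptions made for the example, standing in for whichever p_ij rule (sensitivity range or ERGM-based) is chosen.

  import numpy as np

  def sensitivity_distribution(p, stat, m=1000, seed=0):
      # Draw M realizations X ~ Bernoulli(p_ij) and collect the statistic of interest.
      rng = np.random.default_rng(seed)
      return np.array([stat((rng.random(p.shape) < p).astype(int)) for _ in range(m)])

  def example_p(a):
      # Illustrative p_ij: keep reciprocated ties with prob. .90, unreciprocated
      # nominations with prob. .50, and let observed zeros flip on with prob. .05.
      p = np.where(a * a.T == 1, 0.90, np.where(a == 1, 0.50, 0.05))
      np.fill_diagonal(p, 0.0)
      return p

  # density_of = lambda x: x[~np.eye(len(x), dtype=bool)].mean()
  # dist = sensitivity_distribution(example_p(A), stat=density_of)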

14
Measurement Sensitivity
As an example, consider the problem of defining friendship ties in high schools. Should we count nominations that are not reciprocated?
15
Measurement Sensitivity
Reciprocated
All ties
16
Measurement Sensitivity
17
Measurement Sensitivity
18
Measurement Sensitivity
19
Measurement Sensitivity
20
Measurement Sensitivity
21
Measurement Sensitivity
22
Statistical Analysis of Social Networks
Comparing multiple networks: QAP
  • The substantive question is how one set of
    relations (or dyadic attributes) relates to
    another.
  • For example
  • Do marriage ties correlate with business ties in
    the Medici family network?
  • Are friendship relations correlated with joint
    membership in a club?

(hidden)
23
Assessing the correlation is straightforward: we simply correlate each corresponding cell of the two matrices.
Dyads (i, j, Marriage, Business; the rows for i = 1 and i = 2 are shown):
   1  2  0 0    1  3  0 0    1  4  0 0    1  5  0 0    1  6  0 0
   1  7  0 0    1  8  0 0    1  9  1 0    1 10  0 0    1 11  0 0
   1 12  0 0    1 13  0 0    1 14  0 0    1 15  0 0    1 16  0 0
   2  1  0 0    2  3  0 0    2  4  0 0    2  5  0 0    2  6  1 0
   2  7  1 0    2  8  0 0    2  9  1 0    2 10  0 0    2 11  0 0
   2 12  0 0    2 13  0 0    2 14  0 0    2 15  0 0    2 16  0 0

Marriage (16 x 16):
   1 ACCIAIUOL  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
   2 ALBIZZI    0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0
   3 BARBADORI  0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
   4 BISCHERI   0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0
   5 CASTELLAN  0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0
   6 GINORI     0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   7 GUADAGNI   0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1
   8 LAMBERTES  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
   9 MEDICI     1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1
  10 PAZZI      0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
  11 PERUZZI    0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0
  12 PUCCI      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  13 RIDOLFI    0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1
  14 SALVIATI   0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
  15 STROZZI    0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0
  16 TORNABUON  0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0

Business (16 x 16, same family order):
   1  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   2  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   3  0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0
   4  0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0
   5  0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0
   6  0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
   7  0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
   8  0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0
   9  0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 1
  10  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  11  0 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0
  12  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  13  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  14  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  15  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  16  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

Correlation (Marriage, Business):
            Marriage   Business
  Marriage  1          0.3718679
  Business  0.3718679  1
(hidden)
24
Comparing multiple networks: QAP
But is the observed value statistically significant? We can't use standard inference, since the assumptions are violated. Instead, we use a permutation approach. Essentially, we are asking whether the observed correlation is large (small) compared to what we would get if the assignment of variables to nodes were random, but the interdependencies within variables were maintained. We do this by randomly sorting the rows and columns of the matrix, then re-estimating the correlation.
(hidden)
25
Comparing multiple networks: QAP
When you permute, you have to permute both the rows and the columns simultaneously to maintain the interdependencies in the data.

  Original:              Permuted (order A, D, B, C, E):
     A B C D E              A D B C E
  A  0 1 2 3 4           A  0 3 1 2 4
  B  0 0 1 2 3           D  0 0 0 0 1
  C  0 0 0 1 2           B  0 2 0 1 3
  D  0 0 0 0 1           C  0 1 0 0 2
  E  0 0 0 0 0           E  0 0 0 0 0
(hidden)
26
Comparing multiple networks: QAP
  • Procedure (a sketch in code follows below):
  • 1. Calculate the observed correlation.
  • 2. For K iterations:
  •    a) randomly sort one of the matrices
  •    b) recalculate the correlation
  •    c) store the outcome
  • 3. Compare the observed correlation to the distribution of correlations created by the random permutations.
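A Python sketch of the QAP test for the correlation between two square matrices (the logic only; UCINET's implementation will differ in detail). Off-diagonal cells are correlated, and each iteration applies one random node relabeling to the rows and columns of one matrix simultaneously.

  import numpy as np

  def qap_correlation(x, y, k=2500, seed=0):
      # Observed correlation of the off-diagonal cells, plus a permutation distribution.
      rng = np.random.default_rng(seed)
      n = x.shape[0]
      off = ~np.eye(n, dtype=bool)
      obs = np.corrcoef(x[off], y[off])[0, 1]
      perm = np.empty(k)
      for i in range(k):
          p = rng.permutation(n)
          yp = y[np.ix_(p, p)]               # permute rows and columns together
          perm[i] = np.corrcoef(x[off], yp[off])[0, 1]
      p_large = np.mean(perm >= obs)         # proportion of permutations as large as observed
      return obs, p_large

  # Example (hypothetical array names): qap_correlation(marriage, business),
  # using the two 16 x 16 matrices shown above.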

(hidden)
27
Comparing multiple networks: QAP
(hidden)
28
Running QAP in UCINET
(hidden)
29
QAP MATRIX CORRELATION
-------------------------------------------------------------------------------
Observed matrix:    PadgBUS
Structure matrix:   PadgMAR
# of Permutations:  2500
Random seed:        356

Univariate statistics
                    1         2
                 PadgBUS   PadgMAR
                 -------   -------
  1  Mean          0.125     0.167
  2  Std Dev       0.331     0.373
  3  Sum          30.000    40.000
  4  Variance      0.109     0.139
  5  SSQ          30.000    40.000
  6  MCSSQ        26.250    33.333
  7  Euc Norm      5.477     6.325
  8  Minimum       0.000     0.000
  9  Maximum       1.000     1.000
 10  N of Obs    240.000   240.000

Hubert's gamma: 16.000

Bivariate Statistics
                              1        2        3        4        5        6        7
                            Value   Signif    Avg      SD    P(Large) P(Small)   NPerm
                          -------- -------- -------- -------- -------- -------- --------
  1  Pearson Correlation     0.372    0.000    0.001    0.092    0.000    1.000 2500.000
  2  Simple Matching         0.842    0.000    0.750    0.027    0.000    1.000 2500.000
  3  Jaccard Coefficient     0.296    0.000    0.079    0.046    0.000    1.000 2500.000
  4  Goodman-Kruskal Gamma   0.797    0.000   -0.064    0.382    0.000    1.000 2500.000
  5  Hamming Distance       38.000    0.000   59.908    5.581    1.000    0.000 2500.000
(hidden)
30
Running QAP in UCINET Regression
Using the same logic, we can estimate alternative models, such as regression. The only complication is that you need to permute all of the independent matrices in the same way at each iteration (a sketch follows below).
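A Python sketch of the regression version under the same logic: one node relabeling per iteration is applied to every independent matrix at once, and the coefficients are re-estimated by OLS on the vectorized off-diagonal dyads. This mirrors the description above, not UCINET's MRQAP code; the trade matrices are assumed to be available as numpy arrays.

  import numpy as np

  def mrqap(y, xs, k=2000, seed=0):
      # y: n x n dependent matrix; xs: list of n x n independent matrices.
      rng = np.random.default_rng(seed)
      n = y.shape[0]
      off = ~np.eye(n, dtype=bool)

      def fit(mats):
          design = np.column_stack([np.ones(off.sum())] + [m[off] for m in mats])
          beta, *_ = np.linalg.lstsq(design, y[off], rcond=None)
          return beta

      obs = fit(xs)                          # intercept first, then one coefficient per matrix
      count = np.zeros_like(obs)
      for _ in range(k):
          p = rng.permutation(n)
          count += np.abs(fit([x[np.ix_(p, p)] for x in xs])) >= np.abs(obs)
      return obs, count / k                  # coefficients and two-sided permutation p-values

  # obs, pvals = mrqap(trade_dip, [trade_food, trade_crude, trade_man, trade_min])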
(hidden)
31
Simple Example
  NODE   ADJMAT               SAMERCE              SAMESEX
   1     0 1 1 1 0 0 0 0 0    0 1 0 0 1 0 0 0 1    0 0 1 1 0 0 1 1 0
   2     1 0 1 0 0 0 1 0 0    1 0 0 0 1 0 0 0 1    0 0 0 0 1 1 0 0 1
   3     1 1 0 0 1 0 1 0 0    0 0 0 1 0 1 1 1 0    1 0 0 1 0 0 1 1 0
   4     1 0 0 0 1 0 0 0 0    0 0 1 0 0 1 1 1 0    1 0 1 0 0 0 1 1 0
   5     0 0 1 1 0 1 0 1 0    1 1 0 0 0 0 0 0 1    0 1 0 0 0 1 0 0 1
   6     0 0 0 0 1 0 0 1 1    0 0 1 1 0 0 1 1 0    0 1 0 0 1 0 0 0 1
   7     0 1 1 0 0 0 0 0 0    0 0 1 1 0 1 0 1 0    1 0 1 1 0 0 0 1 0
   8     0 0 0 0 1 1 0 0 1    0 0 1 1 0 1 1 0 0    1 0 1 1 0 0 1 0 0
   9     0 0 0 0 0 1 0 1 0    1 1 0 0 1 0 0 0 0    0 1 0 0 1 1 0 0 0
(hidden)
32
Simple Example
Distance (D_ij = |Y_i - Y_j|):
  .000 .277 .228 .181 .278 .298 .095 .307 .481
  .277 .000 .049 .096 .555 .575 .182 .584 .758
  .228 .049 .000 .047 .506 .526 .134 .535 .710
  .181 .096 .047 .000 .459 .479 .087 .488 .663
  .278 .555 .506 .459 .000 .020 .372 .029 .204
  .298 .575 .526 .479 .020 .000 .392 .009 .184
  .095 .182 .134 .087 .372 .392 .000 .401 .576
  .307 .584 .535 .488 .029 .009 .401 .000 .175
  .481 .758 .710 .663 .204 .184 .576 .175 .000

Y: 0.32 0.59 0.54 0.50 0.04 0.02 0.41 0.01 -0.17
(hidden)
33
Simple Example
(hidden)
34
Simple Example, continuous dep. variable
# of permutations:        2000
Diagonal valid?           NO
Random seed:              995
Dependent variable:       EX_SIM
Expected values:          C:\moody\Classes\soc884\examples\UCINET\mrqap-predicted
Independent variables:    EX_SSEX
                          EX_SRCE
                          EX_ADJ

Number of valid observations among the X variables = 72; N = 72
Number of permutations performed: 1999

MODEL FIT
R-square   Adj R-Sqr   Probability   # of Obs
--------   ---------   -----------   --------
  0.289      0.269        0.059         72

REGRESSION COEFFICIENTS
              Un-stdized    Stdized                   Proportion   Proportion
Independent   Coefficient   Coefficient  Significance   As Large     As Small
-----------   -----------   -----------  ------------   ----------   ----------
Intercept        0.460139      0.000000      0.034         0.034        0.966
EX_SSEX         -0.073787     -0.170620      0.140         0.860        0.140
EX_SRCE         -0.020472     -0.047338      0.272         0.728        0.272
EX_ADJ          -0.239896     -0.536211      0.012         0.988        0.012
(hidden)
35
We can also model the network as a dependent variable. In the next example, I model diplomacy as a function of trade, using the country data. Note that using UCINET, this gives a linear probability model (OLS on a 0-1 dependent variable), which is often not optimal.
(hidden)
36
MULTIPLE REGRESSION QAP: TRADE AND DIPLOMACY
-------------------------------------------------------------------------------
# of permutations:        2000
Diagonal valid?           NO
Random seed:              1000
Dependent variable:       trade_dip
Expected values:          C:\moody\Classes\soc884\examples\UCINET\mrqap-predicted
Independent variables:    TRADE_FOOD
                          TRADE_CRUDE
                          TRADE_MAN
                          TRADE_MIN

Number of valid observations among the X variables = 552; N = 552
Number of permutations performed: 1999

MODEL FIT
R-square   Adj R-Sqr   Probability   # of Obs
--------   ---------   -----------   --------
  0.317      0.314        0.000        552

REGRESSION COEFFICIENTS
               Un-stdized    Stdized                   Proportion   Proportion
Independent    Coefficient   Coefficient  Significance   As Large     As Small
-----------    -----------   -----------  ------------   ----------   ----------
Intercept         0.339308      0.000000      1.000         1.000        0.000
TRADE_FOOD        0.049975      0.052744      0.200         0.200        0.800
TRADE_CRUDE       0.109233      0.115284      0.033         0.033        0.967
TRADE_MAN         0.367435      0.387285      0.000         0.000        1.000
TRADE_MIN         0.140151      0.127965      0.058         0.058        0.942

Expected values saved as dataset mrqap-predicted
Valid observations saved as dataset mrqap-valid
(hidden)
37
One solution is to use a QAP logit model. This is not (yet) implemented in UCINET, but you can use DAMN or SAS. Let's look at the country trade data again, using a logit model.

DAMN REGRESSION PARAMETERS
  man      coefficient:  1.80356   p-value: 0.00000
  food     coefficient:  0.30928   p-value: 0.14867
  crude    coefficient:  0.59624   p-value: 0.00867
  min      coefficient:  1.61641   p-value: 0.00000
  constant coefficient: -0.78063   p-value: 0.00000
  Final Likelihood: -252.366402

QAP Results (logit model coefficients): significance results are equal to the proportion of QAP parameters that are as small as the observed parameters.
  PARAMETER   COEF    SIG
  INTERCEPT   -.949   .0010
  TMAN        1.863   .9980
  TFOOD       .3914   .8420
  TCRUDE      .6667   .9800
  TMIN        1.658   .9980
(hidden)
38
Modeling Social Networks parametrically: p* approaches
A long research tradition in statistics and random graph theory has led to parametric models of networks. These are models of the entire graph, though, as we will see, they often work on the dyads in the graph to be estimated. Substantively, the approach is to ask whether the graph in question is an element of the class of all random graphs with the given known elements (for example, all graphs with 5 nodes and 3 edges) or, put probabilistically, to ask for the probability of observing the current graph given those conditions.
39
Random Graphs and Conditional Expectations
The basis for the statistical modeling of graphs rests on random graph theory. Simply put, random graph theory asks what properties we expect when ties (X_ij) form at random. The simplest case is the Bernoulli random graph, where the tie probability is constant and the X_ij are independent, which says simply that each edge in the graph has an independent probability of being present. Typically this is an uninteresting distribution of graphs, and we want to know what the graph looks like conditional on other features of the graph.
40
Random Graphs and Conditional Expectations
A Bernoulli graph is conditioned only on the expected number of edges. So effectively we ask: what is the probability of observing the graph we have, given the set of all possible graphs with the same number of edges? We might, instead, want to condition on the degree distribution (sent or received), or on all graphs with a particular dyad distribution (the same number of Mutual, Asymmetric and Null dyads). Closed-form solutions for some graph statistics (like the triad census) are known conditional on out-degree, in-degree and MAN (but not all 3 simultaneously).
41
Random Graphs and Conditional Expectations
PAJEK gives you the unconditional expected values.

Triadic Census 2. i:\people\jwm\s884\homework\prison.net (67)
--------------------------------------------------------------------
  Type     Number of triads (ni)   Expected (ei)   (ni-ei)/ei
  -------  ---------------------   -------------   ----------
   1 - 003           39221             37227.47          0.05
   2 - 012            5860              9587.83         -0.39
   3 - 102            2336               205.78         10.35
   4 - 021D             61               205.78         -0.70
   5 - 021U             80               205.78         -0.61
   6 - 021C            103               411.55         -0.75
   7 - 111D            105                17.67          4.94
   8 - 111U             69                17.67          2.91
   9 - 030T             13                17.67         -0.26
  10 - 030C              1                 5.89         -0.83
  11 - 201              12                 0.38         30.65
  12 - 120D             15                 0.38         38.56
  13 - 120U              7                 0.38         17.46
  14 - 120C              5                 0.76          5.59
  15 - 210              12                 0.03        367.67
  16 - 300               5                 0.00      21471.04
--------------------------------------------------------------------
Chi-Square: 137414.3919
6 cells (37.50%) have expected frequencies less than 5.
The minimum expected cell frequency is 0.00.
42
Random Graphs and Conditional Expectations
SPAN gives you the conditional (X | MAN) distributions.

  Triad     T      TPCNT    PU       EVT       VARTU    STDDIF
  003     39221   0.8187   0.8194   39251      427.69   -1.472
  012      5860   0.1223   0.1213    5810.8   1053.5     1.5156
  102      2336   0.0488   0.0476    2278.7    321.01    3.1954
  021D       61   0.0013   0.0015      70.949   67.37   -1.212
  021U       80   0.0017   0.0015      70.949   67.37    1.1027
  021C      103   0.0022   0.003      141.9    127.58   -3.444
  111D      105   0.0022   0.0023     112.39   103.57   -0.727
  111U       69   0.0014   0.0023     112.39   103.57   -4.264
  030T       13   0.0003   0.0001       3.4292   3.3956  5.1939
  030C        1   209E-7   239E-7       1.1431   1.1393 -0.134
  201        12   0.0003   0.0009      42.974   38.123  -5.017
  120D       15   0.0003   286E-7       1.3717   1.368  11.652
  120U        7   0.0001   286E-7       1.3717   1.368   4.8122
  120C        5   0.0001   573E-7       2.7433   2.7285  1.3662
  210        12   0.0003   442E-7       2.1186   2.1023  6.8151
  300         5   0.0001   549E-8       0.2631   0.2621  9.2522
43
Modeling Social Networks parametrically: p* approaches
The earliest approaches are based on simple random graph theory, but there's been a flurry of activity in the last 10 years or so. Key references:
  - Holland and Leinhardt (1981) JASA
  - Frank and Strauss (1986) JASA
  - Wasserman and Faust (1994), Chapters 15 and 16
  - Wasserman and Pattison (1996)
Thanks to Mark Handcock for sharing some figures/slides about these models.
44
Modeling Social Networks parametrically: p* approaches
where θ is a vector of parameters (like regression coefficients), z is a vector of network statistics conditioning the graph, and κ is a normalizing constant that ensures the probabilities sum to 1.
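The model formula itself appears on the slide only as an image; the exponential-family (p*/ERGM) form it refers to is

  P(X = x) = exp{ θ'z(x) } / κ(θ),   with   κ(θ) = Sum over all graphs y of exp{ θ'z(y) },

where the sum in κ(θ) runs over all graphs on the same set of nodes.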
45
Modeling Social Networks parametrically: p* approaches
The simplest graph is a Bernoulli random graph, where each X_ij is independent:

  θ_ij = logit P(X_ij = 1),   κ(θ) = Product over i,j of [1 + exp(θ_ij)]

Note: this is one of the few cases where κ(θ) can be written in closed form.
46
Modeling Social Networks parametrically: p* approaches
Typically, we add a homogeneity condition, so that all isomorphic graphs are equally likely. For the homogeneous Bernoulli graph model:

  κ(θ) = [1 + exp(θ)]^g

where g is the number of possible ties in the graph.
47
Modeling Social Networks parametrically: p* approaches
If we want to condition on anything much more complicated than density, the normalizing constant ends up being a problem. We need a way to express the probability of the graph that doesn't depend on that constant. First, some terms:
48
Modeling Social Networks parametrically: p* approaches
49
Modeling Social Networks parametrically: p* approaches
Note that we can now model the conditional probability of the graph as a function of a set of difference (change) statistics, without reference to the normalizing constant. The model then simply reduces to a logit model on the dyads.
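Spelled out (a standard identity for these models, not shown in the transcript): if δ_ij(x) = z(x with x_ij = 1) - z(x with x_ij = 0) denotes the vector of change statistics for the ij dyad, then

  logit P(X_ij = 1 | rest of the graph) = θ'δ_ij(x),

so the pseudolikelihood fit is an ordinary logistic regression of the tie indicator on the change statistics.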
50
Modeling Social Networks parametrically: p* approaches
Fitting p* models.
I highly recommend working through the p* primer examples, which can be found at http://kentucky.psych.uiuc.edu/pstar/index.html, including 'A Practical Guide To Fitting p* Social Network Models Via Logistic Regression'. The site includes the PREPSTAR program for creating the variables of interest. The following example draws from this work.
51
Modeling Social Networks parametrically: p* approaches. Fitting models

We can model this network based on parameters for the overall degree of Choice (θ), Differential Choice Within Positions (θ_W), Mutuality (ρ), Differential Mutuality Within Positions (ρ_W), and Transitivity (τ_T). The vector of model parameters to be estimated is {θ, θ_W, ρ, ρ_W, τ_T}.
52
Modeling Social Networks parametrically: p* approaches. Fitting models
The first step is to calculate the vector of change statistics. This is done by first calculating the value of the statistic if the ij tie is present, then if it is absent, and taking the difference. The program PREPSTAR does this for you (see also pspar for large networks: http://www.sfu.ca/richards/Pages/pspar.html).
For example, the simple choice statistic is X_ij, so if the tie is forced present X_ij = 1, and if forced absent X_ij = 0; the difference is always 1. Since this is true for every dyad, it is a constant, equivalent to the model intercept.
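A Python sketch of the change-statistic computation for a toy directed network (an illustration, not PREPSTAR). It produces the same kinds of columns (L, L_W, M, M_W, T_T) for each ordered dyad, given an assumed position label for each node.

  import numpy as np

  def change_stats(a, pos):
      # a: n x n binary directed adjacency matrix with a zero diagonal;
      # pos: position label for each node.
      # For each ordered dyad (i, j), the change in each statistic when x_ij goes 0 -> 1.
      n = a.shape[0]
      rows = []
      for i in range(n):
          for j in range(n):
              if i == j:
                  continue
              same = int(pos[i] == pos[j])
              L = 1                        # choice: always changes by exactly 1
              L_W = same                   # choice within positions
              M = int(a[j, i])             # mutuality: 1 if the reciprocal tie exists
              M_W = M * same
              # transitivity T = sum_{i,j,k} x_ij x_jk x_ik; toggling x_ij touches the
              # triples where (i, j) plays each of the three roles in that product.
              T_T = int(a[j, :] @ a[i, :] + a[:, i] @ a[:, j] + a[i, :] @ a[:, j])
              rows.append((i + 1, j + 1, int(a[i, j]), L, L_W, M, M_W, T_T))
      return rows                          # (i, j, tie, L, L_W, M, M_W, T_T)

  # Toy network with positions {1, 2, 3} and {4, 5, 6} (assumed for illustration):
  rng = np.random.default_rng(1)
  A = (rng.random((6, 6)) < 0.4).astype(int)
  np.fill_diagonal(A, 0)
  print(change_stats(A, pos=[1, 1, 1, 2, 2, 2])[:5])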
53
  • The model described above would be written in W&P notation as:
  • z1(x) = L = Sum_{i,j} X_ij is the statistic for the Choice parameter, θ;
  • z2(x) = L_W = Sum_{i,j} X_ij δ_ij is the statistic for the Choice Within Positions parameter, θ_W;
  • z3(x) = M = Sum_{i<j} X_ij X_ji is the statistic for the Mutuality parameter, ρ;
  • z4(x) = M_W = Sum_{i<j} X_ij X_ji δ_ij is the statistic for the Mutuality Within Positions parameter, ρ_W;
  • z5(x) = T_T = Sum_{i,j,k} X_ij X_jk X_ik is the statistic for the Transitivity parameter, τ_T.
  • Note that the indicator variable δ_ij = 1 if actors i and j are in the same position, and 0 otherwise.

54
Looking over the first few cases
  Obs   i   j   tie    L   L_W    M   M_W   T_T
    1   1   2    1     1    1     1    1     2
    2   1   3    1     1    1     0    0     3
    3   1   4    0     1    0     0    0     1
    4   1   5    0     1    0     0    0     0
    5   1   6    0     1    0     0    0     2
    6   2   1    1     1    1     1    1     1
    7   2   3    1     1    1     1    1     2
    8   2   4    0     1    0     0    0     2
    9   2   5    0     1    0     0    0     0
   10   2   6    0     1    0     0    0     3
   11   3   1    0     1    1     1    1     3
   12   3   2    1     1    1     1    1     1
   13   3   4    1     1    0     0    0     3
   14   3   5    0     1    0     0    0     2
   15   3   6    1     1    0     1    0     2
   16   4   1    0     1    0     0    0     0
   17   4   2    0     1    0     0    0     1
   18   4   3    0     1    0     1    0     3
   19   4   5    1     1    1     1    1     0
   20   4   6    1     1    1     1    1     1
55
Modeling Social Networks parametrically: p* approaches. Fitting models

  proc logistic descending;
    model tie = l lw m mw tt / noint;
  run;
56
Modeling Social Networks parametrically: p* approaches. Fitting models
One practical problem is that the resulting values are often quite correlated, making estimation difficult. This is particularly difficult with star parameters.

Correlations (p-values below each correlation):
          lw          m           mw          tt
  lw    1.00000     0.58333     0.80178     0.15830
                     0.0007     <.0001      0.4034
  m     0.58333     1.00000     0.80178    -0.02435
         0.0007                 <.0001      0.8984
  mw    0.80178     0.80178     1.00000    -0.11716
        <.0001      <.0001                  0.5375
  tt    0.15830    -0.02435    -0.11716     1.00000
         0.4034      0.8984      0.5375
57
Modeling Social Networks parametrically: p* approaches. Fitting models
  • Parameters that are often fit include:
  • Expansiveness and attractiveness parameters: dummies for each sender/receiver in the network
  • Degree distribution
  • Mutuality
  • Group membership (and all other parameters by group)
  • Transitivity / Intransitivity
  • K-in-stars, k-out-stars
  • Cyclicity

58
Modeling Social Networks parametrically: Exponential Random Graph Models
In practice, p* models are difficult to estimate, and we have no good sense of how approximate the pseudolikelihood estimates (PMLE) are. A recent generalization is to use MCMC methods to better estimate the parameters. The following slides are courtesy of David Hunter at Penn State.
59
Modeling Social Networks parametrically: Exponential Random Graph Models. Degeneracy
"Assessing Degeneracy in Statistical Models of Social Networks", Mark S. Handcock, CSSS Working Paper 39
60
Modeling Social Networks parametrically: Exponential Random Graph Models. Degeneracy
"Assessing Degeneracy in Statistical Models of Social Networks", Mark S. Handcock, CSSS Working Paper 39
61
Modeling Social Networks parametrically: Exponential Random Graph Models. Degeneracy
"Assessing Degeneracy in Statistical Models of Social Networks", Mark S. Handcock, CSSS Working Paper 39
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
(No Transcript)
66
(No Transcript)
67
(No Transcript)
68
(No Transcript)
69
(No Transcript)
70
(No Transcript)
71
(No Transcript)
72
(No Transcript)
73
Generating Random Graph Samples
A conceptual merge between exponential random
graph models and QAP/sensitivity models is to
attempt to identify a sample of graphs from the
universe you are trying to model.
That is, generate the set of graphs X empirically, then compare z(x) to that set to see how likely a measure on the observed graph x would be given X. The difficulty, however, is generating X.
74
Generating Random Graph Samples
The first option would be to generate all graphs (up to isomorphism) within a given constraint. This is possible for small graphs, but the number gets large fast: for a network with 3 nodes there are 16 possible directed graphs; for 4 nodes, 218; for 5 nodes, 9,608; for 6 nodes, 1,540,944; and so on. So the best approach is to sample from the universe; but, of course, if you had the universe you wouldn't need to sample from it. How do you sample from a population you haven't observed? (a) Use a construction algorithm that generates a random graph with known constraints; (b) use an ERGM model like the one above.
75
Generating Random Graph Samples
Tom Snijders has a program called ZO (Zero-One) for doing this: http://stat.gamma.rug.nl/snijders/
The program only works well for smallish networks (fewer than about 100 nodes).
76
Generating Random Networks with Structural
Constraints.
General strategy: assign arcs at random within the cells of an adjacency matrix until the desired graph is achieved.

Process. 1) Define the pool of open arcs. Any cells of the g-by-g matrix that are structurally zero are not allowed.
[Figure: example adjacency matrix with marginal row/column totals 5 3 4 5 1 3 0 2 6 7, illustrating the pool of open arcs.]
77
Generating Random Networks with Structural
Constraints.
2) Randomly draw an element from the available
set.
[Figure: the same example matrix, illustrating a cell drawn at random from the available pool.]
78
Generating Random Networks with Structural
Constraints.
3) Check to see if the selected cell meets the structural condition.
4) If a condition is met, then remove any implicated cells from the pool.
[Figure: the example matrix, with cells implicated by the structural conditions removed from the pool.]
79
Generating Random Networks with Structural
Constraints.
5) Check for identification: does the last arc imply the set of arcs for another?
[Figure: the example matrix at the identification step.]
In this example, there are only 7 available spots left in the last row, exactly the number needed to meet that row's total, so those arcs are implied.
80
Generating Random Networks with Structural
Constraints.
Process:
  1) Identify the pool of open cells.
  2) Randomly draw an arc from this pool.
  3) Check the structural conditions against this arc.
  4) If structural conditions are met, then remove implied cells from the pool.
  5) Check for identification of other arcs.
(A code sketch of a related degree-preserving randomization appears below, after the list of constraint types.)
  • Types of constraints
  • Structural Patterns, such as the in and out
    degree, prohibition against cycles, etc.
  • Category Mixing Constraints. Nodes in category
    i restricted to nodes from category j.
  • Event Counts. Number of mutual arcs, number of
    ties between group i and j, etc.
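As a code illustration of the same goal (random directed graphs holding every in- and out-degree fixed), here is a Python sketch that uses repeated degree-preserving arc swaps. Note that this is a different, simpler mechanism than the cell-filling construction described on these slides (programs such as ZO implement the full algorithm); the sketch only shows how a reference distribution like the ones on the next slides can be built.

  import numpy as np

  def degree_preserving_randomize(a, swaps=20000, seed=0):
      # Repeatedly pick two arcs (i -> j) and (k -> l) and rewire them to
      # (i -> l) and (k -> j); every in-degree and out-degree stays fixed.
      rng = np.random.default_rng(seed)
      b = a.copy()
      arcs = list(zip(*np.nonzero(b)))
      for _ in range(swaps):
          (i, j), (k, l) = (arcs[m] for m in rng.integers(0, len(arcs), 2))
          # skip swaps that would create self-loops or duplicate arcs
          if len({i, j, k, l}) < 4 or b[i, l] or b[k, j]:
              continue
          b[i, j] = b[k, l] = 0
          b[i, l] = b[k, j] = 1
          arcs[arcs.index((i, j))] = (i, l)
          arcs[arcs.index((k, l))] = (k, j)
      return b

  # e.g. the distribution of mutual dyads across 2000 degree-preserving randomizations:
  # mutuals = [int((g * g.T).sum()) // 2
  #            for g in (degree_preserving_randomize(A, seed=s) for s in range(2000))]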

81
Social Relations at Holy Trinity School.
[Figure: friendship network at Holy Trinity School, with students arranged by grade, 7th through 12th.]
g = 74, L = 466, density = .086, M = 108, Transitivity = .357, Mean Degree = 6.3
82
Number of Mutual Dyads
2000 Networks, with fixed In and Out Degree
[Figure: histogram of the number of mutual dyads (x-axis, roughly 5 to 40) across the generated networks (y-axis: number of networks, 0 to 250), comparing the Z.O. and RANFIX generators.]
83
Distribution of Selected Triad Types
Simulations compared to Observed
2000 random networks, with fixed in and out degree.
[Figure: counts (0 to 350) of selected triad types (201, 030T, 300, 210, 120C, 120U, 120D) in the simulated networks, compared with the observed values.]
84
Romantic Networks
85
Romantic Networks