Title: Social Sub-groups
- Overview
- Background
  - How do we characterize the social structure of a group?
    - Theorists from Simmel to Homans have approached the question, and this week we look at how network researchers have operationalized the concept of group.
  - Ken Frank and Jeffrey Yasumoto
    - A discussion of how action is situated within and between social groups.
    - A nice application of a group-detection algorithm to interesting data.
  - Linton Freeman
    - UC-Irvine. Long-standing editor of the journal Social Networks.
    - Writes today on the theoretical necessities of a group.
- Methods
  - Graph-theoretic elements of group structure
  - Using UCI-NET
Frank and Yasumoto: Action and Structure
"...subgroups may define the essential components that contextualize actors' social ties and relations. The predominance of subgroups in the literature ... leaves unanswered how and why rational actors simultaneously sustain their subgroups and the linkages between them."
They argue that actors seek social capital, defined as access to resources through social ties, and emphasize two mechanisms:
a) Reciprocity transactions: actors seek to build obligations with others, and thereby gain the ability to extract resources.
b) Enforceable trust: social capital is generated by individual members' disciplined compliance with group expectations. This is an indirect, group-level effect that comes through the judicious non-use of negative action. (p.646)
They expect to find evidence of enforceable trust within social subgroups and evidence of reciprocity between such groups. To test this, they must first identify the primary subgroups within the network. They do so using a density-based criterion: Frank's algorithm iteratively assigns nodes to subgroups until the ratio of within-group to between-group ties reaches a maximum. This results in a block-diagonal adjacency matrix, where most of the ties fall in the blocks along the diagonal.
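A minimal Python sketch of the fit criterion, under stated assumptions: it is not Frank's KliqueFinder. Rather than iteratively reassigning nodes, it brute-forces every assignment of a tiny, hypothetical network into k non-empty groups and keeps the one with the highest ratio of within- to between-group ties. The function names and the two-triangle network are illustrative only.

```python
from itertools import product


def within_between_ratio(edges, groups):
    """Within-group ties divided by between-group ties (inf if no tie crosses groups)."""
    within = sum(1 for u, v in edges if groups[u] == groups[v])
    between = len(edges) - within
    return float("inf") if between == 0 else within / between


def best_partition(nodes, edges, k):
    """Score every assignment of nodes to k non-empty groups; keep the best ratio.

    Exhaustive search is only feasible for tiny networks (k ** n assignments).
    """
    best, best_fit = None, -1.0
    for labels in product(range(k), repeat=len(nodes)):
        if len(set(labels)) < k:  # forbid empty groups, else one big group wins
            continue
        groups = dict(zip(nodes, labels))
        fit = within_between_ratio(edges, groups)
        if fit > best_fit:
            best, best_fit = groups, fit
    return best, best_fit


# Hypothetical network: two triangles joined by one bridging tie (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
groups, fit = best_partition(nodes, edges, k=2)
print(groups, fit)  # a, b, c in one group; d, e, f in the other; fit 6.0
```

Forbidding empty groups matters here: without that constraint, putting everyone into one group leaves no between-group ties and trivially maximizes the ratio. Exhaustive search also grows as k^n, which is why practical algorithms search iteratively instead.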
Figure: Relations among the French Financial Elite (as drawn by Frank and Yasumoto)
Figure: The same network as drawn by PAJEK
Table: Relations among the French Financial Elite, group-to-group density table
Relations among the French Financial Elite
Given a subgroup structure, how do these groups relate to social capital?
Enforceable trust: look for acts of hostility. A hostile act was any action on the part of one actor that would deprive another actor of access to resources. Note that these were rare: only 15 overall, likely indicating some level of cohesion in the system as a whole. On the whole, they find that, net of other focal features and direct ties, being members of the same subgroup lowers the probability of a negative action within the dyad.
They repeat the exercise with positive
support. They find that supportive actions are
better predicted by friendship (reciprocity) than
by subgroup membership. They conclude that this
supports the hypothesis that the potential for
enforceable trust within subgroups reduces the
relative need to pursue social capital through
reciprocity transactions within subgroups.
(p.647) Instead, they find that support occurs
between subgroups.
Lin Freeman: The sociological concept of "group"
Focus on collectivities that are relatively small, informal, and involve close personal ties: what we would call primary groups. What (network) structure characterizes such a group?
Goal: identify (a) non-overlapping groups that allow one to (b) identify internal group structure.
Winship's Model
1) Assign people to equivalence classes that are
hierarchically nested
To assign people to a class, you must first identify the strength of the relation between each pair. Winship's model says that you define a proximity P(i, j) based on interaction such that, for all actors i, j, and k:
$$P(i,i) > P(i,j) \quad \text{for } i \neq j$$
$$P(i,j) = P(j,i)$$
$$P(i,k) \geq \min\{P(i,j),\ P(j,k)\}$$
In words, this means that whatever metric you define, a person is closer to themselves than to anyone else, the relation is symmetric, and triads are transitive (which, given the symmetry condition, means that they are complete). You can then identify partitions by scaling the proximity so that these three conditions are met. At any cutoff t, the pairs with proximity of at least t then form an equivalence relation, so they split the actors into disjoint, complete groups; raising t splits those groups further, which produces the hierarchically nested classes.
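The thresholding step can be sketched in Python. This is an illustration under assumptions, not Freeman's or Winship's code: proximities are a hypothetical symmetric list-of-lists matrix with a large diagonal, and `satisfies_winship` and `partition_at` are names made up for this example.

```python
def satisfies_winship(P):
    """True if proximity matrix P meets Winship's three conditions."""
    n = len(P)
    for i in range(n):
        for j in range(n):
            if i != j and not P[i][i] > P[i][j]:  # closer to self than to anyone else
                return False
            if P[i][j] != P[j][i]:  # symmetric
                return False
            for k in range(n):  # transitive: P(i,k) >= min(P(i,j), P(j,k))
                if P[i][k] < min(P[i][j], P[j][k]):
                    return False
    return True


def partition_at(P, t):
    """Groups of actors whose pairwise proximity is at least t.

    Under Winship's conditions "proximity >= t" is an equivalence relation,
    so scanning each unassigned actor's row yields disjoint, complete groups.
    """
    n, assigned, groups = len(P), set(), []
    for i in range(n):
        if i not in assigned:
            members = [j for j in range(n) if j == i or P[i][j] >= t]
            assigned.update(members)
            groups.append(members)
    return groups


# Hypothetical proximities for five actors (diagonal set high).
P = [[9, 3, 3, 1, 1],
     [3, 9, 3, 1, 1],
     [3, 3, 9, 1, 1],
     [1, 1, 1, 9, 2],
     [1, 1, 1, 2, 9]]
for t in (3, 2, 1):
    print(t, partition_at(P, t))
```

Lowering the cutoff from 3 to 1 first merges actors 3 and 4 and then joins everyone, giving the nested partitions the model calls for.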
Figure: Winship's model as nested partitions of actors A-K: the total group splits into A-G and H-K, and A-G splits further into A-C and D-G.
Granovetter's Model
Proceed exactly as in Winship, but treat
intransitivity differently when looking at strong
or weak ties.
If x and y are strongly connected, and y and z
are strongly connected, then x and z should be at
least weakly connected.
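Granovetter's prohibition can be checked triad by triad. A minimal sketch, assuming ties are coded in a symmetric matrix as 2 (strong), 1 (weak), or 0 (absent); that coding, the function name, and the four-actor example are hypothetical.

```python
from itertools import permutations


def g_intransitive_triples(W):
    """List (x, y, z) with strong x-y and y-z ties but no x-z tie at all.

    W is symmetric with 2 = strong, 1 = weak, 0 = absent (assumed coding).
    Each violation is reported once, with x < z.
    """
    found = []
    for x, y, z in permutations(range(len(W)), 3):
        if x < z and W[x][y] == 2 and W[y][z] == 2 and W[x][z] == 0:
            found.append((x, y, z))
    return found


# Hypothetical valued network for four actors.
W = [[0, 2, 1, 0],
     [2, 0, 2, 0],
     [1, 2, 0, 2],
     [0, 0, 2, 0]]
print(g_intransitive_triples(W))  # [(1, 2, 3)]: 0-2 is only weak, which is allowed
```

Upgrading the 1-3 pair to a weak tie (value 1) removes the violation, which is Granovetter's point: a chain of two strong ties needs only a weak closing tie.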
Figure: An example of a graph fitting the prohibition against G-intransitive relations.
The Davis "Old South" Example
Figures: the Davis Old South network, keeping only ties > 2, > 3, > 4, and > 5.
Freeman argues that the G-intransitivity model fits the data best for each of the 7 groups he studies. Substantively, the types of groups this model predicts are very similar to those predicted by the general transitivity model, except recast as a valued relation. Empirically, if you want to identify groups based on levels like this, you can use PAJEK and walk through the model just as we did with the Old South example, or you can use UCI-NET IV (available in the SRL).
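Walking through levels, as in the Old South figures, amounts to keeping only ties above a cutoff and reading off who is still connected. A minimal sketch under that assumption (connected components at each cutoff, which is not necessarily how PAJEK or UCI-NET IV define groups), using a hypothetical matrix of shared-event counts rather than the Davis data:

```python
def components_above(W, cutoff):
    """Connected groups when only ties with value > cutoff are kept."""
    n = len(W)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first search over ties above the cutoff
            u = stack.pop()
            comp.append(u)
            for v in range(n):
                if v not in seen and W[u][v] > cutoff:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


# Hypothetical symmetric counts of shared events for five actors.
W = [[0, 5, 4, 1, 0],
     [5, 0, 6, 2, 1],
     [4, 6, 0, 1, 0],
     [1, 2, 1, 0, 4],
     [0, 1, 0, 4, 0]]
for cutoff in (0, 2, 4, 5):
    print(cutoff, components_above(W, cutoff))
```

Raising the cutoff only ever splits groups, so the levels nest in the same way as the Winship partitions.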
Methods: How do we identify primary groups in a network?
- A) Graph-theoretical methods: cliques and extensions of cliques
  - Cliques
  - k-cores
  - k-plexes
  - Freeman (1992) models
  - k-components (we talked about these already)
- B) Algorithmic methods: search through a network trying to maximize a particular pattern (i.e., like Frank and Yasumoto)
  - Adjust the assignment of actors to groups until a particular pattern of ties (usually block diagonal) is identified.
  - Standard models:
    - Factions (UCI-NET)
    - NEGOPY (Richards)
    - KliqueFinder (Frank)
    - RNM (Moody)
    - CROWDS (Moody)
    - General distance clustering methods
Graph Theoretical Models.
Start with a clique. A clique is defined as a maximal subgraph in which every member of the subgraph is connected to every other member. Cliques are collections of nodes where density = 1.0.
- Properties of cliques
  - Density = 1.0
  - Everyone is connected to n-1 alters
  - Distance between every pair is 1
  - Ratio of within-group ties to between-group ties is infinite (for a clique with no ties to outsiders)
  - All triads are transitive
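One standard way to enumerate maximal cliques is the Bron-Kerbosch algorithm with pivoting. The sketch below is an illustration, not UCINET's implementation; the adjacency-dict format and the small example graph are assumptions.

```python
def maximal_cliques(adj):
    """All maximal cliques of an undirected graph given as {node: set(neighbors)}."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored nodes
        if not p and not x:
            cliques.append(sorted(r))
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):  # skip neighbors of the pivot
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical graph: triangles {0,1,2} and {2,3,4}, plus a pendant tie 4-5.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3, 5}, 5: {4}}
print(maximal_cliques(adj))  # [[0, 1, 2], [2, 3, 4], [4, 5]]
```

Nodes 2 and 4 each sit in two cliques, a small instance of the overlap problem that makes raw clique lists hard to use.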
In practice, complete cliques are not very useful. They tend to overlap heavily and are limited in size. Graph theorists have thus relaxed the complete-connectivity requirement (with varying degrees of success). See the Moody and White paper on cohesion for a discussion of many of these attempts.
k-cores: every person is connected to at least k other people in the subgroup.
Ideally, they would look something like this (here, two 3-cores). However, adding a single tie from A to B would make the whole graph a 3-core.
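A k-core can be found by repeatedly peeling away anyone with fewer than k ties to the people who remain. A minimal sketch, assuming an undirected graph stored as an adjacency dict; the helper names and the example (two hypothetical 4-cliques plus a node 8 tied to both) are illustrative and echo the point above about a single added tie.

```python
from itertools import combinations


def from_edges(edges):
    """Undirected adjacency dict {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_core(adj, k):
    """Repeatedly remove nodes with fewer than k ties to remaining nodes."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.remove(v)
                changed = True
    return alive


# Two hypothetical 4-cliques, plus node 8 tied once into each.
edges = list(combinations([0, 1, 2, 3], 2)) + list(combinations([4, 5, 6, 7], 2))
edges += [(8, 0), (8, 4)]
print(sorted(k_core(from_edges(edges), 3)))  # [0, 1, 2, 3, 4, 5, 6, 7]

edges.append((8, 1))  # one extra tie lifts node 8 to three ties
print(sorted(k_core(from_edges(edges), 3)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Once node 8 reaches three ties nothing gets peeled, and the two dense groups fuse into a single connected 3-core.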
Extensions of this idea include:
- k-plex: every member is connected to at least n-k other members of the group (recall that in a clique everyone is connected to n-1, so this relaxes that condition).
- n-clique: every person is connected to every other by a path of length n or less (recall that in a clique all distances are 1).
- n-clan: the same as an n-clique, but all paths must lie inside the group.
I've never had much luck with any of these methods empirically; real data is usually too messy for them to work well. You should try them and gain some intuition for yourself. The place to start is UCINET.
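These definitions translate directly into membership tests for a candidate group. A sketch, assuming an undirected adjacency dict; the helper names and the example are hypothetical, and the tests check only the degree or distance condition, not maximality (which the formal definitions also require).

```python
from collections import deque


def bfs_distances(adj, allowed, start):
    """Shortest path lengths from start, moving only through nodes in `allowed`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj, group, allowed):
    """Largest distance between members of `group` (inf if some pair is unreachable)."""
    worst = 0
    for s in group:
        dist = bfs_distances(adj, allowed, s)
        for t in group:
            if t not in dist:
                return float("inf")
            worst = max(worst, dist[t])
    return worst


def is_k_plex(adj, group, k):
    """Every member has ties to at least len(group) - k other members."""
    group = set(group)
    return all(len(adj[v] & group) >= len(group) - k for v in group)


def is_n_clique(adj, group, n):
    """All members within distance n; paths may pass through outsiders."""
    return diameter(adj, set(group), set(adj)) <= n


def is_n_clan(adj, group, n):
    """An n-clique whose members are also within distance n using only insiders."""
    return is_n_clique(adj, group, n) and diameter(adj, set(group), set(group)) <= n


# Hypothetical graph: 0, 1, and 2 reach each other in two steps, but only through 3.
adj = {0: {1, 3}, 1: {0, 3}, 2: {3}, 3: {0, 1, 2}}
print(is_n_clique(adj, {0, 1, 2}, 2), is_n_clan(adj, {0, 1, 2}, 2))  # True False
print(is_k_plex(adj, {0, 1, 3}, 1))  # True: a triangle is a 1-plex (a clique)
```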
Graph-theoretic concepts in UCINET and PAJEK
UCINET will compute all of the best-known graph-theoretic treatments of subgroups.
Consider running different methods on a network with a known group structure.
Cliques
The only way to get something meaningful from this is to analyze the clique overlap matrix with cluster analysis, which is what the clique-by-partition dataset does.
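The clique overlap matrix is an actor-by-actor count of how many cliques each pair shares. A minimal sketch of building it, assuming cliques are given as lists of actor indices (the example cliques are hypothetical); this matrix, or a distance derived from it, is what would then go into a cluster analysis.

```python
def co_membership(cliques, n):
    """n x n counts of how many cliques each pair of actors shares.

    The diagonal holds the number of cliques each actor belongs to.
    """
    M = [[0] * n for _ in range(n)]
    for clique in cliques:
        for i in clique:
            for j in clique:
                M[i][j] += 1
    return M


# Hypothetical, overlapping cliques among six actors.
cliques = [[0, 1, 2], [1, 2, 3], [3, 4, 5]]
for row in co_membership(cliques, 6):
    print(row)
```

Actors 1 and 2 share two cliques, so they would be joined early in a clustering of this matrix, while actor 3 bridges both sides.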
k-Cores
(See the example; in this case it works very poorly.)
n-Cliques (everyone linked by a path of length n or less)
n-Clans (everyone linked by a path of length n or less, but the path is INSIDE the group)
K-plexes (each member of a K-plex of size N has at least N-K ties to other members)
Optimization Methods
Since many of the graph-theoretic options seem not to work well, authors have turned to optimization techniques that attempt to identify groups inclusively.
Identifying primary groups
1) Measures of fit. To identify a primary group, we need some measure of how clustered the network is. Usually, this is a function of the number of ties that fall within groups relative to the number that fall between groups.
2) Algorithmic approaches to maximizing (1). Once we have such an index, we need a method for searching through the network to maximize the fit. We next go over various algorithms that search on different fit criteria.
3) Generalized cluster analysis. In addition to maximizing a group function such as (1), we can use the relational distance directly and look for clusters in the data. We next go over two different styles of cluster analysis.
Segregation Index (Freeman, L. C. 1972. "Segregation in Social Networks." Sociological Methods and Research 6:411-30.)
Freeman asked how we could identify segregation
in a social network. Theoretically, he argues,
if a given attribute (group label) does not
matter for social relations, then relations
should be distributed randomly with respect to
the attribute. Thus, the difference between the
number of cross-group ties expected by chance and
the number observed measures segregation.
Consider the (hypothetical) network below. There are two attributes in this network: eye color (blue or brown), and whether people are square or not (if not, they must be hip).
Figure: Mixing matrix
To calculate the expected number of ties, use the standard formula for a contingency table: row marginal × column marginal / total.

Observed
         Blue  Brown  Total
Blue        6     17     23
Brown      17     16     33
Total      23     33     56

Expected
         Blue  Brown  Total
Blue     9.45  13.55     23
Brown   13.55  19.45     33
Total      23     33     56

In matrix form, with R the column vector of row marginals, C the row vector of column marginals, and T the total:
$$E(X) = \frac{RC}{T}$$
For eye color, the expected number of cross-group ties is E(X) = 13.55 + 13.55 = 27.1 and the observed number is X = 17 + 17 = 34, so
$$\text{Seg} = \frac{E(X) - X}{E(X)} = \frac{27.1 - 34}{27.1} = \frac{-6.9}{27.1} = -0.25$$
A negative value means there are more cross-group ties than chance would predict.
Observed
         Hip  Square  Total
Hip       20       3     23
Square     3      30     33
Total     23      33     56

Expected
          Hip  Square  Total
Hip      9.45   13.55     23
Square  13.55   19.45     33
Total      23      33     56

Here E(X) = 13.55 + 13.55 = 27.1 and X = 3 + 3 = 6, so
$$\text{Seg} = \frac{E(X) - X}{E(X)} = \frac{27.1 - 6}{27.1} = \frac{21.1}{27.1} = 0.78$$
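Both calculations can be reproduced in a few lines. This Python sketch is not Mixmat.mod or any SAS code; it takes a mixing matrix, builds the expected counts as row total × column total / grand total, and returns (expected cross-group ties - observed) / expected. The function names are illustrative.

```python
def expected_ties(M):
    """Expected counts if ties ignored the attribute: row total * column total / total."""
    rows = [sum(r) for r in M]
    cols = [sum(c) for c in zip(*M)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


def segregation_index(M):
    """Freeman's segregation index: (expected - observed cross-group ties) / expected."""
    E = expected_ties(M)
    k = len(M)
    exp_cross = sum(E[i][j] for i in range(k) for j in range(k) if i != j)
    obs_cross = sum(M[i][j] for i in range(k) for j in range(k) if i != j)
    return (exp_cross - obs_cross) / exp_cross


eyes = [[6, 17], [17, 16]]  # Blue, Brown
hip = [[20, 3], [3, 30]]    # Hip, Square
print(round(segregation_index(eyes), 2), round(segregation_index(hip), 2))  # -0.25 0.78
```

The same functions work unchanged for more than two categories: every off-diagonal cell counts as a cross-group tie.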
In SAS, you need to create a mixing matrix to calculate the segregation index. Mixmat.mod will do this. It does so using an indicator matrix, with one row per node and one column per category (shown transposed here, nodes 1-15 left to right):

Eye color
Blue    1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
Brown   0 0 0 0 0 0 1 1 1 1 1 1 1 1 1

Square
Hip     0 0 0 1 1 1 1 1 1 0 0 0 0 0 0
Square  1 1 1 0 0 0 0 0 0 1 1 1 1 1 1
You get the mixing matrix by pre-multiplying the adjacency matrix by the transpose of the indicator matrix and post-multiplying by the indicator matrix:
$$M = I^{\top} A I$$
with dimensions (k × k) = (k × n)(n × n)(n × k).
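The product M = I^T A I can be checked on the example network. In this sketch the edge list is read off the distance-1 entries of the path-distance matrix for this network given later in these notes, and the category assignments (nodes 1-6 blue-eyed, nodes 4-9 hip) follow the indicator matrices above; the plain-Python matrix helpers stand in for SAS/IML and are illustrative.

```python
# Ties of the blue-eye/hip example (nodes 1-15), read off the distance-1
# entries of the example's path-distance matrix.
EDGES = [(1, 2), (1, 14), (1, 15), (2, 10), (2, 13), (2, 15), (3, 10),
         (3, 11), (3, 12), (4, 5), (4, 6), (4, 8), (4, 9), (4, 15),
         (5, 7), (5, 8), (5, 9), (5, 10), (6, 7), (6, 8), (8, 9),
         (9, 10), (10, 11), (10, 12), (11, 12), (12, 13), (12, 14), (14, 15)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Plain-Python matrix product of X and Y."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def mixing_matrix(A, I):
    """M = I^T A I: (k x n)(n x n)(n x k) gives a k x k table of tie counts."""
    return matmul(matmul(transpose(I), A), I)


n = 15
A = [[0] * n for _ in range(n)]
for u, v in EDGES:  # undirected: each tie fills two cells
    A[u - 1][v - 1] = A[v - 1][u - 1] = 1

# Indicator matrices (n x 2): column 0 = Blue / Hip, column 1 = Brown / Square.
I_eyes = [[1, 0] if i < 6 else [0, 1] for i in range(n)]      # nodes 1-6 blue
I_hip = [[1, 0] if 3 <= i < 9 else [0, 1] for i in range(n)]  # nodes 4-9 hip

print(mixing_matrix(A, I_eyes))  # [[6, 17], [17, 16]]
print(mixing_matrix(A, I_hip))   # [[20, 3], [3, 30]]
```

Both results match the observed mixing matrices used in the segregation-index examples, and the cells sum to 56, twice the 28 ties.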
In practice, how does the segregation index work? The figure plots the extent of race segregation in high schools against the racial heterogeneity of each school.
One problem with the segregation index is that it is not margin-free. That is, if you were to change the distribution of the category of interest (say, race) by a constant, but not the core association between race and friendship choice, you could get a different segregation level. One antidote to this problem is to use odds ratios. In this case, an odds ratio tells us the relative likelihood that two people in the same category will choose each other as friends.
Odds Ratios
The odds ratio tells us how much more likely people in the same group are to nominate each other. You calculate it from the number of ties within and between groups and the groups' relative sizes, using the following table:

                 Same group   Different group
Friends              A               B
Not friends          C               D

$$OR = \frac{AD}{BC}$$
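There are 6 hip people and 9 square people in this network. This implies the following numbers of possible ties:

Observed
         Hip  Square  Total
Hip       20       3     23
Square     3      30     33
Total     23      33     56

Possible (diagonal: n_i(n_i - 1); off diagonal: n_i n_j)
         Hip  Square
Hip       30      54
Square    54      72

Same-group ties: 20 + 30 = 50 observed out of 30 + 72 = 102 possible. Different-group ties: 3 + 3 = 6 observed out of 54 + 54 = 108 possible.

                 Same group   Different group
Friends             50                6
Not friends         52              102

$$OR = \frac{50 \times 102}{52 \times 6} = 16.35$$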
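The odds-ratio arithmetic can be wrapped in a small function. A sketch, assuming the mixing matrix counts ordered (directed) tie slots, as in the table above, so that possible within-group ties are n_i(n_i - 1) and possible between-group ties are n_i n_j; the function name is hypothetical.

```python
def same_group_odds_ratio(M, sizes):
    """Odds that a same-group pair is tied, relative to a different-group pair.

    M counts ordered ties between groups; sizes gives the number of people per group.
    """
    k = len(M)
    a = sum(M[i][i] for i in range(k))                              # friends, same group
    b = sum(M[i][j] for i in range(k) for j in range(k) if i != j)  # friends, different group
    same_possible = sum(s * (s - 1) for s in sizes)                 # n_i(n_i - 1) per group
    diff_possible = sum(sizes[i] * sizes[j]
                        for i in range(k) for j in range(k) if i != j)
    c = same_possible - a  # not friends, same group
    d = diff_possible - b  # not friends, different group
    return (a * d) / (b * c)


print(round(same_group_odds_ratio([[20, 3], [3, 30]], [6, 9]), 2))  # 16.35
```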
Figure: Segregation index compared to the odds ratio: friendship segregation index plotted against log(same-sex odds ratio), r = .95.
Algorithms that maximize this type of fit (density / tie-ratio based)
- Factions in UCI-NET
  - Multiple options for the exact factor maximized. I recommend either the density or the correlation function, and I would calculate the distance in each case.
- Frank's KliqueFinder (the AJS paper we just read)
  - I have it, but I've yet to be able to get it to work. The folks at UCI-NET are planning on incorporating it into the next version.
- Fershtman's SMI
  - I've never seen it programmed, though I use some of the ideas in the CROWDS algorithm discussed below.
Factions in UCI-NET
Reduced block matrix
      1   2   3   4   5   6
 1   59   1   2  14   1   0
 2    1  54   0   1  12   2
 3    1   2  55   0   1  12
 4    9   1   1  51   0   0
 5    0  12   2   0  62   1
 6    1   0   9   2   0  64
Fit perfectly
Cluster analysis
In addition to tools like FACTIONS, we can use the distance information contained in a network to cluster observations that are close to each other. In general, cluster analysis is a set of techniques that allows you to identify collections of objects that are similar to each other to some degree. A very good reference is the SAS/STAT manual section called "Introduction to Clustering Procedures" (http://wks.uts.ohio-state.edu/sasdoc/8/sashtml/stat/chap8/index.htm). (See also Wasserman and Faust, though the coverage is spotty.) We are going to start with the general problem of hierarchical clustering applied to any set of analytic objects based on similarity, and then transfer that to clustering nodes in a network.
Imagine a set of objects (say, people) arrayed in a two-dimensional space. You want to identify groups of people based on their position in that space. How do you do it?
Figure: people plotted by "How smart you are" and "How cool you are".
Start by choosing a pair of people who are very close to each other (such as 15 and 16) and treat that pair as one point, with a value equal to the mean position of the two nodes.
Now repeat that process until every point has been merged into a single cluster.
This process is captured in the cluster tree (called a dendrogram).
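The merge-the-closest-pair procedure can be sketched directly. Note an assumption: replacing a merged pair by its mean position, as described above, is what is usually called centroid linkage (average linkage instead averages all pairwise distances between clusters). The code weights each cluster's mean by its size, so a merged point is always the mean of all its members; the points are hypothetical.

```python
import math


def centroid_agglomerate(points):
    """Merge the two closest clusters, replace them by the mean of all their
    members, and repeat until one cluster remains.

    Returns the merge history as (members_a, members_b, merge_distance).
    """
    clusters = [([i], tuple(p)) for i, p in enumerate(points)]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(clusters[a][1], clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        (ma, pa), (mb, pb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        # Size-weighted mean, so the new point is the mean of all members.
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(pa, pb))
        history.append((ma, mb, d))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, merged))
    return history


# Hypothetical (smartness, coolness) scores for four people.
points = [(0, 0), (0, 1), (10, 10), (10, 12)]
for step in centroid_agglomerate(points):
    print(step)
```

Read from the first merge to the last, the merge distances are the heights at which branches join in the dendrogram.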
- As with the network cluster algorithms, there are many options for clustering. The three that I use most are:
  - Ward's minimum variance: the one I use almost 95% of the time
  - Average distance: the one used in the example above
  - Median distance: very similar
- Again, the SAS manual is the best single place I've found for information on each of these techniques.
- Some things to keep in mind:
  - Units matter. The example above draws together pairs horizontally because the range there is smaller. Get around this by standardizing your data.
  - This is an inductive technique. You can find clusters in a purely random distribution of points. Consider the following example.
The data in this scatter plot are produced using this code:
data random;
  do i = 1 to 20;
    x = rannor(0);
    y = rannor(0);
    output;
  end;
run;
Figure: Resulting dendrogram
Figure: Resulting cluster solution
Cluster analysis works by building a matrix of distances between each pair of points. In the example above, it used the Euclidean distance, which in two dimensions is simply the physical distance between the points in a plot; the same measure works in any number of dimensions. To use cluster analysis in a network, we base the distance on the path distance between pairs of people in the network. Consider again the blue-eye/hip example.
Distance matrix (path distances between nodes 1-15)
      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 1    0  1  3  2  3  3  4  3  3  2  3  2  2  1  1
 2    1  0  2  2  2  3  3  3  2  1  2  2  1  2  1
 3    3  2  0  3  2  4  3  3  2  1  1  1  2  2  3
 4    2  2  3  0  1  1  2  1  1  2  3  3  3  2  1
 5    3  2  2  1  0  2  1  1  1  1  2  2  3  3  2
 6    3  3  4  1  2  0  1  1  2  3  4  4  4  3  2
 7    4  3  3  2  1  1  0  2  2  2  3  3  4  4  3
 8    3  3  3  1  1  1  2  0  1  2  3  3  4  3  2
 9    3  2  2  1  1  2  2  1  0  1  2  2  3  3  2
10    2  1  1  2  1  3  2  2  1  0  1  1  2  2  2
11    3  2  1  3  2  4  3  3  2  1  0  1  2  2  3
12    2  2  1  3  2  4  3  3  2  1  1  0  1  1  2
13    2  1  2  3  3  4  4  4  3  2  2  1  0  2  2
14    1  2  2  2  3  3  4  3  3  2  2  1  2  0  1
15    1  1  3  1  2  2  3  2  2  2  3  2  2  1  0
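The internals of reach.mod are not shown here, so as a stand-in this Python sketch computes the same kind of path-distance matrix with breadth-first search from every node. The edge list is read off the distance-1 entries of the matrix above (an assumption about how the network was entered); rerunning BFS on it reproduces the full matrix.

```python
from collections import deque

# Ties of the blue-eye/hip example (nodes 1-15): the distance-1 entries above.
EDGES = [(1, 2), (1, 14), (1, 15), (2, 10), (2, 13), (2, 15), (3, 10),
         (3, 11), (3, 12), (4, 5), (4, 6), (4, 8), (4, 9), (4, 15),
         (5, 7), (5, 8), (5, 9), (5, 10), (6, 7), (6, 8), (8, 9),
         (9, 10), (10, 11), (10, 12), (11, 12), (12, 13), (12, 14), (14, 15)]


def path_distances(n, edges):
    """All-pairs geodesic distances via BFS from each node (None if unreachable)."""
    adj = {i: set() for i in range(1, n + 1)}
    for u, v in edges:  # undirected ties
        adj[u].add(v)
        adj[v].add(u)
    D = [[None] * n for _ in range(n)]
    for s in adj:
        D[s - 1][s - 1] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if D[s - 1][v - 1] is None:  # first visit = shortest path
                    D[s - 1][v - 1] = D[s - 1][u - 1] + 1
                    queue.append(v)
    return D


D = path_distances(15, EDGES)
print(D[0])  # [0, 1, 3, 2, 3, 3, 4, 3, 3, 2, 3, 2, 2, 1, 1]
```

A matrix like `D` is what gets handed to the clustering step in place of a variable-based distance matrix.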
The distance matrix implies a space within which the nodes are embedded. Using something like multidimensional scaling (MDS), we can represent the space implied by the distance matrix in two dimensions. This is the image of the network you would get if you did that.
When you use variables, the cluster analysis program generates a distance matrix. We can instead use the network distance matrix directly. If we do that with this example network, we get the following:
In SAS you use two procedures to get a cluster analysis. The first (PROC CLUSTER) does the hierarchical clustering; the second (PROC TREE) analyzes the cluster output to create the tree.
Example 1. Using variables to define the space (like income and musical taste):
proc cluster data=a method=ave outtree=clustd std;
  var x y;
  id node;
run;
proc tree data=clustd ncl=5 out=cluvars;
run;
Example 2. Using a pre-defined distance matrix to define the space (as in a social network). You first create the distance matrix (in IML), then use it in the cluster program.
proc iml;
  %include 'c:\moody\sas\programs\modules\reach.mod';
  /* blue eye example */
  mat2 = j(15,15,0);
  mat2[1,{2 14 15}] = 1;
  /* lines cut here */
  mat2[15,{1 14 2 4}] = 1;
  dmat = reach(mat2);
  mattrib dmat format=1.0;
  print dmat;
  id = 1:nrow(dmat);
  id = id`;
  ddat = id || dmat;
  create ddat from ddat;  /* creates the dataset */
  append from ddat;
quit;
data ddat (type=dist);    /* tells SAS it is a distance matrix */
  set ddat;
run;
Once you have the distance matrix, the cluster program is just the same:
proc cluster data=ddat method=ward outtree=clustd;
  id col1;
run;
proc tree data=clustd ncl=3 out=netclust;
  copy col1;
run;
proc freq data=netclust;
  tables cluster;
run;
proc print data=netclust;
  var col1 cluster;
run;
The CROWDS algorithm combines the density approach above with an initial cluster analysis and a routine for determining how many clusters are in the network. It does so by using the segregation index and all of the information from the cluster hierarchy, combining two groups only if doing so improves the segregation fit for both groups.
The one other program you should know about is NEGOPY. NEGOPY is a program that combines elements of the density-based approach and the graph-theoretic approach to find groups and positions. Like CROWDS, NEGOPY assigns people both to groups and to "outsider" or between-group positions. It also tells you how many groups are in the network. It's a DOS-based program, and a little clunky to use, but NEGWRITE.MOD will translate your data into NEGOPY format if you want to use it. There are many other approaches. If you're interested in some specifically designed for very large networks (10,000 nodes), I've developed something I call Recursive Neighborhood Means that seems to work fairly well.