Social Sub-groups

Slides: 78

Provided by: sociologyr

Learn more at: https://people.duke.edu

1
Social Sub-groups

Overview
Background
How do we characterize the social structure of a
group?
Theorists from Simmel to Homans have approached
the question and this week we look at how network
researchers have operationalized the concept of
group.
Ken Frank and Jeffrey Yasumoto
A discussion of how action is situated within and
between social groups.
Nice application of a group-detection algorithm
on interesting data.
Linton Freeman
UC-Irvine. Long-standing editor of the journal
Social Networks.
Writes today on the theoretical necessities of a
group.
Methods
Graph theoretic elements of group structure
Using UCI-NET

2
Social Sub-groups
Frank & Yasumoto: Action and Structure
"...subgroups may define the essential components
that contextualize actors' social ties and
relations. The predominance of subgroups in the
literature, ... leaves unanswered how and why
rational actors simultaneously sustain their
subgroups and the linkages between them."
3
Social Sub-groups
Frank & Yasumoto: Action and Structure
They argue that actors seek social capital,
defined as the access to resources through social
ties, and emphasize two mechanisms:
a) Reciprocity Transactions: Actors seek to
   build obligations with others, and thereby
   gain in the ability to extract resources.
b) Enforceable Trust: Social capital is
   generated by individual members'
   disciplined compliance with group
   expectations. An indirect, group-level
   effect that comes through the judicious
   non-use of negative action. (p.646)
4
Social Sub-groups
Frank & Yasumoto: Action and Structure
They expect to find evidence of enforceable trust
within social subgroups and evidence of
reciprocity between such groups. To do so, they
must identify primary subgroups within the
network. They do so using a density-based
criterion. Frank's algorithm iteratively assigns
nodes to subgroups until a maximum ratio of
within- to between-group ties is reached. This
results in a block-diagonal adjacency matrix,
where most of the ties fall along the diagonal.
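To make the within/between criterion concrete, here is a minimal Python sketch (not Frank's algorithm itself, and not the software used in this deck). It only scores a given assignment and reorders the adjacency matrix by subgroup. The toy network and the names `tie_ratio`, `block_order` and `adjacency_rows` are hypothetical.

```python
def tie_ratio(edges, group):
    """Count within- and between-group ties for an undirected edge list."""
    within = sum(1 for a, b in edges if group[a] == group[b])
    return within, len(edges) - within

def block_order(group):
    """Order nodes so that members of the same subgroup sit next to each other."""
    return sorted(group, key=lambda node: (group[node], node))

def adjacency_rows(edges, order):
    """0/1 adjacency matrix as strings, with rows and columns in `order`."""
    tie = {frozenset(e) for e in edges}
    return ["".join("1" if frozenset((r, c)) in tie else "0" for c in order)
            for r in order]

# Hypothetical toy network: two triangles joined by one bridging tie.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "d")]
group = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}

within, between = tie_ratio(edges, group)
rows = adjacency_rows(edges, block_order(group))
print(within, between)
print("\n".join(rows))
```

With the nodes sorted by subgroup, the ones cluster in two blocks along the diagonal and the single bridging tie shows up off the blocks.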
5
Relations among the French Financial Elite (as
drawn by F&Y)
6
As drawn by PAJEK
7
Relations among the French Financial Elite:
Group-to-group density table
8
Relations among the French Financial Elite
Given a subgroup structure, how do these groups
relate to social capital?
Enforceable trust: Look for acts of hostility. A
hostile act was any action on the part of one
actor that would deprive another actor of access
to resources. Note that these were rare: only 15
overall, likely indicating some level of cohesion
in the system as a whole. On the whole, they
find that -- net of other focal features and
direct ties -- being members of the same
sub-group lowers the probability of a negative
action between the dyad.
9
Relations among the French Financial Elite
They repeat the exercise with positive
support. They find that supportive actions are
better predicted by friendship (reciprocity) than
by subgroup membership. They conclude that this
supports the hypothesis that the potential for
enforceable trust within subgroups reduces the
relative need to pursue social capital through
reciprocity transactions within subgroups.
(p.647) Instead, they find that support occurs
between subgroups.
10
Social Sub-groups
Lin Freeman: The sociological concept of Group
Focus on collectivities that are relatively
small, informal, and involve close personal
ties: what we would call Primary Groups.
What (network) structure characterizes such a
group?
Goal: Identify (a) non-overlapping groups that
allow one to (b) identify internal group
structure.
11
Social Sub-groups
Lin Freeman: The sociological concept of Group
Winship's Model
1) Assign people to equivalence classes that are
hierarchically nested
12
Social Sub-groups
Lin Freeman: The sociological concept of Group
Winship's Model
To assign people to a class, you must first
identify the strength of the relation between
each pair. Winship's model says that you define
proximity based on interaction such that three
conditions hold.
13
Social Sub-groups
Lin Freeman: The sociological concept of Group
Winship's Model
In words, this means that whatever metric you
define, a person is closer to themselves than to
anyone else, that the relation be symmetric, and
that triads be transitive (which, given the
symmetric condition, means that they be
complete). You can then identify partitions by
scaling the proximity, such that these three
conditions are met.
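The three conditions can be checked mechanically on a valued proximity matrix. The Python sketch below is an illustration only: it assumes that larger values mean closer, and it reads "transitive" for valued ties as p(x,z) >= min(p(x,y), p(y,z)), which is an assumed formalization rather than the formula from the slide. The function name and both matrices are hypothetical.

```python
from itertools import permutations

def winship_violations(p):
    """List violations of the three conditions for a proximity matrix p.

    Assumes larger values mean closer. Transitivity is checked in the
    assumed form p[i][k] >= min(p[i][j], p[j][k]).
    """
    n = len(p)
    bad = []
    for i in range(n):
        for j in range(n):
            if i != j and p[i][i] <= p[i][j]:
                bad.append(("self", i, j))        # not closest to oneself
            if i < j and p[i][j] != p[j][i]:
                bad.append(("symmetry", i, j))    # relation not symmetric
    for i, j, k in permutations(range(n), 3):
        if p[i][k] < min(p[i][j], p[j][k]):
            bad.append(("transitivity", i, j, k))
    return bad

# Hypothetical proximities: persons 0 and 1 are close, person 2 is distant.
nested = [[4, 3, 1],
          [3, 4, 1],
          [1, 1, 4]]
# Here 0-1 and 1-2 are close but 0-2 is distant, so the triad is intransitive.
chained = [[4, 3, 1],
           [3, 4, 3],
           [1, 3, 4]]

print(winship_violations(nested))
print(winship_violations(chained))
```

A matrix that passes all three checks can be cut at successive proximity levels to give nested, non-overlapping classes.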
14
Social Sub-groups
Lin Freeman: The sociological concept of Group
Winship's Model
15
Social Sub-groups
Lin Freeman: The sociological concept of Group
Winship's Model
[Figure: hierarchically nested classes, labeled total; A-G and H-K; A-C and D-G]
16
Social Sub-groups
Lin Freeman: The sociological concept of Group
Granovetter's Model
Proceed exactly as in Winship, but treat
intransitivity differently when looking at strong
or weak ties.
If x and y are strongly connected, and y and z
are strongly connected, then x and z should be at
least weakly connected.
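The rule translates directly into a search for forbidden triads: two strong ties that share a node while the remaining pair has no tie at all. A minimal Python sketch follows, with hypothetical names and toy data.

```python
from itertools import combinations

def g_intransitive_triads(nodes, strong, weak):
    """Return (x, y, z) where x-y and y-z are strong but x-z has no tie at all."""
    strong_set = {frozenset(e) for e in strong}
    any_tie = strong_set | {frozenset(e) for e in weak}
    bad = []
    for y in nodes:
        others = [v for v in nodes if v != y]
        for x, z in combinations(others, 2):
            if (frozenset((x, y)) in strong_set
                    and frozenset((y, z)) in strong_set
                    and frozenset((x, z)) not in any_tie):
                bad.append((x, y, z))
    return bad

nodes = ["a", "b", "c", "d"]
strong = [("a", "b"), ("b", "c")]
weak = [("a", "c")]

# a-b and b-c are strong and a-c is weak, so the rule is satisfied.
ok = g_intransitive_triads(nodes, strong, weak)
# Adding a strong c-d tie leaves b and d unconnected: a forbidden triad.
broken = g_intransitive_triads(nodes, strong + [("c", "d")], weak)
print(ok, broken)
```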
17
Social Sub-groups
Lin Freeman: The sociological concept of Group
Granovetter's Model
An example of a graph fitting the prohibition
against G-intransitive relations.
18
Social Sub-groups
The Davis - Old South Example
19
Social Sub-groups
The Davis - Old South Example: Ties > 2
20
Social Sub-groups
The Davis - Old South Example: Ties > 3
21
Social Sub-groups
The Davis - Old South Example: Ties > 4
22
Social Sub-groups
The Davis - Old South Example: Ties > 5
23
Social Sub-groups
Lin Freeman: The sociological concept of Group
Freeman argues that the G-intransitivity model
fits the data best for each of the 7 groups he
studies. Substantively, the types of groups
this model predicts are very similar to those
predicted by the general transitivity model,
except re-cast as a valued relation. Empirically,
if you want to identify groups based on levels
like this, you can use PAJEK and walk through the
model in just the same way as we did with Old
South, or you can use UCI-NET IV (available in
the SRL).
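The level-by-level walk used for the Old South pictures (keep only ties above a cutoff, then see which groups remain connected) can be sketched in Python as below. The valued ties are made up for illustration; they are not the Davis data, and the function names are hypothetical.

```python
def components(nodes, edges):
    """Connected components of an undirected graph, each as a sorted list."""
    nbrs = {n: set() for n in nodes}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in nbrs[v] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(sorted(comp))
    return comps

def groups_at(valued, nodes, cutoff):
    """Keep only ties whose value exceeds `cutoff`, then return the components."""
    kept = [pair for pair, value in valued.items() if value > cutoff]
    return components(nodes, kept)

# Hypothetical valued ties (for example, counts of shared events).
nodes = ["A", "B", "C", "D", "E"]
valued = {("A", "B"): 6, ("B", "C"): 5, ("A", "C"): 4,
          ("C", "D"): 3, ("D", "E"): 6}

levels = {cut: groups_at(valued, nodes, cut) for cut in (2, 3, 4, 5)}
for cut, groups in levels.items():
    print("ties >", cut, groups)
```

Raising the cutoff splits the single component into progressively smaller, nested groups.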
24
Methods: How do we identify primary groups in a
network?

A) Graph theoretical methods: Cliques and
extensions of cliques
- Cliques
- k-cores
- k-plexes
- Freeman (1992) Models
- K-components (we talked about these already)
B) Algorithmic methods: search through a network
trying to maximize for a particular pattern (i.e.,
like Frank & Yasumoto)
Adjust assignment of actors to groups until a
particular pattern of ties (block diagonal,
usually) is identified.
Standard models:
- Factions (UCI-NET)
- NEGOPY (Richards)
- KliqueFinder (Frank)
- RNM (Moody)
- CROWDS (Moody)
- General Distance Clustering Methods

25
Methods: How do we identify primary groups in a
network?
Graph Theoretical Models.
Start with a clique. A clique is defined as a
maximal subgraph in which every member of the
subgraph is connected to every other member of
the subgraph. Cliques are collections of nodes
where density = 1.0.

Properties of cliques:
- Density: 1.0
- Everyone connected to n-1 alters
- Distance between every pair is 1
- Ratio of within group ties to between group ties
  is infinite
- All triads are transitive
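A small Python sketch can enumerate maximal cliques and confirm two of the listed properties. It uses the basic Bron-Kerbosch recursion, a standard textbook method chosen here for illustration; it is not necessarily what UCINET or PAJEK run. The four-node graph is hypothetical.

```python
def maximal_cliques(nbrs):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: nodes already handled
        if not p and not x:
            found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nbrs), set())
    return sorted(found)

def density(members, nbrs):
    """Share of the possible ties among `members` that are present."""
    inside = set(members)
    n = len(inside)
    ties = sum(len(nbrs[v] & inside) for v in inside) / 2
    return ties / (n * (n - 1) / 2)

# Hypothetical graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
nbrs = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

cliques = maximal_cliques(nbrs)
print(cliques)
for c in cliques:
    # Density is 1.0 and every member has n-1 alters inside the clique.
    print(c, density(c, nbrs), all(len(nbrs[v] & set(c)) == len(c) - 1 for v in c))
```

Node 3 belongs to both cliques, a small instance of the overlap that tends to become heavy in real data.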

26
Methods: How do we identify primary groups in a
network?
Graph Theoretical Models.
In practice, complete cliques are not very
useful. They tend to overlap heavily and are
limited in their size.
Graph theorists have thus relaxed the complete
connectivity requirement (with varying degrees of
success). See the Moody & White paper on cohesion
for a discussion of many of these attempts.
27
Methods: How do we identify primary groups in a
network?
Graph Theoretical Models.
k-cores: Every person is connected to at least k
other people.
Ideally, they would look something like this
(here, two 3-cores). However, adding a single
tie from A to B would make the whole graph a
3-core.
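The fragility described here is easy to reproduce. The Python sketch below finds a k-core by repeatedly pruning nodes with fewer than k ties, then splits the survivors into connected pieces. The two four-person groups and all function names are hypothetical stand-ins for the picture on the slide.

```python
from itertools import combinations

def build(edges):
    """Neighbour sets from an undirected edge list."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs

def k_core(nbrs, k):
    """Repeatedly drop nodes with fewer than k ties to the remaining nodes."""
    alive = set(nbrs)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(nbrs[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive

def core_components(nbrs, k):
    """Connected pieces of the k-core, each as a sorted list."""
    alive = k_core(nbrs, k)
    seen, comps = set(), []
    for start in sorted(alive):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend((nbrs[v] & alive) - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps

# Two separate groups of four in which everyone knows everyone.
left = ["A", "a1", "a2", "a3"]
right = ["B", "b1", "b2", "b3"]
edges = list(combinations(left, 2)) + list(combinations(right, 2))

before = core_components(build(edges), 3)
after = core_components(build(edges + [("A", "B")]), 3)
print(len(before), len(after))
```

One extra tie between A and B is enough to fuse the two groups into a single connected 3-core.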
28
Methods: How do we identify primary groups in a
network?
Graph Theoretical Models.
Extensions of this idea include:
K-plex: Every member is connected to at least n-k
other people in the graph (recall that in a clique
everyone is connected to n-1, so this relaxes
that condition).
n-clique: Every person is connected by a path of
length n or less (recall that in a clique every
pair is at distance 1).
n-clan: Same as an n-clique, but all paths must
be inside the group.
I've never had much luck with any of
these methods empirically. Real data are usually
too messy for them to work well. You should try
them, and gain some intuition for yourself. The
place to start is in UCINET.
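As a rough illustration of how the three definitions differ, the Python sketch below tests whether a candidate set of nodes meets each condition. It checks only the defining condition, not maximality, and is not the UCINET routine. The six-node graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def distances(nbrs, source, allowed):
    """Breadth-first path lengths from `source`, moving only through `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in nbrs[v] & allowed:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def is_n_clique(nbrs, members, n):
    """Every pair is within n steps, using any path in the whole graph."""
    everyone = set(nbrs)
    return all(distances(nbrs, a, everyone).get(b, n + 1) <= n
               for a, b in combinations(members, 2))

def is_n_clan(nbrs, members, n):
    """An n-clique whose connecting paths stay inside the group."""
    inside = set(members)
    return is_n_clique(nbrs, members, n) and all(
        distances(nbrs, a, inside).get(b, n + 1) <= n
        for a, b in combinations(members, 2))

def is_k_plex(nbrs, members, k):
    """Every member is tied to at least (group size - k) other members."""
    inside = set(members)
    return all(len(nbrs[v] & inside) >= len(inside) - k for v in inside)

# Hypothetical graph: triangle 1-2-3, plus the path 2-4-6-5-3.
nbrs = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 5},
        4: {2, 6}, 5: {3, 6}, 6: {4, 5}}

print(is_n_clique(nbrs, [1, 2, 3, 4, 5], 2))  # 4 and 5 are linked through 6
print(is_n_clan(nbrs, [1, 2, 3, 4, 5], 2))    # but 6 is outside the group
print(is_n_clan(nbrs, [2, 3, 4, 5, 6], 2))
print(is_k_plex(nbrs, [1, 2, 3], 1), is_k_plex(nbrs, [1, 2, 3, 4], 2))
```

The first two calls show why the n-clan restriction exists: nodes 4 and 5 are within two steps of each other only by way of node 6, which is not a member of that group.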
29
Methods: How do we identify primary groups in a
network?
Graph Theoretic concepts in UCINET & PAJEK
UCINET will compute all of the best-known graph
theoretic treatments for subgroups.
30
Methods: How do we identify primary groups in a
network?
Consider running different methods on a known
group structure.
31
Methods: How do we identify primary groups in a
network?
32
Methods: How do we identify primary groups in a
network?
Cliques
33
Methods: How do we identify primary groups in a
network?
Cliques
The only way to get something meaningful from
this is to analyze the clique overlap matrix,
which is what the Clique by partition dataset
does, using cluster analysis.
34
Methods: How do we identify primary groups in a
network?
K-Cores
(See example, but in this case it works very
poorly.)
35
Methods: How do we identify primary groups in a
network?
n-Clique (Everyone linked by a path of at most
length n)
36
Methods: How do we identify primary groups in a
network?
n-Clique (Everyone linked by a path of at most
length n)
37
Methods: How do we identify primary groups in a
network?
n-Clan (Everyone linked by a path of at most
length n, but path is INSIDE group)
38
Methods: How do we identify primary groups in a
network?
K-plex (each member of a K-plex of size N has
at least N-K ties to other members)
39
Methods: How do we identify primary groups in a
network?
Optimization Methods
Since many of the graph-theoretic options seem
not to work well, authors have used optimization
techniques that attempt to identify groups
inclusively.
40
Methods: How do we identify primary groups in a
network?
Identifying Primary groups
1) Measures of fit
To identify a primary group, we need some
measure of how clustered the network is.
Usually, this is a function of the number of ties
that fall within groups relative to the number of
ties that fall between groups.
2) Algorithmic approaches to maximizing (1)
Once we have such an index, we need a method for
searching through the network to maximize the
fit. We next go over various algorithms that
search on different criteria for a fit.
3) Generalized cluster analysis
In addition to maximizing a group function such
as (1), we can use the relational distance
directly, and look for clusters in the data. We
next go over two different styles of cluster
analysis.
Segregation Index (Freeman, L. C. 1978.
"Segregation in Social Networks." Sociological
Methods and Research 6:411-30.)
Freeman asked how we could identify segregation
in a social network. Theoretically, he argues,
if a given attribute (group label) does not
matter for social relations, then relations
should be distributed randomly with respect to
the attribute. Thus, the difference between the
number of cross-group ties expected by chance and
the number observed measures segregation.
42
Segregation Index
Consider the (hypothetical) network below. There
are two attributes in this network: people with
Blue eyes and Brown eyes, and people who are
square or not (they must be hip).
43
Segregation Index
Mixing Matrix
44
Segregation Index
To calculate the number of expected ties, use the
standard formula for a contingency table:
(Row marginal x Column marginal) / Total

Observed
         Blue   Brown   Total
Blue        6      17      23
Brown      17      16      33
Total      23      33      56

Expected
         Blue   Brown   Total
Blue     9.45   13.55      23
Brown   13.55   19.45      33
Total      23      33      56

In matrix form:
E(X) = RC / T
Segregation Index
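Observed
         Blue   Brown   Total
Blue        6      17      23
Brown      17      16      33
Total      23      33      56

Expected
         Blue   Brown   Total
Blue     9.45   13.55      23
Brown   13.55   19.45      33
Total      23      33      56

E(X) = 13.55 + 13.55 = 27.1   (expected cross-group ties)
X = 17 + 17 = 34              (observed cross-group ties)
Seg = (E(X) - X) / E(X) = (27.1 - 34) / 27.1 = -6.9 / 27.1 = -0.25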
46
Segregation Index
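Observed
          Hip   Square   Total
Hip        20        3      23
Square      3       30      33
Total      23       33      56

Expected
          Hip   Square   Total
Hip      9.45    13.55      23
Square  13.55    19.45      33
Total      23       33      56

E(X) = 13.55 + 13.55 = 27.1   (expected cross-group ties)
X = 3 + 3 = 6                 (observed cross-group ties)
Seg = (E(X) - X) / E(X) = (27.1 - 6) / 27.1 = 21.1 / 27.1 = 0.78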
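The two worked calculations can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, not the SAS module used in the course; it takes a mixing matrix as input, and the function name is hypothetical.

```python
def segregation_index(mixing):
    """Seg = (E(X) - X) / E(X), where X is the number of cross-group ties."""
    k = len(mixing)
    total = sum(sum(row) for row in mixing)
    row_tot = [sum(row) for row in mixing]
    col_tot = [sum(col) for col in zip(*mixing)]
    # Expected cell count under random mixing: row marginal * column marginal / total.
    expected_cross = sum(row_tot[i] * col_tot[j] / total
                         for i in range(k) for j in range(k) if i != j)
    observed_cross = sum(mixing[i][j]
                         for i in range(k) for j in range(k) if i != j)
    return (expected_cross - observed_cross) / expected_cross

eye = segregation_index([[6, 17], [17, 16]])   # Blue / Brown mixing matrix
hip = segregation_index([[20, 3], [3, 30]])    # Hip / Square mixing matrix
print(round(eye, 2), round(hip, 2))
```

A negative value means more cross-group ties than chance would give (eye color); a value near 1 means almost all ties stay within groups (hip versus square).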
47
Segregation Index
In SAS, you need to create a mixing matrix to
calculate the segregation index. Mixmat.mod will
do this. It does so using an indicator matrix
(one row per node, one column per category).
Node   Blue   Square
  1    1 0    0 1
  2    1 0    0 1
  3    1 0    0 1
  4    1 0    1 0
  5    1 0    1 0
  6    1 0    1 0
  7    0 1    1 0
  8    0 1    1 0
  9    0 1    1 0
 10    0 1    0 1
 11    0 1    0 1
 12    0 1    0 1
 13    0 1    0 1
 14    0 1    0 1
 15    0 1    0 1
48
Segregation Index
You get the mixing matrix by pre-multiplying the
adjacency matrix by the transpose of the
indicator matrix and post-multiplying by the
indicator matrix:
M = I' A I
(k x k) = (k x n)(n x n)(n x k)
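The product can be checked against the example network in plain Python (a sketch, not Mixmat.mod). The edge list below is read off the deck's distance matrix for this network, taking every pair at distance 1 as a tie. Coding nodes 1-6 as blue-eyed and nodes 4-9 as hip is an assumption based on the indicator matrix.

```python
# Ties of the 15-node example: pairs at distance 1 in the distance matrix.
EDGES = [(1, 2), (1, 14), (1, 15), (2, 10), (2, 13), (2, 15), (3, 10),
         (3, 11), (3, 12), (4, 5), (4, 6), (4, 8), (4, 9), (4, 15),
         (5, 7), (5, 8), (5, 9), (5, 10), (6, 7), (6, 8), (8, 9), (9, 10),
         (10, 11), (10, 12), (11, 12), (12, 13), (12, 14), (14, 15)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def mixing_matrix(adjacency, indicator):
    """M = I' A I, with one indicator row per node and one column per group."""
    transposed = [list(col) for col in zip(*indicator)]
    return matmul(matmul(transposed, adjacency), indicator)

n = 15
A = [[0] * n for _ in range(n)]
for i, j in EDGES:
    A[i - 1][j - 1] = A[j - 1][i - 1] = 1

# Assumed group codes: nodes 1-6 are blue-eyed, nodes 4-9 are hip.
eye = [[1, 0] if node <= 6 else [0, 1] for node in range(1, n + 1)]
hip = [[1, 0] if 4 <= node <= 9 else [0, 1] for node in range(1, n + 1)]

eye_mix = mixing_matrix(A, eye)
hip_mix = mixing_matrix(A, hip)
print(eye_mix)
print(hip_mix)
```

Because the adjacency matrix is symmetric, each tie is counted once from each end, which is why the cells sum to 56 for a network with 28 ties.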
49
Segregation Index
In practice, how does the segregation index work?
This is a plot of the extent of race segregation
in a high school, by the racial heterogeneity of
the high school
50
Segregation Index
One problem with the segregation index is that it
is not margin free. That is, if you were to
change the distribution of the category of
interest (say race) by a constant but not the
core association between race and friendship
choice, you can get a different segregation
level. One antidote to this problem is to use
odds ratios. In this case, an odds ratio tells
us the relative likelihood that two people in the
same category will choose each other as friends.
51
Odds Ratios
The odds ratio tells us how much more likely
people in the same group are to nominate each
other. You calculate the odds ratio based on the
number of ties in a group and their relative
size, based on the following table:

               Same Group   Different Group
Friends             A              B
Not Friends         C              D

OR = AD / BC
52
Odds Ratios
There are 6 hip people and 9 square people in
this network. This implies that there are the
following number of possible ties in the network:

Observed
          Hip   Square   Total
Hip        20        3      23
Square      3       30      33
Total      23       33      56

Possible
          Hip   Square
Hip        30       54
Square     54       72
Diagonal: ni(ni-1); off diagonal: ni*nj

                  Same Group   Different Group
Friend   Yes          50              6
         No           52            102

OR = (50)(102) / (52)(6) = 16.35
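The same calculation as a short Python sketch, taking the observed mixing matrix and the group sizes as inputs. The function name is hypothetical.

```python
def same_group_odds_ratio(mixing, sizes):
    """Odds of a tie within groups relative to the odds of a tie between groups."""
    k = len(sizes)
    ties_same = sum(mixing[i][i] for i in range(k))
    ties_diff = sum(mixing[i][j] for i in range(k) for j in range(k) if i != j)
    # Possible directed ties: ni(ni-1) within a group, ni*nj between groups.
    possible_same = sum(n * (n - 1) for n in sizes)
    possible_diff = sum(sizes[i] * sizes[j]
                        for i in range(k) for j in range(k) if i != j)
    a, b = ties_same, ties_diff                  # friends: same, different
    c = possible_same - ties_same                # not friends, same group
    d = possible_diff - ties_diff                # not friends, different group
    return (a * d) / (b * c)

odds = same_group_odds_ratio([[20, 3], [3, 30]], [6, 9])
print(round(odds, 2))
```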
53
Segregation index compared to the odds ratio
[Scatter plot: Friendship Segregation Index against Log(Same-Sex Odds Ratio); r = .95]
54
Algorithms that maximize this type of fit
(density / tie ratio based)

- Factions in UCI-NET
  Multiple options for the exact factor maximized.
  I recommend either the density or the correlation
  function, and I would calculate the distance in
  each case.
- Frank's KliqueFinder (the AJS paper we just read)
  I have it, but I've yet to be able to get it to
  work. The folks at UCI-NET are planning on
  incorporating it into the next version.
- Fershtman's SMI
  Never seen it programmed, though I use some of
  the ideas in the CROWDS algorithm discussed below.

55
Factions in UCI-NET
56
Factions in UCI-NET
57
Factions in UCI-NET
58
Factions in UCI-NET
Reduced BlockMatrix
       1    2    3    4    5    6
      --   --   --   --   --   --
  1   59    1    2   14    1    0
  2    1   54    0    1   12    2
  3    1    2   55    0    1   12
  4    9    1    1   51    0    0
  5    0   12    2    0   62    1
  6    1    0    9    2    0   64
Fit perfectly
59
Cluster analysis
In addition to tools like FACTIONS, we can use
the distance information contained in a network
to cluster observations that are close to each
other. In general, cluster analysis is a set of
techniques that allows you to identify
collections of objects that are similar to each
other in some degree. A very good reference is
the SAS/STAT manual section called "Introduction
to Clustering Procedures"
(http://wks.uts.ohio-state.edu/sasdoc/8/sashtml/stat/chap8/index.htm).
(See also Wasserman and Faust, though the
coverage is spotty.) We are going to start with
the general problem of hierarchical clustering
applied to any set of analytic objects based on
similarity, and then transfer that to clustering
nodes in a network.
60
Cluster analysis
Imagine a set of objects (say people) arrayed in
a two dimensional space. You want to identify
groups of people based on their position in that
space. How do you do it?
[Scatter plot axes: How Smart you are, How Cool you are]
61
Cluster analysis
Start by choosing a pair of people who are very
close to each other (such as 15 & 16) and now
treat that pair as one point, with a value equal
to the mean position of the two nodes.
62
Cluster analysis
Now repeat that process for as long as possible.
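The merge-and-repeat procedure can be sketched in a few lines of Python. This version replaces each merged pair by the mean position of all its members (a centroid rule, chosen here as one reasonable reading of "mean position"). The points and function name are hypothetical, and `math.dist` assumes Python 3.8 or later.

```python
from itertools import combinations
from math import dist

def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    # cluster id -> (member indices, centre of the cluster)
    clusters = {i: ([i], p) for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: dist(clusters[ab[0]][1], clusters[ab[1]][1]))
        members = clusters[a][0] + clusters[b][0]
        # New centre: mean position of every point in the merged cluster.
        centre = tuple(sum(points[m][d] for m in members) / len(members)
                       for d in range(len(points[0])))
        merges.append(sorted(members))
        del clusters[a], clusters[b]
        clusters[next_id] = (members, centre)
        next_id += 1
    return merges

# Hypothetical positions on two dimensions (say, smart and cool).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (9.0, 0.0)]
merges = agglomerate(points)
for step, members in enumerate(merges, start=1):
    print(step, members)
```

The merge history, read from first to last, is the information a dendrogram displays.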
63
Cluster analysis
This process is captured in the cluster tree
(called a dendrogram)
64
Cluster analysis

As with the network cluster algorithms, there are
many options for clustering. The three that I
use most are:
- Ward's Minimum Variance -- the one I use almost
  95% of the time
- Average Distance -- the one used in the example
  above
- Median Distance -- very similar
Again, the SAS manual is the best single place
I've found for information on each of these
techniques.
Some things to keep in mind:
- Units matter. The example above draws together
  pairs horizontally because the range there is
  smaller. Get around this by standardizing your
  data.
- This is an inductive technique. You can find
  clusters in a purely random distribution of
  points. Consider the following example.

65
Cluster analysis
The data in this scatter plot are produced using
this code:
  data random;
    do i=1 to 20;
      x=rannor(0);
      y=rannor(0);
      output;
    end;
  run;
66
Cluster analysis
Resulting dendrogram
67
Cluster analysis
Resulting cluster solution
68
Cluster analysis
Cluster analysis works by building a distance
matrix between each pair of points. In the
example above, it used the Euclidean distance,
which in two dimensions is simply the physical
distance between the points in a plot. It can
work on any number of dimensions. To use
cluster analysis in a network, we base the
distance on the path-distance between pairs of
people in the network. Consider again the
blue-eye / hip example.
69
Cluster analysis
Distance Matrix
0 1 3 2 3 3 4 3 3 2 3 2 2 1 1
1 0 2 2 2 3 3 3 2 1 2 2 1 2 1
3 2 0 3 2 4 3 3 2 1 1 1 2 2 3
2 2 3 0 1 1 2 1 1 2 3 3 3 2 1
3 2 2 1 0 2 1 1 1 1 2 2 3 3 2
3 3 4 1 2 0 1 1 2 3 4 4 4 3 2
4 3 3 2 1 1 0 2 2 2 3 3 4 4 3
3 3 3 1 1 1 2 0 1 2 3 3 4 3 2
3 2 2 1 1 2 2 1 0 1 2 2 3 3 2
2 1 1 2 1 3 2 2 1 0 1 1 2 2 2
3 2 1 3 2 4 3 3 2 1 0 1 2 2 3
2 2 1 3 2 4 3 3 2 1 1 0 1 1 2
2 1 2 3 3 4 4 4 3 2 2 1 0 2 2
1 2 2 2 3 3 4 3 3 2 2 1 2 0 1
1 1 3 1 2 2 3 2 2 2 3 2 2 1 0
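A path-distance matrix like this one can be built with a breadth-first search from every node. The Python sketch below stands in for the course's SAS reach module, whose internals are not shown here. The edge list is taken from the matrix itself (every pair at distance 1), so the check is whether the search reproduces the longer distances.

```python
from collections import deque

# Ties of the 15-node example: pairs at distance 1 in the matrix above.
EDGES = [(1, 2), (1, 14), (1, 15), (2, 10), (2, 13), (2, 15), (3, 10),
         (3, 11), (3, 12), (4, 5), (4, 6), (4, 8), (4, 9), (4, 15),
         (5, 7), (5, 8), (5, 9), (5, 10), (6, 7), (6, 8), (8, 9), (9, 10),
         (10, 11), (10, 12), (11, 12), (12, 13), (12, 14), (14, 15)]

def path_distances(n, edges):
    """Shortest path length between every pair of nodes numbered 1..n."""
    nbrs = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    dmat = []
    for source in range(1, n + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in nbrs[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        # None would mark a pair with no connecting path.
        dmat.append([dist.get(t) for t in range(1, n + 1)])
    return dmat

dmat = path_distances(15, EDGES)
print(dmat[0])
print(dmat[14])
```

The resulting matrix is what gets handed to the clustering routine in place of distances computed from variables.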
70
Cluster analysis
The distance matrix implies a space that nodes
are embedded within. Using something like MDS,
we can represent the space implied by the
distance matrix in two dimensions. This is the
image of the network you would get if you did
that.
71
Cluster analysis
When you use variables, the cluster analysis
program generates a distance matrix. We can
instead use the network distance matrix directly.
If we do that with this example network, we get
the following:
72
Cluster analysis
73
Cluster analysis
In SAS you use two commands to get a cluster
analysis. The first does the hierarchical
clustering. The second analyzes the cluster
output to create the tree.
Example 1. Using variables to define the space
(like income and musical taste)
  proc cluster data=a method=ave out=clustd std;
    var x y;
    id node;
  run;
  proc tree data=clustd ncl=5 out=cluvars;
  run;
74
Cluster analysis
Example 2. Using a pre-defined distance matrix
to define the space (as in a social network). You
first create the distance matrix (in IML), then
use it in the cluster program.
  proc iml;
    %include 'c:\moody\sas\programs\modules\reach.mod';
    /* blue eye example */
    mat2=j(15,15,0);
    mat2[1,{2 14 15}]=1;
    /* lines cut here */
    mat2[15,{1 14 2 4}]=1;
    dmat=reach(mat2);
    mattrib dmat format=1.0;
    print dmat;
    id=1:nrow(dmat);
    id=id`;
    ddat=id||dmat;
    create ddat from ddat;    /* creates the dataset */
    append from ddat;
  quit;
  data ddat (type=dist);      /* tells SAS it is a distance */
    set ddat;                 /* matrix */
  run;
75
Cluster analysis
Example 2. Using a pre-defined distance matrix
to define the space (as in a social
network). Once you have it, the cluster program
is just the same.
  proc cluster data=ddat method=ward out=clustd;
    id col1;
  run;
  proc tree data=clustd ncl=3 out=netclust;
    copy col1;
  run;
  proc freq data=netclust;
    tables cluster;
  run;
  proc print data=netclust;
    var col1 cluster;
  run;
76
The CROWDS algorithm combines the density
approach above with an initial cluster analysis
and a routine for determining how many clusters
are in the network. It does so by using the
Segregation index and all of the information from
the cluster hierarchy, combining two groups only
if it improves the segregation fit for both
groups.
77
The one other program you should know about is
NEGOPY. NEGOPY is a program that combines
elements of the density-based approach and the
graph theoretic approach to find groups and
positions. Like CROWDS, NEGOPY assigns people
both to groups and to "outsider" or "between
group" positions. It also tells you how many
groups are in the network. It's a DOS-based
program, and a little clunky to use, but
NEGWRITE.MOD will translate your data into NEGOPY
format if you want to use it. There are many
other approaches. If you're interested in some
specifically designed for very large networks
(10,000 nodes), I've developed something I call
Recursive Neighborhood Means that seems to work
fairly well.
