Title: Introduction to Social Network Analysis
Introduction to Social Network Analysis
Columbia University, April 2007
James Moody, Duke University
Introduction
- Introduction
- Social Network data
- Basic data elements
- Network data sources
- Local (ego) Network Analysis
- Introduction
- Network Composition
- Network Structure
- Local Network Models
- Complete Network Analysis
- Exploratory Analysis
- Network Connections
- Network Macro Structure
- Stochastic Network Analyses
- Social Network Software Review
- Work through examples
Introduction
We live in a connected world
"To speak of social life is to speak of the association between people: their associating in work and in play, in love and in war, to trade or to worship, to help or to hinder. It is in the social relations men establish that their interests find expression and their desires become realized."
Peter M. Blau, Exchange and Power in Social Life, 1964
"If we ever get to the point of charting a whole city or a whole nation, we would have a picture of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does in space. Such an invisible structure underlies society and has its influence in determining the conduct of society as a whole."
J.L. Moreno, New York Times, April 13, 1933
These patterns of connection form a social space that can be seen in multiple contexts.
Introduction
Source: Linton Freeman, "See you in the funny pages," Connections 23, 2000, 32-42.
Introduction
High Schools as Networks
Introduction
And yet, standard social science analysis methods do not take this space into account.
"For the last thirty years, empirical social research has been dominated by the sample survey. But as usually practiced, the survey is a sociological meat grinder, tearing the individual from his social context and guaranteeing that nobody in the study interacts with anyone else in it."
Allen Barton, 1968 (quoted in Freeman 2004)
Moreover, the complexity of the relational world makes it impossible to identify social connectivity using only our intuition. Social Network Analysis (SNA) provides a set of tools to empirically extend our theoretical intuition of the patterns that compose social structure.
Introduction
Why do Networks Matter?
Local vision
Introduction
Social network analysis is a set of relational methods for systematically understanding and identifying connections among actors. SNA:
- is motivated by a structural intuition based on ties linking social actors
- is grounded in systematic empirical data
- draws heavily on graphic imagery
- relies on the use of mathematical and/or computational models.
Social Network Analysis embodies a range of theories relating types of observable social spaces and their relation to individual and group behavior.
Introduction: Key Questions
Social network analysis lets us answer questions about social interdependence. These include:
- Networks as Variables approaches
  - Are kids with smoking peers more likely to smoke themselves?
  - Do unpopular kids get in more trouble than popular kids?
  - Are people with many weak ties more likely to find a job?
  - Do central actors control resources?
- Networks as Structures approaches
  - What generates hierarchy in social relations?
  - What network patterns spread diseases most quickly?
  - How do role sets evolve out of consistent relational activity?
We don't want to draw this line too sharply: emergent role positions can affect individual outcomes in a variable way, and variable approaches constrain relational activity.
1. Introduction and Background
- Why networks matter
  - Intuitive: information travels through contacts between actors, which can reflect a power distribution or influence attitudes and behaviors. Our understanding of social life improves if we account for this social space.
  - Less intuitive: patterns of inter-actor contact can have effects on the spread of goods or power dynamics that could not be seen by focusing only on individual behavior.
Social Network Data
The unit of interest in a network is the combined set of actors and their relations. We represent actors with points and relations with lines.
Actors are referred to variously as nodes, vertices, actors, or points.
Relations are referred to variously as edges, arcs, lines, or ties.
[Figure: example network with nodes a, b, c, d, e]
Social Network Data: Basic Data Elements
Social network data consist of two linked classes of data:
a) Nodes: information on the individuals (actors, nodes, points, vertices)
  - Network nodes are most often people, but can be any other unit capable of being linked to another (schools, countries, organizations, personalities, etc.)
  - The information about nodes is what we usually collect in standard social science research: demographics, attitudes, behaviors, etc.
  - Often includes dynamic information about when the node is active
b) Edges: information on the relations among individuals (lines, edges, arcs)
  - Records a connection between the nodes in the network
  - Can be valued or binary, directed (arcs) or undirected (edges)
  - One-mode (direct ties between actors) or two-mode (actors share membership in an organization)
  - Includes the times when the relation is active
Graph theory notation: G(V,E)
Social Network Data: Basic Data Elements
In general, a relation can be (1) binary or valued and (2) directed or undirected.
The social process of interest will often determine what form your data take. Almost all of the techniques and measures we describe can be generalized across data formats.
Social Network Data: Basic Data Elements
[Figure: example network of nodes a, b, c, d, e with directed, multiplex categorical edges]
Social Network Data: Basic Data Elements: Levels of Analysis
[Figure: Global-Net]
Social Network Data: Basic Data Elements: Levels of Analysis
We can examine networks across multiple levels:
1) Ego-network
  - Have data on a respondent (ego) and the people they are connected to (alters). Example: 1985 GSS module
  - May include estimates of connections among alters
2) Partial network
  - Ego networks plus some amount of tracing to reach contacts of contacts
  - Something less than a full account of connections among all pairs of actors in the relevant population
  - Example: CDC contact-tracing data for STDs
Social Network Data: Basic Data Elements: Levels of Analysis
We can examine networks across multiple levels:
3) Complete or global data
  - Data on all actors within a particular (relevant) boundary
  - Never exactly complete (due to missing data), but boundaries are set
  - Examples: coauthorship data among all writers in the social sciences; friendships among all students in a classroom
Social Network Data: Graph Layout
A good network drawing allows viewers to come away from the image with an almost immediate intuition about the underlying structure of the network being displayed. However, because there are multiple ways to display the same information, and standards for doing so are few, the information content of a network display can be quite variable.
Consider the 4 graphs drawn at right and ask yourself what intuition you gain from each. Now trace the actual pattern of ties: you will see that these 4 graphs are exactly the same.
Social Network Data: Graph Layout
Network visualization helps build intuition, but you have to keep the drawing algorithm in mind. Here we show the same graph drawn with two different techniques.
- Spring embedder layouts (fair to poor for this graph): most effective with graphs that have a strong community structure (clustering, etc.). They provide a very clear correspondence between social distance and plotted distance.
- Tree-based layouts (good for this graph): most effective for very sparse, regular graphs. Very useful when relations are strongly directed, such as organization charts or internet connections.
[Figure: two images of the same network]
Social Network Data: Graph Layout
Another example
[Figure: two layouts of the same network, a spring embedder layout (poor) and a tree-based layout (good)]
Social Network Data: Basic Data Structures
In general, graphs are cumbersome to work with analytically, though there is a great deal of good work to be done on using visualization to build network intuition. I recommend using layouts that optimize on the feature you are most interested in; the two I use most are hierarchical and force-directed layouts. We'll see some examples of best practice after getting a little more familiar with data structures.
Social Network Data: Basic Data Structures
From pictures to matrices
[Figure: example graphs and their adjacency matrices, undirected binary and directed binary]
Social Network Data: Basic Data Structures
From matrices to lists
Arc list (sender, receiver):
a b
b a
b c
c b
c d
c e
d c
d e
e c
e d
[Figure: adjacency list]
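As a rough illustration of moving between these structures, the arc list above can be regrouped into an adjacency list and a binary adjacency matrix. This is a minimal Python sketch; the language and the helper names `to_adjacency_list` and `to_adjacency_matrix` are assumptions for illustration, not functions from any network package.

```python
from collections import defaultdict

# Arc list from the slide: each (sender, receiver) pair is one arc.
arcs = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d"),
        ("c", "e"), ("d", "c"), ("d", "e"), ("e", "c"), ("e", "d")]


def to_adjacency_list(arc_list):
    """Group each sender's receivers into one row per node (hypothetical helper)."""
    adj = defaultdict(list)
    for sender, receiver in arc_list:
        adj[sender].append(receiver)
    return dict(adj)


def to_adjacency_matrix(arc_list):
    """Return sorted node labels and a binary matrix with cell [i][j] = 1 if i -> j."""
    nodes = sorted({node for arc in arc_list for node in arc})
    index = {node: k for k, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for sender, receiver in arc_list:
        matrix[index[sender]][index[receiver]] = 1
    return nodes, matrix


adj_list = to_adjacency_list(arcs)
nodes, matrix = to_adjacency_matrix(arcs)
for label, row in zip(nodes, matrix):
    print(label, row, adj_list.get(label, []))
```

Every arc in this particular list appears in both directions, so the resulting matrix is symmetric: the list describes an undirected graph stored as directed arcs.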
Social Network Data: Basic Data Elements: Modes
Social network data are substantively divided by the number of modes in the data. 1-mode data represent edges based on direct contact between actors in the network. All the nodes are of the same type (people, organizations, ideas, etc.). Examples: communication, friendship, giving orders, sending email. There are no constraints on connections between classes of nodes.
1-mode data are usually singly reported (each person reports on their friends), but you can use multiple-informant data, which is more common in child development research (Cairns and Cairns).
Social Network Data: Basic Data Elements: Modes
Social network data are substantively divided by the number of modes in the data. 2-mode data represent nodes from two separate classes, where all relations cross classes. Examples:
- People as members of groups
- People as authors on papers
- Words used often by people
- Events in the life history of people
The two modes of the data represent a duality: you can project the data as people connected to people through joint membership in a group, or groups connected to each other through common membership.
N-mode data generalize the constraint on ties between classes to N groups.
Social Network Data: Basic Data Elements: Modes
Breiger 1974: Duality of Persons and Groups
Argument:
- Metaphor: people intersect through their associations, which defines (in part) their individuality.
- The duality argument is that relations among groups imply relations among individuals.
Social Network Data: Basic Data Elements: Modes
Bipartite networks imply a constraint on the mixing, such that ties only cross classes. Here we see a tie connecting each woman with the party she attended (Davis data).
Social Network Data: Basic Data Elements: Modes
By projecting the data, one can look at the group memberships shared between people; this is the person-to-person projection of the 2-mode data.
Social Network Data: Basic Data Elements: Modes
Likewise, one can look at the members shared between groups; this is the group-to-group projection of the 2-mode data.
Social Network Data: Basic Data Elements: Modes
Working with two-mode data
A person-to-group adjacency matrix is rectangular, with persons down the rows and groups across the columns. Each column is a group, each row a person, and the cell = 1 if the person in that row belongs to that group. You can tell how many groups two people both belong to by comparing their rows: identify every place where both rows equal 1, sum those places, and you have the overlap.
Matrix A (persons by groups):
     1  2  3  4  5
  A  0  0  0  0  1
  B  1  0  0  0  0
  C  1  1  0  0  0
  D  0  1  1  1  1
  E  0  0  1  0  0
  F  0  0  1  1  0
Social Network Data: Basic Data Elements: Modes
Working with two-mode data
Compare persons A and F: person A is in 1 group, person F is in 2 groups, and they are in no groups together.
Or persons D and F: person D is in 4 groups, person F is in 2 groups, and they are in 2 groups together.
Social Network Data: Basic Data Elements: Modes
Working with two-mode data
Similarly for groups: group 1 has 2 members, group 2 has 2 members, and they overlap by 1 member (C).
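The row and column comparisons above reduce to one operation: multiply two 0/1 vectors element by element and sum. A minimal Python sketch, assuming the membership matrix is stored as one list per person; the names `overlap` and `group_column` are hypothetical.

```python
# Person-by-group membership matrix from the slides (rows = persons, columns = groups 1-5).
membership = {
    "A": [0, 0, 0, 0, 1],
    "B": [1, 0, 0, 0, 0],
    "C": [1, 1, 0, 0, 0],
    "D": [0, 1, 1, 1, 1],
    "E": [0, 0, 1, 0, 0],
    "F": [0, 0, 1, 1, 0],
}


def overlap(vec_i, vec_j):
    """Count the positions where both 0/1 vectors equal 1 (sum of element-wise products)."""
    return sum(x * y for x, y in zip(vec_i, vec_j))


def group_column(matrix, k):
    """Membership column for group k (0-based index) across all persons."""
    return [row[k] for row in matrix.values()]


print(overlap(membership["A"], membership["F"]))  # persons A and F share 0 groups
print(overlap(membership["D"], membership["F"]))  # persons D and F share 2 groups
print(overlap(group_column(membership, 0), group_column(membership, 1)))  # groups 1 and 2 share 1 member
```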
Social Network Data: Basic Data Elements: Modes
Working with two-mode data
In general, you can get the overlap for any pair of groups or persons by summing the products of the corresponding elements of their rows (persons) or columns (groups) in the persons-to-groups adjacency matrix A. That is:
Groups-to-groups: $$G_{ij} = \sum_{k} A_{ki} A_{kj}$$
Persons-to-persons: $$P_{ij} = \sum_{k} A_{ik} A_{jk}$$
Social Network Data: Basic Data Elements: Modes
Working with two-mode data
One can get either projection easily with a little matrix multiplication. First define $A^T$ as the transpose of A (simply swap the rows and columns). If A is of size P x G, then $A^T$ will be of size G x P.
Social Network Data: Basic Data Elements: Modes
A (6 x 5, persons by groups):
     1  2  3  4  5
  A  0  0  0  0  1
  B  1  0  0  0  0
  C  1  1  0  0  0
  D  0  1  1  1  1
  E  0  0  1  0  0
  F  0  0  1  1  0
A^T (5 x 6, groups by persons):
     A  B  C  D  E  F
  1  0  1  1  0  0  0
  2  0  0  1  1  0  0
  3  0  0  0  1  1  1
  4  0  0  0  1  0  1
  5  1  0  0  1  0  0
$$P = AA^T \qquad G = A^TA$$
Social Network Data: Basic Data Elements: Modes
Theoretically, these two equations define what Breiger means by duality: "With respect to the membership network, persons who are actors in one picture (the P matrix) are with equal legitimacy viewed as connections in the dual picture (the G matrix), and conversely for groups." (p. 87)
The resulting network:
1) is always symmetric
2) has a diagonal that tells you how many groups a person belongs to (P) or how many members a group has (G)
In practice, most network software (UCINET, PAJEK) will do all of these operations. It is also simple to do the matrix multiplication in programs like SAS, SPSS, or R.
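To make the two projections concrete, here is a minimal sketch that computes P = AA^T and G = A^TA for the matrix on the slides. It is written in plain Python (an assumption made so it runs without extra libraries); `transpose` and `matmul` are hypothetical helpers, not a package API.

```python
# Person-by-group matrix A from the slides (rows = persons A-F, columns = groups 1-5).
persons = ["A", "B", "C", "D", "E", "F"]
A = [
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
]


def transpose(m):
    """Swap rows and columns: a P x G matrix becomes G x P."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product: cell (i, j) = sum over k of x[i][k] * y[k][j]."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


P = matmul(A, transpose(A))  # persons-to-persons, 6 x 6
G = matmul(transpose(A), A)  # groups-to-groups, 5 x 5

for name, row in zip(persons, P):
    print(name, row)
```

The diagonal of P comes out as 1, 1, 2, 4, 1, 2 (groups per person) and the diagonal of G as 2, 2, 3, 2, 2 (members per group), and both matrices are symmetric, matching the two properties listed above. In R the same products are written `A %*% t(A)` and `t(A) %*% A`.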
Social Network Data: Network Data Sources
Existing data sources
- Existing sources of social network data
  - There are lots of network data archived. Check INSNA for a listing. The PAJEK data page includes a number of exemplars for large-scale networks.
- 2-mode data
  - One can construct networks from many different data sources if you want to work with 2-mode data. Any list can be so transformed:
    - Director interlocks
    - Protest event participation
    - Authors on papers
    - Words in documents
- 1-mode data
  - Local network data
    - Fairly common, because they are easy to collect from sample surveys (GSS, NHSL, Urban Inequality Surveys, etc.)
    - Pay attention to the question asked
    - Key features are (a) the number of people named and (b) whether alters are able to nominate each other.
Social Network Data: Network Data Sources
Existing data sources
- Existing sources of social network data
- 1-mode data
  - Partial network data
    - Much less common, because cost goes up significantly once you start tracing to contacts.
    - Snowball data: start with focal nodes and trace to contacts
      - CDC-style data on sexual contact tracing
    - Limited snowball samples
      - Colorado Springs drug users data
      - Genealogy data
      - Small-world network samples
    - Limited boundary data: select data within a limited bound
      - Cross-national trade data
      - Friendships within a classroom
      - Family support ties
Social Network Data: Network Data Sources
Existing data sources
- Existing sources of social network data
- 1-mode data
  - Complete network data
    - Significantly less common and never perfect.
    - Start by defining a theoretically relevant boundary, then identify all relations among nodes within that boundary:
      - Co-sponsorship patterns among legislators
      - Friendships within strongly bounded settings (sororities, schools)
    - Examples
      - Add Health on adolescent friendships
      - Hallinan data on within-school friendships
      - McFarland's data on verbal interaction
      - Electronic data on citations or coauthorship (see the Pajek data page)
      - See the INSNA home page for many small-scale networks
Social Network Data: Network Data Sources
Collecting network data
Boundary specification problem: network methods describe positions in relevant social fields, where flows of particular goods are of interest. As such, boundaries are a fundamentally theoretical question about what you think matters in the setting of interest. See Marsden (19xx) for a good review of the boundary specification problem.
In general, there are usually relevant social foci that bound the relevant social field. We expect that social relations will be very clumpy. Consider the example of friendship ties within and between a high school and a junior high school.
Social Network Data: Network Data Sources
Collecting network data
- Network data collection can be time consuming. It is better (I think) to have breadth over depth. Having detailed information on less than 50% of the sample will make it very difficult to draw conclusions about the general network structure.
- Question format
  a) If you ask people to recall names (an open list format), fatigue will result in under-reporting.
  b) If you ask people to check off names from a full list, you can often get over-reporting.
  c) It is common to limit people to a small number of nominations (5). This will bias network measures, but is sometimes the best choice to avoid fatigue.
  d) Concrete relational indicators (who did you talk to?) are better than attitudes that are harder to define (who do you like?).
Social Network Data: Network Data Sources
Collecting network data
Boundary Specification Problem
While students were given the option to name friends in the other school, they rarely did. As such, the school likely serves as a strong substantive boundary.
Social Network Data: Network Data Sources
Collecting network data
- Local network data
  - When using a survey, it is common to use an ego-network module.
    - First part: a name generator question to elicit a list of names
    - Second part: working through the list of names to get information about each person named
    - Third part: asking about relations among the people named
GSS name generator: "From time to time, most people discuss important matters with other people. Looking back over the last six months -- who are the people with whom you discussed matters important to you? Just tell me their first names or initials."
- Why this question?
  - Only time for one question
  - Normative pressure and influence likely travel through strong ties
  - Similar to best friend or other strong tie generators
- Note: there are significant substantive problems with this name generator
Social Network Data: Network Data Sources
Collecting network data
[Figure: Electronic Small World name generator]
Social Network Data: Network Data Sources
Collecting network data
Local network data: the second part usually asks a series of questions about each person.
GSS example: "Is (NAME) Asian, Black, Hispanic, White or something else?"
[Figure: ESWP example]
This adds N x (number of attributes) questions to the survey, where N is the number of names elicited.
Social Network Data: Network Data Sources
Collecting network data
Local network data: the third part usually asks about relations among the alters. Do this by looping over all possible combinations. If you are asking about a symmetric relation, then you can limit your questions to the n(n-1)/2 cells of one triangle of the adjacency matrix.
GSS: "Please think about the relations between the people you just mentioned. Some of them may be total strangers in the sense that they wouldn't recognize each other if they bumped into each other on the street. Others may be especially close, as close or closer to each other as they are to you. First, think about NAME 1 and NAME 2. A. Are NAME 1 and NAME 2 total strangers? B. Are they especially close? PROBE: As close or closer to each other as they are to you?"
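The looping step can be sketched as generating the list of alter pairs an interviewer (or a CAPI script) must ask about. This is a minimal Python illustration; `alter_pair_questions` is a hypothetical helper, and the directed branch (all n(n-1) ordered pairs) is an added assumption for relations that are not symmetric.

```python
from itertools import combinations


def alter_pair_questions(alters, symmetric=True):
    """List the alter-alter pairs to ask about (hypothetical helper)."""
    if symmetric:
        # One triangle of the adjacency matrix: n(n-1)/2 unordered pairs.
        return list(combinations(alters, 2))
    # Directed relation (assumption): every ordered pair, n(n-1) questions.
    return [(i, j) for i in alters for j in alters if i != j]


alters = ["NAME 1", "NAME 2", "NAME 3", "NAME 4", "NAME 5"]
for a, b in alter_pair_questions(alters):
    print(f"Are {a} and {b} total strangers? Are they especially close?")
```

With five names this produces 10 pair questions for a symmetric relation, versus 20 if direction mattered, which is why survey modules usually stick to one triangle.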
Social Network Data: Network Data Sources
Collecting network data
- Snowball samples
  - Snowball samples work much the same as ego-network modules, and if time allows I recommend asking at least some of the basic ego-network questions, even if you plan to sample (some of) the people your respondent names.
    - Start with a name generator, then any demographic or relational questions.
    - Have a sampling strategy:
      - Random walk designs (Klovdahl)
      - Strong tie designs
      - All names designs
    - Get contact information from the people named.
  - Snowball samples are very effective at providing network context around focal nodes. New work on Respondent Driven Sampling (RDS) makes it possible to get good representation even with initially biased seed nodes. http://www.respondentdrivensampling.org/reports/RDSrefs.htm
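An "all names" snowball design is essentially a breadth-first trace outward from the seeds, stopped after a fixed number of waves. The sketch below assumes nominations are already known as a dict from each respondent to the people they name; `snowball` is a hypothetical helper, and it does not implement the random-walk or strong-tie designs, which choose only some of the names at each wave.

```python
from collections import deque


def snowball(nominations, seeds, waves):
    """All-names snowball: follow every nomination outward for a fixed number of waves.
    Returns {node: wave at which it was first reached} (hypothetical helper)."""
    reached = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if reached[node] == waves:
            continue  # do not trace past the last wave
        for alter in nominations.get(node, []):
            if alter not in reached:
                reached[alter] = reached[node] + 1
                queue.append(alter)
    return reached


# Hypothetical nomination data: who names whom.
nominations = {"X": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": ["e"], "e": ["f"]}
print(snowball(nominations, ["X"], 2))
```

Here the 2-wave trace from seed X reaches a and b in wave 1 and c and d in wave 2; e is never interviewed because c sits at the last wave.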
Social Network Data: Network Data Sources
Collecting network data
Snowball Samples
Social Network Data: Network Data Sources
Collecting network data
- Complete network data
  - Data collection is concerned with all relations within a specified boundary.
  - Requires sampling every actor in the population of interest (all kids in the class, all nations in the alliance system, etc.)
  - The network survey itself can be much shorter, because you are getting information from each person (so ego does not report on alters).
  - Two general formats:
    - Recall surveys ("Name all of your best friends")
    - Check-list formats: give people a list of names and have them check off those with whom they have relations.
Social Network Data: Network Data Sources
Collecting network data
- Complete network surveys require a process that lets you link answers to respondents.
  - You cannot have anonymous surveys.
  - Recall: need ID numbers and a roster to link, or hand-code names to find matches
  - Checklists: need a roster for people to check through
Social Network Data: Network Data Sources
Collecting network data
- Complete network surveys require a process that lets you link answers to respondents.
- Typically you face a number of data tradeoffs:
  - Limited number of responses
    - Eases survey construction and coding, but lowers density and degree, which affects nearly every other system-level measure.
    - Some evidence that people try to fill all of the slots.
  - Name check-off roster (names down a row or on screen, relations as check-boxes)
    - Easy in small settings or CADI, but encourages over-response.
    - The Amy Willis Problem.
  - Open recall list
    - Very difficult cognitively; requires an extra name-matching step in analysis.
- Think carefully about what you want to learn from your survey items.
Social Network Data: Network Data Sources: Missing Data
Whatever method is used, data will always be incomplete. What are the implications for analysis?
Example 1. People can name friends out of sample, but there is no way to match them (Add Health).
[Figure: two ego networks with out-of-sample ("Out") nominations. If the true network looks like the first, you cannot distinguish it from the second.]
Social Network Data: Network Data Sources: Missing Data
Example 2. Node population: 2-step neighborhood of Actor X. Relational population: any connection among all nodes.
[Figure: block adjacency matrix of the focal node (F) and its 1-step, 2-step, and 3-step contacts, with each block marked Full, Full (0), or Unknown (UK)]
Social Network Data: Network Data Sources: Missing Data
Example 3. Node population: 2-step neighborhood of Actor X. Relational population: trace, plus all connections among 1-step contacts.
[Figure: block adjacency matrix of the focal node (F) and its 1-step, 2-step, and 3-step contacts, with each block marked Full, Full (0), or Unknown (UK)]
Social Network Data: Network Data Sources: Missing Data
Example 4. Node population: 2-step neighborhood of Actor X. Relational population: only tracing contacts.
[Figure: block adjacency matrix of the focal node (F) and its 1-step, 2-step, and 3-step contacts, with each block marked Full, Full (0), or Unknown (UK)]
Social Network Data: Network Data Sources: Missing Data
Example 5. Node population: 2-step neighborhood from 3 focal actors. Relational population: all relations among actors.
[Figure: block adjacency matrix over focal, 1-step, 2-step, and 3-step nodes, with each block marked Full, Full (0), or Unknown (UK)]
Social Network Data: Network Data Sources: Missing Data
Example 6. Node population: 1-step neighborhood from 3 focal actors. Relational population: only relations from focal nodes.
[Figure: block adjacency matrix over focal, 1-step, 2-step, and 3-step nodes, with each block marked Full, Full (0), or Unknown (UK)]
Social Network Data: Network Data Sources: Missing Data
Summary: data collection design and missing data affect the information at hand for drawing conclusions about the system. Everything we do from now on is built on some manipulation of the observed adjacency matrix, so we want to understand which conclusions are valid and which are invalid due to systematic distortions in the data.
Statistical modeling tools hold promise. We can build models of networks that account for missing data: we are able to fix the structural zeros in our models by treating them as given. This then lets us infer to the world of all graphs with that same missing data structure. These models are very new, and not widely available yet.
Local Network Analysis: Introduction
Local network analysis uses data from a simple ego-network survey. These might include information on relations among ego's contacts, but often do not. Questions include:
- Population mixing: the extent to which one type of person is tied to another type of person (race by race, etc.)
- Local network composition: peer behavior, cultural milieu, opportunities or resources in the network, social support
- Local network structure: network size, density, holes, constraint, concurrency
- Dyadic behavior: frequency of contact, interaction content, specific exchange behaviors, dyadic similarity
Local Network Analysis: Introduction
- Advantages
  - Cost: data are easy to collect and can be sampled
  - Methods are relatively simple extensions of common variable-based methods social scientists are already familiar with
  - Provides information on the local network context, which is often the primary substantive interest
  - Can be used to describe general features of the global network context: population mixing, concurrency, exchange frequency, etc.
- Disadvantages
  - Treats each local network as independent, which is false. The poor performance of number of partners for predicting STD spread is a clear example.
  - Impossible to account for how position in a larger context affects local network characteristics (popular with whom?)
  - If structure matters, ego-networks are strongly constrained, limiting the information you can get on overall structure.
Local Network Analysis: Introduction
[Figure: local view]
Local Network Analysis: Introduction
[Figure: global view]
Local Network Analysis: Network Composition
Perhaps the simplest network question is: what types of alters does ego interact with? Network composition refers to the distribution of types of people in your network.
- Networks tend to be more homogeneous than the population. Using the GSS, Marsden reports heterogeneity in age, education, race and gender. He finds that:
  - The age distribution is fairly wide, almost evenly distributed, though lower than in the population at large
  - Networks are homogeneous by education (30% differ by less than a year, on average)
  - Networks are very homogeneous with respect to race (96% are single race)
  - Networks are heterogeneous with respect to gender
Local Network Analysis: Network Composition
Claude Fischer's book To Dwell Among Friends is a classic study of urbanism that makes good use of local network data.
Age heterogeneity varies by ego's age and across urban settings.
Local Network Analysis: Network Composition
Marital composition similarly varies across respondents and settings.
Local Network Analysis: Network Composition
Calculating network composition using GSS-style data.
Generally you have a separate variable for each alter characteristic, and you can construct items by summing over the relevant variables. You would, for example, have variables on the age of each alter such as:
age_alt1  age_alt2  age_alt3  age_alt4  age_alt5
15        35        20        12        .
You get the mean age, then, with a statement such as:
meanage = mean(age_alt1, age_alt2, age_alt3, age_alt4, age_alt5)
Be sure you know how the program you use (SAS, SPSS) deals with missing data.
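The missing-data warning matters because a mean over reported alters and a sum divided by the number of slots give different answers. A minimal Python sketch of the first behavior, assuming missing alters are stored as `None`; `alter_mean` is a hypothetical helper, and this is not a claim about how any particular statistics package behaves.

```python
from statistics import mean


def alter_mean(values):
    """Mean over the alters actually reported; missing alters (None) are skipped."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else None


# One respondent's row from the example: the missing fifth alter is stored as None.
row = {"age_alt1": 15, "age_alt2": 35, "age_alt3": 20, "age_alt4": 12, "age_alt5": None}
meanage = alter_mean([row[f"age_alt{k}"] for k in range(1, 6)])
print(meanage)  # 20.5
```

Treating the empty slot as zero instead (82 / 5 = 16.4) would understate the mean alter age, which is exactly the kind of silent error to check for in your own software.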
Local Network Analysis: Network Composition
Calculating local network information from global network data
- We often want to construct local-level measures from global-level data. This involves a number of steps, and opens more opportunities than GSS-style data:
  1) Define the local neighborhood
     - Distance (1-step, 2-steps, what?)
     - Direction of tie: sent, received, or both?
  2) Pull the relevant alters
  3) Match the alters to the variables of interest
- Once you decide on a type of tie, you need to get the information of interest in a form similar to that in the example above.
- A number of programs do this for you automatically (SPAN, R, etc.)
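The three steps can be sketched for 1-step neighborhoods in a directed nomination network. This is a minimal Python illustration under stated assumptions: ties are stored as (sender, receiver) pairs, attributes as a dict, and the helper names `neighborhood` and `alter_values` are hypothetical (not SPAN's or any R package's API).

```python
def neighborhood(arcs, ego, direction="sent"):
    """Step 1: ego's 1-step neighborhood, defined by tie direction."""
    sent = {receiver for sender, receiver in arcs if sender == ego}
    received = {sender for sender, receiver in arcs if receiver == ego}
    if direction == "sent":
        return sent
    if direction == "received":
        return received
    if direction == "both":
        return sent | received
    raise ValueError("direction must be 'sent', 'received' or 'both'")


def alter_values(arcs, attribute, ego, direction="sent"):
    """Steps 2-3: pull the alters and match each to an attribute (None if not observed)."""
    return {alter: attribute.get(alter) for alter in sorted(neighborhood(arcs, ego, direction))}


# Hypothetical nominations (sender, receiver) and a smoking indicator per student.
arcs = [("s1", "s2"), ("s1", "s3"), ("s4", "s1"), ("s2", "s1")]
smokes = {"s2": 1, "s3": 0, "s4": 1}

print(alter_values(arcs, smokes, "s1", "sent"))      # {'s2': 1, 's3': 0}
print(alter_values(arcs, smokes, "s1", "received"))  # {'s2': 1, 's4': 1}
print(alter_values(arcs, smokes, "s1", "both"))      # {'s2': 1, 's3': 0, 's4': 1}
```

The choice of direction changes the answer: here the share of smoking alters is 0.5 for sent ties but 1.0 for received ties, so the neighborhood definition should follow from the substantive question.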
Local Network Analysis: Network Composition
An example network: all senior males from a small (n = 350) public HS.
SPAN will do this for you.
Local Network Analysis: Network Composition
- Common composition measures
  - Level measures
    - Mean of a given attribute (average income of alters)
    - Proportion with a particular attribute (proportion who smoke)
    - Counts (number of peers who have had sex)
  - Dispersion measures
    - Heterogeneity index (racial heterogeneity)
    - Index of dissimilarity
    - Standard deviation
    - Absolute value of the differences
    - Variable range of values
  - Composition measures for multiple variables simultaneously
    - Average correlation across all alters
    - Euclidean / Mahalanobis distance measures
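A few of these measures can be computed directly from one ego's list of alter attributes. The sketch below is illustrative only: the alter data are made up, and the heterogeneity index shown is one common form (1 minus the sum of squared category shares, often called Blau's or the Gibbs-Martin index), assumed here rather than taken from the slides.

```python
from collections import Counter
from statistics import mean, pstdev


def proportion(values, target):
    """Level measure: share of alters with a given attribute value."""
    return sum(v == target for v in values) / len(values)


def heterogeneity(categories):
    """1 - sum of squared category shares: 0 if all alters share one category."""
    n = len(categories)
    return 1 - sum((count / n) ** 2 for count in Counter(categories).values())


# Hypothetical alters of one ego.
alter_race = ["White", "White", "Black", "Hispanic"]
alter_smokes = [1, 0, 0, 1]
alter_income = [30, 45, 50, 35]

print(proportion(alter_smokes, 1))  # 0.5 (proportion who smoke)
print(sum(alter_smokes))            # 2 (count of smoking alters)
print(heterogeneity(alter_race))    # 0.625
print(mean(alter_income), pstdev(alter_income))  # level and dispersion of income
```

`pstdev` treats the alters as the whole population of interest; use `statistics.stdev` instead if you prefer the sample (n - 1) version.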
Local Network Analysis: Network Mixing
A common interest in network research is identifying how likely persons of one category are to interact with people of another category. Examples:
- Race mixing: how likely are people of one race to interact with people of another?
- Sexual activity mixing: are people with many partners likely to associate with each other?
- Neighborhood / location mixing: are people likely to name friends from the same neighborhood?
These questions can be answered by cross-classifying the category of the nominator with the category of the nominated in a mixing matrix.
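Building a mixing matrix is a cross-tabulation over ties. A minimal Python sketch with hypothetical data; the function name `mixing_matrix` and the row/column convention (rows = nominator's category, columns = nominee's category) are assumptions for illustration.

```python
from collections import Counter


def mixing_matrix(arcs, category, labels):
    """Cross-classify ties: rows = nominator's category, columns = nominee's category."""
    counts = Counter((category[sender], category[receiver]) for sender, receiver in arcs)
    return [[counts[(row, col)] for col in labels] for row in labels]


# Hypothetical data: four students in two categories and six nominations.
category = {1: "red", 2: "red", 3: "blue", 4: "blue"}
arcs = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (4, 1)]
labels = ["red", "blue"]

for label, row in zip(labels, mixing_matrix(arcs, category, labels)):
    print(label, row)
```

Diagonal cells count within-category ties and off-diagonal cells count cross-category ties, so the matrix is the input for the segregation and association measures that follow.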
Local Network Analysis: Network Mixing
Race mixing in one of the Add Health schools
Local Network Analysis: Network Mixing
             White   Black   Hispanic   Asian   Mix/Other
White         1099     128         53       0         231
Black           97   10218       1032       0         539
Hispanic        54     961        104       1          91
Asian            0       0          0       0           0
Mix/Other      191     560         66       0         106
Local Network Analysis: Network Mixing
- Working with mixing matrices
  - Group segregation index (Freeman 1972)
  - Associations between rows and columns (valued relations)
    - Assortative mixing
    - Correlations or Q
    - Log-linear models
- Assessing chance levels depends on the data available. If you have full network data you can look at density between groups; without it, you can only focus on the sheer volume of ties (without information on the size of the target groups).
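As one concrete example of an assortative-mixing summary, the sketch below computes Newman's (2003) assortativity coefficient from a mixing matrix of tie counts. This is a swapped-in illustration of the "assortative mixing" idea, not Freeman's segregation index; the function name is hypothetical.

```python
def assortativity(mixing):
    """Newman's assortativity coefficient r for a square mixing matrix of tie counts:
    r = (sum_i e_ii - sum_i a_i * b_i) / (1 - sum_i a_i * b_i),
    where e is the matrix scaled to sum to 1, a its row sums and b its column sums."""
    total = sum(map(sum, mixing))
    e = [[cell / total for cell in row] for row in mixing]
    a = [sum(row) for row in e]
    b = [sum(col) for col in zip(*e)]
    expected = sum(ai * bi for ai, bi in zip(a, b))  # within-group share under random mixing
    observed = sum(e[i][i] for i in range(len(e)))   # observed within-group share
    return (observed - expected) / (1 - expected)


print(assortativity([[5, 0], [0, 5]]))  # 1.0: every tie stays within its group
print(assortativity([[1, 1], [1, 1]]))  # 0.0: mixing at chance levels
print(assortativity([[0, 5], [5, 0]]))  # -1.0: every tie crosses groups
```

Because the chance baseline comes only from the row and column totals, this measure works with the sheer volume of ties; it is undefined (division by zero) when all ties fall in a single category.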
Local Network Analysis: Network Structure
- While network structure data are limited, there are a number of features that can be of interest, assuming you have data on the relations among ego's contacts.
- Basic arguments
  - Structural amplification: some feature of the arrangement of ties amplifies any peer effect of network composition (see Haynie's paper)
  - Network range effects: being connected to a diverse set of alters, who are not connected to each other, provides profitable returns (Granovetter's Strength of Weak Ties, Burt's Structural Holes)
    - Familiar to students of social theory as the Tertius Gaudens argument from Simmel
- In both cases, we use the pattern of ties surrounding ego to characterize the local structure. We start with volume measures, then move on to more complex pattern measures.
80Local Network Analysis Network Structure volume
Network Size
Mean size, 1985: 2.9; mean size, 2004: 2.1
From time to time, most people discuss important matters with other people. Looking back over the last six months, who are the people with whom you discussed matters important to you? Just tell me their first names or initials. IF LESS THAN 5 NAMES MENTIONED, PROBE: Anyone else?
81Local Network Analysis Network Structure volume
Network Size by:
- Age: drops with age at an increasing rate. The elderly have few close ties.
- Education: increases with education. A college degree means a network 1.8 times larger.
- Sex (female): no gender differences in network size.
- Race: African Americans' networks are smaller (2.25) than White networks (3.1).
82Local Network Analysis Network Structure volume
What does Fischer have to say about the size of
local nets (by context)?
83Local Network Analysis Network Structure volume
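Density is the average value of the relation among all pairs of nodes: $D = T / \left(N(N-1)/2\right)$, where T is the number of ties and N is the number of nodes. Density is usually calculated over the alters in the network.
[Figure: example ego network, ego R with alters 1 to 5]
$D = 5 / \left((5 \times 4)/2\right) = 5/10 = 0.5$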
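Ego-network density can be computed directly from a neighbor list. The following is a minimal sketch, assuming an undirected, binary network stored as a dictionary of neighbor sets. The exact alter-alter ties in the slide's figure cannot be recovered here. The five ties below are hypothetical and were chosen only to reproduce T = 5 with N = 5. The name `ego_density` is also hypothetical.

```python
from itertools import combinations


def ego_density(adj, ego):
    """Density among ego's alters: ties among alters / possible pairs of alters.

    adj: dict mapping each node to the set of its neighbors (undirected, binary).
    Ties to ego itself are not counted.
    """
    alters = sorted(adj[ego])
    n = len(alters)
    if n < 2:
        return float("nan")  # density is undefined with fewer than two alters
    t = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
    return t / (n * (n - 1) / 2)


# Hypothetical ego network: ego "R" with alters 1-5 and five ties among them.
ties = [("R", a) for a in "12345"]
ties += [("1", "2"), ("2", "3"), ("3", "4"), ("4", "5"), ("5", "1")]
adj = {}
for u, v in ties:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(ego_density(adj, "R"))  # 0.5
```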
84Local Network Analysis Network Structure volume
What does Fischer have to say about the density
of local nets (by context)?
85Local Network Analysis Network Structure volume
GSS Density
86Local Network Analysis Network Structure volume
- In general, dense networks should be more cohesive, and we would expect goods to flow through the network more efficiently.
- Social support and peer influence, for example, should be stronger in dense networks.
- Density is a volume measure, however, and can mask significant structural differences.
These two networks have the same density but very different structures. Most network analysis programs will calculate ego-network density directly.
87Local Network Analysis Network Structure Weak
Ties Structural Holes
The Strength of Weak Ties: In a classic article, Granovetter (1973) argues that for many purposes (such as getting a job), the most useful network contacts are through weak ties. This is because weak ties connect you to a more diverse set of alters, increasing the range of your network. Your strong ties tend to be tied to each other, which makes them redundant for the purposes of bringing information. Essentially, this argument rests on a spurious relation: the key value of weak ties lies not in the weak affective bond, but in the structural location of the ties. We can measure this directly, and Ron Burt provides a series of measures for doing so.
88Local Network Analysis Network Structure Weak
Ties Structural Holes
[Figure: number of non-redundant contacts plotted against number of contacts, ranging from maximum to minimum efficiency, with regions of increasing and decreasing efficiency]
89Local Network Analysis Network Structure Weak
Ties Structural Holes
Effective Size
Conceptually, the effective size is the number of people ego is connected to, minus the redundancy in the network. That is, it reduces to the non-redundant elements of the network.
Effective size = Size - Redundancy, or
$ES_i = \sum_j \left[1 - \sum_q p_{iq} m_{jq}\right], \quad q \neq i, j$
Here, j indexes all of the people that ego i has contact with, and q is every third person other than i or j. The quantity $\sum_q p_{iq} m_{jq}$ inside the brackets is the level of redundancy between ego and a particular alter, j.
90Local Network Analysis Network Structure Weak
Ties Structural Holes
Effective Size
$p_{iq}$ is the proportion of actor i's relations that are spent with q.
[Figure: example network of five nodes, 1 to 5]
Adjacency
     1  2  3  4  5
1    0  1  1  1  1
2    1  0  0  0  1
3    1  0  0  0  0
4    1  0  0  0  1
5    1  1  0  1  0
91Local Network Analysis Network Structure Weak
Ties Structural Holes
Effective Size
$m_{jq}$ is the marginal strength of contact j's relation with contact q. This is j's interaction with q divided by j's strongest interaction with anyone. For a binary network, the strongest link is always 1, and thus $m_{jq}$ reduces to 0 or 1 (whether j is connected to q or not). The sum of the products $p_{iq} m_{jq}$ measures the portion of i's relation with j that is redundant to i's relations with other primary contacts.
92Local Network Analysis Network Structure Weak
Ties Structural Holes
Effective Size
Working with 1 as ego, we get the following redundancy levels:
P      1     2     3     4     5
1     .00   .25   .25   .25   .25
2     .50   .00   .00   .00   .50
3    1.0    .00   .00   .00   .00
4     .50   .00   .00   .00   .50
5     .33   .33   .00   .33   .00

$p_{1q} m_{jq}$ (rows j, columns q):
       1     2     3     4     5
1     ---   ---   ---   ---   ---
2     ---   .00   .00   .00   .25
3     ---   .00   .00   .00   .00
4     ---   .00   .00   .00   .25
5     ---   .25   .00   .25   .00

Redundancy = 1; Effective size = 4 - 1 = 3
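The full effective-size formula can be written out directly. The following is a minimal sketch, assuming an undirected network given as a list-of-lists adjacency matrix. It uses the slides' definitions of $p_{iq}$ and $m_{jq}$. The function name `effective_size` is hypothetical and does not refer to any particular package's API.

```python
def effective_size(A, i):
    """Burt's effective size for ego i: sum over contacts j of [1 - sum_q p_iq * m_jq].

    A is an adjacency matrix (list of lists) for an undirected network.
    p_iq = a_iq / sum_k a_ik   (share of i's relations spent with q)
    m_jq = a_jq / max_k a_jk   (j's tie to q relative to j's strongest tie)
    """
    n = len(A)

    def p(x, y):
        total = sum(A[x])
        return A[x][y] / total if total else 0.0

    def m(x, y):
        strongest = max(A[x])
        return A[x][y] / strongest if strongest else 0.0

    contacts = [j for j in range(n) if j != i and A[i][j] > 0]
    size = 0.0
    for j in contacts:
        # Redundancy of ego's tie to j; third parties q exclude ego i and j itself.
        redundancy = sum(p(i, q) * m(j, q) for q in range(n) if q not in (i, j))
        size += 1 - redundancy
    return size


# Adjacency matrix from the example (index 0 is node 1, ..., index 4 is node 5).
A = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
]
print(effective_size(A, 0))  # ego 1 -> 3.0
```

Because the formula uses $m_{jq}$ rather than raw ties, the same function also accepts valued (weighted) symmetric matrices.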
93Local Network Analysis Network Structure Weak
Ties Structural Holes
Effective Size
When you work it out in a binary network, redundancy reduces to the average degree of ego's alters, not counting their ties with ego. Since average degree is essentially another way to express density, we can calculate redundancy as 2t/n. Here, t is the number of ties (not counting ties to ego), and n is the number of people in the network (not counting ego). This means that
$\text{effective size} = n - 2t/n$
UCINET, STRUCTURE, SPAN and PAJEK all calculate effective size.
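The 2t/n shortcut needs only a count of ties among the alters. The following is a minimal sketch for undirected, binary networks, using the example adjacency matrix. For such networks, it should agree with the full formula. The helper name `effective_size_binary` is hypothetical.

```python
from itertools import combinations


def effective_size_binary(A, i):
    """Shortcut for undirected binary networks: effective size = n - 2t/n.

    n: number of ego's alters; t: number of ties among those alters
    (ties to ego are excluded).
    """
    alters = [j for j in range(len(A)) if j != i and A[i][j]]
    n = len(alters)
    if n == 0:
        return 0.0  # isolate: no alters, so the measure does not really apply
    t = sum(1 for j, k in combinations(alters, 2) if A[j][k])
    return n - 2 * t / n


# Example adjacency matrix (index 0 is node 1, ..., index 4 is node 5).
A = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
]
sizes = [effective_size_binary(A, i) for i in range(len(A))]
print([round(s, 2) for s in sizes])  # [3.0, 1.0, 1.0, 1.0, 1.67]
```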
94Local Network Analysis Network Structure Weak
Ties Structural Holes
Efficiency is simply effective size divided by observed size. Taken from each ego's point of view, efficiency in this network would be:
Ego   Size   Effective Size   Efficiency
1       4         3               .75
2       2         1               .50
3       1         1              1.00
4       2         1               .50
5       3         1.67            .55
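Efficiency follows by dividing by the number of alters. The following is a minimal sketch that reproduces the efficiency column for the example network. It assumes an undirected, binary matrix so that the n - 2t/n shortcut applies. The name `efficiency` is hypothetical.

```python
from itertools import combinations


def efficiency(A, i):
    """Burt's efficiency: effective size divided by observed size (number of alters).

    Uses the binary shortcut effective size = n - 2t/n, so A must be an
    undirected 0/1 adjacency matrix. Returns None for isolates.
    """
    alters = [j for j in range(len(A)) if j != i and A[i][j]]
    n = len(alters)
    if n == 0:
        return None
    t = sum(1 for j, k in combinations(alters, 2) if A[j][k])
    return (n - 2 * t / n) / n


A = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
]
for i in range(len(A)):
    print(f"ego {i + 1}: efficiency {efficiency(A, i):.3f}")
```

Ego 5's efficiency is exactly 5/9 (about .556). The table above shows this value as .55.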
95Local Network Analysis Network Structure Weak
Ties Structural Holes
Constraint
Conceptually, constraint refers to how much room you have to negotiate or exploit potential structural holes in your network.
"...opportunities are constrained to the extent that (a) another of your contacts q, in whom you have invested a large portion of your network time and energy, has (b) invested heavily in a relationship with contact j." (p. 54)
96Local Network Analysis Network Structure Weak
Ties Structural Holes
Constraint
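$c_{ij} = \left(p_{ij} + \sum_q p_{iq} p_{qj}\right)^2, \quad q \neq i, j$
that is, (direct investment $p_{ij}$ + indirect investment $\sum_q p_{iq} p_{qj}$), squared.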
97Local Network Analysis Network Structure Weak
Ties Structural Holes
Constraint
Given the P matrix, you can get indirect constraint ($p_{iq} p_{qj}$) by simply squaring the matrix:
PP     1      2      3      4      5
1     ...   .083   .000   .083   .250
2    .165    ...   .125   .290   .125
3    .000   .250    ...   .250   .250
4    .165   .290   .125    ...   .125
5    .330   .083   .083   .083    ...

P      1     2     3     4     5
1     .00   .25   .25   .25   .25
2     .50   .00   .00   .00   .50
3    1.0    .00   .00   .00   .00
4     .50   .00   .00   .00   .50
5     .33   .33   .00   .33   .00
98Local Network Analysis Network Structure Weak
Ties Structural Holes
Constraint
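Total constraint between any two people is then
$C = \left(P + P^2\right)^{\circ 2}$
Here, P is the normalized adjacency matrix and $P^2$ is the ordinary matrix product PP. The outer exponent $\circ 2$ means squaring the elements of the matrix, and the diagonal entries are ignored.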
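The matrix form translates directly into code. The following is a minimal sketch, assuming an undirected list-of-lists adjacency matrix, with P taken as its row-normalized version. Aggregate constraint for ego i is taken here as the sum of $c_{ij}$ over i's contacts j. The name `burt_constraint` is hypothetical.

```python
def burt_constraint(A):
    """Dyadic constraint c_ij = (p_ij + sum_q p_iq * p_qj)^2 and aggregate constraint C_i.

    A: undirected adjacency matrix (list of lists); P is A row-normalized.
    Returns (c, C), where C[i] sums c[i][j] over i's contacts j.
    """
    n = len(A)
    deg = [sum(row) for row in A]
    P = [[A[i][j] / deg[i] if deg[i] else 0.0 for j in range(n)] for i in range(n)]
    # P2 = P times P; since p_ii = 0, the q = i and q = j terms vanish on their own.
    P2 = [[sum(P[i][q] * P[q][j] for q in range(n)) for j in range(n)] for i in range(n)]
    # Element-wise square of (P + P2); the diagonal is meaningless, so set it to 0.
    c = [[(P[i][j] + P2[i][j]) ** 2 if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    C = [sum(c[i][j] for j in range(n) if A[i][j] > 0) for i in range(n)]
    return c, C


A = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
]
c, C = burt_constraint(A)
print([round(x, 2) for x in c[0][1:]])  # ego 1's c_1j for j = 2..5 -> [0.11, 0.06, 0.11, 0.25]
print([round(x, 3) for x in C])
```

Ego 1's aggregate constraint comes out at about .535. Node 3, which has a single contact, is maximally constrained at 1.0.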
99Local Network Analysis Network Structure Weak
Ties Structural Holes
Hierarchy
Conceptually, hierarchy (for Burt) is really the
extent to which constraint is concentrated in a
single actor. It is calculated as
Note: this measure says nothing about the direction of ties. It is not about asymmetry.
100Local Network Analysis Network Structure Weak
Ties Structural Holes
Hierarchy
Ego 1 (N = 4 contacts)
              2     3     4     5     C
c1j          .11   .06   .11   .25   .53
c1j/(C/N)    .83   .46   .83   1.9
H = .514
101Local Network Analysis Network Structure Weak
Ties Structural Holes
Burt (2004), AJS 110: 349-399
102Local Network Analysis Network Structure Weak
Ties Structural Holes
Burt (2004), AJS 110: 349-399
103Local Network Analysis Network Structure Weak
Ties Structural Holes
Burt (2004), AJS 110: 349-399
104Local Network Analysis Local Network Models
Modeling Issues
- Local network modeling issues
- Case independence
- In very clustered settings, the alters that each person names will overlap. This will lead to non-independence among the cases.
- If you have enough cases or over-time data, you can use random or fixed effect models.
- If you know the names of alters, you can link them to build in a direct network autocorrelation effect.
- Small network effects
- Be aware of the size of your networks. Substantively, having a 50% white network means something different in a network of size 2 than in a network of size 10. I often suggest interactions to check for these kinds of effects.
- Dealing with isolates
- Isolated nodes have no network alters, so none of these measures apply. Depending on the context, you can either leave them out of the analysis or use interaction terms to selectively apply the measures of interest.
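One common way to build the direct network autocorrelation term is from a row-normalized nomination matrix W. The product Wy then gives the mean outcome among the alters each person names. The following is a minimal sketch with hypothetical data and a hypothetical helper, `peer_mean`. Isolates get NaN, and an indicator flag is built alongside it. This is one way to implement either option above: dropping isolates, or entering the peer term through an interaction with the flag.

```python
import math


def peer_mean(W, y):
    """Network lag Wy with W row-normalized: mean outcome among each person's alters.

    W[i][j] = 1 if person i names person j. Isolates (no alters) get NaN,
    since no peer measure applies to them.
    """
    lag = []
    for row in W:
        k = sum(row)
        lag.append(sum(w * v for w, v in zip(row, y)) / k if k else math.nan)
    return lag


# Hypothetical data: four students; student 4 (index 3) names nobody.
W = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
y = [2.0, 4.0, 6.0, 1.0]  # hypothetical outcome scores
lag = peer_mean(W, y)
isolate = [1 if math.isnan(v) else 0 for v in lag]  # flag for interaction terms
print(lag, isolate)
```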
105Local Network Analysis Local Network Models
Modeling Issues
- Selection
- That some unobserved factor, z, creates both friendships and the outcome of interest.
- Endogeneity
- That the causal order of peer relations and outcomes is reversed: peers do not cause Y, but Y causes friendship relations.
106Local Network Analysis Local Network Models
Modeling Issues
Selection
- What do we know about how friendships form?
- Opportunity / focal factors
- - Being members of the same group
- - In the same class
- - On the same team
- - Members of the same church
- Structural Relationship factors
- - Reciprocity
- - Social Balance
- Behavior Homophily
- - Smoking
- - Drinking
107Local Network Analysis Local Network Models
Modeling Issues
Selection
How to correct this problem?
- Essentially, this is an omitted variable problem, and the obvious solution has been to identify as many potentially relevant alternative variables as you can find.
- Sensitivity measures (see Ken Frank's work here)
- Propensity score matching
- Individual-level fixed effect models
- Substantively, you only look at change in Y as a function of change in X, holding constant (because it is dummied out) any individual-level effect.
- This works, but it is drastic. Any endogenous effect of networks on the self is essentially removed.
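The fixed-effect logic of relating change in Y only to change in X can be sketched using the within transformation. This means subtracting each person's own means before estimating the slope. The data below are hypothetical, and `within_slope` and `pooled_slope` are hypothetical helper names. The example is built so that a stable person-level trait is correlated with X. Under these conditions, the pooled slope is badly confounded, while the within slope recovers the true value of 2.

```python
def within_slope(panel):
    """Fixed-effects (within) slope of y on x.

    panel: dict mapping person id -> list of (x, y) observations over time.
    Each person's own means are subtracted first, so any stable person-level
    trait drops out and only change in x is related to change in y.
    """
    sxy = sxx = 0.0
    for obs in panel.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


def pooled_slope(panel):
    """Ordinary pooled OLS slope, ignoring who the observations belong to."""
    pts = [pt for obs in panel.values() for pt in obs]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


# Hypothetical panel: y = (stable personal level) + 2 * x. Person B has both a
# higher personal level (50 vs 10) and higher x, so the pooled slope is confounded.
panel = {
    "A": [(1, 12), (2, 14), (3, 16)],
    "B": [(4, 58), (5, 60), (6, 62)],
}
print(within_slope(panel))            # 2.0
print(round(pooled_slope(panel), 2))  # 12.29
```

The same demeaning also removes any stable person-level effect of networks. This is exactly why the slide calls the approach drastic.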
108Local Network Analysis Local Network Models
Modeling Issues
Endogeneity
Estimated: $Y = b_0 + b_1 P + e$, where P is some peer function. But the actual model may really be $P = b_0 + b_1 f(Y) + e$.
109Local Network Analysis Local Network Models
Modeling Issues
Endogeneity
Does it matter?
Algebraically, the relation between Y and P should be a direct translation of the coefficients, since if $Y = b_0 + b_1 P$, then $P = -b_0/b_1 + (1/b_1) Y$.
The statistical problem of endogeneity is that when you estimate the reverse regression, its slope does not equal $1/b_1$. This is because of our assumptions about x, and hence about e. There are other models that make different assumptions, where this direction is irrelevant. However, they are uncommon and hard to work with in the multivariate context.
(See Joel H. Levine, Exceptions are the Rule, for a full discussion of this.)
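A small numerical check shows why the reverse slope is not $1/b_1$. For least squares, the slope of Y on P multiplied by the slope of P on Y equals $r^2$. The two slopes are therefore reciprocals only when the fit is perfect. The data are hypothetical, and `ols_slope` is a hypothetical helper name.

```python
def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: a peer measure P and an outcome Y for five people.
P = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 1.0, 4.0, 3.0, 5.0]

b_y_on_p = ols_slope(P, Y)  # slope from Y = b0 + b1 * P
b_p_on_y = ols_slope(Y, P)  # slope from the reversed regression P = b0 + b1 * Y
print(b_y_on_p, b_p_on_y, 1 / b_y_on_p)  # 0.8 0.8 1.25
```

In this example, the reversed regression gives a slope of 0.8, not the 1.25 that simple algebra would suggest. The product of the two slopes is $r^2 = 0.64$.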
110Local Network Analysis Local Network Models
Modeling Issues
Possible solutions
- Theory: Given what we know about how friendships form, is it reasonable to assume a bi-directional cause? That is, work through the meeting, socializing, etc. process and ask whether it makes sense that Y is a cause of P.
- Models
- Time order: we are on somewhat firmer ground if P precedes Y in time.
- Simultaneous equation models: model both the friendship pattern and the outcome of interest simultaneously. It is difficult to identify instruments, or to specify orders that do not logically make the model inestimable.
111Local Network Analysis Local Network Models Peer
influence example
- Haynie asks whether peers matter for delinquent behavior, focusing on
- a) the distinction between selection and influence
- b) the effect of friendship structure on peer influence
- Two basic theories underlie her work
- a) Hirschi's Social Control Theory
- Social bonds constrain otherwise criminal behavior.
- The theory itself is largely ambivalent toward the direction of network effects.
- b) Sutherland's Differential Association
- Behavior is the result of internalized definitions of the situation.
- The effect of peers is through communication of the appropriateness of particular behaviors.
- Haynie adds to these the idea that the structural context of the network can boost the effect of peers: (a) transmission is more effective in locally dense networks, and (b) the effect of peers is stronger on central actors.
112Local Network Analysis Local Network Models Peer
influence example
113Local Network Analysis Local Network Models Peer
influence example