Title: NSF grant
1NSF grant 0429452
- Life in the Network
- The Coming Era of Computational Social Science
David Lazer, Harvard University, NetSci 2007
2Life in the Network
- Much of human civilization has been about
building network infrastructure (or taking
advantage of naturally occurring infrastructure,
such as rivers)
- Benefits of trade
- Economies of scale
- Human drive to connect
- Cities, roads, railroads, telephone lines
- There has been a proliferation of various types
of ICT networks connecting people in the last few
decades
3Digital traces of our networked lives
- E-mail
- Instant messaging
- Text messaging
- Telephone logs
- Link structure among websites
- Facebook
- Web surfing
- What can data like these tell us?
http://www.iq.harvard.edu/blog/netgov/
4Existing approaches to studying social networks
- Growth in interest in these kinds of relational
phenomena
- Generally rely on self-reports
- Static: generally based on snapshots
- Shaky reliability: what is being measured by
self-reports?
- Small scale: mostly systems in the hundreds or
fewer
→ Inferential challenges in existing research
→ Many important phenomena are neglected
5What can data like these tell us?
- How do things spread through a network?
- Ideas?
- Avian flu?
- How do people/organizations work together?
- Collaboration and coordination?
- Who is in key positions in the network?
- Form an empirical basis for various types of
policy recommendations
- Possibly even real-time feedback for effective
interventions
6Computational social science
- The capturing and analysis of human activity
represented in digital form
- Increased computational capacity to manipulate
data
- Incidental, vast archives of human activity
(e.g., Internet, e-mail)
- Instrumentation of human behavior (e.g., cookies,
GPS devices)
- Creation of virtual worlds to experiment with
- What are the implications for our understanding
of collective human behavior?
7What can be done with data like these? Four
studies
- Call log analysis
- Instrumentation of human behavior
- Natural language processing
- Building virtual worlds
8Study 1: Call log analysis
- "Structure and tie strengths in mobile
communication networks" (just came out in PNAS,
with J.-P. Onnela, J. Saramäki, J. Hyvönen, G.
Szabó, K. Kaski, J. Kertész, A.-L. Barabási)
- Examination of call log data from a mobile phone
company in a moderate-sized European nation: a
total of approximately 7,000,000 users, 49
trillion dyads
- What does network structure look like?
- Small world? (six degrees of separation; Watts
and Strogatz)
- Scale free? (hub-spoke structure; Barabási and
Albert)
- Strength of weak ties? (Granovetter)
9Call log network data
10Results
- Hub-spoke structure (scale free)
- Small world (on average, 13 degrees of
separation)
- But poorly structured for dissemination
- Strong ties tend to be clustered, and weak ties bind
clusters together (consistent with Granovetter)
- Simulations suggest that weak(est) ties are not
effective at spreading (inconsistent with
Granovetter)
- Potentially powerful tool for studying evolving
social structures of communities
- Possible use of data for a variety of policy
purposes, from criminal investigations to early
warning system for avian flu
- But what does a phone call between two phones
mean?
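The point that weak ties bind clusters together can be illustrated with a minimal sketch. It assumes a made-up six-person network (two triangles joined by one weak tie) and hypothetical helper names (`largest_component_size`, `removal_curve`); it is not the procedure used in the PNAS study, and it looks only at connectivity, not at spreading. Ties are removed in order of weight and the size of the largest connected component is tracked.

```python
from collections import defaultdict


def largest_component_size(nodes, edges):
    """Size of the largest connected component, found by iterative DFS."""
    adjacency = defaultdict(set)
    for u, v, _weight in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def removal_curve(nodes, edges, weakest_first=True):
    """Largest-component size after removing 0, 1, 2, ... ties in weight order."""
    ordered = sorted(edges, key=lambda e: e[2], reverse=not weakest_first)
    return [largest_component_size(nodes, ordered[k:])
            for k in range(len(ordered) + 1)]


# Toy data (an assumption, not from the study): tie weight stands in for
# call volume; c-d is the only weak tie and the only bridge.
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [
    ("a", "b", 10), ("b", "c", 9), ("a", "c", 8),
    ("d", "e", 10), ("e", "f", 9), ("d", "f", 8),
    ("c", "d", 1),
]

weak_first = removal_curve(NODES, EDGES, weakest_first=True)
strong_first = removal_curve(NODES, EDGES, weakest_first=False)
print("weakest removed first: ", weak_first)
print("strongest removed first:", strong_first)
```

On this toy network, removing the single weakest tie splits the graph into two triangles, while removing the strongest tie leaves all six nodes connected.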
11Study 2: Instrumentation of human behavior
- Paper: "Revealing Social Relationships using
Contextualized Proximity and Communication Data"
(with Nathan Eagle and Sandy Pentland)
- Collaboration with Media Lab
- Program mobile phones of 100 students for 9
months
- Call log data
- Physical proximity (using Bluetooth)
- Location (using cell tower triangulation)
- Also collected self-report data on friendship,
satisfaction
- What is the information in these data?
- Compare observations to self-reports
12Self-reported vs. observed proximity
- Substantial recency effects: recent interactions
weighted more heavily
- Reciprocal non-friends: 99.5% accurate at
reporting 0s
- Reciprocal friends: 35% accurate at reporting
0s
- Friends more accurate at non-0s
13Is friendship observable?
- Friendship is important at individual and
collective levels due to the resources that flow
among friends
- Purely cognitive relationship: in principle,
you could be friends with someone with whom you
do not interact.
- But generally we all make inferences about who is
friends with whom based on our observations
- Can the types of information that inform our
inferences be captured via our mobile phones?
- Certainly, one anticipates that (for example) friends
will tend to be proximate to each other
- If high accuracy is possible, then it is possible to
look at evolution of friendship structure in
larger populations over time (as well as other
cognitive relationships, such as advice)
14Predicting friendships
- Relational scripts: culturally embedded patterns
of relational behavior
- We generated seven relational variables
- Frequency of phone calls
- Proximity at home
- Proximity at work
- Proximity outside work
- Proximity on Saturday nights
- Proximity with no signal
- Number of unique locations
- Interactions broke into two factors
- In-role communication
- Extra-role communication
16Reported friendships
17Inferred friendships (based on extra-role factor)
18Self-reported versus observed friendships
- We were able to categorize correctly 95% of
reciprocated friendships and reciprocated
non-friendships
- Unreciprocated friendships came from high
scores on in-role communication, perhaps capturing
cultural ambiguity
- Created continuous construct from dichotomous
self-report: perhaps a more valid measure of
friendship?
- Second layer of validation: predicting
satisfaction based on (a) actual friendships and
(b) inferred friendships. Second model does
slightly better.
- We follow culturally embedded programs with
respect to our relationships
- Results suggest potential for inferring
friendship on a much larger scale.
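The comparison of inferred and self-reported friendships can be sketched as follows. This is a minimal illustration under stated assumptions: the dyad scores, the self-reports, the threshold of 0.5, and the function names are all made up for the example, and the study's actual factor scores and classification rule are not given on the slides.

```python
def infer_friendships(factor_scores, threshold):
    """Label a dyad as friends when its extra-role score clears the threshold."""
    return {dyad: score >= threshold for dyad, score in factor_scores.items()}


def reciprocated_accuracy(inferred, self_reports):
    """Share of reciprocated dyads (both say yes, or both say no) inferred correctly."""
    hits = total = 0
    for dyad, (a_says, b_says) in self_reports.items():
        if a_says != b_says:
            continue  # unreciprocated report: left out of this accuracy figure
        total += 1
        hits += inferred[dyad] == a_says
    return hits / total


# Hypothetical extra-role factor scores per dyad (not study data).
SCORES = {
    ("p1", "p2"): 1.8,
    ("p1", "p3"): -0.7,
    ("p2", "p3"): 0.2,
    ("p1", "p4"): -1.1,
    ("p3", "p4"): 1.2,
}

# Hypothetical self-reports: (first person names second, second names first).
REPORTS = {
    ("p1", "p2"): (True, True),
    ("p1", "p3"): (False, False),
    ("p2", "p3"): (True, False),   # unreciprocated
    ("p1", "p4"): (False, False),
    ("p3", "p4"): (True, True),
}

inferred = infer_friendships(SCORES, threshold=0.5)
accuracy = reciprocated_accuracy(inferred, REPORTS)
print(f"accuracy on reciprocated dyads: {accuracy:.2f}")
```

Keeping the raw score alongside the thresholded label is what makes the continuous construct mentioned above available for later models.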
19Current data collection: the sociometer
- More collaboration with the Media Lab:
sociometers
- Study of teams (with Nancy Katz)
- When do team members talk to each other, and who
does the talking?
- What difference does this make at individual and
collective level?
20Study 3: Automated content analysis of
Congressional websites
- Acknowledgement: NSF grant 0429452, Stephen
Purpura
- Advances in computational linguistics and voice
recognition software
- Importance amplified by simultaneous improvements
in voice recognition software
- Code: human coders categorize blocks of text
(training set and test set)
- Train classification algorithms
- Test against test data
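The code / train / test workflow above can be sketched with a small classifier. This is a minimal illustration, assuming a multinomial Naive Bayes model with add-one smoothing as a stand-in; the slides do not say which classification algorithms were used, and the text blocks and category labels below are invented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(labeled_blocks):
    """Count documents per class and words per class from hand-coded blocks."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_blocks:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-coded blocks (training set and held-out test set).
TRAIN = [
    ("funding for local schools and teachers", "education"),
    ("new grants for schools and student loans", "education"),
    ("support for troops and veterans hospitals", "defense"),
    ("military bases and troops overseas", "defense"),
]
TEST = [
    ("student loans and teachers", "education"),
    ("veterans and military funding", "defense"),
]

model = train_naive_bayes(TRAIN)
predictions = [classify(model, text) for text, _ in TEST]
accuracy = sum(p == label for p, (_, label) in zip(predictions, TEST)) / len(TEST)
print(predictions, accuracy)
```

The test blocks are never seen during training, so the accuracy figure reflects how well the human coding scheme generalizes rather than how well it was memorized.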
21Official Congressional websites
- Every House member has a website at www.House.gov
- Example: http://www.house.gov/capuano/
- Strategic calculus of what message to send to
constituents
- Comparable set of websites: create panel data
set, some things varying over time, some not
24Official Congressional websites
- Track evolution of language usage
- Examples: positive/negative mentions of Bush?
- For example, what was/is the strongest single-word
predictor of partisanship of a Member of Congress?
- In 2001: "terror" (Republicans used it more than
Democrats)
- In 2006: "Iraq" (Democrats used it more than
Republicans)
- Allows us to see what is flowing through networks
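One way to look for a strongest single-word predictor is sketched below. It is an assumption-laden illustration, not the study's method: it scores each word by the gap between the share of one party's pages and the share of the other party's pages that contain it, and the page texts and function names are invented for the example.

```python
from collections import Counter


def doc_frequency(pages):
    """Share of pages in which each word appears at least once."""
    counts = Counter()
    for text in pages:
        counts.update(set(text.lower().split()))
    return {word: n / len(pages) for word, n in counts.items()}


def strongest_word(rep_pages, dem_pages):
    """Word with the largest absolute gap in page share between the parties."""
    rep_df = doc_frequency(rep_pages)
    dem_df = doc_frequency(dem_pages)
    vocab = set(rep_df) | set(dem_df)
    gaps = {w: rep_df.get(w, 0.0) - dem_df.get(w, 0.0) for w in vocab}
    # sorted() makes tie-breaking deterministic (alphabetical).
    word = max(sorted(gaps), key=lambda w: abs(gaps[w]))
    return word, gaps[word]


# Made-up page snippets, for illustration only.
REP_PAGES = [
    "fighting terror abroad",
    "terror and taxes",
    "security against terror",
]
DEM_PAGES = [
    "health care and jobs",
    "jobs and education",
    "education and security",
]

word, gap = strongest_word(REP_PAGES, DEM_PAGES)
print(word, gap)  # a positive gap means more common on the first party's pages
```

Running the same scoring on each year of the panel would show how the top word shifts over time.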
25Study 4: Building virtual deliberation world
- Acknowledgements: Kevin Esterling, Michael
Neblo, Curt Ziniel; NSF grant 0429452
- Creation of virtual space (use of Macromedia
Breeze)
- Twenty deliberative sessions with Members of
Congress regarding immigration
- Allows complete control and recording of
interactions, recruitment of representative
sample
- Pretest, various control groups, during-session
questions, post-test a week later, and
post-election survey
- Analysis still pending (dramatic effects on
approval and vote intention)
26Virtual deliberation world
28Computational social science
- Orders of magnitude increase in data being
collected about human behavior over the last decade
- Constant increase in computational power
- Shift in social science research over the next
generation
- Thinking relationally: what is flowing among
people? How are people working together?
29The big picture
- The capturing of massive amounts of digitized
information about human behavior (especially
relational behavior)
- The capacity to manipulate those data
- New insights into collective human behavior
30Challenges, Caveats, and Conundrums
- Overcoming silos of academia; the gap is particularly wide
between the sciences and social sciences
- The need to develop new infrastructures within
social sciences
- Substantial human subjects issues
- Partnerships with those who are guardians of the
network
- Concerns about use of knowledge that is produced
(e.g., NSA wiretaps, private sector data mining)
- Figuring out what those insights are: time to
shift the paradigm
31NSF grant 0429452
- Computational Social Science
Partially supported by NSF grant 0429452
Continue at http://www.iq.harvard.edu/blog/netgov/