Title: Social networks from the perspective of Physics
1 Social networks from the perspective of Physics
János Kertész (1,2), Jukka-Pekka Onnela (2), Jari Saramäki (2), Jörkki Hyvönen (2), Kimmo Kaski (2), Jussi Kumpula (2), David Lazer (3), Gábor Szabó (3,4), Albert-László Barabási (3,4)
(1) Budapest University of Technology and Economics, Hungary
(2) Helsinki University of Technology, Finland
(3) Harvard University
(4) University of Notre Dame, USA
2 Outline
- 0. Introduction
- Constructing the social network
- Basic statistics
- Granovetter's hypothesis
- Thresholding (percolation)
- Spreading
- Modeling
- Conclusions
3 Introduction
Complex systems: many interacting units such that the resulting behavior is more than a mere sum (brain, internet, society).
Much is known about the interactions, but the complex behavior is often still puzzling.
N = 3 can be many! See the three-body problem of mechanics.
Statistical physics: $N \sim 10^{23}$
Social sciences: $N \sim 3 \ldots 10^{9}$
4 Introduction
Complex systems: more input is needed than mere interactions → forget about the interactions.
Networks: the scaffold of complexity.
Useful to concentrate on the carrying network (NW) structure (nodes and links).
Holistic approach with very general statements.
Spectacular recent development: abundance of data due to IT, new concepts.
5 Introduction
Phenomenon                 Nodes        Links
Cell metabolism            Molecules    Chemical reactions
Scientific collaboration   Scientists   Joint papers
WWW                        Pages        URL links
Air traffic                Airports     Airline connections
Economy                    Firms        Trading
Language                   Words        Synonymous meaning
Society                    People       Acquaintances
6 Introduction
- Characterization of many empirical NW-s
- BROAD DEGREE DISTRIBUTION in many natural and human-made NW-s
- SMALL WORLD property: the average distance between two nodes is usually very small ($\sim \log N$); "6 degrees of separation"
- HIGH CLUSTERING: the number of triangles is significantly high
- Studied in many networks: WWW, Internet, actor, citation, metabolic, etc.
7 World not only small and scale free but clustered!
Friends of friends are often friends.
Clustering coefficient C: the ratio of connected neighbor pairs to all neighbor pairs of a node.
ER (Erdős-Rényi) graph: too small clustering!
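The clustering coefficient on this slide can be sketched in a few lines. This is a minimal illustration, assuming an undirected network stored as a dictionary of neighbour sets; the toy graph and the function names are hypothetical, and nodes with fewer than two neighbours are given C = 0 by convention.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention (assumption): undefined cases count as zero
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical toy graph: a triangle a-b-c plus a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(local_clustering(adj, "c"), average_clustering(adj))
```

For comparison, in an ER graph with the same number of nodes and the same average degree, the expected clustering is roughly the link probability, about <k>/N, which becomes very small for large sparse networks.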
8 Introduction
WEIGHTED NW-S: a step toward reductionism.
Interactions have different strengths → weights on links.
Weights: fluxes (traffic or chemical reactions), correlation-based networks, etc. (Often no negative weights, $w_{ij} \ge 0$.)
How to characterize weighted NW-s? E.g. STRENGTH of node i: $s_i = \sum_j w_{ij}$
Intensity and coherence of subgraphs: clustering, motifs, etc. (see Onnela et al., PRE 71, 065103(R) (2005))
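The node strength, i.e. the sum of the weights of all links attached to a node, could be computed as in the following sketch. It assumes an undirected link list of (node, node, weight) tuples; the example data and the function name are hypothetical.

```python
from collections import defaultdict

def node_strengths(weighted_links):
    """Return the strength of each node: the sum of its link weights."""
    strength = defaultdict(float)
    for i, j, w in weighted_links:
        strength[i] += w   # an undirected link contributes to both ends
        strength[j] += w
    return dict(strength)

# Hypothetical toy data.
links = [("a", "b", 2.0), ("a", "c", 0.5), ("b", "c", 1.5)]
strengths = node_strengths(links)
print(strengths)
```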
9 Introduction
SOCIAL NW-S: much has been taken from sociology (betweenness, clustering, assortativity).
Main method: questionnaires (10 - 10 000).
Weighted social NW-s: the strength of social relationships varies over a wide range:
- I know him/her
- We are on a first-name basis
- We are friends
- We are good friends
- We are very good friends
How to measure? Scale? Subjectivity?
10 Introduction
Advantage of questionnaires: ask whatever you are interested in. This enables complex studies and multi-factor analyses.
Disadvantage: difficulty in quantification, and subjectivity. E.g., AddHealth: quantification of tie strength by the number of joint activities; the mutuality test fails very often (M. C. González et al., Physica A 379, 307-316 (2007)).
Alternative approach: use communication databases (email, phone, etc.).
12 Constructing the Network
- Use a network constructed from mobile phone calls as a proxy for a social network
- In the network:
  - Nodes: individuals
  - Links: voice calls
- Link weights:
  - Number of calls
  - Total call duration (time = money)
13 Constructing the Network
- Over 7 million private mobile phone subscriptions
- Focus: voice calls within the home operator
- Data aggregated from a period of 18 weeks
- Require reciprocity (X→Y AND Y→X) for a link
- Customers are anonymous (hash codes)
- Data from a European mobile operator
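The construction rules above (aggregate the calls, keep a link only if calls went in both directions, weight it by call count and total duration) can be sketched as follows. The record layout (caller, callee, duration in seconds), the identifiers, and the function name are assumptions for illustration, not the format of the operator data.

```python
from collections import defaultdict

def build_network(call_records):
    """Aggregate directed call records into undirected, reciprocated links.

    call_records: iterable of (caller, callee, duration_seconds).
    Returns {(u, v): {"calls": n, "duration": t}} with u < v, keeping only
    pairs in which each party called the other at least once.
    """
    directed = defaultdict(lambda: [0, 0.0])  # (caller, callee) -> [count, seconds]
    for caller, callee, duration in call_records:
        if caller == callee:
            continue  # ignore self-calls
        entry = directed[(caller, callee)]
        entry[0] += 1
        entry[1] += duration

    links = {}
    for (u, v), (count, seconds) in directed.items():
        if u < v and (v, u) in directed:  # reciprocity: X→Y AND Y→X
            back_count, back_seconds = directed[(v, u)]
            links[(u, v)] = {"calls": count + back_count,
                             "duration": seconds + back_seconds}
    return links

# Hypothetical records; "h1", "h2", "h3" stand in for anonymised hash codes.
records = [("h1", "h2", 60), ("h2", "h1", 30), ("h1", "h2", 10), ("h1", "h3", 120)]
links = build_network(records)
print(links)
```

Only the h1-h2 pair survives here, because h3 never called h1 back.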
15 Basic Statistics: Visualisation
Largest connected component dominates: 3.9M / 4.6M nodes, 6.5M / 7.0M links. Use it for analysis!
16 Basic Statistics: Distributions
- Vertex degree distribution
- Link weight distribution
- Fat tail
- Dunbar number (monkeysphere): max. 150 connections
18 Granovetter's Weak Ties Hypothesis
- Granovetter suggests analysis of social networks as a tool for linking micro and macro levels of sociological theory
- Considers the macro-level implications of (micro-level) tie strengths
- "The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie."
- Formulates a hypothesis:
  - The relative overlap of two individuals' friendship networks varies directly with the strength of their tie to one another
- Explores the impact of the hypothesis on, e.g., diffusion of information, stressing the cohesive power of weak ties
- M. Granovetter, The Strength of Weak Ties, The American Journal of Sociology 78, 1360-1380, 1973.
19 Granovetter's Weak Ties Hypothesis
- Hypothesis based on theoretical work and some direct evidence
- Present network is suitable for testing the hypothesis:
  - (i) Call durations → time commitment → tie strength
  - (ii) Call durations → monetary commitment → tie strength
  - (iii) Largest weighted social network so far
  - (Problem: other factors, such as emotional intensity or reciprocal services?)
- What is the coupling between network topology and link weights?
- Consider two connected nodes. We would like to characterize their relative neighborhood overlap, i.e. the proportion of common friends
- This leads naturally to link neighborhood overlap
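20 Overlap
- Definition: relative neighborhood overlap (topological)
  $O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}$
  where $n_{ij}$ is the number of triangles around edge $(v_i, v_j)$ and $k_i$, $k_j$ are the degrees of the two nodes
- Illustration of the concept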
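A small sketch of the overlap computation, assuming the definition used in the Onnela et al. PNAS paper listed under Publications: the number of common neighbours of i and j (triangles around the edge) divided by the number of all other neighbours of either node, O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij). The toy graph and the zero value for an isolated edge are illustrative assumptions.

```python
def neighbourhood_overlap(adj, i, j):
    """Relative neighbourhood overlap of the linked pair (i, j)."""
    n_ij = len(adj[i] & adj[j])  # common neighbours = triangles around the edge
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    # Assumption: an edge whose end points have no other neighbours gets overlap 0.
    return n_ij / denom if denom > 0 else 0.0

# Hypothetical toy graph: triangle a-b-c, plus d attached to a and e attached to b.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "e"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"b"},
}
print(neighbourhood_overlap(adj, "a", "b"))  # one shared friend out of three others
print(neighbourhood_overlap(adj, "a", "d"))  # no shared friends
```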
21 Empirical Verification
- Let $\langle O \rangle_w$ denote $O_{ij}$ averaged over a bin of w-values
- Use the cumulative link weight distribution (the fraction of links with weights less than w)
- Relative neighbourhood overlap increases as a function of link weight
- → Verifies Granovetter's hypothesis (for 95% of the links)
- (Exception: top 5% of weights)
- Blue curve: empirical network
- Red curve: weight-randomised network
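One way to produce such a curve is to sort the links by weight, cut them into equally populated bins, and average the overlap inside each bin. The sketch below assumes a list of (weight, overlap) pairs, one per link; the data and the function name are hypothetical, and each bin is labelled by the fraction of links up to its upper end.

```python
def binned_mean_overlap(weights_and_overlaps, n_bins):
    """Average overlap in equal-population bins of links ordered by weight.

    Returns a list of (cumulative_fraction, mean_overlap), one entry per bin.
    """
    ordered = sorted(weights_and_overlaps)  # ascending weight
    total = len(ordered)
    result = []
    for b in range(n_bins):
        lo = b * total // n_bins
        hi = (b + 1) * total // n_bins
        if hi == lo:
            continue  # empty bin, possible only for very small inputs
        chunk = ordered[lo:hi]
        mean_o = sum(o for _, o in chunk) / len(chunk)
        result.append((hi / total, mean_o))
    return result

# Hypothetical (weight, overlap) pairs.
data = [(10, 0.0), (20, 0.1), (50, 0.1), (200, 0.3), (900, 0.2), (4000, 0.4)]
curve = binned_mean_overlap(data, 3)
print(curve)
```

A weight-randomised reference curve could be obtained the same way after shuffling the weights among the links while leaving the overlaps in place.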
22 Local Implications
- Implication for strong links? Neighbourhood overlap is high → people form strongly connected communities
- Implication for weak links? Neighbourhood overlap is low → communities are connected by weak links
23 A Piece of the Network
[Figure: a piece of the network, with weak links, strong links and a community marked]
24 Overlap
Global optimization of transport would put high weights on links with high betweenness centrality b (number of shortest paths passing through the link).
In contrast, $\langle O \rangle$ decreases with b.
25 High Weight Links?
- (a) Average $O_{ij}$ as a function of weight w
  - $w < 10^4$: stronger tie → larger overlap
  - $w > 10^4$: stronger tie → smaller overlap
  - Contradicts the weak ties hypothesis!
  - Links in the decreasing part correspond to over 3 h of communication over the period
- (b) Putting it into perspective
  - For only 5% of links $w > 10^4$
  - Corresponds to 325 000 links, so it cannot be insufficient statistics
26 High Weight Links?
- Weak links: strength of both adjacent nodes (min and max) considerably higher than the link weight
- Strong links: strength of both adjacent nodes (min and max) about as high as the link weight
- Indication: high-weight relationships clearly dominate the on-air time of both nodes, others negligible
- The ratio of time spent communicating with one other person converges to 1 at roughly $w \approx 10^4$
- Consequence: less time to interact with others
- Explains the onset of the decreasing trend for $\langle O \rangle_w$
28 Thresholding Analysis: Introduction
- Children's approach: break to learn!
- We do this systematically using thresholding analysis:
  - Order the links by weight
  - Delete the links, one by one, based on their order
  - Control parameter f is the fraction of removed links
- We can continuously interpolate, in either direction, between the initial connected network (f = 0) and the set of isolated nodes (f = 1)
- We use two different thresholding schemes:
  - (i) Increasing thresholding (remove low $w_{ij}$ or $O_{ij}$ links first)
  - (ii) Descending thresholding (remove high $w_{ij}$ or $O_{ij}$ links first)
- Question: how does the network respond to link removal?
- How similar is the response to $w_{ij}$-driven and $O_{ij}$-driven thresholding?
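The thresholding procedure can be sketched as below: sort the links, drop a fraction f from one end of the ordering, and measure the fraction of nodes left in the largest connected component. The union-find bookkeeping, the function names, and the toy network (two strong triangles joined by one weak bridge) are illustrative assumptions; the same code works for overlap-driven thresholding if the third tuple entry holds the overlap instead of the weight.

```python
def largest_component_fraction(nodes, links):
    """Fraction of nodes in the largest connected component (union-find)."""
    parent = {n: n for n in nodes}
    size = {n: 1 for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in links:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru          # attach the smaller tree to the larger
            size[ru] += size[rv]
    return max(size[find(n)] for n in nodes) / len(nodes)

def threshold(weighted_links, f, weakest_first=True):
    """Remove a fraction f of links in weight order; return the surviving links."""
    ordered = sorted(weighted_links, key=lambda link: link[2],
                     reverse=not weakest_first)
    n_removed = round(f * len(ordered))
    return [(u, v) for u, v, _ in ordered[n_removed:]]

# Hypothetical toy network: two strong triangles connected by one weak bridge.
nodes = [1, 2, 3, 4, 5, 6]
links = [(1, 2, 10), (2, 3, 10), (1, 3, 10),
         (4, 5, 10), (5, 6, 10), (4, 6, 10), (3, 4, 1)]
f = 1 / 7  # remove exactly one link
r_weak_first = largest_component_fraction(nodes, threshold(links, f, True))
r_strong_first = largest_component_fraction(nodes, threshold(links, f, False))
print(r_weak_first, r_strong_first)
```

In this toy case removing the single weakest link splits the network in two, while removing one strong link leaves it connected, which is the qualitative effect the slides describe.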
29 Thresholding
- Initial connected network (f = 0)
- → All links are intact, i.e. the network is in its initial stage
30 Thresholding
- Increasing weight thresholded network (f = 0.8)
- → 80% of the weakest links removed, strongest 20% remain
31 Thresholding
- Initial connected network (f = 0)
- → All links are intact, i.e. the network is in its initial stage
32 Thresholding
- Decreasing weight thresholded network (f = 0.8)
- → 80% of the strongest links removed, weakest 20% remain
33 Thresholding
- We will study, as a function of the control parameter f, the following:
  - Order parameter (size of the largest connected component, LCC)
  - Susceptibility (average size of other components)
  - Average path lengths (in LCC)
  - Average clustering coefficient in the LCC
34 Thresholding: Size of Largest Component
- $R_{LCC}$ is the fraction of nodes in the largest connected component
- LCC is able to sustain its integrity for moderate values of f
- Least affected by removal of high $O_{ij}$ links (in tight communities)
- Most affected by removal of low $O_{ij}$ links (between communities)
- Difference between removal of low and high $w_{ij}$ links is small, but LCC breaks earlier if weak links are removed (Granovetter)
- Very few links are required for global connectivity
[Figure legend: remove low first / remove high first]
35 Thresholding: Size of Other Components
- Collapse for different values of f, but what is its nature?
- Susceptibility S (average cluster size excl. LCC)
- $n_s$ is the number of clusters with s nodes
- Percolation theory: $S \to \infty$ as $f \to f_c$
- Finite signature of divergence: $f_c \approx 0.60$ (incr. O), $f_c \approx 0.82$ (incr. w)
- Demarcation between weak and strong links given by $f_c \approx 0.82$
- Qualitatively different role for weak and strong links
[Figure legend: remove low first / remove high first]
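A susceptibility of this kind could be computed as in the sketch below: find the connected components, drop the largest one, and sum n_s s^2 over the remaining cluster sizes. Dividing by the total number of nodes N is an assumption here (percolation texts also normalise by the sum of n_s s); the toy network and function names are hypothetical.

```python
from collections import Counter

def component_sizes(nodes, links):
    """Sizes of the connected components, found by depth-first search."""
    adj = {n: set() for n in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, count = [start], 0
        while stack:
            node = stack.pop()
            count += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(count)
    return sizes

def susceptibility(sizes):
    """Sum of n_s * s**2 over all clusters except the largest, divided by N."""
    n_total = sum(sizes)
    rest = sorted(sizes)[:-1]   # drop one largest cluster (the LCC)
    n_s = Counter(rest)         # n_s: number of clusters with s nodes
    return sum(count * s * s for s, count in n_s.items()) / n_total

# Hypothetical toy network: a chain of four nodes, a pair, and an isolated node.
nodes = list(range(1, 8))
links = [(1, 2), (2, 3), (3, 4), (5, 6)]
sizes = component_sizes(nodes, links)
print(sorted(sizes), susceptibility(sizes))
```

Evaluating this quantity on the thresholded network for a range of f values would show a peak near the point where the largest component breaks up.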
36 Thresholding: Path Lengths in LCC
- Granovetter refers to interpersonal flow (information, rumour) from one person to another
- In order for a flow to exist, the two people (nodes) need to be connected through at least one path
- The size of the LCC says nothing about how tightly connected the component is, only that it is connected
- Granovetter's corollary:
  - Weak ties create a large number of short paths between nodes in different communities, and thus removing them should increase average path lengths and make it more difficult for the flow to happen
37 Thresholding: Path Lengths in LCC
- Connectedness is a necessary but not sufficient condition for flow
- But how is the LCC connected?
- Use the average path length (a.p.l.) $\langle l \rangle$ to study the role of different links for global paths
- Removing weak links leads to longer paths: at f = 0.75, $\langle l \rangle \approx 45$ vs. $\langle l \rangle \approx 30$
- Supports the weak ties conjecture on path lengths
- (communities are locally connected by weak ties)
[Figure legend: remove low first / remove high first]
38 Thresholding: Clustering in LCC
- Effect of different links on the structure of communities?
- Quantify this with $\langle C \rangle$, the average clustering coefficient
- Strong links are mostly within communities (triangles abundant), and thus removing them lowers clustering
- Weak links are mostly between communities (rarely participate in triangles), and thus removing them has little effect
- Removing high $O_{ij}$ links shatters communities quickly
- Removing low $O_{ij}$ links brings out communities
[Figure legend: remove low first / remove high first]
40 Diffusion of information
- Knowledge of information diffusion is based on unweighted networks
- Use the present network to study diffusion on a weighted network: does the local relationship between topology and tie strength have an effect?
- Spreading simulation: infect one node with new information
  - (1) Empirical: $p_{ij} \propto w_{ij}$
  - (2) Reference: $p_{ij} \propto \langle w \rangle$
- Spreading is significantly faster on the reference (average weight) network
- Information gets trapped in communities in the real network
[Figure legend: Reference / Empirical]
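The two spreading scenarios can be sketched with a simple discrete-time infection process in which an informed node passes the information to each uninformed neighbour with a probability that depends on the link weight. The update scheme, the proportionality constant x, the function names, and the toy network are assumptions for illustration, not the simulation set-up of the study.

```python
import random

def to_adj(weighted_links):
    """Build {node: {neighbour: weight}} from (u, v, w) tuples."""
    adj = {}
    for u, v, w in weighted_links:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def mean_weight(adj):
    """Average link weight (each link is seen twice, which leaves the mean unchanged)."""
    ws = [w for nbrs in adj.values() for w in nbrs.values()]
    return sum(ws) / len(ws)

def spread(adj, seed_node, transmit_prob, rng, max_steps=10_000):
    """Discrete-time spreading; returns the number of informed nodes after each step."""
    infected = {seed_node}
    history = [1]
    for _ in range(max_steps):
        newly = set()
        for node in infected:
            for nb, w in adj[node].items():
                if nb not in infected and rng.random() < transmit_prob(w):
                    newly.add(nb)
        infected |= newly
        history.append(len(infected))
        if len(infected) == len(adj):
            break
    return history

# Hypothetical toy network: two strong triangles joined by a weak bridge.
links = [(1, 2, 10), (2, 3, 10), (1, 3, 10),
         (4, 5, 10), (5, 6, 10), (4, 6, 10), (3, 4, 1)]
adj = to_adj(links)
x = 0.05                      # hypothetical proportionality constant
w_bar = mean_weight(adj)
empirical = spread(adj, 1, lambda w: min(1.0, x * w), random.Random(1))
reference = spread(adj, 1, lambda w: min(1.0, x * w_bar), random.Random(1))
print(len(empirical) - 1, len(reference) - 1)  # steps until everyone is informed
```

In the empirical variant the weak bridge transmits rarely, so the information tends to stay inside the first triangle for a while; averaging over many runs and seed nodes would be needed before reading anything into the numbers.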
41 Diffusion of information
- Where do individuals get their information? Majority of infections through:
  - (1) Empirical: ties of intermediate strength
  - (2) Reference: (would-be) weak ties
- Both weak and strong ties have a diminishing role as information sources: "the weakness of weak and strong ties"
42 Diffusion of information
- Start spreading 100 times (large red node)
- Information flows differently due to the local organizational principle
  - (1) Empirical: information flows along a strong-tie backbone
  - (2) Reference: information mainly flows along the shortest paths
Best search results: reach out of your own community
[Figure legend: Empirical / Reference]
43 Spreading
- In simplified terms, we can think of each link as transmitting information locally between the two individuals it connects
- Strong links involve larger time commitments, so it is natural to assume that information flow through a link is proportional to its weight $w_{ij}$
- Flow through weak (low $w_{ij}$) links:
  - (i) Low per se (by definition)
  - (ii) Low overlap $O_{ij}$ → few alternative paths of length 2, so information can easily get trapped
- Flow through strong (high $w_{ij}$) links:
  - (i) High per se (by definition)
  - (ii) High overlap $O_{ij}$ → many alternative paths enhance flow further, so particularly well suited to efficient local transfer
44 Searching
- Fix a set of search strategies
- Study which strategies are successful in finding information
- Best search results: reach out of your own community!
46 Modeling
- What is all this good for?
  - Understanding structure and mechanisms of society
  - Improving spreading of news and opinions
  - (Developing marketing strategies and other tools of mass manipulation)
- MODELING needed
47 Modeling
Needed: a weighted network model that reflects the observations with possibly limited input.
- Links are created by random encounters on an acquaintance basis
- Weights are generated by one-to-one activities (phone calls)
- Take into account the different time scales:
  - Encounter (call) frequency
  - Lifetime of relationships and lifetime of nodes (treated together)
48 Microscopic mechanisms in sociology
- Network sociology
  - Cyclic closure
    - Exponential decay for growing geodesic distance
  - Focal closure
    - Distance independent
  - Sample window
- Network model
  - Local attachment (LA)
    - Special case of cyclic closure: triadic closure
  - Global attachment (GA)
  - Node deletion (ND)
G. Kossinets et al., Empirical Analysis of an Evolving Social Network, Science 311, 88 (2006)
49 Modeling
- Node i meets j with prob. $\propto w_{ij}$, who meets k with prob. $\propto w_{jk}$.
- (a) If k is a common friend, $w_{ij}$, $w_{jk}$ and $w_{ki}$ are increased by $\delta$.
- (b) If k is not connected to i, a link $w_{ik} = w_0$ (= 1) is created with probability $p_\Delta$.
- (c) With prob. $p_r$, new links with weight $w_0$ are created.
- With prob. $p_d$, a node with all its links is deleted and a new one is born with no links.
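The rules on this slide can be turned into a rough simulation step. The sketch below follows only what the slide states (weighted two-hop local search, reinforcement of existing triangles, triangle closing, random global links, node deletion); details such as the order of the updates within a sweep, the name `p_tri` for the triangle-closing probability, and the dictionary-of-dictionaries storage are assumptions and may differ from the published model.

```python
import random

def weighted_choice(options, rng):
    """Pick a key of `options` ({key: weight}) with probability proportional to weight."""
    keys = list(options)
    return rng.choices(keys, weights=[options[k] for k in keys], k=1)[0]

def add_weight(w, a, b, amount):
    """Add `amount` to the undirected link a-b, creating it if needed."""
    w[a][b] = w[a].get(b, 0.0) + amount
    w[b][a] = w[b].get(a, 0.0) + amount

def model_step(w, delta, p_tri, p_r, p_d, rng, w0=1.0):
    """One sweep over all nodes; `w` is {node: {neighbour: weight}}, edited in place."""
    nodes = list(w)
    for i in nodes:
        # Local search: i -> j -> k, each hop chosen in proportion to link weight.
        if w[i]:
            j = weighted_choice(w[i], rng)
            others = {k: wt for k, wt in w[j].items() if k != i}
            if others:
                k = weighted_choice(others, rng)
                if k in w[i]:
                    # (a) k is a common friend: reinforce the three links
                    for a, b in ((i, j), (j, k), (k, i)):
                        add_weight(w, a, b, delta)
                elif rng.random() < p_tri:
                    # (b) close the triangle with a new link of weight w0
                    add_weight(w, i, k, w0)
        # (c) Global search: link to a uniformly random non-neighbour.
        if rng.random() < p_r:
            candidates = [n for n in nodes if n != i and n not in w[i]]
            if candidates:
                add_weight(w, i, rng.choice(candidates), w0)
    # Node deletion: the node loses all its links and is reborn with none.
    for i in nodes:
        if rng.random() < p_d:
            for nb in list(w[i]):
                del w[nb][i]
            w[i].clear()

# Hypothetical usage: a small ring of ten nodes evolved for a few sweeps.
rng = random.Random(42)
w = {n: {} for n in range(10)}
for n in range(10):
    add_weight(w, n, (n + 1) % 10, 1.0)
for _ in range(100):
    model_step(w, delta=0.5, p_tri=0.05, p_r=0.0005, p_d=0.001, rng=rng)
print(sum(len(nbrs) for nbrs in w.values()) // 2, "links after 100 sweeps")
```

The storage is kept symmetric on purpose, so every update touches both `w[a][b]` and `w[b][a]`; a production version would likely avoid rebuilding the candidate list for each global link.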
50 Microscopic rules in the model
- Summary of the model
  - Weighted local search for new acquaintances
  - Reinforcement of existing (popular) links
  - Unweighted global search for new acquaintances
  - Node removal, exponential link weight lifetimes: $\langle \tau \rangle = 2 \langle \tau_w \rangle = (p_d)^{-1}$
- Model parameters
  - $\delta$: free weight reinforcement parameter
  - $p_d = 10^{-3}$: sets the time scale of the model, $\langle \tau_N \rangle = 1/p_d$ (average node lifetime of 1000 time steps)
  - $p_r = 5 \times 10^{-4}$: global connections, results not sensitive to it (one random link per node during 1000 time steps)
  - $p_\Delta$: adjusted in relation to $\delta$ to keep $\langle k \rangle$ constant (structure changes due to only link re-organisations)
51 Modeling
$p_d = 0.001$, $p_r = 0.0005$, $N = 30\,000$
Changing $\delta$ while keeping $\langle k \rangle$ fixed by adjusting $p_\Delta$.
Communities emerge, with strong internal links. Communities are interconnected by weak links.
[Figure panels: $\delta$ = 0.001, 0.1, 0.5]
52 Social network model
Samples of an $N = 10^5$ network for variable weight increase $\delta$
Tie strength: weak → intermediate → strong tie
53 Communities by inspection
- Average number of links constant: $\langle L \rangle = N \langle k \rangle / 2$ ($\langle k \rangle = 10$)
- => All changes in structure are due to re-organisation of links
- Increasing $\delta$ traps search in communities, further enhancing the trapping effect
- => Clear communities form
- Triangles accumulate weight and act as nuclei for communities to emerge
54 Communities by k-clique method
- k-clique algorithm as definition for communities
- Focus on 4-cliques (smallest non-trivial cliques)
- Relative largest community size $R_{k=4} \in [0, 1]$
- Average community size $\langle n_s \rangle$ (excl. largest)
- Observe clique percolation through the system for small $\delta$
- Increasing $\delta$ leads to condensation of communities
G. Palla et al., Uncovering the overlapping community structure..., Nature 435, 814 (2005)
55 Global consequences
[Figure: remove weak / strong links first]
56 Global consequences
[Figure panels: model network and phone network, each under ascending and descending link removal; horizontal axis: fraction of links, f, from 0 to 1]
Phase transition for ascending tie removal (weaker first)
57 Modeling
- The model fulfills essential criteria of social NW-s:
  - Broad (but not scale-free) degree distribution
  - Assortative mixing (popular people attract each other)
  - High clustering: many triangles (by construction)
  - Community structure with strong links inside and weak ones between them
59 Discussion and Conclusion
- Weak ties maintain the network's structural integrity. Strong ties maintain local communities. Intermediate ties are mostly responsible for first-time infections
- How can one efficiently search for information in a social network? Go out of your community!
- Social networks seem better suited to local processing than to global transmission of information
- Are there simple rules or mechanisms that lead to the observed properties?
- Efficient modeling possible
Publications:
J.-P. Onnela et al., PNAS 104, 7332-7336 (2007)
J.-P. Onnela et al., New J. Phys. 9, 179 (2007)
J.M. Kumpula et al., PRL (to be published)
www.phy.bme.hu/kertesz/
60 Marc Granovetter, Connections, 1990
»In the history of public speaking, there have been many famous denials. One sunny day in 1880, Karl Marx declared "I am not a Marxist". On a less auspicious occasion in 1973, Richard Nixon insisted "I am not a crook". Neither Marx's nor Nixon's audience gave much credence to their denials, and you too may respond with disbelief when I tell you that "I am not a networker".«
»Instead, the slogan of the day will be "We are all networkers now".«