Information diffusion - PowerPoint PPT Presentation

Lecture 27: Information diffusion CS 765 Complex Networks Slides are modified from Lada Adamic, David Kempe, Bill Hackbor – PowerPoint PPT presentation

Provided by: ladamic
1
Lecture 27: Information diffusion
CS 765 Complex Networks
Slides are modified from Lada Adamic, David Kempe, Bill Hackbor
2
outline
  • factors influencing information diffusion
    • network structure: which nodes are connected?
    • strength of ties: how strong are the connections?
  • studies in information diffusion
    • Granovetter: the strength of weak ties
    • J-P Onnela et al.: strength of intermediate ties
    • Kossinets et al.: strength of backbone ties
    • Davis: board interlocks and adoption of practices
  • network position and access to information
    • Burt: structural holes and good ideas
    • Aral and van Alstyne: networks and information advantage
  • networks and innovation
    • Lazer and Friedman: innovation

3
factors influencing diffusion
  • network structure (unweighted)
    • density
    • degree distribution
    • clustering
    • connected components
    • community structure
  • strength of ties (weighted)
    • frequency of communication
    • strength of influence
  • spreading agent
    • attractiveness and specificity of information

4
Strong tie defined
  • A strong tie
    • frequent contact
    • affinity
    • many mutual contacts
  • Less likely to be a bridge (or a local bridge)

forbidden triad: strong ties are likely to close
Source: Granovetter, M. (1973). "The Strength of Weak Ties".
5
edge embeddedness
  • embeddedness: number of common neighbors the two endpoints have
  • neighborhood overlap
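
A minimal sketch of both measures. It assumes the common definition of neighborhood overlap: the common neighbors of the two endpoints divided by all nodes adjacent to at least one endpoint, not counting the endpoints themselves. The toy graph and function names are illustrative only.

```python
def embeddedness(adj, u, v):
    """Number of common neighbors of the endpoints u and v."""
    return len(adj[u] & adj[v])

def neighborhood_overlap(adj, u, v):
    """Common neighbors divided by all neighbors of u or v (endpoints excluded)."""
    union = (adj[u] | adj[v]) - {u, v}
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)

# Hypothetical toy graph: a triangle A-B-C plus a bridge C-D.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(embeddedness(adj, "A", "B"), neighborhood_overlap(adj, "A", "B"))
print(embeddedness(adj, "C", "D"), neighborhood_overlap(adj, "C", "D"))
```

The edge inside the triangle is embedded, while the bridge C-D has no common neighbors and therefore zero overlap.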

6
school kids and 1st through 8th choices of friends
  • snowball sampling
  • will you reach more different kids by asking each kid to name their 2 best friends, or their 7th & 8th closest friends?

Source: M. van Alstyne, S. Aral. Networks, Information & Social Capital
7
is it good to be embedded?
  • What are the advantages of occupying an embedded
    position in the network?
  • What are the disadvantages of being embedded?
  • Advantages of being a broker (spanning
    structural holes)?
  • Disadvantages of being a broker?

8
outline
  • factors influencing information diffusion
    • network structure: which nodes are connected?
    • strength of ties: how strong are the connections?
  • studies in information diffusion
    • Granovetter: the strength of weak ties
    • J-P Onnela et al.: strength of intermediate ties
    • Kossinets et al.: strength of backbone ties
    • Davis: board interlocks and adoption of practices
  • network position and access to information
    • Burt: structural holes and good ideas
    • Aral and van Alstyne: networks and information advantage
  • networks and innovation
    • Lazer and Friedman: innovation

9
how does strength of a tie influence diffusion?
  • M. S. Granovetter: The Strength of Weak Ties, AJS, 1973
  • finding a job through a contact that one saw
    • frequently (≥ 2 times/week): 16.7%
    • occasionally (more than once a year but < 2x/week): 55.6%
    • rarely: 27.8%
  • but length of path is short
    • contact directly works for/is the employer
    • or is connected directly to employer

10
strength of tie: frequency of communication
  • Kossinets, Watts, Kleinberg, KDD 2008
    • which paths yield the most up to date info?
    • how many of the edges form the backbone?

source: Kossinets et al., The structure of information pathways in a social communication network
11
the strength of intermediate ties
  • strong ties
    • frequent communication, but ties are redundant due to high clustering
  • weak ties
    • reach far across network, but communication is infrequent
  • Onnela et al., Structure and tie strengths in mobile communication networks
    • use nation-wide cellphone call records and simulate diffusion using actual call timing

12
Localized strong ties slow infection spread.
source: Onnela J. et al., Structure and tie strengths in mobile communication networks
13
how can information diffusion be different from
simple contagion (e.g. a virus)?
  • simple contagion
    • infected individual infects neighbors with information at some rate
  • threshold contagion
    • individuals must hear information (or observe behavior) from a number or fraction of friends before adopting
  • in lab: complex contagion (Centola & Macy, AJS, 2007)
    • how do you pick individuals to infect such that your opinion prevails?
    • http://www.ladamic.com/netlearn/NetLogo4/DiffusionCompetition.html

14
Framework
  • The network consists of nodes (individuals) and edges (links between nodes)
  • Each node is in one of two states
    • Susceptible (in other words, healthy)
    • Infected
  • Susceptible-Infected-Susceptible (SIS) model
    • Cured nodes immediately become susceptible

15
Framework (Continued)
  • Homogeneous birth rate β on all edges between infected and susceptible nodes
  • Homogeneous death rate δ for infected nodes

[Figure: example network with nodes X, N1, N2 and N3, each marked as healthy or infected]
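
One way to make the framework concrete is a discrete-time simulation. The sketch below is an assumption-laden illustration, not the model from the slides: it uses synchronous updates and treats the birth rate (`beta`) and death rate (`delta`) as per-step probabilities rather than continuous rates. The function name `sis_step` and the path graph are hypothetical.

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """Return the infected set after one synchronous SIS update."""
    nxt = set()
    for node, neighbors in adj.items():
        if node in infected:
            # Cured with probability delta; a cured node is immediately
            # susceptible again, otherwise it stays infected.
            if rng.random() >= delta:
                nxt.add(node)
        else:
            # Each infected neighbor transmits independently with probability beta.
            for nb in neighbors:
                if nb in infected and rng.random() < beta:
                    nxt.add(node)
                    break
    return nxt

# Hypothetical path graph 0 - 1 - 2 - 3 with node 0 initially infected.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(42)
infected = {0}
for t in range(20):
    infected = sis_step(adj, infected, beta=0.5, delta=0.2, rng=rng)
print(sorted(infected))
```

Setting `beta=1, delta=0` turns the process into deterministic hop-by-hop spreading, which is a convenient sanity check.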
16
SIR and SIS Models
  • An SIS model consists of two groups
    • Susceptible: those who may contract the disease
    • Infected: those infected
  • An SIR model consists of three groups
    • Susceptible: those who may contract the disease
    • Infected: those infected
    • Recovered: those with natural immunity or those that have died.

17
Important Parameters
  • a is the transmission coefficient, which determines the rate at which the disease travels from one population to another.
  • the recovery rate is (I persons)/(days required to recover)
  • R0 is the basic reproduction number:
    • R0 = (number of new cases arising from one infective) x (average duration of infection)
  • If R0 > 1 then ΔI > 0 and an epidemic occurs

18
SIR and SIS Models
  • SIS Model
  • SIR Model
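
The equations for this slide are not in the transcript. As a hedged illustration, the sketch below integrates the textbook mean-field SIR equations, dS/dt = -a S I, dI/dt = a S I - g I, dR/dt = g I, with a simple Euler step on population fractions. Here `a` is the transmission coefficient; the name `g` for the recovery rate and all numeric values are assumptions.

```python
def sir_euler(s, i, r, a, g, dt, steps):
    """Euler integration of the mean-field SIR model on population fractions."""
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = a * s * i * dt
        new_recoveries = g * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history

# Assumed values: 1% initially infected, a = 0.5, g = 0.1, so a * S / g > 1
# at the start and the infected fraction initially grows.
epidemic = sir_euler(0.99, 0.01, 0.0, a=0.5, g=0.1, dt=0.1, steps=1000)
peak_infected = max(i for _, i, _ in epidemic)
print(round(peak_infected, 3))
```

With these equations the infected fraction grows at first exactly when a * S / g exceeds 1, which matches the epidemic condition R0 > 1.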

19
Diffusion in networks: ER graphs
  • review diffusion in ER graphs

http://www.ladamic.com/netlearn/NetLogo501/ERDiffusion.html
20
Quiz Q
  • When the density of the network increases,
    diffusion in the network is
  • faster
  • slower
  • unaffected

21
ER graphs: connectivity and density
nodes infected after 10 steps, infection rate 0.15
[Figure panels: average degree 2.5; average degree 10]
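
The comparison on this slide can be approximated in a few lines. This is a rough sketch, not the NetLogo model: it assumes a G(n, p) random graph with p chosen to give the target average degree, node 0 as the single seed, and no recovery. `er_graph` and `spread` are hypothetical helper names.

```python
import random

def er_graph(n, p, rng):
    """Erdos-Renyi G(n, p): each pair of nodes is linked with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def spread(adj, seed, rate, steps, rng):
    """Simple contagion: each infected node infects each susceptible
    neighbor with probability `rate` per step; nobody recovers."""
    infected = {seed}
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < rate:
                    newly.add(v)
        infected |= newly
    return infected

rng = random.Random(0)
n = 200
for avg_degree in (2.5, 10):
    adj = er_graph(n, avg_degree / (n - 1), rng)
    print(avg_degree, len(spread(adj, 0, 0.15, 10, rng)))
```

Averaging over several graphs and seeds gives a steadier picture than a single run.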
22
Quiz Q
  • When nodes preferentially attach to high degree
    nodes, the diffusion over the network is
  • faster
  • slower
  • unaffected

23
Diffusion in grown networks
  • nodes infected after 4 steps, infection rate 1

[Figure panels: preferential attachment; non-preferential growth]
http://www.ladamic.com/netlearn/NetLogo501/BADiffusion.html
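
A hedged sketch of the two growth rules, for experimenting outside NetLogo. It assumes each new node attaches to m distinct existing nodes, chosen either in proportion to degree or uniformly at random, and that the infection starts at node 0, one of the oldest nodes. `grow_network` and `reached` are hypothetical names.

```python
import random

def grow_network(n, m, preferential, rng):
    """Grow a network node by node; each newcomer links to m distinct existing nodes."""
    # Start from a small clique of m + 1 nodes so every node has degree >= m.
    adj = {u: {v for v in range(m + 1) if v != u} for u in range(m + 1)}
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks a node with probability proportional to degree.
    endpoints = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            if preferential:
                targets.add(rng.choice(endpoints))
            else:
                targets.add(rng.randrange(new))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            endpoints += [new, t]
    return adj

def reached(adj, seed, steps):
    """Nodes infected after `steps` rounds when the infection rate is 1."""
    infected = {seed}
    for _ in range(steps):
        infected |= {v for u in infected for v in adj[u]}
    return infected

rng = random.Random(0)
for preferential in (True, False):
    adj = grow_network(500, 2, preferential, rng)
    print(preferential, len(reached(adj, 0, 4)))
```

With infection rate 1, the infected set after k steps is simply every node within distance k of the seed.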
24
Diffusion in small worlds
  • What is the role of the long-range links in
    diffusion over small world topologies?

http://www.ladamic.com/netlearn/NetLogo4/SmallWorldDiffusionSIS.html
25
Quiz Q
  • Relative to the simple contagion process the
    threshold contagion process
  • is better able to use shortcuts
  • advances more rapidly through the network
  • infects a greater number of nodes

26
diffusion of innovation
  • surveys
    • farmers adopting new varieties of hybrid corn by observing what their neighbors were planting (Ryan and Gross, 1943)
    • doctors prescribing new medication (Coleman et al. 1957)
    • spread of obesity & happiness in social networks (Christakis and Fowler, 2008)
  • online behavioral data
    • spread of Flickr photos & Digg stories (Lerman, 2007)
    • joining LiveJournal groups & CS conferences (Backstrom et al. 2006)
    • others, e.g. Anagnostopoulos et al. 2008

27
Open question: how do we tell influence from correlation?
  • approaches
    • time resolved data: if adoption time is shuffled, does it yield the same patterns?
    • if edges are directed: does reversing the edge direction yield less predictive power?

28
Example: adopting new practices
  • poison pills
    • diffused through interlocks
    • geography had little to do with it
    • more likely to be influenced by tie to firm doing something similar & having similar centrality
  • golden parachutes
    • did not diffuse through interlocks
    • geography was a significant factor
    • more likely to follow central firms
  • why did one diffuse through the network while the other did not?

Source: Corporate Elite Networks and Governance Changes in the 1980s.
29
Social Network and Spread of Influence
  • A social network plays a fundamental role as a medium for the spread of INFLUENCE among its members
    • Opinions, ideas, information, innovation
  • Direct marketing uses word-of-mouth effects to significantly increase profits
    • Gmail, Tupperware popularization, Microsoft Origami

30
Problem Setting
  • Given
    • a limited budget B for initial advertising (e.g. give away free samples of product)
    • estimates for influence between individuals
  • Goal
    • trigger a large cascade of influence (e.g. further adoptions of a product)
  • Question
    • Which set of individuals should the budget B be spent on?
  • Applications besides product marketing
    • spread an innovation
    • detect stories in blogs

31
Models of Influence
  • First mathematical models
    • Schelling '70/'78, Granovetter '78
  • Large body of subsequent work
    • Rogers '95, Valente '95, Wasserman/Faust '94
  • Two basic classes of diffusion models: threshold and cascade
  • General operational view
    • A social network is represented as a directed graph, with each person (customer) as a node
    • Nodes start either active or inactive
    • An active node may trigger activation of neighboring nodes
    • Monotonicity assumption: active nodes never deactivate

32
Threshold dynamics
  • The network
    • $a_{ij}$ is the adjacency matrix ($N \times N$)
    • un-weighted
    • undirected
  • The nodes
    • are labelled $i$, with $i$ from 1 to $N$
    • have a state $v_i$
    • and a threshold $r_i$ from some distribution.

33
Threshold dynamics
Updating
The fraction of nodes in state $v_i = 1$ is $r(t)$
34
Linear Threshold Model
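  • A node v has random threshold $\theta_v \sim U[0,1]$
  • A node v is influenced by each neighbor w according to a weight $b_{v,w}$ such that $\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$
  • A node v becomes active when at least a (weighted) $\theta_v$ fraction of its neighbors are active: $\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$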
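
The activation rule can be sketched as a short loop that keeps activating nodes until nothing changes. This is an illustrative sketch, assuming the weights into each node sum to at most 1; the function name, the four-node graph and its weights are hypothetical.

```python
import random

def linear_threshold(weights, thresholds, seeds):
    """Run the Linear Threshold Model to completion and return the active set.

    weights[v][w] is b_vw, the influence of neighbor w on node v.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, incoming in weights.items():
            if v in active:
                continue
            # Total weight of v's currently active neighbors.
            influence = sum(b for w, b in incoming.items() if w in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Hypothetical four-node example; the weights into each node sum to at most 1.
weights = {
    "u": {},
    "v": {"u": 0.5},
    "w": {"u": 0.2, "v": 0.3},
    "x": {"w": 0.1},
}
rng = random.Random(7)
thresholds = {v: rng.random() for v in weights}   # theta_v drawn from U[0, 1]
print(sorted(linear_threshold(weights, thresholds, seeds={"u"})))
```

Because active nodes never deactivate, the final active set does not depend on the order in which nodes are examined.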

35
Example
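[Figure: step-by-step Linear Threshold example on nodes U, X, w and v with edge weights and thresholds. Legend: inactive node, active node, threshold, active neighbors. The process stops when no further node can be activated.]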
36
Independent Cascade Model
  • When node v becomes active, it has a single
    chance of activating each currently inactive
    neighbor w.
  • The activation attempt succeeds with probability $p_{vw}$.
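
A minimal sketch of the process, assuming edge probabilities are stored in a nested dictionary. The function name and the example graph are hypothetical.

```python
import random

def independent_cascade(probs, seeds, rng):
    """Run the Independent Cascade Model and return the final active set.

    probs[v][w] is p_vw, the chance that a newly active v activates w.
    """
    active = set(seeds)
    frontier = list(seeds)          # nodes that became active in the last round
    while frontier:
        next_frontier = []
        for v in frontier:
            # v gets exactly one attempt on each currently inactive neighbor.
            for w, p in probs.get(v, {}).items():
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return active

# Hypothetical graph with made-up activation probabilities.
probs = {
    "v": {"w": 0.5, "u": 0.2},
    "w": {"u": 0.3, "x": 0.1},
    "u": {"x": 0.4},
}
print(sorted(independent_cascade(probs, ["v"], random.Random(3))))
```

Only nodes activated in the previous round attempt activations, which is what gives each node its single chance per neighbor.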

37
Example
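[Figure: step-by-step Independent Cascade example on nodes U, X, w and v with edge probabilities. Legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The process stops when no new node is activated.]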
38
outline
  • factors influencing information diffusion
    • network structure: which nodes are connected?
    • strength of ties: how strong are the connections?
  • studies in information diffusion
    • Granovetter: the strength of weak ties
    • J-P Onnela et al.: strength of intermediate ties
    • Kossinets et al.: strength of backbone ties
    • Davis: board interlocks and adoption of practices
  • network position and access to information
    • Burt: structural holes and good ideas
    • Aral and van Alstyne: networks and information advantage
  • networks and innovation
    • Lazer and Friedman: innovation

39
Burt: structural holes and good ideas
  • Managers asked to come up with an idea to improve the supply chain
  • Then asked
    • whom did you discuss the idea with?
    • whom do you discuss supply-chain issues with in general?
    • do those contacts discuss ideas with one another?
  • 673 managers (455 (68%) completed the survey)
  • 4000 relationships (edges)

40
(No Transcript)
41
(No Transcript)
42
results
  • people whose networks bridge structural holes
    have
  • higher compensation
  • positive performance evaluations
  • more promotions
  • more good ideas
  • these brokers are
  • more likely to express ideas
  • less likely to have their ideas dismissed by
    judges
  • more likely to have their ideas evaluated as
    valuable

43
networks & information advantage
[Figure panels: Betweenness; Constrained vs. Unconstrained]
Source: M. van Alstyne, S. Aral. Networks, Information & Social Capital
slides: Marshall van Alstyne
44
Aral & van Alstyne: study of a headhunter firm
  • Three firms initially
  • Unusually measurable inputs and outputs
    • 1300 projects over 5 yrs and
    • 125,000 email messages over 10 months (avg 20% of time!)
  • Metrics
    • (i) revenues per person and per project,
    • (ii) number of completed projects,
    • (iii) duration of projects,
    • (iv) number of simultaneous projects,
    • (v) compensation per person
  • Main firm: 71 people in executive search (2 firms: partial data)
    • 27 Partners, 29 Consultants, 13 Research, 2 IT staff
  • Four data sets per firm
    • 52-question survey (86% response rate)
    • E-mail
    • Accounting
    • 15 semi-structured interviews

45
Email structure matters
Coefficients (a), unstandardized

New Contract Revenue
                                 B           Std. Error   Adj. R2   Sig. F Δ
(Base Model) (b)                                          0.40
Best structural pred.            12604.0     4454.0       0.52      .006
Ave. E-Mail Size                 -10.7       4.9          0.56      .042
Colleagues Ave. Response Time    -198947.0   168968.0     0.56      .248

Contract Execution Revenue
                                 B           Std. Error   Adj. R2   Sig. F Δ
(Base Model) (b)                                          0.19
Best structural pred.            1544.0      639.0        0.30      .021
Ave. E-Mail Size                 -9.3        4.7          0.34      .095
Colleagues Ave. Response Time    -368924.0   157789.0     0.42      .026

a. Dependent Variable: Bookings02 (New Contract Revenue); Billings02 (Contract Execution Revenue)
b. Base Model: YRS_EXP, PARTDUM, _CEO_SRCH, SECTOR(dummies), _SOLO.
N = 39. Significance levels: p < .01, p < .05, p < .1
Sending shorter e-mail helps get contracts and finish them. Faster response from colleagues helps finish them.
46
diverse networks drive performance by providing access to novel information
  • network structure (having high degree) correlates with receiving novel information sooner (as deduced from hashed versions of their email)
  • getting information sooner correlates with $ brought in
    • controlling for # of years worked
    • job level
    • …

47
Network Structure Matters
Coefficients (a), unstandardized

New Contract Revenue
                        B        Std. Error   Adj. R2   Sig. F Δ
(Base Model) (b)                              0.40
Size Struct. Holes      13770    4647         0.52      .006
Betweenness             1297     773          0.47      .040

Contract Execution Revenue
                        B        Std. Error   Adj. R2   Sig. F Δ
(Base Model) (b)                              0.19
Size Struct. Holes      7890     4656         0.24      .100
Betweenness             1696     697          0.30      .021

a. Dependent Variable: Bookings02 (New Contract Revenue); Billings02 (Contract Execution Revenue)
b. Base Model: YRS_EXP, PARTDUM, _CEO_SRCH, SECTOR(dummies), _SOLO.
N = 39. Significance levels: p < .01, p < .05, p < .1
Bridging diverse communities is significant. Being in the thick of information flows is significant.
48
outline
  • factors influencing information diffusion
    • network structure: which nodes are connected?
    • strength of ties: how strong are the connections?
  • studies in information diffusion
    • Granovetter: the strength of weak ties
    • J-P Onnela et al.: strength of intermediate ties
    • Kossinets et al.: strength of backbone ties
    • Davis: board interlocks and adoption of practices
  • network position and access to information
    • Burt: structural holes and good ideas
    • Aral and van Alstyne: networks and information advantage
  • networks and innovation
    • Lazer and Friedman: innovation

49
networked coordination game
  • choice between two things, A and B
    • e.g. basketball and soccer
  • if both friends choose A, they each get payoff a
  • if both friends choose B, they each get payoff b
  • if one chooses A while the other chooses B, their payoff is 0

50
coordinating with one's friends
Let A = basketball, B = soccer. Which one should you learn to play?
fraction p = 3/5 play basketball
fraction 1 - p = 2/5 play soccer
51
which choice has higher payoff?
  • d neighbors
  • p fraction play basketball (A)
  • (1-p) fraction play soccer (B)
  • if choose A, get payoff $p \, d \, a$
  • if choose B, get payoff $(1-p) \, d \, b$
  • so should choose A if
    • $p \, d \, a \ge (1-p) \, d \, b$
  • or
    • $p \ge \frac{b}{a+b}$
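
The comparison can be checked with exact fractions. In the sketch below the function names are hypothetical, and the payoffs a = 3 and b = 2 are assumed values chosen to go with the 3/5 versus 2/5 split from the basketball and soccer slide.

```python
from fractions import Fraction

def payoffs(d, p, a, b):
    """Total payoff from choosing A or B when a fraction p of d neighbors plays A."""
    return p * d * a, (1 - p) * d * b

def threshold(a, b):
    """Smallest fraction p of neighbors playing A at which A is at least as good as B."""
    return Fraction(b, a + b)

# Assumed example: 5 neighbors, 3/5 of them play A, payoffs a = 3 and b = 2.
d, p, a, b = 5, Fraction(3, 5), 3, 2
payoff_a, payoff_b = payoffs(d, p, a, b)
print(payoff_a, payoff_b, threshold(a, b), p >= threshold(a, b))
```

The number of neighbors d cancels out of the inequality, so only the fraction p and the payoffs a and b matter.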

52
two equilibria
  • everyone adopts A
  • everyone adopts B

53
what happens in between?
  • What if two nodes switch at random? Will a cascade occur?
  • example
    • a = 3, b = 2
    • payoff for nodes' interaction using behavior A is 3/2 as large as what they get if they both choose B
    • nodes will switch from B to A if at least q = 2/(3+2) = 2/5 of their neighbors are using A
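
A small sketch of the resulting cascade, assuming that nodes which adopt A never switch back and that all eligible nodes switch in the same round. The line graph and the name `run_cascade` are hypothetical.

```python
from fractions import Fraction

def run_cascade(adj, initial_adopters, q):
    """Repeatedly switch B-players to A once at least a fraction q of
    their neighbors use A; return the final set of A-adopters."""
    adopters = set(initial_adopters)
    while True:
        switching = {
            v for v in adj
            if v not in adopters
            and adj[v]
            and Fraction(len(adj[v] & adopters), len(adj[v])) >= q
        }
        if not switching:
            return adopters
        adopters |= switching

# Hypothetical line graph 0 - 1 - 2 - 3 - 4 where nodes 0 and 1 start with A.
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(run_cascade(line, {0, 1}, Fraction(2, 5))))
print(sorted(run_cascade(line, {0, 1}, Fraction(3, 5))))
```

On the line, each interior node has one adopting neighbor out of two, so the cascade runs to the end when q = 2/5 and stalls immediately when q = 3/5.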

54
how does a cascade occur
  • suppose 2 nodes start playing basketball due to
    external factors
  • e.g. they are bribed with a free pair of shoes by
    some devious corporation

55
Quiz Q
Which node(s) will switch to playing basketball
next?
56
the complete cascade
57
you pick the initial 2 nodes
  • A larger example (Easley/Kleinberg Ch. 19)
  • does the cascade spread throughout the network?

http://www.ladamic.com/netlearn/NetLogo412/CascadeModel.html
58
implications for viral marketing
  • if you could pay a small number of individuals to
    use your product,
  • which individuals would you pick?

59
Quiz question
  • What is the role of communities in complex contagion?
  • enabling ideas to spread in the presence of
    thresholds
  • creating isolated pockets impervious to outside
    ideas
  • allowing different opinions to take hold in
    different parts of the network

60
bilingual nodes
  • so far nodes could only choose between A and B
  • what if you can play both A and B,
  • but pay an additional cost c?

61
try it on a line
  • Increase the cost of being bilingual so that no
    node chooses to do so. Let the cascade run
  • Now lower the cost.
  • What happens?

62
Quiz Q
  • The presence of bilingual nodes
  • helps the superior solution to spread throughout
    the network
  • helps inferior options to persist in the network
  • causes everyone in the network to become
    bilingual

63
knowledge, thresholds, and collective action
  • nodes need to coordinate across a network, but
    have limited horizons

64
can individuals coordinate?
  • each node will act if at least x people
    (including itself) mobilize

nodes will not mobilize
65
mobilization
  • there will be some turnout

66
Quiz Q
  • will this network mobilize (at least some
    fraction of the nodes will protest)?

67
innovation in networks
  • network topology influences who talks to whom
  • who talks to whom has important implications for
    innovation and learning

68
better to innovate or imitate?
brainstorming: more minds together, but also danger of groupthink
working in isolation: more independence, slower progress
69
in a network context
70
modeling the problem space
  • Kauffman's NK model
  • N dimensional problem space
    • N bits, each can be 0 or 1
  • K describes the smoothness of the fitness landscape
    • how similar is the fitness of sequences with only 1-2 bits flipped (K = 0: no similarity; K large: smooth fitness)

71
Kauffman's NK model
[Figure: fitness vs. distance for K large, K medium and K small]
72
Update rules
  • As a node, you start out with a random bit string
  • At each iteration
  • If one of your neighbors has a solution that is
    more fit than yours, imitate (copy their
    solution)
  • Otherwise innovate by flipping one of your bits
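
The update rules can be sketched as one synchronous round per call. Several things here are assumptions rather than the slides' exact model: solutions are tuples of bits, a flipped bit is kept only if it improves fitness, and a toy fitness (the number of ones) stands in for an NK landscape. The name `update` is hypothetical.

```python
import random

def update(adj, solutions, fitness, rng):
    """One synchronous round of imitate-or-innovate; returns the new solutions."""
    new_solutions = {}
    for node, bits in solutions.items():
        # Fittest solution among the neighbors (own solution if no neighbors).
        best = max((solutions[nb] for nb in adj[node]), key=fitness, default=bits)
        if fitness(best) > fitness(bits):
            new_solutions[node] = best            # imitate the fittest neighbor
        else:
            k = rng.randrange(len(bits))          # innovate: flip one random bit
            trial = bits[:k] + (1 - bits[k],) + bits[k + 1:]
            # Assumption: the flip is kept only if it improves fitness.
            new_solutions[node] = trial if fitness(trial) > fitness(bits) else bits
    return new_solutions

# Hypothetical setup: a line of 5 nodes with random 8-bit starting solutions.
rng = random.Random(1)
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
solutions = {i: tuple(rng.randint(0, 1) for _ in range(8)) for i in adj}
for _ in range(30):
    solutions = update(adj, solutions, sum, rng)
print(max(sum(bits) for bits in solutions.values()))
```

Swapping `adj` for a fully connected network makes it possible to compare how quickly the nodes converge on a shared solution.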

73
Quiz Q
  • Relative to the regular lattice, the network with
    many additional, random connections has on
    average
  • slower convergence to a local optimum
  • smaller improvement in the best solution relative
    to the initial maximum
  • more oscillations between solutions

74
networks and innovation: is more information diffusion always better?
[Figure panels: linear network; fully connected network]
  • Nodes can innovate on their own (slowly) or adopt their neighbor's solution
  • Best solutions propagate through the network

source: Lazer and Friedman, The Parable of the Hare and the Tortoise: Small Worlds, Diversity, and System Performance
75
networks and innovation
  • fully connected network converges more quickly on a solution, but if there are lots of local maxima in the solution space, it may get stuck without finding the optimum.
  • linear network (fewer edges) arrives at a better solution eventually because individuals innovate longer

76
Coordination: graph coloring
  • Application: coloring a map with a limited set of colors; no two adjacent countries should have the same color
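
The constraint is easy to state in code. The sketch below checks a coloring for conflicts and builds one with a simple greedy pass; the greedy pass is only an illustration of the problem, not the procedure used by the human subjects in the experiments. The ring graph and function names are hypothetical.

```python
def conflicts(adj, coloring):
    """Edges whose two endpoints currently share a color."""
    return [(u, v) for u in adj for v in adj[u]
            if u < v and coloring[u] == coloring[v]]

def greedy_coloring(adj, colors):
    """Color nodes in sorted order, giving each the first color unused by its neighbors."""
    coloring = {}
    for node in sorted(adj):
        used = {coloring[nb] for nb in adj[node] if nb in coloring}
        free = [c for c in colors if c not in used]
        if not free:
            raise ValueError(f"no color left for node {node}")
        coloring[node] = free[0]
    return coloring

# Hypothetical ring of four nodes, colorable with two colors.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coloring = greedy_coloring(ring, ["red", "blue"])
print(coloring, conflicts(ring, coloring))
```

A greedy pass can fail on graphs where a valid coloring exists, which is one reason the coordinated version of the problem is interesting.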

77
graph coloring on a network
  • Each node is a human subject. Different experimental conditions:
    • knowledge of neighbors' colors
    • knowledge of entire network
  • Compare
    • regular ring lattice
    • small-world topology
    • scale-free networks

Kearns et al., An Experimental Study of the
Coloring Problem on Human Subject Networks,
Science, 313(5788), pp. 824-827, 2006
78
simulation
  • Kearns et al., An Experimental Study of the Coloring Problem on Human Subject Networks
  • network structure affects convergence in coordination games, e.g. graph coloring
  • http://www.ladamic.com/netlearn/NetLogo4/GraphColoring.html

79
Quiz Q
  • As the rewiring probability is increased from 0
    to 1 the following happens
  • the solution time decreases
  • the solution time increases
  • the solution time initially decreases then
    increases again

80
to sum up
  • network topology influences processes occurring
    on networks
  • what state the nodes converge to
  • how quickly they get there
  • process mechanism matters
  • simple vs. complex contagion
  • coordination
  • learning
  • diffusion can be simple (person to person) or
    complex (individuals having thresholds)
  • people in special network positions (the brokers) have an advantage in receiving novel info & coming up with novel ideas
  • in some scenarios, information diffusion may
    hinder innovation