Title: SI 614 Directed
1. SI 614: Directed & weighted networks, minimum spanning trees, flow
Lecture 12. Instructor: Lada Adamic
2. Outline
- directed networks
- prestige
- weighted networks
- minimum spanning trees
- flow
3. Review of centrality in undirected networks: Comparison
- Comparing across these 3 centrality values
- Generally, the 3 centrality types will be positively correlated
- When they are not (or only weakly) correlated, it probably tells you something interesting about the network.

| | Low Degree | Low Closeness | Low Betweenness |
| High Degree | (n/a) | Embedded in cluster that is far from the rest of the network | Ego's connections are redundant: communication bypasses him/her |
| High Closeness | Key player tied to important/active alters | (n/a) | Probably multiple paths in the network; ego is near many people, but so are many others |
| High Betweenness | Ego's few ties are crucial for network flow | Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others. | (n/a) |
slide Jim Moody
4. Centrality in Social Networks: Power / Eigenvalue
Bonacich Power Centrality: An actor's centrality (prestige) is equal to a function of the prestige of those they are connected to. Thus, actors who are tied to very central actors should have higher prestige/centrality than those who are not.
$C(a, b) = a\,(I - bR)^{-1} R\,\mathbf{1}$
- a is a scaling factor, which is set to normalize the score.
- b reflects the extent to which you weight the centrality of people ego is tied to.
- R is the adjacency matrix (can be valued)
- I is the identity matrix (1s down the diagonal)
- 1 is a matrix (here, a column vector) of all ones.
slide Jim Moody
5. Centrality in Social Networks: Power / Eigenvalue
Bonacich Power Centrality
The magnitude of b reflects the radius of power.
Small values of b weight local structure, larger
values weight global structure. If b is
positive, then ego has higher centrality when
tied to people who are central. If b is
negative, then ego has higher centrality when
tied to people who are not central. As b
approaches zero, you get degree centrality.
slide Jim Moody
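A minimal sketch of how b changes the scores, assuming the standard Bonacich form c = a (I - bR)^{-1} R 1. The function name `bonacich_power`, the fixed-point iteration, and the 3-node path network are illustrative assumptions, not part of the slides. The iteration only converges when |b| is smaller than 1 over the largest eigenvalue of R.

```python
def bonacich_power(R, b, a=1.0, iterations=200):
    """Approximate c = a * (I - b R)^{-1} R 1 by fixed-point iteration.

    Assumes |b| < 1 / (largest eigenvalue of R) so the iteration converges.
    """
    n = len(R)
    degree = [sum(row) for row in R]  # this is R 1
    c = degree[:]
    for _ in range(iterations):
        # c <- R 1 + b R c; the fixed point satisfies (I - b R) c = R 1
        c = [degree[i] + b * sum(R[i][j] * c[j] for j in range(n))
             for i in range(n)]
    return [a * x for x in c]


# Hypothetical 3-node path network: 0 - 1 - 2
R = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

degree_like = bonacich_power(R, b=0.0)   # b = 0 reduces to degree
positive_b = bonacich_power(R, b=0.3)    # rewards ties to central nodes
negative_b = bonacich_power(R, b=-0.3)   # rewards ties to peripheral nodes
```

With b = 0 the scores equal the degrees. With positive b the end nodes gain from being tied to the central node, and with negative b the central node gains from being tied to weakly connected ones.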
6. Centrality in Social Networks: Power / Eigenvalue
Bonacich Power Centrality
[Figure: example network, b = 0.23]
slide Jim Moody
7. Centrality in Social Networks: Power / Eigenvalue
Bonacich Power Centrality
[Figures: example networks, b = -0.35 and b = 0.35]
slide Jim Moody
8. Centrality in Social Networks: Power / Eigenvalue
Bonacich Power Centrality
[Figures: example networks, b = 0.23 and b = -0.23]
slide Jim Moody
9. Examples of directed networks?
- WWW
- food webs
- population dynamics
- influence
- hereditary
- citation
- transcription regulation networks
- neural networks
10. Prestige in directed social networks
- when prestige may be the right word
  - admiration
  - influence
  - gift-giving
  - trust
- directionality especially important in instances where ties may not be reciprocated (e.g. dining partners choice network)
- when prestige may not be the right word
  - gives advice to (can reverse direction)
  - gives orders to (can reverse direction)
  - lends money to (can reverse direction)
  - dislikes
  - distrusts
11. Extensions of undirected degree centrality: prestige
- degree centrality
- indegree centrality
  - a paper that is cited by many others has high prestige
  - a person nominated by many others for an award has high prestige
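Indegree prestige is just a count of incoming ties. The sketch below assumes ties are given as (source, target) pairs; the function name `indegree_prestige` and the tiny citation list are made up for illustration.

```python
from collections import Counter


def indegree_prestige(edges, normalize=False):
    """Count incoming ties per vertex; edges is an iterable of (source, target)."""
    nodes = set()
    indegree = Counter()
    for source, target in edges:
        nodes.update((source, target))
        indegree[target] += 1
    n = len(nodes)
    # Optionally divide by the maximum possible indegree, n - 1.
    scale = (n - 1) if normalize and n > 1 else 1
    return {v: indegree[v] / scale for v in nodes}


# Hypothetical citation network: (citing paper, cited paper)
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p4", "p3")]
prestige = indegree_prestige(citations)
normalized = indegree_prestige(citations, normalize=True)
```

Here p1 is cited by all three other papers, so it has the highest prestige (3, or 1.0 when normalized).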
12. Extensions of undirected closeness centrality
- closeness centrality usually implies
- all paths should lead to you
- and unusually not
- paths should lead from you to everywhere else
- usually consider only vertices from which the
node i in question can be reached
13. Influence range
- The influence range of i is the set of vertices from which node i can be reached (the vertices that have a directed path to i)
14. Extending betweenness centrality to directed networks
- We now consider the fraction of all directed paths between any two vertices that pass through a node
$C_B(i) = \sum_{j \neq k} \frac{g_{jk}(i)}{g_{jk}}$
where $C_B(i)$ is the betweenness of vertex i, $g_{jk}(i)$ is the number of paths between j and k that pass through i, and $g_{jk}$ is the number of all paths between j and k
- Only modification: when normalizing, we have (N-1)(N-2) instead of (N-1)(N-2)/2, because we have twice as many ordered pairs as unordered pairs
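The sketch below computes directed betweenness on a small network by brute force, assuming the paths counted are directed geodesics (shortest paths) and that the network is given as an adjacency dict. The function names and the example networks are illustrative assumptions. It is cubic in the number of vertices, so it is meant for small examples only.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source; sigma[v] is the number of directed geodesics source -> v."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def directed_betweenness(adj, normalized=True):
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    info = {s: shortest_path_counts(adj, s) for s in nodes}
    score = dict.fromkeys(nodes, 0.0)
    for j in nodes:                      # ordered pairs (j, k)
        dist_j, sigma_j = info[j]
        for k in nodes:
            if k == j or k not in dist_j:
                continue
            for i in nodes:
                if i in (j, k) or i not in dist_j:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a j -> k geodesic iff the distances add up
                if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                    score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    n = len(nodes)
    if normalized and n > 2:
        score = {v: s / ((n - 1) * (n - 2)) for v, s in score.items()}
    return score


# Hypothetical directed path a -> b -> c
path_scores = directed_betweenness({"a": ["b"], "b": ["c"], "c": []})
# Hypothetical diamond: two geodesics from s to t, one via x and one via y
diamond = {"s": ["x", "y"], "x": ["t"], "y": ["t"], "t": []}
diamond_scores = directed_betweenness(diamond, normalized=False)
```

In the path, b sits on the only geodesic from a to c, so its normalized score is 1 / ((3-1)(3-2)) = 0.5. In the diamond, x and y each carry half of the s to t geodesics.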
15. Directed geodesics
- A node does not necessarily lie on a geodesic from j to k if it lies on a geodesic from k to j
[Figure: directed network with vertices j and k]
16. Prestige in Pajek
- Calculating the indegree prestige
  - Net>Partition>Degree>Input
  - to view, select File>Partition>Edit
  - if you need to reverse the direction of each tie first (e.g. lends money to -> borrows from): Net>Transform>Transpose
- Influence range (a.k.a. input domain)
  - Net>k-Neighbours>Input
  - enter the number of the vertex, and 0 to consider all vertices that eventually lead to your chosen vertex
  - to find out the size of the input domain, select Info>Partition
- Calculate the size of the input domains for all vertices
  - Net>Partitions>Domain>Input
  - Can also limit to only neighbors within some distance
17. Proximity prestige in Pajek
- Direct nominations (choices) should count more than indirect ones
- Nominations from second degree neighbors should count more than third degree ones
- So consider proximity prestige
$C_p(n_i) = \frac{\text{fraction of all vertices that are in } i\text{'s input domain}}{\text{average distance from } i \text{ to vertex in input domain}}$
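Outside Pajek, the same quantity can be sketched in a few lines. The code assumes the network is an adjacency dict of directed ties and measures distance along the paths that lead to i; the function name `proximity_prestige` and the example network are illustrative assumptions.

```python
from collections import deque


def proximity_prestige(adj, i):
    """(fraction of vertices in i's input domain) / (their average distance to i)."""
    # Reverse every tie so that a BFS from i follows paths that lead *to* i.
    nodes = set(adj) | {i}
    reverse = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            nodes.add(v)
            reverse.setdefault(v, []).append(u)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in reverse.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    domain = [v for v in dist if v != i]   # the input domain of i
    if not domain:
        return 0.0
    fraction = len(domain) / (len(nodes) - 1)
    average_distance = sum(dist[v] for v in domain) / len(domain)
    return fraction / average_distance


# Hypothetical nomination network: a -> b -> c and d -> c
nominations = {"a": ["b"], "b": ["c"], "d": ["c"], "c": []}
pp_c = proximity_prestige(nominations, "c")
pp_b = proximity_prestige(nominations, "b")
pp_a = proximity_prestige(nominations, "a")
```

Vertex c is reached by all three others at distances 1, 1 and 2, so its score is 1 / (4/3) = 0.75. Vertex a is nominated by nobody and scores 0.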
18. Weighted networks
- Examples
- email communication
- sports matches
- packet transfer
- population movement
- co-authorship
- food webs
- Weighted treatment of data/algorithms usually
left for future work
19. But what are weights good for?
- Defining thresholds
- Shortest paths that don't take long
- Flow/capacity of a network
20. Food webs
- Food webs
  - usually considered as binary networks
  - problems in defining threshold fluxes
  - do killer whales who eat bears count?
- weights
  - interaction frequency
    - acts of predation per hectare per day
  - carbon flow (prey to predator)
    - grams of carbon per meter squared per year
  - interaction strength (predator on prey)
    - (carbon flow of prey to predator) / (biomass of predator)
[Figure: Lake carbon flow]
21. Co-authorship networks
- The weight assigned to each edge is the sum, over the papers in which two people were co-authors, of one divided by the total number of people on that paper
$w_{ij} = \sum_{k} \frac{1}{n_k}$, where k runs over all papers where i and j were coauthors and $n_k$ is the number of authors of paper k
- a large-scale high energy physics collaboration producing a paper with 100 authors is less evidence of direct collaboration than an article in Social Networks with only two co-authors.
- Should we normalize?
  - all weights from i to other nodes should sum to 1? (probably not)
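A small sketch of this weighting, assuming each shared paper k contributes 1/n_k to the pair's weight, as the bullet above describes. The function name `coauthorship_weights` and the author lists are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_weights(papers):
    """papers: list of author lists. Returns {(i, j): weight} with i < j."""
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))
        n_k = len(authors)                  # number of authors of paper k
        for i, j in combinations(authors, 2):
            weights[(i, j)] += 1.0 / n_k    # assumed per-paper contribution
    return dict(weights)


# Hypothetical data: one two-author paper and one four-author paper
papers = [["ann", "bob"], ["ann", "bob", "cat", "dan"]]
w = coauthorship_weights(papers)
```

The pair (ann, bob) shares both papers and gets 1/2 + 1/4 = 0.75, while (cat, dan) shares only the four-author paper and gets 0.25.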
22. Symmetry in normalization
- If normalizing by the sum of values for each node, the weights are no longer symmetric (assume simple weighting: number of papers co-authored)
[Figure: nodes i and j joined by a tie of weight 3; i's ties sum to 3, j's ties sum to 15]
$w_{ij} = 3/3 = 1$, $w_{ji} = 3/15 = 1/5$
- Cosine similarity: symmetric values
  - assume the weight for each paper is $w_k = 1/(n_k - 1)$
  - i and j each have vectors of 0s and $w_k$s depending on whether they authored paper k
  - normalize by the length of both vectors
23. Other similarity measures
Simple matching, Dice's Coefficient, Jaccard's Coefficient, Cosine Coefficient, Overlap Coefficient
[Figure: bipartite network of authors a1, a2, a3 and papers p1-p11]
Q = set of papers authored by a1, D = set of papers authored by a2
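The slide's formulas did not survive extraction, so the sketch below uses the usual set-based definitions of these five measures. That choice is an assumption, as are the function name `similarity_measures` and the example sets. Both sets are assumed non-empty.

```python
import math


def similarity_measures(Q, D):
    """Set-based similarity between two non-empty sets of papers."""
    Q, D = set(Q), set(D)
    common = len(Q & D)
    return {
        "simple_matching": common,
        "dice": 2 * common / (len(Q) + len(D)),
        "jaccard": common / len(Q | D),
        "cosine": common / math.sqrt(len(Q) * len(D)),
        "overlap": common / min(len(Q), len(D)),
    }


# Hypothetical paper sets for two authors
Q = {"p1", "p2", "p3", "p4"}
D = {"p3", "p4", "p5"}
scores = similarity_measures(Q, D)
```

All five measures are symmetric in Q and D; they differ only in how the number of shared papers is normalized.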
24. Weighted shortest paths
- Routes
  - shortest route from Chicago to Boston
  - vertex = intersection
  - edge weights = road distances
  - alternative weights: expected time traveled, gas consumed
  - usually sum the weights from each segment
[Figure: alternative routes from start to finish]
- surface road: 25 mph, 50 miles = 2 hours
- freeway, 70 mph: 30 miles / 70 mph = 26 minutes
- freeway, 65 mph: 40 miles / 65 mph = 37 minutes
25. Reliable paths through social networks
- The probability of transmitting a message or infectious agent could be related to the strength of the tie
- e.g. rather than summing the weights, we might multiply the probabilities of getting through
[Figure: network with tie probabilities p = 1, p = 0.001, p = 0.05, p = 0.5, p = 0.5]
Probability of getting an idea through to the head of labs via CEO (0.001 × 1 = 0.001), via direct manager (0.5 × 0.5 = 0.25)
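One way to find the most reliable path is to turn each probability p into a cost -log(p), so that multiplying probabilities becomes summing costs and an ordinary shortest-path search applies. The sketch below does this with a heap-based search. The node names and the network are hypothetical and only reuse the two paths quoted above; the function name `most_reliable_path` is also an assumption.

```python
import heapq
import math


def most_reliable_path(graph, source, target):
    """graph[u][v] = probability (0 < p <= 1) that a message passes from u to v."""
    best = {source: 0.0}          # smallest sum of -log(p) found so far
    previous = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                previous[v] = u
                heapq.heappush(heap, (new_cost, v))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return math.exp(-best[target]), path[::-1]


# Hypothetical network built from the two paths in the example
graph = {
    "employee": {"ceo": 0.001, "manager": 0.5},
    "ceo": {"head_of_labs": 1.0},
    "manager": {"head_of_labs": 0.5},
}
probability, path = most_reliable_path(graph, "employee", "head_of_labs")
```

The search picks the route through the direct manager, with probability 0.5 × 0.5 = 0.25, over the route through the CEO.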
26. Shortest Path Problem
- Given a weighted graph and two vertices u and v, we want to find a path of minimum total weight between u and v.
  - Length of a path is the sum of the weights of its edges.
- Example
  - Shortest path between Providence and Honolulu
- Applications
  - Internet packet routing
  - Flight reservations
  - Driving directions
[Figure: weighted flight network among PVD, ORD, SFO, LGA, HNL, LAX, DFW and MIA]
slide by Huajie Zhang, http://www.cs.unb.ca/courses/cs3913/
27. Negative weights
- Shortest paths are usually undefined in graphs with negative edge weights if negative cycles are present
[Figure: example graph with edge weights 2, 4, 3 and -3]
28. Shortest Path Properties
- Property 1
  - A subpath of a shortest path is itself a shortest path
- Property 2
  - There is a tree of shortest paths from a start vertex to all the other vertices
- Example
  - Tree of shortest paths from Providence
[Figure: weighted flight network among PVD, ORD, SFO, LGA, HNL, LAX, DFW and MIA, with the tree of shortest paths from PVD]
slide by Huajie Zhang, http://www.cs.unb.ca/courses/cs3913/
29. Dijkstra's Algorithm
- The distance of a vertex v from a vertex s is the length of a shortest path between s and v
- Dijkstra's algorithm computes the distances of all the vertices from a given start vertex s
- Assumptions
  - the graph is connected
  - the edges are undirected
  - the edge weights are nonnegative
- We grow a cloud of vertices, beginning with s and eventually covering all the vertices
- We store with each vertex v a label d(v) representing the distance of v from s in the subgraph consisting of the cloud and its adjacent vertices
- At each step
  - We add to the cloud the vertex u outside the cloud with the smallest distance label, d(u)
  - We update the labels of the vertices adjacent to u
slide by Huajie Zhang, http://www.cs.unb.ca/courses/cs3913/
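The cloud-growing steps can be sketched with a priority queue. This is a minimal version under the assumptions listed on the slide (connected, undirected, nonnegative weights); the helper `undirected` and the example graph are hypothetical and chosen only for illustration.

```python
import heapq


def undirected(edge_list):
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edge_list:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph, s):
    """Return d(v), the shortest-path distance from s, for every vertex v."""
    d = {s: 0}
    cloud = set()
    heap = [(0, s)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if u in cloud:
            continue                      # stale heap entry, u already settled
        cloud.add(u)                      # add closest outside vertex to the cloud
        for z, weight in graph[u].items():
            # relax edge (u, z): d(z) <- min(d(z), d(u) + weight)
            if z not in cloud and d_u + weight < d.get(z, float("inf")):
                d[z] = d_u + weight
                heapq.heappush(heap, (d[z], z))
    return d


# Hypothetical weighted graph on vertices A-F
graph = undirected([
    ("A", "B", 8), ("A", "C", 2), ("A", "D", 4),
    ("B", "C", 7), ("C", "D", 1), ("B", "E", 2),
    ("C", "E", 3), ("C", "F", 9), ("D", "F", 5),
])
distances = dijkstra(graph, "A")
```

Several labels improve after their first assignment: for instance d(B) starts at 8 via the direct edge and drops to 7 once E joins the cloud.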
30. Edge Relaxation
- Consider an edge e = (u,z) such that
  - u is the vertex most recently added to the cloud
  - z is not in the cloud
- The relaxation of edge e updates distance d(z) as follows
  - d(z) ← min{d(z), d(u) + weight(e)}
[Figure: before relaxation, d(u) = 50, d(z) = 75, weight(e) = 10; after relaxation, d(u) = 50, d(z) = 60]
slide by Huajie Zhang, http://www.cs.unb.ca/courses/cs3913/
31. Example
[Figures: Dijkstra's algorithm on a weighted graph with vertices A-F, starting from A with d(A) = 0; distance labels are updated as the cloud grows]
slide by Huajie Zhang, http://www.cs.unb.ca/courses/cs3913/
32. Example (cont.)
[Figures: remaining steps of the same example]
slide by Huajie Zhang, http://www.cs.unb.ca/courses/cs3913/
33. Minimum spanning trees
- Connect all vertices with a single tree
- Consider a communications company, such as AT&T or GTE, that needs to build a communication network that connects n different users. The cost of making a link joining i and j is c_ij. What is the minimum cost of connecting all of the users?
Common assumption: the only links possible are the ones directly joining two nodes.
web.mit.edu/jorlin/www/15.082/Lectures/16_Spanning_Trees.ppt
34. Electronic Circuitry
- Consider a system with a number of electronic components. In order to make two pins i and j of different components electrically equivalent, one can connect i and j by a wire. How can we connect n different pins in this way to make them electrically equivalent to each other so as to minimize the total wire length?
web.mit.edu/jorlin/www/15.082/Lectures/16_Spanning_Trees.ppt
35. Minimum Cost Spanning Tree Problem
- Undirected network G = (N, A).
  - (i, j) is the same arc as (j, i).
- We associate with each arc (i, j) ∈ A a cost c_ij.
- A spanning tree T of G is a connected acyclic subgraph that spans all the nodes. A connected graph with n nodes and n - 1 arcs is a spanning tree.
- The minimum cost spanning tree problem is to find a spanning tree of minimum cost.
web.mit.edu/jorlin/www/15.082/Lectures/16_Spanning_Trees.ppt
36. A Minimum Cost Spanning Tree Problem
[Figure: example network with arc costs]
web.mit.edu/jorlin/www/15.082/Lectures/16_Spanning_Trees.ppt
37. A Minimum Cost Spanning Tree
[Figure: the same network with a minimum cost spanning tree]
web.mit.edu/jorlin/www/15.082/Lectures/16_Spanning_Trees.ppt
38. Prim-Jarnik Algorithm
- Vertex based algorithm
- Grows one tree T, one vertex at a time
- A cloud covering the portion of T already computed
- Label the vertices v outside the cloud with key[v]: the minimum weight of an edge connecting v to a vertex in the cloud; key[v] = ∞ if no such edge exists
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
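A compact sketch of the cloud-growing idea. Instead of storing an explicit key[v] label per vertex, it keeps candidate edges in a heap and skips those whose endpoint is already in the cloud (a "lazy" variant, which is a substitution for the labeled version described above). The function name `prim_mst` and the example graph are hypothetical.

```python
import heapq


def prim_mst(graph, start):
    """graph: symmetric adjacency dict {u: {v: weight}}. Returns tree edges (u, v, w)."""
    cloud = {start}
    tree = []
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    while heap and len(cloud) < len(graph):
        w, u, v = heapq.heappop(heap)     # cheapest edge leaving the cloud
        if v in cloud:
            continue                      # both ends already in the cloud
        cloud.add(v)
        tree.append((u, v, w))
        for x, weight in graph[v].items():
            if x not in cloud:
                heapq.heappush(heap, (weight, v, x))
    return tree


# Hypothetical weighted undirected graph
graph = {
    "a": {"b": 1, "c": 3},
    "b": {"a": 1, "c": 2, "d": 5},
    "c": {"a": 3, "b": 2, "d": 4},
    "d": {"b": 5, "c": 4},
}
tree = prim_mst(graph, "a")
total_cost = sum(w for _, _, w in tree)
```

Starting from a, the tree picks up b (cost 1), then c (cost 2), then d (cost 4), for a total cost of 7.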
39. Prim Example
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
40. Prim Example (2)
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
41. Prim Example (3)
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
42. Kruskal's Algorithm
- The algorithm adds the cheapest edge that connects two trees of the forest
MST-Kruskal(G, w)
    A ← ∅
    for each vertex v ∈ V[G] do
        Make-Set(v)
    sort the edges of E by non-decreasing weight w
    for each edge (u, v) ∈ E, in order by non-decreasing weight do
        if Find-Set(u) ≠ Find-Set(v) then
            A ← A ∪ {(u, v)}
            Union(u, v)
    return A
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
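The pseudocode maps almost line for line onto a short runnable sketch. The union-find here is a minimal dictionary-based version with path compression; the edge format (weight, u, v) and the example graph are assumptions made for illustration.

```python
def kruskal_mst(vertices, edges):
    """edges: iterable of (weight, u, v). Returns the tree edges as (u, v, weight)."""
    parent = {v: v for v in vertices}        # Make-Set(v) for every vertex

    def find_set(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path compression (halving)
            v = parent[v]
        return v

    A = []
    for weight, u, v in sorted(edges):       # non-decreasing weight
        root_u, root_v = find_set(u), find_set(v)
        if root_u != root_v:                 # edge joins two different trees
            A.append((u, v, weight))
            parent[root_u] = root_v          # Union(u, v)
    return A


# Hypothetical weighted undirected graph
vertices = ["a", "b", "c", "d"]
edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d"), (5, "b", "d")]
mst = kruskal_mst(vertices, edges)
```

The edge (a, c) with weight 3 is rejected because a and c are already in the same tree by the time it is considered.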
43. Kruskal Example
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
44. Kruskal Example (2)
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
45. Kruskal Example (3)
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
46. Kruskal Example (4)
www.cs.earlham.edu/celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
47. Network flow
- Applications
  - traffic & transportation
    - maximum number of cars that can commute from Berkeley to San Francisco during rush hour
  - fluid networks: pipes that carry liquids
  - computer networks: packets traveling along fiber
- extended applications (from Kleinberg & Tardos, Algorithm Design)
  - bipartite matching problem
  - number of disjoint paths between two vertices
  - survey design
  - airline scheduling
  - image segmentation
  - baseball elimination
48. Max flow problem: how much stuff can we get from source to sink per unit time?
[Figure: flow network from Source to Sink; edges are labeled with their capacity (e.g. 7)]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
49. Equivalent tasks
- Find a cut with minimum capacity
- Find maximum flow from source to sink
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
50. A Flow
[Figure: a flow on the example network and the corresponding residual graph]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
51. Augmenting Paths
- A path from source to sink in the residual graph of a given flow
- If there is an augmenting path in the residual graph, we can push more flow
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
52. Ford-Fulkerson Method
- initialize total flow to 0
- residual graph G' = G
- while an augmenting path exists in G'
  - pick an augmenting path P in G'
  - m = bottleneck capacity of P
  - add m to total flow
  - push flow of m along P
  - update G'
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
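The method leaves open how the augmenting path is picked. The sketch below picks it by breadth-first search (the choice usually called Edmonds-Karp), which is one option among several. The function name `max_flow` and the example network are hypothetical and are not the network from the example slides.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """capacity: {u: {v: c}} for directed edges. Returns the value of a maximum flow."""
    # Residual graph G' starts as a copy of G plus reverse edges of capacity 0.
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    total = 0
    while True:
        # Look for an augmenting path P from source to sink in G' (BFS).
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total                   # no augmenting path is left
        # m = bottleneck capacity of P
        m = float("inf")
        v = sink
        while parent[v] is not None:
            m = min(m, residual[parent[v]][v])
            v = parent[v]
        # Push m units along P and update G'.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= m
            residual[v][u] += m
            v = u
        total += m


# Hypothetical network: s -> a (3), s -> b (2), a -> b (1), a -> t (2), b -> t (3)
capacity = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
flow_value = max_flow(capacity, "s", "t")
```

In this network the edges leaving s have total capacity 5 and all of it can be routed to t, so the maximum flow equals the capacity of the cut around the source.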
53. Example
[Figure: example flow network with edge capacities]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
54. Example
[Figure: example network, continued]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
55. Example
[Figure: example network, continued]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
56. Example
[Figure: example network, continued]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
57. Example
[Figure: example network, continued]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
58. Example
[Figure: example network, continued]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
59. Example
[Figure: example network, continued]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
60. Answer: Max Flow = 4
[Figure: final flow on the example network]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
61. Answer: Minimum Cut = 4
[Figure: minimum cut on the example network]
www.comp.nus.edu.sg/ooiwt/slides/2004-cs3233-graph2.ppt
62. project status report
- worth 5% of your grade, meant to keep you on track
  - 2-3 weeks later: in-class presentation
  - 1 month later: final project report due
- what it should do
  - include part of your project proposal as intro
  - include result summaries (including figures & tables).
  - be 4-6 pages
  - include references to and briefly (paragraph or 2) discuss some related work.
  - include a plan of remaining work.
- It is graded on a 0-5 scale
  - 5: same as 4, but very complete and already shows interesting new insights
  - 4: data, more than basic analysis (e.g. looked at robustness, community structure, centrality, etc. if applicable)
  - 3: some data, preliminary analysis (imported data into Pajek or GUESS, counted things up, visualized, if possible)
  - 2: some data, no results
  - 1: attempts made to get project started, but nothing worked out (no data, no results)
  - 0: no work done
63. GUESS installation
- Windows
  - unzip the files into a folder
  - edit the guess.bat (a batch executable file) so that
    @rem set GUESS_HOME=c:\program files\GUESS
    becomes
    @set GUESS_HOME=C:\PROGRA~1\GUESS
    if you installed into c:\Program Files\GUESS
  - else you can try installing into a directory with no spaces in the name and have (e.g.) @set GUESS_HOME=C:\apps\GUESS