Using Structure Indices for Efficient Approximation of Network Properties

Slides: 39

Provided by: csK4

Learn more at: https://www.cs.kent.edu

1
Using Structure Indices for Efficient
Approximation of Network Properties

Matthew J. Rattigan, Marc Maier, and David
Jensen
University of Massachusetts Amherst
Data Mining
November 27, 2006
Deborah Stoffer

2
The Problem

Recent research works with very large networks
Millions of nodes
Calculating network statistics on very large
networks can be difficult
Shortest paths
Betweenness centrality
The proportion of all shortest paths in the
network that run through a given node
Closeness centrality
The average distance from the given node to every
other node in the network

3
The Problem

The most efficient known algorithms for
calculating betweenness centrality and closeness
centrality are O(ne + n^2 log n)
n = number of nodes
e = number of edges
Calculations for path finding can have even
higher complexity
Require bidirectional breadth-first search

4
The Problem

Example - Rexa citation graph
Papers in computer science and related fields
Largest connected component contains 165,000
nodes (papers) and 321,000 edges (citations)
Finding a path of length 15 requires the
exploration of 65,000 nodes

5
The Problem
6
Network Structure Index (NSI)

Similar to the type of index commonly used to
speed queries in modern database systems
Can be constructed once for a given graph and
then used to speed the calculations of many
measures on the graph
Two components of an NSI
Set of annotations on every node in the network
that provide information about relative or
absolute location
For G(V,E) the annotations define A: V → S, where
S is an arbitrarily complex annotation space
A distance function that uses the annotations to
define graph distance between pairs of nodes by
mapping pairs of node annotations to a positive
real number
D: S × S → R
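
The two components can be pictured as a small data structure: a table of per-node annotations plus a function over pairs of annotations. The following is a minimal sketch, not the authors' implementation; the class name `NSI`, its fields, and the toy path graph are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Hashable, TypeVar

Node = Hashable
S = TypeVar("S")  # the annotation space


@dataclass
class NSI(Generic[S]):
    """Annotations A: V -> S together with a distance function D: S x S -> R."""
    annotations: Dict[Node, S]
    distance_fn: Callable[[S, S], float]

    def distance(self, u: Node, v: Node) -> float:
        # The estimate is computed from the two annotations alone,
        # without touching the graph again.
        return self.distance_fn(self.annotations[u], self.annotations[v])


# Toy example: nodes 0..3 on a path, each annotated with its own position.
index = NSI(annotations={v: v for v in range(4)},
            distance_fn=lambda a, b: abs(a - b))
estimate = index.distance(0, 3)
```

The index is built once; every later distance query only reads annotations.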

7
Types of Network Structure Indices

All Pairs Shortest Path (APSP)
Degree
Landmark
Global Network Positioning (GNP)
Zone
Distance to Zone (DTZ)

8
All Pairs Shortest Path NSI

Node annotations
Consist of an n x n matrix (n = |V|) containing
the optimal path distances between all pairs of
nodes
Distance function
A simple lookup in the matrix
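
A minimal sketch of the APSP index for an unweighted graph, assuming an adjacency-dict representation. The helper names and the five-node example graph are hypothetical. One breadth-first search per node fills the matrix, which needs n^2 entries.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_apsp(graph):
    """Annotation step: one BFS per node yields the full distance matrix."""
    return {u: bfs_distances(graph, u) for u in graph}


graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
         "d": ["b", "c", "e"], "e": ["d"]}
apsp = build_apsp(graph)


def apsp_distance(s, t):
    """Distance function: a plain lookup in the precomputed matrix."""
    return apsp[s][t]
```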

9
Degree NSI

Node annotations
Annotate each node with its undirected degree
within the graph
Distance function between source node s and
target node t
D_Degree(s, t) = 2n - degree(s) - degree(t)
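
A small sketch of the Degree NSI. The operators in the slide's formula were lost in extraction, so the code assumes the reading 2n - degree(s) - degree(t), which stays positive and gets smaller as the degrees grow. The function names and the star graph are hypothetical.

```python
def annotate_degree(graph):
    """Annotation step: each node gets its undirected degree."""
    return {v: len(neighbors) for v, neighbors in graph.items()}


def degree_distance(degree, n, s, t):
    # ASSUMED reading of the slide: 2n - degree(s) - degree(t).
    # High-degree nodes look "closer", so a search prefers hubs.
    return 2 * n - degree[s] - degree[t]


# Hypothetical star graph: one hub connected to three leaves.
graph = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
degree = annotate_degree(graph)
n = len(graph)
hub_to_leaf = degree_distance(degree, n, "hub", "x")
leaf_to_leaf = degree_distance(degree, n, "x", "y")
```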

10
Landmark NSI

Randomly designate a small number of nodes in the
network to serve as navigational beacons
Node annotations
Annotate nodes in the graph by flooding out from
each landmark and recording the graph distance to
each node in the network
Gives a vector of graph distances for each node
Distance function
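
The annotation step above can be sketched as one breadth-first flood per landmark. The distance formula did not survive in this transcript, so `landmark_distance` below is an assumption: it routes through the best landmark, which gives an upper bound on the true distance. All names and the ring graph are hypothetical, and the graph is assumed to be connected.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Flood out from source, recording the hop distance to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def annotate_landmarks(graph, num_landmarks, seed=0):
    """Pick random landmarks; each node gets its vector of landmark distances."""
    rng = random.Random(seed)
    landmarks = rng.sample(sorted(graph), num_landmarks)
    floods = [bfs_distances(graph, lm) for lm in landmarks]
    # Assumes a connected graph, so every flood reaches every node.
    annotations = {v: [flood[v] for flood in floods] for v in graph}
    return landmarks, annotations


def landmark_distance(vec_s, vec_t):
    # ASSUMPTION (not taken from the slides): go from s to the best
    # landmark and on to t; by the triangle inequality this never
    # underestimates the true graph distance.
    return min(a + b for a, b in zip(vec_s, vec_t))


# Hypothetical 8-node ring.
graph = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
landmarks, annotations = annotate_landmarks(graph, num_landmarks=2)
```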

11
Landmark NSI
12
Global Network Positioning NSI

Node annotation
Annotation uses a nonlinear optimization
algorithm to create a multidimensional coordinate
system that encodes the location of each node
within the network
Distance function is the Manhattan distance
between node pairs

13
Zone NSI

Node annotations
Each node is annotated with a d-dimensional
vector of zone labels
Distance function

14
Zone NSI Algorithm

For d dimensions
Randomly select k seed nodes, assign them zone
labels 1 through k, and place them in the labeled
set
Place all other nodes in the unlabeled set
While the unlabeled set is not empty
Randomly select a node l from the labeled set
Randomly select a node u from the unlabeled set
that is a neighbor to l
Assign u to the same zone as l and move it to the
labeled set
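
The labeling loop above can be sketched as follows, assuming a connected graph stored as an adjacency dict. The function names and the grid graph are hypothetical. A frontier list stands in for the labeled set: labeled nodes with no unlabeled neighbor left are discarded when drawn, which has the same effect as redrawing from the full labeled set.

```python
import random


def zone_labels(graph, k, rng):
    """One dimension: grow k zones outward from random seed nodes."""
    seeds = rng.sample(sorted(graph), k)
    zone = {seed: i + 1 for i, seed in enumerate(seeds)}  # labels 1..k
    frontier = list(seeds)  # labeled nodes that may still have unlabeled neighbors
    while frontier and len(zone) < len(graph):
        i = rng.randrange(len(frontier))
        labeled = frontier[i]
        candidates = [u for u in graph[labeled] if u not in zone]
        if not candidates:
            # Every neighbor is already labeled: drop this node and redraw.
            frontier[i] = frontier[-1]
            frontier.pop()
            continue
        u = rng.choice(candidates)
        zone[u] = zone[labeled]  # u joins the zone of its labeled neighbor
        frontier.append(u)
    return zone


def annotate_zones(graph, d, k, seed=0):
    """Repeat the labeling d times; each node gets a d-dimensional vector."""
    rng = random.Random(seed)
    dims = [zone_labels(graph, k, rng) for _ in range(d)]
    return {v: tuple(dim[v] for dim in dims) for v in graph}


# Hypothetical 4 x 4 grid graph.
steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
graph = {(r, c): [(r + dr, c + dc) for dr, dc in steps
                  if 0 <= r + dr < 4 and 0 <= c + dc < 4]
         for r in range(4) for c in range(4)}
annotations = annotate_zones(graph, d=3, k=4)
```

Because a node can only join the zone of a neighbor, every zone is a connected region of the graph.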

15
Zone NSI
16
Distance to Zone (DTZ) NSI

Hybrid between Landmark and Zone NSIs
Node annotations
Divide the graph into zones and for each node u
and zone Z calculate the distance from u to the
closest node in Z
Distance function
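
The DTZ annotation step can be sketched with one multi-source breadth-first search per zone. This is a minimal sketch: the function names are hypothetical, the zone assignment is given by hand rather than produced by the Zone algorithm, and the distance function is not shown because its formula did not survive in this transcript.

```python
from collections import deque


def distance_to_zone(graph, members):
    """Multi-source BFS: hop distance from every node to its nearest zone member."""
    dist = {m: 0 for m in members}
    queue = deque(members)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def annotate_dtz(graph, zone_of):
    """Each node gets one entry per zone: the distance to the closest node in it."""
    zones = sorted(set(zone_of.values()))
    per_zone = {z: distance_to_zone(graph, [v for v in graph if zone_of[v] == z])
                for z in zones}
    return {v: [per_zone[z][v] for z in zones] for v in graph}


# Hypothetical path 0-1-2-3-4-5 split into two hand-assigned zones.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
zone_of = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
annotations = annotate_dtz(graph, zone_of)
```

A node's entry for its own zone is always 0, so the vector also records which zone the node belongs to.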

17
Distance to Zone (DTZ) NSI
18
Complexity of Different NSIs
19
Search Performance

Optimality of the lengths of paths found
Path ratio P
pf is the length of the found paths
po is the length of the optimal paths
r is the number of randomly selected pairs of
nodes in the graph
P = 1.0 indicates an NSI that finds optimal
paths
P > 1.0 indicates a poorly performing NSI
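
The path ratio formula itself is missing from this transcript. The sketch below assumes it is the total found path length divided by the total optimal path length over the r sampled pairs; averaging the per-pair ratios would be another plausible reading. The function name and the sample lengths are hypothetical.

```python
def path_ratio(found_lengths, optimal_lengths):
    """ASSUMED form: sum of found lengths over sum of optimal lengths."""
    if len(found_lengths) != len(optimal_lengths):
        raise ValueError("need one found and one optimal length per sampled pair")
    return sum(found_lengths) / sum(optimal_lengths)


# Hypothetical lengths for r = 3 sampled pairs.
optimal = [3, 4, 5]
perfect = path_ratio([3, 4, 5], optimal)  # every found path is optimal
detour = path_ratio([3, 4, 6], optimal)   # one found path is one hop too long
```

A found path can never be shorter than the optimal one, so under this reading the ratio is never below 1.0.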

20
Search Performance

Performance gain
Exploration ratio E
ef is the number of nodes explored by best-first
search
eb is the number of nodes that are explored using
a bidirectional breadth-first search
r is the number of pairs of nodes in the graph
E values close to zero indicate good search
performance
E values greater than 1.0 indicate poor search
performance
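
A sketch of how the two exploration counts could be gathered. The exploration ratio formula is missing from this transcript, so the code assumes total nodes explored by best-first search divided by total nodes explored by bidirectional breadth-first search. Manhattan distance on grid coordinates stands in for an NSI distance function; it, the grid, and all function names are hypothetical.

```python
import heapq
from collections import deque


def best_first_explored(graph, s, t, estimate):
    """Greedy best-first search: count nodes expanded before t is reached."""
    seen = {s}
    heap = [(estimate(s, t), s)]
    explored = 0
    while heap:
        _, u = heapq.heappop(heap)
        if u == t:
            return explored
        explored += 1
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                heapq.heappush(heap, (estimate(v, t), v))
    return explored


def bidirectional_bfs_explored(graph, s, t):
    """Alternate BFS levels from both ends: count expansions until they meet."""
    if s == t:
        return 0
    seen = [{s}, {t}]
    queues = [deque([s]), deque([t])]
    explored, side = 0, 0
    while queues[0] and queues[1]:
        for _ in range(len(queues[side])):  # expand one full level
            u = queues[side].popleft()
            explored += 1
            for v in graph[u]:
                if v in seen[1 - side]:
                    return explored
                if v not in seen[side]:
                    seen[side].add(v)
                    queues[side].append(v)
        side = 1 - side
    return explored


def exploration_ratio(ef_counts, eb_counts):
    # ASSUMED form: ratio of the totals over all sampled pairs.
    return sum(ef_counts) / sum(eb_counts)


# Hypothetical 5 x 5 grid; Manhattan distance stands in for an NSI estimate.
steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
graph = {(r, c): [(r + dr, c + dc) for dr, dc in steps
                  if 0 <= r + dr < 5 and 0 <= c + dc < 5]
         for r in range(5) for c in range(5)}
manhattan = lambda u, t: abs(u[0] - t[0]) + abs(u[1] - t[1])
pairs = [((0, 0), (4, 4)), ((0, 4), (4, 0))]
ef = [best_first_explored(graph, s, t, manhattan) for s, t in pairs]
eb = [bidirectional_bfs_explored(graph, s, t) for s, t in pairs]
E = exploration_ratio(ef, eb)
```

With an informative estimate the best-first search walks almost straight to the target, so E falls below 1.0 on this grid.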

21
Search Performance

NSIs evaluated on synthetic graphs
Random
Rewired lattices
Forest Fire

22
Search Performance
23
Search Performance
24
Search Performance
25
Search Performance
26
Constant Time Distance Estimation

Can sometimes use an NSI to directly estimate the
graph distance between any two nodes
Can use the DTZ annotation distance to estimate
actual graph distances
Annotate the graph as described for the DTZ NSI
Randomly sample p pairs of nodes in the graph and
perform breadth-first search to obtain their
exact graph distance
Use linear regression to obtain an equation for
estimated distance
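
The calibration steps above can be sketched with an ordinary least-squares fit of exact distance against annotation distance. This is a sketch under assumptions: the DTZ distance function is not available in this transcript, so `annotation_distance` is passed in as a parameter and a made-up stand-in is used for the demonstration. All names are hypothetical.

```python
import random
from collections import deque


def bfs_distance(graph, s, t):
    """Exact hop distance between s and t via breadth-first search."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("t is not reachable from s")


def fit_distance_model(graph, annotation_distance, p, seed=0):
    """Sample p node pairs and fit exact = slope * annotation + intercept."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    pairs = [rng.sample(nodes, 2) for _ in range(p)]
    xs = [annotation_distance(u, v) for u, v in pairs]
    ys = [bfs_distance(graph, u, v) for u, v in pairs]
    mean_x, mean_y = sum(xs) / p, sum(ys) / p
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x  # assumes the sampled annotation distances vary
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical path graph; the stand-in annotation distance is twice the true one.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}
stand_in = lambda u, v: 2 * abs(u - v)
slope, intercept = fit_distance_model(graph, stand_in, p=50)
estimate = lambda u, v: slope * stand_in(u, v) + intercept
```

Once the two coefficients are fitted, each distance estimate costs one annotation lookup and one multiply-add, with no search at query time.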

27
Constant Time Distance Estimation
28
Constant Time Distance Estimation
29
Constant Time Distance Estimation

Simple distance can be used to produce a wide
variety of attributes on nodes, which can be used
by data mining algorithms that analyze graphs
Label nodes with their distance to a particular
node in a graph
How close is each actor to Kevin Bacon?
Label nodes with the minimum or maximum distance
to one of a set of designated nodes
How close is each actor to an Academy Award
winner?
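
A sketch of the second labeling scheme: each node is tagged with its minimum and maximum distance to a set of designated nodes. Exact breadth-first search is used here for clarity; an NSI-based estimate could replace it on a large graph. The function names and the example graph are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def label_by_designated(graph, designated):
    """Label every node with (min, max) distance to the designated nodes."""
    per_target = [bfs_distances(graph, t) for t in designated]
    # Assumes a connected graph, so every node has a distance to every target.
    return {v: (min(d[v] for d in per_target), max(d[v] for d in per_target))
            for v in graph}


# Hypothetical path 0-1-2-3-4 with the two end nodes designated.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
labels = label_by_designated(graph, designated=[0, 4])
```

With a single designated node the two values coincide, which covers the first labeling scheme as well.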

30
Closeness Centrality

Measures the proximity of a given node in a
network to every other node
Important to social network dynamics
Accurate estimates of closeness centrality often
impossible to calculate for large data sets
Using an NSI for path finding can estimate
closeness centrality efficiently
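
A sketch contrasting exact closeness, as defined earlier (average distance to every other node), with a sampled estimate. The `distance` argument is where an NSI-derived path length or distance estimate would plug in; here an exact lookup stands in for it. Function names and the example graph are hypothetical.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def exact_closeness(graph, v):
    """Average distance from v to every other node (connected graph assumed)."""
    dist = bfs_distances(graph, v)
    return sum(dist[u] for u in graph if u != v) / (len(graph) - 1)


def estimated_closeness(graph, v, distance, sample_size, seed=0):
    """Average a pluggable distance estimate over a random sample of targets."""
    rng = random.Random(seed)
    others = sorted(u for u in graph if u != v)
    sample = rng.sample(others, min(sample_size, len(others)))
    return sum(distance(v, u) for u in sample) / len(sample)


# Hypothetical path 0-1-2-3-4.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
table = {u: bfs_distances(graph, u) for u in graph}
lookup = lambda s, t: table[s][t]  # stand-in for an NSI-based distance
center = exact_closeness(graph, 2)
end = exact_closeness(graph, 0)
center_estimate = estimated_closeness(graph, 2, lookup, sample_size=4)
```

Lower values mean a more central node under this definition.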

31
Closeness Centrality
32
Closeness Centrality

A measure of centrality can be used to produce
attributes on nodes that may be useful to
knowledge discovery algorithms
Determine the closeness of every node to a
collection of key nodes
Closeness to all winners of Academy Awards for
best actor in the past 10 years
Constrain closeness calculations for members of
clusters
Closeness rank of an actor within their movie
industry
Weight closeness based on the attributes of the
outlying nodes
Closeness to winners of Academy Awards weighted
by how recent an award

33
Betweenness Centrality

Measures the number of shortest paths on which a
given node lies
Important to social network dynamics
Accurate estimates of betweenness centrality
often impossible to calculate for large data
sets

34
Betweenness Centrality

Can estimate betweenness using the paths
identified through NSI navigation
Randomly sample pairs of nodes and discover the
shortest path between them
Count the number of times each node in the graph
appears on one of these paths to obtain a
betweenness ranking
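
The sampling procedure above can be sketched as follows. Plain breadth-first search stands in for NSI-guided navigation, and only interior nodes of each path are counted; both are assumptions of this sketch. The function names and the two-triangle example graph are hypothetical.

```python
import random
from collections import Counter, deque


def shortest_path(graph, s, t):
    """One shortest path from s to t, found by BFS (connected graph assumed)."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], t
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def betweenness_counts(graph, num_pairs, seed=0):
    """Count how often each node lies strictly inside a sampled shortest path."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    counts = Counter()
    for _ in range(num_pairs):
        s, t = rng.sample(nodes, 2)
        counts.update(shortest_path(graph, s, t)[1:-1])  # skip the endpoints
    return counts


# Hypothetical graph: triangles {0, 1, 2} and {4, 5, 6} joined through node 3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4],
         4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
counts = betweenness_counts(graph, num_pairs=200)
ranking = [node for node, _ in counts.most_common()]
```

Only the bridge node 3 and its two neighbors ever appear inside a shortest path here, so they make up the whole ranking.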

35
Betweenness Centrality
36
Betweenness Centrality

A high betweenness score can indicate a bridge
between two communities
An actor that has played in movies belonging to
different movie industries
Betweenness centrality can be used to create
features on nodes that are useful for data
mining
Calculate betweenness centrality for particular
groups of nodes
Actors that sit between winners of Academy Awards
for best picture and IMDb's Bottom 100, the
worst 100 movies as voted by users of the
Internet Movie Database

37
Conclusions

The NSIs Zone and DTZ allow efficient and
accurate estimation of path lengths between
arbitrary nodes in a network
Efficient calculations of network statistics
allow a better range of potential approaches to
knowledge discovery
Not all potential NSIs have been exhaustively
researched
NSIs could have other applications
Finding connection subgraphs
Approximating neighborhood functions