News and Notes: Feb 9
1
News and Notes Feb 9
  • Watts talk reminder
  • tomorrow at noon, Annenberg School (3620 Walnut),
    Room 110
  • extra credit reports
  • Turn in revisions of NW Construction Project,
    Task 1
  • MK will review quickly
  • deadline for Task 2 will be set shortly; start working!
  • Description of Tuesday class experiments
  • Social Network Theory, continued

2
Collective Human Computation in Networks: Beyond
Shortest Paths
  • Travers and Milgram; Dodds et al.; Kleinberg; ...
  • human networks can efficiently route messages
  • using only local topology and info on target
  • What about other computations?
  • minimum coloring
  • maximum matching
  • maximum independent set
  • Participation on Tuesday is for course credit
  • Start at 12:05 sharp
  • You will be given a score for each experiment
  • but as long as you participate, you will receive
    full credit
  • A $50 cash prize will be split between those with
    the highest total score
  • An experimental investigation of the Price of
    Anarchy
  • comparison of centralized social optimum and
    decentralized greedy solutions

3
Graph Colorings
  • A coloring of an undirected graph is
  • an assignment of a color (label) to each vertex
  • such that no pair connected by an edge have the
    same color
  • chromatic number of graph G: fewest colors needed
  • Example application
  • classes and exam slots
  • chromatic number determines length of exam period
  • Here's a coloring demo
  • Computation of chromatic numbers is hard
  • (poor) approximations are possible; a greedy
    sketch follows below
  • Interesting fact: the four-color theorem for
    planar graphs
  • Here is a description of our Lifester Coloring
    Experiment
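A minimal sketch of the classic greedy coloring heuristic in
Python (illustrative only, not the course's coloring demo; the
function name and example graph are ours). It is fast but can
use far more colors than the chromatic number on a bad vertex
ordering:

def greedy_coloring(adj):
    """Give each vertex the smallest color unused by its
    neighbors. adj maps vertex -> set of neighbors."""
    colors = {}
    for u in adj:  # visit order matters; bad orders waste colors
        used = {colors[v] for v in adj[u] if v in colors}
        colors[u] = next(c for c in range(len(adj) + 1)
                         if c not in used)
    return colors

# 5-cycle: chromatic number 3; greedy also finds 3 here
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
coloring = greedy_coloring(cycle)
assert all(coloring[u] != coloring[v]
           for u in cycle for v in cycle[u])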

4
Matchings in Graphs
  • A matching of an undirected graph is
  • a subset of the edges
  • such that no vertex is touched more than once
  • perfect matching: every vertex touched exactly
    once
  • perfect matchings may not always exist (e.g. N
    odd)
  • maximum matching: largest number of edges
  • Can be found efficiently; here is a perfect
    matching demo (and a greedy sketch below)
  • Example applications
  • pairing of compatible partners
  • perfect matching: nobody left out
  • jobs and qualified workers
  • perfect matching: full employment, and all jobs
    filled
  • clients and servers
  • perfect matching: all clients served, and no
    server idle
  • Here is a description of our Lifester Matching
    Experiment
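A greedy matching sketch in Python (ours, illustrative): it
returns a maximal matching, which can be as small as half the
maximum one. Exact maximum matching is polynomial-time via the
blossom algorithm (e.g. networkx's max_weight_matching with
maxcardinality=True):

def greedy_maximal_matching(edges):
    """Take edges in order, skipping any that touch an
    already-matched vertex. Maximal, not always maximum."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# path a-b-c-d: with this edge order, greedy finds the maximum
print(greedy_maximal_matching([("a", "b"), ("b", "c"), ("c", "d")]))
# -> [('a', 'b'), ('c', 'd')]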

5
Cliques and Independent Sets
  • A clique in a graph G is a set of vertices
  • informal: vertices that are all directly
    connected to each other
  • formal: whose induced subgraph is complete
  • all vertices in direct communication, exchange,
    competition, etc.
  • the tightest possible social structure
  • an edge is a clique of just 2 vertices
  • generally interested in large cliques
  • Independent set
  • set of vertices whose induced subgraph is empty
    (no edges)
  • vertices entirely isolated from each other
    without help of others
  • Maximum clique or independent set: largest in the
    graph
  • Maximal clique or independent set: can't grow any
    larger (a greedy sketch follows below)
  • Here is a description of our Lifester Independent
    Set Experiment
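A greedy sketch for maximal (not maximum) independent sets,
in Python and ours, not the class experiment. Taking
low-degree vertices first echoes the class result below,
where winners had lower mean degree:

def greedy_independent_set(adj):
    """Repeatedly take a lowest-degree vertex and discard
    its neighbors. Maximal, but may miss the maximum."""
    remaining = {u: set(nbrs) for u, nbrs in adj.items()}
    chosen = set()
    while remaining:
        u = min(remaining, key=lambda v: len(remaining[v]))
        chosen.add(u)
        dead = {u} | remaining[u]
        remaining = {v: nbrs - dead
                     for v, nbrs in remaining.items()
                     if v not in dead}
    return chosen

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(greedy_independent_set(star))  # -> {1, 2, 3}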

6
The Results
7
The chromatic number of the Lifester network is
4...
8
and the 43 class members present computed a
legal 5-coloring.
9
The Lifester network has a maximum independent
set of size 16...
10
and the class computed a maximal independent
set of size 13. (mean degree of winners: 4; mean
degree of losers: 5.3)
11
The Lifester network has a maximum matching of
size 21, and the class found one. (mean degree of
those with score 2: 5; mean degree of others: 3.8)
12
Just 40 More Times and You Can Buy a Share of
Google
CHEN, CHARLENE; CHENG, ZAISHAO; FAULKNER, ELIZABETH;
FRANK, WILLIAM; GROFF, MAX; JOHNNIDIS, CHRISTOPHER;
LAWEE, AARON; LEIKER, MATTHEW; MUTREJA, MOHIT;
RYTERBAND, JASON; SILENGO, MICHAEL; SWANSON, EDWARD
Post-experiment analysis assignment due in class
Tuesday!
13
Social Network Theory
  • Networked Life
  • CSE 112
  • Spring 2005
  • Prof. Michael Kearns

14
Natural Networks and Universality
  • Consider the many kinds of networks we have
    examined
  • social, technological, business, economic,
    content, ...
  • These networks tend to share certain informal
    properties
  • large scale; continual growth
  • distributed, organic growth: vertices decide
    whom to link to
  • interaction restricted to links
  • mixture of local and long-distance connections
  • abstract notions of distance: geographical,
    content-based, social, ...
  • Do natural networks share more quantitative
    universals?
  • What would these universals be?
  • How can we make them precise and measure them?
  • How can we explain their universality?
  • This is the domain of social network theory
  • Sometimes also referred to as link analysis

15
Some Interesting Quantities
  • Connected components
  • how many, and how large?
  • Network diameter
  • maximum (worst-case) or average?
  • exclude infinite distances? (disconnected
    components)
  • the small-world phenomenon
  • Clustering
  • to what extent do links tend to cluster
    locally?
  • what is the balance between local and
    long-distance connections?
  • what roles do the two types of links play?
  • Degree distribution
  • what is the typical degree in the network?
  • what is the overall distribution?

16
A Canonical Natural Network has
  • Few connected components
  • often only 1 or a small number, independent of
    network size
  • Small diameter
  • often a constant independent of network size
    (like 6)
  • or perhaps growing only logarithmically with
    network size
  • typically exclude infinite distances
  • A high degree of clustering
  • considerably more so than for a random network
  • in tension with small diameter
  • A heavy-tailed degree distribution
  • a small but reliable number of high-degree
    vertices
  • quantifies Gladwell's connectors
  • often of power law form

17
Some Models of Network Generation
  • Random graphs (Erdos-Renyi models)
  • gives few components and small diameter
  • does not give high clustering and heavy-tailed
    degree distributions
  • is the mathematically most well-studied and
    understood model
  • Watts-Strogatz and related models
  • give few components, small diameter and high
    clustering
  • does not give heavy-tailed degree distributions
  • Preferential attachment
  • gives few components, small diameter and
    heavy-tailed distribution
  • does not give high clustering
  • Hierarchical networks
  • few components, small diameter, high clustering,
    heavy-tailed
  • Affiliation networks
  • models group-actor formation
  • Nothing magic about any of the measures or
    models

18
Approximate Roadmap
  • Examine a series of models of network generation
  • macroscopic properties they do and do not entail
  • pros and cons of each model
  • Examine some real-life case studies
  • Study some dynamics issues (e.g. navigation)
  • Move into in-depth study of the web as network

19
Probabilistic Models of Networks
  • All of the network generation models we will
    study are probabilistic or statistical in nature
  • They can generate networks of any size
  • They often have various parameters that can be
    set
  • size of network generated
  • average degree of a vertex
  • fraction of long-distance connections
  • The models generate a distribution over networks
  • Statements are always statistical in nature
  • with high probability, diameter is small
  • on average, degree distribution has heavy tail
  • Thus, we're going to need some basic statistics
    and probability theory

20
Statistics and Probability Theory: The Absolute,
Bare Minimum Essentials
21
Probability and Random Variables
  • A random variable X is simply a variable that
    probabilistically assumes values in some set
  • set of possible values sometimes called the
    sample space S of X
  • sample space may be small and simple or large and
    complex
  • S = {Heads, Tails}; X is the outcome of a coin
    flip
  • S = {0, 1, ..., U.S. population size}; X is the
    number voting Democratic
  • S = all networks of size N; X is a network
    generated by preferential attachment
  • Behavior of X determined by its distribution (or
    density)
  • for each value x in S, specify Pr[X = x]
  • these probabilities sum to exactly 1 (mutually
    exclusive outcomes)
  • complex sample spaces (such as large networks)
  • distribution often defined implicitly by simpler
    components
  • might specify the probability that each edge
    appears independently
  • this induces a probability distribution over
    networks
  • may be difficult to compute induced distribution

22
Some Basic Notions and Laws
  • Independence
  • let X and Y be random variables
  • independence: for any x and y, Pr[X = x and Y = y]
    = Pr[X = x] Pr[Y = y]
  • intuition: value of X does not influence value of
    Y, and vice versa
  • dependence
  • e.g. X, Y coin flips, but Y is always opposite of
    X
  • Expected (mean) value of X
  • only makes sense for numeric random variables
  • average value of X according to its
    distribution
  • formally, E[X] = Σ Pr[X = x] · x, where the sum is
    over all x in S
  • often denoted by μ
  • always true: E[X + Y] = E[X] + E[Y] (see the
    simulation below)
  • true for independent random variables (not in
    general): E[XY] = E[X] E[Y]
  • Variance of X
  • Var(X) = E[(X - μ)^2], often denoted by σ^2
  • standard deviation is sqrt(Var(X)) = σ
  • Union bound
  • for any X, Y: Pr[X = x or Y = y] ≤ Pr[X = x] +
    Pr[Y = y]
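A quick simulation of these laws (a sketch of ours; the dice
example is not from the slides). X + Y has expectation
E[X] + E[Y] even though Y is completely dependent on X, while
E[XY] = E[X] E[Y] fails under that dependence:

import random

random.seed(0)
n = 100_000
xs = [random.randint(1, 6) for _ in range(n)]  # die roll X
ys = [7 - x for x in xs]                       # Y fully dependent on X

mean = lambda vals: sum(vals) / len(vals)
# linearity of expectation holds despite the dependence
print(mean([x + y for x, y in zip(xs, ys)]), mean(xs) + mean(ys))
# E[XY] differs from E[X]E[Y] because X and Y are dependent
print(mean([x * y for x, y in zip(xs, ys)]), mean(xs) * mean(ys))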

23
Convergence to Expectations
  • Let X1, X2, ..., Xn be
  • independent random variables
  • with the same distribution Pr[X = x]
  • expectation μ = E[X] and variance σ^2
  • independent and identically distributed (i.i.d.)
  • essentially n repeated trials of the same
    experiment
  • natural to examine the r.v. Z = (1/n) Σ Xi, where
    the sum is over i = 1, ..., n
  • example: number of heads in a sequence of coin
    flips
  • example: degree of a vertex in the random graph
    model
  • E[Z] = E[X]; what can we say about the
    distribution of Z?
  • Central Limit Theorem
  • as n becomes large, Z becomes normally
    distributed
  • with expectation μ and variance σ^2/n
  • here's a demo (see the simulation sketch below)
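In place of the in-class demo, a minimal simulation sketch
(ours, stdlib Python): the sample mean Z of n fair coin flips
has expectation μ = 0.5 and variance σ^2/n = 0.25/n, which
shrinks as n grows:

import random

random.seed(0)

def sample_mean(n):
    """Mean of n i.i.d. fair 0/1 flips: mu = 0.5, var = 0.25."""
    return sum(random.randint(0, 1) for _ in range(n)) / n

for n in (10, 100, 1000):
    zs = [sample_mean(n) for _ in range(2000)]
    m = sum(zs) / len(zs)
    var = sum((z - m) ** 2 for z in zs) / len(zs)
    print(f"n={n:4d}  mean={m:.3f}  var={var:.5f}  theory={0.25/n:.5f}")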

24
The Normal Distribution
  • The normal or Gaussian density
  • applies to continuous, real-valued random
    variables
  • characterized by mean (average) μ and standard
    deviation σ
  • density at x is defined as
  • (1/(σ sqrt(2π))) exp(-(x - μ)^2/(2σ^2))
  • special case μ = 0, σ = 1: of the form a exp(-x^2/b)
    for some constants a, b > 0
  • peaks at x = μ, then dies off exponentially
    rapidly
  • the classic bell-shaped curve
  • exam scores, human body temperature, ...
  • here are some examples
  • remarks
  • can control mean and standard deviation
    independently
  • can make as broad as we like, but always have
    finite variance

25
The Binomial Distribution
  • The binomial distribution
  • coin with Pr[heads] = p, flipped n times
  • probability of getting exactly k heads:
  • choose(n,k) p^k (1-p)^(n-k)
  • for large n and fixed p
  • approximated well by a normal with μ = pn, σ =
    sqrt(np(1-p)) (see the sketch below)
  • σ/μ → 0 as n grows
  • leads to strong large deviation bounds
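A numerical check of the normal approximation (a sketch using
Python's standard library; the specific n, p, k values are
arbitrary choices of ours):

from math import comb, exp, pi, sqrt

def binomial_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_density(x, mu, sigma):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
for k in (40, 50, 60):  # binomial pmf vs. matching normal density
    print(k, round(binomial_pmf(n, k, p), 5),
          round(normal_density(k, mu, sigma), 5))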

26
The Poisson Distribution
  • The Poisson distribution
  • like the binomial, applies to variables taking on
    integer values ≥ 0
  • often used to model counts of events
  • number of phone calls placed in a given time
    period
  • number of times a neuron fires in a given time
    period
  • single free parameter λ
  • probability of exactly x events:
  • exp(-λ) λ^x / x!
  • mean and variance are both λ
  • here are some examples
  • binomial distribution with n large, p = λ/n (λ
    fixed)
  • converges to Poisson with mean λ (see the sketch
    below)
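A sketch of that convergence (ours, stdlib Python; λ = 3 and
x = 2 are arbitrary choices):

from math import comb, exp, factorial

lam = 3.0

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

for n in (10, 100, 10_000):
    p = lam / n  # hold lambda = n*p fixed as n grows
    binom = comb(n, 2) * p**2 * (1 - p)**(n - 2)
    print(f"n={n:6d}  binomial Pr[X=2]={binom:.5f}  "
          f"Poisson={poisson_pmf(2, lam):.5f}")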

27
Heavy-tailed Distributions
  • Pareto or power law distributions
  • for variables assuming integer values > 0
  • probability of value x proportional to 1/x^α
  • typically 0 < α < 2; smaller α gives a heavier
    tail
  • here are some examples
  • sometimes also referred to as scale-free
  • For binomial, normal, and Poisson distributions
    the tail probabilities approach 0 exponentially
    fast
  • Inverse polynomial decay vs. inverse exponential
    decay
  • What kind of phenomena does this distribution
    model?
  • What kind of process would generate it?

28
Distributions vs. Data
  • All these distributions are idealized models
  • In practice, we do not see distributions, but
    data
  • Thus, there will be some largest value we observe
  • Also, can be difficult to eyeball data and
    choose model
  • So how do we distinguish between Poisson, power
    law, etc?
  • Typical procedure
  • might restrict our attention to a range of values
    of interest
  • accumulate counts of observed data into
    equal-sized bins
  • look at counts on a log-log plot (see the sketch
    below)
  • note that
  • power law:
  • log(Pr[X = x]) = log(1/x^α) = -α log(x)
  • linear, with slope -α
  • Normal:
  • log(Pr[X = x]) = log(a exp(-x^2/b)) = log(a) -
    x^2/b
  • non-linear, concave near mean
  • Poisson:
  • log(Pr[X = x]) = log(exp(-λ) λ^x / x!)
  • also non-linear
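A sketch of the procedure (ours, stdlib Python). It uses
random.paretovariate, whose tail exponent differs from the pmf
exponent by one, so paretovariate(α - 1) rounded down gives a
pmf roughly proportional to 1/x^α; with enough samples, the
count-vs-value slope on a log-log scale comes out near -α:

import random
from collections import Counter
from math import log

random.seed(0)
alpha = 1.5
data = [int(random.paretovariate(alpha - 1)) for _ in range(100_000)]
counts = Counter(data)

# slope of log(count) against log(x): roughly -alpha for a power law
x1, x2 = 10, 100
slope = (log(counts[x2]) - log(counts[x1])) / (log(x2) - log(x1))
print(f"log-log slope ~ {slope:.2f} (expect about {-alpha})")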

29
Zipf's Law
  • Look at the frequency of English words
  • "the" is the most common, followed by "of",
    "to", etc.
  • claim: frequency of the n-th most common word is
    proportional to 1/n (power law, α = 1)
  • General theme
  • rank events by their frequency of occurrence
  • resulting distribution often is a power law!
  • Other examples
  • North America city sizes
  • personal income
  • file sizes
  • genus sizes (number of species)
  • let's look at log-log plots of these
  • People seem to dither over the exact form of
    these distributions (e.g. the value of α), but
    not over the heavy tails

30
Models of Network Generation and Their Properties
31
The Erdos-Renyi (ER) Model (Random Graphs)
  • A model in which all edges
  • are equally probable
  • appear independently
  • NW size N > 1 and edge probability p: distribution
    G(N,p)
  • each edge (u,v) chosen to appear with probability
    p
  • N(N-1)/2 trials of a biased coin flip
  • The usual regime of interest is p ~ 1/N, with N
    large
  • e.g. p = 1/(2N), p = 1/N, p = 2/N, p = 10/N, p =
    log(N)/N, etc.
  • in expectation, each vertex will have a small
    number of neighbors
  • will then examine what happens as N → infinity
  • can thus study properties of large networks with
    bounded degree
  • Degree distribution of a typical G drawn from
    G(N,p)
  • draw G according to G(N,p); look at a random
    vertex u in G
  • what is Pr[deg(u) = k] for any fixed k?
  • Poisson distribution with mean λ = p(N-1) ≈ pN
  • Sharply concentrated; not heavy-tailed
  • Especially easy to generate NWs from G(N,p) (see
    the sketch below)
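A sketch of the G(N,p) coin-flipping construction and its
Poisson degree distribution (ours, stdlib Python; N = 2000 and
expected degree 5 are arbitrary):

import random
from collections import Counter
from itertools import combinations
from math import exp, factorial

random.seed(0)
N, p = 2000, 5 / 2000  # expected degree about pN = 5

# one biased coin flip per potential edge
edges = [(u, v) for u, v in combinations(range(N), 2)
         if random.random() < p]
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

lam = p * (N - 1)
hist = Counter(degree[u] for u in range(N))
for k in range(9):  # empirical degree frequencies vs. Poisson(lam)
    print(f"k={k}  observed={hist[k]/N:.3f}  "
          f"Poisson={exp(-lam) * lam**k / factorial(k):.3f}")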

32
A Closely Related Model
  • For any fixed m ≤ N(N-1)/2, define the
    distribution G(N,m)
  • choose uniformly at random from all graphs with
    exactly m edges
  • G(N,m) is like G(N,p) with p = m/(N(N-1)/2) ≈
    2m/N^2
  • this intuition can be made precise, and is
    correct
  • if m = cN then p = 2c/(N-1) ≈ 2c/N
  • mathematically trickier than G(N,p)

33
Another Closely Related Model
  • Graph process model
  • start with N vertices and no edges
  • at each time step, add a new edge
  • choose new edge randomly from among all missing
    edges
  • Allows study of the evolution or emergence of
    properties (see the sketch below)
  • as the number of edges m grows in relation to N
  • equivalently, as p is increased
  • For all of these models:
  • high probability ↔ almost all large graphs of
    a given density
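A sketch of the graph process (ours, stdlib Python), using
union-find to watch connectivity emerge; with N = 50 it
typically reports connectivity near (N/2)log(N) ≈ 98 edges,
consistent with the demo numbers later in the lecture:

import random

random.seed(0)
N = 50
missing = [(u, v) for u in range(N) for v in range(u + 1, N)]
random.shuffle(missing)  # adding in random order = graph process

parent = list(range(N))  # union-find over connected components

def find(u):
    while parent[u] != u:
        parent[u] = parent[parent[u]]  # path halving
        u = parent[u]
    return u

components = N
for t, (u, v) in enumerate(missing, start=1):
    ru, rv = find(u), find(v)
    if ru != rv:
        parent[ru] = rv
        components -= 1
        if components == 1:
            print(f"connected after {t} edges")
            break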

34
The Evolution of a Random Network
  • We have a large number n of vertices
  • We start randomly adding edges one at a time
  • At what time t will the network
  • have at least one large connected component?
  • have a single connected component?
  • have small diameter?
  • have a large clique?
  • have a large chromatic number?
  • How gradually or suddenly do these properties
    appear?

35
Recap
  • Model G(N,p)
  • select each of the possible edges independently
    with prob. p
  • expected total number of edges is pN(N-1)/2
  • expected degree of a vertex is p(N-1)
  • degree will obey a Poisson distribution (not
    heavy-tailed)
  • Model G(N,m)
  • select exactly m of the N(N-1)/2 edges to appear
  • all sets of m edges equally likely
  • Graph process model
  • starting with no edges, just keep adding one edge
    at a time
  • always choose next edge randomly from among all
    missing edges
  • Threshold or tipping for (say) connectivity:
  • fewer than m = m(N) edges → graph almost
    certainly not connected
  • more than m = m(N) edges → graph almost certainly
    is connected
  • made formal by examining the limit as N → infinity

36
Combining and Formalizing Familiar Ideas
  • Explaining universal behavior through statistical
    models
  • our models will always generate many networks
  • almost all of them will share certain properties
    (universals)
  • Explaining tipping through incremental growth
  • we gradually add edges, or gradually increase
    edge probability p
  • many properties will emerge very suddenly during
    this process

[Figure: probability NW is connected vs. number of edges]
37
Monotone Network Properties
  • Often interested in monotone graph properties
  • let G have the property
  • add edges to G to obtain G'
  • then G' must have the property also
  • Examples
  • G is connected
  • G has diameter ≤ d (not exactly d)
  • G has a clique of size ≥ k (not exactly k)
  • G has chromatic number ≥ c (not exactly c)
  • G has a matching of size ≥ m
  • d, k, c, m may depend on NW size N (How?)
  • Difficult to study emergence of non-monotone
    properties as the number of edges is increased
  • what would it mean?

38
Formalizing Tipping: Thresholds for Monotone
Properties
  • Consider Erdos-Renyi G(N,m) model
  • select m edges at random to include in G
  • Let P be some monotone property of graphs
  • P(G) = 1 ↔ G has the property
  • P(G) = 0 ↔ G does not have the property
  • Let m(N) be some function of NW size N
  • formalize the idea that property P appears
    suddenly at m(N) edges
  • Say that m(N) is a threshold function for P if:
  • let m'(N) be any function of N
  • look at the ratio r(N) = m'(N)/m(N) as N → infinity
  • if r(N) → 0: probability that P(G) = 1 in
    G(N,m'(N)) → 0
  • if r(N) → infinity: probability that P(G) = 1 in
    G(N,m'(N)) → 1
  • A purely structural definition of tipping
  • tipping results from incremental increase in
    connectivity

39
So Which Properties Tip?
  • Just about all of them!
  • The following properties all have threshold
    functions
  • having a giant component
  • being connected
  • having a perfect matching (N even)
  • having small diameter
  • Demo: look at the following progression
  • giant component → connectivity → small diameter
  • in the graph process model (add one new edge at a
    time)
  • example 1, example 2, example 3, example 4,
    example 5
  • With remarkable consistency (N = 50):
  • giant component ≈ 40 edges, connected ≈ 100,
    small diameter ≈ 180

40
Ever More Precise
  • Connected component of size ≥ N/2
  • threshold function is m(N) = N/2 (or p = 1/N)
  • note: full connectivity impossible here (needs at
    least N-1 edges)
  • Fully connected
  • threshold function is m(N) = (N/2)log(N) (or p =
    log(N)/N)
  • NW remains extremely sparse: only ~log(N) edges
    per vertex
  • Small diameter
  • threshold is m(N) = N^(3/2) for diameter 2 (or p =
    2/sqrt(N))
  • fraction of possible edges still ≈ 2/sqrt(N) → 0
  • generates very small worlds

41
Other Tipping Points?
  • Perfect matchings
  • consider only even N
  • threshold function is m(N) = (N/2)log(N) (or p =
    log(N)/N)
  • same as for connectivity!
  • Cliques
  • k-clique threshold is m(N) = (1/2)N^(2 - 2/(k-1))
    (p = 1/N^(2/(k-1)))
  • edges appear immediately; triangles at ≈ N/2; etc.
  • Coloring
  • k colors required just as k-cliques appear

42
Erdos-Renyi Summary
  • A model in which all connections are equally
    likely
  • each of the N(N-1)/2 edges chosen randomly and
    independently
  • As we add edges, a precise sequence of events
    unfolds
  • graph acquires a giant component
  • graph becomes connected
  • graph acquires small diameter
  • etc.
  • Many properties appear very suddenly (tipping,
    thresholds)
  • All statements are mathematically precise
  • But is this how natural networks form?
  • If not, which aspects are unrealistic?
  • maybe all edges are not equally likely!

43
The Clustering Coefficient of a Network
  • Let nbr(u) denote the set of neighbors of u in a
    graph
  • all vertices v such that the edge (u,v) is in the
    graph
  • The clustering coefficient of u
  • let k = |nbr(u)| (the number of neighbors of u)
  • choose(k,2) = max possible # of edges between
    vertices in nbr(u)
  • c(u) = (actual # of edges between vertices in
    nbr(u)) / choose(k,2)
  • 0 ≤ c(u) ≤ 1; a measure of the cliquishness of u's
    neighborhood (see the sketch below)
  • Clustering coefficient of a graph
  • average of c(u) over all vertices u

[Figure: example neighborhood with k = 4, choose(k,2) = 6,
c(u) = 4/6 = 0.666...]
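A direct transcription of the definition into Python (a sketch;
the example graph is ours and reproduces the k = 4, c(u) = 4/6
figure values):

from itertools import combinations

def clustering_coefficient(adj, u):
    """Fraction of pairs of u's neighbors that are themselves
    connected. adj maps vertex -> set of neighbors."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return links / (k * (k - 1) / 2)

# u = 0 has k = 4 neighbors with 4 of the 6 possible edges present
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 4},
       3: {0, 1, 4}, 4: {0, 2, 3}}
print(clustering_coefficient(adj, 0))  # -> 0.666...
# graph clustering coefficient: average over all vertices
print(sum(clustering_coefficient(adj, u) for u in adj) / len(adj))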
44
Erdos-Renyi Clustering Coefficient
  • Generate a network G according to G(N,p)
  • Examine a typical vertex u in G
  • choose u at random among all vertices in G
  • what do we expect c(u) to be?
  • Answer: exactly p!
  • In G(N,m), expect c(u) to be ≈ 2m/(N(N-1))
  • In both cases, c(u) is entirely determined by the
    overall density
  • Baseline for comparison with more clustered
    models
  • Erdos-Renyi has no bias towards clustered or
    local edges

45
Caveman and Solaria
  • Erdos-Renyi
  • sharing a common neighbor makes two vertices no
    more likely to be directly connected than two
    very distant vertices
  • every edge appears entirely independently of
    existing structure
  • But in many settings, the opposite is true
  • you tend to meet new friends through your old
    friends
  • two web pages pointing to a third might share a
    topic
  • two companies selling goods to a third are in
    related industries
  • Watts' Caveman world
  • overall density of edges is low
  • but two vertices with a common neighbor are
    likely connected
  • Watts' Solaria world
  • overall density of edges is low; no special bias
    towards local edges
  • like Erdos-Renyi

46
Making it (Somewhat) Precise: the α-model
  • The α-model has the following parameters or
    knobs
  • N: size of the network to be generated
  • k: the average degree of a vertex in the network
    to be generated
  • p: the default probability two vertices are
    connected
  • α: adjustable parameter dictating bias towards
    local connections
  • For any vertices u and v
  • define m(u,v) to be the number of common
    neighbors (so far)
  • Key quantity: the propensity R(u,v) of u to
    connect to v
  • if m(u,v) ≥ k, R(u,v) = 1 (share too many
    friends not to connect)
  • if m(u,v) = 0, R(u,v) = p (no mutual friends → no
    bias to connect)
  • else, R(u,v) = p + (m(u,v)/k)^α (1-p) (see the
    sketch below)
  • here are some plots for different α (see Watts
    page 77)
  • Generate NW incrementally
  • using R(u,v) as the edge probability; details
    omitted
  • Note: α = infinity is like Erdos-Renyi (but not
    exactly)
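The propensity function itself, as a Python sketch of the
formula above (the values p = 0.01 and k = 4 below are
arbitrary choices of ours, just to illustrate the α knob):

def propensity(m_uv, k, p, alpha):
    """Watts' alpha-model propensity R(u,v), as defined above.
    m_uv is the current number of common neighbors of u and v."""
    if m_uv >= k:
        return 1.0        # too many mutual friends not to connect
    if m_uv == 0:
        return p          # no mutual friends: default probability
    return p + (m_uv / k) ** alpha * (1 - p)

# one mutual friend out of k = 4: small alpha makes the link
# nearly certain, large alpha leaves it near the default p
for alpha in (0.1, 1.0, 10.0):
    print(alpha, round(propensity(1, 4, 0.01, alpha), 4))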

47
Small Worlds and Occam's Razor
  • For small α, should generate large clustering
    coefficients
  • we programmed the model to do so
  • Watts claims that proving precise statements is
    hard
  • But we do not want a new model for every little
    property
  • Erdos-Renyi → small diameter
  • α-model → high clustering coefficient
  • etc.
  • In the interests of Occam's Razor, we would like
    to find
  • a single, simple model of network generation
  • that simultaneously captures many properties
  • Watts' small world: small diameter and high
    clustering
  • here is a figure showing that this can be
    captured in the α-model

48
Meanwhile, Back in the Real World
  • Watts examines three real networks as case
    studies
  • the Kevin Bacon graph
  • the Western states power grid
  • the C. elegans nervous system
  • For each of these networks, he
  • computes its size, diameter, and clustering
    coefficient
  • compares diameter and clustering to best
    Erdos-Renyi approx.
  • shows that the best α-model approximation is
    better
  • important to be fair to each model by finding the
    best fit
  • Overall moral
  • if we care only about diameter and clustering, α
    is better than p

49
Case 1: Kevin Bacon Graph
  • Vertices: actors and actresses
  • Edge between u and v if they appeared in a film
    together
  • Here is the data

50
Case 2: Western States Power Grid
  • Vertices: power stations in the Western U.S.
  • Edges: high-voltage power transmission lines
  • Here is the network and data

51
Case 3: C. elegans Nervous System
  • Vertices: neurons in the C. elegans worm
  • Edges: axons/synapses between neurons
  • Here is the network and data

52
Two More Examples
  • M. Newman on scientific collaboration networks
  • coauthorship networks in several distinct
    communities
  • differences in degrees (papers per author)
  • empirical verification of
  • giant components
  • small diameter (mean distance)
  • high clustering coefficient
  • Alberich et al. on the Marvel Universe
  • purely fictional social network
  • two characters linked if they appeared together
    in an issue
  • empirical verification of
  • heavy-tailed distribution of degrees (issues and
    characters)
  • giant component
  • rather small clustering coefficient

53
One More (Structural) Property
  • A properly tuned α-model can simultaneously
    explain
  • small diameter
  • high clustering coefficient
  • But what about heavy-tailed degree distributions?
  • α-model and simple variants will not explain
    this
  • intuitively, no bias towards large degree
    evolves
  • all vertices are created equal
  • Can concoct many bad generative models to explain
  • generate NW according to Erdos-Renyi, reject if
    tails not heavy
  • describe fixed NWs with heavy tails
  • all connected to v1; N/2 connected to v2; etc.
  • not clear we can get a precise power law
  • not modeling variation
  • why would the world evolve this way?
  • As always, we want a natural model

54
Preferential Attachment
  • Start with (say) two vertices connected by an
    edge
  • For i = 3 to N:
  • for each 1 ≤ j < i, let d(j) be the degree of
    vertex j (so far)
  • let Z = Σ d(j) (sum of all degrees so far)
  • add new vertex i with k edges back to 1, ..., i-1
  • i is connected back to j with probability d(j)/Z
  • Vertices j with high degree are likely to get
    more links!
  • Rich get richer
  • Natural model for many processes (see the sketch
    below)
  • hyperlinks on the web
  • new business and social contacts
  • transportation networks
  • Generates a power law distribution of degrees
  • exponent depends on value of k
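A compact sketch of the generator (ours, stdlib Python; for
simplicity, duplicate targets, i.e. multi-edges, are allowed):

import random
from collections import Counter

random.seed(0)

def preferential_attachment(N, k=2):
    """Grow a network: each new vertex sends k edges to existing
    vertices chosen in proportion to their current degree."""
    degree = Counter({0: 1, 1: 1})  # two vertices joined by an edge
    edges = [(0, 1)]
    for i in range(2, N):
        targets = random.choices(list(degree),
                                 weights=list(degree.values()), k=k)
        for j in targets:
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
    return edges, degree

edges, degree = preferential_attachment(10_000)
# heavy tail: the median degree is small, the maximum is huge
print(max(degree.values()), sorted(degree.values())[5000])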

55
Two Out of Three Isn't Bad
  • Preferential attachment explains
  • heavy-tailed degree distributions
  • small diameter (log(N), via hubs)
  • Will not generate high clustering coefficient
  • no bias towards local connectivity, but towards
    hubs
  • Can we simultaneously capture all three
    properties?
  • probably, but we'll stop here
  • soon there will be a fourth property anyway

57
The Midterm
  • Midterm date: this Thursday, March 3
  • Exam handed out beginning at 12:00 sharp
  • Pencils down at 1:20 sharp
  • Closed-book exam: only exams and pencils
  • no books, papers, notes, devices, etc.
  • Exam covers everything to date
  • all assigned readings in books and papers
  • all lectures, including today's
  • all assignments and experiments
  • Today's agenda
  • short lecture on search and navigation
  • quick midterm review
  • NW Construction Project Task 2 due at midnight

58
Search and Navigation
59
Finding Short Paths
  • Milgram's experiment, Columbia Small Worlds, the
    α-model
  • all emphasize the existence of short paths between
    pairs
  • How do individuals find short paths?
  • in an incremental, next-step fashion
  • using purely local information about the NW and
    the location of the target
  • This is not a structural question, but an
    algorithmic one
  • statics vs. dynamics
  • Navigability may impose additional restrictions
    on the model!
  • Briefly investigate two alternatives
  • a variation on the α-model
  • a social identity model

60
Kleinberg's Model
  • Similar in spirit to the α-model
  • Start with an n by n grid of vertices (so N =
    n^2)
  • add local connections: all vertices within grid
    distance p (e.g. 2)
  • add distant connections:
  • q additional connections
  • probability of connection at distance d
    proportional to 1/d^r
  • so the full model is given by the choice of p, q
    and r
  • large r: heavy bias towards more local
    long-distance connections
  • small r: approach uniformly random long-distance
    connections
  • Kleinberg's question:
  • what value of r permits effective search?
  • Assume parties know only
  • grid address of target
  • addresses of their own direct links
  • Algorithm: pass message to neighbor closest to
    target (see the sketch below)

61
Kleinbergs Result
  • Intuition
  • if r is too large (strong local bias), then
    long-distance connections never help much;
    short paths may not even exist
  • if r is too small (no local bias), we may quickly
    get close to the target, but then we'll have to
    use local links to finish
  • think of a transport system with only long-haul
    jets or donkey carts
  • effective search requires a delicate mixture of
    link distances
  • The result (informally):
  • r = 2 is the only value that permits rapid
    navigation (~log(N) steps)
  • any other value of r will result in time ~N^c
    for some 0 < c < 1
  • a critical value phenomenon
  • Note: locality of information is crucial to this
    argument
  • a centralized algorithm may compute short paths at
    small r
  • can recognize when backward steps are
    beneficial

62
Navigation via Identity
  • Watts et al.:
  • we don't navigate social networks by purely
    geographic information
  • we don't use any single criterion; recall Dodds
    et al. on the Columbia SW
  • different criteria used at different points in
    the chain
  • Represent individuals by a vector of attributes
  • profession, religion, hobbies, education,
    background, etc.
  • attribute values have distances between them
    (tree-structured)
  • distance between individuals: minimum distance in
    any attribute
  • only need one thing in common to be close!
  • Algorithm:
  • given attribute vector of target
  • forward message to neighbor closest to target
  • Permits fast navigation under broad conditions
  • not as sensitive as Kleinberg's model

63
Next Up: The Web as Network