Title: Discrete Probability on Graphs: Estimation, Reconstruction, and Optimization on Networks
- Elchanan Mossel
- UC Berkeley
- At IPAM, March 2007
2. Outline: Stochastic Models on Networks
- Disclaimer: a big field; a biased choice of examples, from an applied view.
- Part 0: Two types of network problems.
- Part I: Estimation of statistical quantities in Gibbs measures / Markov random fields.
- Part II: Reconstruction of stochastic networks from observations.
  - Tree networks.
  - Directed acyclic graphs.
- Part III: Optimization over stochastic models defined on networks.
  - Which functions of stochastic models can be (approximately) optimized efficiently?
3. Part 0: Two Types of Network Problems
4. Two Types of Network Problems
- Type 1: Structural network problems.
- Type 2: Distributional network problems.
- This talk: mostly distributional network problems.
- Examples of structural network problems:
  - Clustering: partition a graph G = (V,E) into V = V1 ∪ … ∪ Vk such that each Vi is big and there are few edges between Vi and Vj for i ≠ j.
  - Ranking: given a random walk on a finite set, find the stationary distribution.
- Spectral techniques are applicable to both problems.
5. A Hard Structural Network Problem
- The Graph Isomorphism Problem:
  - Given two graphs (G,E) and (H,F), is there an isomorphism, i.e. a one-to-one map f : G → H s.t. (v1,v2) ∈ E iff (f(v1),f(v2)) ∈ F?
- Clearly, if two graphs are isomorphic, then they have the same spectral structure, but this is not enough.
- Other open problems exist in this area.
- Example of recent work.
6. Part I: Estimation in Markov Random Fields
7. Gibbs Measures / Graphical Models
- A Gibbs measure on a (finite) graph G = (V,E) is given by
  - node potentials (ψv : v ∈ V) and
  - edge potentials (ψe : e ∈ E).
- The probability of σ = (σ(v) : v ∈ V) ∈ A^V is given by
  P[σ] = Z⁻¹ ∏_{v ∈ V} ψv(σ(v)) ∏_{e=(v,u) ∈ E} ψe(σ(v), σ(u)).
- Gibbs measures were introduced in statistical physics.
- Essential in machine learning.
- Also known as Markov random fields, graphical models, etc.
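To make the definition concrete, P[σ] can be computed by brute-force enumeration on a tiny graph. The sketch below is illustrative, not from the talk: a path graph with agreement-favouring edge potentials and a binary state space, with the partition function Z computed explicitly.

```python
import itertools

def weight(sigma, nodes, edges, node_pot, edge_pot):
    """Unnormalized Gibbs weight: product of node and edge potentials."""
    w = 1.0
    for v in nodes:
        w *= node_pot[v][sigma[v]]
    for (u, v) in edges:
        w *= edge_pot[(u, v)][sigma[u]][sigma[v]]
    return w

def gibbs_measure(nodes, edges, node_pot, edge_pot, states=(0, 1)):
    """P[sigma] = Z^-1 * weight(sigma), with Z found by enumerating A^V."""
    configs = [dict(zip(nodes, s))
               for s in itertools.product(states, repeat=len(nodes))]
    ws = [weight(c, nodes, edges, node_pot, edge_pot) for c in configs]
    Z = sum(ws)
    return [(c, w / Z) for c, w in zip(configs, ws)]

# Toy Ising-like model on the path 0 - 1 - 2: edges favour agreement.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
node_pot = {v: {0: 1.0, 1: 1.0} for v in nodes}
agree = {0: {0: 2.0, 1: 1.0}, 1: {0: 1.0, 1: 2.0}}
edge_pot = {e: agree for e in edges}

dist = gibbs_measure(nodes, edges, node_pot, edge_pot)
marginal0 = sum(p for c, p in dist if c[0] == 0)  # P[sigma(0) = 0] = 0.5 by symmetry
```

This exhaustive normalization costs |A|^|V| and is exactly what becomes intractable on large graphs, motivating the message passing algorithms on the next slides.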
8. Message Passing Algorithms / The Replica Method
- Statistical problem: given a Gibbs measure, estimate P[σ(0) = a].
- Equivalent to many other inference problems.
- Computational view: the problem can be NP-hard (to approximate) even in very simple cases.
- Statistical physics view: find dynamics / Markov chains that have P as their stationary measure.
- Statistical physics insight: rapid convergence of the dynamics ↔ spatial correlation decay.
- A very active area of research; fascinating challenges.
- Artificial intelligence / neuroscience / replica view: solve the problem by message passing.
9. Message Passing Algorithms / The Replica Method
- Message passing algorithms are used to estimate probabilities on graphical models.
- Examples: Warning Propagation, Sum-Product, Belief Propagation, etc.
- All of these algorithms do an exact calculation for an associated computation tree.
- Example: Belief Propagation (BP) is a popular method in AI/coding for estimating marginal probabilities P[σ(0) = a] for a Gibbs measure on G.
- It is equivalent [Tatikonda-Jordan 02] to calculating marginal probabilities P[σ(0) = a] on the computation tree T(G).
- Question: how come message passing algorithms work in practice?
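As a sketch of the idea (the notation and the toy two-leaf tree are mine, not the talk's), belief propagation on a tree passes messages from the leaves up and reads the exact root marginal off the incoming messages:

```python
def message(child, parent, tree, node_pot, edge_pot, states):
    """Upward BP message m_{child -> parent}(x_parent), computed recursively."""
    sub = [message(g, child, tree, node_pot, edge_pot, states)
           for g in tree.get(child, [])]
    out = {}
    for xp in states:
        total = 0.0
        for xc in states:
            w = node_pot[child][xc] * edge_pot[(parent, child)][xp][xc]
            for m in sub:
                w *= m[xc]
            total += w
        out[xp] = total
    return out

def root_marginal(tree, root, node_pot, edge_pot, states=(0, 1)):
    """Exact root marginal: node potential times all incoming messages, normalized."""
    msgs = [message(c, root, tree, node_pot, edge_pot, states)
            for c in tree.get(root, [])]
    belief = {}
    for x in states:
        b = node_pot[root][x]
        for m in msgs:
            b *= m[x]
        belief[x] = b
    Z = sum(belief.values())
    return {x: b / Z for x, b in belief.items()}

# Root r with children a, b; edges favour agreement; b is biased toward state 1.
tree = {'r': ['a', 'b']}
node_pot = {'r': {0: 1.0, 1: 1.0}, 'a': {0: 1.0, 1: 1.0}, 'b': {0: 1.0, 1: 3.0}}
agree = {0: {0: 2.0, 1: 1.0}, 1: {0: 1.0, 1: 2.0}}
edge_pot = {('r', 'a'): agree, ('r', 'b'): agree}
marg = root_marginal(tree, 'r', node_pot, edge_pot)  # bias at b pulls the root toward 1
```

On a graph with cycles the same recursion no longer computes exact marginals; as the slide notes, it corresponds instead to an exact computation on the computation tree T(G).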
10. Message Passing Algorithms in Coding
- In coding:
  - BP is used to decode Low Density Parity Check codes (LDPC) [Gallager 62].
  - Proved to be efficient [Luby-Mitzenmacher-Shokrollahi-Spielman 98], [Richardson-Urbanke 01].
- Message passing algorithms work because:
  - LDPC factor graphs are locally tree-like.
  - Individual constraints push toward the correct codeword.
- The actual analysis uses a recursion of random variables on the tree.
11. Message Passing Algorithms: Random 3-SAT
[Figure: a random 3-SAT instance on variables x1, …, x8 with m = αn clauses, showing the ranges of the clause density α where WalkSAT, Survey Propagation, Belief Propagation, Myopic, and PLR succeed, against the satisfiable / not-satisfiable thresholds.]
12. Message Passing Algorithms for Random 3-SAT
- Message passing algorithms work because:
  - Random-SAT graphs are locally tree-like.
  - Far-away variables are uncorrelated.
- Speculation 1: for Belief Propagation, variables are uncorrelated in a standard sense when α < 3.95.
- Thm [Maneva-M-Wainwright 05]: Survey Propagation is just Belief Propagation on an extended Markov random field.
- Speculation 2: for Survey Propagation, variables are uncorrelated in the extended Markov random field for all α.
- Speculations 1 & 2 are under heated discussion among physicists, computer scientists and mathematicians.
[Photos: M. Talagrand, G. Parisi, B. Selman]
13. Decay of Correlation for the 3-SAT Extended MRF
[Figure: partial assignments in {0,1,*}^n, e.g. 01**1*0* and *10*11**, extending {0,1}^n assignments such as 01101***; * marks a "star" (unassigned) variable.]
14. Part II: Reconstructing Stochastic Networks from Observations
15. Main Problem
- How to reconstruct the network topology from observations at a (sub)set of the nodes?
- The example: reconstructing trees.
16. Two Tree Inference Problems
- In evolution:
  - Given a tree of species / mothers, can we infer the ancestral sequence at the root from contemporary samples?
  - Phase transition: trade-off between noise and duplication?
- Reconstructing evolution:
  - Is it possible to reconstruct evolutionary history from genetic sequences?
17. Defn: Markov Model on a Tree
- Ising/BSC/CFN model:
  - Tree T = (V,E)
  - Node states s(v) ∈ {0,1}
  - Mutation probabilities p_e
  - Number of leaves n
- 0 = purines (A,G); 1 = pyrimidines (C,T).
[Figure: a rooted tree with root state s(r), internal states s(a), s(b), s(c), leaf states s(1), …, s(5), and edge mutation probabilities p_ra, p_rc, p_ab, p_a3, p_b1, p_b2, p_c4, p_c5; a binary sequence 001100011101000011000100 is broadcast from the root.]
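The broadcast process defining this model is straightforward to simulate: draw a uniform root state, then flip the bit independently along each edge with that edge's mutation probability. This is a sketch; the tree shape and the uniform rate p = 0.1 are illustrative choices.

```python
import random

def cfn_sample(tree, root, flip_prob, rng):
    """Broadcast a 0/1 state from the root down the tree (CFN/Ising/BSC model):
    each edge (u, v) flips the state independently with probability flip_prob[(u, v)]."""
    states = {root: rng.randrange(2)}   # uniform root state
    stack = [root]
    while stack:
        u = stack.pop()
        for v in tree.get(u, []):
            flip = rng.random() < flip_prob[(u, v)]
            states[v] = states[u] ^ flip
            stack.append(v)
    return states

# A small 4-leaf tree with p = 0.1 on every edge (illustrative).
tree = {'r': ['a', 'b'], 'a': [1, 2], 'b': [3, 4]}
p = {e: 0.1 for e in [('r', 'a'), ('r', 'b'), ('a', 1), ('a', 2), ('b', 3), ('b', 4)]}
sample = cfn_sample(tree, 'r', p, random.Random(0))
# With small flip probabilities, most leaves usually agree with the root.
```

Running this many times generates exactly the kind of i.i.d. leaf data that the phylogenetic reconstruction problem on the next slide takes as input.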
18. Defn: Phylogenetic Reconstruction Problem
- Phylogenetic reconstruction:
  - Given k i.i.d. samples at the n leaves,
  - Task: fully reconstruct the model, i.e. find the tree and the mutation probabilities (and, if possible, do so efficiently).
- Studied in:
  - Biology (dozens of books, 1000s of papers) [Felsenstein 04]
  - TCS (learning): [Ambainis-Desper-Farach-Kannan 97], [Farach-Kannan 96], [Cryan-Goldberg-Goldberg 02], [M-Roch]
  - Combinatorial phylogeny: [Erdos-Steel-Szekely-Warnow 97, 98], [M 07]
[Figure: k i.i.d. binary samples observed at the leaves s(1), …, s(5).]
19. Phase Transition for the Ising Model
[Figure: low temperature (2θ² > 1) — the typical boundary has bias; high temperature (2θ² < 1) — the typical boundary has no bias.]
- The transition at 2θ² = 1 was proved by [Bleher-Ruiz-Zagrebnov 95], [Ioffe 96], [Evans-Kenyon-Peres-Schulman 00], [Kenyon-Mossel-Peres 01], [Martinelli-Sinclair-Weitz 04], [Borgs-Chayes-M-Roch 06]. Also, the spin-glass case was studied by [Chayes-Chayes-Sethna-Thouless 86]. Solvability for 2θ² > 1 was first proved by [Higuchi 77] (and [Kesten-Stigum 66]).
20. Steel's Favorite Conjecture
(n = number of leaves, k = number of samples)
- Conjectured: if the reconstruction problem is unsolvable (N), phylogeny requires k = n^Ω(1); if it is solvable (Y), k = O(log n) suffices.
- Proved (N ⇒ k = n^Ω(1)): [M 03] (J. Comp. Biol.).
- Proved (Y ⇒ k = O(log n)): random cluster model [M-Steel 04] (Math. Biosciences); CFN model [M 04] (Transactions of the AMS), [Daskalakis-M-Roch] (STOC 06).
21. Polynomial Lower Bound at High Mutations
- Conditional independence + the data processing lemma.
- In fact:
  - [M 06] (IEEE/ACM Trans. Comp. Biol. Bioinfo.): the shallow part of the tree can be efficiently reconstructed when k = O(log n), for all mutation rates.
  - Also works in practice: [Daskalakis-Hill-Jaffe-Mihaescu-M-Rao] (Recomb 06).
22. Reconstruction from Short Sequences
- Thm [Daskalakis-M-Roch (STOC 06)]: If T is a tree on n leaves s.t.
  - for all e, θmin < θ(e) < θmax, with 2θ²min > 1 and θmax < 1,
- then there exists a polynomial-time algorithm that uses sequences of length k = O(log n + log(1/δ)) to reconstruct the topology with probability 1-δ, where the constant depends on (θmin, θmax).
23. Proof: Distance Methods
- Associate to each edge e the weight -ln(1 - 2pe).
- For any two leaves i and j:
  - -ln(1 - 2pi,j) = Σ -ln(1 - 2pe),
  - where the sum is over all e on the path connecting i and j.
- Reconstruction algorithm:
  - Estimate the pi,j from the sequences.
  - Deduce the topology of the tree.
- Problem: may need exponentially long sequences.
- [ESSW]: log n radius neighborhoods determine the tree ⇒ poly(n) sequence length suffices.
24. Four-Point Method
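A minimal sketch of both ingredients (the helper names and the worked numbers are mine): estimate p_{i,j} as the disagreement frequency between two leaf sequences, map it to the additive distance -ln(1 - 2p), and resolve a quartet by the four-point test, i.e. pick the pairing with the smallest within-pair distance sum.

```python
import math

def distance(seq_i, seq_j):
    """CFN distance estimate: d = -ln(1 - 2*p_hat), where p_hat is the
    fraction of positions where the two leaf sequences disagree."""
    p_hat = sum(a != b for a, b in zip(seq_i, seq_j)) / len(seq_i)
    return -math.log(1 - 2 * p_hat)   # requires p_hat < 1/2

def four_point(d, i, j, k, l):
    """Return the pairing with the smallest sum of within-pair distances:
    by additivity, d(i,j) + d(k,l) < d(i,k) + d(j,l) = d(i,l) + d(j,k)
    exactly when ij | kl is the true split of the quartet."""
    splits = [((i, j), (k, l)), ((i, k), (j, l)), ((i, l), (j, k))]
    return min(splits, key=lambda s: d[s[0]] + d[s[1]])

# Toy check with exactly additive distances from the quartet ((1,2),(3,4)):
# pendant edges of length 1 and an internal edge of length 1.
d = {(1, 2): 2.0, (3, 4): 2.0, (1, 3): 3.0, (2, 4): 3.0, (1, 4): 3.0, (2, 3): 3.0}
best = four_point(d, 1, 2, 3, 4)   # -> ((1, 2), (3, 4))
```

With noisy estimated distances the same test works as long as the estimation error is smaller than half the internal edge length, which is why short sequences suffice only for short (local) quartets.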
25. Balanced Trees
- Two-step algorithm [M 2004]:
  - 1) Reconstruct one (or a few) level(s).
  - 2) Infer sequences at the roots.
  - 3) Start over.
26. General Trees [Daskalakis-M-Roch 2006]
27. Blindfolded Cherry Picking
- Needs only one extra step in the algorithm.
- Main loop:
  - 1) Distance estimation.
  - 2) Identify cherries from the next level.
  - 3) Sequence reconstruction.
  - 4) Detect fake cherries.
28. Blindfolded Cherry Picking I: Edge Disjointness
[Figure: the true tree vs. a non-edge-disjoint reconstruction.]
29. Blindfolded Cherry Picking II: Weight Estimation
30. Blindfolded Cherry Picking III: Collisions
31. Tree Reconstruction in a Nutshell
- Similar techniques apply to other tree networks, for example:
  - Reconstructing multicast networks [Liang-M-Yu], [Bhamidi-Rajagopal-Roch].
32. Back to the General Problem
- How to reconstruct the network topology from observations at a (sub)set of the nodes?
- Example 3: reconstructing Markov random fields from observations at a subset of the nodes ???
33. Part III: Optimization over Stochastic Networks
34. Motivating Problem
- Problem:
  - Optimization over stochastic models defined on networks.
- Examples:
  - Which genes to knock out in order to kill a cancer cell?
  - Which computers to immunize in order to make a network robust?
  - Which computers to attack in order to bring down the network?
  - Which individuals to immunize to stop a disease from spreading?
  - Viral marketing: which individuals to expose to a product so as to maximize its distribution?
- One case study: influence in social networks.
  - Joint work with Sebastien Roch.
35. models of collective behavior
- examples
  - joining a riot
  - adopting a product
  - going to a movie
- model features
  - binary decision
  - cascade effect
  - network structure
36. viral marketing
- referrals and word-of-mouth can be very effective
  - ex. Hotmail
- viral marketing
  - goal: mining the network value of potential customers
  - how to target a small set of trendsetters, "seeds"
- example: [Domingos-Richardson 02]
  - collaborative filtering system
  - uses an MRF to compute the influence of each customer
37. independent cascade model
- when a node is activated
  - it gets one chance to activate each neighbour
  - the probability of success from u to v is p_{u,v}
[Figure: a network with edge activation probabilities 0.25, 0.33, 0.5, 0.75, 1.0.]
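One run of the model can be sketched as follows (the three-node path and its probabilities are illustrative); averaging the final set size over many runs gives a Monte Carlo estimate of the expected infected size:

```python
import random

def independent_cascade(neighbors, p, seed_set, rng):
    """Run the independent cascade model once and return the final active set.
    neighbors: dict node -> list of out-neighbours; p[(u, v)]: success probability."""
    active = set(seed_set)
    frontier = list(seed_set)
    while frontier:
        new_frontier = []
        for u in frontier:
            for v in neighbors.get(u, []):
                # u gets exactly one chance to activate each neighbour v
                if v not in active and rng.random() < p[(u, v)]:
                    active.add(v)
                    new_frontier.append(v)
        frontier = new_frontier
    return active

# A path a -> b -> c with strong edges: seeding a usually infects the whole path.
neighbors = {'a': ['b'], 'b': ['c'], 'c': []}
p = {('a', 'b'): 0.9, ('b', 'c'): 0.9}
runs = [independent_cascade(neighbors, p, {'a'}, random.Random(i)) for i in range(1000)]
avg_size = sum(len(s) for s in runs) / 1000   # expected size is 1 + 0.9 + 0.81 = 2.71
```

Note each edge is examined at most once per run, matching the "one chance" rule above.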
38. generalized models
- graph G = (V,E), initial activated set S0
- generalized threshold model [Kempe-Kleinberg-Tardos 03,05]
  - activation functions fu(S), where S is the set of activated nodes
  - threshold value θu uniform in [0,1]
  - dynamics: at time t, set St to St-1 and add all nodes u with fu(St-1) ≥ θu
  - (note: the process stops after (at most) n-1 steps)
- generalized cascade model [KKT 03,05]
  - when node u is activated
    - it gets one chance to activate each neighbour
    - probability of success from u to v is pu(v,S), where S is the set of nodes that have already tried (and failed) to activate v
  - assumption: the pu(v,·)'s are order-independent
- theorem [KKT 03]: the two models are equivalent
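The threshold dynamics can be sketched directly. In the snippet below, the triangle graph, the uniform weights, and the capped-linear activation functions are illustrative assumptions; each node draws θu once, uniformly in [0,1], and activates as soon as fu of the current active set reaches it.

```python
import random

def threshold_process(nodes, f, seed_set, rng):
    """Generalized threshold model: theta_u ~ Uniform[0,1]; at each step,
    add every inactive u with f[u](active) >= theta_u; stop when nothing changes."""
    theta = {u: rng.random() for u in nodes}
    active = set(seed_set)
    changed = True
    while changed:
        changed = False
        newly = {u for u in nodes
                 if u not in active and f[u](active) >= theta[u]}
        if newly:
            active |= newly
            changed = True
    return active

# Linear-threshold-style example on a triangle (weights illustrative):
# f_u(S) = sum of incoming weights from active neighbours, capped at 1.
w = {('a', 'b'): 0.6, ('b', 'c'): 0.6, ('c', 'a'): 0.6,
     ('b', 'a'): 0.6, ('c', 'b'): 0.6, ('a', 'c'): 0.6}
nodes = ['a', 'b', 'c']
f = {u: (lambda S, u=u: min(1.0, sum(w.get((v, u), 0.0) for v in S)))
     for u in nodes}
final = threshold_process(nodes, f, {'a'}, random.Random(1))
```

Since at least one node must join in every continuing step, the loop runs at most n-1 times, matching the note above.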
39. influence maximization
- definition: the influence σ(S) of the initial seed S is the expected size of the infected set at termination
- definition: in the influence maximization problem (IMP), we want to find the seed S of fixed size k that maximizes the influence
- theorem [KKT 03]: the IMP is NP-hard
  - reduction from Set Cover: ground set U = {u1, …, un} and collection of cover subsets S1, …, Sm
[Figure: the reduction as an independent cascade model on a bipartite graph between the subsets S1, …, Sm and the elements u1, …, un.]
40. submodularity
- definition: a set function f : 2^V -> R is submodular if for all A, B ⊆ V,
  f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B)
- example: f(S) = g(|S|) where g is concave
- interpretation: discrete concavity or diminishing returns; indeed, submodularity is equivalent to
  f(A ∪ {x}) - f(A) ≥ f(B ∪ {x}) - f(B) for all A ⊆ B and x ∉ B
- threshold models
  - it is natural to assume that the activation functions have diminishing returns
  - supported by observations of [Leskovec-Adamic-Huberman 06] in the context of viral marketing
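Both the definition and the concave example can be checked by brute force on a small ground set (a sketch; the checker and the square-root example are mine):

```python
import itertools
import math

def is_submodular(f, ground):
    """Check f(A | B) + f(A & B) <= f(A) + f(B) for all subsets A, B of ground."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in itertools.combinations(ground, r)]
    return all(f(A | B) + f(A & B) <= f(A) + f(B) + 1e-12
               for A in subsets for B in subsets)

g = math.sqrt                     # a concave function
f = lambda S: g(len(S))           # f(S) = g(|S|), submodular since g is concave
ok = is_submodular(f, range(4))   # -> True
```

By contrast, f(S) = |S|² (a convex g) fails already on two disjoint singletons, which is the "increasing returns" situation submodularity rules out.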
41. main result
- theorem [M-Roch 06], first conjectured in [KKT 03]
  - in the generalized threshold model, if all activation functions are monotone and submodular, then the influence is also submodular
- corollary [M-Roch 06]: the IMP admits a (1 - e⁻¹ - ε)-approximation algorithm (for all ε > 0)
  - this follows from a general result on the approximation of submodular functions [Nemhauser-Wolsey-Fisher 78]
- known special cases [KKT 03,05]
  - linear threshold model, independent cascade model
  - decreasing cascade model, normalized submodular threshold model
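The approximation guarantee is achieved by the greedy algorithm of [Nemhauser-Wolsey-Fisher 78]: repeatedly add the seed with the largest estimated marginal gain. Below is a sketch on the independent cascade model; the two-star network, the probabilities, and the Monte Carlo budget are illustrative choices.

```python
import random

def cascade_size(neighbors, p, seed_set, rng):
    """One independent-cascade run; returns the number of nodes infected."""
    active, frontier = set(seed_set), list(seed_set)
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors.get(u, []):
                if v not in active and rng.random() < p[(u, v)]:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def influence(neighbors, p, seed_set, runs=300):
    """Monte Carlo estimate of the influence sigma(S)."""
    return sum(cascade_size(neighbors, p, seed_set, random.Random(i))
               for i in range(runs)) / runs

def greedy_seeds(neighbors, p, nodes, k):
    """Greedy maximization of the (submodular, monotone) influence:
    a (1 - 1/e - eps)-approximation up to the sampling error."""
    S = set()
    for _ in range(k):
        best = max((v for v in nodes if v not in S),
                   key=lambda v: influence(neighbors, p, S | {v}))
        S.add(best)
    return S

# Two strong stars: the greedy picks one hub from each (illustrative network).
neighbors = {'h1': ['a1', 'a2', 'a3'], 'h2': ['b1', 'b2', 'b3']}
p = {(h, v): 0.9 for h, vs in neighbors.items() for v in vs}
nodes = ['h1', 'h2', 'a1', 'a2', 'a3', 'b1', 'b2', 'b3']
seeds = greedy_seeds(neighbors, p, nodes, 2)   # -> {'h1', 'h2'}
```

The second greedy step illustrates diminishing returns: once one hub is seeded, the other hub's marginal gain stays large while seeding a leaf of the already-covered star adds almost nothing.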
42. related work
- sociology
  - threshold models [Granovetter 78], [Morris 00]
  - cascades [Watts 02]
- data mining
  - viral marketing [KKT 03,05], [Domingos-Richardson 02]
  - recommendation networks [Leskovec-Singh-Kleinberg 05], [Leskovec-Adamic-Huberman 06]
- economics
  - game-theoretic point of view [Ellison 93], [Young 02]
- probability theory
  - Markov random fields, Glauber dynamics
  - percolation
  - interacting particle systems: voter model, contact process
43. proof sketch
44. coupling
- we use the generalized threshold model
- for arbitrary sets A, B, consider 4 processes
  - (At) started at A
  - (Bt) started at B
  - (Ct) started at A ∪ B
  - (Dt) started at A ∩ B
- it suffices to couple the 4 processes in such a way that for all t,
  (1) Dt ⊆ At ∩ Bt and (2) Ct ⊆ At ∪ Bt
- indeed, at termination, |C| + |D| ≤ |A ∪ B| + |A ∩ B| = |A| + |B|
- (note: this works with |·| replaced by any monotone, submodular w)
45. proof ideas
- our goal: maintain (1) and (2) for all t
- antisense coupling
  - obvious way to couple: use the same θu's for all 4 processes
    - satisfies (1) but not (2)
  - antisense: using θu for (At) and 1-θu for (Bt) maximizes the union
  - we combine both couplings
- piecemeal growth
  - seed sets can be introduced in stages
  - we add A ∩ B, then A \ B, and finally B \ A
- need-to-know
  - not necessary to pick all θu's at the beginning
  - can unveil only what we need to know
46. piecemeal growth
- process started at S: (St)
- partition of S: S(1), …, S(K)
- consider the process (Tt):
  - pick the θu's
  - run the process with seed S(1) until termination
  - add S(2) and continue until termination
  - add S(3), and so on
- lemma: the sets Sn-1 and TKn-1 have the same distribution
47. antisense coupling
- disjoint sets S, T
- partition of S: S(1), …, S(K)
- piecemeal process with seeds S(1), …, S(K), T: (St)
- consider the process (Tt):
  - pick the θu's
  - run the piecemeal process with seeds S(1), …, S(K) until termination
  - add T and continue with the flipped threshold values (next slide)
- lemma: the sets S(K+1)n-1 and T(K+1)n-1 have the same distribution
48. need-to-know
- proof of lemma:
  - run the first K stages identically in both processes
  - note that for all v not in SKn-1 = TKn-1, θv is uniformly distributed in [fv(TKn-1), 1]
  - but θ'v = 1 - θv + fv(TKn-1) has the same distribution
[Figure: simulations 1 and 2 side by side.]
49. proof I
[Figure: the antisense (ANTI) coupling.]
50. proof II
[Figure: the antisense (ANTI) coupling, continued.]
51. proof III
- the new processes have the correct final distribution
- up to time 2n-1, Bt = Ct and At = Dt
- for time 2n, conclude by monotonicity and submodularity
- then proceed by induction
52. general result
- we have proved:
- theorem [Mossel-Roch 06]: in the generalized threshold model, if all activation functions are monotone and submodular, then for any monotone, submodular function w, the generalized influence σw(S) = E[w(final active set started at S)] is submodular
- Note: a closure property for submodular functions!
53. Future Research Directions
- Study optimization problems for other stochastic models defined on networks.
- And another annoying problem where discrete probability may help:
  - Are there (easily computable? probabilistic?) invariants of unlabelled graphs that uniquely determine them?
  - Motivation: can one efficiently check whether two graphs are isomorphic?