Title: Discrete Probability on Graphs: Estimation, Reconstruction, and Optimization on Networks
- Elchanan Mossel
- UC Berkeley
- At IPAM, March 2007
2. Outline: Stochastic Models on Networks
- Disclaimer: a big field; a biased choice of examples, from an applied view.
- Part 0: Two types of network problems.
- Part I: Estimation of statistical quantities in Gibbs measures / Markov random fields.
- Part II: Reconstruction of stochastic networks from observations.
  - Tree networks.
  - Directed acyclic graphs.
- Part III: Optimization over stochastic models defined on networks.
  - Which functions of stochastic models can be (approximately) optimized efficiently?
3. Part 0: Two Types of Network Problems
4. Two Types of Network Problems
- Type 1: Structural network problems.
- Type 2: Distributional network problems.
- This talk: mostly distributional network problems.
- Examples of structural network problems:
  - Clustering: partition a graph G = (V,E) into V = V1 ∪ … ∪ Vk such that each Vi is big and there are few edges between Vi and Vj for i ≠ j.
  - Ranking: given a random walk on a finite set, find the stationary distribution.
- Spectral techniques are applicable to both problems.
5. A Hard Structural Network Problem
- The Graph Isomorphism Problem:
  - Given two graphs (G,E) and (H,F), is there an isomorphism, i.e. a one-to-one map f : G → H s.t. (v1,v2) ∈ E iff (f(v1),f(v2)) ∈ F?
- Clearly, if two graphs are isomorphic, then they have the same spectral structure, but this is not enough.
- Other open problems exist in this area.
- Example of recent work.
6. Part I: Estimation in Markov Random Fields
7. Gibbs Measures / Graphical Models
- A Gibbs measure on a (finite) graph G = (V,E) is given by
  - node potentials (ψv : v ∈ V) and
  - edge potentials (ψe : e ∈ E).
- The probability of σ = (σ(v) : v ∈ V) ∈ A^V is given by
  P[σ] = Z⁻¹ ∏_{v ∈ V} ψv(σ(v)) ∏_{e=(v,u) ∈ E} ψe(σ(v), σ(u)).
- Gibbs measures were introduced in statistical physics.
- Essential in machine learning.
- Also known as Markov random fields, graphical models, etc.
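To make the definition concrete, P[σ] can be computed by brute-force enumeration on a tiny graph. The sketch below is illustrative, not from the talk: a path graph with agreement-favouring edge potentials and a binary state space, with the partition function Z computed explicitly.

```python
import itertools

def weight(sigma, nodes, edges, node_pot, edge_pot):
    """Unnormalized Gibbs weight: product of node and edge potentials."""
    w = 1.0
    for v in nodes:
        w *= node_pot[v][sigma[v]]
    for (u, v) in edges:
        w *= edge_pot[(u, v)][sigma[u]][sigma[v]]
    return w

def gibbs_measure(nodes, edges, node_pot, edge_pot, states=(0, 1)):
    """P[sigma] = Z^-1 * weight(sigma), with Z found by enumerating A^V."""
    configs = [dict(zip(nodes, s))
               for s in itertools.product(states, repeat=len(nodes))]
    ws = [weight(c, nodes, edges, node_pot, edge_pot) for c in configs]
    Z = sum(ws)
    return [(c, w / Z) for c, w in zip(configs, ws)]

# Toy Ising-like model on the path 0 - 1 - 2: edges favour agreement.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
node_pot = {v: {0: 1.0, 1: 1.0} for v in nodes}
agree = {0: {0: 2.0, 1: 1.0}, 1: {0: 1.0, 1: 2.0}}
edge_pot = {e: agree for e in edges}

dist = gibbs_measure(nodes, edges, node_pot, edge_pot)
marginal0 = sum(p for c, p in dist if c[0] == 0)  # P[sigma(0) = 0] = 0.5 by symmetry
```

This exhaustive normalization costs |A|^|V| and is exactly what becomes intractable on large graphs, motivating the message passing algorithms on the next slides.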
8. Message Passing Algorithms / The Replica Method
- Statistical problem: given a Gibbs measure, estimate P[σ(0) = a].
- Equivalent to many other inference problems.
- Computational view: the problem can be NP-hard (to approximate) even in very simple cases.
- Statistical physics view: find dynamics / Markov chains that have P as their stationary measure.
- Statistical physics insight: rapid convergence of the dynamics ↔ spatial correlation decay.
- A very active area of research; fascinating challenges.
- Artificial intelligence / neuroscience / replica view: solve the problem by message passing.
9. Message Passing Algorithms / The Replica Method
- Message passing algorithms are used to estimate probabilities on graphical models.
- Examples: Warning Propagation, Sum-Product, Belief Propagation, etc.
- All of these algorithms do an exact calculation for an associated computation tree.
- Example: Belief Propagation (BP) is a popular method in AI/coding for estimating marginal probabilities P[σ(0) = a] for a Gibbs measure on G.
- It is equivalent [Tatikonda-Jordan 02] to calculating marginal probabilities P[σ(0) = a] on the computation tree T(G).
- Question: how come message passing algorithms work in practice?
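As a sketch of the idea (the notation and the toy two-leaf tree are mine, not the talk's), belief propagation on a tree passes messages from the leaves up and reads the exact root marginal off the incoming messages:

```python
def message(child, parent, tree, node_pot, edge_pot, states):
    """Upward BP message m_{child -> parent}(x_parent), computed recursively."""
    sub = [message(g, child, tree, node_pot, edge_pot, states)
           for g in tree.get(child, [])]
    out = {}
    for xp in states:
        total = 0.0
        for xc in states:
            w = node_pot[child][xc] * edge_pot[(parent, child)][xp][xc]
            for m in sub:
                w *= m[xc]
            total += w
        out[xp] = total
    return out

def root_marginal(tree, root, node_pot, edge_pot, states=(0, 1)):
    """Exact root marginal: node potential times all incoming messages, normalized."""
    msgs = [message(c, root, tree, node_pot, edge_pot, states)
            for c in tree.get(root, [])]
    belief = {}
    for x in states:
        b = node_pot[root][x]
        for m in msgs:
            b *= m[x]
        belief[x] = b
    Z = sum(belief.values())
    return {x: b / Z for x, b in belief.items()}

# Root r with children a, b; edges favour agreement; b is biased toward state 1.
tree = {'r': ['a', 'b']}
node_pot = {'r': {0: 1.0, 1: 1.0}, 'a': {0: 1.0, 1: 1.0}, 'b': {0: 1.0, 1: 3.0}}
agree = {0: {0: 2.0, 1: 1.0}, 1: {0: 1.0, 1: 2.0}}
edge_pot = {('r', 'a'): agree, ('r', 'b'): agree}
marg = root_marginal(tree, 'r', node_pot, edge_pot)  # bias at b pulls the root toward 1
```

On a graph with cycles the same recursion no longer computes exact marginals; as the slide notes, it corresponds instead to an exact computation on the computation tree T(G).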
10. Message Passing Algorithms in Coding
- In coding:
  - BP is used to decode Low Density Parity Check codes (LDPC) [Gallager 62].
  - Proved to be efficient [Luby-Mitzenmacher-Shokrollahi-Spielman 98], [Richardson-Urbanke 01].
- Message passing algorithms work because:
  - LDPC factor graphs are locally tree-like.
  - Individual constraints push toward the correct codeword.
- The actual analysis uses a recursion of random variables on the tree.
11. Message Passing Algorithms: Random 3-SAT
[Figure: a random 3-SAT instance on variables x1, …, x8 with m = αn clauses, showing the ranges of the clause density α where WalkSAT, Survey Propagation, Belief Propagation, Myopic, and PLR succeed, against the satisfiable / not-satisfiable thresholds.]
12. Message Passing Algorithms for Random 3-SAT
- Message passing algorithms work because:
  - Random-SAT graphs are locally tree-like.
  - Far-away variables are uncorrelated.
- Speculation 1: for Belief Propagation, variables are uncorrelated in a standard sense when α < 3.95.
- Thm [Maneva-M-Wainwright 05]: Survey Propagation is just Belief Propagation on an extended Markov random field.
- Speculation 2: for Survey Propagation, variables are uncorrelated in the extended Markov random field for all α.
- Speculations 1 & 2 are under heated discussion among physicists, computer scientists and mathematicians.
[Photos: M. Talagrand, G. Parisi, B. Selman]
13. Decay of Correlation for the 3-SAT Extended MRF
[Figure: partial assignments in {0,1,*}^n, e.g. 01**1*0* and *10*11**, extending {0,1}^n assignments such as 01101***; * marks a "star" (unassigned) variable.]
14. Part II: Reconstructing Stochastic Networks from Observations
15. Main Problem
- How to reconstruct the network topology from observations at a (sub)set of the nodes?
- The example: reconstructing trees.
16. Two Tree Inference Problems
- In evolution:
  - Given a tree of species / mothers, can we infer the ancestral sequence at the root from contemporary samples?
  - Phase transition: trade-off between noise and duplication?
- Reconstructing evolution:
  - Is it possible to reconstruct evolutionary history from genetic sequences?
17. Defn: Markov Model on a Tree
- Ising/BSC/CFN model:
  - Tree T = (V,E)
  - Node states s(v) ∈ {0,1}
  - Mutation probabilities p_e
  - Number of leaves n
- 0 = purines (A,G); 1 = pyrimidines (C,T).
[Figure: a rooted tree with root state s(r), internal states s(a), s(b), s(c), leaf states s(1), …, s(5), and edge mutation probabilities p_ra, p_rc, p_ab, p_a3, p_b1, p_b2, p_c4, p_c5; a binary sequence 001100011101000011000100 is broadcast from the root.]
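The broadcast process defining this model is straightforward to simulate: draw a uniform root state, then flip the bit independently along each edge with that edge's mutation probability. This is a sketch; the tree shape and the uniform rate p = 0.1 are illustrative choices.

```python
import random

def cfn_sample(tree, root, flip_prob, rng):
    """Broadcast a 0/1 state from the root down the tree (CFN/Ising/BSC model):
    each edge (u, v) flips the state independently with probability flip_prob[(u, v)]."""
    states = {root: rng.randrange(2)}   # uniform root state
    stack = [root]
    while stack:
        u = stack.pop()
        for v in tree.get(u, []):
            flip = rng.random() < flip_prob[(u, v)]
            states[v] = states[u] ^ flip
            stack.append(v)
    return states

# A small 4-leaf tree with p = 0.1 on every edge (illustrative).
tree = {'r': ['a', 'b'], 'a': [1, 2], 'b': [3, 4]}
p = {e: 0.1 for e in [('r', 'a'), ('r', 'b'), ('a', 1), ('a', 2), ('b', 3), ('b', 4)]}
sample = cfn_sample(tree, 'r', p, random.Random(0))
# With small flip probabilities, most leaves usually agree with the root.
```

Running this many times generates exactly the kind of i.i.d. leaf data that the phylogenetic reconstruction problem on the next slide takes as input.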
18. Defn: Phylogenetic Reconstruction Problem
- Phylogenetic reconstruction:
  - Given k i.i.d. samples at the n leaves,
  - Task: fully reconstruct the model, i.e. find the tree and the mutation probabilities (and, if possible, do so efficiently).
- Studied in:
  - Biology (dozens of books, 1000s of papers) [Felsenstein 04]
  - TCS (learning): [Ambainis-Desper-Farach-Kannan 97], [Farach-Kannan 96], [Cryan-Goldberg-Goldberg 02], [M-Roch]
  - Combinatorial phylogeny: [Erdos-Steel-Szekely-Warnow 97, 98], [M 07]
[Figure: k i.i.d. binary samples observed at the leaves s(1), …, s(5).]
19. Phase Transition for the Ising Model
[Figure: low temperature (2θ² > 1) — the typical boundary has bias; high temperature (2θ² < 1) — the typical boundary has no bias.]
- The transition at 2θ² = 1 was proved by [Bleher-Ruiz-Zagrebnov 95], [Ioffe 96], [Evans-Kenyon-Peres-Schulman 00], [Kenyon-Mossel-Peres 01], [Martinelli-Sinclair-Weitz 04], [Borgs-Chayes-M-Roch 06]. Also, the spin-glass case was studied by [Chayes-Chayes-Sethna-Thouless 86]. Solvability for 2θ² > 1 was first proved by [Higuchi 77] (and [Kesten-Stigum 66]).
20. Steel's Favorite Conjecture
(n = number of leaves, k = number of samples)
- Conjectured: if the reconstruction problem is unsolvable (N), phylogeny requires k = n^Ω(1); if it is solvable (Y), k = O(log n) suffices.
- Proved (N ⇒ k = n^Ω(1)): [M 03] (J. Comp. Biol.).
- Proved (Y ⇒ k = O(log n)): random cluster model [M-Steel 04] (Math. Biosciences); CFN model [M 04] (Transactions of the AMS), [Daskalakis-M-Roch] (STOC 06).
21. Polynomial Lower Bound at High Mutations
- Conditional independence + the data processing lemma.
- In fact:
  - [M 06] (IEEE/ACM Trans. Comp. Biol. Bioinfo.): the shallow part of the tree can be efficiently reconstructed when k = O(log n), for all mutation rates.
  - Also works in practice: [Daskalakis-Hill-Jaffe-Mihaescu-M-Rao] (Recomb 06).
22. Reconstruction from Short Sequences
- Thm [Daskalakis-M-Roch (STOC 06)]: If T is a tree on n leaves s.t.
  - for all e, θmin < θ(e) < θmax, with 2θ²min > 1 and θmax < 1,
- then there exists a polynomial-time algorithm that uses sequences of length k = O(log n + log(1/δ)) to reconstruct the topology with probability 1-δ, where the constant depends on (θmin, θmax).
23. Proof: Distance Methods
- Associate to each edge e the weight -ln(1 - 2pe).
- For any two leaves i and j:
  - -ln(1 - 2pi,j) = Σ -ln(1 - 2pe),
  - where the sum is over all e on the path connecting i and j.
- Reconstruction algorithm:
  - Estimate the pi,j from the sequences.
  - Deduce the topology of the tree.
- Problem: may need exponentially long sequences.
- [ESSW]: log n radius neighborhoods determine the tree ⇒ poly(n) sequence length suffices.
24. Four-Point Method
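A minimal sketch of both ingredients (the helper names and the worked numbers are mine): estimate p_{i,j} as the disagreement frequency between two leaf sequences, map it to the additive distance -ln(1 - 2p), and resolve a quartet by the four-point test, i.e. pick the pairing with the smallest within-pair distance sum.

```python
import math

def distance(seq_i, seq_j):
    """CFN distance estimate: d = -ln(1 - 2*p_hat), where p_hat is the
    fraction of positions where the two leaf sequences disagree."""
    p_hat = sum(a != b for a, b in zip(seq_i, seq_j)) / len(seq_i)
    return -math.log(1 - 2 * p_hat)   # requires p_hat < 1/2

def four_point(d, i, j, k, l):
    """Return the pairing with the smallest sum of within-pair distances:
    by additivity, d(i,j) + d(k,l) < d(i,k) + d(j,l) = d(i,l) + d(j,k)
    exactly when ij | kl is the true split of the quartet."""
    splits = [((i, j), (k, l)), ((i, k), (j, l)), ((i, l), (j, k))]
    return min(splits, key=lambda s: d[s[0]] + d[s[1]])

# Toy check with exactly additive distances from the quartet ((1,2),(3,4)):
# pendant edges of length 1 and an internal edge of length 1.
d = {(1, 2): 2.0, (3, 4): 2.0, (1, 3): 3.0, (2, 4): 3.0, (1, 4): 3.0, (2, 3): 3.0}
best = four_point(d, 1, 2, 3, 4)   # -> ((1, 2), (3, 4))
```

With noisy estimated distances the same test works as long as the estimation error is smaller than half the internal edge length, which is why short sequences suffice only for short (local) quartets.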
25. Balanced Trees
- Two-step algorithm [M 2004]:
  - 1) Reconstruct one (or a few) level(s).
  - 2) Infer sequences at the roots.
  - 3) Start over.
26. General Trees [Daskalakis-M-Roch 2006]
27. Blindfolded Cherry Picking
- Needs only one extra step in the algorithm.
- Main loop:
  - 1) Distance estimation.
  - 2) Identify cherries from the next level.
  - 3) Sequence reconstruction.
  - 4) Detect fake cherries.
28. Blindfolded Cherry Picking I: Edge Disjointness
[Figure: the true tree vs. a non-edge-disjoint reconstruction.]
29. Blindfolded Cherry Picking II: Weight Estimation
30. Blindfolded Cherry Picking III: Collisions
31. Tree Reconstruction in a Nutshell
- Similar techniques apply to other tree networks, for example:
  - Reconstructing multicast networks [Liang-M-Yu], [Bhamidi-Rajagopal-Roch].
32. Back to the General Problem
- How to reconstruct the network topology from observations at a (sub)set of the nodes?
- Example 3: reconstructing Markov random fields from observations at a subset of the nodes ???
33. Part III: Optimization over Stochastic Networks
34. Motivating Problem
- Problem:
  - Optimization over stochastic models defined on networks.
- Examples:
  - Which genes to knock out in order to kill a cancer cell?
  - Which computers to immunize in order to make a network robust?
  - Which computers to attack in order to bring down the network?
  - Which individuals to immunize to stop a disease from spreading?
  - Viral marketing: which individuals to expose to a product so as to maximize its distribution?
- One case study: influence in social networks.
  - Joint work with Sebastien Roch.
35. models of collective behavior
- examples
  - joining a riot
  - adopting a product
  - going to a movie
- model features
  - binary decision
  - cascade effect
  - network structure
36. viral marketing
- referrals and word-of-mouth can be very effective
  - ex. Hotmail
- viral marketing
  - goal: mining the network value of potential customers
  - how to target a small set of trendsetters, "seeds"
- example: [Domingos-Richardson 02]
  - collaborative filtering system
  - uses an MRF to compute the influence of each customer
37. independent cascade model
- when a node is activated
  - it gets one chance to activate each neighbour
  - the probability of success from u to v is p_{u,v}
[Figure: a network with edge activation probabilities 0.25, 0.33, 0.5, 0.75, 1.0.]
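One run of the model can be sketched as follows (the three-node path and its probabilities are illustrative); averaging the final set size over many runs gives a Monte Carlo estimate of the expected infected size:

```python
import random

def independent_cascade(neighbors, p, seed_set, rng):
    """Run the independent cascade model once and return the final active set.
    neighbors: dict node -> list of out-neighbours; p[(u, v)]: success probability."""
    active = set(seed_set)
    frontier = list(seed_set)
    while frontier:
        new_frontier = []
        for u in frontier:
            for v in neighbors.get(u, []):
                # u gets exactly one chance to activate each neighbour v
                if v not in active and rng.random() < p[(u, v)]:
                    active.add(v)
                    new_frontier.append(v)
        frontier = new_frontier
    return active

# A path a -> b -> c with strong edges: seeding a usually infects the whole path.
neighbors = {'a': ['b'], 'b': ['c'], 'c': []}
p = {('a', 'b'): 0.9, ('b', 'c'): 0.9}
runs = [independent_cascade(neighbors, p, {'a'}, random.Random(i)) for i in range(1000)]
avg_size = sum(len(s) for s in runs) / 1000   # expected size is 1 + 0.9 + 0.81 = 2.71
```

Note each edge is examined at most once per run, matching the "one chance" rule above.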
38. generalized models
- graph G = (V,E), initial activated set S0
- generalized threshold model [Kempe-Kleinberg-Tardos 03,05]
  - activation functions fu(S), where S is the set of activated nodes
  - threshold value θu uniform in [0,1]
  - dynamics: at time t, set St to St-1 and add all nodes u with fu(St-1) ≥ θu
  - (note: the process stops after (at most) n-1 steps)
- generalized cascade model [KKT 03,05]
  - when node u is activated
    - it gets one chance to activate each neighbour
    - probability of success from u to v is pu(v,S), where S is the set of nodes that have already tried (and failed) to activate v
  - assumption: the pu(v,·)'s are order-independent
- theorem [KKT 03]: the two models are equivalent
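The threshold dynamics can be sketched directly. In the snippet below, the triangle graph, the uniform weights, and the capped-linear activation functions are illustrative assumptions; each node draws θu once, uniformly in [0,1], and activates as soon as fu of the current active set reaches it.

```python
import random

def threshold_process(nodes, f, seed_set, rng):
    """Generalized threshold model: theta_u ~ Uniform[0,1]; at each step,
    add every inactive u with f[u](active) >= theta_u; stop when nothing changes."""
    theta = {u: rng.random() for u in nodes}
    active = set(seed_set)
    changed = True
    while changed:
        changed = False
        newly = {u for u in nodes
                 if u not in active and f[u](active) >= theta[u]}
        if newly:
            active |= newly
            changed = True
    return active

# Linear-threshold-style example on a triangle (weights illustrative):
# f_u(S) = sum of incoming weights from active neighbours, capped at 1.
w = {('a', 'b'): 0.6, ('b', 'c'): 0.6, ('c', 'a'): 0.6,
     ('b', 'a'): 0.6, ('c', 'b'): 0.6, ('a', 'c'): 0.6}
nodes = ['a', 'b', 'c']
f = {u: (lambda S, u=u: min(1.0, sum(w.get((v, u), 0.0) for v in S)))
     for u in nodes}
final = threshold_process(nodes, f, {'a'}, random.Random(1))
```

Since at least one node must join in every continuing step, the loop runs at most n-1 times, matching the note above.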
39. influence maximization
- definition: the influence σ(S) of the initial seed S is the expected size of the infected set at termination
- definition: in the influence maximization problem (IMP), we want to find the seed S of fixed size k that maximizes the influence
- theorem [KKT 03]: the IMP is NP-hard
  - reduction from Set Cover: ground set U = {u1, …, un} and collection of cover subsets S1, …, Sm
[Figure: the reduction as an independent cascade model on a bipartite graph between the subsets S1, …, Sm and the elements u1, …, un.]
40. submodularity
- definition: a set function f : 2^V -> R is submodular if for all A, B ⊆ V,
  f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B)
- example: f(S) = g(|S|) where g is concave
- interpretation: discrete concavity or diminishing returns; indeed, submodularity is equivalent to
  f(A ∪ {x}) - f(A) ≥ f(B ∪ {x}) - f(B) for all A ⊆ B and x ∉ B
- threshold models
  - it is natural to assume that the activation functions have diminishing returns
  - supported by observations of [Leskovec-Adamic-Huberman 06] in the context of viral marketing
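Both the definition and the concave example can be checked by brute force on a small ground set (a sketch; the checker and the square-root example are mine):

```python
import itertools
import math

def is_submodular(f, ground):
    """Check f(A | B) + f(A & B) <= f(A) + f(B) for all subsets A, B of ground."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in itertools.combinations(ground, r)]
    return all(f(A | B) + f(A & B) <= f(A) + f(B) + 1e-12
               for A in subsets for B in subsets)

g = math.sqrt                     # a concave function
f = lambda S: g(len(S))           # f(S) = g(|S|), submodular since g is concave
ok = is_submodular(f, range(4))   # -> True
```

By contrast, f(S) = |S|² (a convex g) fails already on two disjoint singletons, which is the "increasing returns" situation submodularity rules out.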
41. main result
- theorem [M-Roch 06], first conjectured in [KKT 03]
  - in the generalized threshold model, if all activation functions are monotone and submodular, then the influence is also submodular
- corollary [M-Roch 06]: the IMP admits a (1 - e⁻¹ - ε)-approximation algorithm (for all ε > 0)
  - this follows from a general result on the approximation of submodular functions [Nemhauser-Wolsey-Fisher 78]
- known special cases [KKT 03,05]
  - linear threshold model, independent cascade model
  - decreasing cascade model, normalized submodular threshold model
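The approximation guarantee is achieved by the greedy algorithm of [Nemhauser-Wolsey-Fisher 78]: repeatedly add the seed with the largest estimated marginal gain. Below is a sketch on the independent cascade model; the two-star network, the probabilities, and the Monte Carlo budget are illustrative choices.

```python
import random

def cascade_size(neighbors, p, seed_set, rng):
    """One independent-cascade run; returns the number of nodes infected."""
    active, frontier = set(seed_set), list(seed_set)
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors.get(u, []):
                if v not in active and rng.random() < p[(u, v)]:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def influence(neighbors, p, seed_set, runs=300):
    """Monte Carlo estimate of the influence sigma(S)."""
    return sum(cascade_size(neighbors, p, seed_set, random.Random(i))
               for i in range(runs)) / runs

def greedy_seeds(neighbors, p, nodes, k):
    """Greedy maximization of the (submodular, monotone) influence:
    a (1 - 1/e - eps)-approximation up to the sampling error."""
    S = set()
    for _ in range(k):
        best = max((v for v in nodes if v not in S),
                   key=lambda v: influence(neighbors, p, S | {v}))
        S.add(best)
    return S

# Two strong stars: the greedy picks one hub from each (illustrative network).
neighbors = {'h1': ['a1', 'a2', 'a3'], 'h2': ['b1', 'b2', 'b3']}
p = {(h, v): 0.9 for h, vs in neighbors.items() for v in vs}
nodes = ['h1', 'h2', 'a1', 'a2', 'a3', 'b1', 'b2', 'b3']
seeds = greedy_seeds(neighbors, p, nodes, 2)   # -> {'h1', 'h2'}
```

The second greedy step illustrates diminishing returns: once one hub is seeded, the other hub's marginal gain stays large while seeding a leaf of the already-covered star adds almost nothing.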
42. related work
- sociology
  - threshold models [Granovetter 78], [Morris 00]
  - cascades [Watts 02]
- data mining
  - viral marketing [KKT 03,05], [Domingos-Richardson 02]
  - recommendation networks [Leskovec-Singh-Kleinberg 05], [Leskovec-Adamic-Huberman 06]
- economics
  - game-theoretic point of view [Ellison 93], [Young 02]
- probability theory
  - Markov random fields, Glauber dynamics
  - percolation
  - interacting particle systems: voter model, contact process
43. proof sketch
44. coupling
- we use the generalized threshold model
- for arbitrary sets A, B, consider 4 processes
  - (At) started at A
  - (Bt) started at B
  - (Ct) started at A ∪ B
  - (Dt) started at A ∩ B
- it suffices to couple the 4 processes in such a way that for all t,
  (1) Dt ⊆ At ∩ Bt and (2) Ct ⊆ At ∪ Bt
- indeed, at termination, |C| + |D| ≤ |A ∪ B| + |A ∩ B| = |A| + |B|
- (note: this works with |·| replaced by any monotone, submodular w)
45. proof ideas
- our goal: maintain (1) and (2) for all t
- antisense coupling
  - obvious way to couple: use the same θu's for all 4 processes
    - satisfies (1) but not (2)
  - antisense: using θu for (At) and 1-θu for (Bt) maximizes the union
  - we combine both couplings
- piecemeal growth
  - seed sets can be introduced in stages
  - we add A ∩ B, then A \ B, and finally B \ A
- need-to-know
  - not necessary to pick all θu's at the beginning
  - can unveil only what we need to know
46. piecemeal growth
- process started at S: (St)
- partition of S: S(1), …, S(K)
- consider the process (Tt):
  - pick the θu's
  - run the process with seed S(1) until termination
  - add S(2) and continue until termination
  - add S(3), and so on
- lemma: the sets Sn-1 and TKn-1 have the same distribution
47. antisense coupling
- disjoint sets S, T
- partition of S: S(1), …, S(K)
- piecemeal process with seeds S(1), …, S(K), T: (St)
- consider the process (Tt):
  - pick the θu's
  - run the piecemeal process with seeds S(1), …, S(K) until termination
  - add T and continue with the flipped threshold values (next slide)
- lemma: the sets S(K+1)n-1 and T(K+1)n-1 have the same distribution
48. need-to-know
- proof of lemma:
  - run the first K stages identically in both processes
  - note that for all v not in SKn-1 = TKn-1, θv is uniformly distributed in [fv(TKn-1), 1]
  - but θ'v = 1 - θv + fv(TKn-1) has the same distribution
[Figure: simulations 1 and 2 side by side.]
49. proof I
[Figure: the antisense (ANTI) coupling.]
50. proof II
[Figure: the antisense (ANTI) coupling, continued.]
51. proof III
- the new processes have the correct final distribution
- up to time 2n-1, Bt = Ct and At = Dt
- for time 2n, conclude by monotonicity and submodularity
- then proceed by induction
52. general result
- we have proved:
- theorem [Mossel-Roch 06]: in the generalized threshold model, if all activation functions are monotone and submodular, then for any monotone, submodular function w, the generalized influence σw(S) = E[w(final active set started at S)] is submodular
- Note: a closure property for submodular functions!
53. Future Research Directions
- Study optimization problems for other stochastic models defined on networks.
- And another annoying problem where discrete probability may help:
  - Are there (easily computable? probabilistic?) invariants of unlabelled graphs that uniquely determine them?
  - Motivation: can one efficiently check whether two graphs are isomorphic?