1
Discrete Probability on Graphs: Estimation, Reconstruction and Optimization on Networks
  • Elchanan Mossel
  • UC Berkeley
  • At IPAM, March 2007

2
Outline: Stochastic Models on Networks
  • Disclaimer: this is a big field; a biased choice of examples, from an applied view.
  • Part 0: Two types of network problems.
  • Part I: Estimation of statistical quantities in Gibbs measures / Markov random fields.
  • Part II: Reconstruction of stochastic networks from observations.
    - Tree networks.
    - Directed acyclic graphs.
  • Part III: Optimization over stochastic models defined on networks.
    - Which functions of stochastic models can be (approximately) optimized efficiently?

3
Part 0: Two Types of Network Problems
4
Two types of Network problems
  • Type 1: Structural network problems.
  • Type 2: Distributional network problems.
  • This talk: mostly distributional network problems.
  • Examples of structural network problems:
  • Clustering: Partition a graph G = (V,E) into V = V_1, …, V_k such that each V_i is big and there is a small number of edges between V_i and V_j for i ≠ j.
  • Ranking: Given a random walk on a finite set, find the stationary distribution.
  • Spectral techniques are applicable to both problems; a minimal sketch of the ranking case follows.
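To illustrate the ranking example, here is a minimal power-iteration sketch (my addition, not from the talk); the 3-state chain is a made-up example.

```python
import numpy as np

def stationary_distribution(P, tol=1e-10, max_iter=10000):
    """Power iteration: repeatedly apply the transition matrix P
    (rows sum to 1) to a distribution until it stops changing."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])  # start uniform
    for _ in range(max_iter):
        nxt = pi @ P                            # one step of the walk
        if np.abs(nxt - pi).sum() < tol:
            break
        pi = nxt
    return pi

# Random walk on 3 states; the self-loop at state 0 makes it aperiodic.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
print(stationary_distribution(P))  # approx [0.5, 0.25, 0.25]
```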

5
A Hard Structural Network Problem
  • The Graph Isomorphism Problem:
  • Given two graphs (G,E) and (H,F), is there an isomorphism, i.e. a one-to-one map f : G → H s.t. (v_1,v_2) ∈ E iff (f(v_1),f(v_2)) ∈ F?
  • Clearly, if two graphs are isomorphic then they have the same spectral structure, but this is not enough.
  • Other open problems exist in this area.
  • Example of recent work

6
Part I: Estimation in Markov Random Fields
7
Gibbs Measures / Graphical Models
  • A Gibbs measure on a (finite) graph G = (V,E) is given by:
  • Node potentials (Ψ_v : v ∈ V) and
  • Edge potentials (Ψ_e : e ∈ E).
  • The probability of σ = (σ(v) : v ∈ V) ∈ A^V is given by
    P[σ] = Z^{-1} ∏_{v ∈ V} Ψ_v(σ(v)) ∏_{e=(v,u) ∈ E} Ψ_e(σ(v), σ(u))
    (see the brute-force sketch below).
  • Gibbs measures were introduced in statistical physics.
  • Essential in machine learning.
  • Also known as Markov random fields, graphical models, etc.
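To make the definition concrete, here is a brute-force sketch (my addition, not from the slides) that computes P[σ] by enumerating all configurations; the node and edge potential dictionaries are hypothetical examples.

```python
import itertools

def gibbs_prob(nodes, edges, node_pot, edge_pot, sigma, states):
    """P[sigma] = Z^{-1} * prod_v psi_v(sigma(v)) * prod_e psi_e(...),
    with Z computed by exhaustive enumeration (exponential: demo only)."""
    def weight(cfg):
        w = 1.0
        for v in nodes:
            w *= node_pot[v][cfg[v]]
        for (u, v) in edges:
            w *= edge_pot[(u, v)][cfg[u]][cfg[v]]
        return w
    Z = sum(weight(dict(zip(nodes, vals)))
            for vals in itertools.product(states, repeat=len(nodes)))
    return weight(sigma) / Z

# Ising-like model on the path a - b - c with states {0, 1}.
nodes, edges, states = ["a", "b", "c"], [("a", "b"), ("b", "c")], [0, 1]
node_pot = {v: {0: 1.0, 1: 1.0} for v in nodes}   # uniform node potentials
edge_pot = {e: {0: {0: 2.0, 1: 1.0},              # ferromagnetic: agreement
                1: {0: 1.0, 1: 2.0}} for e in edges}  # is twice as likely
print(gibbs_prob(nodes, edges, node_pot, edge_pot,
                 {"a": 0, "b": 0, "c": 0}, states))
```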

8
Message Passing Algorithms / The Replica Method
  • Statistical problem: Given a Gibbs measure, estimate P[σ(0) = a].
  • Equivalent to many other inference problems.
  • Computational view: The problem can be NP-hard (to approximate) even in very simple cases.
  • Statistical physics view: Find dynamics / Markov chains that have P as their stationary measure.
  • Statistical physics insight: Rapid convergence of the dynamics ⇔ spatial correlation decay.
  • A very active area of research with fascinating challenges.
  • Artificial intelligence / neuroscience / replica view: Solve the problem by message passing.

9
Message Passing Algorithms / The Replica Method
  • Message passing algorithms are used to estimate probabilities on graphical models.
  • Examples: Warning Propagation, Sum-Product, Belief Propagation, etc.
  • All of these algorithms do exact calculations on an associated computation tree.
  • Example: Belief Propagation (BP) is a popular method in AI/coding for estimating marginal probabilities P[σ(0) = a] for a Gibbs measure on G.
  • It is equivalent [Tatikonda-Jordan 02] to calculating marginal probabilities P[σ(0) = a] on the computation tree T(G).
  • Question: How come message passing algorithms work in practice?

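As a concrete illustration of BP, below is a minimal sum-product sketch (my addition, not code from the talk), assuming pairwise potentials given as NumPy arrays. On a tree it returns the exact marginals P[σ(v) = a]; on a graph with cycles it returns the marginals of the associated computation tree (loopy BP).

```python
import numpy as np
from collections import defaultdict

def belief_propagation(nodes, edges, node_pot, edge_pot, iters=50):
    """Sum-product BP for a pairwise model. node_pot[v] is a vector over
    states; edge_pot[(u, v)] is a matrix indexed [state of u, state of v]."""
    nbrs = defaultdict(list)
    psi = {}
    for (u, v) in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
        psi[(u, v)] = edge_pot[(u, v)]
        psi[(v, u)] = edge_pot[(u, v)].T
    # messages live over the states of the receiving node
    msg = {(u, v): np.ones(len(node_pot[v])) / len(node_pot[v])
           for u in nodes for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            # message u -> v: sum over u's states of psi_u * psi_uv * incoming
            m = node_pot[u].copy()
            for w in nbrs[u]:
                if w != v:
                    m = m * msg[(w, u)]
            out = psi[(u, v)].T @ m
            new[(u, v)] = out / out.sum()
        msg = new
    marginals = {}
    for v in nodes:
        b = node_pot[v].copy()
        for w in nbrs[v]:
            b = b * msg[(w, v)]
        marginals[v] = b / b.sum()
    return marginals
```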
10
Message Passing Algorithms in Coding
  • In coding:
  • BP is used to decode Low Density Parity Check codes (LDPC) [Gallager 62].
  • Proved to be efficient [Luby-Mitzenmacher-Shokrollahi-Spielman 98], [Richardson-Urbanke 01].
  • Message passing algorithms work because:
  • LDPC factor graphs are locally tree-like.
  • Individual constraints push toward the correct code word.
  • The actual analysis uses a recursion of random variables on the tree.

11
Message Passing Algorithms: Random 3-SAT
[Figure: a factor graph on variables x_1, …, x_8 with m = αn clauses, and a phase diagram showing, as α grows, where WalkSAT, Belief Propagation, Survey Propagation, and the Myopic and PLR heuristics succeed in finding satisfying assignments.]
12
Message Passing Algorithms for Random 3-SAT
  • Message passing algorithms work because:
  • Random 3-SAT graphs are locally tree-like.
  • Far-away variables are uncorrelated.
  • Speculation 1: For Belief Propagation, variables are uncorrelated in a standard sense when α < 3.95.
  • Thm (Maneva-M-Wainwright 05): Survey Propagation is just Belief Propagation on an extended Markov random field.
  • Speculation 2: For Survey Propagation, variables are uncorrelated in the extended Markov random field for all α.
  • Speculations 1 and 2 are under heated discussion between physicists, computer scientists and mathematicians.

[Photos: M. Talagrand, G. Parisi, B. Selman]
13
Decay of Correlation for the 3-SAT Extended MRF
[Figure: partial assignments with 'stars' alongside {0,1}^n assignments, e.g. 01??1?0?, 01101???, ?10?11??, ????????.]
14
Part II: Reconstructing Stochastic Networks from Observations
15
Main Problem
  • How to reconstruct the network topology from observations at a (sub)set of the nodes?
  • The example: reconstructing trees.

16
Two Tree Inference Problems
  • In evolution:
  • Given a tree of species / mothers, can we infer the ancestral sequence at the root from contemporary samples?
  • Phase transition:
  • Trade-off between noise and duplication?
  • Reconstructing evolution:
  • Is it possible to reconstruct evolutionary history from genetic sequences?

17
Defn: Markov Model on a Tree
  • Ising/BSC/CFN model (a sampler is sketched below):
  • Tree T = (V,E).
  • Node states in {0,1}: 0 = purines (A,G), 1 = pyrimidines (C,T).
  • Mutation probabilities p_e on the edges.
  • Number of leaves n.
[Figure: a tree with root r (carrying a binary sequence s(r)), internal nodes a, b, c, leaves 1-5 with states s(1), …, s(5), and edge mutation probabilities p_ra, p_rc, p_ab, p_a3, p_b1, p_b2, p_c4, p_c5.]
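A minimal sampler for this model (my sketch, not from the talk; the tree layout and the probabilities in p are hypothetical, chosen to mirror the figure), drawing k i.i.d. site patterns at the leaves:

```python
import random

def sample_cfn(tree, root, p, k):
    """CFN/BSC model: the root state is a uniform bit, and each edge (u, v)
    flips the parent's bit with probability p[(u, v)]. `tree` maps each node
    to its list of children. Returns the k-long 0/1 sequence at each leaf."""
    leaves = {v: [] for v in tree if not tree[v]}
    for _ in range(k):
        state = {root: random.randint(0, 1)}
        stack = [root]
        while stack:
            u = stack.pop()
            for v in tree[u]:
                state[v] = state[u] ^ (random.random() < p[(u, v)])
                stack.append(v)
        for v in leaves:
            leaves[v].append(state[v])
    return leaves

# Tree from the figure: root r, internal nodes a, b, c, leaves 1-5.
tree = {"r": ["a", "c"], "a": ["b", "3"], "b": ["1", "2"], "c": ["4", "5"],
        "1": [], "2": [], "3": [], "4": [], "5": []}
p = {e: 0.1 for e in [("r", "a"), ("r", "c"), ("a", "b"), ("a", "3"),
                      ("b", "1"), ("b", "2"), ("c", "4"), ("c", "5")]}
samples = sample_cfn(tree, "r", p, k=100)
```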
18
Defn: Phylogenetic Reconstruction Problem
  • Phylogenetic reconstruction:
  • Given k i.i.d. samples at the n leaves,
  • Task: fully reconstruct the model, i.e. find the tree and the mutation probabilities (and, if possible, do so efficiently).
  • Studied in:
  • Biology (dozens of books, 1000s of papers) [Felsenstein 04].
  • TCS (learning): [Ambainis-Desper-Farach-Kannan 97], [Farach-Kannan 96], [Cryan-Goldberg-Goldberg 02], [M-Roch].
  • Combinatorial phylogeny: [Erdos-Steel-Szekely-Warnow 97, 98], [M 07].

[Figure: a 0/1 data matrix of k i.i.d. samples observed at the leaves s(1), …, s(5).]
19
Phase Transition for the Ising model
[Figure: low temperature (2θ² > 1): bias, typical boundary; high temperature (2θ² < 1): no bias, typical boundary.]
The transition at 2θ² = 1 was proved by [Bleher-Ruiz-Zagrebnov 95], [Ioffe 96], [Evans-Kenyon-Peres-Schulman 00], [Kenyon-Mossel-Peres 01], [Martinelli-Sinclair-Weitz 04], [Borgs-Chayes-M-Roch 06]. Also, the spin-glass case was studied by [Chayes-Chayes-Sethna-Thouless 86]. Solvability for 2θ² > 1 was first proved by [Higuchi 77] (and [Kesten-Stigum 66]).
20
Steel's Favorite Conjecture
n = number of leaves, k = number of samples.
  • Conjectured: when the reconstruction problem is unsolvable (N), phylogeny requires k = n^Ω(1); proved in [M 03 (J. Comp. Biol.)].
  • Conjectured: when the reconstruction problem is solvable (Y), k = O(log n) suffices; proved for the Random Cluster Model [M-Steel 04 (Math. Biosciences)] and for the CFN Model [M 04 (Transactions of the AMS)], [Daskalakis-M-Roch (STOC 06)].
21
Polynomial Lower Bound at High Mutations
  • Proof: conditional independence + the Data Processing Lemma.
  • In fact:
  • [M 06 (IEEE/ACM Trans. Comp. Biol. Bioinform.)]: the shallow part of the tree can be efficiently reconstructed when k = O(log n), for all mutation rates.
  • Also in practice: [Daskalakis-Hill-Jaffe-Mihaescu-M-Rao (RECOMB 06)].

22
Reconstruction from short sequences
  • Thm [Daskalakis-M-Roch (STOC 06)]: Let T be a tree on n leaves s.t.
  • for all e: θ_min < θ(e) < θ_max, with 2 θ_min² > 1 and θ_max < 1.
  • Then there exists a polynomial time algorithm that uses sequences of length k = O(log n + log(1/δ)) to reconstruct the topology with probability 1-δ, where the constant depends on (θ_min, θ_max).

23
Proof: Distance Methods
  • Associate to each edge e the weight ln(1 - 2p_e).
  • For any two leaves i and j:
    ln(1 - 2p_{i,j}) = Σ_e ln(1 - 2p_e),
    where the sum is over all edges e in the path connecting i and j.
  • Reconstruction algorithm:
  • Estimate p_{i,j} from the sequences (sketched below).
  • Deduce the topology of the tree.
  • Problem: may need exponentially long sequences.
  • [ESSW]: log n radius neighborhoods determine the tree ⇒ poly(n) sequence length suffices.
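A minimal sketch (mine) of the estimation step above, assuming 0/1 leaf sequences of equal length:

```python
import math

def cfn_distance(seq_i, seq_j):
    """Estimate p_ij as the fraction of sites where the two leaf sequences
    disagree, and return the additive distance -ln(1 - 2*p_ij).
    Requires p_ij < 1/2; the estimate blows up as p_ij -> 1/2, which is
    why naive distance methods can need exponentially long sequences."""
    p_hat = sum(a != b for a, b in zip(seq_i, seq_j)) / len(seq_i)
    return -math.log(1.0 - 2.0 * p_hat)
```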
24
Four-Point Method
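The slide's figure is not in the transcript. As a stand-in, here is a minimal sketch (mine) of the standard four-point test applied to estimated additive distances d, assumed given as a symmetric dictionary keyed by leaf pairs:

```python
def four_point(d, i, j, k, l):
    """Return the quartet split supported by the distances: for an additive
    tree metric, the pairing with the smallest sum of distances is the true
    split, and the other two sums are equal (the four-point condition)."""
    pairings = {(i, j, k, l): d[i, j] + d[k, l],
                (i, k, j, l): d[i, k] + d[j, l],
                (i, l, j, k): d[i, l] + d[j, k]}
    a, b, c, e = min(pairings, key=pairings.get)
    return {frozenset((a, b)), frozenset((c, e))}   # e.g. {{i,j},{k,l}}
```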
25
Balanced Trees
  • Two-Step Algorithm [M 2004]:
  • 1) Reconstruct one (or a few) level(s).
  • 2) Infer sequences at the roots.
  • 3) Start over.

26
General Trees [Daskalakis-M-Roch 2006]
27
Blindfolded Cherry Picking
  • Need only one extra step in the algorithm.
  • Main loop:
  • 1) Distance estimation.
  • 2) Identify cherries from the next level.
  • 3) Sequence reconstruction.
  • 4) Detect fake cherries.

28
Blindfolded Cherry Picking I: Edge Disjointness
[Figure: the true tree vs. a non edge-disjoint reconstruction.]
29
Blindfolded Cherry Picking II: Weight Estimation
30
Blindfolded Cherry Picking III: Collisions
31
Tree Reconstruction in a Nutshell
  • Similar techniques apply to other tree networks, for example:
  • Reconstructing multicast networks (Liang-M-Yu, Bhamidi-Rajagopal-Roch).

32
Back to the General Problem
  • How to reconstruct the network topology from observations at a (sub)set of the nodes?
  • Example 3: Reconstructing Markov random fields from observations at a subset of the nodes ???

33
Part III: Optimization over Stochastic Networks
34
Motivating Problem
  • Problem:
  • Optimization over stochastic models defined on networks.
  • Examples:
  • Which genes to knock out in order to kill a cancer cell?
  • Which computers to immunize in order to make a network robust?
  • Which computers to attack in order to bring down the network?
  • Which individuals to immunize to stop a disease from spreading?
  • Viral marketing: Which individuals to expose to a product so as to maximize its distribution?
  • One case study: Influence in Social Networks.
  • Joint work with Sebastien Roch.

35
models of collective behavior
  • examples:
  • joining a riot
  • adopting a product
  • going to a movie
  • model features:
  • binary decision
  • cascade effect
  • network structure

36
viral marketing
  • referrals and word-of-mouth can be very effective
  • ex.: Hotmail
  • viral marketing:
  • goal: mining the network value of potential customers
  • how to target a small set of trendsetters, the seeds?
  • example [Domingos-Richardson 02]:
  • a collaborative filtering system
  • uses an MRF to compute the influence of each customer

37
independent cascade model
  • when a node is activated:
  • it gets one chance to activate each neighbour
  • the probability of success from u to v is p_{u,v} (a simulation sketch follows below)
[Figure: an example network with edge activation probabilities ranging over 0.25, 0.33, 0.5, 0.75 and 1.0.]
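A minimal simulation of one cascade run (my sketch; the neighbors adjacency map and the probability dictionary p are assumed inputs):

```python
import random

def independent_cascade(neighbors, p, seed):
    """Independent cascade: each newly activated node u gets a single
    chance to activate each neighbour v, succeeding with probability
    p[(u, v)]. Returns the final set of active nodes."""
    active = set(seed)
    frontier = list(seed)
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in active and random.random() < p[(u, v)]:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active
```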
38
generalized models
  • graph G = (V,E), initial activated set S_0
  • generalized threshold model [Kempe-Kleinberg-Tardos 03, 05]:
  • activation functions f_u(S), where S is the set of activated nodes
  • threshold value θ_u uniform in [0,1]
  • dynamics: at time t, set S_t to S_{t-1} and add all nodes with f_u(S_{t-1}) ≥ θ_u (a sketch of these dynamics follows below)
  • (note: the process stops after (at most) n-1 steps)
  • generalized cascade model [KKT 03, 05]:
  • when node u is activated:
  • it gets one chance to activate each neighbour
  • the probability that u succeeds in activating v is p_v(u,S), where S is the set of nodes that have already tried (and failed) to activate v
  • assumption: the p_v(u,·)'s are order-independent
  • theorem [KKT 03]: the two models are equivalent
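A minimal sketch (mine) of the generalized threshold dynamics above, where f is assumed to map each node to its activation function:

```python
import random

def threshold_process(nodes, f, seed):
    """Generalized threshold model: draw theta_u ~ Uniform[0,1] once; at
    each step, simultaneously add every inactive node u whose activation
    function satisfies f[u](S_{t-1}) >= theta_u. The active set only
    grows, so the process stops after at most n-1 steps."""
    theta = {u: random.random() for u in nodes}
    active = set(seed)
    while True:
        new = {u for u in nodes
               if u not in active and f[u](active) >= theta[u]}
        if not new:
            return active
        active |= new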

39
influence maximization
  • definition: the influence σ(S) of the initial seed S is the expected size of the infected set at termination
  • definition: in the influence maximization problem (IMP), we want to find the seed S of fixed size k that maximizes the influence
  • theorem [KKT 03]: the IMP is NP-hard
  • reduction from Set Cover: ground set U = {u_1, …, u_n} and a collection of cover subsets S_1, …, S_m

[Figure: the Set Cover reduction as an independent cascade instance, with cover subsets S_1, …, S_m linked to ground elements u_1, …, u_n.]
40
submodularity
  • definition: a set function f : 2^V → R is submodular if, for all A, B ⊆ V,
    f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B)
  • example: f(S) = g(|S|) where g is concave
  • interpretation: discrete concavity, or diminishing returns; indeed submodularity is equivalent to
    f(A ∪ {v}) - f(A) ≥ f(B ∪ {v}) - f(B) for all A ⊆ B and v ∉ B
  • threshold models:
  • it is natural to assume that the activation functions have diminishing returns
  • supported by the observations of [Leskovec-Adamic-Huberman 06] in the context of viral marketing (a brute-force check of the definition is sketched below)
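A brute-force check of the definition on a tiny ground set (my sketch; the coverage function used in the example is a standard submodular function):

```python
from itertools import combinations

def is_submodular(f, V):
    """Check f(A|B) + f(A&B) <= f(A) + f(B) over all pairs of subsets
    (exponential in |V|: for small sanity checks only)."""
    subsets = [frozenset(c) for r in range(len(V) + 1)
               for c in combinations(V, r)]
    return all(f(A | B) + f(A & B) <= f(A) + f(B) + 1e-9
               for A in subsets for B in subsets)

# Coverage: f(S) = size of the union of the sets indexed by S.
cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
f = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0
print(is_submodular(f, [1, 2, 3]))   # True
```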

41
main result
  • theorem [M-Roch 06], first conjectured in [KKT 03]: in the generalized threshold model, if all activation functions are monotone and submodular, then the influence is also submodular
  • corollary [M-Roch 06]: the IMP admits a (1 - e^{-1} - ε)-approximation algorithm (for all ε > 0)
  • this follows from a general result on the approximation of submodular functions [Nemhauser-Wolsey-Fisher 78] (a greedy sketch follows below)
  • known special cases [KKT 03, 05]:
  • linear threshold model, independent cascade model
  • decreasing cascade model, normalized submodular threshold model
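The approximation algorithm behind the corollary is the standard greedy of [Nemhauser-Wolsey-Fisher 78]; a minimal sketch (mine), where sigma_hat is assumed to be a Monte Carlo estimate of the influence, e.g. obtained by averaging runs of the cascade simulation sketched earlier:

```python
def greedy_seed(ground, sigma_hat, k):
    """Greedy (1 - 1/e)-approximation for maximizing a monotone submodular
    function under a cardinality constraint: repeatedly add the node with
    the largest estimated marginal gain. The Monte Carlo error in
    sigma_hat accounts for the extra epsilon in the guarantee."""
    S = set()
    for _ in range(k):
        best = max((u for u in ground if u not in S),
                   key=lambda u: sigma_hat(S | {u}))
        S.add(best)
    return S
```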

42
related work
  • sociology:
  • threshold models [Granovetter 78], [Morris 00]
  • cascades [Watts 02]
  • data mining:
  • viral marketing [KKT 03, 05], [Domingos-Richardson 02]
  • recommendation networks [Leskovec-Singh-Kleinberg 05], [Leskovec-Adamic-Huberman 06]
  • economics:
  • game-theoretic point of view [Ellison 93], [Young 02]
  • probability theory:
  • Markov random fields, Glauber dynamics
  • percolation
  • interacting particle systems: voter model, contact process

43
proof sketch
44
coupling
  • we use the generalized threshold model
  • arbitrary sets A, B: consider 4 processes
  • (A_t) started at A
  • (B_t) started at B
  • (C_t) started at A ∪ B
  • (D_t) started at A ∩ B
  • it suffices to couple the 4 processes in such a way that, for all t,
    (1) D_t ⊆ A_t ∩ B_t and (2) C_t ⊆ A_t ∪ B_t
  • indeed, at termination,
    |C_∞| + |D_∞| ≤ |A_∞ ∪ B_∞| + |A_∞ ∩ B_∞| = |A_∞| + |B_∞|
  • (note: this works with |·| replaced by any monotone, submodular w)

45
proof ideas
  • our goal: a coupling satisfying (1) and (2)
  • antisense coupling:
  • the obvious way to couple is to use the same θ_u's for all 4 processes
  • this satisfies (1) but not (2)
  • antisense: using θ_u for (A_t) and 1-θ_u for (B_t) maximizes the union
  • we combine both couplings
  • piecemeal growth:
  • seed sets can be introduced in stages
  • we add A ∩ B, then A \ B, and finally B \ A
  • need-to-know:
  • it is not necessary to pick all the θ_u's at the beginning
  • we can unveil only what we need to know

46
piecemeal growth
  • process started at S: (S_t)
  • partition of S: S(1), …, S(K)
  • consider the process (T_t):
  • pick the θ_u's
  • run the process with seed S(1) until termination
  • add S(2) and continue until termination
  • add S(3), and so on
  • lemma: the sets S_{n-1} and T_{Kn-1} have the same distribution

47
antisense coupling
  • disjoint sets S, T
  • partition of S: S(1), …, S(K)
  • piecemeal process with seeds S(1), …, S(K), T: (S_t)
  • consider the process (T_t):
  • pick the θ_u's
  • run the piecemeal process with seeds S(1), …, S(K) until termination
  • add T and continue with threshold values θ'_u = 1 - θ_u + f_u(T_{Kn-1})
  • lemma: the sets S_{(K+1)n-1} and T_{(K+1)n-1} have the same distribution

48
need-to-know
  • proof of lemma:
  • run the first K stages identically in both processes
  • note that, for all v not in S_{Kn-1} = T_{Kn-1}, θ_v is uniformly distributed in [f_v(T_{Kn-1}), 1]
  • but θ'_v = 1 - θ_v + f_v(T_{Kn-1}) has the same distribution

[Figures: simulation 1, simulation 2.]
49
proof I
[Figure: coupling diagram (ANTI).]
50
proof II
[Figure: coupling diagram (ANTI).]
51
proof III
  • the new processes have the correct final distribution
  • up to time 2n-1, B_t = C_t and A_t = D_t, so that (1) and (2) hold
  • for time 2n, by monotonicity and submodularity, (1) and (2) continue to hold
  • then proceed by induction

52
general result
  • we have proved:
  • theorem [Mossel-Roch 06]: in the generalized threshold model, if all activation functions are monotone and submodular, then for any monotone, submodular function w, the generalized influence
    σ_w(S) = E[w(S_∞)], where S_∞ is the active set at termination when seeded at S,
    is submodular
  • note: a closure property for submodular functions!

53
Future Research Directions
  • Study optimization problems for other stochastic models defined on networks.
  • And another annoying problem where discrete probability may help:
  • Are there (easily computable? probabilistic?) invariants of unlabelled graphs that uniquely determine them?
  • Motivation: Can one efficiently check if two graphs are isomorphic?