Bayesian Belief Propagation and Image Interpretation - PowerPoint PPT Presentation

Slides: 44
Provided by: davidsro8
Learn more at: http://www.ai.mit.edu
1
Bayesian Belief Propagation and Image
Interpretation
March 13, 2002
Presenter: David Rosenberg
2
Overview
  • Deals with problems in which we want to estimate
    local scene properties that may depend, to some
    extent, on global properties
  • Paper demonstrates that Bayesian Belief
    Propagation (BBP) is a very good technique for
    this class of problems
  • In the paper's examples, BBP's answers are often
    significantly better, and it converges
    significantly faster, than other techniques.

3
An Introductory Problem: Interpolation
  • Find a sequence of consecutive segments that
  • approximates our data points and
  • has small derivatives for each segment.

4
Interpolation Problem (continued)
  • We can formalize this problem as minimizing a
    cost functional J(Y) that trades off fidelity to
    the data against the size of the segment
    derivatives.

NOTE: J(Y) is a sum of terms, each containing only
neighboring variables.
  • Standard solutions to minimization problems:
  • Gradient descent / relaxation
  • Gauss-Seidel relaxation
  • Successive over-relaxation (SOR)
  • Simulated annealing
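
The cost functional itself did not survive in this transcript. As a minimal sketch, assume a quadratic form, J(Y) = sum_k w_k (y_k - d_k)^2 + lam * sum_i (y_{i+1} - y_i)^2, where w_k is 1 at observed points and 0 elsewhere. The names d, w, and lam are illustrative assumptions, not taken from the slides. Gauss-Seidel relaxation then minimizes J one variable at a time, each update using only the two neighbors:

```python
def cost(y, d, w, lam):
    """Assumed cost: weighted data term plus squared first differences."""
    data = sum(wk * (yk - dk) ** 2 for yk, dk, wk in zip(y, d, w))
    smooth = sum((y[i + 1] - y[i]) ** 2 for i in range(len(y) - 1))
    return data + lam * smooth


def gauss_seidel(d, w, lam, sweeps=200):
    """Minimize the assumed cost by exact coordinate-wise updates.

    Setting dJ/dy_i = 0 gives y_i as a weighted average of its data
    point (if observed) and its immediate neighbors.
    """
    n = len(d)
    y = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            num = w[i] * d[i]
            den = w[i]
            if i > 0:
                num += lam * y[i - 1]
                den += lam
            if i < n - 1:
                num += lam * y[i + 1]
                den += lam
            y[i] = num / den  # uses already-updated neighbors (Gauss-Seidel)
    return y


# Hypothetical data: only the two endpoints are observed.
d = [1.0, 0.0, 0.0, 0.0, 3.0]
w = [1.0, 0.0, 0.0, 0.0, 1.0]
y = gauss_seidel(d, w, lam=1.0)
print([round(v, 4) for v in y])
```

Each update touches only neighboring variables, which is the locality the NOTE on this slide points out.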

5
The Core Idea
  • We can rewrite certain cost functional
    minimization problems as MAP estimation problems
    for Markov Random Fields (MRFs).
  • This is important because Bayesian Belief
    Propagation gives optimal solutions very quickly
    for MRFs with certain graph structures, such as
    chains and trees.

6
Mapping Cost Minimization to MAP
  • Note that minimizing J(Y) is equivalent to
    maximizing exp( - J(Y) ).
  • Suppose our cost functional is a sum of terms,
    each depending on only a few neighboring
    variables.
  • Then exp( - J(Y) ) is a product of one factor per
    term, which already looks like a product of
    localized potentials.
7
Mapping Cost Minimization to MAP (continued)
  • By constraining J to be a sum, we've reduced our
    problem to the maximization of a product of
    local factors.
  • Since this product is strictly positive, we can
    normalize it to create a PDF.
  • (This could be a Gibbs distribution!)
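
The formulas on these slides are missing from the transcript. A sketch of the argument, consistent with the surrounding bullets, is given below. The symbols J_C, Y_C, \psi_C, and Z are assumed notation, and the sum defining Z becomes an integral for continuous variables.

```latex
\begin{align*}
J(Y) &= \sum_{C} J_C(Y_C) \\
e^{-J(Y)} &= \prod_{C} e^{-J_C(Y_C)} = \prod_{C} \psi_C(Y_C),
  \qquad \psi_C(Y_C) := e^{-J_C(Y_C)} > 0 \\
P(Y) &= \frac{1}{Z} \prod_{C} \psi_C(Y_C),
  \qquad Z = \sum_{Y} \prod_{C} \psi_C(Y_C) \\
\arg\min_{Y} J(Y) &= \arg\max_{Y} P(Y)
\end{align*}
```

The last line holds because exp(-J) is strictly decreasing in J and Z does not depend on Y.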

8
Mapping Cost Minimization to MAP (continued)
  • So finding the y's that minimize J(Y), subject to
    the observations that constrain some of the y's,
    is equivalent to finding the mode (peak) of the
    distribution P(Y | observations).
  • This is just the MAP estimate of Y given the
    observations.

9
Cost Minimization to MAP on MRF (continued)
  • We have a distribution P(Y) that is a normalized
    product of factors, one per group of variables
    Y_C.
  • If we can associate each r.v. in Y with a node of
    a graph G
  • such that each of the Y_C's is a clique in G,
  • then P(Y) is a Gibbs distribution w.r.t. G.
  • If P(Y) is a Gibbs distribution w.r.t. a graph G,
  • then the r.v.'s Y are a Markov random field
    (MRF)
  • (Hammersley-Clifford Theorem).

10
MAP on MRF to Cost Function Minimization
  • Start with the MAP problem on an MRF.
  • Every MRF with a positive joint distribution has
    a Gibbs distribution,
  • also by the Hammersley-Clifford theorem.
  • By reversing our steps, we will find a cost
    function J(Y) whose minimization corresponds to
    the MAP estimate on the MRF.
  • Thus any problem we can solve by finding the MAP
    estimate on an MRF, we can also solve by
    minimizing some cost functional.

11
Our Simplified Problem (from paper)
  • We have
  • hidden scene variables X_j
  • observed image variables Y_j
  • We assume that the following graph structure is
    implicit in our cost functional
  • The Problem:
  • Given some of the Y_j's, estimate the X_j's

12
Straightforward Exact Inference
  • Given the joint PDF,
  • typically specified using potential functions,
  • we can just marginalize out the other variables to
  • get the a posteriori distribution for each X_j.
  • We can immediately extract the
  • MAP estimate -- just the mode of the a posteriori
    distribution
  • Least squares estimate -- just the expected value
    of the a posteriori distribution
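
As a minimal sketch of this brute-force approach, the code below enumerates every joint state of a three-node chain, marginalizes to get each node's posterior, and reads off the mode and the expected value. The potentials PHI and psi are made-up illustrative values, not taken from the paper or slides.

```python
from itertools import product

STATES = (0, 1, 2)


def psi(a, b):
    """Pairwise potential (assumed): neighbors prefer equal values."""
    return 4.0 if a == b else 1.0


# Node potentials (assumed): node 0 favors state 0, node 2 favors state 2,
# and node 1 carries no evidence.
PHI = [
    {0: 5.0, 1: 1.0, 2: 1.0},
    {0: 1.0, 1: 1.0, 2: 1.0},
    {0: 1.0, 1: 1.0, 2: 5.0},
]


def posterior_marginals(phi):
    """Sum the unnormalized joint over all states, then normalize."""
    n = len(phi)
    marg = [{s: 0.0 for s in STATES} for _ in range(n)]
    total = 0.0
    for x in product(STATES, repeat=n):
        weight = 1.0
        for i in range(n):
            weight *= phi[i][x[i]]
        for i in range(n - 1):
            weight *= psi(x[i], x[i + 1])
        total += weight
        for i in range(n):
            marg[i][x[i]] += weight
    return [{s: m[s] / total for s in STATES} for m in marg]


marginals = posterior_marginals(PHI)
# Mode of each marginal (the slide's MAP estimate per variable).
map_estimate = [max(m, key=m.get) for m in marginals]
# Expected value of each marginal (the least squares estimate).
mmse_estimate = [sum(s * p for s, p in m.items()) for m in marginals]
print(map_estimate, mmse_estimate)
```

The loop visits every one of the k^n joint states, so this only works for very small problems; that cost is what motivates message passing.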

13
Derivation of belief propagation
14
The posterior factorizes
15
Propagation rules
16
Propagation rules
17
Belief, and message updates
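
The equations from the derivation slides are not in the transcript. For reference, the standard sum-product updates for a pairwise MRF are sketched below. The notation is assumed rather than taken from the slides: \phi_i is the local evidence at node i, \psi_{ij} is the pairwise potential, m_{i \to j} is the message from node i to node j, and b_i is the belief at node i.

```latex
\begin{align*}
m_{i \to j}(x_j) &= \sum_{x_i} \psi_{ij}(x_i, x_j)\, \phi_i(x_i)
  \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i) \\
b_i(x_i) &\propto \phi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i)
\end{align*}
```

Replacing the sum over x_i with a maximum gives the max-product variant used for MAP estimation.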
18
Optimal solution in a chain or tree: Belief
Propagation
  • The "do the right thing" Bayesian algorithm.
  • For Gaussian random variables over time: Kalman
    filter.
  • For hidden Markov models: forward/backward
    algorithm (and the MAP variant is Viterbi).
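
A small sketch of why belief propagation is exact on a chain: one forward pass and one backward pass of sum-product messages reproduce the marginals obtained by brute-force enumeration. The potentials are arbitrary illustrative values, and the function names are hypothetical.

```python
from itertools import product


def chain_bp(phi, psi):
    """Sum-product on a chain.

    phi[i][s] is the node potential of node i in state s; psi[s][t] is the
    edge potential, shared by all edges. Returns normalized beliefs.
    """
    n, k = len(phi), len(phi[0])
    fwd = [[1.0] * k for _ in range(n)]  # message reaching node i from the left
    bwd = [[1.0] * k for _ in range(n)]  # message reaching node i from the right
    for i in range(1, n):
        for t in range(k):
            fwd[i][t] = sum(
                phi[i - 1][s] * fwd[i - 1][s] * psi[s][t] for s in range(k)
            )
    for i in range(n - 2, -1, -1):
        for s in range(k):
            bwd[i][s] = sum(
                phi[i + 1][t] * bwd[i + 1][t] * psi[s][t] for t in range(k)
            )
    beliefs = []
    for i in range(n):
        b = [phi[i][s] * fwd[i][s] * bwd[i][s] for s in range(k)]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs


def brute_force(phi, psi):
    """Exact marginals by enumerating all k**n joint states."""
    n, k = len(phi), len(phi[0])
    marg = [[0.0] * k for _ in range(n)]
    for x in product(range(k), repeat=n):
        weight = 1.0
        for i in range(n):
            weight *= phi[i][x[i]]
        for i in range(n - 1):
            weight *= psi[x[i]][x[i + 1]]
        for i in range(n):
            marg[i][x[i]] += weight
    return [[v / sum(m) for v in m] for m in marg]


phi = [[5.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 6.0]]
psi = [[4.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 4.0]]
beliefs = chain_bp(phi, psi)
exact = brute_force(phi, psi)
max_diff = max(
    abs(b - e) for bi, ei in zip(beliefs, exact) for b, e in zip(bi, ei)
)
print(max_diff)
```

The message-passing version costs on the order of n * k^2 operations, against k^n for enumeration.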

19
No factorization with loops!
20
The (Discrete) Interpolation Problem
  • Used the integers 1, ..., 5 as the domain and
    range.
  • Used evidence

21
The (Discrete) Interpolation Problem
  • How do we put the evidence into the MRF?
  • As a prior on the random variables.
  • Comes from the noise or sensor model.
  • I tried two priors:
  • 1. (example priors)
  • Observed 1 --> Prior 25 16 9 4 1
  • Observed 3 --> Prior 9 16 25 16 9
  • 2. (example priors)
  • Observed 1 --> Prior 625 256 81 4 1

22
The (Discrete) Interpolation Problem
  • How do we specify the derivative constraint?
  • We adjust the potential functions between
    adjacent random variables.
  • We want potential functions that look something
    like
      10  1  1  1  1
       1 10  1  1  1
       1  1 10  1  1
       1  1  1 10  1
       1  1  1  1 10
  • I call the ratio 10:1 the tightness.
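
A sketch of how the tightness changes the answer for just two adjacent variables, one observed near 1 and one observed near 3. The closed form (5 - |v - observed|)^2 for the first prior is an inference that happens to reproduce the two example rows on the previous slide; it is an assumption, as are the function names.

```python
STATES = [1, 2, 3, 4, 5]


def evidence_prior(observed):
    """First example prior, assumed to follow (5 - |v - observed|)^2."""
    return [(5 - abs(v - observed)) ** 2 for v in STATES]


def tightness_potential(tightness):
    """Pairwise potential: `tightness` on the diagonal, 1 elsewhere."""
    return [[tightness if a == b else 1 for b in STATES] for a in STATES]


def best_pair(obs_a, obs_b, tightness):
    """Most probable joint value of two neighbors with the given evidence."""
    pa, pb = evidence_prior(obs_a), evidence_prior(obs_b)
    psi = tightness_potential(tightness)
    scores = {
        (a, b): pa[i] * pb[j] * psi[i][j]
        for i, a in enumerate(STATES)
        for j, b in enumerate(STATES)
    }
    return max(scores, key=scores.get)


for t in (1, 2, 4, 10):
    print(t, best_pair(1, 3, t))
```

With no coupling the pair follows the evidence; once the tightness is large enough, the two variables are pulled to a common value, which is the discrete analogue of a small derivative.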

23
Results for First Prior
(Figure panels: Tightness 2, 4, and 6)
24
Results for Second Prior
(Figure panels: Tightness 2, 4, and 6)
25
Weiss's Examples
  • Interior/exterior example.
  • Motion example.
  • In both examples, BBP gave much better results
    and converged much faster than other
    techniques.

26
Conclusions: When to use BBP?
  • Among all problems expressible as cost function
    minimization.
  • Among problems expressible as MAP or MMSE
    problems on MRFs.
  • Graph topology should be relatively sparse.
  • The number of messages per iteration increases
    linearly with the number of edges.
  • Reasonably small number of dimensions for the
    r.v. distributions.
  • Approximate inference

27
EXTRA SLIDES
28
Slide on Weiss's Motion Detection
29
Mention some approximate inference approaches
30
Complexity issues with message passing
  • How long are the messages?
  • How many messages do we have to pass per
    iteration?
  • How many iterations until convergence?
  • Problem quickly becomes intractable

31
Slides on message passing with jointly Gaussian
distributions???
32
BACKUP SLIDES
33
Markov Random Fields
  • Let G be an undirected graph with
  • nodes 1, ..., n.
  • Associate a random variable X_t with each node t
    in G.
  • (X_1, ..., X_n) is a Markov random field on G if
  • every r.v. is independent of its non-neighbors
    conditioned on its neighbors:
  • P(X_t = x_t | X_s = x_s for all s \neq t) =
    P(X_t = x_t | X_s = x_s for all s \in N(t)),
    where N(t) is the set of neighbors of node t.

34
Specifying a Markov Random Field
  • It would be nice if we could just specify
    P( X | N(X) ) for all r.v.'s X (as with Bayesian
    networks).
  • Unfortunately, this will overspecify the joint
    PDF.
  • E.g. X_1 -- X_2, with both variables binary.
  • The joint PDF has 3 degrees of freedom.
  • The conditional PDFs X_1 | X_2 and X_2 | X_1 have
    2 degrees of freedom each, 4 in total.
  • The Hammersley-Clifford Theorem helps to specify
    MRFs.

35
The Gibbs Distribution
  • A Gibbs distribution w.r.t. a graph G is a
    probability mass function that can be expressed
    in the form
  • P(x_1, ..., x_n) = \prod_{cliques C} V_C(x_1, ...,
    x_n)
  • where V_C(x_1, ..., x_n) depends only on those x_i
    in C.
  • We can combine potential functions into products
    over maximal cliques, so
  • P(x_1, ..., x_n) = \prod_{maximal cliques C}
    V_C(x_1, ..., x_n)
  • This may be better in certain circumstances
    because we don't have to specify as many
    potential functions.

36
Hammersley-Clifford Theorem
  • Let the r.v.'s X_j have a positive joint
    probability mass function.
  • Then the Hammersley-Clifford Theorem says that
    X_j is a Markov random field on graph G iff it
    has a Gibbs distribution w.r.t. G.
  • Side note: Hammersley and Clifford discovered
    this theorem in 1971, but they didn't publish it
    because they kept thinking they should be able to
    remove or relax the positivity assumption. They
    couldn't. Clifford published the result in 1990.
  • Specifying the potential functions is equivalent
    to specifying the joint probability distribution
    of all variables.
  • Now it's easy to specify a valid MRF,
  • but it is still not easy to determine the degrees
    of freedom in the distribution (normalization).
37
(No Transcript)
38
Incorporating Evidence nodes into MRFs
  • We would like to have nodes that don't change
    their beliefs -- they are just observations.
  • Can we do this via the potential functions on the
    non-maximal clique containing just that node?
  • I think this is what they do in the Yair Weiss
    implementation.
  • What if we don't want to specify a potential
    function? Make it identically one, since it's in
    a product.
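
One way to sketch this idea: give the observed node a single-node potential that is an indicator on the observed value, and give every other node a potential that is identically one. The two-node model and its values below are assumptions for illustration, not the Weiss implementation.

```python
STATES = (0, 1, 2)


def pair(a, b):
    """Pairwise potential (assumed): neighbors prefer equal values."""
    return 3.0 if a == b else 1.0


def clamp(observed):
    """Single-node potential that pins a node to its observed value."""
    return {s: 1.0 if s == observed else 0.0 for s in STATES}


# Identically-one potential: contributes nothing to the product.
ONES = {s: 1.0 for s in STATES}


def marginal_x1(pot_1, pot_2):
    """Marginal of X1 in the two-node model X1 -- X2."""
    w = {
        a: sum(pot_1[a] * pot_2[b] * pair(a, b) for b in STATES)
        for a in STATES
    }
    z = sum(w.values())
    return {a: v / z for a, v in w.items()}


belief_unobserved = marginal_x1(ONES, ONES)
belief_observed = marginal_x1(ONES, clamp(2))
print(belief_unobserved, belief_observed)
```

A hard indicator contains zeros, so the joint is no longer strictly positive; a sharply peaked but positive potential is the soft alternative if the positivity assumption matters.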

39
From cost functional to transition matrix
40
From cost functional to update rule
41
From update rule to transition matrix
42
The factorization into pairwise potentials --
good for general Markov networks
43
Other Stuff
  • For shorthand, we will write x = (x_1, ..., x_n).