Title: Extending Expectation Propagation for Graphical Models
1. Extending Expectation Propagation for Graphical Models
- Yuan (Alan) Qi
- Joint work with Tom Minka
2. Motivation
- Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.
- Inference techniques on graphical models often sacrifice efficiency for accuracy, or sacrifice accuracy for efficiency.
- We need a method that better balances the trade-off between accuracy and efficiency.
3. Motivation
[Figure: error vs. computational time trade-off for current techniques]
4. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
5. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
6. Graphical Models
Directed (Bayesian networks) vs. undirected (Markov networks)
7. Inference on Graphical Models
- Bayesian inference techniques:
  - Belief propagation (BP): Kalman filtering/smoothing, forward-backward algorithm
  - Monte Carlo: particle filters/smoothers, MCMC
- Loopy BP is typically efficient, but not accurate on general loopy graphs
- Monte Carlo is accurate, but often not efficient
8. Expectation Propagation in a Nutshell
- Approximate a probability distribution by simpler parametric terms:
  - for directed graphs, one term per conditional probability
  - for undirected graphs, one term per potential
- Each approximation term lives in an exponential family (e.g. Gaussian)
9. EP in a Nutshell
- The approximate term minimizes the following KL divergence by moment matching, where the leave-one-out approximation is obtained by dividing the current term out of q(x) (see the reconstruction below).
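The formulas this slide refers to were lost in extraction. As a reconstruction in standard EP notation (the symbols f_a, \tilde{f}_a, and q are assumptions, not the deck's own), the update for one term is

    q^{\setminus a}(\mathbf{x}) = \frac{q(\mathbf{x})}{\tilde{f}_a(\mathbf{x})}, \qquad
    \tilde{f}_a^{\mathrm{new}}(\mathbf{x}) = \arg\min_{\tilde{f}_a \in \mathcal{F}}
      \mathrm{KL}\!\Big( f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \;\Big\|\; \tilde{f}_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \Big),

and the minimization is carried out by matching the moments (expected sufficient statistics) of the two arguments.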
10. Limitations of Plain EP
- It can be difficult or expensive to analytically compute the moments needed to minimize the desired KL divergence.
- It can be expensive to compute and maintain a valid approximating distribution q(x) that is coherent under marginalization, e.g. a tree-structured q(x).
11. Three Extensions
1. Instead of choosing the approximate term to minimize the KL divergence above, use other criteria.
2. Use numerical approximations to compute moments: quadrature or Monte Carlo.
3. Allow the tree-structured q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.
12. Efficiency vs. Accuracy
[Figure: error vs. computational time; loopy BP (factorized EP) is fast but less accurate, Monte Carlo is accurate but slow, and extended EP aims for the region in between]
13. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
14. Object Tracking
Guess the position of an object given noisy observations.
15. Bayesian Network
E.g. a random-walk state-space model: we want the distribution of the states x_t given the observations y_t.
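The model equations on this slide were lost in extraction. A minimal sketch of the kind of random-walk state-space model being described (the Gaussian noise terms, the observation function h, and the variances are placeholders, not values from the deck):

    x_t = x_{t-1} + \nu_t, \qquad \nu_t \sim \mathcal{N}(0, \sigma_x^2) \quad (\text{random-walk dynamics})
    y_t = h(x_t) + \omega_t, \qquad \omega_t \sim \mathcal{N}(0, \sigma_y^2) \quad (\text{noisy observation})

The goal is p(x_1, ..., x_T | y_1, ..., y_T), the distribution of the states given all the observations.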
16. Approximation
The approximating distribution is factorized (one factor per time step) and Gaussian in x.
17. Message Interpretation
The belief for each state is a product of three messages: (forward msg) x (observation msg) x (backward msg).
[Figure: the forward, observation, and backward messages on the chain]
18. EP on Dynamic Systems
- Filtering (t = 1, ..., T)
  - Incorporate the forward message
  - Initialize the observation message
- Smoothing (t = T, ..., 1)
  - Incorporate the backward message
  - Compute the leave-one-out approximation by dividing out the old observation message
  - Re-approximate the new observation message
- Re-filtering (t = 1, ..., T)
  - Incorporate the forward and observation messages
(A runnable sketch of this schedule follows.)
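Below is a minimal, runnable sketch of the filtering / smoothing / re-filtering schedule above, assuming a 1-D random-walk model with Gaussian observations so that the moment-matching projection is exact; for non-Gaussian observations, project_observation is the step that quadrature or linearization would replace. The model, parameter values, and all names are illustrative, not taken from the deck.

    import numpy as np

    def project_observation(cav_prec, cav_h, y, r):
        """Moment-match the cavity times the Gaussian likelihood N(y | x, r) and
        return the observation message (precision, precision*mean). Exact here."""
        post_prec = cav_prec + 1.0 / r
        post_h = cav_h + y / r
        return post_prec - cav_prec, post_h - cav_h   # divide the cavity back out

    def ep_smooth(y, q=0.1, r=0.5, n_iters=4):
        T = len(y)
        fwd = np.zeros((T, 2))                        # forward messages (prec, prec*mean)
        bwd = np.zeros((T, 2))                        # backward messages
        obs = np.zeros((T, 2))                        # observation messages
        fwd[0] = (1e-6, 0.0)                          # vague prior on the first state

        def propagate(prec, h):
            """Send a Gaussian message through the random-walk dynamics."""
            var = 1.0 / prec + q
            return 1.0 / var, (h / prec) / var

        # Filtering (t = 1..T): incorporate forward msg, initialize observation msg.
        for t in range(T):
            if t > 0:
                fwd[t] = propagate(fwd[t-1, 0] + obs[t-1, 0], fwd[t-1, 1] + obs[t-1, 1])
            obs[t] = project_observation(fwd[t, 0], fwd[t, 1], y[t], r)

        for _ in range(n_iters):
            # Smoothing (t = T..1): incorporate backward msg, divide out the old
            # observation msg (leave-one-out), re-approximate the observation msg.
            for t in range(T - 1, -1, -1):
                if t < T - 1:
                    bwd[t] = propagate(bwd[t+1, 0] + obs[t+1, 0], bwd[t+1, 1] + obs[t+1, 1])
                cav_prec = fwd[t, 0] + bwd[t, 0]
                cav_h = fwd[t, 1] + bwd[t, 1]
                obs[t] = project_observation(cav_prec, cav_h, y[t], r)
            # Re-filtering (t = 1..T): incorporate forward and observation messages.
            for t in range(1, T):
                fwd[t] = propagate(fwd[t-1, 0] + obs[t-1, 0], fwd[t-1, 1] + obs[t-1, 1])

        prec = fwd[:, 0] + bwd[:, 0] + obs[:, 0]
        mean = (fwd[:, 1] + bwd[:, 1] + obs[:, 1]) / prec
        return mean, 1.0 / prec                       # smoothed means and variances

    # Example usage on synthetic data
    y = np.cumsum(0.3 * np.random.randn(50)) + 0.7 * np.random.randn(50)
    means, variances = ep_smooth(y)

In this linear-Gaussian setting the schedule reduces to an exact forward-backward smoother; the point of the sketch is the EP bookkeeping (messages, leave-one-out, re-approximation), which stays the same when the projection becomes approximate.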
19. Extensions of EP
- Instead of matching moments, use any method for approximate filtering.
  - Examples: statistical linearization, the unscented Kalman filter (UKF), mixtures of Kalman filters
- This turns any deterministic filtering method into a smoothing method!
- All of these methods can be interpreted as finding linear/Gaussian approximations to the original terms.
- Use quadrature or Monte Carlo for term approximations.
20. Example: Poisson Tracking
- The observation y_t is an integer-valued Poisson variate whose mean depends on the latent state x_t.
21. Poisson Tracking Model
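The model on this slide was lost in extraction. A plausible reconstruction of a Poisson tracking model of this kind (the exponential link and the variances are assumptions, not read from the deck):

    x_t \mid x_{t-1} \sim \mathcal{N}(x_{t-1},\, \sigma^2) \quad (\text{random-walk dynamics})
    y_t \mid x_t \sim \mathrm{Poisson}\big(e^{x_t}\big) \quad (\text{integer-valued observation})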
22. Extension of EP: Approximating the Observation Message
- The observation message is not Gaussian
- The moments of x are not analytic
- Two approaches:
  - Gauss-Hermite quadrature for the moments (see the sketch below)
  - Statistical linearization instead of moment matching (turns the unscented Kalman filter into a smoothing method)
- Both work well
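As a hedged sketch of the quadrature approach (assuming the Poisson observation model reconstructed above; all names are illustrative), the observation message is re-approximated by computing the moments of the tilted distribution cavity(x) * p(y | x) numerically and dividing the cavity back out. This routine could stand in for project_observation in the earlier schedule sketch.

    # Gauss-Hermite quadrature for the moments of cavity(x) * Poisson(y; exp(x)).
    import numpy as np
    from scipy.special import gammaln

    def project_poisson_observation(cav_mean, cav_var, y, n_points=20):
        nodes, weights = np.polynomial.hermite_e.hermegauss(n_points)  # probabilists' GH
        x = cav_mean + np.sqrt(cav_var) * nodes                        # quadrature sites
        log_lik = y * x - np.exp(x) - gammaln(y + 1)                   # log Poisson(y; e^x)
        w = weights * np.exp(log_lik)
        z = w.sum()                                                    # normalizer
        mean = (w * x).sum() / z                                       # tilted mean
        var = (w * (x - mean) ** 2).sum() / z                          # tilted variance
        # divide the cavity out of the matched Gaussian to get the observation message
        msg_prec = 1.0 / var - 1.0 / cav_var
        msg_h = mean / var - cav_mean / cav_var
        return msg_prec, msg_h                                         # natural parameters

    # Example: cavity N(0.5, 0.2) and an observed count y = 3
    print(project_poisson_observation(0.5, 0.2, 3))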
23. Approximate vs. Exact Posterior
[Figure: the exact posterior p(x_T | y_{1:T}) and the EP approximation, plotted against x_T]
24. Extended EP vs. Monte Carlo: Accuracy
[Figure: accuracy of the posterior mean and variance estimates]
25. Accuracy/Efficiency Trade-off
26. EP for Digital Wireless Communication
- Signal detection problem
- Transmitted signal s_t
- Its amplitude and phase vary to encode each symbol
- Complex representation
[Figure: symbols in the complex plane (Re/Im axes)]
27. Binary Symbols, Gaussian Noise
- Symbols are +1 and -1 (in the complex plane)
- Received signal y_t is the symbol plus Gaussian noise
- Optimal detection is easy
28. Fading Channel
- The channel systematically changes the amplitude and phase of the transmitted signal
- The channel coefficient changes over time
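A sketch of the flat-fading observation model these bullets describe (the symbols below are standard for this problem but are assumptions, not the deck's notation):

    y_t = s_t\,\alpha_t + w_t, \qquad w_t \sim \mathcal{CN}(0, \sigma^2),

where s_t \in \{+1, -1\} is the transmitted symbol and \alpha_t is the complex fading coefficient, which drifts over time and must be estimated jointly with the symbols.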
29. Benchmark: Differential Detection
- Classical technique
- Use previous observation to estimate state
- Binary symbols only
30. Bayesian Network for Signal Detection
31. Extended EP: Joint Signal Detection and Channel Estimation
- Turn the mixture of Kalman filters into a smoothing method
- Smooth over a window of the last L observations
- Observations before the window act as a prior for the current estimate
32. Computational Complexity
- Expectation propagation: O(nLd^2)
- Stochastic mixture of Kalman filters: O(LMd^2)
- Rao-Blackwellised particle smoothers: O(LMNd^2)
- n: number of EP iterations (typically 4 or 5)
- d: dimension of the parameter vector
- L: smoothing window length
- M: number of samples in filtering (often larger than 500)
- N: number of samples in smoothing (larger than 50)
- EP is about 5,000 times faster than Rao-Blackwellised particle smoothers.
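As a quick check of that figure using the counts quoted above:

    \frac{O(LMNd^2)}{O(nLd^2)} = \frac{MN}{n} \approx \frac{500 \times 50}{5} = 5{,}000.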
33. Experimental Results
(Setup from Chen, Wang, and Liu, 2000)
[Figure: detection performance as a function of signal-to-noise ratio]
EP outperforms particle smoothers in efficiency with comparable accuracy.
34. Bayesian Networks for Adaptive Decoding
The information bits e_t are coded by a convolutional error-correcting encoder.
35. EP Outperforms Viterbi Decoding
[Figure: decoding performance as a function of signal-to-noise ratio]
36. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
37. Inference on Loopy Graphs
Problem: estimate the marginal distributions of the variables indexed by the nodes of a loopy graph, e.g. p(x_i), i = 1, ..., 16.
38. 4-node Loopy Graph
The joint distribution is a product of pairwise potentials, one per edge.
We want to approximate it by a simpler distribution.
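In symbols (standard notation, assumed rather than copied from the deck): with one potential f_{ij} per edge (i, j),

    p(\mathbf{x}) \propto \prod_{(i,j) \in E} f_{ij}(x_i, x_j),

and p is approximated by a distribution q(\mathbf{x}) whose structure is a spanning tree of the graph.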
39. BP vs. TreeEP
[Figure: the approximating structures; BP uses a fully factorized approximation, TreeEP uses a spanning tree]
40. Junction Tree Representation
[Figure: the loopy graph p(x), the tree approximation q(x), and the corresponding junction tree of cliques and separators]
41. Two Kinds of Edges
- On-tree edges, e.g. (x1, x4): incorporated exactly into the junction tree
- Off-tree edges, e.g. (x1, x2): approximated by projecting them onto the tree structure
42. KL Minimization
- KL minimization = moment matching
- Match the single-node and pairwise marginals of the off-tree term times the leave-one-out approximation and of the new tree q
- This reduces to exact inference on single loops
  - Use cutset conditioning
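In symbols (standard EP notation, assumed): incorporating an off-tree edge term f_a means choosing the new tree-structured q to minimize

    \mathrm{KL}\!\big( f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \;\big\|\; q(\mathbf{x}) \big),

and because q is tree-structured, this is achieved by matching its single-node and pairwise (edge) marginals to those of f_a\, q^{\setminus a}.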
43. Matching Marginals on the Graph
(1) Incorporate edge (x3, x4)
(2) Incorporate edge (x6, x7)
44. Drawbacks of Global Propagation
- Updates all the cliques even when incorporating only one off-tree edge
  - Computationally expensive
- Stores each off-tree data message as a whole tree
  - Requires a large amount of memory
45. Solution: Local Propagation
- Allow q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.
- Exploit the junction tree representation: only propagate information locally, within the minimal loop (subtree) that is directly connected to the off-tree edge.
- Reduces computational complexity
- Saves memory
46. Local Propagation Example
(1) Incorporate edge (x3, x4)
(2) Propagate evidence
(3) Incorporate edge (x6, x7)
On this simple graph, local propagation runs roughly 2 times faster and uses roughly half the memory for storing messages, compared with plain (global) EP.
47. New Interpretation of TreeEP
- Marries EP with the junction tree algorithm
- Can operate efficiently over hypertrees and hypernodes
48. 4-node Graph
- TreeEP: the proposed method
- GBP: generalized belief propagation on triangles
- TreeVB: variational tree
- BP: loopy belief propagation (= factorized EP)
- MF: mean field
49. Fully-Connected Graphs
- Results are averaged over 10 graphs with randomly generated potentials
- TreeEP performs as well as or better than all the other methods in both accuracy and efficiency!
50. 8x8 Grids, 10 Trials
Method FLOPS Error
Exact 30,000 0
TreeEP 300,000 0.149
BP/double-loop 15,500,000 0.358
GBP 17,500,000 0.003
51. TreeEP versus BP and GBP
- TreeEP is always more accurate than BP and is often faster
- TreeEP is much more efficient than GBP and more accurate on some problems
- TreeEP converges more often than BP and GBP
52. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
53. Conclusions
- Extended EP on graphical models:
  - Instead of minimizing KL divergence, use other sensible criteria to generate messages. This effectively turns any deterministic filtering method into a smoothing method.
  - Use quadrature to approximate messages.
  - Use local propagation to save computation and memory in tree-structured EP.
54. Conclusions
[Figure: error vs. computational time for state-of-the-art techniques and extended EP]
- Extended EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency.
55. Future Work
- More extensions of EP:
  - How to choose a sensible approximation family (e.g. which tree structure)?
  - More flexible approximations: a mixture of EP approximations?
  - Error bounds?
- Bayesian conditional random fields
- More real-world applications
56. End
Contact information: yuanqi_at_media.mit.edu
57. Extended EP Accuracy Improves Significantly in Only a Few Iterations
58. EP versus BP
- The EP approximation is in a restricted family, e.g. Gaussian
- The EP approximation does not have to be factorized
- EP applies to many more problems, e.g. mixtures of discrete and continuous variables
59. EP versus Monte Carlo
- Monte Carlo is general but expensive
- EP exploits the underlying simplicity of the problem, if it exists
- Monte Carlo is still needed for complex problems (e.g. large isolated peaks)
- The trick is to know what kind of problem you have
60. (Loopy) Belief Propagation
- Specialize EP to fully factorized approximations
- Minimizing the KL divergence then amounts to matching the marginals of the partially factorized distribution and the fully factorized one
- The resulting messages are exactly the loopy BP messages
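In symbols (standard notation, assumed): with a fully factorized q(\mathbf{x}) = \prod_i q_i(x_i), the moment-matching projection only has to match single-node marginals,

    q_i^{\mathrm{new}}(x_i) \propto \sum_{\mathbf{x} \setminus x_i} f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}),

with an integral in place of the sum for continuous variables; this recovers the loopy BP message updates.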
61. Limitation of BP
- If the dynamics or measurements are not linear and Gaussian, the complexity of the posterior increases with the number of measurements
  - I.e. the BP equations are not closed
  - The beliefs need not stay within a given family (e.g. Gaussian) or any other exponential family
62. Approximate Filtering
- Compute a Gaussian belief which approximates the true posterior
- E.g. extended Kalman filter, statistical linearization, unscented filter, assumed-density filter
63. EP Perspective
- Approximate filtering is equivalent to replacing the true measurement/dynamics equations with linear/Gaussian equations
- A Gaussian belief at one step then implies a Gaussian belief at the next
64. EP Perspective
- EKF, UKF, and ADF are all algorithms for replacing nonlinear, non-Gaussian terms with linear, Gaussian ones
65. Terminology
- Filtering: p(x_t | y_{1:t})
- Smoothing: p(x_t | y_{1:t+L}) where L > 0
- On-line: old data is discarded (fixed memory)
- Off-line: old data is re-used (unbounded memory)
66. Kalman Filtering / Belief Propagation
- Prediction step
- Measurement step
- Smoothing step
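The equations on this slide were lost in extraction; the standard forms of the three steps (written generically, not in the deck's notation) are:

    \text{Prediction:} \quad p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}
    \text{Measurement:} \quad p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})
    \text{Smoothing:} \quad p(x_t \mid y_{1:T}) = p(x_t \mid y_{1:t}) \int \frac{p(x_{t+1} \mid x_t)\, p(x_{t+1} \mid y_{1:T})}{p(x_{t+1} \mid y_{1:t})}\, dx_{t+1}

For a linear-Gaussian model these reduce to the Kalman filter and RTS smoother; for discrete chains, to the forward-backward algorithm.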
67. Approximating an Edge by a Tree
Each potential f_a in p is projected onto the tree structure of q.
The correlation between the two nodes is not lost, but projected onto the tree.
68. Graphical Models

                                 Directed                         Undirected
  Generative                     Bayesian networks                Boltzmann machines
  Conditional (discriminative)   Maximum entropy Markov models    Conditional random fields
69. EP on Dynamic Systems
[Same taxonomy table as above]
70. EP on Boltzmann Machines
[Same taxonomy table as above]
71. Future Work
[Same taxonomy table as above]