Title: Extending Expectation Propagation for Graphical Models
1. Extending Expectation Propagation for Graphical Models
- Yuan (Alan) Qi
- Joint work with Tom Minka
2. Motivation
- Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.
- Inference techniques on graphical models often sacrifice efficiency for accuracy, or sacrifice accuracy for efficiency.
- We need a method that better balances the trade-off between accuracy and efficiency.
3. Motivation
[Figure: error vs. computational time trade-off for current techniques]
4. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
5. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
6. Graphical Models
Directed (Bayesian networks) vs. undirected (Markov networks)
7. Inference on Graphical Models
- Bayesian inference techniques:
  - Belief propagation (BP): Kalman filtering/smoothing, forward-backward algorithm
  - Monte Carlo: particle filters/smoothers, MCMC
- Loopy BP is typically efficient, but not accurate on general loopy graphs
- Monte Carlo is accurate, but often not efficient
8. Expectation Propagation in a Nutshell
- Approximate a probability distribution by simpler parametric terms:
  - for directed graphs, one term per conditional probability
  - for undirected graphs, one term per potential
- Each approximation term lives in an exponential family (e.g. Gaussian)
9. EP in a Nutshell
- The approximate term minimizes the following KL divergence by moment matching, where the leave-one-out approximation is obtained by dividing the current term out of q(x) (see the reconstruction below).
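The formulas this slide refers to were lost in extraction. As a reconstruction in standard EP notation (the symbols f_a, \tilde{f}_a, and q are assumptions, not the deck's own), the update for one term is

    q^{\setminus a}(\mathbf{x}) = \frac{q(\mathbf{x})}{\tilde{f}_a(\mathbf{x})}, \qquad
    \tilde{f}_a^{\mathrm{new}}(\mathbf{x}) = \arg\min_{\tilde{f}_a \in \mathcal{F}}
      \mathrm{KL}\!\Big( f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \;\Big\|\; \tilde{f}_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \Big),

and the minimization is carried out by matching the moments (expected sufficient statistics) of the two arguments.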
10. Limitations of Plain EP
- It can be difficult or expensive to analytically compute the moments needed to minimize the desired KL divergence.
- It can be expensive to compute and maintain a valid approximating distribution q(x) that is coherent under marginalization, e.g. a tree-structured q(x).
11. Three Extensions
1. Instead of choosing the approximate term to minimize the KL divergence above, use other criteria.
2. Use numerical approximations to compute moments: quadrature or Monte Carlo.
3. Allow the tree-structured q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.
12. Efficiency vs. Accuracy
[Figure: error vs. computational time; loopy BP (factorized EP) is fast but less accurate, Monte Carlo is accurate but slow, and extended EP aims for the region in between]
13. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
14. Object Tracking
Guess the position of an object given noisy observations.
15. Bayesian Network
E.g. a random-walk state-space model: we want the distribution of the states x_t given the observations y_t.
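The model equations on this slide were lost in extraction. A minimal sketch of the kind of random-walk state-space model being described (the Gaussian noise terms, the observation function h, and the variances are placeholders, not values from the deck):

    x_t = x_{t-1} + \nu_t, \qquad \nu_t \sim \mathcal{N}(0, \sigma_x^2) \quad (\text{random-walk dynamics})
    y_t = h(x_t) + \omega_t, \qquad \omega_t \sim \mathcal{N}(0, \sigma_y^2) \quad (\text{noisy observation})

The goal is p(x_1, ..., x_T | y_1, ..., y_T), the distribution of the states given all the observations.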
16. Approximation
The approximating distribution is factorized (one factor per time step) and Gaussian in x.
17. Message Interpretation
The belief for each state is a product of three messages: (forward msg) x (observation msg) x (backward msg).
[Figure: the forward, observation, and backward messages on the chain]
18. EP on Dynamic Systems
- Filtering (t = 1, ..., T)
  - Incorporate the forward message
  - Initialize the observation message
- Smoothing (t = T, ..., 1)
  - Incorporate the backward message
  - Compute the leave-one-out approximation by dividing out the old observation message
  - Re-approximate the new observation message
- Re-filtering (t = 1, ..., T)
  - Incorporate the forward and observation messages
(A runnable sketch of this schedule follows.)
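Below is a minimal, runnable sketch of the filtering / smoothing / re-filtering schedule above, assuming a 1-D random-walk model with Gaussian observations so that the moment-matching projection is exact; for non-Gaussian observations, project_observation is the step that quadrature or linearization would replace. The model, parameter values, and all names are illustrative, not taken from the deck.

    import numpy as np

    def project_observation(cav_prec, cav_h, y, r):
        """Moment-match the cavity times the Gaussian likelihood N(y | x, r) and
        return the observation message (precision, precision*mean). Exact here."""
        post_prec = cav_prec + 1.0 / r
        post_h = cav_h + y / r
        return post_prec - cav_prec, post_h - cav_h   # divide the cavity back out

    def ep_smooth(y, q=0.1, r=0.5, n_iters=4):
        T = len(y)
        fwd = np.zeros((T, 2))                        # forward messages (prec, prec*mean)
        bwd = np.zeros((T, 2))                        # backward messages
        obs = np.zeros((T, 2))                        # observation messages
        fwd[0] = (1e-6, 0.0)                          # vague prior on the first state

        def propagate(prec, h):
            """Send a Gaussian message through the random-walk dynamics."""
            var = 1.0 / prec + q
            return 1.0 / var, (h / prec) / var

        # Filtering (t = 1..T): incorporate forward msg, initialize observation msg.
        for t in range(T):
            if t > 0:
                fwd[t] = propagate(fwd[t-1, 0] + obs[t-1, 0], fwd[t-1, 1] + obs[t-1, 1])
            obs[t] = project_observation(fwd[t, 0], fwd[t, 1], y[t], r)

        for _ in range(n_iters):
            # Smoothing (t = T..1): incorporate backward msg, divide out the old
            # observation msg (leave-one-out), re-approximate the observation msg.
            for t in range(T - 1, -1, -1):
                if t < T - 1:
                    bwd[t] = propagate(bwd[t+1, 0] + obs[t+1, 0], bwd[t+1, 1] + obs[t+1, 1])
                cav_prec = fwd[t, 0] + bwd[t, 0]
                cav_h = fwd[t, 1] + bwd[t, 1]
                obs[t] = project_observation(cav_prec, cav_h, y[t], r)
            # Re-filtering (t = 1..T): incorporate forward and observation messages.
            for t in range(1, T):
                fwd[t] = propagate(fwd[t-1, 0] + obs[t-1, 0], fwd[t-1, 1] + obs[t-1, 1])

        prec = fwd[:, 0] + bwd[:, 0] + obs[:, 0]
        mean = (fwd[:, 1] + bwd[:, 1] + obs[:, 1]) / prec
        return mean, 1.0 / prec                       # smoothed means and variances

    # Example usage on synthetic data
    y = np.cumsum(0.3 * np.random.randn(50)) + 0.7 * np.random.randn(50)
    means, variances = ep_smooth(y)

In this linear-Gaussian setting the schedule reduces to an exact forward-backward smoother; the point of the sketch is the EP bookkeeping (messages, leave-one-out, re-approximation), which stays the same when the projection becomes approximate.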
19. Extensions of EP
- Instead of matching moments, use any method for approximate filtering.
  - Examples: statistical linearization, the unscented Kalman filter (UKF), mixtures of Kalman filters
- This turns any deterministic filtering method into a smoothing method!
- All of these methods can be interpreted as finding linear/Gaussian approximations to the original terms.
- Use quadrature or Monte Carlo for term approximations.
20. Example: Poisson Tracking
- The observation y_t is an integer-valued Poisson variate whose mean depends on the latent state x_t.
21. Poisson Tracking Model
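The model on this slide was lost in extraction. A plausible reconstruction of a Poisson tracking model of this kind (the exponential link and the variances are assumptions, not read from the deck):

    x_t \mid x_{t-1} \sim \mathcal{N}(x_{t-1},\, \sigma^2) \quad (\text{random-walk dynamics})
    y_t \mid x_t \sim \mathrm{Poisson}\big(e^{x_t}\big) \quad (\text{integer-valued observation})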
22. Extension of EP: Approximating the Observation Message
- The observation message is not Gaussian
- The moments of x are not analytic
- Two approaches:
  - Gauss-Hermite quadrature for the moments (see the sketch below)
  - Statistical linearization instead of moment matching (turns the unscented Kalman filter into a smoothing method)
- Both work well
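As a hedged sketch of the quadrature approach (assuming the Poisson observation model reconstructed above; all names are illustrative), the observation message is re-approximated by computing the moments of the tilted distribution cavity(x) * p(y | x) numerically and dividing the cavity back out. This routine could stand in for project_observation in the earlier schedule sketch.

    # Gauss-Hermite quadrature for the moments of cavity(x) * Poisson(y; exp(x)).
    import numpy as np
    from scipy.special import gammaln

    def project_poisson_observation(cav_mean, cav_var, y, n_points=20):
        nodes, weights = np.polynomial.hermite_e.hermegauss(n_points)  # probabilists' GH
        x = cav_mean + np.sqrt(cav_var) * nodes                        # quadrature sites
        log_lik = y * x - np.exp(x) - gammaln(y + 1)                   # log Poisson(y; e^x)
        w = weights * np.exp(log_lik)
        z = w.sum()                                                    # normalizer
        mean = (w * x).sum() / z                                       # tilted mean
        var = (w * (x - mean) ** 2).sum() / z                          # tilted variance
        # divide the cavity out of the matched Gaussian to get the observation message
        msg_prec = 1.0 / var - 1.0 / cav_var
        msg_h = mean / var - cav_mean / cav_var
        return msg_prec, msg_h                                         # natural parameters

    # Example: cavity N(0.5, 0.2) and an observed count y = 3
    print(project_poisson_observation(0.5, 0.2, 3))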
23. Approximate vs. Exact Posterior
[Figure: the exact posterior p(x_T | y_{1:T}) and the EP approximation, plotted against x_T]
24. Extended EP vs. Monte Carlo: Accuracy
[Figure: accuracy of the posterior mean and variance estimates]
25. Accuracy/Efficiency Trade-off
26. EP for Digital Wireless Communication
- Signal detection problem
- Transmitted signal s_t
- Its amplitude and phase vary to encode each symbol
- Complex representation
[Figure: symbols in the complex plane (Re/Im axes)]
27. Binary Symbols, Gaussian Noise
- Symbols are +1 and -1 (in the complex plane)
- Received signal y_t is the symbol plus Gaussian noise
- Optimal detection is easy
28. Fading Channel
- The channel systematically changes the amplitude and phase of the transmitted signal
- The channel coefficient changes over time
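A sketch of the flat-fading observation model these bullets describe (the symbols below are standard for this problem but are assumptions, not the deck's notation):

    y_t = s_t\,\alpha_t + w_t, \qquad w_t \sim \mathcal{CN}(0, \sigma^2),

where s_t \in \{+1, -1\} is the transmitted symbol and \alpha_t is the complex fading coefficient, which drifts over time and must be estimated jointly with the symbols.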
29. Benchmark: Differential Detection
- Classical technique
- Use previous observation to estimate state
- Binary symbols only
30. Bayesian Network for Signal Detection
31. Extended EP: Joint Signal Detection and Channel Estimation
- Turn the mixture of Kalman filters into a smoothing method
- Smooth over a window of the last L observations
- Observations before the window act as a prior for the current estimate
32. Computational Complexity
- Expectation propagation: O(nLd^2)
- Stochastic mixture of Kalman filters: O(LMd^2)
- Rao-Blackwellised particle smoothers: O(LMNd^2)
- n: number of EP iterations (typically 4 or 5)
- d: dimension of the parameter vector
- L: smoothing window length
- M: number of samples in filtering (often larger than 500)
- N: number of samples in smoothing (larger than 50)
- EP is about 5,000 times faster than Rao-Blackwellised particle smoothers.
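As a quick check of that figure using the counts quoted above:

    \frac{O(LMNd^2)}{O(nLd^2)} = \frac{MN}{n} \approx \frac{500 \times 50}{5} = 5{,}000.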
33. Experimental Results
(Setup from Chen, Wang, and Liu, 2000)
[Figure: detection performance as a function of signal-to-noise ratio]
EP outperforms particle smoothers in efficiency with comparable accuracy.
34. Bayesian Networks for Adaptive Decoding
The information bits e_t are coded by a convolutional error-correcting encoder.
35. EP Outperforms Viterbi Decoding
[Figure: decoding performance as a function of signal-to-noise ratio]
36. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
37. Inference on Loopy Graphs
Problem: estimate the marginal distributions of the variables indexed by the nodes of a loopy graph, e.g. p(x_i), i = 1, ..., 16.
38. 4-node Loopy Graph
The joint distribution is a product of pairwise potentials, one per edge.
We want to approximate it by a simpler distribution.
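In symbols (standard notation, assumed rather than copied from the deck): with one potential f_{ij} per edge (i, j),

    p(\mathbf{x}) \propto \prod_{(i,j) \in E} f_{ij}(x_i, x_j),

and p is approximated by a distribution q(\mathbf{x}) whose structure is a spanning tree of the graph.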
39. BP vs. TreeEP
[Figure: the approximating structures; BP uses a fully factorized approximation, TreeEP uses a spanning tree]
40. Junction Tree Representation
[Figure: the loopy graph p(x), the tree approximation q(x), and the corresponding junction tree of cliques and separators]
41. Two Kinds of Edges
- On-tree edges, e.g. (x1, x4): incorporated exactly into the junction tree
- Off-tree edges, e.g. (x1, x2): approximated by projecting them onto the tree structure
42. KL Minimization
- KL minimization = moment matching
- Match the single-node and pairwise marginals of the off-tree term times the leave-one-out approximation and of the new tree q
- This reduces to exact inference on single loops
  - Use cutset conditioning
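In symbols (standard EP notation, assumed): incorporating an off-tree edge term f_a means choosing the new tree-structured q to minimize

    \mathrm{KL}\!\big( f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \;\big\|\; q(\mathbf{x}) \big),

and because q is tree-structured, this is achieved by matching its single-node and pairwise (edge) marginals to those of f_a\, q^{\setminus a}.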
43. Matching Marginals on the Graph
(1) Incorporate edge (x3, x4)
(2) Incorporate edge (x6, x7)
44. Drawbacks of Global Propagation
- Updates all the cliques even when incorporating only one off-tree edge
  - Computationally expensive
- Stores each off-tree data message as a whole tree
  - Requires a large amount of memory
45. Solution: Local Propagation
- Allow q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.
- Exploit the junction tree representation: only propagate information locally, within the minimal loop (subtree) that is directly connected to the off-tree edge.
- Reduces computational complexity
- Saves memory
46. Local Propagation Example
(1) Incorporate edge (x3, x4)
(2) Propagate evidence
(3) Incorporate edge (x6, x7)
On this simple graph, local propagation runs roughly 2 times faster and uses roughly half the memory for storing messages, compared with plain (global) EP.
47. New Interpretation of TreeEP
- Marries EP with the junction tree algorithm
- Can operate efficiently over hypertrees and hypernodes
48. 4-node Graph
- TreeEP: the proposed method
- GBP: generalized belief propagation on triangles
- TreeVB: variational tree
- BP: loopy belief propagation (= factorized EP)
- MF: mean field
49. Fully-Connected Graphs
- Results are averaged over 10 graphs with randomly generated potentials
- TreeEP performs as well as or better than all the other methods in both accuracy and efficiency!
50. 8x8 Grids, 10 Trials
Method FLOPS Error
Exact 30,000 0
TreeEP 300,000 0.149
BP/double-loop 15,500,000 0.358
GBP 17,500,000 0.003
51. TreeEP versus BP and GBP
- TreeEP is always more accurate than BP and is often faster
- TreeEP is much more efficient than GBP and more accurate on some problems
- TreeEP converges more often than BP and GBP
52. Outline
- Background on expectation propagation (EP)
- Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
- Tree-structured EP on loopy graphs
- Conclusions and future work
53. Conclusions
- Extended EP on graphical models:
  - Instead of minimizing KL divergence, use other sensible criteria to generate messages. This effectively turns any deterministic filtering method into a smoothing method.
  - Use quadrature to approximate messages.
  - Use local propagation to save computation and memory in tree-structured EP.
54. Conclusions
[Figure: error vs. computational time for state-of-the-art techniques and extended EP]
- Extended EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency.
55. Future Work
- More extensions of EP:
  - How to choose a sensible approximation family (e.g. which tree structure)?
  - More flexible approximations: a mixture of EP approximations?
  - Error bounds?
- Bayesian conditional random fields
- More real-world applications
56. End
Contact information: yuanqi_at_media.mit.edu
57. Extended EP Accuracy Improves Significantly in Only a Few Iterations
58. EP versus BP
- The EP approximation is in a restricted family, e.g. Gaussian
- The EP approximation does not have to be factorized
- EP applies to many more problems, e.g. mixtures of discrete and continuous variables
59. EP versus Monte Carlo
- Monte Carlo is general but expensive
- EP exploits the underlying simplicity of the problem, if it exists
- Monte Carlo is still needed for complex problems (e.g. large isolated peaks)
- The trick is to know what kind of problem you have
60. (Loopy) Belief Propagation
- Specialize EP to fully factorized approximations
- Minimizing the KL divergence then amounts to matching the marginals of the partially factorized distribution and the fully factorized one
- The resulting messages are exactly the loopy BP messages
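In symbols (standard notation, assumed): with a fully factorized q(\mathbf{x}) = \prod_i q_i(x_i), the moment-matching projection only has to match single-node marginals,

    q_i^{\mathrm{new}}(x_i) \propto \sum_{\mathbf{x} \setminus x_i} f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}),

with an integral in place of the sum for continuous variables; this recovers the loopy BP message updates.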
61. Limitation of BP
- If the dynamics or measurements are not linear and Gaussian, the complexity of the posterior increases with the number of measurements
  - I.e. the BP equations are not closed
  - The beliefs need not stay within a given family (e.g. Gaussian) or any other exponential family
62. Approximate Filtering
- Compute a Gaussian belief which approximates the true posterior
- E.g. extended Kalman filter, statistical linearization, unscented filter, assumed-density filter
63. EP Perspective
- Approximate filtering is equivalent to replacing the true measurement/dynamics equations with linear/Gaussian equations
- A Gaussian belief at one step then implies a Gaussian belief at the next
64. EP Perspective
- EKF, UKF, and ADF are all algorithms for replacing nonlinear, non-Gaussian terms with linear, Gaussian ones
65. Terminology
- Filtering: p(x_t | y_{1:t})
- Smoothing: p(x_t | y_{1:t+L}) where L > 0
- On-line: old data is discarded (fixed memory)
- Off-line: old data is re-used (unbounded memory)
66. Kalman Filtering / Belief Propagation
- Prediction step
- Measurement step
- Smoothing step
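The equations on this slide were lost in extraction; the standard forms of the three steps (written generically, not in the deck's notation) are:

    \text{Prediction:} \quad p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}
    \text{Measurement:} \quad p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})
    \text{Smoothing:} \quad p(x_t \mid y_{1:T}) = p(x_t \mid y_{1:t}) \int \frac{p(x_{t+1} \mid x_t)\, p(x_{t+1} \mid y_{1:T})}{p(x_{t+1} \mid y_{1:t})}\, dx_{t+1}

For a linear-Gaussian model these reduce to the Kalman filter and RTS smoother; for discrete chains, to the forward-backward algorithm.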
67. Approximating an Edge by a Tree
Each potential f_a in p is projected onto the tree structure of q.
The correlation between the two nodes is not lost, but projected onto the tree.
68. Graphical Models

                                 Directed                         Undirected
  Generative                     Bayesian networks                Boltzmann machines
  Conditional (discriminative)   Maximum entropy Markov models    Conditional random fields
69. EP on Dynamic Systems
[Same taxonomy table as above]
70. EP on Boltzmann Machines
[Same taxonomy table as above]
71. Future Work
[Same taxonomy table as above]