Title: Variational Inference and Variational Message Passing
1. Variational Inference and Variational Message Passing
- John Winn, Microsoft Research, Cambridge
- 12th November 2004, Robotics Research Group, University of Oxford
2. Overview
- Probabilistic models and Bayesian inference
- Variational Inference
- Variational Message Passing
- Vision example
3. Overview
- Probabilistic models and Bayesian inference
- Variational Inference
- Variational Message Passing
- Vision example
4. Bayesian networks
- Nodes represent variables
- Conditional distributions at each node
- Defines a joint distribution:
P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S)
5. Bayesian inference
- Observed variables V and hidden variables H.
- Hidden variables include parameters and latent variables.
- Learning/inference involves finding the posterior P(H1, H2, … | V).
[Figure: the Bayesian network above, with object class C, lighting colour L and surface colour S hidden, and image colour I observed.]
6. Bayesian inference vs. ML/MAP
- Consider learning one parameter θ.
- How should we represent this posterior distribution?
7. Bayesian inference vs. ML/MAP
- Consider learning one parameter θ.
[Plot: P(V|θ) P(θ) against θ, with the maximum marked as θ_MAP.]
8. Bayesian inference vs. ML/MAP
- Consider learning one parameter θ.
[Plot: the same curve, with a region of high probability density marked alongside θ_MAP.]
9. Bayesian inference vs. ML/MAP
- Consider learning one parameter θ.
[Plot: the same curve with θ_ML marked; the distribution represented by samples.]
10. Bayesian inference vs. ML/MAP
- Consider learning one parameter θ.
[Plot: the same curve with θ_ML marked; the distribution represented by a variational approximation.]
11. Overview
- Probabilistic models and Bayesian inference
- Variational Inference
- Variational Message Passing
- Vision example
12. Variational Inference (in three easy steps)
- Choose a family of variational distributions Q(H).
- Use the Kullback-Leibler divergence KL(Q||P) as a measure of distance between P(H|V) and Q(H).
- Find the Q which minimises the divergence.
13. KL Divergence
[Figure: minimising the exclusive divergence KL(Q||P) drives Q to fit part of P, whereas the inclusive divergence KL(P||Q) drives Q to cover all of P.]
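For reference, the two divergences contrasted here have the standard definitions (a sketch, not taken from the slide itself):

```latex
\mathrm{KL}(Q \,\|\, P) = \sum_{\mathbf{H}} Q(\mathbf{H}) \ln \frac{Q(\mathbf{H})}{P(\mathbf{H} \mid \mathbf{V})}
\quad\text{(exclusive)},
\qquad
\mathrm{KL}(P \,\|\, Q) = \sum_{\mathbf{H}} P(\mathbf{H} \mid \mathbf{V}) \ln \frac{P(\mathbf{H} \mid \mathbf{V})}{Q(\mathbf{H})}
\quad\text{(inclusive)}.
```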
14. Minimising the KL divergence
ln P(V) = L(Q) + KL(Q||P)
- ln P(V) is fixed, so maximising the lower bound L(Q) is equivalent to minimising KL(Q||P),
where L(Q) = Σ_H Q(H) ln [ P(H,V) / Q(H) ].
- We choose a family of Q distributions where L(Q) is tractable to compute.
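One way to see the decomposition used above (a standard derivation, not reproduced from the slide deck):

```latex
\ln P(\mathbf{V})
= \sum_{\mathbf{H}} Q(\mathbf{H}) \ln \frac{P(\mathbf{H},\mathbf{V})}{P(\mathbf{H}\mid\mathbf{V})}
= \underbrace{\sum_{\mathbf{H}} Q(\mathbf{H}) \ln \frac{P(\mathbf{H},\mathbf{V})}{Q(\mathbf{H})}}_{\mathcal{L}(Q)}
\;+\;
\underbrace{\sum_{\mathbf{H}} Q(\mathbf{H}) \ln \frac{Q(\mathbf{H})}{P(\mathbf{H}\mid\mathbf{V})}}_{\mathrm{KL}(Q\,\|\,P)}
```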
15-19. Minimising the KL divergence
[Animation over five slides: ln P(V) stays fixed while the lower bound L(Q) is maximised, so the gap KL(Q||P) shrinks at each step.]
20. Factorised Approximation
- No further assumptions are required!
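The factorised (mean-field) form this slide refers to, together with the optimal-factor update it leads to, is the standard result (as in Winn & Bishop, 2004):

```latex
Q(\mathbf{H}) = \prod_i Q_i(H_i),
\qquad
\ln Q_i^{*}(H_i) = \big\langle \ln P(\mathbf{H},\mathbf{V}) \big\rangle_{\prod_{j\neq i} Q_j} + \text{const.}
```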
21. Example: Univariate Gaussian
- Likelihood function
- Conjugate prior
- Factorised variational distribution
22. Initial Configuration
[Contour plot in (µ, γ) space: the initial factorised Q(µ)Q(γ).]
23. After Updating Q(µ)
[Contour plot: Q(µ) has been updated; Q(γ) is unchanged.]
24. After Updating Q(γ)
[Contour plot: Q(γ) has now been updated.]
25. Converged Solution
[Contour plot: the converged factorised approximation in (µ, γ) space.]
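To make slides 21-25 concrete, here is a minimal sketch of the factorised updates for the univariate Gaussian, assuming a Gaussian prior on the mean µ and a Gamma prior on the precision γ; the prior parameters (m0, beta0, a0, b0) and the synthetic data are illustrative, not taken from the talk:

```python
import numpy as np

# Synthetic data from a Gaussian with unknown mean and precision (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=100)
N = len(x)

# Assumed priors: mu ~ N(m0, 1/beta0), gamma ~ Gamma(shape=a0, rate=b0).
m0, beta0 = 0.0, 1e-3
a0, b0 = 1e-3, 1e-3

# Initial factors: Q(mu) = N(m, 1/beta), Q(gamma) = Gamma(shape=a, rate=b).
m, beta = 0.0, 1.0
a, b = 1.0, 1.0

for _ in range(50):
    # Update Q(mu) using the current expectation <gamma> = a / b.
    e_gamma = a / b
    beta = beta0 + N * e_gamma
    m = (beta0 * m0 + e_gamma * x.sum()) / beta

    # Update Q(gamma) using <mu> = m and <mu^2> = m^2 + 1/beta.
    e_mu, e_mu2 = m, m**2 + 1.0 / beta
    a = a0 + 0.5 * N
    b = b0 + 0.5 * np.sum(x**2 - 2.0 * x * e_mu + e_mu2)

print("variational posterior mean of mu:", m)
print("variational posterior mean of gamma:", a / b)
```

Each pass of the loop corresponds to the pair of slides "After Updating Q(µ)" and "After Updating Q(γ)"; the loop converges to the solution shown on slide 25.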
26. Lower Bound for GMM
27. Variational Equations for GMM
28. Overview
- Probabilistic models and Bayesian inference
- Variational Inference
- Variational Message Passing
- Vision example
29. Variational Message Passing
- VMP makes it easier and quicker to apply factorised variational inference.
- VMP carries out variational inference using local computations and message passing on the graphical model.
- The modular algorithm allows models to be modified, extended or combined.
30. Local Updates
- For a factorised Q, the update for each factor depends only on its Markov blanket.
- ⇒ Updates can be carried out locally at each node.
31. VMP I: The Exponential Family
- Conditional distributions are expressed in exponential family form:
ln P(X|θ) = φ(θ)ᵀ u(X) + f(X) + g(θ)
where u(X) is the sufficient statistics vector and φ(θ) is the natural parameter vector.
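As a concrete instance (an illustrative example, not shown on the slide), a Gaussian with mean µ and precision γ fits this form with u(x) = [x, x²]ᵀ and f(x) = 0:

```latex
\ln \mathcal{N}(x \mid \mu, \gamma^{-1})
= \begin{bmatrix} \mu\gamma \\ -\gamma/2 \end{bmatrix}^{\mathsf{T}}
  \begin{bmatrix} x \\ x^2 \end{bmatrix}
+ \tfrac{1}{2}\left(\ln\gamma - \gamma\mu^{2} - \ln 2\pi\right)
```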
32. VMP II: Conjugacy
- Parent and child distributions are chosen to be conjugate, i.e. to have the same functional form.
[Figure: a fragment X → Y → Z in which P(Y|X) and P(Z|Y) have the same functional form as distributions over Y.]
- Examples:
- Gaussian for the mean of a Gaussian
- Gamma for the precision of a Gaussian
- Dirichlet for the parameters of a discrete distribution
33. VMP III: Messages
[Figure: the messages exchanged between a node Y, its parent X and its child Z. A parent sends the expectation of its sufficient statistics vector; a child sends its natural parameter re-expressed as a function of the parent, evaluated using the ⟨u⟩ messages from its co-parents (Winn & Bishop, 2004).]
34. VMP IV: Update
- The optimal Q(X) has the same form as P(X|θ) but with an updated natural parameter vector θ*, computed from the messages received at X.
- These messages can also be used to calculate the bound on the evidence L(Q); see Winn & Bishop, 2004.
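In the notation of Winn & Bishop (2004), the updated natural parameter vector combines the expected contribution from the parents (evaluated using the ⟨u⟩ messages they send) with the messages from the children; roughly:

```latex
\phi_X^{*} \;=\; \big\langle \phi_X(\mathrm{pa}_X) \big\rangle
\;+\; \sum_{Z \,\in\, \mathrm{ch}_X} m_{Z \to X}
```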
35. VMP Example
- Learning the parameters of a Gaussian from N data points.
[Graphical model: mean µ and precision γ (inverse variance) are parents of the observed data x, inside a plate over N.]
36. VMP Example
- Message from γ to all x_n (an initial Q(γ) is needed).
37. VMP Example
- Messages from each x_n to µ.
- Update the Q(µ) parameter vector.
38. VMP Example
- Message from the updated µ to all x_n.
39. VMP Example
- Messages from each x_n to γ.
- Update the Q(γ) parameter vector.
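For this example, writing the precision as γ, the messages take the following forms; these follow from the exponential-family expressions in Winn & Bishop (2004) and are not shown on the slides:

```latex
m_{\gamma \to x_n} = \begin{bmatrix} \langle\gamma\rangle \\ \langle\ln\gamma\rangle \end{bmatrix},\quad
m_{\mu \to x_n} = \begin{bmatrix} \langle\mu\rangle \\ \langle\mu^{2}\rangle \end{bmatrix},\quad
m_{x_n \to \mu} = \begin{bmatrix} \langle\gamma\rangle\, x_n \\ -\langle\gamma\rangle/2 \end{bmatrix},\quad
m_{x_n \to \gamma} = \begin{bmatrix} -\tfrac{1}{2}\big(x_n^{2} - 2 x_n\langle\mu\rangle + \langle\mu^{2}\rangle\big) \\ \tfrac{1}{2} \end{bmatrix}
```

Summing the child messages and adding the prior's natural parameters gives the updated parameter vectors for Q(µ) and Q(γ), reproducing the coordinate updates sketched after slide 25.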
40. Features of VMP
- The graph does not need to be a tree: it can contain loops (but not cycles).
- Flexible message-passing schedule: factors can be updated in any order.
- Distributions can be discrete or continuous, multivariate, or truncated (e.g. the rectified Gaussian).
- Deterministic relationships are allowed (e.g. A = f(B, C)).
- Allows point estimates, e.g. of hyper-parameters.
41. VMP Software: VIBES
- Free download from vibes.sourceforge.net
42. Overview
- Probabilistic models and Bayesian inference
- Variational Inference
- Variational Message Passing
- Vision example
43. Flexible sprite model
- Proposed by Jojic & Frey (2001).
- A set of images x, e.g. frames from a video (plate over N).
44. Flexible sprite model
- Sprite appearance f and shape π are added as parents of each image x.
45. Flexible sprite model
- For each image: a discretised sprite transform T and a mask m.
46. Flexible sprite model
- Background b and observation noise β complete the model.
47. VMP
[Graphical model: VMP applied to the full flexible sprite model (f, π, b, T, m, x, β).]
- Winn & Blake (NIPS 2004).
48. Results of VMP on hand video
- Original video
- Learned appearance and mask
- Learned transforms (i.e. motion)
49. Conclusions
- Variational Message Passing allows approximate Bayesian inference for a wide range of models.
- VMP simplifies the construction, testing, extension and comparison of models.
- You can try VMP for yourself: vibes.sourceforge.net
50. That's all folks!