Title: Inference in Gaussian and Hybrid Bayesian Networks
1. Inference in Gaussian and Hybrid Bayesian Networks
2. Gaussian Distribution
5. Multivariate Gaussian
- Definition
- Let X1, …, Xn be a set of random variables. A multivariate Gaussian distribution over X1, …, Xn is parameterized by an n-dimensional mean vector μ and an n × n positive definite covariance matrix Σ. It defines a joint density via
  $p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$
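As a quick sanity check of the definition, here is a minimal numpy sketch of this density; the 2-dimensional μ and Σ below are made-up illustrative values, not taken from the slides.

```python
import numpy as np

def mvn_density(x, mu, sigma):
    """Evaluate the multivariate Gaussian density N(mu, sigma) at x."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm)

# Made-up 2-dimensional example:
mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])    # positive definite
print(mvn_density(np.array([0.2, 0.8]), mu, sigma))
```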
6. Multivariate Gaussian
7. Linear Gaussian Distribution
- Definition
- Let Y be a continuous node with continuous parents X1, …, Xk. We say that Y has a linear Gaussian model if it can be described using parameters β0, …, βk and σ² such that
  $p(y \mid x_1, \dots, x_k) = N(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k;\ \sigma^2)$
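A small numpy sketch of sampling from such a CPD; the parameter values (β0 = 1.0, β = [2.0, -0.5], σ² = 0.25) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, betas, sigma2 = 1.0, np.array([2.0, -0.5]), 0.25  # assumed parameters
x = np.array([0.3, 1.2])                                 # parent assignment

# Y | x1, x2  ~  N(beta0 + beta1*x1 + beta2*x2, sigma2)
y = rng.normal(beta0 + betas @ x, np.sqrt(sigma2))
```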
10. Linear Gaussian Network
- Definition
- A linear Gaussian Bayesian network is a Bayesian network all of whose variables are continuous and all of whose CPTs are linear Gaussians.
- Linear Gaussian BN ⇔ Multivariate Gaussian
- → A linear Gaussian BN is thus a compact representation of a multivariate Gaussian.
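To see the equivalence concretely, the sketch below builds the joint multivariate Gaussian encoded by a two-node linear Gaussian BN A → B; the parameters are assumptions for illustration.

```python
import numpy as np

# A ~ N(mu_a, v_a);  B | A ~ N(b0 + b1*A, v_b)   (assumed parameters)
mu_a, v_a = 1.0, 4.0
b0, b1, v_b = 0.5, -1.0, 2.0

# The BN encodes exactly the joint Gaussian P(A, B) = N(mu, cov):
mu = np.array([mu_a, b0 + b1 * mu_a])
cov = np.array([[v_a,      b1 * v_a],
                [b1 * v_a, v_b + b1 ** 2 * v_a]])
```

Note the compactness: each node needs only a handful of local parameters, while the explicit joint needs a full n × n covariance matrix.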
11. Inference in Continuous Networks
(Figure: a two-node network over continuous variables A and B.)
12. Marginalization
13. Problems when we multiply two arbitrary Gaussians!
The inverses of K and M are always well defined. However, the inverse of (K⁻¹ + M⁻¹) is not!
14. Theoretical explanation: why is this the case?
- The inverse of an n × n matrix exists only when the matrix has rank n.
- If all σs and ws are assumed to be 1, then (K⁻¹ + M⁻¹) has rank 2 and so is not invertible.
15. Density vs. conditional
- However,
- Theorem: If the product of the Gaussians represents a multivariate Gaussian density, then the inverse always exists.
- For example, if P(A|B)·P(B) = P(A,B) = N(c, C), then the inverse of C always exists: P(A,B) is a multivariate Gaussian (density).
- But if P(A|B)·P(B|X) = P(A,B|X) = N(c, C), then the inverse of C may not exist: P(A,B|X) is a conditional Gaussian.
16. Inference: a general algorithm for computing the marginal of a given variable, say Z
Step 1: Convert all conditional Gaussians to canonical form.
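The canonical form writes a Gaussian factor as exp(g + hᵀx − ½xᵀKx). A minimal numpy sketch of the conversion for a factor given in moment form:

```python
import numpy as np

def to_canonical(mu, sigma):
    """N(mu, sigma) -> (g, h, K) with density exp(g + h@x - 0.5 * x@K@x)."""
    K = np.linalg.inv(sigma)               # precision matrix
    h = K @ mu
    n = len(mu)
    g = (-0.5 * float(mu @ h)
         - 0.5 * n * np.log(2 * np.pi)
         - 0.5 * np.log(np.linalg.det(sigma)))
    return g, h, K
```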
17. Inference: a general algorithm for computing the marginal of a given variable, say Z
- Step 2
- Extend all g's, h's, and K's to the same domain by padding with 0s.
18. Inference: a general algorithm for computing the marginal of a given variable, say Z
- Step 3: Add all g's, all h's, and all K's.
- Step 4: Let the variables involved in the computation be P(X1, X2, …, Xk, Z) = N(μ, Σ).
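A sketch of Steps 2 and 3, assuming a fixed global variable ordering; the argument `idx` (hypothetical bookkeeping, not from the slides) lists each factor's positions in that ordering.

```python
import numpy as np

def extend(g, h, K, idx, n):
    """Step 2: embed (g, h, K) into an n-variable scope by zero padding."""
    h_full = np.zeros(n)
    h_full[idx] = h
    K_full = np.zeros((n, n))
    K_full[np.ix_(idx, idx)] = K
    return g, h_full, K_full

def add_all(factors):
    """Step 3: on a common scope, the product just sums the g's, h's and K's."""
    gs, hs, Ks = zip(*factors)
    return sum(gs), sum(hs), sum(Ks)
```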
19. Inference: a general algorithm for computing the marginal of a given variable, say Z
Step 5: Extract the marginal.
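A sketch of Step 5: invert the summed K to recover moment form (which exists here because, by the theorem above, the full product is a density) and read off the entries for Z; `z_idx` is Z's position in the global ordering.

```python
import numpy as np

def extract_marginal(h, K, z_idx):
    """Moment form of the joint, then the marginal on Z."""
    sigma = np.linalg.inv(K)   # invertible: the full product is a density
    mu = sigma @ h
    return mu[z_idx], sigma[z_idx, z_idx]
```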
20. Inference: computing the marginal of a given variable
- For a continuous Gaussian Bayesian network, inference is polynomial, O(N³): the complexity of matrix inversion.
- So algorithms like belief propagation are not generally used when all variables are Gaussian.
- Can we do better than N³?
- Use bucket elimination.
21. Bucket elimination: algorithm elim-bel (Dechter 1996)
(Figure: the elim-bel algorithm with its marginalization operator.)
22. Multiplication operator
- Convert all functions to canonical form if necessary.
- Extend all functions to the same variables.
- (g1, h1, K1) × (g2, h2, K2) = (g1 + g2, h1 + h2, K1 + K2)
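The operator as code (a sketch; it assumes both factors have already been extended to a common scope as in the previous bullet):

```python
def multiply(f1, f2):
    """Product of two canonical-form factors over the same scope."""
    g1, h1, K1 = f1
    g2, h2, K2 = f2
    return g1 + g2, h1 + h2, K1 + K2
```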
23. Again, our problem!
h(a,d,c,e) does not represent a density and so cannot be computed in our usual form N(μ, Σ).
(Figure: the marginalization operator.)
24. Solution: Marginalize in canonical form
- Although the intermediate functions computed in bucket elimination are conditional, we can marginalize in canonical form, and so eliminate the problem of non-existent inverses completely.
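A sketch of that canonical-form marginalization (the standard Gaussian integration formula): integrating out variables Y only requires inverting the sub-block K[Y, Y], which is why conditional factors pose no problem.

```python
import numpy as np

def marginalize(g, h, K, keep, out):
    """Integrate out the variables at positions `out`, staying in canonical form."""
    Kyy = K[np.ix_(out, out)]
    Kxy = K[np.ix_(keep, out)]
    Kyy_inv = np.linalg.inv(Kyy)     # only this sub-block must be invertible
    hy = h[out]
    K_new = K[np.ix_(keep, keep)] - Kxy @ Kyy_inv @ Kxy.T
    h_new = h[keep] - Kxy @ Kyy_inv @ hy
    g_new = g + 0.5 * (len(out) * np.log(2 * np.pi)
                       - np.log(np.linalg.det(Kyy))
                       + float(hy @ Kyy_inv @ hy))
    return g_new, h_new, K_new
```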
25. Algorithm
- In each bucket, convert all functions to canonical form if necessary, multiply them, and marginalize out the bucket's variable as shown on the previous slide.
- Theorem: P(A) is a density and is correct.
- Complexity: Time and space O((w+1)³), where w is the induced width of the ordering used.
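Putting the pieces together, a rough sketch of one bucket's work, reusing the `multiply` and `marginalize` sketches above (positions are relative to the bucket's own scope):

```python
from functools import reduce

def process_bucket(factors, var_pos, n_vars):
    """Multiply the bucket's factors (pre-extended to its n_vars-variable
    scope) and eliminate the bucket's variable; the resulting canonical
    factor is then placed in the bucket of its highest remaining variable."""
    g, h, K = reduce(multiply, factors)
    keep = [i for i in range(n_vars) if i != var_pos]
    return marginalize(g, h, K, keep=keep, out=[var_pos])
```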
26. Continuous Node, Discrete Parents
- Definition
- Let X be a continuous node, let U = {U1, U2, …, Un} be its discrete parents, and let Y = {Y1, Y2, …, Yk} be its continuous parents. We say that X has a conditional linear Gaussian (CLG) CPT if, for every value u ∈ D(U), we have a set of (k+1) coefficients a_{u,0}, a_{u,1}, …, a_{u,k} and a variance σ_u² such that
  $p(X \mid u, y) = N\left(a_{u,0} + \sum_{i=1}^{k} a_{u,i}\, y_i;\ \sigma_u^2\right)$
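A sketch of a CLG CPT as a lookup from the discrete parent assignment u to linear Gaussian parameters; the storage layout and all numbers are illustrative assumptions.

```python
import numpy as np

# params[u] = (a_u0, a_u, sigma2_u): one linear Gaussian per discrete value u
params = {0: (1.0, np.array([0.5]), 0.2),
          1: (-1.0, np.array([2.0]), 0.5)}

def clg_sample(rng, u, y):
    """Sample X ~ N(a_{u,0} + sum_i a_{u,i} * y_i, sigma_u^2)."""
    a0, a, sigma2 = params[u]
    return rng.normal(a0 + float(a @ y), np.sqrt(sigma2))

rng = np.random.default_rng(0)
x = clg_sample(rng, u=1, y=np.array([0.3]))
```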
27. CLG Network
- Definition
- A Bayesian network is called a CLG network if
every discrete node has only discrete parents,
and every continuous node has a CLG CPT.
28. Inference in CLGs
- Can we use the same algorithm?
- Yes, but the algorithm is unbounded if we are not careful.
- Reason:
- Marginalizing out discrete variables from an arbitrary function in CLGs is not bounded.
- If we marginalize out y and k from f(x, y, i, k), the result is a mixture of 4 Gaussians instead of 2.
- x and y are continuous variables.
- i and k are discrete binary variables.
29. Solution: Approximate the mixture of Gaussians by a single Gaussian
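The standard way to do this is moment matching (the weak marginal of the next slide): keep the mixture's overall mean and covariance. A minimal numpy sketch:

```python
import numpy as np

def collapse(weights, mus, sigmas):
    """Replace a Gaussian mixture with the single Gaussian that has the
    same first and second moments."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = sum(wi * mi for wi, mi in zip(w, mus))
    sigma = sum(wi * (Si + np.outer(mi - mu, mi - mu))
                for wi, mi, Si in zip(w, mus, sigmas))
    return mu, sigma
```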
30. Multiplication and Marginalization
Strong marginal when marginalizing continuous variables.
Weak marginal when marginalizing discrete variables.
Multiplication:
- Convert all functions to canonical form if necessary.
- Extend all functions to the same variables.
- (g1, h1, K1) × (g2, h2, K2) = (g1 + g2, h1 + h2, K1 + K2)
31. Problem while using this marginalization in bucket elimination
- It requires computing Σ and μ, which is not possible due to the non-existence of the inverse.
- Solution: Use an ordering such that you never have to marginalize out discrete variables from a function that involves both discrete and continuous Gaussian variables.
- Special case: computing the marginal at a discrete node.
- Homework: derive a bucket elimination algorithm for computing the marginal of a continuous variable.
32. Special case: a marginal on a discrete variable in a CLG is to be computed
B, C and D are continuous variables; A and E are discrete.
(Figure: the network and the marginalization operator.)
33. Complexity of the special case
- Discrete width (wd): the maximum number of discrete variables in a clique.
- Continuous width (wc): the maximum number of continuous variables in a clique.
- Time: O(exp(wd) · wc³)
- Space: O(exp(wd) · wc³)
34. Algorithm for the general case: computing belief at a continuous node of a CLG
- Convert all functions to canonical form.
- Create a special tree decomposition.
- Assign functions to appropriate cliques (same as assigning functions to buckets).
- Select a strong root.
- Perform message passing.
35. Creating a special tree decomposition
- Moralize the Bayesian network.
- Select an ordering such that all continuous variables are ordered before discrete variables (this increases the induced width).
36. Elimination order
- Strong elimination order:
- First eliminate continuous variables.
- Eliminate a discrete variable only when no continuous variables remain.
W and X are discrete variables and Y and Z are continuous.
(Figure: the network over w, x, y, z; the moralized graph adds an edge.)
37. Elimination order (1)
(Figure: z, a 2-dimensional continuous variable, is eliminated first.)
38. Elimination order (2)
(Figure: y, also 2-dimensional continuous, is eliminated second.)
39. Elimination order (3)
(Figure: the first discrete variable is eliminated third.)
40. Elimination order (4)
(Figure: the last discrete variable is eliminated. The resulting cliques are Clique 1 = {w, y, z} and Clique 2 = {w, x, y}, with separator {w, y}.)
41. Bucket tree or junction tree (1)
(Figure: the junction tree: Clique 2 = {w, x, y} is the root, Clique 1 = {w, y, z} is the leaf, and the separator is {w, y}.)
42. Algorithm for the general case: computing belief at a continuous node of a CLG
- Convert all functions to canonical form.
- Create a special tree decomposition.
- Assign functions to appropriate cliques (same as assigning functions to buckets).
- Select a strong root.
- Perform message passing.
43. Assigning functions to cliques
- Select a function and place it in an arbitrary clique that mentions all the variables in the function.
44. Algorithm for the general case: computing belief at a continuous node of a CLG
- Convert all functions to canonical form.
- Create a special tree decomposition.
- Assign functions to appropriate cliques (same as assigning functions to buckets).
- Select a strong root.
- Perform message passing.
45. Strong Root
- We define a strong root as any node R in the bucket tree that satisfies the following property: for any pair (V, W) of neighbors on the tree with W closer to R than V, either V \ W contains only continuous variables, or V ∩ W contains only discrete variables.
46. Example: strong root
(Figure: a junction tree with the strong root highlighted.)
47. Algorithm for the general case: computing belief at a continuous node of a CLG
- Create a special tree decomposition.
- Assign functions to appropriate cliques (same as assigning functions to buckets).
- Select a strong root.
- Perform message passing.
48. Message passing at a typical node
- Node a contains the functions assigned to it according to the tree-decomposition scheme, denoted by p_j(a).
(Figure: a node with its incoming and outgoing messages.)
49. Message Passing
Two-pass algorithm: bucket-tree propagation.
(Figure from P. Green.)
50. Let's look at the messages: Collect Evidence
(Figure: messages flow from the leaves toward the strong root.)
51. Distribute Evidence
(Figure: messages flow from the strong root back toward the leaves.)
52. Lauritzen's theorem
- When you perform message passing such that collect-evidence involves only strong marginals and distribute-evidence may involve weak marginals, the junction tree algorithm is exact in the sense that:
- the first moment (mean) and second moment (variance) computed are the true moments.
53. Complexity
- Polynomial in the number of continuous variables in a clique (n³).
- Exponential in the number of discrete variables in a clique.
- Possible options for approximation:
- Ignore the strong-root assumption and use approximations like MBTE, IJGP, or sampling.
- Respect the strong-root assumption and use approximations like MBTE, IJGP, or sampling.
- Inaccuracies are due only to discrete variables if done in one pass of MBTE.
54. Initialization (1)
(Figure: the network over w, x, y, z with priors P(x0) = 0.4, P(x1) = 0.6 and P(w0) = P(w1) = 0.5; y and z are 2-dimensional continuous variables.)
55. Initialization (2)
Clique 1: wyz; Clique 2 (root): wxy; separator: wy.
The discrete priors in canonical form: x0: g = log(0.4); x1: g = log(0.6); w0: g = log(0.5); w1: g = log(0.5) (each with its h and K).
X = x0: g = -4.1245, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
X = x1: g = -3.0310, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
W = w0: g = -4.0629, h = [0.0889, -0.0111, -0.0556, 0.0556], K = …
W = w1: g = -2.7854, h = [0.0867, -0.0633, -0.1000, -0.1667], K = …
56. Initialization (3)
Clique 1: wyz; Clique 2 (root): wxy; separator wy: empty.
(w0,x0) and (w1,x0): g = -5.1308, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
(w0,x1) and (w1,x1): g = -3.5418, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
W = w0: g = -4.7560, h = …, K = …
W = w1: g = -3.4786, h = …, K = …
57. Message Passing
Clique 1: wyz; Clique 2 (root): wxy; separator wy: empty.
Collect evidence passes a message from Clique 1 to the root; distribute evidence passes one back.
58. Collect evidence (1)
(Figure: generic message computation on a tree: a clique over {y1, y2} marginalizes λ(y1, y2) down to λ(y2) and sends it to its neighbor over {y2, y3}.)
59. Collect evidence (2)
Marginalizing z out of the Clique 1 (wyz) functions gives the separator (wy) message:
W = w0: g = -4.7560, …; W = w1: g = -3.4786, …
  ↓ marginalization
W = w0: g = -0.6931, h ≈ 0, K ≈ 0 (entries of order 1e-16, i.e. numerically zero)
W = w1: g = -0.6931, h = [0, 0], K = [0 0; 0 0]
60. Collect evidence (3)
The separator message from the previous slide is multiplied into the Clique 2 (wxy) functions:
Before:
(w0,x0) and (w1,x0): g = -5.1308, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
(w0,x1) and (w1,x1): g = -3.5418, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
After multiplication:
(w0,x0) and (w1,x0): g = -5.8329, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
(w0,x1) and (w1,x1): g = -4.2350, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
61. Distribute evidence (1)
The separator message is divided out of the Clique 1 (wyz) functions:
W = w0: g = -4.7560, …; W = w1: g = -3.4786, …
  ÷ the separator message:
W = w0: g = -0.6931, h ≈ 0, K ≈ 0 (order 1e-16)
W = w1: g = -0.6931, h = [0, 0], K = [0 0; 0 0]
62. Distribute evidence (2)
After division:
W = w0: g = -4.0629, h = …, K = …
W = w1: g = -2.7854, h = …, K = …
63. Distribute evidence (3)
x is marginalized out of the Clique 2 (root) functions (a weak marginal, in moment form):
(w0,x0) and (w1,x0): g = -5.8329, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
(w0,x1) and (w1,x1): g = -4.2350, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
  ↓ marginalize over x
w0: log p = -0.6931, μ = [0.52, -0.12], Σ = …
w1: log p = -0.6931, μ = [0.52, -0.12], Σ = …
64. Distribute evidence (4)
The weak marginal is converted back to canonical form and multiplied into the Clique 1 functions:
w0: log p = -0.6931, μ = [0.52, -0.12], Σ = …  →  canonical form: g = -4.3316, h = [0.0927, -0.0096], K = …
w1: log p = -0.6931, μ = [0.52, -0.12], Σ = …  →  canonical form: g = -0.6931, h = [0.0927, -0.0096], K = …
Clique 1 after division: w0: g = -4.0629, …; w1: g = -2.7854, …
65. Distribute evidence (5)
After multiplication:
W = w0: g = -8.3935, h = …, K = …
W = w1: g = -7.1170, h = …, K = …
66. After Message Passing
Clique 1 now holds p(w, y, z), Clique 2 (root) holds p(w, x, y), and the separator holds p(w, y): the local marginal distributions.