Title: Inference in Gaussian and Hybrid Bayesian Networks
1. Inference in Gaussian and Hybrid Bayesian Networks
2. Gaussian Distribution
5. Multivariate Gaussian
- Definition
- Let X1, …, Xn be a set of random variables. A multivariate Gaussian distribution over X1, …, Xn is parameterized by an n-dimensional mean vector μ and an n × n positive definite covariance matrix Σ. It defines a joint density via
  $p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$
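As a quick sanity check of the definition, here is a minimal numpy sketch of this density; the 2-dimensional μ and Σ below are made-up illustrative values, not taken from the slides.

```python
import numpy as np

def mvn_density(x, mu, sigma):
    """Evaluate the multivariate Gaussian density N(mu, sigma) at x."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm)

# Made-up 2-dimensional example:
mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])    # positive definite
print(mvn_density(np.array([0.2, 0.8]), mu, sigma))
```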
6. Multivariate Gaussian
7. Linear Gaussian Distribution
- Definition
- Let Y be a continuous node with continuous parents X1, …, Xk. We say that Y has a linear Gaussian model if it can be described using parameters β0, …, βk and σ² such that
  $p(y \mid x_1, \dots, x_k) = N(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k;\ \sigma^2)$
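A small numpy sketch of sampling from such a CPD; the parameter values (β0 = 1.0, β = [2.0, -0.5], σ² = 0.25) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, betas, sigma2 = 1.0, np.array([2.0, -0.5]), 0.25  # assumed parameters
x = np.array([0.3, 1.2])                                 # parent assignment

# Y | x1, x2  ~  N(beta0 + beta1*x1 + beta2*x2, sigma2)
y = rng.normal(beta0 + betas @ x, np.sqrt(sigma2))
```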
10. Linear Gaussian Network
- Definition
- A linear Gaussian Bayesian network is a Bayesian network all of whose variables are continuous and all of whose CPTs are linear Gaussians.
- Linear Gaussian BN ⇔ Multivariate Gaussian
- → A linear Gaussian BN is thus a compact representation of a multivariate Gaussian.
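To see the equivalence concretely, the sketch below builds the joint multivariate Gaussian encoded by a two-node linear Gaussian BN A → B; the parameters are assumptions for illustration.

```python
import numpy as np

# A ~ N(mu_a, v_a);  B | A ~ N(b0 + b1*A, v_b)   (assumed parameters)
mu_a, v_a = 1.0, 4.0
b0, b1, v_b = 0.5, -1.0, 2.0

# The BN encodes exactly the joint Gaussian P(A, B) = N(mu, cov):
mu = np.array([mu_a, b0 + b1 * mu_a])
cov = np.array([[v_a,      b1 * v_a],
                [b1 * v_a, v_b + b1 ** 2 * v_a]])
```

Note the compactness: each node needs only a handful of local parameters, while the explicit joint needs a full n × n covariance matrix.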
11. Inference in Continuous Networks
(Figure: a two-node network over continuous variables A and B.)
12. Marginalization
13. Problems when we multiply two arbitrary Gaussians!
The inverses of K and M are always well defined. However, the inverse of (K⁻¹ + M⁻¹) is not!
14. Theoretical explanation: why is this the case?
- The inverse of an n × n matrix exists only when the matrix has rank n.
- If all σs and ws are assumed to be 1, then (K⁻¹ + M⁻¹) has rank 2 and so is not invertible.
15. Density vs. conditional
- However,
- Theorem: If the product of the Gaussians represents a multivariate Gaussian density, then the inverse always exists.
- For example, if P(A|B)·P(B) = P(A,B) = N(c, C), then the inverse of C always exists: P(A,B) is a multivariate Gaussian (density).
- But if P(A|B)·P(B|X) = P(A,B|X) = N(c, C), then the inverse of C may not exist: P(A,B|X) is a conditional Gaussian.
16. Inference: a general algorithm for computing the marginal of a given variable, say Z
Step 1: Convert all conditional Gaussians to canonical form.
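The canonical form writes a Gaussian factor as exp(g + hᵀx − ½xᵀKx). A minimal numpy sketch of the conversion for a factor given in moment form:

```python
import numpy as np

def to_canonical(mu, sigma):
    """N(mu, sigma) -> (g, h, K) with density exp(g + h@x - 0.5 * x@K@x)."""
    K = np.linalg.inv(sigma)               # precision matrix
    h = K @ mu
    n = len(mu)
    g = (-0.5 * float(mu @ h)
         - 0.5 * n * np.log(2 * np.pi)
         - 0.5 * np.log(np.linalg.det(sigma)))
    return g, h, K
```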
17. Inference: a general algorithm for computing the marginal of a given variable, say Z
- Step 2
- Extend all g's, h's, and K's to the same domain by padding with 0s.
18. Inference: a general algorithm for computing the marginal of a given variable, say Z
- Step 3: Add all g's, all h's, and all K's.
- Step 4: Let the variables involved in the computation be P(X1, X2, …, Xk, Z) = N(μ, Σ).
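A sketch of Steps 2 and 3, assuming a fixed global variable ordering; the argument `idx` (hypothetical bookkeeping, not from the slides) lists each factor's positions in that ordering.

```python
import numpy as np

def extend(g, h, K, idx, n):
    """Step 2: embed (g, h, K) into an n-variable scope by zero padding."""
    h_full = np.zeros(n)
    h_full[idx] = h
    K_full = np.zeros((n, n))
    K_full[np.ix_(idx, idx)] = K
    return g, h_full, K_full

def add_all(factors):
    """Step 3: on a common scope, the product just sums the g's, h's and K's."""
    gs, hs, Ks = zip(*factors)
    return sum(gs), sum(hs), sum(Ks)
```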
19. Inference: a general algorithm for computing the marginal of a given variable, say Z
Step 5: Extract the marginal.
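A sketch of Step 5: invert the summed K to recover moment form (which exists here because, by the theorem above, the full product is a density) and read off the entries for Z; `z_idx` is Z's position in the global ordering.

```python
import numpy as np

def extract_marginal(h, K, z_idx):
    """Moment form of the joint, then the marginal on Z."""
    sigma = np.linalg.inv(K)   # invertible: the full product is a density
    mu = sigma @ h
    return mu[z_idx], sigma[z_idx, z_idx]
```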
20. Inference: computing the marginal of a given variable
- For a continuous Gaussian Bayesian network, inference is polynomial, O(N³): the complexity of matrix inversion.
- So algorithms like belief propagation are not generally used when all variables are Gaussian.
- Can we do better than N³?
- Use bucket elimination.
21. Bucket elimination: algorithm elim-bel (Dechter 1996)
(Figure: the elim-bel algorithm with its marginalization operator.)
22. Multiplication operator
- Convert all functions to canonical form if necessary.
- Extend all functions to the same variables.
- (g1, h1, K1) × (g2, h2, K2) = (g1 + g2, h1 + h2, K1 + K2)
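The operator as code (a sketch; it assumes both factors have already been extended to a common scope as in the previous bullet):

```python
def multiply(f1, f2):
    """Product of two canonical-form factors over the same scope."""
    g1, h1, K1 = f1
    g2, h2, K2 = f2
    return g1 + g2, h1 + h2, K1 + K2
```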
23. Again, our problem!
h(a,d,c,e) does not represent a density and so cannot be computed in our usual form N(μ, Σ).
(Figure: the marginalization operator.)
24. Solution: Marginalize in canonical form
- Although the intermediate functions computed in bucket elimination are conditional, we can marginalize in canonical form, and so eliminate the problem of non-existent inverses completely.
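A sketch of that canonical-form marginalization (the standard Gaussian integration formula): integrating out variables Y only requires inverting the sub-block K[Y, Y], which is why conditional factors pose no problem.

```python
import numpy as np

def marginalize(g, h, K, keep, out):
    """Integrate out the variables at positions `out`, staying in canonical form."""
    Kyy = K[np.ix_(out, out)]
    Kxy = K[np.ix_(keep, out)]
    Kyy_inv = np.linalg.inv(Kyy)     # only this sub-block must be invertible
    hy = h[out]
    K_new = K[np.ix_(keep, keep)] - Kxy @ Kyy_inv @ Kxy.T
    h_new = h[keep] - Kxy @ Kyy_inv @ hy
    g_new = g + 0.5 * (len(out) * np.log(2 * np.pi)
                       - np.log(np.linalg.det(Kyy))
                       + float(hy @ Kyy_inv @ hy))
    return g_new, h_new, K_new
```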
25. Algorithm
- In each bucket, convert all functions to canonical form if necessary, multiply them, and marginalize out the bucket's variable as shown on the previous slide.
- Theorem: P(A) is a density and is correct.
- Complexity: Time and space O((w+1)³), where w is the induced width of the ordering used.
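Putting the pieces together, a rough sketch of one bucket's work, reusing the `multiply` and `marginalize` sketches above (positions are relative to the bucket's own scope):

```python
from functools import reduce

def process_bucket(factors, var_pos, n_vars):
    """Multiply the bucket's factors (pre-extended to its n_vars-variable
    scope) and eliminate the bucket's variable; the resulting canonical
    factor is then placed in the bucket of its highest remaining variable."""
    g, h, K = reduce(multiply, factors)
    keep = [i for i in range(n_vars) if i != var_pos]
    return marginalize(g, h, K, keep=keep, out=[var_pos])
```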
26. Continuous Node, Discrete Parents
- Definition
- Let X be a continuous node, let U = {U1, U2, …, Un} be its discrete parents, and let Y = {Y1, Y2, …, Yk} be its continuous parents. We say that X has a conditional linear Gaussian (CLG) CPT if, for every value u ∈ D(U), we have a set of (k+1) coefficients a_{u,0}, a_{u,1}, …, a_{u,k} and a variance σ_u² such that
  $p(X \mid u, y) = N\left(a_{u,0} + \sum_{i=1}^{k} a_{u,i}\, y_i;\ \sigma_u^2\right)$
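A sketch of a CLG CPT as a lookup from the discrete parent assignment u to linear Gaussian parameters; the storage layout and all numbers are illustrative assumptions.

```python
import numpy as np

# params[u] = (a_u0, a_u, sigma2_u): one linear Gaussian per discrete value u
params = {0: (1.0, np.array([0.5]), 0.2),
          1: (-1.0, np.array([2.0]), 0.5)}

def clg_sample(rng, u, y):
    """Sample X ~ N(a_{u,0} + sum_i a_{u,i} * y_i, sigma_u^2)."""
    a0, a, sigma2 = params[u]
    return rng.normal(a0 + float(a @ y), np.sqrt(sigma2))

rng = np.random.default_rng(0)
x = clg_sample(rng, u=1, y=np.array([0.3]))
```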
27. CLG Network
- Definition
- A Bayesian network is called a CLG network if
every discrete node has only discrete parents,
and every continuous node has a CLG CPT.
28. Inference in CLGs
- Can we use the same algorithm?
- Yes, but the algorithm is unbounded if we are not careful.
- Reason:
- Marginalizing out discrete variables from an arbitrary function in CLGs is not bounded.
- If we marginalize out y and k from f(x, y, i, k), the result is a mixture of 4 Gaussians instead of 2.
- x and y are continuous variables.
- i and k are discrete binary variables.
29. Solution: Approximate the mixture of Gaussians by a single Gaussian
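The standard way to do this is moment matching (the weak marginal of the next slide): keep the mixture's overall mean and covariance. A minimal numpy sketch:

```python
import numpy as np

def collapse(weights, mus, sigmas):
    """Replace a Gaussian mixture with the single Gaussian that has the
    same first and second moments."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = sum(wi * mi for wi, mi in zip(w, mus))
    sigma = sum(wi * (Si + np.outer(mi - mu, mi - mu))
                for wi, mi, Si in zip(w, mus, sigmas))
    return mu, sigma
```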
30. Multiplication and Marginalization
Strong marginal when marginalizing continuous variables.
Weak marginal when marginalizing discrete variables.
Multiplication:
- Convert all functions to canonical form if necessary.
- Extend all functions to the same variables.
- (g1, h1, K1) × (g2, h2, K2) = (g1 + g2, h1 + h2, K1 + K2)
31. Problem while using this marginalization in bucket elimination
- It requires computing Σ and μ, which is not possible due to the non-existence of the inverse.
- Solution: Use an ordering such that you never have to marginalize out discrete variables from a function that involves both discrete and continuous Gaussian variables.
- Special case: computing the marginal at a discrete node.
- Homework: derive a bucket elimination algorithm for computing the marginal of a continuous variable.
32. Special case: a marginal on a discrete variable in a CLG is to be computed
B, C and D are continuous variables; A and E are discrete.
(Figure: the network and the marginalization operator.)
33. Complexity of the special case
- Discrete width (wd): the maximum number of discrete variables in a clique.
- Continuous width (wc): the maximum number of continuous variables in a clique.
- Time: O(exp(wd) · wc³)
- Space: O(exp(wd) · wc³)
34. Algorithm for the general case: computing belief at a continuous node of a CLG
- Convert all functions to canonical form.
- Create a special tree decomposition.
- Assign functions to appropriate cliques (same as assigning functions to buckets).
- Select a strong root.
- Perform message passing.
35. Creating a special tree decomposition
- Moralize the Bayesian network.
- Select an ordering such that all continuous variables are ordered before discrete variables (this increases the induced width).
36. Elimination order
- Strong elimination order:
- First eliminate continuous variables.
- Eliminate a discrete variable only when no continuous variables remain.
W and X are discrete variables and Y and Z are continuous.
(Figure: the network over w, x, y, z; the moralized graph adds an edge.)
37. Elimination order (1)
(Figure: z, a 2-dimensional continuous variable, is eliminated first.)
38. Elimination order (2)
(Figure: y, also 2-dimensional continuous, is eliminated second.)
39. Elimination order (3)
(Figure: the first discrete variable is eliminated third.)
40. Elimination order (4)
(Figure: the last discrete variable is eliminated. The resulting cliques are Clique 1 = {w, y, z} and Clique 2 = {w, x, y}, with separator {w, y}.)
41. Bucket tree or junction tree (1)
(Figure: the junction tree: Clique 2 = {w, x, y} is the root, Clique 1 = {w, y, z} is the leaf, and the separator is {w, y}.)
42. Algorithm for the general case: computing belief at a continuous node of a CLG
- Convert all functions to canonical form.
- Create a special tree decomposition.
- Assign functions to appropriate cliques (same as assigning functions to buckets).
- Select a strong root.
- Perform message passing.
43. Assigning functions to cliques
- Select a function and place it in an arbitrary clique that mentions all the variables in the function.
44. Algorithm for the general case: computing belief at a continuous node of a CLG
- Convert all functions to canonical form.
- Create a special tree decomposition.
- Assign functions to appropriate cliques (same as assigning functions to buckets).
- Select a strong root.
- Perform message passing.
45. Strong Root
- We define a strong root as any node R in the bucket tree that satisfies the following property: for any pair (V, W) of neighbors on the tree with W closer to R than V, either V \ W contains only continuous variables, or V ∩ W contains only discrete variables.
46. Example: strong root
(Figure: a junction tree with the strong root highlighted.)
47. Algorithm for the general case: computing belief at a continuous node of a CLG
- Create a special tree decomposition.
- Assign functions to appropriate cliques (same as assigning functions to buckets).
- Select a strong root.
- Perform message passing.
48. Message passing at a typical node
- Node a contains the functions assigned to it according to the tree-decomposition scheme, denoted by p_j(a).
(Figure: a node with its incoming and outgoing messages.)
49. Message Passing
Two-pass algorithm: bucket-tree propagation.
(Figure from P. Green.)
50. Let's look at the messages: Collect Evidence
(Figure: messages flow from the leaves toward the strong root.)
51. Distribute Evidence
(Figure: messages flow from the strong root back toward the leaves.)
52. Lauritzen's theorem
- When you perform message passing such that collect-evidence involves only strong marginals and distribute-evidence may involve weak marginals, the junction tree algorithm is exact in the sense that:
- the first moment (mean) and second moment (variance) computed are the true moments.
53. Complexity
- Polynomial in the number of continuous variables in a clique (n³).
- Exponential in the number of discrete variables in a clique.
- Possible options for approximation:
- Ignore the strong-root assumption and use approximations like MBTE, IJGP, or sampling.
- Respect the strong-root assumption and use approximations like MBTE, IJGP, or sampling.
- Inaccuracies are due only to discrete variables if done in one pass of MBTE.
54. Initialization (1)
(Figure: the network over w, x, y, z with priors P(x0) = 0.4, P(x1) = 0.6 and P(w0) = P(w1) = 0.5; y and z are 2-dimensional continuous variables.)
55. Initialization (2)
Clique 1: wyz; Clique 2 (root): wxy; separator: wy.
The discrete priors in canonical form: x0: g = log(0.4); x1: g = log(0.6); w0: g = log(0.5); w1: g = log(0.5) (each with its h and K).
X = x0: g = -4.1245, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
X = x1: g = -3.0310, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
W = w0: g = -4.0629, h = [0.0889, -0.0111, -0.0556, 0.0556], K = …
W = w1: g = -2.7854, h = [0.0867, -0.0633, -0.1000, -0.1667], K = …
56. Initialization (3)
Clique 1: wyz; Clique 2 (root): wxy; separator wy: empty.
(w0,x0) and (w1,x0): g = -5.1308, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
(w0,x1) and (w1,x1): g = -3.5418, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
W = w0: g = -4.7560, h = …, K = …
W = w1: g = -3.4786, h = …, K = …
57. Message Passing
Clique 1: wyz; Clique 2 (root): wxy; separator wy: empty.
Collect evidence passes a message from Clique 1 to the root; distribute evidence passes one back.
58. Collect evidence (1)
(Figure: generic message computation on a tree: a clique over {y1, y2} marginalizes λ(y1, y2) down to λ(y2) and sends it to its neighbor over {y2, y3}.)
59. Collect evidence (2)
Marginalizing z out of the Clique 1 (wyz) functions gives the separator (wy) message:
W = w0: g = -4.7560, …; W = w1: g = -3.4786, …
  ↓ marginalization
W = w0: g = -0.6931, h ≈ 0, K ≈ 0 (entries of order 1e-16, i.e. numerically zero)
W = w1: g = -0.6931, h = [0, 0], K = [0 0; 0 0]
60. Collect evidence (3)
The separator message from the previous slide is multiplied into the Clique 2 (wxy) functions:
Before:
(w0,x0) and (w1,x0): g = -5.1308, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
(w0,x1) and (w1,x1): g = -3.5418, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
After multiplication:
(w0,x0) and (w1,x0): g = -5.8329, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
(w0,x1) and (w1,x1): g = -4.2350, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
61. Distribute evidence (1)
The separator message is divided out of the Clique 1 (wyz) functions:
W = w0: g = -4.7560, …; W = w1: g = -3.4786, …
  ÷ the separator message:
W = w0: g = -0.6931, h ≈ 0, K ≈ 0 (order 1e-16)
W = w1: g = -0.6931, h = [0, 0], K = [0 0; 0 0]
62. Distribute evidence (2)
After division:
W = w0: g = -4.0629, h = …, K = …
W = w1: g = -2.7854, h = …, K = …
63. Distribute evidence (3)
x is marginalized out of the Clique 2 (root) functions (a weak marginal, in moment form):
(w0,x0) and (w1,x0): g = -5.8329, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
(w0,x1) and (w1,x1): g = -4.2350, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
  ↓ marginalize over x
w0: log p = -0.6931, μ = [0.52, -0.12], Σ = …
w1: log p = -0.6931, μ = [0.52, -0.12], Σ = …
64. Distribute evidence (4)
The weak marginal is converted back to canonical form and multiplied into the Clique 1 functions:
w0: log p = -0.6931, μ = [0.52, -0.12], Σ = …  →  canonical form: g = -4.3316, h = [0.0927, -0.0096], K = …
w1: log p = -0.6931, μ = [0.52, -0.12], Σ = …  →  canonical form: g = -0.6931, h = [0.0927, -0.0096], K = …
Clique 1 after division: w0: g = -4.0629, …; w1: g = -2.7854, …
65. Distribute evidence (5)
After multiplication:
W = w0: g = -8.3935, h = …, K = …
W = w1: g = -7.1170, h = …, K = …
66. After Message Passing
Clique 1 now holds p(w, y, z), Clique 2 (root) holds p(w, x, y), and the separator holds p(w, y): the local marginal distributions.