Title: Two Approximate Algorithms for Belief Updating
1. Two Approximate Algorithms for Belief Updating
- Mini-Clustering (MC)
  - Robert Mateescu, Rina Dechter, Kalev Kask. "Tree Approximation for Belief Updating", AAAI-2002
- Iterative Join-Graph Propagation (IJGP)
  - Rina Dechter, Kalev Kask, Robert Mateescu. "Iterative Join-Graph Propagation", UAI-2002
2. What is Mini-Clustering?
- Mini-Clustering (MC) is an approximate algorithm for belief updating in Bayesian networks
- MC is an anytime version of join-tree clustering
- MC applies message passing along a cluster tree
- The complexity of MC is controlled by a user-adjustable parameter, the i-bound
- Empirical evaluation shows that MC is a very effective algorithm, in many cases superior to other approximate schemes (IBP, Gibbs sampling)
3. Motivation
- Probabilistic reasoning using belief networks is known to be NP-hard
- Nevertheless, approximate inference can be a powerful tool for decision making under uncertainty
- We propose an anytime version of Cluster Tree Elimination
4. Outline
- Preliminaries
- Belief networks
- Tree decompositions
- Tree Clustering algorithm
- Mini-Clustering algorithm
- Experimental results
5. Belief networks
- The belief updating problem is the task of computing the posterior probability P(Y|e) of query nodes Y ⊆ X given evidence e. We focus on the basic case where Y is a single variable Xi.
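Spelled out for a single query variable (standard probability algebra, added here only for reference), belief updating amounts to:

```latex
% Posterior of X_i given evidence e: the joint, normalized by P(e).
P(x_i \mid e) = \frac{P(x_i, e)}{P(e)}
             = \frac{P(x_i, e)}{\sum_{x_i'} P(x_i', e)},
\quad\text{where}\quad
P(x_i, e) = \sum_{X \setminus \{X_i\}} \prod_{j=1}^{n} P(x_j \mid pa_j)\,\Big|_{E=e}
```

Here pa_j denotes the parents of X_j in the network, and the product of CPTs is restricted to assignments consistent with the evidence e.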
7. Tree decompositions
[Figure: a belief network over variables A, B, C, D, E, F, G, next to a tree decomposition of it.]
- Cluster {A,B,C}: p(a), p(b|a), p(c|a,b)
  - separator {B,C}
- Cluster {B,C,D,F}: p(d|b), p(f|c,d)
  - separator {B,F}
- Cluster {B,E,F}: p(e|b,f)
  - separator {E,F}
- Cluster {E,F,G}: p(g|e,f)
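For concreteness, here is a minimal Python sketch of this decomposition as a data structure; the representation and names are mine, not the paper's:

```python
# A minimal sketch of the decomposition above as a data structure.
# Cluster/edge/separator names are illustrative, not from the paper.

clusters = {
    1: {"vars": {"A", "B", "C"}, "factors": ["p(a)", "p(b|a)", "p(c|a,b)"]},
    2: {"vars": {"B", "C", "D", "F"}, "factors": ["p(d|b)", "p(f|c,d)"]},
    3: {"vars": {"B", "E", "F"}, "factors": ["p(e|b,f)"]},
    4: {"vars": {"E", "F", "G"}, "factors": ["p(g|e,f)"]},
}

# Tree edges; each separator is the intersection of the adjacent clusters.
edges = [(1, 2), (2, 3), (3, 4)]
separators = {(u, v): clusters[u]["vars"] & clusters[v]["vars"] for u, v in edges}

# Matches the separators in the figure: {B,C}, {B,F}, {E,F}.
assert separators[(1, 2)] == {"B", "C"}
assert separators[(2, 3)] == {"B", "F"}
assert separators[(3, 4)] == {"E", "F"}
```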
8. Example Join-tree
[Figure: the same four clusters {A,B,C}, {B,C,D,F}, {B,E,F}, {E,F,G} with separators {B,C}, {B,F}, {E,F}, drawn as a join-tree with their CPTs.]
9. Cluster Tree Elimination
- Cluster Tree Elimination (CTE) is an exact algorithm that works by passing messages along a tree decomposition
- Basic idea:
  - Each node sends only one message to each of its neighbors
  - Node u sends a message to its neighbor v only when u has received messages from all its other neighbors
- Previous work on tree clustering:
  - Lauritzen, Spiegelhalter - 88 (probabilities)
  - Jensen, Lauritzen, Olesen - 90 (probabilities)
  - Shenoy, Shafer - 90, Shenoy - 97 (general)
  - Dechter, Pearl - 89 (constraints)
  - Gottlob, Leone, Scarcello - 00 (constraints)
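A compact Python sketch of the CTE message step over tabular factors (binary variables, hypothetical helper names; this illustrates the update, it is not the paper's implementation):

```python
from itertools import product

# Tabular factor: (scope, table) where scope is a tuple of variable names
# and table maps a tuple of 0/1 values (binary variables) to a float.

def cte_message(factors, sep_vars):
    """Multiply all functions of a cluster (including incoming messages
    from the other neighbors) and sum out every variable not in the
    separator: h_(u,v) = sum_{elim(u,v)} prod_f f. Illustrative sketch."""
    all_vars = sorted({v for scope, _ in factors for v in scope})
    keep = [v for v in all_vars if v in sep_vars]
    message = {}
    for assignment in product([0, 1], repeat=len(all_vars)):
        env = dict(zip(all_vars, assignment))
        value = 1.0
        for scope, table in factors:
            value *= table[tuple(env[v] for v in scope)]
        key = tuple(env[v] for v in keep)
        message[key] = message.get(key, 0.0) + value
    return keep, message

# Shape of a call for cluster 3 = {B,E,F}: combine p(e|b,f) with the
# incoming message h_(2,3)(b,f) and sum out B to get h_(3,4)(e,f).
# (The numeric tables here are placeholders, not real CPTs.)
p_e_bf = (("B", "E", "F"), {k: 0.5 for k in product([0, 1], repeat=3)})
h_23 = (("B", "F"), {k: 0.25 for k in product([0, 1], repeat=2)})
scope_34, h_34 = cte_message([p_e_bf, h_23], sep_vars={"E", "F"})
```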
12. Belief Propagation
[Figure: a node u with neighbors x1, x2, ..., xn; u computes the message h(u,v) and sends it to its neighbor v.]
14. Cluster Tree Elimination - example
[Figure: a chain cluster tree: cluster 1 = ABC, separator BC, cluster 2 = BCDF, separator BF, cluster 3 = BEF, separator EF, cluster 4 = EFG.]
15. Cluster Tree Elimination - the messages
- Cluster 1 = {A,B,C}, with p(a), p(b|a), p(c|a,b), sends h_(1,2)(b,c) to cluster 2
- Cluster 2 = {B,C,D,F}, with p(d|b), p(f|c,d); sep(2,3) = {B,F}, elim(2,3) = {C,D}; sends h_(2,3)(b,f) to cluster 3
- Cluster 3 = {B,E,F}, with p(e|b,f)
- Cluster 4 = {E,F,G}, with p(g|e,f)
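Reconstructed with the standard CTE message definition (sum out elim(u,v) from the product of the cluster's own functions and its other incoming messages), the downward messages of this example read:

```latex
h_{(1,2)}(b,c) = \sum_{a} p(a)\, p(b \mid a)\, p(c \mid a,b)
h_{(2,3)}(b,f) = \sum_{c,d} p(d \mid b)\, p(f \mid c,d)\, h_{(1,2)}(b,c)
h_{(3,4)}(e,f) = \sum_{b} p(e \mid b,f)\, h_{(2,3)}(b,f)
```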
16. Cluster Tree Elimination - properties
- Correctness and completeness: algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.
- Time complexity: O(deg · (n + N) · d^(w*+1))
- Space complexity: O(N · d^sep)
- where:
  - deg = the maximum degree of a node
  - n = number of variables (= number of CPTs)
  - N = number of nodes in the tree decomposition
  - d = the maximum domain size of a variable
  - w* = the induced width
  - sep = the separator size
17. Mini-Clustering - motivation
- The time and space complexity of Cluster Tree Elimination depends on the induced width w* of the problem
- When the induced width w* is big, the CTE algorithm becomes infeasible
18. Mini-Clustering - the basic idea
- Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables
- Accuracy parameter: i = the maximum number of variables in a mini-cluster
- The idea was explored for variable elimination (Mini-Buckets)
20. Mini-Clustering
- Suppose cluster(u) is partitioned into p mini-clusters mc(1), ..., mc(p), each containing at most i variables
- TC computes the exact message h_{(u,v)} = \sum_{elim(u,v)} \prod_{f \in cluster(u)} f
- We want to process each \prod_{f \in mc(k)} f separately
21. Mini-Clustering
- Approximate each \prod_{f \in mc(k)} f, for k = 2, ..., p, and take it outside the summation
- How to process the mini-clusters to obtain approximations or bounds (see the inequality below):
  - Process all mini-clusters by summation - this gives an upper bound on the joint probability
  - A tighter upper bound: process one mini-cluster by summation and the others by maximization
  - Can also use the mean operator (average) - this gives an approximation of the joint probability
22. Idea of Mini-Clustering
Split a cluster into mini-clusters ⇒ bound complexity
23. Mini-Clustering - example
[Figure: the same chain cluster tree as in slide 14: cluster 1 = ABC, separator BC, cluster 2 = BCDF, separator BF, cluster 3 = BEF, separator EF, cluster 4 = EFG.]
24. Mini-Clustering - the messages, i = 3
- Cluster 1 = {A,B,C}, with p(a), p(b|a), p(c|a,b), sends h_(1,2)(b,c)
- Cluster 2 = {B,C,D,F} is split into mini-clusters {B,C,D}, with p(d|b) and h_(1,2)(b,c), and {C,D,F}, with p(f|c,d); sep(2,3) = {B,F}, elim(2,3) = {C,D}; it sends the two messages h^1_(2,3)(b) and h^2_(2,3)(f)
- Cluster 3 = {B,E,F}, with p(e|b,f)
- Cluster 4 = {E,F,G}, with p(g|e,f)
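Under the same hypothetical factor representation used earlier, this i = 3 step could look like the following sketch: the first mini-cluster is processed by summation and the second by maximization, yielding the pair h^1_(2,3)(b), h^2_(2,3)(f) in place of the exact h_(2,3)(b,f):

```python
from itertools import product

def eliminate(factors, keep_vars, op):
    """Multiply the factors of one mini-cluster, then eliminate every other
    variable with `op` (sum for the first mini-cluster, max for the rest).
    Illustrative sketch only."""
    all_vars = sorted({v for scope, _ in factors for v in scope})
    keep = [v for v in all_vars if v in keep_vars]
    out = {}
    for assignment in product([0, 1], repeat=len(all_vars)):
        env = dict(zip(all_vars, assignment))
        value = 1.0
        for scope, table in factors:
            value *= table[tuple(env[v] for v in scope)]
        key = tuple(env[v] for v in keep)
        out[key] = value if key not in out else op(out[key], value)
    return out

# Mini-cluster {B,C,D}: p(d|b) and h_(1,2)(b,c), summed out -> h1_(2,3)(b).
# Mini-cluster {C,D,F}: p(f|c,d), maximized out -> h2_(2,3)(f).
# Numeric tables are placeholders, not real CPTs.
p_d_b = (("B", "D"), {k: 0.5 for k in product([0, 1], repeat=2)})
h_12 = (("B", "C"), {k: 0.25 for k in product([0, 1], repeat=2)})
p_f_cd = (("C", "D", "F"), {k: 0.5 for k in product([0, 1], repeat=3)})

h1 = eliminate([p_d_b, h_12], keep_vars={"B"}, op=lambda a, b: a + b)
h2 = eliminate([p_f_cd], keep_vars={"F"}, op=max)
```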
25. Cluster Tree Elimination vs. Mini-Clustering
[Figure: the cluster tree (ABC - BC - BCDF - BF - BEF - EF - EFG) drawn twice, side by side, contrasting the single exact message CTE sends on each edge with the set of smaller mini-cluster messages MC sends.]
26. Mini-Clustering - properties
- Correctness and completeness: algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi,e) of each variable and each of its values.
- Time and space complexity: O(n · hw* · d^i)
  - where hw* = max_u |{f | scope(f) ∩ χ(u) ≠ ∅}|, the maximum number of functions intersecting any single cluster
27. Normalization
- Algorithms for the belief updating problem compute, in general, the joint probability P(Xi,e)
- Computing the conditional probability P(Xi|e):
  - is easy to do if exact algorithms can be applied
  - becomes an important issue for approximate algorithms
28. Normalization
- MC can compute an (upper) bound on the joint probability P(Xi,e)
- Deriving a bound on the conditional P(Xi|e) is not easy when the exact P(e) is not available
- If a lower bound on P(e) were available, we could use the ratio of the upper bound on P(Xi,e) to the lower bound on P(e) as an upper bound on the posterior
- In our experiments we normalized the results and regarded them as approximations of the posterior P(Xi|e)
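In symbols (a restatement of the two points above, not a new result):

```latex
% Normalized MC output, used as an approximation of the posterior:
\hat{P}(x_i \mid e) = \frac{\hat{P}(x_i, e)}{\sum_{x_i'} \hat{P}(x_i', e)}
% Given an upper bound \overline{P}(x_i,e) \ge P(x_i,e)
% and a lower bound \underline{P}(e) \le P(e):
P(x_i \mid e) \;\le\; \frac{\overline{P}(x_i, e)}{\underline{P}(e)}
```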
29. Experimental results
- We tested MC with max and mean operators
- Algorithms
- Exact
- IBP
- Gibbs sampling (GS)
- MC with normalization (approximate)
- Networks (all variables are binary)
- Coding networks
- CPCS 54, 360, 422
- Grid networks (MxM)
- Random noisy-OR networks
- Random networks
30. Experimental results
- Measures:
  - Normalized Hamming Distance (NHD)
    - pick the most likely value (for exact and for approximate)
    - take the ratio between the number of disagreements and the total number of variables
    - average over problems
  - BER (Bit Error Rate) - for coding networks
  - Absolute error
    - difference between the exact and the approximate, averaged over all values, all variables, all problems
  - Relative error
    - difference between the exact and the approximate, divided by the exact, averaged over all values, all variables, all problems
  - Time
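As an illustration of how these measures could be computed for one problem instance (a hypothetical helper; exact and approx hold per-variable posterior vectors):

```python
import numpy as np

def error_measures(exact, approx):
    """Error measures for a single problem instance.
    exact, approx: arrays of shape (num_variables, num_values) with the
    exact and approximate posteriors P(Xi|e). Illustrative sketch; assumes
    all exact entries are nonzero for the relative error."""
    exact, approx = np.asarray(exact), np.asarray(approx)
    # NHD: fraction of variables whose most likely value disagrees.
    nhd = np.mean(exact.argmax(axis=1) != approx.argmax(axis=1))
    # Absolute error, averaged over all values and all variables.
    abs_err = np.mean(np.abs(exact - approx))
    # Relative error: absolute difference divided by the exact value.
    rel_err = np.mean(np.abs(exact - approx) / exact)
    return nhd, abs_err, rel_err

# Two binary variables, made-up numbers:
print(error_measures([[0.9, 0.1], [0.4, 0.6]],
                     [[0.8, 0.2], [0.55, 0.45]]))
```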
32. Random networks - Absolute error
[Charts for evidence = 0 and evidence = 10]
33. Coding networks - Bit Error Rate
[Charts for σ = 0.22 and σ = 0.51]
34. Noisy-OR networks - Absolute error
[Charts for evidence = 10 and evidence = 20]
35. CPCS422 - Absolute error
[Charts for evidence = 0 and evidence = 10]
36. Grid 15x15 - 0 evidence [chart]
37. Grid 15x15 - 10 evidence [chart]
38. Grid 15x15 - 20 evidence [chart]
39. Coding Networks 1 [results for N = 100, P = 3, w* = 7]
40. Coding Networks 2 [results for N = 100, P = 4, w* = 11]
41. CPCS54 [results for w* = 15]
42. Noisy-OR Networks 1 [results for N = 50, P = 2, w* = 10]
43. Noisy-OR Networks 2 [results for N = 50, P = 3, w* = 16]
44. Random Networks 1 [results for N = 50, P = 2, w* = 10]
45. Random Networks 2 [results for N = 50, P = 3, w* = 16]
46. Conclusion
- MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating
- Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms