Two Approximate Algorithms for Belief Updating


Transcript and Presenter's Notes

Title: Two Approximate Algorithms for Belief Updating


1
Two Approximate Algorithms for Belief Updating
  • Mini-Clustering - MC
  • Robert Mateescu, Rina Dechter, Kalev Kask. "Tree
    Approximation for Belief Updating", AAAI-2002
  • Iterative Join-Graph Propagation - IJGP
  • Rina Dechter, Kalev Kask and Robert Mateescu.
    "Iterative Join-Graph Propagation", UAI-2002

2
What is Mini-Clustering?
  • Mini-Clustering (MC) is an approximate algorithm
    for belief updating in Bayesian networks
  • MC is an anytime version of join-tree clustering
  • MC applies message passing along a cluster tree
  • The complexity of MC is controlled by a
    user-adjustable parameter, the i-bound
  • Empirical evaluation shows that MC is a very
    effective algorithm, in many cases superior to
    other approximate schemes (IBP, Gibbs Sampling)

3
Motivation
  • Probabilistic reasoning using belief networks is
    known to be NP-hard
  • Nevertheless, approximate inference can be a
    powerful tool for decision making under
    uncertainty
  • We propose an anytime version of Cluster Tree
    Elimination

4
Outline
  • Preliminaries
  • Belief networks
  • Tree decompositions
  • Tree Clustering algorithm
  • Mini-Clustering algorithm
  • Experimental results

5
Belief networks
  • The belief updating problem is the task of
    computing the posterior probability P(Y|e) of
    query nodes Y ⊆ X given evidence e. We focus on
    the basic case where Y is a single variable Xi
    (see the formulas below)
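Restated in standard Bayesian-network notation (a restatement for reference, not a formula taken from the slide): for full assignments x consistent with the evidence e,

    P(X_i \mid e) \;=\; \frac{P(X_i, e)}{P(e)},
    \qquad
    P(X_i, e) \;=\; \sum_{\{x \,:\, x_i = X_i,\; x \text{ consistent with } e\}} \;\prod_{k=1}^{n} P(x_k \mid pa_k)

where pa_k denotes the parents of variable X_k.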

6
Tree decompositions
7
Tree decompositions
[Figure: a belief network over variables A, B, C, D, E, F, G (left) and a
tree decomposition of it (right), with the following clusters and separators]
  • Cluster 1: {A,B,C}    functions p(a), p(b|a), p(c|a,b)
  •     separator {B,C}
  • Cluster 2: {B,C,D,F}  functions p(d|b), p(f|c,d)
  •     separator {B,F}
  • Cluster 3: {B,E,F}    function p(e|b,f)
  •     separator {E,F}
  • Cluster 4: {E,F,G}    function p(g|e,f)
Belief network (left), tree decomposition (right)
8
Example Join-tree
  • Cluster 1: {A,B,C}    functions p(a), p(b|a), p(c|a,b)
  •     separator {B,C}
  • Cluster 2: {B,C,D,F}  functions p(d|b), p(f|c,d)
  •     separator {B,F}
  • Cluster 3: {B,E,F}    function p(e|b,f)
  •     separator {E,F}
  • Cluster 4: {E,F,G}    function p(g|e,f)
  (rendered as a small data structure below)
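Purely as an illustration, the example join-tree can be written down as a small data structure; the layout below (dicts keyed by cluster number) is an assumption of this sketch, not something from the deck.

    # Illustrative encoding of the example tree decomposition (assumed layout).
    # Each cluster lists its variables and the CPTs assigned to it;
    # edges carry the separator variables shared by the two clusters.
    clusters = {
        1: {"vars": {"A", "B", "C"},      "cpts": ["p(a)", "p(b|a)", "p(c|a,b)"]},
        2: {"vars": {"B", "C", "D", "F"}, "cpts": ["p(d|b)", "p(f|c,d)"]},
        3: {"vars": {"B", "E", "F"},      "cpts": ["p(e|b,f)"]},
        4: {"vars": {"E", "F", "G"},      "cpts": ["p(g|e,f)"]},
    }
    edges = {(1, 2): {"B", "C"}, (2, 3): {"B", "F"}, (3, 4): {"E", "F"}}

    # Each separator is the intersection of the two adjacent clusters' variables.
    for (u, v), sep in edges.items():
        assert sep == clusters[u]["vars"] & clusters[v]["vars"]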
9
Cluster Tree Elimination
  • Cluster Tree Elimination (CTE) is an exact
    algorithm that works by passing messages along a
    tree decomposition
  • Basic idea
  • Each node sends only one message to each of its
    neighbors
  • Node u sends a message h(u,v) to its neighbor v
    only when u has received messages from all its
    other neighbors (h(u,v) is spelled out below)
  • Previous work on tree clustering
  • Lauritzen, Spiegelhalter - 88 (probabilities)
  • Jensen, Lauritzen, Olesen - 90 (probabilities)
  • Shenoy, Shafer - 90, Shenoy - 97 (general)
  • Dechter, Pearl - 89 (constraints)
  • Gottlob, Leone, Scarcello - 00 (constraints)
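For reference, the message h(u,v) mentioned above has the standard form used in the cluster-tree literature (restated here; it is not spelled out on this slide):

    h_{(u,v)} \;=\; \sum_{elim(u,v)} \;\; \prod_{f \,\in\, cluster_v(u)} f

    cluster_v(u) \;=\; \psi(u) \,\cup\, \{\, h_{(w,u)} \;:\; w \text{ a neighbor of } u,\; w \neq v \,\}
    elim(u,v)    \;=\; \chi(u) \setminus sep(u,v)

where ψ(u) is the set of functions (CPTs) placed in cluster u and χ(u) is its set of variables.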

10
Cluster Tree Elimination
  • Cluster Tree Elimination (CTE) is an exact
    algorithm
  • It works by passing messages along a tree
    decomposition
  • Basic idea
  • Each node sends only one message to each of its
    neighbors
  • Node u sends a message to its neighbor v only
    when u has received messages from all its other
    neighbors

11
Cluster Tree Elimination
  • Previous work on tree clustering
  • Lauritzen, Spiegelhalter - 88 (probabilities)
  • Jensen, Lauritzen, Olesen - 90 (probabilities)
  • Shenoy, Shafer - 90, Shenoy - 97 (general)
  • Dechter, Pearl - 89 (constraints)
  • Gottlob, Leone, Scarcello - 00 (constraints)

12
Belief Propagation
[Figure: node u, with neighbors x1, x2, ..., xn and v, sends the message
h(u,v) to its neighbor v]
13
Belief Propagation
[Figure: node u, with neighbors x1, x2, ..., xn and v, sends the message
h(u,v) to its neighbor v]
14
Cluster Tree Elimination - example
[Figure: the tree decomposition used in the example - a chain of clusters
1:{A,B,C} - 2:{B,C,D,F} - 3:{B,E,F} - 4:{E,F,G} with separators {B,C},
{B,F}, {E,F}]
15
Cluster Tree Elimination - the messages
  • Cluster 1: {A,B,C}    p(a), p(b|a), p(c|a,b)
  •     message h(1,2)(b,c) over separator {B,C}
  • Cluster 2: {B,C,D,F}  p(d|b), p(f|c,d), h(1,2)(b,c)
  •     sep(2,3) = {B,F},  elim(2,3) = {C,D}
  •     message h(2,3)(b,f) over separator {B,F}
        (both messages are written out below)
  • Cluster 3: {B,E,F}    p(e|b,f), h(2,3)(b,f)
  •     separator {E,F}
  • Cluster 4: {E,F,G}    p(g|e,f)
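Written out from the cluster contents above, the two downward messages are:

    h_{(1,2)}(b,c) \;=\; \sum_{a} p(a)\, p(b \mid a)\, p(c \mid a,b)
    h_{(2,3)}(b,f) \;=\; \sum_{c,d} p(d \mid b)\, p(f \mid c,d)\, h_{(1,2)}(b,c)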
16
Cluster Tree Elimination - properties
  • Correctness and completeness: Algorithm CTE is
    correct, i.e. it computes the exact joint
    probability of a single variable and the
    evidence.
  • Time complexity: O( deg × (n+N) × d^(w+1) )
  • Space complexity: O( N × d^sep )
  • where deg = the maximum degree of a node
  • n = number of variables (= number of CPTs)
  • N = number of nodes in the tree decomposition
  • d = the maximum domain size of a variable
  • w = the induced width
  • sep = the separator size

17
Mini-Clustering - motivation
  • Time and space complexity of Cluster Tree
    Elimination depend on the induced width w of the
    problem
  • When the induced width w is big, the CTE
    algorithm becomes infeasible

18
Mini-Clustering - the basic idea
  • Try to reduce the size of the cluster (the
    exponent): partition each cluster into
    mini-clusters with fewer variables
  • Accuracy parameter i = maximum number of
    variables in a mini-cluster
  • The idea was explored for variable elimination
    (Mini-Bucket)

19
Mini-Clustering
  • Motivation
  • Time and space complexity of Cluster Tree
    Elimination depend on the induced width w of the
    problem
  • When the induced width w is big, the CTE
    algorithm becomes infeasible
  • The basic idea
  • Try to reduce the size of the cluster (the
    exponent): partition each cluster into
    mini-clusters with fewer variables
  • Accuracy parameter i = maximum number of
    variables in a mini-cluster
  • The idea was explored for variable elimination
    (Mini-Bucket)

20
Mini-Clustering
  • Suppose cluster(u) is partitioned into p
    mini-clusters mc(1),…,mc(p), each containing at
    most i variables
  • TC computes the exact message (see below)
  • We want to process each ∏_{f∈mc(k)} f separately
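The exact message referred to above is the CTE message from before; written with the mini-cluster partition, it factors as:

    h_{(u,v)} \;=\; \sum_{elim(u,v)} \; \prod_{k=1}^{p} g_k,
    \qquad g_k \;=\; \prod_{f \in mc(k)} f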

21
Mini-Clustering
  • Approximate each ∏_{f∈mc(k)} f, k = 2,…,p, and
    take it outside the summation (spelled out below)
  • How to process the mini-clusters to obtain
    approximations or bounds:
  • Process all mini-clusters by summation - this
    gives an upper bound on the joint probability
  • A tighter upper bound: process one mini-cluster
    by summation and the others by maximization
  • Can also use the mean operator (average) - this
    gives an approximation of the joint probability
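Writing g_k for the product of the functions in mini-cluster mc(k), the summation/maximization option corresponds to the mini-bucket style bound (restated here):

    \sum_{elim(u,v)} \prod_{k=1}^{p} g_k
    \;\le\;
    \Bigl( \sum_{elim(u,v)} g_1 \Bigr) \cdot \prod_{k=2}^{p} \Bigl( \max_{elim(u,v)} g_k \Bigr)

Replacing max by the mean operator gives an approximation rather than a guaranteed bound.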

22
Idea of Mini-Clustering
Split a cluster into mini-clusters ⇒ bound the complexity
23
Mini-Clustering - example
[Figure: the same chain tree decomposition 1:{A,B,C} - 2:{B,C,D,F} -
3:{B,E,F} - 4:{E,F,G}, now processed by Mini-Clustering]
24
Mini-Clustering - the messages, i = 3
  • Cluster 1: {A,B,C}    p(a), p(b|a), p(c|a,b)
  •     message h(1,2)(b,c) over separator {B,C}
  • Cluster 2: {B,C,D,F}  split into mini-clusters
        {B,C,D}: p(d|b), h(1,2)(b,c)  and  {C,D,F}: p(f|c,d)
  •     sep(2,3) = {B,F},  elim(2,3) = {C,D}
  •     messages h1(2,3)(b) and h2(2,3)(f) over separator {B,F}
        (both are written out below)
  • Cluster 3: {B,E,F}    p(e|b,f), h1(2,3)(b), h2(2,3)(f)
  •     separator {E,F}
  • Cluster 4: {E,F,G}    p(g|e,f)
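Written out, one way the two cluster-2 messages above can be computed, assuming for illustration that the first mini-cluster is processed by summation and the second by maximization:

    h^{1}_{(2,3)}(b) \;=\; \sum_{c,d} p(d \mid b)\, h_{(1,2)}(b,c)
    h^{2}_{(2,3)}(f) \;=\; \max_{c,d}\; p(f \mid c,d)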
25
Cluster Tree Elimination vs. Mini-Clustering
[Figure: the example tree decomposition 1:{A,B,C} - 2:{B,C,D,F} - 3:{B,E,F} -
4:{E,F,G} shown twice, side by side, comparing the messages computed by
Cluster Tree Elimination (left) with those computed by Mini-Clustering (right)]
26
Mini-Clustering
  • Correctness and completeness: Algorithm MC(i)
    computes a bound (or an approximation) on the
    joint probability P(Xi,e) of each variable and
    each of its values.
  • Time and space complexity: O( n × hw* × d^i )
  • where hw* = max_u | { f : scope(f) ∩ χ(u) ≠ ∅ } |

27
Normalization
  • Algorithms for the belief updating problem
    compute, in general, the joint probability
    P(Xi,e)
  • Computing the conditional probability P(Xi|e)
  • is easy to do if exact algorithms can be applied
  • becomes an important issue for approximate
    algorithms

28
Normalization
  • MC can compute an (upper) bound on
    the joint P(Xi,e)
  • Deriving a bound on the conditional P(Xi|e) is
    not easy when the exact P(e) is not available
  • If a lower bound on P(e) were available, we
    could use the ratio of the two bounds as an
    upper bound on the posterior (see below)
  • In our experiments we normalized the results and
    regarded them as approximations of the posterior
    P(Xi|e)
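In symbols, writing UB(Xi,e) for the MC upper bound on the joint and LB(e) for a lower bound on the probability of evidence (symbols introduced here for illustration, not the paper's notation):

    P(X_i \mid e) \;=\; \frac{P(X_i, e)}{P(e)} \;\le\; \frac{UB(X_i, e)}{LB(e)}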

29
Experimental results
  • We tested MC with max and mean operators
  • Algorithms
  • Exact
  • IBP
  • Gibbs sampling (GS)
  • MC with normalization (approximate)
  • Networks (all variables are binary)
  • Coding networks
  • CPCS 54, 360, 422
  • Grid networks (MxM)
  • Random noisy-OR networks
  • Random networks

30
Experimental results
  • Measures (the error measures are sketched in code after this slide)
  • Normalized Hamming Distance
  • pick most likely value (for exact and for
    approximate)
  • take ratio between number of disagreements and
    total number of variables
  • average over problems
  • BER (Bit Error Rate) - for coding networks
  • Absolute error
  • difference between exact and the approximate,
    averaged over all values, all variables, all
    problems
  • Relative error
  • difference between exact and the approximate,
    divided by the exact, averaged over all values,
    all variables, all problems
  • Time
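A minimal sketch of how these accuracy measures could be computed for one problem instance; the data layout and function names are illustrative assumptions, not code from the paper.

    # exact[v][x] and approx[v][x] hold the exact and approximate marginals
    # P(X_v = x | e); both are assumed to be normalized distributions.

    def absolute_error(exact, approx):
        # |exact - approximate|, averaged over all values of all variables
        diffs = [abs(exact[v][x] - approx[v][x]) for v in exact for x in exact[v]]
        return sum(diffs) / len(diffs)

    def relative_error(exact, approx):
        # |exact - approximate| / exact, averaged over all values of all variables
        diffs = [abs(exact[v][x] - approx[v][x]) / exact[v][x]
                 for v in exact for x in exact[v] if exact[v][x] > 0]
        return sum(diffs) / len(diffs)

    def normalized_hamming_distance(exact, approx):
        # fraction of variables whose most likely value differs
        disagree = sum(max(exact[v], key=exact[v].get) != max(approx[v], key=approx[v].get)
                       for v in exact)
        return disagree / len(exact)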

31
Experimental results
We tested MC with max and mean operators
  • Algorithms
  • Exact
  • IBP
  • Gibbs sampling (GS)
  • MC with normalization (approximate)
  • Networks (all variables are binary)
  • Coding networks
  • CPCS 54, 360, 422
  • Grid networks (MxM)
  • Random noisy-OR networks
  • Random networks
  • Measures
  • Normalized Hamming Distance (NHD)
  • BER (Bit Error Rate)
  • Absolute error
  • Relative error
  • Time

32
Random networks - Absolute error
[Charts: evidence = 0 and evidence = 10]
33
Coding networks - Bit Error Rate
[Charts: channel noise σ = 0.22 and σ = 0.51]
34
Noisy-OR networks - Absolute error
[Charts: evidence = 10 and evidence = 20]
35
CPCS422 - Absolute error
[Charts: evidence = 0 and evidence = 10]
36
Grid 15x15 - 0 evidence
37
Grid 15x15 - 10 evidence
38
Grid 15x15 - 20 evidence
39
Coding Networks 1
N = 100, P = 3, w = 7
40
Coding Networks 2
N = 100, P = 4, w = 11
41
CPCS54 - w = 15
42
Noisy-OR Networks 1
N = 50, P = 2, w = 10
43
Noisy-OR Networks 2
N = 50, P = 3, w = 16
44
Random Networks 1
N = 50, P = 2, w = 10
45
Random Networks 2
N = 50, P = 3, w = 16
46
Conclusion
  • MC extends the partition-based approximation from
    mini-buckets to general tree decompositions for
    the problem of belief updating
  • Empirical evaluation demonstrates its
    effectiveness and superiority (for certain types
    of problems, with respect to the measures
    considered) relative to other existing algorithms