Approximation Techniques: bounded inference

1
Approximation Techniques: bounded inference
  • 275b

2
Mini-buckets: local inference
  • The idea is similar to i-consistency: bound the size of recorded dependencies
  • Computation in a bucket is time and space exponential in the number of variables involved
  • Therefore, partition the functions in a bucket into mini-buckets over smaller numbers of variables

3
Mini-bucket approximation: MPE task
Split a bucket into mini-buckets => bound complexity
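As an illustration only, here is a minimal Python sketch of the partitioning step; the greedy placement strategy and the (scope, table) representation of functions are assumptions made for this example, not the exact procedure from the slides:

    def partition_into_mini_buckets(bucket_functions, i_bound):
        """Greedily place each function (scope, table) into the first
        mini-bucket whose combined scope stays within i_bound variables."""
        mini_buckets = []  # each entry: {'scope': set of vars, 'functions': list}
        for scope, table in bucket_functions:
            for mb in mini_buckets:
                if len(mb['scope'] | set(scope)) <= i_bound:
                    mb['scope'] |= set(scope)
                    mb['functions'].append((scope, table))
                    break
            else:  # no existing mini-bucket can absorb this function
                mini_buckets.append({'scope': set(scope),
                                     'functions': [(scope, table)]})
        return mini_buckets

Each mini-bucket is then eliminated separately (max over the bucket variable for MPE), and the product of the resulting smaller functions upper-bounds the exact bucket message.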
4
Approx-mpe(i)
  • Input: i, the max number of variables allowed in a mini-bucket
  • Output: a lower bound (the probability of a sub-optimal solution) and an upper bound

Example: approx-mpe(3) versus elim-mpe
5
Properties of approx-mpe(i)
  • Complexity: O(exp(2i)) time and O(exp(i)) space.
  • Accuracy: determined by the upper/lower (U/L) bound.
  • As i increases, both accuracy and complexity increase.
  • Possible uses of mini-bucket approximations:
  • As anytime algorithms (Dechter and Rish, 1997)
  • As heuristics in best-first search (Kask and Dechter, 1999)
  • Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter and Rish, 1997)

6
Anytime Approximation
7
Bounded elimination for belief updating
  • Idea: the mini-bucket partitioning is the same
  • We can apply a sum in each mini-bucket, or better, a sum in one mini-bucket and max in the rest, or min in the rest (for a lower bound); see the numeric illustration below
  • Approx-bel-max(i,m), generating upper and lower bounds on beliefs, approximates elim-bel
  • Approx-map(i,m): max buckets are maximized, sum buckets are processed sum-max. Approximates elim-map.
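A tiny made-up numeric illustration (Python) of why summing in one mini-bucket and taking max or min of the other bounds the exact sum; the numbers are arbitrary:

    # Hypothetical bucket variable X with three values; non-negative functions
    # f and g were placed in different mini-buckets.
    f = [0.2, 0.5, 0.3]
    g = [0.9, 0.1, 0.4]

    exact = sum(fx * gx for fx, gx in zip(f, g))  # what elim-bel would compute
    upper = sum(f) * max(g)   # sum in one mini-bucket, max in the other
    lower = sum(f) * min(g)   # sum in one mini-bucket, min in the other
    assert lower <= exact <= upper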

8
Empirical Evaluation (Dechter and Rish, 1997; Rish thesis, 1999)
  • Randomly generated networks
  • Uniform random probabilities
  • Random noisy-OR
  • CPCS networks
  • Probabilistic decoding
  • Comparing approx-mpe and anytime-mpe versus elim-mpe

9
Random networks
  • Uniform random: 60 nodes, 90 edges (200 instances)
  • In 80% of cases, a 10-100 times speed-up while U/L < 2
  • Noisy-OR: even better results
  • Exact elim-mpe was infeasible; approx-mpe took 0.1 to 80 sec.

10
CPCS networks: medical diagnosis (noisy-OR model)
Test case: no evidence
11
The effect of evidence
More likely evidence => higher MPE => higher accuracy (why?)
Likely evidence versus random (unlikely) evidence
12
Probabilistic decoding
Error-correcting linear block code
State-of-the-art approximate algorithm: iterative belief propagation (IBP) (Pearl's poly-tree algorithm applied to loopy networks)
13
Iterative Belief Propagation
  • Belief propagation is exact for poly-trees
  • IBP - applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks

14
approx-mpe vs. IBP
Bit error rate (BER) as a function of noise
(sigma)
15
Mini-buckets summary
  • Mini-buckets: a local inference approximation
  • Idea: bound the size of recorded functions
  • Approx-mpe(i) - the mini-bucket algorithm for MPE
  • Better results for noisy-OR than for random problems
  • Accuracy increases with decreasing noise in coding networks
  • Accuracy increases for likely evidence
  • Sparser graphs -> higher accuracy
  • Coding networks: approx-mpe outperforms IBP on low induced-width codes

16
Heuristic search
  • Mini-buckets record upper-bound heuristics
  • The evaluation function over a partial assignment estimates the probability of its best extension
  • Best-first: expand the node with the maximal evaluation function
  • Branch and Bound: prune a node when its upper-bound evaluation function cannot improve on the best solution found so far (a sketch follows this list)
  • Properties:
  • an exact algorithm
  • Better heuristics lead to more pruning
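A minimal depth-first Branch-and-Bound sketch in Python, assuming a hypothetical helper f_upper that returns the mini-bucket evaluation function (an upper bound on any extension of a partial assignment, and the exact value on a complete one):

    def branch_and_bound(variables, domains, f_upper, best=0.0):
        """Maximize the MPE value: prune a partial assignment whenever its
        upper bound f_upper cannot improve on the incumbent solution."""
        def dfs(assignment, best):
            if len(assignment) == len(variables):
                return max(best, f_upper(assignment))  # complete assignment: exact value
            var = variables[len(assignment)]
            for value in domains[var]:
                child = assignment + [(var, value)]
                if f_upper(child) <= best:  # bound no better than incumbent: prune
                    continue
                best = dfs(child, best)
            return best
        return dfs([], best)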

17
Heuristic Function
Given a cost function:
P(a,b,c,d,e) = P(a) * P(b|a) * P(c|a) * P(e|b,c) * P(d|a,b)
Define an evaluation function over a partial
assignment as the probability of its best
extension
[Figure: search tree over partial assignments (values 0/1)]
f(a,e,d) = max_{b,c} P(a,b,c,d,e)
         = P(a) * max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
         = g(a,e,d) * H(a,e,d)
18
Heuristic Function
H(a,e,d) = max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
         = max_c P(c|a) * max_b P(e|b,c) P(b|a) P(d|a,b)
        <= [ max_c P(c|a) * max_b P(e|b,c) ] * [ max_b P(b|a) P(d|a,b) ]
The last expression is the mini-bucket heuristic H(a,e,d); with it,
f(a,e,d) = g(a,e,d) * H(a,e,d) >= the probability of the best extension of (a,e,d).
The heuristic function H is compiled during the preprocessing stage of the Mini-Bucket algorithm.
19
Heuristic Function
The evaluation function f(x^p) can be computed using functions recorded by the Mini-Bucket scheme, and can be used to estimate the probability of the best extension of the partial assignment x^p = (x_1, ..., x_p):
f(x^p) = g(x^p) * H(x^p)
For example, with buckets processed in the order B, C, D, E, A:
bucket B (two mini-buckets):  max_B P(e|b,c)  and  max_B P(b|a) P(d|a,b)  ->  h^B(e,c), h^B(d,a)
bucket C:  max_C P(c|a) h^B(e,c)  ->  h^C(e,a)
bucket D:  max_D h^B(d,a)  ->  h^D(a)
bucket E:  max_E h^C(e,a)  ->  h^E(a)
bucket A:  max_A P(a) h^E(a) h^D(a)
H(a,e,d) = h^B(d,a) * h^C(e,a)
g(a,e,d) = P(a)
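Continuing the example, a hypothetical Python snippet showing how a search node (a, e, d) could be scored from the compiled mini-bucket tables (dict lookups keyed by the assigned values; all names are illustrative):

    def node_value(a, e, d, P_A, h_B_da, h_C_ea):
        """f(a,e,d) = g(a,e,d) * H(a,e,d), using tables compiled by Mini-Bucket."""
        g = P_A[a]                           # the only CPT fully instantiated by (a,e,d)
        H = h_B_da[(d, a)] * h_C_ea[(e, a)]  # mini-bucket messages over assigned vars
        return g * H                         # admissible (upper-bound) estimate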
20
Properties
  • Heuristic is monotone
  • Heuristic is admissible
  • Heuristic is computed in linear time
  • IMPORTANT
  • Mini-buckets generate heuristics of varying strength using the control parameter i (the bound)
  • Higher bound -> more preprocessing -> stronger heuristics -> less search
  • Allows a controlled trade-off between preprocessing and search

21
Empirical Evaluation of mini-bucket heuristics
22
Cluster Tree Elimination - properties
  • Correctness and completeness: Algorithm CTE is correct, i.e., it computes the exact joint probability of a single variable and the evidence.
  • Time complexity: O( deg * (n + N) * d^(w+1) )
  • Space complexity: O( N * d^sep )
  • where deg = the maximum degree of a node
  • n = number of variables (= number of CPTs)
  • N = number of nodes in the tree decomposition
  • d = the maximum domain size of a variable
  • w = the induced width
  • sep = the separator size
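For a rough sense of scale (hypothetical numbers, not taken from the slides): with deg = 3, n = 60, N = 50, d = 2, w = 15 and sep = 10, the bounds give roughly 3 * (60 + 50) * 2^16 ≈ 2.2 * 10^7 operations of time and 50 * 2^10 ≈ 5 * 10^4 table entries of space.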

23
Mini-Clustering for belief updating
  • Motivation:
  • Time and space complexity of Cluster Tree Elimination depend on the induced width w of the problem
  • When the induced width w is big, the CTE algorithm becomes infeasible
  • The basic idea:
  • Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables (sketched below)
  • Accuracy parameter i = maximum number of variables in a mini-cluster
  • The idea was explored for variable elimination (Mini-Bucket)
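A sketch, under the same illustrative assumptions as the earlier mini-bucket snippet (functions as (scope, table) pairs, a greedy partition routine passed in), of how a cluster's outgoing messages could be approximated by mini-clusters:

    from itertools import product

    def mini_cluster_messages(cluster_functions, elim_vars, domains, i_bound, partition):
        """Partition the cluster's functions into mini-clusters of at most i_bound
        variables, then eliminate separately in each: sum in the first mini-cluster,
        max in the rest, giving an upper bound on the exact CTE message."""
        messages = []
        for k, mb in enumerate(partition(cluster_functions, i_bound)):
            combine = sum if k == 0 else max
            elim_here = [v for v in elim_vars if v in mb['scope']]
            scope = sorted(mb['scope'] - set(elim_here))
            table = {}
            for vals in product(*(domains[v] for v in scope)):
                assignment = dict(zip(scope, vals))
                terms = []
                for evals in product(*(domains[v] for v in elim_here)):
                    assignment.update(zip(elim_here, evals))
                    p = 1.0
                    for f_scope, f_table in mb['functions']:
                        p *= f_table[tuple(assignment[v] for v in f_scope)]
                    terms.append(p)
                table[vals] = combine(terms)
            messages.append((tuple(scope), table))
        return messages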

24
Idea of Mini-Clustering
25
Mini-Clustering - example
[Figure: Mini-Clustering example on a tree decomposition with clusters 1:ABC, 2:BCDF, 3:BEF, 4:EFG and separators BC, BF, EF]
26
Cluster Tree Elimination vs. Mini-Clustering
[Figure: side-by-side comparison of CTE and MC messages on the same tree decomposition (clusters 1:ABC, 2:BCDF, 3:BEF, 4:EFG; separators BC, BF, EF)]
27
Mini-Clustering
  • Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi, e) of each variable and each of its values.
  • Time and space complexity: O( n * hw* * d^i )
  • where hw* = max_u |{ f : f is placed in cluster u }| (the maximum number of functions in a cluster)

28
Experimental results
  • Algorithms
  • Exact
  • IBP
  • Gibbs sampling (GS)
  • MC with normalization (approximate)
  • Networks (all variables are binary)
  • Coding networks
  • CPCS 54, 360, 422
  • Grid networks (MxM)
  • Random noisy-OR networks
  • Random networks
  • Measures
  • Normalized Hamming Distance (NHD)
  • BER (Bit Error Rate)
  • Absolute error
  • Relative error
  • Time

29
Random networks - Absolute error
evidence = 0
evidence = 10
30
Noisy-OR networks - Absolute error
evidence = 10
evidence = 20
31
Grid 15x15 - 10 evidence
32
CPCS422 - Absolute error
evidence = 0
evidence = 10
33
Coding networks - Bit Error Rate
sigma = 0.22
sigma = 0.51
34
Mini-Clustering summary
  • MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating
  • Empirical evaluation demonstrates its
    effectiveness and superiority (for certain types
    of problems, with respect to the measures
    considered) relative to other existing algorithms

35
What is IJGP?
  • IJGP is an approximate algorithm for belief
    updating in Bayesian networks
  • IJGP is a version of join-tree clustering which
    is both anytime and iterative
  • IJGP applies message passing along a join-graph,
    rather than a join-tree
  • Empirical evaluation shows that IJGP is almost
    always superior to other approximate schemes
    (IBP, MC)

36
Iterative Belief Propagation - IBP
One-step update of BEL(U1)
[Figure: network fragment with nodes U1, U2, U3, X1, X2]
  • Belief propagation is exact for poly-trees
  • IBP - applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks

37
IJGP - Motivation
  • IBP is applied to a loopy network iteratively
  • not an anytime algorithm
  • when it converges, it converges very fast
  • MC applies bounded inference along a tree
    decomposition
  • MC is an anytime algorithm controlled by i-bound
  • MC converges in two passes up and down the tree
  • IJGP combines
  • the iterative feature of IBP
  • the anytime feature of MC

38
IJGP - The basic idea
  • Apply Cluster Tree Elimination to any join-graph
  • We commit to graphs that are minimal I-maps
  • Avoid cycles as long as I-mapness is not violated
  • Result: use minimal arc-labeled join-graphs

39
IJGP - Example
[Figure over variables A, B, C, D, E, F, G, H, I, J with clusters ABC, ABDE, BCE, CDEF, FGH, FGI, GHIJ]
a) Belief network
b) The graph IBP works on
40
Arc-minimal join-graph
[Figure: the join-graph (clusters ABC, ABDE, BCE, CDEF, FGH, FGI, GHIJ) before and after removing redundant arcs to obtain an arc-minimal join-graph]
41
Minimal arc-labeled join-graph
[Figure: the arc-minimal join-graph and the corresponding minimal arc-labeled join-graph (same clusters, with reduced arc labels)]
42
Join-graph decompositions
[Figure: three join-graph decompositions over the same variables]
a) Minimal arc-labeled join-graph
b) Join-graph obtained by collapsing nodes of graph a)
c) Minimal arc-labeled join-graph
43
Tree decomposition
[Figure: collapsing further yields a tree decomposition with clusters ABCDE, CDEF, FGHI, GHIJ]
a) Minimal arc-labeled join-graph
b) Tree decomposition
44
Join-graphs
more accuracy
less complexity
45
Message propagation
[Figure: message propagation between cluster 1 (ABCDE) and cluster 2 (CDEF) in a join-graph; cluster 3 sends h_(3,1)(b,c) to cluster 1 over the label BC]
Cluster 1 (ABCDE) contains p(a), p(c), p(b|a,c), p(d|a,b,e), p(e|b,c) and the incoming message h_(3,1)(b,c); from their product it computes the message h_(1,2) sent to cluster 2.
Minimal arc label: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}
Non-minimal arc label: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}
46
Bounded decompositions
  • We want arc-labeled decompositions such that
  • the cluster size (internal width) is bounded by i
    (the accuracy parameter)
  • the width of the decomposition as a graph
    (external width) is as small as possible
  • Possible approaches to build decompositions
  • partition-based algorithms - inspired by the
    mini-bucket decomposition
  • grouping-based algorithms

47
Partition-based algorithms
[Figure: clusters GFE (P(G|F,E)), EBF (P(E|B,F)), FCD (P(F|C,D)), CDB (P(D|B)), CAB (P(C|A,B)), BA (P(B|A)), A (P(A)), connected by separators such as EF, BF, F, CD, CB, B, A]
a) schematic mini-bucket(i), i = 3
b) arc-labeled join-graph decomposition
48
IJGP properties
  • IJGP(i) applies BP to a minimal arc-labeled join-graph whose cluster size is bounded by i
  • On join-trees IJGP finds exact beliefs
  • IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman, Weiss 2001)
  • Complexity of one iteration:
  • time O( deg * (n + N) * d^(i+1) )
  • space O( N * d^i )

49
Empirical evaluation
  • Measures
  • Absolute error
  • Relative error
  • Kullback-Leibler (KL) distance
  • Bit Error Rate
  • Time
  • Algorithms
  • Exact
  • IBP
  • MC
  • IJGP
  • Networks (all variables are binary)
  • Random networks
  • Grid networks (MxM)
  • CPCS 54, 360, 422
  • Coding networks

50
Random networks - KL at convergence
evidence = 0
evidence = 5
51
Random networks - KL vs. iterations
evidence = 0
evidence = 5
52
Random networks - Time
53
Coding networks - BER
sigma = 0.22
sigma = 0.32
sigma = 0.51
sigma = 0.65
54
Coding networks - Time
55
IJGP summary
  • IJGP borrows the iterative feature from IBP and
    the anytime virtues of bounded inference from MC
  • Empirical evaluation showed the potential of
    IJGP, which improves with iteration and most of
    the time with i-bound, and scales up to large
    networks
  • IJGP is almost always superior, often by a high
    margin, to IBP and MC
  • Based on all our experiments, we think that IJGP
    provides a practical breakthrough to the task of
    belief updating

56
Random networks
N = 80, 100 instances, w = 15
57
Random networks
N = 80, 100 instances, w = 15
58
CPCS 54, CPCS360
CPCS360: 5 instances, w = 20; CPCS54: 100 instances, w = 15
59
Graph coloring problems

[Figure: graph coloring networks with variable nodes X1, X2, X3, ..., Xn and auxiliary nodes H1, H2, H3, H4]
60
Graph coloring problems
61
Inference power of IBP - summary
  • IBP's inference of zero beliefs converges in a finite number of iterations and is sound. The results extend to generalized belief propagation algorithms, in particular to IJGP
  • We identified classes of networks for which IBP
  • can infer zeros, and is therefore likely to be good
  • cannot infer zeros, although there are many of them (graph coloring), and is therefore bad
  • Based on the analysis it is easy to synthesize belief networks that are hard for IBP.
  • The success of IBP for coding networks can be explained by:
  • Many extreme beliefs
  • An easy-for-arc-consistency flat network

62
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • Local inference: mini-buckets
  • Stochastic simulations
  • Variational techniques
  • MDPs

63
Stochastic Simulation
  • Forward sampling (logic sampling)
  • Likelihood weighting
  • Markov Chain Monte Carlo (MCMC): Gibbs sampling

64
Approximation via Sampling
65
Forward Sampling (logic sampling; Henrion, 1988)

66
Forward sampling (example)
Drawback: high rejection rate!
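A minimal logic-sampling sketch in Python; the bayes_net layout (var -> (parents, cpt), with cpt keyed by parent values and mapping each value to its probability) is an assumption made for this example, and the evidence check shows where the rejections come from:

    import random

    def forward_sample(bayes_net, order):
        """Logic (forward) sampling: sample each variable from its CPT given its
        already-sampled parents; order must be topological."""
        sample = {}
        for var in order:
            parents, cpt = bayes_net[var]
            dist = cpt[tuple(sample[p] for p in parents)]
            r, acc = random.random(), 0.0
            for value, prob in dist.items():
                acc += prob
                if r < acc:
                    break
            sample[var] = value  # falls back to the last value on round-off
        return sample

    def logic_sampling_query(bayes_net, order, evidence, query_var, n_samples=10000):
        """Estimate P(query_var | evidence) by rejecting samples that contradict
        the evidence -- the source of the high rejection rate noted above."""
        counts, kept = {}, 0
        for _ in range(n_samples):
            s = forward_sample(bayes_net, order)
            if all(s[v] == val for v, val in evidence.items()):
                counts[s[query_var]] = counts.get(s[query_var], 0) + 1
                kept += 1
        return {v: c / kept for v, c in counts.items()} if kept else None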
67
Likelihood Weighting (Fung and Chang, 1990; Shachter and Peot, 1990)
Clamping evidence + forward sampling, weighting samples by the evidence likelihood
Works well for likely evidence!
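The corresponding likelihood-weighting sketch (same assumed bayes_net layout as the logic-sampling snippet): evidence variables are clamped rather than sampled, and each sample is weighted by the probability of its evidence values:

    import random

    def likelihood_weighting(bayes_net, order, evidence, query_var, n_samples=10000):
        """Estimate P(query_var | evidence) with weighted samples: clamp evidence,
        forward-sample the rest, weight by the likelihood of the evidence."""
        weights = {}
        for _ in range(n_samples):
            sample, w = {}, 1.0
            for var in order:  # topological order
                parents, cpt = bayes_net[var]
                dist = cpt[tuple(sample[p] for p in parents)]
                if var in evidence:
                    sample[var] = evidence[var]
                    w *= dist[evidence[var]]  # weight by evidence likelihood
                else:
                    r, acc = random.random(), 0.0
                    for value, prob in dist.items():
                        acc += prob
                        if r < acc:
                            break
                    sample[var] = value
            key = sample[query_var]
            weights[key] = weights.get(key, 0.0) + w
        total = sum(weights.values())
        return {v: wt / total for v, wt in weights.items()} if total > 0 else None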
68
Gibbs Sampling (Geman and Geman, 1984)
Markov Chain Monte Carlo (MCMC): create a Markov chain of samples
Advantage: guaranteed to converge to P(X). Disadvantage: convergence may be slow.
69
Gibbs Sampling (cont'd) (Pearl, 1988)
Markov blanket
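A Gibbs-sampling sketch under the same assumed network layout: starting from a state consistent with the evidence, each non-evidence variable is repeatedly resampled from its conditional given its Markov blanket (its parents, children, and the children's other parents):

    import random

    def gibbs_sampling(bayes_net, order, domains, evidence, query_var,
                       n_samples=5000, burn_in=500):
        """Estimate P(query_var | evidence) from a Markov chain of samples."""
        # children[v]: variables whose CPT mentions v as a parent (for the blanket)
        children = {v: [c for c, (ps, _) in bayes_net.items() if v in ps]
                    for v in bayes_net}
        state = {v: (evidence[v] if v in evidence else random.choice(domains[v]))
                 for v in order}
        counts = {}
        for t in range(burn_in + n_samples):
            for var in order:
                if var in evidence:
                    continue
                # P(var=x | Markov blanket) is proportional to
                # P(var=x | parents(var)) * prod over children c of P(c | parents(c))
                scores = []
                for x in domains[var]:
                    state[var] = x
                    parents, cpt = bayes_net[var]
                    p = cpt[tuple(state[q] for q in parents)][x]
                    for c in children[var]:
                        c_parents, c_cpt = bayes_net[c]
                        p *= c_cpt[tuple(state[q] for q in c_parents)][state[c]]
                    scores.append(p)
                r, acc = random.random() * sum(scores), 0.0
                for x, s in zip(domains[var], scores):
                    acc += s
                    if r < acc:
                        break
                state[var] = x  # keeps the last value if all scores are zero
            if t >= burn_in:
                counts[state[query_var]] = counts.get(state[query_var], 0) + 1
        total = sum(counts.values())
        return {v: c / total for v, c in counts.items()}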