Approximation Techniques: bounded inference

1
Approximation Techniques: bounded inference
  • 275b

2
Mini-buckets: local inference
  • The idea is similar to i-consistency: bound the size of recorded dependencies
  • Computation in a bucket is time and space exponential in the number of variables involved
  • Therefore, partition the functions in a bucket into mini-buckets over smaller numbers of variables

3
Mini-bucket approximation: MPE task
Split a bucket into mini-buckets => bound complexity
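As an illustration only, here is a minimal Python sketch of the partitioning step; the greedy placement strategy and the (scope, table) representation of functions are assumptions made for this example, not the exact procedure from the slides:

    def partition_into_mini_buckets(bucket_functions, i_bound):
        """Greedily place each function (scope, table) into the first
        mini-bucket whose combined scope stays within i_bound variables."""
        mini_buckets = []  # each entry: {'scope': set of vars, 'functions': list}
        for scope, table in bucket_functions:
            for mb in mini_buckets:
                if len(mb['scope'] | set(scope)) <= i_bound:
                    mb['scope'] |= set(scope)
                    mb['functions'].append((scope, table))
                    break
            else:  # no existing mini-bucket can absorb this function
                mini_buckets.append({'scope': set(scope),
                                     'functions': [(scope, table)]})
        return mini_buckets

Each mini-bucket is then eliminated separately (max over the bucket variable for MPE), and the product of the resulting smaller functions upper-bounds the exact bucket message.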
4
Approx-mpe(i)
  • Input: i, the max number of variables allowed in a mini-bucket
  • Output: a lower bound (the probability of a sub-optimal solution) and an upper bound

Example: approx-mpe(3) versus elim-mpe
5
Properties of approx-mpe(i)
  • Complexity: O(exp(2i)) time and O(exp(i)) space.
  • Accuracy: determined by the upper/lower (U/L) bound.
  • As i increases, both accuracy and complexity increase.
  • Possible uses of mini-bucket approximations:
  • As anytime algorithms (Dechter and Rish, 1997)
  • As heuristics in best-first search (Kask and Dechter, 1999)
  • Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter and Rish, 1997)

6
Anytime Approximation
7
Bounded elimination for belief updating
  • Idea: the mini-bucket partitioning is the same
  • We can apply a sum in each mini-bucket, or better, a sum in one mini-bucket and max in the rest, or min in the rest (for a lower bound); see the numeric illustration below
  • Approx-bel-max(i,m), generating upper and lower bounds on beliefs, approximates elim-bel
  • Approx-map(i,m): max buckets are maximized, sum buckets are processed sum-max. Approximates elim-map.
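A tiny made-up numeric illustration (Python) of why summing in one mini-bucket and taking max or min of the other bounds the exact sum; the numbers are arbitrary:

    # Hypothetical bucket variable X with three values; non-negative functions
    # f and g were placed in different mini-buckets.
    f = [0.2, 0.5, 0.3]
    g = [0.9, 0.1, 0.4]

    exact = sum(fx * gx for fx, gx in zip(f, g))  # what elim-bel would compute
    upper = sum(f) * max(g)   # sum in one mini-bucket, max in the other
    lower = sum(f) * min(g)   # sum in one mini-bucket, min in the other
    assert lower <= exact <= upper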

8
Empirical Evaluation (Dechter and Rish, 1997; Rish thesis, 1999)
  • Randomly generated networks
  • Uniform random probabilities
  • Random noisy-OR
  • CPCS networks
  • Probabilistic decoding
  • Comparing approx-mpe and anytime-mpe versus elim-mpe

9
Random networks
  • Uniform random: 60 nodes, 90 edges (200 instances)
  • In 80% of cases, a 10-100 times speed-up while U/L < 2
  • Noisy-OR: even better results
  • Exact elim-mpe was infeasible; approx-mpe took 0.1 to 80 sec.

10
CPCS networks: medical diagnosis (noisy-OR model)
Test case: no evidence
11
The effect of evidence
More likely evidence => higher MPE => higher accuracy (why?)
Likely evidence versus random (unlikely) evidence
12
Probabilistic decoding
Error-correcting linear block code
State-of-the-art approximate algorithm: iterative belief propagation (IBP) (Pearl's poly-tree algorithm applied to loopy networks)
13
Iterative Belief Propagation
  • Belief propagation is exact for poly-trees
  • IBP - applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks

14
approx-mpe vs. IBP
Bit error rate (BER) as a function of noise
(sigma)
15
Mini-buckets summary
  • Mini-buckets: a local inference approximation
  • Idea: bound the size of recorded functions
  • Approx-mpe(i) - the mini-bucket algorithm for MPE
  • Better results for noisy-OR than for random problems
  • Accuracy increases with decreasing noise in coding networks
  • Accuracy increases for likely evidence
  • Sparser graphs -> higher accuracy
  • Coding networks: approx-mpe outperforms IBP on low induced-width codes

16
Heuristic search
  • Mini-buckets record upper-bound heuristics
  • The evaluation function over a partial assignment estimates the probability of its best extension
  • Best-first: expand the node with the maximal evaluation function
  • Branch and Bound: prune a node when its upper-bound evaluation function cannot improve on the best solution found so far (a sketch follows this list)
  • Properties:
  • an exact algorithm
  • Better heuristics lead to more pruning
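A minimal depth-first Branch-and-Bound sketch in Python, assuming a hypothetical helper f_upper that returns the mini-bucket evaluation function (an upper bound on any extension of a partial assignment, and the exact value on a complete one):

    def branch_and_bound(variables, domains, f_upper, best=0.0):
        """Maximize the MPE value: prune a partial assignment whenever its
        upper bound f_upper cannot improve on the incumbent solution."""
        def dfs(assignment, best):
            if len(assignment) == len(variables):
                return max(best, f_upper(assignment))  # complete assignment: exact value
            var = variables[len(assignment)]
            for value in domains[var]:
                child = assignment + [(var, value)]
                if f_upper(child) <= best:  # bound no better than incumbent: prune
                    continue
                best = dfs(child, best)
            return best
        return dfs([], best)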

17
Heuristic Function
Given a cost function:
P(a,b,c,d,e) = P(a) * P(b|a) * P(c|a) * P(e|b,c) * P(d|a,b)
Define an evaluation function over a partial
assignment as the probability of its best
extension
[Figure: search tree over partial assignments (values 0/1)]
f(a,e,d) = max_{b,c} P(a,b,c,d,e)
         = P(a) * max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
         = g(a,e,d) * H(a,e,d)
18
Heuristic Function
H(a,e,d) = max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
         = max_c P(c|a) * max_b P(e|b,c) P(b|a) P(d|a,b)
        <= [ max_c P(c|a) * max_b P(e|b,c) ] * [ max_b P(b|a) P(d|a,b) ]
The last expression is the mini-bucket heuristic H(a,e,d); with it,
f(a,e,d) = g(a,e,d) * H(a,e,d) >= the probability of the best extension of (a,e,d).
The heuristic function H is compiled during the preprocessing stage of the Mini-Bucket algorithm.
19
Heuristic Function
The evaluation function f(x^p) can be computed using functions recorded by the Mini-Bucket scheme, and can be used to estimate the probability of the best extension of the partial assignment x^p = (x_1, ..., x_p):
f(x^p) = g(x^p) * H(x^p)
For example, with buckets processed in the order B, C, D, E, A:
bucket B (two mini-buckets):  max_B P(e|b,c)  and  max_B P(b|a) P(d|a,b)  ->  h^B(e,c), h^B(d,a)
bucket C:  max_C P(c|a) h^B(e,c)  ->  h^C(e,a)
bucket D:  max_D h^B(d,a)  ->  h^D(a)
bucket E:  max_E h^C(e,a)  ->  h^E(a)
bucket A:  max_A P(a) h^E(a) h^D(a)
H(a,e,d) = h^B(d,a) * h^C(e,a)
g(a,e,d) = P(a)
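Continuing the example, a hypothetical Python snippet showing how a search node (a, e, d) could be scored from the compiled mini-bucket tables (dict lookups keyed by the assigned values; all names are illustrative):

    def node_value(a, e, d, P_A, h_B_da, h_C_ea):
        """f(a,e,d) = g(a,e,d) * H(a,e,d), using tables compiled by Mini-Bucket."""
        g = P_A[a]                           # the only CPT fully instantiated by (a,e,d)
        H = h_B_da[(d, a)] * h_C_ea[(e, a)]  # mini-bucket messages over assigned vars
        return g * H                         # admissible (upper-bound) estimate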
20
Properties
  • Heuristic is monotone
  • Heuristic is admissible
  • Heuristic is computed in linear time
  • IMPORTANT
  • Mini-buckets generate heuristics of varying strength using the control parameter i (the bound)
  • Higher bound -> more preprocessing -> stronger heuristics -> less search
  • Allows a controlled trade-off between preprocessing and search

21
Empirical Evaluation of mini-bucket heuristics
22
Cluster Tree Elimination - properties
  • Correctness and completeness: Algorithm CTE is correct, i.e., it computes the exact joint probability of a single variable and the evidence.
  • Time complexity: O( deg * (n + N) * d^(w+1) )
  • Space complexity: O( N * d^sep )
  • where deg = the maximum degree of a node
  • n = number of variables (= number of CPTs)
  • N = number of nodes in the tree decomposition
  • d = the maximum domain size of a variable
  • w = the induced width
  • sep = the separator size
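For a rough sense of scale (hypothetical numbers, not taken from the slides): with deg = 3, n = 60, N = 50, d = 2, w = 15 and sep = 10, the bounds give roughly 3 * (60 + 50) * 2^16 ≈ 2.2 * 10^7 operations of time and 50 * 2^10 ≈ 5 * 10^4 table entries of space.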

23
Mini-Clustering for belief updating
  • Motivation:
  • Time and space complexity of Cluster Tree Elimination depend on the induced width w of the problem
  • When the induced width w is big, the CTE algorithm becomes infeasible
  • The basic idea:
  • Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables (sketched below)
  • Accuracy parameter i = maximum number of variables in a mini-cluster
  • The idea was explored for variable elimination (Mini-Bucket)
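A sketch, under the same illustrative assumptions as the earlier mini-bucket snippet (functions as (scope, table) pairs, a greedy partition routine passed in), of how a cluster's outgoing messages could be approximated by mini-clusters:

    from itertools import product

    def mini_cluster_messages(cluster_functions, elim_vars, domains, i_bound, partition):
        """Partition the cluster's functions into mini-clusters of at most i_bound
        variables, then eliminate separately in each: sum in the first mini-cluster,
        max in the rest, giving an upper bound on the exact CTE message."""
        messages = []
        for k, mb in enumerate(partition(cluster_functions, i_bound)):
            combine = sum if k == 0 else max
            elim_here = [v for v in elim_vars if v in mb['scope']]
            scope = sorted(mb['scope'] - set(elim_here))
            table = {}
            for vals in product(*(domains[v] for v in scope)):
                assignment = dict(zip(scope, vals))
                terms = []
                for evals in product(*(domains[v] for v in elim_here)):
                    assignment.update(zip(elim_here, evals))
                    p = 1.0
                    for f_scope, f_table in mb['functions']:
                        p *= f_table[tuple(assignment[v] for v in f_scope)]
                    terms.append(p)
                table[vals] = combine(terms)
            messages.append((tuple(scope), table))
        return messages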

24
Idea of Mini-Clustering
25
Mini-Clustering - example
[Figure: Mini-Clustering example on a tree decomposition with clusters 1:ABC, 2:BCDF, 3:BEF, 4:EFG and separators BC, BF, EF]
26
Cluster Tree Elimination vs. Mini-Clustering
[Figure: side-by-side comparison of CTE and MC messages on the same tree decomposition (clusters 1:ABC, 2:BCDF, 3:BEF, 4:EFG; separators BC, BF, EF)]
27
Mini-Clustering
  • Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi, e) of each variable and each of its values.
  • Time and space complexity: O( n * hw* * d^i )
  • where hw* = max_u |{ f : f is placed in cluster u }| (the maximum number of functions in a cluster)

28
Experimental results
  • Algorithms
  • Exact
  • IBP
  • Gibbs sampling (GS)
  • MC with normalization (approximate)
  • Networks (all variables are binary)
  • Coding networks
  • CPCS 54, 360, 422
  • Grid networks (MxM)
  • Random noisy-OR networks
  • Random networks
  • Measures
  • Normalized Hamming Distance (NHD)
  • BER (Bit Error Rate)
  • Absolute error
  • Relative error
  • Time

29
Random networks - Absolute error
evidence = 0
evidence = 10
30
Noisy-OR networks - Absolute error
evidence = 10
evidence = 20
31
Grid 15x15 - 10 evidence
32
CPCS422 - Absolute error
evidence = 0
evidence = 10
33
Coding networks - Bit Error Rate
sigma = 0.22
sigma = 0.51
34
Mini-Clustering summary
  • MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating
  • Empirical evaluation demonstrates its
    effectiveness and superiority (for certain types
    of problems, with respect to the measures
    considered) relative to other existing algorithms

35
What is IJGP?
  • IJGP is an approximate algorithm for belief
    updating in Bayesian networks
  • IJGP is a version of join-tree clustering which
    is both anytime and iterative
  • IJGP applies message passing along a join-graph,
    rather than a join-tree
  • Empirical evaluation shows that IJGP is almost
    always superior to other approximate schemes
    (IBP, MC)

36
Iterative Belief Propagation - IBP
One-step update of BEL(U1)
[Figure: network fragment with nodes U1, U2, U3, X1, X2]
  • Belief propagation is exact for poly-trees
  • IBP - applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks

37
IJGP - Motivation
  • IBP is applied to a loopy network iteratively
  • not an anytime algorithm
  • when it converges, it converges very fast
  • MC applies bounded inference along a tree
    decomposition
  • MC is an anytime algorithm controlled by i-bound
  • MC converges in two passes up and down the tree
  • IJGP combines
  • the iterative feature of IBP
  • the anytime feature of MC

38
IJGP - The basic idea
  • Apply Cluster Tree Elimination to any join-graph
  • We commit to graphs that are minimal I-maps
  • Avoid cycles as long as I-mapness is not violated
  • Result: use minimal arc-labeled join-graphs

39
IJGP - Example
[Figure over variables A, B, C, D, E, F, G, H, I, J with clusters ABC, ABDE, BCE, CDEF, FGH, FGI, GHIJ]
a) Belief network
b) The graph IBP works on
40
Arc-minimal join-graph
[Figure: the join-graph (clusters ABC, ABDE, BCE, CDEF, FGH, FGI, GHIJ) before and after removing redundant arcs to obtain an arc-minimal join-graph]
41
Minimal arc-labeled join-graph
[Figure: the arc-minimal join-graph and the corresponding minimal arc-labeled join-graph (same clusters, with reduced arc labels)]
42
Join-graph decompositions
[Figure: three join-graph decompositions over the same variables]
a) Minimal arc-labeled join-graph
b) Join-graph obtained by collapsing nodes of graph a)
c) Minimal arc-labeled join-graph
43
Tree decomposition
[Figure: collapsing further yields a tree decomposition with clusters ABCDE, CDEF, FGHI, GHIJ]
a) Minimal arc-labeled join-graph
b) Tree decomposition
44
Join-graphs
more accuracy
less complexity
45
Message propagation
[Figure: message propagation between cluster 1 (ABCDE) and cluster 2 (CDEF) in a join-graph; cluster 3 sends h_(3,1)(b,c) to cluster 1 over the label BC]
Cluster 1 (ABCDE) contains p(a), p(c), p(b|a,c), p(d|a,b,e), p(e|b,c) and the incoming message h_(3,1)(b,c); from their product it computes the message h_(1,2) sent to cluster 2.
Minimal arc label: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}
Non-minimal arc label: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}
46
Bounded decompositions
  • We want arc-labeled decompositions such that
  • the cluster size (internal width) is bounded by i
    (the accuracy parameter)
  • the width of the decomposition as a graph
    (external width) is as small as possible
  • Possible approaches to build decompositions
  • partition-based algorithms - inspired by the
    mini-bucket decomposition
  • grouping-based algorithms

47
Partition-based algorithms
[Figure: clusters GFE (P(G|F,E)), EBF (P(E|B,F)), FCD (P(F|C,D)), CDB (P(D|B)), CAB (P(C|A,B)), BA (P(B|A)), A (P(A)), connected by separators such as EF, BF, F, CD, CB, B, A]
a) schematic mini-bucket(i), i = 3
b) arc-labeled join-graph decomposition
48
IJGP properties
  • IJGP(i) applies BP to a minimal arc-labeled join-graph whose cluster size is bounded by i
  • On join-trees IJGP finds exact beliefs
  • IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman, Weiss 2001)
  • Complexity of one iteration:
  • time O( deg * (n + N) * d^(i+1) )
  • space O( N * d^i )

49
Empirical evaluation
  • Measures
  • Absolute error
  • Relative error
  • Kullback-Leibler (KL) distance
  • Bit Error Rate
  • Time
  • Algorithms
  • Exact
  • IBP
  • MC
  • IJGP
  • Networks (all variables are binary)
  • Random networks
  • Grid networks (MxM)
  • CPCS 54, 360, 422
  • Coding networks

50
Random networks - KL at convergence
evidence = 0
evidence = 5
51
Random networks - KL vs. iterations
evidence = 0
evidence = 5
52
Random networks - Time
53
Coding networks - BER
sigma = 0.22
sigma = 0.32
sigma = 0.51
sigma = 0.65
54
Coding networks - Time
55
IJGP summary
  • IJGP borrows the iterative feature from IBP and
    the anytime virtues of bounded inference from MC
  • Empirical evaluation showed the potential of
    IJGP, which improves with iteration and most of
    the time with i-bound, and scales up to large
    networks
  • IJGP is almost always superior, often by a high
    margin, to IBP and MC
  • Based on all our experiments, we think that IJGP
    provides a practical breakthrough to the task of
    belief updating

56
Random networks
N = 80, 100 instances, w = 15
57
Random networks
N = 80, 100 instances, w = 15
58
CPCS 54, CPCS360
CPCS360: 5 instances, w = 20; CPCS54: 100 instances, w = 15
59
Graph coloring problems

[Figure: graph coloring networks with variable nodes X1, X2, X3, ..., Xn and auxiliary nodes H1, H2, H3, H4]
60
Graph coloring problems
61
Inference power of IBP - summary
  • IBP's inference of zero beliefs converges in a finite number of iterations and is sound. The results extend to generalized belief propagation algorithms, in particular to IJGP
  • We identified classes of networks for which IBP
  • can infer zeros, and is therefore likely to be good
  • cannot infer zeros, although there are many of them (graph coloring), and is therefore bad
  • Based on the analysis it is easy to synthesize belief networks that are hard for IBP.
  • The success of IBP for coding networks can be explained by:
  • Many extreme beliefs
  • An easy-for-arc-consistency flat network

62
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • Local inference: mini-buckets
  • Stochastic simulations
  • Variational techniques
  • MDPs

63
Stochastic Simulation
  • Forward sampling (logic sampling)
  • Likelihood weighting
  • Markov Chain Monte Carlo (MCMC): Gibbs sampling

64
Approximation via Sampling
65
Forward Sampling (logic sampling; Henrion, 1988)

66
Forward sampling (example)
Drawback: high rejection rate!
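A minimal logic-sampling sketch in Python; the bayes_net layout (var -> (parents, cpt), with cpt keyed by parent values and mapping each value to its probability) is an assumption made for this example, and the evidence check shows where the rejections come from:

    import random

    def forward_sample(bayes_net, order):
        """Logic (forward) sampling: sample each variable from its CPT given its
        already-sampled parents; order must be topological."""
        sample = {}
        for var in order:
            parents, cpt = bayes_net[var]
            dist = cpt[tuple(sample[p] for p in parents)]
            r, acc = random.random(), 0.0
            for value, prob in dist.items():
                acc += prob
                if r < acc:
                    break
            sample[var] = value  # falls back to the last value on round-off
        return sample

    def logic_sampling_query(bayes_net, order, evidence, query_var, n_samples=10000):
        """Estimate P(query_var | evidence) by rejecting samples that contradict
        the evidence -- the source of the high rejection rate noted above."""
        counts, kept = {}, 0
        for _ in range(n_samples):
            s = forward_sample(bayes_net, order)
            if all(s[v] == val for v, val in evidence.items()):
                counts[s[query_var]] = counts.get(s[query_var], 0) + 1
                kept += 1
        return {v: c / kept for v, c in counts.items()} if kept else None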
67
Likelihood Weighting (Fung and Chang, 1990; Shachter and Peot, 1990)
Clamping evidence + forward sampling, weighting samples by the evidence likelihood
Works well for likely evidence!
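The corresponding likelihood-weighting sketch (same assumed bayes_net layout as the logic-sampling snippet): evidence variables are clamped rather than sampled, and each sample is weighted by the probability of its evidence values:

    import random

    def likelihood_weighting(bayes_net, order, evidence, query_var, n_samples=10000):
        """Estimate P(query_var | evidence) with weighted samples: clamp evidence,
        forward-sample the rest, weight by the likelihood of the evidence."""
        weights = {}
        for _ in range(n_samples):
            sample, w = {}, 1.0
            for var in order:  # topological order
                parents, cpt = bayes_net[var]
                dist = cpt[tuple(sample[p] for p in parents)]
                if var in evidence:
                    sample[var] = evidence[var]
                    w *= dist[evidence[var]]  # weight by evidence likelihood
                else:
                    r, acc = random.random(), 0.0
                    for value, prob in dist.items():
                        acc += prob
                        if r < acc:
                            break
                    sample[var] = value
            key = sample[query_var]
            weights[key] = weights.get(key, 0.0) + w
        total = sum(weights.values())
        return {v: wt / total for v, wt in weights.items()} if total > 0 else None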
68
Gibbs Sampling (Geman and Geman, 1984)
Markov Chain Monte Carlo (MCMC): create a Markov chain of samples
Advantage: guaranteed to converge to P(X). Disadvantage: convergence may be slow.
69
Gibbs Sampling (cont'd) (Pearl, 1988)
Markov blanket
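A Gibbs-sampling sketch under the same assumed network layout: starting from a state consistent with the evidence, each non-evidence variable is repeatedly resampled from its conditional given its Markov blanket (its parents, children, and the children's other parents):

    import random

    def gibbs_sampling(bayes_net, order, domains, evidence, query_var,
                       n_samples=5000, burn_in=500):
        """Estimate P(query_var | evidence) from a Markov chain of samples."""
        # children[v]: variables whose CPT mentions v as a parent (for the blanket)
        children = {v: [c for c, (ps, _) in bayes_net.items() if v in ps]
                    for v in bayes_net}
        state = {v: (evidence[v] if v in evidence else random.choice(domains[v]))
                 for v in order}
        counts = {}
        for t in range(burn_in + n_samples):
            for var in order:
                if var in evidence:
                    continue
                # P(var=x | Markov blanket) is proportional to
                # P(var=x | parents(var)) * prod over children c of P(c | parents(c))
                scores = []
                for x in domains[var]:
                    state[var] = x
                    parents, cpt = bayes_net[var]
                    p = cpt[tuple(state[q] for q in parents)][x]
                    for c in children[var]:
                        c_parents, c_cpt = bayes_net[c]
                        p *= c_cpt[tuple(state[q] for q in c_parents)][state[c]]
                    scores.append(p)
                r, acc = random.random() * sum(scores), 0.0
                for x, s in zip(domains[var], scores):
                    acc += s
                    if r < acc:
                        break
                state[var] = x  # keeps the last value if all scores are zero
            if t >= burn_in:
                counts[state[query_var]] = counts.get(state[query_var], 0) + 1
        total = sum(counts.values())
        return {v: c / total for v, c in counts.items()}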