Transcript and Presenter's Notes

Title: Inference III: Approximate Inference


1
Inference III: Approximate Inference
  • Slides by Nir Friedman

2
Global conditioning
Fixing the values of A and B
Fixing values at the beginning of the summation
can decrease the size of the tables formed by
variable elimination; space is traded for time.
A special case: choose to fix a set of nodes that
breaks all loops. This method is called
cutset conditioning. Alternatively, choose to fix
some variables from the largest cliques in a
clique tree.
3
Approximation
  • Until now, we examined exact computation
  • In many applications, approximations are
    sufficient
  • Example: P(X = x | e) = 0.3183098861838
  • Maybe P(X = x | e) ≈ 0.3 is a good enough
    approximation
  • e.g., we take action only if P(X = x | e) > 0.5
  • Can we find good approximation algorithms?

4
Types of Approximations
  • Absolute error
  • An estimate q of P(X = x | e) has absolute error
    ε, if
  • P(X = x | e) - ε ≤ q ≤ P(X = x | e) + ε
  • equivalently
  • q - ε ≤ P(X = x | e) ≤ q + ε
  • Absolute error is not always what we want
  • If P(X = x | e) = 0.0001, then an absolute error
    of 0.001 is unacceptable
  • If P(X = x | e) = 0.3, then an absolute error of
    0.001 is overly precise

[Figure: the interval of width 2ε around the estimate q on the [0, 1] probability scale]
5
Types of Approximations
  • Relative error
  • An estimate q of P(X = x | e) has relative error
    ε, if
  • P(X = x | e)(1 - ε) ≤ q ≤ P(X = x | e)(1 + ε)
  • equivalently
  • q/(1 + ε) ≤ P(X = x | e) ≤ q/(1 - ε)
  • Sensitivity of approximation depends on actual
    value of desired result

[Figure: the interval [q/(1 + ε), q/(1 - ε)] around the estimate q on the [0, 1] probability scale]
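As a concrete illustration (not part of the original slides), here is a minimal check of both definitions, using the posterior value from slide 3 as the true probability p and q = 0.3 as the estimate; the epsilon values are arbitrary examples:

```python
# Illustrative check of the two error notions from slides 4-5.
p = 0.3183098861838   # true posterior P(X = x | e), from slide 3
q = 0.3               # our estimate

def has_absolute_error(q, p, eps):
    # q is within +/- eps of p
    return p - eps <= q <= p + eps

def has_relative_error(q, p, eps):
    # q is within a (1 +/- eps) multiplicative factor of p
    return p * (1 - eps) <= q <= p * (1 + eps)

print(has_absolute_error(q, p, 0.02))   # True:  |q - p| ~ 0.018 <= 0.02
print(has_absolute_error(q, p, 0.01))   # False
print(has_relative_error(q, p, 0.1))    # True:  q/p ~ 0.94, within 10 percent
```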
6
Complexity
  • Recall, exact inference is NP-hard
  • Is approximate inference any easier?
  • Construction for exact inference:
  • Input: a 3-SAT problem φ
  • Output: a BN such that P(X = t) > 0 iff φ is
    satisfiable

7
Complexity: Relative Error
  • Suppose that q is a relative error estimate of
    P(X = t)
  • If φ is not satisfiable, then P(X = t) = 0. Hence,

0 = P(X = t)(1 - ε) ≤ q ≤ P(X = t)(1 + ε) = 0
namely, q = 0. Thus, if q > 0, then φ is
satisfiable.
An immediate consequence:
Thm: Given ε, finding an ε-relative error
approximation is NP-hard
8
Complexity: Absolute Error
  • We can find absolute error approximations to
    P(X = x) with high probability (via sampling).
  • We will see such algorithms shortly
  • However, once we have evidence, the problem is
    harder
  • Thm
  • If ε < 0.5, then finding an estimate of
    P(X = x | e) with absolute error ε is NP-hard

9
Proof
  • Recall our construction

10
Proof (cont.)
  • Suppose we can estimate with ε absolute error
  • Let p1 be the estimate of P(Q1 = t | X = t)
  • Assign q1 = t if p1 > 0.5, else q1 = f

Let p2 be the estimate of P(Q2 = t | X = t, Q1 = q1).
Assign q2 = t if p2 > 0.5, else q2 = f
…
Let pn be the estimate of P(Qn = t | X = t, Q1 = q1, …, Qn-1 = qn-1).
Assign qn = t if pn > 0.5, else qn = f
11
Proof (cont.)
  • Claim: if φ is satisfiable, then q1, …, qn is a
    satisfying assignment
  • Suppose φ is satisfiable
  • By induction on i: there is a satisfying
    assignment with Q1 = q1, …, Qi = qi

Base case: If Q1 = t in all satisfying
assignments, then P(Q1 = t | X = t) = 1, so
p1 ≥ 1 - ε > 0.5, so q1 = t. If Q1 = f in all
satisfying assignments, then q1 = f. Otherwise,
the statement holds for either choice of q1
12
Proof (cont.)
  • Claim: if φ is satisfiable, then q1, …, qn is a
    satisfying assignment
  • Suppose φ is satisfiable
  • By induction on i: there is a satisfying
    assignment with Q1 = q1, …, Qi = qi

Induction step: If Qi+1 = t in all satisfying
assignments s.t. Q1 = q1, …, Qi = qi, then
P(Qi+1 = t | X = t, Q1 = q1, …, Qi = qi) = 1, so
pi+1 ≥ 1 - ε > 0.5, so qi+1 = t. If Qi+1 = f in all
such satisfying assignments, then qi+1 = f
13
Proof (cont.)
  • We can efficiently check whether q1, …, qn is a
    satisfying assignment (linear time)
  • If it is, then φ is satisfiable
  • If it is not, then φ is not satisfiable
  • Suppose we have an approximation procedure with ε
    absolute error
  • ⇒ we can decide 3-SAT with n procedure calls
  • ⇒ ε-absolute error approximation is NP-hard
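A minimal Python sketch of this reduction (not part of the slides); approx_posterior and satisfies are hypothetical helpers standing in for the absolute-error inference oracle on the BN constructed from φ and for the linear-time clause check:

```python
# Sketch of the hardness argument from slides 9-13 (illustration only).
# `approx_posterior` and `satisfies` are hypothetical helpers, not real APIs.
def decide_3sat(phi, n, approx_posterior, satisfies):
    evidence = {'X': True}            # condition on X = t in the constructed BN
    q = {}
    for i in range(1, n + 1):
        # estimate of P(Qi = t | X = t, Q1..Qi-1 fixed), absolute error < 0.5
        p_i = approx_posterior(f'Q{i}', evidence)
        q[f'Q{i}'] = (p_i > 0.5)
        evidence[f'Q{i}'] = q[f'Q{i}']   # fix Qi before estimating Qi+1
    # phi is satisfiable iff the recovered assignment satisfies it
    return satisfies(phi, q)
```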

14
When can we hope to approximate?
  • Two situations
  • Peaked distributions
  • improbable values are ignored
  • Highly stochastic distributions
  • Far evidence is discarded

15
Peaked distributions
  • If the distribution is peaked, then most of the
    mass is on a few instances
  • If we can focus on these instances, we can ignore
    the rest

[Figure: probability mass plotted over instances]
16
Stochasticity and Approximations
  • Consider a chain
  • P(Xi+1 = t | Xi = t) = 1 - ε,
    P(Xi+1 = f | Xi = f) = 1 - ε
  • Computing the probability of Xn given X1, we get
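The resulting curve (plotted on the next slide) can be reproduced numerically. A minimal sketch, assuming the symmetric flip probability ε above; the closed form in the comment is an assumption of this sketch, not a formula quoted from the slides:

```python
import numpy as np

# Mixing in the two-state symmetric chain of slide 16 (illustrative sketch).
# eps is the flip probability; the closed form for this chain is assumed to be
# P(Xn = t | X1 = t) = 1/2 + 1/2 * (1 - 2*eps)**(n - 1).
def p_xn_true(eps, n):
    T = np.array([[1 - eps, eps],
                  [eps, 1 - eps]])            # rows: current state (t, f); cols: next state
    return np.linalg.matrix_power(T, n - 1)[0, 0]

for n in (2, 5, 10, 50):
    print(n, p_xn_true(0.2, n))               # converges to 0.5 as n grows (mixing)
```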

17
Plot of P(Xn = t | X1 = t)
[Figure: P(Xn = t | X1 = t) as a function of n, for a given ε]
18
Stochastic Processes
  • This behavior of a chain (a Markov process) is
    called mixing. We return to this as a tool in
    approximation
  • In general networks there is a similar behavior
  • If probabilities are far from 0 and 1, then the
    effect of far evidence vanishes (and so it can be
    discarded in approximations).

19
Bounded conditioning
Fixing the values of A and B
By examining only the probable assignments of A and
B, we perform several simple computations instead
of a complex one
20
Bounded conditioning
  • Choose A and B so that P(Y, e | a, b) can be
    computed easily, e.g., a cycle cutset.
  • Search for highly probable assignments to A, B:
  • Option 1: select a, b with high P(a, b).
  • Option 2: select a, b with high P(a, b | e).
  • We need to search for such high-mass values, and
    that can be hard.

21
Bounded Conditioning
  • Advantages:
  • Combines exact inference within the approximation
  • Continuous: more time can be used to examine more
    cases
  • Bounds: the unexamined mass is used to compute
    error bars
  • Possible problems:
  • P(a, b) is prior mass, not the posterior.
  • If the posterior P(a, b | e) is significantly
    different, computation can be wasted on irrelevant
    assignments

22
Network Simplifications
  • In these approaches, we try to replace the
    original network with a simpler one
  • the resulting network allows fast exact methods

23
Network Simplifications
  • Typical simplifications
  • Remove parts of the network
  • Remove edges
  • Reduce the number of values (value abstraction)
  • Replace a sub-network with a simpler one (model
    abstraction)
  • These simplifications are often w.r.t. the
    particular evidence and query

24
Stochastic Simulation
  • Suppose we can sample instances ⟨x1, …, xn⟩
    according to P(X1, …, Xn)
  • What is the probability that a random sample
    ⟨x1, …, xn⟩ satisfies e?
  • This is exactly P(e)
  • We can view each sample as tossing a biased coin
    with probability P(e) of Heads

25
Stochastic Sampling
  • Intuition: given a sufficient number of samples
    x1, …, xN, we can estimate
  • The law of large numbers implies that as N grows,
    our estimate will converge to p with high probability
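The estimate itself appears on the slide as a formula (an image not reproduced here); presumably it is the empirical fraction of samples satisfying e, i.e.:

```latex
% Assumed form of the estimator referenced on slide 25; \hat{p} estimates P(e).
\hat{p} \;=\; \frac{1}{N} \sum_{m=1}^{N} \mathbf{1}\{\, x^{(m)} \text{ satisfies } e \,\}
```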

26
Sampling a Bayesian Network
  • If P(X1, …, Xn) is represented by a Bayesian
    network, can we efficiently sample from it?
  • Idea: sample according to the structure of the
    network
  • Write the distribution using the chain rule, and
    then sample each variable given its parents
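For reference (a standard identity the slide relies on but does not show explicitly), the factorization that logic sampling follows variable by variable is:

```latex
% Chain-rule factorization of a Bayesian network along a topological order.
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_i)
```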

27
Logic sampling
[Figure: a Bayesian network over B, E, A, C, R with CPTs
P(b) = 0.03, P(e) = 0.001, P(a | b, e) with entries 0.4, 0.01, 0.98, 0.7,
P(r | e) with entries 0.3, 0.001, and P(c | a) with entries 0.8, 0.05]
28
Logic sampling
[Figure: the same network and CPTs; value sampled so far: e]
29
Logic sampling
[Figure: the same network and CPTs; values sampled so far: e, a]
30
Logic sampling
[Figure: the same network and CPTs; values sampled so far: e, a, c]
31
Logic sampling
[Figure: the same network and CPTs; values sampled so far: e, a, c]
32
Logic Sampling
  • Let X1, …, Xn be an order of the variables
    consistent with arc direction
  • for i = 1, …, n do
  •   sample xi from P(Xi | pai)
  •   (Note: since Pai ⊆ {X1, …, Xi-1}, we have
      already assigned values to them)
  • return x1, …, xn
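A minimal Python sketch of this procedure (not from the slides); the two-node network, its CPD values, and the evidence used to illustrate slide 25's estimator are all assumptions of the sketch:

```python
import random

# Logic sampling (slide 32): sample each variable from P(Xi | pa_i), following an
# order consistent with the arcs. Each CPD maps a tuple of parent values to
# P(variable = True). This tiny two-node network is only an illustration.
cpds = {
    'B': ((), {(): 0.03}),                        # P(B = t) = 0.03
    'A': (('B',), {(True,): 0.9, (False,): 0.1}), # assumed P(A = t | B)
}
order = ['B', 'A']                                # topological order

def logic_sample():
    x = {}
    for var in order:
        parents, table = cpds[var]
        p_true = table[tuple(x[p] for p in parents)]   # parents are already sampled
        x[var] = random.random() < p_true
    return x

# Estimating P(e) as on slide 25, with e = {A = True} as the assumed evidence
samples = [logic_sample() for _ in range(10000)]
print(sum(s['A'] for s in samples) / len(samples))     # roughly 0.03*0.9 + 0.97*0.1 = 0.124
```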

33
Logic Sampling
  • Sampling a complete instance is linear in number
    of variables
  • Regardless of structure of the network
  • However, if P(e) is small, we need many samples
    to get a decent estimate

34
Can we sample from P(X1,,Xn e)?
  • If evidence is at the roots of the network: easy
  • If evidence is at the leaves of the network, we
    have a problem
  • Our sampling method proceeds according to the
    order of nodes in the graph
  • Note: we can use arc reversal to make the evidence
    nodes roots.
  • In some networks, however, this will create
    exponentially large tables...

35
Likelihood Weighting
  • Can we ensure that all of our samples satisfy e?
  • One simple solution:
  • When we need to sample a variable that is assigned
    a value by e, use the specified value
  • For example, we know Y = 1
  • Sample X from P(X)
  • Then take Y = 1
  • Is this a sample from P(X, Y | Y = 1)?

36
Likelihood Weighting
  • Problem: these samples of X are from P(X)
  • Solution:
  • Penalize samples in which P(Y = 1 | X) is small
  • We now sample as follows:
  • Let xi be a sample from P(X)
  • Let wi be P(Y = 1 | X = xi)

37
Likelihood Weighting
  • Why does this make sense?
  • When N is large, we expect to sample N·P(X = x)
    samples with xi = x
  • Thus,
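The slide continues with a formula shown as an image; presumably it is the normalized weighted count, which in the notation above reads:

```latex
% Assumed form of the estimate that completes the "Thus," on slide 37.
\hat{P}(X = x \mid Y = 1) \;=\; \frac{\sum_i w_i \,\mathbf{1}\{x_i = x\}}{\sum_i w_i}
```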

38
Likelihood Weighting
[Figure: the B, E, A, C, R network with its CPTs (as on slide 27) and an empty
table of samples over B, E, A, C, R]
39
Likelihood Weighting
[Figure: the same network; value so far: e]
40
Likelihood Weighting
[Figure: the same network; value so far: e; accumulated weight factor: 0.6]
41
Likelihood Weighting
[Figure: the same network; values so far: e, c; accumulated weight factor: 0.6]
42
Likelihood Weighting
[Figure: the same network; values so far: e, c, r; weight factors 0.6 and 0.3;
the weighted sample is entered into the table of samples over B, E, A, C, R]
43
Likelihood Weighting
  • Let X1, …, Xn be an order of the variables
    consistent with arc direction
  • w = 1
  • for i = 1, …, n do
  •   if Xi = xi has been observed
  •     w ← w · P(Xi = xi | pai)
  •   else
  •     sample xi from P(Xi | pai)
  • return x1, …, xn, and w
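A minimal Python sketch of this weighted sampler (not from the slides), reusing the illustrative two-node network and CPD format from the logic-sampling sketch above; the evidence {'A': True} and the CPD values are assumptions:

```python
import random

# Likelihood weighting (slide 43): observed variables are clamped to their evidence
# values and contribute the factor P(Xi = xi | pa_i) to the weight; unobserved
# variables are sampled from their CPDs exactly as in logic sampling.
def weighted_sample(order, cpds, evidence):
    x, w = {}, 1.0
    for var in order:
        parents, table = cpds[var]
        p_true = table[tuple(x[p] for p in parents)]
        if var in evidence:
            x[var] = evidence[var]
            w *= p_true if evidence[var] else 1.0 - p_true   # likelihood of the observation
        else:
            x[var] = random.random() < p_true
    return x, w

cpds = {'B': ((), {(): 0.03}),
        'A': (('B',), {(True,): 0.9, (False,): 0.1})}        # assumed CPD values
order = ['B', 'A']

# Estimate P(B = t | A = t) with the normalized-weight estimator from slide 37
samples = [weighted_sample(order, cpds, {'A': True}) for _ in range(20000)]
posterior = sum(w for x, w in samples if x['B']) / sum(w for x, w in samples)
print(posterior)   # should approach 0.03*0.9 / (0.03*0.9 + 0.97*0.1), about 0.22
```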

44
Likelihood Weighting
  • What can we say about the quality of the answer?
  • Intuitively, the weights of the samples reflect
    their probability given the evidence. We need to
    collect a certain mass.
  • Another factor is the extremeness of the CPDs.
  • Thm:
  • If P(Xi | Pai) ∈ [l, u] for all CPDs, and
  • then with probability 1 - δ, the estimate is an
    ε-relative error approximation
45
END