Title: Inference III: Approximate Inference
1 Inference III: Approximate Inference
2 Global conditioning
Fixing the values of A and B
Fixing the values of some variables at the start of the summation can shrink the tables formed by variable elimination. In this way, space is traded for time.
Special case: choose to fix a set of nodes that breaks all loops. This method is called cutset conditioning. Alternatively, choose to fix some variables from the largest cliques in a clique tree.
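A sketch of the identity behind this trade (standard conditioning; the query notation P(y, e) is mine, not the slides'):

```latex
P(y, e) \;=\; \sum_{a} \sum_{b} P(y, e, A{=}a, B{=}b)
```

Each term on the right is computed by a separate variable-elimination run in which A and B are clamped, so every factor mentioning A or B loses those dimensions; the price is repeating the run |Val(A)|·|Val(B)| times.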
3 Approximation
- Until now, we examined exact computation
- In many applications, approximations are sufficient
- Example: P(X = x | e) = 0.3183098861838
- Maybe P(X = x | e) ≈ 0.3 is a good enough approximation
- e.g., we take action only if P(X = x | e) > 0.5
- Can we find good approximation algorithms?
4 Types of Approximations
- Absolute error
  - An estimate q of P(X = x | e) has absolute error ε if
    P(X = x | e) − ε ≤ q ≤ P(X = x | e) + ε
  - equivalently: q − ε ≤ P(X = x | e) ≤ q + ε
- Absolute error is not always what we want
  - If P(X = x | e) = 0.0001, then an absolute error of 0.001 is unacceptable
  - If P(X = x | e) = 0.3, then an absolute error of 0.001 is overly precise
[Figure: the [0, 1] probability scale with an interval of width 2ε centered at q]
5 Types of Approximations
- Relative error
  - An estimate q of P(X = x | e) has relative error ε if
    P(X = x | e)(1 − ε) ≤ q ≤ P(X = x | e)(1 + ε)
  - equivalently: q/(1 + ε) ≤ P(X = x | e) ≤ q/(1 − ε)
- Sensitivity of the approximation depends on the actual value of the desired result
[Figure: the [0, 1] probability scale with the interval [q/(1 + ε), q/(1 − ε)] around q]
6 Complexity
- Recall: exact inference is NP-hard
- Is approximate inference any easier?
- Construction used for exact inference:
  - Input: a 3-SAT problem φ
  - Output: a BN such that P(X = t) > 0 iff φ is satisfiable
7 Complexity: Relative Error
- Suppose that q is a relative error estimate of P(X = t)
- If φ is not satisfiable, then P(X = t) = 0. Hence
  0 = P(X = t)(1 − ε) ≤ q ≤ P(X = t)(1 + ε) = 0,
  namely q = 0. Thus, if q > 0, then φ is satisfiable
- An immediate consequence:
  Thm: Given ε, finding an ε-relative error approximation is NP-hard
8 Complexity: Absolute Error
- We can find absolute error approximations to P(X = x) with high probability (via sampling)
  - We will see such algorithms shortly
- However, once we have evidence, the problem is harder
- Thm:
  - If ε < 0.5, then finding an estimate of P(X = x | e) with absolute error ε is NP-hard
9 Proof
10 Proof (cont.)
- Suppose we can estimate with absolute error ε
- Let p1 ≈ P(Q1 = t | X = t) be such an estimate
  Assign q1 = t if p1 > 0.5, else q1 = f
- Let p2 ≈ P(Q2 = t | X = t, Q1 = q1)
  Assign q2 = t if p2 > 0.5, else q2 = f
- ...
- Let pn ≈ P(Qn = t | X = t, Q1 = q1, ..., Qn−1 = qn−1)
  Assign qn = t if pn > 0.5, else qn = f
11 Proof (cont.)
- Claim: if φ is satisfiable, then q1, ..., qn is a satisfying assignment
- Suppose φ is satisfiable
- By induction on i: there is a satisfying assignment with Q1 = q1, ..., Qi = qi
- Base case:
  - If Q1 = t in all satisfying assignments, then P(Q1 = t | X = t) = 1, so p1 ≥ 1 − ε > 0.5, and therefore q1 = t
  - If Q1 = f in all satisfying assignments, then, symmetrically, q1 = f
  - Otherwise, the statement holds for any choice of q1
12 Proof (cont.)
- Claim: if φ is satisfiable, then q1, ..., qn is a satisfying assignment
- Suppose φ is satisfiable
- By induction on i: there is a satisfying assignment with Q1 = q1, ..., Qi = qi
- Induction step:
  - If Qi+1 = t in all satisfying assignments with Q1 = q1, ..., Qi = qi, then P(Qi+1 = t | X = t, Q1 = q1, ..., Qi = qi) = 1, so pi+1 ≥ 1 − ε > 0.5, and therefore qi+1 = t
  - If Qi+1 = f in all satisfying assignments with Q1 = q1, ..., Qi = qi, then qi+1 = f
13 Proof (cont.)
- We can efficiently check whether q1, ..., qn is a satisfying assignment (linear time)
- If it is, then φ is satisfiable
- If it is not, then φ is not satisfiable
- Suppose we had an approximation procedure with absolute error ε < 0.5
  - ⇒ we could decide 3-SAT with n procedure calls
  - ⇒ such approximation is NP-hard
14 When can we hope to approximate?
- Two situations:
  - Peaked distributions
    - improbable values are ignored
  - Highly stochastic distributions
    - distant evidence is discarded
15 Peaked distributions
- If the distribution is peaked, then most of the mass is on a few instances
- If we can focus on these instances, we can ignore the rest
[Figure: probability mass concentrated on a few instances (x-axis: instances)]
16 Stochasticity & Approximations
- Consider a chain X1 → X2 → ... → Xn with
  P(Xi+1 = t | Xi = t) = 1 − ε and P(Xi+1 = f | Xi = f) = 1 − ε
- Computing the probability of Xn given X1, we get:
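A hedged reconstruction of the result (the standard closed form for this symmetric two-state chain, not copied from the slides):

```latex
P(X_n = t \mid X_1 = t) \;=\; \tfrac{1}{2} + \tfrac{1}{2}\,(1 - 2\varepsilon)^{\,n-1}
\;\longrightarrow\; \tfrac{1}{2} \quad \text{as } n \to \infty .
```

So unless ε is very close to 0 or 1, the influence of X1 on Xn decays geometrically with n.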
17 Plot of P(Xn = t | X1 = t)
[Plot: P(Xn = t | X1 = t) as a function of n, for several values of ε]
18 Stochastic Processes
- This behavior of a chain (a Markov process) is called mixing. We will return to this as a tool in approximation
- In general networks there is similar behavior
  - If probabilities are far from 0 and 1, then the effect of distant evidence vanishes (and so it can be discarded in approximations)
19 Bounded conditioning
Fixing the values of A and B
By examining only the probable assignments of A and B, we perform several simple computations instead of one complex computation
20 Bounded conditioning
- Choose A and B so that P(Y, e | a, b) can be computed easily, e.g., a cycle cutset.
- Search for highly probable assignments to A, B:
  - Option 1: select a, b with high P(a, b)
  - Option 2: select a, b with high P(a, b | e)
- We need to search for such high-mass values, and that can be hard.
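A sketch of the identity behind the bound (standard conditioning; the set H of examined assignments is my notation):

```latex
P(Y, e) \;=\; \sum_{a, b} P(Y, e \mid a, b)\, P(a, b)
\;\;\approx\;\; \sum_{(a, b) \in \mathcal{H}} P(Y, e \mid a, b)\, P(a, b),
\qquad
\text{error} \;\le\; \sum_{(a, b) \notin \mathcal{H}} P(a, b).
```

Because every omitted term is at most P(a, b), the unexamined prior mass directly bounds the error, which is what the error bars on the next slide rely on.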
21 Bounded Conditioning
- Advantages:
  - Combines exact inference within an approximation
  - Continuous: more time can be used to examine more cases
  - Bounds: the unexamined mass is used to compute error bars
- Possible problems:
  - P(a, b) is prior mass, not the posterior
  - If the posterior P(a, b | e) is significantly different, computation can be wasted on irrelevant assignments
22 Network Simplifications
- In these approaches, we try to replace the original network with a simpler one
  - the resulting network allows fast exact methods
23 Network Simplifications
- Typical simplifications:
  - Remove parts of the network
  - Remove edges
  - Reduce the number of values (value abstraction)
  - Replace a sub-network with a simpler one (model abstraction)
- These simplifications are often made w.r.t. the particular evidence and query
24 Stochastic Simulation
- Suppose we can sample instances ⟨x1, ..., xn⟩ according to P(X1, ..., Xn)
- What is the probability that a random sample ⟨x1, ..., xn⟩ satisfies e?
  - This is exactly P(e)
- We can view each sample as tossing a biased coin with probability P(e) of Heads
25 Stochastic Sampling
- Intuition: given a sufficient number of samples x[1], ..., x[N], we can estimate P(e) by the fraction of samples that satisfy e
- The law of large numbers implies that as N grows, our estimate converges to P(e) with high probability
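A minimal Python sketch of this estimator; `sample_joint` and `satisfies_e` are hypothetical placeholders standing in for a sampler of P(X1, ..., Xn) and the evidence test:

```python
import random

def estimate_prob_of_evidence(sample_joint, satisfies_e, num_samples=10_000):
    """Monte Carlo estimate of P(e): the fraction of joint samples consistent with e."""
    hits = sum(satisfies_e(sample_joint()) for _ in range(num_samples))
    return hits / num_samples

# Toy usage: two independent fair coins, evidence e = "both heads"; true P(e) = 0.25.
sample_joint = lambda: (random.random() < 0.5, random.random() < 0.5)
satisfies_e = lambda x: x[0] and x[1]
print(estimate_prob_of_evidence(sample_joint, satisfies_e))
```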
26 Sampling a Bayesian Network
- If P(X1, ..., Xn) is represented by a Bayesian network, can we efficiently sample from it?
- Idea: sample according to the structure of the network
  - Write the distribution using the chain rule, and then sample each variable given its parents
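The factorization the slide refers to is the standard BN chain rule:

```latex
P(X_1, \ldots, X_n) \;=\; \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_i),
\qquad \text{so we sample } x_i \sim P(X_i \mid \mathrm{pa}_i) \text{ in topological order.}
```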
27–31 Logic sampling
[Figures: five slides stepping through logic sampling on a Burglary/Earthquake alarm network (variables B, E, A, C, R) with CPTs P(b) = 0.03, P(e) = 0.001, P(a | B, E), P(r | E), and P(c | A); successive slides sample one more variable in topological order, building up a single complete instance.]
32 Logic Sampling
- Let X1, ..., Xn be an order of the variables consistent with arc direction
- for i = 1, ..., n do
  - sample xi from P(Xi | pai)
  - (Note: since Pai ⊆ {X1, ..., Xi−1}, we have already assigned values to them)
- return x1, ..., xn
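A minimal Python sketch of this loop on an earthquake-style network matching the slides' figure. The CPT numbers are taken from that figure, but the mapping of the P(a | B, E) entries to parent configurations is my assumption:

```python
import random

# CPTs: each variable maps a tuple of parent values to P(variable = True).
# Structure assumed from the figure: B -> A <- E, E -> R, A -> C.
CPTS = {
    "B": {(): 0.03},
    "E": {(): 0.001},
    "A": {(True, True): 0.98, (True, False): 0.7,
          (False, True): 0.4, (False, False): 0.01},
    "R": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.8, (False,): 0.05},
}
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "R": ("E",), "C": ("A",)}
ORDER = ["B", "E", "A", "R", "C"]  # topological: parents precede children

def logic_sample():
    """Sample one complete instance, each variable given its already-sampled parents."""
    sample = {}
    for var in ORDER:
        parent_vals = tuple(sample[p] for p in PARENTS[var])
        sample[var] = random.random() < CPTS[var][parent_vals]
    return sample

# Forward-sampling estimate of P(C = t).
N = 10_000
print(sum(logic_sample()["C"] for _ in range(N)) / N)
```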
33 Logic Sampling
- Sampling a complete instance is linear in the number of variables
  - Regardless of the structure of the network
- However, if P(e) is small, we need many samples to get a decent estimate
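To make "many samples" concrete (a standard Chernoff-bound consequence, not stated on the slide): achieving relative error ε with probability 1 − δ by simply counting samples that satisfy e requires on the order of

```latex
N \;=\; O\!\left( \frac{1}{P(e)\,\varepsilon^{2}} \,\log\frac{1}{\delta} \right)
```

samples, so the cost blows up as P(e) → 0.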
34 Can we sample from P(X1, ..., Xn | e)?
- If the evidence is at the roots of the network: easily
- If the evidence is at the leaves of the network, we have a problem
  - Our sampling method proceeds according to the order of nodes in the graph
- Note: we can use arc reversal to make the evidence nodes roots
  - In some networks, however, this will create exponentially large tables...
35 Likelihood Weighting
- Can we ensure that all of our samples satisfy e?
- One simple solution:
  - When we need to sample a variable that is assigned a value by e, use the specified value
- For example, we know Y = 1:
  - Sample X from P(X)
  - Then take Y = 1
- Is this a sample from P(X, Y | Y = 1)?
36 Likelihood Weighting
- Problem: these samples of X are from P(X)
- Solution:
  - Penalize samples in which P(Y = 1 | X) is small
- We now sample as follows:
  - Let x[i] be a sample from P(X)
  - Let w[i] = P(Y = 1 | X = x[i])
37 Likelihood Weighting
- Why does this make sense?
- When N is large, we expect to sample about N·P(X = x) samples with x[i] = x
- Thus,
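A hedged reconstruction of the step meant here (the standard likelihood-weighting argument, in the slide's notation):

```latex
\sum_{i:\, x^{[i]} = x} w^{[i]}
\;\approx\; N \, P(X = x)\, P(Y = 1 \mid X = x)
\;=\; N \, P(X = x, Y = 1),
\qquad\text{so}\qquad
\frac{\sum_{i:\, x^{[i]} = x} w^{[i]}}{\sum_{i} w^{[i]}}
\;\approx\; P(X = x \mid Y = 1).
```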
38–42 Likelihood Weighting
[Figures: five slides stepping through likelihood weighting on the same alarm network (variables B, E, A, C, R). The evidence nodes are clamped to their observed values instead of being sampled, the running weight is multiplied by the corresponding CPT entries (the figure shows weights 0.6 and 0.3 accumulating), and each completed weighted sample is added to a B, E, A, C, R table of samples.]
43 Likelihood Weighting
- Let X1, ..., Xn be an order of the variables consistent with arc direction
- w = 1
- for i = 1, ..., n do
  - if Xi = xi has been observed
    - w ← w · P(Xi = xi | pai)
  - else
    - sample xi from P(Xi | pai)
- return x1, ..., xn, and w
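A minimal Python sketch of this procedure, using the same illustrative network encoding as the logic-sampling sketch above (CPT numbers from the slides' figure; the parent-configuration mapping is my assumption):

```python
import random

CPTS = {
    "B": {(): 0.03},
    "E": {(): 0.001},
    "A": {(True, True): 0.98, (True, False): 0.7,
          (False, True): 0.4, (False, False): 0.01},
    "R": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.8, (False,): 0.05},
}
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "R": ("E",), "C": ("A",)}
ORDER = ["B", "E", "A", "R", "C"]

def weighted_sample(evidence):
    """One likelihood-weighted sample: evidence variables are clamped, and the
    weight collects P(X_i = observed value | pa_i) for every clamped variable."""
    sample, weight = {}, 1.0
    for var in ORDER:
        p_true = CPTS[var][tuple(sample[p] for p in PARENTS[var])]
        if var in evidence:
            sample[var] = evidence[var]
            weight *= p_true if evidence[var] else 1.0 - p_true
        else:
            sample[var] = random.random() < p_true
    return sample, weight

# Estimate P(B = t | C = t) as a weighted average over samples.
num = den = 0.0
for _ in range(20_000):
    s, w = weighted_sample({"C": True})
    num += w * s["B"]
    den += w
print(num / den)
```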
44 Likelihood Weighting
- What can we say about the quality of the answer?
- Intuitively, the weights of the samples reflect their probability given the evidence. We need to collect a certain mass.
- Another factor is the extremeness of the CPDs.
- Thm:
  - If P(Xi | Pai) ∈ [l, u] for all CPDs, and ...
  - then, with probability 1 − δ, the estimate is an ε relative error approximation
45 END