Title: Introduction to Monte Carlo Methods
1. Introduction to Monte Carlo Methods
D.J.C. MacKay
Rasit Onur Topaloglu, Ph.D. candidate, rot_at_ucsd.edu
2. Outline
- Problems Monte Carlo (MC) deals with
- Uniform Sampling
- Importance Sampling
- Rejection Sampling
- Metropolis
- Gibbs Sampling
- Speeding up MC: Hybrid MC and Over-relaxation
3. Definition of Problems
- Problem 1: generate samples from a given probability distribution P(x)
- Problem 2: estimate the expectation of a function φ(x) under that probability distribution
4. Monte Carlo for High Dimensions
- Solving the first problem => solving the second one
- Just evaluate the function at the samples and average the values (sketch below)
- The accuracy is independent of the dimensionality of the space sampled
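A minimal sketch of this estimator, assuming Python/NumPy and a toy 10-dimensional Gaussian target with φ(x) = Σ x_i² (illustrative choices, not the slides'):

    import numpy as np

    rng = np.random.default_rng(0)

    def phi(x):
        # Function whose expectation we want; here the squared norm.
        return np.sum(x**2, axis=-1)

    R = 10_000
    samples = rng.standard_normal((R, 10))   # samples from P(x) = N(0, I) in 10 dimensions
    estimate = phi(samples).mean()           # Monte Carlo estimate of E[phi(x)]
    print(estimate)                          # close to the true value, 10

The standard error of the estimate falls as 1/√R regardless of the dimension, which is the point of this slide.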
5. Sampling from P(x)
- Assume we can evaluate P(x) to within a multiplicative constant, i.e. we can compute P*(x) = Z·P(x)
- Even if P*(x) can be evaluated, sampling is still a hard problem, as the normalizing constant Z is not known
- Not easy to draw samples in high dimensions (except for the Gaussian)
- Ex: a sample from a univariate Gaussian can be calculated as x = √(-2 ln u1) · cos(2π u2), where u1 and u2 are uniform in [0, 1] (sketch below)
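A short sketch of this construction (the Box-Muller transform, which is what the two uniform variables above suggest), assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(1)

    def gaussian_sample():
        # Two independent uniforms on (0, 1)
        u1, u2 = rng.uniform(size=2)
        # Box-Muller: maps the pair to a standard normal sample
        return np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

    print(gaussian_sample())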
6. Difficulty of Sampling at High Dimensions
- One can discretize the function (see figure) and sample from the resulting discrete distribution
- This is costly in high dimensions: B^N points for B bins and N dimensions
7. Uniform Sampling
- Tries to solve the 2nd problem
- Draw samples uniformly from the state space
- Z_R = Σ_r P*(x^(r)) is the normalizing constant estimated from the samples
- For distributions that have peaks in a small region, lots of points must be sampled so that the peak is hit a reasonable number of times => requires a huge number of samples (sketch below)
- Thus, uniform sampling is seldom useful
8. Importance Sampling
- Tries to solve the 2nd problem
- Introduce a simpler density Q from which samples can be drawn
- Values of x where Q(x) > P(x) are over-represented; values of x where Q(x) < P(x) are under-represented => introduce weights w_r = P*(x^(r)) / Q*(x^(r)) (sketch below)
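A minimal sketch of self-normalized importance sampling, assuming NumPy; the target, the sampler Q, and φ below are toy choices:

    import numpy as np

    rng = np.random.default_rng(3)

    def p_star(x):
        # Unnormalized target: Gaussian with mean 2 and std 0.5
        return np.exp(-0.5 * ((x - 2.0) / 0.5) ** 2)

    def q(x):
        # Sampler density Q(x): a standard normal we can both sample and evaluate
        return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

    def phi(x):
        return x

    R = 100_000
    x = rng.standard_normal(R)                 # samples from Q
    w = p_star(x) / q(x)                       # importance weights
    estimate = np.sum(w * phi(x)) / np.sum(w)  # weighted average corrects for Q != P
    print(estimate)                            # roughly 2.0

If Q has lighter tails than P, a few rare samples can carry enormous weights, which is the reliability issue raised on the next slide.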
9. Reliability of Importance Sampling
(Figure: the estimate Φ̂ vs. the number of samples, for a Gaussian sampler and a Cauchy sampler)
- An importance sampler should have heavy tails in
problems where infrequent samples might be
influential
10. Rejection Sampling
- Again, a proposal density Q is assumed
- Also assume we know a constant c s.t. c·Q*(x) ≥ P*(x) for all x
- Draw x from Q, then draw a r.v. u uniformly from the interval [0, c·Q*(x)]
- If u < P*(x), accept and add x to the list of random numbers (sketch below)
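A minimal rejection-sampling sketch, assuming NumPy; the target, the uniform proposal, and the bound c·Q*(x) = 3 are illustrative choices:

    import numpy as np

    rng = np.random.default_rng(4)

    def p_star(x):
        # Unnormalized target supported (effectively) on [-5, 5]
        return np.exp(-x**2) * (2.0 + np.sin(5.0 * x))

    C_Q = 3.0    # value of c*Q*(x) for a uniform proposal on [-5, 5]; it dominates P*(x) everywhere

    samples = []
    while len(samples) < 1_000:
        x = rng.uniform(-5.0, 5.0)     # propose x from Q
        u = rng.uniform(0.0, C_Q)      # u uniform on [0, c*Q*(x)]
        if u < p_star(x):              # accept if u falls under the curve P*(x)
            samples.append(x)

The acceptance rate is the ratio of the area under P*(x) to the area under c·Q*(x), which collapses quickly in high dimensions.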
11. Transition to Markov Chain MC
- Importance and rejection sampling work well if Q is similar to P
- In complex problems, it is difficult to find a single such Q over the whole state space
12. Metropolis Method
- The proposal density Q depends on the current state x^(t): Q(x'; x^(t))
- Acceptance ratio: a = [P*(x') Q(x^(t); x')] / [P*(x^(t)) Q(x'; x^(t))]
- Accept the new state if a ≥ 1, else accept with probability a (sketch below)
- In comparison to rejection sampling, the rejected points are not discarded and hence influence subsequent samples => samples are correlated
- => Metropolis may have to be run longer to generate independent samples from P(x)!
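A minimal Metropolis sketch with a symmetric Gaussian proposal (so the Q terms in the acceptance ratio cancel), assuming NumPy; the target and step size are toy choices:

    import numpy as np

    rng = np.random.default_rng(5)

    def p_star(x):
        # Unnormalized, slightly multimodal one-dimensional target
        return np.exp(-0.5 * x**2) * (1.0 + 0.5 * np.cos(3.0 * x))

    eps = 0.5            # step size of the symmetric proposal Q(x'; x) = N(x, eps^2)
    x = 0.0
    chain = []
    for t in range(10_000):
        x_new = x + eps * rng.standard_normal()
        a = p_star(x_new) / p_star(x)        # acceptance ratio; Q terms cancel by symmetry
        if a >= 1.0 or rng.uniform() < a:
            x = x_new                        # accept the proposal
        chain.append(x)                      # a rejection repeats the current state

Because rejections repeat the current state, successive entries of chain are correlated, so many more than R iterations are needed for R effectively independent samples.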
13. Disadvantages of a Large Step Size
- Metropolis is useful in high dimensions
- It typically uses a length scale ε much smaller than the size of the state space L
- Because large steps are highly unlikely to be accepted
- => Limited or no movement in the space! => biased exploration
14. A Lower Bound for Independent Samples
- Metropolis explores the space by a random walk
- Random walks take a long time to traverse the space
- After T steps of size ε, the state has only moved a distance of about ε√T
- As Monte Carlo is trying to achieve independent samples, roughly (L/ε)² steps are required to get a sample independent of the initial condition
- This rule of thumb can be used as a lower bound
15. An Example Using this Bound
(Figure: the state of the chain at the 100th, 400th, and 1200th iterations)
- Takes roughly 10² = 100 steps to reach an end state
- Metropolis still provides biased estimates even after a large number of iterations
16. Gibbs Sampling
- As opposed to the previous methods, at least 2-dimensional distributions are required
- Q is defined in terms of the conditional distributions of the joint distribution P(x)
- The assumption is that P(x) is too complex to evaluate directly, but the conditionals P(x_i | {x_j}, j ≠ i) are tractable
17. Gibbs Sampling on an Example
- Start with x = (x1, x2); fix x2^(t) and sample x1 from P(x1 | x2)
- Then fix x1 and sample x2 from P(x2 | x1) (sketch below)
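A small Gibbs sketch for a correlated bivariate Gaussian, where both conditionals are themselves Gaussian (the correlation value is an illustrative choice), assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(6)

    rho = 0.95                       # correlation of a standard bivariate Gaussian
    sigma = np.sqrt(1.0 - rho**2)    # standard deviation of each conditional
    x1, x2 = 0.0, 0.0
    chain = []
    for t in range(5_000):
        x1 = rng.normal(rho * x2, sigma)   # sample x1 from P(x1 | x2)
        x2 = rng.normal(rho * x1, sigma)   # sample x2 from P(x2 | x1)
        chain.append((x1, x2))             # every update is accepted

When rho is close to 1 the conditionals are narrow, so consecutive samples move very little: the random-walk behaviour that over-relaxation (slide 23) tries to reduce.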
18. Gibbs Sampling with K Variables
- In comparison to Metropolis, every proposal is always accepted
- In bigger models, groups of variables are sampled jointly
19. Comparison of MC Methods in High Dimensions
- Importance and rejection sampling result in large weights and constants respectively, leading to inaccurate or lengthy simulations => not practical
- Metropolis requires at least (σ_max/σ_min)² iterations to acquire independent samples => might be lengthy
- Gibbs sampling has similar properties to Metropolis, but has no adjustable parameters => most practical
20. Practical Questions about Monte Carlo
- Can we predict how long it takes to reach equilibrium? Use the simple bound proposed
- Can we determine convergence in a running simulation? Yet another difficult problem
- Can we speed up the convergence time and the time between independent samples?
21. Reducing Random Walk in Metropolis: Hybrid MC
- Most probabilities can be written in the form P(x) = e^{-E(x)} / Z
- Introduce a momentum variable p
- Generate samples asymptotically from the joint distribution of (x, p), then discard p (sketch below)
- This makes the convergence linear, rather than quadratic, in the distance to be travelled
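A minimal leapfrog sketch of hybrid (Hamiltonian) Monte Carlo for a toy Gaussian energy E(x) = ½‖x‖², assuming NumPy; the step size and trajectory length are illustrative, not the slides':

    import numpy as np

    rng = np.random.default_rng(7)

    def energy(x):            # E(x) for P(x) = exp(-E(x)) / Z; here a standard Gaussian
        return 0.5 * np.dot(x, x)

    def grad_energy(x):
        return x

    def hmc_step(x, step=0.1, n_leapfrog=20):
        p = rng.standard_normal(x.shape)              # fresh momentum each iteration
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration of the Hamiltonian dynamics
        p_new -= 0.5 * step * grad_energy(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step * p_new
            p_new -= step * grad_energy(x_new)
        x_new += step * p_new
        p_new -= 0.5 * step * grad_energy(x_new)
        # Metropolis accept/reject on the joint Hamiltonian H(x, p) = E(x) + |p|^2 / 2
        h_old = energy(x) + 0.5 * np.dot(p, p)
        h_new = energy(x_new) + 0.5 * np.dot(p_new, p_new)
        return x_new if rng.uniform() < np.exp(h_old - h_new) else x

    x = np.zeros(10)
    chain = []
    for t in range(1_000):
        x = hmc_step(x)
        chain.append(x)

Each iteration moves a distance of roughly step × n_leapfrog in a consistent direction, instead of diffusing one small step at a time.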
22. An Illustrative Example for Hybrid MC
(Figure: hybrid Monte Carlo trajectories vs. random-walk Metropolis trajectories)
- Over a number of iterations, the hybrid trajectories yield less correlated samples
23. Reducing Random Walk in Gibbs: Over-relaxation
- Use the former value x^(t) as well when computing x^(t+1)
- Suitable for speeding up the process when the variables are highly correlated
- Useful for Gaussians; not straightforward for other types of conditional distributions (sketch below)
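A small sketch of over-relaxation for Gaussian conditionals (one standard form, due to Adler: x' = mu + alpha*(x - mu) + sigma*sqrt(1 - alpha^2)*nu with -1 < alpha < 0), applied to the same bivariate Gaussian as the Gibbs sketch above; assumes NumPy, with rho and alpha as illustrative values:

    import numpy as np

    rng = np.random.default_rng(8)

    rho, alpha = 0.95, -0.9           # correlation of the Gaussian; over-relaxation parameter
    sigma = np.sqrt(1.0 - rho**2)     # conditional standard deviation
    x1, x2 = 0.0, 0.0
    chain = []
    for t in range(1_000):
        # Ordinary Gibbs would redraw x1 ~ N(rho*x2, sigma^2); over-relaxation instead
        # reflects the old value to the other side of the conditional mean.
        mu1 = rho * x2
        x1 = mu1 + alpha * (x1 - mu1) + sigma * np.sqrt(1.0 - alpha**2) * rng.standard_normal()
        mu2 = rho * x1
        x2 = mu2 + alpha * (x2 - mu2) + sigma * np.sqrt(1.0 - alpha**2) * rng.standard_normal()
        chain.append((x1, x2))

With alpha close to -1, successive states swing across the conditional mean instead of staying near it, so the chain traverses an elongated Gaussian much faster.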
24. An Illustrative Example for Over-Relaxation
- State space of a bivariate Gaussian over 40 iterations
- The over-relaxation samples cover the state space better
25. Reducing Random Walk: Simulated Annealing
- Introduce a parameter T (temperature) and gradually reduce it to 1
- A high T corresponds to being able to make transitions more easily
- As opposed to its use in optimization, T is not reduced to 0 but to 1 (sketch below)
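A minimal tempered-Metropolis sketch, assuming NumPy and a toy bimodal target; raising the acceptance ratio to the power 1/T is one common reading of the idea, given here as an illustration:

    import numpy as np

    rng = np.random.default_rng(9)

    def p_star(x):
        # Bimodal unnormalized target with peaks near x = -1 and x = +1
        return np.exp(-((x**2 - 1.0) ** 2) / 0.1)

    x, eps = 0.0, 0.3
    chain = []
    for T in np.linspace(10.0, 1.0, 10_000):          # temperature schedule ending at T = 1
        x_new = x + eps * rng.standard_normal()
        a = (p_star(x_new) / p_star(x)) ** (1.0 / T)  # tempered acceptance ratio
        if rng.uniform() < a:
            x = x_new
        chain.append(x)

    # Only the samples collected near the end of the schedule (T close to 1) are
    # approximately samples from P(x); the high-T phase just helps cross barriers.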
26. Applications of Monte Carlo: Differential Equations
Ex: the steady-state temperature distribution of an annulus
- The finite-difference approximation uses grid spacing h
- With ¼ probability, one of the four neighbouring states is selected
- The value found when a boundary is reached is kept; the mean of many such runs gives the solution (sketch below)
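A small sketch of this random-walk solver for Laplace's equation, assuming NumPy; the square grid and boundary values below are toy stand-ins for the annulus example:

    import numpy as np

    rng = np.random.default_rng(10)

    # Toy boundary condition: 100 on the top edge, 0 on the other three edges
    N = 20
    boundary = np.zeros((N, N))
    boundary[0, :] = 100.0

    def walk(i, j):
        # From interior point (i, j), step to one of the four neighbours with probability 1/4
        while 0 < i < N - 1 and 0 < j < N - 1:
            di, dj = ((1, 0), (-1, 0), (0, 1), (0, -1))[rng.integers(4)]
            i, j = i + di, j + dj
        return boundary[i, j]             # keep the boundary value that was reached

    runs = 2_000
    estimate = np.mean([walk(10, 10) for _ in range(runs)])  # mean of many runs approximates the solution
    print(estimate)

This works because the solution of Laplace's equation at an interior point equals the expected boundary value hit by such a symmetric random walk.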
27. Applications of Monte Carlo: Integration
To evaluate an integral:
- Bound the function with a box
- The ratio of points falling under the function to all points within the box gives the ratio of the area under the function to the area of the box (sketch below)
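A minimal hit-or-miss integration sketch, assuming NumPy; the integrand and bounding box are illustrative:

    import numpy as np

    rng = np.random.default_rng(11)

    def f(x):
        return np.sin(x) ** 2            # function to integrate over [0, pi]

    a, b, top = 0.0, np.pi, 1.0          # bounding box [a, b] x [0, top]
    R = 100_000
    x = rng.uniform(a, b, size=R)
    y = rng.uniform(0.0, top, size=R)
    ratio = np.mean(y < f(x))            # fraction of points that fall under the curve
    integral = ratio * (b - a) * top     # area ratio times the box area
    print(integral)                      # close to the true value, pi/2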
28. Applications of Monte Carlo: Image Processing (Automatic Eyeglass Removal)
- MCMC is used instead of gradient-based methods to optimize the MAP criterion that locates the key points of the eyeglasses
C. Wu, C. Liu, H.-Y. Shum, Y.-Q. Xu and Z. Zhang, "Automatic Eyeglasses Removal from Face Images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 26, no. 3, pp. 322-336, Mar. 2004
29. Applications of Monte Carlo: Image Segmentation
- Data-driven techniques such as edge detection, tracing, and clustering are combined with MCMC to speed up the search
Z. Tu and S.-C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 657-673, May 2002
30. Forward Discrete Probability Propagation
Rasit Onur Topaloglu, Ph.D. candidate, rot_at_ucsd.edu
31. The Problem
(Figure: hierarchy tree from lowest-level physical parameters to high-level circuit parameters; ex: gm = √(2·k·Id))
- The tree relates physical parameters to circuit parameters
- It is structured according to the SPICE formula hierarchy
- Given pdfs for the lowest-level parameters, find the pdfs at the highest level
32. Motivation for Probability Propagation
- Estimation of the distributions of high-level parameters is needed to examine the effects of process variations
- The Gaussian assumption attributed to these parameters is no longer accurate in current technologies
=> Find a novel propagation method
GOALS
- Determinism: a stochastic output obtained using known formulas
- Algebraic tractability: enabling manual applicability
- Speed/Accuracy: comparable to or better than Monte Carlo and other parametric methods
33. Parametric Belief Propagation
Calculations are handled at each node
- Each node receives and sends messages to its parents and children until equilibrium
- Parent to child (π): causal information
- Child to parent (λ): diagnostic information
34. Parametric Belief Propagation
- When the arrows in the hierarchy tree indicate linear addition operations on Gaussians, analytic formulations are possible
- Not straightforward for other or non-standard distributions
35. Shortcomings of Monte Carlo
- Non-determinism: not manually applicable
- Limited to certain distributions: random number generators in most CAD packages only provide certain distributions
- Accuracy: may miss points that are less likely to occur due to random sampling unless a very large number of samples is used; limited by the performance of the random number generator
36. Monte Carlo vs. FDPP Comparison: one-to-many relationships and custom pdfs
- Non-standard pdfs are not possible without a custom random number generator
- Monte Carlo overestimates in one-to-many relationships, as the same sample is reused
(Figure: example hierarchy with nodes P1, P2, P3, P4)
37. Operations Necessary to Implement FDPP
Analytic operations on continuous distributions are difficult; instead, operations on discrete distributions are implemented:
- F (Forward): given a function, estimates the distribution of the next node in the formula hierarchy
- Q (Quantize): discretizes a pdf so as to operate on its samples
- B (Band-pass): decrements the number of samples for computational efficiency
- R (Re-bin): reduces the number of samples for computational efficiency
38. Necessary Operators (Q, F, B, R) on a Hierarchical Tree
- Repeated until we acquire the high-level distribution (ex: G)
39. Probability Discretization Theory: the Q_N Operator
(Figure: pdf(X) in the continuous p-domain and its discretized counterpart in the r-domain)
- Certain operators are easy to apply in the r-domain
40. Characterizing an spdf
(Figure: the sampled pdf, spdf(X), in the r-domain)
41. F Operator
- The F operator implements a function over spdfs
- The function is applied to the individual impulses
- The individual probabilities are multiplied
42. Band-pass (B_e) Operator
43. Re-bin (R_N) Operator
(Figure: the resulting spdf(X))
44. Error Analysis
- If the quantizer is uniform and Δ is small, the quantization error random variable Q is uniformly distributed (over an interval of width Δ, so its variance is Δ²/12)
45. Algorithm Implementing the F Operator

While not every random variable has its spdf computed:
    For each r.v. which has all ancestor spdfs computed:
        For each sample in X1:
            ...
            For each sample in Xr:
                Place an impulse with height p1·...·pr at x = f(v1, ..., vr)
        Apply the B and R algorithms to this r.v.
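A runnable reading of this loop body for a single node, assuming each spdf is represented as a list of (value, probability) impulses; the function and the parent spdfs in the example are hypothetical:

    import itertools
    import numpy as np

    def f_operator(func, *parent_spdfs):
        # Each parent spdf is a list of (value, probability) impulses.
        # For every combination of parent samples, place an impulse at
        # func(v1, ..., vr) with height p1 * ... * pr.
        child = []
        for combo in itertools.product(*parent_spdfs):
            values = [v for v, _ in combo]
            prob = float(np.prod([p for _, p in combo]))
            child.append((func(*values), prob))
        return child

    # Hypothetical example: gm = sqrt(2 * k * Id) with two discretized parents
    k_spdf = [(1.0e-4, 0.3), (1.2e-4, 0.7)]
    id_spdf = [(1.0e-3, 0.5), (2.0e-3, 0.5)]
    gm_spdf = f_operator(lambda k, i: np.sqrt(2.0 * k * i), k_spdf, id_spdf)

The number of impulses grows multiplicatively with each parent, which is why the B and R operators are applied afterwards to prune and re-bin the result.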
46. Algorithm for the B and R Operators

B (band-pass) operator:
    Find the maximum and minimum values wi within the impulses
    Divide this range into M bins
    For each bin:
        Place a quantizing impulse at the center of the bin, with a height pi equal to the sum of all impulses within the bin
    Find the maximum probability, pi-max, of the quantized impulses within the bins
    Eliminate the impulses within bins whose quantized impulse has a probability smaller than error-rate · pi-max

R (re-bin) operator:
    Find the new maximum and minimum values wi within the impulses
    Divide this range into N bins
    For each bin:
        Place an impulse at the center of the bin with height equal to the sum of all impulses within the bin
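A compact NumPy sketch of these two passes; the bin counts M, N and the error-rate threshold are parameters, and this is an illustrative reading of the pseudocode rather than the paper's implementation:

    import numpy as np

    def quantize(impulses, n_bins):
        # Shared helper: one impulse per bin centre, height = sum of impulses in the bin.
        values = np.array([v for v, _ in impulses])
        probs = np.array([p for _, p in impulses])
        edges = np.linspace(values.min(), values.max(), n_bins + 1)
        idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
        centres = 0.5 * (edges[:-1] + edges[1:])
        heights = np.bincount(idx, weights=probs, minlength=n_bins)
        binned = [(c, h) for c, h in zip(centres, heights) if h > 0]
        return binned, idx, heights

    def bandpass(impulses, m_bins, error_rate=0.01):
        # B operator: drop impulses whose bin's total probability is below
        # error_rate times the largest bin probability.
        _, idx, heights = quantize(impulses, m_bins)
        p_max = heights.max()
        return [imp for imp, i in zip(impulses, idx) if heights[i] >= error_rate * p_max]

    def rebin(impulses, n_bins):
        # R operator: re-quantize the surviving impulses into n_bins bins.
        binned, _, _ = quantize(impulses, n_bins)
        return binned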
47. Monte Carlo vs. FDPP Comparison
(Figure: pdfs of Vth and ID; solid: FDPP, dotted: Monte Carlo)
- A close match is observed after interpolation
48. Monte Carlo vs. FDPP Comparison with a Low Sample Number
(Figure: pdf of φF; solid: FDPP with 100 samples, noisy: Monte Carlo with 1,000 and 100,000 samples respectively)
- Monte Carlo is inaccurate for a moderate number of samples
- This indicates FDPP can be manually applied without major accuracy degradation
49. Monte Carlo vs. FDPP Comparison
(Figure: pdf of n7 for a benchmark example; solid: FDPP, dotted: Monte Carlo, triangles: belief propagation)
- Edges define a linear sum, ex: n5 = n2 + n3
50. Faulty Application of Monte Carlo
(Figure: pdf of n7 for the benchmark example; solid: FDPP, dotted: Monte Carlo, triangles: belief propagation)
- Distributions at the internal nodes n4, n5, n6 should be re-sampled using Monte Carlo
- Monte Carlo is not optimal for internal nodes with non-standard distributions
51. Conclusions
- Forward Discrete Probability Propagation is introduced as an alternative to Monte Carlo based methods
- FDPP should be preferred when low-probability samples are important, algebraic intuition is needed, non-standard pdfs are present, or one-to-many relationships are present