Introduction to Monte Carlo Methods - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Monte Carlo Methods

Description:

Introduction to Monte Carlo Methods ... easier As opposed to its use in optimization, ... Default Design MathType 5.0 Equation Bitmap Image Microsoft ... – PowerPoint PPT presentation

Number of Views:220
Avg rating:3.0/5.0
Slides: 52
Provided by: UCSD3
Learn more at: https://cseweb.ucsd.edu
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Monte Carlo Methods


1
Introduction to Monte Carlo Methods
D.J.C. Mackay
Rasit Onur Topaloglu Ph.D. candidate rot_at_ucsd.edu
2
Outline
  • Problems Monte Carlo (MC) deals with
  • Uniform Sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis
  • Gibbs Sampling
  • Speeding up MC Hybrid MC and Over-relaxation

3
Definition of Problems
  • Problem1 Generate samples from a distribution

Problem2 Estimate expectation of a function
given probability distribution for a variable
4
Monte Carlo for High Dimensions
  • Solving first problem gt solving second one
  • Just evaluate function using the samples and
    average them out
  • The accuracy is independent of the dimensionality
    of the space sampled

5
Sampling from P(x)
  • Assume we can evaluate P(x) within a
    multiplicative constant
  • If P(x) can be evaluated, still a hard problem
    as
  • Z is not known
  • Not easy to draw at high dimensions (except
    Gaussian)
  • ex A sample from univariate Gaussian can be
    calculated as

u1 and u2 are uniform in 0,1
6
Difficulty of Sampling at High Dimensions
  • Can discretize the function (figure on right) and
    sample the discrete samples
  • This is costly at high dimensions BN for B bins
    and N dimensions

7
Uniform Sampling
  • Tries to solve the 2nd problem
  • Draw samples uniformly from state space
  • ZR is the normalizing constant
  • Estimate by
  • For distributions that have peaks at a small
    region, lots of points must be sampled so that
    ?(x) is calculated a number of times gt requires
    lots of samples
  • Thus, uniform distribution seldom useful

8
Importance Sampling
  • Tries to solve the 2nd problem
  • Introduce a simpler density Q
  • Values of x where Q(x)gtP(x) are over-represented
    values of x where Q(x)ltP(x) are
    under-representedgt
  • introduce weights

9
Reliability of Importance Sampling
The estimate ? vs number of samples Gaussian
sampler Cauchy Sampler
  • An importance sampler should have heavy tails in
    problems where infrequent samples might be
    influential

10
Rejection Sampling
  • Again, a Q (proposal density) assumed
  • Also assume we know a constant c s.t.
  • Generate x using Q(x)
  • Find r.v. u uniformly in interval 0,cQ(x)
  • If ultP(x), accept and add x to the random
    number list

11
Transition to Markov Chain MC
  • Importance and rejection sampling work well if Q
    is similar to P
  • In complex problems, it is difficult to find a
    single such Q over the state space

12
Metropolis Method
  • Proposal density Q depends on current state
    x(t)Q(xx(t))

Accept new state if a?1, Else accept with
probability a
  • In comparison to rejection sampling, the rejected
    points are not discarded and hence influential
    on consequent samples gt samples correlated

gt Metropolis may have to be run more to generate
independent samples from P(x)!
13
Disadvantages of Large Step Size
  • Useful for high dimensions
  • Uses a length scale ? much smaller than state
    space L
  • Because
  • Large steps are highly unlikely to be accepted
  • gt Limited or no movement in space! gt biased

14
A Lower Bound for Independent Samples
  • Metropolis will explore using a random walk
  • Random walks take a long time
  • After T steps of size ?, state only moves T?
  • As Monte Carlo is trying to achieve independent
    samples, requires (L/ ?)2 samples to get a
    sample independent of the initial condition
  • This rule of thumb be used for a lower bound

15
An Example using this Bound
100th iteration
time
400th iteration
1200th iteration
  • Takes 102100 steps to reach an end state
  • Metropolis still provides biases estimates even
    after a large number of iterations

16
Gibbs Sampling
  • As opposed to previous methods, at least 2
    dimensional distributions required
  • Q defined in terms of conditional distributions
    of joint distribution P(x)
  • The assumption is P(x) complex to evaluate but
    P(xixjj?i) is tractable

17
Gibbs Sampling on an Example
  • Start with x(x1,x2), fix x2(t) and sample x1
    from P(x1x2)
  • Fix x1 and sample x2 from P(x2x1)

18
Gibbs Sampling with K Variables
  • In comparison to Metropolis, every proposal is
    always accepted
  • In bigger models, groups of variables jointly
    sampled

19
Comparison of MC Methods in High Dimensions
  • Importance and rejection sampling result in high
    weights and constants respectively, resulting in
    inaccurate or length simulations gt not practical
  • Metropolis requires at least (?max/ ?min)2
    samples to acquire independent samples gt might
    be lengthy
  • Gibbs sampling has similar properties with
    Metropolis. No adjustable parameters gt most
    practical

20
Practical Questions About Monte Carlo
  • Can we predict how long it takes for equilibrium
  • Use the simple bound proposed
  • Can we determine convergence in a running
    simulation yet another difficult problem
  • Can we speed up convergence time and time
    between independent samples?

21
Reducing Random Walk in Metropolis Hybrid MC
  • Most probabilities can be written in the form
  • Introduce a momentum variable p
  • K is kinetic energy
  • Create asymptotic samples from joint distribution
  • Pick p randomly from
  • Update x and p
  • This results in linear time convergence

22
An Illustrative Example for Hybrid MC
Hybrid
Hybrid
Random walk
Random walk
  • Over a number of iterations, hybrid trajectories
    indicate less
  • correlated samples

23
Reducing Random Walk in Gibbs Over-relaxation
  • Use former value x(t) as well to calculate x(t1)
  • Suitable to speed up the process when variables
    are highly correlated
  • Useful for Gaussians, not straightforward for
    other types of conditional distributions

24
An Illustrative Example for Over-Relaxation
  • State space for a bi-variate Gaussian for 40
    iterations
  • Over-relaxation samples better covers the state
    space

25
Reducing Random Simulated Annealing
  • Introduce a parameter T (temperature) and
    gradually reduce it to 1
  • High T corresponds to being able to make
    transitions easier
  • As opposed to its use in optimization, T is not
    reduced to 0 but 1

26
Applications of Monte Carlo
Differential Equations
Ex Steady-state temperature distribution of an
annulus
The finite difference approximation using grid
dimension h
  • With ¼ probability, one of the states is selected
  • The value when a boundary is reached is kept, the
    mean of many such runs gives the solution

27
Applications of Monte Carlo
Integration
To evaluate
  • Bound the function with a box
  • Ratio of points under function to all points
    within the box gives the ratio of the areas of
    the function and the box

28
Applications of Monte Carlo
Image Processing Automatic eye-glass removal
  • MCMC used instead of a gradient-based methods to
    solve
  • MAP criterion that is used to locate points of
    eye-glasses

C. Wu, C. Liu, H.-Y. Shum, Y.-Q. Xu and Z.
Zhang, Automatic Eyeglasses Removal from Face
Images, IEEE Tran. On Pattern Analysis and
Machine Intelligence, Vol.26, No.3, pp. 322-336,
Mar.2004
29
Applications of Monte Carlo
Image segmentation
  • Data-driven techniques such as edge detection,
    tracing and clustering combined with MCMC to
    speed up the search

Z. Tu and S.-C. Zhu, Image Segmentation by
Data-Driven Markov Chain Monte Carlo, IEEE Tran.
On Pattern Analysis and Machine Intelligence,
Vol.24, No.5, pp. 657-673, May 2002
30
Forward Discrete Probability Propagation
Rasit Onur Topaloglu Ph.D. candidate rot_at_ucsd.edu
31
The Problem
Lowest Level
Ex gm?(2kId)
High level
  • The tree relates physical parameters to circuit
    parameters
  • Structured according to SPICE formula hierarchy
  • Given pdfs for lowest level parameters find
    pdfs at highest level

32
Motivation for Probability Propagation
  • Estimation of distributions of high level
    parameters needed to examine effects of process
    variations
  • Gaussian assumption attributed to these
    parameters no longer accurate in current
    technologies

Find a novel propagation method
GOALS
Determinism a stochastic output using known
formulas
Algebraic tractability enabling manual
applicability
Speed Accuracy be comparable or outperform
Monte Carlo and other parametric methods
33
Parametric Belief Propagation
Calculations handled at each node
  • Each node receives and sends messages to parents
    and children until equilibrium
  • Parent to child (?) causal information
  • Parent to parent (?) diagnostic information

34
Parametric Belief Propagation
  • When arrows in the hierarchy tree indicate linear
    addition operations on Gaussians, analytic
    formulations possible
  • Not straightforward for other distributions or
    non- standard distributions

35
Shortcomings of Monte Carlo
  • Non-determinism Not manually applicable
  • Limited for certain distributions Random number
    generators in most CAD packages only provide
    certain distributions
  • Accuracy May miss points that are less likely
    to occur due to random sampling unless very large
    number of samples used limited by the
    performance of random number generator

36
Monte Carlo FDPP Comparison
one-to-many relationships and custom pdfs
  • Non-standard pdfs not possible without a custom
    random number generator

P1
  • Monte Carlo overestimates in one-to-many
    relationships as same sample is used

P2
P3
P4
37
Operations Necessary to Implement FDPP
Analytic operation on continuous distributions
difficult instead operations on discrete
distributions implemented
  • F (Forward) Given a function, estimates the
    distribution of next node in the formula
    hierarchy
  • Q (Quantize) Discretize a pdf to operate on its
    samples
  • B (Bandpass) Decrements number of samples for
    computational efficiency
  • R (Re-bin) Reduces number of samples for
    computational efficiency

38
Necessary Operators (Q, F, B, R) on a
Hierarchical Tree
  • Repeated until we acquire the high level
    distribution (ex. G)

39
Probability Discretization Theory QN Operator
p and r domains
pdf(X)
p-domain
r-domain
Certain operators easy to apply in r-domain
40
Characterizing an spdf
spdf(X) or ?(X)
r-domain
41
F Operator
  • F operator implements a function over spdfs
  • Function applied to individual impulses
  • Individual probabilities multiplied

42
Band-pass, Be, Operator
43
Re-bin, RN, Operator
Resulting spdf(X)
44
Error Analysis
  • If quantizer uniform and ? small, quantization
    error random variable Q is uniformly distributed

45
Algorithm Implementing the F Operator
While each random variable has its spdf computed
For each rv. which has all ancestor spdfs
computed
For each sample in X1
For each sample in Xr
Place an impulse with height p1,..,pr at
xf(v1,..,vr)
Apply B and R algorithms to this rv.
46
Algorithm for the B and R Operators
Find maximum and minimum values wi within impulses
Divide this range into M bins
For each bin
Place a quantizing impulse at the center of the
bin with a height pi equal to the sum of all
impulses within bin
Find maximum probability, pi-max, of quantized
impulses within bins
Eliminate impulses within bins which have a
quantized impulse with smaller probability than
error-ratepi-max
Find new maximum and minimum values wi within
impulses
Divide this range into N bins
For each bin
Place an impulse at the center of the bin with
height equal to sum of all impulses within bin
47
Monte Carlo FDPP Comparison
Pdf of Vth
Pdf of ID
solid FDPP dotted Monte Carlo
  • A close match is observed after interpolation

48
Monte Carlo FDPP Comparison with a Low Sample
Number
Pdf of ?F
Pdf of ?F
solid FDPP,100 samples
noisy Monte Carlo, 1000 and 100000 samples
respectively
  • Monte Carlo inaccurate for moderate number of
    samples
  • Indicates FDPP can be manually applied without
    major accuracy degradation

49
Monte Carlo FDPP Comparison
Pdf of n7
Benchmark example
solid FDPP dotted Monte Carlo
trianglesbelief propagation
  • Edges define a linear sum, ex n5n2n3

50
Faulty Application of Monte Carlo
Pdf of n7
Benchmark example
solid FDPP dotted Monte Carlo
trianglesbelief propagation
  • Distributions at internal nodes n4, n5, n6 should
    be re-sampled using Monte Carlo
  • Not optimal for internal nodes with non-standard
    distributions

51
Conclusions
  • Forward Discrete Probability Propagation is
    introduced as an alternative to Monte Carlo
    based methods
  • FDPP should be preferred when low probability
    samples are important, algebraic intuition
    needed, non-standard pdfs are present or
    one-to-many relationships are present
Write a Comment
User Comments (0)
About PowerShow.com