Gil McVean - PowerPoint PPT Presentation

1 / 18
About This Presentation
Title:

Gil McVean

Description:

In simple cases we can derive analytical expressions for the posterior ... Under the bonnet. Consider a system in which a single parameter can take k possible values ... – PowerPoint PPT presentation

Number of Views:47
Avg rating:3.0/5.0
Slides: 19
Provided by: stati3
Category:
Tags: mcvean | bonnet | gil

less

Transcript and Presenter's Notes

Title: Gil McVean


1
Markov Chain Monte Carlo
  • Gil McVean
  • Tuesday 24th February 2009

2
Bayesian inference
  • In Bayesian statistics we want to learn about the
    probability distribution of the parameters of
    interest given the data the posterior
  • In simple cases we can derive analytical
    expressions for the posterior
  • For example, if we have a beta prior Beta(a,b)
    for a binomial parameter, after we have observed
    n successes and m failures, the posterior is also
    beta Beta(an, bm). The key here is conjugacy.

Prior
Likelihood
Posterior
Normalising constant
3
Monte Carlo methods
  • In many situations, the normalising constant
    cannot be calculated analytically
  • Typically this is true if we are dealing with
    multiple parameters, multidimensional parameters,
    complex model structures or complex likelihood
    functions
  • i.e. most situations in modern statistics
  • We can use Monte Carlo methods to sample from
    (and therefore estimate functions of) the
    posterior
  • Perhaps the most widely used method is called
    Markov Chain Monte Carlo, brought to prominence
    by a paper in 1990 by Gelfand and Smith.

4
Markov Chain Monte Carlo
  • The basic idea is to construct a Markov Chain
    whose stationary distribution is the required
    posterior
  • A Markov Chain is a stochastic process that
    generates random variables X1, X2, , Xt, where
    the distribution
  • i.e. the distribution of the next random
    variable depends only on the current random
    variable
  • Note that the Xi are typically highly correlated,
    so each sample is not an independent draw from
    the posterior
  • Thinning of the chain leads to effectively
    independent samples

5
Metropolis sampling
  • Gibbs-sampling is an example of MCMC that uses
    marginal conditional distributions. However, we
    do not always know what these are.
  • However, it is still possible to construct a
    Markov Chain that has the posterior as its
    stationary distribution (Metropolis et al. 1953)
  • In the current step, the value of the parameters
    is Xt. Propose a new set of parameters, Y in a
    symmetric manner i.e. q(Y X) q(X Y)
  • Calculate the prior and likelihood functions for
    the old and new parameter values. Set the
    parameter values in the next step of the chain,
    Xt1 to Y with probability a, otherwise set to Xt

6
Some notation
  • I shall refer to the posterior distribution as
    the target distribution
  • The transition probabilities in the Markov chain
    I shall refer to as the proposal distribution
    also known as the transition kernel
  • When talking about multiple parameters I shall
    refer to the joint posterior
  • and the conditional distributions

7
An example
  • There are two types of bus frequent (average
    arrival time 5 mins) and rare (average arrival
    time 10 mins). I see the following times
  • If I model the inter-arrival times as a mixture
    of two exponential distributions, I want to
    perform inference on the proportion of frequent
    and rare buses

Number
Arrival time
8
  • I will put a beta(1,1) uniform(0,1) prior on
    the mixture proportion f
  • Starting with an initial guess of f 0.2, I will
    propose changes uniform on f - e, f e
  • (Note that if I propose a value lt 0 or gt1 I must
    reject it)
  • 50, 500 and 5000 samples from the chain give me
    the following results

f
f
f
iteration
iteration
iteration
f
f
f
9
Under the bonnet
  • Consider a system in which a single parameter can
    take k possible values
  • My proposal is to select at random from one of
    the k-1 other possible values
  • Now suppose the system has reached its stationary
    distribution so that the probability of the chain
    being in a given state is proportional to its
    prior times its likelihood
  • Consider two states, i and j, with p(Xi) gt p(Xj).
    The rates of flow in the two directions are

Flow i to j
Flow j to i
10
The small print!
  • If detailed balance holds for every pair of
    states, the system remains in its stationary
    distribution
  • We just need to guarantee that the stationary
    distribution is reached
  • There are three requirements for Xt to converge
    to the stationary distribution
  • X must be irreducible. That is every state must
    be (eventually) reachable from every other state
  • X must be aperiodic. This stops the chain from
    oscillating between different states
  • X must be positive recurrent. This states that
    the expected waiting time to return to a state is
    finite and means that a chain will stay in its
    stationary distribution once it first reaches it

11
The Hastings ratio
  • A simple change to the acceptance formula allows
    the use of asymmetric proposal distributions and
    generalises the Metropolis result
  • Moves that multiply or divide parameters need to
    follow change-of-variables rules

Sampling from an Exp(1) distribution
Xt
12
Gibbs sampling as a special case
  • In Gibbs sampling we want to find the posterior
    for a set of parameters
  • Each parameter is updated in turn by sampling
    from the conditional distribution given the data
    and the current value of all other parameters
  • Consider the case of a single parameter updated
    using the Metropolis algorithm where the proposal
    density is the conditional distribution
  • In other words, the Gibbs sampler is an MCMC
    where every proposal is accepted
  • With multiple parameters you have to be careful
    about update order to ensure reversibility

13
Convergence
  • The Markov Chain is guaranteed to have the
    posterior as its stationary distribution (if well
    constructed)
  • BUT this does not tell you how long you have to
    run the chain in order to reach stationarity
  • The initial starting point may have a strong
    influence
  • The proposal distributions may lead to low
    acceptance rates
  • The likelihood surface may have local maxima that
    trap the chain
  • Multiple runs from different initial conditions
    and graphical checks can be used to assess
    convergence
  • The efficiency of a chain can be measured in
    terms of the variance of estimates obtained by
    running the chain for a short period of time

14
Watching your chain
x 0.1 Acceptance 97
  • Three chains, each sampling from an Exp(1)
    distribution with a uniform(-x, x) proposal
    distribution
  • Measure acceptance rates

x 1 Acceptance 62
x 10 Acceptance 8
15
The burn-in
  • Often you start a chain far away from the target
    distribution
  • Truth unknown
  • Check for convergence
  • The first few samples from the chain are poor
    representatives from the stationary distribution
  • These are usually thrown away as burn-in
  • There is no theory that says how much to throw
    away, but its better to err on the side of more
    than less

16
Other uses of MCMC
  • Looking at marginal effects
  • Suppose we have a multidimensional parameter, we
    may just want to know about the marginal
    posterior distribution of that parameter
  • Prediction
  • Given our posterior distribution on parameters,
    we can predict the distribution of future data by
    sampling parameters from the posterior and
    simulating data given those parameters
  • The posterior predictive distribution is a useful
    source of goodness-of-fit testing (if predicted
    data doesnt look like the data youve collected,
    the model is poor)

17
Modifications
  • An important development has been to allow
    trans-dimensional moves (Green 1995), also known
    as Reversible Jump MCMC
  • Useful when looking for change points (e.g. in
    rate) along a sequence
  • Examples include looking for periods of elevated
    accident rate or genomic regions of elevated
    recombination rate
  • There are many subtle variants of basic MCMC that
    allow you to increase efficiency in complex
    situations (see Liu 2001 for many)

Data
Posterior probability of change in rate
Time
18
A good read
  • Markov Chain Monte Carlo in Practice. 1996.
    eds. Gilks WR, Richardson S and Spiegelhalter DJ.
    Chapman Hall/CRC.
  • Bayesian Data Analysis. 2004. Gelman A, Carlin
    JB, Stern HS and Rubin DB. Chapman Hall/CRC.
  • Monte Carlo Strategies in Scientific Computing.
    2001. Liu JS. Springer-Verlag.
  • Chris Holmes short course on Bayesian statistics
  • http//www.stats.ox.ac.uk/cholmes/Courses/BDA/bda
    _mcmc.pdf
Write a Comment
User Comments (0)
About PowerShow.com