Markov Chain Monte Carlo (MCMC) Methods in Bayesian Estimation
1
Markov Chain Monte Carlo (MCMC) Methods in
Bayesian Estimation
  • Robert J. Mislevy
  • University of Maryland
  • March 3, 2003

2
Topics
  • A generic full Bayesian model for measurement
    models
  • Basic idea of MCMC
  • Properties of MCMC
  • Metropolis sampling
  • Metropolis-Hastings sampling

3
A full Bayesian model: A generic measurement model
  • Xij: Response of Person i to Item j
  • θi: Parameter(s) of Person i
  • βj: Parameter(s) of Item j
  • η: Parameter(s) for the distribution of the θs
  • τ: Parameter(s) for the distribution of the βs
  • Note: Exchangeability is assumed here for the θs and
    for the βs, i.e., all are modeled with the same prior.
    Later we'll incorporate additional information about
    people and/or items.

4
A full Bayesian model: The recursive expression of the model
The joint distribution factors recursively:
  p(X, θ, β, η, τ) = p(X | θ, β) p(θ | η) p(β | τ) p(τ) p(η)
  • p(X | θ, β): the measurement model, i.e., item response
    given person and item parameters
  • p(θ | η): distributions for person parameters
  • p(β | τ): distributions for item parameters
  • p(τ): distribution for parameter(s) of the distributions
    for item parameters
  • p(η): distribution for parameter(s) of the distributions
    for person parameters
5
A full Bayesian model: The usual MSBNx diagram
  • Addresses just one person
  • Includes all responses for that person
  • Item parameters implicit, in the conditional
    probabilities for item responses
  • θ population distribution structure implicit, in
    the prior distribution for this examinee

6
A full Bayesian model: A BUGS diagram
[Plate diagram with nodes η, θi, βj, τ, πij, and Xij;
plates for Persons i and Items j]
  • Addresses all responses, all people, and all
    items
  • Plates for people and items
  • Item parameters explicit
  • θ population distribution structure explicit

7
A full Bayesian model: Bayes theorem
Observe particular data x for all people and all items.
We want to make inferences about the θs and the βs, now
conditional on x. By Bayes theorem,
  p(θ, β, η, τ | x) = p(x | θ, β) p(θ | η) p(β | τ) p(τ) p(η) / p(x).
The normalizing constant p(x), the integral of the numerator
over θ, β, η, and τ, is nasty.
8
A full Bayesian model: Bayes theorem
  • Two strategies for drawing inferences without
    having to evaluate the normalizing constant:
  • Modal estimation, e.g., BILOG. At any point in the
    posterior for β, we can calculate the value of the
    likelihood and its derivative. This tells us whether
    the point is a maximum, and if not, what direction to
    step in to reach a higher value of the posterior.
  • Simulation-based approximation, e.g., BUGS. Devise a
    chain for sampling from full conditionals (see next
    slide). After the chain becomes stationary, a draw for
    a given variable in a given cycle has the same
    distribution as a draw from that variable's marginal
    posterior. Approximate distributions and summary
    statistics from many such draws.

9
Markov Chain Monte Carlo Estimation: The special case of
Gibbs sampling
  • Draw values from the full conditional
    distributions.
  • Start with a possible value for each variable in
    cycle 0.
  • In cycle t+1, draw each variable in turn from its full
    conditional, given the data and the most recent draws
    of all of the other variables. A minimal sketch follows
    below.
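A minimal sketch of one Gibbs chain in Python, under the generic
model above. The samplers draw_theta, draw_beta, draw_eta, and
draw_tau are assumed, user-supplied functions that each draw from
the corresponding full conditional; they are placeholders for
illustration, not BUGS code.

    def gibbs(x, n_cycles, init, draw_theta, draw_beta, draw_eta, draw_tau):
        # One chain of Gibbs sampling for the generic measurement model.
        # Each draw_* is an assumed sampler for a full conditional, e.g.
        # draw_theta(x, beta, eta) returns a draw from p(theta | x, beta, eta).
        theta, beta, eta, tau = init
        history = []
        for t in range(n_cycles):
            # Cycle t+1: update each block given the data and the most
            # recent draws of all of the other blocks.
            theta = draw_theta(x, beta, eta)   # p(theta | x, beta, eta)
            beta = draw_beta(x, theta, tau)    # p(beta | x, theta, tau)
            eta = draw_eta(theta)              # p(eta | theta)
            tau = draw_tau(beta)               # p(tau | beta)
            history.append((theta, beta, eta, tau))
        return history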

10
Markov Chain Monte Carlo Estimation: Generalizations of
Gibbs sampling
  • Don't need to go in the same order every cycle
  • Don't need to hit every variable in every cycle
  • Can sample in blocks of parameters (e.g., the
    three item parameters of each item in the 3PL IRT
    model)
  • Don't need to sample from the exact full
    conditional; can use, for example, Metropolis or
    Metropolis-Hastings approximations within cycles

11
Properties of MCMC (1)
  • Draws in cycle t+1 depend on the values in cycle t,
    but given them, not on previous cycles: the Markov
    property of no memory.
  • Dependence on previous values introduces
    autocorrelation across cycles. How much depends on the
    problem structure and the amount of data.
  • Under regularity conditions (e.g., the chain can cover
    the space, or get from any point to any other point),
    dependence on the starting values is forgotten after a
    sufficiently long run. Hence:
  • Burn-in cycles, left out of summary
    calculations.
  • Run multiple chains from different,
    over-dispersed starting values, to see whether they
    look like they're sampled from the same stationary
    distribution.
  • Gelman-Rubin convergence diagnostics in BUGS, which
    work like ANOVAs, comparing between-chain to
    within-chain variation. A sketch follows below.
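A rough Python sketch of the ANOVA-like idea behind the
Gelman-Rubin diagnostic: compare between-chain to within-chain
variance for a scalar parameter. This is the basic
potential-scale-reduction formula, not necessarily the exact
computation BUGS performs.

    import numpy as np

    def gelman_rubin(chains):
        # chains: array of shape (m, n) -- m chains, n post-burn-in draws each.
        # Values near 1 suggest the chains look like draws from the same
        # stationary distribution; values well above 1 suggest they do not.
        chains = np.asarray(chains, dtype=float)
        m, n = chains.shape
        B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
        W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
        var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
        return np.sqrt(var_hat / W)               # potential scale reduction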

12
Properties of MCMC (2)
  • An example of a violation of regularity
    conditions: a Heywood case in a factor analysis run.
    Needed a prior on factor loadings that bounded them
    away from 1 and -1.

13
Properties of MCMC (3)
  • Mixing refers to how much the draws for a given
    parameter can move around the space each cycle. More
    autocorrelation goes along with poorer mixing (see the
    autocorrelation sketch below).
  • Better mixing means the same number of cycles
    provides more information about the posterior, the
    ceiling being independent draws from the posterior.
    Worse mixing means more cycles are needed for (a)
    burn-in and (b) a given level of precision for
    statistics of the posteriors.

[Trace plots: relatively bad mixing vs. relatively good mixing]
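To make mixing concrete, here is a small illustrative Python
helper (not part of the original slides) that computes the lag-k
autocorrelation of one chain of draws; a poorly mixing chain shows
autocorrelations that stay high at long lags.

    import numpy as np

    def autocorrelation(chain, lag):
        # Lag-k autocorrelation of a single chain of draws (lag >= 1).
        chain = np.asarray(chain, dtype=float)
        centered = chain - chain.mean()
        return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)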
14
Metropolis and Metropolis-Hastings sampling
within Gibbs (1)
  • In straight Gibbs sampling, you draw from the
    full conditional posterior for each parameter
    each cycle.
  • That is great when the full conditionals have a
    familiar form to sample from, but sometimes they don't.
  • Metropolis and Metropolis-Hastings (MH) are
    alternatives that can be used within Gibbs sampling
    when the full conditional can be computed but can't be
    sampled from directly.

15
Metropolis and Metropolis-Hastings sampling
within Gibbs (2)
  • Basic idea: Draw from a different distribution, one
    that you can both compute AND sample from: the proposal
    distribution. Draws from the proposal distribution are
    either accepted, or they are rejected and the value of
    this variable in the next cycle of the Gibbs sampler
    remains the same.
  • Almost any proposal distribution will work, as
    long as it is defined over the right range.

16
Metropolis and Metropolis-Hastings sampling
within Gibbs (3)
  • Popular choice: Normal distribution, with mean at the
    variable's previous value and some sd, which could be
    determined empirically.
  • Mixing is best when roughly 30-40% of the proposals are
    accepted, and worse when too many or too few are
    accepted.
  • What BUGS is doing when it says "adapting" is trying
    Metropolis with trial values of the sd, seeing how many
    proposals are accepted, then widening or narrowing the
    proposal distribution. A tuning sketch follows below.
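A hedged sketch of the adapting idea: after a batch of Metropolis
proposals, widen or narrow the proposal sd toward a target
acceptance rate of roughly 30-40%. The rule and the multipliers
below are illustrative assumptions, not the actual adaptation
algorithm in BUGS.

    def tune_proposal_sd(sd, n_accepted, n_proposed, low=0.30, high=0.40):
        # Adjust the proposal sd after a batch of proposals, aiming for an
        # acceptance rate in [low, high]. The 1.1 / 0.9 factors are
        # arbitrary illustrative choices.
        rate = n_accepted / n_proposed
        if rate > high:
            return sd * 1.1   # accepting too often: take bolder steps
        if rate < low:
            return sd * 0.9   # rejecting too often: take smaller steps
        return sd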

17
Metropolis sampling (1)
  • z: a variable in the posterior that we are interested in.
  • zt: its value in cycle t of a Gibbs sampler.
  • p(z | everything else): the full conditional for z,
    which includes the data and the most recent draws for
    all other variables.
  • q(z | zt): the proposal distribution, which we note may
    depend on zt, for example N(zt, 1).
  • y: a draw from the proposal distribution. Then the
    accept/reject rule on the next slide is applied.

18
Metropolis sampling (2)
  • The Metropolis algorithm applies when the proposal
    distribution is symmetric, i.e., q(y | zt) = q(zt | y).
  • (E.g., this is the case when the proposal
    distribution is normal with a specified sd and mean
    given by the previous value.)
  • Then accept y as zt+1 with probability
    min(1, p(y | everything else) / p(zt | everything else)),
    the ratio of the full conditional at the proposed value
    to the full conditional at the current value, capped at
    1. A minimal sketch in Python follows below.
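A minimal Python sketch of this accept/reject step for one
variable within a Gibbs cycle. Here full_conditional is an
assumed user-supplied function returning the full conditional
density (up to a constant) given the data and the current values
of all other variables; the normal proposal centered at the
previous value matches the example on the following slides.

    import numpy as np

    rng = np.random.default_rng()

    def metropolis_step(z_t, full_conditional, proposal_sd=1.0):
        # One Metropolis update for a scalar variable z. The proposal
        # N(z_t, proposal_sd) is symmetric, so no Hastings correction
        # is needed.
        y = rng.normal(loc=z_t, scale=proposal_sd)   # draw from the proposal
        accept_prob = min(1.0, full_conditional(y) / full_conditional(z_t))
        if rng.uniform() < accept_prob:
            return y      # accept: y becomes z_{t+1}
        return z_t        # reject: z_{t+1} keeps the previous value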

19
Metropolis sampling (3)
  • Proposal distribution: Normal with mean at the
    previous cycle's value

20
Metropolis sampling (4)
[Figure: a proposed y where the full conditional is at least
as high as at zt]
  • Accept this y as zt+1 with probability 1

21
Metropolis sampling (5)
[Figure: a proposed y where the full conditional is lower
than at zt]
  • Accept this y as zt+1 with probability .75, the ratio
    of the full conditional at y to that at zt

22
Metropolis-Hastings sampling
  • Extension of Metropolis sampling, in which the
    proposal distribution need not be symmetric, i.e.,
    q(y | zt) need not equal q(zt | y).
  • Now accept y as zt+1 with probability
    min(1, [p(y | everything else) q(zt | y)] /
           [p(zt | everything else) q(y | zt)]).
  • This simplifies to Metropolis when symmetry holds,
    since the q terms cancel. A sketch follows below.
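A short Python sketch of the Metropolis-Hastings version, where
the proposal density enters the acceptance ratio. Here
proposal_draw and proposal_density are assumed helper functions
for an arbitrary, possibly asymmetric, proposal.

    import numpy as np

    rng = np.random.default_rng()

    def metropolis_hastings_step(z_t, full_conditional,
                                 proposal_draw, proposal_density):
        # proposal_draw(z_t) samples y from q(y | z_t);
        # proposal_density(a, b) evaluates q(a | b).
        # When q is symmetric the q terms cancel and this reduces
        # to plain Metropolis.
        y = proposal_draw(z_t)
        ratio = (full_conditional(y) * proposal_density(z_t, y)) / (
            full_conditional(z_t) * proposal_density(y, z_t))
        if rng.uniform() < min(1.0, ratio):
            return y
        return z_t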