Title: Markov Chain Monte Carlo (MCMC) Methods in Bayesian Estimation
1. Markov Chain Monte Carlo (MCMC) Methods in Bayesian Estimation
- Robert J. Mislevy
- University of Maryland
- March 3, 2003
2. Topics
- A generic full Bayesian model for measurement models
- Basic idea of MCMC
- Properties of MCMC
- Metropolis sampling
- Metropolis-Hastings sampling
3. A full Bayesian model: A generic measurement model
- Xij: response of Person i to Item j
- θi: parameter(s) of Person i
- βj: parameter(s) of Item j
- η: parameter(s) for the distribution of the θs
- τ: parameter(s) for the distribution of the βs
- Note: Exchangeability is assumed here for the θs and for the βs -- i.e., all are modeled with the same prior. Later we'll incorporate additional information about people and/or items.
4. A full Bayesian model: The recursive expression of the model

p(X, θ, β, η, τ) = [∏i ∏j p(Xij | θi, βj)] × [∏i p(θi | η)] × [∏j p(βj | τ)] × p(τ) × p(η)

- ∏i ∏j p(Xij | θi, βj): the measurement model -- each item response given person and item parameters
- ∏i p(θi | η): distributions for person parameters
- ∏j p(βj | τ): distributions for item parameters
- p(τ): distribution for the parameter(s) of the distributions for item parameters
- p(η): distribution for the parameter(s) of the distributions for person parameters
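The recursive factorization can be made concrete by evaluating the log of the joint density for a tiny model. The sketch below assumes a Rasch-type measurement model with standard normal priors and fixed hyperparameters purely for illustration; the slides' model is generic, and the specific choices here (Rasch link, normal priors, the data values) are assumptions, not part of the original.

```python
import math

def log_joint(x, theta, beta, sd_theta, sd_beta):
    """Log of the recursive factorization for a tiny Rasch-type model
    (an illustrative assumption, not the slides' specific model):
        prod_ij p(Xij | theta_i, beta_j)      -- measurement model
      * prod_i  p(theta_i | eta)              -- person-parameter priors
      * prod_j  p(beta_j | tau)               -- item-parameter priors
    with eta = sd_theta and tau = sd_beta treated as fixed here."""
    def log_norm(z, sd):
        # log density of N(0, sd^2)
        return -0.5 * (z / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
    lp = sum(log_norm(t, sd_theta) for t in theta)   # person priors
    lp += sum(log_norm(b, sd_beta) for b in beta)    # item priors
    for i, ti in enumerate(theta):                   # measurement model
        for j, bj in enumerate(beta):
            p = 1.0 / (1.0 + math.exp(-(ti - bj)))   # Rasch P(Xij = 1)
            lp += math.log(p) if x[i][j] == 1 else math.log(1.0 - p)
    return lp

x = [[1, 0], [1, 1]]                     # 2 persons x 2 items (made-up data)
lp = log_joint(x, theta=[0.5, 1.0], beta=[0.0, 0.8],
               sd_theta=1.0, sd_beta=1.0)
```

Each factor of the product becomes one additive term in the log, which is exactly the structure Gibbs sampling exploits: the full conditional for any one parameter only involves the terms that mention it.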
5. A full Bayesian model: The usual MSBNx diagram
- Addresses just one person
- Includes all responses for that person
- Item parameters implicit, in the conditional probabilities for item responses
- θ population distribution structure implicit, in the prior distribution for this examinee
6. A full Bayesian model: A BUGS diagram

[Figure: directed graph with τ → βj and η → θi; βj and θi both feed pij, which feeds Xij; plates ("Items j", "Persons i") enclose the item- and person-indexed nodes.]

- Addresses all responses, all people, and all items
- Plates for people and items
- Item parameters explicit
- θ population distribution structure explicit
7. A full Bayesian model: Bayes theorem

We observe particular data x for all people and all items, and want to make inferences about the θs and βs, now conditional on x. By Bayes theorem,

p(θ, β, η, τ | x) = p(x | θ, β) p(θ | η) p(β | τ) p(τ) p(η) / p(x).

The normalizing constant p(x) is nasty: it is the integral of the numerator over all the θs, βs, η, and τ.
8. A full Bayesian model: Bayes theorem
- Two strategies for drawing inferences without having to evaluate the normalizing constant:
- Modal estimation (e.g., BILOG). At any point in the posterior for β, one can calculate the value of the likelihood and its derivative. This tells you whether the point is a maximum and, if not, what direction to step in that might get you to a higher value of the posterior.
- Simulation-based approximation (e.g., BUGS). Devise a chain for sampling from full conditionals (see next slide). After the chain becomes stationary, a draw for a given variable in a given cycle has the same distribution as a draw from that variable's marginal posterior. Approximate posterior distributions and summary statistics from many such draws.
9. Markov Chain Monte Carlo Estimation: The special case of Gibbs sampling
- Draw values from full conditional distributions.
- Start with a possible value for each variable in cycle 0.
- In cycle t+1, draw each variable in turn from its full conditional, given the data and the most recent draws of all the other variables:
  θ(t+1) ~ p(θ | β(t), η(t), τ(t), x)
  β(t+1) ~ p(β | θ(t+1), η(t), τ(t), x)
  η(t+1) ~ p(η | θ(t+1), β(t+1), τ(t), x)
  τ(t+1) ~ p(τ | θ(t+1), β(t+1), η(t+1), x)
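The cycle structure above can be sketched in a few lines. The example below (an illustration, not one of the slides' measurement models) runs a Gibbs sampler on a standard bivariate normal with correlation rho, where both full conditionals happen to be normal and easy to draw from:

```python
import random

def gibbs_bivariate_normal(rho, n_cycles, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each full conditional is itself normal:
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    """
    rng = random.Random(seed)
    sd = (1 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0                      # cycle-0 starting values
    draws = []
    for t in range(n_cycles):
        x = rng.gauss(rho * y, sd)       # draw x from its full conditional
        y = rng.gauss(rho * x, sd)       # draw y given the *new* x
        if t >= burn_in:                 # discard burn-in cycles
            draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_cycles=20000, burn_in=2000)
mean_x = sum(d[0] for d in draws) / len(draws)
cross = sum(d[0] * d[1] for d in draws) / len(draws)   # estimates E[xy] = rho
```

Note how each draw conditions on the most recent value of the other variable, exactly as in the θ, β, η, τ cycle above.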
10. Markov Chain Monte Carlo Estimation: Generalizations of Gibbs sampling
- Don't need to go in the same order every cycle
- Don't need to hit every variable in every cycle
- Can sample in blocks of parameters (e.g., the three item parameters of each item in the 3PL IRT model)
- Don't need to sample from the exact full conditional -- can do, for example, Metropolis or Metropolis-Hastings approximations within cycles
11. Properties of MCMC (1)
- Draws in cycle t+1 depend on values in cycle t but, given them, not on previous cycles -- the Markov property of no memory.
- Dependence on previous values introduces autocorrelation across cycles. Its strength depends on the problem structure and the amount of data.
- Under regularity conditions (e.g., the chain can cover the space, or get from any point to any other point), dependence on the starting values is forgotten after a sufficiently long run. Hence:
- Burn-in cycles are left out of the summary calculations.
- Run multiple chains from different, over-dispersed starting values, to see whether they look like they are sampled from the same stationary distribution.
- The Gelman-Rubin convergence diagnostics in BUGS are like ANOVAs.
12. Properties of MCMC (2)
- An example of a violation of the regularity conditions: a Heywood case in a factor analysis run. It needed a prior on the factor loadings that bounded them away from 1 and -1.
13. Properties of MCMC (3)
- Mixing means how much the draws for a given parameter can move around the space each cycle. More autocorrelation goes along with poorer mixing.
- Better mixing means the same number of cycles provides more information about the posterior, the ceiling being independent draws from the posterior. Worse mixing means more cycles are needed for (a) burn-in and (b) a given level of precision for statistics of the posteriors.

[Trace plots: one chain with relatively bad mixing, one with relatively good mixing.]
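One simple numerical gauge of mixing is the lag-1 autocorrelation of a chain's draws. The sketch below (an illustration using simulated AR(1) chains with a known amount of autocorrelation, not one of the slides' samplers) shows the contrast between a well-mixing and a poorly mixing chain:

```python
import random

def lag1_autocorr(chain):
    """Lag-1 autocorrelation of a chain's draws: a simple mixing gauge.
    Values near 0 indicate good mixing, values near 1 poor mixing."""
    n = len(chain)
    mean = sum(chain) / n
    num = sum((chain[t] - mean) * (chain[t + 1] - mean) for t in range(n - 1))
    den = sum((z - mean) ** 2 for z in chain)
    return num / den

def ar1_chain(phi, n, seed=0):
    """Simulate an AR(1) process z_{t+1} = phi * z_t + e, e ~ N(0,1),
    as a stand-in for MCMC output with known autocorrelation phi."""
    rng = random.Random(seed)
    z, out = 0.0, []
    for _ in range(n):
        z = phi * z + rng.gauss(0, 1)
        out.append(z)
    return out

good = lag1_autocorr(ar1_chain(0.1, 5000))    # well-mixing chain
bad = lag1_autocorr(ar1_chain(0.95, 5000))    # poorly mixing chain
```

As a rough AR(1)-style approximation, a chain of n draws with lag-1 autocorrelation r carries about n(1-r)/(1+r) effectively independent draws, which is why poor mixing demands more cycles for the same precision.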
14. Metropolis and Metropolis-Hastings sampling within Gibbs (1)
- In straight Gibbs sampling, you draw from the full conditional posterior for each parameter each cycle.
- Great when the full conditionals are in a familiar form to sample from, but sometimes they aren't.
- Metropolis and Metropolis-Hastings (MH) are alternatives that can be used within Gibbs sampling when the full conditional can be computed but can't be sampled from directly.
15. Metropolis and Metropolis-Hastings sampling within Gibbs (2)
- Basic idea: Draw from a different distribution -- the proposal distribution -- that you can both compute and sample from. Draws from the proposal distribution are either accepted, or they are rejected and the value of the variable in the next cycle of the Gibbs sampler remains the same.
- Almost any proposal distribution will work, as long as it is defined over the right range.
16. Metropolis and Metropolis-Hastings sampling within Gibbs (3)
- Popular choice: a normal distribution, with its mean at the variable's previous value and some sd -- which could be determined empirically.
- Mixing is best when 30-40% of the proposals are accepted; it is worse when too many or too few are accepted.
- What BUGS is doing when it says "adapting" is trying Metropolis with trial values of the sd, seeing how many proposals are accepted, and then widening or narrowing the proposal distribution.
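That widen-or-narrow loop can be sketched directly. The code below is an illustrative assumption in the spirit of an adapting phase, not BUGS's actual algorithm: it runs short Metropolis batches with a N(z, sd) proposal and adjusts sd until the acceptance rate lands in the 30-40% band.

```python
import math
import random

def adapt_proposal_sd(log_p, z0, lo=0.30, hi=0.40, batch=500,
                      max_rounds=25, seed=0):
    """Crude adaptation loop (a sketch, not BUGS's algorithm): run short
    Metropolis batches, then widen the normal proposal sd when too many
    draws are accepted and narrow it when too few are."""
    rng = random.Random(seed)
    sd, z, rate = 1.0, z0, 0.0
    for _ in range(max_rounds):
        accepted = 0
        for _ in range(batch):
            y = rng.gauss(z, sd)                         # propose
            if math.log(rng.random()) < log_p(y) - log_p(z):
                z, accepted = y, accepted + 1            # accept
        rate = accepted / batch
        if lo <= rate <= hi:
            break                  # acceptance rate in the target band
        sd = sd * 1.5 if rate > hi else sd / 1.5
    return sd, rate

# Target: a standard normal full conditional, known up to a constant.
sd, rate = adapt_proposal_sd(lambda z: -0.5 * z * z, z0=0.0)
```

Starting from sd = 1 on a standard normal target, far more than 40% of proposals are accepted, so the loop widens the proposal until the rate falls into the band.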
17. Metropolis sampling (1)
- z: a variable in the posterior we are interested in.
- z(t): its value in cycle t of a Gibbs sampler.
- p(z | rest): the full conditional for z, which includes the data and the most recent draws for all other variables.
- q(y | z(t)): the proposal distribution, which we note may depend on z(t) -- for example, N(z(t), 1).
- y: a draw from the proposal distribution.
18. Metropolis sampling (2)
- The Metropolis algorithm holds when the proposal distribution is symmetric, i.e., q(y | z(t)) = q(z(t) | y).
- (E.g., this is the case when the proposal distribution is normal with a specified sd and mean given by the previous value.)
- Then accept y as z(t+1) with probability
  α = min{ 1, p(y | rest) / p(z(t) | rest) }.
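The accept-or-keep step translates almost line for line into code. The sketch below runs Metropolis updates with the symmetric normal proposal just described, using a standard normal target known only up to its normalizing constant (the target, starting point, and tuning values are illustrative assumptions):

```python
import math
import random

def metropolis_chain(log_f, z0, proposal_sd, n_cycles, seed=0):
    """Metropolis updates with a symmetric N(z(t), proposal_sd) proposal.
    log_f is the log of the full conditional, up to an additive constant
    (the normalizing constant cancels in the ratio)."""
    rng = random.Random(seed)
    z, chain = z0, []
    for _ in range(n_cycles):
        y = rng.gauss(z, proposal_sd)        # symmetric proposal draw
        # accept y with probability min(1, f(y) / f(z(t)))
        if math.log(rng.random()) < log_f(y) - log_f(z):
            z = y                            # accept: z(t+1) = y
        # else reject: z(t+1) = z(t) -- the chain repeats the old value
        chain.append(z)
    return chain

# Target: standard normal, known up to a constant: f(z) ∝ exp(-z^2 / 2).
chain = metropolis_chain(lambda z: -0.5 * z * z, z0=3.0,
                         proposal_sd=2.4, n_cycles=20000)
post = chain[2000:]                          # drop burn-in cycles
mean = sum(post) / len(post)
var = sum((z - mean) ** 2 for z in post) / len(post)
```

Working on the log scale avoids overflow, and comparing log(u) to the log ratio is the standard way to implement "accept with probability min(1, ratio)".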
19. Metropolis sampling (3)
- Proposal distribution: normal with mean at the previous cycle's value
20. Metropolis sampling (4)

[Figure: the proposed draw y lands where the full conditional is at least as high as at z(t).]
- Accept this y as z(t+1) with probability 1
21. Metropolis sampling (5)

[Figure: the proposed draw y lands where the full conditional is lower than at z(t).]
- Accept this y as z(t+1) with probability .75
22. Metropolis-Hastings sampling
- An extension of Metropolis sampling in which the proposal distribution need not be symmetric -- i.e., possibly q(y | z(t)) ≠ q(z(t) | y).
- Now accept y as z(t+1) with probability
  α = min{ 1, [ p(y | rest) q(z(t) | y) ] / [ p(z(t) | rest) q(y | z(t)) ] }.
- This simplifies to Metropolis when symmetry holds.
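An asymmetric proposal is natural when a parameter must stay positive. The sketch below (the lognormal proposal and Gamma target are illustrative choices, not from the slides) uses a lognormal random walk, for which the correction factor q(z(t) | y) / q(y | z(t)) reduces to the simple ratio y / z(t):

```python
import math
import random

def mh_chain_positive(log_f, z0, step, n_cycles, seed=0):
    """Metropolis-Hastings with an asymmetric lognormal proposal,
    useful when z must stay positive.  Because q(y|z) != q(z|y),
    the acceptance ratio carries the factor q(z|y)/q(y|z); for this
    lognormal random walk it reduces to y/z."""
    rng = random.Random(seed)
    z, chain = z0, []
    for _ in range(n_cycles):
        y = z * math.exp(rng.gauss(0.0, step))    # y ~ LogNormal(log z, step)
        log_alpha = (log_f(y) - log_f(z)          # full-conditional ratio
                     + math.log(y) - math.log(z)) # correction q(z|y)/q(y|z)
        if math.log(rng.random()) < log_alpha:
            z = y                                  # accept
        chain.append(z)                            # else keep z(t)
    return chain

# Target: Gamma(shape=2, rate=1), known up to a constant: f(z) ∝ z e^{-z}.
chain = mh_chain_positive(lambda z: math.log(z) - z, z0=1.0,
                          step=0.7, n_cycles=30000)
post = chain[3000:]                               # drop burn-in
mean = sum(post) / len(post)                      # true mean is 2
```

With a symmetric proposal the correction factor is 1 and the log_alpha line collapses to the plain Metropolis ratio, matching the last bullet above.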