Title: Bayesian Analysis of the Normal Distribution, Part II
1. Bayesian Analysis of the Normal Distribution, Part II
- Set-up of the basic model of a normally distributed random variable with unknown mean and variance (a two-parameter model).
- Discussion of philosophies of prior selection.
- Implementation of different priors, with a discussion of MCMC methods.
2. The normal model with unknown mean and variance
- Let's extend the normal model to the case where the variance parameter is assumed to be unknown. Thus, yi ~ N(μ, σ²), where μ and σ² are both unknown random variables.
- This is our first example of a multi-parameter model. The Bayesian set-up should still look familiar:
  p(μ, σ² | y) ∝ p(μ, σ²) p(y | μ, σ²).
- Note we would like to make inferences about the marginal distributions p(μ | y) and p(σ² | y) rather than the joint posterior p(μ, σ² | y). Ultimately, we'd like to find (a simulation sketch follows at the end of this slide)
  p(μ | y) = ∫ p(μ | σ², y) p(σ² | y) dσ².
- What should we choose for the prior distribution p(μ, σ²)?
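In practice this integral is usually evaluated by simulation rather than analytically: draw σ² from p(σ² | y), then draw μ from p(μ | σ², y), and the resulting μ values are draws from p(μ | y). Here is a minimal Python sketch of that composition step (not from the slides; it assumes the improper-prior results derived on the slides below, and the data vector and seed are hypothetical):

import numpy as np

rng = np.random.default_rng(0)
y = np.array([98.0, 160.0, 136.0, 128.0, 130.0])   # hypothetical data
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# sigma2 | y ~ scaled Inv-chi^2(n-1, s2): draw X ~ chi^2_{n-1}, set sigma2 = (n-1)*s2/X
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=10_000)
# mu | sigma2, y ~ N(ybar, sigma2/n); averaging over the sigma2 draws integrates sigma2 out
mu = rng.normal(ybar, np.sqrt(sigma2 / n))
print(mu.mean(), np.quantile(mu, [0.025, 0.975]))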
3. Different types of Bayesians choose different priors
- Classical Bayesians: the prior is a necessary evil.
  → Choose priors that interject the least information possible.
- Modern Parametric Bayesians: the prior is a useful convenience.
  → Choose prior distributions with desirable properties (e.g., conjugacy). Given a distributional choice, prior parameters are chosen to interject the least information possible.
- Subjective Bayesians: the prior is a summary of old beliefs.
  → Choose prior distributions based on previous knowledge, either the results of earlier studies or non-scientific opinion.
4. The Classical Bayesian and the normal model with unknown mean and variance
- y ~ N(μ, σ²), where μ and σ² are both unknown random variables.
- What prior distribution would we choose to represent the absence of any knowledge in this instance?
- What if we assumed that the two parameters were independent, so p(μ, σ²) = p(μ)p(σ²)?
5. Classical Bayesians and the normal model with unknown mean and variance
- y ~ N(μ, σ²), where μ and σ² are both unknown random variables. What prior would a classical Bayesian use?
- If p(μ, σ²) = p(μ)p(σ²), one option would be to assume uniform prior distributions for μ and for log σ². Thus,
  p(μ) ∝ c for -∞ < μ < ∞
  p(σ²) ∝ 1/σ² for 0 < σ² < ∞
- And the joint density would be p(μ, σ²) ∝ 1/σ².
- Are these distributions proper? (No: neither integrates to a finite value over its support, so both priors, and hence the joint prior, are improper.)
6. yi ~ N(μ, σ²) for i = 1,…,n and p(μ, σ²) ∝ 1/σ²
It can be shown that the conditional posterior distribution of the mean is
  μ | σ², y ~ N(ȳ, σ²/n),
where ȳ is the sample mean. Notice the parallel between these results and what we would find in classical theory: ȳ is the classical point estimate of μ, and σ²/n is the sampling variance of the sample mean.
7. yi ~ N(μ, σ²) for i = 1,…,n and p(μ, σ²) ∝ 1/σ²
From the previous slide, we saw that μ | σ², y ~ N(ȳ, σ²/n). What is the marginal distribution of σ²?
ICBST (it can be shown that) this is a scaled inverse-χ² density:
  σ² | y ~ Inv-χ²(n-1, s²) ≡ Inv-gamma( (n-1)/2, (n-1)s²/2 ),
where s² is the sample variance.
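A quick simulation check of this equivalence (my sketch, with hypothetical n and s²): draws generated via the scaled inverse-χ² recipe and via the inverse-gamma relationship should agree in distribution.

import numpy as np

rng = np.random.default_rng(1)
n, s2 = 20, 178.0   # hypothetical sample size and sample variance

# Scaled Inv-chi^2(n-1, s2): sigma2 = (n-1)*s2 / X with X ~ chi^2_{n-1}
draws_invchi2 = (n - 1) * s2 / rng.chisquare(n - 1, size=200_000)
# Inv-gamma((n-1)/2, (n-1)*s2/2): if sigma2 ~ Inv-gamma(a, b) then
# 1/sigma2 ~ Gamma(a, rate=b); numpy's gamma takes a scale, so scale = 2/((n-1)*s2)
draws_invgamma = 1.0 / rng.gamma((n - 1) / 2, 2.0 / ((n - 1) * s2), size=200_000)

# Both means should be near (n-1)*s2/(n-3)
print(draws_invchi2.mean(), draws_invgamma.mean())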
8. Two different methods to sample from the posterior in this case
- Method 1:
  - Sample directly from each of the two marginal distributions.
- Method 2:
  - Two stages:
    1) Sample a value from the distribution σ² | y.
    2) Sample a value from the distribution μ | σ², y.
  - Repeat these stages many times.
9. Example 2.1 from Congdon, but with the assumption of known variance relaxed
- yi denotes the systolic blood pressure of individuals i = 1,…,20.
- Assume that yi ~ N(μ, σ²) and p(μ, σ²) ∝ 1/σ².
- From the previous results for the normal distribution with the improper prior 1/σ², we found that σ² | y ~ Inv-χ²(n-1, s²), μ | σ², y ~ N(ȳ, σ²/n), and, marginally, μ | y follows a t distribution with n-1 degrees of freedom, location ȳ, and scale² = s²/n.
10. Method 1: directly sample from both marginal posterior distributions

model {
   for (i in 1:N) {
      ytemp[i] <- (y[i] - mean(y)) * (y[i] - mean(y))
   }
   # GENERATE SAMPLE QUANTITIES
   ybar <- mean(y)               # sample mean
   s2 <- sum(ytemp) / (N - 1)    # sample variance
   # GENERATE RANDOM DRAWS FROM THE MARGINAL DISTRIBUTION OF SIGMA-SQ
   # If sigma2 ~ inverse-gamma(alpha, beta) then 1/sigma2 ~ gamma(alpha, beta)
   sigma2 <- 1 / tau
   alpha <- (N - 1) / 2
   beta <- (N - 1) * s2 / 2
   tau ~ dgamma(alpha, beta)     # draws from the posterior precision
   # GENERATE RANDOM DRAWS FROM THE MARGINAL DISTRIBUTION OF MU
   # (reconstructed; the slide text cuts off before this line: marginally,
   #  mu | y is t with N-1 df, location ybar, and precision N/s2;
   #  dt takes mean, precision, df)
   mu ~ dt(ybar, N / s2, N - 1)
}

This is general code that can be used to model (among other things) Congdon 2.1 for the case with unknown mean and variance.
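For readers without WinBUGS, here is a rough Python cross-check of Method 1 (my sketch, not the course code), using the Congdon Example 2.1 blood pressure data that appear later with the slide 29 code:

import numpy as np

rng = np.random.default_rng(2)
y = np.array([98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
              123, 125, 129, 132, 154, 115, 126, 132, 136, 130], dtype=float)
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
T = 50_000

# Marginal draws of sigma2: sigma2 | y ~ Inv-chi^2(n-1, s2)
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=T)
# Marginal draws of mu: mu | y ~ t_{n-1} with location ybar and scale sqrt(s2/n)
mu = ybar + np.sqrt(s2 / n) * rng.standard_t(n - 1, size=T)

print(mu.mean(), sigma2.mean())   # compare with the WinBUGS summaries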
11. How do the analytic results and those from Method 1 compare?
[Table: analytic results vs. WinBUGS results]
12. Plots of posterior densities from WinBUGS (Method 1)
[Figure: posterior density plots]
13. Method 2: first draw from the marginal distribution of the posterior precision, then from the conditional distribution of the posterior mean

model {
   for (i in 1:N) {
      ytemp[i] <- (y[i] - mean(y)) * (y[i] - mean(y))
   }
   # GENERATE SAMPLE QUANTITIES
   ybar <- mean(y)               # sample mean
   s2 <- sum(ytemp) / (N - 1)    # sample variance
   # GENERATE RANDOM DRAWS FROM THE MARGINAL DISTRIBUTION OF SIGMA-SQ
   # If sigma2 ~ inverse-gamma(alpha, beta) then 1/sigma2 ~ gamma(alpha, beta)
   sigma2 <- 1 / tau
   alpha <- (N - 1) / 2
   beta <- (N - 1) * s2 / 2
   tau ~ dgamma(alpha, beta)     # draws from the posterior precision
   # GENERATE RANDOM DRAWS FROM THE CONDITIONAL DISTRIBUTION OF MU
   df <- N - 1                   # degrees of freedom (for reference; unused below)
   sigma2prime <- sigma2 / N     # variance of the sample mean
   inv_s2 <- 1 / sigma2prime     # precision of the sample mean (for reference)
   tauprime <- tau * N           # precision of the sample mean
   mu ~ dnorm(ybar, tauprime)    # draws from the posterior mean
}

This is general code that can be used to model (among other things) Congdon 2.1 for the case with unknown mean and variance.
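And a matching Python cross-check of Method 2 (again my sketch): draw the precision from its marginal gamma posterior, then draw μ from its conditional normal posterior given that draw.

import numpy as np

rng = np.random.default_rng(3)
y = np.array([98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
              123, 125, 129, 132, 154, 115, 126, 132, 136, 130], dtype=float)
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
T = 50_000

alpha, beta = (n - 1) / 2, (n - 1) * s2 / 2
tau = rng.gamma(alpha, 1.0 / beta, size=T)       # stage 1: tau | y ~ Gamma(alpha, rate=beta)
mu = rng.normal(ybar, 1.0 / np.sqrt(n * tau))    # stage 2: mu | tau, y ~ N(ybar, 1/(n*tau))

print(mu.mean(), (1.0 / tau).mean())             # posterior summaries for mu and sigma2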
14. How do the analytic results and those from Method 2 compare?
[Table: analytic results vs. WinBUGS results]
15. Method 2 is a simple example of Markov chain Monte Carlo numerical integration
- As may be becoming apparent, the greatest practical difficulty in Bayesian inference is computing the integrals. In fact, except for simple cases such as conjugate priors, researchers long lacked the ability to do Bayesian analysis at all.
- This integration must be done to identify the marginal posterior distribution, which is the object of analysis.
  → Many Bayesians suggest that the historical inability to compute these integrals is the only reason why we use the hypothesis testing framework today.
- Markov chain Monte Carlo (MCMC), a computationally intensive simulation method developed in the 1980s that replaces exact integration, made it possible to tackle more complex, realistic problems.
16. Basics of MCMC
- The goal of MCMC is to use simulation to take a large number of random draws from the posterior distribution. The resulting sample can be used to estimate the posterior mean, median, Bayesian credible interval (i.e., HDR), etc.
- In class, we will implement the Gibbs sampler, the most common form of MCMC, with WinBUGS.
  → BUGS = Bayesian inference Using Gibbs Sampling.
- We will generally duck the details of how the program makes its calculations, but it really isn't that complex in theory.
17. The intuition behind the Gibbs sampler
- Consider a problem with two parameters, θ1 and θ2, and let X denote the data.
- Suppose further that we know the conditional distributions p(θ1 | θ2, X) and p(θ2 | θ1, X).
- We need to find p(θ1 | X) and p(θ2 | X).
18.
- The Gibbs sampler proceeds by choosing some initial point, which we will denote (θ1(0), θ2(0)), from the parameter space.
  → This can be any reasonable value of θ.
- Then, we take draws from the two conditional distributions in the following sequence:
  θ1(1) ~ p(θ1 | θ2(0), X)
  θ2(1) ~ p(θ2 | θ1(1), X)
  θ1(2) ~ p(θ1 | θ2(1), X)
  θ2(2) ~ p(θ2 | θ1(2), X)
  θ1(3) ~ p(θ1 | θ2(2), X)
  θ2(3) ~ p(θ2 | θ1(3), X)
  etc.
This sequence of draws is a Markov chain because the values at step t depend only on the values at step t-1. If allowed to run long enough, the Gibbs sampler will converge to the true posterior distribution. If allowed to run sufficiently long after convergence, the Gibbs sampler produces a complete sample from the posterior distribution of θ.
19.
- This algorithm is easily generalized to the n-parameter case θ1, θ2, …, θn.
- For the t-th iteration of the Gibbs sampler we have:
  θ1(t) ~ p(θ1 | θ2(t-1), θ3(t-1), …, θn(t-1), X)
  θ2(t) ~ p(θ2 | θ1(t), θ3(t-1), …, θn(t-1), X)
  …
  θn(t) ~ p(θn | θ1(t), θ2(t), …, θn-1(t), X)
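A generic skeleton of this algorithm for the two-parameter case might look as follows (a sketch; the function and argument names are mine, and the user supplies the full-conditional samplers). The usage instantiates it for the normal model under the improper prior p(μ, τ) ∝ 1/τ, whose full conditionals are μ | τ, y ~ N(ȳ, 1/(nτ)) and τ | μ, y ~ Gamma(n/2, rate = Σ(yi - μ)²/2).

import numpy as np

def gibbs(draw_theta1, draw_theta2, theta1, theta2, n_iter=20_000):
    """Alternate draws from the two full conditionals, each conditioning on
    the most recent value of the other parameter."""
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        theta1 = draw_theta1(theta2)   # theta1(t) ~ p(theta1 | theta2(t-1), X)
        theta2 = draw_theta2(theta1)   # theta2(t) ~ p(theta2 | theta1(t), X)
        samples[t] = theta1, theta2
    return samples

# Usage: normal model with improper prior p(mu, tau) proportional to 1/tau
rng = np.random.default_rng(4)
y = np.array([98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
              123, 125, 129, 132, 154, 115, 126, 132, 136, 130], dtype=float)
n, ybar = len(y), y.mean()
samples = gibbs(lambda tau: rng.normal(ybar, 1.0 / np.sqrt(n * tau)),
                lambda mu: rng.gamma(n / 2, 2.0 / np.sum((y - mu) ** 2)),
                theta1=ybar, theta2=1.0)
print(samples[1000:, 0].mean())   # posterior mean of mu, discarding burn-in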
20. The Gibbs sampler is not rocket science, even if the original application was in physics
- We have already created a Gibbs sampler: we implemented one as Method 2 above.
- In that code, we used WinBUGS's random number generators ourselves rather than allowing the program to identify the conditional distributions, which will be the usual case.
- We had to do this because WinBUGS does not allow researchers to use improper priors.
- When we get to implementing the modern parametric Bayesians' priors, I will show how we can even use Excel to do MCMC.
21. Modern Parametric Bayesians and the normal model with unknown mean and variance
- y ~ N(μ, σ²), where μ and σ² are both unknown random variables.
- What prior distribution would a modern parametric Bayesian choose to satisfy the demands of convenience?
- What if we used the definition of conditional probability, so p(μ, σ²) = p(μ | σ²)p(σ²)?
22. Modern Parametric Bayesians and the normal model with unknown mean and variance
- y ~ N(μ, σ²), where μ and σ² are both unknown random variables.
- A modern parametric Bayesian would typically choose a conjugate prior.
- For the normal model with unknown mean and variance, the conjugate prior for the joint distribution of μ and σ² is the normal inverse-gamma distribution (i.e., the normal-inverse-χ²):
  p(μ, σ²) ~ N-Inv-χ²(μ0, σ0²/κ0; ν0, σ0²)
- Note there are four parameters in the prior: μ0, κ0, ν0, and σ0².
23.
- Suppose p(μ, σ²) ~ N-Inv-χ²(μ0, σ0²/κ0; ν0, σ0²).
- ICBST the above expression can be factored such that
  p(μ, σ²) = p(μ | σ²)p(σ²),
  where μ | σ² ~ N(μ0, σ²/κ0) and σ² ~ Inv-χ²(ν0, σ0²).
- Because this is a conjugate distribution for the normal model with unknown mean and variance, the posterior distribution will also be normal-Inv-χ².
24. The posterior distribution if y ~ N(μ, σ²) and μ, σ² ~ N-Inv-χ²(μ0, σ0²/κ0; ν0, σ0²)
- p(μ, σ² | y) ~ N-Inv-χ²(μn, σn²/κn; νn, σn²), where
  μn = (κ0μ0 + nȳ)/(κ0 + n)   (a weighted average of the prior mean and the data)
  κn = κ0 + n,  νn = ν0 + n
  νnσn² = ν0σ0² + (n-1)s² + (κ0n/(κ0 + n))(ȳ - μ0)²   (a weighted sum of the prior variance, the sample variance, and the distance between the sample and prior means)
25. If p(μ, σ² | y) ~ N-Inv-χ²(μn, σn²/κn; νn, σn²), we can factor the posterior distribution just like the prior:
  μ | σ², y ~ N(μn, σ²/κn) and σ² | y ~ Inv-χ²(νn, σn²).
These are essentially the same posterior distributions as we found with the improper priors. To implement the conjugate priors in WinBUGS, we would use the same code but substitute the values μn, σn²/κn, νn, and σn² into the posterior.
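A small helper (a sketch; the function name and the example prior values are mine) that computes these posterior parameters from the prior parameters and the data, after which the factored posterior can be sampled directly:

import numpy as np

def posterior_params(y, mu0, k0, v0, s02):
    """N-Inv-chi^2 conjugate updating for the normal model."""
    n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
    kn, vn = k0 + n, v0 + n
    mun = (k0 * mu0 + n * ybar) / kn   # weighted average of prior mean and data
    # weighted sum of prior variance, sample variance, and prior-sample disagreement
    sn2 = (v0 * s02 + (n - 1) * s2 + (k0 * n / kn) * (ybar - mu0) ** 2) / vn
    return mun, kn, vn, sn2

# Example with hypothetical, weakly informative prior values
y = np.array([98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
              123, 125, 129, 132, 154, 115, 126, 132, 136, 130], dtype=float)
print(posterior_params(y, mu0=120.0, k0=1.0, v0=1.0, s02=200.0))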
26. Excel implementation of the normal model with conjugate priors for the mean and variance
- To implement a Gibbs sampler in this case, we need to perform the following steps. (Everything subscripted with an n is a function of quantities that are already known.)
- In Excel, for each iteration t:
  sigma2_t = 1/GAMMAINV(RAND(), vn/2, 2/(vn*sn2))
  mu_t = NORMINV(RAND(), mun, SQRT(sigma2_t/kn))
  where the sigma2_t spreadsheet cell corresponds to the t-th draw. (GAMMAINV takes a shape and a scale, so shape νn/2 and scale 2/(νnσn²) make 1/σ² a draw from Gamma(νn/2, rate = νnσn²/2); NORMINV expects a standard deviation rather than a variance, hence the SQRT.)
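Because GAMMAINV and NORMINV are inverse CDFs, feeding them Uniform(0,1) draws reproduces the target distributions, and the same recipe can be checked outside Excel. The sketch below uses scipy's inverse CDFs, which follow the same shape/scale convention as GAMMAINV; the posterior parameter values are hypothetical.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
vn, sn2, mun, kn = 23.0, 180.0, 128.0, 24.0   # hypothetical posterior parameters
T = 100_000

# tau = 1/sigma2 ~ Gamma(vn/2, scale = 2/(vn*sn2)), drawn by inverting the CDF
tau = stats.gamma.ppf(rng.random(T), a=vn / 2, scale=2.0 / (vn * sn2))
sigma2 = 1.0 / tau
# mu | sigma2 ~ N(mun, sigma2/kn); the inverse normal CDF wants an SD, not a variance
mu = stats.norm.ppf(rng.random(T), loc=mun, scale=np.sqrt(sigma2 / kn))

print(sigma2.mean(), mu.mean())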
27. Lazy Modern Parametric Bayesians and the normal model with unknown mean and variance
- Suppose that y ~ N(μ, τ), where τ is the precision (τ = 1/σ²).
- From here on, when we talk about the normal distribution, you should expect that we will speak in terms of the precision τ rather than the variance σ². This is because WinBUGS is programmed to use τ rather than σ².
- Suppose also that you don't want to think too hard about the joint prior distribution of μ and τ, and assume that
  p(μ, τ) = p(μ)p(τ).
- What distributions would you choose for p(μ) and p(τ)?
28. Suppose that y ~ N(μ, τ). What priors would you choose for μ and τ?
- I would choose:
- μ ~ N(0, t) (where the variance t is a large number). This is because, if we expect something like the central limit theorem to hold, then the distribution of the sample mean should be approximately normal for large n.
- τ ~ Gamma(a, b) (where a and b are small numbers). This is because the gamma distribution is bounded below at zero and, unlike the χ² distribution, which shares this property, it is not constrained to have its variance tied to its mean (the χ² has a single parameter, so fixing the mean fixes the variance).
  → Note how we now have to talk about the mean of the distribution of the variance.
29.
model {
   for (i in 1:N) {
      y[i] ~ dnorm(mu, tau)
   }
   mu ~ dnorm(0, .001)
   tau ~ dgamma(.01, .001)
}

list(N = 20, y = c(98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
                   123, 125, 129, 132, 154, 115, 126, 132, 136, 130))
30. WinBUGS results for the model where y ~ N(μ, τ), μ ~ N(0, .001), and τ ~ Γ(.01, .001)
Notice that the mean of the posterior mean is less than 128 (the sample mean) despite the fact that we are using what appear to be diffuse prior beliefs: even a diffuse prior on μ centered at zero shrinks the posterior mean slightly toward zero.
31. The Subjective Bayesian and the normal model with unknown mean and variance
- The subjective Bayesian framework provides little guidance about which prior distribution one should choose.
- In a sense, that is the point of the subjective approach: it is subjective.
- You are free to pick whatever prior distribution you want: multi-modal, uniform, high or low variance, skewed, constrained to lie between a certain set of values, etc.
- One of the key difficulties is that the prior distributions probably are not independent (i.e., p(θ1, θ2) ≠ p(θ1)p(θ2)).
- For example, regression coefficients are generally not independent, even if that isn't transparent in your STATA output. If you want to incorporate subjective beliefs, this non-independence should be taken into account.
32. Some general guidelines
- I recommend the following general guidelines:
- 1) If possible, use standard distributions (i.e., conjugate or semi-conjugate) and choose parameters that fix the mean, variance, kurtosis, etc. to be some desirable level.
- 2) Sample from the prior predictive distribution and check to see whether your results make sense. Mechanically, perform the following steps (a sketch follows this list):
  i) Take a random draw θ′ from the joint prior distribution of θ.
  ii) Take a random draw Y′ from the pdf of Y | θ with θ = θ′.
  iii) Repeat steps i and ii several thousand times to provide a sample that you can use to summarize the prior predictive distribution.
  iv) Generate various summaries of the prior predictive distribution and check to see whether the model's predictions are consistent with your beliefs about the data-generating process.
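As an illustration (my sketch), here are steps i through iv for the semi-conjugate priors used in the slide 29 WinBUGS example. With priors this diffuse, the prior predictive distribution is wildly dispersed, which is exactly the kind of mismatch the check is designed to expose.

import numpy as np

rng = np.random.default_rng(6)
T, n = 5_000, 20

# Step i: draw (mu, tau) from the joint prior
mu = rng.normal(0.0, np.sqrt(1 / 0.001), size=T)   # N(0, precision .001), so sd ~ 31.6
tau = rng.gamma(0.01, 1 / 0.001, size=T)           # Gamma(.01, rate .001)
tau = np.maximum(tau, 1e-300)                      # guard against underflow to zero
# Step ii: draw a data set for each (mu, tau); step iii: T repetitions at once
y_rep = rng.normal(mu[:, None], 1.0 / np.sqrt(tau)[:, None], size=(T, n))
# Step iv: summarize; with priors this diffuse the implied data range is absurd,
# which is the kind of prior-belief mismatch this check is meant to reveal
print(np.percentile(y_rep, [2.5, 50, 97.5]))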