1
Bayesian Analysis of the Normal Distribution, Part II
  • Set-up of the basic model of a normally distributed random variable with unknown mean and variance (a two-parameter model).
  • Discussion of philosophies of prior selection.
  • Implementation of different priors, with a discussion of MCMC methods.

2
The normal model with unknown mean and variance
  • Let's extend the normal model to the case where the variance parameter is also assumed to be unknown. Thus, yᵢ ~ N(μ, σ²), where μ and σ² are both unknown random variables.
  • This is our first example of a multi-parameter model. The Bayesian set-up should still look familiar:
  • p(μ, σ² | y) ∝ p(μ, σ²) p(y | μ, σ²).
  • Note that we would like to make inferences about the marginal distributions p(μ | y) and p(σ² | y) rather than the joint distribution p(μ, σ² | y). Ultimately, we'd like to find
  • p(μ | y) = ∫ p(μ | σ², y) p(σ² | y) dσ²
  • What should we choose for the prior distribution p(μ, σ²)?

3
Different types of Bayesians choose different priors
  • Classical Bayesians: the prior is a necessary evil.
  • → choose priors that interject the least information possible.
  • Modern Parametric Bayesians: the prior is a useful convenience.
  • → choose prior distributions with desirable properties (e.g., conjugacy). Given a distributional choice, prior parameters are chosen to interject the least information possible.
  • Subjective Bayesians: the prior is a summary of old beliefs.
  • → choose prior distributions based on previous knowledge, either the results of earlier studies or non-scientific opinion.

4
The Classical Bayesian and the normal model with unknown mean and variance
  • y ~ N(μ, σ²), where μ and σ² are both unknown random variables.
  • What prior distribution would we choose to represent the absence of any knowledge in this instance?
  • What if we assumed that the two parameters were independent, so that p(μ, σ²) = p(μ)p(σ²)?

5
Classical Bayesians and the normal model with unknown mean and variance
  • y ~ N(μ, σ²), where μ and σ² are both unknown random variables. What prior would a classical Bayesian use?
  • If p(μ, σ²) = p(μ)p(σ²), one option would be to assume uniform prior distributions for both parameters (for the variance, uniform on the log scale). Thus,
  • p(μ) ∝ c for −∞ < μ < ∞
  • p(σ²) ∝ 1/σ² for 0 < σ² < ∞
  • And the joint density would be p(μ, σ²) ∝ 1/σ²
  • Are these distributions proper?

6
  • yᵢ ~ N(μ, σ²) for i = 1, …, n and p(μ, σ²) ∝ 1/σ²

It can be shown that the conditional posterior distribution of the mean is μ | σ², y ~ N(ȳ, σ²/n).
Notice the parallel between these results and what we would find in classical theory.
7
yᵢ ~ N(μ, σ²) for i = 1, …, n and p(μ, σ²) ∝ 1/σ²
From the previous slide, we saw that μ | σ², y ~ N(ȳ, σ²/n).
What is the marginal distribution of σ²?
ICBST (it can be shown that) this is a scaled inverse-χ² density:
σ² | y ~ Inv-χ²(n−1, s²), i.e., σ² | y ~ Inv-gamma( (n−1)/2, (n−1)s²/2 )
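For reference, the standard posterior results for this improper-prior model can be collected in one place (ȳ is the sample mean, s² the sample variance); the last line, the Student-t marginal for μ, is what Method 1 on the next slide samples from directly:

    \mu \mid \sigma^2, y \sim \mathrm{N}\big(\bar{y},\ \sigma^2/n\big), \qquad
    \sigma^2 \mid y \sim \text{Inv-}\chi^2\big(n-1,\ s^2\big), \qquad
    \mu \mid y \sim t_{n-1}\big(\bar{y},\ s^2/n\big)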
8
Two different methods to sample from the posterior in this case
  • Method 1:
  • Sample directly from each of the two marginal distributions.
  • Method 2:
  • Two stages:
  • 1) Sample a value from the distribution σ² | y
  • 2) Sample a value from the distribution μ | σ², y
  • Repeat these stages lots of times.

9
Example 2.1 from Congdon, but with the assumption of known variance relaxed
  • yᵢ denotes the systolic blood pressure of individuals i = 1, …, 20
  • Assume that yᵢ ~ N(μ, σ²) and p(μ, σ²) ∝ 1/σ²
  • From the previous results for the normal distribution with the improper prior 1/σ², we found that σ² | y ~ Inv-χ²(19, s²) and μ | σ², y ~ N(ȳ, σ²/20).
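Before turning to the WinBugs code on the next slides, both methods are easy to sketch in Python with NumPy (an illustrative sketch, not part of the original slides), using the 20 observations listed with the model on slide 29:

    import numpy as np

    rng = np.random.default_rng(0)
    y = np.array([98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
                  123, 125, 129, 132, 154, 115, 126, 132, 136, 130])
    n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
    T = 10_000  # number of posterior draws

    # Method 1: sample each parameter directly from its marginal posterior.
    sigma2_m1 = (n - 1) * s2 / rng.chisquare(n - 1, size=T)         # sigma^2 | y ~ Inv-chi^2(n-1, s^2)
    mu_m1 = ybar + np.sqrt(s2 / n) * rng.standard_t(n - 1, size=T)  # mu | y ~ t_{n-1}(ybar, s^2/n)

    # Method 2: sample sigma^2 | y, then mu | sigma^2, y; repeat many times.
    sigma2_m2 = (n - 1) * s2 / rng.chisquare(n - 1, size=T)
    mu_m2 = rng.normal(ybar, np.sqrt(sigma2_m2 / n))                # mu | sigma^2, y ~ N(ybar, sigma^2/n)

    print(mu_m1.mean(), mu_m2.mean())  # both should be close to ybar = 128

Both methods target the same joint posterior; Method 2 simply reaches the marginal of μ by averaging over the σ² draws.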

10
  • Method 1: directly sample from both marginal posterior distributions.

    model {
      for (i in 1:N) {
        ytemp[i] <- (y[i] - mean(y[])) * (y[i] - mean(y[]))
      }
      # GENERATE SAMPLE QUANTITIES
      ybar <- mean(y[])             # sample mean
      s2 <- sum(ytemp[]) / (N - 1)  # sample variance
      # GENERATE RANDOM DRAWS FROM THE MARGINAL DISTRIBUTION OF SIGMA-SQ
      # if sigma2 ~ inverse-gamma(alpha, beta), then 1/sigma2 ~ gamma(alpha, beta)
      sigma2 <- 1 / tau
      alpha <- (N - 1) / 2
      beta <- (N - 1) * s2 / 2
      tau ~ dgamma(alpha, beta)     # draws from the posterior precision
      # GENERATE RANDOM DRAWS FROM THE MARGINAL DISTRIBUTION OF MU
      df <- N - 1
      t.prec <- N / s2
      mu ~ dt(ybar, t.prec, df)     # mu | y ~ t_{N-1}(ybar, s2/N); reconstructed line, the transcript truncates here
    }

This is general code that can be used to model (among other things) Congdon 2.1 for the case with unknown mean and variance.
11
How do the analytic results and those from Method 1 compare?
(Analytic results vs. WinBugs results: the comparison table is not included in the transcript.)
12
Plots of posterior densities from WinBugs, Method 1
13
Method 2: first draw from the marginal distribution of the posterior precision, then from the conditional distribution of the posterior mean.

    model {
      for (i in 1:N) {
        ytemp[i] <- (y[i] - mean(y[])) * (y[i] - mean(y[]))
      }
      # GENERATE SAMPLE QUANTITIES
      ybar <- mean(y[])             # sample mean
      s2 <- sum(ytemp[]) / (N - 1)  # sample variance
      # GENERATE RANDOM DRAWS FROM THE MARGINAL DISTRIBUTION OF SIGMA-SQ
      # if sigma2 ~ inverse-gamma(alpha, beta), then 1/sigma2 ~ gamma(alpha, beta)
      sigma2 <- 1 / tau
      alpha <- (N - 1) / 2
      beta <- (N - 1) * s2 / 2
      tau ~ dgamma(alpha, beta)     # draws from the posterior precision
      # GENERATE RANDOM DRAWS FROM THE CONDITIONAL DISTRIBUTION OF MU
      df <- N - 1                   # degrees of freedom (of the marginal t for mu)
      sigma2prime <- sigma2 / N     # variance of the sample mean
      inv_s2 <- 1 / sigma2prime     # precision of the sample mean
      tauprime <- tau * N           # precision of mu given sigma2
      mu ~ dnorm(ybar, tauprime)    # draws from the posterior mean given sigma2
    }

This is general code that can be used to model (among other things) Congdon 2.1 for the case with unknown mean and variance.
14
How do the analytic results and those from Method 2 compare?
(Analytic results vs. WinBugs results: the comparison table is not included in the transcript.)
15
Method Two is a simple example of Markov Chain Monte Carlo numerical integration
  • As may be becoming apparent, the greatest practical difficulty in Bayesian inference is computing the integrals. In fact, except for simple cases like conjugate priors, researchers long lacked the ability to do Bayesian analysis at all.
  • This integration must be done to identify the marginal posterior distribution, which is the object of analysis.
  • → Many Bayesians suggest that the inability to compute these integrals is the only reason why we use the hypothesis-testing framework today.
  • Markov Chain Monte Carlo (MCMC), a computationally intensive simulation method that replaces exact integration, was developed in the 1980s and made it possible to tackle more complex, realistic problems.

16
Basics of MCMC
  • The goal of MCMC is to use simulation to take a large number of random draws from the posterior distribution. The resulting sample can be used to estimate the posterior mean, median, Bayesian credible interval (i.e., HDR), etc.
  • In class, we will implement the Gibbs sampler, the most common form of MCMC, with WinBugs.
  • → BUGS = Bayesian inference Using Gibbs Sampling
  • We will generally duck the details of how the program is making its calculations, but it really isn't that complex in theory.

17
The intuition behind the Gibbs Sampler
  • Consider a problem with two parameters, θ₁ and θ₂, and let X denote the data.
  • Suppose further that we know the conditional distributions p(θ₁ | θ₂, X) and p(θ₂ | θ₁, X).
  • We need to find p(θ₁ | X) and p(θ₂ | X).

18
  • The Gibbs Sampler proceeds by choosing some initial point, which we will denote (θ₁⁰, θ₂⁰), from the parameter space.
  • → This can be any reasonable value of θ.
  • Then, we take draws from the two conditional distributions in the following sequence:
  • θ₁¹ ~ p(θ₁ | θ₂⁰, Y)
  • θ₂¹ ~ p(θ₂ | θ₁¹, Y)
  • θ₁² ~ p(θ₁ | θ₂¹, Y)
  • θ₂² ~ p(θ₂ | θ₁², Y)
  • θ₁³ ~ p(θ₁ | θ₂², Y)
  • θ₂³ ~ p(θ₂ | θ₁³, Y)
  • etc.

This sequence of draws is a Markov chain because the values at step t only depend on the values at step t−1. If allowed to run long enough, the Gibbs sampler will converge to the true posterior distribution. If allowed to run for sufficiently long after convergence, the Gibbs sampler produces a complete sample from the distribution of θ.
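As a concrete illustration (a minimal sketch, not from the original slides), here is a two-parameter Gibbs sampler for the running normal model with prior 1/σ², alternating between the two full conditionals:

    import numpy as np

    rng = np.random.default_rng(1)
    y = np.array([98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
                  123, 125, 129, 132, 154, 115, 126, 132, 136, 130])
    n, ybar = len(y), y.mean()
    T = 5000
    mu_draws, sig2_draws = np.empty(T), np.empty(T)

    sig2 = y.var(ddof=1)  # any reasonable starting value
    for t in range(T):
        # full conditional: mu | sigma^2, y ~ N(ybar, sigma^2 / n)
        mu = rng.normal(ybar, np.sqrt(sig2 / n))
        # full conditional: sigma^2 | mu, y ~ Inv-chi^2(n, v), with v = mean((y - mu)^2)
        v = np.mean((y - mu) ** 2)
        sig2 = n * v / rng.chisquare(n)
        mu_draws[t], sig2_draws[t] = mu, sig2

    # discard early draws as burn-in before summarizing
    print(mu_draws[500:].mean(), sig2_draws[500:].mean())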
19
  • This algorithm can be easily generalized to the n-parameter case θ₁, θ₂, …, θₙ.
  • For the tth iteration of the Gibbs Sampler we have:
  • θ₁ᵗ ~ p(θ₁ | θ₂ᵗ⁻¹, θ₃ᵗ⁻¹, …, θₙᵗ⁻¹, Y)
  • θ₂ᵗ ~ p(θ₂ | θ₁ᵗ, θ₃ᵗ⁻¹, …, θₙᵗ⁻¹, Y)
  • ⋮
  • θₙᵗ ~ p(θₙ | θ₁ᵗ, θ₂ᵗ, …, θₙ₋₁ᵗ, Y)

20
The Gibbs Sampler is not rocket science, even if the original application was in physics
  • We have already created a Gibbs Sampler: we did so when we implemented Method 2 above.
  • In that code, we used WinBugs' random number generators rather than allowing the program to identify the conditional distributions itself, which will be the usual case.
  • We had to do this because WinBugs does not allow researchers to use improper priors.
  • When we get to implementing the modern parametric Bayesians' priors, I will show how we can even use Excel to do MCMC.

21
Modern Parametric Bayesians and the normal model with unknown mean and variance
  • y ~ N(μ, σ²), where μ and σ² are both unknown random variables.
  • What prior distribution would a modern parametric Bayesian choose to satisfy the demands of convenience?
  • What if we used the definition of conditional probability, so that p(μ, σ²) = p(μ | σ²)p(σ²)?

22
Modern Parametric Bayesians and the normal model with unknown mean and variance
  • y ~ N(μ, σ²), where μ and σ² are both unknown random variables.
  • A modern parametric Bayesian would typically choose a conjugate prior.
  • For the normal model with unknown mean and variance, the conjugate prior for the joint distribution of μ and σ² is the normal inverse-gamma distribution (i.e., the normal-inverse-χ²):
  • p(μ, σ²) ~ N-Inv-χ²(μ₀, σ₀²/κ₀; ν₀, σ₀²)

Four parameters in the prior: μ₀, κ₀, ν₀, σ₀².
23
  • Suppose p(μ, σ²) ~ N-Inv-χ²(μ₀, σ₀²/κ₀; ν₀, σ₀²).
  • ICBST the above expression can be factored such that
  • p(μ, σ²) = p(μ | σ²)p(σ²)
  • where μ | σ² ~ N(μ₀, σ²/κ₀) and σ² ~ Inv-χ²(ν₀, σ₀²)
  • Because this is a conjugate distribution for the normal distribution with unknown mean and variance, the posterior distribution will also be normal-Inv-χ².

24
The posterior distribution if y ~ N(μ, σ²) and μ, σ² ~ N-Inv-χ²(μ₀, σ₀²/κ₀; ν₀, σ₀²)
  • p(μ, σ² | y) ~ N-Inv-χ²(μₙ, σₙ²/κₙ; νₙ, σₙ²)

Here μₙ is a weighted average of the prior mean and the data, and νₙσₙ² is a weighted sum of the prior variance, the sample variance, and the distance between the sample and prior means.
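The updating formulas behind those weights (standard results for this conjugate family, written in the slide's notation with n observations, sample mean ȳ, and sample variance s²) are:

    \mu_n = \frac{\kappa_0\mu_0 + n\bar{y}}{\kappa_0 + n}, \qquad
    \kappa_n = \kappa_0 + n, \qquad
    \nu_n = \nu_0 + n, \qquad
    \nu_n\sigma_n^2 = \nu_0\sigma_0^2 + (n-1)s^2 + \frac{\kappa_0 n}{\kappa_0 + n}\,(\bar{y}-\mu_0)^2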
25
If p(μ, σ² | y) ~ N-Inv-χ²(μₙ, σₙ²/κₙ; νₙ, σₙ²), we can factor the posterior distribution just like the prior: μ | σ², y ~ N(μₙ, σ²/κₙ) and σ² | y ~ Inv-χ²(νₙ, σₙ²).
These are essentially the same posterior distributions as we found with the improper priors. To implement the conjugate priors in WinBugs we would use the same code, but substitute the values μₙ, σₙ²/κₙ, νₙ, σₙ² into the posterior.
26
Excel Implementation of the normal model with conjugate priors for the mean and variance
  • To implement a Gibbs Sampler in this case, we need to perform the following steps.
  • Everything subscripted with an n is a function of quantities that are known.

In Excel, for each iteration t:
σ²(t) = 1/GAMMAINV(RAND(), νₙ/2, 2/(νₙσₙ²))
μ(t) = NORMINV(RAND(), μₙ, SQRT(σ²(t)/κₙ))
where the σ²(t) spreadsheet cell corresponds to the tth draw.
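The same two-line sampler is easy to check outside Excel. Here is an illustrative NumPy sketch (the prior settings below are hypothetical, chosen only for demonstration):

    import numpy as np

    rng = np.random.default_rng(2)
    y = np.array([98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
                  123, 125, 129, 132, 154, 115, 126, 132, 136, 130])
    n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

    # hypothetical prior settings for illustration
    mu0, kappa0, nu0, sigma0sq = 120.0, 1.0, 1.0, 100.0

    # posterior parameters from the updating formulas on slide 24
    kappan = kappa0 + n
    nun = nu0 + n
    mun = (kappa0 * mu0 + n * ybar) / kappan
    sigmansq = (nu0 * sigma0sq + (n - 1) * s2
                + kappa0 * n / kappan * (ybar - mu0) ** 2) / nun

    T = 10_000
    sigma2 = nun * sigmansq / rng.chisquare(nun, size=T)  # sigma^2 | y ~ Inv-chi^2(nu_n, sigma_n^2)
    mu = rng.normal(mun, np.sqrt(sigma2 / kappan))        # mu | sigma^2, y ~ N(mu_n, sigma^2/kappa_n)

Each (mu, sigma2) pair is one draw from the joint posterior; histograms of the two columns estimate the marginals, just as in the spreadsheet.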
27
Lazy Modern Parametric Bayesians and the normal model with unknown mean and variance
  • Suppose that y ~ N(μ, τ), where τ is the precision.
  • From here on, when we talk about the normal distribution you should expect that we will speak in terms of the precision τ rather than the variance σ². This is because WinBugs is programmed to use τ rather than σ².
  • Suppose also that you don't want to think too hard about the prior joint distribution of μ and τ, and assume that
  • p(μ, τ) = p(μ)p(τ)
  • What distributions would you choose for p(μ) and p(τ)?

28
Suppose that y ~ N(μ, τ). What priors would you choose for μ and τ?
  • I would choose:
  • μ ~ N(0, t) (where t is a large number)
  • This is because, if we expect something like the central limit theorem to hold, then the distribution of the sample mean should be approximately normal for large n.
  • τ ~ Γ(a, b) (where a and b are small numbers)
  • This is because the gamma distribution is bounded below at zero and, unlike the χ² distribution (which shares this property), its mean and variance can be set independently.
  • → Note how we now have to talk about the mean of the distribution of the variance.
29
    model {
      for (i in 1:N) {
        y[i] ~ dnorm(mu, tau)
      }
      mu ~ dnorm(0, .001)
      tau ~ dgamma(.01, .001)
    }

    # data
    list(N = 20, y = c(98, 160, 136, 128, 130, 114, 123, 134, 128, 107,
                       123, 125, 129, 132, 154, 115, 126, 132, 136, 130))

30
WinBugs results for the model where y ~ N(μ, τ) and μ ~ N(0, .001), τ ~ Γ(.01, .001)
Notice that the mean of the posterior mean is less than 128 despite the fact that we are using what appear to be diffuse prior beliefs: the prior on μ is centered at zero, so even a very small prior precision pulls the posterior mean slightly below the sample mean of 128.
31
The Subjective Bayesian and the normal model with unknown mean and variance
  • The subjective Bayesian framework provides little guidance about what prior distribution one should choose.
  • In a sense, that is the point of the subjective approach: it is subjective.
  • You are free to pick whatever prior distribution you want: multi-modal, uniform, high or low variance, skewed, constrained to lie between a certain set of values, etc.
  • One of the key difficulties is that the prior distributions probably are not independent (i.e., p(θ₁, θ₂) ≠ p(θ₁)p(θ₂)).
  • For example, regression coefficients are generally not independent, even if that isn't transparent in your STATA output. If you want to incorporate subjective beliefs, this non-independence should be taken into account.

32
Some General Guidelines
  • I recommend the following general guidelines:
  • 1) If possible, use standard distributions (i.e., conjugate or semi-conjugate) and choose parameters that fix the mean, variance, kurtosis, etc. at some desirable level.
  • 2) Sample from the prior predictive distribution and check to see if your results make sense.
  • Mechanically, perform the following steps (sketched in code below):
  • i) take a random draw θ′ from the joint prior distribution of θ;
  • ii) take a random draw Y′ from the pdf of Y | θ, with θ = θ′;
  • iii) repeat steps i and ii several thousand times to provide a sample that you can use to summarize the prior predictive distribution;
  • iv) generate various summaries of the prior predictive distribution and check to see if the model's predictions are consistent with your beliefs about the data-generating process.
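A minimal NumPy sketch of steps i–iv for the conjugate normal model (illustrative only; the prior settings are hypothetical):

    import numpy as np

    rng = np.random.default_rng(3)
    mu0, kappa0, nu0, sigma0sq = 120.0, 1.0, 1.0, 100.0  # hypothetical prior settings
    n_sims, n_obs = 5000, 20

    # i) draw (mu, sigma^2) from the joint prior; vectorized over n_sims repetitions (step iii)
    sigma2 = nu0 * sigma0sq / rng.chisquare(nu0, size=n_sims)  # sigma^2 ~ Inv-chi^2(nu0, sigma0^2)
    mu = rng.normal(mu0, np.sqrt(sigma2 / kappa0))             # mu | sigma^2 ~ N(mu0, sigma^2/kappa0)

    # ii) draw a dataset Y' of n_obs observations for each (mu, sigma^2) pair
    ysim = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_sims, n_obs))

    # iv) summarize the prior predictive distribution of, e.g., the sample mean
    print(np.percentile(ysim.mean(axis=1), [2.5, 50, 97.5]))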