Bayesian analysis of a one-parameter model
1
Bayesian analysis of a one-parameter model
  • I. The binomial distribution with a uniform prior
  • Integration tricks
  • II. Posterior Interpretation
  • III. The binomial distribution with a beta prior
  • Conjugate priors and sufficient statistics

2
Review of the Bayesian Setup
  • From the Bayesian perspective, there are known
    and unknown quantities.
  • - The known quantity is the data, denoted D.
  • - The unknown quantities are the parameters
    (e.g. mean, variance, missing data), denoted θ.
  • To make inferences about the unknown quantities,
    we stipulate a joint probability function that
    describes how we believe these quantities behave
    in conjunction, p(θ, D).
  • Using Bayes' Rule, this joint probability
    function can be rearranged to make inferences
    about θ:
  • p(θ | D) = p(θ) p(D | θ) / p(D)
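The rearrangement above can be sketched numerically on a grid. The sketch below is illustrative only (the data y = 7 successes in n = 10 trials are made up, not from the slides):

```python
# Discretized Bayes' rule: posterior ∝ prior × likelihood, normalized.
# Hypothetical data: y = 7 successes in n = 10 Bernoulli trials.
n, y = 10, 7
grid = [(i + 0.5) / 1000 for i in range(1000)]    # grid over theta in (0, 1)
prior = [1.0 / 1000] * 1000                       # uniform prior p(theta)
like = [t**y * (1 - t)**(n - y) for t in grid]    # p(D | theta), up to a constant
joint = [p * l for p, l in zip(prior, like)]      # p(theta, D) on the grid
p_d = sum(joint)                                  # normalizing constant p(D)
posterior = [j / p_d for j in joint]              # p(theta | D), sums to one
```

Dividing by p(D) is exactly what makes the grid posterior sum to one, mirroring the role of the normalizing constant discussed on the next slide.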

3
Review of the Bayesian Set-Up cont.
  • L(θ | D) = p(D | θ) is the likelihood function for θ.
  • p(D) = ∫ p(θ) p(D | θ) dθ is the normalizing
    constant or the prior predictive distribution.
  • It is the normalizing constant because it ensures
    that the posterior distribution of θ integrates
    to one.
  • It is the prior predictive distribution because
    it is not conditional on a previous observation
    of the data-generating process (prior) and
    because it is the distribution of an observable
    quantity (predictive).

4
Review of the Bayesian Set-Up cont.
Why are we allowed to do this? Why might it not
be as useful?
5
Example The Binomial Distribution
  • Suppose X1, X2, ..., Xn are independent random
    draws from the same Bernoulli distribution with
    parameter θ.
  • Thus, Xi ~ Bernoulli(θ) for i = 1, ..., n,
  • or equivalently, Y = Σ Xi ~ Binomial(θ, n).
  • The joint distribution of Y and θ is the product
    of the conditional distribution of Y and the
    prior distribution of θ.
  • What distribution might be a reasonable choice
    for the prior distribution of θ? Why?

6
Binomial Distribution cont.
  • If Y ~ Bin(θ, n), a reasonable prior distribution
    for θ must be bounded between zero and one.
  • One option is the uniform dist.: θ ~ Unif(0, 1).

As it happens, this is a proper posterior density
function. How can you tell?
7
Binomial Distribution cont.
  • Let Y ~ Bin(θ, n) and θ ~ Unif(0, 1).

You cannot just call the posterior a binomial
distribution because you are conditioning on Y,
and θ is a random variable, not the other way
around.
This is the normalization constant that transforms
θ^y (1-θ)^(n-y) into a beta distribution.
8
Application: The Cultural Consensus Model
  • A researcher examined the level of consensus,
    denoted θ, among n = 24 Guatemalan women about
    whether or not polio (as well as other diseases)
    was thought to be contagious. In this case, 17
    women said polio was contagious.
  • Let Xi = 1 if respondent i thought polio was
    contagious and Xi = 0 otherwise.
  • Let Y = Σi Xi ~ Bin(θ, 24) and let θ ~ Unif(0, 1).
  • Based on the previous slide,
  • p(θ | Y, n) = Beta(Y+1, n-Y+1).
  • Substitute n = 24 and Y = 17 into the posterior
    distribution.
  • Thus, p(θ | Y, n) = Beta(18, 8).

9
The Posterior Distribution
  • The posterior distribution summarizes all that we
    know after analyzing the data.
  • How do we interpret the posterior distribution
    p(θ | Y, n) = Beta(18, 8)?
  • One option is to examine it graphically.

10
Posterior Summaries
  • The full posterior contains too much information,
    especially in multi-parameter models. So, we use
    summary statistics (e.g. mean, variance, HDR).
  • 2 methods for generating summary stats:
  • 1) Analytical solutions: use the well-known
    analytic solutions for the mean, variance, etc.
    of the various posterior distributions.
  • 2) Numerical solutions: use a random number
    generator to draw a large number of values from
    the posterior distribution, then compute summary
    stats from those random draws.

11
Analytic Summaries of the Posterior
  • Analytic summaries are based on standard results
    from probability theory (see the handout from
    Gill's text).
  • Continuing our example, p(θ | Y, n) = Beta(18, 8).
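The analytic route can be sketched directly from the standard Beta(a, b) moment formulas (a sketch; the numbers in the comments are rounded):

```python
# Standard analytic summaries of a Beta(a, b) distribution,
# applied to the posterior Beta(18, 8).
a, b = 18, 8
post_mean = a / (a + b)                          # = 18/26 ≈ 0.692
post_mode = (a - 1) / (a + b - 2)                # = 17/24 ≈ 0.708
post_var = a * b / ((a + b)**2 * (a + b + 1))    # ≈ 0.0079
```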

12
Numerical Summaries of the Posterior
  • To create numerical summaries from the posterior,
    you need a random number generator.
  • To summarize p(θ | Y, n) = Beta(18, 8):
  • Draw a large number of random samples from a
    Beta(18,8) distribution
  • Calculate the sample statistics from that set of
    random samples.

13
Numerical Summaries of the Posterior
  • S-Plus code (should also work in R) for the
    Beta(18, 8) summary:

    # true posterior plot (see before)
    x <- 0:1000/1000
    post <- dbeta(x, 18, 8)
    plot(x, post)
    # take 1000 draws from the posterior
    rands <- rbeta(1000, 18, 8)
    # create summaries of those draws
    mean(rands)
    median(rands)
    var(rands)
    hist(rands, 20)

Mean(θ) ≈ .70, Median(θ) ≈ .70, Var(θ) ≈ .01
14
Highest Posterior Density Regions (also known
as Bayesian confidence or credible intervals)
  • Highest Density Regions (HDRs) are intervals
    containing a specified posterior probability. The
    figure below plots the 95% highest posterior
    density region.

Beta(18, 8)
95% HDR: [.51, .84]
15
Identification of the HDR
  • It is easiest to find the Highest Density Region
    numerically.
  • In S-Plus, to find the 95% HDR:

    # take 1000 draws from the posterior
    rands <- rbeta(1000, 18, 8)
    # sort the draws, then identify the thresholds
    # for the 95% credible interval
    quantile(rands, c(.025, .975))

16
An alternative HDR
  • With asymmetric posterior distributions, it makes
    more sense to identify regions whose endpoints have
    equal density heights, rather than equal tail mass.
  • (I haven't figured out a cute way to do this
    numerically.)

Beta(18, 8): equal-height HDR
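One simple numerical approach (a sketch, not from the slides): among all intervals containing 95% of sorted posterior draws, the equal-height (highest-density) interval is the shortest one, so we can slide a 95% window across the sorted draws and keep the narrowest:

```python
import random

# Shortest-interval approximation to the 95% HPD region for Beta(18, 8).
random.seed(1)
draws = sorted(random.betavariate(18, 8) for _ in range(100_000))
k = int(0.95 * len(draws))              # number of draws inside the interval
# Slide every window of k consecutive draws; keep the narrowest one.
i = min(range(len(draws) - k), key=lambda j: draws[j + k] - draws[j])
lo, hi = draws[i], draws[i + k]
# (lo, hi) should roughly match the [.51, .84] region shown earlier.
```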
17
Confidence Intervals vs. Bayesian Credible
Intervals
  • Differing interpretations:
  • The Bayesian credible interval is the probability,
    given the data, that the true value of θ lies in
    the interval.
  • Technically, P(θ ∈ Interval | X) = ∫_Interval p(θ | X) dθ
  • The frequentist α-percent confidence interval is
    the region of the sampling distribution for θ
    such that, given the observed data, one would
    expect (100-α) percent of future estimates of
    θ to be outside that interval.
  • Technically, α/100 = ∫_a^b g(u | θ) du

u is a dummy variable of integration for the
estimated value of θ.
The limits a and b are functions of the data.
18
Confidence Intervals vs. Bayesian Credible
Intervals
  • But often the results appear similar
  • If Bayesians use non-informative priors and
    there is a large number of observations (often
    several dozen will do), HDRs and frequentist
    confidence intervals will coincide numerically.
  • We will talk more about this when we cover the
    great p-value debate, but this is only a
    coincidence.
  • The interpretation of the two quantities is
    entirely different.

19
Returning to the Binomial Distribution
  • If Y ~ Bin(n, θ), the uniform prior is just one of
    an infinite number of possible prior
    distributions.
  • What other distributions could we use?
  • A reasonable alternative to the unif(0,1)
    distribution is the beta distribution.

Can you show that Beta(1,1) is a uniform(0,1)
distribution?
20
Prior Consequences: Plots of 4 Different Beta
Distributions
Beta(5,5)
Beta(3,10)
Beta(10,3)
Beta(100,30)
21
The Binomial Distribution with Beta Prior
  • If Y ~ Bin(n, θ) and θ ~ Beta(α, β), then

This is a very nasty-looking integral. Rather
than computing it directly, we shall use a
standard trick in the Bayesian toolbox:
1) Find some multiplicative constant c such that
   the integrand multiplied by c integrates to one,
   i.e. try to transform the integrand into a
   well-known pdf.
2) Multiply by c and c^-1.
3) Since the part multiplied by c integrates to one,
   the original numerator multiplied by c^-1 is the
   posterior distribution.
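Written out for the binomial likelihood with a Beta(α, β) prior, the trick runs as follows (a reconstruction consistent with the slides; the constant is c = 1/B(y+α, n-y+β), the reciprocal of a beta function):

```latex
p(y) = \int_0^1 \binom{n}{y}\theta^{y}(1-\theta)^{n-y}\,
       \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\,d\theta
     = \frac{\binom{n}{y}}{B(\alpha,\beta)}
       \int_0^1 \theta^{y+\alpha-1}(1-\theta)^{n-y+\beta-1}\,d\theta
% Multiply and divide by B(y+\alpha,\, n-y+\beta):
     = \binom{n}{y}\,\frac{B(y+\alpha,\, n-y+\beta)}{B(\alpha,\beta)}
       \underbrace{\int_0^1 \frac{\theta^{y+\alpha-1}(1-\theta)^{n-y+\beta-1}}
       {B(y+\alpha,\, n-y+\beta)}\,d\theta}_{=\,1}
```

The remaining integrand is exactly a Beta(y+α, n-y+β) pdf, so it integrates to one, leaving the beta-binomial form named on the next slide.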
22
The posterior predictive distribution
This is the kernel of the beta distribution
This is called a beta-binomial distribution
23
The posterior of the binomial model with beta
priors
This is a Beta(Y+α, n-Y+β) distribution.
Beautifully, it worked out that the posterior
distribution is a form of the prior distribution
updated by the new data. In general, when this
occurs we say the prior is conjugate.
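The conjugate update can be sanity-checked numerically (a sketch using the polio example's numbers: Y = 17, n = 24, Beta(5, 5) prior, claimed posterior Beta(22, 12)):

```python
# Check on a grid that likelihood × Beta(5, 5) prior, once normalized,
# has the mean of the claimed Beta(Y+alpha, n-Y+beta) = Beta(22, 12) posterior.
n, y, alpha, beta = 24, 17, 5, 5
grid = [(i + 0.5) / 10_000 for i in range(10_000)]
# Unnormalized posterior kernel: theta^(y+alpha-1) * (1-theta)^(n-y+beta-1)
kernel = [t**(y + alpha - 1) * (1 - t)**(n - y + beta - 1) for t in grid]
z = sum(kernel)
grid_mean = sum(t * k for t, k in zip(grid, kernel)) / z
conjugate_mean = (y + alpha) / (n + alpha + beta)   # mean of Beta(22, 12)
```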
24
Continuing the earlier example, if 17 of 24 women
say polio is contagious (so Y = 17 and n = 24,
where Y is binomial), and you use a Beta(5, 5)
prior, the posterior distribution is
Beta(17+5, 24-17+5) = Beta(22, 12).
Posterior Mean ≈ .65, Posterior Variance ≈ .01
Posterior
Prior
25
What is the MLE for this likelihood?
  • Have the students derive the maximum likelihood
    estimate to serve as a basis of comparison.

26
Prior Consequences: Plots of 4 Different Beta
Distributions
Beta(5,5)
Beta(3,10)
Beta(10,3)
Beta(100,30)
27
Comparison of four different posterior
distributions (in red) for the four different
priors (black)
Prior: Beta(5, 5)     →  Posterior: Beta(22, 12)
Prior: Beta(10, 3)    →  Posterior: Beta(27, 10)
Prior: Beta(3, 10)    →  Posterior: Beta(20, 17)
Prior: Beta(100, 30)  →  Posterior: Beta(117, 37)
28
Summary Statistics of the Findings for different
priors