Title: Bayesian Hypothesis Testing and Bayes Factors

1. Bayesian Hypothesis Testing and Bayes Factors
- Bayesian p-values
- Bayes Factors for model comparison
- Easy-to-implement alternatives for model comparison
2. Bayesian Hypothesis Testing
- Bayesian hypothesis testing is less formal than non-Bayesian varieties.
- In fact, Bayesian researchers typically summarize the posterior distribution without applying a rigid decision process.
- Since social scientists don't actually make important decisions based on their findings, posterior summaries are more than adequate.
- If one wanted to apply a formal process, Bayesian decision theory is the way to go: because it is possible to obtain a probability distribution over the parameter space, one can make expected-utility calculations based on the costs and benefits of different outcomes.
- Considerable energy has been devoted, however, to trying to map Bayesian statistical models into the null hypothesis significance testing framework, with mixed results at best.
3. Similarities between Bayesian and Frequentist Hypothesis Testing
- Maximum likelihood estimates of parameter means and standard errors are equivalent to Bayesian estimates under flat priors.
- Asymptotically, the data will overwhelm the choice of prior, so if we had infinite data sets, priors would be irrelevant and Bayesian and frequentist results would converge.
- Frequentist one-tailed tests are essentially equivalent to what a Bayesian would get using credible intervals.
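The one-tailed equivalence above can be checked numerically. A minimal sketch (all values hypothetical, not from the slides) comparing a frequentist one-tailed p-value against the posterior probability of the same hypothesis under a flat prior, where the posterior for the coefficient is approximately normal:

```python
from scipy.stats import norm

# Hypothetical regression estimate and standard error (illustrative values).
b_hat, se = 0.50, 0.25

# Frequentist one-tailed p-value for H0: beta <= 0.
p_one_tailed = norm.sf(b_hat / se)

# Bayesian posterior Pr(beta <= 0) under a flat prior:
# the posterior is approximately N(b_hat, se^2).
pr_beta_leq_0 = norm.cdf(0.0, loc=b_hat, scale=se)

print(p_one_tailed, pr_beta_leq_0)  # the two quantities coincide
```

The two numbers agree because, with a flat prior and a normal likelihood, the posterior tail area is the same integral as the sampling-distribution tail area.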
4. Differences between Frequentist and Bayesian Hypothesis Testing
- The most important pragmatic difference between Bayesian and frequentist hypothesis testing is that Bayesian methods are poorly suited for two-tailed tests.
- Why? Because the probability of a point value such as zero in a continuous distribution is zero.
- The best solution proposed so far is to calculate the probability that, say, a regression coefficient is in some range near zero, e.g. a two-sided "p-value" Pr(-ε < β < ε).
- However, the choice of ε seems very ad hoc unless there is some decision-theoretic basis.
- The other important difference is more philosophical: frequentist p-values violate the likelihood principle.
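Computing Pr(-ε < β < ε) is trivial once you have posterior draws of β. A minimal sketch with simulated draws standing in for MCMC output (the posterior location, scale, and ε are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for a regression coefficient beta,
# e.g. collected from an MCMC sampler (simulated here for illustration).
beta_draws = rng.normal(loc=0.05, scale=0.10, size=10_000)

# Probability that beta lies in a small interval around zero.
eps = 0.1  # the choice of eps is ad hoc, as the slide notes
pr_near_zero = np.mean((-eps < beta_draws) & (beta_draws < eps))

print(pr_near_zero)
```

The estimate is just the share of posterior draws falling inside the interval, so its accuracy is governed by the number of MCMC draws.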
5. Bayes Factors (notes taken from Gill 2002)
- Bayes Factors are the dominant method of Bayesian model testing. They are the Bayesian analogues of likelihood ratio tests.
- The basic intuition is that prior and posterior information are combined in a ratio that provides evidence in favor of one model specification versus another.
- Bayes Factors are very flexible, allowing multiple hypotheses to be compared simultaneously, and nested models are not required in order to make comparisons (it goes without saying that compared models should have the same dependent variable).
6. The General Form for Bayes Factors
- Suppose that we observe data x and wish to test two competing models, M1 and M2, relating these data to two different sets of parameters, θ1 and θ2.
- We would like to know which of the following likelihood specifications is better:
- M1: f1(x|θ1) and M2: f2(x|θ2)
- Obviously, we would need prior distributions π1(θ1) and π2(θ2) and prior probabilities P(M1) and P(M2).
- The posterior odds ratio in favor of M1 over M2 is

  P(M1|x) / P(M2|x) = [P(M1) ∫ f1(x|θ1) π1(θ1) dθ1] / [P(M2) ∫ f2(x|θ2) π2(θ2) dθ2]

  Rearranging terms, we find that the Bayes Factor is

  B(x) = [∫ f1(x|θ1) π1(θ1) dθ1] / [∫ f2(x|θ2) π2(θ2) dθ2]

  If we have nested models and P(M1) = P(M2) = .5, then the Bayes Factor reduces to the likelihood ratio.
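The marginal-likelihood integrals above can be evaluated directly in simple cases. A toy sketch (my own example, not from the slides): for a single observation x, let M1 be x ~ N(0, 1) with no free parameters, and M2 be x ~ N(θ, 1) with prior θ ~ N(0, 1). The M2 integral has the closed form N(x; 0, √2), which serves as a check on the numerical integration:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

x = 1.3  # a single observed data point (illustrative)

# M1: x ~ N(0, 1) has no free parameters, so its marginal likelihood
# is just the density of x under that model.
m1 = norm.pdf(x, loc=0.0, scale=1.0)

# M2: x ~ N(theta, 1) with prior theta ~ N(0, 1).
# Marginal likelihood: integrate f2(x | theta) * pi2(theta) over theta.
m2, _ = quad(lambda t: norm.pdf(x, loc=t, scale=1.0) * norm.pdf(t), -np.inf, np.inf)

bayes_factor = m1 / m2  # B(x) in favor of M1 over M2
print(bayes_factor)

# Sanity check: analytically, m2 = N(x; 0, sqrt(2)).
print(norm.pdf(x, scale=np.sqrt(2)))
```

In higher dimensions this integration is exactly what becomes difficult, which is the "bad news" discussed below.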
7. Rule of Thumb
With this setup, if we interpret model 1 as the null model, then:
- If B(x) ≥ 1, model 1 is supported.
- If 1 > B(x) ≥ 10^(-1/2), there is minimal evidence against model 1.
- If 10^(-1/2) > B(x) ≥ 10^(-1), there is substantial evidence against model 1.
- If 10^(-1) > B(x) ≥ 10^(-2), there is strong evidence against model 1.
- If 10^(-2) > B(x), there is decisive evidence against model 1.
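The rule-of-thumb cutoffs translate directly into a small lookup function; a sketch (the category strings are mine, the thresholds are from the slide):

```python
def evidence_against_null(b):
    """Map a Bayes Factor B(x), with model 1 (the null) in the numerator,
    to the rule-of-thumb evidence category from the slide."""
    if b >= 1:
        return "model 1 supported"
    elif b >= 10 ** -0.5:
        return "minimal evidence against model 1"
    elif b >= 10 ** -1:
        return "substantial evidence against model 1"
    elif b >= 10 ** -2:
        return "strong evidence against model 1"
    else:
        return "decisive evidence against model 1"

print(evidence_against_null(0.5))   # 0.5 lies between 10^(-1/2) and 1
print(evidence_against_null(0.001))
```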
8. The Bad News
- Unfortunately, while Bayes Factors are rather intuitive, as a practical matter they are often quite difficult to calculate.
- Some examples of determining the Bayes Factor in WinBugs for a variable mean can be found in Congdon (example 2.2), with more complex models in Congdon Chapter 10.
- You also may want to use Carlin and Chib's technique for computing Bayes Factors for competing non-nested regression models, reported in the Journal of the Royal Statistical Society, Series B, vol. 57(3), 1995.
- This technique is implemented in the Pines example in BUGS, and is reported on the WinBugs website under the new examples section.
- Our discussion will focus on alternatives to the Bayes Factor.
9. Alternatives to the Bayes Factor for model assessment
- Let θ̂ denote your estimates of the parameter means (or medians or modes) in your model, and suppose that the Bayes estimate is approximately equal to the maximum likelihood estimate; then the following statistics used in frequentist statistics will be useful diagnostics.
- Good: The Likelihood Ratio
- Ratio = -2[log-L(θ̂ Restricted Model|y) - log-L(θ̂ Full Model|y)]
- This statistic will always favor the unrestricted model, but when the Bayes estimators are equivalent to the maximum likelihood estimates, the Ratio is distributed as a χ² where the number of degrees of freedom is equal to the number of test parameters.
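A minimal sketch of this likelihood-ratio comparison, using the deviance (-2 log-L) values reported in the worked example later in these slides (574.244 for the null model, 322.104 for the full model, 3 test parameters):

```python
from scipy.stats import chi2

# Deviances (-2 log-L) at the parameter estimates, taken from the
# worked birth-rate example later in these slides.
dev_restricted = 574.244  # null model
dev_full = 322.104        # full model

lr = dev_restricted - dev_full  # = -2 * (logL_restricted - logL_full)
df = 3                          # number of test parameters (the three slopes)
p_value = chi2.sf(lr, df)

print(lr, p_value)  # a large ratio with a tiny p-value rejects the null model
```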
10. Alternatives to the Bayes Factor for model assessment
- Better: Akaike Information Criterion (AIC)
- AIC = -2log-L(θ̂|y) + 2p
- (where p = the number of parameters, including the intercept).
- To compare two models, compare the AIC from model 1 against the AIC from model 2. Smaller numbers are better.
- Models do not need to be nested.
- The AIC tends to be biased in favor of more complicated models, because the log-likelihood tends to increase faster than the number of parameters.
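The AIC comparison is a one-line computation once the deviances are in hand; a sketch with hypothetical deviance and parameter-count values:

```python
def aic(deviance, p):
    """AIC = -2 log-L + 2p, where deviance is -2 log-L at the estimates
    and p counts the parameters, including the intercept."""
    return deviance + 2 * p

# Hypothetical deviances for two (possibly non-nested) models.
aic_1 = aic(980.0, p=2)
aic_2 = aic(950.0, p=5)

print(aic_1, aic_2)  # the smaller AIC is preferred
```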
11. Alternatives to the Bayes Factor for model assessment
- Better still: Bayesian Information Criterion (BIC)
- BIC = -2log-L(θ̂|y) + 2p·log(n)
- (where p = the number of parameters and n is the sample size).
- This statistic can also be used for non-nested models.
- BIC1 - BIC2 ≈ -2 log(Bayes Factor12) for model 1 vs. model 2.
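A sketch of the BIC comparison, using the 2p·log(n) penalty as these slides define it (note that many texts use the Schwarz penalty p·log(n) instead); the deviances and parameter counts are hypothetical:

```python
import math

def bic(deviance, p, n):
    """BIC as defined in these slides: -2 log-L + 2 * p * log(n).
    (Assumption flagged: the common Schwarz form uses p * log(n).)"""
    return deviance + 2 * p * math.log(n)

# Hypothetical deviances for two non-nested models fit to n = 150 cases.
bic_1 = bic(980.0, p=2, n=150)
bic_2 = bic(950.0, p=5, n=150)

# The BIC difference approximates -2 log of the Bayes Factor of
# model 1 versus model 2, so the implied Bayes Factor is roughly:
approx_bf_12 = math.exp(-(bic_1 - bic_2) / 2)
print(bic_1, bic_2, approx_bf_12)
```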
12. Alternatives to the Bayes Factor for model assessment
- Best??? Deviance Information Criterion (DIC)
- This is a new statistic introduced by the developers of WinBugs (and can therefore be reported in WinBugs!).
- Spiegelhalter, et al. 2002. "Bayesian measures of model complexity and fit." Journal of the Royal Statistical Society, Series B, pp. 583-639.
- It is not an approximation of the Bayes Factor!
- DIC = Mean[-2log-L(θt|y)] + (Mean[-2log-L(θt|y)] - (-2log-L(θ̄|y)))
- The first term is the deviance (Dbar): the average of the -2 log-likelihoods calculated at the end of each iteration of the Gibbs Sampler.
- The quantity -2log-L(θ̄|y) is Dhat: the -2 log-likelihood calculated using the posterior means of θ.
- The second expression (Dbar - Dhat = pD) is the penalty for over-parameterizing the model. To see this, note that having a lot of insignificant parameters with large variances will yield iterations of the Gibbs Sampler with likelihoods far from Dhat.
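The Dbar / Dhat / pD decomposition can be reproduced from a chain of posterior draws. A sketch for a one-parameter model (a normal mean with known variance; the data and draws are simulated stand-ins for Gibbs output, so pD should land near 1):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical data y ~ N(theta, 1) and posterior draws of theta
# (simulated here as a stand-in for Gibbs sampler output).
y = rng.normal(loc=2.0, scale=1.0, size=50)
theta_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=4000)

def deviance(theta):
    """-2 log-likelihood of the data at a given theta."""
    return -2.0 * norm.logpdf(y, loc=theta, scale=1.0).sum()

d_bar = np.mean([deviance(t) for t in theta_draws])  # posterior mean deviance
d_hat = deviance(theta_draws.mean())                 # deviance at posterior mean
p_d = d_bar - d_hat                                  # effective number of parameters
dic = d_bar + p_d                                    # = 2 * d_bar - d_hat

print(p_d, dic)  # p_d should be close to 1 for this one-parameter model
```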
13. Example: Predictors of International Birth Rates, 2000
- Dependent Variable
  - Births per female
- Independent Variables
  - GDP Growth Rate
  - Female Illiteracy
  - Dependents / Working-Age Population
- Probability Model
  - Births ~ N(b1·GDP + b2·Illiteracy + b3·AgeDep + b4, τ)
  - bj ~ N(0, .001) for all j, and τ ~ G(.001, .001)
- The Null Model
  - Births ~ N(a, τnull)
  - a ~ N(0, .001) and τnull ~ G(.001, .001)
14. WinBugs Code

  model {
    for (i in 1:150) {
      birthrate[i] ~ dnorm(mu[i], tau)
      mu[i] <- b[1]*gdprate[i] + b[2]*femillit[i] + b[3]*agedep[i] + b[4]
    }
    for (j in 1:4) { b[j] ~ dnorm(0, .0001) }
    tau ~ dgamma(.001, .001)

    # code to create a null model
    for (i in 1:150) {
      null[i] <- birthrate[i]
      null[i] ~ dnorm(mu2[i], tau2)
      mu2[i] <- a
    }
    a ~ dnorm(0, .0001)
    tau2 ~ dgamma(.001, .001)
  }
15. Results

  node       mean    sd     MC error
  a          3.119   0.133  0.002157
  tau2       0.369   0.043  5.977E-4
  gdprate    0.011   0.016  2.499E-4
  femillit   0.014   0.004  5.137E-5
  agedep     6.581   0.519  0.007473
  intercept  -1.537  0.301  0.004301
  tau        1.943   0.226  0.003065
  Deviance   903.4   3.844  0.05402

- (The Deviance row is the sum of the null-model and full-model deviances.)

  DIC output (Dbar = posterior mean of -2logL; Dhat = -2logL at the posterior means of the stochastic nodes):

         Dbar     Dhat     pD     DIC
  full   327.145  322.104  5.041  332.186
  null   576.221  574.244  1.977  578.198
  total  903.366  896.348  7.018  910.384
16. Results

  DIC output (Dbar = posterior mean of -2logL; Dhat = -2logL at the posterior means of the stochastic nodes):

         Dbar     Dhat     pD     DIC
  full   327.145  322.104  5.041  332.186
  null   576.221  574.244  1.977  578.198
  total  903.366  896.348  7.018  910.384

Let Dhat = -2logL(θ̂). Then we can implement each of the three diagnostic tests.

  Likelihood Ratio = -2[log-L(θ̂Null|y) - log-L(θ̂Full|y)] = 574.244 - 322.104 ≈ 252 ~ χ²(3) → reject null model

  AICnull = -2log-L(θ̂null|y) + 2p = 574.244 + 2(1) ≈ 576
  AICfull = -2log-L(θ̂full|y) + 2p = 322.104 + 2(4) ≈ 330 → favors full

  BICnull = -2log-L(θ̂null|y) + 2p·log(n) = 574.244 + 2(1)·log(150) ≈ 584
  BICfull = -2log-L(θ̂full|y) + 2p·log(n) = 322.104 + 2(4)·log(150) ≈ 362 → favors full

  DICnull = 578.198, DICfull = 332.186 → favors full
17. Calculating MSE and R2 in WinBugs
- Mean Squared Error and R2 are two very common diagnostics for regression models. Calculating these quantities in WinBugs is rather straightforward if we monitor nodes programmed to calculate these statistics, just as the deviance statistic is a monitored value of the likelihood.
- Recall that MSE = Σi (yi - pred(yi))² / n and R2 = Σi (pred(yi) - mean(y))² / Σi (yi - mean(y))²
- and note that in WinBugs-speak, pred(yi) = mu[i].

  model {
    for (i in 1:150) {
      birthrate[i] ~ dnorm(mu[i], tau)
      mu[i] <- b[1]*gdprate[i] + b[2]*femillit[i] + b[3]*agedep[i] + b[4]
      numerator[i] <- (mu[i] - mean(birthrate[])) * (mu[i] - mean(birthrate[]))
      denominator[i] <- (birthrate[i] - mean(birthrate[])) * (birthrate[i] - mean(birthrate[]))
      se[i] <- (mu[i] - birthrate[i]) * (mu[i] - birthrate[i])
    }
    for (j in 1:4) { b[j] ~ dnorm(0, .0001) }
    tau ~ dgamma(.001, .001)
    R2 <- sum(numerator[]) / sum(denominator[])
    SSE <- sum(se[])
    MSE <- SSE / 150
  }
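The same MSE and R2 formulas can be applied outside WinBugs to the monitored posterior means of mu. A sketch with simulated stand-ins (the data and fitted values are hypothetical; the fitted values are built as a shrunken copy of y so R2 is known by construction):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data and posterior-mean fitted values mu; in practice,
# mu would be the monitored node means exported from WinBugs.
n = 150
y = rng.normal(3.0, 1.0, size=n)
mu = y.mean() + 0.8 * (y - y.mean())  # hypothetical fitted values

mse = np.mean((y - mu) ** 2)
r2 = np.sum((mu - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

print(mse, r2)  # r2 equals 0.8**2 = 0.64 by construction here
```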
18. A final diagnostic
- Researchers should always check residual plots in a linear regression model to see if the errors are approximately normal.
- In WinBugs, if the likelihood function is specified in the following way:
- y[i] ~ dnorm(mu[i], tau)
- you may set the sample monitor to mu. This will monitor the expected value of your dependent variable given the regression coefficients.
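Once the monitored mu values are exported, the residuals y - mu can be examined for approximate normality. A sketch with simulated stand-ins for y and the posterior means of mu (in practice one would also plot the standardized residuals, e.g. against the fitted values):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical y and monitored posterior means of mu (illustrative values).
y = rng.normal(3.0, 1.0, size=150)
mu_hat = y.mean() + 0.8 * (y - y.mean())

residuals = y - mu_hat
standardized = (residuals - residuals.mean()) / residuals.std()

# A quick normality check: roughly 95% of standardized residuals
# should fall within +/- 2 if the errors are approximately normal.
share_within_2 = np.mean(np.abs(standardized) < 2)
print(share_within_2)
```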