Title: Data, Models and the Search for Exchangeability
1. Data, Models and the Search for Exchangeability
- Mark Hopkins
- Department of Economics
- Math Department Colloquium
- Gettysburg College
- April 14, 2005
2. Torture the data, and they will confess
- Theory
  - Is data mining a dirty word?
  - Statistics vs. econometrics and the role of the ex ante theory
  - Information extraction amounts to a conditioning problem
  - Conditioning: bias vs. variance, or a search for exchangeability?
  - Propagating model uncertainty into our parameter estimates
  - Using new Bayesian statistical methods in econometrics
  - What do economists have to learn from statisticians?
- Application
  - Why do some countries become rich faster than others?
3. Preliminaries: Recalling Bayes' Rule
- Bayes' Rule tells us how we can update our beliefs (about event A) given some data (knowledge that event B happened): $p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$
- Example: What is the probability that Saddam had weapons of mass destruction (WMD), given that none have been found (NF)?
  - The answer depends both on the strength of the data, $p(\mathrm{NF} \mid \mathrm{WMD})$, and on one's own (subjective) prior beliefs about $p(\mathrm{WMD})$
- The statistician's job is (should be) to help you update your own personal beliefs: all truth is subjective in a Bayesian world
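To make the WMD example concrete, here is a minimal sketch of the update. All probabilities are hypothetical, and it assumes $p(\mathrm{NF} \mid \text{no WMD}) = 1$ (if there are no weapons, none will be found); `posterior_wmd` is an illustrative helper, not something from the talk.

```python
def posterior_wmd(prior_wmd: float, p_nf_given_wmd: float,
                  p_nf_given_no_wmd: float = 1.0) -> float:
    """p(WMD | NF) by Bayes' Rule.

    Assumption (not from the talk): if there are no WMD, none are found,
    so p(NF | no WMD) defaults to 1.
    """
    joint = p_nf_given_wmd * prior_wmd                          # p(NF | WMD) p(WMD)
    evidence = joint + p_nf_given_no_wmd * (1.0 - prior_wmd)    # p(NF), by total probability
    return joint / evidence


# Hypothetical priors and data strengths, not estimates of anything real.
for prior in (0.2, 0.5, 0.9):
    for strength in (0.1, 0.5):
        post = posterior_wmd(prior, strength)
        print(f"p(WMD)={prior:.1f}  p(NF|WMD)={strength:.1f}  ->  p(WMD|NF)={post:.3f}")
```

Both levers matter: a smaller $p(\mathrm{NF} \mid \mathrm{WMD})$ (stronger data) pulls the posterior down, while a larger prior keeps it up.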
4. Prior beliefs modify our view of the information contained in data
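One standard way to see a prior reshaping the information in data is the normal-normal conjugate model. This is an illustration chosen here, not necessarily the model behind the slide's figure: with a known noise variance, the posterior mean is a precision-weighted average of the prior mean and the sample mean. The data and prior settings below are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior of a normal mean with a normal prior and known noise variance."""
    n = len(data)
    xbar = sum(data) / n
    prior_prec = 1.0 / prior_var          # precision = 1 / variance
    data_prec = n / noise_var             # precision of the sample mean
    post_var = 1.0 / (prior_prec + data_prec)
    # Precision-weighted average of the prior mean and the sample mean
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    return post_mean, post_var


data = [2.1, 1.9, 2.4, 2.0]   # hypothetical observations, sample mean 2.1
for prior_var in (0.01, 1.0, 100.0):   # dogmatic, moderate, vague prior centred at 0
    m, v = normal_posterior(0.0, prior_var, data, noise_var=1.0)
    print(f"prior var {prior_var:>6}: posterior mean {m:.3f}, posterior var {v:.4f}")
```

The same four observations move a vague prior almost all the way to the sample mean but barely shift a dogmatic one.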
5. Statistical Inference: A Review
- The goal: observe the world (gather data, $D$) and then draw conclusions and/or make predictions
- This requires a theory (or model, $M$) to organize relationships
  - Mathematics (probability theory)
- A statistical model is simply a probability distribution, $p(D \mid M)$, where $M \equiv \{\theta, A\}$ consists of
  - a set of structural assumptions ($A$), and
  - some vector ($\theta$) parameterizing the probability distribution. This usually represents the question of interest, e.g., $\{\mu, \sigma^2\}$
- Statistical inference
  - Drawing conclusions refers to $p(\theta \mid D, A)$
  - Making predictions refers to $p(D_{\text{new}} \mid \theta, A)$
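A small sketch of these pieces, assuming $A$ = "observations are i.i.d. normal" and $\theta = (\mu, \sigma^2)$. For simplicity $\sigma^2$ is treated as known, $\mu$ gets a flat prior on a grid, and the data are hypothetical; the helper names are illustrative.

```python
import math
import random


def log_likelihood(data, mu, sigma2):
    """log p(D | theta, A): theta = (mu, sigma2), A = 'i.i.d. normal draws' (assumed)."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


def grid_posterior_mu(data, sigma2, grid):
    """p(mu | D, A) on a grid, with a flat prior on mu and sigma2 treated as known."""
    logs = [log_likelihood(data, mu, sigma2) for mu in grid]
    top = max(logs)                                  # subtract the max to avoid underflow
    weights = [math.exp(lp - top) for lp in logs]
    total = sum(weights)
    return [w / total for w in weights]


def predict(mu, sigma2, size, rng):
    """Draw D_new from p(D_new | theta, A) for a fixed theta."""
    return [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(size)]


data = [1.2, 0.8, 1.5, 1.1, 0.9]            # hypothetical data, sample mean 1.1
grid = [i / 100 for i in range(201)]        # candidate values of mu in [0, 2]
post = grid_posterior_mu(data, 0.25, grid)
post_mean = sum(mu * p for mu, p in zip(grid, post))
print(f"posterior mean of mu: {post_mean:.3f}")
print("three predicted draws:", predict(post_mean, 0.25, 3, random.Random(0)))
```

Everything here is conditional on $A$: change the structural assumption (say, to fat-tailed errors) and both the posterior and the predictions change.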
6. Estimating $p(\theta \mid D, A)$: Two Practical (and Related) Problems
- Problem 1: Inference about $\theta$ is conditional on model assumptions
  - In practice, we don't know the true structural assumptions ($A$)
  - What do we know? Bayes' Rule: $p(M \mid D) \propto p(D \mid M)\, p(M)$
  - Hypothesis testing can reject a model, but it can neither confirm it nor tell you the correct alternative!
  - Statistics vs. econometrics: what role does the prior $p(M)$ play?
  - Traditional statistics recognizes uncertainty about $\theta$ but not $A$.
  - Result: run a specification search for $A$, but pretend you didn't!
- Problem 2: What if data are not drawn from the same distribution?
  - Inference about $\theta$ is based on averaging repeated draws
  - A fundamental statistical issue: we are each a population of 1!
  - A methodological guide for $\theta$: conditional exchangeability
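A toy illustration of $p(M \mid D) \propto p(D \mid M)\, p(M)$, using two hypothetical models of a coin rather than anything from the talk: $M_1$ says the coin is fair; $M_2$ says its bias $\theta$ is unknown with a Uniform(0, 1) prior, which is integrated out to get $p(D \mid M_2)$.

```python
import math


def log_marginal_fair(k, n):
    """log p(D | M1): a fair coin, so any sequence of n flips has probability 0.5**n."""
    return n * math.log(0.5)


def log_marginal_uniform(k, n):
    """log p(D | M2): unknown bias theta ~ Uniform(0, 1), integrated out.

    The integral of theta**k * (1 - theta)**(n - k) over [0, 1] is the
    Beta function B(k + 1, n - k + 1).
    """
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)


def model_posteriors(log_marginals, priors):
    """Normalize p(D | M_j) p(M_j) across models to get p(M_j | D)."""
    logs = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


k, n = 8, 10   # hypothetical data: 8 heads in 10 flips
marginals = [log_marginal_fair(k, n), log_marginal_uniform(k, n)]
for priors in ([0.5, 0.5], [0.99, 0.01]):
    print(priors, "->", [round(p, 3) for p in model_posteriors(marginals, priors)])
```

With equal prior odds the data favour the biased-coin model (about 0.67), but a strong enough prior on the fair coin reverses the ranking, which is exactly why the role of $p(M)$ matters.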
7. The Conditioning Problem: A Familiar Example
- Data: $D = \{X, Y\}$; we want to know the effect of $X$ on $Y$
- We are interested in the regression (or C.E.F.) $E[Y \mid X]$
- Define the residual or error as $\varepsilon \equiv Y - E[Y \mid X]$
- Familiar linear example: model $M$ is $E[Y \mid X] = \beta_0 + \beta_1 X$
  - so $Y = \beta_0 + \beta_1 X + \varepsilon$
- Estimation / inference
  - Estimation: find $\beta_0, \beta_1$ that minimize some loss function $L(\varepsilon)$
  - Inference: conditional on our information set, $\varepsilon$ must be exchangeable
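A minimal sketch of the estimation step, assuming the familiar squared-error loss $L(\varepsilon) = \sum_i \varepsilon_i^2$ (the slide leaves $L$ general). The data are hypothetical.

```python
def ols_simple(x, y):
    """Choose beta0, beta1 to minimize the squared-error loss sum(eps_i ** 2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = ybar - beta1 * xbar
    resid = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]   # estimated eps
    return beta0, beta1, resid


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.2, 6.8, 9.1, 11.0]   # hypothetical data
b0, b1, e = ols_simple(x, y)
print(f"beta0={b0:.3f}, beta1={b1:.3f}")
```

Minimizing squared loss forces the fitted residuals to sum to zero and to be orthogonal to $X$ by construction, so the estimation step alone says nothing about whether $\varepsilon$ is exchangeable; that remains an assumption to be argued for.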
8. The Benefits of Using the Bayesian Approach of Exchangeability
- Classical (frequentist) i.i.d. vs. Bayesian exchangeability (the joint distribution of the observations is unchanged by any reordering of them)
- A foundation for statistical inference on population data
  - de Finetti's Representation Theorem states:
  - If a sample $X_1, X_2, \ldots, X_n$ is a subset of an infinite exchangeable sequence, $X_\infty$, then it is as if $p(D \mid \theta, A)$ exists, where $\theta \sim p(\theta)$
- Clarifies the goal of the conditioning / model search process
  - We are trying to achieve anonymity of regression residuals
- Clarifies the relationship between model search and prediction
  - What is the basis for using the past to make predictions of the future?
  - When the past and future are part of an exchangeable sequence!
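The easy direction of de Finetti's result can be checked numerically: a mixture of i.i.d. draws is exchangeable (the theorem supplies the harder converse for infinite sequences). The sketch below assumes 0/1 data with $\theta \sim \text{Beta}(a, b)$, a choice made here for illustration, and shows that exchangeable draws need not be independent.

```python
import math
from itertools import permutations


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def seq_prob(seq, a=1.0, b=1.0):
    """p(x_1, ..., x_n) for 0/1 data when theta ~ Beta(a, b) and, given theta,
    the x_i are i.i.d. Bernoulli(theta): integral of prod p(x_i | theta) p(theta)."""
    k, n = sum(seq), len(seq)
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


# Every reordering of the same sequence has the same probability: exchangeable.
probs = {perm: seq_prob(perm) for perm in set(permutations((1, 1, 0, 0)))}
print(probs)

# ...but the draws are not independent: seeing a 1 raises the chance of another 1.
print(seq_prob((1, 1)), "vs", seq_prob((1,)) ** 2)
```

Only the count of ones matters, not who produced them, which is the "anonymity" the conditioning step tries to achieve for residuals.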
9. Example of a Conditioning Problem: The Sources of Economic Growth
- Why have some countries grown richer faster than others?
- Data ($D$): growth rates ($g$) and assorted country characteristics ($X$)
  - Observations are countries ($n \approx 100$)
- Ex ante theory: the Solow model of capital accumulation
- The problem: what about other variables that may affect $g$?
  - Omitted variable bias: robustness problems
  - D.o.F. problem: theories > observations (plus multicollinearity!)
  - Specifying functional forms for variables like democracy and ethnic diversity
  - Population heterogeneity: are France, Taiwan, and Sudan really all draws from the same distribution? Inference about $\sigma^2$?
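The omitted-variable problem can be shown with a tiny noise-free example (hypothetical data, not the talk's): if growth depends on $x_1$ and $x_2$ but only $x_1$ is included, the estimated slope equals $\beta_1 + \beta_2 \delta$, where $\delta$ is the slope from regressing $x_2$ on $x_1$.

```python
def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


# Hypothetical country data: x1 is in the model, x2 is omitted but correlated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
beta1, beta2 = 1.0, 2.0
g = [0.5 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]   # noise-free for clarity

short = slope(x1, g)        # estimate of the effect of x1 with x2 omitted
delta = slope(x1, x2)       # how x2 moves with x1
print(f"short-regression slope {short:.3f} vs true beta1 {beta1}")
print(f"omitted-variable formula beta1 + beta2*delta = {beta1 + beta2 * delta:.3f}")
```

With roughly 100 countries and many more candidate variables than that, we cannot simply include everything, which is where the D.o.F. and multicollinearity problems bite.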
10. Exchangeability in Cross-Country Growth Regressions
- Inference requires conditional exchangeability
  - France, Taiwan, and Sudan are not exchangeable, but can we find an appropriate vector $X$ such that the residuals $\varepsilon \equiv g - E[g \mid X]$ are exchangeable?
- Conditioning just boils down to a problem of model selection!
  - The classical approach to model selection is hypothesis testing
  - However, the D.o.F. problem has led to upward specification searches!
- In summary
  - Two types of uncertainty: sampling (variance) and model (bias)
  - Model selection usually involves an artful trade-off of bias vs. variance
  - However, classical methods do not propagate our model uncertainty into coefficient estimates
  - Can Bayesian statistics help us bring science to the art of selection?
11. The Growth Literature, Take 1: OLS estimates with controls and dummies
12. The Growth Literature, Take 2: Explaining Parameter Heterogeneity
- Tree Regressions
- Local Linear Regressions (Spline models)
- Varying Coefficient / Hierarchical Models
13. A Tree Regression
[Figure: fitted regression tree for growth rates. Split rules: s60 < 0.095, EQINV < 0.0144, laam < 0.5, s60 < 0.03, NONEQINV < 0.1624, DEMOC65 < 0.8435, FRAC < 0.155, EQINV < 0.04949, EQINV < 0.05405, lny60 < 8.49696. Terminal-node predicted growth rates range from -0.0072 to 0.0532.]
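Tree regressions are typically grown by recursive partitioning (CART-style; the talk does not say which algorithm produced the figure). The core step is a greedy split: pick the threshold that minimizes the squared error when each side is predicted by its own mean. A minimal sketch for one variable, using hypothetical data; a full tree repeats this over all candidate variables and then recursively inside each leaf.

```python
def _mean(v):
    return sum(v) / len(v)


def _sse(v):
    m = _mean(v)
    return sum((t - m) ** 2 for t in v)


def best_split(x, y):
    """Greedy single split: threshold c minimizing the SSE of the two leaf means.

    Observations with x < c go to the left leaf, the rest to the right.
    Returns (c, total_sse, left_mean, right_mean).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no threshold separates tied x values
        c = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        sse = _sse(left) + _sse(right)
        if best is None or sse < best[1]:
            best = (c, sse, _mean(left), _mean(right))
    return best


# Hypothetical investment shares and growth rates, not the talk's data.
inv = [0.02, 0.05, 0.08, 0.12, 0.15, 0.20]
growth = [0.001, 0.002, 0.000, 0.020, 0.025, 0.022]
print(best_split(inv, growth))
```

Each leaf gets its own mean growth rate, which is how a tree captures parameter heterogeneity without a functional form.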
14. An Additive Spline Model: Investment
15. An Additive Spline Model: Schooling
16. An Additive Spline Model: Population Growth
17. Using splines to reveal non-linearities: Solow model + s(FRAC)
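An additive spline model replaces a linear term $\beta_j x_j$ with a flexible function $s(x_j)$. The talk does not say which spline type was used; this sketch assumes the simplest case, a linear truncated-power basis with a hand-picked knot, for a single covariate, fit by least squares. The data are hypothetical and noise-free, so the fit should recover the kink exactly.

```python
def spline_basis(x, knots):
    """Linear truncated-power basis: 1, x, (x - k)_+ for each knot k."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (M[r][n] - sum(M[r][c] * coef[c] for c in range(r + 1, n))) / M[r][r]
    return coef


def least_squares(rows, y):
    """OLS via the normal equations X'X c = X'y (fine for a handful of columns)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_linear_spline(x, y, knots):
    return least_squares([spline_basis(xi, knots) for xi in x], y)


def predict(coef, knots, x):
    return sum(c * b for c, b in zip(coef, spline_basis(x, knots)))


# Hypothetical data with a slope change at x = 0.5
x = [i / 10 for i in range(11)]
y = [1 + 2 * xi + 3 * max(0.0, xi - 0.5) for xi in x]
coef = fit_linear_spline(x, y, knots=[0.5])
print("coefficients:", [round(c, 4) for c in coef])
```

The coefficient on $(x - 0.5)_+$ is the change in slope at the knot, so a non-zero value is direct evidence of a non-linearity.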
18. Does democracy modify the effects of investment and schooling?
19. A Varying Coefficient Model
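In the simplest varying coefficient specification, the slope on $x$ (say, investment) is itself a linear function of a modifier $z$ (say, democracy), which is equivalent to an interaction regression. This sketch assumes that linear form; the talk's model may let the coefficient vary more flexibly. Names and data are hypothetical.

```python
def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (M[r][n] - sum(M[r][c] * coef[c] for c in range(r + 1, n))) / M[r][r]
    return coef


def least_squares(rows, y):
    """OLS via the normal equations X'X c = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_varying_slope(x, z, y):
    """y = b0 + b2*z + (b1 + b3*z) * x: the slope on x varies linearly with z."""
    rows = [[1.0, xi, zi, xi * zi] for xi, zi in zip(x, z)]
    return least_squares(rows, y)


def slope_at(coef, z):
    """Implied effect of x at modifier value z: b1 + b3*z."""
    return coef[1] + coef[3] * z


# Hypothetical data: x = investment share, z = democracy index in [0, 1].
x = [0.10, 0.20, 0.30, 0.10, 0.20, 0.30]
z = [0.2, 0.2, 0.2, 0.9, 0.9, 0.9]
y = [0.01 + 0.02 * zi + (0.10 + 0.05 * zi) * xi for xi, zi in zip(x, z)]
coef = fit_varying_slope(x, z, y)
print("coefficients:", [round(c, 4) for c in coef])
print("effect of x at z=0.2:", round(slope_at(coef, 0.2), 4),
      " at z=0.9:", round(slope_at(coef, 0.9), 4))
```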
20. Specification Searches
- A specification search is a search for the mode of $p(M \mid D)$
  - Bayes' Rule: $p(M \mid D) \propto p(D \mid M)\, p(M)$
- Problem 1: How strong is your prior belief about $M$?
- Problem 2: Can you characterize your prior beliefs?
- Problem 3: Using the same data to find $M$ and to estimate $\theta$?
  - Danger! Why?
- Problem 4: By conditioning on a single model $M$ rather than on the distribution $p(M)$, you are understating uncertainty about coefficient estimates!
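Problem 3 can be made concrete with a small simulation (a hypothetical setup, not from the talk): $y$ is pure noise and none of $k$ candidate regressors matters, yet if we search over them and report the one with the largest $|t|$, we "find" a significant effect far more often than the nominal 5%. The cutoff 1.96 is used as an approximate critical value.

```python
import math
import random


def t_stat(x, y):
    """t statistic for the slope in a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def spurious_rejection_rate(k, n=50, reps=200, crit=1.96, seed=1):
    """Share of datasets where the best of k irrelevant regressors looks 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]     # the true effect of every x is zero
        best = max(abs(t_stat([rng.gauss(0, 1) for _ in range(n)], y)) for _ in range(k))
        hits += best > crit
    return hits / reps


for k in (1, 10):
    print(f"searching over {k:>2} regressor(s): rejection rate {spurious_rejection_rate(k):.2f}")
```

The reported t-statistic of the winning regressor ignores the search that produced it, which is the same understatement of uncertainty as in Problem 4.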
21. Bayesian Model Averaging (BMA)
- An alternative to trying to find the single best model (i.e., the mode of $p(M \mid D)$) is to consider the entire distribution of specifications
- Suppose you assign prior probability $p(A_k)$ to each of $K$ specifications; then $p(\theta \mid D) = \sum_{k=1}^{K} p(\theta \mid D, A_k)\, p(A_k \mid D)$, where $p(A_k \mid D) \propto p(D \mid A_k)\, p(A_k)$
- Averaging over model space improves statistical inference
  - Coefficient estimates tend to have better predictive ability
  - Standard errors reflect model, as well as parametric, uncertainty
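For a single coefficient, the first two moments of the BMA mixture follow directly from the averaging formula: the mean is the probability-weighted mean, and the variance adds the spread of the per-model means to the average within-model variance. The posterior model probabilities and per-model estimates below are hypothetical; a model that excludes the variable contributes a coefficient of exactly zero.

```python
def bma_combine(post_probs, means, variances):
    """Combine per-model posterior means/variances of one coefficient.

    E[b | D]   = sum_k p_k * m_k
    Var[b | D] = sum_k p_k * (v_k + m_k**2) - E[b | D]**2
    """
    mean = sum(p * m for p, m in zip(post_probs, means))
    second_moment = sum(p * (v + m * m) for p, m, v in zip(post_probs, means, variances))
    return mean, second_moment - mean * mean


post_probs = [0.5, 0.3, 0.2]       # hypothetical p(A_k | D), summing to 1
means = [0.020, 0.015, 0.0]        # the third model excludes the variable
variances = [1e-5, 2e-5, 0.0]
mean, var = bma_combine(post_probs, means, variances)
inclusion = sum(p for p, m in zip(post_probs, means) if m != 0.0)
print(f"BMA mean {mean:.4f}, BMA s.d. {var ** 0.5:.4f}, inclusion prob. {inclusion:.2f}")
```

The BMA standard deviation (about 0.0083 here) is well above the roughly 0.0032 reported by the single most probable model, which is the model uncertainty that a specification search hides.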
22. Some nasty theoretical details
- Choosing the space of models and model priors
- Managing the summation in BMA can be tricky: with 12 possible covariates, there are $2^{12} = 4{,}096$ different models to combine!
  - Occam's Window, suggested by Raftery (1994): eliminate larger and/or less probable models
  - MC$^3$ (Markov chain Monte Carlo model composition) techniques transit across model space: compute $p(\theta, A \mid D)$ from $p(\theta \mid D, A)$ and $p(A \mid D)$
- Computing the integral $p(D \mid A) = \int p(D \mid \theta, A)\, p(\theta \mid A)\, d\theta$
  - This is done directly in MC$^3$ techniques for BMA; otherwise,
  - one can approximate it using $p(D \mid \hat{\theta}_{\mathrm{MLE}}, A)$
23. Bayesian Model Selection Results
24. Bayesian Model Averaging Results
25. Conclusions
- Standard statistical inference is conditional on the chosen model
- A data-driven model search is usually an unavoidable fact of life
  - The model must include an appropriate vector of controls (bias vs. variance)
  - The model should address parameter heterogeneity and functional form
  - A methodological guide for conditioning is exchangeability
- Of course, the very fact that we are searching for a model means we are really less certain about our estimates than we are stating
  - BMA techniques help to propagate model uncertainty into coefficient estimates and standard errors