Data, Models and the Search for Exchangeability

1
Data, Models and the Search for Exchangeability
  • Mark Hopkins,
  • Department of Economics
  • Math Department Colloquium
  • Gettysburg College
  • April 14, 2005

2
Torture the data, and they will confess
  • Theory
  • Is data mining a dirty word?
  • Statistics vs. econometrics and the role of
    ex ante theory
  • Information extraction amounts to a conditioning
    problem
  • Conditioning: bias vs. variance, or a search for
    exchangeability?
  • Propagating model uncertainty into our
    parameter estimates
  • Using new Bayesian statistical methods in
    econometrics
  • What do economists have to learn from
    statisticians?
  • Application
  • Why do some countries become rich faster than
    others?

3
Preliminaries: Recalling Bayes' Rule
  • Bayes' Rule tells us how we can update our
    beliefs (about event A) given some data
    (knowledge that event B happened)
  • Example: What is the probability that Saddam had
    weapons of mass destruction (WMD), given that
    none have been found (NF)?
  • The answer depends both on the strength of the
    data, p(NF | WMD), and on one's own (subjective)
    prior beliefs about p(WMD)
  • The statistician's job is (should be) to help you
    update your own personal beliefs: all truth is
    subjective in a Bayesian world
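
As a rough illustration of the WMD example, the sketch below applies Bayes' Rule to three different priors. The function name `posterior` and every number in it are hypothetical, chosen only to show how the same data move different prior beliefs by different amounts.

```python
def posterior(prior, p_data_given_a, p_data_given_not_a):
    """Return p(A | B) from p(A), p(B | A) and p(B | not A) via Bayes' Rule."""
    # p(B) by the law of total probability
    evidence = p_data_given_a * prior + p_data_given_not_a * (1.0 - prior)
    return p_data_given_a * prior / evidence


# Hypothetical likelihoods (assumptions, not figures from the talk):
#   p(NF | WMD)    = 0.2  weapons exist but the search still finds nothing
#   p(NF | no WMD) = 1.0  nothing exists, so nothing is found
P_NF_GIVEN_WMD = 0.2
P_NF_GIVEN_NO_WMD = 1.0

for prior in (0.1, 0.5, 0.9):
    post = posterior(prior, P_NF_GIVEN_WMD, P_NF_GIVEN_NO_WMD)
    print(f"prior p(WMD) = {prior:.1f} -> posterior p(WMD | NF) = {post:.3f}")
```

Under these assumed likelihoods, a skeptic and a believer both revise downward after seeing NF, but they still end up with different posteriors.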

4
Prior beliefs modify our view of the information
contained in data
5
Statistical Inference: A Review
  • The goal: observe the world (gather data, D) and
    then draw conclusions and/or make predictions
  • This requires a theory (or model, M) to organize
    relationships
  • Mathematics (Probability Theory)
  • A statistical model is simply a probability
    distribution, p(D | M), where M = {θ, A} consists of
  • A set of structural assumptions (A), and
  • some vector (θ) parameterizing the probability
    distribution. This usually represents the
    question of interest, e.g. β, σ²
  • Statistical inference
  • Drawing conclusions refers to p(θ | D, A)
  • Making predictions refers to p(D_new | θ, A)

6
Estimating p(θ | D, A): Two Practical (and Related)
Problems
  • 1. Inference about θ is conditional on model
    assumptions
  • In practice, we don't know the true structural
    assumptions (A)
  • What do we know? Bayes' Rule: p(M | D) ∝
    p(D | M) p(M)
  • Hypothesis testing can reject a model, but it can
    neither confirm it nor tell you the correct
    alternative!
  • Statistics vs. econometrics: what role does the
    prior p(M) play?
  • Traditional statistics recognizes uncertainty
    about θ but not about A.
  • Result: run a specification search for A, but
    pretend you didn't!
  • 2. What if the data are not drawn from the same
    distribution?
  • Inference about θ is based on averaging repeated
    draws
  • A fundamental statistical issue: we are each a
    population of 1!
  • A methodological guide for θ: conditional
    exchangeability

7
The Conditioning Problem: A Familiar Example
  • Data D = {X, Y}: we want to know the effect of X
    on Y
  • We are interested in the regression (or
    conditional expectation function, C.E.F.)
    E[Y | X]
  • Define the residual or error as ε ≡ Y - E[Y | X]
  • Familiar linear example: model M is E[Y | X] =
    β0 + β1 X
  • so Y = β0 + β1 X + ε
  • Estimation / inference
  • Estimation: find β0, β1 that minimize some loss
    function L(ε)
  • Inference: conditional on our information set Ω,
    ε must be exchangeable
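
A minimal sketch of the estimation step for the linear example, assuming squared-error loss (the slide leaves the loss function unspecified) and made-up data. The helper name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope that minimizes the sum of squared residuals
    b0 = y_bar - b1 * x_bar     # intercept: the fitted line passes through the means
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, resid = ols_fit(x, y)
print("b0 =", round(b0, 4), "b1 =", round(b1, 4))
print("residuals:", [round(r, 4) for r in resid])
```

The fitted residuals sum to zero and are uncorrelated with X by construction. Whether they are also exchangeable is a separate modeling question that the fit itself cannot answer.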

8
The Benefits of Using the Bayesian Approach of
Exchangeability
  • Classical (frequentist) i.i.d. vs. Bayesian
    exchangeability
  • A foundation for statistical inference on
    population data
  • De Finetti's Representation Theorem states:
  • If a sample X1, X2, ..., Xn is a subset of an
    infinite exchangeable sequence, X∞, then it is
    as if p(D | θ, A) exists, where θ ~ p(θ)
  • Clarifies the goal of the conditioning / model
    search process
  • We are trying to achieve anonymity of
    regression residuals
  • Clarifies the relationship between model search
    and prediction
  • What is the basis for using the past to make
    predictions of the future?
  • When the past and future are part of an
    exchangeable sequence!
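
To make the difference between i.i.d. and exchangeable concrete, here is a small sketch under an assumed Beta-Bernoulli setup (not taken from the slides): θ is drawn from a Beta(a, b) prior and each observation is then Bernoulli(θ). The function name `sequence_probability` is hypothetical. Exact fractions are used so the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import permutations


def sequence_probability(seq, a=1, b=1):
    """Exact p(x_1, ..., x_n) when theta ~ Beta(a, b) and x_i | theta ~ Bernoulli(theta)."""
    prob = Fraction(1)
    ones = zeros = 0
    for x in seq:
        # Predictive probability of a 1 given the history so far
        p_one = Fraction(a + ones, a + b + ones + zeros)
        prob *= p_one if x == 1 else 1 - p_one
        ones += x
        zeros += 1 - x
    return prob


# Every reordering of the same outcomes has the same probability: exchangeable
probs = {perm: sequence_probability(perm) for perm in set(permutations((1, 1, 0)))}
for perm, p in sorted(probs.items()):
    print(perm, p)

# ... but the draws are not independent: p(X1=1, X2=1) differs from p(X1=1)^2
p_x1 = sequence_probability((1,))
p_x1_x2 = sequence_probability((1, 1))
print("p(X1=1) =", p_x1, " p(X1=1, X2=1) =", p_x1_x2)
```

The order of the observations carries no information, yet each observation is informative about the next one. That dependence is what justifies learning about the future from the past.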

9
Example of a Conditioning Problem: The Sources of
Economic Growth
  • Why have some countries grown richer faster than
    others?
  • Data (D): growth rates (g) + assorted country
    characteristics (X)
  • Observations are countries (n ≈ 100)
  • Ex ante theory: the Solow Model of Capital
    Accumulation
  • The problem: what about other variables that may
    affect g?
  • Omitted variable bias; robustness problems
  • D.o.F. problem: Theories > Observations
    (plus multicollinearity!)
  • Specifying functional forms for variables like
    democracy, ethnic diversity
  • Population heterogeneity: are France, Taiwan, and
    Sudan really all draws from the same
    distribution? Inference about σ²?
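
The omitted-variable problem can be sketched with a deliberately noise-free toy data set (all values hypothetical). Growth depends on an included regressor x and an omitted factor z that moves with x, so the regression that leaves z out attributes part of z's effect to x.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical data: the true effect of x on g is 2, the effect of z is 3
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
z = [0.0, 1.0, 0.0, 1.0, 2.0, 2.0]          # correlated with x
g = [1.0 + 2.0 * xi + 3.0 * zi for xi, zi in zip(x, z)]

short_slope = slope(x, g)   # regression of g on x that omits z
delta = slope(x, z)         # slope from regressing the omitted z on x

print("slope with z omitted:", round(short_slope, 4))
print("true effect + 3 * delta:", round(2.0 + 3.0 * delta, 4))
```

The two printed numbers agree: the short-regression slope equals the true effect plus the omitted coefficient times the slope of z on x. This is the standard omitted-variable-bias identity.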

10
Exchangeability in Cross-Country Growth
Regressions
  • Inference requires conditional exchangeability
  • France, Taiwan, and Sudan are not exchangeable,
    but can we find an appropriate vector X such that
    the residuals ε ≡ g - E[g | X] are exchangeable?
  • Conditioning just boils down to a problem of
    model selection!
  • The classical approach to model selection is
    hypothesis testing
  • However, the D.o.F. problem has led to upward
    specification search!
  • In summary
  • Two types of uncertainty: sampling (variance) and
    model (bias)
  • Model selection usually involves an artful
    trade-off of bias vs. variance
  • However, classical methods do not propagate our
    model uncertainty into coefficient estimates
  • Can Bayesian statistics help us bring science to
    the art of selection?

11
The Growth Literature, Take 1: OLS estimates w/
controls and dummies
12
The Growth Literature, Take 2: Explaining
Parameter Heterogeneity
  • Tree Regressions
  • Local Linear Regressions (Spline models)
  • Varying Coefficient / Hierarchical Models

13
A Tree Regression
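[Figure: regression tree; branch layout lost in the transcript.
Split rules, in extraction order: s60 < 0.095; EQINV < 0.0144;
laam < 0.5; s60 < 0.03; NONEQINV < 0.1624; DEMOC65 < 0.8435;
FRAC < 0.155; EQINV < 0.04949; EQINV < 0.05405; lny60 < 8.49696.
Leaf values: -0.0072, 0.0040, 0.0159, 0.0213, 0.0130, 0.0068,
0.0170, 0.0330, 0.0532, 0.0390, 0.0250]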
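
How a tree regression picks one split can be sketched as a search over thresholds that minimizes the within-group sum of squared errors. The code below is a single-split "stump" on made-up numbers. The helper name `best_split` and the data are hypothetical; this is not the procedure or the data behind the tree on this slide.

```python
def sse(values):
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean, total_sse) for the rule x < threshold."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0                       # candidate threshold between neighbours
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        total = sse(left) + sse(right)
        if best is None or total < best[3]:
            best = (t, sum(left) / len(left), sum(right) / len(right), total)
    return best


# Made-up data: an investment share and a growth rate for six countries
invest = [0.05, 0.08, 0.10, 0.20, 0.25, 0.30]
growth = [0.005, 0.007, 0.006, 0.030, 0.028, 0.032]

threshold, left_mean, right_mean, total_sse = best_split(invest, growth)
print("split: invest <", round(threshold, 3))
print("leaf means:", round(left_mean, 4), round(right_mean, 4))
```

A full tree applies the same search recursively inside each resulting group, which is how a tree ends up with different predictions for different kinds of countries.
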
14
An Additive Spline Model: Investment
15
An Additive Spline Model: Schooling
16
An Additive Spline Model: Population Growth
17
Using splines to reveal non-linearities: Solow +
s(FRAC)
18
Does democracy modify effects of investment and
schooling?
19
A Varying Coefficient Model
20
Specification Searches
  • A specification search is a search for the mode
    of p(M | D)
  • Bayes' Rule: p(M | D) ∝ p(D | M) p(M)
  • Problem 1: How strong is your prior belief about
    M?
  • Problem 2: Can you characterize your prior
    beliefs?
  • Problem 3: Using the same data to find M and to
    estimate θ?
  • Danger! Why?
  • Problem 4: By conditioning on M, not p(M), you
    are understating uncertainty about coefficient
    estimates!
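
One way to see the danger in Problem 3 is a small simulation, sketched here under assumed settings (50 observations, 20 candidate regressors, pure-noise data; none of these come from the talk). The outcome is unrelated to every candidate, yet reporting only the best-looking regressor produces a "significant" t-statistic far more often than the nominal 5%. The critical value 2.01 is approximately the two-sided 5% point of a t distribution with 48 degrees of freedom.

```python
import random
from math import sqrt


def t_stat(x, y):
    """t-statistic for the slope in a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)      # standard error of the slope
    return b1 / se


def search_rejection_rate(n_obs=50, n_candidates=20, n_sims=200, crit=2.01, seed=0):
    """Share of simulations in which the best of n_candidates noise regressors looks significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = max(
            abs(t_stat([rng.gauss(0.0, 1.0) for _ in range(n_obs)], y))
            for _ in range(n_candidates)
        )
        hits += best > crit
    return hits / n_sims


rate = search_rejection_rate()
print("share of searches that report a 'significant' regressor:", rate)
```

Setting `n_candidates=1` corresponds to a single pre-specified test and gives a rate near the nominal level. The gap between the two rates is the uncertainty that a reported standard error hides when the search is not acknowledged.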

21
Bayesian Model Averaging (BMA)
  • An alternative to trying to find the single best
    model (i.e., the mode of p(M | D)) is to consider
    the entire distribution of specifications
  • Suppose you assign probability p(Ak) to each of K
    specifications; then
    p(θ | D) = Σk p(θ | D, Ak) p(Ak | D)
  • Averaging over model space improves statistical
    inference
  • Coefficient estimates tend to have better
    predictive ability
  • Standard errors reflect model, as well as
    parametric, uncertainty
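
A minimal sketch of the averaging step, assuming the per-model estimates, their variances, and the log marginal likelihoods are already in hand. All numbers and both function names are hypothetical. Each specification is weighted by its posterior probability, and the combined variance adds the spread of the estimates across models to the usual within-model variance.

```python
from math import exp, sqrt


def posterior_model_probs(log_marginal_likelihoods, priors):
    """p(A_k | D) proportional to p(D | A_k) * p(A_k), normalized to sum to 1."""
    shift = max(log_marginal_likelihoods)       # subtract the max to avoid underflow
    unnorm = [exp(l - shift) * p for l, p in zip(log_marginal_likelihoods, priors)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def bma_combine(estimates, variances, weights):
    """Model-averaged posterior mean and variance of one coefficient."""
    mean = sum(w * b for w, b in zip(weights, estimates))
    second_moment = sum(w * (v + b * b) for w, v, b in zip(weights, variances, estimates))
    return mean, second_moment - mean * mean    # within-model + between-model variance


# Hypothetical inputs for K = 3 specifications with equal prior weight
weights = posterior_model_probs([-100.0, -100.5, -101.0], [1 / 3, 1 / 3, 1 / 3])
mean, var = bma_combine([0.10, 0.20, 0.00], [0.01, 0.01, 0.01], weights)

print("posterior model probabilities:", [round(w, 3) for w in weights])
print("BMA estimate:", round(mean, 4), "BMA std. error:", round(sqrt(var), 4))
```

Every individual model here reports a variance of 0.01, but the averaged variance is larger because the models disagree about the coefficient.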

22
Some nasty theoretical details
  • Choosing the space of models and model priors
  • Managing the summation in BMA can be tricky: with
    12 possible covariates, there are 2^12 = 4,096
    different models to combine!
  • Occam's Window, suggested by Raftery (1994):
    eliminate larger and/or less probable models
  • MC3 (Markov chain Monte Carlo model composition)
    techniques transit across model space.
    Compute p(θ, A) from p(θ | A) and p(A | D)
  • Computing the integral p(D | A) =
    ∫ p(D | θ, A) p(θ | A) dθ
  • This is done directly in MC3 techniques for BMA;
    otherwise
  • Can approximate using p(D | θ_MLE, A)
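
The size of the model space and a simplified pruning rule can be sketched as follows. The covariate names, the log posterior values, and the cutoff ratio of 20 are assumptions for illustration. The function implements only the "less probable" half of Occam's Window (drop models far less probable than the best one), not the rule about larger nested models.

```python
from itertools import combinations
from math import log

# 12 hypothetical covariate names; each model is a subset of them
covariates = [f"x{i}" for i in range(1, 13)]
models = [
    subset
    for size in range(len(covariates) + 1)
    for subset in combinations(covariates, size)
]
print("number of candidate models:", len(models))


def occams_window(log_post, ratio=20.0):
    """Keep models whose posterior probability is within a factor `ratio` of the best."""
    best = max(log_post.values())
    return {m: lp for m, lp in log_post.items() if lp >= best - log(ratio)}


# Hypothetical log posterior model probabilities (up to a common constant)
log_post = {("x1",): -10.0, ("x1", "x2"): -11.0, ("x3",): -15.0}
kept = occams_window(log_post)
print("models kept:", sorted(kept))
```

Pruning of this kind shrinks the sum over models to a manageable set. The retained probabilities then need to be renormalized before averaging.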

23
Bayesian Model Selection Results
24
Bayesian Model Averaging Results
25
Conclusions
  • Standard statistical inference is conditional on
    the chosen model
  • A data-driven model search is usually an
    unavoidable fact of life
  • Model must include appropriate vector of controls
    (bias vs. variance)
  • Model should address parameter heterogeneity and
    functional form
  • A methodological guide for conditioning is
    exchangeability
  • Of course, the very fact that we are searching
    for a model means we are really less certain
    about our estimates than we are stating
  • BMA techniques help to propagate model
    uncertainty into coefficient estimates and
    standard errors