Title: Hierarchical Models cont.
1. Hierarchical Models cont.
- Review of the basic hierarchical regression probability model
- WinBUGS implementation of Bayesian hierarchical models
- Hierarchical models with explanatory models for covariates
- - Western (1998)
2. Hierarchical Regression Models
- Suppose we have a standard multiple regression model where observations i cluster across sub-populations j.
- - (where j indexes, for example, geographic location, social group, or period in history)
- But we do not want to assume that regression coefficients are identical across sub-populations.
- We also want to allow for unequal variances across sub-populations.
- We assume that each observation is distributed normally, with an expected value determined by both observation-specific and sub-population characteristics, and with a variance specific to the level of aggregation. Thus, $y_{ij} \sim N(\mu_{ij}, \tau_j)$.
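3. The Random Coefficient Model
- Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
- then $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$.
- For a random coefficient (hierarchical) model, we assume that
- $\beta_{kj} = \gamma_k + \delta_{kj}$, where
- - $\gamma_k$ represents the overall effect of $\beta_k$
- - $\delta_{kj}$ represents the difference in the coefficient between sub-population j and the overall coefficient, where $E[\delta_{kj}] = 0$
- This framework implies that if
- $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$, then
- $\mu_{ij} = (\gamma_1 + \delta_{1j}) + (\gamma_2 + \delta_{2j}) x_{2i} + \dots + (\gamma_m + \delta_{mj}) x_{mi}$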
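The decomposition of each coefficient into an overall effect plus a mean-zero sub-population deviation can be illustrated with a small simulation. This is only a sketch: the numeric values, the normal distribution for the deviations, and the function names are assumptions made for illustration, not part of the model specification above.

```python
import random

def simulate_random_coefficients(overall, sd_deviation, n_groups, seed=0):
    """Draw one coefficient vector per sub-population j.

    Each coefficient is overall[k] + deviation, where the deviation is drawn
    from a normal with mean 0 (an assumed distribution for this sketch).
    """
    rng = random.Random(seed)
    betas = []
    for _ in range(n_groups):
        deviations = [rng.gauss(0.0, sd) for sd in sd_deviation]
        betas.append([g + d for g, d in zip(overall, deviations)])
    return betas

def linear_predictor(beta_j, x_i):
    """Expected value: intercept beta_j[0] plus slopes times covariates."""
    return beta_j[0] + sum(b * x for b, x in zip(beta_j[1:], x_i))

overall = [1.0, 0.5]        # hypothetical overall intercept and slope
sd_deviation = [0.3, 0.1]   # hypothetical spread of the group deviations
betas = simulate_random_coefficients(overall, sd_deviation, n_groups=2000)

# Because the deviations have mean zero, the average slope across
# sub-populations should sit close to the overall slope.
mean_slope = sum(b[1] for b in betas) / len(betas)
```

Averaging the simulated slopes recovers something close to the overall slope, which is the sense in which the overall coefficient summarizes the sub-population coefficients.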
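4. Priors for the random coefficient model
- Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
- where $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$
- $= (\gamma_1 + \delta_{1j}) + (\gamma_2 + \delta_{2j}) x_{2i} + \dots + (\gamma_m + \delta_{mj}) x_{mi}$
- We shall assume that $\tau_j \sim \text{Gamma}(.001, .001)$ for all j.
- Two basic strategies for defining priors for the coefficients:
- 1) Specify priors for both $\gamma_k$ and $\delta_{kj}$ as follows:
- - Let $\gamma_k \sim N(\text{prior mean}, \text{prior prec})$, where the prior mean and prior precision are fixed numbers,
- - and $\delta_{kj} \sim N(0, \phi_k)$, where $\phi_k \sim \text{Gamma}(.001, .001)$
- 2) Use hierarchical centering as follows:
- - $\beta_{kj} \sim \text{Normal}(\gamma_k, \phi_k)$,
- - where $\gamma_k \sim \text{Normal}(\text{prior mean}, \text{prior prec})$, with the prior mean and prior precision again fixed numbers,
- - and $\phi_k \sim \text{Gamma}(.001, .001)$
- Method 2 improves MCMC markedly in some cases (see Gilks and Roberts, "Strategies for improving MCMC," in MCMC in Practice)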
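The two strategies describe the same distribution for each coefficient; they differ only in which quantities the sampler updates. A minimal sketch of that equivalence follows. The function names and numeric values are hypothetical, and the second argument of each normal is treated as a precision (the BUGS convention), so the standard deviation is 1 / sqrt(precision).

```python
import math
import random

def draw_uncentered(overall, precision, rng):
    # Strategy 1: draw a mean-zero deviation, then add the overall effect.
    deviation = rng.gauss(0.0, 1.0 / math.sqrt(precision))
    return overall + deviation

def draw_centered(overall, precision, rng):
    # Strategy 2 (hierarchical centering): draw the coefficient directly
    # from a normal centered on the overall effect.
    return rng.gauss(overall, 1.0 / math.sqrt(precision))

# Feeding both functions the same random stream gives matching draws,
# which shows the two parameterizations imply the same distribution.
rng_a = random.Random(42)
rng_b = random.Random(42)
uncentered = [draw_uncentered(2.0, 4.0, rng_a) for _ in range(5)]
centered = [draw_centered(2.0, 4.0, rng_b) for _ in range(5)]
largest_gap = max(abs(u - c) for u, c in zip(uncentered, centered))
```

The sketch says nothing about mixing speed; it only shows that switching to the centered form does not change the model being estimated.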
5. Example from last time
- Dependent variable
- - Percentage of seats won by the Democratic Party in the House of Representatives in state i in election t.
- Independent variable
- - Level of ideological conflict within state i's Democratic Party delegation to the House in period t-1.
- - Control variables include dummy variables for the various states (measuring their preference for the Democratic Party) and for each election.
6. The probability model
- $\text{Electoral Success}_{it} \sim \text{Normal}(\mu_{it}, \tau)$,
- where $\mu_{it} = \alpha_i + \beta_i \, \text{Intra-Party Conflict}_{i,t-1}$,
- $\alpha_i \sim \text{Normal}(A, \tau_A)$ for all i
- $A \sim \text{Normal}(0, .01)$
- $\tau_A \sim \text{Gamma}(.1, .1)$
- $\beta_i \sim \text{Normal}(B, \tau_B)$ for all i
- $B \sim \text{Normal}(0, .01)$
- $\tau_B \sim \text{Gamma}(.1, .1)$
- and $\tau \sim \text{Gamma}(.1, .1)$
Notice that rather than estimating a model with an intercept and m-1 dummy variables, m dummy variables are used.
Similarly, notice that m state-specific effects are estimated.
Here I assume fixed variances (precisions) across states. This assumption can and should be relaxed to see if homoskedasticity is reasonable.
7. WinBUGS Implementation
    model {
       # macro model for hyperparameters
       for (i in 1:21) {      # loop over states
          b[i, 1] ~ dnorm(eb1, varb1)
          b[i, 2] ~ dnorm(eb2, varb2)
       }
       # micro model
       for (k in 1:164) {
          perseat[k] ~ dnorm(muperseat[k], tau)
          muperseat[k] <- b[state[k], 1] + b[state[k], 2] * (vdim1[k] - mean(vdim1[]))
       }
       # priors for variance parameters for micro-level model
       # (dnorm takes a precision, so tau, varb1 and varb2 are precisions)
       tau ~ dgamma(1, 1)
       eb1 ~ dnorm(0, .01)
       eb2 ~ dnorm(0, .01)
       varb1 ~ dgamma(.1, .1)
       varb2 ~ dgamma(.1, .1)
    }
Data Matrix
    state   vdim1         perseat
    4       0.000146333   0.5
    16      0.000253667   1
    5       0.000338      0.2
    11      0.000338      1
    14      0.00049625    0.56
    16      0.000648      1
    14      0.000730333   0.7
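To make the micro-model line concrete, the sketch below computes the expected seat share for the data rows shown, using the state column to pick out each state's intercept and slope and centering vdim1 at its mean. The coefficient values are hypothetical placeholders (in the real model they are estimated), and the first vdim1 value assumes the number split across two lines in the data matrix reads 0.000146333.

```python
# (state, vdim1, perseat) rows as read from the data matrix
rows = [
    (4, 0.000146333, 0.5),
    (16, 0.000253667, 1.0),
    (5, 0.000338, 0.2),
    (11, 0.000338, 1.0),
    (14, 0.00049625, 0.56),
    (16, 0.000648, 1.0),
    (14, 0.000730333, 0.7),
]

def centered(values):
    """Subtract the sample mean, as (vdim1[k] - mean(vdim1)) does in the model."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def expected_perseat(rows, coefs):
    """coefs maps a state id to its (intercept, slope) pair."""
    x = centered([vdim1 for _, vdim1, _ in rows])
    return [coefs[state][0] + coefs[state][1] * xc
            for (state, _, _), xc in zip(rows, x)]

# Hypothetical coefficients: the same intercept and slope for every state.
coefs = {state: (0.6, 100.0) for state, _, _ in rows}
mu = expected_perseat(rows, coefs)
```

Because the covariate is centered, each state's intercept is its expected seat share at the average level of vdim1, which is one reason centering tends to help the sampler.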
8. WinBUGS Implementation
- Hierarchical models in WinBUGS require care.
- 1) Except for very simple models, you will run into convergence problems. Some subset of the following will almost certainly be required:
- - standardizing variables (even dummy variables, and especially variables with small variances relative to the others)
- - hierarchical centering
- - multivariate priors
- - over-relaxation algorithm
- - thinning your Markov chain
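Two of the remedies listed above, standardizing variables and thinning the chain, are simple enough to sketch outside WinBUGS. The helper names are hypothetical, the example values are illustrative, and the sample standard deviation (n - 1 denominator) is an assumed choice.

```python
import math

def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def thin(chain, every):
    """Keep every `every`-th draw of a stored chain."""
    return chain[every - 1::every]

# A covariate with a very small variance relative to a 0-1 outcome.
small_scale = [0.000146, 0.000254, 0.000338, 0.000496, 0.000648]
z = standardize(small_scale)

# Stand-in for 10 stored MCMC draws; keeping every 5th leaves 2 draws.
draws = list(range(1, 11))
kept = thin(draws, 5)
```

Standardizing puts the covariate on a scale comparable to the other variables, and thinning reduces the autocorrelation among the retained draws at the cost of discarding iterations.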
9. WinBUGS Implementation
- Hierarchical models in WinBUGS require care.
- 2) Hierarchical models often require complex sets of loops. This means that if you are not careful you will lose track of how you assign prior distributions.
- - Be certain that all parameters have been assigned appropriate priors.
- - Be careful not to transpose indexes when working within loops.
- - When you monitor results, make sure that you monitor the parameter and not the number you assigned to the prior.
10. WinBUGS Implementation
- Hierarchical models in WinBUGS require special care.
- 3) Data arrays are cumbersome.
- - Your data sets must contain the information about the observation-level data and the data about the level of aggregation.
- - You may prefer to load two different data sets, one for the observational data and one for the population data (esp. once we get to fuller models of the hierarchical structure).
11. The General 2-Layer Hierarchical (Multi-level) Model
- What if we believe that the lower-level regression coefficients across units of analysis depend on particular features of the sub-population (which could violate the exchangeability assumption)?
- In this case, we can model the sub-population-specific regression coefficients as a function of particular covariates.
- Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
- then $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$.
- For a general multi-level (hierarchical) model, we assume that
- $\beta_{kj} = Z_j \gamma_k + \delta_{kj}$, where
- - $\gamma_k$ represents a vector of coefficients on the predictors of the sub-population regression coefficients
- - $Z$ represents a data matrix of the sub-population characteristics
- - $\delta_{kj}$ represents the difference between the coefficient for sub-population j and its predicted value $Z_j \gamma_k$, where $E[\delta_{kj}] = 0$
- One could substitute the expression for $\beta_{kj}$ into the expression for $\mu_{ij}$ to obtain the full expression for $\mu_{ij}$.
- It is possible to add layer upon layer to the model, but in practice the data will typically only reveal information for a second layer.
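The second layer can be sketched as a small regression of each sub-population's coefficient on that sub-population's characteristics, plus a mean-zero deviation. Everything numeric below is hypothetical, as are the function names and the normal distribution used for the deviation.

```python
import random

def predicted_coefficient(z_j, second_level_coefs):
    """Systematic part: sub-population characteristics times their coefficients."""
    return sum(z * g for z, g in zip(z_j, second_level_coefs))

def draw_coefficient(z_j, second_level_coefs, sd_deviation, rng):
    """Systematic part plus a mean-zero deviation (assumed normal here)."""
    return (predicted_coefficient(z_j, second_level_coefs)
            + rng.gauss(0.0, sd_deviation))

# Hypothetical Z: a constant plus one characteristic for three sub-populations.
Z = [[1.0, 0.2], [1.0, 0.8], [1.0, 1.5]]
second_level_coefs = [0.5, -0.3]

systematic = [predicted_coefficient(z_j, second_level_coefs) for z_j in Z]
rng = random.Random(1)
drawn = [draw_coefficient(z_j, second_level_coefs, 0.05, rng) for z_j in Z]
```

With a constant as the only column of Z, this reduces to the random coefficient model, where every sub-population shares the same expected coefficient.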
12. The General 2-Layer Hierarchical (Multi-level) Model
- General hierarchical models have a number of advantages over least squares with standard dummy variables and interaction terms (which is a special case in which the variances of the coefficient distributions are infinite) (Steenbergen and Jones).
- 1) Estimation of separate regressions for each sub-population is not efficient. Borrowing strength across sub-groups allows the full population to provide information about both the overall effect and each of the sub-populations.
- 2) Least squares dummy variables are not able to explain the sources of heterogeneity in the behavior of various sub-populations (causal heterogeneity).
- 3) Ignoring sub-population information may lead to too-frequent rejection of null hypotheses of no effect. This is because observations are treated as independent even though they are in fact dependent because of the hierarchical nesting structure.
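"Borrowing strength" can be illustrated in the simplest case: a sub-population mean with a normal likelihood, a normal distribution across sub-populations, and both precisions treated as known. Under those assumptions (a simplification chosen for this sketch, not the full regression model above), the posterior mean is a precision-weighted average of the sub-population's own sample mean and the overall mean. The function name and the numbers are hypothetical.

```python
def partial_pooling_mean(group_mean, n_obs, obs_precision,
                         overall_mean, group_precision):
    """Posterior mean of a group effect in the normal-normal model
    with known precisions."""
    data_weight = n_obs * obs_precision
    weight = data_weight / (data_weight + group_precision)
    return weight * group_mean + (1.0 - weight) * overall_mean

# A small sub-population is pulled strongly toward the overall mean...
small_group = partial_pooling_mean(2.0, 4, 1.0, 0.0, 4.0)
# ...while a large one stays close to its own sample mean.
large_group = partial_pooling_mean(2.0, 36, 1.0, 0.0, 4.0)
# Zero precision (infinite variance) across groups gives no pooling at all.
no_pooling = partial_pooling_mean(2.0, 4, 1.0, 0.0, 0.0)
```

The last line mirrors the special case noted above: when the variance of the coefficient distribution is infinite, each sub-population is estimated from its own data alone, as with dummy variables.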
13. Example: Causal Heterogeneity in Comparative Research (Western)
- Research question: the political determinants of economic growth in OECD countries
- This is a pooled time-series cross-section research design where the researcher believes that the causal effects across countries will not be identical.
- - This is because institutions condition the political determinants of economic growth, and these institutions tend to be stable across time within countries.
- To estimate this process, Western treats each country-level effect as if it were drawn from a common distribution.
- - So, the coefficients are treated as exchangeable.
14. Western Paper
- Dependent variable: $y_{ij}$ = change in real GDP in state i in period j
- Independent variables:
- - $G_{ij}$ = Leftist Govt: share of cabinet seats held by leftist parties
- - $L_i$ = Labor Org.: measure of labor union concentration/density (this variable is time-invariant)
- - $D_{ij}$ = Vulnerability to demand in the OECD area
- - $I_{ij}$ = Price movements of OECD imports
- - $E_{ij}$ = Price movements of OECD exports
- - $y_{i,j-1}$ = a lagged dependent variable
- - Plus $L_i \times G_{ij}$, an interaction term between Labor Org. and Leftist Govt
15. Western Paper
- In the original study that Western replicates, the effects of all variables were assumed to be constant across countries.
- Thus, $E[y_{ij}] = B_0 + B_1 y_{i,j-1} + B_2 D_{ij} + B_3 I_{ij} + B_4 E_{ij} + B_5 L_i + B_6 G_{ij} + B_7 L_i \times G_{ij}$
- This is clearly flawed because if political institutions matter, then we would expect differences in the effects of these variables across countries.
- This model can be (and was) expanded to include interaction terms between the institutional variable Labor and everything else.
- However, it is impossible to include a set of country-specific intercepts or coefficients because they would perfectly predict the Labor variable (perfect multicollinearity).
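The multicollinearity problem can be seen directly: a time-invariant variable such as Labor is an exact linear combination of the country dummies, so the two cannot be separately estimated by least squares. The tiny panel below is hypothetical (three made-up countries, two periods each) and is only meant to show the exact dependence.

```python
# Hypothetical panel: country label for each country-period observation.
countries = ["A", "A", "B", "B", "C", "C"]
# Hypothetical time-invariant labor organization score per country.
labor = {"A": 0.3, "B": 0.7, "C": 0.5}

names = sorted(labor)
# One dummy column per country (no omitted category).
dummies = [[1.0 if c == name else 0.0 for name in names] for c in countries]
labor_column = [labor[c] for c in countries]

# Weight each dummy by its country's labor score and add them up:
# the result reproduces the labor column exactly.
weights = [labor[name] for name in names]
rebuilt = [sum(w * d for w, d in zip(weights, row)) for row in dummies]
largest_gap = max(abs(a - b) for a, b in zip(labor_column, rebuilt))
```

Since the dummies reproduce Labor with no error, the design matrix containing both is rank deficient, which is the sense in which the country effects "perfectly predict" the Labor variable.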
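16. Western Probability Model
- $y_{ij} \sim N(\mu_{ij}, \tau)$
- $\mu_{ij} = B_{0i} + B_{1i} y_{i,j-1} + B_{2i} D_{ij} + B_{3i} I_{ij} + B_{4i} E_{ij} + B_{5i} G_{ij}$
- - (dropping the labor variable)
- And $B_{ki} \sim N(\mu_{b_{ki}}, \tau_{b_k})$ for $k = 1, \dots, 5$,
- where $\mu_{b_{ki}} = \gamma_{k0} + \gamma_{k1} L_i$
- The priors are given by
- $\gamma_{k0}, \gamma_{k1} \sim N(.001, .001)$
- $\tau \sim N(.001, .001)$
- $\tau_{b_k} \sim N(.001, .001)$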
17. WinBUGS code will be put on the blackboard in class
- See Simon Jackman's MCMC page for a richer discussion of the results. This is his corporatism example.
- - http://tamarama.stanford.edu/mcmc/
- It takes quite some time to run, so I can't run it in class, and his canned page of results only opens in WinBUGS 1.3.
- Observations on the online setup:
- 1) Western (and Jackman) don't use all of the available tricks, like standardizing variables, so it takes him 100,000 iterations to reach convergence.
- 2) Notice how the data is formatted. He essentially reads in two different data sets: one for the individual-level data and one for the state-level data.
18. Data matrices for one of my projects
For a hierarchical model I estimated recently, I had the following two data sets.
This is the micro-level data:
    percdist      female        age           black         educ          strpid        strlibcon     series
    0.953944013   -1.039166274  -1.376793464  -0.337363722  -0.781001642  0.129087854   0.092924611   1
    0.953944013   0.962263102   1.095396869   -0.337363722  -0.781001642  0.129087854   -0.997560551  1
    0.530307732   -1.039166274  1.684013615   -0.337363722  -0.781001642  1.167041028   1.183409773   1
    0.106671451   -1.039166274  1.154258544   -0.337363722  -0.781001642  0.129087854   -0.997560551  1
This is the population-level data (note that the year data here corresponds to the series variable in the micro-level data):
    year   eucdist
    1      -0.927065327
    2      -0.936598452
    3      -0.763715526
To read the data into WinBUGS, click on percdist in data set one as usual and then click load data. Then click year in data set two and click load data. Then proceed as you normally would.
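The link between the two data sets is the series index: each micro-level row points to one row of the population-level data, just as state[k] picks out a state's coefficients inside a WinBUGS loop. A minimal sketch of that lookup follows, using the rows shown above; the function name is hypothetical, and only two of the micro-level columns are kept for brevity.

```python
# Micro-level rows reduced to (percdist, female, series).
micro = [
    (0.953944013, -1.039166274, 1),
    (0.953944013, 0.962263102, 1),
    (0.530307732, -1.039166274, 1),
    (0.106671451, -1.039166274, 1),
]

# Population-level data: year index -> eucdist.
population = {1: -0.927065327, 2: -0.936598452, 3: -0.763715526}

def attach_population_value(micro_rows, population_values):
    """Append the population-level value matched on the series index."""
    return [row + (population_values[row[-1]],) for row in micro_rows]

merged = attach_population_value(micro, population)
```

All four rows shown belong to series 1, so each picks up the first population-level value; rows from later series would pick up the second or third.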