Hierarchical Models cont' - PowerPoint PPT Presentation


Provided by: jeffgry

1
Hierarchical Models cont.

- Review of the basic hierarchical regression probability model
- WinBugs implementation of Bayesian hierarchical models
- Hierarchical models with explanatory models for covariates
  - Western (1998)

2
Hierarchical Regression Models

Suppose we have a standard multiple regression
model where observations i cluster across
sub-populations j.
(where j indexes, for example, geographic
location, social group, period in history)
But, we do not want to assume that regression
coefficients are identical across
sub-populations.
We also want to allow for unequal variances
across sub-populations.
We assume that each observation is normally distributed, with an
expected value determined by both observation-specific and
sub-population characteristics, and with a variance specific to the
level of aggregation. Thus, $y_{ij} \sim N(\mu_{ij}, \tau_j)$.

3
The Random Coefficient Model

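Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
then $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$.
For a random coefficient (hierarchical) model, we assume that
$\beta_{kj} = \gamma_k + \delta_{kj}$, where
- $\gamma_k$ represents the overall value of the coefficient $\beta_k$
- $\delta_{kj}$ represents the difference in the coefficient between
sub-population j and the overall coefficient, where $E[\delta_{kj}] = 0$
This framework implies that if
$\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$, then
$\mu_{ij} = (\gamma_1 + \delta_{1j}) + (\gamma_2 + \delta_{2j}) x_{2i} + \dots + (\gamma_m + \delta_{mj}) x_{mi}$.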

4
Priors for the random coefficient model

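Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
where $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$
$= (\gamma_1 + \delta_{1j}) + (\gamma_2 + \delta_{2j}) x_{2i} + \dots + (\gamma_m + \delta_{mj}) x_{mi}$.
We shall assume that $\tau_j \sim \text{Gamma}(.001, .001)$ for all j.
Two basic strategies for defining priors for the coefficients:
1) Specify priors for both $\gamma_k$ and $\delta_{kj}$ as follows:
Let $\gamma_k \sim N(\text{prior mean}, \text{prior prec})$, where the
prior mean and prior precision are fixed numbers,
and $\delta_{kj} \sim N(0, \tau_{\beta_k})$, where
$\tau_{\beta_k} \sim \text{Gamma}(.001, .001)$.
2) Use hierarchical centering as follows:
$\beta_{kj} \sim \text{Normal}(\gamma_k, \tau_{\beta_k})$,
where $\gamma_k \sim \text{Normal}(\text{prior mean}, \text{prior prec})$,
again with fixed numbers,
and $\tau_{\beta_k} \sim \text{Gamma}(.001, .001)$.
Method 2 improves MCMC markedly in some cases
(see Gilks and Roberts, "Strategies for improving MCMC," in
MCMC in Practice).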
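The two strategies describe the same prior distribution for each coefficient; they differ only in which quantities the sampler updates. A minimal Python sketch of that equivalence follows. It assumes the second argument of each normal is a precision (as in WinBugs), and the function names and the values 2.0 and 4.0 are illustrative, not taken from the slides.

```python
import random
import statistics


def draw_uncentered(overall, precision, n, rng):
    """Strategy 1: coefficient = overall effect + deviation, deviation ~ N(0, precision)."""
    sd = precision ** -0.5  # convert a BUGS-style precision to a standard deviation
    return [overall + rng.gauss(0.0, sd) for _ in range(n)]


def draw_centered(overall, precision, n, rng):
    """Strategy 2 (hierarchical centering): coefficient ~ N(overall, precision) directly."""
    sd = precision ** -0.5
    return [rng.gauss(overall, sd) for _ in range(n)]


rng = random.Random(0)
uncentered = draw_uncentered(2.0, 4.0, 20000, rng)  # hypothetical overall effect and precision
centered = draw_centered(2.0, 4.0, 20000, rng)

# Both sets of draws should have mean near 2.0 and standard deviation near 0.5.
print(statistics.mean(uncentered), statistics.stdev(uncentered))
print(statistics.mean(centered), statistics.stdev(centered))
```

The sketch only checks the implied distributions; any gain from centering comes from reduced posterior correlation between the sampled quantities, which this simulation does not show.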

5
Example from last time

Dependent variable
- Percentage of seats won by the Democratic
Party in the House of Representatives in state i
in election t.
Independent variable
- Level of ideological conflict within state i's
Democratic Party delegation to the House in
period t-1.
- Control variables include dummy variables for
the various states measuring their preference for
the Democratic Party and for each election.

6
The probability model

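$\text{Electoral Success}_{it} \sim \text{Normal}(\mu_{it}, \tau)$,
where $\mu_{it} = \alpha_i + \beta_i \, \text{Intra-Party Conflict}_{i,t-1}$,
$\alpha_i \sim \text{Normal}(A, \tau_A)$ for all i
$A \sim \text{Normal}(0, .01)$
$\tau_A \sim \text{Gamma}(.1, .1)$
$\beta_i \sim \text{Normal}(B, \tau_B)$ for all i
$B \sim \text{Normal}(0, .01)$
$\tau_B \sim \text{Gamma}(.1, .1)$
and $\tau \sim \text{Gamma}(.1, .1)$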

Notice that rather than estimating a model with
an intercept and m-1 dummy variables, m dummy
variables (the state-specific intercepts) are used.
Similarly, notice that m state-specific effects
(the slopes) are estimated.
Here I assume a common variance (precision) across
states. This assumption can and should be relaxed
to see whether homoskedasticity is reasonable.

7
WinBugs Implementation

model {
  # macro model for hyperparameters
  for (i in 1:21) {  # loop over states
    b[i, 1] ~ dnorm(eb1, varb1)
    b[i, 2] ~ dnorm(eb2, varb2)
  }
  # micro model
  for (k in 1:164) {
    perseat[k] ~ dnorm(muperseat[k], tau)
    muperseat[k] <- b[state[k], 1] + b[state[k], 2] * (vdim1[k] - mean(vdim1[]))
  }
  # priors for variance parameters for micro-level model
  tau ~ dgamma(1, 1)
  eb1 ~ dnorm(0, .01)
  eb2 ~ dnorm(0, .01)
  # despite their names, varb1 and varb2 enter dnorm as precisions
  varb1 ~ dgamma(.1, .1)
  varb2 ~ dgamma(.1, .1)
}
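The micro-model line for muperseat combines nested indexing (each observation looks up its state's coefficients) with mean-centering of vdim1. A small Python sketch of that one step, using the seven rows shown in this slide's data matrix, may make it easier to follow. The coefficient values in b are hypothetical placeholders, not estimates from the model.

```python
import statistics

# (state, vdim1, perseat) rows as listed in the slide's data matrix.
rows = [
    (4, 0.000146333, 0.5),
    (16, 0.000253667, 1.0),
    (5, 0.000338, 0.2),
    (11, 0.000338, 1.0),
    (14, 0.00049625, 0.56),
    (16, 0.000648, 1.0),
    (14, 0.000730333, 0.7),
]

# Mean of the covariate over the rows used here (the model uses all 164 rows).
vdim1_mean = statistics.mean([v for _, v, _ in rows])


def mu_perseat(b, state, vdim1):
    """State intercept plus state slope times the mean-centered covariate."""
    intercept, slope = b[state]
    return intercept + slope * (vdim1 - vdim1_mean)


# Hypothetical coefficients: the same (intercept, slope) pair for every state.
b = {state: (0.7, -100.0) for state, _, _ in rows}

fitted = [mu_perseat(b, s, v) for s, v, _ in rows]
print(fitted)
```

Because the covariate is centered, the intercept is the expected seat share at the average level of vdim1, which is one reason centering tends to help the sampler.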

Data Matrix

state   vdim1         perseat
4       0.000146333   0.5
16      0.000253667   1
5       0.000338      0.2
11      0.000338      1
14      0.00049625    0.56
16      0.000648      1
14      0.000730333   0.7

8
WinBugs Implementation

Hierarchical models in WinBugs require care.
1) Except for very simple models, you will run
into convergence problems. Some subset of the
following will almost certainly be required:
- standardizing variables (even dummy variables,
and especially variables with small variances
relative to the others)
- hierarchical centering
- multivariate priors
- over-relaxation algorithm
- thinning your Markov Chain
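Two of the remedies above are simple enough to sketch directly. The Python below is an illustration only: it standardizes a variable to mean 0 and standard deviation 1, and it thins a stored chain by keeping every k-th draw. The helper names are assumptions made for this sketch; WinBugs performs thinning itself through its update settings.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    center = statistics.mean(values)
    scale = statistics.stdev(values)
    return [(v - center) / scale for v in values]


def thin(chain, every):
    """Keep every `every`-th draw of a chain (the 1st kept draw is draw number `every`)."""
    return chain[every - 1::every]


# vdim1 values from the data matrix shown earlier: tiny variance on the raw scale.
vdim1 = [0.000146333, 0.000253667, 0.000338, 0.000338,
         0.00049625, 0.000648, 0.000730333]
z = standardize(vdim1)
print(statistics.mean(z), statistics.stdev(z))

# A stand-in "chain" of iteration numbers, thinned by 10.
kept = thin(list(range(1, 101)), 10)
print(kept)
```

Standardizing puts a covariate like vdim1 on the same scale as the others, so a single vague prior is comparably vague for every coefficient.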

9
WinBugs Implementation

Hierarchical models in WinBugs require care.
2) Hierarchical models often require complex sets
of loops. This means that if you are not careful
you will lose track of how you assign prior
distributions.
- Be certain that all parameters have been
assigned appropriate priors
- Be careful not to transpose indexes when
working within loops
- When you monitor results, make sure that you
monitor the parameter and not the number you
assigned to the prior.

10
WinBugs Implementation

Hierarchical models in WinBugs require special
care
3) Data arrays are cumbersome
- your data sets must contain the information
about the observation-level data and the data
about the level of aggregation.
- you may prefer to load two different data
sets, one for the observational data and one for
the population data (esp. once we get to fuller
models of the hierarchical structure)

11
The General 2-Layer Hierarchical (Multi-level)
Model

What if we believe that the lower-level
regression coefficients across units of analysis
are dependent on particular features of the
sub-population (which could violate the
exchangeability assumption)?
In this case, we can model the sub-population
specific regression coefficients as a function of
particular covariates.
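Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
then $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$.
For a general multi-level (hierarchical) model,
we assume that
$\beta_{kj} = Z_j \gamma_k + \delta_{kj}$, where
- $\gamma_k$ represents a vector of coefficients on the predictors
of the sub-population regression coefficients
- $Z$ represents a data matrix of the
sub-population characteristics (with row $Z_j$ for sub-population j).
- $\delta_{kj}$ represents the difference between the
coefficient for sub-population j and its predicted value
$Z_j \gamma_k$, where $E[\delta_{kj}] = 0$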
One could substitute the expression for $\beta_{kj}$ into the
expression for $\mu_{ij}$ to obtain the full
expression for $\mu_{ij}$.
It is possible to add layer upon layer to the
model, but in practice the data will typically
only reveal information for a second layer.
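
The substitution mentioned above can be written out as follows. This is a sketch under two assumptions: the first regressor is a constant ($x_{1i} = 1$), and $\gamma_k$ and $\delta_{kj}$ are the names used here for the second-level coefficient vector and the sub-population deviation.

```latex
\begin{align*}
\mu_{ij} &= \sum_{k=1}^{m} \beta_{kj}\, x_{ki}, \qquad x_{1i} = 1 \\
         &= \sum_{k=1}^{m} \left( Z_j \gamma_k + \delta_{kj} \right) x_{ki} \\
         &= \sum_{k=1}^{m} \left( Z_j \gamma_k \right) x_{ki}
          + \sum_{k=1}^{m} \delta_{kj}\, x_{ki}
\end{align*}
```

The first sum contains the cross-level interactions between sub-population characteristics and observation-level covariates; the second sum is the sub-population-specific random part.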

12
The General 2-Layer Hierarchical (Multi-level)
Model

General hierarchical models have a number of
advantages over least squares with standard dummy
variables with interaction terms (which is a
special case when the variances for the
coefficient distributions are infinity)
(Steenbergen and Jones ).
1) Estimation of separate regressions for each
sub-population is not efficient. Borrowing
strength across sub-groups allows the full
population to provide information about both the
overall effect and each of the sub-populations.
2) Least squares dummy variables are not able to
explain the sources of heterogeneity in the
behavior of various sub-populations.
(causal heterogeneity)
3) Ignoring sub-population information may lead
to too frequent rejection of null hypotheses of
no effects. This is because observations are
treated as independent even though they are in
fact dependent because of the hierarchical
nesting structure.
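
Point 1, borrowing strength, can be illustrated with the standard normal-normal result for a single group mean. The Python sketch below assumes known precisions and a known overall mean, which is a simplification of the full hierarchical model; the function name and all numbers are hypothetical.

```python
def partial_pool(group_mean, n_j, tau_y, overall_mean, tau_group):
    """Posterior mean of one group's mean under a normal model with known precisions.

    group_mean   -- sample mean of the n_j observations in the group
    tau_y        -- precision of a single observation
    overall_mean -- mean of the distribution the group means are drawn from
    tau_group    -- precision of that distribution (0 means infinite variance)
    """
    data_precision = n_j * tau_y
    weight = data_precision / (data_precision + tau_group)
    return weight * group_mean + (1.0 - weight) * overall_mean


# Four observations: the estimate sits halfway between group and overall mean.
pooled = partial_pool(1.0, 4, 1.0, 0.0, 4.0)
# One observation: the estimate is pulled further toward the overall mean.
pooled_small = partial_pool(1.0, 1, 1.0, 0.0, 4.0)
# Infinite variance for the coefficient distribution: no pooling at all.
unpooled = partial_pool(1.0, 4, 1.0, 0.0, 0.0)
print(pooled, pooled_small, unpooled)
```

Setting tau_group to 0 reproduces the separate-regressions case, matching the remark that least squares with dummy variables is the special case of infinite variances.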

13
Example: Causal Heterogeneity in Comparative
Research (Western)

Research question: the political determinants of
economic growth in OECD countries.
This is a pooled time-series cross-section
research design where the researcher believes
that the causal effects will not be identical
across countries.
- This is because institutions condition the
political determinants of economic growth, and
these institutions tend to be stable across time
within countries.
To estimate this process, Western treats each
country-level effect as if it were drawn from a
common distribution.
- So, the coefficients are treated as
exchangeable.

14
Western Paper

Dependent variable: $y_{ij}$ = change in real GDP in
state i in period j
Independent variables:
- $G_{ij}$ = Leftist Govt: share of cabinet seats held by
leftist parties
- $L_i$ = Labor Org.: measure of labor union
concentration/density (this variable is time-invariant)
- $D_{ij}$ = Vulnerability to demand in OECD area
- $I_{ij}$ = Price movements of OECD imports
- $E_{ij}$ = Price movements of OECD exports
- $y_{i,j-1}$ = a lagged dependent variable
- Plus $L_i \times G_{ij}$, an interaction term between
Labor Org. and Leftist Govt

15
Western Paper

In the original study that Western replicates,
the effects of all variables were assumed to be
constant across countries.
Thus, $E[y_{ij}] = B_0 + B_1 y_{i,j-1} + B_2 D_{ij} + B_3 I_{ij} + B_4 E_{ij} + B_5 L_i + B_6 G_{ij} + B_7 L_i \times G_{ij}$
This is clearly flawed because if political
institutions matter, then we would expect
differences in the effect of these various
variables across countries.
This model can be (and was) expanded to include
interaction terms between the institutional
variable Labor and everything else.
However, it is impossible to include a set of
country-specific intercepts or coefficients,
because they would perfectly predict the Labor
variable (perfect multicollinearity).
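
The collinearity problem arises because a time-invariant country variable is an exact linear combination of the country dummies. The Python sketch below shows this on a toy panel; the country labels and labor scores are hypothetical and are not Western's data.

```python
import math

# Hypothetical panel: three countries observed in three periods each.
countries = ["A", "B", "C"]
labor = {"A": 0.2, "B": 0.5, "C": 0.9}  # time-invariant labor score per country
panel = [(country, period) for country in countries for period in range(3)]

# One dummy column per country, and the Labor column repeated over periods.
dummies = [[1.0 if country == name else 0.0 for name in countries]
           for country, _ in panel]
labor_column = [labor[country] for country, _ in panel]

# Weighting the dummies by each country's labor score rebuilds the Labor column,
# so Labor adds no information beyond the country-specific intercepts.
rebuilt = [sum(labor[name] * d for name, d in zip(countries, row))
           for row in dummies]

print(all(math.isclose(r, l) for r, l in zip(rebuilt, labor_column)))
```

The hierarchical model sidesteps this by using the labor variable to predict the country coefficients rather than entering it alongside them.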

16
Western Probability Model

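$y_{ij} \sim N(\mu_{ij}, \tau)$
$\mu_{ij} = B_{0i} + B_{1i} y_{i,j-1} + B_{2i} D_{ij} + B_{3i} I_{ij} + B_{4i} E_{ij} + B_{5i} G_{ij}$
(dropping labor variable)
And $B_{ki} \sim N(\mu_{b_{ki}}, \tau_{b_k})$ for $k = 1, \dots, 5$,
where $\mu_{b_{ki}} = \gamma_{k0} + \gamma_{k1} L_i$
The priors are given by
$\gamma_{k0}, \gamma_{k1} \sim N(.001, .001)$
$\tau \sim N(.001, .001)$
$\tau_{b_k} \sim N(.001, .001)$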

17
Winbugs Code will be put on the blackboard in
class

See Simon Jackman's MCMC page for a richer
discussion of the results. This is his
corporatism example.
http://tamarama.stanford.edu/mcmc/
It takes quite some time to run, so I can't run it
in class, and his canned page of results only
opens in WinBugs 1.3.
Observations on the online setup:
1) Western (and Jackman) don't use all of the
available tricks, like standardizing variables, so
it takes 100,000 iterations to reach
convergence.
2) Notice how the data are formatted. He
essentially reads in two different data sets: one
for the individual-level data and one for the
state-level data.

18
Data matrices for one of my projects
For a hierarchical model I estimated recently, I
had the following two data sets.
This is the micro-level data:

percdist      female        age           black         educ          strpid        strlibcon     series
0.953944013   -1.039166274  -1.376793464  -0.337363722  -0.781001642  0.129087854   0.092924611   1
0.953944013   0.962263102   1.095396869   -0.337363722  -0.781001642  0.129087854   -0.997560551  1
0.530307732   -1.039166274  1.684013615   -0.337363722  -0.781001642  1.167041028   1.183409773   1
0.106671451   -1.039166274  1.154258544   -0.337363722  -0.781001642  0.129087854   -0.997560551  1

This is the population-level data (note that the year
data here correspond to the series variable above):

year  eucdist
1     -0.927065327
2     -0.936598452
3     -0.763715526

To read the data into WinBugs, click on
percdist in data set one as usual and then
click load data. Then click year in data set two
and click load data again. Then proceed as you
normally would.
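
The link between the two data sets is the series column: each micro-level row uses its series value to look up the matching population-level row. A short Python sketch of that lookup, using values from the two data sets above (only percdist and series are carried over, to keep it brief):

```python
# Micro-level rows: percdist and the series index for each observation.
micro = [
    {"percdist": 0.953944013, "series": 1},
    {"percdist": 0.953944013, "series": 1},
    {"percdist": 0.530307732, "series": 1},
    {"percdist": 0.106671451, "series": 1},
]

# Population-level data keyed by year, which matches the series index.
eucdist_by_year = {1: -0.927065327, 2: -0.936598452, 3: -0.763715526}

# Nested indexing: attach each observation's population-level value.
eucdist_for_obs = [eucdist_by_year[row["series"]] for row in micro]
print(eucdist_for_obs)
```

In the model code the same lookup is written as an index inside an index, so the population-level data never needs to be repeated on every micro-level row.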
