Hierarchical Models cont.

Provided by: jeffgry

1
Hierarchical Models cont.
  • Review of the basic hierarchical regression
    probability model
  • WinBugs implementation of Bayesian hierarchical
    models
  • Hierarchical models with explanatory models for
    covariates
  • - Western (1998)

2
Hierarchical Regression Models
  • Suppose we have a standard multiple regression
    model where observations i cluster across
    sub-populations j.
  • (where j indexes, for example, geographic
    location, social group, period in history)
  • But, we do not want to assume that regression
    coefficients are identical across
    sub-populations.
  • We also want to allow for unequal variances
    across sub-populations.
  • We assume that each observation is distributed
    normally with an expected value determined by
    both observation-specific and sub-population
    characteristics, and with a variance specific to
    the level of aggregation. Thus, $y_{ij} \sim N(\mu_{ij}, \tau_j)$.
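
A minimal Python sketch of this sampling model, assuming the WinBugs convention that the second argument of the normal is a precision (so the standard deviation is 1/sqrt(precision)). The function name `simulate_group`, the coefficient values, and the precisions are hypothetical and chosen only for illustration.

```python
import math
import random

def simulate_group(n, intercept, slope, precision, rng):
    """Draw n observations y = intercept + slope * x + noise for one sub-population.

    `precision` is 1 / variance, as in WinBugs's dnorm(mean, precision).
    """
    sd = 1.0 / math.sqrt(precision)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        mu = intercept + slope * x           # expected value for this observation
        data.append((x, rng.gauss(mu, sd)))  # noise scale is specific to the group
    return data

rng = random.Random(0)
# Hypothetical sub-populations: each has its own coefficients and its own precision.
groups = {
    "A": simulate_group(50, 1.0, 2.0, precision=4.0, rng=rng),    # sd = 0.5
    "B": simulate_group(50, -0.5, 0.5, precision=0.25, rng=rng),  # sd = 2.0
}
```

Each sub-population gets its own intercept, slope, and noise level, which is exactly what a pooled regression with a single error variance rules out.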

3
The Random Coefficient Model
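  • Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
  • then $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$.
  • For a random coefficient (hierarchical) model, we
    assume that
  • $\beta_{kj} = \gamma_k + \delta_{kj}$, where
  • - $\gamma_k$ represents the overall effect of $\beta_k$
  • - $\delta_{kj}$ represents the difference in the
    coefficient between sub-population j and the
    overall coefficient, where $E[\delta_{kj}] = 0$
  • This framework implies that if
  • $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$, then
  • $\mu_{ij} = (\gamma_1 + \delta_{1j}) + (\gamma_2 + \delta_{2j}) x_{2i} + \dots + (\gamma_m + \delta_{mj}) x_{mi}$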
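
Collecting terms in the last expression separates the overall part of the model from the sub-population deviations. In the sketch below, $\gamma_k$ denotes the overall effect and $\delta_{kj}$ the deviation for sub-population j; these symbol names are an assumption of the sketch.

```latex
\begin{aligned}
\mu_{ij} &= (\gamma_1 + \delta_{1j}) + \sum_{k=2}^{m} (\gamma_k + \delta_{kj})\, x_{ki} \\
         &= \underbrace{\gamma_1 + \sum_{k=2}^{m} \gamma_k x_{ki}}_{\text{overall effects}}
          \; + \; \underbrace{\delta_{1j} + \sum_{k=2}^{m} \delta_{kj} x_{ki}}_{\text{sub-population deviations}}
\end{aligned}
```

Because $E[\delta_{kj}] = 0$, the second bracket has expectation zero for fixed covariates, so the first bracket is the expected value averaged over sub-populations.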

4
Priors for the random coefficient model
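  • Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
  • where $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$
  • $= (\gamma_1 + \delta_{1j}) + (\gamma_2 + \delta_{2j}) x_{2i} + \dots + (\gamma_m + \delta_{mj}) x_{mi}$
  • We shall assume that $\tau_j \sim \text{Gamma}(.001, .001)$ for
    all j.
  • Two basic strategies for defining priors for the
    coefficients:
  • 1) Specify priors for both the overall effect $\gamma_k$
    and the deviation $\delta_{kj}$ as follows:
  • Let $\gamma_k \sim N(\text{prior mean}, \text{prior prec})$, where the
    prior mean and precision are fixed numbers,
  • and $\delta_{kj} \sim N(0, \omega_k)$, where the precision
    $\omega_k \sim \text{Gamma}(.001, .001)$
  • 2) Use hierarchical centering as follows:
  • $\beta_{kj} \sim \text{Normal}(\gamma_k, \omega_k)$,
  • where $\gamma_k \sim \text{Normal}(\text{prior mean}, \text{prior prec})$,
    with the prior mean and precision again fixed numbers,
  • and $\omega_k \sim \text{Gamma}(.001, .001)$
  • Method 2 improves MCMC markedly in some cases
    (see Gilks and Roberts, "Strategies for improving
    MCMC," in MCMC in Practice)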
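
The two strategies describe the same prior for each coefficient and differ only in which quantities the sampler updates. A minimal Python sketch of that equivalence, assuming a normal deviation and hypothetical values for the overall effect and the precision (the function names are illustrative, not part of WinBugs):

```python
import math
import random
import statistics

def draw_uncentered(overall, precision, rng):
    """Strategy 1: coefficient = overall effect + zero-mean deviation."""
    deviation = rng.gauss(0.0, 1.0 / math.sqrt(precision))
    return overall + deviation

def draw_centered(overall, precision, rng):
    """Strategy 2 (hierarchical centering): draw the coefficient directly around the overall effect."""
    return rng.gauss(overall, 1.0 / math.sqrt(precision))

rng = random.Random(1)
overall, precision = 2.0, 4.0   # hypothetical values; precision 4 means sd 0.5
s1 = [draw_uncentered(overall, precision, rng) for _ in range(20000)]
s2 = [draw_centered(overall, precision, rng) for _ in range(20000)]
print(round(statistics.mean(s1), 2), round(statistics.stdev(s1), 2))
print(round(statistics.mean(s2), 2), round(statistics.stdev(s2), 2))
```

Both sets of draws have the same mean and spread up to simulation noise. Any gain from centering therefore comes from how the chain mixes, not from a change in the model.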

5
Example from last time
  • Dependent variable
  • - Percentage of seats won by the Democratic
    Party in the House of Representatives in state i
    in election t.
  • Independent variable
  • - Level of ideological conflict within state i's
    Democratic Party delegation to the House in
    period t-1.
  • - Control variables include dummy variables for
    the various states measuring their preference for
    the Democratic Party and for each election.

6
The probability model
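  • $\text{Electoral Success}_{it} \sim \text{Normal}(\mu_{it}, \tau)$,
  • where $\mu_{it} = a_i + b_i \cdot \text{Intra-Party Conflict}_{i,t-1}$,
  • $a_i \sim \text{Normal}(A, \tau_A)$ for all i
  • $A \sim \text{Normal}(0, .01)$
  • $\tau_A \sim \text{Gamma}(.1, .1)$
  • $b_i \sim \text{Normal}(B, \tau_B)$ for all i
  • $B \sim \text{Normal}(0, .01)$
  • $\tau_B \sim \text{Gamma}(.1, .1)$
  • and $\tau \sim \text{Gamma}(.1, .1)$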

Notice that rather than estimating a model with
an intercept and m-1 dummy variables, m dummy
variables are used.
Similarly, notice that m state-specific effects
are estimated.
Here I assume fixed variances (precisions) across
states. This assumption can and should be relaxed
to see if homoskedasticity is reasonable.
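
The prior settings in this model are easier to judge on familiar scales. A small Python sketch of the arithmetic, assuming the WinBugs conventions that the second argument of the normal is a precision and that Gamma(.1, .1) is given as shape and rate (the helper names are hypothetical):

```python
import math

def sd_from_precision(precision):
    """Standard deviation implied by a precision (1 / variance)."""
    return 1.0 / math.sqrt(precision)

def gamma_mean_var(shape, rate):
    """Mean and variance of a Gamma(shape, rate) distribution."""
    return shape / rate, shape / rate ** 2

print(sd_from_precision(0.01))    # prior sd of the overall intercept A and slope B
print(gamma_mean_var(0.1, 0.1))   # prior mean and variance of each precision
```

A precision of .01 corresponds to a prior standard deviation of about 10, and Gamma(.1, .1) has mean 1 and variance 10, so both priors are fairly diffuse relative to a dependent variable measured as a share of seats.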
7
WinBugs Implementation
model {
    # macro model for hyperparameters
    for (i in 1:21) {    # loop over states
        b[i, 1] ~ dnorm(eb1, varb1)
        b[i, 2] ~ dnorm(eb2, varb2)
    }
    # micro model
    for (k in 1:164) {
        perseat[k] ~ dnorm(muperseat[k], tau)
        muperseat[k] <- b[state[k], 1] + b[state[k], 2] * (vdim1[k] - mean(vdim1[]))
    }
    # priors for variance parameters for micro-level model
    # (dnorm takes a mean and a precision, so tau, varb1, and varb2 are precisions)
    tau ~ dgamma(1, 1)
    eb1 ~ dnorm(0, .01)
    eb2 ~ dnorm(0, .01)
    varb1 ~ dgamma(.1, .1)
    varb2 ~ dgamma(.1, .1)
}

Data matrix:
state  vdim1        perseat
4      0.000146333  0.5
16     0.000253667  1
5      0.000338     0.2
11     0.000338     1
14     0.00049625   0.56
16     0.000648     1
14     0.000730333  0.7
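
WinBugs reads tabular data in a rectangular format: a header row with each column name followed by `[]`, one row per observation, and a closing `END` line. A minimal Python sketch that writes the rows of the data matrix in that layout (the helper name `to_rectangular` is hypothetical):

```python
def to_rectangular(columns, rows):
    """Format rows as a WinBugs-style rectangular data block.

    Each column name is followed by [] and the block is closed by END.
    """
    header = " ".join(name + "[]" for name in columns)
    body = [" ".join(str(value) for value in row) for row in rows]
    return "\n".join([header] + body + ["END"]) + "\n"

# Rows from the data matrix on this slide.
rows = [
    (4, 0.000146333, 0.5),
    (16, 0.000253667, 1),
    (5, 0.000338, 0.2),
    (11, 0.000338, 1),
    (14, 0.00049625, 0.56),
    (16, 0.000648, 1),
    (14, 0.000730333, 0.7),
]
text = to_rectangular(["state", "vdim1", "perseat"], rows)
print(text)
```

The `state` column is what the model uses as a nested index, `b[state[k], 1]`, so it must hold integers in the range of the state loop.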
8
WinBugs Implementation
  • Hierarchical models in WinBugs require care.
  • 1) Except for very simple models, you will run
    into convergence problems. Some subset of the
    following will almost certainly be required
  • - standardizing variables (even dummy variables,
    and especially variables with small variances
    relative to the others)
  • - hierarchical centering
  • - multivariate priors
  • - over-relaxation algorithm
  • - thinning your Markov Chain

9
WinBugs Implementation
  • Hierarchical models in WinBugs require care.
  • 2) Hierarchical models often require complex sets
    of loops. This means that if you are not careful
    you will lose track of how you assign prior
    distributions.
  • - Be certain that all parameters have been
    assigned appropriate priors
  • - Be careful not to transpose indexes when
    working within loops
  • - When you monitor results, make sure that you
    monitor the parameter and not the number you
    assigned to the prior.

10
WinBugs Implementation
  • Hierarchical models in WinBugs require special
    care
  • 3) Data arrays are cumbersome
  • - your data sets must contain the information
    about the observation-level data and the data
    about the level of aggregation.
  • - you may prefer to load two different data
    sets, one for the observational data and one for
    the population data (esp. once we get to fuller
    models of the hierarchical structure)

11
The General 2-Layer Hierarchical (Multi-level)
Model
  • What if we believe that the lower-level
    regression coefficients across units of analysis
    are dependent on particular features of the
    sub-population (which could violate the
    exchangeability assumption)?
  • In this case, we can model the sub-population
    specific regression coefficients as a function of
    particular covariates.
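  • Suppose that $y_{ij} \sim N(\mu_{ij}, \tau_j)$,
  • then $\mu_{ij} = \beta_{1j} + \beta_{2j} x_{2i} + \dots + \beta_{mj} x_{mi}$.
  • For a general multi-level (hierarchical) model,
    we assume that
  • $\beta_{kj} = Z_j \gamma_k + \delta_{kj}$, where
  • - $\gamma_k$ represents a vector of coefficients on the
    predictors of the sub-population regression
    coefficients
  • - $Z$ represents a data matrix of the
    sub-population characteristics, with row $Z_j$
    for sub-population j.
  • - $\delta_{kj}$ represents the difference between the
    coefficient in sub-population j and the value
    predicted from its characteristics, where
    $E[\delta_{kj}] = 0$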
  • One could substitute the expression bkj into the
    expression for mij for an estimate of the full
    expression for mij.
  • It is possible to add layer upon layer to the
    model, but in practice the data will typically
    only reveal information for a second layer.
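
A minimal Python sketch of the second-level equation, in which a sub-population's coefficient is a linear function of its characteristics plus a deviation. The values are hypothetical, and `z_j` here holds a constant and one sub-population characteristic.

```python
def group_coefficient(z_j, coefs, deviation=0.0):
    """Second-level equation: coefficient = z_j . coefs + deviation."""
    return sum(z * g for z, g in zip(z_j, coefs)) + deviation

# Hypothetical second-level coefficients: (baseline, shift per unit of the characteristic).
coefs = (0.5, 2.0)
low = group_coefficient((1.0, 0.1), coefs)    # 0.5 + 2.0 * 0.1
high = group_coefficient((1.0, 0.9), coefs)   # 0.5 + 2.0 * 0.9
print(low, high)
```

Two sub-populations with different characteristics now have different expected coefficients, so they are exchangeable only after conditioning on `z_j`.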

12
The General 2-Layer Hierarchical (Multi-level)
Model
  • General hierarchical models have a number of
    advantages over least squares with standard dummy
    variables and interaction terms (which is the
    special case in which the variances of the
    coefficient distributions are infinite)
    (Steenbergen and Jones).
  • 1) Estimation of separate regressions for each
    sub-population is not efficient. Borrowing
    strength across sub-groups allows the full
    population to provide information about both the
    overall effect and each of the sub-populations.
  • 2) Least squares dummy variables are not able to
    explain the sources of heterogeneity in the
    behavior of various sub-populations.
  • (causal heterogeneity)
  • 3) Ignoring sub-population information may lead
    to too frequent rejection of null hypotheses of
    no effects. This is because observations are
    treated as independent even though they are in
    fact dependent because of the hierarchical
    nesting structure.
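
Borrowing strength can be seen in the simplest case: a normal model for group means with known precisions, where the posterior mean of each group is a precision-weighted average of its own sample mean and the overall mean. The Python sketch below assumes that simplified intercept-only setting rather than the full regression model, and all numbers are hypothetical.

```python
def partial_pool(ybar, n, prec_y, overall, prec_group):
    """Posterior mean of a group mean under a normal-normal model with known precisions.

    The result is a precision-weighted average of the group's own mean and the overall mean.
    """
    w_data = n * prec_y
    return (w_data * ybar + prec_group * overall) / (w_data + prec_group)

# A group whose sample mean (3.0) is far from the overall mean (1.0).
small = partial_pool(ybar=3.0, n=4, prec_y=1.0, overall=1.0, prec_group=4.0)
large = partial_pool(ybar=3.0, n=400, prec_y=1.0, overall=1.0, prec_group=4.0)
separate = partial_pool(ybar=3.0, n=4, prec_y=1.0, overall=1.0, prec_group=0.0)
print(small, large, separate)
```

With few observations the estimate is pulled toward the overall mean, with many it stays close to the group's own mean, and with a group-level precision of zero (infinite variance) it reduces to the separate, dummy-variable estimate.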

13
Example: Causal Heterogeneity in Comparative
Research (Western)
  • Research question: the political determinants of
    economic growth in OECD countries
  • This is a pooled time-series cross-section
    research design where the researcher believes
    that the causal effects across countries will not
    be identical.
  • - This is because institutions condition the
    political determinants of economic growth and
    these institutions tend to be stable across time
    within countries.
  • To estimate this process, Western treats each
    country-level effect as if it were drawn from a
    common distribution.
  • - So, the coefficients are treated as
    exchangeable.

14
Western Paper
  • Dependent variable: $y_{ij}$ = change in real GDP in
    country i in period j
  • Independent variables:
  • $G_{ij}$ = Leftist Govt: share of cabinet seats held by
    leftist parties
  • $L_i$ = Labor Org.: measure of labor union
    concentration/density
  • - this variable is time-invariant
  • $D_{ij}$ = vulnerability to demand in OECD area
  • $I_{ij}$ = price movements of OECD imports
  • $E_{ij}$ = price movements of OECD exports
  • $y_{i,j-1}$ = a lagged dependent variable
  • Plus $L_i \times G_{ij}$, an interaction term between
    Labor Org. and Leftist Govt

15
Western Paper
  • In the original study that Western replicates,
    the effects of all variables were assumed to be
    constant across countries.
  • Thus, $E[y_{ij}] = B_0 + B_1 y_{i,j-1} + B_2 D_{ij} + B_3 I_{ij} + B_4 E_{ij} + B_5 L_i + B_6 G_{ij} + B_7 L_i \times G_{ij}$
  • This is clearly flawed because if political
    institutions matter, then we would expect
    differences in the effect of these various
    variables across countries.
  • This model can be (and was) expanded to include
    interaction terms between the institutional
    variable Labor and everything else.
  • However, it is impossible to include a set of
    country-specific intercepts or coefficients,
    because they would perfectly predict the Labor
    variable (perfect multicollinearity).
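
The multicollinearity problem can be checked directly: a time-invariant country variable is an exact linear combination of the country dummies. A minimal Python sketch with a hypothetical panel of three countries and two periods (the Labor scores are made up for illustration):

```python
# Hypothetical panel: 3 countries observed in 2 periods, with a time-invariant Labor score.
labor_by_country = {"A": 0.2, "B": 0.5, "C": 0.9}
rows = [(country, period) for country in labor_by_country for period in (1, 2)]

def dummies(country):
    """One 0/1 indicator per country (no separate intercept)."""
    return [1.0 if country == c else 0.0 for c in labor_by_country]

# Labor equals a fixed linear combination of the country dummies in every row,
# so a design matrix containing both has linearly dependent columns.
weights = list(labor_by_country.values())
residuals = [
    labor_by_country[country] - sum(w * d for w, d in zip(weights, dummies(country)))
    for country, _period in rows
]
print(residuals)
```

Every residual is zero, so least squares cannot separate the effect of Labor from the country-specific intercepts. The hierarchical model avoids this by treating the country coefficients as draws from a distribution whose mean depends on Labor.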

16
Western Probability Model
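  • $y_{ij} \sim N(\mu_{ij}, \tau)$
  • $\mu_{ij} = B_{0i} + B_{1i} y_{i,j-1} + B_{2i} D_{ij} + B_{3i} I_{ij} + B_{4i} E_{ij} + B_{5i} G_{ij}$
  • (dropping labor variable)
  • And $B_{ki} \sim N(\mu_{B_{ki}}, \tau_{B_k})$ for $k = 1, \dots, 5$
  • where $\mu_{B_{ki}} = \gamma_{k0} + \gamma_{k1} L_i$
  • The priors are given by
  • $\gamma_{k0}, \gamma_{k1} \sim N(.001, .001)$
  • $\tau \sim \text{Gamma}(.001, .001)$
  • $\tau_{B_k} \sim \text{Gamma}(.001, .001)$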

17
WinBugs code will be put on the blackboard in
class
  • See Simon Jackman's MCMC page for a richer
    discussion of the results. This is his
    corporatism example.
  • http://tamarama.stanford.edu/mcmc/
  • It takes quite some time to run, so I can't run it
    in class, and his canned page of results only
    opens in WinBugs 1.3.
  • Observations on the online setup:
  • 1) Western (and Jackman) don't use all of the
    available tricks like standardizing variables, so
    it takes 100,000 iterations to reach
    convergence.
  • 2) Notice how the data is formatted. He
    essentially reads in two different data sets: one
    for the individual-level data and one for the
    state-level data.

18
Data matrices for one of my projects
For a hierarchical model I estimated recently, I
had the following two data sets
This is the micro-level data:
percdist     female        age           black         educ          strpid       strlibcon     series
0.953944013  -1.039166274  -1.376793464  -0.337363722  -0.781001642  0.129087854  0.092924611   1
0.953944013  0.962263102   1.095396869   -0.337363722  -0.781001642  0.129087854  -0.997560551  1
0.530307732  -1.039166274  1.684013615   -0.337363722  -0.781001642  1.167041028  1.183409773   1
0.106671451  -1.039166274  1.154258544   -0.337363722  -0.781001642  0.129087854  -0.997560551  1
This is the population-level data. Note that the year
data here corresponds to the series variable in the
micro-level data:
year  eucdist
1     -0.927065327
2     -0.936598452
3     -0.763715526
To read the data into WinBugs, click on percdist
in data set one as usual and then click load data.
Then click year in data set two and click load
data again. Then proceed as you normally would.
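
The two data sets are linked only through the series index, which plays the same role as a nested index such as `eucdist[series[k]]` would in a WinBugs model. A minimal Python sketch of that lookup, using the rows shown on this slide reduced to two columns (the function name `attach_population` is hypothetical):

```python
# Population-level data from the slide: year index -> eucdist.
eucdist = {1: -0.927065327, 2: -0.936598452, 3: -0.763715526}

# Micro-level rows from the slide, reduced to (percdist, series).
micro = [
    (0.953944013, 1),
    (0.953944013, 1),
    (0.530307732, 1),
    (0.106671451, 1),
]

def attach_population(micro_rows, population):
    """Pair each micro-level row with the population-level value its series index points to."""
    return [(y, series, population[series]) for y, series in micro_rows]

linked = attach_population(micro, eucdist)
print(linked[0])
```

A lookup that fails here (a series value with no matching year) would show up in WinBugs as an index out of range, so it is worth checking before loading the data.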