Estimation techniques for clustered hierarchical data - PowerPoint PPT Presentation

Slides: 15
Provided by: bus46
Transcript and Presenter's Notes

1
Estimation techniques for clustered
(hierarchical) data
  • Cluster-robust linear regression
  • Multilevel modelling

2
Estimators: a trade-off
  • Trade-off
  • Simplicity
  • OLS the simplest and best understood estimator
  • Restrictive assumptions
  • OLS makes assumptions about the data that often
    do not apply
  • e.g., independence
  • Other estimators
  • More realistic assumptions
  • e.g., that observations are inter-related in
    various ways
  • e.g., clustering
  • pupils in schools (or classes)
  • Statistical impact → serial correlation
  • intra-group
  • More complex
  • Computation done by software
  • Need an intuitive understanding of what they do
    and what they don't do

3
Problems arising from clustering (hierarchical
data)
  • OLS
  • Assumes observations independent
  • → Maximum information
  • Survey data
  • Observations often clustered
  • Individuals in families
  • Firms in industries or locations
  • Students in classes
  • → observations not fully independent
  • reflected in the residuals
  • → OLS underestimates SEs of regression
    coefficients
  • → spurious precision

4
The problem of dependent observations
  • Units are clustered (= grouped)
  • e.g., students within a particular school
  • tend to be more like each other than students at
    other schools
  • → a sample of students from a single school
  • less varied data than a random sample of the
    same size from all students
  • → loss of information
  • → cannot use OLS
  • dependence between observations has to be modelled
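
The loss of information can be put in rough numbers with the standard design-effect approximation for equal-sized clusters, deff = 1 + (m - 1)ρ, where m is the cluster size and ρ the intra-cluster correlation. This formula is not on the slides; the function names and the example figures (3,000 pupils, 30 per school, ρ = 0.14) are assumptions chosen only for illustration.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Size of a simple random sample carrying about the same information."""
    return n / design_effect(cluster_size, icc)


# Hypothetical survey: 3,000 pupils sampled as 100 schools of 30 pupils each.
deff = design_effect(30, 0.14)
n_eff = effective_sample_size(3000, 30, 0.14)
print(f"design effect: {deff:.2f}, effective n: {n_eff:.0f}")
```

Under these assumed figures the 3,000 clustered pupils carry about as much information as roughly 590 independently sampled pupils, which is why standard errors that treat them as 3,000 independent observations look too precise.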

5
Implications for estimation
  • OLS not appropriate
  • → use different estimators
  • Cluster-robust linear regression
  • Adjusts SEs to account for loss of independence
  • → clustering a nuisance to control for
  • Necessary for honest estimates of standard
    errors
  • Multilevel modelling (= Hierarchical linear
    modelling)
  • Benefit
  • Explicitly model effects at each level
  • e.g., school/classroom/pupil
  • Identifies where and how effects occur
  • Cost
  • Stronger assumptions
  • → results more dependent on assumption of random
    and normally distributed effects
  • i.e., sensitive to outliers and skewed error
    distributions
  • If assumptions do not hold, MLM underestimates
    SEs of higher-level parameters
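
The standard-error adjustment made by cluster-robust regression can be sketched for the slope of a one-regressor model. This is a minimal illustration of the sandwich idea, not the routine used by Stata or LIMDEP: it omits the small-sample corrections that packages usually apply, and the function name and the simulated data (40 schools of 25 pupils, a school-level regressor, a shared school effect) are assumptions.

```python
import random
from collections import defaultdict


def slope_with_standard_errors(x, y, cluster):
    """Simple regression of y on x: slope, conventional SE, cluster-robust SE."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * yi for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional OLS standard error: treats all n residuals as independent.
    se_ols = (sum(e * e for e in resid) / (n - 2) / sxx) ** 0.5

    # Cluster-robust (sandwich) standard error: add up x-deviation * residual
    # within each cluster first, so within-cluster correlation is retained.
    score = defaultdict(float)
    for d, e, g in zip(x_dev, resid, cluster):
        score[g] += d * e
    se_cluster = sum(s * s for s in score.values()) ** 0.5 / sxx
    return slope, se_ols, se_cluster


# Simulated data (hypothetical): a shared school effect u_j makes the errors
# of pupils in the same school correlated.
rng = random.Random(0)
x, y, school = [], [], []
for j in range(40):
    x_j = rng.gauss(0, 1)  # school-level explanatory variable
    u_j = rng.gauss(0, 1)  # unobserved school effect
    for _ in range(25):
        x.append(x_j)
        y.append(0.5 * x_j + u_j + rng.gauss(0, 1))
        school.append(j)

slope, se_ols, se_cluster = slope_with_standard_errors(x, y, school)
print(f"slope={slope:.3f}  OLS SE={se_ols:.3f}  cluster-robust SE={se_cluster:.3f}")
```

The point estimate of the slope is the same under both approaches; only the standard error changes, which matches the view of clustering as a nuisance to control for rather than something to model.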

6
Estimation and data requirements
  • CRLR (cluster-robust linear regression)
  • Stata 8
  • LIMDEP 8
  • Multilevel modelling
  • MLwiN
  • Data requirements
  • Most common
  • individual data (e.g., pupils)
  • Variables to indicate belonging to higher level
    units
  • Class (and/or teacher)
  • School
  • Number of levels?
  • In practice, no more than 3 or 4

7
Multilevel modelling: continuous response
(2-level, 2-variable example)
  • Single-level
  • Pupils only
  • Multilevel (1)
  • Pupils
  • Schools
  • different intercepts for each school
  • Multilevel (2)
  • Pupils
  • Schools
  • different intercepts
  • different slopes

8
Single level model
  • Individual pupils
  • y_i: individual test scores
  • x_i: individual ability
  • e_i: individual error terms
  • difference between actual and predicted scores
  • i indexes pupils, i = 1, ..., n
  • β_0: overall intercept (fixed, i.e. the same for all pupils)
  • β_1: overall slope (fixed, i.e. the same for all pupils)
  • Shows how individual test scores are related to
    individual ability
  • β_1 measures the average relationship
  • Shortcoming
  • No measurement of how this average relationship
    varies between schools
  • → model both pupil and school effects together
9
Two levels: pupils in schools
  • Random intercepts and slopes
  • Most general model for 2 variables and 2 levels
  • Fixed part of the model (= regression
    coefficients)
  • β_0: overall intercept (fixed)
  • Subscript 0 → associated with the intercept
  • β_1: overall slope (fixed)
  • Subscript 1 → associated with the slope
  • Random part of the model
  • Random effects
  • j indexes schools
  • u_0j: a school-level error term (between-school
    effect)
  • gives each school an individual intercept
    (β_0 + u_0j ≠ β_0)
  • u_1j: gives each school an individual slope
  • Random slope coefficient: (β_1 + u_1j) x_ij
  • Subscript j → between-school effect
  • e_ij: within-school error term for the ith pupil in
    the jth school
  • pupil-level error term
10
Variance of the random effects
  • Additional information
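
A sketch of the variance structure usually assumed for the school-level effects u_0j and u_1j and the pupil-level error e_ij, consistent with the normality assumption mentioned earlier. The symbol names for the variances and the covariance are assumptions and are not taken from the slides.

```latex
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \sigma^2_{u0} & \sigma_{u01} \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}
\right),
\qquad
e_{ij} \sim N(0, \sigma^2_e)
```

Here the two variances describe how much school intercepts and slopes vary, the covariance describes how intercepts and slopes move together, and the last term is the pupil-level (within-school) variance.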

11
Example from MLwiN
  • Fixed component
  • Overall constant: -0.012 (not significant) (not
    interesting!)
  • Positive mean association between ability and
    test score (0.557)
  • Random component
  • School-level variance component: 0.090
  • Variation of individual schools' slopes around
    the mean: 0.015
  • Positive covariance between schools' intercepts
    and slopes: 0.018
  • Pupil-level variance (within-school variation):
    0.554

12
Further implications
  • Between-school variation around the regression
    line as a proportion of total variation
  • 14% of variation accounted for at school level
    (0.090 / (0.090 + 0.554) ≈ 0.14)
  • Positive covariance between schools' intercepts
    and slopes
  • → schools with steep slopes have high intercepts
    and vice versa
  • i.e., fanning out of school regression lines
13
Use in school effectiveness research
  • Specify additional variables at all levels
  • Control for variations in individual pupil
    characteristics (subscript ij)
  • Specify variables at individual level
  • Account for differences in school performance
    (subscript j)
  • Specify variables at school level
  • Contextual (function of the pupil-level data)
  • Proportion with FSM (free school meals)
  • Mean and/or SD of attainment
  • True school-level variable
  • Mixed or single-sex
  • Homework policy
  • Can be specified with or without random elements

14
Want to know more?
  • In order of difficulty
  • Tranmer, M. (2002) Multilevel Modelling
    Coursebook, Manchester University, CCSR.
  • Rasbash, J. et al. (2002) A User's Guide to MLwiN
    (Version 2.1d), Centre for Multilevel Modelling,
    Institute of Education, University of London.
  • Goldstein, H. (1995, 2nd ed.) Multilevel
    Statistical Models, London: Arnold.
  • Available online: http://www.ioe.ac.uk/hgpersonal/