Title: Causal Inference and Adequate Yearly Progress
University of Colorado at Boulder
National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
CRESST Conference, Los Angeles, CA, September 9, 2004
2 Overview
- Description and Adequacy
- Causal Inference in the Context of NCLB-AYP
- Causal Inference in the Context of Value-Added Models (VAMs)
  - Are we addressing causally meaningful questions?
- Directions for Future Research
3 AYP as a Descriptive Measure
- Two 4th-grade teachers: Sylvia and Susan
- Sylvia's students met the AYP target in 2004 for 4th-grade reading.
- Susan's students did NOT meet the AYP target in 2004 for 4th-grade reading.
- But this leads us naturally to ask WHY:
  - Why did Sylvia's students meet the AYP target?
  - Why did Susan's students fail to meet the AYP target?
4 The Slippery Slope from Description to Causation
- "Casual comparisons inevitably initiate careless causal conclusions." (Paul Holland, 2000)
- EXAMPLES of causal attributions:
  - Sylvia's students met the AYP target because state standards were properly emphasized.
  - Susan's students did not meet the AYP target because they lacked necessary resources.
  - Students taught by Sylvia are learning to read better than students taught by Susan.
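5 Defining Causal Inferences from NCLB
- What is the effect of Sylvia's efforts to improve student reading ability, as measured by 4th-grade test scores?
  - Average 2004 scale score for Sylvia's class: $\bar{Y}^{S}_{2004}$
  - COUNTERFACTUAL SUBSTITUTION: $\bar{Y}^{c}$, an observed average that stands in for what Sylvia's class would have scored without her efforts
  - Average causal effect attributed to Sylvia: $b_{NCLB} = \bar{Y}^{S}_{2004} - \bar{Y}^{c}$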
6 What Is the Appropriate Counterfactual Substitution?
- Is it the average scale score on the 4th-grade reading test for Sylvia's class in 2003?
- Or is it the average scale score on the 4th-grade reading test for Susan's class in 2004?
This choice determines how we interpret the
meaning of the causal effect. The control
defines the treatment.
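To make "the control defines the treatment" concrete, here is a minimal Python sketch of the average causal effect as the difference between Sylvia's observed 2004 class average and whichever counterfactual average is substituted. All scale scores are hypothetical numbers chosen for illustration, not data from this presentation.

```python
# Hypothetical class-average scale scores on the 4th-grade reading test.
# These numbers are invented for illustration only.
SYLVIA_2004 = 215.0  # observed: Sylvia's 2004 class
SYLVIA_2003 = 208.0  # candidate counterfactual 1: Sylvia's 2003 class
SUSAN_2004 = 219.0   # candidate counterfactual 2: Susan's 2004 class


def average_causal_effect(treated_mean: float, counterfactual_mean: float) -> float:
    """Observed average minus the substituted counterfactual average."""
    return treated_mean - counterfactual_mean


# Same observed outcome, two different counterfactual substitutions.
effect_vs_own_history = average_causal_effect(SYLVIA_2004, SYLVIA_2003)
effect_vs_colleague = average_causal_effect(SYLVIA_2004, SUSAN_2004)

print(f"Relative to Sylvia's 2003 class: {effect_vs_own_history:+.1f}")
print(f"Relative to Susan's 2004 class:  {effect_vs_colleague:+.1f}")
```

The observed outcome is identical in both lines; only the control changes, and with these hypothetical numbers the estimated effect flips from positive to negative.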
7 The Implied NCLB Counterfactual
- Average scale score on the 4th-grade reading test for Sylvia's 2003 class
- Teachers serve as their own historical controls
- AYP can be viewed as an external judgment as to the expected size of the effect
- Effects can't be interpreted relative to other teachers
- Threats to validity are numerous
8 An Added Wrinkle: Standard Setting
- Policy is not based on $b_{NCLB}$
- Instead, we focus on $S(b_{NCLB})$
  - where $S(\cdot)$ represents the transformation from a scale score to meeting a given standard
- An empirical question: do $b_{NCLB}$ and $S(b_{NCLB})$ tell equivalent stories?
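One way to see why this empirical question matters is a small Python sketch in which $S(\cdot)$ is taken to be "share of students at or above a proficiency cut score", applied to each student's score before averaging. The cut score of 210 and every student score below are hypothetical.

```python
from statistics import mean

CUT_SCORE = 210  # hypothetical proficiency cut score


def share_proficient(scores: list[float], cut: float = CUT_SCORE) -> float:
    """S(.): proportion of students scoring at or above the cut score."""
    return sum(score >= cut for score in scores) / len(scores)


# Hypothetical student scale scores for two successive 4th-grade classes
# (repeated cross-sections, as under NCLB).
class_2003 = [180, 195, 205, 212, 214, 230]
class_2004 = [190, 200, 208, 212, 216, 240]

mean_change = mean(class_2004) - mean(class_2003)
proficiency_change = share_proficient(class_2004) - share_proficient(class_2003)

print(f"Change in mean scale score:  {mean_change:+.1f}")
print(f"Change in share proficient:  {proficiency_change:+.2f}")
```

Here the class mean rises by five scale-score points while the share of proficient students does not change at all, because the gains occur away from the cut score. Gains concentrated just below the cut score could produce the opposite pattern.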
9 If You Believe the Counterfactual Substitution Is Reasonable
- 60% of Sylvia's students performed at the proficient level or higher on the 4th-grade reading test in 2003.
- The AYP target is a 5-percentage-point increase.
- 70% of Sylvia's students performed at the proficient level or higher on the 4th-grade reading test in 2004.
- The 10-percentage-point improvement between 2003 and 2004 is attributed to Sylvia's teaching. Sylvia's effect is greater than 5 points, so it is considered adequate.
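The slide's arithmetic, written out as a short Python sketch. The percentages and the 5-point target come from the slide; treating the target as a strict threshold follows the slide's "greater than 5" and is otherwise an assumption about the rule.

```python
# Figures from the slide: percent of Sylvia's students at or above
# "proficient" on the 4th-grade reading test.
pct_proficient_2003 = 60
pct_proficient_2004 = 70
AYP_TARGET_GAIN = 5  # required increase, in percentage points

# Historical-control counterfactual: Sylvia's 2003 class stands in for
# what her 2004 class would have done without her teaching.
sylvia_effect = pct_proficient_2004 - pct_proficient_2003

# Strict ">" mirrors the slide; an actual AYP rule might use ">=" (assumption).
adequate = sylvia_effect > AYP_TARGET_GAIN

print(f"Effect attributed to Sylvia: {sylvia_effect} percentage points")
print("Adequate" if adequate else "Not adequate")
```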
10 The Value-Added Approach
- From repeated cross-sections to longitudinal data
- Complex statistical models: fixed-effects model, cross-classified model, multivariate mixed-effects (layered) model
- Controlling for prior student performance
- Teachers held accountable only for what students learn (or fail to learn)
- Direct estimates of teacher effects (?)
11 Some Key Issues with VAMs
- Inclusion of covariates
- Impact of missing data
- Persistence and attribution of teacher effects
- Impact of different methods of test construction, scaling, and equating
- Plausibility of modeling assumptions
- Do VAM estimates of teacher effects have causally meaningful interpretations?
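12 Teacher Effects in VAMs
- What is the effect of Sylvia's efforts to improve student reading ability, as measured by 4th-grade test scores?
  - $\bar{Y}^{S}_{4,2004}$: average score for Sylvia's 4th graders in 2004
  - $\bar{Y}^{S}_{3,2003}$: average score for Sylvia's 4th graders when they were 3rd graders in 2003
  - $\bar{Y}_{4,2004}$: average score for ALL 4th graders in 2004
  - $\bar{Y}_{3,2003}$: average score for ALL 4th graders when they were 3rd graders in 2003
  - Sylvia's average causal effect: $\left(\bar{Y}^{S}_{4,2004} - \bar{Y}^{S}_{3,2003}\right) - \left(\bar{Y}_{4,2004} - \bar{Y}_{3,2003}\right)$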
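The four averages on this slide combine as a difference in gains: the gain of Sylvia's students minus the gain of all students. The Python sketch below makes that explicit. The function name `vam_teacher_effect` and all input values are hypothetical, and real VAMs (fixed-effects, cross-classified, or layered models) estimate this quantity with far more structure than a single subtraction.

```python
def vam_teacher_effect(
    teacher_now: float,
    teacher_prior: float,
    system_now: float,
    system_prior: float,
) -> float:
    """Teacher's average student gain minus the system-wide average gain.

    teacher_now   -- avg score of the teacher's 4th graders in 2004
    teacher_prior -- avg score of the same students as 3rd graders in 2003
    system_now    -- avg score of ALL 4th graders in 2004
    system_prior  -- avg score of ALL those students as 3rd graders in 2003
    """
    teacher_gain = teacher_now - teacher_prior
    system_gain = system_now - system_prior
    return teacher_gain - system_gain


# Hypothetical averages, for illustration only.
effect = vam_teacher_effect(
    teacher_now=215.0, teacher_prior=195.0, system_now=212.0, system_prior=197.0
)
print(f"Sylvia's estimated effect: {effect:+.1f} scale-score points")
```

With these numbers Sylvia's students gain 20 points and the system gains 15, so her estimated effect is +5 points relative to the system average.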
13 VAM Teacher Effects Graphically
[Figure: outcome plotted against time at t, t + 1, and t + 2. Source: Raudenbush (2004), p. 125.]
14 Are VAM Effects Causally Meaningful?
- Teacher effects in VAMs are normative
  - Effects are relative to the system average
  - A teacher classified as effective in one school may be classified as ineffective in another school
- A better term might be "teacher deviations"
- The treatment is poorly defined
- The control is ambiguous
- Policy implications are murky
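A small numeric illustration of why these effects are normative: the same student gains yield opposite classifications when compared against different reference averages. The gains are hypothetical, and the sign-based labeling rule is a simplification assumed for illustration, not a rule used by any particular VAM.

```python
def teacher_deviation(teacher_gain: float, reference_gain: float) -> float:
    """Teacher's average student gain relative to a reference average gain."""
    return teacher_gain - reference_gain


SYLVIA_GAIN = 20.0  # hypothetical average gain of Sylvia's students

# Hypothetical average gains in two different schools (reference groups).
reference_gains = {"School A": 15.0, "School B": 25.0}

for school, ref_gain in reference_gains.items():
    dev = teacher_deviation(SYLVIA_GAIN, ref_gain)
    # Simplified labeling rule (assumption): classify by the sign of the deviation.
    if dev > 0:
        label = "effective"
    elif dev < 0:
        label = "ineffective"
    else:
        label = "average"
    print(f"{school}: deviation {dev:+.1f} -> classified {label}")
```

Nothing about Sylvia's students changes between the two lines; only the reference group does.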
15 A Technical Digression: Are Teacher Effects Random or Fixed?
- "Neither a fixed-effects model nor a random-effects model is unambiguously better. The statistical implications of the choice may influence the decision, but it is also partly substantive. The best decisions may depend on the particular inferences that are considered most important."
  (McCaffrey et al., 2003, p. 67)
- "Fixed effects models become unwieldy when multiple time points and cohorts are available. Given that fixed effect estimates have good properties only in special circumstances, I would recommend random effects as a general approach."
  (Raudenbush, 2004, p. 127)
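One practical consequence of the random-effects choice is shrinkage. Under a simple random-intercept model with known variance components (an assumption; in practice they are estimated), the random-effects prediction of a teacher effect pulls the teacher's mean student residual toward the population mean, and pulls harder for small classes. The variance values below are hypothetical, and this one-level Python sketch is far simpler than the multivariate models cited on this slide.

```python
TAU_SQ = 25.0     # hypothetical between-teacher variance of true effects
SIGMA_SQ = 400.0  # hypothetical within-class (student-level) residual variance


def fixed_effect_estimate(mean_residual: float) -> float:
    """Fixed effects: the teacher's mean student residual, taken at face value."""
    return mean_residual


def random_effect_estimate(mean_residual: float, n_students: int) -> float:
    """Random effects (empirical Bayes): shrink toward the population mean of 0.

    The shrinkage factor is the reliability of the class mean,
    tau^2 / (tau^2 + sigma^2 / n).
    """
    reliability = TAU_SQ / (TAU_SQ + SIGMA_SQ / n_students)
    return reliability * mean_residual


for n in (10, 40):
    fe_est = fixed_effect_estimate(6.0)
    re_est = random_effect_estimate(6.0, n)
    print(f"n={n:>2}: fixed {fe_est:+.2f}, random {re_est:+.2f}")
```

The shrinkage is justified only if teacher effects really behave like draws from a common distribution, which is the assumption questioned on the next slide.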
16 Random Effects and the Observed Data
- The preceding quotes imply to me that the random-effects assumption is just a statistical modeling decision.
- But the decision is actually an empirical commitment about the nature of the observed data.
- Are teacher effects really like independent random draws from a population distribution? What population?
- What must we assume about how our observed data were generated?
<End of technical digression>
17 Summary
                                              NCLB-AYP   VAM
Causally interpretable?                       Yes        Yes
Is interpretation likely to be unbiased?      No         Maybe
Is interpretation meaningful?                 No         No
Does the measure describe student learning?   No         Maybe
Is the measure valid?                         ?          ?
18 Directions for Future Research
- Demonstrate that cross-sectional and longitudinal data paint different pictures of student learning
- Recast VAM-AYP effects to address causally meaningful questions
  - What is the effect of imposing a value-added accountability system on student learning?
  - What is the effect of innovative professional development on teacher effectiveness?
- Validate VAM measures of teacher effects by developing criterion measures of teacher quality
19 Sources
- Ballou, D., Sanders, W., & Wright, P. (2004). Controlling for student background in value-added assessment for teachers. Journal of Educational and Behavioral Statistics, 29(1), 37-66.
- Kupermintz, H. (2003). Teacher effects and teacher effectiveness: A validity investigation of the Tennessee Value Added Assessment System. Educational Evaluation and Policy Analysis, 25(3), 287-298.
- McCaffrey, D., Lockwood, J. R., Koretz, D., Louis, T., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67-102.
- McCaffrey, D., Lockwood, J. R., Koretz, D., & Hamilton, L. (2003). Evaluating value-added models for teacher accountability. Santa Monica, CA: RAND Corporation.
- Raudenbush, S. (2004). What are value-added models estimating and what does this imply for statistical practice? Journal of Educational and Behavioral Statistics, 29(1), 121-130.
- Reckase, M. (2004). The real world is more complicated than we would like. Journal of Educational and Behavioral Statistics, 29(1), 117-120.
- Rubin, D., Stuart, E., & Zanutto, E. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1), 103-116.
- Seltzer, M., Choi, K., & Thum, Y. M. (2003). Examining relationships between where students start and how rapidly they progress: Using new developments in growth modeling to gain insights into the distribution of achievement within schools. Educational Evaluation and Policy Analysis, 25(3), 263-286.
- Tekwe, C., Carter, R., Ma, C.-X., Algina, J., Lucas, M., Roth, J., Abet, M., Fisher, T., & Resnick, M. (2004). An empirical comparison of statistical models for value-added assessment of school performance. Journal of Educational and Behavioral Statistics, 29(1), 11-36.
- Thum, Y. M. (2004). Measuring progress towards a goal: Estimating teacher productivity using a multivariate multilevel model for value-added analysis. Sociological Methods & Research.