1
Threats to the Validity of Measures of
Achievement Gains
  • Laura Hamilton and Daniel McCaffrey, RAND
    Corporation
  • Daniel Koretz, Harvard University
  • November 8, 2005

2
Growth Measures are Becoming More Common in State
Accountability Systems
  • NCLB is primarily not a growth-based approach to
    accountability, other than through safe harbor
  • Many states supplement NCLB with growth-based
    measures
  • California's Academic Performance Index
  • Massachusetts Performance and Improvement ratings
  • U.S. Department of Education has recently
    expressed willingness to explore growth measures

3
Today's Presentation Examines Threats to the Validity
of Growth Measures
  • Background: How growth is measured
  • Framework for validating measures of change
  • Threats to validity
  • Dimensionality
  • Score inflation
  • Implications

4
Growth Metrics Come in Several Forms
  • Cohort to cohort (CTC)
  • E.g., the average for this year's fifth graders
    compared to last year's fifth graders
  • Quasi-longitudinal
  • E.g., the average for this year's fifth graders
    compared to last year's fourth graders
  • True longitudinal or individual growth (IG)
  • E.g., the average of the individual gains for
    this year's fifth graders (see the sketch below)
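A minimal sketch of the three metrics, in Python with invented scores and student IDs (none of this is the presentation's data); note how the IG estimate silently drops students without matched records, the bias risk raised on the next slide:

```python
# Minimal sketch (hypothetical data): the three growth metrics
# computed from per-student scores keyed by (year, grade).
from statistics import mean

# scores[(year, grade)] maps student IDs to scale scores (made-up numbers)
scores = {
    (2004, 4): {"s1": 480, "s2": 510, "s3": 495},
    (2004, 5): {"s4": 520, "s5": 540},
    (2005, 5): {"s1": 505, "s2": 530, "s3": 500},
}

# Cohort to cohort (CTC): this year's 5th graders vs. last year's 5th graders
ctc_gain = mean(scores[(2005, 5)].values()) - mean(scores[(2004, 5)].values())

# Quasi-longitudinal: this year's 5th graders vs. last year's 4th graders
quasi_gain = mean(scores[(2005, 5)].values()) - mean(scores[(2004, 4)].values())

# Individual growth (IG): average of each matched student's own gain;
# students without records in both years drop out of the calculation
matched = set(scores[(2005, 5)]) & set(scores[(2004, 4)])
ig_gain = mean(scores[(2005, 5)][s] - scores[(2004, 4)][s] for s in matched)

print(ctc_gain, quasi_gain, ig_gain)
```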

5
Individual Growth Models are Generally Preferred
  • Address problems stemming from changes in student
    populations over time
  • Can yield biased estimates if students with
    incomplete data are different from other students
  • Provide better information to inform decisions
    about individual students or groups of students
  • CTC changes provide little information for
    stable schools

6
All Growth Models Require Assumptions about
Consistency of Constructs Measured
  • Users of information from growth models assume
    the constructs measured remain constant
  • For CTC models, nature of achievement and test
    content in a single grade should not change
  • For IG models, nature of achievement and
    constructs measured should not change as students
    progress through school
  • The assumption of consistency is violated to
    varying degrees depending on features of the
    models, tests, and curriculum

7
Consistency is One Aspect of Validity
  • Validity applies to inferences, not just to tests
  • Growth modeling raises concerns about validity of
    inferences about change
  • Need to understand what users infer from change
    scores
  • These inferences might vary by group (e.g.,
    parents, school administrators)
  • Match between what is inferred and what is
    actually measured is critical to validity

8
Framework for Validating Measures of Change
  • Validation of change scores has focused mainly on
    comparing trends between scores on two tests or
    on correlations between alternate measures
  • These traditional approaches do not address
    degree of match between tests or nonuniformity of
    changes within a test
  • Koretz, McCaffrey, and Hamilton (2001) developed
    a framework for validating tests under
    high-stakes conditions, with a focus on measuring
    change

9
Framework Addresses Nonuniformity of Gains Within
a Test
  • Test scores and inferences are considered in
    terms of specific performance elements
  • Substantive elements represent the domain of
    interest
  • Non-substantive elements are irrelevant to the
    domain of interest
  • Performance elements are associated with weights
  • Weights are typically not explicit
  • Some may be unintentional
  • Validity requires close match between test
    weights and inference weights

10
A Simple Linear Model for Test Scores
  • If we assume performance elements are additive,
    student a's score in year t is
    $$X_{at} = \sum_{j} \lambda_{jt}\,\theta_{ajt}$$
  • where $\theta_{ajt}$ denotes the student's
    performance on element $j$ in year $t$ and
    $\lambda_{jt}$ is the test weight
  • The inference about a score assumes it is also a
    weighted sum of elements but might use different
    weights
  • Some weights can be zero (the worked example
    below illustrates the mismatch)
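A worked toy example of the mismatch (hypothetical element names, weights, and performance values): the same performance vector yields one number under the test's weights and another under a user's inference weights, with a non-substantive element carrying test weight but zero inference weight:

```python
# Minimal numeric sketch of the weighted-sum model (invented values).
# theta[j] is performance on element j; the test and the user's
# inference weight the same elements differently.

theta = {"algebra": 0.6, "geometry": 0.4, "format_familiarity": 0.9}

test_weights      = {"algebra": 0.5, "geometry": 0.3, "format_familiarity": 0.2}
inference_weights = {"algebra": 0.6, "geometry": 0.4, "format_familiarity": 0.0}

score    = sum(test_weights[j] * theta[j] for j in theta)       # what the test reports
inferred = sum(inference_weights[j] * theta[j] for j in theta)  # what users take it to mean

print(score, inferred)  # 0.60 vs 0.52: the gap is the validity problem
```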

11
Several Factors Undermine Validity of Inferences
About Change
  • Changing nature of sample in CTC models
  • Differences in characteristics of students
    included at different time points undermine
    comparability
  • We do not address this problem here
  • Dimensionality: changes in performance elements
    and their weights
  • Score inflation: a special case of the
    dimensionality problem, stemming from increases
    in scores that do not match increases in
    achievement

12
Dimensionality
  • Tests typically assess multiple performance
    elements
  • Test specifications or maps to standards provide
    explicit information about performance elements
  • But implicit and unintended elements are also
    likely to affect performance
  • We use the term dimensionality broadly to cover
    all types of performance elements
  • Users' inferences are also likely to be
    multidimensional
  • Empirical unidimensionality is not sufficient to
    conclude dimensionality is not a problem

13
Dimensionality Affects Inferences about
Influences on Achievement
  • Analyses of NELS:88 math and science assessments
    examine relationships among achievement, student
    background, and school and classroom experiences
    using subscales of the achievement measure
  • For example, gender differences in science depend
    on what is measured
  • Difference is larger on items that require
    out-of-school knowledge or spatial reasoning
  • Focus on total score or on publisher-developed
    test specifications masks this difference
  • Similar findings for relationships with other
    student characteristics and school experiences

14
Dimensionality is Relevant to Value-Added Modeling
  • Subscales from a single mathematics achievement
    test produce dramatically different results
  • Study used Procedures and Problem Solving
    subscores from the Stanford Achievement Test
  • Variation within teachers across subscores was as
    large as or larger than variation across teachers
  • Results suggest that decisions about teacher or
    school effectiveness depend strongly on outcome
    measure
  • Changes in weights given to subscores could
    affect estimates of teacher or school
    effectiveness
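A toy illustration in the spirit of that study (all gains invented, not the study's data): re-weighting the Procedures and Problem Solving subscores can reverse which teacher appears more effective.

```python
# Hypothetical sketch: two teachers' average student gains on the two
# subscores (numbers invented). Shifting the weight placed on each
# subscore changes, and can reverse, the apparent ranking.

subscore_gains = {
    "teacher_A": {"procedures": 12.0, "problem_solving": 2.0},
    "teacher_B": {"procedures": 4.0, "problem_solving": 9.0},
}

def teacher_effect(gains, w_proc):
    """Composite gain under a given weight on the Procedures subscore."""
    return w_proc * gains["procedures"] + (1 - w_proc) * gains["problem_solving"]

for w in (0.8, 0.5, 0.2):
    ranked = sorted(subscore_gains,
                    key=lambda t: -teacher_effect(subscore_gains[t], w))
    print(f"weight on procedures = {w}: ranking = {ranked}")
    # At w = 0.8 teacher_A leads; at w = 0.2 teacher_B leads.
```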

15
The Effects of Different Weightings of
Computation and Problem Solving Scores on Teacher
Effects
16
Threats Stem from Changing Performance Weights or
Mismatch with Inference Weights
  • Many performance elements are likely to be
    inadvertent and non-substantive; most measures of
    change will not be fully aligned with users'
    inferences

17
Threats Stem from Changing Performance Weights or
Mismatch with Inference Weights
  • Sensitivity of test items to instruction is
    likely to vary across grades and across
    performance elements within the test, resulting
    in changing weights and/or incorrect inferences
    about educator effectiveness
  • When tests measure multiple elements, weights
    that change over time can contribute to gain
    scores independent of any gains on the
    performance elements
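A hypothetical sketch of the second point (invented weights and performance values): if the weights shift between years, a positive gain appears even though performance on every element is unchanged.

```python
# Hypothetical sketch: a "gain" produced entirely by changing test
# weights. Performance on both elements is identical in both years,
# yet the weighted score rises because the year-2 weights shift
# toward the element the student happens to be strong on.

theta = {"computation": 0.9, "reasoning": 0.4}   # unchanged across years

weights = {
    "year1": {"computation": 0.3, "reasoning": 0.7},
    "year2": {"computation": 0.7, "reasoning": 0.3},
}

def score(year):
    return sum(weights[year][j] * theta[j] for j in theta)

gain = score("year2") - score("year1")
print(gain)  # 0.2 of apparent gain with zero change in performance
```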

18
Implications for CTC and IG Models Vary
  • Most CTC models use the same test or parallel
    test forms from one year to the next
  • Test weights and inference weights will tend to
    remain reasonably constant over time
  • But performance elements might differ in their
    sensitivity to instruction
  • IG models face additional problem of changes in
    dimensionality and instructional sensitivity
    across grades
  • Problem is likely to be most severe for far-apart
    grade levels and for subjects in which the
    curriculum is not cumulative

19
Score Inflation
  • Score inflation refers to increases in test
    scores that are not matched by increases in the
    underlying achievement construct the test was
    intended to measure
  • Score inflation represents a special case of
    dimensionality-related problems

20
Score Inflation is Common in High-Stakes Testing
Contexts
  • Analyses of high-stakes test scores show gains in
    those scores are not matched by gains on other
    tests of the same content
  • Discrepancies in trends on high- and low-stakes
    tests suggest gains on high-stakes tests do not
    accurately reflect gains in the underlying
    achievement the test was intended to measure
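A minimal sketch of the audit-test logic behind such comparisons (all numbers invented): express gains on the high-stakes test and on a low-stakes test of the same content in standard-deviation units, and treat the gap as the unconfirmed, potentially inflated portion.

```python
# Minimal sketch (invented numbers): compare standardized gains on a
# high-stakes test with gains on a low-stakes audit test of the same
# content; the discrepancy suggests score inflation.

high_stakes = {"year1": 50.0, "year2": 58.0}   # mean scale scores
audit_test  = {"year1": 50.0, "year2": 51.5}   # same construct, low stakes
sd = 10.0                                      # common score SD, for effect sizes

hs_gain    = (high_stakes["year2"] - high_stakes["year1"]) / sd   # 0.80 SD
audit_gain = (audit_test["year2"] - audit_test["year1"]) / sd     # 0.15 SD

print(f"high-stakes gain: {hs_gain:.2f} SD; audit gain: {audit_gain:.2f} SD")
print(f"unconfirmed (potentially inflated) portion: {hs_gain - audit_gain:.2f} SD")
```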

21
Example of Score Inflation
(Figure: mathematics test scores)
Source: Koretz, Linn, Dunbar, & Shepard (1991)
22
Variation in Teachers' Responses to Tests Leads
to Variation in Inflation
  • Teachers respond to high-stakes testing in ways
    that are intended to maximize score increases
  • Placing more emphasis on tested topics than on
    untested topics, even when the latter are
    relevant to users' inferences
  • Focusing on "bubble kids" (those just below the
    cut score)
  • Coaching on item styles, prompts, or rubrics
    (aspects of the test that are incidental to the
    domain being tested)
  • Many of these actions inflate scores by producing
    test-score gains that are larger than the gains
    in the broader achievement domain

23
Recent Surveys Suggest Teachers' Practices are
Influenced by Tests
  • Data from surveys of teachers in California,
    Georgia, and Pennsylvania
  • Most teachers report increased focus on standards
    and on content emphasized on tests
  • More than half of elementary teachers report
    increasing time spent on test-taking strategies
  • Approximately 25% of teachers say they focus more
    on students near the proficient cut score
  • Responses tend to be stronger in math than in
    science

24
Score Inflation Exacerbates Inconsistencies in
Test and Inference Weights
25
Threats Stemming from Score Inflation
  • Problems arising from inflation are similar to
    those arising from dimensionality
  • Inflation occurs when students make substantial
    gains on elements that might or might not have
    large inference weights, but fail to make gains
    on other elements that have high inference weights
  • Threatens the validity of inferences about gains
    in achievement when achievement is measured using
    high-stakes tests

26
Implications for CTC and IG Models
  • Most research on score inflation has focused on
    CTC measures
  • Evidence suggests score inflation is large in the
    first few years of test implementation but
    eventually plateaus
  • Even if inflation lessens over time, inferences
    about change should be limited to tested
    material; change scores provide no information
    about untested material
  • IG models can be affected by variation in
    inflation across grades; plateau effects might
    never occur

27
Improving the Validity of Inferences about Change
  • Users of test-score information need to recognize
    that measuring change is not necessarily the same
    as measuring growth
  • Test developers should make their measures as
    resistant to inflation as possible
  • Future research should address dimensionality and
    score inflation in the context of CTC and IG
    measures

28
Summary
  • Test scores and inferences depend on multiple
    performance elements
  • Valid inferences require consistency between
    inference and test weights
  • Inconsistency implies that changes in scores
    could be unrelated to the performance elements of
    interest
  • Score inflation
  • CTC models are susceptible to errors from growth
    on non-substantive elements or on a restricted
    set of elements
  • Effects are likely to plateau
  • IG models are susceptible to changes in elements
    or content across grades
  • Inflation can have a large impact on growth and
    related measures