Title: An Introduction to Validity Arguments for Alternate Assessments
Slide 1: An Introduction to Validity Arguments for Alternate Assessments
- Scott Marion
- Center for Assessment
- Eighth Annual MARCES Conference
- University of Maryland
- October 11-12, 2007
Slide 2: Overview
- A little validity background
- Creating and evaluating a validity argument, or translating Kane (and others) to AA-AAS: can we make it practical?
- A focus on validity in technical documentation
Slide 3
- "Validation is a lengthy, even endless process" (Cronbach, 1989, p. 151)
- Good for consultants, but not so great for state folks and contractors
- Are you nervous yet?
Slide 4: Validity Should be Central
- We argue that the purpose of the technical documentation is to provide data to support or refute the validity of the inferences from the alternate assessments at both the student and program levels
Slide 5: Unified Conception of Validity
- Drawing on the work of Cronbach, Messick, Shepard, and Kane, the proposed evaluation of technical quality is built around a unified conception of validity
- Centered on the inferences related to the construct, including significant attention to the social consequences of the assessment
Slide 6
- But what is a validity argument, and how do we evaluate the validity of our inferences?
Slide 7: A little history
- Kane traces the history of validity theory from the criterion model through the content model to the construct model.
- It is worth stopping briefly to discuss the content model, because that appears to be where many still operate.
Slide 8
- The content model interprets test scores, based on a sample of performances in some area of activity, as an estimate of the overall level of skill in that activity. The sample of items/tasks and observed performances must be
- representative of the domain,
- evaluated appropriately and fairly, and
- part of a large enough sample
- So, this sounds good, right?
Slide 9: Concerns with the content model
- Messick (1989) argued that content-based validity evidence "does not involve test scores or the performances on which the scores are based and therefore cannot be used to justify conclusions about the interpretation of test scores" (p. 17)
- Huh? More simply: content evidence is a matching exercise and doesn't really help us get at the interpretations we make from scores
- Is it useful? Sure, but with the intense focus on alignment these days, content evidence appears to be privileged compared with trying to create arguments for the meaning of test scores
Slide 10: The Construct Model
- We can trace this evolution from Cronbach and Meehl (1955) through Loevinger (1957) to Cronbach (1971), culminating in Messick (1989)
- Focused attention on the many factors associated with the interpretations and uses of test scores (and not simply with correlations)
- Emphasized the important effect of assumptions in score interpretations and the need to check these assumptions
- Allowed for the possibility of alternative explanations for test scores; in fact, this model even encouraged falsification
Slide 11: Limitations of the Construct Model
- Does not provide clear guidance for the validation of a test score interpretation and/or use
- Does not help evaluators prioritize validity studies
- If, as Anastasi (1986) noted, "almost any information gathered in the process of developing or using a test is relevant to its validity" (p. 3), where should one start, and how do you know when you're done? Or are you ever done?
Slide 12: Transitioning to argument
- The call for careful examination of alternative explanations within the construct model is helpful for directing a program of validity research
Slide 13: Kane's argument-based framework
- It "assumes that the proposed interpretations and uses will be explicitly stated as an argument, or network of inferences and supporting assumptions, leading from observations to the conclusions and decisions. Validation involves an appraisal of the coherence of this argument and of the plausibility of its inferences and assumptions" (Kane, 2006, p. 17).
- Sounds easy, right?
Slide 14: Two Types of Arguments
- An interpretative argument specifies the proposed interpretations and uses of test results by laying out the network of inferences and assumptions leading from the observed performances to the conclusions and decisions based on the performances
- The validity argument provides an evaluation of the interpretative argument (Kane, 2006)
Slide 15
- Kane's framework provides a more pragmatic approach to validation, involving the specification of proposed interpretations and uses, the development of a measurement procedure that is consistent with this proposal, and a critical evaluation of the coherence of the proposal and the plausibility of its inferences and assumptions.
- The challenge is that most assessments do not start with explicit attention to validity in the design phase
Slide 16: The Interpretative Argument
- Essentially a mini-theory: the interpretative argument provides a framework for the interpretation and use of test scores
- Like a theory, the interpretative argument guides the data collection and methods; most importantly, like theories, interpretative arguments are falsifiable as we critically evaluate the evidence and arguments
Slide 17: Two stages of the interpretative argument
- Development stage: focus on the development of measurement tools and procedures as well as the corresponding interpretative argument
- A confirmationist bias is appropriate in this stage, since the developers (state and contractors) are trying to make the program the best it can be
- Appraisal stage: focus on critical evaluation of the interpretative argument
- Should be more neutral and arm's-length to provide a more convincing evaluation of the proposed interpretations and uses
- "Falsification, obviously, is something we prefer to do unto the constructions of others" (Cronbach, 1989, p. 153)
Slide 18: Interpretative argument
- Difficulty in specifying an interpretative argument may indicate a fundamental problem: "If it is not possible to come up with a test plan and plausible rationale for a proposed interpretation and use, it is not likely that this interpretation and use will be considered valid" (Kane, 2006, p. 26).
- Think of the interpretative argument as a series of if-then statements (see the sketch below)
- E.g., if the student performs the task in a certain way, then the observed score should have a certain value
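A minimal sketch of this if-then idea, not from the original slides: it represents an interpretative argument as an ordered chain of inferences, each resting on a warrant that collected evidence can support or challenge. The class name, fields, and the two example inferences are hypothetical illustrations, not Kane's (2006) formulation.

    # Hypothetical sketch: an interpretative argument as a chain of
    # if-then inferences, each licensed by a warrant (an assumption
    # that collected evidence can support or challenge).
    from dataclasses import dataclass

    @dataclass
    class Inference:
        premise: str      # the "if" clause
        conclusion: str   # the "then" clause
        warrant: str      # assumption that licenses the step
        supported: bool   # does the evidence back the warrant?

    argument = [
        Inference("the student performs the task in a certain way",
                  "the observed score has a certain value",
                  "scoring rules are applied accurately and consistently",
                  True),
        Inference("the observed score has a certain value",
                  "the student has attained that level of the content standards",
                  "the tasks adequately sample the intended academic domain",
                  False),
    ]

    # The validity evaluation appraises the chain as a whole: a single
    # unsupported warrant undermines the final interpretation.
    for step in argument:
        status = "supported" if step.supported else "challenged"
        print(f"If {step.premise}, then {step.conclusion} [{status}]")
    print("Interpretation defensible:", all(s.supported for s in argument))

The point of the toy structure is that validation appraises the whole chain, not isolated pieces of evidence, which anticipates the clarity, coherence, and plausibility criteria on the next slide.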
Slide 19: Criteria for Evaluating Interpretative Arguments
- Clarity: the argument should be clearly stated as a framework for validation, with inferences and warrants specified in enough detail to make the proposed claims explicit.
- Coherence: assuming the individual inferences are plausible, the network of inferences leading from the observations to conclusions and decisions makes sense
- Plausibility: the assumptions, in particular, are judged in terms of all the evidence for and against them
Slide 20
- One of the most effective challenges to interpretative arguments (or scientific theories) is to propose and substantiate an alternative argument that is more plausible
- With AA-AAS, we have to seriously consider and challenge ourselves with competing alternative explanations for test scores, for example:
- higher scores on our state's AA-AAS reflect greater learning of the content frameworks, OR
- higher scores on our state's AA-AAS reflect higher levels of student functioning
Slide 21: Categories of interpretative arguments (Kane, 2006)
- Trait interpretations
- Theory-based interpretations
- Qualitative interpretations
- Decision procedures
- Like scientific theories, the specific type of interpretative argument for test-based inferences guides models, data collection, assumptions, analyses, and claims
Slide 22: Decision Procedures
- Evaluating a decision procedure requires an evaluation of values and consequences
- To evaluate a testing program as an instrument of policy (e.g., AA-AAS under NCLB), it is necessary to evaluate its consequences (Kane, 2006, p. 53)
- Therefore, the values inherent in the testing program must be made explicit, and the consequences of the decisions made as a result of test scores must be evaluated!
Slide 23: Prioritizing and Focusing
- Shepard (1993) advocated a straightforward means to prioritize validity questions. Using an evaluation framework, she proposed that validity studies be organized in response to the questions:
- What does the testing practice claim to do?
- What are the arguments for and against the intended aims of the test? and
- What does the test do in the system other than what it claims, for good or bad? (Shepard, 1993, p. 429)
- The questions are directed to concerns about the construct, relevance, interpretation, and social consequences, respectively.
Slide 24: A heuristic to help organize and focus the validity evaluation (Marion, Quenemoen, & Kearns, 2006)
- [Figure: the Validity Evaluation synthesizes three strands: Empirical Evidence, Theory and Logic (argument), and Consequential Features]
- Evidence sources listed: Reporting; Alignment; Item Analysis/DIF/Bias; Measurement Error; Scaling and Equating; Standard Setting
- Assessment System elements listed: Test Development; Administration; Scoring
- Foundational elements listed: Student Population; Academic Content; Theory of Learning
Slide 25: Synthesizing and Integrating
- Haertel (1999) reminded us that the individual pieces of evidence (typically presented in separate chapters of technical documents) do not, by themselves, make the assessment system valid or not; it is only by synthesizing this evidence in order to evaluate the interpretative argument that we can judge the validity of the assessment program.
Slide 26: NHEAI/NAAC Technical Documentation
- The Nuts and Bolts
- The Validity Evaluation
- The Stakeholder Summary
- The Transition Document
Slide 27: The Validity Evaluation
- Author: Independent contractor with considerable input from the state DOE
- Audience: State policy makers, state DOE, district assessment and special education directors, state TAC members, special education teachers, and other key stakeholders. This also will contribute to the legal defensibility of the system.
- Notes: This will be a dynamic volume where new evidence is collected and evaluated over time.
Slide 28: Table of Contents
- Overview of the Assessment System
- Who are the students?
- What is the content?
- Introduction of the Validity Framework and Argument
- Empirical Evidence
- Evaluating the Validity Argument
Slide 29: Chapter VI, The Validity Evaluation
- Revisiting the interpretative argument
- Logical/theoretical relationships among the content, students, learning, and assessment: revisiting the assessment triangle
- The specific validity evaluation questions addressed in this volume
- Synthesizing and weighing the various sources of evidence
- Arguments for the validity of the system
- Arguments against the validity of the system
- An overall judgment about the defensibility of inferences from the scores of the AA-AAS in the context of specific uses and purposes