Title: An Introduction to Validity Arguments for Alternate Assessments
Slide 1: An Introduction to Validity Arguments for Alternate Assessments
- Scott Marion
- Center for Assessment
- Eighth Annual MARCES Conference
- University of Maryland
- October 11-12, 2007
Slide 2: Overview
- A little validity background
- Creating and evaluating a validity argument, or translating Kane (and others) to AA-AAS: can we make it practical?
- A focus on validity in technical documentation
Slide 3
- "Validation is a lengthy, even endless process" (Cronbach, 1989, p. 151)
- Good for consultants, but not so great for state folks and contractors
- Are you nervous yet?
Slide 4: Validity Should be Central
- We argue that the purpose of the technical documentation is to provide data to support or refute the validity of the inferences from the alternate assessments at both the student and program levels
Slide 5: Unified Conception of Validity
- Drawing on the work of Cronbach, Messick, Shepard, and Kane, the proposed evaluation of technical quality is built around a unified conception of validity
- Centered on the inferences related to the construct, including significant attention to the social consequences of the assessment
Slide 6
- But what is a validity argument, and how do we evaluate the validity of our inferences?
Slide 7: A little history
- Kane traces the history of validity theory from the criterion model through the content model to the construct model.
- It is worth stopping briefly to discuss the content model, because that appears to be where many still operate.
Slide 8
- The content model interprets test scores, based on a sample of performances in some area of activity, as an estimate of the overall level of skill in that activity. The sample of items/tasks and observed performances must be
- representative of the domain,
- evaluated appropriately and fairly, and
- part of a large enough sample
- So, this sounds good, right?
Slide 9: Concerns with the content model
- Messick (1989) argued that content-based validity evidence "does not involve test scores or the performances on which the scores are based and therefore cannot be used to justify conclusions about the interpretation of test scores" (p. 17)
- Huh? More simply: content evidence is a matching exercise and doesn't really help us get at the interpretations we make from scores
- Is it useful? Sure, but with the intense focus on alignment these days, content evidence appears to be privileged compared with trying to create arguments for the meaning of test scores
Slide 10: The Construct Model
- We can trace this evolution from Cronbach and Meehl (1955) through Loevinger (1957) to Cronbach (1971), culminating in Messick (1989)
- Focused attention on the many factors associated with the interpretations and uses of test scores (and not simply with correlations)
- Emphasized the important effect of assumptions in score interpretations and the need to check these assumptions
- Allowed for the possibility of alternative explanations for test scores; in fact, this model even encouraged falsification
Slide 11: Limitations of the Construct Model
- Does not provide clear guidance for the validation of a test score interpretation and/or use
- Does not help evaluators prioritize validity studies
- If, as Anastasi (1986) noted, "almost any information gathered in the process of developing or using a test is relevant to its validity" (p. 3), where should one start, and how do you know when you're done? Or are you ever done?
Slide 12: Transitioning to argument
- The call for careful examination of alternative explanations within the construct model is helpful for directing a program of validity research
Slide 13: Kane's argument-based framework
- It "assumes that the proposed interpretations and uses will be explicitly stated as an argument, or network of inferences and supporting assumptions, leading from observations to the conclusions and decisions. Validation involves an appraisal of the coherence of this argument and of the plausibility of its inferences and assumptions" (Kane, 2006, p. 17).
- Sounds easy, right?
Slide 14: Two Types of Arguments
- An interpretative argument specifies the proposed interpretations and uses of test results by laying out the network of inferences and assumptions leading from the observed performances to the conclusions and decisions based on the performances
- The validity argument provides an evaluation of the interpretative argument (Kane, 2006)
Slide 15
- Kane's framework provides a more pragmatic approach to validation, involving the specification of proposed interpretations and uses, the development of a measurement procedure that is consistent with this proposal, and a critical evaluation of the coherence of the proposal and the plausibility of its inferences and assumptions.
- The challenge is that most assessments do not start with explicit attention to validity in the design phase
Slide 16: The Interpretative Argument
- Essentially a mini-theory: the interpretative argument provides a framework for the interpretation and use of test scores
- Like a theory, the interpretative argument guides the data collection and methods; most importantly, like theories, interpretative arguments are falsifiable as we critically evaluate the evidence and arguments
Slide 17: Two stages of the interpretative argument
- Development stage: focus on the development of measurement tools and procedures as well as the corresponding interpretative argument
- A confirmationist bias is appropriate in this stage, since the developers (state and contractors) are trying to make the program the best it can be
- Appraisal stage: focus on critical evaluation of the interpretative argument
- Should be more neutral and arm's-length to provide a more convincing evaluation of the proposed interpretations and uses
- "Falsification, obviously, is something we prefer to do unto the constructions of others" (Cronbach, 1989, p. 153)
Slide 18: Interpretative argument
- Difficulty in specifying an interpretative argument may indicate a fundamental problem: "If it is not possible to come up with a test plan and plausible rationale for a proposed interpretation and use, it is not likely that this interpretation and use will be considered valid" (Kane, 2006, p. 26).
- Think of the interpretative argument as a series of if-then statements (see the sketch below)
- E.g., if the student performs the task in a certain way, then the observed score should have a certain value
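A minimal sketch of this if-then idea, not from the original slides: it represents an interpretative argument as an ordered chain of inferences, each resting on a warrant that collected evidence can support or challenge. The class name, fields, and the two example inferences are hypothetical illustrations, not Kane's (2006) formulation.

    # Hypothetical sketch: an interpretative argument as a chain of
    # if-then inferences, each licensed by a warrant (an assumption
    # that collected evidence can support or challenge).
    from dataclasses import dataclass

    @dataclass
    class Inference:
        premise: str      # the "if" clause
        conclusion: str   # the "then" clause
        warrant: str      # assumption that licenses the step
        supported: bool   # does the evidence back the warrant?

    argument = [
        Inference("the student performs the task in a certain way",
                  "the observed score has a certain value",
                  "scoring rules are applied accurately and consistently",
                  True),
        Inference("the observed score has a certain value",
                  "the student has attained that level of the content standards",
                  "the tasks adequately sample the intended academic domain",
                  False),
    ]

    # The validity evaluation appraises the chain as a whole: a single
    # unsupported warrant undermines the final interpretation.
    for step in argument:
        status = "supported" if step.supported else "challenged"
        print(f"If {step.premise}, then {step.conclusion} [{status}]")
    print("Interpretation defensible:", all(s.supported for s in argument))

The point of the toy structure is that validation appraises the whole chain, not isolated pieces of evidence, which anticipates the clarity, coherence, and plausibility criteria on the next slide.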
Slide 19: Criteria for Evaluating Interpretative Arguments
- Clarity: the argument should be clearly stated as a framework for validation, with inferences and warrants specified in enough detail to make the proposed claims explicit.
- Coherence: assuming the individual inferences are plausible, the network of inferences leading from the observations to conclusions and decisions makes sense
- Plausibility: the assumptions, in particular, are judged in terms of all the evidence for and against them
Slide 20
- One of the most effective challenges to interpretative arguments (or scientific theories) is to propose and substantiate an alternative argument that is more plausible
- With AA-AAS, we have to seriously consider and challenge ourselves with competing alternative explanations for test scores, for example:
- higher scores on our state's AA-AAS reflect greater learning of the content frameworks, OR
- higher scores on our state's AA-AAS reflect higher levels of student functioning
Slide 21: Categories of interpretative arguments (Kane, 2006)
- Trait interpretations
- Theory-based interpretations
- Qualitative interpretations
- Decision procedures
- Like scientific theories, the specific type of interpretative argument for test-based inferences guides models, data collection, assumptions, analyses, and claims
Slide 22: Decision Procedures
- Evaluating a decision procedure requires an evaluation of values and consequences
- To evaluate a testing program as an instrument of policy (e.g., AA-AAS under NCLB), it is necessary to evaluate its consequences (Kane, 2006, p. 53)
- Therefore, the values inherent in the testing program must be made explicit, and the consequences of the decisions made as a result of test scores must be evaluated!
Slide 23: Prioritizing and Focusing
- Shepard (1993) advocated a straightforward means to prioritize validity questions. Using an evaluation framework, she proposed that validity studies be organized in response to the questions:
- What does the testing practice claim to do?
- What are the arguments for and against the intended aims of the test? and
- What does the test do in the system other than what it claims, for good or bad? (Shepard, 1993, p. 429)
- The questions are directed to concerns about the construct, relevance, interpretation, and social consequences, respectively.
Slide 24: A heuristic to help organize and focus the validity evaluation (Marion, Quenemoen, & Kearns, 2006)
- [Figure: the Validity Evaluation synthesizes three strands: Empirical Evidence, Theory and Logic (argument), and Consequential Features]
- Evidence sources listed: Reporting; Alignment; Item Analysis/DIF/Bias; Measurement Error; Scaling and Equating; Standard Setting
- Assessment System elements listed: Test Development; Administration; Scoring
- Foundational elements listed: Student Population; Academic Content; Theory of Learning
Slide 25: Synthesizing and Integrating
- Haertel (1999) reminded us that the individual pieces of evidence (typically presented in separate chapters of technical documents) do not, by themselves, make the assessment system valid or not; it is only by synthesizing this evidence in order to evaluate the interpretative argument that we can judge the validity of the assessment program.
Slide 26: NHEAI/NAAC Technical Documentation
- The Nuts and Bolts
- The Validity Evaluation
- The Stakeholder Summary
- The Transition Document
Slide 27: The Validity Evaluation
- Author: Independent contractor with considerable input from the state DOE
- Audience: State policy makers, state DOE, district assessment and special education directors, state TAC members, special education teachers, and other key stakeholders. This also will contribute to the legal defensibility of the system.
- Notes: This will be a dynamic volume where new evidence is collected and evaluated over time.
Slide 28: Table of Contents
- Overview of the Assessment System
- Who are the students?
- What is the content?
- Introduction of the Validity Framework and Argument
- Empirical Evidence
- Evaluating the Validity Argument
Slide 29: Chapter VI, The Validity Evaluation
- Revisiting the interpretative argument
- Logical/theoretical relationships among the content, students, learning, and assessment: revisiting the assessment triangle
- The specific validity evaluation questions addressed in this volume
- Synthesizing and weighing the various sources of evidence
- Arguments for the validity of the system
- Arguments against the validity of the system
- An overall judgment about the defensibility of inferences from the scores of the AA-AAS in the context of specific uses and purposes