An Introduction to Validity Arguments for Alternate Assessments
Transcript and Presenter's Notes
1
An Introduction to Validity Arguments for
Alternate Assessments
  • Scott Marion
  • Center for Assessment
  • Eighth Annual MARCES Conference
  • University of Maryland
  • October 11-12, 2007

2
Overview
  • A little validity background
  • Creating and evaluating a validity argument, or
    translating Kane (and others) to AA-AAS
  • Can we make it practical?
  • A focus on validity in technical documentation

3
  • "Validation is a lengthy, even endless process"
    (Cronbach, 1989, p. 151)
  • Good for consultants, but not so great for state
    folks and contractors
  • Are you nervous yet?

4
Validity Should be Central
  • We argue that the purpose of the technical
    documentation is to provide data to support or
    refute the validity of the inferences from the
    alternate assessments at both the student and
    program levels.

5
Unified Conception of Validity
  • Drawing on the work of Cronbach, Messick,
    Shepard, and Kane, the proposed evaluation of
    technical quality is built around a unified
    conception of validity
  • centered on the inferences related to the
    construct, including significant attention to the
    social consequences of the assessment

6
  • But what is a validity argument and how do we
    evaluate the validity of our inferences?

7
A little history
  • Kane traces the history of validity theory from
    the criterion through the content model to the
    construct model.
  • It is worth stopping briefly to discuss the
    content model, because that is where many still
    appear to operate.

8
  • The content model interprets test scores, based
    on a sample of performances in some area of
    activity, as an estimate of the overall level of
    skill in that activity. The sample of items/tasks
    and observed performances must be
  • representative of the domain,
  • evaluated appropriately and fairly, and
  • part of a large enough sample
  • So, this sounds good, right?

9
Concerns with the content model
  • Messick (1989) argued that content-based
    validity evidence does not involve test scores or
    the performances on which the scores are based,
    and therefore cannot be used to justify
    conclusions about the interpretation of test
    scores (p. 17)
  • Huh? More simply: content evidence is a matching
    exercise and doesn't really help us get at the
    interpretations we make from scores
  • Is it useful? Sure, but with the intense focus
    on alignment these days, content evidence appears
    to be privileged compared with trying to create
    arguments for the meaning of test scores

10
The Construct Model
  • We can trace this evolution from Cronbach and
    Meehl (1955) through Loevinger (1957) to Cronbach
    (1971), culminating in Messick (1989)
  • Focused attention on the many factors associated
    with the interpretations and uses of test scores
    (and not simply with correlations)
  • Emphasized the important effect of assumptions in
    score interpretations and the need to check these
    assumptions
  • Allowed for the possibility of alternative
    explanations for test scores; in fact, this model
    even encouraged falsification

11
Limitations of the Construct Model
  • Does not provide clear guidance for the
    validation of a test score interpretation and/or
    use
  • Does not help evaluators prioritize validity
    studies
  • If, as Anastasi (1986) noted, "almost any
    information gathered in the process of developing
    or using a test is relevant to its validity" (p.
    3), where should one start, and how do you know
    when you're done? Or are you ever done?

12
Transitioning to argument
  • The call for careful examination of alternative
    explanations within the construct model is
    helpful for directing a program of validity
    research

13
Kane's argument-based framework
  • "assumes that the proposed interpretations and
    uses will be explicitly stated as an argument, or
    network of inferences and supporting assumptions,
    leading from observations to the conclusions and
    decisions. Validation involves an appraisal of
    the coherence of this argument and of the
    plausibility of its inferences and assumptions"
    (Kane, 2006, p. 17).
  • Sounds easy, right?

14
Two Types of Arguments
  • An interpretative argument specifies the proposed
    interpretations and uses of test results by
    laying out the network of inferences and
    assumptions leading from the observed performances
    to the conclusions and decisions based on the
    performances
  • The validity argument provides an evaluation of
    the interpretative argument (Kane, 2006); a rough
    sketch of the distinction follows
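  • A minimal sketch in Python (not from the original
    slides; the stage labels follow Kane's scoring,
    generalization, extrapolation, and decision
    inferences, but the wording and evidence shown
    here are illustrative) of an interpretative
    argument as a chain of inferences, and a validity
    argument as an appraisal of that chain:

      from dataclasses import dataclass, field

      @dataclass
      class Inference:
          claim: str              # one link in the chain
          assumptions: list[str]  # warrants the link relies on
          evidence_for: list[str] = field(default_factory=list)
          evidence_against: list[str] = field(default_factory=list)

      # Interpretative argument: the network of inferences leading
      # from observed performances to conclusions and decisions.
      interpretative_argument = [
          Inference("scoring: performance -> observed score",
                    ["rubrics are applied accurately and fairly"]),
          Inference("generalization: observed score -> domain score",
                    ["tasks are a representative, large enough sample"]),
          Inference("extrapolation: domain score -> target construct",
                    ["the assessed domain reflects the content standards"]),
          Inference("decision: construct claim -> intended use",
                    ["consequences of the decision are acceptable"]),
      ]

      # Validity argument: appraise the coherence of the chain and
      # the plausibility of each link in light of the evidence
      # gathered for and against it.
      def appraise(argument):
          for link in argument:
              ok = len(link.evidence_for) > len(link.evidence_against)
              print(link.claim, "->", "plausible" if ok else "in doubt")

      appraise(interpretative_argument)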

15
  • Kane's framework offers a more pragmatic
    approach to validation, involving the
    specification of proposed interpretations and
    uses, the development of a measurement procedure
    that is consistent with this proposal, and a
    critical evaluation of the coherence of the
    proposal and the plausibility of its inferences
    and assumptions.
  • The challenge is that most assessments do not
    start with explicit attention to validity in the
    design phase

16
The Interpretative Argument
  • Essentially a mini-theory: the interpretative
    argument provides a framework for interpretation
    and use of test scores
  • Like a theory, the interpretative argument guides
    the data collection and methods; most importantly,
    like theories it is falsifiable as we critically
    evaluate the evidence and arguments

17
Two stages of the interpretative argument
  • Development stage: focus on development of
    measurement tools and procedures as well as the
    corresponding interpretative argument
  • A confirmationist bias is appropriate in this
    stage, since the developers (state and
    contractors) are trying to make the program the
    best it can be
  • Appraisal stage: focus on critical evaluation of
    the interpretative argument
  • Should be more neutral and arm's-length to
    provide a more convincing evaluation of the
    proposed interpretations and uses
  • "Falsification, obviously, is something we prefer
    to do unto the constructions of others"
    (Cronbach, 1989, p. 153)

18
Interpretative argument
  • Difficulty in specifying an interpretative
    argument may indicate a fundamental problem. "If
    it is not possible to come up with a test plan
    and plausible rationale for a proposed
    interpretation and use, it is not likely that
    this interpretation and use will be considered
    valid" (Kane, 2006, p. 26).
  • Think of the interpretative argument as a series
    of if-then statements, as in the sketch below
  • E.g., if the student performs the task in a
    certain way, then the observed score should have
    a certain value
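  • As a toy illustration (not from the original
    slides; the task outcome and score values are
    invented), such a statement can be written as an
    executable check:

      # If the student performs the task in a certain way, then
      # the observed score should have a certain value.
      def check_if_then(performed_as_expected: bool,
                        observed_score: int,
                        expected_score: int) -> bool:
          if performed_as_expected:
              return observed_score == expected_score
          return True  # no claim when the antecedent is false

      # A failed check flags a link in the interpretative
      # argument that needs to be revisited.
      assert check_if_then(True, observed_score=4, expected_score=4)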

19
Criteria for Evaluating Interpretative Arguments
  • Clarity: the argument should be clearly stated as
    a framework for validation, with inferences and
    warrants specified in enough detail to make the
    proposed claims explicit
  • Coherence: assuming the individual inferences are
    plausible, the network of inferences leading from
    the observations to conclusions and decisions
    makes sense
  • Plausibility: assumptions, in particular, are
    judged in terms of all the evidence for and
    against them

20
  • One of the most effective challenges to
    interpretative arguments (or scientific theories)
    is to propose and substantiate an alternative
    argument that is more plausible
  • With AA-AAS we have to seriously consider and
    challenge ourselves with competing alternative
    explanations for test scores, for example:
  • higher scores on our state's AA-AAS reflect
    greater learning of the content frameworks, OR
  • higher scores on our state's AA-AAS reflect
    higher levels of student functioning

21
Categories of interpretative arguments (Kane,
2006)
  • Trait interpretations
  • Theory-based interpretations
  • Qualitative interpretations
  • Decision procedures
  • As with scientific theories, the specific type of
    interpretative argument for test-based inferences
    guides the models, data collection, assumptions,
    analyses, and claims

22
Decision Procedures
  • Evaluating a decision procedure requires an
    evaluation of values and consequences
  • To evaluate a testing program as an instrument
    of policy (e.g., AA-AAS under NCLB), it is
    necessary to evaluate its consequences (Kane,
    2006, p. 53)
  • Therefore, the values inherent in the testing
    program must be made explicit, and the
    consequences of the decisions made on the basis
    of test scores must be evaluated!

23
Prioritizing and Focusing
  • Shepard (1993) advocated a straightforward means
    to prioritize validity questions. Using an
    evaluation framework, she proposed that validity
    studies be organized in response to the questions:
  • What does the testing practice claim to do?
  • What are the arguments for and against the
    intended aims of the test? and
  • What does the test do in the system other than
    what it claims, for good or bad? (Shepard, 1993,
    p. 429)
  • The questions are directed to concerns about the
    construct, relevance, interpretation, and social
    consequences, respectively.

24
A heuristic to help organize and focus the
validity evaluation (Marion, Quenemoen, & Kearns,
2006)
  • The original slide presents a layered graphic;
    its elements, from the apex down, are:
  • Validity evaluation: drawing on empirical
    evidence, theory and logic (argument), and
    consequential features
  • Technical evidence: reporting, alignment, item
    analysis/DIF/bias, measurement error, scaling and
    equating, and standard setting
  • Assessment system: test development,
    administration, and scoring
  • Foundations: student population, academic
    content, and theory of learning

25
Synthesizing and Integrating
  • Haertel (1999) reminded us that the individual
    pieces of evidence (typically presented in
    separate chapters of technical documents) do not
    by themselves make the assessment system valid or
    invalid; only by synthesizing this evidence to
    evaluate the interpretative argument can we judge
    the validity of the assessment program.

26
NHEAI/NAAC Technical Documentation
  • The Nuts and Bolts
  • The Validity Evaluation
  • The Stakeholder Summary
  • The Transition Document

27
The Validity Evaluation
  • Author: Independent contractor with considerable
    input from the state DOE
  • Audience: State policymakers, state DOE,
    district assessment and special education
    directors, state TAC members, special education
    teachers, and other key stakeholders. This also
    will contribute to the legal defensibility of the
    system.
  • Notes: This will be a dynamic volume in which new
    evidence is collected and evaluated over time.

28
Table of Contents
  1. Overview of the Assessment System
  2. Who are the students?
  3. What is the content?
  4. Introduction of the Validity Framework and
    Argument
  5. Empirical Evidence
  6. Evaluating the Validity Argument

29
Chapter VI: The Validity Evaluation
  • Revisiting the interpretative argument
  • Logical/theoretical relationships among the
    content, students, learning, and assessment:
    revisiting the assessment triangle
  • The specific validity evaluation questions
    addressed in this volume
  • Synthesizing and weighing the various sources of
    evidence
  • Arguments for the validity of the system
  • Arguments against the validity of the system
  • An overall judgment about the defensibility of
    inferences from the scores of the AA-AAS in the
    context of specific uses and purposes