1
Some Concepts in Evidence Evaluation
  • Robert J. Mislevy
  • University of Maryland
  • October 10, 2003

2
Messick (1994) quote
  • Begin by asking what complex of knowledge,
    skills, or other attributes should be assessed...
  • Next, what behaviors or performances should
    reveal those constructs and what tasks or
    situations should elicit those behaviors?
  • Thus, the nature of the construct guides the
    selection or construction of relevant tasks as
    well as the rational development of
    construct-based scoring criteria and rubrics.

3
Evidence-centered design models
  • The task model includes specifications for the work
    products that will be captured.
  • The task model includes specifications for the
    conditions of performance.
4
Evidence-centered design models
  • Evaluation rules specify what the observables are and
    how they are determined from the work product.
  • The statistical portion of the evidence model(s)
    explicates which observables depend on which student
    model (SM) variables (sketched below).
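The relationships among these pieces can be pictured with a small sketch. The class and field names below are assumptions made for illustration, not part of any ECD software; they simply mirror what the slides name: a task model that specifies work products and performance conditions, and an evidence model whose evaluation rules map work products to observables and whose statistical part links observables to SM variables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical containers for the ECD pieces named on these slides.

@dataclass
class TaskModel:
    work_product_specs: List[str]           # what will be captured
    performance_conditions: Dict[str, str]  # conditions for performance

@dataclass
class EvidenceModel:
    # Evaluation rules: how each observable is determined from the work product.
    evaluation_rules: Dict[str, Callable[[dict], str]]
    # Statistical part: which SM variables each observable depends on.
    observable_parents: Dict[str, List[str]]

@dataclass
class StudentModel:
    variables: List[str]  # variables of persisting interest across tasks
```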
5
Key Concepts (1)
  • Conceptual vs. mechanical distinction
  • Domain modeling vs. the Conceptual Assessment Framework
    (CAF)
  • Product vs. process
  • E.g., HYDRIVE, multiple choice, math problems
  • High inference vs. low inference
  • High: AP Studio Art
  • Low: Multiple choice, carrying out DISC scoring rules
  • Automated vs. human scoring

6
Key Concepts (2)
  • The role of rubrics
  • Rubrics are instructions for humans
  • The roles of examples
  • Rubrics are not enough for high-inference evaluation
  • Important not only to raters, but to students and
    teachers
  • Importance of communicating the rules of the
    game to the examinee
  • (Note relevance of sociocultural perspective)

7
Rubrics for two observable variables in the BEAR assessment
"Issues, Evidence, and You." From Mislevy, Wilson, Ercikan,
and Chudowsky (2003), Psychometric principles in student
assessment.
8
What is performance assessment?
  • The new kinds of tasks are distinguished from MC tasks
    in a number of ways, some of which are present in some
    so-called performance tasks but not others (Wiley &
    Haertel, p. 63):
  • More complex, longer to perform.
  • Attempt to measure multiple, complex, integrated
    knowledge and capabilities.
  • Tasks nowhere near interchangeable. (Require methods for
    extracting multiple bits of evidence from single
    performances and integrating them across tasks into
    complex aggregates.)

9
Possible loci of interest
  • Complex interactions between examinee and assessment?
  • Extended, multi-part activities? (NBPTS)

10
Possible loci of interest
  • Complex work product captured? (AP Art)
  • Information about process as well as product passed on
    to evidence identification (EI)? (HYDRIVE)

11
Possible loci of interest
  • Complex process--more than objective
    scoring--to evaluate work product?
  • Human judgment (AP Art), automated process
    (Clauser et al. re NBME)?
  • Importance of washback effect (Frederiksen & Collins;
    Wolf et al.)

12
Possible loci of interest
  • More than just right/wrong observable variables?
    (AP Art rating scales)
  • Multiple aspects of complex performance captured?
  • (Language testing of speaking: fluency and accuracy,
    which trade off)

13
Possible loci of interest
  • Multivariate student model, with different aspects of
    skill and knowledge informed by different observables?
    (W&H emphasis; our examples include HYDRIVE and DISC)

14
The DISC Student Model
  • Student model variables: of persisting interest over
    multiple tasks
15
The Statistical Part of a DISC Evidence Model
  • SM variables involved in scenarios written to this task
    model
  • A variable to account for conditional dependence among
    observables that are evaluations of aspects from the
    same complex performance
  • Observables that evaluate key aspects of performance in
    scenarios written from this task model (human or
    automated evaluation); sketched below
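A minimal sketch of the fragment described above, using the observables from the scoring examples later in this talk and invented names for the SM and context variables (these are not the actual DISC variables):

```python
# Illustrative names only.
sm_parents = ["InformationGathering", "Assessment"]   # student model variables
context = "ScenarioContext"                           # conditional-dependence variable
observables = ["AdequacyOfHistoryProcedures", "IndividualizationOfProcedures"]

# Each observable depends on one or more SM variables, plus a shared
# context variable that absorbs the extra dependence induced by scoring
# several aspects of the same complex performance.
edges = [(parent, obs) for obs in observables for parent in sm_parents]
edges += [(context, obs) for obs in observables]

for edge in edges:
    print(edge)
```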
16
What does the DISC simulator's presentation process capture
as work products?
  • Examinees can (with varying degrees of accuracy and
    completeness):
  • Choose procedures that provide information or produce an
    observable effect
  • Provide rationales for actions via a large menu-driven
    faux insurance form
  • Identify important patient characteristics used to guide
    treatment, again from the large menu-driven faux
    insurance form
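As data, one such work product might look like the sketch below; the field names and sample values are assumptions for illustration, not the actual DISC capture format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RationaleEntry:
    action: str      # procedure the examinee chose
    rationale: str   # reason selected from the menu-driven form

@dataclass
class DiscWorkProduct:
    procedures_chosen: List[str]        # procedures that gather info or produce an effect
    rationales: List[RationaleEntry]    # rationales from the faux insurance form
    patient_characteristics: List[str]  # characteristics cited as guiding treatment

wp = DiscWorkProduct(
    procedures_chosen=["Chief complaint", "Health history review"],
    rationales=[RationaleEntry("Health history review", "rule out systemic causes")],
    patient_characteristics=["reported weight loss"],
)
```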

17
How is evidence evaluated, given the examinee's
performance?
Rules to evaluate essential characteristics of
examinee behavior
Example 1: Adequacy of examination procedures

1. IF the Rationale Product contains
       Chief complaint
       Health history review
   THEN Adequacy of history procedures = performed all
        essential history procedures
   ELSE IF the Rationale Product contains one, but not both,
        essential procedure
   THEN Adequacy of history procedures = performed some
        essential history procedures
   ELSE IF the Rationale Product contains neither essential
        procedure
   THEN Adequacy of history procedures = did not perform
        essential history procedures
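Read as code, this rule is a containment check over the Rationale Product. A minimal sketch, assuming the work product is represented as a set of procedure names:

```python
ESSENTIAL_HISTORY_PROCEDURES = {"Chief complaint", "Health history review"}

def adequacy_of_history_procedures(rationale_product: set) -> str:
    """Apply the Example 1 evaluation rule to a Rationale Product."""
    found = ESSENTIAL_HISTORY_PROCEDURES & rationale_product
    if found == ESSENTIAL_HISTORY_PROCEDURES:
        return "performed all essential history procedures"
    if found:  # one, but not both, essential procedure
        return "performed some essential history procedures"
    return "did not perform essential history procedures"

print(adequacy_of_history_procedures({"Chief complaint"}))
# -> performed some essential history procedures
```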
18
How is evidence evaluated, given the examinee's
performance?
Rules to evaluate essential characteristics of
examinee behavior
Example 2: Individualization of procedures

1. IF the Rationale Product contains
       Follow-up questions: duration of canker sore, when
           gums bleed, weight loss
       Dentition assessment: visual with mirror
       Periodontal assessment: visual with mirror
   THEN Individualization of procedures = performed all
        essential individualized procedures
   ELSE IF the Rationale Product contains 50-80% of the
        individualized procedures
   THEN Individualization of procedures = performed some
        essential individualized procedures
   ELSE IF the Rationale Product contains <50% of the
        individualized procedures
   THEN Individualization of procedures = did not perform
        essential individualized procedures
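Example 2 works the same way, but thresholds on the fraction of essential individualized procedures present in the Rationale Product. A sketch under the same set-of-strings assumption, treating each essential item as its own string and approximating the 50-80% band on the slide:

```python
ESSENTIAL_INDIVIDUALIZED_PROCEDURES = {
    "Follow-up: duration of canker sore",
    "Follow-up: when gums bleed",
    "Follow-up: weight loss",
    "Dentition assessment: visual with mirror",
    "Periodontal assessment: visual with mirror",
}

def individualization_of_procedures(rationale_product: set) -> str:
    """Apply the Example 2 evaluation rule to a Rationale Product."""
    total = len(ESSENTIAL_INDIVIDUALIZED_PROCEDURES)
    fraction = len(ESSENTIAL_INDIVIDUALIZED_PROCEDURES & rationale_product) / total
    if fraction == 1.0:
        return "performed all essential individualized procedures"
    if fraction >= 0.5:  # the slide's 50-80% band
        return "performed some essential individualized procedures"
    return "did not perform essential individualized procedures"
```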
19
Docking an Evidence Model
[Diagram: the evidence model fragment docks onto the student
model, connecting its observables to the SM variables they
depend on]
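In graphical-model terms, docking means joining the evidence model fragment for the current task onto the persistent student model, so the task's observables point at the SM variables they depend on. A minimal illustration, reusing the hypothetical variable names from the earlier sketch:

```python
# Persistent student model variables (hypothetical names).
student_model_vars = {"InformationGathering", "Assessment"}

# Evidence model fragment for one task: observable -> its parent variables.
evidence_fragment = {
    "AdequacyOfHistoryProcedures": ["InformationGathering", "ScenarioContext"],
    "IndividualizationOfProcedures": ["InformationGathering", "Assessment", "ScenarioContext"],
}

# Docking: every SM parent named in the fragment must already exist in the
# student model; the fragment's edges are then added to the working graph.
working_edges = []
for obs, parents in evidence_fragment.items():
    for parent in parents:
        assert parent in student_model_vars or parent == "ScenarioContext"
        working_edges.append((parent, obs))
print(working_edges)
```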
20
Wiley & Haertel on designing scoring rubrics
  • Deciding what skills and abilities are to be measured.
  • Deciding what aspects or subtasks of the task bear on
    those abilities.
  • Assuring that the recording of performance adequately
    reflects those aspects or subtasks (adequacy of the work
    product).
  • Designing rubrics for those aspects or subtasks.
  • Creating procedures for merging aspect and subtask
    scores into a final set of scores organized according to
    the skills or abilities set forth as the intents of
    measurement. (W&H, 1996, p. 79)
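The last step, merging aspect and subtask scores into final scores organized by skill, can be pictured as a weighted mapping from subtasks to the skills they inform. A toy sketch; the names and weights are invented for illustration:

```python
# Toy illustration of the final W&H step: merge subtask scores into
# scores organized by the skills the assessment intends to measure.
subtask_scores = {"history_taking": 2, "exam_selection": 3, "rationale_quality": 1}

skill_map = {
    "information_gathering": {"history_taking": 0.5, "exam_selection": 0.5},
    "clinical_reasoning": {"rationale_quality": 1.0},
}

skill_scores = {
    skill: sum(weight * subtask_scores[subtask] for subtask, weight in parts.items())
    for skill, parts in skill_map.items()
}
print(skill_scores)  # {'information_gathering': 2.5, 'clinical_reasoning': 1.0}
```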