Title: Some Concepts in Evidence Evaluation
1. Some Concepts in Evidence Evaluation
- Robert J. Mislevy
- University of Maryland
- October 10, 2003
2. Messick (1994) quote
- Begin by asking what complex of knowledge, skills, or other attributes should be assessed...
- Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors?
- Thus, the nature of the construct guides the selection or construction of relevant tasks as well as the rational development of construct-based scoring criteria and rubrics.
3. Evidence-centered design models
- The task model includes specifications for the work product that will be captured.
- The task model includes specifications of the conditions for performance.
4. Evidence-centered design models
- Evaluation rules specify what the observables are and how they are determined from the work product.
- The statistical portion of the evidence model(s) explicates which observables depend on which student model (SM) variables.
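As a hedged illustration of how these specifications might be organized in code, here is a minimal sketch; the class and field names (TaskModel, EvidenceModel, sm_dependencies, etc.) are hypothetical, not part of the ECD literature.

```python
from dataclasses import dataclass

@dataclass
class TaskModel:
    # Specifications for the work product(s) to be captured
    work_product_specs: list[str]            # e.g., ["menu selections", "final plan"]
    # Specifications of the conditions for performance
    performance_conditions: dict[str, str]   # e.g., {"time_limit": "30 minutes"}

@dataclass
class EvidenceModel:
    # Evaluation rules: observable name -> rule for deriving it from the work product
    evaluation_rules: dict[str, str]
    # Statistical part: observable name -> the SM variables it depends on
    sm_dependencies: dict[str, list[str]]
```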
5. Key Concepts (1)
- Conceptual vs. Mechanical distinction
  - Domain modeling vs. CAF
- Product vs. Process
  - E.g., HYDRIVE, multiple choice, math problems
- High Inference vs. Low Inference
  - High: AP Studio Art
  - Low: multiple-choice, carrying out DISC scoring rules
- Automated vs. Human Scoring
6. Key Concepts (2)
- The role of rubrics
  - Rubrics are instructions for humans
- The roles of examples
  - Rubrics are not enough for high-inference evaluation
  - Important not only to raters, but to students and teachers
- Importance of communicating the rules of the game to the examinee
  - (Note the relevance of the sociocultural perspective)
7. Rubrics for two observable variables in the BEAR assessment "Issues, Evidence, and You"
[Figure: rubrics reproduced from Mislevy, Wilson, Ercikan, & Chudowsky (2003), Psychometric principles in student assessment]
8. What is performance assessment?
- The new kinds of tasks are distinguished from MC tasks in a number of ways, some of which are present in some so-called performance tasks but not others (Wiley & Haertel, p. 63):
  - More complex, longer to perform.
  - Attempt to measure multiple, complex, integrated knowledge and capabilities.
  - Tasks nowhere near interchangeable. (Require methods for extracting multiple bits of evidence from single performances and integrating them across tasks into complex aggregates.)
9. Possible loci of interest
- Complex interactions between examinee and assessment?
- Extended, multi-part activities? (NBPTS)
10. Possible loci of interest
- Complex work product captured? (AP Art)
- Info about process as well as production passed on to evidence identification (EI)? (HYDRIVE)
11. Possible loci of interest
- Complex process (more than objective scoring) to evaluate the work product?
  - Human judgment (AP Art), automated process (Clauser et al. re NBME)?
- Importance of the washback effect (Frederiksen & Collins; Wolf et al.)
12. Possible loci of interest
- More than just right/wrong observable variables? (AP Art rating scales)
- Multiple aspects of complex performance captured?
  - (Language testing of speaking: fluency and accuracy, which trade off)
13. Possible loci of interest
- Multivariate student model, with different aspects of skill and knowledge informed by different observables? (Wiley & Haertel's emphasis; our examples include HYDRIVE and DISC)
14. The DISC Student Model
[Figure: the DISC student model. Student model variables are of persisting interest over multiple tasks.]
15. The Statistical Part of a DISC Evidence Model
[Figure, showing three kinds of variables:]
- SM variables involved in scenarios written to this task model
- A variable to account for conditional dependence among observables that are evaluations of aspects of the same complex performance
- Observables that evaluate key aspects of performance in scenarios written from this task model (scored by human or automated means)
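One way to picture this structure is as a directed graph over the variables. The sketch below uses hypothetical variable names, not the actual DISC model; the point is only the shape of the dependencies.

```python
# Parent sets for each node in a (hypothetical) evidence-model graph.
parents = {
    # SM variables: of persisting interest across tasks
    "InfoGathering": [],
    "TreatmentPlanning": [],
    # Context variable: a shared, scenario-specific influence that accounts
    # for conditional dependence among observables from the same performance
    "ScenarioContext": [],
    # Observables: each depends on one or more SM variables plus the context
    "AdequacyOfHistory": ["InfoGathering", "ScenarioContext"],
    "Individualization": ["InfoGathering", "TreatmentPlanning", "ScenarioContext"],
}
# Given only the SM variables, the two observables remain dependent (they
# share ScenarioContext); conditioning on the context variable as well
# renders them independent, which is what the extra variable buys us.
```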
16. What does the DISC simulator presentation process capture as work products?
- Examinees can (with varying degrees of accuracy and completeness):
  - Choose procedures that provide information or produce an observable effect
  - Provide rationales for actions via a large menu-driven faux insurance form
  - Identify important patient characteristics used to guide treatment, again from the large menu-driven faux insurance form
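A captured work product, then, might be represented along these lines; the field names below are hypothetical stand-ins for whatever the actual DISC capture format records.

```python
from dataclasses import dataclass, field

@dataclass
class DiscWorkProduct:
    # Procedures the examinee chose (information-gathering or interventions)
    procedures: list[str] = field(default_factory=list)
    # Rationales selected from the menu-driven faux insurance form
    rationales: list[str] = field(default_factory=list)
    # Patient characteristics the examinee flagged as guiding treatment
    patient_characteristics: list[str] = field(default_factory=list)
```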
17. How is evidence evaluated, given the examinee's performance?
Rules evaluate essential characteristics of examinee behavior.
Example 1: Adequacy of examination procedures
1. IF the Rationale Product contains
      Chief complaint AND Health history review
   THEN Adequacy of history procedures = "performed all essential history procedures"
   ELSE IF the Rationale Product contains one, but not both, essential procedures
   THEN Adequacy of history procedures = "performed some essential history procedures"
   ELSE IF the Rationale Product contains neither essential procedure
   THEN Adequacy of history procedures = "did not perform essential history procedures"
18. How is evidence evaluated, given the examinee's performance?
Rules evaluate essential characteristics of examinee behavior.
Example 2: Individualization of procedures
1. IF the Rationale Product contains
      Follow-up questions (duration of canker sore, when gums bleed, weight loss) AND
      Dentition assessment (visual with mirror) AND
      Periodontal assessment (visual with mirror)
   THEN Individualization of procedures = "performed all essential individualized procedures"
   ELSE IF the Rationale Product contains 50-80% of the individualized procedures
   THEN Individualization of procedures = "performed some essential individualized procedures"
   ELSE IF the Rationale Product contains <50% of the individualized procedures
   THEN Individualization of procedures = "did not perform essential individualized procedures"
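A minimal executable sketch of these two rules follows; the rationale strings and function names are hypothetical stand-ins for the actual DISC menu entries and scoring code.

```python
def adequacy_of_history(rationales: set[str]) -> str:
    """Example 1: two essential history procedures."""
    essential = {"chief complaint", "health history review"}
    found = essential & rationales
    if found == essential:
        return "performed all essential history procedures"
    elif found:  # one, but not both
        return "performed some essential history procedures"
    else:        # neither
        return "did not perform essential history procedures"

def individualization(rationales: set[str]) -> str:
    """Example 2: fraction of essential individualized procedures."""
    essential = {
        "follow-up: duration of canker sore",
        "follow-up: when gums bleed",
        "follow-up: weight loss",
        "dentition assessment: visual with mirror",
        "periodontal assessment: visual with mirror",
    }
    frac = len(essential & rationales) / len(essential)
    if frac == 1.0:    # all essential procedures present
        return "performed all essential individualized procedures"
    elif frac >= 0.5:  # 50-80% of them
        return "performed some essential individualized procedures"
    else:              # <50%
        return "did not perform essential individualized procedures"

# Usage: evaluate a (hypothetical) Rationale Product
print(adequacy_of_history({"chief complaint"}))
# -> performed some essential history procedures
```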
19. Docking an Evidence Model
[Diagram: the Evidence Model joins ("docks") with the Student Model through their shared student model variables]
20. Wiley & Haertel on designing scoring rubrics
- Deciding what skills and abilities are to be measured.
- Deciding what aspects or subtasks of the task bear on those abilities.
- Assuring that the recording of performance adequately reflects those aspects or subtasks (adequacy of work product).
- Designing rubrics for those aspects or subtasks.
- Creating procedures for merging aspect and subtask scores into a final set of scores organized according to the skills or abilities set forth as the intents of measurement. [See the sketch below.]
(Wiley & Haertel, 1996, p. 79)
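The merging step in the last point lends itself to a small sketch. The weighted-average scheme, weights, and names below are illustrative assumptions, not Wiley & Haertel's actual procedure.

```python
def merge_scores(aspect_scores: dict[str, float],
                 skill_map: dict[str, dict[str, float]]) -> dict[str, float]:
    """skill_map: skill -> {aspect: weight}; returns a weighted score per skill."""
    finals = {}
    for skill, weights in skill_map.items():
        total_w = sum(weights.values())
        finals[skill] = sum(aspect_scores[a] * w for a, w in weights.items()) / total_w
    return finals

# Example: two aspects bearing on "problem solving", one on "communication"
print(merge_scores(
    {"setup": 3.0, "solution_path": 2.0, "explanation": 4.0},
    {"problem solving": {"setup": 1.0, "solution_path": 2.0},
     "communication": {"explanation": 1.0}},
))
# -> {'problem solving': 2.33..., 'communication': 4.0}
```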