Title: Validity/Reliability Matters
1. Validity/Reliability Matters
2. Can a test be valid and not be reliable?
3. Can a test be reliable and not be valid?
4. Justifiable. Relevant. True to its purpose (consistently).
5. Validity
- Design Issues
- Application Issues
6. Validity
- Design Issues
- Application Issues
7. Design: Creating the Instrument
8. Inference [diagram: a continuum from Low to High]
9. High Inference
- To draw a conclusion
- To guess, surmise
- To suggest, hint
10. Low Inference
- Straightforward
- Language is precise and targeted
- Clear: no competing interpretations of words
- No doubt as to what point is being made
11. Inference [diagram: a continuum from Low to High]
12. Complexity [diagram: a continuum from Low to High]
13. High Complexity
- Complicated
- Composed of interrelated parts or sections
- Developed with great care or with much detail
14. Low Complexity
- Simplistic
- Plain
- Unsophisticated
15. Complexity [diagram: a continuum from Low to High]
16. How They Are Related [diagram: grid with Complexity (Low to High) on one axis and Inference (Low to High) on the other]
17. Designing the Instrument [same Complexity-by-Inference grid]
18. Due Yesterday! [same grid]
19. Overachieving [same grid]
20. How Much Error Are You Willing to Risk? [same grid, with error regions marked along both axes]
21. Compromise [same grid]
22. Does the OBSERVED behavior equal the TRUE behavior?
Observed Score = True Score + Error
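In classical test theory, the relationship this slide gestures at is usually written as follows (a standard formulation, not spelled out in the deck, and assuming error is uncorrelated with the true score):

\[
X = T + E, \qquad \text{Reliability} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\]

Here X is the observed score, T the true score, and E the error. The less error variance an instrument admits, the closer its reliability coefficient gets to 1.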
23. Design: Creating the Instrument
- General rubric: high inference
- Qualitative analytic rubric: low inference
- Ease of development (question worthiness, guidance, single interpretation): low
- Time to develop (labor intensive, onerous, long): high
24. Validity
- Design Issues
- Application Issues
25. Application Issues
- Designated Use
- Limitations/Conditions
26. Application Issues
- Designated Use
- Don't borrow from your neighbor!
27. Application Issues
- Limitations/Conditions
- One size does not fit all or apply to all circumstances
28. Ways to Increase the Probability of Accuracy
- Compare the language of the standards with the concepts assessed
- The concepts/expectations in the standards are apparent in the assessments, at the same depth and breadth: a good example of content validity
- The behavior (performance) expected in the standard matches the performance expected in the assessment (i.e., knowledge of vs. demonstrating a skill)
- Identify key/critical items/concepts to evaluate
- Give it away for analysis (many eyes)
- Invite external expert review
- Be receptive to feedback
- Surveys from P-12 partners and candidates
- Regular evaluation and analysis: revise, revise, revise
- Awareness of design and application issues
29. Ways to Increase Reliability
- Begin with a valid instrument
- Two reliability issues:
- Reliability of the instrument: repeated use of the instrument by the same evaluators. If problematic: revise, re-think, abandon.
- Reliability of the scoring: the performance is rated the same by different evaluators (i.e., objectivity). If problematic: ensure the qualifications of evaluators, check the rubric, check the language, and minimize generalized concepts applied to all subject areas.
- Train evaluators frequently
30. AN APPLICATION: A KSU Workshop (Handouts Available)
- Thirty experienced teachers participated in a
daylong workshop to help us evaluate three
student teaching observation rating forms.
31. Three Instruments
- Traditional Candidate Performance Instrument (CPI) Observation of Student Teaching: the observer is asked to indicate strengths, weaknesses, and areas for improvement in three broad outcomes (Subject Matter, Facilitation of Learning, and Collaborative Professional).
- Modified CPI Observation of Student Teaching: the observer is asked to explicitly rate each proficiency within each outcome and then provide a narrative indicating any strengths, weaknesses, and suggestions for improvement.
- Formative Analysis Class Keys: the observer is asked to rate 26 elements from the Georgia Department of Education's Class Keys. No required narrative.
32. Generally, we were interested in two areas:
- Validity/Accuracy: Which instrument provides the best inference about the presence of the positive behaviors (proficiencies) we deem important?
- Reliability/Consistency: Which instrument demonstrates the best inter-rater reliability?
33. Study Design
Instrument                                    Group 1   Group 2   Group 3
Period 1: Traditional CPI (narrative)         Video A   Video B   Video C
Period 2: Modified CPI (rating + narrative)   Video B   Video C   Video A
Period 3: Class Keys Formative Analysis       Video C   Video A   Video B
(Each group sees each video exactly once, and each instrument is applied to every video across the groups; a sketch of this rotation follows.)
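The design above is a counterbalanced (Latin-square) rotation. As an illustration only (the group and video labels come from the table; everything else here is an assumption for demonstration), the rotation can be generated programmatically:

instruments = [
    "Traditional CPI (narrative)",
    "Modified CPI (rating + narrative)",
    "Class Keys Formative Analysis",
]
videos = ["Video A", "Video B", "Video C"]

for period, instrument in enumerate(instruments):
    # Rotate the video list one step per period so that each group
    # sees every video once and each instrument meets every video.
    row = videos[period:] + videos[:period]
    cells = ", ".join(f"Group {g + 1}: {v}" for g, v in enumerate(row))
    print(f"Period {period + 1} ({instrument}) -> {cells}")

This prints exactly the three rows of the table, which is what allows instrument effects to be separated from video and group effects.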
34. Reliability
- The strongest inter-rater agreement was on the Modified CPI with performance-level ratings, followed by the Class Keys Formative Assessment instrument with performance-level ratings.
- There was very little agreement among the behaviors noted in the Traditional CPI narratives, and no performance-level ratings were available; it is probably not a reliable instrument for rating student teaching behaviors. (For one common way to quantify such agreement, see the sketch below.)
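The deck does not say which agreement statistic was used. One common choice for two raters assigning categorical performance levels is Cohen's kappa; the sketch below uses hypothetical ratings purely for illustration:

from collections import Counter

def cohen_kappa(r1, r2):
    # Observed agreement minus chance agreement, scaled by the
    # maximum possible improvement over chance.
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical performance-level ratings (1-4) of one video's
# proficiencies by two evaluators.
rater_a = [3, 2, 3, 4, 2, 3, 3, 4]
rater_b = [3, 2, 4, 4, 2, 3, 2, 4]
print(round(cohen_kappa(rater_a, rater_b), 2))  # 0.64 (moderate agreement)

Values near 1 indicate strong inter-rater agreement; values near 0 indicate agreement no better than chance, which is the pattern the Traditional CPI narratives showed.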
35. Validity
- Both the Traditional CPI and the Modified CPI are explicitly aligned with institutional (and other) standards, but the Traditional CPI is a global assessment while the Modified CPI requires a rating and a narrative for each proficiency.
- However, the Traditional CPI has not demonstrated reliability... so
- Participants were also asked to provide information about the language, clarity, and ease of use of all instruments.