Title: Validity/Reliability Matters
1. Validity/Reliability Matters
2. Can a test be valid and not be reliable?
3. Can a test be reliable and not be valid?
4. Justifiable. Relevant. True to its purpose (consistently).
5. Validity
- Design Issues
- Application Issues
6. Validity
- Design Issues
- Application Issues
7. Design: Creating the Instrument
8. Inference [diagram: a continuum from Low to High]
9. High Inference
- To draw a conclusion
- To guess, surmise
- To suggest, hint
10. Low Inference
- Straightforward
- Language is precise and targeted
- Clear: no competing interpretations of words
- No doubt as to what point is being made
11. Inference [diagram: a continuum from Low to High]
12. Complexity [diagram: a continuum from Low to High]
13. High Complexity
- Complicated
- Composed of interrelated parts or sections
- Developed with great care or with much detail
14. Low Complexity
- Simplistic
- Plain
- Unsophisticated
15. Complexity [diagram: a continuum from Low to High]
16. How They Are Related [diagram: grid with Complexity (Low to High) on one axis and Inference (Low to High) on the other]
17. Designing the Instrument [same Complexity-by-Inference grid]
18. Due Yesterday! [same grid]
19. Overachieving [same grid]
20. How Much Error Are You Willing to Risk? [same grid, with error regions marked along both axes]
21. Compromise [same grid]
22. Does the OBSERVED behavior equal the TRUE behavior?
Observed Score = True Score + Error
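In classical test theory, the relationship this slide gestures at is usually written as follows (a standard formulation, not spelled out in the deck, and assuming error is uncorrelated with the true score):

\[
X = T + E, \qquad \text{Reliability} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\]

Here X is the observed score, T the true score, and E the error. The less error variance an instrument admits, the closer its reliability coefficient gets to 1.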
23. Design: Creating the Instrument
- General rubric: high inference
- Qualitative analytic rubric: low inference
- Ease of development (question worthiness, guidance, single interpretation): low
- Time to develop (labor intensive, onerous, long): high
24. Validity
- Design Issues
- Application Issues
25. Application Issues
- Designated Use
- Limitations/Conditions
26. Application Issues
- Designated Use
- Don't borrow from your neighbor!
27. Application Issues
- Limitations/Conditions
- One size does not fit all or apply to all circumstances
28. Ways to Increase the Probability of Accuracy
- Compare the language of the standards with the concepts assessed
- The concepts/expectations in the standards are apparent in the assessments, at the same depth and breadth: a good example of content validity
- The behavior (performance) expected in the standard matches the performance expected in the assessment (i.e., knowledge of vs. demonstrating a skill)
- Identify key/critical items/concepts to evaluate
- Give it away for analysis (many eyes)
- Invite external expert review
- Be receptive to feedback
- Surveys from P-12 partners and candidates
- Regular evaluation and analysis: revise, revise, revise
- Awareness of design and application issues
29. Ways to Increase Reliability
- Begin with a valid instrument
- Two reliability issues:
- Reliability of the instrument: repeated use of the instrument by the same evaluators. If problematic: revise, re-think, abandon.
- Reliability of the scoring: the performance is rated the same by different evaluators (i.e., objectivity). If problematic: ensure the qualifications of evaluators, check the rubric, check the language, and minimize generalized concepts applied to all subject areas.
- Train evaluators frequently
30. AN APPLICATION: A KSU Workshop (Handouts Available)
- Thirty experienced teachers participated in a
daylong workshop to help us evaluate three
student teaching observation rating forms.
31. Three Instruments
- Traditional Candidate Performance Instrument (CPI) Observation of Student Teaching: the observer is asked to indicate strengths, weaknesses, and areas for improvement in three broad outcomes (Subject Matter, Facilitation of Learning, and Collaborative Professional).
- Modified CPI Observation of Student Teaching: the observer is asked to explicitly rate each proficiency within each outcome and then provide a narrative indicating any strengths, weaknesses, and suggestions for improvement.
- Formative Analysis Class Keys: the observer is asked to rate 26 elements from the Georgia Department of Education's Class Keys. No required narrative.
32. Generally, we were interested in two areas:
- Validity/Accuracy: Which instrument provides the best inference about the presence of the positive behaviors (proficiencies) we deem important?
- Reliability/Consistency: Which instrument demonstrates the best inter-rater reliability?
33. Study Design
Instrument                                    Group 1   Group 2   Group 3
Period 1: Traditional CPI (narrative)         Video A   Video B   Video C
Period 2: Modified CPI (rating + narrative)   Video B   Video C   Video A
Period 3: Class Keys Formative Analysis       Video C   Video A   Video B
(Each group sees each video exactly once, and each instrument is applied to every video across the groups; a sketch of this rotation follows.)
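The design above is a counterbalanced (Latin-square) rotation. As an illustration only (the group and video labels come from the table; everything else here is an assumption for demonstration), the rotation can be generated programmatically:

instruments = [
    "Traditional CPI (narrative)",
    "Modified CPI (rating + narrative)",
    "Class Keys Formative Analysis",
]
videos = ["Video A", "Video B", "Video C"]

for period, instrument in enumerate(instruments):
    # Rotate the video list one step per period so that each group
    # sees every video once and each instrument meets every video.
    row = videos[period:] + videos[:period]
    cells = ", ".join(f"Group {g + 1}: {v}" for g, v in enumerate(row))
    print(f"Period {period + 1} ({instrument}) -> {cells}")

This prints exactly the three rows of the table, which is what allows instrument effects to be separated from video and group effects.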
34. Reliability
- The strongest inter-rater agreement was on the Modified CPI with performance-level ratings, followed by the Class Keys Formative Assessment instrument with performance-level ratings.
- There was very little agreement among the behaviors noted in the Traditional CPI narratives, and no performance-level ratings were available; it is probably not a reliable instrument for rating student teaching behaviors. (For one common way to quantify such agreement, see the sketch below.)
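The deck does not say which agreement statistic was used. One common choice for two raters assigning categorical performance levels is Cohen's kappa; the sketch below uses hypothetical ratings purely for illustration:

from collections import Counter

def cohen_kappa(r1, r2):
    # Observed agreement minus chance agreement, scaled by the
    # maximum possible improvement over chance.
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical performance-level ratings (1-4) of one video's
# proficiencies by two evaluators.
rater_a = [3, 2, 3, 4, 2, 3, 3, 4]
rater_b = [3, 2, 4, 4, 2, 3, 2, 4]
print(round(cohen_kappa(rater_a, rater_b), 2))  # 0.64 (moderate agreement)

Values near 1 indicate strong inter-rater agreement; values near 0 indicate agreement no better than chance, which is the pattern the Traditional CPI narratives showed.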
35. Validity
- Both the Traditional CPI and the Modified CPI are explicitly aligned with institutional (and other) standards, but the Traditional CPI is a global assessment while the Modified CPI requires a rating and a narrative for each proficiency.
- However, the Traditional CPI has not demonstrated reliability... so
- Participants were also asked to provide information about the language, clarity, and ease of use of all instruments.