Title: Establishing the Reliability and Validity of Outcomes Assessment Measures
1. Establishing the Reliability and Validity of Outcomes Assessment Measures
- Anthony R. Napoli, PhD
- Lanette A. Raymond, MA
- Office of Institutional Research & Assessment
- Suffolk County Community College
- http://sccaix1.sunysuffolk.edu/Web/Central/IT/InstResearch/
2. Validity defined
- The validity of a measure indicates the extent to which its items actually measure what they purport to measure
3. Types of Validity
- Face Validity
- Content Validity
- Construct Validity
- Criterion-Related Validity
4. Face Validity
- The measure looks, on its face, like a test of the intended subject matter
- Not validity in a technical sense
5. Content Validity
- Incorporates quantitative estimates
- Domain Sampling
- The simple summing or averaging of dissimilar
items is inappropriate
6. Construct Validity
- Indicated by correspondence of scores to other
known valid measures of the underlying
theoretical trait
- Discriminant Validity
- Convergent Validity
7. Criterion-Related Validity
- Represents performance in relation to particular
tasks of discrete cognitive or behavioral
objectives
- Predictive Validity
- Concurrent Validity
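Criterion-related validity is typically quantified as a correlation between assessment scores and an external criterion (a later outcome for predictive validity, a simultaneous one for concurrent validity). A minimal sketch, with entirely invented scores and criterion values:

```python
# Hypothetical illustration: criterion-related (predictive) validity as a
# correlation between assessment scores and a later external criterion.
# All numbers below are invented for illustration, not SCCC data.
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Predictive validity: test scores now vs. course grades one term later.
test_scores = [55, 62, 70, 74, 81, 90]
later_gpa = [2.0, 2.3, 2.8, 2.9, 3.4, 3.8]
validity_coefficient = pearson_r(test_scores, later_gpa)
```

A coefficient near 1 documents a strong link between the instrument and the criterion; a coefficient near 0 means the test tells you little about the outcome it was meant to predict.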
8. Reliability defined
- The reliability of a measure indicates the degree to which an instrument consistently measures a particular skill, knowledge base, or construct
- Reliability is a precondition for validity
9. Types of Reliability
- Inter-rater (scorer) reliability
- Inter-item reliability
- Test-retest reliability
- Split-half / alternate-forms reliability
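Split-half reliability, the last type above, can be sketched in a few lines: score two halves of the test separately, correlate them, then apply the Spearman-Brown correction to estimate full-length reliability. The response matrix and the odd/even split below are illustrative assumptions, not actual assessment data:

```python
# Hypothetical sketch: split-half reliability with the Spearman-Brown
# correction. Item scores and the odd/even split are invented.
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(responses):
    """responses: one list of item scores (0/1 or points) per student.
    Splits items into odd/even halves, correlates the half totals,
    then steps the correlation up to full test length."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown correction
```

The same machinery covers alternate forms: replace the two halves with total scores on the two parallel forms and skip the length correction.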
10. Validity & Reliability in Plain English
- Assessment results must represent the institution, program, or course
- Evaluation of the validity and reliability of the assessment instrument and/or rubric provides the documentation that they do
11. Content Validity for Subjective Measures
- The learning outcomes represent the program/course (domain sampling)
- The instrument addresses the learning outcomes
- There is a match between the instrument and the rubric
- Rubric scores can be applied to the learning outcomes, and indicate the degree of student achievement within the program/course
12. Inter-Scorer Reliability
- Rubric scores can be obtained and applied to the
learning outcomes, and indicate the degree of
student achievement within the program/course
consistently
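One common way to document inter-scorer consistency is Cohen's kappa, which corrects raw rater agreement for agreement expected by chance. A minimal sketch with invented rubric scores (the deck does not specify which agreement statistic the College uses):

```python
# Hypothetical sketch: inter-rater reliability for rubric scores via
# Cohen's kappa. The two raters' category scores below are invented.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category scores."""
    n = len(rater_a)
    # Observed proportion of exact agreements.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's score distribution.
    pa, pb = Counter(rater_a), Counter(rater_b)
    expected = sum(pa[c] * pb[c] for c in set(rater_a) | set(rater_b)) / n**2
    return (observed - expected) / (1 - expected)
```

Kappa of 1 means perfect agreement; values near 0 mean the raters agree no more often than chance, which would undercut any claim that rubric scores are applied consistently.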
13. Content Validity for Objective Measures
- The learning outcomes represent the program/course
- The items on the instrument address specific learning outcomes
- Instrument scores can be applied to the learning outcomes, and indicate the degree of student achievement within the program/course
14. Inter-Item Reliability
- Items that measure the same learning outcomes
should consistently exhibit similar scores
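For objective tests with right/wrong items, inter-item consistency is commonly estimated with KR-20, the statistic reported later in this deck for PC11. A minimal sketch over an invented 0/1 response matrix:

```python
# Hypothetical sketch: KR-20 internal consistency for dichotomous
# (right/wrong) items. The response matrix used in the test is invented.
from statistics import pvariance

def kr20(responses):
    """responses: rows = students, columns = 0/1 item scores."""
    k = len(responses[0])                      # number of items
    totals = [sum(row) for row in responses]   # each student's total score
    item_cols = list(zip(*responses))
    # Sum over items of p*q, where p = proportion correct, q = 1 - p.
    pq = sum((sum(col) / len(col)) * (1 - sum(col) / len(col))
             for col in item_cols)
    var_total = pvariance(totals)
    return (k / (k - 1)) * (1 - pq / var_total)
```

High KR-20 values mean items that target the same outcomes rise and fall together across students, which is exactly the "similar scores" consistency described above.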
15. Content Validity (CH19)
- Description
- Write and decipher chemical nomenclature
- Solve both quantitative and qualitative problems
- Balance equations and solve mathematical problems associated w/ balanced equations
- Demonstrate an understanding of intra-molecular forces
A 12-item test measured students' mastery of the objectives
16. Content Validity (CH19) [item-objective results not reproduced]
17. Content Validity (SO11)
- Description
- Identify the basic methods of data collection
- Demonstrate an understanding of basic sociological concepts and social processes that shape human behavior
- Apply sociological theories to current social issues
A 30-item test measured students' mastery of the objectives
18. Content Validity (SO11) [results not reproduced]
19. Content Validity (SO11) [results not reproduced]
20. Inter-Rater Reliability: Fine Arts Portfolio
- Drawing
- Design
- Technique
- Creativity
- Artistic Process
- Aesthetic Criteria
- Growth
- Portfolio Presentation
- Scale
- 5 = Excellent
- 4 = Very Good
- 3 = Satisfactory
- 2 = Unsatisfactory
- 1 = Unacceptable
21. Inter-Rater Reliability: Fine Arts Portfolio [rater-agreement results not reproduced]
22. Inter-Item Reliability (PC11)
- Objective Description
- Demonstrate a satisfactory knowledge of
- 1. the history, terminology, methods, and ethics in psychology
- 2. concepts associated with the 5 major schools of psychology
- 3. the basic aspects of human behavior including learning and memory, personality, physiology, emotion, etc.
- 4. an ability to obtain and critically analyze research in the field of modern psychology
A 20-item test measured students' mastery of the objectives
23. Inter-Item Reliability (PC11)
- Embedded-questions methodology
- Inter-item (internal consistency) reliability: KR-20, r_tt = .71
- Mean score = 12.478
- Std Dev = 3.482
- Std Error = 0.513
- Mean grade = 62.4%
24. Inter-Item Reliability (PC11): Motivational Comparison
- 2 Groups
- Graded Embedded Questions
- Non-Graded Form with Motivational Speech
- Mundane Realism
25. Inter-Item Reliability (PC11): Motivational Comparison
- Graded condition produces higher scores (t(78) = 5.62, p < .001)
- Large effect size (d = 1.27)
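The comparison above rests on an independent-samples t statistic and Cohen's d (the mean difference expressed in pooled standard-deviation units). A sketch of both computations, using invented group scores rather than the actual PC11 data:

```python
# Hypothetical sketch: independent-samples t and Cohen's d for a
# graded vs. non-graded comparison. The group scores in the example
# are invented; the slide's reported values came from real class data.
from statistics import mean, stdev

def t_and_d(group1, group2):
    """Return (t statistic, Cohen's d) for two independent samples."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    s1, s2 = stdev(group1), stdev(group2)
    # Pooled standard deviation across both groups.
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    d = (m1 - m2) / pooled                           # effect size
    t = (m1 - m2) / (pooled * (1/n1 + 1/n2) ** 0.5)  # test statistic
    return t, d
```

By the usual benchmarks, d around 0.8 or larger counts as a large effect, which is why the reported d = 1.27 is highlighted.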
26. Inter-Item Reliability (PC11): Motivational Comparison
- Minimum competency: 70% or better
- Graded condition produces greater competency (Z = 5.69, p < .001)
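The competency comparison is a test of two proportions, which can be sketched as a pooled two-proportion z statistic. The pass counts below are invented; only the formula mirrors the analysis reported above:

```python
# Hypothetical sketch: two-proportion z-test for the share of students
# reaching the competency cutoff in each condition. Counts are invented.
def two_proportion_z(x1, n1, x2, n2):
    """x = number reaching competency, n = group size."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                  # pooled proportion
    se = (p * (1 - p) * (1/n1 + 1/n2)) ** 0.5  # pooled standard error
    return (p1 - p2) / se
```

A z value near 5.7, as reported, is far beyond the 1.96 cutoff for p < .05, so the difference in competency rates is very unlikely to be chance.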
27. Inter-Item Reliability (PC11): Motivational Comparison
- In the non-graded condition this measure is neither reliable nor valid
- KR-20 (non-graded) = 0.29
28. Criterion-Related Concurrent Validity (PC11) [correlation results not reproduced]
29. "I am ill at these numbers." -- Hamlet
30. "When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind." -- Lord Kelvin

"There are three kinds of lies: lies, damned lies, and statistics." -- Benjamin Disraeli