Title: Creating Valid and Reliable Classroom Tests
1 Creating Valid and Reliable Classroom Tests
- James A. Wollack, PhD
- John Siegler, PhD
- Taehoon Kang
- Craig S. Wells
- Testing & Evaluation Services
2 Creating Valid and Reliable Classroom Tests: Session IV, Evaluating the Test
- Recap of Session III
- Item analysis overview
- Requesting TE analyses
- Completing the SRF form
- Explanation of output
- Item analysis and item revision exercise
- Question & Answer Session
- Workshop Evaluation
3 Recap of Session III: Writing Essay and Short-Answer Tests
- Rules for Writing Constructed-Response Items
- Scoring Considerations
- Developing Scoring Rubrics
- Group Exercise: Developing a Scoring Rubric
- Question & Answer Session
4 The Testing Cycle
- Typical classroom testing:
- Item Development → Test Administration → Scoring
5 The Testing Cycle
- Better classroom testing:
- Test Blueprint → Item Development → Test Administration → Scoring
6 The Testing Cycle
- Ideal model for classroom testing
- Test data should inform you about the appropriateness of the content and the effectiveness of the individual items in future exams.
- Students in your classes change, but assessment is ongoing.
7 Item Evaluation
- People spend a lot of time developing items, but too often don't analyze how well the items worked.
- Administering the test provides lots of data that can be used to study items.
- Item analysis
- Provides a breakdown of how different types of students performed on various aspects of each item.
- Particularly useful for multiple-choice items.
8 Item Analysis Overview
- Item analysis can help answer the following questions:
- How hard is this item?
- How well does performance on this item predict overall achievement level?
- Are students finding the item distractors attractive?
- Is the item confusing?
- Does the item have more than one right answer?
- For what type of student is this item ideal?
- Is the timing of the test appropriate?
9 Sample Item Analysis for One Item
[Plot on left: PERCENT RESPONDING CORRECTLY BY QUINTILE, one point per quintile group (5TH–1ST) on a 0–100 scale]

MATRIX RESPONDING BY QUINTILE

          A      B      C      D      E      O      M
5TH       9      2      2      3      0      0      0
4TH       7      1      6      3      0      0      0
3RD       4      2      7      3      0      0      0
2ND       2      6      7      2      0      0      0
1ST       7      4      3      1      0      0      1
PROP   0.35   0.18   0.30   0.15   0.00   0.00   0.01
RPBI   0.18  -0.21  -0.07   0.11   0.00   0.00  -0.09

- The item analysis (IA) contains two parts:
- the picture on the left
- the matrix of numbers on the right
10 Left Hand Side of Item Analysis
[Plot: PERCENT RESPONDING CORRECTLY BY QUINTILE; quintile groups 5TH–1ST on the y-axis, 0–100 on the x-axis]
- Students are divided into quintile groups based on total score.
- Top quintile (5th) includes the top 20% of the students.
- 4th quintile includes students in the 61st–80th percentiles.
- 1st quintile includes students in the 1st–20th percentiles.
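The quintile grouping above can be sketched in a few lines, assuming all we have is each student's total test score. The function and variable names here are illustrative, not taken from TE's scoring software, and ties in total score are broken arbitrarily by sort order.

```python
def quintile_groups(total_scores):
    """Return a list of quintile labels (1 = bottom 20%, 5 = top 20%),
    aligned with the input order of total_scores."""
    n = len(total_scores)
    # Rank students from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # rank 0..n-1 maps onto quintiles 1..5; ties split arbitrarily.
        labels[i] = rank * 5 // n + 1
    return labels

# Ten hypothetical total scores.
scores = [12, 25, 18, 30, 22, 15, 28, 20, 17, 26]
print(quintile_groups(scores))  # → [1, 4, 2, 5, 3, 1, 5, 3, 2, 4]
```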
11 Left Hand Side of Item Analysis
[Plot: PERCENT RESPONDING CORRECTLY BY QUINTILE; quintile groups 5TH–1ST on the y-axis, 0–100 on the x-axis]
- Shows the percentage of students in each quintile group answering the item correctly.
- Ideally these points will form a straight line with a relatively steep slope, i.e., large jumps in % correct for each unit increase in quintile.
- Picture is often not clean, particularly with fewer than 100 examinees.
- At a minimum, the picture should have a positive slope.
- The picture is a heuristic device; use it cautiously.
12 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Students are again divided into quintile groups
based on total score
13 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
A–E correspond to the item alternatives. O = omits (i.e., item not answered). M = multiple (i.e., more than one answer selected).
14 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Indicates the number of students in each quintile group who selected each item alternative. For example, 6 students in the 4th quintile selected alternative C.
We want to see the numbers decreasing from the 5th to the 1st quintile for the key, and increasing from the 5th to the 1st quintile for the distractors.
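The counts above can be tallied directly once each student has a quintile label and a recorded response. This is a hypothetical sketch; `response_matrix` and the sample input lists are illustrative, not part of TE's output format.

```python
def response_matrix(quintiles, responses):
    """Return {quintile: {alternative: count}} for columns A-E, O, M."""
    cols = ['A', 'B', 'C', 'D', 'E', 'O', 'M']
    matrix = {q: {c: 0 for c in cols} for q in range(1, 6)}
    # Each (quintile, response) pair increments one cell of the matrix.
    for q, r in zip(quintiles, responses):
        matrix[q][r] += 1
    return matrix

# Seven hypothetical students: quintile label and chosen alternative
# ('O' = omitted the item, 'M' = marked more than one answer).
quintiles = [5, 5, 4, 3, 2, 1, 1]
responses = ['A', 'B', 'C', 'A', 'B', 'A', 'O']
m = response_matrix(quintiles, responses)
print(m[5]['A'], m[1]['O'])  # counts for top-quintile 'A' and bottom-quintile omits
```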
15 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Short for Proportion. Indicates the proportion of all students selecting the alternative in each column.
The PROP for the correct answer (shown in brackets on the output) is referred to as the item difficulty; the PROPs for the incorrect answers are called distractor difficulties.
16 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Item difficulties range from 0.00 to 1.00. Hard items have difficulties below 0.35; easy items have difficulties above 0.85. Items that are too hard or too easy will not contribute much to the test's reliability.
17 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Short for Point-Biserial Correlation. Indicates the correlation between a student's score on the item alternative (1 = selected, 0 = not selected) and their total score on the test.
The RPBI for the correct answer is referred to as the item discrimination; the RPBIs for the incorrect answers are called distractor discriminations.
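Since the RPBI is just the Pearson correlation between a 0/1 "selected this alternative" indicator and total score, it can be computed directly. This is a sketch under that definition; operational scoring software may additionally remove the item itself from the total score before correlating, which this version does not do.

```python
import math

def point_biserial(selected, total_scores):
    """Pearson correlation between a 0/1 selection indicator and total score.
    selected: list of 0/1 flags; total_scores: parallel list of scores."""
    n = len(selected)
    mx = sum(selected) / n
    my = sum(total_scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(selected, total_scores))
    sx = math.sqrt(sum((x - mx) ** 2 for x in selected))
    sy = math.sqrt(sum((y - my) ** 2 for y in total_scores))
    return cov / (sx * sy)

# Hypothetical data: students who chose the key (1) tend to score
# higher on the test, so the RPBI comes out strongly positive.
chose_key = [1, 1, 1, 0, 0, 0]
totals    = [30, 28, 25, 20, 18, 15]
print(round(point_biserial(chose_key, totals), 2))  # → 0.92
```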
18 Item Discrimination
- Ranges from -1.0 to 1.0
- Interpreting the sign
- Positive values mean that students who selected the alternative tended to have high scores and students who did not select it tended to have low scores.
- The RPBI for the key (i.e., the item discrimination) should be positive.
- Negative values mean that students who selected the alternative tended to have low scores and students who did not select it tended to have high scores.
- The RPBI for the distractors should be negative.
- Values near zero mean that there is no relationship between that item alternative and total score.
19 Item Discrimination
- Ranges from -1.0 to 1.0
- Interpreting the magnitude
- Values of 1.0 (or -1.0) mean that there is a perfect linear relationship between selecting the alternative and total score.
- This will never happen in practice.
- On classroom tests, discriminations rarely get above .65 in absolute magnitude.
- The higher the value, the better that choice is able to discriminate between strong and weak students.
20 What Are We Looking For in an Item?
- Item Difficulty
- Ideally, should be between .35 and .85
- Items that are too easy or too hard will often not discriminate well
- Distractor Difficulties
- Should be at least .02
- Item Discrimination
- At least 0.20 for classroom exams
- Higher is better
- .30 or higher for standardized measures
- Distractor Discriminations
- All should be negative
- The more negative, the better
- The larger the distractor difficulty, the stronger (more negative) the distractor discrimination should be
- RPBI = -0.05 with PROP = 0.08: OK
- RPBI = -0.05 with PROP = 0.25: problem with the alternative
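The screening rules above can be expressed as a simple checklist function. This is a sketch using the workshop's thresholds; the function name and warning strings are illustrative, and the "too weak given the PROP" rule is simplified here to a plain sign check on each distractor's RPBI.

```python
def flag_item(prop, rpbi, key):
    """prop, rpbi: dicts keyed by alternative; key: the correct alternative.
    Returns a list of warning strings (an empty list means no flags)."""
    flags = []
    # Difficulty should fall in the .35-.85 window.
    if not 0.35 <= prop[key] <= 0.85:
        flags.append('difficulty %.2f outside .35-.85' % prop[key])
    # Item discrimination should be at least .20 on classroom exams.
    if rpbi[key] < 0.20:
        flags.append('item discrimination %.2f below .20' % rpbi[key])
    # Distractors chosen by a nontrivial share of students (PROP >= .02)
    # should have negative RPBIs.
    for alt in prop:
        if alt == key:
            continue
        if prop[alt] >= 0.02 and rpbi[alt] >= 0:
            flags.append('distractor %s has non-negative RPBI' % alt)
    return flags

# Values from the sample item analysis (key = A).
prop = {'A': 0.35, 'B': 0.18, 'C': 0.30, 'D': 0.15}
rpbi = {'A': 0.18, 'B': -0.21, 'C': -0.07, 'D': 0.11}
for f in flag_item(prop, rpbi, 'A'):
    print(f)
```

Consistent with the revision decision later in the session, this flags the weak item discrimination and the positively discriminating distractor (D); catching a distractor like (C), whose RPBI is negative but too weak for its PROP, would need a magnitude rule rather than a sign check.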
21 Using Item Analyses to Guide Item Revision
- Items with negative or low positive RPBIs should be either revised or deleted from the item bank.
- To understand how to revise, if at all, look at the distractor characteristics.
- Distractors with RPBIs that are positive, or negative but too weak considering the PROP, should be replaced.
- Consider replacing distractors that are selected by too many or too few people.
- Don't change a distractor if the rest of the item is working well.
- For an item to be revised successfully, it is often necessary to have at least one solid distractor that will not be changed.
- If all distractors are poor, or none is particularly strong, delete the item and write a brand new one.
- Change only the pieces of the item that caused problems.
- If an item fails, is revised, and fails again, delete it and write a new item.
22 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
- Item discrimination (0.18) is lower than desired.
- Item is pretty hard (difficulty = 0.35).
- Alternative (D) has a positive discrimination.
- Alternative (C) has a low discrimination, given its difficulty.
- Alternative (B) is working very well.
- Revision decision: certainly replace (D); consider replacing (C) also.
23 Requesting Item Analyses and Test Scoring
- Testing & Evaluation Services
- 373 Educational Sciences Bldg.
- Pick up scannable answer sheets before testing
- Requesting Output
- The Service Request Form (SRF) describes the nature of your data and the types of output you want.
24 Review of Item Analysis for Workshop Test
- Divide into 4 groups:
- A–Er with Jim in front
- Es–J with John in middle
- K–R with Taehoon in back by the entry
- S–Z with Craig in back on the other side
25 Questions?
26 Thanks for Coming and Participating
- Workshop scheduled to run again in October
- Thanks to the UW Teaching Academy
27 Please Complete a Workshop Evaluation Form