Title: Creating Valid and Reliable Classroom Tests
1 Creating Valid and Reliable Classroom Tests
- James A. Wollack, PhD
- John Siegler, PhD
- Taehoon Kang
- Craig S. Wells
- Testing & Evaluation Services
2 Creating Valid and Reliable Classroom Tests: Session IV, Evaluating the Test
- Recap of Session III
- Item analysis overview
- Requesting TE analyses
- Completing the SRF form
- Explanation of output
- Item analysis and item revision exercise
- Question & Answer Session
- Workshop Evaluation
3 Recap of Session III: Writing Essay and Short-Answer Tests
- Rules for Writing Constructed-Response Items
- Scoring Considerations
- Developing Scoring Rubrics
- Group Exercise: Developing a Scoring Rubric
- Question & Answer Session
4 The Testing Cycle
- Typical classroom testing:
- Item Development → Test Administration → Scoring
5 The Testing Cycle
- Better classroom testing:
- Test Blueprint → Item Development → Test Administration → Scoring
6 The Testing Cycle
- Ideal model for classroom testing
- Test data should inform you about the appropriateness of the content and the effectiveness of the individual items in future exams.
- Students in your classes change, but assessment is ongoing.
7 Item Evaluation
- People spend a lot of time developing items, but too often don't analyze how well the items worked.
- Administering the test provides lots of data that can be used to study items.
- Item analysis
- Provides a breakdown of how different types of students performed on various aspects of each item.
- Particularly useful for multiple-choice items.
8 Item Analysis Overview
- Item analysis can help answer the following questions:
- How hard is this item?
- How well does performance on this item predict overall achievement level?
- Are students finding the item distractors attractive?
- Is the item confusing?
- Does the item have more than one right answer?
- For what type of student is this item ideal?
- Is the timing of the test appropriate?
9 Sample Item Analysis for One Item
[Plot on left: PERCENT RESPONDING CORRECTLY BY QUINTILE, one point per quintile group (5TH–1ST) on a 0–100 scale]

MATRIX RESPONDING BY QUINTILE

          A      B      C      D      E      O      M
5TH       9      2      2      3      0      0      0
4TH       7      1      6      3      0      0      0
3RD       4      2      7      3      0      0      0
2ND       2      6      7      2      0      0      0
1ST       7      4      3      1      0      0      1
PROP   0.35   0.18   0.30   0.15   0.00   0.00   0.01
RPBI   0.18  -0.21  -0.07   0.11   0.00   0.00  -0.09

- The item analysis (IA) contains two parts:
- the picture on the left
- the matrix of numbers on the right
10 Left Hand Side of Item Analysis
[Plot: PERCENT RESPONDING CORRECTLY BY QUINTILE; quintile groups 5TH–1ST on the y-axis, 0–100 on the x-axis]
- Students are divided into quintile groups based on total score.
- Top quintile (5th) includes the top 20% of the students.
- 4th quintile includes students in the 61st–80th percentiles.
- 1st quintile includes students in the 1st–20th percentiles.
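The quintile grouping above can be sketched in a few lines, assuming all we have is each student's total test score. The function and variable names here are illustrative, not taken from TE's scoring software, and ties in total score are broken arbitrarily by sort order.

```python
def quintile_groups(total_scores):
    """Return a list of quintile labels (1 = bottom 20%, 5 = top 20%),
    aligned with the input order of total_scores."""
    n = len(total_scores)
    # Rank students from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # rank 0..n-1 maps onto quintiles 1..5; ties split arbitrarily.
        labels[i] = rank * 5 // n + 1
    return labels

# Ten hypothetical total scores.
scores = [12, 25, 18, 30, 22, 15, 28, 20, 17, 26]
print(quintile_groups(scores))  # → [1, 4, 2, 5, 3, 1, 5, 3, 2, 4]
```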
11 Left Hand Side of Item Analysis
[Plot: PERCENT RESPONDING CORRECTLY BY QUINTILE; quintile groups 5TH–1ST on the y-axis, 0–100 on the x-axis]
- Shows the percentage of students in each quintile group answering the item correctly.
- Ideally these points will form a straight line with a relatively steep slope, i.e., large jumps in % correct for each unit increase in quintile.
- Picture is often not clean, particularly with fewer than 100 examinees.
- At a minimum, the picture should have a positive slope.
- The picture is a heuristic device; use it cautiously.
12 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Students are again divided into quintile groups
based on total score
13 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
A–E correspond to the item alternatives. O = omits (i.e., item not answered). M = multiple (i.e., more than one answer selected).
14 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Indicates the number of students in each quintile group who selected each item alternative. For example, 6 students in the 4th quintile selected alternative C.
We want to see the numbers decreasing from the 5th to the 1st quintile for the key, and increasing from the 5th to the 1st quintile for the distractors.
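The counts above can be tallied directly once each student has a quintile label and a recorded response. This is a hypothetical sketch; `response_matrix` and the sample input lists are illustrative, not part of TE's output format.

```python
def response_matrix(quintiles, responses):
    """Return {quintile: {alternative: count}} for columns A-E, O, M."""
    cols = ['A', 'B', 'C', 'D', 'E', 'O', 'M']
    matrix = {q: {c: 0 for c in cols} for q in range(1, 6)}
    # Each (quintile, response) pair increments one cell of the matrix.
    for q, r in zip(quintiles, responses):
        matrix[q][r] += 1
    return matrix

# Seven hypothetical students: quintile label and chosen alternative
# ('O' = omitted the item, 'M' = marked more than one answer).
quintiles = [5, 5, 4, 3, 2, 1, 1]
responses = ['A', 'B', 'C', 'A', 'B', 'A', 'O']
m = response_matrix(quintiles, responses)
print(m[5]['A'], m[1]['O'])  # counts for top-quintile 'A' and bottom-quintile omits
```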
15 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Short for Proportion. Indicates the proportion of all students selecting the alternative in each column.
The PROP for the correct answer (shown in brackets on the output) is referred to as the item difficulty; the PROPs for the incorrect answers are called distractor difficulties.
16 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Item difficulties range from 0.00 to 1.00. Hard items have difficulties below 0.35; easy items have difficulties above 0.85. Items that are too hard or too easy will not contribute much to the test's reliability.
17 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
Short for Point-Biserial Correlation. Indicates the correlation between a student's score on the item alternative (1 = selected, 0 = not selected) and their total score on the test.
The RPBI for the correct answer is referred to as the item discrimination; the RPBIs for the incorrect answers are called distractor discriminations.
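Since the RPBI is just the Pearson correlation between a 0/1 "selected this alternative" indicator and total score, it can be computed directly. This is a sketch under that definition; operational scoring software may additionally remove the item itself from the total score before correlating, which this version does not do.

```python
import math

def point_biserial(selected, total_scores):
    """Pearson correlation between a 0/1 selection indicator and total score.
    selected: list of 0/1 flags; total_scores: parallel list of scores."""
    n = len(selected)
    mx = sum(selected) / n
    my = sum(total_scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(selected, total_scores))
    sx = math.sqrt(sum((x - mx) ** 2 for x in selected))
    sy = math.sqrt(sum((y - my) ** 2 for y in total_scores))
    return cov / (sx * sy)

# Hypothetical data: students who chose the key (1) tend to score
# higher on the test, so the RPBI comes out strongly positive.
chose_key = [1, 1, 1, 0, 0, 0]
totals    = [30, 28, 25, 20, 18, 15]
print(round(point_biserial(chose_key, totals), 2))  # → 0.92
```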
18 Item Discrimination
- Ranges from -1.0 to 1.0
- Interpreting the sign
- Positive values mean that students who selected the alternative tended to have high scores and students who did not select it tended to have low scores.
- The RPBI for the key (i.e., the item discrimination) should be positive.
- Negative values mean that students who selected the alternative tended to have low scores and students who did not select it tended to have high scores.
- The RPBI for the distractors should be negative.
- Values near zero mean that there is no relationship between that item alternative and total score.
19 Item Discrimination
- Ranges from -1.0 to 1.0
- Interpreting the magnitude
- Values of 1.0 (or -1.0) mean that there is a perfect linear relationship between selecting the alternative and total score.
- This will never happen in practice.
- On classroom tests, discriminations rarely get above .65 in absolute magnitude.
- The higher the value, the better that choice is able to discriminate between strong and weak students.
20 What Are We Looking For in an Item?
- Item Difficulty
- Ideally, should be between .35 and .85
- Items that are too easy or too hard will often not discriminate well
- Distractor Difficulties
- Should be at least .02
- Item Discrimination
- At least 0.20 for classroom exams
- Higher is better
- .30 or higher for standardized measures
- Distractor Discriminations
- All should be negative
- The more negative, the better
- The larger the distractor difficulty, the stronger (more negative) the distractor discrimination should be
- RPBI = -0.05 with PROP = 0.08: OK
- RPBI = -0.05 with PROP = 0.25: problem with the alternative
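The screening rules above can be expressed as a simple checklist function. This is a sketch using the workshop's thresholds; the function name and warning strings are illustrative, and the "too weak given the PROP" rule is simplified here to a plain sign check on each distractor's RPBI.

```python
def flag_item(prop, rpbi, key):
    """prop, rpbi: dicts keyed by alternative; key: the correct alternative.
    Returns a list of warning strings (an empty list means no flags)."""
    flags = []
    # Difficulty should fall in the .35-.85 window.
    if not 0.35 <= prop[key] <= 0.85:
        flags.append('difficulty %.2f outside .35-.85' % prop[key])
    # Item discrimination should be at least .20 on classroom exams.
    if rpbi[key] < 0.20:
        flags.append('item discrimination %.2f below .20' % rpbi[key])
    # Distractors chosen by a nontrivial share of students (PROP >= .02)
    # should have negative RPBIs.
    for alt in prop:
        if alt == key:
            continue
        if prop[alt] >= 0.02 and rpbi[alt] >= 0:
            flags.append('distractor %s has non-negative RPBI' % alt)
    return flags

# Values from the sample item analysis (key = A).
prop = {'A': 0.35, 'B': 0.18, 'C': 0.30, 'D': 0.15}
rpbi = {'A': 0.18, 'B': -0.21, 'C': -0.07, 'D': 0.11}
for f in flag_item(prop, rpbi, 'A'):
    print(f)
```

Consistent with the revision decision later in the session, this flags the weak item discrimination and the positively discriminating distractor (D); catching a distractor like (C), whose RPBI is negative but too weak for its PROP, would need a magnitude rule rather than a sign check.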
21 Using Item Analyses to Guide Item Revision
- Items with negative or low positive RPBIs should be either revised or deleted from the item bank.
- To understand how to revise, if at all, look at the distractor characteristics.
- Distractors with RPBIs that are positive, or negative but too weak considering the PROP, should be replaced.
- Consider replacing distractors that are selected by too many or too few people.
- Don't change a distractor if the rest of the item is working well.
- For an item to be revised successfully, it is often necessary to have at least one solid distractor that will not be changed.
- If all distractors are poor, or none is particularly strong, delete the item and write a brand new one.
- Change only the pieces of the item that caused problems.
- If an item fails, is revised, and fails again, delete it and write a new item.
22 Right Hand Side of Sample Item Analysis
- MATRIX RESPONDING BY QUINTILE
- A B C D E O M
- 5TH 9 2 2 3 0 0 0
- 4TH 7 1 6 3 0 0 0
- 3RD 4 2 7 3 0 0 0
- 2ND 2 6 7 2 0 0 0
- 1ST 7 4 3 1 0 0 1
- PROP 0.35 0.18 0.30 0.15 0.00 0.00 0.01
- RPBI 0.18 -0.21 -0.07 0.11 0.00 0.00 -0.09
- Item discrimination (0.18) is lower than desired.
- Item is pretty hard (difficulty = 0.35).
- Alternative (D) has a positive discrimination.
- Alternative (C) has a low discrimination, given its difficulty.
- Alternative (B) is working very well.
- Revision decision: certainly replace (D); consider replacing (C) also.
23 Requesting Item Analyses and Test Scoring
- Testing & Evaluation Services
- 373 Educational Sciences Bldg.
- Pick up scannable answer sheets before testing
- Requesting Output
- The Service Request Form (SRF) describes the nature of your data and the types of output you want.
24 Review of Item Analysis for Workshop Test
- Divide into 4 groups:
- A–Er with Jim in front
- Es–J with John in middle
- K–R with Taehoon in back by the entry
- S–Z with Craig in back on the other side
25 Questions?
26 Thanks for Coming and Participating
- Workshop scheduled to run again in October
- Thanks to the UW Teaching Academy
27 Please Complete a Workshop Evaluation Form