TEST WRITING
Transcript and Presenter's Notes
1
TEST WRITING
  • May 6, 2006
  • Monica Geist
  • FRCC Westminster Math Department

2
Feedback
  • For you, what is the most difficult part of
    writing exams?

3
  • Writing good test items is a learnable skill.
  • (Thorndike, 1997)

4
Outline
  • Fundamental Concepts
  • Test Writing
  • Coverage
  • Writing Items
  • Item Analysis
  • Reliability
  • Validity

5
Fundamentals
  • Math ability is a latent trait
  • (latent = hidden, unobservable)

6
Fundamentals, cont
  • Psychometrics is the field of study concerned
    with the theory
    and technique of psychological measurement, which
    includes the measurement of knowledge, abilities,
    attitudes, and personality traits. The field is
    primarily concerned with the study of differences
    between individuals. It involves two major
    research tasks, namely (i) the construction of
    instruments and procedures for measurement and
    (ii) the development and refinement of
    theoretical approaches to measurement.

7
The GOAL
  • The goal is to assign a test score that best
    reflects the student's ABILITY.

8
Test Writing
  • Coverage
  • The entire scope of what the test should cover is
    called the DOMAIN of the test.
  • Start with a test BLUEPRINT.
  • A blueprint outlines what content, concepts, and
    ideas should be on the test, as well as the
    proportion of the test that should be
    skill/drill, conceptual, application, etc.
  • See handouts
  • Ideally, the blueprints should stay the same each
    semester, while the items can change each
    semester.

9
Item Writing
  • Items are developed according to the blueprint.
  • Be clear in your mind that you are asking the
    question that will measure the knowledge you are
    testing.
  • This will affect validity (more on that later).

10
(No Transcript)
11
Writing items, cont
  • Think about the reading level of your students.
  • Do the words distract from the skill you want to
    assess?
  • Be mindful of international students
  • i.e. be careful of using slang

12
Item Analysis
  • Item analysis provides information useful for
    improving the quality and accuracy of test items
  • There are complicated ways of examining each test
    item as well as home-made quick-and-dirty
    systems.

13
Methods Psychometricians Use
  • Item Response Theory (IRT)
  • Rasch Modeling
  • Classical Test Theory (CTT)

14
Item Response Theory (IRT)
  • Most thorough and complicated
  • Ability
  • Item Difficulty
  • Item Discrimination
  • Guessing
  • N = thousands and thousands
  • Testing companies use it (ACT, SAT, GRE)
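
The parameters listed on this slide (ability, item difficulty, item discrimination, and guessing) are the ingredients of the standard three-parameter logistic (3PL) model. The sketch below is an illustration rather than anything from the presentation; theta, a, b, and c are the conventional IRT symbols for ability, discrimination, difficulty, and the guessing floor.

    import numpy as np

    def p_correct_3pl(theta, a, b, c):
        # Probability of a correct response under the 3PL IRT model:
        # theta = examinee ability, a = item discrimination,
        # b = item difficulty, c = guessing (lower asymptote)
        return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

    # Example: an item of average difficulty (b = 0), moderate
    # discrimination (a = 1.2), and a 20% guessing floor
    abilities = np.linspace(-3, 3, 7)
    print(p_correct_3pl(abilities, a=1.2, b=0.0, c=0.2))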

15
Rasch Modeling
  • A special case of IRT
  • Assumes equal item discrimination within a test
  • Assumes no guessing
  • Assumes item difficulty is the only item
    characteristic influencing performance
  • N does not have to be as large as IRT requires,
    but still in the hundreds
  • Note: IRT and Rasch folks disagree on approach.
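
For reference (this formula is not on the slide, but it is the standard form of the model), the Rasch model gives the probability that student i answers item j correctly using only the ability theta_i and the item difficulty b_j:

    P(X_{ij} = 1) = \frac{e^{\theta_i - b_j}}{1 + e^{\theta_i - b_j}}

This is the 3PL sketch above with every discrimination fixed at a = 1 and guessing at c = 0, matching the slide's assumptions.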

16
Classical Test Theory (CTT)
  • N can be small (classroom size)
  • X = T + E
  • X = the student's observed score
  • T = the student's true score
  • E = error (measurement error, or otherwise)

17
CTT
  • When thinking in CTT terms, we want to minimize
    the error, E, so that the observed score is as
    close as possible to the true score. Recall
    X = T + E.

18
Our Goal
  • Use the principles of IRT, Rasch, and CTT to
    write the best possible test: the one that
    measures ability the best.

19
Characteristics of the best test
  • Items should have varying degrees of difficulty
  • Items should have varying degrees of
    discrimination
  • Items that do not contribute to a total score of
    ABILITY should not be included
  • i.e. extra credit that asks "How did you like the
    test?"

20
Need a system
  • Have a system to determine if items are working,
    from
  • The student's perspective
  • Your perspective
  • The Classical Test Theory perspective

21
From the STUDENT'S perspective
  • Quick-and-Dirty
  • Develop a system from the student perspective to
    see if items work.
  • I use
  • ? means I got it right. I'm confident.
  • ? means I got it wrong, but it's because I
    didn't study or I didn't come to class or
    whatever
  • ? means I don't know how to do it.
  • W means I don't understand the wording. What are
    you asking?

22
Example of W question
  • Consider the equation 5(x - 2)(x + 3) = 0.
  • The left side has three factors, but the equation
    has only two solutions. Why?

23
From YOUR perspective
  • Quick-and-Dirty
  • Calculate item difficulty
  • Item difficulty is computed by finding the
    percent of examinees who answered the item
    correctly (CTT)
  • Should not have too many items that no one got
    correct
  • Should not have too many items that everyone got
    correct
  • Should be a variety of difficulty levels
  • Should have questions for different level
    students
  • i.e. A questions, B questions, C questions
  • One way: 70% C questions, 10% B questions, 10% A
    questions
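
A quick way to compute the CTT difficulty described above is to average each column of a 0/1 score matrix. The sketch below uses made-up data purely for illustration.

    import numpy as np

    # Hypothetical scored responses: rows = students, columns = items,
    # 1 = correct, 0 = incorrect
    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 0],
        [0, 1, 0, 1],
    ])

    # CTT item difficulty: proportion of examinees who answered each
    # item correctly (a higher value means an easier item)
    difficulty = responses.mean(axis=0)
    print(difficulty)  # [0.75 0.75 0.25 0.75]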

24
From a CTT perspective
  • Classical Test Theory Item Analysis (if we have
    time, look at the handout)
  • Simplified Procedures
  • Looking at the distribution
  • Separating the lower and upper sections of the
    group
  • More Formal Item Analysis Procedures
  • Discrimination Index
  • Item-Total Score Correlation
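
As a rough sketch of the two more formal procedures named above (again with made-up data, not the handout's), the discrimination index compares upper and lower scorers on each item, and the item-total correlation relates each item to the rest of the test.

    import numpy as np

    # Hypothetical 0/1 scores: rows = students, columns = items
    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 1],
    ])
    total = responses.sum(axis=1)

    # Discrimination index: proportion correct among high scorers minus
    # proportion correct among low scorers (simple half-and-half split)
    order = np.argsort(total)
    half = len(total) // 2
    lower, upper = responses[order[:half]], responses[order[-half:]]
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)

    # Corrected item-total correlation: each item against the total
    # score with that item removed
    item_total_r = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])
    print(discrimination, item_total_r)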

25
Reliability and Validity
  • Reliability
  • Related to the stability or consistency of the
    test scores
  • Does the test consistently measure ability?
  • If you could give a student the test, then erase
    that testing experience, and test again, would
    they get the same score?
  • Validity
  • Related to the meaning of the test score
  • Does the test measure what we think it's
    measuring?
  • How appropriate is the inference based on test
    score?

26
Reliability
  • Reliability is the extent to which random sources
    of measurement error are minimized
  • Recall X = T + E
  • A reliable measure reflects primarily true score
    variance and little error variance.
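
Written out (a standard CTT identity rather than something shown on the slide), with X = T + E and the error uncorrelated with the true score, reliability is the fraction of observed-score variance that is true-score variance:

    \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}

so shrinking the error variance pushes reliability toward 1.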

27
Reliability, cont
  • Things that affect reliability
  • Test length
  • Questions that don't measure knowledge
  • The myriad factors, unrelated to the knowledge
    being tested, that influence performance
  • Physical
  • Headache
  • Hunger
  • Room temperature
  • Verbal directions/hints before the test
  • You want to administer the test in a consistent
    way.
  • Grading
  • Have you experienced grading drift?

28
Reliability, cont
  • What can we do to help reliability?
  • Test length
  • The longer the test, the more reliable.
  • Think repeatability!
  • The more times we ask students to perform a
    skill, the more reliable the score
  • Break questions down into parts
  • Ask about each part of the bigger skill
  • Then ask about the bigger skill
  • You'll know which parts they can and cannot do.
  • (this is related to validity, as well)
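
The test-length point can be quantified with the Spearman-Brown prophecy formula, a standard CTT result not given on the slide: if a test with reliability rho is lengthened by a factor k using comparable items, the predicted reliability is

    \rho_k = \frac{k\rho}{1 + (k - 1)\rho}

For example, doubling a test (k = 2) whose reliability is 0.6 predicts 2(0.6) / (1 + 0.6) = 0.75.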

29
What you can do, cont
  • Encourage students to get enough sleep the night
    before
  • Remind them cramming doesn't work
  • Administer the test in the same way across
    sections
  • Don't give hints to one class and not the other
  • A lot of things are not in your control
  • Noise, temperature, illness, etc.

30
Validity
  • Validity answers the question of MEANING.
  • Are we testing what we think we're testing?
  • Does this question get at the skill that I am
    trying to test?
  • What does the test score mean?

31
Validity, cont
  • Validity is the adequacy and appropriateness of
    inferences and actions based on test scores
    (Messick, 1988)

32
Validity is NOT
  • Validity is NOT a characteristic of any given
    test
  • No test will be valid for all purposes or for all
    people
  • We can only say a test is valid for College
    Algebra students at the end of the semester
  • We can never PROVE a test's validity
  • We can only provide evidence.

33
Broad types of validity evidence
  • Content-related evidence
  • Appropriate breadth and depth of content
  • Refer to your test blueprint
  • The minimum level of validity
  • Criterion-related evidence
  • Predictive
  • Does the test predict some future behavior?
  • Concurrent
  • Does the test score correlate with another
    criterion measured by a different test or
    measurement? (illustrated below)
  • Construct-related evidence
  • Statistically look at the structure of the test
    (FA = factor analysis, SEM = structural equation
    modeling)
  • Need large N
  • We won't be able to do this for our classroom
    tests.
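
As a small illustration of the concurrent-evidence idea above (hypothetical numbers, not from the presentation), criterion-related evidence is often summarized as a correlation between the test scores and scores on another measure collected at about the same time.

    import numpy as np

    # Hypothetical final-exam scores and scores on a separate placement
    # test taken the same week (made-up data)
    exam_scores      = np.array([62, 71, 78, 80, 85, 90, 93])
    placement_scores = np.array([55, 60, 72, 70, 80, 88, 86])

    # Concurrent validity coefficient: Pearson correlation between
    # the two measures
    r = np.corrcoef(exam_scores, placement_scores)[0, 1]
    print(round(r, 2))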

34
Validity
  • What can we do to help with validity?
  • Stick to our blueprint
  • Throw out bad items
  • ASK yourself
  • Does this score have meaning? What does it mean?
  • Does this score predict how the student will
    likely do on future tests? In future classes?
  • Have I eliminated as much error as possible from
    the observed score? Recall X = T + E.
  • Break concepts down into smaller parts. Test
    those. ALSO test the whole process.

35
Reliability and Validity
  • We need BOTH reliability and validity
  • Can have a reliable test that measures the wrong
    thing. It will be reliable, but not valid.
  • Cannot have a valid test without reliability.

36
MAT 090 Example
  • See MAT 090 test
  • While creating your tests, you should keep in
    mind
  • Math knowledge is a latent trait
  • Coverage (blueprint)
  • Reducing error in the CTT model
  • X = T + E
  • Reliability issues
  • Validity issues

37
Remember
  • The goal is to assign a test score that best
    reflects the student's ABILITY.
  • We want to write the best possible test: the one
    that measures ability the best.

38
  • Q & A