Title: Designing a Classroom Test
1. Designing a Classroom Test
- Anthony Paolo, PhD
- Director of Assessment & Evaluation
- Office of Medical Education
- Psychometrician for CTC
- Teaching & Learning Technologies
- September 2008
2. Content
- Purpose of classroom test
- Test blueprint specifications
- Item writing
- Assembling the test
- Item analysis
3. Purpose of a Classroom Test
- Establish a basis for assigning grades
- Determine how well each student has achieved course objectives
- Diagnose student problems
- Identify areas where instruction needs improvement
- Motivate students to study
- Communicate what material is important
4. Test Blueprint
- To ensure the test assesses what you want to measure
- To ensure the test assesses the level or depth of learning you want to measure
5. Bloom's Revised Cognitive Taxonomy
- Remembering & Understanding
- Remembering: retrieving, recognizing, and recalling relevant knowledge.
- Understanding: constructing meaning from information through interpreting, classifying, summarizing, inferring, and explaining.
- ITEM TYPES: MC, T/F, Matching, Short Answer
- Applying & Analyzing
- Applying: implementing a procedure or process.
- Analyzing: breaking material into constituent parts and determining how the parts relate to one another and to an overall structure or purpose through differentiating, organizing, and attributing.
- ITEM TYPES: MC, Short Answer, Problems, Essay
- Evaluating & Creating
- Evaluating: making judgments based on criteria and standards through checking and critiquing.
- Creating: putting elements together to form a coherent or functional whole; reorganizing elements into a new pattern or structure through generating, planning, or producing.
- ITEM TYPES: MC, Essay
6. Test Blueprint
7. Test Specifications
- To ensure the test covers the content and/or objectives in the proper proportions
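In practice, a specification table reduces to proportional arithmetic: multiply each content area's weight by the total item count. A minimal sketch, assuming hypothetical content areas and weights (none of these come from the slides):

```python
# Hypothetical content areas and emphasis weights (should sum to 1.0)
weights = {"Anatomy": 0.40, "Physiology": 0.35, "Pathology": 0.25}
total_items = 50

# Items allocated to each area in proportion to its weight
allocation = {area: round(w * total_items) for area, w in weights.items()}
print(allocation)   # rounding may need a manual tweak to hit the exact total
```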
8. Test Specifications
9. Item Writing - General Guidelines (1)
- Present a single, clearly defined problem that is based on a significant concept rather than trivial or esoteric ideas
- Use simple, precise, unambiguous wording
- Exclude extraneous or irrelevant information
- Eliminate any systematic pattern of answers that may allow guessing correctly
10. Item Writing - General Guidelines (2)
- Avoid cultural, racial, ethnic, or sexual bias.
- Avoid presupposed knowledge that favors one group over another (e.g., a "fly ball" item favors those who know baseball).
- Refrain from providing unnecessary clues to the correct answer.
- Avoid negatively phrased items (e.g., "except", "not").
- Arrange answers in alphabetical / numerical order.
11. Item Writing - General Guidelines (3)
- Avoid "None of the above" or "All of the above" type answers
- Avoid "Both A & B" or "Neither A nor B" type answers
12. Item Writing - The Correct Answer:
- Is longer
- Is more qualified or more general
- Uses familiar phraseology
- Is grammatically consistent with the item stem
- Is one of two similar statements
- Is one of two opposite statements
13. Item Writing - The Wrong Answer:
- Is usually the first or last option
- Contains extreme words (always, never, nonsense, etc.)
- Contains unexpected language or technical terms
- Contains flippant remarks or completely unreasonable statements
14. Item Writing - Grammatical Cues
15. Item Writing - Logical Cues
16. Item Writing - Absolute Terms
17. Item Writing - Word Repeats
18. Item Writing - Vague Terms
19. Item Writing - Vague Terms
20. Item Writing
- Effective test items match the desired depth of learning as directly as possible.
- Applying & Analyzing
- Applying: implementing a procedure or process.
- Analyzing: breaking material into constituent parts and determining how the parts relate to one another and to an overall structure or purpose through differentiating, organizing, and attributing.
- ITEM TYPES: MC, Short Answer, Problems, Essay
21. Comparison of MC & Essay (1)
22. Comparison of MC & Essay (2)
23. Item Writing - Application
- MC application-of-knowledge items tend to have long vignettes that require decisions.
- Case et al. at the NBME investigated the impact on item performance of increasing the levels of interpretation, analysis, and synthesis required to answer a question. (Academic Medicine, 1996;71:528-530)
24. Item Writing - Application
25. Item Writing - Application
26. Item Writing - Application
27. Preparing & Assembling the Test
- Provide general directions
- Time allowed (allow enough time to complete the test)
- How items are scored
- How to record answers
- How to record name / ID
- Arrange items systematically
- Provide adequate space for short-answer and essay responses
- Placement of easier & harder items
28. Interpreting Test Scores
- Teachers:
- High scores = good instruction
- Low scores = poor students
- Students:
- High scores = smart, well-prepared
- Low scores = poor teaching, bad test
29. Interpreting Test Scores
- High scores:
- Test too easy, only measured simple educational objectives, biased scoring, cheating, unintentional clues to right answers
- Low scores:
- Test too hard, tricky questions, content not covered in class, grader bias, insufficient time to complete the test
30. Item Analysis
- The main purpose of item analysis is to improve the test
- Analyze items to identify:
- Potential mistakes in scoring
- Ambiguous / tricky items
- Alternatives that do not work well
- Problems with time limits
31. Reliability
- The reliability of a test refers to the extent to which it is likely to produce consistent results.
- Test-Retest
- Split-Half
- Internal consistency
- Reliability coefficients range from 0 (no reliability) to 1 (perfect reliability)
- Internal consistency is usually measured by Kuder-Richardson 20 (KR-20) or Cronbach's coefficient alpha
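For illustration, KR-20 can be computed directly from a 0/1 score matrix using the standard formula KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores). A minimal sketch, assuming a made-up five-student, five-item matrix (nothing here comes from the slides beyond the formula itself):

```python
import numpy as np

# Hypothetical data: rows = students, columns = items (1 = correct)
scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
])

k = scores.shape[1]                          # number of items
p = scores.mean(axis=0)                      # proportion correct per item
q = 1 - p                                    # proportion incorrect per item
var_total = scores.sum(axis=1).var(ddof=1)   # variance of students' total
                                             # scores (conventions on ddof vary)

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var_total)
print(f"KR-20 = {kr20:.2f}")                 # 0.55 for this toy data
```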
32. Internal Consistency Reliability
- High reliability means that the questions on the test tended to hang together: students who answered a given question correctly were more likely to answer other questions correctly.
- Low reliability means that the questions tended to be unrelated to each other in terms of who answered them correctly.
33. Reliability Coefficient Interpretation
- General guidelines for homogeneous tests:
- .80 and above: Very good reliability
- .70 to .80: Good reliability; a few test items may need to be improved
- .50 to .70: Somewhat low; several items will likely need improvement (unless the test is short, 15 or fewer items)
- .50 and below: Questionable reliability; the test likely needs revision
34. Item Difficulty (1)
- Proportion of students who got the item correct (ranges from 0 to 100)
- Helps evaluate whether an item is suited to the level of examinee being tested.
- Very easy or very hard items cannot adequately discriminate between student performance levels.
- The spread of student scores is maximized with items of moderate difficulty.
35. Item Difficulty (2)
- Moderate item difficulty is the point halfway between a perfect score and a chance score.
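To make that rule concrete: the chance score on a k-option MC item is 100/k percent, so the moderate-difficulty target is (100 + 100/k) / 2. A small sketch of the arithmetic (the option counts chosen are just examples):

```python
def moderate_difficulty(n_options: int) -> float:
    """Halfway point between a chance score and a perfect score (in %)."""
    chance = 100.0 / n_options           # expected % correct by guessing alone
    return (100.0 + chance) / 2.0

for k in (2, 4, 5):
    print(f"{k}-option item: target difficulty = {moderate_difficulty(k):.1f}%")
# 2-option: 75.0%   4-option: 62.5%   5-option: 60.0%
```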
36. Item Discrimination (1)
- How well the item separates those who know the material from those who do not.
- In LXR, measured by the point-biserial (rpb) correlation (ranges from -1 to 1).
- rpb is the correlation between item and exam performance.
37. Item Discrimination (2)
- A positive rpb means that those scoring higher on the exam were more likely to answer the item correctly (better discrimination).
- A negative rpb means that high scorers on the exam answered the item wrong more frequently than low scorers (poor discrimination).
- A desirable rpb correlation is 0.20 or higher.
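Because the point-biserial is simply a Pearson correlation between a 0/1 item score and the total exam score, it is easy to verify by hand. A minimal sketch with invented response data (note that some programs correlate against a corrected total that excludes the item itself):

```python
import numpy as np

item = np.array([1, 1, 0, 1, 0, 1, 0, 0])           # 1 = answered correctly
exam = np.array([92, 85, 60, 78, 55, 88, 70, 48])   # total exam scores

rpb = np.corrcoef(item, exam)[0, 1]                 # point-biserial correlation
print(f"rpb = {rpb:.2f}")                           # >= 0.20 is desirable
```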
38. Evaluation of Distractors
- Distractors are designed to fool those who do not know the material; those who do not know the answer guess among the choices.
- Distractors should be equally popular.
- (% expected = % who answered the item wrong / # of distractors)
- Distractors ideally have a low or negative rpb.
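A quick way to apply that check: split the wrong answers evenly across the distractors and compare each distractor's observed count to that expectation. The response counts below are invented for illustration:

```python
counts = {"A": 6, "B": 40, "C": 10, "D": 4}   # hypothetical response counts
key = "B"                                     # keyed correct answer

n_wrong = sum(n for opt, n in counts.items() if opt != key)
expected = n_wrong / (len(counts) - 1)        # equal share per distractor

for opt, n in counts.items():
    if opt != key:
        print(f"Distractor {opt}: chosen {n}, expected ~{expected:.1f}")
```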
39. LXR Example 1 (* = correct answer)
Very easy item; would probably review the alternatives to make sure they are not ambiguous and do not provide clues that they are wrong.
40. LXR Example 2 (* = correct answer)
Three of the alternatives are not functioning well; would review them.
41. LXR Example 3 (* = correct answer)
Probably a miskeyed item; the correct answer is likely option E.
42. LXR Example 4 (* = correct answer)
Relatively hard item with good discrimination. Would review alternatives C & D to see why they attract a relatively low / high number of students.
43. LXR Example 5 (* = correct answer)
Poor discrimination for correct choice B. Choice E actually does a better job of discriminating. Would review the item for proper keying, ambiguous wording, proper wording of alternatives, etc. This item needs revision.
44. Resources
- Constructing Written Test Questions for the Basic and Clinical Sciences (www.nbme.org)
- How to Prepare Better Multiple-Choice Test Items: Guidelines for University Faculty, Brigham Young University (testing.byu.edu/info/handbooks/betteritems.pdf)
45. Thank you for your time. Questions?