TEST WRITING
Transcript and Presenter's Notes
1
TEST WRITING
  • May 6, 2006
  • Monica Geist
  • FRCC Westminster Math Department

2
Feedback
  • For you, what is the most difficult part of
    writing exams?

3
  • Writing good test items is a learnable skill.
  • (Thorndike, 1997)

4
Outline
  • Fundamental Concepts
  • Test Writing
  • Coverage
  • Writing Items
  • Item Analysis
  • Reliability
  • Validity

5
Fundamentals
  • Math ability is a latent trait
  • (latent = hidden, unobservable)

6
Fundamentals, cont
  • Psychometrics is the field of study concerned
    with the theory
    and technique of psychological measurement, which
    includes the measurement of knowledge, abilities,
    attitudes, and personality traits. The field is
    primarily concerned with the study of differences
    between individuals. It involves two major
    research tasks, namely (i) the construction of
    instruments and procedures for measurement and
    (ii) the development and refinement of
    theoretical approaches to measurement.

7
The GOAL
  • The goal is to assign a test score that best
    reflects the student's ABILITY.

8
Test Writing
  • Coverage
  • The entire scope of what the test should cover is
    called the DOMAIN of the test.
  • Start with a test BLUEPRINT.
  • A blueprint outlines what content, concepts, and
    ideas should be on the test, as well as the
    proportion of the test that should be
    skill/drill, conceptual, application, etc.
  • See handouts
  • Ideally, the blueprints should stay the same each
    semester, while the items can change each
    semester.

9
Item Writing
  • Items are developed according to the blueprint.
  • Be clear in your mind that you are asking the
    question that will measure the knowledge you are
    testing.
  • This will affect validity (more on that later).

10
(No Transcript)
11
Writing items, cont
  • Think about the reading level of your students.
  • Do the words distract from the skill you want to
    assess?
  • Be mindful of international students
  • i.e. be careful of using slang

12
Item Analysis
  • Item analysis provides information useful for
    improving the quality and accuracy of test items
  • There are complicated ways of examining each test
    item as well as home-made quick-and-dirty
    systems.

13
Methods Psychometricians Use
  • Item Response Theory (IRT)
  • Rasch Modeling
  • Classical Test Theory (CTT)

14
Item Response Theory (IRT)
  • Most thorough and complicated
  • Ability
  • Item Difficulty
  • Item Discrimination
  • Guessing
  • N = thousands and thousands
  • Testing companies use it (ACT, SAT, GRE)
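
The parameters listed on this slide (ability, item difficulty, item discrimination, and guessing) are the ingredients of the standard three-parameter logistic (3PL) model. The sketch below is an illustration rather than anything from the presentation; theta, a, b, and c are the conventional IRT symbols for ability, discrimination, difficulty, and the guessing floor.

    import numpy as np

    def p_correct_3pl(theta, a, b, c):
        # Probability of a correct response under the 3PL IRT model:
        # theta = examinee ability, a = item discrimination,
        # b = item difficulty, c = guessing (lower asymptote)
        return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

    # Example: an item of average difficulty (b = 0), moderate
    # discrimination (a = 1.2), and a 20% guessing floor
    abilities = np.linspace(-3, 3, 7)
    print(p_correct_3pl(abilities, a=1.2, b=0.0, c=0.2))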

15
Rasch Modeling
  • A special case of IRT
  • Assumes equal item discrimination within a test
  • Assumes no guessing
  • Assumes item difficulty is the only item
    characteristic influencing performance
  • N does not have to be as large as IRT requires,
    but still in the hundreds
  • Note: IRT and Rasch folks disagree on approach.
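
For reference (this formula is not on the slide, but it is the standard form of the model), the Rasch model gives the probability that student i answers item j correctly using only the ability theta_i and the item difficulty b_j:

    P(X_{ij} = 1) = \frac{e^{\theta_i - b_j}}{1 + e^{\theta_i - b_j}}

This is the 3PL sketch above with every discrimination fixed at a = 1 and guessing at c = 0, matching the slide's assumptions.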

16
Classical Test Theory (CTT)
  • N can be small (classroom size)
  • X = T + E
  • X = the student's observed score
  • T = the student's true score
  • E = error (measurement error, or otherwise)

17
CTT
  • When thinking in CTT terms, we want to minimize
    the error, E, so that the observed score is as
    close as possible to the true score. Recall
    X = T + E.

18
Our Goal
  • Use the principles of IRT, Rasch, and CTT to
    write the best possible test: the one that
    measures ability the best.

19
Characteristics of the best test
  • Items should have varying degrees of difficulty
  • Items should have varying degrees of
    discrimination
  • Items that do not contribute to a total score of
    ABILITY should not be included
  • i.e. extra credit that asks "How did you like the
    test?"

20
Need a system
  • Have a system to determine if items are working,
    from
  • The student's perspective
  • Your perspective
  • The Classical Test Theory perspective

21
From the STUDENT'S perspective
  • Quick-and-Dirty
  • Develop a system from the student perspective to
    see if items work.
  • I use
  • ? means I got it right. I'm confident.
  • ? means I got it wrong, but it's because I
    didn't study or I didn't come to class or
    whatever
  • ? means I don't know how to do it.
  • W means I don't understand the wording. What are
    you asking?

22
Example of W question
  • Consider the equation 5(x - 2)(x + 3) = 0.
  • The left side has three factors, but the equation
    has only two solutions. Why?

23
From YOUR perspective
  • Quick-and-Dirty
  • Calculate item difficulty
  • Item difficulty is computed by finding the
    percent of examinees who answered the item
    correctly (CTT)
  • Should not have too many items that no one got
    correct
  • Should not have too many items that everyone got
    correct
  • Should be a variety of difficulty levels
  • Should have questions for different level
    students
  • i.e. A questions, B questions, C questions
  • One way: 70% C questions, 10% B questions, 10% A
    questions
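
A quick way to compute the CTT difficulty described above is to average each column of a 0/1 score matrix. The sketch below uses made-up data purely for illustration.

    import numpy as np

    # Hypothetical scored responses: rows = students, columns = items,
    # 1 = correct, 0 = incorrect
    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 0],
        [0, 1, 0, 1],
    ])

    # CTT item difficulty: proportion of examinees who answered each
    # item correctly (a higher value means an easier item)
    difficulty = responses.mean(axis=0)
    print(difficulty)  # [0.75 0.75 0.25 0.75]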

24
From a CTT perspective
  • Classical Test Theory Item Analysis (if we have
    time, look at the handout)
  • Simplified Procedures
  • Looking at the distribution
  • Separating the lower and upper sections of the
    group
  • More Formal Item Analysis Procedures
  • Discrimination Index
  • Item-Total Score Correlation
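
As a rough sketch of the two more formal procedures named above (again with made-up data, not the handout's), the discrimination index compares upper and lower scorers on each item, and the item-total correlation relates each item to the rest of the test.

    import numpy as np

    # Hypothetical 0/1 scores: rows = students, columns = items
    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 1],
    ])
    total = responses.sum(axis=1)

    # Discrimination index: proportion correct among high scorers minus
    # proportion correct among low scorers (simple half-and-half split)
    order = np.argsort(total)
    half = len(total) // 2
    lower, upper = responses[order[:half]], responses[order[-half:]]
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)

    # Corrected item-total correlation: each item against the total
    # score with that item removed
    item_total_r = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])
    print(discrimination, item_total_r)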

25
Reliability and Validity
  • Reliability
  • Related to the stability or consistency of the
    test scores
  • Does the test consistently measure ability?
  • If you could give a student the test, then erase
    that testing experience, and test again, would
    they get the same score?
  • Validity
  • Related to the meaning of the test score
  • Does the test measure what we think it's
    measuring?
  • How appropriate is the inference based on test
    score?

26
Reliability
  • Reliability is the extent to which random sources
    of measurement error are minimized
  • Recall X = T + E
  • A reliable measure reflects primarily true score
    variance and little error variance.
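
Written out (a standard CTT identity rather than something shown on the slide), with X = T + E and the error uncorrelated with the true score, reliability is the fraction of observed-score variance that is true-score variance:

    \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}

so shrinking the error variance pushes reliability toward 1.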

27
Reliability, cont
  • Things that affect reliability
  • Test length
  • Questions that don't measure knowledge
  • The myriad factors, unrelated to the knowledge
    being tested, that influence performance
  • Physical
  • Headache
  • Hunger
  • Room temperature
  • Verbal directions/hints before the test
  • You want to administer the test in a consistent
    way.
  • Grading
  • Have you experienced grading drift?

28
Reliability, cont
  • What can we do to help reliability?
  • Test length
  • The longer the test, the more reliable.
  • Think repeatability!
  • The more times we ask students to perform a
    skill, the more reliable the score
  • Break questions down into parts
  • Ask about each part of the bigger skill
  • Then ask about the bigger skill
  • You'll know which parts they can and cannot do.
  • (this is related to validity, as well)
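
The test-length point can be quantified with the Spearman-Brown prophecy formula, a standard CTT result not given on the slide: if a test with reliability rho is lengthened by a factor k using comparable items, the predicted reliability is

    \rho_k = \frac{k\rho}{1 + (k - 1)\rho}

For example, doubling a test (k = 2) whose reliability is 0.6 predicts 2(0.6) / (1 + 0.6) = 0.75.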

29
What you can do, cont
  • Encourage students to get enough sleep the night
    before
  • Remind them cramming doesn't work
  • Administer the test in the same way across
    sections
  • Don't give hints to one class and not the other
  • A lot of things are not in your control
  • Noise, temperature, illness, etc.

30
Validity
  • Validity answers the question of MEANING.
  • Are we testing what we think we're testing?
  • Does this question get at the skill that I am
    trying to test?
  • What does the test score mean?

31
Validity, cont
  • Validity is the adequacy and appropriateness of
    inferences and actions based on test scores
    (Messick, 1988)

32
Validity is NOT
  • Validity is NOT a characteristic of any given
    test
  • No test will be valid for all purposes or for all
    people
  • We can only say a test is valid for College
    Algebra students at the end of the semester
  • We can never PROVE a test's validity
  • We can only provide evidence.

33
Broad types of validity evidence
  • Content-related evidence
  • Appropriate breadth and depth of content
  • Refer to your test blueprint
  • The minimum level of validity
  • Criterion-related evidence
  • Predictive
  • Does the test predict some future behavior?
  • Concurrent
  • Does the test score correlate with another
    criterion measured by a different test or
    measurement? (illustrated below)
  • Construct-related evidence
  • Statistically look at the structure of the test
    (FA = factor analysis, SEM = structural equation
    modeling)
  • Need large N
  • We won't be able to do this for our classroom
    tests.
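
As a small illustration of the concurrent-evidence idea above (hypothetical numbers, not from the presentation), criterion-related evidence is often summarized as a correlation between the test scores and scores on another measure collected at about the same time.

    import numpy as np

    # Hypothetical final-exam scores and scores on a separate placement
    # test taken the same week (made-up data)
    exam_scores      = np.array([62, 71, 78, 80, 85, 90, 93])
    placement_scores = np.array([55, 60, 72, 70, 80, 88, 86])

    # Concurrent validity coefficient: Pearson correlation between
    # the two measures
    r = np.corrcoef(exam_scores, placement_scores)[0, 1]
    print(round(r, 2))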

34
Validity
  • What can we do to help with validity?
  • Stick to our blueprint
  • Throw out bad items
  • ASK yourself
  • Does this score have meaning? What does it mean?
  • Does this score predict how the student will
    likely do on future tests? In future classes?
  • Have I eliminated as much error as possible from
    the observed score? Recall X = T + E.
  • Break concepts down into smaller parts. Test
    those. ALSO test the whole process.

35
Reliability and Validity
  • We need BOTH reliability and validity
  • Can have a reliable test that measures the wrong
    thing. It will be reliable, but not valid.
  • Cannot have a valid test without reliability.

36
MAT 090 Example
  • See MAT 090 test
  • While creating your tests, you should keep in
    mind
  • Math knowledge is a latent trait
  • Coverage (blueprint)
  • Reducing error in the CTT model
  • X = T + E
  • Reliability issues
  • Validity issues

37
Remember
  • The goal is to assign a test score that best
    reflects the student's ABILITY.
  • We want to write the best possible test: the one
    that measures ability the best.

38
  • Q & A