Title: Test Items and Item Analysis
1. Test Items and Item Analysis
- Psy 427
- Cal State Northridge
- Andrew Ainsworth PhD
2. Item Formats
- Dichotomous Format
- Two alternatives
- True/False
- MMPI-2, MMPI-A
- Polytomous or Polychotomous Format
- More than two alternatives
- Multiple choice
- Psy 427 midterm, SAT, GRE
3. Item Formats
- Distractors
- Incorrect choices on a polychotomous test
- Best to have three or four
- BUT: one study (Sidick, Barrett, & Doverspike, 1994) found equivalent validity and reliability for a test with two distractors (three alternatives) as for one with four distractors (five alternatives).
- SO, best might be to have two to four (further study is needed)
4. Should you guess on polytomous tests?
- Depends: is there a correction for guessing?
- With a correction for guessing, the corrected score is R - W/(n - 1), where:
- R is the number correct
- W is the number incorrect
- n is the number of choices per item
- If there is no correction for guessing, guess away.
- If there is a correction for guessing, it is better to leave some blank (unless you can beat the odds).
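The standard correction formula above can be sketched in a few lines of Python; the test in the example (40 items, 5 choices) is hypothetical:

```python
def corrected_score(R, W, n):
    """Correct a raw score for guessing: each wrong answer costs 1/(n-1).

    R: number correct, W: number incorrect (blanks are ignored),
    n: number of choices per item.
    """
    return R - W / (n - 1)

# Hypothetical 40-item, 5-choice test
print(corrected_score(20, 10, 5))   # 20 - 10/4 = 17.5
# A pure guesser expects R = 8, W = 32, netting zero after correction
print(corrected_score(8, 32, 5))    # 8 - 32/4 = 0.0
```

Under this scoring, random guessing gains nothing in expectation, which is why leaving items blank can be the better bet.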
5. Other Test Items
- Likert scales
- On a rating scale of 1-5, 1-6, 1-7, etc., where
- 1 = strongly disagree
- 2 = moderately disagree
- 3 = mildly disagree
- 4 = mildly agree
- 5 = moderately agree
- 6 = strongly agree
- rate the following statements.
6. Other Test Items
- Likert scales
- Even vs. odd number of choices
- An even number prevents fence-sitting
- An odd number allows people to be neutral
- Likert items are VERY popular measurement items in psychology.
- Technically ordinal, but often assumed continuous if there are 5 or more choices
- With that assumption we can calculate means, factor analyze, etc.
7. Other Test Items
- Category format
- Like Likert, but with MANY more categories
- e.g., a 10-point scale
- Best if used with anchors
- Research supports the use of 7-point to 21-point scales
8. Other Test Items
- Visual Analogue Scale
- A line anchored at each end, e.g., "No Headache" at one end and "Worst Headache" at the other
- Also used in research
- dials, knobs
- time sampling
9. Checklists & Q-Sorts
- Both used in qualitative research as well as quantitative research
- Checklists
- Present a list of words (adjectives)
- Have the person choose whether to endorse each item
- Can determine perceptions of concepts using checklists.
10. Checklists & Q-Sorts
- Adjective Checklists (from http://www.encyclopedia.com/doc/1O87-AdjectiveCheckList.html)
- In psychometrics, any list of adjectives that can be marked as applicable or not applicable
- to oneself
- to one's ideal self
- to another person, OR
- to some other entity or concept.
11. Checklists & Q-Sorts
- Checklists
- When written with initial uppercase letters (ACL), the term denotes more specifically a measure consisting of a list of 300 adjectives, from "absent-minded" to "zany"
- Selected by the US psychologist Harrison G. Gough (born 1921) and introduced as a commercial test in 1952.
- The test yields 24 scores, including measures of personal adjustment, self-confidence, self-control, lability, counselling readiness, some response styles, and 15 personality needs, such as achievement, dominance, and endurance.
12. Checklists & Q-Sorts
- Q-Sorts
- Introduced by William Stephenson in 1935
- PhD in physics (1926); PhD in psychology (1929)
- Student of Charles Spearman
- Goal: to get a quantitative description of a person's perceptions of a concept
- Process: give the subject a pile of numbered cards and have them sort the cards into piles
- Piles represent graded degrees of description (most descriptive to least descriptive).
13. Checklists & Q-Sorts
- Q-Sorts
- A means of self-evaluation of a client's current status
- The Q-Sort consists of a number of cards, often as many as 40 or 50, even 100 items, each consisting of a single trait, belief, or behavior.
- The goal is to sort these cards into one of five columns ranging from statements such as "very much like me" to "not at all like me."
- There are typically a specific number of cards allowed for each column, forcing the client to balance the cards evenly.
- Examples: California Q-sort, Attachment Q-sort
14. Example Q-sort
15. California Q-Sort
16. Attachment Q-sort
- Attachment Q-sort distribution (number of items per pile designated)
17. Item Analysis
- Methods used to evaluate test items.
- What are good items?
- Techniques:
- Item Difficulty (or easiness)
- Discriminability
- Extreme Group
- Item/Total Correlation
- Item Characteristic Curves
- Item Response Theory
- Criterion-Referenced Testing
18. Item Difficulty
- The proportion of people who get a particular item correct, or who endorse an item (if there is no correct response, e.g., the MMPI)
- Often thought of as the item's easiness because it is based on the number correct/endorsed
19. Item Difficulty
- The difficulty can be given as a proportion, or it can be standardized into a Z-value
20. Item Difficulty
- For example, an item with a difficulty of .84
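One common convention for the standardization (assumed here; the slides do not spell it out) maps the proportion correct p to the z-value with a fraction p of the normal curve above it, so easy items get negative z-values:

```python
from statistics import NormalDist  # standard library

def difficulty_z(p):
    """Standardize item difficulty: the z-value with a fraction p of the
    normal curve above it, i.e., z = inverse-normal of (1 - p)."""
    return NormalDist().inv_cdf(1 - p)

print(round(difficulty_z(0.84), 2))  # easy item: about -0.99
print(round(difficulty_z(0.16), 2))  # hard item: about 0.99
```

A difficulty of .50 maps to z = 0 under this convention.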
21. Difficult Item (p = .35)
- If you are taking a criterion-referenced test in a social psychology course and you need to score a 92 in order to get an A, the criterion is:
- Social Psychology
- Scoring a 92
- Getting an A
- Not enough info.
22. Difficult Item (p = .35)
23. Moderate Item (p = .51)
- The correlation between X and Y is .54. X has an SD of 1.2 and Y has an SD of 5.4. What is the regression coefficient (b) when Y is predicted from X?
- .12
- 2.43
- .375
- .45
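As a worked check of this item (not part of the original slides): for simple regression of Y on X, the slope is the correlation rescaled by the ratio of the standard deviations.

```latex
b = r \cdot \frac{s_Y}{s_X} = .54 \times \frac{5.4}{1.2} = .54 \times 4.5 = 2.43
```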
24. Moderate Item (p = .51)
25. Easy Item (p = 1.00)
- For the following set of data: 5, 9, 5, 5, 2, 4, the mean is:
- 4
- 5
- 4.5
- 6
26. Easy Item (p = 1.00)
27. Optimum Difficulty
- Mathematically, half-way between chance and 100%.
- Steps (assuming a 5-choice test):
- Find half-way between 100% and chance:
- 1 - .2 = .8; .8/2 = .4
- Add this value to chance alone:
- .4 + .2 = .6
- Alternately: (Chance + 1.0) / 2 = optimum difficulty
- A good test will have difficulty values between .30 and .70
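The steps above can be sketched in a couple of lines (taking chance, as the slide does, to be 1 divided by the number of choices):

```python
def optimum_difficulty(n_choices):
    """Optimum item difficulty: half-way between chance and 1.0."""
    chance = 1.0 / n_choices
    return (chance + 1.0) / 2

print(optimum_difficulty(5))  # (.2 + 1.0) / 2 = 0.6
print(optimum_difficulty(2))  # true/false: (.5 + 1.0) / 2 = 0.75
```

Note that for true/false items the optimum sits at .75, above the .30 to .70 range quoted for a good test overall.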
28. Discriminability
- Can be defined in 2 ways:
- How well does each item distinguish (discriminate) between individuals who score high and low on the test as a whole (i.e., on the trait of interest)?
- Or simply: how well is each item related to the trait (e.g., loadings in factor analysis)?
- 1 and 2 are really the same: the more an item is related to the trait, the better it can distinguish high- and low-scoring individuals
29. Discriminability
- Extreme Group Method
- First:
- Identify two extreme groups
- Top third vs. bottom third
- Second:
- Compute the difficulty for the top group
- Compute the difficulty for the bottom group
- Compute the difference between the top difficulty and the bottom difficulty
- Result: the Discriminability Index
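A minimal sketch of the extreme-group method, with hypothetical data: `item_scores` holds each examinee's 0/1 result on one item, and `total_scores` the corresponding test totals.

```python
def discriminability_index(item_scores, total_scores):
    """Extreme-group method: item difficulty (proportion correct) in the
    top third of total scorers minus its difficulty in the bottom third."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    third = len(order) // 3
    bottom, top = order[:third], order[-third:]
    p_top = sum(item_scores[i] for i in top) / len(top)
    p_bottom = sum(item_scores[i] for i in bottom) / len(bottom)
    return p_top - p_bottom

# Hypothetical data for 9 examinees
item = [0, 0, 1, 0, 1, 1, 1, 1, 1]
totals = [10, 12, 14, 20, 22, 24, 30, 32, 35]
print(round(discriminability_index(item, totals), 2))  # 1.0 - 0.33 = 0.67
```

A positive index means high scorers pass the item more often than low scorers, which is what a discriminating item should show.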
31. Discriminability
- Item/Total Correlation
- Let the total test score stand in for the trait of interest (a roughly estimated factor of sorts)
- Correlate each item with the total test score; items with higher item/total correlations are more discriminating
- These correlations are like rough factor loadings
32. Discriminability
- Point-Biserial Method
- If you have dichotomously scored items (e.g., the MMPI) or items with a correct answer
- Correlate each item's score (correct/incorrect) with the total test score.
- One dichotomous variable (correct/incorrect) correlated with one continuous variable (total score) is a Point-Biserial correlation
- Measures discriminability
33. Discriminability
34. Discriminability
- The discrimination can be standardized into a Z-value as well
35. Discriminability
36. Discriminability
37. Selecting Items
- Using Difficulty and Discrimination together
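One simple way to combine the two criteria, using the deck's .30 to .70 difficulty range; the discrimination cutoff of .30 is an illustrative assumption, not a rule from the slides:

```python
def keep_item(difficulty, discrimination,
              p_range=(0.30, 0.70), min_disc=0.30):
    """Retain an item if its difficulty is moderate and its
    discrimination index is adequate (cutoffs are illustrative)."""
    return p_range[0] <= difficulty <= p_range[1] and discrimination >= min_disc

print(keep_item(0.55, 0.45))  # True: moderate difficulty, discriminates well
print(keep_item(0.95, 0.45))  # False: too easy
print(keep_item(0.55, 0.05))  # False: does not discriminate
```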
38. Item Characteristic Curves
- A graph of the proportion of people getting each item correct, compared to total scores on the test.
- Ideally, lower test scores should go along with lower proportions of people getting a particular item correct.
- Ideally, higher test scores should go along with higher proportions of people getting a particular item correct.
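An empirical curve of this kind can be sketched by banding examinees on total score and computing the proportion passing the item within each band; the data are hypothetical, and this is only a rough empirical version of the curves shown on the slides:

```python
from collections import defaultdict

def icc_points(item, totals, n_bins=4):
    """Proportion passing an item within equal-width bands of total score."""
    lo, hi = min(totals), max(totals)
    width = (hi - lo) / n_bins or 1          # guard against all-equal totals
    bins = defaultdict(list)
    for x, t in zip(item, totals):
        b = min(int((t - lo) / width), n_bins - 1)
        bins[b].append(x)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

item = [0, 0, 0, 1, 0, 1, 1, 1]
totals = [5, 8, 11, 14, 17, 20, 23, 26]
print(icc_points(item, totals))  # {0: 0.0, 1: 0.5, 2: 0.5, 3: 1.0}
```

A rising pattern from low to high bands, as here, is what a good item should show.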
39-65. Item Characteristic Curves (figure slides)
66. Other Evaluation Techniques
- Item Response Theory
- Viewing item response curves at different levels of difficulty
- Looks at the standard error at different ranges of the trait you are trying to measure
- More on this in the next topic
67. Other Evaluation Techniques
- Criterion-Referenced Tests
- Instead of comparing a score on a test or scale to other respondents' scores, we can compare each individual to what they should have scored.
- Requires that there is a set objective in order to assess whether the objective has been met
- E.g., in intro stats, students should learn how to run an independent-samples t-test; a criterion-referenced test could be used to test this. This needs to be demonstrated before moving on to another objective.
68. Other Evaluation Techniques
- Criterion-Referenced Tests
- To evaluate CRT items:
- Give the test to 2 groups: one exposed to the material and one that has not seen the material
- Distribute the scores for the test in a frequency polygon
- The antimode (least frequent value) represents the cut score between those who were exposed to the material and those who weren't
- Scores above the cut score are assumed to have mastered the material, and vice versa
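A sketch of finding that antimode cut score for integer-valued scores; the exact procedure (take the two frequency peaks, then the least frequent value between them) fills in implementation details the slides leave open, and the data are hypothetical:

```python
from collections import Counter

def antimode_cut(scores):
    """Cut score at the antimode: the least frequent value between the
    two peaks of a (roughly bimodal) integer score distribution."""
    counts = Counter(scores)
    # two most frequent score values, taken as the distribution's peaks
    peaks = sorted(sorted(counts, key=counts.get, reverse=True)[:2])
    # Counter returns 0 for unseen values, so gaps count as least frequent
    return min(range(peaks[0] + 1, peaks[1]), key=lambda v: counts[v])

# Hypothetical scores: unexposed group clusters near 4, exposed near 10
scores = [3, 4, 4, 4, 5, 5, 6, 9, 10, 10, 10, 11]
print(antimode_cut(scores))  # 7: the gap between the two groups
```

Scores above 7 would then be classified as having mastered the material.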
69. Criterion-Referenced Test
70. Other Evaluation Techniques
- Criterion-Referenced Tests
- Often used with Mastery-style learning
- Once a student indicates they've mastered the material, he/she moves on to the next module of material
- If they do not pass the cut score for mastery, they receive more instruction until they can master the material