Title: Item Response Theory
1 Item Response Theory
- Dan Mungas, Ph.D.
- Department of Neurology
- University of California, Davis
2 What is it? Why should anyone care?
3 IRT Basics
4 Item Response Theory - What Is It?
- Modern approach to psychometric test development
- Mathematical measurement theory
- Associated numeric and computational methods
- Widely used in large-scale educational, achievement, and aptitude testing
- More than 50 years of conceptual and methodological development
5 Item Response Theory - Methods
- Dataset consists of rectangular table
- rows correspond to examinees
- columns correspond to items
- IRT applications simultaneously estimate examinee ability and item parameters
- Iterative maximum likelihood estimation algorithms
- Processor intensive, but no longer a problem
6 Basic Data Structure
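As a rough illustration of the rectangular data structure described above, here is a small simulated examinee-by-item matrix in base R; the sample size, item names, and response probability are made up for the sketch.

# Rectangular IRT dataset: rows = examinees, columns = items,
# cells = scored responses (1 = correct, 0 = incorrect)
set.seed(1)
n_people <- 6
item_names <- c("item1", "item2", "item3", "item4")   # hypothetical items
resp_matrix <- matrix(rbinom(n_people * length(item_names), size = 1, prob = 0.6),
                      nrow = n_people,
                      dimnames = list(paste0("examinee", 1:n_people), item_names))
resp_matrix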
7 Item Types
- Dichotomous
- Multiple Choice
- Polytomous
- Information is greater for a polytomous item than for the same item dichotomized at a cutpoint
8 What is the item-level response?
- Smallest discrete unit (e.g. Object Naming)
- Sum of correct responses (trials in a word list learning test)
- For practical reasons, continuous measures might have to be recoded into ordinal scales with reduced response categories (10-15)
9 Item Response Theory - Basic Results
- Item parameters
- difficulty
- discrimination
- correction for guessing
- most applicable for multiple choice items
- Subject Ability (in the psychometric sense)
- Capacity to successfully respond to test items (or propensity to respond in a certain direction)
- Net result of all genetic and environmental influences
- Measured by scales composed of homogeneous items
- Item difficulty and subject ability are on the
same scale
10 Item Characteristic Curves
11 Item Response Theory - Outcomes
- Item-Level Results
- Item Characteristic Curve (ICC)
- non-linear function relating ability to the probability of a correct response to the item
- Item Information Curve (IIC)
- non-linear function showing precision of measurement (reliability) at different ability points
- Both curves are defined by the item parameters (see the sketch below)
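A minimal base-R sketch of the two item-level curves for a single 2PL item (the parameter values below are arbitrary): the ICC is a logistic function of ability given the item's discrimination a and difficulty b, and under the 2PL model the item information is a^2 * P * (1 - P).

# Item Characteristic Curve for a 2PL item: P(correct | theta)
icc_2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))

# Item Information Curve for a 2PL item: I(theta) = a^2 * P(theta) * (1 - P(theta))
iic_2pl <- function(theta, a, b) {
  p <- icc_2pl(theta, a, b)
  a^2 * p * (1 - p)
}

theta <- seq(-4, 4, by = 0.1)                                                     # ability scale
plot(theta, icc_2pl(theta, a = 1.5, b = 0.5), type = "l", ylab = "P(correct)")    # ICC
plot(theta, iic_2pl(theta, a = 1.5, b = 0.5), type = "l", ylab = "Information")   # IIC peaks at theta = b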
12 Item Characteristic Curves
13 Information Curves
15 Item Response Theory - Outcomes
- Test-Level Results
- Test Characteristic Curve (TCC)
- non-linear function relating ability to the expected total test score
- Test Information Curve (TIC)
- non-linear function showing precision of measurement (reliability) at different ability points
- Both are sums of the item-level functions of the included items (see the sketch below)
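A sketch of both test-level curves for a small set of hypothetical 2PL items, reusing icc_2pl and iic_2pl from the earlier sketch; the item parameters are made up.

# Hypothetical calibrated items: discriminations (a) and difficulties (b)
a <- c(0.8, 1.2, 1.5, 1.0, 2.0)
b <- c(-1.5, -0.5, 0.0, 0.5, 1.5)
theta <- seq(-4, 4, by = 0.1)

# Test Characteristic Curve: expected total score = sum of the item ICCs
tcc <- sapply(theta, function(th) sum(icc_2pl(th, a, b)))
# Test Information Curve: sum of the item information curves
tic <- sapply(theta, function(th) sum(iic_2pl(th, a, b)))

plot(theta, tcc, type = "l", ylab = "Expected total score")   # TCC
plot(theta, tic, type = "l", ylab = "Test information")       # TIC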
16 Test Characteristic Curve - Mini-Mental State Examination
17 Information Curves
18 Item Response Theory - Fundamental Assumptions
- Unidimensionality - items measure a single, homogeneous domain
- Local independence - covariance among items is determined only by the latent dimension measured by the item set
19 IRT Models
- 1PL (Rasch)
- Only Difficulty and Ability are estimated
- Discrimination is assumed to be equal across items
- 2PL
- Discrimination, Difficulty, and Ability are estimated
- Guessing is assumed to not have an effect
- 3PL
- Discrimination, Difficulty, Guessing, and Ability are estimated (multiple choice items)
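A sketch of how the three models nest, using arbitrary parameter values: the 3PL curve adds a lower asymptote c for guessing, fixing c = 0 gives the 2PL, and additionally constraining all items to share one discrimination gives the 1PL / Rasch model.

# 3PL item characteristic curve: c is the guessing (lower-asymptote) parameter
icc_3pl <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))

theta <- seq(-4, 4, by = 0.1)
p_3pl <- icc_3pl(theta, a = 1.2, b = 0, c = 0.25)   # e.g., 4-option multiple choice item
p_2pl <- icc_3pl(theta, a = 1.2, b = 0, c = 0)      # 2PL: no guessing effect
p_1pl <- icc_3pl(theta, a = 1.0, b = 0, c = 0)      # 1PL: common discrimination across items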
20 Item Response Theory - Invariance Properties
- Invariance requires that basic assumptions are met
- Item parameters are invariant across different samples
- Within the range of overlap of distributions
- Distributions of samples can differ
- Ability estimates are invariant across different item sets
- Assumes that ability range of items spans ability range of subjects that is of interest
21 Why Do We Care? - Applications of IRT in Health Care Settings
- Refined scoring of tests
- Characterization of psychometric properties of existing tests
- Construction of new tests
22 Test Scoring
- IRT permits refined scoring of items that allows
for differential weighting of items based on
their item parameters
23 Physical Function Scale (Hays, Morales, & Reise, 2000)

Item | Limited a lot | Limited a little | Not limited at all
Vigorous activities (running, lifting heavy objects, strenuous sports) | 1 | 2 | 3
Climbing one flight | 1 | 2 | 3
Walking more than 1 mile | 1 | 2 | 3
Walking one block | 1 | 2 | 3
Bathing / dressing self | 1 | 2 | 3
Preparing meals / doing laundry | 1 | 2 | 3
Shopping | 1 | 2 | 3
Getting around inside home | 1 | 2 | 3
Feeding self | 1 | 2 | 3
24 How to Score the Test
- Simple approach: there are numbers that will be circled; total these up, and we have a score.
- But should "limited a lot" for walking a mile receive the same weight as "limited a lot" in getting around inside the home?
- Should "limited a lot" for walking one block be twice as bad as "limited a little" for walking one block?
25 How IRT Can Help
- IRT provides us with a data-driven means of rational scoring for such measures
- Items that are more discriminating are given greater weight (see the sketch below)
- In practice, the simple sum score is often very good; improvement is at the margins
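A minimal sketch of IRT-based scoring under a 2PL model (the item parameters and responses below are made up, and this is only one of several possible scoring methods): the ability estimate is driven by the likelihood of the whole response pattern, so more discriminating items carry more weight than they would in a simple sum. Here an expected a posteriori (EAP) estimate is computed on a grid with a standard normal prior, reusing icc_2pl from the earlier sketch.

# EAP ability estimate for one examinee under a 2PL model (grid approximation)
eap_score <- function(resp, a, b, grid = seq(-4, 4, by = 0.05)) {
  # likelihood of the observed response pattern at each ability grid point
  lik <- sapply(grid, function(th) {
    p <- icc_2pl(th, a, b)
    prod(p^resp * (1 - p)^(1 - resp))
  })
  post <- lik * dnorm(grid)      # standard normal prior on ability
  sum(grid * post) / sum(post)   # posterior mean = EAP estimate
}

a    <- c(0.5, 1.0, 2.0)    # hypothetical discriminations
b    <- c(-1.0, 0.0, 1.0)   # hypothetical difficulties
resp <- c(1, 1, 0)          # one examinee's scored responses
eap_score(resp, a, b)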
26 Description of Psychometric Properties
- The Test Information Curve (TIC) shows reliability that varies continuously with ability
- Depicts ability levels associated with high and low reliability
- The standard error of measurement is directly related to the information value I(θ)
- SEM(θ) = 1 / sqrt(I(θ))
- SEM(θ) and I(θ) also have a direct correspondence to traditional reliability r
- r(θ) = 1 - 1 / I(θ)
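A quick numerical illustration of these two relationships, using an arbitrary information value:

info <- 10              # arbitrary test information at some ability level
sem  <- 1 / sqrt(info)  # SEM(theta) = 1 / sqrt(I(theta))  -> about 0.32
rel  <- 1 - 1 / info    # r(theta)   = 1 - 1 / I(theta)    -> 0.90
c(SEM = sem, reliability = rel)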
27 I(θ), SEM, r
28 TICs for English and Spanish Language Versions of Two Scales (Mungas et al., 2004)
29 Construction of New Scales
- Items can be selected to create scales with desired measurement properties
- Can be used for prospective test development
- Can be used to create new scales from existing tests/item pools
- IRT will not overcome inadequate items
30 TICs from an Existing Global Cognition Scale and Re-Calibrated Existing Cognitive Tests (Mungas et al., 2003)
31 Principles of Scale Construction
- Information should correspond to assessment goals
- Broad and flat TIC for a longitudinal change measure in a population with heterogeneous ability
- For a selection or diagnostic test, a peak at the point of the ability continuum where discrimination is most important (see the sketch below)
- But normal cognition spans a 4.0 s.d. range, and the range is even greater in demographically diverse populations
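One simple way to act on this principle, sketched with a made-up item pool (an illustration, not the procedure used in the cited papers): given calibrated 2PL parameters, greedily keep the items that contribute the most information at the ability level that matters for the assessment goal; a broad, flat TIC would instead be built by spreading the target across a range of ability values. It reuses iic_2pl from the earlier sketch.

# Pick the k pool items that are most informative at a target ability level
select_items <- function(a, b, k, theta_target) {
  info <- iic_2pl(theta_target, a, b)   # each item's information at the target
  order(info, decreasing = TRUE)[1:k]   # indices of the k most informative items
}

set.seed(3)
a <- runif(20, 0.5, 2.0)    # hypothetical calibrated pool of 20 items
b <- runif(20, -2.5, 2.5)
select_items(a, b, k = 8, theta_target = -1.0)   # e.g., peak at below-average ability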
32 Other Issues in IRT
- Polytomous IRT models are available
- Useful for ordinal (Likert) rating scales
- Each possible score of the item (minus 1) is treated like a separate item with a different difficulty parameter
- Information is greater for a polytomous item than for the same item dichotomized at a cutpoint
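A sketch of one common polytomous model, the graded response model, for a single item with ordered categories (the parameter values are arbitrary): each threshold behaves like a separate 2PL curve with its own difficulty, and the category probabilities are differences of adjacent cumulative curves. It reuses icc_2pl from the earlier sketch.

# Graded response model: P(X >= k | theta) is a 2PL curve for each threshold b_k
grm_probs <- function(theta, a, b_thresholds) {
  cum <- c(1, icc_2pl(theta, a, b_thresholds), 0)   # P(X >= 0) = 1, ..., P(X > max) = 0
  -diff(cum)                                        # probability of each category
}

# A 3-category item (scores 0, 1, 2) has 2 ordered thresholds
grm_probs(theta = 0, a = 1.5, b_thresholds = c(-1.0, 0.8))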
33 Other Issues in IRT
- Applicable to broad range of content domains
- IRT certainly applies to cognitive abilities
- Also applies to other health outcomes
- Quality of life
- Physical function
- Fatigue
- Depression
- Pain
34 Other Issues in IRT
- Differential Item Function - Test Bias
- IRT provides explicit methods to evaluate and quantify the extent to which items and tests have different measurement properties in different groups
- e.g., racial and ethnic groups, linguistic groups, gender
35 English and Spanish Item Characteristic Curves for the Lamb/Cordero Item
36 English and Spanish Item Characteristic Curves for the Stone/Piedra Item
37 Differential Item Function (DIF)
- DIF refers to systematic bias in measuring true ability
- It doesn't address group differences in ability
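One simple way to quantify DIF once an item has been calibrated separately in two groups is the area between the group-specific ICCs; this sketch uses made-up parameters and a crude rectangle-rule integral, and is not the method used in the cited papers. It reuses icc_2pl from the earlier sketch.

# Unsigned area between two groups' ICCs for the same item (a simple DIF index)
dif_area <- function(a_ref, b_ref, a_foc, b_foc, lo = -4, hi = 4, step = 0.01) {
  theta <- seq(lo, hi, by = step)
  gap <- abs(icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc))
  sum(gap) * step   # rectangle-rule approximation of the integral
}

# Same difficulty but different discrimination across groups -> nonzero DIF area
dif_area(a_ref = 1.5, b_ref = 0.0, a_foc = 0.8, b_foc = 0.4)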
38 Challenges / Limitations of IRT
- Large samples required for stable estimation
- 150-200 for 1PL
- 400-500 for 2PL
- 600-1000 for 3PL
- Analytic methods are labor intensive
- There are a number of (expensive) applications readily available for IRT analyses
- Evaluation of basic assumptions, identification of an appropriate model, and systematic IRT analysis require considerable expertise and labor
- ...but, R!!
39 Computerized Adaptive Testing (CAT)
- IRT-based, computer-driven method
- Selects items that most closely match the examinee's ability
- Administers only the items needed to achieve a pre-specified level of measurement precision (information, s.e.m., reliability) - see the sketch below
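A minimal sketch of the CAT loop for a simulated examinee and a made-up calibrated bank (not a production algorithm): at each step the unused item with the most information at the current ability estimate is administered, the estimate is updated, and testing stops once the SEM falls below a target. It reuses icc_2pl, iic_2pl, and eap_score from the earlier sketches.

# Simulated CAT: administer items until SEM(theta) drops below a target value
set.seed(2)
a <- runif(50, 0.8, 2.0)   # hypothetical calibrated item bank
b <- runif(50, -3.0, 3.0)
true_theta   <- 0.7        # simulated examinee
administered <- integer(0)
responses    <- integer(0)
theta_hat    <- 0          # starting ability estimate

repeat {
  # pick the unused item that is most informative at the current estimate
  info <- iic_2pl(theta_hat, a, b)
  info[administered] <- -Inf
  next_item <- which.max(info)

  # simulate the examinee's response and record it
  p_correct    <- icc_2pl(true_theta, a[next_item], b[next_item])
  responses    <- c(responses, rbinom(1, 1, p_correct))
  administered <- c(administered, next_item)

  # re-estimate ability from all responses so far (EAP scoring)
  theta_hat <- eap_score(responses, a[administered], b[administered])

  # stop once precision is adequate or the bank is exhausted
  test_info <- sum(iic_2pl(theta_hat, a[administered], b[administered]))
  if (1 / sqrt(test_info) < 0.30 || length(administered) == length(a)) break
}
c(items_used = length(administered), theta_hat = round(theta_hat, 2))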
40 Why CAT?
- Efficiency
- Administration
- Standardization
- Time efficiency
- Data collection
- Scoring
- Computer can implement complex scoring algorithms
41 CAT Example 1
42 CAT Example 2
43 Practical Considerations for CAT
44 What You Need for CAT
- Computer technology
- Item Selection
- Item Administration
- Scale Scoring
- Item bank with IRT parameters
- Range of item difficulty relevant to measurement
needs
45 What is Straightforward/Easy?
- Dichotomous items
- Multiple choice items
- Ordered polytomous response scales
- Up to 10-15 response options
46 Technical Challenges
- Continuous response scales (memory, timed tasks)
- Can be recoded into a smaller number of ordered response ranges
- Information is lost
47 Methodological Challenges
- Sample size requirements
- Minimally 300-600 cases for stable estimation of item parameters
- Differential Item Function and Measurement Bias
- Essentially involves item calibration within groups of interest
- e.g., age, education, language, gender, race
- Available literature provides minimal guidance
48 References
- Mungas, D., Reed, B. R., & Kramer, J. H. (2003). Psychometrically matched measures of global cognition, memory, and executive function for assessment of cognitive decline in older persons. Neuropsychology, 17(3), 380-392.
- Mungas, D., Reed, B. R., Crane, P. K., Haan, M. N., & González, H. (2004). Spanish and English Neuropsychological Assessment Scales (SENAS): Further development and psychometric characteristics. Psychological Assessment, 16(4), 347-359.