Item Response Theory in Health Measurement - PowerPoint PPT Presentation

Slides: 41

Provided by: IanMcD2

Transcript and Presenter's Notes

1
Item Response Theory in Health Measurement
2
Outline

Contrast IRT with classical test theory
Introduce basic concepts in IRT
Illustrate IRT methods with ADL and IADL scales
Discuss empirical comparisons of IRT and CTT
Advantages and disadvantages of IRT
When would it be appropriate to use IRT?

3
Test Theory

Any item in any health measure has two parameters:
The level of ability required to answer the question correctly (in health, this translates into the level of health at which the person doesn't report this problem)
The level of discrimination of the item: how accurately it distinguishes the well from the sick

4
Classical Test Theory

Most common paradigm for scale development and validation in health
Few theoretical assumptions, so broadly applicable
Partitions the observed score into true score + error
Probability of a given item response is a function of the person to whom the item is administered and the nature of the item
Item difficulty: proportion of examinees who answer the item correctly (in health, item severity)
Item discrimination: biserial correlation between the item and the total test score
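To make the two classical item statistics concrete, here is a minimal Python sketch. It assumes 0/1 item scores and an uncorrected total score (the item is included in its own total), and the function names are hypothetical. The biserial formula is the textbook one: r_bis = (M1 - M0) / s * (p * q / h), where M1 and M0 are the mean totals of those who passed and failed the item, s is the (population) SD of the totals, and h is the standard normal ordinate at the point that splits the distribution into p and q.

```python
from statistics import NormalDist, mean, pstdev

def item_difficulty(item):
    """Classical difficulty: proportion of examinees scoring 1 (correct / endorsed)."""
    return sum(item) / len(item)

def biserial(item, total):
    """Biserial correlation between a 0/1 item and the total test score.

    Raises statistics.StatisticsError if everyone passed or everyone failed the item.
    """
    p = item_difficulty(item)
    q = 1.0 - p
    m1 = mean(t for x, t in zip(item, total) if x == 1)  # mean total of those who passed
    m0 = mean(t for x, t in zip(item, total) if x == 0)  # mean total of those who failed
    s = pstdev(total)                                    # population SD of total scores
    z = NormalDist().inv_cdf(q)                          # cut point leaving area p above it
    h = NormalDist().pdf(z)                              # normal ordinate at that cut point
    return (m1 - m0) / s * (p * q / h)

# Hypothetical responses of 6 people to 3 items; totals are the row sums
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
totals = [sum(row) for row in responses]
first_item = [row[0] for row in responses]
print(item_difficulty(first_item), biserial(first_item, totals))
```

In health scales the "correct" response is usually the one indicating no problem, so the proportion plays the role of item severity described above.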

5
Classical test theory

Probability of a "no" answer depends on the type of item (difficulty) and the level of physical functioning (e.g., SF-36 bathing vs. vigorous activities)
Some limitations
Item difficulty, discrimination, and ability are confounded
Sample dependent: item difficulty estimates will be different in different samples, and the estimate of ability is item dependent
Difficult to compare scores across two different tests because they are not on the same scale
Test scores are often on an ordinal scale of measurement
Assumes equal errors of measurement at all levels of ability

6
Item Response Theory

Complete theory of measurement and item selection
Theoretically, item characteristics are not sample dependent, and estimates of ability are not item dependent
Item scores are on the same scale as ability
Puts all individual scores on a standardized, interval-level scale that is easy to compare between tests and individuals

7
Item Response Theory

Assumes that a normally distributed latent trait underlies performance on a measure
Assumes unidimensionality: all items measure the same construct
Assumes local independence: items are uncorrelated with each other when ability is held constant
Given unidimensionality, the probability of endorsing an item (answering it correctly) is a monotonically increasing function of the latent trait (the item characteristic curve)

8
(No Transcript)
9
Example of item characteristic curves (note the a parameter: 2.82 for the steep curve, 0.98 for the shallow curve)
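The two curves in the figure can be sketched numerically. Only the slopes 2.82 and 0.98 come from the slide; the shared difficulty b = 0 is a hypothetical value (the slide does not report b), and the sketch assumes the logistic two-parameter form without the 1.7 scaling constant some software applies.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve: P(endorse | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with the same difficulty (b = 0) and the slopes quoted on the slide
for theta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    steep = icc_2pl(theta, a=2.82, b=0.0)
    shallow = icc_2pl(theta, a=0.98, b=0.0)
    print(f"theta={theta:+.1f}  steep={steep:.3f}  shallow={shallow:.3f}")
```

Both curves pass through 0.5 at theta = b, but the steep item moves from low to high probability over a much narrower band of theta, which is what makes it the better discriminator near b.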
10
(No Transcript)
11
(No Transcript)
12
Differential Item Functioning
Assuming that the measured ability is unidimensional and that the items measure the same ability, the item curve should be unique, except for random variations, irrespective of the group for whom the item curve is plotted. Items that do not yield the same item response function for two or more groups violate one of the fundamental assumptions of item response theory, namely that the item and the test in which it is contained are measuring the same unidimensional trait.
13
Possible DIF
14
Item Bias

Items may be biased against one gender, linguistic, or social group
Can result in people being falsely identified as having problems, or in problems being missed
Two elements in bias detection:
Statistical detection of differential item functioning
Item review
If the source of the problem is not related to performance on the trait being measured, then the item is biased

15
DIF detection

Important part of test validation
Helps to ensure measurement equivalence
Scores on individual items are compared for two groups:
Reference group
Focal group (the group under study)
Groups are matched on total test score (ability)
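The slide does not name a DIF statistic. One widely used procedure for groups matched on total score is the Mantel-Haenszel common odds ratio, sketched here as a swapped-in illustration; the function name and the "ref"/"focal" labels are hypothetical. Each total-score level forms a stratum with its own 2x2 table of group by item response, and the odds ratios are pooled across strata.

```python
from collections import defaultdict

def mantel_haenszel_or(item, total, group):
    """Mantel-Haenszel common odds ratio for one 0/1 item.

    item  : 0/1 responses to the studied item
    total : total test score, used as the matching variable
    group : "ref" or "focal" for each examinee
    A value > 1 means the reference group has higher odds of a correct
    response than the focal group at the same total score.
    """
    # Per-stratum counts: [ref correct, ref wrong, focal correct, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for x, t, g in zip(item, total, group):
        col = (0 if g == "ref" else 2) + (0 if x == 1 else 1)
        strata[t][col] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den  # ZeroDivisionError if no stratum has both b and c > 0
```

A ratio near 1 is consistent with no uniform DIF; this sketch omits the accompanying significance test.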

16
DIF detection

DIF can be uniform or nonuniform
Uniform: the probability of answering the item correctly is consistently higher for one group
Nonuniform: the probability of answering the item correctly is higher for one group at some points on the scale, and perhaps lower at other points
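Under a two-parameter model the distinction can be read off item parameters estimated separately in each group: equal slopes with shifted difficulties give uniform DIF, while unequal slopes make the curves cross. A sketch, assuming group-specific (a, b) estimates already placed on a common theta scale; `dif_pattern` and the -4 to 4 theta grid are hypothetical choices.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def dif_pattern(ref, focal, grid=None):
    """Classify DIF for one item from group-specific 2PL parameters.

    ref, focal : (a, b) for the reference and focal groups
    Returns "none", "uniform" (one group higher over the whole grid)
    or "nonuniform" (the curves meet or cross within the grid).
    """
    grid = grid or [x / 10 for x in range(-40, 41)]  # theta from -4 to 4 in steps of 0.1
    diffs = [icc_2pl(t, *ref) - icc_2pl(t, *focal) for t in grid]
    if all(abs(d) < 1e-6 for d in diffs):
        return "none"
    if all(d > 0 for d in diffs) or all(d < 0 for d in diffs):
        return "uniform"
    return "nonuniform"

print(dif_pattern((1.0, 0.0), (1.0, 0.5)))  # same slope, shifted difficulty
print(dif_pattern((2.0, 0.0), (0.8, 0.0)))  # different slopes, curves cross at b
```

The classification only holds over the theta range examined; curves with different slopes always cross somewhere, possibly outside the range where examinees actually fall.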

17
Illustration of IRT with ADL and IADL Scales

The latent traits represent the ability to perform self-care activities (ADL) and instrumental activities necessary for independent living (IADL)
Item difficulty (b): the level of function corresponding to a 50% chance of endorsing the item
Item discrimination (a): slope of the item characteristic curve, i.e., how well it differentiates low- from high-functioning people

18
3 models

One-parameter (Rasch) model: provides estimates of item difficulty only
Two-parameter model: provides estimates of difficulty and discrimination
Three-parameter model: also allows for guessing
IRT has different models for dichotomous and polytomous item scales

19
IRT models dichotomous items

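One-parameter model
Probability of a correct response (given theta): $P(X = 1 \mid \theta) = \dfrac{1}{1 + \exp[-(\theta - b)]}$, where $b$ is the item difficulty
Two-parameter model
Probability of a correct response (given theta): $P(X = 1 \mid \theta) = \dfrac{1}{1 + \exp[-a(\theta - b)]}$, where $a$ is the item discrimination
Three-parameter model
Adds a pseudo-guessing parameter, $c$ (the lower asymptote of the curve)
The two-parameter model is most appropriate for epidemiological research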
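The three dichotomous models translate directly into code. A minimal sketch, assuming the logistic metric without the 1.7 scaling constant that some calibration programs apply; the function names are illustrative only. The 3PL line uses the standard form c + (1 - c) * P_2PL.

```python
import math

def p_1pl(theta, b):
    """One-parameter (Rasch) model: 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_2pl(theta, a, b):
    """Two-parameter model: adds the discrimination (slope) a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    """Three-parameter model: the curve's lower asymptote is c (pseudo-guessing)."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

# At theta = b the 1PL and 2PL give 0.5, while the 3PL gives c + (1 - c) / 2
print(p_1pl(0.0, 0.0), p_2pl(0.0, 1.7, 0.0), p_3pl(0.0, 1.7, 0.0, 0.2))
```

With a = 1 the 2PL reduces to the 1PL, and with c = 0 the 3PL reduces to the 2PL, which is why the models can be compared as nested alternatives.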

20
Steps in applying IRT

Step one: assess dimensionality
Factor analytic techniques
Exploratory factor analysis
Study the ratio of the first to the second eigenvalue (should be 3:1 or 4:1)
Also χ² tests for dimensionality
Calibrate items
Calculate item difficulty and discrimination and examine how well the model fits
χ² goodness-of-fit test
Compare goodness of fit between the one-parameter and two-parameter models
Examine the root mean square residual (values should be < 2.5)
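For the eigenvalue-ratio check, a sketch that computes all eigenvalues of an item correlation matrix with the cyclic Jacobi method. Jacobi is used here only so the example needs nothing beyond the standard library (numpy.linalg.eigvalsh would do the same job). The sketch assumes the correlation matrix has already been computed; for 0/1 items tetrachoric correlations are often used, and this sketch does not estimate them. The function names are hypothetical.

```python
import math

def symmetric_eigenvalues(matrix, max_sweeps=100, tol=1e-12):
    """Eigenvalues of a symmetric matrix, largest first, by the cyclic Jacobi method."""
    n = len(matrix)
    m = [list(row) for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new m[p][q] becomes zero
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # update rows p and q
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return sorted((m[i][i] for i in range(n)), reverse=True)

def eigenvalue_ratio(corr):
    """Ratio of the first to the second eigenvalue of an item correlation matrix."""
    ev = symmetric_eigenvalues(corr)
    return ev[0] / ev[1]

# Hypothetical 5-item matrix with every inter-item correlation equal to 0.5:
# eigenvalues are 3.0 and 0.5 (four times), so the ratio is 6
corr = [[1.0 if i == j else 0.5 for j in range(5)] for i in range(5)]
print(round(eigenvalue_ratio(corr), 3))  # 6.0
```

A ratio well above the 3:1 or 4:1 guideline, as in this toy matrix, points to one dominant dimension; two equally strong item clusters would give a ratio near 1.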

21
Steps in IRT continued

Score the examinees
Get item information estimates, based on discrimination adjusted for standard error
Study test information
If choosing items from a larger pool, discard items with low information and retain items that give more information where it is needed
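The slide does not say how examinees are scored. One common option is the expected a posteriori (EAP) estimate, which fits the assumption of a normally distributed latent trait: the posterior over theta is evaluated on a grid and averaged. A sketch, assuming already-calibrated 2PL item parameters, a N(0, 1) prior, and a grid from -4 to 4; `eap_score` is a hypothetical name. The posterior SD it returns serves as the examinee's standard error.

```python
import math
from statistics import NormalDist

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of endorsing the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_score(responses, items, n_points=81):
    """EAP estimate of theta and its posterior SD for one examinee.

    responses : list of 0/1 answers
    items     : list of (a, b) 2PL parameters, in the same order as responses
    """
    grid = [-4.0 + 8.0 * k / (n_points - 1) for k in range(n_points)]
    prior = NormalDist()  # N(0, 1) prior on the latent trait
    weights = []
    for t in grid:
        likelihood = 1.0
        for x, (a, b) in zip(responses, items):
            p = p_2pl(t, a, b)
            likelihood *= p if x == 1 else 1.0 - p
        weights.append(likelihood * prior.pdf(t))  # unnormalized posterior
    total = sum(weights)
    theta_hat = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - theta_hat) ** 2 * w for t, w in zip(grid, weights)) / total
    return theta_hat, math.sqrt(var)

# Hypothetical three-item scale with equal slopes and spread-out difficulties
items = [(1.5, -1.0), (1.5, 0.0), (1.5, 1.0)]
print(eap_score([1, 1, 0], items))
```

Because the prior pulls estimates toward the mean, EAP scores stay finite even for people who endorse every item or none, where a pure maximum-likelihood estimate would diverge.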

22
Item Information

Item information is a function of item difficulty and discrimination. It is high when item difficulty is close to the average level of function in the group and when the ICC slope is steep
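For the two-parameter model, item information has a closed form, a²P(θ)[1 - P(θ)], which peaks at θ = b with value a²/4. This is why an item contributes most when its difficulty sits near the group's level of function and its slope is steep. Summing item informations gives test information, and 1/√I(θ) gives the conditional standard error of measurement. A sketch with hypothetical function names:

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of endorsing the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P), largest at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def scale_information(theta, items):
    """Test (scale) information: the sum of the item informations at theta."""
    return sum(item_information(theta, a, b) for a, b in items)

def standard_error(theta, items):
    """Conditional standard error of measurement: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(scale_information(theta, items))

# Hypothetical items: the steep item is far more informative near its own difficulty
for theta in (-2.0, 0.0, 2.0):
    print(theta, item_information(theta, 2.82, 0.0), item_information(theta, 0.98, 0.0))
```

Unlike the classical assumption of equal error at all ability levels, the standard error here changes with theta.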

23
The ADL scale example

Caregiver ratings of ADL and IADL performance for
1686 people
1048 with dementia and 484 without dementia
1364 had complete ratings

24
ADL/IADL example

Procedures
Assessed dimensionality: found two dimensions, ADL and IADL
Assessed fit of the one-parameter and two-parameter models for each scale
Two-parameter model better:
Only 3 items fit the one-parameter model
Significant improvement in χ² goodness of fit
Used the two-parameter model to get item statistics for 7 ADL items and 7 IADL items
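A common way to compare nested one- and two-parameter fits is a likelihood-ratio (deviance difference) χ² test. Whether the ADL analysis used exactly this statistic is not stated, so treat the sketch as an assumption. The log-likelihoods would come from the calibration software, and the degrees of freedom equal the number of extra free parameters in the 2PL (for example, n - 1 for n items when the 1PL estimates one common slope). The closed-form χ² tail used here is valid for integer degrees of freedom only; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2*(1 - Phi(z)) + 2*phi(z) * sum_{i=1}^{(df-1)/2} x^(i - 1/2) / (1*3*...*(2i - 1)), z = sqrt(x)
    nd = NormalDist()
    z = math.sqrt(x)
    term, total = z, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return 2.0 * (1.0 - nd.cdf(z)) + 2.0 * nd.pdf(z) * total

def likelihood_ratio_test(loglik_1pl, loglik_2pl, extra_params):
    """Deviance-difference chi-square test of the 2PL against the nested 1PL."""
    stat = 2.0 * (loglik_2pl - loglik_1pl)
    return stat, chi2_sf(stat, extra_params)
```

A small p-value indicates that freeing the item slopes improves fit more than chance would, which is the pattern reported for the ADL and IADL scales.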

25
ADL/IADL

Got results for each item: difficulty, discrimination, and fit to the model
Results for item information and total scale information

26
Example of IRT with Relatives Stress Scale

The latent trait (theta) represents the intensity of stress due to recent life events
Item severity or difficulty (b): the level of stress corresponding to a 50% chance of endorsing the item
Item discrimination (a): slope of the item characteristic curve, i.e., how well it differentiates low- from high-stress cases
Item information is a function of both: high when b is close to the group stress level and the slope (a) is steep

27
Stress Scale Item Information

Item information is a function of item difficulty and discrimination. It is high when item difficulty is close to the group stress level and when the ICC slope is steep

28
Stress Scale Item Difficulty

Item severity or difficulty (b) indicates the level of stress (on the theta scale) corresponding to a 50% chance of endorsing the item

29
Stress Scale Item Discrimination

Item discrimination is reflected in the slope of the item characteristic curve (ICC): how well does the item differentiate low- from high-stress cases?

30
Example of developing an Index of Instrumental Support

Community sample: CSHA-1
Needed a baseline indicator of social support, as it is an important predictor of health
Concept: availability and quality of instrumental support
Blended IRT and classical methods

31
Sample

8089 people
Randomly divided into two samples
Development and validation
Procedures
Item selection and coding
7 items
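A reproducible random split into development and validation samples can be sketched as follows. It assumes two halves of (nearly) equal size, since the slide does not give the split proportions; the function name and seed are hypothetical.

```python
import random

def split_development_validation(ids, seed=12345):
    """Randomly divide participant IDs into development and validation halves."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical usage with 8089 sequential IDs, matching the sample size on the slide
development, validation = split_development_validation(range(8089))
print(len(development), len(validation))  # 4044 4045
```

Item parameters are then estimated in the development half only, so that the validation half gives an independent check of the resulting scores.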

32
Procedure

IRT analyses
Tested dimensionality
Two-parameter model
Estimated item parameters
Estimated item and test information
Scored individual levels of support

33
External validation

Internal consistency
Construct validity
Correlation with size of social network
Correlation with marital status
Correlation with gender
Predictive validity
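Internal consistency is often summarized with Cronbach's alpha, which for 0/1 items reduces to KR-20; the slide does not name the coefficient used, so this is an assumption. A sketch taking one list of item scores per person, with a hypothetical function name:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha from per-person item-score lists (equals KR-20 for 0/1 items)."""
    k = len(scores[0])                   # number of items
    item_columns = list(zip(*scores))    # transpose: one tuple of scores per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

# Hypothetical data: four people answering three items identically gives alpha = 1
print(cronbach_alpha([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]))
```

The same population variance is used for items and total, so the ratio, and therefore alpha, does not depend on the n versus n - 1 convention.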

34
Empirical comparison of IRT and CTT in scale
validation

Few studies. So far, proponents of IRT assume it is better. However:
IRT and CTT often select the same items
High correlations between CTT and IRT estimates of difficulty and discrimination
Very high (0.93) correlations between CTT and IRT estimates of total score

35
Empirical comparisons (contd)

Little difference in criterion or predictive validity of IRT scores: IRT scores are only slightly better
When item discriminations are highly varied, IRT is better
IRT item parameters can be sample dependent
Validity needs to be established on different samples, as in CTT

36
Advantages of IRT

Contribution of each item to the precision of the total test score can be assessed
Estimates precision of measurement at each level of ability and for each examinee
With a large item pool, item and test information are excellent for building tests to suit different purposes
Graphical illustrations are helpful
Can tailor the test to needs: for example, can develop a criterion-referenced test that has the most precision around the cut-off score
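Tailoring a criterion-referenced test can be sketched as ranking a calibrated item pool by 2PL information at the cut-off and keeping the top items. This greedy selection, the dictionary layout, and the item names are illustrative assumptions, not a procedure described in the presentation.

```python
import math

def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_for_cutoff(pool, cutoff, n_items):
    """Return the n_items names from the pool that are most informative at the cut-off.

    pool   : dict mapping item name -> (a, b) 2PL parameters
    cutoff : cut-off score on the theta scale
    """
    ranked = sorted(pool, key=lambda name: item_information(cutoff, *pool[name]), reverse=True)
    return ranked[:n_items]

# Hypothetical pool; with a cut-off at theta = 1, items located near 1 with steep slopes win
pool = {"easy": (1.5, -2.0), "near": (1.5, 0.9), "hard": (1.5, 3.0), "flat": (0.5, 1.0)}
print(select_for_cutoff(pool, cutoff=1.0, n_items=2))
```

Note that the shallow item located exactly at the cut-off ranks below a steep item located farther away, because information scales with the square of the slope.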

37
Advantages of IRT

Interval-level scoring
More analytic techniques can be used with the scale
Ability on different tests can be easily compared
Good for tests where a core of items is administered but different groups get different subsets (e.g., cross-cultural testing, computerized adaptive testing)

38
Disadvantages of IRT