Title: Item Response Models
Item Response Models
- Assumptions
- Basic statistical models
- Caveats and interpretations
Harvey Goldstein, University of Bristol, h.goldstein_at_bristol.ac.uk
Basic assumptions
- Consider a test with p binary (correct/incorrect) responses.
- Each item is assumed to reflect one or more underlying (latent) dimensions of achievement or ability.
- So let us start with an assumed one-dimensional test, say of mathematics with 40 items.
- How do we get a value (score) on the mathematics scale from a set of 40 (1/0) responses from each individual?
- Well, we set up a model.
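Some simple models
- First some basic notation (following Goldstein and Wood, 1989).
- Let $\theta_j$ represent the latent (factor) score for individual $j$, and let $\pi_{ij}$ be the probability that individual $j$ responds correctly to item $i$.
- Then a simple item response model is
$$\mathrm{logit}(\pi_{ij}) = \log\frac{\pi_{ij}}{1-\pi_{ij}} = \alpha_i + \beta_i\theta_j$$
where $\alpha_i$ and $\beta_i$ are item parameters. This is just a binary response factor analysis model.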
Goldstein, H. and Wood, R. (1989). Five decades of item response modelling. British Journal of Mathematical and Statistical Psychology, 42, 139-167.
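As a minimal sketch, assuming the logistic form $\mathrm{logit}(\pi_{ij}) = \alpha_i + \beta_i\theta_j$ and made-up item parameters (the names `prob_correct` and `items` are illustrative, not from any package), the probability of a correct response for one individual can be computed as follows:

```python
import math

def logistic(x: float) -> float:
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def prob_correct(alpha: float, beta: float, theta: float) -> float:
    """P(correct) for one item under logit(pi) = alpha + beta * theta."""
    return logistic(alpha + beta * theta)

# Hypothetical item parameters: (intercept alpha_i, discrimination beta_i).
items = [(0.0, 1.0), (-1.0, 1.5), (0.5, 0.8)]

theta_j = 0.5  # hypothetical latent score for one individual
probs = [prob_correct(a, b, theta_j) for a, b in items]
print([round(p, 3) for p in probs])
```

Evaluating `prob_correct` over a grid of $\theta_j$ values for a fixed item traces out that item's response curve.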
A potted history
- Lawley (1944) really started it off.
- Lord (1980) promoted the term "item response theory", as opposed to classical item analysis.
- IRT is now the standard procedure for test construction.
- Note that the theory is statistical, not substantive.
- Technical elaborations include:
  - parameters for guessing
  - partial credit (degrees of correctness) responses
  - multidimensional models.
- BUT the workhorse is still the Lord model (with the factor assumed to be a random rather than a fixed variable).
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Lawley, D. N. (1943). The application of the maximum likelihood method to factor analysis. British Journal of Psychology, 33, 172-175.
Classical item analysis
- This is really an item response model (IRM).
- A reasonable (consistent) estimate of $\theta_j$ (a random variable) is given by the raw score, i.e. the percentage (or total) of correct items.
- A somewhat more efficient estimate is given by a weighted percentage, using the $\beta_i$ as weights.
- The Lord model is simply
$$\mathrm{logit}(\pi_{ij}) = \alpha_i + \beta_i\theta_j, \qquad \theta_j \sim N(0,1)$$
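The claim that a $\beta$-weighted percentage is more efficient than the raw percentage can be explored by simulation. The sketch below assumes the Lord model $\mathrm{logit}(\pi_{ij}) = \alpha_i + \beta_i\theta_j$ with $\theta_j \sim N(0,1)$, 40 items and hypothetical, randomly drawn item parameters. It uses the true $\beta_i$ as weights, which in practice would have to be estimated. Under this logistic model with known item parameters, the weighted total $\sum_i \beta_i y_{ij}$ is a sufficient statistic for $\theta_j$.

```python
import math
import random

def logistic(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
n_people, n_items = 2000, 40
# Hypothetical item parameters: intercepts alpha_i and discriminations beta_i.
alphas = [rng.uniform(-1.5, 1.5) for _ in range(n_items)]
betas = [rng.uniform(0.5, 2.0) for _ in range(n_items)]

thetas = [rng.gauss(0.0, 1.0) for _ in range(n_people)]  # theta_j ~ N(0, 1)
raw, weighted = [], []
for theta in thetas:
    # Simulate one individual's 1/0 responses to all items.
    y = [1 if rng.random() < logistic(a + b * theta) else 0
         for a, b in zip(alphas, betas)]
    raw.append(sum(y) / n_items)  # proportion correct
    weighted.append(sum(b * yi for b, yi in zip(betas, y)) / sum(betas))

r_raw = pearson(raw, thetas)
r_weighted = pearson(weighted, thetas)
print(f"corr(raw, theta) = {r_raw:.3f}, corr(weighted, theta) = {r_weighted:.3f}")
```

Comparing `r_raw` with `r_weighted` gives a rough sense of how much the weighting gains for a given spread of discriminations; with moderate spreads the difference may be small.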
Item response relationships
[Figure: item response relationship for a single item in a test]
The Rasch model
- As used in PISA, for example.
- Here the discrimination (roughly, the correlation between the response for an item and the factor value) is assumed to be the same for each item, i.e. $\beta_i = \beta$ for all $i$.
- The resulting (maximum likelihood) factor score estimates are then a one-to-one transformation of the raw scores.
- So the Rasch model is a special case of the Lord model and will often (e.g. in PISA) not fit the data very well.
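The one-to-one relation between Rasch estimates and raw scores follows from the score equation. Writing the Rasch model as $\mathrm{logit}(\pi_{ij}) = \alpha_i + \theta_j$ (the common discrimination absorbed into the scale of $\theta_j$, an assumption of this sketch), the ML estimate solves $\sum_i \pi_{ij}(\theta_j) = r_j$, where $r_j$ is the raw score, so any two response patterns with the same raw score get the same estimate. A minimal sketch with hypothetical item intercepts, using bisection as an illustrative root finder (Newton-Raphson is also common):

```python
import math

def logistic(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def rasch_ml_theta(responses, alphas, lo=-20.0, hi=20.0, tol=1e-10):
    """ML estimate of theta_j under logit(pi_ij) = alpha_i + theta_j.

    The score equation sum_i (y_ij - pi_ij) = 0 reduces to
    sum_i pi_ij(theta) = r_j, so only the raw score r_j matters.
    Zero and perfect scores have no finite estimate and are rejected.
    """
    r = sum(responses)
    if r == 0 or r == len(responses):
        raise ValueError("ML estimate is infinite for zero or perfect scores")

    def expected_score(t):
        return sum(logistic(a + t) for a in alphas)

    # expected_score increases in t, so bisection brackets the root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_score(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical item intercepts for a five-item test (easiest first).
alphas = [1.0, 0.5, 0.0, -0.5, -1.0]
theta_a = rasch_ml_theta([1, 1, 0, 0, 0], alphas)  # two easiest correct
theta_b = rasch_ml_theta([0, 0, 0, 1, 1], alphas)  # two hardest correct
theta_c = rasch_ml_theta([1, 1, 1, 0, 0], alphas)  # raw score 3
print(theta_a, theta_b, theta_c)
```

The first two patterns differ in which items were answered correctly but share a raw score of 2, so they receive identical estimates; a raw score of 3 gives a higher one.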
What are the advantages of modelling?
- We can add further predictors, for example social background, that may mediate the relationships.
- If we can rely on the model, then we will obtain efficient estimates of each individual's factor value.
- Item response practitioners go further: if we assume that the item parameters ($\alpha_i$, $\beta_i$) are the same across populations and, for example, across tests, then we can form common scales for different populations and different tests.
Applications
- Consider the case of different populations, or the same population at different times.
- Suppose that we require different tests for each population (e.g. for confidentiality reasons) but some items (say 15) are common.
- These common items are assumed to retain the same parameter values in each population, which means we can equate the tests to provide a common scale: the parameter values of the non-common items are determined by linking them to those of the common items.
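The slides do not specify a linking method. One common choice, swapped in here purely for illustration, is mean/sigma linking. Writing each item as $\mathrm{logit}(\pi_{ij}) = \beta_i(\theta_j - b_i)$, so that $b_i = -\alpha_i/\beta_i$, and assuming the scales of two separate calibrations differ by a linear transformation, the common items' difficulties give constants $A$ and $B$ with $b^{(Y)} = A\,b^{(X)} + B$ and $\beta^{(Y)} = \beta^{(X)}/A$. The numbers below are hypothetical, constructed so that $A = 1.2$ and $B = 0.5$ exactly.

```python
import statistics

def mean_sigma_link(b_common_x, b_common_y):
    """Mean/sigma linking constants (A, B) such that b_y ~= A * b_x + B.

    Both arguments hold difficulties of the SAME common items, estimated
    separately on form X's scale and on form Y's scale.
    """
    A = statistics.stdev(b_common_y) / statistics.stdev(b_common_x)
    B = statistics.mean(b_common_y) - A * statistics.mean(b_common_x)
    return A, B

def to_y_scale(b_x, beta_x, A, B):
    """Re-express a form-X item (difficulty, discrimination) on form Y's scale."""
    return A * b_x + B, beta_x / A

# Hypothetical common-item difficulties from two separate calibrations.
b_x = [-1.2, -0.4, 0.0, 0.7, 1.5]
b_y = [-0.94, 0.02, 0.5, 1.34, 2.3]  # here exactly 1.2 * b_x + 0.5
A, B = mean_sigma_link(b_x, b_y)

# A hypothetical non-common form-X item: difficulty 0.3, discrimination 1.2.
b_new, beta_new = to_y_scale(0.3, 1.2, A, B)
print(A, B, b_new, beta_new)
```

The whole procedure rests on the assumption that the common items' parameters really are constant across the populations being linked.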
Caveats
- When linking different tests over time, the parameter constancy assumption is difficult to test and typically remains an assumption.
- In some cases, e.g. the NAEP reading anomaly, the assumption can be falsified.
Item analysis
- IRMs are also used to check items: those that don't fit the model being used are candidates for removal.
- A problem with this is that what remains conforms to the model, but if the model cannot be relied on to describe reality, we may be losing important information.
- One way a model may be misspecified is that the reality is multidimensional.
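Checking items for misfit needs some fit statistic, and the slides do not name one. As an illustration only, the sketch below uses the unweighted ("outfit") mean-square residual from the Rasch tradition, applied to the logistic model $\mathrm{logit}(\pi_{ij}) = \alpha_i + \beta_i\theta_j$. It assumes $\theta_j$ is known, which holds here only because the data are simulated; in practice estimated scores would be plugged in. The second item is a hypothetical miskeyed item whose responses run against the model.

```python
import math
import random

def logistic(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def outfit_mean_square(y, thetas, alpha, beta):
    """Unweighted (outfit) mean-square residual for one item.

    Averages (y_j - pi_j)^2 / (pi_j * (1 - pi_j)) over individuals;
    values near 1 are consistent with the model, values well above 1
    flag misfit.
    """
    total = 0.0
    for yj, t in zip(y, thetas):
        p = logistic(alpha + beta * t)
        total += (yj - p) ** 2 / (p * (1.0 - p))
    return total / len(y)

rng = random.Random(7)
thetas = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Item 1: responses generated from the model itself (alpha = 0, beta = 1).
y_fit = [1 if rng.random() < logistic(1.0 * t) else 0 for t in thetas]
# Item 2: hypothetical miskeyed item, responses reversed relative to a
# model with alpha = 0, beta = 2.
y_bad = [0 if rng.random() < logistic(2.0 * t) else 1 for t in thetas]

fit_ok = outfit_mean_square(y_fit, thetas, 0.0, 1.0)
fit_bad = outfit_mean_square(y_bad, thetas, 0.0, 2.0)
print(f"outfit (fitting item) = {fit_ok:.2f}, outfit (miskeyed item) = {fit_bad:.2f}")
```

In line with the caveat above, a flagged item is a candidate for scrutiny rather than automatic removal: misfit may reflect a misspecified model, for example an ignored second dimension.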
Multidimensional IRMs
- We can generalise our logistic model as follows:
$$\mathrm{logit}(\pi_{ij}) = \alpha_i + \beta_{1i}\theta_{1j} + \beta_{2i}\theta_{2j}$$
- Adding a further factor allows an individual to be characterised by two underlying traits.
- A sensible analysis will explore the dimensionality structure of a set of item responses.
- Assumptions are needed: for example, that the factors are independent, or alternatively that they are correlated but each item has a non-zero coefficient (loading) on only one factor, or an intermediate assumption.
- What are the consequences of a more complex structure?
- It allows a more faithful representation of multi-faceted achievement.
- It allows the (multidimensional) structure of achievement to be compared among groups or populations in the following ways:
  - The correlations between factors can vary.
  - The values of loadings can vary.
  - The factor scores can be allowed to depend on further variables such as gender, and the resulting regressions may vary. For example,
$$\theta_{kj} = \gamma_{k0} + \gamma_{k1}\,\mathrm{gender}_j + u_{kj}, \qquad k = 1, 2$$
- With extensions to multilevel modelling etc., this becomes a structural equation model.
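A minimal sketch of the two-factor logistic model $\mathrm{logit}(\pi_{ij}) = \alpha_i + \beta_{1i}\theta_{1j} + \beta_{2i}\theta_{2j}$, together with a check of the assumption that each item loads on only one factor. All loadings and factor values are hypothetical, and the function names are illustrative.

```python
import math

def logistic(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def prob_correct_2d(alpha, betas, thetas):
    """P(correct) under logit(pi) = alpha + beta_1*theta_1 + beta_2*theta_2."""
    return logistic(alpha + sum(b * t for b, t in zip(betas, thetas)))

def is_simple_structure(loadings):
    """True if every item has a non-zero loading on exactly one factor."""
    return all(sum(1 for b in row if b != 0.0) == 1 for row in loadings)

# Hypothetical loadings (beta_1i, beta_2i): items 1-2 load only on
# factor 1, items 3-4 only on factor 2.
simple = [(1.2, 0.0), (0.8, 0.0), (0.0, 1.0), (0.0, 1.5)]
# Hypothetical "intermediate" pattern: item 4 loads on both factors.
mixed = [(1.2, 0.0), (0.8, 0.0), (0.0, 1.0), (0.6, 1.5)]

person = (1.0, -0.5)  # (theta_1j, theta_2j) for one individual
probs = [prob_correct_2d(0.0, b, person) for b in simple]
print([round(p, 3) for p in probs])
```

Under the simple structure, this individual's strength on factor 1 raises the probabilities for items 1 and 2 only, while the weakness on factor 2 lowers them for items 3 and 4; a single-factor model would have to average the two traits.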
Another assumption and an extension
- In all these models we have to assume conditional independence: that is, for any given individual, the response to an item depends only on the model parameters and is independent of the responses to other items.
- This may break down in several ways and is a persistent problem with these models.
- One violation is where a series of (correct/incorrect) responses relate to the same question scenario. In such cases we can reformulate the set of responses as an ordered (partial credit) response, and such response types are easily incorporated into the model.
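A minimal sketch of the partial credit reformulation. It assumes that the linked binary responses to one scenario are collapsed into their sum, and that the resulting ordered score follows Masters' partial credit model, $P(X_j = k) \propto \exp\sum_{h=1}^{k}(\theta_j - \delta_h)$ with step parameters $\delta_h$ (the empty sum for $k = 0$ is zero). The step values and function names are hypothetical. Summing discards which sub-questions were answered correctly, in exchange for treating the scenario as a single response.

```python
import math

def collapse_scenario(responses):
    """Replace a set of linked binary responses by one ordered score."""
    return sum(responses)

def pcm_probs(theta, deltas):
    """Category probabilities for scores 0..K under the partial credit model.

    P(X = k) is proportional to exp(sum_{h=1..k} (theta - delta_h)).
    """
    cumulative = [0.0]  # k = 0 term
    for d in deltas:
        cumulative.append(cumulative[-1] + (theta - d))
    m = max(cumulative)  # subtract the max before exp for numerical stability
    weights = [math.exp(c - m) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical scenario with three linked binary sub-questions.
score = collapse_scenario([1, 0, 1])  # ordered score in 0..3
probs = pcm_probs(theta=0.5, deltas=[-1.0, 0.0, 1.0])
print(score, [round(p, 3) for p in probs])
```

With a single step the model reduces to the Rasch model, $\mathrm{logit}\,P(X_j = 1) = \theta_j - \delta_1$, which is one way to see why such responses slot easily into the framework.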
Conclusions
- Formalising a test in terms of an underlying model helps to clarify what is being measured.
- These models can incorporate group differences and general dependencies, which can be explored, and efficient and valid statistical analyses can be undertaken.
- The full complexity of achievement responses can be summarised in a small number of parameters using a full structural (and multilevel) approach, without the need to adopt a very simple model such as the Rasch model.
- If we wish to make the necessary assumptions and carry out equating, this can be done more realistically using a multidimensional model.