Title: Why do something simple, when it
1. Why do something simple, when it's just as easy
to do something complicated?
- Mark Wilson
- UC Berkeley
- April 2006
2.
- Or, why would someone use a simple model like
the Rasch model, when a more complicated model is
just as easy to run?
- After all, a more complicated model will almost
certainly fit better (because it has more
parameters).
- Hence it will allow you to delete fewer items due
to misfit, maybe delete none.
- Observation: It is very hard to convince
subject matter experts, policy types,
business people, etc.
that deleting items for misfit is important.
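For reference, the "more parameters" point can be made concrete with the standard dichotomous forms of the models being compared. This is a sketch in conventional notation, not taken from the slides: θ_n is the respondent location, δ_i the item difficulty, a_i the item slope, and c_i the lower asymptote.

```latex
\begin{align*}
\text{Rasch:}\quad & P(X_{ni}=1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)} \\[4pt]
\text{2PL:}\quad & P(X_{ni}=1) = \frac{\exp\bigl(a_i(\theta_n - \delta_i)\bigr)}{1 + \exp\bigl(a_i(\theta_n - \delta_i)\bigr)} \\[4pt]
\text{3PL:}\quad & P(X_{ni}=1) = c_i + (1 - c_i)\,\frac{\exp\bigl(a_i(\theta_n - \delta_i)\bigr)}{1 + \exp\bigl(a_i(\theta_n - \delta_i)\bigr)}
\end{align*}
```

The Rasch model is the special case a_i = 1 and c_i = 0 for every item, so each added item parameter gives the larger models more room to fit.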
3. Outline
- Why deleting misfitting items is important
- Some reasons that have been offered
- Importance of Wright maps etc. for interpretation
- 4 building blocks, etc.
- But they are still uncertain, so will give up
interpretation advantages of Rasch models
- examples of worries
- Strategy: limited items approach
- Expanding the scenarios
- tactics
- Conclusion
4. Some reasons that have been offered for deleting
misfitting items
- Because it's philosophically more sound.
- i.e., (i) to get specific objectivity
- (ii) to get separation of variables
- Contra: (i) most practitioners don't care
- (ii) other people's philosophies
may lead to other conclusions
5.
- Because tests are part of the designed world,
not the natural world, and
linear models are instances of better design
than non-linear ones.
- i.e., (i) why are bricks almost always
rectangular prisms? (rather than, say,
rhomboid prisms?)
- (ii) why do people use linear models in
regression so much?
- Contra: (i) don't care about good design if it
costs more in item development
6.
- Because it's easier to explain to people
- Contra: (i) no one understands these complicated
formulae anyway, so you can have as
many parameters as you want
- (ii) no need for explanation, as the results
of psychometric modeling don't need
to be understood; all that matters are
their technical characteristics
- i.e., if the items are modeled in a more
complicated way, that must be more
true.
7.
- Because you want to interpret results using
something equivalent to a Wright Map
8. (No Transcript)
9. What does distance between item responses mean?
- The idea of "location" of an item response with
respect to the location of another item response
only makes sense if that relative meaning is
independent of the location of the respondent
involved
- i.e., the interpretation of relative locations
needs to be uniform no matter where the
respondent is.
10. (No Transcript)
11. Another way to put this
- meaning is the same no matter where you are on
the map
- e.g., an "inch represents a mile" wherever you
are on the map
- One consequence of this is that the order (on the
map) of the item responses must remain the same
for all respondents
- and that the order of the respondents (on the
map) must remain the same for all item responses.
- Note: this is equivalent to double stochastic
ordering (a concept used in non-parametric
models)
12. But the requirement is stronger: not just order is
preserved, but metric properties too. In a
picture...
13. (No Transcript)
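The invariance requirement can also be checked numerically. The sketch below is illustrative only: the item values, the function names, and the choice of the log-odds gap between two items as the "distance" are assumptions, not taken from the slides. It compares two Rasch items with two items that have unequal slopes (2PL-style), at three respondent locations.

```python
import math

def p_correct(theta, delta, slope=1.0):
    """Probability of success under a logistic item response model."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - delta)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def logit_gap(theta, item_a, item_b):
    """Difference in log-odds of success between two items at location theta.

    Each item is a (delta, slope) pair.
    """
    return logit(p_correct(theta, *item_a)) - logit(p_correct(theta, *item_b))

# Hypothetical items, written as (difficulty, slope).
rasch_items = [(-0.5, 1.0), (0.5, 1.0)]    # equal slopes: Rasch
two_pl_items = [(-0.5, 0.5), (0.5, 2.0)]   # unequal slopes: 2PL-style

thetas = [-2.0, 0.0, 2.0]
rasch_gaps = [logit_gap(t, *rasch_items) for t in thetas]
two_pl_gaps = [logit_gap(t, *two_pl_items) for t in thetas]

for t, rg, tg in zip(thetas, rasch_gaps, two_pl_gaps):
    print(f"theta={t:+.1f}  Rasch gap={rg:.3f}  2PL gap={tg:.3f}")
```

With equal slopes the gap is the same at every respondent location, so both order and distance on the map carry one meaning. With unequal slopes the gap changes with location and even changes sign, so the easier item for one respondent is the harder item for another.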
14. Reprise
- If people just want a number, with certain
technical characteristics,
it's hard to convince them that they should
delete items for misfit.
- If people just want to do what ETS/CTB/etc. does,
it's hard to convince them that they should
delete items for misfit.
- If people want to save money by including all
items,
it's hard to convince them that they should
delete items for misfit.
15. BUT
- If people want to be able to interpret their
results,
then you have an in.
16. SO
- Explain about
- Wright Maps
- 4 Building Blocks
- (See Wilson, M. (2005). Constructing Measures: An
Item Response Modeling Approach. Mahwah, NJ:
Erlbaum.)
- Etc.
17. Now
- suppose you have convinced them that
interpretation matters
- and that the Rasch approach with Wright maps etc.
is the best way to go
- BUT, they still have to deal with issues such as
those above
18. Examples of their worries
- need to include items for historical reasons
- e.g., they are all we have
- need to include items for technical reasons
- e.g., they are the only items left to represent
certain categories in a linking
- need to include items because they love 'em
- e.g., they have certain content urges
- need to not exclude items due to misfit, because
they just can't understand or accept that they
should
- e.g., "3PL = true"
19. Thus, for any and all of the above reasons, they
want to include 2-parameter or 3-parameter (or other) items
20. One strategy: limited items approach
- Identify which items do fit Rasch-family models,
call them R items
- Identify which items do not fit Rasch-family
models, call them L (limited) items
- limited because they are used for limited purposes
- And assumes that they are limited in number too
- developed with Claus Carstensen of IPN, Kiel.
21. Then
- Calibrate with R items using Rasch-family models
- Anchor R items, calibrate R and L items together
using Rasch OPLM-like models
- (e.g., OPLM, SAS NLMixed)
- (Research question: tests for unidimensionality
of R and L)
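One way to read the second calibration step, assuming the usual OPLM form in which each item carries a fixed, known slope (the notation here is an assumption, not from the slides):

```latex
\begin{align*}
P(X_{ni}=1) &= \frac{\exp\bigl(a_i(\theta_n - \delta_i)\bigr)}{1 + \exp\bigl(a_i(\theta_n - \delta_i)\bigr)},
\qquad a_i \text{ fixed in advance (not estimated)} \\[4pt]
\text{R items:}\quad & a_i = 1, \quad \delta_i \text{ anchored at the value from the R-only calibration} \\
\text{L items:}\quad & a_i \text{ set to a chosen constant}, \quad \delta_i \text{ estimated in the joint run}
\end{align*}
```

Because the R-item difficulties are held at their first-step values, adding the L items does not move the scale on which the R items are interpreted.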
22. Thus,
- For interpretation, use R items only
- For accuracy (e.g., smaller SEM) use R and L
items.
- E.g., estimate a person's θ using all items,
use R items to develop construct validity,
criterion-referencing, etc.
- In a picture
23. 1 and 3 for interpretation; 1, 2, 3 for
estimation
Research questions: need to establish acceptance
rules for which items to calibrate, which to
map, etc.
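The accuracy gain from estimating with R and L items together can be sketched with test information. This is a minimal illustration under stated assumptions: the item values are made up, L items are given fixed slopes in the OPLM style, and the SEM is taken as the usual asymptotic value, one over the square root of the summed item information.

```python
import math

def item_information(theta, delta, slope=1.0):
    """Fisher information of a logistic item at location theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-slope * (theta - delta)))
    return slope ** 2 * p * (1.0 - p)

def sem(theta, items):
    """Asymptotic standard error of the theta estimate for a set of (delta, slope) items."""
    total = sum(item_information(theta, delta, slope) for delta, slope in items)
    return 1.0 / math.sqrt(total)

# Hypothetical calibrated items, written as (difficulty, slope).
r_items = [(-1.0, 1.0), (-0.5, 1.0), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]  # Rasch-fitting
l_items = [(-0.3, 2.0), (0.4, 3.0)]                                        # limited items

theta = 0.2
sem_r = sem(theta, r_items)              # what the R items alone would give
sem_all = sem(theta, r_items + l_items)  # R and L items used together

print(f"SEM with R items only: {sem_r:.3f}")
print(f"SEM with R and L items: {sem_all:.3f}")
```

Every item contributes non-negative information, so adding the L items can only shrink the SEM, while the map used for interpretation is still built from the R items alone.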
24. Expanding the scenarios
- (A) They have used a Rasch-family technique in
the scaling/equating/etc.
- Example: In PISA, facets (LLTM)-like parameters
are used to control for booklet effects.
- Example: Facets (LLTM)-like parameters are used
to control for harshness/leniency effects in (a
fixed set of) raters.
25. Tactics
- In the case where rater and/or booklet effects apply
only to Rasch-family items, then use the approach
described above
- it's a bit more complicated, but it's the same
general idea.
- In the case where these effects apply to non-Rasch
items, it's more difficult.
- Maybe re-design non-Rasch items so they don't
involve booklets and/or raters
- but need to know ahead to do that.
- Otherwise, it's a research question
- either adapt the models, or delete the non-Rasch
items.
26. (B) Suppose they have used a non-Rasch technique
in the scaling/equating/etc.
- Example: Historically, non-Rasch items have been
used in the equating, so need to be maintained.
- Example: A longitudinal scaling has used
non-Rasch items, so the scale needs to be maintained.
27. Tactics
- (i) In any set of non-Rasch items, there will be
a Rasch-like core; identify that
- e.g., select the largest clump of items with slope
parameters within 1 standard error of one another,
- then select the subset with low (fit) impact of
the lower asymptote.
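The two-step selection above can be sketched as follows. Several choices here are assumptions made only for illustration: the tolerance is the mean of the slope standard errors, the "clump" is the largest group whose slopes all lie within that tolerance of one another, and a cutoff on the estimated lower asymptote stands in for "low (fit) impact". The item values and field names are hypothetical.

```python
from statistics import mean

def largest_slope_clump(items, tolerance):
    """Largest group of items whose slope estimates all lie within `tolerance` of one another."""
    ordered = sorted(items, key=lambda it: it["slope"])
    best_start, best_end = 0, 0  # half-open window into `ordered`
    start = 0
    for end in range(len(ordered)):
        # Shrink the window from the left until its slope range fits the tolerance.
        while ordered[end]["slope"] - ordered[start]["slope"] > tolerance:
            start += 1
        if end + 1 - start > best_end - best_start:
            best_start, best_end = start, end + 1
    return ordered[best_start:best_end]

def rasch_like_core(items, max_asymptote):
    """Step 1: largest slope clump. Step 2: keep items with a small lower asymptote."""
    tolerance = mean(it["se"] for it in items)  # assumed reading of "1 standard error"
    clump = largest_slope_clump(items, tolerance)
    return [it for it in clump if it["asymptote"] <= max_asymptote]

# Hypothetical 3PL estimates: slope, its standard error, and lower asymptote.
items = [
    {"name": "i1", "slope": 0.62, "se": 0.10, "asymptote": 0.05},
    {"name": "i2", "slope": 0.95, "se": 0.08, "asymptote": 0.02},
    {"name": "i3", "slope": 1.00, "se": 0.09, "asymptote": 0.22},
    {"name": "i4", "slope": 1.03, "se": 0.10, "asymptote": 0.04},
    {"name": "i5", "slope": 1.04, "se": 0.09, "asymptote": 0.00},
    {"name": "i6", "slope": 1.70, "se": 0.14, "asymptote": 0.10},
]

core = rasch_like_core(items, max_asymptote=0.10)
core_names = [it["name"] for it in core]
print(core_names)
```

In practice the second step would more naturally compare item fit with and without the lower asymptote; the simple cutoff is used here only to keep the sketch short.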
28. Tactics
- (ii) If that set is large and comprehensive
enough, problem solved. If it is not large or
comprehensive enough, then either
- (a) develop Rasch items parallel to non-Rasch
items, checking empirically and/or judgmentally
for match, or
- (b) use non-Rasch items as limited items, as
before
29. Conclusion
- Going further afield, the non-Rasch
characteristics may be non-IRT elements,
- e.g., standards set by a non-scale method such as
Angoff.
- Needs creativity
- map the Angoff standards onto the Wright Map
- critique or accept standards based on that
perspective
30. Conclusion
- Possible to achieve practical aims of Rasch
approach
- e.g., interpretation via Wright maps,
- While adapting to non-Rasch environment.
- e.g., using limited items in scaling.
- Complications can be dealt with
- e.g., facets (LLTM)-like situations can be
included in tactics
- e.g., non-Rasch scaling could be adapted.