1
DIFFERENTIAL ITEM FUNCTIONING AND COGNITIVE
ASSESSMENT USING IRT-BASED METHODS
  • Jeanne Teresi, Ed.D., Ph.D.
  • Katja Ocepek-Welikson, M.Phil.

2
PART I: OVERVIEW
Jeanne Teresi, Ed.D., Ph.D.
3
  • A recent report on national healthcare
    disparities (DHHS, Agency for Healthcare Research
    and Quality, National Healthcare Disparities
    Report, 2003) concluded that
  • Disparities in the health care system are
    pervasive
  • Racial, ethnic and socioeconomic disparities are
    national problems that affect health care
  • Differential item functioning analyses are
    important in health disparities research

4
USES OF DIF ANALYSES
  • EVALUATE EXISTING MEASURES
  • DEVELOP NEW MEASURES THAT AIM TO BE
  • Culture fair
  • Gender equivalent
  • Age invariant

5
DIF METHODS
  • There are numerous review articles and books
    related to DIF. A few are:
  • Camilli and Shepard, 1994
  • Holland and Wainer, 1993
  • Millsap and Everson, 1993
  • Potenza and Dorans, 1995
  • Thissen, Steinberg and Wainer, 1993

6
DEFINITIONS
  • DIF INVOLVES THREE FACTORS:
  • Response to an item
  • Conditioning/matching cognitive status variable
  • Background (grouping) variable(s)
  • DIF can be defined as conditional probabilities
    or conditional expected item scores that vary
    across groups.
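In notation (a sketch of the usual formal statement, with Y_i the response to item i, theta the conditioning cognitive status variable, and G the grouping variable), absence of DIF means

  P(Y_i = y \mid \theta, G = \text{reference}) \;=\; P(Y_i = y \mid \theta, G = \text{focal}) \quad \text{for all } \theta \text{ and } y

DIF is present when this conditional equality fails over some range of theta.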

7
CONTROLLING FOR LEVEL OF COGNITIVE STATUS, IS
RESPONSE TO AN ITEM RELATED TO GROUP MEMBERSHIP?
A randomly-selected person of average cognitive
function interviewed in Spanish should have the
same chance of responding in the unimpaired
direction to a cognitive status item as would a
randomly selected person of average function
interviewed in English.
8
EXAMPLE
  • Contingency table that examines the
    cross-tabulation of item response by group
    membership for every level (or grouped levels) of
    the attribute estimate

9
Two-by-two contingency table for the item "Does
not State Correct State" by language group,
conditioning on the MMSE summary score (score
levels 8 to 12)
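As an illustration only (a hypothetical data frame with columns item, language, and mmse_total, not the study data), such conditional cross-tabulations can be built by stratifying on grouped score levels:

  import pandas as pd

  # Hypothetical respondent-level data: 'item' is scored 0 = incorrect,
  # 1 = correct; 'language' is the grouping variable; 'mmse_total' is the
  # conditioning summary score.
  df = pd.DataFrame({
      "item": [1, 0, 1, 1, 0, 0, 1, 0],
      "language": ["English", "Spanish"] * 4,
      "mmse_total": [9, 9, 11, 12, 8, 10, 12, 11],
  })

  # Group the conditioning score into levels (e.g., 8-12 as on this slide)
  df["score_level"] = pd.cut(df["mmse_total"], bins=[7, 12, 30],
                             labels=["8-12", "13-30"])

  # One item-by-group contingency table per score level
  for level, sub in df.groupby("score_level", observed=True):
      print(f"MMSE level {level}")
      print(pd.crosstab(sub["language"], sub["item"]))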
10
UNIFORM DIF DEFINITIONS
  • DIF is in the same direction across the entire
    spectrum of disability (item response curves for
    two groups do not cross)
  • DIF involves the location (b) parameters
  • DIF is a significant main (group) effect in
    regression analyses predicting item response
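One way to write the regression formulation in the last bullet (a sketch, with theta the matching cognitive status estimate and G a group indicator):

  \operatorname{logit} P(Y_i = 1 \mid \theta, G) = \beta_0 + \beta_1 \theta + \beta_2 G + \beta_3 (\theta \times G)

Uniform DIF corresponds to a non-zero group main effect (beta_2 != 0) with no interaction (beta_3 = 0).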

11
(No Transcript)
12
  • The probability of a randomly selected Spanish
    speaking person with mild cognitive dysfunction
    (theta = 0) responding incorrectly to the item
    "Does not State Correct State" is higher (.45)
    than for a randomly selected English speaking
    person (.09) at the same cognitive dysfunction
    level. (Given equal cognitive dysfunction,
    Spanish speaking respondents are more likely than
    English speaking respondents to make an error.)
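A minimal sketch of where such probabilities come from under a two-parameter logistic (2PL) item response model; the parameter values below are hypothetical, chosen only to reproduce the figures quoted above:

  import numpy as np

  def p_2pl(theta, a, b):
      # Probability of the keyed (here, incorrect) response under the 2PL model
      return 1.0 / (1.0 + np.exp(-a * (theta - b)))

  # Hypothetical location (b) parameters for the two language groups; the
  # slope (a) is shared, which is what uniform DIF looks like.
  a = 1.0
  b_spanish, b_english = 0.20, 2.31

  theta = 0.0  # mild cognitive dysfunction, as in the text above
  print(round(p_2pl(theta, a, b_spanish), 2))  # ~0.45
  print(round(p_2pl(theta, a, b_english), 2))  # ~0.09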

13
NON-UNIFORM DIF
  • An item favors one group at certain disability
    levels, and other groups at other levels (or the
    probability of item endorsement is higher for
    group 1 at lower ability and higher for group 2
    at higher ability)
  • DIF involves the discrimination (a) parameters
  • DIF is a significant group by ability interaction
    in regressions predicting item response
  • DIF is assessed by examination of nested models
    comparing differences in log-likelihoods
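As a sketch of the nested-model comparison, using an observed-score logistic regression analogue rather than the IRT likelihood-ratio procedure itself (the data and variable names are synthetic, for illustration only):

  import numpy as np
  import pandas as pd
  import statsmodels.formula.api as smf
  from scipy.stats import chi2

  rng = np.random.default_rng(0)
  n = 500
  theta = rng.normal(size=n)                # cognitive status estimate
  group = rng.integers(0, 2, size=n)        # 0 = reference, 1 = focal
  # Simulate an item whose discrimination differs by group (non-uniform DIF)
  logit = (1.0 + 0.8 * group) * theta - 0.5
  y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
  data = pd.DataFrame({"y": y, "theta": theta, "group": group})

  # Compact model: no interaction; augmented model: adds the group x theta term
  compact = smf.logit("y ~ theta + group", data=data).fit(disp=False)
  augmented = smf.logit("y ~ theta + group + theta:group", data=data).fit(disp=False)

  g2 = 2 * (augmented.llf - compact.llf)    # likelihood-ratio (difference) statistic
  p = chi2.sf(g2, df=1)                     # one extra free parameter
  print(f"G2 = {g2:.2f}, p = {p:.4f}")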

14
(No Transcript)
15
(No Transcript)
16
MAGNITUDE
  • Magnitude of DIF
  • An item-level characteristic, e.g.,
  • odds ratio,
  • area statistic,
  • beta coefficient or R-square increment,
  • expected item scores

17
(No Transcript)
18
IMPACT
  • Impact in the context of cognitive measures
  • Differences in the cognitive status distributions
    and summary statistics between or among studied
    groups
  • Group differences in the total (test) response
    function
  • Group differences in relationship of demographic
    variables to cognitive status variables with and
    without adjustment for DIF

19
(No Transcript)
20
IRT-BASED METHODS
  • Likelihood ratio test based on IRT (Thissen,
    1991, 2001)
  • Based on examination of differences in fit
    between compact and augmented models that include
    additional free parameters representing
    non-uniform and uniform DIF
  • Latent conditioning variable

21
SOME ADVANTAGES OF IRTLR
  • Well-developed theoretical models
  • Can examine uniform and non-uniform DIF
  • No equating required because of simultaneous
    estimation of group parameters
  • Can model missing data
  • Simulations show superior performance (in terms
    of power, particularly with small sample sizes)
    in comparison with non-parametric methods (Bolt,
    2002)

22
POSSIBLE DISADVANTAGES OF IRTLR
  • Model must fit the data; misfit results in
    Type I error inflation (Bolt, 2002)
  • Requires categorical group variable
  • Assumptions must be met
  • Magnitude measures not as well-integrated
  • No formal magnitude summary measure or
    guidelines

23
AREA AND DFIT METHODS
  • Area and DFIT methods based on IRT model with
    latent conditioning variable (Raju and
    colleagues, 1995; Flowers and colleagues, 1999)
  • Non-compensatory DIF (NCDIF) indices
  • average squared differences in item true or
    expected raw scores for individuals as members of
    the focal group and as members of the reference
    group
  • (expected score is the sum of the (weighted)
    probabilities of category endorsement,
    conditional on disability).
  • Differential test functioning (DTF)
  • based on the compensatory DIF (CDIF) index and
    reflects group differences summed across items

24
SOME ADVANTAGES OF DFIT
  • Can detect both uniform and non-uniform DIF, and
    shares the advantages of IRT models upon which it
    is based
  • Magnitude measures used for DIF detection
  • Impact of item DIF on the total score is examined
  • One simulation study (in comparison with IRTLR)
    showed favorable performance in terms of false
    DIF detection (Bolt, 2002)

25
SOME DISADVANTAGES OF DFIT
  • Requires parameter equating
  • Many programs required for DIF testing
  • Model misfit will result in false DIF detection
  • χ2 statistical tests are affected by sample
    size, and identification of optimal cut-points
    for DIF detection requires further simulation
26
DIFFERENCES AMONG DIF METHODS CAN BE
CHARACTERIZED ACCORDING TO WHETHER THEY
  • Are parametric or non-parametric
  • Are based on latent or observed variables
  • Treat the disability dimension as continuous
  • Can model multiple traits
  • Can detect both uniform and non-uniform DIF
  • Can examine polytomous responses
  • Can include covariates in the model
  • Must use a categorical studied (group) variable

27
CONCLUSIONS
  • DIF cancellation at the aggregate level may still
    have an impact on an individual
  • DIF assessment of measures remains a critical
    component of health disparities research, and of
    efforts to achieve cultural equivalence in an
    increasingly culturally diverse society

28
PART II: STEPS IN IRTLRDIF ANALYSIS
Katja Ocepek-Welikson, M.Phil.
29
IRTLRDIF ANALYSIS
The underlying procedure of IRTLRDIF is a series
of comparisons of compact and augmented models.
Likelihood ratio tests are used for the comparisons,
resulting in the goodness-of-fit statistic G2,
which is distributed as a χ2.
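Written out (a standard form of the statistic):

  G^2 = -2\left[\ln L(\text{compact}) - \ln L(\text{augmented})\right] \;\sim\; \chi^2_{df}

where df equals the number of additional free parameters in the augmented model.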
30
STEP 1: NO ANCHOR ITEMS DEFINED
STEP 1a: The first comparison is between a model
with all parameters constrained to be equal for
the two groups, including the studied item, and
a model with separate estimation of all
parameters for the studied item. IRTLRDIF is
designed using stringent criteria for DIF
detection, so that if any model comparison
results in a χ2 value greater than 3.84 (d.f. =
1), indicating that at least one parameter
differs between the two groups at the .05 level,
the item is assumed to have DIF.
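The 3.84 criterion is the .05 critical value of a χ2 distribution on 1 degree of freedom, which can be verified directly:

  from scipy.stats import chi2

  print(round(chi2.ppf(0.95, df=1), 2))  # 3.84, the cutoff used above
  print(round(chi2.sf(3.84, df=1), 3))   # ~0.05, the corresponding p value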
31
STEP 1b: If there is any DIF, further model
comparisons are performed. STEP 1c: Two-parameter
models, test of DIF in the a parameter: the
model with all parameters constrained is compared
to a model in which the a parameter (slope or
discrimination) is constrained to be equal and
the b parameter (difficulty or threshold) is
estimated freely.
32
STEP 1d: The same approach is followed for the
test of DIF in the b parameters. The
a parameters are constrained equal and the b
parameters are free to be estimated as different.
The G2 for this last model is derived by
subtracting the G2 for evaluation of the a
parameters from the overall G2 value evaluating
any difference (G2 all equal - G2 a's equal).
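Restating that subtraction in notation (one degree of freedom remains for the b test, since the overall test for a dichotomous 2PL item has two):

  G^2_b = G^2_{\text{all equal}} - G^2_{\text{a equal}}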
33
STEP 2: ANCHOR ITEM SET
  • For all models, all items are constrained to be
    equal within the anchor set
  • Anchor items are defined as those with a G2
    value of 3.84 or less for the overall test
    of all parameters equal versus all parameters
    free for the studied item (for a dichotomous item
    under the 2p model, d.f. = 2)

34
ANCHOR ITEM SET, cont.
This may result in the selection of a very small
anchor set for some comparisons. Therefore, these
criteria may be relaxed somewhat, and the results
of the individual parameter tests examined
rather than the overall result. If significant
DIF is observed for the a's or b's, using
appropriate degrees of freedom, then the item
will be excluded from the anchor set.
35
FINAL ANCHOR ITEM SET
Even if anchor items were identified prior to the
analyses using IRTLRDIF, additional items with
DIF may be identified. All of the items in the
anchor test are again evaluated, following the
procedures described in step 1, in order to
exclude any additional items with DIF, and to
finalize the anchor set.
36
STEP 3: FINAL TESTS FOR DIF
After the anchor item set is defined, all of the
remaining (non-anchor) items are evaluated for
DIF against this anchor set. Some items that have
been identified as having DIF in earlier stages
of the analyses can convert to non-DIF with the
use of a purified anchor set. (It is noted that
the studied item is modeled along with the anchor
items, so that parameter estimates are based on
the anchor item set with inclusion of the studied
item.)
37
STEP 4: ADJUSTMENT FOR MULTIPLE COMPARISONS
Items with values of G2 indicative of DIF in this
last stage are subject to adjustment of p values
for multiple comparisons in order to reduce
over-identification of items with DIF. The
Bonferroni, Benjamini-Hochberg, or another
comparable method to control for false discovery
can be used.
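A sketch of how such an adjustment might be applied to the final-stage results (the G2 values below are made up for illustration):

  from scipy.stats import chi2
  from statsmodels.stats.multitest import multipletests

  # Hypothetical overall G2 values from the final DIF tests, one per studied item
  g2_values = [10.2, 4.1, 0.8, 6.5, 2.3]
  p_values = [chi2.sf(g2, df=2) for g2 in g2_values]  # d.f. 2 for a dichotomous 2PL item

  # Benjamini-Hochberg control of the false discovery rate at .05
  reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
  for p, flag in zip(p_adjusted, reject):
      print(round(p, 4), flag)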
38
STEP 5: MULTILOG RUN TO OBTAIN FINAL PARAMETER
ESTIMATES
  • In order to obtain the final item parameter
    estimates, an additional MULTILOG run has to be
    performed
  • Parameters are estimated simultaneously for the two
    groups
  • Parameters for anchor items are set to be
    estimated equal for the two groups
  • Parameters for items with DIF are estimated
    separately (if only the b parameters show DIF, the
    a's are set as equal)

39
SUMMARY OF STEPS IN DFIT ANALYSIS
  • Perform an assessment of dimensionality
  • Perform IRT analyses to obtain parameters and
    disability estimates; perform analyses separately
    for each group (both PARSCALE and MULTILOG can be
    used)
  • Equate the parameters (Baker's EQUATE program
    was used in this step)
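The equating step places the focal-group parameters on the reference-group metric through a linear transformation (a sketch of the standard linking form; the constants A and B are what the equating program estimates):

  \theta^{*} = A\theta + B, \qquad a^{*} = a / A, \qquad b^{*} = A b + B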

40
DFIT STEP, cont.
  • Identify DIF using DFIT (DFIT5P was used)
  • Identify anchor items that are relatively
    DIF-free, using NCDIF cutoffs rather than the χ2
    significance tests that are available
  • Purify the equating constants by re-equating
  • Perform DFIT again

41
DFIT STEP, cont.
  • Examine the NCDIF cutoffs to determine items with
    DIF
  • Examine CDIF and DTF to determine if values
    exceed the cutoff, indicating differential test
    (scale) functioning
  • If DTF exceeds the cutoff, examine the removal index to
    identify items that might be removed

42
DFIT STEP, cont.
  • Calculate expected item scores; sum the expected
    item scores to obtain an expected test (scale)
    score for each group, separately
  • Plot the expected scale scores against theta
    (disability) for each group
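A minimal sketch of these last two steps for a short dichotomous scale (the item parameters are hypothetical; matplotlib is used for the plot):

  import numpy as np
  import matplotlib.pyplot as plt

  def expected_item_score(theta, a, b):
      # For a dichotomous item the expected score is the 2PL probability
      return 1.0 / (1.0 + np.exp(-a * (theta - b)))

  # Hypothetical item parameters for each group; only the second item's b differs
  params = {
      "Reference": {"a": [1.2, 0.9, 1.5], "b": [-0.5, 0.3, 1.0]},
      "Focal":     {"a": [1.2, 0.9, 1.5], "b": [-0.5, 0.9, 1.0]},
  }

  theta = np.linspace(-3, 3, 121)
  for group, p in params.items():
      # Expected scale score = sum of expected item scores at each theta
      scale_score = sum(expected_item_score(theta, a, b)
                        for a, b in zip(p["a"], p["b"]))
      plt.plot(theta, scale_score, label=group)

  plt.xlabel("theta (disability)")
  plt.ylabel("Expected scale score")
  plt.legend()
  plt.show()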