Title: Identifying Causes of Differential Item Functioning Using Optimal Appropriateness Measurement
- Sasha Chernyshenko, Stephen Stark, and Fritz Drasgow
- University of Illinois at Urbana-Champaign
2. Research Issue
- DIF may bias ability estimates and adversely affect hiring decisions.
- When DIF is found, practitioners are often advised to eliminate or replace suspect items.
  - But item writing is time-consuming and expensive.
  - There is no guarantee that revised items would not exhibit DIF.
- Thus, before revising a test, one should attempt to identify potential sources of DIF.
3. Causes of DIF
- DIF occurs when subgroups differ on secondary dimensions that are not accounted for by unidimensional models.
- Potential sources of DIF include:
  - Educational background
  - Test-taking strategies
  - Unmotivated responding
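The mechanism in the first bullet can be illustrated numerically. The sketch below assumes a hypothetical compensatory two-dimensional logistic item, which is not the model used in this study. Two groups are matched on the primary ability θ1 but differ in the mean of a secondary ability θ2. Averaging over θ2 gives each group's probability of a correct answer at a fixed θ1. Those probabilities differ (DIF) only when the item also loads on θ2 (`a2 != 0`). The parameter values are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def p_correct_given_primary(theta1, a1, a2, d, mu2, n_nodes=41):
    """P(correct | theta1) averaged over a secondary ability theta2 ~ N(mu2, 1).

    Hypothetical compensatory two-dimensional logistic item:
        logit P = a1 * theta1 + a2 * theta2 + d
    """
    z, w = hermegauss(n_nodes)          # Gauss rule for weight exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)        # weights now integrate against N(0, 1)
    theta2 = mu2 + z                    # quadrature nodes for N(mu2, 1)
    p = 1.0 / (1.0 + np.exp(-(a1 * theta1 + a2 * theta2 + d)))
    return float(np.sum(w * p))

# Two groups matched on the primary ability (theta1 = 0) with different
# secondary-ability means: the item functions differently only when a2 != 0.
for a2 in (0.0, 0.8):
    p_ref = p_correct_given_primary(0.0, a1=1.0, a2=a2, d=0.0, mu2=0.0)
    p_foc = p_correct_given_primary(0.0, a1=1.0, a2=a2, d=0.0, mu2=-0.5)
    print(f"a2 = {a2}: reference {p_ref:.3f}, focal {p_foc:.3f}")
```

A unidimensional model that conditions only on θ1 cannot absorb this group difference. The difference therefore surfaces as DIF, even though the item content itself is the same for both groups.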
4. Overview of This Study
- Examined unmotivated responding as a potential source of DIF on a national licensing exam.
- Modeled unmotivated responding using optimal appropriateness measurement (OAM) methods.
- Compared DIF results before and after removing examinees who were identified as unmotivated.
5. Factors that Affect Test-Taking Motivation
- Beliefs about test validity and fairness
- Characteristics of the selection/certification process
  - In compensatory systems, good performance on one exam can make up for poor performance on another.
  - For professional licensing, one must usually demonstrate competency in various subdomains:
    - Multiple exams are required.
    - Not all exams must be passed at the same time.
    - Typically, a window of several months is allowed for passing.
6. Factors that Affect Test-Taking Motivation: Professional Licensing Exam
- The licensing exam consists of 4 subtests.
- To become certified, candidates must pass all subtests within an 18-month window.
- Candidates must pass two subtests and earn minimal scores on the others, or all exams must be retaken.
- This rule offers incentives for examinees to engage in strategic preparation.
  - Respondents may therefore be unmotivated on two subtests.
7. Unmotivated Responding Affects the Psychometric Properties of the Exam
- Affects classical test theory statistics and IRT item parameters
- Increases exam dimensionality
- Contributes to differential item/test functioning
8. Identifying Unmotivated Examinees
- Optimal appropriateness measurement (OAM; Levine & Drasgow, 1988) can be used to identify unmotivated examinees.
- Method:
  - Specify models for normal and unmotivated responding.
  - For each examinee, compute the marginal likelihood of the response pattern under each model.
  - Compute the likelihood ratio (LR) and classify the examinee as motivated or unmotivated.
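The classification step can be sketched as follows. The sketch assumes that each examinee's log marginal likelihood under the normal and the aberrant model has already been computed. LR is taken as P(u | aberrant) / P(u | normal), so large values point to unmotivated responding. The function name `classify_by_lr` is hypothetical. The default cut of 10 mirrors the value reported for the licensing-exam analyses.

```python
import numpy as np

def classify_by_lr(log_lik_normal, log_lik_aberrant, cut=10.0):
    """Classify examinees by LR = P(u | aberrant) / P(u | normal).

    Inputs are log marginal likelihoods, one per examinee. Comparing on the
    log scale avoids overflow when the LR itself is very large.
    """
    log_lr = np.asarray(log_lik_aberrant, float) - np.asarray(log_lik_normal, float)
    unmotivated = log_lr > np.log(cut)   # LR > cut  <=>  log LR > log(cut)
    return log_lr, unmotivated

log_lr, flags = classify_by_lr([-50.0, -60.0], [-51.0, -57.0])
print(np.exp(log_lr), flags)   # second examinee has LR = e^3 (about 20), above 10
```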
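9. Marginal Likelihood for Normal Model
- Assume a single, general ability θ underlies performance on all four subtests:

$$P(\mathbf{u} \mid \text{Normal}) = \int_{-\infty}^{\infty} \prod_{k=1}^{4} \prod_{i=1}^{n_k} P_{ki}(\theta)^{u_{ki}} \left[1 - P_{ki}(\theta)\right]^{1-u_{ki}} \phi(\theta)\, d\theta$$

where $P_{ki}(\theta)$ is the 3PLM probability of a correct response to item $i$ of subtest $k$, $u_{ki}$ is the scored (0/1) response, $\phi$ is the standard normal density function, and $n_k$ is the number of items in subtest $k$.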
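The normal-model marginal likelihood integrates the 3PLM response-pattern likelihood over a single standard normal ability. A minimal sketch approximates that integral with Gauss-Hermite quadrature; the integration method is an assumption, since the slides do not say how it was computed. Several other details are also assumptions: the scaling constant `D = 1.7`, the number of quadrature nodes, and all function names. Item parameters for all four subtests are passed as one concatenated set of arrays.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

D = 1.7  # logistic scaling constant; assumed, the slides do not state it

def p_3pl(theta, a, b, c):
    """3PLM probability of a correct answer, shape (len(theta), n_items)."""
    theta = np.asarray(theta, dtype=float)[:, None]
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def log_normal_marginal_likelihood(u, a, b, c, n_nodes=41):
    """log P(u | normal model): one N(0, 1) ability for all items of all subtests."""
    u = np.asarray(u)
    nodes, weights = hermegauss(n_nodes)      # Gauss rule for weight exp(-x^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)  # now integrates against phi(x)
    p = np.clip(p_3pl(nodes, a, b, c), 1e-12, 1.0 - 1e-12)
    # Log-likelihood of the whole response pattern at each quadrature node
    ll = np.sum(np.where(u == 1, np.log(p), np.log1p(-p)), axis=1)
    m = ll.max()                              # log-sum-exp guards against underflow
    return float(m + np.log(np.sum(weights * np.exp(ll - m))))
```

The function works on the log scale. With a few hundred items, the raw product of item probabilities would underflow to zero, whereas the log-likelihood difference between the two models is what the LR needs anyway.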
10. Marginal Likelihood for Aberrant Model
- Assume the examinee is unmotivated on two of the four subtests and, thus, responds based on two separate abilities.
  - m1 and m2 represent the numbers of items on the two highest-scoring subtests.
  - m3 and m4 represent the numbers of items on the two lowest-scoring subtests.
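One way to compute the aberrant-model likelihood is sketched below, under assumptions the slides do not confirm. The m1 + m2 items of the two highest-scoring subtests are assumed to depend on θ1, and the m3 + m4 items of the two lowest-scoring subtests on θ2. The pair (θ1, θ2) is assumed to be standard bivariate normal with a hypothetical correlation `rho`. The double integral uses a product Gauss-Hermite grid with θ2 = ρ·z1 + √(1 − ρ²)·z2. All function names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

D = 1.7  # logistic scaling constant; assumed

def p_3pl(theta, a, b, c):
    """3PLM probability of a correct answer, shape (len(theta), n_items)."""
    theta = np.asarray(theta, dtype=float)[:, None]
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def item_loglik(u, theta, a, b, c):
    """Log-likelihood of response vector u at each value of theta."""
    u = np.asarray(u)
    p = np.clip(p_3pl(theta, a, b, c), 1e-12, 1.0 - 1e-12)
    return np.sum(np.where(u == 1, np.log(p), np.log1p(-p)), axis=1)

def log_aberrant_marginal_likelihood(u_high, a_high, b_high, c_high,
                                     u_low, a_low, b_low, c_low,
                                     rho=0.0, n_nodes=31):
    """log P(u | aberrant model): theta1 drives the high-scoring subtests,
    theta2 the low-scoring ones; (theta1, theta2) ~ bivariate N(0, 1, rho)."""
    z, w = hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)                  # 1-D weights for N(0, 1)
    z1, z2 = np.meshgrid(z, z, indexing="ij")     # product grid over (z1, z2)
    w2 = np.outer(w, w).ravel()
    theta1 = z1.ravel()
    theta2 = (rho * z1 + np.sqrt(1.0 - rho**2) * z2).ravel()  # correlated theta2
    ll = (item_loglik(u_high, theta1, a_high, b_high, c_high)
          + item_loglik(u_low, theta2, a_low, b_low, c_low))
    m = ll.max()                                  # log-sum-exp for stability
    return float(m + np.log(np.sum(w2 * np.exp(ll - m))))
```

Setting `rho=1.0` makes θ2 identical to θ1, so the aberrant model collapses to the single-ability normal model. This gives a quick check that the grid is wired correctly. With ρ below 1, the model can fit a pattern in which one pair of subtests is answered much worse than the other.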
11. OAM Analyses of Licensing Exam Data
- Data: N = 40,029
- Estimated 3PLM item parameters using BILOG
- Computed the LR value for each examinee
- Based on a simulation study, chose LR = 10 as the cut score for classification (1% false-positive rate)
  - If LR > 10, the examinee is classified as unmotivated.
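The slides do not describe the simulation design. A common way to tie a cut score to a target false-positive rate is to simulate normal (motivated) examinees, compute their LR values, and take the (1 − FP rate) quantile. The sketch below assumes that approach; the function names are hypothetical.

```python
import numpy as np

def cut_for_fp_rate(lr_simulated_normal, fp_rate=0.01):
    """LR cut that flags about fp_rate of simulated normal (motivated) examinees."""
    return float(np.quantile(np.asarray(lr_simulated_normal, float), 1.0 - fp_rate))

def observed_fp_rate(lr_simulated_normal, cut):
    """Share of simulated normal examinees whose LR exceeds the cut."""
    return float(np.mean(np.asarray(lr_simulated_normal, float) > cut))

lr_sim = np.arange(1, 101)          # stand-in LR values, not study data
cut = cut_for_fp_rate(lr_sim, 0.01)
print(cut, observed_fp_rate(lr_sim, cut))
```

Calling `cut_for_fp_rate(lr_sim, 0.05)` instead gives a lower cut. A lower cut flags and removes more examinees, which is the trade-off raised in the conclusions about a 5% FP rate.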
12. DIF Analyses
- Randomly sampled groups of White and Black examinees (N = 1,600) for one subtest (121 items)
- Used the ITERLINK program to link the metrics and compute Lord's chi-square DIF statistics
- Removed 440 examinees with LR > 10
- Repeated the DIF analyses using only motivated examinees (N = 1,160)
- Redid the DIF analyses with random samples of N = 1,160 to control for sensitivity to sample size
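The slides do not describe ITERLINK's internals. For a single item, Lord's chi-square compares the item's parameter estimates across groups once both sets are on a common metric. The sketch below assumes the linking step is already done. It also assumes the lower asymptote c is held equal across groups, so only (a, b) are compared, with 2 degrees of freedom. The statistic is v′(Σ_R + Σ_F)⁻¹v, where v is the difference in parameter estimates and the Σ are their sampling covariance matrices. Function names and numbers are illustrative.

```python
import numpy as np

def lords_chi_square(params_ref, cov_ref, params_focal, cov_focal):
    """Lord's chi-square for one item: v' (S_ref + S_focal)^{-1} v,
    where v is the difference in linked item parameter estimates."""
    v = np.asarray(params_ref, float) - np.asarray(params_focal, float)
    s = np.asarray(cov_ref, float) + np.asarray(cov_focal, float)
    return float(v @ np.linalg.solve(s, v))

def p_value_df2(chi2):
    """Chi-square survival function with 2 df, which is exactly exp(-x / 2)."""
    return float(np.exp(-chi2 / 2.0))

def critical_df2(alpha):
    """Critical value for 2 df, obtained by inverting exp(-x / 2) = alpha."""
    return float(-2.0 * np.log(alpha))

# Hypothetical (a, b) estimates and covariance matrices for one item
chi2 = lords_chi_square([1.2, 0.3], np.diag([0.005, 0.005]),
                        [1.0, 0.0], np.diag([0.005, 0.005]))
print(chi2, p_value_df2(chi2), chi2 > critical_df2(0.05))
```

The statistic shrinks as the covariance matrices grow. Smaller samples produce larger sampling variances and so tend to flag fewer items, which is why the comparison at a fixed N = 1,160 matters.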
13. Results and Conclusions
- Results (number of items flagged for DIF):
  - Initial sample (N = 1,600): 57 DIF items
  - Motivated examinees only (N = 1,160): 20 DIF items
  - Undifferentiated random samples (N = 1,160): 32 DIF items
- Conclusions:
  - Unmotivated responding increased the number of items identified as problematic.
  - If a 5% FP rate had been chosen, the number of DIF items would have decreased further.
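The comparison behind the first conclusion can be made explicit with a little arithmetic on the reported counts. The undifferentiated samples have the same N as the motivated-only sample. The gap between those two rows is therefore not a sample-size effect. The gap between the initial and undifferentiated rows reflects the smaller N. The slides do not say whether the undifferentiated count comes from one random sample or summarizes several, so it is used here as reported.

```python
N_ITEMS = 121  # items on the subtest analyzed for DIF

# DIF counts as reported in the results
dif_counts = {"initial": 57, "undifferentiated": 32, "motivated": 20}

for label, count in dif_counts.items():
    print(f"{label}: {count} items ({100.0 * count / N_ITEMS:.1f}% of {N_ITEMS})")

# Same N = 1,160 in both rows, so this gap is not a sample-size effect
motivation_effect = dif_counts["undifferentiated"] - dif_counts["motivated"]  # 12
# Drop that accompanies reducing N from 1,600 to 1,160
sample_size_effect = dif_counts["initial"] - dif_counts["undifferentiated"]   # 25
# Share of the initial sample removed as unmotivated (LR > 10)
removed_share = 440 / 1600
```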
14. Implications
- Findings of DIF don't necessarily indicate problems with item content.
- Extraneous factors (e.g., low motivation, faking) may induce aberrant responding.
- When considering test revision, one should:
  - Remove aberrant examinees before analyzing the psychometric properties of a test
  - Attempt to control the factors that contributed to aberrant responding