Identifying Causes of Differential Item Functioning Using Optimal Appropriateness Measurement

1
Identifying Causes of Differential Item
Functioning Using Optimal Appropriateness
Measurement
  • Sasha Chernyshenko, Stephen Stark, and Fritz Drasgow
  • University of Illinois at Urbana-Champaign

2
Research Issue
  • DIF may bias ability estimates and adversely
    affect hiring decisions.
  • When DIF is found, practitioners are often
    advised to eliminate or replace suspect items.
  • However, item writing is time-consuming and expensive
  • There is no guarantee that revised items will not
    exhibit DIF
  • Thus, before revising a test, one should attempt
    to identify potential sources of DIF.

3
Causes of DIF
  • DIF occurs when subgroups differ on secondary
    dimensions that are unaccounted for by
    unidimensional models
  • Potential sources of DIF include:
  • Educational background
  • Test-taking strategies
  • Unmotivated responding

4
Overview of this Study
  • Examined unmotivated responding as a potential
    source of DIF on a national licensing exam
  • Unmotivated responding was modeled using optimal
    appropriateness measurement (OAM) methods
  • DIF results were compared before and after
    removing examinees who were identified as
    unmotivated

5
Factors that Affect Test-Taking Motivation
  • Beliefs about test validity and fairness
  • Characteristics of selection/certification
    process
  • In compensatory systems, good performance on one
    exam can make up for poor performance on another
  • For professional licensing, one must usually
    demonstrate competency in various subdomains
  • Multiple exams required
  • Not all exams must be passed at same time
  • Typically, a window of several months is allowed
    for passing

6
Factors that Affect Test-Taking Motivation
Professional Licensing Exam
  • Licensing exam consists of 4 subtests
  • To become certified, candidates must pass all
    subtests within an 18-month window
  • Must pass two subtests and earn minimal scores on
    the others, or all exams must be retaken
  • Offers incentives for examinees to engage in
    strategic preparation
  • Respondents may be unmotivated on two subtests

7
Unmotivated Responding Affects Psychometric
Properties of Exam
  • Affects classical test theory statistics and IRT
    item parameters
  • Increases exam dimensionality
  • Contributes to differential item/test functioning

8
Identifying Unmotivated Examinees
  • Optimal appropriateness measurement (OAM; Levine &
    Drasgow, 1988) can be used to identify unmotivated
    examinees
  • Method
  • Specify models for normal and unmotivated
    responding
  • For each examinee, compute marginal likelihood of
    response pattern for each model
  • Compute the likelihood ratio (LR) of the two models,
    and classify the examinee as motivated or unmotivated
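The three steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes 3PL item response functions with a scaling constant of 1.7, a standard normal ability distribution approximated by Gauss-Hermite quadrature, and, for the unmotivated model, two independent abilities. The function names and the `high` mask (marking the items on the subtests an examinee answered with effort) are hypothetical.

```python
import numpy as np

D = 1.7  # assumed logistic scaling constant


def normal_quadrature(n_points=41):
    """Nodes and weights for integrating against a standard normal density."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_points)
    return nodes, weights / np.sqrt(2.0 * np.pi)  # weights now sum to 1


def p3pl(theta, a, b, c):
    """3PL probability of a correct answer (rows: nodes, columns: items)."""
    z = D * a[None, :] * (theta[:, None] - b[None, :])
    p = c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))
    return np.clip(p, 1e-12, 1.0 - 1e-12)  # keep the logs finite


def marginal_likelihood(u, a, b, c, nodes, weights):
    """Marginal likelihood of response vector u under a single ability."""
    p = p3pl(nodes, a, b, c)
    loglik = (u * np.log(p) + (1.0 - u) * np.log(1.0 - p)).sum(axis=1)
    return float(np.sum(weights * np.exp(loglik)))


def likelihood_ratio(u, a, b, c, high, n_points=41):
    """LR = unmotivated-model likelihood / normal-model likelihood."""
    nodes, weights = normal_quadrature(n_points)
    normal = marginal_likelihood(u, a, b, c, nodes, weights)
    low = ~high
    # Assumption: the two abilities are independent, so the likelihood factors.
    unmotivated = (
        marginal_likelihood(u[high], a[high], b[high], c[high], nodes, weights)
        * marginal_likelihood(u[low], a[low], b[low], c[low], nodes, weights)
    )
    return unmotivated / normal
```

An examinee would then be classified as unmotivated when the returned ratio exceeds the chosen cut score.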

9
Marginal Likelihood for Normal Model
  • Assume a single, general ability underlies
    performance on all four subtests

[Equation not captured in the transcript]
where the ability density is a standard normal density function, and
n is the number of items in a subtest
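For orientation, a marginal likelihood of this kind is conventionally written as below. This is a sketch under assumptions, not the slide's own equation: the symbols u_{si} (scored response to item i of subtest s), n_s (items in subtest s), P_{si} (the 3PL response function with assumed scaling constant D), and phi (the standard normal density) are notation chosen here for illustration.

```latex
P_{\text{normal}}(\mathbf{u})
  = \int_{-\infty}^{\infty}
    \prod_{s=1}^{4} \prod_{i=1}^{n_s}
    P_{si}(\theta)^{u_{si}} \,\bigl[1 - P_{si}(\theta)\bigr]^{1 - u_{si}}
    \, \phi(\theta) \, d\theta,
\qquad
P_{si}(\theta) = c_{si} + \frac{1 - c_{si}}{1 + \exp\bigl[-D a_{si} (\theta - b_{si})\bigr]}
```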
10
Marginal Likelihood for Aberrant Model
  • Assume examinee is unmotivated on two of four
    subtests, and, thus, responds based on two
    separate abilities
  • m1 and m2 represent the numbers of items on the two
    highest-scoring subtests
  • m3 and m4 represent the numbers of items on the two
    lowest-scoring subtests
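A two-ability marginal likelihood consistent with this description could take the following form. It is a hedged sketch rather than the slide's equation: H and L (the item sets of the two highest and two lowest subtests, containing m1 + m2 and m3 + m4 items), the joint ability density f, and the response functions P_i are notation assumed here. The slides do not state how the two abilities are distributed.

```latex
P_{\text{aberrant}}(\mathbf{u})
  = \iint
    \Bigl[\prod_{i \in H} P_i(\theta_1)^{u_i} \bigl[1 - P_i(\theta_1)\bigr]^{1 - u_i}\Bigr]
    \Bigl[\prod_{i \in L} P_i(\theta_2)^{u_i} \bigl[1 - P_i(\theta_2)\bigr]^{1 - u_i}\Bigr]
    f(\theta_1, \theta_2) \, d\theta_1 \, d\theta_2,
\qquad
LR = \frac{P_{\text{aberrant}}(\mathbf{u})}{P_{\text{normal}}(\mathbf{u})}
```

If the two abilities are taken to be independent standard normal variables, the double integral factors into the product of two single-ability integrals.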

11
OAM Analyses of Licensing Exam Data
  • Data: N = 40,029
  • Estimated 3PLM item parameters using BILOG
  • Computed LR value for each examinee
  • Based on simulation study, chose LR = 10 as cut
    score for classification (1% FP)
  • If LR > 10, then unmotivated
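One generic way to derive a cut score from a simulation is to take an upper quantile of the LR values obtained for simulated motivated examinees. The sketch below illustrates that idea only; the slides do not describe the authors' simulation design, the false-positive (FP) rate is read here as a percentage, and both function names are hypothetical.

```python
import numpy as np


def cut_score_for_fp_rate(lr_motivated, fp_rate):
    """LR value exceeded by roughly `fp_rate` of simulated motivated examinees."""
    values = np.asarray(lr_motivated, dtype=float)
    return float(np.quantile(values, 1.0 - fp_rate))


def flag_unmotivated(lr_values, cut_score):
    """Boolean array: True where an examinee's LR exceeds the cut score."""
    return np.asarray(lr_values, dtype=float) > cut_score
```

A looser FP rate (for example 0.05 instead of 0.01) gives a lower cut score and therefore flags more examinees.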

12
DIF Analyses
  • Randomly sampled groups of White and Black
    examinees (N = 1600) for one subtest (121 items)
  • Used ITERLINK program to link the metrics and
    compute Lord's chi-square DIF statistics
  • Removed 440 examinees with LR > 10
  • Repeated DIF analyses using only motivated
    examinees (N = 1160)
  • To control for sample size sensitivity, repeated
    the DIF analyses with random samples of N = 1160
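Lord's chi-square compares an item's parameter estimates across two groups after the metrics have been linked. The sketch below shows the textbook form of the statistic for a single item. It does not reproduce ITERLINK, whose internals the slides do not describe. Comparing only the discrimination and difficulty estimates (2 degrees of freedom) is an assumption, and the function names are hypothetical.

```python
import numpy as np


def lords_chi_square(est_ref, est_focal, cov_ref, cov_focal):
    """d' (S_ref + S_focal)^-1 d, where d is the difference in estimates."""
    d = np.asarray(est_focal, dtype=float) - np.asarray(est_ref, dtype=float)
    pooled = np.asarray(cov_ref, dtype=float) + np.asarray(cov_focal, dtype=float)
    return float(d @ np.linalg.solve(pooled, d))


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return float(np.exp(-stat / 2.0))


# Made-up example: (a, b) estimates and their sampling covariance matrices.
stat = lords_chi_square(
    est_ref=[1.0, 0.0],
    est_focal=[1.0, 0.5],
    cov_ref=np.diag([0.01, 0.02]),
    cov_focal=np.diag([0.01, 0.02]),
)
p_value = p_value_df2(stat)
```

An item would be flagged for DIF when its p-value falls below the chosen significance level. Because the statistic depends on the sampling covariance of the estimates, it is sensitive to sample size, which is why the comparison samples are matched at N = 1160.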

13
Results and Conclusions
  • Results
  • Initial sample (N = 1600): 57 DIF items
  • Motivated only (N = 1160): 20 DIF items
  • Undifferentiated (N = 1160): 32 DIF items
  • Conclusions
  • Unmotivated responding increased the number of
    items identified as problematic
  • If a 5% FP rate had been chosen, the number of DIF
    items would have decreased further
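To put the counts in proportion, the arithmetic below assumes that each count is out of the 121 items on the subtest analyzed, which the slides imply but do not state outright.

```latex
\begin{align*}
\text{Initial sample:}   \quad & 57/121 \approx 47.1\% \\
\text{Motivated only:}   \quad & 20/121 \approx 16.5\% \\
\text{Undifferentiated:} \quad & 32/121 \approx 26.4\% \\
57 - 32 &= 25 \quad \text{(decrease seen with the smaller random samples)} \\
32 - 20 &= 12 \quad \text{(further decrease when only motivated examinees are kept)}
\end{align*}
```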

14
Implications
  • Findings of DIF don't necessarily indicate
    problems with item content
  • Extraneous factors may induce aberrant
    responding (e.g., low motivation, faking)
  • When considering test revision, one should:
  • Remove aberrant examinees before analyzing the
    psychometric properties of a test
  • Attempt to control factors that contributed to
    aberrant responding