Title: Detection of Item Degradation
1. Detection of Item Degradation
- Yongwei Yang
- Abdullah Ferdous
- Tzu-Yun Chin
- University of Nebraska-Lincoln
- In T. L. Hayes (Chair), Item degradation impact, detection, and mitigation, an academic-practitioner collaborative forum conducted at the 22nd annual conference of the Society for Industrial and Organizational Psychology, New York City, NY, April 2007.
2. Item Degradation
- Item degradation
  - Items' favorable psychometric characteristics deteriorate over time
  - Psychometric characteristics include
    - Content relevance and representativeness
    - Technical characteristics (e.g., difficulty/location, lack of bias)
    - Utility (e.g., item-criterion relationship)
- Item degradation vs. exposure/compromise
  - Item degradation: an observed phenomenon
  - Item exposure/compromise: items have become known to test takers prior to administration
    - Possible reasons for degradation
3. Detection of Item Degradation
- Essentially, it is about investigating the comparability of items' psychometric properties over time
  - i.e., the temporal stability of the psychometric characteristics (Chan, Drasgow, & Sawin, 1999)
- Can be evaluated under the framework of
  - Measurement invariance (MI; Meredith, 1993)
  - Predictive invariance (PI; Millsap, 1995)
4. Item Degradation as MI or PI
Let x be an observed indicator that measures the latent variable w and predicts the criterion y, and let v be some population indicator.
- Measurement invariance (MI)
  - The same relationship between observed indicators and latent variables across populations
  - Degradation as noninvariance in such relationships over time
    - e.g., loading, location
- Predictive invariance (PI)
  - The same relationship between predictors and criterion across populations
  - Degradation as noninvariance in such relationships over time
    - e.g., indicator-criterion relationship
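Written out, the two conditions are usually stated as equalities of conditional distributions. The block below is a sketch of the general definitions (Mellenbergh, 1989; Meredith, 1993; Millsap, 1995), not a model fitted in this study. Treating the time of administration as the population indicator v is our illustrative assumption; under it, item degradation shows up as a failure of one or both equalities.

```latex
\begin{aligned}
\text{MI:}\quad & P(x \mid w, v) = P(x \mid w) \qquad \text{for all } x, w, v \\
\text{PI:}\quad & P(y \mid x, v) = P(y \mid x) \qquad \text{for all } y, x, v
\end{aligned}
```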
5. Item Degradation Detection Methods
- Differential item functioning, item parameter drift
- Mean and covariance structure modeling
  - Assessing invariance in various aspects pertaining to measurement or predictive properties
- Statistical process control
- Models of change
6. Item Degradation Detection
- Differential item functioning, item parameter drift
- Mean and covariance structure modeling
  - Assessing invariance in various aspects pertaining to measurement or predictive properties
- Statistical process control
  - Cumulative sum (CUSUM) procedure
- Models of change
7. CUSUM for Item Degradation Detection
- Our approach: conditional CUSUM
  - Monitors whether item parameters have deviated from their targets
  - Makes use of observed scores
  - Controlling for shifts in trait levels over time is important
  - Conditional: test takers at different time points were matched on their total test scores
- Procedures
  - Initial item calibration: compute the target item parameter (e.g., difficulty) using the first n job applicants in the operational sample
  - Define time groups: every m applicants, from the (n + 1)th applicant to the last person under investigation
  - Define trait groups (the conditioning variable): divide job applicants into groups of reasonable size based on total test scores
  - Compute and plot CUSUM statistics for each trait group separately
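The calibration and grouping steps above can be sketched in Python for a single item. This is a minimal illustration, not the authors' code: the function name `conditional_groups`, the use of the item mean as the CTT "difficulty", the total-score cut points, the population SD, and computing a separate calibration target within each trait group are all assumptions.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def conditional_groups(responses, totals, n, m, cutpoints):
    """Organize one item's scores for a conditional CUSUM (hypothetical helper).

    responses: the item's scores, one per applicant, in chronological order
    totals:    the same applicants' total test scores, in the same order
    n:         number of earliest applicants used for the initial calibration
    m:         number of applicants per time group
    cutpoints: ascending total-score cut points; c cut points give c + 1 trait groups
    Returns (targets, series):
      targets[g] = (calibration item mean, calibration item SD) for trait group g
      series[g]  = [(time-group item mean, number of applicants), ...]
    """
    def trait_group(total):
        # A total score equal to a cut point goes to the higher group
        return bisect_right(cutpoints, total)

    # Step 1: initial calibration on the first n applicants, per trait group
    targets = {}
    for g in range(len(cutpoints) + 1):
        calib = [r for r, t in zip(responses[:n], totals[:n]) if trait_group(t) == g]
        if len(calib) >= 2:  # need at least two scores for an SD
            targets[g] = (mean(calib), pstdev(calib))

    # Step 2: time groups of m applicants, starting with applicant n + 1
    series = {g: [] for g in targets}
    for start in range(n, len(responses), m):
        block = list(zip(responses[start:start + m], totals[start:start + m]))
        # Step 3: split each time group by trait group
        for g in targets:
            scores = [r for r, t in block if trait_group(t) == g]
            if scores:  # skip time groups with no members of this trait group
                series[g].append((mean(scores), len(scores)))
    return targets, series
```

The last time group can hold fewer than m applicants, and a real application would also need a rule for trait groups that are very small in some time groups.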
8. Conditional CUSUM: Calculation
- Two-sided standardized CUSUM
- Reference value (k) and control limit (h)
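A common tabular form of the two-sided standardized CUSUM (see, e.g., Montgomery, 2005, for the general procedure) standardizes each time-group mean as z_t = (mean_t - target_mean) / (target_sd / sqrt(n_t)) and accumulates C+_t = max(0, C+_(t-1) + z_t - k) and C-_t = max(0, C-_(t-1) - z_t - k), signaling when either exceeds h. The sketch below implements that form for one item in one trait group. The input format and function name are hypothetical, and the defaults k = 0.5 and h = 5 are common textbook choices, not the values used in this study, which the slide does not report.

```python
import math


def standardized_cusum(series, target_mean, target_sd, k=0.5, h=5.0):
    """Two-sided standardized tabular CUSUM for one item in one trait group.

    series: hypothetical input, a list of (time_group_mean, time_group_size)
            pairs in chronological order
    target_mean, target_sd: calibration (target) item mean and SD
    k: reference value; h: control limit (both in standardized units)
    Returns the upper and lower CUSUM paths and the indices of time groups
    at which either path exceeds h.
    """
    if target_sd <= 0:
        raise ValueError("target_sd must be positive")
    c_plus = c_minus = 0.0
    upper, lower, signals = [], [], []
    for t, (group_mean, size) in enumerate(series):
        # Standardize the time-group mean against the calibration target
        z = (group_mean - target_mean) / (target_sd / math.sqrt(size))
        c_plus = max(0.0, c_plus + z - k)    # accumulates upward shifts
        c_minus = max(0.0, c_minus - z - k)  # accumulates downward shifts
        upper.append(c_plus)
        lower.append(c_minus)
        if c_plus > h or c_minus > h:
            signals.append(t)  # paths are not reset after a signal in this sketch
    return upper, lower, signals
```

Running it separately for each trait group yields one pair of upper and lower paths per group, which corresponds to the last step of the procedure.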
9. Conditional CUSUM: Data Source
- A web-based personnel selection assessment for selecting managers
  - 103 items measuring job-related non-cognitive attributes
  - CTT-based test construction and scoring
  - Fixed-length, linear test
  - Unproctored
- Sample
  - Job applicants from Oct. 2002 to Sept. 2005
  - Re-takers excluded
  - Total N: 7,000
10. Conditional CUSUM: Results
- Among the 103 items
  - 36 flagged only for upward shifts in item means, for at least one trait group
  - 20 flagged only for downward shifts in item means, for at least one trait group
  - 9 flagged for both upward and downward shifts, in different trait groups
  - 38 not flagged for any trait group
- Two examples: it035, it174
- Follow-up analysis
  - Did the likelihood of being flagged by conditional CUSUM differ across item types?
11. Conditional CUSUM: Follow-up
- Multinomial logistic regression
  - DV: conditional CUSUM flag (3 categories; Not Flagged as the reference category)
  - IVs: ability (6 levels); item type (3 levels; multiple choice (MC) as the reference group)
- Results
  - The goodness-of-fit statistic indicates adequate fit of the main-effects model (χ² = 16.83, df = 20, p = .664)
  - The effect of ability level on the CUSUM flags was not statistically significant (χ² = 13.48, df = 10, p = .198)
  - The effect of item type on the CUSUM flags was statistically significant (χ² = 17.83, df = 4, p = .001)
    - MC items were more likely to be flagged by conditional CUSUM for negative shifts
    - Forward items were more likely to be flagged by conditional CUSUM for positive shifts
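As a quick arithmetic check, the reported p-values follow from the χ² statistics and their degrees of freedom. The df also fit the design: with a three-category outcome there are two logits, so ability contributes (6 - 1) × (3 - 1) = 10 parameters and item type (3 - 1) × (3 - 1) = 4. All three df are even, so the chi-square upper-tail probability has a closed form; the helper below is our own sketch, not part of the original analysis.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X^2 > x) for a chi-square with even df.

    For even df the survival function has the closed form
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


# (statistic, df, reported p) as given on the slide
reported = {
    "goodness of fit": (16.83, 20, 0.664),
    "ability level": (13.48, 10, 0.198),
    "item type": (17.83, 4, 0.001),
}
for label, (stat, df, p_reported) in reported.items():
    print(f"{label}: p = {chi2_sf_even_df(stat, df):.3f} (reported {p_reported})")
```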
12. Models of Change
- Perspective 1
  - Understanding patterns of change using examinee characteristics
  - Do the trajectories of item parameter change vary across different types of examinees?
  - e.g., applicant location, SES, demographics
- Perspective 2
  - Understanding patterns of change using item characteristics
  - Do the trajectories of item parameter change vary across different types of items?
  - e.g., item format, complexity, content area
- Formulating these questions in a longitudinal analysis framework
13. Perspective 1 Example
- Using a 2-level longitudinal model to explore:
  - RQ1: On average, was there a shift in item difficulty?
  - RQ2: Were there variations in the slope of the shift?
  - RQ3 (if yes to RQ2): Could the variations be explained by job applicants' characteristics (e.g., trait level, region)?
- The model
- Analysis with item 174
  - RQ1: significant positive slope
  - RQ2: non-significant variation
  - RQ3: not pursued
14. Perspective 2 Example
- Using a 2-level longitudinal model to explore:
  - RQ1: Across items, was there on average a change in item difficulty over time?
  - RQ2: Were there variations in the slope of the change across items?
  - RQ3 (if yes to RQ2): Could the variations be explained by item characteristics?
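The slides do not show the model equations. A standard two-level growth formulation that matches RQ1 to RQ3 is sketched below as an assumption: time groups t are nested within items i, y_ti is the item difficulty (e.g., item mean) at time t, Model A is unconditional at level 2, and Model B adds item type as two dummy codes D1 and D2 with MC as the reference (mirroring the coding in the follow-up regression). Under this sketch, RQ1 tests β10 = 0 in Model A, RQ2 tests τ11 = 0, and RQ3 tests β11 = β12 = 0 in Model B.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{time}_{ti} + e_{ti},
  \qquad e_{ti} \sim N(0, \sigma^2) \\
\text{Model A, level 2:}\quad & \pi_{0i} = \beta_{00} + r_{0i}, \qquad
  \pi_{1i} = \beta_{10} + r_{1i}, \qquad \operatorname{Var}(r_{1i}) = \tau_{11} \\
\text{Model B, level 2:}\quad & \pi_{0i} = \beta_{00} + \beta_{01} D_{1i} + \beta_{02} D_{2i} + r_{0i}, \\
  & \pi_{1i} = \beta_{10} + \beta_{11} D_{1i} + \beta_{12} D_{2i} + r_{1i}
\end{aligned}
```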
15. Perspective 2 Example
- Model A
  - Analysis with this data set
    - RQ1: the average slope across items was not different from zero
    - RQ2: significant variation in slopes across items
- Model B
  - Analysis with this data set
    - RQ3: item type did not explain a significant portion of the variation in slopes
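A rough way to see what Model A asks, without multilevel software, is a two-stage "slopes-as-outcomes" approximation: fit an ordinary least-squares trend to each item's difficulty over time, then summarize the slopes. This is a swapped-in approximation for illustration, not the longitudinal model fitted in the study; the function names and input format are hypothetical. The average slope parallels RQ1, and the spread of slopes beyond what sampling error alone would produce parallels RQ2.

```python
from statistics import mean, variance


def ols_slope(times, values):
    """OLS slope of one item's difficulty on time, with its sampling variance."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values)) / sxx
    intercept = y_bar - slope * t_bar
    resid = [y - (intercept + slope * t) for t, y in zip(times, values)]
    sigma2 = sum(e * e for e in resid) / (len(times) - 2)  # needs >= 3 time points
    return slope, sigma2 / sxx


def two_stage_summary(trajectories):
    """trajectories: {item_id: (times, difficulties)} (hypothetical input).

    Returns (average slope, observed slope variance, estimated true slope variance).
    """
    fits = [ols_slope(times, values) for times, values in trajectories.values()]
    slopes = [s for s, _ in fits]
    avg_slope = mean(slopes)                     # rough analogue of RQ1
    observed_var = variance(slopes)              # total spread of the slopes
    avg_sampling_var = mean(v for _, v in fits)  # spread expected from noise alone
    true_var = max(0.0, observed_var - avg_sampling_var)  # rough analogue of RQ2
    return avg_slope, observed_var, true_var
```

A proper two-level model pools information across items and gives formal tests of the average slope and the slope variance; the method-of-moments difference used here is only a descriptive check and can be noisy when there are few time groups.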
16. Summary and Discussions
- Two types of methods that serve different purposes
  - Statistical process control (e.g., CUSUM)
    - Real-time monitoring of degradation
    - We illustrated the conditional CUSUM procedure, but other methods exist (e.g., an IRT-based moving residual approach by Han & Hambleton, 2004)
  - Explicit modeling of patterns of degradation
    - Understanding the nature of degradation, exploring potential factors that affect degradation, and assisting the development of prevention and mitigation procedures
    - We illustrated longitudinal modeling methods, but various methods for studying MI/PI may be applied
- These methods can also be used to monitor and understand degradation in other parameters (e.g., item variance, discrimination, response time)
- It might be helpful to monitor/model multiple parameters simultaneously to (1) flag items more accurately and (2) understand the factors behind degradation
17. Summary and Discussions
- Understanding the temporal stability of measurement properties is essential to
  - Valid decisions based on test scores
  - Valid inferences in substantive research based on assessment outcomes
    - e.g., research on the Flynn effect (Wicherts et al., 2004)
- Further research is needed, for example:
  - What monitoring approaches would better fit personnel selection assessment programs?
  - What would lead to or affect degradation?
  - How would item-level degradation affect test-level decisions and inferences?
  - Etc.
18. Some Useful References
- MI and PI Concepts
  - Mellenbergh (1989)
  - Meredith (1993)
  - Millsap (1995)
- Various IPD and Item Exposure Detection Methods
  - Bock, Muraki, & Pfeiffenberger (1988)
  - Chan, Drasgow, & Sawin (1999)
  - DeMars (2004)
  - Donahue & Isham (1998)
  - Han & Hambleton (2004)
  - Kim, Cohen, & Park (1995)
- CUSUM and Psychometric Applications
  - Hawkins & Olwell (1998)
  - Meijer & van Krimpen-Stoop (2003)
  - Montgomery (2005)
  - van Krimpen-Stoop & Meijer (2002)
  - Veerkamp & Glas (2000)
19. Contacts
- Yongwei Yang: yongwei_yang_at_gallup.com
- Abdullah Ferdous: aferdous_at_measuredprogress.org
- Tzu-Yun Chin: tzuyun_at_unlserve.unl.edu

THANK YOU
20. Item 35 Conditional CUSUM Charts
21. Item 174 Conditional CUSUM Charts