Title: Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach
1 Estimating Growth when Content Specifications Change
- A Multidimensional IRT Approach
- Mark D. Reckase
- Tianli Li
- Michigan State University
2 The Problem
- State curriculum frameworks often change from one
grade to the next, reflecting the addition of new
instructional content.
- For example, at grade 7 algebra may be introduced
as an instructional goal.
- At grade 6, algebra is not an important component
of the curriculum.
- Tests at the two grades reflect the instructional
content, so the 6th grade test does not include
algebra and the 7th grade test does.
- How can the score scales of these tests be linked?
3 Research Questions
- What do changes on the linked score scale mean
when the scale is produced using the usual
unidimensional IRT models?
- Can multidimensional IRT be used to form vertical
scales? If so, how do the results compare to the
unidimensional results?
4 The Approach
- State testing data were analyzed using
multidimensional IRT to develop a realistic model
for the test data at two grade levels.
- The results of the real data analyses were
idealized to create the specifications for
simulating the tests at two grade levels.
- Data with known structure were simulated to
determine how unidimensional and multidimensional
procedures function.
5 The Simulated Data Design
- Grade 6: two major constructs
- Arithmetic
- Problem Solving
- Grade 7: three major constructs
- Arithmetic
- Problem Solving
- Algebra
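A minimal sketch of how data with this two-grade structure might be simulated, assuming a compensatory multidimensional 2PL model. The mean vectors, covariance matrix, and item parameters below are placeholder values chosen for illustration; they are not the values used in the study. The sample size of 2000 per grade and the pool of 70 items follow the figures given later in the slides.

```python
import numpy as np

def simulate_mirt(theta, a, d, rng):
    """Simulate 0/1 responses under a compensatory M2PL model:
    P(u = 1) = 1 / (1 + exp(-(a . theta + d)))."""
    logits = theta @ a.T + d              # shape: (n_persons, n_items)
    p = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)

# Hypothetical values; dimensions are (arithmetic, problem solving, algebra).
mean_g6 = np.array([0.0, 0.0, 0.0])
mean_g7 = np.array([0.4, 0.4, 1.0])
cov = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.3],
                [0.3, 0.3, 1.0]])

theta_g6 = rng.multivariate_normal(mean_g6, cov, size=2000)
theta_g7 = rng.multivariate_normal(mean_g7, cov, size=2000)

# Hypothetical item pool: each item loads on exactly one construct.
n_items = 70
a = np.zeros((n_items, 3))
a[np.arange(n_items), np.arange(n_items) % 3] = 1.0
d = rng.normal(0.0, 1.0, size=n_items)

resp_g6 = simulate_mirt(theta_g6, a, d, rng)
resp_g7 = simulate_mirt(theta_g7, a, d, rng)
```

Because the generating abilities are kept, any estimated scale can later be checked against them.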
6 Simulated Test Structure
Note: The numbers in parentheses are the numbers of
items common to the two forms of the tests.
7 Mean Vectors at each Grade Level
Note: Values in parentheses are the observed
means from the simulated data.
8 Covariance Matrices
Covariance Matrix for Grade 6
Covariance Matrix for Grade 7
Note: Values in parentheses are estimated from
the simulated data.
9 Orientation of Items
10 Effect Size Built into Data
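The slides do not state how the effect size is computed. One common definition, assumed here purely for illustration, is the standardized difference between the grade means on a dimension, using a pooled standard deviation. The subscripts 6 and 7 denote the two grades, and $n$, $\bar{\theta}$, and $s^2$ are each grade's sample size, mean, and variance:

```latex
\[
d = \frac{\bar{\theta}_{7} - \bar{\theta}_{6}}{s_{p}},
\qquad
s_{p} = \sqrt{\frac{(n_{6}-1)\,s_{6}^{2} + (n_{7}-1)\,s_{7}^{2}}{n_{6}+n_{7}-2}}
\]
```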
11 Unidimensional Basis for Comparison
- Imagine that the full set of 70 items from both
test levels is administered to the students at
both grade levels.
- The matrix of 2000 + 2000 students from the two
grades by 70 items can be analyzed with
unidimensional models to serve as a basis of
comparison for the vertical scaling results.
- Analyze the matrix using the 2PL and Rasch models.
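For reference, the two unidimensional models can be written in their standard forms, where $\theta_j$ is the ability of student $j$, $b_i$ is the difficulty of item $i$, and $a_i$ is its discrimination. The notation is the conventional one and is not taken from the slides:

```latex
\[
\text{2PL:}\quad
P(u_{ij}=1 \mid \theta_j) = \frac{\exp\!\big(a_i(\theta_j - b_i)\big)}{1+\exp\!\big(a_i(\theta_j - b_i)\big)}
\]
\[
\text{Rasch:}\quad
P(u_{ij}=1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1+\exp(\theta_j - b_i)}
\]
```

The Rasch model is the special case in which every item has the same discrimination, so it cannot weight items differently when the data are multidimensional.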
12 2PL Solution
13 Rasch Model Solution
14 Vertical Scaling Analysis
- Common-item concurrent calibration
- BILOG-MG
- Off-grade items coded as not reached
- Both 2PL and Rasch models used for analysis
- Determine the effect size of the difference in
means of the two grade levels
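The data layout for a concurrent calibration can be sketched as follows. Both grades are stacked into one persons-by-items matrix, and items a grade did not take are marked as missing. Here NaN stands in for the not-reached code; the actual BILOG-MG input format is not shown. The item assignments (40 items per grade, 10 in common) are hypothetical, as is the helper name `build_concurrent_matrix`.

```python
import numpy as np

def build_concurrent_matrix(resp_g6, resp_g7, items_g6, items_g7, n_items):
    """Stack two grades' responses into one persons-by-items matrix.
    Items a grade did not take stay NaN (treated as not presented)."""
    n6 = resp_g6.shape[0]
    full = np.full((n6 + resp_g7.shape[0], n_items), np.nan)
    full[:n6, items_g6] = resp_g6
    full[n6:, items_g7] = resp_g7
    return full

rng = np.random.default_rng(1)

# Hypothetical item assignment: items 30-39 are common to both forms.
items_g6 = np.arange(0, 40)
items_g7 = np.arange(30, 70)
resp_g6 = rng.integers(0, 2, size=(5, items_g6.size))
resp_g7 = rng.integers(0, 2, size=(5, items_g7.size))

full = build_concurrent_matrix(resp_g6, resp_g7, items_g6, items_g7, 70)
common = np.intersect1d(items_g6, items_g7)
```

Only the columns in `common` are observed for both grades, so those items alone tie the two grades to one scale.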
15 Vertically Scaled Effect Sizes
16 Vertically Scaled Effect Sizes
- The linked effect size is smaller than the
full-data effect size.
- The Rasch effect size is smaller than the 2PL
effect size.
- The full-data effect size is smaller than the
modeled effect size.
17 Alternative Linking Method
- Common-item, separate calibration
- The relationship between the common-item parameter
estimates from the two calibrations was poor
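The slides do not say which transformation was used for the separate-calibration link. As an illustration only, the sketch below uses the mean/sigma method, one standard choice, to place one calibration's difficulties on the other's scale, and reports the correlation between the two sets of common-item estimates. All parameter values and the function name `mean_sigma_link` are hypothetical.

```python
import numpy as np

def mean_sigma_link(b_from, b_to):
    """Return slope A and intercept B such that b_to is approximately
    A * b_from + B, from the means and SDs of common-item difficulties."""
    A = np.std(b_to, ddof=1) / np.std(b_from, ddof=1)
    B = np.mean(b_to) - A * np.mean(b_from)
    return A, B

# Hypothetical common-item difficulty estimates from two calibrations.
b_g7 = np.array([-1.2, -0.4, 0.1, 0.7, 1.5])
b_g6 = 0.8 * b_g7 + 0.5      # exact linear relation, for illustration only

A, B = mean_sigma_link(b_g7, b_g6)
r = np.corrcoef(b_g7, b_g6)[0, 1]   # a low r would signal a poor link
```

When the common-item estimates are only weakly related across calibrations, any such linear transformation is poorly determined.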
18 MIRT Analysis
- Full data analysis with TESTFACT
- Three-dimensional analysis
- Determine effect size for each dimension
- Correlate each estimated θ with the generating θs
to determine the meaning of the results.
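The correlation step can be sketched as a cross-correlation matrix between generating and estimated ability dimensions. The data below are fabricated for illustration: the "estimates" are the generating abilities with the columns permuted, one sign reversed, and noise added, which mimics an unrotated solution. The function name `cross_correlations` is hypothetical.

```python
import numpy as np

def cross_correlations(theta_true, theta_est):
    """Rows: generating dimensions; columns: estimated dimensions."""
    k = theta_true.shape[1]
    full = np.corrcoef(theta_true, theta_est, rowvar=False)
    return full[:k, k:]

rng = np.random.default_rng(2)
theta_true = rng.normal(size=(2000, 3))

# Illustrative "estimates": permuted columns, one reversed sign, plus noise.
theta_est = np.column_stack([theta_true[:, 2],
                             -theta_true[:, 1],
                             theta_true[:, 0]])
theta_est = theta_est + 0.3 * rng.normal(size=theta_est.shape)

R = cross_correlations(theta_true, theta_est)
best_match = np.abs(R).argmax(axis=1)   # estimated dimension closest to each true one
```

A large negative entry in `R` indicates an estimated dimension whose scale is reversed relative to the generating one.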
19 MIRT Effect Sizes
20 Correlation between True and Estimated θ Values
21 Interpretation of MIRT Solution
- Results are difficult to interpret because of the
default procedures in TESTFACT.
- The solution needs to be rotated so that the axes
align with the content dimensions.
- The current solution shows that θ1 is related to
algebra and shows the large algebra effect.
- θ2 is a combination of arithmetic and problem
solving, with the emphasis on problem solving.
- Most likely it has the sign of the a-parameters
reversed.
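The slides call for a rotation but do not specify one. A possible approach, assumed here for illustration, is an orthogonal Procrustes rotation of the estimated a-parameter matrix toward a target pattern that encodes which items measure which content area. Such a target is available in a simulation because the item structure is known. Sign reversals are covered as well, since a reflection is an orthogonal transformation. All matrices below are hypothetical.

```python
import numpy as np

def procrustes_rotation(a_est, a_target):
    """Orthogonal R minimizing ||a_est @ R - a_target|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(a_est.T @ a_target)
    return u @ vt

rng = np.random.default_rng(3)

# Hypothetical target: 12 items, each loading on one of three constructs.
a_target = np.zeros((12, 3))
a_target[np.arange(12), np.arange(12) % 3] = 1.2

# Hypothetical unrotated solution: the target in an arbitrary orthogonal basis.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
a_est = a_target @ q.T

R = procrustes_rotation(a_est, a_target)
a_rotated = a_est @ R

# Abilities are rotated with the same R so that a . theta is unchanged.
theta_est = rng.normal(size=(5, 3))
theta_rotated = theta_est @ R
```

Because `R` is orthogonal, the model-implied response probabilities are the same before and after rotation; only the interpretation of the axes changes.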
22 Concurrent MIRT Analysis
- Use concurrent calibration of data from the two
grade levels.
- Three-dimensional solution
- No rotation
- Determine effect sizes and correlations with true
θ values.
23 Concurrent MIRT Calibration
24 Concurrent MIRT Calibration
25 Concurrent MIRT Calibration
- The scale on Dimension 3 is reversed and it has a
large effect size (algebra).
- Dimension 1 is most related to arithmetic and
problem solving, with a moderate effect size.
- Dimension 2 is moderately related to algebra and
has a large effect size.
- The overall result gives a reasonable estimate of
effects, but the dimensions need to be rotated to
match the constructs.
26 Conclusions
- Unidimensional linking of the two test levels
underestimates the effect size.
- The Rasch model gives a smaller effect size than
the two-parameter logistic model.
- The MIRT solution shows promise.
- Need to determine how to rotate the solution to
match the constructs.
- TESTFACT has problems converging on estimates
because of a mismatch between assumptions and
reality.