Title: Test Equating
1Test Equating
- Zhang Zhonghua
- Chinese University of Hong Kong
2Question?
- Two standardized tests, A and B, measure the
same trait.
- A and B were administered separately to two
groups of students (Group 1 and Group 2). Group 1
students took only Test A, and Group 2 students
took only Test B.
- The mean score on Test A for Group 1 is 84, and
the mean score on Test B for Group 2 is 80. A t-test
indicated a statistically significant difference
between the mean scores of Group 1 and Group 2
(p < 0.05).
- Should we then conclude that the Group 1
students are better than the Group 2 students on
the trait that the two tests measure?
3Why Equate?
- To compare scores on different forms of a test
(strictly speaking, parallel tests) that measure
the same latent trait
- To construct an item bank/pool
- Computerized Adaptive Testing (CAT)
4What's Equating?
- Equating is a statistical process that is used
to adjust scores on test forms so that scores on
the forms can be used interchangeably. Equating
adjusts for differences in difficulty among forms
that are built to be similar in difficulty and
content (Kolen & Brennan, 2004).
- The two alternate test forms to be equated
should satisfy:
- Same content and statistical specifications
- Equity
- Symmetry
- Group Invariance
5- Lord's Equity Property
- Examinees with a given true score would have
identical observed score means, standard
deviations, and distributional shapes of
converted scores on Form X and scores on Form Y.
- First-order Equity Property
- Examinees with a given true score have the
same mean converted score on Form X as they have
on Form Y.
7Equating Design
- Single Group
- Random Groups
- Single Group with Counterbalancing
- Anchored/Common-item Nonequivalent Group
- Preequating
8 9- Single Group with Counterbalancing
10 11- Common-item Nonequivalent Groups
13Items from Bank (Operational Items)
New Items (Non-Operational Items)
Precalibrated IRT Parameter Item Bank
14Equating Methods
- Based on Classical Testing Theory (CTT)
- Based on Item Response Theory (IRT)
15Downloadable Equating Procedures
- Equating/Linking Programs
- http://www.education.uiowa.edu/casma/EquatingLinkingPrograms.htm
- IRT Scale Transformation Programs
- http://www.education.uiowa.edu/casma/IRTPrograms.htm
16Equating Methods Based on CTT
- Mean Equating
- Linear Equating
- Equipercentile Equating
17CTT-Mean Equating
- In mean equating, Form X is considered to differ
in difficulty from Form Y by the difference
between the mean scores on the two forms.
- Example
- M_X = 70, M_Y = 75.
- Let Form X be the base form and Form Y the
target form.
- For a score of 80 on Form Y, the equated score
on the scale of Form X is 80 - (75 - 70) = 75.
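The arithmetic in this example can be written as a one-line function. This is a minimal sketch; the function name `mean_equate` and its parameter names are illustrative and do not come from the slides.

```python
def mean_equate(score_y, mean_x, mean_y):
    """Convert a Form Y score to the Form X scale by mean equating."""
    # The forms are assumed to differ in difficulty only by the
    # difference between their mean scores.
    return score_y - (mean_y - mean_x)


# Values from the slide: M_X = 70, M_Y = 75, Form Y score of 80.
print(mean_equate(80, mean_x=70, mean_y=75))  # 75
```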
18CTT-Linear Equating
- In Linear Equating, scores that are an equal
distance from their means in standard deviation
units are set equal.
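Setting the standardized deviation scores equal gives (y - M_Y) / S_Y = (x - M_X) / S_X, so a Form Y score y converts to the Form X scale as x = S_X (y - M_Y) / S_Y + M_X. The sketch below reuses the means from the mean equating example; the standard deviations (10 and 8) are hypothetical, since the slides do not give any.

```python
def linear_equate(score_y, mean_x, sd_x, mean_y, sd_y):
    """Convert a Form Y score to the Form X scale by linear equating."""
    # Same z-score on both forms: (y - mean_y) / sd_y == (x - mean_x) / sd_x
    z = (score_y - mean_y) / sd_y
    return sd_x * z + mean_x


# Hypothetical standard deviations: S_X = 10, S_Y = 8.
print(linear_equate(80, mean_x=70, sd_x=10, mean_y=75, sd_y=8))  # 76.25
```

When the two standard deviations are equal, this reduces to mean equating.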
19CTT-Equipercentile
- For a given Form X score, find the percentage of
examinees earning scores at or below that Form X
score.
- Find the Form Y score that has the same
percentage of examinees at or below it.
- The Form X and Form Y scores are considered to be
equivalent.
- Example
- 70% of the examinees got a score of 75 or
below on Form X.
- 70% of the examinees got a score of 80 or
below on Form Y.
- Then a Form X score of 75 would be
considered to represent the same level of
achievement as a Form Y score of 80.
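The steps above can be sketched in a few lines. This is a simplified illustration, not a production procedure: it uses raw "at or below" proportions with no interpolation or smoothing, and the two score lists are made-up data constructed so that 70% of examinees fall at or below 75 on Form X and at or below 80 on Form Y.

```python
def proportion_at_or_below(scores, cut):
    """Proportion of examinees scoring at or below `cut`."""
    return sum(s <= cut for s in scores) / len(scores)


def equipercentile_equate(x_score, x_scores, y_scores):
    """Lowest Form Y score whose cumulative proportion reaches that of x_score."""
    target = proportion_at_or_below(x_scores, x_score)
    for y in sorted(set(y_scores)):
        if proportion_at_or_below(y_scores, y) >= target:
            return y
    return max(y_scores)


# Hypothetical score distributions for ten examinees per form.
x_scores = [60, 65, 70, 72, 75, 75, 75, 80, 85, 90]
y_scores = [66, 70, 74, 78, 80, 80, 80, 86, 90, 95]

print(proportion_at_or_below(x_scores, 75))            # 0.7
print(equipercentile_equate(75, x_scores, y_scores))   # 80
```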
20Equating Methods Based on IRT
- IRT Parameters Equating
- IRT Observed Score and IRT True Score Equating
21Item Response Theory
- Take the IRT three-parameter model as an example.
- Item parameters: item discrimination, item
difficulty, guessing
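Under the three-parameter logistic model, the probability of a correct response is P(theta) = c + (1 - c) / (1 + exp(-D a (theta - b))), where a is the discrimination, b the difficulty, and c the guessing parameter. A minimal sketch follows; the scaling constant D = 1.7 is an assumption (some programs use D = 1), and the item values are made up for illustration.

```python
import math


def p_correct_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item: a = 1.0, b = 0.0, c = 0.2.
# When theta equals b, the probability is halfway between c and 1.
print(round(p_correct_3pl(0.0, a=1.0, b=0.0, c=0.2), 3))  # 0.6
```

For very low ability the curve approaches the guessing parameter c rather than zero, which is what distinguishes this model from the two-parameter one.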
24-26(Figures: Item 1 and Item 2, Probability plotted against Scale Score, with the Difficulty of each item marked)
29Item Parameter Equating
- Linking Separate Calibration (Mean/Mean Method,
Mean/Sigma Method, Stocking-Lord Method, Haebara
Method)
- Concurrent Calibration
- Fixed Common-Precalibrated Item Parameter Method
30IRT-Linking Separate Calibration
31IRT-Moment Methods
- Mean/Mean Method
- Mean/Sigma Method
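Both moment methods estimate the slope A and intercept B of the linear transformation from the target scale to the base scale using the common items. As commonly defined, Mean/Sigma takes A as the ratio of the standard deviations of the common-item difficulties, Mean/Mean takes A as the ratio of the mean discriminations, and both set B = mean(b_base) - A * mean(b_target). The sketch below assumes Form X is the target and Form Y the base; all parameter values are made up, constructed so that the true coefficients are A = 1.2 and B = -0.5.

```python
from statistics import mean, pstdev


def mean_sigma(b_x, b_y):
    """Mean/Sigma linking coefficients from common-item difficulties."""
    A = pstdev(b_y) / pstdev(b_x)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def mean_mean(a_x, a_y, b_x, b_y):
    """Mean/Mean linking coefficients from common-item a and b parameters."""
    A = mean(a_x) / mean(a_y)
    B = mean(b_y) - A * mean(b_x)
    return A, B


# Hypothetical common-item estimates on the Form X (target) scale ...
a_x = [1.2, 0.6, 1.8, 2.4]
b_x = [-1.0, 0.0, 1.0, 2.0]
# ... and the same items on the Form Y (base) scale.
a_y = [1.0, 0.5, 1.5, 2.0]
b_y = [-1.7, -0.5, 0.7, 1.9]

print(tuple(round(v, 3) for v in mean_sigma(b_x, b_y)))            # (1.2, -0.5)
print(tuple(round(v, 3) for v in mean_mean(a_x, a_y, b_x, b_y)))   # (1.2, -0.5)
```

With real estimates the two methods generally give slightly different coefficients, because estimation error affects the a and b parameters differently.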
32IRT-Characteristic Curve Method
- Stocking-Lord method
- Haebara method
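Characteristic curve methods choose A and B so that the common items' response curves agree after transformation. In their usual formulation, the Stocking-Lord criterion sums squared differences between test characteristic curves, while the Haebara criterion sums squared differences item by item. The sketch below is illustrative only: it uses a one-directional loss, a coarse grid search in place of a proper numerical optimizer, D = 1.7, and made-up common items generated with true A = 1.2 and B = -0.5.

```python
import math


def p3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def transform(item, A, B):
    """Move an (a, b, c) item from the target scale to the base scale."""
    a, b, c = item
    return (a / A, A * b + B, c)


def stocking_lord_loss(A, B, items_x, items_y, thetas):
    """Squared difference between test characteristic curves."""
    total = 0.0
    for t in thetas:
        tcc_y = sum(p3pl(t, *it) for it in items_y)
        tcc_x = sum(p3pl(t, *transform(it, A, B)) for it in items_x)
        total += (tcc_y - tcc_x) ** 2
    return total


def haebara_loss(A, B, items_x, items_y, thetas):
    """Squared differences between item characteristic curves, item by item."""
    total = 0.0
    for t in thetas:
        for ix, iy in zip(items_x, items_y):
            total += (p3pl(t, *iy) - p3pl(t, *transform(ix, A, B))) ** 2
    return total


def grid_search(loss, items_x, items_y, thetas):
    """Crude stand-in for an optimizer: try A in 0.50..2.00, B in -1.00..1.00."""
    best = None
    for i in range(10, 41):
        for j in range(-20, 21):
            A, B = i / 20, j / 20
            value = loss(A, B, items_x, items_y, thetas)
            if best is None or value < best[0]:
                best = (value, A, B)
    return best[1], best[2]


# Hypothetical common items (a, b, c) on the target (Form X) scale.
items_x = [(1.2, -1.0, 0.2), (0.6, 0.0, 0.15), (1.8, 1.0, 0.25), (2.4, 2.0, 0.2)]
# The same items on the base (Form Y) scale, generated with A = 1.2, B = -0.5.
items_y = [transform(it, 1.2, -0.5) for it in items_x]
thetas = [k / 2 for k in range(-8, 9)]  # ability points from -4.0 to 4.0

print(grid_search(stocking_lord_loss, items_x, items_y, thetas))  # (1.2, -0.5)
print(grid_search(haebara_loss, items_x, items_y, thetas))        # (1.2, -0.5)
```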
33Example
- Take Form Y as the base test and Form X as the
target test.
- Item 1 on Form X: item difficulty is 1.0, item
discrimination is 1.896, guessing is 0.18.
- Equated item parameters for Item 1 on Form X, on
the scale of base Form Y, can be computed as
follows:
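A minimal sketch of this computation, assuming the standard linear scale transformation: the equated difficulty is A * b + B, the equated discrimination is a / A, and the guessing parameter is unchanged. The linking coefficients A = 1.2 and B = -0.5 are hypothetical values chosen for illustration; they are not the coefficients from the original example.

```python
def to_base_scale(a, b, c, A, B):
    """Place target-form item parameters on the base-form scale."""
    a_star = a / A        # discrimination is divided by the slope
    b_star = A * b + B    # difficulty is transformed linearly
    c_star = c            # guessing does not depend on the scale
    return a_star, b_star, c_star


# Item 1 on Form X, from the slide: a = 1.896, b = 1.0, c = 0.18.
A, B = 1.2, -0.5  # hypothetical linking coefficients
a_star, b_star, c_star = to_base_scale(1.896, 1.0, 0.18, A, B)
print(round(a_star, 3), round(b_star, 3), c_star)  # 1.58 0.7 0.18
```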
34IRT- Concurrent Calibration
- The concurrent calibration method estimates
item and ability parameters simultaneously in a
single computer run. Items that a group of
examinees did not take are treated as not reached
or missing data, and the parameters for all items
on the two test forms are estimated together.
This single estimation run places the item
parameters from both test forms on the same scale
(Kim & Hanson, 2002; Kim & Cohen, 1998).
- Example
35Concurrent Calibration for Replication 16
>COMMENTS
Horizontal Equating: Concurrent Calibration for Replication 16
>GLOBAL NPARM=3, DFNAME='D:\RESEARCH\REP16\CONH-16\CONH-16.DAT', SAVE;
>SAVE PARM='D:\RESEARCH\REP16\CONH-16\CONH-16.PAR';
>LENGTH NITEMS=140;
>INPUT NTOTAL=80, SAMPLE=2000, NALT=4, NIDCH=4, FORMS=2;
(4X,4A1,6X,I1,1X,80A1)
>FORM1 LENGTH=80, ITEMS=(1(1)80);
>FORM2 LENGTH=80, ITEMS=(1(1)20,81(1)140);
>TEST ITEMS=(1(1)140), LINK=(
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0);
>CALIB CYCLES=20;
>SCORE;
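The data layout that concurrent calibration relies on can be sketched as follows: each examinee's responses are placed in a single 140-item row, with items from the form they did not take left missing. The item numbering mirrors the form definitions in the control file above (Form 1 has items 1-80; Form 2 has items 1-20 and 81-140). The function name and the use of `None` for missing responses are illustrative assumptions, and the estimation itself is left to the calibration program.

```python
N_ITEMS = 140

# Item numbers on each form, as in the FORM1 and FORM2 definitions.
FORM_ITEMS = {
    1: list(range(1, 81)),
    2: list(range(1, 21)) + list(range(81, 141)),
}


def to_combined_row(form, responses):
    """Spread one examinee's 80 responses over the 140-item combined layout."""
    items = FORM_ITEMS[form]
    if len(responses) != len(items):
        raise ValueError("expected one response per item on the form")
    row = [None] * N_ITEMS  # None marks an item the examinee did not take
    for item_no, resp in zip(items, responses):
        row[item_no - 1] = resp
    return row


# Items appearing on both forms are the common (linking) items.
common = sorted(set(FORM_ITEMS[1]) & set(FORM_ITEMS[2]))
print(len(common))  # 20

# Hypothetical Form 2 examinee who answered every item correctly.
row = to_combined_row(2, [1] * 80)
print(row[19], row[20], row[80])  # 1 None 1
```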
36IRT-Fixed Common-Item Parameters
- This procedure combines features of the
concurrent calibration and linking separate
calibration methods. The item parameters for the
two test forms are estimated separately. Unlike
linking separate calibration, however, the
common-item parameters on the target test are
fixed at the values estimated from the base test.
- Example
37- Fixed Common Item Parameters for Replication 16
- >COMMENTS
- FCIP for Replication 16
- Target Test Form B with N (0,1)
- >GLOBAL NPARM=3, DFNAME='D:\RESEARCH\REP16\FIXV-16\B11-16.DAT', SAVE;
- >SAVE PARM='D:\RESEARCH\REP16\FIXV-16\FIXV-16.PAR';
- >LENGTH NITEMS=(80);
- >INPUT NTOTAL=80, SAMPLE=1000, NALT=4, NIDCH=4;
- (4A1,1X,80A1)
- >TEST ITEMS=(1(1)80);
- >CALIB TPRIOR, SPRIOR, GPRIOR, READPRI, CYCLES=20;
- >PRIORS
- TMU=(-0.639,1.041,1.701,0.482,-1.144,-0.023,0.616,1.133,0.668,0.577,
- -0.257,0.029,0.904,0.232,1.602,1.642,0.537,-0.228,1.439,0.517,0.0(0)60),
- TSIGMA=(0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,
- 0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,2.0(0)60),
- SMU=(-0.688,0.011,-0.810,0.614,-0.811,-0.445,-0.142,-0.387,0.292,-0.449,
- 0.040,-0.522,0.080,0.660,0.301,0.408,-0.689,-0.079,0.294,-0.174,0.0(0)60),
- SSIGMA=(0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,
38Comparison of Different Equating Methods
- No consensus has been reached.
- Methods based on CTT can be used to equate tests.
Methods based on IRT are essential for constructing
an item bank/pool.
- Among the IRT-based methods, some research
indicates that the Concurrent Calibration Method
can produce more accurate equating results than
the Linking Separate Calibration Method and the
FCIP method.
39Thank You Very Much!