Title: Test Equating


1
Test Equating
  • Zhang Zhonghua
  • Chinese University of Hong Kong

2
Question?
  • Two standardized tests, A and B, measure the
    same trait.
  • A and B were administered separately to two
    groups of students (Group 1 and Group 2). Group 1
    students took only Test A, and Group 2 students
    took only Test B.
  • The mean score on Test A for Group 1 is 84, and
    the mean score on Test B for Group 2 is 80. A
    t-test indicated a statistically significant
    difference between the mean scores of Group 1
    and Group 2 (p < 0.05).
  • Can we then conclude that the Group 1 students
    are better than the Group 2 students on the
    trait that the two tests measure?

3
Why Equate?
  • To compare scores on different forms of a test
    (strictly speaking, parallel tests) that
    measure the same latent trait
  • To construct the item bank/pool
  • Computerized Adaptive Testing (CAT)

4
What Is Equating?
  • Equating is a statistical process that is used
    to adjust scores on test forms so that scores on
    the forms can be used interchangeably. Equating
    adjusts for differences in difficulty among forms
    that are built to be similar in difficulty and
    content (Kolen & Brennan, 2004).
  • The two alternate test forms to be equated: same
    content and statistical specifications
  • Equity
  • Symmetry
  • Group Invariance

5
  • Lord's Equity Property
  • Examinees with a given true score would have
    identical observed score means, standard
    deviations, and distributional shapes of
    converted scores on Form X and scores on Form Y.
  • First-order Equity Property
  • Examinees with a given true score have the
    same mean converted score on Form X as they have
    on Form Y.

7
Equating Design
  • Single Group
  • Random Groups
  • Single Group with Counterbalancing
  • Anchored/Common-item Nonequivalent Group
  • Preequating

8
  • Single Group

9
  • Single Group with Counterbalancing

10
  • Random Groups

11
  • Common-item Nonequivalent Groups

13
  • Preequating

Items from Bank (Operational Items)
New Items (Non-Operational Items)
Precalibrated IRT Parameter Item Bank
14
Equating Methods
  • Based on Classical Test Theory (CTT)
  • Based on Item Response Theory (IRT)

15
Downloadable Equating Procedures
  • Equating/Linking Programs
  • http://www.education.uiowa.edu/casma/EquatingLinkingPrograms.htm
  • IRT Scale Transformation Programs
  • http://www.education.uiowa.edu/casma/IRTPrograms.htm

16
Equating Methods Based on CTT
  • Mean Equating
  • Linear Equating
  • Equipercentile Equating

17
CTT-Mean Equating
  • In mean equating, Form X is considered to differ
    in difficulty from Form Y by the difference of
    the mean scores between the two forms.
  • Example
  • M_X = 70, M_Y = 75.
  • Let Form X be the base form and Form Y the
    target form.
  • For a score of 80 on Form Y, the equated score
    on the scale of Form X is 80 - (75 - 70) = 75.
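The arithmetic in this example can be sketched in a few lines of Python. The function name `mean_equate` is made up for illustration; the numbers are the ones from the slide.

```python
def mean_equate(score_y, mean_x, mean_y):
    """Place a Form Y score on the Form X scale by removing the mean difference."""
    return score_y - (mean_y - mean_x)


# Values from the slide: M_X = 70, M_Y = 75, Form Y score of 80.
equated = mean_equate(80, mean_x=70, mean_y=75)
print(equated)  # 75
```

Mean equating applies the same shift to every score, so it assumes the two forms differ in difficulty by a constant amount along the whole score range.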

18
CTT-Linear Equating
  • In Linear Equating, scores that are an equal
    distance from their means in standard deviation
    units are set equal.
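Setting standardized deviation scores equal gives (y - M_Y) / S_Y = (x - M_X) / S_X, which can be solved for x. A minimal sketch follows, reusing the means from the mean-equating example; the standard deviations (8 and 10) and the function name are assumptions chosen only for illustration.

```python
def linear_equate(score_y, mean_x, sd_x, mean_y, sd_y):
    """Place a Form Y score on the Form X scale by matching z-scores."""
    z = (score_y - mean_y) / sd_y   # distance from the Form Y mean in SD units
    return mean_x + sd_x * z        # same distance from the Form X mean


# Means from the slide deck; the standard deviations are hypothetical.
equated = linear_equate(80, mean_x=70, sd_x=8, mean_y=75, sd_y=10)
print(equated)
```

When the two standard deviations are equal, the slope is 1 and the result reduces to mean equating.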

19
CTT-Equipercentile
  • For a given Form X score, find the percentage of
    examinees earning scores at or below that Form X
    score.
  • Find the Form Y score that has the same
    percentage of examinees at or below it.
  • The Form X and Form Y scores are considered to
    be equivalent.
  • Example
  • 70% of the examinees got a score of 75 or
    below on Form X.
  • 70% of the examinees got a score of 80 or
    below on Form Y.
  • Then a Form X score of 75 would be
    considered to represent the same level of
    achievement as a Form Y score of 80.
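The two steps above can be sketched directly on raw score lists. The score data below are hypothetical, built so that 70% of examinees fall at or below 75 on Form X and at or below 80 on Form Y, as in the example. This sketch uses the plain "at or below" percentage from the slide; operational programs usually work with percentile ranks, interpolation, and smoothing, none of which is attempted here.

```python
from fractions import Fraction


def proportion_at_or_below(scores, value):
    """Exact proportion of examinees scoring at or below `value`."""
    return Fraction(sum(1 for s in scores if s <= value), len(scores))


def equipercentile_equate(x, scores_x, scores_y):
    """Lowest Form Y score whose cumulative proportion reaches that of x on Form X."""
    target = proportion_at_or_below(scores_x, x)
    for y in sorted(set(scores_y)):
        if proportion_at_or_below(scores_y, y) >= target:
            return y


# Hypothetical score distributions (10 examinees per form).
scores_x = [60, 65, 68, 70, 72, 74, 75, 78, 82, 90]
scores_y = [62, 66, 70, 73, 76, 78, 80, 84, 88, 95]

print(equipercentile_equate(75, scores_x, scores_y))  # 80
```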

20
Equating Methods Based on IRT
  • IRT Parameter Equating
  • IRT Observed Score and IRT True Score Equating

21
Item Response Theory
  • Take the IRT three-parameter model as an example.
  • Item parameters: item discrimination, item
    difficulty, and guessing
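The transcript does not preserve the model formula, so the following is a sketch of the standard three-parameter logistic form, P(theta) = c + (1 - c) / (1 + exp(-D a (theta - b))). The scaling constant `D` (1.0 here; 1.7 is also common) is an assumption, and the item values are the ones quoted for Item 1 in the later example.

```python
import math


def p3pl(theta, a, b, c, D=1.0):
    """Probability of a correct response under the three-parameter logistic model.

    a: item discrimination, b: item difficulty, c: guessing (lower asymptote).
    D is a scaling constant; its value here is an assumption.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Item 1 on Form X from the later example: a = 1.896, b = 1.0, c = 0.18.
for theta in (-2.0, 0.0, 1.0, 2.0):
    print(theta, round(p3pl(theta, a=1.896, b=1.0, c=0.18), 3))
```

At theta equal to the difficulty b, the probability is halfway between the guessing parameter and 1, that is (1 + c) / 2.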

24-26
[Figure: curves for Item 1 and Item 2; axes: Probability and Scale Score; Difficulty of Item 1 and Item 2 marked]
29
Item Parameter Equating
  • Linking Separate Calibration (Mean/Mean Method,
    Mean/Sigma Method, Stocking-Lord Method, Haebara
    Method)
  • Concurrent Calibration
  • Fixed Common-Precalibrated Item Parameter Method

30
IRT-Linking Separate Calibration
31
IRT-Moment Methods
  • Mean/Mean Method
  • Mean/Sigma Method
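Both moment methods estimate the constants A and B of the linear transformation theta_Y = A * theta_X + B from the common items. The sketch below uses the usual textbook definitions: mean/sigma takes A from the standard deviations of the difficulties, and mean/mean takes A from the means of the discriminations. The common-item values are hypothetical and are generated from known constants so the recovery can be checked.

```python
from statistics import mean, pstdev


def mean_sigma(b_x, b_y):
    """Linking constants from the means and SDs of common-item difficulties."""
    A = pstdev(b_y) / pstdev(b_x)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def mean_mean(a_x, a_y, b_x, b_y):
    """Linking constants from mean discriminations and mean difficulties."""
    A = mean(a_x) / mean(a_y)
    B = mean(b_y) - A * mean(b_x)
    return A, B


# Hypothetical common-item estimates on the Form X scale.
a_x = [0.8, 1.2, 1.0, 1.6]
b_x = [-1.0, 0.0, 0.5, 1.5]

# The same items on the Form Y scale, generated with known constants.
TRUE_A, TRUE_B = 1.2, -0.5
a_y = [a / TRUE_A for a in a_x]
b_y = [TRUE_A * b + TRUE_B for b in b_x]

print(mean_sigma(b_x, b_y))
print(mean_mean(a_x, a_y, b_x, b_y))
```

With error-free parameters both methods recover the same constants; with real estimates they generally differ, which is one reason several linking methods exist.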

32
IRT-Characteristic Curve Method
  • Stocking-Lord method
  • Haebara method
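Characteristic curve methods choose A and B to minimize a discrepancy between response curves rather than matching moments. The sketch below shows simplified, one-direction versions of the two criteria over an unweighted theta grid: the Haebara-type criterion sums squared differences item by item, and the Stocking-Lord-type criterion sums squared differences between test characteristic curves. The item values, the grid, and all function names are assumptions, and no optimizer is included.

```python
import math


def p3pl(theta, a, b, c):
    """Three-parameter logistic response probability (scaling constant D = 1 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def to_base_scale(item, A, B):
    """Transform (a, b, c) from the Form X scale to the Form Y scale."""
    a, b, c = item
    return (a / A, A * b + B, c)


def haebara_criterion(A, B, items_x, items_y, thetas):
    """Sum of squared differences between item characteristic curves."""
    total = 0.0
    for theta in thetas:
        for ix, iy in zip(items_x, items_y):
            total += (p3pl(theta, *iy) - p3pl(theta, *to_base_scale(ix, A, B))) ** 2
    return total


def stocking_lord_criterion(A, B, items_x, items_y, thetas):
    """Sum of squared differences between test characteristic curves."""
    total = 0.0
    for theta in thetas:
        tcc_y = sum(p3pl(theta, *iy) for iy in items_y)
        tcc_x = sum(p3pl(theta, *to_base_scale(ix, A, B)) for ix in items_x)
        total += (tcc_y - tcc_x) ** 2
    return total


# Hypothetical common items on the Form X scale and their Form Y counterparts.
items_x = [(0.8, -1.0, 0.20), (1.2, 0.0, 0.15), (1.0, 0.5, 0.25), (1.6, 1.5, 0.18)]
items_y = [to_base_scale(item, 1.2, -0.5) for item in items_x]
thetas = [-3.0 + 0.5 * k for k in range(13)]

for A, B in [(1.2, -0.5), (1.0, 0.0)]:
    print(A, B,
          haebara_criterion(A, B, items_x, items_y, thetas),
          stocking_lord_criterion(A, B, items_x, items_y, thetas))
```

In practice the criterion would be handed to a numerical minimizer; here it is only evaluated at the generating constants and at the identity transformation to show that the former gives the smaller value.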

33
Example
  • Take Form Y as the base test and Form X as the
    target test.
  • Item 1 on Form X: item difficulty is 1.0, item
    discrimination is 1.896, and guessing is 0.18.
  • The equated item parameters for Item 1 on Form X,
    placed on the scale of base Form Y, are computed
    from the linking constants.
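The slide's own computation is not preserved in the transcript. As a hedged illustration, the standard scale transformation is a* = a / A, b* = A * b + B, c* = c, where A and B are the linking constants. The values A = 1.2 and B = -0.5 below are hypothetical, so the resulting numbers are not the ones from the original slide.

```python
def transform_item(a, b, c, A, B):
    """Place 3PL item parameters from the target scale onto the base scale."""
    return {
        "discrimination": a / A,    # slope shrinks as the scale stretches
        "difficulty": A * b + B,    # difficulty follows the theta transformation
        "guessing": c,              # the lower asymptote does not depend on the scale
    }


# Item 1 on Form X (from the slide) with hypothetical linking constants.
equated_item = transform_item(a=1.896, b=1.0, c=0.18, A=1.2, B=-0.5)
print(equated_item)
```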

34
IRT- Concurrent Calibration
  • The concurrent calibration method estimates item
    and ability parameters simultaneously in a
    single computer run. In this procedure, the items
    that one group of examinees did not take are
    treated as not reached or missing data, and the
    item parameters for all items on the two test
    forms are estimated simultaneously. This single
    estimation run places the item parameters for all
    items from the two test forms on the same scale
    (Kim & Hanson, 2002; Kim & Cohen, 1998).
  • Example

35
Concurrent Calibration for Replication 16
>COMMENTS
Horizontal Equating
Concurrent Calibration for Replication 16
>GLOBAL NPARM=3, DFNAME='D:\RESEARCH\REP16\CONH-16\CONH-16.DAT', SAVE;
>SAVE PARM='D:\RESEARCH\REP16\CONH-16\CONH-16.PAR';
>LENGTH NITEMS=140;
>INPUT NTOTAL=80, SAMPLE=2000, NALT=4, NIDCH=4, FORMS=2;
(4X,4A1,6X,I1,1X,80A1)
>FORM1 LENGTH=80, ITEMS=(1(1)80);
>FORM2 LENGTH=80, ITEMS=(1(1)20,81(1)140);
>TEST ITEMS=(1(1)140),
  LINK=(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0);
>CALIB CYCLES=20;
>SCORE;
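To make the "not reached or missing" idea concrete, the following sketch lays each examinee's 80 responses out on the combined 140-item layout implied by the form definitions in this command file (Form 1: items 1 to 80; Form 2: items 1 to 20 and 81 to 140). The function name and the use of `None` as the missing marker are assumptions for illustration; the calibration program performs this mapping itself from the form definitions.

```python
N_ITEMS = 140

# Item numbers on each form, as in the command file.
FORM_ITEMS = {
    1: list(range(1, 81)),                          # items 1-80
    2: list(range(1, 21)) + list(range(81, 141)),   # items 1-20 and 81-140
}


def to_common_layout(form, responses, missing=None):
    """Spread one examinee's responses over the combined 140-item layout."""
    items = FORM_ITEMS[form]
    if len(responses) != len(items):
        raise ValueError("response string does not match the form length")
    row = [missing] * N_ITEMS
    for item_number, response in zip(items, responses):
        row[item_number - 1] = response
    return row


# A hypothetical Form 2 examinee who answered "B" to every item.
row = to_common_layout(2, ["B"] * 80)
print(sum(r is None for r in row))  # 60
```

Only the 20 common items (1 to 20) carry data from both groups, and they are what ties the two forms to one scale in the single estimation run.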
36
IRT-Fixed Common-Item Parameters
  • This procedure combines the features of
    concurrent calibration and linking separate
    calibration methods. In the method, the item
    parameters for the two test forms are estimated
    separately. What differs from linking separate
    calibration is that the common item parameters
    from the target test will be fixed at the
    estimated values from the base test.
  • Example

37
  • Fixed Common Item Parameters for Replication 16
  • >COMMENTS
  • FCIP for Replication 16
  • Target Test Form B with N(0,1)
  • >GLOBAL NPARM=3, DFNAME='D:\RESEARCH\REP16\FIXV-16\B11-16.DAT', SAVE;
  • >SAVE PARM='D:\RESEARCH\REP16\FIXV-16\FIXV-16.PAR';
  • >LENGTH NITEMS=(80);
  • >INPUT NTOTAL=80, SAMPLE=1000, NALT=4, NIDCH=4;
  • (4A1,1X,80A1)
  • >TEST ITEMS=(1(1)80);
  • >CALIB TPRIOR, SPRIOR, GPRIOR, READPRI, CYCLES=20;
  • >PRIORS
  • TMU=(-0.639,1.041,1.701,0.482,-1.144,-0.023,0.616,1.133,0.668,0.577,
  • -0.257,0.029,0.904,0.232,1.602,1.642,0.537,-0.228,1.439,0.517,0.0(0)60),
  • TSIGMA=(0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,
  • 0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,2.0(0)60),
  • SMU=(-0.688,0.011,-0.810,0.614,-0.811,-0.445,-0.142,-0.387,0.292,-0.449,
  • 0.040,-0.522,0.080,0.660,0.301,0.408,-0.689,-0.079,0.294,-0.174,0.0(0)60),
  • SSIGMA=(0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,

38
Comparison of Different Equating Methods
  • No consensus has been reached.
  • Methods based on CTT can be used to equate tests.
    Methods based on IRT are essential for
    constructing an item bank/pool.
  • Among the methods based on IRT, some research
    indicates that the Concurrent Calibration Method
    can produce more accurate equating results than
    the Linking Separate Calibration Method and the
    FCIP method.

39
Thank You Very Much!