Title: Kaizen: What Can I Do To Improve My Program?
1. Kaizen: What Can I Do To Improve My Program?
F. Jay Breyer, Ph.D. (jay.breyer@thomson.com)
Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona
2. Test Development Process (Where We Have Been)
- Content found to be important for the job, as determined by job analysis
- Sampling of content: how many items are needed in the test form to assess minimal competency?
- Importance of content domains: what is the emphasis on specific content domains?
- Based on identified test specifications, select items that match content domains
- Evaluate the total item bank
- Pretest new items
- Evaluate statistical parameters: verify appropriate performance of items
- Outcome: a valid, reliable test that is sound and defensible
- But wait! We can do something else: how can we change what we do to improve the testing program?
- Review and edit items to ensure correct grammatical structure and adherence to fairness and sensitivity guidelines
- Equate test forms following the standard setting to ensure comparability of test scores across different test forms (one common linear rule is sketched after this list)
- Prepare test forms for administration: paper-and-pencil delivery or computer delivery
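The deck does not say which equating method the program uses, so as an assumed illustration only, here is the classical linear (mean-sigma) equating rule, which places a raw score from the new form onto the scale of the old form by matching means and standard deviations:

```latex
% Linear (mean-sigma) equating: a raw score x on new form X is mapped
% to the scale of old form Y by matching the two forms' means and SDs.
l_Y(x) = \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) + \mu_Y
```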
3. After the Examination Is Over
- It seems we would never get to this point, but here we are; before the next test is created, what can we learn from this administration?
- What should we do to find out about the examination we just gave and reported?
- What is the size and quality of my item bank?
- Do I have sufficient numbers of items in each content area for the next examination form?
- Can I assemble the next form to content and statistical specifications?
- How do I find out what my statistical specifications are?
- What is the reliability of my test?
4.
- Item analysis of the test before scores are reported helps ensure validity (a minimal sketch of these checks follows this slide):
  - Correct keys are used to grant points
  - Items function as intended
- But test analyses after the test is reported can be useful for:
  - Construction of new test forms
  - Evaluation of item creation techniques
  - Changes that improve the testing program
  - Determining appropriate psychometric approaches to item and test development
- What do you do if your test is:
  - Too long for the time allotted?
  - Too hard or too easy for the population tested and the purpose?
  - Not sufficiently reliable for the test's purpose?
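To make the item-analysis step concrete, here is a minimal Python sketch of the classical statistics involved: proportion correct (difficulty) and corrected point-biserial (discrimination). The function name, data layout, and the flagging thresholds (p < 0.20, point-biserial < 0.10) are illustrative assumptions, not the presenter's procedure:

```python
import numpy as np

def item_analysis(responses, key):
    """Classical item analysis: difficulty (p) and discrimination (point-biserial).

    responses : (n_examinees, n_items) array of selected options, e.g. 'A'..'D'
    key       : length-n_items array of keyed answers
    """
    responses = np.asarray(responses)
    key = np.asarray(key)
    scored = (responses == key).astype(float)   # 1 = correct, 0 = incorrect
    total = scored.sum(axis=1)                  # total score per examinee

    p_values = scored.mean(axis=0)              # proportion correct per item
    r_pbis = np.empty(scored.shape[1])
    for i in range(scored.shape[1]):
        rest = total - scored[:, i]             # corrected item-total score
        r_pbis[i] = np.corrcoef(scored[:, i], rest)[0, 1]

    # Illustrative thresholds: items this hard or this weakly
    # discriminating may signal a key error or a malfunctioning item.
    flags = (p_values < 0.20) | (r_pbis < 0.10)
    return p_values, r_pbis, flags
```

A very low p-value combined with a negative point-biserial is the classic signature of a miskeyed item: most examinees "miss" it, and the stronger examinees miss it more often.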
5. Test Analyses
- Help ensure quality for testing programs that wish to verify that appropriate test development and psychometric procedures are being used. These analyses help verify that the program's test development activities are psychometrically sound, and they provide directions for possible continuous improvement.
- Analyses should:
  - Assure the public of meeting basic standards of quality, fairness, and reliability
  - Answer the question: how are my test development activities doing?
6. Item Analyses at Different Times
- PIA: Preliminary Item Analysis
- EIA: Early Item Analysis (an item analysis after PINS but before the equating or cut-score study)
- FIA: Final Item Analysis
7. PIA: Only Bad Items
8. PIA: Hard Item
9. PIA: Key Issue
10. FIA: Everything
[Example FIA display shown: item statistics such as 98.2, 72.3, and 89.0, with keyed option C.]
11. Post-Test Administration Inquiry
- Item/Task Information:
  - Quality of items/tasks from the past test: difficulty, discrimination, DIF
- Total Score Information:
  - Reliability
  - Score distributions
  - Descriptive information
  - Speededness
- Subscore Information:
  - Reliability of reported subscores
  - Score distributions
  - Descriptive information
- Together, these support A FAIR TEST
12. Score Information: Reliability and Validity
- Reliability: consistency and accuracy
- Validity: score inferences, score meaning, score interpretations; what we can say about people
13. Score Information: Reliability
- Reliability: consistency and accuracy
- In credential testing, reliability:
  - Refers to consistency of test scores across different test forms, given the content sampling: coefficient alpha, Kuder-Richardson (KR-20)
  - Refers to consistency of passing and failing the same people as if they were able to take the test twice: Subkoviak, pass/fail consistency, RELCLASS
- (The classical internal-consistency formulas are sketched after this list.)
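For reference, the internal-consistency coefficients named above have standard classical forms; a sketch in conventional notation, with k items, item proportion-correct p_i, item variance sigma_i^2, and total-score variance sigma_X^2:

```latex
% KR-20 for k dichotomously scored items:
\mathrm{KR\mbox{-}20} = \frac{k}{k-1}
  \left( 1 - \frac{\sum_{i=1}^{k} p_i (1 - p_i)}{\sigma_X^{2}} \right)

% Coefficient alpha generalizes KR-20 beyond right/wrong scoring by
% replacing p_i(1 - p_i) with the observed item variances:
\alpha = \frac{k}{k-1}
  \left( 1 - \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sigma_X^{2}} \right)
```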
14. Score Information: Reliability
- Measurement error: refers to random fluctuations in a person's score due to factors not related to the content of the test
  - SEM (standard error of measurement)
  - CSEM (conditional standard error of measurement)
- (Both are sketched below.)
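These two error indices also have standard classical forms. In the sketch below, sigma_X is the total-score standard deviation and rho_XX' the reliability; the binomial-error CSEM shown is one common choice (following Lord) and is an assumption about which CSEM a program would use:

```latex
% Standard error of measurement:
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}

% The conditional SEM varies across the score scale; under a binomial
% error model, for raw score x on a k-item test:
\mathrm{CSEM}(x) = \sqrt{\frac{x\,(k - x)}{k - 1}}
```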
15. Test Analyses: Score Information
[Example score summary shown: a reliability coefficient of 0.88 and a score value of 75.]
16. Test Analyses: Score Information
- Correlations can add to the understanding of score reliability (one standard adjustment is sketched below)
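One standard way a correlation speaks to reliability is the correction for attenuation, which uses the reliabilities of two scores to estimate the correlation of their underlying true scores; this is shown as an assumed illustration, not necessarily the presenter's analysis:

```latex
% Correction for attenuation: the observed correlation r_XY, adjusted
% by the reliabilities of X and Y, estimates the true-score correlation.
r_{T_X T_Y} = \frac{r_{XY}}{\sqrt{r_{XX'} \, r_{YY'}}}
```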
17. Item Information: DIF and Sensitivity
- Sensitivity:
  - How questions appear
  - Review by a test development (TD) person
  - Removes words and phrases from a test that may be insulting, defamatory, or charged
- Differential Item Functioning (DIF):
  - How questions behave
  - Searches for items with construct-irrelevant variance
  - Tests differences in item difficulty for k groups when matched on proficiency: Mantel-Haenszel
18. DIF
- Impact is not DIF
- Impact is the assessment of group differences in test performance between unmatched focal and reference group members
- It reflects a confounding of item performance differences between focal and reference groups
19. DIF
- How DIF is calculated: the criterion is the total test score, representing the construct
- The question DIF answers: is the meaning of the item the same for the focal group as it is for the reference group?
- If the interpretation of the scores (the meaning) is different for subgroups, then DIF is present
- DIF has to do with improving validity
- (A Mantel-Haenszel sketch follows this list.)
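Since the deck names Mantel-Haenszel as the DIF method, here is a minimal Python sketch of the MH common odds ratio and the ETS delta-scale statistic for a single dichotomous item; the function name and data layout are assumptions made for illustration:

```python
import numpy as np

def mantel_haenszel_dif(correct, group, matching_score):
    """Mantel-Haenszel DIF for one dichotomous item.

    correct        : 0/1 array, item scored correct/incorrect
    group          : array of 'R' (reference) / 'F' (focal) labels
    matching_score : total test score used to match examinees
    """
    correct = np.asarray(correct)
    group = np.asarray(group)
    matching_score = np.asarray(matching_score)

    num, den = 0.0, 0.0
    # Stratify by total score so the groups are matched on proficiency.
    for s in np.unique(matching_score):
        at_level = matching_score == s
        ref = at_level & (group == "R")
        foc = at_level & (group == "F")
        A = np.sum(correct[ref] == 1)   # reference correct
        B = np.sum(correct[ref] == 0)   # reference incorrect
        C = np.sum(correct[foc] == 1)   # focal correct
        D = np.sum(correct[foc] == 0)   # focal incorrect
        N = A + B + C + D
        if N == 0:
            continue
        num += A * D / N
        den += B * C / N

    if num == 0 or den == 0:
        return np.nan, np.nan
    alpha_mh = num / den                 # MH common odds ratio
    mh_d_dif = -2.35 * np.log(alpha_mh)  # ETS delta-scale statistic
    return alpha_mh, mh_d_dif
```

By convention, MH D-DIF is negative when the item is harder for the focal group after matching; ETS classifies items roughly by the size of |MH D-DIF| (near zero is negligible, around 1.5 or more is large), subject to statistical significance.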
20. In Summary
- Statistical information following test administration can provide:
  - Item information: difficulty and suitability of the items/tasks for your candidate samples
  - DIF: potential sources of bias (invalidity)
  - Decision score information: distributions, descriptive statistics, reliability information
  - Subscore information: reliability information, intercorrelations
- These analyses help highlight areas for continuous improvement: Kaizen