Title: Kaizen: What Can I Do To Improve My Program?
1. Kaizen: What Can I Do To Improve My Program?
F. Jay Breyer, Ph.D. (jay.breyer@thomson.com)
Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona
2. Test Development Process (Where We Have Been)
- Content found to be important for the job, as determined by job analysis
- Sampling of content: how many items are needed in the test form to assess minimal competency?
- Importance of content domains: what is the emphasis on specific content domains?
- Based on identified test specifications, select items that match content domains
- Evaluate the total item bank
- Pretest new items
- Evaluate statistical parameters: verify appropriate performance of items
- Outcome: a valid, reliable test that is sound and defensible
- But wait! We can do something else: how can we change what we do to improve the testing program?
- Review and edit items to ensure correct grammatical structure and adherence to fairness and sensitivity guidelines
- Equate test forms following the standard setting to ensure comparability of test scores across different test forms (one common linear rule is sketched after this list)
- Prepare test forms for administration: paper-and-pencil delivery or computer delivery
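The deck does not say which equating method the program uses, so as an assumed illustration only, here is the classical linear (mean-sigma) equating rule, which places a raw score from the new form onto the scale of the old form by matching means and standard deviations:

```latex
% Linear (mean-sigma) equating: a raw score x on new form X is mapped
% to the scale of old form Y by matching the two forms' means and SDs.
l_Y(x) = \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) + \mu_Y
```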
3. After the Examination Is Over
- It seems we would never get to this point, but here we are; before the next test is created, what can we learn from this administration?
- What should we do to find out about the examination we just gave and reported?
- What is the size and quality of my item bank?
- Do I have sufficient numbers of items in each content area for the next examination form?
- Can I assemble the next form to content and statistical specifications?
- How do I find out what my statistical specifications are?
- What is the reliability of my test?
4.
- Item analysis of the test before scores are reported helps ensure validity (a minimal sketch of these checks follows this slide):
  - Correct keys are used to grant points
  - Items function as intended
- But test analyses after the test is reported can be useful for:
  - Construction of new test forms
  - Evaluation of item creation techniques
  - Changes that improve the testing program
  - Determining appropriate psychometric approaches to item and test development
- What do you do if your test is:
  - Too long for the time allotted?
  - Too hard or too easy for the population tested and the purpose?
  - Not sufficiently reliable for the test's purpose?
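To make the item-analysis step concrete, here is a minimal Python sketch of the classical statistics involved: proportion correct (difficulty) and corrected point-biserial (discrimination). The function name, data layout, and the flagging thresholds (p < 0.20, point-biserial < 0.10) are illustrative assumptions, not the presenter's procedure:

```python
import numpy as np

def item_analysis(responses, key):
    """Classical item analysis: difficulty (p) and discrimination (point-biserial).

    responses : (n_examinees, n_items) array of selected options, e.g. 'A'..'D'
    key       : length-n_items array of keyed answers
    """
    responses = np.asarray(responses)
    key = np.asarray(key)
    scored = (responses == key).astype(float)   # 1 = correct, 0 = incorrect
    total = scored.sum(axis=1)                  # total score per examinee

    p_values = scored.mean(axis=0)              # proportion correct per item
    r_pbis = np.empty(scored.shape[1])
    for i in range(scored.shape[1]):
        rest = total - scored[:, i]             # corrected item-total score
        r_pbis[i] = np.corrcoef(scored[:, i], rest)[0, 1]

    # Illustrative thresholds: items this hard or this weakly
    # discriminating may signal a key error or a malfunctioning item.
    flags = (p_values < 0.20) | (r_pbis < 0.10)
    return p_values, r_pbis, flags
```

A very low p-value combined with a negative point-biserial is the classic signature of a miskeyed item: most examinees "miss" it, and the stronger examinees miss it more often.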
5. Test Analyses
- Help ensure quality for testing programs that wish to verify that appropriate test development and psychometric procedures are being used. These analyses help verify that the program's test development activities are psychometrically sound, and they provide directions for possible continuous improvement.
- Analyses should:
  - Assure the public of meeting basic standards of quality, fairness, and reliability
  - Answer the question: how are my test development activities doing?
6. Item Analyses at Different Times
- PIA: Preliminary Item Analysis
- EIA: Early Item Analysis (an item analysis after PINS but before the equating or cut-score study)
- FIA: Final Item Analysis
7. PIA: Only Bad Items
8. PIA: Hard Item
9. PIA: Key Issue
10. FIA: Everything
[Example FIA display shown: item statistics such as 98.2, 72.3, and 89.0, with keyed option C.]
11. Post-Test Administration Inquiry
- Item/Task Information:
  - Quality of items/tasks from the past test: difficulty, discrimination, DIF
- Total Score Information:
  - Reliability
  - Score distributions
  - Descriptive information
  - Speededness
- Subscore Information:
  - Reliability of reported subscores
  - Score distributions
  - Descriptive information
- Together, these support A FAIR TEST
12. Score Information: Reliability and Validity
- Reliability: consistency and accuracy
- Validity: score inferences, score meaning, score interpretations; what we can say about people
13. Score Information: Reliability
- Reliability: consistency and accuracy
- In credential testing, reliability:
  - Refers to consistency of test scores across different test forms, given the content sampling: coefficient alpha, Kuder-Richardson (KR-20)
  - Refers to consistency of passing and failing the same people as if they were able to take the test twice: Subkoviak, pass/fail consistency, RELCLASS
- (The classical internal-consistency formulas are sketched after this list.)
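For reference, the internal-consistency coefficients named above have standard classical forms; a sketch in conventional notation, with k items, item proportion-correct p_i, item variance sigma_i^2, and total-score variance sigma_X^2:

```latex
% KR-20 for k dichotomously scored items:
\mathrm{KR\mbox{-}20} = \frac{k}{k-1}
  \left( 1 - \frac{\sum_{i=1}^{k} p_i (1 - p_i)}{\sigma_X^{2}} \right)

% Coefficient alpha generalizes KR-20 beyond right/wrong scoring by
% replacing p_i(1 - p_i) with the observed item variances:
\alpha = \frac{k}{k-1}
  \left( 1 - \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sigma_X^{2}} \right)
```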
14. Score Information: Reliability
- Measurement error: refers to random fluctuations in a person's score due to factors not related to the content of the test
  - SEM (standard error of measurement)
  - CSEM (conditional standard error of measurement)
- (Both are sketched below.)
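These two error indices also have standard classical forms. In the sketch below, sigma_X is the total-score standard deviation and rho_XX' the reliability; the binomial-error CSEM shown is one common choice (following Lord) and is an assumption about which CSEM a program would use:

```latex
% Standard error of measurement:
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}

% The conditional SEM varies across the score scale; under a binomial
% error model, for raw score x on a k-item test:
\mathrm{CSEM}(x) = \sqrt{\frac{x\,(k - x)}{k - 1}}
```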
15. Test Analyses: Score Information
[Example score summary shown: a reliability coefficient of 0.88 and a score value of 75.]
16. Test Analyses: Score Information
- Correlations can add to the understanding of score reliability (one standard adjustment is sketched below)
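One standard way a correlation speaks to reliability is the correction for attenuation, which uses the reliabilities of two scores to estimate the correlation of their underlying true scores; this is shown as an assumed illustration, not necessarily the presenter's analysis:

```latex
% Correction for attenuation: the observed correlation r_XY, adjusted
% by the reliabilities of X and Y, estimates the true-score correlation.
r_{T_X T_Y} = \frac{r_{XY}}{\sqrt{r_{XX'} \, r_{YY'}}}
```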
17. Item Information: DIF and Sensitivity
- Sensitivity:
  - How questions appear
  - Review by a test development (TD) person
  - Removes words and phrases from a test that may be insulting, defamatory, or charged
- Differential Item Functioning (DIF):
  - How questions behave
  - Searches for items with construct-irrelevant variance
  - Tests differences in item difficulty for k groups when matched on proficiency: Mantel-Haenszel
18. DIF
- Impact is not DIF
- Impact is the assessment of group differences in test performance between unmatched focal and reference group members
- It reflects a confounding of item performance differences between focal and reference groups
19. DIF
- How DIF is calculated: the criterion is the total test score, representing the construct
- The question DIF answers: is the meaning of the item the same for the focal group as it is for the reference group?
- If the interpretation of the scores (the meaning) is different for subgroups, then DIF is present
- DIF has to do with improving validity
- (A Mantel-Haenszel sketch follows this list.)
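Since the deck names Mantel-Haenszel as the DIF method, here is a minimal Python sketch of the MH common odds ratio and the ETS delta-scale statistic for a single dichotomous item; the function name and data layout are assumptions made for illustration:

```python
import numpy as np

def mantel_haenszel_dif(correct, group, matching_score):
    """Mantel-Haenszel DIF for one dichotomous item.

    correct        : 0/1 array, item scored correct/incorrect
    group          : array of 'R' (reference) / 'F' (focal) labels
    matching_score : total test score used to match examinees
    """
    correct = np.asarray(correct)
    group = np.asarray(group)
    matching_score = np.asarray(matching_score)

    num, den = 0.0, 0.0
    # Stratify by total score so the groups are matched on proficiency.
    for s in np.unique(matching_score):
        at_level = matching_score == s
        ref = at_level & (group == "R")
        foc = at_level & (group == "F")
        A = np.sum(correct[ref] == 1)   # reference correct
        B = np.sum(correct[ref] == 0)   # reference incorrect
        C = np.sum(correct[foc] == 1)   # focal correct
        D = np.sum(correct[foc] == 0)   # focal incorrect
        N = A + B + C + D
        if N == 0:
            continue
        num += A * D / N
        den += B * C / N

    if num == 0 or den == 0:
        return np.nan, np.nan
    alpha_mh = num / den                 # MH common odds ratio
    mh_d_dif = -2.35 * np.log(alpha_mh)  # ETS delta-scale statistic
    return alpha_mh, mh_d_dif
```

By convention, MH D-DIF is negative when the item is harder for the focal group after matching; ETS classifies items roughly by the size of |MH D-DIF| (near zero is negligible, around 1.5 or more is large), subject to statistical significance.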
20. In Summary
- Statistical information following test administration can provide:
  - Item information: difficulty and suitability of the items/tasks for your candidate samples
  - DIF: potential sources of bias (invalidity)
  - Decision score information: distributions, descriptive statistics, reliability information
  - Subscore information: reliability information, intercorrelations
- These analyses help highlight areas for continuous improvement: Kaizen