Title: Standard Setting
1 Standard Setting
Edward Haertel
The Future of Test-Based Accountability
Festschrift in Honor of Robert L. Linn
UCLA, Los Angeles, CA
January 22-23, 2007
2 Overview
- Standards-Based Score Interpretations
- Problems with Present Practice
- Promising Future Directions
3 Standards-Based Score Interpretations
4 Score Interpretations
Criterion-Referenced
Norm-Referenced
Standards-Based
5 Standards-Based Interpretations
- Most twelfth graders lack even basic knowledge of
U.S. history (NAEP, 2001)
- 22% of California 8th graders are at or above
proficient in Mathematics (NAEP, 2005)
- 32% of California 8th graders are at or above
proficient in Mathematics (CDE, 2005)
6 What does "Proficient" mean?
7 "Saying that all students must be ... proficient
... by 2014, but leaving the definition of
proficient ... to the states has resulted in so
much state-to-state variability ... that
proficient has become a meaningless
designation."
Linn (2005), From CRESST Tech Report No. 651
8 Content Stds, Perf Stds, Cutpts
- (Academic) Content Standards: Curriculum
framework, what's supposed to be taught
- Performance (Achievement) Standard: description
of what "meeting the standard" is supposed to
mean
- Cut Score: Minimum test score defined as "pass"
or "at-or-above"
- "Operational definition" of performance standard
9 Performance Standards
"Proficient represents solid academic performance
for each grade assessed. Students reaching this
level have demonstrated competency over
challenging subject matter, including
subject-matter knowledge, application of such
knowledge to real-world situations, and
analytical skills appropriate to the subject
matter."
Definition used for NAEP, also used by California
10 "Proficient" for 8th grade math
- Be able to conjecture, defend ideas, give
examples
- Understand connections among fractions, percents,
decimals, ... algebra, functions
- Be able to convey underlying reasoning skills
beyond ... arithmetic
- Accurately use the tools of technology
Excerpts from 150-word definition for NAEP
11 Problems with Present Practice
12 Problems with Perf Stds
- Vague, ill-defined performance standard
- No performance standard at all
- Performance standard not aligned with test
- Performance standard with excess meaning
13 Common Origin Is Not Enough
[Diagram labels: Content Standards; Test Specification; Performance Standard; Test; "?"]
14 Interpretive (Validity) Argument
- Content Standards
- Is it clear what test is supposed to measure?
- Alignment
- Does test measure what it is supposed to?
- Accuracy and Precision
- Adequate reliability, freedom from bias, etc.?
- Performance Standard
- Clear, appropriate, aligned with test?
- Cut Score
- Accurately matched to performance standard?
15 Problems with Cut Scores
- Test-Centered Methods
- Angoff, Modified Angoff, Bookmark
- Person-Centered Methods
- Contrasting Groups, Borderline Group
- Performance-Centered Methods
- Body of Work
16 Test-Centered Methods
"... As the Panel's studies demonstrate, the
Angoff ... and other item-judgment methods are
fundamentally flawed. Minor improvements ...
cannot overcome the nearly impossible cognitive
task of estimating the probability that a
hypothetical student at the boundary of a given
achievement level will get a particular item
correct."
Shepard, Glaser, Linn, & Bohrnstedt (1993)
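To make the judgment task in the quotation concrete, here is a minimal sketch of how an Angoff-style cut score is commonly computed from panelists' item ratings. The function name `angoff_cut_score` and the ratings are hypothetical, chosen only for illustration; operational procedures add rounds of discussion, feedback, and other refinements not shown here.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return a raw cut score from Angoff-style item ratings.

    ratings[j][i] is judge j's estimate of the probability that a
    hypothetical borderline examinee answers item i correctly.
    """
    # Each judge's implied cut score is the borderline examinee's expected
    # number-correct score: the sum of that judge's item probabilities.
    per_judge = [sum(judge) for judge in ratings]
    # The panel's recommended cut score is the mean across judges.
    return mean(per_judge)


# Hypothetical ratings: 3 judges, 4 items (illustrative numbers only).
ratings = [
    [0.50, 0.75, 0.25, 0.50],
    [0.75, 0.75, 0.50, 0.50],
    [0.50, 1.00, 0.25, 0.25],
]
cut = angoff_cut_score(ratings)
```

Every entry in `ratings` is one of the probability judgments that the quoted panel describes as nearly impossible to make, so the cut score is only as sound as those entries.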
17 A word about the "Bookmark"
- Pros
- Seems superior to Angoff
- Cons
- Judgment locus is still performance of
hypothetical borderline examinee
- Relies on arbitrary mastery probability convention
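The "mastery probability convention" is often called the response probability (RP) criterion. The sketch below shows why the choice matters, assuming a two-parameter logistic (2PL) item response model with no scaling constant. The model choice, the function names `rp_location` and `ordered_booklet`, and the item parameters are assumptions made for illustration, not details from the slides.

```python
import math


def rp_location(a, b, rp):
    """Ability (theta) at which a 2PL item is answered correctly with
    probability rp, given discrimination a and difficulty b.

    Solves rp = 1 / (1 + exp(-a * (theta - b))) for theta.
    """
    if not 0.0 < rp < 1.0:
        raise ValueError("rp must be strictly between 0 and 1")
    return b + math.log(rp / (1.0 - rp)) / a


def ordered_booklet(items, rp):
    """Order item names from easiest to hardest by their RP location.

    items maps an item name to its (a, b) parameters.
    """
    return sorted(items, key=lambda name: rp_location(*items[name], rp))


# Hypothetical item parameters (illustrative only).
items = {"A": (0.5, 0.0), "B": (2.0, 0.4)}
order_rp50 = ordered_booklet(items, 0.50)  # item A comes first
order_rp67 = ordered_booklet(items, 0.67)  # item B comes first
```

Under these assumptions, moving from RP50 to RP67 shifts every item's location upward and can even reorder the items, so the same bookmark placement can imply a different cut score.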
18 Person-Centered Methods
- Pros
- Real-world judgment locus
- "Reality check" as to accuracy of classification
- Cons
- Suspect basis of person classifications
- Weak theoretical foundation
- Dubious basis for generalization to other
examinees/times/places
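As an illustration of a person-centered approach, here is a minimal sketch of one simple variant of the Contrasting Groups idea: pick the cut score that minimizes total misclassifications between a group judged not proficient and a group judged proficient. Other variants exist (for example, locating the point where the two score distributions intersect). The function name and the scores are hypothetical.

```python
def contrasting_groups_cut(not_proficient_scores, proficient_scores):
    """Return (cut, errors): the candidate cut score with the fewest
    total misclassifications, where scores >= cut count as passing.

    Ties are resolved in favor of the lowest candidate cut score.
    """
    candidates = sorted(set(not_proficient_scores) | set(proficient_scores))
    best_cut, best_errors = None, None
    for cut in candidates:
        # Examinees judged not proficient who would pass at this cut.
        false_pass = sum(s >= cut for s in not_proficient_scores)
        # Examinees judged proficient who would fail at this cut.
        false_fail = sum(s < cut for s in proficient_scores)
        errors = false_pass + false_fail
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical test scores for the two judged groups (illustrative only).
not_proficient = [10, 12, 14, 15, 17, 18]
proficient = [16, 19, 21, 22, 25, 27]
cut, errors = contrasting_groups_cut(not_proficient, proficient)
```

The result depends entirely on how the two groups were classified in the first place, which is the "suspect basis of person classifications" listed under Cons.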
19 Performance-Centered Methods
- Pros
- Locus of judgment is direct sample of student
performance
- Cons
- Limits Performance Standard to performance on
test-like tasks
- Largely limited to assessments of writing or
assessments using constructed-response
20 Promising Future Directions
21 Clear communication, modest claims
- Unelaborated labels like "proficient" invite
surplus meaning
- There are no incentives to discourage
misinterpretation
- Vivid, real-world examples (e.g., representative
student work samples) can help
22 Challenging but realistic goals
"Ambitious expectations are desirable to
encourage concentrated effort on the part of
educators and students. However, in order for the
expectations to be met, educators and students
must have the capacity to meet the targets that
are set."
Linn (2005), From CRESST Tech Report No. 650
23 Benchmarks, not "standards"
- TIMSS Example
- Benchmarks at 25th, 50th, 75th, 90th percentiles of
international achievement distribution
- E.g., "In 1999, 61% of U.S. 8th graders scored
above the international median performance level"
- Other options
- U.S. distribution at fixed point in time
- "Grade Level" standards (cf. Linn, 2005)
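A percentile benchmark of the kind described above can be sketched as follows: fix benchmarks at chosen percentiles of a reference distribution, then report the share of another group scoring above each one. The function names and the score lists are hypothetical placeholders, not TIMSS data, and the "inclusive" quantile method is just one reasonable choice.

```python
from statistics import quantiles


def percentile_benchmarks(reference_scores, percentiles=(25, 50, 75, 90)):
    """Return {percentile: score} benchmarks from a reference distribution."""
    # quantiles(..., n=100) returns 99 cut points; index p - 1 holds the
    # p-th percentile.
    cuts = quantiles(reference_scores, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}


def percent_above(scores, benchmark):
    """Percentage of scores strictly above a benchmark."""
    return 100.0 * sum(s > benchmark for s in scores) / len(scores)


# Placeholder reference distribution and group scores (illustrative only).
international = list(range(1, 102))
group_scores = [40, 51, 60, 70, 80]

benchmarks = percentile_benchmarks(international)
share_above_median = percent_above(group_scores, benchmarks[50])
```

A statement such as "X% scored above the international median" is then a direct, empirical description that requires no judgment about what "proficient" means.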
24 Better methods
- Briefing Book Method?
- Choose maybe 10 possible cut scores, and tell
what each means and implies
- Empirically derived performance standard
- Projected passing rate
- Subgroup impacts
- School-level distribution of passing
- "Real-world" links if required by language of
performance standard
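Part of what such a briefing book would contain can be tabulated directly from score data. The sketch below computes, for each candidate cut score, the projected passing rate overall and by subgroup. It covers only those two entries from the list above; the function name, data layout, and numbers are hypothetical.

```python
def briefing_book(scores, groups, candidate_cuts):
    """Return one row per candidate cut score with the overall passing
    rate and the passing rate within each subgroup.

    scores[i] and groups[i] describe the same examinee; scores >= cut pass.
    """
    rows = []
    for cut in candidate_cuts:
        passed = [s >= cut for s in scores]
        by_group = {}
        for group in sorted(set(groups)):
            flags = [p for p, g in zip(passed, groups) if g == group]
            by_group[group] = sum(flags) / len(flags)
        rows.append({
            "cut": cut,
            "pass_rate": sum(passed) / len(passed),
            "by_group": by_group,
        })
    return rows


# Hypothetical scores and subgroup labels (illustrative only).
scores = [10, 20, 30, 40, 50, 60]
groups = ["A", "A", "A", "B", "B", "B"]
rows = briefing_book(scores, groups, candidate_cuts=[25, 45])
```

Laying the rows side by side lets decision makers see what each candidate cut score would imply before choosing one, rather than discovering the consequences afterward.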
25 Conclusions
- Standards-based score interpretations will be
around for a while
- Vague definitions invite surplus meanings
- Flawed standard-setting methods and overuse of
"Proficient" label add to confusion
- Better reporting would help right now
- Better alternatives (e.g., benchmarks) exist
- Better methods (e.g., "briefing book") may be
developed