Title: Robert L. Linn
1 Educational Accountability Systems
Robert L. Linn
Paper prepared for the CRESST Conference: The Future of Test-Based Educational Accountability, January 23, 2007
2 Test-based Accountability
- Popular tool for purposes of educational reform
- Accountability is one of the few tools available to policymakers to leverage changes in instruction
- In use in many states since the early 1990s
- Wide range of approaches to using student test results in accountability systems
- Central component of NCLB
3 Some Rationales for Testing
- Clarify expectations for teaching and learning
- Motivate greater effort on the part of students, teachers, and administrators
- Monitor the educational progress of schools and students
- Identify schools that need to be improved
- Provide a basis for distributing rewards and sanctions
- Monitor achievement gaps and encourage the closing of those gaps
4 No Child Left Behind
- NCLB is the latest in a series of re-authorizations of the Elementary and Secondary Education Act (ESEA) of 1965
- ESEA was the main educational component of President Johnson's Great Society program
- ESEA, as re-authorized every few years, is the principal federal law affecting elementary and secondary education throughout the country
5 Assessments
- Basic-skills and norm-referenced tests of the 1980s and early 1990s
- A Nation at Risk encouraged more ambitious tests, e.g., performance assessments
- NCLB increased the uniformity of assessments in reading and mathematics for grades 3-8
6 Content Standards
- States were encouraged to develop content standards by Goals 2000 and the Improving America's Schools Act (IASA)
- NCLB requires all states to have academic content standards in reading/English language arts, mathematics, and science
- All states had content standards by 2005; states that did not already have them adopted them to meet the requirements of NCLB
7 NCLB
- States are required to adopt "challenging academic content standards" that "specify what children are expected to know and be able to do," "contain coherent and rigorous content," and "encourage the teaching of advanced skills" (NCLB, 2001, Part A, Subpart 1, Sec. 1111, a (D))
8 Performance Standards
- Called Academic Achievement Standards by NCLB
- Absolute rather than normative
- Establish fixed criterion of performance
- Intended to be challenging
- Relatively small number of levels
- Apply to all, or essentially all students
- Depend on judgment
9 Standards Movement
- The high expectations of NCLB are consistent with the standards movement of the 1990s
- National Assessment of Educational Progress (NAEP) standards (called achievement levels) were set at ambitious levels
- The 1990 NAEP proficient level in mathematics was set high:
  - Grade 4: 87th percentile (13% proficient or above)
  - Grade 8: 85th percentile (15% proficient or above)
  - Grade 12: 88th percentile (12% proficient or above)
12 States with the Highest and Lowest Percent Proficient or Above on State Assessments in 2005
Assessment        Highest                  Lowest
Reading Grade 4   Mississippi, 89%         Missouri, 35%
Reading Grade 8   North Carolina, 88%      South Carolina, 30%
Math Grade 4      North Carolina, 92%      Maine and Wyoming, 39%
Math Grade 8      Tennessee, 87%           Missouri, 16%
13 Contrasts of Percent Proficient or Above on NAEP and State Assessments (Grade 8 Mathematics)
State       NAEP   State Assessment
Missouri    21%    16%
Tennessee   26%    87%
14 Alignment
- Alignment of assessments and content standards viewed as critical by proponents of standards-based reform
- NCLB peer review requires states to demonstrate alignment, usually through studies by independent contractors
15 Alignment of Assessments to Content Standards
- Webb
  - Categorical concurrence
  - Depth of knowledge consistency
  - Range of knowledge correspondence
  - Balance of representation
- Porter
  - Content categories by cognitive demand matrix
16 Alignment of Assessments to Content Standards (continued)
- Achieve
  - Content centrality
  - Performance centrality
  - Challenge
  - Balance
  - Range
17 Approaches to Test-Based Accountability
- Status approach: compare assessment results for a given year to fixed targets (the NCLB approach)
- Growth approach: evaluate growth in achievement (allowed for NCLB pilot program states)
  - Growth may be measured by comparing the performance of successive cohorts of students
  - Growth may be evaluated by longitudinal tracking of students from year to year
18 Status and Growth Approaches
- The status approach has many drawbacks when used to identify schools as successes or as in need of improvement
  - It does not account for differences in student characteristics, most importantly differences in prior achievement
- The growth approach has the advantage of accounting for differences in prior achievement, but it may set different standards for schools that start in different places
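The following minimal Python sketch makes the status/growth contrast concrete. Every value in it is hypothetical and does not come from the paper: the scores, the cut score `PROFICIENT_CUT`, and the two schools. It also assumes a vertically scaled test, so that scores from consecutive years can be subtracted. School A has high prior achievement and little growth. School B has the reverse.

```python
from statistics import mean

# Hypothetical cut score on an assumed vertical scale (not from the paper).
PROFICIENT_CUT = 240

# Hypothetical matched scores for the same five students in two consecutive years.
school_a = {"prior": [250, 255, 260, 245, 265], "current": [252, 256, 262, 246, 266]}
school_b = {"prior": [215, 220, 225, 210, 230], "current": [232, 238, 241, 226, 246]}


def percent_proficient(scores, cut=PROFICIENT_CUT):
    """Status measure: percent of students at or above the cut this year."""
    return 100 * sum(s >= cut for s in scores) / len(scores)


def mean_longitudinal_gain(prior, current):
    """Growth measure: average year-to-year change for the same students."""
    return mean(c - p for p, c in zip(prior, current))


for name, school in [("A", school_a), ("B", school_b)]:
    status = percent_proficient(school["current"])
    growth = mean_longitudinal_gain(school["prior"], school["current"])
    print(f"School {name}: {status:.0f}% proficient, mean gain {growth:.1f} points")
```

A status rule ranks School A first (100% vs. 40% proficient). A longitudinal growth rule ranks School B first (16.6 vs. 1.4 points). A successive-cohort comparison would instead subtract last year's mean for a different group of students in the same grade.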
19 NCLB Pilot Program
- Five states have received approval to use growth model approaches to determining adequate yearly progress (AYP)
- Early results suggest that growth models do not radically alter the proportion of schools failing to make AYP
- Constraints on growth models are severe, most notably the retained requirement that they lead to the completely unrealistic goal of 100% proficiency by 2014
20 Multiple-Hurdle Approach
- NCLB uses a multiple-hurdle approach
- Schools must meet multiple targets each year: participation and achievement, separately for reading and mathematics, for the total student body and for subgroups of sufficient size
- There are many ways to fail to make AYP (miss any target), but only one way to make AYP (meet or exceed every target)
- Large schools with diverse student bodies are at a relative disadvantage compared with small schools or schools with relatively homogeneous student bodies
21 Compensatory Approach
- State systems often use a compensatory approach, in which stronger performance on some targets can offset weaker performance on others, rather than a multiple-hurdle approach
- An advantage of the compensatory approach is that it creates fewer ways for a school to fall short of targets
- Hybrid models that combine compensatory and multiple-hurdle approaches are also possible
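The difference between the two decision rules can be sketched in a few lines of Python. All targets, groups, and results below are hypothetical. The compensatory rule shown (a non-negative average margin over target) is only one possible rule and is not taken from any particular state. Participation targets are omitted. The `chance_all_met` helper assumes that hurdles are met independently with the same probability, which is a strong simplification. It shows only why more hurdles mean more ways to fail.

```python
# Hypothetical targets and results (percent proficient); all numbers are invented.
# Participation targets, which NCLB also requires, are omitted for brevity.
TARGETS = {"reading": 60, "math": 55}

results = {
    ("all students", "reading"): 72,
    ("all students", "math"): 66,
    ("low income", "reading"): 58,  # the only cell below its target
    ("low income", "math"): 61,
    ("English learners", "reading"): 63,
    ("English learners", "math"): 57,
}


def multiple_hurdle(results, targets=TARGETS):
    """Pass only if every (group, subject) cell meets or exceeds its target."""
    return all(score >= targets[subject] for (_, subject), score in results.items())


def compensatory(results, targets=TARGETS):
    """Pass if the average margin above target is non-negative, so a surplus
    in one cell can offset a shortfall in another (one possible rule)."""
    margins = [score - targets[subject] for (_, subject), score in results.items()]
    return sum(margins) / len(margins) >= 0


def chance_all_met(p, k):
    """Probability of clearing k hurdles if each is met independently with
    probability p (a strong simplifying assumption)."""
    return p ** k


print(multiple_hurdle(results), compensatory(results))  # False True
for k in (2, 6, 20):
    print(k, "hurdles:", round(chance_all_met(0.95, k), 2))
```

Under the independence assumption, a school that meets each target 95% of the time clears 2 hurdles about 90% of the time but clears 20 hurdles only about 36% of the time. This mirrors the relative disadvantage of large, diverse schools.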
22 Disaggregation
- Critical for monitoring the closing of achievement gaps
- No real relevance for small schools with homogeneous student bodies
- However, it creates many hurdles that large, diverse schools must clear
23 Implications of Subgroup Results
- Schools with multiple subgroups are at a relative disadvantage compared with schools with homogeneous student populations
- May want to consider combining results across more than one year, as is already allowed for students with disabilities
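Combining years for a small subgroup can be sketched as pooling the counts before computing the percent. This Python example rests on stated assumptions. The counts, the `MIN_N` reporting threshold, and the function name are all hypothetical. It pools raw counts rather than averaging yearly percents, and actual state averaging rules may differ.

```python
# Hypothetical counts for one small subgroup: year -> (number proficient, number tested).
# MIN_N is an assumed minimum reporting size; real state rules differ.
MIN_N = 30
subgroup = {2005: (9, 18), 2006: (12, 20)}


def pooled_percent_proficient(counts, min_n=MIN_N):
    """Percent proficient computed from counts summed over the given years.
    Returns None when the pooled number tested is still below min_n."""
    proficient = sum(p for p, _ in counts.values())
    tested = sum(n for _, n in counts.values())
    if tested < min_n:
        return None
    return 100 * proficient / tested


print(pooled_percent_proficient({2006: subgroup[2006]}))  # None: only 20 tested
print(round(pooled_percent_proficient(subgroup), 1))      # 55.3 from 21 of 38
```

Pooling two years brings the subgroup above the assumed reporting threshold. It also makes the result less sensitive to which particular students happened to be tested in a single year.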
24 Subgroup Gains in NAEP Mathematics Average Scale Scores (1996 to 2005)
Gain in scale-score points
Group      Grade 4   Grade 8
White      14        8
Black      22        15
Hispanic   19        11
25 Closing Achievement Gaps: NAEP Mathematics Average Scale Scores (1996 to 2005)
Change in gap, scale-score points (negative = gap narrowed)
Groups               Grade 4   Grade 8
White and Black      -8        -7
White and Hispanic   -5        -3
26 Use of Academic Achievement Standards
- Whether achievement gaps appear to close or widen when measured by the percent above a cut score can depend on the choice of level, e.g., basic or above vs. proficient or above
- See, for example, Holland, P. W. (2002). Two measures of changes in gaps between CDFs of test score distributions. Journal of Educational and Behavioral Statistics, 27, 3-17.
27 Subgroup Gains in NAEP Mathematics Percent At or Above Basic or Proficient (1996 to 2005)
Gain in percentage points
Group      Grade 4 Basic   Grade 4 Prof.   Grade 8 Basic   Grade 8 Prof.
White      14              20              7               9
Black      33              10              17              5
Hispanic   28              12              13              5
28 Changes in Achievement Gaps: NAEP Mathematics Percent At or Above Basic or Proficient (1996 to 2005)
Change in gap, percentage points (negative = gap narrowed, positive = gap widened)
Groups               Grade 4 Basic   Grade 4 Prof.   Grade 8 Basic   Grade 8 Prof.
White and Black      -19             10              -10             4
White and Hispanic   -14             8               -6              4
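The gap changes in slides 25 and 28 follow directly from the subgroup gains in slides 24 and 27. The change in the White-Black gap equals the White gain minus the Black gain. The short Python check below reproduces both tables from the gains. The function name `gap_change` is hypothetical and is not from the paper.

```python
# Gains from 1996 to 2005, copied from the NAEP mathematics tables in slides 24 and 27.
SCALE_GAINS = {"White": (14, 8), "Black": (22, 15), "Hispanic": (19, 11)}  # Grade 4, Grade 8
PERCENT_GAINS = {  # Grade 4 Basic, Grade 4 Prof., Grade 8 Basic, Grade 8 Prof.
    "White": (14, 20, 7, 9),
    "Black": (33, 10, 17, 5),
    "Hispanic": (28, 12, 13, 5),
}


def gap_change(gains, reference, focal):
    """Change in the (reference - focal) gap: reference gain minus focal gain.
    Negative values mean the gap narrowed; positive values mean it widened."""
    return tuple(r - f for r, f in zip(gains[reference], gains[focal]))


for focal in ("Black", "Hispanic"):
    print(f"White and {focal}:",
          gap_change(SCALE_GAINS, "White", focal),
          gap_change(PERCENT_GAINS, "White", focal))
```

The scale-score gaps narrowed in both grades. The same data show the gaps narrowing at the basic level and widening at the proficient level, which is the sign split that slide 26 warns about.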
29 Gaps and Percent Above Cuts
- Using differences in percent above cut scores can give a confusing impression of a rather simple situation (Holland, 2002)
- Need to look beyond percent basic or above or proficient or above
  - Compare average scale scores, effect size statistics, and distributions
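Holland's point can be illustrated with a minimal Python sketch under the following assumptions. There are two hypothetical normal score distributions with SD 1 whose means differ by 1 SD, and hypothetical basic and proficient cuts at -1 and +1. Both groups gain the same 0.3 SD, so the mean gap (here also the effect size) does not change. The distributions, cuts, and shift are invented and are not NAEP data.

```python
from statistics import NormalDist

# Hypothetical setup in standard-deviation units; none of these values are NAEP data.
BASIC_CUT, PROFICIENT_CUT = -1.0, 1.0
SHIFT = 0.3  # equal gain for both groups


def pct_above(mean, cut, sd=1.0):
    """Percent of a normal(mean, sd) distribution at or above the cut."""
    return 100 * (1 - NormalDist(mean, sd).cdf(cut))


def gaps(mean_high, mean_low):
    """Gap between the higher- and lower-scoring groups on three metrics."""
    return {
        "mean (SD units)": mean_high - mean_low,
        "percent basic or above": pct_above(mean_high, BASIC_CUT) - pct_above(mean_low, BASIC_CUT),
        "percent proficient or above": pct_above(mean_high, PROFICIENT_CUT)
        - pct_above(mean_low, PROFICIENT_CUT),
    }


before = gaps(0.5, -0.5)
after = gaps(0.5 + SHIFT, -0.5 + SHIFT)
for metric in before:
    print(f"{metric}: {before[metric]:.1f} -> {after[metric]:.1f}")
```

The mean gap stays at 1.0 SD. Yet the basic-or-above gap shrinks (about 24.2 to 17.6 points) while the proficient-or-above gap grows (about 24.2 to 30.6 points). That is why comparing means, effect sizes, or whole distributions gives a clearer picture.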
30 Comparing States on Closing Gaps
- Gaps measured in terms of percent proficient or
above on state assessments can be quite
misleading due to the wide variation in the
stringency of state definitions of the proficient
standard
31 Performance Indexes
- Focusing only on percent proficient or above has disadvantages
  - Does not give credit to students moving from below basic to basic
  - Encourages attention to students thought to be near the proficient cut, possibly at the expense of other students
- Performance index scores avoid these problems
32 Illustration of Massachusetts (MA) Index Scores for a Hypothetical School in 2006 and 2007
Performance Level   Points   N 2006   N 2007   2006 Points   2007 Points
Proficient          100      50       50       5,000         5,000
NI high             75       75       100      5,625         7,500
NI low              50       100      125      5,000         6,250
W/F high            25       100      125      2,500         3,125
W/F low             0        75       0        0             0
Total                        400      400      18,125        21,875
NI = Needs Improvement; W/F = Warning/Failing
33
- School Index Scores
  - 2006: 18,125/400 = 45.31
  - 2007: 21,875/400 = 54.69
- Percent Proficient or Above
  - 2006: 12.5%
  - 2007: 12.5%
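The index arithmetic in slides 32 and 33 can be sketched in Python as points-weighted counts divided by the number of students. The level labels and point values follow the slide, and the counts are the 2006 column. The `shifted` distribution is a hypothetical example, not from the paper, in which 25 students move from NI low to NI high and no one crosses the proficient cut.

```python
# Points per performance level, as in the illustration in slide 32.
POINTS = {"Proficient": 100, "NI high": 75, "NI low": 50, "W/F high": 25, "W/F low": 0}

# 2006 counts from the illustration.
counts_2006 = {"Proficient": 50, "NI high": 75, "NI low": 100, "W/F high": 100, "W/F low": 75}


def index_score(counts):
    """Average points per student: sum(points x count) / total count."""
    total_points = sum(POINTS[level] * n for level, n in counts.items())
    return total_points / sum(counts.values())


def percent_proficient(counts):
    """Percent of students in the Proficient level."""
    return 100 * counts["Proficient"] / sum(counts.values())


# Hypothetical: 25 students move from NI low to NI high; nobody reaches Proficient.
shifted = {**counts_2006, "NI low": 75, "NI high": 100}

print(index_score(counts_2006), percent_proficient(counts_2006))  # 45.3125 12.5
print(index_score(shifted), percent_proficient(shifted))          # 46.875 12.5
```

The index rises by 1.5625 points because of the shift below the proficient cut. The percent proficient does not move, which is the credit the index gives and the percent-proficient measure withholds.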
34 Score Inflation
- Defined as "a gain in scores that substantially overstates the improvement in learning it implies" (Koretz, 2005)
- Research has found that gains in scores in high-stakes accountability systems often fail to generalize to other measures of achievement
- A narrow focus on past tests rather than on the broader content standards can cause score inflation
- The emphasis on alignment, and the need to repeat a substantial percentage of items on assessments for year-to-year equating, may contribute to score inflation
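One way to check whether gains generalize is to compare the gain on the high-stakes test with the gain over the same period on a lower-stakes audit test such as NAEP, both expressed in standard-deviation units. The Python sketch below assumes that approach. The means, SDs, and the "share of gain not reproduced" summary are hypothetical illustrations and are not a formula from Koretz (2005) or this paper.

```python
def standardized_gain(mean_before, mean_after, sd):
    """Gain expressed in standard-deviation units of the test's score scale."""
    return (mean_after - mean_before) / sd


# Hypothetical results for the same population over the same period (invented numbers).
high_stakes_gain = standardized_gain(250.0, 262.0, 30.0)  # state accountability test
audit_gain = standardized_gain(500.0, 505.0, 50.0)        # low-stakes audit test

# Share of the high-stakes gain that the audit test does not reproduce.
unreproduced_share = (high_stakes_gain - audit_gain) / high_stakes_gain
print(f"{high_stakes_gain:.2f} SD vs {audit_gain:.2f} SD; "
      f"{unreproduced_share:.0%} of the gain not reproduced")
```

A large unreproduced share signals only a possible problem. The two tests may also differ in content, so the comparison prompts investigation rather than proving inflation.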
35 Validity of Causal Inferences
- The status approach does not provide a defensible basis for inferring that a higher-scoring school is more effective than a lower-scoring school
- Making an inference about school quality requires ruling out many explanations of differences in student achievement other than differences in instructional effectiveness, such as:
  - Prior achievement differences
  - Differences in support from home
36 Inferences About Schools
- Growth models rule out the alternative explanation of differences in prior achievement
- Nonetheless, causal inferences about school effectiveness are not justified by the growth approach to test-based accountability
  - There are many rival explanations for between-school differences in growth besides differences in school quality or effectiveness
- Results are better thought of as descriptive, useful for generating hypotheses about school quality that need to be evaluated
37 School Characteristics and Instructional Practice
- School differences in achievement and in growth describe outcomes and can be the source of hypotheses about school effectiveness
- Accountability systems need to be informed by direct information about school characteristics and instructional practices
38 Conclusions
- Test-based accountability has become a pervasive part of efforts to improve education in the U.S.
- The features of accountability systems matter
- The requirement to include nearly all students in test-based accountability has brought needed attention to groups often ignored in the past
39 Conclusions (continued)
- Performance standards are supposed to define the level of achievement that students should reach, but the definition of proficient achievement varies so widely from state to state that it lacks any semblance of common meaning
- Using percent proficient or above as a primary indicator does not give credit for gains of students at other levels
- Using percent proficient or above to monitor gaps in achievement is not an adequate approach
40 Conclusions (continued)
- A status-based approach to accountability does not provide a valid way of distinguishing successful schools from schools that are in need of improvement
- Growth models have advantages over status models but are still best thought of as providing descriptive information rather than the basis for causal inferences about school quality