Robert L. Linn - PowerPoint PPT Presentation

Description:

National Center for Research on Evaluation, Standards, and Student Testing ... Some Rationales for Testing. Clarify expectations for teaching and learning ... – PowerPoint PPT presentation


Transcript and Presenter's Notes


1

Educational Accountability Systems
Robert L. Linn
Paper prepared for the CRESST Conference "The
Future of Test-Based Educational Accountability,"
January 23, 2007
2
Test-based Accountability
  • Popular tool for purposes of educational reform
  • Accountability is one of few tools available to
    policymakers to leverage changes in instruction
  • In use in many states since the early 1990s
  • Quite a range of approaches to using student test
    results for accountability systems
  • Central component of NCLB

3
Some Rationales for Testing
  • Clarify expectations for teaching and learning
  • Motivate greater effort on part of students,
    teachers and administrators
  • Monitor educational progress of schools and
    students
  • Identify schools that need to be improved
  • Provide a basis for distributing rewards and
    sanctions
  • Monitor achievement gaps and encourage the
    closing of those gaps

4
No Child Left Behind
  • NCLB is the latest in a series of
    re-authorizations of the Elementary and Secondary
    Education Act (ESEA) of 1965
  • ESEA was the main educational component of
    President Johnson's Great Society program
  • ESEA, as re-authorized every few years, is the
    principal federal law affecting elementary and
    secondary education throughout the country

5
Assessments
  • Basic skills and norm-referenced tests of the 1980s
    and early 1990s
  • A Nation at Risk encouragement of more ambitious
    tests - performance assessments
  • NCLB increased uniformity of assessments in
    reading and mathematics for grades 3-8

6
Content Standards
  • States encouraged to develop content standards by
    Goals 2000 and IASA
  • NCLB requires all states to have academic content
    standards in reading/English language arts,
    mathematics, and science
  • All states adopted content standards by 2005 to
    meet requirements of NCLB if they had not already
    done so

7
NCLB
  • States required to adopt "challenging
    academic content standards" that "specify what
    children are expected to know and be able to do;
    contain coherent and rigorous content; and encourage
    the teaching of advanced skills" (NCLB, 2001,
    Part A, Subpart 1, Sec. 1111, a (D)).

8
Performance Standards
  • Called Academic Achievement Standards by NCLB
  • Absolute rather than normative
  • Establish fixed criterion of performance
  • Intended to be challenging
  • Relatively small number of levels
  • Apply to all, or essentially all students
  • Depend on judgment

9
Standards Movement
  • High expectations of NCLB consistent with the
    standards movement of 1990s
  • National Assessment of Educational Progress
    (NAEP) standards (called achievement levels) set
    at ambitious levels
  • NAEP 1990 proficient level in mathematics set at
    high levels
    • Grade 4: 87th percentile (13% proficient or
      above)
    • Grade 8: 85th percentile (15% proficient or
      above)
    • Grade 12: 88th percentile (12% proficient or
      above)

12
States with the Highest and Lowest Percent
Proficient or Above on State Assessments in 2005
  • Highest
    • Reading Grade 4: Mississippi, 89
    • Reading Grade 8: North Carolina, 88
    • Math Grade 4: North Carolina, 92
    • Math Grade 8: Tennessee, 87
  • Lowest
    • Reading Grade 4: Missouri, 35
    • Reading Grade 8: South Carolina, 30
    • Math Grade 4: Maine and Wyoming, 39
    • Math Grade 8: Missouri, 16

13
Contrasts of Percent Proficient or Above on
NAEP and State Assessments (Grade 8 Mathematics)
  • NAEP
    • Missouri: 21
    • Tennessee: 26
  • State Assessments
    • Missouri: 16
    • Tennessee: 87

14
Alignment
  • Alignment of assessments and content standards
    viewed as critical by proponents of
    standards-based reform
  • NCLB peer review requires states to demonstrate
    alignment, usually through studies by independent
    contractors

15
Alignment of Assessments to Content Standards
  • Webb
    • Categorical concurrence
    • Depth of knowledge consistency
    • Range of knowledge correspondence
    • Balance of representation
  • Porter
    • Content categories by cognitive demand matrix

16
Alignment of Assessments to Content Standards
(Cont'd)
  • Achieve
    • Content centrality
    • Performance centrality
    • Challenge
    • Balance
    • Range

17
Approaches to Test-Based Accountability
  • Status approach: compare assessment results for a
    given year to fixed targets (the NCLB approach)
  • Growth approach: evaluate growth in achievement
    (allowed for NCLB pilot program states)
  • Growth may be measured by comparing performance
    of successive cohorts of students
  • Growth may be evaluated by longitudinal tracking
    of students from year to year

18
Status and Growth Approaches
  • Status approach has many drawbacks when used to
    identify schools as successes or in need of
    improvement
  • Does not account for differences in student
    characteristics, most importantly differences in
    prior achievement
  • Growth approach has advantage of accounting for
    differences in prior achievement, but may set
    different standards for schools that start in
    different places

19
NCLB Pilot Program
  • Five states have received approval to use growth
    model approaches to determining AYP
  • Early results suggest that it does not radically
    alter the proportion of schools failing to make
    AYP
  • Constraints on growth models are severe, most
    notably the retention of the requirement that
    they lead to the completely unrealistic goal of
    100% proficiency by 2014

20
Multiple-Hurdle Approach
  • NCLB uses multiple-hurdle approach
  • Schools must meet multiple targets each year:
    participation and achievement, separately for
    reading and mathematics, for the total student
    body and for subgroups of sufficient size
  • Many ways to fail to make AYP (miss any target),
    but only one way to make AYP (meet or exceed
    every target)
  • Large schools with diverse student bodies at a
    relative disadvantage in comparison to small
    schools or schools with relatively homogeneous
    student bodies
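A minimal sketch of the multiple-hurdle logic, assuming made-up targets and a made-up minimum subgroup size of 30 (the slides give no specific values, and `makes_ayp` is a hypothetical name). It shows why every additional subgroup of sufficient size adds more ways to miss AYP.

```python
def makes_ayp(results, targets, min_group_size):
    """Return (made_ayp, missed_targets) under a multiple-hurdle rule.

    results maps (group, subject) to a dict with keys
    "n", "participation" (percent) and "proficient" (percent).
    A school makes AYP only if no target is missed.
    """
    missed = []
    for (group, subject), r in results.items():
        # Subgroups below the minimum size are not evaluated separately.
        if group != "all" and r["n"] < min_group_size:
            continue
        if r["participation"] < targets["participation"]:
            missed.append((group, subject, "participation"))
        if r["proficient"] < targets[subject]:
            missed.append((group, subject, "proficient"))
    return (not missed), missed

# Illustrative targets only.
targets = {"participation": 95.0, "reading": 60.0, "math": 55.0}

# Hypothetical large, diverse school.
large_school = {
    ("all", "reading"): {"n": 600, "participation": 98.0, "proficient": 70.0},
    ("all", "math"): {"n": 600, "participation": 98.0, "proficient": 68.0},
    ("students with disabilities", "math"): {"n": 45, "participation": 96.0, "proficient": 40.0},
    ("English learners", "reading"): {"n": 12, "participation": 90.0, "proficient": 20.0},
}

made_ayp, missed = makes_ayp(large_school, targets, min_group_size=30)
```

Here the school exceeds both schoolwide targets but still fails because one sufficiently large subgroup misses one target; the 12-student subgroup is skipped entirely.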

21
Compensatory Approach
  • State systems often use a compensatory approach
    rather than a multiple-hurdle approach
  • An advantage of compensatory approach is that it
    creates fewer ways for a school to fall short of
    targets
  • Hybrid models also possible that use a
    combination of compensatory and multiple-hurdle
    approaches
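The difference between the two approaches can be sketched as follows. The indicator values, equal weights, and target of 60 are hypothetical, chosen so that strength in one subject offsets weakness in the other under the compensatory rule.

```python
def multiple_hurdle(indicators, targets):
    """Pass only if every indicator meets its own target."""
    return all(indicators[k] >= targets[k] for k in targets)

def compensatory(indicators, weights, target):
    """Pass if the weighted average of indicators meets one overall target."""
    composite = sum(weights[k] * indicators[k] for k in weights) / sum(weights.values())
    return composite, composite >= target

# Hypothetical school: strong in reading, weak in mathematics.
indicators = {"reading": 72.0, "math": 52.0}

hurdle_pass = multiple_hurdle(indicators, {"reading": 60.0, "math": 60.0})
composite, comp_pass = compensatory(indicators, {"reading": 1.0, "math": 1.0}, target=60.0)
```

A hybrid model could apply the compensatory composite schoolwide while keeping separate hurdles for a few indicators such as participation.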

22
Disaggregation
  • Critical for monitoring the closing of gaps in
    achievement
  • No real relevance for small schools with
    homogeneous student bodies
  • However, it leads to many hurdles that large,
    diverse schools must meet

23
Implications of Subgroup Results
  • Schools with multiple subgroups at relative
    disadvantage to schools with homogeneous student
    population
  • May want to consider combining across more than
    one year as is already allowed for students with
    disabilities

24
Subgroup Gains in NAEP Mathematics: Average Scale
Scores (1996 to 2005)
Group      Grade 4   Grade 8
White         14         8
Black         22        15
Hispanic      19        11
25
Closing Achievement Gaps: NAEP Mathematics
Average Scale Scores (1996 to 2005)
Groups               Grade 4   Grade 8
White and Black         -8        -7
White and Hispanic      -5        -3
26
Use of Academic Achievement Standards
  • Apparent closing or widening of achievement gaps
    using percent above cut scores can depend on
    choice of level, e.g., basic or above vs.
    proficient or above
  • See, for example, Holland, P. W. (2002). Two
    measures of changes in gaps between CDFs of test
    score distributions. JEBS, 27, 3-17.
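A small numerical illustration of this point, under assumptions that are not in the slides: scores in both groups are taken to be normally distributed with the same spread, and the means and cut scores (in standard deviation units) are invented. The lower-scoring group gains more on average, yet the gap in percent above the cut narrows at the lower cut and widens at the higher one.

```python
from math import erf, sqrt

def pct_above(cut, mean, sd=1.0):
    """Percent of a normal(mean, sd) distribution at or above the cut."""
    z = (cut - mean) / sd
    return 100.0 * 0.5 * (1.0 - erf(z / sqrt(2.0)))

def gap_change(cut, a_before, a_after, b_before, b_after):
    """Change in the (group A minus group B) gap in percent above the cut."""
    gap_before = pct_above(cut, a_before) - pct_above(cut, b_before)
    gap_after = pct_above(cut, a_after) - pct_above(cut, b_after)
    return gap_after - gap_before

# Hypothetical group means before and after, in SD units.
A_BEFORE, A_AFTER = 0.0, 0.3    # higher-scoring group gains 0.3 SD
B_BEFORE, B_AFTER = -1.0, -0.6  # lower-scoring group gains 0.4 SD

# Hypothetical cut scores.
BASIC_CUT, PROFICIENT_CUT = -0.5, 1.0

mean_gap_change = (A_AFTER - B_AFTER) - (A_BEFORE - B_BEFORE)
basic_change = gap_change(BASIC_CUT, A_BEFORE, A_AFTER, B_BEFORE, B_AFTER)
proficient_change = gap_change(PROFICIENT_CUT, A_BEFORE, A_AFTER, B_BEFORE, B_AFTER)

print(f"mean gap change: {mean_gap_change:+.2f} SD")
print(f"basic-or-above gap change: {basic_change:+.1f} points")
print(f"proficient-or-above gap change: {proficient_change:+.1f} points")
```

The sign of the change depends on where each cut falls relative to the two distributions, not only on which group gained more.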

27
Subgroup Gains in NAEP Mathematics: Percent At or
Above Basic or Proficient (1996 to 2005)
Group      Grade 4 Basic   Grade 4 Prof.   Grade 8 Basic   Grade 8 Prof.
White            14              20               7               9
Black            33              10              17               5
Hispanic         28              12              13               5
28
Changes in Achievement Gaps: NAEP Mathematics
Percent At or Above Basic or Proficient (1996 to
2005)
Groups               Grade 4 Basic   Grade 4 Prof.   Grade 8 Basic   Grade 8 Prof.
White and Black           -19              10             -10               4
White and Hispanic        -14               8              -6               4
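The gap changes on this slide follow directly from the subgroup gains on slide 27: the change in a gap is the White gain minus the comparison group's gain, so a negative value means the gap narrowed. A short sketch of that arithmetic, with the dictionary layout and function name chosen here for illustration:

```python
# Gains in percent at or above each level, 1996 to 2005 (slide 27).
GAINS = {
    "White":    {"g4_basic": 14, "g4_prof": 20, "g8_basic": 7,  "g8_prof": 9},
    "Black":    {"g4_basic": 33, "g4_prof": 10, "g8_basic": 17, "g8_prof": 5},
    "Hispanic": {"g4_basic": 28, "g4_prof": 12, "g8_basic": 13, "g8_prof": 5},
}

def gap_change(gains, reference, group):
    """Change in the (reference minus group) gap for every column.

    Negative values mean the gap narrowed; positive values mean it widened.
    """
    return {col: gains[reference][col] - gains[group][col] for col in gains[reference]}

white_black = gap_change(GAINS, "White", "Black")
white_hispanic = gap_change(GAINS, "White", "Hispanic")
```

The same data therefore show gaps narrowing at the basic cut and widening at the proficient cut in both grades.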
29
Gaps and Percent Above Cuts
  • Using differences in percent above cut scores
    can give a confusing impression of a rather
    simple situation (Holland, 2002)
  • Need to look beyond percents basic or above or
    proficient or above
  • Compare average scale scores, effect size
    statistics, and comparisons of distributions

30
Comparing States on Closing Gaps
  • Gaps measured in terms of percent proficient or
    above on state assessments can be quite
    misleading due to the wide variation in the
    stringency of state definitions of the proficient
    standard

31
Performance Indexes
  • Focusing only on percent proficient or above has
    disadvantages
  • Does not give credit to students moving from below
    basic to basic
  • Encourages attention to students thought to be
    near the proficient cut, possibly at the expense
    of other students
  • Performance Index scores avoid these problems

32
Illustration of MA Index Scores for a
Hypothetical School in 2006 and 2007
Performance Level   Points   N 2006   N 2007   2006 Points   2007 Points
Prof                  100       50       50        5,000         5,000
NI high                75       75      100        5,625         7,500
NI low                 50      100      125        5,000         6,250
W/F high               25      100      125        2,500         3,125
W/F low                 0       75       50            0             0
Total                          400      400       18,125        21,875
33
  • School Index Scores
    • 2006 Score: 18,125/400 = 45.31
    • 2007 Score: 21,875/400 = 54.69
  • Percent Proficient or Above
    • 2006: 12.5%
    • 2007: 12.5%
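The index arithmetic can be sketched as below, using the 2006 column of the illustration on slide 32. The 2007 column is left out because its counts as transcribed add up to 450 rather than the stated total of 400. The second data set is hypothetical, added only to show that the index moves while percent proficient stays fixed when students advance below the proficient cut.

```python
# (level, points per student, number of students) from the 2006 column.
LEVELS_2006 = [
    ("Prof", 100, 50),
    ("NI high", 75, 75),
    ("NI low", 50, 100),
    ("W/F high", 25, 100),
    ("W/F low", 0, 75),
]

# Hypothetical follow-up year: 25 students move from W/F low to W/F high.
LEVELS_SHIFTED = [
    ("Prof", 100, 50),
    ("NI high", 75, 75),
    ("NI low", 50, 100),
    ("W/F high", 25, 125),
    ("W/F low", 0, 50),
]

def index_score(levels):
    """Average points per student across all performance levels."""
    total_n = sum(n for _, _, n in levels)
    return sum(points * n for _, points, n in levels) / total_n

def percent_proficient(levels, proficient_label="Prof"):
    """Percent of students at the proficient level."""
    total_n = sum(n for _, _, n in levels)
    n_prof = sum(n for label, _, n in levels if label == proficient_label)
    return 100.0 * n_prof / total_n

print(index_score(LEVELS_2006), percent_proficient(LEVELS_2006))
print(index_score(LEVELS_SHIFTED), percent_proficient(LEVELS_SHIFTED))
```

The slide rounds the 2006 result of 45.3125 to 45.31.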

34
Score Inflation
  • Defined as "... a gain in scores that
    substantially overstates the improvement in
    learning it implies" (Koretz, 2005)
  • Research has found that gains in scores in
    high-stakes accountability systems often fail to
    generalize to other measures of achievement
  • Narrow focus on past tests rather than broader
    content standards can cause score inflation
  • Emphasis on alignment and the need to repeat a
    substantial percentage of items on assessments
    for year-to-year equating may contribute to score
    inflation

35
Validity of Causal Inferences
  • Status approach does not provide a defensible
    basis for inferring that a higher scoring school is
    more effective than a lower scoring school
  • Making an inference about school quality requires
    the elimination of many alternate explanations of
    differences in student achievement other than
    differences in instructional effectiveness
  • Prior achievement differences
  • Differences in support from home

36
Inferences About Schools
  • Growth models rule out the alternate explanation
    of differences in prior achievement
  • Nonetheless, causal inferences about school
    effectiveness are not justified by the growth
    approach to test-based accountability
  • Many rival explanations for between-school
    differences in growth besides differences in
    school quality or effectiveness
  • Results better thought of as descriptive for
    generating hypotheses about school quality that
    need to be evaluated

37
School Characteristics and Instructional Practice
  • School differences in achievement and in growth
    describe outcomes and can be the source of
    hypotheses about school effectiveness
  • Accountability systems need to be informed by
    direct information about school characteristics
    and instructional practices

38
Conclusions
  • Test-based accountability has become a pervasive
    part of efforts to improve education in the U.S.
  • The features of accountability systems matter
  • Requirement to include nearly all students in
    test-based accountability has brought needed
    attention to groups often ignored in the past

39
Conclusions (continued)
  • Performance standards are supposed to define the
    level of achievement that students should reach,
    but
  • The definition of proficient achievement varies
    so widely from state to state that it lacks any
    semblance of common meaning
  • Using percent proficient or above as a primary
    indicator does not give credit for gains of
    students at other levels
  • Using percent proficient or above to monitor gaps
    in achievement is not an adequate approach

40
Conclusions (continued)
  • Status-based approach to accountability does not
    provide a valid way of distinguishing successful
    schools from schools that are in need of
    improvement
  • Growth models have advantages over status models
    but still are best thought of as providing
    descriptive information rather than providing
    the basis for causal inferences about school
    quality