Classroom Assessments in Large Scale Assessment Programs

1
Classroom Assessments in Large Scale Assessment
Programs

Catherine Taylor
University of Washington/OSPI
Lesley Klenk
OSPI

2
History of Criterion-Referenced Assessment Models

"Measurement driven instruction" (e.g., Popham,
1987) emerged during the 1980s
A process wherein the tests are used as the
driver for instructional change.
If we value something, we must assess it.
Minimum-competency movement of the 1980s
"Drive" instructional practices toward teaching
of basic skills
Movement was successful - Teachers did teach to
the tests.
Unfortunately, teachers taught too closely to
tests (Smith, 1991; Haladyna, Nolen, & Haas, 1991).
The tests were typically multiple-choice tests of
discrete skills
Instruction narrowed to the content that was
tested in the same form that it was tested.

3
History of Criterion-Referenced Assessment Models

Large-scale achievement tests came under
widespread criticism
Negative impacts on the classroom
(Darling-Hammond & Wise, 1985; Madaus, West,
Harmon, Lomax, & Viator, 1992; Shepard &
Dougherty, 1991).
Lack of fidelity to valued performances

4
History of Criterion-Referenced Assessment Models

Studies compared indirect and direct measures of
writing (Stiggins, 1982)
mathematical problem-solving (Baxter, Shavelson,
Herman, Brown, & Valadez, 1993)
science inquiry (Shavelson, Baxter, and Gao,
1993)
Demonstrated that some of the knowledge and
skills measured in each assessment format overlap
Moderate to low correlations between different
assessment modes
Questions about the validity of multiple-choice
test scores.
Other studies (Haladyna, Nolen, and Haas, 1991;
Shepard and Dougherty, 1991; Smith, 1991)
showed
pressure to raise scores on large scale tests
narrowing of the curriculum to the specific
content tested
substantial classroom time spent teaching to the
test and item formats.

5
History of Criterion-Referenced Assessment Models

In response to criticisms of multiple-choice
tests, assessment reformers (e.g., Shepard, 1989;
Wiggins, 1989) pressed for
Different types of assessment
Assessments that measure students' achievement of
new curriculum standards
Assessment formats that more closely match the
ways knowledge, concepts and skills are used in
the world beyond tests
Encourage teachers to teach higher order
thinking, problem-solving, and reasoning skills
rather than rote skills and knowledge.

6
History of Criterion-Referenced Assessment Models

In response to these pressures to improve tests,
LEAs, testing companies, and projects (e.g., New
Standards Project) incorporated performance
assessments into testing programs
"Performance assessments" included
Short-answer items similar to multiple-choice
items
Carefully scaffolded, multi-step tasks with
several short-answer items (e.g., Yen, 1993)
Open-ended performance tasks (California, 1990;
OSPI, 1997).

7
History of Criterion-Referenced Assessment Models

Still, writers criticized these efforts
Tasks are contrived and artificial (see, for
example, Wiggins, 1992)
Teachers complain that standardized tests don't
assess what is taught in the classroom
Shepard (2000) indicated that the promises of
high quality performance-based assessments have
not been realized.
Authentic tasks are costly to implement,
time-consuming, and difficult to evaluate
Less expensive performance assessment options are
less authentic

8
Impact of National Curriculum Standards

Knowledge is expanding rapidly
Education must shift away from knowledge
dissemination
Students must learn how to
Gather information
Comprehend, analyze, interpret information
Evaluate the credibility of information
Synthesize information from different sources
Develop new knowledge

9
Early Attempts to Use Portfolios for State
Assessment

Three states attempted to use collections of
classroom work for state assessment
California (Kirst & Mazzeo, 1996; Palmquist,
1994)
Kentucky (Kentucky State Department of Education,
1996)
Vermont (Fontana, 1995; Forseth, 1992; Hewitt,
1993; Vermont State Department of Education,
1993, 1994a, 1994b).

10
Early Attempts to Use Portfolios for State
Assessment

Initial efforts were fraught with problems
Inconsistency of raters when applying scoring
criteria (Koretz, Stecher, & Deibert, 1992b;
Koretz, Stecher, Klein, & McCaffrey, 1994a),
Lack of teacher preparation in high quality
assessment development (Gearhart & Wolf, 1996),
Inconsistencies in the focus, number, and types
of evidence included in portfolios (Gearhart &
Wolf, 1996; Koretz et al., 1992b), and
Costs and logistics associated with processing
portfolios (Kirst & Mazzeo, 1996).

11
Research on Large Scale Portfolio Assessment

Research on impact of portfolios showed mixed
results
Teachers and administrators have generally
positive attitudes about use of portfolios
(Klein, Stecher, & Koretz, 1995; Koretz et al.,
1992a; Koretz et al., 1994a)
Positive effects on instruction (Stecher &
Hamilton, 1994)
Teachers develop a better understanding of
mathematical problem-solving (Stecher & Mitchell,
1995)
Too much time spent on the assessment process
(Stecher & Hamilton, 1994; Koretz et al., 1994a)
Teachers work too hard to ensure that portfolios
"look good" (Callahan, 1997).

12
Advantages to using classroom evidence in
large-scale assessment programs

Evidence that teachers are preparing students to
meet curriculum and performance standards
(opportunity to learn),
Broader evidence about student achievement
Opportunity to assess knowledge and skills
difficult to assess via standardized tests (e.g.,
speaking and presenting, report writing,
scientific inquiry processes)
Opportunity to include work that more closely
represents the real contexts in which knowledge
and skill are applied

13
Opportunity to Learn

Little evidence is available about whether
teachers are actually teaching to curriculum
standards.
Claims about positive impacts of new assessments
on instructional practices are largely anecdotal
or based on teacher self-report
Legal challenges to tests for graduation,
placement, and promotion demand evidence that
students have had the opportunity to learn tested
curriculum (Debra P. v. Turlington, 1979).
There is no efficient method to assess students'
opportunity to learn the valued concepts and
skills
Collections of classroom work provide a window
into the educational experiences of students
Collections of classroom work provide a window into
the educational practices of teachers
Collections of classroom work could help
administrators evaluate the effectiveness of
in-service teacher development programs
Classroom assessments could be used in court
cases to provide evidence of individual students'
opportunity to learn

14
Broader Evidence of Student Learning

Some students function well in the classroom but
do not perform well on tests.
"Stereotype threat" research - fear of a negative
stereotype can lead minority students and girls
to perform less well than they should on
standardized tests (Aronson, Lustina, Good,
Keough, Steele, & Brown, 1999; Steele, 1999; Steele
& Aronson, 2000).
Students may have cultural values or language
development issues that inhibit performance on
timed, standardized tests
These factors threaten the validity of
large-scale test scores.
Classroom work can be more sensitive to students'
cultural and linguistic backgrounds
Collections of classroom work can be more
reliable than standardized test scores

15
Including Standards that are Difficult to Measure on
Tests

Some desirable curriculum standards are too
unwieldy to measure on large-scale tests (e.g.,
scientific inquiry, research reports, oral
presentations)
Historically, standardized tests measure complex
work by testing knowledge of how to conduct the
work. Examples:
Knowing where to locate sources for reports
Knowing how to use tables of contents,
bibliographies, card catalogues, and indexes
Identifying control or experimental variables in
a science experiment
Knowing appropriate strategies for oral
presentation
Knowing appropriate ways to use visual aids
Critics often note that knowing what to do
doesn't necessarily mean one is able to do.

16
Authenticity

Frederickson (1984) questioned the authenticity of
assessment, citing the misrepresentation of domains
by standardized tests.
Wiggins (1989) claimed that in every discipline
there are tasks that are authentic to the given
discipline.
Frederickson (1998) stated that authentic
achievement is
"significant intellectual accomplishment that
results in the construction of knowledge through
disciplined inquiry to produce discourse,
products, or performances that have meaning or
value beyond success in school" (p. 19, italics
added).
Examples of performances
Policy analysis
Historical narrative and evaluation of historical
artifacts
Geographic analysis of human movement
Political debate
Story and poetry writing
Literary analysis/critique
Mathematical modeling
Investment or business analyses
Geometric design and animation
Written report of a scientific investigation
Evaluation of the health of an ecosystem

17
Authenticity

Some measurement specialists question the use of
the terms authentic and direct measurement
All assessments are indirect measures from which
we make inferences about other, related
performances (Terwilliger, 1997)
However
Validity is related to the degree of inference
necessary from scores on a standardized test to
valued work
Authentic classroom work requires less inference
than multiple choice test scores

18
Challenges with Inclusion of Classroom Work in
Large Scale Programs

Limited teacher preparation in classroom-based
assessment (which can limit the quality of
classroom-based evidence),
Selections of evidence (which can limit
comparisons across students),
Reliability of raters (which can limit the
believability of scores given to student work)
Construct irrelevant variance (which can limit
the validity of scores)

19
Solving Teacher Preparation Issues

Teachers must be taught how to
Select, modify, and develop assessments
Score (evaluate) student work
Write scoring (marking) rules for assessments
that align to standards
Significant, ongoing professional development in
assessment is essential.
Teachers need to re-examine
Important knowledge and skills within each
discipline
How to teach so that students are more
independent learners

20
Selection of Evidence

"For which knowledge, concepts, and skills do we
need classroom-based evidence?"
Koretz, et al (1992b) claimed that, when teachers
are free to select evidence, there is too much
diversity in tasks
Diversity may cause low inter-judge agreement
among raters of the portfolios.
Koretz and his colleagues recommended placing
some restrictions on the types of tasks
considered acceptable for portfolios.
Teachers need guidance in terms of what
constitutes appropriate types of evidence.
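
The inter-judge agreement mentioned above can be quantified in several ways. The sketch below is illustrative only: it is not the method used in the cited studies, the rubric scores are hypothetical, and the function names are made up for this example. It computes exact agreement between two raters and Cohen's kappa, which adjusts that figure for the agreement expected by chance.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of portfolios on which two raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score
    # if each scored independently at their own observed rates.
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    # Note: undefined (division by zero) if expected == 1.
    return (observed - expected) / (1 - expected)


# Hypothetical scores on a 1-4 rubric for eight portfolios.
scores_a = [3, 2, 4, 1, 3, 2, 4, 3]
scores_b = [3, 2, 3, 1, 3, 1, 4, 2]

agreement = percent_agreement(scores_a, scores_b)
kappa = cohens_kappa(scores_a, scores_b)
print(f"exact agreement: {agreement:.3f}")
print(f"Cohen's kappa:   {kappa:.3f}")
```

Kappa is lower than raw agreement here because some matches would occur by chance alone, which is one reason raw agreement can overstate rater consistency.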

21
Improving Selections of Evidence

Provide guidelines for what constitutes an
effective collection of evidence
Provide models for the types of assignments
(performances) that will demonstrate the
standards.
Provide blueprints for tests that can assess the
EALRs assessed by WASL
Provide guides for writing test questions and
scoring rubrics
Provide guides for writing directions and scoring
rubrics for assignments (performances)

22
Guidelines for Collections Include

Lists of important work samples to collect (e.g.,
research reports, mathematics problems)
Number and types of evidence for each category
Outline of steps in performances and work samples
Tools for assessment of students' performances
and work samples

23
Example Lists of Number and Types of Work Samples
to Collect

Writing Performances
At least 2 different writing purposes
At least 3 different audiences
Some examples from courses other than English
Science Investigations
At least 3 investigations (physical, earth/space,
life)
Observational assessments of hands-on work
Lab books
Summary research reports

24
Develop Benchmark Performance Assessments

Benchmark performances are performances that
Have value in their own right
Are complex and interdisciplinary
Students are expected to do by the end of some
defined period of time (e.g., the end of middle
school).
Performance may require
Application of knowledge, concepts and skills
across subject disciplines (e.g., survey
research)
Authentic work within one subject discipline
(e.g., scientific investigations, expository
writing)

25
Example Description of a Benchmark Performance in
Reading

By the end of middle school, students will select
one important character from a novel, short
story, or play and write a multi-paragraph essay
describing the character, how the character's
personality, actions, choices, and relationships
influence the outcome of the story, and how the
character was affected by the events in the
story. Each paragraph will have a central thought
that is unified into a greater whole supported by
factual material (direct quotations and examples
from the text) as well as commentary to explain
the relationship between the factual material and
the student's ideas.

26
Example Description of a Benchmark Performance in
Mathematics

By the end of high school, students will
investigate and report on a topic of personal
interest by collecting data for a research
question of personal interest. Students will
construct a questionnaire and obtain a sample from a
relevant population. In the report, students will
report the results in a variety of appropriate
forms (including pictographs, circle graphs, bar
graphs, histograms, line graphs, and/or stem and
leaf plots and incorporating the use of
technology), analyze and interpret the data using
statistical measures (central tendency,
variability, and range) as appropriate, describe
the results, make predictions, and discuss the
limitations of their data collection methods.
Graphics will be clearly labeled (including name
of data, units of measurement and appropriate
scale) and informatively titled. References to
data in reports will include units of
measurement. Sources will be documented.
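
To make the statistical measures named in this benchmark concrete, the sketch below summarizes a small set of survey responses using Python's standard `statistics` module. The data are hypothetical, and the choice of which measures to report is an assumption about what a student report might include, not a requirement taken from the benchmark.

```python
import statistics

# Hypothetical survey responses: hours of homework per week.
hours = [2, 4, 4, 5, 7, 8, 12]

summary = {
    # Central tendency
    "mean": statistics.mean(hours),
    "median": statistics.median(hours),
    "mode": statistics.mode(hours),
    # Variability
    "range": max(hours) - min(hours),
    "sample_sd": statistics.stdev(hours),
}

for measure, value in summary.items():
    # Units are reported with every value, as the benchmark asks.
    print(f"{measure}: {value:.2f} hours")
```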

27
Example of the Process of Developing Benchmark
Performances

Select work that would be familiar or meaningful
Purchasing decision
Describe the performance in some detail
A person plans to buy a ___ on credit. The
person figures out how much s/he can spend
(down-payment and monthly payments), does
research on the different types of ___, reads
consumer reports or product reviews, compares
costs and qualities, and makes a final selection.
The person then locates the chosen product and
purchases it or finances the purchase.

28
Example of the Process (continued)

Define the steps adults take to complete the
performance
A person plans to buy a ___ on credit for ____
purpose.
The person figures out how much s/he can spend
Determines money available for down-payment
Compares income and monthly expenses to determine
cash available for monthly payment
Does research on the different types of ___
including costs and finance options.
Reads consumer reports or product reviews
Compares costs, qualities, and finance options
Makes a final selection.
Locates the chosen product and finances the
purchase.

29
Example of the Process (continued)

Create grade level appropriate steps
The student plans to buy a ___ on credit for
_____ purpose.
The student
Figures out how much s/he can spend
Determines money available for down-payment
Compares income and monthly expenses to determine
cash available for monthly payment
Does research on at least 3 types of ______
Determines costs and finance options.
Reads consumer reports or product reviews
Compares costs, qualities, and finance options
Makes a final selection that is optimal for cost,
quality and finance options within budget.

30
Example of the Process (continued)

Identify the EALRs demonstrated at each step
The student plans to buy a ___ on credit for
_____ purpose.
The student
Figures out how much s/he can spend (EALR 4.1)
Determines money available for down-payment (EALR
4.1)
Compares income and monthly expenses to determine
cash available for monthly payment (EALR 3.1)
Does research on at least 3 types of ______
(EALR 4.1)
Determines costs and finance options (EALR 1.5.4)
Reads consumer reports or product reviews (EALR
4.1)
Compares costs, qualities, and finance options
(EALR 3.1)
Makes a final selection that is optimal for cost,
quality and finance options within budget (EALR
2.1-2.3)

31
Example of the Process (continued)

Modify the steps as needed to ensure
demonstration of the EALRs
The student plans to buy a ___ on credit for
_____ purpose.
The student
Figures out how much s/he can spend (EALR 4.1)
Determines money available for down-payment (EALR
4.1)
Compares income and monthly expenses to determine
cash available for monthly payment (EALR 3.1)
Does research on at least 3 types of ____
(EALR 4.1)
Determines costs and finance options (EALR 1.5.4)
Reads consumer reports or product reviews (EALR
4.1)
Creates a table to show comparison of costs,
qualities, and finance options (EALR 3.1)
Makes a final selection and explains how it is
optimal for cost, quality and finance options
within budget (EALR 2.1-2.3)
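
One way a student might carry out the steps above is sketched below. Every figure is hypothetical, and the selection rule (highest quality rating among affordable options, with ties broken by the lower payment) is an assumption made for illustration; the task itself leaves the definition of "optimal" to the student's explanation. The payment function uses the standard formula for a fixed-payment loan.

```python
def monthly_payment(principal, annual_rate, months):
    """Fixed monthly payment on an amortized loan."""
    if annual_rate == 0:
        return principal / months
    r = annual_rate / 12  # monthly interest rate
    return principal * r / (1 - (1 + r) ** -months)


# All figures below are hypothetical.
down_payment = 500
budget = 1200 - 1050  # monthly income minus monthly expenses

options = [
    {"name": "Option A", "price": 3500, "annual_rate": 0.06, "months": 24, "quality": 3},
    {"name": "Option B", "price": 4500, "annual_rate": 0.06, "months": 24, "quality": 4},
    {"name": "Option C", "price": 3200, "annual_rate": 0.09, "months": 24, "quality": 2},
]

# Build the comparison table: cost, quality, and finance terms per option.
for option in options:
    financed = option["price"] - down_payment
    option["payment"] = monthly_payment(
        financed, option["annual_rate"], option["months"]
    )
    print(f'{option["name"]}: ${option["payment"]:.2f}/month, '
          f'quality {option["quality"]}')

# Keep only options whose payment fits the monthly budget.
affordable = [o for o in options if o["payment"] <= budget]

# Assumed rule: best quality first, then the lower monthly payment.
best = max(affordable, key=lambda o: (o["quality"], -o["payment"]))
print("Selected:", best["name"])
```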

32
Possible Authentic Performances in Mathematics

Survey Research
Community issue
School issue
Return on investment (costs and sales)
Purchasing decisions
Graphic designs
Animation
Social science analyses
Sources of GDP
Major categories of federal budget
Casualties during war

33
Possible Authentic Performances in Reading

Literary analyses
Comparisons across different works by the same
author
Comparisons across works by different authors on
same theme
Analysis of theme, character, plot development
Reading journals
Research reports
Summary of information on a topic from multiple
sources
Investigation of a social or natural science
research question using multiple sources
Position paper based on information from multiple
sources

34
Providing example blueprint for tests that can
assess the standards

Type of Standard       | Multiple-Choice Items | Short-Answer Items | Essay Items and/or Performance Tasks
Simple Application     | 2-4                   | 1-2                |
Multi-step application |                       | 2-3                |
Solve problem          |                       |                    | 2-3
Communicate            |                       | 1-2                | 1-2
Total                  | 2-4                   | 4-6                | 4-5

35
Example blueprint for tests that can assess
standards

Learning Target                     | Multiple-Choice Items | Short-Answer Items | Essay Items and/or Performance Tasks
Main ideas/ important details       | 3-4                   | 1-2                |
Analysis, interpretation, synthesis |                       | 1-2                | 2-3
Critical thinking                   |                       | 1-2                | 2-3
Total                               | 3-4                   | 4-6                | 4-5

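
A blueprint like this can also be used to check a draft test. The sketch below is a minimal illustration: the total ranges are taken from the Total row of the blueprint above, while the draft item counts, the dictionary keys, and the function `out_of_range` are hypothetical.

```python
# Total ranges (low, high) from the Total row of the blueprint above.
blueprint_totals = {
    "multiple_choice": (3, 4),
    "short_answer": (4, 6),
    "essay_or_performance": (4, 5),
}

# Hypothetical item counts for a teacher's draft test.
draft_test = {
    "multiple_choice": 4,
    "short_answer": 7,
    "essay_or_performance": 4,
}


def out_of_range(blueprint, counts):
    """Return item types whose count falls outside the blueprint range."""
    problems = {}
    for item_type, (low, high) in blueprint.items():
        n = counts.get(item_type, 0)
        if not low <= n <= high:
            problems[item_type] = (n, low, high)
    return problems


problems = out_of_range(blueprint_totals, draft_test)
for item_type, (n, low, high) in problems.items():
    print(f"{item_type}: {n} items, blueprint allows {low}-{high}")
```
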
36
Solving Score Reliability Issues

Train expert teachers to evaluate diverse
collections of evidence
Expert teachers evaluate the collection of work
to determine whether it meets standards

37
Construct Irrelevant Variance

Factors that are unrelated to targeted knowledge
and skills that affect validity of performance
Teachers provide too much help
Teachers provide differential types of help
Students get help from parents
Directions for assignments are not clear
Students are taught the content but not how to do
the type of performance

38
Solving Construct Irrelevant Variance Problems

Provide guidelines for what constitutes valid
evidence
Provide model performance assessments or
benchmark performance descriptions
Provide professional development on appropriate
levels of help
Provide professional development on the EALRs and
GLEs
Provide professional development on how to teach
to authentic work

39
Conclusion

Collections of evidence CAN be used to measure
valued knowledge and skills
Collection of Evidence (COE) guidelines for
Washington State
Incorporate many of the characteristics that will
ensure more valid student scores
Will continue to improve as more examples are
provided
Scoring of collections
Will involve use of the same rigor in scoring as
on WASL items
Will provide reliable student level scores