Title: Update on MCAS: Is It Working? Is It Fair?
Update on MCAS: Is It Working? Is It Fair?
- Ronald K. Hambleton
- University of Massachusetts at Amherst
- EPRA Seminar, November 5, 2005. (revised)
Purposes
- Address some of the misconceptions that exist about the MCAS.
- In addressing the misconceptions, provide some insights about MCAS, and answer questions about its level of success and its fairness.
- Share some of my own concerns about MCAS and next steps.
General Goals of State Testing Programs Like MCAS
- Provide students, their parents, and teachers with feedback on student educational progress in relation to state curricula.
- Compile data for monitoring progress or change in student performance over time.
- Provide data for educational accountability (e.g., NCLB legislation).
Characteristics of MCAS Assessments
- English Language Arts and Mathematics assessments at several grades.
- Science, social studies/history assessments, and second language proficiency are coming in Massachusetts.
- Multiple-choice items (60%) and performance tasks (40%).
- Assessments include a core of items for student reporting, and other items (for field-testing, curriculum evaluation, and linking test forms over time).
- Performance standards set by educators.
MCAS is not just about testing! It is about:
- substantially increased funding for education
- curriculum reform
- integration of curricula, instruction, and assessment
- improved administrator and teacher training, and educational facilities
- addressing the special needs of students
Let's next consider six common criticisms of MCAS.
1. State tests encourage teachers to "teach to the test," and this narrows the curriculum taught.
- This is potentially a valid concern. It is problematic with NRTs (norm-referenced tests): 10% to 20% coverage of the curricula, multiple-choice items only, and the same skills and items assessed each year. In that situation, teaching narrowly to the skills and content of the test improves test scores but not learning of the broader curricula. But MCAS assessments are not NRTs!!
1. State tests encourage teachers to "teach to the test," and this narrows the curriculum taught. (Cont.)
- MCAS assessments are representative of the curricula, and new items are used each year. What does "teaching to the test" mean when the tests are a sample of the curricula? Teach the curricula!
- Consider the next set of displays: 85% or more of the MCAS curricula are assessed in every three-year cycle.
Comparison of percent of learning standards assessed in Mathematics at grades 4, 6, 8, and 10 from 2001 to 2004 (about 40 standards per grade).
Percent of learning standards assessed in Mathematics at grades 4, 6, 8, and 10 in the time periods 2001 to 2003 and 2002 to 2004.
- In sum, there is no justification for narrowing the curricula. The assessments are representative of the curricula, and over three-year periods, over 85% of learning standards are being assessed. (Results at all grades and subjects are equally good.) Teaching to the test/assessment means teaching the curricula! (A small sketch of this coverage calculation follows below.)
- Other states, too (e.g., Minnesota), have found that when tests and curricula are aligned, teachers are considerably more supportive of educational assessments. Teachers need to see these alignment results in Massachusetts!
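The coverage figure is just a union-over-years calculation. Here is a minimal Python sketch with invented placeholder data (the standard codes, counts, and year-by-year sets are not actual MCAS figures); it only illustrates how "percent of learning standards assessed in a three-year cycle" would be computed:

# Hypothetical data: which grade 4 math learning standards were assessed each year.
GRADE4_MATH_STANDARDS = {f"4.M.{i}" for i in range(1, 41)}   # about 40 standards per grade

assessed_by_year = {                                          # invented placeholder sets
    2001: {f"4.M.{i}" for i in range(1, 18)},
    2002: {f"4.M.{i}" for i in range(10, 28)},
    2003: {f"4.M.{i}" for i in range(22, 38)},
}

# Coverage over a three-year cycle is the union of the yearly sets.
covered = set().union(*assessed_by_year.values())
coverage = 100 * len(covered) / len(GRADE4_MATH_STANDARDS)
print(f"Standards assessed in the 2001-2003 cycle: {coverage:.0f}%")

With real assessed-item data per year, the same union calculation produces the three-year coverage percentages reported in the displays above.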
2. Important decisions about students should not turn on one test.
- The AERA, APA, and NCME Test Standards highlight the importance of measurement error and the undesirability of a single test driving an important decision.
- Reality: State tests (e.g., grade 10 ELA and Math) are not the only requirements for students. English, mathematics, science, and history credits at the high school level are required, as well as regular attendance.
2. Important decisions about students should not turn on one test. (Cont.)
- Students have five chances to pass the grade 10 ELA and math assessments during their high school years.
- There is an appeals process (for students close to the passing score who are attending school regularly and taking the assessments).
- The math assessment is available in Spanish too (to reduce bias).
2. Important decisions about students should not turn on one test. (Cont.)
- The DOE expects schools to be doing their own assessments too (using multiple methods, such as projects, portfolios, work samples, classroom tests, etc.). I think one might question high school grading practices at grades 11 and 12 if a grade 10 test were a major block to graduation.
- In sum, the criticism does not have merit.
3. State assessments are full of flawed and/or biased test items.
- Item writing is not a perfect science, and mistakes will be made.
- Massachusetts releases all scored items on the web site shortly after their use. Find the flawed items if you can. (I can't.) This is a remarkable situation: few states release items, and release is excellent for instructional purposes and for critics. If critics think items are flawed, find them.
3. State assessments are full of flawed and/or biased test items. (Cont.)
- The process of preparing items in Massachusetts is state-of-the-art: qualified and culturally diverse item writers; content and bias reviews by committees, the department, and contractors; field testing; study of statistical evidence for bias; and care in item selection (statistically optimal and content valid).
3. State assessments are full of flawed and/or biased test items. (Cont.)
- UMass has looked at over 1,000 items across years, grades, and tests, and found little statistical evidence for gender and racial bias.
- I just don't see the merit of this criticism, and I have studied these tests to find flaws and biases and can't (but for a few items in science).
4. Student testing takes up too much time and money.
- Quality tests are expensive and require student time. (Reliability of scores needs to be high.) (Six hours in some grades, e.g., grades 4 and 10.)
- In Massachusetts, for example, all students at grades 3, 4, 6, 7, 8, and 10 are tested in some subjects.
4. Student testing takes up too much time and money. (Cont.)
- 4 to 6 hours per student, or about 0.5% of instructional time per year (one day out of 180!).
- $7.0 billion is spent on education, $25 million per year on assessment, about $20.00 per student: 0.3% of the education budget, or 1 of every 300 dollars, is spent on MCAS assessments! It seems obvious that the amount of time and cost of the assessments is not out of line with their value. (The arithmetic is sketched below.)
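A worked version of that time-and-cost arithmetic, as a small Python sketch. The dollar and hour figures come from the slide; the six-hour instructional day used to convert testing hours into a share of instructional time is an assumption.

education_budget = 7.0e9              # dollars per year (from the slide)
assessment_cost = 25.0e6              # dollars per year (from the slide)
tested_hours = 5                      # midpoint of "4 to 6 hours per student"
school_days, hours_per_day = 180, 6   # hours_per_day is an assumed figure

budget_share = assessment_cost / education_budget             # about 0.0036, i.e. 0.3-0.4%
dollars_per_mcas_dollar = education_budget / assessment_cost  # about 280, i.e. roughly 1 in 300
time_share = tested_hours / (school_days * hours_per_day)     # about 0.005, i.e. 0.5%

print(f"{budget_share:.2%} of the education budget")
print(f"about 1 of every {dollars_per_mcas_dollar:.0f} dollars")
print(f"{time_share:.2%} of instructional time")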
Changes in timing or scheduling of the test: a reaction to criticisms in 1998.
- Administer the test in short periods.
- Administer at a time of day that takes into account the student's medical needs or learning style.
- Time of testing varies by grade, but takes less than one day (total) of the 180-day school year; not all grades are assessed; and diagnostic results for students and groups can be used to improve instructional practices!
One Example: Using Item Analysis Results at the School Level (chart: students performing at the Proficient level, with your school highlighted).
5. Passing scores are set too high.
- Too often, the judgment that passing scores are too high is based simply on failure rates.
- Look at the process used by the states; look for validity evidence.
- Who is setting the passing scores, and what method are they using?
- What is the evidence to support the claim that performance standards are set too high? It doesn't exist, in my judgment.
5. Passing scores are set too high. (Cont.)
- Typically, passing scores are set by educators; school administrators, and sometimes parents and local persons, are included too. In Massachusetts, teachers dominated (52%).
- Critics need to study the procedures used in setting passing scores and the validity evidence.
- As an expert on this topic, I will tell you that the state used exemplary procedures.
5. Passing scores are set too high. (Cont.)
- Test scores are placed on a new reporting scale with scores from 200 to 280; 220 is passing. (A sketch of one way such a reporting scale can be built appears below.)
- In 2005, more than 75% of grade 10 students passed both the ELA and math assessments on the first try, and pass rates were over 80% for each assessment (first-time takers).
- I don't see merit in the criticism.
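For readers unfamiliar with reporting scales, here is a minimal, purely illustrative Python sketch of a piecewise-linear mapping from raw scores onto a 200-280 scale with a passing cut at 220. The raw-score cut points are invented, and this is not a description of the state's actual scaling procedure.

RAW_CUTS = [0, 20, 40, 55, 72]           # hypothetical raw-score cut points (72 = max raw score)
SCALED_CUTS = [200, 220, 240, 260, 280]  # reporting-scale values at those cuts; 220 = passing

def to_reporting_scale(raw):
    # Interpolate linearly within the band the raw score falls into.
    for lo, hi, slo, shi in zip(RAW_CUTS, RAW_CUTS[1:], SCALED_CUTS, SCALED_CUTS[1:]):
        if raw <= hi:
            return slo + (shi - slo) * (raw - lo) / (hi - lo)
    return SCALED_CUTS[-1]

print(to_reporting_scale(20))   # 220.0 -- exactly at the passing cut
print(to_reporting_scale(30))   # 230.0 -- above the passing cut, below the next cut point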
6. There is little or no evidence that MCAS is producing results.
- Internal evidence (a sample):
- At the grade 10 level, pass rates have been steadily increasing.
- Research evidence from Learning Innovations (2000): 90% of schools indicated curriculum changes influenced by test results, and instruction was influenced by MCAS results for over 70% of teachers.
6. There is little or no evidence that MCAS is producing results. (Cont.)
- External evidence:
- The state received very positive reviews of the MCAS curricula from Achieve, Inc. (a national review group): among the best in the country.
- NAEP scores are up since 1992 for white, Black, and Hispanic students.
- SAT scores are up, and more students are taking the SAT.
NAEP 2005: Massachusetts and National Results. Percentages at NAEP achievement levels, Mathematics Grade 4 and Reading Grade 4.
Mathematics Grade 4: percentage at NAEP achievement levels. (Source: Massachusetts Snapshot Report 2005, US DOE, IES, NCES)
Reading Grade 4: percentage at NAEP achievement levels. (Source: Massachusetts Snapshot Report 2005, US DOE, IES, NCES)
1994-2004 Massachusetts mean SAT scores, combined Verbal and Math, MA vs. Nation.
Personal Concerns
- Drop-out rates have increased, especially for inner-city students. But by how much? Why? What can be done if true?
- Retention rates at the ninth grade are up. How much? Why? What can be done?
- Consequential validity studies are needed. Intended and unintended outcomes, both positive and negative, need to be identified and addressed.
Personal Concerns (Cont.)
- Funding of schools. Is it sufficient? Are we spending the money on the right items and in the appropriate amounts: teachers, special programs, school facilities, etc.? (Assessment results provide clues, at least, to problem areas.)
Conclusions
- I am encouraged by educational reform in Massachusetts; there are many positive signs: funding, curricula, assessments, concern for students who need special assistance, etc.
- Internal and external validity evidence is very encouraging.
- Important problems remain, notably the achievement gap and funding issues.
Conclusions (Cont.)
- I am troubled by the misconceptions that are so widely held about the MCAS. They interfere with effective implementation.
- I would like to see everyone get behind educational reform and make it work for more students. Continue with the strengths.
- Compile substantial validity evidence, then make the necessary changes, with the goal of making education in Massachusetts meet the needs of all students.
Follow-up reading
- R. P. Phelps (Ed.). (2005). Defending standardized testing. Mahwah, NJ: Lawrence Erlbaum Publishers.
- Please contact me at rkh@educ.umass.edu for a copy of the slides, or to forward your questions and reactions.
State approach to minimizing drop-outs
- Provide a clear understanding to students about what is needed.
- Improve students' classroom curricula and instruction.
- Offer after-school and summer programs.
- Find new roles for community colleges to meet student needs.
- Do research to identify the reasons for drop-outs, and then react if possible.
7. Testing accommodations are not provided to students with disabilities.
- Federal legislation is very clear on the need for states to provide test accommodations to students who need them (ADA and IDEA legislation).
- Without accommodations, the validity of scores is threatened.
- The state provides a large set of accommodations.
Long List of Available Accommodations
- About 20 accommodations, organized into four main categories: (a) changes in timing, (b) changes in setting, (c) changes in administration, and (d) changes in responding.
b. Changes in test setting
- Administer to a small group or in a private room
- Administer individually
- Administer in a carrel
- Administer with the student wearing noise buffers
- Administer with the administrator facing the student
c. Changes in test administration
- Using magnifying equipment or enlargement devices
- Clarifying instructions
- Using large-print or Braille editions
- Using tracking items
- Using amplification equipment
- Translating into American Sign Language
d. Changes in how the student responds to test questions
- Answers dictated
- Answers recorded
8. State tests must be flawed because failure rates are high and better students go on to jobs and colleges.
- Actually, failure rates at the grade 10 level are not high (80% pass both tests on the first try).
- NAEP results are not that far out of line with state results in New England; in fact, the results are close.
- Too many colleges must offer basic reading and math courses.
8. State tests must be flawed because failure rates are high and better students go on to jobs and colleges. (Cont.)
- Internationally, we are about the middle of the pack. In one of the recent studies, we were right there with Latvia and New Zealand, and trailing Korea, Singapore, and many other industrialized countries.
9. Test items are biased against minorities.
- Another excellent validity concern, but the evidence does not support the charge in Massachusetts.
- We have analyzed available 1998, 2000, and 2001 data at grades 4, 8, and 10 in ELA, math, science, and history: Male-Female, Black-White, and Hispanic-Black comparisons. (A sketch of one such DIF screen follows the plot captions below.)
Conditional p-value plot of uniform DIF (SDIF = 0.135, UDIF = 0.136)
Conditional p-value plot of non-uniform DIF (SDIF = 0.060, UDIF = 0.029)
Grade 10 Mathematics DIF item plot
DIF indices for a Mathematics test (Male-Female)
Mathematics test items (Male-Female) organized by content
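A minimal Python sketch of the kind of conditional p-value DIF screen behind these plots: examinees are grouped by total test score, the studied item's proportion-correct is compared between reference and focal groups at each score level, and the differences are summarized as signed (SDIF) and unsigned (UDIF) indices. This is an illustration, not the analysis code used in the UMass studies; the function name and the choice to weight by focal-group counts are assumptions.

from collections import defaultdict

def conditional_dif(total_scores, item_scores, groups, focal="F", reference="M"):
    # total_scores: total test score per examinee; item_scores: 0/1 on the studied item;
    # groups: group label per examinee. Returns (SDIF, UDIF) on the p-value-difference scale.
    by_level = defaultdict(lambda: {"F": [0, 0], "R": [0, 0]})  # level -> {group: [correct, n]}
    for total, item, group in zip(total_scores, item_scores, groups):
        key = "F" if group == focal else "R" if group == reference else None
        if key is not None:
            by_level[total][key][0] += item
            by_level[total][key][1] += 1

    sdif = udif = weight_sum = 0.0
    for counts in by_level.values():
        if counts["F"][1] == 0 or counts["R"][1] == 0:
            continue                                  # both groups must appear at this score level
        p_focal = counts["F"][0] / counts["F"][1]
        p_ref = counts["R"][0] / counts["R"][1]
        diff = p_focal - p_ref
        weight = counts["F"][1]                       # weight by focal-group size (an assumption)
        sdif += weight * diff                         # signed differences may cancel
        udif += weight * abs(diff)                    # unsigned differences cannot
        weight_sum += weight
    return sdif / weight_sum, udif / weight_sum

Near-zero values of both indices suggest little DIF; a UDIF clearly larger than the absolute SDIF suggests non-uniform DIF, where the group difference changes sign across the score scale.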
Even prior to statistical review, judgmental reviews take place
- Assessment development committee
- Bias review committee
- Department staff
- External content area experts
- Item writers themselves
The real discrimination is more likely in the educational system, not the assessments.
- Discrimination does exist in an educational system that has historically moved students forward despite poor educational performance, awarded them high school diplomas, and sent them on to minimum-wage jobs. If high schools won't stop the practice, then the state needs to intervene.
10. Gains in achievement are most likely due to improvements in testing practices only.
- There are many possible reasons for gains, including: students learning to take tests (not all bad), alignment of instruction to the tests (not bad if the tests measure the curriculum frameworks), cheating (little evidence so far), and holding back students/drop-outs.
Consider the retention argument to explain achievement growth in Massachusetts: the retention rate increased 25%! (see Walt Haney)
- Reality: It went from 4% at ninth grade to 5%. Increase: 25%.
- With 60,000 students, about 600 students are affected, so pass and fail rates would be affected by only about 1%! This is not the explanation for growth. (The arithmetic is worked out below.)
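A worked version of that arithmetic as a small Python sketch, using only the numbers on the slide. Treating every additionally retained student as one who would otherwise have failed gives the maximum possible effect on grade 10 pass rates, which is the point of the argument.

cohort = 60_000                          # ninth-grade cohort size (from the slide)
rate_before, rate_after = 0.04, 0.05     # retention rates before and after

extra_retained = (rate_after - rate_before) * cohort            # 600 students
relative_increase = (rate_after - rate_before) / rate_before    # 0.25, the "25%" headline
max_pass_rate_shift = extra_retained / cohort                   # 0.01, about 1 percentage point

print(f"{extra_retained:.0f} students, a {relative_increase:.0%} relative increase,")
print(f"shifting pass rates by at most about {max_pass_rate_shift:.0%}")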
10. Gains in achievement are most likely due to improvements in testing practices only. (Cont.)
- Also, it is possible that teachers and students are working harder, that teachers are focused on the curricula, and that they are teaching more effectively.
10. Gains in achievement are most likely due to improvements in testing practices only. (Cont.)
- Research evidence from Learning Innovations (2000): 90% of schools indicated curriculum changes influenced by test results, and instruction was influenced by MCAS results for over 70% of teachers.
11. Not everything that should be tested is included.
- Definitely true, but over time what can be tested should be tested. And schools have the responsibility to pick up the rest! They can use work samples, portfolios, more performance tasks, etc. There are course grades too.
12. Special education students should not be included.
- Federal laws (ADA, IDEA) require that every possible effort be made to include all students in the assessments. The policy is known as "full inclusion." President Bush's No Child Left Behind is another example of federal initiatives.
Conclusions
- From a technical perspective, many state tests, especially recent ones, are quite sound.
- Technical methods for test development, equating, assessing reliability and validity, setting standards, etc. are very much in place and suitable.
Conclusions (Cont.)
- Major shortcomings of many state testing programs: (1) too little emphasis on diagnostic testing, and (2) too little emphasis on efforts to evaluate program impact.
- Impact on students, such as drop-outs, retentions, attitudes, going to college, etc., needs study.
Conclusions (Cont.)
- Impact on teachers, such as improvements in their qualifications, changes in instructional practices, attitudes about teaching, etc., needs study.
- Impact on school administrators, such as the need for new forms of leadership and new demands on their time, needs study.
Conclusions (Cont.)
- Testing methods and curriculum frameworks are in place (and can be modified as appropriate).
- My hope would be that educators try to make the current program of educational reform work: evaluate as we go, and revise accordingly. A data-based approach makes sense for effective educational reform.