Title: Three-part colloquium series
1Three-part colloquium series
- Can Large-Scale Tests Be Fair to All Students? Research on Bias Issues for WASL (November 2)
- WASL History and Early Research: Everything You Needed to Know About WASL but Didn't Think to Ask (December 1)
- Classroom-Based Assessments and State Standards: Implementing Alternatives to Standardized Tests (December 11)
2Can Large-Scale Tests be Fair to All Students? Bias Issues Related to WASL
- Catherine S. Taylor
- University of Washington
- November 2, 2006
3Background
- 10 years' experience in test development (1981-1991) prior to coming to the University of Washington
- Moved to the University of Washington in 1991 (School Reform Law passed in 1993)
- Principal Investigator for R&D Grant (1994-1995) to support development of prototype assessments of the Essential Academic Learning Requirements (EALRs)
- Washington State Technical Advisory Committee for Assessment (1995-1999)
- Principal Investigator for WASL Validity Research Grant (2000-2004) to investigate validity of WASL scores
4The focuses of my research
- How to prepare teachers for effective classroom-based assessments
- Validity theory
- Validity and large-scale testing policy
- Threats to the validity of large-scale tests
5Focuses of this presentation
- Study of Bias and Sensitivity Review procedures used for WASL (2004)
- Report of input from two Public Forums on Bias and Sensitivity (2004)
- Yakima
- Seattle
- Studies of Differential Item Functioning (also known as statistical bias) in WASL test items (1997-2001)
6What is an Item?
- An item is a question or set of directions (prompt)
- Multiple-choice item
- A question or prompt
- 3-4 answer choices, only one of which is correct
- Performance item
- A question or prompt
- Space in which students construct an answer
- A rule for assigning points to students' answers
- WASL performance items
- Short answer (0-2 points)
- Extended response (0-4 points)
7WASL items are developed using state-of-the-art procedures
- Test Specifications define how many and what types of items will be on a test
- Item Specifications define exactly what kinds of items will assess each Grade Level Expectation (GLE)
- Item writing overseen by skilled test developers
- Item reviews by teachers check for match to GLEs
- Bias and sensitivity reviews by individuals who represent the diversity of WA State students
8WASL test items are tested using state-of-the-art procedures
- Item pilots: items are randomly assigned to students throughout WA State
- Item data reviews based on students' performances
- Statistical difficulty: Is the item easy or difficult because of the content tested, NOT some flaw in the item?
- Statistical validity: Do high-performing students do better on the item than low-performing students?
- Statistical bias: Is item performance related to level of knowledge and skill, NOT group membership?
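The first two questions on this slide correspond to two classical item statistics, which can be sketched in Python as below. This is an illustration only, not the procedure used for WASL: the function names, the sample data, and the 27% upper/lower split are assumptions introduced here.

```python
from statistics import mean

def item_difficulty(item_scores, max_points=1):
    """Average share of the available points earned on one item (0 to 1)."""
    return mean(item_scores) / max_points

def discrimination_index(item_scores, total_scores, max_points=1, fraction=0.27):
    """Upper-lower index: item difficulty among the highest total scorers
    minus item difficulty among the lowest total scorers.
    The 27% split is a common convention, assumed here for illustration."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    n = max(1, int(len(order) * fraction))
    low = [item_scores[i] for i in order[:n]]
    high = [item_scores[i] for i in order[-n:]]
    return (mean(high) - mean(low)) / max_points

# Hypothetical data: one multiple-choice item (0/1) and total test scores
# for the same ten students.
item = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
total = [5, 8, 12, 9, 20, 25, 7, 22, 18, 30]

print(item_difficulty(item))              # 0.6
print(discrimination_index(item, total))  # 1.0
```

A positive discrimination index means high-performing students do better on the item than low-performing students. For a WASL performance item, `max_points` would be 2 (short answer) or 4 (extended response).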
9Study 1: Bias and Sensitivity Reviews
- Committee members represent diversity in the student population (regions, ethnicity, gender, socio-economic status, religion, special population issues)
- Members review reading passages and items for
- Implied or overt stereotyping or negative representations of any group
- Too much or too little representation of any group
- Terms that may be confusing to students based on language, region, culture, socio-economic status, etc.
- Controversial issues and topics that may affect some groups more than others
10Procedures Used to Observe Bias and Sensitivity Reviews
- Participant-observer
- Recorded panelists' comments during review process
- Cross-checked records with facilitator notes
- Looked for patterns in notes/records in relation to reading passages and items
11Results of Bias and Sensitivity Review Observations
- Few passages or test items are identified as problematic
- Reading passages present the greatest potential for bias
- Sources of bias in reading passages are subtle
12Reading passages present the greatest potential for bias
- WASL includes
- narrative and informative passages
- passages with social studies, science, and literary content
- WASL reading passages are from published sources
- Authors resist changes to their published writing (even when changes lessen bias/stereotyping)
13Sources of bias in reading passages are subtle
- Alterations of original narratives
- Legends and folk tales may be altered to fit Western notions of literature
- Language changes can change meaning ("first feast" vs. "barbecue")
- Othering
- Biographies may focus on how individuals overcame or coped with their minority status (Jackie Robinson, Helen Keller)
- Informational passages about cultural groups may have a patronizing tone (i.e., "aren't their ways cute")
- Interpretations: Items may focus on interpretations that are unique to middle-class values rather than values of the culture of origin
14Study 2: Bias and Sensitivity Forums
- Two community forums (Yakima and Seattle)
- Community members came together to discuss concerns about WASL
- Participants included
- Teachers and school administrators
- Tribal elders
- Latino community leaders
- Parents and community members
15Procedures Used to Gather Data during Bias and Sensitivity Forums
- Did mock bias and sensitivity review
- Presented methods used for statistical bias analysis (also called differential item functioning, or DIF)
- Showed items flagged for DIF and asked for likely causes
- Small group discussion with reports to larger group
- Recorded participant ideas about bias issues in WASL
- Examined written notes and chart paper for themes
16Themes in Participant Comments
- Need for involvement of minority teachers in all stages of WASL development work
- Need for sensitivity to cultural values in selection of reading passages, item content, and the types of questions (particularly in reading)
- Need for inclusion of tribal elders in selection of text and contexts for WASL items
- Need for inclusion of individuals with cultural expertise in bias/sensitivity review panels
17Study 3: Differential Item Functioning (DIF) Analyses: Typical Steps in a DIF Analysis
- Identify groups to be compared
- Compute item performance for students in different groups at each total test score
- Summarize the differences in performance across all test scores
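The three steps above can be sketched as a simple standardization-style index: compare the two groups' average item scores at each total test score, then average those differences, weighting by the number of focal-group students at each score. The slides do not say which DIF statistic was used for WASL, so this is an assumed illustration; the function name, the group labels, and the data are hypothetical.

```python
from collections import defaultdict

def standardized_dif(records, reference, focal):
    """records: iterable of (group, total_score, item_score) tuples.
    Returns the focal-weighted mean of (focal mean - reference mean)
    on the item, computed separately at each total test score."""
    # Step 2: accumulate item performance per group at each total score.
    cells = defaultdict(lambda: [0.0, 0])  # (group, total) -> [sum, count]
    for group, total, item_score in records:
        cell = cells[(group, total)]
        cell[0] += item_score
        cell[1] += 1

    # Step 3: summarize the differences across all total scores.
    weighted_diff = 0.0
    weight = 0
    focal_totals = sorted({t for (g, t) in cells if g == focal})
    for t in focal_totals:
        if (reference, t) not in cells:
            continue  # no matched reference students at this score
        f_sum, f_n = cells[(focal, t)]
        r_sum, r_n = cells[(reference, t)]
        weighted_diff += f_n * (f_sum / f_n - r_sum / r_n)
        weight += f_n
    if weight == 0:
        raise ValueError("no total score has students from both groups")
    return weighted_diff / weight

# Step 1: identify the groups to compare (hypothetical labels and data).
records = [
    ("ref", 10, 1), ("ref", 10, 0), ("foc", 10, 0), ("foc", 10, 0),
    ("ref", 20, 1), ("ref", 20, 1), ("foc", 20, 1), ("foc", 20, 0),
]
print(standardized_dif(records, reference="ref", focal="foc"))  # -0.5
```

A value near zero means students with the same total score perform about the same on the item regardless of group. In this sketch a negative value means the focal group does worse than matched reference students, and a positive value means it does better.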
19DIF Can Go Both Ways
- When individual students get their total scores from different items, that's normal
- When there is a pattern in how groups of students get their total scores, that's DIF
- When students in a group do better than expected on an item based on their total test score, DIF is in favor of the group
- When students in a group do more poorly than expected on an item based on their total test score, DIF is against the group
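One widely used way to put a sign on DIF for right/wrong items is the Mantel-Haenszel procedure, sketched below. The slides do not state that this statistic was used for WASL, and multi-point performance items would need an extension, so treat this purely as an assumed illustration; the function names and counts are hypothetical.

```python
import math

def mantel_haenszel_dif(strata):
    """strata: one tuple per total-score level, holding counts of
    (reference correct, reference incorrect, focal correct, focal incorrect).
    Returns (common odds ratio, MH D-DIF on the delta scale)."""
    num = 0.0
    den = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # skip empty score levels
        num += ref_right * foc_wrong / n
        den += ref_wrong * foc_right / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these counts")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

def direction(d_dif):
    """Negative D-DIF: against the focal group; positive: in its favor."""
    if d_dif < 0:
        return "against focal group"
    if d_dif > 0:
        return "in favor of focal group"
    return "no DIF"

# Hypothetical counts at two total-score levels.
strata = [(30, 20, 20, 30), (40, 10, 30, 20)]
alpha, d_dif = mantel_haenszel_dif(strata)
print(direction(d_dif))  # against focal group
```

Here matched reference students have higher odds of answering correctly (odds ratio above 1), so the index is negative and the DIF is against the focal group. Swapping the two groups' counts flips the sign.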
20Typical Causes of DIF
- Impact: Students from different groups receive different educational experiences such that item performance differences reflect true differences in knowledge/skills.
- Culture/Background: Students from different backgrounds bring unique perspectives to bear on test items.
- Flaws: Flaws in items cause one group to respond differently than another.
21Research on DIF for WASL Test Items
- Studies conducted after items had been
- reviewed by bias and sensitivity committee
- examined for statistical bias
- used in an operational test
- Compared performance of
- Males and Females
- White students and Black/African American students
- White students and Latino/Hispanic students
- White students and Native American students
- White students and Asian/Pacific Islander students
22Research on DIF for WASL Test Items
- Examined test items from
- 1997, 1998, 1999, 2000, 2001 Grade 4 Reading and Mathematics
- 1998, 1999, 2000, 2001 Grade 7 Reading and Mathematics
- 1999, 2000, 2001 Grade 10 Reading and Mathematics
23DIF Results for Reading
- Most reading items showed no statistical bias
- Reading items flagged for Gender DIF
- Multiple-choice items tend to favor boys
- Performance items tend to favor girls
- DIF items favoring boys tend to be related to informational passages
- Reading items flagged for Ethnic DIF
- Multiple-choice items asking for text interpretation tend to favor White students
- Performance items asking for text interpretation tend to favor minority students
- Patterns became more extreme across grade levels
24Mean Number of Reading Items Flagged for DIF (Males vs. Females)
25Mean Number of Reading Items Flagged for DIF (Asian/Pacific Islander vs. White)
26Mean Number of Reading Items Flagged for DIF (Black/African American vs. White)
27Mean Number of Reading Items Flagged for DIF (Native American vs. White)
28Mean Number of Reading Items Flagged for DIF (Latino/Hispanic vs. White)
29Excerpt from a reading passage
- The best looking fences are often the simplest. A simple fence around a beautiful home can be like a frame around a picture. The house isn't hidden; its beauty is enhanced by the frame. But a fence can be a massive, ugly thing, too, made of bricks and mortar. Sometimes the insignificant little fences do their job just as well as the ten-foot walls. Maybe it's only a string stretched between here and there in a field. The message is clear: don't cross here.
- Every fence has its own personality and some don't have much. There are friendly fences. A friendly fence takes kindly to being leaned on. There are friendly fences around some playgrounds. And some playground fences are more fun to play on than anything they surround. There are more mean fences than friendly fences overall, though. Some have their own built-in invitation not to be sat upon. Unfriendly fences get it right back sometimes. You seldom see one that hasn't been hit, bashed, or bumped or in some way broken or knocked down.
30Example of a Reading Item that Shows Statistical Bias in Favor of Focal Groups
- In the sixth paragraph, the author talks about friendly and unfriendly fences. How can you tell them apart?
- [Blank lines for the student's written response]
- Favors Latinos, Blacks/African Americans, and Asian/Pacific Islanders
31Example of a Reading Item that Shows Statistical Bias in Favor of Focal Groups
- What is the author's attitude toward fences? Give three pieces of evidence from the essay to support your point.
- [Blank lines for the student's written response]
- Favors females, Asian/Pacific Islanders, and Latinos
32Example of a Reading Item that Shows Statistical
Bias in Favor of Males and Whites
33DIF Results for Mathematics
- Most mathematics items showed no statistical bias
- Mathematics items flagged for Gender DIF
- Multiple-choice items tend to favor boys
- Performance items tend to favor girls
- DIF items favoring boys tend to require simple applications of mathematical procedures in number, algebra, geometry, and statistics
- DIF items favoring girls tend to assess data analysis, measurement, complex applications, reasoning, and problem-solving
- Number of items flagged for DIF increased across grade levels
34DIF Results for Mathematics
- Ethnic DIF statistical patterns
- Performance items were flagged for DIF more often than multiple-choice items
- Slightly more of the flagged performance items favored minority students, although differences were small
35DIF Results for Mathematics
- Content analysis of Mathematics items flagged for Ethnic DIF
- Flagged items favoring Asian/Pacific Islander students generally assessed number concepts, computation, geometric procedures, algebraic procedures, and simple statistics
- Flagged items favoring Black/African American, Native American, and Latino/Hispanic students generally assessed number, number patterns, computation, and logical reasoning
- Flagged items favoring White students generally assessed data analysis, data representation, measurement, reasoning, and problem-solving
36Mean Number of Mathematics Items Flagged for DIF (Males vs. Females)
37Mean Number of Mathematics Items Flagged for DIF (Asian/Pacific Islander vs. White)
38Mean Number of Mathematics Items Flagged for DIF (Black/African American vs. White)
39Mean Number of Mathematics Items Flagged for DIF (Native American vs. White)
40Mean Number of Mathematics Items Flagged for DIF (Latino/Hispanic vs. White)
41Example of a Mathematics Item that Shows
Statistical Bias in Favor of Focal Groups
- Favors Latinos, Native Americans, Asian/Pacific Islanders, Black/African Americans, and Females
42Example of a Mathematics Item that Shows
Statistical Bias in Favor of Focal Groups
- Favors Asian/Pacific Islanders
43Conclusions from DIF Studies
- Results suggest
- Exclusive reliance on multiple-choice items for reading tests may result in bias against girls and minority students, particularly when items assess interpretation of text
- Exclusive reliance on multiple-choice items for mathematics tests may result in bias against girls
- Ethnic DIF results in mathematics suggest that content of instruction differs for students in different groups
44Additional Points
- Similar results have been found in studies of other tests
- However, these results can only be generalized when
- Items are written in the same way as WASL items (structured, not too open-ended)
- Diverse, appropriate interpretations and problem solutions are selected for use to train scorers
45Can Standardized Tests be Fair to All Students?
- Yes, under some conditions
- Use of reading passages that maintain cultural characteristics
- Well-developed performance items that present clear directions to students
- Use of item writers from diverse backgrounds
- Selection of anchor papers and training papers that represent diverse, valid responses
- Cultural experts in bias and sensitivity reviews
46For further information on my WASL research
- December 1 colloquium for results of research on the overall validity and reliability of WASL scores
- December 11 colloquium for discussions of the research basis for use of classroom-based evidence as an alternative to WASL for high school graduation