Title: Outline
1Outline
- Test bias definitions
- The basic issue: group differences
- What causes group differences?
- Arguments that tests are not biased
- Differential item functioning analysis
- Criterion-related sources of bias
2Outline
- Other approaches to testing minority groups
- Chitling test
- BITCH test
- SOMPA
- Models of test Bias
- Regression
- Constant Ratio
- Cole/Darlington
- Quota
3Test bias: definition
- A test is biased if it gives a systematically
wrong result when used to predict something.
- So, an intelligence test would be biased if, for
example, it underestimated one group's
probability of success in a given endeavor.
4Test bias: the basic issue
- Various groups within society differ in their
average scores on some psychological tests
- African-Americans score 1 standard deviation
lower than Whites
- Asian-Americans score slightly higher than Whites
- Ashkenazi Jews score highest of all
5What causes group differences?
- We don't know. Here are some candidate accounts:
- Genetics
- Socioeconomic factors
- Caste
- Culture
- Stereotype threat
6Genetics
- Highest IQ scores are for Ashkenazi Jews
- Cochran et al. (2006): medieval social
environment for European Jews selected for verbal
and math intelligence (but not spatial)
- Some relation to disease genes?
7Socioeconomic factors
- Much higher proportion of African-Americans are
poor than of Whites, with consequences for
nutrition, health care, resources such as books
in the home
- But the AA-White difference is not eliminated when
groups are equated on SES
8Caste
- Involuntary minorities all over the world do
less well in school and drop out earlier than
majority children
- Ogbu: African-American children lack "effort
optimism", the sense that hard work will be
rewarded
9Culture
- African-American culture has a deep-structure
that conflicts with the demands made by typical
American schools
10"When children are ordered to do their own work,
arrive at their own individual answers, work only
with their own materials, they are being sent
cultural messages. When children come to believe
that getting up and moving about the classroom is
inappropriate, they are being sent powerful
cultural messages. When children come to confine
their 'learning' to consistently bracketed time
periods, when they are consistently prompted to
tell what they know and not how they feel, when
they are led to believe that they are completely
responsible for their own success and failure,
when they are required to consistently put forth
considerable effort for effort's sake on tedious
and personally irrelevant tasks . . , then they
are pervasively having cultural lessons imposed
on them" (Boykin, 1994, p. 125).
11Racial identity and test scores
- 313 African-American university students at a
historically Black university
- GRE (Verbal) and several psychological tests
12Awad (2007)
- Cross Racial Identity Scale
- Cross (1991)
- Rosenberg Self-Esteem Scale
- Rosenberg (1965)
- Academic Self-Concept Scale
- Reynolds (1988)
13Racial identity and test scores
- Academic self-concept predicted GPA but not GRE
test scores
- Racial identity predicted neither GPA nor GRE
scores
- Self-esteem didn't predict either
14Stereotype threat
- Steele & Aronson (1995)
- A social-psychological threat produced in a
situation in which a negative stereotype about
your group is made salient
- You fear you will confirm the stereotype
- This affects highly able, school-identified
African-Americans because they feel the most
pressure to do well
15Arguments that tests are not biased
- Major tests have been subjected to impressive
scrutiny for decades
- Enormous resources are devoted to this purpose
- Criterion validity has been established very
securely for the major intelligence tests: they
do predict college and job performance
16Arguments that tests are not biased
- It is not appropriate to focus on individual
items on a test, which some critics of testing do
- Items should be drawn from a variety of domains,
not all of which will be familiar to anyone
17Arguments that tests are not biased
- Test developers evaluate tests on the basis of
overall patterns of prediction utility
- They're future-oriented, not past-oriented
- How will you do in college or in a job?
- Not: have you had the opportunity to learn?
18Arguments that tests are not biased
- Do you think of test score results as outcomes
or as information (predictors)?
- Test developers say results are the beginning,
not the end: they are information that will
guide us
- Opponents see test results as outcomes
19Arguments that tests are not biased
- Systematic studies have asked whether biased
items produce group differences on tests such as
Stanford-Binet and Wechsler tests
- These studies found no evidence that group
differences disappeared when allegedly biased
items were removed
20Arguments that tests are not biased
- Group differences are just as large on what is
considered the most culture-fair test, Raven's
Progressive Matrices, as on the WAIS
- IQ scores have same utility for prediction
regardless of race or socio-economic status.
21Differential item functioning analysis
- In this approach to testing for bias, you first
form groups for comparison which are equated on
overall test score
- Implication: groups are equivalent in overall
ability
- Then, you look for differences between groups on
individual items
- Where a difference is found, you conclude that the
item is biased (since groups are not different on
ability)
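The matching step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dif_by_stratum`, the group labels "A" and "B", and the data are all hypothetical, and exact matching on total score stands in for whatever matching procedure a real study would use.

```python
from collections import defaultdict


def dif_by_stratum(records):
    """records: iterable of (group, total_score, item_correct) tuples.

    Returns {total_score: pass_rate_A - pass_rate_B} for every total score
    at which both groups "A" and "B" are represented.
    """
    # counts[score][group] = [number correct on the item, number of test-takers]
    counts = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})
    for group, total, correct in records:
        cell = counts[total][group]
        cell[0] += int(correct)
        cell[1] += 1

    gaps = {}
    for total, cell in sorted(counts.items()):
        (a_right, a_n), (b_right, b_n) = cell["A"], cell["B"]
        if a_n and b_n:  # compare only where the groups are matched
            gaps[total] = a_right / a_n - b_right / b_n
    return gaps


# Hypothetical data: (group, total test score, 1 if the studied item was correct)
records = [
    ("A", 10, 1), ("A", 10, 1), ("B", 10, 1), ("B", 10, 0),
    ("A", 20, 1), ("A", 20, 0), ("B", 20, 1), ("B", 20, 0),
]
gaps = dif_by_stratum(records)
print(gaps)
```

A gap that stays well away from zero across score levels is what would flag the item for review.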
22Differential item functioning analysis
- But removing such items does not eliminate group
differences
- E.g., people depicted in test items may typically
be White and male
- But changing this has little effect (McCarty,
Noble, & Huntley, 1989)
23Criterion-related sources of bias
- We evaluate criterion validity by looking at
correlation between test scores and criterion
scores
- E.g., SAT scores vs. GPA after 4 years at
university
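As a rough illustration, a criterion validity coefficient is just the Pearson correlation between test scores and criterion scores. The sketch below assumes made-up SAT and GPA values; `pearson_r` is a hypothetical helper written out by hand, not a function from any testing package.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: SAT score at admission and GPA after 4 years
sat = [1000, 1100, 1200, 1300, 1400]
gpa = [2.4, 2.9, 3.0, 3.4, 3.6]

validity = pearson_r(sat, gpa)
print(round(validity, 2))
```

Real validity coefficients are far lower than in this toy data set, which was chosen only to keep the arithmetic easy to follow.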
24Criterion-related sources of bias
- If correlation is good, we use test scores (e.g.,
SAT) to predict criterion and make selection
decisions
- What do we do if the correlation is different for
different groups?
- This would imply that test scores mean different
things for different groups
25Criterion-related sources of bias
- In this graph, Group B performs better than Group
A but the correlation is the same for both
26Criterion-related sources of bias
- In this graph, the slopes of the lines are the
same but the intercepts are different
- Equal slopes mean equal correlations, that is,
equally good predictions
[Graph: Criterion plotted against Test score, with separate lines for Group A and Group B]
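The equal-slope, different-intercept pattern can be checked by fitting a separate least-squares line to each group. A minimal Python sketch with made-up data follows; the `fit_line` helper and all numbers are illustrative assumptions, not values from the graph.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical test scores and criterion scores for two groups
scores = [1, 2, 3, 4]
group_a = [2.0, 3.0, 4.0, 5.0]  # criterion = 1 + 1 * score
group_b = [3.0, 4.0, 5.0, 6.0]  # criterion = 2 + 1 * score

slope_a, intercept_a = fit_line(scores, group_a)
slope_b, intercept_b = fit_line(scores, group_b)

# Same slope, different intercept: parallel lines, Group B higher throughout
print(slope_a, intercept_a)
print(slope_b, intercept_b)
```

If the two slopes came out different as well, a given test score would predict different amounts of criterion change in the two groups.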
27Criterion-related sources of bias
- Here, the intercepts are different and the slopes
are different, so predictions for Groups A and B
would not be equally good
- Such cases are rare
[Graph: lines for Group A and Group B, with test scores X1 and X2 marked]
28Criterion-related sources of bias
- Major tests, such as SAT and WISC-R, have equal
criterion validity for various ethnic groups
(e.g., African-American, White, Latino/Latina)
- Similar results have been found in other
multi-ethnic countries, such as Israel
29Other approaches to testing minority groups
- The Chitling Test
- The BITCH Test
- SOMPA
30The Chitling Test (Dove, 1968)
- Developed to make a point about testing for
information a group is unlikely to have acquired
- Questions require a particular form of street
smarts to answer correctly
- No validity data exist for this test
- If you want to predict college performance for
minority students, this test won't help
31The BITCH test (Williams, 1974)
- Task: define 100 words drawn from the
Afro-American Slang Dictionary and Williams'
personal experience
- African-Americans score higher than Whites
- Williams argues that this test is analogous to
the standard IQ tests, which are also
culture-bound
32The BITCH test (Williams, 1974)
- Problem: there is no reason to accept the claim
that this is an intelligence test.
- There is no validity evidence: no prediction of
any performance
- Does not test reasoning skills
- May have some value for testing familiarity with
African-American culture
33SOMPA (Mercer, 1979)
- System of Multi-cultural Pluralistic Assessment
- Based on idea that what constitutes knowledge is
socially-constructed
- Mercer also suggested that IQ tests are a tool
Whites use to keep minority groups in their
place.
34SOMPA (Mercer, 1979)
- Inspired originally in part by over-representation
of minority group children in EMR classes in US
schools
- Mercer: this over-representation resulted from
both:
- More medical problems
- Unfamiliar cultural references on tests
35SOMPA (Mercer, 1979)
- Fundamental assumption: all cultural groups have
the same potential on average
- On this view, if one cultural group does more
poorly than another on a test, that is a fact
about the test, not the groups.
36SOMPA (Mercer, 1979)
- Combines 3 kinds of evaluation
- Medical
- Health, vision, hearing, etc.
- Social
- Entire WISC-R
- Pluralistic
- Compare WISC-R scores to those of same community
37SOMPA (Mercer, 1979)
- Estimated Learning Potentials (ELPs): WISC-R scores
adjusted for socio-economic background
- But these ELPs don't predict school performance
as well as the original WISC-R scores
- Mercer: ELPs are intended to assess who should be
in EMR classes
38SOMPA (Mercer, 1979)
- A major problem, in my view, is that we don't
know what consequences arise for children who are
removed from EMR classes on the basis of ELPs
- Is what we call these children important? It is
if the label has an effect, but data do not show
that effect
- SOMPA is used much less today than it used to be
39Models of test Bias
- Regression
- Constant Ratio
- Cole/Darlington
- Quota
40Regression
- Basis: unqualified individualism
- Treat each person as an individual, not as a
member of a group
- Select people with highest scores for job or
college place
- Ignores sex, race, other group characteristics
- Leads to highest average performance on criterion
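In code, unqualified individualism amounts to a plain top-down ranking in which the group label is never consulted. The sketch below is illustrative only: `select_top` and the applicant data are hypothetical, and raw test score stands in for the predicted criterion score.

```python
def select_top(applicants, n_places):
    """applicants: list of (name, group, score) tuples.

    Returns the names of the n_places highest scorers; group is ignored.
    """
    ranked = sorted(applicants, key=lambda a: a[2], reverse=True)
    return [name for name, _group, _score in ranked[:n_places]]


# Hypothetical applicant pool
applicants = [
    ("p1", "A", 50),
    ("p2", "B", 70),
    ("p3", "A", 65),
    ("p4", "B", 40),
]
chosen = select_top(applicants, 2)
print(chosen)
```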
41Constant Ratio
- Basis: choose so that selection ratio for groups
= success ratio for groups
- Select the best candidate but give a boost to
minority group members' scores so that selection
probability = success probability
42Constant Ratio
- Adjust test scores for minority groups upwards by
half the mean difference between groups
- Leads to somewhat lower average performance on
criterion
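The adjustment rule can be written out directly. This Python sketch assumes a single minority group and uses made-up scores; the name `constant_ratio_adjust` and the group labels are hypothetical.

```python
def constant_ratio_adjust(applicants, minority_group):
    """applicants: list of (name, group, score) tuples.

    Returns a new list in which minority scores are raised by half the
    difference between the majority and minority mean scores.
    """
    majority = [s for _n, g, s in applicants if g != minority_group]
    minority = [s for _n, g, s in applicants if g == minority_group]
    boost = (sum(majority) / len(majority) - sum(minority) / len(minority)) / 2
    return [
        (name, group, score + boost if group == minority_group else score)
        for name, group, score in applicants
    ]


# Hypothetical pool: majority mean is 70, minority mean is 60, so boost is 5
pool = [("p1", "maj", 60), ("p2", "maj", 80), ("p3", "min", 50), ("p4", "min", 70)]
adjusted = constant_ratio_adjust(pool, "min")
print(adjusted)
```

Selection then proceeds on the adjusted scores, which is why average criterion performance comes out somewhat lower than under straight top-down selection.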
43Cole/Darlington
- Basis: if there is special value in selecting
minority group members, then a minority score of
Y on criterion is equal to a majority score of
Y + k on criterion
- Separate regression equations used for different
groups and adjustment made
- Leads to lower average performance on criterion
44Cole/Darlington
- If a value is placed on selection of minority
group members, and intercept is lower for that
group, then we consider minority test score X1
and majority test score X2 equal
45Quota
- Basis: idea that all groups should have equal
outcomes
- Selection based on different regression equations
for each group
- Produces lower average performance on criterion
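One way to picture quota selection is to rank applicants within each group and give each group a share of places equal to its share of the population. The sketch below rests on assumptions: `quota_select`, the shares, and the data are hypothetical, and ranking by raw score within a group stands in for ranking by that group's own regression prediction (the two orderings agree whenever the group's regression slope is positive).

```python
def quota_select(applicants, population_share, n_places):
    """applicants: list of (name, group, score) tuples.
    population_share: {group: fraction of the population}.

    Fills each group's share of places with that group's top scorers.
    Rounding can make the total differ slightly from n_places.
    """
    selected = []
    for group, share in population_share.items():
        places = round(share * n_places)
        in_group = sorted(
            (a for a in applicants if a[1] == group),
            key=lambda a: a[2],
            reverse=True,
        )
        selected.extend(name for name, _g, _s in in_group[:places])
    return selected


# Hypothetical pool in which group A holds both of the top two scores
pool = [("p1", "A", 90), ("p2", "A", 80), ("p3", "B", 60), ("p4", "B", 50)]
picked = quota_select(pool, {"A": 0.5, "B": 0.5}, 2)
print(picked)
```

Here a straight top-down ranking would have taken p1 and p2, which shows where the lower average criterion performance comes from.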
46Quota
- If 10% of population is Asian then 10% of student
body should be Asian
- Another way to look at this: if 10% of population
is Jewish then no more than 10% of professors
should be Jewish. This puts the quota idea in a
different light.