Title: Assessment: Reliability, Validity, and Absence of Bias
Assessment Reliability, Validity, and Absence of Bias
Essential Terminology for Evaluating Assessment Instruments
- Reliability: The consistency of results (measurement) obtained from an assessment, based on the control, reduction, and/or elimination of measurement error.
- Validity: The accuracy and appropriateness of the interpretations and inferences (evaluation) drawn from the results of a test (measurement).
- Absence-of-Bias: The absence of any characteristic associated with an assessment that might offend or unfairly penalize those being assessed and thus distort a student's score.
Characteristics of Validity (Gronlund, 1998)
- Validity refers to the inferences drawn, not the instrument.
- Validity is specific to a particular use.
- Validity is concerned with the consequences of using the assessment.
Validity
- Validity is expressed by degree (high, moderate, low).
- Validity is inferred from available evidence (not measured).
- Validity depends on many different types of evidence.
Content Validity
- The degree to which the content of the items of an assessment adequately and accurately represents the content of the assessment domain.
- Does the assessment match the objectives in both content and cognitive processes?
Concurrent Validity
- The extent to which a student's current
performance on an assessment estimates that
student's current performance on another
assessment or task (the criterion measure).
Predictive Validity
- The extent to which a student's current
performance on an assessment estimates that
student's later performance on another assessment
or task (the criterion measure).
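Criterion-related validity (both concurrent and predictive) is commonly summarized as a correlation between scores on the assessment and scores on the criterion measure. The sketch below uses entirely hypothetical scores and assumes Python 3.10+ for `statistics.correlation`.

```python
from statistics import correlation

# Hypothetical data: pretest scores and end-of-course exam scores for the
# same ten students (the later exam serves as the criterion measure).
pretest   = [41, 55, 48, 62, 70, 35, 58, 66, 52, 74]
criterion = [45, 60, 50, 68, 72, 40, 55, 70, 56, 78]

# The predictive validity coefficient is the correlation between current
# performance and later performance on the criterion.
r = correlation(pretest, criterion)
print(f"Predictive validity coefficient r = {r:.2f}")
```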
Constructing an Expectancy Table
- Pretest scores
  - 1 student at 1
  - 2 students at 2
  - 8 students at 3
  - 6 students at 4
  - 5 students at 5
- Grades at the end of the course
  - 5 As
  - 9 Bs
  - 6 Cs
  - 2 Ds
Expectancy Table (One Possible Case)
- How good is the predictive validity?
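The slide's table is not reproduced here, but one possible expectancy table can be rebuilt in a few lines. The sketch below pairs the pretest scores with end-of-course grades so that the marginal counts match the previous slide (22 students); the particular pairing is hypothetical, since only the margins are given. Each row of the resulting table shows, for students with a given pretest score, the percentage who earned each grade.

```python
from collections import Counter

# One possible pairing of pretest scores with end-of-course grades.
# Row and column totals match the previous slide; the joint pairing
# itself is hypothetical.
records = (
    [(5, "A")] * 4 + [(5, "B")] * 1 +
    [(4, "A")] * 1 + [(4, "B")] * 5 +
    [(3, "B")] * 3 + [(3, "C")] * 5 +
    [(2, "C")] * 1 + [(2, "D")] * 1 +
    [(1, "D")] * 1
)

grades = ["A", "B", "C", "D"]
counts = Counter(records)                    # (pretest, grade) -> frequency
row_totals = Counter(p for p, _ in records)  # pretest -> number of students

# For each pretest score, report the percentage of students at each grade.
print("Pretest " + "".join(f"{g:>6}" for g in grades))
for pretest in sorted(row_totals, reverse=True):
    row = [100 * counts[(pretest, g)] / row_totals[pretest] for g in grades]
    print(f"{pretest:>7} " + "".join(f"{pct:5.0f}%" for pct in row))
```

If higher pretest scores consistently correspond to higher percentages of good grades, the pretest has reasonable predictive validity for this use; a flat pattern would suggest it does not.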
Face Validity
- The degree to which performance on an assessment
appears to be valid in relation to the score's
use and interpretation. Face validity is really
not a measure of validity, but merely the
appearance of validity. Face validity can often
be very misleading.
Construct Validity
- The degree to which performance on an assessment may be explained by the presence or absence of some psychological state or trait (construct). A construct is a hypothetical psychological characteristic, presumed to exist, that explains patterns of behavior and thought.
Factors that Affect Validity of Classroom Assessments (Nitko, 1996)
- Content Representativeness and Relevance
- Does my assessment procedure emphasize what I have taught?
- Do my assessment tasks accurately represent the outcomes specified in my school's or state's curriculum framework?
- Are my assessment tasks in line with the current thinking about what should be taught and how it should be assessed?
- Is the content in my assessment procedure important and worth learning?
Thinking Processes and Skills Represented
- Do the tasks on my assessment instrument require students to use important thinking skills and processes?
- Does my assessment instrument represent the kinds of thinking skills that my school's or state's curriculum framework states are important?
- Do students actually use the types of thinking I expect them to use on the assessment to complete the assessment?
- Did I allow enough time for students to demonstrate the type of thinking I was trying to assess?
Consistency with Other Classroom Assessments
- Is the pattern of results in the class consistent with what I expected based on my other assessments of them?
- Did I make the assessment tasks too difficult or too easy for my students?
Reliability and Objectivity
- Do I use a systematic procedure for obtaining quality ratings or scores from students' performance on the assessment?
- Does my assessment instrument contain enough tasks relative to the types of learning outcomes I am assessing?
Fairness to Different Types of Students
- Do I word the problems or tasks on my assessment so that students with different ethnic and socioeconomic backgrounds will interpret them in appropriate ways?
- Did I modify the wording or the administrative conditions of the assessment tasks to accommodate students with disabilities or special learning problems?
- Do the pictures, stories, verbal statements, or other aspects of my assessment procedure perpetuate racial, ethnic, or gender stereotypes?
Economy, Efficiency, Practicality, Instructional Features
- Is the assessment relatively easy for me to construct and not too cumbersome to use to evaluate students?
- Would the time needed to use this assessment procedure be better spent on teaching students instead?
- Does my assessment procedure represent the best use of my time?
Multiple Assessment Usage
- Are the assessment results used in conjunction
with other assessment results?
Features and Procedures in Establishing Reliability and Validity (Gronlund, 1998)
- Procedures to Follow
  - 1. State intended learning outcomes in performance terms.
  - 2. Prepare a description of the achievement domain to be assessed and the sample of tasks to be used.
- Desired Features
  - 1. Clearly specified set of learning outcomes.
  - 2. Representative sample of a clearly defined domain of learning tasks (assessment/achievement domain).
Features and Procedures in Establishing Reliability and Validity
- Desired Features
  - 3. Tasks that are relevant to the learning outcomes to be measured.
  - 4. Tasks that are at the proper level of difficulty.
- Procedures to Follow
  - 3. Match assessment tasks to the specified performance stated in the learning outcomes.
  - 4. Match assessment task difficulty to the learning task, the students' abilities, and the use to be made of the results.
Features and Procedures in Establishing Reliability and Validity
- Desired Features
  - 5. Tasks that function effectively in distinguishing between achievers and non-achievers.
  - 6. Procedures that contribute to efficient preparation and use.
- Procedures to Follow
  - 5. Follow general guidelines and specific rules for preparing assessment procedures and be alert for factors that distort the results.
  - 6. Write clear directions and arrange procedures for ease of administration, scoring or judging, and interpretation.
Features and Procedures in Establishing Reliability and Validity
- Desired Features
  - 6. Sufficient number of tasks to measure an adequate sample of achievement, provide dependable results, and allow for a meaningful interpretation of the results.
- Procedures to Follow
  - 6. Where the students' age or available assessment time limit the number of tasks, make tentative interpretations, assess more frequently, and verify the results with other evidence.
Types of Bias (General) (Popham, 1999)
- Offensiveness: Any component of an assessment that may cause undue resentment, pain, discomfort, or embarrassment (e.g., stereotyping, word choice).
- Unfair Penalization: Any assessment practice that may disadvantage a student and distort their test score as a result of group membership (e.g., socioeconomic class, race, gender). Unfair penalization does not result from scores that differ due to differences in ability.
Absence of Bias
- Disparate Impact
- An assessment that differentiates according to
group membership is not necessarily biased. The
question is whether or not that differentiation
occurs due to unfair circumstances. If an
assessment is not offensive and does not unfairly
penalize, and there is still group
differentiation, the likely cause is inadequate
prior instructional experiences.
Types of Bias (Specific) (Nitko, 1996)
- Assessment Bias as Mean Differences: Bias may be indicated if the mean test score of one group differs substantially from that of another group. However, if the test is free from offensiveness and unfair penalization, the test may be representing real differences between the groups relative to the domain tested. Mean differences alone are generally not a good indicator of bias.
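A small sketch of the comparison described above: compute each group's mean and express the gap as a standardized mean difference (Cohen's d is one common choice; it is an addition here, not part of the original slide). The scores are hypothetical, and as the slide notes, a gap by itself does not establish bias.

```python
from statistics import mean, stdev

# Hypothetical scores for two groups on the same assessment.
group_a = [78, 85, 72, 90, 81, 76, 88, 69, 84, 79]
group_b = [74, 80, 68, 86, 77, 71, 83, 66, 80, 75]

mean_a, mean_b = mean(group_a), mean(group_b)

# Pooled standard deviation, then a standardized mean difference (Cohen's d).
n_a, n_b = len(group_a), len(group_b)
pooled_sd = (((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2)
             / (n_a + n_b - 2)) ** 0.5
d = (mean_a - mean_b) / pooled_sd

# A nonzero d shows a group difference, not bias; offensiveness and unfair
# penalization still have to be ruled out before interpreting it.
print(f"Group A mean {mean_a:.1f}, Group B mean {mean_b:.1f}, d = {d:.2f}")
```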
Assessment Bias as Differential Item Functioning
- Bias may be indicated if the mean score for a
particular item differs substantially from one
group to another. The key to differential item
functioning is to examine persons of equal
ability, from different groups, to see if there
is a difference relative to the item of concern.
If there is, bias may be present, although
differential item functioning does not prove bias.
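A minimal sketch of the matching idea described above: use total test score as a rough proxy for ability, bucket examinees by that score, and compare proportion-correct on the item of concern within each bucket. The data and the flagging threshold are hypothetical, and operational DIF analyses use more formal procedures (e.g., Mantel-Haenszel), so this only illustrates the logic.

```python
from collections import defaultdict

# Each record: (group, total_test_score, answered_item_correctly).
# Hypothetical data for illustration only.
examinees = [
    ("group1", 18, True),  ("group1", 18, True),  ("group1", 12, False),
    ("group1", 12, True),  ("group2", 18, True),  ("group2", 18, False),
    ("group2", 12, False), ("group2", 12, False),
]

# Bucket examinees by total score so comparisons are made between persons
# of (approximately) equal ability from the two groups.
bands = defaultdict(lambda: defaultdict(list))
for group, total, correct in examinees:
    bands[total][group].append(correct)

for total in sorted(bands):
    rates = {g: sum(v) / len(v) for g, v in bands[total].items()}
    gap = abs(rates.get("group1", 0) - rates.get("group2", 0))
    flag = "  <- large gap at equal ability; inspect the item" if gap >= 0.25 else ""
    summary = ", ".join(f"{g} p-correct {p:.2f}" for g, p in sorted(rates.items()))
    print(f"total score {total}: {summary}{flag}")
```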
Assessment Bias as Misinterpretation of Scores
- Bias may be indicated if the results of an
assessment are interpreted beyond their valid
usage. Scores are valid for a particular use,
relative to a particular group. Inferences beyond
these specifics are invalid and may be biased.
Assessment Bias as Sexist or Racist Content
- An assessment would be biased if it perpetuates
stereotypes or portrays groups in an offensive
manner.
Assessment Bias as Differential Validity
- Bias may be indicated if an assessment predicts
performance on a second assessment or task
(predictive validity) differently for different
groups. This source of bias is generally not a
problem in educational assessment.
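The check described above can be sketched by computing the predictive validity coefficient separately for each group and comparing the results. The paired scores below are hypothetical, and `statistics.correlation` requires Python 3.10+.

```python
from statistics import correlation

# Hypothetical (assessment score, later criterion score) pairs per group.
scores = {
    "group1": ([55, 62, 70, 78, 85, 91], [58, 60, 72, 75, 88, 90]),
    "group2": ([54, 63, 71, 77, 84, 92], [50, 66, 69, 80, 82, 93]),
}

# If the coefficients differ substantially, the assessment may be predicting
# the criterion differently for the two groups (differential validity).
for group, (assessment, criterion) in scores.items():
    r = correlation(assessment, criterion)
    print(f"{group}: predictive validity coefficient r = {r:.2f}")
```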
Assessment Bias as Content and Experience Differential
- An assessment is biased if the content of the assessment differs significantly from a group's life experiences and the evaluation of the results of the assessment does not take this difference into account.
Assessment Bias in Selection Decisions
- In cases where several people are vying for a few openings (e.g., jobs, programs), assessments are often used as part of the selection process. The selection process may be biased if it uses an assessment that measures groups differentially and unfairly, or if the relationship between the differential assessment and the attributes necessary for success is not clearly understood.
Assessment Bias Related to Assessment Atmosphere and Conditions
- Bias may be indicated if the testing situation
differentially affects different groups. Feelings
of being unwelcome, anxiety, or being tested by a
member of an antagonistic group may lead to this
type of bias.