Title: Quantitative Assessment of Learning Objectives
Identifying purposes of test score use
- Purposes
  - Discriminate among examinees over a broad range of ability (or temperament)
    - Should be composed of items of medium difficulty
  - Diagnostic test might be used to identify specific weaknesses for low-ability students
    - Should contain a large proportion of relatively easy items
  - Minimum competency or competitive selection
Identifying behaviors to represent the construct
- Instructional objectives
  - Experts are asked to review instructional materials and develop a set of instructional objectives for an achievement test
  - An instructional objective specifies an observable behavior that students should be able to exhibit after completing a course
  - Examples
    - Students should be able to name the capitals of all 50 states
    - Students should be able to multiply single-digit numbers
Norm-Referenced Tests
- Compare the performance of an individual examinee to the performance of other examinees
- Usually composed (mostly) of items of moderate difficulty, but may include a range of item difficulties
- Cannot determine students' absolute levels of performance, just how well they perform relative to one another
- Variability of scores is important
Criterion-Referenced Tests
- Compare the performance of an individual to some criterion or standard
  - Absolute levels of proficiency
- Certify whether examinees have obtained a level of minimum competency or mastery
- Typically composed of items of similar difficulty, but difficulty depends on the absolute level of performance expected
  - Minimum competency test might have easy items
  - Mastery test might have more difficult items
- Require a well-defined domain of performance or behavior
  - Example: Students will be able to add single-digit numbers
- Not concerned about score variability; all examinees could pass or fail
- Test developer begins with a set of instructional objectives
- Item domain: the domain of performance to which inferences from test scores will be made
Categories of Learning Outcomes
- Hierarchy of cognitive operations (Bloom, 1956)
  - Knowledge: recall of factual material as it was presented during instruction (e.g., naming capital cities of given states)
  - Comprehension: translation, interpretation, or extrapolation of a concept into a somewhat different form than originally practiced or presented (e.g., recognizing nouns in sentences not used in class)
  - Application: solving new problems through the use of familiar principles or generalizations (e.g., using a formula to solve a problem without being told what formula should be used)
  - Analysis: breaking down a communication or problem into its component elements by using a process that requires recognition of multiple elements, relationships among elements, and/or organizational principles (e.g., identification of a species of plant by its leaf and flower structures)
  - Synthesis: combining elements into a whole by using an original structure, or solving a problem that requires combination of several principles sequentially in a novel situation (e.g., writing a computer program to perform a calculation)
  - Evaluation: employment of internal or external criteria for making critical judgments in terms of accuracy, consistency of logic, or artistic or philosophical point of view (e.g., critical review of a journal article)
Categories of Learning Outcomes
- Declarative knowledge
  - Information one can state verbally
  - Recall of facts, principles, trends, criteria, and ways of organizing events
- Procedural knowledge
  - Knowledge of how to do something
  - Examples: convert Fahrenheit to Celsius, discriminate between a $1 bill and a $5 bill
- Problem solving
  - A problem exists when one has a goal but has not identified a means to obtain that goal
  - There may be more than one solution
- Procedural knowledge can be subdivided into three categories
  - Discrimination: reacting to various stimuli and determining if they are the same or different
    - Example: determining which of two balls is heavier
  - Concepts: involve a characteristic that can be used to classify objects or abstractions
    - Example: determining which shapes are triangles
  - Rules: application of principles that regulate the relationship among objects or events
    - Example: using "a" versus "an" in a sentence
Assessment of Categories of Learning Outcomes
- Declarative knowledge
  - Ask what students know
- Procedural knowledge
  - Discrimination: ask students to identify differences between objects
  - Concept: ask students to classify illustrations as examples vs. nonexamples of a concept
  - Rule: provide students with an example and ask them to apply a rule
- Problem solving
  - Ask students to generate a solution to a problem
Reliability and Validity
- Reliability: consistency of test scores
  - Would students obtain the same score if they took the test multiple times?
  - Do like items correlate highly with each other? (a small worked example follows below)
  - Do different raters provide similar ratings?
- Construct validity
  - Link between the performance we observe and the underlying theoretical construct we wish to measure (e.g., math achievement)
  - Must establish that visible student behaviors are indicators of the construct we wish to assess
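To make the "do like items correlate" idea concrete, here is a minimal sketch of coefficient (Cronbach's) alpha, one common internal-consistency estimate of reliability. The response matrix and the function name are invented for illustration; this is a sketch, not a prescribed procedure.

```python
# Illustrative sketch: coefficient (Cronbach's) alpha as an index of internal
# consistency. The response matrix below is fabricated for illustration only.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: examinees x items matrix of item scores."""
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Five examinees, four dichotomously scored items (1 = correct, 0 = incorrect).
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Values near 1 indicate that the items covary strongly; low values suggest the items do not hang together as measures of a single construct.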
Validity
- Criterion-related validity
  - Indicates how well performance on a test correlates with performance on some external criterion
  - Example: How well does the SAT predict college GPA? (see the sketch after this list)
- Content validity
  - How well does the content of the test represent the domain of content that students should know?
  - Because of time constraints, a test can provide only a sample of all the behaviors that could be assessed
  - Content validity can be improved by using a table of specifications
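A minimal sketch of how a criterion-related validity coefficient might be computed: correlate examinees' test scores with their scores on the external criterion. All numbers below are invented and stand in loosely for admission-test scores and first-year GPAs.

```python
# Illustrative sketch: criterion-related validity as the correlation between
# test scores and an external criterion. All values are fabricated.
import numpy as np

test_scores = np.array([1100, 1250, 980, 1340, 1180, 1050])  # e.g., admission test
criterion   = np.array([3.1, 3.5, 2.7, 3.8, 3.2, 2.9])       # e.g., first-year GPA

validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]
print(f"validity coefficient r = {validity_coefficient:.2f}")
```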
Table of Specifications
- A two-way grid with major content areas listed in one margin and cognitive processes on the other
- The number in each cell is a weight representing the relative emphasis the developer wishes to place on the content and processes represented by that cell
- The cell weights should total 100
- First, the test developer creates a set of instructional or learning objectives, or other categories of behavior
- The test developer then decides on the relative emphasis each component should receive on the test
- Items are balanced so that different components of the construct are represented in proportion to their importance (see the sketch below)
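To make the grid and weighting concrete, here is a sketch of a hypothetical table of specifications: a content-by-process grid whose cell weights total 100, used to allocate items on a planned 40-item test in proportion to relative emphasis. The content areas, cognitive processes, weights, and test length are all assumptions made for illustration.

```python
# Illustrative sketch: a hypothetical table of specifications with cell weights
# that sum to 100, used to allocate items in proportion to relative emphasis.
# Content areas, processes, weights, and test length are invented for illustration.

table_of_specifications = {
    "Circle terminology":      {"knowledge": 10, "comprehension": 5,  "application": 5},
    "Area and circumference":  {"knowledge": 5,  "comprehension": 10, "application": 25},
    "Angle measures":          {"knowledge": 5,  "comprehension": 10, "application": 25},
}

total_weight = sum(w for row in table_of_specifications.values() for w in row.values())
assert total_weight == 100, f"cell weights should total 100, got {total_weight}"

test_length = 40  # planned number of items
for content, processes in table_of_specifications.items():
    for process, weight in processes.items():
        n_items = round(test_length * weight / 100)
        print(f"{content:24s} x {process:13s}: weight {weight:2d} -> {n_items} items")
```

In practice, rounding the proportional counts may require small adjustments so that the item counts still sum to the planned test length.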
Item Construction
- Two properties of items
  - Substantive content
  - Cognitive process the examinee must employ
- Example
  - Define basic terms related to circles (e.g., radius, diameter, central angle)
  - Compute areas, distances, circumferences, and angle measures by using properties of circles
- The first item requires recall of memorized material
- The second item requires knowledge of concepts and application of principles
- Activities
  - Selecting an appropriate item format
  - Verifying that the proposed format is feasible for the intended examinees
  - Writing the items
  - Examining the quality of the items (a sketch of classical item statistics follows this list)
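One common way to examine item quality after a tryout administration is to compute classical item statistics: difficulty, the proportion of examinees answering the item correctly, and discrimination, the correlation between the item and the total score on the remaining items. The sketch below uses fabricated responses; the data and variable names are assumptions for illustration.

```python
# Illustrative sketch: classical item statistics from a fabricated tryout.
# Difficulty = proportion answering correctly; discrimination = correlation
# between an item and the total score on the remaining items.
import numpy as np

responses = np.array([   # rows = examinees, columns = items (1 = correct)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
])

for j in range(responses.shape[1]):
    difficulty = responses[:, j].mean()                   # proportion correct
    rest_score = responses.sum(axis=1) - responses[:, j]  # total without item j
    discrimination = np.corrcoef(responses[:, j], rest_score)[0, 1]
    print(f"item {j + 1}: difficulty = {difficulty:.2f}, "
          f"discrimination = {discrimination:.2f}")
```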
Item Formats
- Two general types of items
  - Items that require the examinee to choose the correct answer (e.g., multiple choice, matching)
    - Alternate choice: two possible responses (e.g., true-false questions)
    - Multiple choice: a correct response and two or more incorrect responses (called foils or distractors)
    - Matching: relating objects in two separate lists
  - Items that require the examinee to generate an answer (e.g., essay, short answer)
Multiple Choice Items
- All possible responses should appear logically reasonable to an examinee who does not have the knowledge or skills measured by the item
- In other words, the foils or distractors should seem like reasonable responses and not be ridiculous or absurd
- Otherwise, the test is not really measuring examinees' knowledge in the domain being tested
- Distractors or foils are often constructed from common misconceptions, misinterpretations, or computational errors
Advantages/Disadvantages of Different Item Formats
- Alternate choice, multiple choice, and matching items are useful for sampling a wide range of content, but are limited in the cognitive processes that must be employed
  - Scoring is simple, relatively quick, and reliable
  - However, they can be susceptible to guessing
  - They are also time-consuming to construct
- Essay items can more directly measure the behaviors specified by performance objectives and can assess more advanced cognitive processes
  - Students must supply the response
  - Can examine how well students put ideas into writing
  - However, because of time constraints, essay items are limited in their ability to sample a wide range of content
  - Scoring is also less reliable
Scoring Rubrics
- A rating scale with descriptions of performance that range from higher to lower
- Can be used for essays, performance assessments, portfolios, etc.
- Example: supporting ideas
  - Supporting ideas inadequate or illogical
  - Supporting ideas not developed
  - Occasional supporting idea or example
  - Supporting ideas or examples used
  - Ample supporting ideas or examples
- Example: organization (both example dimensions are encoded in the sketch below)
  - Little evidence of organizational pattern
  - Organizational pattern attempted, with lapses
  - Organizational pattern evident, with lapses
  - Organizational pattern has some lapses
  - Logical organizational pattern
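As a small illustration of how such a rubric might be recorded, the sketch below encodes the two example dimensions as ordered descriptor lists and converts a rater's chosen descriptor on each dimension into a numeric score. The 1-5 point values and the scoring function are assumptions for illustration; they are not part of the rubric itself.

```python
# Illustrative sketch: the two example rubric dimensions encoded as ordered
# descriptor lists (index 0 = lowest performance). Point values are assumed.
rubric = {
    "supporting ideas": [
        "Supporting ideas inadequate or illogical",
        "Supporting ideas not developed",
        "Occasional supporting idea or example",
        "Supporting ideas or examples used",
        "Ample supporting ideas or examples",
    ],
    "organization": [
        "Little evidence of organizational pattern",
        "Organizational pattern attempted, with lapses",
        "Organizational pattern evident, with lapses",
        "Organizational pattern has some lapses",
        "Logical organizational pattern",
    ],
}

def score_essay(ratings: dict[str, str]) -> dict[str, int]:
    """Convert a rater's chosen descriptor on each dimension to a 1-5 score."""
    return {dim: rubric[dim].index(desc) + 1 for dim, desc in ratings.items()}

example = score_essay({
    "supporting ideas": "Supporting ideas or examples used",
    "organization": "Organizational pattern evident, with lapses",
})
print(example, "total =", sum(example.values()))
```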
Item Review
- Ask colleagues to review items after they have been drafted
- Criteria
  - Accuracy
  - Appropriateness or relevance to test specifications
  - Technical item-construction flaws
  - Grammar
  - Offensiveness or appearance of bias
  - Level of readability
- Common flaws in item construction
  - Multiple choice items: making the correct alternative longer (or shorter) than the others
  - Spelling or grammatical errors
  - Unwieldy sentence construction
  - Flaws in punctuation
  - Bias (use of undesirable stereotypes or of situations that examinees may not be familiar with)
  - Content inappropriate for the population being tested
  - Inappropriate readability level