Title: Evaluation: Testing, Objective-to-Test-Item Matching, and Judgments of Worth
1. Evaluation: Testing, Objective-to-Test-Item Matching, and Judgments of Worth
2. Session Overview
- Evaluation approaches
- Testing: one possible data point in evaluation
  - Norm-referenced
  - Criterion-referenced
- Objective-to-test-item matching
- Measurement error, reliability, and validity
3. Evaluation, Typically
- Typically, it doesn't happen! That said, it should, and it is required for many funded projects.
- What happened? Were goals and objectives achieved? How can we find that out?
- The end is NOT the only time to measure worth. When else?
- Strategies: tests, observations, surveys, chats with managers, looking at work, results
4. Evaluation Approaches
- Objectivist
  - Belief in a reality that can be known and measured. Prevalent in education and our business.
  - Objectives-based and deceptively simple: establish goals, set objectives, tailor instruction to the objectives, judge effectiveness.
  - Measures are analytical/quantitative in nature.
- Examples
  - Do first-graders know the letters of the alphabet?
  - Can the new account representative describe the features of each checking account as defined by the bank?
  - Others?
- Advantages/disadvantages?
5. Evaluation Approaches
- Constructivist
  - Belief that people construct their own realities. Advocates believe that truth is a matter of consensus, not measurement against an objective reality.
  - Evaluation creates detailed descriptions of what is inside the head of the learner.
  - Reliance upon open-ended exercises, observation, cases, and immersion in the field.
  - Observation is useful for us, in that IDs build prototypes, conduct formative evaluations, revise, and cycle again.
  - Measures are qualitative in nature.
- Examples
  - Role-play exercise to deal with a hostile customer
  - Theme Park Tycoon: running a theme park for a year
  - Essay question asking you to describe your understanding of Educational Technology
- Advantages/disadvantages?
6. Evaluation Approaches
- Postmodern/Critical
  - Objectivists proclaim objectivity. Constructivists approve of subjectivity. Postmoderns are social activists.
  - Focus on questions of power: Who are you to set objectives for others? Use of deconstruction to see what's inside texts and materials.
  - Most interested in the hidden curriculum, such as the teaching of traditional gender roles.
- What does the curriculum teach?
- Why should IDs care about this evaluation approach?
7. Evaluation Frameworks: Kirkpatrick's Model
- Level 4: Does it matter? Does it advance strategy?
- Level 3: Are they doing it (the objectives) consistently and appropriately?
- Level 2: Can they do it (the objectives)? Do they show the skills and abilities?
- Level 1: Did they like the experience? Satisfaction? Use? Repeat use?
8. Evaluation Frameworks: CIPP
- Context: assesses program/product needs, problems, or opportunities specific to the project environment.
- Input: assesses, evaluates, and allocates project resources in order to meet identified needs and objectives, solve problems, and optimize program impact.
- Process: assesses project implementation.
- Product: assesses planned and unintended (unforeseen) outcomes, both to keep a project on track and to determine effectiveness or impact.
9. Types of Tests
- Used to evaluate changes in skills and knowledge
- Is testing alone sufficient?
10. Test Types: Norm-Referenced
- Compare an individual's performance to the performance of other people.
- Require varying item difficulties.
- Assume not everybody is going to "get it"
- Discern those who "got it" from those who didn't.
11. Normal Distribution
12. Test Types: Norm-Referenced
- Norm-referenced tests compare the individual to the group.
- Accomplished statistically by norming the test with large numbers of people (see the sketch after this slide).
- Consider:
  - You sat for the GRE and received the following scores. You need to retake the test.
  - What is your study plan?
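To make the norming idea concrete, here is a minimal sketch (my own illustration, not from the course materials) of converting a raw score to a standing relative to a norm group; the function name `percentile_rank` and the sample scores are hypothetical.

```python
import math
import statistics

def percentile_rank(raw_score: float, norm_group: list[float]) -> tuple[float, float]:
    """Return (z-score, approximate percentile) for raw_score against a norm group."""
    mean = statistics.mean(norm_group)
    sd = statistics.stdev(norm_group)
    z = (raw_score - mean) / sd
    # Percentile under a normal-distribution assumption (the bell curve on slide 11).
    percentile = 0.5 * (1 + math.erf(z / math.sqrt(2))) * 100
    return z, percentile

# Hypothetical norm group; real norming uses large samples of test-takers.
norms = [62, 71, 68, 75, 80, 66, 73, 77]
z, pct = percentile_rank(74, norms)
print(f"z = {z:.2f}, roughly the {pct:.0f}th percentile")
```

The point of the sketch: the meaning of a norm-referenced score comes entirely from where it sits in the group's distribution, not from any fixed standard.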
13. Test Types: Norm-Referenced
- Limitations
- Not especially helpful for
- identifying individual skill deficiencies
- identifying weaknesses in the instruction
14. Test Types: Criterion-Referenced
- Compares an individual's performance to the acceptable standard of performance for those tasks.
- Requires completely specified objectives.
- Asks: Can this person do what has been specified in the objectives?
- Results in yes/no decisions about competence (a sketch of that decision logic follows this slide).
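As a rough illustration of that yes/no logic (my own sketch; the objectives and cut scores are hypothetical, loosely echoing the map and shoe examples on slide 19), a criterion-referenced decision compares each score to its criterion rather than to other learners:

```python
# Hypothetical objectives mapped to their criteria (proportion correct required).
cut_scores = {
    "state_abbreviations": 45 / 50,  # write 45 of 50 state abbreviations
    "shoe_diagnosis": 1.0,           # identify every specified fault
}

def mastery_report(scores: dict[str, float]) -> dict[str, bool]:
    """Compare each learner score to the criterion for that objective."""
    return {obj: scores.get(obj, 0.0) >= cut for obj, cut in cut_scores.items()}

print(mastery_report({"state_abbreviations": 47 / 50, "shoe_diagnosis": 0.8}))
# {'state_abbreviations': True, 'shoe_diagnosis': False}
```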
15. Test Types: Criterion-Referenced
- Applications
  - Diagnosis of individual skill deficiencies
  - Certification of skills
  - Evaluation and revision of instruction
- Limitations
  - Tend to focus on specific skills
  - Results may not reflect general aptitudes
  - Everyone may get an A
16. Which Test Is Which? (NR or CRT?)
- IQ test
- GRE
- SDSU Writing Competency
- Red Cross Lifesaving Certificate
- EDTEC 540 midterm and final exams
17. Which Test Is Which? (NR or CRT?)
- Give out a CA driver's license
- Pick students for Russian lang. training
- Determine entrance into medical school
- PADI Scuba Certification
- Select one EDTEC scholarship recipient
- Figure out where to revise a course
- Decide which students need remediation
18. Utility of Test Scores
- Selection screening (before)
  - mastery of prerequisites, for remediation/placement
  - mastery of course objectives, for acceleration (testing out)
- Individual diagnosis and prescription (along the way)
- Practice (along the way)
- Grades and summative scores (at or after the end)
  - promotion
  - certification and licensure
- Administrative
  - course evaluation
  - trainer accountability
19. Criterion-Referenced Test Items: Objectives and Matching Items
- Objective: Given a map of the USA with state borders marked, the learner will be able to (lwbat) write the abbreviation for 45 of 50 states in 15 minutes.
  Item: Here is a map of the USA with the states outlined, but no names. Use the state abbreviations and fill them in; you've got 15 minutes to get at least 45.
- Objective: Given a pair of well-worn shoes, the lwbat identify what's wrong with the shoes and the tools and materials necessary to fix them.
  Item: Take a look at this pair of shoes. What problems do you see? What will you need to fix them?
- Objective: Given a goal, the lwbat write at least two appropriate objectives with proper ABCD parts.
  Item: The goal of the instruction is "IDs will know how to write resumes." Write at least 2 objectives with all four parts.
20. Matching Test Items to Objectives
- Matching ensures validity.
- Validity is the extent to which the test measures what is important to performance. Does a high score on the test equate to high performance on the job?
- The validity of a criterion-referenced test is enhanced when
  - objectives match real-world performances (based on solid analysis)
  - test items match stated objectives (including the condition).
21. Match, or Not?
- Objective: Given any stocked fruit or vegetable, the Ralphs Grocery checker will be able to verbally state the code that matches the produce provided, with 100% accuracy.
- Item: Here is a persimmon from the produce department and the produce code job aid. Please state the produce code for this item. You may examine the persimmon and reference the job aid.
22. Match, or Not?
- Objective: Given a tree in need of pruning, the gardener's apprentice will be able to select the correct tree-pruning device, based upon the type of tree presented.
- Item: Here is an overgrown elm tree. Please select the appropriate tool with which you will prune the tree.
23. Match, or Not?
- Objective: Given a descriptive order for a Café Mocha, including size, caf/decaf, and type of milk, the barista will be able to create the drink as specified in the Starbucks Guide to Coffee Creations.
- Item: A customer has just ordered a Grande non-fat mocha. Please list the ingredients you will need, and describe the steps you would take to create the drink.
24. Evaluating a Training Program
- Consider:
  - Your evaluation uses a criterion-based test to see if the new account representatives can describe the different types of accounts offered by the bank.
  - All representatives were able to meet the specified criteria.
  - Case closed? Or do you want to know more?
25. Ideas in Testing
- Measurement Error
- Validity
- Reliability
26. Measurement Error
- Many causes (a small sketch of how error shows up in scores follows this list):
  - mechanical or scoring errors
  - poor wording (confusing, ambiguous)
  - poor subject matter or content (validity)
  - score variation from one time to another (reliability)
  - score variation from "equivalent" tests
  - test administration procedure
  - inter-rater reliability
  - mood of the student
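One way to picture these sources of error (an assumed illustration, using the classical "observed score = true score + error" framing rather than anything stated on the slide):

```python
import random

random.seed(1)
true_score = 80  # the learner's (unknowable) true score
# Six hypothetical retests, each nudged by random error with SD of 5 points.
observed = [true_score + random.gauss(0, 5) for _ in range(6)]
print([round(s, 1) for s in observed])
# The spread among these observed scores reflects measurement error, not the learner.
```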
27. Validity
- Does the test assess what's important? Does it really seek out the skill and knowledge linked to the world? (content validity)
- Types
  - Content validity (most important to us)
  - Predictive validity (e.g., SAT, GRE)
28. Reliability
- Are the scores produced by the test trustworthy and stable over time?
- Assessed by (see the sketch below)
  - parallel (equivalent) forms or test-retest
  - internal consistency
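A minimal sketch of the test-retest idea, assuming hypothetical scores from two administrations of the same test; a Pearson correlation near 1.0 suggests stable, trustworthy scores.

```python
import statistics  # statistics.correlation requires Python 3.10+

first_attempt  = [70, 82, 65, 90, 74, 88, 61]  # hypothetical scores, time 1
second_attempt = [72, 80, 67, 91, 70, 85, 63]  # same learners, time 2

r = statistics.correlation(first_attempt, second_attempt)  # Pearson's r
print(f"test-retest reliability ~ {r:.2f}")
```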
29. Testing and Evaluation: A Look Ahead
- ED 690: Procedures of Investigation
  - Provides an introduction to evaluation procedures and methods
  - Introduces the research process and statistical analysis
- ED 791A, 791B, 791C
  - Evaluation sequence most often completed by EDTEC students, rather than writing a thesis
  - Conduct a full-scale evaluation (design, research, report) for a living, breathing client over a two-semester timeframe