Title: Establishing Passing Standards Without Gambling
1Establishing Passing Standards Without Gambling
Photo by Mel Curtis. Monitor on Psychology,
September 2002.
2What is a Standard?
Cut Score: Test score a candidate must attain to pass
- Raw Score - for example, 36 correct
- Percent Correct - 36/50 = 72%
- Scaled Score - for example, 300
Standard: The level of knowledge or proficiency a candidate must demonstrate to pass
3Why Do We Need a Formal Procedure?
70%?
- Defensibility
- Fairness
- Validity
4Important Factors in Setting a Standard
- Subject Matter Experts
- A Representative Committee
- Ample Time
- Actual Test Items (Questions)
- Definition of the Level of Proficiency Distinguishing Qualified and Unqualified Candidates
5Some Common Methods of Standard Setting
- Angoff
- Item Mapping
- Bookmark
- Contrasting Groups
6The (Modified) Angoff Method
Subject Matter Experts define the minimally
competent (borderline) candidate in terms of
knowledge. They evaluate every item in the test
and estimate this candidate's chances of
answering correctly. The mean estimate across
all experts and all items determines the passing
score.
7Steps in a Typical Angoff Session
Take the Test!
Define the "Just Sufficiently Qualified" (JSQ)
or Minimally Competent (MC) Candidate
What is the Probability the JSQ Candidate Will
Answer an Item Correctly?
8Steps in Angoff, continued
Determine Probabilities in Groups of 10 Items -
Discuss
Change Probability Estimates if Desired
Add Mean Ratings for All Items to Calculate Cut
Score
Evaluate: Is This Cut Reasonable?
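The arithmetic behind "Add Mean Ratings for All Items" can be sketched in a few lines. This is only an illustration: the function name `angoff_cut_score` and the ratings are made up for the example, and a real session would have far more items and would usually record ratings after the group discussion.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """Return the raw cut score from Angoff ratings.

    ratings[e][i] is expert e's estimate (0.0 to 1.0) of the probability
    that the just sufficiently qualified candidate answers item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(expert) != n_items for expert in ratings):
        raise ValueError("every expert must rate every item")
    # Mean rating per item across experts, then summed over all items.
    item_means = [mean(expert[i] for expert in ratings) for i in range(n_items)]
    return sum(item_means)

# Hypothetical ratings: 3 experts, 4 items.
ratings = [
    [0.80, 0.60, 0.70, 0.50],
    [0.70, 0.60, 0.90, 0.40],
    [0.90, 0.50, 0.80, 0.60],
]
cut = angoff_cut_score(ratings)
print(f"Raw cut: {cut:.2f} of {len(ratings[0])} items "
      f"({100 * cut / len(ratings[0]):.0f}% correct)")
```

Dividing the raw cut by the number of items gives the percent-correct form of the same cut score.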
9Advantages of the Angoff Method
- A relatively straightforward process
- No data necessary
- Has held up in court
10Disadvantages of the Angoff Method
- Must look at every item on test(s)
- Time and cost
- Fatigue, inattention, rushing
- Difficulty of accurately estimating probabilities
11Item Mapping
A graphical method of determining the level of
competence necessary for licensure
12Item Mapping Process
Administer items to a pilot group. Collect statistics, including the difficulty of each item. Group items by difficulty. Display in a graph.
Subject Matter Experts define the minimally competent (borderline) candidate.
SMEs evaluate a sample of items: Does the borderline candidate have at least a 50% chance of answering correctly?
Evaluate: Is this cut reasonable?
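The "group items by difficulty" step can be sketched as a simple binning of item difficulties into columns, which is the data behind the graph the SMEs work from. The function name `group_items_by_difficulty`, the 0.5-unit column width, and the item difficulties are all assumptions made for this illustration; the slides do not specify them.

```python
import math
from collections import defaultdict

def group_items_by_difficulty(difficulties, bin_width=0.5):
    """Map each column's lower edge to the IDs of the items that fall in it."""
    columns = defaultdict(list)
    for item_id, difficulty in difficulties.items():
        lower_edge = math.floor(difficulty / bin_width) * bin_width
        columns[lower_edge].append(item_id)
    # Sort columns from easiest to hardest for display.
    return dict(sorted(columns.items()))

# Hypothetical pilot-test difficulties on a logit scale.
difficulties = {"Q1": -1.2, "Q2": -0.3, "Q3": 0.1, "Q4": 0.4, "Q5": 1.6}
item_map = group_items_by_difficulty(difficulties)

# Text version of the item map: one row per difficulty column.
for edge, items in item_map.items():
    print(f"{edge:+.1f} | {' '.join(items)}")
```

SMEs would then sample items from individual columns and judge whether the borderline candidate has at least a 50% chance on them.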
13Rasch Model
If Candidate Ability = Item Difficulty, then Chance of a correct answer = 50%
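The 50% relationship follows from the standard Rasch formula, in which the chance of a correct answer depends only on the difference between ability and difficulty (both in logits). A minimal sketch, with `rasch_probability` as an illustrative name:

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Ability equal to difficulty gives an even chance.
p_equal = rasch_probability(1.2, 1.2)
# Ability above difficulty gives better than even; below gives worse.
p_above = rasch_probability(2.0, 1.0)
p_below = rasch_probability(0.0, 1.0)
print(p_equal, round(p_above, 3), round(p_below, 3))
```

This is why an item the borderline candidate has "at least a 50% chance" on is an item at or below that candidate's ability level.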
16Item Mapping
- Advantages
  - Sound statistical basis
  - More discussion (no rushing)
  - Portrait of the borderline candidate
  - Multiple forms cut simultaneously
  - Time
- Disadvantages
  - Less straightforward (Rasch model)
  - Requires empirical data
17Bookmark Method: Conceptually Similar to Item Mapping
Administer items to a pilot group. Collect
statistics, including the difficulty of each
item. Order items by difficulty. Display in a
booklet. Subject Matter Experts define the
minimally competent (borderline) candidate.
18Bookmark Method, continued
SMEs review items and place a bookmark between
items the minimally acceptable candidate is
likely to answer correctly and items this
candidate is unlikely to answer
correctly. Discuss and repeat the process,
aiming for agreement. Evaluate: Is this cut
reasonable?
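One way to summarize a round of bookmark placements is sketched below. The slides do not say how individual placements are combined, so taking the median is an assumption here, as are the function name `summarize_bookmarks` and the placements themselves. The spread between the lowest and highest placement gives a rough sense of how far the panel is from agreement.

```python
from statistics import median_low

def summarize_bookmarks(placements):
    """Summarize one round of bookmark placements.

    placements holds, for each SME, the number of items in the ordered
    booklet that come before that SME's bookmark.
    Returns (median placement, spread between highest and lowest).
    """
    if not placements:
        raise ValueError("at least one placement is required")
    # median_low always returns a placement some SME actually chose.
    return median_low(placements), max(placements) - min(placements)

# Hypothetical placements from five SMEs on a 50-item booklet.
round_one = [31, 34, 33, 36, 33]
cut_position, spread = summarize_bookmarks(round_one)
print(f"Median bookmark after item {cut_position}; spread of {spread} items")
```

A large spread would be a signal to discuss and repeat the round before settling on a cut.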
19Bookmark Method
- Advantages
  - More discussion (no rushing)
  - Portrait of the borderline candidate
  - More focus on item content over entire exam
  - High level of face validity
- Disadvantages
  - Tends to be time-consuming
  - Requires empirical data
20Contrasting Groups
Administer items to a pilot group. Subject
Matter Experts classify each candidate as
qualified or unqualified based on other data.
Score the exam and order candidate IDs by
score. Find a score or a narrow range of scores
for which approximately half of the candidates
have been labeled unqualified.
21Contrasting Groups
Score    Number      Number        Percent
         Qualified   Unqualified   Qualified
46-50        5           0           100
41-45       14           1            93
36-40       25           7            78
31-35       22          10            69
26-30       17          12            59
21-25       11          12            49
16-20        4          12            33
11-15        0           6             0
0-10         0           1             0
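The search for the score range where about half of the candidates are labeled unqualified can be sketched as follows, using the qualified and unqualified counts from the table above. The function name `find_cut_band` and the "closest to 50%" rule are assumptions for this illustration; the proportions are recomputed from the counts rather than taken from the percent column.

```python
def find_cut_band(bands):
    """Return the score band whose unqualified share is closest to 50%.

    bands is a list of (label, n_qualified, n_unqualified) tuples.
    """
    best_gap, best_label = None, None
    for label, qualified, unqualified in bands:
        total = qualified + unqualified
        if total == 0:
            continue  # no candidates scored in this band
        gap = abs(unqualified / total - 0.5)
        if best_gap is None or gap < best_gap:
            best_gap, best_label = gap, label
    if best_label is None:
        raise ValueError("no band contains any candidates")
    return best_label

# Counts from the table: (score band, qualified, unqualified).
bands = [
    ("46-50", 5, 0), ("41-45", 14, 1), ("36-40", 25, 7),
    ("31-35", 22, 10), ("26-30", 17, 12), ("21-25", 11, 12),
    ("16-20", 4, 12), ("11-15", 0, 6), ("0-10", 0, 1),
]
cut_band = find_cut_band(bands)
print(f"Candidate cut score range: {cut_band}")
```

The band it returns is only a starting point; the committee would still ask whether a cut in that range is reasonable.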
22Contrasting Groups
- Not widely used in licensure testing
- Subjectivity of judgments (Q or UnQ)
- Connection to job is less direct
- Often not feasible to get judgments
23?