Title: Quality Assurance and Regulatory Excellence
1Quality Assurance and Regulatory Excellence
2007 Annual Conference
- Carol Morrison, Elizabeth Azari
- National Board of Medical Examiners
- Lynn Webb
- Testing Consultant
2Creator of Early Quality Assurance Procedures
3He made a list, and checked it TWICE
4AND THEN... Got independent verification
5In a nutshell
- Make a list
- Check it twice
- Get independent verification
6Sequence of Presentations
- Quality Assurance in Test Development (Lynn Webb)
- Quality Assurance in Test Administration (Elizabeth Azari)
- Quality Assurance in Scoring and Score Reporting (Carol Morrison)
7Ultimate Frisbee
8Have you played?
9Setting the Scene: Ultimate Frisbee
- "This is no wobbly game of lob and catch; passing
is fast moving, deadly accurate" (Time Out, UK)
10In a Nutshell
- The thrill of rugby, without the bloodshed!
- Lynn Webb, 2007
11The Game
- Ultimate is played between two teams of seven
players on a large rectangular pitch. A line
drawn across the pitch at either end creates two
"end zones" (like in American Football). These
are the goal-scoring areas. A goal is scored when
a team completes a pass to a player standing (or
more likely running) in the end zone they are
attacking.
12- Players cannot run with the disc. When you get
the disc you must come to a stop and try to throw
it to another player. By passing from player to
player, the offense attempts to work the disc up
the pitch towards the end zone they are
attacking. If the disc hits the ground or is
intercepted or knocked down by the other team,
then the opposition takes possession.
13- The defending team attempts to stop the team with
the disc from making progress up-field by marking
them (as in soccer or basketball).
14Examinations
- Ultimate Frisbee Rules of the Game (self-assessment)
- Ultimate Frisbee Observer Certification Examination
- Ultimate Frisbee Player Certification Examination
15Observer Certification Examination
- Each of the three examinations followed the
typical test development cycle.
- I'm going to focus on the results of the Observer
Certification Examination
16Session Assumes Familiarity with TD Cycle
- Job analysis
- Test specifications
- Question writing
- Test Assembly
17Job Analysis: Preventing Problems
- Load effort into the front end (logical analysis)
so things will turn out well in the validation
(survey)
- Front-end efforts include job shadowing,
literature searches, focus groups, structured
brainstorming, etc.
- Validation from the field (survey) must be
representative and thorough
18(More about the survey)
- Survey contains all the content
- Include write-in to cover anything you missed
- (Yes, it's cumbersome, but what if you missed
something?)
- Proofread the survey (creator and independent
verification)
- Pilot test the survey (directions, timing,
vocabulary, usability issues)
19Test Specifications: Preventing Problems
- Must be tied to the job analysis
- Consider all test users
- Candidates for Observer credential
- Test question writers
- Employers of Observers
- Fans
20(More about Test Specs)
- Content domains should be distinct
- Questions should fit into one domain (not all)
- Think ahead to when you will want to inventory
the bank of questions according to the test
specifications (e.g., how many questions do we
have in domain 1?)
21Sample Section of Test Specs
- 2 Rules of play
- 2.1 Objective
- ..
- 2.6 Gameplay
- 2.6.1 The pull or throw-off
- 2.6.2 Movement of the disc
- 2.6.3 Scoring
- 2.6.4 Change of possession
- 2.6.5 Stoppages of play
- 2.6.5.1 Fouls
- 2.6.5.2 Violations
- 2.6.5.3 Time outs and half-time
- 2.6.5.4 Injuries
- 2.6.6 Substitutions
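The inventory question raised for the test specifications ("how many questions do we have in domain 1?") is easy to automate once every item carries exactly one domain code. The following is a minimal Python sketch, not the presenters' procedure: the item IDs, the `domain` field, and the blueprint counts are hypothetical, and only the section codes are taken from the sample outline above.

```python
from collections import Counter

# Hypothetical item bank: each item is tagged with exactly one test-spec code.
ITEM_BANK = [
    {"id": "OBS-001", "domain": "2.6.1"},
    {"id": "OBS-002", "domain": "2.6.1"},
    {"id": "OBS-003", "domain": "2.6.3"},
    {"id": "OBS-004", "domain": "2.6.5.1"},
]

# Hypothetical blueprint: how many items the bank should hold per domain.
BLUEPRINT = {"2.6.1": 2, "2.6.3": 2, "2.6.5.1": 1, "2.6.5.2": 1}


def inventory_by_domain(items):
    """Count the items held for each test-spec domain."""
    return Counter(item["domain"] for item in items)


def shortfalls(items, blueprint):
    """Return {domain: items still needed} for under-stocked domains."""
    counts = inventory_by_domain(items)  # a Counter returns 0 for unseen domains
    return {
        domain: needed - counts[domain]
        for domain, needed in blueprint.items()
        if counts[domain] < needed
    }


if __name__ == "__main__":
    print(dict(inventory_by_domain(ITEM_BANK)))
    print(shortfalls(ITEM_BANK, BLUEPRINT))
```

A count like this only works if the domains are distinct and each question fits one domain, which is the reason for that guidance in the specs.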
22Question Writing: Preventing Problems
- Questions must be tied to the test specifications
- Question writers should be content experts and
should receive training
- Goals of the testing program
- Item formats to use
- Examples of great and poor questions
- Feedback
23(More about question writing)
- Items must be reviewed for content
- Accuracy
- Clarity
- Currency
- Items must be style-edited
- Items must be reviewed for bias/sensitivity
- Items must be pilot-tested (or pretested)
24Test Assembly: Preventing Problems
- Assembly must be tied to Test Specifications
- Make the list
- Check it twice
- The specs are published; it's your promise to
the candidates
25(More about test assembly)
- Review drafts to ensure absence of confounding
variables
- Too many items on a certain topic
- Too many items of a certain format or type
- Enemy items
- Usability or format issues
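Several of the draft-review checks listed above can be run mechanically before a human reviewer looks at the form. The sketch below is one possible way to do that in Python, under stated assumptions: the item fields (`topic`, `format`), the limits, and the enemy-pair list are all hypothetical, and "enemy items" is taken to mean pairs of items that should not appear on the same form (for example, because one cues the answer to the other).

```python
from collections import Counter

# Hypothetical draft form; field names are assumptions for illustration.
DRAFT_FORM = [
    {"id": "A1", "topic": "Fouls", "format": "recall"},
    {"id": "A2", "topic": "Fouls", "format": "application"},
    {"id": "A3", "topic": "Fouls", "format": "application"},
    {"id": "A4", "topic": "Scoring", "format": "application"},
]

# Hypothetical pairs of items that must not share a form.
ENEMY_PAIRS = [("A1", "A4"), ("A2", "A9")]


def review_draft(form_items, max_per_topic, max_per_format, enemy_pairs):
    """Return a list of human-readable problems found in a draft form."""
    problems = []
    topic_counts = Counter(item["topic"] for item in form_items)
    format_counts = Counter(item["format"] for item in form_items)
    for topic, n in sorted(topic_counts.items()):
        if n > max_per_topic:
            problems.append(f"too many items on topic {topic}: {n}")
    for fmt, n in sorted(format_counts.items()):
        if n > max_per_format:
            problems.append(f"too many items of format {fmt}: {n}")
    ids = {item["id"] for item in form_items}
    for a, b in enemy_pairs:
        if a in ids and b in ids:
            problems.append(f"enemy items on same form: {a} and {b}")
    return problems


if __name__ == "__main__":
    for problem in review_draft(DRAFT_FORM, 2, 3, ENEMY_PAIRS):
        print(problem)
```

An empty list does not mean the form is clean; usability and format issues still need a reader's eye.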
26Hypothetical Case 1
- After credentialing the first group of
candidates, it is noted that inappropriate calls
are made by certified Observers throughout the
national tournament.
27What happened?
- It is discovered that the Observers were throwing
the games to cash in on bets estimated at
$800,000 per game.
28TD Processes to Promote Success
- Front-end planning in the job analysis study is
essential. There can be critical components of
the credential that aren't part of the knowledge
of content. In this case, ETHICS was overlooked.
- Ethics (or other non-content considerations) can
be incorporated into the test, or into
eligibility requirements, or into signed
attestations, etc.
29Hypothetical Case 2
- The passing rate for the Observer Certification
Examination takes an unexpected dip on Form 3,
even though the items were pre-equated to ensure
similar difficulty level to forms 1 and 2.
30What happened?
- The test specifications were met in test
assembly, but the item writers were trained in
writing RECALL and APPLICATION questions. Form 3
contained all application items and candidates
became fatigued, performing less well than
expected on the test.
31TD Steps to Promote Success
- Review drafts to ensure absence of confounding
variables
- Cognitive level (recall / application of
knowledge)
- Another dimension of content
- Type of question or format
32Hypothetical Case 3
- Form 4 of the examination is ready to be printed
(or published) and there are only 149 questions
instead of 150. Staff bump into each other
trying to figure out how the list of 150 is only
producing 149 items. Tempers flare and
accusations are launched. The test developer is
sure the committee selected the correct number of
items.
33What happened?
- Some electronic item banking systems are prone to
versionitis unless carefully timed procedures
are followed. The committee of content experts
selected Form 4 before Form 3 was analyzed and
scored. One of the items selected for both forms
was thrown out during a key verification step
for Form 3 (candidates all scored as correct).
The item was deleted from the bank.
34TD Steps to Promote Success
- Make a list, check it twice, get independent
verification!
- In the case of items being used across forms for
statistical equating, a step on the list of
procedures would be to ensure that the
overlapping items are still viable for use in the
new form.
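The "are the overlapping items still viable?" step can be turned into a check that runs before a form goes to print. This is a minimal Python sketch of that idea, assuming a hypothetical item bank keyed by item ID with a `status` field; real item banking systems will expose this information differently.

```python
# Hypothetical item bank keyed by item ID; "status" is an assumed field.
ITEM_BANK = {
    "Q001": {"status": "active"},
    "Q002": {"status": "active"},
    "Q004": {"status": "retired"},
}

# Hypothetical committee selection, made before Q003 was deleted from the bank.
FORM_4_SELECTION = ["Q001", "Q002", "Q003"]


def check_form_against_bank(selected_ids, bank, expected_length):
    """Compare a committee's item list with the current state of the bank."""
    duplicates = sorted({i for i in selected_ids if selected_ids.count(i) > 1})
    missing = [i for i in selected_ids if i not in bank]
    inactive = [
        i for i in selected_ids if i in bank and bank[i]["status"] != "active"
    ]
    # Items that can actually be published: unique, present, and active.
    usable = len(set(selected_ids)) - len(set(missing)) - len(set(inactive))
    return {
        "duplicates": duplicates,
        "missing": missing,
        "inactive": inactive,
        "usable": usable,
        "ok": usable == expected_length,
    }


if __name__ == "__main__":
    print(check_form_against_bank(FORM_4_SELECTION, ITEM_BANK, expected_length=3))
```

Run against the hypothetical data, the check names the deleted item directly, which is quicker than staff bumping into each other to find out why a list of 150 produces 149.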
35To Err is Human
36True Quality Assurance
- Someone else checks your list, twice
37Speaker Contact Information
- Lynn C. Webb, Ed.D.
- Testing Consultant
- Chicago
- (847) 579-0845
- testing_at_lwebb.com
- or
- ultimatefrisbee_at_lwebb.com
38Quality Assurance Steps For The Ultimate In
Test Administration
- Web-based delivery of the 60-item Rules of the
Game Self Assessment (e.g., at home or Internet
café, not proctored, 30-day window, 1 form)
- Pencil and paper delivery of the 150-item
Observer Certification Exam (multiple locations,
proctored, one-day window, 1 form)
- CBT delivery of the 200-item Player Certification
Exam (CBT centers, proctored, two-week window, 2
forms)
39 QA Considerations for all Delivery Methods
- Preventing Problems
- Preparing the site
- Site setup requirements, proctor instructions,
vendor expectations, home computer requirements
- Preparing the candidate
- Communicating testing rules, documentation
required to test, info for tech support and
troubleshooting
- Maintaining security before, during and after the
test
- Special considerations for each delivery method
40QA Considerations for All (cont.)
- Mitigating problems
- Anticipate problems / find solutions (before, on
and after test day)
- Capture test day events (encouraging consistent
and descriptive proctor reports, recording
specific technical problems during WBT)
- Transmit relevant test day data to scoring (do
you understand which data are important?)
- Turn lessons learned into preventative measures
for the future
41Specific WBT Considerations
- Why web-based for self-assessment?
- Candidate convenience (choose time, location)
- Instant candidate feedback
- Low stakes exam with fewer consequences if
problems arise
- Perceived as less costly for the program to
administer, but that depends on support needs
- What are some drawbacks?
- Vagaries of the Internet, multiple platforms to
support
- Staffing to manage candidate problems
- Security
42WBT Hypothetical Case
- Candidate L (for "last-minute") has had a month
to take his self-assessment, but has never logged
on to take it. At 4:00 p.m. on day 30 of his
30-day window, he realizes that he has forgotten
his authorization code. Without it, he cannot
log on. He intends to take the exam sometime
after 8:00 that evening.
43WBT Hypothetical Case
- Candidate L takes your sage advice and tries to
access the exam immediately (it is now 4:30)
using his newly provided authorization code. He
discovers that he can get to some information
screens, but he is having trouble accessing the
exam. In another phone call to you, he comes to
realize that his computer does not meet the
system requirements for this exam.
44WBT Hypothetical Case
- Candidate L finds another computer and
successfully logs on to take the self-assessment
at 9:00 p.m. At 9:30, only part-way through, his
computer screen freezes. He is not sure what to
do. He worries that if he reboots, some or all
of his answers will be lost. He calls the
support number provided (thank goodness he
printed that out at 4:30), but receives no
answer.
45WBT Practices to Promote Success
- Preparing for Administration Success
- Provide technical specifications to candidates
- Provide a systems check to help candidates assess
basic computer readiness in advance and provide a
simple tutorial
- Create printable, easy-to-find FAQs to help
candidates troubleshoot
- Discourage procrastination!
- Be sure that support staff have power and
information to assist
- Before offering to candidates, take a dry run of
the self-assessment in the production environment
(vary computers and locations)
- Keep the exam short
46WBT Considerations
- What about security?
- Consider this a non-secure administration
- Do you care whether the person actually testing
is the person you authorized to test? (Proxy
testing or cheating on a self-assessment?!)
- How important is the release of your
self-assessment test material to unauthorized
persons and what will you do to prevent it?
- Solutions (electronic proctoring?)
47WBT Considerations (cont.)
- Impact on performance feedback if a problem occurs
(How to handle partial or incomplete results?)
- Problem reports (Do you have a standard protocol
for support staff to report problems,
including pre-identified descriptive categories?)
- Contingency plan for late takers (Will you extend
the window for a month-long self-assessment?)
48Specific P&P Considerations
- Why pencil and paper delivery for the one-day
Observer Certification exam?
- Relatively consistent testing conditions
- Security controls (control exam shipments,
proctored, ability to see/ID candidates
throughout process)
- Convenience of location (e.g., training program
locations)
- What can go wrong with printed materials?
- What are some drawbacks?
- Proctor identification and training
- Manual handling of materials
- Test day problems that may require more than a
one-day window
49P&P Hypothetical Case
- On test day, the proctor opens the packages of
test booklets and distributes them to candidates.
When instructed, the candidates open their
booklets only to find that some of the booklets
are missing pages. Some are also missing answer
sheets.
50P&P Hypothetical Case
- Your candidates are testing when a fire alarm
sounds and you hear an announcement to evacuate
the building. What are your next steps?
51P&P Hypothetical Case
- It was just a fire drill, so your candidates are
back to resume testing. Eventually, time is
called at the end of the first half of the exam,
but Candidate W (for "wants to quit her desk
job") continues to enter answers on her sheet.
Two other examinees start to talk about the exam.
52P&P Practices to Promote Success
- For Administration Success, Prepare Proctors
- Identify proctors early and provide a clear
proctor's manual (require certification that the
proctor has read it and is familiar with it?)
- Be sure to provide examples of test day problems
and what to do about them
- Control shipment of materials and provide for
return shipment
- Ship using traceable method to designated persons
(monitor)
- Require proctors to promptly count and report
number of books/answer sheets received
- Provide additional test books in each shipment
- Provide a standardized proctor report form for
all locations
- Provide pre-test and test day support and contact
information
- Be sure to plan for non-standard administrations
(e.g., in some cases, examinees granted test
accommodations may need a separate proctor)
53P&P Practices (cont.)
- and Prepare Candidates
- Inform candidates of required information for
admission
- Inform candidates of basic test timing (section
times and break times, if any)
- Inform candidates of testing rules well ahead of
time
- Remind them of the timing and rules on test day
(scripted instructions by proctor, laminated
sheet or post basic instructions)
54P&P Practices (cont.)
- and Plan for Security
- Store booklets in secure, locked area with
restricted access
- Identify potential breaches immediately
- Require authorization document and ID to test
- Prohibit extraneous items in center and provide
all equipment (e.g., pencil, calculator)
- Consider whether a break is necessary and plan
for break protocols
- Use proper room setup with enough proctors in
each room
- Have emergency evacuation protocols in place
- Report test day candidate irregular behavior
immediately and in detail
55P&P Considerations
- You need to transmit relevant data to scoring
- Problem reports
- Do you have a standard protocol for proctors to
report problems, including
pre-identified descriptive categories?
- Does each report include relevant data for
scoring needs?
- Can they be easily sorted by category?
56P&P Considerations
- You need to have contingency plans (or a policy)
for no-shows and partial takers
- Will you extend the one-day window for a no-show
and, if so, how will you handle the fact that
there is just one form that others have seen?
Will you have a proctor on standby and a site
available?
- For partial takers because of illness or other
disruption, how will you handle the fact that the
examinee saw and took part of it already?
57Specific CBT Considerations
- Why CBT for the Player Certification Exam
delivered over a two-week window?
- Consistent testing conditions (test center
design, equipment, staff)
- Security (it's their business to hire, train
proctors, provide secure testing environment,
protect the integrity of the exam)
- Able to handle longer windows
- Reporting (standard forms of reports and
categories)
- What are some drawbacks?
- Risks of computer problems / equipment failure
- Logistics complexity (managing various exam
programs' scheduling needs, learning new sets of
rules, systems integration)
- Travel to sites / site availability (closure,
storms)
58CBT Hypothetical Case
- Several of your candidates have scheduled to
test on the last day of the test window at a
local test center. Two days before their test
day, the test center is found to have a serious
structural problem, requiring immediate repair
work. Your candidates will not be able to test
at the center as scheduled.
59CBT Hypothetical Case
- Candidate U (for "ultimate player," of course)
is in the middle of testing when his computer
crashes. He hails the proctor, who comes to his
workstation to reboot the computer. While at the
workstation, the proctor notices that Candidate U
has a wallet on his workstation and a cell phone
jutting from his pocket. (Neither is permitted
in the testing room.) Candidate U is convinced
that he lost some of his responses because of the
crash.
60CBT Practices to Promote Success
- For Administration Success, Prepare the Vendor
- Communicate specific exam program needs
- Provide necessary program candidate information
promptly (includes test accommodations granted)
- Make a dry run with new exam material to be sure
it works as you expect in a test center
- Work to resolve issues that arise before the test
date
- Establish regular communication to iron out
issues (technical and routine operational calls)
61CBT Practices (cont.)
- Establish clear guidelines regarding the extent of
the vendor's authority to act vis-à-vis responding
to candidate queries and other communications,
candidate eligibility periods and authorizations
to test, etc.
- Establish an emergency contact protocol between
the exam program and the vendor to handle
last-minute problems
- Establish a specific timeline for return of test
center reports and candidate exam outcomes
62CBT Practices (cont.)
- and Prepare Candidates
- Communicate what the candidate needs to bring
(and should not bring) to the test center
- Communicate how to schedule the exam
- Tell the candidate whom to contact in the event
of a problem before, during or after test day
- Encourage prompt candidate action and establish
deadlines for candidate activity
63CBT Practices (cont.)
- and Plan for Security
- Work closely with the vendor
- Establish written procedures for reporting
incidents
- Have internal and cooperative procedures for
investigation of reported test administration
problems
64CBT Considerations
- You need to transmit relevant data to scoring
- Test center reports
- Does each test center report include relevant
data for scoring needs?
- Can they be easily, automatically sorted by
category?
65Using Experience to Inform Future Administrations
- Turning lessons learned into preventative
measures for the future
- Documenting (update manuals, include in vendor
discussions, update best practices)
- Training (staff, proctors, vendor, candidates)
66Speaker Contact Information
- Elizabeth D. Azari, JD
- Associate Vice President, Examinee Support
Services
- National Board of Medical Examiners
- 3750 Market Street
- Philadelphia, PA 19104
- (215) 590-9500
- eazari_at_nbme.org
67Quality Assurance Steps for the Ultimate in
Scoring and Score Reporting
- Preventing Problems
- Data capture
- Answer sheets scanned correctly
- Electronic responses read and unscrambled
correctly
- Data entry verified
- Key validation
- Item analysis
- Review by content experts
68- Preventing Problems
- Raw scoring
- Correct key was applied
- Scores were calculated in two independent systems
69- Preventing Problems
- Equating
- Appropriate equating link
- Equating item text and/or pictures did not change
- Equating based on correct group
- Equating procedure done correctly
- Equating produces reasonable results
70- Preventing Problems
- Scaling/Norming
- Scaling based on correct group
- Correct scaling constants applied
- Results of scaling look reasonable and make sense
- Norms based on appropriate group
- Norms look reasonable and make sense
71- Preventing Problems
- Standard Setting
- Standard setting based on defensible procedure
- Appropriate exam material used
- Appropriate panelists are selected for standard
setting study
- Panelists are trained appropriately
72- Standard Setting (Continued)
- Standard setting data are entered and verified
- Standard setting data are analyzed correctly
- Appropriate decision-making group selects
standard
- Standard is applied correctly
73- Preventing Problems
- Score Reporting
- Examinee biographic information is correct
- Scores are correct and belong to examinee
- Examination name, date, year, etc. are correct
74- Score Reporting (Continued)
- Content area titles are correct
- Interpretive text is accurate and clear
- Materials are packed carefully
- Materials are shipped via a reliable and
traceable method
75- Mitigating Problems
- Have QC checks in place at key points in the
scoring process to catch errors if they occur
- Establish a culture where staff feel comfortable
coming forward if they identify a problem
- Be transparent with stakeholders if a scoring
issue is discovered after scores have been
released
76Hypothetical Case 1
- During the key validation process for the Player
Certification Exam, content experts decide that
seven items should be deleted from scoring and
two items should be re-keyed. During processing,
the correct items are deleted from scoring.
However, one of the items that was supposed to be
re-keyed to A was re-keyed to B instead, which
was also incorrect.
77Hypothetical Case 1 (Continued)
- Processing proceeds and scores are released to
the players. During an item review meeting the
following month, the Player Certification Exam
Committee reviews the item and says that the key
should be A. The committee chair remembers that
this item was reviewed during the key validation
process and was supposed to be changed to A. The
committee is visibly upset and wants to know how
this could have happened.
78Scoring Practices to Promote Success
- Have a quality control process that includes a
check that the correct items are deleted and the
correct keys are in place for re-keyed items
- Review item analysis again following key
validation
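The check described above (correct items deleted, correct keys in place for re-keyed items) amounts to comparing the key actually applied in scoring against the committee's recorded decisions. Below is a minimal Python sketch of such a comparison. All item IDs, keys, and data structures are hypothetical; in practice the applied key would be exported from the scoring system and the decisions taken from the key validation minutes.

```python
# Hypothetical key before key validation.
ORIGINAL_KEY = {"P01": "C", "P02": "D", "P03": "C", "P04": "A"}

# Hypothetical committee decisions from key validation.
DELETED_ITEMS = {"P02"}
REKEYED_ITEMS = {"P03": "A"}

# Hypothetical key as actually applied in scoring: P03 was re-keyed to B, not A.
APPLIED_KEY = {"P01": "C", "P03": "B", "P04": "A"}


def verify_applied_key(original_key, applied_key, deleted_items, rekeyed_items):
    """List every item whose applied key disagrees with the committee decisions."""
    discrepancies = []
    for item in sorted(original_key):
        if item in deleted_items:
            if item in applied_key:
                discrepancies.append(f"{item}: should be deleted but is still keyed")
            continue
        # A re-keyed item takes the committee's key; all others keep the original.
        expected = rekeyed_items.get(item, original_key[item])
        actual = applied_key.get(item)
        if actual != expected:
            discrepancies.append(f"{item}: expected key {expected}, found {actual}")
    return discrepancies


if __name__ == "__main__":
    print(verify_applied_key(ORIGINAL_KEY, APPLIED_KEY, DELETED_ITEMS, REKEYED_ITEMS))
```

For the check to count as independent verification, the decisions fed into it should be entered by someone other than the person who edited the scoring key.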
79Hypothetical Case 2
- Scores from the 2007 Observer Certification Exam
are equated to the 2006 form using a
representative set of items that appear on both
forms. An error is made during the equating
process that results in scores for the 2007
candidates that are approximately 0.25 standard
deviation units higher than they should be.
80Hypothetical Case 2 (Continued)
- The error is not detected during processing.
When the Observer Exam Committee is reviewing
summary data and passing rates prior to the
release of scores, they express concern that the
passing rate is considerably higher this year
than in the past. They ask that you review the
increase in performance further before scores are
released.
81Scoring Practices to Promote Success
- Have a quality control step in place to review
the equating process to make sure it was done
correctly
- Compare current candidate performance to prior
performance (mean scores, passing rates) to see
if it is similar
- Compare current performance to previous
performance using other methods
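The comparison of current to prior performance can be scripted so that it runs on every administration rather than only when a committee member notices something odd. The Python sketch below expresses the mean difference in prior-year standard deviation units and compares passing rates. The score lists, cut score, and review thresholds are hypothetical, and the sketch assumes both years' scores are already on the same reporting scale.

```python
from statistics import mean, stdev

# Hypothetical equated scores for two administrations and a hypothetical cut score.
SCORES_2006 = [60, 70, 80, 90, 100]
SCORES_2007 = [65, 75, 85, 95, 105]
CUT_SCORE = 75


def compare_to_prior(current, prior, cut_score, max_effect=0.2, max_pass_shift=0.05):
    """Flag a form for review when performance shifts more than the thresholds allow."""
    # Mean difference expressed in prior-year standard deviation units.
    effect = (mean(current) - mean(prior)) / stdev(prior)
    pass_now = sum(s >= cut_score for s in current) / len(current)
    pass_prior = sum(s >= cut_score for s in prior) / len(prior)
    shift = pass_now - pass_prior
    return {
        "effect_size": effect,
        "pass_rate_shift": shift,
        "needs_review": abs(effect) > max_effect or abs(shift) > max_pass_shift,
    }


if __name__ == "__main__":
    print(compare_to_prior(SCORES_2007, SCORES_2006, CUT_SCORE))
```

A flag here is only a prompt to look again at the equating; a genuinely stronger or weaker cohort can produce the same signal.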
82Hypothetical Case 3
- A content-based standard setting exercise
(modified Angoff procedure) is conducted for the
Observer Certification Exam. The panel of judges
consists of ten players, four team owners, and
one observer. Participants discuss the minimally
proficient observer for five minutes and then
work on their own to provide ratings for a sample
of 15 items from the Observer Certification Exam.
83Hypothetical Case 3 (Continued)
- The standard setting data are entered, verified,
and summarized in a report that is sent to the
Observer Exam Committee. The recommended
standard from the study is much higher than the
current standard and would result in a fail rate
of 90% for the observers who took the current
exam.
84Hypothetical Case 3 (Continued)
- The Observer Exam Committee expresses concern
about the results of the exercise and the process
that was used to set the standard. You agree to
conduct another study for no additional charge to
correct the shortcomings of the current study.
85Scoring Practices to Promote Success
- Panelists are selected to be representative of
the field
- Stakeholders approve the panelists who will
participate before the study
- Panelists are given extensive training to ensure
that they understand the task
- An appropriate sample of items is selected for
review (N, content representative)
86Hypothetical Case 4
- Score reports are given online for the Rules of
the Game Self Assessment. When programming the
score report template, the wrong variable name
was inserted for the total test score field. As
a result, the percent correct score for the
Penalties content category was reported in the
total test field instead of the total test
percent correct score.
87Hypothetical Case 4 (Continued)
- The error was discovered when an examinee called
and asked how he could have gotten a 100% on the
total test when he didn't get 100% on all of the
content areas.
88Scoring Practices to Promote Success
- Have a quality control step in place to check
online score reports for accuracy before allowing
immediate score reporting
- Verify that scores were calculated correctly
- Verify that scores appear in correct fields
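One way to verify both points before switching on immediate reporting is to recompute every reported field independently from the raw responses and compare it with what the report template renders. The Python sketch below illustrates the idea with a deliberately buggy report that repeats the Case 4 error. The item IDs, key, responses, and the "Scoring" category are hypothetical; only the "Penalties" category name comes from the case.

```python
# Hypothetical answer key and content categories.
KEY = {"q1": "A", "q2": "B", "q3": "C", "q4": "D"}
CATEGORIES = {"Penalties": ["q1", "q2"], "Scoring": ["q3", "q4"]}

# Hypothetical examinee responses and the report as rendered by the template.
RESPONSES = {"q1": "A", "q2": "B", "q3": "C", "q4": "A"}
RENDERED_REPORT = {"Penalties": 100, "Scoring": 50, "Total": 100}  # Total is wrong


def percent_correct(responses, key, items):
    """Percent of the listed items answered correctly, rounded to a whole number."""
    correct = sum(responses.get(item) == key[item] for item in items)
    return round(100 * correct / len(items))


def expected_report(responses, key, categories):
    """Recompute every report field directly from the raw responses."""
    report = {
        name: percent_correct(responses, key, items)
        for name, items in categories.items()
    }
    all_items = [item for items in categories.values() for item in items]
    report["Total"] = percent_correct(responses, key, all_items)
    return report


def check_report(rendered, expected):
    """Return {field: (rendered value, expected value)} for every mismatch."""
    return {
        field: (rendered.get(field), value)
        for field, value in expected.items()
        if rendered.get(field) != value
    }


if __name__ == "__main__":
    print(check_report(RENDERED_REPORT, expected_report(RESPONSES, KEY, CATEGORIES)))
```

A test examinee who scores differently in each content area is the most useful input, because a field that is wired to the wrong variable then shows up as a mismatch.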
89Using Experience to Inform Future Activities
- Have routine quality control checks built into
processing
- Document procedures as well as exam-specific
information
- Develop staff so they can spot things that look
unusual
90Speaker Contact Information
- Carol A. Morrison, PhD
- Associate Vice President, Scoring Services
- National Board of Medical Examiners
- 3750 Market Street
- Philadelphia, PA 19104
- (215) 590-9745
- cmorrison_at_nbme.org