Title: Sterling Examples of Computer Simulations
1. Sterling Examples of Computer Simulations
OSCEs (Objective Structured Clinical Examinations)
- Carol O'Byrne, Jeffrey Kelley
- Richard Hawkins, Sydney Smee
Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona
2. Session Format
- Introduction: 25 years of performance assessment
- Presentations
  - Richard Hawkins, National Board of Medical Examiners: overview of a new national OSCE
  - Jeff Kelley, Applied Measurement Professionals: development of a new real estate computer simulation
  - Sydney Smee, Medical Council of Canada: setting performance standards for a national OSCE
  - Carol O'Byrne, Pharmacy Examining Board of Canada: scoring performance and reporting results to candidates for a national OSCE
- Q&A
3. Session Goals
- Consider the role and importance of simulations in a professional qualifying examination context
- Explore development and large-scale implementation challenges
- Observe how practice analysis results are integrated with the implementation of a simulation examination
- Consider options for scoring, standard setting and reporting to candidates
- Consider means to enhance fairness and consistency
- Identify issues for further research and development
4. Defining Performance Assessment
- ...the assessment of the integration of two or more learned capabilities
- i.e., observing how a candidate performs a physical examination (technical skill) is not performance-based assessment unless findings from the examination are used for purposes such as generating a problem list or deciding on a management strategy (cognitive skills)
- (Mavis et al., 1996)
5. Why Test Performance?
- To determine if individuals can do the job
  - integrating knowledge, skills and abilities to solve complex client and practice problems
  - meeting job-related performance standards
- To complement multiple-choice (MC) tests
  - measuring important skills, abilities and attitudes that are difficult or impossible to measure through MCQs alone
  - reducing the impact of factors, such as cuing, logical elimination, luck or chance, that may confound MC test results
6. A 25-Year Spectrum of Performance Assessment
- Pot luck direct observation
  - apprenticeship, internship, residency programs
- Oral and paper-and-pencil, short- or long-answer questions
- Hands-on job samples
  - military, veterinary medicine, mechanics, plumbers
- Portfolios
  - advanced practice, continuing competency
7. Simulations
- Electronic
  - architecture, aviation, respiratory care, real estate, nursing, medicine, etc.
- Objective Structured Clinical Examination (OSCE)
  - medicine, pharmacy, physiotherapy, chiropractic medicine, massage therapy, as well as the legal profession, psychology, and others
8. Simulation Promotes Evidence-based Testing
- 1900 Wright brothers flight test
  - Flew a manned kite 200 feet in 20 seconds
- 1903 Wright brothers flight test
  - Flew the powered Flyer 852 feet in 59 seconds, 8 to 12 feet in the air!
- In between they built a wind tunnel
  - to simulate flight under various wind direction and speed conditions, varying wing shapes, curvatures and aspect ratios
  - to test critical calculations and glider lift
  - to assess performance in important and potentially risky situations without incurring actual risk
9. Attitudes, Skills and Abilities Tested through Simulations
- Attitudes
  - client centeredness
  - alignment with ethical and professional values and principles
- Skills
  - interpersonal and communications
  - clinical, e.g. patient/client care
  - technical
- Abilities to
  - analyze and manage risk, exercise sound judgment
  - gather, synthesize and critically evaluate information
  - act systematically and adaptively, independently and within teams
  - defend, evaluate and/or modify decisions/actions taken
  - monitor outcomes and follow up appropriately
10. Performance / Simulation Assessment Design Elements
- Domain(s)-of-interest sampling plan (sketched below)
- Realistic context: practice-related problems and scenarios
- Clear, measurable performance standards
- Stimuli and materials to elicit performance
- Administrative, observation and data collection procedures
- Assessment criteria that reflect standards
- Scoring rules that incorporate assessment criteria
- Cut scores/performance profiles reflecting standards
- Quality assurance processes
- Meaningful data summaries for reports to candidates and others
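The first element above, the domain sampling plan, can be made concrete with a small sketch. The domain names, blueprint weights and station count below are illustrative assumptions, not taken from any of the programs presented here; the idea is simply to allocate a fixed number of OSCE stations in proportion to practice-analysis weights.

```python
# A hedged sketch: allocate OSCE stations to practice-analysis domains
# in proportion to their blueprint weights (all values illustrative).
domain_weights = {
    "patient assessment": 0.35,
    "management and follow-up": 0.30,
    "communication and ethics": 0.20,
    "documentation": 0.15,
}
total_stations = 12

# Largest-remainder rounding so the allocation sums exactly to total_stations.
raw = {d: w * total_stations for d, w in domain_weights.items()}
plan = {d: int(x) for d, x in raw.items()}
leftover = total_stations - sum(plan.values())
for d in sorted(raw, key=lambda d: raw[d] - plan[d], reverse=True)[:leftover]:
    plan[d] += 1

print(plan)
# {'patient assessment': 4, 'management and follow-up': 4,
#  'communication and ethics': 2, 'documentation': 2}
```

Largest-remainder rounding keeps the allocation faithful to the blueprint weights while still summing to the number of stations available.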
11. Score Variability and Reliability
- Multiple factors interact and influence scores
  - differential and compensatory aptitudes of candidates (knowledge, skills, abilities, attitudes)
  - format, difficulty and number of tasks or problems
  - consistency of presentation between candidates, locations, occasions
  - complex scoring schemes (checklists, ratings, weights)
  - rater consistency between candidates, locations, occasions
- Designs are often complex (not crossed)
  - examinees nested within raters, within tasks, within sites, etc.
- Problems and tasks are multidimensional
12. Analyzing Performance Assessment Data
- Generalizability (G) studies to identify and quantify sources of variation (see the sketch below)
- Dependability (D) studies to determine how to minimize the impact of error and optimize score reliability
- Hierarchical linear modeling (HLM) studies to quantify and rank sources of variation in complex nested designs
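As a minimal illustration of the G-study bullet above, the sketch below estimates variance components for a simple crossed persons-by-tasks design and computes a relative generalizability coefficient. The scores, the design and the ANOVA-based estimation are illustrative assumptions, not data or methods from the programs presented in this session.

```python
# G-study sketch for a crossed persons x tasks (p x t) design, no replication.
import numpy as np

# rows = candidates, columns = OSCE stations/tasks (illustrative scores)
scores = np.array([
    [7, 6, 8, 5],
    [9, 8, 9, 7],
    [5, 6, 4, 6],
    [8, 7, 7, 8],
    [6, 5, 6, 4],
], dtype=float)

n_p, n_t = scores.shape
grand = scores.mean()
person_means = scores.mean(axis=1)
task_means = scores.mean(axis=0)

# ANOVA sums of squares for the crossed design
ss_p = n_t * ((person_means - grand) ** 2).sum()
ss_t = n_p * ((task_means - grand) ** 2).sum()
ss_total = ((scores - grand) ** 2).sum()
ss_res = ss_total - ss_p - ss_t          # person-by-task interaction + error

ms_p = ss_p / (n_p - 1)
ms_t = ss_t / (n_t - 1)
ms_res = ss_res / ((n_p - 1) * (n_t - 1))

# Expected-mean-square solutions for the variance components
var_res = ms_res                          # sigma^2(pt,e)
var_p = max((ms_p - ms_res) / n_t, 0.0)   # sigma^2(persons), floored at zero
var_t = max((ms_t - ms_res) / n_p, 0.0)   # sigma^2(tasks)

# Relative G coefficient for a test of n_t tasks
g_coef = var_p / (var_p + var_res / n_t)
print(f"var(person)={var_p:.3f}  var(task)={var_t:.3f}  var(residual)={var_res:.3f}")
print(f"relative G coefficient ({n_t} tasks): {g_coef:.3f}")
```

The person variance is the "signal" (true differences between candidates); the residual term is the noise that D studies then try to reduce by adjusting the number of tasks, raters or sites.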
13. Standard Setting
- What score or combination of scores (profile) indicates that the candidate is able to meet expected standards of performance, thereby fulfilling the purpose(s) of the test?
- What methods can be used to determine this standard? (One possibility is sketched below.)
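One commonly used method, not prescribed by the presenters, is a modified Angoff procedure: judges estimate the probability that a borderline candidate would succeed on each task, and the averaged estimates are scaled into a cut score. The ratings and scoring scale below are purely illustrative assumptions.

```python
# Hedged sketch of a modified Angoff cut-score calculation (illustrative data).
import numpy as np

# rows = judges, columns = tasks; each entry is the judged probability
# that a borderline candidate would succeed on that task
angoff_ratings = np.array([
    [0.60, 0.55, 0.70, 0.65],
    [0.65, 0.50, 0.75, 0.60],
    [0.55, 0.60, 0.70, 0.70],
])

max_score_per_task = 10                     # assumed scoring scale
task_cuts = angoff_ratings.mean(axis=0) * max_score_per_task
cut_score = task_cuts.sum()                 # compensatory cut across tasks

print("per-task cut scores:", np.round(task_cuts, 2))
print(f"overall cut score: {cut_score:.1f} / {max_score_per_task * angoff_ratings.shape[1]}")
```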
14. Reporting Results to Candidates
- Pass-fail (classification)
- May also include (a minimal example follows this list)
  - Individual test score and passing score
  - Sub-scores by objective(s) and/or other criteria
  - Quantile standing among all candidates or among those who failed
  - Group data (score ranges, means, standard deviations)
  - Reliability and validity evidence (narrative, indices and/or error estimates and their interpretation)
  - Other
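As a minimal sketch of how several of these report elements might be assembled for one candidate, using hypothetical cohort scores and a hypothetical cut score:

```python
# Hedged sketch of a candidate result report (all values illustrative).
import numpy as np

cohort_scores = np.array([62, 71, 55, 80, 67, 49, 73, 66, 58, 77], dtype=float)
cut_score = 65.0
candidate_score = 67.0

# percentile rank: share of the cohort scoring strictly below the candidate
percentile = 100.0 * (cohort_scores < candidate_score).mean()

report = {
    "result": "PASS" if candidate_score >= cut_score else "FAIL",
    "score": candidate_score,
    "passing_score": cut_score,
    "percentile_rank": round(percentile, 1),
    "group_mean": round(float(cohort_scores.mean()), 1),
    "group_sd": round(float(cohort_scores.std(ddof=1)), 1),
    "score_range": (float(cohort_scores.min()), float(cohort_scores.max())),
}
print(report)
```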
15. Some Validity Questions
- Exactly what are we measuring with each simulation? Does it support the test purpose?
- To what extent is each candidate presented with the same or equivalent challenges?
- How consistently are candidates' performances assessed, no matter who or where the assessor is?
- Are the outcomes similar to findings in other comparable evaluations?
- How ought we to inform and report to candidates about performance standards/expectations and their own performance strengths/gaps?
16. Evaluation Goals
- Validity evidence
  - Strong links from job analysis to interpretation of test results
  - Simulation performance relates to performance in training and other tests of similar capabilities
  - Reliable, generalizable scores and ratings
  - Dependable pass-fail (classification) standards
- Feasibility and sustainability
  - For program scale (number of candidates, sites, etc.)
  - Economic, human, physical, technological resources
- Continuous evaluation and enhancement plan
17. Wisdom Bytes
- Simulations should be as true to life as possible (fidelity)
- Simulations should test capabilities that cannot be tested in more efficient formats
- Simulation tests should focus on the integration of multiple capabilities rather than on a single basic capability
- The nature of each simulation/task should be clear, but candidates should be cued only as far as is realistic in practice
- Increasing the number of tasks contributes more to the generalizability and dependability of results than increasing the number of raters (illustrated numerically below)
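The last point can be illustrated with a small D-study projection. The variance components below are assumed values chosen so that the person-by-task component dominates, as is typical for OSCE-style examinations; they are not estimates from any of the examinations discussed in this session.

```python
# Hedged D-study sketch: project the relative G coefficient for different
# numbers of tasks and raters, assuming a crossed persons x tasks x raters
# design and the illustrative variance components below.
var_p, var_pt, var_pr, var_ptr_e = 1.00, 1.00, 0.05, 0.50  # assumed values

def g_coefficient(n_tasks: int, n_raters: int) -> float:
    """Relative G coefficient for a crossed p x T x R D-study design."""
    rel_error = (var_pt / n_tasks
                 + var_pr / n_raters
                 + var_ptr_e / (n_tasks * n_raters))
    return var_p / (var_p + rel_error)

for n_tasks, n_raters in [(10, 1), (10, 2), (20, 1)]:
    print(f"tasks={n_tasks:2d} raters={n_raters}: "
          f"G = {g_coefficient(n_tasks, n_raters):.3f}")
```

Under these assumptions, doubling the number of tasks raises the projected coefficient more than doubling the number of raters, consistent with the wisdom byte above.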
18. Expect the Unpredictable
- Candidate diversity
  - Language
  - Training
  - Test format familiarity
  - Accommodation requests
- Logistical challenges
  - Technological glitches
  - Personnel fatigue and/or attention gaps
  - Site variations
- Security cracks
  - Test content exposure in prep programs, study materials in various languages