Title: The Fresno Test of Evidence Based Medicine
1. The Fresno Test of Evidence Based Medicine
- Kathleen Ramos, PhD
- Sean Schafer, MD
- University of California-San Francisco,
- Fresno Medical Education Program
- Department of Family Practice
- Susan Tracz, PhD
- California State University-Fresno
- Department of Education
2. Our Need
- Training grant to develop EBM curriculum included an obligation to evaluate the effectiveness of the curriculum.
- We didn't find any satisfactory existing measures
- Limitations of existing measures
- Generally measure only attitudes
- Or only critical appraisal skills
- None appear to be standardized
3. The Test We Wanted
- Comprehensive
- Objective
- Performance-based
- Demonstrated reliability and validity
4. Methods
- Wrote clinical scenarios as premise
- Wrote test questions about each of the skills necessary for evidence based practice
- Distributed test to colleagues for face validity
- Administered test to our (n = 43) residents and faculty, and a group (n = 53) of volunteer experts
- Graded and re-graded with various revisions of the grading rubrics
- Same statistics calculated on small validation data set
5. The Test: Short Essay Questions
- Focused clinical question (PICO)
- Sources: advantages and disadvantages (People, Text, Pre-appraised, Original, Internet)
- Study design (Identify and Justify design)
- Medline search strategy (Terms, Tags, Delimiters)
- Determine relevance (POEM, Subjects, Feasibility)
- Determine validity (Sampling issues, Internal Validity)
- Determine effect (Magnitude, Statistical Significance)
6. Grading Rubrics for Short Essay Questions
- Essay questions allow assessment of a higher level of learning than recognition
- But grading can be difficult and subjective
- Rubrics standardize the grading of essay answers, making it easier and more objective
7. Sample Rubric: Formulating a Clinical Question
| Level | Patient | Intervention/Exposure | Comparison | Outcome |
|---|---|---|---|---|
| Excellent (3 points) | > 1 appropriate descriptor | Specific intervention | Specific intervention | Objective, patient-oriented |
| Strong (2 points) | 1 appropriate descriptor | Type of intervention | Type of intervention | Surrogate marker |
| Limited (1 point) | Descriptor lacking specificity | Intervention | Comparison | Non-specific outcome |
| Not Evident (0 points) | None of above | None of above | None of above | None of above |
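As a rough illustration of how a rubric like this turns a graded answer into a number, the sketch below maps each rating to its point value and adds up the four PICO elements. The point values come from the rubric; summing the four columns into one item score is an assumption made here for illustration, and the names `POINTS`, `ELEMENTS`, and `score_clinical_question` are hypothetical.

```python
# Point values as listed in the sample rubric.
POINTS = {"Excellent": 3, "Strong": 2, "Limited": 1, "Not Evident": 0}

# The four rubric columns (PICO elements).
ELEMENTS = ("Patient", "Intervention/Exposure", "Comparison", "Outcome")


def score_clinical_question(ratings):
    """Return the total points for one answer.

    `ratings` maps each PICO element to a rubric level.
    ASSUMPTION: the item score is the simple sum of the four columns.
    """
    missing = [e for e in ELEMENTS if e not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return sum(POINTS[ratings[e]] for e in ELEMENTS)


# Hypothetical graded answer, not taken from the study data.
example = {
    "Patient": "Excellent",
    "Intervention/Exposure": "Strong",
    "Comparison": "Limited",
    "Outcome": "Excellent",
}
total = score_clinical_question(example)  # 3 + 2 + 1 + 3
```

Keeping the levels in a lookup table means a revised rubric only requires changing `POINTS`, not the scoring logic.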
8. The Test: Calculation Questions
- Sensitivity
- Specificity
- Positive Predictive Value
- Negative Predictive Value
- Likelihood Ratio
- Absolute Risk Reduction
- Relative Risk Reduction
- Number Needed to Treat
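For reference, the quantities listed above follow from standard definitions. The sketch below computes them from a 2x2 diagnostic table and from two event rates. The function names and the example numbers are hypothetical and are not taken from the test itself; since the slide does not say which likelihood ratio is asked for, both are returned.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Standard diagnostic-test measures from a 2x2 table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def treatment_measures(control_event_rate, treated_event_rate):
    """Standard treatment-effect measures from two event rates (0..1)."""
    arr = control_event_rate - treated_event_rate  # absolute risk reduction
    return {
        "arr": arr,
        "rrr": arr / control_event_rate,  # relative risk reduction
        "nnt": 1 / arr,                   # number needed to treat
    }


# Hypothetical example data.
dx = diagnostic_measures(tp=90, fp=20, fn=10, tn=80)
rx = treatment_measures(control_event_rate=0.20, treated_event_rate=0.15)
```

In practice the number needed to treat is usually rounded up to the next whole patient.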
9. What's a Passing Score? A Recommendation
- Essay Questions
- Excellent, Strong, and Limited categories extrapolated from specific point values
- Cut-point for passing could be mid-range in the Strong category
- Calculations/Fill-in-Blank
- Determine acceptable criterion
- Mean score of experts?
10. How did they do? The short essay questions
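11. How did they do? The calculations
| | Sens | Spec | PPV | NPV | LR | ARR | RRR | NNT |
|---|---|---|---|---|---|---|---|---|
| Expert % correct | 84 | 76 | 71 | 66 | 58 | 87 | 76 | 87 |
| Novice % correct | 60 | 33 | 40 | 35 | 15 | 33 | 10 | 30 |
| Chi-square p | .018 | < .001 | .006 | .007 | < .001 | < .001 | < .001 | < .001 |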
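A comparison of this kind (proportion correct in two groups) can be tested with a chi-square test on a 2x2 table of correct and incorrect counts. The sketch below is one minimal way to do that in plain Python. It uses the uncorrected Pearson statistic, which is an assumption, since the slides do not say whether a continuity correction was applied. The counts are hypothetical and are not the study's data.

```python
import math


def chi_square_p_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

                 correct  incorrect
        group 1     a         b
        group 2     c         d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi_square_p_df1(chi2)


# Hypothetical counts: 40 of 50 in one group correct, 20 of 50 in the other.
chi2, p = chi_square_2x2(40, 10, 20, 30)
```

With small expected cell counts, an exact test would be the more cautious choice.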
12. Inter-rater Reliability: Correlations between coders
- PICO r = .98
- Sources r = .95
- Study Design r = .89
- Searching r = .90
- Relevance r = .76
- Validity r = .85
- Effect r = .91
- Total of Short Essay Questions r = .98
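Correlations like these can be computed by pairing the scores that two coders gave to the same set of answers. The sketch below assumes a Pearson correlation, which the slides do not state explicitly, and uses hypothetical scores. `pearson_r` is an illustrative helper, not part of the test materials.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores given by two coders to the same five answers.
coder_a = [12, 8, 15, 3, 10]
coder_b = [11, 9, 14, 4, 10]
r = pearson_r(coder_a, coder_b)
```

A high correlation shows that the coders rank answers similarly. It does not by itself show that they assign the same absolute scores.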
13. Item Analysis
- Item Difficulty
- range from difficult (24% correct design for diagnosis question) to moderate (73% passing response regarding internal validity)
- Item Discrimination
- range from .41 (moderate) to .86 (strong) abilities of individual items to discriminate between upper and lower quartiles
- Item Total Correlations
- range from .47 to .75
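The first two statistics can be sketched as follows: item difficulty as the proportion of examinees passing an item, and item discrimination as the difference in that proportion between the top and bottom quartiles by total score. The quartile handling (a simple quarter of the sorted examinees) is an assumption, and the data are hypothetical.

```python
def item_difficulty(passed):
    """Proportion of examinees who passed the item (1 = pass, 0 = fail)."""
    return sum(passed) / len(passed)


def item_discrimination(passed, totals):
    """Pass rate in the top quartile minus pass rate in the bottom quartile.

    `passed[i]` is 1/0 for examinee i on this item and `totals[i]` is that
    examinee's total test score.
    ASSUMPTION: a quartile is the first or last len // 4 examinees after
    sorting by total score; ties are broken by input order.
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = max(1, len(totals) // 4)
    lower, upper = order[:k], order[-k:]
    p_upper = sum(passed[i] for i in upper) / k
    p_lower = sum(passed[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data for eight examinees.
totals = [50, 60, 70, 80, 90, 100, 110, 120]
passed = [0, 0, 1, 0, 1, 1, 1, 1]
difficulty = item_difficulty(passed)
discrimination = item_discrimination(passed, totals)
```

An item-total correlation can be obtained in the same spirit by correlating each item's scores with the examinees' total scores.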
14. Construct Validity
- Method of Group Separation
- compare novices to experts
- As a group, the experts scored better on all questions and total score
- Novice mean 96 (out of 212) points
- Expert mean 148
- 15 of 17 items are statistically significantly different
15. What next?
- Equivalent forms reliability needs to be assessed, because scenarios and examples need to change often
- Predictive Validity
- Do scores improve after implementation of EBM curriculum? (our data reflect improvement)
- Do scores predict medical knowledge?
- Do scores predict practice?