Title: Comparing Student Performances in Old & New Curricula
1Comparing Student Performances in Old & New
Curricula
- 4th & 5th Year Summative Examinations
2Comparing Performances in Fourth Year Community
Medicine Finals
- Students in New (2005) vs Old Curriculum (2003 &
2004)
3Methodology
- Seven questions were drawn from Epidemiology
(Statistics and Research Design), Environmental
Health and Occupational Health. These seven
questions were in the form of short answer
questions. Three different instructors were
involved.
- Comparisons were made between the year 2004-05
(cohort in new curriculum) and the years 2003-04,
2002-03, and 2000-01 (cohorts taught in old
curriculum). The short answer and essay questions
were marked by the same personnel.
4Results
- Including subtotals, there were nine possible
comparisons. Among these, two were statistically
significant, with one favoring the old curriculum
and one favoring the new curriculum.
- Given that these results were based on modules
with sample sizes of 40 and 38 students,
statistical power was low. Therefore, we also
looked at the directional differences.
- There were four directional differences in favor
of the old curriculum and five in favor of the
new curriculum.
5Summary Table
6Comparing Performances in Fourth Year Family
Medicine Finals
- Students in New (2006) vs Old Curriculum (2004)
7Methodology
- Three OSCE stations were selected: Cluster
Headache, Clinical Problem and Hand Tremor.
- Comparisons were made between the years 2005-06
(cohort in new curriculum) and 2003-04 (cohort in
old curriculum). It is unknown whether the compared
OSCE stations were marked by the same individuals.
8Results
- There were three OSCE questions for comparison
between the years 2003-04 (old curriculum) and
2005-06 (new curriculum). Among these three
tests, one was statistically significant (and in
favor of the old curriculum).
- Given that the results were based on modules with
sample sizes of 43 and 37 students, statistical
power was poor. Therefore, we also looked at the
directional differences.
- There were two directional differences in favor
of the new curriculum and one directional
difference in favor of the old curriculum.
9Summary Table
10Comparing Performances in Fourth Year Obstetrics
and Gynaecology Finals
- Students in New (2006) vs Old Curriculum (2002)
11Methodology
- Two written questions were selected: one short
answer question and one essay question.
- Comparisons were made between the years 2005-06
(cohort in new curriculum) and 2001-02 (cohort in
old curriculum). It is unknown whether the
questions were marked by the same personnel.
12Results
- Including the subtotals, there were three
comparisons between the years 2001-02 (old
curriculum) and 2005-06 (new curriculum). All
were statistically significant and in favor of
the old curriculum.
- With sample sizes of 146 and 150 students, the
statistical power was adequate for detecting
medium effects.
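The contrast between "low" power at cohort sizes near 40 and "adequate" power near 150 can be illustrated with a rough calculation. This is a minimal sketch, not the analysis used for these slides: it assumes a two-sided comparison of two means at alpha = 0.05, a normal approximation, and a standardized "medium" effect of 0.5. The function name `approx_power` and those parameter values are assumptions for illustration.

```python
from statistics import NormalDist


def approx_power(n1, n2, effect_size=0.5, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation; the negligible opposite tail is ignored.
    effect_size is a standardized mean difference (0.5 is an assumed
    "medium" effect, not a value taken from the slides).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n1 * n2 / (n1 + n2)) ** 0.5
    return z.cdf(noncentrality - z_crit)


small = approx_power(40, 38)     # sizes reported for Community Medicine
large = approx_power(146, 150)   # sizes reported for Obstetrics and Gynaecology
print(f"n = 40 vs 38:   power ~ {small:.2f}")
print(f"n = 146 vs 150: power ~ {large:.2f}")
```

Under these assumptions the smaller cohorts fall well short of the conventional 0.80 target, while the larger cohorts exceed it comfortably.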
13Summary Table
14Comparing Performances in Fourth Year
Paediatrics Finals
- Students in New (2006) vs Old Curriculum (2002)
15Methodology
- Four questions were selected dealing with
Convulsion, Down's Syndrome and Hydration.
- These questions were in the form of short answer
and essay questions.
- The comparisons were made between the years
2005-06 (cohort in new curriculum) and 2001-02
(cohort in old curriculum).
- Three compared questions were marked by the same
personnel, one question by different personnel.
16Results
- Including the subtotal, there were five possible
comparisons. Among these five tests, all were
statistically significant (i.e., three in favor
of the new curriculum and two in favor of the old
curriculum).
- Given that the results were based on sample sizes
of 150 and 146 students, the statistical power was
adequate for detecting medium effects.
17Summary Table
18Paediatric Summary Findings
- 5 significant differences
- 3 favour new curriculum
- 2 favour Old curriculum
19Comparing Performances in Fourth Year Psychiatry
Finals
- Students in New (2005) vs Old Curriculum (2003 &
2004)
20Methodology
- The examinations were in the form of MCQ and
written (short answer and essay questions).
- The comparisons were made between the year
2004-05 (cohort in new curriculum) and the years
2003-04 and 2002-03 (cohorts in old curriculum).
It is unknown whether the written questions were
marked by the same personnel.
21Results
- There were a total of 10 possible comparisons.
Among these, eight were statistically significant
(two favoring the old curriculum and six favoring
the new curriculum).
- Given that the results were based on sample sizes
of 40 and 38 students, the statistical power was
low. Therefore, we also looked at the directional
differences.
- There were eight directional differences in favor
of the new curriculum and two directional
differences in favor of the old curriculum.
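One way to judge whether a split of directional differences departs from an even split is an exact sign test. The slides do not say that such a test was applied to these counts, so the following is only an illustrative sketch; the function name `sign_test_p` is hypothetical. Because the comparisons include subtotals, they are not independent, and the resulting p-values should be read loosely.

```python
from math import comb


def sign_test_p(n_new, n_old):
    """Two-sided exact sign test for an even split of directional wins."""
    n = n_new + n_old
    k = min(n_new, n_old)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Directional counts (new, old) as reported on the Results slides.
for module, counts in {
    "Community Medicine": (5, 4),
    "Family Medicine": (2, 1),
    "Psychiatry": (8, 2),
}.items():
    print(module, round(sign_test_p(*counts), 3))
```

With so few comparisons per module, even an 8 to 2 split does not reach the conventional 0.05 level under this test.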
22Summary Table
23Summary Table for 4th Year Summative Examinations
24Summarized Findings
25Comparing Performances in 5th Yr Medicine
- Students in New (2006) vs Old Curriculum
(2001-2005)
- Results for the above were reported in
Curriculum Retreat 2006 and are provided here in
hardcopy for convenience only (i.e., not
discussed in this retreat). However, additional
information pertaining to Medicine's and
Surgery's Final Exams in 2007 (the second cohort
to graduate under the new curriculum) is provided
in Professor Paul Lai's PowerPoint presentation
(included).
26Methodology
- Final MCQ Exam in Medicine had 150 items
- Seventy-five (75) items were never used before
- Seventy-five (75) MCQs had been used previously
- - 15 items had been used in 2001
- - a different group of 15 items were used in 2002
- - a further different group of 16 items were used
in 2003
- - a further different group of 14 items were used
in 2004
- - a further different group of 15 items were used
in 2005
- All old items had reasonable psychometric
properties (in terms of their difficulty level &
discrimination power) when administered to
cohorts in the old curriculum (i.e., during
2001-2005)
27Methodology
- In 2005 the final MCQ examination also had 150
items
- - Seventy-five (75) never used before
- - Seventy-three (73)* used once before in
2001-2004 (in sets of 15 or 13 different items in
each year); none of these were used in the 2006
final
- The pattern of differences between cohorts within
the old curriculum is used to
- - evaluate the size of differences between cohorts
from the new & old curricula and
- - estimate if such differences are likely due to
changes in curriculum
* 2 additional items were originally included,
but omitted in these analyses as the items were
modified at the time of the second administration
28Additional Measures
- Final examinations in 2005 & 2006 also included
three OSCE stations
- - 1 Hx station
- - 2 Px stations (cardiovascular, neurological,
respiratory, abdomen)
- - focus of these assessments was the same in both
yrs, although assessed patients were not the
same
- Final examinations in same years also included
Short Notes, a form of modified essays
- - questions used in 2005 & 2006 were not the same
29Limitations
- 1. All MCQ data (2001-2006) & comparative
analyses are based on marks with no penalty
scoring applied (did not have corresponding
penalty applied data)
- - reported comparisons (differences) are
accurate only if size of penalty (effect) was
equivalent across cohorts and subgroups
- - note that Medicine decided its pass/fails &
assigned marks based on penalty scoring (although
Faculty policy uses no penalty)
- 2. Estimating impact of new curriculum in terms of
clinical skills (as measured by OSCE) is
confounded by use of different patients
- 3. Comparisons of 2005-06 scores based on short
notes are as likely, or more likely, to reflect
differences in the questions' inherent difficulty
level than differences in the cohorts
30MCQ Results
- Comparing groups based on
- - Bottom 27%, Middle, Top 27% & overall classes in
2001-2005 (cohorts from old curriculum) & in
2006 (cohort from new curriculum)
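The grouping into bottom, middle and top performers can be sketched as a simple rank-based split. This is an illustration only: it assumes the groups are the lowest and highest 27% of each cohort by total score, and the scores below are hypothetical rather than data from these examinations. The function name `split_by_performance` is likewise an assumption.

```python
def split_by_performance(scores, fraction=0.27):
    """Split scores into bottom, middle and top groups by rank."""
    ranked = sorted(scores)
    k = round(len(ranked) * fraction)   # group size at each end
    return {
        "bottom": ranked[:k],
        "middle": ranked[k:len(ranked) - k],
        "top": ranked[len(ranked) - k:],
    }


def mean(values):
    return sum(values) / len(values)


# Hypothetical percentage scores for one cohort (not data from the slides).
cohort = [48, 52, 55, 57, 58, 60, 61, 62, 63, 64,
          65, 66, 67, 69, 70, 72, 74, 76, 79, 83]
groups = split_by_performance(cohort)
for name, members in groups.items():
    print(name, len(members), round(mean(members), 1))
```

Group means computed this way for each cohort are the quantities that the following charts compare across years.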
31Average Performance in 5th Year Medicine
Cohorts New Curriculum (2006) vs Old Curriculum
(2001)
Performance Groups
32Average Performance in 5th Year Medicine (MCQ)
Cohorts New Curriculum (2006) vs Old Curriculum
(2002)
Performance Groups
33Average Performance in 5th Year Medicine (MCQ)
Cohorts New Curriculum (2006) vs Old Curriculum
(2003)
Performance Groups
34Average Performance in 5th Year Medicine (MCQ)
Cohorts New Curriculum (2006) vs Old Curriculum
(2004)
Performance Groups
35Average Performance in 5th Year Medicine (MCQ)
Cohorts New Curriculum (2006) vs Old Curriculum
(2005)
Performance Groups
36Average Performance in 5th Year Medicine (MCQ)
Total Group Performances by Years
New Curriculum (2006) vs Old Curriculum
(2001-2005)
37Average Performance in 5th Year Medicine (MCQ)
Top 27% Group Performances by Years
New Curriculum (2006) vs Old Curriculum
(2001-2005)
38Average Performance in 5th Year Medicine (MCQ)
Middle Group Performances by Years
New Curriculum (2006) vs Old Curriculum
(2001-2005)
39Average Performance in 5th Year Medicine (MCQ)
Bottom 27% Group Performances by Years
New Curriculum (2006) vs Old Curriculum
(2001-2005)
40Was Something Unique about the 2003 Cohort?
- Why did the 2003 cohort have better performances
(particularly in its average group)?
- - did the 2003 cohort spend more time studying for
exams given that they had less opportunity to
be on the wards because of SARS?
41Evaluating the Differences
- Differences among cohorts within old curriculum
(2001-2005) are fluctuations most likely due to
effects of
- - cohorts (sampling error)
- - teaching (emphasis or quality) and/or
- - reliability (measurement error)
- Size of these differences represents a yardstick
for
- - estimating size of difference between cohorts
in new and old curriculum
- - evaluating if the differences are due to the
same influences as above or also to an effect of
curriculum change
- Analysis of differences among cohorts within old
curriculum is provided in the Appendix
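The yardstick idea can be sketched as a comparison of two averages: the typical difference among old-curriculum cohorts (control) and the typical difference between the new cohort and each old cohort (experimental). The sketch below assumes the control is taken over all pairs of old cohorts, which the slides do not specify, and the mean scores are hypothetical.

```python
from itertools import combinations


def mean_abs_diff(pairs):
    """Average absolute difference over an iterable of (a, b) pairs."""
    diffs = [abs(a - b) for a, b in pairs]
    return sum(diffs) / len(diffs)


# Hypothetical cohort mean scores in % (not data from the slides).
old = {2001: 61.0, 2002: 63.5, 2003: 68.0, 2004: 62.0, 2005: 64.0}
new_2006 = 62.5

control = mean_abs_diff(combinations(old.values(), 2))
experimental = mean_abs_diff((new_2006, v) for v in old.values())
print(f"control (within old curriculum): {control:.1f}")
print(f"experimental (2006 vs old):      {experimental:.1f}")
```

If the experimental figure is no larger than the control figure, the new-versus-old differences are within the range of ordinary cohort-to-cohort fluctuation.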
42Comparative Variations Across Cohorts 2001-2006
(based on use of same items)
- CONTROL: average difference in performance among
cohorts within old curriculum (across 5 years by
each performance level) is
- - 4.9% (range 0.4% to 10.3%) for all possible
groupings (see below)
- - 4.5% (range 3.1% to 6.1%) for the total groups
(P1-99)
- - 7.3% (range 6.1% to 10.3%) for the top groups
(P83-99)
- - 4.8% (range 0.4% to 8.2%) for the middle groups
(P27-83)
- - 3.1% (range 0.9% to 5.6%) for the bottom groups
(P1-27)
- EXPERIMENTAL: average difference in performance
between cohorts in new & old curriculum (across 6
years by each performance level) is
- - 3.5% (range 0.4% to 8.8%) for all possible
groupings (see below)
- - 3.1% (range 0.4% to 8.2%) for the total groups
(P1-99)
- - 4.0% (range 1.5% to 8.2%) for the top groups
(P83-99)
- - 2.8% (range 0.8% to 8.8%) for the middle groups
(P27-83)
- - 2.8% (range 0.9% to 8.0%) for the bottom groups
(P1-27)
43Overview of Findings for MCQ Scores
- For 5 out of 5 compared years, the average of
total group differences between 2006 vs 2001-05
was less than corresponding variations in the old
curriculum
- For 15 out of 20 possible comparisons*, the
2005-2006 differences were less than the
corresponding 2005-2001 differences
- - for 3 of the 5 times in which the
experimental variation was larger than in the
corresponding control period, the year 2003 was
involved
- Conclusion: there is no evidence that differences
between cohorts in old and new curriculum are
attributable to the new curriculum
- - this conclusion is repeatedly verified by size
and direction of the results reported in the
attached Appendix
* between the various possible cohorts between
new & old curriculum matched to corresponding
performance level cohorts within old curriculum
44Caution: A Trend was Observed in the MCQ Scores
- With rare exceptions, the cohort in the new
curriculum was never better than cohorts in old
curriculum
- - although mean differences were not
statistically significant, the test of the above
trend was statistically significant, and
- - apparently this trend was observable to
examiners in medicine
- Shouldn't this trend be expected?
- - reduced curriculum to core in terms of subject
content in the basic medical sciences
- - literature shows faculty takes time to adjust
to a new curriculum
- - it may be surprising how well the new cohort
did on material that was not necessarily
covered in same depth and by faculty who
were adjusting to new emphases
45Caution (cont'd)
- A parallel trend might be present in the passing
rates of repeated items used in 2006 and in
2001-2005
- - see next slide for plotted results
46Passing Rate on Repeated Items Between Old & New
Cohorts (One-half of the Items Answered
Correctly)
[Figure: proportion passing, by year]
Comparison      χ²      p
2006 vs 2001    1.007   0.345
2006 vs 2002    0.834   0.420
2006 vs 2003    9.489   0.003
2006 vs 2004    0.142   0.782
2006 vs 2005    4.348   0.043
47Parallel Trend Might be Present in the Passing
Rates
- 2006 passing rate is less in all five compared
years; in two of these years, the differences
are statistically significant
- - for one (2003), the difference is probably
augmented by an anomaly (SARS), but 2005 had no
such additional confounding variable
- - average passing rate is 5.4% less in new cohort
(range 0.9% to 9.8%)
- - passing rate in 2006 on newly developed items
was 80.4%, but without any comparative group to
evaluate this rate, one cannot determine if item
difficulty, teaching and/or student quality were
the underlying reasons
- To evaluate the above, need to examine
comparisons of the differences in passing rates
among cohorts within old curriculum
- - i.e., comparing 2005 to 2001-2004, one of 4
is statistically significant
- - among the comparisons, there is an average
difference of 5.5% (range 3.6% to 9.9%)
- - see following slide (for details see
added slides in Appendix)
48Passing Rate on Repeated Items Among Cohorts in
Old Curriculum (One-half of the Items Answered
Correctly)
[Figure: proportion passing, by year]
Comparison      χ²      p
2005 vs 2001    0.694   0.460
2005 vs 2002    5.301   0.028
2005 vs 2003    1.290   0.268
2005 vs 2004    1.722   0.222
49Equivalent Size & Range of Differences in Passing
Rates Among Cohorts in Old Curriculum
- The average difference in passing rates among
cohorts in old curriculum is
- - the same as the average difference between old
& new curriculum
- - the variation in these differences is also about
the same
- However, 2006 is never better (its shortfall is
just no larger than the amount of difference that
can occur among cohorts).
- - if this negative trend is real (i.e., repeatable
over time) what hypotheses may be explanatory?
50If Negative Trend is Real, what are Reasonable
Hypotheses?
- Are some bottom end students less pushed
(threatened, challenged and/or stimulated) by the
instructional design or by the curriculum design
embedded in the new curriculum?
- - Instructional design: students expected to
develop self-learning skills
- - Curriculum design: integrated, system-based,
focusing on core content & clinical skills
51If Negative Trend is Real, what are Reasonable
Hypotheses? (cont'd)
- A trend in more failures (if real) appears less
likely to be attributable to the overall
curriculum design
- - that is, if depth or breadth of coverage was the
real cause, the average performances of at least
the bottom (if not also the middle and top
performing) groups would have changed in the new
curriculum (our initial analyses show that this
did not occur)
- - however, could it be that the modified
instructional design is more discriminatory
(i.e., detecting those in the very bottom of the
class distribution who are least likely to be
able to develop, and apparently did not
sufficiently develop, their independent learning
skills)?
- However, the above is speculative since it's based
on limited data.
- - until more data are available, the relative
effects of instructional & curriculum design
cannot be accurately determined, and then only if
at least one has had a systematic detectable
impact on student learning.
52 2005-06 Comparative Performances (based on OSCE
Stations & Short Notes)
- Keeping in mind the previously noted confounding
effects
- - of using different patients in the OSCE
stations, and
- - the larger confounding effect of using
different questions in the short notes,
- the following slide depicts the relative
performance levels of the 2005 and 2006 cohorts
53OSCE Station & Short Note Scores (%) for 2005 &
2006
54Comparative Performances on Repeated &
Non-repeated MCQ Items & Short Note Questions
- For interest only, the following slide also
includes the relative performance levels of the
2005 and 2006 cohorts on
- - the repeated and non-repeated MCQ items
55MCQ Scores (%) for Repeated & Non-repeated Items,
2005 & 2006
56Interpreting OSCE, Short Notes & Non-repeated MCQ
Comparisons between 2005-06
- Performances in OSCE stations (Hx, Px & Overall
Assessment skill)
- - no statistically significant differences
detected due to curriculum changes
- - in absolute terms, no detectable educationally
important differences due to curriculum changes
- - given greater emphasis on assessment skill in
new curriculum, this result is disappointing
- Performances on Short Notes
- - no statistically or educationally significant
differences due to curriculum changes
- Performances on non-repeated MCQ items
- - no statistically or educationally significant
differences due to curriculum changes
57- Appendix
- The following slides are
- - differences among cohorts within old curriculum
- - comparative plots of the difference values for
2005 & 2006
58Average Performance in 5th Year Medicine
Cohorts Old Curriculum (2005 vs 2001)
Performance Groups
59Average Performance in 5th Year Medicine (MCQ)
Cohorts Old Curriculum (2005 vs 2002)
Performance Groups
60Average Performance in 5th Year Medicine (MCQ)
Cohorts Old Curriculum (2005 vs 2004)
Performance Groups
61Average Performance in 5th Year Medicine (MCQ)
Cohorts Old Curriculum (2005 vs 2003)
Performance Groups
62Average Performance in 5th Year Medicine (MCQ)
Total Group Performances by Years
Old Curriculum (2005 vs 2001- 2004)
63Average Performance in 5th Year Medicine (MCQ)
Top 27% Group Performances by Years
Old Curriculum (2005 vs 2001- 2004)
64Average Performance in 5th Year Medicine (MCQ)
Middle Group Performances by Years
Old Curriculum (2005 vs 2001- 2004)
65Average Performance in 5th Year Medicine (MCQ)
Bottom 27% Group Performances by Years
Old Curriculum (2005 vs 2001- 2004)
66Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and year 2001
Performance Groups
67Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and year 2002
Performance Groups
68Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and year 2003
Performance Groups
69Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and year 2004
Performance Groups
70Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and years 2001-2004
Performance Groups
71Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and years 2001, 2002 and 2004
Performance Groups with SARS yr (2003) Removed
72Analyses Requested by Prof Sung
- Chi-square tests on proportions passing repeated
items between cohorts in old and new curriculum
73Chi-Square Test: Year 2006 with Year 2001
χ² = 1.007, Prob (2-tailed) = 0.345
74Chi-Square Test: Year 2006 with Year 2002
χ² = 0.834, Prob (2-tailed) = 0.420
75Chi-Square Test: Year 2006 with Year 2003
χ² = 9.489, Prob (2-tailed) = 0.003
76Chi-Square Test: Year 2006 with Year 2004
χ² = 0.142, Prob (2-tailed) = 0.782
77Chi-Square Test: Year 2006 with Year 2005
χ² = 4.348, Prob (2-tailed) = 0.043
78- Chi-square tests on proportions passing repeated
items among cohorts in the old curriculum
79Chi-Square Test: Year 2005 with Year 2001
χ² = 0.694, Prob (2-tailed) = 0.460
80Chi-Square Test: Year 2005 with Year 2002
χ² = 5.301, Prob (2-tailed) = 0.028
81Chi-Square Test: Year 2005 with Year 2003
χ² = 1.290, Prob (2-tailed) = 0.268
82Chi-Square Test: Year 2005 with Year 2004
χ² = 1.722, Prob (2-tailed) = 0.222
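For readers who want to see how a chi-square comparison of two passing proportions is computed, here is a minimal sketch. It uses the Pearson statistic for a 2x2 table without continuity correction and the asymptotic p-value on 1 degree of freedom. The slides do not state which variant of the test or p-value was reported, so values from this sketch need not match the probabilities above exactly. The counts are hypothetical and the function name `chi_square_2x2` is an assumption.

```python
from math import erfc, sqrt


def chi_square_2x2(pass_a, fail_a, pass_b, fail_b):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (chi2, p), where p is the asymptotic two-sided p-value
    for a chi-square statistic on 1 degree of freedom.
    """
    n = pass_a + fail_a + pass_b + fail_b
    row_a, row_b = pass_a + fail_a, pass_b + fail_b
    col_pass, col_fail = pass_a + pass_b, fail_a + fail_b
    chi2 = (n * (pass_a * fail_b - fail_a * pass_b) ** 2
            / (row_a * row_b * col_pass * col_fail))
    p = erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical pass/fail counts for two cohorts (not data from the slides).
chi2, p = chi_square_2x2(30, 10, 20, 20)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```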
83- Report on the Results in
- Final Year Surgery
- (2005 and 2006)
- Paul Lai
- Surgery, CUHK
84Presentation covers
- The examination format of surgery final
examination in 2005 and 2006
- Comparison of the scores that students got in the
past two years
- Some observations on those weak students who
performed badly in medicine final in 2006
- Some observations on those bright students who
performed well in medicine final in 2006
- Moving into the next year (coming July)
85The examination format of surgery final
examination in 2005 and 2006
86Final Surgery Exam 2006
- Written papers
- - paper 1 (MCQs), 4 hrs: carries 200 marks; test
of knowledge
- - paper 2 (R-type MCQs & MEQs), 3 hrs: R-type 80
marks; MEQs 120 marks
- - total marks: 400 marks
- OSCE, 2.5 hrs
- - total 21 stations, 20 of them carrying marks
- - 10 marks for each station
- - total marks: 200 marks
- - heavily skill-based
- Stringent control of observer rating through
briefing, model answers and global rating to
achieve uniformity of judgement on students'
performance
It is a robust assessment of the competence of
the students!
87Standard setting for written paper and OSCE for
2005 and 2006
- Written paper
- - Using mean - 2SD as the pass/fail line
- - Also used global rating scale to determine the
pass marks for the MEQs
- OSCE
- - Using mean - 2SD as the pass/fail line
- - Also taking into account the global rating scale
(i.e. using pass/fail/borderline impression
marking to determine the pass mark for individual
stations)
- Overall pass/fail determination
- - If a student failed both written and OSCE:
straight failure without a pull-up viva
- - If a student failed either the written or OSCE,
he/she would be invited for an observed clinical
pull-up viva, where the student would be examined
by 2 pairs of examiners for a total of one hour on
a pool of real surgical and orthopaedic patients
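The pass/fail logic on this slide can be sketched in a few lines. The sketch takes the pass/fail line as the mean minus two standard deviations, which is an assumed reading of the slide, and it leaves out the global rating adjustment entirely. The function names `cut_score` and `outcome` and the marks are hypothetical.

```python
from statistics import mean, stdev


def cut_score(scores, k=2.0):
    """Pass/fail line at mean minus k sample SDs (assumed rule)."""
    return mean(scores) - k * stdev(scores)


def outcome(written, osce, written_cut, osce_cut):
    """Overall decision from the two component results."""
    failed_written = written < written_cut
    failed_osce = osce < osce_cut
    if failed_written and failed_osce:
        return "straight fail"
    if failed_written or failed_osce:
        return "pull-up viva"
    return "pass"


# Hypothetical marks out of 100 (not data from the slides).
written = [62, 58, 71, 55, 66, 49, 60, 64, 57, 30]
osce = [65, 61, 70, 59, 68, 52, 63, 66, 35, 33]
w_cut, o_cut = cut_score(written), cut_score(osce)
for w, o in zip(written, osce):
    print(w, o, outcome(w, o, w_cut, o_cut))
```

Because the line is set relative to the class distribution, only marks far below the class mean fall under it, which is consistent with the small number of pull-up vivas reported.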
88Overall grades class 2006
- grade A/B: 8 students
- grade B: 18 students
(got B/B in both written and OSCE, or B/C where it
was the OSCE that scored the B grade)
- grade C: 116 students
(got C/C in both written and OSCE, or C/B where it
was the written that scored the B grade)
- grade C/D, pull-up viva: 9 students
- grade D/D, straight fail
89Comparison of the scores students got in the two
years
Examination of the wreckage 2005 and 2006
90Written paper 1 (MCQ) score distribution
91Written paper 2 (MEQ & R-type) score distribution
92Total written score distribution
93Total OSCE score distribution
94 2005/2006 comparison of marks
[Table: marks for 2006 and 2005, by component]
MCQ (paper 1): 2006, 2005
MEQ & R-type (paper 2): 2006, 2005 (score out of 100)
Written total (paper 1 + paper 2): 2006, 2005 (score
out of 100)
OSCE (clinical): 2006, 2005 (score out of 100)
Class means in both written and clinical were
different by less than 3% (class 2006 doing
slightly better)
95PAPER 1, 200 A-TYPE MCQS
[Figure: Paper 1 results for 2005 and 2006, by
groups (top 27%, middle group, bottom 27%), mean ±
1SE; score axis from 40 to 70; also marked: mean
1SD cut score 1999-203]
96CORRELATION OF ITEMS 2005 vs. 2006
r² = 0.78
Middle groups from Old and New curricula, n = 71
candidates in both years
97- In conclusion, there is no objective evidence
from the surgery final examinations to suggest
that the performance of the class of 2006 (from
the new system-based curriculum) was inferior to
that of the class of 2005
98Some observations on those weak students who
performed badly in medicine final in 2006
99List of students who had straight fail in Medicine
100List of students invited for borderline viva in
Medicine
101Some observations on those bright students who
performed well in medicine final in 2006
102List of students invited for distinction viva in
Medicine
103PRE-TEST SURVEY, ALL, n = 208 (/223)
[Figure: response scale running from Agree, through
Undecided, to Disagree]
Textbooks    -0.067
FACS         -1.667
Ward Work     1.228
Lectures      0.031
Web Based    -0.307
Library
Small Grp.    0.782
104Worrying signs?
- Module end assessment by observed long case and
clinical viva has no predictive value for those
who performed badly at the final examination
- Students were bothered by the marks of assessment
and examination
- The concept of "examination drives learning" did
not work as well as expected in our students
- No major behavioral changes, like going to the
wards more often, were observed
105Modification for the Better
- More user friendly module timetable throughout
the final year
- Intensive surgery course
- - Seminars, workshops, practical skills training,
case studies at the start of the general surgery
module
- Tighter enforcement of ward attachment
- - Better incorporation into the team
- Change of module assessment format to fish out
the weak students
- Introduction of short cases using real patients
to the final surgery examination