Comparing Student Performances in Old - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Comparing Student Performances in Old


1
Comparing Student Performances in Old & New
Curricula
  • 4th & 5th Year Summative Examinations

2
Comparing Performances in Fourth Year Community
Medicine Finals
  • Students in New (2005) vs Old Curriculum (2003 &
    2004)

3
Methodology
  • Seven questions were drawn from Epidemiology
    (Statistics and Research Design), Environmental
    Health and Occupational Health. These seven
    questions were in the form of short answer
    questions. Three different instructors were
    involved.
  • Comparisons were made between the year 2004-05
    (cohort in new curriculum) and the years 2003-04,
    2002-03, and 2000-01 (cohorts taught in old
    curriculum). The short answer and essay questions
    were marked by the same personnel.

4
Results
  • Including subtotals, there were 9 possible
    comparisons. Among these, two were statistically
    significant, with 1 favoring the old curriculum
    and 1 favoring the new curriculum.
  • Given that these results were based on modules
    with sample sizes of 40 and 38 students,
    statistical power was low. Therefore, we looked
    at the directional differences also.
  • There were four directional differences in favor
    of old curriculum and 5 directional differences
    in favor of new curriculum.
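
The remark about low statistical power can be made concrete with a rough calculation. The sketch below is not the presenters' analysis: it assumes a two-sided comparison of two group means at alpha = 0.05, a "medium" standardized effect of d = 0.5, and a normal approximation. The function name approx_power is hypothetical; only the sample sizes (40 and 38 here, 146 and 150 in the larger modules) come from the slides.

```python
from statistics import NormalDist


def approx_power(n1, n2, d=0.5, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses a normal approximation. d is the standardized effect size
    (0.5 is an assumed 'medium' effect, not a value from the slides).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative hypothesis.
    ncp = d * (n1 * n2 / (n1 + n2)) ** 0.5
    # Probability of falling in either rejection region.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


small = approx_power(40, 38)     # sample sizes quoted for the small modules
large = approx_power(146, 150)   # sample sizes quoted for the large modules
print(f"n = 40 / 38:   power ~ {small:.2f}")
print(f"n = 146 / 150: power ~ {large:.2f}")
```

Under these assumptions the small modules have power of roughly 0.6 for a medium effect, while the modules with about 150 students per cohort are above 0.95, which is consistent with the slides calling the former low and the latter adequate.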

5
Summary Table
6
Comparing Performances in Fourth Year Family
Medicine Finals
  • Students in New (2006) vs Old Curriculum (2004)

7
Methodology
  • Three OSCE stations were selected: Cluster
    Headache, Clinical Problem and Hand Tremor.
  • Comparisons were made between the years 2005-06
    (cohort in new curriculum) and 2003-04 (cohort in
    old curriculum). It is unknown if these compared
    OSCE stations were marked by the same individuals.

8
Results
  • There were three OSCE questions for comparison
    between the years 2003-04 (old curriculum) and
    2005-06 (new curriculum). Among these three
    tests, one was statistically significant (and in
    favor of the old curriculum).
  • Given the results were based on modules with
    sample sizes of 43 and 37 students, statistical
    power was poor. Therefore, we also looked at the
    directional differences.
  • There were two directional differences in favor
    of new curriculum and one directional difference
    in favor of old curriculum.

9
Summary Table
10
Comparing Performances in Fourth Year Obstetrics
and Gynaecology Finals
  • Students in New (2006) vs Old Curriculum (2002)

11
Methodology
  • Two written questions were selected - one short
    answer question and one essay question.
  • Comparisons were made between the years 2005-06
    (cohort in new curriculum) and 2001-02 (cohort in
    old curriculum). It is unknown if questions were
    marked by the same personnel.

12
Results
  • Including the subtotals, there were three
    comparisons between the years 2001-02 (old
    curriculum) and 2005-06 (new curriculum). All
    were statistically significant and in favor of
    the old curriculum.
  • With sample sizes of 146 and 150 students, the
    statistical power was adequate for detecting
    medium effects.

13
Summary Table
14
Comparing Performances in Fourth Year
Paediatrics Finals
  • Students in New (2006) vs Old Curriculum (2002)

15
Methodology
  • Four questions were selected, dealing with
    Convulsion, Down's Syndrome and Hydration.
  • These questions were in the form of short answer
    and essay questions.
  • The comparisons were made between the years
    2005-06 (cohort in new curriculum) and 2001-02
    (cohort in old curriculum).
  • Three compared questions were marked by the same
    personnel, one question by different personnel.

16
Results
  • Including the subtotal, there were five possible
    comparisons. Among these five tests, all were
    statistically significant (i.e., three in favor
    of new curriculum and two in favor of old
    curriculum).
  • Given the results were based on sample sizes of
    150 and 146 students, the statistical power was
    adequate for detecting medium effects.

17
Summary Table
18
Paediatric Summary Findings
  • 5 significant differences
  • 3 favour new curriculum
  • 2 favour Old curriculum

19
Comparing Performances in Fourth Year Psychiatry
Finals
  • Students in New (2005) vs Old Curriculum (2003 &
    2004)

20
Methodology
  • The examinations were in the form of MCQ and
    written (short answer and essay questions).
  • The comparisons were made between the year
    2004-05 (cohort in new curriculum) and the years
    2003-04 and 2002-03 (cohorts in old curriculum).
    It is unknown if the written questions were
    marked by the same personnel.

21
Results
  • There were a total of 10 possible comparisons.
    Among these, eight were statistically significant
    (two favoring old curriculum and six favoring new
    curriculum).
  • Given that the results were based on sample sizes
    of 40 and 38 students, the statistical power was
    low. Therefore, we looked at the directional
    differences.
  • There were eight directional differences in favor
    of new curriculum and two directional differences
    in favor of old curriculum.
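
One way to read counts of directional differences is an exact sign test. The sketch below is an illustration, not a method named in the slides. It assumes each comparison is an independent coin flip under the null hypothesis, which is only approximate here because some comparisons are subtotals of others. The function name sign_test_p is hypothetical; the counts are the ones reported for Psychiatry (8 of 10) and Community Medicine (5 of 9).

```python
from math import comb


def sign_test_p(k, n):
    """Two-sided exact sign test p-value for k of n differences on one side."""
    extreme = max(k, n - k)
    # Upper-tail probability of a result at least this lopsided (fair coin).
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(sign_test_p(8, 10))  # Psychiatry: 8 of 10 favor the new curriculum
print(sign_test_p(5, 9))   # Community Medicine: 5 of 9 favor the new curriculum
```

With so few comparisons, even an 8 to 2 split does not reach the conventional 0.05 level under this test, so directional counts are best treated as descriptive.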

22
Summary Table
23
Summary Table for 4th Year Summative Examinations
24
Summarized Findings
25
Comparing Performances in 5th Yr Medicine
  • Students in New (2006) vs Old Curriculum
    (2001-2005)
  • Results for the above were reported in
    Curriculum Retreat 2006 and are provided here in
    hardcopy for convenience only (i.e., not
    discussed in this retreat). However, additional
    information pertaining to Medicine's and
    Surgery's Final Exams in 2007 (the second cohort
    to graduate under the new curriculum) is provided
    in Professor Paul Lai's PowerPoint
    Presentation (included).

26
Methodology
  • Final MCQ Exam in Medicine had 150 items
  • Seventy-five (75) items were never used before
  • Seventy-five (75) MCQs had been used previously
  • - 15 items had been used in 2001
  • - a different group of 15 items was used in 2002
  • - a further different group of 16 items was used
    in 2003
  • - a further different group of 14 items was used
    in 2004
  • - a further different group of 15 items was used
    in 2005
  • All old items had reasonable psychometric
    properties (in terms of their difficulty level &
    discrimination power) when administered to
    cohorts in the old curriculum (i.e., during
    2001-2005)

27
Methodology
  • In 2005 the final MCQ examination also had 150
    items
  • Seventy-five (75) never used before
  • Seventy-three (73)* used once before in
    2001-2004 (in sets of 15 or 13 different items in
    each year); none of these were used in the 2006
    final
  • The pattern of differences between cohorts within
    the old curriculum is used to
  • - evaluate the size of differences between cohorts
    from new & old curricula and
  • - estimate if such differences are likely due to
    changes in curriculum

* 2 additional items were originally included,
but omitted in these analyses as the items were
modified at the time of the second administration
28
Additional Measures
  • Final examinations in 2005 & 2006 also included
    three OSCE stations
  • - 1 Hx station
  • - 2 Px stations (cardiovascular, neurological,
    respiratory, abdomen)
  • - focus of these assessments was the same in both
    yrs, although assessed patients were not the
    same
  • Final examinations in same years also included
    Short Notes, a form of modified essays
  • - questions used in 2005 & 2006 were not the same

29
Limitations
  • 1. All MCQ data (2001-2006) & comparative
    analyses are based on marks with no penalty
    scoring applied (did not have corresponding
    penalty-applied data)
  • - reported comparisons (differences) are
    accurate only if size of penalty (effect) was
    equivalent across cohorts and subgroups
  • - note that Medicine decided its pass/fails &
    assigned marks based on penalty scoring (although
    Faculty policy uses no penalty)
  • 2. Estimating impact of new curriculum in terms of
    clinical skills (as measured by OSCE) is
    confounded by use of different patients
  • 3. Comparisons of 2005-06 scores based on short
    notes are as or more likely to reflect differences
    in the questions' inherent difficulty level than
    differences in the cohorts

30
MCQ Results
  • Comparing groups based on
  • Bottom 27%, Middle, Top 27% & overall classes in
    2001-2005 (cohorts from old curriculum) & in
    2006 (cohort from new curriculum)
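
The grouping into bottom, middle and top performers can be sketched as follows. This is an illustration under stated assumptions, not the presenters' procedure: it assumes the groups are the lowest 27% and highest 27% of a cohort ranked by total score, with everyone else in the middle. The function name split_performance_groups and the scores are hypothetical.

```python
def split_performance_groups(scores, fraction=0.27):
    """Split scores into bottom, middle and top groups by rank.

    fraction is the share of the cohort placed in each tail group
    (0.27 is assumed from the '27' quoted on the slide).
    """
    ranked = sorted(scores)
    k = round(len(ranked) * fraction)
    return {
        "bottom": ranked[:k],
        "middle": ranked[k:len(ranked) - k],
        "top": ranked[len(ranked) - k:],
    }


def mean(values):
    return sum(values) / len(values)


# Hypothetical cohort of 100 percentage scores, for illustration only.
cohort = [40 + (i * 37) % 50 for i in range(100)]
groups = split_performance_groups(cohort)
for name, members in groups.items():
    print(name, len(members), round(mean(members), 1))
```

Comparing the mean of each group across years then gives one value per performance level per cohort, which is the form of data the following slides plot.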

31
Average Performance in 5th Year Medicine
Cohorts New Curriculum (2006) vs Old Curriculum
(2001)
Performance Groups
32
Average Performance in 5th Year Medicine (MCQ)
Cohorts New Curriculum (2006) vs Old Curriculum
(2002)
Performance Groups
33
Average Performance in 5th Year Medicine (MCQ)
Cohorts New Curriculum (2006) vs Old Curriculum
(2003)
Performance Groups
34
Average Performance in 5th Year Medicine (MCQ)
Cohorts New Curriculum (2006) vs Old Curriculum
(2004)
Performance Groups
35
Average Performance in 5th Year Medicine (MCQ)
Cohorts New Curriculum (2006) vs Old Curriculum
(2005)
Performance Groups
36
Average Performance in 5th Year Medicine (MCQ)
Total Group Performances by Years
New Curriculum (2006) vs Old Curriculum
(2001-2005)
37
Average Performance in 5th Year Medicine (MCQ)
Top 27% Group Performances by Years
New Curriculum (2006) vs Old Curriculum
(2001-2005)
38
Average Performance in 5th Year Medicine (MCQ)
Middle Group Performances by Years
New Curriculum (2006) vs Old Curriculum
(2001-2005)
39
Average Performance in 5th Year Medicine (MCQ)
Bottom 27% Group Performances by Years
New Curriculum (2006) vs Old Curriculum
(2001-2005)
40
Was Something Unique about the 2003 Cohort?
  • Why did the 2003 cohort have better performances
    (particularly in its average group)?
  • did the 2003 cohort spend more time studying for
    exams given that they had less opportunity to
    be on the wards because of SARS?

41
Evaluating the Differences
  • Differences among cohorts within old curriculum
    (2001-2005) are fluctuations most likely due to
    effects of
  • - cohorts (sampling error)
  • - teaching (emphasis or quality) and/or
  • - reliability (measurement error)
  • Size of these differences represents a yardstick
    for
  • - estimating size of difference between cohorts
    in new and old curriculum
  • - evaluating if the differences are due to the
    above same influences or also an effect of
    curriculum change
  • Analysis of differences among cohorts within old
    curriculum is provided in the Appendix

42
Comparative Variations Across Cohorts 2001-2006
(based on use of same items)
  • CONTROL: Average difference in performance among
    cohorts within old curriculum (across 5 years by
    each performance level) is
  • - 4.9% (range 0.4% to 10.3%) for all possible
    groupings (see below)
  • - 4.5% (range 3.1% to 6.1%) for the total groups
    (P1-99)
  • - 7.3% (range 6.1% to 10.3%) for the top
    groups (P83-99)
  • - 4.8% (range 0.4% to 8.2%) for the middle
    groups (P27-83)
  • - 3.1% (range 0.9% to 5.6%) for the bottom
    groups (P1-27)
  • EXPERIMENTAL: Average differences in performance
    between cohorts in new & old curriculum (across 6
    years by each performance level) are
  • - 3.5% (range 0.4% to 8.8%) for all
    possible groupings (see below)
  • - 3.1% (range 0.4% to 8.2%) for the total
    groups (P1-99)
  • - 4.0% (range 1.5% to 8.2%) for the top
    groups (P83-99)
  • - 2.8% (range 0.8% to 8.8%) for the middle
    groups (P27-83)
  • - 2.8% (range 0.9% to 8.0%) for the bottom
    groups (P1-27)
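
The yardstick idea, comparing new-versus-old differences against the differences that already occur among old-curriculum cohorts, can be sketched as below. This is an assumed reading of the slides: the control summary compares 2005 with each of 2001-2004, and the experimental summary compares 2006 with each of 2001-2005. The cohort means are invented for illustration and the function name mean_abs_difference is hypothetical.

```python
def mean_abs_difference(reference, others):
    """Return (mean, min, max) of absolute differences from a reference mean."""
    diffs = [abs(reference - other) for other in others]
    return sum(diffs) / len(diffs), min(diffs), max(diffs)


# Hypothetical mean percentage scores on repeated items, by cohort year.
cohort_means = {2001: 58.0, 2002: 63.0, 2003: 66.0, 2004: 61.0,
                2005: 60.0, 2006: 57.5}

old_years = [2001, 2002, 2003, 2004]
# Control: variation among cohorts that all followed the old curriculum.
control = mean_abs_difference(
    cohort_means[2005], [cohort_means[y] for y in old_years])
# Experimental: new-curriculum cohort against every old-curriculum cohort.
experimental = mean_abs_difference(
    cohort_means[2006], [cohort_means[y] for y in old_years + [2005]])

print("control      (mean, min, max):", control)
print("experimental (mean, min, max):", experimental)
```

If the experimental differences are no larger than the control differences, the data give no reason to attribute them to the curriculum change rather than to ordinary cohort-to-cohort fluctuation.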
43
Overview of Findings for MCQ Scores
  • For 5 out of 5 compared years, the average of
    total group differences between 2006 vs 2001-05
    was less than corresponding variations in the old
    curriculum
  • For 15 out of 20 possible comparisons¹, the
    2005-2006 differences were less than the
    corresponding 2005-2001 differences
  • - for 3 of the five times in which the
    experimental variation was larger than in the
    corresponding control period, the year 2003 was
    involved
  • Conclusion: there is no evidence that differences
    between cohorts in old and new curriculum are
    attributable to the new curriculum
  • - this conclusion is repeatedly verified by size
    and direction of the results reported in the
    attached Appendix

¹ between the various possible cohorts between
new & old curriculum matched to corresponding
performance level cohorts within old curriculum
44
Caution: A Trend was Observed in the MCQ Scores
  • With rare exceptions, the cohort in the new
    curriculum was never better than cohorts in old
    curriculum
  • - although mean differences were not
    statistically significant, the above trend
    test was statistically significant, and
  • - apparently this trend was observable to
    examiners in medicine
  • Shouldn't this trend be expected?
  • - reduced curriculum to core in terms of subject
    content in the basic medical sciences
  • - literature shows faculty takes time to adjust
    to a new curriculum
  • - it may be surprising how well the new cohort
    did on material that was not necessarily
    covered in same depth and by faculty who
    were adjusting to new emphases

45
Caution (cont'd)
  • A parallel trend might be present in the passing
    rates of repeated items used in 2006 and in
    2001-2005
  • - see next slide for plotted results

46
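Passing Rate on Repeated Items Between Old & New
Cohorts: One-half of the Items Answered
Correctly
[Figure: proportion passing, 2006 vs each of 2001-2005]
  • 2006 vs 2001: χ² = 1.007, p = 0.345
  • 2006 vs 2002: χ² = 0.834, p = 0.420
  • 2006 vs 2003: χ² = 9.489, p = 0.003
  • 2006 vs 2004: χ² = 0.142, p = 0.782
  • 2006 vs 2005: χ² = 4.348, p = 0.043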
47
Parallel Trend Might be Present in the Passing
Rates
  • 2006 passing rate is less in all five compared
    years; in two of these years, the differences
    are statistically significant
  • - for one (2003), the difference is probably
    augmented by an anomaly (SARS), but 2005 had no
    such additional confounding variable
  • - average passing rate is 5.4% less in new cohort
    (range 0.9% to 9.8%)
  • - passing rate in 2006 on newly developed items
    was 80.4%, but without any comparative group to
    evaluate this rate, one cannot determine if item
    difficulty, teaching and/or student quality were
    the underlying reasons
  • To evaluate the above, need to examine
    comparisons of the differences in passing rates
    among cohorts within old curriculum
  • - i.e., comparing 2005 to 2001-2004, one of 4
    is statistically significant
  • - among the comparisons, there is an average
    difference of 5.5% (range 3.6% to 9.9%)
  • - see following slide (for details see
    added slides in Appendix)

48
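Passing Rate on Repeated Items Among Cohorts in
Old Curriculum: One-half of the Items Answered
Correctly
[Figure: proportion passing, 2005 vs each of 2001-2004]
  • 2005 vs 2001: χ² = 0.694, p = 0.460
  • 2005 vs 2002: χ² = 5.301, p = 0.028
  • 2005 vs 2003: χ² = 1.290, p = 0.268
  • 2005 vs 2004: χ² = 1.722, p = 0.222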
49
Equivalent Size & Range of Differences in Passing
Rates Among Cohorts in Old Curriculum
  • The average difference in passing rates among
    cohorts in old curriculum is
  • - the same as the average difference between old
    & new curriculum
  • - the variation in these differences is also about
    the same
  • However, 2006 is never better (it is just not
    worse by more than the amount of difference that
    can occur among cohorts).
  • - if this negative trend is real (i.e., repeatable
    over time) what hypotheses may be explanatory?

50
If Negative Trend is Real, what are Reasonable
Hypotheses?
  • Are some bottom end students less pushed
    (threatened, challenged and/or stimulated) by the
    instructional design or by the curriculum design
    embedded in the new curriculum?
  • - Instructional design: students expected to
    develop self-learning skills
  • - Curriculum design: integrated, system-based,
    focusing on core content & clinical
    skills

51
If Negative Trend is Real, what are Reasonable
Hypotheses? (cont'd)
  • A trend in more failures (if real) appears less
    likely to be attributable to the overall
    curriculum design
  • - that is, if depth or breadth of coverage was the
    real cause, the average performances of at least
    the bottom (if not also the middle and top
    performing) groups would have changed in the new
    curriculum (our initial analyses show that this
    did not occur)
  • - however, could it be that the modified
    instructional design is more discriminatory
    (i.e., detecting those in the very bottom of the
    class distribution who are least likely to be
    able to (and apparently did not) develop
    sufficiently their independent learning skills)?
  • However, the above is speculative since it is
    based on limited data.
  • - until more data are available, the relative
    effects of instructional & curriculum design
    cannot be accurately determined, and only if
    at least one has had a systematic detectable
    impact on student learning.

52
2005-06 Comparative Performances (based on OSCE
Stations & Short Notes)
  • Keeping in mind the previously noted confounding
    effects
  • - of using different patients in the OSCE
    stations, and
  • - the larger confounding effect of using
    different questions in the short notes,
  • the following slide depicts the relative
    performance levels of the 2005 and 2006 cohorts

53
OSCE Station & Short Note Scores (%) for 2005 &
2006
54
Comparative Performances on Repeated &
Non-repeated MCQ Items & Short Note Questions
  • For interest only, the following slide also
    includes the relative performance levels of the
    2005 and 2006 cohorts on
  • - the repeated and non-repeated MCQ items

55
MCQ Scores (%) for Repeated & Non-repeated Items,
2005 & 2006
56
Interpreting OSCE, Short Notes & Non-repeated MCQ
Comparisons between 2005-06
  • Performances in OSCE stations (Hx, Px & Overall
    Assessment skill)
  • - no statistically significant differences
    detected due to curriculum changes
  • - in absolute terms, no detectable educationally
    important differences due to curriculum changes
  • - given greater emphasis to assessment skill in
    new curriculum this result is disappointing
  • Performances on Short Notes
  • - no statistically or educationally significant
    differences due to curriculum changes
  • Performances on non-repeated MCQ items
  • - no statistically or educationally significant
    differences due to curriculum changes

57
  • Appendix
  • The following slides are
  • differences among cohorts within old curriculum
  • comparative plots of the difference values for
    2005 & 2006

58
Average Performance in 5th Year Medicine
Cohorts Old Curriculum (2005 vs 2001)
Performance Groups
59
Average Performance in 5th Year Medicine (MCQ)
Cohorts Old Curriculum (2005 vs 2002)
Performance Groups
60
Average Performance in 5th Year Medicine (MCQ)
Cohorts Old Curriculum (2005 vs 2004)
Performance Groups
61
Average Performance in 5th Year Medicine (MCQ)
Cohorts Old Curriculum (2005 vs 2003)
Performance Groups
62
Average Performance in 5th Year Medicine (MCQ)
Total Group Performances by Years
Old Curriculum (2005 vs 2001-2004)
63
Average Performance in 5th Year Medicine (MCQ)
Top 27% Group Performances by Years
Old Curriculum (2005 vs 2001-2004)
64
Average Performance in 5th Year Medicine (MCQ)
Middle Group Performances by Years
Old Curriculum (2005 vs 2001-2004)
65
Average Performance in 5th Year Medicine (MCQ)
Bottom 27% Group Performances by Years
Old Curriculum (2005 vs 2001-2004)
66
Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and year 2001

Performance Groups
67
Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and year 2002

Performance Groups
68
Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and year 2003

Performance Groups
69
Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and year 2004

Performance Groups
70
Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and years 2001-2004
Performance Groups
71
Average Differences (%) Between New (2006) and
Old (2005) Curriculum by Average Differences (%)
Between year 2005 and years 2001, 2002 and 2004
Performance Groups with SARS yr (2003) Removed
72
Analyses Requested by Prof Sung
  • Chi-square tests on proportions passing repeated
    items between cohorts in old and new curriculum
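
For readers unfamiliar with the test, a chi-square comparison of two passing proportions can be sketched as follows. The pass/fail counts are hypothetical, since the slides report only the statistics and probabilities, and the function name chi_square_2x2 is an assumption. The sketch uses the uncorrected Pearson statistic with one degree of freedom, so it need not reproduce the probabilities reported on the slides, which may come from a corrected or exact procedure.

```python
from math import erfc, sqrt


def chi_square_2x2(pass_a, fail_a, pass_b, fail_b):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = pass_a + fail_a + pass_b + fail_b
    cross = pass_a * fail_b - fail_a * pass_b
    denom = ((pass_a + fail_a) * (pass_b + fail_b)
             * (pass_a + pass_b) * (fail_a + fail_b))
    stat = n * cross ** 2 / denom
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 120 of 150 pass in one cohort, 105 of 150 in another.
stat, p_value = chi_square_2x2(120, 30, 105, 45)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

The statistic grows with both the size of the gap in passing rates and the number of students, so the same percentage gap can be significant in a large cohort and not in a small one.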

73
Chi-Square Test: Year 2006 with Year 2001
χ² = 1.007, Prob (2-tailed) = 0.345
74
Chi-Square Test: Year 2006 with Year 2002
χ² = 0.834, Prob (2-tailed) = 0.42
75
Chi-Square Test: Year 2006 with Year 2003
χ² = 9.489, Prob (2-tailed) = 0.003
76
Chi-Square Test: Year 2006 with Year 2004
χ² = 0.142, Prob (2-tailed) = 0.782
77
Chi-Square Test: Year 2006 with Year 2005
χ² = 4.348, Prob (2-tailed) = 0.043
78
  • Chi-square tests on proportions passing repeated
    items among cohorts in the old curriculum

79
Chi-Square Test: Year 2005 with Year 2001
χ² = 0.694, Prob (2-tailed) = 0.460
80
Chi-Square Test: Year 2005 with Year 2002
χ² = 5.301, Prob (2-tailed) = 0.028
81
Chi-Square Test: Year 2005 with Year 2003
χ² = 1.290, Prob (2-tailed) = 0.268
82
Chi-Square Test: Year 2005 with Year 2004
χ² = 1.722, Prob (2-tailed) = 0.222
83
  • Report on the Results in
  • Final Year Surgery
  • (2005 and 2006)
  • Paul Lai
  • Surgery, CUHK

84
Presentation covers
  • The examination format of surgery final
    examination in 2005 and 2006
  • Comparison of the scores that students got in the
    past two years
  • Some observations on those weak students who
    performed badly in medicine final in 2006
  • Some observation on those bright students who
    performed well in medicine final in 2006
  • Moving into the next year (coming July)

85
The examination format of surgery final
examination in 2005 and 2006
86
Final Surgery Exam 2006
  • Written papers
  • - paper 1 (MCQs): 4 hrs
  • carries 200 marks
  • test of knowledge
  • - paper 2 (R-type MCQs & MEQs): 3 hrs
  • R-type: 80 marks
  • MEQs: 120 marks
  • total marks: 400 marks
  • OSCE: 2.5 hrs
  • total 21 stations, 20 of them carrying marks
  • 10 marks for each station
  • total marks: 200 marks
  • heavily skill-based
  • Stringent control of observer rating through
    briefing, model answers and global rating to
    achieve uniformity of judgement on students'
    performance

It is a robust assessment of the competence of
the students!
87
Standard setting for written paper and OSCE for
2005 and 2006
  • Written paper
  • Using mean − 2SD as the pass/fail line
  • Also used global rating scale to determine the
    pass marks for the MEQs
  • OSCE
  • Using mean − 2SD as the pass/fail line
  • Also taking into account the global rating scale
    (i.e. using pass/fail/borderline impression
    marking to determine the pass mark for individual
    stations)
  • Overall pass/fail determination
  • If a student failed both written and OSCE:
    straight failure without a pull-up viva
  • If a student failed either the written or OSCE,
    he/she would be invited for an observed clinical
    pull-up viva where the student would be examined
    by 2 pairs of examiners for a total of one hour on
    a pool of real surgical and orthopaedic patients
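
The overall decision rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the department's actual procedure: it assumes the pass/fail line is the class mean minus two standard deviations (the operator is missing in the transcript), it ignores the global-rating adjustment, and the function names and scores are hypothetical.

```python
from statistics import mean, stdev


def cut_score(class_scores, k=2.0):
    """Pass/fail line, assumed here to be the class mean minus k SDs."""
    return mean(class_scores) - k * stdev(class_scores)


def overall_outcome(written_pass, osce_pass):
    """Combine the two components as described on the slide."""
    if not written_pass and not osce_pass:
        return "straight fail"   # failed both: no pull-up viva
    if not written_pass or not osce_pass:
        return "pull-up viva"    # failed exactly one component
    return "pass"


# Hypothetical class scores (percent) for each component.
written = [52, 58, 61, 64, 66, 70, 73, 41]
osce = [60, 63, 65, 68, 70, 72, 75, 66]
written_cut, osce_cut = cut_score(written), cut_score(osce)

# The last student has the lowest written score in this made-up class.
student_written, student_osce = written[-1], osce[-1]
print(overall_outcome(student_written >= written_cut,
                      student_osce >= osce_cut))
```

A norm-referenced line of this kind moves with each class, so it cannot by itself show whether one cohort is stronger than another; that is why the comparisons in the following slides look at score distributions directly.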

88
Overall grades, class 2006
  • grade A/B: 8 students
  • grade B: 18 students
  • (got B/B in both written and OSCE, or B/C where it
    was the OSCE that scored the B grade)
  • grade C: 116 students
  • (got C/C in both written and OSCE, or C/B where it
    was the written that scored the B grade)
  • grade C/D (pull-up viva): 9 students
  • grade D/D: straight fail

89
Comparison of the scores students got in the two
years
Examination of the wreckage 2005 and 2006
90
Written paper 1 (MCQ) score distribution
91
Written paper 2 (MEQ R-type) score distribution
92
Total written score distribution
93
Total OSCE score distribution
94
2005/2006 comparison of marks

[Table: 2006 vs 2005 scores for MCQ (paper 1),
MEQ & R-type (paper 2, score out of 100), Written
total (paper 1 + paper 2, score out of 100) and
OSCE (clinical, score out of 100)]
Class means in both written and clinical were
different by less than 3 (class 2006 doing
slightly better)
95
PAPER 1, 200 A-TYPE MCQS
[Figure: Paper 1 results by groups (Top 27%, Middle
group, Bottom 27%) for 2005 and 2006, mean ± 1SE;
mean 1SD cut score 1999-203]
96
CORRELATION OF ITEMS 2005 vs. 2006
r² = 0.78
Middle groups from Old and New curricula, n = 71
candidates in both years
97
  • In conclusion, there is no objective evidence
    from the surgery final examinations to suggest
    the performance of the class 2006 (from new
    system-based curriculum) was inferior to the
    class 2005

98
Some observations on those weak students who
performed badly in medicine final in 2006
99
List of students who had straight fail in Medicine
100
List of students invited for borderline viva in
Medicine
101
Some observation on those bright students who
performed well in medicine final in 2006
102
List of students invited for distinction viva in
Medicine
103
PRE-TEST SURVEY, ALL, n = 208 (/223)
[Figure: response scale running from Agree through
Undecided to Disagree]
  • Textbooks: -0.067
  • FACS: -1.667
  • Ward Work: 1.228
  • Lectures: 0.031
  • Web Based: -0.307
  • Library
  • Small Grp.: 0.782
104
Worrying signs?
  • Module end assessment by observed long case and
    clinical viva has no predictive value for those
    who performed badly at the final examination
  • Students were bothered by the marks of assessment
    and examination
  • The concept that examinations drive learning did
    not work as well as expected in our students
  • No major behavioral changes, like going to the
    wards more often, were observed

105
Modification for the Better
  • More user-friendly module timetable throughout
    the final year
  • Intensive surgery course
  • Seminars, workshops, practical skills training,
    case studies at the start of the general surgery
    module
  • Tighter enforcement of ward attachment
  • Better incorporation into the team
  • Change of module assessment format to fish out
    the weak students
  • Introduction of short cases using real patients
    to the final surgery examination