Title: Computer Based Testing of Medical Knowledge.
1. Computer Based Testing of Medical Knowledge.
Tom Mitchell and Nicola Aldridge, Intelligent Assessment Technologies Ltd.
Walter Williamson, Faculty of Medicine, University of Dundee.
Peter Broomhead, Brunel University.
2. Overview.
- Project carried out in the Medical School at Dundee University in autumn 2002 / spring 2003.
- Computerisation of an existing paper-based test of medical knowledge.
- The test comprised 270 short-answer free-text items.
- Marking of the paper-based tests consumed an unsustainable amount of faculty resources.
- A computer system was developed and rolled out for the 2003 tests.
3. Background.
- The GMC defines the core knowledge which is essential for a medical student.
- The Medical School at Dundee has implemented this by teaching to 12 learning outcomes.
- Assessment of the course involves written and practical tests.
- The GMC review team rated Dundee "Excellent", but also recommended a new assessment to improve student feedback and course audit: a Progress Test.
4. Progress Tests.
- What is a Progress Test?
- A comprehensive assessment of medical knowledge.
- It informs students about their year-on-year progress against learning outcomes.
- It highlights gaps in their knowledge, and their performance relative to their peers.
- At Dundee the Progress Test is administered annually throughout the five years of the undergraduate programme; each year group sits the same test.
5. (No transcript: image-only slide.)
6. The Dundee Progress Test.
- Piloted in April / June 2001.
- Test designed by Professor M. Friedman.
- MCQ was discounted:
- Testing recall of knowledge, not recognition.
- A doctor does not get five choices.
- Many US schools are moving to an open-ended format.
- The first test comprised 250 short-answer free-text items. The longer-term aim is to build up a bank of items.
7. Progress Test Items (1).
- Items are short-answer free-text.
- Example: "What simple clinical test can distinguish between solid and cystic scrotal swellings?"
- Accept: "Transillumination", shining light through the swelling.
- Allow: "Light goes through cyst".
- Don't accept on its own: shine light at/on/behind.
8. Progress Test Items (2).
Free-text responses, with the mark awarded to each (a toy encoding of the marking rules follows the list):
1  transillumination of the area with a light source in a darkened room, cystic lesions will transilluminate but solid ones won't
1  shine a light through it - cystic lesions allow light through, solid lesions don't
1  Illumination - can light pass throught the swelling - cystic if it does
1  shine a torch behind the swelling. cystic swelling will transilluminate
1  using a torch to shine a light through the swelling
1  Tranillumination of the scrotum with a torch
1  trans illumination of the scrotal swelling
0  using a pen torch to illuminate the swelling
0  illumination of the swelling using a light source
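The accept/allow/reject guideline for this item maps naturally onto a handful of text patterns. The Python sketch below is purely illustrative: a toy regular-expression marker that happens to reproduce the nine marks above. It is not IAT's AutoMark engine, which (as slide 13 explains) works on the output of a sentence analyser rather than on raw text.

    import re

    # Toy encoding of the guideline for the transillumination item.
    # Accept: names the test, or describes light passing *through* the
    # swelling. "Shine light at/on/behind" on its own earns no mark, so
    # those forms simply fail to match any accepting pattern here.
    ACCEPT_PATTERNS = [
        r"trans?[\s-]*illuminat",                 # transillumination and variants
        r"(?:light|torch)\s+(?:\w+\s+)*through",  # light ... through (the swelling)
    ]

    def mark(response: str) -> int:
        """Return 1 if any accepting pattern matches, else 0."""
        text = response.lower()
        return int(any(re.search(p, text) for p in ACCEPT_PATTERNS))

    # Applied to the nine responses above, this sketch returns
    # 1 1 1 1 1 1 1 0 0, matching the marks shown.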
9. Paper-Based Testing (1).
- 150 students per academic year; 750-800 students in total.
- A 3-hour test of 250-270 short-answer free-text items.
- Admin: printing and collation of different 30-page test booklets (items in different order), test administration, script storage, etc.
- Marking: 800 scripts, 750 x 240 = 180,000 items to mark, plus data entry; rapid feedback required.
- Plus, moderation of the marking guidelines is required.
10. Paper-Based Testing (2).
- Moderation:
- To achieve consistent marking, the marking guidelines must be moderated in light of real student responses.
- The approach at Dundee was to use the Year 5 marking process to moderate the marking guidelines.
- A group of senior academics mark Year 5; the resulting marking guidelines are then used to mark all other years by a team of 6 markers.
11. Paper-Based Testing (3).
- Problems with the paper test:
- Moderation: script-by-script marking is a tedious and inefficient way to moderate marking guidelines, and required a significant time commitment from senior academics.
- Marking: 160 scripts per year group; a team of 6 markers can together mark around 15 scripts per hour, so marking alone takes around 30 man-days.
- Admin: data entry for 180,000 marks.
- Feedback: due to the intensity of the work required, timely feedback was not achieved.
- Conclusion: the paper-based progress test was unsustainable.
12. A Computerised Pilot (1).
- A computerised pilot ran in autumn 2002:
- To assess the reaction of the students to a computerised progress test.
- To examine the accuracy of computerised marking for progress test items.
- To contribute towards defining the specification of a full system.
- The pilot system used IAT's free-text marking engine, AutoMark (see the 2002 CAA paper).
13. Computerised Marking.
- How do we mark free-text responses by computer?
- IAT's Marking Engine does not operate on raw text, but on the output of a sentence analyser (sketched below).
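The analyser itself is not described in this presentation, so the following Python sketch only illustrates the two-stage idea: reduce a raw response to a small structured analysis, and let the marking stage operate on that structure instead of the text. The Analysis shape and the crude preposition-based relations are assumptions made for illustration, not IAT's format.

    from dataclasses import dataclass

    PREPOSITIONS = {"through", "at", "on", "behind"}

    @dataclass(frozen=True)
    class Analysis:
        """Hypothetical output of a sentence analyser (not IAT's format)."""
        words: frozenset       # normalised tokens of the response
        relations: frozenset   # crude (head, preposition) pairs, e.g. ("light", "through")

    def analyse(response: str) -> Analysis:
        # A real analyser would do spelling correction, parsing and
        # lemmatisation; this stand-in just tokenises and records which
        # word immediately precedes each preposition.
        words = response.lower().replace("-", " ").split()
        relations = {(words[i - 1], w)
                     for i, w in enumerate(words)
                     if w in PREPOSITIONS and i > 0}
        return Analysis(frozenset(words), frozenset(relations))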
14. A Computerised Mark Scheme.
- How do we represent the mark scheme?
- Each mark scheme answer is represented as a template.
- Each template specifies one particular form of acceptable or unacceptable answer (see the sketch below).
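Continuing the illustrative sketch from the previous slide, a template can be modelled as a set of constraints on the analysed response, together with the mark that form of answer earns; the engine then awards the mark of the best matching template. The field names and matching logic below are assumptions, not IAT's actual schema.

    from dataclasses import dataclass

    @dataclass
    class Template:
        """One form of acceptable (mark=1) or unacceptable (mark=0) answer."""
        required_words: set
        required_relation: tuple | None = None   # e.g. ("light", "through")
        mark: int = 1

        def matches(self, analysis) -> bool:
            return (self.required_words <= analysis.words and
                    (self.required_relation is None or
                     self.required_relation in analysis.relations))

    def score(response: str, templates) -> int:
        """Award the mark of the best matching template, else 0."""
        analysis = analyse(response)   # analyse() from the slide 13 sketch
        return max((t.mark for t in templates if t.matches(analysis)), default=0)

    # Illustrative scheme for the transillumination item; a real scheme
    # would cover many more acceptable and unacceptable variants.
    SCHEME = [
        Template({"transillumination"}),        # names the test
        Template(set(), ("light", "through")),  # light ... through the swelling
    ]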
15. Computerised Mark Schemes.
16. A Computerised Pilot (2).
- The pilot:
- Computerised mark schemes were developed for 25 items used in previous years' progress tests.
- An online test comprising the items was delivered to approximately 30 students in November / December 2002.
- Student responses were computer marked, and the marking accuracy analysed.
- The error in computerised marking was 1%.
- Student feedback from the pilot was positive.
17. A Computerised Progress Test.
18. Test Delivery.
19. Computerised Marking (1).
20. Computerised Marking (2).
21. Computer-Assisted Moderation (1).
22. Computer-Assisted Moderation (2).
23. After Moderation.
- Subsequent to moderation of the marking guidelines:
- Where necessary, the computerised mark schemes were re-worked.
- Any outstanding tests were re-marked, and the results output (a batch re-scoring sketch follows this list).
- The re-worked computerised mark schemes are now considered moderated, and can be used to mark future tests with a high level of confidence.
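Because every response is stored electronically, a full re-mark is just a batch re-scoring job rather than weeks of hand-marking. A hypothetical outline, reusing the score() and Template sketches from the earlier slides:

    def remark(stored_responses: dict, schemes: dict) -> dict:
        """Re-score stored responses against re-worked mark schemes.
        stored_responses: {student_id: {item_id: response_text}}
        schemes:          {item_id: [Template, ...]}  (see slide 14 sketch)
        """
        return {
            student: sum(score(text, schemes[item])
                         for item, text in answers.items())
            for student, answers in stored_responses.items()
        }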
24. Conclusions on Moderation.
- The academics' view:
- Being able to view all student responses to an item together is a major advantage.
- The process of moderation via computer is actually a positive experience for academics, and could lead to better item writing.
- On-screen moderation was quicker than expected; responses could be scanned quickly, and most items required little input.
- Computer-assisted moderation is a significant improvement over the previous ordeal.
25. Accuracy of Marking (1).
- Data from the Year 5 moderation:
- 5.8% of marks were changed by moderators.
- Most (4.2%) were due to omissions in the original marking guidelines or problems in item wording.
- Only 1.6% were due to errors in computerised marking.
- After re-working the computerised mark schemes:
- Agreement between moderated marks and computerised marking was 99.4% for Year 5.
- The remaining 0.6% error was due to system errors in the marking engine.
26. Accuracy of Marking (2).
Responses from 10 Year 2 and Year 3 students, selected at random and hand marked:

Number of students   Marks gained/lost by hand marking
5                    0
4                    1 (0.37%)
1                    2 (0.74%)

The mean error across the sample was 0.22%; the highest error was 0.74%. (One mark on a 270-item test is 1/270, i.e. about 0.37%.)
27. Accuracy of Marking (3).
- As a further check, 4 Year 5 students were chosen:
- Two who had unexpectedly over-performed, and two who had unexpectedly under-performed.
- Their responses were hand marked.
- No discrepancies between human and computer marking were encountered.
28. Human vs. Computerised Marking.
- Hand-marking the progress test is onerous: 800 scripts of 270 items each, and a team of 6 markers can mark approximately 15 scripts per hour.
- The error in hand marking has been measured at between 5% and 5.5% (two studies).
- This is comparable with unmoderated computerised marking (5.8%).
- Moderated computerised marking is significantly better: of the order of 1%.
29. Conclusions (1).
- Advantages of the computerised system include:
- Moderation is less painful, and more productive.
- After sample-based moderation, re-marking takes hours of work, not weeks.
- For this test, marking accuracy is actually improved.
- Production of reports is automated; data entry is not required.
- Moderated items can be re-used in future tests.
- Flexibility of test-taking is greatly increased.
30. Conclusions (2).
- The model of computerised marking and computer-assisted moderation can benefit CAA:
- It enables the use of educationally valued free-text items.
- The credibility gap is addressed: marking can be checked and moderated on a sample of the cohort.
- It enables banks of moderated free-text items to be assembled.
- The moderation process benefits item writing: better assessment, not just better CAA.
31. Future Work.
- Project:
- Complete testing of the remaining 150 students.
- Add new items for next year's tests.
- Technology:
- Enable item writers / academics to create, test, and modify computerised mark schemes.
- Integrate the marking / moderation functionality with QuestionMark Perception.
32. Computer Based Testing of Medical Knowledge.
www.IntelligentAssessment.com