Title: Between art and technology
1 Redesigning the computer-assisted language exams for federal government employees: a didactic, methodological and technological challenge
2 - 0. Context
- Didactic challenge
- Methodological challenge
- Technological challenge
- Results and future developments
3 0. Context
- SELOR: the Belgian governmental selection bureau
- In a trilingual country, multilingualism is crucial for civil servants -> language testing is a crucial part of assessment and selection procedures
- Since the 90s: ATLAS, an electronic language testing system for Dutch and French
- Thousands of candidates yearly
- Wide variety of governmental jobs (policeman as well as diplomat)
4 - ATLAS: state-of-the-art at its creation, but needed a complete overhaul in three domains
- 1. Didactic component
- strongly focused on language knowledge
- 9 modules of unequal weight
- of which 5 on lexicon and grammar,
- 3 on receptive skills
- 1 on communicative skills
- But the lexical part represents 12,000 of the 20,000 items
- each test contained 80 to 90 knowledge items out of a total of 114 to 115 items, or around 70 to 78% of an entire test
- weak integration of a skills-based view on language competence (the Common European Framework of Reference didn't exist at that time)
- item development not based on the definition of an explicit test construct
5 - 2. Methodological component
- Level structure without psychometric underpinning
- 4 levels, from 4 (basic, comparable to primary school level) to 1 (advanced, comparable to an academic level)
- No evaluation of the reliability and validity of the ATLAS tests
- no systematic use of all items in the database
- no analysis of all test results and item data
- no monitoring based on item scores, difficulty indices, internal consistency measures, etc.
6 - 3. Technological component
- ATLAS operates on the intranet within SELOR
- closed, non-adaptable and non-updatable system
- Off-line accompanying training module on CD-ROM
- No itembanking
- No integration into the SELOR administration system
7 - Constraints
- 1. Legal constraints
- e.g. vocabulary and grammar should be tested separately
- e.g. 4 levels (1 to 4) should be distinguished
- 2. Practical constraints
- e.g. each exam takes a maximum of 120 minutes
- e.g. SELOR wanted us to reuse the existing items as much as possible
- e.g. the whole operation should be realised within one year
8 - Research team partners
- Didactic component
- French: Piet Desmet (K.U.Leuven Campus Kortrijk)
- Dutch: Guy Deville (FUNDP Namur)
- Methodological component
- Sara Gysen (K.U.Leuven)
- Technological component
- Bert Wylin (Televic Education)
- Global coordination
- Piet Desmet and Sara Gysen
9 1. Didactic challenge
- 1.1. Construct definition
- 1.2. Item revision and new item writing
- 1.3. Metadata
10 1.1. Construct definition
- From 9 modules to 4 components
- 2 knowledge-oriented: vocabulary and grammar
- 2 skills-oriented: listening and reading
- Separate knowledge-oriented modules because of legal and practical constraints
- the law imposes separate modules for vocabulary and grammar
- SELOR wants maximal reuse of existing items
- Maximal focus on the new skills-oriented modules in terms of new development
- Goal-directed and contextualised tasks, closely linked to the action-oriented view in the CEFR
11 - Skills-oriented components: a more complex construct definition
- Listening (component 3) as well as reading (component 4) are structured on the basis of the same parameters (see the sketch after this list)
- text type / contextual aspects
- personal texts with visual support
- public texts without visual support
- official texts
- cognitive activity
- global understanding
- selecting relevant information (one input text)
- comparing/linking information (two input texts)
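As an illustration of this construct definition, the sketch below enumerates the task grid spanned by the two parameters. It assumes, purely for illustration, that every text type is crossed with every cognitive activity; the labels are descriptive, not identifiers from the actual system.

```python
from itertools import product

# Text-type / contextual-aspect parameter (labels from the slide).
TEXT_TYPES = [
    "personal text with visual support",
    "public text without visual support",
    "official text",
]

# Cognitive-activity parameter (labels from the slide).
COGNITIVE_ACTIVITIES = [
    "global understanding",
    "selecting relevant information (one input text)",
    "comparing/linking information (two input texts)",
]

# Each (text type, activity) pair defines one task category for which
# listening (component 3) and reading (component 4) items are written.
for text_type, activity in product(TEXT_TYPES, COGNITIVE_ACTIVITIES):
    print(f"{text_type} -- {activity}")
```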
12 - Knowledge-oriented components
- Component 1: Vocabulary
- ordered in terms of morphological category and semantic field
- Component 2: Grammar
- word forms (morphology)
- word forms in context (morpho-syntax)
- sentence structure (syntax)
- restructuring of modules 1 to 5 of ATLAS
13 - 4 domains
- G: General professional domain (default)
- P: Police domain
- M: Medical domain
- L: Legal domain
- Testing framework
14 1.2. Item revision and new item writing
- Revision of existing items
- uniformity (e.g. same type of MCQ, only one gap in all cloze exercises)
- transparency for test candidates (e.g. dichotomous rating for all items)
- restructuring (e.g. single noun items in module 6 tagged as vocabulary items)
15 - New item writing
- New items were developed for the new categories within the listening and reading components
- as authentic as possible: real audio fragments, scanned articles, letters, etc.; same look and feel and same distribution of images as in real-life tasks
- A spectrum of different item types, not only multiple choice, in order to test as directly as possible the different tasks specified in the construct
- Standard choice of technical item type for each part of the construct
17 1.3. Metadata
- Item tags and features of 3 types
- Content metadata (automatic and manual)
- Psychometric metadata (cf. section 2)
- Dynamic metadata (evolving through use of the system)
- Important for itembanking
- Control of item selection in exam versions
- Monitoring of item quality (cf. psychometric data and dynamic metadata)
18 - Metadata for each item of the database (see the sketch after this list)
- Content metadata
- Identification number
- Question format
- Excluded when another item is present
- Linked to another item
- Date of creation
- Date of adaptation
- Adapted for candidates with special needs
- Rating
- Popularity of the item
- In training environment
- Inactive
- Assets (multimedia)
- Length of audio/video
- Length of text
- Example item
- Psychometric metadata
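A minimal sketch of how this content metadata could be modelled; the class and field names are hypothetical, not taken from the actual itembank schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical model of the content metadata listed above.
@dataclass
class ItemMetadata:
    item_id: str                              # identification number
    question_format: str                      # e.g. "MCQ", "cloze"
    excluded_with: list = field(default_factory=list)  # never in the same exam version
    linked_to: list = field(default_factory=list)      # items that must appear together
    created: Optional[date] = None            # date of creation
    adapted: Optional[date] = None            # date of adaptation
    special_needs_variant: bool = False       # adapted for candidates with special needs
    rating: str = "dichotomous"               # cf. 1.2: dichotomous rating for all items
    popularity: int = 0                       # dynamic metadata: usage count
    in_training_environment: bool = False
    inactive: bool = False
    assets: list = field(default_factory=list)  # multimedia assets
    audio_video_length_s: Optional[float] = None
    text_length_chars: Optional[int] = None
    is_example_item: bool = False
```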
19 2. Methodological challenge
- 2.1. Screening and calibration of the existing database
- 2.2. Development of an IRT-based item database
- 2.3. Standard setting and selection rules
20 2.1. Screening of the existing database
- Screening based on test data from 1995-2000
- Elimination of items based on (see the sketch after this list)
- their p-value (percentage of correct answers provided by the test candidates)
- lower than 0.10 (extremely difficult)
- higher than 0.95 (extremely easy)
- and their occurrence in test versions
- occurring at least 100 times in test versions
- Results: 218 French items and 849 Dutch items are eliminated
- 70% of the eliminated items belong to difficulty level 2, almost 30% belong to level 3 (levels 1 and 4 are not concerned)
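A minimal sketch of this screening rule, assuming a hypothetical data layout in which each item id maps to its p-value and its occurrence count:

```python
# Screening rule: flag items with extreme p-values, but only judge
# items that occurred often enough for the p-value to be stable.
def should_eliminate(p_value: float, occurrences: int,
                     min_occurrences: int = 100) -> bool:
    """True for items that are extremely difficult (p < 0.10) or
    extremely easy (p > 0.95), given at least min_occurrences uses."""
    if occurrences < min_occurrences:
        return False  # too little data to judge this item
    return p_value < 0.10 or p_value > 0.95

# Illustrative item ids and statistics, not real ATLAS data.
items = {"FR-0421": (0.07, 312), "FR-0422": (0.55, 210), "NL-1003": (0.97, 150)}
eliminated = [iid for iid, (p, n) in items.items() if should_eliminate(p, n)]
print(eliminated)  # ['FR-0421', 'NL-1003']
```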
21 2.2. Development of an IRT-based database
- Submitting the items to a psychometric analysis based on Item Response Theory (IRT), which allows us to place items on a scale
- which orders items as a function of their intrinsic difficulty level (logit value)
- which orders examinees in terms of their ability
22 - Example of a measurement scale in an IRT model (see the sketch after this slide)
- Probabilistic model
- Person B has a high probability of answering items c and b correctly, but a far lower one of answering items d or f correctly, which will normally be solved correctly by person A.
- The chance that person B will be able to answer item e is almost non-existent
(Figure: persons A and B and items b to f placed on a common logit scale)
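The slide does not name the exact IRT model used; below is a minimal sketch with the one-parameter logistic (Rasch) model, the simplest model that places persons and items on one logit scale. The abilities and difficulties are illustrative values chosen to reproduce the A/B example above.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability that a person with ability theta
    answers an item with difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

person_A, person_B = 2.0, -1.0  # abilities on the logit scale (illustrative)
items = {"b": -2.0, "c": -1.5, "d": 1.0, "f": 1.5, "e": 3.0}  # difficulties (illustrative)

for name, b in items.items():
    print(f"item {name}: P(A correct) = {p_correct(person_A, b):.2f}, "
          f"P(B correct) = {p_correct(person_B, b):.2f}")
# B scores well on b and c (P around 0.6-0.7), poorly on d and f
# (P near 0.1), and almost never on e; A normally solves d and f.
```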
23 - Eight different scales: one per target language and per component
- E.g. the logit distribution of candidates and items (French) for component 4 - Reading
24 2.3. Standard setting and selection rules
- 1. Test management by the candidates themselves
- The candidate decides when to start the next component (but there is a fixed time limit of 120 minutes for the whole exam)
- Possibility to review within a component
- Overview screen and 'breinbreker' (brain teaser) tag
- No restriction on playing the audio and video input, but the intended time allocation is mentioned in the instructions (1 play for one question, 2 plays for two questions)
- Possibility of not answering an item
- But restrictions:
- Time limit fixed per component (time interval shown on screen)
- Fixed order of components: C1 -> C2 -> C3 -> C4
- 2. Resuming is possible in case of problems
- 3. Equal share of the components in the overall score (see the sketch after this list)
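A minimal sketch of these selection rules; the 120-minute total and the fixed component order are from the slides, but the per-component time split below is an assumption for illustration only.

```python
# Fixed order of the components (C1 -> C2 -> C3 -> C4).
COMPONENT_ORDER = ["C1 vocabulary", "C2 grammar", "C3 listening", "C4 reading"]

# Assumed per-component time budget in minutes; only the 120-minute
# total is given on the slide, the split below is illustrative.
TIME_BUDGET_MIN = {"C1 vocabulary": 20, "C2 grammar": 20,
                   "C3 listening": 40, "C4 reading": 40}
assert sum(TIME_BUDGET_MIN.values()) == 120

def overall_score(component_scores: dict) -> float:
    """Equal share of the four components in the overall score."""
    return sum(component_scores[c] for c in COMPONENT_ORDER) / len(COMPONENT_ORDER)

print(overall_score({"C1 vocabulary": 80, "C2 grammar": 70,
                     "C3 listening": 60, "C4 reading": 90}))  # 75.0
```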
25 Exam version
26 3. Technological challenge
- 3.1. E-testing: Edumatic as the basis of the whole environment
- 3.2. Exam preparatory learning environment
- 3.3. Itembanking: Selor Test Administration System
27 3.1. E-testing: Edumatic as the basis of the whole environment
- Edumatic is an authoring system for exercises and tests
- for both online and offline assessments (online server-based, with an export button for offline use)
- XML-based data in a Flash user interface (IMS-QTI and SCORM compliant)
- 20 question types
- supports multimedia in all question types
- hints
28 - extensive feedback options
- general/reusable feedback
- specific feedback
- correct answer feedback
- error-specific feedback
- exercise and test mode
- full tracking and logging features: errors and stumbling blocks
- separation of data and interface
- multilingual interface
29 3.2. Exam preparatory learning environment
- Customization of the Edumatic environment
- Selor skin on the Edumatic platform
- Single sign-on (SSO: log in once, get access to multiple applications)
- The construct definition required the development of new item types in the electronic environment, such as the 'case': bundling different questions (relating to different language activities) regarding one input text or regarding the combination of two different input texts (see the sketch below)
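A minimal sketch of what such a 'case' item type could look like as a data structure; the class and field names are hypothetical, not taken from Edumatic.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    prompt: str
    activity: str  # e.g. "global understanding", "comparing/linking information"

@dataclass
class Case:
    """A bundle of questions that all refer to one shared input text,
    or to the combination of two different input texts."""
    input_texts: list            # one text, or two for comparing/linking tasks
    questions: list = field(default_factory=list)

# Illustrative case with two input texts and questions targeting
# different language activities.
case = Case(
    input_texts=["scanned newspaper article", "reader's letter on the same topic"],
    questions=[
        Question("What is the main point of the article?", "global understanding"),
        Question("On which point do the two texts disagree?",
                 "comparing/linking information"),
    ],
)
print(len(case.questions))  # 2
```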
33 - Exam
- Secure browser
- Resume
- Strict time allocation
- Sequencing
34 - Online learning environment: free at http://www.selor.be
- Preparatory learning environment
- Log in for free via My Selor
- Components 1 and 2 (vocabulary and grammar)
- access to the entire database
- Components 3 and 4 (listening and reading)
- only model items
40 3.3. Itembanking: Selor Test Administration System
- Database structure adapted to the new construct
- Basic categories are branches of the itembank tree:
- Target language
- Component
- Domain
- Level
- Categories
- Subcategories
- Additional features of the items are included as item metatags (cf. 1.3. Metadata)
43 - Scenarios for the different exams allow for an appropriate selection of items within the database (see the sketch below)
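A minimal sketch of such a scenario-driven selection, assuming a hypothetical layout in which each scenario line requests a number of items from one branch of the itembank tree (target language / component / domain / level):

```python
import random

# Illustrative item bank: every item carries its branch of the tree.
items = [{"id": f"FR-C4-{i}", "language": "FR", "component": "C4",
          "domain": "G", "level": 2} for i in range(200)]

# One scenario line: (language, component, domain, level, items to draw).
scenario = [("FR", "C4", "G", 2, 10)]

def build_exam_version(items, scenario, seed=42):
    """Draw the requested number of items from each requested branch."""
    rng = random.Random(seed)
    version = []
    for lang, comp, dom, lvl, n in scenario:
        pool = [it for it in items
                if (it["language"], it["component"], it["domain"], it["level"])
                   == (lang, comp, dom, lvl)]
        version.extend(rng.sample(pool, n))
    return version

print(len(build_exam_version(items, scenario)))  # 10
```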
47 4. Results and further developments
- ATLAS -> S-ALTO
- S-ALTO: 'somersault'
- taking the test can cause a jump in one's career
- Selor Authentic Language Testing Online
- Fully operational environment
- Exam version in use since October 2007
- Online learning environment online since September 2007
48 - Further developments: some first ideas
- content: more items in components 3 and 4
- contextualisation of components 1 and 2
- more feedback in the online environment
- methodology: establish user profiles
- calibration of new items
- technology: reporting service, enrollment tool, new question types
49 This presentation is available at http://www.itec-research.eu/presentations/worldcall
Contact: piet.desmet@kuleuven-kortrijk.be, bert.wylin@kuleuven-kortrijk.be, b.wylin@televic-education.com
Edumatic authoring tool at http://www.edumatic.be