Title: Shifting from paper and pencil testing to CBA
1Shifting from paper and pencil testing to CBA
2Some background
- Iceland inherited a Danish education system 100 years ago.
- Many changes were made during the 20th century, leading to a number of differences.
- Traditionally the state was responsible for primary and secondary education.
- Today the responsibility lies with the local communities.
- The state continues to publish a central curriculum and carries out quality evaluations of various kinds.
- Development has accelerated over the last 50 years, and especially the last 25.
- Changes have even come too fast: too little experience is gathered before the system is changed once again.
3Background
- Teacher training was changed 25 years ago, and is now being changed again (5-year MA programme).
- Laws on primary and secondary education were changed in 1974, and now the whole system is being revised again.
4Quality and accountability
- National testing
  - For the 4th and 7th grades (ages 10 and 12).
  - For the 10th grade (age 15): a high-stakes test for admittance to upper secondary school.
- Schools carry out self-evaluation according to guidelines from the ministry.
- Regular external evaluations by outside experts.
- Nationally coordinated tests are used to ensure that all students are evaluated in the same way, with comparable results. It is a matter of equity.
5National testing
- Has been going on for the last 80 years. Started in 1929 (an educator had been in the US and studied with E. Thorndike).
- Various methods and tests have been used, often changed every 10-20 years.
- The current system started in 1993, for the first time with extensive use of psychometrics and other modern methods (ETI).
- 10th grade is tested in Icelandic (reading), math, English, Danish, social studies and science.
- In 1996 new tests were introduced for both the 4th and 7th grades: reading/writing and mathematics.
- National testing in upper secondary started in 2003 and stopped in 2006 (short sad story).
6Purpose of the tests?
- Check that the goals and subgoals in the curriculum are reached (or how many students reach them).
- Give teachers directions for each student's continued studies.
- Give students, parents and schools information about each student's status and development.
- Gather information about schools: how they do in each subject compared to other schools.
- And in the 10th grade:
  - Collect information for upper secondary schools about each student's standing.
- Information about
  - The whole system, regions, municipalities, schools, classes, students.
- The requirement of multiple uses of the same tests leads to psychometric problems.
7In practice
- Tests are written at the ETI in cooperation with groups of teachers and experts, and piloted in various ways. Psychometrically sound.
- Administered at the same time for all students in the whole country.
- Students in the 10th grade can choose how many tests they take, i.e. if they want to enter upper secondary school. In practice they take as many as they can manage. Almost everyone takes Icelandic, math and English.
- Entirely new tests every year (everything is published afterwards).
8In practice
- Every student gets a grade 3-4 weeks after the tests.
- Various reports for the ministry, the schools, classes/teachers etc.
- All tests are centrally evaluated at the ETI.
  - In order to ensure scorer reliability and coordinated grades.
- Tests for the 4th and 7th grades in October each year.
- Tests for the 10th grade in May each year.
- Students from the 8th and 9th grades can take the 10th grade tests, but must then wait for a year to enter upper secondary school.
  - This is going to be changed now (last week).
9How do we change the system?
- The whole school system has recently been leaning more and more towards individualized teaching and learning.
- Nobody really knows how to do this with the same class sizes and a relatively unchanged teacher-student ratio.
- The national tests have both positive and negative aspects.
- The ETI has proposed changing the current testing system to computerized adaptive testing.
- This proposal has been enthusiastically received by everyone (i.e. until we begin to talk about financing).
- We have been actively promoting the idea for two years, so now everyone is asking: When do we start?
10What do we gain-CAT/CBA
- Shorter testing
- Adaptive tests: better test-student fit
- Quick results
- Better measurement of the extremes, at both ends
- More enjoyable for students?
- Less stress and pressure (not everyone at the same time)
- Testing with modern technology which everybody is using all the time.
- Richer items and materials (multimedia)
- Reuse of items.
- Cheaper and quicker coding
- Probably better information about the school system
11The downside
- Very expensive startup, especially CAT
- Some competencies cannot be tested.
- High requirements for technical and psychometric competence.
- More personnel for testing.
- Takes time.
12An adaptive test
13Higher measurement precision
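The idea behind these two slides can be sketched in a few lines of code. This is a minimal illustration, not the ETI's method: it assumes a Rasch (one-parameter logistic) model, which the slides do not specify, and the item bank, difficulty values and function names are all invented. It also holds the ability estimate fixed, whereas a real CAT re-estimates ability after every response.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability that a student of ability theta solves an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def next_item(theta, bank, used):
    """Pick the unused item that is most informative at the current ability estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, bank[i]))

def standard_error(theta, difficulties):
    """Standard error of the ability estimate: 1 / sqrt(total information)."""
    total = sum(item_information(theta, b) for b in difficulties)
    return 1.0 / math.sqrt(total)

# Hypothetical item bank (difficulties on the logit scale), invented for illustration.
bank = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# A strong student (assumed ability 2.0): a fixed test of three mid-range items
# versus three items chosen adaptively from the bank.
theta = 2.0
fixed = [-1.0, 0.0, 1.0]

used = set()
adaptive = []
for _ in range(3):
    i = next_item(theta, bank, used)
    used.add(i)
    adaptive.append(bank[i])

se_fixed = standard_error(theta, fixed)
se_adaptive = standard_error(theta, adaptive)
print(sorted(adaptive), se_fixed, se_adaptive)
```

With the same number of items, the adaptively chosen items sit closer to the student's ability and give a smaller standard error, which is the sense in which an adaptive test measures the extremes better.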
14But the central questions are!
- What are we really measuring?
- How does it change when we go from PP tests over to CBA or even all the way to CAT?
- Are the competencies the same?
- Do some students get an unfair advantage/disadvantage in a CBA/CAT test? (e.g. gender)
- Many effects have to be evaluated!
  - Long list of effects!
- And then there are the technical problems
  - Nonstandardized computers and software.
  - Congested internet (CAT and Internet)
  - Etc.
- So what do we do?
15Some CBAS results
- Three countries (Iceland, Denmark and Korea) participated in the Computer-Based Assessment of Science (CBAS), which was conducted alongside PISA 2006.
- Here you have some results (with the permission of the OECD and the countries involved).
- The CBAS was as standardized as possible: everyone used the same laptops, software etc., with trained test administrators. No internet connection, restricted environment (no connection outside).
- So what happened with performance?
16Some CBAS results
- All data was rescaled for the three participating countries.
- Mean scores in science, both CBAS scores and PP scores, are not comparable to the scores which will be presented on the 4th of December.
- Everything was put on a scale with a mean of 500 and an SD of 100 for these three countries.
- So not only are the means different, the whole scale is different, i.e. the distance between scores is not the same.
- I mention this just so it is clear that I am not talking about PISA 2006 results.
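As a rough illustration of what "a mean of 500 and an SD of 100" means, the final step of such a rescaling is a linear transformation of the pooled scores. The sketch below shows only that step, unweighted and on invented numbers; the function name and the raw scores are assumptions, and the actual CBAS scaling procedure is not described in these slides.

```python
import statistics

def rescale(scores, target_mean=500.0, target_sd=100.0):
    """Linearly transform scores so the pooled sample has the target mean and SD."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population SD of the pooled sample
    return [target_mean + target_sd * (x - mean) / sd for x in scores]

# Hypothetical raw scores pooled across the participating countries (invented).
raw = [12.0, 15.0, 9.0, 20.0, 14.0, 17.0, 11.0, 18.0]
scaled = rescale(raw)

print(statistics.fmean(scaled), statistics.pstdev(scaled))
```

Because the mean and SD come from the pooled sample of only these three countries, a score on this scale cannot be read against a scale standardized on a different population.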
17Many different types of items
18Some restrictions
- Reduced reading load.
  - Probably a big effect from this alone, as it is known that the correlation between scores in reading and science is very high, above 0.60 and even higher.
  - So students with lower reading abilities should do relatively better (boys should do better).
- So how is it possible to unravel this, i.e. to separate the reading effect from the computer effect?
19Average scores (plausible values)
20So what does this mean?
- Gender differences change.
- In Iceland (and to some extent in Korea) they not only change in magnitude, they change in direction!
- What happened to the boys?
- What happened to the girls?
- What did they experience?
- Let's look at some of that.
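A change "in direction" can be made precise as a sign change in the gender gap between the two test modes. The following is a minimal sketch of that check; the record layout, the function names and every score in it are invented for illustration and are not CBAS data.

```python
from statistics import fmean

def gender_gap(records, mode):
    """Mean score of boys minus mean score of girls for one test mode."""
    boys = [r["score"] for r in records if r["mode"] == mode and r["gender"] == "boy"]
    girls = [r["score"] for r in records if r["mode"] == mode and r["gender"] == "girl"]
    return fmean(boys) - fmean(girls)

def gap_reverses(records, mode_a="PP", mode_b="CBA"):
    """True when the gender gap has opposite signs in the two modes."""
    return gender_gap(records, mode_a) * gender_gap(records, mode_b) < 0

# Invented toy records, NOT results from CBAS or PISA.
records = [
    {"gender": "boy", "mode": "PP", "score": 490.0},
    {"gender": "boy", "mode": "PP", "score": 500.0},
    {"gender": "girl", "mode": "PP", "score": 505.0},
    {"gender": "girl", "mode": "PP", "score": 515.0},
    {"gender": "boy", "mode": "CBA", "score": 510.0},
    {"gender": "boy", "mode": "CBA", "score": 520.0},
    {"gender": "girl", "mode": "CBA", "score": 495.0},
    {"gender": "girl", "mode": "CBA", "score": 505.0},
]

pp_gap = gender_gap(records, "PP")
cba_gap = gender_gap(records, "CBA")
print(pp_gap, cba_gap, gap_reverses(records))
```

A real analysis would also need standard errors for each gap before calling a reversal meaningful.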
21Correlations between CBAS and PP test.
The Icelandic discrepancy?
22I found the computer test enjoyable
Boys generally enjoy the CBA more. Another Icelandic exception?
23I found the PP test enjoyable
Nobody enjoys the PP test, or what? But there are differences.
24If you had to take a two-hour test, which option would you choose?
The third Icelandic exception?
25Where are we then?
- The PP test and the CBAS are most probably measuring the same competency (high correlation).
- Gender differences differ between the two tests, in some cases substantially.
- Students are ambivalent towards CBA, and more so in some places.
- Must be analyzed further, according to
  - Type of item
  - Content
  - Presentation mode
  - And more
26And finally
- How can we implement a CBA system knowing what we do both from the CBAS and from other tests (not mentioned here)?
- The only viable course of action is to implement a pilot project which
  - Compares PP and CBA (in each country)
  - Compares sequential CBA and CAT
  - Compares PP and CAT
  - And does all of this in the same population, with randomized assignment to test mode and a balanced design where students take two or more modes of tests in the same subject.
- Only after analyzing the results from such a test will it be possible to answer the questions posed here earlier.
  - Are we measuring the same things, and if not, what is different?
  - And all the other questions.
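One way to picture the randomized, balanced assignment proposed for the pilot is sketched below. It is an assumption about how such a design could be set up, not a description of an actual plan: each student is randomly given an ordered pair of test modes, and the ordered pairs are used equally often so that mode and order effects are balanced. The function name, the student IDs and the choice of two modes per student are hypothetical.

```python
import itertools
import random

def assign_mode_orders(student_ids, modes=("PP", "CBA", "CAT"), per_student=2, seed=0):
    """Randomly assign each student an ordered sequence of test modes, balanced across sequences."""
    # All ordered sequences of distinct modes, e.g. ("PP", "CBA") and ("CBA", "PP").
    sequences = list(itertools.permutations(modes, per_student))
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled students round-robin over the sequences,
    # so every sequence is used about equally often.
    return {sid: sequences[i % len(sequences)] for i, sid in enumerate(shuffled)}

# Hypothetical student IDs.
students = [f"S{n:03d}" for n in range(1, 13)]
plan = assign_mode_orders(students)

for sid in sorted(plan):
    print(sid, plan[sid])
```

With three modes and two tests per student there are six ordered sequences, so twelve students give exactly two students per sequence; any student count that is a multiple of six keeps the design fully balanced.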