Title: Guidelines and standards for computer-based testing
1 Guidelines and standards for computer-based testing
- Dave Bartram
- ISPRA November 2007
2 Outline
- The rise of computer-based assessment and the impact of the Internet
- Quality standards in the area of assessment
- Quality standards focusing on computer-based assessment
- Points for discussion
3 The impact on testing
- The Internet has become the medium of choice for work-related assessment and for much of licensing and certification testing in most developed countries.
- While paper-and-pencil tests are still a major part of the market, Internet-delivered tests are the fastest growing sector.
- Internet-delivered tests offer many advantages, but they also raise some issues.
4 Summary of trends
- The shift from paper-and-pencil and CBT administration to online delivery.
- Merging and blurring of distinctions between test delivery media: paper, PC online, PC offline, PDA, TV, cell phone.
- Access from any location, hence remote administration: use of unproctored online administration and development of remote proctoring solutions.
- Differentiation between content providers (test designers, psychometricians) and delivery providers (ASPs, workflow vendors, etc.).
- Use of item banking and random test generation to produce continually changing test content (see the sketch after this list).
- Continual content updating.
- The growth of new specialisms, such as data forensics.
- The change in locus of control from client-side to server-side.
- Increase in direct distribution to test takers.
- The changing roles of test users and consumers of test reports.
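The item-banking and random-generation trend above can be made concrete with a short sketch. The following Python fragment is only an illustration under assumed conventions: the Item class, the build_random_form function and the blueprint mapping are hypothetical names, not part of any of the standards discussed here.

```python
# Illustrative sketch only: a tiny in-memory item bank with random form
# generation. Names (Item, build_random_form, blueprint) are hypothetical.
import random
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    domain: str        # content area, e.g. "numerical" or "verbal"
    difficulty: float  # calibrated item parameter held in the bank

def build_random_form(bank, blueprint, seed=None):
    """Draw a fresh test form. `blueprint` maps each content domain to the
    number of items required, so successive candidates can receive
    different content sampled from the same calibrated bank."""
    rng = random.Random(seed)
    form = []
    for domain, n_items in blueprint.items():
        pool = [item for item in bank if item.domain == domain]
        form.extend(rng.sample(pool, n_items))
    rng.shuffle(form)
    return form

# Example bank and blueprint: 2 numerical items and 1 verbal item per form.
bank = [
    Item("N1", "numerical", -0.5), Item("N2", "numerical", 0.2),
    Item("N3", "numerical", 0.9), Item("V1", "verbal", -0.1),
    Item("V2", "verbal", 0.4),
]
print([i.item_id for i in build_random_form(bank, {"numerical": 2, "verbal": 1})])
```

Under these assumptions, each call draws a different but blueprint-matched set of items, which is the sense in which randomly generated forms keep test content continually changing.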
5 Three interdependent aspects to standards
- Process delivery standards: the assessment process (testing process/policy standards)
- Product standards: the test(s) used (test quality and psychometrics)
- Personnel standards: the test user (user qualification)
6 Some examples of international testing guidelines and standards
- ITC
  - Guidelines on Test Use (2000)
  - Guidelines on Computer-Based Testing and Testing on the Internet (2005)
  - Guidelines on Test Adaptation (under revision)
- EFPA Standing Committee on Tests and Testing
  - Common European criteria and process for test reviews, including evaluation of CBTI reports; test reviewing and registration in the UK, Norway and Sweden
- EFPA-EAWOP Working Group on European standards for test user qualification
  - Test user certification, e.g. in the UK, Sweden and Norway
- DIN 33430 and related product and user certification procedures in Germany
- ISO PC230: Standard for Psychological Assessment in Work and Organizational Settings (under development)
7 ISO 9126 (1991); Valenti et al. (2002)
- Valenti et al. review the use of ISO 9126 as a basis for CBA system evaluation.
- ISO 9126 is a standard for Information Technology Software Quality characteristics and sub-characteristics.
- It focuses on:
- Functionality
- Usability
- Reliability
- Efficiency
- Portability
- Maintainability
- Valenti et al. base their review around the first three of these.
8 ATP Guidelines for Computer-Based Testing (2002)
- SCOPE: Although computer-based testing is increasingly used in a wide spectrum of testing environments, the Guidelines are primarily aimed at high-stakes testing environments. To the extent, therefore, that the issues surrounding various low-stakes applications are different from those of high-stakes testing, they will not be discussed. By implication, this means that the Guidelines are primarily directed at tests that are designed for older students or adults. In addition to the low-stakes applications mentioned above, areas not addressed by the Guidelines would include computer-based administration, scoring, or interpretation of personality measures or other measures of psychological constructs, placement examinations, interest inventories, or attitude surveys. Those who are interested in the use of those types of measures are referred to the Joint Standards (APA, NCME, AERA, 1999).
9 ATP Guidelines: Contents
- PART 1: BACKGROUND EXPLANATIONS
  - Chapter 1: Introduction
    - AUDIENCES FOR THE GUIDELINES
    - COMPUTER-BASED TESTING
      - Test Administration Models
      - Test-taker Response Types
      - Low- and High-Stakes Testing Environments
    - SCOPE OF THE GUIDELINES
    - OVERVIEW
  - Chapter 2: Validity and Test Design
    - PLANNING THE TEST
      - Testing Purpose
      - Needs Analysis
      - Validation Plan
    - TEST SPECIFICATIONS
      - The Job/Task Analysis
      - Hardware and Software Specifications
    - EVALUATION PLAN
  - Chapter 3: Test Development and Analysis
    - ITEM BANKING
    - TEST ASSEMBLY
    - TEST FAIRNESS
    - ITEM AND TEST ANALYSIS
    - TRACKING AND MONITORING
  - Chapter 4: Test Administration
    - GUIDELINES FOR TEST ADMINISTRATION
- PART 2: COMPUTER-BASED TESTING GUIDELINES
  - Chapter 1: Planning and Design
  - Chapter 2: Test Development
  - Chapter 3: Test Administration
  - Chapter 4: Scoring and Score Reporting
  - Chapter 5: Psychometric Analysis
  - Chapter 6: Stakeholder Communications
10 BS 7988 (2002)
- A Code of Practice for the Use of Information Technology for the Delivery of Assessments (2002).
- The Standard relates to the use of information technology to deliver assessments to candidates and to record and score their responses. Its scope is defined in terms of three dimensions: the types of assessment to which it applies, the stages of the assessment 'life cycle' to which it applies, and the Standard's focus on specifically IT aspects.
11 ISO/IEC 23988 (2007)
- Information technology -- A code of practice for the use of information technology (IT) in the delivery of assessments.
- Growth in the power and capabilities of information technology (IT) has led to the increasing use of IT to deliver, score and record responses to tests and assessments in a wide range of educational and other contexts. Suitably used, IT delivery offers advantages of speed and efficiency, better feedback and improvements in validity and reliability, but its increased use has raised issues about the security and fairness of IT-delivered assessments, as well as resulting in a wide range of different practices.
- ISO/IEC 23988:2007 provides a means of:
  - showing that the delivery and scoring of the assessment are fair and do not disadvantage some groups of candidates, for example those who are not IT literate;
  - showing that a summative assessment has been conducted under secure conditions and is the authentic work of the candidate;
  - showing that the validity of the assessment is not compromised by IT delivery;
  - providing evidence of the security of the assessment, which can be presented to regulatory and funding organizations (including regulatory bodies in education and training, in industry or in financial services);
  - establishing a consistent approach to the regulations for delivery, which should benefit assessment centres that deal with more than one assessment distributor;
  - giving an assurance of quality to purchasers of "off-the-shelf" assessment software.
12 ISO/IEC 23988 (2007), continued
- ISO/IEC 23988:2007 gives recommendations on the use of IT to deliver assessments to candidates and to record and score their responses. Its scope is defined in terms of three dimensions: the types of assessment to which it applies, the stages of the assessment "life cycle" to which it applies, and its focus on specifically IT aspects.
- The scope does not include many areas of occupational and health-related assessment.
- It includes assessments of knowledge, understanding and skills (i.e. achievement tests).
- It excludes psychological tests of aptitude and personality.
13 Aims of the ITC guidelines
- The ITC identified a need to look at the issues surrounding CBT/Internet testing:
  - the market and use are increasing;
  - technological sophistication is improving.
- Scope
  - Tests and testing in the broad sense
  - Online, on-screen delivery (of part or all of a test)
  - Low- and/or high-stakes uses
  - Proctored and unproctored testing
  - Use in different testing scenarios
- Who for?
  - Test developers, publishers and users
  - Also relevant to others (consultants, test takers, national bodies, trainers, etc.)
14 General issues
- Technical: speed, network integrity, reliability, bandwidth, etc.
- Security: protecting IPR, controlling test access and distribution, keeping scoring rules confidential.
- Privacy: access to personal data; legal issues relating to data protection and storage.
- Fairness: equality of access to the Internet for all groups (the digital divide).
- Range of administration modes.
15 ITC Internet test administration modes
- Insecure mode
- Moderately secure mode
- Secure modes
(Bartram, 2001)
18 End users want reports, not test scores: rethinking our paradigm
- Does the test produce valid scores? Standard validation paradigms: do test scores correlate with criterion X? (illustrated in the sketch after this list)
- Does the system interpret scores correctly? Comparison with expert judgements.
- Do recipients (including expert end users) interpret reports correctly? Consequential validity: feedback from the recipient; validity of decisions.
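As a minimal illustration of the standard validation paradigm referenced above (do test scores correlate with criterion X?), the sketch below computes a Pearson correlation between test scores and an external criterion. The data and the pearson_r helper are hypothetical, shown only to make the paradigm concrete.

```python
# Minimal sketch of criterion-related validity: correlate test scores with
# an external criterion X. All data here are invented for illustration.
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to keep the sketch self-contained."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

test_scores = [52, 61, 48, 70, 66, 55, 73, 59]          # hypothetical test scores
criterion_x = [3.1, 3.8, 2.9, 4.2, 4.0, 3.3, 4.4, 3.5]  # e.g. performance ratings
print(f"criterion-related validity r = {pearson_r(test_scores, criterion_x):.2f}")
```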
19 Better practice in online work-related assessment
- Manage the security of your assessments
  - Use web patrols
  - Use data forensic analyses
- Use cheat-resistant assessments
  - Use IRT-based test generation (randomised testing)
  - Refresh items regularly
  - Build a large item bank
  - Track item parameters
- Build verification procedures into your assessment process
  - Retest short-listed applicants (see the sketch after this list)
- Establish and communicate a clear assessment contract with the candidate
  - Set out a clear contract with applicants
  - Explain the rules of engagement
- Ensure reports are designed to be fit for purpose
  - Validate the report, not just the tests.
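The verification step above (retesting short-listed applicants) can be sketched as a simple consistency check in the spirit of data forensics. The flag_inconsistent function, the standard error of measurement of 4.0 and the 1.96 cut-off are illustrative assumptions, not values taken from any of the guidelines discussed here.

```python
# Illustrative verification check: flag candidates whose proctored retest
# score falls well below their unproctored screening score. The z cut-off
# of 1.96 and SEM of 4.0 are arbitrary illustrations, not published rules.
def flag_inconsistent(screening, verification, sem=4.0, z_crit=1.96):
    """Return candidate IDs whose score drop exceeds what measurement error
    would plausibly explain (tested against the standard error of the
    difference between two scores)."""
    se_diff = (2 * sem ** 2) ** 0.5
    flagged = []
    for cand_id, unproctored in screening.items():
        proctored = verification.get(cand_id)
        if proctored is None:
            continue  # no verification score available for this candidate
        z = (unproctored - proctored) / se_diff
        if z > z_crit:
            flagged.append(cand_id)
    return flagged

screening = {"A01": 68, "A02": 55, "A03": 74}     # unproctored online scores
verification = {"A01": 66, "A02": 41, "A03": 72}  # proctored retest scores
print(flag_inconsistent(screening, verification))  # ['A02'] under these assumptions
```

In a real process, flagged cases would normally prompt follow-up under the assessment contract agreed with the candidate rather than automatic rejection.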
20 Conclusions
- Most standards have focused on achievement testing, mainly within an educational and training context (NB: ISO 23988).
- The ITC guidelines have a broader remit and cover all types of testing.
- ISO PC230 is not specifically about CBT, but it is an example of a service delivery standard approach.
- The key area missed by all of these tends to be standards for the outcome of the assessment.
21 Questions?