Title: Lesson Six
Contents
- Definition of reliability
- Factors contributing to unreliability
- Types of reliability
- Indication of reliability: reliability coefficient
- Ways of obtaining the reliability coefficient
- Alternate/Parallel forms
- Test-retest
- Split-half
- KR-21/KR-20
- Two ways of testing reliability
- How to make tests more reliable
Definition of Reliability (1)
- The consistency of measures across different times, test forms, raters, and other characteristics of the measurement context (Bachman, 1990, p. 24).
- If you give the same test to the same testees on two different occasions, the test should yield similar results.
Definition of Reliability (2)
- A reliable test is consistent and dependable.
- Scores are consistent and reproducible.
- The accuracy or precision with which a test measures something; that is, the consistency, dependability, or stability of test results.
Factors Contributing to Unreliability
- X = T + E (observed score = true score + error score)
- Concerned with freedom from nonsystematic fluctuation.
- Fluctuations in
- the student
- scoring
- test administration
- the test itself
Types of Reliability
- Student- (or Person-) related reliability
- Rater- (or Scorer-) related reliability
- Intra-rater reliability
- Inter-rater reliability
- Test administration reliability
- Test (or instrument-related) reliability
Student-Related Reliability (1)
- The source of the error score comes from the test takers.
- Temporary illness
- Fatigue
- Anxiety
- Other physical or psychological factors
- Test-wiseness (i.e., strategies for efficient
test taking)
Student-Related Reliability (2)
- Principles
- Assess on several occasions
- Assess when the person is prepared and best able to perform well
- Ensure that the person understands what is expected (e.g., instructions are clear)
Rater (or Scorer) Reliability (1)
- Fluctuations including human error, subjectivity, and bias
- Principles
- Use experienced trained raters.
- Use more than one rater.
- Raters should carry out their assessments
independently.
Rater Reliability (2)
- Two kinds of rater reliability
- Intra-rater reliability
- Inter-rater reliability
Intra-Rater Reliability
- Fluctuations including
- Unclear scoring criteria
- Fatigue
- Bias toward particular good and bad students
- Simple carelessness
Inter-Rater Reliability (1)
- Fluctuations including
- Lack of attention to scoring criteria
- Inexperience
- Inattention
- Preconceived biases
Inter-Rater Reliability (2)
- Used with subjective tests when two or more independent raters are involved in scoring
- Train the raters before scoring (e.g., TWE, dept. oral and composition tests for recommended students).
Inter-Rater Reliability (3)
- Compare the scores of the same testee given by different raters. If r is high, there is inter-rater reliability (see the sketch below).
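As a rough illustration of this comparison (not part of the original lesson), the Python sketch below computes the Pearson correlation r between two raters' scores for the same testees; the rater labels and score values are invented for the example. The same calculation applies to test-retest and alternate/parallel-forms reliability, where the two score lists come from two administrations or two forms rather than two raters.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    if len(x) != len(y):
        raise ValueError("Both lists must cover the same testees.")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical scores given by two raters to the same five testees
rater_a = [78, 85, 62, 90, 70]
rater_b = [75, 88, 60, 93, 68]

print(round(pearson_r(rater_a, rater_b), 2))  # a high r indicates inter-rater reliability
```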
Test Administration Reliability
- Street noise (e.g., during a listening comprehension test)
- Photocopying variations
- Lighting
- Variations in temperature
- Condition of desks and chairs
- Monitors
Test Reliability
- Measurement errors come from the test itself
- Test is too long
- Test with a time limit
- Test format allows for guessing
- Ambiguous test items
- Test with more than one correct answer
Reliability Coefficient (r)
- To quantify the reliability of a test → allows us to compare the reliability of different tests.
- 0 ≤ r ≤ 1 (ideal r = 1, which means the test gives precisely the same results for particular testees regardless of when it happened to be administered).
- If r = 1, the test is 100% reliable.
- A good achievement test: r > .90
- If r < .70, we shouldn't use the test.
How to Get the Reliability Coefficient
- Two forms, two administrations → alternate/parallel forms
- One form, two administrations → test-retest
- One form, one administration (internal consistency)
- Split-half (Spearman-Brown procedure)
- KR-21
- KR-20
Alternate/Parallel Forms
- Two forms, two administrations
- Equivalent forms (i.e., different items testing the same topic) taken by the same test taker on different days
- If r is high, the test is said to have good reliability.
- The most stringent form
Test-Retest
- One form, two administrations
- The same test is administered to the same testees with a short time lag, and then r is calculated.
- Appropriate for highly speeded tests
Split-half (Spearman-Brown Procedure)
- One test, one administration
- Split the test into halves (i.e., odd questions vs. even questions) to form two sets of scores.
- Also called internal consistency
- E.g., for items Q1-Q6: first half = Q1, Q3, Q5; second half = Q2, Q4, Q6
Split-half (2)
- Note that this r isn't the reliability of the whole test.
- There is a mathematical relationship between test length and reliability: the longer the test, the more reliable it is.
- rel.total = n × r / (1 + (n − 1) × r) → Spearman-Brown Prophecy Formula (see the sketch below)
- E.g., if the correlation between the 2 parts of the test is r = .6, the reliability of the full test (n = 2) = .75.
- If the test is lengthened to 3 times the half-length (n = 3), rel. = .82.
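A minimal sketch of the whole split-half procedure, assuming an invented 0/1 item-response matrix: it correlates the odd-item and even-item half scores and then applies the Spearman-Brown formula; it also reproduces the worked figures above (.6 → .75 for n = 2, .82 for n = 3).

```python
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def spearman_brown(r, n=2):
    """Reliability of a test lengthened n times: n*r / (1 + (n-1)*r)."""
    return n * r / (1 + (n - 1) * r)

# Hypothetical item responses (rows = testees, columns = items Q1..Q6)
items = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1],
]
odd_half = [sum(row[0::2]) for row in items]   # Q1, Q3, Q5
even_half = [sum(row[1::2]) for row in items]  # Q2, Q4, Q6

r_half = pearson_r(odd_half, even_half)
print(round(spearman_brown(r_half), 2))    # estimated full-test reliability

# The worked example from the slide: correlation between the two halves = .6
print(round(spearman_brown(0.6, n=2), 2))  # 0.75
print(round(spearman_brown(0.6, n=3), 2))  # 0.82
```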
Kuder-Richardson Formula 21 (KR-21)
- KR-21 = [k/(k − 1)] × [1 − x̄(1 − x̄/k)/s²]
- k = number of items; x̄ = mean
- s = standard deviation (for the formula, see Bailey, p. 100)
- s is a description of the spread in a set of scores (i.e., of the score deviations from the mean)
- 0 ≤ s → the larger s, the more spread out the scores
- E.g., given 2 sets of scores (5, 4, 3) and (7, 4, 1), which group in general behaves more similarly? (See the sketch below.)
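The sketch below (with invented total scores) computes KR-21 from only three quantities: the number of items k, the mean, and the standard deviation; it also prints s for the two small score sets in the example, showing that the second set is more spread out.

```python
from statistics import mean, stdev

def kr21(k, scores):
    """KR-21 = [k/(k-1)] * [1 - mean*(1 - mean/k) / s^2]."""
    x_bar = mean(scores)
    s2 = stdev(scores) ** 2
    return (k / (k - 1)) * (1 - x_bar * (1 - x_bar / k) / s2)

# Hypothetical total scores on a 20-item test
scores = [15, 12, 18, 9, 14, 16, 11, 17]
print(round(kr21(20, scores), 2))

# The spread example from the slide
print(round(stdev([5, 4, 3]), 2))  # 1.0
print(round(stdev([7, 4, 1]), 2))  # 3.0 -> more spread out, behaves less similarly
```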
Kuder-Richardson Formula 20 (KR-20)
- KR-20 = [k/(k − 1)] × [1 − (Σpq/s²)] (see the sketch below)
- p = item difficulty (percent of people who got an item right)
- q = 1 − p (i.e., percent of people who got an item wrong)
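A companion sketch for KR-20, assuming an invented 0/1 item-response matrix: p and q are computed per item, and s² is the variance of the total scores.

```python
from statistics import mean, stdev

def kr20(responses):
    """KR-20 = [k/(k-1)] * [1 - (sum of p*q) / s^2] from a 0/1 response matrix."""
    k = len(responses[0])                     # number of items
    totals = [sum(row) for row in responses]  # each testee's total score
    s2 = stdev(totals) ** 2                   # variance of the total scores
    sum_pq = 0.0
    for i in range(k):
        p = mean(row[i] for row in responses)  # proportion who got item i right
        q = 1 - p                              # proportion who got item i wrong
        sum_pq += p * q
    return (k / (k - 1)) * (1 - sum_pq / s2)

# Hypothetical responses (rows = testees, columns = items)
responses = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
]
print(round(kr20(responses), 2))
```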
Ways of Testing Reliability
- Examine the amount of variation
- Standard Error of Measurement (SEM)
- The smaller the better
- Calculate reliability coefficient
- r
- The bigger the better
Standard Error of Measurement (1)
- The average SD of an individual's scores over a large number of testings
- The essence of the variability of an individual's scores
- How large the error component is likely to be
- Particularly useful in the interpretation of test scores
- SEM = S × √(1 − rel.), where S is the SD of the test scores
Standard Error of Measurement (2)
- The average of a set of scores = the true score of the individual (see the simulation below)
- X1 = T1 + E1
- X2 = T2 + E2
- ...
- Xn = Tn + En
- X̄ = T (Ē = 0: the errors average out)
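A toy simulation of this idea (the true score, error spread, and number of administrations are all invented): each observed score is the true score plus a random error, and the average of many observed scores settles close to the true score because the errors cancel out.

```python
import random

random.seed(0)

true_score = 80   # hypothetical true score T of one individual
error_sd = 5      # hypothetical spread of the error component E

# Simulate repeated administrations: X_i = T + E_i
observed = [true_score + random.gauss(0, error_sd) for _ in range(10_000)]

mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 1))  # close to 80: the errors average out
```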
Standard Error of Measurement (3)
- E.g., GRE: SD = 100, rel. = .91
- SEM = 100 × √(1 − .91) = 30
- How do we apply the SEM in the interpretation of the score? (See the sketch below.)
- For a given spread of scores, the greater the reliability coefficient, the smaller the SEM will be.
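To make the interpretation step concrete, here is a small sketch using the GRE figures quoted above and an invented observed score of 550: the SEM gives a band around the observed score within which the true score is likely to fall (roughly 68% of the time within ±1 SEM and 95% within ±2 SEM, assuming normally distributed errors).

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - rel.)."""
    return sd * sqrt(1 - reliability)

gre_sem = sem(sd=100, reliability=0.91)
print(round(gre_sem))  # 30

observed = 550  # hypothetical observed score
print(observed - gre_sem, observed + gre_sem)          # about 520 to 580 (~68% band)
print(observed - 2 * gre_sem, observed + 2 * gre_sem)  # about 490 to 610 (~95% band)
```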
Ways of Enhancing Reliability
- General strategies
- Consider possible sources of unreliability
- Reduce or average out nonsystematic fluctuations in
- raters
- persons
- test administration
- instruments
How to Make Tests More Reliable? (1)
- Take enough samples of behavior
- Try to avoid ambiguous items
- Provide clear and explicit instructions
- Ensure that tests are well laid out and perfectly legible
- Provide uniform and non-distracting conditions of administration
- Try to use objective tests
How to Make Tests More Reliable? (2)
- Try to use direct tests
- Have independent, trained raters
- Provide a detailed scoring key
- Try to identify the test takers by number, not by name
- Try to have multiple independent scorings in subjective tests
- (Hughes, 1989, pp. 36-42).