Lesson Six: Reliability (PowerPoint presentation transcript)
1
Lesson Six
  • Reliability

2
Contents
  • Definition of reliability
  • Factors contributing to unreliability
  • Types of reliability
  • Indication of reliability: the reliability
    coefficient
  • Ways of obtaining reliability coefficient
  • Alternate/Parallel forms
  • Test-retest
  • Split-half, KR-21/KR-20
  • Two ways of testing reliability
  • How to make tests more reliable

3
Definition of Reliability (1)
  • The consistency of measures across different
    times, test forms, raters, and other
    characteristics of the measurement context
    (Bachman, 1990, p. 24).
  • If you give the same test to the same testees on
    two different occasions, the test should yield
    similar results.

4
Definition of Reliability (2)
  • A reliable test is consistent and dependable.
  • Scores are consistent and reproducible.
  • The accuracy or precision with which a test
    measures something; that is, the consistency,
    dependability, or stability of test results.

5
Factors Contributing to Unreliability
  • X = T + E (observed score = true score + error
    score; sketched in code after this list)
  • Concerned with freedom from nonsystematic
    fluctuation.
  • Fluctuations in
  • the student
  • scoring
  • test administration
  • the test itself
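
A minimal sketch of the X = T + E model in code (the true score of 80 and the normally distributed error with SD 5 are invented for illustration):

```python
import random

random.seed(1)

TRUE_SCORE = 80   # T: the testee's unobservable true score
ERROR_SD = 5      # spread of the nonsystematic error E

# Each administration yields an observed score X = T + E.
observed = [TRUE_SCORE + random.gauss(0, ERROR_SD) for _ in range(100)]

# Nonsystematic errors average out, so the mean observed
# score over many administrations approaches the true score.
print(round(sum(observed) / len(observed), 1))  # close to 80
```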

6
Types of Reliability
  • Student- (or Person-) related reliability
  • Rater- (or Scorer-) related reliability
  • Intra-rater reliability
  • Inter-rater reliability
  • Test administration reliability
  • Test (or instrument-related) reliability

7
Student-Related Reliability (1)
  • The source of the error score comes from the test
    takers.
  • Temporary illness
  • Fatigue
  • Anxiety
  • Other physical or psychological factors
  • Test-wiseness (i.e., strategies for efficient
    test taking)

8
Student-Related Reliability (2)
  • Principles
  • Assess on several occasions
  • Assess when person is prepared and best able to
    perform well
  • Ensure that person understands what is expected
    (e.g., instructions are clear)

9
Rater (or Scorer) Reliability (1)
  • Fluctuations including human error,
    subjectivity, and bias
  • Principles
  • Use experienced trained raters.
  • Use more than one rater.
  • Raters should carry out their assessments
    independently.

10
Rater Reliability (2)
  • Two kinds of rater reliability
  • Intra-rater reliability
  • Inter-rater reliability

11
Intra-Rater Reliability
  • Fluctuations including
  • Unclear scoring criteria
  • Fatigue
  • Bias toward particular "good" or "bad" students
  • Simple carelessness

12
Inter-Rater Reliability (1)
  • Fluctuations including
  • Lack of attention to scoring criteria
  • Inexperience
  • Inattention
  • Preconceived biases

13
Inter-Rater Reliability (2)
  • Used with subjective tests when two or more
    independent raters are involved in scoring
  • Train the raters before scoring (e.g., TWE, dept.
    oral and composition tests for recommended
    students).

14
Inter-Rater Reliability (3)
  • Compare the scores of the same testee given by
    different raters. If r is high, there's
    inter-rater reliability (see the sketch below).
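
A minimal sketch of this comparison, using the standard library's statistics.correlation (Pearson r, Python 3.10+); the rater scores are invented. The same computation serves for the alternate-forms and test-retest methods later, with the two lists holding scores from two forms or two administrations instead:

```python
from statistics import correlation  # Pearson r (Python 3.10+)

# Hypothetical scores for ten testees from two independent raters.
rater_a = [78, 85, 62, 90, 71, 66, 88, 74, 59, 93]
rater_b = [75, 88, 65, 91, 70, 69, 85, 76, 62, 90]

r = correlation(rater_a, rater_b)
print(f"inter-rater r = {r:.2f}")  # a high r indicates inter-rater reliability
```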

15
Test Administration Reliability
  • Street noise
  • Listening comprehension test
  • Photocopying variations
  • Lighting
  • Variations in temperature
  • Condition of desks and chairs
  • Monitors

16
Test Reliability
  • Measurement errors come from the test itself
  • Test is too long
  • Test with a time limit
  • Test format allows for guessing
  • Ambiguous test items
  • Test with more than one correct answer

17
Reliability Coefficient (r)
  • To quantify the reliability of a test → allows us
    to compare the reliability of different tests.
  • 0 ≤ r ≤ 1 (ideally r = 1, which means the test gives
    precisely the same results for particular testees
    regardless of when it happens to be
    administered).
  • If r = 1, the test is 100% reliable.
  • A good achievement test: r > .90
  • If r < .70 → you shouldn't use the test (see the
    helper below).
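
The rules of thumb above, rendered as a small helper; the .90 and .70 cutoffs are the slide's own, not universal standards, and the wording for the .70 to .90 band is an assumption:

```python
def judge_reliability(r: float) -> str:
    """Interpret a reliability coefficient using the cutoffs above."""
    if not 0 <= r <= 1:
        raise ValueError("a reliability coefficient lies between 0 and 1")
    if r > 0.90:
        return "good achievement test"
    if r < 0.70:
        return "don't use the test"
    return "usable, but reliability could be improved"  # assumed middle band

print(judge_reliability(0.93))  # good achievement test
```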

18
How to Get Reliability Coefficient
  • Two forms, two administrations:
    alternate/parallel forms
  • One form, two administrations: test-retest
  • One form, one administration (internal
    consistency)
  • split-half (Spearman-Brown procedure)
  • KR-21
  • KR-20

19
Alternate/Parallel Forms
  • Two forms, two administrations
  • Equivalent forms (i.e., different items testing
    the same topic) taken by the same test taker on
    different days
  • If r is high, this test is said to have good
    reliability.
  • The most stringent method of estimating reliability

20
Test-Retest
  • One form, two administrations
  • The same test is administered to the same testees
    after a short time lag, and then r is calculated.
  • Appropriate for highly speeded tests

21
Split-half (Spearman-Brown Procedure)
  • One test, one administration
  • Split the test into halves (e.g., odd-numbered
    vs. even-numbered questions) to form two sets of scores.
  • Also called internal consistency

First half:  Q1, Q3, Q5
Second half: Q2, Q4, Q6
22
Split-half (2)
  • Note that this r isn't the reliability of the whole test.
  • There is a mathematical relationship between test
    length and reliability: the longer the test, the
    more reliable it is.
  • rel.total = nr / (1 + (n - 1)r) → the Spearman-Brown
    prophecy formula, where n is the factor by which the
    test is lengthened
  • E.g., if the correlation between the 2 halves of a
    test is r = .6, the reliability of the full test
    (n = 2) = .75.
  • If the test were lengthened to 3 times the half-test
    (n = 3), reliability = .82 (see the sketch below).
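
A minimal sketch of the full procedure: correlate odd-item and even-item half scores (invented data), then step the result up with the Spearman-Brown prophecy formula. The last two lines reproduce the slide's worked numbers:

```python
from statistics import correlation  # Pearson r (Python 3.10+)

def spearman_brown(r: float, n: float) -> float:
    """Reliability of a test n times as long as the one yielding r."""
    return n * r / (1 + (n - 1) * r)

# Hypothetical per-testee totals on the odd vs. the even items.
odd_half  = [3, 5, 2, 6, 4, 1, 5, 3]
even_half = [4, 5, 2, 5, 3, 2, 6, 3]
half_r = correlation(odd_half, even_half)

# Full-test reliability: the test is twice as long as a half (n = 2).
print(round(spearman_brown(half_r, 2), 2))  # ~0.93 for this sample

# The slide's worked examples:
print(round(spearman_brown(0.6, 2), 2))     # 0.75: full test from r = .6
print(round(spearman_brown(0.6, 3), 2))     # 0.82: lengthened to 3 halves
```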

23
Kuder-Richardson formula 21
  • KR-21 = [k/(k-1)] × [1 - x̄(1 - x̄/k)/s²]
  • k = number of items; x̄ = mean
  • s = standard deviation (for the formula, see
    Bailey, p. 100)
  • s is a description of the spread-outness of a set
    of scores (i.e., score deviations from the mean)
  • 0 < s → the larger s, the more spread out the scores
  • E.g., given 2 sets of scores, (5, 4, 3) and (7, 4, 1),
    which group in general behaves more similarly? Both
    have mean 4, but (5, 4, 3) has the smaller s, so its
    scores cluster more tightly (see the sketch below).
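
A minimal sketch of KR-21, using the population standard deviation (a common convention for these formulas); it also answers the spread question with the slide's two score sets. The 20-item test figures are invented:

```python
from statistics import pstdev

def kr21(k: int, mean: float, s: float) -> float:
    """Kuder-Richardson formula 21: k items, test mean, and SD s."""
    return (k / (k - 1)) * (1 - mean * (1 - mean / k) / s**2)

# The slide's two score sets: both have mean 4, different spread.
print(round(pstdev([5, 4, 3]), 2))  # 0.82 -> clusters; behaves more similarly
print(round(pstdev([7, 4, 1]), 2))  # 2.45 -> more spread out

# Hypothetical 20-item test with mean 14 and s = 4.
print(round(kr21(20, 14, 4), 2))    # 0.78
```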

24
Kuder-Richardson formula 20
  • KR-20 = [k/(k-1)] × [1 - (Σpq)/s²]
  • p = item difficulty (the proportion of people who
    got an item right)
  • q = 1 - p (i.e., the proportion of people who got
    an item wrong)
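
A minimal sketch of KR-20 from a right/wrong (1/0) response matrix; all data are invented. Unlike KR-21, KR-20 needs item-level data, since p and q are computed per item:

```python
from statistics import pvariance

# Hypothetical responses: 5 testees (rows) x 4 items (columns).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

k = len(responses[0])                     # number of items
totals = [sum(row) for row in responses]  # each testee's total score
s2 = pvariance(totals)                    # variance of the total scores

# Sum p*q over items: p = proportion right, q = 1 - p = proportion wrong.
pq_sum = 0.0
for item in range(k):
    p = sum(row[item] for row in responses) / len(responses)
    pq_sum += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq_sum / s2)
print(round(kr20, 2))  # 0.41: low, as expected for so short a test
```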

25
Ways of Testing Reliability
  • Examine the amount of variation
  • Standard Error of Measurement (SEM)
  • The smaller the better
  • Calculate reliability coefficient
  • r
  • The bigger the better

26
Standard Error of Measurement (1)
  • The average SD of an individual's scores over a
    large number of test administrations
  • In essence, the variability of an individual's scores
  • Tells how large the error component is likely to be
  • Particularly useful in the interpretation of test
    scores
  • SEM = s√(1 - rel.)

27
Standard Error of Measurement (2)
  • The average of a set of observed scores = the true
    score of the individual
  • X1 = T1 + E1
  • X2 = T2 + E2
  • Xn = Tn + En
  • X̄ = T, Ē = 0 (the errors average out over many
    administrations)

28
Standard Error of Measurement (3)
  • E.g., GRE: SD = 100, rel. = .91
  • SEM = 100 × √(1 - .91) = 30
  • How do we apply the SEM in the interpretation of
    the score?
  • For a given spread of scores, the greater the
    reliability coefficient, the smaller will be the
    SEM.
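
A minimal sketch reproducing the GRE numbers, plus one standard way of applying the SEM to score interpretation: the band of about ±1 SEM around an observed score covers the true score roughly 68% of the time (the observed score of 500 is invented):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = s * sqrt(1 - rel.)."""
    return sd * math.sqrt(1 - reliability)

gre_sem = sem(100, 0.91)
print(round(gre_sem, 1))  # 30.0, matching the worked example above

# Applying the SEM: an observed score of 500 suggests a true score
# roughly within 500 +/- 30 (about 68% confidence at +/-1 SEM).
score = 500
print(f"true score likely between {score - gre_sem:.0f} and {score + gre_sem:.0f}")
```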

29
Ways of Enhancing Reliability
  • General strategies
  • Consider possible sources of unreliability
  • Reduce or average out nonsystematic fluctuations
    in
  • raters
  • persons
  • test administration
  • instruments

30
How to Make Tests More Reliable? (1)
  • Take enough samples of behavior
  • Try to avoid ambiguous items
  • Provide clear and explicit instructions
  • Ensure tests are well laid out and perfectly
    legible
  • Provide uniform and non-distracting conditions of
    administration
  • Try to use objective tests

31
How to Make Tests More Reliable? (2)
  • Try to use direct tests
  • Have independent, trained raters
  • Provide a detailed scoring key
  • Try to identify the test takers by number, not by
    name
  • Try to have multiple independent scorings of
    subjective tests
  • (Hughes, 1989, pp. 36-42).