Reliability - PowerPoint PPT Presentation

Provided by: McEw
1
Reliability
  • As my grand pappy, Old Reliable,
  • used to say . . .
  • Who is this famous bloodhound?
  • What was he noted for saying?

2
What CU?
3
Reliability Topics
  • The Basic Notion of Reliability
  • Factors Affecting Reliability
  • Methods of Determining Reliability
  • Methods Used by Professional Test Makers
  • Method Suggested for Your Own Tests
  • Standard Error of Measurement
  • Confidence Bands

4
Basic Notions of Reliability
  • Reliability refers to the reliability of a test
    score or set of test scores, not the reliability
    of the test.
  • Reliability questions ask: Are the scores
    consistent? Are they stable?
  • Reliability is a matter of degree; it is NOT
    all-or-none.
  • Reliability is not the same as validity.
    Validity asks: Does a test measure what it is
    supposed to? (Reliability is a necessary, but
    not a sufficient, condition for validity.)
  • Reliability deals with unsystematic error in
    assessment. Systematic error (for example, "I test
    well because I am test-wise" or "I do not test
    well because English is not my first language")
    will not be uncovered through tests of reliability.
5
Factors Affecting Reliability: Sources of
Unreliability
  • Test Scoring
    • difference between two scorers' judgments
    • one scorer over time (fatigue) and/or halo effect
  • Test Content
    • the sample of test items is too small
    • the sample of test items is not evenly selected
      across material
  • Test Administration
    • noise, inconsistent time limits, physical
      conditions
  • Personal Conditions
    • temporary ups and downs
    • (chronic test anxiety would be a systematic error
      and thus undetectable through measures of
      reliability)
  • Note: None of these factors automatically results
    in unreliability, but as we build our
    assessments, we hope to reduce the impact of
    these factors. The extent to which these factors
    may be affecting test scores is an empirical
    question, and we can and will address this as we
    continue.

6
A Bit of Theory (True/Observed)
  • The perfect test would be unaffected by the
    sources of unreliability and on this perfect test
    each examinee should get his or her true score.
    Unfortunately, we know the observed score we get
    was likely affected by one or more of the sources
    of unreliability.
  • So, our observed score is likely too high or too
    low. The difference between the observed score
    and the true score we call the error score, and
    this score can be positive or negative.
  • We can express this mathematically as
  • True Score = Obtained Score ± Error
  • T = O ± E (or, looking at it another way, O = T
    ± E)
  • Theory Time: If we could re-administer a test to
    one person an infinite number of times, what
    would we expect the distribution of their scores to
    look like? Answer: the bell-shaped curve. We
    will return to this concept when we discuss the
    standard error of measurement.
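
The thought experiment above can be tried out with a small simulation. This is only a sketch: the true score of 80, the error standard deviation of 4, and the function name are made-up values for illustration, and the errors are assumed to be random and normally distributed.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Return observed scores, each equal to the true score plus random error.

    Assumption for illustration: the error on each administration is drawn
    from a normal distribution with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_administrations)]


# Hypothetical examinee: true score 80, error spread of 4 points.
scores = simulate_observed_scores(true_score=80.0, error_sd=4.0, n_administrations=10_000)

mean_observed = statistics.mean(scores)   # settles close to the true score
sd_observed = statistics.pstdev(scores)   # settles close to the error spread

print(f"mean of observed scores: {mean_observed:.2f}")
print(f"SD of observed scores:   {sd_observed:.2f}")
```

With many repetitions the observed scores pile up around the true score, and their spread is the quantity the standard error of measurement describes.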

7
Determining Reliability by Using the Concept of
Correlation
  • I can use my understanding of correlation (how
    two things are related) to come up with a
    mathematical calculation that will suggest the
    strength (or lack of strength) regarding one or
    more of the sources of unreliability that I have
    identified.
  • I will be calculating what will be called the
    reliability coefficient (since it is a
    correlation coefficient measuring a type of
    reliability). This value will range from -1 to 1.
  • For example, let's consider rater reliability.
    That is, do different scorers rate equally or,
    another concern, does one scorer rate differently
    over time? We express that as either
  • Inter-rater reliability: among raters
    (international: many nations)
  • Intra-rater reliability: same rater (intramural
    sports: within 1 school)
  • Note: the hyphen after inter- and intra- may not
    be used by some authors
  • Compute using Spearman Rank Correlation

8
Re-enter the Correlation Coefficient - the
calculated number that best describes the
relationship between two variables, but now we
will call it the reliability coefficient
  • Reliability coefficient symbol is r
    (linear relationships)
  • Range: -1.00 through .00 to +1.00
  • Sign indicates direction
    • + indicates that as one variable increases, the
      other variable increases
    • - indicates that as one variable increases,
      the other variable decreases
  • Number indicates strength
  • Although the following table is somewhat
    arbitrary, it might be useful in interpretation:
    • -1.0 to -0.7: strong converse association.
    • -0.7 to -0.3: weak converse association.
    • -0.3 to 0.3: little or no association.
    • 0.3 to 0.7: weak direct association.
    • 0.7 to 1.0: strong direct association.
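
The interpretation table can be written as a small lookup function. This is a sketch only: the function name is hypothetical, and because the table's ranges share their endpoints, the code assumes a value exactly on a boundary (0.3 or 0.7 in absolute value) falls in the stronger band.

```python
def describe_association(r):
    """Return the slide's verbal label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")

    strength = abs(r)
    if strength < 0.3:
        return "little or no association"

    # Assumption: boundary values (0.3, 0.7) go to the stronger band.
    label = "weak" if strength < 0.7 else "strong"
    direction = "direct" if r > 0 else "converse"
    return f"{label} {direction} association"


for value in (0.85, 0.45, 0.10, -0.50, -0.92):
    print(f"r = {value:+.2f}: {describe_association(value)}")
```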

9
Some History . . . Karl Pearson (1857-1936)
  • Pearson was a Galton protégé and was appointed
    the first Galton Professor of Eugenics (1911) at
    University College London.
  • Introduced a new science, "Biometrics," which
    integrated statistics with evolutionary theory.
  • Advocated social imperialism: "superior" races
    and countries should produce more offspring than
    those considered to be less developed.
  • In the United States, Indiana was the first to
    pass a pioneering statute (1907) allowing state
    officials to sterilize those deemed unfit to
    breed. California enacted an even stricter
    eugenics law. California made it legal for state
    officials to asexualize those considered
    feeble-minded, prisoners exhibiting sexual or
    moral perversions, and anyone with more than
    three criminal convictions.

10
More Reliability Approaches to Consider
  • Test-retest (impractical for you; important in
    standardized tests)
  • Alternate Forms (again, impractical for you but
    important in standardized tests)
  • Internal Consistency (not appropriate for speeded
    tests)
    • Kuder-Richardson (really a series of formulas
      based on dichotomously scored items)
    • Coefficient alpha, Cronbach's (most widely used,
      as it can be used with continuous item types)
    • Split-half: odd-even with Spearman-Brown
      correction to apply to full test (easiest for you
      to do and understand)
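
For readers curious about coefficient alpha, here is a minimal sketch. The slides name the method but do not give its formula, so the code assumes the standard textbook form, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the sample ratings are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha for a table with one row per examinee, one column per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")

    # Sample variance of each item (column) across examinees.
    item_variances = [
        statistics.variance([row[i] for row in item_scores]) for i in range(k)
    ]
    # Sample variance of each examinee's total score.
    total_variance = statistics.variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four examinees, three items each scored 0 to 5.
ratings = [
    [5, 4, 5],
    [3, 3, 4],
    [2, 1, 2],
    [4, 4, 3],
]
print(f"coefficient alpha: {cronbach_alpha(ratings):.2f}")
```

Because it works from item variances, the same calculation accepts both right/wrong items and items scored on a continuous scale.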

11
Reliability of Your Classroom Tests
  • I would recommend doing Split-Half Reliability.
  • Step 1: Split your test into two parts (odd-numbered
    and even-numbered items).
  • Step 2: Use the Pearson Product Moment Correlation
    (Ungrouped Data) to determine $r_{xy}$ ($r_{xy}$
    represents the correlation between the two halves
    of the scale). By doing the split-half we reduce
    the number of items, which we know will
    automatically reduce the reliability, SO
  • Step 3: To estimate the reliability of the whole
    test, use the Spearman-Brown correction formula
  • $r_{sb} = \frac{2 r_{xy}}{1 + r_{xy}}$
  • where $r_{sb}$ is the split-half reliability
    coefficient, corrected to full test length
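
The three steps can be sketched in a few lines of code. The item scores below are invented (1 = correct, 0 = incorrect), and the function names are hypothetical; only the odd-even split, the Pearson correlation, and the Spearman-Brown correction come from the slide.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_xy):
    """Step 3: correct a half-test correlation to full test length."""
    return 2 * r_xy / (1 + r_xy)


def split_half_reliability(item_scores):
    """Return (r_xy, r_sb) for a table with one row of item scores per examinee."""
    # Step 1: total the odd-numbered items (1st, 3rd, ...) and the even-numbered items.
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    # Step 2: correlate the two half-test totals.
    r_xy = pearson_r(odd_totals, even_totals)
    return r_xy, spearman_brown(r_xy)


# Hypothetical six-item quiz taken by five students.
quiz = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
r_xy, r_sb = split_half_reliability(quiz)
print(f"half-test correlation: {r_xy:.3f}")
print(f"corrected reliability: {r_sb:.3f}")
```

For a positive half-test correlation the corrected value is always the larger of the two, which reflects the point that a longer test is more reliable than either half.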

12
As a Teacher, What Do I Need to Know Most About
Reliability?
  • For tests I create myself
    • Increasing number of items increases reliability.
    • Moderate difficulty level increases reliability.
    • Having items measuring similar content increases
      reliability.
  • For standardized tests I use
  • Look for each test's published reliability data.
  • Use the published reliability coefficient to
    determine the Standard Error of Measurement
    (abbreviated SEM) found in the data
  • See the following illustration

13
Standard Error of Measurement
  • The SEM is the standard deviation of a
    hypothetically infinite number of obtained scores
    around a person's true score.

14
SEM and Confidence Bands
  • The SEM is a standard deviation of a distribution
    assumed to be normal.
  • So computing the SEM can help me better interpret
    scores
  • Formula: $SEM = SD \sqrt{1 - r}$
  • I can take the computed SEM and build a
    Confidence Band around my score.
  • Confidence Band
    • 68% Confidence Band: ± 1 SEM
    • 95% Confidence Band: ± 1.96 SEM
    • 99% Confidence Band: ± 2.58 SEM
  • I can also do percentiles (a bit harder).
  • Many professional test makers give me this
    information
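
A short sketch of the SEM formula and the three confidence bands follows. The standard deviation of 15, the reliability of .91, and the observed score of 100 are made-up example values, and the function names are hypothetical; the multipliers are the ones listed on the slide.

```python
import math

# Multipliers from the slide: 68%, 95%, and 99% confidence bands.
SEM_MULTIPLIERS = {68: 1.0, 95: 1.96, 99: 2.58}


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(observed_score, sd, reliability, level=95):
    """Return (low, high) limits of the band around an observed score."""
    margin = SEM_MULTIPLIERS[level] * standard_error_of_measurement(sd, reliability)
    return observed_score - margin, observed_score + margin


# Hypothetical published test: SD = 15, reliability = .91, observed score = 100.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
print(f"SEM: {sem:.2f}")
for level in (68, 95, 99):
    low, high = confidence_band(100.0, sd=15.0, reliability=0.91, level=level)
    print(f"{level}% band: {low:.2f} to {high:.2f}")
```

Note how the band widens as reliability falls: with the same standard deviation, a reliability of .75 would double the SEM compared with a reliability of about .94.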

15
Final Thoughts and Advice
  • Use multiple sources of information.
  • Find and use a published test's SEM to help
    interpretation.
  • Standard Error of Measurement is distinct from:
    • Standard error of mean (samples/populations)
    • Standard error of estimate (prediction)
  • Reliability for Criterion-referenced Items may
    use techniques already covered but sometimes
    require special treatment.
  • Worry about scorer reliability when score depends
    on judgment.

16
More Final Words . . .
  • Reliability for Sub Scores is problematic since
    small clusters are usually quite unreliable.
  • For important decisions, get reliability > .90.
  • Be wary of short tests. To increase reliability,
    increase number of items, exercises, or
    observations.
  • Occasionally check reliability of your classroom
    tests.
  • Be able to distinguish between reliability and
    validity.