Psychometrics - PowerPoint PPT Presentation

Slides: 33
Provided by: susank8


Transcript and Presenter's Notes

Title: Psychometrics

  • Reliability: Refers to the dependability,
    consistency, and stability of a test

  • Measurements are more or less variable from one
    occasion to the next.
  • If they are reliable, then they are relatively stable
  • If unreliable, then they are relatively unstable

  • You have an orthopedic patient and it is
    necessary to frequently measure the ROM in the
  • Usually you do the measurement, but on some
    occasions another therapist measures for you.
  • If the test is unreliable, differences in
    performance may be due to the raters rather than
    due to the patient's performance.

  • Think about reliability in terms of accuracy
  • How close are our measurements to the true score?
  • Observed score = True score + Error
  • Reliability refers to the size of the error term in
    the above equation.
  • A more reliable test will have smaller error and
    the observed score will be closer to the true score.

Reliable Vs. Unreliable Test
Sources of Measurement Error
  • Individual Factors: Health, motivation, mental
    efficiency, concentration, luck
  • Situational Factors: Room environment
  • Subjectivity: A problem with non-objective tests
  • Instrumental Factors: Errors in equipment

Reliability Coefficient
  • Is a correlation coefficient
  • Ranges from -1.0 to 1.0
  • Reliabilities in the (-) range are not reported,
    and if obtained, may indicate a severe problem
    with the test.

Types of Reliability
  • 1. Test-Retest
  • 2. Alternate Form
  • 3. Internal Consistency
  • 4. Inter-Rater

Test-Retest Reliability
  • Results in a coefficient of test stability
  • Refers to how stable (or consistent) a test is
    from one administration to the next (by the same
    examiner).
  • It tells you how stable the test results are over
    a time period.

  • You are evaluating a child with a test that looks
    at tactile hypersensitivity. If the test is
    unreliable, then your test results will differ
    each time you use it.
  • The problem with this is that you can't tell if
    the test results are different due to a poorly
    designed test, or due to differences in the

Is Test-Retest Reliability Necessary for Every Test?
  • No, in fact it isn't appropriate for every test.
  • If the test doesn't measure stable traits, then
    you wouldn't expect to have high test-retest
    reliability.
  • For example, tests that measure attitudes would
    have lower test-retest reliability because
    attitudes vary over time.

How is Test-Retest Reliability Computed?
  • Need one group of subjects
  • Give the test to all the subjects
  • Wait a period of time
  • Give the same test to the same group (using the
    same examiner)
  • Correlate the results of the two test
    administrations
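
The steps above end with a correlation between the two sets of scores. The following is a minimal sketch of that last step, assuming a Pearson correlation is used (the slides only say "correlate"). The scores and the function name `pearson_r` are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: the same six subjects tested twice by the same examiner.
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 3))
```

A value close to 1.0 would suggest the subjects kept roughly the same rank order and spacing across the two administrations.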

How Long to Wait Between Administrations?
  • Depends on the nature of the test
  • For developmental tests wait less time (no more
    than a couple of weeks or so).
  • If you wait too long you can't tell if
    differences in scores are due to the test being
    unreliable or due to developmental factors.

  • For tests that measure stable characteristics you
    can wait longer.
  • You would probably not wait more than two months
    in any case.
  • The longer the time period, the more variation
    there will be in the scores from one
    administration to the next.

Alternate Form Reliability
  • Also called parallel-form or equivalent-form
    reliability
  • Tells you about the similarity between two forms
    of a test.
  • Only appropriate (or necessary) if you need to
    have multiple forms of the same test.
  • Not usually an issue for tests used in OT.

How to Compute Alternate Form Reliability
  • Construct two forms of a test
  • Forms need to have the same number of questions
  • Each item on one test must be related to an item
    on the other test
  • Tests should have the same means and SDs when
    given to a group.

  • Have one group of subjects
  • Half take form A of the test, then take form B
  • The other half take form B first, then form A.
  • This is called counterbalancing. It controls for
    order effects such as fatigue or inability to
    finish the test.
  • Correlate scores between forms A and B to get
    reliability coefficient.
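
The counterbalancing step described above can be sketched as follows. This is only an illustration: the random split into halves, the fixed seed, and the function name `counterbalance` are assumptions, since the slides do not say how the halves are chosen.

```python
import random


def counterbalance(subject_ids, seed=0):
    """Assign half the subjects to take form A first and half form B first.

    The random split is an assumption made for this sketch; the seed only
    makes the example repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {}
    for sid in shuffled[:half]:
        orders[sid] = ("A", "B")  # form A first, then form B
    for sid in shuffled[half:]:
        orders[sid] = ("B", "A")  # form B first, then form A
    return orders


# Hypothetical group of ten subjects, numbered 1 to 10.
form_orders = counterbalance(range(1, 11))
a_first = [sid for sid, order in form_orders.items() if order[0] == "A"]
print(len(a_first), "subjects take form A first")
```

Once both forms have been taken, each subject's form A score is paired with their form B score and the two lists are correlated.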

Internal Consistency
  • This type of reliability tells you if a test is
    homogeneous in its content
  • 1. Split-half: Performance on the two halves of
    the test is compared.
  • Example: a test with 100 items
  • Even items: 1st form
  • Odd items: 2nd form
  • Calculate two scores for each person, one for
    each form of the test
  • Correlate the two scores (odd and even)

  • Advantages
  • Each person only has to take one test.
  • Can determine reliability with one test, one
    administration, one group of subjects
  • Disadvantages
  • Reliability is affected by the number of items.
    The larger the number, the better the reliability
    (in general).
  • This is adjusted for using the Spearman-Brown
    Prophecy Formula.
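
As a rough sketch of the split-half procedure with the Spearman-Brown adjustment: the half-test correlation r is stepped up to full length with 2r / (1 + r), the standard form of the formula when the test length is doubled. The item data below are invented (six people, six right/wrong items), and the function names are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate)."""
    # Index 0 is item 1, so [0::2] collects the odd-numbered items.
    odd_totals = [sum(person[0::2]) for person in item_scores]
    even_totals = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown, length doubled
    return r_half, r_full


# Hypothetical data: one row per person, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(round(r_half, 3), round(r_full, 3))
```

The corrected value is higher than the raw half-test correlation because each half contains only half the items of the full test.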

  • 2. Kuder-Richardson (KR 20, KR 21)
  • Also a good way to obtain a reliability
    coefficient using only one test administration
    and one group of subjects.
  • In effect, these procedures do numerous split-half
    estimates (splitting the test up a different way
    each time) and then take the average of all these
    split-half estimates.
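
In practice KR-20 is computed from a formula rather than by literally averaging splits. A minimal sketch, assuming right/wrong (0/1) items and the commonly cited form KR-20 = (k / (k - 1)) × (1 − Σpq / variance of total scores), where k is the number of items, p is the proportion passing an item, and q = 1 − p. Using the population variance here is an assumption; some texts use the sample variance. The data are invented.

```python
from statistics import pvariance


def kr20(item_scores):
    """KR-20 for dichotomous items; rows are people, columns are items."""
    n_people = len(item_scores)
    n_items = len(item_scores[0])
    totals = [sum(person) for person in item_scores]
    total_variance = pvariance(totals)  # population variance (assumption)
    if n_items < 2 or total_variance == 0:
        raise ValueError("need at least 2 items and varying total scores")
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in item_scores) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: one row per person, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

kr20_estimate = kr20(responses)
print(round(kr20_estimate, 3))
```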

Inter-Rater Reliability
  • Estimates how consistent the test is when used by
    different raters.
  • Determined by
  • Percent agreement between raters
  • Correlation of raters' scores
  • Kappa statistic (percent agreement that is
    corrected for chance)
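
Percent agreement and the kappa statistic can be sketched for two raters as below. This assumes Cohen's kappa, (observed agreement − chance agreement) / (1 − chance agreement), which is the usual two-rater version; the slides do not specify which kappa is meant. The ratings are invented.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters give the same rating."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions,
    # summed over all rating categories.
    expected = sum(
        (counts_a[category] / n) * (counts_b[category] / n)
        for category in set(rater_a) | set(rater_b)
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of the same ten patients by two therapists.
therapist_1 = ["pass", "pass", "fail", "pass", "fail",
               "fail", "pass", "pass", "fail", "pass"]
therapist_2 = ["pass", "pass", "fail", "fail", "fail",
               "pass", "pass", "pass", "fail", "pass"]

agreement = percent_agreement(therapist_1, therapist_2)
kappa = cohens_kappa(therapist_1, therapist_2)
print(round(agreement, 2), round(kappa, 3))
```

Kappa comes out lower than raw percent agreement because some of the matches would be expected even if both raters were guessing.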

Factors that Influence Reliability
  • Length of the test (longer test, higher
    reliability)
  • Range of scores in the sample (reliability
    higher if sample is more heterogeneous)
  • Difficulty of a test (items of average difficulty
    give the most information and make the test more
    reliable)

  • Reliability coefficient also affected by what
    method was used to estimate it.
  • Parallel form generally higher than test-retest
  • Length of time between tests affects test-retest
  • Internal consistency may be high, but it doesn't
    tell you about stability of the test over time or
    stability between raters.

How High Should Reliability Be?
  • Depends on the type of instrument.
  • Tests in the cognitive domain should have
    reliability coefficients in the .90s
  • Tests in the affective domain should have
    reliabilities in the .80s and above
  • Motor tests frequently in the .79 to .80 range.

What About Tests with Low Reliability?
  • Some tests may have overall reliability that is
    sufficient, but have some subtests with low
    reliability.
  • How should you use these tests?
  • Be careful of using them for placement decisions
  • Try to use another test in addition so that you
    can best estimate the patient's abilities.

Standard Error of Measurement (SEM)
  • Reliability is a statistic about a group. It
    doesn't tell you anything about an individual's
    score.
  • SEM is related to reliability and is used to make
    inferences about an individual.
  • We can use it to better estimate a patient's true
    performance on a test.

Calculating SEM
  • SEM = SD × √(1 − reliability)
  • The higher the reliability, the smaller the SEM
  • The SEM can never be larger than the SD
  • If a test is perfectly unreliable, then the SEM
    is equal to the SD
  • The poorer the reliability, the larger the SEM.
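
A short worked example of the SEM formula, SD times the square root of (1 − reliability). The test values (SD of 15, reliability of 0.91) and the function name are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), for reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability is expected to lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


# Hypothetical test with SD = 15 and reliability = 0.91.
sem_example = standard_error_of_measurement(15.0, 0.91)
print(round(sem_example, 2))

# With reliability 0 the SEM equals the SD; with reliability 1 it is 0.
print(standard_error_of_measurement(15.0, 0.0))
print(standard_error_of_measurement(15.0, 1.0))
```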

Interpreting SEM
  • SEM is interpreted the same way that SDs are
  • Think about an individual taking a test many
    times.
  • These test results (observed scores) can be
    plotted on a graph and will eventually form a
    normal curve.
  • The SD for this distribution is the SEM

  • Since the SEM is like a SD, we can use the same
    percentages to interpret scores
  • 68% of the time the true score will be in the
    interval ±1 SEM
  • 95% of the time the true score will be in the
    interval ±2 SEM
  • 99.7% of the time the true score will be in the
    interval ±3 SEM
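
The three intervals above can be computed directly from an observed score and the SEM. A minimal sketch, with a hypothetical observed score of 100 and SEM of 4.5; the function name `score_bands` is illustrative.

```python
def score_bands(observed, sem):
    """Return the observed score plus or minus 1, 2, and 3 SEM."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (observed - k * sem, observed + k * sem)
        for k, label in coverage.items()
    }


# Hypothetical patient: observed score of 100 on a test with SEM = 4.5.
bands = score_bands(100.0, 4.5)
for label, (low, high) in bands.items():
    print(label, low, high)
```

Reporting the 95% band, for instance, means giving the range from 91 to 109 rather than the single score of 100.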

  • SEM gives you a confidence interval around a
    person's score
  • Using this interval, you can avoid reporting one
    score (in some cases)
  • Instead, you report a range of scores, as is seen
    in a profile.
  • In a highly reliable test, the SEM is relatively
    smaller, and the confidence intervals around a
    person's score are smaller.

  • Conversely, when a test is very unreliable, the
    SEM is relatively larger.
  • This means that the confidence intervals around a
    person's score will be larger.
  • You are more uncertain of their true performance
    on the test.

  • The SEM is a powerful concept and will allow you
    to more accurately determine a test score.
  • SEM should be reported in the test manual, and
    you should be able to use it for any test or