Interpreting Kappa in Observational Research: Baserate Matters

Transcript and Presenter's Notes
1
Interpreting Kappa in Observational Research
Baserate Matters
  • Cornelia Taylor Bruckner
  • Vanderbilt University

2
Acknowledgements
  • Paul Yoder
  • Craig Kennedy
  • Niels Waller
  • Andrew Tomarken
  • MRDD training grant
  • KC Quant core

3
Overview
  • Agreement is a proxy for accuracy
  • Agreement statistics 101
  • Chance agreement
  • Agreement matrix
  • Baserate
  • Kappa and baserate, a paradox
  • Estimating accuracy from kappa
  • Applied example

4
Framing as observational coding
  • I will frame today's talk within observational
    measurement, but the concepts apply to many other
    situations, e.g.:
  • Agreement between clinicians on diagnoses
  • Agreement between reporters on child symptoms
    (e.g., mothers and fathers)

5
Rater accuracy: a fictitious session
  • Madeline Scientist writes a script for an
    interval-coded observation session, specifying
    the presence or absence of the target behavior in
    each interval.
  • Two coders (Eager Beaver and Slack Jack), blind
    to the script, are asked to code the session.
  • The accuracy of each coder against the script is
    calculated.

6
Accuracy of Eager Beaver (EB) with session
(interval data)
7
Accuracy of Slack Jack (SJ) with session
(interval data)
8
Who has the best accuracy?
  • Eager Beaver, of course.
  • Slack Jack was not very accurate.
  • Notice that accuracy is about agreement with both
    the occurrence and the nonoccurrence of behavior.

9
We don't always know the truth
  • It is great when we know the true occurrence and
    nonoccurrence of behaviors
  • But, in the real world we deal with agreement
    between fallible observers

10
Agreement between raters
  • Point by point interobserver agreement is
    achieved when independent observers
  • see the same thing (behavior, event)
  • at the same time

11
Difference between agreement and accuracy
  • Agreement can be directly measured.
  • Accuracy cannot be directly measured.
  • We don't know the truth of a session.
  • However, agreement is used as a proxy for
    accuracy.
  • Accuracy can be estimated from agreement.
  • The method for this estimation is the focus of
    today's talk.

12
Percent agreement
  • Percent agreement
  • The proportion of intervals that were agreed upon
  • Agreements / (agreements + disagreements), as
    sketched in code below
  • Takes into account occurrence and nonoccurrence
    agreement
  • Varies from 0 to 100%
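
A minimal sketch of this computation for interval data, assuming each coder's record is a list of 0/1 interval codes (the variable names and data are illustrative, not from the talk):

    def percent_agreement(coder_a, coder_b):
        """Percent agreement: agreements / (agreements + disagreements) x 100."""
        agreements = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
        return 100.0 * agreements / len(coder_a)

    # Ten intervals coded for presence (1) / absence (0) of the target behavior.
    eb = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
    sj = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
    print(percent_agreement(eb, sj))  # 80.0 (8 agreements out of 10 intervals)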

13
Occurrence and Nonoccurrence agreement
  • Occurrence agreement
  • The proportion of intervals in which either coder
    recorded the behavior that were agreed upon
  • Positive agreement
  • Nonoccurrence agreement
  • The proportion of intervals in which either coder
    recorded a nonoccurrence that were agreed upon
  • Negative agreement (both are sketched in code
    below)
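
A small sketch that follows this slide's definitions, using the same kind of 0/1 interval lists (illustrative data, not from the talk):

    def occurrence_agreement(coder_a, coder_b):
        """Agreed-upon occurrences / intervals in which either coder recorded an occurrence."""
        either = [(a, b) for a, b in zip(coder_a, coder_b) if a == 1 or b == 1]
        return sum(1 for a, b in either if a == b) / len(either)

    def nonoccurrence_agreement(coder_a, coder_b):
        """Agreed-upon nonoccurrences / intervals in which either coder recorded a nonoccurrence."""
        either = [(a, b) for a, b in zip(coder_a, coder_b) if a == 0 or b == 0]
        return sum(1 for a, b in either if a == b) / len(either)

    eb = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
    sj = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
    print(occurrence_agreement(eb, sj))     # 2 of the 4 such intervals agreed: 0.5
    print(nonoccurrence_agreement(eb, sj))  # 6 of the 8 such intervals agreed: 0.75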

14
Problem with agreement statistics
  • We assume that agreement is due to accuracy
  • Agreement statistics do not control for chance
    agreement
  • So agreement could be due only to chance

15
Chance agreement and point-by-point agreement
(figure contrasting occurrence and nonoccurrence
agreement)
16
Agreement matrix
17
Using a 2x2 table to check agreement on
individual codes
  • When IOA is computed on the total code set, it is
    an omnibus measure of agreement.
  • This does not tell us about agreement on any one
    code.
  • To know agreement on a particular code, the
    confusion matrix needs to be collapsed into a 2x2
    matrix (see the sketch below).
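
A sketch of that collapse for one target code, assuming the full confusion matrix is a square list-of-lists with rows for coder 1 and columns for coder 2. The three emotion codes and the 3x3 counts are made up for illustration; they happen to collapse to the happy/other 2x2 table shown a few slides later.

    def collapse_to_2x2(confusion, target):
        """Collapse a square confusion matrix (rows = coder 1, columns = coder 2)
        into a 2x2 table for one code:
        [[both coded target, coder 1 only], [coder 2 only, neither]]."""
        k = len(confusion)
        a = confusion[target][target]
        b = sum(confusion[target][j] for j in range(k)) - a   # coder 1 target, coder 2 other
        c = sum(confusion[i][target] for i in range(k)) - a   # coder 2 target, coder 1 other
        d = sum(sum(row) for row in confusion) - a - b - c    # neither coded target
        return [[a, b], [c, d]]

    # Illustrative 3x3 matrix for codes 0 = happy, 1 = sad, 2 = neutral.
    confusion = [[60, 4, 6],
                 [3, 40, 7],
                 [4, 5, 71]]
    print(collapse_to_2x2(confusion, target=0))  # [[60, 10], [7, 123]]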

18
(No Transcript)
19
(No Transcript)
20
Baserate in a 2x2 table (rows = Slack Jack, columns = Eager Beaver)

                         EB Happy   EB All other   Row total
SJ Happy                     60            7            67
SJ All other emotions        10          123           133
Column total                 70          130           200

Best estimate of the base rate = (67 + 70) / (2 x 200) = .34
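
The base rate computation from this table, as a small sketch (variable names are illustrative):

    # 2x2 counts from the table above (rows = Slack Jack, columns = Eager Beaver).
    both_happy    = 60    # SJ happy, EB happy
    sj_only_happy = 7     # SJ happy, EB all other emotions
    eb_only_happy = 10    # SJ all other emotions, EB happy
    neither_happy = 123   # neither coder recorded happy
    total = both_happy + sj_only_happy + eb_only_happy + neither_happy  # 200

    sj_happy_total = both_happy + sj_only_happy  # 67
    eb_happy_total = both_happy + eb_only_happy  # 70

    # The best estimate of the base rate pools both coders' marginals.
    base_rate = (sj_happy_total + eb_happy_total) / (2 * total)
    print(round(base_rate, 2))  # (67 + 70) / (2 * 200) = 0.34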
21
Review
  • Defined accuracy
  • Described the relationship between chance
    agreement and IOA
  • Created a 2x2 table
  • Calculated a best estimate of the base rate

22
Kappa
  • Kappa is an agreement statistic that controls for
    chance agreement
  • Before kappa there was a sense that we should
    control for chance, but we did not know how
  • Cohen's 1960 paper has been cited over 7,000 times

23
Definition of Kappa
  • Kappa is the proportion of non-chance agreement
    observed out of all the non-chance agreement
    possible
  • kappa = (Po - Pe) / (1 - Pe)
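
As a one-line function, a minimal sketch of this formula:

    def kappa(p_observed, p_expected):
        """Cohen's kappa: observed non-chance agreement out of available non-chance agreement."""
        return (p_observed - p_expected) / (1 - p_expected)

    print(round(kappa(0.80, 0.50), 2))  # (0.80 - 0.50) / (1 - 0.50) = 0.6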

24
Definition of Terms
  • Po = the proportion of events for which there is
    observed agreement
  • Same metric as percent agreement
  • Pe = the proportion of events for which agreement
    would be expected by chance alone
  • Defined as the probability of two raters coding
    the same behavior at the same time by chance

25
Agreement matrix for EB and SJ (with chance
agreement)
Po = .36 + .18 = .54
Pe = .33 + .15 = .48
kappa = (.54 - .48) / (1 - .48) = .12
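
The same arithmetic spelled out (a sketch; .36 and .18 are the observed occurrence and nonoccurrence agreement proportions, .33 and .15 the corresponding chance proportions read from the matrix):

    p_observed = 0.36 + 0.18   # observed agreement, Po = 0.54
    p_expected = 0.33 + 0.15   # chance agreement,   Pe = 0.48
    k = (p_observed - p_expected) / (1 - p_expected)
    print(round(k, 2))         # (0.54 - 0.48) / (1 - 0.48) = 0.12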
26
What determines the value of kappa?
  • Accuracy and base rate
  • Increasing accuracy increases observed agreement;
    therefore kappa is a consistent estimator of
    accuracy if the base rate is held constant
  • If accuracy is held constant, kappa will decrease
    as the estimated true base rate deviates from .5
    (a sketch of this relationship follows)
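
One simple way to see both effects is to model two equally accurate, independent observers who each code every interval correctly with probability equal to their accuracy, with errors equally likely on occurrences and nonoccurrences. This is an illustrative sketch under that assumption, not necessarily the exact model behind the curves on the next slides or the talk's handout:

    def expected_kappa(accuracy, base_rate):
        """Expected kappa for two independent, equally accurate observers,
        assuming symmetric errors (an illustrative assumption, not from the talk)."""
        # With a binary code, the observers agree when both are right or both are wrong.
        p_observed = accuracy ** 2 + (1 - accuracy) ** 2
        # Each observer's observed occurrence rate under this model.
        q = base_rate * accuracy + (1 - base_rate) * (1 - accuracy)
        p_expected = q ** 2 + (1 - q) ** 2
        return (p_observed - p_expected) / (1 - p_expected)

    # Accuracy fixed at 80%: kappa is highest near a base rate of .5
    # and drops as the base rate moves toward 0 or 1.
    for base_rate in (0.5, 0.3, 0.1, 0.05):
        print(base_rate, round(expected_kappa(0.80, base_rate), 2))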

27
Obtained kappa, across baserate, for 80% accuracy (chart)
28
Obtained kappa, across baserate, for 80% and 99% accuracy (chart)
29
Obtained kappa, across baserate, from 80% to 99% accuracy
(chart showing curves for 80%, 85%, 90%, 95%, and 99%)
30
Bottom line
  • When we observe behaviors with high or low base
    rates, our kappas will be low.
  • This is important for researchers studying low
    baserate behaviors.
  • Many of the behaviors we observe in young
    children with developmental disabilities occur at
    very low base rates.

31
Criterion values for IOA
  • Cohen never suggested using criterion values for
    kappa
  • Many professional organizations recommend
    criteria for IOA
  • e.g., the Council for Exceptional Children
    Division for Research Recommendations (2005)
  • Data are collected on the reliability or
    inter-observer agreement (IOA) associated with
    each dependent variable, and IOA levels meet
    minimal standards (e.g., IOA ≥ 80%; Kappa ≥ .60)

32
Criterion accuracy?
  • Setting a criterion for kappa independent of
    baserate is not useful
  • If we can estimate accuracy
  • And I am suggesting that we can
  • We need to consider what sufficient accuracy
    would be

33
Criterion accuracy cont.
  • If we consider 80% agreement sufficient, then
  • Would we consider 80% accuracy sufficient?
  • If we used 80% accuracy as a criterion,
  • Acceptable kappa could be as low as .19, depending
    on baserate

34
Why it is really important not to use criterion
kappas
  • There is a belief that the quality of data will
    be higher if kappa is higher.
  • This is only true if there is no associated loss
    of content or construct validity.
  • The processes of collapsing and redefining codes
    often result in a loss of validity.

35
Applied example
  • See handout for formulas and data

36
Use the table on the first page of your handout
to determine the accuracy of raters from
baserate and kappa
37
Example from the handout exercise: obtained kappa = .32,
estimated accuracy = .85
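
The handout's lookup table is not reproduced in this transcript. As a rough stand-in, under the same illustrative symmetric-error model sketched earlier, accuracy can be backed out from an obtained kappa and an estimated base rate with a simple numeric search (an assumption-laden sketch, not the talk's actual table):

    def estimated_accuracy(obtained_kappa, base_rate):
        """Invert the illustrative expected-kappa model by bisection,
        assuming accuracy lies between .5 and 1 (where kappa rises with accuracy)."""
        def expected_kappa(accuracy):
            p_observed = accuracy ** 2 + (1 - accuracy) ** 2
            q = base_rate * accuracy + (1 - base_rate) * (1 - accuracy)
            p_expected = q ** 2 + (1 - q) ** 2
            return (p_observed - p_expected) / (1 - p_expected)

        low, high = 0.5, 1.0
        for _ in range(60):  # bisection to high precision
            mid = (low + high) / 2
            if expected_kappa(mid) < obtained_kappa:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    # Consistent with the earlier sketch: kappa of .36 at a base rate of .5
    # corresponds to roughly 80% accuracy under this model.
    print(round(estimated_accuracy(0.36, 0.50), 2))  # about 0.8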
38
Recommendations
  • Calculate agreement for each code using a 2x2
    table
  • Use the table to determine the accuracy of
    observers from baserate and obtained kappa
  • Report kappa and accuracy

39
Software to calculate kappa
  • ComKappa, developed by Bakeman, calculates kappa,
    the SE of kappa, kappa max, and weighted kappa.
  • MOOSES, developed by Jon Tapp, calculates kappa
    on the total code set and on individual codes. It
    can be used with live coding, video coding, and
    transcription.
  • SPSS
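
In addition to the packages above, kappa can also be computed in Python; a small sketch using scikit-learn's cohen_kappa_score (the interval codes are illustrative):

    from sklearn.metrics import cohen_kappa_score

    eb = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
    sj = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
    print(cohen_kappa_score(eb, sj))  # kappa for the two coders, about 0.52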

40
Challenge
  • The challenge is to change the standards of
    observational research that demand kappas above
    a criterion of .6
  • Editors
  • PIs
  • Collaborators