Interpreting Kappa in Observational Research: Baserate Matters


1
Interpreting Kappa in Observational Research
Baserate Matters

Cornelia Taylor Bruckner
Vanderbilt University

2
Acknowledgements

Paul Yoder
Craig Kennedy
Niels Waller
Andrew Tomarken
MRDD training grant
KC Quant core

3
Overview

Agreement is a proxy for accuracy
Agreement statistics 101
Chance agreement
Agreement matrix
Baserate
Kappa and baserate, a paradox
Estimating accuracy from kappa
Applied example

4
Framing as observational coding

I will be framing the talk today within
observational measurement, but the concepts apply
to many other situations, e.g.,
Agreement between clinicians on diagnosis
Agreement between reporters on child symptoms
(e.g., mothers and fathers)

5
Rater accuracy: A fictitious session

Madeline Scientist writes a script for an
interval coded observation session that specifies
the presence or absence of the target behavior in
each interval
Two coders (Eager Beaver and Slack Jack), blind
to the script, are asked to code the session.
Accuracy of each coder with the script is
calculated

6
Accuracy of Eager Beaver (EB) with session
(interval data)
7
Accuracy of Slack Jack (SJ) with session
(interval data)
8
Who has the best accuracy?

Eager Beaver of course.
Slack Jack was not very accurate
Notice that accuracy is about agreement with the
occurrence and nonoccurrence of behavior.

9
We don't always know the truth

It is great when we know the true occurrence and
nonoccurrence of behaviors
But, in the real world we deal with agreement
between fallible observers

10
Agreement between raters

Point by point interobserver agreement is
achieved when independent observers
see the same thing (behavior, event)
at the same time

11
Difference between agreement and accuracy

Agreement can be directly measured.
Accuracy cannot be directly measured.
We don't know the truth of a session.
However, agreement is used as a proxy for
accuracy
Accuracy can be estimated from agreement
The method for this estimation is the focus of
today's talk

12
Percent agreement

Percent agreement
The proportion of intervals that were agreed upon
Agreements / (agreements + disagreements)
Takes into account occurrence and nonoccurrence
agreement
Varies from 0 to 100%
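
As a minimal sketch of the percent agreement formula above, the function below counts agreements across two coders' interval records. The function name and the two short records are hypothetical and chosen only for illustration; 1 means the behavior was recorded in the interval and 0 means it was not.

```python
def percent_agreement(coder_a, coder_b):
    """Percent of intervals on which two coders recorded the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must code the same, nonzero number of intervals")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    disagreements = len(coder_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical interval records (not data from the talk).
eb = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
sj = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
print(percent_agreement(eb, sj))
```

The coders disagree on two of the ten intervals, so the result is 80 percent.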

13
Occurrence and Nonoccurrence agreement

Occurrence agreement
Of the intervals in which either coder recorded
the behavior, the proportion that were agreed upon
Positive agreement
Nonoccurrence agreement
Of the intervals in which either coder recorded
a nonoccurrence, the proportion that were agreed upon
Negative agreement
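
The two definitions above can be sketched with a single helper that restricts the denominator to intervals where either coder recorded the code of interest. The helper name and the interval records are hypothetical; 1 stands for occurrence and 0 for nonoccurrence.

```python
def agreement_on_code(coder_a, coder_b, code):
    """Of the intervals where either coder recorded `code`,
    return the proportion where both coders recorded it."""
    pairs = list(zip(coder_a, coder_b))
    either = sum(a == code or b == code for a, b in pairs)
    both = sum(a == code and b == code for a, b in pairs)
    if either == 0:
        raise ValueError("neither coder recorded this code")
    return both / either


# Hypothetical interval records (not data from the talk).
eb = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
sj = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
occurrence = agreement_on_code(eb, sj, code=1)
nonoccurrence = agreement_on_code(eb, sj, code=0)
print(occurrence, nonoccurrence)
```

With these records occurrence agreement is .50 and nonoccurrence agreement is .75, even though overall agreement is 80 percent, which shows how much a low baserate behavior can lean on nonoccurrence agreement.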

14
Problem with agreement statistics

We assume that agreement is due to accuracy
Agreement statistics do not control for chance
agreement
So agreement could be due only to chance

15
Chance agreement and point by point agreement
[Figure: nonoccurrence agreement and occurrence agreement]
16
Agreement matrix
17
Using a 2x2 table to check agreement on
individual codes

When IOA is computed on the total code set, it is
an omnibus measure of agreement
This does not tell us about agreement on any one
code.
To know agreement on a particular code, the
confusion matrix needs to be collapsed into a 2x2
matrix.
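
One way to do the collapsing is sketched below. The function name, the three code labels, and the 3x3 counts are hypothetical; the counts were chosen so that the collapsed table for Happy has cells 60, 10, 7, and 123.

```python
def collapse_to_2x2(matrix, labels, target):
    """Collapse a k x k confusion matrix (rows: one coder, columns: the other)
    into a 2x2 table of `target` versus all other codes.

    Returns [[both_target, row_coder_only], [column_coder_only, neither]].
    """
    t = labels.index(target)
    total = sum(sum(row) for row in matrix)
    both = matrix[t][t]
    row_total = sum(matrix[t])                  # row coder recorded target
    col_total = sum(row[t] for row in matrix)   # column coder recorded target
    neither = total - row_total - col_total + both
    return [[both, row_total - both], [col_total - both, neither]]


# Hypothetical full confusion matrix for three emotion codes.
labels = ["happy", "sad", "angry"]
full = [
    [60, 6, 4],
    [4, 70, 8],
    [3, 5, 40],
]
happy_table = collapse_to_2x2(full, labels, "happy")
print(happy_table)
```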

20
Baserate in a 2x2 table

                                 Eager Beaver
                        Happy   All other emotions   Total
Slack Jack
  Happy                    60                   10      70
  All other emotions        7                  123     130
  Total                    67                  133     200

Estimated baserate of Happy = (67 + 70) / (2 x 200) = .34
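
The baserate estimate on this slide averages the two coders' marginal rates for the target code. A minimal sketch, using the counts from the Happy table (the function name is hypothetical):

```python
def estimated_base_rate(table):
    """Best estimate of the baserate of the target code from a 2x2 table
    [[both_target, coder1_only], [coder2_only, neither]]:
    the two coders' totals for the target, divided by twice the table total."""
    (a, b), (c, d) = table
    n = a + b + c + d
    coder1_target = a + b   # 70 for Slack Jack
    coder2_target = a + c   # 67 for Eager Beaver
    return (coder1_target + coder2_target) / (2 * n)


happy_vs_other = [[60, 10], [7, 123]]
base_rate = estimated_base_rate(happy_vs_other)
print(round(base_rate, 2))
```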
21
Review

Defined accuracy
Described the relationship between chance
agreement and IOA
Creating a 2x2 table
Calculating a best estimate of the base rate

22
Kappa

Kappa is an agreement statistic that controls for
chance agreement
Before kappa there was a sense that we should
control for chance but we did not know how
Cohen's 1960 paper has been cited over 7000 times

23
Definition of Kappa

Kappa is the proportion of non-chance agreement
observed out of all the non-chance agreement
possible
K = (Po - Pe) / (1 - Pe)

24
Definition of Terms

Po: The proportion of events for which there is
observed agreement.
Same metric as percent agreement
Pe: The proportion of events for which agreement
would be expected by chance alone
Defined as the probability of two raters coding
the same behavior at the same time by chance
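
A minimal sketch of how Po, Pe, and kappa can be computed from a 2x2 agreement table follows. The function name is hypothetical. The counts are those of the Happy versus all other emotions table from earlier in the talk; the resulting kappa is computed here and is not a value reported on the slides.

```python
def cohen_kappa(table):
    """Cohen's kappa for a 2x2 agreement table [[a, b], [c, d]].

    Rows are one coder's codes and columns the other's, so a and d
    are the agreement cells.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    po = (a + d) / n                                       # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    if pe == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (po - pe) / (1 - pe)


happy_vs_other = [[60, 10], [7, 123]]
kappa = cohen_kappa(happy_vs_other)
print(round(kappa, 2))
```

Here Po is 183/200 = .915 and Pe is (70 x 67 + 130 x 133) / 200^2 = .5495, which gives a kappa of about .81.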

25
Agreement matrix for EB and SJ with (chance
agreement)
Po = .36 + .18 = .54
Pe = .33 + .15 = .48
K = (.54 - .48) / (1 - .48) = .12
26
What determines the value of kappa

Accuracy and base rate
Increasing accuracy increases observed agreement;
therefore, kappa is a consistent estimator of
accuracy if base rate is held constant
If accuracy is held constant, kappa will decrease
as the estimated true base rate deviates from .5
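
The relationship can be illustrated with a simple model. The sketch below is an assumption made for illustration, not necessarily the model behind the talk's figures or handout: both coders have the same accuracy for occurrences and nonoccurrences, and their errors are independent. The function name is hypothetical.

```python
def expected_kappa(accuracy, base_rate):
    """Expected kappa for two coders under an ASSUMED model:
    equal accuracy for both coders and both codes, independent errors."""
    if not 0 < base_rate < 1:
        raise ValueError("base_rate must be strictly between 0 and 1")
    # Both coders are right, or both are wrong, in the same interval.
    po = accuracy ** 2 + (1 - accuracy) ** 2
    # Probability that a single coder records the behavior.
    p_yes = accuracy * base_rate + (1 - accuracy) * (1 - base_rate)
    pe = p_yes ** 2 + (1 - p_yes) ** 2
    return (po - pe) / (1 - pe)


# Kappa at 80% accuracy as the true baserate moves away from .5.
for b in (0.5, 0.3, 0.1, 0.05):
    print(f"baserate {b:.2f}: kappa = {expected_kappa(0.80, b):.2f}")
```

Under these assumptions, 80% accuracy gives a kappa of about .36 at a baserate of .5 and lower values as the baserate moves toward 0 or 1.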

27
Obtained kappa, across baserate, for 80% accuracy
[Figure: curve for 80% accuracy]
28
Obtained kappa, across baserate, for 80% and 99%
accuracy
[Figure: curves for 80% and 99% accuracy]
29
Obtained kappa, across baserate, from 80% to 99%
accuracy
[Figure: curves for 80%, 85%, 90%, 95%, and 99% accuracy]
30
Bottom line

When we observe behaviors that have a high or low
baserate, our kappas will be low.
This is important for researchers studying low
baserate behaviors
Many of the behaviors we observe in young
children with developmental disabilities are very
low baserate

31
Criterion values for IOA

Cohen never suggested using criterion values for
kappa
Many professional organizations recommend
criteria for IOA
e.g., The Council for Exceptional Children
Division for Research Recommendations 2005
Data are collected on the reliability or
inter-observer agreement (IOA) associated with
each dependent variable, and IOA levels meet
minimal standards (e.g., IOA = 80%, Kappa = .60)

32
Criterion accuracy?

Setting a criterion for kappa independent of
baserate is not useful
If we can estimate accuracy
And I am suggesting that we can
We need to consider what sufficient accuracy
would be

33
Criterion accuracy cont.

If we consider 80% agreement sufficient, then
would we consider 80% accuracy sufficient?
If we used 80% accuracy as a criterion,
acceptable kappa could be as low as .19, depending
on baserate

34
Why it is really important not to use criterion
kappas

There is a belief that the quality of data will
be higher if kappa is higher.
This is only true if there is no associated loss
of content or construct validity.
The processes of collapsing and redefining codes
often result in a loss of validity.

35
Applied example

See handout for formulas and data

36
Use the table on the first page of your handout
to determine the accuracy of raters from
baserate and kappa
37
.32
.85
38
Recommendations

Calculate agreement for each code using a 2x2
table
Use the table to determine the accuracy of
observers from baserate and obtained kappa
Report kappa and accuracy
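
The handout table is not reproduced in this transcript. As a rough stand-in, the sketch below inverts a simple assumed model to go from baserate and obtained kappa to an accuracy estimate. The model (equal accuracy for both coders and both codes, independent errors) and the function names are assumptions for illustration, so its values need not match the handout.

```python
import math


def expected_kappa(accuracy, base_rate):
    """Expected kappa under the ASSUMED equal-accuracy, independent-error model."""
    po = accuracy ** 2 + (1 - accuracy) ** 2
    p_yes = accuracy * base_rate + (1 - accuracy) * (1 - base_rate)
    pe = p_yes ** 2 + (1 - p_yes) ** 2
    return (po - pe) / (1 - pe)


def accuracy_from_kappa(kappa, base_rate):
    """Invert expected_kappa for accuracy between .5 and 1.

    With u = accuracy - .5 and c = 2 * base_rate - 1, the model gives
    kappa = 1 - (.25 - u^2) / (.25 - c^2 * u^2), which solves to
    u^2 = .25 * kappa / (1 - (1 - kappa) * c^2).
    """
    if not 0 <= kappa <= 1:
        raise ValueError("kappa must be between 0 and 1 for this sketch")
    if not 0 < base_rate < 1:
        raise ValueError("base_rate must be strictly between 0 and 1")
    c2 = (2 * base_rate - 1) ** 2
    u2 = 0.25 * kappa / (1 - (1 - kappa) * c2)
    return 0.5 + math.sqrt(u2)


# Round trip: kappa implied by 85% accuracy at a baserate of .20.
k = expected_kappa(0.85, 0.20)
print(round(accuracy_from_kappa(k, 0.20), 2))
```

The same obtained kappa maps to a higher accuracy estimate as the baserate moves away from .5, which is why the recommendation is to report accuracy alongside kappa.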

39
Software to calculate kappa

ComKappa, developed by Bakeman to calculate
kappa, SE of kappa, kappa max, and weighted
kappa.
MOOSES, developed by Jon Tapp. Calculates kappa
on the total code set and individual codes. Can
be used with live coding, video coding, and
transcription.
SPSS

40
Challenge