Title: ACM email corpus annotation analysis
- Andrew Rosenberg
- 2/26/2004
Overview
- Motivation
- Corpus Description
- Kappa Shortcomings
- Kappa Augmentation
- Classification of messages
- Corpus annotation analysis
- Next step: Sharpening method
- Summary
Motivation
- The ACM email corpus annotation raises two problems.
- First, by allowing annotators to assign a message one or two labels, there is no clear way to calculate an agreement statistic. An augmentation to the kappa statistic is proposed.
- Second, interannotator reliability is low (K < 0.3). Annotator reeducation and/or redesign of the annotation materials is most likely necessary.
- The available annotated data can, hypothetically, be used to improve category assignment.
Corpus Description
- 312 email messages exchanged among members of the Columbia chapter of the ACM.
- Annotated by 2 annotators with one or two of the following 10 labels:
- question, answer, broadcast, attachment transmission, planning, planning scheduling, planning meeting scheduling, action item, technical discussion, social chat
Kappa Shortcomings
- Before running ML procedures, we need confidence in the labels assigned to the messages.
- To compute kappa, K = (P(A) - P(E)) / (1 - P(E)), we need to count up the number of agreements.
- How do you determine agreement when there is an optional secondary label?
- Ignore the secondary label?
Kappa Shortcomings (ctd.)
- Ignoring the secondary label isn't acceptable for two reasons:
- It is inconsistent with the annotation guidelines.
- It ignores partial agreements (primary label listed first in each pair):
- (a) vs. (b,a) - singleton matches secondary
- (a,b) vs. (c,a) - primary matches secondary
- (a,b) vs. (c,b) - secondary matches secondary
- (a,b) vs. (b,a) - secondary matches primary, and vice versa
- Note: the purpose is not to inflate the kappa value, but to accurately assess the data.
Kappa Augmentation
- When a labeler employs a secondary label, consider it as a single annotation divided between two categories.
- Select a value of p, where 0.5 ≤ p ≤ 1.0, based on how heavily to weight the secondary label (see the sketch below):
- Singleton annotations are assigned a score of 1.0
- Primary label: p
- Secondary label: 1 - p
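As a rough illustration of this weighting, here is a minimal Python sketch, assuming exactly the split described above (1.0 for a singleton, p for the primary label and 1 - p for the secondary); the function name label_weights is illustrative, not from the slides.

    def label_weights(labels, p=0.6):
        """Distribute a one- or two-label annotation over categories.

        labels: a tuple of one or two category names, primary label first.
        Returns a dict mapping each used category to its share of the annotation.
        """
        if len(labels) == 1:
            return {labels[0]: 1.0}          # singleton annotation scores 1.0
        primary, secondary = labels
        return {primary: p, secondary: 1.0 - p}

    # With p = 0.6, the annotation (a,b) becomes {'a': 0.6, 'b': 0.4},
    # and the singleton (b,) becomes {'b': 1.0}.
    print(label_weights(("a", "b")))   # {'a': 0.6, 'b': 0.4}
    print(label_weights(("b",)))       # {'b': 1.0}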
Kappa Augmentation example
Annotation matrices with p = 0.6

Raw annotations (primary label first):

Msg   Judge A   Judge B
1     a,b       b,d
2     b,a       a,b
3     b         b
4     c         a,d
5     b,c       c

Judge A    a      b      c      d      Sum
1          0.6    0.4
2          0.4    0.6
3                 1
4                        1
5                 0.6    0.4
Total      1      2.6    1.4    0      5

Judge B    a      b      c      d      Sum
1                 0.6           0.4
2          0.6    0.4
3                 1
4          0.6                  0.4
5                        1
Total      1.2    2      1      0.8    5
Kappa Augmentation example (ctd.)
Annotation matrices and agreement matrix (each agreement cell is the product of the two judges' weights for that message and category)

Judge A      a      b      c      d      Sum
1            0.6    0.4
2            0.4    0.6
3                   1
4                          1
5                   0.6    0.4
Total        1      2.6    1.4    0      5

Judge B      a      b      c      d      Sum
1                   0.6           0.4
2            0.6    0.4
3                   1
4            0.6                  0.4
5                          1
Total        1.2    2      1      0.8    5

Agreement    a      b      c      d      Sum
1            0      0.24   0      0
2            0.24   0.24   0      0
3            0      1.0    0      0
4            0      0      0      0
5            0      0      0.4    0
Total        0.24   1.48   0.4    0      2.12
Kappa Augmentation example (ctd.)
- To calculate P(E), use the relative frequencies of each annotator's label usage.

Topic   P(Topic), Judge A   P(Topic), Judge B   Product
a       0.2                 0.24                0.048
b       0.52                0.4                 0.208
c       0.28                0.2                 0.056
d       0                   0.16                0
                            P(E) =              0.312

- Kappa is then computed as in the original formulation, K = (P(A) - P(E)) / (1 - P(E)) (a worked sketch follows below).
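Putting the pieces together, here is a self-contained Python sketch that reproduces this example end to end. It assumes, consistently with the tables above, that per-message agreement is the category-wise product of the two judges' weight vectors and that P(A) is the agreement-matrix total divided by the number of messages; the names (augmented_kappa, ANNOTATIONS) are illustrative.

    from collections import defaultdict

    CATEGORIES = ["a", "b", "c", "d"]

    # The five example messages: (Judge A labels, Judge B labels), primary label first.
    ANNOTATIONS = [
        (("a", "b"), ("b", "d")),
        (("b", "a"), ("a", "b")),
        (("b",),     ("b",)),
        (("c",),     ("a", "d")),
        (("b", "c"), ("c",)),
    ]

    def label_weights(labels, p):
        """Singleton -> 1.0; otherwise primary -> p, secondary -> 1 - p."""
        if len(labels) == 1:
            return {labels[0]: 1.0}
        return {labels[0]: p, labels[1]: 1.0 - p}

    def augmented_kappa(annotations, p=0.6):
        n = len(annotations)
        totals_a = defaultdict(float)   # per-category mass used by Judge A
        totals_b = defaultdict(float)   # per-category mass used by Judge B
        agreement = 0.0                 # assumed: category-wise product of weight vectors
        for labels_a, labels_b in annotations:
            wa, wb = label_weights(labels_a, p), label_weights(labels_b, p)
            for cat in CATEGORIES:
                totals_a[cat] += wa.get(cat, 0.0)
                totals_b[cat] += wb.get(cat, 0.0)
                agreement += wa.get(cat, 0.0) * wb.get(cat, 0.0)
        p_a = agreement / n
        p_e = sum((totals_a[c] / n) * (totals_b[c] / n) for c in CATEGORIES)
        return (p_a - p_e) / (1.0 - p_e), p_a, p_e

    kappa, p_a, p_e = augmented_kappa(ANNOTATIONS, p=0.6)
    print(f"P(A) = {p_a:.3f}  P(E) = {p_e:.3f}  kappa = {kappa:.3f}")

Running this reproduces P(E) = 0.312 from the table above; the resulting P(A) = 0.424 and kappa of roughly 0.163 are derived from the example matrices, not figures reported in the slides.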
Classification of messages
- This augmentation allows us to classify messages based on their individual kappa values at different values of p (a sketch follows below).
- Class 1: high kappa at all values of p.
- Class 2: low kappa at all values of p.
- Class 3: high kappa at p = 1.0 (low at p = 0.5).
- Class 4: high kappa at p = 0.5 (low at p = 1.0).
- Note: mathematically, kappa needn't be monotonic w.r.t. p, but with 2 annotators it is.
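The slides do not spell out how the per-message score is computed, so the following Python sketch is one hypothetical reading: it reuses the category-wise product as a per-message agreement score and treats "high" as exceeding a threshold. The threshold value and the helper names (message_agreement, classify) are illustrative only.

    def label_weights(labels, p):
        # Same weighting as before: singleton -> 1.0, otherwise p and 1 - p.
        return {labels[0]: 1.0} if len(labels) == 1 else {labels[0]: p, labels[1]: 1.0 - p}

    def message_agreement(labels_a, labels_b, p):
        """Per-message agreement: category-wise product of the two weight vectors."""
        wa, wb = label_weights(labels_a, p), label_weights(labels_b, p)
        return sum(wa.get(c, 0.0) * wb.get(c, 0.0) for c in set(wa) | set(wb))

    def classify(labels_a, labels_b, threshold=0.5):
        """Assign one of the four classes from agreement at p = 0.5 and p = 1.0."""
        at_half = message_agreement(labels_a, labels_b, 0.5) >= threshold
        at_one = message_agreement(labels_a, labels_b, 1.0) >= threshold
        if at_half and at_one:
            return 1                      # Class 1: constant high
        if not at_half and not at_one:
            return 2                      # Class 2: constant low
        return 3 if at_one else 4         # Class 3: low to high; Class 4: high to low

    print(classify(("b",), ("b",)))           # exact match -> Class 1
    print(classify(("a", "b"), ("b", "a")))   # swapped primary/secondary -> Class 4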
Corpus Annotation Analysis
- Agreement is low at all values of p:
- K(p = 1.0) = 0.299
- K(p = 0.5) = 0.281
- Other views of the data will provide some insight into how to revise the annotation scheme:
- Category distribution
- Category co-occurrence
- Category confusion
- Class distribution
- Category by class distribution
Corpus Annotation Analysis: Category Distribution

(gr and db are presumably the two annotators; their counts sum to the total)

Category                        Total   gr    db
Question                        175     86    89
Answer                          169     90    79
Broadcast                       132     23    109
Attachment Transmission         3       1     2
Planning Meeting Scheduling     63      32    31
Planning Scheduling             27      22    5
Planning                        92      76    16
Action Item                     19      10    9
Technical Discussion            31      22    9
Social Chat                     36      29    7
Corpus Annotation Analysis: Category Co-occurrence

                              Q    A    B    A.T.  P.M.S.  P.S.  P.   A.I.  T.D.  S.C.
Question                      x    19   12   1     8       6     17   1     6     7
Answer                        x    x    2    0     15      3     4    1     7     2
Broadcast                     x    x    x    0     2       2     8    0     0     1
Attachment Transmission       x    x    x    x     0       0     0    0     0     0
Planning Meeting Scheduling   x    x    x    x     x       2     1    0     0     0
Planning Scheduling           x    x    x    x     x       x     0    0     0     0
Planning                      x    x    x    x     x       x     x    3     2     0
Action Item                   x    x    x    x     x       x     x    x     1     0
Technical Discussion          x    x    x    x     x       x     x    x     x     1
Social Chat                   x    x    x    x     x       x     x    x     x     x
Corpus Annotation Analysis: Category Confusion

                              Q    A    B    A.T.  P.M.S.  P.S.  P.   A.I.  T.D.  S.C.
Question                      62   36   21   0     18      13    47   7     13    10
Answer                        x    60   15   0     24      7     19   5     17    3
Broadcast                     x    x    14   0     12      13    52   3     8     22
Attachment Transmission       x    x    x    0     0       0     1    0     0     1
Planning Meeting Scheduling   x    x    x    x     13      6     3    2     0     0
Planning Scheduling           x    x    x    x     x       2     4    1     1     0
Planning                      x    x    x    x     x       x     7    5     5     0
Action Item                   x    x    x    x     x       x     x    1     2     1
Technical Discussion          x    x    x    x     x       x     x    x     2     1
Social Chat                   x    x    x    x     x       x     x    x     x     4
Corpus Annotation Analysis: Class Distribution

Class                       Messages   Fraction
Constant High (Class 1)     82         0.262821
Constant Low (Class 2)      150        0.480769
Low to High (Class 3)       40         0.128205
High to Low (Class 4)       40         0.128205
Total Messages              312
Corpus Annotation Analysis: Category by Class Distribution (1/2)

Class 1 (constant high)
Category                        Num messages   Num messages / category total
Question                        52             0.29714
Answer                          62             0.36686
Broadcast                       16             0.12121
Attachment Transmission         0              0
Planning Meeting Scheduling     18             0.28571
Planning Scheduling             2              0.07407
Planning                        8              0.08695
Action Item                     0              0
Technical Discussion            2              0.06451
Social Chat                     4              0.11111

Class 2 (constant low)
Category                        Num messages   Num messages / category total
Question                        37             0.21142
Answer                          42             0.24852
Broadcast                       92             0.69697
Attachment Transmission         3              1
Planning Meeting Scheduling     24             0.38095
Planning Scheduling             13             0.48148
Planning                        60             0.65217
Action Item                     14             0.73684
Technical Discussion            17             0.54838
Social Chat                     22             0.61111
Corpus Annotation Analysis: Category by Class Distribution (2/2)

Class 3 (low to high)
Category                        Num messages   Num messages / category total
Question                        46             0.26285
Answer                          40             0.23668
Broadcast                       6              0.04545
Attachment Transmission         0              0
Planning Meeting Scheduling     4              0.06349
Planning Scheduling             5              0.18518
Planning                        5              0.05434
Action Item                     4              0.21052
Technical Discussion            11             0.35483
Social Chat                     6              0.16666

Class 4 (high to low)
Category                        Num messages   Num messages / category total
Question                        40             0.22857
Answer                          25             0.14972
Broadcast                       18             0.13636
Attachment Transmission         0              0
Planning Meeting Scheduling     17             0.26984
Planning Scheduling             7              0.25925
Planning                        19             0.20652
Action Item                     1              0.05263
Technical Discussion            1              0.03225
Social Chat                     2              0.11111
Next step: Sharpening method
- In determining interannotator agreement with kappa, etc., two available pieces of information are overlooked:
- Some annotators are better than others.
- Some messages are easier to label than others.
- By limiting the contribution of known poor annotators and difficult messages, we gain confidence in the final category assignment of each message.
- How do we rank annotators? Messages?
Sharpening Method (ctd.)
- Ranking annotators (see the sketch after this list)
- Calculate kappa between each annotator and the rest of the group.
- Better annotators have higher agreement with the group.
- Ranking messages
- Variance (or the entropy, -p log(p)) of the label vector summed over annotators.
- Messages with high variance are more consistently annotated.
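A rough Python sketch of both rankings, under two assumptions: pairwise_kappa stands in for whatever agreement statistic is used between two annotators (e.g. the augmented kappa above), and a message's label vector is its per-category label mass summed over annotators. The function names are illustrative.

    import statistics
    from itertools import combinations

    def rank_annotators(annotators, pairwise_kappa):
        """Score each annotator by mean kappa against the other annotators,
        approximating 'kappa between each annotator and the rest of the group'."""
        scores = {a: [] for a in annotators}
        for a, b in combinations(annotators, 2):
            k = pairwise_kappa(a, b)
            scores[a].append(k)
            scores[b].append(k)
        return {a: statistics.mean(ks) for a, ks in scores.items()}

    def rank_messages(label_vectors):
        """Score each message by the variance of its label vector (per-category
        counts summed over annotators); higher variance = more consistent labeling."""
        return {msg: statistics.pvariance(vec) for msg, vec in label_vectors.items()}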
Sharpening Method (ctd.)
- How do we use these ranks? (A sketch of the loop follows below.)
- Weight the annotators based on their rank.
- Recompute the message matrices with weighted annotator contributions.
- Weight the messages based on their rank.
- Recompute the kappa values with weighted message contributions.
- Repeat these steps until the change in the weights falls below a threshold.
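A minimal sketch of that iteration, assuming three hypothetical helpers whose exact form the slides leave open: annotator_weights(kappas), message_weights(annotator_wts), and weighted_kappas(annotator_wts, message_wts). Only the loop-and-converge structure is taken from the slide.

    def sharpen(annotator_weights, message_weights, weighted_kappas,
                threshold=1e-4, max_iters=100):
        """Iteratively reweight annotators and messages until the weights stabilize."""
        a_wts, m_wts = {}, {}
        kappas = weighted_kappas(a_wts, m_wts)        # initial, unweighted pass
        for _ in range(max_iters):
            new_a = annotator_weights(kappas)         # weight annotators by their rank
            new_m = message_weights(new_a)            # weight messages by their rank
            kappas = weighted_kappas(new_a, new_m)    # recompute agreement with weights
            keys_a = set(new_a) | set(a_wts)
            keys_m = set(new_m) | set(m_wts)
            delta = max(
                [abs(new_a.get(k, 0.0) - a_wts.get(k, 0.0)) for k in keys_a] +
                [abs(new_m.get(k, 0.0) - m_wts.get(k, 0.0)) for k in keys_m],
                default=0.0,
            )
            a_wts, m_wts = new_a, new_m
            if delta < threshold:                     # weights changed very little: stop
                break
        return kappas, a_wts, m_wts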
Summary
- The ACM email corpus annotation raises two problems.
- First, by allowing annotators to assign a message one or two labels, there is no clear way to calculate an agreement statistic. An augmentation to the kappa statistic is proposed.
- Second, interannotator reliability is low (K < 0.3). Annotator reeducation and/or redesign of the annotation materials is most likely necessary.
- The available annotated data can, hypothetically, be used to improve category assignment.